Perturbation Theory for Variational Inference
Manfred Opper (TU Berlin), Marco Fraccaro (Technical University of Denmark), Ulrich Paquet (Apple), Alex Susemihl (TU Berlin), Ole Winther (Technical University of Denmark)

Abstract

The variational approximation is known to underestimate the true variance of a probability measure, like a posterior distribution. In latent variable models whose parameters are permutation-invariant with respect to the likelihood, the variational approximation might self-prune and ignore its parameters. We view the variational free energy and its accompanying evidence lower bound as a first-order term from a perturbation of the true log partition function and derive a power series of corrections. We sketch, by means of a variational matrix factorization example, how a further term could correct for predictions from a self-pruned approximation.

1 Introduction

We consider the probability measure
$$p(x) = \frac{\mu(x)\, e^{-H(x)}}{Z}$$
on a random variable $x$. We are often concerned with the computation of the negative log partition function $-\log Z = -\log \int \mathrm{d}\mu(x)\, e^{-H(x)}$, or of expectations $\mathbb{E}[f(x)] = \frac{1}{Z}\int \mathrm{d}\mu(x)\, f(x)\, e^{-H(x)}$ under $p$. As these are typically not analytically tractable, the variational approximation introduces a tractable approximating measure $q(x) = \frac{1}{Z_q}\,\mu(x)\, e^{-H_q(x)}$, and adjusts the parameters in $H_q$ in such a way that the free energy, defined via the Kullback-Leibler divergence as
$$F[q] = D(q \,\|\, p) - \log Z = \mathbb{E}_q\!\left[\log \frac{q(x)}{p(x)}\right] - \log Z = -\log Z_q + \mathbb{E}_q[V(x)], \tag{1}$$
is minimal, where $V = H - H_q$. The minimizer of $F$ usually underestimates the variance of $p$, and symmetries are sometimes not broken [1, 3], so that the approximation self-prunes. In Sec. 3 we illustrate this effect with a small matrix factorization example, and then show how predictions (expectations $\mathbb{E}[f(x)]$ of $f(x)$ under the posterior $p$) could be improved.

2 Perturbation theory

Perturbation theory aims to find approximate solutions to a problem given exact solutions of a simpler related sub-problem (the VB solution in our case). If we define $\hat H_\lambda = H_q + \lambda V = (1-\lambda)H_q + \lambda H$, we have $\hat H_1 = H$ and $\hat H_0 = H_q$. Let us now define the perturbation expansion using the cumulant expansion of $\log \mathbb{E}_q[e^{-\lambda V}]$, where $\mathbb{E}_q$ denotes the expectation with respect to the variational distribution:
$$\log \int \mathrm{d}\mu(x)\, e^{-\hat H_\lambda(x)} = \log Z_q + \log \mathbb{E}_q\!\left[e^{-\lambda V(x)}\right] = \underbrace{\log Z_q - \lambda\,\mathbb{E}_q[V]}_{-F[q] \text{ for } \lambda = 1} + \frac{\lambda^2}{2!}\,\mathbb{E}_q\!\left[(V - \mathbb{E}_q V)^2\right] - \frac{\lambda^3}{3!}\,\mathbb{E}_q\!\left[(V - \mathbb{E}_q V)^3\right] + \ldots \tag{2}$$
At the end, of course, we set $\lambda = 1$. The first-order term in (2) yields the variational free energy $F[q]$ in (1), i.e. the negative of the usual evidence lower bound (ELBO). It is therefore reasonable to correct it using higher orders. Note that this may not be a convergent series, but may only lead to an asymptotic expansion. The approach is similar to corrections to Expectation Propagation [4, 6], but the computation of the correction terms here requires less effort. A variational linear response correction can also be applied [5].

2.1 Expectations

In a similar way, we can compute corrections for expectations of functions. We first define $\mathbb{E}_\lambda$ as the expectation with respect to $p_\lambda(x) \propto \mu(x)\, e^{-\hat H_\lambda(x)}$ (we then have $\mathbb{E}_q = \mathbb{E}_0$) and notice that
$$\mathbb{E}_\lambda[f(x)] = \frac{\int \mathrm{d}\mu(x)\, f(x)\, e^{-\hat H_\lambda(x)}}{\int \mathrm{d}\mu(x)\, e^{-\hat H_\lambda(x)}} = \frac{\int \mathrm{d}\mu(x)\, f(x)\, e^{-\hat H_0(x)}\left(1 - \lambda V + \frac{\lambda^2}{2!}V^2 - \frac{\lambda^3}{3!}V^3 \pm \ldots\right)}{\int \mathrm{d}\mu(x)\, e^{-\hat H_0(x)}\left(1 - \lambda V + \frac{\lambda^2}{2!}V^2 - \frac{\lambda^3}{3!}V^3 \pm \ldots\right)} = \frac{\mathbb{E}_0\!\left[f(x)\left(1 - \lambda V + \frac{\lambda^2}{2}V^2 \mp \ldots\right)\right]}{\mathbb{E}_0\!\left[1 - \lambda V + \frac{\lambda^2}{2}V^2 \mp \ldots\right]}. \tag{3}$$
Using $\frac{1}{1+z} = 1 - z + z^2 \mp \ldots$, which only converges for $|z| < 1$, we can re-expand the part from the denominator in (3) to get
$$\left(1 - \lambda\,\mathbb{E}_0 V + \frac{\lambda^2}{2}\,\mathbb{E}_0 V^2 \mp \ldots\right)^{-1} = 1 + \lambda\,\mathbb{E}_0 V + \lambda^2 (\mathbb{E}_0 V)^2 - \frac{\lambda^2}{2}\,\mathbb{E}_0 V^2 \pm \ldots \tag{4}$$
Putting this together with the numerator in (3) yields
$$\mathbb{E}_\lambda[f(x)] = \mathbb{E}_0[f] - \lambda\,\mathbb{E}_0[fV] + \lambda\,\mathbb{E}_0[f]\,\mathbb{E}_0[V] + \frac{\lambda^2}{2}\,\mathbb{E}_0[fV^2] - \lambda^2\,\mathbb{E}_0[fV]\,\mathbb{E}_0[V] + \lambda^2\,\mathbb{E}_0[f]\,(\mathbb{E}_0 V)^2 - \frac{\lambda^2}{2}\,\mathbb{E}_0[f]\,\mathbb{E}_0[V^2] \pm \ldots = \mathbb{E}_0[f] - \lambda\,\langle f, V\rangle - \lambda^2\,\mathbb{E}_0[V]\,\langle f, V\rangle + \frac{\lambda^2}{2}\,\langle f, V^2\rangle \pm \ldots, \tag{5}$$
where $\langle f, g\rangle \equiv \mathbb{E}_0[fg] - \mathbb{E}_0[f]\,\mathbb{E}_0[g]$ denotes the covariance under $q$.

3 Variational Matrix Factorization

As a toy example, we factorize a scalar matrix $r = xy + \epsilon$ with $\epsilon \sim \mathcal{N}(\epsilon; 0, \beta^{-1})$. In the parlance of recommender systems (as we'll later consider sparsely observed matrices $\mathbf{R}$) a user (modelled by a latent scalar $x$) rates an item (modelled by $y$). One rating $r \sim \mathcal{N}(r; xy, \beta^{-1})$ is observed.
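The expansion of Sec. 2 can be checked numerically on a one-dimensional example. The setup below is illustrative and not from the text: take the base measure $\mu$ to be $\mathcal{N}(0, 1)$, $H_q = 0$ (so $q = \mu$ and $Z_q = 1$) and $H(x) = c\,x^2$, giving $V = c\,x^2$. Then $\log Z = \log \mathbb{E}_q[e^{-V}] = -\tfrac{1}{2}\log(1 + 2c)$ exactly, while (2) gives $-\mathbb{E}_q[V] = -c$ at first order and $-c + \tfrac{1}{2}\mathrm{Var}_q(V) = -c + c^2$ at second order.

```python
import math

# Sanity check of the cumulant expansion (2) on a 1-D Gaussian example
# (illustrative setup, not from the paper): mu = N(0,1), H_q = 0, H = c*x^2.
# Exact: log Z = -0.5*log(1 + 2c).  Expansion: -c + c^2 + O(c^3), using
# E_q[V] = c and Var_q(V) = c^2 * Var(x^2) = 2c^2.
c = 0.05
exact = -0.5 * math.log(1.0 + 2.0 * c)
first_order = -c              # the (negative) variational free energy term
second_order = -c + c * c     # adds 0.5 * Var_q(V) = c^2

print(exact, first_order, second_order)
assert abs(second_order - exact) < abs(first_order - exact)
```

As expected for a perturbative series in a weak-coupling regime, the second-order estimate lands closer to the exact value than the first-order (variational) one.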
Figure 1: The expected affinity $\mathbb{E}[xy]$ under $p(x, y \mid r, \beta)$ as a Monte Carlo ground truth (MC), and its approximations (VB, and VB with first- and second-order corrections), plotted against the noise standard deviation $\beta^{-1/2}$. We plot the results with $r = 1$ and $\alpha_x = \alpha_y = 1$. With these parameters the VB estimate is zero above a critical noise level (see Fig. 2 and the analysis in [3]), but is accurately corrected through the first-order term in (7). Under less noise symmetry is broken, and the first-order correction (6) remains accurate.

Assume priors $p(x) = \mathcal{N}(x; 0, \alpha_x^{-1})$ and $p(y) = \mathcal{N}(y; 0, \alpha_y^{-1})$. Let $q(x, y) = q(x)\,q(y)$ be a factorized approximation with $q(x) = \mathcal{N}(x; \mu_x, \gamma_x^{-1})$ and $q(y) = \mathcal{N}(y; \mu_y, \gamma_y^{-1})$. We will investigate how the variational prediction of a rating $\mathbb{E}[xy]$ and its further terms in (5) change as the observation noise standard deviation $\beta^{-1/2}$ increases.

Motivation. This toy example is the building block for computing the corrections for real recommender system models, i.e. matrix factorization models with sparsely observed ratings matrices and $K$ latent dimensions for both user and item vectors. As shown in the Supplementary Material, the correction for the predicted rating from user $i$ to item $j$, $\mathbb{E}[\mathbf{x}_i^\top \mathbf{y}_j]$, can be shown to be the sum of $K(N_i + M_j)$ corrections like the ones presented in this section, when $q$ fully factorizes. $N_i$ denotes the number of items rated by user $i$ and $M_j$ the number of users that have rated item $j$. We expect the correction to be largest when $N_i$ and $M_j$ are small.

Before turning to the technical details, consider Fig. 1, which is accompanied by Fig. 2. When there is little observation noise, with $\beta^{-1}$ being small, the variational Bayes (VB) solution $\mathbb{E}_q[xy]$ closely matches $\mathbb{E}[xy]$, obtained from a Monte Carlo (MC) estimate. As the noise is increased, the VB solution stops breaking symmetry and snaps to zero (see Fig. 2, and Nakajima et al.'s analysis [3]), after which $\mathbb{E}[xy] = 0$ will always be predicted. The first-order correction gives a substantial improvement towards the ground truth, although, with no guarantee that (5) is a convergent series, the inclusion of a second-order term introduces a degradation.

Technical details. The terms that constitute $V(x, y) = H(x, y) - H_q(x, y)$ are
$$H(x, y) = \frac{\beta}{2}(r - xy)^2 + \frac{\alpha_x}{2}x^2 + \frac{\alpha_y}{2}y^2, \qquad H_q(x, y) = \frac{\gamma_x}{2}(x - \mu_x)^2 + \frac{\gamma_y}{2}(y - \mu_y)^2.$$
Let's look at the first order of the expansion in (5),
$$\mathbb{E}[xy] \approx \mathbb{E}_q[xy] - \lambda\,\mathbb{E}_q[xyV] + \lambda\,\mathbb{E}_q[xy]\,\mathbb{E}_q[V],$$
bearing in mind that $q$ is the minimizer of $F[q] = -\log Z_q + \mathbb{E}_q[V]$. The terms that we require in the expansion are $\mathbb{E}_q[xy] = \mu_x\mu_y$ and
$$\mathbb{E}_q[xyV] - \mathbb{E}_q[xy]\,\mathbb{E}_q[V] = \mu_x\mu_y\left\{\frac{\alpha_x}{\gamma_x} + \frac{\alpha_y}{\gamma_y} + \frac{4\beta}{\gamma_x\gamma_y}\right\} + \beta\left\{\frac{\mu_x^3\mu_y}{\gamma_y} + \frac{\mu_x\mu_y^3}{\gamma_x}\right\} - \beta r\left\{\frac{\mu_x^2}{\gamma_y} + \frac{\mu_y^2}{\gamma_x} + \frac{1}{\gamma_x\gamma_y}\right\}, \tag{6}$$
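The closed-form first-order covariance for the toy model can be checked by Monte Carlo under $q$. The sketch below (illustrative parameter values, not from the text) samples $(x, y)$ from the factorized Gaussian $q$ and compares the empirical $\mathrm{Cov}_q(xy, V)$ against the formula; the $H_q$ terms of $V$ contribute nothing to this covariance, which is why they do not appear in the closed form.

```python
import random, math

# Monte Carlo check of the closed-form Cov_q(xy, V) for the toy model
# r = x*y + eps.  All parameter values below are arbitrary illustrations.
r, beta, ax, ay = 1.0, 2.0, 1.0, 1.0
mx, my, gx, gy = 0.3, -0.2, 4.0, 5.0   # q(x)=N(mx, 1/gx), q(y)=N(my, 1/gy)

def V(x, y):
    H = 0.5 * beta * (r - x * y) ** 2 + 0.5 * ax * x * x + 0.5 * ay * y * y
    Hq = 0.5 * gx * (x - mx) ** 2 + 0.5 * gy * (y - my) ** 2
    return H - Hq

# Closed form: Cov_q(xy, V)
cov_exact = (mx * my * (ax / gx + ay / gy + 4 * beta / (gx * gy))
             + beta * (mx ** 3 * my / gy + mx * my ** 3 / gx)
             - beta * r * (mx ** 2 / gy + my ** 2 / gx + 1 / (gx * gy)))

rng = random.Random(0)
n, s_xy, s_v, s_xyv = 200_000, 0.0, 0.0, 0.0
for _ in range(n):
    x = rng.gauss(mx, 1 / math.sqrt(gx))
    y = rng.gauss(my, 1 / math.sqrt(gy))
    s_xy += x * y; s_v += V(x, y); s_xyv += x * y * V(x, y)
cov_mc = s_xyv / n - (s_xy / n) * (s_v / n)
print(cov_exact, cov_mc)
```

With 200,000 samples the Monte Carlo estimate should agree with the closed form to within a few hundredths.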
Figure 2: The posterior $p(x, y \mid r, \beta)$ for increasing noise levels $\beta^{-1}$, and the accompanying fully factorized VB solution (circular contours), for $r = 1$. The approximation stops breaking symmetry beyond a critical noise level. Fig. 1 presents corrections to the mean of $p(xy \mid r, \beta)$.

with the latter being $\langle xy, V\rangle = \mathrm{Cov}_q(xy, V)$. To illustrate the effect of the correction, consider the high observation noise regime in Fig. 2. The VB approximation locks on to a zero mean, and does not break symmetry. With $\mu_x = \mu_y = 0$, for example, the only remaining term to first order is (using $\lambda = 1$)
$$\mathbb{E}[xy] \approx \frac{\beta r}{\gamma_x \gamma_y}, \tag{7}$$
and the correction is verifiably in the direction of $r$.

4 Summary

We've motivated, by means of a toy example, the benefit of and a framework for a perturbation-theoretic view on variational inference. Outside the scope of the workshop, current and future work encompasses the application to matrix factorization (see Supplementary Material), variational inference for stochastic differential equations, Gaussian process (GP) classification, and the variational approach to sparse GP regression.

References

[1] D. J. C. MacKay. Local minima, symmetry-breaking, and model pruning in variational free energy minimization. Technical report, University of Cambridge Cavendish Laboratory, 2001.
[2] S. Nakajima and M. Sugiyama. Theoretical analysis of Bayesian matrix factorization. Journal of Machine Learning Research, 12(Nov):2583-2648, 2011.
[3] S. Nakajima, M. Sugiyama, S. D. Babacan, and R. Tomioka. Global analytic solution of fully-observed variational Bayesian matrix factorization. Journal of Machine Learning Research, 14(Jan):1-37, 2013.
[4] M. Opper, U. Paquet, and O. Winther. Perturbative corrections for approximate inference in Gaussian latent variable models. Journal of Machine Learning Research, 14, 2013.
[5] M. Opper and O. Winther. Variational linear response. In Advances in Neural Information Processing Systems 16. MIT Press, 2004.
[6] U. Paquet, O. Winther, and M. Opper. Perturbation corrections in approximate inference: Mixture modelling applications. Journal of Machine Learning Research, 10(Jun):1263-1304, 2009.
Supplementary Material

Matrix factorization: the general case

We now consider a recommender system with $N$ users, $M$ items and $K$ latent dimensions. An $N \times M$ sparse rating matrix $\mathbf{R}$ can be constructed using each user's list of rated items in the catalogue. An element $r_{ij}$ in $\mathbf{R}$ contains the rating that user $i$ has given to item $j$. Matrix factorization techniques factorize the rating matrix as $\mathbf{R} = \mathbf{X}^\top \mathbf{Y} + \mathbf{E}$, where $\mathbf{X}$ is a $K \times N$ user matrix, the item matrix $\mathbf{Y}$ is $K \times M$ and the residual term $\mathbf{E}$ is $N \times M$. We denote the set of observed entries in $\mathbf{R}$ by $\mathcal{R}$. We use a Gaussian likelihood for the ratings, i.e. $r_{ij} \sim \mathcal{N}(r_{ij}; \mathbf{x}_i^\top \mathbf{y}_j, \beta^{-1})$, and isotropic Gaussian priors for both user and item vectors:
$$p(\mathbf{x}_i) = \prod_{k=1}^K \mathcal{N}(x_{ik}; 0, \alpha_x^{-1}), \qquad p(\mathbf{y}_j) = \prod_{k=1}^K \mathcal{N}(y_{jk}; 0, \alpha_y^{-1}). \tag{8}$$
It is common to take a fully factorized approximation for user $i$'s and item $j$'s vectors over dimensions $k = 1, \ldots, K$ (notice that we use $\eta$ and $\zeta$ for item means and precisions, to avoid subscript spaghetti):
$$q(\mathbf{x}_i) = \prod_k q(x_{ik}) = \prod_k \mathcal{N}(x_{ik}; \mu_{ik}, \gamma_{ik}^{-1}), \qquad q(\mathbf{y}_j) = \prod_k q(y_{jk}) = \prod_k \mathcal{N}(y_{jk}; \eta_{jk}, \zeta_{jk}^{-1}).$$
The main quantity of interest in a recommender system is the predicted rating $\mathbb{E}[\mathbf{x}_i^\top \mathbf{y}_j]$, whose first-order correction requires the computation of $\langle \mathbf{x}_i^\top \mathbf{y}_j, V(\mathbf{X}, \mathbf{Y})\rangle$. With our assumptions, the functions $H$ and $H_q$ in $V(\mathbf{X}, \mathbf{Y}) = H(\mathbf{X}, \mathbf{Y}) - H_q(\mathbf{X}, \mathbf{Y})$ are
$$H(\mathbf{X}, \mathbf{Y}) = \frac{\beta}{2}\sum_{(i,j)\in\mathcal{R}}(r_{ij} - \mathbf{x}_i^\top \mathbf{y}_j)^2 + \frac{\alpha_x}{2}\sum_i \mathbf{x}_i^\top \mathbf{x}_i + \frac{\alpha_y}{2}\sum_j \mathbf{y}_j^\top \mathbf{y}_j,$$
$$H_q(\mathbf{X}, \mathbf{Y}) = \sum_{i,k}\frac{\gamma_{ik}}{2}(x_{ik} - \mu_{ik})^2 + \sum_{j,k}\frac{\zeta_{jk}}{2}(y_{jk} - \eta_{jk})^2.$$
Counterintuitively, we see that while we are only interested in correcting $\mathbb{E}[\mathbf{x}_i^\top \mathbf{y}_j]$, both $H$ and $H_q$ also depend on terms relative to items not seen by user $i$ and to users that have not rated item $j$. As we will see shortly, however, these terms do not contribute to $\langle \mathbf{x}_i^\top \mathbf{y}_j, V(\mathbf{X}, \mathbf{Y})\rangle$. In its unsimplified form,
$$\langle \mathbf{x}_i^\top \mathbf{y}_j, V(\mathbf{X}, \mathbf{Y})\rangle = \left\langle \mathbf{x}_i^\top \mathbf{y}_j,\ \frac{\beta}{2}\sum_{(n,m)\in\mathcal{R}}(r_{nm} - \mathbf{x}_n^\top \mathbf{y}_m)^2 + \frac{\alpha_x}{2}\sum_{n=1}^N \mathbf{x}_n^\top \mathbf{x}_n + \frac{\alpha_y}{2}\sum_{m=1}^M \mathbf{y}_m^\top \mathbf{y}_m - \sum_{n,k}\frac{\gamma_{nk}}{2}(x_{nk} - \mu_{nk})^2 - \sum_{m,k}\frac{\zeta_{mk}}{2}(y_{mk} - \eta_{mk})^2 \right\rangle.$$
In the computation of this covariance, all the terms in $V(\mathbf{X}, \mathbf{Y})$ that do not depend on $\mathbf{x}_i$ or $\mathbf{y}_j$ are constants and leave the covariance unchanged: considering the user vectors, we have for example
$$\left\langle \mathbf{x}_i^\top \mathbf{y}_j,\ \frac{\alpha_x}{2}\sum_{n=1}^N \mathbf{x}_n^\top \mathbf{x}_n \right\rangle = \left\langle \mathbf{x}_i^\top \mathbf{y}_j,\ \frac{\alpha_x}{2}\,\mathbf{x}_i^\top \mathbf{x}_i \right\rangle.$$
In a similar way, among the likelihood terms $\sum_{(n,m)\in\mathcal{R}}(r_{nm} - \mathbf{x}_n^\top \mathbf{y}_m)^2$ we only need to keep the ones corresponding to the items rated by user $i$ and the users that rated item $j$ (i.e. only the $i$-th row and $j$-th column of the ratings matrix). Defining $\mathcal{M}(i)$ as the set containing the indices of the items rated by user $i$, and $\mathcal{N}(j)$ as the set containing the indices of the users that rated item $j$, we have
$$\left\langle \mathbf{x}_i^\top \mathbf{y}_j,\ \sum_{(n,m)\in\mathcal{R}}(r_{nm} - \mathbf{x}_n^\top \mathbf{y}_m)^2 \right\rangle = \left\langle \mathbf{x}_i^\top \mathbf{y}_j,\ \sum_{m\in\mathcal{M}(i)}(r_{im} - \mathbf{x}_i^\top \mathbf{y}_m)^2 + \sum_{n\in\mathcal{N}(j)}(r_{nj} - \mathbf{x}_n^\top \mathbf{y}_j)^2 \right\rangle.$$
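In an implementation, the index sets $\mathcal{M}(i)$ and $\mathcal{N}(j)$ fall straight out of the list of observed ratings. A minimal helper (with assumed names; not from the paper):

```python
from collections import defaultdict

# Build the index sets M(i) -- items rated by user i -- and N(j) -- users
# who rated item j -- from a list of observed (user, item, rating) triples.
def index_sets(observed):
    M, N = defaultdict(set), defaultdict(set)
    for i, j, _ in observed:
        M[i].add(j)
        N[j].add(i)
    return M, N

obs = [(0, 0, 4.0), (0, 2, 1.0), (1, 0, 5.0), (1, 1, 3.0)]
M, N = index_sets(obs)
print(M[0], N[0])   # items rated by user 0; users who rated item 0
```

Only these neighbourhoods enter the simplified covariance, so the per-rating correction cost scales with $|\mathcal{M}(i)| + |\mathcal{N}(j)|$ rather than with the full catalogue.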
After removing all the constant terms, the simplified covariance is given by
$$\langle \mathbf{x}_i^\top \mathbf{y}_j, V\rangle = \frac{\beta}{2}\,\mathrm{Cov}\!\left(\mathbf{x}_i^\top \mathbf{y}_j,\ \sum_{m\in\mathcal{M}(i)}(r_{im} - \mathbf{x}_i^\top \mathbf{y}_m)^2\right) + \frac{\beta}{2}\,\mathrm{Cov}\!\left(\mathbf{x}_i^\top \mathbf{y}_j,\ \sum_{n\in\mathcal{N}(j)}(r_{nj} - \mathbf{x}_n^\top \mathbf{y}_j)^2\right) + \left\langle \mathbf{x}_i^\top \mathbf{y}_j,\ \frac{\alpha_x}{2}\mathbf{x}_i^\top \mathbf{x}_i + \frac{\alpha_y}{2}\mathbf{y}_j^\top \mathbf{y}_j - \sum_k \frac{\gamma_{ik}}{2}(x_{ik} - \mu_{ik})^2 - \sum_k \frac{\zeta_{jk}}{2}(y_{jk} - \eta_{jk})^2 \right\rangle = \frac{\beta}{2} f_1(\mathbf{x}_i, \mathbf{y}_j, \mathbf{Y}_i) + \frac{\beta}{2} f_2(\mathbf{x}_i, \mathbf{y}_j, \mathbf{X}_j) + g(\mathbf{x}_i, \mathbf{y}_j),$$
where $\mathbf{X}_j$ is the restriction of $\mathbf{X}$ to the columns indexed by $\mathcal{N}(j)$, and $\mathbf{Y}_i$ is the restriction of $\mathbf{Y}$ to the columns indexed by $\mathcal{M}(i)$.

Computation of $g(\mathbf{x}_i, \mathbf{y}_j)$. Thanks to our choice of a fully factorized VB distribution and the bilinearity of the covariance operator, we have
$$g(\mathbf{x}_i, \mathbf{y}_j) = \sum_k \left\langle x_{ik} y_{jk},\ \frac{\alpha_x}{2}x_{ik}^2 + \frac{\alpha_y}{2}y_{jk}^2 - \frac{\gamma_{ik}}{2}(x_{ik} - \mu_{ik})^2 - \frac{\zeta_{jk}}{2}(y_{jk} - \eta_{jk})^2 \right\rangle.$$

Computation of $f_1$ and $f_2$. We now focus only on the computation of $f_1(\mathbf{x}_i, \mathbf{y}_j, \mathbf{Y}_i)$, as the results for $f_2(\mathbf{x}_i, \mathbf{y}_j, \mathbf{X}_j)$ are analogous. We rewrite the term $f_1$ as
$$f_1 = \left\langle \mathbf{x}_i^\top \mathbf{y}_j,\ \sum_{m\in\mathcal{M}(i)}(r_{im} - \mathbf{x}_i^\top \mathbf{y}_m)^2 \right\rangle = \sum_{m\in\mathcal{M}(i)} \left\langle \mathbf{x}_i^\top \mathbf{y}_j,\ r_{im}^2 - 2 r_{im}\,\mathbf{x}_i^\top \mathbf{y}_m + (\mathbf{x}_i^\top \mathbf{y}_m)^2 \right\rangle = \sum_{m\in\mathcal{M}(i)} \left\{ -2 r_{im} \left\langle \mathbf{x}_i^\top \mathbf{y}_j, \mathbf{x}_i^\top \mathbf{y}_m \right\rangle + \left\langle \mathbf{x}_i^\top \mathbf{y}_j, (\mathbf{x}_i^\top \mathbf{y}_m)^2 \right\rangle \right\}$$
and analyse each of the covariance terms separately. Thanks to the full factorization of the VB posterior we have
$$\left\langle \mathbf{x}_i^\top \mathbf{y}_j, \mathbf{x}_i^\top \mathbf{y}_m \right\rangle = \mathrm{Cov}\!\left(\sum_k x_{ik} y_{jk},\ \sum_k x_{ik} y_{mk}\right) = \sum_k \langle x_{ik} y_{jk}, x_{ik} y_{mk}\rangle.$$
For $\langle \mathbf{x}_i^\top \mathbf{y}_j, (\mathbf{x}_i^\top \mathbf{y}_m)^2\rangle$ we get instead
$$\left\langle \mathbf{x}_i^\top \mathbf{y}_j, (\mathbf{x}_i^\top \mathbf{y}_m)^2 \right\rangle = \left\langle \sum_k x_{ik} y_{jk},\ \sum_k (x_{ik} y_{mk})^2 + 2\sum_k \sum_{p<k} x_{ik} y_{mk}\, x_{ip} y_{mp} \right\rangle = \sum_k \langle x_{ik} y_{jk}, (x_{ik} y_{mk})^2\rangle + 2\sum_k \Big(\sum_{c\neq k} \mathbb{E}[x_{ic} y_{mc}]\Big) \langle x_{ik} y_{jk}, x_{ik} y_{mk}\rangle,$$
since a cross term $\langle x_{ik} y_{jk}, x_{ik} y_{mk}\, x_{ic} y_{mc}\rangle$ with $c \neq k$ factorizes as $\mathbb{E}[x_{ic} y_{mc}]\, \langle x_{ik} y_{jk}, x_{ik} y_{mk}\rangle$. Combining these results we obtain
$$f_1 = \sum_{m\in\mathcal{M}(i)} \sum_k \left\{ -2\Big(r_{im} - \sum_{c\neq k} \mu_{ic}\eta_{mc}\Big) \langle x_{ik} y_{jk}, x_{ik} y_{mk}\rangle + \langle x_{ik} y_{jk}, (x_{ik} y_{mk})^2\rangle \right\}.$$
If we then define $r_{im}^{(k)}$ as the rating $r_{im}$ minus the expected contribution of all components other than $k$, i.e.
$$r_{im}^{(k)} = r_{im} - \sum_{c\neq k} \mathbb{E}[x_{ic} y_{mc}] = r_{im} - \sum_{c\neq k} \mu_{ic}\,\eta_{mc},$$
we can rewrite $f_1$ as
$$f_1 = \sum_{m\in\mathcal{M}(i)} \sum_k \left\langle x_{ik} y_{jk},\ (r_{im}^{(k)} - x_{ik} y_{mk})^2 \right\rangle.$$
For each of the $|\mathcal{M}(i)|$ points we then only need the corrections for $K$ univariate problems, which are far simpler to compute. Notice in particular that this result is only possible thanks to our assumption of fully factorized priors and VB posteriors, as there are no correlations among variables of the same user/item vectors to be taken into account when computing the covariances.
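The per-component residuals $r_{im}^{(k)}$ are cheap to compute from the variational means. A small sketch (hypothetical helper name, not from the paper):

```python
# Per-component residual r_im^(k) = r_im - sum_{c != k} mu_ic * eta_mc:
# the rating minus the expected contribution of every component except k.
# mu and eta are the length-K variational mean vectors of user i and item m.
def residuals(r_im, mu, eta):
    total = sum(m * e for m, e in zip(mu, eta))       # E[x_i^T y_m]
    return [r_im - (total - m * e) for m, e in zip(mu, eta)]

print(residuals(3.0, [1.0, 0.5], [2.0, 2.0]))  # -> [2.0, 1.0]
```

Computing the full inner product once and subtracting each component's own contribution keeps the cost at O(K) per rating instead of O(K^2).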
Putting it all together: corrections for the predicted ratings

Combining the results above, the covariance required for the first-order correction of this matrix factorization problem is given by
$$\langle \mathbf{x}_i^\top \mathbf{y}_j, V\rangle = \frac{\beta}{2} \sum_{m\in\mathcal{M}(i)} \sum_k \left\langle x_{ik} y_{jk}, (r_{im}^{(k)} - x_{ik} y_{mk})^2 \right\rangle + \frac{\beta}{2} \sum_{n\in\mathcal{N}(j)} \sum_k \left\langle x_{ik} y_{jk}, (r_{nj}^{(k)} - x_{nk} y_{jk})^2 \right\rangle + \sum_k \left\langle x_{ik} y_{jk},\ \frac{\alpha_x}{2}x_{ik}^2 + \frac{\alpha_y}{2}y_{jk}^2 - \frac{\gamma_{ik}}{2}(x_{ik} - \mu_{ik})^2 - \frac{\zeta_{jk}}{2}(y_{jk} - \eta_{jk})^2 \right\rangle.$$
Due to the fully factorized VB approximation, the computation of these covariances does not require vector operations and can easily be done manually or with symbolic mathematics software such as SymPy. The required non-central Gaussian moments can be computed using Wick's theorem. Assuming that $r_{ij}$ is an unobserved rating that we want to predict (so that the likelihood term in $V(\mathbf{X}, \mathbf{Y})$ does not depend on $\mathbf{x}_i^\top \mathbf{y}_j$; in the toy example of Section 3 we were instead computing a correction on an observed rating), the final expression for the covariance term in the first-order correction of $\mathbb{E}[\mathbf{x}_i^\top \mathbf{y}_j]$ is
$$\langle \mathbf{x}_i^\top \mathbf{y}_j, V\rangle = \sum_k \left\{ \sum_{m\in\mathcal{M}(i)} \left[ \frac{\beta\,\mu_{ik}\eta_{jk}\eta_{mk}^2}{\gamma_{ik}} - \frac{\beta\, r_{im}^{(k)}\eta_{jk}\eta_{mk}}{\gamma_{ik}} + \frac{\beta\,\mu_{ik}\eta_{jk}}{\gamma_{ik}\zeta_{mk}} \right] + \sum_{n\in\mathcal{N}(j)} \left[ \frac{\beta\,\mu_{ik}\eta_{jk}\mu_{nk}^2}{\zeta_{jk}} - \frac{\beta\, r_{nj}^{(k)}\mu_{ik}\mu_{nk}}{\zeta_{jk}} + \frac{\beta\,\mu_{ik}\eta_{jk}}{\zeta_{jk}\gamma_{nk}} \right] + \left( \frac{\alpha_x}{\gamma_{ik}} + \frac{\alpha_y}{\zeta_{jk}} \right)\mu_{ik}\eta_{jk} \right\}.$$
To illustrate the effect of the correction, we can compute its value in the case where the symmetries in the user vector $\mathbf{x}_i$ are not broken, i.e. $\boldsymbol{\mu}_i = \mathbf{0}$ (this may happen for noisy users with only a few rated items). In this case, the expected rating with a first-order correction is (with $\lambda = 1$)
$$\mathbb{E}[\mathbf{x}_i^\top \mathbf{y}_j] \approx \sum_k \sum_{m\in\mathcal{M}(i)} \frac{\beta\, r_{im}^{(k)}\, \eta_{jk}\, \eta_{mk}}{\gamma_{ik}}.$$
We see that the expectation depends on the ratings given by user $i$ to other items in the catalogue, weighted by how similar their vectors are to $\mathbf{y}_j$: we get a positive contribution from $r_{im}^{(k)}$ to the predicted rating only if the $k$-th components of the means of the $j$-th and $m$-th item vectors have the same direction.
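The closed-form correction can be packaged as a routine. The sketch below uses assumed data structures (per-component lists of variational means and precisions, and dicts mapping neighbour indices to their ratings and moment vectors); it illustrates the formula rather than reproducing code from the paper.

```python
def corrected_rating(beta, alpha_x, alpha_y, mu_i, gam_i, eta_j, zeta_j,
                     r_i, eta, zeta, r_j, mu, gam):
    """First-order corrected prediction E_q[x_i^T y_j] - Cov(x_i^T y_j, V).

    mu_i, gam_i: length-K variational means/precisions of user i;
    eta_j, zeta_j: the same for item j.  r_i maps m in M(i) to the rating
    r_im, with eta[m], zeta[m] the moment vectors of item m; r_j, mu, gam
    play the symmetric role for the users n in N(j)."""
    K = len(mu_i)
    pred = sum(mu_i[k] * eta_j[k] for k in range(K))      # E_q[x_i^T y_j]
    cov = 0.0
    for k in range(K):
        for m, r_im in r_i.items():                       # items rated by user i
            rk = r_im - sum(mu_i[c] * eta[m][c] for c in range(K) if c != k)
            cov += beta * (mu_i[k] * eta_j[k] * eta[m][k] ** 2
                           - rk * eta_j[k] * eta[m][k]
                           + mu_i[k] * eta_j[k] / zeta[m][k]) / gam_i[k]
        for n, r_nj in r_j.items():                       # users who rated item j
            rk = r_nj - sum(mu[n][c] * eta_j[c] for c in range(K) if c != k)
            cov += beta * (mu_i[k] * eta_j[k] * mu[n][k] ** 2
                           - rk * mu_i[k] * mu[n][k]
                           + mu_i[k] * eta_j[k] / gam[n][k]) / zeta_j[k]
        cov += (alpha_x / gam_i[k] + alpha_y / zeta_j[k]) * mu_i[k] * eta_j[k]
    return pred - cov

# Self-pruned user (mu_i = 0): the correction falls back on the ratings the
# user gave to other items, weighted by item-vector similarity.
pred = corrected_rating(1.0, 1.0, 1.0, [0.0, 0.0], [4.0, 4.0],
                        [0.5, -0.3], [2.0, 2.0],
                        {7: 2.0}, {7: [1.0, 0.2]}, {7: [3.0, 3.0]},
                        {}, {}, {})
print(pred)
```

In the self-pruned case only the $-\beta r_{im}^{(k)} \eta_{jk}\eta_{mk}/\gamma_{ik}$ terms of the covariance survive, so the routine reduces to the simplified expression above and returns a nonzero prediction even though the VB mean is zero.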
Approimate inference, Sampling & Variational inference Fall 2015 Cours 9 November 25 Enseignant: Guillaume Obozinski Scribe: Basile Clément, Nathan de Lara 9.1 Approimate inference with MCMC 9.1.1 Gibbs
More informationBlack-box α-divergence Minimization
Black-box α-divergence Minimization José Miguel Hernández-Lobato, Yingzhen Li, Daniel Hernández-Lobato, Thang Bui, Richard Turner, Harvard University, University of Cambridge, Universidad Autónoma de Madrid.
More informationLecture 3: Pattern Classification. Pattern classification
EE E68: Speech & Audio Processing & Recognition Lecture 3: Pattern Classification 3 4 5 The problem of classification Linear and nonlinear classifiers Probabilistic classification Gaussians, mitures and
More informationProperties of Limits
33460_003qd //04 :3 PM Page 59 SECTION 3 Evaluating Limits Analticall 59 Section 3 Evaluating Limits Analticall Evaluate a it using properties of its Develop and use a strateg for finding its Evaluate
More informationMachine Learning. 1. Linear Regression
Machine Learning Machine Learning 1. Linear Regression Lars Schmidt-Thieme Information Sstems and Machine Learning Lab (ISMLL) Institute for Business Economics and Information Sstems & Institute for Computer
More informationECS289: Scalable Machine Learning
ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Oct 11, 2016 Paper presentations and final project proposal Send me the names of your group member (2 or 3 students) before October 15 (this Friday)
More informationLecture: Gaussian Process Regression. STAT 6474 Instructor: Hongxiao Zhu
Lecture: Gaussian Process Regression STAT 6474 Instructor: Hongxiao Zhu Motivation Reference: Marc Deisenroth s tutorial on Robot Learning. 2 Fast Learning for Autonomous Robots with Gaussian Processes
More informationA Process over all Stationary Covariance Kernels
A Process over all Stationary Covariance Kernels Andrew Gordon Wilson June 9, 0 Abstract I define a process over all stationary covariance kernels. I show how one might be able to perform inference that
More informationAuto-Encoding Variational Bayes
Auto-Encoding Variational Bayes Diederik P Kingma, Max Welling June 18, 2018 Diederik P Kingma, Max Welling Auto-Encoding Variational Bayes June 18, 2018 1 / 39 Outline 1 Introduction 2 Variational Lower
More informationChapter 3. Theory of measurement
Chapter. Introduction An energetic He + -ion beam is incident on thermal sodium atoms. Figure. shows the configuration in which the interaction one is determined b the crossing of the laser-, sodium- and
More informationEigenGP: Gaussian Process Models with Adaptive Eigenfunctions
Proceedings of the Twent-Fourth International Joint Conference on Artificial Intelligence (IJCAI 5) : Gaussian Process Models with Adaptive Eigenfunctions Hao Peng Department of Computer Science Purdue
More informationMobile Robot Localization
Mobile Robot Localization 1 The Problem of Robot Localization Given a map of the environment, how can a robot determine its pose (planar coordinates + orientation)? Two sources of uncertainty: - observations
More informationThe Force Table Introduction: Theory:
1 The Force Table Introduction: "The Force Table" is a simple tool for demonstrating Newton s First Law and the vector nature of forces. This tool is based on the principle of equilibrium. An object is
More informationType II variational methods in Bayesian estimation
Type II variational methods in Bayesian estimation J. A. Palmer, D. P. Wipf, and K. Kreutz-Delgado Department of Electrical and Computer Engineering University of California San Diego, La Jolla, CA 9093
More informationApproximate Message Passing
Approximate Message Passing Mohammad Emtiyaz Khan CS, UBC February 8, 2012 Abstract In this note, I summarize Sections 5.1 and 5.2 of Arian Maleki s PhD thesis. 1 Notation We denote scalars by small letters
More informationWhy do we care? Examples. Bayes Rule. What room am I in? Handling uncertainty over time: predicting, estimating, recognizing, learning
Handling uncertainty over time: predicting, estimating, recognizing, learning Chris Atkeson 004 Why do we care? Speech recognition makes use of dependence of words and phonemes across time. Knowing where
More informationDeep Nonlinear Non-Gaussian Filtering for Dynamical Systems
Deep Nonlinear Non-Gaussian Filtering for Dnamical Sstems Arash Mehrjou Department of Empirical Inference Ma Planck Institute for Intelligent Sstems arash.mehrjou@tuebingen.mpg.de Bernhard Schölkopf Department
More informationMachine Learning 4771
Machine Learning 4771 Instructor: Tony Jebara Topic 7 Unsupervised Learning Statistical Perspective Probability Models Discrete & Continuous: Gaussian, Bernoulli, Multinomial Maimum Likelihood Logistic
More informationVariational Autoencoder
Variational Autoencoder Göker Erdo gan August 8, 2017 The variational autoencoder (VA) [1] is a nonlinear latent variable model with an efficient gradient-based training procedure based on variational
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Lecture Notes Fall 2009 November, 2009 Byoung-Ta Zhang School of Computer Science and Engineering & Cognitive Science, Brain Science, and Bioinformatics Seoul National University
More informationBayesian Quadrature: Model-based Approximate Integration. David Duvenaud University of Cambridge
Bayesian Quadrature: Model-based Approimate Integration David Duvenaud University of Cambridge The Quadrature Problem ˆ We want to estimate an integral Z = f ()p()d ˆ Most computational problems in inference
More informationSelf-Averaging Expectation Propagation
Self-Averaging Expectation Propagation Burak Çakmak Department of Electronic Systems Aalborg Universitet Aalborg 90, Denmark buc@es.aau.dk Manfred Opper Department of Artificial Intelligence Technische
More informationGaussian Process Vine Copulas for Multivariate Dependence
Gaussian Process Vine Copulas for Multivariate Dependence José Miguel Hernández-Lobato 1,2 joint work with David López-Paz 2,3 and Zoubin Ghahramani 1 1 Department of Engineering, Cambridge University,
More informationAndriy Mnih and Ruslan Salakhutdinov
MATRIX FACTORIZATION METHODS FOR COLLABORATIVE FILTERING Andriy Mnih and Ruslan Salakhutdinov University of Toronto, Machine Learning Group 1 What is collaborative filtering? The goal of collaborative
More informationParameterized Joint Densities with Gaussian and Gaussian Mixture Marginals
Parameterized Joint Densities with Gaussian and Gaussian Miture Marginals Feli Sawo, Dietrich Brunn, and Uwe D. Hanebeck Intelligent Sensor-Actuator-Sstems Laborator Institute of Computer Science and Engineering
More informationSupplementary Note on Bayesian analysis
Supplementary Note on Bayesian analysis Structured variability of muscle activations supports the minimal intervention principle of motor control Francisco J. Valero-Cuevas 1,2,3, Madhusudhan Venkadesan
More informationViktor Shkolnikov, Juan G. Santiago*
Supplementar Information: Coupling Isotachophoresis with Affinit Chromatograph for Rapid and Selective Purification with High Column Utilization. Part I: Theor Viktor Shkolnikov, Juan G. Santiago* Department
More informationComputation of Total Capacity for Discrete Memoryless Multiple-Access Channels
IEEE TRANSACTIONS ON INFORATION THEORY, VOL. 50, NO. 11, NOVEBER 2004 2779 Computation of Total Capacit for Discrete emorless ultiple-access Channels ohammad Rezaeian, ember, IEEE, and Ale Grant, Senior
More informationBayesian Inference of Noise Levels in Regression
Bayesian Inference of Noise Levels in Regression Christopher M. Bishop Microsoft Research, 7 J. J. Thomson Avenue, Cambridge, CB FB, U.K. cmbishop@microsoft.com http://research.microsoft.com/ cmbishop
More informationAnomaly Detection and Removal Using Non-Stationary Gaussian Processes
Anomaly Detection and Removal Using Non-Stationary Gaussian ocesses Steven Reece Roman Garnett, Michael Osborne and Stephen Roberts Robotics Research Group Dept Engineering Science Oford University, UK
More informationBayesian IV: the normal case with multiple endogenous variables
Baesian IV: the normal case with multiple endogenous variables Timoth Cogle and Richard Startz revised Januar 05 Abstract We set out a Gibbs sampler for the linear instrumental-variable model with normal
More informationNotes on Discriminant Functions and Optimal Classification
Notes on Discriminant Functions and Optimal Classification Padhraic Smyth, Department of Computer Science University of California, Irvine c 2017 1 Discriminant Functions Consider a classification problem
More informationFACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures AB = BA = I,
FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING Lectures MODULE 7 MATRICES II Inverse of a matri Sstems of linear equations Solution of sets of linear equations elimination methods 4
More informationProbabilistic & Bayesian deep learning. Andreas Damianou
Probabilistic & Bayesian deep learning Andreas Damianou Amazon Research Cambridge, UK Talk at University of Sheffield, 19 March 2019 In this talk Not in this talk: CRFs, Boltzmann machines,... In this
More information