Riemannian Stein Variational Gradient Descent for Bayesian Inference
Chang Liu, Jun Zhu (corresponding author)
Dept. of Comp. Sci. & Tech., TNList Lab; Center for Bio-Inspired Computing Research; State Key Lab for Intell. Tech. & Systems, Tsinghua University, Beijing, China
{chang-li14@mails., dcszj@}tsinghua.edu.cn
AAAI, New Orleans
Outline
1. Introduction
2. Preliminaries
3. Riemannian SVGD
   - Derivation of the Directional Derivative
   - Expression in the Embedded Space
4. Experiments
   - Bayesian Logistic Regression
   - Spherical Admixture Model
Introduction

Bayesian inference: given a dataset $D$ and a Bayesian model $p(x, D)$, estimate the posterior of the latent variable, $p(x \mid D)$.

Comparison of current inference methods: model-based variational inference methods (M-VIs), Monte Carlo methods (MCs) and particle-based variational inference methods (P-VIs).

Property                    M-VIs             MCs        P-VIs
Asymptotic accuracy         No                Yes        Promising
Approximation flexibility   Limited           Unlimited  Unlimited
Iteration effectiveness     Yes               Weak       Strong
Particle efficiency         (does not apply)  Weak       Strong

Stein Variational Gradient Descent (SVGD) [7]: a P-VI with minimal assumptions and impressive performance.
Introduction

In this work: generalize SVGD to the Riemann manifold setting, so that we can:
- Purpose 1: Adapt SVGD to tasks on Riemann manifolds and introduce the first P-VI to the Riemannian world.
- Purpose 2: Improve SVGD efficiency for usual tasks (ones on Euclidean space) by exploring information geometry.
Preliminaries: Stein Variational Gradient Descent (SVGD)

The idea of SVGD:
- A deterministic continuous-time dynamics $\frac{d}{dt}x(t) = \phi(x(t))$ on $M = \mathbb{R}^m$ (where $\phi : \mathbb{R}^m \to \mathbb{R}^m$) induces a continuously evolving distribution $q_t$ on $M$.
- At some instant $t$, for a fixed dynamics $\phi$, find the decreasing rate of $\mathrm{KL}(q_t \| p)$, i.e. the directional derivative $-\frac{d}{dt}\mathrm{KL}(q_t \| p)$ in the direction of $\phi$.
- Find the $\phi$ that maximizes the directional derivative, i.e. the functional gradient $\phi^*$ (the "steepest ascending direction").
- For a closed-form solution, $\phi$ is chosen from $\mathcal{H}^m$, where $\mathcal{H}$ is the reproducing kernel Hilbert space (RKHS) of some kernel.
- Apply the dynamics $\phi^*$ to samples $\{x^{(s)}\}_{s=1}^S$ of $q_t$: $\{x^{(s)} + \varepsilon\phi^*(x^{(s)})\}_{s=1}^S$ forms a set of samples of $q_{t+\varepsilon}$.
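The particle update in the last step can be sketched in a few lines of NumPy. This is a minimal illustration of vanilla SVGD as described in [7], not the authors' code: the functional gradient is taken in its usual closed form $\phi^*(x') = \mathbb{E}_{x \sim q}[K(x, x')\nabla \log p(x) + \nabla_x K(x, x')]$, the kernel is a Gaussian kernel with a fixed bandwidth `h`, and the names `rbf_kernel`, `svgd_step`, `eps` and `h` are hypothetical choices made for this sketch.

```python
import numpy as np


def rbf_kernel(x, h):
    """Gaussian kernel matrix K[i, j] = k(x_i, x_j) and its gradient w.r.t. x_i."""
    diff = x[:, None, :] - x[None, :, :]        # (S, S, m), diff[i, j] = x_i - x_j
    sq_dist = np.sum(diff ** 2, axis=-1)        # (S, S)
    K = np.exp(-sq_dist / (2.0 * h ** 2))
    grad_K = -diff / h ** 2 * K[:, :, None]     # grad_K[i, j] = d k(x_i, x_j) / d x_i
    return K, grad_K


def svgd_step(x, grad_log_p, eps=0.1, h=1.0):
    """One update x_j <- x_j + eps * phi(x_j), where
    phi(x_j) = mean_i [ k(x_i, x_j) * grad_log_p(x_i) + grad_{x_i} k(x_i, x_j) ].

    x: (S, m) array of particles; grad_log_p maps (S, m) -> (S, m).
    The fixed bandwidth h is an assumption of this sketch.
    """
    S = x.shape[0]
    K, grad_K = rbf_kernel(x, h)
    scores = grad_log_p(x)                      # (S, m)
    # First term pulls particles toward high density, second term repels them.
    phi = (K.T @ scores + grad_K.sum(axis=0)) / S
    return x + eps * phi
```

For example, with a standard normal target one can pass `grad_log_p=lambda z: -z` and call `svgd_step` repeatedly on an array of particles; the first term drives them toward the mode while the kernel-gradient term keeps them spread out.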
Riemannian SVGD: Roadmap

For a general Riemann manifold $M$:
- Any deterministic continuous-time dynamics on $M$ is described by a vector field $X$ on $M$. It induces a continuously evolving distribution on $M$ with density $q_t$ (w.r.t. the Riemann volume form).
- Derive the directional derivative $-\frac{d}{dt}\mathrm{KL}(q_t \| p)$ under the dynamics $X$.
- Derive the functional gradient $X^* := (\max \cdot \arg\max)_{\|X\|_{\mathfrak{X}} = 1} \, -\frac{d}{dt}\mathrm{KL}(q_t \| p)$.
- Moreover, for Purpose 1, express $X^*$ in the embedded space of $M$ when $M$ has no global coordinate system (c.s.), e.g. hyperspheres.
- Finally, simulate the dynamics $X^*$ for a small time step $\varepsilon$ to update the samples.
Riemannian SVGD: Derivation of the Directional Derivative

Let $q_t$ be the evolving density under the dynamics $X$.

Lemma (Continuity Equation on Riemann Manifold).
$$\frac{\partial q_t}{\partial t} = -\mathrm{div}(q_t X) = -X[q_t] - q_t\,\mathrm{div}(X).$$
- $X[q_t]$: the action of the vector field $X$ on the smooth function $q_t$. In any c.s., $X[q_t] = X^i \partial_i q_t$.
- $\mathrm{div}(X)$: the divergence of the vector field $X$. In any c.s., $\mathrm{div}(X) = \partial_i(\sqrt{|G|}\,X^i)/\sqrt{|G|}$, where $G$ is the matrix expression, under the c.s., of the Riemann metric of $M$.

Theorem (Directional Derivative). Let $p$ be a fixed distribution. Then the directional derivative is
$$-\frac{d}{dt}\mathrm{KL}(q_t \| p) = \mathbb{E}_{q_t}[\mathrm{div}(pX)/p] = \mathbb{E}_{q_t}\big[X[\log p] + \mathrm{div}(X)\big].$$
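The step from the lemma to the theorem can be sketched as follows. This is a reconstruction under stated assumptions rather than the authors' proof: $d\mu$ denotes the Riemann volume form (notation introduced here), and $M$ is assumed compact without boundary, or $q_t X$ is assumed to decay fast enough, so that the integral of a divergence vanishes and integration by parts has no boundary term.

```latex
\begin{align*}
-\frac{d}{dt}\mathrm{KL}(q_t \| p)
  &= -\int_M \frac{\partial q_t}{\partial t}\Big(\log\frac{q_t}{p} + 1\Big)\,d\mu
   = \int_M \mathrm{div}(q_t X)\Big(\log\frac{q_t}{p} + 1\Big)\,d\mu \\
  &= -\int_M q_t\, X\Big[\log\frac{q_t}{p}\Big]\,d\mu
   = \int_M q_t\, X[\log p]\,d\mu - \int_M X[q_t]\,d\mu \\
  &= \mathbb{E}_{q_t}\big[X[\log p]\big] + \int_M q_t\,\mathrm{div}(X)\,d\mu
   = \mathbb{E}_{q_t}\big[X[\log p] + \mathrm{div}(X)\big].
\end{align*}
```

The last line uses $X[q_t] = \mathrm{div}(q_t X) - q_t\,\mathrm{div}(X)$, and the equivalent form $\mathbb{E}_{q_t}[\mathrm{div}(pX)/p]$ follows from $\mathrm{div}(pX) = X[p] + p\,\mathrm{div}(X)$.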
Riemannian SVGD

The task now:
$$X^* := (\max \cdot \arg\max)_{X \in \mathfrak{X},\ \|X\|_{\mathfrak{X}} = 1}\, \mathcal{J}(X), \qquad \mathcal{J}(X) := \mathbb{E}_q\big[X[\log p] + \mathrm{div}(X)\big],$$
where $\mathfrak{X}$ is some subspace of the space of vector fields on $M$, such that the following requirements are met.

Requirements on $X^*$, thus on $\mathfrak{X}$:
- R1: $X^*$ is a valid vector field on $M$;
- R2: $X^*$ is coordinate invariant;
- R3: $X^*$ can be expressed in closed form, where $q$ appears only in terms of expectation.
Riemannian SVGD

R1: $X^*$ is a valid vector field on $M$.
- Why needed: the deductions are based on valid vector fields.
- Note: non-trivial to guarantee!

Example (Vector fields on hyperspheres). Vector fields on an even-dimensional hypersphere must have one zero-vector-valued point (critical point) due to the hairy ball theorem ([1], Theorem ). The choice in SVGD, $\mathfrak{X} = \mathcal{H}^m$, cannot guarantee R1.
Riemannian SVGD

R2: $X^*$ is coordinate invariant.
- Concept: the expression of an object on $M$ is the same in any c.s. E.g. vector field, gradient and divergence.
- Why needed: necessary to avoid ambiguity or arbitrariness of the solution. The vector field $X^*$ should be independent of the choice of c.s. in which it is expressed.
- Note: the choice in SVGD, $\mathfrak{X} = \mathcal{H}^m$, cannot guarantee R2.

R3: $X^*$ can be expressed in closed form, where $q$ appears only in terms of expectation.
- Why needed: for tractable implementation, and to avoid making restrictive assumptions on $q$.
Riemannian SVGD: Our Solution

$\mathfrak{X} = \{\mathrm{grad}\, f \mid f \in \mathcal{H}\}$, where $\mathcal{H}$ is the RKHS of some kernel.
- $\mathrm{grad}\, f$ is the gradient of the smooth function $f$. In any c.s., $(\mathrm{grad}\, f)^j = g^{ij}\partial_i f$, where $g^{ij}$ is the entry of $G^{-1}$ under the c.s.

Theorem. For a Gaussian RKHS, $\mathfrak{X}$ is isometrically isomorphic to $\mathcal{H}$, thus it is a Hilbert space.

Our solution guarantees all the requirements:
- The gradient is a well-defined object on $M$, and it is guaranteed to be a valid vector field and coordinate invariant (see the paper for a detailed interpretation).
- A closed-form solution can be derived.
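As a small numerical illustration of the coordinate expression $(\mathrm{grad}\, f)^j = g^{ij}\partial_i f$, the sketch below computes gradient components from a metric matrix and a vector of partial derivatives. The function name `riemannian_grad` is hypothetical, and the metric is simply passed in as a symmetric positive-definite matrix at a single point.

```python
import numpy as np


def riemannian_grad(G, df):
    """Components (grad f)^j = g^{ij} d_i f at one point.

    G:  (m, m) symmetric positive-definite metric matrix in the chosen c.s.
    df: (m,) partial derivatives d_i f in the same c.s.
    Solving G @ grad = df avoids forming the inverse explicitly.
    """
    return np.linalg.solve(G, df)
```

Under a linear change of coordinates $x = Az$, the partial derivatives transform as $A^\top \partial f$ and the metric as $A^\top G A$, so the computed components transform as $A^{-1}\,\mathrm{grad}\, f$, which is how vector components should transform. Plain partial derivatives do not have this property, which is one way to see why the choice $\mathcal{H}^m$ fails R2.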
Riemannian SVGD

The closed-form solution:

Theorem (Functional Gradient).
$$X^{*\prime} = \mathrm{grad}'\, f^{*\prime}, \qquad f^{*\prime} = \mathbb{E}_q\big[(\mathrm{grad}\, K)[\log p] + \Delta K\big],$$
where notations with a prime take $x'$ as argument while the others take $x$, $K$ takes both, and $\Delta f := \mathrm{div}(\mathrm{grad}\, f)$.

In any c.s.,
$$X^{*\prime i} = g'^{ij}\,\partial'_j\, \mathbb{E}_q\Big[\big(g^{ab}\,\partial_a \log(p\sqrt{|G|}) + \partial_a g^{ab}\big)\partial_b K + g^{ab}\,\partial_a\partial_b K\Big].$$
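The coordinate expression can be recovered from the theorem by expanding the two terms inside the expectation. The following is a sketch of that expansion, using only the coordinate formulas for the gradient and the divergence stated on the earlier slides and the symmetry of $g^{ab}$:

```latex
\begin{align*}
(\mathrm{grad}\, K)[\log p]
  &= g^{ab}\,\partial_a \log p\;\partial_b K, \\
\Delta K = \mathrm{div}(\mathrm{grad}\, K)
  &= \frac{1}{\sqrt{|G|}}\,\partial_a\big(\sqrt{|G|}\,g^{ab}\,\partial_b K\big) \\
  &= g^{ab}\,\partial_a\partial_b K + \big(\partial_a g^{ab}\big)\,\partial_b K
     + g^{ab}\,\partial_a \log\sqrt{|G|}\;\partial_b K, \\
(\mathrm{grad}\, K)[\log p] + \Delta K
  &= \big(g^{ab}\,\partial_a \log(p\sqrt{|G|}) + \partial_a g^{ab}\big)\,\partial_b K
     + g^{ab}\,\partial_a\partial_b K .
\end{align*}
```

Taking the expectation over $q$ and applying $\mathrm{grad}'$ in coordinates, $(\mathrm{grad}' f)^i = g'^{ij}\partial'_j f$, gives the stated expression for $X^{*\prime i}$.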
Riemannian SVGD: Purpose 2

Improve efficiency for the usual inference tasks on Euclidean space $\mathbb{R}^m$.
- Apply the idea of information geometry [3, 2]: for a Bayesian model with prior $p(x)$ and likelihood $p(D \mid x)$, take $M = \{p(\cdot \mid x) : x \in \mathbb{R}^m\}$ and treat $x$ as the coordinate of $p(\cdot \mid x)$. In this global c.s., $G(x)$ is the Fisher information matrix of $p(\cdot \mid x)$ (typically with the Hessian of $\log p(x)$ subtracted).
- Calculate the tangent vector at each sample using the c.s. expression
$$X^{*\prime i} = g'^{ij}\,\partial'_j\, \mathbb{E}_q\Big[\big(g^{ab}\,\partial_a \log(p\sqrt{|G|}) + \partial_a g^{ab}\big)\partial_b K + g^{ab}\,\partial_a\partial_b K\Big],$$
  where the target distribution is $p = p(x \mid D) \propto p(x)\,p(D \mid x)$ and the expectation is estimated by averaging over samples.
- Simulate the dynamics for a small time step $\varepsilon$ to update the samples: $x^{(s)} \leftarrow x^{(s)} + \varepsilon X^*(x^{(s)})$.
Riemannian SVGD: Expression in the Embedded Space

Purpose 1: Enable applicability to inference tasks on non-linear Riemann manifolds.

In the coordinate space of $M$:
- Some manifolds have no global c.s., e.g. the hypersphere $S^{n-1} := \{x \in \mathbb{R}^n : \|x\| = 1\}$ and the Stiefel manifold [5]. Switching among local c.s. is cumbersome.
- $G$ would be singular near the edge of the coordinate space.

In the embedded space of $M$:
- $M$ can be expressed globally, and the embedding is natural for $S^{n-1}$ and the Stiefel manifold.
- No singularity problems.
- Requires the exponential map and the density w.r.t. the Hausdorff measure, which are available for $S^{n-1}$ and the Stiefel manifold.
Riemannian SVGD: Expression in the Embedded Space

Proposition (Functional Gradient in the Embedded Space). Let the $m$-dimensional Riemann manifold $M$ be isometrically embedded in $\mathbb{R}^n$ (with orthonormal basis $\{y^\alpha\}_{\alpha=1}^n$) via $\Xi : M \to \mathbb{R}^n$. Let $p$ be the density w.r.t. the Hausdorff measure on $\Xi(M)$. Then
$$X^{*\prime} = (I_n - N' N'^\top)\,\nabla' f^{*\prime},$$
$$f^{*\prime} = \mathbb{E}_q\Big[\big(\nabla \log(p\sqrt{|G|})\big)^\top (I_n - N N^\top)(\nabla K) + \nabla^\top\nabla K - \mathrm{tr}\big(N^\top(\nabla\nabla^\top K)N\big) + \big((M^\top\nabla)^\top(G^{-1}M^\top)\big)(\nabla K)\Big],$$
where $I_n \in \mathbb{R}^{n \times n}$ is the identity matrix, $\nabla = (\partial_{y^1}, \dots, \partial_{y^n})^\top$, $M \in \mathbb{R}^{n \times m}$ with $M_{\alpha i} = \partial y^\alpha / \partial x^i$, $N \in \mathbb{R}^{n \times (n-m)}$ is the set of orthonormal basis vectors of the orthogonal complement of $\Xi_*(T_x M)$, and $\mathrm{tr}(\cdot)$ is the trace.

Simulating the dynamics requires the exponential map $\mathrm{Exp}$ of $M$: $y^{(s)} \leftarrow \mathrm{Exp}_{y^{(s)}}\big(\varepsilon X^*(y^{(s)})\big)$.
- $\mathrm{Exp}_y(v)$: moves $y$ on $\Xi(M)$ "straight" along the direction of $v$.
Riemannian SVGD: Expression in the Embedded Space

Proposition (Functional Gradient for Embedded Hyperspheres). For $S^{n-1}$ isometrically embedded in $\mathbb{R}^n$ with orthonormal basis $\{y^\alpha\}_{\alpha=1}^n$, we have $X^{*\prime} = (I_n - y' y'^\top)\,\nabla' f^{*\prime}$, where
$$f^{*\prime} = \mathbb{E}_q\Big[(\nabla \log p)^\top(\nabla K) + \nabla^\top\nabla K - y^\top(\nabla\nabla^\top K)\,y - (y^\top\nabla\log p + n - 1)\,y^\top\nabla K\Big].$$

Exponential map on $S^{n-1}$: $\mathrm{Exp}_y(v) = y\cos(\|v\|) + (v/\|v\|)\sin(\|v\|)$.
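The two geometric ingredients of the hypersphere case, the tangent projection $(I_n - yy^\top)$ and the exponential map, are simple to write down. The sketch below is illustrative only; the function names and the tolerance `tol` used to guard the zero-length step are assumptions of this example.

```python
import numpy as np


def project_to_tangent(y, u):
    """Project an ambient vector u onto the tangent space of the unit sphere at y,
    i.e. compute (I - y y^T) u without forming the matrix."""
    return u - np.dot(y, u) * y


def sphere_exp(y, v, tol=1e-12):
    """Exponential map on the unit sphere:
    Exp_y(v) = y * cos(|v|) + (v / |v|) * sin(|v|), for v tangent at y."""
    norm_v = np.linalg.norm(v)
    if norm_v < tol:
        # A (numerically) zero step leaves the point where it is.
        return y.copy()
    return y * np.cos(norm_v) + (v / norm_v) * np.sin(norm_v)
```

A particle update then reads `y_new = sphere_exp(y, eps * project_to_tangent(y, direction))`. Because the step follows a great circle, the updated point stays on the sphere up to floating-point error, with no renormalization needed.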
Experiments: Bayesian Logistic Regression (BLR), for Purpose 2

- Model: Bayesian Logistic Regression (BLR), $w \sim \mathcal{N}(0, \alpha I_m)$, $y_d \sim \mathrm{Bern}(\sigma(w^\top x_d))$, where $\sigma(x) = 1/(1 + e^{-x})$.
- Euclidean task: $w \in \mathbb{R}^m$.
- Posterior: $p(w \mid \{(x_d, y_d)\})$, log-density gradient known.
- Riemann metric tensor $G$: Fisher information minus Hessian, known in closed form.
- Kernel: Gaussian kernel in the coordinate space.
- Baselines: vanilla SVGD.
- Evaluation: averaged test accuracy.
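The two closed-form quantities this setup relies on can be sketched as follows. This is an illustration under explicit assumptions, not the authors' implementation: $\alpha$ is read as the prior variance (so the Hessian of the log-prior is $-I_m/\alpha$), the Fisher information is that of the Bernoulli likelihood summed over the data, labels are in $\{0, 1\}$, and the function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def blr_metric(w, X, alpha):
    """Metric G(w) = Fisher information of the likelihood minus Hessian of the log-prior.

    Assumes w ~ N(0, alpha * I) with alpha the prior VARIANCE, so the Hessian
    of the log-prior is -(1 / alpha) * I.
    X: (D, m) design matrix, w: (m,) weights.
    """
    s = sigmoid(X @ w)                        # (D,) predicted probabilities
    weights = s * (1.0 - s)                   # Bernoulli variance per data point
    fisher = (X * weights[:, None]).T @ X     # sum_d s_d (1 - s_d) x_d x_d^T
    return fisher + np.eye(w.shape[0]) / alpha


def blr_grad_log_posterior(w, X, y, alpha):
    """Gradient of the log-posterior density w.r.t. w (labels y in {0, 1})."""
    return X.T @ (y - sigmoid(X @ w)) - w / alpha
```

With this reading of $\alpha$, `blr_metric` is symmetric and its eigenvalues are at least `1 / alpha`, so it is always invertible, which is what the coordinate expression of the functional gradient needs.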
Experiments: Bayesian Logistic Regression (BLR), for Purpose 2

Results:

[Figure: test accuracy vs. iteration for BLR, comparing SVGD and RSVGD (ours). (a) On the Splice19 dataset; (b) on the Covertype dataset.] Both methods are run 20 times on Splice19 and 10 times on Covertype. Each run on Covertype uses a random train (80%) / test (20%) split as in [7].
Experiments: Spherical Admixture Model (SAM), for Purpose 1

Model: Spherical Admixture Model (SAM) [8]
- Observed variables: tf-idf representation of documents, $v_d \in S^{V-1}$.
- Latent variables: spherical topics, $\beta_t \in S^{V-1}$.

Non-linear Riemann manifold task: $\beta \in (S^{V-1})^T$.
Posterior: $p(\beta \mid v)$ (w.r.t. the Hausdorff measure); the log-density gradient can be estimated [6].
Kernel: von Mises-Fisher (vMF) kernel $K(y, y') = \exp(\kappa\, y^\top y')$, the restriction of the Gaussian kernel in $\mathbb{R}^n$ to $S^{n-1}$.

Baselines:
- Variational Inference (VI) [8]: the vanilla inference method of SAM.
- Geodesic Monte Carlo (GMC) [4]: MCMC for Riemann manifolds in the embedded space.
- Stochastic Gradient GMC (SGGMC) [6]: SG-MCMC for Riemann manifolds in the embedded space (-b: mini-batch gradient estimate; -f: full-batch gradient estimate).
- For the MCMC methods, -seq: samples from one chain; -par: newest samples from multiple chains.

Evaluation: log-perplexity (negative log-likelihood of the test dataset under the trained model) [6].
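Plugging the vMF kernel into the hypersphere proposition gives an explicit particle direction. The sketch below was derived by hand for this note and should be read as an unverified illustration, not the authors' implementation. With $c = y^\top y'$ and $K = e^{\kappa c}$, the kernel derivatives w.r.t. $y$ are $\nabla K = \kappa y' K$, $\nabla^\top\nabla K = \kappa^2 K$ (using $\|y'\| = 1$) and $y^\top(\nabla\nabla^\top K)y = \kappa^2 c^2 K$; the result is then differentiated w.r.t. $y'$ and projected onto the tangent space. It handles a single sphere rather than the product $(S^{V-1})^T$, and the function name and arguments are hypothetical.

```python
import numpy as np


def rsvgd_sphere_direction(Y, grad_log_p, kappa=1.0):
    """Tangent update direction at each particle on the unit sphere S^{n-1}
    for the vMF kernel K(y, y') = exp(kappa * y^T y').

    Y: (S, n) array of unit vectors; grad_log_p maps (S, n) -> (S, n) ambient
    gradients of log p. Returns an (S, n) array whose row j is tangent at Y[j].
    """
    S, n = Y.shape
    G = grad_log_p(Y)                        # G[i] = grad log p(y_i)
    C = Y @ Y.T                              # C[i, j] = y_i^T y_j
    K = np.exp(kappa * C)                    # vMF kernel matrix
    GY = G @ Y.T                             # GY[i, j] = grad log p(y_i)^T y_j
    b = np.sum(Y * G, axis=1)[:, None] + (n - 1)   # b[i] = y_i^T grad log p(y_i) + n - 1
    # Integrand of f*(y_j) contributed by particle i, divided by K[i, j].
    bracket = kappa * GY + kappa ** 2 * (1.0 - C ** 2) - kappa * b * C
    # Ambient gradient w.r.t. y_j of K[i, j] * bracket[i, j]:
    coef_y = K * (kappa * bracket - 2.0 * kappa ** 2 * C - kappa * b)  # multiplies y_i
    coef_g = kappa * K                                                 # multiplies G[i]
    grad_f = (coef_y.T @ Y + coef_g.T @ G) / S
    # Project onto the tangent space at each y_j: (I - y_j y_j^T) grad_f[j].
    return grad_f - np.sum(grad_f * Y, axis=1, keepdims=True) * Y
```

The component of the ambient gradient normal to the sphere is discarded by the final projection, which is why treating $\|y'\|^2$ as the constant 1 before differentiating does not change the result. Each returned row can be fed to the exponential map on $S^{n-1}$ with a small step size.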
Experiments: Spherical Admixture Model (SAM), for Purpose 1

Results:

[Figure: results on the SAM inference task on the 20News-different dataset, in log-perplexity. (a) Log-perplexity vs. epochs with 100 particles, for VI, GMC-seq, GMC-par, SGGMCb-seq, SGGMCb-par, SGGMCf-seq, SGGMCf-par and RSVGD (ours); (b) log-perplexity vs. number of particles at 200 epochs, for the same methods except VI.] We run SGGMCf with the full batch and SGGMCb with a mini-batch size of 50.
Thank you!
References

[1] Ralph Abraham, Jerrold E. Marsden, and Tudor Ratiu. Manifolds, Tensor Analysis, and Applications, volume 75. Springer Science & Business Media.
[2] Shun-Ichi Amari. Information Geometry and Its Applications. Springer.
[3] Shun-Ichi Amari and Hiroshi Nagaoka. Methods of Information Geometry, volume 191. American Mathematical Soc.
[4] Simon Byrne and Mark Girolami. Geodesic Monte Carlo on embedded manifolds. Scandinavian Journal of Statistics, 40(4).
[5] I. M. James. The Topology of Stiefel Manifolds, volume 24. Cambridge University Press.
[6] Chang Liu, Jun Zhu, and Yang Song. Stochastic gradient geodesic MCMC methods. In Advances in Neural Information Processing Systems.
[7] Qiang Liu and Dilin Wang. Stein variational gradient descent: A general purpose Bayesian inference algorithm. In Advances in Neural Information Processing Systems.
[8] Joseph Reisinger, Austin Waters, Bryan Silverthorn, and Raymond J. Mooney. Spherical topic models. In Proceedings of the 27th International Conference on Machine Learning (ICML-10).
Neural Learning in Structured Parameter Spaces Natural Riemannian Gradient Shun-ichi Amari RIKEN Frontier Research Program, RIKEN, Hirosawa 2-1, Wako-shi 351-01, Japan amari@zoo.riken.go.jp Abstract The
More informationOnline Bayesian Passive-Agressive Learning
Online Bayesian Passive-Agressive Learning International Conference on Machine Learning, 2014 Tianlin Shi Jun Zhu Tsinghua University, China 21 August 2015 Presented by: Kyle Ulrich Introduction Online
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is
More informationDiversity-Promoting Bayesian Learning of Latent Variable Models
Diversity-Promoting Bayesian Learning of Latent Variable Models Pengtao Xie 1, Jun Zhu 1,2 and Eric Xing 1 1 Machine Learning Department, Carnegie Mellon University 2 Department of Computer Science and
More informationLast updated: Oct 22, 2012 LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
Last updated: Oct 22, 2012 LINEAR CLASSIFIERS Problems 2 Please do Problem 8.3 in the textbook. We will discuss this in class. Classification: Problem Statement 3 In regression, we are modeling the relationship
More informationComparison of Modern Stochastic Optimization Algorithms
Comparison of Modern Stochastic Optimization Algorithms George Papamakarios December 214 Abstract Gradient-based optimization methods are popular in machine learning applications. In large-scale problems,
More informationFrom independent component analysis to score matching
From independent component analysis to score matching Aapo Hyvärinen Dept of Computer Science & HIIT Dept of Mathematics and Statistics University of Helsinki Finland 1 Abstract First, short introduction
More informationSTA414/2104 Statistical Methods for Machine Learning II
STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements
More informationVariational Autoencoders
Variational Autoencoders Recap: Story so far A classification MLP actually comprises two components A feature extraction network that converts the inputs into linearly separable features Or nearly linearly
More informationPattern Recognition and Machine Learning
Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability
More informationLecture 6. Notes on Linear Algebra. Perceptron
Lecture 6. Notes on Linear Algebra. Perceptron COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Andrey Kan Copyright: University of Melbourne This lecture Notes on linear algebra Vectors
More informationInformation geometry of mirror descent
Information geometry of mirror descent Geometric Science of Information Anthea Monod Department of Statistical Science Duke University Information Initiative at Duke G. Raskutti (UW Madison) and S. Mukherjee
More informationEstimation theory and information geometry based on denoising
Estimation theory and information geometry based on denoising Aapo Hyvärinen Dept of Computer Science & HIIT Dept of Mathematics and Statistics University of Helsinki Finland 1 Abstract What is the best
More informationCS839: Probabilistic Graphical Models. Lecture 7: Learning Fully Observed BNs. Theo Rekatsinas
CS839: Probabilistic Graphical Models Lecture 7: Learning Fully Observed BNs Theo Rekatsinas 1 Exponential family: a basic building block For a numeric random variable X p(x ) =h(x)exp T T (x) A( ) = 1
More informationFast yet Simple Natural-Gradient Variational Inference in Complex Models
Fast yet Simple Natural-Gradient Variational Inference in Complex Models Mohammad Emtiyaz Khan RIKEN Center for AI Project, Tokyo, Japan http://emtiyaz.github.io Joint work with Wu Lin (UBC) DidrikNielsen,
More informationMATHEMATICAL entities that do not form Euclidean
Kernel Methods on Riemannian Manifolds with Gaussian RBF Kernels Sadeep Jayasumana, Student Member, IEEE, Richard Hartley, Fellow, IEEE, Mathieu Salzmann, Member, IEEE, Hongdong Li, Member, IEEE, and Mehrtash
More informationCh 4. Linear Models for Classification
Ch 4. Linear Models for Classification Pattern Recognition and Machine Learning, C. M. Bishop, 2006. Department of Computer Science and Engineering Pohang University of Science and echnology 77 Cheongam-ro,
More informationGraph Matching & Information. Geometry. Towards a Quantum Approach. David Emms
Graph Matching & Information Geometry Towards a Quantum Approach David Emms Overview Graph matching problem. Quantum algorithms. Classical approaches. How these could help towards a Quantum approach. Graphs
More informationAd Placement Strategies
Case Study : Estimating Click Probabilities Intro Logistic Regression Gradient Descent + SGD AdaGrad Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox January 7 th, 04 Ad
More informationTime-Sensitive Dirichlet Process Mixture Models
Time-Sensitive Dirichlet Process Mixture Models Xiaojin Zhu Zoubin Ghahramani John Lafferty May 25 CMU-CALD-5-4 School of Computer Science Carnegie Mellon University Pittsburgh, PA 523 Abstract We introduce
More informationMachine Learning Lecture 5
Machine Learning Lecture 5 Linear Discriminant Functions 26.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory
More informationOnline Bayesian Passive-Aggressive Learning
Online Bayesian Passive-Aggressive Learning Full Journal Version: http://qr.net/b1rd Tianlin Shi Jun Zhu ICML 2014 T. Shi, J. Zhu (Tsinghua) BayesPA ICML 2014 1 / 35 Outline Introduction Motivation Framework
More informationDevelopment of Stochastic Artificial Neural Networks for Hydrological Prediction
Development of Stochastic Artificial Neural Networks for Hydrological Prediction G. B. Kingston, M. F. Lambert and H. R. Maier Centre for Applied Modelling in Water Engineering, School of Civil and Environmental
More informationTransport Continuity Property
On Riemannian manifolds satisfying the Transport Continuity Property Université de Nice - Sophia Antipolis (Joint work with A. Figalli and C. Villani) I. Statement of the problem Optimal transport on Riemannian
More informationActor-critic methods. Dialogue Systems Group, Cambridge University Engineering Department. February 21, 2017
Actor-critic methods Milica Gašić Dialogue Systems Group, Cambridge University Engineering Department February 21, 2017 1 / 21 In this lecture... The actor-critic architecture Least-Squares Policy Iteration
More informationSequence Modelling with Features: Linear-Chain Conditional Random Fields. COMP-599 Oct 6, 2015
Sequence Modelling with Features: Linear-Chain Conditional Random Fields COMP-599 Oct 6, 2015 Announcement A2 is out. Due Oct 20 at 1pm. 2 Outline Hidden Markov models: shortcomings Generative vs. discriminative
More informationSupplementary Material of High-Order Stochastic Gradient Thermostats for Bayesian Learning of Deep Models
Supplementary Material of High-Order Stochastic Gradient hermostats for Bayesian Learning of Deep Models Chunyuan Li, Changyou Chen, Kai Fan 2 and Lawrence Carin Department of Electrical and Computer Engineering,
More informationPractical Bayesian Optimization of Machine Learning. Learning Algorithms
Practical Bayesian Optimization of Machine Learning Algorithms CS 294 University of California, Berkeley Tuesday, April 20, 2016 Motivation Machine Learning Algorithms (MLA s) have hyperparameters that
More informationSupervised Dimension Reduction:
Supervised Dimension Reduction: A Tale of Two Manifolds S. Mukherjee, K. Mao, F. Liang, Q. Wu, M. Maggioni, D-X. Zhou Department of Statistical Science Institute for Genome Sciences & Policy Department
More informationStatistical Machine Learning
Statistical Machine Learning Christoph Lampert Spring Semester 2015/2016 // Lecture 12 1 / 36 Unsupervised Learning Dimensionality Reduction 2 / 36 Dimensionality Reduction Given: data X = {x 1,..., x
More informationLearning Task Grouping and Overlap in Multi-Task Learning
Learning Task Grouping and Overlap in Multi-Task Learning Abhishek Kumar Hal Daumé III Department of Computer Science University of Mayland, College Park 20 May 2013 Proceedings of the 29 th International
More information