Riemannian Stein Variational Gradient Descent for Bayesian Inference

1 Riemannian Stein Variational Gradient Descent for Bayesian Inference
Chang Liu, Jun Zhu (corresponding author)
Dept. of Comp. Sci. & Tech., TNList Lab; Center for Bio-Inspired Computing Research;
State Key Lab for Intell. Tech. & Systems, Tsinghua University, Beijing, China
{chang-li14@mails., dcszj@}tsinghua.edu.cn
AAAI 2018, New Orleans

2 Introduction
  1 Introduction
  2 Preliminaries
  3 Riemannian SVGD
      Derivation of the Directional Derivative
      Expression in the Embedded Space
  4 Experiments
      Bayesian Logistic Regression
      Spherical Admixture Model

3 Introduction
Bayesian inference: given a dataset D and a Bayesian model p(x, D), estimate the posterior of the latent variable, $p(x \mid D)$.
Comparison of current inference methods: model-based variational inference methods (M-VIs), Monte Carlo methods (MCs), and particle-based variational inference methods (P-VIs):

  Methods                     M-VIs            MCs         P-VIs
  Asymptotic Accuracy         No               Yes         Promising
  Approximation Flexibility   Limited          Unlimited   Unlimited
  Iteration Effectiveness     Yes              Weak        Strong
  Particle Efficiency         (do not apply)   Weak        Strong

Stein Variational Gradient Descent (SVGD) [7]: a P-VI with minimal assumptions and impressive performance.

4 Introduction
In this work, we generalize SVGD to the Riemann manifold setting, so that we can:
  Purpose 1: Adapt SVGD to tasks on Riemann manifolds, introducing the first P-VI to the Riemannian world.
  Purpose 2: Improve SVGD efficiency for usual tasks (ones on Euclidean space) by exploring information geometry.

5 Preliminaries
  1 Introduction
  2 Preliminaries
  3 Riemannian SVGD
      Derivation of the Directional Derivative
      Expression in the Embedded Space
  4 Experiments
      Bayesian Logistic Regression
      Spherical Admixture Model

6 Preliminaries: Stein Variational Gradient Descent (SVGD)
The idea of SVGD:
  A deterministic continuous-time dynamics $\frac{d}{dt}x(t) = \phi(x(t))$ on $M = \mathbb{R}^m$ (where $\phi : \mathbb{R}^m \to \mathbb{R}^m$) induces a continuously evolving distribution $q_t$ on $M$.
  At some instant $t$, for a fixed dynamics $\phi$, find the decreasing rate of $\mathrm{KL}(q_t \| p)$, i.e. the Directional Derivative $\frac{d}{dt}\mathrm{KL}(q_t \| p)$ in the direction of $\phi$.
  Find the $\phi$ that maximizes this decreasing rate, i.e. the Functional Gradient $\phi^*$ (the steepest ascending direction).
  For a closed-form solution, $\phi$ is chosen from $\mathcal{H}^m$, where $\mathcal{H}$ is the reproducing kernel Hilbert space (RKHS) of some kernel.
  Apply the dynamics $\phi^*$ to samples $\{x^{(s)}\}_{s=1}^{S}$ of $q_t$: the set $\{x^{(s)} + \varepsilon\,\phi^*(x^{(s)})\}_{s=1}^{S}$ forms a set of samples of $q_{t+\varepsilon}$ (a minimal code sketch of this update follows below).
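For concreteness, here is a minimal sketch of one SVGD update in JAX, using the standard closed-form functional gradient $\phi^*(\cdot) = \mathbb{E}_{x \sim q}[K(x, \cdot)\,\nabla_x \log p(x) + \nabla_x K(x, \cdot)]$ of [7]. The names (rbf_kernel, svgd_step) and the fixed bandwidth are illustrative choices, not from the paper.

import jax
import jax.numpy as jnp

def rbf_kernel(x, y, h=1.0):
    # Gaussian (RBF) kernel K(x, y) = exp(-||x - y||^2 / (2 h^2))
    return jnp.exp(-jnp.sum((x - y) ** 2) / (2.0 * h ** 2))

def svgd_step(particles, log_p, eps=1e-2, h=1.0):
    # One SVGD update x <- x + eps * phi*(x), applied to every particle.
    # phi*(x') = E_{x~q}[ K(x, x') grad_x log p(x) + grad_x K(x, x') ],
    # with the expectation estimated by the particle average.
    grad_log_p = jax.vmap(jax.grad(log_p))(particles)                 # (S, m)
    k = jax.vmap(lambda xp: jax.vmap(
        lambda x: rbf_kernel(x, xp, h))(particles))(particles)        # k[i, j] = K(x_j, x_i)
    grad_k = jax.vmap(lambda xp: jax.vmap(
        lambda x: jax.grad(rbf_kernel, argnums=0)(x, xp, h))(particles))(particles)  # (S, S, m)
    phi = jnp.mean(k[:, :, None] * grad_log_p[None, :, :] + grad_k, axis=1)
    return particles + eps * phi

Iterating particles = svgd_step(particles, log_p), with log_p the (unnormalized) log-target, reproduces the procedure described on the slide.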

7 Riemannian SVGD
  1 Introduction
  2 Preliminaries
  3 Riemannian SVGD
      Derivation of the Directional Derivative
      Expression in the Embedded Space
  4 Experiments
      Bayesian Logistic Regression
      Spherical Admixture Model

8 Riemannian SVGD: Roadmap
For a general Riemann manifold M:
  Any deterministic continuous-time dynamics on M is described by a vector field X on M. It induces a continuously evolving distribution on M with density $q_t$ (w.r.t. the Riemann volume form).
  Derive the Directional Derivative $\frac{d}{dt}\mathrm{KL}(q_t \| p)$ under the dynamics X.
  Derive the Functional Gradient $X^* := (\max \text{ and } \arg\max)_{\|X\| = 1}\,\big({-\tfrac{d}{dt}}\mathrm{KL}(q_t \| p)\big)$.
  Moreover, for Purpose 1, express $X^*$ in the Embedded Space of M when M has no global coordinate system (c.s.), e.g. hyperspheres.
  Finally, simulate the dynamics $X^*$ for a small time step $\varepsilon$ to update the samples.

9 Riemannian SVGD: Derivation of the Directional Derivative
Let $q_t$ be the evolving density under the dynamics X.

Lemma (Continuity Equation on a Riemann Manifold)
  $\frac{\partial q_t}{\partial t} = -\mathrm{div}(q_t X) = -X[q_t] - q_t\,\mathrm{div}(X)$.
  $X[q_t]$: the action of the vector field X on the smooth function $q_t$. In any c.s., $X[q_t] = X^i \partial_i q_t$.
  $\mathrm{div}(X)$: the divergence of the vector field X. In any c.s., $\mathrm{div}(X) = \partial_i(\sqrt{|G|}\, X^i)/\sqrt{|G|}$, where G is the matrix expression, under the c.s., of the Riemann metric of M.

Theorem (Directional Derivative)
  Let p be a fixed distribution. Then the directional derivative is
  $\frac{d}{dt}\mathrm{KL}(q_t \| p) = -\mathbb{E}_{q_t}\!\big[\mathrm{div}(pX)/p\big] = -\mathbb{E}_{q_t}\!\big[X[\log p] + \mathrm{div}(X)\big]$
  (a short derivation sketch follows below).
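To make the step from the lemma to the theorem explicit, here is a brief derivation sketch (not spelled out on the slide), assuming the boundary terms of the integration by parts vanish (e.g. M is closed, or $q_t$ decays suitably):

\begin{align*}
\frac{d}{dt}\mathrm{KL}(q_t \| p)
  &= \int_M \partial_t q_t\, \big(\log(q_t/p) + 1\big)\, dV
   = -\int_M \mathrm{div}(q_t X)\, \big(\log(q_t/p) + 1\big)\, dV \\
  &= \int_M q_t\, X\big[\log(q_t/p)\big]\, dV
   \qquad \text{(integration by parts; } X[1] = 0\text{)} \\
  &= \int_M X[q_t]\, dV - \mathbb{E}_{q_t}\big[X[\log p]\big]
   = -\mathbb{E}_{q_t}\big[\mathrm{div}(X)\big] - \mathbb{E}_{q_t}\big[X[\log p]\big],
\end{align*}

where the last equality uses $\int_M X[q_t]\, dV = \int_M \mathrm{div}(q_t X)\, dV - \int_M q_t\, \mathrm{div}(X)\, dV = -\mathbb{E}_{q_t}[\mathrm{div}(X)]$. Together with $\mathrm{div}(pX)/p = X[\log p] + \mathrm{div}(X)$, this gives both forms stated in the theorem.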

10 Riemannian SVGD
The task now:
  $X^* := (\max \text{ and } \arg\max)_{X \in \mathcal{X},\, \|X\|_{\mathcal{X}} = 1}\ \mathcal{J}(X), \qquad \mathcal{J}(X) := \mathbb{E}_q\big[X[\log p] + \mathrm{div}(X)\big]$,
where $\mathcal{X}$ is some subspace of the space of vector fields on M, chosen such that the following requirements are met.

Requirements on $\mathcal{X}$, and thus on $X^*$:
  R1: $X^*$ is a valid vector field on M;
  R2: $X^*$ is coordinate invariant;
  R3: $X^*$ can be expressed in closed form, where q appears only in terms of expectations.

11 Riemannian SVGD
R1: $X^*$ is a valid vector field on M.
  Why needed: the deductions are based on valid vector fields.
  Note: non-trivial to guarantee!

Example (Vector fields on hyperspheres)
  A vector field on an even-dimensional hypersphere must have at least one zero-vector-valued point (a critical point), due to the hairy ball theorem (see [1]).
  The choice in SVGD, $\mathcal{X} = \mathcal{H}^m$, cannot guarantee R1.

12 Riemannian SVGD
R2: $X^*$ is coordinate invariant.
  Concept: an object on M (e.g. a vector field, a gradient, or a divergence) is defined independently of the c.s. used to express it, so its expressions in different c.s. refer to the same object.
  Why needed: necessary to avoid ambiguity or arbitrariness of the solution. The vector field $X^*$ should be independent of the choice of c.s. in which it is expressed.
  Note: the choice in SVGD, $\mathcal{X} = \mathcal{H}^m$, cannot guarantee R2.

R3: $X^*$ can be expressed in closed form, where q appears only in terms of expectations.
  Why needed: for tractable implementation, and to avoid making restrictive assumptions on q.

13 Riemannian SVGD: Our Solution
$\mathcal{X} = \{\mathrm{grad}\, f \mid f \in \mathcal{H}\}$, where $\mathcal{H}$ is the RKHS of some kernel.
  $\mathrm{grad}\, f$ is the gradient of the smooth function f. In any c.s., $(\mathrm{grad}\, f)^j = g^{ij}\, \partial_i f$, where $g^{ij}$ is the $(i, j)$ entry of $G^{-1}$ under the c.s. (a small code illustration follows below).

Theorem
  For the Gaussian RKHS, $\mathcal{X}$ is isometrically isomorphic to $\mathcal{H}$, thus it is a Hilbert space.

Our solution meets all the requirements:
  The gradient is a well-defined object on M, so it is guaranteed to be a valid vector field and coordinate invariant (see the paper for a detailed interpretation).
  A closed-form solution can be derived (see next).
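As a small illustration of the coordinate expression $(\mathrm{grad}\, f)^j = g^{ij}\, \partial_i f$, here is a hypothetical JAX helper; metric is assumed to be a user-supplied function returning G(x), and the name riemannian_grad is ours, not the paper's.

import jax
import jax.numpy as jnp

def riemannian_grad(f, metric, x):
    # (grad f)(x) in the given coordinate system: G(x)^{-1} (d_1 f, ..., d_m f)(x)
    return jnp.linalg.solve(metric(x), jax.grad(f)(x))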

14 Riemannian SVGD
The closed-form solution:

Theorem (Functional Gradient)
  $X^* = \mathrm{grad}\, f^*$, $\qquad f^* = \mathbb{E}_q\big[(\mathrm{grad}\, K)[\log p] + \Delta K\big]$,
  where notations with a prime take $x'$ as argument while the others take $x$, K takes both, and $\Delta f := \mathrm{div}(\mathrm{grad}\, f)$. In any c.s.,
  $X^{*i} = g'^{ij}\, \partial'_j\, \mathbb{E}_q\Big[\big(g^{ab}\, \partial_a \log(p\sqrt{|G|}) + \partial_a g^{ab}\big)\, \partial_b K + g^{ab}\, \partial_a \partial_b K\Big]$.

15 Riemannian SVGD: Purpose 2
Improve efficiency for the usual inference tasks on Euclidean space $\mathbb{R}^m$.
  Apply the idea of information geometry [3, 2]: for a Bayesian model with prior p(x) and likelihood p(D | x), take $M = \{p(\cdot \mid x) : x \in \mathbb{R}^m\}$ and treat x as the coordinate of $p(\cdot \mid x)$. In this global c.s., G(x) is the Fisher information matrix of $p(\cdot \mid x)$ (typically minus the Hessian of $\log p(x)$).
  Calculate the tangent vector at each sample using the c.s. expression
  $X^{*i} = g'^{ij}\, \partial'_j\, \mathbb{E}_q\Big[\big(g^{ab}\, \partial_a \log(p\sqrt{|G|}) + \partial_a g^{ab}\big)\, \partial_b K + g^{ab}\, \partial_a \partial_b K\Big]$,
  where the target distribution is $p = p(x \mid D) \propto p(x)\, p(D \mid x)$ and the expectation is estimated by averaging over the samples.
  Simulate the dynamics for a small time step $\varepsilon$ to update the samples: $x^{(s)} \leftarrow x^{(s)} + \varepsilon\, X^*(x^{(s)})$ (a code sketch of this update follows below).
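The following is a sketch of this update in JAX, computing the coordinate-space expression above with automatic differentiation. It is not the authors' implementation: log_p, metric (returning G(x)) and kernel are user-supplied, all names are illustrative, and the expectation is estimated by averaging over the current particles as described above.

import jax
import jax.numpy as jnp

def rsvgd_directions(particles, log_p, metric, kernel):
    # X*(x') = G(x')^{-1} grad_{x'} E_{x~q}[ (g^{ab} d_a log(p sqrt|G|) + d_a g^{ab}) d_b K
    #                                        + g^{ab} d_a d_b K ]
    def log_p_vol(x):                                    # log( p(x) sqrt|G(x)| )
        _, logdet = jnp.linalg.slogdet(metric(x))
        return log_p(x) + 0.5 * logdet

    def inner(x, x_prime):                               # the term inside E_q, for one pair (x, x')
        Ginv = jnp.linalg.inv(metric(x))                                # g^{ab}(x)
        dGinv = jax.jacfwd(lambda z: jnp.linalg.inv(metric(z)))(x)      # dGinv[a, b, c] = d_c g^{ab}
        div_Ginv = jnp.trace(dGinv, axis1=0, axis2=2)                   # vector with entries d_a g^{ab}
        dlogp = jax.grad(log_p_vol)(x)                                  # d_a log(p sqrt|G|)
        dK = jax.grad(kernel, argnums=0)(x, x_prime)                    # d_b K(x, x')
        d2K = jax.hessian(kernel, argnums=0)(x, x_prime)                # d_a d_b K(x, x')
        return jnp.dot(Ginv @ dlogp + div_Ginv, dK) + jnp.sum(Ginv * d2K)

    def direction(x_prime):
        f = lambda xp: jnp.mean(jax.vmap(lambda x: inner(x, xp))(particles))
        return jnp.linalg.solve(metric(x_prime), jax.grad(f)(x_prime))

    return jax.vmap(direction)(particles)

# One update step:
# particles = particles + eps * rsvgd_directions(particles, log_p, metric, kernel)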

16 Riemannian SVGD: Expression in the Embedded Space
Purpose 1: Enable applicability to inference tasks on non-linear Riemann manifolds.

In the coordinate space of M:
  Some manifolds have no global c.s., e.g. the hypersphere $S^{n-1} := \{x \in \mathbb{R}^n : \|x\| = 1\}$ and the Stiefel manifold [5].
  Cumbersome switch among local c.s.
  G would be singular near the edge of the coordinate space.

In the embedded space of M:
  M can be expressed globally, which is natural for $S^{n-1}$ and the Stiefel manifold.
  No singularity problems.
  Requires the exponential map and the density w.r.t. the Hausdorff measure, which are available for $S^{n-1}$ and the Stiefel manifold.

17 Riemannian SVGD: Expression in the Embedded Space

Proposition (Functional Gradient in the Embedded Space)
  Let the m-dim Riemann manifold M be isometrically embedded in $\mathbb{R}^n$ (with orthonormal basis $\{y^\alpha\}_{\alpha=1}^n$) via $\Xi : M \to \mathbb{R}^n$, and let p be the density w.r.t. the Hausdorff measure on $\Xi(M)$. Then
  $X^* = (I_n - N' N'^{\top})\, \nabla' f^*$,
  $f^* = \mathbb{E}_q\Big[\big(\nabla \log(p\sqrt{|G|})\big)^{\top}(I_n - N N^{\top})(\nabla K) + \Delta K - \mathrm{tr}\big(N^{\top}(\nabla\nabla^{\top} K) N\big) + \big((\nabla^{\top} M)(G^{-1} M^{\top})\big)(\nabla K)\Big]$,
  where $I_n \in \mathbb{R}^{n \times n}$ is the identity matrix, $\nabla = (\partial_{y^1}, \dots, \partial_{y^n})^{\top}$, $M \in \mathbb{R}^{n \times m}$ with $M_{\alpha i} = \partial y^{\alpha}/\partial x^{i}$, $N \in \mathbb{R}^{n \times (n-m)}$ collects an orthonormal basis of the orthogonal complement of $\Xi_*(T_x M)$, and $\mathrm{tr}(\cdot)$ is the trace.

Simulating the dynamics requires the exponential map Exp of M: $y^{(s)} \leftarrow \mathrm{Exp}_{y^{(s)}}\big(\varepsilon\, X^*(y^{(s)})\big)$.
  $\mathrm{Exp}_y(v)$: moves y on $\Xi(M)$ along the geodesic in the direction of v.

18 Riemannian SVGD: Expression in the Embedded Space

Proposition (Functional Gradient for Embedded Hyperspheres)
  For $S^{n-1}$ isometrically embedded in $\mathbb{R}^n$ with orthonormal basis $\{y^\alpha\}_{\alpha=1}^n$, we have $X^* = (I_n - y' y'^{\top})\, \nabla' f^*$, where
  $f^* = \mathbb{E}_q\Big[(\nabla \log p)^{\top}(\nabla K) + \Delta K - y^{\top}(\nabla\nabla^{\top} K)\, y - \big(y^{\top}\nabla \log p + n - 1\big)\, y^{\top}\nabla K\Big]$.

Exponential map on $S^{n-1}$: $\mathrm{Exp}_y(v) = y\cos(\|v\|) + (v/\|v\|)\sin(\|v\|)$ (a small update sketch follows below).
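Below is a small sketch (ours, not the paper's code) of the corresponding particle update on the hypersphere: project an ambient gradient onto the tangent space at each particle and move along the exponential map given above. Here functional_grad stands for the ambient gradient $\nabla' f^*$ at a particle and is a placeholder.

import jax
import jax.numpy as jnp

def exp_map_sphere(y, v, tiny=1e-12):
    # Exp_y(v) = y cos(||v||) + (v / ||v||) sin(||v||), for v tangent to the sphere at y.
    nv = jnp.linalg.norm(v)
    return jnp.where(nv < tiny, y, y * jnp.cos(nv) + (v / (nv + tiny)) * jnp.sin(nv))

def sphere_rsvgd_step(particles, functional_grad, eps=1e-2):
    # y <- Exp_y(eps * X*(y)),  with  X*(y) = (I - y y^T) grad f*(y).
    def step_one(y):
        g = functional_grad(y)            # ambient gradient of f* at y
        X = g - jnp.dot(y, g) * y         # tangent-space projection (I - y y^T) g
        return exp_map_sphere(y, eps * X)
    return jax.vmap(step_one)(particles)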

19 Experiments
  1 Introduction
  2 Preliminaries
  3 Riemannian SVGD
      Derivation of the Directional Derivative
      Expression in the Embedded Space
  4 Experiments
      Bayesian Logistic Regression
      Spherical Admixture Model

20 Experiments: Bayesian Logistic Regression (BLR, for Purpose 2)
Model: Bayesian Logistic Regression (BLR)
  $w \sim \mathcal{N}(0, \alpha I_m)$, $\quad y_d \sim \mathrm{Bern}\big(\sigma(w^{\top} x_d)\big)$, where $\sigma(x) = 1/(1 + e^{-x})$.
  Euclidean task: $w \in \mathbb{R}^m$.
  Posterior: $p(w \mid \{(x_d, y_d)\})$, with known log-density gradient.
  Riemann metric tensor G: Fisher information minus the Hessian of the log-prior, known in closed form.
Kernel: Gaussian kernel in the coordinate space.
Baseline: vanilla SVGD.
Evaluation: averaged test accuracy.
(A plausible instantiation of this model is sketched below.)
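The following is a plausible instantiation of the BLR target and metric (our assumption, following the Fisher-information-minus-Hessian metric stated above; not code from the paper) that can be plugged into the rsvgd_directions sketch from the Purpose-2 slide. X_data is the D x m design matrix, y_data is the label vector in {0,1}^D, and alpha is the prior variance.

import jax.numpy as jnp
from jax.nn import sigmoid, log_sigmoid

def make_blr(X_data, y_data, alpha):
    def log_p(w):
        # log p(w) + log p(y | X, w), up to an additive constant
        logits = X_data @ w
        loglik = jnp.sum(y_data * log_sigmoid(logits) + (1.0 - y_data) * log_sigmoid(-logits))
        logprior = -0.5 * jnp.dot(w, w) / alpha
        return loglik + logprior

    def metric(w):
        # Fisher information of the likelihood minus the Hessian of the log-prior
        s = sigmoid(X_data @ w)
        fisher = (X_data * (s * (1.0 - s))[:, None]).T @ X_data
        return fisher + jnp.eye(X_data.shape[1]) / alpha

    return log_p, metric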

21 Experiments: Bayesian Logistic Regression (BLR, for Purpose 2)
Results:
  [Figure: Test accuracy along iterations for BLR, comparing SVGD and RSVGD (Ours); y-axis: accuracy, x-axis: iteration. (a) On the Splice19 dataset. (b) On the Covertype dataset. Both methods are run 20 times on Splice19 and 10 times on Covertype. Each run on Covertype uses a random train (80%) / test (20%) split as in [7].]

22 Experiments: Spherical Admixture Model (SAM, for Purpose 1)
Model: Spherical Admixture Model (SAM) [8]
  Observed var.: tf-idf representations of documents, $v_d \in S^{V-1}$.
  Latent var.: spherical topics, $\beta_t \in S^{V-1}$.
  Non-linear Riemann manifold task: $\beta \in (S^{V-1})^T$.
  Posterior: $p(\beta \mid v)$ (w.r.t. the Hausdorff measure); the log-density gradient can be estimated [6].
Kernel: von Mises-Fisher (vMF) kernel $K(y, y') = \exp(\kappa\, y^{\top} y')$, the restriction of a Gaussian kernel in $\mathbb{R}^n$ to $S^{n-1}$ (a small code sketch follows below).
Baselines:
  Variational Inference (VI) [8]: the vanilla inference method of SAM.
  Geodesic Monte Carlo (GMC) [4]: MCMC for Riemann manifolds in the embedded space.
  Stochastic Gradient GMC (SGGMC) [6]: SG-MCMC for Riemann manifolds in the embedded space. (-b: mini-batch gradient estimate; -f: full-batch gradient estimate.)
  For the MCMC methods, -seq: samples from one chain; -par: newest samples from multiple chains.
Evaluation: log-perplexity (negative log-likelihood of the test dataset under the trained model) [6].
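For completeness, the vMF kernel as code (a trivial sketch; the value of the concentration parameter kappa is an illustrative choice):

import jax.numpy as jnp

def vmf_kernel(y, y_prime, kappa=10.0):
    # K(y, y') = exp(kappa * y^T y').  For unit vectors, exp(-kappa * ||y - y'||^2 / 2)
    # = exp(-kappa) * exp(kappa * y^T y'), so this is the Gaussian kernel in R^n
    # restricted to S^{n-1}, up to a constant factor.
    return jnp.exp(kappa * jnp.dot(y, y_prime))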

23 Experiments: Spherical Admixture Model (SAM, for Purpose 1)
Results:
  [Figure: Results on the SAM inference task on the 20News-different dataset, in log-perplexity, comparing VI, GMC-seq, GMC-par, SGGMCb-seq, SGGMCb-par, SGGMCf-seq, SGGMCf-par, and RSVGD (Ours). (a) Log-perplexity vs. epochs with 100 particles. (b) Log-perplexity vs. number of particles at 200 epochs. SGGMCf is run with the full batch and SGGMCb with a mini-batch size of 50.]

24 Thank you!

25 References
[1] Ralph Abraham, Jerrold E. Marsden, and Tudor Ratiu. Manifolds, Tensor Analysis, and Applications, volume 75. Springer Science & Business Media, 1988.
[2] Shun-ichi Amari. Information Geometry and Its Applications. Springer, 2016.
[3] Shun-ichi Amari and Hiroshi Nagaoka. Methods of Information Geometry, volume 191. American Mathematical Society, 2000.
[4] Simon Byrne and Mark Girolami. Geodesic Monte Carlo on embedded manifolds. Scandinavian Journal of Statistics, 40(4), 2013.
[5] I. M. James. The Topology of Stiefel Manifolds, volume 24. Cambridge University Press, 1976.
[6] Chang Liu, Jun Zhu, and Yang Song. Stochastic gradient geodesic MCMC methods. In Advances in Neural Information Processing Systems, 2016.

26 References
[7] Qiang Liu and Dilin Wang. Stein variational gradient descent: A general purpose Bayesian inference algorithm. In Advances in Neural Information Processing Systems, 2016.
[8] Joseph Reisinger, Austin Waters, Bryan Silverthorn, and Raymond J. Mooney. Spherical topic models. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010.
