Gradient Estimation Using Stochastic Computation Graphs
1 Gradient Estimation Using Stochastic Computation Graphs
Yoonho Lee
Department of Computer Science and Engineering, Pohang University of Science and Technology
May 9, 2017
2 Outline Stochastic Gradient Estimation Stochastic Computation Graphs Stochastic Computation Graphs in RL Other Examples of Stochastic Computation Graphs
4 Stochastic Gradient Estimation
$R(\theta) = \mathbb{E}_{w \sim P_\theta(dw)}[f(\theta, w)]$
Approximations of $\nabla_\theta R(\theta)$ can be used for:
- Computational Finance
- Reinforcement Learning
- Variational Inference
- Explaining the brain with neural networks
5 Stochastic Gradient Estimation
$R(\theta) = \mathbb{E}_{w \sim P_\theta(dw)}[f(\theta, w)] = \int_\Omega f(\theta, w)\, P_\theta(dw)$
6 Stochastic Gradient Estimation
$R(\theta) = \mathbb{E}_{w \sim P_\theta(dw)}[f(\theta, w)] = \int_\Omega f(\theta, w)\, P_\theta(dw) = \int_\Omega f(\theta, w)\, \frac{P_\theta(dw)}{G(dw)}\, G(dw)$
7 Stochastic Gradient Estimation
$\nabla_\theta R(\theta) = \nabla_\theta \int_\Omega f(\theta, w)\, \frac{P_\theta(dw)}{G(dw)}\, G(dw) = \int_\Omega \nabla_\theta\!\left( f(\theta, w)\, \frac{P_\theta(dw)}{G(dw)} \right) G(dw) = \mathbb{E}_{w \sim G(dw)}\!\left[ \nabla_\theta f(\theta, w)\, \frac{P_\theta(dw)}{G(dw)} + f(\theta, w)\, \frac{\nabla_\theta P_\theta(dw)}{G(dw)} \right]$
8 Stochastic Gradient Estimation
So far we have
$\nabla_\theta R(\theta) = \mathbb{E}_{w \sim G(dw)}\!\left[ \nabla_\theta f(\theta, w)\, \frac{P_\theta(dw)}{G(dw)} + f(\theta, w)\, \frac{\nabla_\theta P_\theta(dw)}{G(dw)} \right]$
9 Stochastic Gradient Estimation
So far we have
$\nabla_\theta R(\theta) = \mathbb{E}_{w \sim G(dw)}\!\left[ \nabla_\theta f(\theta, w)\, \frac{P_\theta(dw)}{G(dw)} + f(\theta, w)\, \frac{\nabla_\theta P_\theta(dw)}{G(dw)} \right]$
Letting $G = P_{\theta_0}$ and evaluating at $\theta = \theta_0$, we get
$\nabla_\theta R(\theta)\big|_{\theta=\theta_0} = \mathbb{E}_{w \sim P_{\theta_0}(dw)}\!\left[ \nabla_\theta f(\theta, w)\big|_{\theta=\theta_0}\, \frac{P_\theta(dw)}{P_{\theta_0}(dw)}\bigg|_{\theta=\theta_0} + f(\theta, w)\, \frac{\nabla_\theta P_\theta(dw)}{P_{\theta_0}(dw)}\bigg|_{\theta=\theta_0} \right] = \mathbb{E}_{w \sim P_{\theta_0}(dw)}\!\left[ \nabla_\theta f(\theta, w)\big|_{\theta=\theta_0} + f(\theta, w)\, \nabla_\theta \ln p_\theta(w)\big|_{\theta=\theta_0} \right]$
10 Stochastic Gradient Estimation
So far we have
$R(\theta) = \mathbb{E}_{w \sim P_\theta(dw)}[f(\theta, w)]$
$\nabla_\theta R(\theta)\big|_{\theta=\theta_0} = \mathbb{E}_{w \sim P_{\theta_0}(dw)}\!\left[ \nabla_\theta f(\theta, w)\big|_{\theta=\theta_0} + f(\theta, w)\, \nabla_\theta \ln p_\theta(w)\big|_{\theta=\theta_0} \right]$
11 Stochastic Gradient Estimation
So far we have
$R(\theta) = \mathbb{E}_{w \sim P_\theta(dw)}[f(\theta, w)]$
$\nabla_\theta R(\theta)\big|_{\theta=\theta_0} = \mathbb{E}_{w \sim P_{\theta_0}(dw)}\!\left[ \nabla_\theta f(\theta, w)\big|_{\theta=\theta_0} + f(\theta, w)\, \nabla_\theta \ln p_\theta(w)\big|_{\theta=\theta_0} \right]$
1. Assuming $w$ fully determines $f$ (i.e. $f$ has no direct dependence on $\theta$):
$\nabla_\theta R(\theta)\big|_{\theta=\theta_0} = \mathbb{E}_{w \sim P_{\theta_0}(dw)}\!\left[ f(w)\, \nabla_\theta \ln p_\theta(w)\big|_{\theta=\theta_0} \right]$
2. Assuming $P_\theta(dw)$ is independent of $\theta$:
$\nabla_\theta R(\theta)\big|_{\theta=\theta_0} = \mathbb{E}_{w \sim P(dw)}\!\left[ \nabla_\theta f(\theta, w)\big|_{\theta=\theta_0} \right]$
12 Stochastic Gradient Estimation: SF vs PD
$R(\theta) = \mathbb{E}_{w \sim P_\theta(dw)}[f(\theta, w)]$
Score Function (SF) Estimator:
$\nabla_\theta R(\theta)\big|_{\theta=\theta_0} = \mathbb{E}_{w \sim P_{\theta_0}(dw)}\!\left[ f(w)\, \nabla_\theta \ln p_\theta(w)\big|_{\theta=\theta_0} \right]$
Pathwise Derivative (PD) Estimator:
$\nabla_\theta R(\theta)\big|_{\theta=\theta_0} = \mathbb{E}_{w \sim P(dw)}\!\left[ \nabla_\theta f(\theta, w)\big|_{\theta=\theta_0} \right]$
13 Stochastic Gradient Estimation: SF vs PD
$R(\theta) = \mathbb{E}_{w \sim P_\theta(dw)}[f(\theta, w)]$
Score Function (SF) Estimator:
$\nabla_\theta R(\theta)\big|_{\theta=\theta_0} = \mathbb{E}_{w \sim P_{\theta_0}(dw)}\!\left[ f(w)\, \nabla_\theta \ln p_\theta(w)\big|_{\theta=\theta_0} \right]$
Pathwise Derivative (PD) Estimator:
$\nabla_\theta R(\theta)\big|_{\theta=\theta_0} = \mathbb{E}_{w \sim P(dw)}\!\left[ \nabla_\theta f(\theta, w)\big|_{\theta=\theta_0} \right]$
- SF allows discontinuous $f$.
- SF requires only sampled values $f(w)$; PD requires the derivative $\partial f / \partial w$.
- SF differentiates through the probability distribution of $w$; PD differentiates through the function $f$.
- SF has high variance when $w$ is high-dimensional; PD has high variance when $f$ is rough.
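These trade-offs can be checked numerically on a one-dimensional toy problem (a sketch of ours, not from the slides): $R(\theta) = \mathbb{E}_{w \sim N(\theta, 1)}[w^2]$, whose exact gradient is $2\theta$.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n = 1.5, 200_000  # true gradient: 2 * theta = 3.0

# Score function (SF) estimator: E[f(w) * grad_theta log p_theta(w)].
# For w ~ N(theta, 1) the score is simply (w - theta).
w = rng.normal(theta, 1.0, size=n)
sf_grad = np.mean(w**2 * (w - theta))

# Pathwise derivative (PD) estimator: reparameterize w = theta + eps with
# eps ~ N(0, 1) and differentiate through the sample path:
# d/dtheta (theta + eps)^2 = 2 * (theta + eps).
eps = rng.normal(0.0, 1.0, size=n)
pd_grad = np.mean(2.0 * (theta + eps))

print(sf_grad, pd_grad)  # both estimates concentrate around 3.0
```

Both estimators are unbiased here, but their per-sample variances differ: raising the dimension of $w$ hurts SF, while making $f$ rougher hurts PD.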
14 Stochastic Gradient Estimation Choices for PD estimator blog.shakirm.com/2015/10/machine-learning-trick-of-the-day-4-reparameterisation-tricks/
15 Outline Stochastic Gradient Estimation Stochastic Computation Graphs Stochastic Computation Graphs in RL Other Examples of Stochastic Computation Graphs
16 Gradient Estimation Using Stochastic Computation Graphs (NIPS 2015) John Schulman, Nicolas Heess, Theophane Weber, Pieter Abbeel
17 Stochastic Computation Graphs
SF estimator:
$\nabla_\theta R(\theta)\big|_{\theta=\theta_0} = \mathbb{E}_{x \sim P_{\theta_0}(dx)}\!\left[ f(x)\, \nabla_\theta \ln p_\theta(x)\big|_{\theta=\theta_0} \right]$
PD estimator:
$\nabla_\theta R(\theta)\big|_{\theta=\theta_0} = \mathbb{E}_{z \sim P(dz)}\!\left[ \nabla_\theta f(x(\theta, z))\big|_{\theta=\theta_0} \right]$
18 Stochastic Computation Graphs
Stochastic computation graphs are a design choice rather than an intrinsic property of a problem. Stochastic nodes block gradient flow.
19 Stochastic Computation Graphs Neural Networks Neural Networks are a special case of stochastic computation graphs
20 Stochastic Computation Graphs Formal Definition
21 Stochastic Computation Graphs: Deriving Unbiased Estimators
Condition 1 (Differentiability Requirements). Given input node $\theta \in \Theta$, for all edges $(v, w)$ which satisfy $\theta \preceq^D v$ and $\theta \preceq^D w$, the following condition holds: if $w$ is deterministic, $\partial w / \partial v$ exists, and if $w$ is stochastic, $\frac{\partial}{\partial v}\, p(w \mid \text{PARENTS}_w)$ exists.
Theorem 1. Suppose $\theta \in \Theta$ satisfies Condition 1. The following equations hold:
$\nabla_\theta\, \mathbb{E}\Big[ \sum_{c \in C} c \Big] = \mathbb{E}\Big[ \sum_{w \in S,\; \theta \preceq^D w} \big( \nabla_\theta \log p(w \mid \text{DEPS}_w) \big)\, \hat{Q}_w + \sum_{c \in C,\; \theta \preceq^D c} \nabla_\theta c(\text{DEPS}_c) \Big] = \mathbb{E}\Big[ \sum_{c \in C} \hat{c} \sum_{w \in S,\; \theta \preceq^D w \preceq c} \nabla_\theta \log p(w \mid \text{DEPS}_w) + \sum_{c \in C,\; \theta \preceq^D c} \nabla_\theta c(\text{DEPS}_c) \Big]$
(Here $C$ is the set of cost nodes, $S$ the set of stochastic nodes, $\hat{Q}_w = \sum_{c \succeq w} \hat{c}$ the sum of sampled costs downstream of $w$, and $\theta \preceq^D v$ means $\theta$ influences $v$ deterministically.)
22 Stochastic Computation Graphs: Surrogate loss functions
We have this equation from Theorem 1:
$\nabla_\theta\, \mathbb{E}\Big[ \sum_{c \in C} c \Big] = \mathbb{E}\Big[ \sum_{w \in S,\; \theta \preceq^D w} \big( \nabla_\theta \log p(w \mid \text{DEPS}_w) \big)\, \hat{Q}_w + \sum_{c \in C,\; \theta \preceq^D c} \nabla_\theta c(\text{DEPS}_c) \Big]$
Note that by defining
$L(\Theta, S) := \sum_{w \in S} \log p(w \mid \text{DEPS}_w)\, \hat{Q}_w + \sum_{c \in C} c(\text{DEPS}_c)$
we have
$\nabla_\theta\, \mathbb{E}\Big[ \sum_{c \in C} c \Big] = \mathbb{E}\big[ \nabla_\theta L(\Theta, S) \big]$
23 Stochastic Computation Graphs Surrogate loss functions
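The surrogate-loss idea can be illustrated with a minimal sketch of ours (the variable names are not from the paper): freeze a batch of samples, build $L(\theta) = \frac{1}{N}\sum \log p_\theta(w)\, \hat{Q}_w$ for a single stochastic node, and differentiate the surrogate instead of the original expectation.

```python
import numpy as np

rng = np.random.default_rng(0)
theta0, n = 1.5, 200_000
# Toy graph: theta -> w ~ N(theta, 1) -> cost c = w^2.
w = rng.normal(theta0, 1.0, size=n)  # sampled values, held fixed below
q_hat = w**2                         # Q_hat_w: sum of sampled costs downstream of w

def surrogate(theta):
    # log N(w; theta, 1) up to an additive constant independent of theta
    log_p = -0.5 * (w - theta) ** 2
    return np.mean(log_p * q_hat)

# Differentiating the surrogate at theta0 (here by central differences, in
# place of automatic differentiation) recovers the score-function estimate
# of grad E[w^2], whose true value is 2 * theta0 = 3.0.
h = 1e-5
grad = (surrogate(theta0 + h) - surrogate(theta0 - h)) / (2 * h)
print(grad)
```

In an autodiff framework one would simply call backprop on `surrogate`; this is what the implementations on the next slide automate.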
24 Stochastic Computation Graphs: Implementations
Making stochastic computation graphs with neural network packages is easy:
tensorflow.org/api_docs/python/tf/contrib/bayesflow/stochastic_gradient_estimators
github.com/pytorch/pytorch/blob/6a69f7007b37bb36d8a712cdf5cebe3ee0cc1cc8/torch/autograd/variable.py
github.com/blei-lab/edward/blob/master/edward/inferences/klqp.py
25 Outline Stochastic Gradient Estimation Stochastic Computation Graphs Stochastic Computation Graphs in RL Other Examples of Stochastic Computation Graphs
26 MDP The MDP formulation itself is a stochastic computation graph
27 Policy Gradient: Surrogate Loss 1
The policy gradient algorithm is Theorem 1 applied to the MDP graph.
1 Richard S. Sutton et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation. In: NIPS (1999).
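To make the correspondence concrete, here is a sketch of ours (not code from the cited paper) of the resulting REINFORCE update on the simplest possible MDP: a one-step two-armed bandit.

```python
import numpy as np

rng = np.random.default_rng(0)
logits = np.zeros(2)  # policy parameters; arm 0 pays reward 1, arm 1 pays 0
alpha = 0.1

for _ in range(2000):
    p = np.exp(logits) / np.exp(logits).sum()  # softmax policy
    a = rng.choice(2, p=p)
    r = 1.0 if a == 0 else 0.0
    # Gradient of log pi(a) w.r.t. softmax logits: one_hot(a) - p.
    grad_logp = np.eye(2)[a] - p
    logits += alpha * r * grad_logp            # REINFORCE ascent step

p = np.exp(logits) / np.exp(logits).sum()
print(p[0])  # probability of the rewarded arm approaches 1
```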
28 Stochastic Value Gradient: SVG(0)
Learn $Q_\phi$, and update 2
$\theta \leftarrow \theta + \alpha\, \nabla_\theta \sum_{t=1}^{T} Q_\phi(s_t, \pi_\theta(s_t, \epsilon_t))$
(with step size $\alpha$)
2 Nicolas Heess et al. Learning Continuous Control Policies by Stochastic Value Gradients. In: NIPS (2015).
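A minimal sketch of the SVG(0) flavor of update, on a made-up one-step problem with a known critic (all names and constants here are illustrative assumptions, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)
theta, alpha = 0.0, 0.05  # policy parameter and step size

for _ in range(1000):
    eps = rng.normal()
    a = theta + 0.1 * eps  # reparameterized Gaussian policy a = pi_theta(s, eps)
    # Critic Q(s, a) = -(a - 2)^2, so dQ/da = -2 * (a - 2) and da/dtheta = 1;
    # the update ascends dQ/dtheta through the sampled action.
    theta += alpha * (-2.0 * (a - 2.0))

print(theta)  # converges near the critic's optimum a* = 2
```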
29 Stochastic Value Gradient: SVG(1)
Learn $V_\phi$ and a dynamics model $f$ such that $s_{t+1} = f(s_t, a_t) + \delta_t$. Given a transition $(s_t, a_t, s_{t+1})$, infer the noise $\delta_t$. Then update 3
$\theta \leftarrow \theta + \alpha\, \nabla_\theta \sum_{t=1}^{T} \big[ r_t + \gamma\, V_\phi(f(s_t, a_t) + \delta_t) \big]$
3 Nicolas Heess et al. Learning Continuous Control Policies by Stochastic Value Gradients. In: NIPS (2015).
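A one-dimensional sketch of the SVG(1) recipe (our own construction: linear dynamics, a known value function, and a deterministic policy standing in for $\pi_\theta$):

```python
import numpy as np

rng = np.random.default_rng(0)
theta, alpha, s = 0.0, 0.05, 0.0  # policy a = theta, step size, start state

for _ in range(500):
    a = theta
    s_next = s + a + rng.normal(0.0, 0.1)  # environment step (true dynamics)
    delta = s_next - (s + a)               # infer the noise from the transition
    # Objective V(f(s, a) + delta) with model f(s, a) = s + a and value
    # V(x) = -(x - 3)^2; the chain rule gives d/dtheta = -2 * (s_next - 3).
    theta += alpha * (-2.0 * ((s + a + delta) - 3.0))

print(theta)  # the policy pushes f(s, a) + delta toward the value peak at 3
```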
30 Stochastic Value Gradient: SVG(∞)
Learn $f$, infer all noise variables $\delta_t$, and differentiate through the whole graph. 4
4 Nicolas Heess et al. Learning Continuous Control Policies by Stochastic Value Gradients. In: NIPS (2015).
31 Deterministic Policy Gradient 5
5 David Silver et al. Deterministic Policy Gradient Algorithms. In: ICML (2014).
32 Evolution Strategies 6
6 Tim Salimans et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning. In: arXiv (2017).
33 Outline Stochastic Gradient Estimation Stochastic Computation Graphs Stochastic Computation Graphs in RL Other Examples of Stochastic Computation Graphs
34 Neural Variational Inference 7
where
$\mathcal{L}(x, \theta, \phi) = \mathbb{E}_{h \sim Q_\phi(h \mid x)}\big[ \log p_\theta(x, h) - \log q_\phi(h \mid x) \big]$
Score function estimator.
7 Andriy Mnih and Karol Gregor. Neural Variational Inference and Learning in Belief Networks. In: ICML (2014).
35 Variational Autoencoder 8
Pathwise derivative estimator applied to the exact same graph as NVIL.
8 Diederik P. Kingma and Max Welling. Auto-Encoding Variational Bayes. In: ICLR (2014).
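The pathwise derivative (reparameterization) trick at the heart of the VAE can be sketched on a scalar toy reconstruction term (an illustration of ours with made-up numbers, not the paper's model): reparameterize $z = \mu + \sigma\epsilon$ and differentiate through the sample.

```python
import numpy as np

rng = np.random.default_rng(0)
x, mu, sigma, n = 0.5, 1.2, 0.3, 100_000  # made-up data point and posterior params

# Gradient w.r.t. mu of E_{z ~ N(mu, sigma^2)}[(x - z)^2], via z = mu + sigma*eps:
# d/dmu (x - z)^2 = -2 * (x - z), since dz/dmu = 1.
eps = rng.normal(size=n)
z = mu + sigma * eps
pd_grad = np.mean(-2.0 * (x - z))

print(pd_grad)  # analytic value: 2 * (mu - x) = 1.4
```

The same Monte Carlo average with the score-function estimator would be unbiased too, but noisier; routing the gradient through the sample path is what lets the VAE train with ordinary backprop.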
36 DRAW 9
Sequential version of the VAE. Same graph as SVG(∞) with (known) deterministic dynamics (e = state, z = action, L = reward). Similarly, the VAE can be seen as SVG(∞) with one timestep and deterministic dynamics.
9 Karol Gregor et al. DRAW: A Recurrent Neural Network For Image Generation. In: ICML (2015).
37 GAN
A connection to SVG(0) has been established in [10]: the actor has no access to state; the observation is a random choice between a real and a fake image; the reward is 1 if real, 0 if fake; D learns the value function of the observation; G's learning signal is D.
10 David Pfau and Oriol Vinyals. Connecting Generative Adversarial Networks and Actor-Critic Methods. In: (2016).
38 Bayes by backprop 11
and our cost function $f$ is:
$f(\mathcal{D}, \theta) = \mathbb{E}_{w \sim q(w \mid \theta)}[f(w, \theta)] = \mathbb{E}_{w \sim q(w \mid \theta)}\big[ -\log p(\mathcal{D} \mid w) + \log q(w \mid \theta) - \log p(w) \big]$
From the point of view of stochastic computation graphs, weights and activations are no different.
Estimate the ELBO gradient of activations with the PD estimator: VAE.
Estimate the ELBO gradient of weights with the PD estimator: Bayes by Backprop.
11 Charles Blundell et al. Weight Uncertainty in Neural Networks. In: ICML (2015).
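In the same spirit, a toy Bayes-by-Backprop step for a single weight (our illustrative setup, not the paper's experiments; only the posterior mean is updated, for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)
y = np.array([1.8, 2.2, 2.0, 1.9, 2.1])  # made-up observations, y_i ~ N(w, 1)
mu, log_sigma, alpha = 0.0, -1.0, 0.02   # q(w|theta) = N(mu, sigma^2), prior N(0, 1)

for _ in range(2000):
    sigma = np.exp(log_sigma)
    eps = rng.normal()
    w = mu + sigma * eps                 # reparameterized weight sample
    # d/dw of the negative log-likelihood sum_i 0.5*(y_i - w)^2 is sum_i (w - y_i);
    # the closed-form KL(N(mu, sigma^2) || N(0, 1)) contributes d/dmu = mu.
    grad_mu = np.sum(w - y) + mu
    mu -= alpha * grad_mu

print(mu)  # drifts toward the exact posterior mean sum(y) / (n + 1) = 10/6
```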
39 Summary
Stochastic computation graphs explain many different algorithms with one gradient estimator (Theorem 1).
Stochastic computation graphs give us insight into the behavior of estimators.
Stochastic computation graphs reveal equivalences: NVIL [7] = policy gradient [12]; VAE [6] / DRAW [4] = SVG(∞) [5]; GAN [3] = SVG(0) [5].
40 References I
[1] Charles Blundell et al. Weight Uncertainty in Neural Networks. In: ICML (2015).
[2] Michael Fu. Stochastic Gradient Estimation. In: (2005).
[3] Ian J. Goodfellow et al. Generative Adversarial Networks. In: NIPS (2014).
[4] Karol Gregor et al. DRAW: A Recurrent Neural Network For Image Generation. In: ICML (2015).
[5] Nicolas Heess et al. Learning Continuous Control Policies by Stochastic Value Gradients. In: NIPS (2015).
[6] Diederik P. Kingma and Max Welling. Auto-Encoding Variational Bayes. In: ICLR (2014).
[7] Andriy Mnih and Karol Gregor. Neural Variational Inference and Learning in Belief Networks. In: ICML (2014).
41 References II
[8] David Pfau and Oriol Vinyals. Connecting Generative Adversarial Networks and Actor-Critic Methods. In: (2016).
[9] Tim Salimans et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning. In: arXiv (2017).
[10] John Schulman et al. Gradient Estimation Using Stochastic Computation Graphs. In: NIPS (2015).
[11] David Silver et al. Deterministic Policy Gradient Algorithms. In: ICML (2014).
[12] Richard S. Sutton et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation. In: NIPS (1999).
42 Thank You
More informationFAST ADAPTATION IN GENERATIVE MODELS WITH GENERATIVE MATCHING NETWORKS
FAST ADAPTATION IN GENERATIVE MODELS WITH GENERATIVE MATCHING NETWORKS Sergey Bartunov & Dmitry P. Vetrov National Research University Higher School of Economics (HSE), Yandex Moscow, Russia ABSTRACT We
More informationCONCEPT LEARNING WITH ENERGY-BASED MODELS
CONCEPT LEARNING WITH ENERGY-BASED MODELS Igor Mordatch OpenAI San Francisco mordatch@openai.com 1 OVERVIEW Many hallmarks of human intelligence, such as generalizing from limited experience, abstract
More informationGenerative adversarial networks
14-1: Generative adversarial networks Prof. J.C. Kao, UCLA Generative adversarial networks Why GANs? GAN intuition GAN equilibrium GAN implementation Practical considerations Much of these notes are based
More informationarxiv: v2 [stat.ml] 15 Aug 2017
Worshop trac - ICLR 207 REINTERPRETING IMPORTANCE-WEIGHTED AUTOENCODERS Chris Cremer, Quaid Morris & David Duvenaud Department of Computer Science University of Toronto {ccremer,duvenaud}@cs.toronto.edu
More informationDeep Reinforcement Learning
Martin Matyášek Artificial Intelligence Center Czech Technical University in Prague October 27, 2016 Martin Matyášek VPD, 2016 1 / 50 Reinforcement Learning in a picture R. S. Sutton and A. G. Barto 2015
More informationDISCRETE FLOW POSTERIORS FOR VARIATIONAL
DISCRETE FLOW POSTERIORS FOR VARIATIONAL INFERENCE IN DISCRETE DYNAMICAL SYSTEMS Anonymous authors Paper under double-blind review ABSTRACT Each training step for a variational autoencoder (VAE) requires
More informationarxiv: v4 [cs.lg] 16 Apr 2015
REWEIGHTED WAKE-SLEEP Jörg Bornschein and Yoshua Bengio Department of Computer Science and Operations Research University of Montreal Montreal, Quebec, Canada ABSTRACT arxiv:1406.2751v4 [cs.lg] 16 Apr
More informationMachine Learning I Continuous Reinforcement Learning
Machine Learning I Continuous Reinforcement Learning Thomas Rückstieß Technische Universität München January 7/8, 2010 RL Problem Statement (reminder) state s t+1 ENVIRONMENT reward r t+1 new step r t
More informationLecture 14: Deep Generative Learning
Generative Modeling CSED703R: Deep Learning for Visual Recognition (2017F) Lecture 14: Deep Generative Learning Density estimation Reconstructing probability density function using samples Bohyung Han
More informationCENG 783. Special topics in. Deep Learning. AlchemyAPI. Week 14. Sinan Kalkan
CENG 783 Special topics in Deep Learning AlchemyAPI Week 14 Sinan Kalkan Today Hopfield Networks Boltzmann Machines Deep BM, Restricted BM Generative Adversarial Networks Variational Auto-encoders Autoregressive
More information