Gradient Estimation Using Stochastic Computation Graphs



Gradient Estimation Using Stochastic Computation Graphs Yoonho Lee Department of Computer Science and Engineering Pohang University of Science and Technology May 9, 2017

Outline
Stochastic Gradient Estimation
Stochastic Computation Graphs
Stochastic Computation Graphs in RL
Other Examples of Stochastic Computation Graphs

Outline
Stochastic Gradient Estimation
Stochastic Computation Graphs
Stochastic Computation Graphs in RL
Other Examples of Stochastic Computation Graphs

Stochastic Gradient Estimation
$R(\theta) = \mathbb{E}_{w \sim P_\theta(dw)}[f(\theta, w)]$
Approximating $\nabla_\theta R(\theta)$ can be used for:
Computational finance
Reinforcement learning
Variational inference
Explaining the brain with neural networks

Stochastic Gradient Estimation
$R(\theta) = \mathbb{E}_{w \sim P_\theta(dw)}[f(\theta, w)] = \int_\Omega f(\theta, w)\, P_\theta(dw)$

Stochastic Gradient Estimation
$R(\theta) = \mathbb{E}_{w \sim P_\theta(dw)}[f(\theta, w)] = \int_\Omega f(\theta, w)\, P_\theta(dw) = \int_\Omega f(\theta, w)\, \frac{P_\theta(dw)}{G(dw)}\, G(dw)$

Stochastic Gradient Estimation
$R(\theta) = \mathbb{E}_{w \sim P_\theta(dw)}[f(\theta, w)] = \int_\Omega f(\theta, w)\, P_\theta(dw) = \int_\Omega f(\theta, w)\, \frac{P_\theta(dw)}{G(dw)}\, G(dw)$
$\nabla_\theta R(\theta) = \nabla_\theta \int_\Omega f(\theta, w)\, \frac{P_\theta(dw)}{G(dw)}\, G(dw) = \int_\Omega \nabla_\theta\!\Big(f(\theta, w)\, \frac{P_\theta(dw)}{G(dw)}\Big)\, G(dw) = \mathbb{E}_{w \sim G(dw)}\!\Big[\nabla_\theta f(\theta, w)\, \frac{P_\theta(dw)}{G(dw)} + f(\theta, w)\, \frac{\nabla_\theta P_\theta(dw)}{G(dw)}\Big]$

Stochastic Gradient Estimation
So far we have
$\nabla_\theta R(\theta) = \mathbb{E}_{w \sim G(dw)}\!\Big[\nabla_\theta f(\theta, w)\, \frac{P_\theta(dw)}{G(dw)} + f(\theta, w)\, \frac{\nabla_\theta P_\theta(dw)}{G(dw)}\Big]$

Stochastic Gradient Estimation
So far we have
$\nabla_\theta R(\theta) = \mathbb{E}_{w \sim G(dw)}\!\Big[\nabla_\theta f(\theta, w)\, \frac{P_\theta(dw)}{G(dw)} + f(\theta, w)\, \frac{\nabla_\theta P_\theta(dw)}{G(dw)}\Big]$
Letting $G = P_{\theta_0}$ and evaluating at $\theta = \theta_0$, we get
$\nabla_\theta R(\theta)\big|_{\theta=\theta_0} = \mathbb{E}_{w \sim P_{\theta_0}(dw)}\!\Big[\nabla_\theta f(\theta, w)\, \frac{P_\theta(dw)}{P_{\theta_0}(dw)} + f(\theta, w)\, \frac{\nabla_\theta P_\theta(dw)}{P_{\theta_0}(dw)}\Big]_{\theta=\theta_0} = \mathbb{E}_{w \sim P_{\theta_0}(dw)}\big[\nabla_\theta f(\theta, w) + f(\theta, w)\, \nabla_\theta \ln P_\theta(dw)\big]_{\theta=\theta_0}$

Stochastic Gradient Estimation
So far we have
$R(\theta) = \mathbb{E}_{w \sim P_\theta(dw)}[f(\theta, w)]$
$\nabla_\theta R(\theta)\big|_{\theta=\theta_0} = \mathbb{E}_{w \sim P_{\theta_0}(dw)}\big[\nabla_\theta f(\theta, w) + f(\theta, w)\, \nabla_\theta \ln P_\theta(dw)\big]_{\theta=\theta_0}$

Stochastic Gradient Estimation
So far we have
$R(\theta) = \mathbb{E}_{w \sim P_\theta(dw)}[f(\theta, w)]$
$\nabla_\theta R(\theta)\big|_{\theta=\theta_0} = \mathbb{E}_{w \sim P_{\theta_0}(dw)}\big[\nabla_\theta f(\theta, w) + f(\theta, w)\, \nabla_\theta \ln P_\theta(dw)\big]_{\theta=\theta_0}$
1. Assuming $w$ fully determines $f$: $\nabla_\theta R(\theta)\big|_{\theta=\theta_0} = \mathbb{E}_{w \sim P_{\theta_0}(dw)}\big[f(w)\, \nabla_\theta \ln P_\theta(dw)\big]_{\theta=\theta_0}$
2. Assuming $P_\theta(dw)$ is independent of $\theta$: $\nabla_\theta R(\theta)\big|_{\theta=\theta_0} = \mathbb{E}_{w \sim P(dw)}\big[\nabla_\theta f(\theta, w)\big]_{\theta=\theta_0}$

Stochastic Gradient Estimation: SF vs PD
$R(\theta) = \mathbb{E}_{w \sim P_\theta(dw)}[f(\theta, w)]$
Score Function (SF) estimator: $\nabla_\theta R(\theta)\big|_{\theta=\theta_0} = \mathbb{E}_{w \sim P_{\theta_0}(dw)}\big[f(w)\, \nabla_\theta \ln P_\theta(dw)\big]_{\theta=\theta_0}$
Pathwise Derivative (PD) estimator: $\nabla_\theta R(\theta)\big|_{\theta=\theta_0} = \mathbb{E}_{w \sim P(dw)}\big[\nabla_\theta f(\theta, w)\big]_{\theta=\theta_0}$

Stochastic Gradient Estimation: SF vs PD
$R(\theta) = \mathbb{E}_{w \sim P_\theta(dw)}[f(\theta, w)]$
Score Function (SF) estimator: $\nabla_\theta R(\theta)\big|_{\theta=\theta_0} = \mathbb{E}_{w \sim P_{\theta_0}(dw)}\big[f(w)\, \nabla_\theta \ln P_\theta(dw)\big]_{\theta=\theta_0}$
Pathwise Derivative (PD) estimator: $\nabla_\theta R(\theta)\big|_{\theta=\theta_0} = \mathbb{E}_{w \sim P(dw)}\big[\nabla_\theta f(\theta, w)\big]_{\theta=\theta_0}$
SF allows discontinuous $f$. SF requires only the sampled values of $f$, while PD requires its derivatives. SF differentiates the probability distribution of $w$, while PD differentiates the function $f$ itself. SF tends to have high variance when $w$ is high-dimensional; PD tends to have high variance when $f$ is rough.
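
As a concrete illustration (an addition of mine, not from the slides), the sketch below computes both estimators for the toy objective $R(\theta) = \mathbb{E}_{w \sim \mathcal{N}(\theta, 1)}[w^2]$, whose exact gradient is $2\theta$; the printed per-sample variances make the trade-off above visible.

```python
# Toy comparison of the SF and PD estimators (illustrative sketch, not from the slides).
# Objective: R(theta) = E_{w ~ N(theta, 1)}[w^2], with exact gradient 2 * theta.
import numpy as np

rng = np.random.default_rng(0)
theta, n = 1.5, 100_000
w = rng.normal(theta, 1.0, size=n)        # sample w ~ N(theta, 1)

# SF estimator: f(w) * d/dtheta log p(w; theta); for unit variance the score is (w - theta).
sf_terms = w**2 * (w - theta)

# PD estimator: reparameterize w = theta + eps, so d/dtheta f(theta + eps) = 2 * w.
pd_terms = 2 * w

print("exact gradient:", 2 * theta)
print("SF estimate:", sf_terms.mean(), " per-sample variance:", sf_terms.var())
print("PD estimate:", pd_terms.mean(), " per-sample variance:", pd_terms.var())
```

On this smooth, low-dimensional problem the PD terms have much lower variance; the picture reverses when $f$ is rough or discontinuous.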

Stochastic Gradient Estimation Choices for PD estimator blog.shakirm.com/2015/10/machine-learning-trick-of-the-day-4-reparameterisation-tricks/

Outline
Stochastic Gradient Estimation
Stochastic Computation Graphs
Stochastic Computation Graphs in RL
Other Examples of Stochastic Computation Graphs

Gradient Estimation Using Stochastic Computation Graphs (NIPS 2015) John Schulman, Nicolas Heess, Theophane Weber, Pieter Abbeel

Stochastic Computation Graphs
SF estimator: $\nabla_\theta R(\theta)\big|_{\theta=\theta_0} = \mathbb{E}_{x \sim P_{\theta_0}(dx)}\big[f(x)\, \nabla_\theta \ln P_\theta(dx)\big]_{\theta=\theta_0}$
PD estimator: $\nabla_\theta R(\theta)\big|_{\theta=\theta_0} = \mathbb{E}_{z \sim P(dz)}\big[\nabla_\theta f(x(\theta, z))\big]_{\theta=\theta_0}$

Stochastic Computation Graphs
Stochastic computation graphs are a design choice rather than an intrinsic property of a problem. Stochastic nodes block gradient flow.
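
A tiny PyTorch illustration (my own, assuming the standard torch.distributions API) of the last point: a sampled node carries no gradient back to its parameters, which is why surrogate losses are needed.

```python
# A sampled (stochastic) node blocks gradient flow back to the parameters that produced it.
import torch

theta = torch.randn(3, requires_grad=True)
probs = torch.softmax(theta, dim=0)                   # deterministic node: differentiable in theta
w = torch.distributions.Categorical(probs).sample()   # stochastic node
print(w.requires_grad)                                # False -- sampling blocked the gradient flow
```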

Stochastic Computation Graphs: Neural Networks
Neural networks are a special case of stochastic computation graphs.

Stochastic Computation Graphs Formal Definition

Stochastic Computation Graphs: Deriving Unbiased Estimators
Condition 1 (Differentiability Requirements). Given input node $\theta \in \Theta$, for all edges $(v, w)$ which satisfy $\theta \prec^D v$ and $\theta \prec^D w$, the following condition holds: if $w$ is deterministic, $\partial w / \partial v$ exists, and if $w$ is stochastic, $\frac{\partial}{\partial v}\, p(w \mid \text{PARENTS}_w)$ exists.
Theorem 1. Suppose $\theta \in \Theta$ satisfies Condition 1. The following equations hold:
$\frac{\partial}{\partial \theta}\, \mathbb{E}\Big[\textstyle\sum_{c \in C} c\Big] = \mathbb{E}\Bigg[\sum_{\substack{w \in S,\\ \theta \prec^D w}} \Big(\frac{\partial}{\partial \theta} \log p(w \mid \text{DEPS}_w)\Big)\, \hat{Q}_w \;+\; \sum_{\substack{c \in C,\\ \theta \prec^D c}} \frac{\partial}{\partial \theta}\, c(\text{DEPS}_c)\Bigg] = \mathbb{E}\Bigg[\sum_{c \in C} \hat{c} \sum_{\substack{w \in S,\ w \prec c,\\ \theta \prec^D w}} \frac{\partial}{\partial \theta} \log p(w \mid \text{DEPS}_w) \;+\; \sum_{\substack{c \in C,\\ \theta \prec^D c}} \frac{\partial}{\partial \theta}\, c(\text{DEPS}_c)\Bigg]$
Here $\hat{Q}_w$ denotes the sum of sampled costs $\hat{c}$ downstream of $w$.

Stochastic Computation Graphs: Surrogate Loss Functions
We have this equation from Theorem 1:
$\frac{\partial}{\partial \theta}\, \mathbb{E}\Big[\textstyle\sum_{c \in C} c\Big] = \mathbb{E}\Bigg[\sum_{\substack{w \in S,\\ \theta \prec^D w}} \Big(\frac{\partial}{\partial \theta} \log p(w \mid \text{DEPS}_w)\Big)\, \hat{Q}_w \;+\; \sum_{\substack{c \in C,\\ \theta \prec^D c}} \frac{\partial}{\partial \theta}\, c(\text{DEPS}_c)\Bigg]$
Note that by defining
$L(\Theta, S) := \sum_{w \in S} \log p(w \mid \text{DEPS}_w)\, \hat{Q}_w + \sum_{c \in C} c(\text{DEPS}_c)$
we have
$\frac{\partial}{\partial \theta}\, \mathbb{E}\Big[\textstyle\sum_{c \in C} c\Big] = \mathbb{E}\Big[\frac{\partial}{\partial \theta}\, L(\Theta, S)\Big]$
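
To make the surrogate loss concrete, here is a minimal PyTorch sketch (my own illustration, not code from the talk) with a single categorical stochastic node; calling backward() on the surrogate yields a one-sample score-function estimate of the gradient in Theorem 1.

```python
# Surrogate-loss sketch for one stochastic node (illustrative, not the talk's code).
import torch

theta = torch.randn(3, requires_grad=True)          # input node
probs = torch.softmax(theta, dim=0)                 # deterministic node
dist = torch.distributions.Categorical(probs)
w = dist.sample()                                   # stochastic node: blocks direct gradient flow

cost = (w.float() - 1.0) ** 2                       # downstream cost; depends on theta only through w
q_hat = cost.detach()                               # \hat{Q}_w: sum of downstream sampled costs

# Since the cost has no differentiable path from theta, only the score-function term remains.
surrogate = dist.log_prob(w) * q_hat
surrogate.backward()                                # theta.grad is a single-sample estimate of Theorem 1
print(theta.grad)
```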

Stochastic Computation Graphs Surrogate loss functions

Stochastic Computation Graphs: Implementations
Making stochastic computation graphs with neural network packages is easy:
tensorflow.org/api_docs/python/tf/contrib/bayesflow/stochastic_gradient_estimators
github.com/pytorch/pytorch/blob/6a69f7007b37bb36d8a712cdf5cebe3ee0cc1cc8/torch/autograd/variable.py
github.com/blei-lab/edward/blob/master/edward/inferences/klqp.py

Outline
Stochastic Gradient Estimation
Stochastic Computation Graphs
Stochastic Computation Graphs in RL
Other Examples of Stochastic Computation Graphs

MDP The MDP formulation itself is a stochastic computation graph

Policy Gradient Surrogate Loss
The policy gradient algorithm is Theorem 1 applied to the MDP graph.¹
¹ Richard S. Sutton et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation. In: NIPS (1999).
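
A hedged sketch of that correspondence (the interface is an assumption of mine, not the talk's code: `policy(s)` returns a torch.distributions object over actions): the surrogate below is the MDP instance of $L(\Theta, S)$, with $\hat{Q}_w$ taken to be the empirical discounted return following each action.

```python
# REINFORCE-style surrogate loss for one trajectory of the MDP graph (illustrative sketch).
# Assumed inputs: `states`/`actions` are lists of tensors, `rewards` is a list of floats.
import torch

def reinforce_surrogate(policy, states, actions, rewards, gamma=0.99):
    """Surrogate loss whose gradient is the score-function (policy gradient) estimate."""
    # Q_hat_t: empirical discounted return downstream of action a_t.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns = torch.tensor(list(reversed(returns)))

    log_probs = torch.stack([policy(s).log_prob(a) for s, a in zip(states, actions)])
    # Minimizing this surrogate ascends the expected return (Theorem 1 applied to the MDP).
    return -(log_probs * returns).sum()
```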

Stochastic Value Gradient: SVG(0)
Learn $Q_\phi$, and update²
$\theta \leftarrow \theta + \alpha \nabla_\theta \sum_{t=1}^{T} Q_\phi(s_t, \pi_\theta(s_t, \epsilon_t))$
² Nicolas Heess et al. Learning Continuous Control Policies by Stochastic Value Gradients. In: NIPS (2015).
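
A minimal actor-loss sketch under assumed module signatures (`pi_theta(states, noise)` returning actions, `q_phi(states, actions)` returning values; the names are mine, not from the slides): minimizing it performs gradient ascent on $\sum_t Q_\phi(s_t, \pi_\theta(s_t, \epsilon_t))$, with the PD estimator carrying gradients through the critic into the policy.

```python
# SVG(0)-style actor loss: differentiate the learned critic through the policy (sketch).
import torch

def svg0_actor_loss(q_phi, pi_theta, states, noise):
    actions = pi_theta(states, noise)        # reparameterized actions a_t = pi_theta(s_t, eps_t)
    return -q_phi(states, actions).sum()     # minimizing this ascends sum_t Q_phi(s_t, a_t)
```

In practice the critic $Q_\phi$ is trained separately (for example by a TD loss) and held fixed while this actor loss is minimized.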

Stochastic Value Gradient: SVG(1)
Learn $V_\phi$ and a dynamics model $f$ such that $s_{t+1} = f(s_t, a_t) + \delta_t$. Given a transition $(s_t, a_t, s_{t+1})$, infer the noise $\delta_t$.³
$\theta \leftarrow \theta + \alpha \nabla_\theta \sum_{t=1}^{T} \big[ r_t + \gamma V_\phi(f(s_t, a_t) + \delta_t) \big]$
³ Nicolas Heess et al. Learning Continuous Control Policies by Stochastic Value Gradients. In: NIPS (2015).
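
In the same spirit, a sketch of the SVG(1) actor update under assumed components (`v_phi`, `dyn_f`, `pi_theta`, and `reward_fn` are placeholders of mine, not the paper's code): the observed transition pins down the noise $\delta_t$, after which the one-step value is differentiated through the dynamics model.

```python
# Illustrative SVG(1) actor-loss sketch; assumes reward_fn is differentiable in the action.
import torch

def svg1_actor_loss(v_phi, dyn_f, pi_theta, reward_fn,
                    s_t, eps_t, a_observed, s_next_observed, gamma=0.99):
    with torch.no_grad():
        delta_t = s_next_observed - dyn_f(s_t, a_observed)   # infer noise from the observed transition
    a_t = pi_theta(s_t, eps_t)                                # re-evaluate the action differentiably
    s_next_pred = dyn_f(s_t, a_t) + delta_t                   # model-based next state plus inferred noise
    value = reward_fn(s_t, a_t) + gamma * v_phi(s_next_pred)  # r_t + gamma * V_phi(f(s_t, a_t) + delta_t)
    return -value.sum()                                       # minimizing this ascends the one-step value
```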

Stochastic Value Gradient: SVG(∞)
Learn $f$, infer all noise variables $\delta_t$. Differentiate through the whole graph.⁴
⁴ Nicolas Heess et al. Learning Continuous Control Policies by Stochastic Value Gradients. In: NIPS (2015).

Deterministic Policy Gradient⁵
⁵ David Silver et al. Deterministic Policy Gradient Algorithms. In: ICML (2014).

Evolution Strategies⁶
⁶ Tim Salimans et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning. In: arXiv (2017).

Outline
Stochastic Gradient Estimation
Stochastic Computation Graphs
Stochastic Computation Graphs in RL
Other Examples of Stochastic Computation Graphs

Neural Variational Inference⁷
Where
$\mathcal{L}(x, \theta, \phi) = \mathbb{E}_{h \sim Q(h \mid x)}\big[\log p_\theta(x, h) - \log q_\phi(h \mid x)\big]$
Score function estimator.
⁷ Andriy Mnih and Karol Gregor. Neural Variational Inference and Learning in Belief Networks. In: ICML (2014).

Variational Autoencoder⁸
Pathwise derivative estimator applied to the exact same graph as NVIL.
⁸ Diederik P. Kingma and Max Welling. Auto-Encoding Variational Bayes. In: ICLR (2014).
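
A minimal sketch of that PD (reparameterization) estimator for a Gaussian-latent VAE; the interfaces are assumptions of mine (`encoder(x)` returns (mu, log_var), `decoder(z)` returns Bernoulli logits for data in [0, 1]), not the paper's code.

```python
# Reparameterized negative ELBO: backprop through this gives the PD gradient estimate.
import torch
import torch.nn.functional as F

def negative_elbo(encoder, decoder, x):
    mu, log_var = encoder(x)
    eps = torch.randn_like(mu)
    z = mu + eps * torch.exp(0.5 * log_var)       # z = mu + sigma * eps: gradients flow into mu, log_var
    recon = F.binary_cross_entropy_with_logits(decoder(z), x, reduction='sum')   # -log p(x|z)
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())               # KL(q(z|x) || N(0, I))
    return recon + kl
```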

DRAW⁹
Sequential version of the VAE. Same graph as SVG(∞) with (known) deterministic dynamics (e = state, z = action, L = reward). Similarly, the VAE can be seen as SVG(∞) with one timestep and deterministic dynamics.
⁹ Karol Gregor et al. DRAW: A Recurrent Neural Network For Image Generation. In: ICML (2015).

GAN
The connection to SVG(0) has been established by Pfau and Vinyals.¹⁰ The actor has no access to the state. The observation is a random choice between a real and a fake image. The reward is 1 if real, 0 if fake. D learns a value function of the observation. G's learning signal is D.
¹⁰ David Pfau and Oriol Vinyals. Connecting Generative Adversarial Networks and Actor-Critic Methods. 2016.

Bayes by Backprop¹¹
and our cost function $f$ is:
$f(\mathcal{D}, \theta) = \mathbb{E}_{w \sim q(w \mid \theta)}[f(w, \theta)] = \mathbb{E}_{w \sim q(w \mid \theta)}\big[-\log p(\mathcal{D} \mid w) + \log q(w \mid \theta) - \log p(w)\big]$
From the point of view of stochastic computation graphs, weights and activations are no different.
Estimate the ELBO gradient of activations with the PD estimator: VAE.
Estimate the ELBO gradient of weights with the PD estimator: Bayes by Backprop.
¹¹ Charles Blundell et al. Weight Uncertainty in Neural Networks. In: ICML (2015).
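
A minimal Bayes-by-Backprop sketch (shapes and names are illustrative assumptions of mine, not the paper's code): the weights are reparameterized as $w = \mu + \sigma\,\epsilon$ with $\sigma = \mathrm{softplus}(\rho)$, so both the complexity term $\log q(w \mid \theta) - \log p(w)$ and the data term $-\log p(\mathcal{D} \mid w)$ are differentiable in $(\mu, \rho)$.

```python
# PD estimator applied to the weights (sketch): w = mu + softplus(rho) * eps.
import torch
import torch.nn.functional as F

mu = torch.zeros(10, requires_grad=True)              # variational mean of the weights
rho = torch.full((10,), -3.0, requires_grad=True)     # sigma = softplus(rho) keeps sigma > 0

sigma = F.softplus(rho)
eps = torch.randn_like(mu)
w = mu + sigma * eps                                   # reparameterized weight sample

log_q = torch.distributions.Normal(mu, sigma).log_prob(w).sum()      # log q(w | theta)
log_prior = torch.distributions.Normal(0.0, 1.0).log_prob(w).sum()   # log p(w)
complexity = log_q - log_prior                         # complexity term of f(D, theta)

# The full minibatch loss would be `complexity - log p(D | w)`, where the likelihood term
# (e.g. a cross-entropy computed with w as the network weights) supplies the data fit.
complexity.backward()                                  # gradients w.r.t. mu and rho via the PD estimator
```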

Summary
Stochastic computation graphs explain many different algorithms with one gradient estimator (Theorem 1).
Stochastic computation graphs give us insight into the behavior of estimators.
Stochastic computation graphs reveal equivalences: NVIL [7] = policy gradient [12]; VAE [6] / DRAW [4] = SVG(∞) [5]; GAN [3] = SVG(0) [5].

References I
[1] Charles Blundell et al. Weight Uncertainty in Neural Networks. In: ICML (2015).
[2] Michael Fu. Stochastic Gradient Estimation. 2005.
[3] Ian J. Goodfellow et al. Generative Adversarial Networks. In: NIPS (2014).
[4] Karol Gregor et al. DRAW: A Recurrent Neural Network For Image Generation. In: ICML (2015).
[5] Nicolas Heess et al. Learning Continuous Control Policies by Stochastic Value Gradients. In: NIPS (2015).
[6] Diederik P. Kingma and Max Welling. Auto-Encoding Variational Bayes. In: ICLR (2014).
[7] Andriy Mnih and Karol Gregor. Neural Variational Inference and Learning in Belief Networks. In: ICML (2014).

References II
[8] David Pfau and Oriol Vinyals. Connecting Generative Adversarial Networks and Actor-Critic Methods. 2016.
[9] Tim Salimans et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning. In: arXiv (2017).
[10] John Schulman et al. Gradient Estimation Using Stochastic Computation Graphs. In: NIPS (2015).
[11] David Silver et al. Deterministic Policy Gradient Algorithms. In: ICML (2014).
[12] Richard S. Sutton et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation. In: NIPS (1999).

Thank You