Backpropagation Through the Void


Backpropagation Through the Void. Will Grathwohl, Dami Choi, Yuhuai Wu, Geoff Roeder, David Duvenaud

Where do we see this guy? $\mathcal{L}(\theta) = \mathbb{E}_{p(b|\theta)}[f(b)]$. Just about everywhere! Variational inference, reinforcement learning, hard attention, and so many more!

Gradient-based optimization: Gradient-based optimization is the standard method used today to optimize such expectations. It is necessary if the models are neural-net based, and only very rarely can this gradient be computed analytically.

Otherwise, we estimate. A number of approaches exist to estimate this gradient. They make varying levels of assumptions about the distribution and the function being optimized. The most popular methods either make strong assumptions or suffer from high variance.

REINFORCE (Williams, 1992):
$\hat{g}_{\text{REINFORCE}}[f] = f(b)\,\frac{\partial}{\partial\theta}\log p(b|\theta), \quad b \sim p(b|\theta)$
Unbiased. Has few requirements. Easy to compute. Suffers from high variance.
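
A minimal sketch of this estimator in PyTorch for a single Bernoulli parameter; the toy $f$ and the sample size are illustrative assumptions, not from the talk. Note that $f$ is only evaluated, never differentiated:

```python
import torch

def reinforce_grad(theta_logit, f, n_samples=1000):
    """REINFORCE: average of f(b) * d/dtheta log p(b|theta) over samples b ~ p(b|theta)."""
    theta_logit = theta_logit.clone().requires_grad_(True)
    p = torch.sigmoid(theta_logit)                          # Bernoulli probability
    b = torch.bernoulli(p.detach().expand(n_samples))       # samples, no gradient path through them
    log_p = b * torch.log(p) + (1 - b) * torch.log(1 - p)   # log p(b|theta)
    surrogate = (f(b) * log_p).mean()                       # gradient of this is the estimator
    return torch.autograd.grad(surrogate, theta_logit)[0]

# Toy objective (also used in the talk's later simple example): f(b) = (t - b)^2, t = 0.499.
g = reinforce_grad(torch.tensor(0.0), lambda b: (0.499 - b) ** 2)
```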

Reparameterization (Kingma & Welling, 2014):
$\hat{g}_{\text{reparam}}[f] = \frac{\partial f}{\partial b}\frac{\partial b}{\partial\theta}, \quad b = T(\theta, \epsilon), \ \epsilon \sim p(\epsilon)$
Makes stronger assumptions. Unbiased. Lower variance empirically. Requires that $f(b)$ is known and differentiable. Requires that $p(b|\theta)$ is reparameterizable.
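
A matching sketch of the reparameterization estimator for a Gaussian $p(b|\theta)$ with $\theta = (\mu, \log\sigma)$; the Gaussian choice and the toy $f$ are assumptions for illustration. Here $f$ must be differentiable because gradients flow through it:

```python
import torch

def reparam_grad(mu, log_sigma, f, n_samples=1000):
    """Reparameterization: b = T(theta, eps) = mu + sigma * eps, eps ~ N(0, 1)."""
    mu = mu.clone().requires_grad_(True)
    log_sigma = log_sigma.clone().requires_grad_(True)
    eps = torch.randn(n_samples)             # noise is independent of theta
    b = mu + torch.exp(log_sigma) * eps      # differentiable path from theta to b
    loss = f(b).mean()                       # gradient flows through f and through b
    return torch.autograd.grad(loss, (mu, log_sigma))

g_mu, g_log_sigma = reparam_grad(torch.tensor(0.0), torch.tensor(0.0),
                                 lambda b: (0.499 - b) ** 2)
```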

Concrete (Maddison et al., 2016):
$\hat{g}_{\text{concrete}}[f] = \frac{\partial f(\sigma(z/t))}{\partial \sigma(z/t)}\frac{\partial \sigma(z/t)}{\partial\theta}, \quad z = T(\theta, \epsilon), \ \epsilon \sim p(\epsilon)$
Works well in practice. Low variance from reparameterization. Biased. Adds a temperature hyper-parameter. Requires that $f(b)$ is known and differentiable, that $p(z|\theta)$ is reparameterizable, and that $f(b)$ behaves predictably outside of its domain.
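
A sketch of the binary Concrete relaxation for a single Bernoulli: the hard sample $H(z)$ is replaced by $\sigma(z/t)$, which is reparameterizable but biases the estimator. The temperature value and toy $f$ are assumptions:

```python
import torch

def concrete_grad(theta_logit, f, temperature=0.5, n_samples=1000):
    """Binary Concrete: relax b = H(z) to sigmoid(z / t) and reparameterize."""
    theta_logit = theta_logit.clone().requires_grad_(True)
    u = torch.rand(n_samples)
    z = theta_logit + torch.log(u) - torch.log(1 - u)  # z = T(theta, eps): logistic noise around the logit
    b_relaxed = torch.sigmoid(z / temperature)         # continuous stand-in for the hard sample
    loss = f(b_relaxed).mean()                         # f is evaluated off {0, 1}, hence the bias
    return torch.autograd.grad(loss, theta_logit)[0]

g = concrete_grad(torch.tensor(0.0), lambda b: (0.499 - b) ** 2)
```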

Control Variates: allow us to reduce the variance of a Monte Carlo estimator.
$\hat{g}_{\text{new}}(b) = \hat{g}(b) - c(b) + \mathbb{E}_{p(b)}[c(b)]$
Variance is reduced if $\text{corr}(\hat{g}, c) > 0$. Does not change the bias.
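
A small numerical sketch of this identity: subtracting $c(b)$ and adding back its known mean leaves the expectation unchanged while shrinking the variance when $c$ is correlated with the estimator. The particular $\hat{g}$, $c$, and distribution here are assumptions chosen so that $\mathbb{E}[c(b)]$ is known in closed form:

```python
import torch

torch.manual_seed(0)
b = torch.rand(100_000)            # b ~ Uniform(0, 1)
g = torch.exp(b)                   # naive Monte Carlo estimator of E[exp(b)] = e - 1
c = 1.0 + b                        # control variate correlated with g, with E[c(b)] = 1.5
g_new = g - c + 1.5                # same expectation, lower variance
print(g.mean().item(), g_new.mean().item())  # both approximate 1.718
print(g.var().item(), g_new.var().item())    # the corrected estimator has smaller variance
```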

Putting it all together: we would like a general gradient estimator that is unbiased, low variance, usable when $f(b)$ is unknown, and usable when $p(b|\theta)$ is discrete.

Our Approach:
$\hat{g}_{\text{LAX}} = \hat{g}_{\text{REINFORCE}}[f] - \hat{g}_{\text{REINFORCE}}[c_\phi] + \hat{g}_{\text{reparam}}[c_\phi]$

Our Approach:
$\hat{g}_{\text{LAX}} = \hat{g}_{\text{REINFORCE}}[f] - \hat{g}_{\text{REINFORCE}}[c_\phi] + \hat{g}_{\text{reparam}}[c_\phi] = [f(b) - c_\phi(b)]\,\frac{\partial}{\partial\theta}\log p(b|\theta) + \frac{\partial}{\partial\theta}c_\phi(b)$
Start with the REINFORCE estimator for $f(b)$. We introduce a new function $c_\phi(b)$, subtract the REINFORCE estimator of its gradient, and add back its reparameterization estimator. Can be thought of as using the REINFORCE estimator of $c_\phi(b)$ as a control variate.
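
A sketch of the LAX estimator for a continuous (Gaussian) $p(b|\theta)$: a REINFORCE term weighted by $f(b) - c_\phi(b)$ plus the reparameterization gradient of $c_\phi$, evaluated at the same sample. The surrogate architecture, toy $f$, and sample count are assumptions:

```python
import torch

c_phi = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))

def lax_grad(mu, log_sigma, f, n_samples=256):
    """g_LAX = [f(b) - c_phi(b)] d/dtheta log p(b|theta) + d/dtheta c_phi(b)."""
    mu = mu.clone().requires_grad_(True)
    log_sigma = log_sigma.clone().requires_grad_(True)
    eps = torch.randn(n_samples, 1)
    b = mu + torch.exp(log_sigma) * eps                      # reparameterized sample
    log_p = -0.5 * ((b.detach() - mu) / torch.exp(log_sigma)) ** 2 - log_sigma  # log N(b; mu, sigma) + const
    weight = (f(b.detach()) - c_phi(b.detach())).detach()    # REINFORCE weight, treated as a constant
    surrogate = (weight * log_p + c_phi(b)).mean()           # gradient of this is the LAX estimator
    return torch.autograd.grad(surrogate, (mu, log_sigma))

g_mu, g_log_sigma = lax_grad(torch.tensor(0.0), torch.tensor(0.0), lambda b: (0.499 - b) ** 2)
```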

Optimizing the Control Variate:
$\frac{\partial}{\partial\phi}\,\text{Variance}(\hat{g}) = \mathbb{E}\!\left[\frac{\partial}{\partial\phi}\,\hat{g}^2\right]$
For any unbiased estimator, $\mathbb{E}[\hat{g}]$ does not depend on $\phi$, so the gradient of the variance reduces to the gradient of the second moment, and we can get Monte Carlo estimates of it. We use these estimates to optimize $c_\phi$.
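
A sketch of the surrogate update under this identity: a single-sample $\hat{g}$ is computed with `create_graph=True` and its square is differentiated with respect to $\phi$. The unit-variance Gaussian, network size, learning rate, and toy $f$ are all assumptions for illustration:

```python
import torch

c_phi = torch.nn.Sequential(torch.nn.Linear(1, 16), torch.nn.Tanh(), torch.nn.Linear(16, 1))
opt_phi = torch.optim.Adam(c_phi.parameters(), lr=1e-3)
mu = torch.tensor([[0.0]], requires_grad=True)
f = lambda b: (0.499 - b) ** 2

for step in range(100):
    eps = torch.randn(1, 1)
    b = mu + eps                                  # single reparameterized sample, unit variance
    log_p = -0.5 * (b.detach() - mu) ** 2         # log N(b; mu, 1) up to a constant
    weight = f(b.detach()) - c_phi(b.detach())    # depends on phi but not on mu
    surrogate = (weight * log_p + c_phi(b)).sum()
    # create_graph=True keeps g_hat differentiable with respect to phi
    g_hat = torch.autograd.grad(surrogate, mu, create_graph=True)[0]
    var_loss = (g_hat ** 2).sum()                 # its phi-gradient estimates d/dphi Var(g_hat)
    opt_phi.zero_grad()
    var_loss.backward()                           # only c_phi's parameters are stepped
    opt_phi.step()
    # In the full algorithm, mu would also be updated with g_hat.detach(); omitted here.
```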

Extension to discrete $p(b|\theta)$:
$\hat{g}_{\text{RELAX}} = [f(b) - c_\phi(\tilde{z})]\,\frac{\partial}{\partial\theta}\log p(b|\theta) + \frac{\partial}{\partial\theta}c_\phi(z) - \frac{\partial}{\partial\theta}c_\phi(\tilde{z})$
$b = H(z), \quad z \sim p(z|\theta), \quad \tilde{z} \sim p(z|b,\theta)$
When $b$ is discrete, we introduce a relaxed distribution $p(z|\theta)$ and a function $H(\cdot)$ such that $H(z) = b \sim p(b|\theta)$. We use the conditioning scheme introduced in REBAR (Tucker et al., 2017). Unbiased for all $c_\phi$.
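
A sketch of the RELAX estimator for a single Bernoulli, with $H(z) = \mathbb{1}[z > 0]$ and REBAR-style conditional re-sampling of $\tilde{z}$ given $b$. The conditional-noise formulas below follow the usual Bernoulli construction (an assumption of this sketch), and the surrogate and toy $f$ are placeholders:

```python
import torch

def relax_grad(theta_logit, c_phi, f):
    """g_RELAX = [f(b) - c_phi(z_tilde)] d/dtheta log p(b|theta) + d/dtheta c_phi(z) - d/dtheta c_phi(z_tilde)."""
    theta_logit = theta_logit.clone().requires_grad_(True)
    theta = torch.sigmoid(theta_logit)
    u, v = torch.rand(()), torch.rand(())
    z = theta_logit + torch.log(u) - torch.log(1 - u)   # relaxed sample z ~ p(z|theta)
    b = (z > 0).float()                                  # hard sample b = H(z)
    # Conditional sample z_tilde ~ p(z|b, theta): choose the uniform so that H(z_tilde) = b.
    u_cond = b * (1 - theta + v * theta) + (1 - b) * v * (1 - theta)
    z_tilde = theta_logit + torch.log(u_cond) - torch.log(1 - u_cond)
    log_p = b * torch.log(theta) + (1 - b) * torch.log(1 - theta)
    weight = (f(b) - c_phi(z_tilde)).detach()            # REINFORCE weight, treated as a constant
    surrogate = weight * log_p + c_phi(z) - c_phi(z_tilde)
    return torch.autograd.grad(surrogate, theta_logit)[0]

# Placeholder surrogate and the toy f from the simple example below.
g = relax_grad(torch.tensor(0.0), lambda z: torch.sigmoid(z), lambda b: (0.499 - b) ** 2)
```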

A Simple Example: $\mathbb{E}_{p(b|\theta)}[(t - b)^2]$. Used to validate REBAR (which used $t = 0.45$); we use $t = 0.499$. REBAR and REINFORCE fail because the noise outweighs the signal. Can RELAX improve?

RELAX outperforms the baselines: considerably reduced variance! RELAX learns a reasonable surrogate.

Analyzing the Surrogate: REBAR's fixed surrogate cannot produce consistent and correct gradients. RELAX learns to balance REINFORCE variance and reparameterization variance.

A More Interesting Application: a discrete VAE.
$\log p(x) \geq \mathcal{L}(\theta) = \mathbb{E}_{q(b|x)}[\log p(x|b) + \log p(b) - \log q(b|x)]$
The latent state is 200 Bernoulli variables, and discrete sampling makes the reparameterization estimator unusable. We use the surrogate $c_\phi(z) = f(\sigma(z)) + r_\phi(z)$.

Results

Reinforcement Learning: policy gradient methods are very popular today (A2C, A3C, ACKTR). They seek $\arg\max_\theta \mathbb{E}_{\pi(\tau|\theta)}[R(\tau)]$ by estimating $\frac{\partial}{\partial\theta}\mathbb{E}_{\pi(\tau|\theta)}[R(\tau)]$. $R$ is not known, so many popular estimators cannot be used.

Actor-Critic:
$\hat{g}_{\text{AC}} = \sum_{t=1}^{T} \frac{\partial \log \pi(a_t|s_t,\theta)}{\partial\theta}\left[\sum_{t'=t}^{T} r_{t'} - c_\phi(s_t)\right]$
Here $c_\phi$ is an estimate of the value function. This is exactly the REINFORCE estimator using an estimate of the value function as a control variate. Why not use the action in the control variate? Dependence on the action would add bias.
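
A minimal sketch of this estimator for one finite trajectory: the state-only baseline $c_\phi(s_t)$ is subtracted from the reward-to-go and multiplied by the score of the chosen action. The data containers here are placeholders standing in for a real policy and critic:

```python
import torch

def actor_critic_loss(log_probs, rewards, values):
    """Returns a loss whose negative gradient is g_AC.

    log_probs: list of log pi(a_t|s_t, theta) tensors connected to the policy graph.
    rewards:   list of scalar rewards r_t.
    values:    list of baseline values c_phi(s_t).
    """
    returns, running = [], 0.0
    for r in reversed(rewards):                    # reward-to-go: sum_{t'=t}^{T} r_{t'}
        running = r + running
        returns.append(running)
    returns.reverse()
    loss = 0.0
    for t in range(len(rewards)):
        advantage = returns[t] - float(values[t])  # state-dependent baseline keeps the estimator unbiased
        loss = loss - log_probs[t] * advantage
    return loss

# Dummy usage with fake data.
log_probs = [torch.tensor(-0.7, requires_grad=True) for _ in range(3)]
actor_critic_loss(log_probs, rewards=[1.0, 0.0, 1.0], values=[0.5, 0.4, 0.6]).backward()
```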

LAX for RL:
$\hat{g}_{\text{LAX}} = \sum_{t=1}^{T} \left( \frac{\partial \log \pi(a_t|s_t,\theta)}{\partial\theta}\left[\sum_{t'=t}^{T} r_{t'} - c_\phi(s_t,a_t)\right] + \frac{\partial}{\partial\theta}c_\phi(s_t,a_t) \right)$
Allows for action dependence in the control variate while remaining unbiased. A similar extension is available for discrete action spaces.

Results: improved performance and lower-variance gradient estimates.

Future Work
RL: incorporate other variance reduction techniques (GAE, reward bootstrapping, trust regions); find ways to train the surrogate off-policy.
Applications: inference of graph structure (submitted to NIPS 2018); inference of discrete neural network architecture components (ICLR 2018 Workshop); RELAX applied to text GANs (ReGAN); sequence models with discrete latent variables.