Branes with Brains. Reinforcement learning in the landscape of intersecting brane worlds. String_Data 2017, Boston 11/30/2017


1 Branes with Brains Reinforcement learning in the landscape of intersecting brane worlds FABIAN RUEHLE (UNIVERSITY OF OXFORD) String_Data 2017, Boston 11/30/2017 Based on [work in progress] with Brent Nelson and Jim Halverson

2 Motivation - ML Three approaches to machine learning: Supervised learning: train the machine by telling it what to do. Unsupervised learning: let the machine train without telling it what to do. Reinforcement learning [Sutton, Barto '98, '17]: based on behavioral psychology; don't tell the machine exactly what to do, but reward good and/or punish bad actions. AI = reinforcement learning + deep learning (neural networks) [Silver '16]

3 Motivation - RL Agents interact with an environment (e.g. the string landscape). Each interaction changes the state of the agent, e.g. the degrees of freedom parameterizing the string vacuum. Each step is either rewarded (the action led to a more realistic vacuum) or punished (the action led to a less realistic vacuum). The agent acts with the aim of maximizing its long-term reward, and repeats actions until it is told to stop (it found a realistic vacuum, or it gives up).

4 Outline String theory setup: intersecting D6-branes on orbifolds of toroidal orientifolds. Implementation in Reinforcement Learning (RL): basic overview, implementing the RL code, modelling the environment. Preliminary results: finding consistent solutions. Conclusion.

5 String Theory 101 Intersecting D6-branes on orbifolds of toroidal orientifolds

6 String Theory 101 Have: IIA string theory in 9D + time with 32 supercharges. Want: a theory in 3D + time with 4 supercharges. Idea: make the extra 6 dimensions so small that we do not see them. How do we do that? 1. Make them compact. 2. Make their diameter so small that our experiments cannot detect them. Reduce the supercharges from 32 to 4: identify some points with their mirror image.

7 String Theory Setup Why this setup? Well studied, comparatively simple [Ibanez, Uranga '12]. Number of (well-defined) solutions known to be finite [Blumenhagen, Gmeiner, Honecker, Lust, Weigand '04, '05; Douglas, Taylor '07, ...]. Use symmetries to relate different vacua [Douglas, Taylor '07]. Combine consistency conditions to rule out combinations. BUT: the number of possibilities is so large that not a single interesting solution could be found despite enormous random scans (odds estimated at 1:10^9). Seems Taylor-made for big data / AI methods.

8 String Theory Compactification How to make a dimension compact? ⇒ Pac-Man


12 String Theory Compactification [Figure: three two-tori with coordinates $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$] Now six compact dimensions, but the resulting space is too simple. Make it a bit more complicated (but just a little bit).

13 String Theory Orbifolds [Figure: $T^2$ and its quotient $T^2/\mathbb{Z}_2$] Mathematically: $(x_1, y_1) \to (-x_1, -y_1)$. The resulting object is called an orbifold. Need to also orientifold: $(x_1, y_1) \to (x_1, -y_1)$ (plus something similar for the string itself).

14 String Theory Winding numbers [Figure: torus cycles with winding numbers $(n, m) = (1, 0)$, $(n, m) = (0, 1)$, $(n, m) = (1, 2)$] Winding numbers $(n, m)$ count how often a brane wraps the two basic cycles of a torus. Note: due to the orientifold, include with each brane $(n, m)$ its image $(n, -m)$.

15 String Theory D6 branes [Figure: a D6-brane spanning our 3D and a line on each of the three tori $T^2 \times T^2 \times T^2$] D6-brane: our 3D + a line on each torus. Can stack multiple D6-branes on top of each other ⇒ brane stacks. Tuple: $(N, n_1, m_1, n_2, m_2, n_3, m_3)$.

16 String Theory Gauge group and particles Observed gauge group: $U(N)$ from $N$ D6-branes on top of each other. Special cases: $SO(2N)$ from $N$ D6-branes parallel to the O6-plane, $Sp(N)$ from $N$ D6-branes orthogonal to the O6-plane. Intersection of an $N$-brane stack and an $M$-brane stack: particles in the bifundamental representation $(N, \bar{M})$. Observed particles in the universe, in terms of $SU(3) \times SU(2) \times U(1)_Y$: $3\,(3,2)_{1} + 3\,(\bar{3},1)_{-4} + 3\,(\bar{3},1)_{2}$ (quarks) and $3\,(1,2)_{-3} + 1\,(1,2)_{3} + 3\,(1,1)_{6}$ (leptons + Higgs).

17 String Theory MSSM [Figure: brane stacks wrapping the three tori $T^2 \times T^2 \times T^2$] The green and yellow stacks intersect in $3 \cdot 1 \cdot 1 = 3$ points. Note: counting intersections on the orbifold is a bit more subtle.
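The $3 \cdot 1 \cdot 1$ counting is a product of per-torus intersection numbers, $I_{ab} = \prod_i (n^a_i m^b_i - m^a_i n^b_i)$ on factorized tori. A minimal sketch with made-up example stacks; the orbifold subtleties mentioned above are ignored:

```python
# Minimal sketch (not the talk's code): chiral intersection number of two
# brane stacks, I_ab = prod_i (n_i^a m_i^b - m_i^a n_i^b) on factorized tori.
# Orbifold subtleties in the counting are ignored here.

def intersection_number(stack_a, stack_b):
    """stack = (N, n1, m1, n2, m2, n3, m3); returns the product of the
    per-torus intersection numbers of the two wrapped cycles."""
    I = 1
    for i in range(3):
        n_a, m_a = stack_a[1 + 2 * i], stack_a[2 + 2 * i]
        n_b, m_b = stack_b[1 + 2 * i], stack_b[2 + 2 * i]
        I *= n_a * m_b - m_a * n_b
    return I

# Hypothetical stacks intersecting 3, 1, and 1 times on the three tori:
green = (3, 1, 0, 1, 0, 1, 0)
yellow = (2, 1, 3, 1, 1, 1, 1)
print(intersection_number(green, yellow))  # 3 * 1 * 1 = 3
```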

18 String Theory Consistency
Tadpole cancellation (balance the energy of the D6-branes against the O6-planes):
$$\sum_{a=1}^{\#\text{stacks}} N_a \begin{pmatrix} n^a_1 n^a_2 n^a_3 \\ n^a_1 m^a_2 m^a_3 \\ m^a_1 n^a_2 m^a_3 \\ m^a_1 m^a_2 n^a_3 \end{pmatrix} = \begin{pmatrix} 8 \\ -8 \\ -8 \\ -8 \end{pmatrix}$$
K-theory (global consistency):
$$\sum_{a=1}^{\#\text{stacks}} N_a \begin{pmatrix} m^a_1 m^a_2 m^a_3 \\ m^a_1 n^a_2 n^a_3 \\ n^a_1 m^a_2 n^a_3 \\ n^a_1 n^a_2 m^a_3 \end{pmatrix} \equiv \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} \pmod{2}$$
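As a code sketch under the conventions reconstructed above (the target charges differ between references), the tadpole check for a list of stacks:

```python
# Sketch: the four tadpole charges summed over all stacks, with stacks
# given as (N, n1, m1, n2, m2, n3, m3) as on slide 15.

def tadpole_sums(stacks):
    totals = [0, 0, 0, 0]
    for (N, n1, m1, n2, m2, n3, m3) in stacks:
        totals[0] += N * n1 * n2 * n3
        totals[1] += N * n1 * m2 * m3
        totals[2] += N * m1 * n2 * m3
        totals[3] += N * m1 * m2 * n3
    return totals

def tadpoles_cancelled(stacks, target=(8, -8, -8, -8)):
    # 'target' encodes the O6-plane charge; sign conventions vary by reference.
    # K-theory additionally requires the four m/n-mirrored sums to be even.
    return tadpole_sums(stacks) == list(target)
```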

19 String Theory Consistency
SUSY (computational control): for torus moduli $j, k, \ell > 0$ and all $a = 1, \ldots, \#\text{stacks}$:
$$m^a_1 m^a_2 m^a_3 - j\, m^a_1 n^a_2 n^a_3 - k\, n^a_1 m^a_2 n^a_3 - \ell\, n^a_1 n^a_2 m^a_3 = 0$$
$$n^a_1 n^a_2 n^a_3 - j\, n^a_1 m^a_2 m^a_3 - k\, m^a_1 n^a_2 m^a_3 - \ell\, m^a_1 m^a_2 n^a_3 > 0$$
Pheno: $SU(3) \times SU(2) \times U(1)$ + particles. $T = (T_1, T_2, \ldots, T_k)$, $k = \# U(N)$ stacks, defines a massless $U(1)$ iff:
$$\begin{pmatrix} 2N_1 m^1_1 & 2N_2 m^2_1 & \cdots & 2N_k m^k_1 \\ 2N_1 m^1_2 & 2N_2 m^2_2 & \cdots & 2N_k m^k_2 \\ 2N_1 m^1_3 & 2N_2 m^2_3 & \cdots & 2N_k m^k_3 \end{pmatrix} \begin{pmatrix} T_1 \\ T_2 \\ \vdots \\ T_k \end{pmatrix} = 0$$
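As code, the per-stack SUSY conditions above could be checked like this (a sketch; the moduli values passed in are hypothetical):

```python
# Sketch: check the reconstructed SUSY conditions for one stack, given
# (hypothetical) torus moduli j, k, l > 0.

def susy_ok(stack, j, k, l, tol=1e-9):
    _, n1, m1, n2, m2, n3, m3 = stack
    equality = m1*m2*m3 - j*m1*n2*n3 - k*n1*m2*n3 - l*n1*n2*m3
    positivity = n1*n2*n3 - j*n1*m2*m3 - k*m1*n2*m3 - l*m1*m2*n3
    return abs(equality) < tol and positivity > 0

# A stack wrapping (1,0) on each torus has all m_i = 0, so the equality
# holds trivially and the positivity condition is 1 > 0 for any moduli:
print(susy_ok((4, 1, 0, 1, 0, 1, 0), j=1.0, k=2.0, l=0.5))  # True
```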

20 String Theory IIA state space The state space is gigantic. Choose a maximal value $w_{max}$ for the winding numbers. Let $N_B$ be the number of possible winding number combinations (up to $w_{max}$) after symmetry reduction, and let $N_S$ be the maximal number of stacks. This allows for $N_B^{N_S}$ combinations. Note: each stack can additionally have $N = 1, 2, 3, \ldots$ branes.
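For concreteness, a toy count of these sizes. It assumes, as one possible symmetry reduction, the identification $(n, m) \sim (-n, -m)$ on each torus; the values of $w_{max}$, $N_S$, $N_{max}$ are made-up examples:

```python
# Toy count of the state-space size sketched above; the identification
# (n, m) ~ (-n, -m) is an assumed example of "symmetry reduction".
from itertools import product

def winding_pairs(w_max):
    reps = set()
    for n, m in product(range(-w_max, w_max + 1), repeat=2):
        reps.add(max((n, m), (-n, -m)))   # one representative per orbit
    return reps

w_max, N_S, N_max = 2, 4, 9
N_B = len(winding_pairs(w_max)) ** 3      # winding combinations for 3 tori
print(N_B, (N_max * N_B) ** N_S)          # stacks also carry N = 1..N_max
```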

21 Reinforcement learning

22 Reinforcement learning - Overview At time $t$, the agent is in state $s_t \in S_{total}$. Select action $a_t$ from the action space $A$ based on the policy $\pi: S_{total} \to A$. Receive reward $r_t \in \mathbb{R}$ for action $a_t$ based on the reward function $R: S_{total} \times A \to \mathbb{R}$. Transition to the next state $s_{t+1}$. Try to maximize the long-term return $G_t = \sum_{k=1}^{\infty} \gamma^k r_{t+k}$, $\gamma \in (0, 1]$. Keep track of the state value $v(s)$ ("how good is the state"). Compute the advantage estimate Adv ("how much better than expected has the action turned out to be").
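A minimal numerical sketch of the discounted return and an advantage estimate of the form $G - v(s)$; the value estimate below is a stand-in for the critic's output:

```python
# Minimal sketch: discounted return over the rest of an episode and an
# advantage estimate Adv = G - v(s), with a stand-in value estimate.

def discounted_return(rewards, gamma=0.99):
    """G = r_1 + gamma*r_2 + gamma^2*r_3 + ... for the remaining rewards."""
    G = 0.0
    for r in reversed(rewards):
        G = r + gamma * G
    return G

future_rewards = [-1.0, -1.0, -1.0, 10.0]   # toy trajectory
v_estimate = 4.0                            # stand-in critic output for s_t
G = discounted_return(future_rewards)
print(G, G - v_estimate)                    # return and advantage
```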

23 Reinforcement Learning - Overview How to maximize the future return? Depends on the policy; several approaches. Tabular (small state/action spaces) [Sutton, Barto '98]: temporal difference learning: SARSA (⇒ my breakout group on Friday), Q-learning. Deep RL (large/infinite state/action spaces): Deep Q-Network [Mnih et al '15], Asynchronous advantage actor-critic (A3C) [Mnih et al '16]. Variations/extensions: Wolpertinger [Dulac-Arnold et al '16], Rainbow [Hessel et al '17].

24 Reinforcement Learning - A3C [Diagram: a global instance with its policy/value network and input; workers 1, 2, ..., n, each with their own policy/value network, input, and environment]

25 Reinforcement Learning - A3C Asynchronous: have n workers explore the environment simultaneously and asynchronously; this improves training stability (the workers' experiences are kept separate) and improves exploration. Advantage: use the advantage to update the policy. Actor-critic: to maximize the return one needs to know state or action values and to optimize the policy. Methods like Q-learning focus on the value function; methods like policy gradient focus on the policy. AC: use the value estimate (the "critic") to update the policy (the "actor").

26 Reinforcement Learning - Implementation OpenAI Gym: interface between the agent (RL) and the environment (string landscape) [Brockman et al '16]. We provide the environment; we use ChainerRL's implementation of A3C for the agent.
Environment side: action space; observation (state) space; step: go to a new state and return (new_state, reward, done, comment); reset: reset the episode and return start_state.
ChainerRL side: make the environment; specify the RL method (A3C, DQN, ...); specify the policy NN architecture (FF, LSTM, ...).
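A sketch of what such an environment could look like with the classic Gym API; the class name, state encoding, and reward details are illustrative stand-ins, not the authors' code:

```python
# Illustrative skeleton only: a Gym-style environment exposing step/reset
# as described above, with one stack's data as the state and +/-1 actions.
import gym
import numpy as np
from gym import spaces

class ToyBraneEnv(gym.Env):
    def __init__(self, target=(8, -8, -8, -8)):
        self.target = target
        self.dim = 7                                  # (N, n1, m1, ..., n3, m3)
        self.action_space = spaces.Discrete(2 * self.dim)
        self.observation_space = spaces.Box(-10, 10, (self.dim,), np.float32)

    def reset(self):
        self.state = np.zeros(self.dim, dtype=np.float32)   # empty stack
        return self.state.copy()

    def _violation(self):
        N, n1, m1, n2, m2, n3, m3 = self.state
        sums = (N*n1*n2*n3, N*n1*m2*m3, N*m1*n2*m3, N*m1*m2*n3)
        return sum(abs(s - t) for s, t in zip(sums, self.target))

    def step(self, action):
        idx, sign = action % self.dim, (1 if action < self.dim else -1)
        self.state[idx] = np.clip(self.state[idx] + sign, -10, 10)
        v = self._violation()
        reward = 1e6 if v == 0 else -v                # cf. the rewards on slide 29
        done = (v == 0)
        return self.state.copy(), reward, done, {}    # (new_state, reward, done, comment)
```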

27 Reinforcement learning - Model the environment State space: $s_t = [(N_1, n^1_1, m^1_1, n^1_2, m^1_2, n^1_3, m^1_3), (N_2, n^2_1, \ldots), \ldots]$, $s_t \in S_{total}$ with $|S_{total}| = (N_{max}\, N_B)^{N_S}$. Action space: $A = \{N_a \to N_a \pm 1,\ \text{add stack}\ (N, n_1, \ldots),\ \text{remove stack}\ (N, n_1, \ldots)\}$. Reward $R$: need a notion of how good a state is. 1. By how much does a set of stacks violate the tadpole conditions? 2. Is a set of stacks fully consistent (tadpole, K-theory, SUSY)? (Note: the latter two are binary, so it is hard to define a distance.) 3. How far is the state from the Standard Model? Missing a group factor of $SU(3) \times SU(2) \times U(1)$? Too few Standard Model particles ($Q, u, d, L, H_u, H_d, e$)? Extra exotics (particles charged under the Standard Model but not observed so far)? Note: this only works if good states are close by in this sense (see the reward sketch below).
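A sketch of the reward shaping just described; the magnitudes mirror the values quoted on slides 29-32, and whether the consistency bonuses stack additively is an assumption made here:

```python
# Hedged sketch of reward shaping: a per-step punishment proportional to
# the tadpole violation, plus large bonuses for consistency conditions.

def shaped_reward(tadpole_sums, target=(8, -8, -8, -8),
                  k_theory_ok=False, susy_ok=False):
    violation = sum(abs(s - t) for s, t in zip(tadpole_sums, target))
    if violation > 0:
        return -violation              # punishment: -sum_i |Tadpole_i(s)|
    reward = 1e7                       # tadpole cancellation bonus (slide 32)
    if k_theory_ok:
        reward += 1e8                  # K-theory bonus (assumed additive)
    if susy_ok:
        reward += 1e9                  # SUSY bonus (assumed additive)
    return reward
```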

28 Preliminary results Parameters: 16 or 32 workers (1 CPU thread each, 2.6 GHz); training time of the order of 10h. Neural networks for value and policy evaluation: a feed-forward NN with 2 hidden softmax layers with 200 nodes each, and an RNN with a linear layer (200 nodes) and an LSTM layer (128 nodes). Initial state: empty stack. Maximal steps per episode: 10,000 to 250,000; evaluation runs every 100,000 steps.

29 Preliminary results - Tadpole cancellation [Plots: mean scores vs. # steps; log(average # steps to solution) vs. log(# steps)] Tadpole cancellation with a maximum of 10,000 steps per episode. Reward for tadpole cancellation: $10^6$. Punishment per step: $-\sum_i |\text{Tadpole}_i(s)|$.

30 Preliminary results - Tadpole cancellation [Plots: mean scores vs. # steps; log(average # steps to solution) vs. log(# steps)] Tadpole cancellation with a maximum of 50,000 steps per episode. Reward for tadpole cancellation: $10^6$. Punishment per step: $-\sum_i |\text{Tadpole}_i(s)|$.

31 Preliminary results - Tadpole cancellation [Plots: mean scores vs. # steps; log(average # steps to solution) vs. log(# steps)] Tadpole cancellation with a maximum of 250,000 steps per episode. Reward for tadpole cancellation: $10^6$. Punishment per step: $-\sum_i |\text{Tadpole}_i(s)|$.

32 Preliminary results - Tadpole+K-Theory+SUSY [Plot: mean scores vs. # steps] Full consistency (TC + K-theory + SUSY) with a maximum of 10,000 steps per episode. Rewards: (TC, K-Th, SUSY) = ($10^7$, $10^8$, $10^9$). Punishment per step: $-\sum_i |\text{Tadpole}_i(s)|$.

33 Conclusion Reinforcement learning is a very promising approach to AI + ML. A3C performs very well across different environments (mostly tested on Atari games). The type II orientifold setup is well-suited for a landscape analysis: the physics is well understood, but the number of configurations is too large to approach with conventional methods ⇒ no Standard Model found so far. Preliminary results: A3C works well for consistency (tadpole, K-theory, SUSY); getting close to the SM.

34 Thank you for your attention!
