Branes with Brains. Reinforcement learning in the landscape of intersecting brane worlds. String_Data 2017, Boston 11/30/2017
1 Branes with Brains Reinforcement learning in the landscape of intersecting brane worlds FABIAN RUEHLE (UNIVERSITY OF OXFORD) String_Data 2017, Boston 11/30/2017 Based on [work in progress] with Brent Nelson and Jim Halverson
2 Motivation - ML Three approaches to machine learning: Supervised Learning: train the machine by telling it what to do. Unsupervised Learning: let the machine train without telling it what to do. Reinforcement Learning [Sutton, Barto '98, '17]: based on behavioral psychology; don't tell the machine exactly what to do, but reward good and/or punish bad actions. AI = reinforcement learning + deep learning (neural networks) [Silver '16]
3 Motivation - RL Agents interact with an environment (e.g. the string landscape). Each interaction changes the state of the agent, e.g. the degrees of freedom parameterizing the string vacuum. Each step is either rewarded (action led to a more realistic vacuum) or punished (action led to a less realistic vacuum). The agent acts with the aim of maximizing its long-term reward. The agent repeats actions until it is told to stop (found a realistic vacuum or gave up).
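The agent-environment loop described above can be sketched generically. The toy environment and all names here are illustrative placeholders, not the authors' code:

```python
def run_episode(step_fn, start_state, policy, max_steps=100):
    """Generic agent-environment loop: act, observe reward, repeat
    until told to stop. step_fn and policy model the environment and
    the agent; both are placeholders here."""
    state, total_reward = start_state, 0.0
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done = step_fn(state, action)
        total_reward += reward
        if done:
            break
    return state, total_reward

# Toy environment: the "vacuum" is an integer, the goal is to reach 0.
def toy_step(state, action):
    next_state = state + action                   # action in {-1, +1}
    reward = 1.0 if next_state == 0 else -0.1     # reward good, punish bad steps
    return next_state, reward, next_state == 0

final_state, ret = run_episode(toy_step, 3, lambda s: -1 if s > 0 else 1)
```

The agent never sees the environment's internals, only states and rewards — the same contract used for the string-landscape environment later in the talk.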
4 Outline String Theory setup: Intersecting D6-branes on orbifolds of toroidal orientifolds Implementation in Reinforcement Learning (RL) Basic overview Implementing the RL code Modelling the environment Preliminary results Finding consistent solutions Conclusion
5 String Theory 101 Intersecting D6-branes on orbifolds of toroidal orientifolds
6 String Theory 101 Have: IIA string theory in 9D + time with 32 supercharges. Want: a theory in 3D + time with 4 supercharges. Idea: make the extra 6 dimensions so small that we do not see them. How do we do that? 1. Make them compact. 2. Make their diameter so small that our experiments cannot detect them. Reduce supercharges from 32 to 4: identify some points with their mirror image.
7 String Theory Setup Why this setup? Well studied. Comparatively simple. Number of (well-defined) solutions known to be finite [Blumenhagen, Gmeiner, Honecker, Lüst, Weigand '04 '05; Douglas, Taylor '07; ...; Ibáñez, Uranga '12]. Use symmetries to relate different vacua [Douglas, Taylor '07]. Combine consistency conditions to rule out combinations. BUT: the number of possibilities is so large that not a single interesting solution could be found despite enormous random scans (estimated odds 1 : 10^9). Seems Taylor-made for big data / AI methods.
8 String Theory Compactification How to make a dimension compact? ⇒ Pac-Man: identify opposite edges of the screen.
12 String Theory Compactification [Figure: three tori with coordinates (x_i, y_i), i = 1, 2, 3] Now six compact dimensions, but the resulting space is too simple. Make it a bit more complicated (but just a little bit).
13 String Theory Orbifolds [Figure: T^2 → T^2/Z_2 in the (x_1, y_1) plane] Mathematically: (x_1, y_1) → (−x_1, −y_1). The resulting object is called an orbifold. Need to also orientifold: (x_1, y_1) → (x_1, −y_1) (plus something similar for the string itself).
14 String Theory Winding numbers [Figure: cycles with (n, m) = (1, 0), (n, m) = (0, 1), (n, m) = (1, 2)] Winding numbers (n, m): how often a brane wraps the two cycles of a torus. Note: due to the orientifold, with (n, m) also include the image (n, −m).
15 String Theory D6-branes [Figure: our 3D plus a line on each of the three tori] D6-brane: our 3D + a line on each torus. Can stack multiple D6-branes on top of each other. Brane stacks, tuple: (N, n_1, m_1, n_2, m_2, n_3, m_3).
16 String Theory Gauge group and particles Observed gauge group: U(N): N D6-branes on top of each other. Special cases: SO(2N): N D6-branes parallel to an O6-plane; Sp(N): N D6-branes orthogonal to an O6-plane. Intersection of an N-brane stack and an M-brane stack: particles in the bifundamental representation (N, M̄). Observed particles in the universe under SU(3) × SU(2) × U(1)_Y: 3 (3, 2)_{1/6} + 3 (3̄, 1)_{−2/3} + 3 (3̄, 1)_{1/3} (quarks) + 3 (1, 2)_{−1/2} + 1 (1, 2)_{1/2} + 3 (1, 1)_1 (leptons + Higgs).
17 String Theory MSSM [Figure: MSSM brane stacks on the three tori] The green and yellow stacks intersect in 3 · 1 · 1 = 3 points. Note: counting intersections on the orbifold is a bit more subtle.
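The intersection count on the covering torus is just a product of 2×2 determinants of winding numbers, one per torus. A minimal sketch (function name and example branes are illustrative; on the orbifold the counting is more subtle, as the slide notes):

```python
def intersection_number(brane_a, brane_b):
    """Number of intersection points of two D6-branes on T^2 x T^2 x T^2:
    the product over the three tori of n_a * m_b - m_a * n_b. Each brane
    is given as three (n_i, m_i) winding pairs. (This is the plain torus
    count; the orbifold count is more subtle.)
    """
    total = 1
    for (na, ma), (nb, mb) in zip(brane_a, brane_b):
        total *= na * mb - ma * nb
    return total

# (1, 0) against (0, 1) intersects once on each torus:
I_once = intersection_number([(1, 0), (1, 0), (1, 0)],
                             [(0, 1), (0, 1), (0, 1)])
```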
18 String Theory Consistency Tadpole cancellation (balance the energy of the D6-branes against the O6-planes), summing over a = 1, ..., #stacks:
Σ_a N_a n_a^1 n_a^2 n_a^3 = 8,  Σ_a N_a n_a^1 m_a^2 m_a^3 = −8,  Σ_a N_a m_a^1 n_a^2 m_a^3 = −8,  Σ_a N_a m_a^1 m_a^2 n_a^3 = −8
K-theory (global consistency):
Σ_a N_a m_a^1 m_a^2 m_a^3 ≡ 0,  Σ_a N_a m_a^1 n_a^2 n_a^3 ≡ 0,  Σ_a N_a n_a^1 m_a^2 n_a^3 ≡ 0,  Σ_a N_a n_a^1 n_a^2 m_a^3 ≡ 0  (mod 2)
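The tadpole sums are easy to evaluate for a stack list, and their distance to the target is a natural punishment signal for the RL agent later on. A sketch — the function names are illustrative, and the signs and target vector depend on orientifold conventions ((8, 0, 0, 0) below is only a placeholder):

```python
def tadpole_vector(stacks):
    """The four tadpole charge combinations summed over a list of brane
    stacks, each given as (N, n1, m1, n2, m2, n3, m3). Overall signs and
    the target values depend on orientifold conventions."""
    t = [0, 0, 0, 0]
    for N, n1, m1, n2, m2, n3, m3 in stacks:
        t[0] += N * n1 * n2 * n3
        t[1] += N * n1 * m2 * m3
        t[2] += N * m1 * n2 * m3
        t[3] += N * m1 * m2 * n3
    return t

def tadpole_violation(stacks, target):
    """L1 distance to the tadpole target -- a 'how badly is the tadpole
    violated' measure, usable as a punishment. The target must be
    supplied by the caller."""
    return sum(abs(t - c) for t, c in zip(tadpole_vector(stacks), target))

stacks = [(8, 1, 0, 1, 0, 1, 0)]    # 8 branes wrapping (1, 0) on each torus
violation = tadpole_violation(stacks, (8, 0, 0, 0))   # placeholder target
```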
19 String Theory Consistency SUSY (computational control): there are positive moduli parameters j, k, ℓ such that for all a = 1, ..., #stacks
m_a^1 m_a^2 m_a^3 − j m_a^1 n_a^2 n_a^3 − k n_a^1 m_a^2 n_a^3 − ℓ n_a^1 n_a^2 m_a^3 = 0
n_a^1 n_a^2 n_a^3 − j n_a^1 m_a^2 m_a^3 − k m_a^1 n_a^2 m_a^3 − ℓ m_a^1 m_a^2 n_a^3 > 0
Pheno: SU(3) × SU(2) × U(1) + particles. A linear combination T = (T_1, T_2, ..., T_k), k = # U(N) stacks, of the stack U(1)'s gives a massless U(1)_Y iff
Σ_{b=1}^k 2 N_b m_b^i T_b = 0 for i = 1, 2, 3.
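For fixed moduli parameters the SUSY conditions above reduce to one equality and one inequality per stack. A sketch (function name illustrative; `l` stands for ℓ, and the identification of j, k, ℓ with torus moduli is convention-dependent):

```python
def is_susy(stack, j, k, l):
    """Check the per-stack SUSY conditions for positive moduli
    parameters j, k, l. Each stack is (N, n1, m1, n2, m2, n3, m3)."""
    _, n1, m1, n2, m2, n3, m3 = stack
    eq = m1 * m2 * m3 - j * m1 * n2 * n3 - k * n1 * m2 * n3 - l * n1 * n2 * m3
    ineq = n1 * n2 * n3 - j * n1 * m2 * m3 - k * m1 * n2 * m3 - l * m1 * m2 * n3
    return eq == 0 and ineq > 0

# A stack wrapping (1, 0) on every torus satisfies the conditions
# for any choice of moduli:
susy = is_susy((1, 1, 0, 1, 0, 1, 0), j=1, k=1, l=1)
```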
20 String Theory IIA state space The state space is gigantic. Choose a maximal value w_max for the winding numbers. Let N_B be the number of possible winding-number combinations (up to w_max) after symmetry reduction, and let N_S be the maximal number of stacks. This allows for N_B^{N_S} combinations. Note: each stack can have N = 1, 2, 3, ... branes.
21 Reinforcement learning
22 Reinforcement learning - Overview At time t, the agent is in state s_t ∈ S_total. Select action a_t from action space A based on policy π : S_total → A. Receive reward r_t ∈ ℝ for action a_t based on reward function R : S_total × A → ℝ. Transition to the next state s_{t+1}. Try to maximize the long-term return G_t = Σ_{k=1}^∞ γ^k r_{t+k}, with discount factor γ ∈ (0, 1]. Keep track of the state value v(s) ("how good is the state"). Compute the advantage estimate ("how much better than expected has the action turned out to be").
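The return and a one-step advantage estimate can be written out directly (function names are illustrative; the talk may use a multi-step advantage):

```python
def discounted_return(rewards, gamma):
    """G_t = sum_{k=1}^inf gamma^k r_{t+k} for a finite reward
    sequence, with rewards[0] playing the role of r_{t+1}."""
    return sum(gamma ** (k + 1) * r for k, r in enumerate(rewards))

def advantage(reward, gamma, v_s, v_s_next):
    """One-step advantage: how much better the action turned out than
    the value estimate v(s) expected (a common TD(0) form)."""
    return reward + gamma * v_s_next - v_s
```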
23 Reinforcement Learning - Overview How to maximize future return? Depends on the policy. Several approaches. Tabular (small state/action spaces) [Sutton, Barto '98]: temporal difference learning, SARSA (⇒ my breakout group on Friday), Q-learning. Deep RL (large/infinite state/action spaces): Deep Q-Network [Mnih et al '15], asynchronous advantage actor-critic (A3C) [Mnih et al '16]. Variations/extensions: Wolpertinger [Dulac-Arnold et al '16], Rainbow [Hessel et al '17].
24 Reinforcement Learning - A3C [Diagram: a global instance (policy/value network + input) above n workers; Worker 1, Worker 2, ..., Worker n each hold their own policy/value network, input, and environment]
25 Reinforcement Learning - A3C Asynchronous: have n workers explore the environment simultaneously and asynchronously; this improves training stability (the experiences of the workers are separated) and improves exploration. Advantage: use the advantage to update the policy. Actor-critic: to maximize the return one needs to know state or action values and optimize the policy. Methods like Q-learning focus on the value function; methods like policy gradient focus on the policy. AC: use the value estimate (the "critic") to update the policy (the "actor").
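The actor-critic idea can be condensed into the two loss terms of a single update. A sketch only (real A3C adds an entropy bonus, n-step returns, and asynchronous workers sharing one network; the function name is illustrative):

```python
import math

def a2c_losses(log_prob, value_pred, ret):
    """Loss terms of an advantage actor-critic update for one
    transition: the critic regresses its value estimate toward the
    observed return, the actor reinforces actions with positive
    advantage."""
    adv = ret - value_pred            # advantage: return vs. critic's estimate
    policy_loss = -log_prob * adv     # actor: make good actions more likely
    value_loss = 0.5 * adv ** 2       # critic: move value estimate toward return
    return policy_loss, value_loss

# Example: the chosen action had probability 0.5 and the episode
# returned more than the critic predicted.
policy_loss, value_loss = a2c_losses(math.log(0.5), value_pred=1.0, ret=2.0)
```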
26 Reinforcement Learning - Implementation OpenAI Gym: interface between agent (RL) and environment (string landscape) [Brockman et al '16]. We provide the environment; we use ChainerRL's implementation of A3C for the agent. Environment side: action space; observation (state) space; step: go to a new state, return (new_state, reward, done, comment); reset: reset the episode, return start_state. ChainerRL side: make the environment, specify the RL method (A3C, DQN, ...), specify the policy NN architecture (FF, LSTM, ...).
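A minimal sketch of the environment side of this Gym-style interface. The class name, the encoding of actions as functions on the state, and the placeholder reward are illustrative assumptions, not the authors' actual environment:

```python
class BraneEnv:
    """Gym-style environment skeleton for the brane landscape."""

    def __init__(self):
        self.state = []      # list of brane stacks (N, n1, m1, n2, m2, n3, m3)

    def reset(self):
        """Reset the episode; start from the empty stack configuration."""
        self.state = []
        return self.state

    def step(self, action):
        """Go to a new state; return (new_state, reward, done, comment)."""
        self.state = action(self.state)
        reward = self._reward(self.state)
        done = reward > 0    # e.g. stop once a consistent model is found
        return self.state, reward, done, {}

    def _reward(self, state):
        # Placeholder reward: punish every non-empty, not-yet-consistent
        # configuration; a real reward would measure tadpole violation etc.
        return -1.0 if state else 0.0

env = BraneEnv()
start = env.reset()
state, reward, done, info = env.step(lambda s: s + [(1, 1, 0, 1, 0, 1, 0)])
```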
27 Reinforcement learning - Model the environment State space: s_t = [(N_1, n_1^1, m_1^1, n_2^1, m_2^1, n_3^1, m_3^1), (N_2, n_1^2, ...), ...] ∈ S_total, with |S_total| = (N_max · N_B)^{N_S}. Action space: A = {N_a → N_a ± 1, add a stack (N, n_1, ...), remove a stack (N, n_1, ...)}. Reward R: need a notion of how good a state is. 1. By how much does a set of stacks violate the tadpole conditions? 2. Is a set of stacks fully consistent (tadpole, K-theory, SUSY)? (Note: the latter two are binary, hard to define a distance.) 3. How far is the state from the Standard Model? Missing a group factor of SU(3) × SU(2) × U(1)? Too few Standard Model particles (Q, u, d, L, H_u, H_d, e)? Extra exotics (particles charged under the Standard Model but not observed so far)? Note: this only works if good states are close by in this sense.
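The action space above is easy to enumerate for a given state. A sketch, assuming actions are encoded as functions state → new_state (the function names and this encoding are illustrative):

```python
def actions(state, winding_choices):
    """Enumerate the action space A described above: N_a -> N_a +/- 1,
    add a stack, remove a stack. winding_choices is a caller-supplied
    list of (n1, m1, n2, m2, n3, m3) tuples."""
    acts = []
    for i, (N, *_) in enumerate(state):
        # increase the number of branes on stack i
        acts.append(lambda s, i=i: [(n + 1, *w) if j == i else (n, *w)
                                    for j, (n, *w) in enumerate(s)])
        if N > 1:
            # decrease the number of branes on stack i
            acts.append(lambda s, i=i: [(n - 1, *w) if j == i else (n, *w)
                                        for j, (n, *w) in enumerate(s)])
        # remove stack i entirely
        acts.append(lambda s, i=i: s[:i] + s[i + 1:])
    for w in winding_choices:
        # add a fresh single-brane stack with the given winding numbers
        acts.append(lambda s, w=w: s + [(1, *w)])
    return acts

state = [(2, 1, 0, 1, 0, 1, 0)]                 # one stack of two branes
acts = actions(state, [(1, 0, 1, 0, 1, 0)])     # one allowed winding choice
```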
28 Preliminary results Parameters: 16 or 32 workers (1 CPU thread each, 2.6 GHz). Training time of the order of 10h. Neural networks for value and policy evaluation: feed-forward NN with 2 hidden softmax layers with 200 nodes; RNN with a linear layer (200 nodes) and an LSTM layer (128 nodes). Initial state: empty stack. Maximal steps per episode: 10,000; evaluation runs every 100,000 steps.
29 Preliminary results - Tadpole cancellation [Plots: mean scores vs. # steps; log(average # steps to solution) vs. log(# steps)] Maximum of 10,000 steps in an episode. Reward for tadpole cancellation: 10^6. Punishment for each step: Σ_i Tadpole_i(s) (the tadpole violation).
30 Preliminary results - Tadpole cancellation [Plots: mean scores vs. # steps; log(average # steps to solution) vs. log(# steps)] Maximum of 50,000 steps in an episode. Reward for tadpole cancellation: 10^6. Punishment for each step: Σ_i Tadpole_i(s).
31 Preliminary results - Tadpole cancellation [Plots: mean scores vs. # steps; log(average # steps to solution) vs. log(# steps)] Maximum of 250,000 steps in an episode. Reward for tadpole cancellation: 10^6. Punishment for each step: Σ_i Tadpole_i(s).
32 Preliminary results - Tadpole+K-Theory+SUSY [Plot: mean scores vs. # steps] Full consistency (TC + K-Th + SUSY). Maximum of 10,000 steps in an episode. Rewards: (TC, K-Th, SUSY) = (10^7, 10^8, 10^9). Punishment for each step: Σ_i Tadpole_i(s).
33 Conclusion Reinforcement learning is a very promising approach to AI + ML. A3C performs very well for different environments (mostly tested on Atari games). The type IIA orientifold setup is well-suited for a landscape analysis: the physics is well understood, but the number of configurations is too large to approach with conventional methods ⇒ no Standard Model found so far. Preliminary results: A3C works well for consistency (tadpole, K-theory, SUSY); getting close to the SM.
34 Thank you for your attention!
More informationReinforcement Learning and Deep Reinforcement Learning
Reinforcement Learning and Deep Reinforcement Learning Ashis Kumer Biswas, Ph.D. ashis.biswas@ucdenver.edu Deep Learning November 5, 2018 1 / 64 Outlines 1 Principles of Reinforcement Learning 2 The Q
More informationCS885 Reinforcement Learning Lecture 7a: May 23, 2018
CS885 Reinforcement Learning Lecture 7a: May 23, 2018 Policy Gradient Methods [SutBar] Sec. 13.1-13.3, 13.7 [SigBuf] Sec. 5.1-5.2, [RusNor] Sec. 21.5 CS885 Spring 2018 Pascal Poupart 1 Outline Stochastic
More informationLearning Deep Architectures for AI. Part I - Vijay Chakilam
Learning Deep Architectures for AI - Yoshua Bengio Part I - Vijay Chakilam Chapter 0: Preliminaries Neural Network Models The basic idea behind the neural network approach is to model the response as a
More informationMachine learning, incomputably large data sets, and the string landscape
Machine learning, incomputably large data sets, and the string landscape 2017 Workshop on Data Science and String Theory Northeastern University December 1, 2017 Washington (Wati) Taylor, MIT Based in
More informationPattern Recognition Prof. P. S. Sastry Department of Electronics and Communication Engineering Indian Institute of Science, Bangalore
Pattern Recognition Prof. P. S. Sastry Department of Electronics and Communication Engineering Indian Institute of Science, Bangalore Lecture - 27 Multilayer Feedforward Neural networks with Sigmoidal
More informationNatural Language Processing. Slides from Andreas Vlachos, Chris Manning, Mihai Surdeanu
Natural Language Processing Slides from Andreas Vlachos, Chris Manning, Mihai Surdeanu Projects Project descriptions due today! Last class Sequence to sequence models Attention Pointer networks Today Weak
More informationReinforcement Learning. Introduction
Reinforcement Learning Introduction Reinforcement Learning Agent interacts and learns from a stochastic environment Science of sequential decision making Many faces of reinforcement learning Optimal control
More informationBasics of reinforcement learning
Basics of reinforcement learning Lucian Buşoniu TMLSS, 20 July 2018 Main idea of reinforcement learning (RL) Learn a sequential decision policy to optimize the cumulative performance of an unknown system
More informationarxiv: v1 [hep-th] 6 Mar 2014
LMU-ASC 05/14 TUM-HEP 932/14 Geography of Fields in Extra Dimensions: String Theory Lessons for Particle Physics Hans Peter Nilles a and Patrick K.S. Vaudrevange b arxiv:1403.1597v1 [hep-th] 6 Mar 2014
More informationCS 188: Artificial Intelligence Spring Announcements
CS 188: Artificial Intelligence Spring 2011 Lecture 12: Probability 3/2/2011 Pieter Abbeel UC Berkeley Many slides adapted from Dan Klein. 1 Announcements P3 due on Monday (3/7) at 4:59pm W3 going out
More informationReinforcement Learning
Reinforcement Learning Temporal Difference Learning Temporal difference learning, TD prediction, Q-learning, elibigility traces. (many slides from Marc Toussaint) Vien Ngo Marc Toussaint University of
More informationStringy Instantons, Backreaction and Dimers.
Stringy Instantons, Backreaction and Dimers. Eduardo García-Valdecasas Tenreiro Instituto de Física Teórica UAM/CSIC, Madrid Based on 1605.08092 and 1704.05888 by E.G. & A. Uranga and ongoing work String
More informationSterile Neutrinos from the Top Down
Sterile Neutrinos from the Top Down Active-sterile mixing The landscape Small Dirac/Majorana masses The mini-seesaw Light Sterile Neutrinos: A White Paper (K. Abazajian et al), 1204.5379 Neutrino Masses
More informationA derivation of the Standard Model. Based on Nucl.Phys. B883 (2014) with B. Gato Rivera
A derivation of the Standard Model Based on Nucl.Phys. B883 (2014) 529-580 with B. Gato Rivera High Energy Weighted Ensemble The Standard Model Anthropic Features Low Energy High Energy Unique Theory The
More information