Robotics Part II: From Learning Model-based Control to Model-free Reinforcement Learning
|
|
- Sabrina Boone
- 5 years ago
- Views:
Transcription
1 Robotics Part II: From Learning Model-based Control to Model-free Reinforcement Learning Stefan Schaal Max-Planck-Institute for Intelligent Systems Tübingen, Germany & Computer Science, Neuroscience, & Biomedical Engineering University of Southern California, Los Angeles
2 Where Did We Stop...
3 Outline A Bit of Robotics History Foundations of Control Adaptive Control Learning Control - Model-based Robot Learning - Reinforcement Learning
4 What Needs to Be Learned in Learning Control? Internal Models Value Functions Control Policies Coordinate Transformations The Majority of the Learning Problems Involve Function Approximation Unsupervised Learning & Classification
5 Learning Internal Models Forward Models - models the causal functional relationship - for example: Inverse Models - models the inverse of the causal functional relationship - for example: y = f ( x) ( ( )!q G( q) )!!q = B 1 ( q) τ C q,!q x = f 1 ( y) B( q)!!q + C( q,!q )!q + G( q) = τ - NOTE: inverse models are not necessarily functions any more!
6 Inverse Models May Not Be Trivially Learnable
7 Inverse Models May Not Be Trivially Learnable t = f ( θ 1 1 1,θ ) 2 t = f ( θ 2 2 1,θ ) 2 what is f 1 ( t)?
8 Characteristics of Function Approximation in Robotics Incremental Learning large amounts of data continual learning to be approximated functions of growing and unknown complexity Fast Learning data efficient computationally efficient real-time Robust Learning minimal interference hundreds of inputs
9 Linear Regression: One of the Simplest Function Approximation Methods Recall the simple adaptive control model with: f ( x) = θx - find the line through all data points - imagine a spring attached between the line and each data point - all springs have the same spring constant - points far away generate more force (danger of outliers) - springs are vertical - solution is the minimum energy solution achieved by the springs y x
10 Linear Regression: One of the Simplest The data generating model: The Least Squares cost function Minimizing the cost gives the least-square solution Function Approximation Methods y =!w T!x + w 0 + ε = w T x + ε where x = x T T!w,1,w = w 0 J = 1 2 t y where : t = J w = 0 = J w, E{ ε} = 0 ( ) T ( t y) = 1 ( 2 t Xw ) T ( t Xw) t 1 t 2 t n = t T X + Xw, X = x 1 T x 2 T x n T 1 ( 2 t Xw ) T ( t Xw) ( ) T X = t T X + w T X T X thus : t T X = w T X T X or X T t = X T Xw result : w = ( X T X) 1 X T t ( ) T X = t Xw
11 Recursive Least Squares: An Incremental Version of Linear Regression Based on the matrix inversion theorem: ( A BC) 1 = A 1 + A 1 B( I + CA 1 B) 1 CA 1 Incremental updating of a linear regression model Initialize: P n = I 1 γ For every new data point ( x,t) (note that x includes the bias): P n+1 = 1 λ Pn Pn xx T P n λ + x T P n x W n+1 = W n + P n+1 x( t W nt x) T where γ << 1 (note P ( X T X) 1 where λ = 1 if no forgetting < 1 if forgetting - NOTE: RLS gives exactly the same solution as linear regression if no forgetting
12 Making Linear Regression Nonlinear: Locally Weighted Regression Region of Validity 2θ k Receptive Field Activation w 1 Linear Model 0 N i=1 ( ) 2 J = w i y i x i T β Note: Using GPs, SVR, Mixture Models, etc., are other ways to nonlinear regression
13 Locally Weighted Regression Piecewise linear function approximation, Each local model is learned from only local data No over-fitting due to too many local models (unlike RBFs, ME)
14 Locally Weighted Regression Linear Model: y = β x T x + β 0 = β T x where learned with x = [ x T 1] T Recursive weighted least squares: β n+1 k = β n k + wp n+1 k x( y!x T n β ) T k P n+1 k = 1 λ P n k P n k!x!x T n P k λ w +!xt P k n!x Weighting Kernel: w = exp 1 2 ( x c )T D( x c) Combined Prediction: y = K i=1 K w i i=1 w i y k where learned with D = M T M Gradient descent in penalized leave-one-out local cross-validation (PRESS) cost function: M n+1 k = M n k α J M N n J = N w k,i y i ŷ k,i, i + γ D k,ij i=1 w k,i i=1 ( ) < w gen i=1, j =1 add model when if min w k k createnew RF at c K +1 = x
15 Locally Weighted Regression ( ( ( ))) z = max exp( 10x 2 ),exp( 50y 2 ),1.25exp 5 x 2 + y x x
16 Locally Weighted Regression Inserted into Adaptive Control
17 Locally Weighted Regression Learn forward model of task dynamics, then computer controller
18 Locally Weighted Regression Learn forward model of task dynamics, then computer controller
19 Criticism of Locally Weighted Learning Breaks down in high-dimensional spaces Computationally expensive and numerically brittle due to (incremental) dxd matrix inversion Not compatible with modern probabilistic statistical learning algorithms Too many manual tuning parameters
20 The Curse of Dimensionality The power of local learning comes from exploiting the discriminative power of local neighborhood relations. But the notion of a local breaks down in high dim. spaces!
21 The Curse of Dimensionality Movement Data is Locally Low Dimensional 0.25 / / 0.2 Probability 0.15 Thus, locally weighted learning can work if used with local dimensionality reduction! Dimensionality / / 105 Derived with Bayesian Factor Analysis
22 A Bayesian Approach to Locally Weighted Learning Linear Regression as a Graphical Model y i = x i T β + ε ( ) ε N 0,ψ y β = ( X T X) 1 Xy
23 A Bayesian Approach to Locally Weighted Learning Inserting a Partial-Least-Squares-like projection as a set of hidden variables z i,m = x i,m β j + η m y i = d z i,m m=1 ε N 0,ψ y + ε ( ) η N ( 0,ψ ) m z,m
24 A Bayesian Approach to Locally Weighted Learning Robust linear regression with automatic relevance detection (ARD, sparsification) z i,m = x i,m β j + η m y i = d z i,m m=1 ε N 0,ψ y + ε ( ) η m N ( 0,ψ ) z,m β m N 0, 1 α m α m Gamma( a α,b α,)
25 A Full Bayesian Treatment of Locally Weighted Learning The final model for full Bayesian parameter adaptation for regression and locality ψ y y i i = 1,.., N ψ z1 ψ z2 ψ zd z i1 b 1 z i2 z id b d b 2 x i1 w i1 x i2 w x i2 id w id h 1 h 2 h d
26 Locally Weighted Learning In High Dimensional Spaces Learning the cross function in 20-dimensional space z TextEnd 0.5 z TextEnd y x y x
27 Locally Weighted Learning In High Dimensional Spaces Learning the cross function in 20-dimensional space nmse on Test Set D-cross 10D-cross 20D-cross #Training Data Points #Receptive Fields / Average #Projections
28 Locally Weighted Learning In High Dimensional Spaces Learning inverse kinematics in 60 dimensional space
29 Locally Weighted Learning In High Dimensional Spaces Skill learning
30 Outline A Bit of Robotics History Foundations of Control Adaptive Control Learning Control - Model-based Robot Learning - Reinforcement Learning
31 Given: A Parameterized Policy and a Controller Note: we are now starting to address planning, i.e,. where do desired trajectories come from?
32 Trial & Error Learning Reinforcement Learning from Trajectories Problem: How can a motor system learn a novel motor skill? Reinforcement learning is a general approach to this problem, but little work has been done to scale to the highdimensional continuous stateaction domains of humans Approach: Teach with imitation learning the initial skill using a parameterized control policy Provide an objective function for the skill Perform trial-and-error learning from exploratory trajectories
33 Reinforcement Learning Terminology Policies perceived state to action mapping (can be probabilistic) Reward functions maps the perceived stateaction pair into a a single number, an immediate reward (stochastic) Value functions maps the state into the accumulated expected reward that would be received if starting in the state Models predicts the next state given the current state and action (can be probabilistic) l Policy: what to do l Reward: what is good l Value: what is good because it predicts reward l Model: what follows what Objective: Optimize Reward!
34 Value Functions The value of a state is the expected return starting from that state; depends on the agent s policy: State - value function for policy π : { } = E π γ k r t + k +1 x t = x V π (x) = E π R t x t = x The value of taking an action in a state under policy π is the expected return starting from that state, taking that action, and thereafter following π : Action - value function for policy π : Q π (x,u) = E π R t x t = x,u t = u k =0 k =0 { } = E π γ k r t + k +1 x t = x,u t = u
35 Bellman Equation for a Policy π The basic idea: So: V π (x) = E { π R t x t = x} = E { π r t +1 + γ V ( x ) t +1 x t = x}
36 Bellman Optimality Equation for V* The value of a state under an optimal policy must equal the expected return for the best action from that state: V *( x) = max Q π ( x,u) u A(x) = max u A(x) E{ r + γ V *( x x = x,u = u) } t +1 t +1 t t V is the unique solution of this system of equations.
37 Bellman Optimality Equation for Q* The value of a state/action under an optimal policy must equal the expected return for this action from that state, and then following the optimal policy: { } Q (x,u) = E r + γ maxq (x,u') x = x,u = u t+1 t+1 t t u' Q is the unique solution of this system of equations.
38 Example: Learning a Pendulum Swing-Up Note: Both policy and value function are rather complex landscapes with discontinuities! Policy Value Function
39 Some More Exciting Examples
40 State-Based vs. Trajectory-based Reinforcement Learning From about , value function-based (i.e., state-based) reinforcement learning has been dominant (textbook Sutton&Barto) Pros: - well-understood theory - convergence proofs for discrete state-action systems - a useful set of algorithms to work with (model-based and model-free) - ideally a globally optimal solution Cons: - problematic in continuous state-action spaces (max-operator in continuous spaces) - curse of dimensionality in high-dimensional systems - hard to combine with function approximation - greed (= agressive) updating Trajectory-based reinforcement learning Pros: - can work in high dimensional continuous state-action spaces - does not suffer from the curse of dimensionality Cons: - Locally optimal solutions - classical methods learn very slowly
41 Trajectory-based Reinforement Learning with Parameterized Policies ( ) = π x t u t!x d t ( ) = π x d t ( ( ),t,α ) or ( ( ),t,α ) Example: Dynamic Systems Policies, initalized by imitation τ!!y = α z ( β z ( g y)!y ) + τ!x = α x x k i=1 k i=1 w i b i x w i
42 Trajectory-based Reinforcement Learning Define a cost function along the trajectory: T J = E τ i=0 And a parameterized control policy (e.g., a movement primitive) τ!y = f ( y, goal,b) r i Optimize J with respect to parameters b, e.g., by gradient descent b n+1 = b n + α J b
43 Example: Learning with Natural Gradients Goal: Hit ball to fly far Note: about trials are needed.
44 Reinforcement Learning from Trajectories State-of-the-art of Reinforcement Learning from Trajectories: - Given the cost per trajectory τ : - The motor primitives with parameters b: RL with Natural Gradients J = E τ T τ!y = f ( y,goal,b) b new = b old + α J NAC b Probabilistic RL with Reward-Weighted Regression b new Trajectory-based Q-learning (fitted Q-iteration) - an actor-critic based method based on an action-value function over trajectories RL with path-integrals (a probabilistic, model-based/model-free approach derived from stochastic optimal control) i=0 r i R τ b τ / R τ T T
45 Reinforcement Learning Based on Path Integrals Pre-requisites System Dynamics (Control-Affine): ( ) = F x,u,t!x = f ( x,t) + G( x) u( t) + ε( t) ( ) Cost Function: r t = q(x t ) u T t Ru t T J xt = E xt q T + r t ' dt ' t '=t Note: this is a more structured approach to RL Goal: find commands u that minimize this cost
46 Reinforcement Learning Based on Path Integrals Sketch of the Path-Integral Derivation Stochastic HJB Equations: t V ( x t,t) = min u t:t m r t + x V ( x t,t) T F( x,u,t ) { ( )} Tr Ω ( x,u,t ) 2 V x,t x t min u t:t m 1 2 u T t Ru t + q t + x V x t,t ( ) T f x,t ( ) + x V x t,t ( ) T G x ( )u t u T t R + x V ( x t,t) T G( x t ) = 0 u t = R 1 G( x t ) T x V ( x t,t) ( ) Tr G x { ( )Σ G( x)t 2 x V ( x t,t)} = 0
47 Reinforcement Learning Based on Path Integrals Sketch of the Path-Integral Derivation t V ( x t,t) = min u t:t m r t + x V ( x t,t) T F( x,u,t ) u t = R 1 G( x ) T t x V ( x t,t) ( )!x = f ( x,t) + G( x) u( t) + ε( t) { ( )} Tr Ω ( x,u,t ) 2 V x,t x t t V ( x t,t) = 1 2 V x x (,t t ) T G( x)r 1 G( x) T x V ( x t,t) + q t + x V ( x t,t) T f x,t ( ) Tr G x { ( )Σ G( x)t 2 x V ( x t,t)}
48 Reinforcement Learning Based on Path Integrals Sketch of the Path-Integral Derivation t V ( x t,t) = 1 2 V x x (,t t ) T G( x)r 1 G( x) T x V ( x t,t) + q t + x V ( x t,t) T f x,t ( ) Tr G x { ( )Σ G( x)t 2 x V ( x t,t)} Simplification: λr 1 = Σ Log-Transformation Trick: V ( x t,t) = λ logψ ( x t,t) t ψ ( x t,t) = 1 λ ψ ( x t,t)q t x ψ ( x t,t) T f x,t ( ) 1 2 Tr G x { ( )Σ G( x)t 2 xψ ( x t,t)} Chapman Kolmogorov PDE: 2nd Order and Linear
49 Reinforcement Learning Based on Path Integrals Sketch of the Path-Integral Derivation t ψ ( x t,t) = 1 λ ψ ( x,t t )q t x ψ ( x t,t) T f x,t ( ) 1 2 Tr G x { ( )Σ G( x)t 2 xψ ( x t,t)} Application of Feynman-Kac Theorem: A numerical method to solve certain PDEs ( ) = E τ ψ x T,T ψ x t,t ( )exp t '=T t '=t 1 λ q t ' dt '
50 Reinforcement Learning Based on Path Integrals Sketch of the Path-Integral Derivation ( ) = E τ ψ x T,T ψ x t,t u t = R 1 G x t ( )exp t '=T t '=t ( ) T x V ( x t,t) 1 λ q t ' dt ' A bit of algebra... { u t = E τ w τ R 1 G( x t ) T ( G( x t )R 1 G ( x ) T ) 1 t G( } x t )ε t Optimal Control Law
51 Path Integral RL Applied to Parameterized Policies (Motor Primitives) Note that a version of motor primitives can be written as control affine stochastic differential equations!x = f ( x) + g T ( θ + ε) - ε is interpreted as intentionally injected exploration noise - the parameters θ are the control vector - f(x) is the spring-damper of the primitives - g(x) are the basis functions of the function approximator It is also necessary to create a iterative version of path integral optimal control - the original path integral optimal control framework explores only based on the passive dynamics, i.e., u=0
52 PI 2 Reinforcement Learning For parameterized policies like dynamic motor primitives, a beautifully simple algorithm results: 1) Create K trajectories of the motor primitive for a given task with noise. 2) We can write the cost to go from every time step t of the trajectory as: R t = q T + T i=t r i 3) The probability of a trajectory becomes k P( ξ t ) = exp 1 λ R t K j =1 k exp 1 λ R t j 4) Update the parameter θ of the motor primitive as Δθ t = K k =1 k P( ξ t ) R 1 g k (x t )g k (x t ) T g k (x t ) T R 1 g k (x t ) 5) Final parameter update θ new = θ old + Δθ t ε k t Note that there a NO open tuning parameters except for the exploration noise
53 PI 2 Reinforcement Learning The Intuition of Path Integral Reinforcement Learning - Generate multiple trials i with some variation, e.g., due to noise or exploration - For every time t, compute i the cost R t for every trial: R t i T = q T + q(x t ) u t t T Ru t dτ t - Convert the cost into a positive weight w t i = exp( i λr ) t - Update the motor command at every time step to be the reward weighted average of all experienced commands in the trial w i i t u t u t new = i i w t i Surprisingly, this intuition turns out to be the optimal solution
54 PI 2 Reinforcement Learning: Some Remarks PI 2 can be model-based to model-free Rigid Body Dynamics:!!q = M q Control Law: u = u ff + K p ( ) 1 u C( q,!q )!q + G( q) ( ) ( q d q) + K D (!q d!q ) Motor Primitives:!!q i d = α ( z β ( z g i q i ) d!q i ) d + ψ T θ PI 2 can optimize trajectory plans, controllers, or both PI 2 has only one open parameter, i.e., the level of exploration noise PI 2 allows a rather simple derivation of inverse reinforcement learning
55 Example: Results on 2D Reaching Through a Via Point Via-Point Cost y [m] Number of Roll-Outs Initial x [m] PI2 REINFORCE PG NAC
56 Example: Results on 20D Reaching Through a Via Point Via-Point Cost y [m] Number of Roll-Outs x [m]
57 Example: Results on 50D Reaching Through a Via Point Via-Point Cost y [m] Number of Roll-Outs x [m]
58 Example: Dog Jump 600 This is a 12 DOF motor system, using 50 basis functions per primitive. Learning converges after about trial! Performance improved by 15cm (0.5 body lengths) Cost Number of Roll-Outs
59 Reinforcement Learning in Manipulation
60 Learning Locomotion over Rough Terrain
61 Outline A Bit of Robotics History Foundations of Control Adaptive Control Learning Control - Model-based Robot Learning - Reinforcement Learning What Comes Next?
62 Towards Truly Autonomous Robots Big Robots Very Little Robots
Path Integral Stochastic Optimal Control for Reinforcement Learning
Preprint August 3, 204 The st Multidisciplinary Conference on Reinforcement Learning and Decision Making RLDM203 Path Integral Stochastic Optimal Control for Reinforcement Learning Farbod Farshidian Institute
More informationOnline Learning in High Dimensions. LWPR and it s application
Lecture 9 LWPR Online Learning in High Dimensions Contents: LWPR and it s application Sethu Vijayakumar, Aaron D'Souza and Stefan Schaal, Incremental Online Learning in High Dimensions, Neural Computation,
More informationOptimal Control with Learned Forward Models
Optimal Control with Learned Forward Models Pieter Abbeel UC Berkeley Jan Peters TU Darmstadt 1 Where we are? Reinforcement Learning Data = {(x i, u i, x i+1, r i )}} x u xx r u xx V (x) π (u x) Now V
More informationELEC-E8119 Robotics: Manipulation, Decision Making and Learning Policy gradient approaches. Ville Kyrki
ELEC-E8119 Robotics: Manipulation, Decision Making and Learning Policy gradient approaches Ville Kyrki 9.10.2017 Today Direct policy learning via policy gradient. Learning goals Understand basis and limitations
More informationCITEC SummerSchool 2013 Learning From Physics to Knowledge Selected Learning Methods
CITEC SummerSchool 23 Learning From Physics to Knowledge Selected Learning Methods Robert Haschke Neuroinformatics Group CITEC, Bielefeld University September, th 23 Robert Haschke (CITEC) Learning From
More informationLearning Policy Improvements with Path Integrals
Evangelos Theodorou etheodor@usc.edu Jonas Buchli jonas@buchli.org Stefan Schaal sschaal@usc.edu Computer Science, Neuroscience, & Biomedical Engineering University of Southern California, 370 S.McClintock
More informationTutorial on Policy Gradient Methods. Jan Peters
Tutorial on Policy Gradient Methods Jan Peters Outline 1. Reinforcement Learning 2. Finite Difference vs Likelihood-Ratio Policy Gradients 3. Likelihood-Ratio Policy Gradients 4. Conclusion General Setup
More informationCS599 Lecture 1 Introduction To RL
CS599 Lecture 1 Introduction To RL Reinforcement Learning Introduction Learning from rewards Policies Value Functions Rewards Models of the Environment Exploitation vs. Exploration Dynamic Programming
More informationREINFORCE Framework for Stochastic Policy Optimization and its use in Deep Learning
REINFORCE Framework for Stochastic Policy Optimization and its use in Deep Learning Ronen Tamari The Hebrew University of Jerusalem Advanced Seminar in Deep Learning (#67679) February 28, 2016 Ronen Tamari
More informationLearning Manipulation Patterns
Introduction Dynamic Movement Primitives Reinforcement Learning Neuroinformatics Group February 29, 2012 Introduction Dynamic Movement Primitives Reinforcement Learning Motivation shift to reactive motion
More informationLecture 6: CS395T Numerical Optimization for Graphics and AI Line Search Applications
Lecture 6: CS395T Numerical Optimization for Graphics and AI Line Search Applications Qixing Huang The University of Texas at Austin huangqx@cs.utexas.edu 1 Disclaimer This note is adapted from Section
More informationIntroduction to Reinforcement Learning. CMPT 882 Mar. 18
Introduction to Reinforcement Learning CMPT 882 Mar. 18 Outline for the week Basic ideas in RL Value functions and value iteration Policy evaluation and policy improvement Model-free RL Monte-Carlo and
More informationBasics of reinforcement learning
Basics of reinforcement learning Lucian Buşoniu TMLSS, 20 July 2018 Main idea of reinforcement learning (RL) Learn a sequential decision policy to optimize the cumulative performance of an unknown system
More informationCS545 Contents XVI. Adaptive Control. Reading Assignment for Next Class. u Model Reference Adaptive Control. u Self-Tuning Regulators
CS545 Contents XVI Adaptive Control u Model Reference Adaptive Control u Self-Tuning Regulators u Linear Regression u Recursive Least Squares u Gradient Descent u Feedback-Error Learning Reading Assignment
More informationCS545 Contents XVI. l Adaptive Control. l Reading Assignment for Next Class
CS545 Contents XVI Adaptive Control Model Reference Adaptive Control Self-Tuning Regulators Linear Regression Recursive Least Squares Gradient Descent Feedback-Error Learning Reading Assignment for Next
More informationOptimal Control. McGill COMP 765 Oct 3 rd, 2017
Optimal Control McGill COMP 765 Oct 3 rd, 2017 Classical Control Quiz Question 1: Can a PID controller be used to balance an inverted pendulum: A) That starts upright? B) That must be swung-up (perhaps
More informationModel-Based Reinforcement Learning with Continuous States and Actions
Marc P. Deisenroth, Carl E. Rasmussen, and Jan Peters: Model-Based Reinforcement Learning with Continuous States and Actions in Proceedings of the 16th European Symposium on Artificial Neural Networks
More informationINF 5860 Machine learning for image classification. Lecture 14: Reinforcement learning May 9, 2018
Machine learning for image classification Lecture 14: Reinforcement learning May 9, 2018 Page 3 Outline Motivation Introduction to reinforcement learning (RL) Value function based methods (Q-learning)
More informationFully Bayesian Deep Gaussian Processes for Uncertainty Quantification
Fully Bayesian Deep Gaussian Processes for Uncertainty Quantification N. Zabaras 1 S. Atkinson 1 Center for Informatics and Computational Science Department of Aerospace and Mechanical Engineering University
More informationNONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function
More informationREINFORCEMENT learning (RL) is a machine learning
762 IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 24, NO. 5, MAY 213 Online Learning Control Using Adaptive Critic Designs With Sparse Kernel Machines Xin Xu, Senior Member, IEEE, Zhongsheng
More informationProbabilistic Model-based Imitation Learning
Probabilistic Model-based Imitation Learning Peter Englert, Alexandros Paraschos, Jan Peters,2, and Marc Peter Deisenroth Department of Computer Science, Technische Universität Darmstadt, Germany. 2 Max
More informationProbabilistic Model-based Imitation Learning
Probabilistic Model-based Imitation Learning Peter Englert, Alexandros Paraschos, Jan Peters,2, and Marc Peter Deisenroth Department of Computer Science, Technische Universität Darmstadt, Germany. 2 Max
More informationReinforcement Learning of Motor Skills in High Dimensions: A Path Integral Approach
200 IEEE International Conference on Robotics and Automation Anchorage Convention District May 3-8, 200, Anchorage, Alaska, USA Reinforcement Learning of Motor Skills in High Dimensions: A Path Integral
More informationA Residual Gradient Fuzzy Reinforcement Learning Algorithm for Differential Games
International Journal of Fuzzy Systems manuscript (will be inserted by the editor) A Residual Gradient Fuzzy Reinforcement Learning Algorithm for Differential Games Mostafa D Awheda Howard M Schwartz Received:
More informationReinforcement Learning
Reinforcement Learning Ron Parr CompSci 7 Department of Computer Science Duke University With thanks to Kris Hauser for some content RL Highlights Everybody likes to learn from experience Use ML techniques
More informationPolicy Search for Path Integral Control
Policy Search for Path Integral Control Vicenç Gómez 1,2, Hilbert J Kappen 2, Jan Peters 3,4, and Gerhard Neumann 3 1 Universitat Pompeu Fabra, Barcelona Department of Information and Communication Technologies,
More informationReinforcement Learning of Motor Skills in High Dimensions: A Path Integral Approach
Reinforcement Learning of Motor Skills in High Dimensions: A Path Integral Approach Evangelos Theodorou, Jonas Buchli, and Stefan Schaal Abstract Reinforcement learning RL is one of the most general approaches
More informationPolicy Gradient Reinforcement Learning for Robotics
Policy Gradient Reinforcement Learning for Robotics Michael C. Koval mkoval@cs.rutgers.edu Michael L. Littman mlittman@cs.rutgers.edu May 9, 211 1 Introduction Learning in an environment with a continuous
More informationA Tour of Reinforcement Learning The View from Continuous Control. Benjamin Recht University of California, Berkeley
A Tour of Reinforcement Learning The View from Continuous Control Benjamin Recht University of California, Berkeley trustable, scalable, predictable Control Theory! Reinforcement Learning is the study
More informationVariable Impedance Control A Reinforcement Learning Approach
Variable Impedance Control A Reinforcement Learning Approach Jonas Buchli, Evangelos Theodorou, Freek Stulp, Stefan Schaal Abstract One of the hallmarks of the performance, versatility, and robustness
More informationLearning to Control an Octopus Arm with Gaussian Process Temporal Difference Methods
Learning to Control an Octopus Arm with Gaussian Process Temporal Difference Methods Yaakov Engel Joint work with Peter Szabo and Dmitry Volkinshtein (ex. Technion) Why use GPs in RL? A Bayesian approach
More informationNeural Network Training
Neural Network Training Sargur Srihari Topics in Network Training 0. Neural network parameters Probabilistic problem formulation Specifying the activation and error functions for Regression Binary classification
More informationInverse Optimality Design for Biological Movement Systems
Inverse Optimality Design for Biological Movement Systems Weiwei Li Emanuel Todorov Dan Liu Nordson Asymtek Carlsbad CA 921 USA e-mail: wwli@ieee.org. University of Washington Seattle WA 98195 USA Google
More informationReinforcement Learning. Donglin Zeng, Department of Biostatistics, University of North Carolina
Reinforcement Learning Introduction Introduction Unsupervised learning has no outcome (no feedback). Supervised learning has outcome so we know what to predict. Reinforcement learning is in between it
More informationSequence labeling. Taking collective a set of interrelated instances x 1,, x T and jointly labeling them
HMM, MEMM and CRF 40-957 Special opics in Artificial Intelligence: Probabilistic Graphical Models Sharif University of echnology Soleymani Spring 2014 Sequence labeling aking collective a set of interrelated
More informationCS545 Contents XIII. Trajectory Planning. Reading Assignment for Next Class
CS545 Contents XIII Trajectory Planning Control Policies Desired Trajectories Optimization Methods Dynamical Systems Reading Assignment for Next Class See http://www-clmc.usc.edu/~cs545 Learning Policies
More informationToday s Outline. Recap: MDPs. Bellman Equations. Q-Value Iteration. Bellman Backup 5/7/2012. CSE 473: Artificial Intelligence Reinforcement Learning
CSE 473: Artificial Intelligence Reinforcement Learning Dan Weld Today s Outline Reinforcement Learning Q-value iteration Q-learning Exploration / exploitation Linear function approximation Many slides
More informationMachine Learning I Continuous Reinforcement Learning
Machine Learning I Continuous Reinforcement Learning Thomas Rückstieß Technische Universität München January 7/8, 2010 RL Problem Statement (reminder) state s t+1 ENVIRONMENT reward r t+1 new step r t
More informationCOMP3702/7702 Artificial Intelligence Lecture 11: Introduction to Machine Learning and Reinforcement Learning. Hanna Kurniawati
COMP3702/7702 Artificial Intelligence Lecture 11: Introduction to Machine Learning and Reinforcement Learning Hanna Kurniawati Today } What is machine learning? } Where is it used? } Types of machine learning
More informationGaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012
Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature
More informationLinear Models for Regression. Sargur Srihari
Linear Models for Regression Sargur srihari@cedar.buffalo.edu 1 Topics in Linear Regression What is regression? Polynomial Curve Fitting with Scalar input Linear Basis Function Models Maximum Likelihood
More informationBalancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm
Balancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm Michail G. Lagoudakis Department of Computer Science Duke University Durham, NC 2778 mgl@cs.duke.edu
More informationAutonomous Helicopter Flight via Reinforcement Learning
Autonomous Helicopter Flight via Reinforcement Learning Authors: Andrew Y. Ng, H. Jin Kim, Michael I. Jordan, Shankar Sastry Presenters: Shiv Ballianda, Jerrolyn Hebert, Shuiwang Ji, Kenley Malveaux, Huy
More informationMarks. bonus points. } Assignment 1: Should be out this weekend. } Mid-term: Before the last lecture. } Mid-term deferred exam:
Marks } Assignment 1: Should be out this weekend } All are marked, I m trying to tally them and perhaps add bonus points } Mid-term: Before the last lecture } Mid-term deferred exam: } This Saturday, 9am-10.30am,
More informationCSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18
CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$
More informationToday s s Lecture. Applicability of Neural Networks. Back-propagation. Review of Neural Networks. Lecture 20: Learning -4. Markov-Decision Processes
Today s s Lecture Lecture 20: Learning -4 Review of Neural Networks Markov-Decision Processes Victor Lesser CMPSCI 683 Fall 2004 Reinforcement learning 2 Back-propagation Applicability of Neural Networks
More informationCS Machine Learning Qualifying Exam
CS Machine Learning Qualifying Exam Georgia Institute of Technology March 30, 2017 The exam is divided into four areas: Core, Statistical Methods and Models, Learning Theory, and Decision Processes. There
More informationGame Physics. Game and Media Technology Master Program - Utrecht University. Dr. Nicolas Pronost
Game and Media Technology Master Program - Utrecht University Dr. Nicolas Pronost Rigid body physics Particle system Most simple instance of a physics system Each object (body) is a particle Each particle
More informationSupport Vector Machines and Bayes Regression
Statistical Techniques in Robotics (16-831, F11) Lecture #14 (Monday ctober 31th) Support Vector Machines and Bayes Regression Lecturer: Drew Bagnell Scribe: Carl Doersch 1 1 Linear SVMs We begin by considering
More informationDeep Reinforcement Learning SISL. Jeremy Morton (jmorton2) November 7, Stanford Intelligent Systems Laboratory
Deep Reinforcement Learning Jeremy Morton (jmorton2) November 7, 2016 SISL Stanford Intelligent Systems Laboratory Overview 2 1 Motivation 2 Neural Networks 3 Deep Reinforcement Learning 4 Deep Learning
More informationLinear Classification. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington
Linear Classification CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Example of Linear Classification Red points: patterns belonging
More informationChapter 3: The Reinforcement Learning Problem
Chapter 3: The Reinforcement Learning Problem Objectives of this chapter: describe the RL problem we will be studying for the remainder of the course present idealized form of the RL problem for which
More informationDeep Feedforward Networks
Deep Feedforward Networks Yongjin Park 1 Goal of Feedforward Networks Deep Feedforward Networks are also called as Feedforward neural networks or Multilayer Perceptrons Their Goal: approximate some function
More informationRobotics. Control Theory. Marc Toussaint U Stuttgart
Robotics Control Theory Topics in control theory, optimal control, HJB equation, infinite horizon case, Linear-Quadratic optimal control, Riccati equations (differential, algebraic, discrete-time), controllability,
More informationTrajectory encoding methods in robotics
JOŽEF STEFAN INTERNATIONAL POSTGRADUATE SCHOOL Humanoid and Service Robotics Trajectory encoding methods in robotics Andrej Gams Jožef Stefan Institute Ljubljana, Slovenia andrej.gams@ijs.si 12. 04. 2018
More informationUsing Gaussian Processes for Variance Reduction in Policy Gradient Algorithms *
Proceedings of the 8 th International Conference on Applied Informatics Eger, Hungary, January 27 30, 2010. Vol. 1. pp. 87 94. Using Gaussian Processes for Variance Reduction in Policy Gradient Algorithms
More informationThe Art of Sequential Optimization via Simulations
The Art of Sequential Optimization via Simulations Stochastic Systems and Learning Laboratory EE, CS* & ISE* Departments Viterbi School of Engineering University of Southern California (Based on joint
More informationThese slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop
Music and Machine Learning (IFT68 Winter 8) Prof. Douglas Eck, Université de Montréal These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop
More informationTopic 3: Neural Networks
CS 4850/6850: Introduction to Machine Learning Fall 2018 Topic 3: Neural Networks Instructor: Daniel L. Pimentel-Alarcón c Copyright 2018 3.1 Introduction Neural networks are arguably the main reason why
More informationMinimax Differential Dynamic Programming: An Application to Robust Biped Walking
Minimax Differential Dynamic Programming: An Application to Robust Biped Walking Jun Morimoto Human Information Science Labs, Department 3, ATR International Keihanna Science City, Kyoto, JAPAN, 619-0288
More informationBayesian Networks Inference with Probabilistic Graphical Models
4190.408 2016-Spring Bayesian Networks Inference with Probabilistic Graphical Models Byoung-Tak Zhang intelligence Lab Seoul National University 4190.408 Artificial (2016-Spring) 1 Machine Learning? Learning
More informationLecture 3: Policy Evaluation Without Knowing How the World Works / Model Free Policy Evaluation
Lecture 3: Policy Evaluation Without Knowing How the World Works / Model Free Policy Evaluation CS234: RL Emma Brunskill Winter 2018 Material builds on structure from David SIlver s Lecture 4: Model-Free
More informationLast updated: Oct 22, 2012 LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
Last updated: Oct 22, 2012 LINEAR CLASSIFIERS Problems 2 Please do Problem 8.3 in the textbook. We will discuss this in class. Classification: Problem Statement 3 In regression, we are modeling the relationship
More informationReal Time Stochastic Control and Decision Making: From theory to algorithms and applications
Real Time Stochastic Control and Decision Making: From theory to algorithms and applications Evangelos A. Theodorou Autonomous Control and Decision Systems Lab Challenges in control Uncertainty Stochastic
More informationReinforcement Learning. Machine Learning, Fall 2010
Reinforcement Learning Machine Learning, Fall 2010 1 Administrativia This week: finish RL, most likely start graphical models LA2: due on Thursday LA3: comes out on Thursday TA Office hours: Today 1:30-2:30
More informationState Space Abstractions for Reinforcement Learning
State Space Abstractions for Reinforcement Learning Rowan McAllister and Thang Bui MLG RCC 6 November 24 / 24 Outline Introduction Markov Decision Process Reinforcement Learning State Abstraction 2 Abstraction
More informationDeep Reinforcement Learning: Policy Gradients and Q-Learning
Deep Reinforcement Learning: Policy Gradients and Q-Learning John Schulman Bay Area Deep Learning School September 24, 2016 Introduction and Overview Aim of This Talk What is deep RL, and should I use
More informationMathematical Formulation of Our Example
Mathematical Formulation of Our Example We define two binary random variables: open and, where is light on or light off. Our question is: What is? Computer Vision 1 Combining Evidence Suppose our robot
More informationBayesian Linear Regression. Sargur Srihari
Bayesian Linear Regression Sargur srihari@cedar.buffalo.edu Topics in Bayesian Regression Recall Max Likelihood Linear Regression Parameter Distribution Predictive Distribution Equivalent Kernel 2 Linear
More informationThe Kernel Trick, Gram Matrices, and Feature Extraction. CS6787 Lecture 4 Fall 2017
The Kernel Trick, Gram Matrices, and Feature Extraction CS6787 Lecture 4 Fall 2017 Momentum for Principle Component Analysis CS6787 Lecture 3.1 Fall 2017 Principle Component Analysis Setting: find the
More informationReinforcement Learning based on On-line EM Algorithm
Reinforcement Learning based on On-line EM Algorithm Masa-aki Sato t t ATR Human Information Processing Research Laboratories Seika, Kyoto 619-0288, Japan masaaki@hip.atr.co.jp Shin Ishii +t tnara Institute
More informationGaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008
Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:
More informationECE521 lecture 4: 19 January Optimization, MLE, regularization
ECE521 lecture 4: 19 January 2017 Optimization, MLE, regularization First four lectures Lectures 1 and 2: Intro to ML Probability review Types of loss functions and algorithms Lecture 3: KNN Convexity
More informationPath Integral Policy Improvement with Covariance Matrix Adaptation
JMLR: Workshop and Conference Proceedings 1 14, 2012 European Workshop on Reinforcement Learning (EWRL) Path Integral Policy Improvement with Covariance Matrix Adaptation Freek Stulp freek.stulp@ensta-paristech.fr
More informationLearning Variable Impedance Control
Learning Variable Impedance Control Jonas Buchli, Freek Stulp, Evangelos Theodorou, Stefan Schaal November 16, 2011 Abstract One of the hallmarks of the performance, versatility, and robustness of biological
More informationRobotics & Automation. Lecture 25. Dynamics of Constrained Systems, Dynamic Control. John T. Wen. April 26, 2007
Robotics & Automation Lecture 25 Dynamics of Constrained Systems, Dynamic Control John T. Wen April 26, 2007 Last Time Order N Forward Dynamics (3-sweep algorithm) Factorization perspective: causal-anticausal
More informationLinear Models for Regression
Linear Models for Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More informationActive Policy Iteration: Efficient Exploration through Active Learning for Value Function Approximation in Reinforcement Learning
Active Policy Iteration: fficient xploration through Active Learning for Value Function Approximation in Reinforcement Learning Takayuki Akiyama, Hirotaka Hachiya, and Masashi Sugiyama Department of Computer
More informationELEC4631 s Lecture 2: Dynamic Control Systems 7 March Overview of dynamic control systems
ELEC4631 s Lecture 2: Dynamic Control Systems 7 March 2011 Overview of dynamic control systems Goals of Controller design Autonomous dynamic systems Linear Multi-input multi-output (MIMO) systems Bat flight
More informationADVANCED MACHINE LEARNING ADVANCED MACHINE LEARNING. Non-linear regression techniques Part - II
1 Non-linear regression techniques Part - II Regression Algorithms in this Course Support Vector Machine Relevance Vector Machine Support vector regression Boosting random projections Relevance vector
More informationStatistical Machine Learning from Data
January 17, 2006 Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Multi-Layer Perceptrons Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole
More informationLECTURE NOTE #NEW 6 PROF. ALAN YUILLE
LECTURE NOTE #NEW 6 PROF. ALAN YUILLE 1. Introduction to Regression Now consider learning the conditional distribution p(y x). This is often easier than learning the likelihood function p(x y) and the
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is
More informationMathematical optimization
Optimization Mathematical optimization Determine the best solutions to certain mathematically defined problems that are under constrained determine optimality criteria determine the convergence of the
More informationA Probabilistic Representation for Dynamic Movement Primitives
A Probabilistic Representation for Dynamic Movement Primitives Franziska Meier,2 and Stefan Schaal,2 CLMC Lab, University of Southern California, Los Angeles, USA 2 Autonomous Motion Department, MPI for
More informationREINFORCEMENT LEARNING
REINFORCEMENT LEARNING Larry Page: Where s Google going next? DeepMind's DQN playing Breakout Contents Introduction to Reinforcement Learning Deep Q-Learning INTRODUCTION TO REINFORCEMENT LEARNING Contents
More informationThe Perceptron. Volker Tresp Summer 2014
The Perceptron Volker Tresp Summer 2014 1 Introduction One of the first serious learning machines Most important elements in learning tasks Collection and preprocessing of training data Definition of a
More informationClustering. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms March 8, / 26
Clustering Professor Ameet Talwalkar Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, 217 1 / 26 Outline 1 Administration 2 Review of last lecture 3 Clustering Professor Ameet Talwalkar
More informationLinear discriminant functions
Andrea Passerini passerini@disi.unitn.it Machine Learning Discriminative learning Discriminative vs generative Generative learning assumes knowledge of the distribution governing the data Discriminative
More informationGWAS V: Gaussian processes
GWAS V: Gaussian processes Dr. Oliver Stegle Christoh Lippert Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen Summer 2011 Oliver Stegle GWAS V: Gaussian processes Summer 2011
More informationDeep Learning for Partial Differential Equations (PDEs)
Deep Learning for Partial Differential Equations (PDEs) Kailai Xu kailaix@stanford.edu Bella Shi bshi@stanford.edu Shuyi Yin syin3@stanford.edu Abstract Partial differential equations (PDEs) have been
More informationGatsby Theoretical Neuroscience Lectures: Non-Gaussian statistics and natural images Parts III-IV
Gatsby Theoretical Neuroscience Lectures: Non-Gaussian statistics and natural images Parts III-IV Aapo Hyvärinen Gatsby Unit University College London Part III: Estimation of unnormalized models Often,
More informationRobotics. Dynamics. Marc Toussaint U Stuttgart
Robotics Dynamics 1D point mass, damping & oscillation, PID, dynamics of mechanical systems, Euler-Lagrange equation, Newton-Euler recursion, general robot dynamics, joint space control, reference trajectory
More informationLinear Models for Regression
Linear Models for Regression CSE 4309 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 The Regression Problem Training data: A set of input-output
More informationReinforcement Learning
Reinforcement Learning Function approximation Mario Martin CS-UPC May 18, 2018 Mario Martin (CS-UPC) Reinforcement Learning May 18, 2018 / 65 Recap Algorithms: MonteCarlo methods for Policy Evaluation
More informationComplexity of stochastic branch and bound methods for belief tree search in Bayesian reinforcement learning
Complexity of stochastic branch and bound methods for belief tree search in Bayesian reinforcement learning Christos Dimitrakakis Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands
More informationIntroduction to Machine Learning
Introduction to Machine Learning Linear Regression Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574 1
More informationKernel Bayes Rule: Nonparametric Bayesian inference with kernels
Kernel Bayes Rule: Nonparametric Bayesian inference with kernels Kenji Fukumizu The Institute of Statistical Mathematics NIPS 2012 Workshop Confluence between Kernel Methods and Graphical Models December
More informationNeural Networks in Structured Prediction. November 17, 2015
Neural Networks in Structured Prediction November 17, 2015 HWs and Paper Last homework is going to be posted soon Neural net NER tagging model This is a new structured model Paper - Thursday after Thanksgiving
More information