Robotics Part II: From Learning Model-based Control to Model-free Reinforcement Learning


1 Robotics Part II: From Learning Model-based Control to Model-free Reinforcement Learning Stefan Schaal Max-Planck-Institute for Intelligent Systems Tübingen, Germany & Computer Science, Neuroscience, & Biomedical Engineering University of Southern California, Los Angeles

2 Where Did We Stop...

3 Outline A Bit of Robotics History Foundations of Control Adaptive Control Learning Control - Model-based Robot Learning - Reinforcement Learning

4 What Needs to Be Learned in Learning Control? Internal Models Value Functions Control Policies Coordinate Transformations The Majority of the Learning Problems Involve Function Approximation Unsupervised Learning & Classification

5 Learning Internal Models Forward Models - model the causal functional relationship $y = f(x)$ - for example: $\ddot{q} = B(q)^{-1}\left(\tau - C(q,\dot{q})\dot{q} - G(q)\right)$ Inverse Models - model the inverse of the causal functional relationship $x = f^{-1}(y)$ - for example: $B(q)\ddot{q} + C(q,\dot{q})\dot{q} + G(q) = \tau$ - NOTE: inverse models are not necessarily functions anymore!

6 Inverse Models May Not Be Trivially Learnable

7 Inverse Models May Not Be Trivially Learnable $t_1 = f_1(\theta_1, \theta_2)$, $t_2 = f_2(\theta_1, \theta_2)$; what is $f^{-1}(t)$?

8 Characteristics of Function Approximation in Robotics Incremental learning: large amounts of data, continual learning, functions to be approximated are of growing and unknown complexity. Fast learning: data efficient, computationally efficient, real-time. Robust learning: minimal interference, hundreds of inputs.

9 Linear Regression: One of the Simplest Function Approximation Methods Recall the simple adaptive control model with $f(x) = \theta x$ - find the line through all data points - imagine a spring attached between the line and each data point - all springs have the same spring constant - points far away generate more force (danger of outliers) - springs are vertical - the solution is the minimum-energy solution achieved by the springs

10 Linear Regression: One of the Simplest Function Approximation Methods
The data-generating model: $y = \tilde{w}^T\tilde{x} + w_0 + \varepsilon = w^T x + \varepsilon$, where $x = [\tilde{x}^T, 1]^T$, $w = [\tilde{w}^T, w_0]^T$, $E\{\varepsilon\} = 0$.
The least-squares cost function: $J = \frac{1}{2}(t - y)^T(t - y) = \frac{1}{2}(t - Xw)^T(t - Xw)$, where $t = [t_1, t_2, \dots, t_n]^T$ and $X = [x_1, x_2, \dots, x_n]^T$.
Minimizing the cost gives the least-squares solution: $\frac{\partial J}{\partial w} = 0 = -(t - Xw)^T X$, thus $t^T X = w^T X^T X$, or $X^T t = X^T X w$; result: $w = (X^T X)^{-1} X^T t$.
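To make the result above concrete, here is a minimal numpy sketch of the batch solution (function and variable names are my own; the bias column is appended as in the slide's definition of $x$):

```python
import numpy as np

def least_squares(X_raw, t):
    """Batch linear regression: w = (X^T X)^{-1} X^T t, with a bias term."""
    # Append the bias column so the last entry of w is w_0.
    X = np.hstack([X_raw, np.ones((X_raw.shape[0], 1))])
    # lstsq solves the same normal equations, but more stably than
    # forming the matrix inverse explicitly.
    w, *_ = np.linalg.lstsq(X, t, rcond=None)
    return w
```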

11 Recursive Least Squares: An Incremental Version of Linear Regression
Based on the matrix inversion theorem: $(A + BC)^{-1} = A^{-1} - A^{-1}B(I + CA^{-1}B)^{-1}CA^{-1}$
Incremental updating of a linear regression model. Initialize: $P^0 = \frac{1}{\gamma}I$, where $\gamma \ll 1$ (note $P \approx (X^T X)^{-1}$).
For every new data point $(x, t)$ (note that $x$ includes the bias):
$P^{n+1} = \frac{1}{\lambda}\left(P^n - \frac{P^n x x^T P^n}{\lambda + x^T P^n x}\right)$, $w^{n+1} = w^n + P^{n+1} x\,(t - w^{n\,T} x)$, where $\lambda = 1$ if no forgetting, $\lambda < 1$ if forgetting.
NOTE: RLS gives exactly the same solution as linear regression if there is no forgetting.
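A sketch of this update rule in numpy (class and parameter names are my own; with lam = 1 and small gamma it converges to the batch least-squares solution above):

```python
import numpy as np

class RecursiveLeastSquares:
    """Incremental linear regression via the RLS update."""

    def __init__(self, dim, gamma=1e-4, lam=1.0):
        self.P = np.eye(dim) / gamma  # P approximates (X^T X)^{-1}
        self.w = np.zeros(dim)
        self.lam = lam                # forgetting factor: 1 = no forgetting

    def update(self, x, t):
        # x must already include the bias entry, as on the slide.
        Px = self.P @ x
        self.P = (self.P - np.outer(Px, Px) / (self.lam + x @ Px)) / self.lam
        self.w = self.w + self.P @ x * (t - self.w @ x)
        return self.w
```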

12 Making Linear Regression Nonlinear: Locally Weighted Regression [Figure: a receptive field with its region of validity, Gaussian activation $w$ (between 0 and 1), and local linear model.] Locally weighted cost function: $J = \sum_{i=1}^{N} w_i\,(y_i - x_i^T\beta)^2$ Note: GPs, SVR, mixture models, etc., are other ways to do nonlinear regression.

13 Locally Weighted Regression Piecewise linear function approximation. Each local model is learned from only local data. No over-fitting due to too many local models (unlike RBF networks or mixtures of experts).

14 Locally Weighted Regression
Linear model: $y = \beta^T\tilde{x}$, where $\tilde{x} = [x^T\ 1]^T$, learned with recursive weighted least squares:
$\beta_k^{n+1} = \beta_k^n + w\,P_k^{n+1}\tilde{x}\,(y - \tilde{x}^T\beta_k^n)$, $P_k^{n+1} = \frac{1}{\lambda}\left(P_k^n - \frac{P_k^n\tilde{x}\tilde{x}^T P_k^n}{\frac{\lambda}{w} + \tilde{x}^T P_k^n\tilde{x}}\right)$
Weighting kernel: $w = \exp\left(-\frac{1}{2}(x - c)^T D (x - c)\right)$, where $D = M^T M$, learned by gradient descent $M^{n+1} = M^n - \alpha\frac{\partial J}{\partial M}$ on a penalized leave-one-out cross-validation (PRESS) cost function:
$J = \frac{1}{\sum_{i=1}^{N} w_i}\sum_{i=1}^{N} w_i\left(y_i - \hat{y}_{i,-i}\right)^2 + \gamma\sum_{i,j=1}^{d} D_{ij}^2$
Combined prediction: $y = \frac{\sum_{k=1}^{K} w_k y_k}{\sum_{k=1}^{K} w_k}$
Add a model when $\min_k w_k < w_{gen}$: create a new receptive field at $c_{K+1} = x$.
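For illustration, a sketch of the combined-prediction step with Gaussian receptive fields (names are my own; the local models $\beta_k$ and metrics $D_k$ are assumed to have been trained already):

```python
import numpy as np

def lwr_predict(x, centers, metrics, betas):
    """Weighted average of local linear predictions.

    centers[k], metrics[k], betas[k]: center c, distance metric D = M^T M,
    and linear coefficients (last entry = offset) of the k-th local model.
    """
    x_aug = np.append(x, 1.0)                 # homogeneous input [x; 1]
    num, den = 0.0, 0.0
    for c, D, beta in zip(centers, metrics, betas):
        d = x - c
        w = np.exp(-0.5 * d @ D @ d)          # receptive-field activation
        num += w * (beta @ x_aug)             # local linear prediction
        den += w
    return num / max(den, 1e-10)              # guard against zero activation
```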

15 Locally Weighted Regression Example function: $z = \max\left(\exp(-10x^2),\ \exp(-50y^2),\ 1.25\exp\left(-5(x^2 + y^2)\right)\right)$ [Figure: target function and learned approximation.]

16 Locally Weighted Regression Inserted into Adaptive Control

17 Locally Weighted Regression Learn a forward model of the task dynamics, then compute the controller

18 Locally Weighted Regression Learn a forward model of the task dynamics, then compute the controller

19 Criticism of Locally Weighted Learning Breaks down in high-dimensional spaces. Computationally expensive and numerically brittle due to (incremental) $d \times d$ matrix inversion. Not compatible with modern probabilistic statistical learning algorithms. Too many manual tuning parameters.

20 The Curse of Dimensionality The power of local learning comes from exploiting the discriminative power of local neighborhood relations. But the notion of a local neighborhood breaks down in high-dimensional spaces!

21 The Curse of Dimensionality Movement data is locally low-dimensional. [Figure: probability distribution over the local dimensionality of movement data, derived with Bayesian factor analysis.] Thus, locally weighted learning can work if used with local dimensionality reduction!

22 A Bayesian Approach to Locally Weighted Learning Linear regression as a graphical model: $y_i = x_i^T\beta + \varepsilon$, $\varepsilon \sim N(0, \psi_y)$, $\beta = (X^T X)^{-1}X^T y$

23 A Bayesian Approach to Locally Weighted Learning Inserting a Partial-Least-Squares-like projection as a set of hidden variables: $z_{i,m} = x_{i,m}\beta_m + \eta_m$, $y_i = \sum_{m=1}^{d} z_{i,m} + \varepsilon$, $\varepsilon \sim N(0, \psi_y)$, $\eta_m \sim N(0, \psi_{z,m})$

24 A Bayesian Approach to Locally Weighted Learning Robust linear regression with automatic relevance determination (ARD, sparsification): $z_{i,m} = x_{i,m}\beta_m + \eta_m$, $y_i = \sum_{m=1}^{d} z_{i,m} + \varepsilon$, $\varepsilon \sim N(0, \psi_y)$, $\eta_m \sim N(0, \psi_{z,m})$, $\beta_m \sim N\left(0, \frac{1}{\alpha_m}\right)$, $\alpha_m \sim \mathrm{Gamma}(a_\alpha, b_\alpha)$

25 A Full Bayesian Treatment of Locally Weighted Learning The final model for full Bayesian parameter adaptation for regression and locality. [Figure: graphical model with observed inputs $x_{i1}, \dots, x_{id}$, weights $w_{i1}, \dots, w_{id}$, hidden projections $z_{i1}, \dots, z_{id}$, regression parameters $b_1, \dots, b_d$, locality parameters $h_1, \dots, h_d$, output $y_i$, and noise variances $\psi_y, \psi_{z_1}, \dots, \psi_{z_d}$, for $i = 1, \dots, N$.]

26 Locally Weighted Learning In High Dimensional Spaces Learning the cross function in 20-dimensional space. [Figure: true and learned surfaces $z(x, y)$.]

27 Locally Weighted Learning In High Dimensional Spaces Learning the cross function in 20-dimensional space. [Figure: nMSE on test set vs. number of training data points for the 2D-, 10D-, and 20D-cross, together with the number of receptive fields / average number of projections.]

28 Locally Weighted Learning In High Dimensional Spaces Learning inverse kinematics in 60-dimensional space

29 Locally Weighted Learning In High Dimensional Spaces Skill learning

30 Outline A Bit of Robotics History Foundations of Control Adaptive Control Learning Control - Model-based Robot Learning - Reinforcement Learning

31 Given: A Parameterized Policy and a Controller Note: we are now starting to address planning, i.e., where do desired trajectories come from?

32 Trial & Error Learning Reinforcement Learning from Trajectories Problem: How can a motor system learn a novel motor skill? Reinforcement learning is a general approach to this problem, but little work has been done to scale it to the high-dimensional continuous state-action domains of humans. Approach: Teach the initial skill with imitation learning, using a parameterized control policy. Provide an objective function for the skill. Perform trial-and-error learning from exploratory trajectories.

33 Reinforcement Learning Terminology Policies: perceived-state-to-action mapping (can be probabilistic). Reward functions: map the perceived state-action pair into a single number, an immediate reward (stochastic). Value functions: map the state into the accumulated expected reward that would be received if starting in that state. Models: predict the next state given the current state and action (can be probabilistic). Policy: what to do. Reward: what is good. Value: what is good because it predicts reward. Model: what follows what. Objective: optimize reward!

34 Value Functions The value of a state is the expected return starting from that state; it depends on the agent's policy. State-value function for policy $\pi$: $V^\pi(x) = E_\pi\{R_t \mid x_t = x\} = E_\pi\left\{\sum_{k=0}^{\infty}\gamma^k r_{t+k+1} \,\middle|\, x_t = x\right\}$ The value of taking an action in a state under policy $\pi$ is the expected return starting from that state, taking that action, and thereafter following $\pi$. Action-value function for policy $\pi$: $Q^\pi(x, u) = E_\pi\{R_t \mid x_t = x, u_t = u\} = E_\pi\left\{\sum_{k=0}^{\infty}\gamma^k r_{t+k+1} \,\middle|\, x_t = x, u_t = u\right\}$

35 Bellman Equation for a Policy $\pi$ The basic idea: the return decomposes into the immediate reward plus the discounted return from the next state. So: $V^\pi(x) = E_\pi\{R_t \mid x_t = x\} = E_\pi\{r_{t+1} + \gamma V^\pi(x_{t+1}) \mid x_t = x\}$

36 Bellman Optimality Equation for V* The value of a state under an optimal policy must equal the expected return for the best action from that state: $V^*(x) = \max_{u \in A(x)} Q^{\pi^*}(x, u) = \max_{u \in A(x)} E\{r_{t+1} + \gamma V^*(x_{t+1}) \mid x_t = x, u_t = u\}$ $V^*$ is the unique solution of this system of equations.

37 Bellman Optimality Equation for Q* The value of a state-action pair under an optimal policy must equal the expected return for taking this action in that state and then following the optimal policy: $Q^*(x, u) = E\left\{r_{t+1} + \gamma \max_{u'} Q^*(x_{t+1}, u') \,\middle|\, x_t = x, u_t = u\right\}$ $Q^*$ is the unique solution of this system of equations.
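As a toy illustration of this fixed point, a tabular Q-value-iteration sketch (entirely my own construction; it assumes a known transition model P and reward table R, which the expectation in the equation presupposes):

```python
import numpy as np

def q_value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Iterate Q(x,u) <- R(x,u) + gamma * sum_x' P(x'|x,u) max_u' Q(x',u').

    P has shape (S, A, S) with transition probabilities; R has shape (S, A).
    """
    Q = np.zeros(R.shape)
    while True:
        V = Q.max(axis=1)            # max over u' at each successor state
        Q_new = R + gamma * (P @ V)  # Bellman optimality backup
        if np.abs(Q_new - Q).max() < tol:
            return Q_new
        Q = Q_new
```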

38 Example: Learning a Pendulum Swing-Up Note: both the policy and the value function are rather complex landscapes with discontinuities! [Figure: learned policy and value function.]

39 Some More Exciting Examples

40 State-Based vs. Trajectory-Based Reinforcement Learning Value function-based (i.e., state-based) reinforcement learning has long been dominant (textbook: Sutton & Barto). Pros: - well-understood theory - convergence proofs for discrete state-action systems - a useful set of algorithms to work with (model-based and model-free) - ideally a globally optimal solution Cons: - problematic in continuous state-action spaces (max operator in continuous spaces) - curse of dimensionality in high-dimensional systems - hard to combine with function approximation - greedy (= aggressive) updating Trajectory-based reinforcement learning Pros: - can work in high-dimensional continuous state-action spaces - does not suffer from the curse of dimensionality Cons: - locally optimal solutions - classical methods learn very slowly

41 Trajectory-based Reinforcement Learning with Parameterized Policies $u_t = \pi(x_t, t, \alpha)$ or $\dot{x}_t^d = \pi(x_t^d, t, \alpha)$ Example: dynamic systems policies, initialized by imitation: $\tau\ddot{y} = \alpha_z\left(\beta_z(g - y) - \dot{y}\right) + \frac{\sum_{i=1}^{k} w_i b_i x}{\sum_{i=1}^{k} w_i}$, $\tau\dot{x} = -\alpha_x x$
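A minimal Euler-integration sketch of such a dynamic systems policy in one dimension (function, gains, and basis parameters are illustrative assumptions, not the lecture's exact implementation):

```python
import numpy as np

def dmp_rollout(y0, g, weights, centers, widths,
                alpha_z=25.0, beta_z=6.25, alpha_x=8.0,
                tau=1.0, dt=0.001, steps=1000):
    """Integrate a 1-D dynamic movement primitive toward goal g."""
    y, z, x = y0, 0.0, 1.0   # position, scaled velocity, phase variable
    traj = []
    for _ in range(steps):
        psi = np.exp(-widths * (x - centers) ** 2)      # basis activations
        f = (psi @ weights) / (psi.sum() + 1e-10) * x   # learned forcing term
        z += dt * (alpha_z * (beta_z * (g - y) - z) + f) / tau
        y += dt * z / tau
        x += dt * (-alpha_x * x) / tau                  # canonical system
        traj.append(y)
    return np.array(traj)
```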

42 Trajectory-based Reinforcement Learning Define a cost function along the trajectory: $J = E_\tau\left\{\sum_{i=0}^{T} r_i\right\}$ and a parameterized control policy (e.g., a movement primitive): $\tau\dot{y} = f(y, \mathrm{goal}, b)$ Optimize $J$ with respect to the parameters $b$, e.g., by gradient descent: $b^{n+1} = b^n + \alpha\frac{\partial J}{\partial b}$

43 Example: Learning with Natural Gradients Goal: Hit ball to fly far Note: about trials are needed.

44 Reinforcement Learning from Trajectories State-of-the-art of reinforcement learning from trajectories: Given the cost per trajectory $\tau$: $J = E_\tau\left\{\sum_{i=0}^{T} r_i\right\}$, and motor primitives with parameters $b$: $\tau\dot{y} = f(y, \mathrm{goal}, b)$. RL with natural gradients: $b^{new} = b^{old} + \alpha\,\nabla_b^{NAC} J$. Probabilistic RL with reward-weighted regression: $b^{new} = \frac{\sum_\tau R_\tau b_\tau}{\sum_\tau R_\tau}$. Trajectory-based Q-learning (fitted Q-iteration): an actor-critic method based on an action-value function over trajectories. RL with path integrals: a probabilistic, model-based/model-free approach derived from stochastic optimal control.

45 Reinforcement Learning Based on Path Integrals Pre-requisites. System dynamics (control-affine): $\dot{x} = f(x, t) + G(x)\left(u(t) + \varepsilon(t)\right) = F(x, u, t)$ Cost function: $r_t = q(x_t) + \frac{1}{2}u_t^T R u_t$, $J_{x_t} = E_{x_t}\left\{q_T + \int_{t'=t}^{T} r_{t'}\,dt'\right\}$ Goal: find the commands $u$ that minimize this cost. Note: this is a more structured approach to RL.

46 Reinforcement Learning Based on Path Integrals Sketch of the Path-Integral Derivation Stochastic HJB equation: $-\partial_t V(x_t, t) = \min_{u_{t:t_m}}\left\{ r_t + \nabla_x V(x_t, t)^T F(x, u, t) + \frac{1}{2}\mathrm{Tr}\left(\Omega(x, u, t)\,\nabla_x^2 V(x_t, t)\right)\right\}$ $= \min_{u_{t:t_m}}\left\{\frac{1}{2}u_t^T R u_t + q_t + \nabla_x V(x_t, t)^T f(x, t) + \nabla_x V(x_t, t)^T G(x) u_t + \frac{1}{2}\mathrm{Tr}\left(G(x)\Sigma_\varepsilon G(x)^T\,\nabla_x^2 V(x_t, t)\right)\right\}$ Setting the derivative with respect to $u_t$ to zero: $u_t^T R + \nabla_x V(x_t, t)^T G(x_t) = 0$, hence $u_t = -R^{-1} G(x_t)^T \nabla_x V(x_t, t)$

47 Reinforcement Learning Based on Path Integrals Sketch of the Path-Integral Derivation Substituting the optimal control $u_t = -R^{-1}G(x_t)^T\nabla_x V(x_t, t)$ and the dynamics $\dot{x} = f(x, t) + G(x)(u(t) + \varepsilon(t))$ into the HJB equation gives: $-\partial_t V(x_t, t) = -\frac{1}{2}\nabla_x V(x_t, t)^T G(x) R^{-1} G(x)^T \nabla_x V(x_t, t) + q_t + \nabla_x V(x_t, t)^T f(x, t) + \frac{1}{2}\mathrm{Tr}\left(G(x)\Sigma_\varepsilon G(x)^T\,\nabla_x^2 V(x_t, t)\right)$

48 Reinforcement Learning Based on Path Integrals Sketch of the Path-Integral Derivation Simplification: $\lambda R^{-1} = \Sigma_\varepsilon$. Log-transformation trick: $V(x_t, t) = -\lambda\log\Psi(x_t, t)$. This turns the nonlinear HJB equation into $-\partial_t \Psi(x_t, t) = -\frac{1}{\lambda}\Psi(x_t, t)\,q_t + \nabla_x\Psi(x_t, t)^T f(x, t) + \frac{1}{2}\mathrm{Tr}\left(G(x)\Sigma_\varepsilon G(x)^T\,\nabla_x^2\Psi(x_t, t)\right)$ a Chapman-Kolmogorov PDE: 2nd order and linear.

49 Reinforcement Learning Based on Path Integrals Sketch of the Path-Integral Derivation Application of the Feynman-Kac theorem, a method to solve certain PDEs numerically by sampling: $\Psi(x_t, t) = E_\tau\left\{\Psi(x_T, T)\exp\left(-\frac{1}{\lambda}\int_{t'=t}^{T} q_{t'}\,dt'\right)\right\}$

50 Reinforcement Learning Based on Path Integrals Sketch of the Path-Integral Derivation From $\Psi(x_t, t) = E_\tau\left\{\Psi(x_T, T)\exp\left(-\frac{1}{\lambda}\int_{t'=t}^{T} q_{t'}\,dt'\right)\right\}$ and $u_t = -R^{-1}G(x_t)^T\nabla_x V(x_t, t)$, a bit of algebra yields the optimal control law: $u_t = E_\tau\left\{w_\tau\, R^{-1} G(x_t)^T\left(G(x_t) R^{-1} G(x_t)^T\right)^{-1} G(x_t)\,\varepsilon_t\right\}$

51 Path Integral RL Applied to Parameterized Policies (Motor Primitives) Note that a version of motor primitives can be written as control-affine stochastic differential equations: $\dot{x} = f(x) + g(x)^T(\theta + \varepsilon)$ - $\varepsilon$ is interpreted as intentionally injected exploration noise - the parameters $\theta$ are the control vector - $f(x)$ is the spring-damper part of the primitives - $g(x)$ are the basis functions of the function approximator It is also necessary to create an iterative version of path integral optimal control - the original path integral optimal control framework explores only based on the passive dynamics, i.e., $u = 0$

52 PI² Reinforcement Learning For parameterized policies like dynamic motor primitives, a beautifully simple algorithm results: 1) Create K trajectories of the motor primitive for a given task with exploration noise. 2) Write the cost-to-go from every time step t of the trajectory as: $R_t = q_T + \sum_{i=t}^{T} r_i$ 3) The probability of a trajectory becomes: $P(\xi_t^k) = \frac{\exp\left(-\frac{1}{\lambda}R_t^k\right)}{\sum_{j=1}^{K}\exp\left(-\frac{1}{\lambda}R_t^j\right)}$ 4) Update the parameters $\theta$ of the motor primitive as: $\Delta\theta_t = \sum_{k=1}^{K} P(\xi_t^k)\,\frac{R^{-1}g_k(x_t)\,g_k(x_t)^T}{g_k(x_t)^T R^{-1} g_k(x_t)}\,\varepsilon_t^k$ 5) Final parameter update: $\theta^{new} = \theta^{old} + \Delta\theta_t$ Note that there are NO open tuning parameters except for the exploration noise.
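A schematic numpy sketch of one PI² update at a single time step (my own simplification: R is taken as the identity so the projection reduces to $g g^T / (g^T g)$, and costs are shifted by their minimum before exponentiation, a common implementation trick):

```python
import numpy as np

def pi2_update(theta, eps, cost_to_go, g, lam=1.0):
    """One PI^2 parameter update from K noisy rollouts at one time step.

    eps[k]        : exploration noise of rollout k (same shape as theta)
    cost_to_go[k] : R_t of rollout k from this time step
    g             : basis-function vector at this time step
    """
    # Exponentiate costs into probabilities (softmax over rollouts).
    s = np.exp(-(cost_to_go - cost_to_go.min()) / lam)
    prob = s / s.sum()
    # Project each rollout's noise onto the basis direction (R = identity).
    M = np.outer(g, g) / (g @ g + 1e-10)
    d_theta = sum(p * (M @ e) for p, e in zip(prob, eps))
    return theta + d_theta
```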

53 PI² Reinforcement Learning The Intuition of Path Integral Reinforcement Learning - Generate multiple trials $i$ with some variation, e.g., due to noise or exploration - For every time $t$, compute the cost-to-go $R_t^i$ for every trial: $R_t^i = q_T + \int_{t}^{T}\left(q(x_t) + \frac{1}{2}u_t^T R u_t\right)d\tau$ - Convert the cost into a positive weight: $w_t^i = \exp\left(-\frac{1}{\lambda}R_t^i\right)$ - Update the motor command at every time step to be the reward-weighted average of all experienced commands in the trials: $u_t^{new} = \frac{\sum_i w_t^i\,u_t^i}{\sum_i w_t^i}$ Surprisingly, this intuition turns out to be the optimal solution.

54 PI² Reinforcement Learning: Some Remarks PI² can range from model-based to model-free. Rigid body dynamics: $\ddot{q} = M(q)^{-1}\left(u - C(q, \dot{q})\dot{q} - G(q)\right)$ Control law: $u = u_{ff} + K_P(q_d - q) + K_D(\dot{q}_d - \dot{q})$ Motor primitives: $\ddot{q}_i^d = \alpha_z\left(\beta_z(g_i - q_i^d) - \dot{q}_i^d\right) + \psi^T\theta$ PI² can optimize trajectory plans, controllers, or both. PI² has only one open parameter, the level of exploration noise. PI² allows a rather simple derivation of inverse reinforcement learning.

55 Example: Results on 2D Reaching Through a Via Point [Figure: via-point cost vs. number of roll-outs, and initial vs. learned trajectories in the x-y plane, comparing PI², REINFORCE, PG, and NAC.]

56 Example: Results on 20D Reaching Through a Via Point [Figure: via-point cost vs. number of roll-outs, and trajectories in the x-y plane.]

57 Example: Results on 50D Reaching Through a Via Point [Figure: via-point cost vs. number of roll-outs, and trajectories in the x-y plane.]

58 Example: Dog Jump This is a 12-DOF motor system, using 50 basis functions per primitive. Learning converges after about trials! Performance improved by 15 cm (0.5 body lengths). [Figure: cost vs. number of roll-outs.]

59 Reinforcement Learning in Manipulation

60 Learning Locomotion over Rough Terrain

61 Outline A Bit of Robotics History Foundations of Control Adaptive Control Learning Control - Model-based Robot Learning - Reinforcement Learning What Comes Next?

62 Towards Truly Autonomous Robots Big Robots Very Little Robots
