CITEC SummerSchool 2013 Learning From Physics to Knowledge Selected Learning Methods


CITEC SummerSchool 2013
Learning From Physics to Knowledge: Selected Learning Methods
Robert Haschke, Neuroinformatics Group, CITEC, Bielefeld University
September 2013

Outline
1 Representations
2 Function Learning
3 Perceptual Grouping
4 Imitation Learning

Outline
1 Representations: Feature Space Expansion, Gaussian Processes, Reservoir Computing
2 Function Learning
3 Perceptual Grouping
4 Imitation Learning

Feature Space Expansion
High-dimensional feature spaces facilitate computations, e.g. they enable linear separability of class regions.
- polynomial expansion: (x_1, ..., x_d) → (x_1, ..., x_d, ..., x_i x_j, ..., x_i x_j x_k, ...)
- time history: use (x_t, x_{t-1}, ..., x_{t-k}) ∈ R^{d(k+1)}
- filters: temporal convolution with kernels, x̃(t) = ∫ K(t, t′) x(t′) dt′
- kernel trick

Kernel Trick
Linear regressors or perceptrons often have the form
  y(x) = w^t x = Σ_{α=1}^N λ_α x_α^t x = Σ_α w_α φ(x_α)^t φ(x) = Σ_α w_α K(x_α, x)
- goal: introduce non-linear, high-dimensional features φ_k(x)
- the kernel directly computes the scalar product of the feature vectors
- typical examples, e.g. in support vector machines (SVM):
  polynomial kernel: K(x, x′) = (x^t x′ + 1)^p
  Gaussian kernel: K(x, x′) = exp(−‖x − x′‖² / (2σ²))
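
To make the kernel trick concrete, here is a small sketch (not from the slides; the regularization constant lam, the kernel width sigma, and the toy data are assumptions) that fits a kernel ridge regressor of the form y(x) = Σ_α α_α K(x_α, x) with the Gaussian kernel named above:

```python
import numpy as np

def gaussian_kernel(A, B, sigma=0.5):
    """K[i, j] = exp(-||A_i - B_j||^2 / (2 sigma^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

# toy 1D training data
rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, size=(30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)

# kernel ridge regression: alpha = (K + lam I)^-1 y,  y(x) = sum_a alpha_a K(x_a, x)
lam = 1e-2
K = gaussian_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)

X_test = np.linspace(0, 2 * np.pi, 5)[:, None]
y_pred = gaussian_kernel(X_test, X) @ alpha
print(y_pred)
```

Swapping gaussian_kernel for the polynomial kernel (x^t x′ + 1)^p changes the implicit feature space without ever computing φ explicitly.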

Example: Bayesian Linear Regression
f(x, w) = w^t φ(x),   y = f(x, w) + η
p(y | x, w) = N(w^t φ(x), β^{-1})                  Gaussian data distribution
ŵ_ML = (Φ^t Φ)^{-1} Φ^t y                          maximum likelihood estimator
  with Φ^t = [φ(x_1), ..., φ(x_N)] ∈ R^{d×N} and y^t = [y_1, ..., y_N] ∈ R^N
p(w) = N(0, α^{-1} I)                              Gaussian a-priori distribution
p(w | y) = N(m_N, S_N)                             a-posteriori distribution
ŵ_MAP = m_N = β S_N Φ^t y                          MAP estimator
S_N = (α I + β Φ^t Φ)^{-1} ∈ R^{d×d}
ŷ(x) = φ(x)^t ŵ_MAP = β φ(x)^t S_N Φ^t y = Σ_α β φ(x)^t S_N φ(x_α) y_α
     = Σ_α K(x, x_α) y_α   with   K(x, x_α) = β φ(x)^t S_N φ(x_α)
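
A minimal numerical sketch of these formulas, assuming Gaussian bump basis functions and arbitrary values for the precisions α and β (all of these choices are assumptions, not taken from the slides):

```python
import numpy as np

alpha, beta = 1e-2, 25.0            # prior precision, noise precision (assumed values)
centers = np.linspace(0, 1, 9)      # assumed Gaussian basis functions phi_k

def phi(x):
    """Feature vector phi(x) for scalar x: a bias term plus Gaussian bumps."""
    return np.concatenate(([1.0], np.exp(-(x - centers) ** 2 / (2 * 0.1 ** 2))))

rng = np.random.default_rng(1)
x_train = rng.uniform(0, 1, 20)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 20)

Phi = np.stack([phi(x) for x in x_train])                      # N x d design matrix
S_N = np.linalg.inv(alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi)
m_N = beta * S_N @ Phi.T @ y_train                             # posterior mean = MAP estimate

def predict(x):
    """Predictive mean and variance, as in the formulas above."""
    p = phi(x)
    return p @ m_N, 1.0 / beta + p @ S_N @ p

print(predict(0.3))
```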

Gaussian Processes
Predictive distribution of Bayesian Linear Regression:
  p(y | x) = N(ŵ_MAP^t φ(x), σ²_N(x)),   σ²_N(x) = β^{-1} + φ(x)^t S_N φ(x)
Variance: data noise + uncertainty of ŵ_MAP
[figure: example plots of the predictive distribution]

Gaussian Processes
- the distribution over parameters w induces a distribution over estimator functions
- idea of Gaussian Processes: directly operate on function distributions, skipping the parameter representation
Definition: A Gaussian process is a collection of random variables, any finite collection of which has a joint Gaussian distribution. A Gaussian process is fully specified by a mean function and a covariance function k(x, x′).
Example: linear model f(x) = w^t φ(x) with Gaussian prior p(w) = N(0, α^{-1} I); any tuple [f(x_1), ..., f(x_N)] has a Gaussian distribution (as a linear combination of the Gaussian distributed variables w).
Specification: mean + covariance function
  mean: E[f(x)] = E[w^t] φ(x) = 0
  covariance: E[f(x) f(x′)] = φ(x)^t E[w w^t] φ(x′) = α^{-1} φ(x)^t φ(x′) ≡ k(x, x′)

Gaussian Process Regression
How can we exploit GPs for regression?
- N training points x_1, ..., x_N induce a Gaussian distribution of the associated function values f_N = [f(x_1), ..., f(x_N)].
- Embedding an additional (N+1)-th query point x_{N+1} again yields a Gaussian distribution, now of f_{N+1} = [f(x_1), ..., f(x_N), f(x_{N+1})].
- Evaluate the predictive distribution p(f(x_{N+1}) | f(x_1), ..., f(x_N)).

Gaussian Process Regression: Computational Steps
p(f_{N+1}) = N(0, K_{N+1})
K_{N+1} = [ K_N  k ; k^t  c ] ∈ R^{(N+1)×(N+1)}
  (K_N)_{nm} = k(x_n, x_m)
  k = [k(x_1, x_{N+1}), ..., k(x_N, x_{N+1})]^t ∈ R^N
  c = k(x_{N+1}, x_{N+1})
Predictive distribution: p(f_{N+1} | f_N) is again Gaussian with
  mean     μ(x_{N+1}) = k^t K_N^{-1} f_N
  variance σ²(x_{N+1}) = c − k^t K_N^{-1} k
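
The two predictive formulas translate almost literally into code; the squared-exponential covariance function and the noise-free training targets below are illustrative assumptions:

```python
import numpy as np

def k(a, b, ell=1.0):
    """Squared-exponential covariance function (an assumed choice of kernel)."""
    return np.exp(-(a - b) ** 2 / (2 * ell ** 2))

# training values f_N observed at x_1 ... x_N (noise-free for simplicity)
x_train = np.array([1.0, 3.0, 5.0, 7.0])
f_N = np.sin(x_train)

K_N = k(x_train[:, None], x_train[None, :]) + 1e-8 * np.eye(len(x_train))  # jitter

def gp_predict(x_query):
    """Predictive mean and variance for one query point x_{N+1}."""
    k_vec = k(x_train, x_query)            # k = [k(x_1, x_{N+1}), ..., k(x_N, x_{N+1})]
    c = k(x_query, x_query)                # c = k(x_{N+1}, x_{N+1})
    K_inv = np.linalg.inv(K_N)
    mu = k_vec @ K_inv @ f_N               # mean     = k^t K_N^{-1} f_N
    var = c - k_vec @ K_inv @ k_vec        # variance = c - k^t K_N^{-1} k
    return mu, var

print(gp_predict(4.0))
```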

Multilayer Perceptron
- the hidden layer serves as a high-dimensional feature space
- back-propagation creates optimal features
- but: input weights adapt only slowly

Extreme Learning Machine
- simplification: fixed, random input features
- linear readout facilitates learning: regression methods
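
A sketch of the idea, assuming a tanh hidden layer and a ridge-regression readout (both common but here assumed choices, as are all sizes and constants):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 1, 100

# fixed, random input weights: never trained
W_in = rng.normal(size=(n_hidden, n_in))
b = rng.normal(size=n_hidden)

def hidden(X):
    """Random nonlinear feature expansion h(x) = tanh(W_in x + b)."""
    return np.tanh(X @ W_in.T + b)

X = rng.uniform(-3, 3, size=(200, n_in))
y = np.sin(X[:, 0])

# linear readout trained by (ridge) regression on the random features
H = hidden(X)
lam = 1e-3
W_out = np.linalg.solve(H.T @ H + lam * np.eye(n_hidden), H.T @ y)

print(hidden(np.array([[0.5]])) @ W_out)   # prediction for x = 0.5
```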

Reservoir Computing
- a recurrent reservoir allows for temporal dynamics
- an even richer feature space
- linear readout: regression
- variants: echo state network, liquid state machine, backpropagation-decorrelation
How to tune the reservoir? Echo state property: activity decays without excitation.
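
A rough echo-state-network sketch; the reservoir size, the tanh state update, and the spectral-radius value 0.9 used to enforce the echo state property are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_res, n_in = 200, 1

# random recurrent reservoir, rescaled so the echo state property holds
# (spectral radius < 1: activity decays without excitation)
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))
W_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))

def run_reservoir(u_seq):
    """Collect reservoir states for an input sequence (simple leak-free update)."""
    x = np.zeros(n_res)
    states = []
    for u in u_seq:
        x = np.tanh(W @ x + W_in @ np.atleast_1d(u))
        states.append(x.copy())
    return np.array(states)

# teach the linear readout a one-step-ahead prediction of a sine wave
u = np.sin(0.2 * np.arange(500))
X = run_reservoir(u[:-1])
y = u[1:]
lam = 1e-4
W_out = np.linalg.solve(X.T @ X + lam * np.eye(n_res), X.T @ y)
print(np.mean((X @ W_out - y) ** 2))       # training error of the readout
```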

Associative Reservoir Network [Steil]
- learning in both directions (input forcing, bidirectional association)
- forward mapping and inverse mapping

Outline
1 Representations
2 Function Learning: Parameterized SOM, Unsupervised Kernel Regression
3 Perceptual Grouping
4 Imitation Learning

Parameterized SOM [Ritter 1993]
- fast learning of functions and inverses
- generalization of self-organizing maps (SOM) from discrete to continuous manifolds
- SOM: discrete mapping x ∈ X → w_c, one reference vector w_c per node c
- PSOM: continuous mapping w(s), s ∈ S, with the coordinate system S spanned by the nodes c ∈ A and a manifold M in the input space X
[figure: SOM node grid with reference vectors w_c and the continuous PSOM manifold w(s)]

PSOM: Function Modeling
Linear superposition of basis functions H(c, s):   w(s) = Σ_{c∈A} H(c, s) w_c
H(c, s) should form an orthonormal basis system:   H(c, c′) = δ_{cc′}
and represent constant functions:   Σ_{c∈A} H(c, s) = 1   ∀ s ∈ S

PSOM: Function Modeling
Simple polynomials along every grid dimension:
- piecewise linear functions
- Lagrange polynomials: h(c, s) = Π_{c′∈A, c′≠c} (s − c′) / (c − c′)
[figure: plots of the one-dimensional basis functions]
Multiplicative combination of multiple grid dimensions: H(c, s) = Π_{ν=1}^m h_ν(s_ν, A_ν)
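
A small sketch of this basis construction, assuming an equidistant grid with three nodes per dimension and random reference vectors; grid layout and dimensions are illustrative assumptions:

```python
import numpy as np

# one-dimensional node positions of a 3-node grid (assumed: equidistant in [0, 1])
A = np.array([0.0, 0.5, 1.0])

def lagrange_basis(s, A):
    """h_c(s) = prod_{c' != c} (s - c') / (c - c') for every node c in A."""
    h = np.ones(len(A))
    for i, c in enumerate(A):
        for cp in A:
            if cp != c:
                h[i] *= (s - cp) / (c - cp)
    return h

def psom_basis(s, grids):
    """Multiplicative combination H(c, s) = prod_nu h_nu(s_nu, A_nu) over grid dimensions."""
    H = np.array([1.0])
    for s_nu, A_nu in zip(s, grids):
        H = np.outer(H, lagrange_basis(s_nu, A_nu)).ravel()
    return H

# manifold point w(s) = sum_c H(c, s) w_c for a 3x3 grid of reference vectors w_c
rng = np.random.default_rng(0)
W = rng.normal(size=(9, 3))                  # 9 nodes, 3-D embedding space
H = psom_basis([0.25, 0.7], [A, A])
print(H.sum(), H @ W)                        # basis sums to 1; interpolated w(s)
```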

Function Learning Parameterized SOM.8.8.8.6.6.6.4.4.4.2.2.2.8.6.8.6.8.6.4.2.5.4.2.5.4.2.5.8.6.4.2.8.6.4.2.8.6.4.2 linear basis functions.8.6.8.6.8.6.4.2.5.4.2.5.4.2.5.8.8.8.6.6.6.4.4.4.2.2.2.8.6.8.6.8.6.4.2.5.4.2.5.4.2.5 Robert Haschke (CITEC) Learning From Physics to Knowledge Sep 23 7 / 44

Function Learning Parameterized SOM.8.8.8.6.6.6.4.4.4.2.2.2.2.2.2.8.6.8.6.8.6.4.2.5.4.2.5.4.2.5.8.6.4.2.8.6.4.2.8.6.4.2 quadratic basis functions.2.2.2.8.6.8.6.8.6.4.2.5.4.2.5.4.2.5.8.8.8.6.6.6.4.4.4.2.2.2.2.2.2.8.6.8.6.8.6.4.2.5.4.2.5.4.2.5 Robert Haschke (CITEC) Learning From Physics to Knowledge Sep 23 8 / 44

Inverse Mapping
Find the coordinates s* closest to an observation w:
  s* = arg min_{s∈S} ‖w − w(s)‖²
using gradient descent
  s_{t+1} = s_t + η ∇_s w(s_t)^t (w − w(s_t)),
starting at the closest discrete node s_0 = arg min_{c∈A} ‖w − w_c‖.
[figure: gradient descent on the manifold]

Example: Kinematics Learning

Unsupervised Kernel Regression [Klanke 2006]
- learning of continuous non-linear manifolds
- generalizes from the fixed PSOM grid
- employs an unsupervised formulation of the Nadaraya-Watson estimator:
  Y = {y_1, y_2, ..., y_N} ∈ R^{d×N}   observed data
  X = {x_1, x_2, ..., x_N} ∈ R^{q×N}   latent parameters
  y = f(x; X)                          corresponding functional relationship
  f(x; X) = Σ_i K(x − x_i) y_i / Σ_j K(x − x_j),   K(x − x_i) = N(0, Σ) Gaussian kernel
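
The Nadaraya-Watson estimator itself is only a few lines; the isotropic bandwidth and the toy circle data below are assumptions:

```python
import numpy as np

def nadaraya_watson(x, X, Y, bandwidth=0.2):
    """f(x; X) = sum_i K(x - x_i) y_i / sum_j K(x - x_j), with a Gaussian kernel K."""
    d2 = ((X - x[:, None]) ** 2).sum(axis=0)      # squared distances to the latent points
    w = np.exp(-d2 / (2 * bandwidth ** 2))
    return Y @ (w / w.sum())

# toy setup: latent parameters X (q x N) and observed data Y (d x N)
rng = np.random.default_rng(0)
X = np.linspace(0, 1, 50)[None, :]                                    # q = 1 latent dimension
Y = np.vstack([np.cos(2 * np.pi * X[0]), np.sin(2 * np.pi * X[0])])   # d = 2 circle data

print(nadaraya_watson(np.array([0.3]), X, Y))
```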

UKR Learning
Minimize the reconstruction error
  R(X) = (1/N) Σ_m ‖y_m − f(x_m; X)‖²,
using the leave-one-out estimator f_{−m}(x_m; X) (leave-one-out cross-validation) to avoid trivial self-reconstruction.
[figure: manifold fit at initialization and after increasing numbers of training steps]

Learned Manipulation Manifold

Manipulation Manifold in Action

Shadow Robot Hand Opening a Bottle Cap [Humanoids 2]

Outline
1 Representations
2 Function Learning
3 Perceptual Grouping: Gestalt Laws, Competitive Layer Model, Kuramoto Oscillator Network, Comparison, Learning
4 Imitation Learning

Perceptual Grouping
Examples: colour segmentation, texture segmentation, segmenting cell images, contour grouping

Gestalt Laws [Wertheimer 1923]
- proximity, similarity, closure, continuation
- attraction vs. repulsion between features

Interaction Matrix
- compatibility of feature pairs induces an interaction matrix
- attraction: positive feedback, repulsion: negative feedback
- block structure defines the groups (target segmentation)
- real features: unsorted and noisy interaction matrix
- robust grouping dynamics
[figure: interaction matrices, from ideal block structure to unsorted and noisy]

Competitive Layer Model (CLM) [Ritter 1990]
- input: a set of features, each given by a feature vector m_r
  examples: position m_r = (x_r, y_r)^T, oriented line features m_r = (x_r, y_r, φ_r)^T
- layered architecture of neurons x_rα: L layers α = 1, ..., L, columns r of neurons
  activation: linear threshold function σ(x) = max(0, x)
- goal: grouping as activation within layers, label α̂(r) ∈ {1, ..., L}
- lateral compatibility interaction f_rr′ induces the grouping:
  f_rr′ encodes the compatibility of the feature pair m_r, m_r′
- vertical competition (inhibition J) pushes groups to layers
- column-wise stimulation with a feature-dependent bias h_r:
  the overall activity per column reflects the significance of feature m_r
- simulation of the recurrent dynamics
  ẋ_rα = −x_rα + σ( J (h_r − Σ_β x_rβ) + Σ_r′ f_rr′ x_r′α )
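
A minimal simulation sketch of this dynamics; the inhibition strength J, the Euler step size, the toy interaction matrix, and the iteration budget are all assumptions, and the sketch does not check the formal stability conditions of the published CLM analyses:

```python
import numpy as np

def clm_dynamics(f, h, n_layers=3, n_steps=2000, dt=0.05, J=5.0, seed=0):
    """Euler-integrate x_ra' = -x_ra + sigma(J (h_r - sum_b x_rb) + sum_r' f_rr' x_r'a)."""
    rng = np.random.default_rng(seed)
    n = len(h)
    x = 0.01 * rng.random((n, n_layers))               # small random initial activities
    sigma = lambda z: np.maximum(0.0, z)               # linear threshold transfer function
    for _ in range(n_steps):
        col_sum = x.sum(axis=1, keepdims=True)         # total activity per column r
        dx = -x + sigma(J * (h[:, None] - col_sum) + f @ x)
        x += dt * dx
    return x.argmax(axis=1)                            # label = layer with maximal activity

# toy interaction matrix: two groups of features, attraction within, repulsion between
f = -0.5 * np.ones((6, 6))
f[:3, :3] = 0.5
f[3:, 3:] = 0.5
print(clm_dynamics(f, h=np.ones(6)))                   # e.g. [0 0 0 1 1 1] up to relabeling
```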

Kuramoto Oscillator Network [Meier 2013]
- more efficient grouping dynamics based on coupled oscillators with phase θ and frequency ω
- phase coupling by f_rr′:
  θ̇_r = ω_r + (K/N) Σ_{r′=1}^N f_rr′ sin(θ_r′ − θ_r)
- phase similarity determines the frequency grouping; similar features share the same frequency:
  ω_r = ω_{α*}   with   α* = argmax_α Σ_{r′∈N(α)} f_rr′ (cos(θ_r′ − θ_r) + 1) / 2
[figure: oscillators coupled by phase coupling and driven by the input features]
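
A rough sketch of the phase-coupling dynamics only (the frequency-assignment step of the full method is omitted, and the grouping is simply read off the converged phases); the interaction matrix, the coupling gain K, and the step sizes are assumptions:

```python
import numpy as np

def phase_coupling(f, n_steps=2000, dt=0.01, K=5.0, omega=2 * np.pi, seed=0):
    """Integrate d(theta_r)/dt = omega_r + (K/N) sum_r' f_rr' sin(theta_r' - theta_r).
    Here all oscillators share one frequency omega, so only the coupling term matters."""
    rng = np.random.default_rng(seed)
    N = len(f)
    theta = rng.uniform(0, 2 * np.pi, N)
    for _ in range(n_steps):
        dtheta = omega + (K / N) * (f * np.sin(theta[None, :] - theta[:, None])).sum(axis=1)
        theta = (theta + dt * dtheta) % (2 * np.pi)
    return theta

# toy interaction matrix: attraction within two groups of features, repulsion between them
f = -np.ones((6, 6))
f[:3, :3] = 1.0
f[3:, 3:] = 1.0

theta = phase_coupling(f)
# read the grouping off the phases: features in phase with feature 0 form one group
labels = (np.cos(theta - theta[0]) < 0).astype(int)
print(labels)   # expected: [0 0 0 1 1 1] (attractive pairs lock, repulsive pairs anti-align)
```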

Example: Contour Grouping

Evaluation: CLM vs. Oscillator Network
- groups of features; layers resp. frequencies (only as many as needed)
- different amounts of noise in the interaction matrices
Figure: interaction matrices with different amounts of inverted interaction values; black pixels represent attraction, white repulsion.

Evaluation Results
- similar grouping quality
- reduced computational complexity, increased convergence speed
- faster recovery on a changing interaction matrix (splitting from 1 to 2 groups after initial convergence)
[figures: grouping quality and number of update steps (mean and std. dev.) vs. % of noise, CLM vs. oscillators]

How to Integrate Learning?
- the recurrent dynamics robustly creates the grouping
- the dynamics is determined by the interaction matrix f_rr′
- How to learn the compatibilities?
- the CLM dynamics is extremely robust w.r.t. noise in the interaction
- hence it suffices to learn a coarse interaction matrix

Learning Architecture [Weng 2006]
- compute a more general distance function on feature pairs
- learn distance prototypes from labeled samples (vector quantization)
- exploit the labels to count positively/negatively weighted prototypes
Overview: labeled training data → proximity vector d(m_r, m_r′) → clustering in the proximity space D (Voronoi cells around proximity prototypes) → positive/negative coefficients → interaction function f_rr′ > 0 resp. f_rr′ < 0

Outline
1 Representations
2 Function Learning
3 Perceptual Grouping
4 Imitation Learning: Dynamic Movement Primitives, Reinforcement Learning

Imitation Learning
- learn from observations: How to observe actions? Which elements are important?
- What to learn? How to represent observed actions? → Dynamic Movement Primitives
- improving + adapting motions by autonomous exploration → Reinforcement Learning, Policy Improvement with Path Integrals (PI²)

Dynamic Movement Primitives [Ijspeert, Nakanishi, Schaal, ICRA 2002]
- a spring-damper system generates the basic motion towards the goal:
  τ ẍ_t = k (g − x_t) − c ẋ_t
- an external force is added to represent complex trajectory shapes:
  τ ẍ_t = k (g − x_t) − c ẋ_t + g_t^T θ
[figures: resulting trajectories without and with the forcing term]

External Force
  f(s) = g_t^T θ = (Σ_i θ_i ψ_i(s) / Σ_i ψ_i(s)) · (g − x_0) · s
- weighted sum of basis functions ψ_i with soft-max normalization
- amplitude scaled by the initial distance to the goal (g − x_0)
- influence weighted by the canonical time s
- Gaussian basis functions ψ_i(s) = exp(−h_i (s − c_i)²), centers c_i logarithmically distributed in [0, 1]   [Schaal et al., Brain Research, 2007]
[figure: basis functions over the canonical phase s]

Canonical System
  f(s) = g_t^T θ = (Σ_i θ_i ψ_i(s) / Σ_i ψ_i(s)) · (g − x_0) · s
- decouple the external force from the spring-damper evolution via a new phase/time variable s:
  τ ṡ = −α_s s
- s is initially set to 1 and exponentially converges to 0
- perturbation coupling: τ ṡ = −α_s s + α_c (x_actual − x_expected)²
  pauses the influence of the force under perturbations

Properties
  τ ẍ_t = k (g − x_t) − c ẋ_t + g_t^T θ,   τ ṡ = −α_s s,   f(s) = g_t^T θ = (Σ_i θ_i ψ_i(s) / Σ_i ψ_i(s)) (g − x_0) s
- convergence to the goal g
- motions are self-similar for different goal or start points
- coupling of multiple DOF through the canonical phase s
- adapt τ for temporal scaling
- robust to perturbations due to the attractor dynamics
- decoupling of the basic goal-directed motion from the task-specific trajectory shape
- the weights θ_i can be learned with linear regression

Learning from Demonstration
- record a motion x(t), ẋ(t), ẍ(t)
- choose τ to match its duration
- evolve the canonical system s(t)
- compute the force targets f_target(s) = τ ẍ(t) − k (g − x(t)) + c ẋ(t)
- minimize E = Σ_s (f_target(s) − f(s))² with linear regression
[figure: demonstrated vs. reproduced trajectories]   [Ijspeert, Nakanishi, Schaal, ICRA 2002]
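
Putting the pieces together, a sketch of DMP imitation: fit θ by least squares on f_target and roll the system out again. The gains k and c, the basis-function placement, and the smoothstep demonstration are assumptions, not values from the slides:

```python
import numpy as np

# DMP constants (assumed values) and Gaussian basis functions on the canonical phase s
k, c, tau, alpha_s, n_basis = 100.0, 20.0, 1.0, 3.0, 15
c_i = np.exp(-alpha_s * np.linspace(0, 1, n_basis))      # centers, log-spaced in (0, 1]
h_i = 1.0 / np.diff(c_i, append=c_i[-1] / 2) ** 2        # widths from center spacing

def psi(s):
    return np.exp(-h_i * (s - c_i) ** 2)

# 1) record a demonstration x(t), xdot(t), xddot(t)
dt = 0.001
t = np.arange(0, 1, dt)
x = t ** 2 * (3 - 2 * t)                                  # smooth demo from 0 to 1
xd = np.gradient(x, dt)
xdd = np.gradient(xd, dt)
x0, g = x[0], x[-1]

# 2) evolve the canonical system and compute the regression targets
s = np.exp(-alpha_s * t / tau)
f_target = tau * xdd - k * (g - x) + c * xd

# 3) fit the weights theta by linear least squares
features = psi(s[:, None]) / psi(s[:, None]).sum(axis=1, keepdims=True) * (g - x0) * s[:, None]
theta = np.linalg.lstsq(features, f_target, rcond=None)[0]

# 4) roll out the learned DMP by Euler integration
xs, v, ss = x0, 0.0, 1.0
for _ in t:
    f = psi(ss) @ theta / psi(ss).sum() * (g - x0) * ss
    a = (k * (g - xs) - c * v + f) / tau
    v += a * dt
    xs += v * dt
    ss += (-alpha_s * ss / tau) * dt
print(g, xs)                                              # the rollout should end near the goal
```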

Robustifying Motions
- a motion from imitation learning is fragile; robustify it by self-exploration
- competing RL methods:
  PI²: Policy Improvement with Path Integrals [Stefan Schaal]
  PoWER: Policy Learning by Weighting Exploration with the Returns [Jan Peters]

Policy Improvement with Path Integrals (PI²) [Evangelos Theodorou, 2010]
- optimize the shape parameters θ w.r.t. a cost function J
- use direct reinforcement learning: exploration directly in the policy parameter space θ
- PI² is derived from principles of optimal control
- update rule based on cost-weighted averaging (next slide)

PI² Algorithm (cost-weighted averaging)   [figures from Stulp]
Input: DMP with initial parameters θ and cost function J.
While the cost has not converged:
  Explore:
  - sample exploration vectors ε_{i,k} ~ N(0, Σ) and set θ_k = θ + ε_k
  - execute the DMP:  τ ẍ_t = k (g − x_t) − c ẋ_t + g_t^T (θ + ε_{i,k})
  - determine the cost J(τ_i) of each rollout
  Update:
  - weighted averaging with a Boltzmann distribution:
      P(τ_{i,k}) = exp(−J(τ_{i,k}) / λ) / Σ_{k′} exp(−J(τ_{i,k′}) / λ)
      δθ_{t_i} = Σ_{k=1}^K P(τ_{i,k}) ε_{i,k}
  - parameter update: θ ← θ + δθ
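
A compact sketch of the cost-weighted averaging update, simplified to weight whole rollouts rather than per-time-step costs (so it is closer to a PI²-style black-box variant than to the full algorithm); the toy quadratic cost and all constants are assumptions:

```python
import numpy as np

def pi2_update(theta, J_of_theta, n_rollouts=10, sigma=0.1, lam=0.1, n_iters=200, seed=0):
    """Cost-weighted averaging update of the DMP shape parameters theta.
    J_of_theta maps a parameter vector to a scalar rollout cost; it stands in for
    'execute the DMP and determine the cost', which on a robot is a real rollout."""
    rng = np.random.default_rng(seed)
    theta = theta.copy()
    for _ in range(n_iters):
        eps = rng.normal(0.0, sigma, size=(n_rollouts, theta.size))   # exploration noise
        costs = np.array([J_of_theta(theta + e) for e in eps])        # J(tau_k)
        w = np.exp(-(costs - costs.min()) / lam)                      # Boltzmann weighting
        P = w / w.sum()
        theta = theta + P @ eps                                       # cost-weighted average
    return theta

# toy cost: quadratic distance of the parameters to some unknown optimum
target = np.array([0.3, -0.7, 1.2])
J = lambda th: float(np.sum((th - target) ** 2))
print(pi2_update(np.zeros(3), J))     # should move the parameters toward `target`
```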