CITEC Summer School 2013: Learning From Physics to Knowledge
Selected Learning Methods
Robert Haschke
Neuroinformatics Group, CITEC, Bielefeld University
September 2013
Outline
1 Representations
2 Function Learning
3 Perceptual Grouping
4 Imitation Learning
Outline
1 Representations: Feature Space Expansion, Gaussian Processes, Reservoir Computing
2 Function Learning
3 Perceptual Grouping
4 Imitation Learning
Feature Space Expansion
High-dimensional feature spaces facilitate computations, e.g. they enable linear separability of class regions (see the sketch below).
- polynomial expansion: (x_1, ..., x_d) -> (x_1, ..., x_d, ..., x_i x_j, ..., x_i x_j x_k, ...)
- time history: use (x_t, x_{t-1}, ..., x_{t-k}) in R^{d(k+1)}
- filters: temporal convolution with kernels, x~(t) = ∫ K(t, t′) x(t′) dt′
- kernel trick
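To make the linear-separability claim concrete, here is a minimal sketch (not from the slides): XOR is not linearly separable in the raw inputs (x1, x2), but appending the product term x1·x2 as a polynomial feature lets a plain least-squares linear classifier solve it exactly.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1], dtype=float)  # XOR labels

def poly_expand(X):
    """(x1, x2) -> (1, x1, x2, x1*x2): degree-2 monomial features."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.stack([np.ones(len(X)), x1, x2, x1 * x2], axis=1)

Phi = poly_expand(X)
# least-squares fit of a linear classifier in the expanded feature space
w = np.linalg.lstsq(Phi, y, rcond=None)[0]
print(np.sign(Phi @ w))  # reproduces the XOR labels [-1, 1, 1, -1]
```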
Kernel Trick
linear regressors or perceptrons often have the form
y(x) = w^T x = Σ_{α=1}^N λ_α x_α^T x = Σ_α w_α φ(x_α)^T φ(x) = Σ_α w_α K(x_α, x)
- goal: introduce non-linear, high-dimensional features φ_k(x)
- the kernel directly computes the scalar product of the feature vectors
- typical examples, e.g. in support vector machines (SVM):
  polynomial kernel: K(x, x′) = (x^T x′ + 1)^p
  Gaussian kernel: K(x, x′) = exp(−‖x − x′‖² / 2σ²)
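A quick numerical check of the kernel trick, as a sketch: for the degree-2 polynomial kernel the explicit feature map φ can still be written down, and K(x, x′) reproduces φ(x)^T φ(x′) without ever forming φ. The φ below is one standard construction for 2D inputs.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for 2D input."""
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1**2, x2**2,
                     np.sqrt(2) * x1 * x2])

def K(x, xp, p=2):
    """Polynomial kernel: scalar product in feature space, computed directly."""
    return (x @ xp + 1.0) ** p

x, xp = np.array([0.3, -1.2]), np.array([2.0, 0.5])
print(K(x, xp), phi(x) @ phi(xp))  # identical up to rounding
```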
Example: Bayesian Linear Regression
f(x, w) = w^T φ(x),  y = f(x, w) + η
p(y | x, w) = N(w^T φ(x), β^{-1})  (Gaussian data distribution)
ŵ_ML = (Φ^T Φ)^{-1} Φ^T y  (maximum likelihood estimator), with Φ^T = [φ(x_1), ..., φ(x_N)] in R^{d×N}, y^T = [y_1, ..., y_N] in R^N
p(w) = N(0, α^{-1} I)  (Gaussian a-priori distribution)
p(w | y) = N(m_N, S_N)  (a-posteriori distribution)
ŵ_MAP = m_N = β S_N Φ^T y  (MAP estimator), with S_N = (α I + β Φ^T Φ)^{-1} in R^{d×d}
ŷ(x) = φ(x)^T ŵ_MAP = β φ(x)^T S_N Φ^T y = Σ_α β φ(x)^T S_N φ(x_α) y_α = Σ_α K(x, x_α) y_α, with K(x, x_α) = β φ(x)^T S_N φ(x_α)
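The MAP equations above translate directly into code. A minimal sketch, assuming Gaussian bumps as the basis functions φ (any feature map would do) and illustrative values for the precisions α and β:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 1e-2, 25.0             # prior precision, noise precision
centers = np.linspace(0, 1, 10)

def phi(x):                          # x: (N,) -> feature matrix (N, d)
    return np.exp(-0.5 * ((x[:, None] - centers[None, :]) / 0.1) ** 2)

x = rng.uniform(0, 1, 30)
y = np.sin(2 * np.pi * x) + rng.normal(0, 1 / np.sqrt(beta), 30)

Phi = phi(x)                                           # (N, d)
S_N = np.linalg.inv(alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi)
w_map = beta * S_N @ Phi.T @ y                         # m_N

x_test = np.linspace(0, 1, 5)
y_hat = phi(x_test) @ w_map                            # predictive mean
var = 1 / beta + np.sum(phi(x_test) @ S_N * phi(x_test), axis=1)
print(y_hat, np.sqrt(var))                             # mean and stddev
```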
Gaussian Processes
predictive distribution of Bayesian Linear Regression:
p(y | x) = N(ŵ_MAP^T φ(x), σ_N²(x)),  σ_N²(x) = β^{-1} + φ(x)^T S_N φ(x)
variance: data noise + uncertainty of ŵ_MAP
[figure sequence: predictive mean and stddev tube evolving as training points are added one by one]
Gaussian Processes
- a distribution over parameters w induces a distribution over estimator functions
- idea of Gaussian Processes: directly operate on function distributions, skipping the parameter representation
Definition: A Gaussian process is a collection of random variables, any finite collection of which has a joint Gaussian distribution. A Gaussian process is fully specified by a mean function and a covariance function k(x, x′).
Example: linear model f(x) = w^T φ(x) with Gaussian prior p(w) = N(0, α^{-1} I); any tuple [f(x_1) ... f(x_N)] has a Gaussian distribution (as a linear combination of the Gaussian-distributed variables w)
Specification (mean + covariance function):
mean: E[f(x)] = E[w^T] φ(x) = 0
covariance: E[f(x) f(x′)] = φ(x)^T E[w w^T] φ(x′) = α^{-1} φ(x)^T φ(x′) ≡ k(x, x′)
Gaussian Process Regression
How can we exploit GPs for regression?
- N training points x_1 ... x_N induce a Gaussian distribution of the associated function values f_N = [f(x_1) ... f(x_N)].
- Embedding an additional (N+1)-th query point x_{N+1} again yields a Gaussian distribution, now of f_{N+1} = [f(x_1) ... f(x_N), f(x_{N+1})].
- Evaluate the predictive distribution p(f(x_{N+1}) | f(x_1) ... f(x_N)).
Gaussian Process Regression: Computational Steps
p(f_{N+1}) = N(0, K_{N+1}),  K_{N+1} = [ K_N  k ; k^T  c ] in R^{(N+1)×(N+1)}
(K_N)_{nm} = k(x_n, x_m),  k = [k(x_1, x_{N+1}) ... k(x_N, x_{N+1})]^T in R^N,  c = k(x_{N+1}, x_{N+1})
predictive distribution: p(f_{N+1} | f_N) is again Gaussian, with mean μ(x_{N+1}) = k^T K_N^{-1} f_N and variance σ²(x_{N+1}) = c − k^T K_N^{-1} k
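A minimal sketch of these computational steps, assuming a Gaussian (RBF) covariance function and a small jitter term to keep K_N invertible:

```python
import numpy as np

def k(a, b, sigma=0.3):
    """Gaussian covariance function on 1D inputs; returns (len(a), len(b))."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / sigma ** 2)

rng = np.random.default_rng(1)
x_train = rng.uniform(0, 2 * np.pi, 8)
f_train = np.sin(x_train)

K_N = k(x_train, x_train) + 1e-6 * np.eye(len(x_train))  # jitter
K_inv = np.linalg.inv(K_N)

def predict(x_query):
    kv = k(x_train, x_query)               # k: (N, M)
    mu = kv.T @ K_inv @ f_train            # k^T K_N^{-1} f_N
    c = k(x_query, x_query)
    var = np.diag(c - kv.T @ K_inv @ kv)   # c - k^T K_N^{-1} k
    return mu, var

mu, var = predict(np.array([1.0, 4.0]))
print(mu, np.sqrt(var))
```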
Multilayer Perceptron
- hidden layer serves as high-dimensional feature space
- backpropagation creates optimal features
- but: input weights adapt only slowly
Extreme Learning Machine
- simplification: fixed, random input features
- linear readout facilitates learning: regression methods
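A minimal ELM sketch under these assumptions: tanh hidden units with fixed random weights, and a readout fitted by ridge regression (the regularizer λ is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(2)
n_in, n_hidden = 1, 100
W_in = rng.normal(0, 1, (n_hidden, n_in))   # fixed random input weights
b = rng.uniform(-1, 1, n_hidden)

def hidden(X):                              # X: (N, n_in)
    return np.tanh(X @ W_in.T + b)          # random nonlinear features

X = np.linspace(-3, 3, 200)[:, None]
y = np.sinc(X[:, 0])

H = hidden(X)
lam = 1e-3                                  # ridge regularizer
W_out = np.linalg.solve(H.T @ H + lam * np.eye(n_hidden), H.T @ y)
print(np.mean((H @ W_out - y) ** 2))        # training MSE
```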
Reservoir Computing
- recurrent reservoir allows for temporal dynamics
- even richer feature space
- linear readout: regression
- variants: echo state network, liquid state machine, backpropagation-decorrelation
How to tune the reservoir? echo state property: activity decays without excitation
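A minimal echo state network sketch; scaling the recurrent weight matrix to a spectral radius below 1 is the usual recipe for the echo state property (an implementation choice here, not spelled out on the slide):

```python
import numpy as np

rng = np.random.default_rng(3)
n_res = 200
W = rng.normal(0, 1, (n_res, n_res))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))   # spectral radius 0.9
W_in = rng.uniform(-0.5, 0.5, (n_res, 1))

phase = np.linspace(0, 8 * np.pi, 405)
u = np.sin(phase[:400])[:, None]            # input signal
target = np.sin(phase[5:])                  # same signal, 5 steps ahead

x = np.zeros(n_res)
states = []
for t in range(len(u)):
    x = np.tanh(W @ x + W_in @ u[t])        # reservoir update
    states.append(x.copy())
S = np.array(states)[50:]                   # drop washout transient
W_out = np.linalg.lstsq(S, target[50:], rcond=None)[0]  # linear readout
print(np.mean((S @ W_out - target[50:]) ** 2))
```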
Associative Reservoir Network [Steil]
- learning in both directions: input forcing, bidirectional association
- forward mapping and inverse mapping
Outline
1 Representations
2 Function Learning: Parameterized SOM, Unsupervised Kernel Regression
3 Perceptual Grouping
4 Imitation Learning
Parameterized SOM [Ritter 1993]
- fast learning of functions and inverses
- generalization of self-organizing maps (SOM) from discrete to continuous manifolds
- SOM: discrete mapping x -> C, best-match node c with reference vector w_c
- PSOM: continuous mapping x -> S, coordinate system S spanned by the nodes c in A, manifold M = {w(s)} in input space X
[figure: node grid c_11 ... c_33 in S and the induced manifold M = w(S) in input space (x_1, x_2, x_3)]
PSOM: Function Modeling
linear superposition of basis functions H(c, s):  w(s) = Σ_{c in A} H(c, s) w_c
H(c, s) should
- form an orthonormal basis system: H(c, c′) = δ_{cc′}
- represent constant functions: Σ_{c in A} H(c, s) = 1 for all s in S
PSOM: Function Modeling
simple polynomials along every grid dimension, e.g.
- piecewise linear functions
- Lagrange polynomials: h(c, s) = Π_{c′ in A, c′ ≠ c} (s − c′) / (c − c′)
[plots: one-dimensional basis functions]
multiplicative combination of multiple grid dimensions: H(c, s) = Π_{ν=1}^m h_ν(s_ν, A_ν)
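A minimal sketch of the 1D Lagrange basis and the resulting PSOM mapping w(s) on a 3-node grid (grid positions and reference values are illustrative):

```python
import numpy as np

A = np.array([0.0, 0.5, 1.0])        # grid node coordinates
w_c = np.array([0.0, 1.0, 0.0])      # reference vectors (here: scalars)

def h(c, s):
    """Lagrange polynomial through the grid nodes: h(c, c') = delta_cc'."""
    others = A[A != c]
    return np.prod([(s - cp) / (c - cp) for cp in others])

def w(s):
    """PSOM mapping: w(s) = sum_c H(c, s) w_c."""
    return sum(h(c, s) * wc for c, wc in zip(A, w_c))

print([h(c, 0.5) for c in A])   # [0, 1, 0]: orthonormality at the nodes
print(w(0.25))                  # smooth interpolation between the nodes
```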
Function Learning Parameterized SOM.8.8.8.6.6.6.4.4.4.2.2.2.8.6.8.6.8.6.4.2.5.4.2.5.4.2.5.8.6.4.2.8.6.4.2.8.6.4.2 linear basis functions.8.6.8.6.8.6.4.2.5.4.2.5.4.2.5.8.8.8.6.6.6.4.4.4.2.2.2.8.6.8.6.8.6.4.2.5.4.2.5.4.2.5 Robert Haschke (CITEC) Learning From Physics to Knowledge Sep 23 7 / 44
Function Learning Parameterized SOM.8.8.8.6.6.6.4.4.4.2.2.2.2.2.2.8.6.8.6.8.6.4.2.5.4.2.5.4.2.5.8.6.4.2.8.6.4.2.8.6.4.2 quadratic basis functions.2.2.2.8.6.8.6.8.6.4.2.5.4.2.5.4.2.5.8.8.8.6.6.6.4.4.4.2.2.2.2.2.2.8.6.8.6.8.6.4.2.5.4.2.5.4.2.5 Robert Haschke (CITEC) Learning From Physics to Knowledge Sep 23 8 / 44
Inverse Mapping
find coordinates s* closest to observation w~:  s* = arg min_{s in S} ‖w~ − w(s)‖²
using gradient descent  s_{t+1} = s_t + η ∇_s w(s_t)^T (w~ − w(s_t))
starting at the closest discrete node  s_0 = arg min_{c in A} ‖w~ − w_c‖
Example: Kinematics Learning
Unsupervised Kernel Regression [Klanke 2006]
- learning of continuous non-linear manifolds
- generalizes from the fixed PSOM grid
- employs an unsupervised formulation of the Nadaraya-Watson estimator
Y = [y_1, y_2, ..., y_N] in R^{d×N}  observed data
X = [x_1, x_2, ..., x_N] in R^{q×N}  latent parameters
y = f(x; X)  corresponding functional relationship
f(x; X) = Σ_i y_i K(x − x_i) / Σ_j K(x − x_j),  K: Gaussian kernel
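A minimal sketch of the Nadaraya-Watson estimator f(x; X) used by UKR, with an isotropic Gaussian kernel (the bandwidth σ is an illustrative choice); note that UKR additionally optimizes the latent positions X, which this forward mapping alone does not do:

```python
import numpy as np

def nw_estimate(x, X, Y, sigma=0.2):
    """x: (q,) query; X: (q, N) latent points; Y: (d, N) observations."""
    w = np.exp(-0.5 * np.sum((X - x[:, None]) ** 2, axis=0) / sigma ** 2)
    return Y @ (w / w.sum())     # kernel-weighted average of the data

# toy example: 1D latent space, 2D observations on a circle segment
N = 50
X = np.linspace(0, 1, N)[None, :]                  # latent parameters
Y = np.stack([np.cos(2 * np.pi * X[0]), np.sin(2 * np.pi * X[0])])
print(nw_estimate(np.array([0.25]), X, Y))         # approx (0, 1)
```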
UKR Learning
Minimize the reconstruction error  R(X) = (1/N) Σ_m ‖y_m − f_{−m}(x_m; X)‖²
where f_{−m} excludes the m-th sample (leave-one-out cross-validation, preventing the trivial solution in which each latent point reconstructs only itself)
[plots: latent configuration at initialization and after several training steps]
Learned Manipulation Manifold
Manipulation Manifold in Action
Shadow Robot Hand Opening a Bottle Cap [Humanoids 2011]
Outline
1 Representations
2 Function Learning
3 Perceptual Grouping: Gestalt Laws, Competitive Layer Model, Kuramoto Oscillator Network, Comparison, Learning
4 Imitation Learning
Perceptual Grouping
examples: colour segmentation, texture segmentation, segmenting cell images, contour grouping
Gestalt Laws [Wertheimer 1923]
proximity, similarity, closure, continuation
-> expressed as attraction and repulsion between features
Interaction Matrix
- compatibility of feature pairs induces an interaction matrix: attraction = positive feedback, repulsion = negative feedback
- for a target segmentation, its block structure defines the groups
- with real features, the interaction matrix is unsorted and noisy
- -> a robust grouping dynamics is required
[figures: interaction matrices for a target segmentation (block-sorted) and for real features (unsorted, noisy)]
Competitive Layer Model (CLM) [Ritter 1990]
1 input: set of feature vectors m_r; examples: position m_r = (x_r, y_r)^T, oriented line features m_r = (x_r, y_r, φ_r)^T
2 layered architecture of neurons: neurons x_rα, L layers α = 1, ..., L, columns r of neurons; activation: linear threshold σ(x) = max(0, x)
3 goal: grouping as activation within layers, label α̂(r) in {1, 2, ..., L}
4 lateral compatibility interaction f_rr′ induces grouping: f_rr′ = compatibility of the feature pair (m_r, m_r′)
5 vertical competition (inhibition J) pushes groups to layers
6 column-wise stimulation with feature-dependent bias h_r: overall activity per column reflects the significance of feature m_r
simulation of the recurrent dynamics:
ẋ_rα = −x_rα + σ( J (h_r − Σ_β x_rβ) + Σ_{r′} f_rr′ x_{r′α} )
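A minimal sketch of these dynamics (Euler integration on a toy two-group interaction matrix; step size, gains and group structure are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(4)
N, L, J = 20, 2, 1.0                       # features, layers, inhibition
labels = np.repeat([0, 1], N // 2)         # two ground-truth groups
f = np.where(labels[:, None] == labels[None, :], 1.0, -1.0)  # interactions
np.fill_diagonal(f, 0.0)
h = np.ones(N)                             # feature significance (bias)

x = rng.uniform(0, 0.1, (N, L))            # activities x_ra
for _ in range(500):
    drive = J * (h - x.sum(axis=1))[:, None] + f @ x
    x += 0.05 * (-x + np.maximum(0.0, drive))   # sigma = linear threshold

print(x.argmax(axis=1))                    # layer assignment per feature
```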
Kuramoto Oscillator Network [Meier 2013]
- more efficient grouping dynamics based on coupled oscillators: phase θ_r and frequency ω_r, phase coupling through f_rr′
θ̇_r = ω_r + (K/N) Σ_{r′=1}^N f_rr′ sin(θ_r′ − θ_r)
- phase similarity determines frequency grouping: similar features share the same frequency
ω_r = ω( argmax_α Σ_{r′ in N(α)} f_rr′ (cos(θ_r − θ_r′) + 1) / 2 )
[figure: oscillators O_1 ... O_3 with phase coupling and layer frequencies ω_1 ... ω_L]
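A minimal sketch of the phase dynamics above on the same kind of toy two-group interaction matrix (the frequency-assignment rule is omitted; all oscillators share one base frequency):

```python
import numpy as np

rng = np.random.default_rng(5)
N, K, dt = 20, 2.0, 0.05
labels = np.repeat([0, 1], N // 2)
f = np.where(labels[:, None] == labels[None, :], 1.0, -1.0)
omega = np.ones(N)                          # common base frequency
theta = rng.uniform(0, 2 * np.pi, N)

for _ in range(2000):
    # element [r, r'] = f_rr' * sin(theta_r' - theta_r)
    coupling = (f * np.sin(theta[None, :] - theta[:, None])).sum(axis=1)
    theta += dt * (omega + (K / N) * coupling)

# features of the same group end up (nearly) in phase
print(np.round(np.cos(theta), 2))
```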
Example: Contour Grouping
Evaluation: CLM vs. Oscillator Network
- groups of features; layers resp. frequencies (only as many as needed)
- different amounts of noise in the interaction matrices
Figure 1: Interaction matrices with different amounts (0%, 10%, 20%, 30%, 40%) of inverted interaction values. A black pixel represents attraction, white is repelling.
Evaluation Results
- similar grouping quality [plot: grouping quality (mean and stddev) vs. % of noise, CLM vs. oscillators]
- reduced computational complexity, increased convergence speed [plots: # of update steps (mean and stddev) vs. % of noise]
- faster recovery on a changing interaction matrix (splitting from 1 to 2 groups after initial convergence) [plot: average number of steps vs. % of noise]
How to integrate learning?
- the recurrent dynamics robustly creates the grouping
- the dynamics is determined by the interaction matrix f_rr′
- How to learn the compatibilities?
- the CLM dynamics is extremely robust w.r.t. noise in the interaction -> it suffices to learn a coarse interaction matrix
Learning Architecture [Weng 2006]
- Compute a more general distance function on feature pairs
- Learn distance prototypes from labeled samples (vector quantization)
- Exploit the labels to count positively/negatively weighted prototypes
Overview: labeled training data -> proximity vectors d(m_r, m_r′) in proximity space D -> clustering into Voronoi cells around proximity prototypes -> positive/negative coefficients -> interaction function (f_rr′ > 0 / f_rr′ < 0)
Outline
1 Representations
2 Function Learning
3 Perceptual Grouping
4 Imitation Learning: Dynamic Movement Primitives, Reinforcement Learning
Imitation Learning
- learn from observations: How to observe actions? Which elements are important?
- What to learn? How to represent observed actions? -> Dynamic Movement Primitives
- improving + adapting motion, autonomous exploration -> Reinforcement Learning: Policy Improvement with Path Integrals (PI²)
Dynamic Movement Primitives [Ijspeert, Nakanishi, Schaal, ICRA 2002]
spring-damper system generates a basic motion towards the goal:
τ ẍ_t = k (g − x_t) − c ẋ_t
add an external force to represent complex trajectory shapes:
τ ẍ_t = k (g − x_t) − c ẋ_t + g_t^T θ
External Force
f(s) = g_t^T θ = ( Σ_i θ_i ψ_i(s) / Σ_i ψ_i(s) ) (g − x_0) s
- weighted sum of basis functions ψ_i with soft-max normalization
- amplitude scaled by the initial distance to the goal (g − x_0)
- influence weighted by the canonical time s
- Gaussian basis functions ψ_i(s) = exp(−h_i (s − c_i)²), centers c_i logarithmically distributed in [0 ... 1]   [Schaal et al., Brain Research, 2007]
Canonical System
decouple the external force from the spring-damper evolution via a new phase/time variable s:
τ ṡ = −α_s s
- s is initially set to 1 ... and exponentially converges to 0
perturbation-coupled variant:  τ ṡ = −α_s s / (1 + α_c (x_actual − x_expected)²)
- pauses the influence of the force on perturbations
Properties
τ ẍ_t = k (g − x_t) − c ẋ_t + g_t^T θ,   τ ṡ = −α_s s,   f(s) = g_t^T θ = ( Σ_i θ_i ψ_i(s) / Σ_i ψ_i(s) ) (g − x_0) s
- convergence to the goal g
- motions are self-similar for different goal or start points
- coupling of multiple DOFs through the canonical phase s
- adapt τ for temporal scaling
- robust to perturbations due to the attractor dynamics
- decouples the basic goal-directed motion from the task-specific trajectory shape
- weights θ_i can be learned with linear regression
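A minimal 1-DOF sketch that puts the three equations together (gains, basis parameters and integration step are illustrative choices, not values from the slides):

```python
import numpy as np

n_bf, tau, k, c, alpha_s = 20, 1.0, 100.0, 20.0, 4.0
centers = np.exp(-alpha_s * np.linspace(0, 1, n_bf))   # log-spaced in (0,1]
widths = 1.0 / np.diff(centers, append=centers[-1] / 2) ** 2

def forcing(s, theta, g, x0):
    psi = np.exp(-widths * (s - centers) ** 2)          # basis activations
    return (psi @ theta) / psi.sum() * (g - x0) * s     # normalized, scaled

def rollout(theta, x0=0.0, g=1.0, dt=0.002, T=1.0):
    x, xd, s, traj = x0, 0.0, 1.0, []
    for _ in range(int(T / dt)):
        xdd = (k * (g - x) - c * xd + forcing(s, theta, g, x0)) / tau
        xd += xdd * dt
        x += xd * dt
        s += (-alpha_s * s / tau) * dt                  # canonical system
        traj.append(x)
    return np.array(traj)

print(rollout(np.zeros(n_bf))[-1])   # plain spring-damper converges to g = 1
```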
Learning from Demonstration
- record a motion x(t), ẋ(t), ẍ(t)
- choose τ to match the duration
- evolve the canonical system s(t)
- f_target(s) = τ ẍ(t) − k (g − x(t)) + c ẋ(t)
- minimize E = Σ_s (f_target(s) − f(s))² with regression
[plots: demonstrated and reproduced trajectories]   [Ijspeert, Nakanishi, Schaal, ICRA 2002]
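Continuing the sketch above (it reuses n_bf, tau, k, c, alpha_s, centers, widths and rollout), the imitation step computes f_target from a recorded trajectory and fits θ by least squares:

```python
import numpy as np

dt = 0.002
t = np.arange(0, 1, dt)
x = np.sin(np.pi * t) * 0.5 + t              # demonstrated motion
xd = np.gradient(x, dt)
xdd = np.gradient(xd, dt)
x0, g = x[0], x[-1]

s = np.exp(-alpha_s * t / tau)               # canonical phase s(t)
f_target = tau * xdd - k * (g - x) + c * xd

# regression on the normalized, scaled basis activations
Psi = np.exp(-widths[None, :] * (s[:, None] - centers[None, :]) ** 2)
A = Psi / Psi.sum(axis=1, keepdims=True) * ((g - x0) * s)[:, None]
theta = np.linalg.lstsq(A, f_target, rcond=None)[0]
print(np.abs(rollout(theta, x0, g)[-1] - g))  # reproduction ends near g
```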
Robustifying Motions
- a motion from imitation learning is fragile -> robustify by self-exploration
- competing RL methods:
  PI²: Policy Improvement with Path Integrals [Stefan Schaal]
  PoWER: Policy Learning by Weighting Exploration with the Returns [Jan Peters]
Policy Improvement with Path Integrals (PI²) [Theodorou 2010]
- Optimize the shape parameters θ w.r.t. a cost function J
- Use direct reinforcement learning: exploration directly in the policy parameter space θ
- PI² is derived from principles of optimal control; its update rule is based on cost-weighted averaging (next slide)
PI² Algorithm [figures from Stulp]
Input: DMP with initial parameters θ, cost function J
τ ẍ_t = k (g − x_t) − c ẋ_t + g_t^T (θ + ε_{i,k})
While (cost not converged):
- Explore: sample exploration vectors ε_{i,k} ~ N(0, Σ), θ_k = θ + ε_k; execute the DMP; determine the cost J(τ_{i,k})
- Update: weighted averaging with a Boltzmann distribution
  P(τ_{i,k}) = exp(−(1/λ) J(τ_{i,k})) / Σ_{k′} exp(−(1/λ) J(τ_{i,k′}))
  δθ_{t_i} = Σ_{k=1}^K P(τ_{i,k}) ε_{i,k}
- parameter update: θ <- θ + δθ
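A minimal sketch of the cost-weighted averaging update on a stand-in quadratic cost (a real application would roll out the DMP to evaluate J; λ, Σ and the rollout count are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(6)
dim, K_rollouts, lam, sigma = 10, 15, 0.1, 0.3
theta = rng.normal(0, 1, dim)
J = lambda th: np.sum(th ** 2)               # stand-in cost function

for _ in range(100):
    eps = rng.normal(0, sigma, (K_rollouts, dim))     # exploration noise
    costs = np.array([J(theta + e) for e in eps])
    P = np.exp(-costs / lam)
    P /= P.sum()                             # Boltzmann weighting
    theta = theta + P @ eps                  # cost-weighted average update
print(J(theta))                              # cost decreases towards 0
```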