CITEC SummerSchool 2013 Learning From Physics to Knowledge Selected Learning Methods
CITEC SummerSchool 2013: Learning From Physics to Knowledge. Selected Learning Methods.
Robert Haschke, Neuroinformatics Group, CITEC, Bielefeld University
September 2013
Outline
1. Representations
2. Function Learning
3. Perceptual Grouping
4. Imitation Learning
1. Representations
Subtopics: Feature Space Expansion, Gaussian Processes, Reservoir Computing
Feature Space Expansion
High-dimensional feature spaces facilitate computations, e.g. they enable linear separability of class regions.
- polynomial expansion: (x_1, …, x_d) → (x_1, …, x_d, …, x_i x_j, …, x_i x_j x_k)
- time history: use (x_t, x_{t-1}, …, x_{t-k}) ∈ ℝ^{d(k+1)}
- filters, temporal convolution with kernels: x̃(t) = ∫ K(t, t′) x(t′) dt′
- kernel trick
Kernel Trick
Linear regressors or perceptrons often have the form
y(x) = wᵀx = Σ_{α=1}^N λ_α x_αᵀ x = Σ_α w_α φ(x_α)ᵀ φ(x) = Σ_α w_α K(x_α, x)
- goal: introduce non-linear, high-dimensional features φ_k(x)
- the kernel directly computes the scalar product of the feature vectors
- typical examples, e.g. in support vector machines (SVM):
  - polynomial kernel: K(x, x′) = (x·x′ + 1)^p
  - Gaussian kernel: K(x, x′) = exp(−‖x − x′‖² / (2σ²))
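As a minimal NumPy sketch (values and feature map chosen for illustration, not from the slides), the polynomial kernel for p = 2 and d = 2 can be checked against its explicit feature map φ, confirming that the kernel evaluates the scalar product φ(x)·φ(z) without ever constructing the features:

```python
import numpy as np

def poly_kernel(x, z, p=2):
    # polynomial kernel K(x, z) = (x . z + 1)^p
    return (x @ z + 1.0) ** p

def gauss_kernel(x, z, sigma=1.0):
    # Gaussian kernel K(x, z) = exp(-||x - z||^2 / (2 sigma^2))
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

# explicit feature map of the p=2 polynomial kernel in d=2:
# phi(x) = (1, sqrt(2) x1, sqrt(2) x2, x1^2, x2^2, sqrt(2) x1 x2)
def phi(x):
    return np.array([1.0, np.sqrt(2) * x[0], np.sqrt(2) * x[1],
                     x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

x = np.array([1.0, 2.0])
z = np.array([0.5, -1.0])
same = np.isclose(poly_kernel(x, z), phi(x) @ phi(z))  # kernel = feature dot product
```

The same identity underlies the kernelized regressor above: replacing every inner product φ(x_α)ᵀφ(x) by K(x_α, x) avoids the high-dimensional expansion entirely.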
Example: Bayesian Linear Regression
f(x, w) = wᵀφ(x),  y = f(x, w) + η
- p(y|x, w) = N(wᵀφ(x), β⁻¹)  (Gaussian data distribution)
- ŵ_ML = (ΦᵀΦ)⁻¹ Φᵀ y  (maximum likelihood estimator), with Φᵀ = [φ(x_1), …, φ(x_N)] ∈ ℝ^{d×N} and yᵀ = [y_1, …, y_N] ∈ ℝ^N
- p(w) = N(0, α⁻¹ I)  (Gaussian a-priori distribution)
- p(w|y) = N(m_N, S_N)  (a-posteriori distribution), with MAP estimator ŵ_MAP = m_N = β S_N Φᵀ y and S_N = (α I + β ΦᵀΦ)⁻¹ ∈ ℝ^{d×d}
- ŷ(x) = φ(x)ᵀ ŵ_MAP = β φ(x)ᵀ S_N Φᵀ y = Σ_α β φ(x)ᵀ S_N φ(x_α) y_α = Σ_α K(x, x_α) y_α,  with kernel K(x, x_α) = β φ(x)ᵀ S_N φ(x_α)
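A short NumPy sketch of the MAP estimator (toy data and precisions α, β are illustrative). It also checks the standard identity that the MAP solution equals ridge regression with regularizer λ = α/β:

```python
import numpy as np

rng = np.random.default_rng(0)
# toy data: y = 2x - 1 + noise, with features phi(x) = (1, x)
x = rng.uniform(-1, 1, 20)
y = 2 * x - 1 + 0.1 * rng.standard_normal(20)
Phi = np.column_stack([np.ones_like(x), x])     # design matrix, rows phi(x_n)^T

alpha, beta = 1e-2, 100.0                       # prior and noise precision
S_N = np.linalg.inv(alpha * np.eye(2) + beta * Phi.T @ Phi)
w_map = beta * S_N @ Phi.T @ y                  # MAP estimate = posterior mean m_N

# MAP coincides with ridge regression with lambda = alpha / beta
w_ridge = np.linalg.solve(Phi.T @ Phi + (alpha / beta) * np.eye(2), Phi.T @ y)
```

With weak prior (small α) and low noise (large β), w_map recovers intercept ≈ −1 and slope ≈ 2.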
Gaussian Processes
Predictive distribution of Bayesian Linear Regression:
p(y|x) = N(ŵ_MAPᵀ φ(x), σ_N²(x)),  σ_N²(x) = β⁻¹ + φ(x)ᵀ S_N φ(x)
The variance combines the data noise β⁻¹ and the uncertainty of ŵ_MAP.
Gaussian Processes
- The distribution over parameters w induces a distribution over estimator functions.
- Idea of Gaussian Processes: directly operate on function distributions, skipping the parameter representation.
Definition: A Gaussian process is a collection of random variables, any finite collection of which has a joint Gaussian distribution. A Gaussian process is fully specified by a mean function f̄(x) and a covariance function k(x, x′).
Example: linear model f(x) = wᵀφ(x) with Gaussian prior p(w) = N(0, α⁻¹ I). Any tuple [f(x_1), …, f(x_N)] has a Gaussian distribution, as a linear combination of the Gaussian distributed variables w.
Specification via mean and covariance function (for the linear model f(x) = wᵀφ(x) with prior p(w) = N(0, α⁻¹ I)):
- mean: E[f(x)] = E[wᵀ] φ(x) = 0
- covariance: E[f(x) f(x′)] = φ(x)ᵀ E[wwᵀ] φ(x′) = α⁻¹ φ(x)ᵀ φ(x′) ≡ k(x, x′)
Gaussian Process Regression
How can we exploit GPs for regression?
- N training points x_1, …, x_N induce a Gaussian distribution of the associated function values f_N = [f(x_1), …, f(x_N)].
- Embedding an additional (N+1)-th query point x_{N+1} again yields a Gaussian distribution, now of f_{N+1} = [f(x_1), …, f(x_N), f(x_{N+1})].
- Evaluate the predictive distribution p(f(x_{N+1}) | f(x_1), …, f(x_N)).
Gaussian Process Regression: Computational Steps
p(f_{N+1}) = N(0, K_{N+1}),  K_{N+1} = [ K_N  k ; kᵀ  c ] ∈ ℝ^{(N+1)×(N+1)}
with (K_N)_{nm} = k(x_n, x_m),  k = [k(x_1, x_{N+1}), …, k(x_N, x_{N+1})]ᵀ ∈ ℝ^N,  c = k(x_{N+1}, x_{N+1}).
Predictive distribution: p(f_{N+1} | f_N) is again Gaussian, with mean μ(x_{N+1}) = kᵀ K_N⁻¹ f_N and variance σ²(x_{N+1}) = c − kᵀ K_N⁻¹ k.
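These two formulas are all that is needed for prediction. A minimal NumPy sketch (squared-exponential covariance, toy data, and the jitter term are illustrative choices; the slide's noise-free setting is kept, so the GP interpolates the training values):

```python
import numpy as np

def k(a, b, sigma=0.5):
    # squared-exponential covariance k(x, x')
    return np.exp(-(a - b) ** 2 / (2 * sigma ** 2))

X = np.array([-1.0, 0.0, 1.0])     # training inputs x_1..x_N
f = np.sin(X)                      # observed function values f_N

K_N = k(X[:, None], X[None, :])                  # (K_N)_{nm} = k(x_n, x_m)
K_inv = np.linalg.inv(K_N + 1e-10 * np.eye(3))   # tiny jitter for stability

def predict(x_new):
    kv = k(X, x_new)               # k = [k(x_1, x), ..., k(x_N, x)]
    c = k(x_new, x_new)
    mu = kv @ K_inv @ f            # mean      k^T K_N^{-1} f_N
    var = c - kv @ K_inv @ kv      # variance  c - k^T K_N^{-1} k
    return mu, var

mu0, var0 = predict(0.0)           # at a training point: mean = f, variance ~ 0
mu_mid, var_mid = predict(0.5)     # between points: variance grows
```

At training points the noise-free posterior collapses; away from them the predictive variance reflects the remaining uncertainty.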
Multilayer Perceptron
- hidden layer serves as a high-dimensional feature space
- backpropagation creates optimal features
- but: input weights adapt only slowly
Extreme Learning Machine
- simplification: fixed, random input features
- linear readout facilitates learning: regression methods
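A compact NumPy sketch of this idea (target function, hidden-layer size, and scaling are illustrative): the input weights are drawn once at random and never trained; only the linear readout is fitted by regularized least squares.

```python
import numpy as np

rng = np.random.default_rng(1)
# toy regression target: y = sin(3x) on [-1, 1]
X = np.linspace(-1, 1, 200)[:, None]
y = np.sin(3 * X[:, 0])

# fixed, random hidden layer (input weights are never adapted)
n_hidden = 100
W_in = rng.standard_normal((1, n_hidden)) * 3.0
b = rng.standard_normal(n_hidden)
H = np.tanh(X @ W_in + b)          # random high-dimensional feature matrix

# linear readout by ridge-regularized least squares
lam = 1e-6
W_out = np.linalg.solve(H.T @ H + lam * np.eye(n_hidden), H.T @ y)

err = np.mean((H @ W_out - y) ** 2)   # training mean squared error
```

Because only a linear system is solved, training is orders of magnitude faster than backpropagating through the hidden layer.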
Reservoir Computing
- recurrent reservoir allows for temporal dynamics
- even richer feature space
- linear readout: regression
- variants: echo state network, liquid state machine, backpropagation-decorrelation
How to tune the reservoir? Echo state property: activity decays without excitation.
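The echo state property is commonly enforced by a heuristic: scale the recurrent weight matrix so its spectral radius is below 1. A NumPy sketch (reservoir size and scaling factor are illustrative), checking that activity indeed dies out without input:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
W = rng.standard_normal((n, n)) / np.sqrt(n)   # random recurrent weights

# common heuristic for the echo state property: spectral radius < 1
rho = np.max(np.abs(np.linalg.eigvals(W)))
W *= 0.9 / rho                                  # rescale to spectral radius 0.9

# without excitation, the reservoir state decays ("echoes" fade out)
x = rng.standard_normal(n)
for _ in range(200):
    x = np.tanh(W @ x)
```

Strictly, spectral radius < 1 is a heuristic rather than a sufficient condition for arbitrary inputs, but it is the standard recipe for echo state networks.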
Associative Reservoir Network [Steil]
- learning in both directions: input forcing, bidirectional association
- provides both the forward mapping and the inverse mapping
2. Function Learning
Subtopics: Parameterized SOM, Unsupervised Kernel Regression
Parameterized SOM [Ritter 1993]
- fast learning of functions and inverses
- generalization of self-organizing maps (SOM) from discrete to continuous manifolds
- discrete mapping: each node c ∈ C is mapped to a weight vector w_c in input space (figure: SOM grid with nodes c and weights w(c))
Parameterized SOM [Ritter 1993]
- continuous mapping: s ∈ S ↦ w(s) ∈ M
- coordinate system S spanned by the nodes c ∈ A
- manifold M embedded in the input space X (figure: continuous manifold w(s) passing through the grid nodes w_c)
PSOM: Function Modeling
Linear superposition of basis functions H(c, s):
w(s) = Σ_{c∈A} H(c, s) w_c
The H(c, s) should form an orthonormal basis system, H(c, c′) = δ_{cc′},
and represent constant functions: Σ_{c∈A} H(c, s) = 1 for all s ∈ S.
PSOM: Function Modeling
- simple polynomials along every grid dimension, e.g. piecewise linear functions
- multiplicative combination of multiple grid dimensions: H(c, s) = Π_{ν=1}^m h_ν(s_ν, A_ν)
PSOM: Function Modeling
- Lagrange polynomials: h(c, s) = Π_{c′∈A, c′≠c} (s − c′) / (c − c′)
- multiplicative combination of multiple grid dimensions: H(c, s) = Π_{ν=1}^m h_ν(s_ν, A_ν)
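A small NumPy sketch of the 1-D Lagrange basis (the node grid is illustrative). It verifies the two required properties, orthonormality at the nodes and partition of unity, and that the superposition w(s) = Σ_c H(c, s) w_c interpolates exactly:

```python
import numpy as np

def lagrange_basis(A, s):
    # h_c(s) = prod_{c' in A, c' != c} (s - c') / (c - c'), one value per node c
    A = np.asarray(A, float)
    H = np.ones(len(A))
    for i, c in enumerate(A):
        for cp in A:
            if cp != c:
                H[i] *= (s - cp) / (c - cp)
    return H

A = [0.0, 0.5, 1.0]                  # 1-D grid of node coordinates
H_node = lagrange_basis(A, 0.5)      # orthonormality: h_c(c') = delta_{cc'}
H_mid = lagrange_basis(A, 0.123)     # partition of unity: sum_c h_c(s) = 1

# w(s) = sum_c H(c, s) w_c reproduces any quadratic exactly with 3 nodes
w_nodes = np.array([c ** 2 for c in A])
w_03 = lagrange_basis(A, 0.3) @ w_nodes   # should equal 0.3^2 = 0.09
```

For multi-dimensional grids, H(c, s) is the product of such 1-D bases along each grid dimension, as in the formula above.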
(Figure: linear basis functions)
(Figure: quadratic basis functions)
Inverse Mapping
Find the coordinates s* closest to an observation w*:
s* = argmin_{s∈S} ‖w* − w(s)‖²
using gradient descent
s_{t+1} = s_t + η ∇_s w(s_t)ᵀ (w* − w(s_t)),
starting at the closest discrete node s_0 = argmin_{c∈A} ‖w* − w_c‖.
Example: Kinematics Learning
Unsupervised Kernel Regression [Klanke 2006]
- learning of continuous non-linear manifolds
- generalizes from the fixed PSOM grid
- employs an unsupervised formulation of the Nadaraya-Watson estimator:
Y = [y_1, y_2, …, y_N] ∈ ℝ^{d×N}  observed data
X = [x_1, x_2, …, x_N] ∈ ℝ^{q×N}  latent parameters
y = f(x; X)  corresponding functional relationship
f(x; X) = Σ_i [ K(x − x_i) / Σ_j K(x − x_j) ] y_i,  with Gaussian kernel K ∼ N(0, Σ)
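The estimator itself is easy to state in code. A NumPy sketch (toy curve and bandwidth are illustrative; note that in UKR the latent X would be optimized, whereas here it is simply fixed to show the Nadaraya-Watson forward mapping):

```python
import numpy as np

def nw_estimate(x, X, Y, sigma=0.2):
    # Nadaraya-Watson: f(x; X) = sum_i K(x - x_i) y_i / sum_j K(x - x_j)
    w = np.exp(-np.sum((X - x[:, None]) ** 2, axis=0) / (2 * sigma ** 2))
    return Y @ (w / w.sum())

# toy manifold: observations y_i on a circle, latent x_i in 1-D (q = 1, d = 2)
X = np.linspace(0, 1, 50)[None, :]                            # latent, q x N
Y = np.vstack([np.cos(2 * np.pi * X[0]),
               np.sin(2 * np.pi * X[0])])                     # observed, d x N

y_hat = nw_estimate(np.array([0.5]), X, Y, sigma=0.05)        # point near (-1, 0)
```

UKR learning then treats the latent positions X as free parameters and minimizes the (leave-one-out) reconstruction error of this estimator, as on the next slide.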
UKR Learning
Minimize the leave-one-out cross-validation reconstruction error
R(X) = (1/N) Σ_m ‖y_m − f_{−m}(x_m; X)‖²
(Figure: evolution of the latent configuration over the training iterations, from initialization to convergence.)
Learned Manipulation Manifold
Manipulation Manifold in Action
Shadow Robot Hand Opening a Bottle Cap [Humanoids 2011]
3. Perceptual Grouping
Subtopics: Gestalt Laws, Competitive Layer Model, Kuramoto Oscillator Network, Comparison, Learning
Perceptual Grouping
Examples: colour segmentation, texture segmentation, segmenting cell images, contour grouping.
Gestalt Laws [Wertheimer 1923]
- proximity, similarity, closure, continuation
- modelled as attraction and repulsion between features
Interaction Matrix
- compatibility of feature pairs induces an interaction matrix
- attraction: positive feedback; repulsion: negative feedback
Interaction Matrix
- target segmentation: the block structure of the interaction matrix defines the groups
- real features yield an unsorted and noisy interaction matrix
- hence robust grouping dynamics are required
Competitive Layer Model (CLM) [Ritter 1990]
Input: a set of features, each given as a feature vector m_r. Examples:
- position: m_r = (x_r, y_r)ᵀ
- oriented line features: m_r = (x_r, y_r, φ_r)ᵀ
Competitive Layer Model (CLM) [Ritter 1990]
- layered architecture of neurons x_rα: L layers α = 1, …, L, and one column r per feature
- activation: linear threshold σ(x) = max(0, x)
Competitive Layer Model (CLM) [Ritter 1990]
Goal: grouping expressed as activation within layers, i.e. each feature r receives a label ˆα(r) ∈ {1, …, L}.
Competitive Layer Model (CLM) [Ritter 1990]
Lateral compatibility interaction f_rr′ induces grouping: f_rr′ encodes the compatibility of the feature pair m_r, m_r′.
Competitive Layer Model (CLM) [Ritter 1990]
Vertical competition (vertical inhibition J) pushes each group into a single layer.
Competitive Layer Model (CLM) [Ritter 1990]
Column-wise stimulation with a feature-dependent bias h_r: the overall activity per column reflects the significance of feature m_r.
Competitive Layer Model (CLM) [Ritter 1990]
Simulation of the recurrent dynamics:
ẋ_rα = −x_rα + σ( J (h_r − Σ_β x_rβ) + Σ_{r′} f_{rr′} x_{r′α} )
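A tiny NumPy simulation of these dynamics (4 features, 2 layers; interaction strengths, step size, and the small symmetry-breaking bias in the initial state are illustrative choices, not from the slides). Compatible features end up active in the same layer, incompatible ones in different layers:

```python
import numpy as np

rng = np.random.default_rng(3)
N, L = 4, 2                        # 4 features, 2 layers
# lateral interaction: features {0,1} and {2,3} are mutually compatible
F = -0.8 * np.ones((N, N))
F[:2, :2] = 0.8
F[2:, 2:] = 0.8

J, h = 2.0, np.ones(N)             # vertical inhibition and feature bias
x = 0.1 * rng.random((L, N))       # activities x[alpha, r]
x[0, :2] += 0.05                   # small symmetry-breaking bias
x[1, 2:] += 0.05                   # (layer labels are arbitrary anyway)

dt = 0.05
for _ in range(2000):
    # dx_ra = -x_ra + sigma( J (h_r - sum_b x_rb) + sum_r' f_rr' x_r'a )
    drive = J * (h - x.sum(axis=0))[None, :] + x @ F.T
    x = x + dt * (-x + np.maximum(0.0, drive))

winners = x.argmax(axis=0)         # layer assignment = group label per feature
```

At convergence each column is a winner-take-all: one layer carries the feature's activity, the others decay to zero.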
Kuramoto Oscillator Network [Meier 2013]
- more efficient grouping dynamics based on coupled oscillators, each with phase θ_r and frequency ω_r
- phase coupling by f_rr′:
θ̇_r = ω_r + (K/N) Σ_{r′=1}^N f_{rr′} sin(θ_{r′} − θ_r)
- phase similarity determines frequency grouping: similar features share the same frequency,
ω_r = ω_{α*} with α* = argmax_α Σ_{r′∈N(α)} f_{rr′} (cos(θ_r − θ_{r′}) + 1) / 2
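The phase-coupling part can be sketched in a few lines of NumPy (group sizes, coupling constant K, and step size are illustrative; the frequency-grouping update is omitted for brevity). Attraction (f = +1) synchronizes phases within a group, repulsion (f = −1) drives the two groups apart:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 6
# compatibility: features 0-2 form one group, 3-5 another
F = -np.ones((N, N))
F[:3, :3] = 1.0
F[3:, 3:] = 1.0

theta = rng.uniform(0, 2 * np.pi, N)
omega = np.zeros(N)                 # identical natural frequencies for simplicity
K, dt = 4.0, 0.05
for _ in range(2000):
    # theta_r' = omega_r + (K/N) sum_r' f_rr' sin(theta_r' - theta_r)
    coupling = np.sum(F * np.sin(theta[None, :] - theta[:, None]), axis=1)
    theta = theta + dt * (omega + (K / N) * coupling)

def dist(a, b):
    # phase distance on the circle
    return np.abs(np.angle(np.exp(1j * (a - b))))
```

After convergence, same-group phases nearly coincide while the two groups sit close to anti-phase, which is what the frequency-grouping rule above then reads out.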
Example: Contour Grouping
Evaluation: CLM vs. Oscillator Network
- several groups of features; layers resp. frequencies (only as many as needed)
- different amounts of noise in the interaction matrices
(Figure: interaction matrices with different amounts of inverted interaction values, from 0% to 40%. Black pixels represent attraction, white repulsion.)
Evaluation Results
- similar grouping quality
- reduced computational complexity and increased convergence speed
- faster recovery on a changing interaction matrix (splitting from one into two groups after initial convergence)
(Figures: grouping quality and number of update steps, mean and stddev, for CLM vs. oscillators over the noise level.)
How to integrate learning?
- the recurrent dynamics robustly creates the grouping; the dynamics is determined by the interaction matrix f_rr′
- how to learn the compatibilities?
- the CLM dynamics is extremely robust w.r.t. noise in the interaction, so a coarse interaction matrix suffices
Learning Architecture [Weng 2006]
- compute a more general distance function on feature pairs
- learn distance prototypes from labeled samples (vector quantization)
- exploit the labels to count positively/negatively weighted prototypes
Overview: labeled training data → proximity vectors d(m_r, m_r′) in proximity space D → clustering into Voronoi cells around proximity prototypes → positive/negative coefficients define the interaction function (f_rr′ > 0 or f_rr′ < 0).
4. Imitation Learning
Subtopics: Dynamic Movement Primitives, Reinforcement Learning
Imitation Learning
- learn from observations: How to observe actions? Which elements are important?
- what to learn? How to represent observed actions? → Dynamic Movement Primitives
- improving and adapting motion, autonomous exploration → Reinforcement Learning, Policy Improvement with Path Integrals (PI²)
Dynamic Movement Primitives [Ijspeert, Nakanishi, Schaal, ICRA 2002]
A spring-damper system generates a basic motion towards the goal g:
τ ẍ_t = k (g − x_t) − c ẋ_t
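A minimal Euler integration of this spring-damper system (gains and step size are illustrative; c = 2√k gives critical damping, so the state converges to the goal without overshoot):

```python
import numpy as np

# integrate tau x'' = k (g - x) - c x' with simple Euler steps
tau, k, c = 1.0, 25.0, 10.0      # c = 2 sqrt(k): critically damped
g, x, v = 1.0, 0.0, 0.0          # goal, position, velocity
dt = 0.001
for _ in range(5000):            # simulate 5 seconds
    a = (k * (g - x) - c * v) / tau
    v += dt * a
    x += dt * v
```

On its own this point attractor produces only straight, stereotyped approach motions; the forcing term added next shapes the transient into arbitrary trajectories.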
Dynamic Movement Primitives [Ijspeert, Nakanishi, Schaal, ICRA 2002]
Add an external force to represent complex trajectory shapes:
τ ẍ_t = k (g − x_t) − c ẋ_t + g_tᵀ θ
104 Imitation Learning Dynamic Movement Primitives Dynamic Movement Primitives [Ijspeert, Nakanishi, Schaal, ICRA 2] add external force to represent complex trajectory shapes τẍ t = k (g x t ) c ẋ t + g T t θ Robert Haschke (CITEC) Learning From Physics to Knowledge Sep / 44
105 Imitation Learning Dynamic Movement Primitives Dynamic Movement Primitives [Ijspeert, Nakanishi, Schaal, ICRA 2] add external force to represent complex trajectory shapes τẍ t = k (g x t ) c ẋ t + g T t θ Robert Haschke (CITEC) Learning From Physics to Knowledge Sep / 44
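The spring-damper dynamics above can be integrated numerically to see the goal-directed transient. A minimal sketch, assuming illustrative gains k = 25, c = 10 (chosen here for critical damping, not taken from the slides) and simple Euler integration:

```python
import numpy as np

def integrate_dmp(x0, g, k=25.0, c=10.0, tau=1.0, dt=0.001, steps=2000):
    """Euler-integrate the unforced spring-damper part of a DMP:
    tau * xddot = k*(g - x) - c*xdot, starting at rest in x0."""
    x, xd = x0, 0.0
    trajectory = [x]
    for _ in range(steps):
        xdd = (k * (g - x) - c * xd) / tau  # spring-damper acceleration
        xd += xdd * dt
        x += xd * dt
        trajectory.append(x)
    return np.array(trajectory)

traj = integrate_dmp(x0=0.0, g=1.0)
print(abs(traj[-1] - 1.0) < 0.01)  # converges to the goal g
```

With k = 25, c = 10 the system is critically damped (c² = 4k for τ = 1), so it approaches g without overshoot.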
External Force
f(s) = g_t^T θ = (Σ_i θ_i ψ_i(s) / Σ_i ψ_i(s)) · (g − x_0) · s
- weighted sum of basis functions ψ_i
- soft-max normalization
- amplitude scaled by initial distance to goal (g − x_0)
- influence weighted by canonical time s
Gaussian basis functions ψ_i(s) = exp(−h_i (s − c_i)²), centers c_i logarithmically distributed in [0…1]
[Schaal et al., Brain Research, 2007]
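The normalized forcing term can be written directly from the formula above. A sketch; the number of basis functions, their widths h_i, and the way the centers are spread are illustrative placeholders, not values from the slides:

```python
import numpy as np

def forcing_term(s, theta, centers, widths, g, x0):
    """f(s) = (sum_i theta_i psi_i(s)) / (sum_i psi_i(s)) * (g - x0) * s,
    with Gaussian basis functions psi_i(s) = exp(-h_i * (s - c_i)^2)."""
    psi = np.exp(-widths * (s - centers) ** 2)
    return (psi @ theta) / psi.sum() * (g - x0) * s

# 10 basis functions; centers placed logarithmically in (0, 1] by pushing a
# uniform grid through the phase decay s = exp(-alpha_s * t) (alpha_s = 2 assumed)
n = 10
centers = np.exp(-2.0 * np.linspace(0.0, 1.0, n))
widths = np.full(n, 50.0)
theta = np.zeros(n)

print(forcing_term(0.5, theta, centers, widths, g=1.0, x0=0.0))  # 0.0 for zero weights
```

Because of the soft-max normalization, setting all weights equal to w makes f(s) = w · (g − x_0) · s exactly.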
Canonical System
f(s) = g_t^T θ = (Σ_i θ_i ψ_i(s) / Σ_i ψ_i(s)) · (g − x_0) · s
decouple the external force from the spring-damper evolution via a new phase/time variable s:
τṡ = −α_s s
s is initially set to 1 and exponentially converges to 0
coupling to tracking error, τṡ = −α_s s + α_c (x_actual − x_expected)², pauses the influence of the force under perturbations
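The phase decay can be simulated in a few lines. A sketch assuming α_s = 2 and Euler steps, both illustrative choices:

```python
import numpy as np

def evolve_phase(alpha_s=2.0, tau=1.0, dt=0.001, steps=1000):
    """Canonical system tau * sdot = -alpha_s * s, starting at s = 1:
    the phase decays exponentially from 1 towards 0."""
    s = 1.0
    phases = [s]
    for _ in range(steps):
        s += (-alpha_s * s / tau) * dt
        phases.append(s)
    return np.array(phases)

phases = evolve_phase()
print(phases[-1])  # close to exp(-2) after 1 second
```

Since f(s) is weighted by s, the forcing term fades out as the phase dies away, leaving only the goal-converging spring-damper dynamics.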
Properties
τẍ_t = k (g − x_t) − c ẋ_t + g_t^T θ,  τṡ = −α_s s,  f(s) = g_t^T θ = (Σ_i θ_i ψ_i(s) / Σ_i ψ_i(s)) · (g − x_0) · s
- convergence to goal g
- motions are self-similar for different goal or start points
- coupling of multiple DOF through canonical phase s
- adapt τ for temporal scaling
- robust to perturbations due to attractor dynamics
- decoupling of basic goal-directed motion from task-specific trajectory shape
- weights θ_i can be learned with linear regression
Learning from Demonstration
- record motion x(t), ẋ(t), ẍ(t)
- choose τ to match duration
- evolve canonical system s(t)
- compute target force f_target(s) = τẍ(t) − k (g − x(t)) + c ẋ(t)
- minimize E = Σ_s (f_target(s) − f(s))² with regression
[Ijspeert, Nakanishi, Schaal, ICRA 2002]
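Fitting the weights θ then reduces to ordinary least squares on the target force, since f(s) is linear in θ. A sketch with a synthetic self-check; the basis parameters and sample grid are illustrative, not values from the slides:

```python
import numpy as np

def fit_weights(s_traj, f_target, centers, widths, g, x0):
    """Fit DMP weights theta by linear least squares:
    minimize sum_s (f_target(s) - f(s; theta))^2."""
    psi = np.exp(-widths[None, :] * (s_traj[:, None] - centers[None, :]) ** 2)
    # design matrix: normalized basis activations scaled by (g - x0) * s
    features = psi / psi.sum(axis=1, keepdims=True) * ((g - x0) * s_traj)[:, None]
    theta, *_ = np.linalg.lstsq(features, f_target, rcond=None)
    return theta

# self-check: recover known weights from a synthetic target force
n, g, x0 = 10, 1.0, 0.0
centers = np.linspace(0.05, 1.0, n)
widths = np.full(n, 100.0)
s_traj = np.linspace(0.01, 1.0, 200)
theta_true = np.random.default_rng(0).normal(size=n)
psi = np.exp(-widths[None, :] * (s_traj[:, None] - centers[None, :]) ** 2)
f_target = psi / psi.sum(axis=1, keepdims=True) @ theta_true * (g - x0) * s_traj
theta_hat = fit_weights(s_traj, f_target, centers, widths, g, x0)
print(np.allclose(theta_hat, theta_true, atol=1e-5))
```

Because the target lies exactly in the span of the features here, least squares recovers the weights; on real demonstrations the fit is approximate.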
Outline
1 Representations
2 Function Learning
3 Perceptual Grouping
4 Imitation Learning: Dynamic Movement Primitives, Reinforcement Learning
Robustifying Motions
- motion from imitation learning is fragile
- robustify by self-exploration
- competing RL methods:
  PI²: Policy Improvement with Path Integrals [Stefan Schaal]
  PoWER: Policy Learning by Weighting Exploration with the Returns [Jan Peters]
Policy Improvement with Path Integrals (PI²) [Evangelos Theodorou]
- optimize shape parameters θ w.r.t. a cost function J
- use direct reinforcement learning: exploration directly in policy parameter space θ
- derived from principles of optimal control
- update rule based on cost-weighted averaging (next slide)
[Figures from Stulp]
Input: DMP with initial parameters θ, cost function J
τẍ_t = k (g − x_t) − c ẋ_t + g_t^T (θ + ε_{i,k})
While (cost not converged):
  Explore:
  - sample exploration vectors ε_{i,k} ∼ N(0, Σ), θ_k = θ + ε_k
  - execute DMP
  - determine cost J(τ_i)
  Update:
  - weighted averaging with Boltzmann distribution: P(τ_{i,k}) = exp(−(1/λ) J(τ_{i,k})) / Σ_k exp(−(1/λ) J(τ_{i,k}))
  - δθ_{t_i} = Σ_{k=1}^{K} P(τ_{i,k}) ε_{i,k}
  - parameter update θ ← θ + δθ
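The cost-weighted averaging step above can be sketched on a toy quadratic cost. λ, the number of rollouts K, and the cost itself are illustrative; subtracting the minimum cost before exponentiating is a standard numerical-stability trick, not something stated on the slides:

```python
import numpy as np

def pi2_update(theta, epsilons, costs, lam=0.1):
    """One PI2-style parameter update: cost-weighted averaging of the
    exploration noise with Boltzmann weights P_k ~ exp(-J_k / lam)."""
    J = np.asarray(costs, dtype=float)
    w = np.exp(-(J - J.min()) / lam)  # shift by min(J) for numerical stability
    w /= w.sum()                      # normalize to a probability distribution
    return theta + w @ epsilons       # weighted average of exploration vectors

rng = np.random.default_rng(1)
theta = np.zeros(3)
epsilons = rng.normal(size=(20, 3))            # K = 20 exploration rollouts
costs = np.sum((theta + epsilons) ** 2, axis=1)  # toy cost: distance to origin
theta_new = pi2_update(theta, epsilons, costs)
print(np.sum(theta_new ** 2) < np.mean(costs))  # update moves toward low cost
```

Low-cost rollouts dominate the average, so the update follows the noise directions that happened to reduce the cost, without ever computing a gradient.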
Gaussian Process Latent Variable Models for Dimensionality Reduction and Time Series Modeling Nakul Gopalan IAS, TU Darmstadt nakul.gopalan@stud.tu-darmstadt.de Abstract Time series data of high dimensions
More informationMachine Learning Lecture 12
Machine Learning Lecture 12 Neural Networks 30.11.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory Probability
More informationLecture 5: Logistic Regression. Neural Networks
Lecture 5: Logistic Regression. Neural Networks Logistic regression Comparison with generative models Feed-forward neural networks Backpropagation Tricks for training neural networks COMP-652, Lecture
More informationMidterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas
Midterm Review CS 6375: Machine Learning Vibhav Gogate The University of Texas at Dallas Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Y Continuous Non-parametric
More informationUsing Gaussian Processes for Variance Reduction in Policy Gradient Algorithms *
Proceedings of the 8 th International Conference on Applied Informatics Eger, Hungary, January 27 30, 2010. Vol. 1. pp. 87 94. Using Gaussian Processes for Variance Reduction in Policy Gradient Algorithms
More information