CITEC SummerSchool 2013 Learning From Physics to Knowledge Selected Learning Methods


1 CITEC SummerSchool 2013: Learning From Physics to Knowledge, Selected Learning Methods. Robert Haschke, Neuroinformatics Group, CITEC, Bielefeld University, September 2013.

2 Outline: 1 Representations; 2 Function Learning; 3 Perceptual Grouping; 4 Imitation Learning.

3 Representations Outline: 1 Representations (Feature Space Expansion, Gaussian Processes, Reservoir Computing); 2 Function Learning; 3 Perceptual Grouping; 4 Imitation Learning.

4 Representations Feature Space Expansion. High-dimensional feature spaces facilitate computations, e.g. they enable linear separability of class regions. Polynomial expansion: $(x_1, \dots, x_d) \mapsto (x_1, \dots, x_d, \dots, x_i x_j, \dots, x_i x_j x_k)$. Time history: use $(x_t, x_{t-1}, \dots, x_{t-k}) \in \mathbb{R}^{d(k+1)}$. Filters, i.e. temporal convolution with kernels: $\tilde x(t) = \int K(t, t')\, x(t')\, dt'$. Kernel trick.

5-8 Representations Feature Space Expansion: Kernel Trick. Linear regressors or perceptrons often have the form $y(x) = w^T x = \sum_{\alpha=1}^N \lambda_\alpha\, x_\alpha^T x = \sum_\alpha w_\alpha\, \phi(x_\alpha)^T \phi(x) = \sum_\alpha w_\alpha\, K(x_\alpha, x)$. Goal: introduce non-linear, high-dimensional features $\phi_k(x)$; the kernel directly computes the scalar product of feature vectors. Typical examples, e.g. in support vector machines (SVM): polynomial kernel $K(x, x') = (x^T x' + 1)^p$; Gaussian kernel $K(x, x') = \exp(-\|x - x'\|^2 / 2\sigma^2)$.
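
To make the kernel trick concrete, here is a minimal NumPy sketch (not taken from the slides) of kernel ridge regression with the two kernels above; the regularization constant lam and the kernel parameters are assumed values.

```python
import numpy as np

def poly_kernel(X1, X2, p=3):
    # K(x, x') = (x^T x' + 1)^p
    return (X1 @ X2.T + 1.0) ** p

def gauss_kernel(X1, X2, sigma=0.5):
    # K(x, x') = exp(-||x - x'||^2 / (2 sigma^2))
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kernel_ridge_fit(X, y, kernel, lam=1e-3):
    # solve (K + lam I) alpha = y; the predictor is y(x) = sum_a alpha_a K(x_a, x)
    K = kernel(X, X)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def kernel_ridge_predict(Xq, X, alpha, kernel):
    return kernel(Xq, X) @ alpha

# toy 1-D regression problem
X = np.linspace(0, 1, 30)[:, None]
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * np.random.randn(30)
alpha = kernel_ridge_fit(X, y, gauss_kernel)
Xq = np.linspace(0, 1, 100)[:, None]
y_pred = kernel_ridge_predict(Xq, X, alpha, gauss_kernel)
```

Swapping gauss_kernel for poly_kernel changes the implicit feature space without ever computing $\phi(x)$ explicitly.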

9-15 Representations Feature Space Expansion: Example: Bayesian Linear Regression. Model: $f(x, w) = w^T \phi(x)$, $y = f(x, w) + \eta$. Gaussian data distribution: $p(y \mid x, w) = \mathcal{N}(w^T \phi(x), \beta^{-1})$. Maximum likelihood estimator: $\hat w_{ML} = (\Phi^T \Phi)^{-1} \Phi^T y$, with $\Phi^T = [\phi(x_1), \dots, \phi(x_N)] \in \mathbb{R}^{d \times N}$ and $y^T = [y_1, \dots, y_N] \in \mathbb{R}^N$. Gaussian a-priori distribution: $p(w) = \mathcal{N}(0, \alpha^{-1} I)$. A-posteriori distribution: $p(w \mid y) = \mathcal{N}(m_N, S_N)$, with MAP estimator $\hat w_{MAP} = m_N = \beta\, S_N \Phi^T y$ and $S_N = (\alpha I + \beta\, \Phi^T \Phi)^{-1} \in \mathbb{R}^{d \times d}$. Prediction: $\hat y(x) = \phi(x)^T \hat w_{MAP} = \beta\, \phi(x)^T S_N \Phi^T y = \sum_\alpha \beta\, \phi(x)^T S_N \phi(x_\alpha)\, y_\alpha = \sum_\alpha K(x, x_\alpha)\, y_\alpha$, where $K(x, x_\alpha) = \beta\, \phi(x)^T S_N \phi(x_\alpha)$.
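
A minimal NumPy sketch of the MAP estimate and predictive variance above, assuming Gaussian bump basis functions and toy values for α and β (all assumptions, not from the slides):

```python
import numpy as np

def gaussian_basis(x, centers, width=0.1):
    # phi(x): Gaussian bump features (an assumed choice of basis functions)
    return np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2.0 * width ** 2))

def fit_bayes_linreg(Phi, y, alpha=1.0, beta=25.0):
    # S_N = (alpha I + beta Phi^T Phi)^{-1},  w_MAP = beta S_N Phi^T y
    d = Phi.shape[1]
    S_N = np.linalg.inv(alpha * np.eye(d) + beta * Phi.T @ Phi)
    w_map = beta * S_N @ Phi.T @ y
    return w_map, S_N

def predict(Phi_q, w_map, S_N, beta=25.0):
    # predictive mean and variance sigma_N^2(x) = 1/beta + phi(x)^T S_N phi(x)
    mean = Phi_q @ w_map
    var = 1.0 / beta + np.einsum('nd,de,ne->n', Phi_q, S_N, Phi_q)
    return mean, var

centers = np.linspace(0, 1, 12)
x = np.random.rand(40)
y = np.sin(2 * np.pi * x) + 0.2 * np.random.randn(40)
Phi = gaussian_basis(x, centers)
w_map, S_N = fit_bayes_linreg(Phi, y)
xq = np.linspace(0, 1, 100)
mean, var = predict(gaussian_basis(xq, centers), w_map, S_N)
```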

16 Representations Gaussian Processes Outline: 1 Representations (Feature Space Expansion, Gaussian Processes, Reservoir Computing); 2 Function Learning; 3 Perceptual Grouping; 4 Imitation Learning.

17-31 Representations Gaussian Processes. Predictive distribution of Bayesian Linear Regression: $p(y \mid x) = \mathcal{N}(\hat w_{MAP}^T \phi(x), \sigma_N^2(x))$ with $\sigma_N^2(x) = \beta^{-1} + \phi(x)^T S_N \phi(x)$. Variance: data noise + uncertainty of $\hat w_{MAP}$.

32-35 Representations Gaussian Processes. The distribution over parameters w induces a distribution over estimator functions. Idea of Gaussian Processes: directly operate on function distributions, skipping the parameter representation. Definition: A Gaussian process is a collection of random variables, any finite collection of which has a joint Gaussian distribution. A Gaussian process is fully specified by a mean function $\bar f(x)$ and a covariance function $k(x, x')$. Example: the linear model $f(x) = w^T \phi(x)$ with Gaussian prior $p(w) = \mathcal{N}(0, \alpha^{-1} I)$; any tuple $[f(x_1) \dots f(x_N)]$ has a Gaussian distribution (as a linear combination of the Gaussian distributed variables w). Specification by mean and covariance function: mean $E[f(x)] = E[w^T]\, \phi(x) = 0$; covariance $E[f(x) f(x')] = \phi(x)^T E[w w^T]\, \phi(x') = \alpha^{-1} \phi(x)^T \phi(x') \equiv k(x, x')$.

36-38 Representations Gaussian Processes: Gaussian Process Regression. How can we exploit GPs for regression? N training points $x_1, \dots, x_N$ induce a Gaussian distribution of the associated function values $f_N = [f(x_1) \dots f(x_N)]$. Embedding an additional (N+1)-th query point $x_{N+1}$ again yields a Gaussian distribution, now of $f_{N+1} = [f(x_1) \dots f(x_N), f(x_{N+1})]$. Evaluate the predictive distribution $p(f(x_{N+1}) \mid f(x_1) \dots f(x_N))$.

39-40 Representations Gaussian Processes: Gaussian Process Regression, Computational Steps. $p(f_{N+1}) = \mathcal{N}(0, K_{N+1})$ with $K_{N+1} = \begin{pmatrix} K_N & k \\ k^T & c \end{pmatrix} \in \mathbb{R}^{(N+1) \times (N+1)}$, $(K_N)_{nm} = k(x_n, x_m)$, $k = [k(x_1, x_{N+1}) \dots k(x_N, x_{N+1})]^T \in \mathbb{R}^N$, $c = k(x_{N+1}, x_{N+1})$. Predictive distribution: $p(f_{N+1} \mid f_N)$ is again Gaussian, with mean $\mu(x_{N+1}) = k^T K_N^{-1} f_N$ and variance $\sigma^2(x_{N+1}) = c - k^T K_N^{-1} k$.
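
The predictive mean and variance above translate directly into code. A minimal sketch, assuming an RBF covariance function and a small diagonal noise term for numerical stability:

```python
import numpy as np

def rbf(X1, X2, sigma=0.3):
    # covariance function k(x, x') = exp(-||x - x'||^2 / (2 sigma^2))
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def gp_predict(Xq, X, f, sigma=0.3, noise=1e-6):
    # mu(x*) = k^T K_N^{-1} f_N,  var(x*) = c - k^T K_N^{-1} k
    K_N = rbf(X, X, sigma) + noise * np.eye(len(X))
    K_inv = np.linalg.inv(K_N)
    k = rbf(X, Xq, sigma)               # N x Q cross-covariances
    c = rbf(Xq, Xq, sigma).diagonal()   # prior variance at the query points
    mu = k.T @ K_inv @ f
    var = c - np.einsum('nq,nm,mq->q', k, K_inv, k)
    return mu, var

X = np.random.rand(20, 1)
f = np.sin(2 * np.pi * X[:, 0])
Xq = np.linspace(0, 1, 100)[:, None]
mu, var = gp_predict(Xq, X, f)
```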

41 Representations Reservoir Computing Outline: 1 Representations (Feature Space Expansion, Gaussian Processes, Reservoir Computing); 2 Function Learning; 3 Perceptual Grouping; 4 Imitation Learning.

42 Representations Reservoir Computing: Multilayer Perceptron. The hidden layer serves as a high-dimensional feature space; back propagation creates optimal features, but the input weights adapt only slowly.

43 Representations Reservoir Computing: Extreme Learning Machine. Simplification: fixed, random input features; the linear readout facilitates learning with regression methods.
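
A minimal sketch of an extreme learning machine along these lines: random, fixed input weights, a tanh hidden layer, and a ridge-regression readout; the hidden-layer size and ridge constant are assumed values.

```python
import numpy as np

def elm_fit(X, Y, n_hidden=200, ridge=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    W_in = rng.normal(size=(X.shape[1], n_hidden))    # fixed random input weights
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W_in + b)                         # random feature expansion
    # linear readout by ridge regression: W_out = (H^T H + ridge I)^{-1} H^T Y
    W_out = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ Y)
    return W_in, b, W_out

def elm_predict(X, W_in, b, W_out):
    return np.tanh(X @ W_in + b) @ W_out

X = np.random.rand(200, 2)
Y = np.sin(2 * np.pi * X[:, :1]) * X[:, 1:]
params = elm_fit(X, Y)
Y_hat = elm_predict(X, *params)
```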

44-46 Representations Reservoir Computing. A recurrent reservoir allows for temporal dynamics and an even richer feature space; linear readout: regression. Variants: echo state network, liquid state machine, backpropagation-decorrelation. How to tune the reservoir? Echo state property: activity decays without excitation.
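
A minimal echo state network sketch in the same spirit; scaling the recurrent weights to a spectral radius below 1 is used here as a common proxy for the echo state property, and all sizes and constants are assumptions.

```python
import numpy as np

def make_reservoir(n_in, n_res=300, rho=0.9, seed=0):
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
    W = rng.normal(size=(n_res, n_res))
    # scale recurrent weights to spectral radius rho < 1 (echo state property)
    W *= rho / np.max(np.abs(np.linalg.eigvals(W)))
    return W_in, W

def run_reservoir(U, W_in, W):
    # U: T x n_in input sequence; returns T x n_res reservoir states
    x = np.zeros(W.shape[0])
    states = []
    for u in U:
        x = np.tanh(W_in @ u + W @ x)
        states.append(x.copy())
    return np.array(states)

def train_readout(states, targets, ridge=1e-4):
    # linear readout learned by ridge regression on the collected states
    n = states.shape[1]
    return np.linalg.solve(states.T @ states + ridge * np.eye(n), states.T @ targets)

T = 500
U = np.sin(0.1 * np.arange(T))[:, None]
W_in, W = make_reservoir(1)
S = run_reservoir(U, W_in, W)
W_out = train_readout(S, np.roll(U, -1, axis=0))   # predict the next input value
```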

47 Representations Reservoir Computing: Associative Reservoir Network [Steil]. Learning in both directions: input forcing, bidirectional association, forward mapping, inverse mapping.

48 Function Learning Outline: 1 Representations; 2 Function Learning (Parameterized SOM, Unsupervised Kernel Regression); 3 Perceptual Grouping; 4 Imitation Learning.

49 Function Learning Parameterized SOM Outline: 1 Representations; 2 Function Learning (Parameterized SOM, Unsupervised Kernel Regression); 3 Perceptual Grouping; 4 Imitation Learning.

50-51 Function Learning Parameterized SOM: Parameterized SOM [Ritter 1993]. Fast learning of functions and their inverses; generalization of self-organizing maps (SOM) from discrete to continuous manifolds. Discrete mapping $x: C \to \{w_c\}$; continuous mapping $x: S \to M$, with the coordinate system S spanned by the nodes $c \in A$ and the manifold M embedded in the input space X.

52 Function Learning Parameterized SOM: PSOM Function Modeling. Linear superposition of basis functions $H(c, s)$: $w(s) = \sum_{c \in A} H(c, s)\, w_c$. The $H(c, s)$ should form an orthonormal basis system, $H(c, c') = \delta_{cc'}$, and represent constant functions: $\sum_{c \in A} H(c, s) = 1$ for all $s \in S$.

53-54 Function Learning Parameterized SOM: PSOM Function Modeling. Simple polynomials along every grid dimension, e.g. piecewise linear functions or Lagrange polynomials $h(c, s) = \prod_{c' \in A,\, c' \neq c} \frac{s - c'}{c - c'}$; multiplicative combination of multiple grid dimensions: $H(c, s) = \prod_{\nu=1}^m h_\nu(s_\nu, A_\nu)$.
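
A small sketch of the Lagrange basis functions and the resulting PSOM mapping w(s) for a one-dimensional grid; the node positions and reference vectors are toy assumptions.

```python
import numpy as np

def lagrange_basis(s, nodes):
    # h(c, s) = prod_{c' != c} (s - c') / (c - c') for each node c
    h = np.ones(len(nodes))
    for i, c in enumerate(nodes):
        for cp in nodes:
            if cp != c:
                h[i] *= (s - cp) / (c - cp)
    return h

def psom_map(s, nodes, W):
    # w(s) = sum_c H(c, s) w_c  (1-D grid, so H reduces to h)
    return lagrange_basis(s, nodes) @ W

nodes = np.array([0.0, 0.5, 1.0])                      # grid coordinates c in A
W = np.array([[0.0, 0.0], [0.4, 1.0], [1.0, 0.2]])     # reference vectors w_c in X
print(psom_map(0.25, nodes, W))                        # interpolated point on the manifold
```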

55 Function Learning Parameterized SOM: linear basis functions.

56 Function Learning Parameterized SOM: quadratic basis functions.

57 Function Learning Parameterized SOM: Inverse Mapping. Find the coordinates s* closest to the observation w, $s^* = \arg\min_{s \in S} \|w - w(s)\|^2$, using gradient descent $s_{t+1} = s_t + \eta\, \nabla_s w(s_t)^T (w - w(s_t))$, starting at the closest discrete node $s_0 = \arg\min_{c \in A} \|w - w_c\|$.
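
A sketch of the inverse mapping by gradient descent on $\|w - w(s)\|^2$, using a toy manifold w(s) and a numerical derivative in place of the PSOM interpolant; step size and iteration count are assumed.

```python
import numpy as np

def w_of_s(s):
    # toy smooth manifold w(s) in R^2 standing in for the PSOM interpolant
    return np.array([np.cos(s), np.sin(s)])

def invert(w, s0=0.1, eta=0.2, steps=100, eps=1e-5):
    # gradient descent on E(s) = ||w - w(s)||^2 with a numerical Jacobian dw/ds
    s = s0
    for _ in range(steps):
        J = (w_of_s(s + eps) - w_of_s(s - eps)) / (2 * eps)
        s = s + eta * J @ (w - w_of_s(s))
    return s

print(invert(np.array([np.cos(0.8), np.sin(0.8)])))   # should approach 0.8
```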

58 Function Learning Parameterized SOM: Example: Kinematics Learning.

59 Function Learning Unsupervised Kernel Regression Outline: 1 Representations; 2 Function Learning (Parameterized SOM, Unsupervised Kernel Regression); 3 Perceptual Grouping; 4 Imitation Learning.

60 Function Learning Unsupervised Kernel Regression: Unsupervised Kernel Regression [Klanke 2006]. Learning of continuous non-linear manifolds; generalizes beyond the fixed PSOM grid; employs an unsupervised formulation of the Nadaraya-Watson estimator. Observed data $Y = [y_1, y_2, \dots, y_N] \in \mathbb{R}^{d \times N}$, latent parameters $X = [x_1, x_2, \dots, x_N] \in \mathbb{R}^{q \times N}$, corresponding functional relationship $y = f(x; X)$ with $f(x; X) = \sum_i \frac{K(x - x_i)}{\sum_j K(x - x_j)}\, y_i$ and Gaussian kernel $K = \mathcal{N}(0, \Sigma)$.

61-62 Function Learning Unsupervised Kernel Regression: UKR Learning. Minimize the reconstruction error $R(X) = \frac{1}{N} \sum_m \|y_m - f_{-m}(x_m; X)\|^2$ (leave-one-out cross-validation). (Figure: latent configurations from initialization through successive training steps.)
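
A minimal sketch of the Nadaraya-Watson estimator f(x; X) and the leave-one-out reconstruction error R(X) above; full UKR would additionally optimize the latent positions X (e.g. by gradient descent), which is omitted here, and the kernel bandwidth is an assumed value.

```python
import numpy as np

def nw_estimate(x, X, Y, sigma=0.2, exclude=None):
    # f(x; X) = sum_i K(x - x_i) y_i / sum_j K(x - x_j), Gaussian kernel K
    w = np.exp(-((x - X) ** 2).sum(-1) / (2.0 * sigma ** 2))
    if exclude is not None:
        w[exclude] = 0.0              # leave-one-out: drop the m-th kernel
    return (w[:, None] * Y).sum(0) / w.sum()

def loo_reconstruction_error(X, Y, sigma=0.2):
    # R(X) = 1/N sum_m ||y_m - f_{-m}(x_m; X)||^2
    errs = [np.sum((Y[m] - nw_estimate(X[m], X, Y, sigma, exclude=m)) ** 2)
            for m in range(len(X))]
    return np.mean(errs)

Y = np.random.rand(50, 3)             # observed data in R^d
X = np.random.rand(50, 1)             # latent parameters in R^q
print(loo_reconstruction_error(X, Y))
```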

63 Function Learning Unsupervised Kernel Regression: Learned Manipulation Manifold.

64 Function Learning Unsupervised Kernel Regression: Manipulation Manifold in action.

65 Function Learning Unsupervised Kernel Regression: Shadow Robot Hand Opening a Bottle Cap [Humanoids 2].

66 Perceptual Grouping Outline: 1 Representations; 2 Function Learning; 3 Perceptual Grouping (Gestalt Laws, Competitive Layer Model, Kuramoto Oscillator Network, Comparison, Learning); 4 Imitation Learning.

67 Perceptual Grouping Gestalt Laws Outline: 1 Representations; 2 Function Learning; 3 Perceptual Grouping (Gestalt Laws, Competitive Layer Model, Kuramoto Oscillator Network, Comparison, Learning); 4 Imitation Learning.

68 Perceptual Grouping Gestalt Laws: Perceptual Grouping. Examples: colour segmentation, texture segmentation, segmenting cell images, contour grouping.

69-70 Perceptual Grouping Gestalt Laws: Gestalt Laws [Wertheimer 1923]. Proximity, similarity, closure, continuation; expressed as attraction and repulsion between features.

71-75 Perceptual Grouping Gestalt Laws: Interaction Matrix. The compatibility of feature pairs induces an interaction matrix: attraction corresponds to positive feedback, repulsion to negative feedback. For a target segmentation, the block structure of the interaction matrix defines the groups; with real features the interaction matrix is unsorted and noisy, so a robust grouping dynamics is required.

76 Perceptual Grouping Competitive Layer Model Outline: 1 Representations; 2 Function Learning; 3 Perceptual Grouping (Gestalt Laws, Competitive Layer Model, Kuramoto Oscillator Network, Comparison, Learning); 4 Imitation Learning.

77 Perceptual Grouping Competitive Layer Model: Competitive Layer Model (CLM) [Ritter 1990], 1. Input: a set of features, each described by a feature vector $m_r$; examples: position $m_r = (x_r, y_r)^T$, oriented line features $m_r = (x_r, y_r, \phi_r)^T$.

78 Perceptual Grouping Competitive Layer Model: CLM [Ritter 1990], 2. Layered architecture of neurons: neurons $x_{r\alpha}$, L layers $\alpha = 1, \dots, L$, columns r of neurons; activation: linear threshold $\sigma(x) = \max(0, x)$.

79 Perceptual Grouping Competitive Layer Model: CLM [Ritter 1990], 3. Goal: grouping expressed as activation within layers; each feature r receives a label $\hat\alpha(r) \in \{1, 2, \dots, L\}$.

80 Perceptual Grouping Competitive Layer Model: CLM [Ritter 1990], 4. Lateral compatibility interaction $f_{rr'}$ induces the grouping; $f_{rr'}$ encodes the compatibility of the feature pair $(m_r, m_{r'})$.

81 Perceptual Grouping Competitive Layer Model: CLM [Ritter 1990], 5. Vertical competition (vertical inhibition J) pushes groups into separate layers.

82 Perceptual Grouping Competitive Layer Model: CLM [Ritter 1990], 6. Column-wise stimulation with a feature-dependent bias $h_r$: the overall activity per column reflects the significance of the feature $m_r$.

83 Perceptual Grouping Competitive Layer Model: CLM [Ritter 1990]. Simulation of the recurrent dynamics; overall dynamics: $\dot x_{r\alpha} = -x_{r\alpha} + \sigma\big( J (h_r - \sum_\beta x_{r\beta}) + \sum_{r'} f_{rr'}\, x_{r'\alpha} \big)$.
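
A toy sketch of the CLM dynamics above under Euler integration, with a hand-made interaction matrix of two mutually incompatible groups; the number of layers, inhibition strength J, and step size are assumptions, and convergence from a small random initialization is not guaranteed in general.

```python
import numpy as np

def clm_step(x, f, h, J=2.0, dt=0.05):
    # x: (R, L) layer activities, f: (R, R) lateral interactions, h: (R,) bias
    # dx_{ra}/dt = -x_{ra} + sigma( J (h_r - sum_b x_{rb}) + sum_{r'} f_{rr'} x_{r'a} )
    col_sum = x.sum(axis=1, keepdims=True)            # total activity per column r
    drive = J * (h[:, None] - col_sum) + f @ x        # vertical competition + lateral support
    return x + dt * (-x + np.maximum(0.0, drive))     # sigma(x) = max(0, x)

R, L = 6, 2
# toy interaction matrix: features {0,1,2} and {3,4,5} attract within, repel across
f = -0.5 * np.ones((R, R))
f[:3, :3] = 0.5
f[3:, 3:] = 0.5
np.fill_diagonal(f, 0.0)
x = 0.01 * np.random.rand(R, L)
h = np.ones(R)
for _ in range(2000):
    x = clm_step(x, f, h)
print(x.argmax(axis=1))   # layer assignment alpha_hat(r) for each feature
```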

84 Perceptual Grouping Kuramoto Oscillator Network Outline: 1 Representations; 2 Function Learning; 3 Perceptual Grouping (Gestalt Laws, Competitive Layer Model, Kuramoto Oscillator Network, Comparison, Learning); 4 Imitation Learning.

85-86 Perceptual Grouping Kuramoto Oscillator Network: Kuramoto Oscillator Network [Meier 2013]. A more efficient grouping dynamics based on coupled oscillators with phase θ and frequency ω; phase coupling via $f_{rr'}$: $\dot\theta_r = \omega_r + \frac{K}{N} \sum_{r'=1}^N f_{rr'} \sin(\theta_{r'} - \theta_r)$. Phase similarity determines the frequency grouping, so similar features share the same frequency: $\omega_r = \omega_{\alpha^*}$ with $\alpha^* = \arg\max_\alpha \sum_{r' \in N(\alpha)} f_{rr'}\, \frac{\cos(\theta_r - \theta_{r'}) + 1}{2}$.
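
A toy sketch of the phase-coupling dynamics above for the same kind of hand-made interaction matrix; the coupling constant K, the step size, and the simplification of keeping all base frequencies equal (instead of applying the frequency re-assignment rule) are assumptions.

```python
import numpy as np

def kuramoto_step(theta, omega, f, K=2.0, dt=0.05):
    # d theta_r/dt = omega_r + (K/N) sum_{r'} f_{rr'} sin(theta_{r'} - theta_r)
    N = len(theta)
    coupling = (f * np.sin(theta[None, :] - theta[:, None])).sum(axis=1)
    return theta + dt * (omega + (K / N) * coupling)

N = 6
# toy interaction matrix: features {0,1,2} and {3,4,5} are mutually compatible
f = -np.ones((N, N))
f[:3, :3] = 1.0
f[3:, 3:] = 1.0
np.fill_diagonal(f, 0.0)
theta = 2 * np.pi * np.random.rand(N)
omega = np.ones(N)            # common base frequency (frequency re-assignment omitted)
for _ in range(3000):
    theta = kuramoto_step(theta, omega, f)
# compatible features end up phase-locked; groups can be read off the phases
print(np.round(np.mod(theta, 2 * np.pi), 2))
```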

87 Perceptual Grouping Kuramoto Oscillator Network: Example: Contour Grouping.

88 Perceptual Grouping Comparison Outline: 1 Representations; 2 Function Learning; 3 Perceptual Grouping (Gestalt Laws, Competitive Layer Model, Kuramoto Oscillator Network, Comparison, Learning); 4 Imitation Learning.

89 Perceptual Grouping Comparison: Evaluation: CLM vs. Oscillator Network. Setup: several groups of features; layers resp. frequencies (only as many as needed); different amounts of noise in the interaction matrices: 0%, 10%, 20%, 30%, 40%. Figure: Interaction matrices with different amounts of inverted interaction values; a black pixel represents attraction, white is repelling.

90-92 Perceptual Grouping Comparison: Evaluation Results. Similar grouping quality; reduced computational complexity and increased convergence speed; faster recovery on a changing interaction matrix (splitting from 1 to 2 groups after initial convergence). (Figures: grouping quality and number of update steps, mean and standard deviation, for CLM vs. oscillators over the noise percentage.)

93 Perceptual Grouping Learning Outline: 1 Representations; 2 Function Learning; 3 Perceptual Grouping (Gestalt Laws, Competitive Layer Model, Kuramoto Oscillator Network, Comparison, Learning); 4 Imitation Learning.

94-95 Perceptual Grouping Learning: How to integrate learning? The recurrent dynamics robustly creates the grouping; the dynamics is determined by the interaction matrix $f_{rr'}$. How to learn the compatibilities? The CLM dynamics is extremely robust w.r.t. noise in the interaction, so it suffices to learn a coarse interaction matrix.

96 Perceptual Grouping Learning: Learning Architecture [Weng 2006]. 1 Compute a general proximity (distance) function on feature pairs, $d(m_r, m_{r'})$. 2 Learn proximity prototypes from labeled training data by vector quantization (clustering the proximity space D into Voronoi cells). 3 Exploit the labels to count positively and negatively weighted prototypes, yielding the interaction function: positive coefficients give $f_{rr'} > 0$, negative coefficients give $f_{rr'} < 0$.

97 Imitation Learning Outline: 1 Representations; 2 Function Learning; 3 Perceptual Grouping; 4 Imitation Learning (Dynamic Movement Primitives, Reinforcement Learning).

98-99 Imitation Learning: Imitation Learning. Learn from observations: How to observe actions? Which elements are important? What to learn? How to represent observed actions? Dynamic Movement Primitives. Improving and adapting the motion by autonomous exploration: Reinforcement Learning, Policy Improvement with Path Integrals (PI²).

100 Imitation Learning Dynamic Movement Primitives Outline: 1 Representations; 2 Function Learning; 3 Perceptual Grouping; 4 Imitation Learning (Dynamic Movement Primitives, Reinforcement Learning).

101-102 Imitation Learning Dynamic Movement Primitives: Dynamic Movement Primitives [Ijspeert, Nakanishi, Schaal, ICRA 2002]. A spring-damper system generates the basic motion towards the goal: $\tau \ddot x_t = k (g - x_t) - c \dot x_t$.

103-105 Imitation Learning Dynamic Movement Primitives [Ijspeert, Nakanishi, Schaal, ICRA 2002]. Add an external force to represent complex trajectory shapes: $\tau \ddot x_t = k (g - x_t) - c \dot x_t + g_t^T \theta$.

106-110 Imitation Learning Dynamic Movement Primitives: External Force. $f(s) = g_t^T \theta = \frac{\sum_i \theta_i \psi_i(s)}{\sum_i \psi_i(s)}\, (g - x_0)\, s$: a weighted sum of basis functions $\psi_i$ with soft-max normalization, amplitude scaled by the initial distance to the goal, and influence weighted by the canonical time s. Gaussian basis functions $\psi_i(s) = \exp(-h_i (s - c_i)^2)$ with centers $c_i$ logarithmically distributed in $[0, 1]$. [Schaal et al., Brain Research, 2007]

111-112 Imitation Learning Dynamic Movement Primitives: Canonical System. Decouple the external force from the spring-damper evolution via a new phase/time variable s: $\tau \dot s = -\alpha_s s$; s is initially set to 1 and exponentially converges to 0. Adding an error coupling, $\tau \dot s = -\alpha_s s + \alpha_c (x_{\text{actual}} - x_{\text{expected}})^2$, pauses the influence of the force on perturbations.

113 Imitation Learning Dynamic Movement Primitives: Properties. $\tau \ddot x_t = k (g - x_t) - c \dot x_t + g_t^T \theta$, $\tau \dot s = -\alpha_s s$, $f(s) = g_t^T \theta = \frac{\sum_i \theta_i \psi_i(s)}{\sum_i \psi_i(s)} (g - x_0)\, s$. Convergence to the goal g; motions are self-similar for different goal or start points; coupling of multiple DOF through the canonical phase s; adapt τ for temporal scaling; robust to perturbations due to the attractor dynamics; decoupling of the basic goal-directed motion from the task-specific trajectory shape; the weights $\theta_i$ can be learned with linear regression.
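
A minimal one-dimensional Euler-integration sketch of the DMP equations above; the gains k and c, the time constant, the basis-function placement, and the random weights θ are assumed values chosen only to illustrate the rollout.

```python
import numpy as np

def dmp_rollout(theta, x0=0.0, g=1.0, k=25.0, c=10.0, tau=1.0,
                alpha_s=4.0, dt=0.002, T=1.0):
    centers = np.exp(-alpha_s * np.linspace(0, 1, len(theta)))   # c_i, log-spaced in [0,1]
    h = 1.0 / np.diff(centers, append=centers[-1] * 0.5) ** 2    # widths h_i from the spacing
    x, xd, s = x0, 0.0, 1.0
    traj = []
    for _ in range(int(T / dt)):
        psi = np.exp(-h * (s - centers) ** 2)                    # basis functions psi_i(s)
        f = (psi @ theta) / psi.sum() * (g - x0) * s             # external force f(s)
        xdd = (k * (g - x) - c * xd + f) / tau                   # tau x'' = k(g-x) - c x' + f
        xd += xdd * dt
        x += xd * dt
        s += (-alpha_s * s / tau) * dt                           # canonical system tau s' = -alpha_s s
        traj.append(x)
    return np.array(traj)

traj = dmp_rollout(theta=np.random.randn(10) * 50.0)
print(traj[-1])   # approaches the goal g = 1.0
```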

114 Imitation Learning Dynamic Movement Primitives: Learning from Demonstration. Record a motion $x(t), \dot x(t), \ddot x(t)$; choose τ to match the duration; evolve the canonical system s(t); compute $f_{\text{target}}(s) = \tau \ddot x(t) - k (g - x(t)) + c \dot x(t)$; minimize $E = \sum_s (f_{\text{target}}(s) - f(s))^2$ with regression. [Ijspeert, Nakanishi, Schaal, ICRA 2002]
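
A sketch of learning the weights θ from a demonstration by linear least squares on f_target, assuming the same one-dimensional formulation and basis-function placement as in the rollout sketch above and a synthetic minimum-jerk-like demonstration.

```python
import numpy as np

def fit_dmp_weights(x, dt, n_basis=10, k=25.0, c=10.0, tau=1.0, alpha_s=4.0):
    xd = np.gradient(x, dt)
    xdd = np.gradient(xd, dt)
    g, x0 = x[-1], x[0]
    t = np.arange(len(x)) * dt
    s = np.exp(-alpha_s * t / tau)                          # canonical system solution s(t)
    # target force: f_target = tau*xdd - k(g - x) + c*xd
    f_target = tau * xdd - k * (g - x) + c * xd
    centers = np.exp(-alpha_s * np.linspace(0, 1, n_basis))
    h = 1.0 / np.diff(centers, append=centers[-1] * 0.5) ** 2
    psi = np.exp(-h[None, :] * (s[:, None] - centers[None, :]) ** 2)
    # design matrix of f(s) = sum_i theta_i psi_i(s) / sum_i psi_i(s) * (g - x0) * s
    A = psi / psi.sum(axis=1, keepdims=True) * ((g - x0) * s)[:, None]
    theta, *_ = np.linalg.lstsq(A, f_target, rcond=None)
    return theta

dt = 0.002
t = np.arange(0, 1, dt)
demo = 10 * t**3 - 15 * t**4 + 6 * t**5                     # smooth 0 -> 1 demonstration
theta = fit_dmp_weights(demo, dt)
print(theta.shape)
```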

115 Imitation Learning Reinforcement Learning Outline: 1 Representations; 2 Function Learning; 3 Perceptual Grouping; 4 Imitation Learning (Dynamic Movement Primitives, Reinforcement Learning).

116 Imitation Learning Reinforcement Learning: Robustifying Motions. A motion obtained from imitation learning is fragile; robustify it by self-exploration. Competing RL methods: PI² (Policy Improvement with Path Integrals) [Stefan Schaal] and PoWER (Policy learning by Weighting Exploration with the Returns) [Jan Peters].

117 Imitation Learning Reinforcement Learning: Policy Improvement with Path Integrals, PI² [Evangelos Theodorou 2010]. Optimize the shape parameters θ w.r.t. a cost function J; use direct reinforcement learning, with exploration directly in the policy parameter space θ. PI² is derived from principles of optimal control; its update rule is based on cost-weighted averaging (next slide).

118-126 Imitation Learning Reinforcement Learning [Figures from Stulp]. Input: a DMP with initial parameters θ, $\tau \ddot x_t = k(g - x_t) - c \dot x_t + g_t^T \theta$, and a cost function J. While the cost has not converged: Explore: sample exploration vectors $\epsilon_{i,k} \sim \mathcal{N}(0, \Sigma)$, execute the DMP with perturbed parameters $\theta_k = \theta + \epsilon_k$, i.e. $\tau \ddot x_t = k(g - x_t) - c \dot x_t + g_t^T (\theta + \epsilon_{i,k})$, and determine the cost $J(\tau_i)$ of each rollout. Update: weighted averaging with a Boltzmann distribution, $P(\tau_{i,k}) = \frac{\exp(-\frac{1}{\lambda} J(\tau_{i,k}))}{\sum_k \exp(-\frac{1}{\lambda} J(\tau_{i,k}))}$, $\delta\theta_{t_i} = \sum_{k=1}^K P(\tau_{i,k})\, \epsilon_{i,k}$, followed by the parameter update $\theta \leftarrow \theta + \delta\theta$.
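
A heavily simplified sketch of the explore/update loop above: it performs cost-weighted averaging of parameter perturbations on a stand-in rollout (fitting a force profile), rather than the full time-indexed PI² update; the cost function, exploration noise, and temperature λ are assumptions.

```python
import numpy as np

def rollout_cost(theta, s, psi, target):
    # stand-in for executing the DMP and evaluating the cost J(tau)
    f = psi @ theta / psi.sum(axis=1) * s          # force profile generated by theta
    return np.sum((f - target) ** 2) + 1e-4 * np.sum(theta ** 2)

def pi2_style_update(theta, s, psi, target, K=20, sigma=20.0, lam=10.0):
    # Explore: sample K perturbations eps_k ~ N(0, sigma^2 I) and evaluate their costs
    eps = sigma * np.random.randn(K, len(theta))
    J = np.array([rollout_cost(theta + e, s, psi, target) for e in eps])
    # Update: Boltzmann weights P_k ~ exp(-J_k / lam) (shifted by J.min() for stability)
    P = np.exp(-(J - J.min()) / lam)
    P /= P.sum()
    return theta + P @ eps                          # cost-weighted averaging of the eps_k

# toy setup: shape a force profile toward a target by exploration only
s = np.exp(-4 * np.linspace(0, 1, 200))
centers = np.exp(-4 * np.linspace(0, 1, 10))
psi = np.exp(-200.0 * (s[:, None] - centers[None, :]) ** 2)
target = np.sin(2 * np.pi * np.linspace(0, 1, 200)) * s
theta = np.zeros(10)
for it in range(200):
    theta = pi2_style_update(theta, s, psi, target)
print(rollout_cost(theta, s, psi, target))
```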


More information

LECTURE NOTE #NEW 6 PROF. ALAN YUILLE

LECTURE NOTE #NEW 6 PROF. ALAN YUILLE LECTURE NOTE #NEW 6 PROF. ALAN YUILLE 1. Introduction to Regression Now consider learning the conditional distribution p(y x). This is often easier than learning the likelihood function p(x y) and the

More information

Learning Deep Architectures for AI. Part II - Vijay Chakilam

Learning Deep Architectures for AI. Part II - Vijay Chakilam Learning Deep Architectures for AI - Yoshua Bengio Part II - Vijay Chakilam Limitations of Perceptron x1 W, b 0,1 1,1 y x2 weight plane output =1 output =0 There is no value for W and b such that the model

More information

Gaussian with mean ( µ ) and standard deviation ( σ)

Gaussian with mean ( µ ) and standard deviation ( σ) Slide from Pieter Abbeel Gaussian with mean ( µ ) and standard deviation ( σ) 10/6/16 CSE-571: Robotics X ~ N( µ, σ ) Y ~ N( aµ + b, a σ ) Y = ax + b + + + + 1 1 1 1 1 1 1 1 1 1, ~ ) ( ) ( ), ( ~ ), (

More information

EE613 Machine Learning for Engineers. Kernel methods Support Vector Machines. jean-marc odobez 2015

EE613 Machine Learning for Engineers. Kernel methods Support Vector Machines. jean-marc odobez 2015 EE613 Machine Learning for Engineers Kernel methods Support Vector Machines jean-marc odobez 2015 overview Kernel methods introductions and main elements defining kernels Kernelization of k-nn, K-Means,

More information

Probabilistic Model-based Imitation Learning

Probabilistic Model-based Imitation Learning Probabilistic Model-based Imitation Learning Peter Englert, Alexandros Paraschos, Jan Peters,2, and Marc Peter Deisenroth Department of Computer Science, Technische Universität Darmstadt, Germany. 2 Max

More information

Neural Networks. Mark van Rossum. January 15, School of Informatics, University of Edinburgh 1 / 28

Neural Networks. Mark van Rossum. January 15, School of Informatics, University of Edinburgh 1 / 28 1 / 28 Neural Networks Mark van Rossum School of Informatics, University of Edinburgh January 15, 2018 2 / 28 Goals: Understand how (recurrent) networks behave Find a way to teach networks to do a certain

More information

Lecture : Probabilistic Machine Learning

Lecture : Probabilistic Machine Learning Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning

More information

Practical Bayesian Optimization of Machine Learning. Learning Algorithms

Practical Bayesian Optimization of Machine Learning. Learning Algorithms Practical Bayesian Optimization of Machine Learning Algorithms CS 294 University of California, Berkeley Tuesday, April 20, 2016 Motivation Machine Learning Algorithms (MLA s) have hyperparameters that

More information

1 Machine Learning Concepts (16 points)

1 Machine Learning Concepts (16 points) CSCI 567 Fall 2018 Midterm Exam DO NOT OPEN EXAM UNTIL INSTRUCTED TO DO SO PLEASE TURN OFF ALL CELL PHONES Problem 1 2 3 4 5 6 Total Max 16 10 16 42 24 12 120 Points Please read the following instructions

More information

Nonparametric Bayesian Methods (Gaussian Processes)

Nonparametric Bayesian Methods (Gaussian Processes) [70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent

More information

Artificial Neural Networks

Artificial Neural Networks Artificial Neural Networks Stephan Dreiseitl University of Applied Sciences Upper Austria at Hagenberg Harvard-MIT Division of Health Sciences and Technology HST.951J: Medical Decision Support Knowledge

More information

NONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition

NONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function

More information

Gaussian Processes. 1 What problems can be solved by Gaussian Processes?

Gaussian Processes. 1 What problems can be solved by Gaussian Processes? Statistical Techniques in Robotics (16-831, F1) Lecture#19 (Wednesday November 16) Gaussian Processes Lecturer: Drew Bagnell Scribe:Yamuna Krishnamurthy 1 1 What problems can be solved by Gaussian Processes?

More information

(Feed-Forward) Neural Networks Dr. Hajira Jabeen, Prof. Jens Lehmann

(Feed-Forward) Neural Networks Dr. Hajira Jabeen, Prof. Jens Lehmann (Feed-Forward) Neural Networks 2016-12-06 Dr. Hajira Jabeen, Prof. Jens Lehmann Outline In the previous lectures we have learned about tensors and factorization methods. RESCAL is a bilinear model for

More information

ADVANCED MACHINE LEARNING ADVANCED MACHINE LEARNING. Non-linear regression techniques Part - II

ADVANCED MACHINE LEARNING ADVANCED MACHINE LEARNING. Non-linear regression techniques Part - II 1 Non-linear regression techniques Part - II Regression Algorithms in this Course Support Vector Machine Relevance Vector Machine Support vector regression Boosting random projections Relevance vector

More information

A summary of Deep Learning without Poor Local Minima

A summary of Deep Learning without Poor Local Minima A summary of Deep Learning without Poor Local Minima by Kenji Kawaguchi MIT oral presentation at NIPS 2016 Learning Supervised (or Predictive) learning Learn a mapping from inputs x to outputs y, given

More information

Policy Search for Path Integral Control

Policy Search for Path Integral Control Policy Search for Path Integral Control Vicenç Gómez 1,2, Hilbert J Kappen 2, Jan Peters 3,4, and Gerhard Neumann 3 1 Universitat Pompeu Fabra, Barcelona Department of Information and Communication Technologies,

More information

Advanced Machine Learning Practical 4b Solution: Regression (BLR, GPR & Gradient Boosting)

Advanced Machine Learning Practical 4b Solution: Regression (BLR, GPR & Gradient Boosting) Advanced Machine Learning Practical 4b Solution: Regression (BLR, GPR & Gradient Boosting) Professor: Aude Billard Assistants: Nadia Figueroa, Ilaria Lauzana and Brice Platerrier E-mails: aude.billard@epfl.ch,

More information

Statistical Machine Learning (BE4M33SSU) Lecture 5: Artificial Neural Networks

Statistical Machine Learning (BE4M33SSU) Lecture 5: Artificial Neural Networks Statistical Machine Learning (BE4M33SSU) Lecture 5: Artificial Neural Networks Jan Drchal Czech Technical University in Prague Faculty of Electrical Engineering Department of Computer Science Topics covered

More information

Neural Networks and Deep Learning

Neural Networks and Deep Learning Neural Networks and Deep Learning Professor Ameet Talwalkar November 12, 2015 Professor Ameet Talwalkar Neural Networks and Deep Learning November 12, 2015 1 / 16 Outline 1 Review of last lecture AdaBoost

More information

Hierarchy. Will Penny. 24th March Hierarchy. Will Penny. Linear Models. Convergence. Nonlinear Models. References

Hierarchy. Will Penny. 24th March Hierarchy. Will Penny. Linear Models. Convergence. Nonlinear Models. References 24th March 2011 Update Hierarchical Model Rao and Ballard (1999) presented a hierarchical model of visual cortex to show how classical and extra-classical Receptive Field (RF) effects could be explained

More information

Tutorial on Gaussian Processes and the Gaussian Process Latent Variable Model

Tutorial on Gaussian Processes and the Gaussian Process Latent Variable Model Tutorial on Gaussian Processes and the Gaussian Process Latent Variable Model (& discussion on the GPLVM tech. report by Prof. N. Lawrence, 06) Andreas Damianou Department of Neuro- and Computer Science,

More information

Learning Motor Skills from Partially Observed Movements Executed at Different Speeds

Learning Motor Skills from Partially Observed Movements Executed at Different Speeds Learning Motor Skills from Partially Observed Movements Executed at Different Speeds Marco Ewerton 1, Guilherme Maeda 1, Jan Peters 1,2 and Gerhard Neumann 3 Abstract Learning motor skills from multiple

More information

Probabilistic Model-based Imitation Learning

Probabilistic Model-based Imitation Learning Probabilistic Model-based Imitation Learning Peter Englert, Alexandros Paraschos, Jan Peters,2, and Marc Peter Deisenroth Department of Computer Science, Technische Universität Darmstadt, Germany. 2 Max

More information

Optimal Control with Learned Forward Models

Optimal Control with Learned Forward Models Optimal Control with Learned Forward Models Pieter Abbeel UC Berkeley Jan Peters TU Darmstadt 1 Where we are? Reinforcement Learning Data = {(x i, u i, x i+1, r i )}} x u xx r u xx V (x) π (u x) Now V

More information

Dropout as a Bayesian Approximation: Insights and Applications

Dropout as a Bayesian Approximation: Insights and Applications Dropout as a Bayesian Approximation: Insights and Applications Yarin Gal and Zoubin Ghahramani Discussion by: Chunyuan Li Jan. 15, 2016 1 / 16 Main idea In the framework of variational inference, the authors

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 218 Outlines Overview Introduction Linear Algebra Probability Linear Regression 1

More information

CSC411: Final Review. James Lucas & David Madras. December 3, 2018

CSC411: Final Review. James Lucas & David Madras. December 3, 2018 CSC411: Final Review James Lucas & David Madras December 3, 2018 Agenda 1. A brief overview 2. Some sample questions Basic ML Terminology The final exam will be on the entire course; however, it will be

More information

Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials

Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials by Phillip Krahenbuhl and Vladlen Koltun Presented by Adam Stambler Multi-class image segmentation Assign a class label to each

More information

ARTIFICIAL NEURAL NETWORKS گروه مطالعاتي 17 بهار 92

ARTIFICIAL NEURAL NETWORKS گروه مطالعاتي 17 بهار 92 ARTIFICIAL NEURAL NETWORKS گروه مطالعاتي 17 بهار 92 BIOLOGICAL INSPIRATIONS Some numbers The human brain contains about 10 billion nerve cells (neurons) Each neuron is connected to the others through 10000

More information

Gaussian Processes (10/16/13)

Gaussian Processes (10/16/13) STA561: Probabilistic machine learning Gaussian Processes (10/16/13) Lecturer: Barbara Engelhardt Scribes: Changwei Hu, Di Jin, Mengdi Wang 1 Introduction In supervised learning, we observe some inputs

More information

Lecture 3. Linear Regression II Bastian Leibe RWTH Aachen

Lecture 3. Linear Regression II Bastian Leibe RWTH Aachen Advanced Machine Learning Lecture 3 Linear Regression II 02.11.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de This Lecture: Advanced Machine Learning Regression

More information

Multiple-step Time Series Forecasting with Sparse Gaussian Processes

Multiple-step Time Series Forecasting with Sparse Gaussian Processes Multiple-step Time Series Forecasting with Sparse Gaussian Processes Perry Groot ab Peter Lucas a Paul van den Bosch b a Radboud University, Model-Based Systems Development, Heyendaalseweg 135, 6525 AJ

More information

Gaussian Process Latent Variable Models for Dimensionality Reduction and Time Series Modeling

Gaussian Process Latent Variable Models for Dimensionality Reduction and Time Series Modeling Gaussian Process Latent Variable Models for Dimensionality Reduction and Time Series Modeling Nakul Gopalan IAS, TU Darmstadt nakul.gopalan@stud.tu-darmstadt.de Abstract Time series data of high dimensions

More information

Machine Learning Lecture 12

Machine Learning Lecture 12 Machine Learning Lecture 12 Neural Networks 30.11.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory Probability

More information

Lecture 5: Logistic Regression. Neural Networks

Lecture 5: Logistic Regression. Neural Networks Lecture 5: Logistic Regression. Neural Networks Logistic regression Comparison with generative models Feed-forward neural networks Backpropagation Tricks for training neural networks COMP-652, Lecture

More information

Midterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas

Midterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas Midterm Review CS 6375: Machine Learning Vibhav Gogate The University of Texas at Dallas Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Y Continuous Non-parametric

More information

Using Gaussian Processes for Variance Reduction in Policy Gradient Algorithms *

Using Gaussian Processes for Variance Reduction in Policy Gradient Algorithms * Proceedings of the 8 th International Conference on Applied Informatics Eger, Hungary, January 27 30, 2010. Vol. 1. pp. 87 94. Using Gaussian Processes for Variance Reduction in Policy Gradient Algorithms

More information