Basic Principles of Supervised and Unsupervised Learning

1 Basic Principles of Supervised and Unsupervised Learning Toward Deep Learning. Shun-ichi Amari (RIKEN Brain Science Institute); collaborators: R. Karakida, M. Okada (U. Tokyo)

2 Deep Learning: Self-Organization + Supervised Learning; RBM (Restricted Boltzmann Machine); Auto-Encoder, Recurrent Net; Dropout; Contrastive divergence

3 Simple Hebbian Self Organization

4 self organization of

5 Equilibrium

6 Equilibrium: special cases

7 Two and many clusters

8 Dynamics of self organization

9 Lyapunov Function

10 Further Problems: Distributed small clusters; large clusters; Mutual interactions among h neurons; neural field; Localized receptive fields; invariance

11 Boltzmann Machine

12 RBM: Restricted Boltzmann Machine

13 RBM

14 Self Organization

15 Interaction of Hidden Neurons

17 Recurrent Net (Auto Encoder)

18 Recurrent Net Self Organization

19 Gaussian RBM is easy; Higher-order interactions; Gram-Charlier expansion

20 Gaussian Boltzmann Machine

21 Equilibrium Solution

22 Equilibrium Solution. General Solution: diagonalized by; You can choose m( k) eigenvalues. Stable Solution: the case of m = k

23 Contrastive Divergence. RBM: 2-layered probabilistic neural network; no connections within layers; visible layer v, weights W, hidden layer h. How to train an RBM: Maximum Likelihood (ML) learning is hard. Sampling (Input, Equilibrium): many iterations of Gibbs sampling demand too much computational time
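The idea on this slide, replacing a long Gibbs chain with a short one, can be sketched as follows. This is a minimal CD-1 illustration for a Bernoulli-Bernoulli RBM, which is an assumption made here for simplicity (the later slides analyze Gaussian and Bernoulli-Gaussian variants). The names `cd1_update`, `b`, `c`, and `lr`, and the toy data, are hypothetical and not taken from the slides.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def cd1_update(W, b, c, v0, lr, rng):
    """One CD-1 parameter update on a batch v0 of shape (batch, n_visible).

    W: (n_visible, n_hidden) weights, b: visible biases, c: hidden biases.
    """
    # Positive phase: hidden probabilities and samples given the data.
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one Gibbs step instead of running the chain to equilibrium.
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)
    # Difference of data-driven and one-step-reconstruction statistics.
    batch = v0.shape[0]
    W = W + lr * (v0.T @ ph0 - v1.T @ ph1) / batch
    b = b + lr * (v0 - v1).mean(axis=0)
    c = c + lr * (ph0 - ph1).mean(axis=0)
    return W, b, c

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 3                      # hypothetical sizes
W = 0.01 * rng.standard_normal((n_visible, n_hidden))
b = np.zeros(n_visible)
c = np.zeros(n_hidden)
data = (rng.random((32, n_visible)) < 0.5).astype(float)  # toy binary batch
W, b, c = cd1_update(W, b, c, data, lr=0.1, rng=rng)
```

Running n Gibbs steps in the negative phase instead of one would give CD-n; maximum likelihood corresponds to running the chain until equilibrium.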

24 Contrastive Divergence Solution

25 Solution: General Solution; Stable Solution: m = k; the same analytical form as maximum likelihood, regardless of n

26 Simulation. Each layer: 10 neurons. Input: 10-dim. Gaussian distribution, mean = 0, variances [0.2, 0.4, ..., 2], covariance = 0. [Figure: extracted eigenvalues plotted against input eigenvalues; panel title: ML]
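The input distribution described on this slide can be generated as in the sketch below. It only produces the data and its sample eigenvalues; it does not reproduce the training run or the plotted results. The sample count `n_samples` and the seed are assumptions, as the slide does not state them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ten independent Gaussian coordinates with variances 0.2, 0.4, ..., 2.0.
variances = np.linspace(0.2, 2.0, 10)
n_samples = 20000  # assumption: not given on the slide

# Zero mean and zero covariance between coordinates: scale standard normals.
X = rng.standard_normal((n_samples, 10)) * np.sqrt(variances)

# Eigenvalues of the sample covariance are the "input eigenvalues"
# against which extracted eigenvalues would be compared.
sample_cov = np.cov(X, rowvar=False)
input_eigenvalues = np.sort(np.linalg.eigvalsh(sample_cov))
```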

27 Bayesian Duality in Exponential Family: Data x; Parameter (higher-order concepts); Curved exponential family

28 RBM = h, x = Wv x = v = hw

29 Two Manifolds

30 Geometry of CDn (contrastive divergence)

31 Bernoulli-Gaussian RBM: ICA (R. Karakida)

32 Equilibrium Analysis: Results. Assumption of input: s: independent and nonnegative sources; B: N × N orthogonal matrix. ICA (Independent Component Analysis) solutions: If, ML and CD learning have the following stable solutions. [Figure: W and s space; CD solutions (mean value; model variance σ), ML solutions, ICA]

33 Simulation. Number of neurons: N = M = 2, σ = 1/2. Sources p(s): uniform distribution. [Figure panels: Mixing, Input, CD, ICA Solution, Output] Independent sources are extracted in G-B RBM

34 Supervised Learning: Multilayer perceptron; Backprop learning; Singularity!! Natural gradient solves the difficulty

35 Mathematical Neurons: $y = \varphi\left(\sum_i w_i x_i - h\right)$, with activation function $\varphi(u)$

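36 Multilayer Perceptrons: $y = \sum_i v_i \varphi(w_i \cdot x)$, $x = (x_1, x_2, \ldots, x_n)$; $y = f(x, \theta) = \sum_i v_i \varphi(w_i \cdot x)$, $\theta = (w_1, \ldots, w_m; v_1, \ldots, v_m)$

37 Multilayer Perceptron: neuromanifold, the space of functions $S$; $y = f(x, \theta) = \sum_i v_i \varphi(w_i \cdot x)$, $\theta = (v_1, \ldots, v_m; w_1, \ldots, w_m)$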

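38 Backpropagation: gradient learning. Examples (training set): $(y_1, x_1), \ldots, (y_t, x_t)$; $l(y, x; \theta) = \frac{1}{2}\,|y - f(x, \theta)|^2 = -\log p(y, x; \theta)$; $\Delta\theta_t = -\eta_t \frac{\partial l(y_t, x_t; \theta_t)}{\partial \theta}$; $f(x, \theta) = \sum_i v_i \varphi(w_i \cdot x)$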
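As a minimal sketch of the online gradient rule on this slide, the code below trains a one-hidden-layer perceptron of the form f(x, θ) = Σ v_i φ(w_i · x) with squared error. The choice φ = tanh, the toy target `sin(x[0])`, the layer sizes, and the learning rate `eta` are all assumptions for illustration, not values from the slides.

```python
import numpy as np

def mlp(x, W, v):
    # f(x, theta) = sum_i v_i * phi(w_i . x), with phi = tanh (assumed).
    return v @ np.tanh(W @ x)

def loss_and_grads(x, y, W, v):
    a = np.tanh(W @ x)                 # hidden outputs phi(w_i . x), shape (m,)
    err = v @ a - y                    # f(x, theta) - y
    grad_v = err * a                   # dl/dv_i = err * phi(w_i . x)
    grad_W = np.outer(err * v * (1.0 - a**2), x)  # dl/dw_i = err * v_i * phi'(.) * x
    return 0.5 * err**2, grad_W, grad_v

rng = np.random.default_rng(0)
n, m = 3, 4                            # input dimension, hidden units (hypothetical)
W = 0.5 * rng.standard_normal((m, n))
v = 0.5 * rng.standard_normal(m)

X = rng.standard_normal((200, n))      # toy training inputs
Y = np.sin(X[:, 0])                    # hypothetical target function

def mean_loss(W, v):
    return np.mean([0.5 * (mlp(x, W, v) - y) ** 2 for x, y in zip(X, Y)])

initial_loss = mean_loss(W, v)
eta = 0.05
for epoch in range(50):
    for x, y in zip(X, Y):
        # Online update: theta <- theta - eta * dl/dtheta for one example.
        _, gW, gv = loss_and_grads(x, y, W, v)
        W -= eta * gW
        v -= eta * gv
final_loss = mean_loss(W, v)
```

Comparing `initial_loss` and `final_loss` shows whether the updates reduced the training error on this toy problem.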

39 Flaws of MLP: slow convergence (plateau); error; local minima. Boosting and Bagging; SVM

40 Parameter Space vs Function Space

41 Singularity of MLP example

42 Geometry of singular model: $y = v\,\varphi(w \cdot x) + n$; singular where $v\,|w| = 0$

43 singularities

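44 Gaussian mixture: $p(x; v, w_1, w_2) = (1 - v)\,\psi(x - w_1) + v\,\psi(x - w_2)$, $\psi(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right)$; singular: $w_1 = w_2$, $v(1 - v) = 0$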
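The singularity named on this slide can be checked numerically. The sketch below assumes a two-component, unit-variance mixture p(x; v, w1, w2) = (1 - v) ψ(x - w1) + v ψ(x - w2). That form is a reading of the slide rather than something it states unambiguously, and the function names are hypothetical. On the set w1 = w2 the density no longer depends on v, and at v = 0 it no longer depends on w2, so different parameter values give the same distribution.

```python
import numpy as np

def psi(x):
    # Standard normal density.
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

def mixture(x, v, w1, w2):
    # Assumed form: (1 - v) * psi(x - w1) + v * psi(x - w2).
    return (1.0 - v) * psi(x - w1) + v * psi(x - w2)

x = np.linspace(-4.0, 4.0, 9)

# w1 == w2: the mixing weight v is unidentifiable.
same_a = mixture(x, 0.2, 1.0, 1.0)
same_b = mixture(x, 0.9, 1.0, 1.0)

# v == 0: the second center w2 is unidentifiable.
edge_a = mixture(x, 0.0, 1.0, -3.0)
edge_b = mixture(x, 0.0, 1.0, 2.5)

# Away from the singular set, changing v does change the density.
regular_a = mixture(x, 0.2, 1.0, -1.0)
regular_b = mixture(x, 0.9, 1.0, -1.0)
```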

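45 Steepest Direction: Natural Gradient. $\nabla l(\theta) = \left(\frac{\partial l}{\partial \theta_1}, \ldots, \frac{\partial l}{\partial \theta_n}\right)$; $\tilde{\nabla} l = G^{-1} \nabla l$; $|d\theta|^2 = \sum_{i,j} G_{ij}\, d\theta_i\, d\theta_j = d\theta^\top G\, d\theta$; $\Delta\theta_t = -\eta_t \tilde{\nabla} l(x_t, y_t; \theta_t)$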

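46 Natural Gradient: maximize $dl = \nabla l \cdot d\theta$ subject to $|d\theta|^2 = \varepsilon^2$; $\tilde{\nabla} l = G^{-1} \nabla l$; $\Delta\theta_t = -\eta_t \tilde{\nabla} l(x_t, y_t; \theta_t)$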
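A small sketch of the update θ ← θ − η G⁻¹ ∇l, under assumptions chosen to make the effect easy to see: a linear regression model with unit-variance Gaussian noise, for which the Fisher matrix G is the average of x xᵀ and is estimated here from the sample. The data, the step sizes, and all variable names are hypothetical. In this special case a single batch natural-gradient step with η = 1 lands on the least-squares solution, while plain gradient descent is slowed by the badly scaled inputs.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 500, 3

# Strongly anisotropic inputs make the plain gradient poorly conditioned.
X = rng.standard_normal((N, n)) * np.array([10.0, 1.0, 0.1])
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true + 0.1 * rng.standard_normal(N)

def grad(theta):
    # Gradient of the mean loss l = 0.5 * (y - x . theta)^2.
    return -X.T @ (y - X @ theta) / N

# Fisher information of the unit-variance Gaussian regression model,
# estimated by the sample average of x x^T.
G = X.T @ X / N

# One natural-gradient step: theta - eta * G^{-1} grad, with eta = 1.
theta_natural = np.zeros(n) - 1.0 * np.linalg.solve(G, grad(np.zeros(n)))

# Plain gradient descent for comparison; eta is limited by the largest
# eigenvalue of G (about 100 here), so small-variance directions barely move.
theta_plain = np.zeros(n)
eta = 0.005
for _ in range(200):
    theta_plain = theta_plain - eta * grad(theta_plain)
```

Solving the linear system with `np.linalg.solve` avoids forming G⁻¹ explicitly; for large networks the slides' adaptive natural gradient topic addresses estimating this inverse incrementally, which this sketch does not attempt.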

47 Adaptive Natural Gradient

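48 Learning, Estimation, and Model Selection: $E_{\mathrm{gen}} = D\left[p_0(y \mid x) : p(y \mid x; \hat{\theta})\right]$; $E_{\mathrm{train}} = D\left[p_{\mathrm{emp}}(y \mid x) : p(y \mid x; \hat{\theta})\right]$; $E_{\mathrm{gen}} = \frac{d}{2n}$; $E_{\mathrm{gen}} = E_{\mathrm{train}} + \frac{d}{n}$; $d$: dimension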

52 Coordinate Transformation u w2 w1 : u0 v1w1 v2w2 w w w v v v1 v2 v v v2 v1 z z 1 v

53 Singular lines in the parameter space

54 Learning Trajectory near the singularity

55 Milnor attractor

56 Dynamic vector fields: Redundant case

57 Dynamics of Learning : Trajectories log z l z z l v h v z h z z z h v z c z u u u u u

58 Dynamic vector fields: General case (z < 1 part stable)

59 Dynamic vector fields: General case (z > 1 part stable)

60 Fig. 2: trajectories
