Basic Principles of Unsupervised and Supervised Learning Toward Deep Learning
Shun-ichi Amari (RIKEN Brain Science Institute)
Collaborators: R. Karakida, M. Okada (U. Tokyo)
Deep Learning = Self-Organization + Supervised Learning
RBM: Restricted Boltzmann Machine
Auto-Encoder, Recurrent Net
Dropout
Contrastive Divergence
Simple Hebbian Self Organization
self organization of
Equilibrium
Equilibrium: special cases
Two and many clusters
Dynamics of self organization
Lyapunov Function
Further Problems
Distributed small clusters; large clusters
Mutual interactions among hidden neurons: neural field
Localized receptive fields; invariance
Boltzmann Machine
RBM: Restricted Boltzmann Machine
RBM
Self Organization
Interaction of Hidden Neurons
Recurrent Net (Auto-Encoder)
Recurrent Net Self Organization
Gaussian RBM is easy
Higher-order interactions: Gram-Charlier expansion
Gaussian Boltzmann Machine
Equilibrium Solution
Equilibrium Solution
General solution: diagonalized by the eigenvectors of the input covariance; you can choose m (≤ k) eigenvalues to form a solution.
Stable solution: the case m = k.
Contrastive Divergence
RBM: a 2-layered probabilistic neural network with no connections within layers (visible v, weights W, hidden h).
How to train an RBM: Maximum-Likelihood (ML) learning is hard; sampling from the input to equilibrium requires many iterations of Gibbs sampling, which demands too much computation time.
Contrastive Divergence Solution
Solution
General solution, and stable solution at m = k: the same analytical form as maximum likelihood, regardless of n.
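A minimal CD-n training sketch for a Bernoulli-Bernoulli RBM in NumPy, illustrating how a few Gibbs steps replace the expensive run to equilibrium; the function name, learning rate, and sampling details are illustrative assumptions, not taken from the slides.

```python
# CD-n for a Bernoulli-Bernoulli RBM: the negative statistics come from
# n Gibbs steps started at the data, not from the equilibrium distribution.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd_step(v0, W, b, c, n=1, lr=0.01):
    """One CD-n update. v0: (batch, N_v) binary data.
    W: (N_v, N_h) weights; b, c: visible/hidden biases."""
    # Positive phase: hidden probabilities given the data.
    ph0 = sigmoid(v0 @ W + c)
    h = (rng.random(ph0.shape) < ph0).astype(float)
    # n steps of Gibbs sampling instead of running to equilibrium.
    for _ in range(n):
        pv = sigmoid(h @ W.T + b)
        v = (rng.random(pv.shape) < pv).astype(float)
        ph = sigmoid(v @ W + c)
        h = (rng.random(ph.shape) < ph).astype(float)
    # CD update: data statistics minus n-step reconstruction statistics.
    batch = v0.shape[0]
    W += lr * (v0.T @ ph0 - v.T @ ph) / batch
    b += lr * (v0 - v).mean(axis=0)
    c += lr * (ph0 - ph).mean(axis=0)
    return W, b, c
```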
Simulation
Each layer: 10 neurons. Input: 10-dim. Gaussian distribution with mean = 0, variances [0.2, 0.4, ..., 2], covariance = 0.
[Figure: input eigenvalues vs. eigenvalues extracted by ML and CD]
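A short sketch of this simulation's setup. As a stand-in for the trained model, it eigen-decomposes the sample covariance; the slide's analytical result is that the ML/CD equilibrium of the Gaussian RBM extracts exactly these eigenvalues, so this only illustrates the data side of the experiment.

```python
# 10-dim. Gaussian input with variances 0.2, 0.4, ..., 2.0 and zero
# covariance; the sorted sample-covariance eigenvalues should recover
# the input variances, which the trained Gaussian RBM is said to extract.
import numpy as np

rng = np.random.default_rng(1)
true_vars = np.arange(0.2, 2.01, 0.2)          # input eigenvalues
X = rng.normal(0.0, np.sqrt(true_vars), size=(50_000, 10))

C = np.cov(X, rowvar=False)                    # sample covariance
extracted = np.sort(np.linalg.eigvalsh(C))     # "extracted" eigenvalues
print(np.round(extracted, 2))                  # ~ [0.2 0.4 ... 2.0]
```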
Bayesian Duality in Exponential Family
Data x; parameter θ (higher-order concepts); curved exponential family.
RBM: taking θ = h, x = Wv (hidden as parameter); dually, θ = v, x = hW.
Two Manifolds
Geometry of CD-n (contrastive divergence)
Bernoulli-Gaussian RBM and ICA (R. Karakida)
Equilibrium Analysis: Results
Assumption on the input x = Bs: s are independent and nonnegative sources, and B is an N × N orthogonal matrix (the setting of ICA, Independent Component Analysis).
Under this assumption, ML and CD learning have stable solutions that coincide with the ICA solutions in W-space, up to the mean value and the model variance σ.
[Figure: CD and ML solutions relative to the source directions in W-space]
Simulation
Number of neurons: N = M = 2, σ = 1/2. Sources p(s): uniform distribution.
[Figure: mixed input, CD solution vs. ICA solution, output]
Independent sources are extracted by the Gaussian-Bernoulli RBM.
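A hedged CD-1 sketch for a Gaussian-Bernoulli RBM (Gaussian visible units with fixed variance σ², Bernoulli hidden units) on the slide's setup of nonnegative sources mixed by an orthogonal matrix. The rotation angle, learning rate, batch schedule, and the alignment check at the end are illustrative assumptions; whether the learned columns align with B is the slide's claim, not something this sketch guarantees.

```python
# Gaussian-Bernoulli RBM trained with CD-1 on x = B s,
# s ~ Uniform[0,1]^2 independent, B an orthogonal (rotation) matrix.
import numpy as np

rng = np.random.default_rng(2)
sigma = 0.5                                   # model std (slide: sigma = 1/2)
N = M = 2

theta = 0.6                                   # illustrative rotation angle
B = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
S = rng.random((20_000, N))                   # nonnegative uniform sources
X = S @ B.T                                   # mixed inputs x = B s

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

W = 0.01 * rng.standard_normal((N, M))
b = np.zeros(N)                               # visible bias
c = np.zeros(M)                               # hidden bias

for epoch in range(200):
    for v0 in np.array_split(X, 100):
        # p(h=1|v) = sigmoid(c + v W / sigma^2); p(v|h) = N(b + W h, sigma^2 I)
        ph0 = sigmoid((v0 @ W) / sigma**2 + c)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        v1 = b + h0 @ W.T + sigma * rng.standard_normal(v0.shape)
        ph1 = sigmoid((v1 @ W) / sigma**2 + c)
        bs = len(v0)
        W += 0.001 * (v0.T @ ph0 - v1.T @ ph1) / (bs * sigma**2)
        b += 0.001 * (v0 - v1).mean(axis=0) / sigma**2
        c += 0.001 * (ph0 - ph1).mean(axis=0)

# Compare learned directions with the mixing directions (columns of B),
# up to permutation and sign -- the ICA solution of the slide.
print(np.round(W / np.linalg.norm(W, axis=0), 2))
print(np.round(B, 2))
```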
Supervised Learning
Multilayer perceptron; backprop learning; singularity!!
Natural gradient solves the difficulty.
Mathematical Neurons
$y = \varphi\bigl(\sum_i w_i x_i - h\bigr) = \varphi(\mathbf{w}\cdot\mathbf{x} - h)$, where $\varphi(u)$ is the sigmoid output function.
Multilayer Perceptrons
$\mathbf{x} = (x_1, x_2, \ldots, x_n)$
$y = f(\mathbf{x}, \theta) = \sum_i v_i \varphi(\mathbf{w}_i \cdot \mathbf{x})$
$\theta = (\mathbf{w}_1, \ldots, \mathbf{w}_m;\ v_1, \ldots, v_m)$
Multilayer Perceptron: neuromanifold
Space of functions $S = \{\, y = f(\mathbf{x}, \theta) = \sum_i v_i \varphi(\mathbf{w}_i \cdot \mathbf{x}) \,\}$, with coordinates $\theta = (v_1, \ldots, v_m;\ \mathbf{w}_1, \ldots, \mathbf{w}_m)$.
Backpropagation: gradient learning
Training set of examples: $(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_t, y_t), \ldots$
$l(y, \mathbf{x}; \theta) = \tfrac{1}{2}\{y - f(\mathbf{x}, \theta)\}^2 = -\log p(y, \mathbf{x}; \theta)$
$\theta_{t+1} = \theta_t - \eta\, \nabla l(y_t, \mathbf{x}_t; \theta_t)$, with $f(\mathbf{x}, \theta) = \sum_i v_i \varphi(\mathbf{w}_i \cdot \mathbf{x})$
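A minimal SGD/backprop sketch for the slide's one-hidden-layer model $f(\mathbf{x},\theta) = \sum_i v_i \varphi(\mathbf{w}_i \cdot \mathbf{x})$ with squared loss; the choice of $\varphi = \tanh$, the sizes, and the learning rate are illustrative assumptions.

```python
# One SGD step of backpropagation for y = sum_i v_i * phi(w_i . x),
# l(y, x; theta) = (y - f(x, theta))^2 / 2.
import numpy as np

rng = np.random.default_rng(3)
n_in, m = 5, 10                            # input dimension, hidden units
W = 0.5 * rng.standard_normal((m, n_in))   # rows are w_1, ..., w_m
v = 0.5 * rng.standard_normal(m)           # output weights v_1, ..., v_m

phi = np.tanh                              # smooth sigmoid phi(u)
dphi = lambda u: 1.0 - np.tanh(u) ** 2     # phi'(u)

def sgd_step(x, y, lr=0.05):
    """theta <- theta - lr * grad l for one example (x, y)."""
    global W, v
    u = W @ x                              # pre-activations w_i . x
    z = phi(u)                             # hidden outputs
    err = v @ z - y                        # f(x, theta) - y
    grad_v = err * z                       # dl/dv_i = err * phi(u_i)
    grad_W = np.outer(err * v * dphi(u), x)  # dl/dw_i, backpropagated
    v -= lr * grad_v
    W -= lr * grad_W
    return 0.5 * err ** 2
```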
Flaws of MLP
Slow convergence: plateaus in the error; local minima
(Hence alternatives such as boosting, bagging, and SVMs)
Parameter Space vs Function Space
Singularity of MLP: example
Geometry of the singular model
$y = v\,\varphi(\mathbf{w}\cdot\mathbf{x}) + n$; singular where $v = 0$ or $\mathbf{w} = 0$.
Singularities
Gaussian mixture
$p(x; v, w_1, w_2) = v_1\,\psi(x - w_1) + v_2\,\psi(x - w_2)$, $\quad \psi(x) \propto \exp(-x^2/2)$, $\quad v_1 + v_2 = 1$
Singular: $w_1 = w_2$, or $v_1 v_2 = 0$
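A tiny numerical check of the singularity above: when $w_1 = w_2$, the mixture density no longer depends on the mixing weight, so a whole line of parameters maps to the same distribution (non-identifiability). The test points are arbitrary.

```python
# At w1 = w2 the two-component Gaussian mixture collapses to a single
# Gaussian, identical for every choice of the mixing weight v1.
import numpy as np

psi = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def p(x, v1, w1, w2):
    return v1 * psi(x - w1) + (1 - v1) * psi(x - w2)

x = np.linspace(-3, 3, 7)
print(np.allclose(p(x, 0.3, 1.0, 1.0), p(x, 0.9, 1.0, 1.0)))  # True
```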
Steepest Direction: Natural Gradient
$\nabla l(\theta) = \left(\frac{\partial l}{\partial \theta_1}, \ldots, \frac{\partial l}{\partial \theta_n}\right)$
$\tilde{\nabla} l = G^{-1} \nabla l$
$|d\theta|^2 = \sum_{ij} g_{ij}\, d\theta_i\, d\theta_j = d\theta^\top G\, d\theta$
$\theta_{t+1} = \theta_t - \eta_t\, \tilde{\nabla} l(\mathbf{x}_t, y_t; \theta_t)$
Natural Gradient
Maximize $dl = \nabla l \cdot d\theta$ under the constraint $|d\theta|^2 = \varepsilon^2$:
$\tilde{\nabla} l = G^{-1} \nabla l$
$\theta_{t+1} = \theta_t - \eta_t\, \tilde{\nabla} l(\mathbf{x}_t, y_t; \theta_t)$
Adaptive Natural Gradient
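A hedged sketch of adaptive natural-gradient descent: instead of inverting an exact Fisher matrix $G$ at every step, the inverse is tracked online with a first-order (Sherman-Morrison style) update of a running estimate, in the spirit of the adaptive natural gradient of Amari, Park & Fukumizu. The step sizes and the `grad_fn` interface are illustrative assumptions.

```python
# Adaptive natural gradient: theta <- theta - lr * G^{-1} grad(l),
# with G^{-1} estimated online from the per-example gradients.
import numpy as np

def adaptive_natural_gradient(theta, grad_fn, T=1000, lr=0.05, eps=0.01):
    """grad_fn(theta, t) returns the per-example gradient (score) at step t."""
    d = theta.size
    G_inv = np.eye(d)                      # running estimate of G^{-1}
    for t in range(T):
        g = grad_fn(theta, t)
        # G_{t+1} = (1 - eps) G_t + eps g g^T, inverted to first order in eps:
        G_inv = (1 + eps) * G_inv - eps * np.outer(G_inv @ g, G_inv @ g)
        theta = theta - lr * (G_inv @ g)   # natural-gradient step
    return theta
```

Tracking $G^{-1}$ directly avoids an $O(d^3)$ matrix inversion per step, which is what makes the method practical online.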
Learning, Estimation, and Model Selection
$E_{\mathrm{gen}} = D\bigl[p_0(y \mid x) : p(y \mid x; \hat{\theta})\bigr]$, $\quad E_{\mathrm{train}} = D\bigl[p_{\mathrm{emp}}(y \mid x) : p(y \mid x; \hat{\theta})\bigr]$
$\langle E_{\mathrm{gen}} \rangle = \frac{d}{2n}$, $\quad \langle E_{\mathrm{gen}} - E_{\mathrm{train}} \rangle = \frac{d}{n}$, $\quad d$: dimension of $\theta$, $n$: number of examples
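A Monte-Carlo check of $\langle E_{\mathrm{gen}} \rangle \approx d/(2n)$ in the simplest regular model where it holds, linear regression with known unit noise variance; the setup and sample sizes are illustrative assumptions.

```python
# For Gaussian noise with sigma = 1, the KL divergence of the fitted
# predictive from the true one, averaged over x ~ N(0, I), is
# ||beta_hat - beta||^2 / 2, whose expectation is ~ d / (2n).
import numpy as np

rng = np.random.default_rng(4)
d, n, reps = 10, 500, 2000
beta = rng.standard_normal(d)

gen_errs = []
for _ in range(reps):
    X = rng.standard_normal((n, d))
    y = X @ beta + rng.standard_normal(n)
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    gen_errs.append(np.sum((beta_hat - beta) ** 2) / 2)

print(np.mean(gen_errs), d / (2 * n))   # both ~ 0.01
```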
Coordinate Transformation
$u = w_2 - w_1$ (singular at $u = 0$), $\quad \bar{w} = v_1 w_1 + v_2 w_2$, $\quad v = v_1 + v_2$, $\quad z = \frac{v_2 - v_1}{v}$
Singular lines in the parameter space
Learning Trajectory near the singularity
Milnor attractor
Dynamic vector fields: Redundant case
Dynamics of Learning: Trajectories
Averaged gradient dynamics of the log-likelihood $l$ in the transformed coordinates $(u, v, z)$: $\dot{z} \propto \partial l/\partial z$, $\dot{v} \propto \partial l/\partial v$, $\dot{u} \propto \partial l/\partial u$, which generate the trajectories shown in the vector fields below.
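A numerical sketch of a learning trajectory near the singularity: batch gradient ascent on the log-likelihood of the two-component Gaussian mixture above, started close to the singular line $w_1 = w_2$. The long plateau before the components separate illustrates the Milnor-attractor behavior of the preceding slides; the data distribution, initialization, and learning rate are illustrative assumptions.

```python
# Gradient ascent on the mixture log-likelihood, started near w1 = w2 = 0:
# the gradient is nearly zero on the singular line, so escape is slow.
import numpy as np

rng = np.random.default_rng(5)
x = np.concatenate([rng.normal(-1, 1, 2000), rng.normal(1, 1, 2000)])
psi = lambda u: np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)

w = np.array([0.01, -0.01])                  # near the singular line w1 = w2
v1 = 0.5                                     # mixing weight held fixed
for t in range(20_000):
    p1, p2 = v1 * psi(x - w[0]), (1 - v1) * psi(x - w[1])
    r1, r2 = p1 / (p1 + p2), p2 / (p1 + p2)  # posterior responsibilities
    # Gradients of the average log-likelihood w.r.t. w1 and w2.
    g = np.array([np.mean(r1 * (x - w[0])), np.mean(r2 * (x - w[1]))])
    w += 0.5 * g
    if t % 5000 == 0:
        print(t, np.round(w, 3))             # w1, w2 separate only slowly
```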
Dynamic vector fields: General case ($|z| < 1$ part stable)
Dynamic vector fields: General case ($|z| > 1$ part stable)
Fig. 2: trajectories