Basic Principles of Unsupervised and Supervised Learning Toward Deep Learning
Shun-ichi Amari (RIKEN Brain Science Institute)
Collaborators: R. Karakida, M. Okada (U. Tokyo)
Deep Learning = Self-Organization + Supervised Learning
RBM: Restricted Boltzmann Machine
Auto-Encoder, Recurrent Net
Dropout
Contrastive Divergence
Simple Hebbian Self Organization
self organization of
Equilibrium
Equilibrium: special cases
Two and many clusters
Dynamics of self organization
Lyapunov Function
Further Problems
Distributed small clusters; large clusters
Mutual interactions among hidden neurons: neural field
Localized receptive fields; invariance
Boltzmann Machine
RBM: Restricted Boltzmann Machine
RBM
Self Organization
Interaction of Hidden Neurons
Recurrent Net (Auto-Encoder)
Recurrent Net Self Organization
Gaussian RBM is easy
Higher-order interactions: Gram-Charlier expansion
Gaussian Boltzmann Machine
Equilibrium Solution
Equilibrium Solution
General solution: diagonalized by the eigenvectors of the input covariance; you can choose m (≤ k) eigenvalues to form a solution.
Stable solution: the case m = k.
Contrastive Divergence
RBM: a 2-layered probabilistic neural network with no connections within layers (visible v, weights W, hidden h).
How to train an RBM: Maximum-Likelihood (ML) learning is hard; sampling from the input to equilibrium requires many iterations of Gibbs sampling, which demands too much computation time.
Contrastive Divergence Solution
Solution
General solution, and stable solution at m = k: the same analytical form as maximum likelihood, regardless of n.
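A minimal CD-n training sketch for a Bernoulli-Bernoulli RBM in NumPy, illustrating how a few Gibbs steps replace the expensive run to equilibrium; the function name, learning rate, and sampling details are illustrative assumptions, not taken from the slides.

```python
# CD-n for a Bernoulli-Bernoulli RBM: the negative statistics come from
# n Gibbs steps started at the data, not from the equilibrium distribution.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd_step(v0, W, b, c, n=1, lr=0.01):
    """One CD-n update. v0: (batch, N_v) binary data.
    W: (N_v, N_h) weights; b, c: visible/hidden biases."""
    # Positive phase: hidden probabilities given the data.
    ph0 = sigmoid(v0 @ W + c)
    h = (rng.random(ph0.shape) < ph0).astype(float)
    # n steps of Gibbs sampling instead of running to equilibrium.
    for _ in range(n):
        pv = sigmoid(h @ W.T + b)
        v = (rng.random(pv.shape) < pv).astype(float)
        ph = sigmoid(v @ W + c)
        h = (rng.random(ph.shape) < ph).astype(float)
    # CD update: data statistics minus n-step reconstruction statistics.
    batch = v0.shape[0]
    W += lr * (v0.T @ ph0 - v.T @ ph) / batch
    b += lr * (v0 - v).mean(axis=0)
    c += lr * (ph0 - ph).mean(axis=0)
    return W, b, c
```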
Simulation
Each layer: 10 neurons. Input: 10-dim. Gaussian distribution with mean = 0, variances [0.2, 0.4, ..., 2], covariance = 0.
[Figure: input eigenvalues vs. eigenvalues extracted by ML and CD]
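A short sketch of this simulation's setup. As a stand-in for the trained model, it eigen-decomposes the sample covariance; the slide's analytical result is that the ML/CD equilibrium of the Gaussian RBM extracts exactly these eigenvalues, so this only illustrates the data side of the experiment.

```python
# 10-dim. Gaussian input with variances 0.2, 0.4, ..., 2.0 and zero
# covariance; the sorted sample-covariance eigenvalues should recover
# the input variances, which the trained Gaussian RBM is said to extract.
import numpy as np

rng = np.random.default_rng(1)
true_vars = np.arange(0.2, 2.01, 0.2)          # input eigenvalues
X = rng.normal(0.0, np.sqrt(true_vars), size=(50_000, 10))

C = np.cov(X, rowvar=False)                    # sample covariance
extracted = np.sort(np.linalg.eigvalsh(C))     # "extracted" eigenvalues
print(np.round(extracted, 2))                  # ~ [0.2 0.4 ... 2.0]
```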
Bayesian Duality in Exponential Family
Data x; parameter θ (higher-order concepts); curved exponential family.
RBM: taking θ = h, x = Wv (hidden as parameter); dually, θ = v, x = hW.
Two Manifolds
Geometry of CD-n (contrastive divergence)
Bernoulli-Gaussian RBM and ICA (R. Karakida)
Equilibrium Analysis: Results
Assumption on the input x = Bs: s are independent and nonnegative sources, and B is an N × N orthogonal matrix (the setting of ICA, Independent Component Analysis).
Under this assumption, ML and CD learning have stable solutions that coincide with the ICA solutions in W-space, up to the mean value and the model variance σ.
[Figure: CD and ML solutions relative to the source directions in W-space]
Simulation
Number of neurons: N = M = 2, σ = 1/2. Sources p(s): uniform distribution.
[Figure: mixed input, CD solution vs. ICA solution, output]
Independent sources are extracted by the Gaussian-Bernoulli RBM.
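A hedged CD-1 sketch for a Gaussian-Bernoulli RBM (Gaussian visible units with fixed variance σ², Bernoulli hidden units) on the slide's setup of nonnegative sources mixed by an orthogonal matrix. The rotation angle, learning rate, batch schedule, and the alignment check at the end are illustrative assumptions; whether the learned columns align with B is the slide's claim, not something this sketch guarantees.

```python
# Gaussian-Bernoulli RBM trained with CD-1 on x = B s,
# s ~ Uniform[0,1]^2 independent, B an orthogonal (rotation) matrix.
import numpy as np

rng = np.random.default_rng(2)
sigma = 0.5                                   # model std (slide: sigma = 1/2)
N = M = 2

theta = 0.6                                   # illustrative rotation angle
B = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
S = rng.random((20_000, N))                   # nonnegative uniform sources
X = S @ B.T                                   # mixed inputs x = B s

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

W = 0.01 * rng.standard_normal((N, M))
b = np.zeros(N)                               # visible bias
c = np.zeros(M)                               # hidden bias

for epoch in range(200):
    for v0 in np.array_split(X, 100):
        # p(h=1|v) = sigmoid(c + v W / sigma^2); p(v|h) = N(b + W h, sigma^2 I)
        ph0 = sigmoid((v0 @ W) / sigma**2 + c)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        v1 = b + h0 @ W.T + sigma * rng.standard_normal(v0.shape)
        ph1 = sigmoid((v1 @ W) / sigma**2 + c)
        bs = len(v0)
        W += 0.001 * (v0.T @ ph0 - v1.T @ ph1) / (bs * sigma**2)
        b += 0.001 * (v0 - v1).mean(axis=0) / sigma**2
        c += 0.001 * (ph0 - ph1).mean(axis=0)

# Compare learned directions with the mixing directions (columns of B),
# up to permutation and sign -- the ICA solution of the slide.
print(np.round(W / np.linalg.norm(W, axis=0), 2))
print(np.round(B, 2))
```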
Supervised Learning
Multilayer perceptron; backprop learning; singularity!!
Natural gradient solves the difficulty.
Mathematical Neurons
$y = \varphi\bigl(\sum_i w_i x_i - h\bigr) = \varphi(\mathbf{w}\cdot\mathbf{x} - h)$, where $\varphi(u)$ is the sigmoid output function.
Multilayer Perceptrons
$\mathbf{x} = (x_1, x_2, \ldots, x_n)$
$y = f(\mathbf{x}, \theta) = \sum_i v_i \varphi(\mathbf{w}_i \cdot \mathbf{x})$
$\theta = (\mathbf{w}_1, \ldots, \mathbf{w}_m;\ v_1, \ldots, v_m)$
Multilayer Perceptron: neuromanifold
Space of functions $S = \{\, y = f(\mathbf{x}, \theta) = \sum_i v_i \varphi(\mathbf{w}_i \cdot \mathbf{x}) \,\}$, with coordinates $\theta = (v_1, \ldots, v_m;\ \mathbf{w}_1, \ldots, \mathbf{w}_m)$.
Backpropagation: gradient learning
Training set of examples: $(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_t, y_t), \ldots$
$l(y, \mathbf{x}; \theta) = \tfrac{1}{2}\{y - f(\mathbf{x}, \theta)\}^2 = -\log p(y, \mathbf{x}; \theta)$
$\theta_{t+1} = \theta_t - \eta\, \nabla l(y_t, \mathbf{x}_t; \theta_t)$, with $f(\mathbf{x}, \theta) = \sum_i v_i \varphi(\mathbf{w}_i \cdot \mathbf{x})$
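A minimal SGD/backprop sketch for the slide's one-hidden-layer model $f(\mathbf{x},\theta) = \sum_i v_i \varphi(\mathbf{w}_i \cdot \mathbf{x})$ with squared loss; the choice of $\varphi = \tanh$, the sizes, and the learning rate are illustrative assumptions.

```python
# One SGD step of backpropagation for y = sum_i v_i * phi(w_i . x),
# l(y, x; theta) = (y - f(x, theta))^2 / 2.
import numpy as np

rng = np.random.default_rng(3)
n_in, m = 5, 10                            # input dimension, hidden units
W = 0.5 * rng.standard_normal((m, n_in))   # rows are w_1, ..., w_m
v = 0.5 * rng.standard_normal(m)           # output weights v_1, ..., v_m

phi = np.tanh                              # smooth sigmoid phi(u)
dphi = lambda u: 1.0 - np.tanh(u) ** 2     # phi'(u)

def sgd_step(x, y, lr=0.05):
    """theta <- theta - lr * grad l for one example (x, y)."""
    global W, v
    u = W @ x                              # pre-activations w_i . x
    z = phi(u)                             # hidden outputs
    err = v @ z - y                        # f(x, theta) - y
    grad_v = err * z                       # dl/dv_i = err * phi(u_i)
    grad_W = np.outer(err * v * dphi(u), x)  # dl/dw_i, backpropagated
    v -= lr * grad_v
    W -= lr * grad_W
    return 0.5 * err ** 2
```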
Flaws of MLP
Slow convergence: plateaus in the error; local minima
(Hence alternatives such as boosting, bagging, and SVMs)
Parameter Space vs Function Space
Singularity of MLP: example
Geometry of the singular model
$y = v\,\varphi(\mathbf{w}\cdot\mathbf{x}) + n$; singular where $v = 0$ or $\mathbf{w} = 0$.
Singularities
Gaussian mixture
$p(x; v, w_1, w_2) = v_1\,\psi(x - w_1) + v_2\,\psi(x - w_2)$, $\quad \psi(x) \propto \exp(-x^2/2)$, $\quad v_1 + v_2 = 1$
Singular: $w_1 = w_2$, or $v_1 v_2 = 0$
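A tiny numerical check of the singularity above: when $w_1 = w_2$, the mixture density no longer depends on the mixing weight, so a whole line of parameters maps to the same distribution (non-identifiability). The test points are arbitrary.

```python
# At w1 = w2 the two-component Gaussian mixture collapses to a single
# Gaussian, identical for every choice of the mixing weight v1.
import numpy as np

psi = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def p(x, v1, w1, w2):
    return v1 * psi(x - w1) + (1 - v1) * psi(x - w2)

x = np.linspace(-3, 3, 7)
print(np.allclose(p(x, 0.3, 1.0, 1.0), p(x, 0.9, 1.0, 1.0)))  # True
```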
Steepest Direction: Natural Gradient
$\nabla l(\theta) = \left(\frac{\partial l}{\partial \theta_1}, \ldots, \frac{\partial l}{\partial \theta_n}\right)$
$\tilde{\nabla} l = G^{-1} \nabla l$
$|d\theta|^2 = \sum_{ij} g_{ij}\, d\theta_i\, d\theta_j = d\theta^\top G\, d\theta$
$\theta_{t+1} = \theta_t - \eta_t\, \tilde{\nabla} l(\mathbf{x}_t, y_t; \theta_t)$
Natural Gradient
Maximize $dl = \nabla l \cdot d\theta$ under the constraint $|d\theta|^2 = \varepsilon^2$:
$\tilde{\nabla} l = G^{-1} \nabla l$
$\theta_{t+1} = \theta_t - \eta_t\, \tilde{\nabla} l(\mathbf{x}_t, y_t; \theta_t)$
Adaptive Natural Gradient
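A hedged sketch of adaptive natural-gradient descent: instead of inverting an exact Fisher matrix $G$ at every step, the inverse is tracked online with a first-order (Sherman-Morrison style) update of a running estimate, in the spirit of the adaptive natural gradient of Amari, Park & Fukumizu. The step sizes and the `grad_fn` interface are illustrative assumptions.

```python
# Adaptive natural gradient: theta <- theta - lr * G^{-1} grad(l),
# with G^{-1} estimated online from the per-example gradients.
import numpy as np

def adaptive_natural_gradient(theta, grad_fn, T=1000, lr=0.05, eps=0.01):
    """grad_fn(theta, t) returns the per-example gradient (score) at step t."""
    d = theta.size
    G_inv = np.eye(d)                      # running estimate of G^{-1}
    for t in range(T):
        g = grad_fn(theta, t)
        # G_{t+1} = (1 - eps) G_t + eps g g^T, inverted to first order in eps:
        G_inv = (1 + eps) * G_inv - eps * np.outer(G_inv @ g, G_inv @ g)
        theta = theta - lr * (G_inv @ g)   # natural-gradient step
    return theta
```

Tracking $G^{-1}$ directly avoids an $O(d^3)$ matrix inversion per step, which is what makes the method practical online.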
Learning, Estimation, and Model Selection
$E_{\mathrm{gen}} = D\bigl[p_0(y \mid x) : p(y \mid x; \hat{\theta})\bigr]$, $\quad E_{\mathrm{train}} = D\bigl[p_{\mathrm{emp}}(y \mid x) : p(y \mid x; \hat{\theta})\bigr]$
$\langle E_{\mathrm{gen}} \rangle = \frac{d}{2n}$, $\quad \langle E_{\mathrm{gen}} - E_{\mathrm{train}} \rangle = \frac{d}{n}$, $\quad d$: dimension of $\theta$, $n$: number of examples
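A Monte-Carlo check of $\langle E_{\mathrm{gen}} \rangle \approx d/(2n)$ in the simplest regular model where it holds, linear regression with known unit noise variance; the setup and sample sizes are illustrative assumptions.

```python
# For Gaussian noise with sigma = 1, the KL divergence of the fitted
# predictive from the true one, averaged over x ~ N(0, I), is
# ||beta_hat - beta||^2 / 2, whose expectation is ~ d / (2n).
import numpy as np

rng = np.random.default_rng(4)
d, n, reps = 10, 500, 2000
beta = rng.standard_normal(d)

gen_errs = []
for _ in range(reps):
    X = rng.standard_normal((n, d))
    y = X @ beta + rng.standard_normal(n)
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    gen_errs.append(np.sum((beta_hat - beta) ** 2) / 2)

print(np.mean(gen_errs), d / (2 * n))   # both ~ 0.01
```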
Coordinate Transformation
$u = w_2 - w_1$ (singular at $u = 0$), $\quad \bar{w} = v_1 w_1 + v_2 w_2$, $\quad v = v_1 + v_2$, $\quad z = \frac{v_2 - v_1}{v}$
Singular lines in the parameter space
Learning Trajectory near the singularity
Milnor attractor
Dynamic vector fields: Redundant case
Dynamics of Learning: Trajectories
Averaged gradient dynamics of the log-likelihood $l$ in the transformed coordinates $(u, v, z)$: $\dot{z} \propto \partial l/\partial z$, $\dot{v} \propto \partial l/\partial v$, $\dot{u} \propto \partial l/\partial u$, which generate the trajectories shown in the vector fields below.
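A numerical sketch of a learning trajectory near the singularity: batch gradient ascent on the log-likelihood of the two-component Gaussian mixture above, started close to the singular line $w_1 = w_2$. The long plateau before the components separate illustrates the Milnor-attractor behavior of the preceding slides; the data distribution, initialization, and learning rate are illustrative assumptions.

```python
# Gradient ascent on the mixture log-likelihood, started near w1 = w2 = 0:
# the gradient is nearly zero on the singular line, so escape is slow.
import numpy as np

rng = np.random.default_rng(5)
x = np.concatenate([rng.normal(-1, 1, 2000), rng.normal(1, 1, 2000)])
psi = lambda u: np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)

w = np.array([0.01, -0.01])                  # near the singular line w1 = w2
v1 = 0.5                                     # mixing weight held fixed
for t in range(20_000):
    p1, p2 = v1 * psi(x - w[0]), (1 - v1) * psi(x - w[1])
    r1, r2 = p1 / (p1 + p2), p2 / (p1 + p2)  # posterior responsibilities
    # Gradients of the average log-likelihood w.r.t. w1 and w2.
    g = np.array([np.mean(r1 * (x - w[0])), np.mean(r2 * (x - w[1]))])
    w += 0.5 * g
    if t % 5000 == 0:
        print(t, np.round(w, 3))             # w1, w2 separate only slowly
```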
Dynamic vector fields: General case ($|z| < 1$ part stable)
Dynamic vector fields: General case ($|z| > 1$ part stable)
Fig. 2: trajectories