Basic Principles of Unsupervised and Supervised Learning Toward Deep Learning


Basic Principles of Unsupervised and Supervised Learning Toward Deep Learning. Shun-ichi Amari (RIKEN Brain Science Institute); collaborators: R. Karakida, M. Okada (U. Tokyo)

Deep Learning: Self-Organization + Supervised Learning. RBM (Restricted Boltzmann Machine), Auto Encoder, Recurrent Net, Dropout, Contrastive Divergence

Simple Hebbian Self Organization

self organization of

Equilibrium

Equilibrium: special cases

Two and many clusters

Dynamics of self organization

Lyapunov Function

Further Problems: distributed small clusters; large clusters; mutual interactions among h neurons (a neural field); localized receptive fields; invariance

Boltzmann Machine

RBM: Restricted Boltzmann Machine

RBM

Self Organization

Interaction of Hidden Neurons

Recurrent Net (Auto Encoder)

Recurrent Net Self Organization

Gaussian RBM is easy; higher-order interactions via the Gram-Charlier expansion

Gaussian Boltzmann Machine

Equilibrium Solution

Equilibrium Solution. General solution: obtained in diagonalized form; one can choose m (≤ k) eigenvalues to form a solution. Stable solution: the case m = k

Contrastive Divergence. RBM: a two-layered probabilistic neural network with no connections within layers (visible layer v, weight matrix W, hidden layer h). How to train an RBM: Maximum Likelihood (ML) learning is hard, because sampling from the input to equilibrium requires many iterations of Gibbs sampling and demands too much computation time
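
To make the contrastive-divergence idea concrete, here is a minimal NumPy sketch of a single CD-1 update for a binary RBM: instead of running Gibbs sampling to equilibrium, it uses one reconstruction step. The learning rate, layer sizes, and variable names are illustrative choices, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b, c, lr=0.01):
    """One CD-1 step for a binary RBM with visible bias b and hidden bias c."""
    # Positive phase: sample hidden units given the data vector v0
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # One Gibbs step back to the visible layer and up again (no equilibrium needed)
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)
    # CD gradient: data statistics minus one-step reconstruction statistics
    W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
    b += lr * (v0 - v1)
    c += lr * (ph0 - ph1)
    return W, b, c

# Toy usage: 6 visible units, 3 hidden units, one random binary input vector
W = 0.01 * rng.standard_normal((6, 3))
b, c = np.zeros(6), np.zeros(3)
v0 = (rng.random(6) < 0.5).astype(float)
W, b, c = cd1_update(v0, W, b, c)
```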

Contrastive Divergence Solution

Solution. General solution; stable solution: m = k, the same analytical form as maximum likelihood, regardless of n

Simulation. Each layer: 10 neurons. Input: 10-dimensional Gaussian distribution, mean = 0, variances [0.2, 0.4, ..., 2], covariance = 0. [Figure: input eigenvalues vs. the eigenvalues extracted by ML learning.]
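
As a rough illustration of this setup, the sketch below generates the 10-dimensional Gaussian input described above and computes the eigenvalues of its sample covariance, which are the quantities the trained Gaussian RBM is expected to recover; the sample size is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Input distribution from the slide: 10-dim Gaussian, mean 0,
# variances 0.2, 0.4, ..., 2.0, zero covariance between components.
variances = np.arange(0.2, 2.01, 0.2)
n_samples = 10_000                      # sample size is an assumption
X = rng.standard_normal((n_samples, 10)) * np.sqrt(variances)

# Eigenvalues of the sample covariance: the values the trained
# Gaussian RBM should reproduce as "extracted eigenvalues".
cov = np.cov(X, rowvar=False)
input_eigenvalues = np.sort(np.linalg.eigvalsh(cov))[::-1]
print(input_eigenvalues)
```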

Bayesian Duality in the Exponential Family. Data x; parameters (higher-order concepts); curved exponential family

RBM: x = (v, h), h = Wv, v = hW

Two Manifolds

Geometry of CDn (contrastive divergence)

Bernoulli-Gaussian RBM and ICA (R. Karakida)

Equilibrium Analysis: Results. Assumptions on the input: s, independent and nonnegative sources; B, an N × N orthogonal matrix (ICA: Independent Component Analysis). If the assumptions hold, ML and CD learning have the following stable solutions. [Figure: CD solutions (mean value; model variance σ) and ML solutions relative to the ICA solution in W-space.]

Simulation. Number of neurons: N = M = 2, σ = 1/2. Sources p(s): uniform distribution. [Figure: sources, mixing, input, CD/ICA solution, output.] Independent sources are extracted in the Bernoulli-Gaussian RBM

Supervised Learning: multilayer perceptron, backprop learning. Singularity!! Natural gradient solves the difficulty

Mathematical Neurons: $y = \varphi\left(\sum_i w_i x_i - h\right)$, where $\varphi(u)$ is the nonlinear activation function

Multilayer Perceptrons: $y = f(x, \theta) = \sum_{i=1}^{m} v_i \,\varphi(w_i \cdot x)$, with input $x = (x_1, x_2, \ldots, x_n)$ and parameters $\theta = (w_1, \ldots, w_m; v_1, \ldots, v_m)$
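
A small NumPy sketch of the mathematical neuron and the one-hidden-layer perceptron defined above; tanh is used for the activation $\varphi$ as an assumed choice, since the slides leave it generic.

```python
import numpy as np

def phi(u):
    # Nonlinear activation; tanh is an assumption, the slides leave phi generic.
    return np.tanh(u)

def neuron(x, w, h):
    """Mathematical neuron: y = phi(w . x - h), with threshold h."""
    return phi(np.dot(w, x) - h)

def mlp(x, W, v):
    """One-hidden-layer perceptron: f(x, theta) = sum_i v_i * phi(w_i . x)."""
    return v @ phi(W @ x)

# Toy usage: n = 3 inputs, m = 4 hidden units
rng = np.random.default_rng(0)
x = rng.standard_normal(3)
W = rng.standard_normal((4, 3))   # rows are w_1, ..., w_m
v = rng.standard_normal(4)
y = mlp(x, W, v)
```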

Multilayer perceptron neuromanifold: the space $S$ of functions $y = f(x, \theta) = \sum_i v_i\,\varphi(w_i \cdot x)$, with coordinates $\theta = (v_1, \ldots, v_m; w_1, \ldots, w_m)$

Backpropagation: gradient learning. Training examples $(x_1, y_1), \ldots, (x_t, y_t)$; loss $l(y, x; \theta) = \tfrac{1}{2}\,\|y - f(x, \theta)\|^2 = -\log p(y, x; \theta)$; update $\theta_{t+1} = \theta_t - \eta_t\, \nabla_\theta\, l(y_t, x_t; \theta_t)$, with $f(x, \theta) = \sum_i v_i\,\varphi(w_i \cdot x)$
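
A hedged sketch of one stochastic-gradient (backprop) step on the squared loss for this one-hidden-layer model, with the gradients written out explicitly; the learning rate is illustrative.

```python
import numpy as np

def phi(u):
    return np.tanh(u)

def dphi(u):
    return 1.0 - np.tanh(u) ** 2

def sgd_step(x, y, W, v, lr=0.05):
    """One stochastic-gradient step on l = 0.5 * (y - f(x))^2 for f = v . phi(W x)."""
    a = W @ x                 # pre-activations of the hidden units
    z = phi(a)                # hidden-unit outputs
    f = v @ z                 # network output
    err = f - y               # dl/df
    grad_v = err * z                          # dl/dv_i = err * phi(w_i . x)
    grad_W = err * np.outer(v * dphi(a), x)   # dl/dw_i = err * v_i * phi'(w_i . x) * x
    v -= lr * grad_v
    W -= lr * grad_W
    return W, v

# Toy usage on one training pair (x_t, y_t)
rng = np.random.default_rng(0)
x_t, y_t = rng.standard_normal(3), 0.7
W = rng.standard_normal((4, 3))
v = rng.standard_normal(4)
W, v = sgd_step(x_t, y_t, W, v)
```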

Flaws of MLP: slow convergence (plateaus in the error curve); local minima. Competing approaches: boosting and bagging; SVM

Parameter Space vs Function Space

Singularity of MLP example

Geometry of a singular model: $y = v\,\varphi(w \cdot x) + n$, singular where $v = 0$ or $w = 0$
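
A short derivation, filling in a step the slide only asserts, of why this model is singular at $v = 0$: the sensitivity of the output to $w$ vanishes there, so the Fisher information loses rank.

```latex
% For y = v*phi(w . x) + n with Gaussian noise n, the output's sensitivity to w is
\[
  \frac{\partial}{\partial w}\, v\,\varphi(w \cdot x) \;=\; v\,\varphi'(w \cdot x)\, x ,
\]
% which vanishes for every x when v = 0, so the Fisher information matrix
\[
  G(\theta) \;=\; \mathbb{E}\!\left[ \nabla_\theta \log p(y \mid x; \theta)\,
                                     \nabla_\theta \log p(y \mid x; \theta)^{\top} \right]
\]
% loses rank on the set v = 0 (and similarly on w = 0): the model is singular there.
```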

singularities

Gaussian mixture: $p(x; v, w_1, w_2) = v\,\psi(x - w_1) + (1 - v)\,\psi(x - w_2)$, where $\psi(x) \propto \exp(-x^2/2)$; singular when $w_1 = w_2$ or $v(1 - v) = 0$

Steepest Direction: Natural Gradient. Ordinary gradient: $\nabla l(\theta) = \left(\frac{\partial l}{\partial \theta_1}, \ldots, \frac{\partial l}{\partial \theta_n}\right)$. Natural gradient: $\tilde{\nabla} l = G^{-1}(\theta)\,\nabla l$, with the Riemannian metric $|d\theta|^2 = \sum_{ij} g_{ij}\, d\theta_i\, d\theta_j = d\theta^{\top} G\, d\theta$. Learning rule: $\theta_{t+1} = \theta_t - \eta_t\, \tilde{\nabla} l(x_t, y_t; \theta_t)$

Natural Gradient: the direction $d\theta$ that maximizes $dl = \nabla l \cdot d\theta$ under a fixed step length $|d\theta|^2$ is $\tilde{\nabla} l = G^{-1}\,\nabla l$; learning rule $\theta_{t+1} = \theta_t - \eta_t\, \tilde{\nabla} l(x_t, y_t; \theta_t)$
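
A minimal NumPy sketch of the natural-gradient update $\theta \leftarrow \theta - \eta\, G^{-1}\nabla l$; here the Fisher matrix $G$ is estimated from stored per-example score vectors and damped for invertibility, both of which are implementation assumptions rather than part of the slides.

```python
import numpy as np

def natural_gradient_step(theta, grad, scores, lr=0.1, damping=1e-3):
    """Natural-gradient update: theta <- theta - lr * G^{-1} grad.

    scores has shape (T, d): per-example gradients of the log-likelihood,
    from which the Fisher matrix is estimated as G ~ (1/T) * scores^T scores.
    """
    d = theta.size
    G = scores.T @ scores / scores.shape[0] + damping * np.eye(d)
    return theta - lr * np.linalg.solve(G, grad)

# Toy usage with d = 5 parameters and T = 100 stored score vectors
rng = np.random.default_rng(0)
theta = rng.standard_normal(5)
scores = rng.standard_normal((100, 5))   # placeholder scores, for illustration only
grad = scores.mean(axis=0)               # ordinary gradient estimate
theta = natural_gradient_step(theta, grad, scores)
```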

Adaptive Natural Gradient

Learning, Estimation, and Model Selection. Generalization error: $E_{\mathrm{gen}} = D\!\left[p_0(y \mid x) : p(y \mid x; \hat{\theta})\right]$; training error: $E_{\mathrm{train}} = D\!\left[p_{\mathrm{emp}}(y \mid x) : p(y \mid x; \hat{\theta})\right]$. Asymptotically, $\langle E_{\mathrm{gen}}\rangle \approx \frac{d}{2n}$ and $\langle E_{\mathrm{gen}}\rangle - \langle E_{\mathrm{train}}\rangle \approx \frac{d}{n}$, where $d$ is the number of parameters (dimension) and $n$ the number of examples
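
A small worked example of these asymptotic formulas; the specific values of $d$ and $n$ are illustrative assumptions, not from the slides.

```latex
% Assumed illustrative numbers: d = 100 parameters, n = 10000 examples.
\[
  \langle E_{\mathrm{gen}} \rangle \approx \frac{d}{2n} = \frac{100}{20000} = 0.005, \qquad
  \langle E_{\mathrm{gen}} \rangle - \langle E_{\mathrm{train}} \rangle \approx \frac{d}{n} = 0.01 .
\]
```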

Coordinate Transformation: $u = w_2 - w_1$; $v = v_1 + v_2$; $w = \frac{v_1 w_1 + v_2 w_2}{v}$; $z = \frac{v_2 - v_1}{v}$

Singular lines in the parameter space

Learning Trajectory near the singularity

Milnor attractor

Dynamic vector fields: Redundant case

Dynamics of Learning: Trajectories (learning equations in the $(u, v, z)$ coordinates near the singularity)

Dynamic vector fields: General case (the $|z| < 1$ part is stable)

Dynamic vector fields: General case (the $|z| > 1$ part is stable)

Fig. 2: trajectories