
PCA by neurons

Hebb rule 1949 book: 'The Organization of Behavior', a theory about the neural bases of learning. Learning takes place in synapses: synapses get modified, and they get stronger when the pre- and postsynaptic cells fire together. "When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased." "Cells that fire together, wire together."

Hebb Rule (simplified linear neuron) (Figure: linear neuron with input rates x and output rate v.) The neuron performs v = wᵀx. Hebb rule: Δw = αvx. Both w and x can have negative values.
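A minimal sketch of this update in NumPy (illustrative only; the dimensionality, learning rate, and random input below are assumptions, not values from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

alpha = 0.01                   # learning rate (assumed value)
w = rng.standard_normal(5)     # weight vector of the linear neuron
x = rng.standard_normal(5)     # one input pattern (rates; may be negative here)

v = w @ x                      # linear response: v = w^T x
w = w + alpha * v * x          # Hebb rule: Δw = α v x
```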

Stability The neuron performs v = wᵀx, Hebb rule: Δw = αvx. What will happen to the weights over a long time? Use the differential-equation form of Hebb: (1/τ) dw/dt = αvx (τ is taken as 1). Then d/dt ||w||² = 2wᵀ dw/dt = 2wᵀαvx, and since wᵀx = v, this equals 2αv². Therefore d/dt ||w||² = 2αv². The derivative is always positive, therefore w will grow in size over time.
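A quick numerical illustration of this growth (a sketch under assumed random Gaussian inputs and a fixed learning rate):

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = 0.01
w = rng.standard_normal(5)
print(np.linalg.norm(w))       # initial weight norm, roughly 1-3

# Repeated Hebbian updates on random inputs: ||w|| keeps growing without bound.
for t in range(2000):
    x = rng.standard_normal(5)
    v = w @ x
    w += alpha * v * x

print(np.linalg.norm(w))       # orders of magnitude larger than the initial norm
```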

Oja's rule and normalization Length normalization: w ← (w + αvx) / ||w + αvx||. With a Taylor expansion to first order in α: w(t+1) = w(t) + αv(x − vw) (Oja's rule). Oja ≈ normalized Hebb. Similarity to Hebb: w(t+1) = w(t) + αvx′ with x′ = (x − vw). Feedback, or forgetting, term: −αv²w.
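Spelling out that expansion step (my own fill-in of the algebra, assuming ||w(t)|| = 1 and dropping terms of order α²):

```latex
\begin{aligned}
\|w + \alpha v x\|^{2} &= \|w\|^{2} + 2\alpha v\, w^{\top}x + O(\alpha^{2})
                        = 1 + 2\alpha v^{2} + O(\alpha^{2}) \\
\frac{1}{\|w + \alpha v x\|} &\approx 1 - \alpha v^{2} \\
\frac{w + \alpha v x}{\|w + \alpha v x\|}
  &\approx (w + \alpha v x)\,(1 - \alpha v^{2})
   \approx w + \alpha v\,(x - v w)
\end{aligned}
```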

Erkki Oja Oja E. (1982) A simplified neuron model as a principal component analyzer. Journal of Mathematical Biology, 15:267-273.

Oja rule: effect on stability We used above: d/dt ||w||² = 2wᵀ dw/dt. Put in the new dw/dt from the Oja rule, αv(x − vw): d/dt ||w||² = 2αwᵀv(x − vw) = (as before, wᵀx = v) = 2αv²(1 − ||w||²). Instead of the 2αv² we had before. Steady state is when ||w||² = 1.
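A quick numerical check of this fixed point (a sketch; the isotropic Gaussian inputs and the learning rate are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
alpha = 0.005
w = 2.0 * rng.standard_normal(5)   # start well away from unit length

for t in range(20000):
    x = rng.standard_normal(5)
    v = w @ x
    w += alpha * v * (x - v * w)   # Oja's rule

print(np.linalg.norm(w))           # approximately 1
```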

Comment: Neuronal Normalization Normalization as a canonical neural computation (Carandini & Heeger 2012). It uses a general form (equation shown on the slide); different systems have somewhat different specific forms. For contrast normalization, the Cᵢ are the input neurons, local contrast elements.
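For reference, the canonical normalization equation of Carandini & Heeger (2012) is usually written in roughly the following form (reproduced here from memory of that paper, so treat the exact symbols as an approximation of what the slide showed): the response Rⱼ of neuron j is its driving input Dⱼ raised to a power n, divided by a normalization pool plus a constant σ.

```latex
R_{j} \;=\; \gamma \,\frac{D_{j}^{\,n}}{\sigma^{n} + \sum_{k} D_{k}^{\,n}}
```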

Summary Hebb rule: w(t+1) = w(t) + αvx. Normalization: w ← (w + αvx) / ||w + αvx||. Oja rule: w ← w + αv(x − vw).

Summary For the Hebb rule: d/dt ||w||² = 2αv² (growing). For the Oja rule: d/dt ||w||² = 2αv²(1 − ||w||²) (stable for ||w|| = 1).

Convergence The exact dynamics of the Oja rule were solved by Wyatt and Elfadel (1995). It shows that w → u₁, the first eigenvector of XᵀX. Here we give a qualitative argument, not the full solution.

Final value of w Oja rule: Δw = α(xv − v²w). With v = xᵀw = wᵀx: Δw = α(xxᵀw − (wᵀxxᵀw)w). Averaging over inputs x: ⟨Δw⟩ = α(Cw − (wᵀCw)w) = 0 (0 at steady state). wᵀCw is a scalar, λ: Cw − λw = 0. At convergence (assuming convergence), w is an eigenvector of C.

Weight will be normalized: Also at convergence, we defined wᵀCw as a scalar λ, so λ = wᵀCw = wᵀλw = λ||w||², hence ||w||² = 1. The Oja rule results in a final weight with length normalized to 1.

It will in fact be the largest eigenvector. Without normalization, each dimension grows exponentially with its λᵢ; with normalization, only the largest λᵢ survives. If there is more than one eigenvector with the largest eigenvalue, w will converge to a combination of them that depends on the starting conditions. Following Oja's rule, w will converge to the leading eigenvector of the data matrix XXᵀ. For full convergence, the learning rate α has to decrease over time; a typical decreasing sequence is α(t) = 1/t.
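A small simulation comparing the Oja weight with the leading eigenvector of the sample covariance (a sketch; the anisotropic Gaussian data and the constant learning rate are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

# Anisotropic Gaussian data: the first axis has much larger variance than the rest.
scales = np.array([3.0, 1.0, 0.5, 0.5, 0.2])
X = rng.standard_normal((10000, 5)) * scales

alpha = 0.001
w = rng.standard_normal(5)
for x in X:                        # one pass of online Oja updates
    v = w @ x
    w += alpha * v * (x - v * w)

# Leading eigenvector of the sample covariance matrix.
C = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)
u1 = eigvecs[:, -1]                # eigenvector with the largest eigenvalue

print(abs(w @ u1))                 # close to 1: w aligns (up to sign) with u1
```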

Full PCA by Neural Net (Figure: network extracting the first principal component.)

Procedure Use Oja's rule to find the first principal component. Project the data orthogonal to the first principal component. Use Oja's rule on the projected data to find the next major component. Repeat the above for m ≤ p (m = desired components; p = input space dimensionality). How to find the projection onto the orthogonal direction? Deflation method: subtract the principal component from the input (see the sketch below).
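A sketch of the deflation procedure, reusing the Oja update from above (an illustrative implementation; the function names, learning rate, and epoch count are my own choices, not from the slides):

```python
import numpy as np

def oja_component(X, alpha=0.001, epochs=5, seed=0):
    """Estimate one principal direction of X (samples in rows) with Oja's rule."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(X.shape[1])
    for _ in range(epochs):
        for x in X:
            v = w @ x
            w += alpha * v * (x - v * w)
    return w / np.linalg.norm(w)

def pca_by_deflation(X, m, alpha=0.001):
    """Extract m principal directions: find one component, subtract it, repeat."""
    X = X - X.mean(axis=0)
    components = []
    for _ in range(m):
        w = oja_component(X, alpha=alpha)
        components.append(w)
        X = X - np.outer(X @ w, w)   # deflation: remove this component from every input
    return np.array(components)
```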

Oja rule: Δw = αv(x − vw). Sanger rule: Δwᵢ = αvᵢ(x − Σₖ₌₁ⁱ vₖwₖ). Oja multi-unit rule: Δwᵢ = αvᵢ(x − Σₖ₌₁ᴺ vₖwₖ). In the Sanger rule the sum runs over k up to i (the current unit and all previous units), rather than over all N units. The Sanger rule was shown to converge; the Oja multi-unit network converges in simulations.
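A minimal sketch of one Sanger-rule step for a layer of units (illustrative; the matrix layout and learning rate are my assumptions):

```python
import numpy as np

def sanger_update(W, x, alpha=0.001):
    """One Sanger-rule step. W has shape (m, p): row i is the weight vector of unit i."""
    v = W @ x                              # outputs v_i = w_i^T x
    for i in range(W.shape[0]):
        recon = v[: i + 1] @ W[: i + 1]    # sum over k = 1..i of v_k w_k
        W[i] += alpha * v[i] * (x - recon)
    return W
```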

Connections in Sanger Network Δwⱼ = αvⱼ(x − Σₖ₌₁ʲ vₖwₖ)

PCA by Neural Network Models: The Oja rule extracts, online, the first principal component of the data. Extensions of the network (such as the Sanger rule) can extract the first m principal components of the data.