Hebbian Learning II Robert Jacobs Department of Brain & Cognitive Sciences University of Rochester July 20, 2017
Goals
Teach about one-half of an undergraduate course on Linear Algebra
Understand when supervised Hebbian learning works perfectly (and when it does not)
Pattern completion: supervised Hebbian learning finds a weight matrix such that the input vectors are the eigenvectors of this weight matrix
Vector (age, height, weight):
Joe = (37, 72, 175)
Mary = (10, 30, 61)
Vectors have both a length (magnitude) and a direction
Graphical representation for the vector Mary:
Multiplication of a Vector by a Scalar
$$2 \begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 4 \\ 2 \end{bmatrix}$$
Scalar multiplication corresponds to lengthening or shortening a vector (while leaving it pointing in the same direction)
Addition of Vectors
$$\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} + \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix} = \begin{bmatrix} 3 \\ 3 \\ 4 \end{bmatrix}$$
Linear Combination of Vectors
$$\vec{u} = c_1 \vec{v}_1 + c_2 \vec{v}_2$$
The set of all linear combinations of the $\{\vec{v}_i\}$ is called the set spanned by the $\{\vec{v}_i\}$
The three vectors
$$\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$
span all of three-dimensional space, since any vector $\vec{u} = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$ can be written as a linear combination:
$$\vec{u} = a \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + b \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} + c \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$
In general, n vectors suffice to span n-dimensional space
Linear Independence
If none of the vectors in a set can be written as a linear combination of the others, then the set is called linearly independent.
An n-dimensional space is the set of vectors spanned by a set of n linearly independent vectors. The n vectors are referred to as a basis for the space.
The vectors $\begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 2 \end{bmatrix}$ are collinear and, thus, are linearly dependent. They span only a 1-dimensional space.
The vectors $\begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \end{bmatrix}$ are linearly independent (and, thus, span a 2-dimensional space).
The vectors $\begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \end{bmatrix}, \begin{bmatrix} -1 \\ 3 \end{bmatrix}$ are linearly dependent ($\vec{v}_3 = 7\vec{v}_1 - 4\vec{v}_2$).
The vectors $\begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix}, \begin{bmatrix} 3 \\ 2 \\ 0 \end{bmatrix}, \begin{bmatrix} 9 \\ 10 \\ 0 \end{bmatrix}$ are linearly dependent (no vector with a non-zero third component can be generated from this set).
For a given n-dimensional space, there are an infinite number of bases for that space.
Every vector has a different representation (i.e., set of coordinates) in each basis.
Which basis is best?
Inner (Dot) Product of Two Vectors
$$\vec{v} = \begin{bmatrix} 3 \\ -1 \\ 2 \end{bmatrix}, \quad \vec{w} = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}$$
$$\vec{v} \cdot \vec{w} = \sum_i v_i w_i = (3 \cdot 1) + (-1 \cdot 2) + (2 \cdot 1) = 3$$
Length of a Vector
The length of a vector (denoted $\|\vec{v}\|$) is the square root of the inner product of the vector with itself.
Let $\vec{v} = \begin{bmatrix} 3 \\ -1 \\ 2 \end{bmatrix}$. Then $\|\vec{v}\| = (\vec{v} \cdot \vec{v})^{1/2} = \sqrt{3^2 + (-1)^2 + 2^2} = \sqrt{14}$
This follows from the Pythagorean Theorem.
Angle Between Two Vectors
$$\cos\theta = \frac{\vec{v} \cdot \vec{w}}{\|\vec{v}\| \, \|\vec{w}\|} = \frac{\sum_i v_i w_i}{\left(\sum_i v_i^2\right)^{1/2} \left(\sum_i w_i^2\right)^{1/2}}$$
Roughly a measure of similarity between two vectors:
If v and w are random variables (so $\vec{v}$ and $\vec{w}$ are vectors of values for these variables) with zero mean, then this formula is their correlation.
If the inner product is zero, then $\cos\theta = 0$ (meaning that the two vectors are orthogonal).
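A quick numerical check of these formulas, written as a minimal NumPy sketch. The vectors are the ones from the inner-product example above, and the zero-mean step illustrates the claimed connection to correlation:

```python
import numpy as np

# Inner product, length, and cosine of the angle between two vectors.
v = np.array([3.0, -1.0, 2.0])
w = np.array([1.0, 2.0, 1.0])

inner = v @ w                                   # sum_i v_i w_i
length_v = np.sqrt(v @ v)                       # ||v||
cos_theta = inner / (np.linalg.norm(v) * np.linalg.norm(w))
print(inner, length_v, cos_theta)

# For zero-mean vectors, cos(theta) equals the Pearson correlation.
v0, w0 = v - v.mean(), w - w.mean()
cos0 = (v0 @ w0) / (np.linalg.norm(v0) * np.linalg.norm(w0))
r = np.corrcoef(v, w)[0, 1]
print(np.isclose(cos0, r))                      # True
```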
Projection of One Vector Onto Another Vector
Let x be the projection of $\vec{v}$ onto $\vec{w}$ (a number, not a vector):
$$x = \|\vec{v}\| \cos\theta = \frac{\vec{v} \cdot \vec{w}}{\|\vec{w}\|}$$
If $\|\vec{w}\| = 1$, then $x = \vec{v} \cdot \vec{w}$
Example: Linear Neural Network with One Output Unit
[Diagram: inputs $x_1, x_2, x_3$ connect to output unit y through weights $w_1, w_2, w_3$]
The output ($y = w_1 x_1 + w_2 x_2 + w_3 x_3 = \vec{w} \cdot \vec{x}$) gives an indication of how close the input $\vec{x}$ is to the weight vector $\vec{w}$:
If y > 0, then $\vec{x}$ is similar to $\vec{w}$
If y = 0, then $\vec{x}$ is orthogonal to $\vec{w}$
If y < 0, then $\vec{x}$ is dissimilar to $\vec{w}$
Matrix
An array of numbers. For example:
$$W = \begin{bmatrix} 3 & 4 & 5 \\ 1 & 0 & 1 \end{bmatrix}$$
Multiplication of a Matrix and a Vector
$\vec{u} = W \vec{v}$: Matrix W maps from one space of vectors ($\vec{v}$) to a new space of vectors ($\vec{u}$)
In general, vectors $\vec{v}$ and $\vec{u}$ may have different dimensionalities
Multiplication of a Matrix and a Vector
$$W = \begin{bmatrix} 3 & 4 & 5 \\ 1 & 0 & 1 \end{bmatrix}, \quad \vec{v} = \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}$$
$$\vec{u} = W \vec{v} = \begin{bmatrix} 3 & 4 & 5 \\ 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix} = \begin{bmatrix} (3 \cdot 1) + (4 \cdot 0) + (5 \cdot 2) \\ (1 \cdot 1) + (0 \cdot 0) + (1 \cdot 2) \end{bmatrix} = \begin{bmatrix} 13 \\ 3 \end{bmatrix}$$
Multiplication of a Matrix and a Vector
The following are equivalent:
Form the inner product of each row of the matrix with the vector
$\vec{u} = W \vec{v}$ is a linear combination of the columns of W. The coefficients are the components of $\vec{v}$
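A minimal sketch showing the two equivalent views, using the example matrix and vector from the slide above:

```python
import numpy as np

W = np.array([[3, 4, 5],
              [1, 0, 1]])
v = np.array([1, 0, 2])

# View 1: inner product of each row of W with v
u_rows = np.array([W[0] @ v, W[1] @ v])

# View 2: linear combination of the columns of W, weighted by v's components
u_cols = v[0] * W[:, 0] + v[1] * W[:, 1] + v[2] * W[:, 2]

print(u_rows, u_cols, W @ v)    # all three are [13  3]
```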
Neural Network with Multiple Input and Multiple Output Units
[Diagram: inputs $x_1, x_2, x_3$ connect to output units $y_1, y_2$ through weights $w_{11}, \ldots, w_{23}$]
$$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$$
$$\vec{y} = W \vec{x}$$
Linearity
A function is said to be linear if:
$$f(cx) = c f(x)$$
$$f(x_1 + x_2) = f(x_1) + f(x_2)$$
Implication: If we know how a system responds to the basis of a space, then we can easily compute how it responds to all vectors in that space.
Let $\{\vec{v}_i\}$ be a basis for a space, and let $\vec{v}$ be an arbitrary vector in this space. Then:
$$W \vec{v} = W (c_1 \vec{v}_1 + c_2 \vec{v}_2 + \cdots + c_n \vec{v}_n) = c_1 W \vec{v}_1 + c_2 W \vec{v}_2 + \cdots + c_n W \vec{v}_n$$
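A minimal sketch of this implication, with illustrative values not taken from the slides (a random linear system, the standard basis, and an arbitrary coefficient vector):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))              # any linear system
basis = np.eye(3)                         # standard basis vectors v_1, v_2, v_3
v = np.array([2.0, -1.0, 0.5])            # coefficients c_i in this basis

# Stored responses to each basis vector
responses = [W @ basis[:, i] for i in range(3)]

# Response to v built only from the stored basis responses...
u_from_basis = sum(c * r for c, r in zip(v, responses))
# ...matches applying W directly
print(np.allclose(u_from_basis, W @ v))   # True
```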
Eigenvectors and Eigenvalues
Limit our attention to square matrices (i.e., $\vec{v}$ and $\vec{u}$ have the same dimensionality).
In general, multiplication by a matrix changes both a vector's direction and length.
However, there are some vectors that will change only in length, not direction. For these vectors, multiplication by the matrix is no different than multiplication by a scalar:
$$W \vec{v} = \lambda \vec{v}$$
where $\lambda$ is a scalar. Such vectors are called eigenvectors, and the scalar $\lambda$ is called an eigenvalue.
$$\begin{bmatrix} 4 & -1 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \end{bmatrix} = 2 \begin{bmatrix} 1 \\ 2 \end{bmatrix}$$
Each vector that is collinear with an eigenvector is itself an eigenvector:
$$\begin{bmatrix} 4 & -1 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} 2 \\ 4 \end{bmatrix} = 2 \begin{bmatrix} 2 \\ 4 \end{bmatrix}$$
We will reserve the term eigenvector only for vectors of length 1.
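A quick NumPy check of this example (with the matrix entries as written above):

```python
import numpy as np

W = np.array([[4.0, -1.0],
              [2.0,  1.0]])
v = np.array([1.0, 2.0])

print(W @ v)          # [2. 4.] == 2 * v, so v is an eigenvector with eigenvalue 2
print(W @ (2 * v))    # [4. 8.] == 2 * (2 * v); collinear vectors are eigenvectors too

vals, vecs = np.linalg.eig(W)
print(vals)           # eigenvalues of W are 3 and 2 (in some order)
```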
An n × n matrix can have up to (but no more than) n distinct eigenvalues.
If it has n distinct eigenvalues, then the n associated eigenvectors are linearly independent.
Thus, these eigenvectors form a basis.
Let $\{\vec{v}_i\}$ be linearly independent eigenvectors of matrix W, and let $\vec{v}$ be an arbitrary vector. Then:
$$\vec{u} = W \vec{v} = W (c_1 \vec{v}_1 + \cdots + c_n \vec{v}_n) = c_1 W \vec{v}_1 + \cdots + c_n W \vec{v}_n = c_1 \lambda_1 \vec{v}_1 + \cdots + c_n \lambda_n \vec{v}_n$$
There are no matrices in this last equation. Just a simple linear combination of eigenvectors.
Eigenvectors and eigenvalues reveal the directions in which matrix multiplication stretches and shrinks a space (i.e., they reveal which input vectors a system gives small and large responses to).
[Diagram: $\vec{v} \rightarrow W \rightarrow \vec{u}$]
Power method for finding the eigenvector with the largest eigenvalue of a matrix
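The slide only names the power method; below is a minimal sketch of one common version (repeated multiplication by W followed by normalization), under the usual assumption that the matrix has a single dominant eigenvalue. The function name and iteration count are illustrative:

```python
import numpy as np

def power_method(W, num_iters=100):
    # Start from a random vector, repeatedly multiply by W, and renormalize.
    v = np.random.default_rng(0).normal(size=W.shape[0])
    for _ in range(num_iters):
        v = W @ v
        v = v / np.linalg.norm(v)        # keep the vector at unit length
    eigenvalue = v @ (W @ v)             # Rayleigh-quotient estimate of lambda
    return eigenvalue, v

W = np.array([[4.0, -1.0],
              [2.0,  1.0]])
lam, v = power_method(W)
print(lam)    # approaches 3, the largest eigenvalue of this example matrix
print(v)      # approaches the corresponding unit-length eigenvector
```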
Transpose
Turn a column vector into a row vector. For example, we can re-write an inner product as follows:
$$\vec{w} \cdot \vec{v} = \vec{w}^T \vec{v} = [w_1 \; w_2 \; w_3] \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} = (w_1 v_1) + (w_2 v_2) + (w_3 v_3)$$
Outer Product
$$\vec{w} \vec{v}^T = \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix} [v_1 \; v_2 \; v_3] = \begin{bmatrix} w_1 v_1 & w_1 v_2 & w_1 v_3 \\ w_2 v_1 & w_2 v_2 & w_2 v_3 \\ w_3 v_1 & w_3 v_2 & w_3 v_3 \end{bmatrix}$$
If w and v are random variables (the components of $\vec{w}$ and $\vec{v}$ are values of these variables) with zero means, and if $\vec{w} = \vec{v}$, then this is a covariance matrix
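A minimal sketch of the outer product with illustrative values; the last few lines go slightly beyond the slide by averaging outer products over many zero-mean samples to obtain a sample covariance matrix:

```python
import numpy as np

w = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

print(np.outer(w, v))     # 3x3 matrix with entries w_i * v_j
print(w @ v)              # inner product: a single number

# Averaging outer products of zero-mean data vectors with themselves
# yields a (sample) covariance matrix.
X = np.random.default_rng(0).normal(size=(1000, 3))   # 1000 samples, 3 variables
X = X - X.mean(axis=0)                                 # make the data zero-mean
cov = sum(np.outer(x, x) for x in X) / len(X)
print(np.allclose(cov, np.cov(X.T, bias=True)))        # True
```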
Using Linear Algebra to Study Supervised Hebbian Learning
Neural Network With One Output Unit: One Input-Output Pattern
One input-output pattern: $\vec{x} \rightarrow y$
Assume $\|\vec{x}\| = 1$
If we choose $\vec{w} = \vec{x}$, then the output is $\vec{w}^T \vec{x} = \vec{x}^T \vec{x} = 1$
But we want the output to equal y. So let $\vec{w} = y \vec{x}$:
$$\vec{w}^T \vec{x} = (y \vec{x})^T \vec{x} = y (\vec{x}^T \vec{x}) = y$$
The problem of finding $\vec{w}$ corresponds to finding a vector whose projection onto $\vec{x}$ is y.
There are an infinite number of solutions. On the previous slide, we made the simple choice of the vector that points in the same direction as $\vec{x}$.
Neural Network With Multiple Output Units: One Input-Output Pattern
One input-output pattern: $\vec{x} \rightarrow \vec{y}$
Assume $\|\vec{x}\| = 1$
Let $W = \vec{y} \vec{x}^T$. Then:
$$W \vec{x} = (\vec{y} \vec{x}^T) \vec{x} = \vec{y} (\vec{x}^T \vec{x}) = \vec{y}$$
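A minimal sketch of this one-pattern case, with illustrative vectors: store a pattern with the outer-product (Hebbian) rule $W = \vec{y}\vec{x}^T$ and read it back.

```python
import numpy as np

x = np.array([0.6, 0.8, 0.0])      # unit-length input (||x|| = 1)
y = np.array([2.0, -1.0])          # desired output

W = np.outer(y, x)                 # Hebbian weights: W = y x^T
print(W @ x)                       # [ 2. -1.] == y, recovered exactly
```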
Example With Multiple Input-Output Patterns
$$\vec{x}_1 = \begin{bmatrix} 0.577 \\ 0.577 \\ 0.577 \end{bmatrix}, \quad \vec{x}_2 = \begin{bmatrix} -0.816 \\ 0.408 \\ 0.408 \end{bmatrix}, \quad \vec{x}_3 = \begin{bmatrix} 0.0 \\ -0.707 \\ 0.707 \end{bmatrix}$$
$$y_1 = 3, \quad y_2 = 2, \quad y_3 = 4$$
Based on the 1st pattern, $\vec{w}_1 = y_1 \vec{x}_1 = \begin{bmatrix} 1.731 \\ 1.731 \\ 1.731 \end{bmatrix}$
Based on the 2nd pattern, $\vec{w}_2 = y_2 \vec{x}_2 = \begin{bmatrix} -1.632 \\ 0.816 \\ 0.816 \end{bmatrix}$
Based on the 3rd pattern, $\vec{w}_3 = y_3 \vec{x}_3 = \begin{bmatrix} 0.0 \\ -2.828 \\ 2.828 \end{bmatrix}$
Next, set the overall weight vector (the single row of the weight matrix):
$$\vec{w} = \vec{w}_1 + \vec{w}_2 + \vec{w}_3 = \begin{bmatrix} 0.099 \\ -0.281 \\ 5.375 \end{bmatrix}$$
Verify:
$$\vec{w} \cdot \vec{x}_1 = y_1, \quad \vec{w} \cdot \vec{x}_2 = y_2, \quad \vec{w} \cdot \vec{x}_3 = y_3$$
Q: Why does this work?
A: If the input vectors are orthogonal, then the Hebb rule works perfectly (!!!)
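A numerical check of this example (a minimal sketch; the signs of the input vectors are as reconstructed above):

```python
import numpy as np

xs = [np.array([0.577,  0.577, 0.577]),
      np.array([-0.816, 0.408, 0.408]),
      np.array([0.0,   -0.707, 0.707])]
ys = [3.0, 2.0, 4.0]

# Hebbian weight vector: sum over patterns of y_i * x_i
w = sum(y * x for x, y in zip(xs, ys))
print(w)                           # approximately [0.099, -0.281, 5.375]
for x, y in zip(xs, ys):
    print(w @ x, y)                # each output matches its target (up to rounding)
```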
Hebb Rule Works Perfectly When Inputs Are Orthogonal
Assume input vectors are unit length and mutually orthogonal:
$$\vec{x}_i^T \vec{x}_j = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{else} \end{cases}$$
Set $W_i = \vec{y}_i \vec{x}_i^T$
Set $W = W_1 + \cdots + W_n$
For all i:
$$\vec{y} = W \vec{x}_i = (W_1 + \cdots + W_n) \vec{x}_i = (\vec{y}_1 \vec{x}_1^T + \cdots + \vec{y}_n \vec{x}_n^T) \vec{x}_i = \vec{y}_1 \vec{x}_1^T \vec{x}_i + \cdots + \vec{y}_n \vec{x}_n^T \vec{x}_i = \vec{0} + \cdots + \vec{y}_i + \cdots + \vec{0} = \vec{y}_i$$
Caveat
If the input vectors are not orthogonal, the Hebb rule is not guaranteed to work perfectly.
If the input vectors are linearly independent, the LMS rule works perfectly.
Example: Hebb Fails

Input              Output
( 1, -1,  1, -1)      1
( 1,  1,  1,  1)      1
( 1,  1,  1, -1)     -1
( 1, -1, -1,  1)     -1

Hebb rule: the overall weight changes for w1, w2, and w4 are 0 (i.e., the Hebb rule does not work).
There are successful weights: w1 = -1, w2 = -1, w3 = 2, and w4 = 1 (but the Hebb rule won't find these values).
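A minimal sketch of this example: compute the Hebbian weights and compare them with the weights obtained by solving the linear system exactly (the inputs are linearly independent, so an exact solution exists, as the caveat slide notes):

```python
import numpy as np

X = np.array([[1, -1,  1, -1],
              [1,  1,  1,  1],
              [1,  1,  1, -1],
              [1, -1, -1,  1]], dtype=float)
y = np.array([1, 1, -1, -1], dtype=float)

w_hebb = X.T @ y                   # Hebb rule: sum over patterns of y_p * x_p
print(w_hebb)                      # [0. 0. 2. 0.]: w1, w2, and w4 never change
print(X @ w_hebb)                  # [ 2.  2.  2. -2.]: does not reproduce the targets

w_exact = np.linalg.solve(X, y)    # solve the 4 equations in 4 unknowns exactly
print(w_exact)                     # [-1. -1.  2.  1.]
print(X @ w_exact)                 # [ 1.  1. -1. -1.]: targets reproduced exactly
```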
Hebb Learning and Pattern Completion
Recurrent Network
Associate input vectors with scalar copies of themselves:
$$\vec{y}_i = \lambda_i \vec{x}_i$$
Assume the $\lambda_i$ are distinct.
Assume input vectors are unit length and mutually orthogonal:
$$\vec{x}_i^T \vec{x}_j = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{else} \end{cases}$$
Set $W_i = \vec{y}_i \vec{x}_i^T = \lambda_i \vec{x}_i \vec{x}_i^T$
Set $W = W_1 + \cdots + W_n$
$$W \vec{x}_i = (W_1 + \cdots + W_n) \vec{x}_i = (\lambda_1 \vec{x}_1 \vec{x}_1^T + \cdots + \lambda_n \vec{x}_n \vec{x}_n^T) \vec{x}_i = \lambda_1 \vec{x}_1 \vec{x}_1^T \vec{x}_i + \cdots + \lambda_n \vec{x}_n \vec{x}_n^T \vec{x}_i = \vec{0} + \cdots + \lambda_i \vec{x}_i + \cdots + \vec{0} = \lambda_i \vec{x}_i$$
The Hebb rule creates a weight matrix such that the input vectors are the eigenvectors of this matrix (!!!)
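A minimal sketch of this construction: build W as a sum of $\lambda_i \vec{x}_i \vec{x}_i^T$ over an orthonormal set and confirm that each $\vec{x}_i$ is an eigenvector of W. The orthonormal vectors and eigenvalues here are illustrative, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))   # columns of Q: orthonormal vectors x_i
lambdas = np.array([3.0, 2.0, 4.0, 1.0])       # distinct eigenvalues lambda_i

# Hebbian weight matrix: sum of lambda_i * x_i x_i^T
W = sum(lam * np.outer(Q[:, i], Q[:, i]) for i, lam in enumerate(lambdas))

for i, lam in enumerate(lambdas):
    x = Q[:, i]
    print(np.allclose(W @ x, lam * x))          # True: W x_i = lambda_i x_i
```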