Slide 10. Haykin Chapter 14: Neurodynamics (3rd Ed. Chapter 13)
CPSC 636-600, Instructor: Yoonsuck Choe, Spring 2012

Neural Networks with Temporal Behavior

Inclusion of feedback gives temporal characteristics to neural networks: recurrent networks. There are two ways to add feedback: local feedback and global feedback. Recurrent networks can become unstable or stable. Our main interest is in a recurrent network's stability: neurodynamics. Stability is a property of the whole system: coordination between parts is necessary.

Stability in Nonlinear Dynamical Systems

Lyapunov stability: more on this later. The study of neurodynamics takes two forms: deterministic neurodynamics, expressed as nonlinear differential equations, and stochastic neurodynamics, expressed in terms of stochastic nonlinear differential equations (recurrent networks perturbed by noise).

Preliminaries: Dynamical Systems

A dynamical system is a system whose state varies with time. In a state-space model, the values of the state variables change over time. Example: x_1(t), x_2(t), ..., x_N(t) are state variables that hold different values under the independent variable t. This describes a system of order N, and x(t) is called the state vector. The dynamics of the system is expressed using ordinary differential equations:

    d/dt x_j(t) = F_j(x_j(t)),   j = 1, 2, ..., N,

or, more conveniently,

    d/dt x(t) = F(x(t)).
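The state-space equation above can be integrated numerically. The following is a minimal sketch (not from the slides) using forward-Euler integration of d/dt x = F(x) for an assumed second-order (N = 2) linear system:

```python
import numpy as np

def simulate(F, x0, dt=0.01, steps=1000):
    """Forward-Euler integration of dx/dt = F(x) from initial state x0."""
    x = np.asarray(x0, dtype=float)
    trajectory = [x.copy()]
    for _ in range(steps):
        x = x + dt * F(x)          # Euler step: x(t + dt) ~ x(t) + dt * F(x(t))
        trajectory.append(x.copy())
    return np.array(trajectory)

# Example system (an assumption for illustration): a linear field F(x) = A x
# whose eigenvalues have negative real part, so trajectories spiral inward.
A = np.array([[-1.0, -2.0],
              [ 2.0, -1.0]])
F = lambda x: A @ x

traj = simulate(F, x0=[1.0, 1.0])
print(traj[-1])  # the state has decayed close to the equilibrium at the origin
```

Each row of `traj` is one point on the trajectory (orbit) traversed by the state vector over time.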
Autonomous vs. Non-autonomous Dynamical Systems

Autonomous: F(·) does not explicitly depend on time. Non-autonomous: F(·) explicitly depends on time.

[Figure: a trajectory in (x_1, x_2) state space, with the state marked at t = 0, 1, 2, ...]

F as a Vector Field

Since dx/dt can be seen as velocity, F(x) can be seen as a velocity vector field, or simply a vector field. In a vector field, each point in space (x) is associated with one unique vector (direction and magnitude). In a scalar field, each point has one scalar value.

It is convenient to view the state-space equation dx/dt = F(x) as describing the motion of a point in N-dimensional space (Euclidean or non-Euclidean). Note: t is continuous! The points traversed over time are called the trajectory or the orbit. The tangent vector shows the instantaneous velocity at the initial condition.

Phase Portrait and Vector Field

Red curves show the state (phase) portrait, represented by trajectories from different initial points. The blue arrows in the background show the vector field. Source: http://www.math.ku.edu/~byers/ode/b_cp_lab/pict.html

Conditions for the Solution of the State-Space Equation

A unique solution to the state-space equation exists only under certain conditions, which restrict the form of F(x). For a solution to exist, it is sufficient for F(x) to be continuous in all of its arguments. For uniqueness, it must meet the Lipschitz condition.

Lipschitz condition: Let x and u be a pair of vectors in an open set M in a normed vector space. A vector function F(x) that satisfies

    ||F(x) - F(u)|| <= K ||x - u||

for some constant K is said to be Lipschitz, and K is called the Lipschitz constant for F(x). If the partial derivatives ∂F_i/∂x_j are finite everywhere, F(x) meets the Lipschitz condition.
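The Lipschitz condition can be probed numerically. Below is a rough sketch (an illustration, not part of the slides) that lower-bounds the Lipschitz constant K by sampling random pairs (x, u) and taking the largest ratio ||F(x) - F(u)|| / ||x - u||:

```python
import numpy as np

rng = np.random.default_rng(0)

def lipschitz_estimate(F, dim, n_pairs=1000, box=2.0):
    """Sample random pairs (x, u) in [-box, box]^dim and return the largest
    observed ratio ||F(x) - F(u)|| / ||x - u||, a lower bound on K."""
    best = 0.0
    for _ in range(n_pairs):
        x = rng.uniform(-box, box, dim)
        u = rng.uniform(-box, box, dim)
        best = max(best, np.linalg.norm(F(x) - F(u)) / np.linalg.norm(x - u))
    return best

# For a linear field F(x) = A x, the true Lipschitz constant is the spectral
# norm of A. The rotation field below is an isometry, so every ratio is 1.
A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])
F = lambda x: A @ x
print(lipschitz_estimate(F, dim=2))  # 1.0
```

A field with unbounded derivatives (e.g. F(x) = x^2 on all of R) would show the ratio growing with the sampling box, signaling that no global Lipschitz constant exists.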
Stability of Equilibrium States

A point x̄ ∈ M is said to be an equilibrium state (or singular point) of the system if

    dx/dt |_{x = x̄} = F(x̄) = 0.

How the system behaves near these equilibrium states is of great interest. Near these points, we can approximate the dynamics by linearizing F(x) (using a Taylor expansion) around x̄, i.e., writing x(t) = x̄ + Δx(t):

    F(x) ≈ F(x̄) + A Δx(t) = A Δx(t),

where A is the Jacobian:

    A = ∂F/∂x |_{x = x̄}.

Stability of the Linearized System

In the linearized system, the properties of the Jacobian matrix A determine the behavior near equilibrium points. This is because

    d/dt Δx(t) ≈ A Δx(t).

If A is nonsingular, A^{-1} exists, and the linearization can be used to describe the local behavior near the equilibrium x̄. The eigenvalues of the matrix A characterize different classes of behavior.

Eigenvalues/Eigenvectors

For a square matrix A, if a vector x and a scalar value λ exist so that (A - λI)x = 0, then x is called an eigenvector of A and λ an eigenvalue. Note that the above is simply Ax = λx. An intuitive meaning: x is a direction in which applying the linear transformation A only changes the magnitude of x (by λ) but not its direction. There can be as many as n eigenvector/eigenvalue pairs for an n × n matrix.

Example: Second-Order System

The positive/negative and real/complex character of the eigenvalues of the Jacobian determines the behavior:

- Stable node: real, negative eigenvalues
- Unstable node: real, positive eigenvalues
- Stable focus: complex eigenvalues with negative real part
- Unstable focus: complex eigenvalues with positive real part
- Saddle point: real eigenvalues of opposite signs
- Center: complex eigenvalues with zero real part
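The six cases above can be read directly off the eigenvalues. A small sketch (illustrative, assuming a 2nd-order system and a tolerance for "zero"):

```python
import numpy as np

def classify_equilibrium(A, tol=1e-9):
    """Classify the equilibrium of the linearized system d/dt Δx = A Δx
    (2nd-order case) from the eigenvalues of the Jacobian A."""
    eig = np.linalg.eigvals(A)
    re, im = eig.real, eig.imag
    if np.all(np.abs(im) < tol):                    # real eigenvalues
        if np.all(re < -tol):
            return "stable node"
        if np.all(re > tol):
            return "unstable node"
        return "saddle point"                        # opposite signs
    if np.all(re < -tol):                            # complex eigenvalues
        return "stable focus"
    if np.all(re > tol):
        return "unstable focus"
    return "center"                                  # zero real part

print(classify_equilibrium(np.array([[-1.0, 0.0], [0.0, -2.0]])))  # stable node
print(classify_equilibrium(np.array([[0.0, 1.0], [-1.0, 0.0]])))   # center
print(classify_equilibrium(np.array([[1.0, 0.0], [0.0, -1.0]])))   # saddle point
```

For a nonlinear F, one would first compute the Jacobian at the equilibrium (analytically or by finite differences) and then apply the same classification.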
Definitions of Stability

- Uniformly stable: for an arbitrary ε > 0, there exists a positive δ such that ||x(0) - x̄|| < δ implies ||x(t) - x̄|| < ε for all t > 0.
- Convergent: there exists a positive δ such that ||x(0) - x̄|| < δ implies x(t) → x̄ as t → ∞.
- Asymptotically stable: both stable and convergent.
- Globally asymptotically stable: stable, and all trajectories of the system converge to x̄ as time t approaches infinity.

Lyapunov's Theorems

Theorem 1: The equilibrium state x̄ is stable if in a small neighborhood of x̄ there exists a positive definite function V(x) such that its derivative with respect to time is negative semidefinite in that region.

Theorem 2: The equilibrium state x̄ is asymptotically stable if in a small neighborhood of x̄ there exists a positive definite function V(x) such that its derivative with respect to time is negative definite in that region.

A scalar function V(x) that satisfies these conditions is called a Lyapunov function for the equilibrium state x̄.

Attractors

Dissipative systems are characterized by attracting sets or manifolds of dimensionality lower than that of the embedding space. These are called attractors. Regions of initial conditions of nonzero state-space volume converge to these attractors as time t increases.

Types of Attractors

- Point attractors (shown left in the figure)
- Limit-cycle attractors (shown right)
- Strange (chaotic) attractors (not shown)
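Lyapunov's theorems can be illustrated numerically. The sketch below (an illustration; the system F and the candidate V are my assumptions, not from the slides) checks that V(x) = ||x||^2 has a strictly negative time derivative dV/dt = 2 x · F(x) everywhere near the equilibrium x̄ = 0, which by Theorem 2 implies asymptotic stability:

```python
import numpy as np

def F(x):
    # Assumed example system: dx/dt = -x - x^3 has its only equilibrium at 0.
    return -x - x**3

# dV/dt along a trajectory, for the candidate Lyapunov function V(x) = ||x||^2:
# dV/dt = grad V(x) . dx/dt = 2 x . F(x)
rng = np.random.default_rng(1)
samples = rng.uniform(-1.0, 1.0, size=(1000, 3))   # points near the equilibrium
dV = [2.0 * x @ F(x) for x in samples if np.linalg.norm(x) > 1e-6]
print(max(dV))  # negative: V strictly decreases everywhere except at x̄ itself
```

Here dV/dt = -2(||x||^2 + Σ x_i^4) < 0 for x ≠ 0, so the numerical check agrees with the analytical negative-definiteness required by Theorem 2.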
Neurodynamical Models

We will focus on systems whose state variables are continuous-valued and whose dynamics are expressed as differential equations or difference equations. Properties: large number of degrees of freedom; nonlinearity; dissipative (as opposed to conservative), i.e., an open system; noise.

Manipulation of Attractors as a Recurrent-Network Paradigm

We can identify attractors with computational objects (associative memories, input-output mappers, etc.). In order to do so, we must exercise control over the locations of the attractors in the state space of the system. A learning algorithm manipulates the equations governing the dynamical behavior so that desired locations of attractors are set. One good way to do this is to use the energy-minimization paradigm (e.g., by Hopfield).

Hopfield Model

Based on the McCulloch-Pitts model (neurons with +1 or -1 output). The energy function is defined as

    E = -1/2 Σ_i Σ_j w_ji x_i x_j   (i ≠ j).

Network dynamics will evolve in the direction that minimizes E. The network has N units with full connections among all nodes (no self-feedback).

Discrete Hopfield Model

Given M input patterns, each having the same dimensionality as the network, the patterns can be memorized in attractors of the network. The network implements a content-addressable memory. Starting with an initial pattern, the dynamics will converge toward the attractor of the basin of attraction in which the initial pattern was placed.
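The claim that the dynamics evolve so as to minimize E can be checked directly: with symmetric weights and no self-feedback, updating one unit to the sign of its local field never increases the energy. A minimal sketch (random weights are an assumption for illustration):

```python
import numpy as np

def hopfield_energy(W, x):
    """E = -1/2 Σ_i Σ_j w_ji x_i x_j; W has zero diagonal, so i = j terms vanish."""
    return -0.5 * x @ W @ x

rng = np.random.default_rng(2)
N = 16
W = rng.standard_normal((N, N))
W = (W + W.T) / 2.0           # symmetric weights: W = W^T
np.fill_diagonal(W, 0.0)      # no self-feedback
x = rng.choice([-1.0, 1.0], size=N)

e_before = hopfield_energy(W, x)
j = 0
h = W[j] @ x                  # local field of unit j
x[j] = 1.0 if h >= 0 else -1.0   # asynchronous update: x_j <- sgn(h_j)
e_after = hopfield_energy(W, x)
print(e_after <= e_before)    # True: the update cannot increase E
```

The one-line proof: flipping x_j by Δ changes E by -Δ h_j (using symmetry and the zero diagonal), and the update rule makes Δ h_j ≥ 0.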
Content-Addressable Memory

Map a set of patterns to be memorized, ξ_μ, onto fixed points x_μ in the dynamical system realized by the recurrent network. Encoding: the mapping from ξ_μ to x_μ. Decoding: the reverse mapping from the state space x_μ to ξ_μ.

Hopfield Model: Storage

The learning is similar to Hebbian learning:

    w_ji = (1/N) Σ_{μ=1}^{M} ξ_μ,j ξ_μ,i,

with w_ji = 0 if i = j. (Learning is one-shot.) In matrix form the above becomes

    W = (1/N) Σ_{μ=1}^{M} ξ_μ ξ_μ^T - (M/N) I.

The resulting weight matrix W is symmetric: W = W^T.

Hopfield Model: Activation (Retrieval)

Initialize the network with a probe pattern ξ_probe: x_j(0) = ξ_probe,j. Update the output of each neuron (picking them at random) as

    x_j(n+1) = sgn( Σ_{i=1}^{N} w_ji x_i(n) )

until x reaches a fixed point.

Spurious States

The weight matrix W is symmetric, thus the eigenvalues of W are all real. For a large number of patterns M, the matrix is degenerate, i.e., several eigenvectors can have the same eigenvalue. These eigenvectors form a subspace, and when the associated eigenvalue is 0, it is called a null space. This is due to M being smaller than the number of neurons N. The Hopfield network as a content-addressable memory: the discrete Hopfield network acts as a vector projector (it projects the probe vector onto the subspace spanned by the training patterns), and the underlying dynamics drive the network to converge to one of the corners of the unit hypercube. Spurious states are those corners of the hypercube that do not belong to the training-pattern set.
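The storage and retrieval rules above fit in a few lines of NumPy. The following sketch (pattern count, network size, and noise level are assumptions for illustration) stores M random bipolar patterns and recalls one from a corrupted probe:

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 100, 5                                 # N neurons, M patterns (well below capacity)
patterns = rng.choice([-1.0, 1.0], size=(M, N))

# One-shot Hebbian storage: W = (1/N) Σ ξ_μ ξ_μ^T - (M/N) I.
# Zeroing the diagonal is equivalent to subtracting (M/N) I for ±1 patterns.
W = (patterns.T @ patterns) / N
np.fill_diagonal(W, 0.0)

def retrieve(W, probe, max_sweeps=50):
    """Asynchronous retrieval: update units in random order until a fixed point."""
    x = probe.copy()
    for _ in range(max_sweeps):
        changed = False
        for j in rng.permutation(len(x)):
            new = 1.0 if W[j] @ x >= 0 else -1.0   # sgn, with sgn(0) taken as +1
            if new != x[j]:
                x[j] = new
                changed = True
        if not changed:                            # fixed point reached
            return x
    return x

# Probe: stored pattern 0 with 10% of its bits flipped.
probe = patterns[0].copy()
flip = rng.choice(N, size=10, replace=False)
probe[flip] *= -1.0

recalled = retrieve(W, probe)
print((recalled == patterns[0]).mean())  # overlap with the stored pattern
```

Well below capacity (M/N = 0.05 here), the overlap is typically 1.0, i.e., the corrupted probe falls inside the basin of attraction of the stored pattern.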
Storage Capacity of the Hopfield Network

Given a probe equal to the stored pattern ξ_ν, the activation of the jth neuron can be decomposed into a signal term and a noise term:

    v_j = Σ_{i=1}^{N} w_ji ξ_ν,i
        = (1/N) Σ_{μ=1}^{M} Σ_{i=1}^{N} ξ_μ,j ξ_μ,i ξ_ν,i
        = ξ_ν,j  +  (1/N) Σ_{μ=1, μ≠ν}^{M} Σ_{i=1}^{N} ξ_μ,j ξ_μ,i ξ_ν,i,

where the first term is the signal (using ξ_ν,i^2 = 1, since ξ_ν,i ∈ {±1}) and the second is the noise. The signal-to-noise ratio is defined as

    ρ = variance of signal / variance of noise = 1 / ((M-1)/N) ≈ N/M.

The reciprocal of ρ, called the load parameter, is designated α. According to Amit and others, this value needs to be less than the critical value α_c = 0.14.

Storage Capacity of the Hopfield Network (cont'd)

Given α_c = 0.14, the storage capacity becomes

    M_c = α_c N = 0.14 N

when some error is allowed in the final patterns. For almost error-free performance, the storage capacity becomes

    M_c = N / (2 log_e N).

Thus, the storage capacity of the Hopfield network scales less than linearly with the size N of the network. This is a major limitation of the Hopfield model.

Cohen-Grossberg Theorem

Cohen and Grossberg (1983) showed how to assess the stability of a certain class of neural networks:

    d/dt u_j = a_j(u_j) [ b_j(u_j) - Σ_{i=1}^{N} c_ji φ_i(u_i) ],   j = 1, 2, ..., N.

A neural network with the above dynamics admits a Lyapunov function defined as

    E = (1/2) Σ_{i=1}^{N} Σ_{j=1}^{N} c_ji φ_i(u_i) φ_j(u_j) - Σ_{j=1}^{N} ∫_0^{u_j} b_j(λ) φ'_j(λ) dλ,

where

    φ'_j(λ) = d/dλ φ_j(λ).

Cohen-Grossberg Theorem (cont'd)

For the definition in the previous slide to be valid, the following conditions need to be met: the synaptic weights are symmetric; the function a_j(u_j) satisfies the condition of nonnegativity; and the nonlinear activation function φ_j(u_j) needs to follow the monotonicity condition

    φ'_j(u_j) = d/du_j φ_j(u_j) ≥ 0.

With the above,

    dE/dt ≤ 0,

ensuring the global stability of the system. The Hopfield model can be seen as a special case of the Cohen-Grossberg theorem.
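The two capacity formulas make the sublinear scaling concrete. A small sketch tabulating both bounds for a few network sizes:

```python
import numpy as np

def capacity_with_errors(N, alpha_c=0.14):
    """M_c = α_c N: maximum patterns when some retrieval error is allowed."""
    return alpha_c * N

def capacity_error_free(N):
    """M_c = N / (2 ln N): maximum patterns for almost error-free retrieval."""
    return N / (2.0 * np.log(N))

# The error-free capacity grows much more slowly than 0.14 N.
for N in (100, 1000, 10000):
    print(N, capacity_with_errors(N), round(capacity_error_free(N), 1))
```

For example, a 1000-neuron network can hold about 140 patterns if some bit errors are tolerated, but only about 72 for nearly error-free recall, illustrating why capacity is the Hopfield model's major limitation.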
Demo

- Noisy input
- Partial input
- Capacity overload