AS Elementary Cybernetics. Lecture 4: Neocybernetic Basic Models
1 AS Elementary Cybernetics Lecture 4: Neocybernetic Basic Models
2 Starting point when modeling real complex systems. Observation: bottom-up approaches (studying the mechanisms alone) are futile. Another observation: top-down approaches alone are similarly hopeless; there is no grounding. Mission: both views have to be combined. One needs vision from the top; one needs substance from the bottom. Try to apply the ideas to a prototypical example: modeling of neural networks, the best understood complex system (?). Remember that combining the two views is a big challenge: computationalism (numeric) and traditional AI (symbolic) seem to be incompatible; low-level functions and high-level (emergent) functionalities are very different.
3 Neocybernetic starting points, summary. The details (along the time axis) are abstracted away; a holistic view from above is applied. There exist local actions only; there are no structures of centralized control. It is assumed that the underlying interactions and feedbacks are consistent, maintaining the system integrity. This means that one can assume stationarity and dynamic balance in the system in varying environmental conditions. An additional assumption: linearity is pursued as long as it is reasonable. Sounds simple; are there any new intuitions available? Strong guiding principles for modeling.
4 Modeling a neuron. Neural (chemical) signals are pulse coded, asynchronous, ... extremely complicated. Simplification: only the relevant information is represented: the activation levels.
5 Abstraction level #1. Triggering of neuronal pulses is stochastic. Assume that in stationary environmental conditions the average number of pulses in some time interval remains constant. Only study statistical phenomena: abstract the time axis away, only model average activity. Perceptron: linear summation of the input signals $v_j$ plus an activation function, $x_i = f\big((Wv)_i\big)$, and the linear version $x_i = \sum_{j=1}^{m} w_{ij} v_j$. For the whole linear grid, $x = Wv$. (Figure: a neuron with inputs $v_1, \dots, v_m$, weights $w_{i1}, \dots, w_{im}$, summation $\Sigma$ giving $\zeta$, activation $f(\zeta)$, and output $x_i$.)
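As a concrete illustration of the linear grid $x = Wv$, here is a minimal sketch in Python/NumPy (the dimensions and weight values are arbitrary assumptions, chosen only for demonstration):

```python
import numpy as np

# Linear perceptron grid: n neurons, m inputs.
# Per neuron: x_i = sum_j w_ij * v_j; for the whole grid: x = W v.
rng = np.random.default_rng(0)
m, n = 5, 3                       # input and neuron counts (arbitrary)
W = rng.standard_normal((n, m))   # synaptic weight matrix
v = rng.standard_normal(m)        # input activation levels

x = W @ v                         # whole-grid linear summation

# Element-wise check against the per-neuron sum formula
x_sum = np.array([sum(W[i, j] * v[j] for j in range(m)) for i in range(n)])
```

The matrix form and the per-neuron sums are the same computation, which is what makes the later matrix-level analysis possible.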
6 Still, remember the reality...
7 The emergence idea is exploited here: deterministic activity variables are employed to describe behaviors. How to exploit the first-level neuron abstraction, how to reach the neuron grid level of abstraction? Neural networks research studies the opposite ends: 1. Feedforward perceptron networks. Non-intuitive: black-box model, unanalyzable. Mathematically strong: smooth functions can be approximated to arbitrary accuracy. 2. Kohonen's self-organizing maps (SOM). Intuitive: easily interpretable by humans (visual pattern recognition capability exploited). Less mathematical: a mapping from m-dimensional real-valued vectors to n integers. Now, again, trust deep structures more than surface patterns!
8 Hebbian learning. Artificial neural networks are mainly seen as computational tools only. To capture the functional essence of neuronal systems, one has to elaborate on the domain area. The Hebbian learning rule (by psychologist Donald O. Hebb) also dates back to the mid-1900s: if the neuron activity correlates with the input signal, the corresponding synaptic weight increases. Are there some goals for neurons included here?! Is there something teleological taking place? Bold assumptions make it possible to reach powerful models.
9 Traditional approach. Assume: perceptron activity $x_i$ is a linear function of the input signals $v_j$, where the $w_{ij}$ are the synaptic weights: $x_i = \sum_{j=1}^{m} w_{ij} v_j$. With the Hebbian law applied in adaptation, the correlation between input and neuronal activity is expressed as $x_i v_j$, so that $\frac{dw_{ij}}{dt} = \gamma\, x_i v_j = \gamma\, w_{ij} v_j^2$, assuming here, for simplicity, that $m = 1$. This learning law is unstable: the synaptic weight grows infinitely, and so does $x_i$!
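The divergence is easy to see numerically; below is a small NumPy sketch (the gain, step size, and signal statistics are arbitrary assumptions):

```python
import numpy as np

# Plain Hebbian adaptation dw/dt = gamma * x * v with x = w * v (m = 1).
# Since dw = gamma * w * v^2 * dt always has the sign of w,
# |w| can only grow: the rule is unstable.
rng = np.random.default_rng(1)
gamma, dt = 0.1, 0.1
w = 0.5
history = [abs(w)]
for _ in range(500):
    v = rng.standard_normal()
    x = w * v                  # linear neuron activity
    w += dt * gamma * x * v    # Hebbian update
    history.append(abs(w))
```

The weight magnitude grows roughly geometrically; what is missing is exactly the stabilizing term introduced on the next slide.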
10 Enhancements. Stabilization by Oja's rule (by Erkki Oja): $\frac{dw_{ij}}{dt} = \gamma\, x_i v_j - \gamma\, x_i^2 w_{ij}$ (for $m = 1$, equivalently $\gamma\, w_{ij} v_j^2 - \gamma\, w_{ij} x_i^2$). Motivation: keeps the weight vector bounded ($\|w_i\| = 1$) and the average signal size $\mathrm{E}\{x_i^2\} = 1$; extracts the first principal component of the data. Compare to the logistic formulation of limited growth! Extension: the Generalized Hebbian Algorithm (GHA): structural tailoring makes it possible to deflate the principal components one at a time. However, the new formula is nonlinear: analysis of neuron grids containing such elements is difficult, and extending them is equally difficult. What to do instead?
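For concreteness, a NumPy sketch of Oja's rule on synthetic two-dimensional data (the data distribution, learning rate, and sample count are arbitrary assumptions):

```python
import numpy as np

# Oja's rule: dw = gamma * (x*v - x^2 * w), with x = w^T v.
# The decay term keeps ||w|| near 1, and w converges to the
# first principal component direction of the data.
rng = np.random.default_rng(2)
N = 20000
s = rng.standard_normal(N)
# Strongly correlated 2-D data: dominant direction ~ (1, 1)/sqrt(2)
V = np.column_stack([s + 0.1 * rng.standard_normal(N),
                     s + 0.1 * rng.standard_normal(N)])
w = np.array([1.0, 0.0])
gamma = 0.01
for v in V:
    x = w @ v
    w += gamma * (x * v - x * x * w)   # Oja's learning rule

pc1 = np.array([1.0, 1.0]) / np.sqrt(2)
alignment = abs(w @ pc1) / np.linalg.norm(w)   # |cosine| to the first PC
```

Note the contrast with the plain Hebbian rule: the same correlation-driven growth, but now self-limiting.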
11 Level of synapses. The neocybernetic guidelines are: search for balance and linearity. Note: nonlinearity was not included in the original Hebbian law; it was only introduced for pragmatic reasons. Are there other ways to reach stability in linear terms? Yes, one can apply negative feedback: $\frac{dw_{ij}}{dt} = \gamma\, x_i v_j - \frac{1}{\tau}\, w_{ij}$, or in matrix form $\frac{dW}{dt} = \gamma\, x v^T - \frac{1}{\tau}\, W$. The steady state is $W = \gamma\tau\, \mathrm{E}\{x v^T\} = \Gamma\, \mathrm{E}\{x v^T\}$. Synaptic weights become coded in a correlation matrix!
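The decay-stabilized law can be simulated directly. In this NumPy sketch (the signal model, gain $\gamma$, and time constant $\tau$ are arbitrary assumptions), the weights settle to $\gamma\tau$ times the cross-correlation:

```python
import numpy as np

# dW/dt = gamma * x v^T - (1/tau) * W: first-order dynamics whose
# steady state is W = gamma * tau * E{x v^T}.
rng = np.random.default_rng(3)
gamma, tau, dt = 1.0, 2.0, 0.0005
n, m = 2, 3
Mix = rng.standard_normal((n, m))   # fixed stationary signal model: x = Mix v
W = np.zeros((n, m))
for _ in range(40000):
    v = rng.standard_normal(m)
    x = Mix @ v
    W += dt * (gamma * np.outer(x, v) - W / tau)   # Hebbian term + decay

# With white v, E{x v^T} = Mix exactly, so W should approach gamma*tau*Mix
target = gamma * tau * Mix
```

Unlike the unstabilized rule, the decay pulls the weights toward a bounded correlation-coded steady state.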
12 Level of neuron grids. Just the same principles can be applied when studying the neuron grid level: balance and linearity. Distinguish the sources: $W = \begin{pmatrix} A & B \end{pmatrix}$ and $v = \begin{pmatrix} x \\ u \end{pmatrix}$, so that $A = \Gamma\, \mathrm{E}\{x x^T\}$ and $B = \Gamma\, \mathrm{E}\{x u^T\}$. To implement negative feedback, one needs to apply the anti-Hebbian action between otherwise Hebbian neurons: $\frac{dx}{dt} = -A x + B u$, so that the steady state becomes $\bar{x} = A^{-1} B u = \mathrm{E}\{x x^T\}^{-1}\, \mathrm{E}\{x u^T\}\, u = \phi^T u$. The model is stable! The eigenvalues of $A$ are always real and non-negative.
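The stability claim can be checked numerically: with $A$ symmetric positive definite, the state relaxes to the balance $A^{-1}Bu$. A NumPy sketch (the matrices and the input are arbitrary assumptions):

```python
import numpy as np

# dx/dt = -A x + B u with A symmetric positive definite (a scaled
# correlation matrix has this property): x converges to xbar = A^{-1} B u.
rng = np.random.default_rng(4)
n, m = 3, 4
M = rng.standard_normal((n, n))
A = M @ M.T + np.eye(n)        # symmetric, eigenvalues real and >= 1
B = rng.standard_normal((n, m))
u = rng.standard_normal(m)     # input held constant (x dynamics are fast)

x = np.zeros(n)
dt = 0.01
for _ in range(20000):
    x += dt * (-A @ x + B @ u)   # Euler integration of the grid dynamics

xbar = np.linalg.solve(A, B @ u)   # the predicted balance
```

Because $A$ is a (scaled) correlation matrix, its positive real eigenvalues guarantee this relaxation for any input.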
13 Hebbian/anti-Hebbian system. $\frac{dx}{dt} = -A x + B u$. Explicit feedback structures. Completely localized operation, even though centralized matrix formulations are applied to reach mathematical compactness. (Figure: a grid of neurons $x_1, x_2$ with inputs $u_1, u_2, u_3$, Hebbian feedforward weights $\mathrm{E}\{x_i u_j\}$ and anti-Hebbian lateral weights $\mathrm{E}\{x_i x_j\}$.)
14 Different time scales: after all, $u$ also varies. (Figure: signals $u_j(t)$ and $x_i$ on nested time scales $\tau$; the latent variables $x_i$ settle much faster than the inputs $u_j$ change.)
15 Towards abstraction level #2. Cybernetic model = statistical model of balances $\bar x(u)$. Assume the dynamics of $u$ is essentially slower than that of $x$, and study the covariance properties:
$\mathrm{E}\{xx^T\} = \mathrm{E}\{xx^T\}^{-1}\, \mathrm{E}\{xu^T\}\, \mathrm{E}\{uu^T\}\, \mathrm{E}\{ux^T\}\, \mathrm{E}\{xx^T\}^{-1}$
or
$\mathrm{E}\{xx^T\}^3 = \mathrm{E}\{xu^T\}\, \mathrm{E}\{uu^T\}\, \mathrm{E}\{ux^T\}$
or
$\big(\phi^T\, \mathrm{E}\{uu^T\}\, \phi\big)^3 = \phi^T\, \mathrm{E}\{uu^T\}^3\, \phi$, with $n < m$.
Balance on the statistical level = second-order balance. The operator $\mathrm{E}$ connects the levels. A third level of time scales!
16 Solution. The expression is fulfilled for $\phi = \theta_n D$, where $\theta_n$ is a matrix of $n$ of the covariance matrix eigenvectors, and $D$ is orthogonal. Loose motivation: this is because the left-hand side is then
$\big(D^T \theta_n^T\, \mathrm{E}\{uu^T\}\, \theta_n D\big)^3 = \big(D^T \Lambda_n D\big)^3 = D^T \Lambda_n^3 D$
and the right-hand side is
$D^T \theta_n^T\, \mathrm{E}\{uu^T\}^3\, \theta_n D = D^T \Lambda_n^3 D.$
A stable solution results when $\theta_n$ contains the most significant data covariance matrix eigenvectors.
17 Accurate proof: adaptation converges to the principal subspace. Study how the mapping matrices become adapted; assume that this process is divided into $k$ steps:
$\bar x(0) = A^{-1}(0)\, B(0)\, u$
$\bar x(1) = A^{-1}(1)\, B(1)\, u = A^{-1}(1)\, \mathrm{E}\{\bar x(0)\, u^T\}\, u = A^{-1}(1)\, A^{-1}(0)\, B(0)\, \mathrm{E}\{uu^T\}\, u$
$\bar x(2) = A^{-1}(2)\, B(2)\, u = A^{-1}(2)\, \mathrm{E}\{\bar x(1)\, u^T\}\, u = A^{-1}(2)\, A^{-1}(1)\, A^{-1}(0)\, B(0)\, \mathrm{E}\{uu^T\}^2\, u$
$\bar x(k) = A^{-1}(k)\, B(k)\, u = \prod_{i=0}^{k} A^{-1}(k-i)\; B(0)\, \mathrm{E}\{uu^T\}^k\, u.$
The first part of this expression does not affect the subspace being spanned, so only $B(k)$ is now of interest. Write it applying the basis axes spanned by the principal components of the data, so that $\mathrm{E}\{uu^T\} = \Theta \Lambda \Theta^T$ and $B(0) = D \Theta^T$. Now one has
$B(k) = B(0)\, \mathrm{E}\{uu^T\}^k = D \Theta^T \Theta \Lambda^k \Theta^T = D \Lambda^k \Theta^T,$
meaning that in the mapping matrix the relevance of the principal component direction $j$ is weighted by $\lambda_j^k$. Because the variables $x_i$ are linearly independent, only the $n$ most significant of those directions will remain visible in the mapped data after adaptation. This completes the proof.
18 Principal subspace analysis. Any subset of the input data principal components can be selected for $\phi$. The subspace spanned by the $n$ most significant principal components gives a stable solution. Conclusion: competitive learning (combined Hebbian and anti-Hebbian learning) without any structural constraints results in self-regulation (balance) and self-organization (in terms of principal subspace analysis).
19 Emergent patterns. The process (convergence of $x$) can be substituted with the final pattern: details are lost, but the essence remains (?). The pattern is characterized in terms of a cost criterion
$J(x, u) = \frac{1}{2}\, x^T\, \mathrm{E}\{xx^T\}\, x - x^T\, \mathrm{E}\{xu^T\}\, u.$
The process itself = gradient descent minimization for the criterion! (Figure: models of the local minima for $m = 2$, $n = 1$: three example data sets in the $(u_1, u_2)$ plane with the corresponding minimizing directions $\phi$.)
20 Mathematics vs. reality. 1. Correlations vs. covariances. The matrices being studied are correlation matrices rather than covariance matrices (as is normally the case in PCA). This means that the data $u$ is not assumed to be zero-mean; there is no need for preprocessing, and in practice the variables are always non-negative. From a physical point of view this is beneficial: note that the actual signal carriers (chemical concentrations / pulse frequencies) cannot be negative. 2. Principal subspace vs. principal components. When applying the linear structure, the actual principal components are not distinguished, only the subspace spanned by them. This means that the variables can again all be non-negative, so that the signals $x$ can be physically plausible. Indeed: if all $u_j$ are non-negative, and initially all $x_i$ have non-negative values, the $x_i$'s will always remain non-negative (so that one has a positive system).
21 Unification of layers. Again, study a synapse; it is not a static mapping but a dynamic system (much faster than the grid dynamics). Assume this (trivial) system is also cybernetic:
$\frac{dx_{ij}}{dt} = -\gamma\, \mathrm{E}\{x_i^2\}\, x_{ij} + \gamma\, \mathrm{E}\{x_i v_j\}\, v_j,$
or in balance
$\bar x_{ij} = \frac{\mathrm{E}\{x_i v_j\}}{\mathrm{E}\{x_i^2\}}\, v_j = w_{ij}\, v_j.$
Extra term! Comparing this to the grid model, one can see that the two layers are qualitatively identical if one selects $\Gamma = \mathrm{Var}\{x\}^{-1}$.
22 Technical experiments. In technical terms, one can define the algorithm
$\bar x = \hat{\mathrm{E}}\{xx^T\}^{-1}\, \hat{\mathrm{E}}\{xu^T\}\, u,$
where
$\frac{d\hat{\mathrm{E}}\{xx^T\}}{dt} = -\lambda\, \hat{\mathrm{E}}\{xx^T\} + \lambda\, \bar x \bar x^T$
$\frac{d\hat{\mathrm{E}}\{xu^T\}}{dt} = -\lambda\, \hat{\mathrm{E}}\{xu^T\} + \lambda\, \bar x u^T.$
In practice this is easiest implemented in discrete time. The matrix inversion lemma can be applied. The matrices can be masked to implement hierarchies, etc. Initialization: the matrix $A$ has to always remain positive definite.
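A discrete-time NumPy sketch of this recursive scheme (the data distribution, block size, and forgetting factor are arbitrary assumptions; the inverse is computed directly rather than via the matrix inversion lemma):

```python
import numpy as np

# HAH iteration in discrete time: xbar = Exx^{-1} Exu u for a data
# block, then exponentially weighted update of the correlation estimates.
rng = np.random.default_rng(5)
k, m, n, lam = 500, 4, 2, 0.9
scales = np.array([3.0, 2.0, 0.3, 0.1])   # two dominant input directions

Exx = np.eye(n)                            # init: positive definite
Exu = 0.5 * rng.standard_normal((n, m))
for _ in range(200):
    U = rng.standard_normal((k, m)) * scales        # k x m data block
    Xbar = U @ np.linalg.solve(Exx, Exu).T          # balance of latent vars
    Exu = lam * Exu + (1 - lam) * Xbar.T @ U / k    # model adaptation
    Exx = lam * Exx + (1 - lam) * Xbar.T @ Xbar / k

# Emergent mapping xbar = phiT u concentrates on the principal subspace:
# the weights on the low-variance inputs 2 and 3 are (nearly) zero.
phiT = np.linalg.solve(Exx, Exu)
weak = np.max(np.abs(phiT[:, 2:]))
strong = np.max(np.abs(phiT[:, :2]))
```

Note that the update structure keeps $\hat{\mathrm{E}}\{xx^T\}$ positive definite as long as it starts that way, since each step is a convex combination with a positive semidefinite term.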
23 Principal subspaces found OK for different data sets. (Figure: HAH = Hebbian/anti-Hebbian algorithm applied to Data 1 through Data 4.)
24 PCs cannot always be told apart. (Figure: Data 1 through Data 4.)
25 Summary: neocybernetic models. First-order cybernetic system: for any stable $A$, assume that there holds $\frac{dx}{dt} = -Ax + Bu$ with $\bar x = A^{-1} B u = \phi^T u$. Second-order cybernetic system: additionally, assume that the matrices are $A = \Gamma\, \mathrm{E}\{xx^T\}$ and $B = \Gamma\, \mathrm{E}\{xu^T\}$. Higher-order (optimized) cybernetic system: additionally, assume that $\Gamma = \mathrm{Var}\{x\}^{-1}$ or $\Gamma = \mathrm{E}\{xx^T\}^{-1}$ (Newton algorithm: second-order convergence).
26 Example: hand-written digits. There was a large body of 32x32-pixel images, representing digits from 0 to 9, over 8000 samples. (Figure: examples of typical 9's, and examples of less typical 9's.)
27 Algorithm for Hebbian/anti-Hebbian learning...

LOOP iterate for data in the kxm matrix U
    % Balance of latent variables
    Xbar = U * (inv(Exx)*Exu)';
    % Model adaptation
    Exu = lambda*Exu + (1-lambda)*Xbar'*U/k;
    Exx = lambda*Exx + (1-lambda)*Xbar'*Xbar/k;
    % PCA rather than PSA through structural constraints
    Exx = tril(ones(n,n)).*Exx;
END
% Recursive algorithm can be boosted with matrix inversion lemma
28 ... resulting in principal components. Parameters: m = 1024, n = , λ = 0.5. Rather fast convergence, starting from $\theta_1$ on top left.
29 Summary of clever agents. Emergence in terms of self-regulation (stability) and self-organization (principal subspace analysis) is reached. This is reached applying physiologically plausible operations, and the model is linear: scalable beyond toy domains. Learning is local but not completely local: there is need for communication among neurons (anti-Hebbian structures). The roles of the signals are different: how to motivate the inversion in adaptation direction (anti-Hebbian learning)? Solution next: apply non-idealities in an unorthodox way! There exist no unidirectional causal flows in real-life systems. Feedback: exploiting a signal exhausts that signal.
30 Stupid agents. Feedforward coupling $\phi = q\, \mathrm{E}\{\bar x \bar u^T\}$ ($q$ = stiffness, coupling factor) and feedback coupling $\varphi = \mathrm{E}\{\bar u \bar x^T\}\, q = \phi^T$. (Figure: block diagram with feedforward $\phi$, feedback $\varphi$, and input deviation $\Delta u$ so that $\bar u = u - \Delta u$.) Implicit feedback. Exhaustion through exploitation. A generalized Boltzmann machine?
31 Simply go for resources. Again: balancing is reached by feedback, but now not explicitly but implicitly through the environment:
$\bar x = \phi\, \bar u$
$\bar u = u - \varphi\, \bar x.$
Also the environment finds its balance. Only exploiting locally visible quantities, implement evolutionary adaptation symmetrically as $\phi = q\, \mathrm{E}\{\bar x \bar u^T\}$ and $\varphi = q\, \mathrm{E}\{\bar u \bar x^T\} = \phi^T$. How to characterize this environmental balance?
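For a fixed coupling the implicit loop can be solved explicitly: eliminating $\bar u$ from $\bar x = \phi\,\bar u$, $\bar u = u - \phi^T\bar x$ gives $\bar x = (I + \phi\phi^T)^{-1}\phi\, u$. A NumPy sketch verifying this (the coupling matrix and the input are arbitrary assumptions):

```python
import numpy as np

# Implicit environmental feedback:
#   xbar = phi @ ubar,  ubar = u - phi.T @ xbar.
# The loop has the closed form xbar = (I + phi phi^T)^{-1} phi u.
rng = np.random.default_rng(6)
n, m = 2, 4
phi = rng.standard_normal((n, m))   # fixed coupling (arbitrary)
u = rng.standard_normal(m)

xbar = np.zeros(n)
for _ in range(300):
    ubar = u - phi.T @ xbar                    # environment reacts to exploitation
    xbar = xbar + 0.1 * (phi @ ubar - xbar)    # damped relaxation to balance

closed_form = np.linalg.solve(np.eye(n) + phi @ phi.T, phi @ u)
```

No explicit anti-Hebbian connections are needed here: the decorrelating feedback comes back through the shared environment.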
32 Because $\bar x = q\, \mathrm{E}\{\bar x \bar u^T\}\, \bar u$, one can write two covariances:
$\mathrm{E}\{\bar x \bar u^T\} = q\, \mathrm{E}\{\bar x \bar u^T\}\, \mathrm{E}\{\bar u \bar u^T\}$
and
$\mathrm{E}\{\bar x \bar x^T\} = q^2\, \mathrm{E}\{\bar x \bar u^T\}\, \mathrm{E}\{\bar u \bar u^T\}\, \mathrm{E}\{\bar u \bar x^T\} = q\, \mathrm{E}\{\bar x \bar u^T\}\, \mathrm{E}\{\bar u \bar x^T\},$
so that, writing $\theta D = \sqrt{q}\; \mathrm{E}\{\bar u \bar x^T\}\, \mathrm{E}\{\bar x \bar x^T\}^{-1/2}$ with $D$ orthogonal,
$I_n = q\, \mathrm{E}\{\bar x \bar x^T\}^{-1/2}\, \mathrm{E}\{\bar x \bar u^T\}\, \mathrm{E}\{\bar u \bar x^T\}\, \mathrm{E}\{\bar x \bar x^T\}^{-1/2} = D^T \theta^T \theta\, D$
and
$I_n = q^2\, \mathrm{E}\{\bar x \bar x^T\}^{-1/2}\, \mathrm{E}\{\bar x \bar u^T\}\, \mathrm{E}\{\bar u \bar u^T\}\, \mathrm{E}\{\bar u \bar x^T\}\, \mathrm{E}\{\bar x \bar x^T\}^{-1/2} = q\, D^T \theta^T\, \mathrm{E}\{\bar u \bar u^T\}\, \theta\, D.$
Forget the trivial solution where $\bar x_i$ is identically zero.
33 Similarly, if $\bar x = Q\, \mathrm{E}\{\bar x \bar u^T\}\, \bar u$ for some (diagonal) matrix $Q$:
$\mathrm{E}\{\bar x \bar u^T\} = Q\, \mathrm{E}\{\bar x \bar u^T\}\, \mathrm{E}\{\bar u \bar u^T\}$
and
$\mathrm{E}\{\bar x \bar x^T\} = Q\, \mathrm{E}\{\bar x \bar u^T\}\, \mathrm{E}\{\bar u \bar u^T\}\, \mathrm{E}\{\bar u \bar x^T\}\, Q = \mathrm{E}\{\bar x \bar u^T\}\, \mathrm{E}\{\bar u \bar x^T\}\, Q.$
Note: this has to be symmetric, so that
$\mathrm{E}\{\bar x \bar x^T\} = \mathrm{E}\{\bar x \bar x^T\}^T = Q\, \mathrm{E}\{\bar x \bar u^T\}\, \mathrm{E}\{\bar u \bar x^T\}.$
A stronger formulation is reached:
$\theta = \mathrm{E}\{\bar u \bar x^T\}\, \mathrm{E}\{\bar x \bar x^T\}^{-1/2}\, Q^{1/2}.$
For non-identical $q_i$, this has to become diagonal also.
34 Equalization of environmental variances. Because $\theta^T \theta = I_n$ and $\theta^T\, \mathrm{E}\{\bar u \bar u^T\}\, \theta = Q^{-1}$, $\theta$ consists of the $n$ (most significant) eigenvectors of $\mathrm{E}\{\bar u \bar u^T\}$. If $n = m$, the variation structure becomes trivial: $\mathrm{E}\{\bar u \bar u^T\} = \frac{1}{q} I_m$ or $\mathrm{E}\{\bar u \bar u^T\} = Q^{-1}$. The visible data variation becomes whitened by the feedback. Relation to ICA: assume that this whitened data is further processed by neurons (FOBI), but this has to be nonlinear! On the other hand, if the $q_i$ are different, the modes become separated in the PCA style (rather than PSA). Still try to avoid nonlinearity!
35 Open-loop eigenvalues. (Figure: eigenvalue spectrum $\lambda$ versus eigenvalue index for a sub-cybernetic, a marginally cybernetic, and a hyper-cybernetic system: eigenvalues above the level $1/q$ are pulled down to $1/q$, while $\lambda_{n+1}$ and beyond remain untouched; with unequal stiffnesses the levels $1/q_1, \dots, 1/q_n$ differ.) DEMO equalvar.m
36 Emergence of intelligence. Solving for the balance signals, one can write
$\bar x = \big( \mathrm{E}\{\bar x \bar x^T\} + Q^{-1} \big)^{-1}\, \mathrm{E}\{\bar x \bar u^T\}\, u.$
This can be rewritten in a form where $\mathrm{E}\{\bar x \bar u^T\}$ is the feedforward matrix and $\mathrm{E}\{\bar x \bar x^T\} + Q^{-1}$ is the feedback matrix. Of course, $\bar u$ has changed to $u$, as it is the only signal visible. It turns out that if the coupling increases, so that $q_i \to \infty$, this structure equals the prior one with explicit feedback. This means that in evolution (to be studied closer later) a selfish agent finally becomes a clever agent. Without preprogramming, an agent becomes context-aware.
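The limit claim is easy to check numerically: as $q$ grows, the implicit-feedback mapping $(\mathrm{E}\{\bar x\bar x^T\} + Q^{-1})^{-1}\mathrm{E}\{\bar x\bar u^T\}$ approaches the explicit-feedback mapping $\mathrm{E}\{\bar x\bar x^T\}^{-1}\mathrm{E}\{\bar x\bar u^T\}$. A NumPy sketch with arbitrary example matrices (identical stiffnesses $q_i = q$ assumed):

```python
import numpy as np

# As the stiffness q -> infinity, the implicit-feedback mapping
# (Exx + (1/q) I)^{-1} Exu approaches the explicit-feedback mapping
# Exx^{-1} Exu of the clever agent.
rng = np.random.default_rng(7)
n, m = 2, 3
M = rng.standard_normal((n, n))
Exx = M @ M.T + np.eye(n)        # a positive definite latent covariance
Exu = rng.standard_normal((n, m))

explicit = np.linalg.solve(Exx, Exu)
gaps = [np.linalg.norm(np.linalg.solve(Exx + np.eye(n) / q, Exu) - explicit)
        for q in (1.0, 10.0, 1000.0)]
```

The gap shrinks monotonically with $q$, which is the numerical face of "a selfish agent finally becomes a clever agent".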
37 Variance inheritance. Further study the relationship between $\bar x$ and the original $u$:
$\bar x = \big( I_n + q^2\, \mathrm{E}\{\bar x \bar u^T\}\, \mathrm{E}\{\bar u \bar x^T\} \big)^{-1}\, q\, \mathrm{E}\{\bar x \bar u^T\}\, u = \big( I_n + q\, \mathrm{E}\{\bar x \bar x^T\} \big)^{-1}\, q\, \mathrm{E}\{\bar x \bar u^T\}\, u.$
Multiply from the right by the transpose, and take expectations:
$\big( I_n + q\, \mathrm{E}\{\bar x \bar x^T\} \big)\, \mathrm{E}\{\bar x \bar x^T\}\, \big( I_n + q\, \mathrm{E}\{\bar x \bar x^T\} \big) = \mathrm{E}\{\bar x \bar x^T\}^{1/2} \big( I_n + q\, \mathrm{E}\{\bar x \bar x^T\} \big)^2\, \mathrm{E}\{\bar x \bar x^T\}^{1/2} = q^2\, \mathrm{E}\{\bar x \bar u^T\}\, \mathrm{E}\{u u^T\}\, \mathrm{E}\{\bar u \bar x^T\}.$
38 Continuing the derivation:
$\big( I_n + q\, \mathrm{E}\{\bar x \bar x^T\} \big)^2 = q\, \sqrt{q}\, \mathrm{E}\{\bar x \bar x^T\}^{-1/2}\, \mathrm{E}\{\bar x \bar u^T\}\; \mathrm{E}\{u u^T\}\; \mathrm{E}\{\bar u \bar x^T\}\, \mathrm{E}\{\bar x \bar x^T\}^{-1/2}\, \sqrt{q} = q\, D^T \theta^T\, \mathrm{E}\{u u^T\}\, \theta\, D.$
Solving for the latent covariance:
$\mathrm{E}\{\bar x \bar x^T\} = \frac{1}{\sqrt{q}} \big( D^T \theta^T\, \mathrm{E}\{u u^T\}\, \theta\, D \big)^{1/2} - \frac{1}{q}\, I_n.$
This means that the external and internal eigenvalues (variances) are related as follows:
$\lambda_i = \sqrt{\frac{\lambda_j}{q}} - \frac{1}{q}$
for pairs $(i, j)$; there must hold $q\, \lambda_j > 1$.
39 Effect of feedback = adding "black noise". White noise = a constant increase $\sigma^2$ in all variance directions; black noise = a decrease in all directions (if possible). (Figure: variance spectra over directions $1, \dots, n$ when adding/subtracting white noise, and when adding black noise down to the level $1/q$.)
40 Results of orthogonal basis rotations. Total variance above the zero level stays intact regardless of the rotations; total variance above the level 1 changes! (Figure: eigenvalue spectra $\lambda$ over directions $1, \dots, n$: the mathematical view of data vs. the cybernetic view of data.)
41 Factor Analysis Algorithm rotates the PCA subspace axes
42 Towards differentiation of features. A simple example of nonlinear extensions: the cut function. If the variable is positive, let it through; otherwise, filter it out:
$f_{\mathrm{cut},i}(x_i) = \begin{cases} x_i, & \text{when } x_i > 0 \\ 0, & \text{when } x_i \le 0. \end{cases}$
Well in line with modeling of activity in neuronal systems: frequencies cannot become negative (interpretation in terms of pulse trains); concentrations cannot become negative (interpretation in terms of chemicals); sizes of neuron populations cannot become negative; power / information content cannot become negative, etc. Makes the modes separated. Still: the end result is almost linear! Symmetry breaking?!
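The cut function is the rectifier familiar from neural network practice; a minimal NumPy sketch:

```python
import numpy as np

# The cut function: pass positive activity through, block negative activity.
def f_cut(x):
    return np.where(x > 0, x, 0.0)

x = np.array([-2.0, -0.5, 0.0, 0.7, 3.0])
y = f_cut(x)
# Negative (and zero) inputs are cut away; positive ones pass unchanged,
# so on the active half the mapping is exactly linear.
```

This is why the end result can stay "almost linear": within the set of active modes the system still behaves as a linear one.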
43 Algorithm for Hebbian feedback learning...

LOOP iterate for data in kxm matrix U
    % Balance of latent variables
    Xbar = U * (inv(eye(n)+q*Exu*Exu')*q*Exu)';
    % Enhance model convergence by nonlinearity
    Xbar = Xbar.*(Xbar>0);
    % Balance of the environmental signals
    Ubar = U - Xbar*Exu;
    % Model adaptation
    Exu = lambda*Exu + (1-lambda)*Xbar'*Ubar/k;
    % Maintaining system activity
    Exx = Xbar'*Xbar/k;
    q = q + P*diag(ref - sqrt(diag(Exx)));
END
44 ... resulting in sparse components! Parameters: m = 1024, n = , λ = 0.97, ref = . DEMO digitfeat.m
45 Work load becomes distributed. Correlations between inputs and neuronal activities are shown below. (Figure: correlation maps, color scale from min through 0 to max.)
46 Sparse coding. The traditional modeling goal, mathematically optimal and simple to implement: minimize the overall size of the model. The sparsity-oriented modeling goal, often physically relevant but cumbersome to characterize: minimize the number of simultaneously active representations, while there are no acute constraints on the overall model size. There is a tradeoff: a too sparse model becomes inaccurate. Note the (intuitive) connection to the cognitive system: there are no acute constraints on the long-term memory (LTM), whereas the size of the working memory / short-term memory is limited to 7 ± 2.
47 Studied later... Visual V1 cortex seems to do this kind of decomposing
48 Conclusion: clever vs. stupid agents. Both agent types can exist; the resulting systems are very different.
Intelligent agent: Hebbian/anti-Hebbian learning; $\bar x = \mathrm{E}\{xx^T\}^{-1}\, \mathrm{E}\{xu^T\}\, u$; Principal Component (Subspace) Analysis; complete insulation (or elimination) of the environment.
Stupid/selfish agent: Hebbian environmental learning; $\bar x = q\, \mathrm{E}\{\bar x \bar u^T\}\, \bar u$; Sparse Component (Subspace) Analysis; stiffness regulation, equalization of the environment.
More informationOne-unit Learning Rules for Independent Component Analysis
One-unit Learning Rules for Independent Component Analysis Aapo Hyvarinen and Erkki Oja Helsinki University of Technology Laboratory of Computer and Information Science Rakentajanaukio 2 C, FIN-02150 Espoo,
More informationMathematical foundations - linear algebra
Mathematical foundations - linear algebra Andrea Passerini passerini@disi.unitn.it Machine Learning Vector space Definition (over reals) A set X is called a vector space over IR if addition and scalar
More informationTemporal Backpropagation for FIR Neural Networks
Temporal Backpropagation for FIR Neural Networks Eric A. Wan Stanford University Department of Electrical Engineering, Stanford, CA 94305-4055 Abstract The traditional feedforward neural network is a static
More informationPrincipal Component Analysis
B: Chapter 1 HTF: Chapter 1.5 Principal Component Analysis Barnabás Póczos University of Alberta Nov, 009 Contents Motivation PCA algorithms Applications Face recognition Facial expression recognition
More informationMachine Learning. Principal Components Analysis. Le Song. CSE6740/CS7641/ISYE6740, Fall 2012
Machine Learning CSE6740/CS7641/ISYE6740, Fall 2012 Principal Components Analysis Le Song Lecture 22, Nov 13, 2012 Based on slides from Eric Xing, CMU Reading: Chap 12.1, CB book 1 2 Factor or Component
More informationEquivalence of Backpropagation and Contrastive Hebbian Learning in a Layered Network
LETTER Communicated by Geoffrey Hinton Equivalence of Backpropagation and Contrastive Hebbian Learning in a Layered Network Xiaohui Xie xhx@ai.mit.edu Department of Brain and Cognitive Sciences, Massachusetts
More informationPrinciple Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA
Principle Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA Principle Components Analysis: Uses one group of variables (we will call this X) In
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear
More informationLecture 7: Con3nuous Latent Variable Models
CSC2515 Fall 2015 Introduc3on to Machine Learning Lecture 7: Con3nuous Latent Variable Models All lecture slides will be available as.pdf on the course website: http://www.cs.toronto.edu/~urtasun/courses/csc2515/
More informationNeural Networks. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington
Neural Networks CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Perceptrons x 0 = 1 x 1 x 2 z = h w T x Output: z x D A perceptron
More informationLecture 7 Artificial neural networks: Supervised learning
Lecture 7 Artificial neural networks: Supervised learning Introduction, or how the brain works The neuron as a simple computing element The perceptron Multilayer neural networks Accelerated learning in
More informationMODELS OF LEARNING AND THE POLAR DECOMPOSITION OF BOUNDED LINEAR OPERATORS
Eighth Mississippi State - UAB Conference on Differential Equations and Computational Simulations. Electronic Journal of Differential Equations, Conf. 19 (2010), pp. 31 36. ISSN: 1072-6691. URL: http://ejde.math.txstate.edu
More informationCredit Assignment: Beyond Backpropagation
Credit Assignment: Beyond Backpropagation Yoshua Bengio 11 December 2016 AutoDiff NIPS 2016 Workshop oo b s res P IT g, M e n i arn nlin Le ain o p ee em : D will r G PLU ters p cha k t, u o is Deep Learning
More informationA two-layer ICA-like model estimated by Score Matching
A two-layer ICA-like model estimated by Score Matching Urs Köster and Aapo Hyvärinen University of Helsinki and Helsinki Institute for Information Technology Abstract. Capturing regularities in high-dimensional
More informationLecture 1: Systems of linear equations and their solutions
Lecture 1: Systems of linear equations and their solutions Course overview Topics to be covered this semester: Systems of linear equations and Gaussian elimination: Solving linear equations and applications
More informationCourse 495: Advanced Statistical Machine Learning/Pattern Recognition
Course 495: Advanced Statistical Machine Learning/Pattern Recognition Deterministic Component Analysis Goal (Lecture): To present standard and modern Component Analysis (CA) techniques such as Principal
More informationKernel Methods. Machine Learning A W VO
Kernel Methods Machine Learning A 708.063 07W VO Outline 1. Dual representation 2. The kernel concept 3. Properties of kernels 4. Examples of kernel machines Kernel PCA Support vector regression (Relevance
More information17 Neural Networks NEURAL NETWORKS. x XOR 1. x Jonathan Richard Shewchuk
94 Jonathan Richard Shewchuk 7 Neural Networks NEURAL NETWORKS Can do both classification & regression. [They tie together several ideas from the course: perceptrons, logistic regression, ensembles of
More informationData Analysis and Manifold Learning Lecture 6: Probabilistic PCA and Factor Analysis
Data Analysis and Manifold Learning Lecture 6: Probabilistic PCA and Factor Analysis Radu Horaud INRIA Grenoble Rhone-Alpes, France Radu.Horaud@inrialpes.fr http://perception.inrialpes.fr/ Outline of Lecture
More informationLecture 3: Latent Variables Models and Learning with the EM Algorithm. Sam Roweis. Tuesday July25, 2006 Machine Learning Summer School, Taiwan
Lecture 3: Latent Variables Models and Learning with the EM Algorithm Sam Roweis Tuesday July25, 2006 Machine Learning Summer School, Taiwan Latent Variable Models What to do when a variable z is always
More informationSPSS, University of Texas at Arlington. Topics in Machine Learning-EE 5359 Neural Networks
Topics in Machine Learning-EE 5359 Neural Networks 1 The Perceptron Output: A perceptron is a function that maps D-dimensional vectors to real numbers. For notational convenience, we add a zero-th dimension
More informationIntroduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin
1 Introduction to Machine Learning PCA and Spectral Clustering Introduction to Machine Learning, 2013-14 Slides: Eran Halperin Singular Value Decomposition (SVD) The singular value decomposition (SVD)
More informationNeural Networks Learning the network: Backprop , Fall 2018 Lecture 4
Neural Networks Learning the network: Backprop 11-785, Fall 2018 Lecture 4 1 Recap: The MLP can represent any function The MLP can be constructed to represent anything But how do we construct it? 2 Recap:
More informationMatrices and Vectors. Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A =
30 MATHEMATICS REVIEW G A.1.1 Matrices and Vectors Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A = a 11 a 12... a 1N a 21 a 22... a 2N...... a M1 a M2... a MN A matrix can
More informationNumerical Analysis Lecture Notes
Numerical Analysis Lecture Notes Peter J Olver 8 Numerical Computation of Eigenvalues In this part, we discuss some practical methods for computing eigenvalues and eigenvectors of matrices Needless to
More informationQuantitative Understanding in Biology Principal Components Analysis
Quantitative Understanding in Biology Principal Components Analysis Introduction Throughout this course we have seen examples of complex mathematical phenomena being represented as linear combinations
More informationLearning and Memory in Neural Networks
Learning and Memory in Neural Networks Guy Billings, Neuroinformatics Doctoral Training Centre, The School of Informatics, The University of Edinburgh, UK. Neural networks consist of computational units
More informationLecture: Face Recognition and Feature Reduction
Lecture: Face Recognition and Feature Reduction Juan Carlos Niebles and Ranjay Krishna Stanford Vision and Learning Lab 1 Recap - Curse of dimensionality Assume 5000 points uniformly distributed in the
More informationChapter 18. Remarks on partial differential equations
Chapter 8. Remarks on partial differential equations If we try to analyze heat flow or vibration in a continuous system such as a building or an airplane, we arrive at a kind of infinite system of ordinary
More informationAdvanced Introduction to Machine Learning CMU-10715
Advanced Introduction to Machine Learning CMU-10715 Independent Component Analysis Barnabás Póczos Independent Component Analysis 2 Independent Component Analysis Model original signals Observations (Mixtures)
More informationIterative face image feature extraction with Generalized Hebbian Algorithm and a Sanger-like BCM rule
Iterative face image feature extraction with Generalized Hebbian Algorithm and a Sanger-like BCM rule Clayton Aldern (Clayton_Aldern@brown.edu) Tyler Benster (Tyler_Benster@brown.edu) Carl Olsson (Carl_Olsson@brown.edu)
More informationMatching the dimensionality of maps with that of the data
Matching the dimensionality of maps with that of the data COLIN FYFE Applied Computational Intelligence Research Unit, The University of Paisley, Paisley, PA 2BE SCOTLAND. Abstract Topographic maps are
More informationPrincipal Component Analysis CS498
Principal Component Analysis CS498 Today s lecture Adaptive Feature Extraction Principal Component Analysis How, why, when, which A dual goal Find a good representation The features part Reduce redundancy
More informationADAPTIVE LATERAL INHIBITION FOR NON-NEGATIVE ICA. Mark Plumbley
Submitteed to the International Conference on Independent Component Analysis and Blind Signal Separation (ICA2) ADAPTIVE LATERAL INHIBITION FOR NON-NEGATIVE ICA Mark Plumbley Audio & Music Lab Department
More informationL11: Pattern recognition principles
L11: Pattern recognition principles Bayesian decision theory Statistical classifiers Dimensionality reduction Clustering This lecture is partly based on [Huang, Acero and Hon, 2001, ch. 4] Introduction
More informationSimple Techniques for Improving SGD. CS6787 Lecture 2 Fall 2017
Simple Techniques for Improving SGD CS6787 Lecture 2 Fall 2017 Step Sizes and Convergence Where we left off Stochastic gradient descent x t+1 = x t rf(x t ; yĩt ) Much faster per iteration than gradient
More informationIndependent Component Analysis. Contents
Contents Preface xvii 1 Introduction 1 1.1 Linear representation of multivariate data 1 1.1.1 The general statistical setting 1 1.1.2 Dimension reduction methods 2 1.1.3 Independence as a guiding principle
More informationA. The Hopfield Network. III. Recurrent Neural Networks. Typical Artificial Neuron. Typical Artificial Neuron. Hopfield Network.
Part 3A: Hopfield Network III. Recurrent Neural Networks A. The Hopfield Network 1 2 Typical Artificial Neuron Typical Artificial Neuron connection weights linear combination activation function inputs
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 254 Part V
More informationManifold Learning for Signal and Visual Processing Lecture 9: Probabilistic PCA (PPCA), Factor Analysis, Mixtures of PPCA
Manifold Learning for Signal and Visual Processing Lecture 9: Probabilistic PCA (PPCA), Factor Analysis, Mixtures of PPCA Radu Horaud INRIA Grenoble Rhone-Alpes, France Radu.Horaud@inria.fr http://perception.inrialpes.fr/
More informationSections 18.6 and 18.7 Artificial Neural Networks
Sections 18.6 and 18.7 Artificial Neural Networks CS4811 - Artificial Intelligence Nilufer Onder Department of Computer Science Michigan Technological University Outline The brain vs. artifical neural
More informationPrincipal Component Analysis (PCA) for Sparse High-Dimensional Data
AB Principal Component Analysis (PCA) for Sparse High-Dimensional Data Tapani Raiko, Alexander Ilin, and Juha Karhunen Helsinki University of Technology, Finland Adaptive Informatics Research Center Principal
More informationFactor Analysis (10/2/13)
STA561: Probabilistic machine learning Factor Analysis (10/2/13) Lecturer: Barbara Engelhardt Scribes: Li Zhu, Fan Li, Ni Guan Factor Analysis Factor analysis is related to the mixture models we have studied.
More information