Linear recurrent networks

Simpler, much more amenable to analytic treatment. E.g., by choosing the identity as activation function, the network equation becomes linear:

τ_r dv/dt = −v + h + M v

where v is the vector of output firing rates, h the input, and M the matrix of recurrent weights.

Firing rates can be negative.
Approximates the dynamics around a fixed point.
The approximation is often reasonable in the presence of weak background activity; in this case, the firing rate is relative to the baseline rate.
Rough, but often useful, approximation: calculate the dynamics of the linear network, then apply the nonlinearity to the solution.
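A minimal numerical sketch of these dynamics (assuming numpy; the time constant, weight matrix, and input below are illustrative choices, not values from the slides):

```python
import numpy as np

# Euler integration of the linear recurrent network
# tau_r dv/dt = -v + h + M v   (all parameter values are illustrative)
tau_r = 10e-3                      # rate time constant (10 ms)
dt = 0.1e-3                        # integration step
M = np.array([[0.0, 0.4],
              [0.4, 0.0]])         # symmetric recurrent weights
h = np.array([1.0, 0.5])           # constant input
v = np.zeros(2)                    # firing rates (relative to baseline)

for _ in range(int(0.2 / dt)):             # simulate 200 ms
    v += dt / tau_r * (-v + h + M @ v)     # Euler step

print("v(200 ms)  ≈", v)
print("fixed point:", np.linalg.solve(np.eye(2) - M, h))
```

Since both eigenvalues of this M lie below one, the simulated rates converge to the fixed point printed in the last line.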

Solution for symmetric M and time-independent input h

Idea: express the firing-rate vector v in terms of the eigenvectors e_µ of M and solve for the time-dependent coefficients c_µ(t):

v(t) = Σ_{µ=1}^{N} c_µ(t) e_µ

For a real symmetric matrix this is always possible. The eigenvectors satisfy

M e_µ = λ_µ e_µ

They are orthogonal and can be normalized to unit length: e_µ · e_ν = δ_µν. The eigenvalues λ_µ are real for real symmetric matrices.

Substituting into the network equation yields

τ_r Σ_µ (dc_µ/dt) e_µ = −Σ_µ c_µ(t) e_µ + M Σ_µ c_µ(t) e_µ + h = Σ_µ (λ_µ − 1) c_µ(t) e_µ + h

Taking the dot product of each side with e_ν yields

τ_r dc_ν/dt = −(1 − λ_ν) c_ν(t) + h · e_ν

This involves only the single coefficient c_ν, i.e. the different components are decoupled.

The solution of τ_r dc_ν/dt = −(1 − λ_ν) c_ν(t) + h · e_ν with initial condition c_ν(0), for time-independent input h, is

c_ν(t) = (h · e_ν)/(1 − λ_ν) · (1 − exp(−t (1 − λ_ν)/τ_r)) + c_ν(0) exp(−t (1 − λ_ν)/τ_r)

The full solution is then obtained from v(t) = Σ_ν c_ν(t) e_ν. This equation for c_ν(t) has a number of important characteristics, discussed in the following.
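A sketch of this closed-form solution (same illustrative parameters as the simulation above, assuming numpy):

```python
import numpy as np

# Closed-form solution via the eigendecomposition of a symmetric M:
# c_nu(t) = (h.e_nu)/(1-lam_nu) * (1 - exp(-t(1-lam_nu)/tau_r))
#           + c_nu(0) * exp(-t(1-lam_nu)/tau_r)
tau_r = 10e-3
M = np.array([[0.0, 0.4], [0.4, 0.0]])
h = np.array([1.0, 0.5])
lam, E = np.linalg.eigh(M)      # columns of E: orthonormal eigenvectors
c0 = E.T @ np.zeros(2)          # coefficients of the initial condition v(0)

def v_of_t(t):
    decay = np.exp(-t * (1 - lam) / tau_r)
    c = (E.T @ h) / (1 - lam) * (1 - decay) + c0 * decay
    return E @ c                # v(t) = sum_nu c_nu(t) e_nu

print(v_of_t(0.2))              # matches the Euler simulation above
```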

Discussion

If λ_ν > 1, the exponential functions grow without bound, reflecting a fundamental instability of the network.
If λ_ν < 1, c_ν approaches its steady-state value exponentially, with time constant τ_r/(1 − λ_ν).
Thus, strictly speaking, the system has no memory: the effect of the initial condition decays to zero exponentially.
However, the time constant of this decay can be very large, much larger than the time constant τ_r of an individual neuron.

The steady-state value

c_ν(∞) = (h · e_ν)/(1 − λ_ν)

is proportional to h · e_ν, the projection of the input vector onto the corresponding eigenvector. The steady-state firing rate is

v(∞) = Σ_ν (h · e_ν)/(1 − λ_ν) · e_ν

Selective amplification: if one eigenvalue λ_1 is close to one and all others are significantly different from one,

v(∞) ≈ (h · e_1)/(1 − λ_1) · e_1

More generally: projection onto a k-dimensional subspace, in the case of k degenerate eigenvalues close to one.

Note: the steady-state solution of a recurrent network with time-independent input is also the solution of a feedforward network. To see this, consider the fixed point with steady-state output v_∞ (for λ_ν < 1 and constant input h):

0 = −v_∞ + h + M v_∞

This can be rewritten as a feedforward network:

v_∞ = W h with W = (I − M)^(−1)
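A short check of this equivalence (same illustrative M and h as above):

```python
import numpy as np

# The steady state of the recurrent network equals the output of a
# feedforward network with effective weights W = (I - M)^(-1).
M = np.array([[0.0, 0.4], [0.4, 0.0]])
h = np.array([1.0, 0.5])
W = np.linalg.inv(np.eye(2) - M)   # effective feedforward weights
print(W @ h)                       # identical to the simulated fixed point
```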

Neural Decoding

Forward mapping: how will neurons respond to a given stimulus?
Backward mapping: how can you learn something about the stimulus from the neural responses?

Perception as an inverse problem

Consider an experiment where we record from a neuron:
let s be a stimulus parameter (like the orientation of a moving edge),
let r be the response of the neuron (e.g. the spike-count firing rate).

Then we can define:
P[s], the probability of stimulus s being presented
P[r], the probability of response r being recorded
P[r, s], the probability of stimulus s being presented and response r being recorded (joint probability)
P[r|s], the conditional probability of evoking response r given that stimulus s was presented
P[s|r], the conditional probability that stimulus s was presented given that response r was evoked

Bayes' theorem:

P[s|r] = P[r|s] P[s] / P[r]

This is neural decoding! P[s] is the prior; P[s|r] is the posterior probability.
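A toy numerical illustration of this decoding step (assuming numpy; the prior and conditional probabilities are made-up numbers):

```python
import numpy as np

# Bayes' theorem P[s|r] = P[r|s] P[s] / P[r] on a discrete toy problem.
P_s = np.array([0.5, 0.5])                 # prior over two stimuli
P_r_given_s = np.array([[0.7, 0.2, 0.1],   # P[r|s=0] over three response bins
                        [0.1, 0.3, 0.6]])  # P[r|s=1]
r = 2                                      # observed response bin
P_r = P_s @ P_r_given_s                    # marginal P[r] = sum_s P[s] P[r|s]
posterior = P_r_given_s[:, r] * P_s / P_r[r]
print(posterior)                           # P[s|r=2] ≈ [0.14, 0.86]
```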

Formalizing neural variability: elements of information theory

Mutual information: the difference between the total response entropy and the average response entropy on trials that involve presentation of the same stimulus.

Entropy of the responses evoked by repeated presentations of a given stimulus s:

H_s = −Σ_r P[r|s] log₂ P[r|s]

Averaging over all stimuli gives the noise entropy:

H_noise = Σ_s P[s] H_s = −Σ_{s,r} P[s] P[r|s] log₂ P[r|s]

Noise entropy: the entropy associated with the part of the response variability that is not due to changes in the stimulus, but arises from other sources.

Mutual information

The mutual information is the difference between the total response entropy and the average response entropy on trials that involve repeated presentation of the same stimulus:

I = H − H_noise = −Σ_r P[r] log₂ P[r] + Σ_{s,r} P[s] P[r|s] log₂ P[r|s]

With P[r] = Σ_s P[r, s] and P[r, s] = P[s] P[r|s], one gets

I = Σ_{s,r} P[r, s] log₂ ( P[r, s] / (P[r] P[s]) )

which is the mutual information expressed as the KL divergence between the joint distribution P[r, s] and the product of the marginals P[r] P[s].

The mutual information is symmetric with respect to interchange of r and s:

I = Σ_{s,r} P[r, s] log₂ ( P[r, s] / (P[r] P[s]) ) = H[r] − H[r|s] = H[s] − H[s|r]

Here H[s|r] is the average uncertainty about the identity of the stimulus given the response.
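These identities are easy to verify numerically; a sketch with a made-up joint distribution (assuming numpy):

```python
import numpy as np

# Mutual information from a toy joint table P[r, s], computed both as
# total entropy minus noise entropy and in the KL form; the two agree.
P_rs = np.array([[0.35, 0.05],   # rows: responses r, columns: stimuli s
                 [0.10, 0.50]])
P_r = P_rs.sum(axis=1)
P_s = P_rs.sum(axis=0)

H_r = -np.sum(P_r * np.log2(P_r))              # total response entropy
H_noise = -np.sum(P_rs * np.log2(P_rs / P_s))  # -sum P[r,s] log2 P[r|s]
I_kl = np.sum(P_rs * np.log2(P_rs / np.outer(P_r, P_s)))
print(H_r - H_noise, I_kl)                     # identical values
```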

Neural example: a neuron in area MT

Middle temporal (MT) cortex: large receptive fields, sensitive to object motion. Record from a single neuron during the presentation of movement patterns such as the ones below; the animal is trained to decide whether the coherent motion is upwards or downwards.

[Figure] Left: behavioral performance of the animal and of an ideal observer based on a single neuron. Right: histograms (thinned) of the average firing rate for the two stimuli (up/down) at different coherence levels.

Likelihood

Consider a probability distribution P(x|θ) depending on a parameter θ.

Likelihood: L(θ|x) = P(x|θ)

The likelihood of parameter value θ given an observed (fixed) outcome x is equal to the probability of x given the parameter value θ.

Example:
Probability: "Given that I have flipped a fair coin 100 times, what is the probability of it landing heads-up every time?"
Likelihood: "Given that I have flipped a coin 100 times and it has landed heads-up 100 times, what is the likelihood that the coin is fair?"
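The coin example in a few lines (a sketch; the binomial model is the standard choice for repeated coin flips):

```python
from math import comb

# Likelihood of theta = 0.5 ("fair coin") given 100 heads in 100 flips:
# the binomial probability of the observed data under that parameter.
n, k = 100, 100
likelihood_fair = comb(n, k) * 0.5**k * 0.5**(n - k)
print(likelihood_fair)   # 0.5**100 ≈ 7.9e-31: "fair" explains the data poorly
```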

Optimal discrimination

What is the optimal strategy for discriminating between two alternative signals presented against a background of noise? Assume we must base our decision on the observation of a single observable x; x could be, e.g., the firing rate of a neuron when a particular stimulus is present.

If the signal is blue, the values of x are drawn from P(x|blue); if the signal is red, from P(x|red). If we have seen a particular value of x, can we tell which signal was presented?

Intuition: divide the x axis at a critical point x₀: everything to the right is called blue, everything to the left red. How should we choose x₀?

More difficult decision problem: was the blue or the red stimulus present?

[Figure: probability densities of the firing rate under the two stimuli, with the optimal decision threshold marked]

Compare the firing rate to the threshold: if it is larger than 7.5, report blue; otherwise report red.

More difficult decision problem: was the blue or the red stimulus present?

[Figure: overlapping probability densities of the firing rate under the two stimuli, with the optimal decision threshold marked]

Choose the decision boundary based on the likelihood ratio:

L(blue|x) / L(red|x) = p(x|blue) / p(x|red)

A general result

This applies also to multimodal and multivariate distributions.

[Figure: multimodal probability densities of the firing rate under the two stimuli]

Choose the decision boundary based on the likelihood ratio:

L(blue|x) / L(red|x) = p(x|blue) / p(x|red)
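A sketch of the likelihood-ratio rule for two Gaussian response densities (assuming numpy; the means and common width are illustrative, chosen so that the boundary falls at the 7.5 threshold mentioned above):

```python
import numpy as np

# Likelihood-ratio decision between two Gaussian response distributions.
mu_red, mu_blue, sigma = 5.0, 10.0, 2.0

def gauss(x, mu):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def decide(x):
    # report "blue" when the likelihood ratio exceeds one
    return "blue" if gauss(x, mu_blue) / gauss(x, mu_red) >= 1.0 else "red"

print(decide(6.0), decide(9.0))  # red blue; for equal priors and losses the
                                 # boundary lies at (mu_red + mu_blue)/2 = 7.5
```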

Likelihood ratio, loss, and bias

L₊: penalty associated with answering "plus" when the correct answer is "minus".
L₋: penalty associated with answering "minus" when the correct answer is "plus".

Given the firing rate r:
P[−|r]: probability that the correct answer is minus.
P[+|r]: probability that the correct answer is plus.

The average loss expected for a "plus" answer given r is the loss associated with being wrong times the probability of being wrong:

Loss₊ = L₊ P[−|r]

Similarly, the average loss expected for a "minus" answer is

Loss₋ = L₋ P[+|r]

Strategy: cut your losses. Answer "plus" if Loss₊ ≤ Loss₋. Using

P[+|r] = p[r|+] P[+] / p[r] and P[−|r] = p[r|−] P[−] / p[r]

this strategy gives: answer "plus" if

l(r) = p[r|+] / p[r|−] ≥ (L₊ P[−]) / (L₋ P[+])

Interpretation of l(r) ≥ (L₊ P[−]) / (L₋ P[+]):

The likelihood ratio is compared to a threshold. The threshold is composed of two kinds of terms: the penalties and the priors.
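A few lines showing how an asymmetric penalty biases the threshold (plain Python; the penalty and prior values are illustrative):

```python
# Biased rule: answer "+" if l(r) >= (L_plus * P_minus) / (L_minus * P_plus).
L_plus, L_minus = 2.0, 1.0     # wrongly answering "+" costs twice as much
P_plus, P_minus = 0.5, 0.5     # equal priors
threshold = (L_plus * P_minus) / (L_minus * P_plus)   # = 2.0

def answer(likelihood_ratio):
    return "+" if likelihood_ratio >= threshold else "-"

print(answer(1.5), answer(2.5))  # "- +": the asymmetric penalty biases
                                 # the rule against answering "+"
```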

Continuous variables. Example: the cricket cercal system

[Figure. Top: tuning curves of the cercal interneurons as a function of wind direction (0 to 360 degrees). Bottom: decoding error (degrees) as a function of wind direction for the maximum likelihood estimate (left) and the Bayesian estimate (right).]

Optimal general decoding methods

Optimal methods: Bayesian inference, maximum a posteriori (MAP) estimation, maximum likelihood (ML) estimation.

MAP estimation starts with Bayes' rule:

p(s|r) = p(r|s) p(s) / p(r)

This allows us to calculate the probability of each possible stimulus s given the neural response r. (But it requires knowledge of p(r|s), which is (still) difficult to estimate.)

Bayesian decoding: choose the estimate that minimizes the expected loss ∫ ds L(s, s_est) p(s|r).

The MAP estimate of s is the value s* which maximizes p(s|r). If p(s) does not depend on s, then s* maximizes p(r|s). This is called the maximum likelihood (ML) estimate.

Decoding an arbitrary stimulus parameter with a population of independent Poisson neurons

Instructive example: an array of N neurons whose preferred stimulus values s_a are uniformly distributed, with Gaussian tuning curves

f_a(s) = r_max exp( −(s − s_a)² / (2σ_a²) )

The homogeneous Poisson process

During a very short time interval Δt there is a fixed probability of an event (spike) occurring, independent of what happened previously. If r is the rate of the Poisson process, then the probability of finding a spike in a short interval Δt is given by r Δt.

The probability of seeing exactly n spikes in a (long) interval T is given by the Poisson distribution:

P_T[n] = (rT)ⁿ / n! · exp(−rT)

P_T[n] = (rT)ⁿ / n! · exp(−rT)

Properties:
mean: E[n] = rT
variance: E[(n − E[n])²] = rT
Fano factor: variance/mean = 1

The distribution is well approximated by a Gaussian for large rT (see the Gaussian fit in fig. B).
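These count statistics are easy to verify by sampling (assuming numpy; the rate and window are illustrative):

```python
import numpy as np

# Empirical check of the Poisson spike-count statistics.
rng = np.random.default_rng(0)
r, T = 20.0, 0.5                       # 20 Hz rate, 500 ms counting window
counts = rng.poisson(r * T, size=100_000)
print(counts.mean(), counts.var())     # both ≈ rT = 10
print(counts.var() / counts.mean())    # Fano factor ≈ 1
```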

Poisson neurons

The average rate in response to stimulus s is determined by the tuning curve f_a(s). The probability of stimulus s evoking n_a spikes in an interval T is (remember the Poisson distribution above):

P[n_a|s] = (f_a(s) T)^(n_a) / n_a! · exp(−f_a(s) T)

Assume independent neurons:

P[n|s] = Π_{a=1}^{N} (f_a(s) T)^(n_a) / n_a! · exp(−f_a(s) T)

Apply ML: take the logarithm and consider only the terms that depend on s (for an array of uniformly spaced tuning curves, Σ_a f_a(s) is approximately independent of s):

ln P[n|s] = Σ_a n_a ln f_a(s) + ...

[Figure: array of Gaussian tuning curves with preferred values uniformly spaced between −5 and 5]

Find the maximum of the r.h.s. by setting the derivative to zero:

Σ_a n_a f'_a(s*) / f_a(s*) = 0

ML estimate

Reproduced from the previous slide:

Σ_a n_a f'_a(s*) / f_a(s*) = 0

For Gaussian tuning curves we can use f'_a(s) / f_a(s) = −(s − s_a) / σ_a², and thus obtain

s* = ( Σ_a n_a s_a / σ_a² ) / ( Σ_a n_a / σ_a² )

and if all tuning curves have the same width,

s* = Σ_a n_a s_a / Σ_a n_a

Intuitive explanation: the estimate is the firing-rate-weighted average of the preferred values of the encoding neurons.
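A sketch of this decoder on simulated spike counts (assuming numpy; the population size, tuning width, peak rate, and counting window are illustrative):

```python
import numpy as np

# ML decoding for a population of Poisson neurons with equal-width
# Gaussian tuning curves: s* = sum_a n_a s_a / sum_a n_a.
rng = np.random.default_rng(1)
s_pref = np.linspace(-5, 5, 21)        # uniformly spaced preferred values
sigma, r_max, T = 1.0, 50.0, 0.5
s_true = 1.3

rates = r_max * np.exp(-0.5 * ((s_true - s_pref) / sigma) ** 2)
n = rng.poisson(rates * T)             # spike counts of the population
s_ml = np.sum(n * s_pref) / np.sum(n)  # count-weighted preferred values
print(s_ml)                            # ≈ s_true
```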

MAP estimate: include prior information

The prior p(s) is now taken into account:

ln p(s|n) = Σ_a n_a ln f_a(s) + ln p(s) + ...

Setting the derivative to zero,

Σ_a n_a f'_a(s*) / f_a(s*) + p'(s*) / p(s*) = 0

which, for Gaussian tuning curves of width σ_r and a Gaussian prior with mean s_prior and width σ_prior, leads to

s* = ( Σ_a n_a s_a / σ_r² + s_prior / σ_prior² ) / ( Σ_a n_a / σ_r² + 1 / σ_prior² )

[Figure: estimates of the stimulus; solid: constant stimulus distribution, dashed: Gaussian prior with s_prior = −2, σ_prior = 1]
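The closed-form MAP estimate on the same illustrative population as the ML sketch above (assuming numpy):

```python
import numpy as np

# MAP decoding with a Gaussian prior; all parameters are illustrative.
rng = np.random.default_rng(1)
s_pref = np.linspace(-5, 5, 21)
sigma_r, r_max, T = 1.0, 50.0, 0.5
s_prior, sigma_prior = -2.0, 1.0
s_true = 1.3

rates = r_max * np.exp(-0.5 * ((s_true - s_pref) / sigma_r) ** 2)
n = rng.poisson(rates * T)
s_map = (np.sum(n * s_pref) / sigma_r**2 + s_prior / sigma_prior**2) \
        / (np.sum(n) / sigma_r**2 + 1 / sigma_prior**2)
print(s_map)   # pulled slightly from the ML estimate toward s_prior
```

With many recorded spikes the likelihood term dominates and the prior has little effect; with few spikes the estimate is drawn toward s_prior.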