Neural coding

Ecological approach to sensory coding: efficient adaptation to the natural environment

Jean-Pierre Nadal, CNRS & EHESS
Laboratoire de Physique Statistique (LPS, UMR 8550 CNRS - ENS - UPMC - Univ. Paris Diderot), Ecole Normale Supérieure (ENS)
& Centre d'Analyse et de Mathématique Sociales (CAMS, UMR 8557 CNRS - EHESS), Ecole des Hautes Etudes en Sciences Sociales (EHESS)
nadal@lps.ens.fr
Neural coding

Photoreceptor intensities (data) → Retina → Neural representation: activities of ganglion cells
Principal Component Analysis (PCA): representation = projection onto the principal axes

environment → stimulus θ, ρ(.) → network (neural code) → neural representation X
(data → algorithm; signal → filter)
θ = {θ_1, θ_2, ..., θ_N}; X = {X_1, X_2, ..., X_p}
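As a minimal sketch of this PCA step (synthetic data standing in for photoreceptor intensities; all names and sizes here are illustrative, not from the lecture):

```python
import numpy as np

# Illustrative "photoreceptor" data: p = 50 receptors, 1000 stimulus
# samples driven by 3 underlying sources, plus a little sensor noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 3))
mixing = rng.normal(size=(3, 50))
data = latent @ mixing + 0.1 * rng.normal(size=(1000, 50))

data -= data.mean(axis=0)               # center the data
cov = data.T @ data / len(data)         # empirical covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
axes = eigvecs[:, ::-1][:, :3]          # top-3 principal axes
representation = data @ axes            # projection = "ganglion cell" activities
```

The principal axes are the directions of maximal variance; here the top three axes recover almost all of the variance, mirroring the dimensionality reduction performed between photoreceptors and ganglion cells.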
Ecological approach to sensory coding: efficient adaptation to the natural environment

Horace Barlow, 1961. H. B. Barlow, "Possible principles underlying the transformation of sensory messages," Sensory Communication, pp. 217-234, 1961.

Efficient coding hypothesis: sensory processing in the brain should be adapted to natural stimuli — e.g. neurons in the visual (or auditory) system of a given animal should be optimized for coding images (or sounds) representative of those found in the natural environment of that animal.

It has been shown that optimizing a network for coding natural images yields filters which resemble the receptive fields of simple cells in V1. In the auditory domain, optimizing a network for coding natural sounds yields filters which resemble the impulse responses of the cochlear filters found in the inner ear.

Formalization: tools from Information Theory, statistical (Bayesian) inference, parameter estimation.
Neural coding

PCA: maximizes variances; well adapted to Gaussian-like distributions.
More general: "infomax" — maximize Mutual Information[stimuli ; neural representation]

environment → stimulus θ, ρ(.) → network W (neural code) → neural representation X
(data → algorithm; signal → filter)
θ = {θ_1, θ_2, ..., θ_N}; X = {X_1, X_2, ..., X_p}
Outline

Information Theory (Shannon)
- Entropy, Shannon information
- Mutual Information = output entropy - equivocation
  o Capacity
  o Infomax
- Redundancy
  o different types of redundancies
  o min redundancy (H. Barlow, 1961) = Independent Component Analysis (ICA)

Parameter estimation
- Cramer-Rao inequality
- Fisher information
Entropy (Shannon information)

Basic properties — discrete case: p_k ≥ 0, Σ_k p_k = 1 (k = 1, ..., K)
H = - Σ_k p_k ln p_k ≤ ln K
Max entropy: equiprobable distribution, p_k = 1/K, which gives H = ln K.
Binary case: 1 bit of information = 1 binary variable with equiprobable states (fair coin): H = ln 2, so H / ln 2 = 1 (bit).
With the logarithm in base 2: information in bits.
Binary case

Binary random variable taking one value with probability f and the other with probability 1-f:
H = - f ln f - (1-f) ln(1-f)
With logarithms in base 2 (information in bits), log_2(.) = ln(.)/ln 2:
H_2 = - f log_2 f - (1-f) log_2(1-f)
[Plot: H_2 as a function of f on [0, 1].] H_2 ≤ 1; symmetry H_2(f) = H_2(1-f); H_2(f=0) = H_2(f=1) = 0; for f = 1/2, H_2 = 1 bit (the maximum).
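The binary entropy and its properties above can be checked numerically (a small sketch; the function name is ours):

```python
import numpy as np

def h2(f):
    """Binary entropy in bits: H_2(f) = -f log2 f - (1-f) log2(1-f)."""
    f = np.asarray(f, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        # Define 0 log 0 = 0 at the endpoints f = 0 and f = 1.
        return np.where((f > 0) & (f < 1),
                        -f * np.log2(f) - (1 - f) * np.log2(1 - f),
                        0.0)
```

For a fair coin, h2(0.5) returns 1 bit, and the curve is symmetric: h2(f) equals h2(1-f).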
Entropy (Shannon information)

Continuous case: x ∈ X with probability density ρ(x); differential entropy
H = - ∫ ρ(x) ln ρ(x) dx
Entropy (Shannon information)

Basic properties — continuous case; (differential) entropy H = - ∫ ρ(x) ln ρ(x) dx. Max entropy?
- among distributions ρ with support [a, b]: the uniform distribution on [a, b]; ρ = 1/(b-a); H = ln(b-a)
- among distributions ρ with support ]-∞, ∞[ under the variance constraint <x²> = σ²: the Gaussian distribution, ρ(x) = (1/√(2πσ²)) exp(-x²/(2σ²)); H = ½ ln(2πeσ²)
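The Gaussian differential entropy H = ½ ln(2πeσ²) can be verified by Monte Carlo, using H = -E[ln ρ(x)] (an illustrative check; sample size and seed are ours):

```python
import numpy as np

# Monte Carlo check: differential entropy of N(0, sigma^2)
# equals (1/2) ln(2 pi e sigma^2).
rng = np.random.default_rng(1)
sigma = 2.0
x = rng.normal(0.0, sigma, size=200_000)
# ln rho(x) for the Gaussian density, evaluated at the samples
log_rho = -0.5 * np.log(2 * np.pi * sigma**2) - x**2 / (2 * sigma**2)
h_mc = -log_rho.mean()                            # H = -E[ln rho(x)]
h_exact = 0.5 * np.log(2 * np.pi * np.e * sigma**2)
```

With 200 000 samples the Monte Carlo estimate agrees with the closed form to about two decimal places.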
Entropy (Shannon information): simple examples

Uniform distribution on [a, b]: H = ln(b-a)
Gaussian distribution with variance σ²: H = ½ ln(2πeσ²)
Multidimensional Gaussian distribution (dimension N, covariance matrix C): H = ½ ln[(2πe)^N det C]
Mutual information

environment → stimulus θ, ρ(.) → neural representation X = {X_1, X_2, ..., X_p}, with conditional distribution Q(X|θ).

I[θ, X] = [entropy of X] - [entropy of X given θ] (equivocation)
        = - ∫ Q(X) ln Q(X) d^p X + ∫∫ ρ(θ) Q(X|θ) ln Q(X|θ) d^p X d^N θ
        = Information that X carries about θ = Information that θ carries about X
        = H[X] + H[θ] - H[θ, X]
        = ∫∫ P(θ, X) ln [ P(θ, X) / (ρ(θ) Q(X)) ] d^p X d^N θ
        = Kullback-Leibler divergence between the joint distribution and the product distribution

Output distribution (marginal distribution of X): Q(X) = ∫ Q(X|θ) ρ(θ) d^N θ
Joint distribution of X and θ: P(θ, X) = Q(X|θ) ρ(θ)
Mutual Information = difference of entropies

A large number p of objects; object type τ (two types); f = probability to have an object of the first type; box number σ ∈ {1, 2}.
Entropy (Shannon information): H = p [ - f ln f - (1-f) ln(1-f) ]
Maxwell's demon sorts the objects into the two boxes. If no error: H_1 = 0, H_2 = 0.
Information gain = decrease in entropy: I = H - H_1 - H_2 = H
(Classification — Data analysis — Signal processing — Encoding)
Mutual Information = difference of entropies

A large number p of objects; object type τ (two types); f = probability to have an object of the first type; box number σ ∈ {1, 2}.
Entropy (Shannon information): H = p [ - f ln f - (1-f) ln(1-f) ]
With noise and errors, a "drunk" Maxwell's demon: H_1 > 0, H_2 > 0.
Information gain = decrease in entropy: I = H - H_1 - H_2 = mutual information between τ and σ
(Classification — Data analysis — Signal processing — Encoding)
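The drunk demon's information gain per object can be computed directly from a joint table of (τ, σ); here is a small sketch assuming the demon puts each object in the correct box with probability 1-ε (the numbers f = 0.5, ε = 0.1 are illustrative):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats of a probability table."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

f, eps = 0.5, 0.1
# Joint P(tau, sigma): rows = object type, cols = box; the demon
# chooses the correct box with probability 1 - eps.
joint = np.array([[f * (1 - eps),   f * eps],
                  [(1 - f) * eps,   (1 - f) * (1 - eps)]])
# I(tau, sigma) = H(tau) + H(sigma) - H(tau, sigma)
I = entropy(joint.sum(1)) + entropy(joint.sum(0)) - entropy(joint)
```

For f = 1/2 this reduces to I = ln 2 - [-ε ln ε - (1-ε) ln(1-ε)]: the perfect demon (ε = 0) gains the full ln 2 nats per object, while a coin-flipping demon (ε = 1/2) gains nothing.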
Basic properties of the mutual information

For any random variables X and Y:
- I(X,Y) ≥ 0
- I(X,Y) = 0 iff the two random variables are statistically independent
- Mutual info = relative entropy (Kullback-Leibler divergence) between the joint and the factorized distributions
- Case X discrete with K states: I(X,Y) ≤ H(X) ≤ ln K (similarly, if Y discrete with M states, I(X,Y) ≤ H(Y) ≤ ln M)

Data processing theorem: for a Markov chain S → X → Y → Z,
I(S,Z) ≤ I(S,Y) ≤ I(X,Y)
since I(X,Y) = I(S,Y) + I(X,Y|S) ≥ I(S,Y)
Mutual information

environment → stimulus θ, ρ(.) → neural representation X = {X_1, X_2, ..., X_p}, with conditional distribution Q(X|θ).

I[θ, X] ≥ 0 (I = 0 iff θ and X are statistically independent)

Capacity (transmission channel): Q given, C = max_ρ I[θ, X]
Principle for optimal coding, "infomax": ρ given, max_Q I[θ, X]

Redundancy (environment): R = I[X_1, X_2, ..., X_p] ≥ 0
Barlow's principle: min R, with R = Σ_k I[θ, X_k] - I[θ, X] (can be < 0)
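That the redundancy R = Σ_k I[θ, X_k] - I[θ, X] can be negative is worth a concrete check. A standard synergistic example (our construction, not from the lecture): θ is a fair bit, X_1 = U is an independent fair bit, and X_2 = U XOR θ. Each X_k alone carries zero information about θ, yet jointly they determine it:

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def mi(joint):
    """Mutual information (nats) from a 2-D joint probability table."""
    return entropy(joint.sum(1)) + entropy(joint.sum(0)) - entropy(joint)

# P(theta, X1, X2): theta fair bit, X1 = U fair bit, X2 = U XOR theta.
p = np.zeros((2, 2, 2))
for theta in (0, 1):
    for u in (0, 1):
        p[theta, u, u ^ theta] = 0.25

I_joint = mi(p.reshape(2, 4))   # I[theta ; (X1, X2)] = ln 2
I_1 = mi(p.sum(axis=2))         # I[theta ; X1] = 0
I_2 = mi(p.sum(axis=1))         # I[theta ; X2] = 0
R = I_1 + I_2 - I_joint         # redundancy = -ln 2 < 0
```

Negative R signals synergy: the components of the code are individually uninformative but jointly informative, the opposite of the redundancy Barlow's principle seeks to minimize.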
Shannon: Communication theory

Channel; codeword of length n; decoding: message.
M = largest number of codewords that can be decoded with a vanishing fraction of errors.
Capacity: C = lim (1/n) ln M
Memoryless channel: C = max_ρ I(input ; output), where ρ = probability distribution of the input τ.
Stimulus S, output V

Mutual info: I(V,S) = output entropy - equivocation
Equivocation = entropy of the output given the stimulus, averaged over the stimulus distribution.

A useful particular case: additive noise, V = f(S) + noise.
Equivocation = noise entropy (hence independent of the input distribution), so
I(V,S) = output entropy - noise entropy
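For Gaussian signal and noise this decomposition gives a closed form (a sketch under assumed values; since V = S + noise is Gaussian with variance σ_s² + σ_n², both entropies follow from H = ½ ln(2πe·var)):

```python
import numpy as np

def gauss_entropy(var):
    """Differential entropy (nats) of a Gaussian with given variance."""
    return 0.5 * np.log(2 * np.pi * np.e * var)

# Illustrative variances: signal sigma_s^2 = 4, noise sigma_n^2 = 1.
sigma_s2, sigma_n2 = 4.0, 1.0
# I(V,S) = output entropy - noise entropy
I = gauss_entropy(sigma_s2 + sigma_n2) - gauss_entropy(sigma_n2)
```

The 2πe factors cancel, leaving I = ½ ln(1 + σ_s²/σ_n²): the familiar dependence of the transmitted information on the signal-to-noise ratio.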