Neural coding Ecological approach to sensory coding: efficient adaptation to the natural environment

Neural coding. Ecological approach to sensory coding: efficient adaptation to the natural environment. Jean-Pierre Nadal, CNRS & EHESS. Laboratoire de Physique Statistique (LPS, UMR 8550 CNRS - ENS - UPMC - Univ. Paris Diderot), Ecole Normale Supérieure (ENS), & Centre d'Analyse et de Mathématique Sociales (CAMS, UMR 8557 CNRS - EHESS), Ecole des Hautes Etudes en Sciences Sociales (EHESS). nadal@lps.ens.fr

Neural coding. Example: the retina. Photoreceptor intensities (data) -> retina -> neural representation: activities of ganglion cells. Analogy with Principal Component Analysis (PCA): the representation is the projection onto the principal axes. General scheme: environment -> stimulus (signal) θ = { θ_1, θ_2, ..., θ_N }, drawn from a distribution ρ(.) -> network / filter (algorithm, neural code) -> neural representation (data) X = { X_1, X_2, ..., X_p }.

Ecological approach to sensory coding: efficient adaptation to the natural environment. Horace Barlow, 1961. H. B. Barlow, "Possible principles underlying the transformation of sensory messages," Sensory Communication, pp. 217-234, 1961. Efficient coding hypothesis: sensory processing in the brain should be adapted to natural stimuli, e.g. neurons in the visual (or auditory) system of a given animal should be optimized for coding images (or sounds) representative of those found in the natural environment of that animal. It has been shown that optimizing filters for coding natural images yields filters that resemble the receptive fields of simple cells in V1. In the auditory domain, optimizing a network for coding natural sounds yields filters that resemble the impulse responses of the cochlear filters found in the inner ear. Formalization: tools from Information Theory, statistical (Bayesian) inference, parameter estimation.

Neural coding. PCA: maximizes variances; well adapted to Gaussian-like distributions. More general principle: «infomax» = maximize Mutual Information[stimuli ; neural representation]. Same scheme as before: environment -> stimulus (signal) θ = { θ_1, θ_2, ..., θ_N }, distribution ρ(.) -> network / filter W (algorithm, neural code) -> neural representation (data) X = { X_1, X_2, ..., X_p }.
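
As a concrete illustration of the PCA half of this comparison, here is a minimal numerical sketch (not from the slides; the synthetic data and variable names are made up): it projects Gaussian-like data onto the leading principal axes obtained from the eigendecomposition of the covariance matrix.

# Minimal PCA sketch: project data onto the top principal axes.
# Illustrative example, not part of the original slides.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "photoreceptor" data: p-dimensional, correlated Gaussian samples.
p, n_samples = 8, 5000
mixing = rng.normal(size=(p, p))
data = rng.normal(size=(n_samples, p)) @ mixing.T   # rows = samples

# PCA = eigendecomposition of the covariance matrix.
cov = np.cov(data, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)              # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]                   # sort by decreasing variance
principal_axes = eigvecs[:, order[:2]]              # keep the 2 leading axes

# Neural-representation analogue: projection of each stimulus onto the axes.
representation = data @ principal_axes
print("variance captured:", eigvals[order[:2]].sum() / eigvals.sum())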

Information Theory (Shannon)
- Entropy, Shannon information
- Mutual Information = output entropy - equivocation
  o Capacity
  o Infomax
- Redundancy
  o different types of redundancies
  o min redundancy (H. Barlow, 1961) = Independent Component Analysis (ICA)
Parameter estimation
- Cramer-Rao inequality
- Fisher Information

Entropy (Shannon information): basic properties. Discrete case: probabilities p_k with Σ_k p_k = 1 (k = 1, ..., K). H = - Σ_k p_k ln p_k ≤ ln K. Max entropy: the equiprobable distribution, p_k = 1/K, for which H = ln K. Binary case: 1 bit of information = 1 binary variable with equiprobable states (fair coin), H = ln 2, i.e. H / ln 2 = 1 bit. With logarithms in base 2, information is measured in bits.
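
A quick numerical check of these properties (illustrative sketch, not from the slides): the entropy of any discrete distribution over K outcomes never exceeds ln K, and the uniform distribution attains the bound.

# Check: H = -sum_k p_k ln p_k <= ln K, with equality for the uniform distribution.
# Illustrative sketch, not part of the original slides.
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # convention: 0 ln 0 = 0
    return -np.sum(p * np.log(p))     # in nats

K = 4
p_uniform = np.full(K, 1.0 / K)
p_skewed = np.array([0.7, 0.1, 0.1, 0.1])

print(entropy(p_uniform), np.log(K))   # equal: ln 4
print(entropy(p_skewed) <= np.log(K))  # True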

Binary case: a binary random variable taking one value with probability f and the other with probability 1 - f. H = - f ln f - (1 - f) ln(1 - f). With logarithms in base 2 (information in bits): H_2(f) = - f log_2 f - (1 - f) log_2(1 - f), where log_2(x) = ln(x) / ln 2. [Plot of H_2 versus f:] H_2 is symmetric, H_2(f) = H_2(1 - f); it vanishes at the endpoints, H_2(f = 0) = H_2(f = 1) = 0; and it reaches its maximum at f = 1/2, where H_2 = 1 bit.
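
A short numerical check of the stated properties of the binary entropy curve (illustrative, not from the slides):

# Binary entropy H_2(f) = -f log2 f - (1-f) log2(1-f): symmetry, endpoints, maximum.
# Illustrative sketch, not part of the original slides.
import numpy as np

def h2(f):
    f = np.asarray(f, dtype=float)
    out = np.zeros_like(f)
    inside = (f > 0) & (f < 1)        # H_2(0) = H_2(1) = 0 by convention
    p = f[inside]
    out[inside] = -p * np.log2(p) - (1 - p) * np.log2(1 - p)
    return out

f = np.array([0.0, 0.2, 0.5, 0.8, 1.0])
print(h2(f))                          # [0. 0.7219 1. 0.7219 0.]  -> max of 1 bit at f = 1/2
print(np.allclose(h2(f), h2(1 - f)))  # symmetry: True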

Entropy (Shannon information). Continuous case: x ∈ X, with probability density ρ(x). Differential entropy: H = - ∫ ρ(x) ln ρ(x) dx.

Entropy (Shannon information): basic properties. Which distribution maximizes the (differential) entropy H = - ∫ ρ(x) ln ρ(x) dx in the continuous case? Among distributions ρ with support [a, b]: the uniform distribution on [a, b], ρ = 1/(b - a), giving H = ln(b - a). Among distributions ρ with support ]-∞, +∞[ under the variance constraint <x²> = σ²: the Gaussian distribution ρ(x) = exp(-x²/(2σ²)) / √(2πσ²), giving H = ½ ln(2πeσ²).
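
The Gaussian result can be checked numerically: the differential entropy is the expectation of -ln ρ(x), which a Monte Carlo average over samples reproduces (illustrative sketch, not from the slides):

# Monte Carlo check: differential entropy of N(0, sigma^2) equals 0.5*ln(2*pi*e*sigma^2).
# Illustrative sketch, not part of the original slides.
import numpy as np

rng = np.random.default_rng(0)
sigma = 2.0
x = rng.normal(0.0, sigma, size=1_000_000)

log_density = -0.5 * np.log(2 * np.pi * sigma**2) - x**2 / (2 * sigma**2)
h_estimate = -np.mean(log_density)               # H = E[-ln rho(x)]
h_exact = 0.5 * np.log(2 * np.pi * np.e * sigma**2)

print(h_estimate, h_exact)                       # both ~2.112 nats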

Entropy (Shannon information): simple examples. Uniform distribution; Gaussian distribution; multidimensional Gaussian distribution.
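
The formulas themselves did not survive the transcription; the standard differential entropies for the three listed cases are:

\begin{align*}
\text{Uniform on } [a,b]:\quad & H = \ln(b-a) \\
\text{Gaussian with variance } \sigma^2:\quad & H = \tfrac{1}{2}\ln(2\pi e\,\sigma^2) \\
\text{$N$-dimensional Gaussian with covariance matrix } C:\quad & H = \tfrac{1}{2}\ln\!\big((2\pi e)^N \det C\big)
\end{align*}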

Mutual information. Setting: environment -> stimulus θ, distribution ρ(.); neural representation X = { X_1, X_2, ..., X_p }, with conditional distribution Q(X | θ).

I[θ, X] = [entropy of X] - [entropy of X given θ] (equivocation)
        = - ∫ Q(X) ln Q(X) d^p X + ∫∫ ρ(θ) Q(X | θ) ln Q(X | θ) d^p X d^N θ
        = information that X carries about θ
        = information that θ carries about X
        = H(X) + H(θ) - H(θ, X)
        = ∫∫ P(θ, X) ln [ P(θ, X) / (ρ(θ) Q(X)) ] d^p X d^N θ
        = Kullback-Leibler divergence between the joint distribution and the product distribution.

Output distribution (marginal distribution of X): Q(X) = ∫ Q(X | θ) ρ(θ) d^N θ. Joint distribution of X and θ: P(θ, X) = Q(X | θ) ρ(θ).
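
For a small discrete example these identities can be verified directly: compute I as H(X) - H(X | θ) and as the KL divergence between the joint and the product of the marginals (illustrative sketch, not from the slides; the joint table is made up):

# Mutual information for a small discrete (theta, X) pair, computed two ways:
# (1) I = H(X) - H(X | theta)   (2) I = KL(joint || product of marginals).
# Illustrative sketch with a made-up joint table, not part of the original slides.
import numpy as np

joint = np.array([[0.30, 0.10],     # rows: theta, columns: X
                  [0.05, 0.55]])
p_theta = joint.sum(axis=1)
q_x = joint.sum(axis=0)

def H(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# (1) H(X) minus the equivocation H(X | theta)
H_x_given_theta = sum(p_theta[i] * H(joint[i] / p_theta[i]) for i in range(2))
I_1 = H(q_x) - H_x_given_theta

# (2) KL divergence between the joint and the product distribution
product = np.outer(p_theta, q_x)
I_2 = np.sum(joint * np.log(joint / product))

print(I_1, I_2)                     # identical values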

Mutual Information = difference of entropies. A large number p of objects, each of one of two types, τ ∈ {type 1, type 2}; f = probability of an object being of type 1; box number σ ∈ {1, 2}. (Same structure as classification, data analysis, signal processing, encoding.) A Maxwell's demon sorts the objects into the two boxes. If there are no errors, each box is pure: H_1 = 0, H_2 = 0. Entropy (Shannon information) of the initial mixture: H = p [ - f ln f - (1 - f) ln(1 - f) ]. Information gain = decrease in entropy: I = H - H_1 - H_2 = H.

Mutual Information = difference of entropies. Same setting: a large number p of objects of two types, τ ∈ {type 1, type 2}, f = probability of type 1, box number σ ∈ {1, 2}. If the sorting is noisy ("drunk Maxwell's demon", errors), each box remains mixed: H_1 > 0, H_2 > 0. Entropy of the initial mixture: H = p [ - f ln f - (1 - f) ln(1 - f) ]. Information gain = decrease in entropy: I = H - H_1 - H_2 = mutual information between τ and σ.
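
A numerical version of the noisy-demon picture (illustrative sketch, not from the slides; f and the error rate are made up): compute the box entropies H_1, H_2 and check that the per-object information gain (H - H_1 - H_2)/p equals the mutual information I(τ; σ).

# Noisy "Maxwell's demon" sorting two object types into two boxes.
# Checks that (H - H_1 - H_2) / p equals the mutual information I(tau; sigma).
# Illustrative sketch with made-up numbers, not part of the original slides.
import numpy as np

def H(prob):
    prob = np.asarray(prob, dtype=float)
    prob = prob[prob > 0]
    return -np.sum(prob * np.log(prob))

p = 10_000          # number of objects
f = 0.3             # probability of type 1
eps = 0.1           # demon's error rate (object sent to the wrong box)

# Joint distribution P(tau, sigma); rows: object type, columns: box
joint = np.array([[f * (1 - eps),  f * eps],
                  [(1 - f) * eps,  (1 - f) * (1 - eps)]])
p_box = joint.sum(axis=0)

# Extensive entropies as on the slide
H_total = p * H([f, 1 - f])
H_boxes = [p * p_box[j] * H(joint[:, j] / p_box[j]) for j in range(2)]

info_gain_per_object = (H_total - sum(H_boxes)) / p
mutual_info = H([f, 1 - f]) - sum(p_box[j] * H(joint[:, j] / p_box[j]) for j in range(2))
print(info_gain_per_object, mutual_info)   # equal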

Basic properties of the mutual information. For any random variables X and Y: I(X,Y) ≥ 0, and I(X,Y) = 0 iff the two random variables are statistically independent. Mutual info = relative entropy (Kullback-Leibler divergence) between the joint and the factorized distributions. Case X discrete with K values: I(X,Y) ≤ H(X) ≤ ln K (similarly, if Y is discrete with M values, I(X,Y) ≤ H(Y) ≤ ln M). Data processing theorem: for a Markov chain S -> X -> Y -> Z, I(S,Z) ≤ I(S,Y) ≤ I(X,Y). Indeed I(X,Y) = I(S,Y) + I(X,Y | S) ≥ I(S,Y).
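
A numerical check of the data-processing inequality on a small Markov chain of binary variables (illustrative sketch, not from the slides; the channels are made-up binary symmetric channels):

# Data processing inequality check on a Markov chain S -> X -> Y -> Z
# built from binary symmetric channels (made-up flip probabilities).
# Illustrative sketch, not part of the original slides.
import itertools
import numpy as np

def bsc(eps):
    """Transition matrix of a binary symmetric channel with flip probability eps."""
    return np.array([[1 - eps, eps], [eps, 1 - eps]])

p_s = np.array([0.5, 0.5])
T_sx, T_xy, T_yz = bsc(0.1), bsc(0.15), bsc(0.2)

# Full joint distribution P(s, x, y, z) of the Markov chain
joint = np.zeros((2, 2, 2, 2))
for s, x, y, z in itertools.product(range(2), repeat=4):
    joint[s, x, y, z] = p_s[s] * T_sx[s, x] * T_xy[x, y] * T_yz[y, z]

def mutual_info(p_ab):
    """I(A;B) from a 2D joint distribution table."""
    p_a, p_b = p_ab.sum(axis=1), p_ab.sum(axis=0)
    mask = p_ab > 0
    return np.sum(p_ab[mask] * np.log(p_ab[mask] / np.outer(p_a, p_b)[mask]))

I_sz = mutual_info(joint.sum(axis=(1, 2)))   # marginalize over x, y -> P(s, z)
I_sy = mutual_info(joint.sum(axis=(1, 3)))   # marginalize over x, z -> P(s, y)
I_xy = mutual_info(joint.sum(axis=(0, 3)))   # marginalize over s, z -> P(x, y)
print(I_sz <= I_sy <= I_xy)                  # True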

Mutual information. Setting: environment -> stimulus θ, distribution ρ(.); neural representation X = { X_1, X_2, ..., X_p }, conditional distribution Q(X | θ). I[θ, X] ≥ 0 (I = 0 iff θ and X are statistically independent). Capacity (transmission channel): Q given, C = max over ρ of I[θ, X]. Principle for optimal coding, Infomax: ρ given (the environment), maximize I[θ, X] over Q. Redundancy: R = I[X_1, X_2, ..., X_p] ≥ 0; alternatively R = Σ_k I[θ, X_k] - I[θ, X] (can be < 0). Barlow's principle: min R.
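
To make the redundancy expressions concrete, here is a small discrete example (illustrative sketch, not from the slides; the binary stimulus and two-unit code are made up): a binary θ encoded by two conditionally independent noisy binary units X_1, X_2.

# Redundancy of a two-unit code for a binary stimulus theta:
# R = I(theta;X1) + I(theta;X2) - I(theta;(X1,X2)), and the multi-information I(X1;X2).
# Illustrative sketch with made-up numbers, not part of the original slides.
import itertools
import numpy as np

def mutual_info(p_ab):
    p_a, p_b = p_ab.sum(axis=1), p_ab.sum(axis=0)
    mask = p_ab > 0
    return np.sum(p_ab[mask] * np.log(p_ab[mask] / np.outer(p_a, p_b)[mask]))

p_theta = np.array([0.5, 0.5])
eps = 0.1                                  # each unit reports theta with flip probability eps

# Joint P(theta, x1, x2): units conditionally independent given theta
joint = np.zeros((2, 2, 2))
for t, x1, x2 in itertools.product(range(2), repeat=3):
    p1 = 1 - eps if x1 == t else eps
    p2 = 1 - eps if x2 == t else eps
    joint[t, x1, x2] = p_theta[t] * p1 * p2

I_t_x1 = mutual_info(joint.sum(axis=2))               # P(theta, x1)
I_t_x2 = mutual_info(joint.sum(axis=1))               # P(theta, x2)
I_t_x = mutual_info(joint.reshape(2, 4))              # P(theta, (x1, x2))
I_x1_x2 = mutual_info(joint.sum(axis=0))              # P(x1, x2)

R_sum = I_t_x1 + I_t_x2 - I_t_x
print(R_sum, I_x1_x2)   # both > 0 for this redundant code; R_sum can be < 0 in general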

Shannon: Communication theory. A message is encoded as a codeword of fixed length, sent through the channel, and decoded. The key quantity is the largest number of codewords that can be decoded with a vanishing fraction of errors. Capacity: C = max over the input distribution ρ of the mutual information between input and output. Memoryless channel: ρ = probability distribution of the input symbol τ.
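
The quantitative statement behind this slide is the noisy-channel coding theorem; in standard notation (N for the codeword length, M for the number of codewords, both assumed here since the slide's own symbols were lost in transcription):

\[
C = \max_{\rho} \; I[\tau ;\, \text{output}], \qquad
M_{\max}(N) \;\sim\; e^{N C} \quad (N \to \infty),
\]

i.e. reliable communication (error probability going to zero as N grows) is possible at any rate below C and impossible above C.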

Stimulus S, output V. Mutual info: I(V,S) = output entropy - equivocation, where the equivocation = entropy of the output given the stimulus, averaged over the stimulus distribution. A useful particular case: additive noise, V = f(S) + noise. The equivocation is then the noise entropy (hence independent of the input distribution), so I(V,S) = output entropy - noise entropy.
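
A standard worked case of this formula (not on the slide, added as an illustration): a linear filter with Gaussian stimulus and Gaussian additive noise.

\begin{align*}
V &= g\,S + \eta, \qquad S \sim \mathcal{N}(0,\sigma_S^2), \quad \eta \sim \mathcal{N}(0,\sigma_\eta^2) \\
I(V,S) &= H(V) - H(\eta)
       = \tfrac{1}{2}\ln\!\big(2\pi e\,(g^2\sigma_S^2 + \sigma_\eta^2)\big) - \tfrac{1}{2}\ln\!\big(2\pi e\,\sigma_\eta^2\big)
       = \tfrac{1}{2}\ln\!\Big(1 + \frac{g^2\sigma_S^2}{\sigma_\eta^2}\Big).
\end{align*}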