Information Theory. Mark van Rossum. January 24, School of Informatics, University of Edinburgh 1 / 35

Size: px
Start display at page:

Download "Information Theory. Mark van Rossum. January 24, School of Informatics, University of Edinburgh 1 / 35"

Transcription

1 1 / 35 Information Theory Mark van Rossum School of Informatics, University of Edinburgh January 24, Version: January 24, 2018

2 Why information theory 2 / 35 Understanding the neural code. Encoding and decoding. We imposed coding schemes, such as 2nd-order kernel, or NLP. We possibly lost information in doing so. Instead, use information: Don t need to impose encoding or decoding scheme (non-parametric). In particular important for 1) spike timing codes, 2) higher areas. Estimate how much information is coded in certain signal. Caveats: No easy decoding scheme for organism (upper bound only) Requires more data and biases are tricky

3 3 / 35 Overview Entropy, Mutual Information Entropy Maximization for a Single Neuron Maximizing Mutual Information Estimating information Reading: Dayan and Abbott ch 4, Rieke

4 4 / 35 Definition The entropy of a quantity is defined as H(X) = x P(x) log 2 P(x) This is not derived, nor fully unique, but it fulfills these criteria: Continuous If p i = 1 n, it increases monotonically with n. H = log 2 n. Parallel independent channels add. Unit : bits Entropy can be thought of as physical entropy, richness of distribution [Shannon and Weaver, 1949, Cover and Thomas, 1991, Rieke et al., 1996]

5 Entropy 5 / 35 Discrete variable H(R) = r p(r) log 2 p(r) Continuous variable at resolution r H(R) = r p(r) r log 2 (p(r) r) = r p(r) r log 2 p(r) log 2 r letting r 0 we have lim [H + log 2 r] = r 0 p(r) log 2 p(r)dr (also called differential entropy)

6 Joint, Conditional entropy 6 / 35 Joint entropy: H(S, R) = r,s P(S, R) log 2 P(S, R) Conditional entropy: H(S R) = r P(R = r)h(s R = r) = r P(r) s P(s r) log 2 P(s r) If S, R are independent = H(S, R) H(R) H(S, R) = H(S) + H(R)

7 7 / 35 Mutual information Mutual information: I m (R; S) = r,s p(r, s) log 2 p(r, s) p(r)p(s) = H(R) H(R S) = H(S) H(S R) Measures reduction in uncertainty of R by knowing S (or vice versa) I m (R; S) 0 The continuous version is the difference of two entropies, the r divergence cancels

8 Mutual Information 8 / 35 The joint histogram determines mutual information. Given P(r, s) I m.

9 Mutual Information: Examples 9 / 35 Y non smoker smoker 1 smoker Y 1 non smoker Y 2 lung cancer 1/3 0 Y 2 lung cancer 1/9 2/9 no lung cancer 0 2/3 no lung cancer 2/9 4/9

10 Mutual Information: Examples 10 / 35 Y non smoker smoker 1 smoker Y 1 non smoker Y 2 lung cancer 1/3 0 Y 2 lung cancer 1/9 2/9 no lung cancer 0 2/3 no lung cancer 2/9 4/9 Only for the left joint probability I m > 0 (how much?). On the right, knowledge about Y 1 does not inform us about Y 2.

11 11 / 35 Kullback-Leibler divergence KL-divergence measures distance between two probability distributions D KL (P Q) = P(x) log 2 P(x) Q(x) dx, or D KL(P Q) i P i log 2 P i Q i Not symmetric, but can be symmetrized I m (R; S) = D KL (p(r, s) p(r)p(s)). Often used as probabilistic cost function: D KL (data model). Other probability distances exist (e.g. earth-movers distance)

12 Mutual info between jointly Gaussian variables 12 / 35 I(Y 1 ; Y 2 ) = P(y 1, y 2 ) log 2 P(y 1, y 2 ) P(y 1 )P(y 2 ) dy 1 dy 2 = 1 2 log 2(1 ρ 2 ) ρ is (Pearson-r) correlation coefficient.

13 Populations of Neurons 13 / 35 Given H(R) = p(r) log 2 p(r)dr N log 2 r and H(R i ) = p(r i ) log 2 p(r i )dr log 2 r We have H(R) i H(R i ) (proof, consider KL divergence)

14 Mutual information in populations of Neurons 14 / 35 Reduncancy can be defined as (compare to above) n r R = I(r i ; s) I(r; s). i=1 Some codes have R > 0 (redundant code), others R < 0 (synergistic) Example of synergistic code: P(r 1, r 2, s) with P(0, 0, 1) = P(0, 1, 0) = P(1, 0, 0) = P(1, 1, 1) = 1 4

15 15 / 35 Entropy Maximization for a Single Neuron I m (R; S) = H(R) H(R S) If noise entropy H(R S) is independent of the transformation S R, we can maximize mutual information by maximizing H(R) under given constraints Possible constraint: response r is 0 < r < r max. Maximal H(R) if p(r) U(0, r max ) (U is uniform dist) If average firing rate is limited, and 0 < r < : exponential distribution is optimal p(x) = 1/ xexp( x/ x). H = log 2 e x If variance is fixed and < r < : Gaussian distribution. H = 1 2 log 2(2πeσ 2 ) (note funny units)

16 Let r = f (s) and s p(s). Which f (assumed monotonic) maximizes H(R) using max firing rate constraint? Require: P(r) = 1 r max Thus df /ds = r max p(s) and p(s) = p(r) dr ds = 1 r max f (s) = r max s df ds s min p(s )ds This strategy is known as histogram equalization in signal processing 16 / 35

17 Fly retina Evidence that the large monopolar cell in the fly visual system carries out histogram equalization Contrast response for fly large monopolar cell (points) matches environment statistics (line) [Laughlin, 1981] (but changes in high noise conditions) 17 / 35

18 V1 contrast responses 18 / 35 Similar in V1, but On and Off channels [Brady and Field, 2000]

19 Information of time varying signals 19 / 35 Single analog channel with Gaussian signal s and Gaussian noise η: r = s + η I = 1 2 log 2(1 + σ2 s ση 2 ) = 1 2 log 2(1 + SNR) For time dependent signals I = 1 2 T dω 2π log 2(1 + s(ω) n(ω) ) To maximize information, when variance of the signal is constrained, use all frequency bands such that signal+noise = constant. Whitening. Water filling analog:

20 Information of graded synapses 20 / 35 Light - (photon noise) - photoreceptor - (synaptic noise) - LMC At low light levels photon noise dominates, synaptic noise is negligible. Information rate: 1500 bits/s [de Ruyter van Steveninck and Laughlin, 1996].

21 Spiking neurons: maximal information 21 / 35 Spike train with N = T /δt bins [Mackay and McCullogh, 1952] δt time-resolution. N! pn = N 1 events, #words = N 1!(N N 1 )! Maximal entropy if all words are equally likely. H = p i log 2 p i = log 2 N! log 2 N 1! log 2 (N N 1 )! Use for large x that log x! x(log x 1) H = T δt [p log 2 p + (1 p) log 2 (1 p)] For low rates p 1, setting λ = (δt)p: H = T λ log 2 ( e λδt )

22 Spiking neurons 22 / 35 Calculation incorrect when multiple spikes per bin. Instead, for large bins maximal information for exponential distribution: P(n) = 1 1 Z exp[ n log(1 + n )] H = log 2 (1 + n ) + n log 2 (1 + 1 n ) log 2(1 + n ) + 1

23 23 / 35 Spiking neurons: rate code [Stein, 1967] Measure rate in window T, during which stimulus is constant. Periodic neuron can maximally encode [1 + (f max f min )T ] stimuli H log 2 [1 + (f max f min )T ]. Note, only log(t )

24 [Stein, 1967] Similar behaviour for Poisson : H log(t ) 24 / 35

25 Spiking neurons: dynamic stimuli 25 / 35 [de Ruyter van Steveninck et al., 1997], but see [Warzecha and Egelhaaf, 1999].

26 26 / 35 Maximizing Information Transmission: single output Single linear neuron with post-synaptic noise v = w u + η where η is an independent noise variable I m (u; v) = H(v) H(v u) Second term depends only on p(η) To maximize I m need to maximize H(v); sensible constraint is that w 2 = 1 If u N(0, Q) and η N(0, σ 2 η) then v N(0, w T Qw + σ 2 η)

27 27 / 35 For a Gaussian RV with variance σ 2 we have H = 1 2 log 2πeσ2. To maximize H(v) we need to maximize w T Qw subject to the constraint w 2 = 1 Thus w e 1 so we obtain PCA If v is non-gaussian then this calculation gives an upper bound on H(v) (as the Gaussian distribution is the maximum entropy distribution for a given mean and covariance)

28 Infomax 28 / 35 Infomax: maximize information in multiple outputs wrt weights [Linsker, 1988] v = W u + η H(v) = 1 2 log det( vvt ) Example: 2 inputs and 2 outputs. Input is correlated. w 2 k1 + w 2 k2 = 1. At low noise independent coding, at high noise joint coding.

29 Estimating information 29 / 35 Information estimation requires a lot of data. Most statistical quantities are unbiased (mean, var,...). But both entropy and noise entropy have bias. [Panzeri et al., 2007]

30 Try to fit 1/N correction [Strong et al., 1998] 30 / 35

31 Common technique for I m : shuffle correction [Panzeri et al., 2007] See also: [Paninski, 2003, Nemenman et al., 2002] 31 / 35

32 32 / 35 Summary Information theory provides non parametric framework for coding Optimal coding schemes depend strongly on noise assumptions and optimization constraints In data analysis biases can be substantial

33 References I 33 / 35 Brady, N. and Field, D. J. (2000). Local contrast in natural images: normalisation and coding efficiency. Perception, 29(9): Cover, T. M. and Thomas, J. A. (1991). Elements of information theory. Wiley, New York. de Ruyter van Steveninck, R. R. and Laughlin, S. B. (1996). The rate of information transfer at graded-potential synapses. Nature, 379: de Ruyter van Steveninck, R. R., Lewen, G. D., Strong, S. P., Koberle, R., and Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275: Laughlin, S. B. (1981). A simple coding procedure enhances a neuron s information capacity. Zeitschrift für Naturforschung, 36: Linsker, R. (1988). Self-organization in a perceptual network. Computer, 21(3):

34 References II 34 / 35 Mackay, D. and McCullogh, W. S. (1952). The limiting information capacity of neuronal link. Bull Math Biophys, 14: Nemenman, I., Shafee, F., and Bialek, W. (2002). Entropy and Inference, Revisited. nips, 14. Paninski, L. (2003). Estimation of Entropy and Mutual Information. Neural Comp., 15: Panzeri, S., Senatore, R., Montemurro, M. A., and Petersen, R. S. (2007). Correcting for the sampling bias problem in spike train information measures. J Neurophysiol, 98(3): Rieke, F., Warland, D., de Ruyter van Steveninck, R., and Bialek, W. (1996). Spikes: Exploring the neural code. MIT Press, Cambridge. Shannon, C. E. and Weaver, W. (1949). The mathematical theory of communication. Univeristy of Illinois Press, Illinois.

35 References III 35 / 35 Stein, R. B. (1967). The information capacity of nerve cells using a frequency code. Biophys J, 7: Strong, S. P., Koberle, R., de Ruyter van Steveninck, R. R., and Bialek, W. (1998). Entropy and Information in Neural Spike Trains. Phys Rev Lett, 80: Warzecha, A. K. and Egelhaaf, M. (1999). Variability in spike trains during constant and dynamic stimulation. Science, 283(5409):

Estimation of information-theoretic quantities

Estimation of information-theoretic quantities Estimation of information-theoretic quantities Liam Paninski Gatsby Computational Neuroscience Unit University College London http://www.gatsby.ucl.ac.uk/ liam liam@gatsby.ucl.ac.uk November 16, 2004 Some

More information

Information Theory. Coding and Information Theory. Information Theory Textbooks. Entropy

Information Theory. Coding and Information Theory. Information Theory Textbooks. Entropy Coding and Information Theory Chris Williams, School of Informatics, University of Edinburgh Overview What is information theory? Entropy Coding Information Theory Shannon (1948): Information theory is

More information

encoding and estimation bottleneck and limits to visual fidelity

encoding and estimation bottleneck and limits to visual fidelity Retina Light Optic Nerve photoreceptors encoding and estimation bottleneck and limits to visual fidelity interneurons ganglion cells light The Neural Coding Problem s(t) {t i } Central goals for today:

More information

Neural coding Ecological approach to sensory coding: efficient adaptation to the natural environment

Neural coding Ecological approach to sensory coding: efficient adaptation to the natural environment Neural coding Ecological approach to sensory coding: efficient adaptation to the natural environment Jean-Pierre Nadal CNRS & EHESS Laboratoire de Physique Statistique (LPS, UMR 8550 CNRS - ENS UPMC Univ.

More information

Membrane equation. VCl. dv dt + V = V Na G Na + V K G K + V Cl G Cl. G total. C m. G total = G Na + G K + G Cl

Membrane equation. VCl. dv dt + V = V Na G Na + V K G K + V Cl G Cl. G total. C m. G total = G Na + G K + G Cl Spiking neurons Membrane equation V GNa GK GCl Cm VNa VK VCl dv dt + V = V Na G Na + V K G K + V Cl G Cl G total G total = G Na + G K + G Cl = C m G total Membrane with synaptic inputs V Gleak GNa GK

More information

arxiv:cond-mat/ v2 27 Jun 1997

arxiv:cond-mat/ v2 27 Jun 1997 Entropy and Information in Neural Spike rains Steven P. Strong, 1 Roland Koberle, 1,2 Rob R. de Ruyter van Steveninck, 1 and William Bialek 1 1 NEC Research Institute, 4 Independence Way, Princeton, New

More information

Encoding or decoding

Encoding or decoding Encoding or decoding Decoding How well can we learn what the stimulus is by looking at the neural responses? We will discuss two approaches: devise and evaluate explicit algorithms for extracting a stimulus

More information

4.2 Entropy lost and information gained

4.2 Entropy lost and information gained 4.2. ENTROPY LOST AND INFORMATION GAINED 101 4.2 Entropy lost and information gained Returning to the conversation between Max and Allan, we assumed that Max would receive a complete answer to his question,

More information

Fisher Information Quantifies Task-Specific Performance in the Blowfly Photoreceptor

Fisher Information Quantifies Task-Specific Performance in the Blowfly Photoreceptor Fisher Information Quantifies Task-Specific Performance in the Blowfly Photoreceptor Peng Xu and Pamela Abshire Department of Electrical and Computer Engineering and the Institute for Systems Research

More information

+ + ( + ) = Linear recurrent networks. Simpler, much more amenable to analytic treatment E.g. by choosing

+ + ( + ) = Linear recurrent networks. Simpler, much more amenable to analytic treatment E.g. by choosing Linear recurrent networks Simpler, much more amenable to analytic treatment E.g. by choosing + ( + ) = Firing rates can be negative Approximates dynamics around fixed point Approximation often reasonable

More information

Natural Image Statistics and Neural Representations

Natural Image Statistics and Neural Representations Natural Image Statistics and Neural Representations Michael Lewicki Center for the Neural Basis of Cognition & Department of Computer Science Carnegie Mellon University? 1 Outline 1. Information theory

More information

1/12/2017. Computational neuroscience. Neurotechnology.

1/12/2017. Computational neuroscience. Neurotechnology. Computational neuroscience Neurotechnology https://devblogs.nvidia.com/parallelforall/deep-learning-nutshell-core-concepts/ 1 Neurotechnology http://www.lce.hut.fi/research/cogntech/neurophysiology Recording

More information

Do Neurons Process Information Efficiently?

Do Neurons Process Information Efficiently? Do Neurons Process Information Efficiently? James V Stone, University of Sheffield Claude Shannon, 1916-2001 Nothing in biology makes sense except in the light of evolution. Theodosius Dobzhansky, 1973.

More information

Slide11 Haykin Chapter 10: Information-Theoretic Models

Slide11 Haykin Chapter 10: Information-Theoretic Models Slide11 Haykin Chapter 10: Information-Theoretic Models CPSC 636-600 Instructor: Yoonsuck Choe Spring 2015 ICA section is heavily derived from Aapo Hyvärinen s ICA tutorial: http://www.cis.hut.fi/aapo/papers/ijcnn99_tutorialweb/.

More information

Correlations in Populations: Information-Theoretic Limits

Correlations in Populations: Information-Theoretic Limits Correlations in Populations: Information-Theoretic Limits Don H. Johnson Ilan N. Goodman dhj@rice.edu Department of Electrical & Computer Engineering Rice University, Houston, Texas Population coding Describe

More information

Adaptive contrast gain control and information maximization $

Adaptive contrast gain control and information maximization $ Neurocomputing 65 66 (2005) 6 www.elsevier.com/locate/neucom Adaptive contrast gain control and information maximization $ Yuguo Yu a,, Tai Sing Lee b a Center for the Neural Basis of Cognition, Carnegie

More information

Ch. 8 Math Preliminaries for Lossy Coding. 8.5 Rate-Distortion Theory

Ch. 8 Math Preliminaries for Lossy Coding. 8.5 Rate-Distortion Theory Ch. 8 Math Preliminaries for Lossy Coding 8.5 Rate-Distortion Theory 1 Introduction Theory provide insight into the trade between Rate & Distortion This theory is needed to answer: What do typical R-D

More information

Signal detection theory

Signal detection theory Signal detection theory z p[r -] p[r +] - + Role of priors: Find z by maximizing P[correct] = p[+] b(z) + p[-](1 a(z)) Is there a better test to use than r? z p[r -] p[r +] - + The optimal

More information

Information Theory Primer:

Information Theory Primer: Information Theory Primer: Entropy, KL Divergence, Mutual Information, Jensen s inequality Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro,

More information

Multi-neuron spike trains - distances and information.

Multi-neuron spike trains - distances and information. Multi-neuron spike trains - distances and information. Conor Houghton Computer Science, U Bristol NETT Florence, March 204 Thanks for the invite In 87 Stendhal reportedly was overcome by the cultural richness

More information

Information maximization in a network of linear neurons

Information maximization in a network of linear neurons Information maximization in a network of linear neurons Holger Arnold May 30, 005 1 Introduction It is known since the work of Hubel and Wiesel [3], that many cells in the early visual areas of mammals

More information

Approximations of Shannon Mutual Information for Discrete Variables with Applications to Neural Population Coding

Approximations of Shannon Mutual Information for Discrete Variables with Applications to Neural Population Coding Article Approximations of Shannon utual Information for Discrete Variables with Applications to Neural Population Coding Wentao Huang 1,2, * and Kechen Zhang 2, * 1 Key Laboratory of Cognition and Intelligence

More information

Efficient Coding. Odelia Schwartz 2017

Efficient Coding. Odelia Schwartz 2017 Efficient Coding Odelia Schwartz 2017 1 Levels of modeling Descriptive (what) Mechanistic (how) Interpretive (why) 2 Levels of modeling Fitting a receptive field model to experimental data (e.g., using

More information

Neural Encoding. Mark van Rossum. January School of Informatics, University of Edinburgh 1 / 58

Neural Encoding. Mark van Rossum. January School of Informatics, University of Edinburgh 1 / 58 1 / 58 Neural Encoding Mark van Rossum School of Informatics, University of Edinburgh January 2015 2 / 58 Overview Understanding the neural code Encoding: Prediction of neural response to a given stimulus

More information

Machine Learning Srihari. Information Theory. Sargur N. Srihari

Machine Learning Srihari. Information Theory. Sargur N. Srihari Information Theory Sargur N. Srihari 1 Topics 1. Entropy as an Information Measure 1. Discrete variable definition Relationship to Code Length 2. Continuous Variable Differential Entropy 2. Maximum Entropy

More information

Statistical models for neural encoding, decoding, information estimation, and optimal on-line stimulus design

Statistical models for neural encoding, decoding, information estimation, and optimal on-line stimulus design Statistical models for neural encoding, decoding, information estimation, and optimal on-line stimulus design Liam Paninski Department of Statistics and Center for Theoretical Neuroscience Columbia University

More information

arxiv:physics/ v1 [physics.data-an] 7 Jun 2003

arxiv:physics/ v1 [physics.data-an] 7 Jun 2003 Entropy and information in neural spike trains: Progress on the sampling problem arxiv:physics/0306063v1 [physics.data-an] 7 Jun 2003 Ilya Nemenman, 1 William Bialek, 2 and Rob de Ruyter van Steveninck

More information

CS 591, Lecture 2 Data Analytics: Theory and Applications Boston University

CS 591, Lecture 2 Data Analytics: Theory and Applications Boston University CS 591, Lecture 2 Data Analytics: Theory and Applications Boston University Charalampos E. Tsourakakis January 25rd, 2017 Probability Theory The theory of probability is a system for making better guesses.

More information

How biased are maximum entropy models?

How biased are maximum entropy models? How biased are maximum entropy models? Jakob H. Macke Gatsby Computational Neuroscience Unit University College London, UK jakob@gatsby.ucl.ac.uk Iain Murray School of Informatics University of Edinburgh,

More information

Efficient coding of natural images with a population of noisy Linear-Nonlinear neurons

Efficient coding of natural images with a population of noisy Linear-Nonlinear neurons Efficient coding of natural images with a population of noisy Linear-Nonlinear neurons Yan Karklin and Eero P. Simoncelli NYU Overview Efficient coding is a well-known objective for the evaluation and

More information

3. If a choice is broken down into two successive choices, the original H should be the weighted sum of the individual values of H.

3. If a choice is broken down into two successive choices, the original H should be the weighted sum of the individual values of H. Appendix A Information Theory A.1 Entropy Shannon (Shanon, 1948) developed the concept of entropy to measure the uncertainty of a discrete random variable. Suppose X is a discrete random variable that

More information

Higher Order Statistics

Higher Order Statistics Higher Order Statistics Matthias Hennig Neural Information Processing School of Informatics, University of Edinburgh February 12, 2018 1 0 Based on Mark van Rossum s and Chris Williams s old NIP slides

More information

Statistical models for neural encoding

Statistical models for neural encoding Statistical models for neural encoding Part 1: discrete-time models Liam Paninski Gatsby Computational Neuroscience Unit University College London http://www.gatsby.ucl.ac.uk/ liam liam@gatsby.ucl.ac.uk

More information

Computing and Communications 2. Information Theory -Entropy

Computing and Communications 2. Information Theory -Entropy 1896 1920 1987 2006 Computing and Communications 2. Information Theory -Entropy Ying Cui Department of Electronic Engineering Shanghai Jiao Tong University, China 2017, Autumn 1 Outline Entropy Joint entropy

More information

Variable selection and feature construction using methods related to information theory

Variable selection and feature construction using methods related to information theory Outline Variable selection and feature construction using methods related to information theory Kari 1 1 Intelligent Systems Lab, Motorola, Tempe, AZ IJCNN 2007 Outline Outline 1 Information Theory and

More information

Pacific Symposium on Biocomputing 6: (2001)

Pacific Symposium on Biocomputing 6: (2001) Analyzing sensory systems with the information distortion function Alexander G Dimitrov and John P Miller Center for Computational Biology Montana State University Bozeman, MT 59715-3505 falex,jpmg@nervana.montana.edu

More information

Block 2: Introduction to Information Theory

Block 2: Introduction to Information Theory Block 2: Introduction to Information Theory Francisco J. Escribano April 26, 2015 Francisco J. Escribano Block 2: Introduction to Information Theory April 26, 2015 1 / 51 Table of contents 1 Motivation

More information

Lecture 3: More on regularization. Bayesian vs maximum likelihood learning

Lecture 3: More on regularization. Bayesian vs maximum likelihood learning Lecture 3: More on regularization. Bayesian vs maximum likelihood learning L2 and L1 regularization for linear estimators A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting

More information

Population Coding. Maneesh Sahani Gatsby Computational Neuroscience Unit University College London

Population Coding. Maneesh Sahani Gatsby Computational Neuroscience Unit University College London Population Coding Maneesh Sahani maneesh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit University College London Term 1, Autumn 2010 Coding so far... Time-series for both spikes and stimuli Empirical

More information

Machine learning - HT Maximum Likelihood

Machine learning - HT Maximum Likelihood Machine learning - HT 2016 3. Maximum Likelihood Varun Kanade University of Oxford January 27, 2016 Outline Probabilistic Framework Formulate linear regression in the language of probability Introduce

More information

SPIKE TRIGGERED APPROACHES. Odelia Schwartz Computational Neuroscience Course 2017

SPIKE TRIGGERED APPROACHES. Odelia Schwartz Computational Neuroscience Course 2017 SPIKE TRIGGERED APPROACHES Odelia Schwartz Computational Neuroscience Course 2017 LINEAR NONLINEAR MODELS Linear Nonlinear o Often constrain to some form of Linear, Nonlinear computations, e.g. visual

More information

What can spike train distances tell us about the neural code?

What can spike train distances tell us about the neural code? What can spike train distances tell us about the neural code? Daniel Chicharro a,b,1,, Thomas Kreuz b,c, Ralph G. Andrzejak a a Dept. of Information and Communication Technologies, Universitat Pompeu Fabra,

More information

ECE 4400:693 - Information Theory

ECE 4400:693 - Information Theory ECE 4400:693 - Information Theory Dr. Nghi Tran Lecture 8: Differential Entropy Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 1 / 43 Outline 1 Review: Entropy of discrete RVs 2 Differential

More information

arxiv: v1 [q-bio.nc] 9 Mar 2016

arxiv: v1 [q-bio.nc] 9 Mar 2016 Temporal code versus rate code for binary Information Sources Agnieszka Pregowska a, Janusz Szczepanski a,, Eligiusz Wajnryb a arxiv:1603.02798v1 [q-bio.nc] 9 Mar 2016 a Institute of Fundamental Technological

More information

The IM Algorithm : A variational approach to Information Maximization

The IM Algorithm : A variational approach to Information Maximization The IM Algorithm : A variational approach to Information Maximization David Barber Felix Agakov Institute for Adaptive and Neural Computation : www.anc.ed.ac.uk Edinburgh University, EH1 2QL, U.K. Abstract

More information

!) + log(t) # n i. The last two terms on the right hand side (RHS) are clearly independent of θ and can be

!) + log(t) # n i. The last two terms on the right hand side (RHS) are clearly independent of θ and can be Supplementary Materials General case: computing log likelihood We first describe the general case of computing the log likelihood of a sensory parameter θ that is encoded by the activity of neurons. Each

More information

What do V1 neurons like about natural signals

What do V1 neurons like about natural signals What do V1 neurons like about natural signals Yuguo Yu Richard Romero Tai Sing Lee Center for the Neural Computer Science Computer Science Basis of Cognition Department Department Carnegie Mellon University

More information

Expectation Maximization

Expectation Maximization Expectation Maximization Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr 1 /

More information

Searching for simple models

Searching for simple models Searching for simple models 9th Annual Pinkel Endowed Lecture Institute for Research in Cognitive Science University of Pennsylvania Friday April 7 William Bialek Joseph Henry Laboratories of Physics,

More information

Efficient representation as a design principle for neural coding and computation

Efficient representation as a design principle for neural coding and computation Efficient representation as a design principle for neural coding and computation William Bialek, a Rob R. de Ruyter van Steveninck b and Naftali Tishby c a Joseph Henry Laboratories of Physics, Lewis Sigler

More information

Hands-On Learning Theory Fall 2016, Lecture 3

Hands-On Learning Theory Fall 2016, Lecture 3 Hands-On Learning Theory Fall 016, Lecture 3 Jean Honorio jhonorio@purdue.edu 1 Information Theory First, we provide some information theory background. Definition 3.1 (Entropy). The entropy of a discrete

More information

Efficient Spike-Coding with Multiplicative Adaptation in a Spike Response Model

Efficient Spike-Coding with Multiplicative Adaptation in a Spike Response Model ACCEPTED FOR NIPS: DRAFT VERSION Efficient Spike-Coding with Multiplicative Adaptation in a Spike Response Model Sander M. Bohte CWI, Life Sciences Amsterdam, The Netherlands S.M.Bohte@cwi.nl September

More information

Dimensionality reduction in neural models: an information-theoretic generalization of spiketriggered average and covariance analysis

Dimensionality reduction in neural models: an information-theoretic generalization of spiketriggered average and covariance analysis to appear: Journal of Vision, 26 Dimensionality reduction in neural models: an information-theoretic generalization of spiketriggered average and covariance analysis Jonathan W. Pillow 1 and Eero P. Simoncelli

More information

Introduction to Information Theory. Uncertainty. Entropy. Surprisal. Joint entropy. Conditional entropy. Mutual information.

Introduction to Information Theory. Uncertainty. Entropy. Surprisal. Joint entropy. Conditional entropy. Mutual information. L65 Dept. of Linguistics, Indiana University Fall 205 Information theory answers two fundamental questions in communication theory: What is the ultimate data compression? What is the transmission rate

More information

Principles of Communications

Principles of Communications Principles of Communications Weiyao Lin Shanghai Jiao Tong University Chapter 10: Information Theory Textbook: Chapter 12 Communication Systems Engineering: Ch 6.1, Ch 9.1~ 9. 92 2009/2010 Meixia Tao @

More information

Bayesian Inference Course, WTCN, UCL, March 2013

Bayesian Inference Course, WTCN, UCL, March 2013 Bayesian Course, WTCN, UCL, March 2013 Shannon (1948) asked how much information is received when we observe a specific value of the variable x? If an unlikely event occurs then one would expect the information

More information

Dept. of Linguistics, Indiana University Fall 2015

Dept. of Linguistics, Indiana University Fall 2015 L645 Dept. of Linguistics, Indiana University Fall 2015 1 / 28 Information theory answers two fundamental questions in communication theory: What is the ultimate data compression? What is the transmission

More information

Inferring the Capacity of the Vector Poisson Channel with a Bernoulli Model

Inferring the Capacity of the Vector Poisson Channel with a Bernoulli Model Inferring the with a Bernoulli Model Don H. Johnson and Ilan N. Goodman Electrical & Computer Engineering Department, MS380 Rice University Houston, Texas 77005 1892 {dhj,igoodman}@rice.edu Third Revision

More information

Thanks. WCMC Neurology and Neuroscience. Collaborators

Thanks. WCMC Neurology and Neuroscience. Collaborators Thanks WCMC Neurology and Neuroscience Keith Purpura Ferenc Mechler Anita Schmid Ifije Ohiorhenuan Qin Hu Former students Dmitriy Aronov Danny Reich Michael Repucci Bob DeBellis Collaborators Sheila Nirenberg

More information

Quantitative Biology Lecture 3

Quantitative Biology Lecture 3 23 nd Sep 2015 Quantitative Biology Lecture 3 Gurinder Singh Mickey Atwal Center for Quantitative Biology Summary Covariance, Correlation Confounding variables (Batch Effects) Information Theory Covariance

More information

Neural Coding: Integrate-and-Fire Models of Single and Multi-Neuron Responses

Neural Coding: Integrate-and-Fire Models of Single and Multi-Neuron Responses Neural Coding: Integrate-and-Fire Models of Single and Multi-Neuron Responses Jonathan Pillow HHMI and NYU http://www.cns.nyu.edu/~pillow Oct 5, Course lecture: Computational Modeling of Neuronal Systems

More information

PATTERN RECOGNITION AND MACHINE LEARNING

PATTERN RECOGNITION AND MACHINE LEARNING PATTERN RECOGNITION AND MACHINE LEARNING Chapter 1. Introduction Shuai Huang April 21, 2014 Outline 1 What is Machine Learning? 2 Curve Fitting 3 Probability Theory 4 Model Selection 5 The curse of dimensionality

More information

Design principles for contrast gain control from an information theoretic perspective

Design principles for contrast gain control from an information theoretic perspective Design principles for contrast gain control from an information theoretic perspective Yuguo Yu Brian Potetz Tai Sing Lee Center for the Neural Computer Science Computer Science Basis of Cognition Department

More information

Variational Autoencoders (VAEs)

Variational Autoencoders (VAEs) September 26 & October 3, 2017 Section 1 Preliminaries Kullback-Leibler divergence KL divergence (continuous case) p(x) andq(x) are two density distributions. Then the KL-divergence is defined as Z KL(p

More information

5 Mutual Information and Channel Capacity

5 Mutual Information and Channel Capacity 5 Mutual Information and Channel Capacity In Section 2, we have seen the use of a quantity called entropy to measure the amount of randomness in a random variable. In this section, we introduce several

More information

Efficient coding of natural images with a population of noisy Linear-Nonlinear neurons

Efficient coding of natural images with a population of noisy Linear-Nonlinear neurons To appear in: Neural Information Processing Systems (NIPS), http://nips.cc/ Granada, Spain. December 12-15, 211. Efficient coding of natural images with a population of noisy Linear-Nonlinear neurons Yan

More information

Introduction to Probability and Statistics (Continued)

Introduction to Probability and Statistics (Continued) Introduction to Probability and Statistics (Continued) Prof. Nicholas Zabaras Center for Informatics and Computational Science https://cics.nd.edu/ University of Notre Dame Notre Dame, Indiana, USA Email:

More information

x log x, which is strictly convex, and use Jensen s Inequality:

x log x, which is strictly convex, and use Jensen s Inequality: 2. Information measures: mutual information 2.1 Divergence: main inequality Theorem 2.1 (Information Inequality). D(P Q) 0 ; D(P Q) = 0 iff P = Q Proof. Let ϕ(x) x log x, which is strictly convex, and

More information

Information Theory. David Rosenberg. June 15, New York University. David Rosenberg (New York University) DS-GA 1003 June 15, / 18

Information Theory. David Rosenberg. June 15, New York University. David Rosenberg (New York University) DS-GA 1003 June 15, / 18 Information Theory David Rosenberg New York University June 15, 2015 David Rosenberg (New York University) DS-GA 1003 June 15, 2015 1 / 18 A Measure of Information? Consider a discrete random variable

More information

Synergy, Redundancy, and Independence in Population Codes, Revisited

Synergy, Redundancy, and Independence in Population Codes, Revisited The Journal of Neuroscience, May 25, 2005 25(21):5195 5206 5195 Behavioral/Systems/Cognitive Synergy, Redundancy, and Independence in Population Codes, Revisited Peter E. Latham 1 and Sheila Nirenberg

More information

The Theoretical Channel Capacity of a Single Neuron as Determined by Various Coding Systems*

The Theoretical Channel Capacity of a Single Neuron as Determined by Various Coding Systems* ISFOR~ATION A~D CONTROL 3, 335-350 (1960) The Theoretical Channel Capacity of a Single Neuron as Determined by Various Coding Systems* ANATOL RAPOPORT AND WILLIAM J. HORVATH Mental Health Research Institute,

More information

CSE/NB 528 Final Lecture: All Good Things Must. CSE/NB 528: Final Lecture

CSE/NB 528 Final Lecture: All Good Things Must. CSE/NB 528: Final Lecture CSE/NB 528 Final Lecture: All Good Things Must 1 Course Summary Where have we been? Course Highlights Where do we go from here? Challenges and Open Problems Further Reading 2 What is the neural code? What

More information

An Introduction to Independent Components Analysis (ICA)

An Introduction to Independent Components Analysis (ICA) An Introduction to Independent Components Analysis (ICA) Anish R. Shah, CFA Northfield Information Services Anish@northinfo.com Newport Jun 6, 2008 1 Overview of Talk Review principal components Introduce

More information

Series 7, May 22, 2018 (EM Convergence)

Series 7, May 22, 2018 (EM Convergence) Exercises Introduction to Machine Learning SS 2018 Series 7, May 22, 2018 (EM Convergence) Institute for Machine Learning Dept. of Computer Science, ETH Zürich Prof. Dr. Andreas Krause Web: https://las.inf.ethz.ch/teaching/introml-s18

More information

Exam. Matrikelnummer: Points. Question Bonus. Total. Grade. Information Theory and Signal Reconstruction Summer term 2013

Exam. Matrikelnummer: Points. Question Bonus. Total. Grade. Information Theory and Signal Reconstruction Summer term 2013 Exam Name: Matrikelnummer: Question 1 2 3 4 5 Bonus Points Total Grade 1/6 Question 1 You are traveling to the beautiful country of Markovia. Your travel guide tells you that the weather w i in Markovia

More information

Exercises. Chapter 1. of τ approx that produces the most accurate estimate for this firing pattern.

Exercises. Chapter 1. of τ approx that produces the most accurate estimate for this firing pattern. 1 Exercises Chapter 1 1. Generate spike sequences with a constant firing rate r 0 using a Poisson spike generator. Then, add a refractory period to the model by allowing the firing rate r(t) to depend

More information

The homogeneous Poisson process

The homogeneous Poisson process The homogeneous Poisson process during very short time interval Δt there is a fixed probability of an event (spike) occurring independent of what happened previously if r is the rate of the Poisson process,

More information

Modeling and Characterization of Neural Gain Control. Odelia Schwartz. A dissertation submitted in partial fulfillment

Modeling and Characterization of Neural Gain Control. Odelia Schwartz. A dissertation submitted in partial fulfillment Modeling and Characterization of Neural Gain Control by Odelia Schwartz A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy Center for Neural Science

More information

Lecture 22: Final Review

Lecture 22: Final Review Lecture 22: Final Review Nuts and bolts Fundamental questions and limits Tools Practical algorithms Future topics Dr Yao Xie, ECE587, Information Theory, Duke University Basics Dr Yao Xie, ECE587, Information

More information

ECE598: Information-theoretic methods in high-dimensional statistics Spring 2016

ECE598: Information-theoretic methods in high-dimensional statistics Spring 2016 ECE598: Information-theoretic methods in high-dimensional statistics Spring 06 Lecture : Mutual Information Method Lecturer: Yihong Wu Scribe: Jaeho Lee, Mar, 06 Ed. Mar 9 Quick review: Assouad s lemma

More information

Introduction to Statistical Learning Theory

Introduction to Statistical Learning Theory Introduction to Statistical Learning Theory In the last unit we looked at regularization - adding a w 2 penalty. We add a bias - we prefer classifiers with low norm. How to incorporate more complicated

More information

Noisy channel communication

Noisy channel communication Information Theory http://www.inf.ed.ac.uk/teaching/courses/it/ Week 6 Communication channels and Information Some notes on the noisy channel setup: Iain Murray, 2012 School of Informatics, University

More information

Neural Population Structures and Consequences for Neural Coding

Neural Population Structures and Consequences for Neural Coding Journal of Computational Neuroscience 16, 69 80, 2004 c 2004 Kluwer Academic Publishers. Manufactured in The Netherlands. Neural Population Structures and Consequences for Neural Coding DON H. JOHNSON

More information

Dimensionality reduction in neural models: An information-theoretic generalization of spike-triggered average and covariance analysis

Dimensionality reduction in neural models: An information-theoretic generalization of spike-triggered average and covariance analysis Journal of Vision (2006) 6, 414 428 http://journalofvision.org/6/4/9/ 414 Dimensionality reduction in neural models: An information-theoretic generalization of spike-triggered average and covariance analysis

More information

Learning quadratic receptive fields from neural responses to natural signals: information theoretic and likelihood methods

Learning quadratic receptive fields from neural responses to natural signals: information theoretic and likelihood methods Learning quadratic receptive fields from neural responses to natural signals: information theoretic and likelihood methods Kanaka Rajan Lewis-Sigler Institute for Integrative Genomics Princeton University

More information

Adaptation in the Neural Code of the Retina

Adaptation in the Neural Code of the Retina Adaptation in the Neural Code of the Retina Lens Retina Fovea Optic Nerve Optic Nerve Bottleneck Neurons Information Receptors: 108 95% Optic Nerve 106 5% After Polyak 1941 Visual Cortex ~1010 Mean Intensity

More information

Ch. 8 Math Preliminaries for Lossy Coding. 8.4 Info Theory Revisited

Ch. 8 Math Preliminaries for Lossy Coding. 8.4 Info Theory Revisited Ch. 8 Math Preliminaries for Lossy Coding 8.4 Info Theory Revisited 1 Info Theory Goals for Lossy Coding Again just as for the lossless case Info Theory provides: Basis for Algorithms & Bounds on Performance

More information

14 : Mean Field Assumption

14 : Mean Field Assumption 10-708: Probabilistic Graphical Models 10-708, Spring 2018 14 : Mean Field Assumption Lecturer: Kayhan Batmanghelich Scribes: Yao-Hung Hubert Tsai 1 Inferential Problems Can be categorized into three aspects:

More information

Chapter 2 Review of Classical Information Theory

Chapter 2 Review of Classical Information Theory Chapter 2 Review of Classical Information Theory Abstract This chapter presents a review of the classical information theory which plays a crucial role in this thesis. We introduce the various types of

More information

The Bayesian Brain. Robert Jacobs Department of Brain & Cognitive Sciences University of Rochester. May 11, 2017

The Bayesian Brain. Robert Jacobs Department of Brain & Cognitive Sciences University of Rochester. May 11, 2017 The Bayesian Brain Robert Jacobs Department of Brain & Cognitive Sciences University of Rochester May 11, 2017 Bayesian Brain How do neurons represent the states of the world? How do neurons represent

More information

Entropy and information in neural spike trains: Progress on the sampling problem

Entropy and information in neural spike trains: Progress on the sampling problem PHYSICAL REVIEW E 69, 056111 (2004) Entropy and information in neural spike trains: Progress on the sampling problem Ilya Nemenman, 1, * William Bialek, 2, and Rob de Ruyter van Steveninck 3, 1 avli Institute

More information

CS 630 Basic Probability and Information Theory. Tim Campbell

CS 630 Basic Probability and Information Theory. Tim Campbell CS 630 Basic Probability and Information Theory Tim Campbell 21 January 2003 Probability Theory Probability Theory is the study of how best to predict outcomes of events. An experiment (or trial or event)

More information

Information Theory (Information Theory by J. V. Stone, 2015)

Information Theory (Information Theory by J. V. Stone, 2015) Information Theory (Information Theory by J. V. Stone, 2015) Claude Shannon (1916 2001) Shannon, C. (1948). A mathematical theory of communication. Bell System Technical Journal, 27:379 423. A mathematical

More information

Empirical Bayes interpretations of random point events

Empirical Bayes interpretations of random point events INSTITUTE OF PHYSICS PUBLISHING JOURNAL OF PHYSICS A: MATHEMATICAL AND GENERAL J. Phys. A: Math. Gen. 38 (25) L531 L537 doi:1.188/35-447/38/29/l4 LETTER TO THE EDITOR Empirical Bayes interpretations of

More information

Revision of Lecture 5

Revision of Lecture 5 Revision of Lecture 5 Information transferring across channels Channel characteristics and binary symmetric channel Average mutual information Average mutual information tells us what happens to information

More information

Lecture 1: Introduction, Entropy and ML estimation

Lecture 1: Introduction, Entropy and ML estimation 0-704: Information Processing and Learning Spring 202 Lecture : Introduction, Entropy and ML estimation Lecturer: Aarti Singh Scribes: Min Xu Disclaimer: These notes have not been subjected to the usual

More information

Variational Information Maximization in Gaussian Channels

Variational Information Maximization in Gaussian Channels Variational Information Maximization in Gaussian Channels Felix V. Agakov School of Informatics, University of Edinburgh, EH1 2QL, UK felixa@inf.ed.ac.uk, http://anc.ed.ac.uk David Barber IDIAP, Rue du

More information

Lecture 14 February 28

Lecture 14 February 28 EE/Stats 376A: Information Theory Winter 07 Lecture 4 February 8 Lecturer: David Tse Scribe: Sagnik M, Vivek B 4 Outline Gaussian channel and capacity Information measures for continuous random variables

More information

What is the neural code? Sekuler lab, Brandeis

What is the neural code? Sekuler lab, Brandeis What is the neural code? Sekuler lab, Brandeis What is the neural code? What is the neural code? Alan Litke, UCSD What is the neural code? What is the neural code? What is the neural code? Encoding: how

More information

Neural Encoding: Firing Rates and Spike Statistics

Neural Encoding: Firing Rates and Spike Statistics Neural Encoding: Firing Rates and Spike Statistics Dayan and Abbott (21) Chapter 1 Instructor: Yoonsuck Choe; CPSC 644 Cortical Networks Background: Dirac δ Function Dirac δ function has the following

More information