Information Theory
Mark van Rossum
School of Informatics, University of Edinburgh
Version: January 24, 2018
Slide 2: Why information theory
Understanding the neural code: encoding and decoding.
So far we imposed coding schemes, such as second-order kernels or LNP models, and possibly lost information in doing so. Instead, use information theory:
- No need to impose an encoding or decoding scheme (non-parametric).
- Particularly important for 1) spike-timing codes and 2) higher areas.
- Estimates how much information is coded in a given signal.
Caveats:
- There may be no easy decoding scheme for the organism (it gives an upper bound only).
- It requires more data, and biases are tricky.
Slide 3: Overview
- Entropy, mutual information
- Entropy maximization for a single neuron
- Maximizing mutual information
- Estimating information
Reading: Dayan and Abbott, ch. 4; Rieke et al.
Slide 4: Definition
The entropy of a quantity is defined as
  H(X) = -\sum_x P(x) \log_2 P(x)
This is not derived, nor fully unique, but it fulfills these criteria:
- It is continuous in the P(x).
- If p_i = 1/n, it increases monotonically with n: H = \log_2 n.
- Entropies of parallel independent channels add.
Unit: bits.
Entropy can be thought of as physical entropy, the richness of a distribution [Shannon and Weaver, 1949; Cover and Thomas, 1991; Rieke et al., 1996].
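The listed criteria are easy to check numerically. A minimal sketch (not part of the slides; the helper name `entropy` and the example distributions are my own):

```python
import numpy as np

def entropy(p):
    """Entropy H(X) = -sum_x P(x) log2 P(x), in bits (with the 0 log 0 = 0 convention)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # drop zero-probability outcomes
    return float(-np.sum(p * np.log2(p)))

# Uniform distribution over n outcomes: H = log2(n)
h_uniform = entropy([0.25] * 4)       # 2.0 bits

# Independent parallel channels add: H(X, Y) = H(X) + H(Y)
px = np.array([0.5, 0.5])
py = np.array([0.9, 0.1])
h_joint = entropy(np.outer(px, py).ravel())
```

Here `np.outer(px, py)` builds the joint distribution of two independent variables, so its entropy equals the sum of the marginal entropies.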
Slide 5: Entropy
Discrete variable:
  H(R) = -\sum_r p(r) \log_2 p(r)
Continuous variable at resolution \Delta r:
  H(R) = -\sum_r p(r)\,\Delta r \log_2(p(r)\,\Delta r) = -\sum_r p(r)\,\Delta r \log_2 p(r) - \log_2 \Delta r
Letting \Delta r \to 0 we have
  \lim_{\Delta r \to 0}\,[H + \log_2 \Delta r] = -\int p(r) \log_2 p(r)\, dr
(also called differential entropy).
Slide 6: Joint, conditional entropy
Joint entropy:
  H(S, R) = -\sum_{r,s} P(s, r) \log_2 P(s, r)
Conditional entropy:
  H(S|R) = \sum_r P(R = r)\, H(S|R = r) = -\sum_r P(r) \sum_s P(s|r) \log_2 P(s|r) = H(S, R) - H(R)
If S and R are independent:
  H(S, R) = H(S) + H(R)
Slide 7: Mutual information
Mutual information:
  I_m(R; S) = \sum_{r,s} p(r, s) \log_2 \frac{p(r, s)}{p(r)\,p(s)} = H(R) - H(R|S) = H(S) - H(S|R)
It measures the reduction in uncertainty of R from knowing S (or vice versa); I_m(R; S) \geq 0.
The continuous version is the difference of two entropies, so the \log_2 \Delta r divergence cancels.
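These identities can be verified numerically via I_m = H(R) + H(S) − H(R, S). A sketch (the helper names are my own, not from the slides):

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def mutual_information(p_rs):
    """I(R;S) = H(R) + H(S) - H(R,S), in bits, from a joint table p_rs[r, s]."""
    p_rs = np.asarray(p_rs, dtype=float)
    p_r = p_rs.sum(axis=1)            # marginal over s
    p_s = p_rs.sum(axis=0)            # marginal over r
    return entropy(p_r) + entropy(p_s) - entropy(p_rs)

# An independent joint distribution carries zero mutual information
p_ind = np.outer([0.3, 0.7], [0.5, 0.5])
i_ind = mutual_information(p_ind)
```

A perfectly correlated binary pair, e.g. `[[0.5, 0], [0, 0.5]]`, gives exactly 1 bit.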
Slide 8: Mutual information
The joint histogram determines the mutual information: given P(r, s), I_m follows.
Slide 9: Mutual information: examples

                      Y1 = non-smoker   Y1 = smoker
Y2 = lung cancer           1/3               0
Y2 = no lung cancer         0               2/3

                      Y1 = non-smoker   Y1 = smoker
Y2 = lung cancer           1/9              2/9
Y2 = no lung cancer        2/9              4/9
Slide 10: Mutual information: examples

                      Y1 = non-smoker   Y1 = smoker
Y2 = lung cancer           1/3               0
Y2 = no lung cancer         0               2/3

                      Y1 = non-smoker   Y1 = smoker
Y2 = lung cancer           1/9              2/9
Y2 = no lung cancer        2/9              4/9

Only for the first joint probability table is I_m > 0 (how much?). In the second, knowledge about Y1 does not inform us about Y2.
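The two tables can be evaluated directly; a sketch (my own code) confirming the claim on the slide:

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mutual_information(joint):
    joint = np.asarray(joint, dtype=float)
    return entropy(joint.sum(1)) + entropy(joint.sum(0)) - entropy(joint)

first  = [[1/3, 0], [0, 2/3]]        # smoking fully determines cancer status
second = [[1/9, 2/9], [2/9, 4/9]]    # rows and columns are independent

i_first = mutual_information(first)    # = H(Y2) ≈ 0.918 bits
i_second = mutual_information(second)  # = 0 bits
```

In the deterministic table I_m equals the full marginal entropy H(Y2) = H(1/3) ≈ 0.918 bits, since knowing Y1 removes all uncertainty about Y2.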
Slide 11: Kullback-Leibler divergence
The KL divergence measures the distance between two probability distributions:
  D_{KL}(P \| Q) = \int P(x) \log_2 \frac{P(x)}{Q(x)}\, dx, \quad \text{or} \quad D_{KL}(P \| Q) = \sum_i P_i \log_2 \frac{P_i}{Q_i}
It is not symmetric, but it can be symmetrized.
  I_m(R; S) = D_{KL}(p(r, s) \| p(r)\,p(s))
It is often used as a probabilistic cost function: D_{KL}(data \| model).
Other probability distances exist (e.g. the earth mover's distance).
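A hedged sketch of the discrete D_KL, illustrating non-negativity and the asymmetry mentioned on the slide (the function name is mine):

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_i P_i log2(P_i / Q_i), in bits."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0                      # 0 log 0 = 0 convention
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

p = np.array([0.5, 0.5])
q = np.array([0.9, 0.1])
d_pq = kl_divergence(p, q)
d_qp = kl_divergence(q, p)            # generally != d_pq: not symmetric
```

A symmetrized variant is simply (d_pq + d_qp) / 2; D_KL(P || P) is zero, matching the "distance" interpretation.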
Slide 12: Mutual information between jointly Gaussian variables
  I(Y_1; Y_2) = \int\!\!\int P(y_1, y_2) \log_2 \frac{P(y_1, y_2)}{P(y_1)\,P(y_2)}\, dy_1\, dy_2 = -\frac{1}{2} \log_2(1 - \rho^2)
where \rho is the (Pearson-r) correlation coefficient.
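A small sketch of this closed-form result (the wrapper function is illustrative, not from the slides):

```python
import numpy as np

def gaussian_mi(rho):
    """I(Y1; Y2) = -1/2 log2(1 - rho^2) for jointly Gaussian variables, in bits."""
    return float(-0.5 * np.log2(1.0 - rho**2))

mi_zero = gaussian_mi(0.0)      # uncorrelated Gaussians are independent: 0 bits
mi_half = gaussian_mi(0.5)      # ≈ 0.208 bits
# The information diverges as |rho| -> 1: one variable then determines the other.
```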
Slide 13: Populations of neurons
Given
  H(\mathbf{R}) = -\int p(\mathbf{r}) \log_2 p(\mathbf{r})\, d\mathbf{r} - N \log_2 \Delta r
and
  H(R_i) = -\int p(r_i) \log_2 p(r_i)\, dr_i - \log_2 \Delta r
we have H(\mathbf{R}) \leq \sum_i H(R_i) (for the proof, consider the KL divergence between p(\mathbf{r}) and \prod_i p(r_i)).
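Subadditivity is easy to illustrate on a small discrete joint table; a sketch with my own example numbers:

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# A correlated discrete joint distribution p(r1, r2)
joint = np.array([[0.40, 0.10],
                  [0.05, 0.45]])

h_joint = entropy(joint)                                          # H(R)
h_marginals = entropy(joint.sum(axis=1)) + entropy(joint.sum(axis=0))
# Subadditivity: H(R) <= sum_i H(R_i), with equality iff the r_i are independent.
```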
Slide 14: Mutual information in populations of neurons
Redundancy can be defined as (compare to the above)
  R = \sum_{i=1}^n I(r_i; s) - I(\mathbf{r}; s)
Some codes have R > 0 (redundant codes), others R < 0 (synergistic codes).
Example of a synergistic code: P(r_1, r_2, s) with P(0,0,1) = P(0,1,0) = P(1,0,0) = P(1,1,1) = 1/4.
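The synergistic example can be verified directly; a sketch (my own code, with the joint table taken from the slide):

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mi(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) from a 2-d joint table."""
    joint = np.asarray(joint, dtype=float)
    return entropy(joint.sum(axis=1)) + entropy(joint.sum(axis=0)) - entropy(joint)

# The four equiprobable triplets (r1, r2, s) from the slide
p = np.zeros((2, 2, 2))
for r1, r2, s in [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)]:
    p[r1, r2, s] = 0.25

i_r1_s = mi(p.sum(axis=1))       # marginalize out r2 -> I(r1; s) = 0
i_r2_s = mi(p.sum(axis=0))       # marginalize out r1 -> I(r2; s) = 0
i_pop = mi(p.reshape(4, 2))      # treat (r1, r2) as one variable -> 1 bit
redundancy = i_r1_s + i_r2_s - i_pop   # negative: synergistic
```

Neither neuron alone carries any information about s, yet the pair determines it completely, so R = 0 + 0 − 1 = −1.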
Slide 15: Entropy maximization for a single neuron
  I_m(R; S) = H(R) - H(R|S)
If the noise entropy H(R|S) is independent of the transformation S \to R, we can maximize the mutual information by maximizing H(R) under the given constraints.
- Possible constraint: the response lies in 0 < r < r_{max}. H(R) is maximal if p(r) = U(0, r_{max}) (U is the uniform distribution).
- If the average firing rate is limited and 0 < r < \infty: the exponential distribution is optimal, p(x) = \frac{1}{\bar{x}} \exp(-x/\bar{x}), with H = \log_2(e\,\bar{x}).
- If the variance is fixed and -\infty < r < \infty: the Gaussian distribution, with H = \frac{1}{2} \log_2(2\pi e \sigma^2) (note the funny units).
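The stated differential entropies can be checked numerically; a sketch with my own grid and parameter choices:

```python
import numpy as np

# Differential entropy of the exponential p(x) = (1/m) exp(-x/m),
# evaluated by a midpoint-rule integral; should match H = log2(e * m) bits.
m = 2.0
dx = 1e-4
x = dx * (np.arange(2_000_000) + 0.5)      # grid on [0, 200]; the tail is negligible
p = np.exp(-x / m) / m
h_exp_numeric = float(-(p * np.log2(p)).sum() * dx)
h_exp_formula = float(np.log2(np.e * m))

# Gaussian with fixed variance: H = 1/2 log2(2 pi e sigma^2)
sigma = 1.5
h_gauss = float(0.5 * np.log2(2 * np.pi * np.e * sigma**2))
```

The "funny units" remark applies here too: differential entropies depend on the units of x, e.g. H of the exponential is negative for small enough mean.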
Slide 16:
Let r = f(s) and s \sim p(s). Which f (assumed monotonic) maximizes H(R) under the maximal firing rate constraint?
Require: P(r) = 1/r_{max}. Since p(s) = p(r)\,\frac{dr}{ds} = \frac{1}{r_{max}} \frac{df}{ds}, we need \frac{df}{ds} = r_{max}\, p(s), i.e.
  f(s) = r_{max} \int_{s_{min}}^{s} p(s')\, ds'
This strategy is known as histogram equalization in signal processing.
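A minimal numerical sketch of histogram equalization (my own code, using the empirical CDF in place of the integral; the stimulus distribution is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stimuli from some skewed distribution p(s)
s = rng.gamma(shape=2.0, scale=1.0, size=100_000)

# f(s) = r_max * empirical CDF of s  (histogram equalization via ranks)
r_max = 100.0
ranks = np.argsort(np.argsort(s))          # rank of each stimulus, 0..N-1
r = r_max * (ranks + 0.5) / len(s)

# The responses are now uniform on (0, r_max):
counts, _ = np.histogram(r, bins=10, range=(0.0, r_max))
```

By construction every response decile receives exactly the same number of stimuli, which is the uniform-output criterion derived above.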
Slide 17: Fly retina
Evidence that the large monopolar cell (LMC) in the fly visual system carries out histogram equalization: the contrast response of the fly LMC (points) matches the environmental statistics (line) [Laughlin, 1981] (but it changes in high-noise conditions).
Slide 18: V1 contrast responses
Similar in V1, but with separate On and Off channels [Brady and Field, 2000].
Slide 19: Information of time-varying signals
Single analog channel with Gaussian signal s and Gaussian noise \eta: r = s + \eta,
  I = \frac{1}{2} \log_2\!\left(1 + \frac{\sigma_s^2}{\sigma_\eta^2}\right) = \frac{1}{2} \log_2(1 + SNR)
For time-dependent signals:
  I = \frac{T}{2} \int \frac{d\omega}{2\pi} \log_2\!\left(1 + \frac{s(\omega)}{n(\omega)}\right)
To maximize information when the variance of the signal is constrained, use all frequency bands such that signal + noise = constant: whitening. This is the water-filling analogy.
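The water-filling allocation can be sketched with a simple bisection on the water level (my own implementation, on a toy discrete set of frequency bands):

```python
import numpy as np

def water_fill(noise, total_power, iters=100):
    """Allocate signal power s_i = max(0, mu - n_i) with sum s_i = total_power,
    so that signal + noise = mu (constant) in every band that is used."""
    noise = np.asarray(noise, dtype=float)
    lo, hi = noise.min(), noise.max() + total_power
    for _ in range(iters):                      # bisect for the water level mu
        mu = 0.5 * (lo + hi)
        if np.maximum(0.0, mu - noise).sum() > total_power:
            hi = mu
        else:
            lo = mu
    return np.maximum(0.0, mu - noise)

noise = np.array([1.0, 2.0, 4.0, 8.0])          # noise power per band
s = water_fill(noise, total_power=5.0)          # -> [3, 2, 0, 0]: level mu = 4
info = 0.5 * np.sum(np.log2(1.0 + s / noise))   # bits per (discrete) channel use
```

The noisiest bands get no signal power at all; the remaining bands are filled up to a common level, which is the signal + noise = constant condition from the slide.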
Slide 20: Information of graded synapses
Light - (photon noise) - photoreceptor - (synaptic noise) - LMC.
At low light levels photon noise dominates and synaptic noise is negligible. Information rate: 1500 bits/s [de Ruyter van Steveninck and Laughlin, 1996].
Slide 21: Spiking neurons: maximal information
Consider a spike train with N = T/\delta t bins [MacKay and McCulloch, 1952], with \delta t the time resolution.
With N_1 = pN spikes, the number of words is \#words = \frac{N!}{N_1!\,(N - N_1)!}.
The entropy is maximal if all words are equally likely:
  H = -\sum_i p_i \log_2 p_i = \log_2 N! - \log_2 N_1! - \log_2(N - N_1)!
Using, for large x, \log x! \approx x(\log x - 1):
  H = -\frac{T}{\delta t}\left[p \log_2 p + (1 - p) \log_2(1 - p)\right]
For low rates p \ll 1, setting the rate \lambda = p/\delta t:
  H \approx T\lambda \log_2\!\left(\frac{e}{\lambda\,\delta t}\right)
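The exact word count, the Stirling approximation, and the low-rate limit can be compared numerically; a sketch with illustrative parameters (my own choice: 10 s at 1 ms resolution, 20 Hz):

```python
import numpy as np
from math import lgamma, log

def binary_word_entropy(T, dt, rate):
    """Exact entropy of equally likely binary spike words: log2 C(N, N1)."""
    N = int(round(T / dt))
    N1 = int(round(rate * T))          # number of spikes
    return (lgamma(N + 1) - lgamma(N1 + 1) - lgamma(N - N1 + 1)) / log(2)

def stirling_entropy(T, dt, rate):
    """H = -(T/dt) [p log2 p + (1-p) log2(1-p)], with p = rate * dt."""
    N = T / dt
    p = rate * dt
    return -N * (p * np.log2(p) + (1 - p) * np.log2(1 - p))

def low_rate_limit(T, dt, rate):
    """H ≈ T * rate * log2(e / (rate * dt)) for p << 1."""
    return T * rate * np.log2(np.e / (rate * dt))

T, dt, rate = 10.0, 0.001, 20.0
h_exact = binary_word_entropy(T, dt, rate)      # ≈ 1409 bits
h_stirling = stirling_entropy(T, dt, rate)      # ≈ 1414 bits
h_low = low_rate_limit(T, dt, rate)             # ≈ 1417 bits
```

All three agree to within a few bits out of ~1400, i.e. roughly 140 bits/s at this resolution and rate.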
Slide 22: Spiking neurons
The calculation is incorrect when there can be multiple spikes per bin. Instead, for large bins the maximal information is obtained for an exponential count distribution:
  P(n) = \frac{1}{Z} \exp\!\left[-n \log\!\left(1 + \frac{1}{\langle n \rangle}\right)\right], \quad Z = 1 + \langle n \rangle
  H = \log_2(1 + \langle n \rangle) + \langle n \rangle \log_2\!\left(1 + \frac{1}{\langle n \rangle}\right) \approx \log_2(1 + \langle n \rangle) + \log_2 e \quad (\text{for large } \langle n \rangle)
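The entropy formula for this exponential (geometric) count distribution can be verified by direct summation; a sketch with an illustrative mean count:

```python
import numpy as np

nbar = 3.0                               # mean spike count <n>
# Max-entropy count distribution for a fixed mean: P(n) = (1/Z) exp[-n log(1 + 1/nbar)]
n = np.arange(0, 2000)                   # truncation; the tail beyond is negligible
p = np.exp(-n * np.log(1.0 + 1.0 / nbar))
p /= p.sum()                             # normalization, Z = 1 + nbar

h_numeric = float(-(p * np.log2(p)).sum())
h_formula = float(np.log2(1 + nbar) + nbar * np.log2(1 + 1 / nbar))
```

For nbar = 3 both give ≈ 3.245 bits; the second term tends to log2(e) ≈ 1.44 bits as nbar grows.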
Slide 23: Spiking neurons: rate code
[Stein, 1967] Measure the rate in a window T, during which the stimulus is constant. A periodic neuron can maximally encode 1 + (f_{max} - f_{min})T stimuli, so
  H \leq \log_2[1 + (f_{max} - f_{min})T]
Note: this grows only as \log(T).
Slide 24:
[Stein, 1967] Similar behaviour for a Poisson neuron: H \propto \log(T).
Slide 25: Spiking neurons: dynamic stimuli
[de Ruyter van Steveninck et al., 1997], but see [Warzecha and Egelhaaf, 1999].
Slide 26: Maximizing information transmission: single output
Single linear neuron with post-synaptic noise: v = \mathbf{w} \cdot \mathbf{u} + \eta, where \eta is an independent noise variable.
  I_m(\mathbf{u}; v) = H(v) - H(v|\mathbf{u})
The second term depends only on p(\eta). To maximize I_m we need to maximize H(v); a sensible constraint is |\mathbf{w}|^2 = 1.
If \mathbf{u} \sim N(0, Q) and \eta \sim N(0, \sigma_\eta^2), then v \sim N(0, \mathbf{w}^T Q \mathbf{w} + \sigma_\eta^2).
Slide 27:
For a Gaussian random variable with variance \sigma^2 we have H = \frac{1}{2} \log_2 2\pi e \sigma^2.
To maximize H(v) we need to maximize \mathbf{w}^T Q \mathbf{w} subject to the constraint |\mathbf{w}|^2 = 1. Thus \mathbf{w} \propto \mathbf{e}_1, the leading eigenvector of Q, so we obtain PCA.
If v is non-Gaussian, this calculation gives an upper bound on H(v) (as the Gaussian distribution is the maximum-entropy distribution for a given mean and covariance).
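A sketch of the eigenvector argument (my own toy covariance Q): the unit-norm weight vector maximizing the output variance \mathbf{w}^T Q \mathbf{w}, and hence H(v), is the leading eigenvector of Q.

```python
import numpy as np

rng = np.random.default_rng(1)

# Input covariance Q of a correlated Gaussian input u ~ N(0, Q)
Q = np.array([[2.0, 1.2],
              [1.2, 1.0]])

# Maximize w^T Q w subject to |w| = 1: take the leading eigenvector of Q
evals, evecs = np.linalg.eigh(Q)        # eigenvalues in ascending order
w_opt = evecs[:, -1]                    # eigenvector of the largest eigenvalue

# Output variance (and hence H(v), up to the fixed noise term) for w_opt
# versus a few random unit vectors:
var_opt = float(w_opt @ Q @ w_opt)
var_rand = []
for _ in range(5):
    w = rng.standard_normal(2)
    w /= np.linalg.norm(w)
    var_rand.append(float(w @ Q @ w))
```

No random unit vector beats the leading eigenvector, whose output variance equals the largest eigenvalue of Q.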
Slide 28: Infomax
Infomax: maximize the information in multiple outputs with respect to the weights [Linsker, 1988]:
  \mathbf{v} = W\mathbf{u} + \boldsymbol{\eta}, \quad H(\mathbf{v}) = \frac{1}{2} \log_2 \det(\langle \mathbf{v}\mathbf{v}^T \rangle) + const
Example: 2 inputs and 2 outputs, with correlated input and w_{k1}^2 + w_{k2}^2 = 1. At low noise: independent coding; at high noise: joint coding.
Slide 29: Estimating information
Information estimation requires a lot of data. Many standard statistical estimators are unbiased (mean, variance, ...), but estimates of both the entropy and the noise entropy are biased. [Panzeri et al., 2007]
Slide 30:
One remedy: fit a 1/N correction to the entropy estimate and extrapolate [Strong et al., 1998].
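A sketch of the idea (my own toy setup, not the Strong et al. data): plug-in entropy estimates are biased downward, and extrapolating a linear fit in 1/N toward 1/N = 0 recovers the true entropy.

```python
import numpy as np

rng = np.random.default_rng(2)

def plugin_entropy(samples, n_symbols):
    """Naive (plug-in) entropy estimate from observed symbol counts, in bits."""
    counts = np.bincount(samples, minlength=n_symbols)
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# True distribution: uniform over 32 symbols, H = 5 bits
n_symbols = 32
h_true = float(np.log2(n_symbols))

# Average plug-in estimates over repeats at several sample sizes N
sizes = np.array([200, 400, 800, 1600])
h_hat = np.array([
    np.mean([plugin_entropy(rng.integers(0, n_symbols, size=N), n_symbols)
             for _ in range(200)])
    for N in sizes
])

# Fit H_hat(N) ≈ H_inf - c / N and extrapolate to N -> infinity
coeffs = np.polyfit(1.0 / sizes, h_hat, deg=1)
h_extrapolated = float(coeffs[1])       # intercept at 1/N = 0
```

The downward bias is roughly (K − 1)/(2N ln 2) bits for K symbols, which is why a linear fit in 1/N works well in this regime.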
Slide 31:
A common technique for I_m: the shuffle correction [Panzeri et al., 2007]. See also: [Paninski, 2003; Nemenman et al., 2002].
Slide 32: Summary
- Information theory provides a non-parametric framework for coding.
- Optimal coding schemes depend strongly on noise assumptions and optimization constraints.
- In data analysis, biases can be substantial.
Slide 33: References I
Brady, N. and Field, D. J. (2000). Local contrast in natural images: normalisation and coding efficiency. Perception, 29(9).
Cover, T. M. and Thomas, J. A. (1991). Elements of Information Theory. Wiley, New York.
de Ruyter van Steveninck, R. R. and Laughlin, S. B. (1996). The rate of information transfer at graded-potential synapses. Nature, 379.
de Ruyter van Steveninck, R. R., Lewen, G. D., Strong, S. P., Koberle, R., and Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275.
Laughlin, S. B. (1981). A simple coding procedure enhances a neuron's information capacity. Zeitschrift für Naturforschung, 36.
Linsker, R. (1988). Self-organization in a perceptual network. Computer, 21(3).
Slide 34: References II
MacKay, D. and McCulloch, W. S. (1952). The limiting information capacity of a neuronal link. Bull Math Biophys, 14.
Nemenman, I., Shafee, F., and Bialek, W. (2002). Entropy and inference, revisited. NIPS, 14.
Paninski, L. (2003). Estimation of entropy and mutual information. Neural Comp., 15.
Panzeri, S., Senatore, R., Montemurro, M. A., and Petersen, R. S. (2007). Correcting for the sampling bias problem in spike train information measures. J Neurophysiol, 98(3).
Rieke, F., Warland, D., de Ruyter van Steveninck, R., and Bialek, W. (1996). Spikes: Exploring the Neural Code. MIT Press, Cambridge.
Shannon, C. E. and Weaver, W. (1949). The Mathematical Theory of Communication. University of Illinois Press, Illinois.
Slide 35: References III
Stein, R. B. (1967). The information capacity of nerve cells using a frequency code. Biophys J, 7.
Strong, S. P., Koberle, R., de Ruyter van Steveninck, R. R., and Bialek, W. (1998). Entropy and information in neural spike trains. Phys Rev Lett, 80.
Warzecha, A. K. and Egelhaaf, M. (1999). Variability in spike trains during constant and dynamic stimulation. Science, 283(5409).
Journal of Vision (2006) 6, 414 428 http://journalofvision.org/6/4/9/ 414 Dimensionality reduction in neural models: An information-theoretic generalization of spike-triggered average and covariance analysis
More informationLearning quadratic receptive fields from neural responses to natural signals: information theoretic and likelihood methods
Learning quadratic receptive fields from neural responses to natural signals: information theoretic and likelihood methods Kanaka Rajan Lewis-Sigler Institute for Integrative Genomics Princeton University
More informationAdaptation in the Neural Code of the Retina
Adaptation in the Neural Code of the Retina Lens Retina Fovea Optic Nerve Optic Nerve Bottleneck Neurons Information Receptors: 108 95% Optic Nerve 106 5% After Polyak 1941 Visual Cortex ~1010 Mean Intensity
More informationCh. 8 Math Preliminaries for Lossy Coding. 8.4 Info Theory Revisited
Ch. 8 Math Preliminaries for Lossy Coding 8.4 Info Theory Revisited 1 Info Theory Goals for Lossy Coding Again just as for the lossless case Info Theory provides: Basis for Algorithms & Bounds on Performance
More information14 : Mean Field Assumption
10-708: Probabilistic Graphical Models 10-708, Spring 2018 14 : Mean Field Assumption Lecturer: Kayhan Batmanghelich Scribes: Yao-Hung Hubert Tsai 1 Inferential Problems Can be categorized into three aspects:
More informationChapter 2 Review of Classical Information Theory
Chapter 2 Review of Classical Information Theory Abstract This chapter presents a review of the classical information theory which plays a crucial role in this thesis. We introduce the various types of
More informationThe Bayesian Brain. Robert Jacobs Department of Brain & Cognitive Sciences University of Rochester. May 11, 2017
The Bayesian Brain Robert Jacobs Department of Brain & Cognitive Sciences University of Rochester May 11, 2017 Bayesian Brain How do neurons represent the states of the world? How do neurons represent
More informationEntropy and information in neural spike trains: Progress on the sampling problem
PHYSICAL REVIEW E 69, 056111 (2004) Entropy and information in neural spike trains: Progress on the sampling problem Ilya Nemenman, 1, * William Bialek, 2, and Rob de Ruyter van Steveninck 3, 1 avli Institute
More informationCS 630 Basic Probability and Information Theory. Tim Campbell
CS 630 Basic Probability and Information Theory Tim Campbell 21 January 2003 Probability Theory Probability Theory is the study of how best to predict outcomes of events. An experiment (or trial or event)
More informationInformation Theory (Information Theory by J. V. Stone, 2015)
Information Theory (Information Theory by J. V. Stone, 2015) Claude Shannon (1916 2001) Shannon, C. (1948). A mathematical theory of communication. Bell System Technical Journal, 27:379 423. A mathematical
More informationEmpirical Bayes interpretations of random point events
INSTITUTE OF PHYSICS PUBLISHING JOURNAL OF PHYSICS A: MATHEMATICAL AND GENERAL J. Phys. A: Math. Gen. 38 (25) L531 L537 doi:1.188/35-447/38/29/l4 LETTER TO THE EDITOR Empirical Bayes interpretations of
More informationRevision of Lecture 5
Revision of Lecture 5 Information transferring across channels Channel characteristics and binary symmetric channel Average mutual information Average mutual information tells us what happens to information
More informationLecture 1: Introduction, Entropy and ML estimation
0-704: Information Processing and Learning Spring 202 Lecture : Introduction, Entropy and ML estimation Lecturer: Aarti Singh Scribes: Min Xu Disclaimer: These notes have not been subjected to the usual
More informationVariational Information Maximization in Gaussian Channels
Variational Information Maximization in Gaussian Channels Felix V. Agakov School of Informatics, University of Edinburgh, EH1 2QL, UK felixa@inf.ed.ac.uk, http://anc.ed.ac.uk David Barber IDIAP, Rue du
More informationLecture 14 February 28
EE/Stats 376A: Information Theory Winter 07 Lecture 4 February 8 Lecturer: David Tse Scribe: Sagnik M, Vivek B 4 Outline Gaussian channel and capacity Information measures for continuous random variables
More informationWhat is the neural code? Sekuler lab, Brandeis
What is the neural code? Sekuler lab, Brandeis What is the neural code? What is the neural code? Alan Litke, UCSD What is the neural code? What is the neural code? What is the neural code? Encoding: how
More informationNeural Encoding: Firing Rates and Spike Statistics
Neural Encoding: Firing Rates and Spike Statistics Dayan and Abbott (21) Chapter 1 Instructor: Yoonsuck Choe; CPSC 644 Cortical Networks Background: Dirac δ Function Dirac δ function has the following
More information