
Population Coding Maneesh Sahani maneesh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit University College London Term 1, Autumn 2010

Coding so far...
Time-series models for both spikes and stimuli.
Empirically estimate the encoding function(s), given measured stimuli or movements and spikes.

Population codes
High dimensionality (cells × stimulus × time):
usually limited to simple rate codes;
even prosthetic work assumes instantaneous (lagged) coding.
Limited empirical data:
can record 10s to 100s of neurons, while population sizes are more like $10^4$ to $10^6$;
theoretical inferences, based on single-cell and aggregate (fMRI, LFP, optical) measurements.

Common approach
The most common sorts of questions asked of population codes:
Given assumed encoding functions, how well can we (or downstream areas) decode the encoded stimulus value?
What encoding schemes would be optimal, in the sense of allowing decoders to estimate stimulus values as well as possible?
Before considering populations, we need to formulate some ideas about rate coding in the context of single cells.

Rate coding
In the rate coding context, we imagine that the firing rate of a cell, $r$, represents a single (possibly multidimensional) stimulus value $s$ at any one time: $r = f(s)$.
Even if $s$ and $r$ are embedded in time-series, we assume:
1. that coding is instantaneous (with a fixed lag),
2. that $r$ (and therefore $s$) is constant over a short time.
The actual number of spikes $n$ produced in that short interval is then taken to be distributed around $r$, often according to a Poisson distribution.

Tuning curves
The function $f(s)$ is known as a tuning curve. Commonly assumed forms:
Gaussian: $r_0 + r_{\max} \exp\left[-\frac{1}{2\sigma^2}(x - x_{\text{pref}})^2\right]$
Cosine: $r_0 + r_{\max} \cos(\theta - \theta_{\text{pref}})$
Wrapped Gaussian: $r_0 + r_{\max} \sum_n \exp\left[-\frac{1}{2\sigma^2}(\theta - \theta_{\text{pref}} - 2\pi n)^2\right]$
von Mises ("circular Gaussian"): $r_0 + r_{\max} \exp\left[\kappa \cos(\theta - \theta_{\text{pref}})\right]$
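As a concrete illustration of the rate-coding setup, here is a minimal Python sketch (not from the notes; all parameter values are arbitrary illustrative choices) that evaluates Gaussian and von Mises tuning curves and draws Poisson spike counts for a coding window.

```python
# Minimal sketch of rate coding: r = f(s), with spike counts n ~ Poisson(f(s) * dt).
# The tuning-curve forms follow the slide above; r0, rmax, sigma, kappa and dt
# are arbitrary illustrative values.
import numpy as np

rng = np.random.default_rng(0)

def gaussian_tuning(x, x_pref=0.0, r0=2.0, rmax=30.0, sigma=1.0):
    """Gaussian tuning curve: r0 + rmax * exp(-(x - x_pref)^2 / (2 sigma^2))."""
    return r0 + rmax * np.exp(-0.5 * ((x - x_pref) / sigma) ** 2)

def von_mises_tuning(theta, theta_pref=0.0, r0=2.0, rmax=5.0, kappa=2.0):
    """von Mises ('circular Gaussian') tuning: r0 + rmax * exp(kappa * cos(theta - theta_pref))."""
    return r0 + rmax * np.exp(kappa * np.cos(theta - theta_pref))

dt = 0.2                          # coding window (seconds)
s = np.linspace(-3.0, 3.0, 7)     # a few stimulus values
rates = gaussian_tuning(s)        # mean firing rates f(s), in spikes/s
counts = rng.poisson(rates * dt)  # observed spike counts in the window
print(np.c_[s, rates, counts])
```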

Measuring the performance of rate codes: discrete choice
Suppose we want to make a binary choice based on firing rate:
present / absent (signal detection)
up / down
horizontal / vertical
Call one potential stimulus $s_0$, the other $s_1$, with response distributions $P(n \mid s_0)$ and $P(n \mid s_1)$.
(Figure: probability densities $P(n \mid s_0)$ and $P(n \mid s_1)$ plotted against the response $n$.)

ROC curves
(Figure: the response distributions $P(n \mid s_0)$ and $P(n \mid s_1)$ with a sliding decision threshold, and the resulting ROC curve of hit rate against false alarm rate, both running from 0 to 1.)

Summary measures
Area under the ROC curve: given $n_1 \sim P(n \mid s_1)$ and $n_0 \sim P(n \mid s_0)$, this equals $P(n_1 > n_0)$.
Discriminability $d'$: for equal-variance Gaussians, $d' = \frac{\mu_1 - \mu_0}{\sigma}$; for any threshold, $d' = \Phi^{-1}(1 - \mathrm{FA}) - \Phi^{-1}(1 - \mathrm{HR})$, where $\Phi$ is the standard normal cdf. The definition is unclear for non-Gaussian distributions.
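These summary measures can be checked numerically. The sketch below (illustrative only; the Gaussian means and variance are made up) sweeps a threshold over two equal-variance Gaussian response distributions, estimates the area under the ROC curve as $P(n_1 > n_0)$, and recovers $d'$ from a single (FA, HR) point.

```python
# Minimal sketch (illustrative assumptions): hit and false-alarm rates for a sweep
# of thresholds, AUC as P(n1 > n0), and d' for equal-variance Gaussian responses.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
mu0, mu1, sigma = 10.0, 14.0, 3.0
n0 = rng.normal(mu0, sigma, 20_000)   # responses to s0
n1 = rng.normal(mu1, sigma, 20_000)   # responses to s1

thetas = np.linspace(0.0, 25.0, 201)  # candidate decision thresholds
hit_rate = (n1[None, :] > thetas[:, None]).mean(axis=1)
fa_rate = (n0[None, :] > thetas[:, None]).mean(axis=1)

# Area under the ROC curve = P(n1 > n0); for equal-variance Gaussians this
# equals Phi(d' / sqrt(2)).
auc_empirical = (n1[:5000, None] > n0[None, :5000]).mean()
d_prime = (mu1 - mu0) / sigma
auc_gaussian = norm.cdf(d_prime / np.sqrt(2))

# d' recovered from any single (FA, HR) point on the ROC curve
k = 100                               # threshold index, theta = 12.5
d_prime_from_roc = norm.ppf(1 - fa_rate[k]) - norm.ppf(1 - hit_rate[k])

print(f"AUC ~ {auc_empirical:.3f} vs {auc_gaussian:.3f}")
print(f"d'  ~ {d_prime_from_roc:.2f} vs {d_prime:.2f}")
```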

Continuous estimation
Now consider a (one-dimensional) stimulus that takes any real value (or angle):
contrast, orientation, motion direction, movement speed.
Consider a neuron that fires $n$ spikes in response to a stimulus $s$, according to $P(n \mid f(s))$.
Given $n$, we attempt to estimate $s$. How well can we do?

Continuous estimation
It is useful to consider a limit: $N$ measurements $n_i$, all generated by the same stimulus $s^*$. Then the posterior over $s$ is
$$\log P(s \mid \{n_i\}) = \sum_i \log P(n_i \mid s) + \log P(s) - \log Z(\{n_i\})$$
and so, taking $N \to \infty$,
$$\frac{1}{N}\log P(s \mid \{n_i\}) \;\to\; \big\langle \log P(n \mid s)\big\rangle_{n \mid s^*} + 0 - \log Z(s^*)$$
and so
$$P(s \mid \{n_i\}) \;\to\; e^{\,N \langle \log P(n \mid s)\rangle_{n \mid s^*}}/Z = e^{-N\,\mathrm{KL}\left[P(n \mid s^*)\,\|\,P(n \mid s)\right]}/Z$$
(the $s$-independent factor $e^{N\langle \log P(n \mid s^*)\rangle_{n \mid s^*}}$ is absorbed into $Z$).

Continuous estimation
Now, Taylor expand the KL divergence in $s$ around $s^*$:
$$\mathrm{KL}\big[P(n \mid s^*)\,\|\,P(n \mid s)\big] = -\big\langle \log P(n \mid s)\big\rangle_{n \mid s^*} + \big\langle \log P(n \mid s^*)\big\rangle_{n \mid s^*}$$
$$= -\big\langle \log P(n \mid s^*)\big\rangle_{n \mid s^*} - \Big\langle \frac{d}{ds}\log P(n \mid s)\Big|_{s^*}\Big\rangle_{n \mid s^*}(s - s^*) - \frac{1}{2}(s - s^*)^2\Big\langle \frac{d^2}{ds^2}\log P(n \mid s)\Big|_{s^*}\Big\rangle_{n \mid s^*} + \dots + \big\langle \log P(n \mid s^*)\big\rangle_{n \mid s^*}$$
The zeroth-order terms cancel and the first-order term vanishes (the expected score is zero), leaving
$$= -\frac{1}{2}(s - s^*)^2\Big\langle \frac{d^2}{ds^2}\log P(n \mid s)\Big|_{s^*}\Big\rangle_{n \mid s^*} + \dots = \frac{1}{2}(s - s^*)^2\, J(s^*) + \dots$$
So in asymptopia the posterior is $\mathcal{N}\big(s^*, 1/(N J(s^*))\big)$. $J(s^*)$ is called the Fisher information:
$$J(s^*) = -\Big\langle \frac{d^2}{ds^2}\log P(n \mid s)\Big|_{s^*}\Big\rangle_{n \mid s^*} = \Big\langle \Big(\frac{d}{ds}\log P(n \mid s)\Big|_{s^*}\Big)^2\Big\rangle_{n \mid s^*}$$
(You will show that these two forms are identical in the homework.)

Cramér-Rao bound
The Fisher information is important even outside the large-data limit, due to a deeper result due to Cramér and Rao. This states that, for any $N$, any unbiased estimator $\hat{s}(\{n_i\})$ of $s$ will have the property that
$$\Big\langle \big(\hat{s}(\{n_i\}) - s^*\big)^2 \Big\rangle_{\{n_i\} \mid s^*} \;\geq\; \frac{1}{J(s^*)},$$
where $J$ is here the Fisher information of the full data set $\{n_i\}$ ($N$ times the single-observation value). Thus the inverse Fisher information gives a lower bound on the variance of any unbiased estimator. This is called the Cramér-Rao bound. (There is also a version for biased estimators.)
The Fisher information will be our primary tool for quantifying the performance of a population code.
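A quick Monte Carlo check of the Cramér-Rao bound, under assumed toy settings (Poisson counts with mean $f(s) = e^s$, so the single-observation Fisher information is $J(s) = f'(s)^2/f(s) = e^s$, anticipating the Poisson result derived below): the variance of the maximum-likelihood estimate over many simulated data sets should approach $1/(N J(s^*))$.

```python
# Minimal sketch (illustrative, not from the notes): Monte Carlo check of the
# Cramer-Rao bound for N Poisson observations with mean f(s) = exp(s). Here
# J(s) = f'(s)^2 / f(s) = exp(s) per observation, so the bound on the variance
# of an unbiased estimator is 1 / (N * exp(s_true)).
import numpy as np

rng = np.random.default_rng(2)
s_true, N, trials = 1.5, 50, 20_000

def f(s):
    return np.exp(s)

counts = rng.poisson(f(s_true), size=(trials, N))
# ML estimator for this model: s_hat = log(sample mean of the counts)
s_hat = np.log(counts.mean(axis=1))

crb = 1.0 / (N * f(s_true))          # Cramer-Rao lower bound
print(f"empirical var(s_hat) = {s_hat.var():.5f}")
print(f"Cramer-Rao bound     = {crb:.5f}")
```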

Fisher info and tuning curves
$n = r + \text{noise}$; $r = f(s)$.
$$J(s^*) = \Big\langle \Big(\frac{d}{ds}\log P(n \mid s)\Big)^2\Big\rangle_{n \mid s^*} = \Big\langle \Big(\frac{d}{dr}\log P(n \mid r)\Big|_{f(s^*)}\Big)^2\Big\rangle_{n \mid s^*} f'(s^*)^2 = J_{\text{noise}}(r^*)\, f'(s^*)^2, \qquad r^* = f(s^*)$$
(Figure: the tuning curve $f(s)$ and the Fisher information $J(s)$ plotted against $s$; firing rate / Fisher info on the vertical axis.)

Fisher info for Poisson neurons
For Poisson neurons,
$$P(n \mid f(s)) = \frac{e^{-f(s)}\, f(s)^n}{n!}$$
so
$$J_{\text{noise}}\big[f(s^*)\big] = \Big\langle \Big(\frac{d}{df}\log P(n \mid f(s))\Big|_{f(s^*)}\Big)^2\Big\rangle = \Big\langle \Big(\frac{d}{df}\big(-f(s) + n\log f(s) - \log n!\big)\Big|_{f(s^*)}\Big)^2\Big\rangle$$
$$= \Big\langle \big(-1 + n/f(s^*)\big)^2\Big\rangle = \frac{\big\langle (n - f(s^*))^2\big\rangle}{f(s^*)^2} = \frac{f(s^*)}{f(s^*)^2} = \frac{1}{f(s^*)}$$
and
$$J[s^*] = f'(s^*)^2 / f(s^*)$$
[not surprising: $f(s) = \langle n \rangle$]
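A small numerical illustration of $J(s^*) = f'(s^*)^2/f(s^*)$ for a single Poisson cell with a Gaussian tuning curve (parameters are arbitrary; $f(s)$ is treated as the expected count in the coding window): the Fisher information vanishes at the preferred stimulus, where $f'(s) = 0$, and peaks on the flanks of the tuning curve.

```python
# Minimal sketch (illustrative assumptions): J(s) = f'(s)^2 / f(s) for a Poisson
# neuron with a Gaussian tuning curve; J is largest on the flanks, not at the peak.
import numpy as np

r0, rmax, s_pref, sigma = 1.0, 40.0, 0.0, 1.0

def f(s):
    return r0 + rmax * np.exp(-0.5 * ((s - s_pref) / sigma) ** 2)

def fprime(s):
    return -rmax * (s - s_pref) / sigma**2 * np.exp(-0.5 * ((s - s_pref) / sigma) ** 2)

s = np.linspace(-3, 3, 13)
J = fprime(s) ** 2 / f(s)            # Fisher info, with f(s) the expected count
peak = s[np.argmax(J)]
print(f"J is maximal near s = {peak:+.1f} (a flank); J(s_pref) = {J[6]:.3f}")
```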

Cooperative coding
(Figure: firing rate as a function of the stimulus $s$ for different coding schemes: labelled line, scalar coding, and distributed encoding.)

Cooperative coding
All of these are found in biological systems. Issues:
1. redundancy and robustness (not scalar)
2. efficiency (not labelled line)
3. local computation (not scalar or distributed)
4. multiple values (not scalar)

Coding in multiple dimensions
(Figure: Cartesian versus distributed arrangements of tuning-curve centres in the $(s_1, s_2)$ plane.)
Cartesian: efficient; problems with multiple values.
Distributed: can represent multiple values; may require more neurons.

Cricket cercal system
$$r_a(s) = r_a^{\max}\,\big[\cos(\theta - \theta_a)\big]_+ = r_a^{\max}\,\big[\mathbf{c}_a^{\mathsf T}\mathbf{v}\big]_+$$
So, writing $\tilde r_a = r_a / r_a^{\max}$, and using $\mathbf{c}_1^{\mathsf T}\mathbf{c}_2 = 0$, $\mathbf{c}_3 = -\mathbf{c}_1$, $\mathbf{c}_4 = -\mathbf{c}_2$:
$$\begin{pmatrix}\tilde r_1 - \tilde r_3\\ \tilde r_2 - \tilde r_4\end{pmatrix} = \begin{pmatrix}\mathbf{c}_1^{\mathsf T}\\ \mathbf{c}_2^{\mathsf T}\end{pmatrix}\mathbf{v}$$
$$\mathbf{v} = \big(\mathbf{c}_1\;\;\mathbf{c}_2\big)\begin{pmatrix}\tilde r_1 - \tilde r_3\\ \tilde r_2 - \tilde r_4\end{pmatrix} = \tilde r_1\mathbf{c}_1 + \tilde r_3\mathbf{c}_3 + \tilde r_2\mathbf{c}_2 + \tilde r_4\mathbf{c}_4 = \sum_{a=1}^{4} \tilde r_a\,\mathbf{c}_a$$
This is called population vector decoding.
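The algebra above can be exercised directly. The following sketch (illustrative; preferred directions at 45°, 135°, 225°, 315° and an arbitrary $r^{\max}$) implements half-wave-rectified cosine tuning and the population-vector read-out $\sum_a \tilde r_a \mathbf{c}_a$; with noise-free responses the decoded direction matches the true one exactly.

```python
# Minimal sketch (illustrative, not from the notes): population-vector decoding
# with four half-wave-rectified cosine-tuned cells whose preferred directions
# form orthogonal pairs (c3 = -c1, c4 = -c2), as in the cercal system.
import numpy as np

angles = np.deg2rad([45.0, 135.0, 225.0, 315.0])          # preferred directions
C = np.stack([np.cos(angles), np.sin(angles)], axis=1)    # unit vectors c_a
r_max = 40.0

def responses(theta):
    """Half-wave rectified cosine tuning: r_a = r_max * [c_a . v]_+ ."""
    v = np.array([np.cos(theta), np.sin(theta)])
    return r_max * np.clip(C @ v, 0.0, None)

def population_vector(r):
    """Decoded angle from v_hat = sum_a (r_a / r_max) c_a."""
    v = (r / r_max) @ C
    return np.arctan2(v[1], v[0])

theta_true = np.deg2rad(70.0)
theta_hat = population_vector(responses(theta_true))
print(np.rad2deg(theta_true), np.rad2deg(theta_hat))       # identical (noise-free)
```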

Motor cortex (simplified)
Cosine tuning, randomly distributed preferred directions.
In general, population vector decoding works for cosine tuning with Cartesian or dense (tightly spaced) preferred directions.
But: is it optimal? does it generalise (e.g. to Gaussian tuning curves)? how accurate is it?

Bayesian decoding
Take $n_a \sim \text{Poisson}[f_a(s)]$, independently for different cells. Then
$$P(\mathbf{n} \mid s) = \prod_a \frac{e^{-f_a(s)}\,\big(f_a(s)\big)^{n_a}}{n_a!}$$
and
$$\log P(s \mid \mathbf{n}) = \sum_a \Big[-f_a(s) + n_a\log\big(f_a(s)\big) - \log n_a!\Big] + \log P(s) + \text{const.}$$
Assume $\sum_a f_a(s)$ is independent of $s$ for a homogeneous population, and that the prior is flat. Then
$$\frac{d}{ds}\log P(s \mid \mathbf{n}) = \frac{d}{ds}\sum_a n_a\log\big(f_a(s)\big) = \sum_a n_a\,\frac{f_a'(s)}{f_a(s)}$$

Bayesian decoding
Now consider $f_a(s) = e^{-(s - s_a)^2/2\sigma^2}$, so $f_a'(s) = -\frac{(s - s_a)}{\sigma^2}\,e^{-(s - s_a)^2/2\sigma^2}$, and set the derivative to zero:
$$\sum_a n_a (s - s_a)/\sigma^2 = 0 \quad\Rightarrow\quad \hat{s}_{\text{MAP}} = \frac{\sum_a n_a s_a}{\sum_a n_a}$$
So the MAP estimate is a population average of preferred directions. Not exactly a population vector.
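A minimal sketch of this MAP decoder, under assumed settings (a dense homogeneous population of Poisson cells with Gaussian tuning; all parameter values are illustrative): the estimate is simply the spike-count-weighted mean of the preferred stimuli.

```python
# Minimal sketch (illustrative assumptions): MAP decoding of a homogeneous
# population of Poisson cells with Gaussian tuning curves. With a flat prior the
# MAP estimate reduces to s_hat = sum_a n_a s_a / sum_a n_a.
import numpy as np

rng = np.random.default_rng(3)
s_pref = np.linspace(-10, 10, 41)          # preferred stimuli s_a, dense tiling
sigma, r_max, dt = 1.0, 30.0, 0.5

def rates(s):
    return r_max * np.exp(-0.5 * ((s - s_pref) / sigma) ** 2)

s_true = 2.3
n = rng.poisson(rates(s_true) * dt)        # one population response
s_map = np.sum(n * s_pref) / np.sum(n)     # MAP estimate (flat prior)
print(s_true, round(s_map, 3))
```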

Population Fisher info
Fisher informations for independent random variates add:
$$J_{\mathbf{n}}(s) = \Big\langle -\frac{d^2}{ds^2}\log P(\mathbf{n} \mid s)\Big\rangle = \Big\langle -\frac{d^2}{ds^2}\sum_a \log P(n_a \mid s)\Big\rangle = \sum_a \Big\langle -\frac{d^2}{ds^2}\log P(n_a \mid s)\Big\rangle = \sum_a J_{n_a}(s) = \sum_a \frac{f_a'(s)^2}{f_a(s)}$$
(the last equality using the Poisson result above).
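For the same kind of hypothetical homogeneous Gaussian-tuned Poisson population, the population Fisher information can be evaluated directly as the sum $\sum_a f_a'(s)^2/f_a(s)$; away from the edges of the covered range it comes out roughly constant in $s$.

```python
# Minimal sketch (illustrative): for independent Poisson cells the population
# Fisher information is the sum of the single-cell terms f_a'(s)^2 / f_a(s).
import numpy as np

s_pref = np.linspace(-10, 10, 41)          # same hypothetical population as above
sigma, r_max = 1.0, 30.0

def J_population(s):
    f = r_max * np.exp(-0.5 * ((s - s_pref) / sigma) ** 2)
    fprime = -r_max * (s - s_pref) / sigma**2 * np.exp(-0.5 * ((s - s_pref) / sigma) ** 2)
    return np.sum(fprime ** 2 / f)

for s in (0.0, 2.3, 5.0):
    print(s, round(J_population(s), 2))    # roughly constant away from the edges
```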

Optimal tuning properties
A considerable amount of work has been done in recent years on finding optimal properties of tuning curves for rate-based population codes. Here, we reproduce one such argument (from Zhang and Sejnowski, 1999).
Consider a population of cells that codes the value of a $D$-dimensional stimulus, $s$. Let the $a$th cell emit $r$ spikes in an interval $\tau$ with a probability distribution that is conditionally independent of the other cells (given $s$) and has the form $P_a(r \mid s, \tau) = S\big(r, f_a(s), \tau\big)$.
The tuning curve of the $a$th cell, $f_a(s)$, has the form
$$f_a(s) = F\,\phi\big((\xi^a)^2\big); \qquad (\xi^a)^2 = \sum_{i=1}^{D} (\xi_i^a)^2; \qquad \xi_i^a = \frac{s_i - c_i^a}{\sigma},$$
where $F$ is a maximal rate and the function $\phi$ is monotonically decreasing. The parameters $c^a$ and $\sigma$ give the centre of the $a$th tuning curve and the (common) width.

Optimal tuning properties
Now, the $(ij)$th term in the Fisher information matrix for the $a$th cell is (by definition)
$$J_{ij}^a(s) = E\left[\frac{\partial}{\partial s_i}\log P_a(r \mid s, \tau)\;\frac{\partial}{\partial s_j}\log P_a(r \mid s, \tau)\right]$$
Applying the chain rule repeatedly, we find that
$$\frac{\partial}{\partial s_i}\log P_a(r \mid s, \tau) = \frac{1}{S(r, f_a(s), \tau)}\,\frac{\partial}{\partial s_i}S(r, f_a(s), \tau) = \frac{S^{(2)}(r, f_a(s), \tau)}{S(r, f_a(s), \tau)}\,\frac{\partial f_a(s)}{\partial s_i}$$
(where $S^{(2)}$ indicates differentiation with respect to the second argument)
$$= \frac{S^{(2)}(r, f_a(s), \tau)}{S(r, f_a(s), \tau)}\,F\,\phi'\big((\xi^a)^2\big)\,\frac{\partial}{\partial s_i}\sum_{i'=1}^{D}(\xi_{i'}^a)^2 = \frac{S^{(2)}(r, f_a(s), \tau)}{S(r, f_a(s), \tau)}\,F\,\phi'\big((\xi^a)^2\big)\,\frac{2(s_i - c_i^a)}{\sigma^2}$$

Optimal tuning properties
So,
$$J_{ij}^a(s) = E\left[\left(\frac{S^{(2)}(r, f_a(s), \tau)}{S(r, f_a(s), \tau)}\right)^{2}\right] 4F^2\big(\phi'\big((\xi^a)^2\big)\big)^2\,\frac{(s_i - c_i^a)(s_j - c_j^a)}{\sigma^4} = A_\phi\big((\xi^a)^2, F, \tau\big)\,\frac{(s_i - c_i^a)(s_j - c_j^a)}{\sigma^4},$$
where the function $A_\phi$ does not depend explicitly on $\sigma$.

Optimal tuning properties
We assumed neurons were independent, so Fisher information adds. Approximate the sum by an integral over the tuning-curve centres, assuming a uniform density $\eta$ of neurons:
$$J_{ij}(s) = \sum_a J_{ij}^a(s) = \int_{-\infty}^{+\infty}\!\! dc_1^a \cdots \int_{-\infty}^{+\infty}\!\! dc_D^a\; \eta\, J_{ij}^a(s) = \int_{-\infty}^{+\infty}\!\! dc_1^a \cdots \int_{-\infty}^{+\infty}\!\! dc_D^a\; \eta\, A_\phi\big((\xi^a)^2, F, \tau\big)\,\frac{(s_i - c_i^a)(s_j - c_j^a)}{\sigma^4}$$
Change variables $c_i^a \to \xi_i^a$:
$$= \int_{-\infty}^{+\infty}\!\! \sigma\, d\xi_1^a \cdots \int_{-\infty}^{+\infty}\!\! \sigma\, d\xi_D^a\; \eta\, A_\phi\big((\xi^a)^2, F, \tau\big)\,\frac{\xi_i^a\,\xi_j^a}{\sigma^2} = \sigma^{D-2}\,\eta \int\! d\xi_1^a \cdots \int\! d\xi_D^a\; A_\phi\big((\xi^a)^2, F, \tau\big)\,\xi_i^a\,\xi_j^a$$
Now, if $i \neq j$, the integral is odd in both $\xi_i^a$ and $\xi_j^a$, and thus vanishes. If $i = j$, the integral has some value $K_\phi(F, \tau, D)$, independent of $\sigma$. Thus $J_{ii} = \sigma^{D-2}\,\eta\, K_\phi(F, \tau, D)$, and the total Fisher information is proportional to $\sigma^{D-2}$.

Optimal tuning properties
Thus the optimal tuning width depends on the stimulus dimension:
$D = 1$: $\sigma \to 0$ (although a lower limit is encountered when the tuning width falls below the inter-cell spacing)
$D = 2$: $J$ is independent of $\sigma$
$D > 2$: $\sigma \to \infty$ (the actual limit is set by the range of valid stimuli)
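The $\sigma^{D-2}$ scaling can be checked numerically. The sketch below (an illustrative construction, not Zhang and Sejnowski's code) sums the single-cell Poisson Fisher information $(\partial f_a/\partial s_1)^2/f_a$ over a dense $D$-dimensional grid of Gaussian tuning-curve centres and compares two widths: doubling $\sigma$ should multiply $J_{11}$ by $2^{D-2}$, i.e. halve it for $D=1$, leave it unchanged for $D=2$, and double it for $D=3$.

```python
# Minimal sketch (illustrative assumptions): numerical check that the total
# Fisher information of a dense grid of Gaussian tuning curves with Poisson
# noise scales as sigma^(D-2), cf. Zhang & Sejnowski (1999).
import numpy as np

def total_fisher_info(sigma, D, F=10.0, spacing=0.25, half_width=6.0):
    """J_11 at s = 0, summed over a D-dimensional grid of tuning-curve centres."""
    g = np.arange(-half_width, half_width + spacing, spacing)
    centres = np.stack(np.meshgrid(*([g] * D), indexing="ij"), axis=-1).reshape(-1, D)
    f = F * np.exp(-0.5 * np.sum(centres**2, axis=1) / sigma**2)  # f_a at s = 0
    df1 = f * centres[:, 0] / sigma**2                            # d f_a / d s_1 at s = 0
    return np.sum(df1**2 / f)                                     # sum of Poisson FI terms

for D in (1, 2, 3):
    J_small, J_large = total_fisher_info(0.5, D), total_fisher_info(1.0, D)
    # ratio ~ (1.0 / 0.5) ** (D - 2): 0.5 for D = 1, 1.0 for D = 2, 2.0 for D = 3
    print(D, round(J_large / J_small, 2))
```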

More...
Correlated noise
Extended $s$ (feature maps etc.)
Uncertainty