Entropy and Information in the Neuroscience Laboratory


1 Entropy and Information in the Neuroscience Laboratory Graduate Course in BioInformatics WGSMS Jonathan D. Victor Department of Neurology and Neuroscience Weill Cornell Medical College June 2008

2 Thanks WMC Neurology and Neuroscience Keith Purpura Ferenc Mechler Anita Schmid Ifije Ohiorhenuan Qin Hu Former students Dmitriy Aronov Danny Reich Michael Repucci Bob DeBellis Collaborators Dan Gardner (WMC) Bruce Knight (Rockefeller) Patricia DiLorenzo (SUNY Binghamton) Support NEI NINDS WMC Neurology and Neuroscience

3 Outline Information Why calculate information? What is it? Why is it challenging to estimate? What are some useful strategies for neural data? Examples Maximum entropy methods Data analysis Stimulus design Homework

4 Information: Why Calculate It? An interesting, natural quantity Compare across systems (e.g., one spike per bit ) Determine the constraints on a system (e.g., metabolic cost of information) See where it is lost (e.g., within a neuron) Insight into what the system is designed to do To evaluate candidates for neural codes What statistical features are available? Can precise spike timing carry information? Can neuronal diversity carry information? What codes can be rigorously ruled out? A non-parametric measure of association

5 Even in visual cortex, the neural code is unknown What physiologists say: Neurons have definite selectivities ("tuning") Tuning properties can account for behavior Hubel and Wiesel 1968 What physiologists also know: Responses depend on multiple stimulus parameters Response variability (number of spikes, and firing pattern) is substantial and complicated

6 Some coding hypotheses At the level of individual neurons Spike count Firing rate envelope Interspike interval pattern, e.g., bursts At the level of neural populations Total population activity Labeled lines Patterns across neurons Synchronous spikes Oscillations

7 Coding by intervals can be faster than coding by count. Signaling a step change in a sensory input: coding by count (rate averaged over time) vs. coding by interval pattern, where one short interval indicates a change! Tradeoff: firing must be regular.

8 Coding by rate envelope supports signaling of multiple attributes. [Schematic: one attribute maps to more spikes, another to a more transient response envelope over time.] Codes based on spike patterns can also support signaling of multiple attributes.

9 A direct experimental test of a neural coding hypothesis is difficult. Count, rate, and pattern are interdependent. "Time is that great gift of nature which keeps everything from happening at once." (C.J. Overbeck, 1978) We'd have to manipulate count, rate, and pattern selectively AND observe an effect on behavior. So, we need some guidance from theory.

10 Information = Reduction in Uncertainty (Claude Shannon, 1948). A priori there are 6 possibilities; observe a response, and only 2 remain. Reduction in uncertainty from 6 possibilities to 2: Information = log(6/2).

11 In a bit more detail: A priori knowledge (6 possibilities). Observe a response: no spikes, one spike, or two spikes. A posteriori knowledge: the possibilities consistent with that response.

12 Second-guessing shouldn't help. No spikes, one spike, two spikes. Maybe there really should have been a spike? Maybe these two kinds of responses should be pooled? The "Data Processing Inequality": information cannot be increased by re-analysis.

13 Information on independent channels should add. Color channel: observe a response, information = log(6/2) = log(3). Shape channel: observe a response, information = log(6/3) = log(2). Both channels: observe a response, information = log(6/1) = log(6). log(6) = log(3·2) = log(3) + log(2).

14 Surprising Consequence Data Processing Inequality + Independent channels combine additively + Continuity = Unique definition of information, up to a constant

15 Information: Difference of Entropies. Joint distribution p(j,k) over input symbols j and output symbols k, with marginals p(·,k) = Σ_{j=1}^{J} p(j,k) and p(j,·) = Σ_{k=1}^{K} p(j,k). A priori distribution of inputs: p(j,·). A posteriori distribution, given output symbol k: p(j|k) = p(j,k)/p(·,k). Information = {Entropy of the a priori distribution of input symbols} minus {Entropy of the a posteriori distribution of input symbols, given the observation k, averaged over all k}.

16 Entropy: Definition. Discrete case: H = −Σ_j p_j log p_j (use log_2 to get bits). Equivalent forms, useful for analytic purposes: H = −(1/ln 2) (d/dα) Σ_j p_j^α evaluated at α = 1, and H = lim_{α→1} [1/((1−α) ln 2)] (Σ_j p_j^α − 1). Continuous form: has a log(dx) term.
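
As a companion to this definition, here is a minimal Python sketch of the plug-in (maximum-likelihood) entropy estimate from a table of counts; the function name and the toy counts are illustrative, not from the slides.

```python
import numpy as np

def plugin_entropy(counts, base=2.0):
    """Plug-in (maximum-likelihood) entropy estimate, H = -sum_j p_j log p_j."""
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()
    p = p[p > 0]                      # 0 log 0 is taken to be 0
    return -np.sum(p * np.log(p)) / np.log(base)

# Example: a fair 6-sided die has entropy log2(6) ~ 2.585 bits
print(plugin_entropy([10, 10, 10, 10, 10, 10]))
```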

17 Entropy: Properties. Mixing property (concavity): H{(1−λ)p_J + λq_J} ≥ (1−λ)H{p_J} + λH{q_J}. Chain rule, general form: H(Z) = H(X) + Σ_x p_x H(Y|X=x). Chain rule, table form (independent X and Y): H(Z) = H(X) + H(Y).

18 Information: Symmetric Difference of Entropies. Joint table p(j,k) over input symbols j and output symbols k, with marginals p(j,·) = Σ_{k=1}^{K} p(j,k) and p(·,k) = Σ_{j=1}^{J} p(j,k). Information = {entropy of input} + {entropy of output} − {entropy of table}: I = H{p(j,·)} + H{p(·,k)} − H{p(j,k)}.
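
A short sketch of the table formula above, computing I = H{p(j,·)} + H{p(·,k)} − H{p(j,k)} directly from a joint count table (NumPy assumed; the example tables are hypothetical).

```python
import numpy as np

def entropy_bits(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(joint_counts):
    """I = H{p(j,.)} + H{p(.,k)} - H{p(j,k)} from a joint input/output count table."""
    pjk = np.asarray(joint_counts, dtype=float)
    pjk = pjk / pjk.sum()
    return (entropy_bits(pjk.sum(axis=1)) +   # input marginal
            entropy_bits(pjk.sum(axis=0)) -   # output marginal
            entropy_bits(pjk.ravel()))        # full table

# Perfectly correlated 2x2 table -> 1 bit; independent table -> 0 bits
print(mutual_information([[50, 0], [0, 50]]))   # 1.0
print(mutual_information([[25, 25], [25, 25]])) # 0.0
```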

19 Information: Properties. I = H{p(j,·)} + H{p(·,k)} − H{p(j,k)}. I is symmetric in input and output. I is independent of labeling within input and output. I ≥ 0, and I = 0 if and only if p(j,k) = p(j,·)p(·,k). I ≤ H{p(j,·)} and I ≤ H{p(·,k)}. Data Processing Inequality: if Y determines Z, then I(X,Y) ≥ I(X,Z).

20 Information: Related Quantities. Channel capacity: maximum information for any input ensemble. Efficiency: {Information}/{Channel capacity}. Redundancy: 1 − {Information from all channels}/{sum of informations from each channel}. Redundancy index: {sum of informations from each channel − Information from all channels} / {sum of informations from each channel − maximum of informations from each channel}.

21 Investigating neural coding: not Shannon's paradigm. Shannon: symbols and codes are known; joint (input/output) probabilities are known; what are the limits of performance? Neural coding: symbols and codes are not known; joint probabilities must be measured; ultimate performance is often known (behavior); what are the codes?

22 Information estimates depend on partitioning of the stimulus domain. Finely partitioned stimulus domain: the stimulus-to-response mapping is unambiguous; H = log(4) bits. Coarsely partitioned: still unambiguous, but detail is lost; H = log(2) bits.

23 Information estimates depend on partitioning of the response domain. Finely partitioned response domain: unambiguous; H = log(4) bits. Coarsely partitioned: ambiguous; H = log(2) bits.

24 Information estimates depend on partitioning of the response domain, II. Finely partitioned: unambiguous; H = log(4) = 2 bits. Wrongly partitioned: ambiguous; H = log(1) = 0 bits.

25 Revenge of the Data Processing Inequality. Should these responses be grouped into one code word? The Data Processing Inequality says NO: if you group, you underestimate information.

26 The Basic Difficulty. We need to divide stimulus and response domains finely, to avoid underestimating information ("Data Processing Theorem"). We want to determine <p log p>, but we only have an estimate of p, not its exact value. Dividing stimulus and response domains finely makes p small, which increases the variability of estimates of p. But that's not all: p log p is a nonlinear function of p, so replacing <p log p> with <p>log<p> incurs a bias. How does this bias depend on p?

27 Biased Estimates of −p log p. f(p) = −p log p. The downward bias is greatest where the curvature of f is greatest: f''(p) = −1/p, so the bias is worst for small p.
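
The downward bias is easy to see numerically. The sketch below (an assumed toy setup: a uniform source over 8 symbols) shows the plug-in estimate falling short of the true 3 bits at small sample sizes.

```python
import numpy as np

rng = np.random.default_rng(0)
k, H_true = 8, np.log2(8)            # uniform source over 8 symbols, H = 3 bits

def plugin_entropy(samples, k):
    p = np.bincount(samples, minlength=k) / len(samples)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

for N in (10, 30, 100, 1000):
    est = [plugin_entropy(rng.integers(0, k, N), k) for _ in range(2000)]
    print(f"N={N:5d}  mean plug-in H = {np.mean(est):.3f} bits  (true {H_true:.3f})")
```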

28 The Classic Debiaser: Good News / Bad News. We don't have to debias every p log p term, just the sum. The good news (for the entropy of a discrete distribution): the plug-in entropy estimate has an asymptotic bias proportional to (k−1)/N, where N is the number of samples and k is the number of different symbols (Miller, Carlton, Treves, Panzeri). The bad news: unless N >> k, the asymptotic correction may be worse than none at all. More bad news: we don't know what k is.
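
A sketch of the classic (Miller-Madow) correction, adding (k−1)/(2N ln 2) bits to the plug-in estimate; using the number of observed symbols for k is itself an assumption, which is exactly the caveat raised on this slide.

```python
import numpy as np

def miller_madow_entropy(counts):
    """Plug-in entropy (bits) plus the asymptotic (k-1)/(2N ln 2) bias correction."""
    counts = np.asarray(counts, dtype=float)
    N = counts.sum()
    k = np.count_nonzero(counts)      # observed symbols; the true k is unknown
    p = counts[counts > 0] / N
    H_plugin = -np.sum(p * np.log2(p))
    return H_plugin + (k - 1) / (2.0 * N * np.log(2))

print(miller_madow_entropy([3, 1, 0, 1, 0, 0, 2, 1]))
```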

29 Another debiasing strategy. Toy problem: <x²> vs. <x>². Our problem: <−p log p> vs. −<p>log<p>. For a parabola (x²), the bias is constant; this is why the naïve estimator of variance can be simply debiased: σ²_est = Σ(x − <x>)²/(N−1). For −p log p, the bias depends on the best local parabolic approximation. This leads to a polynomial debiaser (Paninski). Better than the classical debiaser, but p = 0 is still the worst case, and it still fails in the extreme undersampled regime.

30 The Direct Method (Strong, de Ruyter, Bialek, et al. 1998). Discretize the response into binary words of length T_word, made of letters of length T_letter. T_letter must be small to capture temporal detail (timing precision of spikes: <1 ms). T_word must be large enough: insect sensory neuron, … ms may be adequate; vertebrate CNS, 100 ms at a minimum. Up to 2^(T_word/T_letter) probabilities must be estimated.
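
A minimal sketch of the word-counting step of the direct method: discretize a spike train into binary letters, slide a word-length window, and take the plug-in entropy of the word distribution. The parameters and the synthetic spike train are hypothetical, and only the total word entropy is computed here (the full method also needs the noise entropy across repeated trials).

```python
import numpy as np

def word_entropy(spike_times, t_start, t_stop, t_letter, word_len):
    """Direct-method word entropy (bits): binarize into letters of length t_letter,
    slide a word_len-letter window, and take the plug-in entropy of the words."""
    edges = np.arange(t_start, t_stop + t_letter, t_letter)
    letters = (np.histogram(spike_times, bins=edges)[0] > 0).astype(int)
    words = ["".join(map(str, letters[i:i + word_len]))
             for i in range(len(letters) - word_len + 1)]
    _, counts = np.unique(words, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(1)
spikes = np.sort(rng.uniform(0.0, 10.0, 300))    # a synthetic ~30 spikes/s train
print(word_entropy(spikes, 0.0, 10.0, t_letter=0.003, word_len=10), "bits per word")
```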

31 Multiple neurons: a severe sampling problem. One dimension for each time bin and each of L neurons: 2^[L(T_word/T_letter)] probabilities must be estimated.

32 What else can we do? Spike trains are events in time, so there are relationships between them: a continuous topology, and discrete relationships (how many spikes? and on which neurons?). How can we exploit this?

33 Strategies for Estimating Entropy and Information: a roadmap. Branch points: treat the spike train as a symbol sequence or as a point process (is there smooth dependence on spike times?); is there a relationship between symbol sequences?; is there smooth dependence on spike count? Methods, in order of increasing model strength: direct method, LZW compression, bottleneck/codebook, context tree, Markov (symbol-sequence side); binless embedding, power series, metric space, stimulus reconstruction (point-process side). Most require comparison of two entropy estimates.

34 Strategies for Estimating Entropy and Information (roadmap repeated from slide 33).

35 A General Approach to Undersampling. Create a parametric model for spike trains; fit the model parameters; calculate entropy from the model (numerically or analytically). Toy example: binomial model. Each bin independently contains a spike with probability p_1 = p(1) or no spike with probability p_0 = p(0), so the probability of any spike train is the product of its per-bin probabilities, and we can determine the probabilities of all spike trains from p_0. Entropy of an n-bin segment: H_n = nH_1, where H_1 = −p_0 log p_0 − p_1 log p_1 (bins are independent). A good estimate of the entropy of the model -- but a bad model: spiking probabilities are not independent.

36 Incorporating Simple Dependencies. Say we measure the pair probabilities p(00), p(01), p(10), p(11). Use these to estimate (model) the probabilities of longer trains. Assuming that dependence (memory) lasts only one bin, p(b|a) = p(ab)/p(a), so, iterating, p(abcd) = p(a) p(b|a) p(c|b) p(d|c) = p(ab) · p(bc)/p(b) · p(cd)/p(c).

37 The Markov Model. Say we measure the m-block probabilities p(abcd...). Use these to estimate (model) the probabilities of longer trains; e.g., for m = 4: p(abcdef) = p(abcd) p(bcde) p(cdef) / [p(bcd) p(cde)]. Entropy of an n-bin segment: H_n ≠ nH_1 (bins are dependent). Recursion: H_{n+1} − H_n = H_n − H_{n−1} for n ≥ m. Analytic expression for the entropy per bin (not obvious): lim_{n→∞} H_n/n = H_m − H_{m−1}. As m (the Markov order) increases, the model fits progressively better, but there are more parameters to measure (~2^m). If m needs to be large to account for long-time correlations, it is not clear whether we've gained much over the direct method.
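
A sketch of the Markov entropy-rate formula above, estimated from data: compute plug-in block entropies H_m and H_{m−1} from overlapping blocks and take their difference. The test sequence (independent bins with p(spike) = 0.1) is synthetic.

```python
import numpy as np

def block_entropy(binary, m):
    """Plug-in entropy (bits) of overlapping m-letter blocks of a 0/1 sequence."""
    blocks = ["".join(map(str, binary[i:i + m])) for i in range(len(binary) - m + 1)]
    _, counts = np.unique(blocks, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def markov_entropy_rate(binary, m):
    """Entropy per bin of the mth-order Markov model: H_m - H_{m-1}."""
    return block_entropy(binary, m) - block_entropy(binary, m - 1)

rng = np.random.default_rng(2)
x = (rng.random(200_000) < 0.1).astype(int)     # independent bins, p(spike) = 0.1
print(markov_entropy_rate(x, m=3))              # should be near -0.1 log2 0.1 - 0.9 log2 0.9 ~ 0.469
```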

38 Context Tree Method in a nutshell (Kennel, Shlens, Abarbanel, Chichilnisky, Neural Computation 2005). p(spike) modeled as a variable-depth tree (Markov: constant depth). Model parameters: tree topology, probability at terminal nodes. Entropies determined from model parameters (Wolpert-Wolf). Weighting of each model according to minimum description length. Confidence limit estimation via Monte Carlo resampling.

39 Strategies for Estimating Entropy and Information (roadmap repeated).

40 Binless Method: Strategy. If every spike train had only 1 spike, then estimating the entropy of spike trains would be the same as empirically estimating the entropy of a one-dimensional distribution p(x). If every spike train had r spikes, it would be the same as estimating the entropy of an r-dimensional distribution p(x_1, x_2, ..., x_r). But spike trains can have 0, 1, 2, ... spikes. So we estimate a spike-count contribution to the information; then, for each r, we use continuous methods to estimate entropies and a contribution to spike-timing information.

41 Entropy of a continuous distribution. Entropy: H = −(1/ln 2) ∫ p(x) ln p(x) dx. Change of variables: y = ∫_{−∞}^{x} p(t) dt. Transformed entropy: H = −(1/ln 2) ∫_0^1 ln p(x) dy. Plan: at each datum, estimate dy as 1/N, and estimate p(x) locally. Relation to the binned entropy estimate with bin size b: p_j ≈ b·p(x_j), so H_binned ≈ H − log_2 b.

42 One dimension, in more detail. Naïve estimate: −(1/ln 2) ∫_0^1 ln p(x) dy ≈ −(1/(N ln 2)) Σ_j ln[1/(N λ_j)], where λ_j is the distance to the nearest neighbor of x_j. But sampling is Poisson, not uniform; this leads to a bias, since <ln λ> ≠ ln<λ>. Corrected estimate: H ≈ (1/(N ln 2)) Σ_j ln[(N−1) λ_j] + γ/ln 2, where γ = −∫_0^∞ e^{−v} ln v dv (Euler's constant).

43 Does it work? 1-d Gaussian, 40 numerical experiments. [Figure: estimated entropy vs. bin width (128 and 1024 samples) and vs. number of samples, comparing the binless estimate with binned estimates (jackknife-debiased and Treves-Panzeri-debiased).]

44 r dimensions. Naïve estimate: −(1/ln 2) ∫_0^1 ln p(x) dy ≈ −(1/(N ln 2)) Σ_j ln[1/(N V_j)], where V_j is the volume of the smallest sphere near x_j that contains a nearest neighbor (proportional to λ_j^r). In y-space, sampling is still Poisson! This leads to the corrected estimate: H ≈ (r/(N ln 2)) Σ_j ln λ_j + (1/ln 2)(ln(N−1) + ln[S_r/r] + γ), where S_r is the surface area of a unit r-sphere.
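
A sketch of a standard Kozachenko-Leonenko nearest-neighbor entropy estimate for r-dimensional data (SciPy assumed). The normalization here uses the unit-ball volume and digamma terms of the commonly cited form, which may differ in bookkeeping details from the slide's expression.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import gammaln, digamma

def kl_entropy_bits(x):
    """Kozachenko-Leonenko nearest-neighbor estimate of differential entropy (bits).
    x: (N, d) array of samples (a flat vector is treated as 1-d data)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    N, d = x.shape
    # distance from each point to its nearest neighbor (k=2: the point itself is k=1)
    eps = cKDTree(x).query(x, k=2)[0][:, 1]
    log_unit_ball = (d / 2.0) * np.log(np.pi) - gammaln(d / 2.0 + 1.0)
    h_nats = d * np.mean(np.log(eps)) + log_unit_ball + digamma(N) - digamma(1)
    return h_nats / np.log(2)

rng = np.random.default_rng(3)
print(kl_entropy_bits(rng.normal(size=(1000, 1))))   # unit Gaussian: 0.5*log2(2*pi*e) ~ 2.047 bits
```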

45 Does it work? 3-d Gaussian, 40 numerical experiments. [Figure: estimated entropy vs. bin width (128 and 1024 samples) and vs. number of samples, comparing the binless estimate with binned estimates (jackknife-debiased and Treves-Panzeri-debiased).]

46 Binless Embedding Method in a nutshell. Embed responses with r spikes as points in an r-dimensional space (responses with 0, 1, 2, 3, ... spikes form separate strata). In each r-dimensional stratum, use the Kozachenko-Leonenko (1987) nearest-neighbor estimate, comparing for each response j the nearest-neighbor distance among all responses (λ_j) with the nearest-neighbor distance among responses to the same stimulus (λ*_j): I ≈ (1/(N ln 2)) Σ_j [ r ln(λ_j/λ*_j) + ln((N−1)/(N_k−1)) ], where N_k is the number of responses to stimulus k. Victor, Phys Rev E (2002).

47 Strategies for Estimating Entropy and Information (roadmap repeated).

48 Coding hypotheses: in what ways can spike trains be considered similar? Similar spike counts Similar spike times Similar interspike intervals

49 Measuring similarity based on spike times. Define the distance between two spike trains as the simplest morphing of one spike train into the other by inserting, deleting, and moving spikes. Unit cost to insert or delete a spike. We don't know the relative importance of spike timing, so we make it a parameter, q: shifting a spike in time by ΔT incurs a cost of q|ΔT|. Spike trains are similar only if spikes occur at similar times (i.e., within 1/q sec), so q measures the informative precision of spike timing.

50 Identification of Minimal-Cost Paths. World lines cannot cross. So, either (i) the last spike in A is deleted, (ii) the last spike in B is inserted, or (iii) the last spike in A and the last spike in B correspond via a shift. The algorithm is closely analogous to the Needleman-Wunsch and Sellers (1970) dynamic programming algorithms for genetic sequence comparisons.
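
A direct Python sketch of this dynamic program for the spike-time edit distance (insert/delete cost 1, shift cost q per second); the example spike trains are made up.

```python
import numpy as np

def spike_time_distance(a, b, q):
    """Spike-time edit distance via dynamic programming.
    a, b: sorted spike-time arrays (s); q: cost per second of shifting a spike."""
    na, nb = len(a), len(b)
    D = np.zeros((na + 1, nb + 1))
    D[:, 0] = np.arange(na + 1)      # delete all spikes of a
    D[0, :] = np.arange(nb + 1)      # insert all spikes of b
    for i in range(1, na + 1):
        for j in range(1, nb + 1):
            D[i, j] = min(D[i - 1, j] + 1,                               # delete a[i-1]
                          D[i, j - 1] + 1,                               # insert b[j-1]
                          D[i - 1, j - 1] + q * abs(a[i - 1] - b[j - 1]))  # shift to match
    return D[na, nb]

A = np.array([0.010, 0.150, 0.320])
B = np.array([0.012, 0.340])
print(spike_time_distance(A, B, q=10.0))   # q=0 reduces to the spike-count difference
```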

51 Distances between all pairs of responses determine a response space: calculate all pairwise distances among the responses to stimulus 1, stimulus 2, stimulus 3, etc.

52 Configuration of the response space tests whether a hypothesized distance is viable Random: responses to the four stimuli are interspersed Systematic clustering: responses to the stimuli are grouped and nearby groups correspond to similar stimuli

53 Metric Space Method in a nutshell. Postulate a parametric family of edit-length metrics ("distances") between spike trains. Allowed elementary transformations: insert or delete a spike (unit cost); shift a spike in time by ΔT (cost q|ΔT|). Create a response space from the distances. Cluster the responses, and tabulate actual stimulus vs. assigned cluster: Information = row entropy + column entropy − table entropy. Victor and Purpura, Network (1997).
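
A sketch of the clustering-plus-confusion-matrix step, given a precomputed pairwise distance matrix (e.g., from the edit distance above) and the stimulus label of each response. For simplicity each response is assigned to the stimulus whose other responses are closest on average; the published method uses a power-law average of distances rather than the plain mean used here.

```python
import numpy as np

def metric_space_information(dist, labels):
    """Assign each response to the stimulus class whose other responses are closest
    on average, then compute transmitted information (bits) from the confusion
    matrix: row entropy + column entropy - table entropy.
    Assumes every stimulus has at least two responses."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    n = len(labels)
    confusion = np.zeros((len(classes), len(classes)))
    for i in range(n):
        mean_d = [dist[i, (labels == c) & (np.arange(n) != i)].mean() for c in classes]
        confusion[np.searchsorted(classes, labels[i]), int(np.argmin(mean_d))] += 1
    p = confusion / confusion.sum()

    def H(q):
        q = q[q > 0]
        return -np.sum(q * np.log2(q))

    return H(p.sum(axis=1)) + H(p.sum(axis=0)) - H(p.ravel())
```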

54 Visual cortex: contrast responses. [Raster plot: spike times by trial number, 256 ms per response.]

55 Visual cortex: contrast coding. [Plot: information vs. temporal precision q, for the spike time code, the interspike interval code, and the spike count code; chance level and q_max indicated.]

56 Multiple visual sub-modalities: contrast, orientation, spatial frequency, texture type.

57 Attributes are coded in distinct ways in primary visual cortex (V1), and especially in V2. [Plots: information vs. temporal precision q (1/sec, from 1 to 100), for contrast, orientation, spatial frequency, and texture type, in V1 and in V2; q_max marked.]

58 Analyzing coding across multiple neurons. A multineuronal activity pattern (spikes across neurons over time) is a time series of labeled events. Distances between labeled time series can also be defined as the minimal cost to morph one into another, with one new parameter: cost to insert or delete a spike, 1; cost to move a spike by an amount ΔT, q|ΔT|; cost to change the label of a spike, k. k determines the importance of the neuron of origin of a spike. k = 0: summed population code; k large: labeled line code.

59 Multineuronal Analysis via the Metric-Space Method: a two-parameter family of codes. Change the time of a spike: cost/sec = q (q = 0: spike count code). Change the neuron of origin of a spike: cost = k (k = 0: summed population code, the neuron doesn't matter; k = 2: labeled-line code, the neuron matters maximally). [Diagram: the plane spanned by importance of timing (q) and importance of neuron of origin (k), covering spike count, summed population, and labeled-line codes.]

60 Preparation Recordings from primary visual cortex (V1) of macaque monkey Multineuronal recording via tetrodes ensures neurons are neighbors (ca. 100 microns)

61 The stimulus set: a cyclic domain 16 kinds of stimuli in the full stimulus set

62 Spatial phase coding in two simple cells. [Figure: firing-rate histograms (imp/s) over the 473-ms response for each cell, and information (bits) as a joint function of temporal precision q (1/sec) and neuron-of-origin cost k.]

63 Discrimination is not enough Information indicates discriminability of stimuli, but not whether stimuli are represented in a manner amenable to later analysis Do the distances between the responses recover the geometry of the stimulus space?

64 Representation of phase by a neuron pair. Reconstructed response spaces: each neuron considered separately (neuron 1, neuron 2), and the two neurons considered jointly as a summed population code (k = 0) or respecting the neuron of origin of each spike (k = 1). The representation of the stimulus space is more faithful when the neuron-of-origin of each spike is respected.

65 Representing taste: responses of a rat solitary tract neuron (P. DiLorenzo, J.-Y. Chen, SfN 2007). [Figure: responses to the 4 primary tastes (salty, bitter, sour, sweet; 100 spikes/s scale, 5 s) and reconstructed response spaces for the spike count code vs. a spike timing code.] Temporal pattern supports full representation of the 4 primary tastes and their 6 mixtures (spike timing code, informative precision ~200 msec).

66 Summary: Estimation of Information-Theoretic Quantities from Data. It is challenging but approachable. There is a role for multiple methodologies, which differ in their assumptions about how the response space is partitioned; the way that information estimates depend on this partitioning provides insight into neural coding. It can provide some biological insights: the temporal structure of spike trains is informative, and (adding parametric tools) temporal structure can represent the geometry of a multidimensional sensory space.

67 Maximum-Entropy Methods The maximum-entropy criterion chooses a specific distribution from an incomplete specification Typically unique: the most random distribution consistent with those specifications Often easy to compute (but not always!) Maximum-entropy distributions General characteristics Familiar examples Other examples Applications

68 General Characteristics of Maximum-Entropy Distributions. Setup: seek a probability distribution p(x) on a specified domain. Specify constraints C_1, ..., C_M of the form Σ_x C_m(x)p(x) = b_m. The domain can be continuous; sums become integrals. Constraints are linear in p but may be nonlinear in x (C(x) = x² constrains the variance of p). Maximize H(p) = −Σ p(x) log p(x) subject to these constraints. Solution: add a constraint C_0 = 1 to enforce normalization of p; introduce a Lagrange multiplier λ_m for each C_m; maximize −Σ p(x) log p(x) + Σ_m λ_m Σ_x C_m(x)p(x). Solution: p(x) = exp{Σ_m λ_m C_m(x)}, with the multipliers λ_m determined by Σ_x C_m(x)p(x) = b_m. These equations are typically nonlinear, but occasionally can be solved explicitly. The solution is always unique (mixing property of entropy).
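
A tiny worked example of the Lagrange-multiplier solution: the maximum-entropy distribution on x = 0..5 with only the mean constrained has the form p(x) ∝ exp(λx), and the single multiplier can be found with a one-dimensional root solve (SciPy assumed; the domain and target mean are made up).

```python
import numpy as np
from scipy.optimize import brentq

# Maximum-entropy distribution on x = 0..5 with the single constraint <x> = 2.0.
x = np.arange(6)
target_mean = 2.0

def mean_given(lam):
    w = np.exp(lam * x)               # p(x) proportional to exp(lam * x)
    return np.sum(x * w) / np.sum(w)

lam = brentq(lambda L: mean_given(L) - target_mean, -10, 10)   # solve <x>_lam = 2.0
p = np.exp(lam * x)
p /= p.sum()
print("lambda =", round(lam, 4), " p(x) =", np.round(p, 4), " mean =", round(float(np.sum(x * p)), 4))
```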

69 Examples of Maximum-Entropy Distributions. Univariate: on [a, b], no constraints: uniform on [a, b]. On (−∞, ∞), variance constrained: K exp(−cx²). On [0, ∞), mean constrained: K exp(−cx). On [0, 2π], one Fourier component constrained: the von Mises distribution, K exp{−c cos(x−φ)}. Multivariate: mean and covariances constrained: K exp{−(x−m)^T C (x−m)}; independent, with marginals P(x), Q(y): P(x)Q(y). Discrete: on {0,1} on a 1-d lattice, with m-block configurations constrained: the mth-order Markov processes. On {0,1} on a 2-d lattice, with adjacent-pair configurations constrained: the Ising problem.

70 Wiener/Volterra and Maximum Entropy. Wiener-Volterra systems identification: determine the nonlinear functional F in a stimulus-response relationship r(t) = F[s(t)]. Discretize prior times: s(t) = (s(t−Δt), ..., s(t−LΔt)) = (s_1, ..., s_L). Consider only the present response: r = r(0). F is a multivariate Taylor series for r in terms of (s_1, ..., s_L); measure the low-order coefficients, assume the others are 0. Probabilistic view: same setup as above, but find P(r, s) = P(r, s_1, ..., s_L). Measure some moments: mean and covariance of s (<s_j>, <s_j s_k>), cross-correlations (<r s_j>, <r s_j s_k>), mean of r (<r>), variance of r (<r²>). Find the maximum-entropy distribution constrained by the above. This is K exp{−c(r − F(s_1, ..., s_L))²}: the Wiener-Volterra series with additive Gaussian noise is the maximum-entropy distribution consistent with the above constraints. What happens if the noise is not Gaussian (e.g., spikes)? Victor and Johannesma, 1986.

71 Some Experiment-Driven Uses of Maximum-Entropy Methods. Rationale: since it is the most random distribution consistent with a set of specifications, it can be thought of as an automated way to generate null hypotheses. Modeling multineuronal activity patterns: spontaneous activity in the retina (Shlens et al. 2006, Schneidman et al. 2006); spontaneous and driven activity in visual cortex (Ohiorhenuan et al.). Designing stimuli.

72 Analyzing Multineuronal Firing Patterns. Consider three neurons A, B, C. If all pairs are uncorrelated, then we have an obvious prediction for the triplet firing event, namely p(A,B,C) = p(A)p(B)p(C). But if we observe pairwise correlations, what do we predict for p(A,B,C) and higher-order patterns? Use constrained maximum-entropy distributions, of course. Measure the single-cell and pairwise firing probabilities (p(A), p(B), ..., p(A,B), p(A,C), p(B,C), ...), find the maximum-entropy distribution consistent with them, and use it to predict the triplet and higher-order firing patterns (p(A,B,C), p(A,B,D), ...).
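
A self-contained sketch of this program for three synthetic binary neurons: parametrize the pairwise maximum-entropy model as p(s) ∝ exp(Σ h_i s_i + Σ J_ij s_i s_j), choose the multipliers so that the model's single-cell and pairwise moments match the measured ones (here by a simple least-squares search, not the most efficient fitting method), and read off the predicted triplet probability.

```python
import numpy as np
from itertools import product
from scipy.optimize import minimize

rng = np.random.default_rng(4)
data = (rng.random((5000, 3)) < [0.3, 0.25, 0.2]).astype(float)          # synthetic spikes
data[:, 1] = np.where(rng.random(5000) < 0.5, data[:, 0], data[:, 1])    # correlate cells 0 and 1

states = np.array(list(product([0, 1], repeat=3)), dtype=float)          # all 8 patterns
pairs = [(0, 1), (0, 2), (1, 2)]
pair_feats = np.array([[s[i] * s[j] for i, j in pairs] for s in states])
observed = np.concatenate([data.mean(0),
                           [np.mean(data[:, i] * data[:, j]) for i, j in pairs]])

def model_probs(theta):
    w = np.exp(states @ theta[:3] + pair_feats @ theta[3:])
    return w / w.sum()

def moment_mismatch(theta):
    p = model_probs(theta)
    model = np.concatenate([p @ states, p @ pair_feats])
    return np.sum((model - observed) ** 2)

theta = minimize(moment_mismatch, np.zeros(6), method="Nelder-Mead",
                 options={"maxiter": 40000, "fatol": 1e-14, "xatol": 1e-10}).x
print("maxent prediction p(111):", round(float(model_probs(theta)[-1]), 4),
      " observed p(111):", round(float(np.mean(np.all(data == 1, axis=1))), 4))
```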

73 Multineuronal Firing Patterns: Retina. Maximum-entropy distributions built from pairs account for the joint activity of clusters of up to 7 cells (macaque retina, Shlens et al., J. Neurosci. 2006), and only nearest-neighbor pairs matter. Also see Schneidman et al., Nature 2006 (15 cells), and the review in Current Opinion in Neurobiology, 2007 (Nirenberg and Victor).

74 Significance of Maximum-Entropy Analysis. The pairwise model dramatically simplifies the description of multineuronal firing patterns: a reduction from 2^N parameters to N(N−1)/2 parameters, and a further reduction to ~4N parameters if only nearest neighbors matter. It makes sense in terms of retinal anatomy. Limitations: What about firing patterns across time? What about stimulus driving? What about cortex (where connectivity is more complex)?

75 Multineuronal Firing Patterns: Visual Cortex. Maximum-entropy distributions built from pairs do not account for the joint activity of clusters of 3-5 cells (macaque V1, Ohiorhenuan et al., CoSyNe 2006). [Histogram: number of recordings vs. log_2-likelihood ratio, model vs. observed.]

76 Multineuronal Firing Patterns: Visual Cortex. Analyzing stimulus dependence: reverse correlation with pseudorandom checkerboard (m-sequence) stimuli; construct a pairwise maximum-entropy model conditioned on one or more stimulus pixels that modulate the response probability distribution. Higher-order correlations are stimulus-dependent! (Ohiorhenuan et al., SfN 2007). [Plot: log_2 likelihood ratio vs. number of conditioning pixels.]

77 Maximum Entropy and Stimulus Design. We'd like to understand how cortical neurons respond to natural scenes. How can we determine what aspects of natural scenes make current models fail? Maximum-entropy approach: incrementally add constraints corresponding to natural-scene image statistics. Gaussian white noise or binary noise: the resulting models are OK for retinal ganglion cells, poor in V1. Gaussian 1/f² noise: in V1, >50% of the variance is not explained. How about adding higher-order statistical constraints? Which ones to add? How do they interact?

78 Dimensional Reduction of Image Statistics. Image statistics can be studied in isolation by creating binary images with a single statistic constrained: the mean value of the product of luminance values (+1 or −1) in a "glider" of several cells. Some such statistics are visually salient; others are not. When two image statistics are constrained, simple combination rules account for the visual salience of the resulting texture. Victor and Conte 1991, 2006.

79 Homework. Debiasing entropy estimates: given an urn with an unknown but finite number of kinds of objects k, an unknown number n_k of each kind of object, and N samples (with replacement) from the urn, estimate k. Does it help to use a jackknife debiaser? Does it help to postulate a distribution for n_k? (Treves and Panzeri, Neural Computation 1995). The Data Processing Inequality: Shannon's mutual information is just one example of a function defined on joint input-output distributions. Are there any others for which the Data Processing Inequality holds? If so, are they useful? (Victor and Nirenberg, submitted 2007). Markov models of point processes: demonstrate that for an mth-order Markov model, the block entropies satisfy H_{n+1} − H_n = H_n − H_{n−1} for n ≥ m. Maximum-entropy distributions: consider a Poisson neuron driven by Gaussian noise. Build the maximum-entropy distribution constrained by the overall mean firing rate, the input power spectrum, and the spike-triggered average. Fit this with a linear-nonlinear (LN) model. What is the nonlinearity's input-output function? Is the spike-triggered covariance zero? (?? and Victor)
