Encoding or decoding

Todd Jackson
5 years ago

1 Encoding or decoding

2 Decoding. How well can we learn what the stimulus is by looking at the neural responses? We will discuss two approaches: (1) devise and evaluate explicit algorithms for extracting a stimulus estimate; (2) directly quantify the relationship between stimulus and response using information theory.

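3 The optimal linear estimator. Let's start with a rate response, $r(t)$, and a stimulus, $s(t)$. The optimal linear estimator is closest to satisfying $r(t) = \int d\tau\, K(\tau)\, s(t - \tau)$. Want to solve for $K$. Multiply by $s(t - \tau')$ and integrate over $t$: $\int dt\, r(t)\, s(t - \tau') = \int d\tau\, K(\tau) \int dt\, s(t - \tau)\, s(t - \tau')$.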

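4 The optimal linear estimator. Multiplying by the stimulus and integrating produced terms which are simply correlation functions: $C_{rs}(\tau') = \int d\tau\, K(\tau)\, C_{ss}(\tau' - \tau)$, where $C_{rs}(\tau') = \int dt\, r(t)\, s(t - \tau')$ and $C_{ss}(\tau) = \int dt\, s(t)\, s(t - \tau)$. Given a convolution, Fourier transform: $\tilde{C}_{rs}(\omega) = \tilde{K}(\omega)\, \tilde{C}_{ss}(\omega)$. Now we have a straightforward algebraic equation for $\tilde{K}(\omega)$: $\tilde{K}(\omega) = \tilde{C}_{rs}(\omega) / \tilde{C}_{ss}(\omega)$. Solving for $K(t)$: $K(t) = \frac{1}{2\pi} \int d\omega\, e^{-i\omega t}\, \tilde{C}_{rs}(\omega) / \tilde{C}_{ss}(\omega)$.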

5 The optimal linear estimator. For white noise, the correlation function is $C_{ss}(t) = \sigma^2 \delta(t)$, so $K(t)$ is simply proportional to $C_{rs}(t)$: $K(t) = C_{rs}(t)/\sigma^2$.
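The white-noise result can be checked numerically. The following is a minimal sketch, assuming a discrete-time version of the relation above; the filter values in `true_kernel`, the stimulus length, and the helper name `estimate_kernel` are all hypothetical choices made for illustration, not taken from the slides.

```python
import random

def estimate_kernel(stim, resp, n_lags):
    """Estimate K(lag) = C_rs(lag) / sigma^2, valid for a white-noise stimulus."""
    n = len(stim)
    mean_s = sum(stim) / n
    var_s = sum((x - mean_s) ** 2 for x in stim) / n
    kernel = []
    for lag in range(n_lags):
        # C_rs(lag): average over t of r(t) * s(t - lag)
        c_rs = sum(resp[t] * (stim[t - lag] - mean_s) for t in range(lag, n)) / (n - lag)
        kernel.append(c_rs / var_s)
    return kernel

random.seed(0)
true_kernel = [0.0, 0.5, 1.0, 0.3, -0.2]                 # hypothetical filter
stim = [random.gauss(0.0, 2.0) for _ in range(20000)]    # Gaussian white noise

# Linear response: r(t) = sum over lag of K(lag) * s(t - lag)
resp = [sum(k * stim[t - lag] for lag, k in enumerate(true_kernel) if t - lag >= 0)
        for t in range(len(stim))]

est = estimate_kernel(stim, resp, len(true_kernel))
print([round(e, 2) for e in est])
```

For a stimulus that is not white, dividing by the stimulus variance is no longer enough, and the full division by the stimulus power spectrum in the Fourier domain would be needed.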

6 Stimulus reconstruction [Figure: kernel K as a function of t]

7 Stimulus reconstruction

8 Stimulus reconstruction

9 Reading minds: the LGN Yang Dan, UC Berkeley

10 Other decoding approaches

11 Binary choice tasks Britten et al. 92: measured both behavior + neural responses

12 Behavioral performance

13 Predictable from neural activity? Discriminability: $d' = (\langle r \rangle_+ - \langle r \rangle_-)/\sigma_r$

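14 Signal detection theory [Figure: response distributions $p(r|-)$ and $p(r|+)$ with means $\langle r \rangle_-$ and $\langle r \rangle_+$, and threshold $z$]. Decoding corresponds to comparing the test, $r$, to a threshold, $z$. $\alpha(z) = P[r \ge z \,|\, -]$: false alarm rate, size. $\beta(z) = P[r \ge z \,|\, +]$: hit rate, power. Find $z$ by maximizing $P[\text{correct}] = p[+]\,\beta(z) + p[-]\,(1 - \alpha(z))$.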
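As an illustration of choosing the threshold, here is a small sketch that assumes both response distributions are Gaussian with equal standard deviation. The means (10 and 14), the standard deviation (2), and the grid search are hypothetical choices, not values from the slides.

```python
import math

def tail(z, mean, sd):
    """P[r >= z] for a Gaussian with the given mean and standard deviation."""
    return 0.5 * math.erfc((z - mean) / (sd * math.sqrt(2.0)))

def p_correct(z, mean_minus, mean_plus, sd, prior_plus=0.5):
    alpha = tail(z, mean_minus, sd)   # false alarm rate
    beta = tail(z, mean_plus, sd)     # hit rate
    return prior_plus * beta + (1.0 - prior_plus) * (1.0 - alpha)

def best_threshold(mean_minus, mean_plus, sd, prior_plus=0.5, steps=2000):
    """Grid search for the threshold z that maximizes P[correct]."""
    lo, hi = mean_minus - 5 * sd, mean_plus + 5 * sd
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda z: p_correct(z, mean_minus, mean_plus, sd, prior_plus))

z_equal = best_threshold(10.0, 14.0, 2.0)                  # p[+] = p[-]
z_rare = best_threshold(10.0, 14.0, 2.0, prior_plus=0.1)   # "+" is rare
print(z_equal, z_rare)
```

With equal priors the best threshold sits midway between the two means; when "+" is rare, the threshold moves up, so more evidence is required before answering "+".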

15 ROC curves summarize performance of the test for different thresholds $z$. Want $\beta \to 1$, $\alpha \to 0$.

16 ROC: two-alternative forced choice. The threshold $z$ is the result from the first presentation. The area under the ROC curve corresponds to P[correct].

17 Is there a better test to use than $r$? The optimal test function is the likelihood ratio, $l(r) = p[r|+]/p[r|-]$ (Neyman-Pearson lemma). Recall $\alpha(z) = P[r \ge z \,|\, -]$ (false alarm rate, size) and $\beta(z) = P[r \ge z \,|\, +]$ (hit rate, power). Then $l(z) = (d\beta/dz)/(d\alpha/dz) = d\beta/d\alpha$, i.e. the slope of the ROC curve.

18 The logistic function. If $p[r|+]$ and $p[r|-]$ are both Gaussian with equal variance, one can show that $P[\text{correct}] = \frac{1}{2}\,\mathrm{erfc}(-d'/2)$. To interpret results as two-alternative forced choice, need simultaneous responses from a "+" neuron and from a "-" neuron. Simulate "-" neuron responses from the same neuron in response to the "-" stimulus. Ideal observer: performs as area under ROC curve.
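The link between the ROC area and the Gaussian formula can be verified numerically. This sketch assumes equal-variance Gaussian response distributions; the means, standard deviation, and integration grid are hypothetical values chosen for the check.

```python
import math

def tail(z, mean, sd):
    """P[r >= z] for a Gaussian with the given mean and standard deviation."""
    return 0.5 * math.erfc((z - mean) / (sd * math.sqrt(2.0)))

def roc_area(mean_minus, mean_plus, sd, steps=4000):
    """Trapezoidal area under the ROC curve beta(alpha), swept over the threshold z."""
    lo, hi = mean_minus - 8 * sd, mean_plus + 8 * sd
    zs = [hi - (hi - lo) * i / steps for i in range(steps + 1)]  # high z first, so alpha rises
    alphas = [tail(z, mean_minus, sd) for z in zs]
    betas = [tail(z, mean_plus, sd) for z in zs]
    area = 0.0
    for i in range(steps):
        area += 0.5 * (betas[i] + betas[i + 1]) * (alphas[i + 1] - alphas[i])
    return area

d_prime = (14.0 - 10.0) / 2.0
area = roc_area(10.0, 14.0, 2.0)
predicted = 0.5 * math.erfc(-d_prime / 2.0)
print(area, predicted)
```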

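19 More d'. Again, if $p[r|-]$ and $p[r|+]$ are Gaussian with equal variance, and $p[+]$ and $p[-]$ are equal, $P[+|r] = 1/[1 + \exp(-d'(r - \langle r \rangle)/\sigma)]$, where $\langle r \rangle = (\langle r \rangle_+ + \langle r \rangle_-)/2$. $d'$ is the slope of the sigmoid fitted to $P[+|r]$.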

20 Neurons vs organisms. Close correspondence between neural and behavioural performance. Why so many neurons? Correlations limit performance.

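21 Priors [Figure: response distributions $p[r|-]$ and $p[r|+]$ with means $\langle r \rangle_-$ and $\langle r \rangle_+$, and threshold $z$]. Role of priors: find $z$ by maximizing $P[\text{correct}] = p[+]\,\beta(z) + p[-]\,(1 - \alpha(z))$.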

22 The wind or a tiger? Classification of noisy data: single photon responses Rieke

23 Nonlinear separation of signal and noise. Classification of noisy data: single photon responses. [Figure: $P(I|\text{noise})$ and $P(I|\text{signal})$ as functions of $I$; Rieke]

24 Nonlinear separation of signal and noise. Classification of noisy data: single photon responses. [Figure: $P(I|\text{noise})$ and $P(I|\text{signal})$ as functions of $I$; Rieke]

25 Nonlinear separation of signal and noise. Classification of noisy data: single photon responses. [Figure: $P(I|\text{noise})$ and $P(I|\text{signal})$ as functions of $I$; Rieke]

26 Nonlinear separation of signal and noise. Classification of noisy data: single photon responses. [Figure: $P(I|\text{noise})\,P(\text{noise})$ and $P(I|\text{signal})\,P(\text{signal})$ as functions of $I$; Rieke]

27 How about costs? [Figure: $P(I|\text{noise})\,P(\text{noise})$ and $P(I|\text{signal})\,P(\text{signal})$ as functions of $I$]

28 Building in cost. Penalty for incorrect answer: $L_+$, $L_-$. For an observation $r$, what is the expected loss? $\text{Loss}_- = L_-\, P[+|r]$; $\text{Loss}_+ = L_+\, P[-|r]$. Cut your losses: answer + when $\text{Loss}_+ < \text{Loss}_-$, i.e. $L_+\, P[-|r] < L_-\, P[+|r]$. Using Bayes, $P[+|r] = p[r|+]\,P[+]/p(r)$ and $P[-|r] = p[r|-]\,P[-]/p(r)$, so answer + when $l(r) = p[r|+]/p[r|-] > L_+ P[-] / (L_- P[+])$.
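The loss-weighted rule can be sketched as a single comparison of the likelihood ratio against a criterion. The Gaussian response model and all numerical values below are assumptions for illustration; `answer_plus` is a hypothetical helper name.

```python
import math

def gaussian_pdf(r, mean, sd):
    return math.exp(-0.5 * ((r - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def answer_plus(r, mean_minus, mean_plus, sd, prior_plus, loss_plus, loss_minus):
    """Answer '+' when l(r) > L_+ P[-] / (L_- P[+]).

    loss_plus:  penalty for answering '+' when the stimulus was '-'
    loss_minus: penalty for answering '-' when the stimulus was '+'
    """
    ratio = gaussian_pdf(r, mean_plus, sd) / gaussian_pdf(r, mean_minus, sd)
    criterion = (loss_plus * (1.0 - prior_plus)) / (loss_minus * prior_plus)
    return ratio > criterion

# Same observation, different costs (assumed Gaussian responses, means 10 and 14, sd 2)
print(answer_plus(12.5, 10.0, 14.0, 2.0, 0.5, loss_plus=1.0, loss_minus=1.0))
print(answer_plus(12.5, 10.0, 14.0, 2.0, 0.5, loss_plus=5.0, loss_minus=1.0))
```

Raising the penalty for a wrong "+" answer raises the criterion, so the same observation that favoured "+" under equal costs no longer does.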

29 Relationship of likelihood to tuning curves. For small stimulus differences, $s$ and $s + \delta s$: like comparing to threshold.

30 Decoding from many neurons: population codes. Population code formulation. Methods for decoding: population vector; Bayesian inference; maximum likelihood; maximum a posteriori. Fisher information.

31 Cricket cercal cells Jacobs G A et al. J Exp Biol 2008;211: by The Company of Biologists Ltd

32 Cricket cercal cells

33 Population vector. RMS error in estimate (Theunissen & Miller, 1991)

34 Population coding in M1 [Figure: cosine tuning curve of a motor cortical neuron; firing rate, with baseline $r_0$, against hand reaching direction]

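35 Population coding in M1. Cosine tuning: $(\langle r_a \rangle - r_0)/r_{\max} = \cos(s - s_a) = \vec{v} \cdot \vec{c}_a$, where $\vec{v}$ is the movement direction and $\vec{c}_a$ the preferred direction of cell $a$. Pop. vector: $\vec{v}_{\text{pop}} = \sum_{a=1}^{N} \frac{r_a - r_0}{r_{\max}}\, \vec{c}_a$. For sufficiently large $N$, $\vec{v}_{\text{pop}}$ is parallel to the direction of arm movement.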
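A minimal sketch of population-vector decoding with cosine tuning follows. The number of cells, the baseline and maximum rates, the evenly spaced preferred directions, and the additive Gaussian noise are all assumptions made for illustration.

```python
import math
import random

def population_vector(rates, preferred, r0, r_max):
    """Sum preferred-direction unit vectors weighted by (r_a - r0) / r_max; return the angle."""
    x = sum((r - r0) / r_max * math.cos(p) for r, p in zip(rates, preferred))
    y = sum((r - r0) / r_max * math.sin(p) for r, p in zip(rates, preferred))
    return math.atan2(y, x)

random.seed(1)
n_cells, r0, r_max = 200, 30.0, 20.0                       # hypothetical values
preferred = [2.0 * math.pi * a / n_cells for a in range(n_cells)]
true_dir = 1.0                                             # radians

# Cosine-tuned mean rate plus additive Gaussian noise (an assumed noise model)
rates = [r0 + r_max * math.cos(true_dir - p) + random.gauss(0.0, 5.0) for p in preferred]

decoded = population_vector(rates, preferred, r0, r_max)
print(decoded)
```

The estimate relies on the preferred directions covering the circle uniformly; with uneven coverage the summed vector is biased, which is one of the difficulties with this scheme.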

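36 Population coding in M1. Cosine tuning: $(\langle r_a \rangle - r_0)/r_{\max} = \cos(s - s_a)$. Pop. vector: $\vec{v}_{\text{pop}} = \sum_{a=1}^{N} \frac{r_a - r_0}{r_{\max}}\, \vec{c}_a$. Difficulties with this coding scheme?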

37 Is this the best one can do? The population vector is neither general nor optimal. "Optimal": make use of all information in the stimulus/response distributions.

38 Bayesian inference. Bayes' law: $p[s|r] = p[r|s]\,p[s]/p[r]$, where $p[r|s]$ is the likelihood function (conditional distribution), $p[s]$ the prior distribution, $p[s|r]$ the a posteriori distribution, and $p[r]$ the marginal distribution.

39 Bayesian estimation. Want an estimator $s_{\text{Bayes}}$. Introduce a cost function, $L(s, s_{\text{Bayes}})$; minimize mean cost. For least squares cost, $L(s, s_{\text{Bayes}}) = (s - s_{\text{Bayes}})^2$; the solution is the conditional mean.

40 Bayesian inference. By Bayes' law, $p[s|r] = p[r|s]\,p[s]/p[r]$: likelihood function $p[r|s]$, a posteriori distribution $p[s|r]$.

41 Maximum likelihood. Find the maximum of $P[r|s]$ over $s$. More generally, this is the probability of the data given the model. Here the model = stimulus; assume a parametric form for the tuning curve.

42 Bayesian inference. By Bayes' law, $p[s|r] = p[r|s]\,p[s]/p[r]$: likelihood function $p[r|s]$, a posteriori distribution $p[s|r]$.

43 Population vector. RMS error in estimate (Theunissen & Miller, 1991)

44 MAP and ML. ML: $s^*$ which maximizes $p[r|s]$. MAP: $s^*$ which maximizes $p[s|r]$. Difference is the role of the prior: the two differ by the factor $p[s]/p[r]$. For cercal data:

45 Decoding an arbitrary continuous stimulus. Work through a specific example: assume independence; assume Poisson firing. Noise model: Poisson distribution, $P_T[k] = (\lambda T)^k \exp(-\lambda T)/k!$

46 Decoding an arbitrary continuous stimulus. E.g. Gaussian tuning curves: $f_a(s) = r_{\max} \exp\left(-\frac{1}{2}\left((s - s_a)/\sigma_a\right)^2\right)$

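47 Need to know full $P[\mathbf{r}|s]$. Assume Poisson: $P[r_a|s] = \frac{(f_a(s)T)^{r_a T}}{(r_a T)!} \exp(-f_a(s)T)$. Assume independent: $P[\mathbf{r}|s] = \prod_{a=1}^{N} P[r_a|s]$. [Figure: population response of 11 cells with Gaussian tuning curves]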

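48 ML. Apply ML: maximize $\ln P[\mathbf{r}|s] = T \sum_a r_a \ln f_a(s) + \ldots$ with respect to $s$. Set derivative to zero, use $\sum_a f_a(s) = \text{constant}$: $\sum_a r_a\, f_a'(s)/f_a(s) = 0$. From Gaussianity of tuning curves (centres $s_a$, widths $\sigma_a$), $s^* = \sum_a (r_a s_a/\sigma_a^2) \,/\, \sum_a (r_a/\sigma_a^2)$. If all $\sigma_a$ are the same, $s^* = \sum_a r_a s_a \,/\, \sum_a r_a$.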

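49 MAP. Apply MAP: maximise $\ln p[s|\mathbf{r}] = T \sum_a r_a \ln f_a(s) + \ln p[s] + \ldots$ with respect to $s$. Set derivative to zero, use $\sum_a f_a(s) = \text{constant}$: $T \sum_a r_a\, f_a'(s)/f_a(s) + p'[s]/p[s] = 0$. From Gaussianity of tuning curves (centres $s_a$, widths $\sigma_a$), with a Gaussian prior of mean $s_{\text{prior}}$ and variance $\sigma_{\text{prior}}^2$, $s^* = \frac{T \sum_a r_a s_a/\sigma_a^2 + s_{\text{prior}}/\sigma_{\text{prior}}^2}{T \sum_a r_a/\sigma_a^2 + 1/\sigma_{\text{prior}}^2}$.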
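The ML and MAP estimates for this example can be sketched as below, assuming independent Poisson spike counts, Gaussian tuning curves of equal width, and a Gaussian prior. The 11 preferred values, the rate scale, the counting window, and the prior (mean -2, variance 1) are illustrative assumptions; the code works with spike counts $n_a = r_a T$, so the factor $T$ is absorbed into the counts.

```python
import math
import random

def poisson_sample(lam):
    """Knuth's multiplication method; adequate for the modest means used here."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

def ml_estimate(counts, preferred):
    """Equal-width Gaussian tuning: count-weighted mean of the preferred stimuli."""
    return sum(n * s for n, s in zip(counts, preferred)) / sum(counts)

def map_estimate(counts, preferred, sigma, prior_mean, prior_var):
    """Gaussian prior: precision-weighted combination of the data and the prior mean."""
    num = sum(n * s for n, s in zip(counts, preferred)) / sigma ** 2 + prior_mean / prior_var
    den = sum(counts) / sigma ** 2 + 1.0 / prior_var
    return num / den

random.seed(2)
preferred = [float(a) for a in range(-5, 6)]            # 11 cells
sigma, r_max, T, s_true = 1.0, 50.0, 0.5, 0.0           # hypothetical values
counts = [poisson_sample(r_max * T * math.exp(-0.5 * ((s_true - s_a) / sigma) ** 2))
          for s_a in preferred]

s_ml = ml_estimate(counts, preferred)
s_map = map_estimate(counts, preferred, sigma, prior_mean=-2.0, prior_var=1.0)
print(s_ml, s_map)
```

The MAP estimate always lies between the ML estimate and the prior mean; the more spikes are collected, the closer it moves to the ML value.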

50 Given this data: [Figure: decoding with a constant prior, and MAP with a prior of mean -2, variance 1]

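51 How good is our estimate? For stimulus $s$, have estimated $s_{\text{est}}$. Bias: $b_{\text{est}}(s) = \langle s_{\text{est}} \rangle - s$. Variance: $\sigma_{\text{est}}^2(s) = \langle (s_{\text{est}} - \langle s_{\text{est}} \rangle)^2 \rangle$. Mean square error: $\langle (s_{\text{est}} - s)^2 \rangle = \sigma_{\text{est}}^2(s) + b_{\text{est}}^2(s)$. Cramer-Rao bound: $\sigma_{\text{est}}^2(s) \ge (1 + b_{\text{est}}'(s))^2 / I_F(s)$, where $I_F(s)$ is the Fisher information. (ML is unbiased: $b = b' = 0$.)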

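52 Fisher information: $I_F(s) = \left\langle -\frac{\partial^2 \ln p[r|s]}{\partial s^2} \right\rangle$. Alternatively: $I_F(s) = \left\langle \left( \frac{\partial \ln p[r|s]}{\partial s} \right)^2 \right\rangle$. Quantifies local stimulus discriminability.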

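53 Fisher information for Gaussian tuning curves. For Gaussian tuning curves with Poisson statistics: $I_F(s) = T \sum_a \frac{f_a'(s)^2}{f_a(s)} = T \sum_a \frac{r_{\max} (s - s_a)^2}{\sigma_a^4} \exp\left(-\frac{1}{2}\left((s - s_a)/\sigma_a\right)^2\right)$.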

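54 Are narrow or broad tuning curves better? Approximate the sum by an integral, for cells of equal width $\sigma$ with density $\rho$ of preferred stimuli: $I_F \approx \sqrt{2\pi}\, \rho\, r_{\max}\, T / \sigma$. Thus, narrow tuning curves are better. But not in higher dimensions! What happens in 2D?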
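The dependence on tuning width can be checked directly by summing $T f_a'(s)^2 / f_a(s)$ over a dense one-dimensional array of cells. This is a sketch under assumed values: the density of 10 cells per unit stimulus, the rate scale, and the two widths are hypothetical.

```python
import math

def fisher_information(s, preferred, sigma, r_max, T):
    """I_F(s) = T * sum_a f_a'(s)^2 / f_a(s) for Gaussian tuning curves, Poisson noise."""
    total = 0.0
    for s_a in preferred:
        f = r_max * math.exp(-0.5 * ((s - s_a) / sigma) ** 2)
        # f'(s) = -f * (s - s_a) / sigma^2, so f'^2 / f = f * (s - s_a)^2 / sigma^4
        total += f * (s - s_a) ** 2 / sigma ** 4
    return T * total

density, r_max, T = 10.0, 50.0, 1.0                            # hypothetical values
preferred = [-10.0 + i / density for i in range(201)]          # covers -10 .. 10

i_broad = fisher_information(0.03, preferred, 1.0, r_max, T)
i_narrow = fisher_information(0.03, preferred, 0.5, r_max, T)
approx_broad = math.sqrt(2.0 * math.pi) * density * r_max * T / 1.0
print(i_broad, approx_broad, i_narrow)
```

Halving the width roughly doubles the Fisher information here, in line with the $1/\sigma$ scaling of the one-dimensional approximation.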

55 Fisher information and discrimination. Recall $d'$ = mean difference / standard deviation. Can also decode and discriminate using decoded values. Trying to discriminate $s$ and $s + \Delta s$: the difference in ML estimate is $\Delta s$ (unbiased); the variance in the estimate is $1/I_F(s)$. Hence $d' = \Delta s \sqrt{I_F(s)}$.

56 Limitations of these approaches: tuning curve/mean firing rate; correlations in the population.

57 The importance of correlation Shadlen and Newsome, 98

58 The importance of correlation

59 The importance of correlation

60 Entropy and Shannon information Model-based vs model free

61 Entropy and Shannon information. For a random variable $X$ with distribution $p(x)$, the entropy is $H[X] = -\sum_x p(x) \log_2 p(x)$. Information is defined as $I[x] = -\log_2 p(x)$.

62 Mutual information Typically, information = mutual information: how much knowing the value of one random variable r (the response) reduces uncertainty about another random variable s (the stimulus). Variability in response is due both to different stimuli and to noise. How much response variability is useful, i.e. can represent different messages, depends on the noise. Noise can be specific to a given stimulus.

63 Mutual information. Information quantifies how far from independent $r$ and $s$ are: $I(s;r) = D_{KL}[P(r,s), P(r)P(s)]$. Alternatively: $I(s;r) = H[P(r)] - \sum_s P(s)\, H[P(r|s)]$.

64 Mutual information. Mutual information is the difference between the total response entropy and the mean noise entropy: $I(s;r) = H[P(r)] - \sum_s P(s)\, H[P(r|s)]$. Need to know the conditional distribution $P(s|r)$ or $P(r|s)$. Take a particular stimulus $s = s_0$ and repeat many times to obtain $P(r|s_0)$. Compute variability due to noise: noise entropy.

65 Mutual information. Information is symmetric in $r$ and $s$. Examples: response is unrelated to stimulus: $p[r|s] = ?$, MI = ? Response is perfectly predicted by stimulus: $p[r|s] = ?$

66 Simple example. $r_+$ encodes stimulus +, $r_-$ encodes stimulus -, but with a probability of error $p$: $P(r_+|+) = 1 - p$, $P(r_-|-) = 1 - p$. What is the response entropy $H[p]$? What is the noise entropy?

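67 Entropy and Shannon information [Figure panels: Entropy; Information]. Response entropy: $H[p] = -p_+ \log p_+ - (1 - p_+) \log(1 - p_+)$. When $p_+ = 1/2$, the response entropy takes its maximum value, 1 bit with base-2 logarithms. Noise entropy: $H[P(r|s)] = -p \log p - (1 - p) \log(1 - p)$.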
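For this two-response example, the mutual information is the response entropy minus the noise entropy, and both can be computed in a few lines. The function names and the error probabilities evaluated below are illustrative choices.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits; terms with p = 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

def binary_channel_information(p_error, prior_plus=0.5):
    """I = H[P(r)] - sum_s P(s) H[P(r|s)] for the two-stimulus, two-response example."""
    # P[r_+] = P[+] (1 - p) + P[-] p
    p_r_plus = prior_plus * (1.0 - p_error) + (1.0 - prior_plus) * p_error
    response_entropy = entropy_bits([p_r_plus, 1.0 - p_r_plus])
    # The noise entropy is the same for both stimuli, so its average is just this value
    noise_entropy = entropy_bits([p_error, 1.0 - p_error])
    return response_entropy - noise_entropy

for p in (0.0, 0.1, 0.5):
    print(p, binary_channel_information(p))
```

With no errors the response carries one full bit about the stimulus; at an error probability of one half the response is independent of the stimulus and the information is zero.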
