Maximally informative dimensions: Analyzing neural responses to natural signals.

Tatyana Sharpee, Nicole C. Rust, and William Bialek

Sloan-Swartz Center for Theoretical Neurobiology, Department of Physiology, University of California at San Francisco, San Francisco, California 94143-0444
Center for Neural Science, New York University, New York, NY 10003
Department of Physics, Princeton University, Princeton, New Jersey 08544
sharpee@phy.ucsf.edu, rust@cns.nyu.edu, wbialek@princeton.edu

We propose a method that allows for a rigorous statistical analysis of neural responses to natural stimuli, which are non-Gaussian and exhibit strong correlations. We have in mind a model in which neurons are selective for a small number of stimulus dimensions out of the high dimensional stimulus space, but within this subspace the responses can be arbitrarily nonlinear. Therefore we maximize the mutual information between the sequence of elicited neural responses and an ensemble of stimuli that has been projected on trial directions in the stimulus space. The procedure can be done iteratively by increasing the number of directions with respect to which information is maximized. Those directions that allow the recovery of all of the information between spikes and the full unprojected stimuli describe the relevant subspace. If the dimensionality of the relevant subspace indeed is much smaller than that of the overall stimulus space, it may become experimentally feasible to map out the neuron's input-output function even under fully natural stimulus conditions. This contrasts with methods based on correlation functions (reverse correlation, spike-triggered covariance, ...), which all require simplified stimulus statistics if we are to use them rigorously.

1 Introduction

From olfaction to vision and audition, there is an increasing need for, and a growing number of, experiments [1]-[8] that study responses of sensory neurons to natural stimuli. Natural stimuli have specific statistical properties [9, 10], and therefore sample only a subspace of all possible spatial and temporal frequencies explored during stimulation with white noise. Observing the full dynamic range of neural responses may require using stimulus ensembles which approximate those occurring in nature, and it is an attractive hypothesis that the neural representation of these natural signals may be optimized in some way. Finally, some neuron responses are strongly nonlinear and adaptive, and may not be predicted from a combination of responses to simple stimuli. It has also been shown that the variability in neural response decreases substantially when dynamical, rather than static, stimuli are used [11, 12]. For all these reasons, it would be attractive to have a rigorous method of analyzing neural responses to complex, naturalistic inputs. The stimuli analyzed by sensory neurons are intrinsically high-dimensional.

For example, in the case of visual neurons, the input is specified as light intensity on a grid of many pixels. The dimensionality increases further if the time dependence is to be explored as well. Full exploration of such a large parameter space is beyond the constraints of experimental data collection. However, progress can be made provided we make certain assumptions about how the response has been generated. In the simplest model, the probability of response can be described by one receptive field (RF) [13]. The receptive field can be thought of as a special direction in the stimulus space such that the neuron's response depends only on the projection of a given stimulus onto that direction. This special direction is the one found by the reverse correlation method [13, 14]. In a more general case, the probability of the response depends on projections s_i = s·ê_i, i = 1, ..., K, of the stimulus s onto a set of K vectors {ê_1, ..., ê_K}:

P(spike|s) = P(spike) f(s_1, s_2, ..., s_K),    (1)

where P(spike|s) is the probability of a spike given a stimulus s and P(spike) is the average firing rate. In what follows we will call the subspace spanned by the set of vectors {ê_1, ..., ê_K} the relevant subspace (RS). Even though the ideas developed below can be used to analyze input-output functions with respect to different neural responses, we settle on a single spike as the response of interest. Eq. (1) in itself is not yet a simplification if the dimensionality K of the RS is equal to the dimensionality D of the stimulus space. In this paper we will use the idea of dimensionality reduction [15, 16] and assume that K ≪ D. The input-output function f in Eq. (1) can be strongly nonlinear, but it is presumed to depend only on a small number of projections. This assumption appears to be less stringent than that of approximate linearity, which one makes when characterizing a neuron's response in terms of Wiener kernels. The most difficult part in reconstructing the input-output function is to find the RS. For K > 1, a description in terms of any linear combination of the vectors {ê_i} is just as valid, since we did not make any assumptions as to a particular form of the nonlinear function f. We might however prefer one coordinate system over another if it, for example, leads to sparser probability distributions or more statistically independent variables. Once the relevant subspace is known, the probability P(spike|s) becomes a function of only a few parameters, and it becomes feasible to map this function experimentally, inverting the probability distributions according to Bayes' rule:

f(s_1, ..., s_K) = P(s_1, ..., s_K | spike) / P(s_1, ..., s_K).    (2)

If stimuli are correlated Gaussian noise, then the neural response can be characterized by the spike-triggered covariance method [15, 16]. It can be shown that the dimensionality of the RS is equal to the number of non-zero eigenvalues of a matrix given by the difference between the covariance matrices of all presented stimuli and of stimuli conditional on a spike. Moreover, the RS is spanned by the eigenvectors associated with the non-zero eigenvalues, multiplied by the inverse of the a priori covariance matrix. Compared to the reverse correlation method, we are no longer limited to finding only one of the relevant directions. However, because of the necessity to probe a two-point correlation function, the spike-triggered covariance method requires better sampling of the distributions of inputs conditional on a spike. In this paper we investigate whether it is possible to lift the requirement for stimuli to be Gaussian. When using natural stimuli, which are certainly non-Gaussian, the RS cannot be found by the spike-triggered covariance method.
Similarly, the reverse correlation method does not give the correct RF, even in the simplest case where the input-output function (1) depends only on one projection. However, the vectors that span the RS are clearly special directions in the stimulus space. This notion can be quantified by Shannon information, and an optimization problem can be formulated to find the RS.
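
For concreteness, the two correlation-based estimators just discussed can be written down in a few lines. The sketch below is only meant as an illustration (the array names stimuli and spikes are ours, not the paper's), and, as emphasized above, it is rigorous only for Gaussian stimulus ensembles:

```python
import numpy as np

def reverse_correlation_and_stc(stimuli, spikes, n_dims):
    """Classical estimators that the text contrasts with information maximization.

    stimuli : (N, D) array, one stimulus vector per row
    spikes  : (N,) array of 0/1 spike labels
    n_dims  : number of candidate relevant directions returned by the STC analysis
    """
    s = stimuli - stimuli.mean(axis=0)          # work with zero-mean stimuli
    s_spike = s[spikes > 0]

    # Reverse correlation: spike-triggered average, decorrelated by the
    # inverse a priori covariance matrix.
    c_prior = np.cov(s, rowvar=False)
    sta = s_spike.mean(axis=0)
    rf_estimate = np.linalg.solve(c_prior, sta)

    # Spike-triggered covariance: the relevant subspace is spanned by the
    # eigenvectors of C_spike - C_prior with non-zero eigenvalues,
    # multiplied by the inverse of the a priori covariance matrix.
    c_spike = np.cov(s_spike, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(c_spike - c_prior)
    order = np.argsort(np.abs(eigvals))[::-1]   # largest |eigenvalue| first
    rs_estimate = np.linalg.solve(c_prior, eigvecs[:, order[:n_dims]])
    return rf_estimate, rs_estimate
```

For natural, non-Gaussian stimuli, a decorrelated STA of this kind is exactly what Fig. 1(c) later shows to be inadequate, which motivates maximizing information directly.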

Therefore the current implementation of the dimensionality reduction idea is complementary to the clustering of stimuli done in the information bottleneck method [17]; see also Ref. [18]. Non-information-based measures of similarity between the probability distributions P(s) and P(s|spike) have also been proposed [19]. We illustrate how the optimization scheme of maximizing information as a function of direction in the stimulus space works with natural stimuli, for model orientation-sensitive cells with one and two relevant directions, much like the simple and complex cells found in primary visual cortex. It is also possible to estimate average errors in the reconstruction. The advantage of this optimization scheme is that it does not rely on any specific statistical properties of the stimulus ensemble, and can be used with natural stimuli.

2 Information as an objective function

When analyzing neural responses, we compare the a priori probability distribution of all presented stimuli with the probability distribution of stimuli which lead to a spike. For Gaussian signals, the probability distribution can be characterized by its second moment, the covariance matrix. However, an ensemble of natural stimuli is not Gaussian, so that neither the second nor any other finite number of moments is sufficient to describe the probability distribution. In this situation, Shannon information provides a convenient way of comparing two probability distributions. The average information carried by the arrival time of one spike is given by [20]

I_spike = ∫ ds P(s|spike) log2 [ P(s|spike) / P(s) ].    (3)

The information per spike, as written in (3), is difficult to estimate experimentally, since it requires either sampling of the high-dimensional probability distribution P(s|spike) or a model of how spikes were generated, i.e. knowledge of the low-dimensional RS. However, it is possible to calculate I_spike in a model-independent way if stimuli are presented multiple times, so that the probability distribution P(spike|s) can be estimated. Then

I_spike = ⟨ [P(spike|s)/P(spike)] log2 [ P(spike|s)/P(spike) ] ⟩_s,    (4)

where the average is taken over all presented stimuli. Note that for a finite data set of N repetitions, the obtained value will on average be larger than the true information, with a positive bias that grows with the number of distinct stimuli and decreases with the number of repetitions and with the number of spikes elicited across all of the repetitions [21]. The true value can also be found by extrapolating to N → ∞ [22]. Knowledge of the total information per spike will characterize the quality of the reconstruction of the neuron's input-output relation.

Having in mind a model in which spikes are generated according to the projection onto a low-dimensional subspace, we start by projecting all of the presented stimuli onto a particular direction v in the stimulus space, and form the probability distributions P_v(x) = ⟨ δ(x - s·v) ⟩_s and P_v(x|spike) = ⟨ δ(x - s·v) | spike ⟩. The information

I(v) = ∫ dx P_v(x|spike) log2 [ P_v(x|spike) / P_v(x) ]    (5)

provides an invariant measure of how much the occurrence of a spike is determined by the projection on the direction v. It is a function only of direction in the stimulus space and does not change when the vector v is multiplied by a constant: for any probability distribution and any constant c, P_cv(x) = |c|^(-1) P_v(x/c), which leaves (5) unchanged. When evaluated along any vector, I(v) ≤ I_spike. The total information can be recovered along one particular direction only if v = ê_1 and the RS is one-dimensional.
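
For a finite data set the integral in (5) becomes a sum over bins of the projection. A minimal histogram-based estimator might look as follows; the bin count and the variable names are illustrative choices rather than prescriptions of the method:

```python
import numpy as np

def info_along_direction(stimuli, spikes, v, n_bins=25):
    """Estimate I(v) of Eq. (5): the information that the projection x = s.v
    carries about spiking, from binned estimates of P_v(x) and P_v(x|spike).

    stimuli : (N, D) array of stimulus vectors
    spikes  : (N,) array of 0/1 spike labels
    v       : (D,) trial direction
    """
    x = stimuli @ (v / np.linalg.norm(v))        # projections on the unit vector
    edges = np.histogram_bin_edges(x, bins=n_bins)
    p_x, _ = np.histogram(x, bins=edges)
    p_x_spike, _ = np.histogram(x[spikes > 0], bins=edges)

    p_x = p_x / p_x.sum()                        # P_v(x)
    p_x_spike = p_x_spike / p_x_spike.sum()      # P_v(x|spike)

    mask = (p_x_spike > 0) & (p_x > 0)
    # I(v) = sum_x P_v(x|spike) log2 [ P_v(x|spike) / P_v(x) ]  (bits per spike)
    return np.sum(p_x_spike[mask] * np.log2(p_x_spike[mask] / p_x[mask]))
```

Because I(v) is invariant to the length of v, the projection is taken along the unit vector v/|v|; the same binning can be reused later when the ratio P_v(x|spike)/P_v(x) is needed to read off the input-output function.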

By analogy with (5), one could also calculate the information I(v_1, ..., v_n) along a set of several directions {v_1, ..., v_n}, based on the multi-point probability distributions

P_{v1...vn}({x_j} | spike) = ⟨ ∏_j δ(x_j - s·v_j) | spike ⟩,    P_{v1...vn}({x_j}) = ⟨ ∏_j δ(x_j - s·v_j) ⟩_s.

If we are successful in finding all K of the directions ê_i in the input-output relation (1), then the information evaluated along the found set will be equal to the total information I_spike. When we calculate information along a set of vectors that are slightly off from the RS, the answer is, of course, smaller than I_spike and is quadratic in the deviations δv_i. One can therefore find the RS by maximizing information with respect to K vectors simultaneously. The information does not increase if more vectors outside the RS are included in the calculation. On the other hand, the result of optimization with respect to a number of vectors smaller than K may deviate from the RS if stimuli are correlated; the deviation is proportional to a weighted average involving the input-output function f(s_1, ..., s_K). For uncorrelated stimuli, any vector or set of vectors that maximizes I belongs to the RS. To find the RS, we first maximize I(v), and compare this maximum with I_spike, which is estimated according to (4). If the difference exceeds that expected from finite sampling corrections, we increment the number of directions with respect to which information is simultaneously maximized.

The information (5) is a continuous function of v, whose gradient can be computed:

∇_v I = ∫ dx P_v(x) [ ⟨ s | x, spike ⟩ - ⟨ s | x ⟩ ] (d/dx) [ P_v(x|spike) / P_v(x) ].    (6)

Since information does not change with the length of the vector v (which can also be seen from (6) directly), unnecessary evaluations of information for multiples of v are avoided by maximizing along the gradient. As an optimization algorithm, we have used a combination of gradient ascent and simulated annealing: successive line maximizations were done along the direction of the gradient. During line maximizations, a point with a smaller value of information was accepted according to Boltzmann statistics, with probability exp{ [I(v_{i+1}) - I(v_i)] / T }. The effective temperature T is reduced upon completion of each line maximization.

3 Discussion

We tested the scheme of looking for the most informative directions on model neurons that respond to stimuli derived from natural scenes. As stimuli we used patches of photographs digitized to an 8-bit scale, in which no corrections were made for the camera's light intensity transformation function. Our goal is to demonstrate that even though the spatial correlations present in natural scenes are non-Gaussian, they can be successfully removed from the estimate of the vectors defining the RS.

3.1 Simple Cell

Our first example is taken to mimic properties of simple cells found in primary visual cortex. A model phase- and orientation-sensitive cell has a single relevant direction ê_1, shown in Fig. 1(a). A given frame s leads to a spike if the projection s_1 = s·ê_1 reaches a threshold value θ in the presence of noise:

P(spike|s) / P(spike) = f(s_1) = ⟨ H(s_1 - θ + ξ) ⟩_ξ,    (7)

where ξ, a Gaussian random variable of variance σ², models additive noise, and the function H(x) = 1 for x > 0 and zero otherwise. Together with the RF ê_1, the parameter θ for the threshold and the noise variance σ² determine the input-output function.
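
A model cell of the form (7) is straightforward to simulate; in the sketch below the filter rf, the threshold theta, and the noise scale sigma are free parameters, and the particular values used for the figures are not assumed:

```python
import numpy as np

def simulate_simple_cell(stimuli, rf, theta, sigma, rng=None):
    """Model cell of Eq. (7): a frame elicits a spike when the projection
    s1 = s . e1 exceeds the threshold theta in the presence of additive
    Gaussian noise of standard deviation sigma.

    stimuli : (N, D) array of stimulus frames (e.g. natural image patches)
    rf      : (D,) relevant direction e1
    """
    rng = np.random.default_rng() if rng is None else rng
    s1 = stimuli @ rf
    noisy = s1 - theta + sigma * rng.standard_normal(s1.size)
    return (noisy > 0).astype(int)      # H(s1 - theta + xi): spike / no spike
```

Averaging the spike indicator over the noise reproduces f(s_1) = ⟨H(s_1 - θ + ξ)⟩_ξ, a sigmoidal function of the projection whose steepness is set by σ.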

Figure 1: Analysis of a model simple cell with the RF shown in (a). The spike-triggered average is shown in (b). Panel (c) shows an attempt to remove correlations according to the reverse correlation method, i.e. the STA multiplied by the inverse covariance matrix of the stimulus ensemble; (d) the vector v_max found by maximizing information; (e) the probability of a spike P(spike|s·v_max) (crosses) compared to P(spike|s_1) used in generating spikes (solid line). The parameters σ and θ are given as fractions of (s_1^max - s_1^min), where s_1^max and s_1^min are the maximum and minimum values of s_1 over the ensemble of presented stimuli. (f) Convergence of the algorithm according to the information I(v) and the projection v·ê_1 as a function of the inverse effective temperature 1/T.

The spike-triggered average (STA), shown in Fig. 1(b), is broadened because of the spatial correlations present in natural stimuli. If stimuli were drawn from a Gaussian probability distribution, they could be decorrelated by multiplying the STA by the inverse of the a priori covariance matrix, according to the reverse correlation method. This procedure is not valid for non-Gaussian stimuli and nonlinear input-output functions (1). The result of such a decorrelation is shown in Fig. 1(c); it is clearly missing the structure of the model filter. However, it is possible to obtain a good estimate of the model filter by maximizing information directly, see panel (d). The typical progress of the simulated annealing algorithm with decreasing temperature T is shown in panel (f): there we plot both the information along the vector and its projection on ê_1. The final value of the projection depends on the size of the data set, see below. In the example shown in Fig. 1 there were approximately 50,000 spikes, with an average probability of a spike of about 0.05 per frame.

Having reconstructed the RF, one can proceed to sample the nonlinear input-output function. This is done by constructing histograms for P(s·v_max) and P(s·v_max | spike) of the projections onto the vector v_max found by maximizing information, and taking their ratio. In Fig. 1(e) we compare P(spike|s·v_max) (crosses) with the probability P(spike|s_1) used in the model (solid line).
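
The optimization described in Section 2, successive line maximizations along the gradient with occasional downhill moves accepted with Boltzmann probability at a decreasing effective temperature, can be sketched as follows. For brevity the gradient is estimated by finite differences rather than from the analytic expression (6), and the step size, starting temperature, and cooling factor are illustrative assumptions:

```python
import numpy as np

def maximize_information(info_fn, v0, n_steps=300, step=0.2, t0=1e-2, cooling=0.95,
                         eps=1e-3, rng=None):
    """Gradient ascent on I(v) with simulated-annealing acceptance.

    info_fn : callable mapping a direction v -> information in bits
              (e.g. the histogram estimator of Eq. (5))
    v0      : (D,) starting direction
    """
    rng = np.random.default_rng() if rng is None else rng
    v = np.asarray(v0, dtype=float)
    v /= np.linalg.norm(v)
    temperature = t0
    best_v, best_i = v.copy(), info_fn(v)

    for _ in range(n_steps):
        # Finite-difference estimate of the gradient of I(v).  Information is
        # invariant to the length of v, so only the component of the gradient
        # orthogonal to v matters; project out the radial part.
        grad = np.array([(info_fn(v + eps * e) - info_fn(v - eps * e)) / (2 * eps)
                         for e in np.eye(v.size)])
        grad -= (grad @ v) * v

        # Line move along the gradient, renormalized to unit length.
        v_new = v + step * grad
        v_new /= np.linalg.norm(v_new)

        i_old, i_new = info_fn(v), info_fn(v_new)
        # Accept improvements always; accept decreases with Boltzmann probability.
        if i_new >= i_old or rng.random() < np.exp((i_new - i_old) / temperature):
            v = v_new
            if i_new > best_i:
                best_v, best_i = v.copy(), i_new
        temperature *= cooling              # cool after each line maximization
    return best_v
```

With spikes from the model cell above and the estimator of (5) passed in as info_fn, the returned unit vector plays the role of v_max in Fig. 1(d); the input-output function can then be read off from the ratio of the spike-conditional and prior histograms of the projection onto it, as in Fig. 1(e).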

3.2 Estimated deviation from the optimal direction

When information is calculated with respect to a finite data set, the vector v_max which maximizes I will deviate from the true RF ê_1. The deviation δv arises because the probability distributions are estimated from experimental histograms and differ from the distributions found in the limit of infinite data size.

Figure 2: The projection of the vector v_max that maximizes information on the RF ê_1, plotted as a function of the number of spikes to show the linear scaling in 1/N_spike (solid line is a fit).

For a simple cell, the quality of reconstruction can be characterized by the projection v̂_max·ê_1, where both v̂_max and ê_1 are normalized and δv is by definition orthogonal to ê_1. The deviation δv = A^(-1)∇I, where A is the Hessian of the information. Its structure is similar to that of a covariance matrix:

A_ij = (1/ln 2) ∫ dx P(x|spike) [ (d/dx) ln( P(x|spike)/P(x) ) ]² ( ⟨s_i s_j|x⟩ - ⟨s_i|x⟩⟨s_j|x⟩ ).    (8)

When averaged over possible outcomes of N trials, the gradient of information is zero for the optimal direction. Here, in order to evaluate ⟨δv²⟩ = ⟨∇I^T A^(-2) ∇I⟩, we need to know the variance of the gradient of I. By discretizing both the space of stimuli and the possible projections, and assuming that the probability of generating a spike is independent for different bins, one can obtain ⟨∇I_i ∇I_j⟩ = A_ij / (N_spike ln 2). Therefore the expected error in the reconstruction of the optimal filter is inversely proportional to the number of spikes and is given by

⟨δv²⟩ = Tr'[A^(-1)] / (N_spike ln 2),    (9)

where Tr' means that the trace is taken in the subspace orthogonal to the model filter, since by definition δv·ê_1 = 0. In Fig. 2 we plot the average projection of the normalized reconstructed vector v̂_max on the RF ê_1, and show that it scales with the number of spikes.
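
Eq. (9) can be used as a quick estimate of how many spikes are needed for a desired accuracy. The small helper below, our own construction rather than part of the paper, converts an estimate of the Hessian into the expected projection of the reconstructed filter on the true one, using v̂_max·ê_1 ≈ 1 - ⟨δv²⟩/2 for small deviations:

```python
import numpy as np

def expected_projection(hessian_orth, n_spikes):
    """Expected projection of the normalized reconstructed filter on the true RF.

    hessian_orth : (D-1, D-1) Hessian A of the information, restricted to the
                   subspace orthogonal to the model filter (Eq. (8))
    n_spikes     : number of recorded spikes
    Uses <dv^2> = Tr'[A^-1] / (N_spike ln 2) from Eq. (9).
    """
    dv2 = np.trace(np.linalg.inv(hessian_orth)) / (n_spikes * np.log(2))
    return 1.0 - 0.5 * dv2
```

The 1/N_spike dependence of ⟨δv²⟩ is what produces the linear trend in Fig. 2.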

3.3 Complex Cell

A sequence of spikes from a model cell with two relevant directions was simulated by projecting each of the stimuli on vectors ê_1 and ê_2 that differ by π/2 in their spatial phase, taken to mimic properties of complex cells, see Fig. 3. A particular frame leads to a spike according to a logical OR, that is, if either s·ê_1, -s·ê_1, s·ê_2, or -s·ê_2 exceeds a threshold value θ in the presence of noise. Similarly to (7),

P(spike|s) / P(spike) = f(s_1, s_2) = ⟨ H(|s_1| - θ + ξ_1) OR H(|s_2| - θ + ξ_2) ⟩,    (10)

where ξ_1 and ξ_2 are independent Gaussian variables. The sampling of this input-output function by our particular set of natural stimuli is shown in Fig. 3(c). Some combinations of values of s_1 and s_2, especially large ones, are not present in the ensemble.

We start by maximizing information with respect to one direction. Contrary to the analysis for a simple cell, one optimal direction recovers only about 60% of the total information per spike. This is significantly different from the total; indeed, for stimuli drawn from natural scenes, due to correlations even a random vector has a high probability of explaining 60% of the total information per spike. We therefore go on to maximize information with respect to two directions. An example of the reconstruction of the input-output function of a complex cell is given in Fig. 3. The vectors v_1 and v_2 that maximize I(v_1, v_2) are not orthogonal, and are also rotated with respect to ê_1 and ê_2. However, the quality of reconstruction is independent of the particular choice of basis within the RS. The appropriate measure of similarity between the two planes is the dot product of their normals, (ê_1 × ê_2)·(v_1 × v_2); in the example of Fig. 3 this overlap is close to unity. Maximizing information with respect to two directions requires a significantly slower cooling rate, and consequently longer computational times. However, the expected error in the reconstruction, 1 - (ê_1 × ê_2)·(v_1 × v_2), follows a 1/N_spike behavior, similarly to (9), and is roughly twice that for a simple cell given the same number of spikes.

Figure 3: Analysis of a model complex cell with relevant directions ê_1 and ê_2 shown in (a) and (b). Spikes are generated according to an OR input-output function f, with the threshold θ and noise variance σ² expressed as fractions of the range of projections over the stimulus ensemble. Panel (c) shows how the input-output function is sampled by our ensemble of stimuli; dark pixels for large values of s_1 and s_2 correspond to combinations of projections that are absent from the ensemble. Panels (d) and (e) show the vectors v_1 and v_2 found by maximizing information I(v_1, v_2), together with (f) the corresponding input-output function with respect to the projections s·v_1 and s·v_2.

In conclusion, features of the stimulus that are most relevant for generating the response of a neuron can be found by maximizing the information between the sequence of responses and the projection of stimuli on trial vectors within the stimulus space. Calculated in this manner, information becomes a function of direction in the stimulus space. Those directions that maximize the information and account for the total information per response of interest span the relevant subspace. This analysis allows the reconstruction of the relevant subspace without assuming a particular form of the input-output function. The function can be strongly nonlinear within the relevant subspace, and is to be estimated from experimental histograms. Most importantly, this method can be used with any stimulus ensemble, even those that are strongly non-Gaussian, as in the case of natural images.

Acknowledgments

We thank K. D. Miller for many helpful discussions. Work at UCSF was supported in part by the Sloan and Swartz Foundations and by a training grant from the NIH. Our collaboration began at the Marine Biological Laboratory in a course supported by grants from NIMH and the Howard Hughes Medical Institute.

References

[1] F. Rieke, D. A. Bodnar, and W. Bialek. Naturalistic stimuli increase the rate and efficiency of information transmission by primary auditory afferents. Proc. R. Soc. Lond. B, 262:259-265, 1995.
[2] W. E. Vinje and J. L. Gallant. Sparse coding and decorrelation in primary visual cortex during natural vision. Science, 287:1273-1276, 2000.
[3] F. E. Theunissen, K. Sen, and A. J. Doupe. Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20:2315-2331, 2000.
[4] G. D. Lewen, W. Bialek, and R. R. de Ruyter van Steveninck. Neural coding of naturalistic motion stimuli. Network: Comput. Neural Syst., 12:317-329, 2001.
[5] N. J. Vickers, T. A. Christensen, T. Baker, and J. G. Hildebrand. Odour-plume dynamics influence the brain's olfactory code. Nature, 410:466-470, 2001.
[6] K. Sen, F. E. Theunissen, and A. J. Doupe. Feature analysis of natural sounds in the songbird auditory forebrain. J. Neurophysiol., 86:1445-1458, 2001.
[7] D. L. Ringach, M. J. Hawken, and R. Shapley. Receptive field structure of neurons in monkey visual cortex revealed by stimulation with natural image sequences. Journal of Vision, 2:12-24, 2002.
[8] W. E. Vinje and J. L. Gallant. Natural stimulation of the nonclassical receptive field increases information transmission efficiency in V1. J. Neurosci., 22:2904-2915, 2002.
[9] D. L. Ruderman and W. Bialek. Statistics of natural images: scaling in the woods. Phys. Rev. Lett., 73:814-817, 1994.
[10] D. J. Field. Relations between the statistics of natural images and the response properties of cortical cells. J. Opt. Soc. Am. A, 4:2379-2394, 1987.
[11] P. Kara, P. Reinagel, and R. C. Reid. Low response variability in simultaneously recorded retinal, thalamic, and cortical neurons. Neuron, 27:635-646, 2000.
[12] R. R. de Ruyter van Steveninck, G. D. Lewen, S. P. Strong, R. Koberle, and W. Bialek. Reproducibility and variability in neural spike trains. Science, 275:1805-1808, 1997.
[13] F. Rieke, D. Warland, R. R. de Ruyter van Steveninck, and W. Bialek. Spikes: Exploring the Neural Code. MIT Press, Cambridge, 1997.
[14] E. de Boer and P. Kuyper. Triggered correlation. IEEE Trans. Biomed. Eng., 15:169-179, 1968.
[15] N. Brenner, W. Bialek, and R. R. de Ruyter van Steveninck. Adaptive rescaling maximizes information transmission. Neuron, 26:695-702, 2000.
[16] R. R. de Ruyter van Steveninck and W. Bialek. Real-time performance of a movement-sensitive neuron in the blowfly visual system: coding and information transfer in short spike sequences. Proc. R. Soc. Lond. B, 234:379-414, 1988.
[17] N. Tishby, F. C. Pereira, and W. Bialek. The information bottleneck method. In Proceedings of the 37th Allerton Conference on Communication, Control and Computing, edited by B. Hajek and R. S. Sreenivas. University of Illinois, 368-377, 1999.
[18] A. G. Dimitrov and J. P. Miller. Neural coding and decoding: communication channels and quantization. Network: Comput. Neural Syst., 12:441-472, 2001.
[19] L. Paninski. Convergence properties of some spike-triggered analysis techniques. In Advances in Neural Information Processing Systems 15, 2003.
[20] N. Brenner, S. P. Strong, R. Koberle, W. Bialek, and R. R. de Ruyter van Steveninck. Synergy in a neural code. Neural Comp., 12:1531-1552, 2000.
[21] A. Treves and S. Panzeri. The upward bias in measures of information derived from limited data samples. Neural Comp., 7:399-407, 1995.
[22] S. P. Strong, R. Koberle, R. R. de Ruyter van Steveninck, and W. Bialek. Entropy and information in neural spike trains. Phys. Rev. Lett., 80:197-200, 1998.