CONTINUOUS TIME CORRELATION ANALYSIS TECHNIQUES FOR SPIKE TRAINS


CONTINUOUS TIME CORRELATION ANALYSIS TECHNIQUES FOR SPIKE TRAINS

By IL PARK

A THESIS PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

UNIVERSITY OF FLORIDA 2007

© 2007 Il Park

Memmings are Memmings, computers are recursive, brains are brains.

ACKNOWLEDGMENTS

I thank my adviser Dr. José C. Príncipe for all his great guidance, my committee member Dr. John Harris for insightful suggestions, and Dr. Thomas B. DeMarse for his knowledge and intuition on experiments. I thank my collaborators António R. C. Paiva and Karl Dockendorf for all the joyful discussions. I also thank Dongming Xu (dynamics), Jian-Wu Xu (RKHS), Vaibhav Garg, Manu Rastogi, Savyasachi Singh (chess), Allen Martins (pdf), Yiwen Wang and Ayşegül Gündüz of CNEL, Jason T. Winters, Alex J. Cadotte, Hany Elmariah (singing) and Nicky Grimes of the Neural Robotics and Neural Computation Lab for their support and help. Last but not least, I thank my family and friends for being there.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER
1 INTRODUCTION
  1.1 Motivation
    1.1.1 Why Do We Analyze Spike Trains?
    1.1.2 What Are Similar Spike Trains?
  1.2 Minimal Notation
2 CROSS INFORMATION POTENTIAL
  2.1 Smoothed Spike Train Representation
  2.2 L2 Metric
  2.3 Cauchy-Schwarz Dissimilarity
  2.4 Information Potential
  2.5 Discussion
    2.5.1 Comparison of Distances
    2.5.2 Robustness to Jitter in the Spike Timings
3 INSTANTANEOUS CROSS INFORMATION POTENTIAL
  3.1 Synchrony Detection Problem
  3.2 Instantaneous CIP
    3.2.1 Derivation from CIP
    3.2.2 Spatial Averaging
    3.2.3 Rescaling ICIP
  3.3 Analysis
    3.3.1 Sensitivity to Number of Neurons
  3.4 Results
    3.4.1 High-order Synchronized Spike Trains
    3.4.2 Mirollo-Strogatz Model
4 CONTINUOUS CROSS CORRELOGRAM
  4.1 Delay Estimation Problem
  4.2 Continuous Correlogram
  4.3 Algorithm
  4.4 Results
    4.4.1 Analysis
    4.4.2 Examples
  4.5 Discussion
5 CONCLUSION
  5.1 Summary of Contribution
  5.2 Potential Applications and Future Work

APPENDIX
A BACKGROUND
  A.1 Point Process
    A.1.1 An Alternative Representation of Poisson Process
    A.1.2 Filtered Poisson Process
  A.2 Mean Square Calculus
  A.3 Probability Density Estimation
  A.4 Information Theoretic Learning
  A.5 Reproducing Kernel Hilbert Space
B STATISTICAL PROOFS
C NOTATION
D SOURCE CODE
  D.1 CIP
  D.2 ICIP
  D.3 CCC

REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

A-1 Various probability density estimation kernels

LIST OF FIGURES

2-1 L2 distance versus CS divergence
2-2 Distance difference of CS divergence for a synchronized or uncorrelated missing spike
2-3 Change in CIP versus jitter standard deviation in the synchronous spike timings
3-1 Spike train as a realization of point process and smoothed spike train
3-2 Variance in scaled CIP versus the number of spike trains used for spatial averaging in log scale
3-3 Analysis of ICIP as a function of synchrony
3-4 Evolution of synchrony in the spiking neural network
3-5 Zero-lag cross-correlation for comparison
4-1 Example of cross correlogram construction
4-2 Decomposition and shift of the multiset A
4-3 Effect of the length of spike train and strength of connectivity on precision of delay estimation
4-4 Effect of kernel size (bin size) of CCC (CCH) on the performance
4-5 Schematic diagram for the configuration of neurons
4-6 Comparison between CCC and CCH on synthesized data
4-7 Effect of length of spike trains on CCC and CCH
4-8 Correlograms for in vitro data

Abstract of Thesis Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Master of Science

CONTINUOUS TIME CORRELATION ANALYSIS TECHNIQUES FOR SPIKE TRAINS

By Il Park

May 2007

Chair: José Carlos Príncipe
Major: Electrical and Computer Engineering

Correlation is the most basic analysis tool for time series. To apply correlation to the trains of action potentials generated by neurons, the conventional method is to discretize time. However, time binning is not optimal: time resolution is sacrificed, and it introduces the notorious problem of bin size sensitivity. Since spike trains can be considered realizations of a point process, the signal has no amplitude and all information is embedded in the times of occurrence. Instead of time binning, we propose a set of methods based on kernel smoothing to analyze the correlations. Smoothing is done in continuous time, so we do not lose the exact times of spikes while enabling interaction between spikes at a distance. We present three techniques derived from correlation: (1) a spike train similarity measure, (2) a synchrony detection mechanism, and (3) a continuous cross correlogram.

CHAPTER 1
INTRODUCTION

1.1 Motivation

Signal processing tools such as adaptive filtering, least squares, detection theory, clustering, and spectral analysis have given engineers the power to analyze virtually any signal. However, the application of such tools to the signal of the nervous system, the spike train, has remained restricted. This is mainly because of the poor performance of the usual estimators of statistical quantities, such as the mean and the correlation function, for point process observations.

The foundation of signal processing tools is $L_2$, the metric space of random processes with finite second-order moment, which is a well-defined Hilbert space. The metric (distance measure) on random processes provides a continuous spectrum of similar signals, a friendly space analogous to Euclidean space. Moreover, the distance is strongly related to correlation, which is the inner product in $L_2$. While point processes can in theory be treated in the same way, the main problem is estimating the process from the observation. In contrast to analog and digital signals, the traditional distance estimator between two point process observations yields natural numbers, which are discrete rather than continuous, so the spectrum of signals is lost. The discrete metric makes it inappropriate to apply the signal processing tools directly to spike trains.

The neuroscience literature has used several approaches to overcome this difficulty. The most widely used approach is to apply time bins to convert the times of occurrence into a sequence of binary amplitudes or a discrete time series. Recently, van Rossum proposed a metric for spike trains [1], which is related to a non-Euclidean metric proposed by Victor and coworkers that extends the Levenshtein distance (also known as the edit distance in computer science) to continuous time [2]. Many neuroscientists were already using the van Rossum distance by intuition in the form of correlation [3-6].

We map the spike trains to realizations of a random process in $L_2$, so that traditional signal processing techniques can be readily applied. We will analyze

the properties of the mapping and the metric induced by the mapping. One of the advantages of this approach is that, by choosing the mapping appropriately, the computational cost can be minimized while the time resolution remains continuous. We will derive correlation-based measures in this space and recover the power of signal processing tools for spike trains. Specifically, we propose three techniques: (1) the cross information potential (CIP), a similarity measure between spike trains based on correlation; (2) the instantaneous cross information potential (ICIP), a measure of instantaneous synchrony among spike trains; and (3) the continuous cross correlogram (CCC), an extension of CIP to continuous time lags. All of the proposed techniques have efficient computation mechanisms and are accompanied by statistical analysis.

1.1.1 Why Do We Analyze Spike Trains?

Neurons communicate mainly through series of action potentials, although there is increasing evidence that field potentials are also essential in the brain [7]. Action potentials are generated by the complex dynamics of a neuron [8, 9] and have a stereotypical shape that can propagate over long distances and resist noise because of its all-or-none type of transmission. There is evidence that not only the existence of an action potential carries information: the duration of the action potential is systematically modulated [10], and recently it was shown that even subthreshold dendritic input can modulate synaptic terminals [11]. However, from the computational point of view, it is believed that the temporal structure of the action potentials is more important than the individual details of an action potential. Experiments, mainly in sensory encoding, demonstrate precise timing (or precise time to first spike) of action potentials ([12-14], see [15] for a review, and [16] for arguments against it), which supports the idea of encoding information in spike times. The precision of spike timings is below 100 µs in the auditory system [17] and on the order of 1 ms in other experiments [14].

The other reason that spike trains are widely studied is that they are relatively easy to record with high accuracy and precision. Extracellular electrode arrays permit recording from massive numbers of neurons simultaneously, in vivo and in vitro. Many methods have been developed to analyze spike trains for various problems, including correlation analysis [18], connectivity estimation [19, 20], delay estimation [21], system identification [22], clustering of different spike patterns [4, 23], entropy estimation [24-27], and neural decoding [28, 29]. We will tackle some of these problems with the proposed techniques.

1.1.2 What Are Similar Spike Trains?

As mentioned in section 1.1.1, the spike times produced by neurons in response to a repeated stimulus often show precise timing with some error. The jitter error distribution fits a Gaussian distribution [13]. Possible noise sources are thermal noise, ion channels, probabilistic synapse activation, and spontaneous release of vesicles. When the spike train is modeled by a Poisson process, the jitter noise restricts the shape of the intensity function (instantaneous firing rate) over time. In other words, the noise limits the narrowness of a precisely timed spike. In addition, this implies that spike trains with small timing differences should be treated as similar to each other, thus having a small distance (or dissimilarity¹).

We can exploit this and construct a probable intensity function from a spike train by using the techniques of kernel density estimation. The kernel, which represents the jitter timing distribution, is placed where the spikes actually occurred, and the summation of all kernels estimates the intensity function assuming a Poisson process. Nawrot and coworkers tried various kernels for single trial estimation of the intensity

¹ Distance usually refers to a mathematical metric which satisfies positivity, reflexivity, definiteness, symmetry, and the triangle inequality. However, we will also refer to dissimilarity measures that lack the triangle inequality as a distance, informally and interchangeably with dissimilarity.

function from spike trains in a model, and concluded that the kernel size (bandwidth) is more important than the shape of the kernel [30].

Another type of noise in spike trains is insertion or deletion of spikes. Although spike trains of neurons conserve high precision of spike timings when the spikes occur, there is evidence that neurons often skip a few spikes [4, 31, 32]. When a spike is inserted into or removed from a spike train, the distance differs by the constant $\frac{1}{2}$ in the van Rossum distance. In contrast, a correlation measure does not depend on signal power (or the number of action potentials), but only on the coincident action potential pairs. In applications such as classification of spike trains with template matching, the correlation-based distance measure (Cauchy-Schwarz divergence) can perform better than the van Rossum ($L_2$) distance.

The concept of coincident spikes leads to synchrony between spike trains. In addition, there is strong evidence that neurons and dendrites work as coincidence detectors and are sensitive to afferent synchrony [26, 33-36].

1.2 Minimal Notation

We introduce minimal mathematical notation. We assume that a number of spike trains are observed and indexed. Each spike train is a finite set of spike timings at which action potentials are detected. For the spike train indexed by $i$, individual timings are denoted as $t^i_m$, where $m$ is the index for spikes. The functional form of the $i$-th spike train is defined as

$$s_i(t) = \sum_{m=1}^{N_i} \delta(t - t^i_m) \tag{1-1}$$

where $N_i$ is the number of spikes in the $i$-th spike train, and $\delta(\cdot)$ is the Dirac delta function.

CHAPTER 2
CROSS INFORMATION POTENTIAL

2.1 Smoothed Spike Train Representation

Given a spike train $s_i(t)$, we assume an inhomogeneous Poisson process and estimate the intensity function by using a kernel. The kernel has to be non-negative and have area 1; that is, it has to be a proper probability density function. Denote this kernel as $\kappa_{pdf}(t)$; then the estimated intensity function can be written as

$$\hat{\lambda}_i(t) = \sum_{m=1}^{N_i} \kappa_{pdf}(t - t^i_m). \tag{2-1}$$

This process can also be viewed as low pass filtering of the spike train to estimate the post synaptic potential of synapses. In the point process literature this is a special case of a filtered point process, and in the engineering literature it is known as shot noise.¹ The estimated intensity function is continuous if $\kappa_{pdf}$ is continuous. Assuming a continuous $\kappa_{pdf}$, the mapping of equation (2-1) converts a spike train to a continuous signal that can be interpreted with second order theory and a continuous metric. Note that the mapping is one-to-one and onto: deconvolution of $\hat{\lambda}_i(t)$ with $\kappa_{pdf}$ uniquely determines a spike train.

2.2 $L_2$ Metric

The smoothed spike train, or estimated intensity function, can be considered as a signal in $L_2$. The $L_2$ distance between two smoothed spike trains is

$$\|\hat{\lambda}_i(t) - \hat{\lambda}_j(t)\|_2^2 = \int \left(\hat{\lambda}_i(t) - \hat{\lambda}_j(t)\right)^2 dt \tag{2-2a}$$
$$= \int \left(\hat{\lambda}_i^2(t) - 2\hat{\lambda}_i(t)\hat{\lambda}_j(t) + \hat{\lambda}_j^2(t)\right) dt. \tag{2-2b}$$

¹ When the underlying process is a homogeneous Poisson process, the filtered point process is wide sense stationary (WSS) by Campbell's theorem (see appendix, theorem 3).

Using the definition of the estimator (2-1),

$$\int \hat{\lambda}_i^2(t)\, dt = \sum_{m=1}^{N_i} \sum_{n=1}^{N_i} \int \kappa_{pdf}(t - t^i_m)\,\kappa_{pdf}(t - t^i_n)\, dt \tag{2-3a}$$
$$= \sum_{m=1}^{N_i} \sum_{n=1}^{N_i} \kappa(t^i_m - t^i_n) \tag{2-3b}$$

and the cross term (inner product in $L_2$) becomes

$$\int \hat{\lambda}_i(t)\hat{\lambda}_j(t)\, dt = \sum_{m=1}^{N_i} \sum_{n=1}^{N_j} \kappa(t^i_m - t^j_n) \tag{2-3c}$$

where $\kappa(t) = \int \kappa_{pdf}(s)\,\kappa_{pdf}(s + t)\, ds$. $\kappa$ is the kernel which computes the correlation. If an exponential distribution is used, i.e.,

$$\kappa_{pdf}(t) = \frac{1}{\tau} e^{-\frac{t}{\tau}} u(t), \tag{2-4}$$

where $u(t)$ is the unit step function, then the $L_2$ distance is proportional to the van Rossum distance with factor $\frac{1}{\tau}$. In addition, the combined kernel $\kappa(t)$ becomes a scaled Laplace distribution kernel:

$$\int \hat{\lambda}_i(t)\hat{\lambda}_j(t)\, dt = \frac{1}{\tau^2} \sum_{m=1}^{N_i} \sum_{n=1}^{N_j} \int \exp\left(-\frac{t - t^i_m}{\tau}\right) u(t - t^i_m) \exp\left(-\frac{t - t^j_n}{\tau}\right) u(t - t^j_n)\, dt \tag{2-5}$$
$$= \frac{1}{\tau^2} \sum_{m=1}^{N_i} \sum_{n=1}^{N_j} \int \exp\left(-\frac{2t - t^i_m - t^j_n}{\tau}\right) u(t - t^i_m)\, u(t - t^j_n)\, dt \tag{2-6}$$
$$= \frac{1}{\tau^2} \sum_{m=1}^{N_i} \sum_{n=1}^{N_j} \int_{\max(t^i_m,\, t^j_n)}^{\infty} \exp\left(-\frac{2t - t^i_m - t^j_n}{\tau}\right) dt \tag{2-7}$$
$$= \frac{1}{\tau^2} \sum_{m=1}^{N_i} \sum_{n=1}^{N_j} \frac{\tau}{2} \exp\left(-\frac{2\max(t^i_m,\, t^j_n) - t^i_m - t^j_n}{\tau}\right) \tag{2-8}$$
$$= \sum_{m=1}^{N_i} \sum_{n=1}^{N_j} \frac{1}{2\tau} \exp\left(-\frac{|t^i_m - t^j_n|}{\tau}\right). \tag{2-9}$$

Note that in terms of a linear filter, the causal exponential distribution corresponds to a first-order infinite impulse response (IIR) filter with time constant $\tau$ and gain $\frac{1}{\tau}$.
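For concreteness, the closed form (2-9) is easy to evaluate directly from the spike timings. The following is a minimal sketch (the function names and the example values are ours, not the code of appendix D):

import numpy as np

def l2_inner(ti, tj, tau):
    """Closed-form L2 inner product of two exponentially smoothed spike
    trains, eq (2-9): sum_{m,n} exp(-|t_m^i - t_n^j| / tau) / (2 * tau)."""
    d = np.subtract.outer(np.asarray(ti), np.asarray(tj))
    return np.exp(-np.abs(d) / tau).sum() / (2.0 * tau)

def l2_dist_sq(ti, tj, tau):
    """Squared L2 distance (2-2b), proportional to the van Rossum distance."""
    return l2_inner(ti, ti, tau) - 2.0 * l2_inner(ti, tj, tau) \
           + l2_inner(tj, tj, tau)

# Example: two spike trains (times in seconds), 1 ms time constant.
print(l2_dist_sq([0.003, 0.008], [0.006], tau=0.001))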

2.3 Cauchy-Schwarz Dissimilarity

An alternative dissimilarity measure that can be induced from the inner product of $L_2$ is the Cauchy-Schwarz (CS) divergence. Recall the Cauchy-Schwarz inequality (see lemma 6): $\langle x, y \rangle \le \|x\|\,\|y\|$. Since each side is positive when $x$ and $y$ are nonzero (the intensity functions are non-negative), we can divide both sides:

$$1 \le \frac{\|x\|\,\|y\|}{\langle x, y \rangle}.$$

By taking the logarithm,

$$0 \le \log\left(\frac{\|x\|\,\|y\|}{\langle x, y \rangle}\right).$$

It can be proved that this quantity is positive, reflexive, and symmetric [37] if we exclude $0$ from the space. However, the CS divergence does not satisfy the triangle inequality, so it is not a metric. By expanding the definition of the inner product and norm of the $L_2$ space,

$$d_{CS}(\hat{\lambda}_i(t), \hat{\lambda}_j(t)) = \log \frac{\sqrt{\int \hat{\lambda}_i^2(t)\, dt \int \hat{\lambda}_j^2(t)\, dt}}{\int \hat{\lambda}_i(t)\hat{\lambda}_j(t)\, dt} \tag{2-10a}$$
$$= \log \frac{\sqrt{\sum_{m=1}^{N_i} \sum_{n=1}^{N_i} \kappa(t^i_m - t^i_n) \sum_{m=1}^{N_j} \sum_{n=1}^{N_j} \kappa(t^j_m - t^j_n)}}{\sum_{m=1}^{N_i} \sum_{n=1}^{N_j} \kappa(t^i_m - t^j_n)} \tag{2-10b}$$
$$= \frac{1}{2} \log\left(\sum_{m=1}^{N_i} \sum_{n=1}^{N_i} \kappa(t^i_m - t^i_n) \sum_{m=1}^{N_j} \sum_{n=1}^{N_j} \kappa(t^j_m - t^j_n)\right) - \log \sum_{m=1}^{N_i} \sum_{n=1}^{N_j} \kappa(t^i_m - t^j_n), \tag{2-10c}$$

where $d_{CS}$ denotes the CS divergence. If the spike trains are homogeneous Poisson with firing rates $\lambda_i$ and $\lambda_j$ respectively, the expected value of the norm of the estimated intensity function, $E[\hat{\lambda}_i^2(t)]$, is the second order

moment of the shot noise, which can be obtained from equation (A-8):

$$E\left[\hat{\lambda}_i^2(t)\right] = \lambda_i \int \kappa^2(t)\, dt. \tag{2-11}$$

Therefore the first term in equation (2-10c) can be approximated as a constant. However, depending on the correlation of the spike trains, the second term will vary. Since the negative logarithm is a monotonically decreasing function, we take its argument, denote it $V_{ij}$, and define it as the cross information potential, for reasons explained in section 2.4:

$$V_{ij} = \sum_{m=1}^{N_i} \sum_{n=1}^{N_j} \kappa(t^i_m - t^j_n). \tag{2-12}$$

This inner product term is essentially equivalent to the correlation of the smoothed spike trains. CIP is inversely related to the CS divergence, so it quantifies similarity between spike trains.

2.4 Information Potential

Given a probability distribution, entropy quantifies its peakiness and is related to the higher order moments that the variance cannot capture. Rényi's entropy is a generalization of the classic Shannon entropy, and its estimator is the central quantity of information theoretic learning (see section A.4 for a summary of the information theoretic learning framework).

An inhomogeneous Poisson process can be represented by two separate random variables: one for the number of spikes and the other for the temporal density (see section A.1.1). The pdf for the temporal density is simply a normalized form of the intensity function (equation (A-2)). This pdf does not carry the information of how active the process is, that is, the firing rate. The information potential of the density function estimated using a Parzen window with $\kappa_{pdf}$ for the $i$-th spike train has the following form (compare equation (A-16)):

$$V_i = \frac{1}{N_i^2} \sum_{m=1}^{N_i} \sum_{n=1}^{N_i} \kappa(t^i_m - t^i_n) \tag{2-13}$$

where $\kappa(t) = \int \kappa_{pdf}(s)\,\kappa_{pdf}(s + t)\, ds$ is defined as before. This coincides with the definition of the norm square of the smoothed spike train, equation (2-3b), normalized by the
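As a small sketch (function names are ours; the Laplacian kernel is the one induced by the causal exponential smoothing of section 2.2), the CIP and the CS divergence can be computed directly from spike timings:

import numpy as np

def cip(ti, tj, tau):
    """Cross information potential V_ij, eq (2-12), with a Laplacian kernel
    kappa(t) = exp(-|t| / tau) / (2 * tau)."""
    d = np.subtract.outer(np.asarray(ti), np.asarray(tj))
    return np.exp(-np.abs(d) / tau).sum() / (2.0 * tau)

def cs_divergence(ti, tj, tau):
    """Cauchy-Schwarz divergence, eq (2-10c)."""
    return 0.5 * np.log(cip(ti, ti, tau) * cip(tj, tj, tau)) \
           - np.log(cip(ti, tj, tau))

# Template-matching example of section 2.5.1: template 1 has spikes at
# 3 ms and 8 ms, template 2 a single spike at 6 ms.
print(cs_divergence([0.003, 0.008], [0.003, 0.008], tau=0.001))  # 0 (identical)
print(cs_divergence([0.003, 0.008], [0.006], tau=0.001))         # > 0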

number of spikes. For a pair of spike trains, the cross information potential can be defined as a similarity index between the corresponding pair of pdfs. Note that in terms of the CS divergence, the normalization by the number of spikes cancels away.

2.5 Discussion

2.5.1 Comparison of Distances

As mentioned earlier in section 1.1.2, although neurons fire with high temporal precision, they often miss spikes. In this case, the $L_2$ distance deviates because of the missing spike. The CS divergence is less sensitive because it ignores missing spikes. To demonstrate this, a simple classification task was performed (see figure 2-1). Two template spike trains were prepared: template 1 with 2 spikes at 3 ms and 8 ms, and template 2 with 1 spike at 6 ms. Then, we generated instances of template 1 by adding Gaussian jitter to the timings (blue circles) and by removing a spike (red dots).

Figure 2-1. $L_2$ distance versus CS divergence. Spike trains are generated from template 1, and the distance (or divergence) from each template is computed. Gaussian jitter with 0.7 ms standard deviation is added to the timings. Blue circles correspond to spike trains with the same number of spikes, and red dots correspond to spike trains with missing spikes. The kernel $\kappa$ was Laplacian with time constant $\tau = 1$ ms.

For the case with no missing spikes, both the $L_2$ distance (94%) and the CS divergence (100%) correctly classified the instances as template 1 (they lie on the upper half). But for the missing-spike

case, the $L_2$ distance (51%) performed much worse than the CS divergence (93%). The CS divergence shows lines when one spike is missing because the divergence is then the log of the kernel, which is a single Laplacian.

Figure 2-2. Increase or decrease in Cauchy-Schwarz (CS) divergence (dissimilarity) when a spike is missing. (Left) When an uncorrelated spike is missing, the decrease in divergence is inversely related to the total number of spikes. (Right) When a correlated (perfectly synchronized in this case) spike is missing, the increase in divergence depends on the number of synchronized spikes and is not greatly influenced by the total number of spikes (curves for 30 and 60 spikes total). In contrast, for the $L_2$ distance the increase and decrease are constant (see text for details).

Suppose the individual spikes are well separated compared to the kernel size, or exactly synchronized, so that we can approximate the norm and inner product by spike counts: the norm square of a spike train is its number of spikes, and the inner product gives the number of synchronized spikes. This is equivalent to making the kernel size infinitesimally small, so that it converges to a Dirac delta function. Let there be two spike trains $A$ and $T$ (for template) with $N_A$ and $N_T$ spikes respectively, and $N_{AT}$ synchronized spikes. The $L_2$ distance between $A$ and $T$ is then $N_A + N_T - 2N_{AT}$, and the CS divergence is $\log \frac{N_A N_T}{N_{AT}}$.

If we lose a spike that was not synchronous between $A$ and $T$, the distance decreases by the constant $1$ in the $L_2$ distance ($\frac{1}{2}$ in the van Rossum distance), and for the CS

divergence the decrease is

$$\log \frac{N_A N_T}{N_{AT}} - \log \frac{N_A (N_T - 1)}{N_{AT}} = \log \frac{N_A N_T}{N_{AT}} \cdot \frac{N_{AT}}{N_A (N_T - 1)} \tag{2-14}$$
$$= \log \frac{N_T}{N_T - 1}. \tag{2-15}$$

Thus, the more spikes there are, the less the CS divergence decreases for a missing non-synchronous spike. (And if the last spike is lost, the CS divergence is no longer defined.)

If a synchronized (correlated) spike is lost, $N_T$ and $N_{AT}$ are reduced by 1. The $L_2$ distance increases by 1, and for the CS divergence the increase is

$$\log \frac{N_A (N_T - 1)}{N_{AT} - 1} - \log \frac{N_A N_T}{N_{AT}} = \log \frac{(N_T - 1)\, N_{AT}}{N_T\, (N_{AT} - 1)}. \tag{2-16}$$

Therefore, the more synchronized spikes there are, the smaller this change in the divergence. See figure 2-2 for an illustrative example.

2.5.2 Robustness to Jitter in the Spike Timings

CIP was analyzed when jitter is present in the spike timings. This was done with a modified multiple interaction process (MIP) model [38, 39] where jitter, modeled as

i.i.d. Gaussian noise, was added to the individual spike timings. In the MIP model, an initial spike train is generated as a realization of a Poisson process. All spike trains are derived from this one by copying spikes with a probability $\varepsilon$; the operation is performed independently for each spike and for each spike train. The resulting spike trains are also Poisson processes. If $\gamma$ is the firing rate of the initial spike train, then the derived spike trains will have firing rate $\varepsilon\gamma$. Furthermore, it can be shown that $\varepsilon$ is also the count correlation coefficient [38]. A different interpretation of $\varepsilon$ is that, given a spike in one spike train, it quantifies the probability of a spike co-occurrence in another spike train.

The effect was then studied in terms of the synchrony level and the kernel size. Figure 2-3 shows the average CIP for 10 Monte Carlo runs of two spike trains, 10 seconds long, with a constant firing rate of 20 spikes/s. In the simulation, the synchrony level was varied between 0 (independent) and 0.5 for kernel sizes of 2 ms and 5 ms. The jitter standard deviation was varied between the ideal case (no jitter) and 15 ms.

Figure 2-3. Change in CIP versus jitter standard deviation in the synchronous spike timings. For the case with independent spike trains, error bars of one standard deviation are also shown. The kernel size is 2 ms (left) and 5 ms (right).

As mentioned earlier, CIP measures the coincidence of the spike timings. As a consequence, the presence of jitter in the spike timings decreases the expected value of CIP (and of the time averaged ICIP). Nevertheless, the results in figure 2-3 support the statement that the measure is indeed robust to levels of jitter that are large compared to the kernel size, and is capable of detecting the existence of synchrony among neurons. Of course, increasing the kernel size decreases the sensitivity of the measure for the same amount of jitter. Furthermore, as in the previous example, small levels of synchrony can be discriminated from the independent case, as suggested by the error bars in figure 2-3. Finally, we remark that the difference in scale between the panels is a consequence of the normalization of the kernel so that it is a valid pdf. This can be compensated explicitly by scaling the CIP by $\tau$. Simply note that the expressions provided in the previous example for the mean ICIP (and therefore CIP) as a function of the synchrony level implicitly compensate for $\tau$.
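A generator for such jittered MIP spike-train pairs can be sketched as follows (parameter names are ours; the mother-process rate is $\gamma = \lambda/\varepsilon$ so that each derived train has firing rate $\lambda$):

import numpy as np

def mip_pair(rate, eps, duration, jitter_sd=0.0, seed=0):
    """Two correlated Poisson spike trains from the MIP model: a mother
    Poisson process of rate rate/eps is thinned independently with
    probability eps for each train, so each train has firing rate `rate`
    and count correlation coefficient eps.  Optional i.i.d. Gaussian
    jitter is added to each derived spike time."""
    rng = np.random.default_rng(seed)
    n = rng.poisson(rate / eps * duration)
    mother = rng.uniform(0.0, duration, n)
    trains = []
    for _ in range(2):
        kept = mother[rng.random(n) < eps]       # independent thinning
        if jitter_sd > 0.0:
            kept = kept + rng.normal(0.0, jitter_sd, kept.size)
        trains.append(np.sort(kept))
    return trains

# Example: 20 spikes/s, synchrony level 0.3, 10 s, 2 ms jitter.
si, sj = mip_pair(rate=20.0, eps=0.3, duration=10.0, jitter_sd=0.002)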

CHAPTER 3
INSTANTANEOUS CROSS INFORMATION POTENTIAL

3.1 Synchrony Detection Problem

Coincident firing of different neurons has been a focus of interest in work ranging from synfire chains [40], neural coding [31, 41], neural assemblies [3], and the binding problem [42] to pulse coupled oscillators [43-47]. Analysis of synchrony has relied on various methods, such as the cross-correlation [48], the joint peri-stimulus time histogram (JPSTH) [49], unitary events [50], and the gravity transform [3], among many others.

Since CIP (or the CS divergence) characterizes the similarity (or dissimilarity) of spike trains through the correlation of spike times, CIP can also be used as a synchrony measure. However, CIP does not provide information about instantaneous synchrony. A sliding window approach can be used at the sacrifice of temporal resolution, as in cross correlation and the gravity transform.

3.2 Instantaneous CIP

3.2.1 Derivation from CIP

Let us break the integral range in the definition of the $L_2$ inner product (equation (2-3c)):

$$V_{ij}(t) = \int_{-\infty}^{t} \hat{\lambda}_i(\sigma)\hat{\lambda}_j(\sigma)\, d\sigma. \tag{3-1}$$

Taking the derivative with respect to time yields the ICIP,

$$v_{ij}(t) = \hat{\lambda}_i(t)\hat{\lambda}_j(t), \tag{3-2}$$

Figure 3-1. Spike train as a realization of a point process and the smoothed spike train. (a) Spike train of neuron $i$ represented in the time domain as a sequence of impulses and (b) its filtered counterpart using a causal decaying exponential.

by the fundamental theorem of calculus. Since the derivative provides the instantaneous change of CIP at that time, ICIP quantifies the instantaneous synchrony of the action potential timings. If we use the exponential kernel for intensity estimation, ICIP can be estimated with two IIR filters and a multiplication, requiring no memory, just two state variables, as in the sketch below.
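A minimal sketch of this idea (our function names; a uniform evaluation grid is an assumption): each train is smoothed by the causal exponential kernel through a one-state recursion, and the product of the two filter states is the ICIP.

import numpy as np

def icip(spikes_i, spikes_j, t_end, dt=1e-4, tau=2e-3):
    """ICIP v_ij(t) = lam_i(t) * lam_j(t) on a uniform grid, eq (3-2).
    Each train is smoothed with the causal exponential kernel
    (1/tau) exp(-t/tau) u(t); the smoother is a first-order IIR filter,
    so only one state variable per spike train is needed."""
    decay = np.exp(-dt / tau)
    qi = qj = 0.0
    ii = jj = 0
    si, sj = np.sort(spikes_i), np.sort(spikes_j)
    n = int(np.ceil(t_end / dt))
    v = np.empty(n)
    for k in range(n):
        t = k * dt
        qi *= decay                            # exponential decay of the states
        qj *= decay
        while ii < si.size and si[ii] <= t:    # add an impulse for each new spike
            qi += 1.0 / tau
            ii += 1
        while jj < sj.size and sj[jj] <= t:
            qj += 1.0 / tau
            jj += 1
        v[k] = qi * qj                         # instantaneous CIP
    return v

Spatial averaging over an ensemble (equation (3-3) below) is then just the mean of such pairwise products over all pair combinations.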

3.2.2 Spatial Averaging

In the context of neural assemblies, an ensemble of neurons works together with synchronous spikes. Current multielectrode recording technology has enabled the analysis of a number of simultaneously recorded spike trains. It is possible to reduce the trial averaging by combining the concept of neural assemblies with multiple spike train recordings; spatial averaging over the ensemble may provide high resolution of the events. Consider a set of $M$ spike trains. ICIP (and CIP) can be generalized to multiple spike trains in a straightforward manner by averaging over all the pairwise combinations. That is, the ensemble averaged ICIP is given by

$$\bar{v}(t) = \frac{2}{M(M-1)} \sum_{i=1}^{M} \sum_{j=i+1}^{M} v_{ij}(t). \tag{3-3}$$

Analysis of the spatial averaging is presented in section 3.3.1.

3.2.3 Rescaling ICIP

When precise timing is modulated by a fluctuation of the firing rate, the precision of the timing may vary. In high firing rate regions, the experimenter would like to pay more attention to more precise synchronizations, since the spikes are dense. Changing the kernel size according to the general firing rate trend may help in these cases. The time rescaling theorem states that an inhomogeneous Poisson process can be transformed into a homogeneous Poisson process [51, 52] by stretching time according to the intensity function. The transformation of equation (2-1) into a constant firing rate time scale depends on each spike train's individual intensity function, and therefore the transformed results are not synchronous. Thus, in order to quantify synchrony, the correlation operation should be performed in the original time, but with the smoothing done in the transformed space. A first order approximation of this can be achieved by redefining the intensity estimator as

$$\hat{\lambda}_i(t) = \frac{1}{\beta} \sum_{m=1}^{N_i} \exp\left(-\frac{\hat{f}_i(t)}{\beta}(t - t^i_m)\right) u(t - t^i_m) \tag{3-4}$$

where $\hat{f}_i(t)$ is also an estimate of the intensity function and $\beta > 0$ is a scaling constant which specifies the value of $\tau$ when the firing rate is one. Therefore, at time $t$, the effective time constant is approximately $\beta / \hat{f}_i(t)$. It may seem like an oxymoron to estimate the intensity function using an estimate of the intensity function, but $\hat{f}(t)$ is estimated with a broader kernel to capture the firing rate trend, while $\hat{\lambda}(t)$ uses a small kernel size that corresponds to the resolution of interest.

3.3 Analysis

3.3.1 Sensitivity to Number of Neurons

We now analyze the effect of the number of spike trains used for spatial averaging. This effect was studied with respect to two main factors: the synchrony level of the spike

trains and the exponential decay parameter $\tau$. In the first case a constant $\tau = 2$ ms was used, while the latter case considered only independent spike trains. The results are shown in figure 3-2 for the scaled CIP, spatially averaged over all pair combinations of neurons. The simulation was repeated for 200 Monte Carlo runs using 10 second long spike trains obtained as homogeneous Poisson processes with a firing rate of 20 spikes/s.

Figure 3-2. Variance of the scaled CIP versus the number of spike trains used for spatial averaging, in log scale. The analysis was performed for different levels of synchrony with constant $\tau = 2$ ms (left), and for different values of the exponential decay parameter $\tau$ on independent spike trains (right). In both plots the theoretical value of CIP for independent spike trains is shown (dashed line).

As illustrated in the figure, the variance of CIP decreases dramatically as the number of spike trains employed in the analysis increases. Recall that the number of pair combinations over which the averaging is performed increases with $M(M-1)$, where $M$ is the number of spike trains. As expected, this improvement is most pronounced in the case of independent spike trains. In this situation, the variance decreases in proportion to the number of averaged pairs of spike trains, as shown by the dashed line in the plots of figure 3-2. These results support the role and importance of ensemble averaging as a principled method to reduce the variance of the CIP estimator.

3.4 Results

3.4.1 High-order Synchronized Spike Trains

Figure 3-3 shows the ICIP for different levels of synchrony over ten spike trains. The synchrony was generated using the MIP model and was modulated over time in segments of 1 second duration. The firing rate of the generated spike trains was constant and equal to 20 spikes/s for all spike trains. The figure shows the ICIP averaged at each time instant over all pair combinations of spike trains. Because the spike trains have a constant firing rate, the time constant of the decaying exponential convolved with the spike trains was constant and chosen to be $\tau = 2$ ms. Also, in the bottom plot the average value of the mean ICIP is shown. This was computed in 25 ms steps with a causal 250 ms long sliding window. To establish the relevance of the measured values, the expectation, and this value plus two standard deviations, are also shown, assuming independence between spike trains. The mean and standard deviation, assuming independence, are $1$ and $\sqrt{\left(\frac{1}{2\tau\lambda} + 1\right)^2 - 1}$, respectively (see Appendix for details).

Figure 3-3. Analysis of ICIP as a function of synchrony. (Top) Level of synchrony specified in the simulation of the spike trains. (Upper middle) Raster plot of the firings. (Lower middle) Average ICIP across all neuron pair combinations. (Bottom) Time average of the ICIP in the plot above, computed in steps of 25 ms with a causal rectangular window 250 ms long (dark gray). For reference, the expected value (dashed line) and this value plus two standard deviations (dotted line) for independent neurons are also displayed, together with the expected value during moments of synchronous activity (thick light gray line), as obtained analytically from the level of synchrony used in the generation of the dataset. Furthermore, the mean and standard deviation of the ensemble averaged CIP scaled by $T$, measured from the data in one second intervals, are also shown (black).

The expected value of the ICIP when synchrony

among spike trains exists is given by $1 + \varepsilon/(2\tau\lambda)$, with $\lambda$ the firing rate of the two spike trains, and is also shown in the plot for reference.

In the figure, it is noticeable that the estimated synchrony increases as measured by ICIP. Moreover, the averaged ICIP is very close to the theoretical expected value and is typically below the expected maximum under an independence assumption, as given by the line indicating the mean plus two standard deviations. The delayed increase in the averaged ICIP is a consequence of the causal averaging of ICIP. It is equally remarkable that the (scaled) CIP matches precisely the expected values from ICIP as given analytically.

3.4.2 Mirollo-Strogatz Model

In this example, we show that ICIP can quantify synchrony in a spiking neural network of leaky integrate-and-fire (LIF) neurons designed according to [43],¹ and compare the result with the cross-correlation extended to multiple neurons. This is the simplest pulse coupled network that has been proven to synchronize perfectly from almost any initial condition (figure 3-4). The synchronization is essentially due to the leakiness and the weak global coupling among the oscillatory neurons. The raster plot of the network firing pattern is shown in figure 3-4. There are two main observations: the progressive synchronization of the firings associated with the global oscillatory behavior of the network, and the local grouping that tends to preserve local synchronizations, which either entrain the full network or wash out over time. As expected from theoretical studies of the network behavior [43, 46], and as ICIP depicts precisely, the synchronization is monotonically increasing, with a period of fast increase in the first second followed by a plateau and a slower increase as time advances.

¹ The parameters for the simulation are: 100 neurons, resting and reset membrane potential -60 mV, threshold -45 mV, membrane capacitance 300 nF, membrane resistance 1 MΩ, current injection 50 nA, synaptic weight 100 nV, synaptic time constant 0.1 ms, and an all-to-all excitatory connection topology.
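A minimal simulation sketch of this network, using the parameter values quoted in the footnote (the time step, the random initial conditions, and the treatment of the fast 0.1 ms synapse as an instantaneous voltage kick are our assumptions, not the thesis's implementation):

import numpy as np

def mirollo_strogatz_lif(n=100, t_end=5.0, dt=1e-4, seed=0):
    """All-to-all excitatory LIF network sketch (footnote parameters)."""
    rng = np.random.default_rng(seed)
    v_rest = v_reset = -60e-3
    v_th = -45e-3
    tau_m = 1e6 * 300e-9              # R*C = 1 MOhm * 300 nF = 0.3 s
    v_inf = v_rest + 1e6 * 50e-9      # v_rest + R*I = -10 mV, above threshold
    w = 100e-9                        # 100 nV excitatory kick per presynaptic spike
    v = rng.uniform(v_reset, v_th, n)     # random initial phases
    spike_times, spike_ids = [], []
    for step in range(int(t_end / dt)):
        v += dt / tau_m * (v_inf - v)     # leaky integration toward v_inf
        fired = v >= v_th
        k = np.count_nonzero(fired)
        if k:
            t = step * dt
            spike_times += [t] * k
            spike_ids += list(np.flatnonzero(fired))
            v[fired] = v_reset
            v[~fired] += w * k            # all-to-all excitatory coupling
    return np.array(spike_times), np.array(spike_ids)

The spike trains returned per neuron can then be fed to the ICIP estimator above to reproduce the qualitative behavior described in the text.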

Figure 3-4. Evolution of synchrony in the spiking neural network. (Top) Raster plot of the neuron firings. (Middle) ICIP over time. The inset highlights the merging of two synchronous groups. (Bottom) Information potential of the membrane potentials. This is a macroscopic variable describing the synchrony in the neurons' internal state.

Figure 3-5. Zero-lag cross-correlation computed over time using a sliding window 10 bins long, with bin size 1 ms (top) and 1.1 ms (bottom).

Moreover, it is possible to observe in the first 1.5 s the formation of a second group of synchronized neurons, which slowly merges into the main group.

Since the model was simulated, we also have access to all the internal variables: the membrane potential of individual neurons over time. Thus, we can compute the synchrony of the neurons in terms of membrane potential. Surprisingly, the information potential (IP) of the membrane potentials reveals the same evolution as the envelope of the ICIP, including the plateau. The IP was computed according to (A-18) using a Gaussian kernel of size 0.75 mV.² The IP measures synchrony of the neurons' internal state, which is only available in simulated networks. Yet the results show that ICIP was able to successfully and accurately extract such information from the observed spike trains.

For completeness, in figure 3-5 we also present the zero-lag cross-correlation over time, averaged over all pairwise combinations of neurons. The cross-correlation was computed with a sliding window 10 bins long, sliding 1 bin at a time. In the figure, the result is shown for bin sizes of 1 ms and 1.1 ms. It is notable that although the cross-correlation captures the general trends of synchrony, it masks the plateau and the final synchrony, and it is highly sensitive to the bin size as shown in the figure, unlike ICIP (data not shown). In other words, the results for the windowed cross-correlation show the importance of working in continuous time, which is crucial for robust synchrony estimation in the spike domain. Other methods relying on binning, such as the ones mentioned earlier, also suffer from sensitivity to the bin size. For this reason, these methods are limited and unable to achieve the same high temporal resolution as ICIP.

In addition, spike trains are generally non-stationary, unlike what some methods assume. The conventional approach is to use a moving window analysis such that only piece-wise

² The distance used in the Gaussian kernel was $d(\theta_i, \theta_j) = \min\left(|\theta_i - \theta_j|,\ 15\,\mathrm{mV} - |\theta_i - \theta_j|\right)$, where $\theta_i$ is the membrane potential of the $i$-th neuron. This wrap-around expresses the phase proximity of the neurons just before and just after firing.

stationarity is necessary. The information theoretic framework of ICIP and CIP treats the non-stationarity implicitly, as a pdf estimation problem.

CHAPTER 4
CONTINUOUS CROSS CORRELOGRAM

4.1 Delay Estimation Problem

Precise time delays in the transmission of spikes in the neural system are considered one of the key features that allow efficient computation in cortex [15, 53]. For example, delays are crucial for coincidence detection in auditory signal processing [17]. One of the effective methods for estimating the delay is to use a cross correlogram [54]. The cross correlogram is a basic tool for analyzing the temporal structure of signals. It is widely applied in neuroscience to assess oscillation, propagation delay, effective connection strength, and the spatiotemporal structure of a network [28]. However, estimating the cross correlation of spike trains is non-trivial since they are point processes: the signals have no amplitude, only the time instances at which the spikes occur.

A well known algorithm for estimating the correlogram from point processes involves histogram construction with time interval bins [48]. The binning process effectively transforms the uncertainty in time into amplitude variability. This quantization of time introduces binning error and leads to coarse time resolution. Furthermore, the correlogram does not take advantage of the higher temporal resolution of the spike times provided by current recording methods. This can be improved by using smoothing kernels to estimate the cross correlation function from finite samples. The resulting cross correlogram is continuous and provides high temporal resolution in the region where there is a peak (see figure 4-1 for a comparison between the histogram method and the kernel method).

In this chapter, we propose an efficient algorithm for estimating the continuous correlogram of spike trains without time binning. The continuous time resolution is achieved by computing at the finite set of time lags where the continuous cross correlogram can have a local maximum. The time complexity of the proposed algorithm is $O(T \log T)$ on average, where $T$ is the duration of the spike trains. The proposed algorithm is not restricted to simultaneously recorded spike trains; it also applies to PSTHs and to other point processes in general.

Figure 4-1. Example of cross correlogram construction. A and C are two spike trains, each with 4 spikes. Except for the third spike in A, each spike in A invokes a spike in C with some small delay around 10 ms. B represents all the positive (black) and negative (gray) time differences between the spike trains. D shows the positions of the delays obtained in B. E is the histogram of D, which is the conventional cross correlogram with a bin size of 100 ms. F shows the continuous cross correlogram with a Laplacian kernel (solid) and a Gaussian kernel (dotted) with bandwidth 40 ms. Note that the Laplacian kernel is more sensitive to the exact delay.

4.2 Continuous Correlogram

Two simultaneously recorded instances of point processes are represented as sums of Dirac delta functions at the firing times, $s_i(t)$ and $s_j(t)$:

$$s_i(t) = \sum_{m=1}^{N_i} \delta(t - t^i_m), \tag{4-1}$$

where $N_i$ is the number of spikes and $t^i_m$ are the time instances of the action potentials. The cross correlation function is defined as

$$Q_{ij}(\Delta t) = E_t\left[s_i(t)\, s_j(t + \Delta t)\right], \tag{4-2}$$

where $E_t[\cdot]$ denotes the expected value over time $t$. The cross correlation can be interpreted as a scaled conditional probability of the $j$-th neuron firing given that the $i$-th neuron fired $\Delta t$ seconds before [55]. In a physiological context, there is a physical restriction of propagation delay for an action potential to have a causal influence invoking any other action potential. This delay therefore influences the cross correlogram in the form of an increased amplitude, and estimating the delay amounts to finding the lag at which the cross correlogram has a maximum (inhibitory interactions, which appear as troughs rather than peaks, are not considered here).

Smoothing a point process is superior to the histogram method for the estimation of the intensity function [30], and especially of its maxima [56]. Similarly, the cross correlation function can also be estimated better with smoothing, done in continuous time so that we do not lose the exact times of spikes while enabling interaction between spikes at a distance. Instead of smoothing the histogram of time differences between the two spike trains, we first smooth the spike trains to obtain continuous signals [57]. We will show that this is equivalent to smoothing the time differences with a different kernel. A causal exponential decay was chosen as the smoothing kernel to achieve computational efficiency (see section 4.3). The smoothed spike trains are represented as

$$q_i(t) = \sum_{m=1}^{N_i} \frac{1}{\tau} e^{-\frac{t - t^i_m}{\tau}} u(t - t^i_m), \tag{4-3}$$

where $u(t)$ is the unit step function. The cross correlation function of the smoothed spike trains is

$$Q'_{ij}(\Delta t) = E_t\left[q_i(t)\, q_j(t + \Delta t)\right]. \tag{4-4}$$

Given a finite length of observation, the expectation in equation (4-4) can be estimated from samples as

$$\hat{Q}'_{ij}(\Delta t) = \frac{1}{T} \int_0^T q_i(t)\, q_j(t + \Delta t)\, dt, \tag{4-5}$$

where $T$ is the length of the observation. After evaluation of the integral, the resulting estimator becomes

$$\hat{Q}'_{ij}(\Delta t) = \frac{1}{2\tau T} \sum_{m=1}^{N_i} \sum_{n=1}^{N_j} e^{-\frac{|t^i_m - t^j_n - \Delta t|}{\tau}}, \tag{4-6}$$

which is equivalent to kernel intensity estimation [58, 59] on the time differences using a Laplacian distribution kernel. The mean and variance of the estimator are analyzed by assuming that the spike trains are realizations of two independent homogeneous Poisson processes:

$$E\left[\hat{Q}'_{ij}(\Delta t)\right] \approx \lambda_A \lambda_B, \tag{4-7}$$
$$\mathrm{var}\left(\hat{Q}'_{ij}(\Delta t)\right) \approx \frac{\lambda_A \lambda_B}{4\tau T}, \tag{4-8}$$

where $\lambda_A$ and $\lambda_B$ denote the firing rates of the Poisson processes of which the $i$-th and $j$-th spike trains, respectively, are realizations (see Appendix for the derivation). Note that the variance decreases linearly as the duration of the spike trains increases.

(see section 4.3). Smoothed spike trains are represented as, q i (t) = N i m=1 1 t t i τ e m τ u(t t i m), (4 3) where u(t) is the unit step function. The cross correlation function of the smoothed spike trains is, Q ij( t) = E t [q i (t)q j (t + t)]. (4 4) Given a finite length of observation, the expectation in equation (4 4) can be estimated from samples as, ˆQ ij( t) = 1 T 0 q i (t)q j (t + t)dt, (4 5) where T is the length of the observation. After evaluation of the integral, the resulting estimator becomes, ˆQ ij( t) = 1 2τT N i N j m=1 n=1 e t i m tj n t τ, (4 6) which is equivalent to the kernel intensity estimation [58, 59] from time differences using a Laplacian distribution kernel. The mean and variance of the estimator is analyzed by assuming the spike trains are realizations of two independent homogeneous Poisson processes. [ ] E ˆQ ij( t) λ A λ B, (4 7) var( ˆQ ij( t)) λ Aλ B 4τT, (4 8) where λ A and λ B denote the firing rate of the Poisson process of which i-th and j-th spike train, respectively, is a realization (see Appendix for derivation). Note that the variance reduces linearly as the duration of the spike train is elongated. By removing the mean and dividing by the standard deviation, we standardize the measure for inter-experiment 35

t (A t ) (A t ) + θ n 1 θ n θ n+1 θ n+2 (A t δ ) (A t δ ) + t δ Figure 4-2. Decomposition and shift of the multiset A. comparison: Q ij ( t) = 4τT (Qij ( t) λ A λ B ) λa λ B. (4 9) 4.3 Algorithm The algorithm divides the computation of the summation of continuous cross correlogram into disjoint regions and combines the result. We show that there are only finite possible local maxima, and by storing the intermediate computation results for neighboring time lags, the cross correlation of each lag can be computed in constant time. The essential quantity to be computed is the following double summation, Q ij ( t) = N i N j m=1 n=1 e t i m tj n t τ. (4 10) The basic idea for efficient computing is that the summation of the exponential function computed on a collection of points can be shifted with only one multiplication, i ex i+δ = ( i ex i )e δ. Since a Laplacian kernel is two exponentials stitched together, we need to carefully take the regions into account. Define the multiset of all time differences between two spike trains, A = {θ θ = t i m t j n, m = 1,..., N i, n = 1,..., N j }. (4 11) Even though A is not strictly a set, since it may contain duplicates, we will abuse the set notation for simplicity. Note that the cardinality of the multiset A is N i N j. Now equation 36

(4 10) can be rewritten as Q ij ( t) = θ A e θ t τ. (4 12) Now let us define a series of operations for a multiset B R and δ R, B + = {x x B and x 0}, (non-negative lag) (4 13a) B = {x x B and x < 0}, (negative lag) (4 13b) B δ = {x y B and x = y δ}. (shift) (4 13c) Since B can be decomposed into two exclusive sets B + and B, equation (4 12) can also be rewritten and decomposed, Q ij ( t) = e θ θ A t τ = θ (A t ) + e e θ θ (A t ) + = θ τ + τ + θ (A t ) e e θ τ. θ (A t ) θ τ (4 14a) (4 14b) For convenience, we define the following summations Q ± ij ( t) = θ (A t ) ± e θ τ. (4 15) Let us order the multiset A in ascending order and denote the elements as θ 1 θ 2... θ n θ n+1... θ Ni N j. Observe that within an interval t (θ n, θ n+1 ], the multiset ((A t ) ± ) t is always the same (see Fig. 4-2). In other words, if t = θ n+1, for a small change δ [0, θ n+1 θ n ), the multisets do not change their membership, i.e. ((A t ) ± ) δ = (A ( t δ) ) ±. Therefore, we can simplify an arbitrary shift of Q ± ij with single multiplication of an exponential as, Q ± ij ( t δ) = t (A t δ ) ± e e t δ t (A t ) ± = t τ = τ = t ((A t ) ± ) δ e t (A t ) ± e t τ t τ e ± δ τ = Q ± δ ij ( t)e± τ. (4 16a) (4 16b) 37

Thus, local changes of Q ij can be computed by a constant number of operations no matter how large the set A is, so that Q ij ( t δ) = Q + ij ( t δ) + Q ij ( t δ) (4 17a) = Q + ij ( t)e δ τ + Q ij ( t)e δ τ. (4 17b) If there is a local maximum or minimum of Q ij ( t δ), it would be where dq ij( t δ) dδ = 0, which is, δ = τ 2 Also note that since the second derivative, ( ln(q ij ( t)) ln(q+ ij ( t))). (4 18) d 2 Q ij ( t δ) dδ 2 = 1 τ 2 ( Q + ij ( t)e δ τ ) + Q δ ij ( t)e τ 0, (4 19) Q ij ( t δ) is a convex function of δ within the range. Thus, the maximum of the function value is always on either side of its valid range, only local minimum can be in between. In principle, we need to compute equation (4 10) for all t [ T, T ] to achieve continuous resolution, where T is the maximum time lag of interest. However, if we only want all local minima and maxima, we just need to evaluate on all t A, and compute the minima and maxima using equation (4 17b) and equation (4 18). Therefore, if we compute the Q ± ij (θ n) for all θ n A, we can compute δ for all intervals (θ n, θ n+1 ] if a local extremum exists. These can be computed using the following recursive formulae. Q ij (θ n+1) = Q ij (θ n)e θ n+1 θn τ + 1, (4 20a) Q + ij (θ n+1) = Q + ij (θ n)e θ n+1 θn τ 1. (4 20b) In practice, due to accumulation of numerical error, the following form is preferable for Q + ij, Q + ij (θ n) = (Q + ij (θ n+1) + 1)e θ n+1 θ n τ. (4 21) 38