Entropy and Information in the Neuroscience Laboratory


1 Entropy and Information in the Neuroscience Laboratory Graduate Course in BioInformatics WGSMS Jonathan D. Victor Department of Neurology and Neuroscience Weill Cornell Medical College June 2008

2 Thanks WMC Neurology and Neuroscience Keith Purpura Ferenc Mechler Anita Schmid Ifije Ohiorhenuan Qin Hu Former students Dmitriy Aronov Danny Reich Michael Repucci Bob DeBellis Collaborators Dan Gardner (WMC) Bruce Knight (Rockefeller) Patricia DiLorenzo (SUNY Binghamton) Support NEI NINDS WMC Neurology and Neuroscience

3 Outline Information Why calculate information? What is it? Why is it challenging to estimate? What are some useful strategies for neural data? Examples Maximum entropy methods Data analysis Stimulus design Homework

4 Information: Why Calculate It? An interesting, natural quantity Compare across systems (e.g., one spike per bit ) Determine the constraints on a system (e.g., metabolic cost of information) See where it is lost (e.g., within a neuron) Insight into what the system is designed to do To evaluate candidates for neural codes What statistical features are available? Can precise spike timing carry information? Can neuronal diversity carry information? What codes can be rigorously ruled out? A non-parametric measure of association

5 Even in visual cortex, the neural code is unknown What physiologists say: Neurons have definite selectivities ("tuning") Tuning properties can account for behavior Hubel and Wiesel 1968 What physiologists also know: Responses depend on multiple stimulus parameters Response variability (number of spikes, and firing pattern) is substantial and complicated

6 Some coding hypotheses At the level of individual neurons Spike count Firing rate envelope Interspike interval pattern, e.g., bursts At the level of neural populations Total population activity Labeled lines Patterns across neurons Synchronous spikes Oscillations

7 Coding by intervals can be faster than coding by count. Signaling a step change in a sensory input: coding by count (rate averaged over time) vs. coding by interval pattern, where one short interval indicates a change! Tradeoff: firing must be regular.

8 Coding by rate envelope supports signaling of multiple attributes. [Schematic: one attribute maps to more spikes, another to a more transient response envelope over time.] Codes based on spike patterns can also support signaling of multiple attributes.

9 A direct experimental test of a neural coding hypothesis is difficult. Count, rate, and pattern are interdependent. "Time is that great gift of nature which keeps everything from happening at once." (C.J. Overbeck, 1978) We'd have to manipulate count, rate, and pattern selectively AND observe an effect on behavior. So, we need some guidance from theory.

10 Information = Reduction in Uncertainty (Claude Shannon, 1948). A priori there are 6 possibilities; observe a response, and only 2 remain. Reduction in uncertainty from 6 possibilities to 2: Information = log(6/2).

11 In a bit more detail: A priori knowledge (6 possibilities). Observe a response: no spikes, one spike, or two spikes. A posteriori knowledge: the possibilities consistent with that response.

12 Second-guessing shouldn't help. No spikes, one spike, two spikes. Maybe there really should have been a spike? Maybe these two kinds of responses should be pooled? The "Data Processing Inequality": information cannot be increased by re-analysis.

13 Information on independent channels should add. Color channel: observe a response, information = log(6/2) = log(3). Shape channel: observe a response, information = log(6/3) = log(2). Both channels: observe a response, information = log(6/1) = log(6). log(6) = log(3·2) = log(3) + log(2).

14 Surprising Consequence Data Processing Inequality + Independent channels combine additively + Continuity = Unique definition of information, up to a constant

15 Information: Difference of Entropies. Joint distribution p(j,k) over input symbols j and output symbols k, with marginals p(·,k) = Σ_{j=1}^{J} p(j,k) and p(j,·) = Σ_{k=1}^{K} p(j,k). A priori distribution of inputs: p(j,·). A posteriori distribution, given output symbol k: p(j|k) = p(j,k)/p(·,k). Information = {Entropy of the a priori distribution of input symbols} minus {Entropy of the a posteriori distribution of input symbols, given the observation k, averaged over all k}.

16 Entropy: Definition. Discrete case: H = −Σ_j p_j log p_j (use log_2 to get bits). Equivalent forms, useful for analytic purposes: H = −(1/ln 2) (d/dα) Σ_j p_j^α evaluated at α = 1, and H = lim_{α→1} [1/((1−α) ln 2)] (Σ_j p_j^α − 1). Continuous form: has a log(dx) term.
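
As a companion to this definition, here is a minimal Python sketch of the plug-in (maximum-likelihood) entropy estimate from a table of counts; the function name and the toy counts are illustrative, not from the slides.

```python
import numpy as np

def plugin_entropy(counts, base=2.0):
    """Plug-in (maximum-likelihood) entropy estimate, H = -sum_j p_j log p_j."""
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()
    p = p[p > 0]                      # 0 log 0 is taken to be 0
    return -np.sum(p * np.log(p)) / np.log(base)

# Example: a fair 6-sided die has entropy log2(6) ~ 2.585 bits
print(plugin_entropy([10, 10, 10, 10, 10, 10]))
```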

17 Entropy: Properties. Mixing property (concavity): H{(1−λ)p_J + λq_J} ≥ (1−λ)H{p_J} + λH{q_J}. Chain rule, general form: H(Z) = H(X) + Σ_x p_x H(Y|X=x). Chain rule, table form (independent X and Y): H(Z) = H(X) + H(Y).

18 Information: Symmetric Difference of Entropies. Joint table p(j,k) over input symbols j and output symbols k, with marginals p(j,·) = Σ_{k=1}^{K} p(j,k) and p(·,k) = Σ_{j=1}^{J} p(j,k). Information = {entropy of input} + {entropy of output} − {entropy of table}: I = H{p(j,·)} + H{p(·,k)} − H{p(j,k)}.
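
A short sketch of the table formula above, computing I = H{p(j,·)} + H{p(·,k)} − H{p(j,k)} directly from a joint count table (NumPy assumed; the example tables are hypothetical).

```python
import numpy as np

def entropy_bits(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(joint_counts):
    """I = H{p(j,.)} + H{p(.,k)} - H{p(j,k)} from a joint input/output count table."""
    pjk = np.asarray(joint_counts, dtype=float)
    pjk = pjk / pjk.sum()
    return (entropy_bits(pjk.sum(axis=1)) +   # input marginal
            entropy_bits(pjk.sum(axis=0)) -   # output marginal
            entropy_bits(pjk.ravel()))        # full table

# Perfectly correlated 2x2 table -> 1 bit; independent table -> 0 bits
print(mutual_information([[50, 0], [0, 50]]))   # 1.0
print(mutual_information([[25, 25], [25, 25]])) # 0.0
```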

19 Information: Properties. I = H{p(j,·)} + H{p(·,k)} − H{p(j,k)}. I is symmetric in input and output. I is independent of labeling within input and output. I ≥ 0, and I = 0 if and only if p(j,k) = p(j,·)p(·,k). I ≤ H{p(j,·)} and I ≤ H{p(·,k)}. Data Processing Inequality: if Y determines Z, then I(X,Y) ≥ I(X,Z).

20 Information: Related Quantities. Channel capacity: maximum information for any input ensemble. Efficiency: {Information}/{Channel capacity}. Redundancy: 1 − {Information from all channels}/{sum of informations from each channel}. Redundancy index: {sum of informations from each channel − Information from all channels} / {sum of informations from each channel − maximum of informations from each channel}.

21 Investigating neural coding: not Shannon's paradigm. Shannon: symbols and codes are known; joint (input/output) probabilities are known; what are the limits of performance? Neural coding: symbols and codes are not known; joint probabilities must be measured; ultimate performance is often known (behavior); what are the codes?

22 Information estimates depend on partitioning of the stimulus domain. Finely partitioned stimulus domain: the stimulus-to-response mapping is unambiguous; H = log(4) bits. Coarsely partitioned: still unambiguous, but detail is lost; H = log(2) bits.

23 Information estimates depend on partitioning of the response domain. Finely partitioned response domain: unambiguous; H = log(4) bits. Coarsely partitioned: ambiguous; H = log(2) bits.

24 Information estimates depend on partitioning of the response domain, II. Finely partitioned: unambiguous; H = log(4) = 2 bits. Wrongly partitioned: ambiguous; H = log(1) = 0 bits.

25 Revenge of the Data Processing Inequality. Should these responses be grouped into one code word? The Data Processing Inequality says NO: if you group, you underestimate information.

26 The Basic Difficulty. We need to divide stimulus and response domains finely, to avoid underestimating information ("Data Processing Theorem"). We want to determine <p log p>, but we only have an estimate of p, not its exact value. Dividing stimulus and response domains finely makes p small, which increases the variability of estimates of p. But that's not all: p log p is a nonlinear function of p, so replacing <p log p> with <p>log<p> incurs a bias. How does this bias depend on p?

27 Biased Estimates of −p log p. f(p) = −p log p. The downward bias is greatest where the curvature of f is greatest: f''(p) = −1/p, so the bias is worst for small p.
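
The downward bias is easy to see numerically. The sketch below (an assumed toy setup: a uniform source over 8 symbols) shows the plug-in estimate falling short of the true 3 bits at small sample sizes.

```python
import numpy as np

rng = np.random.default_rng(0)
k, H_true = 8, np.log2(8)            # uniform source over 8 symbols, H = 3 bits

def plugin_entropy(samples, k):
    p = np.bincount(samples, minlength=k) / len(samples)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

for N in (10, 30, 100, 1000):
    est = [plugin_entropy(rng.integers(0, k, N), k) for _ in range(2000)]
    print(f"N={N:5d}  mean plug-in H = {np.mean(est):.3f} bits  (true {H_true:.3f})")
```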

28 The Classic Debiaser: Good News / Bad News. We don't have to debias every p log p term, just the sum. The good news (for the entropy of a discrete distribution): the plug-in entropy estimate has an asymptotic bias proportional to (k−1)/N, where N is the number of samples and k is the number of different symbols (Miller, Carlton, Treves, Panzeri). The bad news: unless N >> k, the asymptotic correction may be worse than none at all. More bad news: we don't know what k is.
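
A sketch of the classic (Miller-Madow) correction, adding (k−1)/(2N ln 2) bits to the plug-in estimate; using the number of observed symbols for k is itself an assumption, which is exactly the caveat raised on this slide.

```python
import numpy as np

def miller_madow_entropy(counts):
    """Plug-in entropy (bits) plus the asymptotic (k-1)/(2N ln 2) bias correction."""
    counts = np.asarray(counts, dtype=float)
    N = counts.sum()
    k = np.count_nonzero(counts)      # observed symbols; the true k is unknown
    p = counts[counts > 0] / N
    H_plugin = -np.sum(p * np.log2(p))
    return H_plugin + (k - 1) / (2.0 * N * np.log(2))

print(miller_madow_entropy([3, 1, 0, 1, 0, 0, 2, 1]))
```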

29 Another debiasing strategy. Toy problem: <x²> vs. <x>². Our problem: <−p log p> vs. −<p>log<p>. For a parabola (x²), the bias is constant; this is why the naïve estimator of variance can be simply debiased: σ²_est = Σ(x − <x>)²/(N−1). For −p log p, the bias depends on the best local parabolic approximation. This leads to a polynomial debiaser (Paninski). Better than the classical debiaser, but p = 0 is still the worst case, and it still fails in the extreme undersampled regime.

30 The Direct Method (Strong, de Ruyter, Bialek, et al. 1998). Discretize the response into binary words of length T_word, made of letters of length T_letter. T_letter must be small to capture temporal detail (timing precision of spikes: <1 ms). T_word must be large enough: insect sensory neuron, … ms may be adequate; vertebrate CNS, 100 ms at a minimum. Up to 2^(T_word/T_letter) probabilities must be estimated.
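
A minimal sketch of the word-counting step of the direct method: discretize a spike train into binary letters, slide a word-length window, and take the plug-in entropy of the word distribution. The parameters and the synthetic spike train are hypothetical, and only the total word entropy is computed here (the full method also needs the noise entropy across repeated trials).

```python
import numpy as np

def word_entropy(spike_times, t_start, t_stop, t_letter, word_len):
    """Direct-method word entropy (bits): binarize into letters of length t_letter,
    slide a word_len-letter window, and take the plug-in entropy of the words."""
    edges = np.arange(t_start, t_stop + t_letter, t_letter)
    letters = (np.histogram(spike_times, bins=edges)[0] > 0).astype(int)
    words = ["".join(map(str, letters[i:i + word_len]))
             for i in range(len(letters) - word_len + 1)]
    _, counts = np.unique(words, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(1)
spikes = np.sort(rng.uniform(0.0, 10.0, 300))    # a synthetic ~30 spikes/s train
print(word_entropy(spikes, 0.0, 10.0, t_letter=0.003, word_len=10), "bits per word")
```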

31 Multiple neurons: a severe sampling problem. One dimension for each time bin and each of L neurons: 2^[L(T_word/T_letter)] probabilities must be estimated.

32 What else can we do? Spike trains are events in time, so there are relationships between them: a continuous topology, and discrete relationships (how many spikes? and on which neurons?). How can we exploit this?

33 Strategies for Estimating Entropy and Information: a roadmap. Branch points: treat the spike train as a symbol sequence or as a point process (is there smooth dependence on spike times?); is there a relationship between symbol sequences?; is there smooth dependence on spike count? Methods, in order of increasing model strength: direct method, LZW compression, bottleneck/codebook, context tree, Markov (symbol-sequence side); binless embedding, power series, metric space, stimulus reconstruction (point-process side). Most require comparison of two entropy estimates.

34 Strategies for Estimating Entropy and Information (roadmap repeated from slide 33).

35 A General Approach to Undersampling. Create a parametric model for spike trains; fit the model parameters; calculate entropy from the model (numerically or analytically). Toy example: binomial model. Each bin independently contains a spike with probability p_1 = p(1) or no spike with probability p_0 = p(0), so the probability of any spike train is the product of its per-bin probabilities, and we can determine the probabilities of all spike trains from p_0. Entropy of an n-bin segment: H_n = nH_1, where H_1 = −p_0 log p_0 − p_1 log p_1 (bins are independent). A good estimate of the entropy of the model -- but a bad model: spiking probabilities are not independent.

36 Incorporating Simple Dependencies. Say we measure the pair probabilities p(00), p(01), p(10), p(11). Use these to estimate (model) the probabilities of longer trains. Assuming that dependence (memory) lasts only one bin, p(b|a) = p(ab)/p(a), so, iterating, p(abcd) = p(a) p(b|a) p(c|b) p(d|c) = p(ab) · p(bc)/p(b) · p(cd)/p(c).

37 The Markov Model. Say we measure the m-block probabilities p(abcd...). Use these to estimate (model) the probabilities of longer trains; e.g., for m = 4: p(abcdef) = p(abcd) p(bcde) p(cdef) / [p(bcd) p(cde)]. Entropy of an n-bin segment: H_n ≠ nH_1 (bins are dependent). Recursion: H_{n+1} − H_n = H_n − H_{n−1} for n ≥ m. Analytic expression for the entropy per bin (not obvious): lim_{n→∞} H_n/n = H_m − H_{m−1}. As m (the Markov order) increases, the model fits progressively better, but there are more parameters to measure (~2^m). If m needs to be large to account for long-time correlations, it is not clear whether we've gained much over the direct method.
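
A sketch of the Markov entropy-rate formula above, estimated from data: compute plug-in block entropies H_m and H_{m−1} from overlapping blocks and take their difference. The test sequence (independent bins with p(spike) = 0.1) is synthetic.

```python
import numpy as np

def block_entropy(binary, m):
    """Plug-in entropy (bits) of overlapping m-letter blocks of a 0/1 sequence."""
    blocks = ["".join(map(str, binary[i:i + m])) for i in range(len(binary) - m + 1)]
    _, counts = np.unique(blocks, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def markov_entropy_rate(binary, m):
    """Entropy per bin of the mth-order Markov model: H_m - H_{m-1}."""
    return block_entropy(binary, m) - block_entropy(binary, m - 1)

rng = np.random.default_rng(2)
x = (rng.random(200_000) < 0.1).astype(int)     # independent bins, p(spike) = 0.1
print(markov_entropy_rate(x, m=3))              # should be near -0.1 log2 0.1 - 0.9 log2 0.9 ~ 0.469
```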

38 Context Tree Method in a nutshell (Kennel, Shlens, Abarbanel, Chichilnisky, Neural Computation 2005). p(spike) modeled as a variable-depth tree (Markov: constant depth). Model parameters: tree topology, probability at terminal nodes. Entropies determined from model parameters (Wolpert-Wolf). Weighting of each model according to minimum description length. Confidence limit estimation via Monte Carlo resampling.

39 Strategies for Estimating Entropy and Information (roadmap repeated).

40 Binless Method: Strategy. If every spike train had only 1 spike, then estimating the entropy of spike trains would be the same as empirically estimating the entropy of a one-dimensional distribution p(x). If every spike train had r spikes, it would be the same as estimating the entropy of an r-dimensional distribution p(x_1, x_2, ..., x_r). But spike trains can have 0, 1, 2, ... spikes. So we estimate a spike-count contribution to the information; then, for each r, we use continuous methods to estimate entropies and a contribution to spike-timing information.

41 Entropy of a continuous distribution. Entropy: H = −(1/ln 2) ∫ p(x) ln p(x) dx. Change of variables: y = ∫_{−∞}^{x} p(t) dt. Transformed entropy: H = −(1/ln 2) ∫_0^1 ln p(x) dy. Plan: at each datum, estimate dy as 1/N, and estimate p(x) locally. Relation to the binned entropy estimate with bin size b: p_j ≈ b·p(x_j), so H_binned ≈ H − log_2 b.

42 One dimension, in more detail. Naïve estimate: −(1/ln 2) ∫_0^1 ln p(x) dy ≈ −(1/(N ln 2)) Σ_j ln[1/(N λ_j)], where λ_j is the distance to the nearest neighbor of x_j. But sampling is Poisson, not uniform; this leads to a bias, since <ln λ> ≠ ln<λ>. Corrected estimate: H ≈ (1/(N ln 2)) Σ_j ln[(N−1) λ_j] + γ/ln 2, where γ = −∫_0^∞ e^{−v} ln v dv (Euler's constant).

43 Does it work? 1-d Gaussian, 40 numerical experiments. [Figure: estimated entropy vs. bin width (128 and 1024 samples) and vs. number of samples, comparing the binless estimate with binned estimates (jackknife-debiased and Treves-Panzeri-debiased).]

44 r dimensions. Naïve estimate: −(1/ln 2) ∫_0^1 ln p(x) dy ≈ −(1/(N ln 2)) Σ_j ln[1/(N V_j)], where V_j is the volume of the smallest sphere near x_j that contains a nearest neighbor (proportional to λ_j^r). In y-space, sampling is still Poisson! This leads to the corrected estimate: H ≈ (r/(N ln 2)) Σ_j ln λ_j + (1/ln 2)(ln(N−1) + ln[S_r/r] + γ), where S_r is the surface area of a unit r-sphere.
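
A sketch of a standard Kozachenko-Leonenko nearest-neighbor entropy estimate for r-dimensional data (SciPy assumed). The normalization here uses the unit-ball volume and digamma terms of the commonly cited form, which may differ in bookkeeping details from the slide's expression.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import gammaln, digamma

def kl_entropy_bits(x):
    """Kozachenko-Leonenko nearest-neighbor estimate of differential entropy (bits).
    x: (N, d) array of samples (a flat vector is treated as 1-d data)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    N, d = x.shape
    # distance from each point to its nearest neighbor (k=2: the point itself is k=1)
    eps = cKDTree(x).query(x, k=2)[0][:, 1]
    log_unit_ball = (d / 2.0) * np.log(np.pi) - gammaln(d / 2.0 + 1.0)
    h_nats = d * np.mean(np.log(eps)) + log_unit_ball + digamma(N) - digamma(1)
    return h_nats / np.log(2)

rng = np.random.default_rng(3)
print(kl_entropy_bits(rng.normal(size=(1000, 1))))   # unit Gaussian: 0.5*log2(2*pi*e) ~ 2.047 bits
```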

45 Does it work? 3-d Gaussian, 40 numerical experiments. [Figure: estimated entropy vs. bin width (128 and 1024 samples) and vs. number of samples, comparing the binless estimate with binned estimates (jackknife-debiased and Treves-Panzeri-debiased).]

46 Binless Embedding Method in a nutshell. Embed responses with r spikes as points in an r-dimensional space (responses with 0, 1, 2, 3, ... spikes form separate strata). In each r-dimensional stratum, use the Kozachenko-Leonenko (1987) nearest-neighbor estimate, comparing for each response j the nearest-neighbor distance among all responses (λ_j) with the nearest-neighbor distance among responses to the same stimulus (λ*_j): I ≈ (1/(N ln 2)) Σ_j [ r ln(λ_j/λ*_j) + ln((N−1)/(N_k−1)) ], where N_k is the number of responses to stimulus k. Victor, Phys Rev E (2002).

47 Strategies for Estimating Entropy and Information (roadmap repeated).

48 Coding hypotheses: in what ways can spike trains be considered similar? Similar spike counts Similar spike times Similar interspike intervals

49 Measuring similarity based on spike times. Define the distance between two spike trains as the simplest morphing of one spike train into the other by inserting, deleting, and moving spikes. Unit cost to insert or delete a spike. We don't know the relative importance of spike timing, so we make it a parameter, q: shifting a spike in time by ΔT incurs a cost of q|ΔT|. Spike trains are similar only if spikes occur at similar times (i.e., within 1/q sec), so q measures the informative precision of spike timing.

50 Identification of Minimal-Cost Paths. World lines cannot cross. So, either (i) the last spike in A is deleted, (ii) the last spike in B is inserted, or (iii) the last spike in A and the last spike in B correspond via a shift. The algorithm is closely analogous to the Needleman-Wunsch and Sellers (1970) dynamic programming algorithms for genetic sequence comparisons.
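
A direct Python sketch of this dynamic program for the spike-time edit distance (insert/delete cost 1, shift cost q per second); the example spike trains are made up.

```python
import numpy as np

def spike_time_distance(a, b, q):
    """Spike-time edit distance via dynamic programming.
    a, b: sorted spike-time arrays (s); q: cost per second of shifting a spike."""
    na, nb = len(a), len(b)
    D = np.zeros((na + 1, nb + 1))
    D[:, 0] = np.arange(na + 1)      # delete all spikes of a
    D[0, :] = np.arange(nb + 1)      # insert all spikes of b
    for i in range(1, na + 1):
        for j in range(1, nb + 1):
            D[i, j] = min(D[i - 1, j] + 1,                               # delete a[i-1]
                          D[i, j - 1] + 1,                               # insert b[j-1]
                          D[i - 1, j - 1] + q * abs(a[i - 1] - b[j - 1]))  # shift to match
    return D[na, nb]

A = np.array([0.010, 0.150, 0.320])
B = np.array([0.012, 0.340])
print(spike_time_distance(A, B, q=10.0))   # q=0 reduces to the spike-count difference
```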

51 Distances between all pairs of responses determine a response space: calculate all pairwise distances among the responses to stimulus 1, stimulus 2, stimulus 3, etc.

52 Configuration of the response space tests whether a hypothesized distance is viable Random: responses to the four stimuli are interspersed Systematic clustering: responses to the stimuli are grouped and nearby groups correspond to similar stimuli

53 Metric Space Method in a nutshell. Postulate a parametric family of edit-length metrics ("distances") between spike trains. Allowed elementary transformations: insert or delete a spike (unit cost); shift a spike in time by ΔT (cost q|ΔT|). Create a response space from the distances. Cluster the responses, and tabulate actual stimulus vs. assigned cluster: Information = row entropy + column entropy − table entropy. Victor and Purpura, Network (1997).
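
A sketch of the clustering-plus-confusion-matrix step, given a precomputed pairwise distance matrix (e.g., from the edit distance above) and the stimulus label of each response. For simplicity each response is assigned to the stimulus whose other responses are closest on average; the published method uses a power-law average of distances rather than the plain mean used here.

```python
import numpy as np

def metric_space_information(dist, labels):
    """Assign each response to the stimulus class whose other responses are closest
    on average, then compute transmitted information (bits) from the confusion
    matrix: row entropy + column entropy - table entropy.
    Assumes every stimulus has at least two responses."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    n = len(labels)
    confusion = np.zeros((len(classes), len(classes)))
    for i in range(n):
        mean_d = [dist[i, (labels == c) & (np.arange(n) != i)].mean() for c in classes]
        confusion[np.searchsorted(classes, labels[i]), int(np.argmin(mean_d))] += 1
    p = confusion / confusion.sum()

    def H(q):
        q = q[q > 0]
        return -np.sum(q * np.log2(q))

    return H(p.sum(axis=1)) + H(p.sum(axis=0)) - H(p.ravel())
```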

54 Visual cortex: contrast responses. [Raster plot: spike times by trial number, 256 ms per response.]

55 Visual cortex: contrast coding. [Plot: information vs. temporal precision q, for the spike time code, the interspike interval code, and the spike count code; chance level and q_max indicated.]

56 Multiple visual sub-modalities: contrast, orientation, spatial frequency, texture type.

57 Attributes are coded in distinct ways in primary visual cortex (V1), and especially in V2. [Plots: information vs. temporal precision q (1/sec, from 1 to 100), for contrast, orientation, spatial frequency, and texture type, in V1 and in V2; q_max marked.]

58 Analyzing coding across multiple neurons. A multineuronal activity pattern (spikes across neurons over time) is a time series of labeled events. Distances between labeled time series can also be defined as the minimal cost to morph one into another, with one new parameter: cost to insert or delete a spike, 1; cost to move a spike by an amount ΔT, q|ΔT|; cost to change the label of a spike, k. k determines the importance of the neuron of origin of a spike. k = 0: summed population code; k large: labeled line code.

59 Multineuronal Analysis via the Metric-Space Method: a two-parameter family of codes. Change the time of a spike: cost/sec = q (q = 0: spike count code). Change the neuron of origin of a spike: cost = k (k = 0: summed population code, the neuron doesn't matter; k = 2: labeled-line code, the neuron matters maximally). [Diagram: the plane spanned by importance of timing (q) and importance of neuron of origin (k), covering spike count, summed population, and labeled-line codes.]

60 Preparation Recordings from primary visual cortex (V1) of macaque monkey Multineuronal recording via tetrodes ensures neurons are neighbors (ca. 100 microns)

61 The stimulus set: a cyclic domain 16 kinds of stimuli in the full stimulus set

62 Spatial phase coding in two simple cells. [Figure: firing-rate histograms (imp/s) over the 473-ms response for each cell, and information (bits) as a joint function of temporal precision q (1/sec) and neuron-of-origin cost k.]

63 Discrimination is not enough Information indicates discriminability of stimuli, but not whether stimuli are represented in a manner amenable to later analysis Do the distances between the responses recover the geometry of the stimulus space?

64 Representation of phase by a neuron pair. Reconstructed response spaces: each neuron considered separately (neuron 1, neuron 2), and the two neurons considered jointly as a summed population code (k = 0) or respecting the neuron of origin of each spike (k = 1). The representation of the stimulus space is more faithful when the neuron-of-origin of each spike is respected.

65 Representing taste: responses of a rat solitary tract neuron (P. DiLorenzo, J.-Y. Chen, SfN 2007). [Figure: responses to the 4 primary tastes (salty, bitter, sour, sweet; 100 spikes/s scale, 5 s) and reconstructed response spaces for the spike count code vs. a spike timing code.] Temporal pattern supports full representation of the 4 primary tastes and their 6 mixtures (spike timing code, informative precision ~200 msec).

66 Summary: Estimation of Information-Theoretic Quantities from Data. It is challenging but approachable. There is a role for multiple methodologies, which differ in their assumptions about how the response space is partitioned; the way that information estimates depend on this partitioning provides insight into neural coding. It can provide some biological insights: the temporal structure of spike trains is informative, and (adding parametric tools) temporal structure can represent the geometry of a multidimensional sensory space.

67 Maximum-Entropy Methods The maximum-entropy criterion chooses a specific distribution from an incomplete specification Typically unique: the most random distribution consistent with those specifications Often easy to compute (but not always!) Maximum-entropy distributions General characteristics Familiar examples Other examples Applications

68 General Characteristics of Maximum-Entropy Distributions. Setup: seek a probability distribution p(x) on a specified domain. Specify constraints C_1, ..., C_M of the form Σ_x C_m(x)p(x) = b_m. The domain can be continuous; sums become integrals. Constraints are linear in p but may be nonlinear in x (C(x) = x² constrains the variance of p). Maximize H(p) = −Σ p(x) log p(x) subject to these constraints. Solution: add a constraint C_0 = 1 to enforce normalization of p; introduce a Lagrange multiplier λ_m for each C_m; maximize −Σ p(x) log p(x) + Σ_m λ_m Σ_x C_m(x)p(x). Solution: p(x) = exp{Σ_m λ_m C_m(x)}, with the multipliers λ_m determined by Σ_x C_m(x)p(x) = b_m. These equations are typically nonlinear, but occasionally can be solved explicitly. The solution is always unique (mixing property of entropy).
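
A tiny worked example of the Lagrange-multiplier solution: the maximum-entropy distribution on x = 0..5 with only the mean constrained has the form p(x) ∝ exp(λx), and the single multiplier can be found with a one-dimensional root solve (SciPy assumed; the domain and target mean are made up).

```python
import numpy as np
from scipy.optimize import brentq

# Maximum-entropy distribution on x = 0..5 with the single constraint <x> = 2.0.
x = np.arange(6)
target_mean = 2.0

def mean_given(lam):
    w = np.exp(lam * x)               # p(x) proportional to exp(lam * x)
    return np.sum(x * w) / np.sum(w)

lam = brentq(lambda L: mean_given(L) - target_mean, -10, 10)   # solve <x>_lam = 2.0
p = np.exp(lam * x)
p /= p.sum()
print("lambda =", round(lam, 4), " p(x) =", np.round(p, 4), " mean =", round(float(np.sum(x * p)), 4))
```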

69 Examples of Maximum-Entropy Distributions. Univariate: on [a, b], no constraints: uniform on [a, b]. On (−∞, ∞), variance constrained: K exp(−cx²). On [0, ∞), mean constrained: K exp(−cx). On [0, 2π], one Fourier component constrained: the von Mises distribution, K exp{−c cos(x−φ)}. Multivariate: mean and covariances constrained: K exp{−(x−m)^T C (x−m)}; independent, with marginals P(x), Q(y): P(x)Q(y). Discrete: on {0,1} on a 1-d lattice, with m-block configurations constrained: the mth-order Markov processes. On {0,1} on a 2-d lattice, with adjacent-pair configurations constrained: the Ising problem.

70 Wiener/Volterra and Maximum Entropy. Wiener-Volterra systems identification: determine the nonlinear functional F in a stimulus-response relationship r(t) = F[s(t)]. Discretize prior times: s(t) = (s(t−Δt), ..., s(t−LΔt)) = (s_1, ..., s_L). Consider only the present response: r = r(0). F is a multivariate Taylor series for r in terms of (s_1, ..., s_L); measure the low-order coefficients, assume the others are 0. Probabilistic view: same setup as above, but find P(r, s) = P(r, s_1, ..., s_L). Measure some moments: mean and covariance of s (<s_j>, <s_j s_k>), cross-correlations (<r s_j>, <r s_j s_k>), mean of r (<r>), variance of r (<r²>). Find the maximum-entropy distribution constrained by the above. This is K exp{−c(r − F(s_1, ..., s_L))²}: the Wiener-Volterra series with additive Gaussian noise is the maximum-entropy distribution consistent with the above constraints. What happens if the noise is not Gaussian (e.g., spikes)? Victor and Johannesma, 1986.

71 Some Experiment-Driven Uses of Maximum-Entropy Methods. Rationale: since it is the most random distribution consistent with a set of specifications, it can be thought of as an automated way to generate null hypotheses. Modeling multineuronal activity patterns: spontaneous activity in the retina (Shlens et al. 2006, Schneidman et al. 2006); spontaneous and driven activity in visual cortex (Ohiorhenuan et al.). Designing stimuli.

72 Analyzing Multineuronal Firing Patterns. Consider three neurons A, B, C. If all pairs are uncorrelated, then we have an obvious prediction for the triplet firing event, namely p(A,B,C) = p(A)p(B)p(C). But if we observe pairwise correlations, what do we predict for p(A,B,C) and higher-order patterns? Use constrained maximum-entropy distributions, of course. Measure the single-cell and pairwise firing probabilities (p(A), p(B), ..., p(A,B), p(A,C), p(B,C), ...), find the maximum-entropy distribution consistent with them, and use it to predict the triplet and higher-order firing patterns (p(A,B,C), p(A,B,D), ...).
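
A self-contained sketch of this program for three synthetic binary neurons: parametrize the pairwise maximum-entropy model as p(s) ∝ exp(Σ h_i s_i + Σ J_ij s_i s_j), choose the multipliers so that the model's single-cell and pairwise moments match the measured ones (here by a simple least-squares search, not the most efficient fitting method), and read off the predicted triplet probability.

```python
import numpy as np
from itertools import product
from scipy.optimize import minimize

rng = np.random.default_rng(4)
data = (rng.random((5000, 3)) < [0.3, 0.25, 0.2]).astype(float)          # synthetic spikes
data[:, 1] = np.where(rng.random(5000) < 0.5, data[:, 0], data[:, 1])    # correlate cells 0 and 1

states = np.array(list(product([0, 1], repeat=3)), dtype=float)          # all 8 patterns
pairs = [(0, 1), (0, 2), (1, 2)]
pair_feats = np.array([[s[i] * s[j] for i, j in pairs] for s in states])
observed = np.concatenate([data.mean(0),
                           [np.mean(data[:, i] * data[:, j]) for i, j in pairs]])

def model_probs(theta):
    w = np.exp(states @ theta[:3] + pair_feats @ theta[3:])
    return w / w.sum()

def moment_mismatch(theta):
    p = model_probs(theta)
    model = np.concatenate([p @ states, p @ pair_feats])
    return np.sum((model - observed) ** 2)

theta = minimize(moment_mismatch, np.zeros(6), method="Nelder-Mead",
                 options={"maxiter": 40000, "fatol": 1e-14, "xatol": 1e-10}).x
print("maxent prediction p(111):", round(float(model_probs(theta)[-1]), 4),
      " observed p(111):", round(float(np.mean(np.all(data == 1, axis=1))), 4))
```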

73 Multineuronal Firing Patterns: Retina. Maximum-entropy distributions built from pairs account for the joint activity of clusters of up to 7 cells (macaque retina, Shlens et al., J. Neurosci. 2006), and only nearest-neighbor pairs matter. Also see Schneidman et al., Nature 2006 (15 cells), and the review in Current Opinion in Neurobiology, 2007 (Nirenberg and Victor).

74 Significance of Maximum-Entropy Analysis. The pairwise model dramatically simplifies the description of multineuronal firing patterns: a reduction from 2^N parameters to N(N−1)/2 parameters, and a further reduction to ~4N parameters if only nearest neighbors matter. It makes sense in terms of retinal anatomy. Limitations: What about firing patterns across time? What about stimulus driving? What about cortex (where connectivity is more complex)?

75 Multineuronal Firing Patterns: Visual Cortex. Maximum-entropy distributions built from pairs do not account for the joint activity of clusters of 3-5 cells (macaque V1, Ohiorhenuan et al., CoSyNe 2006). [Histogram: number of recordings vs. log_2-likelihood ratio, model vs. observed.]

76 Multineuronal Firing Patterns: Visual Cortex. Analyzing stimulus dependence: reverse correlation with pseudorandom checkerboard (m-sequence) stimuli; construct a pairwise maximum-entropy model conditioned on one or more stimulus pixels that modulate the response probability distribution. Higher-order correlations are stimulus-dependent! (Ohiorhenuan et al., SfN 2007). [Plot: log_2 likelihood ratio vs. number of conditioning pixels.]

77 Maximum Entropy and Stimulus Design. We'd like to understand how cortical neurons respond to natural scenes. How can we determine what aspects of natural scenes make current models fail? Maximum-entropy approach: incrementally add constraints corresponding to natural-scene image statistics. Gaussian white noise or binary noise: the resulting models are OK for retinal ganglion cells, poor in V1. Gaussian 1/f² noise: in V1, >50% of the variance is not explained. How about adding higher-order statistical constraints? Which ones to add? How do they interact?

78 Dimensional Reduction of Image Statistics. Image statistics can be studied in isolation by creating binary images with a single statistic constrained: the mean value of the product of luminance values (+1 or −1) in a "glider" of several cells. Some such statistics are visually salient; others are not. When two image statistics are constrained, simple combination rules account for the visual salience of the resulting texture. Victor and Conte 1991, 2006.

79 Homework. Debiasing entropy estimates: given an urn with an unknown but finite number of kinds of objects k, an unknown number n_k of each kind of object, and N samples (with replacement) from the urn, estimate k. Does it help to use a jackknife debiaser? Does it help to postulate a distribution for n_k? (Treves and Panzeri, Neural Computation 1995). The Data Processing Inequality: Shannon's mutual information is just one example of a function defined on joint input-output distributions. Are there any others for which the Data Processing Inequality holds? If so, are they useful? (Victor and Nirenberg, submitted 2007). Markov models of point processes: demonstrate that for an mth-order Markov model, the block entropies satisfy H_{n+1} − H_n = H_n − H_{n−1} for n ≥ m. Maximum-entropy distributions: consider a Poisson neuron driven by Gaussian noise. Build the maximum-entropy distribution constrained by the overall mean firing rate, the input power spectrum, and the spike-triggered average. Fit this with a linear-nonlinear (LN) model. What is the nonlinearity's input-output function? Is the spike-triggered covariance zero? (?? and Victor)
