Statistical models for neural encoding

Statistical models for neural encoding
Part 1: discrete-time models

Liam Paninski
Gatsby Computational Neuroscience Unit, University College London
http://www.gatsby.ucl.ac.uk/~liam
liam@gatsby.ucl.ac.uk
November 2, 2004

Goals
- Introduce some statistical methods for reading the brain
- Give an intuitive feel for when these techniques will work (and why)
- Discuss novel insights about neural coding obtained via statistical modeling efforts

The neural code
Input-output relationship between:
- external observables x (sensory stimuli, motor responses, ...)
- neural variables y (spike trains, population activity, ...)
Probabilistic formulation: p(y|x)

...reading the brain
Fundamental question: how do we estimate p(y|x) from experimental data?
The general problem is too hard: not enough data, too many possible inputs x (e.g., responses to natural scenes in IT cortex).

Avoiding the curse of insufficient data
Many approaches to make the problem tractable:
1. Estimate some functional f(p) instead, e.g., information-theoretic quantities (cf. the NIPS03 workshop on estimation of information-theoretic quantities)
2. Select stimuli more efficiently, e.g., (Foldiak, 2001; Machens, 2002; Paninski, 2003b)
3. Fit a model with a small number of parameters

Neural encoding models
Good news: many methods are available, well understood, and easy to use.
Bad news: none is perfect.
Variety of approaches: different tools for different situations.

Encoding models
Main theme: we want the model to be flexible, but not overly so. Flexibility vs. fittability.

Encoding models
We want p(spike|x).
Zeroth idea: in one dimension, or for finite x, just take p̂(spike|x) = fraction of presentations of x that led to a spike.
Very flexible ("nonparametric"), but quickly overwhelmed for high-dimensional x.
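For concreteness, here is a minimal numpy sketch of that histogram estimator in one dimension; the simulated nonlinearity and all variable names are illustrative, not from the lecture:

```python
import numpy as np

# Simulate a 1-D stimulus and spikes from an assumed nonlinearity (illustration only).
rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)                          # stimulus samples
spikes = rng.random(x.size) < 1 / (1 + np.exp(-2 * x))   # Bernoulli spike indicator

# Histogram estimator: p_hat(spike | x in bin) = (# spikes in bin) / (# samples in bin).
edges = np.linspace(-3, 3, 25)
which_bin = np.digitize(x, edges)
p_hat = np.array([spikes[which_bin == i].mean() if np.any(which_bin == i) else np.nan
                  for i in range(1, edges.size)])
```

In d dimensions the number of bins grows exponentially, so most bins contain no samples at all; this is the curse that motivates the parametric models below.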

Additive models
First idea: model p(spike|x) as linear in x: p(spike|x) = k·x + b.
Fit the coefficients by linear regression: k = (E[xᵀx])⁻¹ E[x·spike].
Easy, but too simple: neurons often code nonlinearly, and a linear model can predict negative firing rates.

Aside: regression
Where does this come from? k = (E[xᵀx])⁻¹ E[x·spike].
Optimal least-squares problem: choose k to minimize Err = E[(k·x − y)²].
Take the derivative with respect to k and set it to zero (exercise). Is this automatically a minimum? If so, is it automatically the only minimum?
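A minimal sketch of that regression solution on hypothetical simulated data (in practice each row of X would hold a window of stimulus history):

```python
import numpy as np

# Simulated data (hypothetical): each row of X is a stimulus vector, y the response.
rng = np.random.default_rng(1)
X = rng.standard_normal((5_000, 20))
k_true = np.exp(-np.arange(20) / 5.0)              # an assumed "true" filter
y = X @ k_true + 0.5 * rng.standard_normal(5_000)

# k = (E[x'x])^{-1} E[x y], with expectations replaced by sample averages.
C = X.T @ X / X.shape[0]                           # empirical stimulus covariance
k_hat = np.linalg.solve(C, X.T @ y / X.shape[0])
```

As for the exercise: the second derivative of Err with respect to k is the stimulus covariance, which is positive semidefinite, so the stationary point is always a minimum; it is the unique minimum exactly when the covariance has full rank.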

Additive models
Next: polynomial (Volterra/Wiener) expansion:
p(spike|x) = k₀ + k₁·x + xᵀK₂x + K₃x³ + ...
This can be fit with the same regression methods...

Volterra models
...but still not great. The quadratic approximation is often poor, and going beyond second order requires a great deal of data.
[Figure: a true nonlinearity f and its quadratic approximation.]

Additive models
Natural direction: more general expansions,
p(spike|x) = Σᵢ kᵢ Fᵢ(x),
e.g., Fᵢ = sigmoids or Gaussian bumps. These can still be fit by regression methods. Not a bad idea, but not very commonly used. Requires: (1) good candidate functions Fᵢ; (2) a good way to control complexity (avoid overfitting).
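A sketch of the same regression machinery with Gaussian-bump features Fᵢ; the bump centers, widths, and simulated tuning are arbitrary choices for illustration:

```python
import numpy as np

# Hypothetical 1-D example: regress responses onto Gaussian-bump features F_i(x).
rng = np.random.default_rng(2)
x = rng.standard_normal(5_000)
y = np.exp(-(x - 1.0) ** 2) + 0.1 * rng.standard_normal(x.size)  # assumed tuning + noise

centers = np.linspace(-3, 3, 12)                                  # bump centers (arbitrary)
F = np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2 * 0.5**2))  # design matrix F_i(x)

# The model is still linear in the weights k_i, so ordinary least squares applies.
k, *_ = np.linalg.lstsq(F, y, rcond=None)
```

The number and width of the bumps are exactly the complexity control in point (2): too many narrow bumps and the fit starts chasing noise.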

A different approach: cascade models
Idea: dimensionality reduction. Try to pick out a few important linear filters {kᵢ}, i = 1, ..., m, and then fit nonlinear models in this lower-dimensional space.


LNP model
k could represent (quasilinear) presynaptic and dendritic filtering. How do we fit k and the nonlinearity?

Computing the STA: just cross-correlate the spikes and x.
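A minimal sketch, assuming the stimulus has already been arranged so that row t holds the stimulus history ending at time bin t (all names hypothetical):

```python
import numpy as np

def spike_triggered_average(X, spikes):
    """Spike-weighted mean stimulus.

    X      : (T, d) array; row t is the stimulus vector preceding time bin t.
    spikes : (T,) array of spike counts per time bin.
    """
    return (spikes @ X) / spikes.sum()
```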

The STA defines a direction in stimulus space. The STA is an unbiased estimate of k if p(stim) is radially symmetric (Chichilnisky, 2001).

Aside: bias and variance
In general, there are two kinds of estimation error to consider:
- Bias: the average error E(k̂) − k, the expected difference between the estimate k̂ and the true k
- Variance: the spread around the mean E(k̂)
The bias of the STA is zero, but we still need sufficient samples to make the variance small (reduce noise in the estimate).

Retinal example (Chichilnisky, 2001)

Need to account for prior covariance! (Theunissen et al., 2001)

Accounting for prior covariance
Instead of taking the raw STA, take k = C⁻¹ k_STA, where C = E[xᵀx] is the prior covariance matrix.
This is the same as the regression solution we used for the additive model. Unbiased for elliptically symmetric p(x), not just radially symmetric (exercise: prove this).
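Continuing the STA sketch above, the correction is a single linear solve (again under hypothetical names):

```python
import numpy as np

def whitened_sta(X, spikes):
    """Covariance-corrected STA: k = C^{-1} k_STA, with C = E[x'x]."""
    C = X.T @ X / X.shape[0]             # empirical prior stimulus covariance
    k_sta = (spikes @ X) / spikes.sum()  # raw spike-triggered average
    return np.linalg.solve(C, k_sta)     # solve instead of forming C^{-1}
```

For white-noise stimuli C is (up to sampling noise) proportional to the identity, so the correction is just a rescaling; it matters for correlated stimuli such as natural scenes.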

Asymmetric examples: STA failures

What if symmetric nonlinearity? Or > 1 filter?

Spike-triggered covariance

Maximizing the change in variance
We want to pick the direction along which the change in variance is largest: maximize vᵀΔC v subject to ‖v‖² = 1, where ΔC = C_prior − C_spike.
Solution: find the eigenvectors of ΔC with very large or very small eigenvalues.
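A sketch of that eigenvector computation; the convention of subtracting the STA from the spike-triggered ensemble before taking its covariance is a common one, and all names are mine:

```python
import numpy as np

def stc_directions(X, spikes):
    """Eigen-directions of Delta C = C_prior - C_spike, sorted by |eigenvalue|."""
    sta = (spikes @ X) / spikes.sum()
    C_prior = np.cov(X, rowvar=False)
    # Spike-triggered ensemble (bins with >0 spikes counted once; fine for a sketch).
    C_spike = np.cov(X[spikes > 0] - sta, rowvar=False)
    evals, evecs = np.linalg.eigh(C_prior - C_spike)   # eigh: symmetric matrix
    order = np.argsort(np.abs(evals))[::-1]            # biggest variance change first
    return evals[order], evecs[:, order]
```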

Example from fly visual system (Brenner et al., 2001)

V1 cells aren't so simple (Rust et al., 2003)

What if neither the mean nor the covariance changes? Fit L and N at the same time. This leads to unbiased estimators no matter what p(stim) is; symmetric priors are unnecessary... but it takes longer to compute (Weisberg and Welsh, 1994).

Choose the most modulatory direction (Paninski, 2003a; Sharpee et al., 2004). The proof that this gives the correct k requires information theory; we will come back to this in a couple of lectures.

Other applications: response-triggered averaging
- Psychophysics ("classification images")
- fMRI
- Optical imaging
- ...

Interneuronal interactions
Above: firing rate given the stimulus. What if we want the rate given the activity of neighboring cells in the network? In a sense, we are just redefining what we mean by "stimulus," so the same mathematical techniques apply. We can incorporate single- or multi-unit activity, LFP, etc.

Predictions from neighbor activity in V1 (Tsodyks et al., 1999)

Combining kinematic and neighbor activity in MI (Paninski et al., 2004)
[Figure: per-cell neural weights (aᵢ), hand-position kinematic filter (k), fitted nonlinearity (f), and predicted firing rate (Hz) vs. the target spike train over time.]

Combining place field and neighbor activity in hippocampus (Harris et al., 2003)

Regularization
Fitting stimulus and neighbor effects means lots of parameters. Lots of parameters plus not enough data means overfitting. How do we avoid this?

Regularization, linear version
Instead of minimizing Err = E[(k·x − y)²], minimize Err + λ₁‖k‖² + λ₂‖Dk‖², where ‖k‖² = k·k and D is a difference operator.
Solution: k_reg = (C + λ₁I + λ₂DᵀD)⁻¹ k_STA.
- λ₁, λ₂ → 0: no regularization
- λ₁ → ∞: k_reg → 0 (shrinking)
- λ₂ → ∞: k_reg → constant (smoothing)
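A sketch of the regularized solution, using a first-difference matrix for D; the λ values are placeholders to be chosen by cross-validation:

```python
import numpy as np

def regularized_filter(X, y, lam1=1.0, lam2=1.0):
    """k_reg = (C + lam1*I + lam2*D'D)^{-1} E[x y], with D a first-difference operator."""
    T, d = X.shape
    C = X.T @ X / T                  # stimulus covariance
    xcorr = X.T @ y / T              # cross-correlation, the STA-like term
    D = np.diff(np.eye(d), axis=0)   # (d-1, d) first-difference matrix
    return np.linalg.solve(C + lam1 * np.eye(d) + lam2 * D.T @ D, xcorr)
```

Sweeping lam1 and lam2 on held-out data is the standard way to set them; the MI likelihood-vs-log λ curve shown below illustrates exactly this kind of sweep.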

Theoretical justification
Idea: introduce some bias to reduce the variance. If λ₁ and λ₂ are chosen correctly, k_reg is a better estimate of k than k_STA, for all underlying k (James and Stein, 1960).

Bayesian interpretation
k_reg is the posterior mean given the data {x, y}, under Gaussian errors and a Gaussian prior on k (exercise). Large λ₁, λ₂ mean we expect k to be small and smooth.

Regularization can improve estimates of V1 receptive fields (Smyth et al., 2003)

Regularization can improve estimates of A1 receptive fields (Machens et al., 2003)

Regularization can improve predictions given MI neighbor activity (Paninski et al., 2003; Paninski, 2004)
[Figure: likelihood as a function of log λ.]

One last problem...
We assumed above that spikes are conditionally independent given x. What if spikes are history-dependent? (They are: refractory period, adaptation, burstiness, etc.)

The STA is biased if spikes are history-dependent (Pillow and Simoncelli, 2003)

Summary so far...
- Introduced the cascade idea and its geometric intuition
- Discussed STA and STC fitting procedures
- Extended the cascade idea to include interneuronal effects
- Illustrated the importance of complexity control and reducing overfitting
Next: incorporating history dependence; continuous-time models

References

Brenner, N., Bialek, W., and de Ruyter van Steveninck, R. (2001). Adaptive rescaling optimizes information transmission. Neuron, 26:695-702.

Chichilnisky, E. (2001). A simple white noise analysis of neuronal light responses. Network: Computation in Neural Systems, 12:199-213.

Foldiak, P. (2001). Stimulus optimisation in primary visual cortex. Neurocomputing, 38-40:1217-1222.

Harris, K., Csicsvari, J., Hirase, H., Dragoi, G., and Buzsaki, G. (2003). Organization of cell assemblies in the hippocampus. Nature, 424:552-556.

James, W. and Stein, C. (1960). Estimation with quadratic loss. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, 1:361-379.

Machens, C. (2002). Adaptive sampling by information maximization. Physical Review Letters, 88:228104-228107.

Machens, C., Wehr, M., and Zador, A. (2003). Spectro-temporal receptive fields of subthreshold responses in auditory cortex. NIPS.

Paninski, L. (2003a). Convergence properties of some spike-triggered analysis techniques. Network: Computation in Neural Systems, 14:437-464.

Paninski, L. (2003b). Design of experiments via information theory. Advances in Neural Information Processing Systems, 16.

Paninski, L. (2004). Maximum likelihood estimation of cascade point-process neural encoding models. Network: Computation in Neural Systems, 15:243-262.

Paninski, L., Fellows, M., Shoham, S., Hatsopoulos, N., and Donoghue, J. (2004). Superlinear population encoding of dynamic hand trajectory in primary motor cortex. Journal of Neuroscience, 24:8551-8561.

Paninski, L., Lau, B., and Reyes, A. (2003). Noise-driven adaptation: in vitro and mathematical analysis. Neurocomputing, 52:877-883.

Pillow, J. and Simoncelli, E. (2003). Biases in white noise analysis due to non-Poisson spike generation. Neurocomputing, 52:109-115.

Rust, N., Schwartz, O., Movshon, A., and Simoncelli, E. (2003). Spike-triggered characterization of excitatory and suppressive stimulus dimensions in monkey V1 directionally selective neurons. Presented at the annual Computational Neuroscience meeting, Alicante, Spain.

Sharpee, T., Rust, N., and Bialek, W. (2004). Analyzing neural responses to natural signals: maximally informative dimensions. Neural Computation, 16:223-250.

Smyth, D., Willmore, B., Baker, G., Thompson, I., and Tolhurst, D. (2003). The receptive-field organization of simple cells in primary visual cortex of ferrets under natural scene stimulation. Journal of Neuroscience, 23:4746-4759.

Theunissen, F., David, S., Singh, N., Hsu, A., Vinje, W., and Gallant, J. (2001). Estimating spatio-temporal receptive fields of auditory and visual neurons from their responses to natural stimuli. Network: Computation in Neural Systems, 12:289-316.

Tsodyks, M., Kenet, T., Grinvald, A., and Arieli, A. (1999). Linking spontaneous activity of single cortical neurons and the underlying functional architecture. Science, 286:1943-1946.

Weisberg, S. and Welsh, A. (1994). Adapting for the missing link. Annals of Statistics, 22:1674-1700.