Searching for simple models


9th Annual Pinkel Endowed Lecture
Institute for Research in Cognitive Science, University of Pennsylvania
Friday April 7

William Bialek
Joseph Henry Laboratories of Physics, and Lewis-Sigler Institute for Integrative Genomics
Princeton University
http://www.princeton.edu/~wbialek/wbialek.html

When we think about (and act upon) the things that we see (and hear, and ...), we put them into categories. [all images from Design Within Reach (http://dwr.com)]

[image categories: stools, office chairs, dining chairs, benches, lounge chairs]

In the dark of night, vision is based on signals (only) from the rod photoreceptor cells. Consider the responses to dim flashes of light. [figure: grid of current traces (pA vs time) from individual salamander rods responding to dim flashes; scale bar in microns; rod image and current data from FM Rieke, salamander image from MJ Berry II] These are salamander rods (not that it really matters). The brain has the problem of categorizing these responses!

Remember Hecht, Shlaer & Pirenne! Energy, quanta and vision, J Gen Physiol 25, 819 (1942). [figure: probability of seeing vs mean number of photons at the retina; Hecht, Shlaer & Pirenne data with model curve, K = 6 (inferred)] The x-axis is proportional to the light intensity of the stimulus flash; the solid line is a model where the observer "sees" when more than K photons are counted at the retina... the distribution of counts is determined by the physics of the light source. In this regime, our visual perception is controlled by the random arrival of individual photons at the retina. Categories of rod cell response should correspond to zero, one, two, ... photons. Try to categorize based simply on the current at peak time. [figure: probability density (1/pA) vs current at t_peak (pA); the distributions for 0, 1, and 2 photons overlap] This gives the right idea, but the simplest approach leaves substantial ambiguities. Is there a better strategy?
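The frequency-of-seeing curve follows from nothing more than Poisson statistics plus a threshold. A minimal sketch (my code, not from the talk; the sweep of intensities is an illustration):

```python
# Hecht-Shlaer-Pirenne "frequency of seeing": the flash delivers a Poisson
# number of photons, and the observer reports "seen" when at least K photons
# are counted at the retina (K = 6 inferred from their data).
import math

def prob_seeing(mean_photons, K=6):
    """P(count >= K) for a Poisson count with the given mean."""
    p_below = sum(math.exp(-mean_photons) * mean_photons ** k / math.factorial(k)
                  for k in range(K))
    return 1.0 - p_below

# sweeping the mean photon count traces out the psychometric curve
for m in [1, 2, 4, 6, 10, 20]:
    print(f"mean photons {m:>2}: P(see) = {prob_seeing(m):.3f}")
```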

The raw data for categorization is the current I(t): many time points = many dimensions, and better categories = better boundaries in this multi-dimensional space. Try planar boundaries: decisions based on the output of a linear filter, with the best filter determined by the signal and noise properties of the rods themselves. [figure: measured rod (voltage) response, normalized voltage vs time after light flash (seconds); probability density (1/pA) of the filtered output — the optimal filter resolves almost all ambiguity between 0 and 1 photons; predicted vs measured bipolar cell response] If there is a unique optimal filtering strategy for processing the rod cell signals, the retina should use this strategy... this is a parameter-free prediction! Optimal filtering in the salamander retina. F Rieke, WG Owen & W Bialek, in Advances in Neural Information Processing, R Lippmann, J Moody & D Touretzky, eds, pp 377-383 (Morgan Kaufmann, San Mateo CA, 1991).
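To make the optimal-filter idea concrete, here is a self-contained toy (not the actual rod-data analysis; the pulse shape, noise covariance, and all numbers are invented): for a known mean single-photon response s in Gaussian noise with covariance C, the best linear discriminant is the whitened matched filter f = C^{-1} s, and it beats reading off the current at the peak time.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200
t = np.arange(T, dtype=float)
s = (t / 30.0) ** 2 * np.exp(-t / 30.0)                    # assumed single-photon pulse shape
C = 0.2 * np.exp(-np.abs(t[:, None] - t[None, :]) / 10.0)  # assumed correlated noise covariance
L = np.linalg.cholesky(C + 1e-9 * np.eye(T))               # for drawing noise samples

def trial(n_photons):
    """One simulated current trace: n_photons mean responses plus correlated noise."""
    return n_photons * s + L @ rng.standard_normal(T)

f = np.linalg.solve(C, s)      # optimal linear filter: f = C^{-1} s
peak = int(np.argmax(s))       # time point of the peak mean response

out0 = np.array([trial(0) for _ in range(2000)])
out1 = np.array([trial(1) for _ in range(2000)])

for name, z0, z1 in [("peak current ", out0[:, peak], out1[:, peak]),
                     ("filter output", out0 @ f, out1 @ f)]:
    thresh = 0.5 * (z0.mean() + z1.mean())     # midpoint decision boundary
    err = 0.5 * ((z0 > thresh).mean() + (z1 <= thresh).mean())
    print(f"{name}: 0-vs-1 photon error rate {err:.3f}")
```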

Categorizing rod responses might be analogous to categorizing images of chairs... but sometimes simpler animals actually have to solve the same problems that we do. Not as different as Mr Larson thinks.

Place a small wire in the back of the fly's head to listen in on the electrical signals from nerve cells that respond to movement. The fly has to solve (at least) two problems: estimate motion from the movie on the retina, and represent or encode the result in the sequence of spikes. Optimization principles (as with the optimal filtering above): estimates should be as accurate as possible, and coding in spikes should be matched to the input signals. Spikes: Exploring the Neural Code. F Rieke, D Warland, RR de Ruyter van Steveninck & WB (MIT Press, 1997). The focus will be on extracting a feature rather than building its representation, i.e., estimation theory rather than information theory (maybe a mistake for this talk).

Does the fly make accurate estimates of motion? We can get at this by decoding the spikes from the motion-sensitive neurons: reconstruct the velocity as a sum of a filter F(τ) placed at each spike time t_i,

v_est(t) = Σ_i F(t − t_i),

and then look at the power spectrum of errors N(ω) in the reconstructed signal, relative to the stimulus spectrum S(ω). Reading a neural code. WB, F Rieke, RR de Ruyter van Steveninck & D Warland, Science 252, 1854 (1991). Compare with the minimum level of errors N_min(ω) set by diffraction blur and photoreceptor noise (a limit that includes averaging over many receptors!). The fly approaches optimal estimation on short time scales (high frequencies), which are the ones that actually matter for behavior.
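A toy version of the decoding step (my construction, with made-up stimulus and rate parameters, not the fly data): generate spikes whose rate follows a stimulus, fit the filter F by least squares over a window of lags, and reconstruct v_est(t) = Σ_i F(t − t_i).

```python
import numpy as np

rng = np.random.default_rng(1)
n, dt = 20000, 0.002
# smooth random stimulus, then a rectified firing rate driven by it
v = np.convolve(rng.standard_normal(n), np.ones(50) / 50, mode="same")
rate = np.clip(30.0 * (1.0 + 2.0 * v), 0.0, None)          # Hz, made-up encoder
spikes = (rng.random(n) < rate * dt).astype(float)         # Bernoulli ~ Poisson per bin

# regression on lagged spike trains = linear decoding with filter F
lags = np.arange(-50, 51)
X = np.stack([np.roll(spikes, -k) for k in lags], axis=1)  # (roll wraps at edges; fine for a sketch)
F, *_ = np.linalg.lstsq(X, v, rcond=None)
v_est = X @ F

# the talk's figure compares the error power spectrum N(w) with the photon-noise
# limit N_min(w); here we just report the overall error variance
err = v - v_est
print("reconstruction SNR:", np.var(v) / np.var(err))
```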

What computation must the fly do in order to achieve optimal motion estimates? Motion estimation takes photoreceptor signals V_i(t) as inputs (not velocity!); after several layers of processing, the output is an estimate of velocity. We can actually solve the optimal estimation problem in some limiting cases (we also need hypotheses about the statistical structure of the world). Statistical mechanics and visual signal processing. M Potters & WB, J Phys I France 4, 1755 (1994). At high signal-to-noise ratios, velocity is just the ratio of temporal and spatial derivatives,

v_est(t) ∝ [ Σ_i (dV_i/dt)(V_i − V_{i+1}) ] / [ constant + Σ_i (V_i − V_{i+1})² ] ~ (∂V/∂t)/(∂V/∂x);

at low signal-to-noise ratios, the only reliable velocity signal is a spatiotemporal correlation,

v_est(t) ∝ Σ_ij ∫ dτ dτ′ V_i(t − τ) K_ij(τ, τ′) V_j(t − τ′) + ...

Optimal estimation is always a tradeoff between systematic and random errors (think about averaging over time in the lab!). The optimal estimator is not perfect... and can even see motion when nothing moves. [visual stimuli from RR de Ruyter van Steveninck (flies see it move too!)] We can go further with random stimuli to dissect the computation. Features and dimensions: Motion estimation in fly vision. WB & RR de Ruyter van Steveninck, http://arxiv.org/q-bio/0505003 (2005).
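The two limits can be seen in a few lines on a translating one-dimensional "image" (a sketch under simplified assumptions: sine-wave scene, two frames, made-up noise levels):

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 2 * np.pi, 64, endpoint=False)
v_true, dt = 0.5, 0.01

def frame(t, noise):
    """Photoreceptor array viewing V(x - v t), plus sensor noise."""
    return np.sin(x - v_true * t) + noise * rng.standard_normal(x.size)

for noise in [0.01, 1.0]:
    V0, V1 = frame(0.0, noise), frame(dt, noise)
    dVdt = (V1 - V0) / dt
    dVdx = np.gradient(V0, x)
    # high-SNR limit: (regularized) ratio of temporal to spatial derivatives;
    # sign from dV/dt = -v dV/dx for a pattern V(x - v t)
    v_grad = -np.sum(dVdt * dVdx) / (1.0 + np.sum(dVdx ** 2))
    # low-SNR limit: antisymmetric spatiotemporal correlator (arbitrary units --
    # it confounds speed with contrast, which is part of the talk's point)
    v_corr = np.mean(V1 * np.roll(V0, 1) - V0 * np.roll(V1, 1))
    print(f"noise={noise}: gradient estimate {v_grad:+.3f}, correlator {v_corr:+.2e}")
```

At low noise the gradient estimate recovers v ≈ 0.5; at high noise a single two-frame estimate of either kind becomes noisy, and the correlator's output is only proportional to velocity — the systematic-vs-random tradeoff the optimal estimator has to negotiate.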

Almost everything interesting that the brain does involves LOTS of neurons. How do we think about these networks as a whole? Imagine slicing time into little windows, like the frames of a movie. In each frame, each cell either spikes, or it doesn't:

              a moment ago   now        a moment later
neuron #1     spike          no spike   no spike
neuron #2     no spike       spike      no spike
neuron #3     spike          no spike   spike
neuron #4     no spike       no spike   spike
neuron #5     no spike       spike      no spike
neuron #6     no spike       no spike   spike
neuron #7     no spike       spike      no spike
neuron #8     spike          no spike   no spike
neuron #9     no spike       no spike   no spike
neuron #10    no spike       spike      spike

New experimental methods make it possible to listen in on many neurons at once (MJ Berry II); these are 10 cells from a salamander retina (not crucial, but cute). These states of the network are the words with which the retina tells the brain what we see! How big is the vocabulary? 10 cells = 2^10 = 1,024 possible words; 100 cells = 2^100 = 1,267,650,600,228,229,401,496,703,205,376 possible words.
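In code, the "words" are just binary vectors per time bin, and the vocabulary question is how many distinct patterns actually occur. A toy with synthetic, independent spikes (made-up rates; real data is the whole point of the experiments):

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(3)
N, n_bins, p_spike = 10, 100000, 0.04
spikes = (rng.random((n_bins, N)) < p_spike).astype(int)   # 1 = spike, 0 = silent

words = Counter(tuple(row) for row in spikes)              # count each N-bit pattern
print(f"possible words: 2**{N} = {2 ** N}")
print(f"distinct words observed in {n_bins} bins: {len(words)}")
print("most common words:", words.most_common(3))
```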

An important insight from theory: if it really is as complicated as it possibly could be, you'll never understand it. Progress = simplification. Simplifying hypothesis #1: every cell does its thing independently of all the others. This works surprisingly well if you look just at pairs of cells, but fails dramatically if you look at 40 cells (or in detail at 10 cells). [figure: distribution of correlations across cell pairs (# of cell pairs vs % correlation); probability that K cells spike together, independent model vs experiment — patterns occur that the independent model says should happen about one time in a million] (C Boutin & J Jameson, Princeton Weekly Bulletin, May 2006) Simplifying hypothesis #2: if cells spike together, there must be something special about those cells... (actually, not a simplification). Simplifying hypothesis #3: cells cooperate, but only talk to each other in pairs... collective actions emerge from all the pairwise "discussions." Take seriously the weak correlations among many pairs (compare w/ cortex!), and build the least structured model consistent with these correlations (least structured = maximum entropy). Weak pairwise correlations imply strongly correlated network states in a neural population. E Schneidman, MJ Berry II, R Segev & WB, Nature 440, 1007 (2006).

The model we are looking for (minimal structure to match the pairwise correlations) is exactly the Ising model of statistical mechanics: σ_i = +1 if neuron i fires a spike, σ_i = −1 if neuron i is silent; the state of the entire network is {σ_i} = (σ_1, σ_2, ..., σ_N), and the distribution of network states (words) is

P(σ_1, σ_2, ..., σ_N) = (1/Z) exp( Σ_i h_i σ_i + ½ Σ_{i≠j} J_ij σ_i σ_j ).

For N = 10 we can check the whole distribution! [figure: entropy scale running from the entropy of independent neurons down to the actual entropy; the max ent model given pair correlations captures ~90% of the structure, while the independent model (suggested by the weak correlations) misses it] But this is also the Hopfield model of memory! Are there stored patterns?
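For small N the maximum entropy fit can be done exactly, by enumerating all 2^N states and matching moments. A minimal sketch (made-up target moments standing in for measured ones):

```python
import itertools
import numpy as np

N = 5
states = np.array(list(itertools.product([-1, 1], repeat=N)), dtype=float)

def model_moments(h, J):
    """Exact <s_i> and <s_i s_j> under P ~ exp(sum h_i s_i + 1/2 sum J_ij s_i s_j)."""
    E = states @ h + 0.5 * np.einsum("si,ij,sj->s", states, J, states)
    p = np.exp(E - E.max()); p /= p.sum()
    return p @ states, np.einsum("s,si,sj->ij", p, states, states)

# made-up "measured" moments, generated from a hidden model; in a real
# analysis these come straight from the spike-train data
rng = np.random.default_rng(4)
h_true = 0.3 * rng.standard_normal(N)
J_true = 0.2 * rng.standard_normal((N, N)); J_true = (J_true + J_true.T) / 2
np.fill_diagonal(J_true, 0.0)
t1, t2 = model_moments(h_true, J_true)

# gradient ascent on the (concave) log-likelihood: step along the moment mismatch
h, J = np.zeros(N), np.zeros((N, N))
for _ in range(2000):
    m1, m2 = model_moments(h, J)
    h += 0.5 * (t1 - m1)
    dJ = 0.5 * (t2 - m2); np.fill_diagonal(dJ, 0.0)
    J += dJ

m1, m2 = model_moments(h, J)
print("max moment error:", max(np.abs(t1 - m1).max(), np.abs(t2 - m2).max()))
```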

Moving to larger networks opens up a much richer structure! With N = 40 neurons we have multiple ground states (= stored patterns); the network returns to the same basin of attraction when we play the movie again, even if the microstate is different. The observed groups of cells are typical of ensembles generated by drawing means and correlations at random from the observed distribution... which suggests an extrapolation to larger N. The model inferred from real data can be thought of as sitting at temperature T = 1; study the specific heat vs T (and integrate to get the entropy): is the system poised near a critical point? (Test via adaptation experiments!) Ising models for networks of real neurons. G Tkacik, E Schneidman, MJ Berry II & WB, http://arxiv.org/q-bio.NC/0611072 (2006).
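The specific-heat diagnostic is easy to state in code: scale the energies by 1/T and compute C(T) = Var(E)/T² by exact enumeration. A sketch with random couplings (not the fitted retinal parameters):

```python
import itertools
import numpy as np

rng = np.random.default_rng(5)
N = 10
h = 0.1 * rng.standard_normal(N)
J = 0.3 * rng.standard_normal((N, N)); J = (J + J.T) / 2
np.fill_diagonal(J, 0.0)

states = np.array(list(itertools.product([-1, 1], repeat=N)), dtype=float)
E = -(states @ h + 0.5 * np.einsum("si,ij,sj->s", states, J, states))  # T = 1 is the fitted model

for T in [0.5, 0.8, 1.0, 1.3, 2.0]:
    w = np.exp(-(E - E.min()) / T)        # shift by E.min() for numerical stability
    p = w / w.sum()
    varE = p @ E ** 2 - (p @ E) ** 2
    print(f"T = {T}: specific heat C = {varE / T ** 2:.3f}")
```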

How seriously should you take these maximum entropy models? Can we use them to describe more complex phenomena? Try words as networks of letters! Take one year of Reuters articles (~9 million words), choose the most common words, and select the subset of four-letter words. The full probability distribution over four-letter words has (26)^4 ≈ half a million elements; the max ent model consistent with pairwise correlations has ~6 × (26)^2 ≈ 4,000 parameters, ~100× fewer (!). Recall that spelling rules have a very combinatorial feel... If all letters were used equally, the entropy would be 4 × log2(26) = 18.8 bits; taking account of letter frequencies, the entropy of independent letters is 14.59 bits; the entropy of the actual distribution is 7.42 bits; so the multi-information is 7.17 bits. The max-ent model captures 6.2 bits, or ~87% of this structure. Inevitably, the model assigns nonzero probability to words not seen in the data (remember the data are limited to the most common words): rite, hove, rase, lost, hive, mave, wark, whet, lise, tame, leat, fave, tike, pall, meek, nate, mast, hale, sime, gave, tome, ... Toward a statistical mechanics of four letter words. GJ Stephens & WB, in progress (2006).
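The bookkeeping above is easy to reproduce on any word list: compare the entropy of the word distribution against the summed entropies of the per-position letter distributions. A toy stand-in corpus (the talk's numbers come from the Reuters data, not from this):

```python
import numpy as np
from collections import Counter

corpus = ("this that with have from they were said each word time will "
          "more when what some only over such then them like into").split()
counts = Counter(corpus)                      # stand-in for real word frequencies

p = np.array(list(counts.values()), dtype=float); p /= p.sum()
H_words = -(p * np.log2(p)).sum()             # entropy of the actual word distribution

H_letters = 0.0                               # entropy if letters were independent
for pos in range(4):
    c = Counter(w[pos] for w in corpus)
    q = np.array(list(c.values()), dtype=float); q /= q.sum()
    H_letters += -(q * np.log2(q)).sum()

print(f"independent letters: {H_letters:.2f} bits")
print(f"actual words:        {H_words:.2f} bits")
print(f"multi-information:   {H_letters - H_words:.2f} bits")
```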

What problem is the brain solving? Classification (e.g., rod responses); estimating a feature (e.g., motion); ... In many (simple) cases there is evidence for near-optimal performance (many examples I didn't discuss). If we take this seriously, we have a theory of what the brain should compute; the key qualitative prediction is context dependence. But: why these features? (the laundry list problem) Is there a unifying theme for the problems that the brain solves well? Are all the problems really the problem of prediction? How do neurons cooperate in networks? The common observation is that pairs of neurons are only weakly correlated or anti-correlated, but there are LOTS of pairs. From (simple) statistical mechanics models: if all pairs interact, "weak" means << 1/N, not << 1. Minimally structured models consistent with weak correlations predict dramatic collective states: memories, critical points... (exotica implied by modest phenomena!) How do we connect network dynamics to computational function? Maybe: predictive information is maximal at critical points...