Linear recurrent networks

Simpler, much more amenable to analytic treatment. E.g., by choosing the identity as activation function, the network equation becomes linear:

τ_r dv/dt = −v + h + M v

where v is the vector of output firing rates, h the input, and M the matrix of recurrent weights.

Firing rates can be negative.
Approximates the dynamics around a fixed point.
The approximation is often reasonable in the presence of weak background activity; in this case, the firing rate is relative to the baseline rate.
Rough, but often useful, approximation: calculate the dynamics of the linear network, then apply the nonlinearity to the solution.
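A minimal numerical sketch of these dynamics (assuming numpy; the time constant, weight matrix, and input below are illustrative choices, not values from the slides):

```python
import numpy as np

# Euler integration of the linear recurrent network
# tau_r dv/dt = -v + h + M v   (all parameter values are illustrative)
tau_r = 10e-3                      # rate time constant (10 ms)
dt = 0.1e-3                        # integration step
M = np.array([[0.0, 0.4],
              [0.4, 0.0]])         # symmetric recurrent weights
h = np.array([1.0, 0.5])           # constant input
v = np.zeros(2)                    # firing rates (relative to baseline)

for _ in range(int(0.2 / dt)):             # simulate 200 ms
    v += dt / tau_r * (-v + h + M @ v)     # Euler step

print("v(200 ms)  ≈", v)
print("fixed point:", np.linalg.solve(np.eye(2) - M, h))
```

Since both eigenvalues of this M lie below one, the simulated rates converge to the fixed point printed in the last line.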

Solution for symmetric M and time-independent input h

Idea: express the firing-rate vector v in terms of the eigenvectors e_µ of M and solve for the time-dependent coefficients c_µ(t):

v(t) = Σ_{µ=1}^{N} c_µ(t) e_µ

For a real symmetric matrix this is always possible. The eigenvectors satisfy

M e_µ = λ_µ e_µ

They are orthogonal and can be normalized to unit length: e_µ · e_ν = δ_µν. The eigenvalues λ_µ are real for real symmetric matrices.

Substituting into the network equation yields

τ_r Σ_µ (dc_µ/dt) e_µ = −Σ_µ c_µ(t) e_µ + M Σ_µ c_µ(t) e_µ + h = Σ_µ (λ_µ − 1) c_µ(t) e_µ + h

Taking the dot product of each side with e_ν yields

τ_r dc_ν/dt = −(1 − λ_ν) c_ν(t) + h · e_ν

This involves only the single coefficient c_ν, i.e. the different components are decoupled.

The solution of τ_r dc_ν/dt = −(1 − λ_ν) c_ν(t) + h · e_ν with initial condition c_ν(0), for time-independent input h, is

c_ν(t) = (h · e_ν)/(1 − λ_ν) · (1 − exp(−t (1 − λ_ν)/τ_r)) + c_ν(0) exp(−t (1 − λ_ν)/τ_r)

The full solution is then obtained from v(t) = Σ_ν c_ν(t) e_ν. This equation for c_ν(t) has a number of important characteristics, discussed in the following.
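A sketch of this closed-form solution (same illustrative parameters as the simulation above, assuming numpy):

```python
import numpy as np

# Closed-form solution via the eigendecomposition of a symmetric M:
# c_nu(t) = (h.e_nu)/(1-lam_nu) * (1 - exp(-t(1-lam_nu)/tau_r))
#           + c_nu(0) * exp(-t(1-lam_nu)/tau_r)
tau_r = 10e-3
M = np.array([[0.0, 0.4], [0.4, 0.0]])
h = np.array([1.0, 0.5])
lam, E = np.linalg.eigh(M)      # columns of E: orthonormal eigenvectors
c0 = E.T @ np.zeros(2)          # coefficients of the initial condition v(0)

def v_of_t(t):
    decay = np.exp(-t * (1 - lam) / tau_r)
    c = (E.T @ h) / (1 - lam) * (1 - decay) + c0 * decay
    return E @ c                # v(t) = sum_nu c_nu(t) e_nu

print(v_of_t(0.2))              # matches the Euler simulation above
```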

Discussion

If λ_ν > 1, the exponential functions grow without bound, reflecting a fundamental instability of the network.
If λ_ν < 1, c_ν approaches its steady-state value exponentially, with time constant τ_r/(1 − λ_ν).
Thus, strictly speaking, the system has no memory: the effect of the initial condition decays to zero exponentially.
However, the time constant of this decay can be very large, much larger than the time constant τ_r of an individual neuron.

The steady-state value

c_ν(∞) = (h · e_ν)/(1 − λ_ν)

is proportional to h · e_ν, the projection of the input vector onto the corresponding eigenvector. The steady-state firing rate is

v(∞) = Σ_ν (h · e_ν)/(1 − λ_ν) · e_ν

Selective amplification: if one eigenvalue λ_1 is close to one and all others are significantly different from one,

v(∞) ≈ (h · e_1)/(1 − λ_1) · e_1

More generally: projection onto a k-dimensional subspace, in the case of k degenerate eigenvalues close to one.

Note: the steady-state solution of a recurrent network with time-independent input is also the solution of a feedforward network. To see this, consider the fixed point with steady-state output v_∞ (for λ_ν < 1 and constant input h):

0 = −v_∞ + h + M v_∞

This can be rewritten as a feedforward network:

v_∞ = W h with W = (I − M)^(−1)
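A short check of this equivalence (same illustrative M and h as above):

```python
import numpy as np

# The steady state of the recurrent network equals the output of a
# feedforward network with effective weights W = (I - M)^(-1).
M = np.array([[0.0, 0.4], [0.4, 0.0]])
h = np.array([1.0, 0.5])
W = np.linalg.inv(np.eye(2) - M)   # effective feedforward weights
print(W @ h)                       # identical to the simulated fixed point
```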

Neural Decoding

Forward mapping: how will neurons respond to a given stimulus?
Backward mapping: how can you learn something about the stimulus from the neural responses?

Perception as an inverse problem

Consider an experiment where we record from a neuron:
let s be a stimulus parameter (like the orientation of a moving edge),
let r be the response of the neuron (e.g. the spike-count firing rate).

Then we can define:
P[s], the probability of stimulus s being presented
P[r], the probability of response r being recorded
P[r, s], the probability of stimulus s being presented and response r being recorded (joint probability)
P[r|s], the conditional probability of evoking response r given that stimulus s was presented
P[s|r], the conditional probability that stimulus s was presented given that response r was evoked

Bayes' theorem:

P[s|r] = P[r|s] P[s] / P[r]

This is neural decoding! P[s] is the prior; P[s|r] is the posterior probability.
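A toy numerical illustration of this decoding step (assuming numpy; the prior and conditional probabilities are made-up numbers):

```python
import numpy as np

# Bayes' theorem P[s|r] = P[r|s] P[s] / P[r] on a discrete toy problem.
P_s = np.array([0.5, 0.5])                 # prior over two stimuli
P_r_given_s = np.array([[0.7, 0.2, 0.1],   # P[r|s=0] over three response bins
                        [0.1, 0.3, 0.6]])  # P[r|s=1]
r = 2                                      # observed response bin
P_r = P_s @ P_r_given_s                    # marginal P[r] = sum_s P[s] P[r|s]
posterior = P_r_given_s[:, r] * P_s / P_r[r]
print(posterior)                           # P[s|r=2] ≈ [0.14, 0.86]
```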

Formalizing neural variability: elements of information theory

Mutual information: the difference between the total response entropy and the average response entropy on trials that involve presentation of the same stimulus.

Entropy of the responses evoked by repeated presentations of a given stimulus s:

H_s = −Σ_r P[r|s] log₂ P[r|s]

Averaging over all stimuli gives the noise entropy:

H_noise = Σ_s P[s] H_s = −Σ_{s,r} P[s] P[r|s] log₂ P[r|s]

Noise entropy: the entropy associated with the part of the response variability that is not due to changes in the stimulus, but arises from other sources.

Mutual information

The mutual information is the difference between the total response entropy and the average response entropy on trials that involve repeated presentation of the same stimulus:

I = H − H_noise = −Σ_r P[r] log₂ P[r] + Σ_{s,r} P[s] P[r|s] log₂ P[r|s]

With P[r] = Σ_s P[r, s] and P[r, s] = P[s] P[r|s], one gets

I = Σ_{s,r} P[r, s] log₂ ( P[r, s] / (P[r] P[s]) )

which is the mutual information expressed as the KL divergence between the joint distribution P[r, s] and the product of the marginals P[r] P[s].

The mutual information is symmetric with respect to interchange of r and s:

I = Σ_{s,r} P[r, s] log₂ ( P[r, s] / (P[r] P[s]) ) = H[r] − H[r|s] = H[s] − H[s|r]

Here H[s|r] is the average uncertainty about the identity of the stimulus given the response.
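These identities are easy to verify numerically; a sketch with a made-up joint distribution (assuming numpy):

```python
import numpy as np

# Mutual information from a toy joint table P[r, s], computed both as
# total entropy minus noise entropy and in the KL form; the two agree.
P_rs = np.array([[0.35, 0.05],   # rows: responses r, columns: stimuli s
                 [0.10, 0.50]])
P_r = P_rs.sum(axis=1)
P_s = P_rs.sum(axis=0)

H_r = -np.sum(P_r * np.log2(P_r))              # total response entropy
H_noise = -np.sum(P_rs * np.log2(P_rs / P_s))  # -sum P[r,s] log2 P[r|s]
I_kl = np.sum(P_rs * np.log2(P_rs / np.outer(P_r, P_s)))
print(H_r - H_noise, I_kl)                     # identical values
```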

Neural example: a neuron in area MT

Middle temporal (MT) cortex: large receptive fields, sensitive to object motion. Record from a single neuron during the presentation of movement patterns such as the ones below; the animal is trained to decide whether the coherent motion is upwards or downwards.

[Figure] Left: behavioral performance of the animal and of an ideal observer based on a single neuron. Right: histograms (thinned) of the average firing rate for the two stimuli (up/down) at different coherence levels.

Likelihood

Consider a probability distribution P(x|θ) depending on a parameter θ.

Likelihood: L(θ|x) = P(x|θ)

The likelihood of parameter value θ given an observed (fixed) outcome x is equal to the probability of x given the parameter value θ.

Example:
Probability: "Given that I have flipped a fair coin 100 times, what is the probability of it landing heads-up every time?"
Likelihood: "Given that I have flipped a coin 100 times and it has landed heads-up 100 times, what is the likelihood that the coin is fair?"
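The coin example in a few lines (a sketch; the binomial model is the standard choice for repeated coin flips):

```python
from math import comb

# Likelihood of theta = 0.5 ("fair coin") given 100 heads in 100 flips:
# the binomial probability of the observed data under that parameter.
n, k = 100, 100
likelihood_fair = comb(n, k) * 0.5**k * 0.5**(n - k)
print(likelihood_fair)   # 0.5**100 ≈ 7.9e-31: "fair" explains the data poorly
```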

Optimal discrimination

What is the optimal strategy for discriminating between two alternative signals presented against a background of noise? Assume we must base our decision on the observation of a single observable x; x could be, e.g., the firing rate of a neuron when a particular stimulus is present.

If the signal is blue, the values of x are drawn from P(x|blue); if the signal is red, from P(x|red). If we have seen a particular value of x, can we tell which signal was presented?

Intuition: divide the x axis at a critical point x₀: everything to the right is called blue, everything to the left red. How should we choose x₀?

More difficult decision problem: was the blue or the red stimulus present?

[Figure: probability densities of the firing rate under the two stimuli, with the optimal decision threshold marked]

Compare the firing rate to the threshold: if it is larger than 7.5, report blue; otherwise report red.

More difficult decision problem: was the blue or the red stimulus present?

[Figure: overlapping probability densities of the firing rate under the two stimuli, with the optimal decision threshold marked]

Choose the decision boundary based on the likelihood ratio:

L(blue|x) / L(red|x) = p(x|blue) / p(x|red)

A general result

This applies also to multimodal and multivariate distributions.

[Figure: multimodal probability densities of the firing rate under the two stimuli]

Choose the decision boundary based on the likelihood ratio:

L(blue|x) / L(red|x) = p(x|blue) / p(x|red)
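A sketch of the likelihood-ratio rule for two Gaussian response densities (assuming numpy; the means and common width are illustrative, chosen so that the boundary falls at the 7.5 threshold mentioned above):

```python
import numpy as np

# Likelihood-ratio decision between two Gaussian response distributions.
mu_red, mu_blue, sigma = 5.0, 10.0, 2.0

def gauss(x, mu):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def decide(x):
    # report "blue" when the likelihood ratio exceeds one
    return "blue" if gauss(x, mu_blue) / gauss(x, mu_red) >= 1.0 else "red"

print(decide(6.0), decide(9.0))  # red blue; for equal priors and losses the
                                 # boundary lies at (mu_red + mu_blue)/2 = 7.5
```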

Likelihood ratio, loss, and bias

L₊: penalty associated with answering "plus" when the correct answer is "minus".
L₋: penalty associated with answering "minus" when the correct answer is "plus".

Given the firing rate r:
P[−|r]: probability that the correct answer is minus.
P[+|r]: probability that the correct answer is plus.

The average loss expected for a "plus" answer given r is the loss associated with being wrong times the probability of being wrong:

Loss₊ = L₊ P[−|r]

Similarly, the average loss expected for a "minus" answer is

Loss₋ = L₋ P[+|r]

Strategy: cut your losses. Answer "plus" if Loss₊ ≤ Loss₋. Using

P[+|r] = p[r|+] P[+] / p[r] and P[−|r] = p[r|−] P[−] / p[r]

this strategy gives: answer "plus" if

l(r) = p[r|+] / p[r|−] ≥ (L₊ P[−]) / (L₋ P[+])

Interpretation of l(r) ≥ (L₊ P[−]) / (L₋ P[+]):

The likelihood ratio is compared to a threshold. The threshold is composed of two kinds of terms: the penalties and the priors.
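A few lines showing how an asymmetric penalty biases the threshold (plain Python; the penalty and prior values are illustrative):

```python
# Biased rule: answer "+" if l(r) >= (L_plus * P_minus) / (L_minus * P_plus).
L_plus, L_minus = 2.0, 1.0     # wrongly answering "+" costs twice as much
P_plus, P_minus = 0.5, 0.5     # equal priors
threshold = (L_plus * P_minus) / (L_minus * P_plus)   # = 2.0

def answer(likelihood_ratio):
    return "+" if likelihood_ratio >= threshold else "-"

print(answer(1.5), answer(2.5))  # "- +": the asymmetric penalty biases
                                 # the rule against answering "+"
```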

Continuous variables. Example: the cricket cercal system

[Figure. Top: tuning curves of the cercal interneurons as a function of wind direction (0 to 360 degrees). Bottom: decoding error (degrees) as a function of wind direction for the maximum likelihood estimate (left) and the Bayesian estimate (right).]

Optimal general decoding methods

Optimal methods: Bayesian inference, maximum a posteriori (MAP) estimation, maximum likelihood (ML) estimation.

MAP estimation starts with Bayes' rule:

p(s|r) = p(r|s) p(s) / p(r)

This allows us to calculate the probability of each possible stimulus s given the neural response r. (But it requires knowledge of p(r|s), which is (still) difficult to estimate.)

Bayesian decoding: choose the estimate that minimizes the expected loss ∫ ds L(s, s_est) p(s|r).

The MAP estimate of s is the value s* which maximizes p(s|r). If p(s) does not depend on s, then s* maximizes p(r|s). This is called the maximum likelihood (ML) estimate.

Decoding an arbitrary stimulus parameter with a population of independent Poisson neurons

Instructive example: an array of N neurons whose preferred stimulus values s_a are uniformly distributed, with Gaussian tuning curves

f_a(s) = r_max exp( −(s − s_a)² / (2σ_a²) )

The homogeneous Poisson process

During a very short time interval Δt there is a fixed probability of an event (spike) occurring, independent of what happened previously. If r is the rate of the Poisson process, then the probability of finding a spike in a short interval Δt is given by r Δt.

The probability of seeing exactly n spikes in a (long) interval T is given by the Poisson distribution:

P_T[n] = (rT)ⁿ / n! · exp(−rT)

P_T[n] = (rT)ⁿ / n! · exp(−rT)

Properties:
mean: E[n] = rT
variance: E[(n − E[n])²] = rT
Fano factor: variance/mean = 1

The distribution is well approximated by a Gaussian for large rT (see the Gaussian fit in fig. B).
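These count statistics are easy to verify by sampling (assuming numpy; the rate and window are illustrative):

```python
import numpy as np

# Empirical check of the Poisson spike-count statistics.
rng = np.random.default_rng(0)
r, T = 20.0, 0.5                       # 20 Hz rate, 500 ms counting window
counts = rng.poisson(r * T, size=100_000)
print(counts.mean(), counts.var())     # both ≈ rT = 10
print(counts.var() / counts.mean())    # Fano factor ≈ 1
```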

Poisson neurons

The average rate in response to stimulus s is determined by the tuning curve f_a(s). The probability of stimulus s evoking n_a spikes in an interval T is (remember the Poisson distribution above):

P[n_a|s] = (f_a(s) T)^(n_a) / n_a! · exp(−f_a(s) T)

Assume independent neurons:

P[n|s] = Π_{a=1}^{N} (f_a(s) T)^(n_a) / n_a! · exp(−f_a(s) T)

Apply ML: take the logarithm and consider only the terms that depend on s (for an array of uniformly spaced tuning curves, Σ_a f_a(s) is approximately independent of s):

ln P[n|s] = Σ_a n_a ln f_a(s) + ...

[Figure: array of Gaussian tuning curves with preferred values uniformly spaced between −5 and 5]

Find the maximum of the r.h.s. by setting the derivative to zero:

Σ_a n_a f'_a(s*) / f_a(s*) = 0

ML estimate

Reproduced from the previous slide:

Σ_a n_a f'_a(s*) / f_a(s*) = 0

For Gaussian tuning curves we can use f'_a(s) / f_a(s) = −(s − s_a) / σ_a², and thus obtain

s* = ( Σ_a n_a s_a / σ_a² ) / ( Σ_a n_a / σ_a² )

and if all tuning curves have the same width,

s* = Σ_a n_a s_a / Σ_a n_a

Intuitive explanation: the estimate is the firing-rate-weighted average of the preferred values of the encoding neurons.
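A sketch of this decoder on simulated spike counts (assuming numpy; the population size, tuning width, peak rate, and counting window are illustrative):

```python
import numpy as np

# ML decoding for a population of Poisson neurons with equal-width
# Gaussian tuning curves: s* = sum_a n_a s_a / sum_a n_a.
rng = np.random.default_rng(1)
s_pref = np.linspace(-5, 5, 21)        # uniformly spaced preferred values
sigma, r_max, T = 1.0, 50.0, 0.5
s_true = 1.3

rates = r_max * np.exp(-0.5 * ((s_true - s_pref) / sigma) ** 2)
n = rng.poisson(rates * T)             # spike counts of the population
s_ml = np.sum(n * s_pref) / np.sum(n)  # count-weighted preferred values
print(s_ml)                            # ≈ s_true
```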

MAP estimate: include prior information

The prior p(s) is now taken into account:

ln p(s|n) = Σ_a n_a ln f_a(s) + ln p(s) + ...

Setting the derivative to zero,

Σ_a n_a f'_a(s*) / f_a(s*) + p'(s*) / p(s*) = 0

which, for Gaussian tuning curves of width σ_r and a Gaussian prior with mean s_prior and width σ_prior, leads to

s* = ( Σ_a n_a s_a / σ_r² + s_prior / σ_prior² ) / ( Σ_a n_a / σ_r² + 1 / σ_prior² )

[Figure: estimates of the stimulus; solid: constant stimulus distribution, dashed: Gaussian prior with s_prior = −2, σ_prior = 1]
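The closed-form MAP estimate on the same illustrative population as the ML sketch above (assuming numpy):

```python
import numpy as np

# MAP decoding with a Gaussian prior; all parameters are illustrative.
rng = np.random.default_rng(1)
s_pref = np.linspace(-5, 5, 21)
sigma_r, r_max, T = 1.0, 50.0, 0.5
s_prior, sigma_prior = -2.0, 1.0
s_true = 1.3

rates = r_max * np.exp(-0.5 * ((s_true - s_pref) / sigma_r) ** 2)
n = rng.poisson(rates * T)
s_map = (np.sum(n * s_pref) / sigma_r**2 + s_prior / sigma_prior**2) \
        / (np.sum(n) / sigma_r**2 + 1 / sigma_prior**2)
print(s_map)   # pulled slightly from the ML estimate toward s_prior
```

With many recorded spikes the likelihood term dominates and the prior has little effect; with few spikes the estimate is drawn toward s_prior.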