Efficient coding of natural images with a population of noisy Linear-Nonlinear neurons


Efficient coding of natural images with a population of noisy Linear-Nonlinear neurons Yan Karklin and Eero P. Simoncelli NYU

Overview

Efficient coding is a well-known objective for the evaluation and design of signal processing systems, and provides a theoretical framework for understanding biological sensory systems (i.e., a system is designed to maximize the mutual information I(X; Y) for a given distribution on X and some model constraints). The receptive fields of retinal neurons have been shown to be consistent with efficient coding principles.

Linear-nonlinear neurons

One of the simplest yet reasonably accurate neural models for retinal cells is the LN (linear-nonlinear) cascade. We can write the system as:

r_j = f_j(y_j) + n_j^{(r)}    (1)
y_j = w_j^T (x + n^{(x)})     (2)

where r_j is the firing rate of neuron j, w_j is its linear filter, f_j is its nonlinearity, and n^{(x)}, n_j^{(r)} are the input and output noise. Descriptive figure on next slide.
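As a rough illustration of Eqs. (1)-(2), here is a minimal NumPy sketch of the LN forward model; the filter matrix, nonlinearity, and noise levels below are illustrative placeholders rather than the paper's fitted values.

```python
import numpy as np

rng = np.random.default_rng(0)

D, J = 256, 100                          # 16x16 input patch, 100 model neurons
W = 0.1 * rng.standard_normal((D, J))    # columns are the linear filters w_j
sigma_nx, sigma_nr = 0.4, 2.0            # input / output noise std (illustrative)

def ln_response(x, W, f, sigma_nx, sigma_nr, rng):
    """Eqs. (1)-(2): r_j = f_j(w_j^T (x + n_x)) + n_r."""
    n_x = sigma_nx * rng.standard_normal(x.shape)       # input (photoreceptor) noise
    y = W.T @ (x + n_x)                                  # linear filtering stage
    n_r = sigma_nr * rng.standard_normal(W.shape[1])     # output (spiking) noise
    return f(y) + n_r

f = lambda y: np.maximum(y, 0.0)   # e.g. a hard-rectifying nonlinearity
x = rng.standard_normal(D)         # stand-in for an image patch
r = ln_response(x, W, f, sigma_nx, sigma_nr, rng)
```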

Figure 1: a. Schematic of the model (see text for description): the transformation between images x and the neural response r. b. Information about the stimulus is conveyed both by the arrangement of the filters and by the shapes of the neural nonlinearities. Top: two neurons encode two stimulus ...

Infomax

We want to maximize the mutual information between the signal input x and the firing rates of the neurons r. However, in a biological system it is expensive to fire action potentials. This implies that there should be a penalty on the firing rates of the neurons r. Therefore, we want to optimize the objective function:

I(X; R) − ∑_j λ_j ⟨r_j⟩    (3)

The term ∑_j λ_j ⟨r_j⟩ penalizes increases in firing rate. The λ_j are biologically difficult parameters to obtain, so instead of setting them by hand we can treat each λ_j as a Lagrange multiplier chosen so that ⟨r_j⟩ matches the observed mean firing rate. In general, r_j is constrained to be positive.
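A minimal sketch of the penalized objective in Eq. (3); here mi_estimate and mean_rates stand for quantities estimated as described on the following slides, and lam holds the per-neuron multipliers (all values below are made up).

```python
import numpy as np

def infomax_objective(mi_estimate, mean_rates, lam):
    """Eq. (3): mutual information minus a metabolic penalty on mean firing rates."""
    return mi_estimate - np.sum(np.asarray(lam) * np.asarray(mean_rates))

# Example with made-up numbers: 30 bits transmitted, penalty over 100 mean rates.
value = infomax_objective(30.0, np.full(100, 0.8), np.full(100, 0.05))
```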

Model for f_j(·)

While the nonlinearity is often considered fixed, for example via an exponential or logistic link function, it can be of interest to learn the ideal f_j(·). To do this, we have to specify a flexible model for f_j(·). In the neural case, it makes sense for this function to be nondecreasing. One way to do this is to parameterize the slope of the nonlinearity, g_j = df_j/dy_j, as a sum of K closely spaced, nonnegative, overlapping Gaussian kernels:

g_j(y_j | c_jk, µ_jk, σ_j) = ∑_{k=1}^{K} c_jk exp( −(y_j − µ_jk)^2 / (2σ_j^2) )    (4)

where µ_jk, σ_j^2, and K are fixed and the c_jk ≥ 0 are to be learned.
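A sketch of this parameterization: the slope g_j is a nonnegative sum of fixed Gaussian kernels (Eq. 4), and f_j is recovered by numerically integrating the slope, which guarantees it is nondecreasing. The kernel count, spacing, and width here are illustrative choices, not the paper's settings.

```python
import numpy as np

def nonlinearity_from_kernels(c, mu, sigma, y_grid):
    """Slope g(y) = sum_k c_k exp(-(y - mu_k)^2 / (2 sigma^2))  (Eq. 4);
    f(y) is the cumulative integral of g, hence nondecreasing when c_k >= 0."""
    g = np.exp(-(y_grid[:, None] - mu[None, :]) ** 2 / (2.0 * sigma ** 2)) @ c
    f = np.cumsum(g) * (y_grid[1] - y_grid[0])   # simple Riemann-sum integral of the slope
    return g, f

K = 20
y_grid = np.linspace(-3.0, 3.0, 601)
mu = np.linspace(-3.0, 3.0, K)                  # fixed, closely spaced kernel centers
sigma = mu[1] - mu[0]                           # fixed width, on the order of the spacing
c = np.abs(np.random.default_rng(1).standard_normal(K))   # c_k >= 0, to be learned
g, f = nonlinearity_from_kernels(c, mu, sigma, y_grid)
```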

Calculating mutual information (I)

We can write:

I(X; R) = H(X) − H(X | R)    (5)

Since only H(X | R) changes when we change the nonlinearities or the linear kernels, we only need to worry about minimizing H(X | R). If we have data samples {x^i, r^i}, then we can approximate this entropy as:

H(X | R) ≈ (1/n) ∑_i h(X | R = r^i)    (6)
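In code this estimator is just an average of the per-sample conditional entropies h(X | R = r^i), which are computed on the next two slides; a trivial sketch:

```python
import numpy as np

def conditional_entropy(per_sample_entropies):
    """Eq. (6): H(X|R) ≈ (1/n) sum_i h(X | R = r_i)."""
    return float(np.mean(per_sample_entropies))

def mutual_information(H_X, per_sample_entropies):
    """Eq. (5): I(X;R) = H(X) - H(X|R); only the second term depends on W and f_j."""
    return H_X - conditional_entropy(per_sample_entropies)
```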

Calculating mutual information (II)

To calculate the posterior, we need to determine the likelihood and prior terms. We can use a Taylor series expansion to approximate the likelihood p(r | x). With the superscript i denoting quantities that depend on the input sample:

r^i ≈ G^i W^T (x^i + n_x^i) + n_r^i + f_0^i    (7)

where G^i is a diagonal matrix of the local derivatives (slopes) of the nonlinearities and f_0^i is the local offset. If the input and output noise have covariances C_{n_x} and C_{n_r}, then the likelihood covariance is approximately:

C^i_{r|x} = G^i W^T C_{n_x} W G^i + C_{n_r}    (8)
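A sketch of Eq. (8), treating G^i as a diagonal matrix of local nonlinearity slopes; W, the slopes, and the noise covariances are assumed to be supplied by the caller, and the example values are illustrative only.

```python
import numpy as np

def likelihood_cov(W, local_slopes, C_nx, C_nr):
    """Eq. (8): C_{r|x} ≈ G W^T C_nx W G + C_nr, with G = diag(local slopes)."""
    G = np.diag(local_slopes)
    return G @ W.T @ C_nx @ W @ G + C_nr

# Example with i.i.d. noise (C_nx = sigma_nx^2 I, C_nr = sigma_nr^2 I), as in the training setup.
D, J = 256, 100
rng = np.random.default_rng(2)
W = 0.1 * rng.standard_normal((D, J))
C = likelihood_cov(W, rng.random(J), 0.4**2 * np.eye(D), 2.0**2 * np.eye(J))
```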

Calculating mutual information (III)

To model real images, we would like to put as few assumptions on the prior as possible. However, constructing an accurate high-dimensional approximation to the distribution of real images is difficult. Instead, the authors draw individual samples from real images and estimate the curvature of the prior using a global curvature approximation, which yields an uncertainty estimate C_x (essentially a Laplace approximation to the real data). The posterior covariance is then approximately:

C^i_{x|r} = ( C_x^{-1} + W G^i (G^i W^T C_{n_x} W G^i + C_{n_r})^{-1} G^i W^T )^{-1}    (9)

and

h(X | R = r^i) = (1/2) log det(2πe C^i_{x|r})    (10)
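A sketch of Eqs. (9)-(10). Here C_x plays the role of the local prior covariance estimate described above; slogdet is used instead of det for numerical stability, and the entropy is returned in nats.

```python
import numpy as np

def posterior_cov(C_x, W, local_slopes, C_nx, C_nr):
    """Eq. (9): C_{x|r} = (C_x^{-1} + W G (G W^T C_nx W G + C_nr)^{-1} G W^T)^{-1}."""
    G = np.diag(local_slopes)
    C_r_given_x = G @ W.T @ C_nx @ W @ G + C_nr
    precision = np.linalg.inv(C_x) + W @ G @ np.linalg.solve(C_r_given_x, G @ W.T)
    return np.linalg.inv(precision)

def gaussian_entropy(C):
    """Eq. (10): h = 0.5 * log det(2 pi e C), the entropy of a Gaussian with covariance C."""
    D = C.shape[0]
    _, logdet = np.linalg.slogdet(C)
    return 0.5 * (D * np.log(2.0 * np.pi * np.e) + logdet)
```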

Training on natural images

Natural image data were obtained by randomly sampling 16x16 patches from a collection of zero-meaned grayscale photographs of outdoor scenes. The input and output noise were assumed i.i.d., so C_{n_x} = σ_{n_x}^2 I_D and C_{n_r} = σ_{n_r}^2 I_J, and the noise levels were estimated to match observed biological properties. The model consisted of 100 neurons, and the filter weights were initialized to random values.
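A sketch of the patch-sampling step; images is a placeholder list of 2-D grayscale arrays, and the mean is removed per patch here for simplicity (the slide describes zero-meaned photographs, so where the mean is subtracted is an assumption of this sketch).

```python
import numpy as np

def sample_patches(images, n_patches, size=16, rng=None):
    """Randomly crop size x size patches from grayscale images and flatten them."""
    rng = rng or np.random.default_rng()
    patches = []
    for _ in range(n_patches):
        img = np.asarray(images[rng.integers(len(images))], dtype=float)
        i = rng.integers(img.shape[0] - size + 1)
        j = rng.integers(img.shape[1] - size + 1)
        patch = img[i:i + size, j:j + size]
        patches.append((patch - patch.mean()).ravel())   # zero-mean, D = size*size vector
    return np.stack(patches)                             # shape (n_patches, size*size)
```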

Numerical Optimization

Given the approximations above, the derivatives with respect to all the parameters can be computed, although via some messy linear algebra. Optimization proceeded by gradient descent, using Lagrange multipliers to maintain the energy constraint; both W and the nonlinear functions were learned. Each batch consisted of 100 patches randomly sampled from the training data, and optimization was run for 100,000 iterations.
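A schematic of the optimization loop only: the analytic gradients are abstracted into a caller-supplied compute_gradients function, the Lagrange-multiplier updates that maintain the firing-rate constraint are omitted, and all names and step sizes are illustrative rather than the authors' code.

```python
import numpy as np

def train(W, c, data, compute_gradients, lam, n_iters=100_000, batch_size=100,
          lr=1e-3, rng=None):
    """Batch gradient ascent on the penalized objective of Eq. (3).

    compute_gradients(W, c, batch, lam) should return (dW, dc), the derivatives of the
    objective with respect to the filters and the nonlinearity kernel weights."""
    rng = rng or np.random.default_rng()
    for _ in range(n_iters):
        idx = rng.choice(len(data), size=batch_size, replace=False)
        dW, dc = compute_gradients(W, c, data[idx], lam)
        W = W + lr * dW                     # ascend the objective
        c = np.maximum(c + lr * dc, 0.0)    # keep kernel weights nonnegative
    return W, c
```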

Figure 2: In the presence of biologically realistic levels of noise, the optimal filters are center-surround and contain both On-center and Off-center profiles; the optimal nonlinearities are hard-rectifying functions. a. The set of learned filters for 100 model neurons. b. In pixel coordinates, contours of On-center (Off-center) filters at 50% maximum (minimum) levels. c. The learned nonlinearities for the first four model neurons, superimposed on distributions of filter outputs.

Figure 3: a. A characterization of two retinal ganglion cells obtained with a white noise stimulus [31]. We plot the estimated linear filters, horizontal slices through the filters, and mean output as a function of input (black line; shaded area shows one standard deviation of response). b. For comparison, we performed the same analysis on two model neurons. Note that the spatial scales of model and data filters are different.

... in the number of On-center neurons (bottom left panel). In this case, increasing the number of neurons restored the balance of On- and Off-center filters (not shown). In the case of vanishing input and output noise, we obtain localized oriented filters (top left panel), and the nonlinearities are smoothly accelerating functions that map inputs to an exponential output distribution (not shown). These results are consistent with previous theoretical work showing that the optimal nonlinearity in the low-noise regime maximizes the entropy of the output subject to response constraints [11, 7, 17].

How important is the choice of linear filters for efficient information transmission? We compared the performance of different filter sets across a range of firing rates (Fig. 5). For each simulation, we ...

Figure 4: Each panel shows a subset of filters (20 of 100) obtained under different levels of input noise (σ_{n_x} = 0.10 (20dB), 0.18 (15dB), 0.40 (8dB)) and output noise (σ_{n_r} = 0.10 (20dB), 2 (−6dB)), as well as the nonlinearity for a typical neuron in each model.

Figure 5: Information transmitted as a function of spike rate, under noisy conditions (8dB SNR_in, −6dB SNR_out). We compare the performance of optimal filters (W_1) to filters obtained under low-noise conditions (W_2: 20dB SNR_in, 20dB SNR_out) and PCA filters, i.e. the first 100 eigenvectors of the data covariance matrix (W_3).

... varied the number of neurons from 1 (very precise) neuron to 150 (fairly noisy) neurons (Fig. 6a). We estimated the transmitted information as described above. In this regime of noise and spiking budget, the optimal population size was around 100 neurons. Next, we repeated the analysis but used neurons with fixed precision, i.e., the spike budget was scaled with the population to give 1 noisy neuron or 150 equally noisy neurons (Fig. 6b). As the population grows, more information is transmitted, but the rate of increase slows. This suggests that incorporating an additional penalty, such as a fixed metabolic cost per neuron, would allow us to predict the optimal number of canonical noisy neurons.

Figure 6: Transmitted information (solid line) and total spike rate (dashed line) as a function of the number of neurons, assuming (a) fixed total spike budget and (b) fixed spike budget per neuron.

Discussion

We have described an efficient coding model that incorporates ingredients essential for computation in sensory systems: non-Gaussian signal distributions, realistic levels of input and output noise, metabolic costs, nonlinear responses, and a large population of neurons. The resulting optimal solu... from an adaptive second-order expansion of the prior density around the maximum of the posterior. This requires the estimation of the local density (or rather, its curvature) from samples, which is ...