Correlations in Populations: Information-Theoretic Limits

Correlations in Populations: Information-Theoretic Limits
Don H. Johnson and Ilan N. Goodman
dhj@rice.edu
Department of Electrical & Computer Engineering, Rice University, Houston, Texas

Population coding
Describe a population as parallel point-process channels.
Variations: separate inputs, a common input, dependence among channels.
What do information-theoretic considerations suggest is best?
[Figure: parallel channel diagrams with inputs X (or X_1, X_2, ..., X_N) and outputs Y_1, Y_2, ..., Y_N]

Modeling approach
We would like to use point-process models for the outputs, but it is technically very difficult to describe connection-induced dependencies that way.
Instead, use simpler Bernoulli models, which are capable of describing complex correlation structures.
Assume homogeneous populations, with $p = P(X_j = 1)$ the same for every neuron:
$$P(X_1, X_2, \ldots, X_N) = P(X_1) P(X_2) \cdots P(X_N) \left[ 1 + \rho^{(2)} \sum_{i>j} \frac{(X_i - p)(X_j - p)}{p(1-p)} + \rho^{(3)} \sum_{i>j>k} \frac{(X_i - p)(X_j - p)(X_k - p)}{[p(1-p)]^{3/2}} + \cdots \right]$$
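
To make the Bernoulli model concrete, here is a minimal sketch (not from the talk) that builds the joint pmf of N homogeneous Bernoulli neurons from the pairwise term of the expansion above; the function name sarmanov_pmf and the parameter values are illustrative assumptions.

```python
# Minimal sketch: pairwise-only Sarmanov-Lancaster expansion for N homogeneous
# Bernoulli(p) neurons. Names and parameter values are illustrative, not the talk's.
import itertools
import numpy as np

def sarmanov_pmf(N, p, rho):
    """Joint pmf over {0,1}^N: product of Bernoulli(p) marginals times
    [1 + rho * sum_{i>j} (x_i - p)(x_j - p) / (p(1-p))]."""
    pmf = {}
    for x in itertools.product((0, 1), repeat=N):
        xa = np.array(x)
        indep = np.prod(p**xa * (1 - p)**(1 - xa))
        z = (xa - p) / np.sqrt(p * (1 - p))          # standardized spike indicators
        pair_sum = (z.sum()**2 - (z**2).sum()) / 2   # equals sum over i>j of z_i z_j
        pmf[x] = indep * (1 + rho * pair_sum)
    return pmf

pmf = sarmanov_pmf(N=3, p=0.1, rho=0.2)
print(sum(pmf.values()))                 # 1.0: the correction terms sum to zero
print(min(pmf.values()) >= 0)            # True only for a restricted range of rho
```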

A note on modeling
Correlation (orthogonal) model:
$$P(X_1, X_2, \ldots, X_N) = P(X_1) P(X_2) \cdots P(X_N) \left[ 1 + \rho^{(2)} \sum_{i>j} \frac{(X_i - p_i)(X_j - p_j)}{\sqrt{p_i(1-p_i)\,p_j(1-p_j)}} + \rho^{(3)} \sum_{i>j>k} \frac{(X_i - p_i)(X_j - p_j)(X_k - p_k)}{\sqrt{p_i(1-p_i)\,p_j(1-p_j)\,p_k(1-p_k)}} + \cdots \right]$$
Exponential model:
$$P(X_1, X_2, \ldots, X_N) \propto \exp\left( \sum_i \theta_i X_i + \sum_{i,j} \theta_{ij} X_i X_j + \sum_{i,j,k} \theta_{ijk} X_i X_j X_k + \cdots \right)$$
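
For comparison, a minimal sketch of the exponential parameterization truncated at pairwise terms, normalized by brute-force enumeration; the parameter arrays theta1 and theta2 are illustrative assumptions, not values from the talk.

```python
# Sketch: pairwise exponential model P(x) proportional to
# exp(sum_i theta1[i] x_i + sum_{i<j} theta2[i,j] x_i x_j). Parameters are illustrative.
import itertools
import numpy as np

def exponential_pmf(theta1, theta2):
    N = len(theta1)
    states = list(itertools.product((0, 1), repeat=N))
    weights = []
    for s in states:
        x = np.array(s, dtype=float)
        weights.append(np.exp(theta1 @ x + x @ np.triu(theta2, k=1) @ x))
    Z = sum(weights)                                 # partition function
    return {s: w / Z for s, w in zip(states, weights)}

theta1 = np.array([-2.0, -2.0, -2.0])                # biases give low firing probabilities
theta2 = np.full((3, 3), 0.5)                        # positive pairwise coupling
pmf = exponential_pmf(theta1, theta2)
print(sum(pmf.values()))                             # 1.0 by construction
```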

Fisher information analysis
How should the stimulus be encoded in spike rate to achieve constant Fisher information?
Input structure not important.
[Figure: encoding functions of the stimulus α for a two-neuron Bernoulli model (p, ρ), a three-neuron Bernoulli model (p, ρ, ρ3), and a Poisson counting model (total rate, common rate)]
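
As a rough illustration of the Fisher information calculation (not the talk's models or parameter values), the sketch below computes F(α) numerically for two correlated Bernoulli neurons, assuming the stimulus α directly sets the spike probability p = α and a fixed pairwise correlation ρ.

```python
# Sketch: numerical Fisher information of a two-neuron correlated Bernoulli model
# with respect to a stimulus parameter alpha. The encoding p(alpha) = alpha and
# rho = 0.1 are illustrative assumptions, not the talk's choices.
import itertools

def two_neuron_pmf(p, rho):
    """Pairwise Sarmanov-Lancaster pmf for two Bernoulli(p) neurons."""
    pmf = {}
    for x in itertools.product((0, 1), repeat=2):
        indep = (p if x[0] else 1 - p) * (p if x[1] else 1 - p)
        corr = rho * (x[0] - p) * (x[1] - p) / (p * (1 - p))
        pmf[x] = indep * (1 + corr)
    return pmf

def fisher_information(alpha, rho, d=1e-5):
    """F(alpha) = sum_x (dp(x;alpha)/dalpha)^2 / p(x;alpha), via central differences."""
    p0 = two_neuron_pmf(alpha, rho)
    pp, pm = two_neuron_pmf(alpha + d, rho), two_neuron_pmf(alpha - d, rho)
    return sum(((pp[x] - pm[x]) / (2 * d))**2 / p0[x] for x in p0)

print(fisher_information(alpha=0.3, rho=0.1))
```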

Kullback-Leibler distance and data analysis
$$D_X(\alpha_1 \| \alpha_0) = \sum_x p(x;\alpha_1) \log \frac{p(x;\alpha_1)}{p(x;\alpha_0)}$$
$\alpha_0$, $\alpha_1$: two different stimulus conditions; $p(x;\alpha)$: response probabilities.
The K-L distance is the exponential rate of a Neyman-Pearson classifier's false-alarm probability: $P_F \sim 2^{-N D_X(\alpha_1 \| \alpha_0)}$ for fixed $P_M$.
The distance resulting from small stimulus perturbations is proportional to the Fisher information: $D_X(\alpha_0 + \delta\alpha \,\|\, \alpha_0) \propto F(\alpha_0)\,(\delta\alpha)^2$.
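
A small numerical sketch of the two facts above, assuming a toy population of N independent Bernoulli neurons whose spike probability equals the stimulus α (an illustrative encoding, not the talk's data): the K-L distance in bits, and its quadratic growth under small stimulus perturbations.

```python
# Sketch: K-L distance (bits) between responses of N independent Bernoulli(alpha)
# neurons under two stimulus conditions, plus a check that the distance grows
# quadratically in a small perturbation (i.e., proportional to Fisher information).
import numpy as np

def kl_population(a1, a0, N):
    per_neuron = a1 * np.log2(a1 / a0) + (1 - a1) * np.log2((1 - a1) / (1 - a0))
    return N * per_neuron

alpha0, N = 0.3, 10
print(kl_population(0.4, alpha0, N))                       # distance for a sizable change
for d in (0.04, 0.02, 0.01):
    print(d, kl_population(alpha0 + d, alpha0, N) / d**2)  # ratio approaches a constant
```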

Data Processing Theorem Redux
$$\gamma_{X,Y}(\alpha_0, \alpha_1) = \frac{D_Y(\alpha_1 \| \alpha_0)}{D_X(\alpha_1 \| \alpha_0)}$$
Systems cannot create information: $0 \le \gamma_{X,Y}(\alpha_0, \alpha_1) \le 1$.
Basis for a system theory for information processing and for determining which structures are inherently more effective.
[Figure: system block with input X and output Y]
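
A toy illustration of the information transfer ratio, assuming the "system" is a binary symmetric channel with crossover probability eps acting on a single Bernoulli input; this stand-in channel is an assumption for illustration, not the populations studied in the talk.

```python
# Sketch: information transfer ratio gamma = D_Y / D_X when X ~ Bernoulli(alpha)
# passes through a binary symmetric channel with crossover eps (illustrative system).
import numpy as np

def kl_bernoulli(p1, p0):
    """K-L distance (bits) between Bernoulli(p1) and Bernoulli(p0)."""
    return p1 * np.log2(p1 / p0) + (1 - p1) * np.log2((1 - p1) / (1 - p0))

def bsc_output(p, eps):
    """P(Y = 1) when X ~ Bernoulli(p) goes through a BSC with crossover eps."""
    return p * (1 - eps) + (1 - p) * eps

a0, a1, eps = 0.3, 0.4, 0.1            # the stimulus conditions set P(X = 1)
gamma = kl_bernoulli(bsc_output(a1, eps), bsc_output(a0, eps)) / kl_bernoulli(a1, a0)
print(gamma)                           # lies in [0, 1]: the system cannot create information
```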

Population encoding properties from a K-L distance perspective
Individual inputs don't necessarily achieve maximal information transfer: $\gamma_{X,Y}(N) \ge \max_i \gamma_{X_i, Y_i}$.
Explicitly indicating that the inputs encode a single quantity reveals that perfect fidelity is possible:
$$\gamma_{X,Y}(N) \ge 1 - \frac{k}{N} \quad\text{or}\quad \gamma_{X,Y}(N) \ge 1 - e^{-kN}$$
[Figure: a single input X driving outputs Y_1, Y_2, ..., Y_N; plot of $\gamma_{X,Y}(N)$ rising from 0 toward 1 as N increases]

Another viewpoint: Channel Capacity
Capacity for the stationary point-process channel is known. If $0 \le \lambda(t) \le \lambda_{\max}$ is the power constraint,
$$C \;(\text{bits/s}) = \frac{\lambda_{\max}}{e \ln 2} = \frac{\lambda_{\max}}{1.88417}$$
If we additionally constrain the average rate $\bar{\lambda}$,
$$C \;(\text{bits/s}) = \begin{cases} \dfrac{\lambda_{\max}}{e \ln 2}, & \bar{\lambda} > \lambda_0 \\[1ex] \dfrac{\bar{\lambda}}{\ln 2} \ln \dfrac{\lambda_{\max}}{\bar{\lambda}}, & \bar{\lambda} < \lambda_0 \end{cases} \qquad \lambda_0 = \frac{\lambda_{\max}}{e}$$
Capacity is achieved by a Poisson process driven by a random telegraph wave.
[Figure: random telegraph wave taking values 0 and $\lambda_{\max}$ as a function of t]
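
A small helper, written to match the capacity formulas above, computing the single-channel capacity in bits/s for a given peak rate and optional average-rate constraint; the numerical rates in the example are arbitrary.

```python
# Sketch: single Poisson-channel capacity (bits/s) under a peak-rate constraint
# lam_max and, optionally, an average-rate constraint lam_bar. Example rates are arbitrary.
import numpy as np

def poisson_capacity_bits(lam_max, lam_bar=None):
    lam_0 = lam_max / np.e                       # threshold average rate lambda_0
    if lam_bar is None or lam_bar > lam_0:
        return lam_max / (np.e * np.log(2))      # = lam_max / 1.88417...
    return lam_bar * np.log2(lam_max / lam_bar)  # = (lam_bar / ln 2) * ln(lam_max / lam_bar)

print(poisson_capacity_bits(100.0))              # peak constraint only: about 53.1 bits/s
print(poisson_capacity_bits(100.0, 10.0))        # average rate below lam_max/e: about 33.2 bits/s
```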

Channel capacity of populations
Use a Bernoulli model and investigate the small-probability limit to determine the capacity of parallel Poisson channels.
The two input structures have the same capacity:
$$C(N) = N\,C(1) = N\,\frac{\lambda_{\max}}{e \ln 2}$$
[Figure: N parallel channels drawn once with separate inputs $X_1, \ldots, X_N$ and once with a single common input X]

Imposing connection dependence changes the story
Using Bernoulli models, connection dependence can be added (caveat: we are modeling Poisson processes).
Interesting restrictions arise:
Capacity depends only on pairwise correlations (dependencies).
Only positive pairwise correlations are possible.
The range of correlation values is restricted:
For homogeneous populations: $0 \le \rho \le \dfrac{1}{N-1}$
For inhomogeneous populations: $0 \le \rho \le \rho_{\max}$
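
A numerical sanity check of the homogeneous-population range, using the pairwise Bernoulli expansion from the modeling slide with a very small spiking probability (approximating the Poisson limit); the specific N and p are illustrative choices.

```python
# Sketch: check that the pairwise Bernoulli expansion stays a valid pmf only for
# 0 <= rho <= 1/(N-1) in the small-probability limit. N and p are illustrative.
import itertools
import numpy as np

def min_probability(N, p, rho):
    """Smallest value assigned by the pairwise expansion; negative means rho is infeasible."""
    worst = np.inf
    for x in itertools.product((0, 1), repeat=N):
        xa = np.array(x)
        z = (xa - p) / np.sqrt(p * (1 - p))
        pair_sum = (z.sum()**2 - (z**2).sum()) / 2
        worst = min(worst, np.prod(p**xa * (1 - p)**(1 - xa)) * (1 + rho * pair_sum))
    return worst

N, p = 4, 1e-3
print(min_probability(N, p, 1 / (N - 1) - 0.01) >= 0)   # True: within the allowed range
print(min_probability(N, p, 1 / (N - 1) + 0.01) >= 0)   # False: correlation too large
```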

Capacity results
Capacity achieved with a homogeneous population.
Correlation affects the two input structures differently.
[Figure: capacity (bits/s, 0 to 3.5) versus correlation (0 to 1) for M = 2 and M = 3, comparing separate inputs with a single, common input]
Qualitatively similar to Gaussian channel results.

However...
As population size increases, introducing connection-induced dependence reduces capacity.
Capacity unaffected by input- or connection-induced dependence.
Fits with previous results derived using Fisher information.

Poisson vs. Non-Poisson Models
These results were derived using a Poisson assumption. What about non-Poisson models?
It is probably impossible to extend the Bernoulli approach to interesting non-Poisson cases, but Kabanov showed that the single-channel Poisson capacity bounds the capacity of all other point-process models.
Does this bound apply to multi-channel processes as well?

Connection-induced dependence
The Bernoulli model is vague about how correlations are induced.
If internal feedback is used, feedback can increase capacity: M. Lexa has shown that internal feedback can increase the performance of distributed classifiers.
[Figure: a single input X feeding outputs $Y_1$, $Y_2$, $Y_3$]

Conclusions
From two theoretical viewpoints, connection-induced dependence is not required to increase capacity.
Specific forms of dependence may increase a population's processing power.
The capacity afforded by non-Poisson models is probably bounded by the Poisson result, but this has not been shown in detail.