
Employing hidden Markov models of neural spike-trains toward the improved estimation of linear receptive fields and the decoding of multiple firing regimes
Sean Escola, Center for Theoretical Neuroscience

LNP Model Overview
Linear-nonlinear-Poisson: stimuli are passed through a linear filter (the neuron's receptive field), and the result is passed through a nonlinearity to determine the instantaneous firing rate. The rate defines an inhomogeneous Poisson process:

$$\lambda(t) = f\big(\mathbf{k} \cdot \mathbf{s}(t)\big), \qquad p(\text{spike in } [t, t+dt]) = \lambda(t)\,dt, \qquad p(\text{no spike}) = e^{-\lambda(t)\,dt}$$
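As a concrete sketch of this generative model (not code from the talk; the exponential nonlinearity, 1 ms bins, white-noise stimulus, and per-bin Bernoulli spiking below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_lnp(stimulus, k, f=np.exp, dt=0.001):
    """Simulate an LNP neuron: rate = f(k . s(t)), spikes ~ Bernoulli(rate * dt)."""
    rate = f(stimulus @ k)                      # instantaneous firing rate lambda(t)
    spikes = rng.random(len(rate)) < rate * dt  # Bernoulli approximation per bin
    return rate, spikes.astype(int)

# Toy example: white-noise stimulus and an arbitrary filter
stim = rng.standard_normal((100_000, 20))
k_true = 0.3 * rng.standard_normal(20)
rate, spikes = simulate_lnp(stim, k_true)
print(spikes.sum(), "spikes in", len(stim) * 0.001, "seconds")
```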

LNP Model Learning
Model parameters are learned by maximizing the log-likelihood:

$$L(\theta) \propto \sum_{i \in \text{spikes}} \log f\big(\mathbf{k} \cdot \mathbf{s}(t_i)\big) - \int f\big(\mathbf{k} \cdot \mathbf{s}(t)\big)\,dt$$

If f is convex and log-concave, the log-likelihood has a unique maximum that is easily found by gradient ascent, so it doesn't matter where the ascent is initialized.
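A hedged sketch of this fit, reusing `stim` and `spikes` from the simulation above and assuming f = exp so that the log-likelihood is concave:

```python
import numpy as np
from scipy.optimize import minimize

def lnp_negloglik(k, stim, spikes, dt=0.001):
    """Negative LNP log-likelihood and gradient for f = exp:
    L(k) = sum_spikes k.s(t_i) - dt * sum_t exp(k.s(t))."""
    u = stim @ k
    rate = np.exp(u)
    loglik = u[spikes == 1].sum() - dt * rate.sum()
    grad = stim[spikes == 1].sum(axis=0) - dt * (rate[:, None] * stim).sum(axis=0)
    return -loglik, -grad

# Concavity (exp is convex and log-concave) means any starting point works.
res = minimize(lnp_negloglik, x0=np.zeros(stim.shape[1]),
               args=(stim, spikes), jac=True, method="L-BFGS-B")
k_hat = res.x
```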

HMM Overview
A Bayesian network model with 2 events at every time-step (discrete model): transition to the next hidden state (hidden) and emission of an observable (known). Emission probability distributions vary from state to state (in general), so the underlying sequence of hidden states can be inferred from the sequence of observables. The model uses the Markov assumption: the future is independent of the past given the present.

Graphical model and Markov assumption (figure omitted; images courtesy of T. Jebara, Dept. of CS, Columbia U.)
The complete data are $U = \{y_{0:T}, q_{0:T}\}$, and the complete probability distribution factorizes as

$$p(y_{0:T}, q_{0:T}) = \pi_{q_0} \prod_{i=1}^{T} \alpha_{q_{i-1} q_i}(t_i) \prod_{i=0}^{T} \eta_{q_i y_i}(t_i)$$

where $\pi$ is the initial-state distribution, $\alpha$ the transition matrix, and $\eta$ the emission matrix.
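For concreteness, the factorized complete log-probability can be evaluated directly; the sketch below assumes a generic discrete HMM with time-invariant α and η (the model described next makes them stimulus-dependent):

```python
import numpy as np

def complete_log_joint(pi, alpha, eta, q, y):
    """log p(y_{0:T}, q_{0:T}) = log pi_{q0} + sum_i log alpha_{q_{i-1}, q_i}
       + sum_i log eta_{q_i, y_i} for fixed (time-invariant) alpha and eta."""
    logp = np.log(pi[q[0]]) + np.log(eta[q[0], y[0]])
    for i in range(1, len(q)):
        logp += np.log(alpha[q[i - 1], q[i]]) + np.log(eta[q[i], y[i]])
    return logp

# Toy 2-state example: observation y = 0 or 1 in each bin
pi = np.array([0.6, 0.4])
alpha = np.array([[0.9, 0.1], [0.2, 0.8]])    # transition probabilities
eta = np.array([[0.98, 0.02], [0.90, 0.10]])  # emission probabilities
print(complete_log_joint(pi, alpha, eta, q=[0, 0, 1, 1], y=[0, 1, 0, 1]))
```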

Improving RF estimation
Original question: can we improve receptive field estimation by categorizing spikes as either informative or not informative about the stimulus, and then using only the relevant spikes to calculate the RF?
HMM version: can we learn when the neuron is in a stimulus-attentive state and when it is in some other state?

A 2-state model
State 1: attending to the stimulus. State 2: attending to other activity. Biological reasonability: UP/DOWN states; tonic/burst firing modes of LGN neurons.
[State diagram: states 1 and 2 connected by time-varying transition probabilities α11(t), α12(t), α21(t), α22(t).]

Transition matrix: defined by rates, as in the LNP model,

$$\alpha(t) = \begin{pmatrix} e^{-g(\mathbf{k}_{12} \cdot \mathbf{s}(t))\,dt} & g(\mathbf{k}_{12} \cdot \mathbf{s}(t))\,dt \\ g(\mathbf{k}_{21} \cdot \mathbf{s}(t))\,dt & e^{-g(\mathbf{k}_{21} \cdot \mathbf{s}(t))\,dt} \end{pmatrix}$$

Emission matrix: an LNP model for state 1 (stimulus dependent) and a homogeneous Poisson process for state 2, with columns corresponding to (spike, no spike) within $[t, t+dt]$,

$$\eta(t) = \begin{pmatrix} f(\mathbf{k}_{s} \cdot \mathbf{s}(t))\,dt & e^{-f(\mathbf{k}_{s} \cdot \mathbf{s}(t))\,dt} \\ \lambda_0\,dt & e^{-\lambda_0\,dt} \end{pmatrix}$$

Model parameters: $\mathbf{k}_{12}$, the linear filter for transitioning while in state 1; $\mathbf{k}_{21}$, the linear filter for transitioning while in state 2; and $\mathbf{k}_{s}$, the neuron's receptive field.
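A small sketch of how these per-time-step matrices might be assembled, assuming exponential g and f as elsewhere in the talk; the column order (spike, no spike) follows the emission matrix above:

```python
import numpy as np

def transition_matrix(s_t, k12, k21, g=np.exp, dt=0.001):
    """2x2 transition matrix alpha(t); rows = current state, columns = next state."""
    r12, r21 = g(k12 @ s_t), g(k21 @ s_t)
    return np.array([[np.exp(-r12 * dt), r12 * dt],
                     [r21 * dt,          np.exp(-r21 * dt)]])

def emission_matrix(s_t, k_s, lam0, f=np.exp, dt=0.001):
    """2x2 emission matrix eta(t); rows = state, columns = (spike, no spike)."""
    r1 = f(k_s @ s_t)
    return np.array([[r1 * dt,   np.exp(-r1 * dt)],
                     [lam0 * dt, np.exp(-lam0 * dt)]])
```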

HMM Max Likelihood
Since we don't know the hidden variables, we can't maximize the complete log-likelihood directly. The incomplete log-likelihood sums over all hidden state sequences, which is exponential in T:

$$L(\theta) \propto \log \sum_{q_{0:T}} \pi_{q_0} \prod_{i=1}^{T} \alpha_{q_{i-1} q_i}(t_i) \prod_{i=0}^{T} \eta_{q_i y_i}(t_i)$$

Instead, use Expectation-Maximization. E-step: guess the $q_i$'s at the current parameter settings (Baum-Welch). M-step: maximize the complete log-likelihood using the guessed $q_i$'s. EM is guaranteed to monotonically increase the incomplete log-likelihood, and the M-step is concave if g and f are convex and log-concave.
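The E-step can be computed with the standard forward-backward (Baum-Welch) recursions; the sketch below uses scaled messages for numerical stability and takes per-time-step transition and emission matrices as inputs (the M-step is omitted):

```python
import numpy as np

def forward_backward(pi, alphas, etas, y):
    """E-step (Baum-Welch) with scaled messages.
    pi: (N,) initial distribution; alphas: (T, N, N) per-step transition
    matrices; etas: (T, N, M) per-step emission matrices; y: length-T symbols."""
    T, N = len(y), len(pi)
    a = np.zeros((T, N)); b = np.zeros((T, N)); c = np.zeros(T)
    a[0] = pi * etas[0][:, y[0]]
    c[0] = a[0].sum(); a[0] /= c[0]
    for t in range(1, T):                       # forward pass
        a[t] = (a[t - 1] @ alphas[t]) * etas[t][:, y[t]]
        c[t] = a[t].sum(); a[t] /= c[t]
    b[-1] = 1.0
    for t in range(T - 2, -1, -1):              # backward pass
        b[t] = alphas[t + 1] @ (etas[t + 1][:, y[t + 1]] * b[t + 1]) / c[t + 1]
    gamma = a * b                               # posterior p(q_t | y_{0:T})
    loglik = np.log(c).sum()                    # incomplete log-likelihood
    return gamma, loglik
```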

Algorithmic considerations
Convergence can be slow: EM can have only linear convergence near the maximum-likelihood solution, where quadratic convergence is usually achievable. It is possible to perform gradient ascent directly on the log-likelihood; the exact gradient can be calculated, and this provides quadratic convergence. The best approach is to switch between EM and gradient ascent depending on the local likelihood landscape.
Using a continuous-time formulation can also help. In discrete time the probabilities for all time-steps must be calculated, and you can easily have data with more than 1e7 time-steps, almost all of which have no associated spikes. In continuous time the probabilities are integrated from spike-time to spike-time, which involves much less computation and memory.

Computations can be numerically unstable
The Bernoulli approximation to the Poisson distribution may fail when transition and firing rates get too high, since it assumes $\lambda(t)\,dt + e^{-\lambda(t)\,dt} \approx 1$. It's okay to use the true Poisson distribution for firing (i.e., you can have more than 1 spike in a time-step):

$$p\big(y_i \text{ spikes at } t_i\big) = \frac{\big(\lambda(t_i)\,dt\big)^{y_i}\, e^{-\lambda(t_i)\,dt}}{y_i!}$$

The transition probabilities must be changed, since it makes no sense to have more than 1 transition in a single time-step:

$$\alpha(t) = \begin{pmatrix} \dfrac{1}{1 + g(\mathbf{k}_{12} \cdot \mathbf{s}(t))\,dt} & \dfrac{g(\mathbf{k}_{12} \cdot \mathbf{s}(t))\,dt}{1 + g(\mathbf{k}_{12} \cdot \mathbf{s}(t))\,dt} \\[2ex] \dfrac{g(\mathbf{k}_{21} \cdot \mathbf{s}(t))\,dt}{1 + g(\mathbf{k}_{21} \cdot \mathbf{s}(t))\,dt} & \dfrac{1}{1 + g(\mathbf{k}_{21} \cdot \mathbf{s}(t))\,dt} \end{pmatrix}$$

To guarantee concavity of the M-step, g must grow exponentially. A continuous-time formulation also solves this problem since it guarantees that the Bernoulli approximation is correct; as dt -> 0, the new discrete formulation and the continuous formulation are equivalent.
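Translating the stabilized quantities above into code, a sketch (assuming exponential g) might be:

```python
import numpy as np
from scipy.stats import poisson

def stable_transition_matrix(s_t, k12, k21, g=np.exp, dt=0.001):
    """Normalized transition probabilities: rows sum to 1 and stay in [0, 1]
    even when g(k . s(t)) * dt is not small (at most one transition per bin)."""
    r12, r21 = g(k12 @ s_t) * dt, g(k21 @ s_t) * dt
    return np.array([[1.0 / (1.0 + r12), r12 / (1.0 + r12)],
                     [r21 / (1.0 + r21), 1.0 / (1.0 + r21)]])

def poisson_emission(rate, y, dt=0.001):
    """True Poisson probability of y spikes in a bin (allows y > 1)."""
    return poisson.pmf(y, rate * dt)
```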

Preliminary Results
I simulated 100 seconds of data with a 1-d, noisy sine-wave stimulus. k12 prefers positive stimulus values, k21 prefers negative stimulus values, and k_s is the receptive field. Firing rate in state 1: ~20 Hz. Firing rate in state 2: 10 Hz. All nonlinearities were the exponential e^u.

The data were partitioned into ten 10-second segments. 10 HMMs and 10 standard LNP models were trained, 1 on each segment, and the remaining 9 segments were used to test each model. The log-likelihoods shown are the total test log-likelihood differences from the log-likelihood achieved by the HMM trained on that segment. All the HMMs outperformed all the LNP models on all segments.
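A sketch of that train/test protocol; `fit_hmm`, `fit_lnp`, and `test_loglik` are hypothetical placeholders for the model-fitting and scoring routines, not functions from the talk:

```python
import numpy as np

def compare_models(segments, fit_hmm, fit_lnp, test_loglik):
    """Train each model on one segment and score it on the other nine,
    reporting test log-likelihood differences relative to the HMM."""
    diffs = []
    for i, train in enumerate(segments):
        hmm, lnp = fit_hmm(train), fit_lnp(train)
        test = [seg for j, seg in enumerate(segments) if j != i]
        hmm_ll = sum(test_loglik(hmm, seg) for seg in test)
        lnp_ll = sum(test_loglik(lnp, seg) for seg in test)
        diffs.append(lnp_ll - hmm_ll)  # negative values favor the HMM
    return np.array(diffs)
```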

After training, the inferred hidden state values show that the model did learn to distinguish state 1 from state 2. There are 4 important combinations to predict. Trivially (since the average firing rate is higher in state 1, and state 2 is more common): being in state 1 and spiking (i.e., 0.91), and being in state 2 and not spiking (i.e., 0.97). Non-trivially: being in state 2 and spiking (i.e., 0.98), and being in state 1 and not spiking (shown elsewhere).