Exercise Sheet 4: Covariance and Correlation, Bayes theorem, and Linear discriminant analysis

Younesse Kaddar

1. Covariance and Correlation

Assume that we have recorded two neurons in the two-alternative forced-choice task discussed in class. We denote the firing rate of neuron 1 in trial $i$ as $r_{1,i}$ and the firing rate of neuron 2 as $r_{2,i}$. We furthermore denote the averaged firing rate of neuron 1 as $\bar{r}_1$ and of neuron 2 as $\bar{r}_2$. Let us "center" the data by defining two data vectors:

$$x = (r_{1,1} - \bar{r}_1, \ldots, r_{1,N} - \bar{r}_1), \qquad y = (r_{2,1} - \bar{r}_2, \ldots, r_{2,N} - \bar{r}_2)$$

(a) Show that the variance of the firing rates of the first neuron is: $\mathrm{Var}(r_1) = \dfrac{x \cdot x}{N-1}$

The actual variance of $r_1$ is not known. To estimate it, we compute the average of the squared deviations over a sample $(r_{1,1}, \ldots, r_{1,N})$ of the population, which gives the biased sample variance:

$$\mathrm{Var}_{\text{biased}}(r_1) = \frac{1}{N} \sum_{i=1}^{N} (r_{1,i} - \bar{r}_1)^2$$

To what extent is this estimate biased? Denoting by $\mu_1$ (resp. $\sigma_1^2$) the actual mean (resp. variance) of $r_1$:

$$
\mathbb{E}\big[\mathrm{Var}_{\text{biased}}(r_1)\big]
= \mathbb{E}\left[\frac{1}{N}\sum_{i=1}^N \Big(r_{1,i} - \frac{1}{N}\sum_{j=1}^N r_{1,j}\Big)^2\right]
= \frac{1}{N}\sum_{i=1}^N \left( \mathbb{E}\big[r_{1,i}^2\big] - \frac{2}{N}\sum_{j=1}^N \mathbb{E}\big[r_{1,i}\, r_{1,j}\big] + \frac{1}{N^2}\sum_{j,k=1}^N \mathbb{E}\big[r_{1,j}\, r_{1,k}\big] \right)
$$

Since the trials are independent, $\mathbb{E}[r_{1,i}\, r_{1,j}] = \mu_1^2$ for $i \neq j$, while $\mathbb{E}[r_{1,i}^2] = \sigma_1^2 + \mu_1^2$. Therefore:

$$
\mathbb{E}\big[\mathrm{Var}_{\text{biased}}(r_1)\big]
= (\sigma_1^2 + \mu_1^2) - \frac{2}{N}\big(\sigma_1^2 + N\mu_1^2\big) + \frac{1}{N^2}\big(N\sigma_1^2 + N^2\mu_1^2\big)
= \frac{N-1}{N}\,\sigma_1^2
= \frac{N-1}{N}\,\mathrm{Var}(r_1)
$$

Thus we get the following unbiased sample variance:

$$
\mathrm{Var}(r_1) = \frac{N}{N-1}\,\mathrm{Var}_{\text{biased}}(r_1) = \frac{1}{N-1}\sum_{i=1}^N (r_{1,i} - \bar{r}_1)^2 = \frac{x \cdot x}{N-1}
\qquad \text{since } x \cdot x = \sum_{i=1}^N (r_{1,i} - \bar{r}_1)^2
$$

And we get the expected result:

$$\mathrm{Var}(r_1) = \frac{x \cdot x}{N-1}$$
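As a quick numerical sanity check of the bias factor $(N-1)/N$, here is a minimal sketch (the firing-rate statistics below are made-up values for illustration, not data from the task):

```python
import numpy as np

rng = np.random.default_rng(0)
mu1, sigma1, N = 50.0, 10.0, 5        # hypothetical true mean/std of neuron 1, small sample size
n_experiments = 100_000               # number of simulated N-trial experiments

# Draw many N-trial samples and average the two variance estimators over experiments.
samples = rng.normal(mu1, sigma1, size=(n_experiments, N))
biased = samples.var(axis=1, ddof=0).mean()     # (1/N) * sum of squared deviations
unbiased = samples.var(axis=1, ddof=1).mean()   # (1/(N-1)) * sum of squared deviations

print("true variance    :", sigma1**2)   # 100
print("mean biased var  :", biased)      # close to (N-1)/N * sigma1**2 = 80
print("mean unbiased var:", unbiased)    # close to 100
```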

(b) Compute the cosine of the angle between $x$ and $y$. What do you get?

$$
\cos(x, y) = \frac{x \cdot y}{\|x\|\,\|y\|}
= \frac{\sum_{i=1}^N (r_{1,i} - \bar{r}_1)(r_{2,i} - \bar{r}_2)}{\sqrt{(N-1)\,\mathrm{Var}(r_1)}\,\sqrt{(N-1)\,\mathrm{Var}(r_2)}}
= \frac{\frac{1}{N-1}\sum_{i=1}^N (r_{1,i} - \bar{r}_1)(r_{2,i} - \bar{r}_2)}{\sqrt{\mathrm{Var}(r_1)\,\mathrm{Var}(r_2)}}
$$

Analogously to what we did in the previous question, we can show that the unbiased sample covariance is:

$$\mathrm{Cov}(r_1, r_2) = \frac{1}{N-1}\sum_{i=1}^N (r_{1,i} - \bar{r}_1)(r_{2,i} - \bar{r}_2)$$

Therefore:

$$\cos(x, y) = \frac{\mathrm{Cov}(r_1, r_2)}{\sqrt{\mathrm{Var}(r_1)\,\mathrm{Var}(r_2)}}$$

which is the correlation coefficient $\rho(r_1, r_2)$.

(c) What are the maximum and minimum values that the correlation coefficient between $r_1$ and $r_2$ can take? Why?

By the Cauchy-Schwarz inequality:

$$\big|\mathrm{Cov}(r_1, r_2)\big| \leq \sqrt{\mathrm{Var}(r_1)}\,\sqrt{\mathrm{Var}(r_2)}$$

i.e.

$$-\sqrt{\mathrm{Var}(r_1)\,\mathrm{Var}(r_2)} \;\leq\; \mathrm{Cov}(r_1, r_2) \;\leq\; \sqrt{\mathrm{Var}(r_1)\,\mathrm{Var}(r_2)}$$

hence $-1 \leq \rho(r_1, r_2) \leq 1$: the maximum (resp. minimum) value of the correlation coefficient between $r_1$ and $r_2$ is $1$ (resp. $-1$).

By the equality case of Cauchy-Schwarz:

- a correlation coefficient of $1$ corresponds to colinear vectors: there exists $\alpha \geq 0$ such that for all $i$, $r_{1,i} - \bar{r}_1 = \alpha\,(r_{2,i} - \bar{r}_2)$
- a correlation coefficient of $0$ corresponds to uncorrelated (orthogonal) vectors
- a correlation coefficient of $-1$ corresponds to anti-colinear vectors: there exists $\alpha \leq 0$ such that for all $i$, $r_{1,i} - \bar{r}_1 = \alpha\,(r_{2,i} - \bar{r}_2)$

(d) What do you think the term "centered" refers to?

Centered means that the sample mean of the data is equal to $0$. We can easily check that it is the case here (let us do it for $x$; it is similar for $y$):

$$\frac{1}{N}\sum_{i=1}^N (r_{1,i} - \bar{r}_1) = \frac{1}{N}\sum_{i=1}^N r_{1,i} - \bar{r}_1 = 0$$
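To make the link between the geometry and the statistics concrete, here is a minimal sketch checking numerically that the cosine of the angle between the centered vectors equals the Pearson correlation coefficient (the two firing-rate vectors are simulated for illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200
r1 = rng.normal(50, 10, N)             # simulated firing rates of neuron 1 (Hz)
r2 = 0.5 * r1 + rng.normal(20, 5, N)   # neuron 2, partially correlated with neuron 1

x = r1 - r1.mean()                     # centered data vectors
y = r2 - r2.mean()

cos_xy = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
rho = np.corrcoef(r1, r2)[0, 1]        # Pearson correlation coefficient

print(cos_xy, rho)                     # the two values coincide
```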

2. Bayes theorem

Bayes' theorem summarizes all the knowledge we have about the stimulus from observing the responses of a set of neurons, independently of the specific decoding rule. To get a better intuition about this theorem, we will look at the motion discrimination task again and compute the probability $p(s \mid r)$ ($s \in \{\leftarrow, \rightarrow\}$) that the stimulus moved to the left or to the right. For a stimulus $s$ and a firing rate response $r$ of a single neuron, Bayes' theorem reads:

$$p(s \mid r) = \frac{p(r \mid s)\,p(s)}{p(r)}$$

Here, $p(r \mid s)$ is the probability that the firing rate is $r$ if the stimulus was $s$. The respective distribution can be measured, and we assume that it follows a Gaussian probability density with mean $\mu_s$ and standard deviation $\sigma$:

$$p(r \mid s) = \frac{1}{\sqrt{2\pi}\,\sigma}\,\exp\!\left(-\frac{(r - \mu_s)^2}{2\sigma^2}\right)$$

The relative frequency with which the stimuli (leftward or rightward motion, $\leftarrow$ or $\rightarrow$) appear is denoted by $p(s)$, often called the prior probability or, for short, the prior. The distribution $p(r)$ denotes the probability of observing a response $r$, independent of any knowledge about the stimulus.

(a) How can you calculate $p(r)$? What shape does it have?

As $\{\{\leftarrow\}, \{\rightarrow\}\}$ is a partition of the stimulus space:

$$p(r) = \sum_{s} p(r, s) = \sum_{s} p(r \mid s)\,p(s) = p(r \mid \leftarrow)\,p(\leftarrow) + p(r \mid \rightarrow)\,p(\rightarrow)$$

As a result:

$$p(r) = \frac{1}{\sqrt{2\pi}\,\sigma}\left(p(\leftarrow)\exp\!\left(-\frac{(r - \mu_\leftarrow)^2}{2\sigma^2}\right) + p(\rightarrow)\exp\!\left(-\frac{(r - \mu_\rightarrow)^2}{2\sigma^2}\right)\right)$$

so $p(r)$ is a mixture of two Gaussians. To use the same example as in the last exercise sheet: if the firing rates range from 0 to 100 Hz, and the means are set to $\mu_\rightarrow = 60$ Hz and $\mu_\leftarrow = 40$ Hz with the same standard deviation $\sigma$ for both stimuli, then we have:

[Figure 2.a.1 - Plot of the neuron's response probability $p(r \mid s)$ for left and right motion]

[Figure 2.a.2 - Plot of the previous neuron's response probability $p(r)$ (in green), for unequal priors]
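A minimal sketch of how this mixture can be evaluated numerically; the means match the example above, while $\sigma$ and the priors are assumed values chosen only for illustration:

```python
import numpy as np
from scipy.stats import norm

mu_right, mu_left = 60.0, 40.0        # Hz, as in the example above
sigma = 10.0                          # Hz, assumed value
p_left, p_right = 1/3, 2/3            # assumed priors

r = np.linspace(0, 100, 501)          # firing rates from 0 to 100 Hz

p_r_given_left = norm.pdf(r, mu_left, sigma)     # p(r | left)
p_r_given_right = norm.pdf(r, mu_right, sigma)   # p(r | right)
p_r = p_left * p_r_given_left + p_right * p_r_given_right   # marginal p(r): a two-Gaussian mixture
```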

(b) The distribution $p(s \mid r)$ is often called the posterior probability or, for short, the posterior. Calculate the posterior for $s = \leftarrow$ and sketch it as a function of $r$, assuming a prior $p(\leftarrow) = p(\rightarrow) = 1/2$. Draw the posterior into the same plot.

By Bayes' theorem:

$$
p(\leftarrow \mid r) = \frac{p(r \mid \leftarrow)\,p(\leftarrow)}{p(r)}
= \frac{p(r \mid \leftarrow)\,p(\leftarrow)}{p(r \mid \leftarrow)\,p(\leftarrow) + p(r \mid \rightarrow)\,p(\rightarrow)}
= \frac{1}{1 + \dfrac{p(r \mid \rightarrow)\,p(\rightarrow)}{p(r \mid \leftarrow)\,p(\leftarrow)}}
$$

With $p(\leftarrow) = p(\rightarrow) = 1/2$:

$$
p(\leftarrow \mid r)
= \frac{1}{1 + \exp\!\left(\dfrac{(r - \mu_\leftarrow)^2 - (r - \mu_\rightarrow)^2}{2\sigma^2}\right)}
= \frac{1}{1 + \exp\!\left(-\dfrac{\mu_\leftarrow - \mu_\rightarrow}{\sigma^2}\left(r - \dfrac{\mu_\leftarrow + \mu_\rightarrow}{2}\right)\right)}
$$

So $p(\leftarrow \mid r)$ is a sigmoid of $r$. As for $p(\rightarrow \mid r)$: since $p(\cdot \mid r)$ is a probability distribution,

$$p(\rightarrow \mid r) = 1 - p(\leftarrow \mid r)$$

[Figure 2.b.1 - Plot of the previous neuron's response probability $p(r)$ (in green), for $p(\leftarrow) = p(\rightarrow) = 1/2$]
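A minimal sketch of this sigmoid posterior for equal priors (same assumed numerical values as before):

```python
import numpy as np

mu_right, mu_left, sigma = 60.0, 40.0, 10.0   # Hz, assumed values

def posterior_left(r):
    """p(left | r) for equal priors, in the sigmoid form derived above."""
    z = (mu_left - mu_right) / sigma**2 * (r - (mu_left + mu_right) / 2)
    return 1.0 / (1.0 + np.exp(-z))

r = np.linspace(0, 100, 501)
p_left_given_r = posterior_left(r)        # decreasing sigmoid: this neuron fires more for rightward motion
p_right_given_r = 1.0 - p_left_given_r    # complementary posterior
```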

Figure.b. - Plot of the previous neuron's posteriors, if p( = p( = / p( ( c What happens if you change the prior? Investigate how the posterior changes if becomes much larger than and vice versa. Make a sketch similar to (b. p( p( If the prior changes, the ratio is no longer equal to and the posterior changes as well, insofar as the general expression of p( r is: p( p( r = p( (r + exp( μ μ ( μ μ p( σ = = p( r (r + exp( μ μ ( μ μ p( + ln( σ p( As a matter of fact: p( >> p( if (for a fixed r: p( << p( if (for a fixed r: p( (r μ p( r exp( μ ( μ μ p( σ p( p( r p( p( 0 + + p( 0 +

Figure.c - Plot of the previous neuron's posteriors, for various values of the prior (d Let us assume that we decide for leftward motion whenever μ μ. Interpret this decision rule in the plots above. How well does this rule do depending on the prior? What do you lose when you move from the full posterior to a simple decision (decoding rule? B: the decision rule mentioned in the problem statement suggests that μ μ, i.e. we are now considering a neuron whose tuning is such that is fires more strongly if the motion stimulus moves leftwards. As nothing was specified before, the example neuron we were focusing on so far was of opposite tuning. We will, from now on, switch to a neuron such that: μ > > μ r > ( + Figure.d. - Plot of the response probability of the neuron we are focusing on from now on, for left and right motion

Figure.d. - Plot of the new neuron posteriors (for various priors and r > ( + r > ( μ + μ decision rule for leftward motion The μ μ decision rule corresponds to inferring that the motion is to the left (resp. to the right in the red (resp. blue area of figure.d., which is perfectly suited to the case where (bold dash-dotted curves. p( = p( = / However, it does a poor job at predicting the motion direction in the other cases, since it does not take into account the changing prior. To infer a more sensible rule, let us reconsider the situation: we want to come up with a threshold rate such that an ideal observer - making the decision that the motion that caused the neuron to fire at r > r th was to the left (preferred direction of the neuron - would have a higher chance of being right than wrong. In other words, the decision rule we are looking for is: p( r > p( r r th As such, r th satisfies: p( r th = p( r th p( = / r th (r th μ exp( μ ( μ μ p( + ln( = σ p( σ = p( μ r th ln( + + μ μ μ p( A more sensible decoding rule would then be: B: we can check that, in case p( = p( σ r > p( ln( + μ μ p(, we do recover the previous rule. μ + μ However, moving from the full posterior to a simple decision rule makes us lose the information relating to how much we are sure about the decision p( r p( r p( r > p( r made. Indeed, the full content of the information is contained in the posterior values and. If, then there is a higher chance that the motion that caused the neuron to fire was to the left, but just stating that doesn t tell you how much higher. You need p( r and p( r p( r to do so: as it happens, the chance of leftward motion is times higher. p( r 3. Linear discriminant analysis Let us redo the calculations for the case of neurons. If we denote by r the vector of firing rates, Bayes theorem reads: p(s r = p(r sp(s p(r We assume that the distribution of firing rates again follows a Gaussian so that:

3. Linear discriminant analysis

Let us redo the calculations for the case of $n$ neurons. If we denote by $\mathbf{r}$ the vector of firing rates, Bayes' theorem reads:

$$p(s \mid \mathbf{r}) = \frac{p(\mathbf{r} \mid s)\,p(s)}{p(\mathbf{r})}$$

We assume that the distribution of firing rates again follows a Gaussian, so that:

$$p(\mathbf{r} \mid s) = \frac{1}{(2\pi)^{n/2}\sqrt{\det C}}\,\exp\!\left(-\frac{1}{2}\,(\mathbf{r} - \boldsymbol{\mu}_s)^T C^{-1} (\mathbf{r} - \boldsymbol{\mu}_s)\right)$$

where

- $\boldsymbol{\mu}_s$ denotes the mean of the density for stimulus $s \in \{\leftarrow, \rightarrow\}$
- $C$ is the covariance matrix, assumed identical for both stimuli

(a) Compute the log-likelihood ratio $l(\mathbf{r}) \equiv \ln\dfrac{p(\mathbf{r} \mid \leftarrow)}{p(\mathbf{r} \mid \rightarrow)}$

$$
\begin{aligned}
l(\mathbf{r})
&= \ln\frac{\frac{1}{(2\pi)^{n/2}\sqrt{\det C}}\exp\!\big(-\frac{1}{2}(\mathbf{r} - \boldsymbol{\mu}_\leftarrow)^T C^{-1}(\mathbf{r} - \boldsymbol{\mu}_\leftarrow)\big)}{\frac{1}{(2\pi)^{n/2}\sqrt{\det C}}\exp\!\big(-\frac{1}{2}(\mathbf{r} - \boldsymbol{\mu}_\rightarrow)^T C^{-1}(\mathbf{r} - \boldsymbol{\mu}_\rightarrow)\big)} \\
&= -\frac{1}{2}(\mathbf{r} - \boldsymbol{\mu}_\leftarrow)^T C^{-1}(\mathbf{r} - \boldsymbol{\mu}_\leftarrow) + \frac{1}{2}(\mathbf{r} - \boldsymbol{\mu}_\rightarrow)^T C^{-1}(\mathbf{r} - \boldsymbol{\mu}_\rightarrow) \\
&= \frac{1}{2}\Big(\boldsymbol{\mu}_\rightarrow^T C^{-1}\boldsymbol{\mu}_\rightarrow - \boldsymbol{\mu}_\leftarrow^T C^{-1}\boldsymbol{\mu}_\leftarrow\Big)
 + \frac{1}{2}\Big(\mathbf{r}^T C^{-1}\boldsymbol{\mu}_\leftarrow + \boldsymbol{\mu}_\leftarrow^T C^{-1}\mathbf{r} - \mathbf{r}^T C^{-1}\boldsymbol{\mu}_\rightarrow - \boldsymbol{\mu}_\rightarrow^T C^{-1}\mathbf{r}\Big)
\end{aligned}
$$

where the quadratic terms $\mathbf{r}^T C^{-1}\mathbf{r}$ have cancelled out. But since $C$ is a covariance matrix, it is symmetric, hence so is $C^{-1}$, and for any $s$:

$$\mathbf{r}^T C^{-1}\boldsymbol{\mu}_s = \big(\mathbf{r}^T C^{-1}\boldsymbol{\mu}_s\big)^T = \boldsymbol{\mu}_s^T (C^{-1})^T \mathbf{r} = \boldsymbol{\mu}_s^T C^{-1}\mathbf{r}$$

(the quantity is a scalar, so it equals its own transpose). Therefore:

$$l(\mathbf{r}) = \underbrace{\frac{1}{2}\Big(\boldsymbol{\mu}_\rightarrow^T C^{-1}\boldsymbol{\mu}_\rightarrow - \boldsymbol{\mu}_\leftarrow^T C^{-1}\boldsymbol{\mu}_\leftarrow\Big)}_{\textstyle \alpha} + \mathbf{r}^T \underbrace{C^{-1}(\boldsymbol{\mu}_\leftarrow - \boldsymbol{\mu}_\rightarrow)}_{\textstyle \beta}$$

(b) Assume that $l(\mathbf{r}) = 0$ is the decision boundary, so that any firing rate vector $\mathbf{r}$ giving a log-likelihood ratio larger than zero is classified as coming from the stimulus $\leftarrow$. Compute a formula for the decision boundary. What shape does this boundary have?

First, let us show how we could come up with the $l(\mathbf{r}) > 0$ decision rule. Deciding $\leftarrow$ whenever the posterior favours $\leftarrow$ amounts to:

$$
p(\leftarrow \mid \mathbf{r}) > p(\rightarrow \mid \mathbf{r})
\iff \frac{p(\mathbf{r} \mid \leftarrow)\,p(\leftarrow)}{p(\mathbf{r})} > \frac{p(\mathbf{r} \mid \rightarrow)\,p(\rightarrow)}{p(\mathbf{r})}
\iff \frac{p(\mathbf{r} \mid \leftarrow)}{p(\mathbf{r} \mid \rightarrow)} > \frac{p(\rightarrow)}{p(\leftarrow)}
\iff l(\mathbf{r}) > \ln\frac{p(\rightarrow)}{p(\leftarrow)}
$$

Since $\ln\dfrac{p(\rightarrow)}{p(\leftarrow)} = 0 \iff p(\rightarrow) = p(\leftarrow) = 1/2$, the $l(\mathbf{r}) > 0$ rule is the posterior-optimal one exactly when the priors are equal.

The formula for this decision boundary is the following one:

$$l(\mathbf{r}) = 0 \iff \underbrace{\frac{1}{2}\Big(\boldsymbol{\mu}_\rightarrow^T C^{-1}\boldsymbol{\mu}_\rightarrow - \boldsymbol{\mu}_\leftarrow^T C^{-1}\boldsymbol{\mu}_\leftarrow\Big)}_{\textstyle \alpha} + \mathbf{r}^T \underbrace{C^{-1}(\boldsymbol{\mu}_\leftarrow - \boldsymbol{\mu}_\rightarrow)}_{\textstyle \beta} = 0$$

It is an affine hyperplane $\{\mathbf{r} \in \mathbb{R}^n \mid \mathbf{r}^T \beta = -\alpha\}$, i.e. a subspace of codimension $1$ in the $n$-dimensional space of firing rates.

(c) Assume that $p(\leftarrow) = p(\rightarrow) = 1/2$. Assume we are analyzing two neurons with uncorrelated activities, so that the covariance matrix is:

$$C = \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix}$$

Sketch the decision boundary for this case.

With

$$\boldsymbol{\mu}_\leftarrow = (60 \text{ Hz},\, 40 \text{ Hz}), \qquad \boldsymbol{\mu}_\rightarrow = (40 \text{ Hz},\, 60 \text{ Hz}), \qquad C = \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix}$$

we get, by resorting to the LDA decision rule $l(\mathbf{r}) = \alpha + \mathbf{r}^T\beta > 0$, the boundary $\alpha + \mathbf{r}^T\beta = 0$ with

$$\beta = \left(\frac{20\text{ Hz}}{\sigma_1^2},\; -\frac{20\text{ Hz}}{\sigma_2^2}\right)$$

In particular, if $\sigma_1 = \sigma_2$, then $\alpha = 0$ and the boundary is simply the diagonal line $r_1 = r_2$: points below the diagonal (neuron 1 firing more) are classified as leftward motion, points above it as rightward motion.

[Figure 3.c - Plot of the LDA decision boundary for the two neurons with uncorrelated activities]
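A minimal sketch of this boundary for the two uncorrelated neurons; the means are those given above, while the standard deviations are assumed values chosen only for illustration:

```python
import numpy as np

mu_left = np.array([60.0, 40.0])     # Hz, mean responses to leftward motion
mu_right = np.array([40.0, 60.0])    # Hz, mean responses to rightward motion
sigma1, sigma2 = 15.0, 15.0          # Hz, assumed standard deviations of the two neurons
C = np.diag([sigma1**2, sigma2**2])  # uncorrelated activities: diagonal covariance matrix
C_inv = np.linalg.inv(C)

alpha = 0.5 * (mu_right @ C_inv @ mu_right - mu_left @ C_inv @ mu_left)
beta = C_inv @ (mu_left - mu_right)

def classify(r):
    """Decide 'left' when the log-likelihood ratio l(r) = alpha + r.beta is positive."""
    return "left" if alpha + r @ beta > 0 else "right"

# With sigma1 = sigma2 and these symmetric means, alpha = 0 and the boundary
# alpha + r.beta = 0 reduces to the diagonal line r1 = r2.
print(classify(np.array([70.0, 30.0])))   # left
print(classify(np.array([30.0, 70.0])))   # right
```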