Exam Solutions

Q1. Constructing a balanced sequence containing three kinds of stimuli

Here we design a balanced cyclic sequence for three kinds of stimuli (labeled {0, 1, 2}), in which every three-element subsequence (except for {0, 0, 0}) occurs exactly once. We do this by extending the finite field Z_3 to make a field of size 27, GF(27) = GF(3^3). Here Z_3 is the field containing {0, 1, 2}, with addition and multiplication defined (mod 3). The polynomial x^3 + 2x + 1 = 0 has no solutions in Z_3, so we add a formal quantity ξ to Z_3, and assert that ξ^3 + 2ξ + 1 = 0, and that ξ satisfies the associative, commutative, and distributive laws for addition and multiplication with itself and with the elements of Z_3.

A. Using ξ^3 + 2ξ + 1 = 0, express ξ^r in terms of ξ^2, ξ^1, and ξ^0, for r = 0, ..., 26. Note that the sequence of coefficients of ξ^0, considered cyclically, has the property that every sequence of three labels (except for the sequence {0, 0, 0}) occurs exactly once.

Field operations are mod 3, i.e., we can replace -1 by +2, -2 by +1, etc. So, using the field properties and the equation satisfied by ξ, ξ^3 + 2ξ + 1 = 0 implies ξ^3 = -2ξ - 1 = ξ + 2, and we find, for example, that ξ^4 = ξ·ξ^3 = ξ(ξ + 2) = ξ^2 + 2ξ and ξ^5 = ξ·ξ^4 = ξ^3 + 2ξ^2 = (ξ + 2) + 2ξ^2 = 2ξ^2 + ξ + 2.

Working similarly, the table of coefficients is:

ξ^0  = 1
ξ^1  = ξ
ξ^2  = ξ^2
ξ^3  = ξ + 2
ξ^4  = ξ^2 + 2ξ
ξ^5  = 2ξ^2 + ξ + 2
ξ^6  = ξ^2 + ξ + 1
ξ^7  = ξ^2 + 2ξ + 2
ξ^8  = 2ξ^2 + 2
ξ^9  = ξ + 1
ξ^10 = ξ^2 + ξ
ξ^11 = ξ^2 + ξ + 2
ξ^12 = ξ^2 + 2
ξ^13 = 2
ξ^14 = 2ξ
ξ^15 = 2ξ^2
ξ^16 = 2ξ + 1
ξ^17 = 2ξ^2 + ξ
ξ^18 = ξ^2 + 2ξ + 1
ξ^19 = 2ξ^2 + 2ξ + 2
ξ^20 = 2ξ^2 + ξ + 1
ξ^21 = ξ^2 + 1
ξ^22 = 2ξ + 2
ξ^23 = 2ξ^2 + 2ξ
ξ^24 = 2ξ^2 + 2ξ + 1
ξ^25 = 2ξ^2 + 1
ξ^26 = 1

(As a further example of the working: ξ^11 = ξ·ξ^10 = ξ(ξ^2 + ξ) = ξ^3 + ξ^2 = ξ^2 + ξ + 2.)
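(The table and the balanced-sequence property are easy to verify by machine. The following Python sketch is not part of the original solution; it iterates the reduction ξ^3 = ξ + 2 on coefficient triples mod 3 and checks the cyclic-triple property of the constant-term sequence.)

```python
# Sketch: powers of xi in GF(27) with xi^3 = xi + 2 (i.e., xi^3 + 2*xi + 1 = 0 mod 3),
# plus a check of the balanced-sequence property claimed above.

def times_xi(c):
    """Multiply c0 + c1*xi + c2*xi^2 (coefficients mod 3) by xi, reducing xi^3 -> xi + 2."""
    c0, c1, c2 = c
    # xi*(c0 + c1 xi + c2 xi^2) = c0 xi + c1 xi^2 + c2 (xi + 2)
    return ((2 * c2) % 3, (c0 + c2) % 3, c1 % 3)

powers = [(1, 0, 0)]                      # xi^0 = 1
for _ in range(26):
    powers.append(times_xi(powers[-1]))

assert powers[26] == (1, 0, 0)            # xi^26 = 1: xi generates the 26-element group
assert powers[13] == (2, 0, 0)            # xi^13 = 2 (used in part B)

seq = [powers[k][0] for k in range(26)]   # constant-term coefficients, one period
windows = {tuple(seq[(k + i) % 26] for i in range(3)) for k in range(26)}
# 26 cyclic windows, all distinct and none equal to (0,0,0): every nonzero triple occurs once
assert len(windows) == 26 and (0, 0, 0) not in windows
print(seq)
```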

B. Consider the sequence of the coefficients of ξ^0 that occur in the expansion of ξ^k, for k = 0, ..., 25. Note that the second half of the sequence can be obtained from the first half of the sequence by exchanging 1 and 2. Why must this be true?

As the above table shows, ξ^13 = 2. So ξ^(13+k) = 2ξ^k, and therefore the coefficient of ξ^0 in ξ^(13+k) is twice the corresponding coefficient in ξ^k, which means that 1 is replaced by 2·1 = 2, and 2 is replaced by 2·2 = 4 = 1.

C. What is the size of the multiplicative group generated by ξ? Show that for all elements g in this group, g^26 = 1.

The group has 26 elements, ξ^0 = 1, ξ, ξ^2, ..., ξ^25, since the first time that the identity recurs in the above table is ξ^26 = 1. (This in turn means that the above table contains all the nonzero elements of the field GF(27), since this field has 3^3 = 27 elements.) Since the order of every element of a finite group must divide the size of the group, the only possible orders are the factors of 26, namely 1, 2, 13, and 26. To see that this implies that g^26 = 1 for all group elements g: if g has order 1, then g = 1 and g^26 = 1; if g has order 2, then g^26 = (g^2)^13 = 1; if g has order 13, then g^26 = (g^13)^2 = 1; and if g has order 26, then g^26 = 1 directly. So in all cases, g^26 = 1.

D. Which of the following maps are automorphisms of the multiplicative group (i.e., are 1-1 maps that preserve multiplication)? U(g) = g^2, X(g) = g^3, Y(g) = g^5.

U(g) = g^2 cannot be an automorphism: since ξ^13 = 2, we have U(ξ^13) = ξ^26 = 1 = U(1), so U maps two different elements onto 1 and is not 1-1.

X(g) = g^3 is an automorphism; its inverse is X∘X, since X(X(X(g))) = g^27 = g^(26+1) = g^26 · g = g.

Y(g) = g^5 is an automorphism; its inverse is Y∘Y∘Y, since Y(Y(Y(Y(g)))) = g^625 = g^(24·26+1) = (g^26)^24 · g = g.

E. Which of the above maps are automorphisms of the field GF(27) (i.e., are 1-1 maps that preserve multiplication and addition)?

U(g) = g^2 cannot be a field automorphism, since (as in D) it is not even an automorphism of the multiplicative group.

X(g) = g^3 is a field automorphism, since X(g + h) = (g + h)^3 = g^3 + 3g^2 h + 3g h^2 + h^3 = g^3 + h^3 = X(g) + X(h) (the cross terms vanish because 3 = 0 mod 3).

Y(g) = g^5 is not a field automorphism. One way to see this is to try some random choices to see if Y(g + h) = Y(g) + Y(h); a convenient way to do this is g = 1, h = ξ. From the above table, 1 + ξ = ξ^9, so Y(1 + ξ) = Y(ξ^9) = ξ^45 = ξ^(26+19) = ξ^19 = 2ξ^2 + 2ξ + 2. On the other hand, Y(1) + Y(ξ) = 1 + ξ^5 = 1 + (2ξ^2 + ξ + 2) = 2ξ^2 + ξ. So Y does not preserve addition.
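(As a further check, not part of the original solution, the following sketch brute-forces parts D and E by representing GF(27) elements as coefficient triples. A power map automatically preserves multiplication on the commutative multiplicative group, so being 1-1 is equivalent to being a group automorphism, and being 1-1 and additive is equivalent to being a field automorphism. The exponents 2, 3, 5 correspond to the maps U, X, Y as reconstructed above.)

```python
# Sketch: brute-force check of Q1 D/E in GF(27), elements written as (c0, c1, c2)
# for c0 + c1*xi + c2*xi^2, with xi^3 = xi + 2 (mod 3).
from itertools import product

def mul(a, b):
    """Multiply two GF(27) elements, reducing xi^3 -> xi + 2, xi^4 -> xi^2 + 2*xi (mod 3)."""
    raw = [0] * 5
    for i in range(3):
        for j in range(3):
            raw[i + j] += a[i] * b[j]
    raw[1] += raw[3]; raw[0] += 2 * raw[3]      # xi^3 = xi + 2
    raw[2] += raw[4]; raw[1] += 2 * raw[4]      # xi^4 = xi^2 + 2*xi
    return tuple(c % 3 for c in raw[:3])

def power(a, n):
    out = (1, 0, 0)
    for _ in range(n):
        out = mul(out, a)
    return out

def add(a, b):
    return tuple((x + y) % 3 for x, y in zip(a, b))

nonzero = [e for e in product(range(3), repeat=3) if e != (0, 0, 0)]
for name, n in [("U: g -> g^2", 2), ("X: g -> g^3", 3), ("Y: g -> g^5", 5)]:
    one_to_one = len({power(g, n) for g in nonzero}) == 26
    preserves_add = all(power(add(g, h), n) == add(power(g, n), power(h, n))
                        for g in nonzero for h in nonzero)
    print(name, "| 1-1 on the multiplicative group:", one_to_one,
          "| preserves addition:", preserves_add)
# Expected: U is not 1-1; X is 1-1 and additive (field automorphism); Y is 1-1 but not additive.
```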

Q2. Decomposing group representations

Recall that if we have a group G with a representation U_1 in V_1 and another representation U_2 in V_2, we can define a group representation U_1⊗U_2 on V_1⊗V_2 by (U_1⊗U_2)_g (v_1⊗v_2) = (U_1)_g v_1 ⊗ (U_2)_g v_2. Here we will show that if V_1 and V_2 are the same (i.e., if V = V_1 = V_2) and U_1 and U_2 are the same (U = U_1 = U_2), we can decompose U⊗U into a symmetric and an antisymmetric component, and determine their characters. We will do this by decomposing V⊗V into sym(V⊗V) and anti(V⊗V), and seeing how each transformation in U⊗U acts in these components. Recall that a linear transformation A on V acts on a typical element sym(x⊗y) = (1/2)(x⊗y + y⊗x) of sym(V⊗V) by (A⊗A) sym(x⊗y) = (1/2)(Ax⊗Ay + Ay⊗Ax) = sym(Ax⊗Ay), and similarly for the action of A⊗A in anti(V⊗V).

A. Given that A on V has distinct eigenvalues λ_1, ..., λ_m and eigenvectors v_1, ..., v_m (and that V has dimension m), find the eigenvalues and eigenvectors for the action of A⊗A in sym(V⊗V) and anti(V⊗V) and their traces, denoted tr(sym(A⊗A)) and tr(anti(A⊗A)).

A convenient basis for sym(V⊗V) is sym(v_j⊗v_k) with j ≤ k. These are eigenvectors of A⊗A, since
(A⊗A) sym(v_j⊗v_k) = (1/2)(Av_j⊗Av_k + Av_k⊗Av_j) = (1/2)(λ_j λ_k (v_j⊗v_k) + λ_k λ_j (v_k⊗v_j)) = λ_j λ_k sym(v_j⊗v_k).
This calculation also shows that the eigenvalues are the products λ_j λ_k. We don't consider pairs j, k for which j > k, since these have already been counted: sym(v_j⊗v_k) = sym(v_k⊗v_j) for any pair j, k in {1, ..., m}.

A similar calculation holds for the action of A⊗A in anti(V⊗V), except that anti(v_j⊗v_k) is only a basis element (and an eigenvector) if j ≠ k, since anti(v_j⊗v_j) = (1/2)(v_j⊗v_j − v_j⊗v_j) = 0. We also don't consider pairs j, k for which j > k, since these have already been counted: anti(v_j⊗v_k) = −anti(v_k⊗v_j).

Thus, the eigenvalues for the action of A⊗A in sym(V⊗V) are all pairwise products λ_j λ_k with j ≤ k (including the λ_j^2), a total of m(m−1)/2 + m = m(m+1)/2 of them, corresponding to the dimension of sym(V⊗V). So tr(sym(A⊗A)) = Σ_{j≤k} λ_j λ_k.

The eigenvalues for the action of A⊗A in anti(V⊗V) are all pairwise products λ_j λ_k with j < k (excluding the λ_j^2), a total of m(m−1)/2 quantities, corresponding to the dimension of anti(V⊗V). So tr(anti(A⊗A)) = Σ_{j<k} λ_j λ_k.

B. Our next step is to express tr(sym(A⊗A)) (i.e., the trace of A⊗A acting in sym(V⊗V)) and tr(anti(A⊗A)) (i.e., the trace of A⊗A acting in anti(V⊗V)) in terms of easier quantities. Given the same setup as above, find the trace of A⊗A (acting in V⊗V) and the trace of A^2 (acting in V), and relate these to tr(sym(A⊗A)) and tr(anti(A⊗A)).

To calculate tr(A⊗A) (the sum of its eigenvalues), we use the basis v_j⊗v_k (all j and k from 1 to m) for V⊗V. The eigenvalue corresponding to v_j⊗v_k is λ_j λ_k, as (A⊗A)(v_j⊗v_k) = Av_j⊗Av_k = λ_j λ_k (v_j⊗v_k). So
tr(A⊗A) = Σ_{j,k} λ_j λ_k = (Σ_j λ_j)^2 = (tr A)^2.
This can be rephrased as (tr A)^2 = Σ_{j,k} λ_j λ_k = 2 Σ_{j<k} λ_j λ_k + Σ_j λ_j^2.

To calculate tr(A^2), we use the basis v_j for V. The eigenvalue corresponding to v_j is λ_j^2, since A^2 v_j = A(λ_j v_j) = λ_j^2 v_j. So tr(A^2) = Σ_j λ_j^2.

To express tr(sym(A⊗A)) in terms of tr(A⊗A) = (tr A)^2 and tr(A^2):
tr(sym(A⊗A)) = Σ_{j≤k} λ_j λ_k = Σ_{j<k} λ_j λ_k + Σ_j λ_j^2.
From (tr A)^2 = 2 Σ_{j<k} λ_j λ_k + Σ_j λ_j^2, we have Σ_{j<k} λ_j λ_k = (1/2)((tr A)^2 − tr(A^2)). Therefore,
tr(sym(A⊗A)) = (1/2)((tr A)^2 − tr(A^2)) + Σ_j λ_j^2 = (1/2)((tr A)^2 + tr(A^2)), and
tr(anti(A⊗A)) = Σ_{j<k} λ_j λ_k = (1/2)((tr A)^2 − tr(A^2)).
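(A quick numerical check of these trace identities, not part of the original solution: for a random matrix A, compare the eigenvalue sums with (1/2)((tr A)^2 ± tr(A^2)).)

```python
# Sketch: numerical check of tr(sym(A x A)) and tr(anti(A x A)) for a random A.
import numpy as np

rng = np.random.default_rng(0)
m = 5
A = rng.standard_normal((m, m))
lam = np.linalg.eigvals(A)

tr_sym  = sum(lam[j] * lam[k] for j in range(m) for k in range(j, m))      # j <= k
tr_anti = sum(lam[j] * lam[k] for j in range(m) for k in range(j + 1, m))  # j < k

trA, trA2 = np.trace(A), np.trace(A @ A)
print(np.allclose(tr_sym,  0.5 * (trA**2 + trA2)))   # True
print(np.allclose(tr_anti, 0.5 * (trA**2 - trA2)))   # True
```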

C. Finally, recalling that the character is defined by χ_L(g) = tr(L_g), express χ_{sym(U⊗U)} and χ_{anti(U⊗U)} in terms of χ_U.

χ_{sym(U⊗U)}(g) = tr(sym(U_g⊗U_g)) = (1/2)((tr U_g)^2 + tr(U_g^2)) = (1/2)((tr U_g)^2 + tr(U_{g^2})) = (1/2)(χ_U(g)^2 + χ_U(g^2)).
Similarly,
χ_{anti(U⊗U)}(g) = tr(anti(U_g⊗U_g)) = (1/2)((tr U_g)^2 − tr(U_g^2)) = (1/2)((tr U_g)^2 − tr(U_{g^2})) = (1/2)(χ_U(g)^2 − χ_U(g^2)).
(Here we used U_g^2 = U_{g^2}, since U is a representation.)

Q3. Mutual information: synergy, redundancy, etc.

Consider a stimulus S that is equally likely to have one of several values, {1, ..., N}, and two neurons R_1 and R_2 that are influenced by it. We only concern ourselves with snapshots, so the responses R_1 and R_2 can be considered to be binary, {0, 1}. In this scenario, we can calculate the information conveyed by each neuron alone,
I_1 = H(S) + H(R_1) − H(S, R_1) and I_2 = H(S) + H(R_2) − H(S, R_2),
as well as the information conveyed by the entire population,
I_{1+2} = H(S) + H(R_1, R_2) − H(S, R_1, R_2).
Note that H(S, R_1, R_2) is the entropy of the table of 4N entries, listing the probability p(S, R_1, R_2) that each value of S is associated with one of the four output patterns that R_1 and R_2 can produce (i.e., {R_1=0, R_2=0}, {R_1=0, R_2=1}, {R_1=1, R_2=0}, {R_1=1, R_2=1}). This table also determines the joint probabilities of S and each R_i, since p(S, R_1) = p(S, R_1, R_2=0) + p(S, R_1, R_2=1), and similarly p(S, R_2) = p(S, R_1=0, R_2) + p(S, R_1=1, R_2).

Find an example of p(S, R_1, R_2) that illustrates each of the behaviors below, or, alternatively, show that the behavior is impossible. It may be convenient to work in terms of the stimulus probabilities p(S) and the conditional probabilities p(R_1, R_2 | S) (the probability that particular values of R_1 and R_2 occur, given a value of S), noting that p(S, R_1, R_2) = p(R_1, R_2 | S) p(S).

A. I_1 > 0, I_2 > 0, I_{1+2} = I_1 + I_2 (independent channels)

Take N = 4 and consider S to be a two-bit binary number (00, 01, 10, 11), each with equal probability (p(S) = 1/4). Have each neuron care only about one of the stimulus bits and ignore the other. Since the bits (the channels) are independent, the information in each channel should add. To work this out as a simple example: we can have R_1 code the first bit with perfect reliability, and R_2 code the second bit with perfect reliability. So we expect that I_1 = I_2 = 1, and I_{1+2} = 2. To verify: if S = s_1 s_2 (as bits), then we can take p(R_1, R_2 | S) = 1 if R_1 = s_1 and R_2 = s_2, and zero otherwise. To calculate I_{1+2}, express the stimulus-response relationship as a table:

p(R_1, R_2 | S):   (R_1,R_2) = (0,0)   (0,1)   (1,0)   (1,1)
S = 00                     1             0       0       0
S = 01                     0             1       0       0
S = 10                     0             0       1       0
S = 11                     0             0       0       1

Since there are four equally likely inputs (p(S) = 1/4), the input entropy is log 4 = 2. Since there are also four equally likely outputs, the output entropy is also log 4 = 2. There are also 4 equally likely (nonzero) entries in the joint table, so the table entropy is log 4 = 2. The information is I_{1+2} = 2 + 2 − 2 = 2.

To calculate I_1, reduce the above table by summing over values of R_2:

p(R_1 | S):     R_1 = 0    R_1 = 1
S = 00             1          0
S = 01             1          0
S = 10             0          1
S = 11             0          1

The input entropy is log 4 = 2. The output entropy is log 2 = 1. There are 4 equally likely (nonzero) entries in the joint table, so the table entropy is log 4 = 2. The information is I_1 = 2 + 1 − 2 = 1, and similarly I_2 = 1.

The above spells things out in much more detail than necessary; these results are all straightforward consequences of a basic property of information, namely, that for independent channels, information adds. The idea of course generalizes to neurons that are not fully reliable (i.e., p(R_i = s_i) = p_i); in this case, the table of p(R_1, R_2 | S) for I_{1+2} is

             (0,0)                (0,1)                (1,0)                (1,1)
S = 00       p_1 p_2              p_1(1−p_2)           (1−p_1)p_2           (1−p_1)(1−p_2)
S = 01       p_1(1−p_2)           p_1 p_2              (1−p_1)(1−p_2)       (1−p_1)p_2
S = 10       (1−p_1)p_2           (1−p_1)(1−p_2)       p_1 p_2              p_1(1−p_2)
S = 11       (1−p_1)(1−p_2)       (1−p_1)p_2           p_1(1−p_2)           p_1 p_2

and the table of p(R_1 | S) for I_1 is

             R_1 = 0     R_1 = 1
S = 00        p_1         1−p_1
S = 01        p_1         1−p_1
S = 10        1−p_1       p_1
S = 11        1−p_1       p_1

Here, I_i = 1 + p_i log p_i + (1−p_i) log(1−p_i), which is nonzero provided p_i ≠ 1/2.
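(The following sketch, not part of the original solution, recomputes I_1, I_2, and I_{1+2} for the perfectly reliable example directly from the joint table p(S, R_1, R_2), using I = H(S) + H(R) − H(S, R).)

```python
# Sketch: verify the independent-channels example (Q3 A) numerically.
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# joint table p[s, r1, r2]: S = 00, 01, 10, 11 coded perfectly by (R1, R2)
p = np.zeros((4, 2, 2))
for s in range(4):
    r1, r2 = s >> 1, s & 1
    p[s, r1, r2] = 0.25

H_S = entropy(p.sum(axis=(1, 2)))
I1  = H_S + entropy(p.sum(axis=(0, 2))) - entropy(p.sum(axis=2))
I2  = H_S + entropy(p.sum(axis=(0, 1))) - entropy(p.sum(axis=1))
I12 = H_S + entropy(p.sum(axis=0))      - entropy(p)
print(I1, I2, I12)   # 1.0 1.0 2.0
```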

B. I_1 > 0, I_2 > 0, I_{1+2} = max(I_1, I_2) (completely redundant channels)

We can ensure that I_1 = I_2 = I_{1+2} by making R_1 and R_2 identical. To spell this out: let S be binary, and take

p(R_1, R_2 | S):   (R_1,R_2) = (0,0)   (0,1)   (1,0)   (1,1)
S = 0                      p             0       0      1−p
S = 1                     1−p            0       0       p

That is, the only response configurations with nonzero probability are those for which R_1 = R_2, and R_1 and R_2 have reliability p(R_i = s | S = s) = p. The table for I_1 is

p(R_1 | S):     R_1 = 0    R_1 = 1
S = 0              p          1−p
S = 1              1−p        p

For these, I_1 = I_2 = I_{1+2} = 1 + p log p + (1−p) log(1−p).

C. I_1 > 0, I_2 > 0, max(I_1, I_2) < I_{1+2} < I_1 + I_2 (partially redundant channels)

We can create partial redundancy by starting from A, and eliminating any single input (for example, S = 00), so that now, if either neuron indicates a 0 bit, then the other bit must be 1 and the other neuron's activity is uninformative. So N = 3; p(S=01) = p(S=10) = p(S=11) = 1/3. For I_{1+2}, the table is

p(R_1, R_2 | S):   (R_1,R_2) = (0,0)   (0,1)   (1,0)   (1,1)
S = 01                     0             1       0       0
S = 10                     0             0       1       0
S = 11                     0             0       0       1

The input entropy, the output entropy, and the table entropy are all log 3 (three equally likely possibilities for each), so I_{1+2} = log 3 + log 3 − log 3 = log 3 ≈ 1.58.

For I_1, the table is

p(R_1 | S):     R_1 = 0    R_1 = 1
S = 01             1          0
S = 10             0          1
S = 11             0          1

The input entropy and the table entropy are still log 3, but the output entropy is h = −(1/3) log(1/3) − (2/3) log(2/3) ≈ 0.918. So I_1 = log 3 + h − log 3 = h, and similarly I_2 = h. Indeed, h = max(I_1, I_2) < I_{1+2} = log 3 < I_1 + I_2 = 2h. This of course also extends to less-than-reliable neurons, as above.

D. I_{1+2} > I_1 + I_2 (synergistic channels)

We can set up a scenario in which neither neuron alone has any information about the stimulus, but the pair does. To do this, we caricature the idea that when S = 0, the neurons are asynchronous, but when S = 1, they are synchronized. For I_{1+2}, the table is

p(R_1, R_2 | S):   (R_1,R_2) = (0,0)   (0,1)   (1,0)   (1,1)
S = 0                      0            1/2     1/2      0
S = 1                     1/2            0       0      1/2

The input entropy is 1, the output entropy is 2 (all four possible outputs have probability 1/4), and the table entropy is 2. So I_{1+2} = 1 + 2 − 2 = 1. But for each neuron alone, the information is 0:

p(R_i | S):     R_i = 0    R_i = 1
S = 0             1/2        1/2
S = 1             1/2        1/2

since the input entropy is 1, the output entropy is 1, and the table entropy is 2, giving I_i = 1 + 1 − 2 = 0.

E. I_{1+2} < max(I_1, I_2) (occluding channels)

This is impossible, because of the Data Processing Inequality. In detail: the response variable R_1 can be derived from the response variable (R_1, R_2) by ignoring R_2. Therefore, I_{1+2} cannot be less than I_1. Similarly, I_{1+2} cannot be less than I_2. So I_{1+2} < max(I_1, I_2) is impossible.

F. I_{1+2} < min(I_1, I_2) (strongly occluding channels)

This is impossible; it is a special case of E.
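(As above, a sketch that is not part of the original solution, verifying the synergy example numerically: each neuron alone carries 0 bits, the pair carries 1 bit.)

```python
# Sketch: verify the synergy example (Q3 D).
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# p[s, r1, r2]: S=0 -> neurons asynchronous, S=1 -> synchronized
p = np.zeros((2, 2, 2))
p[0, 0, 1] = p[0, 1, 0] = 0.25   # p(S=0) = 1/2, split over (0,1) and (1,0)
p[1, 0, 0] = p[1, 1, 1] = 0.25   # p(S=1) = 1/2, split over (0,0) and (1,1)

H_S = entropy(p.sum(axis=(1, 2)))
I1  = H_S + entropy(p.sum(axis=(0, 2))) - entropy(p.sum(axis=2))
I2  = H_S + entropy(p.sum(axis=(0, 1))) - entropy(p.sum(axis=1))
I12 = H_S + entropy(p.sum(axis=0))      - entropy(p)
print(I1, I2, I12)   # 0.0 0.0 1.0
```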

Q4. Mutual information with additive noise

Here we will determine the mutual information between a Gaussian stimulus s and a response r that are related by r = gs + a, where g is a constant (the gain) and a is an additive noise, uncorrelated with the input. For definiteness, we assume that the stimulus s has variance V_S, that the noise term is drawn from a Gaussian with variance V_A, and that both have mean 0.

A. For a Gaussian distribution of variance V, p_V(x) = (1/sqrt(2πV)) e^(−x^2/2V), calculate the differential entropy.

The differential entropy is H_V = −∫ p_V(x) log_2 p_V(x) dx. Via standard steps:
H_V = −∫ p_V(x) log_2 p_V(x) dx = −(1/ln 2) ∫ p_V(x) ln p_V(x) dx = −(1/ln 2) ∫ p_V(x) (−ln sqrt(2πV) − x^2/2V) dx.
The first term in parentheses is just a constant; its contribution to the integral can be calculated from ∫ p_V(x) dx = 1 (since p_V is a probability distribution). The second term is proportional to x^2; its contribution can be calculated from ∫ x^2 p_V(x) dx = V (since p_V has variance V). So
H_V = (1/ln 2)((1/2) ln(2πV) + 1/2) = (1/(2 ln 2))(ln V + ln(2πe)) = (1/2) log_2(2πeV).

B. How is the output distributed (i.e., what is p(r))? What is its differential entropy?

Since r = gs + a, it is a sum of two independent Gaussian components, both of mean 0. Therefore r is distributed as a Gaussian of mean 0. Its variance is the sum of the variances of the two terms:
V_R = <r^2> = <(gs + a)^2> = g^2<s^2> + 2g<sa> + <a^2> = g^2 V_S + V_A,
where <sa> = 0 because we have hypothesized that they are independent. Since r is distributed as a Gaussian with the above variance, its differential entropy is
H_R = (1/(2 ln 2))(ln(g^2 V_S + V_A) + ln(2πe)) = (1/2) log_2(2πe(g^2 V_S + V_A)).

C. What is the distribution of the output, conditional on a particular value of the input, say s = s_0? I.e., what is p(r | s_0)? What is its differential entropy?

With s = s_0 given, r = g s_0 + a, so p(r | s_0) is a Gaussian with mean g s_0 and variance V_A. The mean does not affect the differential entropy, as it just translates the distribution. So H_{R|s_0} = (1/(2 ln 2))(ln V_A + ln(2πe)) = (1/2) log_2(2πe V_A).

D. Calculate the mutual information between the input and the output.

We do this by comparing the unconditional entropy of the output, H_R, with the average conditional entropy, i.e., I(S, R) = H_R − ∫ p(s) H_{R|s} ds. Since H_{R|s} is constant (part C), this reduces to I(S, R) = H_R − H_{R|s} for any s. Thus,
I(S, R) = (1/2) log_2(2πe(g^2 V_S + V_A)) − (1/2) log_2(2πe V_A) = (1/2) log_2((g^2 V_S + V_A)/V_A).

E. Calculate the signal-to-noise ratio, namely, the ratio of the variance of the signal term, gs, to the variance of the noise term, a. Relate this to the mutual information.

SNR = <(gs)^2>/<a^2> = g^2 V_S / V_A. So I(S, R) = (1/2) log_2(SNR + 1), a classic result.
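(A numerical sanity check of the Gaussian differential entropy formula, not part of the original solution; it integrates −p log_2 p on a fine grid for an arbitrary choice V = 0.7.)

```python
# Sketch: check H_V = (1/2) log2(2*pi*e*V) by direct numerical integration.
import numpy as np

V = 0.7
x = np.linspace(-10 * np.sqrt(V), 10 * np.sqrt(V), 400_001)   # +/- 10 sigma covers essentially all the mass
dx = x[1] - x[0]
p = np.exp(-x**2 / (2 * V)) / np.sqrt(2 * np.pi * V)
H_num = -np.sum(p * np.log2(p)) * dx
print(H_num, 0.5 * np.log2(2 * np.pi * np.e * V))   # the two values agree closely
```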

F. Calculate the correlation coefficient between the input and the output, C = <(s − <s>)(r − <r>)> / sqrt(<(s − <s>)^2> <(r − <r>)^2>). Relate this to the mutual information.

For the numerator of C: <(s − <s>)(r − <r>)> = <sr> = <s(gs + a)> = g<s^2> + <sa> = g V_S. So
C = g V_S / sqrt(V_S (g^2 V_S + V_A)), from which C^2 = g^2 V_S / (g^2 V_S + V_A) and 1 − C^2 = V_A / (g^2 V_S + V_A).
Therefore,
I(S, R) = (1/2) log_2((g^2 V_S + V_A)/V_A) = (1/2) log_2(1/(1 − C^2)) = −(1/2) log_2(1 − C^2),
another classic result.
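(A Monte Carlo sketch, not part of the original solution, checking the SNR form of I(S, R) from part E against the correlation form from part F, for arbitrary choices of g, V_S, and V_A.)

```python
# Sketch: sample the additive-Gaussian model and compare the two expressions for I(S, R).
import numpy as np

rng = np.random.default_rng(1)
g, V_S, V_A, n = 2.0, 1.5, 0.7, 500_000
s = rng.normal(0.0, np.sqrt(V_S), n)
r = g * s + rng.normal(0.0, np.sqrt(V_A), n)

snr = g**2 * V_S / V_A
C = np.corrcoef(s, r)[0, 1]                 # empirical correlation coefficient
print(0.5 * np.log2(1 + snr))               # I(S, R) from the SNR form (exact parameters)
print(-0.5 * np.log2(1 - C**2))             # I(S, R) from the correlation form (sampled C)
# The two agree up to the sampling error in C.
```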

Q5. Toy examples of ICA

A. Take four data points, located at (±1, ±1) (see diagram I), and consider their projection onto a unit vector v_θ = (cos θ, sin θ), to form a distribution of points. How does the variance of this distribution depend on θ? For which projections is it maximized (equivalently, what direction(s) would be selected by PCA)?

The projected distribution consists of the four points at positions x_i = ±cos θ ± sin θ along v_θ. Their mean is zero. So the variance is
V = <x^2> = (1/4)[(cos θ + sin θ)^2 + (cos θ − sin θ)^2 + (−cos θ + sin θ)^2 + (−cos θ − sin θ)^2] = cos^2 θ + sin^2 θ = 1,
which is independent of θ. From the point of view of PCA, no directions are special (i.e., each accounts for the same amount of variance).

B. As in A, but determine how the kurtosis of the projected distribution depends on θ. Recall, kurtosis is defined by κ = <(x − <x>)^4> / V^2, where V = <(x − <x>)^2> is the variance. For which directions is kurtosis largest?

Since <x> = 0 and V = 1,
κ = <x^4> = (1/4)[(cos θ + sin θ)^4 + (cos θ − sin θ)^4 + (−cos θ + sin θ)^4 + (−cos θ − sin θ)^4] = cos^4 θ + 6 cos^2 θ sin^2 θ + sin^4 θ.
One could graph this, or note that
κ = cos^4 θ + 6 cos^2 θ sin^2 θ + sin^4 θ = (cos^2 θ + sin^2 θ)^2 + 4 cos^2 θ sin^2 θ = 1 + (2 cos θ sin θ)^2 = 1 + sin^2(2θ).
So the kurtosis is largest when sin^2(2θ) is largest, i.e., when sin(2θ) = ±1, which happens when 2θ = π/2, 3π/2, ..., i.e., when θ = π/4, 3π/4, .... Using maximum kurtosis as a criterion, ICA would identify the oblique axes as the unmixed coordinates.

C. As in A, but determine how the entropy of the projected distribution depends on θ. For which projections is it minimized?

The projected distribution consists of the four points (±cos θ ± sin θ) v_θ. At all typical angles, i.e., angles that are not multiples of π/4, the four projection points are distinct. In these cases, each value ±cos θ ± sin θ has probability 1/4, and the distribution has entropy −4·(1/4) log_2(1/4) = 2. At the atypical angles that correspond to the cardinal directions θ = 0, π/2, π, ..., the points coincide in pairs, and each pair projects onto one of the two values ±1. These have probability 1/2, and the distribution has entropy −2·(1/2) log_2(1/2) = 1. At the atypical angles that correspond to the oblique directions θ = π/4, 3π/4, ..., two of the points coincide (and project to 0), and two do not, and project to ±sqrt(2). So the 0 point has probability 1/2, and each of the points ±sqrt(2) has probability 1/4. The distribution has entropy −(1/2) log_2(1/2) − 2·(1/4) log_2(1/4) = 3/2.
So the entropy is minimized at the cardinal angles, θ = 0, π/2, π, .... Using minimum entropy as a criterion, ICA would identify the cardinal axes as the unmixed coordinates.
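(The following sketch, not part of the original solution, computes the variance, kurtosis, and entropy of the projected four-point distribution at a few angles; coincident projections are detected by rounding.)

```python
# Sketch: variance, kurtosis, and entropy of the projected four-point cloud vs theta (Q5 A-C).
import numpy as np

pts = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)

def stats(theta, pts):
    x = pts @ np.array([np.cos(theta), np.sin(theta)])   # projections onto v_theta
    V = np.mean(x**2)                                     # the mean projection is zero
    kurt = np.mean(x**4) / V**2
    _, counts = np.unique(np.round(x, 9), return_counts=True)
    p = counts / counts.sum()
    H = -np.sum(p * np.log2(p))
    return V, kurt, H

for deg in (0, 30, 45, 90):
    V, kurt, H = stats(np.radians(deg), pts)
    print(f"theta={deg:3d}:  V={V:.3f}  kurtosis={kurt:.3f}  entropy={H:.3f}")
# Variance is 1 at every theta; kurtosis = 1 + sin^2(2*theta), maximal at 45 deg;
# entropy is 2 bits at generic angles, 1 bit at 0/90 deg, 1.5 bits at 45 deg.
```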

D. Same as in A, but take five data points, located at (±1, ±1) and also the origin, (0, 0) (see diagram II), and consider their projection onto a unit vector v_θ = (cos θ, sin θ), to form a distribution of points. How does the variance of this distribution depend on θ? For which projections is it maximized (equivalently, what direction(s) would be selected by PCA)?

The projected distribution consists of five points, four at positions x_i = ±cos θ ± sin θ and one at the origin. So the variance is
V = <x^2> = (1/5)[(cos θ + sin θ)^2 + (cos θ − sin θ)^2 + (−cos θ + sin θ)^2 + (−cos θ − sin θ)^2 + 0] = (4/5)(cos^2 θ + sin^2 θ) = 4/5,
again independent of θ. So again, from the point of view of PCA, no directions are special.

E. As in B, but determine how the kurtosis of the projected distribution depends on θ. For which directions is kurtosis largest?

Since <x> = 0 and V = 4/5,
κ = <x^4> / (4/5)^2 = (4/5)(cos^4 θ + 6 cos^2 θ sin^2 θ + sin^4 θ) / (16/25) = (5/4)(cos^4 θ + 6 cos^2 θ sin^2 θ + sin^4 θ).
This differs from the result in B only by a scale factor and an offset, so again the kurtosis is largest when sin^2(2θ) is largest, i.e., when θ = π/4, 3π/4, .... Using maximum kurtosis as a criterion, ICA would identify the oblique axes as the unmixed coordinates.

F. As in C, but determine how the entropy of the projected distribution depends on θ. For which projections is it minimized?

The projected distribution consists of the five points (±cos θ ± sin θ) v_θ and the origin. At all typical angles, i.e., angles that are not multiples of π/4, the five projection points are distinct. In these cases, each has probability 1/5, and the distribution has entropy −5·(1/5) log_2(1/5) = log_2 5 ≈ 2.32.

At the angles that correspond to the cardinal directions θ = 0, π/2, π, ..., four of the points coincide in pairs, and there is one singleton. The two values ±1 (the results of the coincident pairs) each have probability 2/5, and the origin has probability 1/5. So the distribution has entropy −2·(2/5) log_2(2/5) − (1/5) log_2(1/5) = log_2 5 − 4/5 ≈ 1.52. At the angles that correspond to the oblique directions θ = π/4, 3π/4, ..., three of the points coincide (and project to 0), and two do not, and project to ±sqrt(2). So the 0 point has probability 3/5, and each of the points ±sqrt(2) has probability 1/5. The distribution has entropy −(3/5) log_2(3/5) − 2·(1/5) log_2(1/5) = log_2 5 − (3/5) log_2 3 ≈ 1.37.
So the entropy is minimized at the oblique angles, θ = π/4, 3π/4, .... Using minimum entropy as a criterion, ICA would identify the oblique axes as the unmixed coordinates. Adding one point makes a difference for the entropy criterion, but not for the kurtosis criterion.
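(The same sketch applied to the five-point cloud, again not part of the original solution.)

```python
# Sketch: variance, kurtosis, and entropy of the projected five-point cloud vs theta (Q5 D-F).
import numpy as np

pts = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1], [0, 0]], dtype=float)

def stats(theta, pts):
    x = pts @ np.array([np.cos(theta), np.sin(theta)])
    V = np.mean(x**2)
    kurt = np.mean(x**4) / V**2
    _, counts = np.unique(np.round(x, 9), return_counts=True)
    p = counts / counts.sum()
    return V, kurt, -np.sum(p * np.log2(p))

for deg in (0, 30, 45, 90):
    V, kurt, H = stats(np.radians(deg), pts)
    print(f"theta={deg:3d}:  V={V:.3f}  kurtosis={kurt:.3f}  entropy={H:.3f}")
# Variance is 4/5 everywhere; kurtosis = (5/4)(1 + sin^2(2*theta)), still maximal at 45 deg;
# entropy is log2(5) ~ 2.32 generically, ~1.52 at 0/90 deg, ~1.37 at 45 deg,
# so minimum entropy now also picks the oblique axes.
```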

Q6. Linear systems and feedback

The diagram shows a linear system with input S(t) and output R(t); the filters F_i, G_i, and H are linear filters with transfer functions F_i(ω), G_i(ω), and H(ω). [Diagram: S(t) passes through a summing junction into F_1, then through a second summing junction into F_2, and then into F_3, whose output is R(t); G_1, G_2, and H are feedback filters.]

A. Find the relationship between the Fourier transform S̃(ω) of S(t) and the Fourier transform R̃(ω) of R(t).

Write X(t) for the output of F_1 and Y(t) for the output of F_2. Then (omitting the ω-argument throughout, and using the standard rules for the transfer functions of combined linear systems), the G_2 feedback loop gives
Y = F_2 (X + F_3 G_2 Y), from which Y = F_2 X / (1 − F_2 F_3 G_2).
Similarly, the G_1 and H feedback paths give
X = F_1 (S + F_2 G_1 X + F_3 H Y),
and, substituting the above expression for Y,
X = F_1 S + F_1 F_2 G_1 X + F_1 F_2 F_3 H X / (1 − F_2 F_3 G_2),
from which
X = F_1 S (1 − F_2 F_3 G_2) / [(1 − F_2 F_3 G_2)(1 − F_1 F_2 G_1) − F_1 F_2 F_3 H].
Using Y = F_2 X / (1 − F_2 F_3 G_2):
Y = F_1 F_2 S / [(1 − F_2 F_3 G_2)(1 − F_1 F_2 G_1) − F_1 F_2 F_3 H],
so
R = F_3 Y = F_1 F_2 F_3 S / [(1 − F_2 F_3 G_2)(1 − F_1 F_2 G_1) − F_1 F_2 F_3 H].

B. Write out the relationship between S̃(ω) and R̃(ω) when (a) the feedback through G_1 is absent, (b) the feedback through G_2 is absent, or (c) the feedback through H is absent.

This is readily done by setting the appropriate filter in the above expression to zero:
(a) Setting G_1 = 0 yields R = F_1 F_2 F_3 S / (1 − F_2 F_3 G_2 − F_1 F_2 F_3 H).
(b) Setting G_2 = 0 yields R = F_1 F_2 F_3 S / (1 − F_1 F_2 G_1 − F_1 F_2 F_3 H).
(c) Setting H = 0 yields R = F_1 F_2 F_3 S / [(1 − F_2 F_3 G_2)(1 − F_1 F_2 G_1)].

C. Say that it is known that F_1 = F_2 = F_3 = F, and that G_1 = G_2 = G. Now consider the four configurations above (the full configuration, and the three configurations of part B, in which one of the three feedback filters has been removed). Do they all produce different outputs?

The transfer functions are given by:
Full system:  R = F^3 S / [(1 − F^2 G)^2 − F^3 H]
G_1 removed:  R = F^3 S / (1 − F^2 G − F^3 H)
G_2 removed:  R = F^3 S / (1 − F^2 G − F^3 H), the same as with G_1 removed
H removed:    R = F^3 S / (1 − F^2 G)^2
So not all four configurations produce different outputs: removing G_1 and removing G_2 give identical transfer functions, while the other configurations (generically) differ.
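(A numerical check of part C, not part of the original solution: evaluate the transfer function of part A at one arbitrary complex frequency-response value for each filter, with F_1 = F_2 = F_3 = F and G_1 = G_2 = G.)

```python
# Sketch: check that removing G1 or G2 gives the same transfer function, while the
# other configurations differ, using the closed-form expression from part A.
import numpy as np

rng = np.random.default_rng(2)
F, G, H = rng.standard_normal(3) + 1j * rng.standard_normal(3)   # arbitrary values at one frequency

def transfer(g1, g2, h):
    # R/S for the full circuit, with F1 = F2 = F3 = F
    return F**3 / ((1 - F**2 * g2) * (1 - F**2 * g1) - F**3 * h)

print(np.isclose(transfer(0, G, H), transfer(G, 0, H)))   # True: G1 removed == G2 removed
print(np.isclose(transfer(G, G, H), transfer(G, G, 0)))   # False: full system != H removed
```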