Multivariate Dependence and the Sarmanov-Lancaster Expansion

Ilan N. Goodman and Don H. Johnson
ECE Department, Rice University, Houston, TX
July 2005

Abstract

We extend the work of Sarmanov and Lancaster to obtain an expansion for multivariate distributions. The expansion reveals a flexible, detailed dependence structure that goes beyond pairwise linear correlation to include non-linear and higher-order statistical dependencies. Through examples we show how to use the expansion to analyze existing distributions and to construct new distributions with given properties. We also provide a related dependence measure which we decompose into the separate contributions from each subset of random variables. Using the decomposition we analyze neural population data, revealing significant dependencies not captured by cross-correlation.

1 Introduction

Characterizing statistical dependencies in large groups of random variables presents a considerable challenge; traditional multivariate models tend to be quite narrow in scope, and most commonly used dependence measures are appropriate only for a small class of random variables. For instance, the most popular dependence measure, the correlation coefficient, measures only linear dependence between pairs of random variables. For jointly Gaussian variables, this pairwise linear dependence completely characterizes the dependence structure. However, even simple examples show that, in general, groups of random variables can express more than just pairwise dependence. For example, consider N binary random variables. Fully specifying their joint distribution requires 2^N − 1 parameters; there are N(N−1)/2 pairwise correlations, which, together with the N marginal probabilities, can specify the joint distribution only when N = 2. For larger groups, third- and higher-order dependencies must also be determined.
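To make the counting argument concrete, the following tally for N = 3 (an illustration added here, not part of the original argument) shows the gap explicitly:

\[
2^3 - 1 = 7 \ \text{free parameters}, \qquad \underbrace{3}_{\text{marginal probabilities}} \;+\; \underbrace{3}_{\text{pairwise correlations}} \;=\; 6 \;<\; 7,
\]

so even for three binary variables one additional, genuinely third-order quantity is needed to pin down the joint distribution.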

While for some applications simple models like the multivariate Gaussian may be perfectly adequate, many applications require more flexible models that account for interactions between large numbers of variables. For example, neuroscientists are currently able to record from tens to hundreds of neurons simultaneously, and studies have shown that these ensembles exhibit time-varying dependencies that may contribute to stimulus encoding [1]. Consequently, dependence analysis techniques that can be applied to such high-dimensional random vectors are vital to cracking the neural population code, the way in which neurons encode information jointly. The challenge is to model dependencies in a coherent, meaningful way.

In this paper we extend the work of Sarmanov [2, 3] and Lancaster [4–6] to obtain the Sarmanov-Lancaster (SL) expansion, which generates a highly flexible family of multivariate distributions having attractive properties for many applications. Like the more familiar copula models [7], the SL expansion expresses a joint probability function in terms of its univariate marginal distributions and a multivariate dependence structure. However, the SL expansion benefits both from being more flexible than most copulas, since it is applicable to an extremely broad class of distributions, and from inducing a more intuitive and meaningful dependence structure. From the SL expansion we also derive a powerful non-parametric dependence measure φ², a generalization of Pearson's coefficient of mean-square contingency [8]. This measure has the particularly useful property that it can be decomposed into elements that quantify the dependencies within each subset of the random variables. As a practical example, we analyze neural population data and show how the decomposition of φ² provides a level of detail unavailable using conventional techniques.

2 Background

In their study of noise in nonlinear devices, Barrett and Lampard [9] discussed a double Fourier series expansion for certain bivariate probability distributions using orthogonal polynomials. They restricted their study to distributions whose expansions are diagonal; in other words, if the expansion coefficients are arranged into a matrix, they considered only distributions for which the off-diagonal elements are zero. Though this type of expansion would later be used by Sarmanov and Lancaster to generalize the theory of bivariate dependence, Barrett and Lampard focused on applications concerning processes that are subject to a nonlinear distortion. Several years later, Bahadur [10] described a similar expansion for the joint distribution of binary random variables. Bahadur also used orthogonal polynomials to expand the joint probability function, though his focus was quite different; he was concerned with analyzing multivariate categorical data, for instance to determine relationships between survey questions in a psychology experiment.

Apparently unaware of this previous work, in papers published between 1958 and 1963 O. V. Sarmanov [2, 3] and H. O. Lancaster [4, 5] each described what was essentially the same series expansion of bivariate probability functions. These authors were interested in two related dependence properties. Sarmanov sought to generalize the maximum correlation coefficient of Hirschfeld [11]. Sarmanov defined the maximum correlation coefficient to be the largest eigenvalue of a kernel related to the bivariate probability function, and showed that in the finite discrete case this quantity corresponds to Hirschfeld's coefficient. Furthermore, Sarmanov showed that if the correlation is purely linear, as in the bivariate Gaussian case, his maximum correlation coefficient corresponds to the usual product-moment correlation coefficient.

Around the same time, Lancaster obtained the same expansion while generalizing Hotelling's [12] canonical correlation theory. Lancaster derived a set of canonical variables and correlations for a class of bivariate distributions, including (but not limited to) the bivariate Gaussian distribution. Moreover, he showed that if the expansion is diagonal, then the expansion variables and coefficients are the canonical variables and correlations as defined by Hotelling.

Sarmanov and Lancaster each sought to generalize linear correlation theory to a larger class of non-Gaussian bivariate distributions. While their methods differed slightly, they both treated correlation analysis as a problem of evaluating the spectrum of a kernel corresponding to the ratio of the joint distribution to the product of the marginals. The result is an expansion for a class of bivariate distributions that is expressed in terms of the product of the marginals and a set of correction terms that completely specify the dependence structure. Lancaster's derivation is somewhat more general; he considers the expansion of the Radon-Nikodym derivative of the joint distribution measure with respect to the product measure. As a result, the derivation is not limited to distributions having a density with respect to the Lebesgue measure. In the next section, we extend Lancaster's method to generalize the expansion to the case of more than two variables.

3 The Sarmanov-Lancaster (SL) Expansion

To obtain the SL expansion, we begin by constructing a Hilbert basis for a space of univariate probability functions. Consider a random variable X on the probability space (Ω, B, P), and define the space L²(P) to be the set of measurable functions g(X) having finite variance,

\[
L^2(P) = \left\{\, g : \mathbb{R} \to \mathbb{R},\ g\ \mathcal{B}(\mathbb{R})/\mathcal{B}(\mathbb{R})\text{-measurable, s.t. } E\!\left[g^2(X)\right] < \infty \,\right\},
\tag{1}
\]

where E[g²(X)] = ∫_Ω g²(X) dP. Letting F = P X⁻¹ be the distribution of X on (ℝ, B(ℝ)), we see that L²(P) defines a separable Hilbert space on ℝ [13] with the inner product

\[
\langle g, h \rangle = \int_{\mathbb{R}} g(x)\, h(x)\, F(dx) = E\!\left[g(X)\, h(X)\right].
\]

Hence, L²(P) contains a complete orthonormal sequence {ψ_i = ψ_i(X)}_{i∈ℕ} ⊂ L²(P), and every function g ∈ L²(P) can be expanded as

\[
g(X) = \sum_{i=0}^{\infty} a_i\, \psi_i(X), \qquad a_i = E\!\left[g(X)\, \psi_i(X)\right].
\]

Extending the construction to a collection of random variables, we let X = (X_1, …, X_N), where each X_n : (Ω_n, B_n) → (ℝ, B(ℝ)) is equipped with probability measure P_n, and for each n we define the space L²(P_n) in the same way as equation (1). As we have already shown, for each n there exists a complete orthonormal sequence {ψ_{i_n}^{(n)}}_{i_n∈ℕ} in L²(P_n), which is a basis for that space. Define (Ω, B) = (Ω_1 × ⋯ × Ω_N, B_1 ⊗ ⋯ ⊗ B_N) to be the product space, and let P = P_1 × ⋯ × P_N be the product measure. Finally, define the likelihood ratio Λ = dQ/dP, where Q is an arbitrary measure on the product space, absolutely continuous with respect to P (Λ is the Radon-Nikodym derivative of Q with respect to P).

Then, if Λ ∈ L²(P), we can expand Q as

\[
dQ = dP \left[ \sum_{i_1, \dots, i_N} a_{i_1 \cdots i_N} \prod_{n=1}^{N} \psi^{(n)}_{i_n} \right],
\tag{2}
\]

where

\[
a_{i_1 \cdots i_N} = E\!\left[ \Lambda \prod_{n=1}^{N} \psi^{(n)}_{i_n} \right] = \int_{\Omega} \prod_{n=1}^{N} \psi^{(n)}_{i_n}\, dQ.
\]

The expansion is a straightforward application of Hilbert space theory; the space L²(P) = L²(P_1) ⊗ ⋯ ⊗ L²(P_N), so the tensor product of the marginal bases {ψ^{(1)}_{i_1} ⋯ ψ^{(N)}_{i_N}} is a basis for L²(P) [14].

Equation (2) provides a multivariate model that is uniquely determined by the marginal distributions and a set of dependence parameters which are expectations of products of functions defined on the marginals. In general, the choice of marginal bases is arbitrary, which makes the dependence structure induced by this model very flexible. However, this also means that the dependence structure may tell us very little about the actual statistical interactions between the variables. The Sarmanov-Lancaster (SL) expansion solves this problem by imposing a structure on the expansion coefficients. Specifically, we choose the marginal bases such that ψ_0^{(n)} = 1 for all n. Then, the basis element ∏_{n=1}^{N} ψ^{(n)}_{i_n} is a function of a subset of the N variables. Using the terminology of Bahadur [10], we say that the coefficient a_{i_1⋯i_N} is of order k if the corresponding basis element is a function of exactly k variables.

Define S_k^{(N)}, k = 2, …, N, to be the class of all unique combinations of k integers between 1 and N, so that the elements of S_2^{(N)} correspond to all distinct pairs, S_3^{(N)} to triplets, and so on. For example, if N = 3, we would have S_2^{(3)} = {{1,2}, {1,3}, {2,3}} and S_3^{(3)} = {{1,2,3}}; in general, the number of elements of S_k^{(N)} is the binomial coefficient N-choose-k. Each element of S_k^{(N)} corresponds to a distinct subset of variables, so for example the set {1,2} denotes the random variables X_1, X_2. Noting that each subset of variables is, in general, described by more than one coefficient, we define I_j^{(N)}(k) to be the class of indices denoting coefficients for the j-th subset of order k. In other words,

\[
I_j^{(N)}(k) = \left\{\, i_1 \cdots i_N : i_n = 0 \ \forall\, n \in S_j^{c} \ \text{and}\ i_n > 0 \ \forall\, n \in S_j \,\right\}, \qquad S_j \in S_k^{(N)}.
\]

Continuing the above example, we have I_1^{(3)}(2) = {110, 120, 130, …, 210, 220, 230, …}. The classes I_j^{(N)}(k) form a disjoint partition of the set of indices {i_1⋯i_N}. We can now write equation (2) as

\[
dQ = dP \left[ 1 + \sum_{k=2}^{N} \sum_{j=1}^{\binom{N}{k}} \sum_{i \in I_j^{(N)}(k)} a_i\, \psi_i \right],
\tag{3}
\]

where for clarity we have streamlined the notation so that a_i = a_{i_1⋯i_N} and ψ_i = ∏_{n=1}^{N} ψ^{(n)}_{i_n}. Now, for a given i ∈ I_j^{(N)}(k), ψ_i is a function only of the j-th subset of k variables; hence, the coefficients a_i represent the interactions within that particular subset of variables. In other words, the subscript k denotes the order of the interaction, j indexes a particular subset of k variables, and i is an index into the coefficients corresponding to that set.
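The bookkeeping behind S_k^{(N)} and I_j^{(N)}(k) is easy to mechanize. The short Python sketch below is an illustration added here (the function names are ours, not from the original text); it enumerates the variable subsets of each order and, for a finite per-variable basis of size M + 1, the multi-indices belonging to each subset:

from itertools import combinations, product

def subsets(N):
    """S_k^(N): all k-element subsets of {1,...,N}, for k = 2,...,N."""
    return {k: list(combinations(range(1, N + 1), k)) for k in range(2, N + 1)}

def index_class(S, N, M):
    """I_j^(N)(k), truncated to a basis of size M+1 per variable:
    multi-indices (i_1,...,i_N) with i_n = 0 off the subset S and i_n > 0 on it."""
    ranges = [range(1, M + 1) if n in S else (0,) for n in range(1, N + 1)]
    return list(product(*ranges))

N, M = 3, 2
for k, subs in subsets(N).items():
    for S in subs:
        print(k, S, index_class(S, N, M))
# for N = 3, M = 2 the pair {1,2} yields (1,1,0), (1,2,0), (2,1,0), (2,2,0), and so on

Each printed tuple corresponds to one coefficient a_i in equation (3), and the index classes attached to different subsets never overlap, which is what makes the decomposition of section 4 possible.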

Note that, in general, the class I_j^{(N)}(k) contains more than one index, so there may be multiple coefficients associated with each subset of variables. Moreover, since the basis elements are orthogonal, the interaction in a given subset is independent of interactions in every other subset. For example, if N = 3, the coefficient a_{110} denotes a pairwise interaction between X_1 and X_2, whereas the coefficient a_{111} corresponds strictly to a third-order interaction between all three variables.

In the special case of N = 2, Lancaster [4] showed that for any distribution for which Λ ∈ L²(P) there exist bases such that the resulting expansion is diagonal; in other words, a_{i_1 i_2} = 0 when i_1 ≠ i_2. Moreover, the diagonal basis is unique, and the functions ψ_i = ψ^{(1)}_i ψ^{(2)}_i and the coefficients a_i are exactly equal to the canonical variables and correlations of the well-known canonical correlation analysis. In adapting the expansion to multivariate (N > 2) distributions, it is unclear whether there is an analogous property.

As an analytic tool, the SL expansion can be used in one of two ways. First, given a set of univariate marginal distributions, we can choose a corresponding set of bases and construct arbitrary joint distributions with a given dependence structure. When doing this, however, we must take care to ensure that the result is in fact a valid distribution. In general, the set of coefficients a_i that produces a valid distribution does not span the entire real line; for example, certain sets of coefficients may result in the distribution being negative. It is usually difficult to know a priori whether a given set of coefficients will produce a valid distribution, so a certain amount of experimentation may be necessary to obtain a valid distribution this way. The second way in which the SL expansion can be used is to evaluate the dependence structure inherent in a given distribution. Given a joint distribution for which Λ ∈ L²(P), we can choose a convenient set of bases for the marginal spaces and determine the corresponding dependence coefficients. Exploiting the dependence structure of the SL expansion, we can then tell what sort of interactions are induced by the given distribution. As we will show in section 4, this information is independent of the choice of basis; as a result, performing this type of analysis does not require explicit calculation of the SL expansion, but rather can be carried out by decomposing a non-parametric dependence measure known as φ². Before introducing this dependence measure, however, we discuss a few useful families of distributions that can be characterized by the SL expansion.

3.1 Example 1: Distributions with Gaussian marginals.

An obvious choice of basis for the Gaussian distribution is the collection of Hermite polynomials, which are orthogonal with respect to the weighting function e^{−x²}. The n-th Hermite polynomial is defined as [15]

\[
H_n(x) = (-1)^n e^{x^2} \frac{d^n}{dx^n} e^{-x^2}.
\]

Letting X = (X_1, …, X_N) be a collection of Gaussian random variables with means μ_1, …, μ_N and variances σ_1², …, σ_N², the SL expansion for the joint probability density function p_X(x) is

\[
p_X(x) = \left[ \prod_{n=1}^{N} p_{X_n}(x_n) \right]
\sum_{i_1=0}^{\infty} \cdots \sum_{i_N=0}^{\infty} a_{i_1 \cdots i_N}
\prod_{n=1}^{N} \frac{1}{\sqrt{2^{i_n}\, i_n!}}\, H_{i_n}\!\left( \frac{x_n - \mu_n}{\sqrt{2}\, \sigma_n} \right).
\tag{4}
\]

Figure 1: Expanding the joint distribution of Gaussian random variables. Panel (a) shows the contours of the distribution of two standard normal random variables that are jointly Gaussian. Panel (b) shows the contours of a different distribution of two standard normal variables that has off-diagonal terms in its SL expansion. The random variables in this case are not jointly Gaussian.

Note the weighting factors in equation (4), which normalize the Hermite polynomials with respect to the non-standard marginal distributions. In general, distributions taking the form of equation (4) are not jointly Gaussian, although each of the marginal distributions is Gaussian. This is clearly the case when the expansion includes non-linear and non-pairwise terms. Figure 1 shows an example of two bivariate distributions with standard normal marginals having very different dependence structures.

In the special case of two jointly Gaussian variables, Barrett and Lampard [9] used Mehler's expansion to express the expansion coefficients in terms of the correlation coefficient, resulting in the simplified expression

\[
p_X(x) = p_{X_1}(x_1)\, p_{X_2}(x_2)
\left[ 1 + \sum_{i=1}^{\infty} \frac{\rho^i}{2^i\, i!}\,
H_i\!\left( \frac{x_1 - \mu_1}{\sqrt{2}\, \sigma_1} \right)
H_i\!\left( \frac{x_2 - \mu_2}{\sqrt{2}\, \sigma_2} \right) \right].
\]

Slepian [16] further generalized Mehler's formula to N > 2. His method, while complicated, can be used to obtain the expansion coefficients for an arbitrary Gaussian random vector. For example, consider the case N = 3, with σ_1 = σ_2 = σ_3 = 1, and correlation coefficients ρ_12, ρ_13, and ρ_23. The resulting expression is then

\[
p_X(x) = \left[ \prod_{n=1}^{3} p_{X_n}(x_n) \right]
\sum_{i_1=0}^{\infty} \sum_{i_2=0}^{\infty} \sum_{i_3=0}^{\infty}
\frac{\rho_{23}^{\,i_1}\, \rho_{13}^{\,i_2}\, \rho_{12}^{\,i_3}}{2^{\,i_1+i_2+i_3}\, i_1!\, i_2!\, i_3!}\,
H_{i_2+i_3}(z_1)\, H_{i_1+i_3}(z_2)\, H_{i_1+i_2}(z_3),
\]

where z_n = (x_n − μ_n)/(√2 σ_n).
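To make this example concrete, the following sketch (our illustration, assuming standard normal marginals; numpy's physicists' Hermite routines supply H_n) evaluates the truncated Mehler form of the SL expansion and compares it with the exact jointly Gaussian density:

from math import exp, factorial, pi, sqrt
import numpy as np
from numpy.polynomial.hermite import hermval

def H(n, x):
    """Physicists' Hermite polynomial H_n evaluated at x."""
    c = np.zeros(n + 1)
    c[n] = 1.0
    return hermval(x, c)

def mehler_density(x1, x2, rho, terms=30):
    """Truncated Mehler/SL expansion for two standard normal marginals."""
    phi = lambda x: exp(-x * x / 2) / sqrt(2 * pi)      # N(0,1) pdf
    s = 1.0
    for i in range(1, terms):
        s += (rho ** i) / (2 ** i * factorial(i)) * H(i, x1 / sqrt(2)) * H(i, x2 / sqrt(2))
    return phi(x1) * phi(x2) * s

def exact_bivariate_normal(x1, x2, rho):
    z = (x1 ** 2 - 2 * rho * x1 * x2 + x2 ** 2) / (1 - rho ** 2)
    return exp(-z / 2) / (2 * pi * sqrt(1 - rho ** 2))

print(mehler_density(0.3, -1.1, 0.5), exact_bivariate_normal(0.3, -1.1, 0.5))
# the truncated series matches the jointly Gaussian density to several decimal places

Placing weight on off-diagonal terms instead (for instance on H_1(·)H_2(·)) produces densities like the one in panel (b) of figure 1: still standard normal marginals, but no longer jointly Gaussian.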

3.2 Example 2: Distributions with uniform marginals.

Another interesting example is the case of random variables that are uniformly distributed on the interval [0, 1]. In this case, the Haar wavelet basis is a natural choice. The Haar functions are defined by W_{ij}(x) = W(2^i x − j), with i ∈ ℕ and 0 ≤ j < 2^i, where

\[
W(x) =
\begin{cases}
\;\;\,1 & 0 \le x \le \tfrac{1}{2}, \\
-1 & \tfrac{1}{2} < x \le 1, \\
\;\;\,0 & \text{otherwise}.
\end{cases}
\]

The Haar functions together with the constant function ψ_0 = 1 form a complete basis for the space L²[0, 1] [17]. So, given a collection X = (X_1, …, X_N) of random variables, each one uniformly distributed on the unit interval, the joint pdf p_X(x) can be expressed by the SL expansion

\[
p_X(x) = 1 + \sum_{i_1, j_1} \cdots \sum_{i_N, j_N} a_{i_1 j_1 \cdots i_N j_N} \prod_{n=1}^{N} W_{i_n j_n}(x_n).
\]

For example, let N = 3, and consider the family of densities

\[
p(x_1, x_2, x_3) = 1 + a\, W(x_1) W(x_2) + b\, W(x_1) W(x_2) W(x_3).
\]

Figure 2: Expanding the joint distribution of uniform random variables. The first two panels depict basis functions for the SL expansion of three uniform random variables; panel (a) is a second-order function of X_1 and X_2, and panel (b) is a third-order function of all three variables. The last two panels depict two different joint densities; panel (c) shows a density having only third-order dependencies, while the density in panel (d) has both second- and third-order dependencies.

The two basis functions for this distribution are depicted in figure 2. When a = 0 and b ≠ 0, there is only third-order dependence between the variables. Panel (c) shows the corresponding joint density when b = 0.5. Though the density is clearly highly structured, the correlation between each pair of variables is zero. If a ≠ 0, there is pairwise dependence between X_1 and X_2, yielding a distribution like the one in panel (d). Here, a = 0.5 and b = 0.5, resulting in a non-zero correlation ρ_12 = 3a/4 = 3/8.
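A quick numerical check of this example (our sketch; grid evaluation is just one convenient way to approximate the integrals) confirms that the b-term alone leaves every pairwise correlation at zero while the a-term introduces correlation between X_1 and X_2:

import numpy as np

def W(x):
    # mother Haar function on [0, 1]: +1 on [0, 1/2], -1 on (1/2, 1]
    return np.where(x <= 0.5, 1.0, -1.0)

a, b, m = 0.0, 0.5, 200                      # only the third-order term is active
g = (np.arange(m) + 0.5) / m                 # midpoint grid on [0, 1]
X1, X2, X3 = np.meshgrid(g, g, g, indexing="ij")
p = 1 + a * W(X1) * W(X2) + b * W(X1) * W(X2) * W(X3)
w = p / m**3                                 # cell probabilities (they sum to 1)

rho12 = np.sum(w * (X1 - 0.5) * (X2 - 0.5)) / (1 / 12)   # pairwise correlation of X1, X2
triple = np.sum(w * W(X1) * W(X2) * W(X3))                # third-order SL coefficient
print(rho12, triple)   # ~0 and ~0.5: no pairwise correlation, yet a strong triple interaction
# setting a = 0.5 instead gives rho12 = 3a/4 = 0.375 while the triple term is unchanged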

3.3 Example 3: Distributions on the integers.

The Bahadur representation [10] is a well-known representation of the joint distribution of Bernoulli random variables. Letting X = (X_1, …, X_N) be a collection of random variables with P[X_n = 1] = p_n and P[X_n = 0] = 1 − p_n, and letting p_{[1]}(x) = ∏_{n=1}^{N} p_n^{x_n} (1 − p_n)^{1−x_n} be the product distribution, Bahadur showed that the joint distribution can always be expressed as

\[
p_X(x) = p_{[1]}(x) \left[ 1 + \sum_{0 < i < j \le N} r_{ij}\, z_i z_j + \sum_{0 < i < j < k \le N} r_{ijk}\, z_i z_j z_k + \cdots + r_{12 \cdots N}\, z_1 z_2 \cdots z_N \right],
\tag{5}
\]

where z_n = (x_n − p_n)/√(p_n(1 − p_n)), and r_{ij} = E[z_i z_j], r_{ijk} = E[z_i z_j z_k], etc., with the expectations taken with respect to the joint distribution p_X(x). It is easy to see that the Bahadur representation is a special case of the SL expansion, using the sets {1, z_n} as bases for the marginal spaces ℓ²(p_{X_n}).

We can extend the Bahadur representation to distributions on the integers by enlarging the span of the marginal bases. Again, consider a collection of random variables X = (X_1, …, X_N), and for each n set p_n(x) = P[X_n = x]. For clarity we assume that 0 ≤ X_n ≤ M almost surely; the extension to the negative integers is straightforward. Now, we can find a complete orthonormal basis for each space ℓ²(p_{X_n}) by applying the Gram-Schmidt procedure [18] to the polynomials {1, x, x², …, x^M}. Letting ψ_i^{(n)}(x) denote the resulting i-th basis function, we obtain the expansion

\[
p_X(x) = \left[ \prod_{n=1}^{N} p_{X_n}(x_n) \right]
\sum_{i_1=0}^{M} \cdots \sum_{i_N=0}^{M} a_{i_1 \cdots i_N} \prod_{n=1}^{N} \psi_{i_n}^{(n)}(x_n).
\]

The functions ψ_1^{(n)}(x) = (x − E[X_n])/σ_n take the same form as Bahadur's functions in equation (5). Thus, the Bahadur expansion is simply the SL expansion for integer-valued random variables using orthonormal polynomials, for the special case M = 1. Later, in section 4.3, we show how this construction is useful for analyzing neural population recordings.
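As a small illustration of this construction (our sketch, not code from the original text), the following routine builds the orthonormal polynomial basis for one marginal pmf on {0, …, M} by Gram-Schmidt, using the pmf-weighted inner product ⟨f, g⟩ = Σ_x f(x) g(x) p(x):

import numpy as np

def orthonormal_poly_basis(pmf):
    """Gram-Schmidt on {1, x, ..., x^M} under <f,g> = sum_x f(x) g(x) pmf(x).
    Returns an (M+1) x (M+1) array whose i-th row is psi_i evaluated at x = 0..M."""
    M = len(pmf) - 1
    x = np.arange(M + 1, dtype=float)
    monomials = np.vstack([x ** i for i in range(M + 1)])   # rows: 1, x, ..., x^M
    ip = lambda f, g: np.sum(f * g * pmf)                   # pmf-weighted inner product
    basis = []
    for v in monomials:
        for u in basis:
            v = v - ip(v, u) * u                            # remove components already spanned
        basis.append(v / np.sqrt(ip(v, v)))
    return np.vstack(basis)

pmf = np.array([0.5, 0.3, 0.2])                   # a toy pmf on {0, 1, 2}
psi = orthonormal_poly_basis(pmf)
print(np.round(psi @ np.diag(pmf) @ psi.T, 12))   # identity: the psi_i are orthonormal
# psi[0] is the constant 1 and psi[1] is (x - E[X]) / sigma, matching Bahadur's z

Tensor products of these per-variable functions give the basis elements ψ_i of equation (3), and the dependence coefficients a_i are then expectations of those products under the joint distribution.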

We have seen how the SL expansion provides a flexible model with a useful and intuitive multivariate dependence structure. This type of model is extremely useful for many applications, particularly those in which the variables have a complicated high-order dependence structure that must be preserved. However, it is often infeasible or undesirable to compute the expansion or to estimate it from data. For example, while we can easily construct an SL expansion for a collection of discrete random variables, estimating the combinatorial number of parameters might require a prohibitively large quantity of data. In that case, we would prefer a non-parametric measure of dependence that can be estimated more reliably from fewer data. The case of continuous random variables is even more problematic, since the number of non-zero parameters may be infinite, and computing the bases may be analytically intractable. In the next section, we describe a summary measure of dependence derived from the SL expansion that can be used to characterize the collection of variables when direct estimation of the SL parameters is infeasible.

4 The Phi-Squared Dependence Measure

Pearson defined φ², his coefficient of mean-square contingency, as a generalization of the χ² statistic to test association in a multi-dimensional contingency table [8]. Letting X = (X_1, …, X_N) be a random vector with joint probability density function (or mass function) p_X(x) and marginal densities p_{X_n}(x_n), the classical definition is

\[
\varphi^2 = \int \frac{p_X^2(x)}{\prod_{n=1}^{N} p_{X_n}(x_n)}\, dx \;-\; 1.
\]

Rearranging this formula, we obtain

\[
\varphi^2 = E\!\left[ \left( \frac{p_X(x)}{\prod_{n=1}^{N} p_{X_n}(x_n)} \right)^{\!2}\, \right] - 1,
\tag{6}
\]

where the expectation is with respect to the product density. Thus, φ² is the variance of the likelihood ratio Λ, as we defined it in section 3. It is also a member of a general class of dependence measures that can be defined as Ali-Silvey distances [19] between the joint distribution and the product distribution. Besides φ², another notable member of this class is mutual information, which has gained use in recent years as a dependence measure [20–23]. Dependence measures in this class have a number of important properties, which are discussed at length in [24].

4.1 Components of Phi-Squared

One of the most important properties of the Ali-Silvey dependence measures is that if X and Y are jointly Gaussian random vectors then every Ali-Silvey dependence measure between them is a non-decreasing function of each of the canonical correlations [24]. Letting ρ_1, …, ρ_M be the canonical correlations between X and Y, we obtain φ² = ∏_{m=1}^{M} (1 − ρ_m²)^{-1} − 1, which is clearly non-decreasing in each ρ_m. However, we get a more general result if we consider the SL expansion of p_X(x); substituting equation (2) into equation (6) we obtain

\[
\varphi^2 = \sum_{i} a_i^2,
\]

which is a consequence of Parseval's theorem (the sum runs over the coefficients of order k ≥ 2). Thus, φ² is an increasing function of each of the dependence parameters in the SL expansion. So, for example, recalling the bivariate example of section 3.1 where X and Y were jointly Gaussian with correlation ρ, we get φ² = Σ_{i=1}^{∞} ρ^{2i}, a geometric sum having the well-known solution (1 − ρ²)^{-1} − 1 as noted earlier.

In addition, by exploiting the inherent structure in the SL parameters we obtain an explicit decomposition of φ². Recalling equation (3), we let I_j^{(N)}(k) be the class of coefficients that correspond to interactions in the j-th subset of k variables. Now, letting

\[
\varphi^2_{j(k)} = \sum_{i \in I_j^{(N)}(k)} a_i^2,
\]

we can rewrite the total dependence φ² as the sum

\[
\varphi^2 = \sum_{k=2}^{N} \sum_{j=1}^{\binom{N}{k}} \varphi^2_{j(k)}.
\]

For clarity, we re-index the components to indicate directly which subset of variables they represent:

\[
\varphi^2 = \sum_{i<j} \varphi^2_{ij} + \sum_{i<j<k} \varphi^2_{ijk} + \cdots + \varphi^2_{12 \cdots N}.
\]

For example, φ²_{12} is the component of φ² due entirely to pairwise interactions between the variables X_1 and X_2, and φ²_{123} is the component corresponding to third-order interactions between X_1, X_2, and X_3.
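For a finite discrete joint pmf, equation (6) and its components can be computed directly. The sketch below (our illustration, written for a joint pmf stored as a numpy array) computes the total φ² and the φ² of the (X_1, X_2) pair, which, as shown in the following argument, equals the component φ²_{12}:

import numpy as np

def phi2(p, axes=None):
    """phi^2 = sum p^2 / prod(marginals) - 1 over the support of p (eq. 6).
    `axes` selects a subset of variables; None means all of them."""
    if axes is not None:
        drop = tuple(ax for ax in range(p.ndim) if ax not in axes)
        p = p.sum(axis=drop)
    prod = np.ones_like(p)
    for ax in range(p.ndim):
        shape = [1] * p.ndim
        shape[ax] = p.shape[ax]
        prod = prod * p.sum(axis=tuple(a for a in range(p.ndim) if a != ax)).reshape(shape)
    mask = p > 0
    return np.sum(p[mask] ** 2 / prod[mask]) - 1

# toy example: three binary variables with a purely third-order interaction (X3 = X1 xor X2)
p = np.zeros((2, 2, 2))
for x1 in range(2):
    for x2 in range(2):
        p[x1, x2, x1 ^ x2] = 0.25

print(phi2(p))                # total dependence: 1.0
print(phi2(p, axes=(0, 1)))   # the pair (X1, X2) looks independent: 0.0

Every pairwise component vanishes here, yet the total φ² equals 1; the entire dependence is third order, exactly the situation the decomposition is designed to expose.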

Although we used the SL expansion explicitly to obtain the decomposition of φ², it is important to note that the decomposition does not depend on the basis used for the expansion. This fact is easiest to see in the simple case of N = 3. There are three second-order subsets of variables and one third-order subset to consider. Expanding equation (3) and choosing any complete orthonormal basis, we have the SL expansion

\[
p_X(x) = \left[ \prod_{n=1}^{3} p_{X_n}(x_n) \right]
\left[ 1 + \sum_{i \in I_1^{(3)}(2)} a_i \psi_i + \sum_{i \in I_2^{(3)}(2)} a_i \psi_i + \sum_{i \in I_3^{(3)}(2)} a_i \psi_i + \sum_{i \in I_1^{(3)}(3)} a_i \psi_i \right],
\tag{7}
\]

and the corresponding decomposition φ² = φ²_{12} + φ²_{13} + φ²_{23} + φ²_{123}. Now, suppose that I_1^{(3)}(2) denotes the set of indices corresponding to pairwise interactions between X_1 and X_2. Then, integrating equation (7) with respect to x_3, we obtain the marginal density

\[
p(x_1, x_2) = p_{X_1}(x_1)\, p_{X_2}(x_2) \left[ 1 + \sum_{i \in I_1^{(3)}(2)} a_i \psi_i \right].
\]

Letting φ̃² be the phi-squared dependence in the pair (X_1, X_2), we get

\[
\tilde{\varphi}^2 = \int \frac{p^2(x_1, x_2)}{p_{X_1}(x_1)\, p_{X_2}(x_2)}\, dx_1\, dx_2 \;-\; 1 \;=\; \sum_{i \in I_1^{(3)}(2)} a_i^2 \;=\; \varphi^2_{12},
\]

which is independent of the bases ψ_i. Consequently, the φ² decomposition is uniquely specified by the joint distribution, and does not depend on any particular choice of basis. Moreover, the φ² decomposition can always be computed without explicitly computing an SL expansion, by a process of onion peeling. First, we compute all second-order components by computing every pairwise marginal distribution and applying equation (6). Then, we compute the third-order components by computing each third-order marginal distribution, applying equation (6) and subtracting the appropriate second-order components. Proceeding in this way, we obtain the complete decomposition of φ² without calculating the SL expansion explicitly.

As Ali and Silvey noted, which dependence measure one chooses to use is largely arbitrary, since all measures in the Ali-Silvey class possess the same essential properties [24]. We have shown, however, that φ² possesses the additional property that it can be decomposed into separate contributions from each subset of variables. In section 4.3 we will illustrate the usefulness of this property in data analysis. First, however, it is worth discussing a second dependence measure possessing a similar, though less-detailed, decomposition.
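Before moving to that second measure, here is how the onion-peeling computation of the preceding paragraphs looks in code. This sketch is ours and reuses the phi2 helper and the XOR array p from the earlier illustration:

from itertools import combinations

def phi2_decomposition(p):
    """Onion peeling for a 3-variable joint pmf `p` (a 3-d numpy array):
    pairwise components come straight from pairwise marginals; the triple
    component is what remains after subtracting them from the full phi^2."""
    comp = {}
    for pair in combinations(range(3), 2):
        comp[pair] = phi2(p, axes=pair)                     # second-order components
    comp[(0, 1, 2)] = phi2(p) - sum(comp[pair] for pair in combinations(range(3), 2))
    return comp

# XOR example from above: all pairwise components are 0, the triple term carries everything
print(phi2_decomposition(p))   # {(0, 1): 0.0, (0, 2): 0.0, (1, 2): 0.0, (0, 1, 2): 1.0}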

4.2 Kullback-Leibler (KL) Dependence

Perhaps the best known dependence measure in the Ali-Silvey class is the mutual information between two random variables, which is widely used in information theory and statistics. Here we generalize it to multiple random variables. Let X = (X_1, …, X_N) be a random vector with joint probability density function p_X(x) and marginal densities p_{X_n}(x_n), and define the likelihood ratio Λ = p_X(x)/∏_{n=1}^{N} p_{X_n}(x_n). The Kullback-Leibler (KL) dependence ν is the KL divergence between the joint probability function and the product distribution,

\[
\nu = E\!\left[ \Lambda \log \Lambda \right] = \int_{x} p_X(x) \log \frac{p_X(x)}{\prod_{n=1}^{N} p_{X_n}(x_n)}\, dx.
\]

Unlike φ², no simple expression exists for ν as a function of the SL parameters. However, for a certain class of distributions a similar decomposition of the dependence measure exists. Amari [25] describes a decomposition of the KL dependence measure for N variables that can be described by a log-linear model:

\[
\nu = \sum_{n=2}^{N} \nu_n.
\]

Here, each component ν_n represents interactions strictly of the n-th order. For example, ν_2 summarizes all pairwise dependencies, and ν_3 summarizes all third-order dependencies. Finding the decomposition is extremely computationally intensive, even for small N [20]. Moreover, the decomposition is less detailed than the decomposition of φ², since it only separates interactions of each order but does not distinguish between different sets of variables. Finally, the ν decomposition is only available when the variables can be described by a log-linear model, which heavily restricts the class of distributions to which it applies. Consequently, while the total KL dependence measure is widely used in many fields, we prefer φ² for its detail, flexibility, and the computational efficiency of its decomposition. We illustrate the use of φ² in data analysis by analyzing a neural population recording.

4.3 Example: Neural Populations.

An important topic in computational neuroscience is the study of population codes, the mechanism through which sensory information is encoded in the coordinated action of multiple neurons. As a first step toward understanding this mechanism, uncovering the statistical dependencies between neural responses is vital. Neurons encode sensory information in sequences of identical electrical spikes, differing only in their timing. Consequently, to analyze a neural recording, we divide the response time into discrete time bins and count the number of spikes that occurred in each bin. The neural response in each bin can then be viewed as a random variable distributed on the non-negative integers. Thus, the probability law described in section 3.3 provides a natural characterization of the neural response.

For the dependence analysis, we compute a normalized φ² dependence measure. All Ali-Silvey dependence measures achieve their maximum value when the random variables are completely mutually dependent. In general, the maximum value is infinity, so no normalization exists. However, when we are dealing with a finite discrete alphabet, we can normalize φ² so that the normalized measure equals 1 when the variables are completely mutually dependent. Using Bayes' theorem [26], for any n we can write the joint distribution as p_X(x) = p_{X_n}(x_n) p(x_1, …, x_{n−1}, x_{n+1}, …, x_N | x_n). Since the conditional probabilities are all less than or equal to 1, the joint distribution is upper bounded by min_n p_{X_n}(x_n). Hence,

\[
\varphi^2 \;\le\; \sum_{x:\, p_X(x) > 0} \frac{\left[ \min_n p_{X_n}(x_n) \right]^2}{\prod_{n=1}^{N} p_{X_n}(x_n)} \;-\; 1 \;\equiv\; \varphi^2_{\max}.
\]

The normalized measure is then φ̃² = φ²/φ²_max.
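Continuing the discrete sketches above (again our illustration, reusing the phi2 helper and the XOR array p defined earlier), the normalization is a short computation once the marginals are in hand:

import numpy as np

def phi2_normalized(p):
    """phi^2 / phi^2_max for a discrete joint pmf `p` (any number of axes)."""
    marginals = [p.sum(axis=tuple(a for a in range(p.ndim) if a != ax)) for ax in range(p.ndim)]
    prod = np.ones_like(p)
    mins = np.full_like(p, np.inf)
    for ax, m in enumerate(marginals):
        shape = [1] * p.ndim
        shape[ax] = p.shape[ax]
        prod = prod * m.reshape(shape)
        mins = np.minimum(mins, m.reshape(shape))   # min_n p_{X_n}(x_n) at each cell
    support = p > 0
    phi2_max = np.sum(mins[support] ** 2 / prod[support]) - 1
    return phi2(p) / phi2_max

q = np.zeros((2, 2, 2))
q[0, 0, 0] = q[1, 1, 1] = 0.5       # X1 = X2 = X3: complete mutual dependence
print(phi2_normalized(q))           # 1.0
print(phi2_normalized(p))           # the XOR example from above falls well short of 1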

Complete mutual dependence occurs when the random variables are one-to-one functions of each other. In that case, p_X(x) equals p_{X_n}(x_n) at each point of its support and equals 0 everywhere else; hence φ̃² = 1 if the variables in X are completely mutually dependent. This normalization should be used with caution, however. Since complete mutual dependence can only be achieved under specific constraints on the marginal distributions, the upper bound is not always achievable for an arbitrary set of distributions. Thus, although it is always true that φ̃² ≤ 1, small values could indicate relatively strong dependencies for a given set of marginal distributions, whereas large values could correspond to weak dependencies for another set of marginal distributions.

Figure 3: Dependence analysis of three spiking neurons in the crayfish optic nerve. The top left plot shows the mean firing rate (spikes/s) for each neuron over repeated presentations of the stimulus. The remaining plots show the normalized φ² dependence in each subgroup of neurons as a function of time (seconds). 90% confidence intervals were computed using the bootstrap method and are indicated by dotted lines. Note the significant levels of 2nd- and 3rd-order dependence throughout the experiment.

Figure 3 shows the results of an experiment on the crayfish optic nerve¹. Micro-electrodes inserted into a crayfish brain recorded from three neurons responding to a visual light stimulus (in this case, a triangle-wave light grating moving at constant spatial frequency). We estimated the dependence measure φ̃² in each time bin using histogram estimates for the response distributions, and computed confidence intervals using the bootstrap method [28]. The neurons exhibited statistically significant dependencies throughout the stimulus presentation. It is particularly interesting that the third-order dependence has the same order of magnitude as the pairwise dependencies; consequently, a significant portion of the total dependence would not be revealed by cross-correlation analysis.

¹ The authors are grateful to Dr. R. M. Glantz, Professor of Biochemistry and Cell Biology, Rice University, for providing the data analyzed here. For more experimental details, see [27].

5 Conclusion

The Sarmanov-Lancaster expansion provides an intuitive characterization of the dependence structure in any number of random variables, and applies to a large class of distributions. The SL expansion has utility as a constructive model; by selecting a convenient basis we can compute families of distributions with a given set of marginals and a particular dependence structure. We can also use it to analyze an existing distribution, projecting the distribution onto an orthogonal basis to reveal its inherent dependence structure. As an added bonus, the dependencies revealed by the SL expansion are completely captured by the φ² dependence measure. This dependence measure, which has the same basic properties as other, more commonly used dependence measures (such as mutual information), has the additional property that it can be decomposed into the separate contributions of each subset of variables to the overall dependence. The decomposition of φ² summarizes the essential elements of the dependence structure, revealing exactly which variables are interacting, and on what level. Moreover, the decomposition is independent of the choice of SL basis, making it easy to compute and universally applicable.

The uses and limitations of pairwise linear dependence models are well understood in statistics, but nevertheless they are often used inappropriately when significant higher-order and non-linear dependencies exist [29]. For example, as we saw in section 4.3, neural populations exhibit complicated dependencies that may be the key to understanding how they encode information, and which would be missed by traditional correlation analysis. High-order dependencies exist in other applications as well; in multi-modal data fusion, for example, different kinds of signals produced by the same source (e.g., audio and video) often exhibit dependencies that are not well modeled by pairwise linear correlation [30, 31]. Document retrieval systems have similarly complicated dependence structures, and researchers have been able to improve the performance of such systems using higher-order dependence models to describe the data [32, 33]. The SL expansion and the φ² dependence measure can detail the statistical dependence structure of an arbitrary number of random variables, and thus should prove useful in a variety of applications.

References

[1] M. Bezzi, M. E. Diamond, and A. Treves, "Redundancy and synergy arising from pairwise correlations in neuronal ensembles," J. Computational Neuroscience, vol. 12, no. 3, May–June 2002.

[2] O. V. Sarmanov, "Maximum correlation coefficient (nonsymmetric case)," in Selected Translations in Mathematical Statistics and Probability, vol. 2, Amer. Math. Soc., 1962.

[3] O. V. Sarmanov, "Maximum correlation coefficient (symmetric case)," in Selected Translations in Mathematical Statistics and Probability, vol. 4, Amer. Math. Soc., 1963.

[4] H. O. Lancaster, "The structure of bivariate distributions," Ann. Math. Statistics, vol. 29, no. 3, pp. 719–736, September 1958.

[5] H. O. Lancaster, "Correlation and complete dependence of random variables," Ann. Math. Statistics, vol. 34, no. 4, December 1963.

[6] H. O. Lancaster, "Correlations and canonical forms of bivariate distributions," Ann. Math. Statistics, vol. 34, no. 2, June 1963.

[7] H. Joe, Multivariate Models and Dependence Concepts, Chapman & Hall, 1997.

[8] L. A. Goodman and W. H. Kruskal, "Measures of association for cross classifications," J. Amer. Stat. Assoc., vol. 49, no. 268, pp. 732–764, 1954.

[9] J. F. Barrett and D. G. Lampard, "An expansion for some second-order probability distributions and its application to noise problems," IRE Transactions on Information Theory, vol. 1, pp. 10–15, 1955.

[10] R. R. Bahadur, "A representation of the joint distribution of responses to n dichotomous items," in Studies in Item Analysis and Prediction, H. Solomon, Ed., Stanford University Press, 1961.

[11] H. O. Hirschfeld, "A connection between correlation and contingency," Proc. Cambridge Philos. Soc., vol. 31, pp. 520–524, 1935.

[12] H. Hotelling, "Relations between two sets of variates," Biometrika, vol. 28, pp. 321–377, 1936.

[13] N. Young, An Introduction to Hilbert Space, Cambridge University Press, 1988.

[14] M. Reed and B. Simon, Methods of Modern Mathematical Physics: Functional Analysis, vol. 1, Academic Press, New York, NY, 1980.

[15] A. M. Krall, Hilbert Space, Boundary Value Problems, and Orthogonal Polynomials, vol. 133 of Operator Theory: Advances and Applications, Birkhäuser, 2002.

[16] D. Slepian, "On the symmetrized Kronecker power of a matrix and extensions of Mehler's formula for Hermite polynomials," SIAM J. Math. Anal., vol. 3, no. 4, pp. 606–616, 1972.

[17] G. Strang, "Wavelet transforms versus Fourier transforms," Bulletin of the American Mathematical Society, vol. 28, no. 2, pp. 288–305, 1993.

[18] G. Strang, Introduction to Linear Algebra, Wellesley-Cambridge Press.

[19] S. M. Ali and S. D. Silvey, "A general class of coefficients of divergence of one distribution from another," J. Royal Stat. Soc. Series B, vol. 28, no. 1, pp. 131–142, 1966.

[20] I. N. Goodman and D. H. Johnson, "Orthogonal decompositions of multivariate statistical dependence measures," in Proc. 2004 International Conference on Acoustics, Speech, and Signal Processing, May 2004.

[21] H. Joe, "Relative entropy measures of multivariate dependence," Journal of the American Statistical Association, vol. 84, no. 405, pp. 157–164, March 1989.

[22] C. B. Bell, "Mutual information and maximal correlation as measures of dependence," Ann. Math. Stat., vol. 33, no. 2, pp. 587–595, June 1962.

[23] G. Pola, A. Thiele, K.-P. Hoffmann, and S. Panzeri, "An exact method to quantify the information transmitted by different mechanisms of correlational coding," Network: Comput. Neural Syst., vol. 14, pp. 35–60, 2003.

[24] S. M. Ali and S. D. Silvey, "Association between random variables and the dispersion of a Radon-Nikodym derivative," J. Royal Stat. Soc. Series B, vol. 27, no. 1, pp. 100–107, 1965.

[25] S. Amari, "Information geometry on hierarchy of probability distributions," IEEE Trans. Info. Theory, vol. 47, no. 5, pp. 1701–1711, July 2001.

[26] H. Stark and J. W. Woods, Probability, Random Processes, and Estimation Theory for Engineers, Prentice Hall, 2nd edition.

[27] C. S. Miller, D. H. Johnson, J. P. Schroeter, L. L. Myint, and R. M. Glantz, "Visual signals in an optomotor reflex: Systems and information theoretic analysis," J. Computational Neuroscience, vol. 13, no. 1, pp. 5–21, July 2002.

[28] B. Efron, "Better bootstrap confidence intervals," J. American Statistical Association, vol. 82, no. 397, pp. 171–185, March 1987.

[29] D. Drouet Mari and S. Kotz, Correlation and Dependence, Imperial College Press, 2001.

[30] J. W. Fisher III, T. Darrell, W. T. Freeman, and P. Viola, "Learning joint statistical models for audio-visual fusion and segregation," in Advances in Neural Information Processing Systems, Nov. 2000.

[31] J. Hershey and J. Movellan, "Using audio-visual synchrony to locate sounds," in Advances in Neural Information Processing Systems 12, S. A. Solla, T. K. Leen, and K.-R. Müller, Eds., MIT Press, 2000.

[32] R. M. Losee, "Term dependence: Truncating the Bahadur-Lazarsfeld expansion," Information Processing and Management, vol. 30, no. 2, 1994.

[33] G. Salton, C. Buckley, and C. T. Yu, "An evaluation of term dependence models in information retrieval," in SIGIR '82: Proceedings of the 5th Annual ACM Conference on Research and Development in Information Retrieval, G. Goos and J. Hartmanis, Eds., Springer-Verlag New York, 1982.


Mathematical Tools for Neuroscience (NEU 314) Princeton University, Spring 2016 Jonathan Pillow. Homework 8: Logistic Regression & Information Theory Mathematical Tools for Neuroscience (NEU 34) Princeton University, Spring 206 Jonathan Pillow Homework 8: Logistic Regression & Information Theory Due: Tuesday, April 26, 9:59am Optimization Toolbox One

More information

A DECOMPOSITION THEOREM FOR FRAMES AND THE FEICHTINGER CONJECTURE

A DECOMPOSITION THEOREM FOR FRAMES AND THE FEICHTINGER CONJECTURE PROCEEDINGS OF THE AMERICAN MATHEMATICAL SOCIETY Volume 00, Number 0, Pages 000 000 S 0002-9939(XX)0000-0 A DECOMPOSITION THEOREM FOR FRAMES AND THE FEICHTINGER CONJECTURE PETER G. CASAZZA, GITTA KUTYNIOK,

More information

Discrete Simulation of Power Law Noise

Discrete Simulation of Power Law Noise Discrete Simulation of Power Law Noise Neil Ashby 1,2 1 University of Colorado, Boulder, CO 80309-0390 USA 2 National Institute of Standards and Technology, Boulder, CO 80305 USA ashby@boulder.nist.gov

More information

Beyond Wiener Askey Expansions: Handling Arbitrary PDFs

Beyond Wiener Askey Expansions: Handling Arbitrary PDFs Journal of Scientific Computing, Vol. 27, Nos. 1 3, June 2006 ( 2005) DOI: 10.1007/s10915-005-9038-8 Beyond Wiener Askey Expansions: Handling Arbitrary PDFs Xiaoliang Wan 1 and George Em Karniadakis 1

More information

BAYESIAN DECISION THEORY

BAYESIAN DECISION THEORY Last updated: September 17, 2012 BAYESIAN DECISION THEORY Problems 2 The following problems from the textbook are relevant: 2.1 2.9, 2.11, 2.17 For this week, please at least solve Problem 2.3. We will

More information

A Note on a General Expansion of Functions of Binary Variables

A Note on a General Expansion of Functions of Binary Variables INFORMATION AND CONTROL 19-, 206-211 (1968) A Note on a General Expansion of Functions of Binary Variables TAIC~YASU ITO Stanford University In this note a general expansion of functions of binary variables

More information

Data Mining. Linear & nonlinear classifiers. Hamid Beigy. Sharif University of Technology. Fall 1396

Data Mining. Linear & nonlinear classifiers. Hamid Beigy. Sharif University of Technology. Fall 1396 Data Mining Linear & nonlinear classifiers Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1396 1 / 31 Table of contents 1 Introduction

More information

Bootstrap Approximation of Gibbs Measure for Finite-Range Potential in Image Analysis

Bootstrap Approximation of Gibbs Measure for Finite-Range Potential in Image Analysis Bootstrap Approximation of Gibbs Measure for Finite-Range Potential in Image Analysis Abdeslam EL MOUDDEN Business and Management School Ibn Tofaïl University Kenitra, Morocco Abstract This paper presents

More information

An Improved Cumulant Based Method for Independent Component Analysis

An Improved Cumulant Based Method for Independent Component Analysis An Improved Cumulant Based Method for Independent Component Analysis Tobias Blaschke and Laurenz Wiskott Institute for Theoretical Biology Humboldt University Berlin Invalidenstraße 43 D - 0 5 Berlin Germany

More information

DETECTION theory deals primarily with techniques for

DETECTION theory deals primarily with techniques for ADVANCED SIGNAL PROCESSING SE Optimum Detection of Deterministic and Random Signals Stefan Tertinek Graz University of Technology turtle@sbox.tugraz.at Abstract This paper introduces various methods for

More information

INFORMATION PROCESSING ABILITY OF BINARY DETECTORS AND BLOCK DECODERS. Michael A. Lexa and Don H. Johnson

INFORMATION PROCESSING ABILITY OF BINARY DETECTORS AND BLOCK DECODERS. Michael A. Lexa and Don H. Johnson INFORMATION PROCESSING ABILITY OF BINARY DETECTORS AND BLOCK DECODERS Michael A. Lexa and Don H. Johnson Rice University Department of Electrical and Computer Engineering Houston, TX 775-892 amlexa@rice.edu,

More information

Degenerate Expectation-Maximization Algorithm for Local Dimension Reduction

Degenerate Expectation-Maximization Algorithm for Local Dimension Reduction Degenerate Expectation-Maximization Algorithm for Local Dimension Reduction Xiaodong Lin 1 and Yu Zhu 2 1 Statistical and Applied Mathematical Science Institute, RTP, NC, 27709 USA University of Cincinnati,

More information

Ensembles and incomplete information

Ensembles and incomplete information p. 1/32 Ensembles and incomplete information So far in this course, we have described quantum systems by states that are normalized vectors in a complex Hilbert space. This works so long as (a) the system

More information

Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems

Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems Jeremy S. Conner and Dale E. Seborg Department of Chemical Engineering University of California, Santa Barbara, CA

More information

GAUSSIAN PROCESS TRANSFORMS

GAUSSIAN PROCESS TRANSFORMS GAUSSIAN PROCESS TRANSFORMS Philip A. Chou Ricardo L. de Queiroz Microsoft Research, Redmond, WA, USA pachou@microsoft.com) Computer Science Department, Universidade de Brasilia, Brasilia, Brazil queiroz@ieee.org)

More information

Mean-field equations for higher-order quantum statistical models : an information geometric approach

Mean-field equations for higher-order quantum statistical models : an information geometric approach Mean-field equations for higher-order quantum statistical models : an information geometric approach N Yapage Department of Mathematics University of Ruhuna, Matara Sri Lanka. arxiv:1202.5726v1 [quant-ph]

More information

Regression models for multivariate ordered responses via the Plackett distribution

Regression models for multivariate ordered responses via the Plackett distribution Journal of Multivariate Analysis 99 (2008) 2472 2478 www.elsevier.com/locate/jmva Regression models for multivariate ordered responses via the Plackett distribution A. Forcina a,, V. Dardanoni b a Dipartimento

More information

Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2

Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2 Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, 2010 Jeffreys priors Lecturer: Michael I. Jordan Scribe: Timothy Hunter 1 Priors for the multivariate Gaussian Consider a multivariate

More information

Unsupervised Machine Learning and Data Mining. DS 5230 / DS Fall Lecture 7. Jan-Willem van de Meent

Unsupervised Machine Learning and Data Mining. DS 5230 / DS Fall Lecture 7. Jan-Willem van de Meent Unsupervised Machine Learning and Data Mining DS 5230 / DS 4420 - Fall 2018 Lecture 7 Jan-Willem van de Meent DIMENSIONALITY REDUCTION Borrowing from: Percy Liang (Stanford) Dimensionality Reduction Goal:

More information

3. Probability and Statistics

3. Probability and Statistics FE661 - Statistical Methods for Financial Engineering 3. Probability and Statistics Jitkomut Songsiri definitions, probability measures conditional expectations correlation and covariance some important

More information

ECE 4400:693 - Information Theory

ECE 4400:693 - Information Theory ECE 4400:693 - Information Theory Dr. Nghi Tran Lecture 8: Differential Entropy Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 1 / 43 Outline 1 Review: Entropy of discrete RVs 2 Differential

More information

Maximum variance formulation

Maximum variance formulation 12.1. Principal Component Analysis 561 Figure 12.2 Principal component analysis seeks a space of lower dimensionality, known as the principal subspace and denoted by the magenta line, such that the orthogonal

More information

MULTIPLEXING AND DEMULTIPLEXING FRAME PAIRS

MULTIPLEXING AND DEMULTIPLEXING FRAME PAIRS MULTIPLEXING AND DEMULTIPLEXING FRAME PAIRS AZITA MAYELI AND MOHAMMAD RAZANI Abstract. Based on multiplexing and demultiplexing techniques in telecommunication, we study the cases when a sequence of several

More information

REPRESENTATION THEORY NOTES FOR MATH 4108 SPRING 2012

REPRESENTATION THEORY NOTES FOR MATH 4108 SPRING 2012 REPRESENTATION THEORY NOTES FOR MATH 4108 SPRING 2012 JOSEPHINE YU This note will cover introductory material on representation theory, mostly of finite groups. The main references are the books of Serre

More information

Recent Developments in Numerical Methods for 4d-Var

Recent Developments in Numerical Methods for 4d-Var Recent Developments in Numerical Methods for 4d-Var Mike Fisher Slide 1 Recent Developments Numerical Methods 4d-Var Slide 2 Outline Non-orthogonal wavelets on the sphere: - Motivation: Covariance Modelling

More information

Analytic Geometry. Orthogonal projection. Chapter 4 Matrix decomposition

Analytic Geometry. Orthogonal projection. Chapter 4 Matrix decomposition 1541 3 Analytic Geometry 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 In Chapter 2, we studied vectors, vector spaces and linear mappings at a general but abstract level. In this

More information

The following definition is fundamental.

The following definition is fundamental. 1. Some Basics from Linear Algebra With these notes, I will try and clarify certain topics that I only quickly mention in class. First and foremost, I will assume that you are familiar with many basic

More information

CHAPTER VIII HILBERT SPACES

CHAPTER VIII HILBERT SPACES CHAPTER VIII HILBERT SPACES DEFINITION Let X and Y be two complex vector spaces. A map T : X Y is called a conjugate-linear transformation if it is a reallinear transformation from X into Y, and if T (λx)

More information

Topics in Representation Theory: Fourier Analysis and the Peter Weyl Theorem

Topics in Representation Theory: Fourier Analysis and the Peter Weyl Theorem Topics in Representation Theory: Fourier Analysis and the Peter Weyl Theorem 1 Fourier Analysis, a review We ll begin with a short review of simple facts about Fourier analysis, before going on to interpret

More information

REPRESENTATION THEORY OF S n

REPRESENTATION THEORY OF S n REPRESENTATION THEORY OF S n EVAN JENKINS Abstract. These are notes from three lectures given in MATH 26700, Introduction to Representation Theory of Finite Groups, at the University of Chicago in November

More information

Pramod K. Varshney. EECS Department, Syracuse University This research was sponsored by ARO grant W911NF

Pramod K. Varshney. EECS Department, Syracuse University This research was sponsored by ARO grant W911NF Pramod K. Varshney EECS Department, Syracuse University varshney@syr.edu This research was sponsored by ARO grant W911NF-09-1-0244 2 Overview of Distributed Inference U i s may be 1. Local decisions 2.

More information

TAKING THE CONVOLUTED OUT OF BERNOULLI CONVOLUTIONS: A DISCRETE APPROACH

TAKING THE CONVOLUTED OUT OF BERNOULLI CONVOLUTIONS: A DISCRETE APPROACH TAKING THE CONVOLUTED OUT OF BERNOULLI CONVOLUTIONS: A DISCRETE APPROACH NEIL CALKIN, JULIA DAVIS, MICHELLE DELCOURT, ZEBEDIAH ENGBERG, JOBBY JACOB, AND KEVIN JAMES Abstract. In this paper we consider

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Mutual Information and Optimal Data Coding

Mutual Information and Optimal Data Coding Mutual Information and Optimal Data Coding May 9 th 2012 Jules de Tibeiro Université de Moncton à Shippagan Bernard Colin François Dubeau Hussein Khreibani Université de Sherbooe Abstract Introduction

More information

Invariant HPD credible sets and MAP estimators

Invariant HPD credible sets and MAP estimators Bayesian Analysis (007), Number 4, pp. 681 69 Invariant HPD credible sets and MAP estimators Pierre Druilhet and Jean-Michel Marin Abstract. MAP estimators and HPD credible sets are often criticized in

More information

SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416)

SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416) SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416) D. ARAPURA This is a summary of the essential material covered so far. The final will be cumulative. I ve also included some review problems

More information

Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk

Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk Ann Inst Stat Math (0) 64:359 37 DOI 0.007/s0463-00-036-3 Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk Paul Vos Qiang Wu Received: 3 June 009 / Revised:

More information

PCA & ICA. CE-717: Machine Learning Sharif University of Technology Spring Soleymani

PCA & ICA. CE-717: Machine Learning Sharif University of Technology Spring Soleymani PCA & ICA CE-717: Machine Learning Sharif University of Technology Spring 2015 Soleymani Dimensionality Reduction: Feature Selection vs. Feature Extraction Feature selection Select a subset of a given

More information

Encoding or decoding

Encoding or decoding Encoding or decoding Decoding How well can we learn what the stimulus is by looking at the neural responses? We will discuss two approaches: devise and evaluate explicit algorithms for extracting a stimulus

More information

Statistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation

Statistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation Statistics 62: L p spaces, metrics on spaces of probabilites, and connections to estimation Moulinath Banerjee December 6, 2006 L p spaces and Hilbert spaces We first formally define L p spaces. Consider

More information

Mathematics Department Stanford University Math 61CM/DM Inner products

Mathematics Department Stanford University Math 61CM/DM Inner products Mathematics Department Stanford University Math 61CM/DM Inner products Recall the definition of an inner product space; see Appendix A.8 of the textbook. Definition 1 An inner product space V is a vector

More information

Lecture 3: Central Limit Theorem

Lecture 3: Central Limit Theorem Lecture 3: Central Limit Theorem Scribe: Jacy Bird (Division of Engineering and Applied Sciences, Harvard) February 8, 003 The goal of today s lecture is to investigate the asymptotic behavior of P N (εx)

More information