INVARIANT COORDINATE SELECTION

Size: px
Start display at page:

Download "INVARIANT COORDINATE SELECTION"

Transcription

1 INVARIANT COORDINATE SELECTION By David E. Tyler 1, Frank Critchley, Lutz Dümbgen 2, and Hannu Oja Rutgers University, Open University, University of Berne and University of Tampere SUMMARY A general method for exploring multivariate data by comparing different estimates of multivariate scatter is presented. The method is based upon the eigenvalue-eigenvector decomposition of one scatter matrix relative to another. In particular, it is shown that the eigenvectors can be used to generate an affine invariant coordinate system for the multivariate data. Consequently, we view this method as a method for invariant coordinate selection (ICS). By plotting the data with respect to this new invariant coordinate system, various data structures can be revealed. For example, under certain independent components models, it is shown that the invariant coordinates correspond to the independent components. Another example pertains to mixtures of elliptical distributions. In this case, it is shown that a subset of the invariant coordinates corresponds to Fisher s linear discriminant subspace, even though the class identifications of the data points are unknown. Some illustrative examples are given. 1. Introduction. When sampling from a multivariate normal distribution, the sample mean vector and sample variance-covariance matrix are a sufficient summary of the data set. To protect against non-normality, and in particular against longer tailed distributions and outliers, one can replace the sample mean and covariance matrix with robust estimates of multivariate location and scatter (or pseudo-covariance). A variety of robust estimates of the multivariate location vector and scatter matrix have been proposed. Among them are multivariate M -estimates [19, 29], the minimum volume ellipsoid estimate (MVE) and the minimum covariance determinant estimate (MCD) [38], S-estimates [12, 25], projection based estimates [30, 44], τ-estimates [26], CM -estimates [24] and MM -estimates [43, 45], as well as one-step versions of these estimates [27]. After computing robust estimates of multivariate location and scatter, outliers can often be detected by examining the corresponding robust Mahalanobis distances, see e.g. [39]. Summarizing a multivariate data set via a location and a scatter statistic, and then inspecting the corresponding Mahalanobis distance plot for possible outliers, is appropriate if the bulk of the data arises from a multivariate normal distribution or, more generally, from an elliptically symmetric distribution. However, if the data arises from a distribution which is not symmetric, then different AMS 2000 subject classifications. Primary 62H05, 62G35. Secondary 62-09, 62H25, 62H30. Key words and phrases. affine invariance, cluster analysis, independent components analysis, mixture models, multivariate diagnostics, multivariate scatter, principal components, projection pursuit, robust statistics. 1 Research supported by NSF Grant DMS Research supported by the Swiss National Science Foundation 1

2 2 D.E. TYLER, F. CRITCHLEY, L. DÜMBGEN and H. OJA location statistics are estimating different notions of central tendency. Moreover, if the data arises from a distribution other than an elliptically symmetric distribution, even one which is symmetric, then different scatter statistics are not necessarily estimating the same population quantity, but rather are reflecting different aspects of the underlying distribution. This suggests that comparing different estimates of multivariate scatter may help reveal interesting departures from an elliptically symmetric distribution. Such data structures may not be apparent in a Mahalanobis distance plot. In this paper, we present a general multivariate method based upon the comparison of different estimates of multivariate scatter. This method is based on the eigenvalue-eigenvector decomposition of one scatter matrix relative to another. An important property of this decomposition is that the corresponding eigenvectors generate an affine invariant coordinate system for the multivariate observations, and so we view this method as a method for invariant coordinate selection (ICS). By plotting the data with respect to this new invariant coordinate system, various data structures can be revealed. For example, when the data arises from a mixture of elliptical distributions, the space spanned by a subset of the invariant coordinates gives an estimate of Fisher s linear discriminant subspace, even though the class identifications of the data points are unknown. Another example pertains to certain independent components models. Here the variables obtained using the invariant coordinates correspond to estimates of the independent components. The paper is organized as follows. Section 2 sets up some notation and concepts to be used in the paper. In particular, the general concept of affine equivariant scatter matrices is reviewed in 2.1 and some classes of scatter matrices are briefly reviewed in section 2.2. The idea of comparing two different scatter matrices using the eigenvalue-eigenvector decomposition of one scatter matrix relative to another is discussed in section 3, with the invariance properties of the ICS transformation being given in section 4. Section 5 gives a theoretical study of the ICS transformation under the aforementioned elliptical mixture models (section 5.1), and under independent components models (section 5.2). The results in section 5.1 represent a broad generalization of results given under the heading of generalized principal components analysis (GPCA) by Ruiz-Gazen [41] and Caussinus and Ruiz-Gazen [5, 6]. Readers primarily interested in how ICS works in practice may wish to skip section 5 at a first reading. In section 6, a general discussion on the choice of scatter matrices one may consider when implementing ICS, along with some examples illustrating the utility of the ICS transformation for diagnostic plots, are given. Further discussion, open research questions, and the relationship of ICS to other approaches are given in section 7. All formal proofs are reserved for section 8, an appendix. An R package entitled ICS [34] is freely available for implementing the ICS methods. 2. Scatter Matrices Affine Equivariance. Let F Y denote the distribution function of the multivariate random variable Y R p, and let P p represent the set of all symmetric positive definite matrices of order p. Affine equivariant multivariate location and scatter functionals, say µ(f Y ) R p and V (F Y ) P p

3 INVARIANT COORDINATE SELECTION 3 respectively, are functions of the distribution satisfying the property that for Y = AY + b, with A nonsingular and b R p, (1) µ(f Y ) = Aµ(F Y ) + b and V (F Y ) = AV (F Y )A. Classical examples of affine equivariant location and scatter functionals are the mean vector µ Y = E[Y ] and the variance-covariance matrix Σ Y = E[(Y µ Y )(Y µ Y ) ] respectively, provided they exist. For our purposes, affine equivariance of the scatter matrix can be relaxed slightly to require only affine equivariance of its shape components. A shape component of a scatter matrix V P p refers to any function of V, say S(V ), such that (2) S(V ) = S(λV ) for any λ > 0. Thus, we say that the shape of V (F Y ) is affine equivariant if (3) V (F Y ) AV (F Y )A. For a p-dimensional sample of size n, Y = {y 1,..., y n }, affine equivariant multivariate location and scatter statistics, say µ and V respectively, are defined by applying the above definition to the empirical distribution function. That is, they are statistics satisfying the property that for any nonsingular A and any b R p, (4) y i y i = Ay i + b for i = 1,..., n ( µ, V ) ( µ, V ) = (A µ + b, A V A ). Likewise, the shape of V is said to be affine equivariant if (5) V A V A. The sample mean vector ȳ and sample variance-covariance matrix S n are examples of affine equivariant location and scatter statistics respectively, as are all the estimates cited in the introduction. Typically, in practice, V is normalized so that it is consistent at the multivariate normal model for the variance-covariance matrix. The normalized version is thus given as Ṽ = V /β, where β > 0 is such that V (F Z ) = βi when Z has a standard multivariate normal distribution. For our purposes, it is sufficient to consider only the unnormalized scatter matrix V since our proposed methods depend only on the scatter matrix up to proportionality, i.e. only on the shape of the scatter matrix. Under elliptical symmetry, affine equivariant location and scatter functionals have relatively simple forms. Recall that an elliptically symmetric distribution is defined to be one arising from an affine transformation of a spherically symmetric distribution, i.e. if Z QZ for any p p orthogonal matrix Q, then the distribution of Y = AZ + µ is said to have an elliptically symmetric distribution with center µ R p and shape matrix Γ = AA, see e.g. [2]. If the distribution of Y is also absolutely continuous, then it has a density of the form (6) f(y; µ, Γ, g) = det(γ) 1/2 g{(y µ) Γ 1 (y µ)} for y R p,

4 4 D.E. TYLER, F. CRITCHLEY, L. DÜMBGEN and H. OJA for some non-negative function g and with Γ P p. As defined, the shape parameter Γ of an elliptically symmetric distribution is only well defined up to a scalar multiple, i.e. if Γ satisfies the definition of a shape matrix for a given elliptically symmetric distribution, then λγ also does for any λ > 0. In the absolutely continuous case, if no restrictions are placed on the function g, then the parameter Γ is confounded with g. One could normalize the shape parameter by setting, for example, det(γ) = 1 or trace(γ) = p. Again, this is not necessary for our purposes since only the shape components of Γ, as defined in (2), are of interest in this paper, and these shape components for an elliptically symmetric distribution are well defined. Under elliptical symmetry, any affine equivariant location functional corresponds to the center of symmetry and any affine equivariant scatter functional is proportional to the shape matrix, i.e. µ(f Y ) = µ and V (F Y ) Γ. In particular, µ Y = µ and Σ Y Γ when the first and second moments exist respectively. More generally, if V (F Y ) is any functional satisfying (3), then V (F Y ) Γ. As noted in the introduction, for general distributions, affine equivariant location functionals are not necessarily equal and affine equivariant scatter functionals are not necessarily proportional to each other. The corresponding sample versions of these functionals are therefore estimating different population features. The difference in these functionals reflect in some way how the distribution differs from an elliptically symmetric distribution. Remark 2.1. The class of distributions for which all affine equivariant location functionals are equal and all equivariant scatter functionals are proportional to each other is broader than the class of elliptical distributions. For example, this can be shown to be true for F Y when Y = AZ+µ with the distribution of Z being exchangeable and symmetric in each component. That is, Z DJZ for any permutation matrix J and any diagonal matrix D having diagonal elements ±1. We conjecture that this is the broadest class for which this property holds. This class contains the elliptical symmetric distributions, since these correspond to Z having a spherically symmetric distribution Classes of scatter statistics. Conceptually, the simplest alternatives to the sample mean ȳ and sample covariance matrix S n are the weighted sample means and sample covariance matrices respectively, with the weights dependent on the classical Mahalanobis distances. These are defined by (7) n i=1 µ = u n 1(s o,i )y i n i=1 u 1(s o,i ), and V = i=1 u 2(s o,i )(y i ȳ)(y i ȳ) n i=1 u, 2(s o,i ) where s o,i = (y i ȳ) Sn 1 (y i ȳ), and u 1 (s) and u 2 (s) are some appropriately chosen weight functions. Other simple alternatives to the sample covariance matrix can be obtained by applying only the scatter equation above to the sample of pairwise differences, i.e. to the symmetrized data set (8) Y s = {y i y j i, j = 1,..., n, i j},

5 INVARIANT COORDINATE SELECTION 5 for which the sample mean is zero. Even though the weighted mean and covariance matrix, as well as the symmetrized version of the weighted covariance matrix, may downweight outliers, they have unbounded influence functions and zero breakdown points. A more robust class of multivariate location and scatter statistics is given by the multivariate M -estimates, which can be viewed as adaptively weighted sample means and sample covariance matrices respectively. More specifically, they are defined as solutions to the M -estimating equations n i=1 µ = u n 1(s i )y i n i=1 u 1(s i ), and V = i=1 u 2(s i )(y i µ)(y i µ) (9) n i=1 u, 3(s i ) where s i = (y i µ) V 1 (y i µ), and u 1 (s), u 2 (s) and u 3 (s) are again some appropriately chosen weight functions. We refer the reader to [19] and [29] for the general theory regarding the multivariate M -estimates. The equations given in (9) are implicit equations in ( µ, V ) since the weights depend upon the Mahalanobis distances relative to ( µ, V ), i.e. on d i ( µ, V ) = s i. Nevertheless, relatively simple algorithms exist for computing the multivariate M -estimates. The maximum likelihood estimates of the parameters µ and Γ of an elliptical distribution for a given spread function g in (6) are special cases of M -estimates. From a robustness perspective, an often cited drawback to the multivariate M -estimates is their relatively low breakdown in higher dimension. Specifically, their breakdown point is bounded above by 1/(p + 1). Subsequently, numerous high breakdown point estimates have been proposed, such as the MVE, the MCD, the S-estimates, the projection based estimates, the τ-estimates, the CM -estimates and the MM -estimates, all of which are cited in the introduction. All the high breakdown point estimates are computationally intensive and, except for small data sets, are usually computed using approximate or probabilistic algorithms. The computational complexity of high breakdown point multivariate estimates is especially challenging for extremely large data sets in high dimensions, and this remains an open and active area of research. The definition of the weighted sample means and covariance matrices given by (7) can be readily generalized by using any initial affine equivariant location and scatter statistic, say µ o and V o respectively. That is, (10) n i=1 µ = u n 1(s o,i )y i n i=1 u 1(s o,i ), and V = i=1 u 2(s o,i )(y i µ o )(y i µ o ) n i=1 u, 2(s o,i ) where now s o,i = (y i µ o ) 1 V o (y i µ o ). In the univariate setting such weighted sample means and variances are sometimes referred to as one-step W -estimates [18, 31], and so we refer to their multivariate versions as multivariate one-step W -estimates. Given a location and a scatter statistic, a corresponding one-step W -estimate provides a computationally simple choice for an alternative location and scatter statistic. Any method one uses for obtaining location and scatter statistics for a data set Y can also be applied to its symmetrized version Y s to produce a scatter statistic. For symmetrized data, any affine equivariant location statistic is always zero.

6 6 D.E. TYLER, F. CRITCHLEY, L. DÜMBGEN and H. OJA The functional or population versions of the location and scatter statistics discussed in this section are readily obtained by replacing the empirical distribution of Y with the population distribution function F Y. For the M -estimates and the one-step W -estimates, this simply implies replacing the averages in (9) and (10) respectively with expected values. For symmetrized data, the functional versions are obtained by replacing the empirical distribution of Y s with its almost sure limit F s Y, the distribution function of Y s = Y 1 Y 2, where Y 1 and Y 2 are independent copies of Y. 3. Comparing Scatter Matrices. Comparing positive definite symmetric matrices arises naturally within a variety of multivariate statistical problems. Perhaps the most obvious case is when one wishes to compare the covariance structures of two or more different groups, see e.g. [16]. Other well known cases occur in multivariate analysis of variance, or MANOVA, wherein interest lies in comparing the within group and between group sum of squares and cross-products matrices, and in canonical correlation analysis, wherein interest lies in comparing the covariance matrix of one set of variables with the covariance matrix of its linear predictor based on another set of variables. These methods involve either multiple populations or two different sets of variables. Less attention has been given to the comparison of different estimates of scatter for a single set of variables from a single population. Some work in in this direction, though, can be found in [1, 4, 5, 6, 7, 41], which will be discussed in later sections. Typically, the difference between two positive definite symmetric matrices can be summarized by considering the eigenvalues and eigenvectors of one matrix with respect to the other. More specifically, suppose V 1 P p and V 2 P p. An eigenvalue, say ρ j, and a corresponding eigenvector, say h j, of V 2 relative to V 1 correspond to a nontrivial solution to the matrix equations (11) V 2 h j = ρ j V 1 h j. Equivalently, ρ j and h j are an eigenvalue and corresponding eigenvector respectively of V1 1 V 2. Since most readers are probably more familiar with the eigenvalue-eigenvector theory of symmetric matrices, we note that ρ j also represents an eigenvalue of the symmetric matrix M = V 1/2 1 V 2 V 1/2 1 P, where V 1/2 1 P p denotes the unique positive definite symmetric square root of V 1. Hence, we can choose p ordered eigenvalues, ρ 1 ρ 2... ρ p > 0, and an orthonormal set of eigenvectors q j, j = 1,..., p, such that Mq j = ρ j q j. The relationship between h j and the eigenvectors of M is given by q j V 1/2 1 h j, and so h i V 1h j = 0 for i j. This yields the following simultaneous diagonalization of V 1 and V 2, (12) H V 1 H = D 1 and H V 2 H = D 2 where H = [ h 1... h p ], D 1 and D 2 are diagonal matrices with positive entries and D 1 1 D 2 = = diagonal{ρ 1,..., ρ p }. Without loss of generality, one can take D 1 = I by normalizing h j so that h j V 1h j = 1. Alternatively, one can take D 2 = I. Such a normalization is not necessary for our

7 INVARIANT COORDINATE SELECTION 7 purposes and we simply prefer the general form (12) since it reflects the exchangeability between the roles of V 1 and V 2. Note that the matrix V 1 1 V 2 has the spectral value decomposition (13) V 1 1 V 2 = H H 1. Various useful interpretations of the eigenvalues and eigenvectors in (11) can be given whenever V 1 and V 2 are two different scatter matrices for the same population or sample. We first note that the eigenvalues ρ 1,..., ρ p are the maximal invariants under affine transformation for comparing V 1 and V 2. That is, if we define a function G(V 1, V 2 ) such that G(V 1, V 2 ) = G(AV 1 A, AV 2 A ) for any nonsingular A, then G(V 1, V 2 ) = G(D 1, D 2 ) = G(I, ), with D 1, D 2 and being defined as above. Furthermore is invariant under such transformations. Since scatter matrices tend to only be well defined up to a scalar multiple, it is more natural to be interested in the difference between V 1 and V 2 up to proportionality. In this case, if we consider a function G(V 1, V 2 ) such that G(V 1, V 2 ) = G(λ 1 AV 1 A, λ 2 AV 2 A ) for any nonsingular A and any λ 1 > 0 and λ 2 > 0, then G(V 1, V 2 ) = G(I, / det( ) 1/p ). That is, maximal invariants in this case are (ρ 1,..., ρ p )/( p i=1 ρ i) 1/p or, in other words, we are interested in (ρ 1,..., ρ p ) up to a common scalar multiplier. A more useful interpretation of the eigenvalues arises from the following optimality property, which follows readily from standard eigenvalue-eigenvector theory. For h R p, let (14) κ(h) = h V 2 h/h V 1 h. For V 1 = V 1 (F Y ) and V 2 = V 2 (F Y ), κ(h) represents the square of the ratio of two different measures of scale for the variable h Y. Recall that the classical measure of kurtosis corresponds to the fourth power of the ratio of two scale measures, namely the fourth root of the fourth central moment and the standard deviation. Thus, the value of κ(h) 2 can be viewed as a generalized measure of relative kurtosis. The term relative is used here since the scatter matrices V 1 and V 2 are not necessarily normalized. If both V 1 and V 2 are normalized so that they are both consistent for the variance-covariance matrix under a multivariate normal model, then a deviation of κ(h) from 1 would indicate non-normality. In general, though, the ratio κ(h 1 ) 2 /κ(h 2 ) 2 does not depend upon any particular normalization. The maximal possible value of κ(h) over h R p is ρ 1 with the maximum being achieved in the direction of h 1. Likewise, the minimal possible value of κ(h) is ρ p with the minimum being achieved in the direction of h p. More generally, we have (15) sup{κ(h) h R p, h V 1 h j = 0, j = 1,... m 1} = ρ m, with the supremum being obtained at h m, and (16) inf{κ(h) h R p, h V 1 h j = 0, j = m + 1,... p} = ρ m, with the infimum being obtained at h m. These successive optimality results suggest that plotting the data or distribution using the coordinates Z = H Y may reveal interesting structures. We explore this idea in later sections.

8 8 D.E. TYLER, F. CRITCHLEY, L. DÜMBGEN and H. OJA Remark 3.1. An alternative motivation for the transformation Z = H Y is as follows. Suppose Y is first standardized using a scatter functional V 1 (F ) satisfying (3), i.e. X = V 1 (F Y ) 1/2 Y. If Y is elliptically symmetric about µ Y, then X is spherically symmetric about the center µ X = V 1 (F Y ) 1/2 µ Y. If a second scatter functional is then applied to X, say V 2 (F ) satisfying (3), then V 2 (F X ) I, and hence no projection of X is any more interesting than any other projection of X. However, if Y is not elliptically symmetric, then V 2 (F X ) is not necessarily proportional to I. This suggests a principal components analysis of X based on V 2 (F X ) may reveal some interesting projections. By taking the spectral value decomposition V 2 (F X ) = QDQ, where Q is an orthogonal matrix, and then constructing the principal component variables Q X, one obtains (17) Q X = H Y = Z, with D =, whenever H is normalized so that H V 1 (F Y )H = I. 4. Invariant Coordinate Systems. In this and the following section we study the properties of the transformation Z = H Y in more detail, and in section 6 we give some examples illustrating the utility of the transformation when used in diagnostic plots. For simplicity, unless otherwise stated, we hereafter state any theoretical properties using the functional or population version of scatter matrices. The sample version then follows as a special case based on the empirical distributions. Examples are, of course, given for the sample version. The following condition is assumed throughout and the following notation is used hereafter. Condition 4.1. For Y R p having distribution F Y, let V 1 (F ) and V 2 (F ) be two scatter functionals satisfying (3). Further, suppose both V 1 (F ) and V 2 (F ) are uniquely defined at F Y. Definition 4.1. Let H(F ) = [h 1 (F )... h p (F )] be a matrix of eigenvectors defined as in (11) and (12), with ρ 1 (F )... ρ p (F ) being the corresponding eigenvalues, whenever V 1 and V 2 are taken to be V 1 (F ) and V 2 (F ) respectively. It is well known that principal component variables are invariant under translations and orthogonal transformations of the original variables, but not invariant under other general affine transformations. An important property of the transformation proposed here, i.e. Z = H(F Y ) Y, is that the resulting variables are invariant under any affine transformation. Theorem 4.1. In addition to Condition 4.1, suppose the roots ρ 1 (F Y ),..., ρ p (F Y ) are all distinct. Then for the affine transformation Y = AY + b, with A being nonsingular, (18) ρ j (F Y ) = γρ j (F Y ) for j = 1,..., p for some γ > 0. Moreover, the components of Z = H(F Y ) Y and Z = H(F Y ) Y differ at most by coordinatewise location and scale. That is, for some constants α 1,..., α p and β 1,..., β p, with

9 INVARIANT COORDINATE SELECTION 9 α j 0 for j = 1,..., p, (19) Z j = α j Z j + β j for j = 1,..., p. Due to property (19) we refer to the transformed variables Z = H(F Y ) Y as an invariant coordinate system, and the method for obtaining them as invariant coordinate selection (ICS). Note that if a univariate standardization is applied to the transformed variables, then the standardized versions of Z j and Zj differ only by a factor of ±1. A generalization of the previous theorem, which allows for possible multiple roots, can be stated as follows. Theorem 4.2. Let Y, Y, Z and Z be defined as in Theorem 4.1. In addition to Condition 4.1, suppose the roots ρ 1 (F Y ),..., ρ p (F Y ) consist of m distinct values, say ρ (1) >... > ρ (m), with ρ (k) having multiplicity p k for k = 1,..., m, and hence p p m = p. Then, (18) still holds. Furthermore, suppose we partition Z = (Z(1),..., Z (m) ), where Z (k) R p k. Then, for some nonsingular matrix C k of order p k and some p k -dimensional vector β k, (20) Z (k) = C kz (k) + β k for k = 1,..., m. That is, the space spanned by the components of Z(k) components of Z (k). is the same as the space spanned by the As with any eigenvalue/eigenvector problem, eigenvectors are not well defined. For a distinct root, the eigenvector is well defined up to a scalar multiple. For a multiple root, say with multiplicity p o, the corresponding p o eigenvectors can be chosen to be any linearly independent vectors spanning the corresponding p o dimensional eigenspace. Consequently Z (k) in Theorem 4.2 is not well defined. One could construct some arbitrary rule for defining Z (k) uniquely. However, this is not necessary here since no matter which rule one may use to define Z (k) uniquely, the results of Theorem 4.2 hold. 5. ICS Under Non-elliptical Models. When Y has an elliptically symmetric distribution, all the roots ρ 1 (F Y ),..., ρ p (F Y ) are equal, and so the ICS transformation Z = H(F Y ) Y is arbitrary. The aim of ICS though is to detect departures of Y from an elliptically symmetric distribution. In this section, the behavior of the ICS transformation is demonstrated theoretically for two classes of non-elliptically symmetric models, namely for mixtures of elliptical distributions and for independent components models Mixture of elliptical distributions. In practice, data often appear to arise from mixture distributions, with the mixing being the result of some unmeasured grouping variable. Uncovering the different groups is typically viewed as a problem in cluster analysis. One clustering method, proposed by Art, Gnanadeskian and Kettenring [1], is based on first reducing the dimension of the

10 10 D.E. TYLER, F. CRITCHLEY, L. DÜMBGEN and H. OJA clustering problem by attempting to identify Fisher s linear discriminant subspace. To do this, they give an iterative algorithm for approximating the within group sum of squares and cross-products matrix, say W n, and then consider the eigenvectors of Wn 1 (T n W n ), where T n is the total sum of squares and cross-products matrix. The approach proposed by Art et al. [1] is motivated primarily by heuristic arguments and is supported by a Monte Carlo study. Subsequently, Ruiz-Gazen [41] and Caussinus and Ruiz-Gazen [5, 6] show for a location mixture of multivariate normal distributions with equal variance-covariance matrices that Fisher s linear discriminant subspace can be consistently estimated even when the group identification is not known, provided that the dimension, say q, of the subspace is known. Their results are based on the eigenvectors associated with the q largest eigenvalues of S1,n 1 S n, where S n is the sample variance-covariance matrix and S 1,n is either the one-step W -estimate (7) or its symmetrized version. They also require that the S 1,n differs from S n by only a small perturbation, since their proof involves expanding the functional version of S 1,n about the functional version of S n. In this subsection, it is shown that these results can be extended essentially to any pair of scatter matrices, and also that the results hold under mixtures of elliptical distributions with proportional scatter parameters. For simplicity, we first consider properties of the ICS transformation for a mixture of two multivariate normal distributions with proportional covariance matrices. Considering proportional covariance matrices allows for the inclusion of a point mass contamination as one of the mixture components, since a point mass contamination is obtained by letting the proportionality constant go to zero. Theorem 5.1. In addition to Condition 4.1, suppose Y d (1 α) Normal p (µ 1, Γ) + α Normal p (µ 2, λ Γ), where 0 < α < 1, µ 1 µ 2, λ > 0 and Γ P p. Then either i) ρ 1 (F Y ) > ρ 2 (F Y ) =... = ρ p (F Y ), ii) ρ 1 (F Y ) =... = ρ p 1 (F Y ) > ρ p (F Y ), or iii) ρ 1 (F Y ) =... = ρ p (F Y ). For p > 2, if case (i) holds, then h 1 (F Y ) Γ 1 (µ 1 µ 2 ), and if case (ii) holds, then h p (F Y ) Γ 1 (µ 1 µ 2 ). For p = 2, if ρ 1 (F Y ) > ρ 2 (F Y ), then either h 1 (F Y ) or h 2 (F Y ) is proportional to Γ 1 (µ 1 µ 2 ) Thus, depending on whether case (i) or case (ii) holds, h 1 or h p respectively corresponds to Fisher s linear discriminant function, see e.g. [28], even though the group identity is unknown. An intuitive explanation as to why one might expect this to hold is that any estimate of scatter contains information on the between group variability, i.e. the difference between µ 1 and µ 2, and the within

11 INVARIANT COORDINATE SELECTION 11 group variability or shape, i.e. Γ. Thus, one might anticipate that one could separate these two sources of variability by using two different estimates of scatter. This intuition though is not used in our proof of Theorem 5.1, nor is our proof based on generalizing the perturbation arguments used by Ruiz-Gazen [41] and Caussinus and Ruiz-Gazen [6] in deriving their aforementioned results. Rather, the proof of Theorem 5.1 given in the appendix relies solely on invariance arguments. Whether case (i) or case (ii) holds in Theorem 5.1 depends on the choice of V 1 (F ) and V 2 (F ) and on the nature of the mixture. Obviously, if case (i) holds and then the roles of V 1 (F ) and V 2 (F ) are reversed, then case (ii) would hold. Case (iii) holds only in very specific situations. In particular, case (iii) holds if µ 1 = µ 2, in which case Y has an elliptically symmetric distribution. When µ 1 µ 2, i.e. when the mixture is not elliptical itself, it is still possible for case (iii) to hold. This though is dependent not only on the specific choice of V 1 (F ) and V 2 (F ), but also on the particular value of the parameters α, µ 1, µ 2, Γ and λ. For example, suppose V 1 (F ) = Σ(F ), the population covariance matrix, and V 2 (F ) = K(F ) where (21) K(F ) = E[(Y µ Y ) Σ(F ) 1 (Y µ Y ) (Y µ Y )(Y µ Y ) ], Beside being analytically tractable, the scatter functional K(F ) is one which arises in a classical algorithm for independent components analysis and is discussed in more detail in later sections. For the special case λ = 1 and when µ 1 µ 2, if we let η = α(1 α), then it can be shown that case (i) holds for η > 1/6, case (ii) holds for η < 1/6, and case (iii) holds for η = 1/6. Also, for any of these three cases, we have ρ 1 (F Y ) ρ p (F Y ) = η 1 6η θ 2 /(1 + ηθ) 2, where θ = (µ 1 µ 2 ) Γ 1 (µ 1 µ 2 ). Other examples have been studied in the aforementioned papers by Caussinus and Ruiz-Gazen [5, 6]. In their work, V 2 (F ) = Σ(F ) and V 1 (F ) corresponds to the functional version of the symmetrized version of the one-step W -estimate (7). Paraphrasing, they show for the case λ = 1 and for the class of weight functions u 2 (s) = u(βs) that case (i) holds for small enough β provided η < 1/6. They do not note, though, that case (i) or (ii) can hold for other values of β and η. The reason the condition η < 1/6 arises in their work, as well as in the discussion in the previous paragraph, is because their proof involves expanding u(βs) about u(s), with the matrix K(F ) then appearing in the linear term of the corresponding expansion of the one-step W -estimate about Σ(F ). Theorem 5.1 readily generalizes to a mixture of two elliptical distributions with equal shape matrices, but with possibly different location vectors and different spread functions. That is, if Y has density f Y (y) = (1 α)f(y; µ 1, Γ, g 1 ) + αf(y; µ 2, Γ, g 2 ), where 0 < α < 1, µ 1 µ 2 and f(y; µ, Γ, g) is defined by (6), then the results of Theorem 5.1 hold. Note that this mixture distribution includes the case where both mixture components are from the same elliptical family but with proportional shape matrices. This special case corresponds to setting g 2 (s) = g 1 (s/λ), and hence f(y; µ 2, Γ, g 2 ) = f(y; µ 2, λγ, g 1 ).

12 12 D.E. TYLER, F. CRITCHLEY, L. DÜMBGEN and H. OJA An extension of these results to a mixture of k elliptically symmetric distributions with possibly different centers and different spread functions, but with equal shape matrices, is given in the following theorem. Stated more heuristically, this theorem implies that Fisher s linear discriminant subspace, see e.g. [28], corresponds to the span of some subset of the invariant coordinates, even though the group identifications are not known. Theorem 5.2. In addition to Condition 4.1, suppose Y has density f Y (y) = det(γ) 1/2 k α j g j {(y µ j ) Γ 1 (y µ j )}, j=1 where α j > 0 for j = 1,..., k, α α k = 1, Γ P p, and g 1,..., g k are nonnegative functions. Also, suppose the centers µ 1,..., µ k span some q dimensional hyperplane, with 0 < q < p. Then, using the notation of Theorem 4.2 for multiple roots, there exists at least one root ρ (j), j = 1,..., m, with multiplicity greater than or equal to p q. Furthermore, if no root has multiplicity greater than p q, then there is a root with multiplicity p q, say ρ (t), such that (22) Span{ Γ 1 (µ j µ k ) j = 1,..., k 1 } = Span{H q (F Y )}, where H q (F Y ) = [ h 1 (F Y ),..., h p1+...+p t 1 (F Y ), h p1+...+p t+1 (F Y ),..., h p (F Y ) ]. The condition in the above theorem that only one root has multiplicity p q and no other root has a greater multiplicity reduces to case (i)-(ii) in Theorem 5.1 when k = 2. Analogous to the discussion given after Theorem 5.1, this condition generally holds except for special cases. For a given choice of V 1 (F Y ) and V 2 (F Y ), these special cases depend on the particular values of the parameters Independent components analysis models. Independent components analysis or ICA is a highly popular method within many applied areas which routinely encounter multivariate data. For a good overview, see [21]. The most common ICA model presumes that Y arises as a convolution of p independent components or variables. That is, Y = BX, where B is nonsingular, and the components of X, say X 1,..., X p, are independent. The main objective of ICA is to recover the mixing matrix B so that one can unmix Y to obtain independent components X = B 1 Y. Under this ICA model, there is some indeterminacy in the mixing matrix B, since the model can also be expressed as Y = B o X o, where B o = BQΛ and X o = Λ 1 Q X, Q being a permutation matrix and Λ a diagonal matrix with non-zero entries. The components of X o are then also independent. Under the condition that at most one of the independent components X 1,..., X p has a normal distribution, it is well known that this is the only indeterminacy for B, and consequently the independent components X = B 1 Y are well defined up to permutations and componentwise scaling factors. The relationship between ICS and ICA for symmetric distributions is given in the next theorem.

13 INVARIANT COORDINATE SELECTION 13 Theorem 5.3. In addition to Condition 4.1, suppose Y = BX+µ, where B is nonsingular, and the components of X, say X 1,..., X p, are mutually independent. Further, suppose X is symmetric about 0, i.e. X d X, and the roots ρ 1 (F Y ),..., ρ p (F Y ) are all distinct. Then, the transformed variable Z = H(F Y ) Y consists of independent components, or more specifically, Z and X differ by at most a permutation and/or componentwise location and scale. From the proof of Theorem 5.3, it can be noted that the condition that X be symmetrically distributed about 0 can be relaxed to require that only p 1 of the components of X be symmetrically distributed about 0. It is also worth noting that the condition that all the roots be distinct is more restrictive than the condition that at most one of the components of X is normal. This follows since it is straightforward to show in general that if the distributions of two components of X differ from each other by only a location shift and/or scale change, then there is at least one root having multiplicity greater than one. If X is not symmetric about 0, then one can symmetrize Y before applying the above theorem. That is, suppose Y = BX + µ with X having independent components, and let Y 1 and Y 2 be independent copies of Y. Then Y s = Y 1 Y 2 = BX s, where X s = X 1 X 2 is symmetric about zero and has independent components. Thus, Theorem 5.3 can be applied to Y s. Moreover, since the convolution matrix B is the same for both Y and Y s, it follows that the transformed variable Z = H(F s Y ) Y and X differ by at most a permutation and/or componentwise location and scale, where F s Y refers to the symmetrized distribution of F Y, i.e. the distribution of Y s. An alternative to symmetrizing Y is to choose both V 1 (F ) and V 2 (F ) so that they satisfy the following independence property. Definition 5.1. An affine equivariant scatter functional V (F ) is said to have the independence property if V (F X ) is a diagonal matrix whenever the components of X are mutually independent, provided V (F X ) exists. Assuming this property, Oja et al. [35] proposed using principal components on standardized variables as defined in Remark 3.1 to obtain a solution to the ICA problem. Their solution can be restated as follows. Theorem 5.4. In addition to Condition 4.1, suppose Y = BX + µ, where B is nonsingular, and the components of X, say X 1,..., X p, are mutually independent. Further, suppose both scatter functionals V 1 (F ) and V 2 (F ) satisfy the independence property given in Definition 5.1, and the roots ρ 1 (F Y ),..., ρ p (F Y ) are all distinct. Then, the transformed variable Z = H(F Y ) Y consists of independent components, or more specifically, Z and X differ by at most a permutation and/or componentwise location and scale. The covariance matrix Σ(F ) is of course well known to satisfy Definition 5.1. It is also straightforward to show that the scatter functional K(F ) defined in (21) does as well. Theorem 5.4 represents

14 14 D.E. TYLER, F. CRITCHLEY, L. DÜMBGEN and H. OJA a generalization of the an early ICA algorithm proposed by Cardoso [3] based on the spectral value decomposition of a kurtosis matrix. Cardoso s algorithm, which he calls the fourth-order blind identification (FOBI ) algorithm, can be shown to be equivalent to choosing V 1 (F ) = Σ(F ) and V 2 (F ) = K(F ) in the above theorem. It is worth noting that the independence property given by Definition 5.1 is weaker than the property (23) X i and X j are independent V (F X ) i,j = 0. The covariance matrix satisfies (23), whereas K(F ) does not. An often overlooked observation is that (23) does not hold for robust scatter functionals in general, i.e. independence does not necessarily imply a zero pseudo-correlation. It is an open problem as to what scatter functionals other the covariance matrix, if any, satisfy (23). Furthermore, robust scatter functionals tend not to satisfy in general even the weaker Definition 5.1. At symmetric distributions, though, the independence property can be shown to hold for general scatter matrices in the following sense. Theorem 5.5. Let V (F ) be a scatter functional satisfying (3). Suppose the distribution of X is symmetric about some center µ R p, with the components of X being mutually independent. If V (F X ) exists, then it is a diagonal matrix. Consequently, given a scatter functional V (F ), one can construct a new scatter functional satisfying Definition 5.1 by defining V s (F ) = V (F s ), where F s represents the symmetrized distribution of F. Using symmetrization to obtain scatter functionals which satisfy the independence property has been studied recently by Taskinen et al. [42]. Finally, we note that the results of this section can be generalized in two directions. First, we consider the case of multiple roots, and next we consider the case where only blocks of the components of X are independent. Theorem 5.6. In addition to Condition 4.1, suppose Y = BX + µ, where B is nonsingular, and the components of X, say X 1,..., X p, are mutually independent. Further, suppose either (i) X is symmetric about 0, i.e. X d X, or (ii) both V 1 (F ) and V 2 (F ) satisfy Definition 5.1. Then, using the notation of Theorem 4.2 for multiple roots, for the transformed variable Z = H(F Y ) Y the random vectors Z (1),..., Z (m) are mutually independent. Theorem 5.7. In addition to Condition 4.1, suppose Y = BX + µ, where B is nonsingular, and X = (X(1),..., X (m) ) has mutually independent components X (1) R p1,..., X (m) R pm, with p p m = p. Further, suppose X is symmetric about 0, and the roots ρ 1 (F Y ),..., ρ p (F Y ) are

15 INVARIANT COORDINATE SELECTION 15 all distinct. Then, there exists a partition {J 1,... J m } of {1,..., p} with the cardinality of J k being p k for k = 1,..., m such that for the transformed variable Z = H(F Y ) Y the random vectors Z (1) = {Z j, j J 1 },..., Z (m) = {Z j, j J m } are mutually independent. More specifically, Z (j) and X (j) are affine transformations of each other. From the proof of Theorem 5.7, it can be noted that the theorem still holds if one of the X (j) s is not symmetric. If the distribution of X is not symmetric, Theorems 5.6 and 5.7 can be applied to Y s, the symmetrized version of Y. To generalize Theorem 5.4 to the case where blocks of the components of X are independent, a modification of the independence property is needed. Such generalizations of Definition 5.1, Theorem 5.4 and Theorem 5.5 are fairly straightforward, and so are not treated formally here. Remark 5.1. The general case of multiple roots for the setting given in Theorem 5.7 is more problematic. The problem stems from the possibility that a multiple root may not be associated with a particular X (j) but rather with two or more different X (j) s. For example, consider the case X = (X(1), X (2)), with X (1) R 2 and X (2) R. For this case, V 1 (F X ) 1 V 2 (F X ) is block diagonal with diagonal blocks of order 2 and 1 respectively. The three eigenvalues ρ 1 (F Y ), ρ 2 (F Y ) and ρ 3 (F Y ) correspond to the two eigenvalues of the diagonal block of order 2 and to the last diagonal element, but not necessarily respectively. So, if ρ 1 (F Y ) = ρ 2 (F Y ) > ρ 3 (F Y ), this does not imply that the last diagonal element corresponds to ρ 3 (F Y ), and hence Z (1) R 2 and Z (2) R, as defined in Theorem 4.2, are not necessarily independent. 6. Discussion and Examples. Although the theoretical results of this paper essentially apply to any pair of scatter matrices, in practice the choice of scatter matrices can affect the resulting ICS method. From our experience, for some data sets, the choice of the scatter matrices does not seem to have a big impact on the diagnostic plots of the ICS variables, particularly when the data is consistent with one of the mixture models or one of the independent component models considered in section 5. For some other data sets, however, the resulting diagnostic plots can be quite sensitive to the choice of the scatter matrices. In general, different pairs of scatter matrices may reveal different types of structure in the data, since departures from an elliptical distribution can come in many forms. Consequently, it is doubtful if any specific pair of scatter matrices is best for all situations. Rather than choosing two scatter matrices beforehand, especially when one is in a purely exploratory situation having no idea of what to expect, it would be reasonable to consider a number of different pairs of scatter matrices and to consider the resulting ICS transformations as complementary. A general sense of how the choice of the pair of scatter matrices may impact the resulting ICS method can be obtained by a basic understanding of the properties of the scatter matrices being used. For the purpose of this discussion, we divide the scatter matrices into three broad

16 16 D.E. TYLER, F. CRITCHLEY, L. DÜMBGEN and H. OJA classes. Class I scatter statistics will refer to those which are not robust in the sense that their breakdown point is essentially zero. This class includes the sample covariance matrix, as well as the one-step W -estimates defined by (7) and their symmetrized version. Other scatter statistics which lie within this class are the multivariate sign and rank scatter matrices, see e.g. [46]. Class II scatter statistics will refer to those which are moderately robust in the sense that they have bounded influence functions as well as positive breakdown points, but with breakdown points being no greater than 1/(p + 1). This class primarily includes the multivariate M -estimates, but it also includes among others the sample covariance matrices obtained after applying either convex hull peeling or ellipsoid hull peeling to the data, see [13]. Class III scatter statistics will refer to the high breakdown point scatter matrices which are discussed in section 2.2. The symmetrized version of a class II or III scatter matrix, as well as the one-step W -estimates of scatter (10) which uses an initial class II or III scatter matrix for downweighting, are viewed respectively as class II or III scatter matrices themselves. If one or both scatter matrices are from class I, then the resulting ICS transformation may be heavily influenced by a few outliers at the expense of finding other structures in the data. In addition, even if there are no spurious outliers and a mixture model or an independent components model of the form discussed in section 5 hold, but with long tailed distributions, then the resulting sample ICS transformation may be an inefficient estimate of the corresponding population ICS transformation. Simulation studies reported in [32] have shown that for independent components analysis an improved performance is obtained by choosing robust scatter matrices for the ICS transformation. Nevertheless, since they are simple to compute, the use of class I scatter matrices can be useful if the data set is known not to contain any spurious outliers or if the objective of the diagnostics is to find such outliers, as recommended in [4]. If one uses class II or III scatter matrices, then one can still find spurious outliers by plotting the corresponding robust Mahalanobis distances. The resulting ICS transformation, though, would not be heavily affected by the spurious outliers. Outliers affect class II scatter matrices more so than class III scatter matrices, although even a high proportion of spurious outliers may not necessarily affect the class II scatter matrices. For outliers to heavily affect a class II scatter matrix, they usually need to lie in a cluster, see e.g. [15]. The results of section 5.1 though suggest that such clustered outliers can be identified after making an ICS transformation, even if they can not be identified using a robust Mahalanobis distance based on a class II statistic. Using two class III scatter matrices for an ICS transformation may not necessarily give good results, unless one is only interested in the structure of the inner 50% of the data. For example, suppose the data arises from a mixture of two multivariate normal distributions with widely separated means but equal covariance matrices. A class III scatter matrix is then primarily determined by the properties of the 60% component. Consequently, when using two class III scatter matrices for ICS the corresponding ICS roots will tend to be equal or nearly equal. 
In the case where all the roots are equal, Theorem 5.1 does not apply. In the case where the roots are nearly

17 INVARIANT COORDINATE SELECTION 17 (a) (b) (c) Distances using Cauchy M estimate Distances using W estimate Distances using W estimate Index Index Distances using Cauchy M estimate Fig. 1. Example 1: Mahalanobis distances based on (a) V 1, (b) V 2 and (c) V 1 versus V 2. equal, due to sampling variation, the sample ICS transformation may not satisfactorily uncover Fisher s linear discriminant function. A reasonable general choice for the pair of scatter matrices to use for an ICS transformation would be to use one class II and one class III scatter matrix. If one wishes to avoid the computational complexity involved with a class III scatter matrix, then using two class II scatter matrices may be adequate. In particular, one could choose a class II scatter matrix whose breakdown point is close to 1/(p + 1), such as the M -estimate corresponding to the maximum likelihood estimate for an elliptical Cauchy distribution [15], together with a corresponding one-step W -estimate for which ψ(s) = su 2 (s) 0 goes s. Such a one-step W -estimate of scatter has a redescending influence function. From our experience, the use of a class III scatter matrix for ICS does not seem to reveal any data structures that can not be obtained otherwise. The remarks and recommendations made here are highly conjectural. What pairs of scatter matrices are best at detecting specific types of departure from an elliptical distribution remains a broad open problem. In particular, it would be of interest to discover for what types of data structures would it be advantageous to use at least one class III scatter matrix in the ICS method. Most likely, some advantages may arise when working with very high dimensional data sets, in which case the computational intensity needed to compute a class III scatter matrix is greatly amplified, see e.g. [40]. We demonstrate some of the concepts in the following examples. These examples illustrate for several data sets the use of the ICS transformation for constructing diagnostic plots. They also serve as illustrations of the theory presented in the previous sections Example 1. Rousseeuw and van Driessen [40] analyze a data set consisting of n = 677 metal plates on which p = 9 characteristics are measured. For this data set they compute the sample mean and covariance matrix as well as the MCD estimate of center and scatter. Their paper helps illustrate the advantage of using high breakdown point multivariate estimates, or class III statistics, for uncovering multiple outliers in a data set.

Invariant co-ordinate selection

Invariant co-ordinate selection J. R. Statist. Soc. B (2009) 71, Part 3, pp. 549 592 Invariant co-ordinate selection David E. Tyler, Rutgers University, Piscataway, USA Frank Critchley, The Open University, Milton Keynes, UK Lutz Dümbgen

More information

Invariant coordinate selection for multivariate data analysis - the package ICS

Invariant coordinate selection for multivariate data analysis - the package ICS Invariant coordinate selection for multivariate data analysis - the package ICS Klaus Nordhausen 1 Hannu Oja 1 David E. Tyler 2 1 Tampere School of Public Health University of Tampere 2 Department of Statistics

More information

Robustness of Principal Components

Robustness of Principal Components PCA for Clustering An objective of principal components analysis is to identify linear combinations of the original variables that are useful in accounting for the variation in those original variables.

More information

Properties of Matrices and Operations on Matrices

Properties of Matrices and Operations on Matrices Properties of Matrices and Operations on Matrices A common data structure for statistical analysis is a rectangular array or matris. Rows represent individual observational units, or just observations,

More information

Scatter Matrices and Independent Component Analysis

Scatter Matrices and Independent Component Analysis AUSTRIAN JOURNAL OF STATISTICS Volume 35 (2006), Number 2&3, 175 189 Scatter Matrices and Independent Component Analysis Hannu Oja 1, Seija Sirkiä 2, and Jan Eriksson 3 1 University of Tampere, Finland

More information

Multivariate Distributions

Multivariate Distributions IEOR E4602: Quantitative Risk Management Spring 2016 c 2016 by Martin Haugh Multivariate Distributions We will study multivariate distributions in these notes, focusing 1 in particular on multivariate

More information

x. Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ 2 ).

x. Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ 2 ). .8.6 µ =, σ = 1 µ = 1, σ = 1 / µ =, σ =.. 3 1 1 3 x Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ ). The Gaussian distribution Probably the most-important distribution in all of statistics
