HOW MANY MODES CAN A TWO COMPONENT MIXTURE HAVE? Surajit Ray and Dan Ren. Boston University


Abstract: The main result of this article states that one can get as many as D + 1 modes from a two-component normal mixture in D dimensions. Multivariate mixture models are widely used for modeling heterogeneous populations and for cluster analysis. Either the components directly, or the modes arising from these components, are often used to extract individual clusters. Though in lower dimensions these strategies work well, our results show that high-dimensional mixtures are often very complex, and researchers should take extra precautions when using mixtures for cluster analysis. Even in the simplest case of mixing only two normal components in D dimensions, we show that the mixture can have a maximum of D + 1 modes. When we mix more components, or if the components are non-normal, the number of modes might be even higher, which might lead us to wrong inference on the number of clusters. Further analyses show that the number of modes depends on the component means and on the eigenvalues of the ratio of the two component covariance matrices (in the matrix sense), which in turn provides a clear guideline as to when one can use mixture analysis for clustering high-dimensional data.

Key words and phrases: Mixture, modal cluster, multivariate mode, clustering, dimension reduction, topography, manifold.

1 Introduction

1.1 Number of modes of a normal mixture

Multivariate normal mixtures provide a flexible method of fitting high-dimensional data. This fit often provides a primary data reduction through the number, location and shape of its components. However, a more interesting question relates to the exploration of how components interact to describe an overall pattern of density. Of particular interest is finding the number of modes the density displays. The relation between the number of modes and the number of components is not one to one.
Often modes are used to determine the number of homogeneous groups in a population (Li et al., 2007; McLachlan and Peel, 2000; Titterington et al., 1985). Modes of densities are also widely used to summarize posterior distributions in Bayesian analysis (Berger, 1985; Lehmann and Casella, 1998) and to build Bayesian inferential frameworks.

The main result of this paper is summarized in the following theorem:

Theorem 1. A D-dimensional normal mixture of two components has at most D + 1 modes, and a mixture with D + 1 modes always exists in D dimensions.

In one dimension a two-component normal mixture can display one or two modes. But the density shapes become complex in higher dimensions. For example, a two-component normal mixture in two dimensions can give rise to one, two or three modes (see Ray and Lindsay, 2005, for a three-mode example). Ray and Lindsay (2005) provide more examples in two and three dimensions where the number of modes exceeds the number of mixing components. But besides these pathological examples there is no result on the upper bound of the number of modes that a mixture of normals can display. This paper provides the first set of results on the upper bound for the number of modes of a two-component normal mixture. We also show that this bound is tight, i.e., we can provide numerical values for a mixture which attains this upper bound. It is well known that the topography of a mixture of distributions, in the sense of its key features as a density, is often extremely complex. Among the different features of the topography we are especially interested in the number of modes the density displays, referred to as the modality of the density from here on. Ray and Lindsay (2005) provide a detailed understanding of the topography of mixtures of normal distributions in terms of the means and variances of the component distributions. But how these density shapes respond to rotation or scaling based on the component covariances is not well studied. For example, it is not clear whether rotation and scaling retain all the modes after transformation. In this paper we present a set of results showing the invariance of the modality of normal mixtures under the operations of translation, scaling and rotation.
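The one-dimensional case described above can be checked directly by brute force. The following sketch (plain Python; the grid resolution and the example parameters are illustrative choices, not values from the paper) evaluates a two-component univariate normal mixture on a fine grid and counts its local maxima:

```python
import math

def normal_pdf(x, mu, sigma):
    # density of N(mu, sigma^2) at x
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def count_modes_1d(pi, mu1, s1, mu2, s2, lo=-10.0, hi=10.0, n=4001):
    # count strict local maxima of the mixture density on a fine grid
    xs = [lo + i * (hi - lo) / (n - 1) for i in range(n)]
    g = [pi * normal_pdf(x, mu1, s1) + (1.0 - pi) * normal_pdf(x, mu2, s2)
         for x in xs]
    return sum(1 for i in range(1, n - 1) if g[i - 1] < g[i] > g[i + 1])

print(count_modes_1d(0.5, 0.0, 1.0, 0.5, 1.0))  # close means: 1 mode
print(count_modes_1d(0.5, 0.0, 1.0, 4.0, 1.0))  # well-separated means: 2 modes
```

For equal mixing proportions and equal variances this reproduces the classical condition that the mixture is bimodal exactly when the means are more than two standard deviations apart (Helguero, 1904).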
These results allow us to show that the modality of a two-component mixture of normals with arbitrary variance-covariance matrices is mathematically equivalent to the topography of a mixture of normals in which one component has a spherical covariance and the other has an appropriate diagonal covariance matrix of the same dimension. A follow-up analysis shows that the number of modes is closely related to the number of unique eigenvalues of the ratio of the covariance matrices, in a matrix sense (the inverse of one matrix multiplied by the other matrix). Finally we use these results to arrive at the main result on the tight upper bound on the number of modes.

1.2 Relevant Literature

Studies of the number of modes of normal mixtures date back to the beginning of the twentieth century, but until recently the results focused primarily on univariate mixtures. In fact, there is a simple description of modality when one is mixing two univariate normal components. Helguero (1904) determined necessary and sufficient conditions for bimodality in

the mixture of two univariate normals with equal variances and mixing proportions. More research on univariate mixture cases followed. For example, Eisenberger (1964) investigated the conditions for bimodality in the mixture of two univariate normals with arbitrary variances and mixing proportions, and Behboodian (1970) derived a sufficient condition for unimodal mixture densities. Kakiuchi (1981) and Kemperman (1991) then extended the problem to mixtures of non-normal distributions, and derived corresponding necessary and sufficient conditions. In the context of multivariate normal mixtures, a recent result by Carreira-Perpiñán and Williams (2003) shows that for any D-dimensional normal mixture, the number of modes cannot exceed the number of components if each component has the same covariance matrix up to a scalar scaling factor. The most recent and comprehensive results in this area of research are provided by Ray and Lindsay (2005), who present the most generalized modality results for arbitrary dimensions, numbers of components and component variance structures. The key result in Ray and Lindsay (2005) shows that the topography of multivariate mixtures, in the sense of their key features as a density, can be analyzed rigorously in lower dimensions by use of a ridgeline manifold that contains all critical points as well as the ridges of the density. This important topographical result allows them to solve for the number of modes both analytically and numerically. Besides solving for the number of modes, Ray and Lindsay (2005) provide pathological examples of more modes than components in more than one dimension. A comprehensive summary of the above results is available in Frühwirth-Schnatter (2006) and in a recent review paper by Melnykov and Maitra (2010).
Much of the modality theory discussed in Ray and Lindsay (2005) has been widely used for developing clustering techniques by Ray and Lindsay (2008), Coretto and Hennig (2010), Hennig (2010b) and Hennig (2010a), and for the advancement of likelihood-based inference for normal mixtures by Chen and Tan (2009), Holzmann and Vollmer (2008), Dannemann and Holzmann (2008) and Lindsay et al. (2008). Applications of these results are found in new areas of research such as signal processing (Li, 2007; Scott et al., 2009) and image retrieval (Sfikas et al., 2005). Using the modality theorem in the special case of a two-component normal mixture, Ray and Lindsay (2005) provide examples of three modes in two dimensions, and four modes in three dimensions. These mixtures have unequal covariance matrices, but they are limited to being diagonal in structure. Providing an upper bound on the number of modes for mixtures in arbitrary dimensions with arbitrary component variance-covariance matrices remained an unresolved problem.

1.3 Our Results

The main contribution of this paper is to provide a tight upper bound for the number of modes of a two-component normal mixture for arbitrary dimension and arbitrary component

variance-covariance matrices. Let us denote the dimension of the multivariate normal density by D and the number of components of the mixture by K. In this paper we only consider two-component normal mixtures, i.e., K = 2; the corresponding parameters for each normal density are its mean µᵢ and variance-covariance matrix Σᵢ, i = 1, 2. Let π and π̄ = 1 − π be the respective proportions of the two densities. It can be shown that for specified means and variances the number of modes depends on the mixing proportions. In fact, Ray and Lindsay (2005) provide examples of mixtures where different ranges of π display one, two and three modes for the same means and variance-covariance matrices. But one should notice that the specification of π is irrelevant in the context of determining the maximal number of modes displayed by a mixture of two components. In other words, we are asking the following question: given a pair of component means and covariance matrices, what is the maximum number of modes the mixture can display if one has complete freedom in choosing the mixing proportion π? Hence we will ignore the parameter π in our analysis, and for notational ease we will denote a D-dimensional mixture of two components with means µ₁ and µ₂ and variances Σ₁ and Σ₂ by NM(µ₁, Σ₁, µ₂, Σ₂)_D. Our main result shows that the number of modes for the above mixture is bounded above by D + 1, and that this bound is achievable for any D. In fact, we provide a recursive algorithm to construct the parameters of component densities which attain this bound. Modes are defined as the local maxima of the density height, and understanding the modes requires understanding the topography of the density along with its higher order features. Many of the results we will use in this paper are based on these higher order features of normal mixtures, defined in terms of the Π-function (different from the omitted parameter π) and the curvature function defined in Ray and Lindsay (2005).
So, in Section 2 we will first define the terminology and state some of the important results from Ray and Lindsay (2005) which will be used in this paper. In particular we will present the concepts of the Π-function and the curvature function of a mixture, which have the advantage of being expressed explicitly in terms of the means and variances of the components while retaining full information about the topography, and hence the number of modes, of a mixture. Moreover, the Π-function and curvature function attain a very simple form for a two-component normal mixture. This simplification of the curvature function allows us to show that the number of modes of a two-component mixture is explicitly determined by the number of roots of the curvature function within the range [0, 1]. But the roots of the curvature function defined in Section 2 are very difficult to study for arbitrary mixtures. Ray and Lindsay (2005) explore the roots of curvature functions only in the case of diagonal covariance matrices up to three dimensions. In this paper we seek to generalize the modality results to arbitrary dimensions and component variance-covariance matrices.

To arrive at these results, in Section 3 we first show that the modality of an arbitrary D-dimensional normal mixture NM(µ₁, Σ₁, µ₂, Σ₂)_D remains unchanged under any translation and under a specified scaling and rotation of the random variable. These results will be enormously helpful, as they allow us to study the topography of an arbitrary D-dimensional normal mixture by exploring the topography of a simplified class of normal mixtures whose first component is a standard normal and whose second component has a diagonal covariance matrix. We denote this class by NM(0, I, µ, Λ)_D, where 0 and µ are both D-dimensional means, I is the identity matrix and Λ is a diagonal matrix of dimension D. These results are derived analytically, and examples are provided to illustrate them. In Section 4 we explore the modality of normal mixtures of the form NM(0, I, µ, Λ)_D. We show that the maximum number of modes is constrained by d, the number of distinct diagonal entries in Λ. In fact, the modality of such a mixture with d distinct diagonal entries is less than or equal to (d + 1). It is easy to check that d can be equal to the dimension D, and thus we arrive at the first part of our result showing that any arbitrary D-dimensional normal mixture of two components can have at most (D + 1) modes. The tightness of the stated bound is shown by providing a recursive method for constructing two-component normal mixtures which achieve this bound. In Section 4 we also show that many previous modality results can be stated as special cases of our generalized result. For D = 1, this can be used to prove the univariate results in Helguero (1904) and Robertson and Fryer (1969). For D = 2 and D = 3 our results show that the examples in Ray and Lindsay (2005) achieve the upper limit of the number of modes in their respective dimensions.
Section 5 provides some discussion and further research directions regarding the number of modes of multivariate normal mixtures of more than two components. Generalization of the modality results from mixtures of multivariate normals to multivariate-t densities, and ultimately to multivariate elliptical distributions, will also be discussed in that section.

2 Topography of multivariate normals

In this section we state some important results from Ray and Lindsay (2005) that will be extensively used in this paper. The rest of the paper will use the notation defined in this section. Readers familiar with the results in Ray and Lindsay (2005) may skip this section. Ray and Lindsay (2005) present a unified theory for understanding the topography of high-dimensional normal mixtures. Their main result shows that the topography of mixtures, in the sense of their key features as a density, can be analyzed rigorously in lower dimensions by use of a ridgeline manifold that contains all critical points as well as the ridges of the density. A K-component mixture of D-dimensional normals can be represented by the probability

density function

g(x) = π₁ φ(x; µ₁, Σ₁) + π₂ φ(x; µ₂, Σ₂) + ··· + π_K φ(x; µ_K, Σ_K),  x ∈ Rᴰ,

where πⱼ is the mixing proportion of component j, πⱼ ∈ [0, 1], ∑ⱼ₌₁ᴷ πⱼ = 1, and φ(x; µ, Σ) is the density of a multivariate normal distribution with mean µ and variance Σ. We will sometimes use φⱼ(x) as shorthand notation for φ(x; µⱼ, Σⱼ), and call φⱼ the j-th component density.

2.1 The K−1 dimensional ridgeline manifold

Definition 1. The (K−1)-dimensional set of points

S_K = { α ∈ Rᴷ : αᵢ ∈ [0, 1], ∑ᵢ₌₁ᴷ αᵢ = 1 }

will be called the unit simplex. The function x*(α) from S_K into Rᴰ defined by

x*(α) = [α₁Σ₁⁻¹ + α₂Σ₂⁻¹ + ··· + α_KΣ_K⁻¹]⁻¹ [α₁Σ₁⁻¹µ₁ + α₂Σ₂⁻¹µ₂ + ··· + α_KΣ_K⁻¹µ_K]

will be called the ridgeline function. It will sometimes be written as x*_α. The image of this map will be denoted by M and called the ridgeline surface or manifold. If K = 2, it will be called the ridgeline, as it is a one-dimensional curve.

Theorem 2. (Ray and Lindsay, 2005) Let g(x) be the density of a K-component mixture of multivariate normals as given above. Then all of g(x)'s critical points, and hence its modes, antimodes and saddle points, are points in M.

The previous result states that instead of exploring the whole Rᴰ space to find modes, we now only need to concentrate on the ridgeline manifold, parametrized by the (K−1)-dimensional unit simplex. In this paper we only deal with two components, and for K = 2 the ridgeline can be represented as

x*(α) = Sα⁻¹ [αΣ₁⁻¹µ₁ + ᾱΣ₂⁻¹µ₂],    (1)

where Sα = αΣ₁⁻¹ + ᾱΣ₂⁻¹, α ∈ [0, 1] and ᾱ = 1 − α. As α varies from 0 to 1, the image of the function x*(α) defines a curve connecting µ₁ and µ₂, and the critical points of the D-dimensional mixture can be explored by evaluating the height of the density along the curve x*(α). Thus we next consider the diagnostic properties of the elevation plot along the curve x*(α), defined by h(α) = g(x*(α)). We will call h(α) the ridgeline elevation function.
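When both component covariances are diagonal, Sα is diagonal and the ridgeline x*(α) can be evaluated coordinatewise. The following sketch (plain Python, restricted to the diagonal case for simplicity; the example parameters are illustrative) traces h(α) = g(x*(α)) on a grid and counts its peaks:

```python
import math

def phi_diag(x, mu, var):
    # multivariate normal density with diagonal covariance (var = list of variances)
    val = 1.0
    for xi, mi, vi in zip(x, mu, var):
        val *= math.exp(-0.5 * (xi - mi) ** 2 / vi) / math.sqrt(2.0 * math.pi * vi)
    return val

def ridgeline_point(alpha, mu1, v1, mu2, v2):
    # x*(alpha) = S_alpha^{-1} [alpha Sigma1^{-1} mu1 + (1-alpha) Sigma2^{-1} mu2],
    # computed coordinatewise because both covariances are diagonal
    ab = 1.0 - alpha
    return [(alpha * m1 / a + ab * m2 / b) / (alpha / a + ab / b)
            for m1, a, m2, b in zip(mu1, v1, mu2, v2)]

def count_ridgeline_peaks(pi, mu1, v1, mu2, v2, n=2001):
    # peaks of the ridgeline elevation h(alpha) = g(x*(alpha)) on a grid over [0, 1]
    h = []
    for i in range(n):
        a = i / (n - 1)
        x = ridgeline_point(a, mu1, v1, mu2, v2)
        h.append(pi * phi_diag(x, mu1, v1) + (1.0 - pi) * phi_diag(x, mu2, v2))
    peaks = sum(1 for i in range(1, n - 1) if h[i - 1] < h[i] > h[i + 1])
    # the endpoints alpha = 0, 1 (i.e. x = mu2, mu1) can also carry modes
    if h[0] > h[1]:
        peaks += 1
    if h[-1] > h[-2]:
        peaks += 1
    return peaks

# two well-separated spherical components -> two modes
print(count_ridgeline_peaks(0.5, [0.0, 0.0], [1.0, 1.0], [5.0, 0.0], [1.0, 1.0]))
```

By Theorem 2 no mode can be missed by this one-dimensional search, whatever the dimension D.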
Analytically, the number of peaks of h(α) is exactly the maximum number of modes the mixture can display. In some cases a visual

inspection of h(α) or numerical root-finding methods might allow us to enumerate the peaks of h(α), and hence the number of modes. But depending on the resolution, numerical methods can always miss some zero crossings. Moreover, numerical solutions will not serve the purpose of this paper, which focuses on determining the upper bound on the number of modes. Hence we focus our attention on finding analytical solutions for the critical points of h(α) in order to find the number of modes of the mixture.

2.2 The curvature function

To find the number of modes, first note that x*(α) is a critical point if α satisfies

h′(α) = π φ′₁(α) + π̄ φ′₂(α) = 0,

where the prime denotes differentiation with respect to α and φⱼ(α) is shorthand for φⱼ(x*(α)). Solving the last displayed equation for π, and turning it into a function of α, we get:

Π(α) = φ′₂(α) / (φ′₂(α) − φ′₁(α)).

As we are just interested in the number of modes, we can examine the number of up and down oscillations of the function Π. Section 4 of Ray and Lindsay (2005) shows that the number of up-down oscillations of Π is determined by n, the number of zeroes of

Π′(α) = [φ″₂(α) φ′₁(α) − φ″₁(α) φ′₂(α)] / (φ′₂(α) − φ′₁(α))².

In general, to determine the sign changes of Π′ we can use any function of α with the same numerator φ″₂(α) φ′₁(α) − φ″₁(α) φ′₂(α), provided the denominator does not change sign in α. Using the denominator φ′₁(α) φ′₂(α) instead of (φ′₂(α) − φ′₁(α))², the curvature function κ(α) is defined as:

κ(α) = [φ″₂(α) φ′₁(α) − φ″₁(α) φ′₂(α)] / [φ′₁(α) φ′₂(α)] = φ″₂(α)/φ′₂(α) − φ″₁(α)/φ′₁(α).    (2)

We use κ(α) as it results in a simple expression for any distribution belonging to the exponential family. It is closely related to the mixture curvature measures given by Lindsay (1983).

2.3 Properties of the curvature function κ(α)

We now study the curvature function κ(α) more closely, as it will be extensively used to prove the results in Section 3 and Section 4. The following result provides a simple expression for the curvature for a mixture of normals.

Theorem 3. (Ray and Lindsay, 2005) Let g(x) be the mixture of two multivariate normal densities. Then the curvature function in (2) is given by

κ(α) = [p(α)]² [1 − αᾱ p(α)], where p(α) = (µ₂ − µ₁)ᵀ Σ₁⁻¹ Sα⁻¹ Σ₂⁻¹ Sα⁻¹ Σ₂⁻¹ Sα⁻¹ Σ₁⁻¹ (µ₂ − µ₁).    (3)

By the expression above, p(α) is always positive. Thus the zeroes of κ(α) are the same as the zeroes of (1 − αᾱ p(α)). For notational ease, let us denote

q(α) = 1 − αᾱ p(α).    (4)

By calculation, q(0) = q(1) = 1, and hence κ takes positive values at the two extremes α = 0 and 1. Thus there is an even number of sign changes of the function κ(α) in the range [0, 1], as also indicated by the nature of Π. In particular, at the first zero α₁ of κ the function Π has a maximum, at the next zero a minimum, and so forth. Thus we arrive at the following result relating the number of solutions of q(α) = 0 to the modality of the mixture.

Result 1. Let n be the number of solutions of q(α) = 0 in the range [0, 1]. Then the corresponding mixture can display at most n/2 + 1 modes.

We note that each of p(α) and q(α) uniquely determines the number of modes. We will use p(α) to show the invariance in the proof of Theorem 5, and later use q(α) to find the number of modes in the proofs of the other theorems.

3 Invariance of modality under scaling and rotation

Studying the modality of arbitrary normal mixtures directly, based on the curvature function κ(α), is a very complex undertaking. Instead, in this section we will show that the curvature function which defines the modal features of a two-component normal mixture remains unchanged under certain transformations. We will use these transformations to show that the topography of an arbitrary D-dimensional normal mixture can be examined by exploring the topography of a simplified class of normal mixtures, given by the mixture of a spherical normal and a normal with a diagonal covariance matrix. We arrive at this result in two steps, described in the following two subsections.
3.1 Invariance of modality under scaling

First we state the theorem that provides the simplification that in D dimensions the modal properties of an arbitrary two-component normal mixture can be fully examined by studying the modality of a mixture of two components, one of which is the standard normal in D dimensions.

Theorem 4. For an arbitrary mixture of two multivariate normals, the modality of NM(µ₁, Σ₁, µ₂, Σ₂)_D is the same as that of NM(0, I, µ₂*, Σ₂*)_D, where Σ₂* = Σ₂^(1/2) Σ₁⁻¹ Σ₂^(1/2) and µ₂* = (Σ₂*)^(1/2) Σ₂^(−1/2) (µ₂ − µ₁).

Proof. See Appendix.

Remark 1. First note that the above transformation is not equivalent to the regular standardization with respect to the first component alone. Using a regular standardization a single component can be transformed to a standard normal, but the resulting parameters of the second component lose the symmetry which is crucial for equating the curvature functions of the two mixtures, as detailed in the proof of Theorem 4. Also note that µ₂* and Σ₂* in Theorem 4 are well-defined, because the variance matrices Σ₁ and Σ₂ are both positive definite. Note that the two components are interchangeable, and the strategy is to scale the whole mixture by the covariance of the component whose mean is translated to the origin.

Before moving on to the next result, we provide an application of Theorem 4. For easy visualization we will use contour plots of a two-dimensional mixture. This example will also serve the purpose of providing a geometric intuition for the proof of Theorem 4. First, it is easy to check that geometrically shifting the means of both components by the same vector is equivalent to changing the origin of the reference frame of the contour plot. This implies that the modal features, and hence the number of modes, remain unchanged after a simple translation. So we concentrate on the changes of the contour plot strictly under the operation of scaling defined in Theorem 4, by taking µ₁ = 0.

Example 1.
Consider the mixture density with the following parameters:

µ₁ = , Σ₁ = , µ₂ = , Σ₂ =

Applying the transformation defined in Theorem 4, the parameters of the two components after scaling are given by:

µ₁* = 0, Σ₁* = I, µ₂* = , Σ₂* =

Figure 1 gives the density contour plots before (left panel) and after (right panel) the transformation. Clearly, though the contour shapes and the locations of the modes have changed, the number of modes and the number of saddle points remain unchanged. Note that under the transformation both components are scaled; in this example the component centered at zero is scaled to have the identity covariance, and the covariance of the other component is scaled appropriately. This is easily visible from the contour plots in Figure 1, where the elongated elliptical component in the left panel with the origin as its center is transformed into a spherical component with the same center. Of course the change

Figure 1: Contour plots for the bivariate normal mixture of Example 1 with (a) the original parameters and (b) the transformed parameters.

in the means and covariances of the components has changed the locations of the three modes, but as the theorem suggests the number of modes is strictly preserved between the mixtures. Contour plots such as those in Figure 1 are not available unless D = 2, so we provide an alternative graphical display showing the invariance of modes. We compare the ridgeline elevations of the two mixtures in Example 1. Recall that the ridgeline elevation for a two-component mixture is simply the height of the mixture density along the ridgeline defined in (1), but it carries the full modality information for mixtures in any dimension. Figure 2 displays the ridgeline elevation plot before and after the transformation. Again note that though the shapes of the elevation plots differ, the number of up-down oscillations of the curves in the left and right panels of Figure 2 is exactly the same. In both cases the ridgeline elevation plot confirms the presence of three modes.

3.2 Invariance of modality under rotation

By Theorem 4 the topography of any D-dimensional mixture can be studied using mixtures of the form NM(µ₁ = 0, Σ₁ = I, µ₂, Σ₂). But uncovering the topography, even when only one component has an arbitrary covariance matrix, is difficult. In this section we seek to provide a further simplification, which will allow us to find the number of modes of an arbitrary mixture by studying the modes of another mixture, one component of which is a standard normal and the other component of which is a normal with a diagonal covariance matrix. Before we state the result, recall that the maximum number of modes of a two-component

Figure 2: Ridgeline elevation with respect to the arc distance for the bivariate normal mixture of Example 1 with (a) the original parameters and (b) the transformed parameters.

normal mixture is uniquely defined by the number of roots between 0 and 1 of q(α) given in (4), and for any mixture q(α) is uniquely defined by p(α). So we will first provide a simplification of the expression for p(α) for mixtures of the form NM(0, I, µ₂, Σ₂)_D, and then state the rotation invariance theorem.

Result 2. For a mixture of the form NM(0, I, µ₂, Σ₂)_D, the term p(α) in (3) can be expressed in terms of the eigenvalues and eigenvectors of Σ₂ in the following way:

p(α) = ∑ᵢ₌₁ᴰ cᵢ / [α(λᵢ − 1) + 1]³,    (5)

where cᵢ = λᵢ (µ₂ᵀ ξᵢ)², and the λᵢ's and ξᵢ's are the eigenvalues and corresponding eigenvectors of the matrix Σ₂.

Proof. See Appendix.

We will now state the following property of invariance of mixture modality under rotation.

Theorem 5. The modality of the mixture NM(0, I, µ₂, Σ₂)_D is the same as that of the mixture NM(0, I, µ₀, Λ)_D, with µ₀ᵀ = (µ₂ᵀξ₁, µ₂ᵀξ₂, …, µ₂ᵀξ_D) and Λ = diag(λ₁, λ₂, …, λ_D), where (λᵢ, ξᵢ), i = 1, …, D, are the eigenvalue-eigenvector pairs of Σ₂.

Proof. Using µ₀ and Λ in Result 2, it is easy to check that the p(α) of the mixtures NM(0, I, µ₂, Σ₂)_D and NM(0, I, µ₀, Λ)_D have the same expression, and hence the same number of roots, which implies that the two mixtures will have the same modality.
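Result 2 and Theorem 5 can be exercised numerically in two dimensions, where the eigen-decomposition of a symmetric matrix has a closed form. The sketch below (plain Python; the matrix and mean values are illustrative choices of ours) computes p(α) from (5), once from Σ₂ directly and once from the rotated pair (µ₀, Λ), and the two agree:

```python
import math

def eig_sym2(a, b, d):
    # eigenvalues/eigenvectors of the symmetric 2x2 matrix [[a, b], [b, d]]
    h = 0.5 * (a + d)
    disc = math.sqrt(max(0.25 * (a - d) ** 2 + b * b, 0.0))
    l1, l2 = h + disc, h - disc
    if abs(b) < 1e-12:
        u = (1.0, 0.0) if a >= d else (0.0, 1.0)
    else:
        n = math.hypot(l1 - d, b)
        u = ((l1 - d) / n, b / n)
    return (l1, l2), (u, (-u[1], u[0]))  # second eigenvector orthogonal to the first

def p_alpha(alpha, mu, sigma):
    # p(alpha) = sum_i c_i / [alpha (lam_i - 1) + 1]^3 with c_i = lam_i (mu' xi_i)^2
    (l1, l2), (x1, x2) = eig_sym2(sigma[0][0], sigma[0][1], sigma[1][1])
    total = 0.0
    for lam, xi in ((l1, x1), (l2, x2)):
        c = lam * (mu[0] * xi[0] + mu[1] * xi[1]) ** 2
        total += c / (alpha * (lam - 1.0) + 1.0) ** 3
    return total

mu2 = (1.0, 2.0)
sigma2 = [[2.0, 0.5], [0.5, 1.0]]
(l1, l2), (x1, x2) = eig_sym2(2.0, 0.5, 1.0)
mu0 = (mu2[0] * x1[0] + mu2[1] * x1[1], mu2[0] * x2[0] + mu2[1] * x2[1])
lam_diag = [[l1, 0.0], [0.0, l2]]
print(abs(p_alpha(0.3, mu2, sigma2) - p_alpha(0.3, mu0, lam_diag)))  # ~0
```

Since q(α) = 1 − αᾱ p(α), identical p-functions immediately give identical modality, which is exactly the content of the proof of Theorem 5.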

For illustration, we will now apply the rotation described in Theorem 5 to the scaled version of Example 1, whose first component is a standard normal. Example 1 gives the numerical values of the parameters after scaling, and Figure 3 shows the contour plots of the mixtures before and after rotation.

Example 2. (Continuation of Example 1) Applying the rotation transformation described in Theorem 5 to the mixture with parameters

µ₁ = 0, Σ₁ = I, µ₂ = , Σ₂ = ,

we get the mixture with parameters

µ₁ = 0, Σ₁ = I, µ₀ = , Λ =     (6)

The contour plot in Figure 3(a) depicts the unrotated mixture NM(0, I, µ₂, Σ₂), whereas Figure 3(b) shows the contours of the rotated mixture NM(0, I, µ₀, Λ). Algebraically, the rotation that achieves the diagonal covariance of the second component is equivalent to using the orthonormal matrix P, whose columns are the eigenvectors of the covariance matrix Σ₂, to rotate the random variable. In fact, in two dimensions it has a very simple interpretation. We simply rotate the mixture contour around the origin (0, 0) such that the major axis of the ellipse from the contour of the second component is parallel to the x-axis. This automatically sets the minor axis parallel to the y-axis, resulting in a diagonal covariance matrix for the second component (see Figure 3). Note that this rotation does not affect the covariance matrix of the first component, as it remains an identity matrix. Finally we combine Theorem 4 and Theorem 5 to state the following corollary.

Corollary 1. The modality of any arbitrary two-component normal mixture is equal to that of a mixture of the form NM(0, I, µ₀, Λ), where Λ is diagonal.

Proof. First apply Theorem 4 to scale any mixture to the form NM(0, I, µ, Σ), and then apply Theorem 5 to rotate it to the form NM(0, I, µ₀, Λ).

4 Number of modes of a two-component multivariate normal mixture

In this section we will first focus our attention on exploring the modality of normal mixtures of the simplified form NM(0, I, µ, Λ)_D.
We will restrict ourselves to this small class of mixtures, as we have already shown in Section 3 that the modality of any two-component normal mixture is equivalent to the modality of a corresponding mixture of the form NM(0, I, µ, Λ)_D.
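In two dimensions the reduction of Corollary 1 can be carried out explicitly with closed-form symmetric 2×2 eigen-decompositions. The sketch below (plain Python; the input parameters are illustrative and the helper names are ours, not the paper's) computes Σ₂* = Σ₂^(1/2) Σ₁⁻¹ Σ₂^(1/2) and then a matching µ₂*; any map A with AΣ₁Aᵀ = I and AΣ₂Aᵀ = Σ₂* works, and we use A = (Σ₂*)^(1/2) Σ₂^(−1/2):

```python
import math

def eig_sym2(m):
    # eigen-decomposition of a symmetric 2x2 matrix [[a, b], [b, d]]
    a, b, d = m[0][0], m[0][1], m[1][1]
    h = 0.5 * (a + d)
    disc = math.sqrt(max(0.25 * (a - d) ** 2 + b * b, 0.0))
    l1, l2 = h + disc, h - disc
    if abs(b) < 1e-12:
        u = (1.0, 0.0) if a >= d else (0.0, 1.0)
    else:
        n = math.hypot(l1 - d, b)
        u = ((l1 - d) / n, b / n)
    return (l1, l2), (u, (-u[1], u[0]))

def fun_sym2(m, f):
    # apply f to a symmetric positive definite 2x2 matrix through its spectrum
    (l1, l2), (u, v) = eig_sym2(m)
    f1, f2 = f(l1), f(l2)
    return [[f1 * u[0] * u[0] + f2 * v[0] * v[0], f1 * u[0] * u[1] + f2 * v[0] * v[1]],
            [f1 * u[1] * u[0] + f2 * v[1] * v[0], f1 * u[1] * u[1] + f2 * v[1] * v[1]]]

def matmul2(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def standardize(mu1, sig1, mu2, sig2):
    # Theorem 4 reduction: returns (mu2*, Sigma2*)
    s2h = fun_sym2(sig2, math.sqrt)             # Sigma2^{1/2}
    s2hi = fun_sym2(sig2, lambda l: l ** -0.5)  # Sigma2^{-1/2}
    s1i = fun_sym2(sig1, lambda l: 1.0 / l)     # Sigma1^{-1}
    sig2s = matmul2(matmul2(s2h, s1i), s2h)     # Sigma2^{1/2} Sigma1^{-1} Sigma2^{1/2}
    A = matmul2(fun_sym2(sig2s, math.sqrt), s2hi)
    dx, dy = mu2[0] - mu1[0], mu2[1] - mu1[1]
    mu2s = (A[0][0] * dx + A[0][1] * dy, A[1][0] * dx + A[1][1] * dy)
    return mu2s, sig2s

mu2s, sig2s = standardize((0.0, 0.0), [[2.0, 0.3], [0.3, 1.0]],
                          (1.0, 2.0), [[1.0, 0.2], [0.2, 3.0]])
(lam1, lam2), _ = eig_sym2(sig2s)  # the diagonal entries of Lambda in Theorem 5
```

The eigenvalues of Σ₂* equal those of Σ₁⁻¹Σ₂ (the two matrices are similar), so the Λ produced here is the diagonal matrix whose distinct entries drive the mode bound of the next subsection.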

Figure 3: Contour plots for the bivariate normal mixture of Example 2, (a) before and (b) after rotation.

First we will show that the maximum number of modes is a function of d, the number of distinct diagonal entries in Λ, by first showing that the maximum number of modes is less than or equal to (d + 1), and then showing that the upper bound (d + 1) is achievable. It is easy to check that d can be equal to the dimension D, and thus we arrive at the final result on the upper bound of the number of modes of an arbitrary D-dimensional mixture.

4.1 Upper bound on the number of modes of a two-component normal mixture

Recall that the number of modes can be directly enumerated using the number of solutions of q(α) = 1 − α(1 − α) p(α) = 0 within the range [0, 1]. Using the simplified form of p(α) given in (5) for mixtures of the form NM(0, I, µ, Λ)_D, we can simplify q(α) as

q(α) = 1 − α(1 − α) ∑ᵢ₌₁ᴰ cᵢ / [α(λᵢ − 1) + 1]³ = 0,

where the λᵢ's are the diagonal elements of Λ and cᵢ = λᵢ µᵢ².

To find the roots of q(α), we first state the following lemma.

Lemma 1. The number of solutions of

q(α) = 1 − α(1 − α) ∑ᵢ₌₁ᴰ cᵢ / [α(λᵢ − 1) + 1]³ = 0,

where α ∈ [0, 1], is exactly equal to the number of non-negative solutions of the equation

q*(t) = 1 − t(t + 1) ∑ᵢ₌₁ᴰ cᵢ / (t + λᵢ)³ = 0.

Proof. Define α = 1/(t + 1); then t ∈ [0, ∞) corresponds to α ∈ (0, 1], and it is easy to check that q(α) = q*(t). Since q(0) = 1, no root of q(α) is lost at α = 0.

This simple change of variable from α to t allows us to relate the number of modes to the non-negative solutions of q*(t) = 0, instead of the more difficult problem of finding solutions in the restricted interval [0, 1] for q(α) = 0. This simplification will enable us to find the upper bound on the number of modes, and will also allow us to recursively construct extra modes in extra dimensions. We will now use the mixture density given in (6) to illustrate the result in Lemma 1.

Example 3. (Continuation of Examples 1 and 2) After scaling and rotation, the modality of Example 1 is equivalent to that of the mixture with parameters

µ₁ = , Σ₁ = , µ₂ = , Σ₂ =

For the above mixture

q(α) = 1 − α(1 − α) [ c₁/(19α + 1)³ + c₂/(1 − 0.95α)³ ].

Using the change of variable α = 1/(t + 1), we have

q*(t) = 1 − t(t + 1) [ c₁/(t + 20)³ + c₂/(t + 0.05)³ ].

Solving the equation q(α) = 0 gives 4 solutions α₁, α₂, α₃, α₄ in the range [0, 1], while the equation q*(t) = 0 also has 4 non-negative solutions t₁, t₂, t₃, t₄. As a visual aid we have also presented the curves q(α) and q*(t), along with their zero crossings, in Figure 4. As we are only interested in the positive solutions of q*(t), we have changed the axis of t to log(t) to accommodate the wide range of t. In fact the solutions for Example 3 in log scale are symmetric: log(t₁) = −5.821, log(t₂) = −1.822, log(t₃) = 1.822, log(t₄) = 5.821.
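The change of variable in Lemma 1 is easy to check numerically. The sketch below (plain Python; the eigenvalues mirror Example 3's λ = 20 and 0.05, but the coefficients c₁, c₂ are our own illustrative choices, not the paper's values) counts the sign changes of q(α) on [0, 1] and of q*(t) on a large positive interval:

```python
def q_alpha(alpha, lam, c):
    # q(alpha) = 1 - alpha(1 - alpha) sum_i c_i / [alpha(lam_i - 1) + 1]^3
    s = sum(ci / (alpha * (li - 1.0) + 1.0) ** 3 for li, ci in zip(lam, c))
    return 1.0 - alpha * (1.0 - alpha) * s

def q_star(t, lam, c):
    # q*(t) = 1 - t(t + 1) sum_i c_i / (t + lam_i)^3
    return 1.0 - t * (t + 1.0) * sum(ci / (t + li) ** 3 for li, ci in zip(lam, c))

def sign_changes(f, lo, hi, n=200000):
    # count sign changes of f over a fine grid of [lo, hi]
    prev, k = f(lo), 0
    for i in range(1, n + 1):
        cur = f(lo + (hi - lo) * i / n)
        if prev * cur < 0:
            k += 1
        prev = cur
    return k

lam = [20.0, 0.05]  # eigenvalues as in Example 3
c = [2.0, 2.0]      # illustrative coefficients (not the paper's values)
n_alpha = sign_changes(lambda a: q_alpha(a, lam, c), 0.0, 1.0)
n_t = sign_changes(lambda t: q_star(t, lam, c), 0.0, 1000.0)
print(n_alpha, n_t)  # equal counts, as Lemma 1 predicts
```

Note that a grid search like this can in principle miss closely spaced roots, which is precisely why the paper pursues the analytical bound instead.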

Figure 4: Plots of (a) q(α) against α and (b) q*(t) against log(t) for the mixture given in Example 3.

Now we state the important result relating the number of non-negative solutions of q*(t) = 0, and hence the number of modes, to the number of unique diagonal entries of Λ, which equals the number of distinct eigenvalues of Σ₂.

Lemma 2. Consider mixtures of the type NM(0, I, µ, Σ₂)_D. Suppose Σ₂ has d (d ≤ D) distinct eigenvalues; then, irrespective of the value of µ, there are at most 2d non-negative solutions of the corresponding equation q*(t) = 0.

Proof. Let the d distinct eigenvalues of Σ₂ be λ₁, …, λ_d. Let us denote the upper bound on the number of real roots of q*(t) by O, and the lower bound on the number of its negative roots by N. We are interested in finding an upper bound for the number of non-negative roots, i.e., O − N. We will calculate the two bounds in two separate steps. Within each step we will consider two separate cases: one where all the eigenvalues are distinct from 1, and the other where one of the d distinct eigenvalues is equal to 1.

Step 1. To enumerate the upper bound on the number of real roots of the rational function q*(t), we transform it into a polynomial, whose roots are easier to enumerate.

Case 1: If λᵢ ≠ 1 for all i = 1, …, d, the multiplier for converting q*(t) = 0 into a polynomial equation is ∏ᵢ₌₁ᵈ (t + λᵢ)³, and as the highest order of the polynomial q*(t) ∏ᵢ₌₁ᵈ (t + λᵢ)³ is 3d, we have O = 3d.

Case 2: On the other hand, if λi = 1 for some i ∈ {1, …, d}, the multiplier converting q*(t) = 0 into a polynomial equation is ∏_{i=1}^{d} (t + λi)³/(t + 1), and the highest order of the polynomial q*(t) ∏_{i=1}^{d} (t + λi)³/(t + 1) is 3d − 1, giving O = 3d − 1.

Hence, the equation q*(t) = 0 has at most O real solutions, where

    O = 3d, if λi ≠ 1 for all i ∈ {1, …, d};
        3d − 1, if λi = 1 for some i ∈ {1, …, d}.    (7)

Step 2. To find the lower bound on the number of negative roots, we first note that

    q*(t) = 0 ⟺ 1/(t(t + 1)) = ∑_{i=1}^{D} c_i/(t + λi)³ ⟺ 1/t = 1/(t + 1) + ∑_{i=1}^{D} c_i/(t + λi)³.

Thus the solutions of q*(t) = 0 are the crossings of the two curves 1/t and

    r(t) = 1/(t + 1) + ∑_{i=1}^{D} c_i/(t + λi)³

(see Figure 5 for an illustration). Let us denote the right limit of a function f at the point t, lim_{x→t+} f(x), by f(t+), and similarly the left limit lim_{x→t−} f(x) by f(t−). Notice that r(t) is a rational function with c_i > 0 and λi > 0. Thus for each i = 1, 2, …, d we have a vertical asymptote, i.e., r((−λi)+) = +∞ and r((−λi)−) = −∞. Additionally we have r((−1)+) = +∞ and r((−1)−) = −∞. [See the dashed lines representing the asymptotes in Figure 5.] This implies that r(t) has several disjoint branches, and each branch traveling from −∞ at one asymptote to +∞ at the neighboring asymptote has to cross the line y = 0, and hence the curve 1/t, at least once. Now we discuss the two distinct cases.

Case 1: If λi ≠ 1 for all i = 1, …, d, the graph of r(t) has d + 1 asymptotes, one each at −λ1, …, −λd and −1. This gives rise to d + 2 disjoint branches, among which the d intermediate branches each have at least one crossing with the curve 1/t, which gives rise to at least d negative roots of q*(t) = 0; hence N = d.

Case 2: On the other hand, if λi = 1 for some i ∈ {1, …, d}, then there are only d − 1 distinct eigenvalues different from 1, and the graph of r(t) now has d + 1 branches, among which the d − 1 intermediate branches give rise to at least d − 1 negative solutions; hence N = d − 1.
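The branch argument of Step 2 can also be checked numerically. A sketch using the eigenvalue layout plotted in Figure 5 (λ = 2, 4, 8, 9); the unit weights c_i = 1 are an assumption, since the figure's coefficients are not printed:

```python
import numpy as np

# r(t) = 1/(t+1) + sum c_i/(t+lam_i)^3; zeros of g(t) = 1/t - r(t) are the
# solutions of q*(t) = 0
lam = [2.0, 4.0, 8.0, 9.0]
c = [1.0, 1.0, 1.0, 1.0]

def g(t):
    return 1.0 / t - 1.0 / (t + 1.0) - sum(ci / (t + li) ** 3 for ci, li in zip(c, lam))

# scan each open interval between neighbouring vertical asymptotes
asymptotes = sorted([-l for l in lam] + [-1.0])   # [-9, -8, -4, -2, -1]
found = 0
for left, right in zip(asymptotes[:-1], asymptotes[1:]):
    ts = np.linspace(left + 1e-6, right - 1e-6, 10001)
    vs = np.array([g(t) for t in ts])
    found += int(np.sum(np.sign(vs[:-1]) != np.sign(vs[1:])))
print(found)   # the branch argument guarantees at least d = 4 negative solutions
```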

Hence, the equation q*(t) = 0 has at least N negative solutions, where

    N = d, if λi ≠ 1 for all i ∈ {1, …, d};
        d − 1, if λi = 1 for some i ∈ {1, …, d}.    (8)

Combining (7) and (8), we see that in both cases there can be at most O − N = 2d non-negative solutions of the equation q*(t) = 0.

Figure 5: Plot showing the vertical asymptotes of r(t), for eigenvalues λi = 2, 4, 8, 9, and its crossings with the curve 1/t.

Finally we state the main theorem of this paper, giving the upper bound on the number of modes of a mixture of two normal components.

Theorem 6. The number of modes of the normal mixture NM(µ1, Σ1, µ2, Σ2)D is at most d + 1, where d is the number of distinct eigenvalues of the matrix Σ*2 = Σ2^{1/2} Σ1^{−1} Σ2^{1/2}, and hence the number of distinct eigenvalues of the ratio of the covariance matrices Σ2 and Σ1, denoted by Σ1^{−1} Σ2.

Proof. By Theorem 4 the modality of the mixture NM(µ1, Σ1, µ2, Σ2)D is the same as that of the mixture NM(0, I, µ*2, Σ2^{1/2} Σ1^{−1} Σ2^{1/2})D, where µ*2 is a vector of dimension D. Now using Lemma 2

we know that the corresponding q*(t), and hence q(α), will have at most 2d roots. Finally, using Result 1 we can show that NM(µ1, Σ1, µ2, Σ2)D has at most 2d/2 + 1 = d + 1 modes.

To show the second part, note that if λ is an eigenvalue of the matrix Σ*2 = Σ2^{1/2} Σ1^{−1} Σ2^{1/2}, then λ satisfies the equation |Σ*2 − λI| = 0. On the other hand,

    |Σ2 Σ1^{−1} − λI| = |Σ2^{1/2} (Σ*2 − λI) Σ2^{−1/2}| = |Σ2^{1/2}| |Σ*2 − λI| |Σ2^{−1/2}| = |Σ*2 − λI|.

Hence λ is an eigenvalue of the matrix Σ*2 if and only if λ is an eigenvalue of the matrix Σ2 Σ1^{−1}, which implies the second part of the theorem.

Theorem 7. Any D-dimensional normal mixture NM(µ1, Σ1, µ2, Σ2)D has at most D + 1 modes.

Proof. Σ*2 = Σ2^{1/2} Σ1^{−1} Σ2^{1/2} has D eigenvalues, hence d ≤ D. Using this inequality in Theorem 6 completes the proof.

4.2 Existence of D + 1 modes in D dimensions

In this subsection we show that it is always possible to find a mixture in any dimension which attains D + 1 modes. First we provide two examples, for D = 2 and D = 3, where the upper bound is achieved.

Remark 2. Example 1, with D = 2 and eigenvalues 20 and 0.05, achieves the upper bound on the number of modes for two-dimensional mixtures.

Example 4. Consider the three-dimensional example with 4 modes given in Ray and Lindsay (2005), with parameters

    µ1 = (0, 0, 0)′, µ2 = (1/√2, 2, 1/√2)′,    (9)

and Σ1, Σ2 as specified there. A straightforward calculation based on Theorem 4 shows that Σ*2 has eigenvalues 0.05, 1 and 20, i.e., D = d = 3. This mixture density has 4 modes, which again achieves the upper bound D + 1.

Though we have come up with examples achieving the upper bound in two and three dimensions, it is not easy to construct such pathological examples in higher dimensions. Hence we design a construction method which allows one to obtain one extra mode from each additional dimension. Starting from the fact that one can construct a mixture with two modes in one dimension (or using the examples in D = 2 and D = 3), one can use the

recursive relation to construct the parameters of a mixture in D dimensions which will have D + 1 modes.

Recall that Theorem 6 shows that in D dimensions the equation q*(t) = 0 can have at most 2D non-negative solutions, which in turn implies that the corresponding mixture can achieve at most D + 1 modes. Therefore, to achieve one extra mode in D + 1 dimensions we just need to choose the parameters of the mixture such that the corresponding equation q*(t) = 0 acquires two extra non-negative solutions. The following lemma provides the construction method for finding the two extra solutions of q*(t) = 0, starting from any dimension D.

Lemma 3. Let {(c_i, λ_i), i = 1, 2, …, D} be such that the equation

    y(t, D) = 1 − t(t + 1) ∑_{i=1}^{D} c_i/(t + λ_i)³ = 0

has 2D non-negative solutions. Then one can always find a pair of scalars (c_{D+1}, λ_{D+1}) such that

    y(t, D + 1) = 1 − t(t + 1) ∑_{i=1}^{D+1} c_i/(t + λ_i)³ = 0

has 2D + 2 non-negative solutions.

Proof. Note that y(t, D) = 0 is the same as q*(t) = 0 for D dimensions. Since y(t, D) = 0 has 2D non-negative solutions, and y(0, D) and lim_{t→∞} y(t, D) are both positive, y(t, D) changes sign 2D times on the positive t axis. Let y(t, D) be positive at the points t_0, t_2, …, t_{2D} = a and negative at the points t_1, t_3, …, t_{2D−1}, where 0 ≤ t_0 < t_1 < t_2 < ⋯ < t_{2D−1} < t_{2D} = a.

First we choose y_0 > 0 such that y_0 t_j(t_j + 1)(a + λ)³ < y(t_j, D)(t_j + λ)³ for all even j and all eigenvalues λ > 0. It can be verified that such a y_0 always exists. Then we choose t_{2D+1} > a such that 1/(t_{2D+1}(t_{2D+1} + 1)) < y_0/8, and finally we choose λ_{D+1} > max{λ_1, …, λ_D} such that (t_{2D+1} + λ_{D+1})/(a + λ_{D+1}) < 2, which will ensure that

    (t_{2D+1} + λ_{D+1})³ / (t_{2D+1}(t_{2D+1} + 1)(a + λ_{D+1})³) < y_0.    (10)

Now define c_{D+1} = y_0 (a + λ_{D+1})³. With the chosen pair (c_{D+1}, λ_{D+1}) we have

    Y(t_j) = y(t_j, D) − t_j(t_j + 1) c_{D+1}/(t_j + λ_{D+1})³  { > 0 for j even; < 0 for j odd },

i.e., Y(t) = y(t, D) − t(t + 1) c_{D+1}/(t + λ_{D+1})³ has the same sign as y(t, D) at the points t_0, t_1, …, t_{2D}, which means that Y(t) = 0 has 2D non-negative solutions, all of which are less than a = t_{2D}. On the other hand, we have

    Y(t_{2D+1}) = y(t_{2D+1}, D) − t_{2D+1}(t_{2D+1} + 1) c_{D+1}/(t_{2D+1} + λ_{D+1})³
                < 1 − t_{2D+1}(t_{2D+1} + 1) y_0 (a + λ_{D+1})³/(t_{2D+1} + λ_{D+1})³ < 0,

where the last inequality holds because of inequality (10). Hence Y(t) will be negative at the point t_{2D+1} > a, but lim_{t→∞} Y(t) > 0, so Y(t) = y(t, D + 1) = 0 has two more solutions than y(t, D) = 0, both of which are greater than a.

Remark 3. Note that the proof of the above lemma provides only one method of constructing the two extra non-negative solutions. These solutions are not unique.

The following corollary provides the recursive method for constructing an extra mode when the dimension of the mixture is increased by one.

Corollary 2. If a mixture of two normals in D dimensions has D + 1 modes, one can choose the parameters of the extra dimension such that the resulting (D + 1)-dimensional normal mixture will have D + 2 modes.

Proof. Use Theorems 4 and 5 to re-parametrize any mixture into the form NM(0, I, µ, Λ)D, where µ = (µ_1, …, µ_D)′ and Λ = diag(λ_1, …, λ_D), and then use Lemma 3 with c_i = λ_i µ_i² to compute (c_{D+1}, λ_{D+1}). The new mixture NM(0, I, µ = (µ_1, …, µ_D, µ_{D+1})′, Λ = diag(λ_1, …, λ_D, λ_{D+1}))_{D+1}, with µ_{D+1} = √(c_{D+1}/λ_{D+1}), will have D + 2 modes.

We now apply the method described in Corollary 2 to construct a 4-dimensional example with 5 modes, starting from the 3-dimensional case in Example 4.

Example 5. We first apply Theorem 4 to transform the 3-dimensional normal mixture given in (9) into the form NM(0, I, µ*2, Λ)_{D=3}, where

    µ*2 = (1/√2, 2, √10)′, Λ = diag(0.05, 1, 20).    (11)

Λ has d = 3 distinct eigenvalues, λ1 = 0.05, λ2 = 1, λ3 = 20, with corresponding c_i = λ_i µ_i² given by c1 = 0.025, c2 = 4, c3 = 200. Note that the equation

    q*(t) = y(t, 3) = 1 − t(t + 1) ∑_{i=1}^{3} c_i/(t + λ_i)³ = 0

has 6 positive solutions. Now we take 0 < t_0 < t_1 = 0.1 < t_2 = 0.3 < t_3 = 1 < t_4 = 3 < t_5 = 30 < t_6 = 200 = a such that y(t) is positive at the points t_0, t_2, t_4, t_6 and negative at the points t_1, t_3, t_5. Next choose y_0 such that y_0 t_j(t_j + 1)(a + λ)³ < y(t_j)(t_j + λ)³ for all even j and all λ > 0, take t_7 > a = 200 such that 1/(t_7(t_7 + 1)) < y_0/8, and let λ_4 > 20 be such that (t_7 + λ_4)/(a + λ_4) < 2. Let c_4 = y_0 (a + λ_4)³; the last component of the new 4-dimensional mean is then µ_4 = √(c_4/λ_4). This gives a 4-dimensional normal mixture NM(0, I, µ*2,new, Λ_new)_{D=4}, with

    µ*2,new = (1/√2, 2, √10, µ_4)′, Λ_new = diag(0.05, 1, 20, λ_4).

The corresponding equation

    q*(t) = 1 − t(t + 1) ∑_{i=1}^{4} c_i/(t + λ_i)³ = 0

has eight positive solutions, which implies the existence of five modes. Figure 6 shows q*(t) for the four-dimensional example along with its eight non-negative zero crossings. Among the eight crossings, the two on the right are obtained using the construction method of Corollary 2.

Remark 4. The construction process in Lemma 3 is designed to add two more positive solutions to the equation q*(t) = 0 when the dimension is increased, by adding another term to the summation without perturbing the original non-negative solutions too much. In Example 5 we started with six roots in three dimensions and constructed two extra roots in four dimensions. Among the six original roots, the first five remained exactly the same (to our precision), and the sixth shifted only by a small magnitude (0.001).
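These counts can be reproduced numerically, and the recipe itself can be exercised in the smallest case. A sketch; c3 = 200 is the inferred value used above, and the one-dimensional seed together with the constants y0, t3, lam2 in the second half are illustrative choices of our own, not from the paper:

```python
import numpy as np

def y(t, c, lam):
    # y(t, D) = 1 - t(t+1) sum c_i/(t + lam_i)^3
    t = np.asarray(t, dtype=float)
    s = sum(ci / (t + li) ** 3 for ci, li in zip(c, lam))
    return 1.0 - t * (t + 1.0) * s

def count_positive_roots(c, lam, lo=-6, hi=8, n=40001):
    grid = np.logspace(lo, hi, n)              # sign changes on a log grid
    v = y(grid, c, lam)
    return int(np.sum(np.sign(v[:-1]) != np.sign(v[1:])))

# the three-dimensional example: 6 positive roots -> 6/2 + 1 = 4 modes
c3d, lam3d = [0.025, 4.0, 200.0], [0.05, 1.0, 20.0]
print(count_positive_roots(c3d, lam3d))        # 6

# the Lemma 3 recipe in miniature, D = 1 -> D = 2:
c1d, lam1d = [9.0], [1.0]                      # two unit-variance normals 3 sds apart
a, y0 = 10.0, 1e-10                            # a bounds the old roots; y0 is small
t3, lam2 = 3.0e5, 3.1e5                        # t_{2D+1} and lambda_{D+1}
assert 1.0 / (t3 * (t3 + 1.0)) < y0 / 8.0      # condition on t_{2D+1}
assert (t3 + lam2) / (a + lam2) < 2.0          # condition on lambda_{D+1}
c2 = y0 * (a + lam2) ** 3                      # c_{D+1} = y0 (a + lambda_{D+1})^3

print(count_positive_roots(c1d, lam1d))                    # 2
print(count_positive_roots(c1d + [c2], lam1d + [lam2]))    # 4: one extra mode
```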

Figure 6: Plot of q*(t), which has eight positive roots, along with its zero crossings. Here q*(t) is plotted against log(t) because of the wide range of t.

Finally we arrive at the main theorem of the paper, Theorem 1, which establishes the tightness of the bound given in Theorem 7, using the following argument.

Proof of Theorem 1. The upper bound has already been shown in Theorem 7. To show that this bound can be achieved, we construct mixtures with D + 1 modes in any dimension. In one dimension, two normals with equal variance have two modes if the distance between their means is more than two times the common standard deviation. One can then use Corollary 2 repeatedly to construct one extra mode per dimension, resulting in exactly D + 1 modes in D dimensions.

4.3 Special Cases

The result given in Theorem 1 is the most general modality theorem available for a two-component normal mixture. Many previous modality results can be stated as special cases of this generalized result. In the corollaries which follow we show that our modality result can be used to duplicate some of the univariate and multivariate results found in the literature.

The study of the case D = 1, i.e., the mixture of two univariate normals, can be traced back to the early 20th century. For example, Helguero (1904) discussed the equal-variance case and Robertson and Fryer (1969) the unequal-variance case, and both showed that there exist at most 2 modes for a univariate normal mixture. Note that in both cases the two variances are either equal or proportional to one another in one dimension, and our result likewise shows that at most two modes are achievable. Some results

on the mixture of two higher-dimensional normals with equal or proportional covariance matrices have also been developed later. A recent result from Ray and Lindsay (2005) shows that for any dimension, a two-component normal mixture with proportional covariance matrices can have at most two modes. Our result confirms the result of Ray and Lindsay (2005), though with a different methodology.

Corollary 3. In any dimension, the mixture of two normal components with equal or proportional covariance matrices (Σ2 = cΣ1 for a scalar c > 0) can have at most two modes.

Proof. By Theorem 6 the maximum number of modes is one more than the number of distinct eigenvalues d of Σ*2 = Σ2^{1/2} Σ1^{−1} Σ2^{1/2}. For the equal or proportional case,

    Σ*2 = I if Σ2 = Σ1, and Σ*2 = cI if Σ2 = cΣ1.

In both cases all the eigenvalues are the same, so d = 1 and the mixture can have at most two modes.

Now we discuss some of the examples stated in Ray and Lindsay (2005). Both the two-dimensional example with three modes, whose parameters are given in Example 3, and the three-dimensional example with four modes in Example 4 were stated there merely as examples of the existence of more than two modes. Our results show that they actually achieve the upper bound possible within their respective dimensions. Moreover, the construction method of the examples in Ray and Lindsay (2005) was not easily generalizable to higher dimensions, whereas our construction algorithm described in Lemma 3 provides an easy strategy for constructing such examples.

5 Conclusion and discussion

In this paper we have developed a powerful theory for understanding the topography of a multivariate normal mixture model. The results on the upper bound are mainly focused on the two-component case, where we provide the clear upper bound of D + 1 for any D-dimensional normal mixture. Moreover, for any dimension one can produce a mixture which attains the upper bound.
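The bound is easy to evaluate for a given parameter structure. A small utility of our own (not from the paper) illustrating Theorem 6 and Corollary 3:

```python
import numpy as np

# d + 1 modes at most, where d is the number of distinct eigenvalues of
# Sigma_1^{-1} Sigma_2 (same spectrum as Sigma_2^{1/2} Sigma_1^{-1} Sigma_2^{1/2})
def mode_upper_bound(sigma1, sigma2, tol=1e-8):
    eig = np.sort(np.linalg.eigvals(np.linalg.solve(sigma1, sigma2)).real)
    d = 1 + int(np.sum(np.diff(eig) > tol))    # number of distinct eigenvalues
    return d + 1

# canonical form of the three-dimensional example: eigenvalues 0.05, 1, 20
print(mode_upper_bound(np.eye(3), np.diag([0.05, 1.0, 20.0])))   # 4

# Corollary 3: proportional covariances collapse the spectrum to one value
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
sigma1 = B @ B.T + 5.0 * np.eye(5)             # a random SPD matrix
print(mode_upper_bound(sigma1, 3.0 * sigma1))  # 2, in any dimension
```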
In this paper we have also shown that the number of modes of a two-component D-dimensional normal mixture NM(µ1, Σ1, µ2, Σ2)D is bounded above by one more than the number of distinct eigenvalues of the ratio matrix Σ2 Σ1^{−1}, irrespective of the means. In the course of this analysis we have not discussed how these new bounds and construction methods might be used for statistical purposes. We believe there is a wide area of application for these results. Given a parameter structure, one can easily compute the upper bound on the number of modes, which can be of enormous help to many clustering methods. The construction method may also come in handy for Bayesian prior elicitation.

Finally, the results give us a clear understanding of the interplay of component means and variances in shaping the topography of mixtures, which may be readily generalizable to mixtures of other distributions. We also note that a number of open mathematical questions remain. For example, mixtures of t distributions are often used as a robust alternative to mixtures of normals, but there are no available results on the number of modes of a mixture of t's. One should note that the contours of the t and the normal, which determine the number of modes, display very similar topographical structure, so one might be able to borrow the results on the topography of normals for exploring the topography of t mixtures. Using this intuition one might then generalize the results to any elliptical distribution. Finally, our results on the upper bound are mainly derived for K = 2. It would therefore be useful to establish relationships between the modality structure of the pairs of densities in a mixture and the overall modality of the entire mixture of K > 2 components. This generalization becomes challenging even for K = 3, where the resulting ridgeline manifold is two-dimensional and one may need to find the roots of an equation in two variables.

Acknowledgments: We thank Dr. David Fried of the Department of Mathematics and Statistics at Boston University for his assistance in solving the algebraic problems in this paper.

A Proof of Theorems and Results

A.1 Proof of Theorem 4

We only need to check that the function p(α) is the same for the two mixtures NM(µ1, Σ1, µ2, Σ2)D and NM(0, I, µ*2, Σ*2)D.

First note that

    S_α = αΣ1^{−1} + ᾱΣ2^{−1} = Σ2^{−1/2} (αΣ2^{1/2} Σ1^{−1} Σ2^{1/2} + ᾱI) Σ2^{−1/2},

which implies

    S_α^{−1} = Σ2^{1/2} (αΣ2^{1/2} Σ1^{−1} Σ2^{1/2} + ᾱI)^{−1} Σ2^{1/2}.

Thus

    Σ2^{−1/2} S_α^{−1} Σ2^{−1/2} = (αΣ2^{1/2} Σ1^{−1} Σ2^{1/2} + ᾱI)^{−1}.

Now for the mixture NM(µ1, Σ1, µ2, Σ2)D,
    p(α) = (µ2 − µ1)′ Σ1^{−1} S_α^{−1} Σ2^{−1} S_α^{−1} Σ2^{−1} S_α^{−1} Σ1^{−1} (µ2 − µ1)
         = (µ2 − µ1)′ Σ1^{−1} Σ2^{1/2} (Σ2^{−1/2} S_α^{−1} Σ2^{−1/2})³ Σ2^{1/2} Σ1^{−1} (µ2 − µ1)
         = (µ2 − µ1)′ Σ1^{−1} Σ2^{1/2} (αΣ2^{1/2} Σ1^{−1} Σ2^{1/2} + ᾱI)^{−3} Σ2^{1/2} Σ1^{−1} (µ2 − µ1).
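The matrix identity underlying this chain of equalities can be spot-checked numerically. A sketch with randomly generated SPD matrices (all names are our own):

```python
import numpy as np

rng = np.random.default_rng(1)
D = 4
A1 = rng.standard_normal((D, D)); sigma1 = A1 @ A1.T + D * np.eye(D)
A2 = rng.standard_normal((D, D)); sigma2 = A2 @ A2.T + D * np.eye(D)
mu = rng.standard_normal(D)                 # stands for mu_2 - mu_1
alpha, abar = 0.3, 0.7

inv1, inv2 = np.linalg.inv(sigma1), np.linalg.inv(sigma2)
S_inv = np.linalg.inv(alpha * inv1 + abar * inv2)        # (S_alpha)^{-1}

# first line: mu' Sigma1^{-1} S^{-1} Sigma2^{-1} S^{-1} Sigma2^{-1} S^{-1} Sigma1^{-1} mu
lhs = mu @ inv1 @ S_inv @ inv2 @ S_inv @ inv2 @ S_inv @ inv1 @ mu

# last line, via the symmetric square root of Sigma2
w, V = np.linalg.eigh(sigma2)
root2 = V @ np.diag(np.sqrt(w)) @ V.T                    # Sigma2^{1/2}
M = alpha * root2 @ inv1 @ root2 + abar * np.eye(D)
M_inv3 = np.linalg.matrix_power(np.linalg.inv(M), 3)
rhs = mu @ inv1 @ root2 @ M_inv3 @ root2 @ inv1 @ mu

print(np.allclose(lhs, rhs))   # True
```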


Algebra 2 (2006) Correlation of the ALEKS Course Algebra 2 to the California Content Standards for Algebra 2 Algebra 2 (2006) Correlation of the ALEKS Course Algebra 2 to the California Content Standards for Algebra 2 Algebra II - This discipline complements and expands the mathematical content and concepts of

More information

Principle Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA

Principle Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA Principle Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA Principle Components Analysis: Uses one group of variables (we will call this X) In

More information

CS168: The Modern Algorithmic Toolbox Lectures #11 and #12: Spectral Graph Theory

CS168: The Modern Algorithmic Toolbox Lectures #11 and #12: Spectral Graph Theory CS168: The Modern Algorithmic Toolbox Lectures #11 and #12: Spectral Graph Theory Tim Roughgarden & Gregory Valiant May 2, 2016 Spectral graph theory is the powerful and beautiful theory that arises from

More information

THE N-VALUE GAME OVER Z AND R

THE N-VALUE GAME OVER Z AND R THE N-VALUE GAME OVER Z AND R YIDA GAO, MATT REDMOND, ZACH STEWARD Abstract. The n-value game is an easily described mathematical diversion with deep underpinnings in dynamical systems analysis. We examine

More information

4. Linear transformations as a vector space 17

4. Linear transformations as a vector space 17 4 Linear transformations as a vector space 17 d) 1 2 0 0 1 2 0 0 1 0 0 0 1 2 3 4 32 Let a linear transformation in R 2 be the reflection in the line = x 2 Find its matrix 33 For each linear transformation

More information

av 1 x 2 + 4y 2 + xy + 4z 2 = 16.

av 1 x 2 + 4y 2 + xy + 4z 2 = 16. 74 85 Eigenanalysis The subject of eigenanalysis seeks to find a coordinate system, in which the solution to an applied problem has a simple expression Therefore, eigenanalysis might be called the method

More information

Honors Integrated Algebra/Geometry 3 Critical Content Mastery Objectives Students will:

Honors Integrated Algebra/Geometry 3 Critical Content Mastery Objectives Students will: Content Standard 1: Numbers, Number Sense, and Computation Place Value Fractions Comparing and Ordering Counting Facts Estimating and Estimation Strategies Determine an approximate value of radical and

More information

Review of Linear Algebra Definitions, Change of Basis, Trace, Spectral Theorem

Review of Linear Algebra Definitions, Change of Basis, Trace, Spectral Theorem Review of Linear Algebra Definitions, Change of Basis, Trace, Spectral Theorem Steven J. Miller June 19, 2004 Abstract Matrices can be thought of as rectangular (often square) arrays of numbers, or as

More information

Algebra 2 with Trigonometry Correlation of the ALEKS course Algebra 2 with Trigonometry to the Tennessee Algebra II Standards

Algebra 2 with Trigonometry Correlation of the ALEKS course Algebra 2 with Trigonometry to the Tennessee Algebra II Standards Algebra 2 with Trigonometry Correlation of the ALEKS course Algebra 2 with Trigonometry to the Tennessee Algebra II Standards Standard 2 : Number & Operations CLE 3103.2.1: CLE 3103.2.2: CLE 3103.2.3:

More information

Chapter 3 Transformations

Chapter 3 Transformations Chapter 3 Transformations An Introduction to Optimization Spring, 2014 Wei-Ta Chu 1 Linear Transformations A function is called a linear transformation if 1. for every and 2. for every If we fix the bases

More information

642:550, Summer 2004, Supplement 6 The Perron-Frobenius Theorem. Summer 2004

642:550, Summer 2004, Supplement 6 The Perron-Frobenius Theorem. Summer 2004 642:550, Summer 2004, Supplement 6 The Perron-Frobenius Theorem. Summer 2004 Introduction Square matrices whose entries are all nonnegative have special properties. This was mentioned briefly in Section

More information

Some Notes on Linear Algebra

Some Notes on Linear Algebra Some Notes on Linear Algebra prepared for a first course in differential equations Thomas L Scofield Department of Mathematics and Statistics Calvin College 1998 1 The purpose of these notes is to present

More information

Introduction. J.M. Burgers Center Graduate Course CFD I January Least-Squares Spectral Element Methods

Introduction. J.M. Burgers Center Graduate Course CFD I January Least-Squares Spectral Element Methods Introduction In this workshop we will introduce you to the least-squares spectral element method. As you can see from the lecture notes, this method is a combination of the weak formulation derived from

More information

Metric-based classifiers. Nuno Vasconcelos UCSD

Metric-based classifiers. Nuno Vasconcelos UCSD Metric-based classifiers Nuno Vasconcelos UCSD Statistical learning goal: given a function f. y f and a collection of eample data-points, learn what the function f. is. this is called training. two major

More information

1 ** The performance objectives highlighted in italics have been identified as core to an Algebra II course.

1 ** The performance objectives highlighted in italics have been identified as core to an Algebra II course. Strand One: Number Sense and Operations Every student should understand and use all concepts and skills from the pervious grade levels. The standards are designed so that new learning builds on preceding

More information

Definition 2.3. We define addition and multiplication of matrices as follows.

Definition 2.3. We define addition and multiplication of matrices as follows. 14 Chapter 2 Matrices In this chapter, we review matrix algebra from Linear Algebra I, consider row and column operations on matrices, and define the rank of a matrix. Along the way prove that the row

More information

Course Goals and Course Objectives, as of Fall Math 102: Intermediate Algebra

Course Goals and Course Objectives, as of Fall Math 102: Intermediate Algebra Course Goals and Course Objectives, as of Fall 2015 Math 102: Intermediate Algebra Interpret mathematical models such as formulas, graphs, tables, and schematics, and draw inferences from them. Represent

More information

Unit 2, Section 3: Linear Combinations, Spanning, and Linear Independence Linear Combinations, Spanning, and Linear Independence

Unit 2, Section 3: Linear Combinations, Spanning, and Linear Independence Linear Combinations, Spanning, and Linear Independence Linear Combinations Spanning and Linear Independence We have seen that there are two operations defined on a given vector space V :. vector addition of two vectors and. scalar multiplication of a vector

More information

MATHEMATICAL METHODS AND APPLIED COMPUTING

MATHEMATICAL METHODS AND APPLIED COMPUTING Numerical Approximation to Multivariate Functions Using Fluctuationlessness Theorem with a Trigonometric Basis Function to Deal with Highly Oscillatory Functions N.A. BAYKARA Marmara University Department

More information

n=0 xn /n!. That is almost what we have here; the difference is that the denominator is (n + 1)! in stead of n!. So we have x n+1 n=0

n=0 xn /n!. That is almost what we have here; the difference is that the denominator is (n + 1)! in stead of n!. So we have x n+1 n=0 DISCRETE MATHEMATICS HOMEWORK 8 SOL Undergraduate Course Chukechen Honors College Zhejiang University Fall-Winter 204 HOMEWORK 8 P496 6. Find a closed form for the generating function for the sequence

More information

Spectral Theorem for Self-adjoint Linear Operators

Spectral Theorem for Self-adjoint Linear Operators Notes for the undergraduate lecture by David Adams. (These are the notes I would write if I was teaching a course on this topic. I have included more material than I will cover in the 45 minute lecture;

More information

PURE MATHEMATICS AM 27

PURE MATHEMATICS AM 27 AM SYLLABUS (2020) PURE MATHEMATICS AM 27 SYLLABUS 1 Pure Mathematics AM 27 (Available in September ) Syllabus Paper I(3hrs)+Paper II(3hrs) 1. AIMS To prepare students for further studies in Mathematics

More information

Data Analysis and Manifold Learning Lecture 9: Diffusion on Manifolds and on Graphs

Data Analysis and Manifold Learning Lecture 9: Diffusion on Manifolds and on Graphs Data Analysis and Manifold Learning Lecture 9: Diffusion on Manifolds and on Graphs Radu Horaud INRIA Grenoble Rhone-Alpes, France Radu.Horaud@inrialpes.fr http://perception.inrialpes.fr/ Outline of Lecture

More information

MATH 353 LECTURE NOTES: WEEK 1 FIRST ORDER ODES

MATH 353 LECTURE NOTES: WEEK 1 FIRST ORDER ODES MATH 353 LECTURE NOTES: WEEK 1 FIRST ORDER ODES J. WONG (FALL 2017) What did we cover this week? Basic definitions: DEs, linear operators, homogeneous (linear) ODEs. Solution techniques for some classes

More information

Nonlinear Autonomous Systems of Differential

Nonlinear Autonomous Systems of Differential Chapter 4 Nonlinear Autonomous Systems of Differential Equations 4.0 The Phase Plane: Linear Systems 4.0.1 Introduction Consider a system of the form x = A(x), (4.0.1) where A is independent of t. Such

More information

8. Limit Laws. lim(f g)(x) = lim f(x) lim g(x), (x) = lim x a f(x) g lim x a g(x)

8. Limit Laws. lim(f g)(x) = lim f(x) lim g(x), (x) = lim x a f(x) g lim x a g(x) 8. Limit Laws 8.1. Basic Limit Laws. If f and g are two functions and we know the it of each of them at a given point a, then we can easily compute the it at a of their sum, difference, product, constant

More information

Eigenvalues and Eigenvectors

Eigenvalues and Eigenvectors Sec. 6.1 Eigenvalues and Eigenvectors Linear transformations L : V V that go from a vector space to itself are often called linear operators. Many linear operators can be understood geometrically by identifying

More information

Algebra Performance Level Descriptors

Algebra Performance Level Descriptors Limited A student performing at the Limited Level demonstrates a minimal command of Ohio s Learning Standards for Algebra. A student at this level has an emerging ability to A student whose performance

More information

ALGEBRA 2. Background Knowledge/Prior Skills Knows what operation properties hold for operations with matrices

ALGEBRA 2. Background Knowledge/Prior Skills Knows what operation properties hold for operations with matrices ALGEBRA 2 Numbers and Operations Standard: 1 Understands and applies concepts of numbers and operations Power 1: Understands numbers, ways of representing numbers, relationships among numbers, and number

More information

Rigid Geometric Transformations

Rigid Geometric Transformations Rigid Geometric Transformations Carlo Tomasi This note is a quick refresher of the geometry of rigid transformations in three-dimensional space, expressed in Cartesian coordinates. 1 Cartesian Coordinates

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

Generalized eigenvector - Wikipedia, the free encyclopedia

Generalized eigenvector - Wikipedia, the free encyclopedia 1 of 30 18/03/2013 20:00 Generalized eigenvector From Wikipedia, the free encyclopedia In linear algebra, for a matrix A, there may not always exist a full set of linearly independent eigenvectors that

More information

PURE MATHEMATICS AM 27

PURE MATHEMATICS AM 27 AM Syllabus (014): Pure Mathematics AM SYLLABUS (014) PURE MATHEMATICS AM 7 SYLLABUS 1 AM Syllabus (014): Pure Mathematics Pure Mathematics AM 7 Syllabus (Available in September) Paper I(3hrs)+Paper II(3hrs)

More information

Module 3. Function of a Random Variable and its distribution

Module 3. Function of a Random Variable and its distribution Module 3 Function of a Random Variable and its distribution 1. Function of a Random Variable Let Ω, F, be a probability space and let be random variable defined on Ω, F,. Further let h: R R be a given

More information

U.C. Berkeley CS294: Spectral Methods and Expanders Handout 11 Luca Trevisan February 29, 2016

U.C. Berkeley CS294: Spectral Methods and Expanders Handout 11 Luca Trevisan February 29, 2016 U.C. Berkeley CS294: Spectral Methods and Expanders Handout Luca Trevisan February 29, 206 Lecture : ARV In which we introduce semi-definite programming and a semi-definite programming relaxation of sparsest

More information

MATH 114 Calculus Notes on Chapter 2 (Limits) (pages 60-? in Stewart)

MATH 114 Calculus Notes on Chapter 2 (Limits) (pages 60-? in Stewart) Still under construction. MATH 114 Calculus Notes on Chapter 2 (Limits) (pages 60-? in Stewart) As seen in A Preview of Calculus, the concept of it underlies the various branches of calculus. Hence we

More information

On the Geometry of EM algorithms

On the Geometry of EM algorithms On the Geometry of EM algorithms David R. Hunter Technical report no. 0303 Department of Statistics Penn State University University Park, PA 16802-2111 email: dhunter@stat.psu.edu phone: (814) 863-0979

More information

Lecture for Week 2 (Secs. 1.3 and ) Functions and Limits

Lecture for Week 2 (Secs. 1.3 and ) Functions and Limits Lecture for Week 2 (Secs. 1.3 and 2.2 2.3) Functions and Limits 1 First let s review what a function is. (See Sec. 1 of Review and Preview.) The best way to think of a function is as an imaginary machine,

More information

Linear algebra for MATH2601: Theory

Linear algebra for MATH2601: Theory Linear algebra for MATH2601: Theory László Erdős August 12, 2000 Contents 1 Introduction 4 1.1 List of crucial problems............................... 5 1.2 Importance of linear algebra............................

More information

CS 6820 Fall 2014 Lectures, October 3-20, 2014

CS 6820 Fall 2014 Lectures, October 3-20, 2014 Analysis of Algorithms Linear Programming Notes CS 6820 Fall 2014 Lectures, October 3-20, 2014 1 Linear programming The linear programming (LP) problem is the following optimization problem. We are given

More information

TEACHER NOTES FOR ADVANCED MATHEMATICS 1 FOR AS AND A LEVEL

TEACHER NOTES FOR ADVANCED MATHEMATICS 1 FOR AS AND A LEVEL April 2017 TEACHER NOTES FOR ADVANCED MATHEMATICS 1 FOR AS AND A LEVEL This book is designed both as a complete AS Mathematics course, and as the first year of the full A level Mathematics. The content

More information

Scattered Data Interpolation with Polynomial Precision and Conditionally Positive Definite Functions

Scattered Data Interpolation with Polynomial Precision and Conditionally Positive Definite Functions Chapter 3 Scattered Data Interpolation with Polynomial Precision and Conditionally Positive Definite Functions 3.1 Scattered Data Interpolation with Polynomial Precision Sometimes the assumption on the

More information

The chromatic number of ordered graphs with constrained conflict graphs

The chromatic number of ordered graphs with constrained conflict graphs AUSTRALASIAN JOURNAL OF COMBINATORICS Volume 69(1 (017, Pages 74 104 The chromatic number of ordered graphs with constrained conflict graphs Maria Axenovich Jonathan Rollin Torsten Ueckerdt Department

More information

PARAMETER CONVERGENCE FOR EM AND MM ALGORITHMS

PARAMETER CONVERGENCE FOR EM AND MM ALGORITHMS Statistica Sinica 15(2005), 831-840 PARAMETER CONVERGENCE FOR EM AND MM ALGORITHMS Florin Vaida University of California at San Diego Abstract: It is well known that the likelihood sequence of the EM algorithm

More information

Abstract & Applied Linear Algebra (Chapters 1-2) James A. Bernhard University of Puget Sound

Abstract & Applied Linear Algebra (Chapters 1-2) James A. Bernhard University of Puget Sound Abstract & Applied Linear Algebra (Chapters 1-2) James A. Bernhard University of Puget Sound Copyright 2018 by James A. Bernhard Contents 1 Vector spaces 3 1.1 Definitions and basic properties.................

More information

Systems of Algebraic Equations and Systems of Differential Equations

Systems of Algebraic Equations and Systems of Differential Equations Systems of Algebraic Equations and Systems of Differential Equations Topics: 2 by 2 systems of linear equations Matrix expression; Ax = b Solving 2 by 2 homogeneous systems Functions defined on matrices

More information

Biost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation

Biost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation Biost 58 Applied Biostatistics II Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture 5: Review Purpose of Statistics Statistics is about science (Science in the broadest

More information

4. Determinants.

4. Determinants. 4. Determinants 4.1. Determinants; Cofactor Expansion Determinants of 2 2 and 3 3 Matrices 2 2 determinant 4.1. Determinants; Cofactor Expansion Determinants of 2 2 and 3 3 Matrices 3 3 determinant 4.1.

More information

ESCONDIDO UNION HIGH SCHOOL DISTRICT COURSE OF STUDY OUTLINE AND INSTRUCTIONAL OBJECTIVES

ESCONDIDO UNION HIGH SCHOOL DISTRICT COURSE OF STUDY OUTLINE AND INSTRUCTIONAL OBJECTIVES ESCONDIDO UNION HIGH SCHOOL DISTRICT COURSE OF STUDY OUTLINE AND INSTRUCTIONAL OBJECTIVES COURSE TITLE: Algebra II A/B COURSE NUMBERS: (P) 7241 / 2381 (H) 3902 / 3903 (Basic) 0336 / 0337 (SE) 5685/5686

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate

More information

The Cartan Dieudonné Theorem

The Cartan Dieudonné Theorem Chapter 7 The Cartan Dieudonné Theorem 7.1 Orthogonal Reflections Orthogonal symmetries are a very important example of isometries. First let us review the definition of a (linear) projection. Given a

More information