A Separability Index for Distance-based Clustering and Classification Algorithms
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. ?, NO. ?, JANUARY 201?

A Separability Index for Distance-based Clustering and Classification Algorithms

Arka P. Ghosh, Ranjan Maitra and Anna D. Peterson

Abstract: We propose a separability index that quantifies the degree of difficulty in a hard clustering problem under the assumption of a multivariate Gaussian distribution for each cluster. A preliminary index is first defined and several of its properties are explored both theoretically and numerically. Adjustments are then made to this index so that the final refinement is also interpretable in terms of the Adjusted Rand Index between a true grouping and its hypothetical idealized clustering, taken as a surrogate of clustering complexity. Our derived index is used to develop a data-simulation algorithm that generates samples according to a prescribed value of the index. This algorithm is particularly useful for systematically generating datasets with varying degrees of clustering difficulty, which can be used to evaluate the performance of different clustering algorithms. The index is also shown to be useful in providing a summary of the distinctiveness of classes in grouped datasets.

Index Terms: Separation index, multivariate Jaccard index, clustering complexity, exact-c-separation, radial visualization plots

1 INTRODUCTION

There is a large body of literature [1], [2], [3], [4], [5], [6], [7], [8], [9] dedicated to clustering the observations in a dataset into homogeneous groups, but no method uniformly outperforms the others. Many algorithms perform well in some settings but not in others, and the settings in which an algorithm works well or poorly are very often not well understood. Thus there is a need for a systematic study of the performance of any clustering algorithm, and also for evaluating the effectiveness of new methods using the same objective criterion.
Many researchers evaluate the performance of a proposed clustering technique by comparing its performance on classification datasets like textures [10], wine [11], Iris [12], crabs [13], image [14], E. coli [15] or [16]'s dataset. While evaluating a clustering algorithm through its performance on select classification datasets is useful, it does not provide a systematic and comprehensive understanding of its strengths and weaknesses over many scenarios. Thus, there is a need for ways to simulate datasets of different clustering difficulty and to calibrate the performance of a clustering algorithm under different conditions. In order to do so, we need to index the clustering complexity of a dataset appropriately. There have been some attempts at generating clustered data of different clustering complexity in terms of separability indices. [17] proposed a much-used algorithm [18], [19], [], [21], [22] that generates well-separated clusters from normal distributions over bounded ranges, with provisions for including scatter [23], non-informative dimensions, and outliers. However, [24] observed that both increasing the variance and adding outliers increase the degree of overlap in unpredictable and differing ways, and thus, this method is incapable of accurately generating indexed clustered data.

The authors are with the Department of Statistics, Iowa State University, Ames, IA, USA. © 2010 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org. Manuscript received June 21, 2010; revised October 21, 2010. R. Maitra's and A. D. Peterson's research was partially supported by the National Science Foundation under Grant Nos. CAREER DMS and VIGRE, respectively.
[25] proposed the OCLUS algorithm for generating clusters based on known (asymptotic) overlap by having the user provide a design matrix and an order matrix; the former indicates the (at most) triplets of clusters that are desired to overlap with each other, while the latter dictates the ordering of the clusters in each dimension. Although the idea of using overlap in generating clustered data is appealing, the algorithm has constraints beyond the structure of the design matrix above: for instance, independence between dimensions is also required. A separation index between any two univariate Gaussian clusters was proposed by [26]. For higher dimensions, they also used the same index on the 1-D transformation obtained after optimally projecting the two multivariate normal clusters onto 1-D space. For multiple clusters, their overall separation index (also the basis for their cluster generation algorithm [27]) is the maximum of all $\binom{n}{2}$ pairwise indices and is thus quite impervious to variations between other groups that are not in this maximizing pair. Additionally, characterizing separation between multi-dimensional clusters by means of the best 1-D projection loses substantial information: thus, resulting statements on cluster overlap can be very misleading. Finding the optimal 1-D projection is also computationally intensive and impractical for very high dimensions. [28], [29], [30] demonstrated the performance of their clustering algorithms using simulation datasets generated using the concept (or a variant) of c-separation between clusters proposed by [31], which defines two Gaussian distributions $N(\mu_1, \Sigma_1)$ and $N(\mu_2, \Sigma_2)$ in $\mathbb{R}^n$ as c-separated if $\|\mu_1 - \mu_2\| \ge c\sqrt{n \max(\lambda_{\max}(\Sigma_1), \lambda_{\max}(\Sigma_2))}$, where $\lambda_{\max}(\Sigma_i)$ is the largest eigenvalue of $\Sigma_i$. [32] formalized the concept to
exact-c-separation by requiring equality for at least one pair $(i, j)$ of clusters, and used it for generating datasets to calibrate some partitional initialization algorithms. He also pointed out some inherent shortcomings that originate from ignoring the relative orientations of the cluster dispersions. More recently, [33] proposed a method for generating Gaussian mixture distributions according to some summary measure of overlap between every component pair, defined as the unweighted sum of the probabilities of their individual misclassification rates. They also provided open-source C software (C-MixSim) and an R package (MixSim) for generating clusters corresponding to desired overlap characteristics. In contrast to many of the existing indices and simulation algorithms, their methodology does not impose any restriction on the parameters of the distributions, but it was derived entirely in the context of mixture models and model-based clustering algorithms. Thus, their methods are specifically geared toward soft clustering and model-based clustering scenarios. In this paper, we complement [33]'s scenario and derive a separation index (Section 2) for distance-based partitioning and hard clustering algorithms. Our index is motivated by the intuition that for any two well-separated groups, the majority of observations should be closer to their own center than to the other. We use Gaussian-distributed clusters and Euclidean and Mahalanobis distances to simplify our theoretical calculations. The preliminary index is investigated and fine-tuned in the context of homogeneous spherical clusters in Section 3.1 and then extended to the case of multiple groups. The methodology is then studied for the general case in Section 3.2. Our derived index can be used to quantify class distinctions in grouped data, and we illustrate this application in Section 4 in the context of several classification datasets. The main paper concludes with some discussion.
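The c-separation criterion of [31] quoted above is straightforward to evaluate numerically. The following is a minimal sketch; the function name and interface are ours, not from any of the cited packages:

```python
import numpy as np

def c_separation(mu1, sigma1, mu2, sigma2):
    """Largest c for which N(mu1, Sigma1) and N(mu2, Sigma2) in R^p are
    c-separated in the sense of [31]:
    ||mu1 - mu2|| >= c * sqrt(p * max(lmax(Sigma1), lmax(Sigma2))),
    where lmax(.) denotes the largest eigenvalue."""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    p = mu1.size
    lmax = max(np.linalg.eigvalsh(sigma1)[-1], np.linalg.eigvalsh(sigma2)[-1])
    return np.linalg.norm(mu1 - mu2) / np.sqrt(p * lmax)
```

For example, two unit-variance spherical clusters in the plane with means a distance 2 apart are c-separated for every $c$ up to $2/\sqrt{2}$; exact-c-separation [32] additionally requires equality for at least one pair of clusters.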
The paper also has an appendix detailing the algorithm that uses our index to generate datasets of desired clustering complexity.

2 METHODOLOGICAL DEVELOPMENT

Consider a dataset $S = \{X_1, X_2, \ldots, X_n\}$. The objective of hard clustering or fixed-partitioning algorithms is to group the observations into hard categories $C_1, C_2, \ldots, C_K$ such that some objective function measuring the quality of fit is minimized. If the objective function is specified in terms of minimizing the total distance of each observation from some characteristic of the assigned partition, then it is defined as
$$O_K = \sum_{i=1}^{n}\sum_{k=1}^{K} I(X_i \in C_k)\, D_k(X_i), \qquad (1)$$
where $D_k(X_i)$ is the distance of $X_i$ from the center of the $k$-th partition, assumed to be of the form
$$D_k(X_i) = (X_i - \mu_k)' \Delta_k (X_i - \mu_k), \qquad (2)$$
where $\Delta_k$ is a non-negative definite matrix of dimension $p \times p$. We consider two special cases here. In the first case, $\Delta_k = I_p$, the identity matrix of order $p \times p$; then $D_k(X_i)$ reduces to the squared Euclidean distance, and solving (1) involves finding partitions $C_1, C_2, \ldots, C_K$ of $S$ such that the sum of the squared Euclidean distances of each observation to the center of its assigned partition is minimized. The popular k-means algorithm [34], [35] provides locally optimal solutions to this minimization problem. In the second scenario, $\Delta_k = \Sigma_k^{-1}$, where $\Sigma_k$ is the dispersion matrix of the $k$-th partition; then $D_k(X_i)$ is the Mahalanobis distance between $X_i$ and $\mu_k$. In this paper, we adopt a convenient model-based formulation for the setup above. In this formulation, $X_1, X_2, \ldots, X_n$ are independent $p$-variate observations with $X_i \sim N_p(\mu_{\zeta_i}, \Sigma_{\zeta_i})$, where $\zeta_i \in \{1, 2, \ldots, K\}$ for $i = 1, 2, \ldots, n$. Here we assume that the $\mu_k$'s are all distinct and that $n_k$ is the number of observations in cluster $k$.
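The two special cases of (2), and the objective (1) built from them, can be written compactly in code; the following illustrative sketch (names are ours), where `prec` plays the role of $\Delta_k$:

```python
import numpy as np

def dist_sq(x, mu, prec):
    """D_k(x) = (x - mu)' Delta_k (x - mu) of eq. (2): prec = I_p gives the
    squared Euclidean distance, prec = inv(Sigma_k) the Mahalanobis form."""
    d = np.asarray(x, float) - np.asarray(mu, float)
    return float(d @ prec @ d)

def objective(X, labels, centers, precs):
    """O_K of eq. (1): total distance of each observation to the center
    of the partition it is assigned to."""
    return sum(dist_sq(x, centers[k], precs[k]) for x, k in zip(X, labels))
```

With every `prec` equal to the identity, minimizing `objective` over partitions is exactly the k-means criterion.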
Then the density of the $X_i$'s is given by $f(x) = \sum_{k=1}^{K} I(x \in C_k)\,\phi(x; \mu_k, \Sigma_k)$, where $C_k$ is the subpopulation indexed by the $N_p(\mu_k, \Sigma_k)$ density and $I(X \in C_k)$ is an indicator function specifying whether the observation $X$ belongs to the $k$-th group, having the $p$-dimensional multivariate normal density $\phi(x; \mu_k, \Sigma_k) \propto |\Sigma_k|^{-\frac{1}{2}} \exp\left(-\frac{1}{2}(x - \mu_k)'\Sigma_k^{-1}(x - \mu_k)\right)$, $k = 1, \ldots, K$. When $\Sigma_k = I_p$, maximizing the loglikelihood with respect to the parameters $\zeta_i$, $i = 1, 2, \ldots, n$, and the $\mu_k$'s is equivalent to solving (1) with $D_k(X_i)$ as the Euclidean distance. Using this formalism, we develop, in theoretical terms, a separation index that quantifies the separation between any two clusters and relates it to the difficulty in recovering the true partition $C_1, C_2, \ldots, C_K$ of a dataset. We begin by defining a preliminary index.

2.1 A preliminary separation index between two clusters

Consider the case with $K = 2$ groups, labeled $C_j$ and $C_\ell$ (for $j \ne \ell \in \{1, 2\}$). Define
$$Y_{j,\ell}(X) = D_j(X) - D_\ell(X), \quad \text{where } X \in C_\ell, \qquad (3)$$
and $Y_{\ell,j}(X)$ similarly. Using the modeling formulation above, $Y_{j,\ell}(X)$ is a random variable which represents the difference between the squared distances of $X \in C_\ell$ to the center of $C_j$ and to the center of $C_\ell$. For distance-based classification methods, $\Pr[Y_{j,\ell}(X) < 0]$ is the probability that an observation is classified into $C_j$ when in fact the observation is from $C_\ell$. Intuitively, one expects that since the $X_i$'s with $\zeta_i = \ell$, $i = 1, \ldots, n$, belong to $C_\ell$, most of these $n_\ell$ observations will be closer to the mean of $C_\ell$ than to the mean of $C_j$. Based on this observation, we define the index in terms of the probability that an $\alpha$ fraction of the observations are closer to the incorrect cluster center. In other words, we find the probability that an order statistic (say, the $\lfloor n_\ell\alpha \rfloor$-th, for $\alpha \in (0, 1)$) of $\{Y_{j,\ell}(X_i) : \zeta_i = \ell, i = 1, \ldots, n\}$ is less than 0. (Here $\lfloor x \rfloor$ is the greatest integer not exceeding $x$.) We denote this probability by $p_{j,\ell}$. To simplify notation, we will assume that $n_\ell\alpha$ is an integer.
Therefore,
$$p_{j,\ell} = \sum_{i=\lfloor n_\ell\alpha \rfloor}^{n_\ell} \binom{n_\ell}{i}\, \Pr[Y_{j,\ell}(X) < 0]^i\, \Pr[Y_{j,\ell}(X) > 0]^{n_\ell - i}, \qquad (4)$$
and $p_{\ell,j}$ is defined similarly. Since both these probabilities can be extremely small, we raise their average to an inverse power of a function of $n_j$ and $n_\ell$. We define the pairwise index as
$$I_{j,\ell} = 1 - \left(\tfrac{1}{2}\left(p_{\ell,j} + p_{j,\ell}\right)\right)^{1/n_{j,\ell}}, \qquad (5)$$
where $n_{j,\ell} = n_{\ell,j} = \left((n_\ell^2 + n_j^2)/2\right)^{1/2}$. Note that this index incorporates the class sizes, which is desirable since misclassification rates are affected by the relative sizes of the groups. Also, when $n_j = n_\ell = n$, then $n_{j,\ell} = n$. The index $I_{j,\ell}$ takes values in $[0, 1]$, with a value close to unity indicating a well-separated dataset and values closer to zero indicating that the two groups have substantial overlap. We call the index in (5) the preliminary index, because we will make suitable adjustments to it, obtained through simulations, in Section 3. The calculation of $I_{j,\ell}$ requires knowledge of the distribution of the random variables defined in (3), which is summarized in the theorem below for the different cases. Here and for the remainder of this paper, the notation $X \overset{d}{=} Y$ means that $X$ has the same distribution as $Y$.

Theorem 1: For any $\ell, j \in \{1, \ldots, K\}$, let $Y_{j,\ell}(X)$ be as defined in (3), for distance metrics $D_j$ and $D_\ell$ as defined in (2). Further assume $X \sim N_p(\mu_\ell, \Sigma_\ell)$. Then for specific choices of $\Delta_j$, $\Delta_\ell$ and for $\mu_{j\ell} = \mu_\ell - \mu_j$, the following hold:
a) For the Euclidean distance ($\Delta_j = \Delta_\ell = I$), $Y_{j,\ell}(X) \sim N\left(\mu_{j\ell}'\mu_{j\ell},\; 4\mu_{j\ell}'\Sigma_\ell\mu_{j\ell}\right)$.
b) For the Mahalanobis distance ($\Delta_j = \Sigma_j^{-1}$, $\Delta_\ell = \Sigma_\ell^{-1}$), when both clusters have identical covariance structures, i.e. $\Sigma_\ell = \Sigma_j \equiv \Sigma$, then $Y_{j,\ell}(X) \sim N\left(\mu_{j\ell}'\Sigma^{-1}\mu_{j\ell},\; 4\mu_{j\ell}'\Sigma^{-1}\mu_{j\ell}\right)$.
c) For the Mahalanobis distance ($\Delta_j = \Sigma_j^{-1}$, $\Delta_\ell = \Sigma_\ell^{-1}$), when the two clusters do not have identical covariance structures (i.e. $\Sigma_\ell \ne \Sigma_j$), let $\lambda_1, \lambda_2, \ldots, \lambda_p$ be the eigenvalues of $\Sigma_{j\ell} \equiv \Sigma_\ell^{\frac{1}{2}}\Sigma_j^{-1}\Sigma_\ell^{\frac{1}{2}}$, and let the corresponding eigenvectors be given by $\gamma_1, \gamma_2, \ldots, \gamma_p$.
Then $Y_{j,\ell}(X)$ has the distribution of
$$\sum_{i=1:\,\lambda_i \ne 1}^{p} \left[(\lambda_i - 1)\,U_i - \frac{\lambda_i\delta_i^2}{\lambda_i - 1}\right] + \sum_{i=1:\,\lambda_i = 1}^{p} \lambda_i\delta_i(2W_i + \delta_i),$$
where the $U_i$'s are independent non-central $\chi^2$ random variables with one degree of freedom and non-centrality parameter $\lambda_i^2\delta_i^2/(\lambda_i - 1)^2$, with $\delta_i = \gamma_i'\Sigma_\ell^{-\frac{1}{2}}(\mu_\ell - \mu_j)$ for $i \in \{1, 2, \ldots, p\} \cap \{i : \lambda_i \ne 1\}$, independent of the $W_i$'s, which are independent $N(0, 1)$ random variables for $i \in \{1, 2, \ldots, p\} \cap \{i : \lambda_i = 1\}$.

Proof: For any $p$-variate vector $X$,
$$D_j(X) - D_\ell(X) = (X - \mu_j)'\Delta_j(X - \mu_j) - (X - \mu_\ell)'\Delta_\ell(X - \mu_\ell) = X'(\Delta_j - \Delta_\ell)X + 2X'(\Delta_\ell\mu_\ell - \Delta_j\mu_j) + \mu_j'\Delta_j\mu_j - \mu_\ell'\Delta_\ell\mu_\ell. \qquad (6)$$
Therefore, when $\Delta_j = \Delta_\ell = I$, $Y_{j,\ell}(X) = D_j(X) - D_\ell(X) = 2X'(\mu_\ell - \mu_j) + \mu_j'\mu_j - \mu_\ell'\mu_\ell$, where $X \sim N_p(\mu_\ell, \Sigma_\ell)$. Simple algebra completes the proof of part a). A similar calculation using (6), when $\Delta_j = \Delta_\ell = \Sigma^{-1}$, shows that $Y_{j,\ell}(X) = 2X'\Sigma^{-1}(\mu_\ell - \mu_j) + \mu_j'\Sigma^{-1}\mu_j - \mu_\ell'\Sigma^{-1}\mu_\ell$, where $X \sim N_p(\mu_\ell, \Sigma_\ell)$. This completes the proof of part b) using arguments similar to those above. For part c), let $\xi \sim N_p(0, I)$. Since $X \overset{d}{=} \Sigma_\ell^{\frac{1}{2}}\xi + \mu_\ell$, using (6) we have
$$Y_{j,\ell}(X) = X'(\Sigma_j^{-1} - \Sigma_\ell^{-1})X + 2X'(\Sigma_\ell^{-1}\mu_\ell - \Sigma_j^{-1}\mu_j) + \mu_j'\Sigma_j^{-1}\mu_j - \mu_\ell'\Sigma_\ell^{-1}\mu_\ell \overset{d}{=} \xi'(\Sigma_{j\ell} - I)\xi + 2\xi'\Sigma_\ell^{\frac{1}{2}}\Sigma_j^{-1}(\mu_\ell - \mu_j) + (\mu_\ell - \mu_j)'\Sigma_j^{-1}(\mu_\ell - \mu_j), \qquad (7)$$
where $\Sigma_{j\ell} = \Sigma_\ell^{\frac{1}{2}}\Sigma_j^{-1}\Sigma_\ell^{\frac{1}{2}}$. Let the spectral decomposition of $\Sigma_{j\ell}$ be given by $\Sigma_{j\ell} = \Gamma_{j\ell}\Lambda_{j\ell}\Gamma_{j\ell}'$, where $\Lambda_{j\ell}$ is a diagonal matrix containing the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_p$ of $\Sigma_{j\ell}$, and $\Gamma_{j\ell}$ is an orthogonal matrix containing the eigenvectors $\gamma_1, \gamma_2, \ldots, \gamma_p$ of $\Sigma_{j\ell}$. Since $Z \equiv \Gamma_{j\ell}'\xi \sim N_p(0, I)$ as well, we get from (7) that
$$Y_{j,\ell}(X) \overset{d}{=} (\Gamma_{j\ell}'\xi)'(\Lambda_{j\ell} - I)(\Gamma_{j\ell}'\xi) + 2(\Gamma_{j\ell}'\xi)'\Lambda_{j\ell}\Gamma_{j\ell}'\Sigma_\ell^{-\frac{1}{2}}(\mu_\ell - \mu_j) + (\mu_\ell - \mu_j)'\Sigma_\ell^{-\frac{1}{2}}\Gamma_{j\ell}\Lambda_{j\ell}\Gamma_{j\ell}'\Sigma_\ell^{-\frac{1}{2}}(\mu_\ell - \mu_j) \overset{d}{=} \sum_{i=1}^{p}\left[(\lambda_i - 1)Z_i^2 + 2\lambda_i\delta_iZ_i + \lambda_i\delta_i^2\right], \qquad (8)$$
where the $\delta_i$, $i = 1, \ldots, p$, are as in the statement of the theorem. Depending on the values of the $\lambda_i$, one can simplify the expression in (8).
If $\lambda_i > 1$: $(\lambda_i - 1)Z_i^2 + 2\lambda_i\delta_iZ_i + \lambda_i\delta_i^2 = (\lambda_i - 1)\left(Z_i + \lambda_i\delta_i/(\lambda_i - 1)\right)^2 - \lambda_i\delta_i^2/(\lambda_i - 1)$, so this term is distributed as a $(\lambda_i - 1)\chi^2_{1,\,\lambda_i^2\delta_i^2/(\lambda_i - 1)^2}$ random variable shifted by $-\lambda_i\delta_i^2/(\lambda_i - 1)$. If $\lambda_i < 1$: $(\lambda_i - 1)Z_i^2 + 2\lambda_i\delta_iZ_i + \lambda_i\delta_i^2 = -(1 - \lambda_i)\left(Z_i + \lambda_i\delta_i/(\lambda_i - 1)\right)^2 + \lambda_i\delta_i^2/(1 - \lambda_i)$, which is distributed as a $-(1 - \lambda_i)\chi^2_{1,\,\lambda_i^2\delta_i^2/(\lambda_i - 1)^2}$ random variable shifted by $\lambda_i\delta_i^2/(1 - \lambda_i)$. In the case of $\lambda_i = 1$, $(\lambda_i - 1)Z_i^2 + 2\lambda_i\delta_iZ_i + \lambda_i\delta_i^2 = 2\lambda_i\delta_iZ_i + \lambda_i\delta_i^2 = \lambda_i\delta_i(2Z_i + \delta_i)$. The proof follows from incorporating these expressions and rearranging terms in (8).

Remark 2: Computing $\Pr[Y_{j,\ell}(X) < 0]$ in cases a) and b) of the theorem involves calculating Gaussian probabilities. For the third case (part c) of the theorem), this involves calculating the probability of a linear combination of independent non-central $\chi^2$ and Gaussian variables, for which we use Algorithm AS 155 of [36]. Once $\Pr[Y_{j,\ell}(X) < 0]$ and $\Pr[Y_{\ell,j}(X) < 0]$ are computed, the index can be calculated from (5), using $p_{j,\ell}$ and $p_{\ell,j}$ computed from (4).

Properties of our Preliminary Index

In this section, we highlight some properties of our preliminary index with regard to achievable values and scaling; these will be used for our second objective of simulating random configurations satisfying a desired value of our index.
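Following Remark 2, the Euclidean case admits a short closed-form computation of the preliminary index, combining Theorem 1a) with (4) and (5). A sketch (function names are ours; the heteroscedastic case c) would replace the normal probability with the AS 155 computation):

```python
import numpy as np
from scipy.stats import binom, norm

def preliminary_index(mu_j, mu_l, sigma_j, sigma_l, n_j, n_l, alpha=0.75):
    """I_{j,l} of eq. (5) for the Euclidean distance (Theorem 1a)."""
    def p_tail(mu_a, mu_b, sigma_b, n_b):
        # Y is Gaussian by Theorem 1a), so Pr[Y < 0] is a normal probability
        d = np.asarray(mu_b, float) - np.asarray(mu_a, float)
        q = norm.cdf(-(d @ d) / (2.0 * np.sqrt(d @ sigma_b @ d)))
        # eq. (4): Pr[at least floor(n_b * alpha) of the n_b observations
        # are closer to the wrong center]
        return binom.sf(int(np.floor(n_b * alpha)) - 1, n_b, q)

    p_jl = p_tail(mu_j, mu_l, sigma_l, n_l)
    p_lj = p_tail(mu_l, mu_j, sigma_j, n_j)
    n_jl = np.sqrt((n_j ** 2 + n_l ** 2) / 2.0)   # n_{j,l} of eq. (5)
    return 1.0 - (0.5 * (p_jl + p_lj)) ** (1.0 / n_jl)
```

Two well-separated spherical clusters give an index near 1, while nearly coincident clusters give a value near the floor $I^0_{j,\ell}$ of Theorem 3 below.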
Theorem 3: Fix $c > 0$, $\alpha \in (0, 1)$, and consider two clusters $C_\ell$ and $C_j$. Each $C_i$ has $n_i$ $p$-variate Gaussian observations with mean $c\mu_i$ and covariance $\Sigma_i$, $i \in \{j, \ell\}$. Then for specific choices of the distance metric, the following properties hold:
a) For the Euclidean distance ($\Delta_j = \Delta_\ell = I$), define $\theta_i \equiv \Phi\left(\sqrt{n_i}(1 - 2\alpha)\right)$, $i \in \{j, \ell\}$, and $I^0_{j,\ell} \equiv 1 - \left(\frac{\theta_j + \theta_\ell}{2}\right)^{1/n_{j,\ell}}$. Then for large $n_j$, $n_\ell$,
$$\lim_{c \to 0} I_{j,\ell} \approx I^0_{j,\ell} \quad \text{and} \quad \lim_{c \to \infty} I_{j,\ell} = 1. \qquad (9)$$
b) For the special case of the Mahalanobis distance with identical covariance structure (i.e., $\Delta_j = \Delta_\ell = \Sigma^{-1}$), define $\theta_j$, $\theta_\ell$, $I^0_{j,\ell}$ as in part a). Then for large $n_j$, $n_\ell$,
$$\lim_{c \to 0} I_{j,\ell} \approx I^0_{j,\ell} \quad \text{and} \quad \lim_{c \to \infty} I_{j,\ell} = 1. \qquad (10)$$
c) For the general Mahalanobis distance (i.e., $\Delta_j = \Sigma_j^{-1}$, $\Delta_\ell = \Sigma_\ell^{-1}$ with $\Sigma_\ell \ne \Sigma_j$), let $\lambda_1, \lambda_2, \ldots, \lambda_p$ be the eigenvalues of $\Sigma_\ell^{\frac{1}{2}}\Sigma_j^{-1}\Sigma_\ell^{\frac{1}{2}}$ and let $Z_1, \ldots, Z_p$ be independent $N(0, 1)$ random variables. Define, for $i \in \{j, \ell\}$, $\theta_i \equiv \Phi\left(\frac{\sqrt{n_i}(\kappa - \alpha)}{\sqrt{\kappa(1 - \kappa)}}\right)$, where $\kappa \equiv \Pr\left[\sum_{i=1}^{p}(\lambda_i - 1)Z_i^2 < 0\right]$. Also define $I^0_{j,\ell} \equiv 1 - \left(\frac{\theta_j + \theta_\ell}{2}\right)^{1/n_{j,\ell}}$. Then for large $n_j$, $n_\ell$,
$$\lim_{c \to 0} I_{j,\ell} \approx I^0_{j,\ell} \quad \text{and} \quad \lim_{c \to \infty} I_{j,\ell} = 1. \qquad (11)$$
Proof: First note that, for large $n_\ell$ and using the normal approximation to binomial probabilities,
$$p_{j,\ell} = \sum_{i=\lfloor n_\ell\alpha \rfloor}^{n_\ell}\binom{n_\ell}{i}\Pr[Y_{j,\ell}(X) < 0]^i\,\Pr[Y_{j,\ell}(X) > 0]^{n_\ell - i} \approx \Phi\left(\frac{n_\ell\Pr[Y_{j,\ell}(X) < 0] - \lfloor n_\ell\alpha \rfloor}{\sqrt{n_\ell\Pr[Y_{j,\ell}(X) < 0]\Pr[Y_{j,\ell}(X) > 0]}}\right). \qquad (12)$$
We get from Theorem 1a) that $Y_{j,\ell}(X) \sim N\left(c^2(\mu_\ell - \mu_j)'(\mu_\ell - \mu_j),\; 4c^2(\mu_\ell - \mu_j)'\Sigma_\ell(\mu_\ell - \mu_j)\right)$ and hence
$$\Pr[Y_{j,\ell}(X) < 0] = \Phi\left(-\frac{c\,(\mu_\ell - \mu_j)'(\mu_\ell - \mu_j)}{2\sqrt{(\mu_\ell - \mu_j)'\Sigma_\ell(\mu_\ell - \mu_j)}}\right).$$
Taking limits of both sides, we get, using the continuity of $\Phi(\cdot)$, $\lim_{c \to 0}\Pr[Y_{j,\ell}(X) < 0] = 0.5$ and $\lim_{c \to \infty}\Pr[Y_{j,\ell}(X) < 0] = 0$. This, together with (12) and the definition of $I_{j,\ell}$ in (5), completes the proof of part a). The proof of part b) is very similar to that of part a) and is omitted. For part c), note from Theorem 1c) that
$$Y_{j,\ell}(X) \overset{d}{=} \sum_{i=1}^{p}\left[(\lambda_i - 1)Z_i^2 + 2\lambda_i\delta_icZ_i + \lambda_i\delta_i^2c^2\right], \qquad (13)$$
where the $\delta_i$, $i = 1, \ldots, p$, are as defined there (using the fact that the means are $c\mu_j$ and $c\mu_\ell$ here).
Then by continuity arguments it follows that
$$\lim_{c \to 0}\Pr\left[Y_{j,\ell}(X) < 0\right] = \Pr\left[\sum_{i=1}^{p}(\lambda_i - 1)Z_i^2 < 0\right] = \kappa.$$
Note that the right-hand side of (13) is a quadratic in $c$ with leading coefficient $\sum_{i=1}^{p}\lambda_i\delta_i^2 > 0$. Hence, for large values of $c$ (in particular, when $c$ is larger than the largest root of the quadratic equation), the right-hand side of (13) is positive. Taking limits on both sides of (13) and using continuity arguments, we get $\lim_{c \to \infty}\Pr\left[Y_{j,\ell}(X) < 0\right] = 0$. This completes the proof of Theorem 3.

The following corollary is immediate from the proof above.

Corollary 4: Fix $\alpha \in (0, 1)$. For any $c > 0$, let $I_{j,\ell}(c) = I_{j,\ell}$ be our preliminary index of separation between two groups $C_\ell$ and $C_j$, each having $n_i$ $p$-variate Gaussian observations with mean $c\mu_i$ and covariance $\Sigma_i$, $i \in \{j, \ell\}$. For the different choices of the distance metric, let $I^0_{j,\ell}$ be as defined in the three parts of Theorem 3. Then $I_{j,\ell}(c)$ is a continuous function of $c$ with its range containing $(I^0_{j,\ell}, 1)$.

From Theorem 1 (see Remark 2) and Corollary 4, one can compute the value of the preliminary index for given values of the cluster means and dispersions. In real datasets, we can apply some clustering method to obtain the cluster memberships, estimate the cluster means and the variance structures, and compute an estimated version of the index. See Section 4 for one such application.

2.2 Generating Data with Given Values of the Index

From Corollary 4, it is clear that any value in $(I^0_{j,\ell}, 1)$ is a possible value of $I_{j,\ell}$ for a given set of model parameters $(c\mu, \Sigma, n_1, n_2)$, by suitably choosing the scaling parameter $c > 0$. This leads to the data-generation algorithm described in the appendix. The main idea is that, for a given target value of the index, we start with some initial set of parameters $(p, \mu, \Sigma, n_1, n_2)$ and compute our index for this initial configuration. We then find $c > 0$ iteratively so that the data with parameters $(c\mu, \Sigma, n_1, n_2)$ attain the target value.
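The iterative search for $c$ can be implemented with simple bisection, relying on the continuity and limits established in Theorem 3. The helper below is our own sketch; it additionally assumes the index does not decrease in $c$ over the search interval, which Corollary 4 does not guarantee in general:

```python
def calibrate_c(index_of_c, target, c_lo=1e-8, c_hi=1e4, iters=200):
    """Bisection for the scale c at which clusters with means c*mu_k attain
    a target index value; index_of_c(c) is any continuous, nondecreasing
    map from the scale c to the index."""
    for _ in range(iters):
        c_mid = 0.5 * (c_lo + c_hi)
        if index_of_c(c_mid) < target:
            c_lo = c_mid    # index too small: increase the separation
        else:
            c_hi = c_mid    # index large enough: shrink the separation
    return 0.5 * (c_lo + c_hi)
```

For instance, with the toy monotone map `lambda c: 1 - math.exp(-c)`, a target of 0.5 recovers $c = \log 2$.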
The algorithm described in the appendix gives a more general version of this for multiple clusters, as well as for the adjusted version of the index discussed in Section 3.

3 ILLUSTRATIONS AND ADJUSTMENTS

In this section, we provide some illustrative examples obtained by simulating realizations from groups under different scenarios using the preliminary index and the algorithm in the appendix. We study the relationship of the realizations obtained at these different values of the index with clustering difficulty, which we measure in terms of the Adjusted Rand index (R) of [37], obtained by clustering the observations in each dataset and comparing the result with the true classification. Note that by design, R takes an average value of zero if all observations are assigned to groups completely at random. A perfect grouping of the observations matching the truth, on the other hand, yields the highest value of R, namely unity.

3.1 Homogeneous Spherical Clusters

Here, it is assumed that all groups have the same spherical dispersion structure, i.e., $\Sigma_k = \sigma^2 I$ for all $k \in \{1, 2, \ldots, K\}$. In this section, the idealized R is calculated on each simulated dataset by comparing the true classification with that obtained
using the k-means algorithm of [35], started with the true (known) group means (in order to eliminate the effect of initialization on the obtained clustering). The obtained clustering may thus be regarded as the best possible grouping, and its ability (as measured by R) to recover the true classes may be considered an indication of the clustering complexity of the simulated dataset.

The Two-Groups Case

[Fig. 1. Simulated datasets at three different values of I. Colors and symbols represent the true cluster class. Above each set of plots is the value of I; below each individual plot is the obtained R (a-c: R = 0.83, 0.86, 0.83 at I = 0.6; d-f: R = 0.96, 0.96, 0.92 at I = 0.8; g-i: R = 1.0 at I = 1.0).]

Figures 1a-i display simulated datasets for different values of our preliminary index $I \equiv I_{1,2}$ with $\alpha = 0.75$. (See [38] for plots of additional realizations and at other values of I.) In each case, 100 observations were generated from each of two groups separated according to I. In the figures, color and plotting character are used to distinguish the true grouping. Figure 1 demonstrates that as the value of I increases, the clusters become more separated. This is confirmed by the values of R. The classifications of the datasets in Figures 1a-c have the lowest R, between 0.83 and 0.86. Each subsequent row down has a range of R higher than that of the previous row. Thus, there is some qualitative support for our measure of separation (I) as an indicator of difficulty; however, a more comprehensive analysis is needed before using I as a quantitative measure. Therefore, we conducted a simulation study to investigate the validity of our index as a quantitative surrogate for clustering complexity.
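The Adjusted Rand index R used in these comparisons can be computed from the contingency table of the two labelings; this is the standard Hubert and Arabie formula, sketched here in a self-contained form:

```python
import numpy as np

def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand index R of two labelings: 0 on average for a completely
    random assignment, 1 for a perfect recovery of the true grouping."""
    t = np.asarray(true_labels)
    p = np.asarray(pred_labels)
    ct = np.zeros((t.max() + 1, p.max() + 1))     # contingency table
    for i, j in zip(t, p):
        ct[i, j] += 1
    comb2 = lambda x: x * (x - 1) / 2.0           # "x choose 2", elementwise
    sum_cells = comb2(ct).sum()                   # pairs together in both
    a = comb2(ct.sum(axis=1)).sum()               # pairs together in truth
    b = comb2(ct.sum(axis=0)).sum()               # pairs together in clustering
    expected = a * b / comb2(t.size)
    return (sum_cells - expected) / (0.5 * (a + b) - expected)
```

Since cluster labels are interchangeable, a clustering that merely renames the groups still attains R = 1.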
Specifically, we simulated 25 datasets, each with observations from two groups, at each of 15 evenly-spaced values of I (using $\alpha = 0.75$) in $(0, 1)$, for nine different combinations of the number of observations per cluster and the number of dimensions. For each dataset obtained at each value of I, we calculated R as outlined above. Figures 2a-c display the results for $p = 10$, 100 and 1000, respectively, with I on the x-axis and the corresponding R on the y-axis. Color is used to indicate the number of observations per group in each dataset. In each figure, the shaded region for each color denotes the spread between the first and third quartiles of R based on the 25 datasets. From the figures, we note that I tracks R very well, again providing qualitative support for our index as a surrogate for clustering complexity. However, the relationship is not linear. There is also more variability in R when the number of observations is low. Further, the bands representing the spread between the first and third quartiles of R often do not overlap with a change in dimension. Thus, there is some inconsistency in the relationship between our preliminary index I and R across dimensions. Indeed, this inconsistency is most apparent when the number of observations in the dataset is less than the number of dimensions. This inconsistency is not terribly surprising, and is in line with the so-called curse of dimensionality and the need for larger sample sizes with increasing p [39] to obtain the same kind of clustering performance as in lower dimensions. In order to maintain interpretability across dimensions, we therefore investigate adjustments to our preliminary index to account for the effect of n and p.
We pursue this course in the remainder of this section.

An Initial Adjustment to I for Group Size and Dimension

To understand further the relationship between I and R, we simulated 25 datasets, each with observations from two homogeneous spherical groups, for all combinations of $(n_1, n_2, p)$, where $n_1 \le n_2$ (assumed without loss of generality) are the numbers of observations in the first and second groups, $p \in \{2, 4, 5, 10, 20, 50, 100, 200, 500, 1000\}$, and $(n_1, n_2) \in \{(20, 20), (50, 50), (75, 75), (100, 100), (200, 200), (500, 500), (1000, 1000), (30, 100), (\,,\,), (60, 75), (90, 100), (1\,, 2\,), (\,, 600), (100, 1000)\}$. We simulated datasets according to ten values of I evenly spaced between 0 and 1, and for $\alpha = 0.75$. For each of these 1000 datasets thus obtained, we again obtained R using k-means. We explored several relationships between I and R very extensively. Of these explorations, the following multiplicative relationship between I and R was found to perform the best:
$$\log\left(\frac{1}{1 - R}\right) \approx \exp(\delta_\alpha)\left[\log\left(\frac{1}{1 - I}\right)\right]^{\theta_\alpha}, \qquad (14)$$
where
$$\delta_\alpha = \sum_{1 \le i \le j \le k \le 4} \zeta_{\omega_i,\omega_j,\omega_k}\log(\omega_i)\log(\omega_j)\log(\omega_k), \qquad \theta_\alpha = \sum_{1 \le i \le j \le 4} \beta_{\omega_i,\omega_j}\log(\omega_i)\log(\omega_j),$$
and $(\omega_1, \omega_2, \omega_3, \omega_4) = (n_1, n_2, p, e)$. Then, for $\alpha = 0.75$, we fit the linear model
$$\log\left[\log\left(\frac{1}{1 - R}\right)\right] \approx \theta_\alpha\log\left[\log\left(\frac{1}{1 - I}\right)\right] + \delta_\alpha. \qquad (15)$$
[Fig. 2. Plots for K = 2 clusters comparing R (y-axis) against I (x-axis) for $\alpha = 0.75$, for (a) p = 10, (b) p = 100 and (c) p = 1000. The three colors designate the number of observations per cluster (with $n_1 = n_2$). The lower and upper bounds of the bands represent the first and third quartiles of R calculated on 25 datasets at each of several values of I.]

The estimates of the coefficients in (15) were obtained using least-squares, to obtain the best possible fit of I to R, and are displayed in Table 1 under the columns labeled "hom". (See [38] for the parameter estimates obtained at $\alpha$ = 0.25, 0.5.) This relationship has the property that a large value of I corresponds to an R that is close to 1, while values of I that are close to zero correspond to R's that are also close to zero. The sum of the estimates of the four three-way interaction terms for the n's is close to zero for each $\alpha$. This suggests that when $n_1 = n_2$, the cubed term for the n's is not very different from zero. Therefore, based on the parameter estimates in the left panel of Table 1, we define $R_{I,\alpha}$ as follows:
$$R_{I,\alpha} = \frac{\exp\left(\exp(\delta_\alpha)\left[\log\left(\frac{1}{1 - I}\right)\right]^{\theta_\alpha}\right) - 1}{\exp\left(\exp(\delta_\alpha)\left[\log\left(\frac{1}{1 - I}\right)\right]^{\theta_\alpha}\right)}. \qquad (16)$$
We call $R_{I,\alpha}$ our initial adjusted index. We now investigate its performance in indexing clustering complexity in a framework similar to that of Figures 1 and 2. Figure 3 illustrates two-cluster datasets simulated in a similar manner as in Figure 1, but using our initial adjusted index $R_{I,0.75}$. Similar is the case with Figure 4, which mimics the setup of Figure 2 (with the only difference being that the datasets are generated here using $R_{I,0.75}$ instead of I). In both cases, we note that the agreement between the numerical values of R and $R_{I,0.75}$ is substantially improved. In particular, Figures 3a-i show that the range of actual values of R contains the value of $R_{I,0.75}$. Also, Figures 4a-c demonstrate that $R_{I,0.75}$ tracks R very well. Similar is the case with $\alpha$ = 0.25 and 0.5
(see [38]), providing support for the use of $R_{I,\alpha}$ as a surrogate for clustering complexity. This support is consistent across different numbers of observations and dimensions. Note, however, that Figure 4 continues to indicate greater variability in the obtained R when there are fewer observations in a group. This is expected, because smaller sample sizes lead to results with higher variability.

[Fig. 3. Simulated datasets at different values of $R_{I,0.75}$ (given at the top of each row). Color and symbol represent the true class; the obtained R values are provided below each plot (a-c: R = 0.65, 0.64, 0.55 at $R_{I,0.75}$ = 0.6; d-f: R = 0.79, 0.81, 0.83 at $R_{I,0.75}$ = 0.8; g-i: R = 1.0 at $R_{I,0.75}$ = 1.0).]

The Case with Many Groups (K > 2)

So far we have analyzed and developed our index for two-class datasets. In this section, we extend it to the general multi-group case. Clearly, the separation between different pairs of clusters impacts clustering difficulty. We investigated several
[TABLE 1. Estimated parameter values used to adjust the index for n and p when $\alpha = 0.75$, for clusters with homogeneous spherical (hom) and general heterogeneous (het) dispersion structures. Fonts in the table represent bounds on the p-values (bold and underlined for a p-value < 0.001, bold and italic for a p-value < 0.01, bold for a p-value < 0.05, italic for a p-value < 0.1, and regular font otherwise). The table lists, under hom and het columns, the estimated coefficients $\zeta$ (e.g., $\zeta_{n_1,p}$, $\zeta_{n_2,p}$, $\zeta_{n_1,n_2}$, $\zeta_p$, $\zeta_{p,p}$, $\zeta_{n_1,n_1,n_2}$, $\zeta_{n_2,n_2,n_2}$) and $\beta$ (e.g., $\beta_{n_1,p}$, $\beta_{n_2,p}$, $\beta_{n_1,n_2}$, $\beta_p$, $\beta_{n_2,n_2}$).]

[Fig. 4. Plots for K = 2 clusters comparing R against $R_{I,0.75}$ for (a) p = 10, (b) p = 100 and (c) p = 1000. Colors and bands are as in Figure 2.]

possibilities for summarizing clustering difficulty based on all $\binom{K}{2}$ pairwise indices; however, they all possessed several drawbacks. For instance, the average pairwise separation is typically high, because many of the $R_{I,\alpha}$'s are close to 1, while the minimum is overly influenced by the presence of (only) two close groups. Therefore, we investigated an adaptation of the summarized multiple Jaccard similarity index proposed by [40]. The Jaccard coefficient of similarity [41] measures the similarity or overlap between two species or populations. This was extended by [40] to summarizing many pairwise indices for the case of multiple populations. Note that both the Jaccard index and its summarized multiple version address similarity or overlap between populations, while our pairwise index measures separability; this needs to be adapted for our case. Therefore, we define $R^{ii}_{I,\alpha} = 0$ for $i = 1, \ldots, K$. Further, for each pair of clusters $1 \le i, j \le K$, let $R^{ij}_{I,\alpha} = R^{ji}_{I,\alpha} \equiv R_{I,\alpha}$, i.e., the adjusted index defined using clusters $C_i$ and $C_j$. Also, let $\Upsilon = ((R^{ij}_{I,\alpha}))_{1 \le i,j \le K}$ be the matrix of the corresponding $R^{ij}_{I,\alpha}$'s.
Then we define the following summarized index for K clusters:
$$\bar{R}_{I,\alpha} = 1 - \frac{\lambda_1(J_K - \Upsilon) - 1}{K - 1}, \qquad (17)$$
where $J_K$ is the $K \times K$ matrix with all entries equal to unity, and $\lambda_1(J_K - \Upsilon)$ is the largest eigenvalue of the matrix $J_K - \Upsilon$. [40] motivates his summary using principal components analysis (PCA) in the context of a correlation matrix, where the first principal component is the orthogonal projection of the dataset that captures the greatest variability in the K coordinates. Like his summary, our summary index (17) has some very appealing properties. When the matrix of pairwise separation indices is $J_K - I_K$, i.e., $R^{ij}_{I,\alpha} = 1$ for all $i \ne j$, then the first (largest) eigenvalue captures only a $1/K$ proportion of the total of the eigenvalues; in this case, $\bar{R}_{I,\alpha} = 1$. On the other hand, when every element in the matrix is zero, there is perfect overlap between all groups, and only the largest eigenvalue is non-zero; in this case $\bar{R}_{I,\alpha} = 0$. Finally, when K = 2, $\bar{R}_{I,\alpha}$ is consistent with the (sole) pairwise index $R^{ij}_{I,\alpha}$.

Final Adjustments to the Separation Index

While (17) provides a summarized pairwise measure that is appealing in some special cases, we still need to investigate its performance as a surrogate measure of clustering complexity. We therefore performed a detailed simulation study of the relationship between the summarized $\bar{R}_{I,\alpha}$ and R. Specifically, we generated 25 K-cluster datasets for K = 3, 5 and 10, each with 100 dimensions, at each of 10 different values of $\bar{R}_{I,0.75}$ between 0 and 1, for equal numbers of observations in each group (i.e., $n_k \equiv n_0$), where $n_0 \in \{50, 100, 1000\}$. For each dataset, we used the k-means algorithm and computed the R of the resulting clustering. Figures 5a-c plot the interquartile ranges of R for each combination of K and $n_0$, with $\bar{R}_{I,\alpha}$ on the x-axis and R on the y-axis. Figures 5a-c demonstrate that $\bar{R}_{I,\alpha}$ tracks R. In addition, the relationship between $\bar{R}_{I,\alpha}$ and R is consistent for different
numbers of observations. However, it is also clear that the exact relationship between $R_{I,\alpha}$ and $R$ depends on $K$, so some more adjustments are called for.

Fig. 5. Plots of $R$ against $R_{I,0.75}$ for (a) K = 3, (b) K = 5 and (c) K = 10. The three colors designate numbers of observations per cluster, set to be equal and in $\{·, 100, 1000\}$. Other aspects of the plots are as in Figure 2.

Fig. 6. Plots of $R$ against $I$ for (a) K = 3, (b) K = 5 and (c) K = 10, when α = 0.75. Other aspects of the plot are as in Figure 5.

To study this issue further, we simulated 25 datasets for each combination of $p \in \{2, 4, 5, 10, ·, ·, 100, ·, ·, 1000\}$, $n_i = n_j \equiv n_0$ for all $i, j$, where $n_0 \in \{·, ·, 75, 100, ·, ·, 1000\}$, at ten evenly-spaced values of $R_{I,\alpha}$ in (0, 1), $\alpha \in \{0.25, 0.5, 0.75\}$ and $K \in \{3, 5, 7, 10\}$. For each dataset we calculated $R$. Using the $R$ from each of these datasets, and for each combination of $(p, \alpha, K)$, we fit the multiplicative model:
$$R \approx R_{I,\alpha}^{\beta_{k,p}}. \qquad (18)$$
Using the parameter estimates for $\beta_{k,p}$ for each combination of dimension and number of clusters, we then fit the following linear model separately for each tuning parameter:
$$\beta_{k,p} = \eta + \eta_1 k + \eta_2 k^2 + \eta_3 p + \eta_4 p^2 + \eta_5 kp. \qquad (19)$$
Parameter estimates for this model for the case of α = 0.75 are presented in Table 2 (see [38] for the parameter estimates when $\alpha \in \{0.25, 0.5\}$).

TABLE 2
Table of estimated parameter values to adjust the index for K and p when α = 0.75, for clusters with homogeneous spherical (hom) and general heterogeneous (het) dispersion structures. For the estimated parameters, two of the p-values are < 0.01 and the rest are < ·. For any two constants a and b, aE−b means $a \times 10^{-b}$. [The η, η₁, ..., η₅ estimates for the hom and het rows are largely unrecoverable from this extraction; surviving fragments indicate magnitudes of order 1E−4 to 1E−5.]

Thus the final version of our index, after all adjustments, for the case of homogeneous spherical clusters is
$$I = R_{I,\alpha}^{\beta_{k,p}}. \qquad (20)$$
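As a concrete check of the summary in (17) and the multiplicative adjustment in (18), the following sketch computes the eigenvalue-based summary for the two boundary cases discussed above, and recovers the exponent of (18) by least squares on the log scale. The pairwise matrices and the `(r_ia, r)` pairs are synthetic stand-ins, and `true_beta` is a hypothetical exponent, not a value from the paper.

```python
import numpy as np

def summarized_index(upsilon):
    """Summary index of Eq. (17): 1 - (lambda_max(J_K - Upsilon) - 1)/(K - 1)."""
    K = upsilon.shape[0]
    J = np.ones((K, K))
    lam_max = np.linalg.eigvalsh(J - upsilon)[-1]  # symmetric, so eigvalsh
    return 1.0 - (lam_max - 1.0) / (K - 1.0)

K = 4
perfect = np.ones((K, K)) - np.eye(K)   # all pairwise indices equal 1 -> index 1
overlap = np.zeros((K, K))              # all pairwise indices equal 0 -> index 0

# Fitting the multiplicative model R ~ R_{I,alpha}^beta of Eq. (18):
# on the log scale this is simple regression through the origin.
rng = np.random.default_rng(0)
true_beta = 1.4                          # hypothetical exponent for illustration
r_ia = rng.uniform(0.1, 0.95, 200)       # prescribed index values
r = r_ia ** true_beta * rng.lognormal(0.0, 0.02, 200)  # noisy observed R
x, y = np.log(r_ia), np.log(r)
beta_hat = (x @ y) / (x @ x)             # least-squares slope through the origin
```

For K = 2 with off-diagonal entry r, the summary reduces to r itself, matching the consistency property noted above; the second stage (19) would then regress the fitted exponents on k, k², p, p² and kp, e.g. with `np.linalg.lstsq`.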
Figures 6a-c are constructed similarly to Figures 5a-c, except that the datasets are now generated using the algorithm in the appendix (and using $I$ instead of $R_{I,0.75}$). Note that the effect of $K$ has been largely addressed and the relationship between $I$ and $R$ is fairly similar across dimensions.

Fig. 7. Four-component simulated datasets at different $I$-values, with color and plotting character used to represent the true class. Obtained $R$-values are displayed below each plot: for I = 0.6, (a) R = 0.57, (b) R = 0.58, (c) R = 0.58; for I = 0.8, (d) R = 0.80, (e) R = 0.82, (f) R = 0.76; for I = 1.0, (g) R = 1.0, (h) R = 1.0, (i) R = 1.0.

Illustrations: We now present, in Figures 7a-i, some simulated four-class datasets to demonstrate possible configurations obtained using the algorithm in the appendix for three different values of $I$. Additional realizations, and at different values of $I$, are in [38]. In each case we used α = 0.75. The different colors and characters in the figures represent the true classes. Note that the clusters become better separated as $I$ increases. Clustering complexity also decreases, as confirmed by the computed values of $R$. The datasets in the first row have the lowest $I$, with each subsequent row down having a range of $R$ higher than those in the previous row. In each row we see that $I$ is within the range of the actual $R$ index values. This provides further evidence that the transformed version of our index, in the form of $I$, is similar to $R$. We conclude this section with a few comments. Note that $I$ is strictly between 0 and 1. Further, let us denote by $I(c)$ the value of $I$ defined using (15), (18) and (20), but with $I_{i,j}$ replaced by $I(c)_{i,j}$ as in Corollary 4, for the cluster $C_i$ with $n_i$ $p$-variate Gaussian observations with mean $c\mu_i$ and covariance $\Sigma_i$, $i \in \{1, \ldots, K\}$. The following corollary implies that we can generate datasets with any value of the final index between $I(0)$ and unity using the algorithm in the appendix. Corollary 5: Fix α.
Let $c > 0$. Then for positive $\theta_\alpha$ and $\beta_{k,p}$, $I(c)$ is a continuous function whose range contains $(I(0), 1)$. In addition, $I(c)$ is an increasing function of $c$.
Proof: The result follows directly from Theorem 3 and Corollary 4.
Note that we found the adjustments in (15) and (20) empirically through simulations. In all of the cases we considered, $\theta_\alpha$ and $\beta_{k,p}$ are positive and thus Corollary 5 holds. Ideally, we should consider conducting further simulations if we desire adjustments for $n_i$, $n_j$, $K$ or $p$ that are very different from the cases we considered in our simulations. In this case we need to find other parameter estimates, as in Tables 1 and 2.

3.2 Clusters with General Ellipsoidal Dispersions
In this section, we assume that $\Sigma_k$ is any general nonnegative definite covariance matrix, for each $k \in \{1, \ldots, K\}$. In this case, the $k$-means algorithm is no longer applicable, so we used hierarchical clustering with Euclidean distance and Ward's criterion [42] to obtain the tree, which was subsequently cut to yield $K$ clusters. The $R$-value of this grouping relative to the truth was taken to indicate the difficulty of clustering. The focus in this section is therefore to broadly relate our preliminary index and its adjustments to the $R$ thus obtained.

3.2.1 The Two-Groups Case
Figures 8a-i display simulated datasets for different values of our preliminary index $I$ using the algorithm in the appendix, with α = 0.75. In each case, 100 observations were generated from each of two groups separated according to $I$. In these figures, color and character distinguish the true grouping. For each simulated dataset, we also obtained $R$ as described above. Figures 8a-c have the lowest $R$, between 0.56 and 0.67, but each subsequent row down has $R$ values, on the average, higher than in previous rows. In general, therefore, Figures 8a-i demonstrate that as the value of $I$ increases, the clusters become more separated. This coincides with an increase in the values of $R$.
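Corollary 5 is what makes index-driven data generation possible: since $I(c)$ is continuous and increasing in the mean-scaling constant $c$, a prescribed index value can be reached by one-dimensional root finding over $c$. The sketch below illustrates only the principle, using a toy stand-in index (two univariate Gaussian clusters whose separation grows with $c$); `toy_index` is hypothetical and is not the paper's $I(c)$.

```python
from scipy.optimize import brentq
from scipy.stats import norm

def toy_index(c, mu=1.0, sigma=1.0):
    # Stand-in separation measure for N(0, sigma^2) vs. N(c*mu, sigma^2):
    # the probability mass correctly split at the midpoint cutoff, rescaled
    # to rise continuously from 0 (at c = 0) toward 1 (as c -> infinity).
    return 1.0 - 2.0 * norm.sf(c * mu / (2.0 * sigma))

target = 0.8   # prescribed index value
# toy_index is continuous and increasing in c, so bracketing root finding applies
c_star = brentq(lambda c: toy_index(c) - target, 0.0, 100.0)
```

With these stand-in settings the solver returns c_star ≈ 2.56, and scaling the cluster means by that constant would realize the prescribed separation.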
Thus lower values of $I$ correspond to clustering problems of higher difficulty, while higher values of $I$ are associated with higher values of $R$ and hence lower clustering complexity. Similar to Section 3.1, we investigated further the relationship between $I$ and $R$ in the case of nonhomogeneous groups. Specifically, we simulated 25 datasets of observations from two nonhomogeneous populations, each with arbitrary ellipsoidal dispersion structures, for all possible combinations of $(n_1, n_2, p, I)$, where $n_1 \le n_2$ (assumed w.l.o.g.) and $n_1, n_2$ are the numbers of observations in the two groups. In our experiments, $p \in \{2, 4, 5, 10, ·, ·, 100, ·, ·, 1000\}$ and $(n_1, n_2) \in \{(75, 75), (100, 100), (30, 100), (60, 75), (90, 100), \ldots\}$ [several of the fourteen $(n_1, n_2)$ combinations are not recoverable from this extraction]. We simulated 25 datasets according to each of ten values of $I$ spaced evenly between 0 and 1, and for α = 0.75. Each
Fig. 8. Simulated datasets at different $I$ in the case with general ellipsoidally-dispersed groups. Color and plotting character represent true groups. Under each plot we also report the $R$ comparing the true grouping with that obtained using hierarchical clustering with Ward's linkage and K = 2: for I = 0.6, (a) R = 0.64, (b) R = 0.56, (c) R = 0.67; for I = 0.8, (d) R = 0.81, (e) R = 0.81, (f) R = 0.83; for I = 1.0, (g) R = 1.0, (h) R = 1.0, (i) R = 1.0.

dataset was partitioned into two groups using hierarchical clustering with [42]'s linkage, and the resulting partition was evaluated against the true grouping of the simulated dataset in terms of $R$. We noticed nonlinearity in the general relationship between $I$ and $R$, so as in Section 3.1.1, we explored a relationship between $I$ and $R$ along with the parameters $p$ and $n$. Similar to (14), (15), (18) and (19), we found appropriate adjustments and defined $R_{I,\alpha}$ for this case. (See the columns labeled "het" of Table 1 for the estimated values when α = 0.75, and [38] for estimates obtained when α equals 0.25 and 0.5.)

3.2.2 The Case with Many Groups (K > 2)
Summarizing the index for $K > 2$ groups brings the same issues outlined in Section 3.1.2. We propose adapting [40]'s summarized multiple Jaccard similarity index in the same manner as before, but noting that $R_{I,\alpha}$ is calculated within the setting of nonhomogeneous ellipsoidal clusters. As in Section 3.1.2, we conducted an extensive simulation experiment to study the relationship between $R_{I,\alpha}$ and $R$. We simulated 25 datasets for each combination of $p$, the $n_i$'s, $\alpha$, $R_{I,\alpha}$ and $K$, where $p \in \{2, 4, 5, 10, ·, ·, 100, ·\}$, α = 0.75, $n_i = n_j \equiv n_0$ for all $i, j$ with $n_0 \in \{·, ·, 75, 100, ·, ·\}$, $K \in \{3, 5, 7, 10\}$, and $R_{I,\alpha}$ evenly-spaced over ten values in (0, 1). We partitioned each simulated dataset using hierarchical clustering with [42]'s linkage and calculated $R$ of the resulting partitioning relative to the truth.
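The grouping step used throughout this section — hierarchical clustering with Euclidean distance and Ward's criterion, cut to K groups, then scored against the truth with the adjusted Rand index — can be sketched as follows. The two widely separated synthetic Gaussian groups are illustrative stand-ins, not the paper's simulation settings.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(1)
# two widely separated spherical Gaussian groups in two dimensions
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)),
               rng.normal(8.0, 1.0, (50, 2))])
truth = np.repeat([0, 1], 50)

Z = linkage(X, method="ward")                    # Ward's criterion, Euclidean
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree at K = 2
r = adjusted_rand_score(truth, labels)           # R of the recovered partition
```

For well-separated groups like these, r is essentially 1; as the groups are moved together, the recovered partition degrades and r falls toward 0.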
We used the results to adjust our index as in (18) and (19), and obtained the final summary $I$, of similar form to (20), with coefficient estimates provided in Table 2. Figure 9 presents results of simulation experiments in the same spirit as Figures 5 and 6, except that the datasets are now generated using the algorithm in the appendix with $I$ as described here. Note that, as in Figure 6, the effect of $K$ has been largely addressed and the relationship between $I$ and $R$ is largely consistent across dimensions. Thus the curse of dimensionality no longer affects our index, and as such it is interpretable across dimensions.

3.3 Illustrative Examples
We now provide some illustrations of the range of multi-clustered datasets that can be obtained using the algorithm in the appendix and for different values of the summarized index

Fig. 9. Plot of $R$ against $I$ for the case with general ellipsoidal clusters, for (a) K = 3, (b) K = 5 and (c) K = 10. The three colors represent the dimensions of the simulated datasets ($p \in \{10, ·, ·\}$). Other aspects of the plot are as in Figure 2.
I. We first display realizations obtained in two dimensions, and then move on to higher dimensions. Figures 10a-i mimic the setup of Figure 7 for nonhomogeneous ellipsoidal clusters. Here the observations are grouped using hierarchical clustering with Ward's criterion [42] and cutting the tree hierarchy at K = 4. The groupings of the datasets in Figures 10a-c have the lowest $R$, between 0.54 and 0.64, but each subsequent row down has $R$ values, on the average, higher than in previous rows. In general, therefore, Figures 10a-i demonstrate that as the value of $I$ increases, the clusters become more separated. This coincides with an increase in the values of $R$. Thus lower values of $I$ correspond to clustering problems of higher difficulty, while higher values of $I$ are associated with higher values of $R$ and hence lower clustering complexity. Figures 10a-i demonstrate the various possible configurations obtained using the algorithm in the appendix for three different values of $I$. Note that in some cases only two of the four groups are well-separated, while in other cases none of the groups are well-separated. We used Radviz, or radial visualization plots [43], to display multi-dimensional multi-class datasets in Figure 11. We display three realizations each of five-dimensional five-clustered datasets, obtained using our algorithm with $I$ values of 0.6, 0.8 and 1. Additional simulated realizations and other values of $I$ are presented in [38]. Below each plot we provide the $R$ obtained upon comparing the partitioning obtained using hierarchical clustering with [42]'s linkage.

Fig. 10 (panels a-f; caption below): for I = 0.6, (a) R = 0.54, (b) R = 0.64, (c) R = 0.61; for I = 0.8, (d) R = 0.76, (e) R = 0.84, (f) R = 0.80.

Fig. 11. Radial visualization plots of 5 dimensions and 5 components at different values of I: for I = 0.6, (a) R = 0.59, (b) R = 0.60, (c) R = 0.60; for I = 0.8, (d) R = 0.84, (e) R = 0.75, (f) R = 0.79; for I = 1, (g) R = 1, (h) R = 1, (i) R = 1.
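Radial visualization plots of the kind in Figure 11 are available in standard libraries; the sketch below uses pandas' `radviz` on the Iris data as a stand-in for the simulated five-dimensional datasets. A headless matplotlib backend is selected so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import pandas as pd
from sklearn.datasets import load_iris

# Radviz places each observation inside a unit circle whose perimeter anchors
# represent the (normalized) variables; the class column picks the color.
iris = load_iris(as_frame=True).frame          # 150 x 5: 4 features + target
ax = pd.plotting.radviz(iris, class_column="target")
```

Well-separated classes then appear as distinct clouds inside the circle, which is how the separation at different values of I is read off Figure 11.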
From Figure 11 we see that clusters with higher values of $I$ are, in general, more separated than datasets with lower values of $I$. The clusters with $I$ = 0.6 appear to overlap quite a bit; however, the datasets corresponding to $I$ = 1 seem more separated. Additionally, $I$ is close to $R$ in all cases. We close this section by commenting on the role of α. Our preliminary index $I_\alpha$ is dependent on the choice of α. However, our final adjustments, ensuring that the final index $I$ is approximately linear in $R$, mean that the value of α is not that crucial in our calculations.

Fig. 10. Simulated datasets at different $I$ in the heterogeneous case for K = 4, drawn in the same spirit as Figure 8 (for I = 1.0, panels (g), (h) and (i) all have R = 1.0).

4 APPLICATION TO GROUPED DATA
In this section we illustrate some applications of our index to datasets which contain class information for each observation. The datasets studied here are the textures [10], wine [11], Iris [12], crabs [13], image [14], E. coli [15] and [16]'s synthetic two-dimensional datasets.

4.1 Estimating I for classification datasets
Our first application uses $I$ as a measure of how well-separated the groups in each dataset are, on the whole, and of how they relate to the difficulty in clustering or classification.
TABLE 3
The estimated index $I$ (Î), the index derived by [27] (QJ) and exact-c-separation (c) are presented with adjusted Rand indices ($R_Q$, $R_E$) obtained using quadratic discriminant analysis (QDA) and EM-clustering, respectively, on the datasets (S): (i) textures, (ii) Ruspini, (iii) wine, (iv) image, (v) crabs, (vi) Iris and (vii) E. coli. [The numeric columns n, p, K, Î, $R_Q$, $R_E$, QJ and c are not recoverable from this extraction.]

Fig. 12 (panels (a) wine, (b) Iris, (c) crabs; caption below).

Table 3 provides summaries of our estimated values of $I$ (Î) for each dataset. We use the true classification and the actual observations to obtain estimates of the mean and covariance of each cluster. Using the estimated mean, covariance and number of observations from each cluster, we find estimates of $I$ for α = 0.75, using Mahalanobis distance. We also calculated $R$ based on the clusterings obtained using quadratic discriminant analysis (QDA) and EM-clustering, the latter done using the R package MClust. The corresponding calculated $R$'s are called $R_Q$ and $R_E$, respectively. Note that our EM algorithms were initialized using the estimated means and covariance matrices and the actual cluster size proportions. Therefore we consider the final clustering to be the best we could possibly do over all other choices of initialization values. Table 3 compares the estimated values of Î with the $R$ evaluating the classification on the corresponding dataset done using QDA and model-based clustering. The datasets are ordered in Table 3 from the largest to the smallest Î value. For each dataset, we also calculated [27]'s index as well as exact-c-separation. With the exception of the Iris and image datasets, higher values of Î correspond to higher values of both $R_Q$ and $R_E$. The relationship between $\hat{R}$ and c-separation is not as clear. The textures dataset, for example, has $\hat{R}$ = 1 but a very small c-separation. There also does not appear to be much relationship between the index of [27] and $R$.
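The $R_Q$ column of Table 3 can be reproduced in outline as follows: fit QDA to a grouped dataset using the true classes, then score the resulting classification against those classes with the adjusted Rand index. This is a sketch of the evaluation recipe, shown here on Iris, not the paper's exact pipeline.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.metrics import adjusted_rand_score

X, y = load_iris(return_X_y=True)
# QDA estimates per-class means and covariances from the labeled data,
# mirroring the use of the true classification to fit each group
qda = QuadraticDiscriminantAnalysis().fit(X, y)
r_q = adjusted_rand_score(y, qda.predict(X))   # adjusted Rand index vs. truth
```

The analogous $R_E$ value would replace the QDA step with model-based (EM) clustering initialized at the same estimated means, covariances and mixing proportions.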
Both c-separation and [27]'s index do, however, pick out image as the most difficult dataset to group.

4.2 Indexing Distinctiveness of Classes
In this section, we discuss the use of our index to summarize the distinctiveness of each class with respect to the others. To illustrate this point, we estimate the pairwise indices $R_{I,\alpha}$ of Section 3.2.1 for each pair of classes in each dataset. These pairwise indices are presented in Figure 12. For each dataset, we display the values quantitatively and qualitatively by means of a color map. Darker values indicate well-separated groups with index values closer to 1, while lighter regions represent pairs of groups that are harder to separate. The map of these pairwise indices provides us with an idea of the groups that are easier or harder to separate. In the Iris dataset, for example, $\hat{R}_{I,0.75}$ = 1 for species 1 (I. setosa) and 2 (I. versicolor) and for species 1 (I. setosa) and 3 (I. virginica).

Fig. 12. Plots displaying the pairwise indices $\hat{R}_{I,0.75}$ for five commonly used classification datasets: (a) wine, (b) Iris, (c) crabs, (d) E. coli, (e) image.

This indicates very well-separated groups. On the other hand, $\hat{R}_{I,0.75}$ = 0.7 for species 2 and 3 (I. versicolor and I. virginica). This suggests that the difficulty in classifying the Iris dataset is largely due to the similarities between the species I. virginica and I. versicolor. The wine dataset is similar to the Iris dataset in that most of the difficulty in classification or clustering appears due to just two groups, as evidenced by the value of $\hat{R}_{I,0.75}$ between Groups 1 and 2; all other pairs of groups in the wine dataset have a value of $\hat{R}_{I,0.75}$ greater than this. The pairwise indices of the crabs dataset also produce some interesting insights into the difficulty of classification or clustering. Note that this dataset has male and female crabs, each of orange and blue colored crabs. An interesting finding is that $\hat{R}_{I,0.75}$ is above 0.99 for all pairs of categories except for the two pairs of crab groups having the same color.
For the blue male and female crabs, we have $\hat{R}_{I,0.75}$ = 0.61, while $\hat{R}_{I,0.75}$ = 0.90 for the orange male and female crabs. This suggests that color separates the crab groups better than gender, and thus that the difficulty in clustering this dataset is largely due to the small differences between genders. Both the Ruspini dataset and the textures dataset have $\hat{R}_{I,0.75}$ = 1 for all pairs of groups and are not displayed, for brevity. Figure 12d displays the pairwise values of $\hat{R}_{I,0.75}$ for the 5 groups in the E. coli dataset. Once again, groups 2 and 3 are very difficult to separate, as indicated by the small value of $\hat{R}_{I,0.75}$ between these two groups. Figure 12e displays the pairwise values of $\hat{R}_{I,0.75}$ for the seven groups in the image dataset. Once again, there is very little separation between groups 3 and 5, corresponding to images of foliage and a window, as well as between groups 4 and 6, corresponding to images of cement and a path. All other pairs of groups appear very well-separated. Our pairwise index thus provides an idea of how separated each individual grouping is relative to the others. Then, we can use our index to characterize the relationships between
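The pairwise maps of Figure 12 can be mimicked with a simple surrogate: for every pair of classes, fit a two-class QDA and record the adjusted Rand index of its classification, with zeros on the diagonal as in the definition of Υ. This is a hypothetical illustration of the idea, shown on Iris, not the paper's $\hat{R}_{I,0.75}$ computation.

```python
import numpy as np
from itertools import combinations
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.metrics import adjusted_rand_score

X, y = load_iris(return_X_y=True)
K = len(np.unique(y))
pairwise = np.zeros((K, K))                    # diagonal kept at 0, as for Upsilon
for i, j in combinations(range(K), 2):
    mask = (y == i) | (y == j)                 # restrict to the two classes
    qda = QuadraticDiscriminantAnalysis().fit(X[mask], y[mask])
    score = adjusted_rand_score(y[mask], qda.predict(X[mask]))
    pairwise[i, j] = pairwise[j, i] = score    # symmetric matrix of pair scores
```

On Iris this surrogate reproduces the qualitative pattern discussed above: the pairs involving I. setosa separate essentially perfectly, while the I. versicolor and I. virginica pair is the hard one.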
More informationApproximated MLC shape matrix decomposition with interleaf collision constraint
Approximated MLC shape matrix decomposition with intereaf coision constraint Thomas Kainowski Antje Kiese Abstract Shape matrix decomposition is a subprobem in radiation therapy panning. A given fuence
More informationSydU STAT3014 (2015) Second semester Dr. J. Chan 18
STAT3014/3914 Appied Stat.-Samping C-Stratified rand. sampe Stratified Random Samping.1 Introduction Description The popuation of size N is divided into mutuay excusive and exhaustive subpopuations caed
More informationc 2007 Society for Industrial and Applied Mathematics
SIAM REVIEW Vo. 49,No. 1,pp. 111 1 c 7 Society for Industria and Appied Mathematics Domino Waves C. J. Efthimiou M. D. Johnson Abstract. Motivated by a proposa of Daykin [Probem 71-19*, SIAM Rev., 13 (1971),
More informationOptimal Control of Assembly Systems with Multiple Stages and Multiple Demand Classes 1
Optima Contro of Assemby Systems with Mutipe Stages and Mutipe Demand Casses Saif Benjaafar Mohsen EHafsi 2 Chung-Yee Lee 3 Weihua Zhou 3 Industria & Systems Engineering, Department of Mechanica Engineering,
More informationGeneralized multigranulation rough sets and optimal granularity selection
Granu. Comput. DOI 10.1007/s41066-017-0042-9 ORIGINAL PAPER Generaized mutigranuation rough sets and optima granuarity seection Weihua Xu 1 Wentao Li 2 Xiantao Zhang 1 Received: 27 September 2016 / Accepted:
More information17 Lecture 17: Recombination and Dark Matter Production
PYS 652: Astrophysics 88 17 Lecture 17: Recombination and Dark Matter Production New ideas pass through three periods: It can t be done. It probaby can be done, but it s not worth doing. I knew it was
More informationIterative Decoding Performance Bounds for LDPC Codes on Noisy Channels
Iterative Decoding Performance Bounds for LDPC Codes on Noisy Channes arxiv:cs/060700v1 [cs.it] 6 Ju 006 Chun-Hao Hsu and Achieas Anastasopouos Eectrica Engineering and Computer Science Department University
More informationHigh Spectral Resolution Infrared Radiance Modeling Using Optimal Spectral Sampling (OSS) Method
High Spectra Resoution Infrared Radiance Modeing Using Optima Spectra Samping (OSS) Method J.-L. Moncet and G. Uymin Background Optima Spectra Samping (OSS) method is a fast and accurate monochromatic
More informationRate-Distortion Theory of Finite Point Processes
Rate-Distortion Theory of Finite Point Processes Günther Koiander, Dominic Schuhmacher, and Franz Hawatsch, Feow, IEEE Abstract We study the compression of data in the case where the usefu information
More informationInvestigation on spectrum of the adjacency matrix and Laplacian matrix of graph G l
Investigation on spectrum of the adjacency matrix and Lapacian matrix of graph G SHUHUA YIN Computer Science and Information Technoogy Coege Zhejiang Wani University Ningbo 3500 PEOPLE S REPUBLIC OF CHINA
More informationAn approximate method for solving the inverse scattering problem with fixed-energy data
J. Inv. I-Posed Probems, Vo. 7, No. 6, pp. 561 571 (1999) c VSP 1999 An approximate method for soving the inverse scattering probem with fixed-energy data A. G. Ramm and W. Scheid Received May 12, 1999
More informationOn the evaluation of saving-consumption plans
On the evauation of saving-consumption pans Steven Vanduffe Jan Dhaene Marc Goovaerts Juy 13, 2004 Abstract Knowedge of the distribution function of the stochasticay compounded vaue of a series of future
More informationPrimal and dual active-set methods for convex quadratic programming
Math. Program., Ser. A 216) 159:469 58 DOI 1.17/s117-15-966-2 FULL LENGTH PAPER Prima and dua active-set methods for convex quadratic programming Anders Forsgren 1 Phiip E. Gi 2 Eizabeth Wong 2 Received:
More informationBayesian Learning. You hear a which which could equally be Thanks or Tanks, which would you go with?
Bayesian Learning A powerfu and growing approach in machine earning We use it in our own decision making a the time You hear a which which coud equay be Thanks or Tanks, which woud you go with? Combine
More informationNonlinear Analysis of Spatial Trusses
Noninear Anaysis of Spatia Trusses João Barrigó October 14 Abstract The present work addresses the noninear behavior of space trusses A formuation for geometrica noninear anaysis is presented, which incudes
More informationMultiple Beam Interference
MutipeBeamInterference.nb James C. Wyant 1 Mutipe Beam Interference 1. Airy's Formua We wi first derive Airy's formua for the case of no absorption. ü 1.1 Basic refectance and transmittance Refected ight
More informationhttps://doi.org/ /epjconf/
HOW TO APPLY THE OPTIMAL ESTIMATION METHOD TO YOUR LIDAR MEASUREMENTS FOR IMPROVED RETRIEVALS OF TEMPERATURE AND COMPOSITION R. J. Sica 1,2,*, A. Haefee 2,1, A. Jaai 1, S. Gamage 1 and G. Farhani 1 1 Department
More informationORTHOGONAL MULTI-WAVELETS FROM MATRIX FACTORIZATION
J. Korean Math. Soc. 46 2009, No. 2, pp. 281 294 ORHOGONAL MLI-WAVELES FROM MARIX FACORIZAION Hongying Xiao Abstract. Accuracy of the scaing function is very crucia in waveet theory, or correspondingy,
More informationIn-plane shear stiffness of bare steel deck through shell finite element models. G. Bian, B.W. Schafer. June 2017
In-pane shear stiffness of bare stee deck through she finite eement modes G. Bian, B.W. Schafer June 7 COLD-FORMED STEEL RESEARCH CONSORTIUM REPORT SERIES CFSRC R-7- SDII Stee Diaphragm Innovation Initiative
More informationSymbolic models for nonlinear control systems using approximate bisimulation
Symboic modes for noninear contro systems using approximate bisimuation Giordano Poa, Antoine Girard and Pauo Tabuada Abstract Contro systems are usuay modeed by differentia equations describing how physica
More informationSequential Decoding of Polar Codes with Arbitrary Binary Kernel
Sequentia Decoding of Poar Codes with Arbitrary Binary Kerne Vera Miosavskaya, Peter Trifonov Saint-Petersburg State Poytechnic University Emai: veram,petert}@dcn.icc.spbstu.ru Abstract The probem of efficient
More informationPartial permutation decoding for MacDonald codes
Partia permutation decoding for MacDonad codes J.D. Key Department of Mathematics and Appied Mathematics University of the Western Cape 7535 Bevie, South Africa P. Seneviratne Department of Mathematics
More informationAbsolute Value Preconditioning for Symmetric Indefinite Linear Systems
MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.mer.com Absoute Vaue Preconditioning for Symmetric Indefinite Linear Systems Vecharynski, E.; Knyazev, A.V. TR2013-016 March 2013 Abstract We introduce
More informationC. Fourier Sine Series Overview
12 PHILIP D. LOEWEN C. Fourier Sine Series Overview Let some constant > be given. The symboic form of the FSS Eigenvaue probem combines an ordinary differentia equation (ODE) on the interva (, ) with a
More informationASummaryofGaussianProcesses Coryn A.L. Bailer-Jones
ASummaryofGaussianProcesses Coryn A.L. Baier-Jones Cavendish Laboratory University of Cambridge caj@mrao.cam.ac.uk Introduction A genera prediction probem can be posed as foows. We consider that the variabe
More informationFORECASTING TELECOMMUNICATIONS DATA WITH AUTOREGRESSIVE INTEGRATED MOVING AVERAGE MODELS
FORECASTING TEECOMMUNICATIONS DATA WITH AUTOREGRESSIVE INTEGRATED MOVING AVERAGE MODES Niesh Subhash naawade a, Mrs. Meenakshi Pawar b a SVERI's Coege of Engineering, Pandharpur. nieshsubhash15@gmai.com
More informationStochastic Automata Networks (SAN) - Modelling. and Evaluation. Paulo Fernandes 1. Brigitte Plateau 2. May 29, 1997
Stochastic utomata etworks (S) - Modeing and Evauation Pauo Fernandes rigitte Pateau 2 May 29, 997 Institut ationa Poytechnique de Grenobe { IPG Ecoe ationae Superieure d'informatique et de Mathematiques
More informationApproximated MLC shape matrix decomposition with interleaf collision constraint
Agorithmic Operations Research Vo.4 (29) 49 57 Approximated MLC shape matrix decomposition with intereaf coision constraint Antje Kiese and Thomas Kainowski Institut für Mathematik, Universität Rostock,
More informationThe EM Algorithm applied to determining new limit points of Mahler measures
Contro and Cybernetics vo. 39 (2010) No. 4 The EM Agorithm appied to determining new imit points of Maher measures by Souad E Otmani, Georges Rhin and Jean-Marc Sac-Épée Université Pau Veraine-Metz, LMAM,
More informationAutomobile Prices in Market Equilibrium. Berry, Pakes and Levinsohn
Automobie Prices in Market Equiibrium Berry, Pakes and Levinsohn Empirica Anaysis of demand and suppy in a differentiated products market: equiibrium in the U.S. automobie market. Oigopoistic Differentiated
More informationInteractive Fuzzy Programming for Two-level Nonlinear Integer Programming Problems through Genetic Algorithms
Md. Abu Kaam Azad et a./asia Paciic Management Review (5) (), 7-77 Interactive Fuzzy Programming or Two-eve Noninear Integer Programming Probems through Genetic Agorithms Abstract Md. Abu Kaam Azad a,*,
More informationarxiv: v1 [math.ca] 6 Mar 2017
Indefinite Integras of Spherica Besse Functions MIT-CTP/487 arxiv:703.0648v [math.ca] 6 Mar 07 Joyon K. Boomfied,, Stephen H. P. Face,, and Zander Moss, Center for Theoretica Physics, Laboratory for Nucear
More informationV.B The Cluster Expansion
V.B The Custer Expansion For short range interactions, speciay with a hard core, it is much better to repace the expansion parameter V( q ) by f( q ) = exp ( βv( q )), which is obtained by summing over
More informationPreconditioned Locally Harmonic Residual Method for Computing Interior Eigenpairs of Certain Classes of Hermitian Matrices
MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.mer.com Preconditioned Locay Harmonic Residua Method for Computing Interior Eigenpairs of Certain Casses of Hermitian Matrices Vecharynski, E.; Knyazev,
More informationCoded Caching for Files with Distinct File Sizes
Coded Caching for Fies with Distinct Fie Sizes Jinbei Zhang iaojun Lin Chih-Chun Wang inbing Wang Department of Eectronic Engineering Shanghai Jiao ong University China Schoo of Eectrica and Computer Engineering
More informationTwo-Stage Least Squares as Minimum Distance
Two-Stage Least Squares as Minimum Distance Frank Windmeijer Discussion Paper 17 / 683 7 June 2017 Department of Economics University of Bristo Priory Road Compex Bristo BS8 1TU United Kingdom Two-Stage
More informationBiometrics Unit, 337 Warren Hall Cornell University, Ithaca, NY and. B. L. Raktoe
NONISCMORPHIC CCMPLETE SETS OF ORTHOGONAL F-SQ.UARES, HADAMARD MATRICES, AND DECCMPOSITIONS OF A 2 4 DESIGN S. J. Schwager and w. T. Federer Biometrics Unit, 337 Warren Ha Corne University, Ithaca, NY
More informationT.C. Banwell, S. Galli. {bct, Telcordia Technologies, Inc., 445 South Street, Morristown, NJ 07960, USA
ON THE SYMMETRY OF THE POWER INE CHANNE T.C. Banwe, S. Gai {bct, sgai}@research.tecordia.com Tecordia Technoogies, Inc., 445 South Street, Morristown, NJ 07960, USA Abstract The indoor power ine network
More informationarxiv: v1 [math.co] 17 Dec 2018
On the Extrema Maximum Agreement Subtree Probem arxiv:1812.06951v1 [math.o] 17 Dec 2018 Aexey Markin Department of omputer Science, Iowa State University, USA amarkin@iastate.edu Abstract Given two phyogenetic
More informationNOISE-INDUCED STABILIZATION OF STOCHASTIC DIFFERENTIAL EQUATIONS
NOISE-INDUCED STABILIZATION OF STOCHASTIC DIFFERENTIAL EQUATIONS TONY ALLEN, EMILY GEBHARDT, AND ADAM KLUBALL 3 ADVISOR: DR. TIFFANY KOLBA 4 Abstract. The phenomenon of noise-induced stabiization occurs
More informationStatistical Learning Theory: a Primer
??,??, 1 6 (??) c?? Kuwer Academic Pubishers, Boston. Manufactured in The Netherands. Statistica Learning Theory: a Primer THEODOROS EVGENIOU AND MASSIMILIANO PONTIL Center for Bioogica and Computationa
More informationDiscriminant Analysis: A Unified Approach
Discriminant Anaysis: A Unified Approach Peng Zhang & Jing Peng Tuane University Eectrica Engineering & Computer Science Department New Oreans, LA 708 {zhangp,jp}@eecs.tuane.edu Norbert Riede Tuane University
More informationApplied Nuclear Physics (Fall 2006) Lecture 7 (10/2/06) Overview of Cross Section Calculation
22.101 Appied Nucear Physics (Fa 2006) Lecture 7 (10/2/06) Overview of Cross Section Cacuation References P. Roman, Advanced Quantum Theory (Addison-Wesey, Reading, 1965), Chap 3. A. Foderaro, The Eements
More information