THE MOMENT-PRESERVATION METHOD OF SUMMARIZING DISTRIBUTIONS. Stanley L. Sclove
THE MOMENT-PRESERVATION METHOD OF SUMMARIZING DISTRIBUTIONS

Stanley L. Sclove
Department of Information & Decision Sciences
University of Illinois at Chicago

Research Memorandum, Technical Report No. SLS
Business Statistics Area-of-Inquiry, PhD Program in Business Administration
College of Business Administration
University of Illinois at Chicago

2011: July 9
Contents

1 Introduction
2 Details of the Moment-Preservation Method for K = 2
  2.1 Moments of a Binary (0,1) Variable
  2.2 Moments of a Binary Variable
  2.3 Re-statement in Terms of Central Moments
3 Moment-Preservation Method for K >= 3
4 Comparison with Other Methods
  4.1 Partitioning a Distribution
  4.2 Stanines
5 Multivariate Case
  5.1 K = 2
  5.2 K = 2, p = 2
  5.3 Higher Dimensions
6 Cluster Analysis
The Moment-Preservation Method of Summarizing Distributions

Stanley L. Sclove

July 28, 2011

Abstract

The moment-preservation method is a method of approximating a distribution (theoretical or data) by a discrete distribution, and hence of summarizing the distribution with a few values and their frequencies. Results on the method are reviewed.

Keywords: moments, moment-preservation method, cluster analysis

Stanley L. Sclove (Ph.D., Mathematical Statistics, Columbia University; B.A., Applied Honors Mathematics, Dartmouth College) is a Professor in the Department of Information & Decision Sciences at the University of Illinois at Chicago (UIC) and Coordinator of the Business Statistics Area of the PhD Program in Business Administration. In addition to UIC he has taught at Carnegie-Mellon, Northwestern, and Stanford. Sclove is Secretary/Treasurer of the Classification Society and the Representative of the Risk Analysis Section of the American Statistical Association to its Council of Sections. Address: Department of Information & Decision Sciences (MC 294), College of Business Administration, University of Illinois at Chicago, 601 S. Morgan St., Chicago, IL.

Earlier versions of this paper were presented in sessions at the Annual Meeting of the Classification Society, Carnegie Mellon University, Pittsburgh, PA, June 16-18, 2011, and the Joint Statistical Meetings, American Statistical Association, Miami Beach, FL, July 30 - August 4, 2011. The author thanks Professor David Banks of Duke University for organizing these sessions.
1 Introduction

The moment-preservation method of approximating a distribution is discussed. The method is found in the engineering literature, especially in articles concerning image processing and pattern recognition. The method consists of approximating a given distribution with a discrete distribution on, say, K values, where K would usually be small, like 2 or 3. A rationale for the method is that two distributions will be close if a number of their moments are close. In particular, the discrete approximating distribution will be close to the given distribution if their moments are close. The discrete approximation provides K values that, along with their probabilities, summarize, or are representative of, the distribution.

Given a dataset, or a theoretical distribution, a problem is to fit several clusters, say K of them, e.g., K = 2 or 3 (or more). An approach is to find mass points (cluster centers) c_1, c_2, ..., c_K and probabilities p_1, p_2, ..., p_K such that the corresponding discrete distribution Pr{X = c_k} = p_k, where X denotes the corresponding random variable, matches the moments of the given distribution. Recall that if two distributions have a number of moments close together, then the two distributions will be close in appropriate ways.

The moment-preservation method may be viewed as solving for a discrete (K-valued) approximation to the given distribution. The unknowns are c_1, c_2, ..., c_K and p_1, p_2, ..., p_K. There are 2K unknowns, with one constraint (the probabilities add to 1), hence 2K - 1 free unknowns. This can be considered as an extreme case of the finite mixture model where the component distributions are one-point distributions. The moment-preservation method consists of computing values of the unknowns by matching 2K - 1 moments m_1, m_2, ..., m_{2K-1}. The values m_k could, remember, be sample moments or specified values of true moments.
The method can be viewed as one of discrete approximation to a distribution (data or theoretical). It is reminiscent of the elicitation process in Decision Risk Analysis or in CPM/PERT: obtain high, medium, and low values of a risk variable or activity time, with their (subjective) probabilities. Here, however, we're asking the data, not a person.
2 Details of the Moment-Preservation Method for K = 2

There are two mass points c_1, c_2 and two probabilities p_1, p_2, but p_2 = 1 - p_1, so there are only three free unknowns. Denote the (raw) moments by m_k. (Given data x_i, i = 1, 2, ..., n, m_k = (1/n) Sum_{i=1}^n x_i^k.) Equations for the unknowns are

p_1 + p_2 = 1,
p_1 c_1 + p_2 c_2 = m_1,
p_1 c_1^2 + p_2 c_2^2 = m_2,
p_1 c_1^3 + p_2 c_2^3 = m_3.

[Figure: the real line, with mass points c_1 < m_1 < c_2 carrying probabilities p_1 and p_2. Caption: K = 2 mass points c_1, c_2 with probabilities p_1, p_2.]

The quantities c_1, c_2, p_1, p_2 are taken as functions of m_1, m_2, m_3. The solution comes out as follows (Harris 2003). Let

a = (m_1 m_3 - m_2^2) / (m_2 - m_1^2),   b = (m_1 m_2 - m_3) / (m_2 - m_1^2).

(Note that m_2 - m_1^2 is the variance, written m'_2 below.) Then c_1 and c_2 are the roots of the quadratic c^2 + b c + a = 0:

c_1 = [-b - (b^2 - 4a)^{1/2}] / 2,   c_2 = [-b + (b^2 - 4a)^{1/2}] / 2,

and

p_1 = (c_2 - m_1) / (c_2 - c_1),   p_2 = 1 - p_1 = (m_1 - c_1) / (c_2 - c_1).

An alternative way of writing the solution (Tabatabai and Mitchell 1984) is

c_1 = m_1 - s sqrt(p_2 / p_1),   c_2 = m_1 + s sqrt(p_1 / p_2),

where s = sqrt(m'_2) = sqrt(m_2 - m_1^2), and

p_1 = [1 + g / (g^2 + 4)^{1/2}] / 2,   p_2 = 1 - p_1,

the quantity g = m'_3 / m'_2^{3/2}, the normalized third central moment, being the skewness. Judging from this form of the solution, it appears that use of central moments m'_1 = 0, m'_2 = m_2 - m_1^2, m'_3 = m_3 - 3 m_2 m_1 + 2 m_1^3 from the outset may simplify things. In terms of the variance m'_2, the equations in m_1 and m_2 give

p_1 p_2 (c_2 - c_1)^2 = m'_2 = m_2 - m_1^2.

The problem might be even further simplified by working in terms of deviations d_1 = c_1 - m_1 and d_2 = c_2 - m_1. The moments are worked out below. First note that the assignment to Cluster 1 or Cluster 2 is obtained by arranging the n points in increasing order and assigning those whose rank does not exceed n p_1 to the first cluster.

2.1 Moments of a Binary (0,1) Variable

Let us view the case K = 2 from the viewpoint of a binary variable. Let B be a Bernoulli variable, with Pr{B = 1} = p_2 and Pr{B = 0} = p_1. Then the k-th raw moment of B is m_k(B) = E[B^k] = p_1 0^k + p_2 1^k = p_2. The variance is p_1 p_2. The third central moment of B is

m'_3 = m_3 - 3 m_2 m_1 + 2 m_1^3 = p_2 - 3 p_2^2 + 2 p_2^3 = p_2 (1 - 3 p_2 + 2 p_2^2) = p_2 (1 - p_2)(1 - 2 p_2) = p_1 p_2 (p_1 + p_2 - 2 p_2) = p_1 p_2 (p_1 - p_2).

The skewness gamma = m'_3 / m'_2^{3/2} is

gamma = p_1 p_2 (p_1 - p_2) / (p_1 p_2)^{3/2} = (p_1 - p_2) / (p_1 p_2)^{1/2}.

Applying the identity xy = [(x + y)^2 - (x - y)^2] / 4 with x = p_1, y = p_2 gives

p_1 p_2 = [(p_1 + p_2)^2 - (p_1 - p_2)^2] / 4 = [1 - (p_1 - p_2)^2] / 4.

This gives the skewness in terms of p_1 - p_2 as

gamma = (p_1 - p_2) / [ (1/2) sqrt(1 - (p_1 - p_2)^2) ].
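The closed-form solution for K = 2 can be implemented directly from the formulas above. A minimal sketch (the function name is mine; it assumes the sample has positive variance and is not concentrated on a single point, so the discriminant is positive):

```python
import math

def moment_preserve_2(xs):
    """Two mass points and their probabilities matching the first three
    raw sample moments, using the closed-form K = 2 solution (Harris 2003)."""
    n = len(xs)
    m1 = sum(xs) / n                    # raw moments m_1, m_2, m_3
    m2 = sum(x**2 for x in xs) / n
    m3 = sum(x**3 for x in xs) / n
    var = m2 - m1**2                    # central second moment m'_2
    a = (m1 * m3 - m2**2) / var
    b = (m1 * m2 - m3) / var
    disc = math.sqrt(b**2 - 4 * a)      # c_1, c_2 are roots of c^2 + b c + a = 0
    c1, c2 = (-b - disc) / 2, (-b + disc) / 2
    p1 = (c2 - m1) / (c2 - c1)
    return c1, c2, p1, 1 - p1
```

For a sample that already sits on two points the method recovers them exactly; e.g., the sample [0, 0, 0, 3] yields c_1 = 0, c_2 = 3 with p_1 = 3/4.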
2.2 Moments of a Binary Variable

Now suppose Y = c_1 when B = 0 and Y = c_2 when B = 1. Then Y = (c_2 - c_1) B + c_1. The mean is m_1(Y) = m_1((c_2 - c_1) B + c_1) = (c_2 - c_1) m_1(B) + c_1 = (c_2 - c_1) p_2 + c_1, or p_1 c_1 + p_2 c_2. The k-th central moment is (c_2 - c_1)^k times that of B: m'_k(Y) = (c_2 - c_1)^k m'_k(B). The variance is (c_2 - c_1)^2 times that of B, or p_1 p_2 (c_2 - c_1)^2. The third central moment is (c_2 - c_1)^3 m'_3(B) = p_1 p_2 (p_1 - p_2)(c_2 - c_1)^3. The skewness gamma = m'_3 / m'_2^{3/2} is

gamma = p_1 p_2 (p_1 - p_2)(c_2 - c_1)^3 / [p_1 p_2 (c_2 - c_1)^2]^{3/2} = (p_1 - p_2) / (p_1 p_2)^{1/2},

the same as that of B, since gamma is normalized and hence invariant under scale transformation.

2.3 Re-statement in Terms of Central Moments

Denote the central moments by m'_k, k = 1, 2, .... In terms of the distribution of a random variable Y, these are m'_k = E[(Y - mu)^k], where mu = E[Y] = m_1 = m, say. The first several central moments in terms of the raw moments m_k are

m'_1 = m_1 - m_1 = 0,
m'_2 = m_2 - m_1^2,
m'_3 = m_3 - 3 m_2 m + 3 m_1 m^2 - m^3 = m_3 - 3 m_2 m + 2 m^3.

The moment-preservation equations in terms of central moments are as follows:

p_1 (c_1 - m) + p_2 (c_2 - m) = m'_1 = 0,
p_1 p_2 (c_2 - c_1)^2 = m'_2,
p_1 p_2 (p_1 - p_2)(c_2 - c_1)^3 = m'_3.

3 Moment-Preservation Method for K >= 3

For K = 3, there are mass points c_1, c_2, c_3 and probabilities p_1, p_2, p_3. The equations are p_1 + p_2 + p_3 = 1 and p_1 c_1^j + p_2 c_2^j + p_3 c_3^j = m_j, j = 1, 2, 3, 4, 5. I have not seen the details of solutions for K >= 3. It could be interesting to obtain such results, but the matter is not pursued here. Of course, given numerical values of the moments, a solution can be obtained by numerical methods.
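As noted, for general K a solution can be obtained numerically given the 2K - 1 moment values. One standard route (not from this paper; it is the classical Gaussian-quadrature construction, and numpy is assumed available) takes the mass points to be the roots of the degree-K orthogonal polynomial of the moment functional and then gets the probabilities from a linear system:

```python
import numpy as np

def moment_preserve_k(moments, K):
    """Mass points and probabilities matching raw moments m_1, ..., m_{2K-1}.

    The monic degree-K polynomial orthogonal to the moment functional has
    the mass points as its roots; its coefficients solve a Hankel system.
    Sketch only: assumes the Hankel matrix is nonsingular, i.e. the given
    distribution is not already concentrated on fewer than K points."""
    m = np.concatenate(([1.0], np.asarray(moments, dtype=float)))  # prepend m_0 = 1
    H = np.array([[m[i + j] for j in range(K)] for i in range(K)])  # Hankel moment matrix
    coeffs = np.linalg.solve(H, -m[K:2 * K])   # c_0, ..., c_{K-1} of the monic polynomial
    roots = np.roots(np.concatenate(([1.0], coeffs[::-1])))  # mass points
    roots = np.sort(roots.real)
    V = np.vander(roots, K, increasing=True).T  # V[j, k] = roots[k]**j
    probs = np.linalg.solve(V, m[:K])           # match moments m_0, ..., m_{K-1}
    return roots, probs
```

For the standard-Normal moments (0, 1, 0, 3, 0) with K = 3, this reproduces the values quoted for the Normal case in the next section: mass points -sqrt(3), 0, +sqrt(3) with probabilities 1/6, 4/6, 1/6.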
4 Comparison with Other Methods

4.1 Partitioning a Distribution

Contrast the moment-preservation method of Harris and others with partitioning a distribution, where partitioning means finding minimum mean squared error intervals. (The representative of an interval is then the conditional expectation, given that interval.) For the Normal distribution, e.g., there is the 27% rule (see, e.g., Cox 1957): for K = 3 the proportions in the three intervals come out to be .27, .46, and .27. The intervals are approximately (-inf, -0.613), (-0.613, +0.613), (+0.613, +inf). The means of these intervals are approximately -1.22, 0, +1.22. The result used here is

E[Z | a < Z < b] = [phi(a) - phi(b)] / [Phi(b) - Phi(a)],

where Z denotes a random variable having the standard Normal distribution and phi and Phi are the standard Normal p.d.f. and c.d.f., respectively.

Let's look at the results for K = 2 and 3. First consider K = 3. Say the given moments are the same as the standard Normal, the first one being 0, the second 1, the third 0, and the fourth 3. What are the results from the moment-preservation method, and how do they compare with those of partitioning?

Table 1: K = 3: Comparing moment preservation and partitioning for the standard Normal distribution

moment preservation:  centers c_1 = -sqrt(3), c_2 = 0, c_3 = +sqrt(3);  probs p_1 = 1/6, p_2 = 4/6, p_3 = 1/6
partitioning:         centers c_1 ~ -1.22, c_2 = 0, c_3 ~ +1.22;  probs p_1 ~ .27, p_2 ~ .46, p_3 ~ .27
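The partitioning row of Table 1 can be checked from the conditional-mean formula above; a quick sketch using only the standard library (the error function gives Phi):

```python
import math

def phi(z):
    """Standard Normal p.d.f."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def Phi(z):
    """Standard Normal c.d.f., via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def interval_mean(a, b):
    """E[Z | a < Z < b] = (phi(a) - phi(b)) / (Phi(b) - Phi(a))."""
    return (phi(a) - phi(b)) / (Phi(b) - Phi(a))

c = 0.613                                # approximate cut point from the 27% rule
outer = 1 - Phi(c)                       # proportion in each outer interval
upper_mean = interval_mean(c, math.inf)  # representative of the upper interval
```

Here `outer` comes out near .27 and `upper_mean` near 1.22, matching the table; the middle interval's mean is 0 by symmetry.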
Now consider the comparison for K = 2.

Table 2: K = 2: Comparing moment preservation and partitioning for the standard Normal distribution

moment preservation:  centers c_1 = -1, c_2 = +1;  probs p_1 = p_2 = 1/2
partitioning:         centers c_1 ~ -0.80, c_2 ~ +0.80;  probs p_1 = p_2 = 1/2

4.2 Stanines

It is worth mentioning another method of scoring, though it's not moment preservation or partitioning. These are the stanines, for K = 9. They are based on intervals a half standard deviation wide. The values are 1, 2, 3, 4, 5, 6, 7, 8, 9. The boundaries are -inf, -1.75, -1.25, -0.75, -0.25, +0.25, +0.75, +1.25, +1.75, +inf. The proportions in the nine categories for a Normal distribution are approximately 4, 7, 12, 17, 20, 17, 12, 7, and 4%. See Johari & Sclove (1976) for further discussion of stanines, and of partitioning for various families of distributions.
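The stanine proportions quoted above follow directly from the Normal c.d.f.; a short check (standard library only):

```python
import math

def Phi(z):
    """Standard Normal c.d.f."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Stanine boundaries -1.75, -1.25, ..., +1.75, padded with +/- infinity
edges = [-math.inf] + [-1.75 + 0.5 * i for i in range(8)] + [math.inf]

# Proportion of a standard Normal falling in each of the nine intervals;
# comes out near [.04, .07, .12, .17, .20, .17, .12, .07, .04]
props = [Phi(b) - Phi(a) for a, b in zip(edges, edges[1:])]
```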
5 Multivariate Case

There are K mass points c_1, c_2, ..., c_K and their probabilities p_1, p_2, ..., p_K. Let p denote the number of variables. (The univariate case is p = 1.) Each mass point is a vector of p elements, so the number of unknowns is pK + (K - 1) = K(p + 1) - 1, where 1 is subtracted because the probabilities must add to one.

When the given distribution is a data distribution, denote the sample by x_i, i = 1, 2, ..., n. Then m = (1/n) Sum_{i=1}^n x_i. The matrix of sample mean cross products is

A = (1/n) Sum_{i=1}^n x_i x_i' = [a_uv], u, v = 1, 2, ..., p,

where a_uv = (1/n) Sum_{i=1}^n x_ui x_vi. Here ' denotes vector or matrix transpose. The matrix of sample mean cross products of deviations is

C = (1/n) Sum_{i=1}^n (x_i - m)(x_i - m)',

with generic element c_uv = (1/n) Sum_{i=1}^n (x_ui - m_u)(x_vi - m_v).

5.1 K = 2

Let c_1 and c_2 denote two vector mass points to be found and m the mean vector, with elements m_v, v = 1, 2, ..., p, where p is the number of variables. For K = 2, an equation for p_1, p_2, c_1, c_2 is

p_1 c_1 + p_2 c_2 = m.

A second equation is

p_1 c_1 c_1' + p_2 c_2 c_2' = A.

Instead,

p_1 (c_1 - m)(c_1 - m)' + p_2 (c_2 - m)(c_2 - m)' = C

could be used.

5.2 K = 2, p = 2

Consider 2D data (p = 2). The data are x_i = (x_i, y_i), i = 1, 2, ..., n. Denote the mass point vectors by c_1 = (c_1x, c_1y), c_2 = (c_2x, c_2y), and let m = (m_x, m_y). Denote the elements of A by a_xx, a_yy, and a_xy = a_yx. Pei and Chen (1996) considered the case K = 2 for 2D data; they deal with the two dimensions by using a single complex variable. It may be illuminating to examine this also without complex variables. The unknowns are the six quantities c_1x, c_1y, c_2x, c_2y, and p_1, p_2, with 5 degrees of freedom, since p_1 + p_2 = 1. Equations are

p_1 + p_2 = 1
p_1 c_1x + p_2 c_2x = m_x
p_1 c_1y + p_2 c_2y = m_y
p_1 c_1x^2 + p_2 c_2x^2 = a_xx
p_1 c_1y^2 + p_2 c_2y^2 = a_yy
p_1 c_1x c_1y + p_2 c_2x c_2y = a_xy

5.3 Higher Dimensions

More generally, whether or not two points are linked into the same cluster should depend on the distance between them. The definition of distance is tricky in the multivariate case. Statistical (Mahalanobis) distance should be used, but the appropriate one is based on the within-groups covariance matrix, and in cluster analysis, by any method, moment-preservation or otherwise, the groups are only just being formed. So one needs to iterate between tentative assignments to clusters and updating the within-groups covariance matrix, and hence the inter-point distances. The total covariance matrix can be misleading; for example, the overall correlation between two variables could be positive (negative) while the within-groups correlations are negative (positive).
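The iteration described in this subsection, alternating tentative assignments with re-estimation of the within-groups covariance, might look as follows. This is a sketch under my own conventions (numpy assumed, integer labels, and no cluster emptying out during the iteration), not a procedure specified in the paper:

```python
import numpy as np

def mahalanobis_refine(X, labels, iters=20):
    """Alternate between (a) re-estimating the pooled within-groups
    covariance from the current tentative clusters and (b) reassigning
    each point to the cluster center nearest in Mahalanobis distance."""
    labels = np.asarray(labels)
    for _ in range(iters):
        ks = np.unique(labels)
        centers = np.array([X[labels == k].mean(axis=0) for k in ks])
        # pooled within-groups covariance matrix
        W = sum(np.cov(X[labels == k].T, bias=True) * (labels == k).sum()
                for k in ks) / len(X)
        Winv = np.linalg.inv(W)
        # squared Mahalanobis distance of every point to every center
        d2 = np.array([np.einsum('ij,jk,ik->i', X - c, Winv, X - c)
                       for c in centers])
        new = ks[np.argmin(d2, axis=0)]
        if np.array_equal(new, labels):
            break
        labels = new
    return labels
```

With well-separated groups, a few deliberately misassigned points are corrected in the first pass, after which the within-groups covariance, and hence the distances, stabilize.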
6 Cluster Analysis

Given a dataset, or a theoretical distribution, a problem is to fit several clusters, say K of them, e.g., K = 2 or 3 (or more). The moment-preservation method may be viewed as a way of performing cluster analysis or mode seeking. Cluster analysis is the grouping of similar individuals or objects. One method of performing cluster analysis is to fit a finite mixture model (FMM),

f(x) = pi_1 f_1(x) + pi_2 f_2(x) + ... + pi_K f_K(x),

where the f_k(x) are the component p.d.f.s and the pi_k are the mixing probabilities, adding to one. The moment-preservation method may be viewed as an extreme case of the FMM, where the component distributions are single-point (singular) distributions.

References

[1] Cox, D. R. (1957). Note on grouping. Journal of the American Statistical Association, 52.
[2] Delp, E. J., and Mitchell, O. R. (1979). Image compression using block truncation coding. IEEE Trans. on Communications, COM-27.
[3] Harris, B. (2003). The moment preservation method of cluster analysis. In Exploratory Data Analysis in Empirical Research: Proc. 25th Ann. Conf. of GfKl, University of Munich, March 14-16, 2001, Schwaiger and Opitz (eds.), Springer-Verlag, Berlin and New York.
[4] Johari, S., and Sclove, S. L. (1976). Partitioning a distribution. Communications in Statistics, A5.
[5] Lin, J. C., and Tsai, W.-H. (1994). Feature-preserving clustering of 2-D data for two-class problems using analytical formulas: an automatic and fast approach. IEEE Trans. on PAMI, 16 (5).
[6] Pei, S.-C., and Cheng, C.-M. (1996). A fast two-class classifier for 2D data using complex-moment-preserving principle. Pattern Recognition, 29 (3).
[7] Tabatabai, A. J., and Mitchell, O. R. (1984). Edge location to subpixel values in digital imagery. IEEE Trans. on PAMI, 6 (2).
Technical Reports in this series

1. Stanley L. Sclove. The Variance of a Product. Research Memorandum, Technical Report No. SLS, 9-March. Business Statistics Area-of-Inquiry, PhD Program in Business Administration, University of Illinois at Chicago.
2. Stanley L. Sclove. A Review of Statistical Model Selection Criteria for Multiple Regression. Research Memorandum, Technical Report No. SLS, 9-April. Business Statistics Area-of-Inquiry, PhD Program in Business Administration, University of Illinois at Chicago.
3. Stanley L. Sclove. A Review of Statistical Model Selection Criteria. Research Memorandum, Technical Report No. SLS, 31-May. Business Statistics Area-of-Inquiry, PhD Program in Business Administration, University of Illinois at Chicago.
4. Stanley L. Sclove. The Moment-Preservation Method of Summarizing Distributions. Research Memorandum, Technical Report No. SLS, 9-July-2011. Business Statistics Area-of-Inquiry, PhD Program in Business Administration, University of Illinois at Chicago.

These notes Copyright (c) 2011 Stanley Louis Sclove. Created 2011: July 25; latest update 2011: July 25.
More informationNext is material on matrix rank. Please see the handout
B90.330 / C.005 NOTES for Wednesday 0.APR.7 Suppose that the model is β + ε, but ε does not have the desired variance matrix. Say that ε is normal, but Var(ε) σ W. The form of W is W w 0 0 0 0 0 0 w 0
More informationModeling and Performance Analysis with Discrete-Event Simulation
Simulation Modeling and Performance Analysis with Discrete-Event Simulation Chapter 9 Input Modeling Contents Data Collection Identifying the Distribution with Data Parameter Estimation Goodness-of-Fit
More informationECON Fundamentals of Probability
ECON 351 - Fundamentals of Probability Maggie Jones 1 / 32 Random Variables A random variable is one that takes on numerical values, i.e. numerical summary of a random outcome e.g., prices, total GDP,
More informationLogistic Regression Review Fall 2012 Recitation. September 25, 2012 TA: Selen Uguroglu
Logistic Regression Review 10-601 Fall 2012 Recitation September 25, 2012 TA: Selen Uguroglu!1 Outline Decision Theory Logistic regression Goal Loss function Inference Gradient Descent!2 Training Data
More informationMachine Learning 2017
Machine Learning 2017 Volker Roth Department of Mathematics & Computer Science University of Basel 21st March 2017 Volker Roth (University of Basel) Machine Learning 2017 21st March 2017 1 / 41 Section
More informationLinear Algebra Review
Linear Algebra Review Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Linear Algebra Review 1 / 45 Definition of Matrix Rectangular array of elements arranged in rows and
More informationComputer Science, Informatik 4 Communication and Distributed Systems. Simulation. Discrete-Event System Simulation. Dr.
Simulation Discrete-Event System Simulation Chapter 8 Input Modeling Purpose & Overview Input models provide the driving force for a simulation model. The quality of the output is no better than the quality
More informationGrowing a Large Tree
STAT 5703 Fall, 2004 Data Mining Methodology I Decision Tree I Growing a Large Tree Contents 1 A Single Split 2 1.1 Node Impurity.................................. 2 1.2 Computation of i(t)................................
More informationProblem Set 2. MAS 622J/1.126J: Pattern Recognition and Analysis. Due: 5:00 p.m. on September 30
Problem Set 2 MAS 622J/1.126J: Pattern Recognition and Analysis Due: 5:00 p.m. on September 30 [Note: All instructions to plot data or write a program should be carried out using Matlab. In order to maintain
More informationEvaluation requires to define performance measures to be optimized
Evaluation Basic concepts Evaluation requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain (generalization error) approximation
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 89 Part II
More informationOrientation Map Based Palmprint Recognition
Orientation Map Based Palmprint Recognition (BM) 45 Orientation Map Based Palmprint Recognition B. H. Shekar, N. Harivinod bhshekar@gmail.com, harivinodn@gmail.com India, Mangalore University, Department
More informationReview Packet 1 B 11 B 12 B 13 B = B 21 B 22 B 23 B 31 B 32 B 33 B 41 B 42 B 43
Review Packet. For each of the following, write the vector or matrix that is specified: a. e 3 R 4 b. D = diag{, 3, } c. e R 3 d. I. For each of the following matrices and vectors, give their dimension.
More informationACE 562 Fall Lecture 2: Probability, Random Variables and Distributions. by Professor Scott H. Irwin
ACE 562 Fall 2005 Lecture 2: Probability, Random Variables and Distributions Required Readings: by Professor Scott H. Irwin Griffiths, Hill and Judge. Some Basic Ideas: Statistical Concepts for Economists,
More informationL11: Pattern recognition principles
L11: Pattern recognition principles Bayesian decision theory Statistical classifiers Dimensionality reduction Clustering This lecture is partly based on [Huang, Acero and Hon, 2001, ch. 4] Introduction
More informationCorrelation: Copulas and Conditioning
Correlation: Copulas and Conditioning This note reviews two methods of simulating correlated variates: copula methods and conditional distributions, and the relationships between them. Particular emphasis
More informationTopic 2: Probability & Distributions. Road Map Probability & Distributions. ECO220Y5Y: Quantitative Methods in Economics. Dr.
Topic 2: Probability & Distributions ECO220Y5Y: Quantitative Methods in Economics Dr. Nick Zammit University of Toronto Department of Economics Room KN3272 n.zammit utoronto.ca November 21, 2017 Dr. Nick
More informationIEOR 4701: Stochastic Models in Financial Engineering. Summer 2007, Professor Whitt. SOLUTIONS to Homework Assignment 9: Brownian motion
IEOR 471: Stochastic Models in Financial Engineering Summer 27, Professor Whitt SOLUTIONS to Homework Assignment 9: Brownian motion In Ross, read Sections 1.1-1.3 and 1.6. (The total required reading there
More informationSTATS306B STATS306B. Discriminant Analysis. Jonathan Taylor Department of Statistics Stanford University. June 3, 2010
STATS306B Discriminant Analysis Jonathan Taylor Department of Statistics Stanford University June 3, 2010 Spring 2010 Classification Given K classes in R p, represented as densities f i (x), 1 i K classify
More informationTESTING FOR NORMALITY IN THE LINEAR REGRESSION MODEL: AN EMPIRICAL LIKELIHOOD RATIO TEST
Econometrics Working Paper EWP0402 ISSN 1485-6441 Department of Economics TESTING FOR NORMALITY IN THE LINEAR REGRESSION MODEL: AN EMPIRICAL LIKELIHOOD RATIO TEST Lauren Bin Dong & David E. A. Giles Department
More informationJoint Probability Distributions, Correlations
Joint Probability Distributions, Correlations What we learned so far Events: Working with events as sets: union, intersection, etc. Some events are simple: Head vs Tails, Cancer vs Healthy Some are more
More informationDrift Reduction For Metal-Oxide Sensor Arrays Using Canonical Correlation Regression And Partial Least Squares
Drift Reduction For Metal-Oxide Sensor Arrays Using Canonical Correlation Regression And Partial Least Squares R Gutierrez-Osuna Computer Science Department, Wright State University, Dayton, OH 45435,
More informationPattern recognition. "To understand is to perceive patterns" Sir Isaiah Berlin, Russian philosopher
Pattern recognition "To understand is to perceive patterns" Sir Isaiah Berlin, Russian philosopher The more relevant patterns at your disposal, the better your decisions will be. This is hopeful news to
More informationVector Space Models. wine_spectral.r
Vector Space Models 137 wine_spectral.r Latent Semantic Analysis Problem with words Even a small vocabulary as in wine example is challenging LSA Reduce number of columns of DTM by principal components
More informationREVIEW OF MAIN CONCEPTS AND FORMULAS A B = Ā B. Pr(A B C) = Pr(A) Pr(A B C) =Pr(A) Pr(B A) Pr(C A B)
REVIEW OF MAIN CONCEPTS AND FORMULAS Boolean algebra of events (subsets of a sample space) DeMorgan s formula: A B = Ā B A B = Ā B The notion of conditional probability, and of mutual independence of two
More informationP 1.5 X 4.5 / X 2 and (iii) The smallest value of n for
DHANALAKSHMI COLLEGE OF ENEINEERING, CHENNAI DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING MA645 PROBABILITY AND RANDOM PROCESS UNIT I : RANDOM VARIABLES PART B (6 MARKS). A random variable X
More informationLarge-Margin Thresholded Ensembles for Ordinal Regression
Large-Margin Thresholded Ensembles for Ordinal Regression Hsuan-Tien Lin and Ling Li Learning Systems Group, California Institute of Technology, U.S.A. Conf. on Algorithmic Learning Theory, October 9,
More informationChapter 4 Multiple Random Variables
Review for the previous lecture Theorems and Examples: How to obtain the pmf (pdf) of U = g ( X Y 1 ) and V = g ( X Y) Chapter 4 Multiple Random Variables Chapter 43 Bivariate Transformations Continuous
More informationDiscrete Random Variables
Discrete Random Variables We have a probability space (S, Pr). A random variable is a function X : S V (X ) for some set V (X ). In this discussion, we must have V (X ) is the real numbers X induces a
More informationSupplement to Bayesian inference for high-dimensional linear regression under the mnet priors
The Canadian Journal of Statistics Vol. xx No. yy 0?? Pages?? La revue canadienne de statistique Supplement to Bayesian inference for high-dimensional linear regression under the mnet priors Aixin Tan
More informationUnsupervised Machine Learning and Data Mining. DS 5230 / DS Fall Lecture 7. Jan-Willem van de Meent
Unsupervised Machine Learning and Data Mining DS 5230 / DS 4420 - Fall 2018 Lecture 7 Jan-Willem van de Meent DIMENSIONALITY REDUCTION Borrowing from: Percy Liang (Stanford) Dimensionality Reduction Goal:
More informationRelevance Vector Machines for Earthquake Response Spectra
2012 2011 American American Transactions Transactions on on Engineering Engineering & Applied Applied Sciences Sciences. American Transactions on Engineering & Applied Sciences http://tuengr.com/ateas
More information(y 1, y 2 ) = 12 y3 1e y 1 y 2 /2, y 1 > 0, y 2 > 0 0, otherwise.
54 We are given the marginal pdfs of Y and Y You should note that Y gamma(4, Y exponential( E(Y = 4, V (Y = 4, E(Y =, and V (Y = 4 (a With U = Y Y, we have E(U = E(Y Y = E(Y E(Y = 4 = (b Because Y and
More informationLinear Subspace Models
Linear Subspace Models Goal: Explore linear models of a data set. Motivation: A central question in vision concerns how we represent a collection of data vectors. The data vectors may be rasterized images,
More informationMachine Learning Tom M. Mitchell Machine Learning Department Carnegie Mellon University. September 20, 2012
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University September 20, 2012 Today: Logistic regression Generative/Discriminative classifiers Readings: (see class website)
More informationThe Matrix Algebra of Sample Statistics
The Matrix Algebra of Sample Statistics James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) The Matrix Algebra of Sample Statistics
More informationLecture 2: Review of Probability
Lecture 2: Review of Probability Zheng Tian Contents 1 Random Variables and Probability Distributions 2 1.1 Defining probabilities and random variables..................... 2 1.2 Probability distributions................................
More informationMultiple Random Variables
Multiple Random Variables This Version: July 30, 2015 Multiple Random Variables 2 Now we consider models with more than one r.v. These are called multivariate models For instance: height and weight An
More informationCITS 4402 Computer Vision
CITS 4402 Computer Vision A/Prof Ajmal Mian Adj/A/Prof Mehdi Ravanbakhsh Lecture 06 Object Recognition Objectives To understand the concept of image based object recognition To learn how to match images
More informationStochastic Processes. M. Sami Fadali Professor of Electrical Engineering University of Nevada, Reno
Stochastic Processes M. Sami Fadali Professor of Electrical Engineering University of Nevada, Reno 1 Outline Stochastic (random) processes. Autocorrelation. Crosscorrelation. Spectral density function.
More informationCase of single exogenous (iv) variable (with single or multiple mediators) iv à med à dv. = β 0. iv i. med i + α 1
Mediation Analysis: OLS vs. SUR vs. ISUR vs. 3SLS vs. SEM Note by Hubert Gatignon July 7, 2013, updated November 15, 2013, April 11, 2014, May 21, 2016 and August 10, 2016 In Chap. 11 of Statistical Analysis
More informationPart 6: Multivariate Normal and Linear Models
Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of
More information