THE MOMENT-PRESERVATION METHOD OF SUMMARIZING DISTRIBUTIONS. Stanley L. Sclove
THE MOMENT-PRESERVATION METHOD OF SUMMARIZING DISTRIBUTIONS

Stanley L. Sclove
Department of Information & Decision Sciences
University of Illinois at Chicago

Research Memorandum, Technical Report No. SLS
Business Statistics Area-of-Inquiry, PhD Program in Business Administration
College of Business Administration
University of Illinois at Chicago

2011: July 9
Contents

1 Introduction
2 Details of the Moment-Preservation Method for K = 2
  2.1 Moments of a Binary (0,1) Variable
  2.2 Moments of a Binary Variable
  2.3 Re-statement in Terms of Central Moments
3 Moment-Preservation Method for K >= 3
4 Comparison with Other Methods
  4.1 Partitioning a Distribution
  4.2 Stanines
5 Multivariate Case
  5.1 K = 2
  5.2 K = 2, p = 2
  5.3 Higher Dimensions
6 Cluster Analysis
The Moment-Preservation Method of Summarizing Distributions

Stanley L. Sclove

July 28, 2011

Abstract

The moment-preservation method is a method of approximating a distribution (theoretical or data) by a discrete distribution, and hence of summarizing the distribution with a few values and their frequencies. Results on the method are reviewed.

Keywords: moments, moment-preservation method, cluster analysis

Stanley L. Sclove (Ph.D., Mathematical Statistics, Columbia University; B.A., Applied Honors Mathematics, Dartmouth College) is a Professor in the Department of Information & Decision Sciences at the University of Illinois at Chicago (UIC) and Coordinator of the Business Statistics Area of the PhD Program in Business Administration. In addition to UIC he has taught at Carnegie-Mellon, Northwestern, and Stanford. Sclove is Secretary/Treasurer of the Classification Society and the Representative of the Risk Analysis Section of the American Statistical Association to its Council of Sections. Address: Department of Information & Decision Sciences (MC 294), College of Business Administration, University of Illinois at Chicago, 601 S. Morgan St., Chicago, IL.

Earlier versions of this paper were presented in sessions at the Annual Meeting of the Classification Society, Carnegie Mellon University, Pittsburgh, PA, June 16-18, 2011, and the Joint Statistical Meetings, American Statistical Association, Miami Beach, FL, July 30 - August 4, 2011. The author thanks Professor David Banks of Duke University for organizing these sessions.
1 Introduction

The moment-preservation method of approximating a distribution is discussed. The method is found in the engineering literature, especially in articles concerning image processing and pattern recognition. The method consists of approximating a given distribution with a discrete distribution on, say, K values, where K would usually be small, like 2 or 3. A rationale for the method is that two distributions will be close if a number of their moments are close. In particular, the discrete approximating distribution will be close to the given distribution if their moments are close. The discrete approximation provides K values that, along with their probabilities, summarize, or are representative of, the distribution.

Given a dataset, or a theoretical distribution, a problem is to fit several clusters, say K of them, e.g., K = 2 or 3 (or more). An approach is to find mass points (cluster centers) c_1, c_2, ..., c_K and probabilities p_1, p_2, ..., p_K such that the corresponding discrete distribution Pr{X = c_k} = p_k, where X denotes the corresponding random variable, matches the moments of the given distribution. Recall that if two distributions have a number of moments close together, then the two distributions will be close in appropriate ways.

The moment-preservation method may be viewed as solving for a discrete (K-valued) approximation to the given distribution. The unknowns are c_1, c_2, ..., c_K and p_1, p_2, ..., p_K. There are 2K unknowns, with one constraint (the probabilities add to 1), hence 2K - 1 free unknowns. This can be considered as an extreme case of the finite mixture model where the component distributions are one-point distributions. The moment-preservation method consists of computing values of the unknowns by matching 2K - 1 moments m_1, m_2, ..., m_{2K-1}. The values m_k could, remember, be sample moments or specified values of true moments.
The method can be viewed as one of discrete approximation to a distribution (data or theoretical). It is reminiscent of the elicitation process in Decision Risk Analysis or in CPM/PERT: obtain high, medium, and low values of a risk variable or activity time, with their (subjective) probabilities. Here, however, we're asking the data, not a person.
2 Details of the Moment-Preservation Method for K = 2

There are two mass points c_1, c_2 and two probabilities p_1, p_2, but p_2 = 1 - p_1, so there are only three free unknowns. Denote the (raw) moments by m_k. (Given data x_i, i = 1, 2, ..., n, m_k = (1/n) Sum_{i=1}^n x_i^k.) Equations for the unknowns are

p_1 + p_2 = 1,
p_1 c_1 + p_2 c_2 = m_1,
p_1 c_1^2 + p_2 c_2^2 = m_2,
p_1 c_1^3 + p_2 c_2^3 = m_3.

[Figure: the real line, with mass points c_1 < m_1 < c_2 carrying probabilities p_1 and p_2. Caption: K = 2 mass points c_1, c_2 with probabilities p_1, p_2.]

The quantities c_1, c_2, p_1, p_2 are taken as functions of m_1, m_2, m_3. The solution comes out as follows (Harris 2003). Let

a = (m_1 m_3 - m_2^2) / (m_2 - m_1^2),   b = (m_1 m_2 - m_3) / (m_2 - m_1^2).

(Note that m_2 - m_1^2 is the variance, written m'_2 below.) Then c_1 and c_2 are the roots of the quadratic c^2 + b c + a = 0:

c_1 = [-b - (b^2 - 4a)^{1/2}] / 2,   c_2 = [-b + (b^2 - 4a)^{1/2}] / 2,

and

p_1 = (c_2 - m_1) / (c_2 - c_1),   p_2 = 1 - p_1 = (m_1 - c_1) / (c_2 - c_1).

An alternative way of writing the solution (Tabatabai and Mitchell 1984) is

c_1 = m_1 - s sqrt(p_2 / p_1),   c_2 = m_1 + s sqrt(p_1 / p_2),

where s = sqrt(m'_2) = sqrt(m_2 - m_1^2), and

p_1 = [1 + g / (g^2 + 4)^{1/2}] / 2,   p_2 = 1 - p_1,

the quantity g = m'_3 / m'_2^{3/2}, the normalized third central moment, being the skewness. Judging from this form of the solution, it appears that use of central moments m'_1 = 0, m'_2 = m_2 - m_1^2, m'_3 = m_3 - 3 m_2 m_1 + 2 m_1^3 from the outset may simplify things. In terms of the variance m'_2, the equations in m_1 and m_2 give

p_1 p_2 (c_2 - c_1)^2 = m'_2 = m_2 - m_1^2.

The problem might be even further simplified by working in terms of deviations d_1 = c_1 - m_1 and d_2 = c_2 - m_1. The moments are worked out below. First note that the assignment to Cluster 1 or Cluster 2 is obtained by arranging the n points in increasing order and assigning those whose rank does not exceed n p_1 to the first cluster.

2.1 Moments of a Binary (0,1) Variable

Let us view the case K = 2 from the viewpoint of a binary variable. Let B be a Bernoulli variable, with Pr{B = 1} = p_2 and Pr{B = 0} = p_1. Then the k-th raw moment of B is m_k(B) = E[B^k] = p_1 0^k + p_2 1^k = p_2. The variance is p_1 p_2. The third central moment of B is

m'_3 = m_3 - 3 m_2 m_1 + 2 m_1^3 = p_2 - 3 p_2^2 + 2 p_2^3 = p_2 (1 - 3 p_2 + 2 p_2^2) = p_2 (1 - p_2)(1 - 2 p_2) = p_1 p_2 (p_1 + p_2 - 2 p_2) = p_1 p_2 (p_1 - p_2).

The skewness gamma = m'_3 / m'_2^{3/2} is

gamma = p_1 p_2 (p_1 - p_2) / (p_1 p_2)^{3/2} = (p_1 - p_2) / (p_1 p_2)^{1/2}.

Applying the identity xy = [(x + y)^2 - (x - y)^2] / 4 with x = p_1, y = p_2 gives

p_1 p_2 = [(p_1 + p_2)^2 - (p_1 - p_2)^2] / 4 = [1 - (p_1 - p_2)^2] / 4.

This gives the skewness in terms of p_1 - p_2 as

gamma = (p_1 - p_2) / [ (1/2) sqrt(1 - (p_1 - p_2)^2) ].
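The closed-form solution for K = 2 can be implemented directly from the formulas above. A minimal sketch (the function name is mine; it assumes the sample has positive variance and is not concentrated on a single point, so the discriminant is positive):

```python
import math

def moment_preserve_2(xs):
    """Two mass points and their probabilities matching the first three
    raw sample moments, using the closed-form K = 2 solution (Harris 2003)."""
    n = len(xs)
    m1 = sum(xs) / n                    # raw moments m_1, m_2, m_3
    m2 = sum(x**2 for x in xs) / n
    m3 = sum(x**3 for x in xs) / n
    var = m2 - m1**2                    # central second moment m'_2
    a = (m1 * m3 - m2**2) / var
    b = (m1 * m2 - m3) / var
    disc = math.sqrt(b**2 - 4 * a)      # c_1, c_2 are roots of c^2 + b c + a = 0
    c1, c2 = (-b - disc) / 2, (-b + disc) / 2
    p1 = (c2 - m1) / (c2 - c1)
    return c1, c2, p1, 1 - p1
```

For a sample that already sits on two points the method recovers them exactly; e.g., the sample [0, 0, 0, 3] yields c_1 = 0, c_2 = 3 with p_1 = 3/4.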
2.2 Moments of a Binary Variable

Now suppose Y = c_1 when B = 0 and Y = c_2 when B = 1. Then Y = (c_2 - c_1) B + c_1. The mean is m_1(Y) = m_1((c_2 - c_1) B + c_1) = (c_2 - c_1) m_1(B) + c_1 = (c_2 - c_1) p_2 + c_1, or p_1 c_1 + p_2 c_2. The k-th central moment is (c_2 - c_1)^k times that of B: m'_k(Y) = (c_2 - c_1)^k m'_k(B). The variance is (c_2 - c_1)^2 times that of B, or p_1 p_2 (c_2 - c_1)^2. The third central moment is (c_2 - c_1)^3 m'_3(B) = p_1 p_2 (p_1 - p_2)(c_2 - c_1)^3. The skewness gamma = m'_3 / m'_2^{3/2} is

gamma = p_1 p_2 (p_1 - p_2)(c_2 - c_1)^3 / [p_1 p_2 (c_2 - c_1)^2]^{3/2} = (p_1 - p_2) / (p_1 p_2)^{1/2},

the same as that of B, since gamma is normalized and hence invariant under scale transformation.

2.3 Re-statement in Terms of Central Moments

Denote the central moments by m'_k, k = 1, 2, .... In terms of the distribution of a random variable Y, these are m'_k = E[(Y - mu)^k], where mu = E[Y] = m_1 = m, say. The first several central moments in terms of the raw moments m_k are

m'_1 = m_1 - m_1 = 0,
m'_2 = m_2 - m_1^2,
m'_3 = m_3 - 3 m_2 m + 3 m_1 m^2 - m^3 = m_3 - 3 m_2 m + 2 m^3.

The moment-preservation equations in terms of central moments are as follows:

p_1 (c_1 - m) + p_2 (c_2 - m) = m'_1 = 0,
p_1 p_2 (c_2 - c_1)^2 = m'_2,
p_1 p_2 (p_1 - p_2)(c_2 - c_1)^3 = m'_3.

3 Moment-Preservation Method for K >= 3

For K = 3, there are mass points c_1, c_2, c_3 and probabilities p_1, p_2, p_3. The equations are p_1 + p_2 + p_3 = 1 and p_1 c_1^j + p_2 c_2^j + p_3 c_3^j = m_j, j = 1, 2, 3, 4, 5. I have not seen the details of solutions for K >= 3. It could be interesting to obtain such results, but the matter is not pursued here. Of course, given numerical values of the moments, a solution can be obtained by numerical methods.
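As noted, for general K a solution can be obtained numerically given the 2K - 1 moment values. One standard route (not from this paper; it is the classical Gaussian-quadrature construction, and numpy is assumed available) takes the mass points to be the roots of the degree-K orthogonal polynomial of the moment functional and then gets the probabilities from a linear system:

```python
import numpy as np

def moment_preserve_k(moments, K):
    """Mass points and probabilities matching raw moments m_1, ..., m_{2K-1}.

    The monic degree-K polynomial orthogonal to the moment functional has
    the mass points as its roots; its coefficients solve a Hankel system.
    Sketch only: assumes the Hankel matrix is nonsingular, i.e. the given
    distribution is not already concentrated on fewer than K points."""
    m = np.concatenate(([1.0], np.asarray(moments, dtype=float)))  # prepend m_0 = 1
    H = np.array([[m[i + j] for j in range(K)] for i in range(K)])  # Hankel moment matrix
    coeffs = np.linalg.solve(H, -m[K:2 * K])   # c_0, ..., c_{K-1} of the monic polynomial
    roots = np.roots(np.concatenate(([1.0], coeffs[::-1])))  # mass points
    roots = np.sort(roots.real)
    V = np.vander(roots, K, increasing=True).T  # V[j, k] = roots[k]**j
    probs = np.linalg.solve(V, m[:K])           # match moments m_0, ..., m_{K-1}
    return roots, probs
```

For the standard-Normal moments (0, 1, 0, 3, 0) with K = 3, this reproduces the values quoted for the Normal case in the next section: mass points -sqrt(3), 0, +sqrt(3) with probabilities 1/6, 4/6, 1/6.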
4 Comparison with Other Methods

4.1 Partitioning a Distribution

Contrast the moment-preservation method of Harris and others with partitioning a distribution, where partitioning means finding minimum mean squared error intervals. (The representative of an interval is then the conditional expectation, given that interval.) For the Normal distribution, e.g., there is the 27% rule (see, e.g., Cox 1957): for K = 3 the proportions in the three intervals come out to be .27, .46, and .27. The intervals are approximately (-inf, -0.613), (-0.613, +0.613), (+0.613, +inf). The means of these intervals are approximately -1.22, 0, +1.22. The result used here is

E[Z | a < Z < b] = [phi(a) - phi(b)] / [Phi(b) - Phi(a)],

where Z denotes a random variable having the standard Normal distribution and phi and Phi are the standard Normal p.d.f. and c.d.f., respectively.

Let's look at the results for K = 2 and 3. First consider K = 3. Say the given moments are the same as the standard Normal, the first one being 0, the second 1, the third 0, and the fourth 3. What are the results from the moment-preservation method, and how do they compare with those of partitioning?

Table 1: K = 3: Comparing moment preservation and partitioning for the standard Normal distribution

moment preservation:  centers c_1 = -sqrt(3), c_2 = 0, c_3 = +sqrt(3);  probs p_1 = 1/6, p_2 = 4/6, p_3 = 1/6
partitioning:         centers c_1 ~ -1.22, c_2 = 0, c_3 ~ +1.22;  probs p_1 ~ .27, p_2 ~ .46, p_3 ~ .27
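The partitioning row of Table 1 can be checked from the conditional-mean formula above; a quick sketch using only the standard library (the error function gives Phi):

```python
import math

def phi(z):
    """Standard Normal p.d.f."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def Phi(z):
    """Standard Normal c.d.f., via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def interval_mean(a, b):
    """E[Z | a < Z < b] = (phi(a) - phi(b)) / (Phi(b) - Phi(a))."""
    return (phi(a) - phi(b)) / (Phi(b) - Phi(a))

c = 0.613                                # approximate cut point from the 27% rule
outer = 1 - Phi(c)                       # proportion in each outer interval
upper_mean = interval_mean(c, math.inf)  # representative of the upper interval
```

Here `outer` comes out near .27 and `upper_mean` near 1.22, matching the table; the middle interval's mean is 0 by symmetry.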
Now consider the comparison for K = 2.

Table 2: K = 2: Comparing moment preservation and partitioning for the standard Normal distribution

moment preservation:  centers c_1 = -1, c_2 = +1;  probs p_1 = p_2 = 1/2
partitioning:         centers c_1 ~ -0.80, c_2 ~ +0.80;  probs p_1 = p_2 = 1/2

4.2 Stanines

It is worth mentioning another method of scoring, though it's not moment preservation or partitioning. These are the stanines, for K = 9. They are based on intervals a half standard deviation wide. The values are 1, 2, 3, 4, 5, 6, 7, 8, 9. The boundaries are -inf, -1.75, -1.25, -0.75, -0.25, +0.25, +0.75, +1.25, +1.75, +inf. The proportions in the nine categories for a Normal distribution are approximately 4, 7, 12, 17, 20, 17, 12, 7, and 4%. See Johari & Sclove (1976) for further discussion of stanines, and of partitioning for various families of distributions.
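The stanine proportions quoted above follow directly from the Normal c.d.f.; a short check (standard library only):

```python
import math

def Phi(z):
    """Standard Normal c.d.f."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Stanine boundaries -1.75, -1.25, ..., +1.75, padded with +/- infinity
edges = [-math.inf] + [-1.75 + 0.5 * i for i in range(8)] + [math.inf]

# Proportion of a standard Normal falling in each of the nine intervals;
# comes out near [.04, .07, .12, .17, .20, .17, .12, .07, .04]
props = [Phi(b) - Phi(a) for a, b in zip(edges, edges[1:])]
```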
5 Multivariate Case

There are K mass points c_1, c_2, ..., c_K and their probabilities p_1, p_2, ..., p_K. Let p denote the number of variables. (The univariate case is p = 1.) Each mass point is a vector of p elements, so the number of unknowns is pK + (K - 1) = K(p + 1) - 1, where 1 is subtracted because the probabilities must add to one.

When the given distribution is a data distribution, denote the sample by x_i, i = 1, 2, ..., n. Then m = (1/n) Sum_{i=1}^n x_i. The matrix of sample mean cross products is

A = (1/n) Sum_{i=1}^n x_i x_i' = [a_uv], u, v = 1, 2, ..., p,

where a_uv = (1/n) Sum_{i=1}^n x_ui x_vi. Here ' denotes vector or matrix transpose. The matrix of sample mean cross products of deviations is

C = (1/n) Sum_{i=1}^n (x_i - m)(x_i - m)',

with generic element c_uv = (1/n) Sum_{i=1}^n (x_ui - m_u)(x_vi - m_v).

5.1 K = 2

Let c_1 and c_2 denote two vector mass points to be found and m the mean vector, with elements m_v, v = 1, 2, ..., p, where p is the number of variables. For K = 2, an equation for p_1, p_2, c_1, c_2 is

p_1 c_1 + p_2 c_2 = m.

A second equation is

p_1 c_1 c_1' + p_2 c_2 c_2' = A.

Instead,

p_1 (c_1 - m)(c_1 - m)' + p_2 (c_2 - m)(c_2 - m)' = C

could be used.

5.2 K = 2, p = 2

Consider 2D data (p = 2). The data are x_i = (x_i, y_i), i = 1, 2, ..., n. Denote the mass point vectors by c_1 = (c_1x, c_1y), c_2 = (c_2x, c_2y), and let m = (m_x, m_y). Denote the elements of A by a_xx, a_yy, and a_xy = a_yx. Pei and Chen (1996) considered the case K = 2 for 2D data; they deal with the two dimensions by using a single complex variable. It may be illuminating to examine this also without complex variables. The unknowns are the six quantities c_1x, c_1y, c_2x, c_2y, and p_1, p_2, with 5 degrees of freedom, since p_1 + p_2 = 1. Equations are

p_1 + p_2 = 1
p_1 c_1x + p_2 c_2x = m_x
p_1 c_1y + p_2 c_2y = m_y
p_1 c_1x^2 + p_2 c_2x^2 = a_xx
p_1 c_1y^2 + p_2 c_2y^2 = a_yy
p_1 c_1x c_1y + p_2 c_2x c_2y = a_xy

5.3 Higher Dimensions

More generally, whether or not two points are linked into the same cluster should depend on the distance between them. The definition of distance is tricky in the multivariate case. Statistical (Mahalanobis) distance should be used, but the appropriate one is based on the within-groups covariance matrix, and in cluster analysis, by any method, moment-preservation or otherwise, the groups are only just being formed. So one needs to iterate between tentative assignments to clusters and updating the within-groups covariance matrix, and hence the inter-point distances. The total covariance matrix can be misleading; for example, the overall correlation between two variables could be positive (negative) while the within-groups correlations are negative (positive).
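The iteration described in this subsection, alternating tentative assignments with re-estimation of the within-groups covariance, might look as follows. This is a sketch under my own conventions (numpy assumed, integer labels, and no cluster emptying out during the iteration), not a procedure specified in the paper:

```python
import numpy as np

def mahalanobis_refine(X, labels, iters=20):
    """Alternate between (a) re-estimating the pooled within-groups
    covariance from the current tentative clusters and (b) reassigning
    each point to the cluster center nearest in Mahalanobis distance."""
    labels = np.asarray(labels)
    for _ in range(iters):
        ks = np.unique(labels)
        centers = np.array([X[labels == k].mean(axis=0) for k in ks])
        # pooled within-groups covariance matrix
        W = sum(np.cov(X[labels == k].T, bias=True) * (labels == k).sum()
                for k in ks) / len(X)
        Winv = np.linalg.inv(W)
        # squared Mahalanobis distance of every point to every center
        d2 = np.array([np.einsum('ij,jk,ik->i', X - c, Winv, X - c)
                       for c in centers])
        new = ks[np.argmin(d2, axis=0)]
        if np.array_equal(new, labels):
            break
        labels = new
    return labels
```

With well-separated groups, a few deliberately misassigned points are corrected in the first pass, after which the within-groups covariance, and hence the distances, stabilize.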
6 Cluster Analysis

Given a dataset, or a theoretical distribution, a problem is to fit several clusters, say K of them, e.g., K = 2 or 3 (or more). The moment-preservation method may be viewed as a way of performing cluster analysis or mode seeking. Cluster analysis is the grouping of similar individuals or objects. One method of performing cluster analysis is to fit a finite mixture model (FMM),

f(x) = pi_1 f_1(x) + pi_2 f_2(x) + ... + pi_K f_K(x),

where the f_k(x) are the component p.d.f.s and the pi_k are the mixing probabilities, adding to one. The moment-preservation method may be viewed as an extreme case of the FMM, where the component distributions are single-point (singular) distributions.

References

[1] Cox, D. R. (1957). Note on grouping. Journal of the American Statistical Association, 52.
[2] Delp, E. J., and Mitchell, O. R. (1979). Image compression using block truncation coding. IEEE Trans. on Communications, COM-27.
[3] Harris, B. (2003). The moment preservation method of cluster analysis. In Exploratory Data Analysis in Empirical Research: Proc. 25th Ann. Conf. of GfKl, University of Munich, March 14-16, 2001, Schwaiger and Opitz (eds.), Springer-Verlag, Berlin and New York.
[4] Johari, S., and Sclove, S. L. (1976). Partitioning a distribution. Communications in Statistics, A5.
[5] Lin, J. C., and Tsai, W.-H. (1994). Feature-preserving clustering of 2-D data for two-class problems using analytical formulas: an automatic and fast approach. IEEE Trans. on PAMI, 16 (5).
[6] Pei, S.-C., and Cheng, C.-M. (1996). A fast two-class classifier for 2D data using complex-moment-preserving principle. Pattern Recognition, 29 (3).
[7] Tabatabai, A. J., and Mitchell, O. R. (1984). Edge location to subpixel values in digital imagery. IEEE Trans. on PAMI, 6 (2).
Technical Reports in this series

1. Stanley L. Sclove. The Variance of a Product. Research Memorandum, Technical Report No. SLS, 9-March. Business Statistics Area-of-Inquiry, PhD Program in Business Administration, University of Illinois at Chicago.
2. Stanley L. Sclove. A Review of Statistical Model Selection Criteria for Multiple Regression. Research Memorandum, Technical Report No. SLS, 9-April. Business Statistics Area-of-Inquiry, PhD Program in Business Administration, University of Illinois at Chicago.
3. Stanley L. Sclove. A Review of Statistical Model Selection Criteria. Research Memorandum, Technical Report No. SLS, 31-May. Business Statistics Area-of-Inquiry, PhD Program in Business Administration, University of Illinois at Chicago.
4. Stanley L. Sclove. The Moment-Preservation Method of Summarizing Distributions. Research Memorandum, Technical Report No. SLS, 9-July-2011. Business Statistics Area-of-Inquiry, PhD Program in Business Administration, University of Illinois at Chicago.

These notes Copyright (c) 2011 Stanley Louis Sclove. Created 2011: July 25; latest update 2011: July 25.
More informationNext is material on matrix rank. Please see the handout
B90.330 / C.005 NOTES for Wednesday 0.APR.7 Suppose that the model is β + ε, but ε does not have the desired variance matrix. Say that ε is normal, but Var(ε) σ W. The form of W is W w 0 0 0 0 0 0 w 0
More informationModeling and Performance Analysis with Discrete-Event Simulation
Simulation Modeling and Performance Analysis with Discrete-Event Simulation Chapter 9 Input Modeling Contents Data Collection Identifying the Distribution with Data Parameter Estimation Goodness-of-Fit
More informationECON Fundamentals of Probability
ECON 351 - Fundamentals of Probability Maggie Jones 1 / 32 Random Variables A random variable is one that takes on numerical values, i.e. numerical summary of a random outcome e.g., prices, total GDP,
More informationLogistic Regression Review Fall 2012 Recitation. September 25, 2012 TA: Selen Uguroglu
Logistic Regression Review 10-601 Fall 2012 Recitation September 25, 2012 TA: Selen Uguroglu!1 Outline Decision Theory Logistic regression Goal Loss function Inference Gradient Descent!2 Training Data
More informationMachine Learning 2017
Machine Learning 2017 Volker Roth Department of Mathematics & Computer Science University of Basel 21st March 2017 Volker Roth (University of Basel) Machine Learning 2017 21st March 2017 1 / 41 Section
More informationLinear Algebra Review
Linear Algebra Review Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Linear Algebra Review 1 / 45 Definition of Matrix Rectangular array of elements arranged in rows and
More informationComputer Science, Informatik 4 Communication and Distributed Systems. Simulation. Discrete-Event System Simulation. Dr.
Simulation Discrete-Event System Simulation Chapter 8 Input Modeling Purpose & Overview Input models provide the driving force for a simulation model. The quality of the output is no better than the quality
More informationGrowing a Large Tree
STAT 5703 Fall, 2004 Data Mining Methodology I Decision Tree I Growing a Large Tree Contents 1 A Single Split 2 1.1 Node Impurity.................................. 2 1.2 Computation of i(t)................................
More informationProblem Set 2. MAS 622J/1.126J: Pattern Recognition and Analysis. Due: 5:00 p.m. on September 30
Problem Set 2 MAS 622J/1.126J: Pattern Recognition and Analysis Due: 5:00 p.m. on September 30 [Note: All instructions to plot data or write a program should be carried out using Matlab. In order to maintain
More informationEvaluation requires to define performance measures to be optimized
Evaluation Basic concepts Evaluation requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain (generalization error) approximation
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 89 Part II
More informationOrientation Map Based Palmprint Recognition
Orientation Map Based Palmprint Recognition (BM) 45 Orientation Map Based Palmprint Recognition B. H. Shekar, N. Harivinod bhshekar@gmail.com, harivinodn@gmail.com India, Mangalore University, Department
More informationReview Packet 1 B 11 B 12 B 13 B = B 21 B 22 B 23 B 31 B 32 B 33 B 41 B 42 B 43
Review Packet. For each of the following, write the vector or matrix that is specified: a. e 3 R 4 b. D = diag{, 3, } c. e R 3 d. I. For each of the following matrices and vectors, give their dimension.
More informationACE 562 Fall Lecture 2: Probability, Random Variables and Distributions. by Professor Scott H. Irwin
ACE 562 Fall 2005 Lecture 2: Probability, Random Variables and Distributions Required Readings: by Professor Scott H. Irwin Griffiths, Hill and Judge. Some Basic Ideas: Statistical Concepts for Economists,
More informationL11: Pattern recognition principles
L11: Pattern recognition principles Bayesian decision theory Statistical classifiers Dimensionality reduction Clustering This lecture is partly based on [Huang, Acero and Hon, 2001, ch. 4] Introduction
More informationCorrelation: Copulas and Conditioning
Correlation: Copulas and Conditioning This note reviews two methods of simulating correlated variates: copula methods and conditional distributions, and the relationships between them. Particular emphasis
More informationTopic 2: Probability & Distributions. Road Map Probability & Distributions. ECO220Y5Y: Quantitative Methods in Economics. Dr.
Topic 2: Probability & Distributions ECO220Y5Y: Quantitative Methods in Economics Dr. Nick Zammit University of Toronto Department of Economics Room KN3272 n.zammit utoronto.ca November 21, 2017 Dr. Nick
More informationIEOR 4701: Stochastic Models in Financial Engineering. Summer 2007, Professor Whitt. SOLUTIONS to Homework Assignment 9: Brownian motion
IEOR 471: Stochastic Models in Financial Engineering Summer 27, Professor Whitt SOLUTIONS to Homework Assignment 9: Brownian motion In Ross, read Sections 1.1-1.3 and 1.6. (The total required reading there
More informationSTATS306B STATS306B. Discriminant Analysis. Jonathan Taylor Department of Statistics Stanford University. June 3, 2010
STATS306B Discriminant Analysis Jonathan Taylor Department of Statistics Stanford University June 3, 2010 Spring 2010 Classification Given K classes in R p, represented as densities f i (x), 1 i K classify
More informationTESTING FOR NORMALITY IN THE LINEAR REGRESSION MODEL: AN EMPIRICAL LIKELIHOOD RATIO TEST
Econometrics Working Paper EWP0402 ISSN 1485-6441 Department of Economics TESTING FOR NORMALITY IN THE LINEAR REGRESSION MODEL: AN EMPIRICAL LIKELIHOOD RATIO TEST Lauren Bin Dong & David E. A. Giles Department
More informationJoint Probability Distributions, Correlations
Joint Probability Distributions, Correlations What we learned so far Events: Working with events as sets: union, intersection, etc. Some events are simple: Head vs Tails, Cancer vs Healthy Some are more
More informationDrift Reduction For Metal-Oxide Sensor Arrays Using Canonical Correlation Regression And Partial Least Squares
Drift Reduction For Metal-Oxide Sensor Arrays Using Canonical Correlation Regression And Partial Least Squares R Gutierrez-Osuna Computer Science Department, Wright State University, Dayton, OH 45435,
More informationPattern recognition. "To understand is to perceive patterns" Sir Isaiah Berlin, Russian philosopher
Pattern recognition "To understand is to perceive patterns" Sir Isaiah Berlin, Russian philosopher The more relevant patterns at your disposal, the better your decisions will be. This is hopeful news to
More informationVector Space Models. wine_spectral.r
Vector Space Models 137 wine_spectral.r Latent Semantic Analysis Problem with words Even a small vocabulary as in wine example is challenging LSA Reduce number of columns of DTM by principal components
More informationREVIEW OF MAIN CONCEPTS AND FORMULAS A B = Ā B. Pr(A B C) = Pr(A) Pr(A B C) =Pr(A) Pr(B A) Pr(C A B)
REVIEW OF MAIN CONCEPTS AND FORMULAS Boolean algebra of events (subsets of a sample space) DeMorgan s formula: A B = Ā B A B = Ā B The notion of conditional probability, and of mutual independence of two
More informationP 1.5 X 4.5 / X 2 and (iii) The smallest value of n for
DHANALAKSHMI COLLEGE OF ENEINEERING, CHENNAI DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING MA645 PROBABILITY AND RANDOM PROCESS UNIT I : RANDOM VARIABLES PART B (6 MARKS). A random variable X
More informationLarge-Margin Thresholded Ensembles for Ordinal Regression
Large-Margin Thresholded Ensembles for Ordinal Regression Hsuan-Tien Lin and Ling Li Learning Systems Group, California Institute of Technology, U.S.A. Conf. on Algorithmic Learning Theory, October 9,
More informationChapter 4 Multiple Random Variables
Review for the previous lecture Theorems and Examples: How to obtain the pmf (pdf) of U = g ( X Y 1 ) and V = g ( X Y) Chapter 4 Multiple Random Variables Chapter 43 Bivariate Transformations Continuous
More informationDiscrete Random Variables
Discrete Random Variables We have a probability space (S, Pr). A random variable is a function X : S V (X ) for some set V (X ). In this discussion, we must have V (X ) is the real numbers X induces a
More informationSupplement to Bayesian inference for high-dimensional linear regression under the mnet priors
The Canadian Journal of Statistics Vol. xx No. yy 0?? Pages?? La revue canadienne de statistique Supplement to Bayesian inference for high-dimensional linear regression under the mnet priors Aixin Tan
More informationUnsupervised Machine Learning and Data Mining. DS 5230 / DS Fall Lecture 7. Jan-Willem van de Meent
Unsupervised Machine Learning and Data Mining DS 5230 / DS 4420 - Fall 2018 Lecture 7 Jan-Willem van de Meent DIMENSIONALITY REDUCTION Borrowing from: Percy Liang (Stanford) Dimensionality Reduction Goal:
More informationRelevance Vector Machines for Earthquake Response Spectra
2012 2011 American American Transactions Transactions on on Engineering Engineering & Applied Applied Sciences Sciences. American Transactions on Engineering & Applied Sciences http://tuengr.com/ateas
More information(y 1, y 2 ) = 12 y3 1e y 1 y 2 /2, y 1 > 0, y 2 > 0 0, otherwise.
54 We are given the marginal pdfs of Y and Y You should note that Y gamma(4, Y exponential( E(Y = 4, V (Y = 4, E(Y =, and V (Y = 4 (a With U = Y Y, we have E(U = E(Y Y = E(Y E(Y = 4 = (b Because Y and
More informationLinear Subspace Models
Linear Subspace Models Goal: Explore linear models of a data set. Motivation: A central question in vision concerns how we represent a collection of data vectors. The data vectors may be rasterized images,
More informationMachine Learning Tom M. Mitchell Machine Learning Department Carnegie Mellon University. September 20, 2012
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University September 20, 2012 Today: Logistic regression Generative/Discriminative classifiers Readings: (see class website)
More informationThe Matrix Algebra of Sample Statistics
The Matrix Algebra of Sample Statistics James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) The Matrix Algebra of Sample Statistics
More informationLecture 2: Review of Probability
Lecture 2: Review of Probability Zheng Tian Contents 1 Random Variables and Probability Distributions 2 1.1 Defining probabilities and random variables..................... 2 1.2 Probability distributions................................
More informationMultiple Random Variables
Multiple Random Variables This Version: July 30, 2015 Multiple Random Variables 2 Now we consider models with more than one r.v. These are called multivariate models For instance: height and weight An
More informationCITS 4402 Computer Vision
CITS 4402 Computer Vision A/Prof Ajmal Mian Adj/A/Prof Mehdi Ravanbakhsh Lecture 06 Object Recognition Objectives To understand the concept of image based object recognition To learn how to match images
More informationStochastic Processes. M. Sami Fadali Professor of Electrical Engineering University of Nevada, Reno
Stochastic Processes M. Sami Fadali Professor of Electrical Engineering University of Nevada, Reno 1 Outline Stochastic (random) processes. Autocorrelation. Crosscorrelation. Spectral density function.
More informationCase of single exogenous (iv) variable (with single or multiple mediators) iv à med à dv. = β 0. iv i. med i + α 1
Mediation Analysis: OLS vs. SUR vs. ISUR vs. 3SLS vs. SEM Note by Hubert Gatignon July 7, 2013, updated November 15, 2013, April 11, 2014, May 21, 2016 and August 10, 2016 In Chap. 11 of Statistical Analysis
More informationPart 6: Multivariate Normal and Linear Models
Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of
More information