FRST 531 Multivariate Statistics: Multivariate Discriminant Analysis (MDA)


Purpose:

1. To predict which group (Y) an observation belongs to, based on the characteristics of p predictor (X) variables, using linear composites of the predictor variables. The criterion for these composites is that the between-group variance is maximized relative to the within-group variance. Each new linear composite is uncorrelated with the previous ones (see Figures 1, 3 and 4), but they are not necessarily orthogonal (not all at 90-degree angles).

2. To minimize misclassification error rates. Once the discriminating functions are found, Fisher's linear discriminating functions (one per group) can be used to predict group membership for another data set (Figure 2).

3. To determine whether the group centroids are statistically different. The group centroid is the average value (average discriminant score) for the linear composite of the predictor variables. It can also be found by inputting the averages of each of the predictor variables to obtain the average discriminant score.

4. To determine the number of statistically significant discriminant axes (see Figures 3 and 4).

5. To determine which of the predictor variables contributes most to discriminating among groups.
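
For a concrete feel for these five purposes, here is a minimal sketch using scikit-learn's LinearDiscriminantAnalysis on a made-up learning data set (not from the course materials); it fits the discriminating functions, produces discriminant scores, and predicts group membership:

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    # Hypothetical learning data set: 20 observations per group of p = 2
    # predictor variables, for k = 3 known groups.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(loc=m, scale=1.0, size=(20, 2))
                   for m in ([0, 0], [3, 0], [0, 3])])
    y = np.repeat([1, 2, 3], 20)

    lda = LinearDiscriminantAnalysis().fit(X, y)  # estimate the discriminating functions
    z = lda.transform(X)               # discriminant scores, up to min(p, k-1) = 2 axes
    print(z.shape)                     # (60, 2)
    print(lda.predict([[1.0, 1.0]]))   # predicted group membership for a new point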

Relationship of MDA to Other Techniques:

Unlike Cluster Analysis, the group to which each entity belongs is known. As with Regression Analysis, a prediction model is wanted; however, the dependent variable in MDA is a category (ordinal or nominal scale), rather than a continuous variable as in regression analysis. MDA is the reverse of Multivariate Analysis of Variance (MANOVA): the continuous variables are the dependent variables in MANOVA, and the classes are the predictor variables. PCA can be used as an initial step in discriminant analysis to reduce the number of predictor variables. A related procedure is to fit a series of logistic models. These include Probit and Logit analysis, which predict the probability of a yes or no (2 classes). This can be extended to a multinomial logit by fitting a series of 2-class models. Probit, Logit, and multinomial logit are not covered in this course.

Procedure, Canonical Discriminant Analysis:

The prediction models (linear combinations of predictor variables) are based on a Learning Data Set. This set is comprised of sample observations of the X variables for each of the k groups:

$$X_j = \begin{bmatrix}
x_{11} & x_{12} & x_{13} & \cdots & x_{1p} \\
x_{21} & x_{22} & x_{23} & \cdots & x_{2p} \\
x_{31} & x_{32} & x_{33} & \cdots & x_{3p} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
x_{n(j)1} & x_{n(j)2} & x_{n(j)3} & \cdots & x_{n(j)p}
\end{bmatrix}, \qquad
n = n_1 + n_2 + n_3 + \cdots + n_k, \qquad
j = 1, 2, 3, \ldots, k$$
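
A minimal sketch of assembling such a learning data set as a stacked array, with made-up group sizes (n_1 = 10, n_2 = 15, n_3 = 12) and p = 4; later sketches reuse these X and groups arrays:

    import numpy as np

    # Hypothetical per-group samples: each X_j is an (n_j x p) array of
    # observations on the p = 4 predictor variables for group j.
    rng = np.random.default_rng(1)
    X1 = rng.normal(size=(10, 4))    # n_1 = 10
    X2 = rng.normal(size=(15, 4))    # n_2 = 15
    X3 = rng.normal(size=(12, 4))    # n_3 = 12

    X = np.vstack([X1, X2, X3])                  # stacked (n x p) learning set
    groups = np.repeat([1, 2, 3], [10, 15, 12])  # known group label for each row
    n = X.shape[0]                               # n = n_1 + n_2 + n_3 = 37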

The idea is to determine functions of the X variables that in some way separate the k groups as well as possible. The simplest is to take linear combinations of the X variables:

$$z_1 = b_{11} x_1 + b_{12} x_2 + b_{13} x_3 + \cdots + b_{1p} x_p$$

where $z_1$ is a vector of discriminant scores, with one value for each of the n observations; there is one such vector for each of the r discriminating functions. There can be $z_2$, $z_3$, etc., up to the smaller of p or k-1 different linear functions.

To obtain the "best" values for the coefficients, we maximize the ratio of the between-group variance to the within-group variance:

$$\lambda = \frac{b^T B b}{b^T W b}$$

where T is the total variation of all the predictor variables; T can be divided into variance between groups (B) and variance within groups (W). The calculation of these is as follows: (see presentation in class)
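
Since the calculation is deferred to the class presentation, the following is only one common convention for computing B and W (as sums of squares and cross-products about the group and grand means), reusing the illustrative X and groups arrays from the previous sketch:

    import numpy as np

    def scatter_matrices(X, groups):
        """Between-group (B) and within-group (W) scatter, with T = B + W."""
        grand_mean = X.mean(axis=0)
        p = X.shape[1]
        B = np.zeros((p, p))
        W = np.zeros((p, p))
        for g in np.unique(groups):
            Xg = X[groups == g]
            mg = Xg.mean(axis=0)
            d = (mg - grand_mean).reshape(-1, 1)
            B += len(Xg) * (d @ d.T)       # between-group scatter
            W += (Xg - mg).T @ (Xg - mg)   # within-group scatter
        return B, W

    B, W = scatter_matrices(X, groups)     # total scatter T = B + W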

We want to find a coefficient matrix b such that this ratio is maximized, subject to the constraint that the resulting discriminant scores are uncorrelated. To begin, we take first derivatives and set them equal to zero. Then, using a Lagrangian multiplier as we did in finding the principal components of a matrix, we obtain:

$$(B - \lambda W)\, b = 0$$

which is equivalent to:

$$(W^{-1} B - \lambda I)\, b = 0$$

Here b are the eigenvectors of $W^{-1}B$ (the discriminant weights), and $\lambda$ are the eigenvalues of $W^{-1}B$, ordered from largest to smallest. Because $W^{-1}B$ is nonsymmetric, the eigenvectors are uncorrelated but will not be orthogonal.

The relative weight of discriminating function $\ell$ can be expressed as the ratio of its associated eigenvalue to the sum of the eigenvalues over all r discriminating functions:

$$RW_\ell = \frac{\lambda_\ell}{\sum_{l=1}^{r} \lambda_l}$$

This indicates which axis captures the most variation. The cosine of the angle between two discriminating functions, u and v, can be found from the inner product of the two eigenvectors:

$$\cos(\theta_{uv}) = b_u^T b_v$$
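
Continuing the sketch, the eigenproblem can be solved in its generalized form $Bb = \lambda Wb$ without explicitly inverting W; the relative weights and the angle between functions follow directly (the cosine formula assumes unit-length eigenvectors, so the sketch normalizes explicitly):

    import numpy as np
    from scipy.linalg import eig

    # Solve B b = lambda W b, the generalized form of the W^{-1}B eigenproblem.
    evals, evecs = eig(B, W)
    order = np.argsort(evals.real)[::-1]     # largest eigenvalue first
    evals = evals.real[order]
    evecs = evecs.real[:, order]

    r = min(X.shape[1], len(np.unique(groups)) - 1)  # at most min(p, k-1) functions
    RW = evals[:r] / evals[:r].sum()                 # relative weight of each axis

    # Cosine of the angle between the first two discriminating functions
    b1, b2 = evecs[:, 0], evecs[:, 1]
    cos_theta = (b1 @ b2) / (np.linalg.norm(b1) * np.linalg.norm(b2))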

Assumptions:

1. Multivariate normality of the predictor variables (all continuous and normally distributed).

2. Homogeneity of the variance-covariance matrix over all k groups.

Discriminant analysis is not robust to violations of these assumptions. If they are not met, the resulting tests of significance will not be reliable (the package could report a "p value" of 0.001 when the real value is 0.30). In that case discriminant analysis can still be used as a descriptive tool, but it cannot be used to test hypotheses about the discriminant functions. If #1 holds but #2 does not, a quadratic discriminating function can be used instead of a linear discriminating function. If #1 does not hold, the estimates of the misclassification error rates may also be biased.

Misclassification Error Rates

Fisher's linear discriminating functions (one function for each of the k groups) can be used to predict group membership, based on an observation of the predictor variables. The value of each of the k Fisher's linear discriminating functions is determined using the values of the predictor variables; the highest value indicates the group membership.

Alternatively, the vector of discriminant scores (z) could be found using the r linear discriminating functions (r less than k), by inputting the set of values for the predictor variables. Then, the vector of average discriminant scores, using the average values of the predictor variables in the learning data set, could be found for each group. For each group m, using its vector of average scores ($\bar{z}_m$), the Mahalanobis distance would then be calculated:

$$D_m^2 = (z - \bar{z}_m)^T C^{-1} (z - \bar{z}_m)$$

where C is the covariance matrix for the X variables. The new data point represented by the vector z is then predicted to belong to the group with the lowest Mahalanobis distance.

Based on the prediction models obtained using the learning data set, the number of incorrectly classified observations (the misclassification error rate) can be determined by applying Fisher's linear discriminating functions to the learning set data. This error rate will be underestimated, since these data were used to establish the prediction models.
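
A minimal sketch of the Mahalanobis-distance assignment rule just described, with illustrative centroids and covariance matrix:

    import numpy as np

    def assign_group(z, z_bars, C):
        """Assign score vector z to the group with the smallest D^2."""
        C_inv = np.linalg.inv(C)
        d2 = [(z - zm) @ C_inv @ (z - zm) for zm in z_bars]
        return int(np.argmin(d2))    # index of the predicted group

    # Illustrative use: three group centroids in a 2-D discriminant-score space
    z_bars = [np.array([0.0, 0.0]), np.array([2.0, 1.0]), np.array([-1.0, 2.0])]
    C = np.eye(2)                    # covariance matrix (identity for illustration)
    print(assign_group(np.array([1.8, 0.9]), z_bars, C))   # -> 1 (second group)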

There are several alternatives for estimating the misclassification error rate:

1. Calculate the error rate using a new data set.

2. Split the original data set into two subsets. One part of the data is used to fit the discriminating functions, and the other is used to calculate the misclassification error rate.

3. Cross-validation. The process for cross-validation is to: 1) fit the discriminating functions using all but one of the observations in the data set; 2) calculate the error rate for the reserved observation; 3) repeat steps 1 and 2 with a different reserved observation, until each observation has been held out once (i.e., fit the discriminating functions n times).

4. n-way validation. A modification of cross-validation is to divide the data into groups. Discriminant analysis is performed using all but one of the groups of data, and the reserved group is then used to test the functions. This is repeated, reserving a different group each time, and the average error rate is then calculated.

Considerations in data splitting include: 1) random split? 2) random split by group? 3) is enough data left to obtain a reasonable discriminating model?

Which centroids differ?

For 2 groups, the difference between group centroids can be tested ($H_0: \mu_1 = \mu_2$). The Mahalanobis distance is defined as:

$$D^2 = (\bar{z}_1 - \bar{z}_2)^T C^{-1} (\bar{z}_1 - \bar{z}_2)$$

A transformation of this distance can be used to test for differences between two group centroids:

$$F = \frac{n_1 n_2}{n_1 + n_2} \cdot \frac{n_1 + n_2 - p - 1}{(n_1 + n_2 - 2)\, p}\, D^2$$

Under the null hypothesis that the two group centroids are the same, this is distributed as an F distribution, evaluated at the $1 - \alpha$ percentile, with p and $n_1 + n_2 - p - 1$ degrees of freedom. For more than two groups, this two-group test is often performed for every pair of groups; however, the $1 - \frac{\alpha}{\text{no. pairs}}$ percentile should then be used instead of the $1 - \alpha$ percentile.
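
A sketch of this two-group centroid test; the numbers passed in are illustrative only, and scipy supplies the F distribution:

    from scipy.stats import f

    def two_group_centroid_test(D2, n1, n2, p):
        """F-test of H0: mu_1 = mu_2, given a Mahalanobis distance D^2."""
        F_stat = (n1 * n2) / (n1 + n2) \
                 * (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * D2
        df1, df2 = p, n1 + n2 - p - 1
        p_value = f.sf(F_stat, df1, df2)   # P(F > F_stat) under H0
        return F_stat, p_value

    # Illustrative values only:
    F_stat, p_value = two_group_centroid_test(D2=2.5, n1=30, n2=25, p=4)
    print(F_stat, p_value)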

An alternative test, related to this one, is Hotelling's T-squared test:

$$T^2 = \frac{n_1 n_2}{n_1 + n_2}\, D^2$$

The test statistic is then calculated as:

$$\frac{n_1 + n_2 - p - 1}{(n_1 + n_2 - 2)\, p}\, T^2$$

which is distributed as the F distribution at the $1 - \alpha$ percentile, with p and $n_1 + n_2 - p - 1$ degrees of freedom. Again, this can be used for more than two groups by testing every pair of groups, but using the $1 - \frac{\alpha}{\text{no. pairs}}$ percentile instead of the $1 - \alpha$ percentile.

Both of these tests assume a multivariate normal distribution of the data, and that the covariance matrix is the same for the two groups.

Which linear composites (discriminant functions) should be retained?

1. Significance of the eigenvalues as a group (all r discriminant functions): Calculate

$$V = \left[ (n - 1) - \tfrac{1}{2}(p + k) \right] \sum_{l=1}^{r} \ln(1 + \lambda_l)$$

where n, p, and k are as defined above. Compare V to the Chi-square distribution with p(k-1) degrees of freedom at the $1 - \alpha$ percentile. If V is greater than the critical value, the discriminant functions as a group are significant.

2. Significance of function $\ell$: Calculate

$$V_\ell = \left[ (n - 1) - \tfrac{1}{2}(p + k) \right] \ln(1 + \lambda_\ell)$$

Compare $V_\ell$ to the Chi-square distribution with $p + k - 2\ell$ degrees of freedom. If $V_\ell$ is greater than the critical value, discriminant function $\ell$ is significant.
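
A sketch of both retention tests, reusing the eigenvalues (evals) and the illustrative n = 37, p = 4, k = 3 from the earlier sketches:

    import numpy as np
    from scipy.stats import chi2

    def retention_tests(evals, n, p, k, alpha=0.05):
        """Group test V (all r functions) plus per-function tests V_ell."""
        r = min(p, k - 1)
        const = (n - 1) - 0.5 * (p + k)
        V = const * np.sum(np.log(1 + evals[:r]))
        V_crit = chi2.ppf(1 - alpha, df=p * (k - 1))
        results = [(V, V_crit, V > V_crit)]
        for ell in range(1, r + 1):          # test each function ell in turn
            V_l = const * np.log(1 + evals[ell - 1])
            V_l_crit = chi2.ppf(1 - alpha, df=p + k - 2 * ell)
            results.append((V_l, V_l_crit, V_l > V_l_crit))
        return results

    for stat, crit, keep in retention_tests(evals, n=37, p=4, k=3):
        print(f"V = {stat:.2f}, critical = {crit:.2f}, significant: {keep}")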

Which X Variables are most important to the Discriminant Scores?

1. Discriminant weights: the problems with these are that they are affected by the scale of the variables and by the relationships among the variables (dependence of the predictor variables).

2. Discriminant loadings: these give the simple correlation coefficient of each variable with the discriminant scores.

Calculation of discriminant loadings: Let $C^{1/2}$ be the diagonal matrix whose elements are the square roots of the diagonal elements of the variance-covariance matrix of the predictor variables (the standard deviations), and let R be the correlation matrix for X (pairwise correlations between the original variables). Then:

1. $b_\ell^* = C^{1/2} b_\ell$ gives the scaled weights;

2. $s_\ell = R\, b_\ell^*$ gives the correlations between each X variable and the $\ell$-th discriminating function (the discriminant scores),

where the $s_\ell$ are the vectors of discriminant loadings for the $\ell$-th discriminant function.
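
A sketch of this two-step loadings calculation, reusing X and evecs from the earlier sketches; note that reading $s_\ell$ as exact correlations assumes the weights are normalized so the discriminant scores have unit variance:

    import numpy as np

    # Scaled weights and loadings for the first discriminating function.
    C_half = np.diag(X.std(axis=0, ddof=1))   # C^{1/2}: diagonal of std deviations
    R = np.corrcoef(X, rowvar=False)          # R: p x p correlation matrix for X

    b = evecs[:, 0]                           # raw discriminant weights b_1
    b_star = C_half @ b                       # scaled weights b*_1 = C^{1/2} b_1
    s = R @ b_star                            # loadings s_1 = R b*_1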

Tools for Interpretation:

1. Plot the group centroids: one centroid for each group, for each discriminant function (Figure 5).

2. Plot the group overlaps (Figure 6).

3. It is also possible to alter the prior probabilities (equal, sample-based, or other).

4. Stepwise discriminant analysis is possible, but it is based on the multivariate normal distribution function.