Correspondence Analysis & Related Methods


Michael Greenacre

SESSION 9: (SIMPLE) CORRESPONDENCE ANALYSIS: basic geometric concepts

Overview of CA and basic geometric concepts

312 respondents, all readers of a certain newspaper, cross-tabulated according to their education group and level of reading of the newspaper.

E1: some primary; E2: primary completed; E3: some secondary; E4: secondary completed; E5: some tertiary
C1: glance; C2: fairly thorough; C3: very thorough

[Figure: CA map of the readership data showing E1 to E5 and C1 to C3; principal inertias 0.0704 (84.52 %) on axis 1 and 0.0129 (15.48 %) on axis 2.]

Profile

A profile is a set of relative frequencies, that is, a set of frequencies expressed relative to their total (often in percentage form). Each row or each column of a table of frequencies defines a different profile. It is these profiles which CA visualises as points in a map.

[Figure: row profiles viewed in 3-d.]

Original data

         C1    C2    C3   Total
E1        5     7     2      14
E2       18    46    20      84
E3       19    29    39      87
E4       12    40    49     101
E5        3     7    16      26
Total    57   129   126     312

Row profiles

         C1    C2    C3
E1      .36   .50   .14
E2      .21   .55   .24
E3      .22   .33   .45
E4      .12   .40   .49
E5      .12   .27   .62

Column profiles

         C1    C2    C3
E1      .09   .05   .02
E2      .32   .36   .16
E3      .33   .22   .31
E4      .21   .31   .39
E5      .05   .05   .13
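The definition of a profile can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the variable names are illustrative and not part of the slides.

```python
import numpy as np

# Readership table: rows E1..E5 (education group), columns C1..C3 (reading level).
N = np.array([
    [5, 7, 2],
    [18, 46, 20],
    [19, 29, 39],
    [12, 40, 49],
    [3, 7, 16],
], dtype=float)

n = N.sum()                  # grand total of respondents
row_totals = N.sum(axis=1)
col_totals = N.sum(axis=0)

# A profile is a row (or column) of frequencies divided by its own total.
row_profiles = N / row_totals[:, None]
col_profiles = N / col_totals[None, :]

print(np.round(row_profiles, 2))
print(np.round(col_profiles, 2))
```

Each row profile sums to 1 across the columns, and each column profile sums to 1 down the rows.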

Plotting profiles in profile space (triangular coordinates)

[Figure: profile E1 = (0.36, 0.50, 0.14) plotted in triangular coordinates.]

Weighted average (centroid)

The average is the point at which the two points are balanced. The situation is identical for multidimensional points.

[Figure: average and weighted average of two points.]

Plotting profiles in profile space (barycentric or weighted average principle)

[Figure: profiles E1 = (0.36, 0.50, 0.14) and E2 = (0.21, 0.55, 0.24) located as weighted averages of the vertices.]
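The barycentric principle can be sketched in a few lines. The triangle corner positions below are an assumption made for illustration (any triangle would serve), and `plot_position` is a hypothetical helper name.

```python
import numpy as np

# Assumed 2-d positions of the three vertex profiles C1, C2, C3
# (corners of an equilateral triangle, chosen only for illustration).
vertices = np.array([
    [0.0, 0.0],
    [1.0, 0.0],
    [0.5, np.sqrt(3) / 2],
])

e1 = np.array([5, 7, 2]) / 14      # profile of E1
e2 = np.array([18, 46, 20]) / 84   # profile of E2

def plot_position(profile):
    """Weighted average of the vertices, using the profile elements as weights."""
    return profile @ vertices

p1 = plot_position(e1)
p2 = plot_position(e2)

# Pooling rows E1 and E2 gives a profile that sits at the balance point
# of p1 and p2, weighted by the row totals 14 and 84.
merged_profile = np.array([5 + 18, 7 + 46, 2 + 20]) / (14 + 84)
balance_point = (14 * p1 + 84 * p2) / (14 + 84)

print(plot_position(merged_profile), balance_point)
```

The two printed positions coincide, which is the weighted-average (centroid) idea applied to profiles.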

Plotting profiles in profile space (barycentric or weighted average principle)

[Figure: profile E5 = (0.12, 0.27, 0.62) located as a weighted average of the vertices.]

Masses of the profiles

Readership data, with row profiles in parentheses and row masses:

Education group            C1            C2            C3          Total   Mass
E1 Some primary             5 (0.357)     7 (0.500)     2 (0.143)     14   0.045
E2 Primary completed       18 (0.214)    46 (0.548)    20 (0.238)     84   0.269
E3 Some secondary          19 (0.218)    29 (0.333)    39 (0.448)     87   0.279
E4 Secondary completed     12 (0.119)    40 (0.396)    49 (0.485)    101   0.324
E5 Some tertiary            3 (0.115)     7 (0.269)    16 (0.615)     26   0.083
Total (average profile)    57 (0.183)   129 (0.413)   126 (0.404)    312

C1: glance; C2: fairly thorough; C3: very thorough

Calculating chi-square

Observed and expected frequencies for row E5 (Some tertiary):

                       C1           C2           C3          Total   Mass
Observed frequency     3 (0.115)    7 (0.269)   16 (0.615)     26    0.083
Expected frequency     4.76        10.74        10.50
Average profile           (0.183)     (0.413)      (0.404)

For example, the expected frequency of (E5, C1) is 0.183 x 26 = 4.76.

$$\chi^2 = \text{similar terms} \ldots + \frac{(3 - 4.76)^2}{4.76} + \frac{(7 - 10.74)^2}{10.74} + \frac{(16 - 10.50)^2}{10.50} = 26.0$$
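The masses, the expected frequencies and the chi-square statistic can be reproduced as follows. This is a minimal sketch assuming NumPy; it uses unrounded expected frequencies, so individual values differ slightly from the rounded slide figures (for example 4.75 rather than 4.76).

```python
import numpy as np

N = np.array([
    [5, 7, 2],
    [18, 46, 20],
    [19, 29, 39],
    [12, 40, 49],
    [3, 7, 16],
], dtype=float)

n = N.sum()
row_totals = N.sum(axis=1)
col_totals = N.sum(axis=0)

masses = row_totals / n              # row masses
average_profile = col_totals / n     # average row profile
row_profiles = N / row_totals[:, None]

# The average profile is the centroid of the row profiles, weighted by mass.
centroid = masses @ row_profiles

# Expected frequency = row total x average profile element.
expected = np.outer(row_totals, average_profile)

chi2 = ((N - expected) ** 2 / expected).sum()
print(np.round(masses, 3), round(float(chi2), 1))
```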

Calculating chi-square

$$\chi^2 = \text{similar terms} \ldots + 26\left[\frac{(3/26 - 4.76/26)^2}{4.76/26} + \frac{(7/26 - 10.74/26)^2}{10.74/26} + \frac{(16/26 - 10.50/26)^2}{10.50/26}\right]$$

$$\chi^2/312 = \text{similar terms} \ldots + 0.083\left[\frac{(0.115 - 0.183)^2}{0.183} + \frac{(0.269 - 0.413)^2}{0.413} + \frac{(0.615 - 0.404)^2}{0.404}\right]$$

Calculating inertia

$$\text{Inertia} = \chi^2/312 = \text{similar terms for first four rows} \ldots + 0.083\left[\frac{(0.115 - 0.183)^2}{0.183} + \frac{(0.269 - 0.413)^2}{0.413} + \frac{(0.615 - 0.404)^2}{0.404}\right]$$

In the last term, 0.083 is the mass of row E5 and the bracket is the squared chi-square distance between the profile of E5 and the average profile. In general:

$$\text{Inertia} = \sum \text{mass} \times (\text{chi-square distance})^2$$

The numerators alone would give an ordinary EUCLIDEAN squared distance; dividing each term by the corresponding average profile element makes the distance WEIGHTED.

How can we see chi-square distances?

$$\left(\frac{0.115}{\sqrt{0.183}} - \frac{0.183}{\sqrt{0.183}}\right)^2 + \left(\frac{0.269}{\sqrt{0.413}} - \frac{0.413}{\sqrt{0.413}}\right)^2 + \left(\frac{0.615}{\sqrt{0.404}} - \frac{0.404}{\sqrt{0.404}}\right)^2$$

Written this way, the chi-square distance is a Pythagorean, ordinary Euclidean distance between rescaled profiles. So the answer is to divide all profile elements by the square root of their averages.

[Figure: "stretched" row profiles viewed in 3-d chi-squared space.]
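The two views of the chi-square distance (weighted Euclidean distance on the profiles, ordinary Euclidean distance on the "stretched" profiles) can be compared numerically. A minimal sketch, assuming NumPy; variable names are illustrative.

```python
import numpy as np

N = np.array([
    [5, 7, 2],
    [18, 46, 20],
    [19, 29, 39],
    [12, 40, 49],
    [3, 7, 16],
], dtype=float)

n = N.sum()
masses = N.sum(axis=1) / n
average = N.sum(axis=0) / n
profiles = N / N.sum(axis=1, keepdims=True)

# Squared chi-square distance of each row profile to the average profile:
# a Euclidean-type distance with each term divided by the average element.
d2_weighted = ((profiles - average) ** 2 / average).sum(axis=1)

# Same quantity as an ordinary squared Euclidean distance after dividing
# every profile element by the square root of its average.
stretched = profiles / np.sqrt(average)
stretched_average = average / np.sqrt(average)
d2_euclidean = ((stretched - stretched_average) ** 2).sum(axis=1)

# Inertia = sum over rows of mass x squared chi-square distance.
inertia = (masses * d2_weighted).sum()
print(np.round(d2_weighted, 4), round(float(inertia), 4))
```

Multiplying the inertia by the grand total recovers the chi-square statistic of the table.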

Summary: Basic geometric concepts

Profiles are rows or columns of relative frequencies, that is, the rows or columns expressed relative to their respective marginals, or bases.
Each profile has a weight assigned to it, called the mass, which is proportional to the original marginal frequency used as a base.
The average profile is the centroid (weighted average) of the profiles.
Vertex profiles are the extreme profiles in the profile space ("simplex").
Profiles are weighted averages of the vertices, using the profile elements as weights.
The dimensionality of an I x J matrix = min{I - 1, J - 1}.
The chi-square distance measures the difference between profiles, using a Euclidean-type function which standardizes each profile element by dividing by the square root of its expected value.
The (total) inertia can be expressed as the weighted average of the squared chi-square distances between the profiles and their average.

The famous smoking data: row problem

(See "Correspondence Analysis in Practice".) An artificial example designed to illustrate two-dimensional maps: 193 employees of a firm, 5 categories of staff group, 4 categories of smoking (none, light, medium, heavy).

                    no   li   me   hv
Senior managers      4    2    3    2
Junior managers      4    3    7    4
Senior employees    25   10   12    4
Junior employees    18   24   33   13
Secretaries         10    6    7    2

Row profiles

                    none   light   medium   heavy
Senior managers      .36    .18     .27      .18
Junior managers      .22    .17     .39      .22
Senior employees     .49    .20     .24      .08
Junior employees     .20    .27     .38      .15
Secretaries          .40    .24     .28      .08
ave                  .32    .23     .32      .13

[Figure: view of row profiles in 3-d.]

The famous smoking data: column problem

It seems like the column profiles, with 5 elements, are 4-dimensional, BUT there are only 4 points and 4 points lie exactly in 3 dimensions. So the dimensionality of the columns is the same as the rows.

Column profiles

                    no    li    me    hv    ave
Senior managers    .07   .04   .05   .08   .06
Junior managers    .07   .07   .11   .16   .09
Senior employees   .41   .22   .19   .16   .26
Junior employees   .30   .53   .53   .52   .46
Secretaries        .16   .13   .11   .08   .13
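The claim that rows and columns share the same dimensionality, min{I - 1, J - 1} = 3 for the smoking data, can be checked by computing the rank of the centred profile matrices. A minimal sketch, assuming NumPy; the explicit tolerance of 1e-10 is an arbitrary choice made here to separate true zeros from rounding error.

```python
import numpy as np

# Smoking data: 5 staff groups (rows) x 4 smoking categories (columns).
N = np.array([
    [4, 2, 3, 2],
    [4, 3, 7, 4],
    [25, 10, 12, 4],
    [18, 24, 33, 13],
    [10, 6, 7, 2],
], dtype=float)

n = N.sum()
row_profiles = N / N.sum(axis=1, keepdims=True)        # 5 points with 4 elements
col_profiles = (N / N.sum(axis=0, keepdims=True)).T    # 4 points with 5 elements

# Centre each set of profiles at its average profile.
row_centred = row_profiles - N.sum(axis=0) / n
col_centred = col_profiles - N.sum(axis=1) / n

rank_rows = np.linalg.matrix_rank(row_centred, tol=1e-10)
rank_cols = np.linalg.matrix_rank(col_centred, tol=1e-10)
print(rank_rows, rank_cols)
```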

[Figure: view of column profiles in 3-d.]

[Figure: view of both profiles and vertices in 3-d.]

SESSION 10: (SIMPLE) CORRESPONDENCE ANALYSIS: SVD theory

What CA does

CA centres the row and column profiles with respect to their average profiles, so that the origin represents the average.
It re-defines the dimensions of the space in an ordered way: the first dimension explains the maximum amount of inertia possible in one dimension; the second adds the maximum amount to the first (hence the first two explain the maximum amount in two dimensions), and so on until all dimensions are explained.
It decomposes the total inertia along the principal axes into principal inertias, usually expressed as % of the total.
So if we want a low-dimensional version, we just take the first (principal) dimensions.
The row and column problem solutions are closely related, one can be obtained from the other; there are simple scaling factors along each dimension relating the two problems.

Generalized SVD (repeat)

We often want to associate weights on the rows and columns, so that the fit is by weighted least squares, not ordinary least squares, that is, we want to minimize

$$\mathrm{RSS} = \sum_{i=1}^{n} \sum_{j=1}^{p} r_i c_j \,(x_{ij} - x^*_{ij})^2$$

$$D_r^{1/2} X D_c^{1/2} = U D_\alpha V^T \quad \text{where } U^T U = V^T V = I, \quad \alpha_1 \ge \alpha_2 \ge \cdots \ge 0$$

$$X = D_r^{-1/2} U D_\alpha (D_c^{-1/2} V)^T$$

and $X^*$ is formed in the same way from the leading singular values and vectors only.

Weighted metric multidimensional scaling (repeat)

Suppose we want to represent the (centred) rows of a matrix $Y$, weighted by the (positive) elements $q_i$ down the diagonal of matrix $D_q$, where distance between rows is in the (weighted) metric defined by matrix $D_m^{-1}$.

$$\text{inertia} = \sum_i \sum_j q_i \,(1/m_j)\, y_{ij}^2$$

$$S = D_q^{1/2}\, Y\, D_m^{-1/2} = U D_\alpha V^T \quad \text{where } U^T U = V^T V = I$$

Principal coordinates of rows: $F = D_q^{-1/2} U D_\alpha$
Principal axes of the rows: $D_m^{1/2} V$
Standard coordinates of columns: $G = D_m^{-1/2} V$
Variances (inertias) explained: $\lambda_1 = \alpha_1^2$, $\lambda_2 = \alpha_2^2$, ...
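The generalized SVD can be sketched with an ordinary SVD of the rescaled matrix. The code below is an illustration under stated assumptions: the function names `generalized_svd` and `weighted_low_rank` are hypothetical, and the matrix and weights are random test data rather than anything from the slides.

```python
import numpy as np

def generalized_svd(X, r, c):
    """Ordinary SVD of D_r^{1/2} X D_c^{1/2}; returns U, alpha, V."""
    S = np.sqrt(r)[:, None] * X * np.sqrt(c)[None, :]
    U, alpha, Vt = np.linalg.svd(S, full_matrices=False)
    return U, alpha, Vt.T

def weighted_low_rank(X, r, c, k):
    """Rank-k approximation X* built from the first k generalized SVD terms."""
    U, alpha, V = generalized_svd(X, r, c)
    left = U[:, :k] / np.sqrt(r)[:, None]    # D_r^{-1/2} U
    right = V[:, :k] / np.sqrt(c)[:, None]   # D_c^{-1/2} V
    return (left * alpha[:k]) @ right.T

# Random test data (assumed, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
r = rng.uniform(0.5, 2.0, size=6)   # positive row weights
c = rng.uniform(0.5, 2.0, size=4)   # positive column weights

X_full = weighted_low_rank(X, r, c, 4)   # all terms: reproduces X
X_2 = weighted_low_rank(X, r, c, 2)      # two-dimensional approximation

# Weighted residual sum of squares of the rank-2 fit.
rss = (r[:, None] * c[None, :] * (X - X_2) ** 2).sum()
_, alpha, _ = generalized_svd(X, r, c)
print(rss, (alpha[2:] ** 2).sum())
```

The two printed numbers agree: the weighted RSS of the rank-k fit equals the sum of the squared singular values that were left out.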

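Correspondence analysis

Of the rows:
$Y$ is the centred matrix of row profiles.
Row masses in $D_q$ are the relative frequencies of the rows.
Column weights in $D_w$ (the metric, written $D_m^{-1}$ above) are the inverses of the relative frequencies of the columns.
inertia = $\chi^2/n$

Of the columns:
$Y$ is the centred matrix of column profiles.
Column masses in $D_q$ are the relative frequencies of the columns.
Row weights in $D_w$ are the inverses of the relative frequencies of the rows.
inertia = $\chi^2/n$

Both problems lead to the SVD of the same matrix $S$.

Correspondence analysis

Table of nonnegative data $N$.
Divide $N$ by its grand total $n$ to obtain the so-called correspondence matrix $P = (1/n) N$.
Let the row and column marginal totals of $P$ be the vectors $r$ and $c$ respectively, that is, the vectors of row and column masses, and let $D_r$ and $D_c$ be the diagonal matrices of these masses.

$$S = D_r^{-1/2} (P - r c^T) D_c^{-1/2}, \qquad s_{ij} = \frac{p_{ij} - r_i c_j}{\sqrt{r_i c_j}}$$

or equivalently

$$S = D_r^{1/2} (D_r^{-1} P D_c^{-1} - 1 1^T) D_c^{1/2}, \qquad s_{ij} = \sqrt{r_i c_j}\left(\frac{p_{ij}}{r_i c_j} - 1\right)$$

Principal coordinates (to be derived algebraically in class):

$$F = D_r^{-1/2} U D_\alpha \qquad G = D_c^{-1/2} V D_\alpha$$

Standard coordinates:

$$\Phi = D_r^{-1/2} U \qquad \Gamma = D_c^{-1/2} V$$

Decomposition of total inertia along principal axes
Duality (symmetry) of the rows and columns

                  I rows (smoking I=5)          J columns (smoking J=4)
Total inertia     in(I)  0.08519                in(J)  0.08519
Inertia axis 1    λ1  0.07476 (87.8%)           λ1  0.07476
Inertia axis 2    λ2  0.01002 (11.8%)           λ2  0.01002
Inertia axis 3    λ3  0.00041 ( 0.5%)           λ3  0.00041

Smoking data with marginal sums

                    no   li   me   hv   sum
Senior managers      4    2    3    2    11
Junior managers      4    3    7    4    18
Senior employees    25   10   12    4    51
Junior employees    18   24   33   13    88
Secretaries         10    6    7    2    25
sum                 61   45   62   25   193

Row profiles and row masses

                    no    li    me    hv   masses
Senior managers    .36   .18   .27   .18    .06
Junior managers    .22   .17   .39   .22    .09
Senior employees   .49   .20   .24   .08    .26
Junior employees   .20   .27   .38   .15    .46
Secretaries        .40   .24   .28   .08    .13
ave                .32   .23   .32   .13

Column profiles and column masses

                    no    li    me    hv    ave
Senior managers    .07   .04   .05   .08   .06
Junior managers    .07   .07   .11   .16   .09
Senior employees   .41   .22   .19   .16   .26
Junior employees   .30   .53   .53   .52   .46
Secretaries        .16   .13   .11   .08   .13
masses             .32   .23   .32   .13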
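The steps above (correspondence matrix, masses, standardized residuals S, SVD) can be sketched for the smoking data as follows. This is a minimal illustration assuming NumPy, not a full CA implementation; variable names are illustrative.

```python
import numpy as np

# Smoking data: 5 staff groups x 4 smoking categories.
N = np.array([
    [4, 2, 3, 2],
    [4, 3, 7, 4],
    [25, 10, 12, 4],
    [18, 24, 33, 13],
    [10, 6, 7, 2],
], dtype=float)

P = N / N.sum()          # correspondence matrix
r = P.sum(axis=1)        # row masses
c = P.sum(axis=0)        # column masses

# S = D_r^{-1/2} (P - r c^T) D_c^{-1/2}, written elementwise.
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

U, alpha, Vt = np.linalg.svd(S, full_matrices=False)

principal_inertias = alpha ** 2
total_inertia = principal_inertias.sum()
percentages = 100 * principal_inertias / total_inertia

print(np.round(principal_inertias, 5))
print(np.round(percentages, 1))
```

The fourth singular value is zero up to rounding error, reflecting the dimensionality min{I - 1, J - 1} = 3.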

Relationship between row and column solutions

                          rows                     columns
standard coordinates      Φ = [ φ_ik ]             Γ = [ γ_jk ]
principal coordinates     F = [ f_ik ]             G = [ g_jk ]

Relationships between coordinates:

$$F = \Phi D_\alpha \qquad G = \Gamma D_\alpha$$

$$f_{ik} = \alpha_k \varphi_{ik} \qquad g_{jk} = \alpha_k \gamma_{jk}$$

where $\alpha_k = \sqrt{\lambda_k}$ is the square root of the principal inertia on axis $k$:

principal = standard × $\alpha_k$
standard = principal / $\alpha_k$

For the smoking data, $\sqrt{\lambda_1} = \sqrt{0.07476} \approx 0.273$ and $\sqrt{\lambda_2} = \sqrt{0.01002} \approx 0.100$.

[Figure: vertex profiles in standard coordinates, data profiles in principal coordinates.]

[Figure: symmetric map using XLSTAT; axis 1 (87.8 %), axis 2 (11.8 %); points: Senior Managers, Junior Managers, Senior Employees, Junior Employees, Secretaries, none, light, medium, heavy.]

Summary: Relationship between row and column solutions

1. Same dimensionality (rank) = min{I - 1, J - 1}.
2. Same total inertia and same principal inertias $\lambda_1, \lambda_2, \ldots$ on each dimension (i.e., same decomposition of inertia along principal axes), hence same percentages of inertia on each dimension.
3. Same coordinate solutions, up to a scalar constant along each principal axis, which depends on the square root $\sqrt{\lambda_k} = \alpha_k$ of the principal inertia on each axis: principal = standard × $\sqrt{\lambda_k}$, standard = principal / $\sqrt{\lambda_k}$.
4. Asymmetric map: one set principal, other standard.
5. Symmetric map: both sets principal.
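The scaling relationships between standard and principal coordinates can be verified on the smoking data. A minimal sketch, assuming NumPy; only the first K = min{I - 1, J - 1} = 3 axes are kept, since the remaining axis has zero inertia.

```python
import numpy as np

N = np.array([
    [4, 2, 3, 2],
    [4, 3, 7, 4],
    [25, 10, 12, 4],
    [18, 24, 33, 13],
    [10, 6, 7, 2],
], dtype=float)

P = N / N.sum()
r = P.sum(axis=1)
c = P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, alpha, Vt = np.linalg.svd(S, full_matrices=False)

K = min(N.shape) - 1                 # number of non-trivial axes
U, alpha, V = U[:, :K], alpha[:K], Vt.T[:, :K]

Phi = U / np.sqrt(r)[:, None]        # row standard coordinates
Gamma = V / np.sqrt(c)[:, None]      # column standard coordinates
F = Phi * alpha                      # row principal coordinates
G = Gamma * alpha                    # column principal coordinates

# Weighted sums of squares: 1 for standard, lambda_k for principal coordinates.
print(np.round((r[:, None] * Phi ** 2).sum(axis=0), 5))
print(np.round((r[:, None] * F ** 2).sum(axis=0), 5))

# Asymmetric map: each row in principal coordinates is the weighted average
# of the column vertices (standard coordinates), weighted by its profile.
row_profiles = P / r[:, None]
F_from_vertices = row_profiles @ Gamma
```

In a symmetric map both `F` and `G` are plotted; in an asymmetric map of the rows, `F` is plotted together with `Gamma`.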