The Greatest Deviation Correlation Coefficient and its Geometrical Interpretation

Similar documents
Multistage Median Ranked Set Sampling for Estimating the Population Median

UNIT10 PLANE OF REGRESSION

Chapter Fifiteen. Surfaces Revisited

Physics 11b Lecture #2. Electric Field Electric Flux Gauss s Law

Correspondence Analysis & Related Methods

Thermodynamics of solids 4. Statistical thermodynamics and the 3 rd law. Kwangheon Park Kyung Hee University Department of Nuclear Engineering

Chapter I Matrices, Vectors, & Vector Calculus 1-1, 1-9, 1-10, 1-11, 1-17, 1-18, 1-25, 1-27, 1-36, 1-37, 1-41.

Integral Vector Operations and Related Theorems Applications in Mechanics and E&M

A. Thicknesses and Densities

The Correlation Coefficients

Generating Functions, Weighted and Non-Weighted Sums for Powers of Second-Order Recurrence Sequences

Learning the structure of Bayesian belief networks

P 365. r r r )...(1 365

PHY126 Summer Session I, 2008

Scalars and Vectors Scalar

Dynamics of Rigid Bodies

If there are k binding constraints at x then re-label these constraints so that they are the first k constraints.

Rigid Bodies: Equivalent Systems of Forces

Tian Zheng Department of Statistics Columbia University

Optimization Methods: Linear Programming- Revised Simplex Method. Module 3 Lecture Notes 5. Revised Simplex Method, Duality and Sensitivity analysis

Set of square-integrable function 2 L : function space F

iclicker Quiz a) True b) False Theoretical physics: the eternal quest for a missing minus sign and/or a factor of two. Which will be an issue today?

Engineering Mechanics. Force resultants, Torques, Scalar Products, Equivalent Force systems

A Brief Guide to Recognizing and Coping With Failures of the Classical Regression Assumptions

PHYS 705: Classical Mechanics. Derivation of Lagrange Equations from D Alembert s Principle

Chapter 23: Electric Potential

8 Baire Category Theorem and Uniform Boundedness

An Approach to Inverse Fuzzy Arithmetic

Theo K. Dijkstra. Faculty of Economics and Business, University of Groningen, Nettelbosje 2, 9747 AE Groningen THE NETHERLANDS

COMPLEMENTARY ENERGY METHOD FOR CURVED COMPOSITE BEAMS

Groupoid and Topological Quotient Group

CSJM University Class: B.Sc.-II Sub:Physics Paper-II Title: Electromagnetics Unit-1: Electrostatics Lecture: 1 to 4

PARAMETER ESTIMATION FOR TWO WEIBULL POPULATIONS UNDER JOINT TYPE II CENSORED SCHEME

q-bernstein polynomials and Bézier curves

N = N t ; t 0. N is the number of claims paid by the

Energy in Closed Systems

Capítulo. Three Dimensions

Pattern Analyses (EOF Analysis) Introduction Definition of EOFs Estimation of EOFs Inference Rotated EOFs

Physics 207 Lecture 16

LASER ABLATION ICP-MS: DATA REDUCTION

GENERALIZATION OF AN IDENTITY INVOLVING THE GENERALIZED FIBONACCI NUMBERS AND ITS APPLICATIONS

Khintchine-Type Inequalities and Their Applications in Optimization

Some Approximate Analytical Steady-State Solutions for Cylindrical Fin

24-2: Electric Potential Energy. 24-1: What is physics

On the Distribution of the Product and Ratio of Independent Central and Doubly Non-central Generalized Gamma Ratio random variables

4 Recursive Linear Predictor

Test 1 phy What mass of a material with density ρ is required to make a hollow spherical shell having inner radius r i and outer radius r o?

Part V: Velocity and Acceleration Analysis of Mechanisms

APPLICATIONS OF SEMIGENERALIZED -CLOSED SETS

Rotational Kinematics. Rigid Object about a Fixed Axis Western HS AP Physics 1

Machine Learning. Spectral Clustering. Lecture 23, April 14, Reading: Eric Xing 1

DYNAMICS VECTOR MECHANICS FOR ENGINEERS: Kinematics of Rigid Bodies in Three Dimensions. Seventh Edition CHAPTER

4 SingularValue Decomposition (SVD)

Summer Workshop on the Reaction Theory Exercise sheet 8. Classwork

MULTIPOLE FIELDS. Multipoles, 2 l poles. Monopoles, dipoles, quadrupoles, octupoles... Electric Dipole R 1 R 2. P(r,θ,φ) e r

19 The Born-Oppenheimer Approximation

x = , so that calculated

Bayesian Assessment of Availabilities and Unavailabilities of Multistate Monotone Systems

Review of Vector Algebra and Vector Calculus Operations

Chapter 13 - Universal Gravitation

V. Principles of Irreversible Thermodynamics. s = S - S 0 (7.3) s = = - g i, k. "Flux": = da i. "Force": = -Â g a ik k = X i. Â J i X i (7.

Online Appendix to Position Auctions with Budget-Constraints: Implications for Advertisers and Publishers

Distinct 8-QAM+ Perfect Arrays Fanxin Zeng 1, a, Zhenyu Zhang 2,1, b, Linjie Qian 1, c

A. Proofs for learning guarantees

EE 5337 Computational Electromagnetics (CEM)

Contact, information, consultations

STAT 3008 Applied Regression Analysis

THE SUMMATION NOTATION Ʃ

Optimal System for Warm Standby Components in the Presence of Standby Switching Failures, Two Types of Failures and General Repair Time

A Bijective Approach to the Permutational Power of a Priority Queue

3. A Review of Some Existing AW (BT, CT) Algorithms

Linear Regression Analysis: Terminology and Notation

APPENDIX A Some Linear Algebra

Study on Vibration Response Reduction of Bladed Disk by Use of Asymmetric Vane Spacing (Study on Response Reduction of Mistuned Bladed Disk)

A. P. Sakis Meliopoulos Power System Modeling, Analysis and Control. Chapter 7 3 Operating State Estimation 3

Efficiency of the principal component Liu-type estimator in logistic

VISUALIZATION OF THE ABSTRACT THEORIES IN DSP COURSE BASED ON CDIO CONCEPT

Backward Haplotype Transmission Association (BHTA) Algorithm. Tian Zheng Department of Statistics Columbia University. February 5 th, 2002

Graphs of Sine and Cosine Functions

Chapter 3 Vector Integral Calculus

Remember: When an object falls due to gravity its potential energy decreases.

RECAPITULATION & CONDITIONAL PROBABILITY. Number of favourable events n E Total number of elementary events n S

4.4 Continuum Thermomechanics

Supersymmetry in Disorder and Chaos (Random matrices, physics of compound nuclei, mathematics of random processes)

Chapter 9: Statistical Inference and the Relationship between Two Variables

Here is the rationale: If X and y have a strong positive relationship to one another, then ( x x) will tend to be positive when ( y y)

Physics 2A Chapter 11 - Universal Gravitation Fall 2017

Ranks of quotients, remainders and p-adic digits of matrices

Polynomial Regression Models

PHYS Week 5. Reading Journals today from tables. WebAssign due Wed nite

VParC: A Compression Scheme for Numeric Data in Column-Oriented Databases

Markscheme May 2017 Calculus Higher level Paper 3

COORDINATE SYSTEMS, COORDINATE TRANSFORMS, AND APPLICATIONS

ECONOMICS 351*-A Mid-Term Exam -- Fall Term 2000 Page 1 of 13 pages. QUEEN'S UNIVERSITY AT KINGSTON Department of Economics

Event Shape Update. T. Doyle S. Hanlon I. Skillicorn. A. Everett A. Savin. Event Shapes, A. Everett, U. Wisconsin ZEUS Meeting, October 15,

an application to HRQoL

A Micro-Doppler Modulation of Spin Projectile on CW Radar

Lab 10: Newton s Second Law in Rotation

3.1 Electrostatic Potential Energy and Potential Difference

Professor Wei Zhu. 1. Sampling from the Normal Population

Transcription:

By Rudy A. Gdeon The Unvesty of Montana The Geatest Devaton Coelaton Coeffcent and ts Geometcal Intepetaton The Geatest Devaton Coelaton Coeffcent (GDCC) was ntoduced by Gdeon and Hollste (987). The GDCC was shown to possess all the popetes of a ank based coelaton coeffcent, a new ted value pocedue was ntoduced, the null dstbuton fo ndependence was gven, and a populaton ntepetaton was gven. Ths pape expands the exact dstbuton up to sample szes 5, gves a moe ntutve defnton based on the gaph of the data, and demonstates the geometcal unqueness of the defnton by 9 otatons of the gaph. An analogous gaph s used to gve a geometc defnton of the populaton o theoetcal value of GDCC. Fou coelaton coeffcents, GDCC, Kendall s, Speaman s, and an absolute value ank coelaton coeffcent ae computed on the scatteplot of the anked data by smple countng methods. The sgnfcance of GDCC as a obust coelaton coeffcent s llustated and the use of the asymptotc dstbuton s demonstated.. The Mathematcal Defnton of GDCC Ths pape bngs togethe components of the GDCC n ode to make t a useful tool fo studyng the elatonshps between vaables. We stat wth the basc o mathematcal defnton. The method wll be llustated wth the YMCA data that appeaed n the 987 atcle. The data came as the anks ( won-lost ecod vesus Spotsmanshp) of 6 teams of 4 th and 5 th gades n a YMCA basketball league. The Spotsmanshp ank was a summay tally of a weekly evaluaton by team coaches and offcals. The data s epoduced hee n Table. TABLE : YMCA team ankng n Spotsmanshp and won-lost ecod p 4 6 2 2 3 7 9 3 8 5 6 4 5 2 3 4 5 6 7 8 9 2 3 4 5 6 The Won-Lost ank data s the lowe ow and can be consdeed the ndependent vaable whle the uppe ow s the Spotsmanshp ank and s taken as the dependent vaable. By vewng the gaph of ths data ( see Gaph) one mmedately notces that (4,2) and (3,5) ae unusual data ponts. We wte the data n the fom {, p },,2,3,..., 6 ; that s, numecally odeed by the won-lost ankngs. The geneal defnton of the GDCC, wth sample sze n, s max n I( n + p > ) max n 2 n I( p > ) ()

2 whee the denomnato s the geatest ntege functon evaluated at n/2, and I s the ndcato functon. The ndcato functon s s the expesson s tue and f false. Fo small sample szes ths s easy to compute by hand, and the layout s shown fo the YMCA data n Table 2. Thee columns ae necessay, {, p, n + p} whee the fst column of the ognal data must be n numecal ode. Fom these columns the elements n the summatons n the fomula above ae computed. Ths computatonal opeaton follows: Column conssts of values of,2,3 6 and column 2 s the coespondng values of p (Spotsmanshp ank). In ode to compute I( p > ), labeled I-sum-R n column 3, a smple pocedue s used. Fo ow ( ) smply count how many p-anks (the p column) thee ae n ows and above that ae geate than. Snce p 4, thee s only one, namely 4 tself, so the 3 d column enty s. Fo ow 2 ( 2), p 2 and both 4 and (p and p 2 ) exceed 2 and hence the coespondng column 3 value s 2. Contnue down the column. One last example: fo ow 7 ( 7) we examne the numbe of entes fom {p,p 2,...,p 7 } {4,,6,2,3} whch exceed 7. Thee ae fve. In the exact same manne, columns and 4 (the evese column) ae used to calculate the values n column 5 whose fomula s the left hand sde of fomula (). The maxmums of columns 3 and 5, 6 and 3, ae convenently put on the bottom of columns 3 and 5. Then max( I sum L) max( I sum R) 3 6 3. 6 8 8 2 TABLE 2: Calculaton of the Geatest Devaton Coelaton Coeffcent column column 2 column 3 column 4 column 5 p I-sum-R n+-p I-sum-L 4 3 2 2 6 2 3 6 3 4 2 3 5 2 5 2 4 5 2 6 3 5 4 7 7 5 2 8 9 6 8 2 9 6 7 2 3 5 4 2 8 4 9 2 2 3 6 3 3 5 3 2 3 4 6 2 2 5 4 3 6 5 2 maxmum 6 3

3 2. The Geometcal Defnton of GDCC We now make the tanston fom the fomula calculaton of to a gaphcal countng technque. The scatteplot of the YMCA data {, p },,2,3,..., 6 s gven n fgue. The lnes wth slopes ± have been added to facltate the countng. Fst, the patal sum I( p > ) s meely the count among the fst p -anks of those that ae stctly geate than ;.e., the data ponts n the scatteplot stctly above and to the left of the pont (,) on the + dagonal. Ponts on the ght hand sde of the defnng ectangle ae ncluded n the count; thus, the countng egon s closed on the ght but open on the bottom. A maxmum of 6 s acheved at 9, and the bodes of ths ectangula egon ae shown on the gaph. Note that the pont (8,9) on the lowe s not counted because of the stct nequalty wheeas the pont (9,) s counted. Thus, thee ae 6 ponts fo the maxmum. If (8,8) s used to defne a countng egon, the maxmum of 6 s also acheved. Second, the patal sum I( n + p > ) I( p < n + ) s the count among the fst p -anks of those that ae stctly less than n+- 7-;.e., those ponts that ae on the vetcal o to the left and stctly below the pont (, 7-) on the dagonal. A maxmum of 3 s acheved at 2 wth the egon contanng the 3 ponts shown n the fgue. The maxmum s also acheved at 3, as s easly seen n the fgue. It s clea that the two pats of the numeato of can be calculated by tavesng the ± dagonals wth movng ectangles and detemnng whch ectangula aeas contan the maxmal numbe of data ponts. The numbe of ponts n each ectangula egon,2,3,... 6 wll coespond exactly to the numbes n the I-sum-R and I- sum-l columns of Table 2. Geometcally, t may appea that some nfomaton s lost o a dffeent value of could be obtaned by tavesng the ± dagonals on the othe sdes. The next secton elates ths queston to the geneal popetes of of ths secton was used n obtanng the asymptotc dstbuton of 3. Geometcal Unquenes of. The geometcal featue. Refeence? The nfomaton n the scatteplot should eman the same f the gaph s otated 9 and s calculated ove on ths new oentaton. When the gaph s otated 9, let the axes be elabeled n the usual left to ght and vetcal manne and ecompute.

4 Ths 9 otaton wll be done thee consecutve tmes. A fouth otaton bngs back the ognal gaph. Let the notaton (, + ) denote the elatonshp of the countng ectangles to the ognal oentaton of the axes. Thus, (, + ) means lowe ectangle fo the dagonal and uppe ectangle fo + dagonal, the scheme n fgue. We now explan what happens to the data unde thee consecutve 9 otatons; the oentaton of the countng ectangles elatve to the ognal fgue, and the new value of. In Table 3 the fst ow s the data afte a otaton, the ow the oentaton of the countng ectangles elatve to the ognal gaph, the thd ow s the value of, and the fouth ow s what s effectvely beng calculated as elated to the ognal data, x, won lost anks, and y p, the anks of spotsmanshp. Table 3:Rotatonal Effects on Fgue ognal fst otaton second otaton thd otaton data {,p } {7-p,} {7-,7-p } {p,7-} dag sdes (, + ) (, + ) (, + ) (, + ) value -3/8 3/8-3/8 3/8 (.,.) (x,y) (x,-y) (-x,-y) (-x,y) Not only does mantan the same absolute value, but the countng ectangles whle tavesng the ± dagonals gve exactly the same sequences of numbes. Thus, the I- sum-r and I-sum-L columns of the otatons ae exactly the same as fo the ognal data. Howeve, they may be n evese ode. Ths connects the geomety to popety 4 of Secton 3 and fact (d) on page 654 of the [987] pape. The eade s nvted to pefom these calculatons by otatng the fgue, elabelng the axes and ecomputng. 4. Populaton Defnton of GDCC Ths secton epeats the populaton ntepetaton of n the 987 pape and ntepets t elatve to the geometcal deas of the sample defnton. Let ( X, Y ) be a contnuous bvaate andom vaable wth magnal dstbuton functons F and G, espectvely. Let ( U, V ) ( F ( X ), G( Y )), and defne the Copula functon, Nelson (999), P( U t, V < t) C( t, t) P( U t, V > t) C( t,) C( t, t). A gaph of the densty of ( U, V ) s gven n whch the bvaate nomal fom whch t came has means and standad devatons of 2 and 32, and covaance 3. The populaton value of s 2sup C( t, t) 2sup[ C( t, ) C( t, t)]. (2) < t< < t<

5 The ght hand sde of the populaton value of coesponds to the ght hand sde of the statstc, max I ( p > ). The left hand sde sup C( t, t) coesponds to max I ( n + p > ). The dagonal lnes n the ( U, V ) plane, slopes ±, ae used n exactly the same manne to move ectangles along and detemne the ectangle wth the lagest volume unde the densty of ( U, V ). Rathe than count ponts the populaton value seeks the ectangles wth maxmum volume unde the densty of ( U, V ). Fo the specfc bvaate nomal fo the fgue, the coelaton coeffcent s ρ 3 ( 2 32) 3 8. In fomula (2) the supemums, as explaned n the 987 pape occu at t/2 and C (, ) + sn 2 2 4 2π 3.25 +.679.379, and 8 3 C(,) C(, ) ( + sn ).25.679.8882. 2 2 2 2 4 2π 8 Thus 2(.379)-2(.8882).62235.37764.2447. Fo the bvaate Nomal, the populaton value of s smplfed to 2 sn π 2 3 ρ. So, dectly, sn. 2447. π 8 The volumes.38 and.888 ae the volumes of the ectangla egons along the dagonals emanatng fom the pont (/2,/2) unde the (U,V) densty. In the fgue of the (U,V) densty the volume.32 s the egon unde the ased pat of the densty, nea the(,) pont. The.888 quantty came fom the volume unde that pat of the densty doppng off towad zeo, nea the pont (,). 5. The Null Dstbuton of The GDCC The null dstbuton fo n the 987 pape was gven exactly up to sample sze n. Table 5 ncludes ths ognal table and expands to lst the pobablty dstbuton up to sample sze n 5. The symmety of the Null Dstbuton s used so that only pobabltes of postve outcomes and zeo ae gven. A ctcal value table was gven n the ognal pape fo hypothess testng fo alpha values of.,.5, and.. An ntepolaton tem was gven so that a andomzed test could be used to obtan almost exact alpha levels. Wth the exact dstbutons fo n to 5, the smulaton estmates can be eplaced by moe exact values. Fo example f n 5 and a two-sded test of ndependence s desed, then fom the ognal pape, to obtan a two-sded test wth α. 5, eect H o : ndependence, f 4 7 and eect wth pobablty.399 f 3 7. The old pobablty was.3664. Thus,

6 ( eect tue) P( 4 7) + (.399) P( 3 7) P H o.6474 + (.399).8839.5 Table 4: Null Dstbuton: Geatest Devaton Coelaton Coeffcent fequency of outcomes, symmetc about zeo sample sze outcome n3 4 5 6 7 8 9 4 6 6 256 2848 6 6372 4624 475496 /5 562932 666468 ¼ 772 2366 /3 96 5 2/5 4792 528736 ½ 3 5 248 8992 3/5 36672 73228 2/3 35 595 ¾ 399 6927 4/5 4623 879 TOTAL 3! 4! 5! 6! 7! 8! 9!!!

7 sample sze outcome n 2 3 8323892 449824256 /6 4629788 599388 /3 3223332 633876372 ½ 946768 2633376 2/3 727632 28456272 5/6 53823 94863 TOTAL 2! 3! sample sze outcome n 4 5 3683667456 335549456 /7 73656736 2692784448 2/7 364344 585248588 3/7 274856 57739685652 4/7 758459648 9645298832 5/7 2688 4989392 6/7 627263 967359 TOTAL 4! 5! Ths s ust beng saved fo now: Column s headed by, the won-lost ankng, Column 2 by p, the Spotmanshp anks, and column 4 by n+-p, called the evese anks. The computatons ae shown n columns 3 and 5 and ae the tems I( p > ), labeled I-sum-R, and the tems I( n + p > ), labeled I-sum-L, espectvely. The hand, eye, ban algothm fo ceatng column 3 fom columns and 2 s exactly the same fo column 5 usng columns and 4. Put fnges unde the of column and unde the coespondng postons on columns 3 and 5 and count the y anks above that ae stctly geate than, and ente the esult. Move down o ncease and epeat. Wth ust a lttle pactce, t s easy to pefom ths opeaton wthout eally thnkng of these fomulae. the maxmums used fo GDCC ae gven at the bottom, 6 and 3 fo ths data. Thus -3/8.