Towards a Localized Version of Pearson's Correlation Coefficient


University of Texas at El Paso, DigitalCommons@UTEP, Departmental Technical Reports (CS), Department of Computer Science. Technical Report UTEP-CS-13-46, July 2013. Published in Journal of Intelligent Technologies and Applied Statistics, 2013, Vol. 6, No. 3, pp. 215-224. Recommended citation: Kreinovich, Vladik; Nguyen, Hung T.; and Wu, Berlin, "Towards a Localized Version of Pearson's Correlation Coefficient" (2013). Departmental Technical Reports (CS), Paper 787. http://digitalcommons.utep.edu/cs_techrep/787

Towards a Localized Version of Pearson's Correlation Coefficient

Vladik Kreinovich 1, Hung T. Nguyen 2,3, and Berlin Wu 4

1 Department of Computer Science, University of Texas at El Paso, 500 W. University, El Paso, TX 79968, USA, vladik@utep.edu
2 Department of Mathematical Sciences, New Mexico State University, Las Cruces, New Mexico 88003, USA, hunguyen@nmsu.edu
3 Department of Economics, Chiang Mai University, Chiang Mai, Thailand
4 Department of Mathematical Sciences, National Chengchi University, Taipei 116, Taiwan, berlin@nccu.edu.tw

Abstract. Pearson's correlation coefficient is used to describe the dependence between random variables X and Y. In some practical situations, however, we have strong correlation for some values of X and/or Y and no correlation for other values of X and Y. To describe such local dependence, we come up with a natural localized version of Pearson's correlation coefficient. We also study the properties of the newly defined localized coefficient.

1 Formulation of the Problem

Pearson's correlation coefficient: reminder. To describe the relation between two random variables X and Y, Pearson's correlation coefficient r is often used. This coefficient is defined as

  r[X, Y] def= C[X, Y] / (σ[X] · σ[Y]),   (1)

where C[X, Y] def= E[(X − E[X]) · (Y − E[Y])] = E[X·Y] − E[X]·E[Y] is the covariance, E[X] is the mean, σ[X] def= √(V[X]) is the standard deviation, and V[X] def= E[(X − E[X])²] = E[X²] − (E[X])² is the variance.
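As a quick illustration, here is a minimal numerical sketch of formula (1) (our own example, not from the paper; the sample size and coefficients are arbitrary):

```python
# Pearson's r computed directly from definition (1), compared with numpy's
# built-in estimate; the synthetic data below is an arbitrary choice.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=10_000)
Y = 0.6 * X + 0.8 * rng.normal(size=10_000)       # dependent on X by construction

cov = np.mean((X - X.mean()) * (Y - Y.mean()))    # C[X, Y]
r = cov / (X.std() * Y.std())                     # formula (1)
print(r, np.corrcoef(X, Y)[0, 1])                 # the two computations agree
```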

Pearson's correlation coefficient ranges between −1 and 1. When the variables X and Y are independent, r[X, Y] = 0. When r[X, Y] = 1 or r[X, Y] = −1, Y is a linear function of X (i.e., informally, we have a perfect correlation).

Need for a local version of Pearson's correlation coefficient. Pearson's correlation coefficient provides a global description of the relation between the random variables X and Y. In some practical situations, there is a stronger correlation for some values of X and/or Y and a weaker correlation for other values of X and/or Y. To describe such local dependence, we need to come up with a local version of Pearson's correlation coefficient.

2 Towards a Definition of a Local Version of Pearson's Correlation Coefficient

Motivation for the new definition. For given random variables X and Y and for given real numbers x and y, we want to describe the dependence between the variables X and Y limited to a small neighborhood (x − ε, x + ε) × (y − δ, y + δ) of the point (x, y). This means, in effect, that instead of the pair of random variables (X, Y) corresponding to the original probability distribution, we consider a pair (X_{x±ε, y±δ}, Y_{x±ε, y±δ}) with the conditional probability distribution, under the condition that (X, Y) ∈ (x − ε, x + ε) × (y − δ, y + δ). This conditional probability distribution can be described in the usual way: for every measurable set S ⊆ ℝ², the corresponding probability is defined as

  Prob((X_{x±ε, y±δ}, Y_{x±ε, y±δ}) ∈ S) = Prob((X, Y) ∈ S ∩ (x − ε, x + ε) × (y − δ, y + δ)) / Prob((X, Y) ∈ (x − ε, x + ε) × (y − δ, y + δ)).

To describe the desired dependence, it is reasonable to consider the asymptotic behavior of the correlation between X_{x±ε, y±δ} and Y_{x±ε, y±δ} when ε → 0 and δ → 0.
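This conditional construction is easy to try empirically. The following sketch (ours, not from the paper) restricts a sample to the box (x − ε, x + ε) × (y − δ, y + δ) and computes the ordinary Pearson coefficient of the restricted sample; the two-cluster mixture used as test data is an arbitrary choice, meant to exhibit strong dependence near one point and independence near another:

```python
# Empirical correlation of (X, Y) conditioned on a small box around (x, y).
import numpy as np

def local_pearson(X, Y, x, y, eps, delta):
    mask = (np.abs(X - x) < eps) & (np.abs(Y - y) < delta)
    return np.corrcoef(X[mask], Y[mask])[0, 1]

# Mixture: strong dependence near (-3, -3), independence near (3, 3).
rng = np.random.default_rng(1)
n = 200_000
X1 = rng.normal(-3, 1, n)
Y1 = -3 + 0.9 * (X1 + 3) + np.sqrt(1 - 0.81) * rng.normal(size=n)
X2 = rng.normal(3, 1, n)
Y2 = rng.normal(3, 1, n)
X, Y = np.concatenate([X1, X2]), np.concatenate([Y1, Y2])

print(local_pearson(X, Y, -3, -3, 0.5, 0.5))   # clearly positive
print(local_pearson(X, Y,  3,  3, 0.5, 0.5))   # close to 0
```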

It turns out that for probability distributions with a twice continuously differentiable probability density function ρ(x, y), one can obtain an explicit expression for the Pearson's correlation coefficient r[X_{x±ε, y±δ}, Y_{x±ε, y±δ}] in terms of ρ(x, y):

Proposition 1. For probability distributions with a twice continuously differentiable probability density function ρ(x, y), we have

  r[X_{x±ε, y±δ}, Y_{x±ε, y±δ}] = (1/3) · ε · δ · ∂²ln(ρ(x, y))/(∂x ∂y) + o(ε · δ).   (2)

This asymptotic behavior is determined by a single parameter ∂²ln(ρ(x, y))/(∂x ∂y). It is therefore reasonable to use this parameter as a local version of Pearson's correlation coefficient:

Definition 1. Let (X, Y) be a random 2-D vector, and let (x, y) be a 2-D point. By the local correlation at the point (x, y), we mean the value

  r_{x,y}[X, Y] def= ∂²ln(ρ(x, y))/(∂x ∂y).   (3)

Proof of Proposition 1. Since the probability density function is twice differentiable, in a small vicinity of the point (x, y), we have

  ρ(x + Δx, y + Δy) = c + c_x·Δx + c_y·Δy + (1/2)·c_xx·(Δx)² + c_xy·Δx·Δy + (1/2)·c_yy·(Δy)² + o(ε², δ²),   (4)

where

  c def= ρ(x, y), c_x def= ∂ρ/∂x, c_y def= ∂ρ/∂y, c_xx def= ∂²ρ/∂x², c_xy def= ∂²ρ/(∂x ∂y), c_yy def= ∂²ρ/∂y².   (5)

The conditional probability distribution is obtained by dividing the original one by

  Prob((X, Y) ∈ (x − ε, x + ε) × (y − δ, y + δ)) = ∫∫ ρ(x + Δx, y + Δy) dΔx dΔy = c·(2ε)·(2δ) + o(ε·δ) = 4c·ε·δ + o(ε·δ),   (6)

where the integral is taken over the box (−ε, ε) × (−δ, δ).

Pearson's correlation coefficient does not change if we shift both X and Y by a constant, i.e., if we consider the shifted random variables ΔX def= X_{x±ε, y±δ} − x and ΔY def= Y_{x±ε, y±δ} − y instead of the original random variables X_{x±ε, y±δ} and Y_{x±ε, y±δ}:

  r[X_{x±ε, y±δ}, Y_{x±ε, y±δ}] = r[ΔX, ΔY] = (E[ΔX·ΔY] − E[ΔX]·E[ΔY]) / (√(E[(ΔX)²] − (E[ΔX])²) · √(E[(ΔY)²] − (E[ΔY])²)).   (7)

Here,

  E[ΔX] = (∫∫ Δx · ρ(x + Δx, y + Δy) dΔx dΔy) / (∫∫ ρ(x + Δx, y + Δy) dΔx dΔy),   (8)

where both integrals are again taken over the box (−ε, ε) × (−δ, δ). In this formula,

  Δx · ρ(x + Δx, y + Δy) = Δx · (c + c_x·Δx + c_y·Δy + (1/2)·c_xx·(Δx)² + c_xy·Δx·Δy + (1/2)·c_yy·(Δy)² + o).   (9)

The integral, over a symmetric box, of any expression which is odd in Δx and/or Δy is equal to 0. Thus, the main term in this expression that leads to a non-zero integral is c_x·(Δx)². For this term, the integral in the numerator of formula (8) is equal to c_x·((2/3)·ε³)·(2δ) + o, while the denominator (6) of formula (8) is equal to 4c·ε·δ + o; thus, formula (8) leads to

  E[ΔX] = (1/3)·ε²·(c_x/c) + o(ε²).   (10)

Similarly,

  E[ΔY] = (1/3)·δ²·(c_y/c) + o(δ²).   (11)

For the expected value of (ΔX)², we get

  E[(ΔX)²] = (∫∫ (Δx)²·(c + o) dΔx dΔy) / (∫∫ (c + o) dΔx dΔy) = (c·((2/3)·ε³)·(2δ) + o) / (4c·ε·δ + o) = (1/3)·ε² + o(ε²).   (12)

Here, E[(ΔX)²] ~ ε² and, due to (10), (E[ΔX])² ~ ε⁴ = o(ε²). Thus,

  E[(ΔX)²] − (E[ΔX])² = E[(ΔX)²] + o(ε²) = (1/3)·ε² + o(ε²).   (13)

Similarly,

  E[(ΔY)²] − (E[ΔY])² = (1/3)·δ² + o(δ²),   (14)

so the denominator of the expression (7) is equal to

  √(E[(ΔX)²] − (E[ΔX])²) · √(E[(ΔY)²] − (E[ΔY])²) = (1/3)·ε·δ + o(ε·δ).   (15)

For the numerator of the formula (7), we get

  E[ΔX·ΔY] = (∫∫ Δx·Δy·ρ(x + Δx, y + Δy) dΔx dΔy) / (∫∫ ρ(x + Δx, y + Δy) dΔx dΔy).   (16)

In this formula,

  Δx·Δy·ρ(x + Δx, y + Δy) = Δx·Δy·(c + c_x·Δx + c_y·Δy + (1/2)·c_xx·(Δx)² + c_xy·Δx·Δy + (1/2)·c_yy·(Δy)² + o).   (17)

The only term here which is not odd in Δx or in Δy is the term c_xy·(Δx)²·(Δy)², for which the integral in the numerator of (16) is equal to c_xy·((2/3)·ε³)·((2/3)·δ³) + o. Since the denominator (6) of the expression (16) is equal to 4c·ε·δ + o, we get

  E[ΔX·ΔY] = (1/9)·(c_xy/c)·ε²·δ² + o(ε²·δ²).   (18)

From (10), (11), and (18), we conclude that

  E[ΔX·ΔY] − E[ΔX]·E[ΔY] = (1/9)·ε²·δ²·(c_xy/c − (c_x·c_y)/c²) + o(ε²·δ²).   (19)

From (19) and (15), we can now conclude that the ratio (7) has the form

  r[X_{x±ε, y±δ}, Y_{x±ε, y±δ}] = r[ΔX, ΔY] = (1/3)·ε·δ·(c_xy/c − (c_x·c_y)/c²) + o(ε·δ).   (20)

Substituting the expressions (5) for c, c_x, c_y, and c_xy in terms of the probability density ρ(x, y) and its derivatives, we can easily check that the expression c_xy/c − (c_x·c_y)/c² is indeed equal to the derivative (3). The proposition is proven.
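The asymptotics of Proposition 1 can also be checked by simulation. In the following sketch (our own; the parameters are arbitrary), for a standard bivariate normal with correlation r we have ∂²ln(ρ)/(∂x ∂y) = r/(1 − r²) at every point, so the correlation of the sample restricted to a small box around (0, 0) should be close to (1/3)·ε·δ·r/(1 − r²):

```python
# Monte Carlo sanity check of Proposition 1 for a standard bivariate normal.
import numpy as np

rng = np.random.default_rng(2)
r, n = 0.6, 2_000_000
X = rng.normal(size=n)
Y = r * X + np.sqrt(1 - r**2) * rng.normal(size=n)   # corr(X, Y) = r in law

eps = delta = 0.3
mask = (np.abs(X) < eps) & (np.abs(Y) < delta)       # condition on the box
empirical = np.corrcoef(X[mask], Y[mask])[0, 1]
predicted = (1/3) * eps * delta * r / (1 - r**2)     # formula (2) at (0, 0)
print(empirical, predicted)                          # agree up to o(eps*delta) and noise
```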

3 What is the Meaning of the New Definition for a Normal Distribution

To better understand the meaning of our newly defined term, let us compute its value for a normal distribution, for which

  ρ(x, y) = const · exp(−(1/2)·(a_xx·(x − x₀)² + 2a_xy·(x − x₀)·(y − y₀) + a_yy·(y − y₀)²)),

where the matrix

  a = ( a_xx  a_xy )
      ( a_xy  a_yy )

is the inverse matrix to the covariance matrix

  C = ( V[X]      C[X, Y] )
      ( C[X, Y]   V[Y]    ).   (21)

For this distribution,

  ln(ρ(x, y)) = const − (1/2)·(a_xx·(x − x₀)² + 2a_xy·(x − x₀)·(y − y₀) + a_yy·(y − y₀)²),   (22)

and thus,

  r_{x,y}[X, Y] = ∂²ln(ρ(x, y))/(∂x ∂y) = −a_xy.   (23)

If we use the explicit formula for the elements of the inverse of a 2 × 2 matrix to describe a_xy in terms of the elements of the covariance matrix (21), we get the following expression:

  r_{x,y}[X, Y] = C[X, Y] / (V[X]·V[Y] − (C[X, Y])²).   (24)

Substituting V[X] = (σ[X])², V[Y] = (σ[Y])², and C[X, Y] = σ[X]·σ[Y]·r[X, Y] into the formula (24), we conclude that

  r_{x,y}[X, Y] = (r[X, Y] / (1 − (r[X, Y])²)) · (1 / (σ[X]·σ[Y])).   (25)
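Formula (25) is easy to verify numerically. The sketch below (ours, with arbitrary parameter values) approximates the mixed second derivative of ln(ρ) by central finite differences and compares it with the right-hand side of (25); by (22)-(23), the answer does not depend on the chosen point:

```python
# Finite-difference check of (23) and (25) for a bivariate normal density.
import numpy as np

sx, sy, r = 1.5, 0.7, 0.4
C = np.array([[sx**2, r*sx*sy],
              [r*sx*sy, sy**2]])          # covariance matrix (21)
Cinv = np.linalg.inv(C)                   # the matrix a = C^{-1}

def log_rho(x, y):
    # ln(rho) up to an additive constant, for mean (0, 0)
    v = np.array([x, y])
    return -0.5 * v @ Cinv @ v

h = 1e-3
x0, y0 = 0.8, -1.3                        # arbitrary point; result is point-independent
mixed = (log_rho(x0+h, y0+h) - log_rho(x0+h, y0-h)
         - log_rho(x0-h, y0+h) + log_rho(x0-h, y0-h)) / (4*h*h)
print(mixed)                              # = -a_xy, formula (23)
print(r / ((1 - r**2) * sx * sy))         # right-hand side of (25); same value
```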

4 Criterion for Independence

It turns out that the new localized version of Pearson's correlation coefficient provides a natural criterion for independence.

Proposition 2. For a random 2-D vector (X, Y) with a twice continuously differentiable probability density function ρ(x, y), the following two conditions are equivalent to each other:

• X and Y are independent;
• r_{x,y}[X, Y] = 0 for all x and y.

Proof. If X and Y are independent, then ρ(x, y) = ρ_X(x)·ρ_Y(y), where ρ_X(x) and ρ_Y(y) are the probability densities corresponding to the marginal distributions. Thus, ln(ρ(x, y)) = ln(ρ_X(x)) + ln(ρ_Y(y)) and therefore, for every x and y, we have

  r_{x,y}[X, Y] = ∂²ln(ρ(x, y))/(∂x ∂y) = 0.

Vice versa, let us assume that r_{x,y}[X, Y] = ∂²ln(ρ(x, y))/(∂x ∂y) = 0 for all x and y. The fact that the x-partial derivative of the auxiliary function ∂ln(ρ(x, y))/∂y is equal to 0 means that this auxiliary function does not depend on x, i.e., that it depends only on y:

  ∂ln(ρ(x, y))/∂y = f₁(y)   (26)

for some function f₁(y). Integrating over y, we get

  ln(ρ(x, y)) = f₂(y) + f₃(x),   (27)

where f₂(y) def= ∫₀^y f₁(t) dt is an antiderivative of the function f₁(y), and f₃(x) is a constant of integration which may depend on x. For ρ(x, y) = exp(ln(ρ(x, y))), we thus conclude that

  ρ(x, y) = F₃(x)·F₂(y),   (28)

where F₃(x) def= exp(f₃(x)) and F₂(y) def= exp(f₂(y)). Since ρ(x, y) ≥ 0 for all x and y, we can conclude that

  ρ(x, y) = F₃(x)·F₂(y),   (29)

with F₃(x) ≥ 0 and F₂(y) ≥ 0. By normalizing these functions of x and y, i.e., by taking

  ρ_X(x) def= F₃(x) / ∫ F₃(t) dt and ρ_Y(y) def= F₂(y) / ∫ F₂(t) dt,

we conclude that ρ(x, y) = ρ_X(x)·ρ_Y(y), i.e., that X and Y are indeed independent. The proposition is proven.
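The easy direction of Proposition 2 can be checked symbolically. In the sketch below (our own construction, not from the paper), the joint density is a product of a Gaussian factor in x and an unnormalized smooth factor in y; normalization constants drop out of the log-derivative, and the mixed derivative simplifies to 0:

```python
# Symbolic check: a product density has identically zero local correlation.
import sympy as sp

x, y = sp.symbols('x y', real=True)
rho_X = sp.exp(-x**2 / 2) / sp.sqrt(2 * sp.pi)   # Gaussian marginal in x
rho_Y = sp.exp(-y**4 / 4)                        # smooth factor in y (unnormalized)
rho = rho_X * rho_Y                              # product: X and Y independent

r_local = sp.diff(sp.log(rho), x, y)             # d^2 ln(rho) / dx dy
print(sp.simplify(r_local))                      # prints 0
```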

5 Relation to Copulas

A probability distribution with a probability distribution function F(x, y) def= Prob(X ≤ x & Y ≤ y) can be described as

  F(x, y) = C(F_X(x), F_Y(y)),   (30)

where F_X(x) def= Prob(X ≤ x) and F_Y(y) def= Prob(Y ≤ y) are the marginal distributions, and the function C(a, b) def= F(F_X⁻¹(a), F_Y⁻¹(b)) is known as a copula; see, e.g., [1, 2]. A copula can also be viewed as a probability distribution function for a 2-D random vector (A, B), with a probability density function

  c(a, b) def= ∂²C(a, b)/(∂a ∂b).

The probability density function ρ(x, y) of the original distribution can be described, in terms of the cumulative distribution function F(x, y), as ρ(x, y) = ∂²F(x, y)/(∂x ∂y). Substituting the expression (30) into this formula and using the chain rule, we conclude that

  ρ(x, y) = c(F_X(x), F_Y(y))·ρ_X(x)·ρ_Y(y),   (31)

where ρ_X(x) and ρ_Y(y) are the probability densities corresponding to the marginal distributions. For the copula's random vector (A, B), we can also define the local Pearson's correlation coefficient:

  r_{a,b}[A, B] = ∂²ln(c(a, b))/(∂a ∂b).   (32)

It turns out that the above localized version of Pearson's correlation coefficient can be naturally reformulated in terms of the copula and the marginal distributions; namely, the relation is the same as the relation (31) for probability densities:

Proposition 3. Let (X, Y) be a random 2-D vector with marginal distributions F_X(x) and F_Y(y), and let (A, B) be the corresponding copula distribution. Then,

  r_{x,y}[X, Y] = r_{F_X(x),F_Y(y)}[A, B]·ρ_X(x)·ρ_Y(y).   (33)

Proof. By taking logarithms of both sides of the formula (31), we get

  ln(ρ(x, y)) = ln(c(F_X(x), F_Y(y))) + ln(ρ_X(x)) + ln(ρ_Y(y)).   (34)

Differentiating both sides of this formula with respect to x and y, we get the desired expression (33). The proposition is proven.
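Proposition 3 can likewise be verified symbolically for a concrete copula. The sketch below (our own choice of example, not from the paper) uses the Farlie-Gumbel-Morgenstern copula density c(a, b) = 1 + θ·(1 − 2a)·(1 − 2b) with exponential marginals; both sides of (33) simplify to the same expression:

```python
# Symbolic check of formula (33) for the FGM copula and exponential marginals.
import sympy as sp

x, y, a, b = sp.symbols('x y a b', positive=True)
theta = sp.Rational(1, 2)

c = 1 + theta * (1 - 2*a) * (1 - 2*b)            # FGM copula density
FX, FY = 1 - sp.exp(-x), 1 - sp.exp(-y)          # exponential marginal cdfs
rhoX, rhoY = sp.exp(-x), sp.exp(-y)              # corresponding densities

rho = c.subs({a: FX, b: FY}) * rhoX * rhoY       # joint density, formula (31)

lhs = sp.diff(sp.log(rho), x, y)                 # r_{x,y}[X, Y]
r_ab = sp.diff(sp.log(c), a, b)                  # r_{a,b}[A, B], formula (32)
rhs = r_ab.subs({a: FX, b: FY}) * rhoX * rhoY    # right-hand side of (33)

print(sp.simplify(lhs - rhs))                    # prints 0
```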

Acknowledgements

This work was supported in part by the National Science Foundation grants HRD-0734825 and HRD-1242122 (Cyber-ShARE Center of Excellence) and DUE-0926721, by Grants 1 T36 GM078000-01 and 1R43TR000173-01 from the National Institutes of Health, and by a grant N62909-12-1-7039 from the Office of Naval Research.

References

[1] P. Jaworski, F. Durante, W. K. Härdle, and T. Rychlik (eds.), Copula Theory and Its Applications, Springer Verlag, Berlin, Heidelberg, New York, 2010.

[2] R. B. Nelsen, An Introduction to Copulas, Springer Verlag, Berlin, Heidelberg, New York, 1999.

[3] D. J. Sheskin, Handbook of Parametric and Nonparametric Statistical Procedures, Chapman and Hall/CRC, Boca Raton, Florida, 2011.