Bivariate Paired Numerical Data

Pearson's correlation, Spearman's ρ and Kendall's τ, tests of independence
University of California, San Diego
Instructor: Ery Arias-Castro
http://math.ucsd.edu/~eariasca/teaching.html

Suppose we have paired data of the form

(X_1, Y_1), ..., (X_n, Y_n)

where the variables X and Y are both numerical. We want to know how they are related / associated / depend on each other. Note that X and Y may be measurements of different types.

Summary statistics. In addition to summarizing each variable, some form of correlation between the variables is computed.

Graphics. In addition to a boxplot of each variable, a scatterplot helps visualize how the variables vary together.

Example: stopping distance as a function of speed

Consider the cars dataset in the datasets package in R. The data give the speed of cars X (in miles per hour) and the distances taken to stop Y (in feet). Note that the data were recorded in the 1920s.
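For instance, in R (a minimal sketch; cars ships with base R, so this runs as-is):

    data(cars)
    str(cars)                        # 50 observations of speed and dist
    cor(cars$speed, cars$dist)       # Pearson correlation, about 0.81
    plot(dist ~ speed, data = cars,
         xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
    boxplot(cars$speed, cars$dist, names = c("speed", "dist"))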

Pearson's correlation (a measure of linear association)

The covariance between two random variables X and Y, with respective means µ_X = E[X] and µ_Y = E[Y], is given by

Cov(X,Y) = E[(X − µ_X)(Y − µ_Y)] = E[XY] − E[X]E[Y]

Their correlation is

Corr(X,Y) = Cov(X,Y) / √(Var(X) Var(Y))

It is always in [−1, 1], and equal to ±1 if and only if X = aY + b for some constants a ≠ 0 and b. If this is the case, we say that X and Y are perfectly correlated. The closer the Pearson correlation is to 1 in absolute value, the stronger the linear association.

NOTE. Corr(X,Y) = 0 does not imply that X and Y are independent. For example, take X ~ Unif[−1,1] and Y = X². They are perfectly associated (Y is a deterministic function of X), yet

Cov(X,Y) = E[XY] − E[X]E[Y] = E[X³] = 0

(Indeed, E[X^k] = 0 for any odd integer k ≥ 1, by symmetry.)

Sample covariance and correlation

The sample covariance is

S_XY = (1/(n−1)) Σ_{i=1}^n (X_i − X̄)(Y_i − Ȳ)   (1)
     = (1/(n−1)) Σ_{i=1}^n X_i Y_i − (n/(n−1)) X̄ Ȳ   (2)

The sample correlation is

R_XY = Σ_{i=1}^n (X_i − X̄)(Y_i − Ȳ) / √( Σ_{i=1}^n (X_i − X̄)² · Σ_{i=1}^n (Y_i − Ȳ)² ) = S_XY / (S_X S_Y)

where S_X and S_Y are the sample standard deviations of X and Y, respectively.
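A quick numerical check of formulas (1)-(2) against R's built-ins (a sketch, again on the cars data):

    x <- cars$speed; y <- cars$dist; n <- length(x)
    s_xy  <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)            # formula (1)
    s_xy2 <- sum(x * y) / (n - 1) - n / (n - 1) * mean(x) * mean(y)  # formula (2)
    r_xy  <- s_xy / (sd(x) * sd(y))
    c(s_xy, s_xy2, cov(x, y))   # three identical values
    c(r_xy, cor(x, y))          # two identical values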

Correlation t-test

Assume we have an i.i.d. sample (X_1,Y_1), ..., (X_n,Y_n), and want to test

H_0: Corr(X,Y) = 0 versus H_1: Corr(X,Y) ≠ 0

It is natural to reject for large values of |R|, where R = R_XY. Equivalently, we reject for large values of |T|, where

T = R √(n−2) / √(1 − R²)

(Equivalently, because |T| is a strictly increasing function of |R|.)

Theory. Assuming that (X_1,Y_1), ..., (X_n,Y_n) are i.i.d. bivariate normal, under the null hypothesis of zero correlation, T has a t-distribution with n−2 degrees of freedom.

NOTE. Even when the sample is not bivariate normal, T is asymptotically standard normal as long as X and Y have finite second moments and are independent (thus under a stronger assumption than having zero correlation).

The bivariate normal distribution

The random vector (X,Y) is said to have a bivariate normal distribution if every (deterministic) linear combination aX + bY is normally distributed. Five parameters define a bivariate normal distribution:

- the marginal means µ_X and µ_Y;
- the marginal variances σ²_X and σ²_Y;
- the correlation ρ = Corr(X,Y).

If (X,Y) is bivariate normal with these parameters, then for any a, b ∈ R, aX + bY is normal with mean

aµ_X + bµ_Y

and variance

a²σ²_X + b²σ²_Y + 2abρσ_Xσ_Y

(This follows from simple moment calculations.)

If (X,Y) is bivariate normal, then Corr(X,Y) = 0 implies that X and Y are independent.
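In R, cor.test carries out exactly this t-test; below it is checked against T computed by hand (a sketch on the cars data):

    x <- cars$speed; y <- cars$dist; n <- length(x)
    r <- cor(x, y)
    t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
    c(t_stat, 2 * pt(-abs(t_stat), df = n - 2))  # T and its two-sided p-value
    cor.test(x, y)                               # same t, df = n - 2, and p-value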

Permutation test

We can permute the data to break the pairing. What we are really testing is

H_0: X and Y are independent

A permutation test based on Pearson's correlation is designed for alternatives where X and Y are linearly associated. In principle, the test works as follows:

1. For each permutation π of {1, ..., n}, compute R_π, the sample correlation of (X_i, Y_π(i)), i = 1, ..., n.
2. The p-value is the fraction of R_π that are at least as large as the observed sample correlation R_obs:

pval = #{π : R_π ≥ R_obs} / n!

Usually, the number of permutations (= n!) is too large for that, so we estimate the p-value by Monte Carlo, which amounts to sampling B permutations π_1, ..., π_B uniformly at random (B a large integer) and computing

pval = (#{b : R_π_b ≥ R_obs} + 1) / (B + 1)

(An R sketch of this Monte Carlo version appears at the end of this section.)

Spearman's rank correlation

This is the same as Pearson's correlation except that the observations are replaced by their ranks. Let A_i denote the rank of X_i within X_1, ..., X_n, and let B_i denote the rank of Y_i within Y_1, ..., Y_n. Spearman's rank correlation, denoted R^spear = R^spear_XY, is the sample Pearson correlation of (A_i, B_i), i = 1, ..., n. If there are no ties,

R^spear = 1 − (6/(n³ − n)) Σ_{i=1}^n (A_i − B_i)²

Note that R^spear ∈ [−1, 1], and is equal to 1 (resp. −1) if and only if there is an increasing (resp. decreasing) function f such that Y_i = f(X_i) for all i. The closer the Spearman correlation is to 1 in absolute value, the stronger the monotonic association.
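The Monte Carlo permutation test from above, in R (a sketch; the seed and B are arbitrary choices, and sample(y) permutes Y to break the pairing):

    set.seed(1)
    x <- cars$speed; y <- cars$dist
    r_obs <- cor(x, y)
    B <- 9999
    r_perm <- replicate(B, cor(x, sample(y)))  # correlations under random pairings
    (sum(r_perm >= r_obs) + 1) / (B + 1)       # Monte Carlo p-value
    # (take absolute values of the correlations for a two-sided version)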

A test for independence vs monotonic association rejects for large values of |R^spear|. This test is distribution-free if X and Y are continuous. Equivalently, we reject for large values of |T^spear|, where

T^spear = R^spear √(n−2) / √(1 − (R^spear)²)

Theory. Under the null hypothesis, T^spear has asymptotically the standard normal distribution.

NOTE. R^spear is a consistent estimator of Spearman's ρ, defined as

ρ = 3 E[sign((X_1 − X_2)(Y_1 − Y_3))]

where (X_1,Y_1), X_2, Y_3 are independent. We can have ρ = 0 even if X and Y are not independent, and indeed the test is not universally consistent as a test for independence.

Kendall's tau

Kendall's tau is very similar to Spearman's rho. The sample version is defined as

T^kend = (2/(n(n−1))) Σ_{1≤i<j≤n} sign((X_j − X_i)(Y_j − Y_i))

Note that T^kend ∈ [−1, 1], and is equal to 1 (resp. −1) if and only if there is an increasing (resp. decreasing) function f such that Y_i = f(X_i) for all i. The resulting test is also distribution-free if the variables are continuous.

Theory. Under the null hypothesis, T^kend, standardized by its null standard deviation (the null variance is 2(2n+5)/(9n(n−1))), has asymptotically the standard normal distribution.

NOTE. T^kend is a consistent estimator of

τ = E[sign((X_1 − X_2)(Y_1 − Y_2))]

where (X_1,Y_1) and (X_2,Y_2) are independent. We can have τ = 0 even if X and Y are not independent, and indeed the test is not universally consistent as a test for independence.

The joint cumulative distribution function

The joint CDF of (X,Y) (a random vector with values in R²) is defined as

F_XY(x,y) = P(X ≤ x, Y ≤ y)

Theory. F_XY(x,y) = F_X(x) F_Y(y) for all (x,y) if and only if X and Y are independent.
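Before turning to omnibus tests of independence, here is how the rank-based tests above run in R (a sketch; exact = FALSE requests the large-sample approximation, which avoids warnings about ties in the cars data):

    x <- cars$speed; y <- cars$dist
    cor(rank(x), rank(y))                               # Spearman = Pearson on the ranks
    cor.test(x, y, method = "spearman", exact = FALSE)  # test based on R^spear
    cor.test(x, y, method = "kendall",  exact = FALSE)  # test based on T^kend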

Tests for independence based on the empirical distributions

In the spirit of the Kolmogorov-Smirnov test, Hoeffding (1948) and others later proposed tests of independence based on the empirical CDFs. An example of such a test rejects for large values of

H = sup_{x,y ∈ R} |F^XY_n(x,y) − F^X_n(x) F^Y_n(y)|

where
- F^X_n is the empirical CDF of X_1, ..., X_n;
- F^Y_n is the empirical CDF of Y_1, ..., Y_n;
- F^XY_n is the joint empirical CDF of (X_1,Y_1), ..., (X_n,Y_n):

F^XY_n(x,y) = (1/n) Σ_{i=1}^n I{X_i ≤ x, Y_i ≤ y}

If X and Y are continuous, the test is distribution-free. (The asymptotic distribution is also known in closed form.) The test is universally consistent against any alternative to independence.

Energy statistics

A more recent proposal of Székely, Rizzo and Bakirov (2007) is based on the following distance covariance statistic:

V_n(X,Y) = (1/n²) Σ_{i,j=1}^n A_ij B_ij

where

A_ij = a_ij − ā_i· − ā_·j + ā_··,  a_ij = |X_i − X_j|,  ā_i· = (1/n) Σ_{j=1}^n a_ij,  ā_·· = (1/n²) Σ_{i,j=1}^n a_ij

and B_ij is defined similarly based on the Y's.

Theory. If X and Y have finite first moments, then with probability one, V_n(X,Y) → V(X,Y) as n → ∞, where, up to a normalizing constant,

V(X,Y)² := ∫∫_{R²} |f_XY(x,y) − f_X(x) f_Y(y)|² / (x² y²) dx dy

with f_XY, f_X, f_Y denoting the characteristic functions of (X,Y), X and Y.
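Given the double-centering above, V_n is a few lines of base R (a minimal sketch; dcov_stat is just an illustrative name, and in practice the energy package's dcov, dcor and dcov.test are the standard implementation):

    dcov_stat <- function(x, y) {
      a <- as.matrix(dist(x))   # a_ij = |X_i - X_j|
      b <- as.matrix(dist(y))   # b_ij = |Y_i - Y_j|
      # A_ij = a_ij - abar_i. - abar_.j + abar_.. (and likewise for B)
      A <- a - outer(rowMeans(a), colMeans(a), "+") + mean(a)
      B <- b - outer(rowMeans(b), colMeans(b), "+") + mean(b)
      mean(A * B)               # (1/n^2) sum_ij A_ij B_ij
    }
    dcov_stat(cars$speed, cars$dist)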

Note that V(X,Y) = 0 if and only if X and Y are independent. The authors recommend using the distance correlation, defined as

V_n(X,Y) / √(V_n(X,X) V_n(Y,Y))

We have

V_n(X,Y) / √(V_n(X,X) V_n(Y,Y)) → V(X,Y) / √(V(X,X) V(Y,Y))

Calibration (whether using the distance covariance or the distance correlation) can be done by permutation in practice. All this remains valid if X and Y are random vectors (of possibly different dimensions).
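A permutation calibration of the distance correlation, reusing the dcov_stat sketch above (dcor_stat is again an illustrative name; the seed and B are arbitrary):

    dcor_stat <- function(x, y) {
      dcov_stat(x, y) / sqrt(dcov_stat(x, x) * dcov_stat(y, y))
    }
    set.seed(1)
    x <- cars$speed; y <- cars$dist
    d_obs <- dcor_stat(x, y)
    B <- 999
    d_perm <- replicate(B, dcor_stat(x, sample(y)))  # permuting Y breaks the pairing
    (sum(d_perm >= d_obs) + 1) / (B + 1)             # Monte Carlo p-value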