A COMPARISON STUDY BETWEEN THE CORRELATION COEFFICIENTS OF PEARSON, SPEARMAN AND KENDALL WITH NUMERICAL APPLICATIONS

Nicolae POPOVICIU (Hyperion University of Bucharest, popoviciunicolae15@yahoo.ro)

Abstract. The work systematically presents three types of correlation coefficient: Pearson, Spearman and Kendall. For the last two types the rank correlation coefficient (RCC) is studied. The RCC uses two random ordinal variables $X$ and $Y$ and measures the degree of similarity between them, or assesses the significance of the relation between them. For each RCC a computation algorithm is given. Some numerical examples illustrate the theory. The conclusions section shows how to decide which type of correlation coefficient should be used in a numerical problem.

Keywords: Pearson's coefficient (PCC), Spearman's coefficient (SCC), Kendall's coefficient (KCC), PSK Program description.

1. Introduction

For random variables we recall some usual notations and meanings:

- discrete case: $X = \begin{pmatrix} x_1 & x_2 & \dots & x_n \\ p_1 & p_2 & \dots & p_n \end{pmatrix}$, $n \ge 1$; continuous case: $X = \begin{pmatrix} x \\ f(x) \end{pmatrix}$, where $f(x)$ is the probability density;
- $X$, $Y$ are random variables (discrete or continuous);
- mean value: $M(X) = m_X = m$; dispersion or variance: $D(X) = \sigma_X^2 = \sigma^2 = \operatorname{Var}(X)$;
- reduced (normalized) random variable: $X' = \dfrac{X - m}{\sigma}$; $M(X') = 0$;
- the covariance of $X$ and $Y$ is a real number defined as
  $\operatorname{cov}(X, Y) = M[(X - m_X)(Y - m_Y)]$ (theoretical formula);
  $\operatorname{cov}(X, Y) = M(XY) - M(X)\,M(Y)$ (computational formula);

- $\operatorname{cov}(X, Y) = \operatorname{cov}(Y, X)$ (commutativity);
- $\operatorname{cov}(X, X) = M(X \cdot X) - M(X)\,M(X) = M(X^2) - [M(X)]^2 = D(X) = (\sigma_X)^2$;
- if $X$, $Y$ are independent, then $\operatorname{cov}(X, Y) = 0$.

2. Correlation coefficient of Pearson

If $\operatorname{cov}(X, Y) \ne 0$, then the random variables are correlated. If $\operatorname{cov}(X, Y) = 0$, then $X$ and $Y$ are not correlated, but we do not know whether they are independent. That is why a new measure of correlation is necessary.

The real number $\rho(X, Y)$ defined as

$\rho(X, Y) = \dfrac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y}$

is called the correlation coefficient of Pearson (PCC), after Karl Pearson. Sometimes one denotes $\rho(X, Y) = \rho_P(X, Y) = \operatorname{cor}(X, Y)$.

Properties of Pearson's correlation coefficient:

- $\rho(X, Y) = M(X' Y')$;
- $\rho(X, X) = \dfrac{\operatorname{cov}(X, X)}{\sigma_X \sigma_X} = 1$;
- $\rho(X, -X) = \dfrac{\operatorname{cov}(X, -X)}{\sigma_X \sigma_X} = -1$;
- $-1 \le \rho(X, Y) \le 1$.

Remark 1. $\operatorname{cov}(X, Y) = M[(X - m_X)(Y - m_Y)]$ and $\rho(X, Y) = \dfrac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y}$ are called the theoretical formulas. There also exist estimated formulas, marked by Estim and based on a sample of volume $N$:

- $\bar{X} = \dfrac{1}{N} \sum_{i=1}^{N} x_i$, $\bar{X} \approx M(X)$ (estimation for the mean value);
- $s_X^2 = \dfrac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{X})^2$, $s_X^2 \approx D(X)$, $s_X = \sqrt{s_X^2}$ (estimation for the dispersion and the standard deviation);
- $X'_{Estim} = \dfrac{X - \bar{X}}{s_X}$; $Y'_{Estim} = \dfrac{Y - \bar{Y}}{s_Y}$;

- $\operatorname{cov}(X, Y)_{Estim} = \dfrac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{X})(y_i - \bar{Y})$;
- $\rho(X, Y)_{Estim} = \dfrac{\operatorname{cov}(X, Y)_{Estim}}{s_X \, s_Y}$.

The estimation formulas do not use the probabilities $p_i$.

3. Spearman's rank correlation coefficient

Charles Edward Spearman: English statistician and psychologist; founder of factorial analysis.

We are interested in studying a dichotomous process $P$ (i.e. one separated into two main parts). For $P$ we have to collect only two sets of data of size $n$: $A = (a_j)$ and $B = (b_j)$, $j = 1, \dots, n$. Generally $a_j$, $b_j$ are real numbers, but by using an appropriate transformation we obtain natural numbers. Example: $a = 4.2$ billion becomes $b = 4200$ million; $a \in \mathbb{R}$, $b \in \mathbb{N}$. The initial data generate the pairs $(a_j, b_j)$. All numbers use the same unit of measure. Example: $a_j$ represents the student's note obtained in the theoretical exam; $b_j$ represents the student's note obtained in the practical exam.

These numbers are of ordinal type: what matters is not the value $a_j$ itself, but its rank (place, position) in the sequence. The initial data $A = (a_j)$ and $B = (b_j)$ generate the random variables $X = (x_i)$ and $Y = (y_i)$, where the index $i$ ranges over $i = 1, \dots, n$ and $x_i$, $y_i$ are natural numbers. We look for the level of correlation between the random variables $X$ and $Y$.

Spearman's rank correlation coefficient (SCC) is denoted by $\rho_S(X, Y)$ and can be calculated by two formulas.

Version 1. $\rho_{S1} = 1 - \dfrac{6 \sum_{i=1}^{n} D_i^2}{n(n^2 - 1)}$, where $D_i = x_i - y_i$.

Version 2. Use the pairs of initial values $(x_i, y_i)$, $i = 1, \dots, n$, where the values $x_i$ are arranged in increasing order. The $x_i$ values, in increasing order, are denoted $u_i$. This generates the ranks $r_X = U$, $r_Y = V$, where $U = (u_i)$, $V = (v_i)$, $i = 1, \dots, n$; $u_i$, $v_i$ are natural numbers.

Then $\rho_{S2} = 1 - \dfrac{6 \sum_{i=1}^{n} D_i^2}{n(n^2 - 1)}$, where $D_i = u_i - v_i$ (difference between ranks).

4. Kendall's rank correlation coefficient

Maurice George Kendall was an English statistician. We denote by $\rho_K(X, Y)$ Kendall's rank correlation coefficient (KCC). The notations and the main idea of the preceding sections remain valid. The computation of $\rho_K(X, Y)$ is done in several steps.

Step 1. Construct Table 1 of the input data $X = (x_i)$, $Y = (y_i)$ and the pairs $(x_i, y_i)$, $i = 1, \dots, n$, where the values $x_i$ are arranged in increasing order. Hence we use the pairs $(x_i, y_i)$, where the redundant (repeated) data are not eliminated.

Step 2 (optional). Find the ranks $r_X$ and $r_Y$ for the random variables $X$ and $Y$.
a) The ranks of $x_i$ are the natural numbers 1, 2, 3, etc. Find the rank of $y_i$ corresponding to $x_i$. So we construct Table 2 of ranks. The repeated data are not eliminated. The new random variable $Y$ is denoted $Y'$. Obtain the ranks $r_Y$: $r_1, r_2, r_3, \dots$
b) Table 1 and Table 2 yield Table 3 of the ranks $r_X$ and $r_Y$.

Step 3. Because the values $x_i$ are in natural increasing order, the ranks $r_X$ are not used in the computation of Kendall's correlation coefficient. Only the ranks $r_Y$ are used. Construct the variable $RSY = (u_i)$, $i = 1, \dots, n$, which contains the superior ranks for the variable $Y$. For this construction we take a fixed value $y_j$ from the initial Table 1 and count how many values $y_k$ situated after $y_j$ have the property $y_k > y_j$. The result is Table 4 of superior ranks for the variable $Y$.

Step 4. Construct the variable $RIY = (v_i)$, $i = 1, \dots, n$, which contains the inferior ranks for the variable $Y$. For this construction we take a fixed value $y_j$ from the initial Table 1 and count how many values $y_k$ situated after $y_j$ have the property $y_k < y_j$. The result is Table 5 of inferior ranks for the variable $Y$. The redundant values are not eliminated.

Step 5. Compute the rank differences $d_i = u_i - v_i$, $D = (d_i)$, and construct Table 6.

Step 6. Compute the sum $S = \sum_{i=1}^{n} d_i$.

Step 7. Compute Kendall's correlation coefficient

$\rho_K = \rho_K(X, Y) = \dfrac{S}{C_n^2} = \dfrac{2S}{n(n-1)}$.

Remark 2. For each type of correlation coefficient we have elaborated a C++ program (PSK Program) to compute the specified coefficient.

// PSK Program description
// code=1 for Pearson; code=2 for Spearman; code=3 for Kendall
// Pearson correlation.
// corpXY = covXY/(dX*dY); DX = MX2 - MX*MX; DY = MY2 - MY*MY
// cov(X,Y) = covXY = M[(X-mX)(Y-mY)] or cov(X,Y) = M(XY) - MX*MY
//
// Spearman correlation.
// Version 1. We use the unmodified initial data X and Y and the pairs (xi,yi).
// Version 2. We arrange the values xi in increasing order.
//
// Kendall correlation.
// The vector X has the components xi in increasing order. The redundant values aren't eliminated. The vector Y generates the pairs (xi,yi).
// Observation. If redundant values yi appear in the vector Y (for example, the redundant values are 9 and 9; 8 and 8),
// then we apply a perturbation to these values, chosen so that it doesn't change the ranks.

// For example we put 9.01 instead of the first 9 and 8.01 instead of the first 8; then 9.02 for the second 9 and 8.02 for the second 8.
// Hence the ranks aren't changed.

5. Numerical applications

Application 1 (S). For a group of 10 students one knows the notes $X = (x_i)$ obtained in the theoretical exam and the notes $Y = (y_i)$ obtained in the practical exam (Table 1).

Table 1 (S) [7]. Columns: Student, Note X, Note Y.

a) Compute the Pearson correlation coefficient $\rho(X, Y) = \dfrac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y}$.
b) Compute Spearman's rank correlation coefficient $\rho_S = \rho_S(X, Y)$.

Solution. a)
$M(X) = m_X = 5.5$; $M(Y) = m_Y = 5.5$;
$M(X^2) = 38.3$; $M(Y^2) = 38.5$;
$D(X) = 8.05$; $D(Y) = 8.25$;
$\sigma_X = 2.8373$; $\sigma_Y = 2.8723$;
$\operatorname{cov}(X, Y) = M[(X - m_X)(Y - m_Y)] = \dfrac{70.5}{10} = 7.05$;
$\rho(X, Y) = \dfrac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \dfrac{7.05}{2.8373 \cdot 2.8723} = 0.865$; $\rho_P(X, Y) = 0.865$.

The random variables $X$ and $Y$ are strongly correlated.

b) Compute Spearman's rank coefficient $\rho_S = \rho_S(X, Y)$.

Version 1. Use the formula $\rho_{S1} = 1 - \dfrac{6 \sum_{i=1}^{n} D_i^2}{n(n^2 - 1)}$, where $D_i = x_i - y_i$.

Version 2. Use the formula $\rho_{S2} = 1 - \dfrac{6 \sum_{i=1}^{n} D_i^2}{n(n^2 - 1)}$, where $D_i = u_i - v_i$.

Version 1. Use the initial data: the notes $x_i$, $y_i$.

Table 2 (S) (initial exam notes). Columns: Student, Note X, Note Y, $D_i$, $D_i^2$ (the sum of the $D_i^2$ is 24).

$\sum_{i=1}^{10} D_i^2 = 24$; $\rho_{S1} = 1 - \dfrac{6 \cdot 24}{10(100 - 1)} = 0.8545$.

Version 2. Use the pairs of initial notes $(x_i, y_i)$, $i = 1, \dots, 10$, where the notes $x_i$ are arranged in increasing order. This generates the ranks $r_X = U$, $r_Y = V$, where $U = (u_i)$, $V = (v_i)$, $i = 1, \dots, n$.

Table 3 (S) (of ranks). Columns: Student, Note X, Note Y, $r_X$, $r_Y$, $D_i$, $D_i^2$ (the sum of the $D_i^2$ is 24).

$\sum_{i=1}^{10} D_i^2 = 24$; $\rho_{S2} = 1 - \dfrac{6 \cdot 24}{10(100 - 1)} = 0.8545$.

Both versions give the same result.

Observation 1. We see that the coefficient values $\rho(X, Y) = 0.865$ and $\rho_S = 0.8545$ are very compatible, but Spearman's coefficient is easier to compute.

Nevertheless, the pairs $(u_i, v_i)$ with $u_i$ in increasing order are rather difficult to construct. For a large volume of data, a computer program is necessary (for example a C++ program).

Application 2 (K). [11] For 17 companies we know two types of data:
$X = (x_i)$: the sums spent on advertising (in millions);
$Y = (y_i)$: the total capital of each company (in millions).
The data $x_i$ are arranged in increasing order. Table 1 contains all 17 pairs $(x_i, y_i)$.

Table 1 (K) (advertising sum and total capital). Rows: $X = (x_i)$, $Y = (y_i)$.

Find Kendall's rank correlation coefficient of the random variables $X$ and $Y$.

Solution. We use several steps.

Step 0. Arrange the values $x_i$ of $X$ in increasing order (in this problem this is already done). The repeated (redundant) data are not eliminated.

Step 1. Construct Table 1 with the pairs $(x_i, y_i)$, $i = 1, \dots, n$ (see the problem formulation).

Step 2. Construct the vector variable $RSY = (u_i)$, $i = 1, \dots, n$, containing the superior ranks of the variable $Y$. We use Table 1 and, for a fixed value $y_j$, we count the values $y_k$ (placed after the value $y_j$) having the property $y_k > y_j$. The result is Table 2 for the variable $Y$.

Table 2 (K). Superior ranks for the variable $Y$. Rows: $Y = (y_i)$, $RSY = (u_i)$.

Step 3. Construct the vector variable $RIY = (v_i)$, $i = 1, \dots, n$, containing the inferior ranks of the variable $Y$. We use Table 1 and, for a fixed value $y_j$, we count the values $y_k$ (placed after the value $y_j$) having the property $y_k < y_j$. The result is Table 3 for the variable $Y$.

Table 3 (K). Inferior ranks for the variable $Y$. Rows: $Y = (y_i)$, $RIY = (v_i)$.

Step 4. Compute the rank differences $d_i = u_i - v_i$ and denote them $D = (d_i)$ (Table 4).

Table 4 (K). Rank differences. Rows: $RSY = (u_i)$, $RIY = (v_i)$, $D = (d_i)$.

Step 5. Compute the sum of the $d_i$ and denote it $S = \sum_{i=1}^{17} d_i$. We obtain $S = 82$.

Step 6. Compute Kendall's rank coefficient $\rho_K = \rho_K(X, Y) = \dfrac{S}{C_{17}^2} = \dfrac{82}{136}$. The result is $\rho_K = 0.603$. The variables $X$ and $Y$ have a good correlation.

Application 3 (PSK). We repeat Application 1 with the data from Table 1 (PSK).

Table 1 (PSK) [7]. Columns: Student, Note X, Note Y.

Compute all the correlation coefficients: $\rho_P(X, Y)$ (Pearson); $\rho_S(X, Y)$ (Spearman); $\rho_K(X, Y)$ (Kendall).

Solution. $D_1 = D_2 = \{1, 2, \dots, 10\}$; $\|D_1\| = 10 - 1 = 9$; $n = 10$. Our computer program, PSK Program, gives the following results:

$\rho_P(X, Y) = 0.854545$; $\rho_{S1}(X, Y) = 0.854545$; $\rho_{S2}(X, Y) = 0.854545$; $\rho_K(X, Y) = 0.\ldots$

The results are compatible with one another.
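The three computations used in the applications above can be sketched in C++ as follows. This is only a minimal illustration under stated assumptions, not the PSK Program itself: the function names `pearson`, `spearman`, `kendall`, `ranks` and `sorted_order` are hypothetical, the data are assumed to contain no repeated values (so no perturbation step is needed), and Pearson is computed with the computational formula $\operatorname{cov}(X, Y) = M(XY) - M(X)M(Y)$ using equal weights $1/n$.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

using Vec = std::vector<double>;

// Pearson coefficient via the computational formulas
// cov(X,Y) = M(XY) - M(X)M(Y) and D(X) = M(X^2) - [M(X)]^2 (weights 1/n).
double pearson(const Vec& x, const Vec& y) {
    assert(x.size() == y.size() && !x.empty());
    const double n = static_cast<double>(x.size());
    double mx = 0, my = 0, mxx = 0, myy = 0, mxy = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        mx += x[i];
        my += y[i];
        mxx += x[i] * x[i];
        myy += y[i] * y[i];
        mxy += x[i] * y[i];
    }
    mx /= n; my /= n; mxx /= n; myy /= n; mxy /= n;
    const double cov = mxy - mx * my;
    const double dx = mxx - mx * mx;  // D(X)
    const double dy = myy - my * my;  // D(Y)
    return cov / std::sqrt(dx * dy);
}

// Indices that list the values of v in increasing order.
std::vector<std::size_t> sorted_order(const Vec& v) {
    std::vector<std::size_t> idx(v.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::sort(idx.begin(), idx.end(),
              [&v](std::size_t a, std::size_t b) { return v[a] < v[b]; });
    return idx;
}

// Ranks 1..n of the values of v (assumption: no repeated values).
Vec ranks(const Vec& v) {
    const std::vector<std::size_t> idx = sorted_order(v);
    Vec r(v.size());
    for (std::size_t k = 0; k < idx.size(); ++k)
        r[idx[k]] = static_cast<double>(k + 1);
    return r;
}

// Spearman: 1 - 6 * sum(D_i^2) / (n (n^2 - 1)), D_i = difference of ranks.
double spearman(const Vec& x, const Vec& y) {
    assert(x.size() == y.size() && x.size() > 1);
    const Vec rx = ranks(x), ry = ranks(y);
    double sum = 0;
    for (std::size_t i = 0; i < rx.size(); ++i) {
        const double d = rx[i] - ry[i];
        sum += d * d;
    }
    const double n = static_cast<double>(x.size());
    return 1.0 - 6.0 * sum / (n * (n * n - 1.0));
}

// Kendall, following the steps of Section 4: order the pairs by x, then for
// each y_j count the later y_k that are greater (superior rank u_j) and
// smaller (inferior rank v_j); S = sum(u_j - v_j), rho_K = S / C(n, 2).
double kendall(const Vec& x, const Vec& y) {
    assert(x.size() == y.size() && x.size() > 1);
    const std::vector<std::size_t> idx = sorted_order(x);
    long s = 0;
    for (std::size_t j = 0; j < idx.size(); ++j) {
        for (std::size_t k = j + 1; k < idx.size(); ++k) {
            if (y[idx[k]] > y[idx[j]]) ++s;       // counts toward u_j
            else if (y[idx[k]] < y[idx[j]]) --s;  // counts toward v_j
        }
    }
    const double n = static_cast<double>(x.size());
    return 2.0 * static_cast<double>(s) / (n * (n - 1.0));
}
```

As a usage example with hypothetical notes (not the data of Table 1, which are not reproduced here), take $X = (1, 2, \dots, 10)$ and $Y = (3, 2, 1, 6, 5, 4, 9, 8, 7, 10)$. Then $\sum D_i^2 = 24$, so `spearman` returns $1 - 144/990 \approx 0.854545$; `pearson` returns the same value, because both vectors are permutations of $1, \dots, 10$; and `kendall` returns $27/45 = 0.6$.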

6. Conclusions

In order to draw several conclusions, we use some special notations. $D_1$ is the domain of $x_i$; $D_2$ is the domain of $y_i$; if $D_1 = D_2$ then we denote $D = D_1 = D_2$; $S_1 = \max\{x \mid x \in D_1\}$; $s_1 = \min\{x \mid x \in D_1\}$; $\|D_1\| = S_1 - s_1$; $\|D_2\| = S_2 - s_2$, etc.

Our problem is to determine the level of correlation between the numerical data $A$ and $B$, or between the random variables $X$ and $Y$. The answer is obtained by several methods.

Method 1. Compute Pearson's correlation coefficient $\rho = \rho_P(X, Y)$.
Method 2. Compute Spearman's rank correlation coefficient $\rho_S(X, Y)$.
Method 3. Compute Kendall's rank correlation coefficient $\rho_K(X, Y)$.

The problem is how to choose the appropriate method. We suggest the following answer.

a) Method 1 can be used for any kind of data $X$ and $Y$.
b) If $D_1 = D_2$, $\|D_1\| = \|D_2\|$ and the norm is not a big number, then we recommend Method 2.
c) If $D_1 \ne D_2$ we recommend Method 3.

REFERENCES

[1] Popoviciu N., Tutorial on Statistical Formulas. Parameters Estimation. Confidence Intervals, University Hyperion of Bucharest Annals, Exact Sciences and Engineering Series, Vol. 1, Victor Publishing, Bucharest, 2013.
[2] Purcaru I., Bâscă O., People, Ideas, and Facts from the History of Mathematics, Economic Publishing, Bucharest.
[3] Tomescu Rodica, Ijacu Daniela, Probability and Mathematical Statistics, PRINTECH Publishing, Bucharest, 2005.
[4] Turban E., Aronson J. E., Decision Support Systems and Intelligent Systems, 5th ed., Prentice Hall, New Jersey.
[5] SSP-IBM Scientific Subroutine Package, IBM Vienna.
[6] Văduva I., Computer-Aided Simulation Models, Technical Publishing House, Bucharest.
[7] Voineagu V., Mitruţ C-tin et al., The Theoretical and Statistical Macroeconomic; Tests, Practical Work, Case Studies, Economic Publishing, Bucharest.
[8] Wikipedia: Multivariate Normal.
[9] Wikipedia: Birth and death processes.
[10]
[11] K

Correlation and Regression

Correlation and Regression Correlation and Regression. ITRDUCTI Till now, we have been working on one set of observations or measurements e.g. heights of students in a class, marks of students in an exam, weekly wages of workers

More information

BIOL 4605/7220 CH 20.1 Correlation

BIOL 4605/7220 CH 20.1 Correlation BIOL 4605/70 CH 0. Correlation GPT Lectures Cailin Xu November 9, 0 GLM: correlation Regression ANOVA Only one dependent variable GLM ANCOVA Multivariate analysis Multiple dependent variables (Correlation)

More information

CORELATION - Pearson-r - Spearman-rho

CORELATION - Pearson-r - Spearman-rho CORELATION - Pearson-r - Spearman-rho Scatter Diagram A scatter diagram is a graph that shows that the relationship between two variables measured on the same individual. Each individual in the set is

More information

Measuring Associations : Pearson s correlation

Measuring Associations : Pearson s correlation Measuring Associations : Pearson s correlation Scatter Diagram A scatter diagram is a graph that shows that the relationship between two variables measured on the same individual. Each individual in the

More information

Correlation: Relationships between Variables

Correlation: Relationships between Variables Correlation Correlation: Relationships between Variables So far, nearly all of our discussion of inferential statistics has focused on testing for differences between group means However, researchers are

More information

Data Analysis as a Decision Making Process

Data Analysis as a Decision Making Process Data Analysis as a Decision Making Process I. Levels of Measurement A. NOIR - Nominal Categories with names - Ordinal Categories with names and a logical order - Intervals Numerical Scale with logically

More information

Reminder: Student Instructional Rating Surveys

Reminder: Student Instructional Rating Surveys Reminder: Student Instructional Rating Surveys You have until May 7 th to fill out the student instructional rating surveys at https://sakai.rutgers.edu/portal/site/sirs The survey should be available

More information

UNIT 4 RANK CORRELATION (Rho AND KENDALL RANK CORRELATION

UNIT 4 RANK CORRELATION (Rho AND KENDALL RANK CORRELATION UNIT 4 RANK CORRELATION (Rho AND KENDALL RANK CORRELATION Structure 4.0 Introduction 4.1 Objectives 4. Rank-Order s 4..1 Rank-order data 4.. Assumptions Underlying Pearson s r are Not Satisfied 4.3 Spearman

More information

Inferences for Correlation

Inferences for Correlation Inferences for Correlation Quantitative Methods II Plan for Today Recall: correlation coefficient Bivariate normal distributions Hypotheses testing for population correlation Confidence intervals for population

More information

Correlation and Regression. Tudor Călinici 2017

Correlation and Regression. Tudor Călinici 2017 Correlation and Regression Tudor Călinici 2017 1 Objectives To verify the existence of a relation between two quantitative continuous variables using the coefficient of correlation If the correlation exists,

More information

HYPOTHESIS TESTING II TESTS ON MEANS. Sorana D. Bolboacă

HYPOTHESIS TESTING II TESTS ON MEANS. Sorana D. Bolboacă HYPOTHESIS TESTING II TESTS ON MEANS Sorana D. Bolboacă OBJECTIVES Significance value vs p value Parametric vs non parametric tests Tests on means: 1 Dec 14 2 SIGNIFICANCE LEVEL VS. p VALUE Materials and

More information

Statistics Introductory Correlation

Statistics Introductory Correlation Statistics Introductory Correlation Session 10 oscardavid.barrerarodriguez@sciencespo.fr April 9, 2018 Outline 1 Statistics are not used only to describe central tendency and variability for a single variable.

More information

CORRELATION. compiled by Dr Kunal Pathak

CORRELATION. compiled by Dr Kunal Pathak CORRELATION compiled by Dr Kunal Pathak Flow of Presentation Definition Types of correlation Method of studying correlation a) Scatter diagram b) Karl Pearson s coefficient of correlation c) Spearman s

More information

Section 4.7 Scientific Notation

Section 4.7 Scientific Notation Section 4.7 Scientific Notation INTRODUCTION Scientific notation means what it says: it is the notation used in many areas of science. It is used so that scientist and mathematicians can work relatively

More information

Correlation and Linear Regression

Correlation and Linear Regression Correlation and Linear Regression Correlation: Relationships between Variables So far, nearly all of our discussion of inferential statistics has focused on testing for differences between group means

More information

Measurement and Data. Topics: Types of Data Distance Measurement Data Transformation Forms of Data Data Quality

Measurement and Data. Topics: Types of Data Distance Measurement Data Transformation Forms of Data Data Quality Measurement and Data Topics: Types of Data Distance Measurement Data Transformation Forms of Data Data Quality Importance of Measurement Aim of mining structured data is to discover relationships that

More information

Unit 2. Describing Data: Numerical

Unit 2. Describing Data: Numerical Unit 2 Describing Data: Numerical Describing Data Numerically Describing Data Numerically Central Tendency Arithmetic Mean Median Mode Variation Range Interquartile Range Variance Standard Deviation Coefficient

More information

Correlation & Linear Regression. Slides adopted fromthe Internet

Correlation & Linear Regression. Slides adopted fromthe Internet Correlation & Linear Regression Slides adopted fromthe Internet Roadmap Linear Correlation Spearman s rho correlation Kendall s tau correlation Linear regression Linear correlation Recall: Covariance n

More information

INTERVAL ESTIMATION AND HYPOTHESES TESTING

INTERVAL ESTIMATION AND HYPOTHESES TESTING INTERVAL ESTIMATION AND HYPOTHESES TESTING 1. IDEA An interval rather than a point estimate is often of interest. Confidence intervals are thus important in empirical work. To construct interval estimates,

More information

Session III: New ETSI Model on Wideband Speech and Noise Transmission Quality Phase II. STF Validation results

Session III: New ETSI Model on Wideband Speech and Noise Transmission Quality Phase II. STF Validation results Session III: New ETSI Model on Wideband Speech and Noise Transmission Quality Phase II STF 294 - Validation results ETSI Workshop on Speech and Noise in Wideband Communication Javier Aguiar (University

More information

Dependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline.

Dependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline. Practitioner Course: Portfolio Optimization September 10, 2008 Before we define dependence, it is useful to define Random variables X and Y are independent iff For all x, y. In particular, F (X,Y ) (x,

More information

STATISTICS ( CODE NO. 08 ) PAPER I PART - I

STATISTICS ( CODE NO. 08 ) PAPER I PART - I STATISTICS ( CODE NO. 08 ) PAPER I PART - I 1. Descriptive Statistics Types of data - Concepts of a Statistical population and sample from a population ; qualitative and quantitative data ; nominal and

More information

Data files for today. CourseEvalua2on2.sav pontokprediktorok.sav Happiness.sav Ca;erplot.sav

Data files for today. CourseEvalua2on2.sav pontokprediktorok.sav Happiness.sav Ca;erplot.sav Correlation Data files for today CourseEvalua2on2.sav pontokprediktorok.sav Happiness.sav Ca;erplot.sav Defining Correlation Co-variation or co-relation between two variables These variables change together

More information

Class 11 Maths Chapter 15. Statistics

Class 11 Maths Chapter 15. Statistics 1 P a g e Class 11 Maths Chapter 15. Statistics Statistics is the Science of collection, organization, presentation, analysis and interpretation of the numerical data. Useful Terms 1. Limit of the Class

More information

ON SMALL SAMPLE PROPERTIES OF PERMUTATION TESTS: INDEPENDENCE BETWEEN TWO SAMPLES

ON SMALL SAMPLE PROPERTIES OF PERMUTATION TESTS: INDEPENDENCE BETWEEN TWO SAMPLES ON SMALL SAMPLE PROPERTIES OF PERMUTATION TESTS: INDEPENDENCE BETWEEN TWO SAMPLES Hisashi Tanizaki Graduate School of Economics, Kobe University, Kobe 657-8501, Japan e-mail: tanizaki@kobe-u.ac.jp Abstract:

More information

Lecture Notes 1: Vector spaces

Lecture Notes 1: Vector spaces Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector

More information

Regression and correlation. Correlation & Regression, I. Regression & correlation. Regression vs. correlation. Involve bivariate, paired data, X & Y

Regression and correlation. Correlation & Regression, I. Regression & correlation. Regression vs. correlation. Involve bivariate, paired data, X & Y Regression and correlation Correlation & Regression, I 9.07 4/1/004 Involve bivariate, paired data, X & Y Height & weight measured for the same individual IQ & exam scores for each individual Height of

More information

1 Overview. Coefficients of. Correlation, Alienation and Determination. Hervé Abdi Lynne J. Williams

1 Overview. Coefficients of. Correlation, Alienation and Determination. Hervé Abdi Lynne J. Williams In Neil Salkind (Ed.), Encyclopedia of Research Design. Thousand Oaks, CA: Sage. 2010 Coefficients of Correlation, Alienation and Determination Hervé Abdi Lynne J. Williams 1 Overview The coefficient of

More information

Measurement Theory. Reliability. Error Sources. = XY r XX. r XY. r YY

Measurement Theory. Reliability. Error Sources. = XY r XX. r XY. r YY Y -3 - -1 0 1 3 X Y -10-5 0 5 10 X Measurement Theory t & X 1 X X 3 X k Reliability e 1 e e 3 e k 1 The Big Picture Measurement error makes it difficult to identify the true patterns of relationships between

More information

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing 1 In most statistics problems, we assume that the data have been generated from some unknown probability distribution. We desire

More information

N Utilization of Nursing Research in Advanced Practice, Summer 2008

N Utilization of Nursing Research in Advanced Practice, Summer 2008 University of Michigan Deep Blue deepblue.lib.umich.edu 2008-07 536 - Utilization of ursing Research in Advanced Practice, Summer 2008 Tzeng, Huey-Ming Tzeng, H. (2008, ctober 1). Utilization of ursing

More information

Recall the Basics of Hypothesis Testing

Recall the Basics of Hypothesis Testing Recall the Basics of Hypothesis Testing The level of significance α, (size of test) is defined as the probability of X falling in w (rejecting H 0 ) when H 0 is true: P(X w H 0 ) = α. H 0 TRUE H 1 TRUE

More information

EXAMINATIONS OF THE HONG KONG STATISTICAL SOCIETY

EXAMINATIONS OF THE HONG KONG STATISTICAL SOCIETY EXAMINATIONS OF THE HONG KONG STATISTICAL SOCIETY MODULE 4 : Linear models Time allowed: One and a half hours Candidates should answer THREE questions. Each question carries 20 marks. The number of marks

More information

Scatter plot of data from the study. Linear Regression

Scatter plot of data from the study. Linear Regression 1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25

More information

Measures of Central Tendency and their dispersion and applications. Acknowledgement: Dr Muslima Ejaz

Measures of Central Tendency and their dispersion and applications. Acknowledgement: Dr Muslima Ejaz Measures of Central Tendency and their dispersion and applications Acknowledgement: Dr Muslima Ejaz LEARNING OBJECTIVES: Compute and distinguish between the uses of measures of central tendency: mean,

More information

Correlation and the Analysis of Variance Approach to Simple Linear Regression

Correlation and the Analysis of Variance Approach to Simple Linear Regression Correlation and the Analysis of Variance Approach to Simple Linear Regression Biometry 755 Spring 2009 Correlation and the Analysis of Variance Approach to Simple Linear Regression p. 1/35 Correlation

More information

Dependence. MFM Practitioner Module: Risk & Asset Allocation. John Dodson. September 11, Dependence. John Dodson. Outline.

Dependence. MFM Practitioner Module: Risk & Asset Allocation. John Dodson. September 11, Dependence. John Dodson. Outline. MFM Practitioner Module: Risk & Asset Allocation September 11, 2013 Before we define dependence, it is useful to define Random variables X and Y are independent iff For all x, y. In particular, F (X,Y

More information

Bivariate Relationships Between Variables

Bivariate Relationships Between Variables Bivariate Relationships Between Variables BUS 735: Business Decision Making and Research 1 Goals Specific goals: Detect relationships between variables. Be able to prescribe appropriate statistical methods

More information

Business Statistics. Lecture 10: Correlation and Linear Regression

Business Statistics. Lecture 10: Correlation and Linear Regression Business Statistics Lecture 10: Correlation and Linear Regression Scatterplot A scatterplot shows the relationship between two quantitative variables measured on the same individuals. It displays the Form

More information

Correlation and regression

Correlation and regression NST 1B Experimental Psychology Statistics practical 1 Correlation and regression Rudolf Cardinal & Mike Aitken 11 / 12 November 2003 Department of Experimental Psychology University of Cambridge Handouts:

More information

STAT 135 Lab 5 Bootstrapping and Hypothesis Testing

STAT 135 Lab 5 Bootstrapping and Hypothesis Testing STAT 135 Lab 5 Bootstrapping and Hypothesis Testing Rebecca Barter March 2, 2015 The Bootstrap Bootstrap Suppose that we are interested in estimating a parameter θ from some population with members x 1,...,

More information

Soc 3811 Basic Social Statistics Second Midterm Exam Spring Your Name [50 points]: ID #: ANSWERS

Soc 3811 Basic Social Statistics Second Midterm Exam Spring Your Name [50 points]: ID #: ANSWERS Soc 3811 Basic Social Statistics Second idterm Exam Spring 010 our Name [50 points]: ID #: INSTRUCTIONS: ANSERS (A) rite your name on the line at top front of every sheet. (B) If you use a page of notes

More information

Statistics: A review. Why statistics?

Statistics: A review. Why statistics? Statistics: A review Why statistics? What statistical concepts should we know? Why statistics? To summarize, to explore, to look for relations, to predict What kinds of data exist? Nominal, Ordinal, Interval

More information

MATH 1070 Introductory Statistics Lecture notes Relationships: Correlation and Simple Regression

MATH 1070 Introductory Statistics Lecture notes Relationships: Correlation and Simple Regression MATH 1070 Introductory Statistics Lecture notes Relationships: Correlation and Simple Regression Objectives: 1. Learn the concepts of independent and dependent variables 2. Learn the concept of a scatterplot

More information

Scatter plot of data from the study. Linear Regression

Scatter plot of data from the study. Linear Regression 1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25

More information

COMPARISON OF FIVE TESTS FOR THE COMMON MEAN OF SEVERAL MULTIVARIATE NORMAL POPULATIONS

COMPARISON OF FIVE TESTS FOR THE COMMON MEAN OF SEVERAL MULTIVARIATE NORMAL POPULATIONS Communications in Statistics - Simulation and Computation 33 (2004) 431-446 COMPARISON OF FIVE TESTS FOR THE COMMON MEAN OF SEVERAL MULTIVARIATE NORMAL POPULATIONS K. Krishnamoorthy and Yong Lu Department

More information

Chapter 16: Correlation

Chapter 16: Correlation Chapter 16: Correlation Correlations: Measuring and Describing Relationships A correlation is a statistical method used to measure and describe the relationship between two variables. A relationship exists

More information

HUDM4122 Probability and Statistical Inference. February 2, 2015

HUDM4122 Probability and Statistical Inference. February 2, 2015 HUDM4122 Probability and Statistical Inference February 2, 2015 Special Session on SPSS Thursday, April 23 4pm-6pm As of when I closed the poll, every student except one could make it to this I am happy

More information
