A COMPARISON STUDY BETWEEN THE CORRELATION COEFFICIENTS OF PEARSON, SPEARMAN AND KENDALL WITH NUMERICAL APPLICATIONS
Nicolae POPOVICIU (Hyperion University of Bucharest, popoviciunicolae15@yahoo.ro)

Abstract. The work systematically presents three types of correlation coefficient: Pearson, Spearman and Kendall. For the last two types the rank correlation coefficient (RCC) is studied. An RCC uses two random ordinal variables $X$ and $Y$ and measures the degree of similarity between them, or assesses the significance of the relation between them. For each RCC a computation algorithm is given. Some numerical examples illustrate the theory. The conclusions section shows how to decide which type of correlation coefficient should be used in a numerical problem.

Keywords: Pearson's coefficient (PCC), Spearman's coefficient (SCC), Kendall's coefficient (KCC), PSK Program description.

1. Introduction

For random variables we recall some usual notations and meanings:

$X = \begin{pmatrix} x_1 & x_2 & \dots & x_n \\ p_1 & p_2 & \dots & p_n \end{pmatrix}$, $n \ge 1$ (discrete case); $X = \begin{pmatrix} x \\ f(x) \end{pmatrix}$ (continuous case), where $f(x)$ is the probability density;

$X$, $Y$ are random variables (discrete or continuous);

Mean value: $M(X) = m_X = m$; dispersion or variance: $D(X) = \sigma_X^2 = \sigma^2 = \mathrm{Var}(X)$;

Reduced (normalized) random variable: $X' = \dfrac{X - m_X}{\sigma_X}$; $M(X') = 0$;

The covariance of $X$ and $Y$ is a real number defined as

$\mathrm{cov}(X, Y) = M[(X - m_X)(Y - m_Y)]$ (theoretical formula);

$\mathrm{cov}(X, Y) = M(XY) - M(X)\,M(Y)$ (computational formula);
$\mathrm{cov}(X, Y) = \mathrm{cov}(Y, X)$ (commutativity);

$\mathrm{cov}(X, X) = M(X^2) - [M(X)]^2 = D(X) = \sigma_X^2$;

If $X$, $Y$ are independent, then $\mathrm{cov}(X, Y) = 0$.

2. Correlation coefficient of Pearson

If $\mathrm{cov}(X, Y) \ne 0$, then the random variables are correlated. If $\mathrm{cov}(X, Y) = 0$, then $X$ and $Y$ are not correlated, but we do not know whether they are independent. That is why a new measure of correlation is necessary.

The real number $\rho(X, Y)$ defined as

$\rho(X, Y) = \dfrac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y}$

is called the correlation coefficient of Pearson (PCC), after Karl Pearson. Sometimes one denotes $\rho(X, Y) = \rho_P(X, Y) = \mathrm{cor}(X, Y)$.

Properties of Pearson's correlation coefficient:

$\rho(X, Y) = M(X'Y')$;

$\rho(X, X) = \dfrac{\mathrm{cov}(X, X)}{\sigma_X \sigma_X} = 1$;

$\rho(X, -X) = \dfrac{\mathrm{cov}(X, -X)}{\sigma_X \sigma_X} = -1$;

$-1 \le \rho(X, Y) \le 1$.

Remark 1. $\mathrm{cov}(X, Y) = M[(X - m_X)(Y - m_Y)]$ and $\rho(X, Y) = \dfrac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y}$ are called the theoretical formulas. There also exist estimated formulas, marked by Estim and based on a sample of size $N$:

$\bar{X} = \dfrac{1}{N} \sum_{i=1}^{N} x_i$, $\bar{X} \approx M(X)$ (estimate of the mean value);

$s_X^2 = \dfrac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{X})^2$, $s_X^2 \approx D(X)$, $s_X = \sqrt{s_X^2}$ (estimates of the dispersion and of the standard deviation);

$X'_{Estim} = \dfrac{X - \bar{X}}{s_X}$; $Y'_{Estim} = \dfrac{Y - \bar{Y}}{s_Y}$;
$\mathrm{cov}(X, Y)_{Estim} = \dfrac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{X})(y_i - \bar{Y})$;

$\rho(X, Y)_{Estim} = \dfrac{\mathrm{cov}(X, Y)_{Estim}}{s_X\, s_Y}$.

The estimation formulas do not use the probabilities $p_i$.

3. Spearman's rank correlation coefficient

Charles Edward Spearman: English statistician and psychologist; founder of factor analysis.

We are interested in studying a dichotomous process $P$ (i.e. one separated into two main parts). For $P$ we have to collect only two sets of data of size $n$: $A = (a_j)$ and $B = (b_j)$, $j = 1, \dots, n$.

Generally $a_j$, $b_j$ are real numbers, but by using an appropriate transformation we obtain natural numbers. Example: $a = 4.2$ billion becomes $b = 4200$ million; $a \in \mathbb{R}$, $b \in \mathbb{N}$. The initial data generate the pairs $(a_j, b_j)$. All numbers use the same unit of measure.

Example: $a_j$ represents the student's mark obtained in the theoretical exam; $b_j$ represents the student's mark obtained in the practical exam.

These numbers are of ordinal type: the value $a_j$ itself is not important, but its rank (place, position) in the sequence is.

The initial data $A = (a_j)$ and $B = (b_j)$ generate the random variables $X = (x_i)$ and $Y = (y_i)$, where the index $i$ ranges over $i = 1, \dots, n$, and $x_i$ and $y_i$ are natural numbers.

We look for the level of correlation between the random variables $X$ and $Y$. The Spearman rank correlation coefficient (SCC) is denoted by $\rho_S(X, Y)$ and can be calculated by two formulas.

Version 1. $\rho_{S1} = 1 - \dfrac{6 \sum_{i=1}^{n} D_i^2}{n(n^2 - 1)}$, where $D_i = x_i - y_i$.

Version 2. Use the pairs of initial values $(x_i, y_i)$, $i = 1, \dots, n$, where the values $x_i$ are arranged in increasing order. The $x_i$ values, in increasing order, are denoted $u_i$. This generates the ranks $r_X = U$, $r_Y = V$, where $U = (u_i)$, $V = (v_i)$, $i = 1, \dots, n$; $u_i$, $v_i$ are natural numbers.
Then $\rho_{S2} = 1 - \dfrac{6 \sum_{i=1}^{n} D_i^2}{n(n^2 - 1)}$, where $D_i = u_i - v_i$ (difference between ranks).

4. Kendall's rank correlation coefficient

Maurice George Kendall was an English statistician.

We denote by $\rho_K(X, Y)$ Kendall's rank correlation coefficient (KCC). The notations and the main idea introduced above remain valid. The computation of $\rho_K(X, Y)$ is done in several steps.

Step 1. Construct Table 1 of the input data $X = (x_i)$, $Y = (y_i)$ and the pairs $(x_i, y_i)$, $i = 1, \dots, n$, where the values $x_i$ are arranged in increasing order. Hence we use the pairs $(x_i, y_i)$, where the redundant (repeated) data are not eliminated.

Step 2 (optional). Find the ranks $r_X$ and $r_Y$ for the random variables $X$ and $Y$.
a) The ranks of $x_i$ are the natural numbers 1, 2, 3, etc. Find the rank of $y_i$ corresponding to $x_i$. In this way we construct Table 2 of ranks. The repeated data are not eliminated. The new random variable $Y$ is denoted $Y'$. Obtain the ranks $r_Y$: $r_1, r_2, r_3, \dots$
b) Table 1 and Table 2 yield Table 3 of ranks $r_X$ and $r_Y$.

Step 3. Because the values $x_i$ are in natural increasing order, the ranks $r_X$ are not used in the computation of Kendall's correlation coefficient. Only the ranks $r_Y$ are used. Construct the variable $RSY = (u_i)$, $i = 1, \dots, n$, which contains the superior ranks for the variable $Y$. For this construction we take a fixed value $y_j$ from the initial Table 1 and count how many values $y_k$ situated after $y_j$ have the property $y_k > y_j$. The result is Table 4 of superior ranks for the variable $Y$.

Step 4. Construct the variable $RIY = (v_i)$, $i = 1, \dots, n$, which contains the inferior ranks for the variable $Y$. For this construction we take a fixed value
$y_j$ from the initial Table 1 and count how many values $y_k$ situated after $y_j$ have the property $y_k < y_j$. The result is Table 5 of inferior ranks for the variable $Y$. The redundant values are not eliminated.

Step 5. Compute the rank differences $d_i = u_i - v_i$, $D = (d_i)$, and construct Table 6.

Step 6. Compute the sum $S = \sum_{i=1}^{n} d_i$.

Step 7. Compute Kendall's correlation coefficient

$\rho_K = \rho_K(X, Y) = \dfrac{S}{C_n^2} = \dfrac{2S}{n(n-1)}$.

Remark 2. For each type of correlation coefficient we have written a C++ program (PSK Program) that computes the specified coefficient.

// PSK Program description
// code=1 for Pearson; code=2 for Spearman; code=3 for Kendall
//
// Pearson correlation.
// cor(X,Y) = cov(X,Y)/(sigmaX*sigmaY); D(X) = M(X^2) - M(X)*M(X); D(Y) = M(Y^2) - M(Y)*M(Y)
// cov(X,Y) = M[(X-MX)(Y-MY)] or cov(X,Y) = M(XY) - MX*MY
//
// Spearman correlation.
// Version 1. We use the unmodified initial data X and Y and the pairs (xi,yi).
// Version 2. We arrange the values xi in increasing order.
//
// Kendall correlation.
// The vector X has the components xi in increasing order. The redundant values aren't eliminated.
// The vector Y generates the pairs (xi,yi).
// Observation. If the vector Y contains redundant values yi (for example 9 and 9; 8 and 8),
// then we apply a perturbation to these values, chosen so that it doesn't change the ranks.
// For example, we put 9.01 instead of the first 9 and 8.01 instead of the first 8,
// then 9.02 for the second 9 and 8.02 for the second 8.
// After the perturbation the ranks aren't changed.

5. Numerical applications

Application 1 (S). For a group of 10 students one knows the marks $X = (x_i)$ obtained in the theoretical exam and the marks $Y = (y_i)$ obtained in the practical exam (Table 1).

Table 1 (S) [7]. Columns: Student; Mark X; Mark Y.

a) Compute the Pearson correlation coefficient $\rho(X, Y) = \dfrac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y}$.
b) Compute Spearman's rank correlation coefficient $\rho_S = \rho_S(X, Y)$.

Solution. a) $M(X) = m_X = 5.5$; $M(Y) = m_Y = 5.5$; $M(X^2) = 38.3$; $M(Y^2) = 38.5$;

$D(X) = 8.05$; $D(Y) = 8.25$; $\sigma_X = 2.8373$; $\sigma_Y = \sqrt{D(Y)}$;

$\mathrm{cov}(X, Y) = M[(X - m_X)(Y - m_Y)] = \dfrac{70.5}{10} = 7.05$;

$\rho_P(X, Y) = \rho(X, Y) = \dfrac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y} = \dfrac{7.05}{\sigma_X \sigma_Y} = 0.865$.

The random variables $X$ and $Y$ are strongly correlated.

b) Compute Spearman's rank coefficient $\rho_S = \rho_S(X, Y)$.

Version 1. Use the formula $\rho_{S1} = 1 - \dfrac{6 \sum_{i=1}^{n} D_i^2}{n(n^2 - 1)}$, where $D_i = x_i - y_i$;
Version 2. Use the formula $\rho_{S2} = 1 - \dfrac{6 \sum_{i=1}^{n} D_i^2}{n(n^2 - 1)}$, where $D_i = u_i - v_i$.

Version 1. Use the initial data: the marks $x_i$, $y_i$.

Table 2 (S) (initial exam marks). Columns: Student; Mark X; Mark Y; $D_i$; $D_i^2$ (the sum is 24).

$\sum_{i=1}^{n} D_i^2 = 24$; $\rho_{S1} = 1 - \dfrac{6 \cdot 24}{10\,(100 - 1)} \approx 0.8545$.

Version 2. Use the pairs of initial marks $(x_i, y_i)$, $i = 1, \dots, 10$, where the marks $x_i$ are arranged in increasing order. This generates the ranks $r_X = U$, $r_Y = V$, where $U = (u_i)$, $V = (v_i)$, $i = 1, \dots, n$.

Table 3 (S) (of ranks). Columns: Student; Mark X; Mark Y; $r_X$; $r_Y$; $D_i$; $D_i^2$ (the sum is 24).

$\sum_{i=1}^{n} D_i^2 = 24$; $\rho_{S2} = 1 - \dfrac{6 \cdot 24}{10\,(100 - 1)} \approx 0.8545$.

Both versions give the same result.

Observation 1. We see that the coefficient values $\rho(X, Y) = 0.865$ and $\rho_S \approx 0.8545$ are in close agreement, but Spearman's coefficient is easier to compute.
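As a minimal sketch of the two computations compared above, the following Python fragment evaluates Pearson's coefficient through the computational covariance formula (with equal weights $1/n$) and Spearman's coefficient through the Version 1 formula. The function names and the data are assumptions made for illustration: the marks below are hypothetical, chosen as tie-free permutations of 1 to 10 with $\sum D_i^2 = 24$, and they are not the entries of the paper's tables. It is not the paper's C++ PSK Program.

```python
from math import sqrt


def pearson(x, y):
    """Pearson coefficient with cov(X,Y) = M(XY) - M(X)M(Y) and weights 1/n."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum(a * b for a, b in zip(x, y)) / n - mx * my
    var_x = sum(a * a for a in x) / n - mx * mx  # D(X) = M(X^2) - M(X)^2
    var_y = sum(b * b for b in y) / n - my * my  # D(Y) = M(Y^2) - M(Y)^2
    return cov / sqrt(var_x * var_y)


def spearman_v1(x, y):
    """Spearman coefficient, Version 1: 1 - 6*sum(D_i^2) / (n*(n^2 - 1))."""
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


# Hypothetical marks (assumption): both are permutations of 1..10.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [4, 1, 2, 3, 8, 5, 6, 7, 9, 10]

sum_d2 = sum((a - b) ** 2 for a, b in zip(x, y))
print(sum_d2)                       # 24
print(round(spearman_v1(x, y), 6))  # 0.854545
print(round(pearson(x, y), 6))      # 0.854545
```

For tie-free data whose values are exactly 1 to $n$, the two coefficients coincide, which matches the identical Pearson and Spearman values reported in Application 3. With repeated marks the Version 1 formula is only an approximation, so the two values can differ slightly.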
The pairs $(u_i, v_i)$ with $u_i$ in increasing order are, however, rather laborious to construct by hand. For a large volume of data a computer program is necessary (for example a C++ program).

Application 2 (K) [11]. For 17 companies we know two types of data:
$X = (x_i)$: the sums spent on advertising (in millions);
$Y = (y_i)$: the total capital of each company (in millions).
The data $x_i$ are arranged in increasing order. Table 1 contains all 17 pairs $(x_i, y_i)$.

Table 1 (K) (advertising sums and total capital). Rows: $X = (x_i)$; $Y = (y_i)$.

Find Kendall's rank correlation coefficient of the random variables $X$ and $Y$.

Solution. We use several steps.

Step 0. Arrange the values $x_i$ of $X$ in increasing order (in this problem this is already done). The repeated (redundant) data are not eliminated.

Step 1. Construct Table 1 with the pairs $(x_i, y_i)$, $i = 1, \dots, n$ (see the problem formulation).

Step 2. Construct the vector variable $RSY = (u_i)$, $i = 1, \dots, n$, containing the superior ranks of the variable $Y$. We use Table 1, and for a fixed value $y_j$ we count the values $y_k$ (placed after the value $y_j$) having the property $y_k > y_j$. The result is Table 2 for the variable $Y$.

Table 2 (K). Superior ranks for variable Y. Rows: $Y = (y_i)$; $RSY = (u_i)$.

Step 3. Construct the vector variable $RIY = (v_i)$, $i = 1, \dots, n$, containing the inferior ranks of the variable $Y$. We use Table 1, and for a fixed value $y_j$ we count the values $y_k$ (placed after the value $y_j$) having the property $y_k < y_j$. The result is Table 3 for the variable $Y$.
Table 3 (K). Inferior ranks for variable Y. Rows: $Y = (y_i)$; $RIY = (v_i)$.

Step 4. Compute the rank differences $d_i = u_i - v_i$ and denote them $D = (d_i)$ (Table 4).

Table 4 (K). Rank differences. Rows: $RSY = (u_i)$; $RIY = (v_i)$; $D = (d_i)$.

Step 5. Compute the sum of the $d_i$ and denote it $S = \sum_{i=1}^{n} d_i$. We obtain $S = 82$.

Step 6. Compute Kendall's rank coefficient

$\rho_K = \rho_K(X, Y) = \dfrac{S}{C_n^2} = \dfrac{82}{136}$.

The result is $\rho_K = 0.603$. The variables $X$ and $Y$ have a good correlation.

Application 3 (PSK). We repeat Application 1 with the data from Table 1 (PSK).

Table 1 (PSK) [7]. Columns: Student; Mark X; Mark Y.

Compute all the correlation coefficients: $\rho_P(X, Y)$ Pearson; $\rho_S(X, Y)$ Spearman; $\rho_K(X, Y)$ Kendall.

Solution. $D_1 = D_2 = \{1, 2, \dots, 10\}$; $|D_1| = 10 - 1 = 9$; $n = 10$. Our computer program (PSK Program) gives the following results:

$\rho_P(X, Y) = 0.854545$; $\rho_{S1}(X, Y) = 0.854545$; $\rho_{S2}(X, Y) = 0.854545$; $\rho_K(X, Y) = 0.\ldots$

The results are compatible with one another.
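The Kendall procedure (superior ranks, inferior ranks, differences, sum, coefficient) can be sketched in Python as below. This is only an illustration under stated assumptions: the function names `kendall_steps` and `rho_from_sum` are hypothetical, the sample data are invented (they are not the values of Table 1 (K) or Table 1 (PSK)), and the strict comparisons assume that ties in $Y$ have already been broken, for instance by the perturbation described in the PSK Program notes.

```python
def rho_from_sum(s, n):
    """Kendall coefficient from the sum S: rho_K = S / C(n,2) = 2S / (n(n-1))."""
    return 2 * s / (n * (n - 1))


def kendall_steps(x, y):
    """Follow the steps of the text; return (u, v, d, S, rho_K)."""
    # Arrange the pairs so that the x values are in increasing order.
    # sorted() with a key is stable, so repeated x values keep their order.
    pairs = sorted(zip(x, y), key=lambda p: p[0])
    ys = [p[1] for p in pairs]
    n = len(ys)
    # Superior ranks: number of later values strictly greater than y_j.
    u = [sum(1 for k in range(j + 1, n) if ys[k] > ys[j]) for j in range(n)]
    # Inferior ranks: number of later values strictly smaller than y_j.
    v = [sum(1 for k in range(j + 1, n) if ys[k] < ys[j]) for j in range(n)]
    d = [a - b for a, b in zip(u, v)]  # rank differences d_i = u_i - v_i
    s = sum(d)
    return u, v, d, s, rho_from_sum(s, n)


# Hypothetical data (assumption), already ordered by x.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [4, 1, 2, 3, 8, 5, 6, 7, 9, 10]

u, v, d, s, rho_k = kendall_steps(x, y)
print(u)                # [6, 8, 7, 6, 2, 4, 3, 2, 1, 0]
print(v)                # [3, 0, 0, 0, 3, 0, 0, 0, 0, 0]
print(s)                # 33
print(round(rho_k, 4))  # 0.7333

# Check of the final formula alone: n = 17 and S = 82 give 0.603.
print(round(rho_from_sum(82, 17), 3))  # 0.603
```

The double loop costs on the order of $n^2$ comparisons, which is unproblematic for samples of the size used in these applications.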
6. Conclusions

In order to draw several conclusions, we use some special notations.

$D_1$ is the domain of $x_i$; $D_2$ is the domain of $y_i$; if $D_1 = D_2$ then we denote $D = D_1 = D_2$;

$S_1 = \max\{x \mid x \in D_1\}$; $s_1 = \min\{x \mid x \in D_1\}$; $|D_1| = S_1 - s_1$; $|D_2| = S_2 - s_2$, etc.

Our problem is to determine the level of correlation between the numerical data $A$ and $B$, or between the random variables $X$ and $Y$. The answer can be obtained by several methods.

Method 1. Compute Pearson's correlation coefficient $\rho = \rho_P(X, Y)$.
Method 2. Compute Spearman's rank correlation coefficient $\rho_S(X, Y)$.
Method 3. Compute Kendall's rank correlation coefficient $\rho_K(X, Y)$.

The problem is how to choose the appropriate method. We suggest the following answer.

a) Method 1 can be used for any kind of data $X$ and $Y$.
b) If $D_1 = D_2$, $|D_1| = |D_2|$ and the norm is not a large number, then we recommend Method 2.
c) If $D_1 \ne D_2$, we recommend Method 3.

REFERENCES

[1] Popoviciu N., Tutorial on Statistical Formulas. Parameters Estimation. Confidence Intervals, University Hyperion of Bucharest Annals, Exact Sciences and Engineering Series, Vol. 1, Victor Publishing, Bucharest, 2013.
[2] Purcaru I., Bâscă O., People, Ideas, and Facts from the History of Mathematics, Economic Publishing, Bucharest.
[3] Tomescu Rodica, Ijacu Daniela, Probability and Mathematical Statistics, PRINTECH Publishing, Bucharest, 2005.
[4] Turban E., Aronson J. E., Decision Support Systems and Intelligent Systems, 5th ed., Prentice Hall, New Jersey.
[5] SSP-IBM Scientific Subroutine Package, IBM Vienna.
[6] Văduva I., Computer-Aided Simulation Models, Technical Publishing House, Bucharest.
[7] Voineagu V., Mitruţ C-tin et al., The Theoretical and Statistical Macroeconomic; Tests, Practical Work, Case Studies, Economic Publishing, Bucharest.
[8] Wikipedia: Multivariate Normal.
[9] Wikipedia: Birth and death processes.
[10]
[11] K
More informationA Tutorial on Data Reduction. Principal Component Analysis Theoretical Discussion. By Shireen Elhabian and Aly Farag
A Tutorial on Data Reduction Principal Component Analysis Theoretical Discussion By Shireen Elhabian and Aly Farag University of Louisville, CVIP Lab November 2008 PCA PCA is A backbone of modern data
More informationA simple graphical method to explore tail-dependence in stock-return pairs
A simple graphical method to explore tail-dependence in stock-return pairs Klaus Abberger, University of Konstanz, Germany Abstract: For a bivariate data set the dependence structure can not only be measured
More informationA tutorial on Principal Components Analysis
A tutorial on Principal Components Analysis Lindsay I Smith February 26, 2002 Chapter 1 Introduction This tutorial is designed to give the reader an understanding of Principal Components Analysis (PCA).
More informationMock Exam - 2 hours - use of basic (non-programmable) calculator is allowed - all exercises carry the same marks - exam is strictly individual
Mock Exam - 2 hours - use of basic (non-programmable) calculator is allowed - all exercises carry the same marks - exam is strictly individual Question 1. Suppose you want to estimate the percentage of
More informationHANDBOOK OF APPLICABLE MATHEMATICS
HANDBOOK OF APPLICABLE MATHEMATICS Chief Editor: Walter Ledermann Volume VI: Statistics PART A Edited by Emlyn Lloyd University of Lancaster A Wiley-Interscience Publication JOHN WILEY & SONS Chichester
More informationNon-parametric tests, part A:
Two types of statistical test: Non-parametric tests, part A: Parametric tests: Based on assumption that the data have certain characteristics or "parameters": Results are only valid if (a) the data are
More informationRegression with a Single Regressor: Hypothesis Tests and Confidence Intervals
Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals (SW Chapter 5) Outline. The standard error of ˆ. Hypothesis tests concerning β 3. Confidence intervals for β 4. Regression
More informationSCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models
SCHOOL OF MATHEMATICS AND STATISTICS Linear and Generalised Linear Models Autumn Semester 2017 18 2 hours Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION
More informationMeasuring relationships among multiple responses
Measuring relationships among multiple responses Linear association (correlation, relatedness, shared information) between pair-wise responses is an important property used in almost all multivariate analyses.
More informationSOLUTIONS Problem Set 2: Static Entry Games
SOLUTIONS Problem Set 2: Static Entry Games Matt Grennan January 29, 2008 These are my attempt at the second problem set for the second year Ph.D. IO course at NYU with Heski Bar-Isaac and Allan Collard-Wexler
More informationSpearman Rho Correlation
Spearman Rho Correlation Learning Objectives After studying this Chapter, you should be able to: know when to use Spearman rho, Calculate Spearman rho coefficient, Interpret the correlation coefficient,
More informationL7: Multicollinearity
L7: Multicollinearity Feng Li feng.li@cufe.edu.cn School of Statistics and Mathematics Central University of Finance and Economics Introduction ï Example Whats wrong with it? Assume we have this data Y
More informationCan you tell the relationship between students SAT scores and their college grades?
Correlation One Challenge Can you tell the relationship between students SAT scores and their college grades? A: The higher SAT scores are, the better GPA may be. B: The higher SAT scores are, the lower
More informationLinear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,
Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,
More informationCHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)
FROM: PAGANO, R. R. (007) I. INTRODUCTION: DISTINCTION BETWEEN PARAMETRIC AND NON-PARAMETRIC TESTS Statistical inference tests are often classified as to whether they are parametric or nonparametric Parameter
More informationHypothesis testing I. - In particular, we are talking about statistical hypotheses. [get everyone s finger length!] n =
Hypothesis testing I I. What is hypothesis testing? [Note we re temporarily bouncing around in the book a lot! Things will settle down again in a week or so] - Exactly what it says. We develop a hypothesis,
More informationA Primer on Statistical Inference using Maximum Likelihood
A Primer on Statistical Inference using Maximum Likelihood November 3, 2017 1 Inference via Maximum Likelihood Statistical inference is the process of using observed data to estimate features of the population.
More informationViolating the normal distribution assumption. So what do you do if the data are not normal and you still need to perform a test?
Violating the normal distribution assumption So what do you do if the data are not normal and you still need to perform a test? Remember, if your n is reasonably large, don t bother doing anything. Your
More informationSensitivity Analysis with Correlated Variables
Sensitivity Analysis with Correlated Variables st Workshop on Nonlinear Analysis of Shell Structures INTALES GmbH Engineering Solutions University of Innsbruck, Faculty of Civil Engineering University
More informationFactor Analysis. Qian-Li Xue
Factor Analysis Qian-Li Xue Biostatistics Program Harvard Catalyst The Harvard Clinical & Translational Science Center Short course, October 7, 06 Well-used latent variable models Latent variable scale
More informationA Measure of Monotonicity of Two Random Variables
Journal of Mathematics and Statistics 8 (): -8, 0 ISSN 549-3644 0 Science Publications A Measure of Monotonicity of Two Random Variables Farida Kachapova and Ilias Kachapov School of Computing and Mathematical
More informationStatistics for Managers Using Microsoft Excel 5th Edition
Statistics for Managers Using Microsoft Ecel 5th Edition Chapter 7 Sampling and Statistics for Managers Using Microsoft Ecel, 5e 2008 Pearson Prentice-Hall, Inc. Chap 7-12 Why Sample? Selecting a sample
More informationStat 5421 Lecture Notes Fuzzy P-Values and Confidence Intervals Charles J. Geyer March 12, Discreteness versus Hypothesis Tests
Stat 5421 Lecture Notes Fuzzy P-Values and Confidence Intervals Charles J. Geyer March 12, 2016 1 Discreteness versus Hypothesis Tests You cannot do an exact level α test for any α when the data are discrete.
More informationPreptests 55 Answers and Explanations (By Ivy Global) Section 4 Logic Games
Section 4 Logic Games Questions 1 6 There aren t too many deductions we can make in this game, and it s best to just note how the rules interact and save your time for answering the questions. 1. Type
More informationSampling Distributions: Central Limit Theorem
Review for Exam 2 Sampling Distributions: Central Limit Theorem Conceptually, we can break up the theorem into three parts: 1. The mean (µ M ) of a population of sample means (M) is equal to the mean (µ)
More informationTesting Simple Hypotheses R.L. Wolpert Institute of Statistics and Decision Sciences Duke University, Box Durham, NC 27708, USA
Testing Simple Hypotheses R.L. Wolpert Institute of Statistics and Decision Sciences Duke University, Box 90251 Durham, NC 27708, USA Summary: Pre-experimental Frequentist error probabilities do not summarize
More informationIntroduction to emulators - the what, the when, the why
School of Earth and Environment INSTITUTE FOR CLIMATE & ATMOSPHERIC SCIENCE Introduction to emulators - the what, the when, the why Dr Lindsay Lee 1 What is a simulator? A simulator is a computer code
More information