Investigating the Significance of a Correlation Coefficient using Jackknife Estimates

International Journal of Sciences: Basic and Applied Research (IJSBAR)
ISSN 2307-4531 (Print & Online)
http://gssrr.org/index.php?journal=journalofbasicandapplied

Anthony Akpanta (a), Idika Okorie (b)
(a, b) Department of Statistics, Abia State University Uturu, Nigeria
(a) Email: ac_akpa@yahoo.com
(b) Email: iokorie@yahoo.com

Abstract

Often in applied statistics, population parameters are not known and must be inferred from the available sample data; this is the underpinning of statistical inference. Resampling techniques such as the jackknife offer effective estimates of parameters and of their asymptotic distributions. In this paper, we present the jackknife estimates of the parameters of a simple linear regression model, with particular interest in the correlation coefficient. This procedure provides an effective alternative test statistic for testing the null hypothesis of no association between the explanatory variables and a response variable.

Keywords: Jackknife; simple linear regression; correlation coefficient; ols estimates; bias.

1. Introduction

After estimating parameters in applied statistics, it is always crucial to assess the accuracy of the estimator through its standard error and the construction of confidence intervals for the parameter [1]. Quenouille in 1956 developed a cross-validation procedure known as the jackknife (leave-one-out procedure) for estimating the bias of an estimator [2]. Two years later this method was extended by John Tukey to estimate the variance of an estimator, and the name jackknife was coined for this cross-validation method [3].

* Corresponding author. E-mail address: ac_akpa@yahoo.com

International Journal of Sciences: Basic and Applied Research (IJSBAR) (2015) Volume 22, No 2, pp 441-448

The jackknife algorithm is an iterative procedure. The initial step is to estimate the parameter(s) from the entire sample. Then the ith element (datum) is dropped from the sample in turn and the model parameters are estimated from the reduced sample. The resulting estimates are called partial estimates (pseudo estimates) [4]. The mean of the pseudo estimates is referred to as the jackknife estimate and is used in place of the main parameter value [5]. The standard errors of the parameters can also be estimated from the standard deviation of the pseudo estimates, which enables significance tests of the parameters and the construction of confidence intervals [6].

Regression analysis has been widely used to explain the relationship between explanatory variables and a response variable. The jackknife was found viable for estimating the sampling distribution of the regression coefficients in the work of Efron [7], and this was further extended by Freedman [8] and Wu [9]. Using the special case of the simple linear regression model, this article illustrates an alternative to the classic test statistic for assessing the significance of the correlation coefficient, based on jackknife estimates.

2. Methods

The linear regression model can be written in matrix form as

$$Y = X\theta + \varepsilon \tag{1}$$

where

$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix}_{n \times p}$$

is the $n \times p$ design matrix (matrix of the explanatory variables), and the remaining quantities are vectors corresponding to the $p \times 1$ regression parameters, the $n \times 1$ response variable and the $n \times 1$ normally distributed error term with zero mean and constant variance, defined by

$$\theta = \begin{pmatrix} \theta_1 \\ \vdots \\ \theta_p \end{pmatrix}_{p \times 1}, \qquad Y = \begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix}_{n \times 1}, \qquad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix}_{n \times 1}.$$

The simple linear regression model with one explanatory variable ($x_i$, $i = 1, 2, 3, \ldots, n$) and two parameters $\theta_{(0)}$ and $\theta_{(1)}$, corresponding to the intercept and the slope, is a special case of (1). Hence, the ordinary least squares (ols) estimator of this model is

$$\hat{\theta}_{ols} = \begin{pmatrix} \hat{\theta}_{ols(0)} \\ \hat{\theta}_{ols(1)} \end{pmatrix} = (X'X)^{-1}X'Y \tag{2}$$
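For the two-parameter case, the matrix expression in (2) can be written out by hand. The sketch below is a minimal illustration in Python, not the authors' R code; the function name `ols_simple` and the toy data are hypothetical. It assumes X is the n × 2 matrix whose first column is all ones and whose second column holds the x values.

```python
def ols_simple(x, y):
    """Return (intercept, slope) from (X'X)^{-1} X'Y with X = [1, x]."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    # Entries of X'X = [[n, sx], [sx, sxx]] and of X'Y = [sy, sxy].
    sx = sum(x)
    sy = sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx  # determinant of X'X
    if det == 0:
        raise ValueError("x must not be constant")
    # (X'X)^{-1} = (1/det) * [[sxx, -sx], [-sx, n]], then multiply by X'Y.
    intercept = (sxx * sy - sx * sxy) / det
    slope = (n * sxy - sx * sy) / det
    return intercept, slope


# Toy data (hypothetical, not the FOREX series): y = 1 + 2x exactly.
x_demo = [1.0, 2.0, 3.0, 4.0]
y_demo = [3.0, 5.0, 7.0, 9.0]
b0, b1 = ols_simple(x_demo, y_demo)
```

Writing the 2 × 2 inverse explicitly avoids any linear algebra library; for more than one explanatory variable a general solver would be the more natural choice.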

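In (2), the design matrix is

$$X = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}_{n \times 2}$$

and the variance-covariance matrix of $\hat{\theta}_{ols(0)}$ and $\hat{\theta}_{ols(1)}$ is given by

$$\operatorname{var/cov}\begin{pmatrix} \hat{\theta}_{ols(0)} \\ \hat{\theta}_{ols(1)} \end{pmatrix} = \sigma^2 (X'X)^{-1}_{2 \times 2} \tag{3}$$

where the diagonal elements of (3) are the variances of $\hat{\theta}_{ols(0)}$ and $\hat{\theta}_{ols(1)}$ respectively, and the off-diagonal elements are their covariances. The least squares estimate of the correlation coefficient, which measures the strength of a linear relationship, is given by the Pearson product moment estimate

$$\hat{\rho}_{x,y} = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{\left(n\sum x_i^2 - \left[\sum x_i\right]^2\right)\left(n\sum y_i^2 - \left[\sum y_i\right]^2\right)}}. \tag{4}$$

This measure lies within $-1 \le \hat{\rho}_{x,y} \le 1$: the closer it is to 1, the stronger the positive relationship; the closer it is to -1, the stronger the negative relationship; and the closer it is to 0, the weaker the relationship. Estimates of exactly -1, 0 and 1 imply perfect negative, no, and perfect positive linear relationships, respectively. It is often necessary to test the significance of this parameter with the following hypotheses and test statistic.

Hypothesis:

$$H_0: \rho_{x,y} = 0 \qquad H_1: \rho_{x,y} \ne 0$$

Test statistic:

$$\frac{\hat{\rho}_{x,y}\sqrt{n-2}}{\sqrt{1-\hat{\rho}_{x,y}^{2}}} \sim t_{\alpha,(n-2)}.$$

The test statistic above is the classical one. In this article we propose a jackknife based statistic

$$\frac{\hat{\rho}_{x,y(J)}}{\sqrt{\operatorname{var}\left(\hat{\rho}_{x,y(J)}\right)}} \sim t_{\alpha,(n-2)}$$

for testing the same hypothesis.

The jackknife estimates of (2) and (4) are obtained by leaving out the ith observation of the pair $(y_i, x_i)$, $i = 1, 2, 3, \ldots, n$, and evaluating the least squares estimates $\hat{\theta}_{Ji}$ and $\hat{\rho}_{x,y(Ji)}$ based on the remaining $n - 1$ observations [10]. The jackknife estimates $\hat{\theta}_{J}$ and $\hat{\rho}_{x,y(J)}$, together with their bias and variance, are then computed from the pseudo values $\hat{\theta}_{Ji}$ and $\hat{\rho}_{x,y(Ji)}$ as

$$\hat{\theta}_{J} = \frac{1}{n}\sum_{i=1}^{n}\hat{\theta}_{Ji} \tag{5}$$

with bias

$$\text{bias} = \hat{\theta}_{ols} - \frac{1}{n}\sum_{i=1}^{n}\hat{\theta}_{Ji} \tag{6}$$

or, more succinctly,

$$\text{bias} = \hat{\theta}_{ols} - \hat{\theta}_{J} \tag{7}$$

and variance

$$\operatorname{var}\left(\hat{\theta}_{J}\right) = \frac{\sum_{i=1}^{n}\left(\hat{\theta}_{Ji} - \hat{\theta}_{J}\right)^{2}}{(n-1)}. \tag{8}$$

Similarly,

$$\hat{\rho}_{x,y(J)} = \frac{1}{n}\sum_{i=1}^{n}\hat{\rho}_{x,y(Ji)} \tag{9}$$

with bias

$$\text{bias} = \hat{\rho}_{x,y} - \frac{1}{n}\sum_{i=1}^{n}\hat{\rho}_{x,y(Ji)} \tag{10}$$

or

$$\text{bias} = \hat{\rho}_{x,y} - \hat{\rho}_{x,y(J)} \tag{11}$$

and variance

$$\operatorname{var}\left(\hat{\rho}_{x,y(J)}\right) = \frac{\sum_{i=1}^{n}\left(\hat{\rho}_{x,y(Ji)} - \hat{\rho}_{x,y(J)}\right)^{2}}{(n-1)}. \tag{12}$$

2.1 Algorithm for Jackknifing Simple Linear Regression Model

Steps: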

(i) Take a pair of independent samples of size $n$ of the explanatory and response variables $(x_i, y_i)$, $i = 1, 2, 3, \ldots, n$.

(ii) Drop the first datum in both variables and estimate the ordinary least squares (ols) regression coefficients $\hat{\theta}_{(0)J1}$ and $\hat{\theta}_{(1)J1}$ and the correlation coefficient $\hat{\rho}_{x,y(J1)}$ using $n - 1$ observations.

(iii) Replace the datum dropped in (ii), drop the second datum, and compute the ols regression coefficients $\hat{\theta}_{(0)J2}$ and $\hat{\theta}_{(1)J2}$ and the correlation coefficient $\hat{\rho}_{x,y(J2)}$ using $n - 1$ observations.

(iv) Repeat steps (ii) and (iii): replace the previously dropped $(i-1)$th observation, drop the $i$th observation, and compute the ols regression coefficients $\hat{\theta}_{(0)Ji}$ and $\hat{\theta}_{(1)Ji}$ and the correlation coefficient $\hat{\rho}_{x,y(Ji)}$, $i = 3, 4, 5, \ldots, n$, using $n - 1$ observations at each iteration, until every observation in the pair $(x_i, y_i)$, $i = 1, 2, 3, \ldots, n$, has been dropped and replaced in turn.

(v) Steps (ii) to (iv) result in $n$-dimensional vectors of pseudo values corresponding to $\hat{\theta}_{(0)Ji}$, $\hat{\theta}_{(1)Ji}$ and $\hat{\rho}_{x,y(Ji)}$.

(vi) Compute the jackknife regression parameters, the correlation coefficient, and their corresponding bias and standard errors using (5), (7), (8), (9), (11), and (12).

3. Data and Simulation

We have used the total demand and supply of FOREX (USD million) data from January 2008 to May 2014 (77 data points), available on the Central Bank of Nigeria official website [11]. All computations were done using R for Windows.

3.1 Simulation Results

Using the data described in Section 3, we fit a simple linear regression model; the result is shown in Table 1.
Table 1: Parameter Estimates for the Fitted Simple Linear Regression Model

| Parameters | $\hat{\theta}_{(0)}$ | $\hat{\theta}_{(1)}$ | $\hat{\rho}_{x,y}$ |
|---|---|---|---|
| Estimate | 925.33554 | 0.49293 | 0.7232257 |
| Standard Error | 182.61583 | 0.05435 | - |

3.1.1 Jackknifing the Simple Linear Regression Model

Table 2 shows the ols estimates of the pseudo values, the jackknife estimates and their corresponding standard errors obtained from the leave-one-out procedure.
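The leave-one-out procedure of Section 2.1 can be sketched in a few lines. This is an illustrative Python version under stated assumptions, not the authors' R program: the function names `fit_line`, `pearson_r` and `jackknife` and the small demo data set are hypothetical, and the inputs are assumed to be plain Python lists. The variance divisor $(n - 1)$ follows equations (8) and (12) of this paper; many references instead scale the sum of squares by $(n - 1)/n$, so the choice here is tied to this text.

```python
import math


def fit_line(x, y):
    """OLS (intercept, slope) for one explanatory variable."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def pearson_r(x, y):
    """Pearson product moment correlation, as in equation (4)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    num = n * sum(xi * yi for xi, yi in zip(x, y)) - sx * sy
    den = math.sqrt((n * sum(xi * xi for xi in x) - sx ** 2)
                    * (n * sum(yi * yi for yi in y) - sy ** 2))
    return num / den


def jackknife(x, y, stat):
    """Leave-one-out values of stat plus their mean, bias and standard error.

    x and y must be lists; stat is any function stat(x, y) -> float.
    """
    n = len(x)
    full = stat(x, y)                                  # full-sample estimate
    # Steps (ii)-(iv): drop observation i, recompute, put it back.
    loo = [stat(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) for i in range(n)]
    mean = sum(loo) / n                                # equations (5), (9)
    bias = full - mean                                 # equations (7), (11)
    var = sum((v - mean) ** 2 for v in loo) / (n - 1)  # equations (8), (12)
    return loo, mean, bias, math.sqrt(var)


# Hypothetical demo data (not the FOREX series used in the paper).
x_demo = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_demo = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
loo_r, r_j, r_bias, r_se = jackknife(x_demo, y_demo, pearson_r)
loo_b1, b1_j, b1_bias, b1_se = jackknife(
    x_demo, y_demo, lambda a, b: fit_line(a, b)[1])
```

Passing the statistic as a function keeps one loop for the intercept, the slope and the correlation coefficient; the list `loo_r` plays the role of the $\hat{\rho}_{x,y(Ji)}$ column in Table 2.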

Table 2: ols Estimates

| S/N | $\hat{\theta}_{(0)Ji}$ | $\hat{\theta}_{(1)Ji}$ | $\hat{\rho}_{x,y(Ji)}$ |
|---|---|---|---|
| 1 | 952.7329 | 0.4867450 | 0.7188903 |
| 2 | 966.5801 | 0.4822654 | 0.7111098 |
| 3 | 972.6817 | 0.4806240 | 0.7101433 |
| 4 | 959.0865 | 0.4842215 | 0.7119457 |
| 5 | 945.4296 | 0.4886822 | 0.7203511 |
| ... | ... | ... | ... |
| 73 | 868.0236 | 0.5149849 | 0.7233487 |
| 74 | 935.5823 | 0.4876679 | 0.7174566 |
| 75 | 936.0865 | 0.4877716 | 0.7162645 |
| 76 | 926.6243 | 0.4899442 | 0.7226179 |
| 77 | 920.3058 | 0.4905668 | 0.7264761 |

| | $\hat{\theta}_{(0)J}$ | $\hat{\theta}_{(1)J}$ | $\hat{\rho}_{x,y(J)}$ |
|---|---|---|---|
| Jackknife estimate | 925.023 | 0.4930578 | 0.7232903 |
| Standard error (SE) | 35.01666 | 0.01321761 | 0.0142381 |

Table 3: Comparison between ols and Jackknife ols Estimates

| Estimates | ols | Jackknife | Bias |
|---|---|---|---|
| $\hat{\theta}_{(0)}$ | 925.33554 | 925.023 | 0.31254 |
| SE $\hat{\theta}_{(0)}$ | 182.61583 | 35.01666 | - |
| $\hat{\theta}_{(1)}$ | 0.49293 | 0.4930578 | -0.0001278 |
| SE $\hat{\theta}_{(1)}$ | 0.05435 | 0.01321761 | - |
| $\hat{\rho}_{x,y}$ | 0.7232257 | 0.7232903 | -0.0000646 |
| SE $\hat{\rho}_{x,y}$ | - | 0.0142381 | - |

3.1.2. Testing the significance of the correlation coefficient

We proceed to test the significance of the correlation coefficient at the 5% level of significance as follows:

$$H_0: \rho_{x,y} = 0 \qquad H_1: \rho_{x,y} \ne 0$$
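The two statistics defined in Section 2 can be evaluated directly from the figures reported in Tables 2 and 3. The snippet below is a small arithmetic check in Python, not the authors' R code; the function names are hypothetical, the critical value is simply the one the paper quotes for 77 - 2 degrees of freedom, and both statistics are evaluated at 0.7232903, the value the paper's own arithmetic uses.

```python
import math

n = 77                 # number of observations (Section 3)
rho_jack = 0.7232903   # jackknife estimate of the correlation (Table 3)
se_jack = 0.0142381    # its jackknife standard error (Table 3)
t_crit = 1.992102      # critical value quoted in the paper, 77 - 2 d.f.


def classic_t(r, n_obs):
    """Classic statistic: r * sqrt(n - 2) / sqrt(1 - r^2)."""
    return r * math.sqrt(n_obs - 2) / math.sqrt(1.0 - r * r)


def jackknife_t(r_jack, se):
    """Proposed statistic: jackknife estimate over its standard error."""
    return r_jack / se


t_classic = classic_t(rho_jack, n)
t_jackknife = jackknife_t(rho_jack, se_jack)

# Two-sided decision rule: reject H0 when |t| exceeds the critical value.
reject_classic = abs(t_classic) > t_crit
reject_jackknife = abs(t_jackknife) > t_crit
```

With these inputs the two functions reproduce the reported values 9.07093 and 50.79964 to the printed precision, and both exceed the quoted critical value.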

$$t_{\text{classic}} = \frac{\hat{\rho}_{x,y}\sqrt{n-2}}{\sqrt{1-\hat{\rho}_{x,y}^{2}}} = \frac{(0.7232903)\sqrt{77-2}}{\sqrt{1-(0.7232903)^{2}}} = 9.07093$$

$$t_{\text{jackknife}} = \frac{\hat{\rho}_{x,y(J)}}{SE\left(\hat{\rho}_{x,y(J)}\right)} = \frac{0.7232903}{0.0142381} = 50.79964$$

with critical value $t_{\alpha,(n-2)} = t_{0.05,(77-2)} = 1.992102$ (the two-sided 5% value).

Decision: Since both test statistics are larger than the critical value, we conclude that there is enough evidence against the null hypothesis; hence, the correlation coefficient is significantly different from 0 at the 5% level of significance.

3.1.3. Discussions

The jackknife (leave-one-out) ols estimator provides better estimates of the regression parameters than the ols method. Table 3 shows that the jackknife estimates of the regression coefficients $\theta_{(0)}$ and $\theta_{(1)}$ and of the correlation coefficient $\rho_{x,y}$ are approximately equal to the ols estimates, with very small bias. The jackknife estimates also have smaller standard errors than their ols counterparts (the efficiency property), which is a feature of a good estimator. The classic test statistic for testing the significance of the correlation coefficient is smaller than the value obtained from the proposed jackknife test statistic; this is a consequence of the large variance of the ols estimates.

4. Conclusion

Jackknife results are misleading when the sample size is not large enough ($n < 50$) [12]. The 77 observations used in this study show that the jackknife estimators are more efficient than their ols counterparts in estimating the coefficients of a linear regression model and the correlation coefficient. The jackknife also provides the asymptotic distribution of these parameters, e.g., Table 2. The classic test statistic for testing the significance of the correlation coefficient is under-estimated, an effect of the large standard error of the ols estimators, and could consequently lead to erroneously accepting the null hypothesis (Type II error).
Without loss of generality, the jackknife based test statistic is better than its classic counterpart.

References

[1] M. R. Chernick. Bootstrap Methods: A Guide for Practitioners and Researchers. 2nd ed. John Wiley & Sons Inc., New Jersey, 2008.
[2] M. H. Quenouille. "Notes on Bias in Estimation", Biometrika, 61, pp. 1-17, 1956.
[3] J. W. Tukey. "Bias and Confidence in not Quite Large Samples (Abstract)", Annals of Mathematical Statistics, 29, pp. 614, 1958.
[4] H. Friedl and E. Stampfer. "Jackknife Resampling", Encyclopaedia of Econometrics, 2, pp. 1089-1098, 2002.
[5] S. Sahinler and D. Topuz. "Bootstrap and Jackknife Resampling Algorithms for Estimation of Regression Parameters", Journal of Applied Quantitative Methods, Vol. 2, No. 2, pp. 188-199, 2007.
[6] H. Abdi and J. L. Williams. "Jackknife", in Neil Salkind (Ed.), Encyclopaedia of Research Design. Thousand Oaks, CA: Sage, 2010.
[7] B. Efron. "Bootstrap Method; another Look at Jackknife", Annals of Statistics, Vol. 7, pp. 1-26, 1979.
[8] D. A. Freedman. "Bootstrapping Regression Models", Annals of Statistics, Vol. 1, No. 6, pp. 1218-1228, 1981.
[9] C. F. J. Wu. "Jackknife, Bootstrap and other Resampling Methods in Regression Analysis", Annals of Statistics, Vol. 14, No. 4, pp. 1261-1295, 1986.
[10] J. Shao and D. Tu. The Jackknife and Bootstrap, Springer-Verlag, New York, 1995.
[11] http://www.cbn.gov.ng, date accessed 1/5/2015.
[12] Y. A. Zakariya and B. R. Khairy. "Re-sampling in Linear Regression Model Using Jackknife and Bootstrap", Iraqi Journal of Statistical Science, Vol. 18, pp. 59-73, 2010.