Meta-Analytic Interval Estimation for Bivariate Correlations

Psychological Methods, 2008, Vol. 13, No. 3, 173–181
Copyright 2008 by the American Psychological Association. 1082-989X/08/$12.00 DOI: 10.1037/a0012868

Douglas G. Bonett
Iowa State University

The currently available meta-analytic methods for correlations have restrictive assumptions. The fixed-effects methods assume equal population correlations and exhibit poor performance under correlation heterogeneity. The random-effects methods do not assume correlation homogeneity but are based on an equally unrealistic assumption that the selected studies are a random sample from a well-defined superpopulation of study populations. The random-effects methods can accommodate correlation heterogeneity, but these methods do not perform properly in typical applications where the studies are nonrandomly selected. A new fixed-effects meta-analytic confidence interval for bivariate correlations is proposed that is easy to compute and performs well under correlation heterogeneity and nonrandomly selected studies.

Keywords: confidence interval, fixed effects, random effects, research synthesis

The purpose of this article is to describe an interval estimation procedure for a bivariate correlation using information obtained from $m$ different studies with no participant belonging to more than one study. In each of the studies, a random sample is obtained from the study population of interest. In each study, a correlation is computed from a random sample of the study population. The sample correlation estimates the unknown value of the correlation ($\rho_i$) in the specified study population of study $i$. The correlation estimate from study $i$ may be used to make a statement regarding the possible values of $\rho_i$ with some specified level of confidence. Statements of this type are called confidence intervals. For a given level of confidence, the width of the confidence interval depends on the sample size, and narrow confidence intervals often require large sample sizes.
If measurements are costly or participants are difficult to obtain, a single researcher may not have the resources that are needed to obtain a sufficiently large sample. One solution to this problem was recognized over 100 years ago by Karl Pearson, who combined correlation estimates from five different studies to obtain a more precise estimate of the correlation between inoculation for typhoid fever and mortality (Pearson, 1904). The practice of combining estimates from multiple studies, also called meta-analysis (Glass, 1976), is now a standard statistical tool in psychology, medicine, and the social sciences.

Correspondence concerning this article should be addressed to Douglas G. Bonett, Department of Statistics, Iowa State University, Ames, IA 50011. E-mail: dgbonett@iastate.edu

Several different meta-analytic methods for combining sample correlations from different studies have been proposed. These methods may be classified into two general categories: random-effects (RE) methods and fixed-effects (FE) methods. The RE methods assume that study populations have been randomly sampled from a specific and clearly defined superpopulation that consists of a large number of study populations. A Gaussian distribution is assumed for the set of all $\rho_i$ values in the superpopulation. The researcher's goal is to estimate the mean and standard deviation of the Gaussian distribution. Two RE methods for correlations are described by Hedges and Vevea (1998) and Hunter and Schmidt (2004). These methods are referred to as HV-R and HS-R, respectively. Computational formulas for the HV-R and HS-R methods are given in Appendix A. The HV-R and HS-R methods have been recommended because they do not make the unrealistic assumption that all $\rho_i$ values are identical. However, unless the study populations are randomly sampled from a specific superpopulation, the researcher cannot make a statistical inference from the study populations to the superpopulation.
A random sample of $m$ studies has the property that every possible subset of $m$ study populations in the superpopulation has exactly the same probability of being included in the meta-analysis. Additionally, to obtain a random sample of studies, the selection of any particular study into the meta-analysis sample must not differentially alter the probabilities of any other study from the superpopulation being selected into the meta-analysis sample. In a typical meta-analysis, the studies selected for analysis are likely to have been published sequentially over time, with each study intentionally designed to be similar or dissimilar to one or more previous studies, and this would be inconsistent with a random selection process. The random selection assumption of the RE methods is a critical assumption that, as noted by Hedges and Vevea (1998), will virtually never be justified in practice.

The FE methods do not make the unrealistic assumption that the study populations have been randomly selected from a superpopulation. With FE methods, the researcher's goal is to estimate the average correlation $(\rho_1 + \rho_2 + \cdots + \rho_m)/m$. The most frequently used FE method is described by Hedges and Olkin (1985, p. 231). A second method is described by Shadish and Haddock (1994, p. 268), who explained how the general FE approach might be applied to correlations. These methods are referred to as HO-F and SH-F, respectively. Computational formulas for the HO-F and SH-F methods are given in Appendix A. The HO-F and SH-F methods assume $\rho_1 = \rho_2 = \cdots = \rho_m$. Hunter and Schmidt (2000) argued convincingly that this assumption will almost never be satisfied in practice.

The HV-R, HS-R, HO-F, and SH-F methods may be used to test a hypothesis or construct a confidence interval for the unknown average correlation. Confidence intervals are almost always preferred to null hypothesis tests in scientific studies (Rozeboom, 1960), and a new confidence interval for an average correlation is proposed here. The proposed confidence interval is based on an FE approach. The proposed method does not assume that the studies have been randomly selected from a superpopulation and does not assume $\rho_1 = \rho_2 = \cdots = \rho_m$.

Proposed Confidence Interval

Let $\hat{\rho}_i$ denote a sample Pearson correlation from study $i$ based on a random sample of size $n_i$. Each sample correlation is an estimate of an unknown population correlation $\rho_i$, and the average value of these population correlations, denoted as $\bar{\rho}$, may provide important practical or scientific information. The following point estimate of $\bar{\rho} = m^{-1} \sum_{i=1}^{m} \rho_i$ is proposed:

$$\hat{\bar{\rho}} = m^{-1} \sum_{i=1}^{m} \hat{\rho}_i, \qquad (1)$$

which is a type of analog estimate (Goldberger, 1991, p. 117). Note that Equation 1 is an unweighted average of the sample correlations.
The HV-R, HS-R, HO-F, and SH-F methods all use a weighted average of sample correlations (see Appendix A) to estimate the unweighted average of study population correlations. A weighted average will have a smaller mean square error (MSE) than an unweighted average will under certain conditions. In a typical meta-analysis where the sample sizes vary across studies and correlation homogeneity cannot be assumed, a weighted average of sample correlations will not necessarily have a smaller MSE than the MSE of $\hat{\bar{\rho}}$, as explained in Appendix B.

Fisher (1921) and Hotelling (1953) showed that the small-sample variance of $\tanh^{-1}(\hat{\rho}_i) = \ln[(1 + \hat{\rho}_i)/(1 - \hat{\rho}_i)]/2$ is closely approximated by $1/(n_i - 3)$. Applying the delta method (see, e.g., Stuart & Ord, 1996, p. 350) gives $(1 - \hat{\rho}_i^2)^2/(n_i - 3)$ as an approximation to the variance of $\hat{\rho}_i = \tanh\{\ln[(1 + \hat{\rho}_i)/(1 - \hat{\rho}_i)]/2\}$. Preliminary computer simulation results suggest that using $(1 - \hat{\rho}_i^2)^2/(n_i - 3)$ to estimate $\mathrm{var}(\hat{\rho}_i)$ in the proposed confidence interval for $\bar{\rho}$ presented shortly yields slightly better results than the more commonly used variance estimate $(1 - \hat{\rho}_i^2)^2/(n_i - 1)$. Assuming independence among the sample correlations, an estimate of the approximate variance of $\hat{\bar{\rho}}$ is

$$\mathrm{var}(\hat{\bar{\rho}}) = m^{-2} \sum_{i=1}^{m} \mathrm{var}(\hat{\rho}_i), \qquad (2)$$

where $\mathrm{var}(\hat{\rho}_i) = (1 - \hat{\rho}_i^2)^2/(n_i - 3)$. The sampling distribution of $\hat{\bar{\rho}}$ is nonnormal partly because it is bounded on the interval $-1$ to $1$. The sampling distribution of $\tanh^{-1}(\hat{\bar{\rho}}) = \ln[(1 + \hat{\bar{\rho}})/(1 - \hat{\bar{\rho}})]/2$ is unbounded and should more closely approximate a normal distribution than the sampling distribution of $\hat{\bar{\rho}}$. Applying the delta method (see, e.g., Stuart & Ord, 1996, p. 350), an estimate of the approximate variance of $\tanh^{-1}(\hat{\bar{\rho}})$ is

$$\mathrm{var}[\tanh^{-1}(\hat{\bar{\rho}})] = \mathrm{var}(\hat{\bar{\rho}})/(1 - \hat{\bar{\rho}}^2)^2 = \Big[ m^{-2} \sum_{i=1}^{m} (1 - \hat{\rho}_i^2)^2/(n_i - 3) \Big] \Big/ (1 - \hat{\bar{\rho}}^2)^2. \qquad (3)$$

Assuming approximate normality of $\tanh^{-1}(\hat{\bar{\rho}})$ and using the approximate variance of $\tanh^{-1}(\hat{\bar{\rho}})$ given in Equation 3, the following approximate $100(1 - \alpha)\%$ confidence interval for $\bar{\rho}$ is proposed:

$$\tanh\{\tanh^{-1}(\hat{\bar{\rho}}) \pm z_{\alpha/2}\, \mathrm{var}[\tanh^{-1}(\hat{\bar{\rho}})]^{1/2}\}, \qquad (4)$$

where $\tanh(x) = [\exp(2x) - 1]/[\exp(2x) + 1]$ and $z_{\alpha/2}$ is a two-sided critical value of the standard normal distribution. When $m = 1$, Equation 4 reduces to the confidence interval for a single Pearson correlation coefficient (see, e.g., Glass & Hopkins, 1996, pp. 357–358). Equation 4 also may be used with partial Pearson correlations by replacing $n_i - 3$ with $n_i - 3 - q$, where $q$ is the number of control variables.

The bivariate normality assumption may be difficult to justify for certain quantitative variables, and the Spearman rank correlation may be used in these situations. The Spearman correlation is simply a Pearson correlation applied to rank-transformed scores for each of the two quantitative variables. Spearman correlations may be used in Equation 4, but $\mathrm{var}(\hat{\rho}_i)$ should then be replaced with $(1 + \hat{\rho}_i^2/2)(1 - \hat{\rho}_i^2)^2/(n_i - 3)$ on the basis of the results of Bonett and Wright (2000). If the Pearson or Spearman correlations have been computed from a contingency table, it is necessary to replace $\mathrm{var}(\hat{\rho}_i)$ with the special variance estimates derived by Brown and Benedetti (1977). As explained by Brown and Benedetti (1977), the commonly used variance estimates for Pearson and Spearman correlations assume continuous bivariate distributions, and these variance estimates are inappropriate when the correlations have been estimated from contingency tables.

Simulation Study

In a preliminary investigation for $m = 5$ and $m = 10$, it was found that Equation 4 had a coverage probability very close to $1 - \alpha$ for every condition examined. However, the other meta-analytic confidence intervals (HS-R, HV-R, HO-F, SH-F) had coverage probabilities that varied considerably depending on the pattern of sample sizes and the pattern of population correlations. To provide an accurate description of how the HS-R, HV-R, HO-F, and SH-F methods perform under realistic conditions, it is necessary to examine many different patterns of sample sizes and population correlations. Sample sizes from 20 to 400 and population correlations from 0 to 0.9 were examined. A computer simulation study was performed in which the coverage probabilities of the five meta-analytic confidence intervals were estimated under 1,500 different patterns of sample sizes and population correlations for $m = 5$ and $m = 10$. For each of the 1,500 conditions, the coverage probabilities and confidence interval widths for Equation 4, HS-R, HV-R, HO-F, and SH-F were estimated from 50,000 Monte Carlo trials. Each trial involved the computer generation of bivariate normal samples of size $n_i$ within each of the 5 or 10 study populations. The simulation program was written in GAUSS and executed on a Pentium 4 computer. The results of the simulation study are summarized in Tables 1 and 2.
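The Monte Carlo procedure described above can be sketched in a few lines of Python. This is a minimal illustration and not the author's GAUSS program: the function names (`pearson_r`, `eq4_interval`, `draw_r`, `estimate_coverage`), the seed, the trial count (1,000 rather than 50,000), and the example correlations and sample sizes are all assumptions chosen for brevity. Only the Equation 4 interval is simulated, because the formulas for the other four methods are given in Appendix A.

```python
import math
import random
from statistics import NormalDist


def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def eq4_interval(rs, ns, level=0.95):
    """Equation 4: interval for the unweighted average correlation."""
    m = len(rs)
    r_bar = sum(rs) / m                                   # Equation 1
    var_r_bar = sum((1 - r * r) ** 2 / (n - 3)
                    for r, n in zip(rs, ns)) / m ** 2     # Equation 2
    se_z = math.sqrt(var_r_bar) / (1 - r_bar ** 2)        # sqrt of Equation 3
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    center = math.atanh(r_bar)
    return math.tanh(center - z_crit * se_z), math.tanh(center + z_crit * se_z)


def draw_r(rho, n, rng):
    """Draw n bivariate normal pairs with correlation rho; return sample r."""
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        xs.append(z1)
        ys.append(rho * z1 + math.sqrt(1 - rho * rho) * z2)
    return pearson_r(xs, ys)


def estimate_coverage(rhos, ns, trials=1000, level=0.95, seed=1):
    """Fraction of trials whose interval contains the average of rhos."""
    rng = random.Random(seed)
    target = sum(rhos) / len(rhos)
    hits = 0
    for _ in range(trials):
        rs = [draw_r(rho, n, rng) for rho, n in zip(rhos, ns)]
        lower, upper = eq4_interval(rs, ns, level)
        hits += lower <= target <= upper
    return hits / trials


if __name__ == "__main__":
    # One hypothetical condition with m = 5 unequal correlations and sizes.
    print(estimate_coverage([0.1, 0.3, 0.5, 0.2, 0.6], [20, 40, 60, 80, 100]))
```

With only 1,000 trials the estimated coverage carries a Monte Carlo standard error of roughly 0.007, so it should be read as a rough check rather than a replication of the tabled values.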
Table 1 describes the performance of the five 95% meta-analytic confidence intervals broken down by six subdivisions of the 1,500 conditions, with 250 different patterns of sample sizes and population correlations within each subdivision. For instance, within the first subdivision, the sample sizes could range from 20 to 100 and the population correlations could range from 0 to 0.3. Within a given subdivision, the population correlations ranged from 0 to 0.3, 0.3 to 0.6, or 0.6 to 0.9. These correlation ranges were selected to reflect the degree of heterogeneity that has been observed in published meta-analyses (see Field, 2005). The sample sizes and population correlations within a single condition were randomly selected with equal probability from the specified ranges.

The computer simulations attempted to mimic a typical meta-analysis where the sample sizes are almost always unequal, the population correlations vary across the study populations, and the studies have not been randomly selected from a specific superpopulation with statistical inference applying only to the studies. Under conditions of unequal correlations, unequal sample sizes, and nonrandomly selected studies, none of the currently available methods can be expected to perform properly.

Table 1
Performance Comparison of Five 95% Meta-Analytic Confidence Intervals

    Sample   Population    Average coverage                   Minimum coverage                   Average width
m   sizes    correlations  Eq. 4  HS-R   HV-R   HO-F   SH-F   Eq. 4  HS-R   HV-R   HO-F   SH-F   Eq. 4  HS-R   HV-R   HO-F   SH-F
5   20–100   0–.3          .949   .896   .970   .944   .905   .943   .846   .942   .898   .829   .241   .229   .284   .226   .218
5   20–100   .3–.6         .950   .907   .971   .937   .822   .946   .831   .910   .817   .561   .202   .201   .245   .186   .176
5   20–100   .6–.9         .952   .950   .979   .867   .417   .946   .835   .882   .231   .003   .118   .152   .176   .100   .086
5   40–200   0–.3          .949   .921   .974   .935   .911   .946   .838   .939   .839   .773   .171   .185   .222   .159   .156
5   40–200   .3–.6         .949   .935   .977   .924   .831   .946   .825   .925   .753   .475   .140   .166   .196   .130   .125
5   40–200   .6–.9         .950   .978   .992   .796   .319   .946   .777   .889   .050   .001   .079   .144   .166   .068   .060
10  20–100   0–.3          .949   .913   .953   .923   .861   .944   .887   .937   .889   .799   .173   .176   .198   .161   .154
10  20–100   .3–.6         .949   .963   .980   .948   .723   .944   .845   .866   .656   .290   .142   .159   .169   .129   .121
10  20–100   .6–.9         .949   .964   .978   .210   .000   .946   .882   .872   .127   .000   .083   .120   .128   .069   .057
10  40–200   0–.3          .950   .980   .988   .919   .855   .947   .892   .936   .810   .708   .121   .141   .155   .112   .109
10  40–200   .3–.6         .949   .957   .980   .944   .899   .948   .825   .922   .720   .298   .100   .129   .141   .092   .088
10  40–200   .6–.9         .952   .987   .982   .739   .060   .948   .908   .942   .007   .000   .057   .109   .118   .048   .042

Note. Eq. 4 = new fixed-effects method; HS-R = Hunter and Schmidt (2004) random-effects method; HV-R = Hedges and Vevea (1998) random-effects method; HO-F = Hedges and Olkin (1985) fixed-effects method; SH-F = Shadish and Haddock (1994) fixed-effects method; m = number of study populations.

Table 2
Performance Comparison of Five 90% and 99% Meta-Analytic Confidence Intervals (n_i = 20–100)

           Population    Average coverage                   Minimum coverage                   Average width
m   1 − α  correlations  Eq. 4  HS-R   HV-R   HO-F   SH-F   Eq. 4  HS-R   HV-R   HO-F   SH-F   Eq. 4  HS-R   HV-R   HO-F   SH-F
5   0.90   0–0.3         .899   .841   .934   .890   .840   .894   .792   .897   .804   .728   .220   .191   .240   .193   .185
5   0.90   0.3–0.6       .900   .861   .938   .881   .747   .896   .755   .849   .697   .459   .170   .169   .206   .156   .142
5   0.90   0.6–0.9       .903   .930   .963   .800   .317   .898   .783   .839   .227   .005   .096   .129   .149   .082   .070
5   0.99   0–0.3         .989   .947   .994   .988   .969   .986   .914   .985   .960   .896   .318   .298   .370   .298   .238
5   0.99   0.3–0.6       .990   .955   .994   .986   .917   .986   .912   .982   .950   .763   .265   .265   .321   .245   .232
5   0.99   0.6–0.9       .990   .979   .996   .953   .529   .985   .877   .928   .281   .002   .154   .201   .238   .131   .112
10  0.90   0–0.3         .900   .919   .949   .900   .835   .896   .840   .896   .802   .684   .145   .149   .167   .135   .129
10  0.90   0.3–0.6       .899   .925   .964   .882   .723   .897   .760   .816   .647   .322   .122   .131   .144   .110   .104
10  0.90   0.6–0.9       .900   .986   .982   .874   .089   .899   .821   .792   .157   .000   .072   .102   .109   .058   .049
10  0.99   0–0.3         .990   .986   .995   .989   .977   .988   .962   .987   .966   .902   .229   .231   .260   .212   .204
10  0.99   0.3–0.6       .990   .986   .994   .975   .771   .988   .967   .984   .935   .660   .189   .204   .225   .172   .161
10  0.99   0.6–0.9       .990   .996   .998   .977   .480   .989   .960   .965   .216   .000   .111   .159   .172   .092   .076

Note. Eq. 4 = new fixed-effects method; HS-R = Hunter and Schmidt (2004) random-effects method; HV-R = Hedges and Vevea (1998) random-effects method; HO-F = Hedges and Olkin (1985) fixed-effects method; SH-F = Shadish and Haddock (1994) fixed-effects method; m = number of study populations.

Each number in Table 1 is a summary of 250 coverage probabilities for a 95% level of confidence. Table 2 extends the results to 90% and 99% confidence levels for sample sizes 20 to 100. Tables 1 and 2 present the average coverage probabilities across 250 conditions, the smallest coverage probability across 250 conditions (minimum coverage probability), and the average interval width across 250 conditions. The best confidence interval method will have an average coverage probability close to $1 - \alpha$ and a minimum coverage probability that is not too far below $1 - \alpha$. If two methods have similar average and minimum coverage probabilities, then the method with the smallest average interval width is preferred. From Tables 1 and 2, it is clear that Equation 4 is superior to all other methods.
The minimum coverage probability is perhaps the most important performance characteristic because it reveals how poorly a method may perform for certain patterns of sample sizes and population correlations. The pattern of population correlations will not be known to the researcher, and it is essential that a given method performs well for any pattern of population correlations and sample sizes. In Tables 1 and 2, it can be seen that the HO-F and SH-F methods have smaller average interval widths than the other methods. However, interval widths among competing methods are not comparable unless the competing methods have similar coverage probabilities. The HO-F and SH-F methods can have true coverage probabilities that are far below the specified $1 - \alpha$ value. The RE methods have average widths that are wider than that of Equation 4 and also may have coverage probabilities that are below the specified confidence level.

Assessing Correlation Heterogeneity

The results of the simulation study indicate that the proposed meta-analytic confidence interval (Equation 4) performs remarkably well under the wide range of conditions examined. Additional simulation studies suggest that Equation 4 also performs well under far more extreme degrees of correlation heterogeneity. For instance, using the methodology that produced the results in Table 1, with $m = 5$, sample sizes that vary from 20 to 100, and population correlations that vary from $-0.9$ to $0.9$, the average coverage probability for Equation 4 is .950 and the minimum coverage probability across the 250 conditions is .940. However, if the population correlations are highly heterogeneous, it could be argued that $\bar{\rho}$, the average of the population correlations, is not a meaningful parameter to estimate. In applications where the population correlations are believed to be highly disparate, the researcher may want to estimate the average of two or more subsets of population correlations where the correlations within subsets are believed to be more homogeneous.
The formation of subsets could be motivated by known demographic or design differences across the study populations that are believed to moderate the magnitude of the correlation under investigation.

The traditional test of correlation homogeneity (see, e.g., Hedges & Olkin, 1985, p. 153) has been misused in meta-analytic applications. Contrary to popular belief, failure to reject the null hypothesis of equal population correlations does not imply that the population correlations are equal, and rejection of the null hypothesis does not imply that there are meaningfully large differences among the population correlations. To assess the magnitude of the difference between any two population correlations or subset averages of population correlations, the researcher should use a confidence interval approach.

Let $\rho_A$ denote a single population correlation or the average of two or more population correlations and let $\rho_B$ denote a single population correlation or the average of two or more population correlations. The correlations that define $\rho_A$ must be distinct from those that define $\rho_B$. For instance, for $m = 10$, the researcher might define $\rho_A = (\rho_1 + \rho_2 + \rho_3)/3$ and $\rho_B = (\rho_4 + \rho_5 + \cdots + \rho_{10})/7$. Equation 4 may be used to obtain confidence intervals for $\rho_A$ and $\rho_B$ (recall that Equation 4 also works for a single correlation). The lower and upper $100(1 - \alpha)\%$ interval estimates from Equation 4 for $\rho_A$ are denoted as $L_A$ and $U_A$; the lower and upper $100(1 - \alpha)\%$ interval estimates from Equation 4 for $\rho_B$ are denoted as $L_B$ and $U_B$. Following Zou (2007), lower and upper $100(1 - \alpha)\%$ interval estimates for $\rho_A - \rho_B$ may be expressed as

$$L = \hat{\rho}_A - \hat{\rho}_B - [(\hat{\rho}_A - L_A)^2 + (U_B - \hat{\rho}_B)^2]^{1/2} \qquad (5)$$

and

$$U = \hat{\rho}_A - \hat{\rho}_B + [(U_A - \hat{\rho}_A)^2 + (\hat{\rho}_B - L_B)^2]^{1/2}, \qquad (6)$$

where $\hat{\rho}_A$ and $\hat{\rho}_B$ represent sample correlations from a single study or unweighted averages of sample correlations from two or more studies. Equations 5 and 6 also may be applied with partial or Spearman correlations.

Computational Example

The computation of Equation 4 is illustrated using a simple hypothetical example where the Pearson correlations between students' course grades and their course evaluations in four different studies are combined. The four sample correlations are 0.40, 0.65, 0.60, and 0.45 from samples of size 55, 190, 65, and 35, respectively. The point estimate of $\bar{\rho}$ is $(0.40 + 0.65 + 0.60 + 0.45)/4 = 0.525$ and $\tanh^{-1}(0.525) = [\ln(1 + 0.525) - \ln(1 - 0.525)]/2 = 0.583$. The estimated variance of $\hat{\bar{\rho}}$ is $[(1 - 0.40^2)^2/52 + (1 - 0.65^2)^2/187 + (1 - 0.60^2)^2/62 + (1 - 0.45^2)^2/32]/16 = 0.00261$, and the estimated variance of $\tanh^{-1}(\hat{\bar{\rho}})$ is $0.00261/(1 - 0.525^2)^2 = 0.00498$. Inserting these values into Equation 4 and setting $\alpha = .05$ gives $\tanh[0.583 \pm 1.96(0.00498)^{1/2}] = [\tanh(0.445), \tanh(0.722)] = (0.418, 0.618)$.
Thus, the researcher can be 95% confident that the average correlation between students' course grades and course evaluations in the four study populations is in the range of 0.418 to 0.618. The 95% confidence intervals for the HS-R, HV-R, HO-F, and SH-F methods in this example are (0.484, 0.676), (0.425, 0.666), (0.515, 0.655), and (0.539, 0.673), respectively. It is not appropriate to compare the interval widths of these five methods because they do not have the same true levels of confidence. A small simulation study with $m = 4$, $n_1 = 55$, $n_2 = 190$, $n_3 = 65$, and $n_4 = 35$ was conducted to get an idea of the approximate levels of confidence for these five methods. The population correlations for the simulation were set equal to the sample correlation values in this example. The estimated coverage probabilities for Equation 4, HS-R, HV-R, HO-F, and SH-F were found to be .950, .828, .924, .571, and .296, respectively. Although the HO-F and SH-F confidence intervals are narrower than the confidence interval of Equation 4 in this example, the interval (0.418, 0.618) from Equation 4 is a 95.0% interval, whereas the interval (0.515, 0.655) from the HO-F method is an approximate 57.1% confidence interval and the interval (0.539, 0.673) from the SH-F method is an approximate 29.6% confidence interval.

It was stated in the introduction that a primary goal of meta-analysis is to obtain a more accurate estimate of the parameter of interest by combining sample information from multiple studies. If the sample sizes in the $m$ studies are similar, Equation 4 will yield a confidence interval that is similar to a confidence interval that would be obtained from a single sample of size $n_1 + n_2 + \cdots + n_m$. For instance, if a single sample of size $55 + 190 + 65 + 35 = 345$ produced a sample correlation of 0.525 (the same as $\hat{\bar{\rho}}$ above), the 95% confidence interval for the population correlation would be (0.444, 0.598), which is only slightly narrower than the meta-analytic confidence interval reported above using Equation 4.
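The arithmetic of the four-study example can be checked with a short Python sketch of Equations 1 through 4. The function name `average_correlation_ci` and its argument names are hypothetical, and the critical value is taken from `statistics.NormalDist` rather than the rounded 1.96 used in the text, so intermediate values may differ in the last decimal place.

```python
import math
from statistics import NormalDist


def average_correlation_ci(rs, ns, level=0.95):
    """Return (point estimate, lower, upper) for the average correlation.

    rs: sample Pearson correlations, one per study
    ns: corresponding sample sizes (each must exceed 3)
    """
    m = len(rs)
    r_bar = sum(rs) / m                                        # Equation 1
    var_r_bar = sum((1 - r * r) ** 2 / (n - 3)
                    for r, n in zip(rs, ns)) / m ** 2          # Equation 2
    var_z = var_r_bar / (1 - r_bar ** 2) ** 2                  # Equation 3
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z_crit * math.sqrt(var_z)
    center = math.atanh(r_bar)
    # Equation 4: back-transform the limits from the tanh^{-1} scale.
    return r_bar, math.tanh(center - half_width), math.tanh(center + half_width)


r_bar, lower, upper = average_correlation_ci([0.40, 0.65, 0.60, 0.45],
                                             [55, 190, 65, 35])
print(round(r_bar, 3), round(lower, 3), round(upper, 3))  # 0.525 0.418 0.618
```

Passing a single correlation and sample size gives the $m = 1$ case, in which the variance on the $\tanh^{-1}$ scale reduces to $1/(n - 3)$.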
Suppose that the instructors in Studies 1 and 4 have little teaching experience whereas the instructors in Studies 2 and 3 have extensive teaching experience and the researcher believes that instructor experience may moderate the correlation between students' course grades and course evaluations. The point and 95% interval estimates for $\rho_A = (\rho_1 + \rho_4)/2$ are 0.425 and (0.231, 0.587), respectively; the point and 95% interval estimates for $\rho_B = (\rho_2 + \rho_3)/2$ are 0.625 and (0.527, 0.707), respectively. Applying Equations 5 and 6 gives a 95% confidence interval for $\rho_B - \rho_A$ equal to (−0.011, 0.389). The results are inconclusive at the 95% level because the 95% confidence interval for $\rho_B - \rho_A$ includes zero. The confidence interval is wide primarily because of the small sample sizes in Studies 1 and 4. Future studies that examine the relation between students' course grades and course evaluations could be combined with the correlations in this meta-analysis to obtain a more accurate estimate of $\rho_B - \rho_A$.

Concluding Remarks

The classic FE meta-analytic confidence intervals (the HO-F and SH-F methods) for an average Pearson correlation have unacceptable performance characteristics under correlation heterogeneity and unequal sample sizes. Hunter and Schmidt (2000) discussed the serious limitations of the classic FE methods and warned of potentially misleading results in the many published studies that have used these methods. The results in Tables 1 and 2 clearly support and reinforce the Hunter–Schmidt recommendation to discontinue use of the classic FE methods. The National Research Council (1992) also concluded that the classic FE methods should rarely, if ever, be used because of their unacceptable performance characteristics under effect-size heterogeneity.

If researchers comply with the recommendations of Hunter and Schmidt (2000) and the National Research Council (1992), researchers must then choose between HS-R, HV-R, and Equation 4 when conducting a meta-analysis of bivariate correlations. None of the three methods requires the unrealistic assumption that $\rho_1 = \rho_2 = \cdots = \rho_m$. The following four criteria may then be used to choose between HS-R, HV-R, and Equation 4: (a) coverage probability bias, (b) confidence interval width, (c) performance of heterogeneity assessment methods, and (d) ability to generalize beyond the study populations.

For the first criterion, Equation 4 is the clear winner because it has a true coverage probability very close to the specified $1 - \alpha$ confidence level. The RE methods can have true coverage probabilities that are far below the specified $1 - \alpha$ confidence level. This fact alone is enough to recommend Equation 4 over the RE methods. Even if the RE methods did have true coverage probabilities close to $1 - \alpha$, Equation 4 would be preferred to the RE methods with respect to the second criterion of confidence interval width. One consequence of the wider RE confidence intervals is that the total sample size would typically need to be considerably larger if an RE method is used instead of Equation 4.

Heterogeneity assessment, the third criterion, is fundamentally different in FE and RE methods. With FE methods, the researcher deliberately selects $m$ studies to combine and each $\rho_i$ is interesting in its own right. Equations 5 and 6 may be used to compare any pair of individual $\rho_i$ values or any pair of subset average values. Equations 5 and 6 may be used for $m$ values as small as 2 and $n_i$ values as small as 15. Equations 5 and 6 yield a true coverage probability that is very close to $1 - \alpha$ when the distributional assumptions within each study are satisfied.
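Equations 5 and 6 can be sketched as a small Python helper that takes the point estimate and the Equation 4 limits for each of the two quantities being compared. The function name `difference_ci` and the input values below are hypothetical and chosen only for illustration; they are not taken from the article's example.

```python
import math


def difference_ci(est_a, lower_a, upper_a, est_b, lower_b, upper_b):
    """Interval for rho_A - rho_B from separate intervals (Equations 5 and 6).

    est_*: point estimate (a sample correlation or an unweighted average)
    lower_*, upper_*: confidence limits for that quantity, same level for both
    """
    diff = est_a - est_b
    # Equation 5: lower limit pairs A's lower distance with B's upper distance.
    lower = diff - math.sqrt((est_a - lower_a) ** 2 + (upper_b - est_b) ** 2)
    # Equation 6: upper limit pairs A's upper distance with B's lower distance.
    upper = diff + math.sqrt((upper_a - est_a) ** 2 + (est_b - lower_b) ** 2)
    return lower, upper


# Hypothetical inputs: A = 0.30 with limits (0.10, 0.48),
# B = 0.55 with limits (0.40, 0.67).
lower, upper = difference_ci(0.30, 0.10, 0.48, 0.55, 0.40, 0.67)
print(round(lower, 3), round(upper, 3))  # -0.483 -0.016
```

Because the two limits use different pairings of distances, the interval is generally not symmetric about the point estimate of the difference, and swapping the roles of A and B reverses both the order and the sign of the limits.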
Heterogeneity assessment in RE methods is fraught with serious problems. Unlike FE methods, RE methods assume that the studies have been randomly selected from a superpopulation of studies and that the set of $\rho_i$ values in the superpopulation follows a Gaussian distribution with unknown mean $\mu$ and unknown standard deviation $\tau$. Heterogeneity assessment in an RE model involves the estimation of $\tau$, which describes the variability of the set of all $\rho_i$ values in the superpopulation. Note that $\hat{\tau}^2$ in Equation A3 is a point estimate of $\tau^2$. A primary purpose of an RE meta-analysis is to obtain a confidence interval for $\mu$ (Equations A3 and A4) and also a confidence interval for $\tau$ (although some meta-analysts inappropriately report only point estimates of $\mu$ and $\tau$). Even if the random sampling and Gaussian assumptions of the RE model could be satisfied, Field (2005) found that the HV-R and HS-R confidence intervals for $\mu$ have poor performance even for very large values of $m$.

Confidence intervals for $\tau$ also have serious problems. Viechtbauer (2007) found that the currently available confidence interval methods for $\tau$ exhibit poor performance and their use cannot be recommended. Viechtbauer (2007) proposed a new method that works well if the effect-size variances within each study are treated as known constants. This assumption is not a problem in the HV-R method, where the variance of $\tanh^{-1}(\hat{\rho}_i)$ is a function of known sample sizes. However, in the HV-R method, $\tau$ describes the standard deviation of $\tanh^{-1}(\rho_i)$ values and not the standard deviation of $\rho_i$ values. The currently available confidence interval methods for $\tau$ are not robust to violations of the Gaussian assumption, and methods to detect a non-Gaussian distribution are not effective. Viechtbauer (2007) also examined the performance of a nonparametric bootstrap confidence interval that does not assume a Gaussian distribution but found that this method did not perform well. Furthermore, in many applications, $m$ must be prohibitively large to obtain an acceptably narrow confidence interval for $\tau$.
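To make the point estimate of the heterogeneity variance concrete, the following Python sketch computes it on the Fisher-transformed scale from the quantities listed with Equation A3: the weights n_i - 3, the constant c, and the statistic Q. Several details are assumptions for illustration rather than details taken from the article: Q is given its usual form (the weighted sum of squared deviations of the transformed correlations from their weighted mean), negative estimates are truncated at zero, and the function name and input values are hypothetical.

```python
import math


def tau_squared_estimate(r, n):
    """Moment-type point estimate of tau^2 on the Fisher z scale.

    r: sample correlations, n: sample sizes (hypothetical inputs).
    """
    m = len(r)
    z = [math.atanh(ri) for ri in r]      # Fisher transform of each correlation
    w = [ni - 3 for ni in n]              # fixed-effects weights w_i = n_i - 3
    sum_w = sum(w)
    z_bar = sum(wi * zi for wi, zi in zip(w, z)) / sum_w
    # Assumed standard form of the homogeneity statistic Q
    q = sum(wi * (zi - z_bar) ** 2 for wi, zi in zip(w, z))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    # Truncating at zero is an assumption, not stated in the text
    return max(0.0, (q - (m - 1)) / c)


# Made-up data: four studies with visibly different correlations
print(tau_squared_estimate([0.20, 0.35, 0.50, 0.65], [40, 120, 60, 200]))
```

The sketch returns only a point estimate. It does not address the interval estimation problems discussed above.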
Generalization beyond the study populations is the fourth criterion by which to compare Equation 4 with the RE methods. The RE methods assume that the studies are a random sample from a superpopulation of studies. If this assumption could be satisfied, then the HS-R and HV-R confidence intervals would provide statements about the mean of all $\rho_i$ values in the superpopulation. In contrast, Equation 4 only provides a statement about the value of $(\rho_1 + \rho_2 + \cdots + \rho_m)/m$. The random sampling assumption is critical: It cannot be taken lightly or ignored as is commonly done in RE meta-analyses. No researcher who understands elementary statistics would take a nonrandom sample of men, women, and children from a nearby neighborhood, apply some inferential statistical method, and then claim that the results generalize to all men, women, and children in the world. From elementary statistics, we know that the researcher must specify a study population from which the random sample has been obtained and that the results of inferential statistical methods generalize from the random sample to the study population.

In a single study, the study population is often a small subset of some target population that the researcher wants to investigate. For instance, the study population may consist only of students enrolled in a large introductory psychology course. Statements about the study population parameter may be generalized to a larger target population using logical arguments rather than statistical inference. If the researcher can argue convincingly that the study population parameter should not be moderated by the demographic or design characteristics of the study population, then others may be willing to accept the researcher's claim that the study population parameter should be similar to a corresponding parameter in a particular target population. This same process of logical argumentation may be used with
Equation 4 to extend the results beyond the study populations to some clearly defined target population.

If Equation 4 is used and generalization to a larger target population is desired, it is the responsibility of the meta-analyst to provide a detailed description of each study population and convincing reasons why $(\rho_1 + \rho_2 + \cdots + \rho_m)/m$ should be similar to the average correlation in some larger target population. This information should be made public so that others may decide for themselves if such a generalization is justified. Likewise, if RE methods are used to describe a superpopulation, it is the responsibility of the meta-analyst to provide convincing evidence that the studies represent a legitimate random sample from a clearly defined superpopulation. If meta-analysts are held to this requirement by journal editors and reviewers, RE meta-analysis methods will be used only on rare occasions.

In applications where study population correlations are not expected to be equal and where the selected studies are not a random sample from a clearly defined superpopulation, Equation 4, which is simple to compute and performs remarkably well, should be used instead of the HO-F, SH-F, HV-R, and HS-R methods. Equations 5 and 6 are recommended for assessing heterogeneity among the study population correlations.

References

Aitken, A. C. (1935). On least squares and linear combinations of observations. Proceedings of the Royal Statistical Society, 55, 42–48.
Bonett, D. G., & Wright, T. A. (2000). Sample size requirements for estimating Pearson, Kendall and Spearman correlations. Psychometrika, 65, 23–28.
Brown, M. B., & Benedetti, J. K. (1977). Sampling behavior and tests for correlations in two-way contingency tables. Journal of the American Statistical Association, 72, 309–315.
Field, A. P. (2005). Is meta-analysis of correlation coefficients accurate when population correlations vary? Psychological Methods, 10, 444–467.
Fisher, R. A. (1921). On the probable error of the coefficient of correlation deduced from a small sample. Metron, 1, 1–32.
Glass, G. V. (1976). Primary, secondary, and meta-analysis of research. Educational Researcher, 5, 3–8.
Glass, G. V., & Hopkins, K. D. (1996). Statistical methods in psychology and education (3rd ed.). Boston: Allyn & Bacon.
Goldberger, A. S. (1991). A course in econometrics. Cambridge, MA: Harvard University Press.
Hedges, L. V., & Olkin, I. (1985). Statistical methods in meta-analysis. Orlando, FL: Academic Press.
Hedges, L. V., & Vevea, J. L. (1998). Fixed- and random-effects models in meta-analysis. Psychological Methods, 3, 486–504.
Hotelling, H. (1953). New light on the correlation coefficient and its transforms (with discussion). Journal of the Royal Statistical Society, Series B: Statistical Methodology, 15, 193–232.
Hunter, J. E., & Schmidt, F. L. (2000). Fixed effects vs. random effects meta-analysis models: Implications for cumulative knowledge in psychology. International Journal of Selection and Assessment, 8, 275–292.
Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis: Correcting errors and bias in research findings (2nd ed.). Newbury Park, CA: Sage.
Judge, G. G., Griffiths, W. E., Hill, R. C., Lütkepohl, H., & Lee, T.-C. (1985). The theory and practice of econometrics (2nd ed.). New York: Wiley.
National Research Council. (1992). Combining information: Statistical issues and opportunities for research. Washington, DC: National Academy Press.
Olkin, I., & Pratt, J. W. (1958). Unbiased estimation of certain correlation coefficients. Annals of Mathematical Statistics, 29, 201–211.
Pearson, K. (1904). Report on certain enteric fever inoculation statistics. British Medical Journal, 3, 1243–1246.
Rozeboom, W. W. (1960). The fallacy of the null hypothesis significance test. Psychological Bulletin, 57, 416–428.
Shadish, W. R., & Haddock, C. K. (1994). Combining estimates of effect size. In H. Cooper & L. V. Hedges (Eds.), Handbook of research synthesis (pp. 261–281). New York: Russell Sage Foundation.
Stuart, A., & Ord, J. K. (1996). Kendall's advanced theory of statistics: Vol. 1. Distribution theory. London: Arnold.
Viechtbauer, W. (2007). Confidence intervals for the amount of heterogeneity in meta-analysis. Statistics in Medicine, 26, 37–52.
Zou, G. Y. (2007). Toward using confidence intervals to compare correlations. Psychological Methods, 12, 399–413.

Appendix A

Confidence Interval Formulas for the Hedges and Olkin (1985) Fixed-Effects Method (HO-F), the Shadish and Haddock (1994) Fixed-Effects Method (SH-F), the Hedges and Vevea (1998) Random-Effects Method (HV-R), and the Hunter and Schmidt (2004) Random-Effects Method (HS-R)

Method HO-F

$$\tanh\left[\frac{\sum w_i \tanh^{-1}(\hat{\rho}_i)}{\sum w_i} \pm z_{\alpha/2}\left(\frac{1}{\sum w_i}\right)^{1/2}\right] \tag{A1}$$

where $w_i = n_i - 3$.

Method SH-F

$$\frac{\sum w_i \hat{\rho}_i}{\sum w_i} \pm z_{\alpha/2}\left(\frac{1}{\sum w_i}\right)^{1/2} \tag{A2}$$

where $w_i = (n_i - 1)/(1 - \hat{\rho}_i^2)^2$.

Method HV-R

$$\tanh\left[\frac{\sum \omega_i \tanh^{-1}(\hat{\rho}_i)}{\sum \omega_i} \pm z_{\alpha/2}\left(\frac{1}{\sum \omega_i}\right)^{1/2}\right] \tag{A3}$$

where $\omega_i = [1/(n_i - 3) + \hat{\tau}^2]^{-1}$, $\hat{\tau}^2 = (Q - m + 1)/c$, $c = \sum w_i - \sum w_i^2 / \sum w_i$, $w_i = n_i - 3$, and $Q$ is the chi-square test statistic for a test of equal population correlations.

Method HS-R

$$\hat{\rho} \pm z_{\alpha/2}\left[\frac{\sum n_i (\hat{\rho}_i - \hat{\rho})^2}{m \sum n_i}\right]^{1/2} \tag{A4}$$

where $\hat{\rho} = \sum n_i \hat{\rho}_i / \sum n_i$.

Appendix B

Comparison of Weighted and Unweighted Averages

It is common practice in meta-analysis to estimate an unweighted average of population effect sizes using a weighted average of sample effect sizes where the weights are inversely proportional to the squared standard errors. Let $\hat{\theta}_i$ denote an estimator of effect size $\theta_i$ with standard error $SE(\hat{\theta}_i)$ obtained from study $i$. The classic fixed-effects (FE) estimator of the average effect size $\theta = m^{-1}\sum \theta_i$ is

$$\hat{\theta} = \frac{\sum w_i \hat{\theta}_i}{\sum w_i}, \tag{B1}$$

where $w_i = 1/SE(\hat{\theta}_i)^2$. The classic FE variance estimate of $\hat{\theta}$ is

$$\mathrm{var}(\hat{\theta}) = \frac{1}{\sum w_i}. \tag{B2}$$

The popularity of Equation B1 is based on the fact that, under certain conditions, it is a minimum variance linear unbiased (MVLU) estimator of $\theta$, which implies that $\hat{\theta}$ has the smallest variance within a class of all possible unbiased estimators that are linear functions of $\hat{\theta}_i$. The MVLU property of $\hat{\theta}$ follows from Aitken's theorem (Aitken, 1935), which states that for

$$Y = X\beta + \varepsilon, \tag{B3}$$

where $\beta$ is a vector of unknown constants, $E(\varepsilon) = 0$, and $\mathrm{cov}(\varepsilon) = V$ with $X$ and $V$ containing known constants, then
$$\hat{\beta} = (X^{\prime} V^{-1} X)^{-1} X^{\prime} V^{-1} Y \tag{B4}$$

is an MVLU estimator of $\beta$ and $\mathrm{cov}(\hat{\beta}) = (X^{\prime} V^{-1} X)^{-1}$. Setting $X$ to an $m \times 1$ vector of ones, $V^{-1}$ to a diagonal matrix with $w_i$ along the principal diagonal, and $Y = [\hat{\theta}_1\ \hat{\theta}_2\ \cdots\ \hat{\theta}_m]^{\prime}$, it is easy to show that $\beta = \theta$, $\hat{\beta} = \hat{\theta}$, and $\mathrm{cov}(\hat{\beta}) = \mathrm{var}(\hat{\theta})$. Thus $\hat{\theta}$ (Equation B1) is an MVLU estimator of $\theta$ under the assumptions of Equation B3.

Two key assumptions in Equation B3 require careful attention. The assumption that $V$ is a matrix of known constants will not be satisfied when $SE(\hat{\theta}_i)$ contains estimated parameters. For instance, the Shadish and Haddock (1994) FE method (SH-F) sets $w_i = (n_i - 1)/(1 - \hat{\rho}_i^2)^2$, where $\hat{\rho}_i$ is a sample correlation, and thus the weights are random variables rather than known constants. When $V$ contains parameter estimates, Equation B4 is called an estimated generalized least squares estimator (see, e.g., Judge, Griffiths, Hill, Lütkepohl, & Lee, 1985, pp. 174–177) with sampling properties that, in general, may be stated only asymptotically. Thus, when $SE(\hat{\theta}_i)$ contains estimated parameters, Equation B2 is valid only asymptotically.

Equation B4 is a weighted least squares estimator of $\beta$. An unweighted least squares estimator of $\beta$ is $\tilde{\beta} = (X^{\prime} X)^{-1} X^{\prime} Y$, and $\mathrm{cov}(\tilde{\beta}) = (X^{\prime} X)^{-1} X^{\prime} V X (X^{\prime} X)^{-1}$. Setting $X$ to an $m \times 1$ vector of ones and $Y = [\hat{\theta}_1\ \hat{\theta}_2\ \cdots\ \hat{\theta}_m]^{\prime}$, it is easy to show that $\beta = \theta$, $\tilde{\beta} = \tilde{\theta} = m^{-1}\sum \hat{\theta}_i$, and $\mathrm{var}(\tilde{\theta}) = m^{-2}\sum \mathrm{var}(\hat{\theta}_i)$. If $V$ is a matrix of known constants, then $\mathrm{var}(\hat{\theta}) \le \mathrm{var}(\tilde{\theta})$ because $\mathrm{cov}(\tilde{\beta}) - \mathrm{cov}(\hat{\beta})$ is a positive semidefinite matrix (Judge et al., 1985, p. 170). These results show that a weighted average using inverse variance weighting with known values in the standard error will have a smaller variance than an unweighted average. However, the choice between $\hat{\theta}$ and $\tilde{\theta}$ should be based on the estimator mean squared error (MSE) rather than the variance. The MSE of an estimator is the sum of its variance and squared bias. The bias of $\hat{\theta}$ and $\tilde{\theta}$ also must be taken into consideration.
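The trade-off between variance and bias can be illustrated numerically. The Python sketch below compares the MSE of an inverse-variance weighted average with that of an unweighted average when the target is the unweighted mean of the study parameters. It assumes that each study estimator is unbiased and independent with a known sampling variance; the function name and all numeric values are made up for illustration.

```python
def mse_of_averages(theta, variances):
    """Return (mse_weighted, mse_unweighted) for estimating mean(theta).

    theta: true study effect sizes; variances: known sampling variances.
    Assumes unbiased, independent study estimators (illustrative only).
    """
    m = len(theta)
    target = sum(theta) / m                     # unweighted average of parameters
    weights = [1.0 / v for v in variances]      # inverse-variance weights
    sum_w = sum(weights)
    # Bias of the weighted average relative to the unweighted target
    bias_w = sum(w * t for w, t in zip(weights, theta)) / sum_w - target
    var_w = 1.0 / sum_w                         # variance of the weighted average
    var_u = sum(variances) / m ** 2             # variance of the unweighted average
    # Under the stated assumptions the unweighted average has zero bias
    return var_w + bias_w ** 2, var_u


# Heterogeneous effects with unequal precision (made-up values)
mse_w, mse_u = mse_of_averages([0.2, 0.4, 0.6], [0.04, 0.01, 0.0025])
print(mse_w, mse_u)
```

With these made-up values the weighted average has the smaller variance but the larger MSE, because its bias term dominates. When all effect sizes are equal, the bias vanishes and the weighted average has the smaller MSE.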
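For a meta-analysis of sample Pearson correlations where $\hat{\theta}_i = \hat{\rho}_i$ and $\theta_i = \rho_i$, it is known that $E(\hat{\rho}_i - \rho_i)$ rapidly approaches zero as the sample size increases (Olkin & Pratt, 1958). The sample Pearson correlation is nearly unbiased and we write $E(\hat{\rho}_i) \approx \rho_i$. The bias of the unweighted average of correlations is

$$E\left(m^{-1}\sum \hat{\rho}_i\right) - m^{-1}\sum \rho_i = m^{-1}\sum \left[E(\hat{\rho}_i) - \rho_i\right] \approx 0.$$

Thus, the unweighted average $\tilde{\rho} = m^{-1}\sum \hat{\rho}_i$ (Equation 1) is a nearly unbiased estimator of $m^{-1}\sum \rho_i$. The bias of a weighted average of correlations, such as $\hat{\rho} = \sum w_i \hat{\rho}_i / \sum w_i$, with known weights is

$$E\left(\frac{\sum w_i \hat{\rho}_i}{\sum w_i}\right) - m^{-1}\sum \rho_i = \frac{\sum w_i E(\hat{\rho}_i)}{\sum w_i} - m^{-1}\sum \rho_i \approx \frac{\sum w_i \rho_i}{\sum w_i} - m^{-1}\sum \rho_i.$$

If all weights are equal (and equal to $w$), the bias of the weighted average is

$$\frac{\sum w_i \rho_i}{\sum w_i} - m^{-1}\sum \rho_i = \frac{w \sum \rho_i}{m w} - m^{-1}\sum \rho_i = 0.$$

Alternatively, if all population correlations are equal (and equal to $\rho$), the bias of the weighted average is

$$\frac{\sum w_i \rho}{\sum w_i} - \rho = \frac{\rho \sum w_i}{\sum w_i} - \rho = 0.$$

A weighted average of correlations is nearly unbiased if the weights are equal or if the population correlations are equal. Another potential source of bias is introduced when transformed correlations are averaged. Specifically, $\tanh[m^{-1}\sum \tanh^{-1}(\rho_i)] \ne m^{-1}\sum \rho_i$ unless $\rho_1 = \rho_2 = \cdots = \rho_m$.

The bias in the weighted average $\hat{\rho}$ may result in an MSE of $\hat{\rho}$ that is considerably larger than the MSE of the unweighted average $\tilde{\rho}$. The bias of a weighted average increases as the weights and the population correlations become more heterogeneous. In Tables 1 and 2, it can be seen that the average widths of the Hedges and Olkin (1985) FE method (HO-F) and SH-F are smaller than the average width of Equation 4 because $\mathrm{var}(\hat{\rho}) \le \mathrm{var}(\tilde{\rho})$. However, the average coverage probabilities of the HO-F and SH-F methods can be far below the specified level because of the bias of the weighted averages.

Received May 16, 2007
Revision received March 18, 2008
Accepted March 18, 2008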