ATTENUATION OF THE SQUARED CANONICAL CORRELATION COEFFICIENT UNDER VARYING ESTIMATES OF SCORE RELIABILITY. Celia M. Wilson
ATTENUATION OF THE SQUARED CANONICAL CORRELATION COEFFICIENT UNDER VARYING ESTIMATES OF SCORE RELIABILITY

Celia M. Wilson

Dissertation Prepared for the Degree of DOCTOR OF PHILOSOPHY

UNIVERSITY OF NORTH TEXAS, August 2010

APPROVED:
Robin K. Henson, Major Professor
Donald Easton-Brooks, Minor Professor
Lesley Leach, Committee Member
Qi Chen, Committee Member
Abbas Tashakkori, Chair, Department of Educational Psychology
Jerry R. Thomas, Dean of the College of Education
James D. Meernik, Acting Dean of the Robert B. Toulouse School of Graduate Studies
Wilson, Celia M. Attenuation of the Squared Canonical Correlation Coefficient Under Varying Estimates of Score Reliability. Doctor of Philosophy (Educational Research), August 2010, 387 pp., 60 tables, 3 figures, 44 references.

Research pertaining to the distortion of the squared canonical correlation coefficient has traditionally been limited to the effects of sampling error and associated correction formulas. The purpose of this study was to compare the degree of attenuation of the squared canonical correlation coefficient under varying conditions of score reliability. Monte Carlo simulation methodology was used to fulfill the purpose of this study. Initially, data populations with various manipulated conditions were generated (N = 100,000). Subsequently, 500 random samples were drawn with replacement from each population, and the data were subjected to canonical correlation analyses. The canonical correlation results were then analyzed using descriptive statistics and an ANOVA design to determine under which condition(s) the squared canonical correlation coefficient was most attenuated when compared to population R_c² values. This information was used to determine what effect, if any, the different conditions considered in this study had on R_c². The results from this Monte Carlo investigation clearly illustrated the importance of score reliability when interpreting study results. As evidenced by the outcomes presented, the more measurement error (lower reliability) present in the variables included in an analysis, the more attenuation experienced by the effect size(s) produced in the analysis, in this case R_c². These results also demonstrated the roles that between- and within-set correlation, variable set size, and sample size played in the attenuation of the squared canonical correlation coefficient.
Copyright 2010 by Celia M. Wilson
ACKNOWLEDGEMENTS

My sincere appreciation is extended to my major professor, Dr. Robin Henson, for his guidance during this long journey. A simple note of encouragement on a homework assignment so many years ago led me down the path of educational research, and I am thankful for it. The teaching ability of Dr. Henson is unsurpassed, and because of his skill I was able to grasp the difficult concepts that allowed me to continue my journey. I would also like to thank my good friend and committee member, Dr. Lesley Leach, whose unwavering support and encouragement (not to mention syntax) helped fuel my motivation to continue. I am grateful also to my committee members, Dr. Donald Easton-Brooks and Dr. Qi Chen, for their availability and willingness to help when needed. In addition, I am very grateful to my family. From the beginning my parents, Charles McCall and Sundra Girard, taught me that my possibilities are limitless, through Christ. Thank you for always believing in me, even when I did not, and for encouraging and supporting me all my life. Finally, I would like to acknowledge the support that my husband, Jeff, provided to me during the countless evenings and weekends I spent completing this process. His patience with this seemingly unending task (and tuition) was amazing, and I am forever grateful.
TABLE OF CONTENTS

ACKNOWLEDGEMENTS ... iii
LIST OF TABLES ... v
LIST OF FIGURES ... viii
INTRODUCTION ... 1
METHOD ... …
RESULTS ... 34
DISCUSSION ... 62
APPENDIX ... 67
REFERENCES ... …
LIST OF TABLES

1. Coefficient Values Calculated in Johnson's (1944) Study
2. Johnson's (1944) True versus Obtained Correlation
3. Parameter Levels in Fan (2003) Study
4. Cochran's (1970) Values of f for Six Examples
5. Cochran's (1970) Values of f for Three Examples with Positive ρ
6. Edwards' (1971) Illustration of the Effect of Measurement Error on Individual Variable Contributions to R²
7. Summary of Data Conditions Manipulated in the Study
8. Expected Population Inter-Item Correlations (Λ), Population Between/Within Correlations (Φ), and Population Item Residuals (Θ) Used to Generate Covariance (COV) Matrices
9. Variable Set Size Condition and Corresponding Variable Composites
10. Score Reliability for Composite Variables
11. Data Generation Conditions Used to Obtain Population R_c² Values
12. Composite Variable Reliability for the 6 Populations
13. Population R_c² Values (N = 100,000)
14–43. Descriptive Statistics for Condition … (one table per study condition)
44–53. Descriptive Statistics for Composite Reliability Level … (one table per reliability level)
54. Descriptive Statistics for R_c² Difference Values Across All Levels of Composite Variable Reliability
55. Main and Interaction Effects for Multi-Way ANOVA
56. Difference Values Reported by Fan (2003) at Population Correlation Levels of .4 and …
57. Difference Values from the Current Study
58. Mean (n = 500) R_c² Values Under Various Study Conditions for Between and Within Set Correlation …
59. Mean (n = 500) R_c² Values Under Various Study Conditions for Between and Within Set Correlation …
60. Mean (n = 500) R_c² Values Under Various Study Conditions for Between and Within Set Correlation .8
LIST OF FIGURES

1. Eight factor model for data generation. Eight variable composites (FX_i, FY_i) each composed of four items (x_i, y_i).
2. Illustration of influence of sample size by composite variable reliability and between and within set correlation.
3. Illustration of influence of variable set size by composite variable reliability and between and within set correlation.
INTRODUCTION

In recent history the movement to report effect size estimates in published research, rather than relying solely on null hypothesis significance testing, has strengthened. The controversy, while not new, centers on perceived inadequacies of null hypothesis significance testing and the resulting p values, including the inability of p values to inform judgments regarding "result importance" (Thompson, 1999, p. 168) and the fact that p values do not provide information regarding the probability of result replication in future samples (Thompson, 1996). These matters relate to a common misconception that p values represent the probability the null hypothesis is true in the population, when in reality null hypothesis significance testing tells us "the probability of obtaining the given data, assuming the null hypothesis is exactly true" (Kirk, 1996, p. 748). With these concerns in mind, effect size reporting is becoming increasingly common (Henson, 2006; Henson & Smith, 2000; Vacha-Haase & Thompson, 2004). While p values indicate statistical significance (or non-significance) of results, effect sizes can indicate the practical significance of research findings. Although there is a proliferation of work detailing the importance of reporting and interpreting effect sizes (Trusty, Thompson & Petrocelli, 2004; Vacha-Haase, Nilsson, Reetz, Lance & Thompson, 2000), some researchers may not be aware of the various factors that impact these estimates. Measurement error, one such factor, attenuates the statistical relationship between two variables (Crocker & Algina, 1986; Fan, 2003). As noted by Henson (2001), "Effect size magnitude is inherently attenuated by the reliability of the scores used to obtain the effect estimate" (p. 178). Vacha-Haase (1998) also observed that the accuracy and replicability of reported effects are dependent, to some
degree, on the reliability of the scores being analyzed. Measurement error not only attenuates effect sizes, but can also cause effects to fluctuate across studies (Johnson, 1944). As effect size reporting in published research becomes recognized as a necessary and responsible practice (American Psychological Association, 2001; Henson & Smith, 2000; Kirk, 1996; Kline, 2004; Wilkinson & Task Force on Statistical Inference, 1999), it is essential that researchers consider the attenuation of effect size estimates due to measurement error. Several researchers have discussed the impact of score reliability on bivariate correlations, specifically Pearson's r and the corresponding variance-accounted-for effect size, r² (Baugh, 2002; Fan, 2003; Johnson, 1944; Onwuegbuzie, Roberts & Daniel, 2005). However, the impact of score reliability generalizes to other variable relationships, including multivariate ones such as the canonical correlation. As canonical correlation analysis (CCA) is the most general case of the parametric general linear model (Baggaley, 1981; Fan, 1996; Fornell, 1978; Henson, 2000; Knapp, 1978; Thompson, 1991; Thompson, 1996), all classical parametric analyses are subsumed by CCA. Because less is known about the impact of reliability on this multivariate relationship, further study is needed to determine how score reliability affects the canonical correlation coefficient (R_c, the multivariate analogue to the bivariate correlation coefficient) and thereby the squared canonical correlation coefficient (R_c²). The purpose of the present Monte Carlo study was to investigate the impact of score reliability on the squared canonical correlation coefficient.
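The attenuation mechanism described above can be sketched numerically. The following is a minimal illustration of the classical attenuation relationship (observed r equals true r scaled by the square roots of the two score reliabilities); all values are hypothetical and the function name is mine, not from the literature reviewed here:

```python
def attenuated_r(r_true, rel_x, rel_y):
    # Observed correlation implied by a true-score correlation and the
    # score reliabilities of the two measures (classical test theory)
    return r_true * (rel_x * rel_y) ** 0.5

r_true = 0.60          # hypothetical true-score correlation
rel_x = rel_y = 0.80   # hypothetical score reliabilities

r_obs = attenuated_r(r_true, rel_x, rel_y)
print(round(r_obs, 3))                           # observed r is pulled toward zero
print(round(r_true ** 2, 3), round(r_obs ** 2, 3))  # the r-squared effect size shrinks even faster
```

Note that the variance-accounted-for effect size is hit twice: with reliabilities of .80 on both sides, r drops from .60 to .48, but r² drops from .36 to about .23.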
Studies Investigating Attenuation of r² Due to Measurement Error

When statistical tests yield results that are statistically non-significant based on data collected with instruments producing low score reliability, the given results may be questioned with regard to whether they are "a real result or a statistical artifact" (Onwuegbuzie et al., 2005, p. 230). Onwuegbuzie and Daniel (2002) discussed the neglect of score reliability and its effect on statistical power. Specifically, a lower level of score reliability adversely affects the ability of a statistical test to render statistically significant results when a relationship or difference actually exists in the population. Additionally, Onwuegbuzie et al. indicated that even the American Psychological Association, in its 5th edition Publication Manual (2001), does not mention the effect of score reliability on statistical power. Nevertheless, Fan (2003) recommended that, when available, score reliability and its attenuation of effect sizes should be considered in substantive research.

Johnson (1944) introduced a study of the influence of measurement errors on the correlation coefficient by asserting that random errors do not always lower a correlation coefficient. Johnson explained results from Thorndike (1907) in which errors were added to two sets of measures to investigate the effect measurement error had on the correlation coefficient. Thorndike reported that the errors of measurement caused a downward bias of the observed correlation. Thorndike also reported that the Spearman correction formula proved to be an accurate correction. Johnson (1944) used two subsets of the 1920 Stanford Achievement Test (reading and math, n = 55 eighth grade students). The bivariate correlation of the two subsets was 0.588; this value was considered the true correlation. Next, Johnson
created a group of 100 errors ranging from +9 to -9 and chose 55 at random to apply to the initial scores. This procedure was repeated 6 different times, resulting in 6 series of reading and math scores. Thirty-six inter-correlations between the two sets of scores were computed, as well as 15 alternate-forms reliability coefficients for reading test scores and 15 alternate-forms reliability coefficients for math scores. Next, Johnson calculated the means of the 15 reliability coefficients (reading and math independently, and reading and math combined) and of the 36 correlations between reading and math scores (see Table 1).

Table 1
Coefficient Values Calculated in Johnson's (1944) Study

Arithmetic mean of the 36 correlations between reading and math scores: .529
Geometric mean of the 36 correlations between reading and math scores: .527
Geometric mean of the 30 reliability coefficients (reading + math): .895
Correction for attenuation (arithmetic mean and geometric mean): .591*
Correction for attenuation (geometric mean only): .589*
* Compare to true correlation value of .589.

Using the arithmetic mean of the 36 trait coefficients (.529) and the geometric mean of the reliability coefficients (.895) yielded a corrected coefficient value of .591. Using the geometric mean of the 36 trait coefficients (.527) and the geometric mean of the reliability coefficients (.895) yielded a corrected coefficient value of .589. The corrected coefficient value in both cases was almost exactly the true coefficient value. Johnson concluded that when the fluctuations in obtained coefficients are eliminated by averaging a fairly large number of them, Spearman's formula to correct for attenuation
gives "just about perfect correspondence with the true coefficient" (Johnson, 1944, p. 526). Furthermore, Johnson (1944) explained that when a true correlation value approaches zero, the observed correlation coefficient will be, with increased frequency, greater than the true correlation value; the converse is also true (see Table 2).

Table 2
Johnson's (1944) True versus Obtained Correlation

True coefficient value approaches 1: low frequency of the observed correlation value being higher than the true correlation value.
True coefficient value approaches 0: increasing frequency of the observed correlation value being higher than the true correlation value.

Johnson claimed measurement error commonly causes two primary problems: lower obtained correlation values and fluctuation in obtained correlation values. Additionally, Johnson argued that because random sampling takes into account chance factors, it is an absolute must when the intent is to calculate a correlation coefficient from observed data.

In another study, Fan (2003) compared the classical test theory approach for correcting for attenuation with the latent variable modeling approach. In his Monte Carlo study several parameters were considered: number of items for each composite, magnitude of inter-item correlation within a composite, magnitude of inter-factor correlation, and sample size (see Table 3).
Table 3
Parameter Levels in Fan (2003) Study

Number of items for each composite: 4, 8
Magnitude of inter-item correlation: .81, .64, .49, .36, .25
Magnitude of inter-factor correlation: .4, .6
Sample size: 50, 100, 200, 400

It is important to note that in Fan's study reliability was not manipulated directly; that is, it was not possible to generate variables (referred to as composite variables) with set levels of reliability. Therefore, data had to first be generated at the item level, and items were then summed into composite variables. In order that varying reliability estimates be considered, Fan first specified the number of items for each composite variable and, second, the magnitude of the inter-item correlations. Once the number of items per composite variable and the inter-item correlations were established, Cronbach's alpha (the population reliability) was calculated using

α = kρ / [1 + (k - 1)ρ],  (1)

where k is the number of items within a composite variable and ρ is the inter-item correlation. This yielded Cronbach's alpha levels ranging from 0.57 to 0.97 for the composite variables. Twenty populations of data (N = 50,000) were generated (2 numbers of items per composite variable x 5 inter-item correlations x 2 inter-factor correlations). Five hundred random samples at each of the four sample sizes were drawn from each population, making the total number of replications 40,000 (20 x 500 x 4). Results when the inter-factor correlation was 0.4 indicated the uncorrected (for measurement error) correlation between the two composite variables experienced a systematic downward bias. Specifically, the more measurement error a composite
variable contained (lower score reliability), the more downward bias was experienced by the correlation coefficient (Fan, 2003). Although this was expected, Fan (2003) reported a more unexpected observation: the downward bias can be such that "even the upper limit of the 95% confidence interval for the uncorrected correlation may still be lower than the population correlations, especially when the sample size is relatively large and measurement reliability is relatively low" (p. 921). A second finding included information about the width of the confidence intervals for the three correlation coefficients (the uncorrected correlation coefficient, the coefficient corrected through the classical test theory approach, and the coefficient corrected through the latent variable modeling approach). First, the width of the confidence interval for the uncorrected correlation coefficient was not affected by a change in reliability. However, the confidence interval widths for the corrected correlation coefficients were related to reliability in that "the lower the measurement reliability, the more sampling variation for these two types of correlation coefficients" (Fan, 2003, p. 921). Fan also reported that both correction approaches (the classical test theory approach and the latent variable modeling approach) yielded unbiased sample estimates. Although results were similar when the inter-factor correlation was .6, Fan found the downward bias of the uncorrected effect was even more severe. When measurement reliability was low, "even the confidence interval's upper limit of the sample correlations may be quite a bit lower than the population correlation of 0.60" (Fan, 2003, p. 929). Fan asserted that attenuation of correlation coefficients due to score reliability may be even more severe than many researchers realize and that meaningful relationships could be masked by low score reliability.
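Fan's item-level design can be sketched in a few lines. The code below is a hedged re-creation of one cell of that design (4 items per composite, inter-item correlation .49, inter-factor correlation .6), not Fan's actual program; the function names and the sample size are mine:

```python
import random

def cronbach_alpha(k, rho):
    # Population Cronbach's alpha for k items sharing a common
    # inter-item correlation rho (equation 1 above)
    return k * rho / (1 + (k - 1) * rho)

def pearson(xs, ys):
    # Plain Pearson correlation, to keep the sketch dependency-free
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5

def simulate_composite_r(n, n_items=4, inter_item_r=0.49, factor_r=0.6):
    # Items load on their factor so that inter-item r = loading ** 2;
    # composites are unweighted item sums, as in Fan's design
    lam = inter_item_r ** 0.5
    comp_x, comp_y = [], []
    for _ in range(n):
        fx = random.gauss(0, 1)
        fy = factor_r * fx + (1 - factor_r ** 2) ** 0.5 * random.gauss(0, 1)
        comp_x.append(sum(lam * fx + (1 - inter_item_r) ** 0.5 * random.gauss(0, 1)
                          for _ in range(n_items)))
        comp_y.append(sum(lam * fy + (1 - inter_item_r) ** 0.5 * random.gauss(0, 1)
                          for _ in range(n_items)))
    return pearson(comp_x, comp_y)

random.seed(1)
alpha = cronbach_alpha(4, 0.49)
r_obs = simulate_composite_r(20_000)
print(round(alpha, 3), round(r_obs, 3))
```

With these settings the population alpha is about .79, so the observed composite correlation gravitates toward .6 × .79 ≈ .48 rather than the factor correlation of .6, reproducing the systematic downward bias Fan reported.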
In 2005 Onwuegbuzie et al. proposed a "what if reliability" analysis based upon the 1999 work of Kieffer and Thompson. Kieffer and Thompson introduced a "what if" analysis that allowed a researcher to determine the sample size needed in order to obtain a statistically significant result, while taking sampling error into account. The work by Onwuegbuzie et al. was similar to the Kieffer and Thompson (1999) work except that the reliability coefficient was the index of effect size. The what if reliability study by Onwuegbuzie et al. illustrated that in a situation where the original correlation coefficient was .30, a 25% reduction in score reliability (for both the independent and dependent variables) resulted in a 42.9% reduction in the number of cases needed to achieve statistical significance of the corrected correlation coefficient. Additionally, the corrected correlation went from .38 to .50, an increase from a correction of 26.7% to a correction of 66.7%. As explained by Onwuegbuzie et al.:

The what if reliability analysis capitalizes on the fact that as the reliability estimate for the scores on the dependent and/or independent measure decreases, the difference between the uncorrected and corrected correlation coefficient increases, and subsequently, the sample size needed for statistical significance of the correlation coefficient decreases. (p. 234)

That is, when more measurement error is present in the independent and/or dependent variable, more correction of the correlation coefficient is required. Therefore, a smaller sample size is necessary in order for the corrected correlation coefficient to attain statistical significance. The what if reliability study illustrated the importance of score reliability with regard to effect size estimates. As a note, the authors recognized the theoretical nature of the what if reliability study in that it assumes the same correlation
would be obtained in future studies with more or fewer participants. When used for theoretical interpretation, what if reliability may be considered in the same category as other investigative tools such as power analysis, confidence intervals, and internal replication. Lastly, the authors indicated that the double correction for attenuation formula may be squared to provide a correction formula for multiple R², where the double correction formula takes the form of

r′₁₂ = r₁₂ / √(r₁₁ r₂₂),  (2)

where r′₁₂ is the corrected correlation coefficient, r₁₂ is the obtained sample correlation coefficient, r₁₁ is the score reliability for the independent variable, and r₂₂ is the score reliability for the dependent variable.

Studies Investigating Attenuation of R² Due to Measurement Error

The previously discussed studies by Thorndike (1907), Johnson (1944), Fan (2003) and Onwuegbuzie et al. (2005) detailed work conducted regarding the attenuation of the bivariate correlation coefficient, r. These studies thereby also examined the attenuation of the square of that coefficient, r². As explained by Coladarci, Cobb, Minium and Clarke (2008), regression is an extension of the bivariate correlation in that "correlation and regression are closely related: without a correlation between two variables, there can be no meaningful prediction from one to the other" (Coladarci et al., p. 149). Additional work by Cochran (1970) and Edwards (1971) investigated the effect of measurement error on multiple R², the squared correlation coefficient produced in a regression analysis.

Cochran (1970) investigated the effect of measurement error on multiple R². Cochran's investigation was conducted based on mathematical models, unlike the
present study, which employed Monte Carlo simulation methods. Cochran initially discussed the effect of measurement error on R² when the (two) predictor variables are independent. The author derived a formula to illustrate the relationship between R² (the error-free squared multiple correlation) and R̄² (the value attenuated by measurement error):

R̄² = R² g_y (Σᵢ βᵢ² gᵢ) / (Σᵢ βᵢ²) = R² g_y ḡ_w,  (3)

where g_y is the reliability of Y, gᵢ is the reliability of xᵢ, βᵢ² is the squared correlation between xᵢᵤ and Yᵤ, and ḡ_w is the weighted mean of the reliability coefficients of the xᵢ. Cochran concluded that, for situations involving two uncorrelated predictors, for a given reliability value (ḡ_w) the effect on the residual variance increased as R² increased. As such, Cochran reported the effect on R² would be greater when the prediction formula is very good than when the formula is average. The author provided the following example (p. 24): suppose that ḡ_w = 0.5, representing poor reliability of measurement of the xᵢ (given ḡ_w is the weighted mean of the reliability coefficients of the predictors). If R² = 0.9, the residual variance is increased by errors in measurement of the xᵢ from 0.1σ²_y to 0.55σ²_y, over a fivefold increase. With R² = 0.4, the increase is only from 0.6σ²_y to 0.8σ²_y, a 33% jump.

Cochran also investigated scenarios involving two correlated predictor variables. Cochran planned to develop a correction factor, f, which could be applied in most cases and would be equal to 1 when the predictors were uncorrelated. The author soon realized no general correction factor could be developed and therefore presented his results in a table indicating the correction factors necessary for use in
R̄² = R² g_y ḡ_w f,  (4)

under varying circumstances, where R̄² is the squared multiple correlation attenuated by measurement error, R² is the error-free squared multiple correlation, g_y is the reliability of y, ḡ_w is the weighted mean of the reliability coefficients of the predictors, and f is the correction factor. The conditions investigated by Cochran included ρ as the correlation between predictors, gᵢ as the reliability of the predictors, βᵢ as the correlation between each predictor and y, and R² as the squared multiple correlation coefficient. Essentially, Cochran investigated four conditions, and each condition included multiple levels. Table 4 presents values of the correction factor f for the four conditions at multiple levels.

Table 4
Cochran's (1970) Values of f for Six Examples

Columns: β₁ = .6, β₂ = .4 and β₁ = .7, β₂ = .2, each under predictor reliabilities gᵢ of .9, .7; .8, .6; and .7, .5. Rows: values of ρ beginning at -.5, with f tabled for each R². [Numeric entries not recoverable from the transcription.]
ᵃ Impossible because R² > 1.
*Reprinted with permission from the Journal of the American Statistical Association. Copyright 1970 by the American Statistical Association. All rights reserved. Cochran, W.G. (1970). Some effects of errors of measurement on multiple correlation. Journal of the American Statistical Association, 65,
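Cochran's uncorrelated-predictor example can be checked with a few lines of arithmetic. The sketch below applies the relation R̄² = R² g_y ḡ_w (the f = 1 case) and assumes, as the residual-variance figures imply, that Y is measured without error (g_y = 1); the function name is mine:

```python
def attenuated_R2(R2_true, g_y, g_w):
    # Observed (fallible) squared multiple correlation for uncorrelated
    # predictors: error-free R2 scaled by the criterion reliability g_y
    # and the weighted mean predictor reliability g_w (f = 1 here)
    return R2_true * g_y * g_w

g_w = 0.5  # the poor predictor reliability from Cochran's example
for R2 in (0.9, 0.4):
    R2_obs = attenuated_R2(R2, 1.0, g_w)
    # residual variance (as a multiple of the variance of y) before and after error
    print(R2, round(1 - R2, 2), round(1 - R2_obs, 2))
```

Running this reproduces the example: the residual variance rises from 0.1σ²_y to 0.55σ²_y when R² = 0.9, but only from 0.6σ²_y to 0.8σ²_y when R² = 0.4, which is why the damage is worst when the prediction formula is very good.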
Cochran explained that the primary difference between the cases of β₁ = 0.6, β₂ = 0.4 and β₁ = 0.7, β₂ = 0.2 is that when β₁ and β₂ differ greatly and ρ (the correlation between predictors) is positive, "R² begins to increase and f to decrease for quite moderate values of ρ (around 0.3 for β₁ = .7, β₂ = .2), while when β₁ and β₂ are more nearly equal, R² decreases and f increases until ρ is closer to 1" (p. 26). Finally, Cochran addressed the calculated values of f when 3, 5 and 10 predictors (k) are included, under various conditions of βᵢ and g/ρ (see Table 5).

Table 5
Cochran's (1970) Values of f for Three Examples with Positive ρ

Columns: k = 3 with βᵢ = .6, .5, .4; k = 5 with βᵢ = .5, (.4)₂, .3, .2; and k = 10 with βᵢ = .5, .4, (.3)₂, (.2)₃, (.1)₃. Rows: combinations of g and ρ. [Numeric entries not recoverable from the transcription.]
*Reprinted with permission from the Journal of the American Statistical Association. Copyright 1970 by the American Statistical Association. All rights reserved. Cochran, W.G. (1970). Some effects of errors of measurement on multiple correlation. Journal of the American Statistical Association, 65,

Cochran indicated the 3-, 5- and 10-variable cases demonstrate similar values of f. The author explained that the 5-variable case has more variables, which was typically associated with higher f's, but also has more variation among the βᵢ, leading to lower f's. The 10-variable example, which had even more variation among the βᵢ, gave values of f near one. Additionally, with all positive inter-variable correlations, the value of f also tended to increase as measurement error increased.

An additional study by Edwards (1971) presented correction formulas for partial and multiple correlations. As was the case with Cochran (1970), Edwards' investigation
was conducted based on mathematical models, unlike the present study, which employed Monte Carlo simulation methods. Edwards discussed the effects of measurement error on multiple correlations, including the distortion of the predicted variance (R²) attributed to each predictor. Edwards presented the following two-predictor case as an illustration of such distortion (see Table 6).

Table 6
Edwards' (1971) Illustration of the Effect of Measurement Error on Individual Variable Contributions to R²

ρ₁₁ = 1.0, ρ₂₂ = 1.0: R² = .385; variance unique to x₁ = 13.5%; unique to x₂ = 13.5%; common to both = 11.5%
ρ₁₁ = 0.8, ρ₂₂ = 1.0: R² = .356; variance unique to x₁ = 10.6%; unique to x₂ = 15.6%; common to both = 9.4%
ρ₁₁ = 0.6, ρ₂₂ = 1.0: R² = .328; variance unique to x₁ = 7.8%; unique to x₂ = 17.8%; common to both = 7.2%
Note: ρ₁₁ indicates score reliability for variable x₁; ρ₂₂ indicates score reliability for variable x₂.

As demonstrated in the example provided by Edwards (1971), as the difference in the reliabilities of the predictors became greater, the true contribution of the predictors became more distorted. Edwards explained that unreliability in one of the variables "takes part of that variable out of the prediction," shifting predicted variance to the more reliable predictors (p. 9).
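Edwards' two-predictor illustration can be reproduced from the correlation matrix alone. The true correlations are not stated in the excerpt, so the inputs below (r_y1 = r_y2 = .5, r_12 = .3) are an assumption, chosen because they reproduce the tabled pattern; unique contributions are computed as squared semipartial correlations:

```python
def r2_and_unique(r_y1, r_y2, r_12):
    # Two-predictor R-squared and its partition: unique parts are
    # squared semipartials; the remainder is common variance
    R2 = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    u1 = R2 - r_y2 ** 2        # unique to x1
    u2 = R2 - r_y1 ** 2        # unique to x2
    return R2, u1, u2, R2 - u1 - u2

def attenuate_x1(r_y1, r_12, rel_11):
    # Every correlation involving x1 shrinks by sqrt(reliability of x1)
    return r_y1 * rel_11 ** 0.5, r_12 * rel_11 ** 0.5

r_y1 = r_y2 = 0.5   # assumed true correlations with the criterion
r_12 = 0.3          # assumed true correlation between predictors

print(tuple(round(v, 3) for v in r2_and_unique(r_y1, r_y2, r_12)))
for rel in (0.8, 0.6):
    a_y1, a_12 = attenuate_x1(r_y1, r_12, rel)
    print(rel, tuple(round(v, 3) for v in r2_and_unique(a_y1, r_y2, a_12)))
```

The printed partitions match the .385, .356, and .328 rows of Table 6: as x₁ becomes less reliable, its unique share shrinks, the common variance shrinks, and the apparent unique share of the perfectly reliable x₂ grows.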
The studies by Cochran (1970) and Edwards (1971) examined the attenuation of, and associated correction formulas for, the multiple correlation coefficient, R². There is, however, no work in the current literature concerning the attenuation of the canonical correlation coefficient, R_c, or the square of that coefficient, R_c². Because canonical correlation analysis (CCA) is the most general case of the parametric general linear model (e.g., Fan, 1996; Fan, 1997; Sherry & Henson, 2005; Thompson, 2000) outside of structural equation modeling, which explicitly takes measurement error into account, both the bivariate correlation and the multiple correlation are subsumed by CCA.

Canonical Correlation Analysis

In 1935, Hotelling introduced canonical variate analysis. However, due to the complexity of the technique, it was underutilized for nearly fifty years. In more recent years, with the proliferation of the computer and the availability of canonical correlation analysis in many statistical packages, the analysis has experienced a resurgence. Essentially, canonical correlation analysis is used to examine the relationship between two variable sets. While canonical correlation analysis allows for the examination of more than two variable sets, study of more than two sets is not usually encountered in social science research (Thompson, 2000). The variables in each set must compose a meaningful set (e.g., variables are thought to measure the same construct). For example, suppose a teacher retention specialist is interested in investigating teacher characteristics possibly related to student performance on standardized tests. Assume the researcher believes certain teacher characteristics can lead to higher student performance. The retention specialist first identifies several variables believed to measure student performance. These variables constitute the predictor variable set and
could include variables such as district reading scores, district math scores, and SAT scores. Next, the specialist identifies variables measuring various teacher characteristics. This set could include variables such as teacher performance evaluations, an absenteeism measure, and a personality profile. These variables constitute the criterion variable set. Once the variables for each set have been identified and measured, a correlation matrix is computed and the resulting matrix is divided into four quadrants. Next, a quadruple-product matrix is computed from the four quadrants, using the following matrix algebra formula:

R₂₂⁻¹ (2x2) R₂₁ (2x3) R₁₁⁻¹ (3x3) R₁₂ (3x2) = A (2x2)  (5)

(Thompson, 2000, p. 291). Standardized weights (standardized canonical function coefficients) are produced when matrix A is subjected to a principal components analysis. These standardized weights are directly analogous to beta weights in regression and pattern coefficients in exploratory factor analysis. The number of canonical functions (sets of weights) produced will be equal to the number of variables in the smaller variable set (Sherry & Henson, 2005), and each function "is perfectly uncorrelated with each other and so are the scores on the latent or synthetic variables computed by applying the weights to the observed or measured variables" (Thompson, 2000, p. 292). The two sets of latent variable scores (one for the independent variable set and one for the dependent variable set) are then correlated, yielding a canonical correlation coefficient (R_c). When this value is squared, a variance-accounted-for effect
size is produced ($R_c^2$), indicating the amount of variance shared between the synthetic predictor variable and the synthetic criterion variable.

For interpreting a canonical correlation analysis, Thompson (1997) and Sherry and Henson (2005) advocated a two-step approach. First, the researcher should determine whether the canonical model "sufficiently captures the relationship between the predictor and criterion variable sets to warrant interpretation" (Sherry & Henson, 2005, p. 41). This may be done by evaluating the full model and then each canonical function individually. If the determination is made that the model captures the relationship between the criterion and predictor sets of variables to a reasonable degree, the researcher may then investigate where the effect comes from.

As discussed previously, canonical correlation analysis subsumes all other analyses in the parametric general linear model. Canonical correlation analysis may be conceptualized as a Pearson correlation between the synthetic criterion variable and the synthetic predictor variable. Therefore, it is reasonable to hypothesize that $R_c$ and $R_c^2$ could be affected by measurement error in a manner similar to $r^2$ and $R^2$. However, as previously indicated, no work has investigated this hypothesis to date.

Purpose of the Current Study

Research pertaining to the distortion of the squared canonical correlation coefficient has traditionally been limited to the effects of sampling error and associated correction formulas (e.g., Leach, 2006; Snyder & Lawson, 1993; Yin & Fan, 2001). While information is available regarding the attenuation of effect sizes due to score unreliability and associated correction formulas, specifically for $r^2$ and $R^2$ (Cochran, 1970; Edwards, 1971; Fan, 2003; Johnson, 1944; Onwuegbuzie et al., 2005; Thorndike,
1907), no research is available regarding the attenuation of $R_c^2$ as a consequence of score unreliability. The purpose of this study was to compare the degree of attenuation of the squared canonical correlation coefficient under varying conditions of score reliability.
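The hypothesized mechanism can be illustrated numerically. The sketch below is a hypothetical example, not part of the study: it computes $R_c^2$ from a partitioned correlation matrix via the quadruple-product matrix of Equation 5, then scales every off-diagonal correlation by a common score reliability of .80 (the classical correction-for-attenuation logic, applied uniformly) and shows the resulting drop in $R_c^2$. The matrix values are chosen for illustration only.

```python
import numpy as np

def squared_canonical_correlations(R, p):
    """Eigenvalues of the quadruple-product matrix
    A = R22^-1 R21 R11^-1 R12 are the squared canonical
    correlations (R_c^2); p is the size of the first set."""
    R11, R12 = R[:p, :p], R[:p, p:]
    R21, R22 = R[p:, :p], R[p:, p:]
    A = np.linalg.inv(R22) @ R21 @ np.linalg.inv(R11) @ R12
    return np.sort(np.linalg.eigvals(A).real)[::-1]

# Hypothetical 3 + 2 variable correlation matrix: within-set r = .5,
# between-set r = .3 (illustrative values, not the study's conditions).
R_true = np.full((5, 5), 0.3)
R_true[:3, :3] = 0.5
R_true[3:, 3:] = 0.5
np.fill_diagonal(R_true, 1.0)

# Equal reliabilities of .80 on every variable scale each observed
# off-diagonal correlation by sqrt(.8 * .8) = .8 (Spearman's attenuation).
R_obs = 0.8 * R_true
np.fill_diagonal(R_obs, 1.0)

rc2_true = squared_canonical_correlations(R_true, p=3)[0]
rc2_obs = squared_canonical_correlations(R_obs, p=3)[0]
print(rc2_true, rc2_obs)  # observed R_c^2 is attenuated downward
```

With these values the population $R_c^2$ of .18 falls to about .137 under the observed (error-laden) correlations, the kind of attenuation the study quantifies systematically.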
METHOD

Monte Carlo simulation methodology was used to fulfill the purpose of this study. Initially, data populations with various manipulated conditions (detailed below) were generated (N = 100,000). Subsequently, 500 random samples were drawn with replacement from each population, and the data were subjected to canonical correlation analyses. The canonical correlation results were then analyzed using descriptive statistics and an ANOVA design to determine under which condition(s) the squared canonical correlation coefficient was most attenuated when compared to population $R_c^2$ values. This Monte Carlo methodology created a simulated sampling distribution of $R_c^2$ and, thereby, information about $R_c^2$, an indeterminate fit statistic. This information was analyzed and used to determine what effect, if any, the different conditions considered in this study had on $R_c^2$. Specifically, several conditions were considered and are detailed in the following sections.

Research Design

To determine the degree of attenuation of the squared canonical correlation coefficient due to score reliability, five conditions were considered:

1. Item number per composite variable
2. Inter-item correlation
3. Between/within set correlation
4. Variable set size
5. Sample size
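The repeated-sampling scheme just described can be sketched as follows. This is a simplified stand-in, not the study's SAS implementation: a bivariate correlation substitutes for the full canonical correlation analysis, and the population correlation of .5 and sample size of 100 are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the sketch is reproducible

# Population (N = 100,000) with a known correlation of .5 between two
# variables; a bivariate r stands in here for the full CCA.
rho, N = 0.5, 100_000
pop = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=N)

# Draw 500 samples with replacement and compute the sample statistic
# for each, building a simulated sampling distribution.
n_samples, n = 500, 100
sample_rs = np.empty(n_samples)
for i in range(n_samples):
    idx = rng.integers(0, N, size=n)       # sampling with replacement
    sample_rs[i] = np.corrcoef(pop[idx, 0], pop[idx, 1])[0, 1]

# The distribution is then summarized and compared with the
# population value to gauge bias/attenuation.
print(sample_rs.mean(), sample_rs.std())
```

In the study itself, each of the 500 samples per condition was instead subjected to a canonical correlation analysis, and the resulting $R_c^2$ values were compared to the population value.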
Items per Composite

The first condition manipulated in this study was the number of items per composite variable. As was the case in the previously discussed work by Fan (2003), it is important to note that reliability was not manipulated directly in this Monte Carlo study. That is, it was not possible to generate variables (referred to as composite variables) with set levels of reliability. Therefore, data were first generated at the item level, and the items were later summed to create composite variables. So that varying reliability estimates could be considered, the number of items for each composite variable was specified first; inter-item correlation values were then set (as detailed in the following section). The number of items per composite variable and the inter-item correlation values together determined the reliability of each composite variable.

Stevens (2002) indicated that a factor (a composite variable for the present study) consisting of fewer than four items is in danger of being variable specific. Furthermore, Stevens (2002) recommended a factor be composed of a minimum of four items (factor loadings > .60), unless sample size is quite large (> 300). Therefore, population data for this study were generated such that each composite variable consisted of either four or eight items. The levels of this condition were a replication of Fan's (2003) work concerning the attenuation of the bivariate correlation coefficient due to score reliability. The resulting population composite variables were used in subsequent repeated sampling and canonical correlation analyses.

Inter-item Correlation

Regarding inter-item correlation, five levels were considered: .81, .64, .49, .36, and .25. These levels were also a replication of levels used by Fan (2003)
in his study of the degree of attenuation of the bivariate correlation coefficient due to score reliability. These levels of inter-item correlation represented relatively high to low correlations (Fan, 2003). As stated previously, although the degree of attenuation of the squared canonical correlation coefficient due to score reliability is the focus of this study, score reliability was not manipulated directly. Score reliability was manipulated through the use of varied numbers of items per composite variable and inter-item correlation values. Once the numbers of items per composite variable and inter-item correlation values were set, score reliability (Cronbach's alpha) for the composite variables was calculated. Based on Fan's (2003) work, the score reliability for the population composite variables generated for this study would range from marginal reliability (~0.5) to high reliability (~0.9). Cronbach's alpha may be represented by:

$\alpha = \dfrac{k}{k-1}\left(1 - \dfrac{\sum \sigma_i^2}{\sigma_x^2}\right)$ (6)

where $k$ is the number of items, $\sum \sigma_i^2$ is the sum of the $k$ item score variances, and $\sigma_x^2$ is the variance of the composite variable scores (Cronbach, 1951). Additionally, the variance of the composite variable scores is the sum of the item variances and the sum of the item covariances (Fan, 2003):

$\sigma_x^2 = \sum \sigma_i^2 + \sum_{i \neq j} \sigma_{ij}$. (7)

At the item level, for this study, normal, standardized variables were generated (mean = 0, standard deviation = 1). In such cases, the covariance between the variables will be equal to the correlation between them. It follows that:

$\sigma_x^2 = k + k(k-1)\rho$ (8)
when a composite variable consists of $k$ standardized variables with an equal inter-item correlation coefficient of $\rho$ (Fan, 2003). Therefore, Cronbach's alpha values for the population composite variables in this study were calculated as (Fan, 2003):

$\alpha = \dfrac{k\rho}{1 + (k-1)\rho}$. (9)

Cronbach's alpha, an internal consistency estimate of score reliability, is useful in that it may be calculated from a single administration of a test. While several internal consistency methods are available (e.g., KR20, KR21, Hoyt's analysis of variance), Cronbach's alpha is the most widely reported internal consistency method of estimating score reliability (Cronbach, 1951; Henson, 2001; Onwuegbuzie & Daniel, 2000; Onwuegbuzie & Daniel, 2002; Thompson, 2003). However, as noted by Onwuegbuzie and Daniel (2002) and Henson (2001), internal consistency reliability coefficients are theoretical estimates of score reliability, not direct measures. Also, Onwuegbuzie and Daniel (2002) suggested that these reliability estimates may be somewhat limited because they may fail to consider important measurement issues that other methods of estimating score reliability can address.

Between/Within Set Correlation

In Fan's (2003) work concerning the bivariate correlation coefficient, the population correlation coefficients were set to .4 and .6. Results for both conditions indicated a downward bias of the sample correlation coefficient under various degrees of measurement error. However, the .6 condition experienced an even greater downward bias than the .4 condition. For the multivariate case represented in this study, Fan's (2003) bivariate correlations corresponding to between- and within-set
correlations of the composite variables were used in the canonical correlation analysis. Specifically, between-set correlations refer to the correlations between the variables in one variable set and the variables in the other set included in a canonical correlation analysis. Within-set correlations refer to the correlations among the variables within each variable set. Therefore, the third condition manipulated was between- and within-set correlation. This condition had three levels: r = .3, r = .5, and r = .8. Two of these levels were chosen to approximate Cohen's (1988) standards for moderate (r = .3) and large (r = .5) effects. Of course, it should be noted that Cohen did not intend the proposed benchmarks to be applied with rigidity. Researchers are strongly encouraged to interpret findings within the context of their own study and with regard to previous research in the field. Although Cohen set forth these standards with some hesitancy, they appear to have endured with regard to univariate statistics. Because there is little research regarding multivariate standards, the univariate benchmarks were applied. The third value, r = .8, was chosen in light of results obtained by Fan (2003) regarding the greater bias of the bivariate correlation coefficient at the population correlation value of r = .6. The r = .8 condition was included as an extension of Fan's work to determine what effect score reliability had at an even higher level of population correlation (e.g., possibly resulting in greater downward bias of the squared canonical correlation coefficient). To reduce the complexity of the data generation procedures (detailed below), between- and within-set correlations remained identical within each condition (e.g., when the between-set correlation was set to .5, the within-set correlation was also set to .5).
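Equation 9 can be applied directly to the design's item counts and inter-item correlations. The short sketch below (illustrative, not the study's SAS code) shows the range of composite reliabilities the manipulated conditions imply:

```python
def alpha_from_rho(k, rho):
    """Cronbach's alpha for a composite of k standardized items
    sharing a common inter-item correlation rho (Equation 9; Fan, 2003)."""
    return (k * rho) / (1 + (k - 1) * rho)

# Corners of the design: lowest and highest reliability conditions.
print(alpha_from_rho(4, 0.25))  # marginal reliability, about .57
print(alpha_from_rho(8, 0.81))  # high reliability, about .97
```

These endpoints match the stated range of roughly marginal (~0.5) to high (~0.9) score reliability for the population composites.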
Variable Set Size

The condition of variable set size was directly related to the purpose of the study, given that the number of variables used in a canonical correlation analysis theoretically influences the amount of attenuation of the squared canonical correlation coefficient. Purportedly, the more variables included in a study, each measured with some degree of error, the greater the attenuation of the squared canonical correlation coefficient. Therefore, the fourth condition, variable set size, consisted of three levels: (a) 2+2 (v = 4), (b) 3+3 (v = 6), and (c) 4+4 (v = 8), each of which is a plausible scenario for canonical correlation use, ranging from the minimum number of variables per set (2+2) to larger sets.

Sample Size

The final condition manipulated was sample size and included 10, 50, and 100 cases per variable. Theoretically, as sample size increases, sample statistics more closely represent population parameters. Therefore, at larger sample sizes, it is possible score reliability may have a lesser effect on the canonical correlation coefficient (i.e., the coefficient may experience less attenuation). While the selected range of sample sizes did not directly replicate Fan's (2003) study (n = 50, 100, 200, and 400), the selected levels fall within the range of what is commonly recommended in social science research utilizing canonical correlation. For example, for CCA, Tabachnick and Fidell (2007) recommended a minimum ratio of 10 cases per variable. Additionally, Stevens (2002) discussed a study by Mendoza, Markos, and Gonter (1978) demonstrating that strong population canonical correlations (.9, .8, .7) were detected 90% of the time with a total sample size of 50. Further, more moderate correlations
(.50) were detected 67% of the time with a total sample size of 100, and weak population canonical correlations (.3, .2, .1) were detected 60% of the time with a total sample size of 200. Stevens (2002) further suggested, based on Barcikowski and Stevens (1975), that when interpreting only the largest canonical correlation (as is the case in the current study), a sample size of 20 cases per variable is adequate. Therefore, the levels selected for this condition represented a range of recommended sample sizes for CCA.

In review, the research design for this study was fully crossed and included two numbers of items per variable, five inter-item correlations, three between/within set correlations, three variable set sizes, and three sample sizes, resulting in (2 x 5 x 3 x 3 x 3) 270 conditions examined. Five hundred random samples were drawn from each condition, resulting in 135,000 canonical correlation analyses (see Table 7).

Table 7

Summary of Data Conditions Manipulated in the Study

Data Condition                  Levels Manipulated
Item number per composite       4 and 8
Inter-item correlation          .25, .36, .49, .64, .81
Between/within set correlation  .3, .5, .8
Variable set size               2+2 (v = 4), 3+3 (v = 6), 4+4 (v = 8)
Sample size                     10:1, 50:1, 100:1

Simulation

This study utilized Monte Carlo methods for simulation of the data conditions. SAS system software, version 9.1, was used to process the simulation and to perform the statistical analyses of the data. SAS syntax for the discrete steps of data generation and analysis (detailed below) may be found in the appendix.
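The fully crossed design can be enumerated directly. This sketch (illustrative, not the study's SAS code) confirms the condition and analysis counts:

```python
from itertools import product

# The five manipulated conditions and their levels, as listed in Table 7.
items_per_composite = [4, 8]
inter_item_r = [0.25, 0.36, 0.49, 0.64, 0.81]
set_r = [0.3, 0.5, 0.8]
set_size = [4, 6, 8]            # total variables: 2+2, 3+3, 4+4
cases_per_var = [10, 50, 100]

conditions = list(product(items_per_composite, inter_item_r,
                          set_r, set_size, cases_per_var))
print(len(conditions))          # 2 x 5 x 3 x 3 x 3 = 270 conditions
print(len(conditions) * 500)    # 135,000 canonical correlation analyses
```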
Data Generation

In order to generate random normal data with specified inter-item and between/within correlations, two eight-factor models were utilized. Figure 1 represents one of the two eight-factor models.

Figure 1. Eight-factor model for data generation. Eight variable composites (FXi, FYi), each composed of four items (xi, yi).

Specifically, the model in Figure 1 represents population data generated, initially, at the item level (xi, yi) with set levels of inter-item correlations and four items per composite variable. The items (xi, yi) were then summed to create the population composite variables (FXi, FYi). Reliability was then calculated for each of the population composite variables (FXi, FYi). Repeated sampling of the population composite variables under the previously discussed conditions of variable set size and sample size occurred, and those samples were subjected to the canonical correlation analyses. The second eight-factor model is identical to that in Figure 1, except that each factor was composed of eight items instead of four. The models are based on the following (Joreskog & Sorbom, 1989):

$\Sigma = \Lambda \Phi \Lambda' + \Theta$ (10)

where $\Sigma$ is the population covariance/correlation matrix, $\Lambda$ is the matrix of population pattern coefficients in Figure 1, $\Phi$ is the population correlation matrix for the eight
factors, and $\Theta$ is the covariance matrix of population residuals for the items (Fan, 2003). Eight-factor models were chosen so that all variable set sizes would be encompassed in the data generation process [(a) 2+2 (v = 4), (b) 3+3 (v = 6), and (c) 4+4 (v = 8)]. Through this method, 30 population covariance/correlation matrices were computed and used to generate 30 populations of normal, standardized composite variables composed of either four or eight items (2), each with given levels of inter-item (5) and between/within correlation values (3) (2 x 5 x 3 = 30). Specifically, 30 separate SAS programs were written, each including the correct covariance/correlation matrix generated using the formula in Equation 8. Next, the SAS FACTOR procedure was used to analyze the matrix, resulting in a factor pattern matrix. The factor pattern matrix was then read in as matrix F. Next, the RANNOR function was used to generate a random normal data set (DATA) of N = 100,000; the data set was transposed (DATA = DATA'), inter-item correlations were imposed (Z = F*DATA), and finally, the data matrix was transposed back (Z = Z'). Next, variable composites were created from four or eight items, depending on the covariance/correlation matrix in question. The values necessary to compute the 30 covariance/correlation matrices are included in Table 8. Table 9 includes information with regard to which composite variables composed each variable set size.
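The item-level generation steps can be approximated in a few lines. This sketch substitutes a Cholesky factor for the factor pattern matrix produced by the SAS FACTOR procedure (both impose the target correlation structure on uncorrelated normals) and illustrates a single four-item composite; the specific values are one of the study's conditions, but the code is a hedged analogue, not the dissertation's SAS program.

```python
import numpy as np

rng = np.random.default_rng(2010)

# Item-level population for one composite: k standardized items with a
# common inter-item correlation rho (compound-symmetric matrix).
k, rho, N = 4, 0.49, 100_000
R = (1 - rho) * np.eye(k) + rho * np.ones((k, k))
F = np.linalg.cholesky(R)      # stand-in for the factor pattern matrix

raw = rng.standard_normal((N, k))   # analogue of RANNOR output
items = raw @ F.T                   # impose the inter-item correlations
composite = items.sum(axis=1)       # sum items into the composite variable

# Check the imposed structure: empirical inter-item correlation near rho,
# composite variance near k + k(k-1)rho (Equation 8).
emp_rho = np.corrcoef(items, rowvar=False)[0, 1]
print(emp_rho, composite.var())
```

With N = 100,000, the empirical inter-item correlation lands very close to the specified .49, and the composite variance approximates the value implied by Equation 8 (4 + 4·3·.49 = 9.88).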
Clemson University TigerPrints All Dissertations Dissertations 8-207 Correlated and Interacting Predictor Omission for Linear and Logistic Regression Models Emily Nystrom Clemson University, emily.m.nystrom@gmail.com
More informationpsychological statistics
psychological statistics B Sc. Counselling Psychology 011 Admission onwards III SEMESTER COMPLEMENTARY COURSE UNIVERSITY OF CALICUT SCHOOL OF DISTANCE EDUCATION CALICUT UNIVERSITY.P.O., MALAPPURAM, KERALA,
More informationSubject CS1 Actuarial Statistics 1 Core Principles
Institute of Actuaries of India Subject CS1 Actuarial Statistics 1 Core Principles For 2019 Examinations Aim The aim of the Actuarial Statistics 1 subject is to provide a grounding in mathematical and
More informationUsing Matching, Instrumental Variables and Control Functions to Estimate Economic Choice Models
Using Matching, Instrumental Variables and Control Functions to Estimate Economic Choice Models James J. Heckman and Salvador Navarro The University of Chicago Review of Economics and Statistics 86(1)
More informationRobust Estimators of Errors-In-Variables Models Part I
Department of Agricultural and Resource Economics University of California, Davis Robust Estimators of Errors-In-Variables Models Part I by Quirino Paris Working Paper No. 04-007 August, 2004 Copyright
More informationAn Overview of Item Response Theory. Michael C. Edwards, PhD
An Overview of Item Response Theory Michael C. Edwards, PhD Overview General overview of psychometrics Reliability and validity Different models and approaches Item response theory (IRT) Conceptual framework
More informationA Use of the Information Function in Tailored Testing
A Use of the Information Function in Tailored Testing Fumiko Samejima University of Tennessee for indi- Several important and useful implications in latent trait theory, with direct implications vidualized
More information042 ADDITIONAL MATHEMATICS (For School Candidates)
THE NATIONAL EXAMINATIONS COUNCIL OF TANZANIA CANDIDATES ITEM RESPONSE ANALYSIS REPORT FOR THE CERTIFICATE OF SECONDARY EDUCATION EXAMINATION (CSEE) 2015 042 ADDITIONAL MATHEMATICS (For School Candidates)
More informationPerformance In Science And Non Science Subjects
Canonical Correlation And Hotelling s T 2 Analysis On Students Performance In Science And Non Science Subjects Mustapha Usman Baba 1, Nafisa Muhammad 1,Ibrahim Isa 1, Rabiu Ado Inusa 1 and Usman Hashim
More informationMA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2
MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 1 Bootstrapped Bias and CIs Given a multiple regression model with mean and
More information9. Linear Regression and Correlation
9. Linear Regression and Correlation Data: y a quantitative response variable x a quantitative explanatory variable (Chap. 8: Recall that both variables were categorical) For example, y = annual income,
More informationOn the Triangle Test with Replications
On the Triangle Test with Replications Joachim Kunert and Michael Meyners Fachbereich Statistik, University of Dortmund, D-44221 Dortmund, Germany E-mail: kunert@statistik.uni-dortmund.de E-mail: meyners@statistik.uni-dortmund.de
More informationEvaluating the Sensitivity of Goodness-of-Fit Indices to Data Perturbation: An Integrated MC-SGR Approach
Evaluating the Sensitivity of Goodness-of-Fit Indices to Data Perturbation: An Integrated MC-SGR Approach Massimiliano Pastore 1 and Luigi Lombardi 2 1 Department of Psychology University of Cagliari Via
More informationThis gives us an upper and lower bound that capture our population mean.
Confidence Intervals Critical Values Practice Problems 1 Estimation 1.1 Confidence Intervals Definition 1.1 Margin of error. The margin of error of a distribution is the amount of error we predict when
More information2/26/2017. This is similar to canonical correlation in some ways. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2
PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 What is factor analysis? What are factors? Representing factors Graphs and equations Extracting factors Methods and criteria Interpreting
More informationChained Versus Post-Stratification Equating in a Linear Context: An Evaluation Using Empirical Data
Research Report Chained Versus Post-Stratification Equating in a Linear Context: An Evaluation Using Empirical Data Gautam Puhan February 2 ETS RR--6 Listening. Learning. Leading. Chained Versus Post-Stratification
More informationStatistics Introductory Correlation
Statistics Introductory Correlation Session 10 oscardavid.barrerarodriguez@sciencespo.fr April 9, 2018 Outline 1 Statistics are not used only to describe central tendency and variability for a single variable.
More informationFactor Analysis. Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA
Factor Analysis Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA 1 Factor Models The multivariate regression model Y = XB +U expresses each row Y i R p as a linear combination
More informationI L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN
Introduction Edps/Psych/Stat/ 584 Applied Multivariate Statistics Carolyn J Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN c Board of Trustees,
More informationGlossary. Appendix G AAG-SAM APP G
Appendix G Glossary Glossary 159 G.1 This glossary summarizes definitions of the terms related to audit sampling used in this guide. It does not contain definitions of common audit terms. Related terms
More information36-309/749 Experimental Design for Behavioral and Social Sciences. Dec 1, 2015 Lecture 11: Mixed Models (HLMs)
36-309/749 Experimental Design for Behavioral and Social Sciences Dec 1, 2015 Lecture 11: Mixed Models (HLMs) Independent Errors Assumption An error is the deviation of an individual observed outcome (DV)
More informationFactor analysis. George Balabanis
Factor analysis George Balabanis Key Concepts and Terms Deviation. A deviation is a value minus its mean: x - mean x Variance is a measure of how spread out a distribution is. It is computed as the average
More informationLINKING IN DEVELOPMENTAL SCALES. Michelle M. Langer. Chapel Hill 2006
LINKING IN DEVELOPMENTAL SCALES Michelle M. Langer A thesis submitted to the faculty of the University of North Carolina at Chapel Hill in partial fulfillment of the requirements for the degree of Master
More informationMethodology Review: Applications of Distribution Theory in Studies of. Population Validity and Cross Validity. James Algina. University of Florida
Distribution Theory 1 Methodology eview: Applications of Distribution Theory in Studies of Population Validity and Cross Validity by James Algina University of Florida and H. J. Keselman University of
More informationChs. 15 & 16: Correlation & Regression
Chs. 15 & 16: Correlation & Regression With the shift to correlational analyses, we change the very nature of the question we are asking of our data. Heretofore, we were asking if a difference was likely
More informationWooldridge, Introductory Econometrics, 3d ed. Chapter 9: More on specification and data problems
Wooldridge, Introductory Econometrics, 3d ed. Chapter 9: More on specification and data problems Functional form misspecification We may have a model that is correctly specified, in terms of including
More informationDraft Proof - Do not copy, post, or distribute. Chapter Learning Objectives REGRESSION AND CORRELATION THE SCATTER DIAGRAM
1 REGRESSION AND CORRELATION As we learned in Chapter 9 ( Bivariate Tables ), the differential access to the Internet is real and persistent. Celeste Campos-Castillo s (015) research confirmed the impact
More informationContents. Acknowledgments. xix
Table of Preface Acknowledgments page xv xix 1 Introduction 1 The Role of the Computer in Data Analysis 1 Statistics: Descriptive and Inferential 2 Variables and Constants 3 The Measurement of Variables
More informationWELCOME! Lecture 14: Factor Analysis, part I Måns Thulin
Quantitative methods II WELCOME! Lecture 14: Factor Analysis, part I Måns Thulin The first factor analysis C. Spearman (1904). General intelligence, objectively determined and measured. The American Journal
More informationFour Parameters of Interest in the Evaluation. of Social Programs. James J. Heckman Justin L. Tobias Edward Vytlacil
Four Parameters of Interest in the Evaluation of Social Programs James J. Heckman Justin L. Tobias Edward Vytlacil Nueld College, Oxford, August, 2005 1 1 Introduction This paper uses a latent variable
More informationOverview. Overview. Overview. Specific Examples. General Examples. Bivariate Regression & Correlation
Bivariate Regression & Correlation Overview The Scatter Diagram Two Examples: Education & Prestige Correlation Coefficient Bivariate Linear Regression Line SPSS Output Interpretation Covariance ou already
More informationSTAT 730 Chapter 9: Factor analysis
STAT 730 Chapter 9: Factor analysis Timothy Hanson Department of Statistics, University of South Carolina Stat 730: Multivariate Data Analysis 1 / 15 Basic idea Factor analysis attempts to explain the
More informationDESIGNING EXPERIMENTS AND ANALYZING DATA A Model Comparison Perspective
DESIGNING EXPERIMENTS AND ANALYZING DATA A Model Comparison Perspective Second Edition Scott E. Maxwell Uniuersity of Notre Dame Harold D. Delaney Uniuersity of New Mexico J,t{,.?; LAWRENCE ERLBAUM ASSOCIATES,
More informationLatent Trait Reliability
Latent Trait Reliability Lecture #7 ICPSR Item Response Theory Workshop Lecture #7: 1of 66 Lecture Overview Classical Notions of Reliability Reliability with IRT Item and Test Information Functions Concepts
More informationSampling Distributions
Sampling Distributions Sampling Distribution of the Mean & Hypothesis Testing Remember sampling? Sampling Part 1 of definition Selecting a subset of the population to create a sample Generally random sampling
More informationSummary Report: MAA Program Study Group on Computing and Computational Science
Summary Report: MAA Program Study Group on Computing and Computational Science Introduction and Themes Henry M. Walker, Grinnell College (Chair) Daniel Kaplan, Macalester College Douglas Baldwin, SUNY
More informationReview of Statistics 101
Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods
More informationFIT CRITERIA PERFORMANCE AND PARAMETER ESTIMATE BIAS IN LATENT GROWTH MODELS WITH SMALL SAMPLES
FIT CRITERIA PERFORMANCE AND PARAMETER ESTIMATE BIAS IN LATENT GROWTH MODELS WITH SMALL SAMPLES Daniel M. McNeish Measurement, Statistics, and Evaluation University of Maryland, College Park Background
More informationChs. 16 & 17: Correlation & Regression
Chs. 16 & 17: Correlation & Regression With the shift to correlational analyses, we change the very nature of the question we are asking of our data. Heretofore, we were asking if a difference was likely
More informationModel II (or random effects) one-way ANOVA:
Model II (or random effects) one-way ANOVA: As noted earlier, if we have a random effects model, the treatments are chosen from a larger population of treatments; we wish to generalize to this larger population.
More information