
Agonistic Display in Betta splendens: Data Analysis
By Joanna Weremjiwicz, Simeon Yurek, and Dana Krempels

Once you have collected data with your ethogram, you are ready to analyze those data to see whether they support rejection of, or failure to reject, your null hypothesis. This chapter will help you with statistical analysis.

I. Betta splendens Research: Parametric or Non-parametric Data?

There are essentially three different types of data you might have collected from your Betta splendens subjects.

- Duration of a particular behavior (seconds; discrete numerical data)
  Example: How long were operculum flares in treatment and control groups?
  Note: Whole seconds are discrete numerical data. Do not use fractions of seconds.
- Counts/incidences of a particular behavior (integer; discrete numerical data)
  Example: How many times did fish in treatment and control groups flare opercula?
- Time to start/end of a particular behavior (seconds; discrete numerical data)
  Example: How long until the first (or nth) operculum flare in treatment and control groups?
  Note: Whole seconds are discrete numerical data. Do not use fractions of seconds.

The type of data your team collected determines the type of statistical analysis to employ. The flow chart shown in Figure 1 will help you decide which test is most appropriate for the data you have collected from your Betta splendens.

II. Non-parametric Statistics: Mann-Whitney and Kruskal-Wallis Tests

If your team collected data that can be counted as integers (e.g., number of incidences of a behavior, or whole seconds), then you will use the Mann-Whitney U test if you have only two experimental groups to compare, or the Kruskal-Wallis test if you have more than two experimental groups to compare. The Mann-Whitney U can be considered a non-parametric analog to the (parametric) t-test, whereas the Kruskal-Wallis test can be considered a non-parametric analog to (parametric) ANOVA.
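The two-versus-more-than-two decision described above can be written as a tiny rule of thumb in code. This is a hypothetical helper for illustration only, not part of any statistics library:

```python
def choose_test(n_groups):
    """Return the non-parametric test suggested in the text for
    comparing n_groups independent experimental groups."""
    if n_groups < 2:
        raise ValueError("need at least two groups to compare")
    # Two groups -> Mann-Whitney U; more than two -> Kruskal-Wallis.
    return "Mann-Whitney U" if n_groups == 2 else "Kruskal-Wallis"
```

For example, a design with Control, Treatment A, and Treatment B (three groups) points to the Kruskal-Wallis test.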
Neither the Mann-Whitney nor the Kruskal-Wallis test requires a normal distribution of data. However, both require independent observations of data that can be ranked. The tests compare the rank positions of data points (rather than the values of the data points themselves), determining the degree of overlap between the groups being compared.

A. Mann-Whitney U Test (a.k.a. Wilcoxon rank-sum test)

The Mann-Whitney test is used to compare data sets from two groups (e.g., treatment and control). It requires that the sample size for each group be less than 30. The test statistic generated by the Mann-Whitney test is known as the U statistic. (For sample sizes greater than 30, when data approximate a normal distribution, a Z statistic should be calculated instead of a U statistic.)

The experimental hypotheses for the Mann-Whitney test are, generically:

H_O: The two samples have the same rank-ordered positions of data.
  Example: Experimental Group A (e.g., control) has a similar number of fish exhibiting the same number of operculum flares as Experimental Group B (e.g., treatment).
H_A: The two samples come from populations with different rank-ordered positions of data.
  Example: Experimental Group A (e.g., control) has more fish exhibiting a greater number of operculum flares than Experimental Group B (e.g., treatment).

The Mann-Whitney test is highly sensitive to the number of interchanges in rank between the two experimental groups, but it has less power to detect departures from the null hypothesis than some other tests. For a full explanation of the logic of the Mann-Whitney U test, go to http://vassarstats.net/textbook/ch11a.html (Thanks, Vassar Stats!)

Figure 1. Flow chart for determining the appropriate statistical test. Created by Tom de Jong and Frans Jacobs, Universiteit Leiden, Netherlands. (http://science.leidenuniv.nl/index.php/ibl/pep/people/tom_de_jong/teaching)

Let's say your team has counted the number of operculum flares in two groups of fish, one exposed to a living rival fish and the other exposed to a mirror image of itself. An example of raw data appears in Table 1a. The raw data are ranked in Table 1b.

Table 1a. Number of operculum flares per minute in male Betta splendens exposed to a live rival (Group A) versus a mirror image (Group B).

Replicate   Group A   Group B
    1          20        1
    2          26        4
    3          30        3
    4          21        7
    5           4        2
    6          24        0

Table 1b. Ranked values for the number of operculum flares per minute. Values are ranked from lowest to highest. Each Group A value gets one point for every Group B value that appears below it in the table (i.e., is larger), and each Group B value gets one point for every Group A value that appears below it. (For example, the first value, 0 for Group B, has six Group A values below it, so it gets 6 points.) In case of a tie: if two values are the same, the rank of both values is the average of the two ranks, and the points assigned to each value equal the number of opposite-group values below the tied value plus 0.5 point for the tie. (See the table for an example.)

Rank   # of Flares   Group   Points
 1          0          B       6
 2          1          B       6
 3          2          B       6
 4          3          B       6
 5.5        4          B       5.5
 5.5        4          A       1.5
 7          7          B       5
 8         20          A       0
 9         21          A       0
10         24          A       0
11         26          A       0
12         30          A       0

The U statistic for each group is the sum of its points. For our example:

U for Group A = 1.5 + 0 + 0 + 0 + 0 + 0 = 1.5
U for Group B = 6 + 6 + 6 + 6 + 5.5 + 5 = 34.5

If your team chose to record a discrete number of behaviors, use the Mann-Whitney U to determine whether the number of occurrences of the behavior in your two groups shows significant overlap (suggesting the groups are not very different, and you will fail to reject your null hypothesis) or not (suggesting the groups are different, and you may find a P value consistent with rejecting your null hypothesis).
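The point-counting procedure in Table 1b takes only a few lines of code. This is a sketch of the counting method described above, using the Table 1a data; the function name is my own:

```python
# Raw data from Table 1a (operculum flares per minute).
group_a = [20, 26, 30, 21, 4, 24]  # exposed to live rival
group_b = [1, 4, 3, 7, 2, 0]       # exposed to mirror image

def u_statistic(x, y):
    """For each value in x, count the values in y that exceed it
    (one point each; a tie contributes half a point), then sum.
    This is the U value for group x, as described in the text."""
    points = 0.0
    for xi in x:
        for yi in y:
            if yi > xi:
                points += 1.0
            elif yi == xi:
                points += 0.5
    return points

u_a = u_statistic(group_a, group_b)  # 1.5, matching the worked example
u_b = u_statistic(group_b, group_a)  # 34.5, matching the worked example
```

As a sanity check, the two U values always sum to n1 × n2 (here 6 × 6 = 36).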

Your final U statistic is the smaller of the two values you calculate for your two experimental groups. In the above example, the smaller value is 1.5. In general, the lower the value of the U statistic, the less overlap there is between the two groups being compared. In this example, there is only one overlapping value, suggesting that the two groups are very different.

Use the table of critical values for the Mann-Whitney U (Table 2) to evaluate your own team's U statistic. If your U value is smaller than the critical value shown in the table for your two sample sizes, there is less than a 5% chance that the difference between your two experimental groups is due to chance alone, and you should reject your null hypothesis. If your U value is larger than the tabled value, fail to reject your null hypothesis.

Table 2. Critical values for the Mann-Whitney U statistic. Find the value that corresponds to the sample sizes of your two experimental groups. (From The Open Door Web Site, http://www.saburchill.com/)
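If SciPy is available, a hand calculation like the one above can be cross-checked in software. A sketch, assuming the Table 1a data; note that `scipy.stats.mannwhitneyu` reports a U value for its first sample, so the smaller of the two possible U values (the one critical-value tables are indexed by) must be taken explicitly:

```python
from scipy.stats import mannwhitneyu

group_a = [20, 26, 30, 21, 4, 24]
group_b = [1, 4, 3, 7, 2, 0]

res = mannwhitneyu(group_a, group_b, alternative="two-sided")
# Convert to the smaller of the two U values, which is the final
# U statistic described in the text (min of U and n1*n2 - U).
u_small = min(res.statistic, len(group_a) * len(group_b) - res.statistic)
print(u_small, res.pvalue)
```

SciPy also reports a P value directly, which should agree with the table-based conclusion.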

B. Kruskal-Wallis Test

This test is appropriate if you have more than two experimental groups (for example, Control, Treatment A, and Treatment B). Each group should contain at least five observations. The test statistic you will calculate is H, and its sampling distribution closely follows that of chi-square (χ²). The table of chi-square critical values can be found at the end of this chapter.

The generalized hypotheses for a Kruskal-Wallis test are:

H_O: Rank positions of data from the several populations do not differ. (Example: Control, Treatment A, and Treatment B have similar numbers of fish exhibiting the same number of operculum flares.)
H_A: Rank positions of data from the several populations differ. (Example: Control, Treatment A, and Treatment B have different numbers of fish exhibiting the same number of operculum flares.)

As in the Mann-Whitney test, you will not be considering the actual raw values of your data points, but rather their ranks relative to one another. As before, you wish to determine the degree to which the different groups' data overlap. The less overlap between groups, the more likely that the difference between them is real, indicating rejection of H_O.

Consider the imaginary data set shown in Table 3, which shows the duration of operculum flares in fish subjected to three different experimental conditions. (NOTE: Subjecting a single experimental fish to all three conditions is more powerful than using different individuals, because it reduces the variability inherent in using different individuals. However, one must consider the effects of previous treatments on an individual, and thus be sure not only to allow adequate recovery time between the three treatments, but also to randomize the order of the treatments across experimental replications. The calculations shown below are for independent, not paired/grouped, samples.)

Table 3. Number of seconds of operculum flaring in a one-minute trial for three experimental groups. The rank of each value (ranked from lowest to highest across all 18 values) appears in parentheses beside it; tied values receive the average of their ranks. Sums and means of ranks appear in the bottom two rows (means are shown rounded; the calculations below use unrounded values).

Replicate            Treatment A   Treatment B   Control
    1                  20 (8)        40 (16)      1 (2)
    2                  26 (11)       32 (14)      4 (5.5)
    3                  30 (13)       47 (18)      3 (4)
    4                  21 (9)        29 (12)      7 (7)
    5                   4 (5.5)      41 (17)      2 (3)
    6                  24 (10)       35 (15)      0 (1)
Sum of ranks (T)     56.5 (T_A)    92 (T_B)     22.5 (T_C)    171 (T_all)
Mean of ranks (M)    9.42 (M_A)    15.33 (M_B)  3.75 (M_C)    9.5 (M_all)

A measure of the combined degree to which group ranks differ is known as the sum of squared deviates between groups (SS_bg, in which "bg" stands for "between groups"). The squared deviate for any particular group (SS_grp, in which "grp" stands for the specific group: A, B, or C) is the squared difference between the group's mean rank and the combined mean rank of all groups, multiplied by the sample size of that group:

SS_grp = n_grp (M_grp − M_all)²

Thus,
for Treatment A: SS_A = 6(9.42 − 9.5)² ≈ 0.04
for Treatment B: SS_B = 6(15.33 − 9.5)² ≈ 204.17
and for Control: SS_C = 6(3.75 − 9.5)² ≈ 198.38

The sum of squared deviates between groups, SS_bg, is the sum of the squared deviates for all groups. In our example:

SS_bg ≈ 0.04 + 204.17 + 198.38 ≈ 402.59

The logic of the Kruskal-Wallis test is fairly straightforward, and an excellent, easy-to-understand explanation can be found here: http://vassarstats.net/textbook/ch14a.html (Thanks again, Vassar Stats!) For the purpose of expediency in this chapter, however, we will cut to the chase and go straight to the calculation of the Kruskal-Wallis test statistic, H. This statistic is a ratio with your observed sum of squared deviates as the numerator (in our example, 402.59) and the expected sum of squared deviates of the sampling distribution to which your sample belongs as the denominator. The denominator is N(N+1)/12, in which N is the total number of observations (in our example, N = 18, the sum of the replicates in all three experimental groups):

H = SS_bg / [N(N+1)/12]

For our example:

H = 402.59 / [18(18+1)/12] = 402.59 / 28.5 ≈ 14.13

Conveniently, if each of the experimental groups has yielded at least five observations, the sampling distribution of H is very similar to that of chi-square with degrees of freedom df = k − 1, in which k is the number of experimental groups (in our example, k = 3). A table of critical values for chi-square can be found in Table 4. In our example, df = 3 − 1 = 2. Our value of H (14.13) at 2 degrees of freedom lies to the right of the largest value shown (10.597), which is associated with a P value of 0.005. The probability that this lack of overlap is due to chance alone is very small (P < 0.005). Hence, we reject the null hypothesis.
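The entire Kruskal-Wallis computation can be scripted directly from the raw Table 3 scores and cross-checked against SciPy's implementation (which applies a small additional correction for tied ranks, so its H differs slightly from the hand value). A sketch; the helper names are my own:

```python
from scipy.stats import chi2, kruskal

# Raw scores from Table 3 (seconds of operculum flaring).
groups = {
    "Treatment A": [20, 26, 30, 21, 4, 24],
    "Treatment B": [40, 32, 47, 29, 41, 35],
    "Control":     [1, 4, 3, 7, 2, 0],
}

# Rank all 18 values together, giving tied values the average rank.
pooled = sorted(v for vals in groups.values() for v in vals)
rank_of = {v: sum(i + 1 for i, x in enumerate(pooled) if x == v)
              / pooled.count(v)
           for v in set(pooled)}

n_total = len(pooled)        # N = 18
m_all = (n_total + 1) / 2    # mean of ranks 1..N = 9.5

# SS_bg = sum over groups of n_grp * (M_grp - M_all)^2
ss_bg = 0.0
for vals in groups.values():
    ranks = [rank_of[v] for v in vals]
    m_grp = sum(ranks) / len(ranks)
    ss_bg += len(ranks) * (m_grp - m_all) ** 2

h = ss_bg / (n_total * (n_total + 1) / 12)  # hand-calculated H
h_scipy, p = kruskal(*groups.values())      # SciPy's tie-corrected H
critical = chi2.ppf(1 - 0.005, df=2)        # 0.005 critical value, ~10.597
```

Comparing `h` against `critical` (or `p` against 0.005) reproduces the reject-the-null conclusion reached above.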

Armed with these examples, you should now be able to apply these statistical methods to your own data.

Table 4. A partial table of critical values for the Kruskal-Wallis H (chi-square) statistic.