Statistical Analysis for QBIC Genetics
Adapted by Ellen G. Dow 2017

I. χ² or chi-square test

Objectives:
Compare how closely an experimentally derived value agrees with an expected value.
Learn one method to estimate the probability that an observed value did or did not occur by chance.

The χ² test helps us determine whether or not there is a statistically significant difference between the expected and observed values. We choose a somewhat arbitrary cut-off for significance, typically a p value of 0.05 or 0.01, meaning a 5 or 1 percent probability that a deviation this large would arise by chance alone.

The χ² test also involves two hypotheses, known as the null and alternative hypotheses. The null hypothesis, H0, states that there is no significant difference between the observed and expected values. The alternative hypothesis, HA, states that there is a significant difference between the observed and expected values.

The χ² or chi-square analysis begins with finding the χ² value. For each category, subtract the expected value from the observed value, square the difference, and divide by the expected value; then add these terms across all categories. As an equation:

$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$

Abbreviating observed as O and expected as E simplifies the general equation to:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Once you have a χ² value, calculate your degrees of freedom, or df:

$$df = n - 1$$

where n is the number of discrete categories, each of which has an observed and an expected value. The degrees of freedom are the number of categories whose values can vary freely once the total is fixed.
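The χ² value and degrees of freedom can be sketched in Python as follows. This is a minimal illustration, not part of the original handout: the function names `expected_counts` and `chi_square` are hypothetical, and the counts (70 and 30 offspring tested against an expected 3:1 ratio) are made-up numbers chosen only to show the arithmetic.

```python
def expected_counts(ratio, total):
    """Split a total count across categories in the proportions given by `ratio`."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]


def chi_square(observed, expected):
    """Return (chi-square value, degrees of freedom) for matching category lists."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must list the same categories")
    # Sum of (O - E)^2 / E over every category
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # df = n - 1, where n is the number of categories
    return chi2, df


# Hypothetical monohybrid cross: 100 offspring, expected 3:1 ratio
observed = [70, 30]
expected = expected_counts([3, 1], sum(observed))  # [75.0, 25.0]
chi2, df = chi_square(observed, expected)
print(f"chi-square = {chi2:.2f}, df = {df}")  # chi-square = 1.33, df = 1
```

Working it by hand gives the same result: (70 − 75)²/75 + (30 − 25)²/25 = 0.33 + 1.00 = 1.33, with 2 − 1 = 1 degree of freedom.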
Using the df and χ² value, we can use the χ² table to find the probability, p. First, go down the df column until you reach your df; this is the row in which you will look for your χ² value. From the df, move to the right along that row. Your χ² value will most likely not equal any number in the table exactly, so look for the two columns your χ² value falls between. Once you have found the two table values on either side of your χ² value, go up to the top row to find the corresponding probabilities, or p values. Notice that the p values range from 0 to 1, and that larger χ² values correspond to smaller p values.

Chi-Square Table (rows: degrees of freedom, df; columns: probability, p)

df   0.95   0.90   0.80   0.70   0.50   0.30   0.20   0.10   0.05   0.01  0.001
 1  0.004   0.02   0.06   0.15   0.46   1.07   1.64   2.71   3.84   6.64  10.83
 2   0.10   0.21   0.45   0.71   1.39   2.41   3.22   4.60   5.99   9.21  13.82
 3   0.35   0.58   1.01   1.42   2.37   3.66   4.64   6.25   7.82  11.34  16.27
 4   0.71   1.06   1.65   2.20   3.36   4.88   5.99   7.78   9.49  13.28  18.47
 5   1.14   1.61   2.34   3.00   4.35   6.06   7.29   9.24  11.07  15.09  20.52
 6   1.63   2.20   3.07   3.83   5.35   7.23   8.56  10.64  12.59  16.81  22.46
 7   2.17   2.83   3.82   4.67   6.35   8.38   9.80  12.02  14.07  18.48  24.32
 8   2.73   3.49   4.59   5.53   7.34   9.52  11.03  13.36  15.51  20.09  26.12
 9   3.32   4.17   5.38   6.39   8.34  10.66  12.24  14.68  16.92  21.67  27.88
10   3.94   4.86   6.18   7.27   9.34  11.78  13.44  15.99  18.31  23.21  29.59

Non-significant: columns p = 0.95 to 0.10. Significant: columns p = 0.05 to 0.001.

The p values correspond to a 0-100% probability that the deviation occurred by chance. You can write your p value as:

higher probability > p value > lower probability

This means that your p value is less than the higher probability but greater than the lower probability, the two p values at the top of the columns your χ² value falls between. For example, χ² = 5.00 with df = 2 falls between 4.60 (p = 0.10) and 5.99 (p = 0.05), so 0.10 > p > 0.05.
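The table lookup described above can be mimicked with a small Python function that walks along the df row and reports which two columns the χ² value falls between. This sketch only knows the values printed in the handout's table (df 1 to 10), and `p_range` is a hypothetical helper name.

```python
# p values across the top of the chi-square table, kept as text to preserve "0.50" etc.
P_COLUMNS = ["0.95", "0.90", "0.80", "0.70", "0.50", "0.30",
             "0.20", "0.10", "0.05", "0.01", "0.001"]

# Critical chi-square values for each df, copied from the table
TABLE = {
    1: [0.004, 0.02, 0.06, 0.15, 0.46, 1.07, 1.64, 2.71, 3.84, 6.64, 10.83],
    2: [0.10, 0.21, 0.45, 0.71, 1.39, 2.41, 3.22, 4.60, 5.99, 9.21, 13.82],
    3: [0.35, 0.58, 1.01, 1.42, 2.37, 3.66, 4.64, 6.25, 7.82, 11.34, 16.27],
    4: [0.71, 1.06, 1.65, 2.20, 3.36, 4.88, 5.99, 7.78, 9.49, 13.28, 18.47],
    5: [1.14, 1.61, 2.34, 3.00, 4.35, 6.06, 7.29, 9.24, 11.07, 15.09, 20.52],
    6: [1.63, 2.20, 3.07, 3.83, 5.35, 7.23, 8.56, 10.64, 12.59, 16.81, 22.46],
    7: [2.17, 2.83, 3.82, 4.67, 6.35, 8.38, 9.80, 12.02, 14.07, 18.48, 24.32],
    8: [2.73, 3.49, 4.59, 5.53, 7.34, 9.52, 11.03, 13.36, 15.51, 20.09, 26.12],
    9: [3.32, 4.17, 5.38, 6.39, 8.34, 10.66, 12.24, 14.68, 16.92, 21.67, 27.88],
    10: [3.94, 4.86, 6.18, 7.27, 9.34, 11.78, 13.44, 15.99, 18.31, 23.21, 29.59],
}


def p_range(chi2, df):
    """Bracket p between two table columns, written as 'higher > p > lower'."""
    row = TABLE[df]
    for i, cutoff in enumerate(row):
        if chi2 < cutoff:
            if i == 0:
                return "p > 0.95"  # smaller than every value in the row
            return f"{P_COLUMNS[i - 1]} > p > {P_COLUMNS[i]}"
    return "p < 0.001"  # larger than every value in the row


print(p_range(1.33, 1))  # 0.30 > p > 0.20
```

For the made-up value χ² = 1.33 with df = 1, the function lands between 1.07 and 1.64 in the df = 1 row, so it reports 0.30 > p > 0.20. The two edge cases match the "off the chart" rules for very small and very large χ² values.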
There is also the case in which the χ² value is very, very small, smaller than every value in its row of the table. This corresponds to a p value greater than 0.95, written as p > 0.95. At the other end of the spectrum, the χ² value may be very large, beyond every value in the table. In this case, write the p value as p < 0.001.

When writing your own conclusion based on the results of a χ² test, include the specifics of what you measured using the observed and expected values, the p value, and the conclusion in terms of the null hypothesis:

p > 0.05: There is not a significant difference between the expected and observed values (state specifically what you measured and give the p value); therefore, we fail to reject the null hypothesis.
p < 0.05: There is a significant difference between the expected and observed values (state specifically what you measured and give the p value); therefore, we reject the null hypothesis.
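Because larger χ² values always mean smaller p values, comparing your χ² value with the number in the 0.05 column of your df row gives the same decision as comparing p with 0.05. A minimal Python sketch of that decision rule follows; `conclusion` is a hypothetical name, and the cut-offs are copied from the 0.05 column of the chi-square table in this handout.

```python
# Critical values from the p = 0.05 column of the chi-square table, keyed by df
CRITICAL_0_05 = {1: 3.84, 2: 5.99, 3: 7.82, 4: 9.49, 5: 11.07,
                 6: 12.59, 7: 14.07, 8: 15.51, 9: 16.92, 10: 18.31}


def conclusion(chi2, df):
    """A chi-square value above the 0.05 cut-off means p < 0.05."""
    if chi2 > CRITICAL_0_05[df]:
        return "p < 0.05: significant difference; reject the null hypothesis"
    return "p > 0.05: no significant difference; fail to reject the null hypothesis"


print(conclusion(1.33, 1))  # below 3.84, so we fail to reject H0
```

Your written conclusion should still name what you measured and give the bracketed p value; the function only captures the final reject or fail-to-reject step.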
If we go back to the dihybrid cross from the previous unit, we can hypothesize the offspring ratio for a female heterozygous for vestigial and stubble crossed with a male heterozygous for vestigial and stubble.

1. Female vg/+; s/+ x male vg/+; s/+
F2 genotypic ratio:
F2 phenotypic ratio:
Phenotype            Expected    Observed
Wildtype                               35
Vestigial                              17
Stubble                                12
Vestigial/Stubble                       7
Totals

A) Fill in the expected number of each phenotype.
B) Write the appropriate null hypothesis:
C) Write the appropriate alternative hypothesis:
D) Perform a Chi Square test. Show all your work.
E) What is the Chi Square value?
F) What are the degrees of freedom?
G) What is the associated P-value?
H) What is your conclusion in terms of the null hypothesis?

When is it appropriate to use the Chi-Square test?
II. T-test

Objectives:
Understand that a t-test is a second way to obtain a p value, or measure probability.
Be able to perform a t-test in Excel with a data set.

The t-test takes a different approach to finding the probability of significance. Rather than comparing observed values with expected values, as in the χ² test, it measures the statistical difference between two sets of measured values. The general equation used is:

$$t = \frac{\text{Estimate of the parameter} - \text{Hypothesized value of the parameter}}{\text{Estimated standard error of the estimator}}$$

Estimate of the parameter = the difference between the sample means of the two groups
Hypothesized value of the parameter = the expected difference between the two sample means under the null hypothesis (no difference, so 0)
Estimated standard error of the estimator = the standard error of the data set, which reflects the variability due to sampling (calculated differently for equal versus unequal variance)

In other words, the two-sample t statistic is the difference between the two sample means divided by the standard error of that difference.

With this test, you create two hypotheses (or more!) about the variable whose impact you are measuring. Here we have a control group and a manipulated group. Both groups experience exactly the same conditions except for one variable, so we can measure the impact of that variable by comparing the two groups. The group that does not receive the variable factor is called the control group.

The null hypothesis, H0, states that there is no significant difference between the two groups (the group that has the variable and the control):

$$\text{mean}_1 = \text{mean}_2$$
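To make the general equation concrete, the sketch below computes the two-sample t statistic in Python, with the hypothesized difference set to 0 as the null hypothesis states. It is an illustration under stated assumptions rather than Excel's internal calculation: `two_sample_t` is a hypothetical name, the unequal-variance standard error is Welch's form, the equal-variance version uses a pooled variance, and the example data are made up.

```python
import math
import statistics


def two_sample_t(group1, group2, equal_variance=False):
    """t = (estimate - hypothesized value) / estimated standard error."""
    n1, n2 = len(group1), len(group2)
    mean1, mean2 = statistics.mean(group1), statistics.mean(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)  # sample variances

    estimate = mean1 - mean2  # difference between the two sample means
    hypothesized = 0.0        # H0: no difference between the means

    if equal_variance:
        # Pooled variance: both groups are assumed to share one variance
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    else:
        # Unequal variance (Welch): each group keeps its own variance
        se = math.sqrt(v1 / n1 + v2 / n2)

    return (estimate - hypothesized) / se


# Made-up measurements for a manipulated group and a control group
print(round(two_sample_t([5.0, 7.0, 9.0], [1.0, 3.0, 5.0]), 2))  # 2.45
```

For these made-up groups both forms give t ≈ 2.45, because the two groups happen to have the same size and the same variance; with unequal sizes or spreads the two standard errors differ.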
The alternative hypothesis, HA, states that there is a significant difference between the two groups that depends on the variable. We can state simply that the means are not equal, or we can state that the variable produces a difference in a particular direction (greater than or less than).

The two means are not equal:

$$\text{mean}_1 \neq \text{mean}_2$$

One mean is less than the other:

$$\text{mean}_1 < \text{mean}_2$$

One mean is greater than the other:

$$\text{mean}_1 > \text{mean}_2$$

Within Microsoft Excel, we can use the t-test function TTEST (newer versions of Excel also provide it as T.TEST). It takes four values:

=TTEST(array1, array2, tails, type)

array1 = all cells containing results from sample group 1
array2 = all cells containing results from sample group 2
tails = 1 or 2, depending on the alternative hypothesis: use 2 when HA only states that the means are not equal, and 1 when HA states that one mean is greater than (or less than) the other
type = 1, 2, or 3:
1 = paired
2 = two-sample, equal variance
3 = two-sample, unequal variance

Because we are not going to test the equality of variance, we will use either type 1 or type 3.

Paired t-test: the same sample is used to measure a variable before and after (repeated measures). In this case, the same individuals are measured twice.
Unpaired t-test: two independent samples are used to measure the effect of a variable. In this case, each individual is measured only once.

Once you enter =TTEST(array1, array2, tails, type), for example =TTEST(A2:A21, B2:B21, 2, 3) with your own cell ranges in place of these illustrative ones, the cell will display the resulting p value.
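For readers who want to check Excel's output outside Excel, here is a rough Python analogue of =TTEST(array1, array2, tails, type). It is a sketch, not Microsoft's implementation: the function `t_test` and its helpers are hypothetical, the p value comes from the t distribution through the regularized incomplete beta function (evaluated with a standard continued fraction), and type 3 uses the Welch-Satterthwaite degrees of freedom. Results should agree closely with Excel, but this has not been checked against every Excel version.

```python
import math
import statistics


def _betacf(a, b, x, max_iter=300, eps=1e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def t_test(array1, array2, tails, type_):
    """Rough analogue of =TTEST(array1, array2, tails, type); returns a p value."""
    if type_ == 1:  # paired: the same individuals measured twice
        if len(array1) != len(array2):
            raise ValueError("paired data need equal-length arrays")
        diffs = [x - y for x, y in zip(array1, array2)]
        n = len(diffs)
        t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
        df = n - 1
    else:
        n1, n2 = len(array1), len(array2)
        v1, v2 = statistics.variance(array1), statistics.variance(array2)
        diff = statistics.mean(array1) - statistics.mean(array2)
        if type_ == 2:  # two-sample, equal variance (pooled)
            pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
            se = math.sqrt(pooled * (1 / n1 + 1 / n2))
            df = n1 + n2 - 2
        elif type_ == 3:  # two-sample, unequal variance (Welch)
            a, b = v1 / n1, v2 / n2
            se = math.sqrt(a + b)
            df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
        else:
            raise ValueError("type_ must be 1, 2, or 3")
        t = diff / se
    # Two-tailed p = P(|T| > |t|) for a t distribution with df degrees of freedom
    p_two = _betai(df / 2.0, 0.5, df / (df + t * t))
    if tails == 2:
        return p_two
    if tails == 1:
        return p_two / 2.0
    raise ValueError("tails must be 1 or 2")


print(t_test([2, 5], [1, 2], 2, 1))  # about 0.295
```

In this sketch, tails = 1 simply halves the two-tailed p value and does not check which mean is larger, so confirm that the difference goes in the direction your alternative hypothesis predicts.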
2. You will conduct a t-test using the height data to compare male and female heights.
Write the corresponding null and alternative hypotheses for this comparison.
Complete the t-test in Excel.
Write a conclusion in terms of the null hypothesis.