Chapter 3: Statistical methods for estimation and testing

Key reference: Statistical methods in bioinformatics by Ewens & Grant (2001).

3.1 Motivation:
1. Identify genes which show evidence of differential expression (DE). In general, a study may involve one or a small number of genes, or many thousands of genes, as in microarray experiments. In microarray experiments, a primary goal is to rank genes according to the evidence of DE.
2. Matching DNA sequences.
For microarrays, we need to:
(i) select one or more statistics to rank genes in order of evidence of DE, from strongest to weakest; and
(ii) choose a critical value for the ranking statistic, above which any value is considered statistically significant, and therefore DE.

There are practical constraints: in a typical study, only a limited number of genes can be followed up for further study. Note that in Chapter 3 we assume the data have been appropriately normalized using the methods studied in Chapter 2.
3.2 The simplest comparison:
Two mRNA samples A and B are hybridised to a single array (= slide), and the array is replicated n times. Assume each gene is spotted once on each slide. Replication is very important.

(i) A common approach to analysis is to calculate the average log ratio $\bar{M}_j$ for each gene and sort the genes according to the absolute value of $\bar{M}_j$. But this is a poor choice because it ignores the variability in expression levels for each gene. The variability of a gene's log ratios over replicates differs from gene to gene, and genes with larger variance have a good chance of giving a large $|\bar{M}_j|$ even if they are not DE.
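A minimal sketch of the pitfall just described, using made-up log ratios for two hypothetical genes (`gene_noisy`, `gene_steady`; not real data): ranking by $|\bar{M}_j|$ puts the high-variance gene first even though its replicates disagree in sign.

```python
from statistics import mean

# Hypothetical log ratios M for n = 4 replicate slides per gene (illustrative only).
log_ratios = {
    "gene_noisy": [3.5, -1.0, 3.0, 0.5],   # large spread, replicates disagree in sign
    "gene_steady": [0.9, 1.1, 1.0, 1.0],   # small spread, consistent shift
}

# Average log ratio per gene, then rank genes by |average|, largest first.
avg_m = {gene: mean(ms) for gene, ms in log_ratios.items()}
ranking = sorted(avg_m, key=lambda g: abs(avg_m[g]), reverse=True)

print(avg_m)
print(ranking)
```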
(ii) It is better to use the single-sample t statistic. For each gene j = 1, ..., g, calculate

$$t_j = \frac{\bar{M}_j}{s_j/\sqrt{n}},$$

where $s_j$ is the standard deviation of the gene's log ratios over the n replicates. When applied to microarray data, this is in fact a paired t statistic. Why?

The null hypothesis is $H_0: \mu_j = 0$ versus $H_A: \mu_j \neq 0$, for each gene j = 1, ..., g.
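A sketch of the single-sample t statistic $t_j = \bar{M}_j/(s_j/\sqrt{n})$, applied to hypothetical replicate log ratios for two made-up genes. Here `stdev` is the sample standard deviation (divisor n - 1); dividing by the standard error reverses the ranking that $|\bar{M}_j|$ alone would give.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(ms):
    """t_j = mean(M) / (s / sqrt(n)) for one gene's n replicate log ratios."""
    n = len(ms)
    return mean(ms) / (stdev(ms) / sqrt(n))

# Hypothetical replicate log ratios (illustrative only, not real data).
log_ratios = {
    "gene_noisy": [3.5, -1.0, 3.0, 0.5],
    "gene_steady": [0.9, 1.1, 1.0, 1.0],
}

t_stats = {gene: one_sample_t(ms) for gene, ms in log_ratios.items()}
ranking = sorted(t_stats, key=lambda g: abs(t_stats[g]), reverse=True)

print(t_stats)
print(ranking)
```

The noisy gene has the larger average but the smaller t, because its standard error is roughly 25 times larger.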
(iii) Penalised t statistic: this is a compromise between using $\bar{M}_j$ and the $t_j$ statistic. The aim is to avoid spuriously large $t_j$ values resulting from unrealistically small $s_j$. The penalty a is applied either to the estimated standard deviation $s_j$:

$$t_j = \frac{\bar{M}_j}{(a + s_j)/\sqrt{n}},$$

or to the estimated variance $s_j^2$:

$$t_j = \frac{\bar{M}_j}{\sqrt{(a + s_j^2)/n}}.$$
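A sketch of both penalised forms for one hypothetical gene whose replicates agree almost perfectly, so $s_j$ is tiny and the ordinary $t_j$ is huge. The penalty `a = 0.1` is an arbitrary value assumed only for illustration; setting `a = 0` recovers the unpenalised $t_j$ in either form.

```python
from math import sqrt
from statistics import mean, stdev, variance

def penalised_t_sd(ms, a):
    """Penalty added to the standard deviation: mean(M) / ((a + s) / sqrt(n))."""
    n = len(ms)
    return mean(ms) / ((a + stdev(ms)) / sqrt(n))

def penalised_t_var(ms, a):
    """Penalty added to the variance: mean(M) / sqrt((a + s^2) / n)."""
    n = len(ms)
    return mean(ms) / sqrt((a + variance(ms)) / n)

# Hypothetical gene with nearly identical replicates (s is about 0.008).
ms = [1.0, 1.01, 0.99, 1.0]
a = 0.1  # hypothetical penalty, chosen only for illustration

unpenalised = penalised_t_sd(ms, 0.0)  # a = 0 gives the ordinary t_j
print(unpenalised, penalised_t_sd(ms, a), penalised_t_var(ms, a))
```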
The penalties can be estimated in different ways; e.g., a may be the 90th percentile of the $s_j$ values. The choice is driven by empirical rather than theoretical considerations. Intensity-dependent penalties are also applied in practice. Always analyse the unadjusted t statistics. The wisdom of penalising is open to debate.
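A sketch of choosing a as the 90th percentile of the per-gene standard deviations. The $s_j$ values below are hypothetical, and the percentile convention is an assumption: `statistics.quantiles(..., method="inclusive")` interpolates linearly between order statistics, and other conventions give slightly different values.

```python
from statistics import quantiles

def penalty_from_percentile(sds, pct=90):
    """Return the pct-th percentile (1 <= pct <= 99) of the per-gene s_j values.

    quantiles(..., n=100) returns the 99 cut points for percentiles 1..99,
    so index pct - 1 is the pct-th percentile.
    """
    return quantiles(sds, n=100, method="inclusive")[pct - 1]

# Hypothetical per-gene standard deviations s_j (not real data).
sds = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
a = penalty_from_percentile(sds)
print(a)  # approximately 0.91
```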
[Figure: (i) standard error of M (standard deviation of log ratios) versus average gene intensity; (ii) normal qq-plot of the penalised t statistic (sample quantiles versus theoretical quantiles).]
Assessing differential expression

3.3 Ranking genes:
Suppose we calculate the t statistic for every gene on the array (this could be any number up to 20,000) and rank the absolute values of the t statistics. This gives a ranked list of genes in which the largest values of |t| provide the strongest evidence of differential expression. However, we have, in effect, just performed 20,000 t-tests! The next step is to choose a cut-off value above which genes will be flagged as statistically significant. How should we do this?
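A sketch of the ranking step and of flagging genes above a cut-off. The gene names, t values and the cut-off of 4.0 are all hypothetical; how the cut-off should actually be chosen is the multiple-testing question taken up in Chapter 4.

```python
def rank_by_abs_t(t_stats):
    """Sort (gene, t) pairs from strongest to weakest evidence of DE, i.e. by |t|."""
    return sorted(t_stats.items(), key=lambda item: abs(item[1]), reverse=True)

def flag_significant(t_stats, cutoff):
    """Genes whose |t| exceeds the cut-off, in ranked order."""
    return [gene for gene, t in rank_by_abs_t(t_stats) if abs(t) > cutoff]

# Hypothetical t statistics for four genes; the cut-off is arbitrary here.
t_stats = {"g1": 2.1, "g2": -8.4, "g3": 0.3, "g4": 5.0}
print(rank_by_abs_t(t_stats))
print(flag_significant(t_stats, cutoff=4.0))
```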
In attempting to determine which genes are truly DE, the aim is to control for the large amount of multiple testing inherent in conducting a test for each gene. See Chapter 4 on Multiple Comparisons.
An informal, graphical method that can be used to assess significance is to display the sorted t statistics in a normal quantile-quantile (qq) plot, or a t-distribution qq-plot. The idea is to look for points that deviate markedly from the line. The example on the next slide shows a qq-plot for t statistics with 4 degrees of freedom; the experiment compared two mutant cell lines in leukaemic mice on each slide. The method implicitly assumes that M is roughly normal and that the genes behave independently (which may not be true).
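A sketch of how the coordinates of a normal qq-plot can be computed. The `ppoints` helper below mirrors the plotting-position rule of R's `ppoints()` (which appears in the figure's axis label): $(i - a)/(n + 1 - 2a)$ with $a = 3/8$ for $n \le 10$ and $a = 1/2$ otherwise. The t values are hypothetical, the reference line is assumed to be y = x, and t-distribution quantiles are not used because the standard library only provides the normal distribution.

```python
from statistics import NormalDist

def ppoints(n):
    """Plotting positions (i - a) / (n + 1 - 2a), a = 3/8 if n <= 10 else 1/2."""
    a = 3 / 8 if n <= 10 else 0.5
    return [(i - a) / (n + 1 - 2 * a) for i in range(1, n + 1)]

def normal_qq_points(t_stats):
    """(theoretical normal quantile, sorted observed t) pairs for a normal qq-plot."""
    observed = sorted(t_stats)
    theoretical = [NormalDist().inv_cdf(p) for p in ppoints(len(observed))]
    return list(zip(theoretical, observed))

# Hypothetical t statistics; the value 9.5 should sit far off the line.
t_stats = [0.4, -1.2, 0.1, 1.0, -0.3, 9.5]
points = normal_qq_points(t_stats)

# Sort points by vertical distance from the assumed reference line y = x.
deviations = sorted(points, key=lambda p: abs(p[1] - p[0]), reverse=True)
print(deviations[0])
```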
[Figure: t qq-plot; sorted t statistics (t.statistics1a[, 1]) plotted against t quantiles with 4 degrees of freedom, qt(ppoints(t.statistics1a[, 1]), df = 4).]
3.4 More complex experiments:
One of the most commonly used designs in biological experiments is the reference design. The simplest such design compares two mRNA samples A and B through a reference sample, Ref; that is, A is compared with Ref, and B is compared with Ref. In terms of log ratios M, for each gene j we now have

$$M_{Aj} = \log(a_j/\mathrm{Ref}), \qquad M_{Bj} = \log(b_j/\mathrm{Ref}),$$

where A and B are labelled red and Ref is labelled green. The difference of interest is $M_{Aj} - M_{Bj}$.
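Assuming the reference intensity for gene j is the same on the A-versus-Ref and B-versus-Ref arrays (an idealisation, since these are separate hybridisations), the reference term cancels in the difference of interest, which is why it estimates the log ratio of A to B:

```latex
M_{Aj} - M_{Bj}
  = \log\frac{a_j}{\mathrm{Ref}} - \log\frac{b_j}{\mathrm{Ref}}
  = (\log a_j - \log \mathrm{Ref}) - (\log b_j - \log \mathrm{Ref})
  = \log\frac{a_j}{b_j}
```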
For ease of notation, we will drop the subscript j. In microarray experiments, there will be $n_1$ replicate arrays comparing (i.e. hybridising) A with Ref, and $n_2$ replicate arrays comparing B with Ref. The test statistic will then be based on $\bar{M}_A - \bar{M}_B$. We know that the optimal normal-theory statistic is the two-sample t statistic:

$$t = \frac{\bar{M}_A - \bar{M}_B}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}},$$

where $s_p$ is the pooled sample standard deviation.
The null hypothesis for each gene is that the expression levels in the two cell types A and B are the same, i.e., $H_0: \mu_A = \mu_B$ versus $H_a: \mu_A \neq \mu_B$. $s_p$ is sometimes replaced by the penalised pooled sample standard deviation, $s_p' = \sqrt{a + s_p^2}$.
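A sketch of the pooled two-sample t statistic for one gene, with the optional penalised pooled standard deviation $\sqrt{a + s_p^2}$. It assumes the usual pooled variance $s_p^2 = ((n_1 - 1)s_A^2 + (n_2 - 1)s_B^2)/(n_1 + n_2 - 2)$; the log ratios and the penalty `a = 1.0` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(ma, mb, a=0.0):
    """Two-sample t = (mean(M_A) - mean(M_B)) / (s_p * sqrt(1/n1 + 1/n2)).

    s_p^2 = ((n1 - 1) s_A^2 + (n2 - 1) s_B^2) / (n1 + n2 - 2).
    A penalty a > 0 replaces s_p by sqrt(a + s_p^2); a = 0 gives the ordinary statistic.
    """
    n1, n2 = len(ma), len(mb)
    sp2 = ((n1 - 1) * variance(ma) + (n2 - 1) * variance(mb)) / (n1 + n2 - 2)
    sp = sqrt(a + sp2)
    return (mean(ma) - mean(mb)) / (sp * sqrt(1 / n1 + 1 / n2))

# Hypothetical log ratios for one gene: A-vs-Ref slides and B-vs-Ref slides.
m_a = [1.0, 2.0, 3.0]
m_b = [4.0, 5.0, 6.0]
print(pooled_t(m_a, m_b), pooled_t(m_a, m_b, a=1.0))
```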
The example on the next page is from Dudoit et al. (2002) and shows a histogram of the observed two-sample t statistics and a normal qq-plot of the same statistics, from a study comparing lipid levels in treated (A) and control (B) mice. There were 16 slides in the experiment, 8 for treated and 8 for control mice, each hybridised to a common reference pool of mouse DNA (Ref). The points lying off the line are candidates for differential expression.
[Figure: histogram and normal qq-plot of the two-sample t statistics (ApoA1 data).]
Remarks on t statistics
The t statistic has the advantage of extending to more complex situations, such as factorial designs and multiple regression. The above approach to analysis can be generalised to more than two samples using F statistics, and so on. However, the two-sample t statistic assumes that the random variables $M_A$ and $M_B$ are normally distributed with equal variances, which may not be justified.
We can relax the equal-variance assumption by using the approximate unequal-variance form of the two-sample t statistic:

$$t = \frac{\bar{M}_A - \bar{M}_B}{\sqrt{\frac{s_A^2}{n_1} + \frac{s_B^2}{n_2}}}.$$

But there are better alternative approaches.
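A sketch of the unequal-variance statistic for one gene, using hypothetical log ratios with different spreads and sample sizes in the two groups. It computes only the statistic; the approximate reference distribution (for example Welch-Satterthwaite degrees of freedom) is not computed here.

```python
from math import sqrt
from statistics import mean, variance

def unequal_var_t(ma, mb):
    """t = (mean(M_A) - mean(M_B)) / sqrt(s_A^2 / n1 + s_B^2 / n2)."""
    se = sqrt(variance(ma) / len(ma) + variance(mb) / len(mb))
    return (mean(ma) - mean(mb)) / se

# Hypothetical log ratios: group B is more variable and has one more slide.
m_a = [1.0, 2.0, 3.0]
m_b = [4.0, 6.0, 8.0, 10.0]
print(unequal_var_t(m_a, m_b))  # approximately -3.536
```

With equal group sizes and equal sample variances this reduces to the pooled statistic; it differs when the spreads or sizes differ, as here.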
The rest of this Chapter...
Nonparametric or distribution-free alternatives to the two-sample t statistic are popular, and in 3.5 we consider two of these: the Mann-Whitney test and the permutation test. Computer-intensive testing and estimation procedures are also popular, and in 3.6 we will study bootstrap techniques. In 3.7, we will study Bayesian estimation and testing procedures.