Inferential Statistical Analysis of Microarray Experiments 2007 Arizona Microarray Workshop


1 Inferential Statistical Analysis of Microarray Experiments
2007 Arizona Microarray Workshop
Robert J Tempelman, Department of Animal Science, tempelma@msu.edu

2 HYPOTHESIS TESTING (as if there was only one gene)

                         H0 is not rejected     H0 is rejected
H0 is true (non-DE)      No error (1 - α)       Type I error (α)
H0 is false (DE)         Type II error (β)      No error (1 - β)

DE: differentially expressed. α is the significance level (false positive rate); 1 - β is the power (sensitivity).
Standard approach:
Specify an acceptable type I error rate (α).
Seek tests that minimize the type II error rate (β), i.e., maximize power (1 - β).

3 Unpaired comparisons between two treatments
Example 1) Affymetrix data: one specimen (experimental unit / biological rep) per array.
Arrays: A1, A2, ..., An; B1, B2, ..., Bn
e.g. response $y_{ijg}$ = log fluorescence intensity for subject j on gene g within Group i
i = 1, 2, ..., T (T = # of treatments)
j = 1, 2, ..., n_i (n_i = # of biological reps within treatment i)
g = 1, 2, ..., G (G = # of genes)

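4 Unpaired comparisons between two treatments (cont'd)
Example 2) Reference (two-color) designs: each array pairs one treated sample with the common reference R, one labeled with Cy5 and the other with Cy3.
Arrays: A1 vs R, A2 vs R, ..., An vs R; B1 vs R, B2 vs R, ..., Bn vs R
e.g. response $y_{ijg}$ = log ratio of fluorescence intensity (relative to the common reference sample R)
Subscripts as on the previous slide (one measure per probe).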

5 Linear statistical model
Basis for classical statistical inference. Consider the linear model for one gene (drop g):
$$Y_{ij} = \mu + trt_i + e_{ij}$$
$\mu$ = overall mean
$trt_i$ = effect of treatment i (so $\mu + trt_i$ is the true mean for treatment i)
$e_{ij}$ = random experimental (biological) error

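6 THE TWO-SAMPLE t-test (T = 2)
Assume equal variances within each treatment. For gene g, Sample 1 has observations $y_{11g}, \ldots, y_{1n_1g}$ with mean $\bar{y}_{1g}$ and variance $s^2_{1g}$; Sample 2 has observations $y_{21g}, \ldots, y_{2n_2g}$ with mean $\bar{y}_{2g}$ and variance $s^2_{2g}$.
Pooled variance (a weighted average of $s^2_{1g}$ and $s^2_{2g}$):
$$s_g^2 = \frac{(n_1 - 1)\,s_{1g}^2 + (n_2 - 1)\,s_{2g}^2}{n_1 + n_2 - 2}$$
Given $y_{1jg} \sim N(\mu_{1g}, \sigma_g^2)$ and $y_{2jg} \sim N(\mu_{2g}, \sigma_g^2)$, test $H_0: \mu_{1g} = \mu_{2g}$ vs $H_1: \mu_{1g} \neq \mu_{2g}$.
Test statistic:
$$t_g = \frac{\bar{y}_{1g} - \bar{y}_{2g}}{s_g \sqrt{\dfrac{n_1 + n_2}{n_1 n_2}}} \sim t_{n_1 + n_2 - 2}$$
The denominator is the standard error of the difference, $SED_g$.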
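The pooled-variance statistic above can be sketched in a few lines of Python. This is only an illustration: the function name `pooled_t` and the data are hypothetical (they are not the workshop data), and only the standard library is assumed.

```python
import math
from statistics import mean, variance


def pooled_t(y1, y2):
    """Equal-variance two-sample t statistic, its SED, and degrees of freedom."""
    n1, n2 = len(y1), len(y2)
    # Pooled variance: weighted average of the two sample variances.
    s2 = ((n1 - 1) * variance(y1) + (n2 - 1) * variance(y2)) / (n1 + n2 - 2)
    # Standard error of the difference between the two sample means.
    sed = math.sqrt(s2 * (n1 + n2) / (n1 * n2))
    return (mean(y1) - mean(y2)) / sed, sed, n1 + n2 - 2


# Hypothetical log intensities for one gene (illustrative values only).
group1 = [5.0, 6.0, 7.0]
group2 = [3.0, 4.0, 5.0]
t, sed, df = pooled_t(group1, group2)
print(f"t = {t:.3f}, SED = {sed:.3f}, df = {df}")
```

The statistic would then be compared with a t distribution on n1 + n2 - 2 degrees of freedom, as on the decision-rule slides.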

7 DECISION RULES AND P-VALUES (IGNORING MULTIPLICITY)
One-tailed test ($H_1: \mu_{1g} - \mu_{2g} > 0$), with $t_g = (\bar{y}_{1g} - \bar{y}_{2g}) / SED_g$:
If $t_g > t_\alpha$, gene g is concluded to be differentially expressed.

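8 DECISION RULES AND P-VALUES (IGNORING MULTIPLICITY)
Two-tailed test ($H_1: \mu_{1g} - \mu_{2g} \neq 0$), with $t_g = (\bar{y}_{1g} - \bar{y}_{2g}) / SED_g$:
Compare $|t_g|$ to $t_{\alpha/2}$, or compare the P-value $Prob(|t| > |t_g|)$ to α.
1) P-value < α: reject $H_0: \mu_{1g} - \mu_{2g} = 0$
2) P-value > α: fail to reject $H_0: \mu_{1g} - \mu_{2g} = 0$
[Figure: t density with rejection regions of area α/2 in each tail, beyond $-t_{\alpha/2}$ and $t_{\alpha/2}$]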

9 DECISION RULES AND P-VALUES (EXAMPLES)
α = 0.05, reference distribution $t_{60}$
1) t = 2.5: P-value = Prob(t < -2.5) + Prob(t > 2.5) = 0.015 < α -> Reject H0
2) t = -1.0: P-value = Prob(t < -1.0) + Prob(t > 1.0) = 0.32 > α -> Fail to reject H0

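10 THE TWO-SAMPLE t-test
COMMENT: If $\sigma_1^2 \neq \sigma_2^2$, the t-test should be altered accordingly:
$$t = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}, \qquad t \sim t_{df^*}$$
where
$$df^* = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$
(Satterthwaite, 1946)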
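A minimal sketch of the unequal-variance statistic and the Satterthwaite degrees of freedom, assuming the usual form with n - 1 in each denominator term. The function name `welch_t` and the data are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(y1, y2):
    """Unequal-variance t statistic with Satterthwaite approximate df."""
    n1, n2 = len(y1), len(y2)
    v1 = variance(y1) / n1  # s1^2 / n1
    v2 = variance(y2) / n2  # s2^2 / n2
    t = (mean(y1) - mean(y2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical data with clearly different spreads (illustrative only).
t_w, df_w = welch_t([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0, 10.0])
print(f"t = {t_w:.3f}, df* = {df_w:.2f}")
```

With equal sample sizes and equal sample variances, df* reduces to n1 + n2 - 2, so the adjustment only matters when the groups really differ in spread or size.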

11 EXAMPLE and SAS CODE for one gene
T1: 57 557 56 59 594 574
T2: 544 549 548 577 58
n1 = 6; n2 = 5
Sample statistics: s = 05 and s = 08 for the two samples, pooled with
$$s^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}$$
$t = (\bar{y}_1 - \bar{y}_2)/SED$: NS (not significant)

12 RESULTS
PROC TTEST: P-value (two-tailed) = 0.1695 = Prob($t_9$ > 1.49) + Prob($t_9$ < -1.49)

13 An issue taken with the two-sample t-test (or classical linear model analysis)
Distributional assumptions matter, especially with small n:
effect of non-normality
outliers might have unduly large influence

14 THE PERMUTATION TEST
The basic idea is simple: estimate the null distribution of the test statistic to draw conclusions on statistical significance. There is a close connection with bootstrap sampling.
Suppose the experiment yields:
Trt 1: $y_{11}, y_{12}, \ldots, y_{1n_1}$ from distribution F, summarized by $\bar{y}_1 \pm s_1$
Trt 2: $y_{21}, y_{22}, \ldots, y_{2n_2}$ from distribution G, summarized by $\bar{y}_2 \pm s_2$
$H_0: F = G$ vs $H_1: F \neq G$

15 THE PERMUTATION TEST
Define a statistic (e.g. $t = (\bar{y}_1 - \bar{y}_2)/SED$) and calculate its value for the actual experiment (call it t*).
Repeat B times:
1) Take a random sample of size n1 without replacement from the data to represent Group 1.
2) Assign the remaining n2 observations to Group 2.
3) Compute the value of $t = (\bar{y}_1 - \bar{y}_2)/SED$ (call it $t^{(i)}$).
P-value (one-tailed for $H_1: \mu_1 > \mu_2$): $p = \sum_{i=1}^{B} I(t^{(i)} \geq t^*) / B$

16 THE PERMUTATION TEST
[Diagram: the actual experiment (Trt 1 and Trt 2 with their observations, means and standard deviations, giving $t^* = (\bar{y}_1 - \bar{y}_2)/SED$) shown beside Permutation 1, Permutation 2, ..., Permutation B, each with reshuffled observations giving $t^{(1)}, t^{(2)}, \ldots, t^{(B)}$]
One-tailed permutation P-value: $p = \sum_{b=1}^{B} I(t^{(b)} \geq t^*) / B$, i.e. the proportion of times that $t^{(b)}$ equals or exceeds t* for b = 1, 2, ..., B.
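The permutation scheme can be sketched as follows. This is a minimal illustration using random relabelling, not a reproduction of PROC MULTTEST; the function names, the seed, B = 2000 and the data are all assumptions made for the example.

```python
import math
import random
from statistics import mean, variance


def t_stat(y1, y2):
    """Pooled-variance two-sample t statistic."""
    n1, n2 = len(y1), len(y2)
    s2 = ((n1 - 1) * variance(y1) + (n2 - 1) * variance(y2)) / (n1 + n2 - 2)
    return (mean(y1) - mean(y2)) / math.sqrt(s2 * (1 / n1 + 1 / n2))


def permutation_pvalue(y1, y2, B=2000, seed=0):
    """One-tailed permutation P-value for H1: mu1 > mu2."""
    rng = random.Random(seed)
    t_star = t_stat(y1, y2)          # statistic for the actual experiment
    pooled = list(y1) + list(y2)
    n1 = len(y1)
    hits = 0
    for _ in range(B):
        rng.shuffle(pooled)          # random relabelling of experimental units
        # First n1 values play the role of Group 1, the rest Group 2.
        if t_stat(pooled[:n1], pooled[n1:]) >= t_star:
            hits += 1
    return t_star, hits / B


# Hypothetical data (illustrative only).
group1 = [5.9, 6.1, 6.4, 6.0, 6.3]
group2 = [5.2, 5.5, 5.1, 5.6, 5.4]
t_star, p_perm = permutation_pvalue(group1, group2)
print(f"t* = {t_star:.2f}, one-tailed permutation p = {p_perm:.4f}")
```

For a two-tailed version one would compare absolute values of the statistics instead.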

17 SAS example
data example;
  input trt $ y;
  datalines;
T1 57
T1 557
T1 56
T1 59
T1 594
T1 574
T2 544
T2 549
T2 548
T2 577
T2 58
;
/* nsample = B, the number of permutation samples */
proc multtest data=example permutation nsample=10000 pvals outsamp=res;
  test mean(y);
  class trt;
  contrast 'Trt 1-2' 1 -1;
run;
proc print data=res(obs=22);
run;

18 Permuted samples
[PROC PRINT listing of the first two permuted data sets (columns: Obs, _sample_, _class_, _obs_, y). Each permuted data set b yields its own statistic $t^{(b)} = (\bar{y}_1^{(b)} - \bar{y}_2^{(b)}) / SED^{(b)}$. And so on.]

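19 [PROC MULTTEST output: Continuous Variable Tabulations (Variable, trt, NumObs, Mean, Standard Deviation for y under each treatment) and the p-values table (Variable, Contrast, Raw, Permutation). The Raw column is the regular t-test P-value; the Permutation column is the permutation-based two-tailed P-value.]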

20 [Histogram: frequency distribution of $t^{(i)}$ over B = 10000 permuted datasets, with t* = 1.49 (actual expt) and t = -1.49 marked]
Permutation p-value = 0.1778

21 THE BOOTSTRAP
Bootstrap tests are more widely applicable, though less accurate, than the permutation test. They are extremely useful for computing standard errors and confidence intervals.
Suppose the experiment yields:
Trt 1: $y_{11}, y_{12}, \ldots, y_{1n_1}$ from distribution F, summarized by $\bar{y}_1 \pm s_1$
Trt 2: $y_{21}, y_{22}, \ldots, y_{2n_2}$ from distribution G, summarized by $\bar{y}_2 \pm s_2$
$H_0: F = G$ vs $H_1: F \neq G$

22 THE BOOTSTRAP
Define the statistic (e.g. $t = (\bar{y}_1 - \bar{y}_2)/SED$) and calculate its value for the data set (call it t*).
Compute the estimated residual for each observation: $\hat{e}_{ij} = y_{ij} - \bar{y}_i$
Repeat B times:
1) Draw at random a sample of n1 residuals with replacement; assign these as data for Group 1.
2) Draw at random a sample of n2 residuals with replacement; assign these as data for Group 2.
3) Compute the value of t (call it $t^{(i)}$).
One-tailed P-value (for $H_1: \mu_1 > \mu_2$): $p = \sum_{i=1}^{B} I(t^{(i)} \geq t^*) / B$
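A minimal sketch of the residual bootstrap described on this slide. It is an illustration under stated assumptions rather than what PROC MULTTEST does internally: residuals from both groups are pooled before resampling, and the function names, seed, B = 2000 and data are hypothetical.

```python
import math
import random
from statistics import mean, variance


def t_stat(y1, y2):
    """Pooled-variance two-sample t statistic."""
    n1, n2 = len(y1), len(y2)
    s2 = ((n1 - 1) * variance(y1) + (n2 - 1) * variance(y2)) / (n1 + n2 - 2)
    return (mean(y1) - mean(y2)) / math.sqrt(s2 * (1 / n1 + 1 / n2))


def bootstrap_pvalue(y1, y2, B=2000, seed=0):
    """One-tailed residual-bootstrap P-value for H1: mu1 > mu2."""
    rng = random.Random(seed)
    t_star = t_stat(y1, y2)
    m1, m2 = mean(y1), mean(y2)
    # Residuals: each observation minus its own treatment mean.
    residuals = [y - m1 for y in y1] + [y - m2 for y in y2]
    hits = 0
    for _ in range(B):
        g1 = rng.choices(residuals, k=len(y1))  # sampling WITH replacement
        g2 = rng.choices(residuals, k=len(y2))
        try:
            t_b = t_stat(g1, g2)
        except ZeroDivisionError:
            continue  # degenerate resample (zero pooled variance): not counted as a hit
        if t_b >= t_star:
            hits += 1
    return t_star, hits / B


# Hypothetical data (illustrative only).
group1 = [5.9, 6.1, 6.4, 6.0, 6.3]
group2 = [5.2, 5.5, 5.1, 5.6, 5.4]
t_star, p_boot = bootstrap_pvalue(group1, group2)
print(f"t* = {t_star:.2f}, one-tailed bootstrap p = {p_boot:.4f}")
```

Resampling residuals rather than raw values is what imposes the null hypothesis: both bootstrap groups are drawn from the same centered pool.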

23 SAS example
data example;
  input trt $ y;
  datalines;
T1 57
T1 557
T1 56
T1 59
T1 594
T1 574
T2 544
T2 549
T2 548
T2 577
T2 58
;
proc multtest data=example bootstrap nsample=10000 pvals outsamp=res;
  test mean(y);
  class trt;
  contrast 'Trt 1-2' 1 -1;
run;
proc print data=res(obs=22);
run;
[Listing: residuals for the actual experiment (columns: Obs, trt, residual), each computed as the observation minus its treatment mean]

24 Bootstrap samples
[PROC PRINT listing of the first two bootstrapped data sets (columns: Obs, _sample_, _class_, _obs_, y). Each bootstrapped data set b yields its own statistic $t^{(b)} = (\bar{y}_1^{(b)} - \bar{y}_2^{(b)}) / SED^{(b)}$. And so on.]

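25 [PROC MULTTEST output: tabulations (Variable, trt, NumObs, Mean, Standard Deviation for y under each treatment) and the p-values table (Variable, Contrast, Raw, Bootstrap). The Raw column is the regular t-test P-value; the Bootstrap column is the bootstrap-based two-tailed P-value.]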

26 Issues with permutation and bootstrap sampling
Still need to have sufficiently large samples: the granularity problem (Allison et al., 2006).
Limited number of permutations:
$$\binom{n_1 + n_2}{n_1} = \frac{(n_1 + n_2)!}{n_1!\, n_2!}$$
e.g. if n1 = n2 = 3, then only 20 permutations are possible, so the smallest possible (one-tailed) P-value is 1/20 = 0.05.
Less applicability to more complex designs!
Allison DB, Cui XQ, Page GP, and Sabripour M. Microarray data analysis: from disarray to consolidation and consensus. Nature Reviews Genetics 7: 55-65, 2006.
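The granularity limit is easy to check numerically. The helper name `min_one_tailed_p` below is hypothetical; `math.comb` is the standard-library binomial coefficient.

```python
from math import comb


def min_one_tailed_p(n1, n2):
    """Number of distinct two-group relabellings and the smallest attainable one-tailed P-value."""
    n_perm = comb(n1 + n2, n1)
    return n_perm, 1 / n_perm


print(min_one_tailed_p(3, 3))  # the n1 = n2 = 3 case from the slide
print(min_one_tailed_p(6, 5))  # group sizes of the one-gene example
```

With three replicates per group no gene can ever reach a permutation P-value below 0.05, however large its observed difference.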

27 The multiple testing issue involving m genes

                     Called not significant   Called significant   Total (constant)
Null true            m0 - F                   F                    m0
Alternative true     m1 - T                   T                    m1
Total (observed)     m - S                    S                    m (= G)

F: number of Type I errors
m1 - T: number of Type II errors

28 A hypothetical situation involving m = 10000 genes (Pawitan et al., 2005; Bioinformatics)

                     Called not significant   Called significant   Total
Null true            m0 - F = 9025            F = 475              m0 = 9500
Alternative true     m1 - T = 100             T = 400              m1 = 500
Total                m - S = 9125             S = 875              m = 10000

False positive rate (FPR) = F / m0 = 475 / 9500 = 0.05
Specificity = 1 - FPR = (m0 - F) / m0 = 9025 / 9500 = 0.95
Consistent with using α = 0.05

29 A hypothetical situation involving 10000 genes (Pawitan et al., 2005)

                     Called not significant   Called significant   Total
Null true            m0 - F = 9025            F = 475              m0 = 9500
Alternative true     m1 - T = 100             T = 400              m1 = 500
Total                m - S = 9125             S = 875              m = 10000

False negative rate (FNR) = (m1 - T) / m1 = 100 / 500 = 0.20
Sensitivity = 1 - FNR = T / m1 = 400 / 500 = 0.80
Consistent with Power = 0.80

30 Controlling FWER = Prob(F ≥ 1)
There have been improvements to controlling FWER relative to using Bonferroni (too conservative):
Stepdown procedures (e.g. Holm's, Sidak, Westfall and Young)
Multivariate permutation (next)
FWER control provided the early inspiration on multiple testing in microarray studies.
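To make the contrast between Bonferroni and a step-down procedure concrete, here is a small sketch of Bonferroni and Holm adjusted P-values. The function names and the four raw P-values are hypothetical.

```python
def bonferroni(pvals):
    """Bonferroni-adjusted P-values: multiply each by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, m * p) for p in pvals]


def holm(pvals):
    """Holm step-down adjusted P-values, returned in the original gene order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The multiplier shrinks from m down to 1 as we step down the sorted list;
        # the running maximum keeps the adjusted values monotone.
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical raw P-values for four genes
bonf_adj = bonferroni(raw)
holm_adj = holm(raw)
print(bonf_adj)
print(holm_adj)
```

Holm's adjusted values are never larger than Bonferroni's, which is the sense in which the step-down procedure is less conservative while still controlling FWER.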

31 Multivariate permutation and bootstrapping and controlling FWER
Suited for each other. More powerful than Bonferroni.
Reference design example (m genes):
Treatment A: arrays A1 vs R, A2 vs R, A3 vs R, A4 vs R
Treatment B: arrays B5 vs R, B6 vs R, B7 vs R, B8 vs R
Compute t-test P-values for comparing A to B for each gene: $p_1, p_2, \ldots, p_{m-1}, p_m$

32 Multivariate permutation and bootstrapping and controlling FWER (cont'd)
For each permutation of the Treatment A / Treatment B labels, compute P-values for each of the m genes, $p_1^*, p_2^*, \ldots, p_{m-1}^*, p_m^*$, and record the smallest, $p^*_{(1)} = \min_j p_j^*$.
Identify gene j as significantly differentially expressed if
$$\frac{\#\text{ of permutations where } p^*_{(1)} < p_j}{\#\text{ of permutations}} < \alpha$$
Also used in Callow et al. (2000) Genome Research 10: 2022-2029.
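A sketch of the single-step multivariate permutation adjustment. One substitution should be named plainly: the slide works with the minimum permuted P-value, whereas this example uses the maximum permuted |t|, which ranks genes the same way when every gene has the same degrees of freedom and avoids needing a t-distribution routine. Function names, B, the seed and the data are hypothetical.

```python
import math
import random
from statistics import mean, variance


def abs_t(y1, y2):
    """Absolute pooled-variance t statistic for one gene."""
    n1, n2 = len(y1), len(y2)
    s2 = ((n1 - 1) * variance(y1) + (n2 - 1) * variance(y2)) / (n1 + n2 - 2)
    return abs(mean(y1) - mean(y2)) / math.sqrt(s2 * (1 / n1 + 1 / n2))


def gene_stats(data, labels):
    """|t| for every gene under a given assignment of arrays to groups 0 and 1."""
    stats = []
    for gene in data:
        g1 = [y for y, lab in zip(gene, labels) if lab == 0]
        g2 = [y for y, lab in zip(gene, labels) if lab == 1]
        stats.append(abs_t(g1, g2))
    return stats


def max_t_adjusted(data, labels, B=1000, seed=0):
    """FWER-adjusted P-values from the permutation distribution of max |t|."""
    rng = random.Random(seed)
    observed = gene_stats(data, labels)
    exceed = [0] * len(data)
    perm = list(labels)
    for _ in range(B):
        # Only array labels are shuffled, so correlation between genes is preserved.
        rng.shuffle(perm)
        max_t = max(gene_stats(data, perm))
        for j, t_obs in enumerate(observed):
            if max_t >= t_obs:
                exceed[j] += 1
    return [count / B for count in exceed]


# Hypothetical log ratios: 3 genes (rows) on 8 arrays (4 per treatment).
data = [
    [8.1, 8.4, 8.0, 8.3, 6.9, 7.2, 7.0, 7.1],
    [5.0, 5.6, 5.2, 5.9, 5.3, 5.8, 5.1, 5.5],
    [7.0, 7.5, 7.2, 7.9, 7.1, 7.3, 6.8, 7.4],
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
adj_p = max_t_adjusted(data, labels)
print(adj_p)
```

A gene would be declared significant when its adjusted value falls below α.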

33 SAS program
data example;
  input trt $ y1 y2 y3;
  datalines;
[11 data lines, one per experimental unit: trt label followed by y1 y2 y3]
;
proc multtest data=example permutation nsample=10000 pvals outsamp=res;
  test mean(y1 y2 y3);
  class trt;
  contrast 'Trt 1-2' 1 -1;
run;

34 First Two Multivariate Permutation Samples
Note: the correlation structure between genes is preserved; only experimental unit labels are shuffled.
[PROC PRINT listing (columns: Obs, _sample_, _class_, _obs_, y1, y2, y3)]

35 [PROC MULTTEST output: Continuous Variable Tabulations (Variable, trt, NumObs, Mean, Standard Deviation for y1, y2 and y3 under each treatment) and the p-values table (Variable, Contrast, Raw, Permutation) for y1, y2 and y3]
Note: multivariate permutation P-values < Bonferroni-adjusted P-values

36 A hypothetical situation involving 10000 genes (Pawitan et al., 2005)

                     Called not significant   Called significant   Total
Null true            m0 - F = 9025            F = 475              m0 = 9500
Alternative true     m1 - T = 100             T = 400              m1 = 500
Total                m - S = 9125             S = 875              m = 10000

False discovery rate (FDR) = F / S = 475 / 875 = 0.543
FDR particularly suffers when $\pi_0 = m_0 / m$ is close to 1.
$\pi_0$: proportion of all genes that are non-DE
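The error rates for this hypothetical table follow directly from the four counts m0 = 9500, m1 = 500, F = 475 and T = 400. The function name `error_rates` is an illustrative choice.

```python
def error_rates(m0, m1, F, T):
    """Error rates from a 2x2 table of truth (null/alternative) vs call (significant or not)."""
    S = F + T  # total number of genes called significant
    return {
        "FPR": F / m0,                 # false positive rate
        "specificity": (m0 - F) / m0,  # 1 - FPR
        "FNR": (m1 - T) / m1,          # false negative rate
        "sensitivity": T / m1,         # 1 - FNR (power)
        "FDR": F / S,                  # false discovery rate
    }


rates = error_rates(m0=9500, m1=500, F=475, T=400)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

Although the test holds FPR at 0.05 with 80% power, more than half of the called genes are false positives, because the null genes vastly outnumber the DE genes.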

37 [Figure from Pawitan, Y. et al., Bioinformatics 2005: FDR (solid curves for $\pi_0$ = 0.9, 0.95 or 0.99), FPR {α} (dashed curves) and sensitivity (dotted curves) as a function of the critical value of the t-statistic. Half of the DE genes had a positive standardized difference $(\mu_1 - \mu_2)/\sigma$; the other half had a negative one.]

38 Using permutation/bootstrapping to estimate FDR
Small example (3000 genes) from Storey and Tibshirani (2003).
Compare two groups -> Group 1 (n = 5) vs Group 2 (n = 3)
Suppose we decide to reject H0 for all genes with t > 00; we would then conclude that 146 genes are statistically significant.
Randomly shuffle experimental units for 00 different permutation datasets and simply tabulate the number of times t exceeds the cutoff for each gene.
The average number of times t exceeds the cutoff across the permutations is 12.3.
Thus a simple estimate of FDR at this cutoff is 12.3/146 × 100% = 8.42%, i.e. if one used this cutoff to conclude statistical significance, 8.42% of genes in the significant list would be estimated to be false positives. Equivalently, 91.58% of the genes in the list would be estimated to be true positives.
Storey, JD, and R Tibshirani (2003) SAM thresholding and false discovery rates for detecting differential gene expression in DNA microarrays. In Parmigiani et al. (eds) The Analysis of Gene Expression Data: Methods and Software. Springer-Verlag, pp. 272-290

39 Using permutation/bootstrapping to estimate FDR (cont'd)
Actually, the estimated FDR of 8.42% is biased upwards. Recall:

                     Called significant   Called not significant   Total
Null true            F                    m0 - F                   m0
Alternative true     T                    m1 - T                   m1
Total                S = 146              m - S = 2854             m = 3000

Permutations make all m genes null, but only a portion $\pi_0 = m_0 / m$ truly are.
So to improve the FDR estimate, one should multiply 8.42% by $\pi_0$.
Estimate of $\pi_0$ from the example (details on next slide) = 0.89
Therefore, the improved estimate of FDR at this cutoff is 8.42 × 0.89 = 7.49%

40 How to estimate $\pi_0$ using permutation?
Suppose it is safe to say that statistics with |t| < 0.5 involve all true null hypotheses.
Consider the number of observed |t| < 0.5. For example = 668
Consider the average number of permuted |t| < 0.5. For example = 750
Therefore: $\hat{\pi}_0 = 668 / 750 = 0.89$
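The permutation-based FDR estimate and its correction by the estimated proportion of null genes can be sketched as below. The function name, the cutoffs and the toy statistics are hypothetical and are not the Storey and Tibshirani data; a real analysis would use thousands of genes and many permutations.

```python
from statistics import mean


def permutation_fdr(t_obs, t_perm, cutoff, null_cutoff):
    """Permutation FDR estimate at |t| > cutoff, corrected by an estimate of pi0.

    t_obs:  observed statistics, one per gene
    t_perm: list of permuted data sets, each a list of statistics, one per gene
    """
    called = sum(abs(t) > cutoff for t in t_obs)
    # Average number of genes exceeding the cutoff when every gene is null.
    mean_false = mean(sum(abs(t) > cutoff for t in perm) for perm in t_perm)
    raw_fdr = mean_false / called
    # pi0: observed count of small statistics relative to the permutation average.
    small_obs = sum(abs(t) < null_cutoff for t in t_obs)
    small_perm = mean(sum(abs(t) < null_cutoff for t in perm) for perm in t_perm)
    pi0 = min(1.0, small_obs / small_perm)
    return raw_fdr, pi0, raw_fdr * pi0


# Toy statistics for 10 genes and 2 permuted data sets (illustrative only).
t_obs = [3.1, -2.8, 2.5, 0.2, -0.1, 0.4, 1.0, -1.5, 0.3, 2.2]
t_perm = [
    [2.1, 0.1, -0.3, 0.2, 0.4, -0.2, 1.1, 0.3, -0.4, 0.9],
    [0.1, -0.2, 0.3, 0.45, -0.1, 1.2, 0.2, -2.3, 0.4, 1.6],
]
raw_fdr, pi0_hat, adj_fdr = permutation_fdr(t_obs, t_perm, cutoff=2.0, null_cutoff=0.5)
print(f"raw FDR = {raw_fdr:.3f}, pi0 = {pi0_hat:.3f}, adjusted FDR = {adj_fdr:.3f}")
```

The correction always lowers the estimate, since permutation treats every gene as null while only a fraction of them truly are.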

41 SAM (Significance Analysis of Microarrays)
Storey and Tibshirani (2003)
A popular inferential procedure for differential gene expression in microarrays.
A mix of permutation/bootstrap methods with FDR control and shrinkage estimation (later).
Permute or bootstrap on:
$$d_g = \frac{\bar{y}_{1g} - \bar{y}_{2g}}{se(\bar{y}_{1g} - \bar{y}_{2g}) + s_0}, \qquad g = 1, 2, \ldots, m$$
$s_0$ = some (50th or 90th) percentile of the gene-wise standard errors (or the percentile that minimizes the CV of $d_g$)
The empirical Bayes adjustment provides stability against unusual SED!

42 The SAM procedure
1) Compute $d_g = \dfrac{\bar{y}_{1g} - \bar{y}_{2g}}{se(\bar{y}_{1g} - \bar{y}_{2g}) + s_0}$, g = 1, 2, ..., m, based on the data, and order them from smallest to largest:
$$d_{(1)} < d_{(2)} < d_{(3)} < \cdots < d_{(m-1)} < d_{(m)}$$
2) Take b = 1, 2, ..., B permuted or bootstrap samples, compute $d_g^{b}$, g = 1, 2, ..., m, for each sample b, and order the statistics from smallest to largest within each sample b:
$$d^{b}_{(1)} < d^{b}_{(2)} < d^{b}_{(3)} < \cdots < d^{b}_{(m-1)} < d^{b}_{(m)}$$

43 3) Compute the average of the B values for each ordered d statistic, $\bar{d}_{(1)}, \bar{d}_{(2)}, \bar{d}_{(3)}, \ldots, \bar{d}_{(m-1)}, \bar{d}_{(m)}$, where, e.g.,
$$\bar{d}_{(1)} = \frac{1}{B} \sum_{b=1}^{B} d^{b}_{(1)}$$
4) Plot $d_{(1)}, d_{(2)}, \ldots, d_{(m)}$ vs $\bar{d}_{(1)}, \bar{d}_{(2)}, \ldots, \bar{d}_{(m)}$ and base the gene list on values that fall outside bands parallel to the 45° line.
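Steps 1 to 4 can be sketched as follows. This is a simplified illustration, not the SAM software: s0 is passed in as a fixed constant rather than chosen by percentile, the gene list is formed by a plain |observed - expected| > Δ rule, and the function names, Δ, B and the data are hypothetical.

```python
import math
import random
from statistics import mean, variance


def d_stats(data, labels, s0):
    """SAM-style d statistic for every gene: mean difference over (se + s0)."""
    out = []
    for gene in data:
        g1 = [y for y, lab in zip(gene, labels) if lab == 0]
        g2 = [y for y, lab in zip(gene, labels) if lab == 1]
        n1, n2 = len(g1), len(g2)
        s2 = ((n1 - 1) * variance(g1) + (n2 - 1) * variance(g2)) / (n1 + n2 - 2)
        se = math.sqrt(s2 * (1 / n1 + 1 / n2))
        out.append((mean(g1) - mean(g2)) / (se + s0))
    return out


def expected_order_stats(data, labels, s0, B=200, seed=0):
    """Average of the ordered d statistics over B label permutations (steps 2 and 3)."""
    rng = random.Random(seed)
    totals = [0.0] * len(data)
    perm = list(labels)
    for _ in range(B):
        rng.shuffle(perm)
        for k, d in enumerate(sorted(d_stats(data, perm, s0))):
            totals[k] += d
    return [total / B for total in totals]


# Hypothetical data: 4 genes (rows) on 8 arrays (4 per group).
data = [
    [8.1, 8.4, 8.0, 8.3, 6.9, 7.2, 7.0, 7.1],
    [5.0, 5.6, 5.2, 5.9, 5.3, 5.8, 5.1, 5.5],
    [7.0, 7.5, 7.2, 7.9, 7.1, 7.3, 6.8, 7.4],
    [6.0, 6.2, 6.1, 6.4, 6.9, 7.1, 7.3, 7.0],
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
s0 = 0.1      # assumed fudge constant for the illustration
delta = 1.0   # assumed half-width of the band around the 45-degree line

observed = sorted(d_stats(data, labels, s0))            # step 1
expected = expected_order_stats(data, labels, s0)       # steps 2 and 3
# Step 4 (simplified): keep points that fall outside the band.
outside = [(o, e) for o, e in zip(observed, expected) if abs(o - e) > delta]
print(outside)
```

Because s0 is added to every denominator, genes with very small standard errors no longer produce extreme statistics from tiny mean differences.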

44 Example Affymetrix dataset on 79 genes from each of two groups (n=4) for 8 slides Distributed with SAM software (downloadable from cademic)

45 Treatment labels Click Need first columns for gene labels

47 Plot of observed vs expected statistics
[Plot: observed $d_{(g)}$ vs expected $\bar{d}_{(g)}$, marking the significantly upregulated (40) and significantly downregulated (39) genes]
Δ = difference (along the 45° line) between the outer two (dashed) lines and the expected (blue) line
Note the asymmetric rejection regions.

49 Estimating $\pi_0$ and FDR for regular non-permutation (e.g. t-test) procedures: Distribution of P-values
Under the null hypothesis, the distribution of P-values across many independent tests is uniform on the interval [0, 1], regardless of the sample size and statistical test used (provided the test is valid).
[Histogram: frequency vs P-value, flat under the null hypothesis]

50 Distribution of P-values (cont'd)
If some genes are differentially expressed, then the frequency of low P-values should be greater than that of high P-values.
[Histogram: frequency vs P-value, with excess frequency at low P-values]

51 Distribution of P-values (for 333 genes from a small boutique array at MSU)
[Histogram: frequency vs P-value (Pr > |t|) for Effect = treat, with a horizontal line at the expected height of each bar if there were no differentially expressed genes]
Plausible estimate of $\pi_0 = m_0 / m$?

52 How to (roughly) estimate $\pi_0$?
Choose all p-values above a point (λ) where the p-value frequencies start to level off (say λ = 0.60 based on the previous slide):
$$\hat{\pi}_0(\lambda) = \frac{\#\{p_j > \lambda\}}{m(1 - \lambda)} = 0.774$$

53 Estimating FDRs based on P-values
Choose an arbitrary P-value cutoff t (0 < t < 1) for statistical significance.
Expected number of false positives F(t) with P-value < t is determined by: $E(F(t)) = m_0 t$
Hence, the estimated FDR at P-value cutoff t is:
$$\widehat{FDR}(t) = \frac{E(F(t))}{E(S(t))} = \frac{\hat{m}_0\, t}{S(t)} = \frac{\hat{\pi}_0\, m\, t}{S(t)}$$

54 Q-value
Defined for each gene: the minimum FDR that can be attained by calling that gene significant (along with all others that have greater statistical significance).
For gene i:
$$\hat{q}(p_i) = \min_{t \geq p_i} \widehat{FDR}(t)$$
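A minimal sketch of the P-value based estimates: the proportion of null genes at a chosen λ, and q-values as the running minimum of the estimated FDR over cutoffs at or above each P-value. The function names and the P-values are hypothetical; with pi0 = 1 the q-values coincide with Benjamini-Hochberg adjusted P-values.

```python
def estimate_pi0(pvals, lam):
    """Rough pi0 estimate: share of P-values above lam, rescaled by (1 - lam)."""
    return min(1.0, sum(p > lam for p in pvals) / (len(pvals) * (1 - lam)))


def q_values(pvals, pi0=1.0):
    """q-value for each gene, returned in the original gene order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so the minimum covers all cutoffs t >= p_i.
    for rank in range(m - 1, -1, -1):
        i = order[rank]
        fdr_at_p = pi0 * m * pvals[i] / (rank + 1)  # S(t) = rank + 1 genes with p <= t
        running_min = min(running_min, fdr_at_p)
        q[i] = running_min
    return q


# Hypothetical raw P-values (illustrative only).
raw_p = [0.001, 0.004, 0.01, 0.03, 0.2, 0.35, 0.5, 0.65, 0.8, 0.95]
pi0_hat = estimate_pi0(raw_p, lam=0.6)
q_hat = q_values(raw_p, pi0=pi0_hat)
print(f"pi0 = {pi0_hat:.2f}")
print([round(q, 4) for q in q_hat])
```

A gene list at a target FDR is then simply the set of genes whose q-value falls below that target.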

55 Small example (SAS program)
data example;
  input raw_p;
  datalines;
[one raw P-value per line]
;
run;
proc multtest fdr pdata=example;
run;

56 Small example (SAS output)
[The Multtest Procedure: p-values table with columns Test, Raw, and False Discovery Rate]
The SAS procedure assumes $\pi_0 = 1$ (as in Benjamini and Hochberg, 1995). Just need to multiply the SAS values by the estimated $\pi_0$.

57 Plot of q-values vs p-values (for heifer example)
[Plot: qvalue vs raw_p]

58 Plot of q-values vs Number of Declared Significant Genes
[Plot: qvalue vs S_t]

59 Classical linear model analysis
So far: comparison of two treatments.
Simple design structure -> simple linear model:
$$Y_{ij} = \mu + trt_i + e_{ij}$$
$\mu$ = overall mean
$trt_i$ = effect of treatment i
$e_{ij}$ = random experimental (biological) error
Formal linear model analysis is not necessary unless T > 2.

60 Common reference design for two treatments balanced for dye assignments
[Diagram: treated samples A1, A2, A3, ..., An and B1, B2, B3, ..., Bn each hybridized against the reference R, with the Cy3/Cy5 assignment balanced within each treatment]
e.g. response $y_{ijkg}$ = log ratio of fluorescence intensity (relative to the common reference sample R)
i: treatment
j: dye assignment to test sample
k: biological rep
g: gene

61 Linear model for common reference with dye balance
$$Y_{ijk} = \mu + trt_i + dye_j + e_{ijk}$$
$\mu$ = overall mean
$trt_i$ = effect of treatment i
$dye_j$ = effect of dye j assigned to the treated sample
$e_{ijk}$ = random experimental (biological) error for biological rep k assigned to trt i and dye j
Simple linear model analysis (ANOVA)

62 Representative data (refdye)
[Listing of 12 observations (columns: Obs, trt (i), dye (j), subj (k), fold, logfluor ($y_{ijk}$))]
n = 6

63 SAS ANOVA code
proc mixed data=refdye;
  class trt dye;
  model logfluor = trt dye;
  lsmeans trt / diff;
run;

64 SAS output ANOVA table Type 3 Tests of Fixed Effects Effect Num DF Den DF trt 9 dye 9 Adjusted trt means F Value Pr > F Least Squares Means Effect trt Estimate Standard Error DF tvalue Pr > t trt trt Adjusted trt mean difference Differences of Least Squares Means Effect trt _trt Estimate Standard Error DF tvalue trt Est fold change (relative to reference) = 0487 Est fold change(trt vs trt) = -036 Pr > t 0490

65 Balanced Block Design
Example 1): Comparison of two treatments, each based on n subjects
[Diagram: each array pairs one A sample with one B sample: A1 with B1, A2 with B2, A3 with B3, ..., An with Bn]
Total of 2n biological replicates
It is probably good to have even n (balanced dye swap).
e.g. response $y_{ijkg}$ = log fluorescence intensity for treatment i on subject j within array k on gene g

66 Linear (mixed) model for balanced block design
$$Y_{ijk} = \mu + trt_i + dye_j + array_k + e_{ijk}$$
$\mu$ = overall mean
$trt_i$ = effect of treatment i
$dye_j$ = effect of dye j assigned to the treated sample
$array_k$ = random effect of array k
$e_{ijk}$ = random experimental (biological) error for the biological rep assigned to trt i and dye j within array k
Simple linear mixed model analysis (ANOVA)
Each ijk identifies a unique biological replicate.

67 Representative data (balanceblock)
[Listing (columns: Obs, trt, dye, array, fluorescence, logfluor)]

68 SAS code
proc mixed data=balanceblock method=type3;
  class array trt dye;
  model logfluor = trt dye;
  random array;
  lsmeans trt / diff;
run;

69 Representative output

Source    Expected Mean Square             Error Term
trt       Var(Residual) + Q(trt)           MS(Residual)
dye       Var(Residual) + Q(dye)           MS(Residual)
array     Var(Residual) + 2 Var(array)     MS(Residual)
Resid     Var(Residual)

[The listing also gives df, SS, MS, Error DF, F Value and Pr > F for each source]

70 Representative output (cont d) Least Squares Means Effect trt Estimate Standard Error DF tvalue Pr > t trt <000 trt <000 Differences of Least Squares Means Effect trt _trt Estimate Standard Error DF tvalue Pr > t trt Est fold change(trt vs trt) = 0389

71 Balanced Block Design (two-color, blocking on array & subject)
Example 2): Comparison of two treatments/tissues within each of n subjects
[Diagram: each of the n arrays carries both samples from one subject, for subjects A1, A2, A3, ..., An]
Total of n mice/arrays
It would probably be good to have even n.
e.g. response $y_{ijkg}$ = log fluorescence intensity from tissue/treatment i on slide/animal j from array k for gene g
Same linear mixed model as previous!

72 A dairy heifer expt (two-color, blocking on array & subject)
Two treatments (A & B) randomly assigned to one of two mRNA aliquots taken from the same animal.
[Diagram: six arrays, one per heifer (Heifer 1 to Heifer 6); on three arrays Trt A and Trt B take one dye assignment, and on the other three the assignment is swapped]
Dye and treatments orthogonal to each other
Heifer and array confounded with each other

73 4 rows & 8 columns = 32 print tips
359 genes
4 spots per gene: therefore need to distinguish experimental from pseudo replication

74 Inference strategies
1) Could average the intensities at the 4 spots for each gene. Would still need to model treatment, dye and array effects. Then same mixed model analysis as the one presented previously!
2) Explicitly model spot variability and treatment*array variability.

75 Array data for one gene (heifer)
[Listing of 48 observations (columns: Obs, array, dye, trt, spot, logf): eight observations on each of six arrays]

76 ANOVA (SAS PROC MIXED)
proc mixed data=heifer method=type3;
  class array dye trt spot;
  model resid = dye trt;
  random array dye*trt*array spot(array);
  lsmeans trt / diff;
run;

77 Some output

Source            Expected Mean Square
dye               Var(Residual) + 4 Var(array*dye*trt) + Q(dye)
trt               Var(Residual) + 4 Var(array*dye*trt) + Q(trt)
array             Var(Residual) + 2 Var(spot(array)) + 4 Var(array*dye*trt) + 8 Var(array)
array*dye*trt     Var(Residual) + 4 Var(array*dye*trt)
spot(array)       Var(Residual) + 2 Var(spot(array))
Residual          Var(Residual)

[The listing also gives DF, Sum of Squares, Mean Square, Error DF, F Value and Pr > F for each source; Pr > F for spot(array) is < .0001]

78 ANOVA table with EMS for example with technical replication

Source         SS            MS            EMS
Treatment      $SS_t$        $MS_t$        $\sigma_e^2 + 4\sigma_{a*t}^2 + \gamma_{trt}$
Dye            $SS_d$        $MS_d$        $\sigma_e^2 + 4\sigma_{a*t}^2 + \gamma_{dye}$
Array          $SS_a$        $MS_a$        $\sigma_e^2 + 2\sigma_{s(a)}^2 + 4\sigma_{a*t}^2 + 8\sigma_a^2$
Array*Treat    $SS_{a*t}$    $MS_{a*t}$    $\sigma_e^2 + 4\sigma_{a*t}^2$
Spot(Array)    $SS_{s(a)}$   $MS_{s(a)}$   $\sigma_e^2 + 2\sigma_{s(a)}^2$
Residual       $SS_e$        $MS_e$        $\sigma_e^2$

79 Least Squares Means Effect trt Estimate Standard Error DF tvalue Pr > t trt <000 trt <000 Differences of Least Squares Means Effect trt _trt Estimate Standard Error DF tvalue Pr > t trt Hence estimated trt : trt fold change = = 086

80 Another example: Connected Loop Design (n=4) A A C Loop Loop B C B A3 A4 Loop 3 Loop 4 C3 B3 C4 B4

81 Mixed model approach Source Treatment Dye Array Animal(Trt) Residual df df t df d df b df a(t) df e SS SS t SS d SS b SS a(t) SS e MS MS t MS d MS b MS a(t) MS e EMS σ + 5σ + γ e animal( trt) trt σ e + γ dye σe + 5σarray σe + 5σanimal( trt) σ e

82 What next? FDR adjustment on P-values to provide q-values Same procedure as described previously Use FDR control criterion to come up with a gene list
