Inferential Statistical Analysis of Microarray Experiments 2007 Arizona Microarray Workshop

1 Inferential Statistical Analysis of Microarray Experiments, 2007 Arizona Microarray Workshop
Robert J. Tempelman, Department of Animal Science, tempelma@msu.edu

2 HYPOTHESIS TESTING (as if there were only one gene)

| | H0 is not rejected | H0 is rejected |
|---|---|---|
| H0 is true (non-DE) | No error (1 − α) | Type I error (α): significance level (false positive rate) |
| H0 is false (DE) | Type II error (β) | No error (1 − β): power (sensitivity) |

DE: differentially expressed.
Standard approach: specify an acceptable Type I error rate (α), then seek tests that minimize the Type II error rate (β), i.e., maximize power (1 − β).

3 Unpaired comparisons between two treatments
Example 1) Affymetrix data: one specimen (experimental unit/biological replicate) per array: $A_1, A_2, \ldots, A_n$ and $B_1, B_2, \ldots, B_n$.
E.g., response $y_{ijg}$ = log fluorescence intensity for subject $j$ on gene $g$ within group $i$, where $i = 1, 2, \ldots, T$ ($T$ = number of treatments), $j = 1, 2, \ldots, n_i$ ($n_i$ = number of biological replicates within treatment $i$) and $g = 1, 2, \ldots, G$ ($G$ = number of genes).

4 Unpaired comparisons between two treatments (cont'd)
Example 2) Reference (two-color) designs: each array hybridizes one test sample (Cy5) against the common reference sample R (Cy3): $A_1/R, A_2/R, \ldots, A_n/R$ and $B_1/R, B_2/R, \ldots, B_n/R$.
E.g., response $y_{ijg}$ = log ratio of fluorescence intensity (relative to the common reference sample R). Subscripts as on the previous slide (one measure per probe).

5 Linear statistical model
The basis for classical statistical inference. Consider the linear model for one gene (drop $g$):

$$Y_{ij} = \mu + \mathrm{trt}_i + e_{ij}$$

where $\mu$ = overall mean, $\mathrm{trt}_i$ = effect of treatment $i$ (so $\mu + \mathrm{trt}_i$ is the true mean for treatment $i$) and $e_{ij}$ = random experimental (biological) error.

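6 THE TWO-SAMPLE t-TEST (T = 2)
Assume equal variances within each treatment. Sample statistics for gene $g$: sample 1, $y_{11g}, \ldots, y_{1n_1g}$, with mean $\bar y_{1g}$ and variance $s^2_{1g}$; sample 2, $y_{21g}, \ldots, y_{2n_2g}$, with mean $\bar y_{2g}$ and variance $s^2_{2g}$.
The pooled variance is a weighted average of $s^2_{1g}$ and $s^2_{2g}$:

$$s^2_g = \frac{(n_1 - 1)s^2_{1g} + (n_2 - 1)s^2_{2g}}{n_1 + n_2 - 2}$$

Given $y_{1jg} \sim N(\mu_{1g}, \sigma^2_g)$ and $y_{2jg} \sim N(\mu_{2g}, \sigma^2_g)$, test $H_0: \mu_{1g} = \mu_{2g}$ vs $H_1: \mu_{1g} \neq \mu_{2g}$ with

$$t_g = \frac{\bar y_{1g} - \bar y_{2g}}{SED_g}, \qquad SED_g = \sqrt{s^2_g \, \frac{n_1 + n_2}{n_1 n_2}}, \qquad t_g \sim t_{n_1 + n_2 - 2} \text{ under } H_0.$$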
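As a minimal sketch (in Python rather than the workshop's SAS, standard library only), the pooled statistic above can be computed for one gene as follows. `pooled_t_test` is a hypothetical helper name; it returns $t_g$ and its $n_1 + n_2 - 2$ df, and converting $t_g$ to a P-value is left to a t table or a statistics package.

```python
import math
import statistics


def pooled_t_test(y1, y2):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    n1, n2 = len(y1), len(y2)
    # Pooled variance: weighted average of the two sample variances
    s2 = ((n1 - 1) * statistics.variance(y1)
          + (n2 - 1) * statistics.variance(y2)) / (n1 + n2 - 2)
    # Standard error of the difference: sqrt(s^2 (n1 + n2) / (n1 n2))
    sed = math.sqrt(s2 * (n1 + n2) / (n1 * n2))
    t = (statistics.mean(y1) - statistics.mean(y2)) / sed
    return t, n1 + n2 - 2


t, df = pooled_t_test([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(f"t = {t:.3f} on {df} df")  # t = -3.674 on 4 df
```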

7 DECISION RULES AND P-VALUES (IGNORING MULTIPLICITY)
One-tailed test ($H_1: \mu_{1g} - \mu_{2g} > 0$): with $t_g = (\bar y_{1g} - \bar y_{2g})/SED_g$, if $t_g > t_\alpha$, gene $g$ is concluded to be differentially expressed.

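8 DECISION RULES AND P-VALUES (IGNORING MULTIPLICITY)
Two-tailed test ($H_1: \mu_{1g} - \mu_{2g} \neq 0$), with $t_g = (\bar y_{1g} - \bar y_{2g})/SED_g$: compare $|t_g|$ to $t_{\alpha/2}$, or compare the P-value $= \mathrm{Prob}(|t| > |t_g|)$ to α.
1) P-value < α: reject $H_0: \mu_{1g} - \mu_{2g} = 0$.
2) P-value > α: fail to reject $H_0: \mu_{1g} - \mu_{2g} = 0$.
[Figure: t density with rejection regions of area α/2 below $-t_{\alpha/2}$ and above $t_{\alpha/2}$]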

9 DECISION RULES AND P-VALUES (EXAMPLES)
α = 0.05, reference distribution $t_{60}$:
1) $t = 2.5$: P-value = Prob(t < −2.5) + Prob(t > 2.5) = 0.015 < α, so reject $H_0$.
2) $t = -1.0$: P-value = Prob(t < −1.0) + Prob(t > 1.0) = 0.32 > α, so fail to reject $H_0$.

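10 THE TWO-SAMPLE t-TEST
COMMENT: If $\sigma^2_1 \neq \sigma^2_2$, the t-test should be altered accordingly:

$$t = \frac{\bar y_1 - \bar y_2}{\sqrt{\dfrac{s^2_1}{n_1} + \dfrac{s^2_2}{n_2}}} \;\dot\sim\; t_{df^*}, \qquad df^* = \frac{\left(\dfrac{s^2_1}{n_1} + \dfrac{s^2_2}{n_2}\right)^2}{\dfrac{(s^2_1/n_1)^2}{n_1 - 1} + \dfrac{(s^2_2/n_2)^2}{n_2 - 1}}$$

(Satterthwaite, 1946)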
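A comparable Python sketch of the unequal-variance version (the function name `welch_t_test` is hypothetical): it returns the statistic and Satterthwaite's approximate df*, which is generally not an integer.

```python
import math
import statistics


def welch_t_test(y1, y2):
    """Unequal-variance t statistic with Satterthwaite's approximate df."""
    n1, n2 = len(y1), len(y2)
    v1 = statistics.variance(y1) / n1   # s1^2 / n1
    v2 = statistics.variance(y2) / n2   # s2^2 / n2
    t = (statistics.mean(y1) - statistics.mean(y2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


t, df = welch_t_test([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0])
print(f"t = {t:.3f}, df* = {df:.2f}")  # t = -1.134, df* = 3.23
```

With equal sample sizes and equal sample variances, df* reduces to $n_1 + n_2 - 2$.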

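11 EXAMPLE and SAS CODE for one gene
T1: $n_1 = 6$; T2: $n_2 = 5$. From the sample means $\bar y_1, \bar y_2$ and standard deviations $s_1, s_2$, the pooled variance is $s^2 = [(n_1 - 1)s^2_1 + (n_2 - 1)s^2_2]/(n_1 + n_2 - 2)$, giving $t = (\bar y_1 - \bar y_2)/SED = 1.49$ on $n_1 + n_2 - 2 = 9$ df: NS (not significant).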

12 RESULTS (PROC TTEST)
P-value (two-tailed) = 0.1695 = Prob($t_9$ > 1.49) + Prob($t_9$ < −1.49)

13 An issue with the two-sample t-test (or classical linear model analysis)
Distributional assumptions, especially with small n: the effect of non-normality; outliers might have unduly large influence.

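14 THE PERMUTATION TEST
The basic idea is simple: estimate the null distribution of the test statistic from the data themselves to draw conclusions on statistical significance. There is a close connection with bootstrap sampling.
Suppose the experiment yields Trt 1: $y_{11}, y_{12}, \ldots, y_{1n_1}$ from distribution F (summarized as $\bar y_1 \pm s_1$) and Trt 2: $y_{21}, y_{22}, \ldots, y_{2n_2}$ from distribution G ($\bar y_2 \pm s_2$). Test $H_0: F = G$ vs $H_1: F \neq G$.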

15 THE PERMUTATION TEST
Define a statistic (e.g., $t = (\bar y_1 - \bar y_2)/SED$) and calculate its value for the actual experiment (call it $t^*$).
Repeat B times:
1) Take a random sample of size $n_1$, without replacement, from the data to represent Group 1.
2) Assign the remaining $n_2$ observations to Group 2.
3) Compute the value of $t = (\bar y_1 - \bar y_2)/SED$ (call it $t^{(i)}$).
One-tailed P-value (for $H_1: \mu_1 > \mu_2$): $p = \sum_{i=1}^{B} I(t^{(i)} \geq t^*)/B$.

16 THE PERMUTATION TEST
[Diagram: the actual experiment gives $t^* = (\bar y_1 - \bar y_2)/SED$; permutations 1, 2, ..., B reshuffle the same $n_1 + n_2$ observations between T1 and T2 and give $t^{(1)}, t^{(2)}, \ldots, t^{(B)}$.]
One-tailed permutation P-value: $p = \sum_{b=1}^{B} I(t^{(b)} \geq t^*)/B$, i.e., the proportion of times that $t^{(b)}$ exceeds $t^*$ for $b = 1, \ldots, B$.
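The steps above can be sketched in Python as follows. This is a minimal illustration, not the PROC MULTTEST implementation; `pooled_t` and `permutation_pvalue` are hypothetical names, and B and the seed are arbitrary choices. Each iteration draws a fresh random split, so the same split can recur across iterations, as in the "repeat B times" recipe; ties with $t^*$ count as exceedances. Each group is assumed to have at least two observations.

```python
import math
import random
import statistics


def pooled_t(y1, y2):
    """Pooled two-sample t statistic (guards against a zero SED)."""
    n1, n2 = len(y1), len(y2)
    diff = statistics.mean(y1) - statistics.mean(y2)
    s2 = ((n1 - 1) * statistics.variance(y1)
          + (n2 - 1) * statistics.variance(y2)) / (n1 + n2 - 2)
    sed = math.sqrt(s2 * (1 / n1 + 1 / n2))
    if sed == 0:
        return math.copysign(math.inf, diff) if diff else 0.0
    return diff / sed


def permutation_pvalue(y1, y2, B=10000, seed=1):
    """One-tailed permutation P-value for H1: mu1 > mu2."""
    rng = random.Random(seed)
    t_star = pooled_t(y1, y2)           # statistic for the actual experiment
    data = list(y1) + list(y2)
    n1 = len(y1)
    exceed = 0
    for _ in range(B):
        shuffled = rng.sample(data, len(data))          # draw without replacement
        t_b = pooled_t(shuffled[:n1], shuffled[n1:])    # first n1 -> Group 1
        if t_b >= t_star:
            exceed += 1
    return exceed / B


print(permutation_pvalue([10, 11, 12, 13], [1, 2, 3, 4], B=2000))
```

With these data the result should be near 1/70 ≈ 0.014, because only one of the 70 possible splits of the eight observations reproduces the observed $t^*$.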

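17 SAS example

```sas
data example;
  input trt $ y;   /* trt = T1 (n1 = 6 rows) or T2 (n2 = 5 rows); y = response */
  datalines;
...
;
run;

/* Permutation resampling of the two-sample t-test on the mean of y */
proc multtest data=example permutation nsample=10000 pvals outsamp=res;
  test mean(y);
  class trt;
  contrast 'Trt1 - Trt2' 1 -1;
run;

/* Print the first two permuted datasets (2 x 11 rows) */
proc print data=res(obs=22);
run;
```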

18 Permuted samples
Each permuted dataset gives $t^{(b)} = (\bar y_1^{(b)} - \bar y_2^{(b)})/SED^{(b)}$; the listing shows permuted datasets 1 and 2 (columns _sample_, _class_, _obs_ and y of the OUTSAMP= dataset), and so on.

19 Summary output
Continuous variable tabulations: number of observations, mean and standard deviation of y for T1 and T2.
p-values: for the contrast Trt 1 − Trt 2 on y, the Raw column is the regular t-test P-value and the Permutation column is the permutation-based two-tailed P-value.

20 [Figure: frequency distribution of $t^{(i)}$ over B = 10000 permuted datasets, with $t^* = 1.49$ (actual experiment) and $t = -1.49$ marked; two-tailed permutation p-value = 0.1778]

21 THE BOOTSTRAP
Bootstrap tests are more widely applicable, though less accurate, than the permutation test. The bootstrap is extremely useful for computing standard errors and confidence intervals.
Suppose the same experiment as before: Trt 1: $y_{11}, \ldots, y_{1n_1}$ from F ($\bar y_1 \pm s_1$); Trt 2: $y_{21}, \ldots, y_{2n_2}$ from G ($\bar y_2 \pm s_2$); $H_0: F = G$ vs $H_1: F \neq G$.

22 THE BOOTSTRAP
Define the statistic (e.g., $t = (\bar y_1 - \bar y_2)/SED$) and calculate its value for the data set (call it $t^*$).
Compute the estimated residual for each observation: $\hat e_{ij} = y_{ij} - \bar y_i$.
Repeat B times:
1) Draw $n_1$ residuals at random, with replacement, and assign them as data for Group 1.
2) Draw $n_2$ residuals at random, with replacement, and assign them as data for Group 2.
3) Compute the value of t (call it $t^{(i)}$).
One-tailed P-value (for $H_1: \mu_1 > \mu_2$): $p = \sum_{i=1}^{B} I(t^{(i)} \geq t^*)/B$.
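A Python sketch of the residual bootstrap above, under one stated assumption: each group's resample is drawn from the full set of $n_1 + n_2$ residuals (both groups are centered at their own means, so $H_0$ holds in the resampled data). Function names are hypothetical, and this is not presented as the exact PROC MULTTEST algorithm.

```python
import math
import random
import statistics


def pooled_t(y1, y2):
    """Pooled two-sample t statistic (guards against a zero SED)."""
    n1, n2 = len(y1), len(y2)
    diff = statistics.mean(y1) - statistics.mean(y2)
    s2 = ((n1 - 1) * statistics.variance(y1)
          + (n2 - 1) * statistics.variance(y2)) / (n1 + n2 - 2)
    sed = math.sqrt(s2 * (1 / n1 + 1 / n2))
    if sed == 0:
        return math.copysign(math.inf, diff) if diff else 0.0
    return diff / sed


def bootstrap_pvalue(y1, y2, B=10000, seed=1):
    """One-tailed residual-bootstrap P-value for H1: mu1 > mu2."""
    rng = random.Random(seed)
    t_star = pooled_t(y1, y2)
    m1, m2 = statistics.mean(y1), statistics.mean(y2)
    # Estimated residuals e_ij = y_ij - ybar_i (assumed pooled across groups)
    residuals = [y - m1 for y in y1] + [y - m2 for y in y2]
    exceed = 0
    for _ in range(B):
        g1 = rng.choices(residuals, k=len(y1))   # draw with replacement
        g2 = rng.choices(residuals, k=len(y2))
        if pooled_t(g1, g2) >= t_star:
            exceed += 1
    return exceed / B


print(bootstrap_pvalue([10, 11, 12, 13], [1, 2, 3, 4], B=2000))
```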

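23 SAS example

```sas
data example;
  input trt $ y;   /* same data as the permutation example */
  datalines;
...
;
run;

/* Bootstrap resampling of the two-sample t-test on the mean of y */
proc multtest data=example bootstrap nsample=10000 pvals outsamp=res;
  test mean(y);
  class trt;
  contrast 'Trt1 - Trt2' 1 -1;
run;

/* Print the first two bootstrap datasets (2 x 11 rows) */
proc print data=res(obs=22);
run;
```

The residuals for the actual experiment, $\hat e_{ij} = y_{ij} - \bar y_i$, are listed by observation and treatment.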

24 Bootstrap samples
Each bootstrapped dataset gives $t^{(b)} = (\bar y_1^{(b)} - \bar y_2^{(b)})/SED^{(b)}$; the listing shows bootstrapped datasets 1 and 2 (columns _sample_, _class_, _obs_ and y), and so on.

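25 Summary output
Continuous variable tabulations: number of observations, mean and standard deviation of y for T1 and T2.
p-values: for the contrast Trt 1 − Trt 2 on y, the Raw column is the regular t-test P-value and the Bootstrap column is the bootstrap-based two-tailed P-value.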

26 Issues with permutation and bootstrap sampling
Samples still need to be sufficiently large: the granularity problem (Allison et al., 2006). The number of distinct permutations is limited to

$$\binom{n_1 + n_2}{n_1} = \frac{(n_1 + n_2)!}{n_1! \, n_2!}$$

e.g., if $n_1 = n_2 = 3$, only 20 permutations are possible, so the smallest possible (one-tailed) P-value is 1/20 = 0.05.
These methods are also less applicable to more complex designs!
Allison DB, Cui XQ, Page GP, and Sabripour M. Microarray data analysis: from disarray to consolidation and consensus. Nature Reviews Genetics 7: 55-65, 2006.
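A quick way to see the granularity problem is to count the distinct splits directly. This Python sketch (hypothetical helper names) prints, for balanced designs, the number of permutations and the smallest attainable one-tailed P-value, assuming only the observed split reaches $t^*$.

```python
from math import comb


def n_permutations(n1, n2):
    """Number of distinct ways to split n1 + n2 units into groups of n1 and n2."""
    return comb(n1 + n2, n1)


def smallest_pvalue(n1, n2):
    """Smallest attainable one-tailed permutation P-value (one extreme split)."""
    return 1 / n_permutations(n1, n2)


for n in (3, 4, 5):
    print(n, n_permutations(n, n), round(smallest_pvalue(n, n), 4))
# 3 20 0.05
# 4 70 0.0143
# 5 252 0.004
```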

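27 The multiple testing issue involving m genes

| | Called not significant | Called significant | Total |
|---|---|---|---|
| Null true | $m_0 - F$ | $F$ | $m_0$ |
| Alternative true | $m_1 - T$ | $T$ | $m_1$ |
| Total | $m - S$ | $S$ | $m$ (= G) |

F: number of Type I errors; $m_1 - T$: number of Type II errors. The row totals $m_0$ and $m_1$ are constant (but unknown); only S and m − S are observed.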

28 A hypothetical situation involving m = 10000 genes (Pawitan et al., 2005; Bioinformatics)

| | Called not significant | Called significant | Total |
|---|---|---|---|
| Null true | $m_0 - F$ = 9025 | F = 475 | $m_0$ = 9500 |
| Alternative true | $m_1 - T$ = 100 | T = 400 | $m_1$ = 500 |
| Total | m − S = 9125 | S = 875 | m = 10000 |

False positive rate (FPR) = $F/m_0$ = 475/9500 = 0.05
Specificity = 1 − FPR = $(m_0 - F)/m_0$ = 9025/9500 = 0.95
Consistent with using α = 0.05.

29 The same hypothetical situation involving 10000 genes (Pawitan et al., 2005): $m_1 - T$ = 100, T = 400, $m_1$ = 500.
False negative rate (FNR) = $(m_1 - T)/m_1$ = 100/500 = 0.20
Sensitivity = 1 − FNR = $T/m_1$ = 400/500 = 0.80
Consistent with power = 0.80.

30 Controlling FWER = Prob(F ≥ 1)
There have been improvements in controlling the FWER relative to using Bonferroni (too conservative):
Stepdown procedures (e.g., Holm's, Sidak, Westfall and Young).
Multivariate permutation (next), which provided the early inspiration on multiple testing in microarray studies.
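To illustrate how a stepdown procedure improves on Bonferroni, here is a minimal Python sketch of Holm's adjustment (hypothetical function name). The k-th smallest of m P-values is multiplied by $m - k + 1$ rather than by m, and a running maximum keeps the adjusted values in the same order as the raw ones.

```python
def holm_adjust(pvals):
    """Holm step-down adjusted P-values (controls the FWER)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])   # smallest P first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest P-value (k = rank + 1) is multiplied by m - k + 1
        candidate = min(1.0, (m - rank) * pvals[i])
        running_max = max(running_max, candidate)    # keep adjusted P monotone
        adjusted[i] = running_max
    return adjusted


p = [0.01, 0.04, 0.03, 0.005]
print(holm_adjust(p))                       # Holm
print([min(1.0, len(p) * x) for x in p])    # Bonferroni, for comparison
```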

31 Multivariate permutation and bootstrapping and controlling FWER
The two are well suited to each other, and more powerful than Bonferroni.
Reference design example (m genes): Treatment A: $A_1/R, A_2/R, A_3/R, A_4/R$; Treatment B: $B_5/R, B_6/R, B_7/R, B_8/R$.
Compute t-test P-values for comparing A to B for each gene: $p_1, p_2, \ldots, p_m$.

32 Multivariate permutation and bootstrapping and controlling FWER (cont'd)
For each permutation b of the treatment A/treatment B labels, compute P-values for each of the m genes, $p^*_1, p^*_2, \ldots, p^*_m$, and record their minimum, $\min p^{*(b)}$.
Identify gene j as significantly differentially expressed if

$$\frac{\#\{\text{permutations where } \min p^{*(b)} < p_j\}}{\#\{\text{permutations}\}} < \alpha$$

Also used in Callow et al. (2000) Genome Research 10: 2022-2029.
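The minP rule above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Callow et al. code: gene-level P-values come from the pooled t statistic with a normal approximation (`math.erfc`) in place of the exact t distribution, ties are counted as $\min p^{*(b)} \le p_j$ (the conservative convention), and all function names are hypothetical. The key point is that one label shuffle is applied to every gene, which preserves the correlation between genes.

```python
import math
import random
import statistics


def two_sided_p(y1, y2):
    """Pooled t statistic turned into a two-sided P-value.

    Assumption: normal approximation erfc(|t|/sqrt(2)) instead of the exact
    t distribution, only to keep this sketch dependency-free.
    """
    n1, n2 = len(y1), len(y2)
    diff = statistics.mean(y1) - statistics.mean(y2)
    s2 = ((n1 - 1) * statistics.variance(y1)
          + (n2 - 1) * statistics.variance(y2)) / (n1 + n2 - 2)
    sed = math.sqrt(s2 * (1 / n1 + 1 / n2))
    if sed == 0:
        return 1.0 if diff == 0 else 0.0
    return math.erfc(abs(diff / sed) / math.sqrt(2))


def gene_pvalues(data, labels):
    """data: one list of values per gene, samples in the same order as labels."""
    pvals = []
    for row in data:
        g1 = [v for v, lab in zip(row, labels) if lab == 0]
        g2 = [v for v, lab in zip(row, labels) if lab == 1]
        pvals.append(two_sided_p(g1, g2))
    return pvals


def minp_adjusted(data, labels, B=1000, seed=1):
    """Single-step minP (Westfall-Young style) FWER-adjusted P-values."""
    rng = random.Random(seed)
    observed = gene_pvalues(data, labels)
    min_p = []
    for _ in range(B):
        perm = rng.sample(labels, len(labels))   # one shuffle shared by all genes
        min_p.append(min(gene_pvalues(data, perm)))
    # Proportion of permutations whose smallest P-value is <= p_j
    return [sum(mp <= p for mp in min_p) / B for p in observed]
```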

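33 SAS program

```sas
data example;
  input trt $ y1 y2 y3;   /* one row per experimental unit; y1-y3 = three genes */
  datalines;
...
;
run;

/* Multivariate permutation: the same label shuffle is applied to all genes */
proc multtest data=example permutation nsample=10000 pvals outsamp=res;
  test mean(y1 y2 y3);
  class trt;
  contrast 'Trt1 - Trt2' 1 -1;
run;
```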

34 First two multivariate permutation samples
Note: the correlation structure between genes is preserved; only the experimental unit labels are shuffled. (Listing columns: _sample_, _class_, _obs_, y1, y2, y3.)

35 Summary output
Continuous variable tabulations: number of observations, mean and standard deviation of y1, y2 and y3 for T1 and T2.
p-values: Raw and Permutation P-values for the contrast Trt 1 − Trt 2 on y1, y2 and y3.
Note: the multivariate permutation P-values are smaller than the Bonferroni-adjusted P-values.

36 The same hypothetical situation involving 10000 genes (Pawitan et al., 2005): F = 475, S = 875.
False discovery rate (FDR) = F/S = 475/875 = 0.54
The FDR particularly suffers when $\pi_0$ is close to 1, where $\pi_0 = m_0/m$ is the proportion of all genes that are non-DE.
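The rates on the last three slides follow directly from the outcome table. This small Python sketch (hypothetical function name) reproduces them from the Pawitan et al. (2005) counts of 9025, 475, 100 and 400.

```python
def outcome_rates(m0_minus_F, F, m1_minus_T, T):
    """Error rates from the m-gene outcome table (m0 nulls, m1 alternatives)."""
    m0 = m0_minus_F + F        # truly null (non-DE) genes
    m1 = m1_minus_T + T        # truly DE genes
    S = F + T                  # genes called significant
    return {
        "FPR": F / m0,
        "specificity": m0_minus_F / m0,
        "FNR": m1_minus_T / m1,
        "sensitivity": T / m1,
        "FDR": F / S,
    }


# Hypothetical table of Pawitan et al. (2005), m = 10000 genes
rates = outcome_rates(m0_minus_F=9025, F=475, m1_minus_T=100, T=400)
print({k: round(v, 3) for k, v in rates.items()})
# {'FPR': 0.05, 'specificity': 0.95, 'FNR': 0.2, 'sensitivity': 0.8, 'FDR': 0.543}
```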

37 [Figure from Pawitan Y et al., Bioinformatics 2005, 21: FDR (solid curves for $\pi_0$ = 0.90, 0.95 or 0.99), FPR {α} (dashed curves) and sensitivity (dotted curves) as a function of the critical value of the t-statistic. Half of the DE genes had a positive standardized effect $(\mu_1 - \mu_2)/\sigma$; the other half had an effect of the same size with negative sign.]

38 Using permutation/bootstrapping to estimate FDR
Small example (3000 genes) from Storey and Tibshirani (2003): compare two groups, Group 1 (n = 5) vs Group 2 (n = 3).
Suppose we decide to reject $H_0$ for all genes whose |t| exceeds a chosen cutoff; we would then conclude that 146 genes are statistically significant.
Randomly shuffle the experimental units to create a set of permutation datasets, and simply tabulate, in each, the number of genes whose |t| exceeds the same cutoff. The average number across permutations is 12.3.
Thus a simple estimate of the FDR for this cutoff is 12.3/146 × 100% = 8.42%, i.e., if one used this cutoff to conclude statistical significance, 8.42% of the genes in the significant list would be estimated to be false positives. Equivalently, 91.58% of the genes in the list would be estimated to be true positives.
Storey JD and Tibshirani R (2003). SAM thresholding and false discovery rates for detecting differential gene expression in DNA microarrays. In Parmigiani G et al. (eds), The Analysis of Gene Expression Data: Methods and Software. Springer-Verlag, pp. 272-290.

39 Using permutation/bootstrapping to estimate FDR (cont'd)
Actually, the estimated FDR of 8.42% is biased upwards. Recall the outcome table: here S = 146 genes are called significant, m − S = 2854 are called not significant, and m = 3000 in total.
Permutations make all m genes null, but only a portion $\pi_0 = m_0/m$ truly are. So to improve the FDR estimate, multiply 8.42% by $\pi_0$.
The estimate of $\pi_0$ from the example (details on next slide) is 0.89.
Therefore, the improved estimate of the FDR for the same cutoff is 8.42% × 0.89 = 7.49%.

40 How to estimate $\pi_0$ using permutation?
Suppose it is safe to say that genes with |t| < 0.5 all correspond to true null hypotheses.
Consider the number of observed |t| < 0.5: for the example, 668.
Consider the average number of permuted |t| < 0.5: for the example, 750.
Therefore, $\hat\pi_0 = 668/750 = 0.89$.
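The arithmetic of the last three slides, as a Python sketch with hypothetical helper names. It uses the example's counts: 146 genes called, 12.3 average permuted calls, and 668 observed vs 750 average permuted |t| values in the central region assumed to be all null. Carrying full precision gives about 7.50%, against the rounded 8.42% × 0.89 = 7.49%.

```python
def pi0_from_central_region(observed_count, mean_permuted_count):
    """pi0 estimate: observed vs permuted counts in a region assumed all-null."""
    return min(1.0, observed_count / mean_permuted_count)


def permutation_fdr(n_called, mean_null_called, pi0=1.0):
    """Estimated FDR: average permuted calls / observed calls, scaled by pi0."""
    return pi0 * mean_null_called / n_called


pi0 = pi0_from_central_region(668, 750)
raw_fdr = permutation_fdr(146, 12.3)             # treats every gene as null
improved_fdr = permutation_fdr(146, 12.3, pi0)   # corrected by pi0
print(round(pi0, 3), round(raw_fdr, 4), round(improved_fdr, 4))
# 0.891 0.0842 0.075
```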

41 SAM (Significance Analysis of Microarrays): Storey and Tibshirani (2003)
A popular inferential procedure for differential gene expression in microarrays: a mix of permutation/bootstrap methods with FDR control and shrinkage estimation (later).
Permute or bootstrap on

$$d_g = \frac{\bar y_{1g} - \bar y_{2g}}{se(\bar y_{1g} - \bar y_{2g}) + s_0}, \qquad g = 1, 2, \ldots, m$$

where $s_0$ is some (50th or 90th) percentile of the gene-wise standard errors (or the percentile that minimizes the CV of $d_g$).
This empirical Bayes-type adjustment provides stability against unusually small SEDs!
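A Python sketch of the $d_g$ statistic (hypothetical function name). The standard error is taken to be the pooled SED of the two-sample t-test, and $s_0$ defaults to the median (50th percentile) of the per-gene standard errors; both are assumptions for illustration, not the SAM software's exact choices.

```python
import math
import statistics


def sam_d(group1, group2, s0=None):
    """SAM-style d statistics for m genes.

    group1, group2: lists of per-gene value lists (same gene order).
    s0: fudge constant; if None, the median (50th percentile) of the
        per-gene standard errors is used (an assumed default).
    """
    diffs, ses = [], []
    for y1, y2 in zip(group1, group2):
        n1, n2 = len(y1), len(y2)
        s2 = ((n1 - 1) * statistics.variance(y1)
              + (n2 - 1) * statistics.variance(y2)) / (n1 + n2 - 2)
        diffs.append(statistics.mean(y1) - statistics.mean(y2))
        ses.append(math.sqrt(s2 * (1 / n1 + 1 / n2)))   # pooled SED per gene
    if s0 is None:
        s0 = statistics.median(ses)
    return [d / (se + s0) for d, se in zip(diffs, ses)]


g1 = [[1.0, 2.0, 3.0], [5.0, 5.01, 5.02]]
g2 = [[4.0, 5.0, 6.0], [5.1, 5.11, 5.12]]
print(sam_d(g1, g2, s0=0.0))   # plain t statistics
print(sam_d(g1, g2))           # with s0 = median SED
```

The second gene has a tiny variance: its plain t statistic is the larger of the two, but adding $s_0$ ranks it below the first gene.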

42 The SAM procedure
1) Compute

$$d_g = \frac{\bar y_{1g} - \bar y_{2g}}{se(\bar y_{1g} - \bar y_{2g}) + s_0}, \qquad g = 1, 2, \ldots, m$$

based on the data, and order them from smallest to largest: $d_{(1)} \le d_{(2)} \le d_{(3)} \le \cdots \le d_{(m-1)} \le d_{(m)}$.
2) Take b = 1, 2, ..., B permuted or bootstrap samples, compute $d^{b}_g$, g = 1, 2, ..., m, for each sample b, and order the statistics from smallest to largest within each sample: $d^{b}_{(1)} \le d^{b}_{(2)} \le d^{b}_{(3)} \le \cdots \le d^{b}_{(m-1)} \le d^{b}_{(m)}$.

43 3) Compute the average of the B values for each ordered d statistic, $\bar d_{(1)}, \bar d_{(2)}, \ldots, \bar d_{(m)}$, where, e.g., $\bar d_{(1)} = \frac{1}{B}\sum_{b=1}^{B} d^{b}_{(1)}$.
4) Plot $d_{(1)}, d_{(2)}, \ldots, d_{(m)}$ vs $\bar d_{(1)}, \bar d_{(2)}, \ldots, \bar d_{(m)}$ and base the gene list on values that fall outside bands parallel to the 45° line.
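Steps 3) and 4) can be sketched in Python as follows (hypothetical function names). The calling rule is a simplified version of SAM's: in the upper tail, the first positive order statistic whose distance above the 45° line exceeds Δ is called together with everything larger, and the lower tail is handled symmetrically. The FDR that SAM reports for each Δ is not computed here.

```python
def expected_order_stats(perm_d):
    """Average the sorted d statistics over the B permuted/bootstrap datasets."""
    sorted_perms = [sorted(d) for d in perm_d]
    B, m = len(sorted_perms), len(sorted_perms[0])
    return [sum(sp[i] for sp in sorted_perms) / B for i in range(m)]


def sam_calls(observed_d, perm_d, delta):
    """Indices of genes called significant for a given delta (simplified rule)."""
    expected = expected_order_stats(perm_d)
    order = sorted(range(len(observed_d)), key=lambda g: observed_d[g])
    d_sorted = [observed_d[g] for g in order]
    pairs = list(enumerate(zip(d_sorted, expected)))
    called = set()
    # Upper tail: first positive d_(i) lying more than delta above the line
    upper = [i for i, (d, e) in pairs if d > 0 and d - e > delta]
    if upper:
        called.update(order[min(upper):])
    # Lower tail: last negative d_(i) lying more than delta below the line
    lower = [i for i, (d, e) in pairs if d < 0 and e - d > delta]
    if lower:
        called.update(order[:max(lower) + 1])
    return sorted(called)


observed = [5.0, 0.1, -0.2, -4.0, 0.3]
perms = [[0.5, -0.5, 0.1, -0.1, 0.0], [0.4, -0.6, 0.2, 0.0, -0.2]]
print(sam_calls(observed, perms, delta=1.0))   # [0, 3]
```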

44 Example: Affymetrix dataset with two groups (n = 4 each), 8 slides in total, distributed with the SAM software (downloadable from the SAM website).

45 [Screenshot of the SAM Excel interface: the first row holds the treatment labels; the first columns are needed for the gene labels.]

47 [Figure: plot of observed $d_{(g)}$ vs expected $\bar d_{(g)}$ statistics, with significantly upregulated (40) and significantly downregulated (39) genes highlighted.]
Δ = difference (along the 45° line) between the outer two (dashed) lines and the expected (blue) line. Note the asymmetric rejection regions.

49 Estimating $\pi_0$ and FDR for regular non-permutation procedures (e.g., t-tests): distribution of P-values
Under the null hypothesis, the distribution of P-values across many independent tests is uniform on the interval [0, 1], regardless of the sample size and statistical test used (provided the test is valid).
[Figure: histogram of P-values (frequency vs P-value), flat under the null]

50 Distribution of P-values (cont'd)
If some genes are differentially expressed, then low P-values should be more frequent than high P-values.
[Figure: histogram of P-values (frequency vs P-value) with an excess of small P-values]

51 Distribution of P-values (for genes from a small boutique array at MSU; effect = treatment)
[Figure: histogram of the treatment P-values (Pr > |t|), marking the expected height of each bar if no genes were differentially expressed.]
A plausible estimate of $\pi_0 = m_0/m$?

52 How to (roughly) estimate $\pi_0$?
Choose all P-values above a point λ where the P-value frequencies start to level off (say λ = 0.60, based on the previous slide):

$$\hat\pi_0(\lambda) = \frac{\#\{p_j > \lambda\}}{m(1 - \lambda)} = 0.774$$
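The estimator above in a few lines of Python (hypothetical function name). It relies on the uniformity of null P-values: about $m_0(1 - \lambda)$ of them should exceed λ, while DE genes contribute few P-values there. The estimate is capped at 1.

```python
def pi0_hat(pvals, lam=0.6):
    """Estimate pi0 from the share of P-values above lam, rescaled by 1 - lam."""
    m = len(pvals)
    return min(1.0, sum(p > lam for p in pvals) / (m * (1 - lam)))


p = [0.001, 0.002, 0.01, 0.03, 0.04, 0.2, 0.3, 0.5, 0.7, 0.9]
print(pi0_hat(p, lam=0.6))   # 2 of 10 P-values exceed 0.6
```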

53 Estimating FDRs based on P-values
Choose an arbitrary P-value cutoff t (0 < t < 1) for statistical significance. The expected number of false positives F(t) with P-value < t is $E(F(t)) = m_0 t$. Hence the estimated FDR at P-value cutoff t is

$$\widehat{FDR}(t) = \frac{E(F(t))}{E(S(t))} \approx \frac{\hat m_0 t}{S(t)} = \frac{\hat\pi_0 m t}{S(t)}$$

54 Q-value
Defined for each gene: the minimum FDR that can be attained by calling that gene significant (along with all others that have greater statistical significance). For gene i:

$$\hat q(p_i) = \min_{t \ge p_i} \widehat{FDR}(t)$$
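A Python sketch of the q-value computation (hypothetical function name), with $\hat\pi_0$ supplied as an input. Evaluating $\widehat{FDR}(t)$ at each observed P-value and taking a running minimum from the largest P-value downward implements $\min_{t \ge p_i}$. With `pi0=1.0` the result equals the Benjamini-Hochberg adjusted P-values.

```python
def qvalues(pvals, pi0=1.0):
    """q-values: q(p_i) = min over t >= p_i of pi0 * m * t / S(t)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)  # largest P first
    q = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        n_called = m - k                       # S(t) at cutoff t = pvals[i]
        fdr = pi0 * m * pvals[i] / n_called    # estimated FDR at that cutoff
        running_min = min(running_min, fdr)    # minimum over cutoffs t >= p_i
        q[i] = running_min
    return q


p = [0.01, 0.04, 0.03, 0.005]
print(qvalues(p))             # pi0 = 1 (Benjamini-Hochberg)
print(qvalues(p, pi0=0.5))    # scaled by an estimated pi0
```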

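55 Small example (SAS program)

```sas
/* PDATA= expects the raw P-values in a variable named raw_p */
data example;
  input raw_p;
  datalines;
...
;
run;

/* Benjamini-Hochberg FDR adjustment of the input P-values */
proc multtest pdata=example fdr;
run;
```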

56 Small example (SAS output)
The Multtest procedure lists, for each test, the Raw P-value and the False Discovery Rate adjusted P-value.
The SAS procedure assumes $\pi_0 = 1$ (as in Benjamini and Hochberg, 1995); to obtain q-values, just multiply the SAS values by the estimated $\pi_0$.

57 [Figure: plot of q-values vs raw P-values (raw_p) for the heifer example]

58 [Figure: plot of q-values vs number of declared significant genes, S(t)]

59 Classical linear model analysis
So far: comparison of two treatments, with a simple design structure leading to a simple linear model:

$$Y_{ij} = \mu + \mathrm{trt}_i + e_{ij}$$

where $\mu$ = overall mean, $\mathrm{trt}_i$ = effect of treatment i and $e_{ij}$ = random experimental (biological) error.
Formal linear model analysis is not necessary unless T > 2.

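60 Common reference design for two treatments, balanced for dye assignments
Each array pairs a test sample with the common reference sample R, with the Cy3/Cy5 assignment balanced across arrays: $A_1, A_2, A_3, \ldots, A_n$ and $B_1, B_2, B_3, \ldots, B_n$, each hybridized against R.
E.g., response $y_{ijkg}$ = log ratio of fluorescence intensity (relative to the common reference sample R), where i: treatment; j: dye assignment to the test sample; k: biological replicate; g: gene.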

61 Linear model for the common reference design with dye balance

$$Y_{ijk} = \mu + \mathrm{trt}_i + \mathrm{dye}_j + e_{ijk}$$

$\mu$ = overall mean; $\mathrm{trt}_i$ = effect of treatment i; $\mathrm{dye}_j$ = effect of dye j assigned to the treated sample; $e_{ijk}$ = random experimental (biological) error for biological replicate k assigned to trt i and dye j.
Simple linear model analysis (ANOVA).

62 Representative data (refdye)
12 observations (n = 6 biological replicates per treatment), with variables Obs, trt (i), dye (j: Cy3 or Cy5), subj (k), fold and logfluor ($y_{ijk}$).

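63 SAS ANOVA code

```sas
/* Fixed-effects ANOVA: treatment and dye effects on the log ratio */
proc mixed data=refdye;
  class trt dye;
  model logfluor = trt dye;
  lsmeans trt / diff;   /* adjusted trt means and their difference */
run;
```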

64 SAS output
ANOVA table (Type 3 tests of fixed effects): trt and dye, each with 1 numerator and 9 denominator df, with F value and Pr > F.
Least squares means: the adjusted trt means (estimate, standard error, DF, t value, Pr > |t|); back-transformed, these give the estimated fold change of each treatment relative to the reference.
Differences of least squares means: the adjusted trt mean difference (trt 1 − trt 2), which back-transforms to the estimated fold change of trt 1 vs trt 2.

65 Balanced block design
Example 1): comparison of two treatments, each based on n subjects: each array hybridizes one A sample and one B sample ($A_1/B_1, A_2/B_2, A_3/B_3, \ldots, A_n/B_n$), for a total of 2n biological replicates. It would probably be good to have even n (balanced dye swap).
E.g., response $y_{ijkg}$ = log fluorescence intensity for treatment i labeled with dye j on array k for gene g.

66 Linear (mixed) model for the balanced block design

$$Y_{ijk} = \mu + \mathrm{trt}_i + \mathrm{dye}_j + \mathrm{array}_k + e_{ijk}$$

$\mu$ = overall mean; $\mathrm{trt}_i$ = effect of treatment i; $\mathrm{dye}_j$ = effect of dye j assigned to the treated sample; $\mathrm{array}_k$ = random effect of array k; $e_{ijk}$ = random experimental (biological) error for the biological replicate assigned to trt i and dye j within array k.
Simple linear mixed model analysis (ANOVA): each ijk identifies a unique biological replicate.

67 Representative data (balanceblock): variables Obs, trt, dye, array, fluorescence and logfluor.

68 SAS code

```sas
/* Mixed model: fixed trt and dye effects, random array (block) effect */
proc mixed data=balanceblock method=type3;
  class array trt dye;
  model logfluor = trt dye;
  random array;
  lsmeans trt / diff;
run;
```

69 Representative output

| Source | Expected mean square | Error term |
|---|---|---|
| trt | Var(Residual) + Q(trt) | MS(Residual) |
| dye | Var(Residual) + Q(dye) | MS(Residual) |
| array | Var(Residual) + 2 Var(array) | MS(Residual) |
| Residual | Var(Residual) | |

The output also gives df, SS, MS, error DF, F value and Pr > F for each source.

70 Representative output (cont'd)
Least squares means for each trt (estimate, standard error, DF, t value; both Pr > |t| < .0001) and the difference of least squares means (trt 1 − trt 2), which back-transforms to the estimated fold change of trt 1 vs trt 2.

71 Balanced block design (two-color, blocking on array and subject)
Example 2): comparison of two treatments/tissues within each of n subjects: array k hybridizes the two treatment/tissue samples from subject k, for a total of n mice/arrays. It would probably be good to have even n.
E.g., response $y_{ijkg}$ = log fluorescence intensity from tissue/treatment i labeled with dye j on array (animal) k for gene g.
Same linear mixed model as the previous one!

72 A dairy heifer experiment (two-color, blocking on array and subject)
Two treatments (A and B) are randomly assigned to one of two mRNA aliquots taken from the same animal; one array per heifer (heifers 1-6). For heifers 1-3, Trt A is on one dye and Trt B on the other; for heifers 4-6, the dyes are swapped.
Dye and treatments are orthogonal to each other; heifer and array are confounded with each other.

73 4 rows × 8 columns = 32 print-tips; 359 genes; 4 spots per gene, so experimental replication must be distinguished from pseudo-replication.

74 Inference strategies
1) Average the intensities at the 4 spots for each gene. Treatment, dye and array effects would still need to be modeled, and then the same mixed model analysis as presented previously applies.
2) Explicitly model spot variability and treatment*array variability.

75 Array data for one gene (heifer): 48 rows (6 arrays × 2 dyes × 4 spots), with variables Obs, array, dye, trt, spot and logf.

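76 ANOVA (SAS PROC MIXED)

```sas
/* Technical replication: 4 spots per gene on each array */
proc mixed data=heifer method=type3;
  class array dye trt spot;
  model resid = dye trt;
  /* array, array-by-sample (dye*trt*array) and spot-within-array variability */
  random array dye*trt*array spot(array);
  lsmeans trt / diff;
run;
```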

77 Some output

| Source | Expected mean square |
|---|---|
| dye | Var(Residual) + 4 Var(array*dye*trt) + Q(dye) |
| trt | Var(Residual) + 4 Var(array*dye*trt) + Q(trt) |
| array | Var(Residual) + 2 Var(spot(array)) + 4 Var(array*dye*trt) + 8 Var(array) |
| array*dye*trt | Var(Residual) + 4 Var(array*dye*trt) |
| spot(array) | Var(Residual) + 2 Var(spot(array)) |
| Residual | Var(Residual) |

The output also gives DF, sum of squares, mean square, error DF, F value and Pr > F for each source (spot(array): Pr > F < .0001).

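78 ANOVA table with EMS for the example with technical replication

| Source | df | SS | MS | EMS |
|---|---|---|---|---|
| Treatment | $df_t$ | $SS_t$ | $MS_t$ | $\sigma^2_e + 4\sigma^2_{a*t} + \gamma_{trt}$ |
| Dye | $df_d$ | $SS_d$ | $MS_d$ | $\sigma^2_e + 4\sigma^2_{a*t} + \gamma_{dye}$ |
| Array | $df_a$ | $SS_a$ | $MS_a$ | $\sigma^2_e + 2\sigma^2_{s(a)} + 4\sigma^2_{a*t} + 8\sigma^2_a$ |
| Array*Treat | $df_{a*t}$ | $SS_{a*t}$ | $MS_{a*t}$ | $\sigma^2_e + 4\sigma^2_{a*t}$ |
| Spot(Array) | $df_{s(a)}$ | $SS_{s(a)}$ | $MS_{s(a)}$ | $\sigma^2_e + 2\sigma^2_{s(a)}$ |
| Residual | $df_e$ | $SS_e$ | $MS_e$ | $\sigma^2_e$ |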
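Because each EMS is linear in the variance components, Type 3 (method-of-moments) estimates follow by equating each MS to its EMS and solving. This Python sketch (hypothetical function name) assumes the EMS coefficients of this example: 2 channels per spot, 4 spots per array-by-sample combination and 8 observations per array. The same table implies that treatment and dye are tested against MS(Array*Treat), not MS(Residual).

```python
def variance_components(ms_resid, ms_spot, ms_array_trt, ms_array):
    """Method-of-moments (Type 3) variance components from the EMS table.

    Assumed EMS (4 spots per gene, 2 channels per array):
      Residual:     s2_e
      Spot(Array):  s2_e + 2 s2_spot
      Array*Treat:  s2_e + 4 s2_at
      Array:        s2_e + 2 s2_spot + 4 s2_at + 8 s2_array
    Estimates can be negative when the observed mean squares are noisy.
    """
    s2_e = ms_resid
    s2_spot = (ms_spot - ms_resid) / 2
    s2_at = (ms_array_trt - ms_resid) / 4
    s2_array = (ms_array - ms_spot - ms_array_trt + ms_resid) / 8
    return {"residual": s2_e, "spot(array)": s2_spot,
            "array*trt": s2_at, "array": s2_array}


# Mean squares built so that every component equals 1
print(variance_components(ms_resid=1.0, ms_spot=3.0, ms_array_trt=5.0, ms_array=15.0))
```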

79 Least squares means for each trt (both Pr > |t| < .0001) and the difference of least squares means (trt 1 − trt 2), which back-transforms to the estimated trt 1 : trt 2 fold change.

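80 Another example: connected loop design (n = 4)
Four loops, each linking one sample from each of treatments A, B and C (Loop 1: $A_1, B_1, C_1$; Loop 2: $A_2, B_2, C_2$; Loop 3: $A_3, B_3, C_3$; Loop 4: $A_4, B_4, C_4$).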

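81 Mixed model approach

| Source | df | SS | MS | EMS |
|---|---|---|---|---|
| Treatment | $df_t$ | $SS_t$ | $MS_t$ | $\sigma^2_e + 1.5\sigma^2_{animal(trt)} + \gamma_{trt}$ |
| Dye | $df_d$ | $SS_d$ | $MS_d$ | $\sigma^2_e + \gamma_{dye}$ |
| Array | $df_b$ | $SS_b$ | $MS_b$ | $\sigma^2_e + 1.5\sigma^2_{array}$ |
| Animal(Trt) | $df_{a(t)}$ | $SS_{a(t)}$ | $MS_{a(t)}$ | $\sigma^2_e + 1.5\sigma^2_{animal(trt)}$ |
| Residual | $df_e$ | $SS_e$ | $MS_e$ | $\sigma^2_e$ |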

82 What next?
FDR adjustment of the P-values to provide q-values (same procedure as described previously); then use an FDR control criterion to come up with a gene list.


More information

Looking at the Other Side of Bonferroni

Looking at the Other Side of Bonferroni Department of Biostatistics University of Washington 24 May 2012 Multiple Testing: Control the Type I Error Rate When analyzing genetic data, one will commonly perform over 1 million (and growing) hypothesis

More information

Statistics GIDP Ph.D. Qualifying Exam Methodology

Statistics GIDP Ph.D. Qualifying Exam Methodology Statistics GIDP Ph.D. Qualifying Exam Methodology January 9, 2018, 9:00am 1:00pm Instructions: Put your ID (not your name) on each sheet. Complete exactly 5 of 6 problems; turn in only those sheets you

More information

Stat 206: Estimation and testing for a mean vector,

Stat 206: Estimation and testing for a mean vector, Stat 206: Estimation and testing for a mean vector, Part II James Johndrow 2016-12-03 Comparing components of the mean vector In the last part, we talked about testing the hypothesis H 0 : µ 1 = µ 2 where

More information

Exam: high-dimensional data analysis February 28, 2014

Exam: high-dimensional data analysis February 28, 2014 Exam: high-dimensional data analysis February 28, 2014 Instructions: - Write clearly. Scribbles will not be deciphered. - Answer each main question (not the subquestions) on a separate piece of paper.

More information

Statistics Primer. ORC Staff: Jayme Palka Peter Boedeker Marcus Fagan Trey Dejong

Statistics Primer. ORC Staff: Jayme Palka Peter Boedeker Marcus Fagan Trey Dejong Statistics Primer ORC Staff: Jayme Palka Peter Boedeker Marcus Fagan Trey Dejong 1 Quick Overview of Statistics 2 Descriptive vs. Inferential Statistics Descriptive Statistics: summarize and describe data

More information

Dr. Junchao Xia Center of Biophysics and Computational Biology. Fall /8/2016 1/38

Dr. Junchao Xia Center of Biophysics and Computational Biology. Fall /8/2016 1/38 BIO5312 Biostatistics Lecture 11: Multisample Hypothesis Testing II Dr. Junchao Xia Center of Biophysics and Computational Biology Fall 2016 11/8/2016 1/38 Outline In this lecture, we will continue to

More information

Design and Analysis of Gene Expression Experiments

Design and Analysis of Gene Expression Experiments Design and Analysis of Gene Expression Experiments Guilherme J. M. Rosa Department of Animal Sciences Department of Biostatistics & Medical Informatics University of Wisconsin - Madison OUTLINE Æ Linear

More information

Exam: high-dimensional data analysis January 20, 2014

Exam: high-dimensional data analysis January 20, 2014 Exam: high-dimensional data analysis January 20, 204 Instructions: - Write clearly. Scribbles will not be deciphered. - Answer each main question not the subquestions on a separate piece of paper. - Finish

More information

Lecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2

Lecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2 Lecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2 Fall, 2013 Page 1 Random Variable and Probability Distribution Discrete random variable Y : Finite possible values {y

More information

Cross model validation and multiple testing in latent variable models

Cross model validation and multiple testing in latent variable models Cross model validation and multiple testing in latent variable models Frank Westad GE Healthcare Oslo, Norway 2nd European User Meeting on Multivariate Analysis Como, June 22, 2006 Outline Introduction

More information

Lesson 11. Functional Genomics I: Microarray Analysis

Lesson 11. Functional Genomics I: Microarray Analysis Lesson 11 Functional Genomics I: Microarray Analysis Transcription of DNA and translation of RNA vary with biological conditions 3 kinds of microarray platforms Spotted Array - 2 color - Pat Brown (Stanford)

More information

Step-down FDR Procedures for Large Numbers of Hypotheses

Step-down FDR Procedures for Large Numbers of Hypotheses Step-down FDR Procedures for Large Numbers of Hypotheses Paul N. Somerville University of Central Florida Abstract. Somerville (2004b) developed FDR step-down procedures which were particularly appropriate

More information

Expression arrays, normalization, and error models

Expression arrays, normalization, and error models 1 Epression arrays, normalization, and error models There are a number of different array technologies available for measuring mrna transcript levels in cell populations, from spotted cdna arrays to in

More information

Quick Calculation for Sample Size while Controlling False Discovery Rate with Application to Microarray Analysis

Quick Calculation for Sample Size while Controlling False Discovery Rate with Application to Microarray Analysis Statistics Preprints Statistics 11-2006 Quick Calculation for Sample Size while Controlling False Discovery Rate with Application to Microarray Analysis Peng Liu Iowa State University, pliu@iastate.edu

More information

Resampling and the Bootstrap

Resampling and the Bootstrap Resampling and the Bootstrap Axel Benner Biostatistics, German Cancer Research Center INF 280, D-69120 Heidelberg benner@dkfz.de Resampling and the Bootstrap 2 Topics Estimation and Statistical Testing

More information

Topic 20: Single Factor Analysis of Variance

Topic 20: Single Factor Analysis of Variance Topic 20: Single Factor Analysis of Variance Outline Single factor Analysis of Variance One set of treatments Cell means model Factor effects model Link to linear regression using indicator explanatory

More information

Visual interpretation with normal approximation

Visual interpretation with normal approximation Visual interpretation with normal approximation H 0 is true: H 1 is true: p =0.06 25 33 Reject H 0 α =0.05 (Type I error rate) Fail to reject H 0 β =0.6468 (Type II error rate) 30 Accept H 1 Visual interpretation

More information

Lecture 10: Experiments with Random Effects

Lecture 10: Experiments with Random Effects Lecture 10: Experiments with Random Effects Montgomery, Chapter 13 1 Lecture 10 Page 1 Example 1 A textile company weaves a fabric on a large number of looms. It would like the looms to be homogeneous

More information

The One-Way Repeated-Measures ANOVA. (For Within-Subjects Designs)

The One-Way Repeated-Measures ANOVA. (For Within-Subjects Designs) The One-Way Repeated-Measures ANOVA (For Within-Subjects Designs) Logic of the Repeated-Measures ANOVA The repeated-measures ANOVA extends the analysis of variance to research situations using repeated-measures

More information

Chapter Seven: Multi-Sample Methods 1/52

Chapter Seven: Multi-Sample Methods 1/52 Chapter Seven: Multi-Sample Methods 1/52 7.1 Introduction 2/52 Introduction The independent samples t test and the independent samples Z test for a difference between proportions are designed to analyze

More information

Androgen-independent prostate cancer

Androgen-independent prostate cancer The following tutorial walks through the identification of biological themes in a microarray dataset examining androgen-independent. Visit the GeneSifter Data Center (www.genesifter.net/web/datacenter.html)

More information

EXST Regression Techniques Page 1. We can also test the hypothesis H :" œ 0 versus H :"

EXST Regression Techniques Page 1. We can also test the hypothesis H : œ 0 versus H : EXST704 - Regression Techniques Page 1 Using F tests instead of t-tests We can also test the hypothesis H :" œ 0 versus H :" Á 0 with an F test.! " " " F œ MSRegression MSError This test is mathematically

More information

y ˆ i = ˆ " T u i ( i th fitted value or i th fit)

y ˆ i = ˆ  T u i ( i th fitted value or i th fit) 1 2 INFERENCE FOR MULTIPLE LINEAR REGRESSION Recall Terminology: p predictors x 1, x 2,, x p Some might be indicator variables for categorical variables) k-1 non-constant terms u 1, u 2,, u k-1 Each u

More information

Improved Statistical Tests for Differential Gene Expression by Shrinking Variance Components Estimates

Improved Statistical Tests for Differential Gene Expression by Shrinking Variance Components Estimates Improved Statistical Tests for Differential Gene Expression by Shrinking Variance Components Estimates September 4, 2003 Xiangqin Cui, J. T. Gene Hwang, Jing Qiu, Natalie J. Blades, and Gary A. Churchill

More information

False discovery rate procedures for high-dimensional data Kim, K.I.

False discovery rate procedures for high-dimensional data Kim, K.I. False discovery rate procedures for high-dimensional data Kim, K.I. DOI: 10.6100/IR637929 Published: 01/01/2008 Document Version Publisher s PDF, also known as Version of Record (includes final page, issue

More information

Controlling the False Discovery Rate: Understanding and Extending the Benjamini-Hochberg Method

Controlling the False Discovery Rate: Understanding and Extending the Benjamini-Hochberg Method Controlling the False Discovery Rate: Understanding and Extending the Benjamini-Hochberg Method Christopher R. Genovese Department of Statistics Carnegie Mellon University joint work with Larry Wasserman

More information

Performance Evaluation and Comparison

Performance Evaluation and Comparison Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Cross Validation and Resampling 3 Interval Estimation

More information

Effects of dependence in high-dimensional multiple testing problems. Kyung In Kim and Mark van de Wiel

Effects of dependence in high-dimensional multiple testing problems. Kyung In Kim and Mark van de Wiel Effects of dependence in high-dimensional multiple testing problems Kyung In Kim and Mark van de Wiel Department of Mathematics, Vrije Universiteit Amsterdam. Contents 1. High-dimensional multiple testing

More information

Table 1: Fish Biomass data set on 26 streams

Table 1: Fish Biomass data set on 26 streams Math 221: Multiple Regression S. K. Hyde Chapter 27 (Moore, 5th Ed.) The following data set contains observations on the fish biomass of 26 streams. The potential regressors from which we wish to explain

More information

Two-Color Microarray Experimental Design Notation. Simple Examples of Analysis for a Single Gene. Microarray Experimental Design Notation

Two-Color Microarray Experimental Design Notation. Simple Examples of Analysis for a Single Gene. Microarray Experimental Design Notation Simple Examples of Analysis for a Single Gene wo-olor Microarray Experimental Design Notation /3/0 opyright 0 Dan Nettleton Microarray Experimental Design Notation Microarray Experimental Design Notation

More information

FDR-CONTROLLING STEPWISE PROCEDURES AND THEIR FALSE NEGATIVES RATES

FDR-CONTROLLING STEPWISE PROCEDURES AND THEIR FALSE NEGATIVES RATES FDR-CONTROLLING STEPWISE PROCEDURES AND THEIR FALSE NEGATIVES RATES Sanat K. Sarkar a a Department of Statistics, Temple University, Speakman Hall (006-00), Philadelphia, PA 19122, USA Abstract The concept

More information

Analysis of variance, multivariate (MANOVA)

Analysis of variance, multivariate (MANOVA) Analysis of variance, multivariate (MANOVA) Abstract: A designed experiment is set up in which the system studied is under the control of an investigator. The individuals, the treatments, the variables

More information

Topics on statistical design and analysis. of cdna microarray experiment

Topics on statistical design and analysis. of cdna microarray experiment Topics on statistical design and analysis of cdna microarray experiment Ximin Zhu A Dissertation Submitted to the University of Glasgow for the degree of Doctor of Philosophy Department of Statistics May

More information

Chap The McGraw-Hill Companies, Inc. All rights reserved.

Chap The McGraw-Hill Companies, Inc. All rights reserved. 11 pter11 Chap Analysis of Variance Overview of ANOVA Multiple Comparisons Tests for Homogeneity of Variances Two-Factor ANOVA Without Replication General Linear Model Experimental Design: An Overview

More information

arxiv: v1 [math.st] 31 Mar 2009

arxiv: v1 [math.st] 31 Mar 2009 The Annals of Statistics 2009, Vol. 37, No. 2, 619 629 DOI: 10.1214/07-AOS586 c Institute of Mathematical Statistics, 2009 arxiv:0903.5373v1 [math.st] 31 Mar 2009 AN ADAPTIVE STEP-DOWN PROCEDURE WITH PROVEN

More information

On Procedures Controlling the FDR for Testing Hierarchically Ordered Hypotheses

On Procedures Controlling the FDR for Testing Hierarchically Ordered Hypotheses On Procedures Controlling the FDR for Testing Hierarchically Ordered Hypotheses Gavin Lynch Catchpoint Systems, Inc., 228 Park Ave S 28080 New York, NY 10003, U.S.A. Wenge Guo Department of Mathematical

More information

Specific Differences. Lukas Meier, Seminar für Statistik

Specific Differences. Lukas Meier, Seminar für Statistik Specific Differences Lukas Meier, Seminar für Statistik Problem with Global F-test Problem: Global F-test (aka omnibus F-test) is very unspecific. Typically: Want a more precise answer (or have a more

More information

DEGseq: an R package for identifying differentially expressed genes from RNA-seq data

DEGseq: an R package for identifying differentially expressed genes from RNA-seq data DEGseq: an R package for identifying differentially expressed genes from RNA-seq data Likun Wang Zhixing Feng i Wang iaowo Wang * and uegong Zhang * MOE Key Laboratory of Bioinformatics and Bioinformatics

More information

Lecture 3: Inference in SLR

Lecture 3: Inference in SLR Lecture 3: Inference in SLR STAT 51 Spring 011 Background Reading KNNL:.1.6 3-1 Topic Overview This topic will cover: Review of hypothesis testing Inference about 1 Inference about 0 Confidence Intervals

More information

Analysis of Variance

Analysis of Variance Analysis of Variance Blood coagulation time T avg A 62 60 63 59 61 B 63 67 71 64 65 66 66 C 68 66 71 67 68 68 68 D 56 62 60 61 63 64 63 59 61 64 Blood coagulation time A B C D Combined 56 57 58 59 60 61

More information

BIOINFORMATICS ORIGINAL PAPER

BIOINFORMATICS ORIGINAL PAPER BIOINFORMATICS ORIGINAL PAPER Vol 21 no 11 2005, pages 2684 2690 doi:101093/bioinformatics/bti407 Gene expression A practical false discovery rate approach to identifying patterns of differential expression

More information

Topic 28: Unequal Replication in Two-Way ANOVA

Topic 28: Unequal Replication in Two-Way ANOVA Topic 28: Unequal Replication in Two-Way ANOVA Outline Two-way ANOVA with unequal numbers of observations in the cells Data and model Regression approach Parameter estimates Previous analyses with constant

More information

Introduction to Crossover Trials

Introduction to Crossover Trials Introduction to Crossover Trials Stat 6500 Tutorial Project Isaac Blackhurst A crossover trial is a type of randomized control trial. It has advantages over other designed experiments because, under certain

More information

Lec 1: An Introduction to ANOVA

Lec 1: An Introduction to ANOVA Ying Li Stockholm University October 31, 2011 Three end-aisle displays Which is the best? Design of the Experiment Identify the stores of the similar size and type. The displays are randomly assigned to

More information

Statistical Modeling and Analysis of Scientific Inquiry: The Basics of Hypothesis Testing

Statistical Modeling and Analysis of Scientific Inquiry: The Basics of Hypothesis Testing Statistical Modeling and Analysis of Scientific Inquiry: The Basics of Hypothesis Testing So, What is Statistics? Theory and techniques for learning from data How to collect How to analyze How to interpret

More information

Aliaksandr Hubin University of Oslo Aliaksandr Hubin (UIO) Bayesian FDR / 25

Aliaksandr Hubin University of Oslo Aliaksandr Hubin (UIO) Bayesian FDR / 25 Presentation of The Paper: The Positive False Discovery Rate: A Bayesian Interpretation and the q-value, J.D. Storey, The Annals of Statistics, Vol. 31 No.6 (Dec. 2003), pp 2013-2035 Aliaksandr Hubin University

More information

Reference: Chapter 13 of Montgomery (8e)

Reference: Chapter 13 of Montgomery (8e) Reference: Chapter 1 of Montgomery (8e) Maghsoodloo 89 Factorial Experiments with Random Factors So far emphasis has been placed on factorial experiments where all factors are at a, b, c,... fixed levels

More information

Statistical Applications in Genetics and Molecular Biology

Statistical Applications in Genetics and Molecular Biology Statistical Applications in Genetics and Molecular Biology Volume 3, Issue 1 2004 Article 13 Multiple Testing. Part I. Single-Step Procedures for Control of General Type I Error Rates Sandrine Dudoit Mark

More information

SPOTTED cdna MICROARRAYS

SPOTTED cdna MICROARRAYS SPOTTED cdna MICROARRAYS Spot size: 50um - 150um SPOTTED cdna MICROARRAYS Compare the genetic expression in two samples of cells PRINT cdna from one gene on each spot SAMPLES cdna labelled red/green e.g.

More information

FDR and ROC: Similarities, Assumptions, and Decisions

FDR and ROC: Similarities, Assumptions, and Decisions EDITORIALS 8 FDR and ROC: Similarities, Assumptions, and Decisions. Why FDR and ROC? It is a privilege to have been asked to introduce this collection of papers appearing in Statistica Sinica. The papers

More information

STAT 461/561- Assignments, Year 2015

STAT 461/561- Assignments, Year 2015 STAT 461/561- Assignments, Year 2015 This is the second set of assignment problems. When you hand in any problem, include the problem itself and its number. pdf are welcome. If so, use large fonts and

More information

Resampling and the Bootstrap

Resampling and the Bootstrap Resampling and the Bootstrap Axel Benner Biostatistics, German Cancer Research Center INF 280, D-69120 Heidelberg benner@dkfz.de Resampling and the Bootstrap 2 Topics Estimation and Statistical Testing

More information

Permutation Tests. Noa Haas Statistics M.Sc. Seminar, Spring 2017 Bootstrap and Resampling Methods

Permutation Tests. Noa Haas Statistics M.Sc. Seminar, Spring 2017 Bootstrap and Resampling Methods Permutation Tests Noa Haas Statistics M.Sc. Seminar, Spring 2017 Bootstrap and Resampling Methods The Two-Sample Problem We observe two independent random samples: F z = z 1, z 2,, z n independently of

More information

IEOR165 Discussion Week 12

IEOR165 Discussion Week 12 IEOR165 Discussion Week 12 Sheng Liu University of California, Berkeley Apr 15, 2016 Outline 1 Type I errors & Type II errors 2 Multiple Testing 3 ANOVA IEOR165 Discussion Sheng Liu 2 Type I errors & Type

More information

ANALYSIS OF VARIANCE OF BALANCED DAIRY SCIENCE DATA USING SAS

ANALYSIS OF VARIANCE OF BALANCED DAIRY SCIENCE DATA USING SAS ANALYSIS OF VARIANCE OF BALANCED DAIRY SCIENCE DATA USING SAS Ravinder Malhotra and Vipul Sharma National Dairy Research Institute, Karnal-132001 The most common use of statistics in dairy science is testing

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2004 Paper 147 Multiple Testing Methods For ChIP-Chip High Density Oligonucleotide Array Data Sunduz

More information

Lecture 21: October 19

Lecture 21: October 19 36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 21: October 19 21.1 Likelihood Ratio Test (LRT) To test composite versus composite hypotheses the general method is to use

More information

Example 1: Two-Treatment CRD

Example 1: Two-Treatment CRD Introduction to Mixed Linear Models in Microarray Experiments //0 Copyright 0 Dan Nettleton Statistical Models A statistical model describes a formal mathematical data generation mechanism from which an

More information

Hunting for significance with multiple testing

Hunting for significance with multiple testing Hunting for significance with multiple testing Etienne Roquain 1 1 Laboratory LPMA, Université Pierre et Marie Curie (Paris 6), France Séminaire MODAL X, 19 mai 216 Etienne Roquain Hunting for significance

More information

Analysis of Variance

Analysis of Variance Statistical Techniques II EXST7015 Analysis of Variance 15a_ANOVA_Introduction 1 Design The simplest model for Analysis of Variance (ANOVA) is the CRD, the Completely Randomized Design This model is also

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Gene Expression an Overview of Problems & Solutions: 3&4. Utah State University Bioinformatics: Problems and Solutions Summer 2006

Gene Expression an Overview of Problems & Solutions: 3&4. Utah State University Bioinformatics: Problems and Solutions Summer 2006 Gene Expression an Overview of Problems & Solutions: 3&4 Utah State University Bioinformatics: Problems and Solutions Summer 006 Review Considering several problems & solutions with gene expression data

More information

Introduction to the Analysis of Variance (ANOVA) Computing One-Way Independent Measures (Between Subjects) ANOVAs

Introduction to the Analysis of Variance (ANOVA) Computing One-Way Independent Measures (Between Subjects) ANOVAs Introduction to the Analysis of Variance (ANOVA) Computing One-Way Independent Measures (Between Subjects) ANOVAs The Analysis of Variance (ANOVA) The analysis of variance (ANOVA) is a statistical technique

More information

Peak Detection for Images

Peak Detection for Images Peak Detection for Images Armin Schwartzman Division of Biostatistics, UC San Diego June 016 Overview How can we improve detection power? Use a less conservative error criterion Take advantage of prior

More information

GS Analysis of Microarray Data

GS Analysis of Microarray Data GS01 0163 Analysis of Microarray Data Keith Baggerly and Kevin Coombes Section of Bioinformatics Department of Biostatistics and Applied Mathematics UT M. D. Anderson Cancer Center kabagg@mdanderson.org

More information

Topic 17 - Single Factor Analysis of Variance. Outline. One-way ANOVA. The Data / Notation. One way ANOVA Cell means model Factor effects model

Topic 17 - Single Factor Analysis of Variance. Outline. One-way ANOVA. The Data / Notation. One way ANOVA Cell means model Factor effects model Topic 17 - Single Factor Analysis of Variance - Fall 2013 One way ANOVA Cell means model Factor effects model Outline Topic 17 2 One-way ANOVA Response variable Y is continuous Explanatory variable is

More information