Design of Microarray Experiments. Xiangqin Cui

Similar documents
Design and Analysis of Gene Expression Experiments

Chapter 3: Statistical methods for estimation and testing. Key reference: Statistical methods in bioinformatics by Ewens & Grant (2001).

Topics on statistical design and analysis. of cdna microarray experiment

Experimental Design. Experimental design. Outline. Choice of platform Array design. Target samples

DIRECT VERSUS INDIRECT DESIGNS FOR edna MICROARRAY EXPERIMENTS

changes in gene expression, we developed and tested several models. Each model was

Optimal design of microarray experiments

Design of microarray experiments

Example 1: Two-Treatment CRD

GS Analysis of Microarray Data

GS Analysis of Microarray Data

Improving the identification of differentially expressed genes in cdna microarray experiments

SPOTTED cdna MICROARRAYS

DEGseq: an R package for identifying differentially expressed genes from RNA-seq data

Inferential Statistical Analysis of Microarray Experiments 2007 Arizona Microarray Workshop

Sample Size and Power Calculation in Microarray Studies Using the sizepower package.

Improved Statistical Tests for Differential Gene Expression by Shrinking Variance Components Estimates

Sample Size Estimation for Studies of High-Dimensional Data

Biochip informatics-(i)

Quick Calculation for Sample Size while Controlling False Discovery Rate with Application to Microarray Analysis

Expression arrays, normalization, and error models

Bioconductor Project Working Papers

Statistical Applications in Genetics and Molecular Biology

Normalization. Example of Replicate Data. Biostatistics Rafael A. Irizarry

Abstract. comment reviews reports deposited research refereed research interactions information

Lesson 11. Functional Genomics I: Microarray Analysis

Statistical Applications in Genetics and Molecular Biology

Design of microarray experiments

Two-Color Microarray Experimental Design Notation. Simple Examples of Analysis for a Single Gene. Microarray Experimental Design Notation

Mixture models for analysing transcriptome and ChIP-chip data

Linear Models and Empirical Bayes Methods for. Assessing Differential Expression in Microarray Experiments

GS Analysis of Microarray Data

Chapter 10. Design of Experiments and Analysis of Variance

Review Article Statistical Analysis of Efficient Unbalanced Factorial Designs for Two-Color Microarray Experiments

GS Analysis of Microarray Data

Single gene analysis of differential expression. Giorgio Valentini

ABSSeq: a new RNA-Seq analysis method based on modelling absolute expression differences

Single gene analysis of differential expression

Genomic Medicine HT 512. Data representation, transformation & modeling in genomics

OPTIMAL DESIGN OF EXPERIMENTS FOR EMERGING BIOLOGICAL AND COMPUTATIONAL APPLICATIONS

Chapter 4: Randomized Blocks and Latin Squares

Technologie w skali genomowej 2/ Algorytmiczne i statystyczne aspekty sekwencjonowania DNA

Package dama. November 25, 2017

Data Preprocessing. Data Preprocessing

Investigation 3: Comparing DNA Sequences to Understand Evolutionary Relationships with BLAST

microrna pseudo-targets

Introduction to Bioinformatics. Shifra Ben-Dor Irit Orr

A variance-stabilizing transformation for gene-expression microarray data

Use of Agilent Feature Extraction Software (v8.1) QC Report to Evaluate Microarray Performance

Clustering & microarray technology

The gpca Package for Identifying Batch Effects in High-Throughput Genomic Data

PROGRESS in plant and animal breeding is often

Low-Level Analysis of High- Density Oligonucleotide Microarray Data

Network Biology-part II

GS Analysis of Microarray Data

Bioinformatics. Transcriptome

GS Analysis of Microarray Data

Biology EOC Review Study Questions

Exploratory statistical analysis of multi-species time course gene expression

Chapter 1 Statistical Inference

Systematic Variation in Genetic Microarray Data

Estimation of Transformations for Microarray Data Using Maximum Likelihood and Related Methods

cdna Microarray Analysis

Protocol S1. Replicate Evolution Experiment

Mixtures and Hidden Markov Models for analyzing genomic data

Supplemental Data. Perea-Resa et al. Plant Cell. (2012) /tpc

High-Throughput Sequencing Course. Introduction. Introduction. Multiple Testing. Biostatistics and Bioinformatics. Summer 2018

Multiple QTL mapping

Mean Comparisons PLANNED F TESTS

Error analysis in biology

GS Analysis of Microarray Data

GS Analysis of Microarray Data

Full versus incomplete cross-validation: measuring the impact of imperfect separation between training and test sets in prediction error estimation

Topic 9: Factorial treatment structures. Introduction. Terminology. Example of a 2x2 factorial

Bioinformatics 2. Yeast two hybrid. Proteomics. Proteomics

Genome Assembly. Sequencing Output. High Throughput Sequencing

Proteomics. Yeast two hybrid. Proteomics - PAGE techniques. Data obtained. What is it?

Blocks are formed by grouping EUs in what way? How are experimental units randomized to treatments?

Tools and topics for microarray analysis

Experimental Design, Data, and Data Summary

Implementing contrasts using SAS Proc GLM is a relatively straightforward process. A SAS Proc GLM contrast statement has the following form:

g A n(a, g) n(a, ḡ) = n(a) n(a, g) n(a) B n(b, g) n(a, ḡ) = n(b) n(b, g) n(b) g A,B A, B 2 RNA-seq (D) RNA mrna [3] RNA 2. 2 NGS 2 A, B NGS n(

Introduction to clustering methods for gene expression data analysis

Statistics for Differential Expression in Sequencing Studies. Naomi Altman

FORESTRY 601 RESEARCH CONCEPTS FALL 2009

robustness: revisting the significance of mirna-mediated regulation

Predicting Protein Functions and Domain Interactions from Protein Interactions

Smart pooling for interactome mapping

A calibration method for estimating absolute expression levels from microarray data

Balanced Treatment-Control Row-Column Designs

Stat 890 Design of computer experiments

GS Analysis of Microarray Data

CONJOINT 541. Translating a Transcriptome at Specific Times and Places. David Morris. Department of Biochemistry

MEMORIAL UNIVERSITY OF NEWFOUNDLAND DEPARTMENT OF MATHEMATICS AND STATISTICS FINAL EXAM - STATISTICS FALL 1999

BIOL Biometry LAB 6 - SINGLE FACTOR ANOVA and MULTIPLE COMPARISON PROCEDURES

Multiplicative background correction for spotted. microarrays to improve reproducibility

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008

BIOINFORMATICS LAB AP BIOLOGY

Hidden Markov Models and some applications

Transcription:

Design of Microarray Experiments Xiangqin Cui

Experimental design Experimental design: is a term used about efficient methods for planning the collection of data, in order to obtain the maximum amount of information for the least amount of work. Anyone collecting and analyzing data, be it in the lab, the field or the production plant, can benefit from knowledge about experimental design. http://www.stat.sdu.dk/matstat/design/index.html

Experimental Design Principles Randomization Replication Blocking Use of factorial experiments instead of the one-factor-at-a-time methods. Orthogonality

Randomization The experimental treatments are assigned to the experimental units (subjects) in a random fashion. It helps to eliminate effect of "lurking variables", uncontrolled factors which might vary over the length of the experiment.

Commonly Used Randomization Methods Number the subjects to be randomized and then randomly draw the numbers using paper pieces in a hat or computer random number generator. 1 2 3 4 5 6 Hormone treatment : (1,3,4); (1,2,6) Control : (2,5,6); (3,4,5) Plan 1 plan 2

Randomization in Microarray Experiments Randomize arrays in respect to samples Randomize samples in respect to treatments Randomize the order of performing the labeling, hybridizations, scanning

Replication Replication is repeating the creation of a phenomenon, so that the variability associated with the phenomenon can be estimated. Replications should not be confused with repeated measurements which refer to literally taking several measurements of a single occurrence of a phenomenon.

Replication in microarray experiments What to replicate? Biological replicates (replicates at the experimental unit level, e.g. mouse, plant, pot of plants ) Experimental unit is the unit that the experiment treatment or condition is directly applied to, e.g. a plant if hormone is sprayed to individual plants; a pot of seedlings if different fertilizers are applied to different pots. Technical replicates Any replicates below the experimental unit, e.g. different leaves from the same plant sprayed with one hormone level; different seedlings from the same pot; Different aliquots of the same RNA extraction; multiple arrays hybridized to the same RNA; multiple spots on the same array.

How Many Replicates Power: the probability for a real positive to be identified It depends on: 1) The size of difference (log ratio) to be detected The bigger the difference, the higher the power. 2) The sample size The larger the sample size, the higher the power. 3) The error variance of the treatment The bigger the error variance, the lower the power.

How Many Replicates * Minimum number of replication (r) per group: 3 5 (two groups) (reliable variance estimates, permutation test, df error, etc.) Factorial 2 x 2 A1 B1 r B2 r A2 r r S.V. A B A*B Error (r=1) 1 1 1 0 (r=2) 1 1 1 4 (r=3) 1 1 1 8 Total 3 7 11

* Sample size calculation for experiments with two groups REFERENCE: n = # arrays (total) 4 ( z + z ) (1 α / 2) ( δ / σ) log(fc) (1 β) 2 2 SD BALANCED BLOCK: n = ( z + z ) (1 α / 2) ( δ / τ) (1 β) 2 2 (in general, τ > σ) Example (Dobbin and Simon, 2003): δ=1, α=0.001, β=0.05: σ 0.50 n = 30 (i.e. 15A + 15B + 30R) τ 0.67 n = 17 (i.e. 17A + 17B)

Error variance (EV) of the estimated treatment effect: TrtA TrtB Reference Design M1 M2 M3 M4 EV = σ m 2 M 2 σ e + mn R R R R Error variance of the estimated treatment effect: Approximately EV = σ e 2 2mn( σ 2 2 σ M σ + e m mn 4 A + σ m : number of mice per treatment n : number of array pairs per mouse 2 e )

Resource allocation: 2 2 σ M σ e EV = + m mn Cost = mc + m M nc A m n C M C A mice / trt array pairs / mouse cost / mouse cost / array pair The optimum number of array pairs per mouse: n σ 2 e = 2 σ M C C M A

Examples for resource allocation Using variance components estimated from kidney in Project Normal data. r = 1 ( no replicated spots on array ) Reference design Mouse price Array price/pair # of array pairs per mouse $15 $600 1 $300 $600 1 $1500 $600 2 More efficient array level designs, such as direct comparisons and loop designs, can reduce the optimum number of arrays per mouse.

Pooling Samples Pooling can reduce the biological variance but not the technical variances. The biological variance will be replaced by: σ 2 1 pool = α k : pool size k σ 2 M α : constant for the effect of pooling. 0 < α < 1 α = 0, pooling has no effect. α = 1, pooling has maximum effect. Note: More pools with fewer arrays per pool is better than fewer pools with more arrays per pools when fixed number of arrays and mice are to be used.

Power Increase to Detect 2 fold change by Pooling ( Pool size k = 3, α = 1 ) 2 pools / trt 4 pools / trt 6 pools / trt 8 pools / trt 2 mice / trt 4 mice / trt 6 mice / trt 8 mice / trt Significance level: 0.05 after Bonferroni correction

PowerAtlas It is an online tool to calculate sample size based on similar data set in GEO or user provided pilot data. http://www.poweratlas.org/ Page et al., BMC Bioinformatics. 2006 Feb 22;7:84. Gadbury GL, et al. Stat Meth Med Res 13: 325-338, 2004.

EDR is the proportion of genes that are truly differentially expressed that will be called significant at the significance level of 0.05. This is analogous to average power. TP is the proportion of genes called 'significant' that are truly differentially expressed. TN is the proportion of genes called 'not significant' that are truly not differentially expressed between the conditions.

Blocking Some identified uninteresting but varying factors can be controlled through blocking. Completely randomized design Complete block design Incompletely block designs

Completely Randomized Design There is no blocking Example Compare two hormone treatments (trt and control) using 6 Arabidopsis plants. 1 2 3 4 5 6 Hormone trt: (1,3,4); (1,2,6) Control : (2,5,6); (3,4,5) Note: Design with one-color microarrays is often completely randomized design.

Complete Block Design There is blocking and the block size is equal to the number of treatments. Example: Compare two hormone treatments (trt and control) using 6 Arabidopsis plants. For some reason plant 1 and 2 are taller, plant 5 and 6 are thinner. 1 2 3 4 5 6 Hormone treatment: (1,4,5); (1,3,6) Control : (2,3,6); (2,4,5) Randomization within blocks

Incomplete Block Design There is blocking and the block size is smaller than the number of treatments. Example: Compare three hormone treatments (hormone level 1, hormone level 2, and control) using 6 Arabidopsis plants. For some reason plant 1 and 2 are taller, plant 5 and 6 are thinner. 1 2 3 4 5 6 Hormone level1: (1,4); (2,4) Hormone level2: (2,5); (1,6) Control : (3,6); (3.5) Randomization within blocks

Designs for two-color microarrays Experimental design for one color microarray array is straight forward with out the required blocking at the array level. For two color microarrays, we need to pair the samples and label then with Cy3 and Cy5 for hybridization to one array. How do we pair?

Blocking in microarray experiments In two-color platform, the arrays vary due to the loading of DNA quantity, spot morphology, hybridization condition, scanning setting It is treated as a block of size 2, the samples are compared within each array. If there are two lots of chips in the experiment and there is large variation across chip lots. We can treat chip lots as a blocking factor. Example: compare three samples: A, B, C A B B C C A Block (array) 1 Block (array) 2 Block (array) 3

A1 Cy5 Cy3 B1

Dye-swap A1 B1 Dye swapping Separates the dye effect from the sample effect. Results are less sensitive to the data transformation processes Kerr and Churchill(2001) Biostatistics, 2:183-201.

Replicated Dye-Swap A1 B1 A2 B2

Reference design All samples are compared to a single reference sample. Single reference Double reference A B C A B C R R

Example Microarray Experiment comparing 3 treatments T1 T2 T3 M1 M2 M3 M4 M5 M6 M7 M8 M9 M10 M11 M12 M13 M14 M15 R R R R R R R R R R R R R R R Note: Arrows are used to represent a two-color microarray. The arrow head represents the red channel and the tail represents the green channel. T1 to T3, the three treatments; M1 to M15, the 15 plants; R, reference sample.

Loop Design A D B C

Basic types of loop designs Single loop Double loop A B A B C C

Complex loop designs Two-factor factorial design at the high level Pera I DBA High fat A B C Low fat E F D A1 B1 A2 B2 C1 F1 C2 F2 D1 E1 D2 E2

Reference or Loop Reference design: It has lower overall efficiency, but it has equal efficiency for all sample comparisons, and is easy to extend. Loop design: higher overall efficiency, harder to extend, unequal efficiency for all comparisons,

Comparing the efficiency of reference versus loop designs Yang and Speed (2002) Nat. Rev. Genet 3:579

Alternative Designs Loop: A 1 B 1 Connected Loop A B C 1 A A 1 B 2 2 B 1 C Regular Loop C 1 C 2 Reference: A A B B Classical Reference A 1 A 2 B 1 B 2 A 1 A 2 B 1 B 2 R R R R R 1 R 2 R 3 R 4 R 1 R 1 R 1 R 1 Common Reference

Experiments With Two Groups A 1 A 2 B 1 B 2 R R R R Reference with dye-swap A 1 A 2 A 3 B 1 B 2 B 3 A 4 B 4 Reference with R R R R R R R R alternating dye Is alternating dye really necessary?

Experiments with Two Groups (Direct comparison) A 1 A 2 B 1 B 2 A 3 A 4 B 3 B 4 Complete block with dye-swap Complete block design; Row-column design A 1 A 2 A 3 A 4 A 5 A 6 A 7 A 8 B 1 B 2 B 3 B 4 B 5 B 6 B 7 B 8

Experiments with Three Groups A 1 A 2 B 1 B 2 C 1 R R R R R C 2 R Reference with dye-swap Reference with alternating dye A 1 A 2 A 3 A 4 B 1 B 2 B 3 B 4 C 1 C 2 C 3 C 4 R R R R R R R R R R R R

Experiments With Three Groups Multiple connected loops A 1 B 1 A 2 B 2 A 3 B 3 A 4 B 4 C 1 C 2 C 3 C 4 Multiple regular loops = Incomplete block structure A A 1 B 2 2 B A 1 1 B 1 = C 1 C 1 C 2 B 2 C 2 A 2

Pooling mice Pooling can reduce the mouse variance but not the technical variances. The mouse variance will be replaced by: σ 2 1 pool = α k : pool size k σ 2 M α : constant for the effect of pooling. 0 < α < 1 α = 0, pooling has no effect. α = 1, pooling has maximum effect. Note: More pools with fewer arrays per pool is better than fewer pools with more arrays per pools when fixed number of arrays and mice are to be used.

Power Increase to Detect 2 fold change by Pooling ( Pool size k = 3, α = 1 ) 2 pools / trt 4 pools / trt 6 pools / trt 8 pools / trt 2 mice / trt 4 mice / trt 6 mice / trt 8 mice / trt Significance level: 0.05 after Bonferroni correction

Design Principle Continues Use of factorial experiments instead of the one-factor-at-a-time methods. Orthogonality: each level of one factor has all levels of the other factor. Example: To compare two treatments (T1, T2) and two strains (S1, S2) S1 S2 T1 T1S1 T1S2 T2 T2S1 T2S2

References Churchill GA. Fundamentals of experimental design for cdna microarrays. Nature Genet. 32: 490-495, 2002. Cui X and Churchill GA. How many mice and how many arrays? Replication in mouse cdna microarray experiments, in "Methods of Microarray Data Analysis III", Edited by KF Johnson and SM Lin. Kluwer Academic Publishers, Norwell, MA. pp 139-154, 2003. Gadbury GL, et al. Power and sample size estimation in high dimensional biology. Stat Meth Med Res 13: 325-338, 2004. Kerr MK. Design considerations for efficient and effective microarray studies. Biometrics 59: 822-828, 2003. Kerr MK and Churchill GA. Statistical design and the analysis of gene expression microarray data. Genet. Res. 77: 123-128, 2001. Kuehl RO. Design of experiments: statistical principles of research design and analysis, 2 nd ed., 1994, (Brooks/cole) Duxbry Press, Pacific Grove, CA. Page GP et al. The PowerAtlas: a power and sample size atlas for microarray experimental design and research. BMC Bioinformatics. 2006 Feb 22;7:84. Rosa GJM, et al. Reassessing design and analysis of two-colour microarray experiments using mixed effects models. Comp. Funct. Genomics 6: 123-131. 2005. Wit E, et al. Near-optimal designs for dual channel microarray studies. Appl. Statist. 54: 817-830, 2005. Yand YH and Speed T. Design issues for cdna microarray experiments. Nat. Rev. Genet. 3: 570-588, 2002.