Two-Color Microarray Experimental Design Notation. Simple Examples of Analysis for a Single Gene.



Simple Examples of Analysis for a Single Gene

Two-Color Microarray Experimental Design Notation

/3/0 Copyright 0 Dan Nettleton

Microarray Experimental Design Notation

Biological Replicates vs. Technical Replicates
(Figure panels: Biological Replication; Technical Replication; Both Biological and Technical Replication)

Example 1: Two-Treatment CRD

Assign 8 Plants to Each Treatment Completely at Random

Randomly Pair Plants Receiving Different Treatments

Randomly Assign Pairs to Slides, Balancing the Two Dye Configurations

Observed Normalized Log Signal Intensities for One Gene

$Y_{ijk}$ denotes the observed normalized log signal intensity (NLSI) for treatment $i$, dye $j$, and slide $k$.

    Slides k = 1, 2, 3, 4:   $Y_{11k}$ and $Y_{22k}$
    Slides k = 5, 6, 7, 8:   $Y_{12k}$ and $Y_{21k}$

Unknown Means Underlying the Observed Normalized Log Signal Intensities (NLSI)

    Slides k = 1, 2, 3, 4:   $\mu + \tau_1 + \delta_1$ and $\mu + \tau_2 + \delta_2$
    Slides k = 5, 6, 7, 8:   $\mu + \tau_1 + \delta_2$ and $\mu + \tau_2 + \delta_1$

Differential Expression

$\mu$ is a real-valued parameter common to all observations.
$\tau_1$ and $\tau_2$ represent the effects of treatments 1 and 2 on mean NLSI.
$\delta_1$ and $\delta_2$ represent the effects of the Cy3 and Cy5 dyes on mean NLSI.
A gene is said to be differentially expressed if $\tau_1 \neq \tau_2$.
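The three randomization steps above (assign plants to treatments, pair plants across treatments, assign pairs to slides with balanced dye configurations) can be sketched in a few lines of Python. This is only a minimal illustration: the function name, the plant labels 1 to 16, and the returned dictionary layout are assumptions made for the example, not part of the original slides.

```python
import random


def randomize_two_color_design(plant_ids, seed=None):
    """Randomize plants to treatments, pair them, and balance dyes across slides.

    Hypothetical helper: the function name and the returned dict layout are
    illustrative assumptions, not part of the original slides.
    """
    rng = random.Random(seed)
    plants = list(plant_ids)
    if len(plants) % 4 != 0:
        raise ValueError("need a multiple of 4 plants to balance both dye configurations")
    n_slides = len(plants) // 2

    # Step 1: assign half of the plants to each treatment completely at random.
    rng.shuffle(plants)
    treatment_1, treatment_2 = plants[:n_slides], plants[n_slides:]

    # Step 2: randomly pair plants receiving different treatments.
    rng.shuffle(treatment_2)
    pairs = list(zip(treatment_1, treatment_2))

    # Step 3: randomly assign pairs to slides, with half of the slides in each
    # dye configuration (treatment 1 on Cy3/green, or treatment 1 on Cy5/red).
    configs = ["trt1_green"] * (n_slides // 2) + ["trt1_red"] * (n_slides // 2)
    rng.shuffle(configs)

    design = []
    for slide, ((p1, p2), config) in enumerate(zip(pairs, configs), start=1):
        if config == "trt1_green":
            green, red = (1, p1), (2, p2)
        else:
            green, red = (2, p2), (1, p1)
        # Each channel is recorded as (treatment, plant id).
        design.append({"slide": slide, "green": green, "red": red})
    return design


if __name__ == "__main__":
    for row in randomize_two_color_design(range(1, 17), seed=2024):
        print(row)
```

With 16 plants this yields 8 slides, each holding one plant from each treatment, with four slides in each dye configuration. Slide numbering is arbitrary here, so the balanced configurations need not fall on slides 1 to 4 and 5 to 8.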

Unknown Random Effects Underlying Observed NLSI

To make our model complete, we need to say more about the random effects. We will almost always assume that random effects are independent and normally distributed with mean zero and a factor-specific variance.

Each observation $Y_{ijk}$ carries the random part $s_k + e_{ijk}$.

$s_1, s_2, s_3, s_4, s_5, s_6, s_7$, and $s_8$ represent slide effects.
The $e_{ijk}$ represent error random effects that include any sources of variation unaccounted for by other terms.

$s_1, s_2, \ldots, s_8 \overset{iid}{\sim} N(0, \sigma_s^2)$, independent of
$e_{111}, e_{112}, e_{113}, e_{114}, e_{221}, e_{222}, e_{223}, e_{224}, e_{125}, e_{126}, e_{127}, e_{128}, e_{215}, e_{216}, e_{217}, e_{218} \overset{iid}{\sim} N(0, \sigma_e^2)$
(or just $e_{ijk} \overset{iid}{\sim} N(0, \sigma_e^2)$ to save time and space).

What does $s_1, s_2, \ldots, s_8 \overset{iid}{\sim} N(0, \sigma_s^2)$ mean?

Observed NLSI are Modeled as Means Plus Random Effects

    Slides k = 1, 2, 3, 4:   $Y_{11k} = \mu + \tau_1 + \delta_1 + s_k + e_{11k}$
                             $Y_{22k} = \mu + \tau_2 + \delta_2 + s_k + e_{22k}$
    Slides k = 5, 6, 7, 8:   $Y_{12k} = \mu + \tau_1 + \delta_2 + s_k + e_{12k}$
                             $Y_{21k} = \mu + \tau_2 + \delta_1 + s_k + e_{21k}$

In general, $Y_{ijk} = \mu + \tau_i + \delta_j + s_k + e_{ijk}$.

Observed Normalized Log Signal Intensities (NLSI) for One Gene

Given data, our task is to determine whether the gene is differentially expressed and, if so, to estimate the magnitude and direction of differential expression.

Analysis of Log Red to Green Ratios

Rather than working with the normalized log signal intensities, it is often customary to consider the log of the red to green normalized signals from each slide as the basic data for analysis. This is equivalent to working with the red minus green difference in NLSI from each slide:

$\log(R/G) = \log(R) - \log(G)$
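To make the role of the slide effect concrete, the model $Y_{ijk} = \mu + \tau_i + \delta_j + s_k + e_{ijk}$ can be simulated directly. The sketch below is illustrative only: the function names, the parameter values in the usage note, and the layout (treatment 1 on Cy3 for slides 1 to 4, the reverse for slides 5 to 8) are assumptions chosen for the example.

```python
import random


def simulate_nlsi(mu, tau, delta, sigma_s, sigma_e, seed=None):
    """Simulate Y_ijk = mu + tau_i + delta_j + s_k + e_ijk for 8 slides.

    tau and delta are dicts keyed by 1 and 2. Assumed layout (illustrative):
    slides 1-4 carry treatment 1 on dye 1 (Cy3) and treatment 2 on dye 2 (Cy5);
    slides 5-8 carry the reverse dye configuration.
    """
    rng = random.Random(seed)
    data = {}
    for k in range(1, 9):
        s_k = rng.gauss(0.0, sigma_s)  # slide effect shared by both channels
        channels = [(1, 1), (2, 2)] if k <= 4 else [(1, 2), (2, 1)]
        for i, j in channels:
            e_ijk = rng.gauss(0.0, sigma_e)  # error effect, one per observation
            data[(i, j, k)] = mu + tau[i] + delta[j] + s_k + e_ijk
    return data


def red_minus_green(data, k):
    """Within-slide difference (dye 2 minus dye 1) for slide k."""
    red = next(y for (i, j, kk), y in data.items() if kk == k and j == 2)
    green = next(y for (i, j, kk), y in data.items() if kk == k and j == 1)
    return red - green
```

Calling `simulate_nlsi(10.0, {1: 0.5, 2: -0.5}, {1: 0.2, 2: -0.1}, sigma_s=3.0, sigma_e=0.0)` and then `red_minus_green(data, 1)` returns $\tau_2 - \tau_1 + \delta_2 - \delta_1$ up to floating-point error, however large `sigma_s` is: both channels on a slide share the same $s_k$, so it cancels in the within-slide difference.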

Differences for Slides with Treatment 1 Green and Treatment 2 Red

For slides $k = 1, 2, 3, 4$:

    $Y_{11k} = \mu + \tau_1 + \delta_1 + s_k + e_{11k}$
    $Y_{22k} = \mu + \tau_2 + \delta_2 + s_k + e_{22k}$
    $Y_{22k} - Y_{11k} = \tau_2 - \tau_1 + \delta_2 - \delta_1 + e_{22k} - e_{11k}$

Note that according to our original model, these differences are iid $N(\tau_2 - \tau_1 + \delta_2 - \delta_1,\ 2\sigma_e^2)$.

Differences for Slides with Treatment 1 Red and Treatment 2 Green

For slides $k = 5, 6, 7, 8$:

    $Y_{12k} = \mu + \tau_1 + \delta_2 + s_k + e_{12k}$
    $Y_{21k} = \mu + \tau_2 + \delta_1 + s_k + e_{21k}$
    $Y_{12k} - Y_{21k} = \tau_1 - \tau_2 + \delta_2 - \delta_1 + e_{12k} - e_{21k}$

Note that according to our original model, these differences are iid $N(\tau_1 - \tau_2 + \delta_2 - \delta_1,\ 2\sigma_e^2)$.

If we let $d_k$ denote the difference from slide $k$, we have

    $d_1, d_2, d_3, d_4 \overset{iid}{\sim} N(\tau_2 - \tau_1 + \delta_2 - \delta_1,\ 2\sigma_e^2)$
    independent of
    $d_5, d_6, d_7, d_8 \overset{iid}{\sim} N(\tau_1 - \tau_2 + \delta_2 - \delta_1,\ 2\sigma_e^2)$.

A standard two-sample t-test can be used to test

    $H_0: \tau_2 - \tau_1 + \delta_2 - \delta_1 = \tau_1 - \tau_2 + \delta_2 - \delta_1$,

which is equivalent to $H_0: \tau_1 = \tau_2$ (null hypothesis of no differential expression).

Estimation of the Direction and Magnitude of Differential Expression

An unbiased estimator of $\tau_1 - \tau_2$ is given by

    $\{\,\mathrm{mean}(d_5, d_6, d_7, d_8) - \mathrm{mean}(d_1, d_2, d_3, d_4)\,\} / 2$.

Because $\tau_1 - \tau_2$ is a difference in treatment effects for a measure of log expression level, $\exp(\tau_1 - \tau_2)$ can be interpreted as a ratio of expression levels on the original scale.

    $\exp[\,\{\,\mathrm{mean}(d_5, d_6, d_7, d_8) - \mathrm{mean}(d_1, d_2, d_3, d_4)\,\} / 2\,]$

can be reported as an estimate of the fold change in the expression level for treatment 1 relative to treatment 2.

Observed Normalized Log Signal Intensities (NLSI) for One Gene

P-value for testing $\tau_1 = \tau_2$ is < ...
Estimated fold change = ...
...% confidence interval for fold change: 3.3 to ...
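The test and the fold-change estimate described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and the eight differences are made up for the example (they are not the data behind the slides), the logs are assumed to be natural logs because the slides exponentiate with exp, and the value 2.447 is the 0.975 quantile of a t distribution with 6 degrees of freedom, supplied by hand rather than computed.

```python
import math
import statistics


def analyze_differences(d_first, d_second, t_quantile):
    """Two-sample analysis of within-slide red-minus-green differences.

    d_first: differences from slides with treatment 1 green, treatment 2 red.
    d_second: differences from slides with treatment 1 red, treatment 2 green.
    t_quantile: upper quantile of the t distribution with
        len(d_first) + len(d_second) - 2 degrees of freedom.
    """
    n1, n2 = len(d_first), len(d_second)
    diff_of_means = statistics.mean(d_second) - statistics.mean(d_first)
    pooled_var = ((n1 - 1) * statistics.variance(d_first)
                  + (n2 - 1) * statistics.variance(d_second)) / (n1 + n2 - 2)
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

    estimate = diff_of_means / 2        # estimates tau_1 - tau_2
    se_estimate = se_diff / 2
    half_width = t_quantile * se_estimate
    return {
        "t": diff_of_means / se_diff,   # test statistic for H0: tau_1 = tau_2
        "df": n1 + n2 - 2,
        "estimate": estimate,
        "fold_change": math.exp(estimate),
        "fold_change_ci": (math.exp(estimate - half_width),
                           math.exp(estimate + half_width)),
    }


# Made-up differences for illustration only (not data from the slides).
d_slides_1_to_4 = [-1.2, -0.8, -1.0, -1.4]
d_slides_5_to_8 = [1.1, 0.9, 1.3, 0.7]
result = analyze_differences(d_slides_1_to_4, d_slides_5_to_8, t_quantile=2.447)
print(result)
```

Dividing the difference of the two group means by 2 removes the dye term $\delta_2 - \delta_1$, which appears with the same sign in both groups, and leaves $\tau_1 - \tau_2$. The p-value would come from comparing the returned `t` with a t distribution on `df` degrees of freedom.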

Observed Normalized Log Signal Intensities (NLSI) for One Gene

P-value for testing $\tau_1 = \tau_2$ is ...
Estimated fold change = ...
...% confidence interval for fold change: 0.83 to 7.49

Example 2: CRD with Affymetrix Technology

What genes are involved in muscle hypertrophy?

Design a treatment that will induce hypertrophy in muscle tissue and an appropriate control treatment.
Randomly assign experimental units to the two treatments.
Use microarray technology to measure mRNA transcript abundance in muscle tissue.
Identify genes whose mRNA levels differ between treatments.

Assign 6 mice to each treatment completely at random

Measure Expression in Relevant Muscle Tissue with Affymetrix GeneChips

Normalized Log Scale Data
(Figure: data matrix with axes labeled Experimental Units and Genes)

Model for One Gene

$Y_{ij} = \mu + \tau_i + e_{ij}$  $(i = 1, 2;\ j = 1, 2, 3, 4, 5, 6)$

$Y_{ij}$ = normalized log signal intensity for the $j$th experimental unit exposed to the $i$th treatment
$\mu$ = real-valued parameter common to all observations
$\tau_i$ = effect due to the $i$th treatment
$e_{ij}$ = error effect for the $j$th experimental unit exposed to the $i$th treatment

Gene 4: Data Analysis

    j            1     2     3     4     5     6     mean
    $Y_{1j}$     8.6   8.8   9.1   9.8   7.9   7.4   8.6
    $Y_{2j}$     6.2   6.8   6.6   6.8   5.5   7.7   6.6

$\bar{Y}_{1\cdot} - \bar{Y}_{2\cdot} = \tau_1 - \tau_2 + \bar{e}_{1\cdot} - \bar{e}_{2\cdot} = 2.0$

$se(\bar{Y}_{1\cdot} - \bar{Y}_{2\cdot}) = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$

Gene 4: 95% Confidence Interval for $\tau_1 - \tau_2$

$\bar{Y}_{1\cdot} - \bar{Y}_{2\cdot} \pm t_{n_1+n_2-2}\, se(\bar{Y}_{1\cdot} - \bar{Y}_{2\cdot})$

$2.0 \pm 2.228 \times se(\bar{Y}_{1\cdot} - \bar{Y}_{2\cdot})$, which gives $(0.98,\ 3.02)$

Gene 4: 95% Confidence Interval for Fold Change

Estimated fold change $= e^{\bar{Y}_{1\cdot} - \bar{Y}_{2\cdot}} = e^{2.0} = 7.4$

$\left( e^{\bar{Y}_{1\cdot} - \bar{Y}_{2\cdot} - t_{n_1+n_2-2}\, se(\bar{Y}_{1\cdot} - \bar{Y}_{2\cdot})},\ e^{\bar{Y}_{1\cdot} - \bar{Y}_{2\cdot} + t_{n_1+n_2-2}\, se(\bar{Y}_{1\cdot} - \bar{Y}_{2\cdot})} \right) = (2.7,\ 20.5)$

Gene 4: t-test

$t = \dfrac{\bar{Y}_{1\cdot} - \bar{Y}_{2\cdot}}{se(\bar{Y}_{1\cdot} - \bar{Y}_{2\cdot})}$

Compare to a t-distribution with $n_1 + n_2 - 2 = 10$ d.f. to obtain the p-value.
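The Gene 4 calculations can be reproduced with a short script. This is a sketch under stated assumptions rather than the author's own code: the function name is illustrative, two data entries whose last digit is unreadable on the slide are taken as 9.1 and 6.2 (the values that reproduce the reported treatment means 8.6 and 6.6 and the difference 2.0), and the t quantile 2.228 for 10 degrees of freedom is entered by hand.

```python
import math
import statistics

# NLSI for Gene 4 as read from the slide. The entries 9.1 and 6.2 are inferred
# (their last digits were lost) so that the rows reproduce the reported
# treatment means 8.6 and 6.6.
treatment_1 = [8.6, 8.8, 9.1, 9.8, 7.9, 7.4]
treatment_2 = [6.2, 6.8, 6.6, 6.8, 5.5, 7.7]
T_QUANTILE_10_DF = 2.228  # 0.975 quantile of the t distribution with 10 d.f.


def pooled_two_sample(y1, y2, t_quantile):
    """Pooled-variance two-sample summary for log-scale expression data."""
    n1, n2 = len(y1), len(y2)
    diff = statistics.mean(y1) - statistics.mean(y2)
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * statistics.variance(y1)
                  + (n2 - 1) * statistics.variance(y2)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2)
    ci = (diff - t_quantile * se, diff + t_quantile * se)
    return {
        "diff": diff,                       # estimate of tau_1 - tau_2
        "se": se,
        "t": diff / se,                     # compare with t on n1 + n2 - 2 d.f.
        "df": n1 + n2 - 2,
        "ci": ci,
        "fold_change": math.exp(diff),      # natural-log scale assumed
        "fold_change_ci": (math.exp(ci[0]), math.exp(ci[1])),
    }


gene_4 = pooled_two_sample(treatment_1, treatment_2, T_QUANTILE_10_DF)
for name, value in gene_4.items():
    print(name, value)
```

Rounded to the precision used on the slides, the interval for $\tau_1 - \tau_2$ and the interval for the fold change agree with the reported (0.98, 3.02) and (2.7, 20.5), which supports the two inferred data values.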
