Optimal normalization of DNA-microarray data

1 Optimal normalization of DNA-microarray data
Daniel Faller (1), HD Dr. J. Timmer (1), Dr. H. U. Voss (1), Prof. Dr. Honerkamp (1) and Dr. U. Hobohm (2)
(1) Freiburg Center for Data Analysis and Modeling
(2) F. Hoffmann-La Roche, Pharma Research, Switzerland

2 Contents
DNA-microarrays: Basic principle and challenges
Optimal transformations
  Properties & example
  Application to DNA-microarray data
Results
  Signal variability
  False positives
Improvements
  Robust estimation
  Variance stabilization
Summary & Outlook

3 DNA-Microarrays
[Figure labels: Probe 1, Probe 2, mRNA preparation, adding fluorescence, hybridisation, DNA array]

4 DNA-Microarrays: (AG Walz)

5 Challenges
From gene expression to numbers
  biological differences
  artifacts of fabrication process
  noise
  systematic errors
  correct for systematic errors
Horizons of comparability
  combine results from different groups
  measurements at different times
  make results comparable
What is significant?
  biology: Log(ratio) > 2
  what is the noise level?
Advanced normalization algorithms

6 Normalization algorithms
Standard techniques:
One color design: linear normalization
  most spots do not change
  regulation symmetric
  same mean, median
Two color design: linear normalization for Log(red/green)
  the same variability
  same variance
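The two standard techniques on slide 6 can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the choice of the pooled median as the common target, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def scale_to_common_median(chips):
    """One color design: rescale each chip linearly so all chips share one median.

    chips has shape (n_chips, n_spots). The pooled median is used as the
    target, which is an assumption made for this sketch.
    """
    chips = np.asarray(chips, dtype=float)
    target = np.median(chips)
    per_chip = np.median(chips, axis=1, keepdims=True)
    return chips * (target / per_chip)

def standardize_log_ratios(log_ratios):
    """Two color design: give Log(red/green) of each array mean 0 and variance 1."""
    log_ratios = np.asarray(log_ratios, dtype=float)
    centered = log_ratios - log_ratios.mean(axis=1, keepdims=True)
    return centered / centered.std(axis=1, keepdims=True)

# Synthetic example: three chips that differ only by a global scale factor.
rng = np.random.default_rng(0)
raw = rng.lognormal(mean=5.0, sigma=1.0, size=(3, 1000)) * np.array([[1.0], [2.0], [0.5]])
scaled = scale_to_common_median(raw)

# Synthetic log ratios with different offsets and spreads per array.
log_ratios = rng.normal(loc=[[0.3], [-0.2]], scale=[[1.5], [0.7]], size=(2, 1000))
standardized = standardize_log_ratios(log_ratios)
```

Both functions apply one linear map per chip, so they can only remove a global offset or scale; intensity-dependent distortions are left untouched.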

7 Normalization algorithms
Basic idea:
  Real expression: $r_g$ (gene g)
  Experiment i measured expression: $f_i(r_g)$
  Find $f_i$ which maximizes correlation
Assumptions
  exact replications: none
  different conditions: most genes do not change
  robust algorithm
    biological effects: outliers
    maximize correlation of most genes

8 Optimal Transformations
Idea: Rényi maximal correlation:
  $\Psi(x_1, x_2) = \sup_{f,g} R\bigl(f(x_1), g(x_2)\bigr)$
Properties:
  defined if $x_1, x_2 \neq \text{const}$
  symmetric, normalized, $0 \le \Psi \le 1$
  $\Psi = 0$ if and only if $x_1, x_2$ independent
  $\Psi = 1$ if fully dependent
  $p(x_1, x_2) = N(\mu_1, \mu_2, \sigma_1, \sigma_2) \Rightarrow \Psi = |R|$

9 Optimal transformations
$f^*, g^*$ maximize $\Psi$, i.e. minimize the regression error:
  $e^{*2} = e^2(f^*, g^*) = \inf_{f,g} e^2(f, g)$
  $e^2(f, g) = \dfrac{E\{[f(x_1) - g(x_2)]^2\}}{E\{f^2(x_1)\}}, \qquad \Psi^2 = 1 - e^{*2}$
Advantages:
  Correct for every systematic error (Rényi, 1959)
  No parametric assumptions
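The relation between the regression error and the maximal correlation on slide 9 follows in a few lines. The derivation below is a sketch under the usual assumptions, which the slide does not state: $f$ and $g$ are centered, and $f$ is normalized to $E\{f^2(x_1)\} = 1$. The symbol $g_f$ is introduced here only for the derivation.

```latex
% For fixed f, the error is minimized by the conditional expectation
g_f(x_2) = E\{ f(x_1) \mid x_2 \}, \qquad
E\{ f(x_1)\, g_f(x_2) \} = E\{ g_f^2(x_2) \}.

% Regression error at this choice of g (using E{f^2} = 1)
e^2(f, g_f) = E\{ [f(x_1) - g_f(x_2)]^2 \} = 1 - E\{ g_f^2(x_2) \}.

% Correlation at the same choice of g
R\bigl(f(x_1), g_f(x_2)\bigr)
  = \frac{E\{ f\, g_f \}}{\sqrt{E\{ f^2 \}\, E\{ g_f^2 \}}}
  = \sqrt{E\{ g_f^2(x_2) \}}.

% Hence e^2 = 1 - R^2 for every f, and optimizing over f gives
e^{*2} = \inf_{f} \bigl[ 1 - R^2(f, g_f) \bigr] = 1 - \Psi^2 .
```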

10 How to find them?
ACE (Breiman & Friedman)
  iterative algorithm, based on Alternating Conditional Expectation
  guaranteed convergence
Example:
  $Y = \exp[\sin(2\pi X) + \epsilon/2]$, with $X, \epsilon \sim N(0, 1)$
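A minimal sketch of ACE on the example from slide 10. It is not the Breiman & Friedman reference implementation: the conditional expectations are estimated with a plain running mean in rank order, and the names `running_mean`, `ace`, `span`, `n_iter` and `tol` are assumptions made for this illustration.

```python
import numpy as np

def running_mean(x, t, span=0.03):
    """Estimate E[t | x] at every sample by averaging t over neighbours in x-rank."""
    n = len(x)
    half = max(1, int(span * n) // 2)
    width = 2 * half + 1
    order = np.argsort(x)
    # Repeat the edge values so the window is defined at the borders.
    padded = np.pad(t[order], half, mode="edge")
    smoothed = np.convolve(padded, np.ones(width) / width, mode="valid")
    out = np.empty(n)
    out[order] = smoothed
    return out

def ace(x, y, n_iter=50, tol=1e-6):
    """Alternate the two conditional expectations until the error stops decreasing."""
    theta = (y - y.mean()) / y.std()          # start from the standardized response
    prev_err = np.inf
    for _ in range(n_iter):
        phi = running_mean(x, theta)           # phi(x)   = E[theta(y) | x]
        theta = running_mean(y, phi)           # theta(y) = E[phi(x) | y]
        theta = (theta - theta.mean()) / theta.std()
        err = np.mean((theta - phi) ** 2)      # regression error with E[theta^2] = 1
        if prev_err - err < tol:
            break
        prev_err = err
    return theta, phi, err

# Example from the slide: Y = exp[sin(2 pi X) + eps / 2], X and eps standard normal.
rng = np.random.default_rng(1)
x = rng.normal(size=2000)
eps = rng.normal(size=2000)
y = np.exp(np.sin(2 * np.pi * x) + eps / 2)

theta, phi, err = ace(x, y)
raw_corr = np.corrcoef(x, y)[0, 1]
ace_corr = np.corrcoef(theta, phi)[0, 1]
```

The raw Pearson correlation between X and Y is close to zero for this example, while the transformed variables are clearly correlated. Without noise and smoothing error, the transformations would be theta(Y) = log Y and phi(X) = sin(2πX).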

11 Optimal transformations

12 DNA-microarrays
Generalization
  $e^2 = \sum_{i<j} \dfrac{\sum_g [\Phi_i(X_{ig}) - \Phi_j(X_{jg})]^2}{\sum_g \Phi_i^2(X_{ig})}$
Idea
  ACE for all pairwise comparisons
  After every iterative step, average all transformations for one measurement
Details
  Rank ordering
  Expectation value computed by smoothing
  Transform back using joint distribution
The algorithm: see slide 25

13 Results 158 Affymetrix Chips

14 Results

15 Results Variance

16 Results False positive rate

17 Improvements
Problems
  Not robust (L2 norm)
  Expectation values at the borders of intensity scale
  Rank ordering
Solution
  Least trimmed squares (LTS) regression to estimate $\Phi_i$
  $r^2 = \sum_{i=1}^{N/2} r^2_{i:N}$, where $r^2_{i:N}$ are the ordered squared residuals
In addition: Normalize by variance stabilizing functions
  Random variable Z with mean $\mu$, variance $\sigma^2(\mu)$
  $h(t) = \int_0^t \dfrac{du}{\sqrt{\mathrm{Var}(u)}}$

18 Least Trimmed Squares (LTS)
Robust regression
Breakdown points:
  Least squares: 0 %
  Least trimmed squares: 50 %
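The difference in breakdown points can be illustrated with a straight-line fit. The sketch below is an assumption-laden illustration, not the estimator used in the talk: it minimizes the sum of the N/2 smallest squared residuals with random two-point starts followed by concentration steps (refit on the currently best half), and the name `lts_line` and its parameters are hypothetical.

```python
import numpy as np

def lts_line(x, y, n_starts=50, n_csteps=20, seed=0):
    """Fit y = b0 + b1 * x by least trimmed squares over the best N/2 points."""
    rng = np.random.default_rng(seed)
    n = len(x)
    h = n // 2                                   # number of residuals kept
    design = np.column_stack([np.ones(n), x])
    best_obj, best_beta = np.inf, None
    for _ in range(n_starts):
        subset = rng.choice(n, size=2, replace=False)   # random elemental start
        for _ in range(n_csteps):
            beta, *_ = np.linalg.lstsq(design[subset], y[subset], rcond=None)
            sq_res = (y - design @ beta) ** 2
            subset = np.argsort(sq_res)[:h]      # keep the h best-fitting points
        obj = np.sort(sq_res)[:h].sum()          # trimmed sum of squared residuals
        if obj < best_obj:
            best_obj, best_beta = obj, beta
    return best_beta

# Synthetic data: true line y = 1 + 2x, with 30 % of the points shifted by +50.
rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 100)
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=100)
y[70:] += 50.0

ols_beta, *_ = np.linalg.lstsq(np.column_stack([np.ones(100), x]), y, rcond=None)
lts_beta = lts_line(x, y)
```

With 30 % contamination the ordinary least squares slope is pulled far away from 2, while the trimmed fit ignores the shifted points because they never belong to the best-fitting half.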

19 Variance stabilization
Up to now: smooth functions with µ = 0, σ = 1
Now: smooth functions with σ(µ) = const

20 Variance stabilization
Constant variance as a function of intensity by using a common transformation
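The variance stabilizing function from slide 17, the integral of 1/sqrt(Var(u)), can be evaluated numerically for any assumed variance model. The sketch below uses a quadratic model Var(u) = a² + (b u)², which is an assumption chosen for illustration and not taken from the slides; for it the integral has the closed form arcsinh(b t / a) / b, so the numerical result can be checked. The names `variance_stabilizer`, `a` and `b` are hypothetical.

```python
import numpy as np

def variance_stabilizer(var_fn, t_grid):
    """Cumulative trapezoid estimate of h(t) = integral of du / sqrt(Var(u)).

    The integral starts at t_grid[0], so h(t_grid[0]) = 0.
    """
    integrand = 1.0 / np.sqrt(var_fn(t_grid))
    steps = 0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t_grid)
    return np.concatenate([[0.0], np.cumsum(steps)])

# Assumed variance model: additive noise a plus multiplicative noise b.
a, b = 1.0, 0.2
t = np.linspace(0.0, 100.0, 10001)

h_numeric = variance_stabilizer(lambda u: a**2 + (b * u) ** 2, t)
h_exact = np.arcsinh(b * t / a) / b              # closed form for this model
max_abs_error = np.max(np.abs(h_numeric - h_exact))
```

For small intensities this h is approximately linear, and for large intensities it grows like a logarithm, which is why a single common transformation can flatten the variance across the whole intensity range.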

21 First Results
real data (left)
real data with artificial systematic errors (right)
consistent results

22 The transformations

23 The data

24 Summary & Outlook
Correct for systematic errors
No parametric assumptions
Significant reduction of
  signal variability
  false positive rate
Application to data including effects
  robust because of LTS
  variance stabilizing functions
  false negative rate?
Interested? Please let us know...

25 The algorithm
For all pairwise comparisons i, j
    $\Phi_i(X_{ig}) = X_{ig} / \|X_{ig}\|$
Repeat
    For all pairwise comparisons i, j
        $\Phi_j(X_{jg}) = E[\Phi_i(X_{ig}) \mid X_{jg}]$
        $\Phi_i(X_{ig}) = E[\Phi_j(X_{jg}) \mid X_{ig}] / \|\Phi_i(X_{ig})\|$
    Compute mean of $\Phi_i$ (same replication)
while $e^2(\Phi_i, \Phi_j)$ decreases
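A simplified sketch of the pairwise scheme on slide 25. It is not the authors' implementation and deviates from the slide in ways that are assumptions of this example: every transformation is updated as the average of its conditional expectations given all other arrays, a running mean in rank order stands in for the smoother, each transformation is standardized to mean 0 and variance 1, and a fixed iteration count replaces the stopping rule. The names `cond_mean`, `standardize` and `pairwise_ace` are hypothetical.

```python
import numpy as np

def cond_mean(x, t, window=51):
    """Running-mean estimate of E[t | x], evaluated at every gene."""
    half = window // 2
    width = 2 * half + 1
    order = np.argsort(x)
    padded = np.pad(t[order], half, mode="edge")
    smoothed = np.convolve(padded, np.ones(width) / width, mode="valid")
    out = np.empty(len(x))
    out[order] = smoothed
    return out

def standardize(v):
    """Shift and scale to mean 0 and variance 1."""
    v = v - v.mean()
    return v / v.std()

def pairwise_ace(X, n_iter=10):
    """X has shape (n_arrays, n_genes); returns transformed values of the same shape."""
    m = X.shape[0]
    Phi = np.array([standardize(row) for row in X])
    for _ in range(n_iter):
        new = np.zeros_like(Phi)
        for i in range(m):
            for j in range(m):
                if i != j:
                    new[i] += cond_mean(X[i], Phi[j])   # E[Phi_j | X_i]
        # Average over all partners of array i, then renormalize.
        Phi = np.array([standardize(row / (m - 1)) for row in new])
    return Phi

def mean_pairwise_corr(M):
    """Average Pearson correlation over all pairs of rows."""
    c = np.corrcoef(M)
    return c[np.triu_indices_from(c, k=1)].mean()

# Synthetic example: one true expression profile, three arrays with
# different monotone distortions (identity, exponential, cubic).
rng = np.random.default_rng(3)
r = rng.normal(size=500)
noisy = r + 0.1 * rng.normal(size=(3, 500))
X = np.vstack([noisy[0], np.exp(noisy[1]), noisy[2] ** 3])

Phi = pairwise_ace(X)
before = mean_pairwise_corr(X)
after = mean_pairwise_corr(Phi)
```

On this synthetic input the mean pairwise correlation rises after the transformations, because each array's distortion is undone up to a common monotone function.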
