Optimal normalization of DNA-microarray data
Daniel Faller 1, HD Dr. J. Timmer 1, Dr. H. U. Voss 1, Prof. Dr. Honerkamp 1 and Dr. U. Hobohm 2
1 Freiburg Center for Data Analysis and Modeling
2 F. Hoffmann-La Roche, Pharma Research, Switzerland
23.11.2001
Contents
- DNA-microarrays: basic principle and challenges
- Optimal transformations: properties & example
- Application to DNA-microarray data
- Results: signal variability, false positives
- Improvements: robust estimation, variance stabilization
- Summary & outlook
DNA-Microarrays
[Diagram: mRNA preparation from probe 1 and probe 2, fluorescence labelling, hybridisation to the DNA array]
DNA-Microarrays
[Microarray image (AG Walz)]
Challenges
From gene expression to numbers:
- biological differences
- artifacts of the fabrication process
- noise
- systematic errors → correct for systematic errors
Horizons of comparability:
- combine results from different groups
- measurements at different times → make results comparable
What is significant?
- biology: log(ratio) > 2
- what is the noise level? → advanced normalization algorithms
Standard techniques
One-color design: linear normalization
- most spots do not change
- regulation is symmetric → same mean / median
Two-color design: linear normalization of log(red/green)
- the same variability → same variance
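The one-color linear normalization above can be sketched as follows; matching array medians is one common way to realize "most spots do not change" (the scaling rule and function name here are illustrative, not the exact procedure from the talk):

```python
import numpy as np

def median_normalize(arrays):
    """Linear (global) normalization: rescale every array so that all
    medians coincide -- the 'most spots do not change' assumption."""
    arrays = [np.asarray(a, dtype=float) for a in arrays]
    target = np.median([np.median(a) for a in arrays])
    return [a * (target / np.median(a)) for a in arrays]

chips = [np.array([1.0, 2.0, 3.0, 4.0, 100.0]),   # toy intensities
         np.array([2.0, 4.0, 6.0, 8.0, 10.0])]
normed = median_normalize(chips)
```

Note that a single multiplicative factor per chip cannot remove intensity-dependent distortions — which is exactly the gap the optimal-transformation approach below addresses.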
Basic idea
- real expression: r_g (gene g); experiment i measures f_i(r_g)
- find the f_i which maximize the correlation
Assumptions:
- exact replications: none
- different conditions: most genes do not change
→ robust algorithm (biological effects appear as outliers); maximize the correlation of most genes
Optimal transformations
Idea: Rényi maximal correlation
    Ψ(x_1, x_2) = sup_{f,g} R(f(x_1), g(x_2))
Properties:
- defined if x_1, x_2 are not constant
- symmetric, normalized: 0 ≤ Ψ ≤ 1
- Ψ = 0 if and only if x_1 and x_2 are independent
- Ψ = 1 if they are fully dependent
- for a bivariate normal p(x_1, x_2) = N(µ_1, µ_2, σ_1, σ_2): Ψ = |R|
Optimal transformations
Advantages: the optimal f*, g* maximize Ψ and minimize the regression error
    e²(f, g) = E{[f(x_1) − g(x_2)]²} / E{f²(x_1)}
    e²* = e²(f*, g*) = inf_{f,g} e²(f, g),   Ψ² = 1 − e²*
- corrects for every systematic error (Rényi, 1959)
- no parametric assumptions
How to find them? ACE (Breiman & Friedman)
- iterative algorithm, based on Alternating Conditional Expectation
- guaranteed convergence
Example: Y = exp[sin(2πX) + ε/2], X, ε ~ N(0, 1)
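The ACE iteration on this example can be sketched as below; a simple binned conditional expectation stands in for the smoother, and Ψ is estimated as the correlation of the fitted transforms (a sketch under these assumptions, not the exact implementation used in the talk):

```python
import numpy as np

def cond_mean(x, y, bins=30):
    """Estimate E[y | x] by binning x and averaging y within each bin
    (a crude stand-in for the smoother in the real ACE algorithm)."""
    edges = np.linspace(x.min(), x.max(), bins + 1)
    idx = np.clip(np.digitize(x, edges) - 1, 0, bins - 1)
    means = np.zeros(bins)
    for b in range(bins):
        mask = idx == b
        if mask.any():
            means[b] = y[mask].mean()
    return means[idx]

def ace(x, y, n_iter=30):
    """Alternating Conditional Expectation (Breiman & Friedman), one
    predictor: alternate g(x) <- E[f(y)|x] and f(y) <- E[g(x)|y],
    keeping f standardized to mean 0, variance 1."""
    f = (y - y.mean()) / y.std()
    g = np.zeros_like(x)
    for _ in range(n_iter):
        g = cond_mean(x, f)               # g(x) <- E[f(y) | x]
        f = cond_mean(y, g)               # f(y) <- E[g(x) | y]
        f = (f - f.mean()) / f.std()      # re-standardize f
    return f, g

rng = np.random.default_rng(1)
x = rng.normal(size=4000)
eps = rng.normal(size=4000)
y = np.exp(np.sin(2 * np.pi * x) + eps / 2)

f, g = ace(x, y)
psi_hat = np.corrcoef(f, g)[0, 1]        # estimate of the maximal correlation
```

Here ACE should recover f(y) ≈ log y (undoing the exponential) and g(x) ≈ sin(2πx), so the estimated Ψ stays well above the linear correlation of the raw variables.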
Optimal transformations
[Plot: estimated optimal transformations for the example]
DNA-microarrays
Generalization:
    e² = Σ_{i<j} Σ_g [Φ_i(X_ig) − Φ_j(X_jg)]² / Σ_g Φ_i²(X_ig)
Idea:
- ACE for all pairwise comparisons
- after every iterative step, average all transformations for one measurement
Details:
- rank ordering
- expectation values computed by smoothing
- transform back using the joint distribution
(→ The algorithm)
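The generalized error above can be evaluated directly. The slide leaves the normalizing sum slightly ambiguous, so dividing each pairwise term by the first array's Σ_g Φ_i² is an assumption of this sketch:

```python
import numpy as np

def e2(phi):
    """Generalized regression error over all pairwise array comparisons:
    e^2 = sum_{i<j} sum_g [Phi_i(X_ig) - Phi_j(X_jg)]^2 / sum_g Phi_i(X_ig)^2.
    phi: (n_arrays, n_genes) matrix of transformed expression values.
    Normalizing by the first index i is an assumption; the slide is ambiguous."""
    n = phi.shape[0]
    total = 0.0
    for i in range(n):
        denom = np.sum(phi[i] ** 2)
        for j in range(i + 1, n):
            total += np.sum((phi[i] - phi[j]) ** 2) / denom
    return total

identical = e2(np.array([[1.0, 2.0, 3.0],
                         [1.0, 2.0, 3.0]]))   # perfectly matched arrays
```

Identical transformed arrays give e² = 0, consistent with Ψ² = 1 − e²* in the two-array case.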
Results 158 Affymetrix Chips
Results: variance
Results: false positive rate
Improvements
Problems:
- not robust (L2 norm)
- expectation values at the borders of the intensity scale
- rank ordering
Solution: least trimmed squares (LTS) regression to estimate the Φ_i
    r² = Σ_{i=1}^{N/2} r²_{i:N},   r²_{i:N} the ordered squared residuals
In addition: normalize by variance-stabilizing functions.
For a random number Z with mean µ and variance σ²(µ):
    h(t) = ∫_0^t du / √(Var(u))
Least trimmed squares (LTS)
Robust regression, breakdown points:
- least squares: 0 %
- least trimmed squares: 50 %
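A crude randomized LTS line fit illustrates the 50 % breakdown point: ordinary least squares would be pulled away by 30 % gross outliers, while the trimmed criterion ignores them. This is a minimal sketch; production LTS solvers (e.g. FAST-LTS) are considerably more elaborate:

```python
import numpy as np

def lts_objective(residuals, trim=0.5):
    """LTS criterion: sum of the smallest h squared residuals."""
    r2 = np.sort(residuals ** 2)
    h = int(np.ceil(trim * len(r2)))
    return r2[:h].sum()

def lts_line(x, y, n_trials=200, trim=0.5, seed=0):
    """Randomized LTS fit of y = a + b*x: draw elemental 2-point subsets,
    fit a line through each, keep the candidate with the best trimmed score."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_trials):
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        score = lts_objective(y - (a + b * x), trim)
        if best is None or score < best[0]:
            best = (score, a, b)
    return best[1], best[2]

x = np.linspace(0.0, 10.0, 100)
y = 1.0 + 2.0 * x
y[:30] += 50.0          # contaminate 30 % with gross outliers
a, b = lts_line(x, y)   # least squares would be pulled far off this line
```

Because fewer than half of the points are contaminated, the trimmed score of the true line is driven by the 70 clean points, and the fit recovers slope 2 and intercept 1.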
Variance stabilization
Up to now: smooth functions with µ = 0, σ = 1
Now: smooth functions with σ(µ) = const
Variance stabilization Constant variance as a function of intensity by using a common transformation
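To make h(t) = ∫₀ᵗ du/√(Var(u)) concrete: under a quadratic variance model Var(u) = (c·u)² + σ₀² (a common assumption for array intensities — multiplicative plus additive noise — not stated on the slides), the integral has the closed form of a generalized logarithm, which the sketch below checks numerically:

```python
import numpy as np

def var_model(u, c=0.1, s0=1.0):
    """Assumed quadratic variance model Var(u) = (c*u)^2 + s0^2;
    an illustration, not taken from the slides."""
    return (c * u) ** 2 + s0 ** 2

def h_numeric(t, n=200001):
    """h(t) = integral_0^t du / sqrt(Var(u)) via the trapezoidal rule."""
    u = np.linspace(0.0, t, n)
    v = 1.0 / np.sqrt(var_model(u))
    du = u[1] - u[0]
    return du * (v.sum() - 0.5 * (v[0] + v[-1]))

def h_closed(t, c=0.1, s0=1.0):
    """Closed form for the quadratic model: arsinh(c*t/s0)/c,
    a 'generalized log' that is linear for small t and log-like for large t."""
    return np.arcsinh(c * t / s0) / c
```

Applying h to the intensities makes the noise level approximately constant across the intensity scale, which is exactly the σ(µ) = const requirement above.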
First results
- real data (left), real data with artificial systematic errors (right)
- consistent results
The transformations
The data
Summary & Outlook
- corrects for systematic errors, no parametric assumptions
- significant reduction of signal variability and of the false positive rate
- application to data including effects is robust because of LTS and variance-stabilizing functions
- open: the false negative rate?
Interested? Please let us know: Daniel.Faller@physik.uni-freiburg.de
The algorithm
For all arrays i: Φ_i(X_ig) = X_ig / ||X_ig||
Repeat
    For all pairwise comparisons i, j:
        Φ_j(X_jg) = E[Φ_i(X_ig) | X_jg]
        Φ_i(X_ig) = E[Φ_j(X_jg) | X_ig] / ||Φ_i||
    Compute the mean of the Φ_i over the same replication
while e²(Φ_i, Φ_j) decreases
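A sketch of this loop for several arrays. Binned conditional expectations replace the smoother, and each update is additionally mean-centred — the usual ACE standardization, added here to keep the iteration from collapsing to a constant; the slide itself only shows the norm:

```python
import numpy as np

def cond_mean(x, y, bins=20):
    """E[y | x] estimated by binning x (stand-in for the smoother)."""
    edges = np.linspace(x.min(), x.max(), bins + 1)
    idx = np.clip(np.digitize(x, edges) - 1, 0, bins - 1)
    out = np.zeros_like(y)
    for b in range(bins):
        m = idx == b
        if m.any():
            out[m] = y[m].mean()
    return out

def normalize_arrays(X, n_iter=15):
    """Sketch of the pairwise-ACE normalization: X is (n_arrays, n_genes).
    Each Phi_j is replaced by the average of E[Phi_i | X_j] over the other
    arrays i, then centred and renormalized to unit norm."""
    Phi = X / np.linalg.norm(X, axis=1, keepdims=True)  # Phi_i = X_i / ||X_i||
    n = X.shape[0]
    for _ in range(n_iter):
        new = np.zeros_like(Phi)
        for j in range(n):
            for i in range(n):
                if i != j:
                    new[j] += cond_mean(X[j], Phi[i])   # E[Phi_i | X_j]
            new[j] /= n - 1                             # average the comparisons
            new[j] -= new[j].mean()                     # ACE centring (added)
            new[j] /= np.linalg.norm(new[j])            # / ||Phi_j||
        Phi = new
    return Phi

rng = np.random.default_rng(0)
r = rng.uniform(0.5, 3.0, size=500)      # "true" expression r_g
X = np.vstack([r, r ** 3])               # two arrays, different distortions f_i
Phi = normalize_arrays(X)
ip_before = np.dot(X[0] / np.linalg.norm(X[0]), X[1] / np.linalg.norm(X[1]))
ip_after = np.dot(Phi[0], Phi[1])        # agreement after normalization
```

On this toy example the estimated transformations undo the monotone distortions, so the agreement between the two normalized arrays improves over the raw data — the multi-array analogue of maximizing Ψ.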