Optimal normalization of DNA-microarray data

Daniel Faller (1), HD Dr. J. Timmer (1), Dr. H. U. Voss (1), Prof. Dr. Honerkamp (1) and Dr. U. Hobohm (2)
(1) Freiburg Center for Data Analysis and Modeling
(2) F. Hoffmann-La Roche, Pharma Research, Switzerland
23.11.2001

Contents
  DNA-microarrays: basic principle and challenges
  Optimal transformations: properties and example
  Application to DNA-microarray data
  Results: signal variability, false positives
  Improvements: robust estimation, variance stabilization
  Summary & Outlook

DNA-Microarrays
  [Figure: schematic with the labels Probe 1, Probe 2, mRNA preparation, adding fluorescence, hybridisation, DNA array]

DNA-Microarrays: (AG Walz)

Challenges
From gene expression to numbers:
  biological differences
  artifacts of fabrication process
  noise
  systematic errors
  correct for systematic errors
Horizons of comparability:
  combine results from different groups
  measurements at different times
  make results comparable
What is significant?
  biology: Log(ratio) > 2
  what is the noise level?
  advanced normalization algorithms

Normalization algorithms
Standard techniques:
  One-color design: linear normalization
    most spots do not change
    regulation symmetric
    same mean, median
  Two-color design: linear normalization for Log(red/green)
    the same variability
    same variance

Normalization algorithms
Basic idea:
  Real expression: $r_g$ (gene $g$)
  Experiment $i$, measured expression: $f_i(r_g)$
  Find the $f_i$ that maximize the correlation
Assumptions:
  exact replications: none
  different conditions: most genes do not change
Robust algorithm:
  biological effects, outliers
  maximize correlation of most genes

Optimal transformations
Idea: Rényi maximal correlation
  $\Psi(x_1, x_2) = \sup_{f,g} R(f(x_1), g(x_2))$
Properties:
  defined if $x_1, x_2 \neq \text{const}$
  symmetric, normalized: $0 \le \Psi \le 1$
  $\Psi = 0$ if and only if $x_1, x_2$ are independent
  $\Psi = 1$ if $x_1, x_2$ are fully dependent
  $p(x_1, x_2) = N(\mu_1, \mu_2, \sigma_1, \sigma_2) \Rightarrow \Psi = |R|$

Optimal transformations
$f^*, g^*$ maximize $\Psi$, which is equivalent to minimizing the regression error:
  $e^{*2} = e^2(f^*, g^*) = \inf_{f,g} e^2(f, g)$
  $e^2(f, g) = \frac{E\{[f(x_1) - g(x_2)]^2\}}{E\{f^2(x_1)\}}$, $\Psi^2 = 1 - e^{*2}$
Advantages:
  Correct for every systematic error (Rényi, 1959)
  No parametric assumptions
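The relation $\Psi^2 = 1 - e^{*2}$ is stated on the slide without a derivation. One way to see it is sketched below, under the added assumption (not on the slide) that $f$ and $g$ are centered and $f$ is scaled to unit variance; $g_f$ is a helper symbol introduced only for this sketch.

```latex
% Assumptions for this sketch: E f = E g = 0 and E f^2 = 1.
% g_f denotes the conditional expectation of f(x_1) given x_2.
\begin{align*}
g_f(x_2) &:= E\{f(x_1) \mid x_2\}, \qquad
  E\{f g\} = E\{g_f\, g\} \ \text{for every } g = g(x_2), \\
e^2(f, g) &= E\{(f - g)^2\}
           = 1 - E\{g_f^2\} + E\{(g - g_f)^2\}
  \;\Rightarrow\; \inf_g e^2(f, g) = 1 - E\{g_f^2\}, \\
R(f, g) &= \frac{E\{g_f\, g\}}{\sqrt{E\{g^2\}}} \le \sqrt{E\{g_f^2\}}
  \quad \text{(Cauchy--Schwarz, equality for } g \propto g_f\text{)}, \\
\Psi^2 &= \sup_f E\{g_f^2\} = 1 - \inf_{f,g} e^2(f, g) = 1 - e^{*2}.
\end{align*}
```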

Optimal transformations: how to find them?
ACE (Breiman & Friedman):
  iterative algorithm based on Alternating Conditional Expectations
  guaranteed convergence
Example: $Y = \exp[\sin(2\pi X) + \epsilon/2]$, with $X, \epsilon \sim N(0, 1)$
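A minimal sketch of the ACE iteration applied to the example above may help make the idea concrete. It is not the authors' implementation: the running-mean smoother, the window size, the sample size and the stopping tolerance are all assumptions chosen for illustration, and `smooth` and `ace` are hypothetical helper names.

```python
import numpy as np

def smooth(target, by, window=51):
    """Running-mean estimate of E[target | by] at each data point (window must be odd)."""
    order = np.argsort(by)                        # rank ordering of the conditioning variable
    padded = np.pad(target[order], window // 2, mode="edge")
    averaged = np.convolve(padded, np.ones(window) / window, mode="valid")
    result = np.empty_like(averaged)
    result[order] = averaged                      # undo the sorting
    return result

def ace(x, y, n_iter=50, tol=1e-6):
    """Alternate conditional expectations until the error e^2 stops decreasing."""
    theta = (y - y.mean()) / y.std()              # start from the standardized response
    err_old = np.inf
    for _ in range(n_iter):
        phi = smooth(theta, x)                    # phi(x) = E[theta(y) | x]
        theta = smooth(phi, y)                    # theta(y) = E[phi(x) | y]
        theta = (theta - theta.mean()) / theta.std()
        err = np.mean((theta - phi) ** 2)         # e^2, because mean(theta^2) = 1
        if err_old - err < tol:
            break
        err_old = err
    return theta, phi, err

# The slide's example: Y = exp[sin(2 pi X) + eps / 2], X and eps standard normal.
rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
eps = rng.standard_normal(2000)
y = np.exp(np.sin(2 * np.pi * x) + eps / 2)

theta, phi, err = ace(x, y)
corr_before = np.corrcoef(x, y)[0, 1]
corr_after = np.corrcoef(theta, phi)[0, 1]
print("linear correlation of raw data:      ", corr_before)
print("correlation after transformation:    ", corr_after)
print("regression error e^2:                ", err)
```

The raw variables are almost uncorrelated in the linear sense, while the transformed variables are clearly correlated. A production smoother (for example a local linear fit with an adaptive span) would behave better at the edges of the data than the plain running mean used here.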

Optimal transformations

DNA-microarrays
Generalization:
  $e^2 = \sum_{i<j} \frac{\sum_g [\Phi_i(X_{ig}) - \Phi_j(X_{jg})]^2}{\sum_g \Phi_i^2(X_{ig})}$
Idea:
  ACE for all pairwise comparisons
  After every iterative step, average all transformations for one measurement
Details:
  Rank ordering
  Expectation value computed by smoothing
  Transform back using joint distribution
See the slide "The algorithm".

Results: 158 Affymetrix chips

Results

Results: Variance

Results: False positive rate

Improvements
Problems:
  Not robust ($L_2$ norm)
  Expectation values at the borders of the intensity scale
  Rank ordering
Solution:
  Least trimmed squares (LTS) regression to estimate $\Phi_i$:
  $r^2 = \sum_{i=1}^{N/2} r^2_{i:N}$, where $r^2_{i:N}$ are the ordered squared residuals
In addition: normalize by variance stabilizing functions
  Random variable $Z$ with mean $\mu$ and variance $\sigma^2(\mu)$:
  $h(t) = \int_0^t \frac{du}{\sqrt{\mathrm{Var}(u)}}$

Least Trimmed Squares (LTS)
Robust regression, breakdown points:
  Least squares: 0 %
  Least trimmed squares: 50 %
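The difference in breakdown point can be illustrated with a small experiment. The sketch below fits a straight line by least trimmed squares, keeping the N/2 smallest squared residuals as in the LTS criterion. The search strategy (random two-point starts followed by concentration steps, in the spirit of the FAST-LTS idea) is a swapped-in technique chosen for brevity, not something the slides specify; the synthetic data, the 40 % contamination level and the name `lts_line` are likewise assumptions.

```python
import numpy as np

def lts_line(x, y, n_starts=200, n_csteps=20, seed=0):
    """Fit y = b0 + b1 * x by minimizing the sum of the N/2 smallest squared residuals."""
    rng = np.random.default_rng(seed)
    n = len(x)
    h = n // 2                                     # number of residuals kept
    design = np.column_stack([np.ones(n), x])
    best_obj, best_beta = np.inf, None
    for _ in range(n_starts):
        subset = rng.choice(n, size=2, replace=False)   # random elemental start
        for _ in range(n_csteps):
            beta, *_ = np.linalg.lstsq(design[subset], y[subset], rcond=None)
            r2 = (y - design @ beta) ** 2
            subset = np.argsort(r2)[:h]            # concentration step: keep best h points
        obj = np.sort(r2)[:h].sum()                # trimmed sum of squared residuals
        if obj < best_obj:
            best_obj, best_beta = obj, beta
    return best_beta, best_obj

# Synthetic data: true line y = 1 + 2x, with 40 % of the points replaced by outliers.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 200)
y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(200)
outliers = x > 6.0
y[outliers] = rng.uniform(-20.0, -10.0, size=outliers.sum())

design = np.column_stack([np.ones_like(x), x])
beta_ols, *_ = np.linalg.lstsq(design, y, rcond=None)
beta_lts, _ = lts_line(x, y)
print("least squares (intercept, slope):        ", beta_ols)
print("least trimmed squares (intercept, slope):", beta_lts)
```

With fewer than half of the points contaminated, the trimmed fit stays close to the true line, whereas the ordinary least-squares fit is pulled far away by the outliers.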

Variance stabilization
Up to now: smooth functions with $\mu = 0$, $\sigma = 1$
Now: smooth functions with $\sigma(\mu) = \text{const}$

Variance stabilization
Constant variance as a function of intensity by using a common transformation
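To make the effect of a variance stabilizing function concrete, the sketch below evaluates $h(t) = \int_0^t du / \sqrt{\mathrm{Var}(u)}$ numerically and checks that the transformed data have roughly constant spread. The slides do not state the variance function; the quadratic form $\mathrm{Var}(u) = a^2 + (b u)^2$ and the values of `a` and `b` are assumptions made purely for illustration. For this choice the integral has the closed form $\operatorname{arsinh}(b t / a) / b$.

```python
import numpy as np

# Assumed variance model (not from the slides): additive plus multiplicative noise.
a, b = 20.0, 0.1

def variance(u):
    return a**2 + (b * u) ** 2

def h_numeric(t, n_grid=20001):
    """h(t) = integral from 0 to t of du / sqrt(Var(u)), by the trapezoidal rule."""
    u = np.linspace(0.0, t, n_grid)
    integrand = 1.0 / np.sqrt(variance(u))
    du = u[1] - u[0]
    return du * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))

def h_closed(t):
    """Closed form of the same integral for the assumed variance model."""
    return np.arcsinh(b * t / a) / b

print("h(1000) numeric:", h_numeric(1000.0), " closed form:", h_closed(1000.0))

# Simulate measurements at three intensities and compare spreads before and after h.
rng = np.random.default_rng(2)
means = [50.0, 500.0, 5000.0]
raw_sd, stabilized_sd = [], []
for mu in means:
    z = mu + np.sqrt(variance(mu)) * rng.standard_normal(100_000)
    raw_sd.append(z.std())
    stabilized_sd.append(h_closed(z).std())
print("standard deviation of Z:   ", raw_sd)
print("standard deviation of h(Z):", stabilized_sd)
```

The raw standard deviation grows with intensity, while the standard deviation of $h(Z)$ stays close to one at all three intensities, which is the "constant variance as a function of intensity" property named on the slide.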

First results
  real data (left)
  real data with artificial systematic errors (right)
  consistent results

The transformations

The data

Summary & Outlook
Summary:
  Correct for systematic errors
  No parametric assumptions
  Significant reduction of signal variability and false positive rate
Outlook:
  Application to data including effects:
    robust because of LTS
    variance stabilizing functions
  False negative rate?
Interested? Please let us know... Daniel.Faller@physik.uni-freiburg.de

The algorithm
For all pairwise comparisons $i, j$:
    $\Phi_i(X_{ig}) = X_{ig} / \|X_{ig}\|$
Repeat
    For all pairwise comparisons $i, j$:
        $\Phi_j(X_{jg}) = E[\Phi_i(X_{ig}) \mid X_{jg}]$
        $\Phi_i(X_{ig}) = E[\Phi_j(X_{jg}) \mid X_{ig}] / \|\Phi_i(X_{ig})\|$
    Compute the mean of $\Phi_i$ for the same replication
while $e^2(\Phi_i, \Phi_j)$ decreases
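The loop above can be sketched in a few lines of Python. This is one possible reading of the slide, not the authors' code: each array's transformation is updated as the average of its conditional expectations against all other arrays and then standardized, the conditional expectation is a plain running mean over the rank order, and the final back-transformation via the joint distribution is omitted. The synthetic distortions, the window size and the function names are assumptions.

```python
import numpy as np

def smooth(target, by, window=101):
    """Running-mean estimate of E[target | by] at each data point (window must be odd)."""
    order = np.argsort(by)                        # rank ordering
    padded = np.pad(target[order], window // 2, mode="edge")
    averaged = np.convolve(padded, np.ones(window) / window, mode="valid")
    result = np.empty_like(averaged)
    result[order] = averaged
    return result

def standardize(v):
    return (v - v.mean()) / v.std()

def pairwise_error(phi):
    """e^2 summed over all pairs i < j, normalized by the mean square of phi_i."""
    n = phi.shape[0]
    return sum(np.mean((phi[i] - phi[j]) ** 2) / np.mean(phi[i] ** 2)
               for i in range(n) for j in range(i + 1, n))

def normalize_arrays(X, max_iter=30):
    """X has shape (n_arrays, n_genes); returns transformed values and the final e^2."""
    n = X.shape[0]
    phi = np.array([standardize(row) for row in X])      # initial transformations
    err = pairwise_error(phi)
    for _ in range(max_iter):
        new_phi = np.empty_like(phi)
        for i in range(n):
            # E[phi_j | X_i] for every other array j, averaged for array i
            estimates = [smooth(phi[j], X[i]) for j in range(n) if j != i]
            new_phi[i] = standardize(np.mean(estimates, axis=0))
        new_err = pairwise_error(new_phi)
        if new_err >= err:                                # stop once e^2 no longer decreases
            break
        phi, err = new_phi, new_err
    return phi, err

# Synthetic replicates: one true expression profile seen through three distortions.
rng = np.random.default_rng(3)
r = rng.uniform(2.0, 12.0, size=2000)
X = np.array([
    r + 0.3 * rng.standard_normal(r.size),                # linear response
    np.exp((r + 0.3 * rng.standard_normal(r.size)) / 4),  # saturating-type distortion
    (r + 0.3 * rng.standard_normal(r.size)) ** 2,         # quadratic distortion
])

err_before = pairwise_error(np.array([standardize(row) for row in X]))
phi, err_after = normalize_arrays(X)
print("e^2 before normalization:", err_before)
print("e^2 after normalization: ", err_after)
```

Because no parametric form is assumed for the distortions, the same loop handles the linear, exponential and quadratic responses alike; only the smoother and its window would need tuning for real chip data.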