GS Analysis of Microarray Data

Similar documents
GS Analysis of Microarray Data

GS Analysis of Microarray Data

GS Analysis of Microarray Data

SPOTTED cdna MICROARRAYS

GS Analysis of Microarray Data

GS Analysis of Microarray Data

GS Analysis of Microarray Data

Lecture 1: The Structure of Microarray Data

GS Analysis of Microarray Data

GS Analysis of Microarray Data

cdna Microarray Analysis

Use of Agilent Feature Extraction Software (v8.1) QC Report to Evaluate Microarray Performance

Normalization. Example of Replicate Data. Biostatistics Rafael A. Irizarry

Low-Level Analysis of High- Density Oligonucleotide Microarray Data

ncounter PlexSet Data Analysis Guidelines

Expression arrays, normalization, and error models

Wavelet-Based Preprocessing Methods for Mass Spectrometry Data

Design of Microarray Experiments. Xiangqin Cui

GS Analysis of Microarray Data

GS Analysis of Microarray Data

Probe-Level Analysis of Affymetrix GeneChip Microarray Data

Compensation: Basic Principles

Experiment 10. Zeeman Effect. Introduction. Zeeman Effect Physics 244

Biochip informatics-(i)

The Haar Wavelet Transform: Compression and. Reconstruction

CS 4495 Computer Vision Binary images and Morphology

Systematic Uncertainty Max Bean John Jay College of Criminal Justice, Physics Program

Multimedia Databases. Previous Lecture. 4.1 Multiresolution Analysis. 4 Shape-based Features. 4.1 Multiresolution Analysis

Multimedia Databases. Wolf-Tilo Balke Philipp Wille Institut für Informationssysteme Technische Universität Braunschweig

WFC3 IR Blobs, IR Sky Flats and the measured IR background levels

Experiment 9. Emission Spectra. measure the emission spectrum of a source of light using the digital spectrometer.

BigBOSS Data Reduction Software

Developments & Limitations in GSR Analysis

Zeeman Effect Physics 481

GS Analysis of Microarray Data

Probe-Level Analysis of Affymetrix GeneChip Microarray Data

Introduction to the Scanning Tunneling Microscope

Multimedia Databases. 4 Shape-based Features. 4.1 Multiresolution Analysis. 4.1 Multiresolution Analysis. 4.1 Multiresolution Analysis

20 Unsupervised Learning and Principal Components Analysis (PCA)

Data Preprocessing. Data Preprocessing

MT Electron microscopy Scanning electron microscopy and electron probe microanalysis

Clustering & microarray technology

DEVELOP YOUR LAB, YOUR WAY

Please bring the task to your first physics lesson and hand it to the teacher.

Figure 1: Doing work on a block by pushing it across the floor.

GS Analysis of Microarray Data

MODULAR AND CUSTOM EXHIBIT SOLUTIONS

Measuring Colocalization within Fluorescence Microscopy Images

Chemistry 524--Final Exam--Keiderling May 4, :30 -?? pm SES

Error models and normalization. Wolfgang Huber DKFZ Heidelberg

First Derivative Test

Classification and Regression Trees

IncuCyte ZOOM NeuroTrack Fluorescent Processing

Carbon Dating The decay of radioactive nuclei can be used to measure the age of artifacts, fossils, and rocks. The half-life of C 14 is 5730 years.

1 The Preliminary Processing

Algorithm User Guide:

The Hydrogen Atom Student Guide

Tutorial Three: Loops and Conditionals

Class 14-light and lasers

Sampling. Where we re heading: Last time. What is the sample? Next week: Lecture Monday. **Lab Tuesday leaving at 11:00 instead of 1:00** Tomorrow:

Name Final Exam December 7, 2015

Two-sample inference: Continuous data

Practical Applications and Properties of the Exponentially. Modified Gaussian (EMG) Distribution. A Thesis. Submitted to the Faculty

Cover Page. The handle holds various files of this Leiden University dissertation

APh161: Physical Biology of the Cell Homework 1 Due Date: Tuesday, January 27, 2009

Optimal normalization of DNA-microarray data

Relative Photometry with data from the Peter van de Kamp Observatory D. Cohen and E. Jensen (v.1.0 October 19, 2014)


Experimental Design. Experimental design. Outline. Choice of platform Array design. Target samples

Diffraction Limited Size and DOF Estimates

Math 378 Spring 2011 Assignment 4 Solutions

LECTURE 15: SIMPLE LINEAR REGRESSION I

Do NOT rely on this as your only preparation for the Chem 101A final. You will almost certainly get a bad grade if this is all you look at.

CORE MOLIT ACTIVITIES at a glance

Physics Spring 2008 Midterm #1 Solution

Linear models. x = Hθ + w, where w N(0, σ 2 I) and H R n p. The matrix H is called the observation matrix or design matrix 1.

Spectrometers. Materials: Easy Spectrometer. Old CD Razor Index card Cardboard tube at least 10 inches long

= 6 (1/ nm) So what is probability of finding electron tunneled into a barrier 3 ev high?

1. Counting. Chris Piech and Mehran Sahami. Oct 2017

22A-2 SUMMER 2014 LECTURE 5

Introducing GIS analysis

Two-sample inference: Continuous data

Homework : Data Mining SOLUTIONS

Automatic Star-tracker Optimization Framework. Andrew Tennenbaum The State University of New York at Buffalo

Microarray Preprocessing

Cell-free and in vivo characterization of Lux, Las, and Rpa quorum activation systems in E. coli Supporting Information

Linear Models and Empirical Bayes Methods for Microarrays

Slide Set 3. for ENEL 353 Fall Steve Norman, PhD, PEng. Electrical & Computer Engineering Schulich School of Engineering University of Calgary

CPSC 340: Machine Learning and Data Mining. Regularization Fall 2017

Know Your Uncertainty

networks in molecular biology Wolfgang Huber

Electron probe microanalysis - Electron microprobe analysis EPMA (EMPA) What s EPMA all about? What can you learn?

The Distances and Ages of Star Clusters

Name: Unit 3 Guide-Electrons In Atoms

Bioconductor Project Working Papers

Designing Information Devices and Systems I Fall 2018 Lecture Notes Note Introduction to Linear Algebra the EECS Way

Playing with the AnIMLs: Demonstrations of AnIML Generic Viewers. Gary W. Kramer

Designing Information Devices and Systems I Spring 2018 Homework 13

The IRS Flats. Spitzer Science Center

Tree Building Activity

Transcription:

GS01 0163 Analysis of Microarray Data Keith Baggerly and Kevin Coombes Section of Bioinformatics Department of Biostatistics and Applied Mathematics UT M. D. Anderson Cancer Center kabagg@mdanderson.org kcoombes@mdanderson.org 1 November 2005

QUANTIFYING GLASS MICROARRAYS 1 Lecture 17: Quantifying Glass Microarrays Microarray Platforms Overview: Two-color Spotted Microarrays TIFF files: the good and the bad Counting pixels: foreground, shapes, and masks The nothing that is: background Summarizing the spot: log ratios The effects of preprocessing: background subtraction

QUANTIFYING GLASS MICROARRAYS 2 Microarray Platforms Multiple synthesized short oligos (25-mers) on silicon Commerically produced: Affymetrix Single channel fluorescent labeling Between 11 and 20 probes per gene target Spotted cdna on nylon membranes (obsolete) Commerically produced: Research Genetics, Clontech Radioactive labeling, single channel Spotted cdna or long oligos (60- or 70-mers) on glass slides Home-grown or commercial Two-channel: simultaneous co-hybridization of two samples Two-color fluorescent labeling

QUANTIFYING GLASS MICROARRAYS 3 Overview: Nylon cdna Microarrays

QUANTIFYING GLASS MICROARRAYS 4 Nylon cdna Microarrays The raw data is a 16-bit, gray-scale, TIFF image.

QUANTIFYING GLASS MICROARRAYS 5 Nylon cdna Microarrays Array images are often viewed in color by changing the color map; this does not change the actual data.

QUANTIFYING GLASS MICROARRAYS 6 Nylon cdna Microarrays Numerical operations (like this square-root transform) can make certain features more visible. Transformations must only be used for visualization, since they would distort the quantifications.

QUANTIFYING GLASS MICROARRAYS 7 Nylon cdna Microarrays This is a close-up of the same microarray. Note the general blurriness, along with the blotchy artifact along the left edge.

QUANTIFYING GLASS MICROARRAYS 8 Overview: Two-color Spotted Microarrays

QUANTIFYING GLASS MICROARRAYS 9 Two-color Spotted Microarrays

QUANTIFYING GLASS MICROARRAYS 10 Two-color Spotted Microarrays

QUANTIFYING GLASS MICROARRAYS 11 Two-color Spotted Microarrays

QUANTIFYING GLASS MICROARRAYS 12 Two-color Spotted Microarrays

QUANTIFYING GLASS MICROARRAYS 13 The Images Are The Data A common feature of all microarray platforms is that the primary data produced by an experiment is in the form of a gray-scale image. With Affymetrix arrays, the raw image is stored in a DAT file. The precision of the manufacturing process and the alternating border of dark and bright spots makes it fairly straightforward to find and quantify features. Mechanical variation in the robotic production of nylon or glass arrays, however, makes finding and quantifying features a more complicated procedure. Extra complications also arise from the pairing of two samples (red and green) in the same hybridization.

QUANTIFYING GLASS MICROARRAYS 14 Arrayer robot Glass microarrays are produced by robotically spotting cdna or long oligos (60- or 70-mers) on glass microscope slides.

QUANTIFYING GLASS MICROARRAYS 15 Robotic pins impose a grid substructure This slide was produced by a robot using 48 pins, laid out in a 4 12 rectangle. Each 10 10 subgrid was produced by a separate pin. Note: Flaws in a pin (e.g., blunt tip) can cause one subgrid to be systematically different from the others.

QUANTIFYING GLASS MICROARRAYS 16 Trends in the background Systematic differences in grids can create spatially coherent artifacts. These can also be caused by incomplete mixing of the hybridization solution, often revealing itself in the background.

QUANTIFYING GLASS MICROARRAYS 17 Scanners are boring beige boxes...

QUANTIFYING GLASS MICROARRAYS 18 Scanning schematic

QUANTIFYING GLASS MICROARRAYS 19 Fluorescent Emission Properties Fluorescent dyes (Cy3 = green, wavelength 532; Cy5 = red, wavelength 635) absorb light at a specific excitation wavelength, and emit light at a specific larger wavelength.

QUANTIFYING GLASS MICROARRAYS 20 Dye effects The basic difficulty in making sense of glass microarray data is deciding how to combine the information from the two channels. The two dyes have different chemical properties; they may be incorporated into genes at different rates. They may also fluoresce at different rates. These differences can lead to array-wide differences in intensity and to gene-specific differences related to the DNA sequence of the target genes. These differences have implications for the downstream analysis, and will therefore affect how we think about designing microarray experiments.

QUANTIFYING GLASS MICROARRAYS 21 A closer look at a microarray image This is a fairly old microarray from M.D. Anderson; it doesn t yet have a barcode attached.

QUANTIFYING GLASS MICROARRAYS 22 A closer look at one subgrid These arrays were printed with duplicate spots. The top 5 rows are duplicated in the bottom 5 rows. Note the ring or donut effect that is evident at many spots, especially in Cy5. [Deegan et al (1997). Capillary flow as the cause of ring stains from dried liquid drops. Nature, v.389, p.827-9.]

QUANTIFYING GLASS MICROARRAYS 23 A closer look at one spot Notes: (1) The spot is not perfectly uniform; higher edges are evident even in bright spots. (2) The background is not uniform. (3) The spot appears to be contained in a box with 22 22 pixels.

QUANTIFYING GLASS MICROARRAYS 24 A dust speck Automatic processing algorithms to locate the spots have to contend with large dust specks; quantification algorithms can be affected by smaller specks that overlap true spots.

QUANTIFYING GLASS MICROARRAYS 25 A fiber Why do you think the pixel values are uniform along the fiber?

QUANTIFYING GLASS MICROARRAYS 26 Summary: glass microarrays We ve seen a number of potential difficulties with analyzing the images from glass microarrays. Artifacts like dust specks, fibers, or water spots can cause small-scale problems with spot-finding or quantification. Differences in pins can cause systematic biases on the scale of subgrids. Channel differences, either directly related to chemical propoerties of the dyes or differences in laser intensity, can introduce systematic distortions. Insufficient mixing of the hybridization solution can cause large-scale differences in background and signal intensity.

QUANTIFYING GLASS MICROARRAYS 27 The Next Goal: Quantifying a Spot

QUANTIFYING GLASS MICROARRAYS 28 TIFF Files: The Good Almost all cdna microarray images are 16-bit TIFF files. This is a fairly standard file format, and most image viewing software will read TIFF files... However The TIFF standard is highly flexible and can even be customized. In addition to the basic parameters required for all images, you can add your own tags to the files to guide processing.

QUANTIFYING GLASS MICROARRAYS 29 TIFF Files: The Bad Flexibility can be unclear. The defined behavior on encountering an unknown tag is to skip that field entirely. The above can bite you. (More shortly.) Flexibility in terms of placement of tags in the file and possible inclusion of internal compression makes it harder to find (or write) freeware scripts; some of the compression algorithms are not public domain. R does not have a built-in function for reading TIFF files.

QUANTIFYING GLASS MICROARRAYS 30 Dilution: A Cautionary Tale STORM phosphorimagers can be used to produce image files from radiolabeled nylon membranes. The STORM counts are advertised to go to 100,000. A 16-bit register holds counts up to 65,535 (2 16 1). How do they do it? They record the square root of every count! If you don t know this, your assessments of the amount of change will be quite off.

QUANTIFYING GLASS MICROARRAYS 31 A Quantification File A typical entry: Spot labels, AR VOL - Levels x mm2, A - 1 : a - 1, 1585.1254, % Replaced (AR VOL), SD - Levels 0.000, 13869.45 Pos X - mm, Pos Y - mm, Area - mm2 11.278, 21.346, 0.083 Bkgd, sarvol, S/N, Flag 396.710, 1188.416, 33.006, 0 What do these numbers mean?

QUANTIFYING GLASS MICROARRAYS 32 Quantification has three parts Registration: locating the centers of the spots Segmentation: deciding which pixels around the center actually belong to the spot Quantification: summarizing the numerical pixel values in the spot (foreground) and around it (background).

QUANTIFYING GLASS MICROARRAYS 33 Quantification in action

QUANTIFYING GLASS MICROARRAYS 34 Position Pos X - mm : 11.278 specifying the center of a spot in terms of the slide (mm, from lower left) and image (row and column, from upper left).

QUANTIFYING GLASS MICROARRAYS 35 Volume: Summing the pixels in the circle Having identified the center of the spot, we add up all the pixels in a circular region here (the mask ). The size of the mask is dictated by the physical spot size.

QUANTIFYING GLASS MICROARRAYS 36 What do the pixel intensities look like? Not quite a perfect fit, but ok.

QUANTIFYING GLASS MICROARRAYS 37 What is AR Vol? AR Vol = Artifact Removed Volume. As most artifacts are of high intensity, we omit those whose intensity is very far above the central (median) value that is seen. This presumes that the spots are (a) even, and (b) of roughly equal size.

QUANTIFYING GLASS MICROARRAYS 38 Does AR Vol Work? Not always...

QUANTIFYING GLASS MICROARRAYS 39 Does AR Vol Work? Not always...

QUANTIFYING GLASS MICROARRAYS 40 Does AR Vol Work? But sometimes!

QUANTIFYING GLASS MICROARRAYS 41 Does AR Vol Work? But sometimes!

QUANTIFYING GLASS MICROARRAYS 42 Are there other ways of dealing with bugs? The simplest and best is to use replicates, either within the array itself, or across multiple arrays. Replicates let us check that the differences are large relative to the scale of Noise.

QUANTIFYING GLASS MICROARRAYS 43 What other metrics could we use? mean? median (or some other percentile)? why a circle? There are downsides to a fixed shape. We may miss some parts of the spot, and include others that aren t there. (segmentation)

QUANTIFYING GLASS MICROARRAYS 44 What about Background? What is it? The pixel intensity in those parts of the image where nothing has been spotted. How do we count it? Typically by averaging the intensities in some non-spot pixels close to the spot center. What do we do with it? Subtract it off from all pixels inside the spot(?). Subtracting background is very important when a fixed mask is used.

QUANTIFYING GLASS MICROARRAYS 45 What about Background? Which pixels do we use? (Yang et al, JCGS, 2001)

QUANTIFYING GLASS MICROARRAYS 46 Putting Channels Together: Log Ratios Why is this a natural scale? Lots of biological stuff works in terms of fold changes. Why log? Fold changes of 2 and 1/2 are of equal magnitude, but different sign Why ratios? We ll look at that below.

QUANTIFYING GLASS MICROARRAYS 47 Checking the Data: Replicate Spots Log scale; do things line up?

QUANTIFYING GLASS MICROARRAYS 48 A Better View: M-A Plots rotate things by 45 degrees so that they re easier to see.

QUANTIFYING GLASS MICROARRAYS 49 Subtracting Background, with Replicates Negative values, thresholding, and variability

QUANTIFYING GLASS MICROARRAYS 50 Flagged Spot 1

QUANTIFYING GLASS MICROARRAYS 51 Flagged Spot 2

QUANTIFYING GLASS MICROARRAYS 52 Flagged Spot 3

QUANTIFYING GLASS MICROARRAYS 53 Why Use Log Ratios?: Red Replicates

QUANTIFYING GLASS MICROARRAYS 54 Why Use Log Ratios?: Green Replicates

QUANTIFYING GLASS MICROARRAYS 55 Why Use Log Ratios?: Ratio Replicates

QUANTIFYING GLASS MICROARRAYS 56 Why does this work? Channels

QUANTIFYING GLASS MICROARRAYS 57 Why does this work? Ratios