Lecture 1: The Structure of Microarray Data

INTRODUCTION TO MICROARRAYS

So, why are we here?
- What are we trying to measure, and how?
- Mechanics: the data files produced and used
- Getting numbers: normalization and quantification

The role of gene expression

Your genome consists of pairs of DNA molecules (chromosomes) held together by complementary nucleotide base pairs (in total, about 3 × 10^9 base pairs). The structure of DNA provides an explanation for heredity: individual strands are copied while complementarity is maintained.

All of your cells contain the same genetic information, but your skin cells are different from liver cells or kidney cells or brain cells. These differences come about because different genes are expressed at high levels in different tissues.

So, how are genes expressed?

The Central Dogma: DNA and Protein

The Central Dogma: RNA

The Central Dogma

DNA makes RNA makes protein.
- DNA: sequence
- mRNA: sequence and abundance
- protein: sequence, abundance and shape

Microarrays measure mRNA expression.

An idealized expression profile

If we could count the number of mRNA molecules from each gene in a single cell at a particular time, we might get this:

[figure: idealized expression profile]

But how do we make these measurements?

How do microarrays work?

The biological principle involved is that sequences of DNA or RNA molecules containing complementary base pairs have a natural tendency to bind together.

...aaaaagctagtcgatgctag...
...tttttcgatcagctacgatc...

If we know the target mRNA sequence, we can build a probe for it using the complementary sequence. The probe location tells us the identity of the gene.

Two variants are common:
- Reverse transcription from mRNA to cDNA
- Photolithographic synthesis of short subsequences (oligos)
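The base-pairing rule on this slide can be checked directly. The following is a minimal sketch in base R; the helper name `complement` is made up for illustration, and it pairs bases position by position as the slide draws them (a probe written 5' to 3' would additionally be reversed).

```r
# Base-by-base complement (a<->t, c<->g), matching the aligned strands on the slide.
# 'complement' is a hypothetical helper, not part of any package.
complement <- function(seq) chartr("acgtACGT", "tgcaTGCA", seq)

target <- "aaaaagctagtcgatgctag"
probe  <- complement(target)
probe   # "tttttcgatcagctacgatc"
```

Applying `complement` twice returns the original sequence, which is a quick sanity check on the mapping.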

[Figure: hybridization picture 1, from Bioconductor, ENAR03 Workshop Slides]

[Figure: hybridization picture 2, from Bioconductor, ENAR03 Workshop Slides]

How do Affymetrix chips work?

In general, the probes are shorter than the genes. This is driven by the manufacturing process.

Critical note: Different probes for the same gene have different binding affinities. These affinities are unknown. Thus, it's difficult to say "gene A beats gene B", as opposed to "there's more gene A here than there was there".

Microarrays produce relative measurements of gene expression.

How do we choose probes?

Given unknown affinities, we can provide some security by using several different probes for the same gene, but the optimal number is not clear. Sequential generations of Affy chips have used 20, 16, and 11 probes.

Some further difficulties:
- some genes are short (overlap)
- genes have an orientation (3' bias)
- the gene may not be what we think it is (DB evolution)
- probes can cross-hybridize to the wrong targets

Perfect Match and Mismatch (Probe-Pairs)

PM: GCTAGTCGATGCTAGCTTACTAGTC
MM: GCTAGTCGATGCAAGCTTACTAGTC

Affymetrix tries to control for cross-hybridization by pairing each probe that should work with one that shouldn't. These are known as the Perfect Match (PM) and Mismatch (MM) probes, and together they constitute probe pairs.
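In the pair shown above, the MM probe differs from the PM probe only at the middle (13th) base, which is replaced by its complement. A small sketch of that construction follows; `make_mismatch` is a hypothetical helper name, and the rule "complement the middle base" is read off the example rather than taken from Affymetrix documentation.

```r
# Build a mismatch probe by complementing the middle base of a PM probe.
# Hypothetical helper; assumes an odd-length probe such as a 25-mer.
make_mismatch <- function(pm) {
  mid <- (nchar(pm) + 1) %/% 2                      # 13 for a 25-mer
  substr(pm, mid, mid) <- chartr("ACGT", "TGCA", substr(pm, mid, mid))
  pm
}

pm <- "GCTAGTCGATGCTAGCTTACTAGTC"
mm <- make_mismatch(pm)
mm   # "GCTAGTCGATGCAAGCTTACTAGTC"
```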

The Ensemble: A Probeset

[Figure from the Affy Microarray Suite 4.0 User Guide]

How do we measure amounts bound?

When we extract mRNA from a sample of cells, we do not measure this mRNA directly. Rather, we make copies, and incorporate molecules of fluorescent dye into the copies. We can measure fluorescence!

Thus, after selecting the desired probes for all genes of interest, these are printed on a solid substrate. Samples containing the target genes are processed, labeled with a fluorescent dye, hybridized to the array, and scanned, producing an image file.

The image is the data.

Ah, but what files do we get?

Affymetrix data file formats

All Affymetrix GeneChips are scanned in an Affymetrix scanner, and the initial quantification of features is performed using Affymetrix software. (The main difference of opinion arises in how to combine the feature quantifications from a probe set.) The software involves numerous files:

EXP: contains basic information about the experiment.
DAT: contains the raw image.
CEL: contains feature quantifications.
CDF: maps between features, probes, probe sets, and genes.
CHP: contains gene expression levels, as assessed by the Affy software.

An Affymetrix GeneChip microarray image

The array shown here has a grid of features. Each feature contains many copies of the same 25-bp probe.

Closeup of a GeneChip image

The pixelated features have been combined with positive controls to spell out the chip type; this helps ensure that the image is correctly oriented. Also note the border lattice of alternating dark and bright QC probes, which makes image alignment and feature detection easier.

GeneChip features

Features are square. Horizontal and vertical alignment with the edges of the image is pretty good. However, feature boundaries can be rather blurry.

GeneChip features (continued)

Each feature on this chip is approximately 7 pixels (20 microns) on a side. In general, Affymetrix features use many fewer pixels than are used for the round spots in the images of other types of microarrays.

Features are not perfectly uniform!

The DAT file size

The DAT file contains a 16-bit intensity image in a proprietary format. The file structure consists of a 512-byte header followed by the raw image data. The image shown above involved a 4733 by 4733 grid of pixels, so at 2 bytes per pixel the total file size is 512 + 2 × 4733 × 4733 = 44,803,090 bytes (45M).

This is BIG. File size is a nontrivial issue with Affy data; the earlier versions of the software could only work with a limited number of chips (say 30).
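The file-size arithmetic follows from the stated layout (a 512-byte header plus one 16-bit, i.e. 2-byte, value per pixel). A one-function sketch, with `dat_file_size` as an illustrative name:

```r
# Size in bytes of a DAT-style file: fixed header plus 2 bytes (16 bits) per pixel.
# 'dat_file_size' is a hypothetical helper for the arithmetic on this slide.
dat_file_size <- function(n_rows, n_cols, header_bytes = 512, bytes_per_pixel = 2) {
  header_bytes + bytes_per_pixel * n_rows * n_cols
}

size <- dat_file_size(4733, 4733)
size         # 44803090 bytes
size / 1e6   # about 45 (megabytes)
```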

DAT to CEL: Quantifying features

Locate a probe pair of interest.

Zoom to a single feature

Start with the pixel region.

Trim the feature

Trim off the outermost boundary.

Get numbers

Record the 75th percentile value of the stuff remaining.

Why trim? Why the 75th percentile, and not the median? Or the mean?
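The trim-then-summarize steps above can be sketched in a few lines of base R. This is only an illustration of the idea, not Affymetrix's implementation: the function name, the one-pixel trim width, and R's default quantile interpolation are all assumptions.

```r
# Quantify one feature: drop the outermost ring of pixels (blurry boundary),
# then report the 75th percentile of what remains.
# Hypothetical helper; assumes the feature is at least 3 x 3 pixels.
quantify_feature <- function(pixels, prob = 0.75) {
  nr <- nrow(pixels)
  nc <- ncol(pixels)
  inner <- pixels[2:(nr - 1), 2:(nc - 1)]   # trim one pixel on every side
  unname(quantile(inner, prob))
}

# A toy 7 x 7 feature holding the values 1..49.
feature <- matrix(1:49, nrow = 7, ncol = 7)
quantify_feature(feature)   # 33
```

Because the boundary ring is discarded, changing the edge pixels leaves the result untouched, which is one answer to "why trim?".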

A shift in focus

The above problem, going from the image to the feature quantification, has been a major part of the discussion for quantification of other types of arrays, in part because we get one spot per gene.

Here, pretty much everybody uses Affy's algorithm. Not so much because it's perfect, as because it's reasonable.

The real challenge here comes from summarizing multiple measurements of the same thing.

Where the Probesets Are: The CDF file

With any set of microarray experiments, one of the major challenges is keeping track of how the feature quantifications map back to information about genes, probes, and probe sets.

There is one CDF file for each type of GeneChip, which contains this information. There have been many different versions of expression chips now, so knowing which type you're dealing with is important.

Probe-Set Locations

Probe set locations change with the chip type. In older chips, the probe pairs were adjacent. In newer chips, they are randomized.

Preprocessing 1

Before we quantify individual probesets across a set of chips, we need to check whether the image data is roughly comparable in intensity. Adding twice as much sample may make the resultant image brighter, but it doesn't tell us anything new about the underlying biology.

Baseline Correction and Normalization

Where is 0? Where is 1? Where is 2?

Bioconductor Packages

Illustrating preprocessing with tools...

I will refer to the following packages from the Bioconductor web site. Use the menu item Packages > Install package(s) from BioConductor... to get them.

reposTools: Repository tools for R
Biobase: Base functions for BioConductor
affy: Methods for Affymetrix oligonucleotide arrays
affydata: Affymetrix data for demonstration purposes
affypdnn: Probe dependent nearest neighbor (PDNN) for the affy package

Affymetrix Data in BioConductor

For working with Affymetrix data, BioConductor includes a specialized kind of exprSet called an AffyBatch. To create an AffyBatch object from the CEL files in the current directory, do the following:

> library(affy)          # load the affy library
> my.data <- ReadAffy()  # read CEL data

You may have to start by telling R to use a different working directory to find the CEL files; the command to do this is setwd.

> setwd("/my/celfiles")  # point to the CEL files

Paths in R are separated by forward slashes (/), not backslashes (\); this is a common source of confusion.

Demonstration data

Note: If you are trying to follow along and have not yet obtained some CEL files, the affydata package includes demonstration data from a dilution experiment. You can load it by typing

> library(affydata)
> data(Dilution)

These commands will create an AffyBatch object called Dilution that you can explore.

Peeking at what's inside

BioConductor will automatically build an object with the correct gene annotations for the kind of array you are using the first time you access the data; this may take a while, since it downloads all the information from the internet. So, don't be surprised if it takes a few minutes to display the response to the command

> Dilution
AffyBatch object
size of arrays=640x640 features (12805 kb)
cdf=HG_U95Av2 (12625 affyids)
number of samples=4
annotation=hgu95av2

Looking at the experimental design

You can see what the experiments are by looking at the phenotype information.

> phenoData(Dilution)
phenoData object with 3 variables and 4 cases
varLabels
  liver: amount of liver RNA hybridized to array
  sn19: amount of central nervous system RNA hybi
  scanner: ID number of scanner used
> pData(Dilution)
[output table: columns liver, sn19, scanner; one row per array (20A, 20B, 10A, 10B)]

A first look at an array

> image(Dilution[,1])

A summary view of four images

> boxplot(Dilution, col=1:4)

The distribution of feature intensities

> hist(Dilution, col=1:4, lty=1)

Background correction

The list of available background correction methods is stored in a variable:

> bgcorrect.methods
[1] "mas"  "none" "rma"  "rma2"

So there are four methods:

none: do nothing
mas: use the algorithm from MAS 5.0
rma: use the algorithm from the current version of RMA
rma2: use the algorithm from an older version of RMA

Background correction in MAS 5.0

MAS 5.0 divides the microarray (more precisely, the CEL file) into 16 regions. In each region, the intensity of the dimmest 2% of features is used to define the background level. Each probe is then adjusted by a weighted average of these 16 values, with the weights depending on the distance to the centroids of the 16 regions.
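The zone-based scheme described above can be sketched as follows. This is a simplified illustration, not the MAS 5.0 code: the function name is made up, the 16 regions are taken to be a 4 × 4 grid, and the weight 1 / (squared distance + smooth) with smooth = 100 is an assumption about the form of the distance weighting.

```r
# Zone-based background estimate on a matrix of feature intensities.
# Hypothetical sketch: 4 x 4 zones, dimmest 2% per zone, inverse-distance weights.
zone_background <- function(img, n_zones = 4, frac = 0.02, smooth = 100) {
  nr <- nrow(img)
  nc <- ncol(img)
  row_zone <- cut(seq_len(nr), n_zones, labels = FALSE)
  col_zone <- cut(seq_len(nc), n_zones, labels = FALSE)
  zones <- expand.grid(rz = seq_len(n_zones), cz = seq_len(n_zones))
  zones$bg <- NA_real_
  zones$cy <- NA_real_
  zones$cx <- NA_real_
  for (k in seq_len(nrow(zones))) {
    rows <- which(row_zone == zones$rz[k])
    cols <- which(col_zone == zones$cz[k])
    vals <- sort(as.vector(img[rows, cols]))
    n_low <- max(1, floor(frac * length(vals)))
    zones$bg[k] <- mean(vals[seq_len(n_low)])   # dimmest 2% define the zone level
    zones$cy[k] <- mean(rows)                   # zone centroid (row, column)
    zones$cx[k] <- mean(cols)
  }
  bg <- matrix(0, nr, nc)
  for (i in seq_len(nr)) {
    for (j in seq_len(nc)) {
      w <- 1 / ((i - zones$cy)^2 + (j - zones$cx)^2 + smooth)
      bg[i, j] <- sum(w * zones$bg) / sum(w)    # weighted average of the 16 levels
    }
  }
  bg
}

# Toy example: a 40 x 40 array whose intensities rise from left to right.
img <- matrix(rep(seq(100, 139), each = 40), nrow = 40)
bg  <- zone_background(img)
range(bg)
```

Since each feature's background is a weighted average of the zone levels, the estimate varies smoothly across the array instead of jumping at zone boundaries.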

Background correction in RMA

RMA takes a very different approach to background correction. First, only PM values are adjusted; the MM values are not changed at all. Second, the distribution of PM intensities is modeled statistically as a sum of
- exponential signal with rate λ (mean 1/λ), and
- normal noise with mean µ and variance σ² (truncated at 0 to avoid negatives).

If we observe a signal X = x at a PM feature, we adjust it to

$$E(s \mid X = x) = a + b\,\frac{\phi(a/b) - \phi((x-a)/b)}{\Phi(a/b) + \Phi((x-a)/b) - 1}$$

where b = σ and a = x − µ − λσ².
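To see how this adjustment behaves, it can be coded directly with dnorm and pnorm. The sketch below assumes the parameters µ, σ and λ are already known (in practice they are estimated from the data), treats λ as the rate of the exponential signal so that a = x − µ − λσ², and uses a made-up function name and made-up parameter values.

```r
# RMA-style background adjustment of observed PM intensities x.
# Hypothetical helper; mu, sigma, lambda are taken as given, not estimated.
rma_bg_adjust <- function(x, mu, sigma, lambda) {
  a <- x - mu - lambda * sigma^2
  b <- sigma
  num <- dnorm(a / b) - dnorm((x - a) / b)
  den <- pnorm(a / b) + pnorm((x - a) / b) - 1
  a + b * num / den
}

x   <- c(90, 150, 1000)
adj <- rma_bg_adjust(x, mu = 100, sigma = 20, lambda = 0.01)
adj
```

Two properties are worth noting: the adjusted values stay positive even when x falls below the noise mean µ, and for bright features the adjustment approaches a plain subtraction of µ + λσ².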

Comparing background methods

> d.mas <- bg.correct(Dilution[,1], "mas")
> d.rma <- bg.correct(Dilution[,1], "rma")
> bg.with.mas <- pm(Dilution[,1]) - pm(d.mas)
> bg.with.rma <- pm(Dilution[,1]) - pm(d.rma)

> summary(bg.with.mas)
 Min.   : [missing]
 1st Qu.: 93.14
 Median : 94.35
 Mean   : [missing]
 3rd Qu.: 95.80
 Max.   : 97.67
> summary(bg.with.rma)
 Min.   : [missing]
 1st Qu.: 113.7
 Median : 114.9
 Mean   : [missing]
 3rd Qu.: 114.9
 Max.   : 114.9

Difference in background estimates

On this array, RMA gives slightly larger background estimates, and gives estimates that are more nearly constant across the array. The overall differences can be displayed in a histogram.

Normalization

OK, having defined where 0 is (addressing location), we now want to address the issue of scale. Are there differences in overall brightness unrelated to the biology of interest?

Some changes are undoubtedly of interest. However, in most microarray experiments, we are comparing samples of a single tissue type (e.g., brain), and in such cases we assume that most genes don't change. Thus, we want the values to mostly match.

What Has Been Tried?

Housekeeping genes: start with a set of genes whose expression you believe shouldn't change, and scale the other expression values accordingly.

Spike-ins: introduce a set of markers whose relative intensities you can control, and use these to calibrate the remaining intensities.

Simple scaling: multiply all of the intensities from one chip so that a summary of the result matches the summary value from another chip or some standard.

Some Difficulties

Housekeeping genes: what are they? Are there any genes whose expression does not change across a wide variety of conditions? Do those conditions include cancer (broad distortion)?

Spike-ins: How do we regulate the amount of the spike-in relative to the amount of the material of interest?

Simple scaling: Does a log scale MA plot look flat? What scaling factor do we use? Median? Total amount of RNA present? Some other quantile?

An Alternative?

Combining spiking and scaling? Use spike-ins covering a very broad dynamic range, and use this to try to define linearity of expression, after which you scale.

Housekeeping: Can this be Inferred?

What is a housekeeping gene?

Typical assumption: something that is vital to the stable functioning of the cell, present at constant levels. In most suggested cases, this level has been high.

Slightly different definition: a gene that retains roughly the same rank in a sorted list of expression values. Choose a subset with invariant ranks and use these to determine our mapping (Schadt et al.).

What is Our Reference?

When we are assessing invariance, we have to define "with respect to what?"

For the most part, this has meant choosing a single chip as a canonical reference. Typically, this chip is taken to be a "middle" chip by some metric, such as the chip having the median median intensity. This is the default in dChip.

The choice of reference can affect the results.

Invariance and Nonlinearity

Using an invariant set effectively applies a whole bunch of scaling factors, with the factor changing depending on the intensity (scaling at high levels is different from scaling at low levels). For the most part, this serves to map the quantiles of one set of intensities over to another.

This is similar to producing an MA-plot, fitting a smooth curve (e.g., loess) to the center of the data, and subtracting the curve off.

Quantiles

A simpler (and faster) method is simply to match quantiles. Assume that the distributions of probe intensities should be completely the same across chips.

- Start with n arrays and p probes, and form a p × n matrix X.
- Sort the columns of X, so that the entries in a given row correspond to a fixed quantile.
- Replace all entries in that row with their mean value.
- Undo the sort.

This procedure, sorting and averaging, is comparatively fast. It also treats all of the chips symmetrically.
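The sort, average, unsort recipe above fits in a few lines of base R. This is a bare-bones sketch with an illustrative function name; it breaks ties by position and is not the implementation used in the affy package.

```r
# Quantile normalization of a p x n matrix X (probes in rows, arrays in columns).
# Hypothetical helper; ties within a column are broken by position.
quantile_normalize <- function(X) {
  ord       <- apply(X, 2, order)    # for each column, where each rank lives
  sorted    <- apply(X, 2, sort)     # each column sorted: row k holds the k-th quantile
  row_means <- rowMeans(sorted)      # mean of each quantile across arrays
  out <- X
  for (j in seq_len(ncol(X))) {
    out[ord[, j], j] <- row_means    # undo the sort
  }
  out
}

X <- cbind(c(5, 2, 3, 4),
           c(4, 1, 3, 2),
           c(3, 4, 6, 8))
N <- quantile_normalize(X)
N
```

After normalization every column contains exactly the same set of values, while each probe keeps its rank within its own array.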

Affy Normalization in BioConductor

The affy package contains several options:

normalize.AffyBatch.constant
normalize.AffyBatch.contrasts
normalize.AffyBatch.invariantset
normalize.AffyBatch.quantiles

Typically, this is coupled with baseline subtraction and with summarization using expresso.

Constant normalization: choosing reference

Find the "middle behavior" chip:

> apply(exprs(affybatch.example), 2, "median")
[output: one median per chip (20A, 20B, 10A, ...)]
> eset1 <- expresso(affybatch.example,
+     bgcorrect.method = "rma",
+     normalize.method = "constant",
+     normalize.param = list(refindex=3),
+     pmcorrect.method = "pmonly",
+     summary.method = "medianpolish")
> eset1.mu <- apply(exprs(eset1), 1, "mean")
> eset1.var <- apply(exprs(eset1), 1, "var")

An overview

In many cases, pairwise MA plots show some curvature before normalization, suggesting that a nonlinear fit might help.

In general, the first thing that I do is check the context of the experiment, to see if I am persuaded that most genes might be stable. Next, I check the pairwise MA plots. If I don't see substantial nonlinearity, then I don't have to use quantiles, so I don't. If I see nonlinearity, I check images of the CEL files to look for artifacts, and if I see nothing I go with quantiles.

Now, on to modeling the summaries...

Quantifying a Dataset

For this, we need a set of data. Fortunately for us, there's lots of Affy data on the web. In this example, we'll be using some data from Todd Golub's lab on leukemia differentiation.

HL60 undiff PMA ATRA CELfiles.tar

21 CEL files, Hu6800 chips (aka HuGeneFL). (CEL files from the web; CDF file from Affymetrix.)

Probeset D11086_at, chip 1

So, how do we summarize this?

In the beginning: MAS 4.0, aka AvDiff

First, shift to PM-MM differences. (Cross-hyb?)

AvDiff Processing 1: Flag extremes

AvDiff Processing 2: Define "ok" Bounds

mean ± 3*sd, computed omitting extremes

AvDiff Processing 3: Average "ok" Diffs

Not quite what we had before!
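The three AvDiff processing steps can be sketched as below. This follows the slide descriptions only: "extremes" is taken to mean the single smallest and single largest difference, the function name is made up, and the probe values are invented for illustration.

```r
# AvDiff-style summary of one probeset on one chip.
# Hypothetical helper; pm and mm are paired intensity vectors.
avdiff <- function(pm, mm) {
  d    <- pm - mm                                   # shift to PM-MM differences
  core <- d[-c(which.min(d), which.max(d))]         # step 1: flag (drop) the extremes
  lo   <- mean(core) - 3 * sd(core)                 # step 2: "ok" bounds, mean +/- 3*sd
  hi   <- mean(core) + 3 * sd(core)
  mean(d[d >= lo & d <= hi])                        # step 3: average the "ok" diffs
}

pm <- c(120, 150, 130, 140, 900, 135, 125, 145)
mm <- rep(100, 8)
avdiff(pm, mm)   # 35
```

In this toy probeset the difference of 800 falls outside the bounds and is dropped, while the smallest difference (20) is flagged in step 1 but readmitted in step 3 because it lies inside the bounds, so the average is taken over a slightly different set than the one used to compute the bounds.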

Comments on AvDiff?

- It does combine measurements across probes, and tries to exploit redundancy.
- It weights all probes equally.
- It works on the PM-MM differences in an additive fashion.
- It can give negative values.
- It can omit interesting probes.
- It works one chip at a time. It does not learn.

Using models: dChip

In 2001, Cheng Li and Wing Wong introduced a new method of summarizing probeset intensities: model-based expression indices, or MBEI (PNAS, v.98, p.31-36).

At the crux of their argument was a very simple observation: the relative expression values of probes within a probeset were very stable across multiple arrays.

Stability: Our First Chip

Stability: Two Chips

Stability: Three Chips

Stability: Ten Chips

So, how to exploit this?

Fit a model: for sample i and probe pair j, they posit that

$$PM_{ij} = \nu_j + \theta_i \alpha_j + \theta_i \phi_j + \epsilon$$

$$MM_{ij} = \nu_j + \theta_i \alpha_j + \epsilon$$

Focusing on the PM-MM differences, this model condenses to one with two sets of unknowns: $\theta_i$ and $\phi_j$.

Fit using several chips at once!

The next step: φ

and back to θ

and after 5 of each

What do the residuals look like?

Note potential outlying probes!
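The back-and-forth fit of θ and φ shown in the preceding slides amounts to alternating least squares on the PM-MM differences, D_ij ≈ θ_i φ_j. The sketch below is an illustration under stated assumptions rather than the dChip code: the function name is made up, φ is rescaled so that the sum of φ_j² equals the number of probe pairs (one common way to make the scale identifiable), and no outlier handling is included.

```r
# Alternating least-squares fit of D[i, j] ~ theta[i] * phi[j].
# D: samples in rows, probe pairs in columns (PM-MM differences).
# Hypothetical helper; no outlier detection or standard errors.
fit_mbei <- function(D, n_iter = 5) {
  J   <- ncol(D)
  phi <- rep(1, J)                                   # start with equal probe affinities
  for (k in seq_len(n_iter)) {
    theta <- as.vector(D %*% phi) / sum(phi^2)       # update theta given phi
    phi   <- as.vector(t(D) %*% theta) / sum(theta^2)  # update phi given theta
    phi   <- phi * sqrt(J / sum(phi^2))              # fix the scale: sum(phi^2) = J
  }
  theta <- as.vector(D %*% phi) / sum(phi^2)
  list(theta = theta, phi = phi, residuals = D - outer(theta, phi))
}

# Toy data that follow the multiplicative model exactly.
theta_true <- c(100, 200, 300)
phi_true   <- c(1, 2, 0.5, 1.5)
D   <- outer(theta_true, phi_true)
fit <- fit_mbei(D)
max(abs(fit$residuals))
```

Large entries in `fit$residuals` on real data would point to individual probes or arrays that do not follow the shared probe profile, which is how outliers are spotted.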

Comments on dChip?

- By using multiple chips, it can keep all of the probes; no tossing of the most informative ones.
- It captures effects that are multiplicative.
- By checking the residuals from the model, it is possible to identify outliers due to artifacts (that replication idea again).
- Using the hypothesized error model, confidence bands for the fold change can be computed.
- Probe profiles can be computed in one experiment and used in another.

The downside(?)s

dChip requires several chips to get a good fit for its model. It is not a good idea to trust the fits too much if they are based on just one or two chips. I'm not convinced that this is altogether a bad thing.

The error model is too simplistic: larger intensity probes will typically also have larger variances.

Why did dChip catch on?

- The model works pretty well.
- The software package was (and remains) easy to acquire, learn and use.
- It incorporates several of the most common tricks that people want to play with array data.
- It could handle large numbers of chips.

This last is in part due to the fact that their file structures are better: a version 3.0 Affy CEL file for a U95Av2 chip takes about 12M to store, but the dChip summary takes about 1.6M.

Is dChip the Best Model?

Probably not. Others out there include

- MAS5.0: $\mathrm{TukeyBiweight}(\log(PM_j - CT_j))$
- RMA: $\log(PM_{ij} - BG) = \mu_i + \alpha_j + \sigma\epsilon_{ij}$
- PDNN
- GCRMA

All of these are implemented in R in BioConductor.
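The additive RMA model above, with a chip effect µ_i and a probe effect α_j on the log scale, is commonly fit by median polish. Base R's `medpolish` (in the stats package) can illustrate this on a toy probeset; the matrix below is invented, already background-corrected and logged, and the variable names are illustrative.

```r
# Median-polish fit of log(PM) = chip effect + probe effect for one probeset.
# Rows are probes (alpha_j), columns are chips (mu_i); values are made up.
alpha  <- c(-1, 0, 1)            # probe affinities on the log scale
mu     <- c(8, 9, 10, 11)        # per-chip expression on the log scale
log_pm <- outer(alpha, mu, "+")  # exactly additive toy data

fit <- medpolish(log_pm, trace.iter = FALSE)

# Expression summary for each chip: overall level plus that chip's column effect.
expr <- fit$overall + fit$col
expr
```

On these exactly additive data the residuals are zero and the chip summaries recover µ; on real data the medians make the fit resistant to a few aberrant probes.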

How do we know which models are better?

Well, after MAS5.0, Affy decided it wasn't going to fight to have the best algorithm; it would let others play that game. Indeed, it could reap the benefits of better algorithms by selling more chips.

To let people test their own models, they posted a test dataset: the Affy Latin Square Experiment. In this experiment, only the levels of 14 spike-ins were varied. Truth is known, and agreement with truth can be tested.

Overview: Two-color Spotted Microarrays

The Images Are The Data

A common feature of all microarray platforms is that the primary data produced by an experiment is in the form of a gray-scale image.

With Affymetrix arrays, the raw image is stored in a DAT file. The precision of the manufacturing process and the alternating border of dark and bright spots make it fairly straightforward to find and quantify features.

Mechanical variation in the robotic production of nylon or glass arrays, however, makes finding and quantifying features a more complicated procedure. Extra complications also arise from the pairing of two samples (red and green) in the same hybridization.

Arrayer robot

Glass microarrays are produced by robotically spotting cDNA or long oligos (60- or 70-mers) on glass microscope slides.

Robotic pins impose a grid substructure

This slide was produced by a robot using 48 pins, laid out in a 4 × 12 rectangle. Each subgrid was produced by a separate pin.

Note: Flaws in a pin (e.g., blunt tip) can cause one subgrid to be systematically different from the others.

Scanners are boring beige boxes...

Fluorescent Emission Properties

Fluorescent dyes (Cy3 = green, wavelength 532 nm; Cy5 = red, wavelength 635 nm) absorb light at a specific excitation wavelength, and emit light at a specific longer wavelength.

Dye effects

The basic difficulty in making sense of glass microarray data is deciding how to combine the information from the two channels.

The two dyes have different chemical properties; they may be incorporated into genes at different rates. They may also fluoresce at different rates. These differences can lead to array-wide differences in intensity and to gene-specific differences related to the DNA sequence of the target genes.

These differences have implications for the downstream analysis, and will therefore affect how we think about designing microarray experiments.

A closer look at a microarray image

This is a fairly old microarray from M.D. Anderson; it doesn't yet have a barcode attached.

A closer look at one subgrid

These arrays were printed with duplicate spots. The top 5 rows are duplicated in the bottom 5 rows. Note the "ring" or "donut" effect that is evident at many spots, especially in Cy5.

[Deegan et al. (1997). Capillary flow as the cause of ring stains from dried liquid drops. Nature, v.389.]

A closer look at one spot

Notes:
(1) The spot is not perfectly uniform; higher edges are evident even in bright spots.
(2) The background is not uniform.
(3) The spot appears to be contained in a box with pixels.

Quantification has three parts

Registration: locating the centers of the spots.
Segmentation: deciding which pixels around the center actually belong to the spot.
Quantification: summarizing the numerical pixel values in the spot (foreground) and around it (background).

Quantification in action

What about Background?

What is it? The pixel intensity in those parts of the image where nothing has been spotted.

How do we count it? Typically by averaging the intensities in some non-spot pixels close to the spot center.

What do we do with it? Subtract it off from all pixels inside the spot(?).

Subtracting background is very important when a fixed mask is used.

What about Background? Which pixels do we use?

(Yang et al., JCGS, 2001)

Putting Channels Together: Log Ratios

Why is this a natural scale? Lots of biological stuff works in terms of fold changes.

Why log? Fold changes of 2 and 1/2 are of equal magnitude, but different sign.

Why ratios? We'll look at that below.

Why Use Log Ratios?: Red Replicates

Why Use Log Ratios?: Green Replicates

Why Use Log Ratios?: Ratio Replicates

Why does this work? Channels

Why does this work? Ratios

Spatial Replicates

Step 1: form a spatial plot of the log ratios for array 1.
Step 2: repeat step 1 for a second array.
Step 3: form an image of the differences, point by point.

If the printing is dense, then we will see big trends if these remain.

Try this out with Affy first.

Chip 1

Chip 2

Chips 1 and 2

Normalization for Glass Arrays

There are two stages here. First, normalization across the two channels of an array. Second, normalization of log ratios across a set of arrays.

In both cases, loess smooths are the most common method employed. Second is probably print-tip loess, allowing distributions to vary by pin (if there are many pins, this is a partial surrogate for spatial effects).
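A within-array loess normalization of the kind mentioned above can be sketched with base R's `loess`. Everything here is illustrative: the helper names, the simulated intensity-dependent dye bias, and the span of 0.4 are assumptions, and real pipelines typically use a robust fit and may fit one curve per print tip.

```r
# M = log ratio, A = average log intensity, from red (R) and green (G) channels.
ma_values <- function(R, G) {
  list(M = log2(R / G), A = 0.5 * log2(R * G))
}

# Fold changes of 2 and 1/2 give log ratios of +1 and -1.
ma_values(c(200, 50), c(100, 100))$M   # 1 -1

# Subtract a loess curve fitted to M as a function of A (hypothetical helper).
normalize_loess <- function(M, A, span = 0.4) {
  fit <- loess(M ~ A, span = span)
  as.numeric(residuals(fit))
}

# Simulated array with an intensity-dependent dye bias plus noise.
set.seed(1)
n <- 500
A <- runif(n, 6, 14)
M <- 0.05 * (A - 10)^2 - 0.3 + rnorm(n, sd = 0.2)
M_norm <- normalize_loess(M, A)
c(before = sd(M), after = sd(M_norm))
```

After the curve is subtracted, the log ratios are centered near zero at every intensity, which is the "most genes don't change" assumption put into practice.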


Seminar Microarray-Datenanalyse Seminar Microarray- Normalization Hans-Ulrich Klein Christian Ruckert Institut für Medizinische Informatik WWU Münster SS 2011 Organisation 1 09.05.11 Normalisierung 2 10.05.11 Bestimmen diff. expr. Gene,

More information

Laurent Gautier 1, 2. August 18 th, 2009

Laurent Gautier 1, 2. August 18 th, 2009 s Processing of Laurent Gautier 1, 2 1 DTU Multi-Assay Core(DMAC) 2 Center for Biological Sequence (CBS) August 18 th, 2009 s 1 s 2 3 4 s 1 s 2 3 4 Photolithography s Courtesy of The Science Creative Quarterly,

More information

Data Preprocessing. Data Preprocessing

Data Preprocessing. Data Preprocessing Data Preprocessing 1 Data Preprocessing Normalization: the process of removing sampleto-sample variations in the measurements not due to differential gene expression. Bringing measurements from the different

More information

Error models and normalization. Wolfgang Huber DKFZ Heidelberg

Error models and normalization. Wolfgang Huber DKFZ Heidelberg Error models and normalization Wolfgang Huber DKFZ Heidelberg Acknowledgements Anja von Heydebreck, Martin Vingron Andreas Buness, Markus Ruschhaupt, Klaus Steiner, Jörg Schneider, Katharina Finis, Anke

More information

Practical Applications and Properties of the Exponentially. Modified Gaussian (EMG) Distribution. A Thesis. Submitted to the Faculty

Practical Applications and Properties of the Exponentially. Modified Gaussian (EMG) Distribution. A Thesis. Submitted to the Faculty Practical Applications and Properties of the Exponentially Modified Gaussian (EMG) Distribution A Thesis Submitted to the Faculty of Drexel University by Scott Haney in partial fulfillment of the requirements

More information

Bi 1x Spring 2014: LacI Titration

Bi 1x Spring 2014: LacI Titration Bi 1x Spring 2014: LacI Titration 1 Overview In this experiment, you will measure the effect of various mutated LacI repressor ribosome binding sites in an E. coli cell by measuring the expression of a

More information

Microarray Preprocessing

Microarray Preprocessing Microarray Preprocessing Normaliza$on Normaliza$on is needed to ensure that differences in intensi$es are indeed due to differen$al expression, and not some prin$ng, hybridiza$on, or scanning ar$fact.

More information

Optimal normalization of DNA-microarray data

Optimal normalization of DNA-microarray data Optimal normalization of DNA-microarray data Daniel Faller 1, HD Dr. J. Timmer 1, Dr. H. U. Voss 1, Prof. Dr. Honerkamp 1 and Dr. U. Hobohm 2 1 Freiburg Center for Data Analysis and Modeling 1 F. Hoffman-La

More information

Session 06 (A): Microarray Basic Data Analysis

Session 06 (A): Microarray Basic Data Analysis 1 SJTU-Bioinformatics Summer School 2017 Session 06 (A): Microarray Basic Data Analysis Maoying,Wu ricket.woo@gmail.com Dept. of Bioinformatics & Biostatistics Shanghai Jiao Tong University Summer, 2017

More information

Keppel, G. & Wickens, T. D. Design and Analysis Chapter 4: Analytical Comparisons Among Treatment Means

Keppel, G. & Wickens, T. D. Design and Analysis Chapter 4: Analytical Comparisons Among Treatment Means Keppel, G. & Wickens, T. D. Design and Analysis Chapter 4: Analytical Comparisons Among Treatment Means 4.1 The Need for Analytical Comparisons...the between-groups sum of squares averages the differences

More information

Expression arrays, normalization, and error models

Expression arrays, normalization, and error models 1 Epression arrays, normalization, and error models There are a number of different array technologies available for measuring mrna transcript levels in cell populations, from spotted cdna arrays to in

More information
