
Linear models and Limma
Københavns Universitet, 19 August 2009
Mark D. Robinson
Bioinformatics, Walter+Eliza Hall Institute
Epigenetics Laboratory, Garvan Institute
(with many slides taken from Gordon Smyth)

Limma = linear models for microarray data
o Morning Theory
- Introduction
- Background correction
- Moderated t-tests
- Simple linear models
o Morning Practical
- Demonstration of smoothing
- Limma objects (beta7)
- Background correction and normalization (beta7)
- Simple experimental designs
- 2-colour example (beta7)
- Affymetrix example (cancer)
o Afternoon Theory
- More advanced designs / linear modeling
- Moderated F-tests
- Gene set tests
- Other analyses limma can do
o Afternoon Practical
- Factorial design (estrogen)
- Gene set testing (cancer)
- Time course experiment (SAHA/depsipeptide)

Expression measures
The measure $y_{ga}$ is indexed by probe or gene $g$ and array $a$.
o Two-colour: $y_{ga} = \log_2(R/G)$
o Affymetrix: $y_{ga}$ = log-intensity (summarized over probes)
o Illumina: $y_{ga}$ = log-intensity (summarized over beads)

Questions of Interest
o What genes have changed in expression? (e.g. between disease/normal, affected by treatment)
- Gene discovery, differential expression
o Is a specified group of genes all up-regulated in a particular condition?
- Gene set differential expression
o Can the expression profile predict outcome?
- Class prediction, classification
o Are there tumour sub-types not previously identified? Do my genes group into previously undiscovered pathways?
- Class discovery, clustering
Today will cover the first two questions: differential expression.


Microarray Differential Expression Studies
o $10^3$ to $10^6$ genes/probes/exons on a chip or glass slide
o Inputs to limma: log-intensities (for 1-colour data) or log(R/G) log-ratios (for 2-colour data)
o Several steps to go from raw data to a table of expression values: background correction, normalization
o Idea: fit a linear model to the expression data for each gene

Two colour microarrays

Two-colour data: Log-Intensities
For each probe, with foreground (f) and background (b) intensities in each channel:
$R = \log_2(R_f - R_b)$
$G = \log_2(G_f - G_b)$
Various ways to calculate background. Will often modify to ensure $R_f - R_b > 0$ and $G_f - G_b > 0$.

Two-colour data: Means and Differences
For each probe:
$M = R - G$ (minus)
$A = (R + G)/2$ (add)
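The log-intensity and M/A definitions above can be sketched in a few lines. This is a plain Python illustration, not limma code; the function name `ma_values` and the example intensities are made up for the illustration.

```python
import math


def ma_values(rf, rb, gf, gb):
    """Return (M, A) for one spot from foreground/background intensities.

    Hypothetical helper: rf/rb are red foreground/background,
    gf/gb are green foreground/background.
    """
    r_corrected = rf - rb
    g_corrected = gf - gb
    # The log is undefined unless both corrected intensities are positive,
    # which is why background methods are often modified to guarantee this.
    if r_corrected <= 0 or g_corrected <= 0:
        raise ValueError("background-corrected intensities must be positive")
    r = math.log2(r_corrected)
    g = math.log2(g_corrected)
    return r - g, (r + g) / 2


# Made-up spot: red 900 over background 100, green 200 over background 100.
m, a = ma_values(rf=900, rb=100, gf=200, gb=100)
print(m, a)
```

With these made-up numbers the corrected ratio is 800/100, so M is 3 (an 8-fold difference) and A is the average of the two log-intensities.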

Data Summaries
[Figure: MA plot. For each gene, the pairs $(G_1, R_1), \ldots, (G_4, R_4)$ are converted to $(M_1, A_1), \ldots, (M_4, A_4)$.]

Log-Ratios or Single Channel Intensities?
o Traditional analysis treats log-ratios $M = \log(R/G)$ as the primary data, i.e., gene expression measurements are relative
o Alternative approach treats individual channel intensities R and G as primary data, i.e., gene expression measures are absolute (Wolfinger, Churchill, Kerr)
o Single channel approach makes new analyses possible but
- makes stronger assumptions
- requires more complex models (mixed models in place of ordinary linear models) to accommodate correlation between R and G on same spot

BG correction affects DE results
o Importance of careful pre-processing and quality control cannot be over-emphasized for microarray data
o Can have dramatic effect on differential expression results
o Consider here the normexp method of adaptive background correction
- background correction step of the RMA algorithm
- can also be applied to two-colour data

Additive + multiplicative error model
Observe intensity for one probe on one array:
Intensity = background + signal
$I = B + S$
o Background: additive errors
o Signal: multiplicative errors
This idea underlies the variance stabilizing transformations vsn (two-colour data) and vst (for Illumina data).

normexp convolution model
Intensity = Background + Signal, with
$B \sim N(\mu, \sigma^2)$
$S \sim \text{Exponential}(\alpha)$

Conditional expectation under normexp model

normexp background correction
o Estimate the three parameters
o Replace $I$ with $E(S \mid I)$
o For Affymetrix data, $I$ is the Perfect Match data intensity
o For two-colour data, $I = R_f - R_b$ or $I = G_f - G_b$
o In the RMA algorithm, parameter estimation uses an ad hoc density kernel method
o In limma (two-colour), parameter estimation maximises the saddlepoint approximation to the likelihood
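The conditional expectation $E(S \mid I)$ used in the "Replace" step can be illustrated numerically. The sketch below assumes that $\alpha$ is the mean of the exponential signal (an assumption about parametrisation, not stated on the slide) and uses the mean of a normal distribution truncated at zero, which is what the convolution model implies when the signal is only constrained to be positive. The function name and parameter values are hypothetical, and this is not limma's or RMA's code.

```python
import math


def normexp_signal(x, mu, sigma, alpha):
    """Sketch of E(S | I = x) for I = B + S, B ~ N(mu, sigma^2), S ~ Exp(mean alpha).

    Assumes alpha is the exponential MEAN. Given I = x, S is normal with
    mean mu_sf and sd sigma, truncated to S > 0.
    """
    mu_sf = x - mu - sigma ** 2 / alpha
    z = mu_sf / sigma
    density = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    # Standard normal CDF via erfc; this underflows to 0 for extremely
    # negative z, so the sketch is only meant for moderate inputs.
    cumulative = 0.5 * math.erfc(-z / math.sqrt(2))
    return mu_sf + sigma * density / cumulative


# Made-up parameters: background mean 100, sd 20, mean signal 200.
below_background = normexp_signal(60, mu=100, sigma=20, alpha=200)
bright_spot = normexp_signal(5000, mu=100, sigma=20, alpha=200)
print(below_background, bright_spot)
```

The first call has an observed intensity below the estimated background mean, yet the corrected value stays positive; for the bright spot the correction is close to simple subtraction.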

[Figure: PM data on $\log_2$ scale, raw and fitted model]

Background corrected intensity = E(Signal | Observed Intensity)
Adaptive background correction produces positive signal.
[Figure: E(Signal | Intensity) plotted against Observed Intensity]

Offsets to stabilise the variance
[Figure: background correction and the resulting log-ratios]

Why do offsets stabilize the variance?
o Offset reduces variability at low intensities

Why do offsets stabilize the variance?
o $\log_2(800/100) = \log_2(8/1) = 3$ (8-fold)
o Additive noise affects small numbers more
o Offsets introduce bias:
- $\log_2[(80+10)/(10+10)] = 2.17$
o But the tradeoff (drop in variance for increase in bias) is usually worth it

[Figure: a self-self experiment, two background methods (some spots not plotted)]

First statistical point: choose background correction to stabilise the variance as a function of intensity.

Comparison of 2-colour BG correction methods
[Figure: false discoveries (limma) against genes selected, from Ritchie et al.]

References
o Silver et al. (2009). Microarray background correction: maximum likelihood estimation for the normal-exponential convolution. Biostatistics. [Complete mathematical development of the saddlepoint approximation.]
o Ritchie et al. (2007). A comparison of background correction methods for two-colour microarrays. Bioinformatics. [Shows normexp performs best for 2-colour data.]
o Irizarry et al. (2003). Exploration, normalization and summaries of high density oligonucleotide array probe level data. Biostatistics. [Describes RMA BG correction, but doesn't give much detail of the normexp convolution model.]
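Returning to the offset arithmetic on the "Why do offsets stabilize the variance?" slide, a small sketch makes the bias/variance tradeoff concrete. The helper names and the noise value of 2 intensity units are made up for illustration.

```python
import math


def log_ratio(red, green, offset=0.0):
    """log2 ratio after adding the same offset to both channels."""
    return math.log2((red + offset) / (green + offset))


def spread(red, green, noise, offset):
    """Change in the log-ratio when `noise` is added to the green channel."""
    return log_ratio(red, green, offset) - log_ratio(red, green + noise, offset)


# The slide's bias example: a true 8-fold ratio shrinks once an offset is added.
slide_example = log_ratio(80, 10, offset=10)

# The same additive noise (2 units, hypothetical) at high and low intensity.
spread_high = spread(800, 100, noise=2, offset=0)
spread_low = spread(8, 1, noise=2, offset=0)
spread_low_offset = spread(8, 1, noise=2, offset=10)
print(round(slide_example, 2), spread_high, spread_low, spread_low_offset)
```

The noise barely moves the log-ratio at 800/100, moves it a great deal at 8/1, and moves it much less at 8/1 once an offset of 10 is applied, at the price of a log-ratio biased towards zero.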


Normalization

Two-colour Normalization
o BG correction and normalization are closely connected
o Even after BG correction, some effects remain

One-colour Normalization
o Similarly for single channel data, adjustments need to be made for all samples to be comparable

Moderated t-tests

Borrowing information across genes
o Small data sets: few arrays, inference for individual genes is uncertain
o Curse of dimensionality: many tests, need to adjust for multiple testing, loss of power
o Benefit of parallelism: same model is fitted for every gene. Can borrow information from one gene to another

Hard and soft shrinkage
o Hard: simplest way to borrow information is to assume that one or more parameters are constant across genes
o Soft: smooth genewise parameters towards a common value in a graduated way, e.g., Bayes, empirical Bayes, Stein shrinkage

A very common experiment
o Wild-type mouse x 2, Mutant mouse x 2
o Affymetrix arrays, 25,000 probe-sets, $n_1 = n_2 = 2$
o Which genes are differentially expressed?
o Residual df = 2
o Ordinary t-tests give very high false discovery rates

Another very common experiment
o Wild-type mouse 1 vs Mutant mouse 1, Wild-type mouse 2 vs Mutant mouse 2
o n = 2 two-colour arrays, 30,000 probes
o Which genes are differentially expressed?
o Residual df = 1
o Ordinary t-tests give very high false discovery rates

Small sample size, many tests
The problem:
o These experiments would be under-powered even with just one gene
o Yet we want to test differential expression for each of 50k genes, hence lots of multiple testing and further loss of power
The solution:
o The same statistical model is being fitted for every gene in parallel. Can borrow strength from other genes.

t-tests with common variance
o Use $t_{g,\text{pooled}}$, with the residual standard deviation pooled across genes
o More stable, but ignores gene-specific variability

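A better compromise
Shrink standard deviations towards a common value.

Shrinkage of standard deviations
[Diagram: the genewise standard deviations $s_1, s_2, \ldots, s_G$ are shrunk towards the common value $s_0$ to give $\tilde s_1, \tilde s_2, \ldots, \tilde s_G$]

Moderated t-statistics
[Diagram: the moderated statistic $\tilde t_g$ lies between $t_{g,\text{pooled}}$ and the ordinary $t_g$; $d$ and $d_0$ are degrees of freedom]
The data decides whether $\tilde t_g$ should be closer to $t_{g,\text{pooled}}$ or to $t_g$.

Why does it work?
o We learn what is the typical variability level by looking at all genes, but allow some flexibility from this for individual genes
o Adaptive

Hierarchical model for variances
[Slide shows the data distribution, the prior and the posterior for the variances]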

Moderated t-statistics
o Posterior statistics: posterior variance estimators

Exact distribution for moderated t
An unexpected piece of mathematics shows that, under the null hypothesis,
$\tilde t_g \sim t_{d_0 + d_g}$
The degrees of freedom add! The Bayes prior in effect adds $d_0$ extra arrays for estimating the variance.
(Baldi & Long 2001, Wright & Simon 2003, Smyth)

Hierarchical model for means
[Slide shows the data distribution and the prior for the means]
(Lönnstedt and Speed 2002, Smyth)

More on empirical Bayes statistics
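The shrinkage behind the moderated t can be sketched numerically. The slide's own formula for the posterior variance is not legible here, so the sketch assumes the usual weighted-average form from the empirical Bayes literature, $\tilde s_g^2 = (d_0 s_0^2 + d_g s_g^2)/(d_0 + d_g)$; the function name and all numbers are hypothetical.

```python
import math


def moderated_t(beta_hat, s2, d, v, s0_2, d0):
    """Return (ordinary t, moderated t, moderated df) for one gene.

    Assumed form: the posterior variance is a degrees-of-freedom weighted
    average of the prior value s0_2 and the gene's own variance s2.
    v is the unscaled variance of beta_hat (e.g. 1/n1 + 1/n2).
    """
    s2_post = (d0 * s0_2 + d * s2) / (d0 + d)
    t_ordinary = beta_hat / math.sqrt(s2 * v)
    t_moderated = beta_hat / math.sqrt(s2_post * v)
    return t_ordinary, t_moderated, d0 + d


# A gene with a tiny sample variance (made-up values): 2 vs 2 arrays, df = 2.
t_ord, t_mod, df = moderated_t(beta_hat=0.2, s2=0.0001, d=2, v=1.0,
                               s0_2=0.05, d0=4)
print(t_ord, t_mod, df)
```

With these made-up values the ordinary t is inflated by an accidentally small variance, while the moderated t is pulled back towards a plausible size and gains $d_0$ degrees of freedom.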

Posterior Odds
o Posterior odds of differential expression
o Monotonic function of $\tilde t_g$
o Hence $\tilde t$ gives the best possible ranking of genes
(Lönnstedt and Speed 2002, Smyth)

Estimating Hyper-Parameters
Closed form estimators with good properties are available:
o for $s_0$ and $d_0$ in terms of the first two moments of $\log s^2$
o for $c_0$ in terms of quantiles of the $\tilde t_g$

Marginal Distributions
Under the usual likelihood model, $s_g$ is independent of the estimated coefficients. Under the hierarchical model, $s_g$ is independent of the moderated t-statistics instead, where
$\tilde t_g \sim \begin{cases} t_{d_0 + d_g} & \text{with prob } 1 - p \\ \sqrt{1 + c_0/c_g}\; t_{d_0 + d_g} & \text{with prob } p \end{cases}$

Moment estimators for $s_0$ and $d_0$
Marginal moments of $\log s^2$ lead to estimators of $s_0$ and $d_0$: estimate $d_0$ by solving the moment equation, then finally obtain $s_0$.

Quantile Estimation of $c_0$
Let $r$ be the rank of $\tilde t_g$ in descending order, and let $F(\cdot\,;\cdot)$ be the distribution function of the t-distribution. Can estimate $c_0$ by equating empirical to theoretical quantiles. Get an overall estimator of $c_0$ by averaging the individual estimators from the top $p/2$ proportion of the $\tilde t_g$.

Short note on multiple testing

Multiple testing and adjusted p-values
o Traditional method in statistics is to control the family-wise error rate, e.g., by Bonferroni.
o Holm's method is an improved (step-down) modification of Bonferroni.
o Controlling the false discovery rate (FDR) is more appropriate in microarray studies.
o Benjamini and Hochberg method controls expected FDR for independent or weakly dependent test statistics. Simulation studies support use for microarray data.
o All methods can be implemented in terms of adjusted p-values.

End of morning theory - Summary
o Background correction, normalization are important considerations: normexp, offsets
o Moderation will generally help: moderation of variances is very effective
o Convenient model gives known null distribution
o Multiple testing
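The remark that all methods can be implemented as adjusted p-values can be illustrated for Benjamini and Hochberg. This is a minimal stdlib sketch with a hypothetical function name and made-up p-values, not the routine used in the practicals.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for three genes.
adjusted = bh_adjust([0.5, 0.001, 0.01])
print(adjusted)
```

Calling a gene significant when its adjusted p-value is below 0.05 then corresponds to controlling the expected FDR at 5% under the conditions stated on the slide.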

More complex experiments

Linear models
o More complex microarray experiments can be represented by linear models
o For one-channel platforms, the linear model can be set up using the usual univariate linear model formulae
o For two-colour platforms, the linear models have some special properties

Linear Models
o In general, need to specify:
- Dependent variable
- Explanatory variables (experimental design, covariates, etc.)
o More generally: $E(y) = X\beta$, where $y$ is the vector of observed data, $X$ is the design matrix and $\beta$ is the vector of parameters to estimate

Linear Models for microarrays
o Analyse all arrays together, combining information in an optimal way
o Combined estimation of precision
o Extensible to arbitrarily complicated experiments
o Design matrix: specifies RNA targets used on arrays
o Contrast matrix: specifies which comparisons are of interest

Designs → Linear Models

Two-colour, direct comparison of A and B with a dye swap:
$y = \log_2(R/G)$, $\beta = B - A$
$E\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} 1 \\ -1 \end{pmatrix} \beta = \begin{pmatrix} \beta \\ -\beta \end{pmatrix}$

One-colour, Wild-type mouse x 2 and Mutant mouse x 2:
$\beta_1$ = wt log-expression, $\beta_2$ = mutant − wt
$E[y_1] = E[y_2] = \beta_1$
$E[y_3] = E[y_4] = \beta_1 + \beta_2$
Design matrices for 1-colour arrays are easier to specify!

Two-colour, A and B each compared with a reference:
$\beta_1 = A - \text{Ref}$, $\beta_2 = B - A$
$E\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ -1 & 0 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} = \begin{pmatrix} \beta_1 \\ -\beta_1 \\ \beta_1 + \beta_2 \end{pmatrix}$

Two-colour, loop of A, B and C:
$\beta_1 = B - A$, $\beta_2 = C - A$
$E\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ -1 & 1 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} = \begin{pmatrix} \beta_1 \\ -\beta_1 + \beta_2 \\ -\beta_2 \end{pmatrix}$

Linear Model Estimates
Obtain a linear model for each gene g. Estimate models to get:
o coefficients
o standard deviations
o standard errors
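The matrix multiplications above can be checked with a short sketch. It uses the loop design for A, B and C, multiplies the design matrix by made-up coefficients, and then recovers them by ordinary least squares through the normal equations. The helper names and values are hypothetical, and real analyses would of course fit every gene with limma rather than by hand.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(x * b for x, b in zip(row, vector)) for row in matrix]


def least_squares_2(design, y):
    """Ordinary least squares for a two-column design via the normal equations."""
    a11 = sum(r[0] * r[0] for r in design)
    a12 = sum(r[0] * r[1] for r in design)
    a22 = sum(r[1] * r[1] for r in design)
    b1 = sum(r[0] * v for r, v in zip(design, y))
    b2 = sum(r[1] * v for r, v in zip(design, y))
    det = a11 * a22 - a12 * a12
    return [(a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det]


# Loop design: arrays measure B-A, C-B and A-C.
loop_design = [[1, 0], [-1, 1], [0, -1]]
beta = [1.0, 0.5]  # made-up values: beta1 = B - A, beta2 = C - A

expected_y = matvec(loop_design, beta)
recovered = least_squares_2(loop_design, expected_y)
print(expected_y, recovered)
```

The expected log-ratios follow the pattern $(\beta_1, -\beta_1 + \beta_2, -\beta_2)$, and fitting noise-free data returns the coefficients that generated it.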


Contrasts
A contrast is any linear combination of the coefficients $\beta_j$ which we want to test equal to zero. Define the contrasts as $C^T\beta$, where $C$ is the contrast matrix. Want to test $H_0: C^T\beta = 0$ vs $H_a: C^T\beta \neq 0$.

Pax5: example of saturated design
o Four RNA sources: Wt, Pax5-/-, IL-7 removed, Rag1-/-
o Robust design can tolerate failure of some of the arrays

Regression Analysis
o Choose 3 comparisons between the 4 RNA sources to be the coefficients of the linear model, e.g.,
- PW: Pax5-/- vs Wt
- RW: Rag1-/- vs Wt
- IW: IL-7 withdrawn vs Wt
o For each gene, fit a linear model with a coefficient for each contrast
o Any other comparisons of interest can be extracted from the linear model as contrasts

Exercise: Fill in the design matrix.
$E(m_1, \ldots, m_{12})^T = X\,(PW, RW, IW)^T$, where the rows of $X$ (columns PW, RW, IW) are:
m1: (to fill in)
m2: 1 0 0
m3: (to fill in)
m4: 0 0 1
m5: (to fill in)
m6: -1 0 1
m7: 1 -1 0
m8: (to fill in)
m9: 0 -1 0
m10: 0 1 0
m11: 0 -1 1
m12: 0 1 -1
Can be fitted using robust regression, but there are problems with se's, as Patty Solomon has observed.
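Extracting "any other comparison" as a contrast amounts to taking a linear combination of the fitted coefficients. The sketch below uses the PW, RW and IW coefficients from the Pax5 example with made-up estimates; the function name is hypothetical, and standard errors (which need the coefficient covariance) are left out.

```python
def contrast(coefficients, weights):
    """Linear combination of named coefficients; missing weights count as zero."""
    return sum(weight * coefficients[name] for name, weight in weights.items())


# Made-up coefficient estimates for one gene, each relative to Wt.
coef = {"PW": 1.5, "RW": 0.5, "IW": -0.2}

# Pax5-/- vs Rag1-/- is PW - RW; IL-7 withdrawn vs Pax5-/- is IW - PW.
pax5_vs_rag1 = contrast(coef, {"PW": 1, "RW": -1})
il7_vs_pax5 = contrast(coef, {"IW": 1, "PW": -1})
print(pax5_vs_rag1, il7_vs_pax5)
```

Because every coefficient is measured against Wt, the Wt level cancels in each difference, which is why three coefficients are enough to recover all comparisons between the four RNA sources.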

Example of factorial design
Expected log-expression in each group:
WT.P1: $\mu$
WT.P11: $\mu + a_1$
WT.P21: $\mu + a_1 + a_2$
MT.P1: $\mu + b$
MT.P11: $\mu + a_1 + b + a_1.b$
MT.P21: $\mu + (a_1 + a_2) + b + (a_1 + a_2).b$
The observations $y_1, \ldots, y_7$ are related to the coefficients $a_1, a_2, b, a_1.b, a_2.b$ through the design matrix.

Moderated F-statistics

Moderated F-statistic
o The idea of shrinking the variance extends immediately to multiple contrasts
o MST = Mean Sum of squares between Treatments
(Wright & Simon 2003, Smyth)

Doubly shrunk F statistic
o The moderated F is not a monotonic function of the posterior odds
o A doubly shrunk F statistic can be shown to have the desired relationship to the posterior odds
o Improves further the gene ranking
(Tai and Speed 2006)

Single or double shrinkage
o Shrinking the variances only is enough when comparing two groups
o When comparing 3 or more groups, further gain can be had by shrinking the $\beta$ also (recall Stein estimator needs at least 3 means)

Functional category analysis
o Used on a set of genes deemed to be differentially expressed
o Asks the question: is my set of genes enriched for a particular molecular function?
o Useful for establishing what pathways / types of genes are affected
o Nowadays largely superseded by gene set tests

Overlap statistics
o Question: Say you have a set of 85 genes (out of the total number of genes) known to be associated with function X. Calculate the probability of randomly selecting 40 or more of those genes in a list of 100 DE genes.

o Answer: Hypergeometric (i.e. the urn problem).
[Urn diagram labels: N = 19915 white, m = 85 black, n = 100, k]

Gene set tests

Gene sets
o Test significance of an (a priori specified) group of genes
o The genes might belong to a known pathway or might be the top genes from a related experiment
o The set might be significant even if individual genes are not
o Gene set enrichment analysis (GSEA) originated by Mootha et al PNAS 2003 and Subramanian et al PNAS

Available gene set methods
o GSEA: gene set enrichment analysis. Complex method using Kolmogorov-Smirnov type tests and sample permutation. Needs two groups, many arrays, many genes and many sets.
o GSA: gene set analysis. Uses combination of permutation of samples and standardization across genes. More powerful. Still needs two groups, many genes and many sets.
o GST: gene set tests using Wilcoxon test. Use randomization over genes. Applicable to linear models and small samples, but can be over-optimistic if the genes in the set are highly correlated.
o Now, rotation-based gene set tests
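The hypergeometric answer to the overlap question can be computed exactly with integer binomial coefficients. The sketch assumes the total is 19915 + 85 = 20000 genes, read off the urn labels; the function name is hypothetical.

```python
from math import comb


def overlap_tail(total, in_set, drawn, at_least):
    """P(at least `at_least` set genes among `drawn` genes chosen at random).

    Hypergeometric upper tail: `in_set` genes of interest among `total`.
    """
    upper = min(in_set, drawn)
    favourable = sum(
        comb(in_set, k) * comb(total - in_set, drawn - k)
        for k in range(at_least, upper + 1)
    )
    return favourable / comb(total, drawn)


# Slide question, assuming 20000 genes in total (19915 white + 85 black).
p_slide = overlap_tail(total=20000, in_set=85, drawn=100, at_least=40)

# Small sanity check that is easy to verify by hand: 24/45.
p_small = overlap_tail(total=10, in_set=3, drawn=2, at_least=1)
print(p_slide, p_small)
```

Only about 0.4 set genes would be expected in a random list of 100 under this assumption, so an overlap of 40 or more is astronomically unlikely by chance.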

Gene set tests

Viewing gene sets
o A priori subset of genes $X_1, X_2, X_3, \ldots, X_n$ (e.g. cell adhesion genes, genes regulated by MYB)
o All microarray probes, ranked by a test statistic of interest: $t_1, t_2, t_3, t_4, \ldots$
o Look for ranks for set genes amongst test statistics

What's the hypothesis?
o Two major types of gene set tests: competitive or self-contained
o Competitive: Genes in the set tend to be more strongly DE than randomly chosen genes
o Self-contained: At least some genes in the set are truly DE
(Goeman & Bühlmann, Bioinformatics)

Permutation
o Competitive gene set tests are usually tested by permuting genes, but this ignores inter-gene correlations
o Self-contained gene set tests are usually tested by permuting arrays, but this is limited to two-group comparisons with large numbers

Rotation gene set tests (ROAST)
o Self-contained hypothesis
o The first test suitable for small samples which correctly accounts for inter-gene correlation
o Can handle complex linear model designs, including array weights, random effects etc

Two steps: projection and rotation
o Project data onto space orthogonal to nuisance parameters in the linear model
o Random rotation of the orthogonal residuals provides fractional permutation, avoids granularity of p-values
o Assumes multivariate normality, but proves to be highly robust against deviations from normality

Set summary statistics
o Compute empirical Bayes t-statistics for each gene in set
o Convert to z-statistics
o Mean of $z^2$: good power, even when only a subset of genes respond; not robust against non-normality
o Mean50: mean of top half of $z$: good power; robust

References
o Subramanian et al (2005). A knowledge-based approach for interpreting genome-wide expression profiles. PNAS 102.
o Efron and Tibshirani (2007). On testing the significance of sets of genes. Ann Appl Stat 1.

Hard shrinking: examples
o Common correlation model for within-array duplicate spots
- Smyth et al (2005). The use of within-array replicate spots for assessing differential expression in microarray experiments. Bioinformatics 21.
o Common correlation models for single channel analysis of two-color microarray data
- Smyth, G. K. (2005). Individual channel analysis of two-colour microarray data. 55th Session of the International Statistics Institute, 5-12 April 2005, Sydney, Australia.

Multi-level models I: duplicate spots

Common correlation model
o Given a blocking factor with variance component $\sigma_b^2$, focus on within-block correlations
o Common correlation model assumes the within-block correlation is the same for every gene
o Has proved effective for technical blocking factors for which correlations are high

Duplicate Spots
o If the clone library is not too large, it is often possible to print each gene more than once on an array
o Duplicates are always side-by-side or a fixed distance apart


Duplicate spots are correlated
o Duplicate spots are a form of technical replication, share lots of common causes
o Cannot be treated as replicates on separate arrays: log-ratios from duplicate spots are correlated
o How best to use duplicate spots? Usual approach is simply to average them
[Figure: genes printed in duplicate pairs, with one pair highlighted]

Common Correlation Model
o Assume the between-duplicate correlation is the same for every gene
o Justified by the belief that the correlation springs mainly from spatial proximity
o Improves estimation of variances

Consequences for individual genes
o If the number of genes is large, then the estimator of $\rho$ is very accurate, so $\rho$ may be treated as known as far as inference for each individual gene is concerned
o This doesn't change estimation of $\beta_g$ but greatly changes estimation of $\sigma_g$

Validation with Spike-In Data
o Does the idea of using common correlations work in practice?
o Check the ability of the common-correlation t-statistic to distinguish calibration from ratio spike-in spots
o Scorecard system includes calibration controls, 3-fold up and down ratio controls, and 10-fold up and down ratio controls

Multilevel models II: Separate channel analysis of two-colour data

Separate channel analysis of two-colour microarray data
o Each spot is a block of size 2: $(G_1, R_1), (G_2, R_2), (G_3, R_3), (G_4, R_4)$
o The two channels give a correlated pair of values
o Common correlation model sets intra-spot correlation equal across genes

Why Use Means and Differences?
If $\mathrm{var}(R) = \mathrm{var}(G)$ then $\mathrm{cor}(M, A) = 0$ regardless of the correlation between R and G.
An old idea dating back to Tukey, Altman & Bland.

Common Reference Experiment
o Arrays Ref vs B: $\mu_M = B - \text{Ref}$, $\mu_A = (B + \text{Ref})/2$
o Arrays Ref vs C: $\mu_M = C - \text{Ref}$, $\mu_A = (C + \text{Ref})/2$
o Why not use the A-values as well as M-values?

A simple normal model
For gene g and array i, $\rho_g$ is the intra-spot correlation.
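The claim that M and A are uncorrelated whenever the two channels have equal variance follows from one line of covariance algebra, using only the definitions $M = R - G$ and $A = (R + G)/2$:

```latex
\begin{aligned}
\mathrm{cov}(M, A)
  &= \mathrm{cov}\!\left(R - G,\ \tfrac{1}{2}(R + G)\right) \\
  &= \tfrac{1}{2}\left[\mathrm{var}(R) + \mathrm{cov}(R, G) - \mathrm{cov}(G, R) - \mathrm{var}(G)\right] \\
  &= \tfrac{1}{2}\left[\mathrm{var}(R) - \mathrm{var}(G)\right] = 0
  \quad \text{when } \mathrm{var}(R) = \mathrm{var}(G).
\end{aligned}
```

The cross-covariance terms cancel whatever their size, which is why the result holds regardless of the correlation between R and G.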

Models in terms of M and A

M and A parameters
$\mu_{Mgi} = \mu_{Rgi} - \mu_{Ggi}$
$\mu_{Agi} = (\mu_{Rgi} + \mu_{Ggi})/2$
$\sigma^2_{Mgi} = 2\sigma_g^2(1 - \rho_g)$
$\sigma^2_{Agi} = \sigma_g^2(1 + \rho_g)/2$

Correlation
M and A are independent but have different variances. Have converted a correlated problem into a heteroscedastic problem. Can estimate correlation by estimating variances:
$\tanh^{-1}\rho_g = \frac{1}{2}\log\frac{4\sigma^2_{Ag}}{\sigma^2_{Mg}}$

Common correlation model
o Assume the intra-spot correlation is constant across genes
o Justified by (i) variance components are observed to be positively correlated and (ii) standard errors for coefficients are not sensitive to correlation value
o Common correlation can then be assumed to be known at individual gene level
o Converts a mixed model into a weighted regression
o lmscFit() in limma
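The relation between the M and A variances and the intra-spot correlation can be checked numerically. The sketch below builds the two variances from made-up values of $\sigma_g^2$ and $\rho_g$ using the formulas above and then inverts them; the function name is hypothetical and this is not the estimator code in limma.

```python
import math


def intraspot_correlation(var_m, var_a):
    """Recover rho from the M and A variances: atanh(rho) = 0.5 * log(4 var_A / var_M)."""
    return math.tanh(0.5 * math.log(4 * var_a / var_m))


# Made-up gene: channel variance 1.0 and intra-spot correlation 0.85.
sigma2, rho = 1.0, 0.85
var_m = 2 * sigma2 * (1 - rho)   # variance of M = R - G
var_a = sigma2 * (1 + rho) / 2   # variance of A = (R + G) / 2

rho_recovered = intraspot_correlation(var_m, var_a)
print(var_m, var_a, rho_recovered)
```

The ratio $4\sigma^2_A/\sigma^2_M$ equals $(1+\rho)/(1-\rho)$, so the inverse hyperbolic tangent returns the correlation that generated the variances.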

ApoAI Experiment
o 8 wild type arrays and 8 ApoAI-/- arrays, all relative to a common reference
o Median intraspot correlation is estimated as $\hat\rho = 0.85$
o The efficiency gain from using A-values is $(1 - \hat\rho)/(1 + \hat\rho)$

Individual Channel Normalization
Using A-values in the analysis requires that they be normalized to have comparable values between arrays: single channel normalization.
For the ApoAI data:
o Within-array loess normalization only: $\hat\rho = 0.89$
o A-quantile normalization: $\hat\rho = 0.85$
o Quantile normalization: $\hat\rho =$

Ignoring the Reference
Why not ignore the common reference channel? If we use only the red channel in the common reference experiment,
$\mathrm{var}(R_C - R_B) = \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$
so
$\frac{\mathrm{var}(R_C - R_B)}{\mathrm{var}(M_C - M_B)} = \frac{1}{2(1 - \rho)}$
so adjusting for a common reference is worthwhile whenever $\rho > 0.5$.

Disconnected Design
[Diagram: RNA sources B, C, D and E; B and C are compared directly on the same arrays, B and D only indirectly]
o A-values make no contribution to estimating the B vs C comparison (direct)
o M-values make no contribution to estimating the B vs D comparison (indirect)
o Relative efficiency of indirect comparison compared to direct is $(1 - \rho)/(1 + \rho)$

[Figure: observed false discovery rates against tests rejected, for the Mixed model, Ignore Cor and SpotCor methods. Data from Holloway et al (2006) BMC Bioinformatics 7.]

Why common correlation is effective
o Bias introduced into the variance estimation seems to be offset by increase in precision
o Great simplification of mathematical model
o Penalizes genes with large within-block variances

A Shrinkage Hierarchy
o Fold changes: shrinkage may not be required (unless more than 2 groups)
o Genewise variances: soft smoothing gives spectacular improvement
o Technical replicate correlations: hard smoothing has proved successful
As we move from parameters of interest to higher order nuisance parameters, bias decreases in importance relative to noise.

Other bits and pieces

Some other common variations
o Technical replicates
o Paired samples
o Array weights

Summary
o Borrowing strength is essential in small-scale microarray experiments
o Information can be shared across genes or across arrays
o Parameters may be set common between genes (correlations) or shrunk in a graduated way (standard errors)
o Power can be increased by testing hypotheses for sets of genes

Acknowledgements
WEHI Bioinformatics
o Gordon Smyth
o Matt Ritchie
o Alicia Oshlack
o Terry Speed
Garvan Epigenetics
o Susan Clark
o Aaron Statham
o Marcel Coolen

Computer Laboratories

Proposed schedule
a.m.
o ModerationDemo.R
o LimmaObjects.R
o BGNormalization.R
o SimpleExperiments.R
p.m.
o FactorialDesign.R
o GeneSetAnalysis.R
o TimeCourse.R

Getting started
o Grab the files from the KU-August2009-LIMMA directory (or archive) and copy/move to a convenient location on your computer
o Set the variable rootdir to that directory. For example:
  rootdir <- "~/Desktop/KU-August2009-LIMMA/data"
o Make rootdir the working directory of your R session

limma package documentation
o Function help pages: ?lmFit, ?eBayes
o Class help pages: ?"RGList-class", ?"MArrayLM-class"
o Group help pages: help("06.LinearModels")
o User's Guide: limmaUsersGuide()
o The R html help system is a good top view

Moderation Demo
o Illustration of sampling from the model
o Reduction in false discoveries
o Empirical Bayes differential expression


Limma Objects
o RGList
o MAList
o MArrayLM

BG correction / normalization demo
o Various procedures for BG correction, normalization
o Jurkat data: same vs. same

Simple Experiment 1: Integrin beta7+ vs beta7-
o Reading two-color data
o Control spots
o Background correction
o Dye-swaps
o Empirical Bayes differential expression

Simple Experiment 2: Cancer cells versus normal cells
o Normal cell line vs cancer cell line
o Reading Affymetrix data
o Simple design versus contrasts

Factorial Experiment: Estrogen
o Factors: Estrogen (Absent, Present) and time (10 hours, 48 hours)
o Reading Affymetrix data
o Factorial designs, contrasts

Gene set analysis
o Convenient functional category analysis
o Newer, flexible set-based testing

Targets of SAHA and depsipeptide: a time course experiment
o Study effects of SAHA and depsipeptide on the acute T-cell leukemia cell line CEM
o SAHA and depsipeptide are structurally different but have similar biological effects (induce death through intrinsic apoptotic pathway)
o Prising out subtle differences is of great interest

SAHA/depsipeptide: Experimental design
SAHA    Vehicle only    depsipeptide
0hr     0hr             0hr
1hr     1hr             1hr
2hr     2hr             2hr
4hr     4hr             4hr
8hr     8hr             8hr
16hr    16hr            16hr
Time courses of 6 arrays were done at each time.

Aims of experiment
o Identify common responders: genes which respond similarly to SAHA and depsipeptide
o Identify specific responders: genes which respond to one of SAHA or depsipeptide, but not to the other
o Different responders, genes which respond to both SAHA and depsipeptide but differently, are of lesser interest

Linear model analysis
o Fit genewise linear models to all the arrays simultaneously
o Include effects for drug x time
o Allow for probe-specific dye-effects
o Treat each time series of 6 arrays as a randomized block, i.e., allow arrays hybridized together to be correlated

2nd statistical point: analyse all arrays together.
