(134/ !"#!$%&%' ())$+,-.)//01$2345/0/ M<39$N1$;<)5$")3/B.)O ;).C0141=5 " /7$83.9$:

Size: px

Start display at page:

Download "(134/ !"#!$%&%' ()*)$+,-.)//01*$2*345/0/ M<39$N1$;<)5$")3/B.)O ;).C0*141=5 " /7$83.9$:"

Alyson Parrish
5 years ago
Views:

1 (134/!"#!$%&%' ()*)$+,-)//01*$2*345/0/ "061335/7$839$: %&'()*&%&+,($'*&$%'-& D1C340E3?1*$F91$/93?/?634$34=109<C/H "*%'/01&$23$'(()%45"+$"+$6&+,&*$"7$-0(,*02)5"+ 8"6'/$4"/3+"%0'/$*&9*&((0"+ "*%'/01&$23$'(()%45"+$"+$&+5*&$-0(,*02)5"+ :)'+5/&$+"*%'/01'5"+ I ;)C0*141=5 M<39$N1$;<)5$")3/B)O " $3357$1$6<0- J5K00E3?1* 81K)/ L)39B)/ ;3=)9 #63**) :*9)*/0?)/ "061335/ ()*)$+,-)//01*$235/ +,1*$235/

2 ;3*/60-?1* P)Q)/)$93*/60-?1*!41*)$6ND2$/93*/7$61C-4)C)*935$91$9<)$CPD2 ND2 CPD2 T T C C T C C T T PD2$ -145C)3/) U U C C L1C$ND2$91$CPD2 CPD2 6ND2 U U C C U C P)Q)/)$ 93*/60-93/) T T C T T C T T C C T T C C T T C C T T T C T T DB64)06$260 N)*39B3?1* 2$;$($!$!$($;$;$($!$2 2$;$($!$!$($;$;$($!$2 ;$2$!$($($!$2$2$!$($; ;$2$!$($($!$2$2$!$($;

3 J5K00E3?1* 2$;$($!$!$($;$;$($!$2 J5K00E3?1* 2$;$($!$!$($;$;$($!$2 ;$2$!$($($!$2$2$!$($;!$!$!$;$2$;$!$($!$2$;$ 2$!$!$;$;$2$!$($!$;$2$ ;$2$!$($($!$2$2$!$($;!$!$!$;$2$;$!$($!$2$;$ 2$!$!$;$;$2$!$($!$;$2$!"#$%&%'()*+!"#$%&%'()*+ ;3=)9 CTT CT R3K)4)$;3=)9 CTT CT L)39B)/$1$81K)/ TCT CT

4 843S1C/$9<39$1C0*39)$C3T)9 2U5C)90,$F<0=<$)*/0957$1*)$6141H 2=04)*9$F6064)/$1*$=07$1*)$1$91$6141H :44BC0*3$F<0=<$)*/0957$1*)$1$91$6141H D0CK4)=)*$F<0=<$)*/0957$1*)$1$91$6141H J0=<$)*/0957$1*)$6141 2U5C)90, :44BC0*3$B/)/$K)3/$0*/9)3$1>$0*V/09B$/)WB)*60*= #)WB)*6)$FJ0=<$)*/095H J5K00E3?1*$1>$CPD2$91$-1K)/ XY

5 2U5C)90,$()*)!<0- [ $235/ * * * * * ')>1)$R3K)40*= Sample 1 Sample 2 XZ rray 1 rray 2 ')>1)$J5K00E3?1*@$\*)$!<3**)4 Sample 1 Sample 2 2])$J5K00E3?1* rray 1 rray 2 rray 1 rray 2

6 #63**)$:C3=) ^B3*?_63?1* rray 1 rray 2 rray 1 rray 2 "061335$:C3=) ;1$61417$=0$1*$6064)

7 Sample 1 Sample 2 ')>1)$J5K00E3?1* Sample 1 Sample 2 rray 1 rray 1 2])$J5K00E3?1* #63**)$:C3=) rray 1 rray 1

8 ^B3*?_63?1* "061335$:C3=) 4,0 2,4 0,0 3,3 rray 1 Measurements Samples (individuals) Probes (genes) N article MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia Scott rmstrong 1 4, Jane E Staunton 5, Lewis B Silverman 1,3,4, Rob Pieters 6, Monique L den Boer 6, Mark D Minden 7, Stephen E Sallan 1,3,4, Eric S Lander 5, Todd R olub 1,3,4,5 * & Stanley J Korsmeyer 2,4,8 * *These authors contributed equally to this work Published online: 3 December 2001, DOI: /ng765 DT MTRIX

9 Common Types of Data nalysis Microarray Data Class Discovery (Clustering, Unsupervised learning) Class Prediction (Classification, Supervised Learning) Class Comparison (Differential Expression) Why log? Why logs? For better of worse, fold changes are the preferred quantification of differential expression Fold changes are basically ratios Ratios are not symmetric around 1 This makes it problematic to perform statistical operations with ratios We prefer logs

10 Why logs Scatter Plot The intensity distribution has a fat right tail Log of ratios are symmetric around 0: verage of 1/10 and 10 is about 5 verage of log(1/10) and log(10) is 0 veraging ratios is almost always a bad idea! Facts you must remember: log(1) = 0 log(xy) = log(x) + log(y) log(y/x) = log(y) - log(x) log( X) = 1/2 log(x) 45 rotation highlights a problem Experiments with replicates If we are interested in genes with over-all large fold changes why not look at average (log) fold changes? Experience has shown that one usually wants to stratify by over-all expression We can make averaged M plots: M = difference in average log intensities and = average of log intensities This is referred to as Mplot

M plot of average log ratios Scatter Smooth This particular problem addressed by RM (see last slide for ref) Normalization Normalization is needed to ensure that differences in intensities are indeed

11 M plot of average log ratios Scatter Smooth This particular problem addressed by RM (see last slide for ref) Normalization Normalization is needed to ensure that differences in intensities are indeed due to differential expression, and not some printing, hybridization, or scanning artifact D1C340E0*=$6)*9)$1>$0/90KB?1* F41634$-145*1C034$)=)//01*H Normalization is necessary before any analysis which involves within or between slides comparisons of intensities, eg, clustering, classification, testing Somewhat different approaches are used in twocolor and one-color technologies ``

Most Common Problem Scatter Plot Intensity dependent effect: Different background level most likely culprit Demonstrates importance of M plot Example:

12 Most Common Problem Scatter Plot Intensity dependent effect: Different background level most likely culprit Demonstrates importance of M plot Example: Intensity Effect Loess The most common problem is intensity dependent effects Probably due to different background Loess is used to estimate and remove this effect

14 Loess LOcal regression Sometimes lowess for LOcally Weighted regression For each interval i Fit linear model using Ni points in interval using least squares min β 0i, β 1i N i j=1 (y ij β 0i β 1i x ij ) 2 Loess Why stop at linear fits? Can use polynomials For each interval i Fit quadratic model using Ni points in interval using least squares min β 0i, β 1i, β 2i N i j=1 (y ij β 0i β 1i x ij β 2i x 2 ij) 2 Loess What if data is irregularly spaced then some intervals will have more data points to estimate the fit One solution: for each value use α% of the data points (adaptive) Similar to k-nearest neighbors There are other more sophisticated solutions

15 Example of Replicate Data Quantile normalization Loess estimates mean of intensity effect and corrects based on that Quantile normalization assumes distributions must match So-called distribution normalization It is fast and conceptually simple Basic idea: order value in each array take average across probes Substitute probe intensity with average Put in original order Example of quantile normalization Original Ordered veraged Re-ordered Preprocessing and Normalization There is a lot more we havenʼt discussed But, this is essential to successful use of microarray data There is a lot of interest in analyzing across experiments Say, multiple tumor/normal comparisons Proper normalization is hard but essential Further reading: frozen RM and the microarray barcode (PMIDS: and ) PUBMED:

Normalization. Example of Replicate Data. Biostatistics Rafael A. Irizarry

Normalization. Example of Replicate Data. Biostatistics Rafael A. Irizarry This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this