Probe-Level Analysis of Affymetrix GeneChip Microarray Data


1 Probe-Level Analysis of Affymetrix GeneChip Microarray Data Ben Bolstad Biostatistics, University of California, Berkeley University of Minnesota Mar 30, 2004

2 Outline A brief introduction to the Affymetrix GeneChip Technology Pre-processing Background Correction Normalization Probe Level Modeling Models Applications

3 Brief Technology Overview High density oligonucleotide array technology as developed by Affymetrix Known as the GeneChip Overview images courtesy of Affymetrix unless otherwise specified

4 Probes and Probesets

5 Two Probe Types
Reference sequence: TAGGTCTGTATGACAGACACAAAGAAGATG
PM (the Perfect Match): CAGACATAGTGTCTGTGTTTCTTCT
MM (the Mismatch): CAGACATAGTGTGTGTGTTTCTTCT

6 Constructing the Chip Source: Lipshutz et al (1999) Nature Genetics Supplement: The Chipping Forecast

7 Focusing on a Single GeneChip Cell Location

8 Sample Preparation

9 Hybridization to the Chip

10 The Chip is Scanned

11 Chip dat file checkered board oligo B2 Courtesy: F. Colin

12 Chip dat file checkered board close up w/ grid Courtesy: F. Colin

13 Chip dat file checkered board close up pixel selection

14 Chip cel file checkered board Courtesy: F. Colin

15 Probe-Level Analysis
What is probe-level analysis? Analysis and manipulation of probe intensity data: expression calculation (background, normalization, summarization), fitting models to probe intensity data, quality control diagnostics.
Why do we do it? Hopefully it will allow us to produce better, more biologically meaningful gene expression values. We want accurate (low bias) and precise (low variance) gene expression estimates. Is there additional information at the probe-level that we might otherwise throw away?

16 High-Level Analysis Clustering/Classification Pathway Analysis Cell Cycle Gene function Anything where a more biological interpretation is desired Such matters will not be discussed further in today's talk.

17 Background/Signal Adjustment A method which does some or all of the following: corrects for background noise and processing effects; adjusts for cross hybridization; adjusts estimated expression values to fall on the proper scale. Probe intensities are used in background adjustment to compute the correction (unlike cDNA arrays, where the area surrounding the spot might be used).

18 RMA Background Approach Convolution model: $O = S + N$, where $O$ is the observed intensity, $S \sim \mathrm{Exp}(\alpha)$ is the signal and $N \sim N(\mu, \sigma^2)$ is the noise.

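19 Correction is given by
$$E(S \mid O = o) = a + b\,\frac{\phi\left(\frac{a}{b}\right) - \phi\left(\frac{o-a}{b}\right)}{\Phi\left(\frac{a}{b}\right) + \Phi\left(\frac{o-a}{b}\right) - 1}, \qquad a = o - \mu - \sigma^2\alpha, \quad b = \sigma$$
where $\phi$ and $\Phi$ are the standard normal density and distribution function.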
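The correction E(S | O = o) from the convolution model can be evaluated directly once the parameters are known. The following is a minimal Python sketch, not the Bioconductor implementation; it assumes that alpha, mu and sigma have already been estimated elsewhere, and the function names are illustrative only.

```python
import math


def normal_pdf(z):
    """Standard normal density (phi)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal distribution function (Phi)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def rma_background_correct(o, mu, sigma, alpha):
    """E(S | O = o) for O = S + N with S ~ Exp(alpha), N ~ N(mu, sigma^2).

    mu, sigma and alpha are taken as given here (an assumption of this
    sketch); in practice they are estimated from the observed intensities.
    """
    a = o - mu - sigma ** 2 * alpha
    b = sigma
    numerator = normal_pdf(a / b) - normal_pdf((o - a) / b)
    denominator = normal_cdf(a / b) + normal_cdf((o - a) / b) - 1.0
    return a + b * numerator / denominator


if __name__ == "__main__":
    for observed in (90.0, 200.0):
        print(observed, rma_background_correct(observed, mu=100.0, sigma=20.0, alpha=0.01))
```

Even when the observed intensity is below the noise mean, the corrected value stays positive, which is what allows a log transform afterwards.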

20 Normalization "Non-biological factors can contribute to the variability of data... In order to reliably compare data from multiple probe arrays, differences of non-biological origin must be minimized." [1] Normalization is a process of reducing unwanted variation across chips. It may use information from multiple chips. [1] GeneChip 3.1 Expression Analysis Algorithm Tutorial, Affymetrix technical support

21 Non-Biological Variability 5 scanners for 6 dilution groups

22 Non-linear normalization needed Unnormalized Scaled A Non-linear Normalization

23 Quantile Normalization Normalize so that the quantiles of each chip are equal. Simple and fast algorithm. Goal is to give same distribution to each chip. Original Distribution Target Distribution

24 Sort columns of the original matrix. Take averages across rows. Set the average as the value for all elements in the row. Unsort columns of the matrix to the original order.
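The four steps (sort, average across rows, assign, unsort) can be sketched in plain Python as below. This is an illustrative sketch rather than the Bioconductor code; the matrix is assumed to be a list of rows (probes) with one column per chip, and ties are broken arbitrarily by row order.

```python
def quantile_normalize(matrix):
    """Give every column (chip) of `matrix` the same empirical distribution."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    columns = [[matrix[r][c] for r in range(n_rows)] for c in range(n_cols)]

    # Step 1: sort each column, remembering which row each sorted value came from.
    order = [sorted(range(n_rows), key=col.__getitem__) for col in columns]

    # Step 2: average across the rows of the sorted matrix.
    row_means = [
        sum(columns[c][order[c][k]] for c in range(n_cols)) / n_cols
        for k in range(n_rows)
    ]

    # Steps 3 and 4: set each element to its row average, then undo the sort.
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for c in range(n_cols):
        for k, r in enumerate(order[c]):
            result[r][c] = row_means[k]
    return result


if __name__ == "__main__":
    chips = [[5, 4, 3], [2, 1, 4], [3, 6, 6], [4, 2, 8]]
    for row in quantile_normalize(chips):
        print(row)
```

After normalization every column contains exactly the same set of values, while the rank order within each chip is unchanged.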

25 It Reduces Variability (panels: Expression Values, Fold change) Also no serious bias effects. For more see Bolstad et al (2003)

26 General Probe Level Model $y_{ij} = f(X) + \varepsilon_{ij}$, where $f(X)$ is a function of factor (and possibly covariate) variables (our interest will be in linear functions) and $y_{ij}$ is a pre-processed probe intensity (usually on the log scale). Assume that $E[\varepsilon_{ij}] = 0$ and $\mathrm{Var}[\varepsilon_{ij}] = \sigma^2$.

27 Parallel Behavior Suggests Multi-chip Model

28 Probe Pattern Suggests Including Probe-Effect

29 Also Want Robustness

30 Summarization
Problem: calculating gene expression values. How do we reduce the probe intensities for each probeset to a single gene expression value?
Our approach: RMA, a robust multi-chip linear model fit on the log scale.
Some other approaches:
Single chip: AvDiff (Affymetrix), no longer recommended for use due to many flaws; MAS 5.0 (Affymetrix), which uses a 1-step Tukey biweight to combine the probe intensities on the log scale.
Multiple chip: MBEI (Li-Wong, dChip), a multiplicative model on the natural scale.

31 The Three Steps of RMA 1. Convolution Background 2. Quantile Normalization 3. Linear model on the log2 scale fit robustly. Software for implementing RMA is in the Bioconductor affy package

32 RMA mostly does well in practice Detecting Differential Expression Not noisy in low intensities RMA MAS 5.0

33 One Drawback (panels: RMA, MAS 5.0) Some fixes for this are being developed; see GCRMA (Irizarry and Wu, JHU)

34 The RMA model $y_{ij} = m + \alpha_i + \beta_j + \varepsilon_{ij}$, where $y_{ij} = \log_2 N(B(PM_{ij}))$, $\alpha_i$ is a probe-effect ($i = 1, \ldots, I$) and $\beta_j$ is a chip-effect ($j = 1, \ldots, J$); $m + \beta_j$ is the log2 gene expression on array $j$.

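35 Median Polish Algorithm
Start from the $I \times J$ matrix of values $y_{11}, \ldots, y_{IJ}$, augmented with a row and a column of zeros. Sweep rows and sweep columns, and iterate. The result holds the residuals $\hat\varepsilon_{11}, \ldots, \hat\varepsilon_{IJ}$, the row effects $\hat\alpha_1, \ldots, \hat\alpha_I$, the column effects $\hat\beta_1, \ldots, \hat\beta_J$ and $\hat m$.
Imposes constraints: $\mathrm{median}_i\,\alpha_i = \mathrm{median}_j\,\beta_j = 0$ and $\mathrm{median}_i\,\varepsilon_{ij} = \mathrm{median}_j\,\varepsilon_{ij} = 0$.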
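A small Python sketch of the median polish iteration for one probeset is given below, fitting y[i][j] = m + alpha[i] + beta[j] + residual. It is a generic version of Tukey's median polish, not the affy package code; the sweep order (rows first), the iteration cap and the stopping tolerance are assumptions of this sketch.

```python
from statistics import median


def median_polish(y, max_iter=10, tol=1e-8):
    """Fit y[i][j] = m + alpha[i] + beta[j] + resid[i][j] by median polish.

    Rows are probes, columns are chips. Returns (m, alpha, beta, resid).
    """
    n_rows, n_cols = len(y), len(y[0])
    resid = [[float(v) for v in row] for row in y]
    m = 0.0
    alpha = [0.0] * n_rows
    beta = [0.0] * n_cols
    for _ in range(max_iter):
        change = 0.0
        # Sweep rows: move each row median into the probe effect.
        for i in range(n_rows):
            d = median(resid[i])
            alpha[i] += d
            resid[i] = [v - d for v in resid[i]]
            change += abs(d)
        d = median(beta)
        m += d
        beta = [b - d for b in beta]
        # Sweep columns: move each column median into the chip effect.
        for j in range(n_cols):
            d = median(resid[i][j] for i in range(n_rows))
            beta[j] += d
            for i in range(n_rows):
                resid[i][j] -= d
            change += abs(d)
        d = median(alpha)
        m += d
        alpha = [a - d for a in alpha]
        if change < tol:
            break
    return m, alpha, beta, resid


if __name__ == "__main__":
    # Additive data with one gross outlier in the first cell.
    probeset = [[100, 4, 6], [3, 5, 7], [4, 6, 8]]
    m, alpha, beta, resid = median_polish(probeset)
    print([m + b for b in beta])  # log2 expression per chip
```

In the usage example the outlying probe value ends up entirely in the residuals, so the per-chip expression estimates m + beta[j] are the same as for the clean additive data.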

36 Median Polish Advantages: fast; very robust. Disadvantages: no algorithmic flexibility to fit alternative models; no standard error estimates.

37 An Alternative Method for Fitting a PLM Robust regression using M-estimation. In this talk, we will use Huber's influence function $\psi$. The software handles many more. Fitting algorithm is IRLS with weights $\psi(r)/r$ dependent on current residuals. Software for fitting such models is part of the affyPLM package of Bioconductor.
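To illustrate IRLS with Huber weights psi(r)/r, the sketch below fits the simplest possible model, a single location parameter, rather than a full probe-level design matrix. That simplification, the tuning constant k = 1.345, and the fixed MAD-based scale are assumptions of this example and are not taken from the talk or from affyPLM.

```python
from statistics import median


def huber_weight(r, k=1.345):
    """IRLS weight psi(r)/r for Huber's psi: 1 inside [-k, k], k/|r| outside."""
    return 1.0 if abs(r) <= k else k / abs(r)


def huber_location(values, k=1.345, max_iter=50, tol=1e-10):
    """Huber M-estimate of location via iteratively reweighted least squares."""
    est = median(values)
    # Robust scale from the MAD, held fixed across iterations (an assumption).
    scale = median(abs(v - est) for v in values) / 0.6745
    if scale == 0:
        return float(est)
    for _ in range(max_iter):
        # Weights depend on the current standardized residuals.
        weights = [huber_weight((v - est) / scale, k) for v in values]
        new_est = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        converged = abs(new_est - est) < tol
        est = new_est
        if converged:
            break
    return est


if __name__ == "__main__":
    data = [1, 2, 3, 4, 100]
    print(sum(data) / len(data), huber_location(data))
```

The outlier receives a small weight, so the estimate stays near the bulk of the data while the ordinary mean is pulled far towards the outlier. The same reweighting idea carries over to a regression with probe and chip effects, where each step is a weighted least squares fit.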

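38 Variance Covariance Estimates Suppose the model is $Y = X\beta + \varepsilon$. Huber (1981) gives three forms for estimating the variance covariance matrix:
$$\kappa^2\,\frac{\frac{1}{n-p}\sum_i \psi(r_i)^2}{\left(\frac{1}{n}\sum_i \psi'(r_i)\right)^2}\,(X^T X)^{-1}$$
$$\kappa\,\frac{\frac{1}{n-p}\sum_i \psi(r_i)^2}{\frac{1}{n}\sum_i \psi'(r_i)}\,W^{-1}$$
$$\frac{1}{\kappa}\,\frac{1}{n-p}\sum_i \psi(r_i)^2\; W^{-1}(X^T X)\,W^{-1}$$
We will use this form. Here $W = X^T \Psi' X$.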

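39 We Will Focus on Two Particular PLM
Array effect model: $y_{ij} = \alpha_i + \beta_j + \varepsilon_{ij}$
Treatment effect model: $y_{ij} = \alpha_i + \tau_{l_j} + \varepsilon_{ij}$
Here $y_{ij}$ is the pre-processed log PM intensity, $\alpha_i$ the probe effect, $\beta_j$ the array effect and $\tau_{l_j}$ the treatment effect. In both cases $\sum_{i=1}^{I} \alpha_i = 0$.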

40 Detecting Differential Expression Problem: given an experiment with two treatment groups, correctly identify the differential genes without incorrectly choosing non-differential genes. Question 1: How do different methods for DDE compare? Question 2: Can we do better using PLMs?

41 How Do We Know Which Genes are Differential? Spike-in datasets. Transcripts at known concentrations differing across arrays. Common background cRNA. Typically, Latin Square design. Datasets: Affymetrix HG-U95A; GeneLogic AML bacterial spike-ins; GeneLogic Tonsil bacterial spike-ins.

42 Affymetrix Spike-in Data 59 chips. All but 1 of the rows are done as triplicates A B C D E F G H I J K L M N O P Q R S T

43 Testing for Differential Expression On a probeset by probeset basis compute a test statistic. It should include something related to observed FC and some variability estimate. A t statistic: something of the form $t = \bar{X} / SE$.

44 Fold Change $FC = \bar{X}_l - \bar{X}_m$, where $\bar{X}_l = \dfrac{\sum_j \beta_j\,\mathrm{Ind}(j \in \text{group } l)}{\sum_j \mathrm{Ind}(j \in \text{group } l)}$

45 Simple t-statistic $t = \dfrac{\bar{X}_l - \bar{X}_m}{\sqrt{\dfrac{s_l^2}{n_l} + \dfrac{s_m^2}{n_m}}}$

46 Robust t-statistic $t = \dfrac{\bar{X}_l - \bar{X}_m}{\sqrt{\dfrac{s_l^2}{n_l} + \dfrac{s_m^2}{n_m}}}$ Use medians in place of means. Use MAD in place of standard deviation.

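47 Simple Moderated t-statistic $t = \dfrac{\bar{X}_l - \bar{X}_m}{\sqrt{\dfrac{s_l^2}{n_l} + \dfrac{s_m^2}{n_m}} + s_{med}}$, where $s_{med}$ is the median of $\sqrt{\dfrac{s_l^2}{n_l} + \dfrac{s_m^2}{n_m}}$ across all genes. Analogous to SAM.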
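The fold change and the simple, robust and simple moderated t-statistics can be computed per probeset as in the following Python sketch. It assumes each group is a list of log2 expression values for one probeset; the function names are illustrative, and using the raw (unscaled) MAD in the robust version is an assumption of this sketch.

```python
from math import sqrt
from statistics import mean, median, variance


def fold_change(group_l, group_m):
    """Difference of group means on the log2 scale."""
    return mean(group_l) - mean(group_m)


def difference_se(group_l, group_m):
    """sqrt(s_l^2 / n_l + s_m^2 / n_m)."""
    return sqrt(variance(group_l) / len(group_l) + variance(group_m) / len(group_m))


def simple_t(group_l, group_m):
    return fold_change(group_l, group_m) / difference_se(group_l, group_m)


def mad(values):
    """Median absolute deviation from the median (unscaled)."""
    center = median(values)
    return median(abs(v - center) for v in values)


def robust_t(group_l, group_m):
    """Medians in place of means, MAD in place of standard deviation."""
    se = sqrt(mad(group_l) ** 2 / len(group_l) + mad(group_m) ** 2 / len(group_m))
    return (median(group_l) - median(group_m)) / se


def moderated_t(probesets):
    """probesets: list of (group_l, group_m) pairs, one pair per probeset.

    Adds the median standard error across all probesets to each denominator.
    """
    ses = [difference_se(l, m) for l, m in probesets]
    s_med = median(ses)
    return [fold_change(l, m) / (se + s_med) for (l, m), se in zip(probesets, ses)]


if __name__ == "__main__":
    data = [([1, 2, 3], [4, 5, 6]), ([1, 2, 3], [1, 2, 3])]
    print(simple_t(*data[0]), robust_t(*data[0]), moderated_t(data))
```

The added constant in the moderated version keeps probesets with a very small estimated variance from producing extreme statistics.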

48 Limma eBayes t-statistic Generalization of the Bayesian method of Lönnstedt and Speed (2002) to the general linear model case. An alternative and much more sophisticated moderated t-statistic. Variability is estimated using a posterior. From the limma Bioconductor package.

49 Probe Level Model test statistics Suppose that $\Sigma$ is the component of the variance-covariance matrix related to $\beta$. Let $c$ be the contrast vector defined such that the $j$th element of $c$ is $1/n_l$ if array $j$ is in group $l$, $-1/n_m$ if array $j$ is in group $m$, and $0$ otherwise.

50 Probe Level Model test statistics $t_{PLM.1} = \dfrac{c^T \beta}{\sqrt{\sum_{j=1}^{J} c_j^2\,\Sigma_{jj}}}$ and $t_{PLM.2} = \dfrac{c^T \beta}{\sqrt{c^T \Sigma\, c}}$
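The two PLM test statistics differ only in whether the off-diagonal elements of the variance-covariance matrix enter the denominator. The Python sketch below assumes that beta (chip effects for one probeset) and Sigma (their estimated variance-covariance matrix, as a list of lists) come from a fitted probe-level model; the contrast takes 1/n_l for group l and -1/n_m for group m so that c'beta equals the fold change. All names are illustrative.

```python
from math import sqrt


def contrast_vector(groups, l, m):
    """1/n_l for arrays in group l, -1/n_m for arrays in group m, 0 otherwise."""
    n_l = groups.count(l)
    n_m = groups.count(m)
    c = []
    for g in groups:
        if g == l:
            c.append(1.0 / n_l)
        elif g == m:
            c.append(-1.0 / n_m)
        else:
            c.append(0.0)
    return c


def t_plm1(c, beta, sigma):
    """Uses only the diagonal of Sigma in the denominator."""
    numerator = sum(cj * bj for cj, bj in zip(c, beta))
    return numerator / sqrt(sum(c[j] ** 2 * sigma[j][j] for j in range(len(c))))


def t_plm2(c, beta, sigma):
    """Uses the full quadratic form c' Sigma c in the denominator."""
    numerator = sum(cj * bj for cj, bj in zip(c, beta))
    n = len(c)
    quad = sum(c[i] * sigma[i][j] * c[j] for i in range(n) for j in range(n))
    return numerator / sqrt(quad)


if __name__ == "__main__":
    groups = ["l", "l", "m", "m"]
    beta = [8.0, 8.2, 7.0, 7.2]
    sigma = [[0.04 if i == j else 0.01 for j in range(4)] for i in range(4)]
    c = contrast_vector(groups, "l", "m")
    print(t_plm1(c, beta, sigma), t_plm2(c, beta, sigma))
```

With a diagonal Sigma the two statistics coincide; they separate as soon as the chip-effect estimates are correlated.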

51 A First Comparison 8 chips from the Affymetrix HG-U95A spike-in dataset: 4 arrays for each of two concentration profiles. Fit an array effect model to all 8 chips. Compare the performance of the different methods by looking at all comparisons: 1 vs 1, 2 vs 2, 3 vs 3, 4 vs 4.

52

53

54

55

56 What Happens as the Number of Arrays Increases? Expand comparison to all 24 Arrays with same concentration profiles from Affymetrix HG-U95A spike-in dataset Fit an array effect model to all 24 arrays Look at comparisons between equal number of chips

57

58

59

60

61 A Larger Comparison Look at the entire 59 chips of the Affymetrix HG-U95A spike-in dataset. Examine two cases after standard preprocessing: fit models for each pairwise comparison (individual models); fit a model to all 59 chips (single model). There are 91 pairwise comparisons.

62 Results (table; column groups: Individual Models and Single Model, each with 0% FP, 5% FP, AUC; rows: FC, Std, Robust, Mod, PLM, PLM, Ebayes)

63 More Spike-in Datasets Two GeneLogic Spike-in datasets Tonsil dataset (36 arrays) AML dataset (34 arrays) In each case use single models fitted to all arrays

64

65

66 What is Going On Here? Examine residuals stratified by concentration group Spike-ins Randomly chosen non-differential probesets at low, medium and high average expression

67 Affymetrix Spike-ins

68 Low Non-Differential

69 Middle Non-Differential

70 High Non-Differential

71 GeneLogic AML Spike-ins

72 GeneLogic Tonsil dataset

73 How About With More Real Data? GeneLogic Dilution/Mixture Dataset Learning Set Liver 30x VS CNS 30x 400 top genes truth 75:25 5x VS 25:75 5x Test Set Mixture Data Dilution Series Data

74 Mixture Data Results (table; column groups: 3 vs 3, 4 vs 4, 5 vs 5, each with 0% FP, 5% FP, AUC; rows: FC, Std, Robust, Mod, PLM, PLM, Ebayes)

75 What About the Treatment Effect Model? With treatment effect model, have more observations contributing to estimation for each element of the variance covariance matrix Much quicker to fit

76

77 Moderation for the PLM test statistic Method 5 vs 5 0% FP 5% FP AUC PLM Ebayes PLM Moderated (Preliminary Results)

78 Quality Assessment using PLM PLM quantities useful for assessing chip quality Weights Residuals Standard Errors Expression values relative to median chip

79 Pseudo-chip images Weights Residuals Positive Residuals Negative Residuals

80 NUSE Normalized Unscaled Standard Errors

81 RLE Relative Log Expression
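RLE and NUSE can both be computed from PLM output by standardizing each probeset against its median across arrays. The sketch below assumes `expr[g][j]` holds the log2 expression estimate and `se[g][j]` its standard error for probeset g on array j; the function names are illustrative and this is not the affyPLM implementation.

```python
from statistics import median


def rle(expr):
    """Relative Log Expression: each value minus its probeset's median across arrays."""
    result = []
    for row in expr:
        center = median(row)
        result.append([v - center for v in row])
    return result


def nuse(se):
    """Normalized Unscaled Standard Errors: each SE divided by its probeset's median SE."""
    result = []
    for row in se:
        center = median(row)
        result.append([v / center for v in row])
    return result


if __name__ == "__main__":
    print(rle([[7.0, 8.0, 9.0], [5.0, 5.0, 8.0]]))
    print(nuse([[0.1, 0.2, 0.4]]))
```

For quality assessment the values are then summarized per array, typically with boxplots: an array whose RLE values are not centred near 0, or whose NUSE values sit well above 1, stands out as lower quality.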

82 Ongoing Work in this Area Technology changes: What still works? What doesn't? Other probe-level models (e.g. introduce sequence information, MMs)

83 Acknowledgements Terry Speed (UC Berkeley) Rafael Irizarry (Johns Hopkins) Francois Colin (UC Berkeley) Julia Brettschneider (UC Berkeley) Bioconductor Core

84 Additional Slides

85 From Chip To Data

86 Constructing a gene expression measure

87 Computing Expression Measures: A Three Step Procedure 1. Background/Signal adjustment (B) 2. Normalization (N) 3. Summarization (S) Let $X$ be CEL file data from multiple arrays; then Expression values $= S(N(B(X)))$

88 Background Methods Affymetrix: location dependent; Ideal Mismatch. RMA: convolution model. Other: standard curve adjustment; GCRMA (Wu, Irizarry et al 2003).

89 Background Signal Methods Affymetrix: location dependent background based on grids; I will refer to this as the MAS 5 background. Affymetrix originally proposed subtracting MM from PM, but this is problematic because as many as a third of MMs are greater than the respective PM; this is no longer used. They now use what they refer to as the Ideal Mismatch, which is MM when possible and something else when not possible (designed so that there are no negatives). Call this IMM.

90 Original RMA Background Convolution model is suggested by looking at density of observed empirical distributions

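91 Convolution Model $O = S + N$, where $O$ is the observed PM, $S$ is the signal (assumed exponential) and $N$ is the noise (assumed normal, truncated at zero). The correction is then
$$E(S \mid O = o) = a + b\,\frac{\phi\left(\frac{a}{b}\right) - \phi\left(\frac{o-a}{b}\right)}{\Phi\left(\frac{a}{b}\right) + \Phi\left(\frac{o-a}{b}\right) - 1}, \qquad a = o - \mu - \sigma^2\alpha, \quad b = \sigma$$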

92 A Standard Curve Adjustment Based on Spike-in Information Observes that there is a curve that relates observed expression and spike-in concentration. The ideal would be to have a linear relationship between concentration and computed expression. The curve gives us a concentration dependent adjustment

93 What About Non Spike-ins? We don't know a concentration for most probesets. If we did, or if we had a variable that related to concentration, the adjustment would be easy to perform. Fit the following model:
$$y_{1i}^{(k)} = \alpha_i^{(k)} + \varepsilon_i^{(k)}$$
$$y_{2i}^{(k)} = \alpha_i^{(k)} + \gamma^{(k)} + \varepsilon_i'^{(k)}$$
where $y_{1i}^{(k)} = \log_2 PM_i^{(k)}$ and $y_{2i}^{(k)} = \log_2 MM_i^{(k)}$.

94 γ Relates to Concentration

95 Establishing a Relationship Between γ and Concentration

96 The Two Curves Yield an Adjustment Curve

97 Normalization Methods Methods already compared in Bolstad et al (2003):
Baseline (normalized using a reference chip): Scaling (Affymetrix); Non-linear (Li-Wong).
Complete data (no reference chip, information from all arrays used): Quantile normalization (Bolstad et al 2003); Contrast (Åstrand, 2003); Cyclic Loess.

98 Why Quantile Normalization? Quantile normalization found to perform acceptably in reducing variance without drastic bias effects Quantile normalization is fast

99 RMA Model To each probeset $k$, with $i = 1, \ldots, I$ indexing probes and $j = 1, \ldots, J$ indexing chips, fit the model $y_{ij}^{(k)} = \alpha_i^{(k)} + \beta_j^{(k)} + \varepsilon_{ij}^{(k)}$, where $\alpha_i^{(k)}$ is a probe effect, $\beta_j^{(k)}$ is the log gene expression and $y_{ij}^{(k)}$ is the log2 background adjusted and normalized PM intensity. Different ways to fit this model: median polish (quick); robust linear model (yields some good quality diagnostic tools).

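100 Basic RMA model Let $y_{ij} = \log_2 N(B(PM_{ij}))$; then $y_{ij} = m + \alpha_i + \beta_j + \varepsilon_{ij}$, where $\alpha_i$ is the probe-effect and $\beta_j$ is the chip-effect ($m + \beta_j$ is the log2 gene expression on array $j$). Median-polish imposes the constraints $\mathrm{median}_i\,\alpha_i = \mathrm{median}_j\,\beta_j = 0$ and $\mathrm{median}_i\,\varepsilon_{ij} = \mathrm{median}_j\,\varepsilon_{ij} = 0$.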

101 Advantages/Disadvantages of RMA/Median polish Advantages: fast; gene expression measures perform favorably when compared with MAS 5.0 and Li-Wong MBEI; robust against outliers. Disadvantages: no standard error estimates; no algorithmic flexibility to fit alternative models.

102 Probe Level Models are based on RMA RMA method: convolution model background; quantile normalization; summarization using a robust multi-chip model on the log scale. Model is fitted using the median polish algorithm on a probeset by probeset basis.

103 Comparing the background methods Using an Affymetrix spike-in experiment we shall examine: observed vs spike-in concentration; observed vs expected fold change; composite M vs A plots; ROC curves. In each case we will compute expression values using standard RMA methodology (i.e. quantile normalization, median polish summarization).

104 Assessing Bias: Observed Expression vs Spike-in Concentration (table of slopes; columns: None, RMA, MAS 5, IMM, MAS5/IMM, S.C.A.; rows: All, Mid, Low, High)

105 Assessing Bias: Observed Fold-change versus Expected Fold-change (table of slopes; columns: None, RMA, MAS 5, IMM, MAS5/IMM, S.C.A.; row: All)

106 Assessing Variability: M vs A plots Vertical Axis is M: log2 fold-change. Horizontal Axis is A: average expression value (on log2 scale) aka geometric average. Ideally non differential genes tight about M=0
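The M and A values plotted on these axes can be computed from two sets of log2 expression values as in the short Python sketch below. The inputs are assumed to be per-probeset log2 expression values from two arrays (or two group averages); the function name is illustrative.

```python
def ma_values(expr_1, expr_2):
    """Return (M, A) per probeset for two vectors of log2 expression values.

    M is the log2 fold-change (difference of log2 values); A is the average
    log2 expression, i.e. the log2 of the geometric mean on the natural scale.
    """
    m = [x - y for x, y in zip(expr_1, expr_2)]
    a = [(x + y) / 2.0 for x, y in zip(expr_1, expr_2)]
    return m, a


if __name__ == "__main__":
    m, a = ma_values([8.0, 10.0], [7.0, 10.0])
    print(m, a)
```

A non-differential probeset should give M close to 0 at every A, so the spread of M around 0 is a direct view of variability.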

107

108

109

110

111

112

113 Detecting Differential Expression: ROC Curves

114 Summary of Trade-offs
Background Method | Detect Differential Genes | Accurate Estimates of Fold Change
No Background | Good | Poor
RMA | Good | Poor
MAS 5.0 | Good | Poor
Ideal Mismatch | Poor | Good
MAS5.0/IdealMM | Poor | Good
Standard Curve Adjustment | Good | Good

115 Comparing the Normalization Methods Want to reduce variation but at the same time we do not want to introduce any bias First a quick examination of the expression values by array Using same spike-in experiment as before, this time no background correction, only normalization and median polish summarization.

116

117

118

119 Scaling is Not Sufficient

120 Variability of Non-Differential Genes is Reduced

121 Little effect on Spike-ins
Method | All | Low | Mid | High | FC
No Normalization | (0.845) | (0.148) | (0.733) | (0.207) | (0.952)
Quantile | (0.851) | (0.153) | (0.741) | (0.224) | (0.955)
Scaling | (0.852) | (0.156) | (0.742) | 0.33 (0.225) | (0.954)

122 ROC Curves

123 Comparing Established Expression Measures

124 Slope All Mid Low High Value

125 Slope All Mid Low High Value

126 Slope All Mid Low High Value

127 Slope All Mid Low High Value

128 Slope All Mid Low High Value

129 Slope All Mid Low High Value

130 Slope: 0.484

131 Slope: 0.624

132 Slope: 0.583

133 Slope: 0.683

134 Slope: 0.692

135 Slope: 0.847


More information

DEGseq: an R package for identifying differentially expressed genes from RNA-seq data

DEGseq: an R package for identifying differentially expressed genes from RNA-seq data DEGseq: an R package for identifying differentially expressed genes from RNA-seq data Likun Wang Zhixing Feng i Wang iaowo Wang * and uegong Zhang * MOE Key Laboratory of Bioinformatics and Bioinformatics

More information

Assessing Model Adequacy

Assessing Model Adequacy Assessing Model Adequacy A number of assumptions were made about the model, and these need to be verified in order to use the model for inferences. In cases where some assumptions are violated, there are

More information

MS&E 226. In-Class Midterm Examination Solutions Small Data October 20, 2015

MS&E 226. In-Class Midterm Examination Solutions Small Data October 20, 2015 MS&E 226 In-Class Midterm Examination Solutions Small Data October 20, 2015 PROBLEM 1. Alice uses ordinary least squares to fit a linear regression model on a dataset containing outcome data Y and covariates

More information

Overview - MS Proteomics in One Slide. MS masses of peptides. MS/MS fragments of a peptide. Results! Match to sequence database

Overview - MS Proteomics in One Slide. MS masses of peptides. MS/MS fragments of a peptide. Results! Match to sequence database Overview - MS Proteomics in One Slide Obtain protein Digest into peptides Acquire spectra in mass spectrometer MS masses of peptides MS/MS fragments of a peptide Results! Match to sequence database 2 But

More information

Homework 2. For the homework, be sure to give full explanations where required and to turn in any relevant plots.

Homework 2. For the homework, be sure to give full explanations where required and to turn in any relevant plots. Homework 2 1 Data analysis problems For the homework, be sure to give full explanations where required and to turn in any relevant plots. 1. The file berkeley.dat contains average yearly temperatures for

More information

Use of Agilent Feature Extraction Software (v8.1) QC Report to Evaluate Microarray Performance

Use of Agilent Feature Extraction Software (v8.1) QC Report to Evaluate Microarray Performance Use of Agilent Feature Extraction Software (v8.1) QC Report to Evaluate Microarray Performance Anthea Dokidis Glenda Delenstarr Abstract The performance of the Agilent microarray system can now be evaluated

More information

GS Analysis of Microarray Data

GS Analysis of Microarray Data GS01 0163 Analysis of Microarray Data Keith Baggerly and Kevin Coombes Section of Bioinformatics Department of Biostatistics and Applied Mathematics UT M. D. Anderson Cancer Center kabagg@mdanderson.org

More information

COMP 551 Applied Machine Learning Lecture 20: Gaussian processes

COMP 551 Applied Machine Learning Lecture 20: Gaussian processes COMP 55 Applied Machine Learning Lecture 2: Gaussian processes Instructor: Ryan Lowe (ryan.lowe@cs.mcgill.ca) Slides mostly by: (herke.vanhoof@mcgill.ca) Class web page: www.cs.mcgill.ca/~hvanho2/comp55

More information

Nonrobust and Robust Objective Functions

Nonrobust and Robust Objective Functions Nonrobust and Robust Objective Functions The objective function of the estimators in the input space is built from the sum of squared Mahalanobis distances (residuals) d 2 i = 1 σ 2(y i y io ) C + y i

More information

Bayesian model selection: methodology, computation and applications

Bayesian model selection: methodology, computation and applications Bayesian model selection: methodology, computation and applications David Nott Department of Statistics and Applied Probability National University of Singapore Statistical Genomics Summer School Program

More information

Regression. Estimation of the linear function (straight line) describing the linear component of the joint relationship between two variables X and Y.

Regression. Estimation of the linear function (straight line) describing the linear component of the joint relationship between two variables X and Y. Regression Bivariate i linear regression: Estimation of the linear function (straight line) describing the linear component of the joint relationship between two variables and. Generally describe as a

More information

Introduction to Supervised Learning. Performance Evaluation

Introduction to Supervised Learning. Performance Evaluation Introduction to Supervised Learning Performance Evaluation Marcelo S. Lauretto Escola de Artes, Ciências e Humanidades, Universidade de São Paulo marcelolauretto@usp.br Lima - Peru Performance Evaluation

More information

ISQS 5349 Spring 2013 Final Exam

ISQS 5349 Spring 2013 Final Exam ISQS 5349 Spring 2013 Final Exam Name: General Instructions: Closed books, notes, no electronic devices. Points (out of 200) are in parentheses. Put written answers on separate paper; multiple choices

More information

Summary statistics. G.S. Questa, L. Trapani. MSc Induction - Summary statistics 1

Summary statistics. G.S. Questa, L. Trapani. MSc Induction - Summary statistics 1 Summary statistics 1. Visualize data 2. Mean, median, mode and percentiles, variance, standard deviation 3. Frequency distribution. Skewness 4. Covariance and correlation 5. Autocorrelation MSc Induction

More information

Bioconductor Project Working Papers

Bioconductor Project Working Papers Bioconductor Project Working Papers Bioconductor Project Year 2004 Paper 6 Error models for microarray intensities Wolfgang Huber Anja von Heydebreck Martin Vingron Department of Molecular Genome Analysis,

More information

EE/CpE 345. Modeling and Simulation. Fall Class 10 November 18, 2002

EE/CpE 345. Modeling and Simulation. Fall Class 10 November 18, 2002 EE/CpE 345 Modeling and Simulation Class 0 November 8, 2002 Input Modeling Inputs(t) Actual System Outputs(t) Parameters? Simulated System Outputs(t) The input data is the driving force for the simulation

More information

GS Analysis of Microarray Data

GS Analysis of Microarray Data GS01 0163 Analysis of Microarray Data Keith Baggerly and Kevin Coombes Section of Bioinformatics Department of Biostatistics and Applied Mathematics UT M. D. Anderson Cancer Center kabagg@mdanderson.org

More information

STATISTICAL METHODS FOR GENETICS AND GENOMICS STUDIES

STATISTICAL METHODS FOR GENETICS AND GENOMICS STUDIES STATISTICAL METHODS FOR GENETICS AND GENOMICS STUDIES A THESIS SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA BY MEIJUAN LI IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR

More information

Non-specific filtering and control of false positives

Non-specific filtering and control of false positives Non-specific filtering and control of false positives Richard Bourgon 16 June 2009 bourgon@ebi.ac.uk EBI is an outstation of the European Molecular Biology Laboratory Outline Multiple testing I: overview

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

Microarray data analysis

Microarray data analysis Microarray data analysis September 20, 2006 Jonathan Pevsner, Ph.D. Introduction to Bioinformatics pevsner@kennedykrieger.org Johns Hopkins School of Public Health (260.602.01) Copyright notice Many of

More information

ANOVA Situation The F Statistic Multiple Comparisons. 1-Way ANOVA MATH 143. Department of Mathematics and Statistics Calvin College

ANOVA Situation The F Statistic Multiple Comparisons. 1-Way ANOVA MATH 143. Department of Mathematics and Statistics Calvin College 1-Way ANOVA MATH 143 Department of Mathematics and Statistics Calvin College An example ANOVA situation Example (Treating Blisters) Subjects: 25 patients with blisters Treatments: Treatment A, Treatment

More information

ABC random forest for parameter estimation. Jean-Michel Marin

ABC random forest for parameter estimation. Jean-Michel Marin ABC random forest for parameter estimation Jean-Michel Marin Université de Montpellier Institut Montpelliérain Alexander Grothendieck (IMAG) Institut de Biologie Computationnelle (IBC) Labex Numev! joint

More information

Unit 12: Analysis of Single Factor Experiments

Unit 12: Analysis of Single Factor Experiments Unit 12: Analysis of Single Factor Experiments Statistics 571: Statistical Methods Ramón V. León 7/16/2004 Unit 12 - Stat 571 - Ramón V. León 1 Introduction Chapter 8: How to compare two treatments. Chapter

More information

Bayesian Model Diagnostics and Checking

Bayesian Model Diagnostics and Checking Earvin Balderama Quantitative Ecology Lab Department of Forestry and Environmental Resources North Carolina State University April 12, 2013 1 / 34 Introduction MCMCMC 2 / 34 Introduction MCMCMC Steps in

More information

Week 3: Linear Regression

Week 3: Linear Regression Week 3: Linear Regression Instructor: Sergey Levine Recap In the previous lecture we saw how linear regression can solve the following problem: given a dataset D = {(x, y ),..., (x N, y N )}, learn to

More information

Auxiliary Variables in Mixture Modeling: Using the BCH Method in Mplus to Estimate a Distal Outcome Model and an Arbitrary Secondary Model

Auxiliary Variables in Mixture Modeling: Using the BCH Method in Mplus to Estimate a Distal Outcome Model and an Arbitrary Secondary Model Auxiliary Variables in Mixture Modeling: Using the BCH Method in Mplus to Estimate a Distal Outcome Model and an Arbitrary Secondary Model Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 21 Version

More information

Bayesian linear regression

Bayesian linear regression Bayesian linear regression Linear regression is the basis of most statistical modeling. The model is Y i = X T i β + ε i, where Y i is the continuous response X i = (X i1,..., X ip ) T is the corresponding

More information

DETECTING DIFFERENTIALLY EXPRESSED GENES WHILE CONTROLLING THE FALSE DISCOVERY RATE FOR MICROARRAY DATA

DETECTING DIFFERENTIALLY EXPRESSED GENES WHILE CONTROLLING THE FALSE DISCOVERY RATE FOR MICROARRAY DATA University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln Dissertations and Theses in Statistics Statistics, Department of 2009 DETECTING DIFFERENTIALLY EXPRESSED GENES WHILE CONTROLLING

More information

MS&E 226: Small Data

MS&E 226: Small Data MS&E 226: Small Data Lecture 6: Model complexity scores (v3) Ramesh Johari ramesh.johari@stanford.edu Fall 2015 1 / 34 Estimating prediction error 2 / 34 Estimating prediction error We saw how we can estimate

More information

ECE 5984: Introduction to Machine Learning

ECE 5984: Introduction to Machine Learning ECE 5984: Introduction to Machine Learning Topics: Ensemble Methods: Bagging, Boosting Readings: Murphy 16.4; Hastie 16 Dhruv Batra Virginia Tech Administrativia HW3 Due: April 14, 11:55pm You will implement

More information

Biochip informatics-(i)

Biochip informatics-(i) Biochip informatics-(i) : biochip normalization & differential expression Ju Han Kim, M.D., Ph.D. SNUBI: SNUBiomedical Informatics http://www.snubi snubi.org/ Biochip Informatics - (I) Biochip basics Preprocessing

More information

x 21 x 22 x 23 f X 1 X 2 X 3 ε

x 21 x 22 x 23 f X 1 X 2 X 3 ε Chapter 2 Estimation 2.1 Example Let s start with an example. Suppose that Y is the fuel consumption of a particular model of car in m.p.g. Suppose that the predictors are 1. X 1 the weight of the car

More information

Machine Learning Linear Classification. Prof. Matteo Matteucci

Machine Learning Linear Classification. Prof. Matteo Matteucci Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)

More information

Behavioral Data Mining. Lecture 7 Linear and Logistic Regression

Behavioral Data Mining. Lecture 7 Linear and Logistic Regression Behavioral Data Mining Lecture 7 Linear and Logistic Regression Outline Linear Regression Regularization Logistic Regression Stochastic Gradient Fast Stochastic Methods Performance tips Linear Regression

More information

Applied Machine Learning Annalisa Marsico

Applied Machine Learning Annalisa Marsico Applied Machine Learning Annalisa Marsico OWL RNA Bionformatics group Max Planck Institute for Molecular Genetics Free University of Berlin 22 April, SoSe 2015 Goals Feature Selection rather than Feature

More information

SPOTTED cdna MICROARRAYS

SPOTTED cdna MICROARRAYS SPOTTED cdna MICROARRAYS Spot size: 50um - 150um SPOTTED cdna MICROARRAYS Compare the genetic expression in two samples of cells PRINT cdna from one gene on each spot SAMPLES cdna labelled red/green e.g.

More information

Applied Statistics and Econometrics

Applied Statistics and Econometrics Applied Statistics and Econometrics Lecture 6 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 53 Outline of Lecture 6 1 Omitted variable bias (SW 6.1) 2 Multiple

More information

Algebra of Random Variables: Optimal Average and Optimal Scaling Minimising

Algebra of Random Variables: Optimal Average and Optimal Scaling Minimising Review: Optimal Average/Scaling is equivalent to Minimise χ Two 1-parameter models: Estimating < > : Scaling a pattern: Two equivalent methods: Algebra of Random Variables: Optimal Average and Optimal

More information

Algebra of Random Variables: Optimal Average and Optimal Scaling Minimising

Algebra of Random Variables: Optimal Average and Optimal Scaling Minimising Review: Optimal Average/Scaling is equivalent to Minimise χ Two 1-parameter models: Estimating < > : Scaling a pattern: Two equivalent methods: Algebra of Random Variables: Optimal Average and Optimal

More information

Linear Models and Empirical Bayes Methods for. Assessing Differential Expression in Microarray Experiments

Linear Models and Empirical Bayes Methods for. Assessing Differential Expression in Microarray Experiments Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments by Gordon K. Smyth (as interpreted by Aaron J. Baraff) STAT 572 Intro Talk April 10, 2014 Microarray

More information

Module 6: Model Diagnostics

Module 6: Model Diagnostics St@tmaster 02429/MIXED LINEAR MODELS PREPARED BY THE STATISTICS GROUPS AT IMM, DTU AND KU-LIFE Module 6: Model Diagnostics 6.1 Introduction............................... 1 6.2 Linear model diagnostics........................

More information

Statistical Algorithms Description Document

Statistical Algorithms Description Document Statistical Algorithms Description Document 00 Affymetrix, Inc. All Rights Reserved. TABLE OF CONTENTS OVERVIEW...3 GENECHIP ARRAY DESIGN...3 DATA OUTPUTS...3 DATA PREPARATION...4 MASKING... 4 BACKGROUND

More information

hsnim: Hyper Scalable Network Inference Machine for Scale-Free Protein-Protein Interaction Networks Inference

hsnim: Hyper Scalable Network Inference Machine for Scale-Free Protein-Protein Interaction Networks Inference CS 229 Project Report (TR# MSB2010) Submitted 12/10/2010 hsnim: Hyper Scalable Network Inference Machine for Scale-Free Protein-Protein Interaction Networks Inference Muhammad Shoaib Sehgal Computer Science

More information

Technologie w skali genomowej 2/ Algorytmiczne i statystyczne aspekty sekwencjonowania DNA

Technologie w skali genomowej 2/ Algorytmiczne i statystyczne aspekty sekwencjonowania DNA Technologie w skali genomowej 2/ Algorytmiczne i statystyczne aspekty sekwencjonowania DNA Expression analysis for RNA-seq data Ewa Szczurek Instytut Informatyki Uniwersytet Warszawski 1/35 The problem

More information

Eco517 Fall 2014 C. Sims FINAL EXAM

Eco517 Fall 2014 C. Sims FINAL EXAM Eco517 Fall 2014 C. Sims FINAL EXAM This is a three hour exam. You may refer to books, notes, or computer equipment during the exam. You may not communicate, either electronically or in any other way,

More information

DIRECT VERSUS INDIRECT DESIGNS FOR edna MICROARRAY EXPERIMENTS

DIRECT VERSUS INDIRECT DESIGNS FOR edna MICROARRAY EXPERIMENTS Sankhyā : The Indian Journal of Statistics Special issue in memory of D. Basu 2002, Volume 64, Series A, Pt. 3, pp 706-720 DIRECT VERSUS INDIRECT DESIGNS FOR edna MICROARRAY EXPERIMENTS By TERENCE P. SPEED

More information

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008 MIT OpenCourseWare http://ocw.mit.edu 6.047 / 6.878 Computational Biology: Genomes, Networks, Evolution Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

More information

Diagnostics. Gad Kimmel

Diagnostics. Gad Kimmel Diagnostics Gad Kimmel Outline Introduction. Bootstrap method. Cross validation. ROC plot. Introduction Motivation Estimating properties of an estimator. Given data samples say the average. x 1, x 2,...,

More information

Instrumental Variables and the Problem of Endogeneity

Instrumental Variables and the Problem of Endogeneity Instrumental Variables and the Problem of Endogeneity September 15, 2015 1 / 38 Exogeneity: Important Assumption of OLS In a standard OLS framework, y = xβ + ɛ (1) and for unbiasedness we need E[x ɛ] =

More information