Chapter 3: Statistical methods for estimation and testing
Key reference: Statistical Methods in Bioinformatics by Ewens & Grant (2001).

3.1 Motivation:
1. Identify genes that show evidence of differential expression (DE). In general, a study may involve one or a small number of genes, or many thousands of genes as in microarray experiments. In microarray experiments, a primary goal is to rank genes according to the evidence of DE.
2. Match DNA sequences.

For microarrays, we need to:
(i) select one or more statistics to rank genes in order of evidence of DE, from strongest to weakest; and
(ii) choose a critical value for the ranking statistic, above which any value is considered statistically significant and the gene is therefore declared DE.

There are practical constraints: in a typical study, only a limited number of genes can be followed up for further study. Note that in Chapter 3 we assume the data have been appropriately normalized using the methods studied in Chapter 2.

3.2 The simplest comparison:
Two mRNA samples, A and B, are hybridised to a single array (= slide), and the array is replicated n times. Assume each gene is spotted once on each slide. Replication is very important.

(i) A common approach to analysis is to calculate the average log ratio $\bar{M}_j$ for each gene j and sort the genes according to $|\bar{M}_j|$. But this is a poor choice because it ignores the variability in expression levels for each gene. The variability of a gene's M-values across replicates differs from gene to gene, and genes with larger variance have a good chance of giving a large $|\bar{M}_j|$ even if they are not DE.

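(ii) It is better to use the one-sample t statistic. For each gene j = 1, ..., g, calculate
$$t_j = \frac{\bar{M}_j}{s_j/\sqrt{n}},$$
where $s_j$ is the standard deviation of the n replicate M-values for gene j. When applied to microarray data this is in fact a paired t statistic. Why? Because each M-value is itself a difference, $\log A - \log B$, between two samples hybridised to the same spot on the same slide, so the n slides act as n matched pairs.
The null hypothesis for each gene j = 1, ..., g is $H_0: \mu_j = 0$ versus $H_A: \mu_j \neq 0$, where $\mu_j$ is the expected log ratio for gene j.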
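A minimal sketch of the gene-wise one-sample t statistic, assuming the normalised log ratios are held in a NumPy array `M` with one row per replicate slide and one column per gene (this layout, the helper name `one_sample_t` and the toy values are all hypothetical). It ranks the toy genes both by $|\bar{M}_j|$ and by $|t_j|$, illustrating how a highly variable gene can top the first list yet fall to the bottom of the second, and cross-checks the statistic against `scipy.stats.ttest_1samp`.

```python
import numpy as np
from scipy import stats


def one_sample_t(M):
    """Gene-wise one-sample (paired) t statistic.

    M: array of shape (n, g), n replicate slides by g genes; each entry is a
    normalised log ratio for one gene on one slide (layout is an assumption).
    """
    n = M.shape[0]
    M_bar = M.mean(axis=0)           # average log ratio per gene
    s = M.std(axis=0, ddof=1)        # sample SD of the replicate M-values
    return M_bar / (s / np.sqrt(n))


# Hypothetical toy data: 4 replicate slides, 3 genes.
M = np.array([
    [2.5, 0.5, -1.2],
    [-1.0, 0.6, -1.0],
    [3.5, 0.4, -1.1],
    [-0.5, 0.5, -0.9],
])

t = one_sample_t(M)
rank_by_M = np.argsort(-np.abs(M.mean(axis=0)))  # gene indices, largest |M_bar| first
rank_by_t = np.argsort(-np.abs(t))               # gene indices, largest |t| first

# Cross-check against SciPy's one-sample t-test (H0: mu_j = 0).
t_scipy = stats.ttest_1samp(M, popmean=0.0, axis=0).statistic
print("rank by |M_bar|:", rank_by_M, " rank by |t|:", rank_by_t)
```

Gene 0 has the largest average log ratio but very inconsistent replicates, so switching from $|\bar{M}_j|$ to $|t_j|$ moves it from first place to last.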

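(iii) Penalised t statistic: this is a compromise between using $\bar{M}_j$ and the $t_j$ statistic. The aim is to avoid spuriously large $t_j$ values resulting from unrealistically small $s_j$. The penalty a is added either to the estimated standard deviation $s_j$:
$$t_j = \frac{\bar{M}_j}{(a + s_j)/\sqrt{n}},$$
or to the estimated variance $s_j^2$:
$$t_j = \frac{\bar{M}_j}{\sqrt{(a + s_j^2)/n}}.$$
With a = 0 both forms reduce to the ordinary t statistic; as a becomes large relative to the $s_j$, the ranking approaches the ranking by $|\bar{M}_j|$.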
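The two penalised forms can be sketched as follows, assuming the same replicate-by-gene array layout; `penalised_t` and the toy values are hypothetical, and the penalty `a` is simply passed in rather than estimated. Gene 0 has a modest average log ratio but almost identical replicates, so its ordinary t statistic is huge; a positive penalty hands first place to gene 1.

```python
import numpy as np


def penalised_t(M, a, on="sd"):
    """Penalised one-sample t statistic for each gene (column) of M.

    on="sd":  t_j = M_bar_j / ((a + s_j) / sqrt(n))
    on="var": t_j = M_bar_j / sqrt((a + s_j**2) / n)
    With a = 0 both forms give the ordinary one-sample t statistic.
    """
    n = M.shape[0]
    M_bar = M.mean(axis=0)
    s = M.std(axis=0, ddof=1)
    if on == "sd":
        return M_bar / ((a + s) / np.sqrt(n))
    if on == "var":
        return M_bar / np.sqrt((a + s**2) / n)
    raise ValueError("on must be 'sd' or 'var'")


# Hypothetical toy data: 4 replicate slides, 2 genes.
M = np.array([
    [0.30, 1.5],
    [0.31, 2.5],
    [0.30, 1.0],
    [0.31, 3.0],
])

t_plain = penalised_t(M, a=0.0)            # ordinary t: gene 0 ranks first
t_sd = penalised_t(M, a=0.5, on="sd")      # penalty added to the SD
t_var = penalised_t(M, a=0.25, on="var")   # penalty added to the variance
print(np.round(t_plain, 2), np.round(t_sd, 2), np.round(t_var, 2))
```

The penalty values 0.5 and 0.25 are arbitrary here; they only need to be large compared with gene 0's tiny standard deviation to change the ranking.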

The penalty can be estimated in different ways; e.g., a may be taken to be the 90th percentile of the $s_j$ values. The choice is driven by empirical rather than theoretical considerations. Intensity-dependent penalties are also applied in practice. Always analyse the unadjusted t statistics as well. The wisdom of penalising is open to debate.
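A small simulation can make the data-driven choice of a concrete. It assumes hypothetical simulated log ratios (gene-specific standard deviations, with the first 20 genes shifted to be DE), sets a to the 90th percentile of the $s_j$ and, in keeping with the advice to analyse the unadjusted statistics too, puts the top-20 lists from the ordinary and penalised t statistics side by side. The numbers of genes and replicates and the size of the shift are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, g, k = 4, 1000, 20
sigma = rng.uniform(0.1, 1.0, size=g)      # hypothetical gene-specific SDs
M = rng.normal(0.0, sigma, size=(n, g))    # replicate log ratios, no DE yet
M[:, :k] += 1.0                            # first k genes made DE (assumption)

M_bar = M.mean(axis=0)
s = M.std(axis=0, ddof=1)
a = np.percentile(s, 90)                   # penalty: 90th percentile of the s_j

t_plain = M_bar / (s / np.sqrt(n))         # unadjusted t
t_pen = M_bar / ((a + s) / np.sqrt(n))     # penalty added to the SD

top_plain = set(np.argsort(-np.abs(t_plain))[:k].tolist())
top_pen = set(np.argsort(-np.abs(t_pen))[:k].tolist())
truly_de = set(range(k))
print(f"a = {a:.3f}")
print("true DE genes in top list, unadjusted:", len(top_plain & truly_de))
print("true DE genes in top list, penalised: ", len(top_pen & truly_de))
print("overlap of the two top lists:", len(top_plain & top_pen))
```

Which list recovers more of the shifted genes depends on the simulation settings; the point is only to compare the two rankings directly rather than trust either one alone.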

[Figure: (i) standard error of M versus average gene intensity (y-axis: standard deviation of log ratios; x-axis: average gene intensity); (ii) normal qq-plot of the penalised t statistic (sample quantiles versus theoretical quantiles).]

Assessing differential expression
3.3 Ranking genes:
Suppose we calculate the t statistic for every gene on the array (this could be any number up to 20,000) and rank the genes by the absolute values of their t statistics. This gives a ranked list in which the largest values of |t| provide the strongest evidence of differential expression. However, we have, in effect, just performed 20,000 t-tests! The next step is to choose a cut-off value above which genes will be flagged as statistically significant. How should we do this?

In determining which genes are truly DE, the aim is to control for the large amount of multiple testing that is inherent in conducting a test for each gene. See Chapter 4 on Multiple Comparisons.
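To see the scale of the problem, the sketch below simulates 20,000 genes for which every null hypothesis is true (hypothetical standard normal log ratios, n = 5 replicates, hence 4 degrees of freedom) and applies an unadjusted two-sided t-test at level 0.05 to each gene. Roughly 0.05 × 20,000 = 1,000 genes are expected to be flagged even though none is DE.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, g, alpha = 5, 20_000, 0.05
M = rng.normal(size=(n, g))                  # no gene is DE: all H0 true

t = M.mean(axis=0) / (M.std(axis=0, ddof=1) / np.sqrt(n))
crit = stats.t.ppf(1 - alpha / 2, df=n - 1)  # per-gene two-sided cut-off
n_flagged = int(np.sum(np.abs(t) > crit))    # every flagged gene is a false positive

print(f"cut-off |t| > {crit:.3f}")
print(f"expected false positives: {alpha * g:.0f}, observed: {n_flagged}")
```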

An informal, graphical method for assessing significance is to display the sorted t statistics in a normal quantile-quantile (qq) plot, or in a t-distribution qq-plot. The idea is to look for points that deviate markedly from the line. The example on the next slide shows a qq-plot for t statistics with 4 degrees of freedom; the experiment compared two mutant cell lines in leukaemic mice on each slide. The method implicitly assumes that M is roughly normal and that the genes behave independently (which may not be true).

[Figure: t qq-plot. y-axis: sorted t statistics, t.statistics1a[, 1]; x-axis: t quantiles, qt(ppoints(t.statistics1a[, 1]), df = 4).]
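The axis labels suggest the plot was produced in R by plotting the sorted t statistics against `qt(ppoints(...), df = 4)`. A rough Python equivalent of the plot coordinates is sketched below; the `ppoints` helper is written to mirror the default plotting positions of R's `ppoints()` (worth checking against R's documentation), and the t statistics are hypothetical simulated values with five artificial outliers standing in for DE genes.

```python
import numpy as np
from scipy import stats


def ppoints(n):
    """Plotting positions (i - a) / (n + 1 - 2a), i = 1..n, as in R's ppoints()."""
    a = 3 / 8 if n <= 10 else 1 / 2
    return (np.arange(1, n + 1) - a) / (n + 1 - 2 * a)


def t_qq_coords(t_stats, df):
    """Return (theoretical, sample) quantile pairs for a t qq-plot."""
    sample = np.sort(np.asarray(t_stats, dtype=float))
    theoretical = stats.t.ppf(ppoints(sample.size), df=df)
    return theoretical, sample


rng = np.random.default_rng(2)
t_stats = rng.standard_t(df=4, size=2000)        # hypothetical non-DE genes
t_stats[:5] = [12.0, 10.5, -11.0, 9.0, -9.5]     # hypothetical DE genes
theo, samp = t_qq_coords(t_stats, df=4)

# Points far from the line y = x are candidates for DE.
dev = samp - theo
print(np.round(samp[np.argsort(-np.abs(dev))[:5]], 2))
```

Passing `theo` and `samp` to any scatter-plot routine, together with the line y = x, gives the informal visual check described above.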

3.4 More complex experiments:
One of the most commonly used designs in biological experiments is the reference design. The simplest such design compares two mRNA samples A and B through a reference sample, Ref: A is compared with Ref, and B is compared with Ref. In terms of log ratios, for each gene j we now have
$$M_{Aj} = \log(a_j/\mathrm{Ref}), \qquad M_{Bj} = \log(b_j/\mathrm{Ref}),$$
where A and B are labelled red and Ref is labelled green. The difference of interest is $M_{Aj} - M_{Bj}$: since both log ratios are taken against the same reference, the $\log \mathrm{Ref}$ term cancels on average, leaving a comparison of A with B.

For ease of notation, we will drop the subscript j. In microarray experiments there will be $n_1$ replicate arrays comparing (i.e. hybridising) A with Ref, and $n_2$ replicate arrays comparing B with Ref. The test statistic will then be based on $\bar{M}_A - \bar{M}_B$. We know that the optimal normal-theory statistic is the two-sample t statistic
$$t = \frac{\bar{M}_A - \bar{M}_B}{s_p\sqrt{1/n_1 + 1/n_2}},$$
where $s_p$ is the pooled sample standard deviation, $s_p^2 = \frac{(n_1 - 1)s_A^2 + (n_2 - 1)s_B^2}{n_1 + n_2 - 2}$, and $s_A$, $s_B$ are the sample standard deviations of the A and B log ratios.
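The gene-wise two-sample t statistic for the reference design can be sketched as follows, assuming `M_A` holds the log(A/Ref) ratios from the $n_1$ A-versus-Ref slides and `M_B` the log(B/Ref) ratios from the $n_2$ B-versus-Ref slides, each with one column per gene. The function name, the simulated data and the shift given to gene 0 are hypothetical; the result is cross-checked against `scipy.stats.ttest_ind` with `equal_var=True`.

```python
import numpy as np
from scipy import stats


def pooled_two_sample_t(M_A, M_B):
    """Gene-wise two-sample t statistic with pooled SD.

    M_A: (n1, g) log ratios log(A/Ref); M_B: (n2, g) log ratios log(B/Ref).
    """
    n1, n2 = M_A.shape[0], M_B.shape[0]
    s2_A = M_A.var(axis=0, ddof=1)
    s2_B = M_B.var(axis=0, ddof=1)
    s_p = np.sqrt(((n1 - 1) * s2_A + (n2 - 1) * s2_B) / (n1 + n2 - 2))
    diff = M_A.mean(axis=0) - M_B.mean(axis=0)   # M_bar_A - M_bar_B
    return diff / (s_p * np.sqrt(1 / n1 + 1 / n2))


rng = np.random.default_rng(3)
M_A = rng.normal(0.0, 0.5, size=(8, 100))   # 8 hypothetical A-vs-Ref slides
M_B = rng.normal(0.0, 0.5, size=(8, 100))   # 8 hypothetical B-vs-Ref slides
M_A[:, 0] += 3.0                            # gene 0 made DE for illustration

t = pooled_two_sample_t(M_A, M_B)
t_scipy = stats.ttest_ind(M_A, M_B, axis=0, equal_var=True).statistic
print("gene with largest |t|:", int(np.argmax(np.abs(t))))
```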

The null hypothesis for each gene is that the expression levels in the two cell types A and B are the same, i.e., $H_0: \mu_A = \mu_B$ versus $H_A: \mu_A \neq \mu_B$. The pooled standard deviation $s_p$ is sometimes replaced by the penalised pooled sample standard deviation, $\tilde{s}_p = \sqrt{a + s_p^2}$.
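A sketch of the penalised two-sample statistic, assuming the penalised pooled standard deviation takes the form $\sqrt{a + s_p^2}$ and that the penalty `a` is supplied by the user; `penalised_two_sample_t` and the three-slides-per-group toy data are hypothetical. Gene 0 has a tiny difference between groups but nearly identical replicates, so without a penalty it outranks gene 1.

```python
import numpy as np


def penalised_two_sample_t(M_A, M_B, a):
    """Two-sample t with penalised pooled SD sqrt(a + s_p**2), gene-wise."""
    n1, n2 = M_A.shape[0], M_B.shape[0]
    s2_p = ((n1 - 1) * M_A.var(axis=0, ddof=1)
            + (n2 - 1) * M_B.var(axis=0, ddof=1)) / (n1 + n2 - 2)
    s_pen = np.sqrt(a + s2_p)                   # penalised pooled SD
    diff = M_A.mean(axis=0) - M_B.mean(axis=0)
    return diff / (s_pen * np.sqrt(1 / n1 + 1 / n2))


# Hypothetical toy data: 3 slides per group, 2 genes (columns).
M_A = np.array([[0.20, 1.0], [0.21, 2.0], [0.20, 1.5]])
M_B = np.array([[0.10, 0.0], [0.11, 0.5], [0.10, -0.5]])

t_plain = penalised_two_sample_t(M_A, M_B, a=0.0)   # ordinary pooled t
t_pen = penalised_two_sample_t(M_A, M_B, a=0.1)     # penalised
print(np.round(t_plain, 2), np.round(t_pen, 2))
```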

The example on the next page is from Dudoit et al. (2002). It shows a histogram of the observed two-sample t statistics and the corresponding normal qq-plot from a study comparing lipid levels in treated (A) and control (B) mice. There were 16 slides in the experiment, 8 for treated and 8 for control mice, each hybridised against a common reference pool of mouse DNA (Ref). The points lying off the line are candidates for differential expression.

[Figure: histogram and normal qq-plot of the two-sample t statistics (label: ApoA1).]

Remarks on t statistics
The t statistic has the advantage of extending to more complex situations, such as factorial designs and multiple regression. The above approach to analysis can be generalised to more than two samples using F statistics, and so on. However, the two-sample t statistic assumes that the random variables $M_A$ and $M_B$ are normally distributed with equal variances, which may not be justified.
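One concrete link between the t and F statistics: with only two samples, the one-way analysis-of-variance F statistic equals the square of the pooled two-sample t statistic, and the same F test extends directly to three or more samples. The sketch checks this for a single hypothetical gene using `scipy.stats.ttest_ind` and `scipy.stats.f_oneway`; the data and the third sample C are invented for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
M_A = rng.normal(0.3, 0.5, size=8)   # one gene, 8 hypothetical A-vs-Ref slides
M_B = rng.normal(0.0, 0.5, size=8)   # one gene, 8 hypothetical B-vs-Ref slides
M_C = rng.normal(0.1, 0.5, size=8)   # a hypothetical third sample, C-vs-Ref

t = stats.ttest_ind(M_A, M_B, equal_var=True).statistic
F_two = stats.f_oneway(M_A, M_B).statistic          # two samples: F = t**2
F_three = stats.f_oneway(M_A, M_B, M_C).statistic   # generalises to more samples
print(f"t**2 = {t**2:.4f}, F (A, B) = {F_two:.4f}, F (A, B, C) = {F_three:.4f}")
```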

We can relax the equal-variance assumption by using the unequal-variance form of the two-sample t statistic (Welch's t), whose null distribution is only approximately t:
$$t = \frac{\bar{M}_A - \bar{M}_B}{\sqrt{s_A^2/n_1 + s_B^2/n_2}}.$$
But there are better alternative approaches.
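A sketch of the unequal-variance (Welch) statistic, together with the Welch-Satterthwaite approximate degrees of freedom commonly used for its reference t distribution; that df formula is standard but is not given in the text above. Function names and the simulated data (A replicates much less variable than B) are hypothetical, and the statistic is cross-checked against `scipy.stats.ttest_ind` with `equal_var=False`.

```python
import numpy as np
from scipy import stats


def welch_t(M_A, M_B):
    """Gene-wise unequal-variance two-sample t statistic."""
    n1, n2 = M_A.shape[0], M_B.shape[0]
    v_A = M_A.var(axis=0, ddof=1) / n1          # s_A^2 / n1
    v_B = M_B.var(axis=0, ddof=1) / n2          # s_B^2 / n2
    return (M_A.mean(axis=0) - M_B.mean(axis=0)) / np.sqrt(v_A + v_B)


def welch_df(M_A, M_B):
    """Welch-Satterthwaite approximate degrees of freedom, gene-wise."""
    n1, n2 = M_A.shape[0], M_B.shape[0]
    v_A = M_A.var(axis=0, ddof=1) / n1
    v_B = M_B.var(axis=0, ddof=1) / n2
    return (v_A + v_B) ** 2 / (v_A**2 / (n1 - 1) + v_B**2 / (n2 - 1))


rng = np.random.default_rng(5)
M_A = rng.normal(0.0, 0.2, size=(6, 50))    # 6 hypothetical A-vs-Ref slides
M_B = rng.normal(0.0, 1.0, size=(10, 50))   # 10 hypothetical B-vs-Ref slides

t = welch_t(M_A, M_B)
df = welch_df(M_A, M_B)
t_scipy = stats.ttest_ind(M_A, M_B, axis=0, equal_var=False).statistic
p = 2 * stats.t.sf(np.abs(t), df)           # approximate two-sided p-values
```

The approximate degrees of freedom always lie between $\min(n_1, n_2) - 1$ and $n_1 + n_2 - 2$, moving towards the smaller end when one group's variance dominates.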

The rest of this Chapter...
Nonparametric or distribution-free alternatives to the two-sample t statistic are popular, and in 3.5 we consider two of these: (i) the Mann-Whitney test; and (ii) the permutation test.
Computer-intensive testing and estimation procedures are also popular, and in 3.6 we will study Bootstrap techniques. In 3.7 we will study Bayesian estimation and testing procedures.

Single gene analysis of differential expression. Giorgio Valentini

Single gene analysis of differential expression. Giorgio Valentini Single gene analysis of differential expression Giorgio Valentini valenti@disi.unige.it Comparing two conditions Each condition may be represented by one or more RNA samples. Using cdna microarrays, samples

More information

Exam details. Final Review Session. Things to Review

Exam details. Final Review Session. Things to Review Exam details Final Review Session Short answer, similar to book problems Formulae and tables will be given You CAN use a calculator Date and Time: Dec. 7, 006, 1-1:30 pm Location: Osborne Centre, Unit

More information

Design of microarray experiments

Design of microarray experiments Design of microarray experiments Ulrich ansmann mansmann@imbi.uni-heidelberg.de Practical microarray analysis September Heidelberg Heidelberg, September otivation The lab biologist and theoretician need

More information

Statistical Applications in Genetics and Molecular Biology

Statistical Applications in Genetics and Molecular Biology Statistical Applications in Genetics and Molecular Biology Volume 6, Issue 1 2007 Article 28 A Comparison of Methods to Control Type I Errors in Microarray Studies Jinsong Chen Mark J. van der Laan Martyn

More information

Rank-Based Methods. Lukas Meier

Rank-Based Methods. Lukas Meier Rank-Based Methods Lukas Meier 20.01.2014 Introduction Up to now we basically always used a parametric family, like the normal distribution N (µ, σ 2 ) for modeling random data. Based on observed data

More information

Single gene analysis of differential expression

Single gene analysis of differential expression Single gene analysis of differential expression Giorgio Valentini DSI Dipartimento di Scienze dell Informazione Università degli Studi di Milano valentini@dsi.unimi.it Comparing two conditions Each condition

More information

T.I.H.E. IT 233 Statistics and Probability: Sem. 1: 2013 ESTIMATION AND HYPOTHESIS TESTING OF TWO POPULATIONS

T.I.H.E. IT 233 Statistics and Probability: Sem. 1: 2013 ESTIMATION AND HYPOTHESIS TESTING OF TWO POPULATIONS ESTIMATION AND HYPOTHESIS TESTING OF TWO POPULATIONS In our work on hypothesis testing, we used the value of a sample statistic to challenge an accepted value of a population parameter. We focused only

More information

GS Analysis of Microarray Data

GS Analysis of Microarray Data GS01 0163 Analysis of Microarray Data Keith Baggerly and Kevin Coombes Section of Bioinformatics Department of Biostatistics and Applied Mathematics UT M. D. Anderson Cancer Center kabagg@mdanderson.org

More information

The miss rate for the analysis of gene expression data

The miss rate for the analysis of gene expression data Biostatistics (2005), 6, 1,pp. 111 117 doi: 10.1093/biostatistics/kxh021 The miss rate for the analysis of gene expression data JONATHAN TAYLOR Department of Statistics, Stanford University, Stanford,

More information

Lesson 11. Functional Genomics I: Microarray Analysis

Lesson 11. Functional Genomics I: Microarray Analysis Lesson 11 Functional Genomics I: Microarray Analysis Transcription of DNA and translation of RNA vary with biological conditions 3 kinds of microarray platforms Spotted Array - 2 color - Pat Brown (Stanford)

More information

Design of Microarray Experiments. Xiangqin Cui

Design of Microarray Experiments. Xiangqin Cui Design of Microarray Experiments Xiangqin Cui Experimental design Experimental design: is a term used about efficient methods for planning the collection of data, in order to obtain the maximum amount

More information

Permutation Tests. Noa Haas Statistics M.Sc. Seminar, Spring 2017 Bootstrap and Resampling Methods

Permutation Tests. Noa Haas Statistics M.Sc. Seminar, Spring 2017 Bootstrap and Resampling Methods Permutation Tests Noa Haas Statistics M.Sc. Seminar, Spring 2017 Bootstrap and Resampling Methods The Two-Sample Problem We observe two independent random samples: F z = z 1, z 2,, z n independently of

More information

Statistics Handbook. All statistical tables were computed by the author.

Statistics Handbook. All statistical tables were computed by the author. Statistics Handbook Contents Page Wilcoxon rank-sum test (Mann-Whitney equivalent) Wilcoxon matched-pairs test 3 Normal Distribution 4 Z-test Related samples t-test 5 Unrelated samples t-test 6 Variance

More information

Transition Passage to Descriptive Statistics 28

Transition Passage to Descriptive Statistics 28 viii Preface xiv chapter 1 Introduction 1 Disciplines That Use Quantitative Data 5 What Do You Mean, Statistics? 6 Statistics: A Dynamic Discipline 8 Some Terminology 9 Problems and Answers 12 Scales of

More information

Identifying Bio-markers for EcoArray

Identifying Bio-markers for EcoArray Identifying Bio-markers for EcoArray Ashish Bhan, Keck Graduate Institute Mustafa Kesir and Mikhail B. Malioutov, Northeastern University February 18, 2010 1 Introduction This problem was presented by

More information

PSY 307 Statistics for the Behavioral Sciences. Chapter 20 Tests for Ranked Data, Choosing Statistical Tests

PSY 307 Statistics for the Behavioral Sciences. Chapter 20 Tests for Ranked Data, Choosing Statistical Tests PSY 307 Statistics for the Behavioral Sciences Chapter 20 Tests for Ranked Data, Choosing Statistical Tests What To Do with Non-normal Distributions Tranformations (pg 382): The shape of the distribution

More information

Comparison of Two Population Means

Comparison of Two Population Means Comparison of Two Population Means Esra Akdeniz March 15, 2015 Independent versus Dependent (paired) Samples We have independent samples if we perform an experiment in two unrelated populations. We have

More information

STATISTICS 4, S4 (4769) A2

STATISTICS 4, S4 (4769) A2 (4769) A2 Objectives To provide students with the opportunity to explore ideas in more advanced statistics to a greater depth. Assessment Examination (72 marks) 1 hour 30 minutes There are four options

More information

3 Joint Distributions 71

3 Joint Distributions 71 2.2.3 The Normal Distribution 54 2.2.4 The Beta Density 58 2.3 Functions of a Random Variable 58 2.4 Concluding Remarks 64 2.5 Problems 64 3 Joint Distributions 71 3.1 Introduction 71 3.2 Discrete Random

More information

Business Statistics. Lecture 10: Course Review

Business Statistics. Lecture 10: Course Review Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,

More information

Analysis of variance (ANOVA) Comparing the means of more than two groups

Analysis of variance (ANOVA) Comparing the means of more than two groups Analysis of variance (ANOVA) Comparing the means of more than two groups Example: Cost of mating in male fruit flies Drosophila Treatments: place males with and without unmated (virgin) females Five treatments

More information

Selection should be based on the desired biological interpretation!

Selection should be based on the desired biological interpretation! Statistical tools to compare levels of parasitism Jen_ Reiczigel,, Lajos Rózsa Hungary What to compare? The prevalence? The mean intensity? The median intensity? Or something else? And which statistical

More information

Distribution-Free Procedures (Devore Chapter Fifteen)

Distribution-Free Procedures (Devore Chapter Fifteen) Distribution-Free Procedures (Devore Chapter Fifteen) MATH-5-01: Probability and Statistics II Spring 018 Contents 1 Nonparametric Hypothesis Tests 1 1.1 The Wilcoxon Rank Sum Test........... 1 1. Normal

More information

Statistical Applications in Genetics and Molecular Biology

Statistical Applications in Genetics and Molecular Biology Statistical Applications in Genetics and Molecular Biology Volume 5, Issue 1 2006 Article 28 A Two-Step Multiple Comparison Procedure for a Large Number of Tests and Multiple Treatments Hongmei Jiang Rebecca

More information

Experimental Design. Experimental design. Outline. Choice of platform Array design. Target samples

Experimental Design. Experimental design. Outline. Choice of platform Array design. Target samples Experimental Design Credit for some of today s materials: Jean Yang, Terry Speed, and Christina Kendziorski Experimental design Choice of platform rray design Creation of probes Location on the array Controls

More information

A Sparse Solution Approach to Gene Selection for Cancer Diagnosis Using Microarray Data

A Sparse Solution Approach to Gene Selection for Cancer Diagnosis Using Microarray Data A Sparse Solution Approach to Gene Selection for Cancer Diagnosis Using Microarray Data Yoonkyung Lee Department of Statistics The Ohio State University http://www.stat.ohio-state.edu/ yklee May 13, 2005

More information

Unsupervised machine learning

Unsupervised machine learning Chapter 9 Unsupervised machine learning Unsupervised machine learning (a.k.a. cluster analysis) is a set of methods to assign objects into clusters under a predefined distance measure when class labels

More information

Unit 14: Nonparametric Statistical Methods

Unit 14: Nonparametric Statistical Methods Unit 14: Nonparametric Statistical Methods Statistics 571: Statistical Methods Ramón V. León 8/8/2003 Unit 14 - Stat 571 - Ramón V. León 1 Introductory Remarks Most methods studied so far have been based

More information

Glossary for the Triola Statistics Series

Glossary for the Triola Statistics Series Glossary for the Triola Statistics Series Absolute deviation The measure of variation equal to the sum of the deviations of each value from the mean, divided by the number of values Acceptance sampling

More information

GROUPED DATA E.G. FOR SAMPLE OF RAW DATA (E.G. 4, 12, 7, 5, MEAN G x / n STANDARD DEVIATION MEDIAN AND QUARTILES STANDARD DEVIATION

GROUPED DATA E.G. FOR SAMPLE OF RAW DATA (E.G. 4, 12, 7, 5, MEAN G x / n STANDARD DEVIATION MEDIAN AND QUARTILES STANDARD DEVIATION FOR SAMPLE OF RAW DATA (E.G. 4, 1, 7, 5, 11, 6, 9, 7, 11, 5, 4, 7) BE ABLE TO COMPUTE MEAN G / STANDARD DEVIATION MEDIAN AND QUARTILES Σ ( Σ) / 1 GROUPED DATA E.G. AGE FREQ. 0-9 53 10-19 4...... 80-89

More information

Optimal normalization of DNA-microarray data

Optimal normalization of DNA-microarray data Optimal normalization of DNA-microarray data Daniel Faller 1, HD Dr. J. Timmer 1, Dr. H. U. Voss 1, Prof. Dr. Honerkamp 1 and Dr. U. Hobohm 2 1 Freiburg Center for Data Analysis and Modeling 1 F. Hoffman-La

More information

Two-Color Microarray Experimental Design Notation. Simple Examples of Analysis for a Single Gene. Microarray Experimental Design Notation

Two-Color Microarray Experimental Design Notation. Simple Examples of Analysis for a Single Gene. Microarray Experimental Design Notation Simple Examples of Analysis for a Single Gene wo-olor Microarray Experimental Design Notation /3/0 opyright 0 Dan Nettleton Microarray Experimental Design Notation Microarray Experimental Design Notation

More information

Design of microarray experiments

Design of microarray experiments Design of microarray experiments Ulrich Mansmann mansmann@imbi.uni-heidelberg.de Practical microarray analysis March 23 Heidelberg Heidelberg, March 23 Experiments Scientists deal mostly with experiments

More information

Contents. Acknowledgments. xix

Contents. Acknowledgments. xix Table of Preface Acknowledgments page xv xix 1 Introduction 1 The Role of the Computer in Data Analysis 1 Statistics: Descriptive and Inferential 2 Variables and Constants 3 The Measurement of Variables

More information

Normalization. Example of Replicate Data. Biostatistics Rafael A. Irizarry

Normalization. Example of Replicate Data. Biostatistics Rafael A. Irizarry This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this

More information

Chapter 7 Comparison of two independent samples

Chapter 7 Comparison of two independent samples Chapter 7 Comparison of two independent samples 7.1 Introduction Population 1 µ σ 1 1 N 1 Sample 1 y s 1 1 n 1 Population µ σ N Sample y s n 1, : population means 1, : population standard deviations N

More information

Sample Size Estimation for Studies of High-Dimensional Data

Sample Size Estimation for Studies of High-Dimensional Data Sample Size Estimation for Studies of High-Dimensional Data James J. Chen, Ph.D. National Center for Toxicological Research Food and Drug Administration June 3, 2009 China Medical University Taichung,

More information

Statistical. Psychology

Statistical. Psychology SEVENTH у *i km m it* & П SB Й EDITION Statistical M e t h o d s for Psychology D a v i d C. Howell University of Vermont ; \ WADSWORTH f% CENGAGE Learning* Australia Biaall apan Korea Меяко Singapore

More information

Correlation and Regression (Excel 2007)

Correlation and Regression (Excel 2007) Correlation and Regression (Excel 2007) (See Also Scatterplots, Regression Lines, and Time Series Charts With Excel 2007 for instructions on making a scatterplot of the data and an alternate method of

More information

Exam: high-dimensional data analysis February 28, 2014

Exam: high-dimensional data analysis February 28, 2014 Exam: high-dimensional data analysis February 28, 2014 Instructions: - Write clearly. Scribbles will not be deciphered. - Answer each main question (not the subquestions) on a separate piece of paper.

More information




