Chapter 3: Statistical methods for estimation and testing. Key reference: Statistical methods in bioinformatics by Ewens & Grant (2001).
- Herbert Walton
3.1 Motivation:
1. Identify genes which show evidence of differential expression (DE). In general, a study may involve one or a small number of genes, or many thousands of genes as in microarray experiments. In microarray experiments, a primary goal is to rank genes according to evidence of DE.
2. Matching DNA sequences.
For microarrays, we need to:
(i) select one or more statistics to rank genes in order of evidence of DE, from strongest to weakest; and
(ii) choose a critical value for the ranking statistic, above which any value is considered to be statistically significant, and therefore DE.
There are practical constraints: in a typical study, only a limited number of genes can be followed up for further study.
Note that in Chapter 3, we assume the data have been appropriately normalized using the methods we studied in Chapter 2.
3.2 The simplest comparison:
Two mRNA samples A, B are hybridised to a single array (= slide), and the array is replicated n times. Assume each gene is spotted once on each slide. Replication is very important.
(i) A common approach to analysis is to calculate the average log ratio $\bar{M}_j$ for each gene, and sort the genes according to the absolute value of $\bar{M}_j$. But this is a poor choice because it ignores the variability in expression levels for each gene. The variability of the $M$ values for a gene over replicates varies from gene to gene, and genes with larger variance have a good chance of giving a large $\bar{M}_j$ statistic even if they are not DE.
(ii) It is better to use the single sample t statistic. For each gene j, calculate:
$$t_j = \frac{\bar{M}_j}{s_j / \sqrt{n}}$$
where $s_j$ is the standard deviation of the $M$ values over the replicates for gene j; j = 1, ..., g genes. This is in fact a paired t statistic when applied to microarray data. Why?
The null hypothesis is $H_0: \mu_j = 0$ vs $H_A: \mu_j \neq 0$ for each gene j = 1, ..., g.
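As a minimal sketch of the contrast between ranking by the average log ratio and ranking by the single sample t statistic, the following uses only the Python standard library. The gene names and log ratios are hypothetical values chosen for illustration, not data from the text.

```python
import math
import statistics

def one_sample_t(log_ratios):
    """Single sample t statistic: mean(M) / (s / sqrt(n)) for one gene."""
    n = len(log_ratios)
    mean_m = statistics.fmean(log_ratios)
    s = statistics.stdev(log_ratios)  # sample standard deviation (n - 1 denominator)
    return mean_m / (s / math.sqrt(n))

# Hypothetical log ratios for two genes over n = 4 replicate slides.
genes = {
    "gene_a": [1.0, 1.2, 0.8, 1.0],    # moderate mean, low variability
    "gene_b": [3.0, -1.0, 4.0, -0.8],  # larger mean, high variability
}

# Rank by |average log ratio| and by |t|, strongest evidence first.
by_mean = sorted(genes, key=lambda g: abs(statistics.fmean(genes[g])), reverse=True)
by_t = sorted(genes, key=lambda g: abs(one_sample_t(genes[g])), reverse=True)

print("ranked by |mean M|:", by_mean)
print("ranked by |t|:     ", by_t)
```

In this made-up example the noisy gene ranks first by average log ratio, but the consistent gene ranks first once variability is taken into account.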
(iii) Penalised t statistic: this is a compromise between using $\bar{M}_j$ and the $t_j$ statistic. The aim is to avoid spuriously large $t_j$ resulting from unrealistically small $s_j$.
The penalty is applied to the estimated standard deviation $s_j$:
$$t_j = \frac{\bar{M}_j}{(a + s_j)/\sqrt{n}}$$
or to the estimated variance $s_j^2$:
$$t_j = \frac{\bar{M}_j}{\sqrt{(a + s_j^2)/n}}$$
The penalties can be estimated in different ways, e.g., a may be the 90th percentile of the $s_j$ values. The choice is driven by empirical rather than by theoretical considerations. Intensity-dependent penalties are also applied in practice.
Always analyse the unadjusted t statistics. The wisdom of penalising is open to debate.
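The effect of the penalty can be sketched as follows, assuming the form in which the penalty a is added to the standard deviation and a is taken as the 90th percentile of the per-gene standard deviations. The data are hypothetical, and the "inclusive" percentile method is just one of several conventions.

```python
import math
import statistics

def penalised_t(log_ratios, a):
    """Penalised t statistic: mean(M) / ((a + s) / sqrt(n)); a = 0 gives the usual t."""
    n = len(log_ratios)
    s = statistics.stdev(log_ratios)
    return statistics.fmean(log_ratios) / ((a + s) / math.sqrt(n))

# Hypothetical replicate log ratios (n = 4 slides) for five genes.
genes = {
    "g1": [0.10, 0.11, 0.09, 0.10],  # tiny mean, unrealistically small s
    "g2": [2.0, 2.6, 1.5, 2.3],
    "g3": [0.5, -0.4, 0.3, -0.2],
    "g4": [1.0, 0.2, 1.6, 0.8],
    "g5": [-0.3, 0.4, 0.1, -0.6],
}

# Penalty a: 90th percentile of the per-gene standard deviations.
sds = [statistics.stdev(m) for m in genes.values()]
a = statistics.quantiles(sds, n=10, method="inclusive")[8]

plain = {g: penalised_t(m, 0.0) for g, m in genes.items()}
pen = {g: penalised_t(m, a) for g, m in genes.items()}

top_plain = max(plain, key=lambda g: abs(plain[g]))
top_pen = max(pen, key=lambda g: abs(pen[g]))
print("top gene, unadjusted t:", top_plain)
print("top gene, penalised t: ", top_pen)
```

Here the gene with a very small standard deviation tops the unadjusted ranking despite a small average log ratio, and drops down once the penalty is applied. Both rankings are computed, in line with the advice to always look at the unadjusted statistics.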
[Figure: (i) Standard error of M versus average gene intensity (axes: average gene intensity; standard deviation of log ratios). (ii) Normal qq-plot of penalised t statistic (axes: theoretical quantiles; sample quantiles).]
Assessing differential expression
3.3 Ranking genes:
Suppose we calculate the t statistic for every gene in the array (this could be any number up to 20,000) and rank the absolute values of the t statistics. This will give us a ranked list of genes in which the largest values of |t| provide the strongest evidence of differential expression.
However we have, in effect, just performed 20,000 t-tests! The next step is to choose a cut-off value above which genes will be flagged as statistically significant. How should we do this?
In determining which genes are truly DE, the aim is to control for the large amount of multiple testing that comes from conducting a separate test for each gene. See Chapter 4 on Multiple Comparisons.
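To see why the number of tests matters, here is a small worked calculation. It assumes, purely for illustration, that no gene is DE, that the tests are independent, and that each test uses a per-gene significance level of 0.05 (a hypothetical choice, not one made in the text).

```python
g = 20_000     # number of genes tested (upper figure quoted above)
alpha = 0.05   # hypothetical per-gene significance level

# Expected number of genes flagged by chance alone when no gene is DE.
expected_false_positives = g * alpha

# Probability of at least one false positive, assuming independent tests.
p_at_least_one = 1 - (1 - alpha) ** g

print(f"expected false positives: {expected_false_positives:.0f}")
print(f"P(at least one false positive): {p_at_least_one:.4f}")
```

Under these assumptions about a thousand genes would pass an unadjusted 0.05 cut-off by chance, which is far more than can usually be followed up.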
An informal, graphical method that can be used to assess significance is to display the sorted t statistics in a normal quantile-quantile plot, or a t-distribution qq-plot. The idea is to look for points which deviate markedly from the line.
The example on the next slide shows a qq-plot for t statistics with 4 degrees of freedom; the experiment compared two mutant cell lines in leukaemic mice on each slide.
The method implicitly assumes M is roughly normal and that the genes are behaving independently (which may not be true).
[Figure: t qq-plot. Axes: qt(ppoints(t.statistics1a[, 1]), df = 4); t.statistics1a[, 1].]
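The coordinates behind such a plot can be sketched with the Python standard library. This version builds a normal qq-plot, since the standard library has no t-distribution quantile function. The plotting positions (i - 0.5)/n are one common convention, and the t statistics and the cut-off of 2.0 are hypothetical.

```python
from statistics import NormalDist

def normal_qq_points(t_stats):
    """Return (theoretical quantile, observed statistic) pairs for a normal qq-plot."""
    n = len(t_stats)
    observed = sorted(t_stats)
    # Plotting positions (i - 0.5) / n mapped through the standard normal quantile function.
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))

# Hypothetical t statistics; the last value is far out in the tail.
t_stats = [-1.3, -0.6, -0.2, 0.1, 0.4, 0.9, 1.2, 6.5]
points = normal_qq_points(t_stats)

# Informal screen: flag points lying far from the reference line y = x.
# The threshold 2.0 is an arbitrary illustrative choice.
flagged = [obs for theo, obs in points if abs(obs - theo) > 2.0]
print("flagged t statistics:", flagged)
```

In practice the points would be plotted and inspected by eye; the numeric threshold only stands in for "deviates markedly from the line".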
3.4 More complex experiments:
One of the most commonly used designs in biological experiments is the reference design. The simplest such design compares two mRNA samples A and B through a reference sample, Ref. That is, A is compared with Ref, and B is compared with Ref.
In terms of log ratios M, for each gene j we now have
$$M_{Aj} = \log(A_j/\mathrm{Ref}), \qquad M_{Bj} = \log(B_j/\mathrm{Ref})$$
where A, B are labelled red and Ref is labelled green. The difference of interest is $M_{Aj} - M_{Bj}$.
For ease of notation, we will drop the subscript j.
In microarray experiments, there will be $n_1$ replicate arrays comparing (i.e. hybridising) A with Ref, and $n_2$ replicate arrays comparing B with Ref. Then the test statistic will be based on $\bar{M}_A - \bar{M}_B$.
We know that the optimal normal theory statistic is the two-sample t statistic:
$$t = \frac{\bar{M}_A - \bar{M}_B}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
where $s_p$ is the pooled sample standard deviation.
The null hypothesis for each gene is that the expression levels in the two cell types A and B are the same, i.e., $H_0: \mu_A = \mu_B$ versus $H_a: \mu_A \neq \mu_B$.
$s_p$ is sometimes replaced by the penalised pooled sample standard deviation, $\tilde{s}_p = \sqrt{a + s_p^2}$.
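A minimal sketch of the two-sample t statistic for a single gene, using the usual pooled variance estimate. The optional penalty assumes the penalised pooled standard deviation has the form sqrt(a + s_p^2); the log ratios and the value a = 1.0 are hypothetical.

```python
import math
import statistics

def pooled_t(m_a, m_b, a=0.0):
    """Two-sample t statistic with pooled standard deviation.

    a is an optional penalty added to the pooled variance (a = 0 gives the usual t).
    """
    n1, n2 = len(m_a), len(m_b)
    var_a = statistics.variance(m_a)
    var_b = statistics.variance(m_b)
    # Pooled variance: weighted average of the two sample variances.
    s_p2 = ((n1 - 1) * var_a + (n2 - 1) * var_b) / (n1 + n2 - 2)
    s_pen = math.sqrt(a + s_p2)
    diff = statistics.fmean(m_a) - statistics.fmean(m_b)
    return diff / (s_pen * math.sqrt(1 / n1 + 1 / n2))

# Hypothetical log ratios versus Ref for one gene: 4 arrays for A, 4 for B.
m_a = [1.0, 2.0, 3.0, 4.0]
m_b = [0.0, 1.0, 2.0, 3.0]

t = pooled_t(m_a, m_b)
t_pen = pooled_t(m_a, m_b, a=1.0)
print(f"two-sample t: {t:.4f}, penalised: {t_pen:.4f}")
```

Because the penalty only enlarges the denominator, the penalised statistic is never larger in absolute value than the unpenalised one.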
The example on the next page is from Dudoit et al. (2002) and shows a histogram of the observed two-sample t statistics, and the normal qq-plot for two-sample t statistics from a study comparing lipid levels in treated (A) and control mice (B). There were 16 slides in the experiment, 8 for treated and 8 for control mice, each hybridised to a common reference pool of mice DNA (Ref). The points lying off the line are candidates for differential expression.
[Figure: Histogram & qq plot, ApoA1.]
Remarks on t statistics
The t statistic has the advantage of extending to more complex situations, such as factorial designs and multiple regression. The above approach to analysis can be generalised to more than two samples using F statistics, and so on.
However, the two-sample t statistic assumes the random variables $M_A$ and $M_B$ are normally distributed and have equal variances, which may not be justified.
We can relax the equal variance assumption by using the approximate unequal variance form of the two-sample t statistic:
$$t = \frac{\bar{M}_A - \bar{M}_B}{\sqrt{\dfrac{s_A^2}{n_1} + \dfrac{s_B^2}{n_2}}}$$
But there are better, alternative approaches.
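The unequal variance statistic can be sketched in the same way. The degrees of freedom below use the standard Welch-Satterthwaite approximation, which is not given in the text and is included here only as an assumption about how the statistic would be referred to a t distribution. The data are hypothetical.

```python
import math
import statistics

def welch_t(m_a, m_b):
    """Unequal variance two-sample t statistic and approximate degrees of freedom."""
    n1, n2 = len(m_a), len(m_b)
    se2_a = statistics.variance(m_a) / n1  # squared standard error of mean(M_A)
    se2_b = statistics.variance(m_b) / n2  # squared standard error of mean(M_B)
    t = (statistics.fmean(m_a) - statistics.fmean(m_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation (assumption: not stated in the text).
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n1 - 1) + se2_b ** 2 / (n2 - 1))
    return t, df

# Hypothetical log ratios with clearly unequal variability in the two groups.
m_a = [1.0, 2.0, 3.0, 4.0]
m_b = [0.9, 1.0, 1.1, 1.0]

t, df = welch_t(m_a, m_b)
print(f"unequal variance t: {t:.4f}, approximate df: {df:.2f}")
```

With one group much noisier than the other, the approximate degrees of freedom fall close to those of the noisier group alone rather than to n_1 + n_2 - 2.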
The rest of this Chapter...
Nonparametric or distribution-free alternatives to the two-sample t statistic are popular and we consider two of these in 3.5:
- Mann-Whitney test
- Permutation test.
Computer-intensive testing and estimation procedures are also popular, and in 3.6 we will study Bootstrap techniques.
In 3.7, we will study Bayesian estimation and testing procedures.
DATA 8 Summer 2018 Lecture 30 Regression Inference Slides created by John DeNero (denero@berkeley.edu) and Ani Adhikari (adhikari@berkeley.edu) Contributions by Fahad Kamran (fhdkmrn@berkeley.edu) and
More informationBIOINFORMATICS ORIGINAL PAPER
BIOINFORMATICS ORIGINAL PAPER Vol 21 no 11 2005, pages 2684 2690 doi:101093/bioinformatics/bti407 Gene expression A practical false discovery rate approach to identifying patterns of differential expression
More information1. How will an increase in the sample size affect the width of the confidence interval?
Study Guide Concept Questions 1. How will an increase in the sample size affect the width of the confidence interval? 2. How will an increase in the sample size affect the power of a statistical test?
More informationParametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami
Parametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami Parametric Assumptions The observations must be independent. Dependent variable should be continuous
More informationTentative solutions TMA4255 Applied Statistics 16 May, 2015
Norwegian University of Science and Technology Department of Mathematical Sciences Page of 9 Tentative solutions TMA455 Applied Statistics 6 May, 05 Problem Manufacturer of fertilizers a) Are these independent
More informationHypothesis Testing with the Bootstrap. Noa Haas Statistics M.Sc. Seminar, Spring 2017 Bootstrap and Resampling Methods
Hypothesis Testing with the Bootstrap Noa Haas Statistics M.Sc. Seminar, Spring 2017 Bootstrap and Resampling Methods Bootstrap Hypothesis Testing A bootstrap hypothesis test starts with a test statistic
More informationHigh-Throughput Sequencing Course. Introduction. Introduction. Multiple Testing. Biostatistics and Bioinformatics. Summer 2018
High-Throughput Sequencing Course Multiple Testing Biostatistics and Bioinformatics Summer 2018 Introduction You have previously considered the significance of a single gene Introduction You have previously
More information4.1 Hypothesis Testing
4.1 Hypothesis Testing z-test for a single value double-sided and single-sided z-test for one average z-test for two averages double-sided and single-sided t-test for one average the F-parameter and F-table
More informationKruskal-Wallis and Friedman type tests for. nested effects in hierarchical designs 1
Kruskal-Wallis and Friedman type tests for nested effects in hierarchical designs 1 Assaf P. Oron and Peter D. Hoff Department of Statistics, University of Washington, Seattle assaf@u.washington.edu, hoff@stat.washington.edu
More informationChapter 7. Inference for Distributions. Introduction to the Practice of STATISTICS SEVENTH. Moore / McCabe / Craig. Lecture Presentation Slides
Chapter 7 Inference for Distributions Introduction to the Practice of STATISTICS SEVENTH EDITION Moore / McCabe / Craig Lecture Presentation Slides Chapter 7 Inference for Distributions 7.1 Inference for
More informationLinear Regression. In this lecture we will study a particular type of regression model: the linear regression model
1 Linear Regression 2 Linear Regression In this lecture we will study a particular type of regression model: the linear regression model We will first consider the case of the model with one predictor
More informationEstimation of Transformations for Microarray Data Using Maximum Likelihood and Related Methods
Estimation of Transformations for Microarray Data Using Maximum Likelihood and Related Methods Blythe Durbin, Department of Statistics, UC Davis, Davis, CA 95616 David M. Rocke, Department of Applied Science,
More informationprobability George Nicholson and Chris Holmes 31st October 2008
probability George Nicholson and Chris Holmes 31st October 2008 This practical focuses on understanding probabilistic and statistical concepts using simulation and plots in R R. It begins with an introduction
More informationQuick Calculation for Sample Size while Controlling False Discovery Rate with Application to Microarray Analysis
Statistics Preprints Statistics 11-2006 Quick Calculation for Sample Size while Controlling False Discovery Rate with Application to Microarray Analysis Peng Liu Iowa State University, pliu@iastate.edu
More informationA Practical Approach to Inferring Large Graphical Models from Sparse Microarray Data
A Practical Approach to Inferring Large Graphical Models from Sparse Microarray Data Juliane Schäfer Department of Statistics, University of Munich Workshop: Practical Analysis of Gene Expression Data
More informationLecture 7: Hypothesis Testing and ANOVA
Lecture 7: Hypothesis Testing and ANOVA Goals Overview of key elements of hypothesis testing Review of common one and two sample tests Introduction to ANOVA Hypothesis Testing The intent of hypothesis
More information10-810: Advanced Algorithms and Models for Computational Biology. Optimal leaf ordering and classification
10-810: Advanced Algorithms and Models for Computational Biology Optimal leaf ordering and classification Hierarchical clustering As we mentioned, its one of the most popular methods for clustering gene
More informationSPOTTED cdna MICROARRAYS
SPOTTED cdna MICROARRAYS Spot size: 50um - 150um SPOTTED cdna MICROARRAYS Compare the genetic expression in two samples of cells PRINT cdna from one gene on each spot SAMPLES cdna labelled red/green e.g.
More informationNonparametric tests, Bootstrapping
Nonparametric tests, Bootstrapping http://www.isrec.isb-sib.ch/~darlene/embnet/ Hypothesis testing review 2 competing theories regarding a population parameter: NULL hypothesis H ( straw man ) ALTERNATIVEhypothesis
More informationCHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)
FROM: PAGANO, R. R. (007) I. INTRODUCTION: DISTINCTION BETWEEN PARAMETRIC AND NON-PARAMETRIC TESTS Statistical inference tests are often classified as to whether they are parametric or nonparametric Parameter
More informationAdvanced Statistics II: Non Parametric Tests
Advanced Statistics II: Non Parametric Tests Aurélien Garivier ParisTech February 27, 2011 Outline Fitting a distribution Rank Tests for the comparison of two samples Two unrelated samples: Mann-Whitney
More informationBootstrap. ADA1 November 27, / 38
The bootstrap as a statistical method was invented in 1979 by Bradley Efron, one of the most influential statisticians still alive. The idea is nonparametric, but is not based on ranks, and is very computationally
More informationBackground to Statistics
FACT SHEET Background to Statistics Introduction Statistics include a broad range of methods for manipulating, presenting and interpreting data. Professional scientists of all kinds need to be proficient
More informationStatistical testing. Samantha Kleinberg. October 20, 2009
October 20, 2009 Intro to significance testing Significance testing and bioinformatics Gene expression: Frequently have microarray data for some group of subjects with/without the disease. Want to find
More informationInferences About the Difference Between Two Means
7 Inferences About the Difference Between Two Means Chapter Outline 7.1 New Concepts 7.1.1 Independent Versus Dependent Samples 7.1. Hypotheses 7. Inferences About Two Independent Means 7..1 Independent
More informationStat 427/527: Advanced Data Analysis I
Stat 427/527: Advanced Data Analysis I Review of Chapters 1-4 Sep, 2017 1 / 18 Concepts you need to know/interpret Numerical summaries: measures of center (mean, median, mode) measures of spread (sample
More informationStatistical analysis of microarray data: a Bayesian approach
Biostatistics (003), 4, 4,pp. 597 60 Printed in Great Britain Statistical analysis of microarray data: a Bayesian approach RAPHAEL GTTARD University of Washington, Department of Statistics, Box 3543, Seattle,
More informationStatistical Analysis for QBIC Genetics Adapted by Ellen G. Dow 2017
Statistical Analysis for QBIC Genetics Adapted by Ellen G. Dow 2017 I. χ 2 or chi-square test Objectives: Compare how close an experimentally derived value agrees with an expected value. One method to
More informationThe Nonparametric Bootstrap
The Nonparametric Bootstrap The nonparametric bootstrap may involve inferences about a parameter, but we use a nonparametric procedure in approximating the parametric distribution using the ECDF. We use
More informationNonparametric Statistics. Leah Wright, Tyler Ross, Taylor Brown
Nonparametric Statistics Leah Wright, Tyler Ross, Taylor Brown Before we get to nonparametric statistics, what are parametric statistics? These statistics estimate and test population means, while holding
More informationBIOL Biometry LAB 6 - SINGLE FACTOR ANOVA and MULTIPLE COMPARISON PROCEDURES
BIOL 458 - Biometry LAB 6 - SINGLE FACTOR ANOVA and MULTIPLE COMPARISON PROCEDURES PART 1: INTRODUCTION TO ANOVA Purpose of ANOVA Analysis of Variance (ANOVA) is an extremely useful statistical method
More informationThe t-test: A z-score for a sample mean tells us where in the distribution the particular mean lies
The t-test: So Far: Sampling distribution benefit is that even if the original population is not normal, a sampling distribution based on this population will be normal (for sample size > 30). Benefit
More informationChapter 3. Measuring data
Chapter 3 Measuring data 1 Measuring data versus presenting data We present data to help us draw meaning from it But pictures of data are subjective They re also not susceptible to rigorous inference Measuring
More informationST4241 Design and Analysis of Clinical Trials Lecture 9: N. Lecture 9: Non-parametric procedures for CRBD
ST21 Design and Analysis of Clinical Trials Lecture 9: Non-parametric procedures for CRBD Department of Statistics & Applied Probability 8:00-10:00 am, Friday, September 9, 2016 Outline Nonparametric tests
More informationeqr014 Lenth s Method for the Analysis of Unreplicated Experiments
eqr014 Lenth s Method for the Analysis of Unreplicated Experiments Russell V. Lenth Department of Statistics and Actuarial Science The University of Iowa Iowa City, IA USA 52242 Voice 319-335-0814 FAX
More informationOne-shot Learning of Poisson Distributions Information Theory of Audic-Claverie Statistic for Analyzing cdna Arrays
One-shot Learning of Poisson Distributions Information Theory of Audic-Claverie Statistic for Analyzing cdna Arrays Peter Tiňo School of Computer Science University of Birmingham, UK One-shot Learning
More informationNonparametric tests. Mark Muldoon School of Mathematics, University of Manchester. Mark Muldoon, November 8, 2005 Nonparametric tests - p.
Nonparametric s Mark Muldoon School of Mathematics, University of Manchester Mark Muldoon, November 8, 2005 Nonparametric s - p. 1/31 Overview The sign, motivation The Mann-Whitney Larger Larger, in pictures
More informationResampling Methods. Lukas Meier
Resampling Methods Lukas Meier 20.01.2014 Introduction: Example Hail prevention (early 80s) Is a vaccination of clouds really reducing total energy? Data: Hail energy for n clouds (via radar image) Y i
More informationNon-Parametric Two-Sample Analysis: The Mann-Whitney U Test
Non-Parametric Two-Sample Analysis: The Mann-Whitney U Test When samples do not meet the assumption of normality parametric tests should not be used. To overcome this problem, non-parametric tests can
More information