Sample Size Estimation for Studies of High-Dimensional Data
- Eugene Horton
Transcription
1 Sample Size Estimation for Studies of High-Dimensional Data
James J. Chen, Ph.D.
National Center for Toxicological Research, Food and Drug Administration
June 3, 2009, China Medical University, Taichung, Taiwan
2 Outline
Background (Single endpoint)
    Power Analysis and Sample Size
Sample Size Problem in Multiple Testing
    Multiple Testing Framework
    Type I error and Power
Sample Size Estimation
    Independent Model
    Correlated Model
3 Statistical Hypothesis Testing
Population: Control (μ_c, σ); Treatment (μ_t, σ). Difference δ = μ_c − μ_t.
Experiment: Control ($\bar{x}_c$, s_c); Treatment ($\bar{x}_t$, s_t), with n samples in each group.
Is the mean difference between Control and Treatment significant?
Two-sample t-test: $t = (\bar{x}_c - \bar{x}_t) / \sqrt{(s_c^2 + s_t^2)/n}$, with noncentrality $\delta / (\sigma\sqrt{2/n})$.
Statistical test:
S1 - Determine the proper test statistic.
S2 - Perform the statistical test on the observed data.
S3 - Compute the p-value (under the null hypothesis).
S4 - Compare the p-value with the pre-specified significance level α.
4 Power of a Test
Two-sample t-test: $t = (\bar{x}_c - \bar{x}_t) / \sqrt{(s_c^2 + s_t^2)/n}$, with noncentrality $\delta / (\sigma\sqrt{2/n})$.
The significance (power) of a test depends on the sample size n, the mean difference d (δ), and the standard deviation s (σ).
Power: the probability of declaring significance (p-value ≤ α).
Given δ = μ_c − μ_t, σ, n, and α: the power (1−β) of a test (t-test) can be computed.
Given δ = μ_c − μ_t, σ, (1−β), and α: the sample size a test (t-test) needs to achieve significance can be computed.
5 Sample Size Estimation
Sample size estimation for a given study is conducted during the design stage, before data are collected.
The three factors δ, σ, and α, together with the power (1−β), determine the needed sample size n.
Effect size δ: the targeted statistical distance between the parameters of two populations that the study is designed to detect (the difference in population means). It represents the smallest effect that is considered clinically or biologically relevant (significant).
Sample size: the number of experimental units (e.g., biological samples).
6 Sample Size Estimation: One Endpoint
Sample size problem: the number of samples needed to ensure the power (1−β) to detect the difference δ with a test at the significance level α.
Sample size calculation requires specifying:
1. Effect size δ or standardized effect size δ/σ,
2. The desired power (1−β),
3. Type I error α.
Sample size for the two-sample t-test: $n = 2 (t_{\alpha/2} + t_\beta)^2 / (\delta/\sigma)^2$.
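The single-endpoint formula can be sketched in a few lines. This is a minimal illustration, not the speaker's code: it assumes standard normal quantiles in place of the t quantiles (a common large-sample approximation, since the t degrees of freedom depend on the unknown n), and the function name `two_sample_n` and the example inputs are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def two_sample_n(effect_size, alpha=0.05, power=0.80):
    """Per-group n from n = 2 (z_{alpha/2} + z_beta)^2 / (delta/sigma)^2.

    effect_size is the standardized effect delta/sigma.
    Assumption: normal quantiles approximate the t quantiles.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = z.inv_cdf(power)           # quantile matching power 1 - beta
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


# Illustrative inputs only: delta/sigma = 1, alpha = 0.05, power = 0.80.
n_per_group = two_sample_n(effect_size=1.0, alpha=0.05, power=0.80)
print(n_per_group)
```

Because the normal quantiles are slightly smaller than the t quantiles, this approximation can understate n by a sample or two when n is small.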
7 High-Dimensional Data
High-dimensional data (e.g., gene expression data):
Each sample is characterized by hundreds or thousands of correlated measurements.
Most measurements are unrelated to the phenotypes.
The number of phenotype samples is small.
The curse of dimensionality: multiplicity testing; feature selection; over-fitting; poor scalability; etc.
8 High-Dimensional Data for Cancer Biology
Goal: to identify markers (genes or proteins) that may have a functional role in specific phenotypes.
Diagnosis: define patterns that can identify specific phenotypes.
Prognosis: establish a patient's clinical outcome independent of treatment.
Prediction: predict the outcome of a specific treatment.
9 Data Matrix
                          Potential markers
Group           Sample    x_1      x_2      x_3      ...   x_m
Control (-)     S_11      y_111    y_211    y_311    ...   y_m11
                S_21      y_112    y_212    y_312    ...   y_m12
                ...
                S_n1      y_11k    y_21k    y_31k    ...   y_m1k
Treatment (+)   S_12      y_121    y_221    y_321    ...   y_m21
                S_22      y_122    y_222    y_322    ...   y_m22
                ...
                S_n2      y_12k    y_22k    y_32k    ...   y_m2k
Use an appropriate test statistic to compare the two groups for each variable (univariate analysis) at the significance level α.
10 Testing Multiple Endpoints
                      Decision
True state     Significance   Non-significance   Total
Null           V (α)          S (1−α)            m0
Alternative    U (1−β)        T (β)              m1
Total          R              A                  m
V is the number of false positives; U is the number of true positives; R is the total number of significant results.
E(V)/m0 = α is the comparison-wise error (CWE) rate; E(U)/m1 = (1−β) is the comparison-wise power (sensitivity).
11 Type I Errors in Multiple Testing
Three type I errors in multiple testing:
CWE = E(V)/m: expected proportion of false positives.
FWE = Pr(V > 0): probability of at least one false positive.
FDR = E(V/R): expected proportion of false positives among the findings.
Type I error based on the CWE at α:
α = v*/m: the expected number of false positives is v*.
α = f*/m: the FWE will be at f* (Bonferroni approach).
When m is large, α becomes small. Thus, the Bonferroni approach is not practical. Most favor the FDR approach in high-dimensional data.
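A small arithmetic sketch of the two CWE-based choices of α on this slide. The function names and the example values (v* = 2 and f* = 0.05, with m = 2,000) are hypothetical and chosen only for illustration.

```python
def alpha_for_expected_false_positives(v_star, m):
    """Per-comparison alpha so that about v_star false positives are expected among m tests."""
    return v_star / m


def alpha_bonferroni(f_star, m):
    """Per-comparison alpha that bounds the family-wise error at f_star (Bonferroni)."""
    return f_star / m


m = 2000  # number of tests, as in the talk's examples
alpha_cwe = alpha_for_expected_false_positives(v_star=2, m=m)
alpha_fwe = alpha_bonferroni(f_star=0.05, m=m)
print(alpha_cwe, alpha_fwe)
```

With these inputs the Bonferroni threshold is 40 times smaller than the CWE threshold, which is the loss of power the slide refers to when m is large.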
12 FDR - Comments
If m0 = m (all null hypotheses are true), V/R = 1 whenever there is any rejection, so FDR = Pr(V > 0) = FWE. In this case, FWE and FDR are equivalent.
If m0 < m, it can be shown that Pr(V > 0) ≥ E(V/R). An MCP (multiple comparison procedure) that controls the FWE therefore also controls the FDR.
The FDR approach allows findings to be made, provided that the investigator is willing to accept a small fraction of false positive findings.
Most favor the FDR approach in high-dimensional data.
13 Sample Size Estimation: HD
Since a large number of tests is made and the structure of the data is complex, determining the needed sample size is difficult.
Most current methods proposed for sample size analysis do not consider the dependency of expression levels and/or assume equal variance among genes.
14 Power and Sample Size
Power is defined by (1−β), the proportion of the true alternatives that are declared significant (sensitivity λ).
Sample size problem: the number of samples needed to ensure detecting at least a fraction λ of the m1 true alternatives for the difference δ with a test at the significance level α. (Both m and m1 are pre-specified by the investigator.)
15 A Simple Method
For specified λ, α, and (1−β), the needed sample size based on the univariate calculation is $n = \lceil n^* \rceil$ with $n^* = 2 (t_{\alpha/2} + t_\beta)^2 / (\delta/\sigma)^2$, where $\lceil n^* \rceil$ is the smallest integer ≥ n*.
Conversely, given δ, α, and n, the outcome of each test is a Bernoulli variable with p = (1−β). The expected proportion of detections (sensitivity) is λ.
The sample size so calculated achieves the sensitivity λ on average, but the probability of actually reaching λ can be low.
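16 Confidence Probability
Given m, m1, δ, and α, the relationship between the sample size n and the power (1−β) is $n = \lceil n^* \rceil$ with $n^* = 2 (t_{\alpha/2} + t_\beta)^2 / (\delta/\sigma)^2$.
Under the independent model, the probability φ_λ of detecting at least a fraction λ of the m1 alternatives is the sum of the binomial probabilities:
$\phi_\lambda = \sum_{j=\lceil \lambda m_1 \rceil}^{m_1} \binom{m_1}{j} (1-\beta)^j \beta^{m_1-j}$.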
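The binomial sum is easy to evaluate directly. The sketch below assumes independent tests with a common per-test power, as on this slide; the function name `confidence_probability` and the example values (m1 = 100, power 0.90, λ = 0.90) are hypothetical.

```python
from math import ceil, comb


def confidence_probability(m1, power, lam):
    """P(at least ceil(lam * m1) of m1 independent alternatives are detected).

    Each alternative is detected with probability `power` (1 - beta).
    """
    k_min = ceil(lam * m1 - 1e-9)  # small tolerance guards against float round-off
    return sum(
        comb(m1, j) * power ** j * (1 - power) ** (m1 - j)
        for j in range(k_min, m1 + 1)
    )


# Illustrative case: per-test power equal to the target sensitivity.
phi_example = confidence_probability(m1=100, power=0.90, lam=0.90)
print(round(phi_example, 3))
```

When the per-test power merely equals the target sensitivity, the confidence probability is not much better than a coin flip, which is the weakness of the univariate calculation noted on the previous slide.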
17 Table: Estimated sample size n, sensitivity λ, and power φ under the independent model, based on m = 2,000, d = 2, and fixed α. Columns: π1 (%), λ, φ, n*; Univariate Method: ⌈n*⌉, λ, φ. (Table values were not preserved in the transcription.)
18 An Alternative Formulation
The number of samples needed to ensure the specified sensitivity λ with confidence probability at least φ.
The sample size n is calculated using the two equations:
$n^* = 2 (t_{\alpha/2} + t_\beta)^2 / (\delta/\sigma)^2$ and $\sum_{j=\lceil \lambda m_1 \rceil}^{m_1} \binom{m_1}{j} (1-\beta)^j \beta^{m_1-j} \ge \phi$.
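One way to read the alternative formulation is as a search: increase n until the binomial confidence probability reaches φ. The sketch below is an illustration under stated assumptions, not the speaker's implementation. It assumes independent tests, uses a normal approximation for the per-test power instead of the t distribution, and all function names and example inputs are hypothetical.

```python
from math import ceil, comb, sqrt
from statistics import NormalDist

_Z = NormalDist()


def per_test_power(n, effect_size, alpha):
    """Approximate two-sided power with n samples per group (normal approximation)."""
    return _Z.cdf(sqrt(n / 2) * effect_size - _Z.inv_cdf(1 - alpha / 2))


def binomial_tail(m1, power, lam):
    """P(at least ceil(lam * m1) detections among m1 independent alternatives)."""
    k_min = ceil(lam * m1 - 1e-9)
    return sum(comb(m1, j) * power ** j * (1 - power) ** (m1 - j)
               for j in range(k_min, m1 + 1))


def n_with_confidence(m1, effect_size, alpha, lam, phi, n_max=1000):
    """Smallest n whose confidence probability of reaching sensitivity lam is >= phi."""
    for n in range(2, n_max + 1):
        if binomial_tail(m1, per_test_power(n, effect_size, alpha), lam) >= phi:
            return n
    raise ValueError("no n <= n_max reaches the requested confidence")


# Illustrative inputs: 100 alternatives, delta/sigma = 2, alpha = 0.001.
n_conf = n_with_confidence(m1=100, effect_size=2.0, alpha=0.001, lam=0.90, phi=0.95)
print(n_conf)
```

The search returns a sample size at least as large as the univariate one, because it asks for the sensitivity to be reached with high probability rather than only on average.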
19 Table: Estimated sample size n, sensitivity λ, and power φ under the independent model, based on m = 2,000, d = 2, and fixed α. Columns: π1 (%), λ; Mean Method: n*, λ, φ; Confidence Probability: n, λ, φ. (Table values were not preserved in the transcription.)
20 Type I Errors: CWE, FWE, FDR
Type I errors: FWE, FDR, or CWE (v: number of false positives).
Since m1 and (1 − β) are pre-specified, a Type I error can be expressed in terms of CWE.
Setting α = v/m, the FWE will be controlled at v (e.g., v = 0.05).
Setting α = [m1 (1 − β) q*] / [m0 (1 − q*)], the FDR will be controlled at q*.
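The FDR expression can be checked numerically. The sketch below assumes, as an approximation, that the FDR is close to E(V) / (E(V) + E(U)) with E(V) = m0 α and E(U) = m1 (1 − β); the function names and the example values (π1 = 5%, power 0.90, q* = 0.05) are hypothetical.

```python
def alpha_for_fdr(m, m1, power, q_star):
    """Per-comparison alpha = m1 (1 - beta) q* / (m0 (1 - q*))."""
    m0 = m - m1
    return m1 * power * q_star / (m0 * (1 - q_star))


def approximate_fdr(m, m1, power, alpha):
    """E(V) / (E(V) + E(U)), used here as an approximation to E(V/R)."""
    expected_false = (m - m1) * alpha
    expected_true = m1 * power
    return expected_false / (expected_false + expected_true)


m, m1 = 2000, 100  # illustrative: pi_1 = 5% of the variables are alternatives
alpha_q = alpha_for_fdr(m, m1, power=0.90, q_star=0.05)
print(alpha_q, approximate_fdr(m, m1, power=0.90, alpha=alpha_q))
```

Plugging the resulting α back into the ratio recovers q*, which is where the expression on the slide comes from.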
21 Simulation Experiment: Independent Model
Fixed m = 2,000, α = 0.001, d = 2; specify m1 and n.
Null model: m0 = m(1 − π1) variables from N(0,1); alternative model: m1 = mπ1 variables from N(δ,1).
For each sample set, the t-statistics were computed, and v and u were counted at α = 0.001.
The estimates of α, FDR, λ, and φ_λ were then calculated. The estimate of φ_λ was the proportion of the 1,000 simulations in which at least λ m1 of the true alternatives were detected.
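A scaled-down sketch of this kind of simulation is shown below. It is not the speaker's code and simplifies in two labeled ways: it uses a z-test with known σ = 1 instead of the t-test (the Python standard library has no t distribution), and it uses far fewer variables and repetitions than the talk (m = 200 and 50 runs rather than m = 2,000 and 1,000 runs) to keep it fast. All names and default values are hypothetical.

```python
import random
from math import ceil, sqrt
from statistics import NormalDist


def simulate_independent(m=200, pi1=0.10, n=12, delta=2.0, alpha=0.001,
                         lam=0.90, n_sim=50, seed=1):
    """Estimate alpha, FDR, sensitivity and confidence probability by simulation."""
    rng = random.Random(seed)
    m1 = round(m * pi1)          # number of true alternatives
    m0 = m - m1                  # number of true nulls
    crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided z critical value
    needed = ceil(lam * m1 - 1e-9)              # detections required for sensitivity lam
    scale = sqrt(2.0 / n)        # sd of the difference in group means when sigma = 1
    total_v = total_u = hits = 0
    fdp_sum = 0.0
    for _ in range(n_sim):
        v = u = 0
        for gene in range(m):
            shift = delta if gene < m1 else 0.0
            control = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
            treated = sum(rng.gauss(shift, 1.0) for _ in range(n)) / n
            if abs((treated - control) / scale) > crit:
                if gene < m1:
                    u += 1       # true positive
                else:
                    v += 1       # false positive
        total_v += v
        total_u += u
        fdp_sum += v / (v + u) if (v + u) > 0 else 0.0
        hits += u >= needed
    return {
        "alpha_hat": total_v / (n_sim * m0),
        "fdr_hat": fdp_sum / n_sim,
        "lambda_hat": total_u / (n_sim * m1),
        "phi_hat": hits / n_sim,
    }


estimates = simulate_independent()
print(estimates)
```

The four returned values correspond to the estimates of α, FDR, λ, and φ_λ described on the slide; the defaults would need to be raised to the talk's settings for a comparable experiment.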
22 Table: Estimated α and FDR q from the mean and confidence probability methods under the independent model, m = 2,000, d = 2, and fixed α. Columns: π1, λ, q*; Mean Method: n_a, α, q; Confidence Probability: n_b, α, q. (Table values were not preserved in the transcription.)
23 Table: Estimated sensitivity λ and confidence probability φ from the mean and confidence probability methods under the independent model, m = 2,000, d = 2, and fixed α. Columns: π1 (%), λ; Mean Method: n_a, λ, φ; Confidence Probability: n_b, λ, φ. (Table values were not preserved in the transcription.)
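24 A Re-sampling Method - Tibshirani
[Diagram: permutations of the pilot data X give statistics t_i with standard errors s_i. Null genes: t_01, t_02, ..., t_0N form the null distribution and give v_1, v_2, ..., v_N (Type I error or FDR). Non-null genes: t_11, t_12, ..., t_1N, shifted by $\delta / (\sigma\sqrt{1/n_0 + 1/n_1})$, give u_1, u_2, ..., u_N; u* is the (1−φ)th percentile.]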
25 Lin-Chen Re-sampling Method
1. Start with the sample size n from the independent model.
2. Compute the adjustment factor f = f1 × f2.
3. Generate permutation samples (b = 1, 2, ..., B). Compute the t-statistics from the permutation samples and multiply each t-statistic by the factor f: t_b = {f t_0b, f t_1b}. Add δ to a set of randomly selected m1 genes in one of the two groups.
4. Construct a null distribution by pooling all t_0b's.
5. Calculate v_b (Type I error) and u_b (power) for each t_b.
6. Order u_1, u_2, ..., u_B, and find the (1−φ)th percentile, u*.
7. If u* > m1 λ, stop and report n as the sample size estimate; otherwise, increase n by 1 and go to 1.
26 Comments
1. This method is modified from Tibshirani (2006).
2. It accounts for the correlations and variances of the variables.
3. In practice, the sample size in the pilot dataset is small.
4. Adjustment factor f = f1 × f2: f1 uses the maximum likelihood estimate of the t-statistic; f2 accounts for the different sample sizes in the degrees of freedom of the t-distribution, i.e., n_0p + n_1p − 2 for the pilot data and n_0 + n_1 − 2 for the planned study (it involves the critical values $t_{n_0+n_1-2,\,\alpha/2}$ and $t_{n_{0p}+n_{1p}-2,\,\alpha/2}$).
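27 A Re-sampling Method - Lin & Chen
[Diagram: permutations of the pilot data X give statistics t_i with standard errors s_i, each multiplied by the adjustment factor f (which involves the critical values $t_{n_0+n_1-2,\,\alpha/2}$ and $t_{n_{0p}+n_{1p}-2,\,\alpha/2}$). Null genes: f t_01, f t_02, ..., f t_0N form the null distribution and give v_1, v_2, ..., v_N (Type I error or FDR). Non-null genes: f t_11, f t_12, ..., f t_1N, shifted by $\delta / (\sigma\sqrt{1/n_0 + 1/n_1})$, give u_1, u_2, ..., u_N; u* is the (1−φ)th percentile.]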
28 Simulation Experiment: Correlated Model
Colon cancer data set (Alon et al., 1999, PNAS): the colon dataset consists of expression patterns of 2,000 human genes in 22 normal and 40 colon tumor tissues.
Fixed m = 2,000, α = 0.001, d = 2; specify m1 and n.
1. Randomly select 4 samples for each group.
2. Compute the sample sizes using the Tibshirani and Lin & Chen methods.
3. Repeat 1,000 times, selecting different sets of four samples.
4. Estimate the mean and standard deviation.
Repeat the procedure using 6 samples for each group.
29 Sample size estimates (sd) from pilot data with 4 or 6 samples per group, 1,000 repetitions, m = 2,000, d = 2, and α = 10^-3
π1 (%)   λ     n_a   n_b (4)       n_c (4)       n_b (6)       n_c (6)
...      ...   ...   ... (3.16)    26.6 (7.98)   14.3 (2.35)   18.0 (3.88)
...      ...   ...   ... (3.23)    30.0 (8.09)   15.3 (2.48)   19.7 (4.05)
...      ...   ...   ... (3.68)    35.5 (9.15)   17.3 (2.45)   22.8 (4.12)
...      ...   ...   ... (2.76)    25.5 (7.04)   14.2 (2.33)   17.8 (3.88)
...      ...   ...   ... (3.33)    29.6 (8.17)   15.4 (2.38)   19.8 (3.90)
...      ...   ...   ... (3.48)    35.4 (8.74)   17.3 (2.53)   22.8 (4.26)
a. mean method; b. Lin & Chen; c. Tibshirani
30 Table: Estimated sensitivity λ and confidence probability φ from the univariate mean and Lin & Chen methods using the colon tumor data, m = 2,000, d = 2, and fixed α. Columns: π1 (%), λ; Univariate mean method: n, λ, φ; Lin & Chen Method: n, λ, φ. (Table values were not preserved in the transcription.)
31 Summary - I
Common approaches:
Sample size estimation methods were derived from the independent or equi-correlated models because of the complexity of the correlation among the variables.
The sample size is formulated as the number of arrays needed to achieve the specified sensitivity λ on average.
This formulation is inadequate because of the variance in estimating λ.
32 Summary - II
Alternative: the number of arrays needed to ensure detecting at least the specified sensitivity with a confidence probability of at least 95%.
A permutation method that uses a small pilot dataset to estimate the sample size is proposed. The method:
uses a small pilot dataset, 4-6 samples per group;
provides efficient estimates of sample size;
performs well for an illustrative dataset.
33 References
Tsai, C-A, Wang, S-J, Chen, D-T, and Chen, J.J. Sample size for gene expression microarray experiments. Bioinformatics 21.
Tibshirani, R. A simple method for assessing sample sizes in microarray experiments. BMC Bioinformatics 2006; 7:106.
Lin, W-J and Chen, J.J. Power and sample size estimation in microarray studies (manuscript).
34 Sample Size Estimation for Studies of High-Dimensional Data
Before conducting an experiment, one important issue to decide is the number of samples required to have adequate power to detect treatment effects. Sample size estimation for a single endpoint is formulated as the number of samples needed to ensure the specified power of detecting the specified mean difference at a pre-specified significance level α. For high-dimensional data, the common sample size estimate is calculated to ensure that a specified sensitivity is achieved on average. The needed sample size can be calculated using a univariate method. This formulation is inadequate because of the variance in estimating the sensitivity, and the univariate method does not take the dependence among the variables into consideration. This talk formulates the sample size problem as the number of samples needed to ensure detecting at least the specified sensitivity with the desired confidence probability. A permutation method that uses a small pilot dataset to estimate the sample size will be presented. This method accounts for correlation and variance heterogeneity among variables and is shown to perform well for an example dataset.
More informationSampling Distributions: Central Limit Theorem
Review for Exam 2 Sampling Distributions: Central Limit Theorem Conceptually, we can break up the theorem into three parts: 1. The mean (µ M ) of a population of sample means (M) is equal to the mean (µ)
More informationPubh 8482: Sequential Analysis
Pubh 8482: Sequential Analysis Joseph S. Koopmeiners Division of Biostatistics University of Minnesota Week 12 Review So far... We have discussed the role of phase III clinical trials in drug development
More informationMultistage Tests of Multiple Hypotheses
Communications in Statistics Theory and Methods, 39: 1597 167, 21 Copyright Taylor & Francis Group, LLC ISSN: 361-926 print/1532-415x online DOI: 1.18/3619282592852 Multistage Tests of Multiple Hypotheses
More informationA moment-based method for estimating the proportion of true null hypotheses and its application to microarray gene expression data
Biostatistics (2007), 8, 4, pp. 744 755 doi:10.1093/biostatistics/kxm002 Advance Access publication on January 22, 2007 A moment-based method for estimating the proportion of true null hypotheses and its
More informationStatistics Applied to Bioinformatics. Tests of homogeneity
Statistics Applied to Bioinformatics Tests of homogeneity Two-tailed test of homogeneity Two-tailed test H 0 :m = m Principle of the test Estimate the difference between m and m Compare this estimation
More informationHypothesis testing: Steps
Review for Exam 2 Hypothesis testing: Steps Repeated-Measures ANOVA 1. Determine appropriate test and hypotheses 2. Use distribution table to find critical statistic value(s) representing rejection region
More informationStatistics Primer. ORC Staff: Jayme Palka Peter Boedeker Marcus Fagan Trey Dejong
Statistics Primer ORC Staff: Jayme Palka Peter Boedeker Marcus Fagan Trey Dejong 1 Quick Overview of Statistics 2 Descriptive vs. Inferential Statistics Descriptive Statistics: summarize and describe data
More informationComparing Adaptive Designs and the. Classical Group Sequential Approach. to Clinical Trial Design
Comparing Adaptive Designs and the Classical Group Sequential Approach to Clinical Trial Design Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj
More informationChapter 9 Inferences from Two Samples
Chapter 9 Inferences from Two Samples 9-1 Review and Preview 9-2 Two Proportions 9-3 Two Means: Independent Samples 9-4 Two Dependent Samples (Matched Pairs) 9-5 Two Variances or Standard Deviations Review
More informationGene Expression an Overview of Problems & Solutions: 3&4. Utah State University Bioinformatics: Problems and Solutions Summer 2006
Gene Expression an Overview of Problems & Solutions: 3&4 Utah State University Bioinformatics: Problems and Solutions Summer 006 Review Considering several problems & solutions with gene expression data
More information6 Single Sample Methods for a Location Parameter
6 Single Sample Methods for a Location Parameter If there are serious departures from parametric test assumptions (e.g., normality or symmetry), nonparametric tests on a measure of central tendency (usually
More informationLecture 28. Ingo Ruczinski. December 3, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University
Lecture 28 Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University December 3, 2015 1 2 3 4 5 1 Familywise error rates 2 procedure 3 Performance of with multiple
More informationLecture on Null Hypothesis Testing & Temporal Correlation
Lecture on Null Hypothesis Testing & Temporal Correlation CS 590.21 Analysis and Modeling of Brain Networks Department of Computer Science University of Crete Acknowledgement Resources used in the slides
More informationHypothesis testing: Steps
Review for Exam 2 Hypothesis testing: Steps Exam 2 Review 1. Determine appropriate test and hypotheses 2. Use distribution table to find critical statistic value(s) representing rejection region 3. Compute
More informationDesign of Microarray Experiments. Xiangqin Cui
Design of Microarray Experiments Xiangqin Cui Experimental design Experimental design: is a term used about efficient methods for planning the collection of data, in order to obtain the maximum amount
More informationSleep data, two drugs Ch13.xls
Model Based Statistics in Biology. Part IV. The General Linear Mixed Model.. Chapter 13.3 Fixed*Random Effects (Paired t-test) ReCap. Part I (Chapters 1,2,3,4), Part II (Ch 5, 6, 7) ReCap Part III (Ch
More informationSample size re-estimation in clinical trials. Dealing with those unknowns. Chris Jennison. University of Kyoto, January 2018
Sample Size Re-estimation in Clinical Trials: Dealing with those unknowns Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj University of Kyoto,
More informationTests for the Odds Ratio of Two Proportions in a 2x2 Cross-Over Design
Chapter 170 Tests for the Odds Ratio of Two Proportions in a 2x2 Cross-Over Design Introduction Senn (2002) defines a cross-over design as one in which each subject receives all treatments and the objective
More informationy ˆ i = ˆ " T u i ( i th fitted value or i th fit)
1 2 INFERENCE FOR MULTIPLE LINEAR REGRESSION Recall Terminology: p predictors x 1, x 2,, x p Some might be indicator variables for categorical variables) k-1 non-constant terms u 1, u 2,, u k-1 Each u
More informationLECTURE 5. Introduction to Econometrics. Hypothesis testing
LECTURE 5 Introduction to Econometrics Hypothesis testing October 18, 2016 1 / 26 ON TODAY S LECTURE We are going to discuss how hypotheses about coefficients can be tested in regression models We will
More informationChapter Six: Two Independent Samples Methods 1/51
Chapter Six: Two Independent Samples Methods 1/51 6.3 Methods Related To Differences Between Proportions 2/51 Test For A Difference Between Proportions:Introduction Suppose a sampling distribution were
More informationPSY 307 Statistics for the Behavioral Sciences. Chapter 20 Tests for Ranked Data, Choosing Statistical Tests
PSY 307 Statistics for the Behavioral Sciences Chapter 20 Tests for Ranked Data, Choosing Statistical Tests What To Do with Non-normal Distributions Tranformations (pg 382): The shape of the distribution
More informationGeneralized Linear Models (1/29/13)
STA613/CBB540: Statistical methods in computational biology Generalized Linear Models (1/29/13) Lecturer: Barbara Engelhardt Scribe: Yangxiaolu Cao When processing discrete data, two commonly used probability
More informationBIO5312 Biostatistics Lecture 6: Statistical hypothesis testings
BIO5312 Biostatistics Lecture 6: Statistical hypothesis testings Yujin Chung October 4th, 2016 Fall 2016 Yujin Chung Lec6: Statistical hypothesis testings Fall 2016 1/30 Previous Two types of statistical
More informationChapter 7 Comparison of two independent samples
Chapter 7 Comparison of two independent samples 7.1 Introduction Population 1 µ σ 1 1 N 1 Sample 1 y s 1 1 n 1 Population µ σ N Sample y s n 1, : population means 1, : population standard deviations N
More information22s:152 Applied Linear Regression. Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA)
22s:152 Applied Linear Regression Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA) We now consider an analysis with only categorical predictors (i.e. all predictors are
More informationANOVA: Comparing More Than Two Means
ANOVA: Comparing More Than Two Means Chapter 11 Cathy Poliak, Ph.D. cathy@math.uh.edu Office Fleming 11c Department of Mathematics University of Houston Lecture 25-3339 Cathy Poliak, Ph.D. cathy@math.uh.edu
More information