Miscellanea False discovery rate for scanning statistics
Biometrika (2011), 98, 4, pp. 979–985
© 2011 Biometrika Trust
Printed in Great Britain
doi: /biomet/asr057

BY D. O. SIEGMUND, N. R. ZHANG
Department of Statistics, Stanford University, 390 Serra Mall, Stanford, California, U.S.A.
siegmund@stanford.edu nzhang@stanford.edu

AND B. YAKIR
Department of Statistics, The Hebrew University of Jerusalem, Jerusalem 91905, Israel
msby@mscc.huji.ac.il

SUMMARY
The false discovery rate is a criterion for controlling Type I error in simultaneous testing of multiple hypotheses. For scanning statistics, local dependence makes clusters of neighbouring hypotheses likely to be rejected together. In such situations, it is more intuitive and informative to group neighbouring rejections together and count them as a single discovery. The false discovery rate is then defined as the proportion of falsely declared clusters among all declared clusters. We assume that the number of false discoveries, under this broader definition of a discovery, is approximately Poisson and independent of the number of true discoveries. Under these assumptions we examine approaches for estimating and controlling the false discovery rate, and provide examples from biological applications.

Some key words: False discovery rate; Multiple comparisons; Poisson approximation; Scan statistic.

1. INTRODUCTION
In a pioneering paper, Benjamini & Hochberg (1995) initiated a fruitful line of research into the false discovery rate as a method to evaluate Type I error when simultaneously testing large numbers of hypotheses. We use their notation, so $R$ is the number of discoveries that emerge as a result of a particular statistical procedure, and $V$ is the number of false discoveries among them. Then $S = R - V$ is the number of true discoveries. The false discovery rate is the expected proportion of false discoveries, $\mathrm{FDR} = E(V/R;\, R > 0)$.
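The sketch below makes the convention in this definition concrete. It is a minimal Python illustration, not the authors' code. It computes the false discovery proportion $V/R$, taken to be 0 when $R = 0$, and averages it over simulated replicates to approximate $E(V/R;\, R > 0)$. The simulator `draw_v_s` and the toy model in the usage lines are hypothetical.

```python
import random
from typing import Callable, Tuple


def false_discovery_proportion(v: int, r: int) -> float:
    """Return V/R, with the convention V/R = 0 when R = 0."""
    if not 0 <= v <= r:
        raise ValueError("expected 0 <= V <= R")
    return v / r if r > 0 else 0.0


def monte_carlo_fdr(draw_v_s: Callable[[random.Random], Tuple[int, int]],
                    n_reps: int = 100_000, seed: int = 1) -> float:
    """Approximate FDR = E(V/R; R > 0) by averaging V/R over replicates.

    draw_v_s (hypothetical) returns one simulated pair (V, S); R = V + S.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_reps):
        v, s = draw_v_s(rng)
        total += false_discovery_proportion(v, v + s)
    return total / n_reps


# Toy model: V is 0 or 1 with probability 1/2 each and S = 1, so V/R is
# 1/2 or 0 and the exact false discovery rate is 1/4.
print(monte_carlo_fdr(lambda rng: (rng.randint(0, 1), 1)))
```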
These quantities are defined implicitly in terms of the specific procedure that is used to make discoveries. We are concerned with estimation and control of false discovery rates when there is substantial local correlation among the statistics used for testing the hypotheses. Because of local correlation, large values of the statistic tend to occur in clumps, and multiple rejections within a clump may constitute only a single discovery as far as model identification is concerned. Yet a possibly large number of correct rejections at some locations can inflate the denominator in the definition of the false discovery rate. This artificially creates a small false discovery rate and lowers the barrier to possibly false detections at distant locations. Scanning statistics that detect sparsely distributed signals provide typical examples.

In the examples that follow, there is an underlying set of observations $y_t$, where $t$ varies over an indexing set having some geometric structure. The $y_t$ are often assumed to be independent, but this is not necessary, provided the dependence between them is local with respect to the geometric structure. Consider the test statistics $\{Z_t : t \in D\}$, where $Z_t$ is a function of $t$ and of the $y_s$ for $s \in N_t$, an appropriate neighbourhood of $t$. These statistics are related by a measure of distance within the scanning index set $D$. Hence, values of $Z_t$ and $Z_s$ for nearby $t$ and $s$ in $D$
are correlated, so a large value at a specific $\tau \in D$ causes a cluster of large values at $t$ close to $\tau$. Thus, a group of large values of $Z_t$ within close proximity is often associated with a single signal.

Example 1. The random fields used to detect local activity in an fMRI scan, as discussed in a series of papers by Worsley, for example, Worsley et al. (1992) or Siegmund & Worsley (1995).

Example 2. Massively parallel paired-end DNA resequencing used to detect structural variation in genomic sequences. For a review, see Medvedev et al. (2009). The data are distances $y_t$ between the mapped positions of relatively short paired reads. The reads come from the ends of DNA sequences approximately $w$ base pairs long, and the leftmost read is mapped to position $x_t$ in the genome. Where there are no structural variations, the $y_t$, after subtracting $w$ and standardizing, are independent and have a standard normal distribution. For read pairs that straddle the breakpoint $\tau$ of a structural variation, the distribution of some percentage of the $y_t$, for $\tau < x_t \le \tau + w$, is shifted by an unknown amount $\delta$, which is related to the size of the variant. The score statistic with respect to $\delta$ to test for a breakpoint at $\tau$ is $\sum_{t=\tau+1}^{\tau+w} y_t / w^{1/2}$. A scan is conducted with $\tau$ varying over the genomic region of interest to find putative breakpoint locations.

Example 3. The scan statistics of variable window width used in Zhang et al. (2010) and Siegmund et al. (2011) to detect common regions of copy number variation in a set of subjects. For the special case of a single sequence (Olshen et al., 2004), an appropriate likelihood-based statistic is similar to that of the preceding example. The window width is unknown, however, so the scan involves two-dimensional maximization with respect to $\tau$ and $w$.

Example 4.
A genome scan to detect either linkage or association between a phenotype and related genetic variation, e.g., Lander & Botstein (1989), Siegmund & Yakir (2007).

The main results of this paper are methods for estimating and controlling the false discovery rate of a given procedure, where a discovery means the detection of a sparse, local signal. We want to focus on the conceptual question of how one defines a discovery. We therefore state our assumptions in general, abstract terms. Except for a few comments, we avoid the necessarily technical and application-specific discussion of methods to ensure and test the validity of those assumptions. Motivated by a different genomic application, Zhang (2008) takes a similar approach without any theoretical analysis. A. Schwartzman, Y. Gavrilov and R. J. Adler, in an unpublished 2011 manuscript in the Harvard University Biostatistics Working Paper Series, discuss a different approach to the same general issue. Their approach works under specific technical assumptions and appears limited to a one-dimensional process having the structure of Example 1.

Our two central assumptions are (i) the distribution of the number of false discoveries, $V$, is Poisson, with expected value $\lambda$; and (ii) the number of false discoveries is independent of the number of true discoveries, $S$. The number of true discoveries must be nonnegative, but otherwise may follow any distribution. The Poisson assumption is valid asymptotically in a variety of applications. These include the examples above, under the assumptions made in the cited references. More generally, it holds if one makes suitable adjustments when there are local dependencies in the underlying observations. See Aldous (1988) for numerous examples and Lindgren et al. (1983) for relevant general theorems under different sets of conditions. Arratia et al. (1989) give a flexible Poisson approximation theorem that applies quite generally to processes involving local dependence.
Methods for determining $\lambda$ depend on the specific problem. Illustrative examples based on more explicit assumptions about the underlying process are discussed below.

The assumption of independence between $V$ and $S$ is more subtle. Suppose we treat the locations of the signals in $D$ as fixed but unknown quantities. Then $D$ can be partitioned into disjoint sets $D_0 \cup D_1$. $D_0$ contains the hypotheses that, if rejected, would be considered part of a false discovery, and $D_1$ contains those that, if rejected, would be considered part of a true discovery. For example, take the simple case of a one-dimensional scan with fixed window size $w$, as in Example 2, and suppose the true signals are a set of intervals $\mathcal{I}$ within the scan region. Count the windows that overlap with any interval in $\mathcal{I}$ towards true discoveries, and the rest towards false discoveries. Then $D_0 = \{t : (t, t + w] \cap \iota = \emptyset \text{ for all } \iota \in \mathcal{I}\}$ and $D_1 = D \setminus D_0$. Then, $V$
would be a function solely of $\{Z_t : t \in D_0\}$, while $S$ would be a function solely of $\{Z_t : t \in D_1\}$. At least when the true signals are sparse, approximate independence between $V$ and $S$ would follow if long-range dependencies between $\{Z_t : t \in D_0\}$ and $\{Z_t : t \in D_1\}$ are negligible. In practice, near overlap of detected signals is a warning sign that this independence hypothesis may be violated. For a more detailed discussion, see § 3.1. Significant long-range dependence among the $Z_t$ may cause nonnegligible dependence between $V$ and $S$. However, scanning procedures built from a collection of localized tests are inherently designed for problems where dependence can be assumed to be local. If long-range independence does not hold, procedures that account for that dependence would be preferred because of their greater power.

The estimator that we propose for the false discovery rate is
$$\widehat{\mathrm{FDR}} = \lambda/(R + 1), \qquad (1)$$
where $\lambda$ is the expected number of false discoveries and $R$ is the total number of discoveries. Efron (2010) considered this estimator in the framework of hypothesis testing with a large number of independent hypotheses. Except for the constant 1 in the denominator, it is the same as the estimator suggested by Zhang (2008). In some cases, the parameter $\lambda$ can be derived analytically. In other cases, it can be computed via permutations or simulations conducted under the null assumption. In § 2, we show that the estimator (1) is unbiased under assumptions (i) and (ii).

Our method for controlling the rate of false discoveries is closely related to the procedure proposed by Benjamini & Hochberg (1995) for ordered p-values. In effect, we replace their assumption about the relations between the p-values of individual hypotheses. In its place we assume that an appropriately indexed family of false discoveries is a Poisson process.

2. ESTIMATING AND CONTROLLING THE FALSE DISCOVERY RATE
Let $V \sim \mathrm{Po}(\lambda)$ be the number of false discoveries and let $S \ge 0$ be the number of true discoveries. Assume $S$ is a nonnegative random variable independent of $V$. The total number of discoveries is $R = V + S$. Consider the ratio $V/R$, which is defined to be 0 if $R = 0$, and compare it to the estimator $\lambda/(R + 1)$.

THEOREM 1. Under assumptions (i) and (ii), $E(V/R;\, R > 0) = E\{\lambda/(R + 1)\}$.

Proof. For fixed $s \ge 0$, let $F_s(x) = (x + s)^{-1} I(x + s > 0)$, with the understanding that $F_0(0) = 0$. Writing the expectations as infinite series, algebraic manipulation shows that
$$E\{V F_s(V)\} = \lambda E\{F_s(V + 1)\}. \qquad (2)$$
The result follows by taking expectations with respect to the distribution of $S$. Hence, $\widehat{\mathrm{FDR}}$ defined in (1) is an unbiased estimator of the false discovery rate.

Remark. Equation (2) has been applied elsewhere. In particular, it is the basis of the Chen (1975) method of Poisson approximation.

Now suppose that the false detections form a Poisson process $V_\lambda$ of rate 1, defined on the interval $[0, \lambda]$. We assume also that the process $R_\lambda = V_\lambda + S_\lambda$ is nondecreasing and that the processes $V_\lambda$ and $S_\lambda$ are independent. Define the backwards stopping time $\tau = \max\{\lambda' \le \lambda : R_{\lambda'} \ge \lambda'/\alpha\}$. This is a function of the observed process $R_\lambda$. It is therefore a function of the Poisson process $V_\lambda$ and the independent process $S_\lambda$, neither of which is observed. The extreme case $\tau = 0$ corresponds to $R_\tau = 0$, and the ratio $V_\tau/R_\tau$ is then defined to be zero as well. Consider the procedure that evaluates the stopping time $\tau$ and reports $R_\tau$ as the number of discoveries. In Theorem 2, we prove that the expected proportion of false discoveries, $E(V_\tau/R_\tau)$, is bounded by $\alpha$. The proof is a version of the argument given by Storey et al. (2004).
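Both results of this section can be checked numerically. The Python sketch below is an illustration under stated assumptions, not the authors' code. `theorem1_sides` sums the Poisson series for both sides of Theorem 1, given a discrete distribution for $S$ supplied by the user. `simulate_fdr` applies the backwards stopping time $\tau$ to a simulated rate-1 Poisson process $V_\lambda$ together with a hypothetical independent process $S_\lambda$, in which ten true discoveries enter at small $\lambda$. All function names and parameter values are illustrative.

```python
import math
import random


def theorem1_sides(lam, s_dist, n_terms=200):
    """Return (E(V/R; R > 0), E{lam/(R + 1)}) for V ~ Po(lam) independent of S.

    s_dist maps each support point s >= 0 of S to its probability. The Poisson
    series are truncated after n_terms terms, a negligible tail for moderate lam.
    """
    fdr = est = 0.0
    for s, ps in s_dist.items():
        pv = math.exp(-lam)                # P(V = 0)
        for v in range(n_terms):
            r = v + s
            if r > 0:                      # V/R is defined as 0 when R = 0
                fdr += ps * pv * v / r
            est += ps * pv * lam / (r + 1)
            pv *= lam / (v + 1)            # P(V = v + 1) from P(V = v)
    return fdr, est


def poisson_process(rng, lam_max):
    """Event times of a rate-1 Poisson process on [0, lam_max]."""
    times, t = [], rng.expovariate(1.0)
    while t <= lam_max:
        times.append(t)
        t += rng.expovariate(1.0)
    return times


def fdp_at_backward_stop(false_times, true_times, alpha):
    """V_tau / R_tau for tau = max{l : R_l >= l / alpha}, or 0 if tau = 0.

    At the k-th event time t_k the condition reads t_k <= alpha * k; R_tau is
    the largest k satisfying it.
    """
    events = sorted([(t, False) for t in false_times] +
                    [(t, True) for t in true_times])
    for k in range(len(events), 0, -1):
        if events[k - 1][0] <= alpha * k:
            v_tau = sum(1 for _, is_true in events[:k] if not is_true)
            return v_tau / k
    return 0.0


def simulate_fdr(alpha=0.1, lam_max=30.0, n_true=10, n_reps=10_000, seed=2):
    """Monte Carlo estimate of E(V_tau / R_tau)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_reps):
        false_times = poisson_process(rng, lam_max)
        # Hypothetical S process: true discoveries enter early (small lambda,
        # i.e. high thresholds), independently of the false ones.
        true_times = [rng.uniform(0.0, 0.5) for _ in range(n_true)]
        total += fdp_at_backward_stop(false_times, true_times, alpha)
    return total / n_reps


print(theorem1_sides(5.0, {0: 0.3, 4: 0.5, 20: 0.2}))  # the two entries agree
fdr_null = simulate_fdr(n_true=0)    # every discovery is false
fdr_mixed = simulate_fdr(n_true=10)  # ten true discoveries enter early
print(fdr_null, fdr_mixed)
```

When `n_true=0`, every discovery is false. If also $\tau < \lambda$, the proof of Theorem 2 gives $P(\tau > 0) = \alpha$ exactly, so `fdr_null` should lie close to $\alpha = 0.1$. `fdr_mixed` should not exceed $\alpha$ beyond Monte Carlo error.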
THEOREM 2. Under the given conditions and for the procedure associated with the stopping time $\tau$, $E(V_\tau/R_\tau) \le \alpha$.

Proof. Consider the process $V_{\lambda'}/\lambda'$. It is a mean-one backwards martingale with respect to the filtration $\mathcal{F}_{\lambda'} = \sigma(V_t, S_t : \lambda' \le t \le \lambda)$, and $\tau$ is a stopping time with respect to this filtration. It follows that $E\{V_{\tau \vee \lambda'}/(\tau \vee \lambda')\} = E(V_\lambda/\lambda) = 1$ for any $\lambda' > 0$. Let $\lambda' \to 0$ and observe that $1(\tau < \lambda')V_{\lambda'}/\lambda'$ converges to 0 and is bounded by $1/\alpha$. Hence, by the dominated convergence theorem, $E(V_\tau/\tau;\, \tau > 0) = E(V_\lambda/\lambda) = 1$.

Consider the proportion $V_\tau/R_\tau$ of false detections of the proposed procedure. Since this proportion is defined to be zero when $\tau = 0$, $E(V_\tau/R_\tau) = E(V_\tau/R_\tau;\, \tau > 0)$. Dividing and multiplying by $\tau$, we get
$$E(V_\tau/R_\tau;\, \tau > 0) = E\{(\tau/R_\tau)(V_\tau/\tau);\, \tau > 0\} \le \alpha E(V_\tau/\tau;\, \tau > 0) = \alpha.$$
The inequality holds because, when $\tau > 0$, the definition of the stopping time implies $\tau/R_\tau \le \alpha$. The conclusion follows.

3. EXAMPLES
3.1. Fixed-width sliding window scan
Consider a fixed-window scan statistic. Suppose $Y_1, \ldots, Y_m$ are independent, normally distributed random variables with unit variance. Under a global null hypothesis they are standard normal. Under the alternative there are intervals of known length $w$, and unknown positive integers $\tau$ such that $Y_{\tau+1}, \ldots, Y_{\tau+w}$ have mean $\mu_\tau > 0$. The values of $\mu_\tau$ and the number of such intervals are unknown, although we assume that the total width of all intervals is small relative to the sample size $m$. This situation corresponds roughly to Example 2 in § 1. To facilitate our simulations, however, the parameter values we use below are smaller than would be typical for that application. Let $Z_t = (\sum_{i=t+1}^{t+w} Y_i)/w^{1/2}$. Under the global null hypothesis that all discoveries are false, the behaviour of $Z_t$ as a process is easily inferred from known results.
Specifically, an asymptotic approximation to $p = \mathrm{pr}(\max_{0 \le t \le m - w} Z_t > z)$ for a two-sided alternative is given in display (5.3) of Siegmund & Yakir (2007, p. 112). The parameters there are $C = 1$, $= 1$, $L = m - w$ and $\beta = 1/w$. For large enough thresholds $z$, the probability that $Z_t$ exceeds $z$ is small. The number of clumps of $Z_t$ that exceed $z$ is then approximately Poisson distributed with mean
$$\lambda_0 = -\log(1 - p) = m z w^{-1} \phi(z)\, \nu\{z(2/w)^{1/2}\}, \qquad (3)$$
where $\phi$ denotes the standard normal probability density function and $\nu$ is a special function associated with the overshoot of a stopped random walk (cf. Siegmund & Yakir, 2007, p. 112).

A clump has no unique definition, but there should usually be little difficulty in recognizing one in practice. Roughly speaking, it is a set of relatively close values of $t$ at which $Z_t \ge z$. Except when different true discoveries are themselves close together, different clumps are separated by relatively long gaps where $Z_t$ remains below the level $z$. Suppose all clumps were false positives and $z \to \infty$. The size of a clump would then be stochastically bounded, while the expected distances between clumps would be approximately $1/\lambda_0$ and hence grow faster than exponentially in $z$. Because the $Y_t$ are independent, $Z_s$ and $Z_t$ are independent as long as $|s - t| > w$. Clumps of false positives should be short and approximately uniformly distributed across the search interval. Hence, unless the true signals occur very frequently, the probability of a false positive occurring close to a true signal is small, and the independence of $V$ and $S$ approximately holds. The same is true of the variable-window scans of Example 3, provided the maximum window size is much smaller than the number of observations. Siegmund et al. (2011) discuss the data normalization used to validate the normality and independence assumptions needed for Example 3.

Some simulated results are presented in Tables 1 and 2.
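Approximation (3) is simple to evaluate. The Python sketch below implements (3) as printed. It assumes a closed-form approximation to $\nu$ that is widely used in this literature, $\nu(x) \approx (2/x)\{\Phi(x/2) - 0.5\}/\{(x/2)\Phi(x/2) + \phi(x/2)\}$. Bisection then finds the threshold $z$ that gives a nominal value of $\lambda_0$, as in Table 1. The value of $m$ used in the simulations is not given here, so `m=10_000` is purely illustrative. The function names are hypothetical.

```python
import math


def norm_pdf(x):
    """Standard normal density phi."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal distribution function Phi."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def nu(x):
    """Assumed closed-form approximation to the overshoot function nu(x)."""
    if x <= 0.0:
        return 1.0
    h = x / 2.0
    return (2.0 / x) * (norm_cdf(h) - 0.5) / (h * norm_cdf(h) + norm_pdf(h))


def lambda0(z, m, w):
    """Approximation (3): expected number of null clumps of Z_t above z."""
    return m * z / w * norm_pdf(z) * nu(z * math.sqrt(2.0 / w))


def threshold_for(target, m, w, lo=1.0, hi=10.0, tol=1e-10):
    """Solve lambda0(z) = target by bisection; lambda0 decreases for z >= 1."""
    if not lambda0(hi, m, w) < target < lambda0(lo, m, w):
        raise ValueError("target outside the bracketing interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lambda0(mid, m, w) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Nominal lambda_0 values of Table 1, with an illustrative m = 10_000, w = 50.
for target in (5, 3, 2, 1):
    z = threshold_for(target, m=10_000, w=50)
    print(target, round(z, 3))
```

Bisection is enough here because both $z\phi(z)$ and $\nu$ decrease for $z \ge 1$, so $\lambda_0(z)$ is monotone on the bracket.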
For the simulations we took $m =$ and $w = 50$. A total of 21 intervals of length $w$, scattered about the sequence, were simulated from the alternative distribution. Their mean values $\mu_\tau$ range between $6/w^{1/2}$ and $2/w^{1/2}$ in steps of size $0.2/w^{1/2}$.
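A scaled-down version of this simulation can be sketched in Python. Several choices here are assumptions made for illustration:

- `m=10_000`, since the value of $m$ is not given above.
- 40 replicates instead of 400.
- Evenly spaced signal intervals.
- A clump rule that merges exceedances of $z$ fewer than $w$ positions apart.

A clump counts as a true discovery if its span contains some $t \in D_1$, that is, a window $(t, t + w]$ overlapping a signal interval, as in § 1. Instead of approximation (3), $\lambda_0$ is estimated by simulation under the global null, the other route mentioned in § 1. The function returns $\lambda_0$ together with Monte Carlo estimates of the false discovery rate and of $E\{\lambda_0/(R + 1)\}$, the two quantities compared in Table 1.

```python
import math
import random


def scan(y, w):
    """Moving sums Z_t = (Y_{t+1} + ... + Y_{t+w}) / w**0.5, t = 0, ..., m - w."""
    s = sum(y[:w])
    sums = [s]
    for t in range(1, len(y) - w + 1):
        s += y[t + w - 1] - y[t - 1]
        sums.append(s)
    root = math.sqrt(w)
    return [v / root for v in sums]


def clumps(z, level, max_gap):
    """Group t with z[t] >= level into clumps (a, b); by assumption, exceedances
    fewer than max_gap positions apart belong to the same clump."""
    found = []
    for t, v in enumerate(z):
        if v >= level:
            if found and t - found[-1][1] < max_gap:
                found[-1][1] = t
            else:
                found.append([t, t])
    return [tuple(c) for c in found]


def count_v_s(found, starts, w):
    """Count false (V) and true (S) clumps: a clump [a, b] is true if some t in
    [a, b] has |t - tau| < w for a signal start tau, i.e. its span meets D_1."""
    v = s = 0
    for a, b in found:
        if any(a < tau + w and b > tau - w for tau in starts):
            s += 1
        else:
            v += 1
    return v, s


def simulate(rng, m, w, starts, means):
    """N(0, 1) data with mean mu added to Y_{tau+1}, ..., Y_{tau+w} (0-based y)."""
    y = [rng.gauss(0.0, 1.0) for _ in range(m)]
    for tau, mu in zip(starts, means):
        for i in range(tau, tau + w):
            y[i] += mu
    return y


def compare(level, m=10_000, w=50, n_reps=40, seed=3):
    """Return (lambda_0 from null simulation, mean V/R, mean lambda_0/(R + 1))."""
    rng = random.Random(seed)
    gap = m // 21
    starts = [k * gap + gap // 2 for k in range(21)]
    means = [(6.0 - 0.2 * k) / math.sqrt(w) for k in range(21)]
    # Under the global null every clump is false, so the mean count estimates E(V).
    lam0 = sum(len(clumps(scan(simulate(rng, m, w, [], []), w), level, w))
               for _ in range(n_reps)) / n_reps
    fdp = est = 0.0
    for _ in range(n_reps):
        z = scan(simulate(rng, m, w, starts, means), w)
        v, s = count_v_s(clumps(z, level, w), starts, w)
        fdp += v / (v + s) if v + s > 0 else 0.0
        est += lam0 / (v + s + 1)
    return lam0, fdp / n_reps, est / n_reps


result = compare(level=3.0)
print(result)
```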
Table 1. Simulated values of the false discovery rate and of $E\{\lambda_0/(R + 1)\}$, based on 400 repetitions with $w = 50$, $m =$. Nominal values of $\lambda_0$ are 5, 3, 2 and 1, respectively. There are 21 possible discoveries, with noncentrality parameters ranging from 6 to 2 in steps of size 0.2.
Columns: $z$ | FDR | $E\{\lambda_0/(R + 1)\}$ | $E(V)$ | $E(S)$
FDR, false discovery rate.

Table 2. Simulated values of the false discovery rate for the procedure that controls this rate. The simulations are based on 400 repetitions with $w = 50$, $m =$. The false discovery rate is controlled to be no more than 0.3, 0.2, 0.1 or 0.05, respectively. There are 21 possible discoveries, with noncentrality parameters ranging from 6 to 2 in steps of size 0.2.
Columns: $\alpha$ | FDR | $E(V)$ | $E(S)$
FDR, false discovery rate.

Table 1 examines the estimator $\lambda_0/(R + 1)$ of the false discovery rate for several thresholds $z$. Four values of $z$ are considered, corresponding to nominal values of 5, 3, 2 and 1 for $\lambda_0$. For each level, the table gives the actual false discovery rate and the expectation of the estimator. It also gives the expected number of false discoveries, $E(V)$, and the expected number of true discoveries, $E(S)$. The expectations are based on 400 replicates of the scanning process.

Table 2 examines the procedure for controlling the false discovery rate. We used the stopping rule $\inf\{z \ge 2 : R(z) \ge \lambda_0(z)/\alpha\}$. Here $R(z)$ is the number of discoveries associated with the threshold $z$. $\lambda_0(z)$ is approximation (3) of the expected number of clumps associated with $z$, computed under the global null distribution. Four values of $\alpha$ are considered: 0.3, 0.2, 0.1 and 0.05. For each $\alpha$, the table gives the actual false discovery rate, the expected number of false discoveries and the expected number of true discoveries. The expectations are based on 400 replicates of the scanning process.

3.2. Allelic bias in transcribed RNA
Another example involves an experiment on RNA expression profiles in autistic subjects (Ben-David et al., 2011).
The goal of the experiment was to identify autosomal loci where only one of the two alleles is expressed. Nuclear RNA was extracted from blood cell lines of 17 subjects and reverse transcribed. Both the resulting cDNA and the genomic DNA of each subject were genotyped using the Affymetrix Single Nucleotide Polymorphism 6.0 array technology. Loci with mono-allelic expression of RNA were identified by examining the cDNA genotypes at single nucleotide polymorphisms that had been identified as heterozygous in genomic DNA. Specifically, the algorithm for discovering differentially expressed regions first removed, for each subject, the single nucleotide polymorphisms that were homozygous in the genomic DNA or were determined not to be sufficiently expressed. For the remaining cDNA polymorphisms, an exponentially distributed distance from heterozygous expression was calculated using the log-transformed ranking
of the confidence score from the Affymetrix Birdseed V2 genotyping algorithm (Korn et al., 2008). The p-values for the sum of scores in windows of five consecutive polymorphisms were calculated using the function rollapply from the R package zoo (R Development Core Team, 2011). Windows that included polymorphisms more than 1 Mbp apart were excluded from the analysis. Consecutive windows with p-values < 0.05 were combined if the distance between them was < 1 Mbp, and the p-values for the merged windows were recalculated. Final windows with a p-value < were declared to be discoveries. A total of 507 such windows were discovered using this algorithm.

Fig. 1. Scanning windows $(t, t + w)$ that exceed the threshold of $z = 30$ for a region containing 500 positions in the DNA copy number data of § 3.3 (horizontal axis: location; vertical axis: z-score). Each black horizontal segment shows the start and end points of a window, with the actual value of the scan statistic shown on the y-axis. This region contains three discoveries, or clumps, shown as thick bars at the top of the plot.

To estimate the false discovery rate of the algorithm, we applied the method of § 2. The markers used are heterozygous and widely separated on the scale of base pairs. It therefore seems reasonable to assume that they behave independently. Transcription is currently understood as a localized process within the genome, so it should not induce dependence between the allelic expression of distantly separated polymorphisms. The transcribed allelic ratios can therefore be permuted within individuals, and a Monte Carlo experiment then determines the Poisson parameter $\lambda$. The algorithm was applied to each permuted data set, and the number of discoveries was counted. The average number of discoveries over 100 permutations was 11.48. This average served as an estimate of the expected number of false discoveries.
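The permutation step can be sketched generically in Python. The sketch is hypothetical throughout:

- `detect` stands in for the study's discovery algorithm, which is not reproduced here.
- `toy_detect` is a simple made-up detector used only to exercise the code.
- The data are simulated exponential scores.

Each subject's scores are shuffled along the genome and the detector is rerun. The average count serves as $\hat\lambda$, which then enters estimator (1).

```python
import random
from typing import Callable, Dict, List

Data = Dict[str, List[float]]


def permutation_lambda(data: Data, detect: Callable[[Data], int],
                       n_perm: int = 100, seed: int = 0) -> float:
    """Estimate lambda = E(V): the mean number of discoveries that `detect`
    makes on data whose scores are shuffled within each subject."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_perm):
        permuted = {}
        for subject, scores in data.items():
            shuffled = list(scores)
            rng.shuffle(shuffled)  # permute within an individual only
            permuted[subject] = shuffled
        total += detect(permuted)
    return total / n_perm


def fdr_estimate(lam: float, r: int) -> float:
    """Estimator (1): lambda / (R + 1)."""
    return lam / (r + 1)


def toy_detect(data: Data, width: int = 5, cut: float = 12.0) -> int:
    """Hypothetical stand-in detector (NOT the study's algorithm): counts runs
    of consecutive windows of `width` scores whose sum exceeds `cut`."""
    count = 0
    for scores in data.values():
        inside = False
        for i in range(len(scores) - width + 1):
            hit = sum(scores[i:i + width]) > cut
            if hit and not inside:
                count += 1
            inside = hit
    return count


rng = random.Random(1)
# Simulated scores with no real signal: 17 subjects, 500 markers each.
data = {f"subject{i}": [rng.expovariate(1.0) for _ in range(500)]
        for i in range(17)}
r_obs = toy_detect(data)
lam_hat = permutation_lambda(data, toy_detect, n_perm=20)
print(r_obs, lam_hat, fdr_estimate(lam_hat, r_obs))

# Estimator (1) with the figures quoted in the text: lambda = 11.48, R = 507.
print(fdr_estimate(11.48, 507))
```

The last line evaluates estimator (1) with $\lambda = 11.48$ and $R = 507$, which gives $11.48/508 \approx 0.0226$.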
Consequently, the estimated false discovery rate is $11.48/(507 + 1) \approx 0.023$.

3.3. Population-wide copy number variation
To detect copy number variation, Olshen et al. (2004) introduced a change-point model with white Gaussian measurement errors. Lai et al. (2005) found their procedure preferable to other existing methods. See Jeng et al. (2010) for a recent discussion of this model. Zhang et al. (2010) and Siegmund et al. (2011) studied the more general problem of aligned copy number variation in multiple sequences. After a suitable normalization of the data, described in those papers, they found the change-point model with Gaussian white-noise measurement errors to be reasonable. It follows from (3.3) and (3.4) of Siegmund et al. (2011) that $V$ is approximately Poisson for high thresholds. We applied (3.4) from that paper to the data from chromosome 4 of the Stanford Quality Control Panel. For a complete description of this application and dataset, see the cited papers. There is a total of positions, with 62 samples. We restricted our analysis to small intervals. We therefore conducted a variable-window scan of all positions with a maximum window size of 50 and a minimum window size of 1. The theoretically derived value of $\lambda(z)$ agrees well with values estimated via Monte Carlo simulation, even for values of $z$ where $\lambda(z)$ is fairly large. With a false discovery rate threshold of 0.01, 337 discoveries were made; with a threshold of 0.1, 472 discoveries were made. See Fig. 1 for an example region containing 500 positions and 3 discoveries.

ACKNOWLEDGEMENT
The research of the first and third authors is supported by the Israeli-American Bi-National Fund. The second and third authors are supported by the National Science Foundation, U.S.A. We would like to thank
Dr Shifman of The Hebrew University of Jerusalem for giving us access to the data of the experiment described in § 3.2 and for conducting the simulation described there.

REFERENCES
ALDOUS, D. (1988). Applications of the Poisson Clumping Heuristic. New York: Springer.
ARRATIA, R., GOLDSTEIN, L. & GORDON, L. (1989). Two moments suffice for Poisson approximation. Ann. Prob. 17,
BEN-DAVID, E., GRANOT-HERSHKOVITZ, E., MONDERER-ROTHKOFF, G., LERER, E., LEVI, S., YAARI, M., EBSTEIN, R. P., YIRMIA, N. & SHIFMAN, S. (2011). Identification of a functional rare variant in autism using genome-wide screen for monoallelic expression. Hum. Molec. Genet. 20,
BENJAMINI, Y. & HOCHBERG, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Statist. Soc. B 57,
CHEN, L. (1975). Poisson approximation for dependent trials. Ann. Prob. 3,
EFRON, B. (2010). Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction. Cambridge: Cambridge University Press.
JENG, X. J., CAI, T. T. & LI, H. (2010). Optimal sparse segment identification with application in copy number variation analysis. J. Am. Statist. Assoc. 105,
KORN, J. M., KURUVILLA, F. G., MCCARROLL, S. A., WYSOKER, A., NEMESH, J., CAWLEY, S., HUBBELL, E., VEITCH, J., COLLINS, P. J., DARVISHI, K. et al. (2008). Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs. Nature Genet. 40,
LAI, W. R., JOHNSON, M. D., KUCHERLAPATI, R. & PARK, P. J. (2005). Comparative analysis of algorithms for identifying amplifications and deletions in array CGH data. Bioinformatics 21,
LANDER, E. & BOTSTEIN, D. (1989). Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121,
LINDGREN, G., LEADBETTER, M. R. & ROOTZÉN, H. (1983). Extremes and Related Properties of Stationary Sequences and Processes. New York: Springer.
MEDVEDEV, P., STANCIU, M. & BRUDNO, M. (2009). Computational methods for discovering structural variation with next-generation sequencing. Nature Meth. Suppl. 6, S
OLSHEN, A. B., VENKATRAMAN, E. S., LUCITO, R. & WIGLER, M. (2004). Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics 5,
R DEVELOPMENT CORE TEAM (2011). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. ISBN
SIEGMUND, D. O. & WORSLEY, K. J. (1995). Testing for a signal with unknown location and scale in a stationary Gaussian random field. Ann. Statist. 23,
SIEGMUND, D. & YAKIR, B. (2007). The Statistics of Gene Mapping. New York: Springer.
SIEGMUND, D., YAKIR, B. & ZHANG, N. (2011). Detecting simultaneous variant intervals in aligned sequences. Ann. Appl. Statist. 5,
STOREY, J. D., TAYLOR, J. E. & SIEGMUND, D. O. (2004). Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach. J. R. Statist. Soc. B 66,
WORSLEY, K., EVANS, A. C., MARRETT, S. & NEELIN, P. (1992). A three-dimensional statistical analysis for CBF activation studies in human brain. J. Cerebral Blood Flow Metab. 12,
ZHANG, Y. (2008). Poisson approximation for significance in genome-wide ChIP-chip tiling arrays. Bioinformatics 24,
ZHANG, N., SIEGMUND, D., JI, H. & LI, J. Z. (2010). Detecting simultaneous change-points in multiple sequences. Biometrika 97,

[Received September Revised August 2011]
More informationIntroduction to QTL mapping in model organisms
Introduction to QTL mapping in model organisms Karl W Broman Department of Biostatistics and Medical Informatics University of Wisconsin Madison www.biostat.wisc.edu/~kbroman [ Teaching Miscellaneous lectures]
More informationFDR-CONTROLLING STEPWISE PROCEDURES AND THEIR FALSE NEGATIVES RATES
FDR-CONTROLLING STEPWISE PROCEDURES AND THEIR FALSE NEGATIVES RATES Sanat K. Sarkar a a Department of Statistics, Temple University, Speakman Hall (006-00), Philadelphia, PA 19122, USA Abstract The concept
More informationUniversity of California, Berkeley
University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2004 Paper 147 Multiple Testing Methods For ChIP-Chip High Density Oligonucleotide Array Data Sunduz
More informationIntroduction to QTL mapping in model organisms
Introduction to QTL mapping in model organisms Karl Broman Biostatistics and Medical Informatics University of Wisconsin Madison kbroman.org github.com/kbroman @kwbroman Backcross P 1 P 2 P 1 F 1 BC 4
More informationarxiv: v3 [math.st] 15 Oct 2018
SEGMENTATION AND ESTIMATION OF CHANGE-POINT MODELS: FALSE POSITIVE CONTROL AND CONFIDENCE REGIONS arxiv:1608.03032v3 [math.st] 15 Oct 2018 Xiao Fang, Jian Li and David Siegmund The Chinese University of
More informationA COMPOUND POISSON APPROXIMATION INEQUALITY
J. Appl. Prob. 43, 282 288 (2006) Printed in Israel Applied Probability Trust 2006 A COMPOUND POISSON APPROXIMATION INEQUALITY EROL A. PEKÖZ, Boston University Abstract We give conditions under which the
More informationDrosophila melanogaster and D. simulans, two fruit fly species that are nearly
Comparative Genomics: Human versus chimpanzee 1. Introduction The chimpanzee is the closest living relative to humans. The two species are nearly identical in DNA sequence (>98% identity), yet vastly different
More informationProbabilistic Inference for Multiple Testing
This is the title page! This is the title page! Probabilistic Inference for Multiple Testing Chuanhai Liu and Jun Xie Department of Statistics, Purdue University, West Lafayette, IN 47907. E-mail: chuanhai,
More informationSample Size Estimation for Studies of High-Dimensional Data
Sample Size Estimation for Studies of High-Dimensional Data James J. Chen, Ph.D. National Center for Toxicological Research Food and Drug Administration June 3, 2009 China Medical University Taichung,
More informationHow to analyze many contingency tables simultaneously?
How to analyze many contingency tables simultaneously? Thorsten Dickhaus Humboldt-Universität zu Berlin Beuth Hochschule für Technik Berlin, 31.10.2012 Outline Motivation: Genetic association studies Statistical
More informationEmpirical Bayes Moderation of Asymptotically Linear Parameters
Empirical Bayes Moderation of Asymptotically Linear Parameters Nima Hejazi Division of Biostatistics University of California, Berkeley stat.berkeley.edu/~nhejazi nimahejazi.org twitter/@nshejazi github/nhejazi
More informationBustamante et al., Supplementary Nature Manuscript # 1 out of 9 Information #
Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Details of PRF Methodology In the Poisson Random Field PRF) model, it is assumed that non-synonymous mutations at a given gene are either
More informationHunting for significance with multiple testing
Hunting for significance with multiple testing Etienne Roquain 1 1 Laboratory LPMA, Université Pierre et Marie Curie (Paris 6), France Séminaire MODAL X, 19 mai 216 Etienne Roquain Hunting for significance
More informationHigh-Throughput Sequencing Course. Introduction. Introduction. Multiple Testing. Biostatistics and Bioinformatics. Summer 2018
High-Throughput Sequencing Course Multiple Testing Biostatistics and Bioinformatics Summer 2018 Introduction You have previously considered the significance of a single gene Introduction You have previously
More informationOn adaptive procedures controlling the familywise error rate
, pp. 3 On adaptive procedures controlling the familywise error rate By SANAT K. SARKAR Temple University, Philadelphia, PA 922, USA sanat@temple.edu Summary This paper considers the problem of developing
More informationAnalysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems
Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems Jeremy S. Conner and Dale E. Seborg Department of Chemical Engineering University of California, Santa Barbara, CA
More informationEstimation of a Two-component Mixture Model
Estimation of a Two-component Mixture Model Bodhisattva Sen 1,2 University of Cambridge, Cambridge, UK Columbia University, New York, USA Indian Statistical Institute, Kolkata, India 6 August, 2012 1 Joint
More informationStep-down FDR Procedures for Large Numbers of Hypotheses
Step-down FDR Procedures for Large Numbers of Hypotheses Paul N. Somerville University of Central Florida Abstract. Somerville (2004b) developed FDR step-down procedures which were particularly appropriate
More informationGeneralized Linear Models (1/29/13)
STA613/CBB540: Statistical methods in computational biology Generalized Linear Models (1/29/13) Lecturer: Barbara Engelhardt Scribe: Yangxiaolu Cao When processing discrete data, two commonly used probability
More informationCS 4491/CS 7990 SPECIAL TOPICS IN BIOINFORMATICS
CS 4491/CS 7990 SPECIAL TOPICS IN BIOINFORMATICS * Some contents are adapted from Dr. Hung Huang and Dr. Chengkai Li at UT Arlington Mingon Kang, Ph.D. Computer Science, Kennesaw State University Problems
More informationMODEL-FREE LINKAGE AND ASSOCIATION MAPPING OF COMPLEX TRAITS USING QUANTITATIVE ENDOPHENOTYPES
MODEL-FREE LINKAGE AND ASSOCIATION MAPPING OF COMPLEX TRAITS USING QUANTITATIVE ENDOPHENOTYPES Saurabh Ghosh Human Genetics Unit Indian Statistical Institute, Kolkata Most common diseases are caused by
More informationSupplementary Materials for Molecular QTL Discovery Incorporating Genomic Annotations using Bayesian False Discovery Rate Control
Supplementary Materials for Molecular QTL Discovery Incorporating Genomic Annotations using Bayesian False Discovery Rate Control Xiaoquan Wen Department of Biostatistics, University of Michigan A Model
More informationProbability of Detecting Disease-Associated SNPs in Case-Control Genome-Wide Association Studies
Probability of Detecting Disease-Associated SNPs in Case-Control Genome-Wide Association Studies Ruth Pfeiffer, Ph.D. Mitchell Gail Biostatistics Branch Division of Cancer Epidemiology&Genetics National
More informationGS Analysis of Microarray Data
GS01 0163 Analysis of Microarray Data Keith Baggerly and Kevin Coombes Section of Bioinformatics Department of Biostatistics and Applied Mathematics UT M. D. Anderson Cancer Center kabagg@mdanderson.org
More informationRegularization Parameter Selection for a Bayesian Multi-Level Group Lasso Regression Model with Application to Imaging Genomics
Regularization Parameter Selection for a Bayesian Multi-Level Group Lasso Regression Model with Application to Imaging Genomics arxiv:1603.08163v1 [stat.ml] 7 Mar 016 Farouk S. Nathoo, Keelin Greenlaw,
More informationResearch Statement on Statistics Jun Zhang
Research Statement on Statistics Jun Zhang (junzhang@galton.uchicago.edu) My interest on statistics generally includes machine learning and statistical genetics. My recent work focus on detection and interpretation
More informationLatent Variable Methods for the Analysis of Genomic Data
John D. Storey Center for Statistics and Machine Learning & Lewis-Sigler Institute for Integrative Genomics Latent Variable Methods for the Analysis of Genomic Data http://genomine.org/talks/ Data m variables
More informationIntroduction to QTL mapping in model organisms
Human vs mouse Introduction to QTL mapping in model organisms Karl W Broman Department of Biostatistics Johns Hopkins University www.biostat.jhsph.edu/~kbroman [ Teaching Miscellaneous lectures] www.daviddeen.com
More informationExpression QTLs and Mapping of Complex Trait Loci. Paul Schliekelman Statistics Department University of Georgia
Expression QTLs and Mapping of Complex Trait Loci Paul Schliekelman Statistics Department University of Georgia Definitions: Genes, Loci and Alleles A gene codes for a protein. Proteins due everything.
More informationA Monte-Carlo study of asymptotically robust tests for correlation coefficients
Biometrika (1973), 6, 3, p. 661 551 Printed in Great Britain A Monte-Carlo study of asymptotically robust tests for correlation coefficients BY G. T. DUNCAN AND M. W. J. LAYAKD University of California,
More informationDetecting Simultaneous Variant Intervals in Aligned Sequences
University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 2011 Detecting Simultaneous Variant Intervals in Aligned Sequences David Siegmund Benjamin Yakir Nancy R. Zhang University
More informationStatistical analysis of microarray data: a Bayesian approach
Biostatistics (003), 4, 4,pp. 597 60 Printed in Great Britain Statistical analysis of microarray data: a Bayesian approach RAPHAEL GTTARD University of Washington, Department of Statistics, Box 3543, Seattle,
More informationLet us first identify some classes of hypotheses. simple versus simple. H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided
Let us first identify some classes of hypotheses. simple versus simple H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided H 0 : θ θ 0 versus H 1 : θ > θ 0. (2) two-sided; null on extremes H 0 : θ θ 1 or
More informationModel-Free Knockoffs: High-Dimensional Variable Selection that Controls the False Discovery Rate
Model-Free Knockoffs: High-Dimensional Variable Selection that Controls the False Discovery Rate Lucas Janson, Stanford Department of Statistics WADAPT Workshop, NIPS, December 2016 Collaborators: Emmanuel
More information1 Springer. Nan M. Laird Christoph Lange. The Fundamentals of Modern Statistical Genetics
1 Springer Nan M. Laird Christoph Lange The Fundamentals of Modern Statistical Genetics 1 Introduction to Statistical Genetics and Background in Molecular Genetics 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
More informationMonte Carlo Studies. The response in a Monte Carlo study is a random variable.
Monte Carlo Studies The response in a Monte Carlo study is a random variable. The response in a Monte Carlo study has a variance that comes from the variance of the stochastic elements in the data-generating
More informationAsymptotic properties of the likelihood ratio test statistics with the possible triangle constraint in Affected-Sib-Pair analysis
The Canadian Journal of Statistics Vol.?, No.?, 2006, Pages???-??? La revue canadienne de statistique Asymptotic properties of the likelihood ratio test statistics with the possible triangle constraint
More informationYour use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at
Biometrika Trust Robust Regression via Discriminant Analysis Author(s): A. C. Atkinson and D. R. Cox Source: Biometrika, Vol. 64, No. 1 (Apr., 1977), pp. 15-19 Published by: Oxford University Press on
More informationStochastic processes and
Stochastic processes and Markov chains (part II) Wessel van Wieringen w.n.van.wieringen@vu.nl wieringen@vu nl Department of Epidemiology and Biostatistics, VUmc & Department of Mathematics, VU University
More informationSTAT 461/561- Assignments, Year 2015
STAT 461/561- Assignments, Year 2015 This is the second set of assignment problems. When you hand in any problem, include the problem itself and its number. pdf are welcome. If so, use large fonts and
More informationBIOINFORMATICS ORIGINAL PAPER
BIOINFORMATICS ORIGINAL PAPER Vol 21 no 11 2005, pages 2684 2690 doi:101093/bioinformatics/bti407 Gene expression A practical false discovery rate approach to identifying patterns of differential expression
More informationMultiple Testing. Hoang Tran. Department of Statistics, Florida State University
Multiple Testing Hoang Tran Department of Statistics, Florida State University Large-Scale Testing Examples: Microarray data: testing differences in gene expression between two traits/conditions Microbiome
More informationSimulating Properties of the Likelihood Ratio Test for a Unit Root in an Explosive Second Order Autoregression
Simulating Properties of the Likelihood Ratio est for a Unit Root in an Explosive Second Order Autoregression Bent Nielsen Nuffield College, University of Oxford J James Reade St Cross College, University
More informationThe Wright-Fisher Model and Genetic Drift
The Wright-Fisher Model and Genetic Drift January 22, 2015 1 1 Hardy-Weinberg Equilibrium Our goal is to understand the dynamics of allele and genotype frequencies in an infinite, randomlymating population
More informationClassical Selection, Balancing Selection, and Neutral Mutations
Classical Selection, Balancing Selection, and Neutral Mutations Classical Selection Perspective of the Fate of Mutations All mutations are EITHER beneficial or deleterious o Beneficial mutations are selected
More informationTwo-stage stepup procedures controlling FDR
Journal of Statistical Planning and Inference 38 (2008) 072 084 www.elsevier.com/locate/jspi Two-stage stepup procedures controlling FDR Sanat K. Sarar Department of Statistics, Temple University, Philadelphia,
More informationMATHEMATICAL MODELS - Vol. III - Mathematical Modeling and the Human Genome - Hilary S. Booth MATHEMATICAL MODELING AND THE HUMAN GENOME
MATHEMATICAL MODELING AND THE HUMAN GENOME Hilary S. Booth Australian National University, Australia Keywords: Human genome, DNA, bioinformatics, sequence analysis, evolution. Contents 1. Introduction:
More informationFalse Discovery Rate
False Discovery Rate Peng Zhao Department of Statistics Florida State University December 3, 2018 Peng Zhao False Discovery Rate 1/30 Outline 1 Multiple Comparison and FWER 2 False Discovery Rate 3 FDR
More informationBiostatistics-Lecture 16 Model Selection. Ruibin Xi Peking University School of Mathematical Sciences
Biostatistics-Lecture 16 Model Selection Ruibin Xi Peking University School of Mathematical Sciences Motivating example1 Interested in factors related to the life expectancy (50 US states,1969-71 ) Per
More informationFalse Discovery Control in Spatial Multiple Testing
False Discovery Control in Spatial Multiple Testing WSun 1,BReich 2,TCai 3, M Guindani 4, and A. Schwartzman 2 WNAR, June, 2012 1 University of Southern California 2 North Carolina State University 3 University
More informationSupplementary Information for Discovery and characterization of indel and point mutations
Supplementary Information for Discovery and characterization of indel and point mutations using DeNovoGear Avinash Ramu 1 Michiel J. Noordam 1 Rachel S. Schwartz 2 Arthur Wuster 3 Matthew E. Hurles 3 Reed
More informationAffected Sibling Pairs. Biostatistics 666
Affected Sibling airs Biostatistics 666 Today Discussion of linkage analysis using affected sibling pairs Our exploration will include several components we have seen before: A simple disease model IBD
More informationAssociation Testing with Quantitative Traits: Common and Rare Variants. Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5
Association Testing with Quantitative Traits: Common and Rare Variants Timothy Thornton and Katie Kerr Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5 1 / 41 Introduction to Quantitative
More informationHeritability estimation in modern genetics and connections to some new results for quadratic forms in statistics
Heritability estimation in modern genetics and connections to some new results for quadratic forms in statistics Lee H. Dicker Rutgers University and Amazon, NYC Based on joint work with Ruijun Ma (Rutgers),
More informationSAMPLE SIZE RE-ESTIMATION FOR ADAPTIVE SEQUENTIAL DESIGN IN CLINICAL TRIALS
Journal of Biopharmaceutical Statistics, 18: 1184 1196, 2008 Copyright Taylor & Francis Group, LLC ISSN: 1054-3406 print/1520-5711 online DOI: 10.1080/10543400802369053 SAMPLE SIZE RE-ESTIMATION FOR ADAPTIVE
More informationLARGE NUMBERS OF EXPLANATORY VARIABLES. H.S. Battey. WHAO-PSI, St Louis, 9 September 2018
LARGE NUMBERS OF EXPLANATORY VARIABLES HS Battey Department of Mathematics, Imperial College London WHAO-PSI, St Louis, 9 September 2018 Regression, broadly defined Response variable Y i, eg, blood pressure,
More informationLecture 7 April 16, 2018
Stats 300C: Theory of Statistics Spring 2018 Lecture 7 April 16, 2018 Prof. Emmanuel Candes Scribe: Feng Ruan; Edited by: Rina Friedberg, Junjie Zhu 1 Outline Agenda: 1. False Discovery Rate (FDR) 2. Properties
More informationStatistical issues in QTL mapping in mice
Statistical issues in QTL mapping in mice Karl W Broman Department of Biostatistics Johns Hopkins University http://www.biostat.jhsph.edu/~kbroman Outline Overview of QTL mapping The X chromosome Mapping
More informationAn Integrated Approach for the Assessment of Chromosomal Abnormalities
An Integrated Approach for the Assessment of Chromosomal Abnormalities Department of Biostatistics Johns Hopkins Bloomberg School of Public Health June 6, 2007 Karyotypes Mitosis and Meiosis Meiosis Meiosis
More informationLecture 1: Case-Control Association Testing. Summer Institute in Statistical Genetics 2015
Timothy Thornton and Michael Wu Summer Institute in Statistical Genetics 2015 1 / 1 Introduction Association mapping is now routinely being used to identify loci that are involved with complex traits.
More information6.207/14.15: Networks Lecture 12: Generalized Random Graphs
6.207/14.15: Networks Lecture 12: Generalized Random Graphs 1 Outline Small-world model Growing random networks Power-law degree distributions: Rich-Get-Richer effects Models: Uniform attachment model
More informationIntroduction to Bioinformatics
CSCI8980: Applied Machine Learning in Computational Biology Introduction to Bioinformatics Rui Kuang Department of Computer Science and Engineering University of Minnesota kuang@cs.umn.edu History of Bioinformatics
More informationEmpirical Bayes Moderation of Asymptotically Linear Parameters
Empirical Bayes Moderation of Asymptotically Linear Parameters Nima Hejazi Division of Biostatistics University of California, Berkeley stat.berkeley.edu/~nhejazi nimahejazi.org twitter/@nshejazi github/nhejazi
More informationA note on profile likelihood for exponential tilt mixture models
Biometrika (2009), 96, 1,pp. 229 236 C 2009 Biometrika Trust Printed in Great Britain doi: 10.1093/biomet/asn059 Advance Access publication 22 January 2009 A note on profile likelihood for exponential
More informationRejoinder on: Control of the false discovery rate under dependence using the bootstrap and subsampling
Test (2008) 17: 461 471 DOI 10.1007/s11749-008-0134-6 DISCUSSION Rejoinder on: Control of the false discovery rate under dependence using the bootstrap and subsampling Joseph P. Romano Azeem M. Shaikh
More informationMotivating the need for optimal sequence alignments...
1 Motivating the need for optimal sequence alignments... 2 3 Note that this actually combines two objectives of optimal sequence alignments: (i) use the score of the alignment o infer homology; (ii) use
More informationLecture 9. QTL Mapping 2: Outbred Populations
Lecture 9 QTL Mapping 2: Outbred Populations Bruce Walsh. Aug 2004. Royal Veterinary and Agricultural University, Denmark The major difference between QTL analysis using inbred-line crosses vs. outbred
More informationHaplotype-based variant detection from short-read sequencing
Haplotype-based variant detection from short-read sequencing Erik Garrison and Gabor Marth July 16, 2012 1 Motivation While statistical phasing approaches are necessary for the determination of large-scale
More informationSpecific Differences. Lukas Meier, Seminar für Statistik
Specific Differences Lukas Meier, Seminar für Statistik Problem with Global F-test Problem: Global F-test (aka omnibus F-test) is very unspecific. Typically: Want a more precise answer (or have a more
More informationEco517 Fall 2004 C. Sims MIDTERM EXAM
Eco517 Fall 2004 C. Sims MIDTERM EXAM Answer all four questions. Each is worth 23 points. Do not devote disproportionate time to any one question unless you have answered all the others. (1) We are considering
More informationModel Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model
Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model Centre for Molecular, Environmental, Genetic & Analytic (MEGA) Epidemiology School of Population
More information