Correlation, z-values, and the Accuracy of Large-Scale Estimators. Bradley Efron Stanford University
1 Correlation, z-values, and the Accuracy of Large-Scale Estimators
Bradley Efron, Stanford University
2 Correlation and Accuracy
Modern scientific studies: $N$ cases (genes, SNPs, pixels, ...), each with its own summary statistic $z_i$, $i = 1, 2, \ldots, N$, with $N$ on the order of 10,000.
Estimate of interest: $\hat\theta = s(z)$ [e.g., $\hat\theta = \#\{z_i > 3\}/N$].
Question: How accurate is $\hat\theta$?
Easy answer if the $z_i$'s are independent (but usually they are not!).
Troubles for the bootstrap.
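For the independent case, the "easy answer" is the binomial standard deviation $\sqrt{\hat\theta(1-\hat\theta)/N}$. The following is a minimal sketch in Python; the function name and the simulated $N(0,1)$ data are assumptions made for illustration and are not part of the talk.

```python
import math
import random


def tail_proportion_with_sd(z, cutoff=3.0):
    """Return (theta_hat, sd) for theta_hat = #{z_i > cutoff} / N.

    The sd is the binomial formula, which is valid only when the
    z_i are independent (the case the slide calls "easy").
    """
    n_cases = len(z)
    theta_hat = sum(1 for zi in z if zi > cutoff) / n_cases
    sd_indep = math.sqrt(theta_hat * (1.0 - theta_hat) / n_cases)
    return theta_hat, sd_indep


# Illustrative data only: N = 10,000 independent N(0, 1) draws.
rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
theta_hat, sd_indep = tail_proportion_with_sd(z)
print(theta_hat, sd_indep)
```

With correlated $z_i$ this number can be badly optimistic, which is the point of the slides that follow.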
3 Leukemia Microarray Study (Golub et al., 1999)
72 leukemia patients: $n_1 = 47$ ALL, $n_2 = 25$ AML
$N = 7128$ genes
Data matrix $X$: independent columns but correlated rows; rms correlation $\hat\alpha = .11$
$t_i$ = two-sample t-statistic, AML vs. ALL, for gene $i$
$z_i = \Phi^{-1}(F_{70}(t_i))$ [$\Phi$, $F_{70}$: cdfs of $N(0, 1)$ and $t_{70}$]
$H_0: z_i \sim N(0, 1)$, the theoretical null
4 [Figure: histogram of the leukemia data, N = 7128 z values comparing 47 ALL vs. 25 AML patients; RMS correlation = .11; central standard deviation sighat0 = 1.68. Curve: fhat(z), Poisson GLM spline fit, df = 5. Horizontal axis: z values.]
5 [Figure: leukemia z-value histogram and the average of 100 bootstrap z histograms (two-sample nonparametric bootstrap: resample columns of X). Curves: Poisson spline fit and bootstrap average. Axes: z values (horizontal), frequency (vertical).]
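6 Bootstrap Dilation
$x_i$ = $i$th row of $X$ ($n = 72 = 47 + 25$)
$x_i \to z_i$ and $x_i^* \to z_i^*$, with $z_i^*$ distributed approximately as $z_i + N(0, \sigma_i^2)$
Bootstrap histogram has an extra component of variance:
$$E\Big\{\sum_{i=1}^{N} z_i^{*2} / N\Big\} = \sum_{i=1}^{N} z_i^2 / N + \sum_{i=1}^{N} \sigma_i^2 / N$$
Next: bootstrap stdev estimates for the right-sided cdf $\hat F(x) = \#\{z_i \ge x\}/N$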
7 [Figure: bootstrap stdev for the empirical cdf of the leukemia z values, compared with Formula X. Curves: Formula X and bootstrap. Axes: x value (horizontal), sd estimates (vertical).]
8 [Figure: permutation and jackknife estimates of sd{empirical cdf}, compared with Formula X. Curves: jackknife, perm, and Formula X. Axes: x value (horizontal), sd estimates (vertical).]
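9 Formula X
$$\mathrm{Var}\{\hat F(x)\} \approx \underbrace{\left\{\frac{\hat F(x)\,(1 - \hat F(x))}{N}\right\}}_{\text{independence}} + \underbrace{\left\{\frac{\hat\sigma_0^2\, \hat\alpha\, \hat f^{(1)}(x)}{\sqrt{2}}\right\}^2}_{\text{correlation penalty}}$$
$\hat\sigma_0 = 1.68$, from the empirical null
$\hat\alpha = .11$, the estimated RMS correlation
$\hat f^{(1)}(x)$ = first derivative of the estimate $\hat f(x)$
Depends on normality: $z_i \sim N(\mu_i, \sigma_i^2)$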
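Formula X adds a correlation penalty to the usual binomial variance of the empirical cdf. The sketch below evaluates it as reconstructed from this slide, with the penalty term taken to be $\{\hat\sigma_0^2 \hat\alpha \hat f^{(1)}(x)/\sqrt{2}\}^2$. The constants $N$, $\hat\sigma_0$ and $\hat\alpha$ are the leukemia values quoted in the talk; the inputs `F_hat` and `f1_hat` are made-up numbers chosen only to exercise the function, and the function name is hypothetical.

```python
import math


def formula_x_sd(F_hat, f1_hat, N, sigma0_hat, alpha_hat):
    """Standard deviation of F_hat(x) according to Formula X.

    F_hat  : estimated cdf value at x
    f1_hat : estimated first derivative of the density f_hat at x
    """
    independence = F_hat * (1.0 - F_hat) / N
    corr_penalty = (sigma0_hat ** 2 * alpha_hat * f1_hat / math.sqrt(2.0)) ** 2
    return math.sqrt(independence + corr_penalty)


# N, sigma0_hat, alpha_hat from the slides; F_hat and f1_hat are illustrative.
sd_indep = formula_x_sd(F_hat=0.10, f1_hat=-0.05, N=7128,
                        sigma0_hat=1.68, alpha_hat=0.0)
sd_corr = formula_x_sd(F_hat=0.10, f1_hat=-0.05, N=7128,
                       sigma0_hat=1.68, alpha_hat=0.11)
print(sd_indep, sd_corr)
```

Setting `alpha_hat=0.0` recovers the independence answer, so the ratio `sd_corr / sd_indep` shows how much the penalty inflates the standard deviation at a given x.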
10 Formula X for Leukemia Data
[Table: columns x, $\hat F(x)$, $\widehat{\mathrm{sd}}$, $\widehat{\mathrm{sd}}$]
11 [Figure: simulation of sd{Fhat(x)} from Formula X; N = 6000, n = 20 + 20, alpha = .10. Solid curve and bars are the mean and stdev of the sdhat values over 100 simulations; dashed curve is the actual sd. Vertical axis: standard deviation estimates.]
12 Digression: The Non-Null Distribution of z-values
A z-value is a test statistic distributed as $N(0, 1)$ under $H_0$.
Theorem: Under reasonable conditions the non-null distribution of $z$ is
$$z \sim N(\mu, \sigma^2) + O_p(1/n), \quad \text{where} \quad \sigma^2 = 1 + O(1/n^{1/2}).$$
Normality degrades more slowly than unit standard deviation.
Helps justify the model $z_i \sim N(\mu_i, \sigma_i^2)$.
13 Student-t z-values
$t \sim t_\nu(\delta)$ [noncentral t, noncentrality $\delta$, df = $\nu$]
$H_0: \delta = 0$
$z = \Phi^{-1}(F_\nu(t))$ [$F_\nu$: central t cdf, df = $\nu$], so under $H_0$, $z \sim N(0, 1)$
What if $\delta \neq 0$?
14 [Figure: densities for z = phiinv(Fnu(t)), t ~ t(del, nu = 20), for del = 0, 1, 2, 3, 4, 5; dotted/dashed lines are the matching N(M, SD) densities. Horizontal axis: z value.]
15 The Count Vector y
Partition the range $\mathcal{Z}$ of $z$ into $K$ bins: $\mathcal{Z} = \bigcup_{k=1}^{K} \mathcal{Z}_k$
Each bin of width $\Delta$; bin centers $x_k$, $k = 1, 2, \ldots, K$
(Leukemia histogram: $\mathcal{Z} = [-7.9, 7.9]$, $\Delta = .2$, $K = 79$)
Counts $y_k = \#\{z_i \in \mathcal{Z}_k\}$, $y = (y_1, y_2, \ldots, y_K)'$
The count vector $y$ is a discretized order statistic of $z$ (most statistics of interest are of the form $\hat\theta = m(y)$)
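The binning step can be sketched as follows, using the leukemia histogram settings from the slide (range [-7.9, 7.9], width .2, so K = 79). The function name is hypothetical, and dropping values outside the range is a choice made for this sketch rather than something the talk specifies.

```python
import math


def count_vector(z, lo=-7.9, hi=7.9, width=0.2):
    """Bin z-values into K equal-width bins on [lo, hi].

    Returns (bin_centers, counts). Values outside [lo, hi] are ignored,
    which is an assumption of this sketch.
    """
    K = round((hi - lo) / width)
    centers = [lo + (k + 0.5) * width for k in range(K)]
    counts = [0] * K
    for zi in z:
        if lo <= zi <= hi:
            # The right endpoint hi is assigned to the last bin.
            k = min(int(math.floor((zi - lo) / width)), K - 1)
            counts[k] += 1
    return centers, counts


centers, counts = count_vector([0.0, 0.05, -7.9, 7.9, 9.0])
print(len(counts), sum(counts))
```

Any statistic of the form $\hat\theta = m(y)$, such as a tail proportion, can then be computed from `counts` alone.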
16 Multi-Class Normal Model
Suppose the $z_i$'s are in classes $\mathcal{C}_1, \mathcal{C}_2, \ldots, \mathcal{C}_C$, with $z_i \sim N(\mu_c, \sigma_c^2)$ for $z_i \in \mathcal{C}_c$
$N_c = \#\{\mathcal{C}_c\}$, $p_c = N_c/N$ [so $\sum_c N_c = N$, $\sum_c p_c = 1$]
Correlation distribution: $g(\rho)$ = empirical density of all $\binom{N}{2}$ true correlations
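17 Mehler's Identity (Lancaster, 1958)
$\varphi_\rho(u, v)$ = standard normal bivariate density
Mehler:
$$\lambda_\rho(u, v) = \frac{\varphi_\rho(u, v)}{\varphi(u)\varphi(v)} - 1 = \sum_{j \ge 1} \frac{\rho^j}{j!}\, h_j(u)\, h_j(v),$$
where $h_j$ is the $j$th Hermite polynomial.
Crucial quantity:
$$\Lambda(u, v) = \int_{-1}^{1} \lambda_\rho(u, v)\, g(\rho)\, d\rho = \sum_{j \ge 1} \frac{\alpha_j}{j!}\, h_j(u)\, h_j(v), \quad \text{where} \quad \alpha_j = \int_{-1}^{1} \rho^j g(\rho)\, d\rho.$$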
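Mehler's identity can be checked numerically by comparing the closed-form density ratio with a truncated Hermite series. The sketch below assumes the probabilists' Hermite polynomials (recurrence $h_{j+1}(x) = x\,h_j(x) - j\,h_{j-1}(x)$); the function names and the truncation at 60 terms are choices made for illustration.

```python
import math


def hermite(j_max, x):
    """Probabilists' Hermite polynomials h_0(x), ..., h_{j_max}(x)."""
    h = [1.0, x]
    for j in range(1, j_max):
        h.append(x * h[j] - j * h[j - 1])  # h_{j+1} = x h_j - j h_{j-1}
    return h[: j_max + 1]


def lambda_exact(rho, u, v):
    """phi_rho(u, v) / (phi(u) phi(v)) - 1 from the closed-form densities."""
    q = (u * u - 2.0 * rho * u * v + v * v) / (2.0 * (1.0 - rho * rho))
    ratio = math.exp(-q + 0.5 * (u * u + v * v)) / math.sqrt(1.0 - rho * rho)
    return ratio - 1.0


def lambda_series(rho, u, v, j_max=60):
    """Truncated Mehler series: sum over j >= 1 of rho^j / j! h_j(u) h_j(v)."""
    hu, hv = hermite(j_max, u), hermite(j_max, v)
    return sum(rho ** j / math.factorial(j) * hu[j] * hv[j]
               for j in range(1, j_max + 1))


print(lambda_exact(0.3, 1.0, -0.5), lambda_series(0.3, 1.0, -0.5))
```

Replacing `rho ** j` by a moment $\alpha_j$ of $g(\rho)$ in `lambda_series` gives the corresponding truncated series for $\Lambda(u, v)$.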
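18 Exact Covariance of y
$z_i \sim N(\mu_c, \sigma_c^2)$ for $z_i \in \mathcal{C}_c$; $N_c = \#\mathcal{C}_c$, $p_c = N_c/N$
Theorem: $\mathrm{cov}(y) = \mathrm{cov}_0 + \mathrm{cov}_1$, where
$$\mathrm{cov}_0 = N \sum_c p_c \left\{\mathrm{diag}(\pi_c) - \pi_c \pi_c'\right\} \qquad \text{[independence]}$$
with $\pi_{ck} = \Pr_c\{z_i \in \text{bin } k\}$ and $\pi_c = (\ldots, \pi_{ck}, \ldots)'$, and
$$\mathrm{cov}_1 = N^2 \sum_c \sum_d p_c\, p_d\, B_{cd} - N \sum_c p_c\, B_{cc} \qquad \text{[corr penalty]}$$
with
$$B_{cd}(k, l) = \pi_{ck}\, \pi_{dl}\, \Lambda\!\left(\frac{x_k - \mu_c}{\sigma_c},\ \frac{x_l - \mu_d}{\sigma_d}\right).$$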
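19 Four Simplifications of cov_1
Drop the $N$ term
Microarray standardization methods make $\alpha_1 \approx 0$
Mehler expansion: $\alpha_2 = \int_{-1}^{1} \rho^2 g(\rho)\, d\rho$ is the lead term
Higher terms are ignorable if $\alpha_2$ is small
Simplified formula (almost Formula X): letting $\alpha = \alpha_2^{1/2}$ and
$$\phi^{(2)}_k = \sum_c p_c\, \varphi^{(2)}\!\left(\frac{x_k - \mu_c}{\sigma_c}\right) \Big/ \sigma_c,$$
$$\mathrm{cov}_1 \approx (N \Delta \alpha)^2\, \phi^{(2)} \phi^{(2)\prime} / 2 \qquad \text{[rms approximation]}$$
where $\Delta$ is the bin width, which enters through $\pi_{ck} \approx \Delta\, \varphi\big((x_k - \mu_c)/\sigma_c\big)/\sigma_c$.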
20 Numerical Comparison
$N = 6000$, $\alpha = .1$
Two classes: $(p_c, \mu_c, \sigma_c) = (.95, 0, 1)$ and $(.05, 2.5, 1)$
The next figure compares standard deviations (square roots of the diagonal elements) of the exact cov(y) and the rms approximation
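The rms side of this comparison can be sketched for a single bin. The code below computes sd{y_k} without the correlation penalty (diagonal of cov_0) and with the rms penalty, taking the diagonal penalty term as $(N \Delta \alpha)^2 (\phi^{(2)}_k)^2 / 2$. The bin width `DELTA = 0.2` is an assumption borrowed from the leukemia histogram, the factor $\Delta$ reflects the approximation $\pi_{ck} \approx \Delta\,\varphi(\cdot)/\sigma_c$, and the function names are hypothetical. The exact cov_1 is not computed here.

```python
import math
from statistics import NormalDist

N, ALPHA = 6000, 0.1
CLASSES = [(0.95, 0.0, 1.0), (0.05, 2.5, 1.0)]  # (p_c, mu_c, sigma_c)
DELTA = 0.2  # bin width: an assumption, not stated on this slide


def phi2(x):
    """Second derivative of the standard normal density: (x^2 - 1) phi(x)."""
    return (x * x - 1.0) * NormalDist().pdf(x)


def sd_counts(x_k):
    """Return (sd without corr penalty, sd with rms penalty) for count y_k."""
    var0 = 0.0
    phi2_bar = 0.0
    for p, mu, sigma in CLASSES:
        dist = NormalDist(mu, sigma)
        # Probability that a class-c z-value falls in the bin centered at x_k.
        pi_ck = dist.cdf(x_k + DELTA / 2) - dist.cdf(x_k - DELTA / 2)
        var0 += N * p * pi_ck * (1.0 - pi_ck)
        phi2_bar += p * phi2((x_k - mu) / sigma) / sigma
    var1 = (N * DELTA * ALPHA) ** 2 * phi2_bar ** 2 / 2.0
    return math.sqrt(var0), math.sqrt(var0 + var1)


for x in (-2.0, 0.0, 2.5):
    print(x, sd_counts(x))
```

Under these assumptions the penalty is largest near the center of the null density and nearly vanishes near $x = \pm 1$, where $\varphi^{(2)}$ crosses zero for the dominant class.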
21 [Figure: sd{y[k]} from the exact formula (solid) compared with the rms approximation (dashed); N = 6000, alpha = .1, (p0, mu0, sig0) = (.95, 0, 1) and (.05, 2.5, 1). A further curve shows sd{y[k]} without the corr penalty. Dashes along the horizontal axis (z value) show the bin centers x[k]; vertical axis: standard deviation.]
22 [Figure: same numerical example, now sd{Fhat[k]}, where Fhat[k] = sum(y[l] for l >= k)/N. Curves: exact, rms approx, and without corr penalty. Axes: z value (horizontal), sd{Fhat} (vertical).]
23 Estimation of RMS Correlation $\alpha$
$\hat\rho_{ii'}$ = empirical correlation between rows $i$ and $i'$ of $X$, the $N \times n$ expression matrix
$\{\hat\rho_{ii'}\}$ has mean and variance $(m, v)$ [leukemia: $(.00, .19^2)$]
$$\hat\alpha^2 = \frac{n}{n-1}\left(v - \frac{1}{n-1}\right)$$
[Table: $\hat\alpha$ for ALL, AML, Both]
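A rough sketch of this estimator, $\hat\alpha^2 = \frac{n}{n-1}\big(v - \frac{1}{n-1}\big)$, on simulated data follows. The simulated matrices, the helper names, and the clipping of negative estimates to zero are all assumptions made for illustration; the leukemia data are not used. The second matrix is built so that its true row correlations are $\pm 0.5$ with mean near zero, giving a true rms correlation of 0.5.

```python
import math
import random


def pearson(a, b):
    """Sample correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def rms_correlation(X):
    """Estimate alpha from an N x n matrix X given as a list of rows."""
    n = len(X[0])
    rhos = [pearson(X[i], X[j])
            for i in range(len(X)) for j in range(i + 1, len(X))]
    m = sum(rhos) / len(rhos)
    v = sum((r - m) ** 2 for r in rhos) / len(rhos)
    alpha_sq = n / (n - 1) * (v - 1.0 / (n - 1))
    return math.sqrt(max(alpha_sq, 0.0))  # clip: the raw estimate can be < 0


rng = random.Random(1)
n_rows, n_cols = 40, 50
# Independent rows: true alpha = 0.
indep = [[rng.gauss(0, 1) for _ in range(n_cols)] for _ in range(n_rows)]
# One shared factor with alternating signs: true correlations are +/- 0.5.
factor = [rng.gauss(0, 1) for _ in range(n_cols)]
b = math.sqrt(0.5)
corr = [[(1 if i % 2 == 0 else -1) * b * f + math.sqrt(1 - b * b) * rng.gauss(0, 1)
         for f in factor] for i in range(n_rows)]
alpha_indep = rms_correlation(indep)
alpha_corr = rms_correlation(corr)
print(alpha_indep, alpha_corr)
```

The subtraction of $1/(n-1)$ removes the sampling variance that the $\hat\rho_{ii'}$ would show even with independent rows, which is why the first matrix should give an estimate near zero.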
24 More General Accuracy Estimates
$Q$ = $q$-dimensional statistic of interest: $Q = Q(y)$
Influence function $\hat D$: $dQ = \hat D\, dy$ [$\hat D_{jk} = \partial Q_j / \partial y_k$]
$$\widehat{\mathrm{cov}}(Q) = \hat D\, \mathrm{cov}(y)\, \hat D'$$
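The delta-method recipe above can be sketched with a finite-difference Jacobian. Everything in the example is an assumption made for illustration: the four-bin probabilities, the statistic Q (log of a right-tail proportion), and the use of the independence covariance $N\{\mathrm{diag}(\pi) - \pi\pi'\}$ in place of the full cov(y). Under that covariance the answer is known in closed form, $(1 - p)/(Np)$ for tail proportion $p$, which gives a check.

```python
import math


def jacobian(Q, y, h=1e-3):
    """Central-difference estimate of D[j][k] = dQ_j / dy_k."""
    D = [[0.0] * len(y) for _ in Q(y)]
    for k in range(len(y)):
        up, dn = list(y), list(y)
        up[k] += h
        dn[k] -= h
        q_up, q_dn = Q(up), Q(dn)
        for j in range(len(D)):
            D[j][k] = (q_up[j] - q_dn[j]) / (2.0 * h)
    return D


def delta_cov(D, cov_y):
    """Return D cov_y D' as a nested list."""
    K = len(cov_y)
    DC = [[sum(row[k] * cov_y[k][l] for k in range(K)) for l in range(K)]
          for row in D]
    return [[sum(dc[l] * row2[l] for l in range(K)) for row2 in D] for dc in DC]


# Illustrative setup: 4 bins, independence covariance only.
N = 1000
pi = [0.5, 0.3, 0.15, 0.05]
y = [N * p for p in pi]
cov_y = [[N * ((pi[k] if k == l else 0.0) - pi[k] * pi[l]) for l in range(4)]
         for k in range(4)]


def Q(counts):
    """Log of the right-tail proportion over the last two bins."""
    return [math.log(sum(counts[2:]) / sum(counts))]


cov_Q = delta_cov(jacobian(Q, y), cov_y)
print(cov_Q[0][0])
```

To include correlation, `cov_y` would be replaced by an estimate of cov_0 + cov_1; the Jacobian step is unchanged.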
25 Example: Accuracy of $\log(\hat f)$
$z \to y \to \hat f$ by Poisson GLM of the counts $y_k$ on polynomial$(x_k)$
$Q = \log(\hat f) = (\ldots, \log \hat f(x_k), \ldots)'$
$$\hat D = M \left[ M' \mathrm{diag}(\hat f)\, M \right]^{-1} M' / N$$
with $M$ the GLM structure matrix
26 Local False Discovery Rate
$p_0$ = prior Pr{null}, with $z \sim f_0(z)$
$p_1$ = prior Pr{non-null}, with $z \sim f_1(z)$
Mixture: $f(z) = p_0 f_0(z) + p_1 f_1(z)$
Estimated local false discovery rate:
$$\widehat{\mathrm{fdr}}(z) = \widehat{\Pr}\{\text{null} \mid z\} = p_0 f_0(z) / \hat f(z)$$
$$\mathrm{cov}\{\log \widehat{\mathrm{fdr}}\} \approx \mathrm{cov}\{\log \hat f\}$$
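As an illustration of the definition, the sketch below evaluates the local false discovery rate for the two-class example used elsewhere in the talk, $(p, \mu, \sigma) = (.95, 0, 1)$ and $(.05, 2.5, 1)$. It uses the true mixture density $f$ rather than an estimate $\hat f$, so it shows the target quantity, not the estimator; the function name is hypothetical.

```python
from statistics import NormalDist

P0, F0 = 0.95, NormalDist(0.0, 1.0)  # null class
P1, F1 = 0.05, NormalDist(2.5, 1.0)  # non-null class


def local_fdr(z):
    """fdr(z) = p0 f0(z) / f(z) for the two-class normal mixture."""
    f = P0 * F0.pdf(z) + P1 * F1.pdf(z)
    return P0 * F0.pdf(z) / f


for z in (0.0, 2.5, 4.0):
    print(z, local_fdr(z))
```

In practice $f$ is unknown and is replaced by a fitted $\hat f$, which is where the accuracy calculations for $\log \hat f$ come in.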
27 [Figure: sd{log fdrhat(z)}; N = 6000, alpha = 0, .1, and .2; (p0, mu, sig) = (.95, 0, 1) and (.05, 2.5, 1). Stars are sd's for N = 1500, alpha = .1; numbers are fdrhat[z]. Axes: z value (horizontal), sd (vertical).]
28 [Figure: comparison of sd's for log{fdrhat} and log{Fdrhat}, alpha = .1. Curves labeled sdlogfdrnon, sdlogfdr, sdlogfdr. Numbers are Fdr[z]. Axes: z value (horizontal), sd (vertical).]
29 References
Efron, B. (2007a). Correlation and large-scale simultaneous significance testing. J. Amer. Statist. Assoc. 102.
Efron, B. (2007b). Size, power and false discovery rates. Ann. Statist. 35.
Efron, B. (2010). Correlated z-values and the accuracy of large-scale statistical estimates. J. Amer. Statist. Assoc. To appear ( brad/papers).
Golub, T., Slonim, D., Tamayo, P. et al. (1999). Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science 286.
Lancaster, H. (1958). The structure of bivariate distributions. Ann. Math. Statist. 29.
Owen, A. B. (2005). Variance of the number of false discoveries. J. Roy. Statist. Soc. Ser. B 67.
Tweedie s Formula and Selection Bias. Bradley Efron Stanford University
Tweedie s Formula and Selection Bias Bradley Efron Stanford University Selection Bias Observe z i N(µ i, 1) for i = 1, 2,..., N Select the m biggest ones: z (1) > z (2) > z (3) > > z (m) Question: µ values?
More informationFrequentist Accuracy of Bayesian Estimates
Frequentist Accuracy of Bayesian Estimates Bradley Efron Stanford University Bayesian Inference Parameter: µ Ω Observed data: x Prior: π(µ) Probability distributions: Parameter of interest: { fµ (x), µ
More informationThe locfdr Package. August 19, hivdata... 1 lfdrsim... 2 locfdr Index 5
Title Computes local false discovery rates Version 1.1-2 The locfdr Package August 19, 2006 Author Bradley Efron, Brit Turnbull and Balasubramanian Narasimhan Computation of local false discovery rates
More informationBayesian Inference and the Parametric Bootstrap. Bradley Efron Stanford University
Bayesian Inference and the Parametric Bootstrap Bradley Efron Stanford University Importance Sampling for Bayes Posterior Distribution Newton and Raftery (1994 JRSS-B) Nonparametric Bootstrap: good choice
More informationPackage locfdr. July 15, Index 5
Version 1.1-8 Title Computes Local False Discovery Rates Package locfdr July 15, 2015 Maintainer Balasubramanian Narasimhan License GPL-2 Imports stats, splines, graphics Computation
More informationModel Selection, Estimation, and Bootstrap Smoothing. Bradley Efron Stanford University
Model Selection, Estimation, and Bootstrap Smoothing Bradley Efron Stanford University Estimation After Model Selection Usually: (a) look at data (b) choose model (linear, quad, cubic...?) (c) fit estimates
More informationFalse discovery rate and related concepts in multiple comparisons problems, with applications to microarray data
False discovery rate and related concepts in multiple comparisons problems, with applications to microarray data Ståle Nygård Trial Lecture Dec 19, 2008 1 / 35 Lecture outline Motivation for not using
More informationAre a set of microarrays independent of each other?
Are a set of microarrays independent of each other? Bradley Efron Stanford University Abstract Having observed an m n matrix X whose rows are possibly correlated, we wish to test the hypothesis that the
More informationEmpirical Bayes Deconvolution Problem
Empirical Bayes Deconvolution Problem Bayes Deconvolution Problem Unknown prior density g(θ) gives unobserved realizations Θ 1, Θ 2,..., Θ N iid g(θ) Each Θ k gives observed X k p Θk (x) [p Θ (x) known]
More informationROW AND COLUMN CORRELATIONS (ARE A SET OF MICROARRAYS INDEPENDENT OF EACH OTHER?) Bradley Efron Department of Statistics Stanford University
ROW AND COLUMN CORRELATIONS (ARE A SET OF MICROARRAYS INDEPENDENT OF EACH OTHER?) By Bradley Efron Department of Statistics Stanford University Technical Report 244 March 2008 This research was supported
More informationMultiple Testing. Hoang Tran. Department of Statistics, Florida State University
Multiple Testing Hoang Tran Department of Statistics, Florida State University Large-Scale Testing Examples: Microarray data: testing differences in gene expression between two traits/conditions Microbiome
More informationThe bootstrap and Markov chain Monte Carlo
The bootstrap and Markov chain Monte Carlo Bradley Efron Stanford University Abstract This note concerns the use of parametric bootstrap sampling to carry out Bayesian inference calculations. This is only
More informationStatistical testing. Samantha Kleinberg. October 20, 2009
October 20, 2009 Intro to significance testing Significance testing and bioinformatics Gene expression: Frequently have microarray data for some group of subjects with/without the disease. Want to find
More informationLinear Models and Empirical Bayes Methods for. Assessing Differential Expression in Microarray Experiments
Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments by Gordon K. Smyth (as interpreted by Aaron J. Baraff) STAT 572 Intro Talk April 10, 2014 Microarray
More informationAdvanced Statistical Methods: Beyond Linear Regression
Advanced Statistical Methods: Beyond Linear Regression John R. Stevens Utah State University Notes 3. Statistical Methods II Mathematics Educators Worshop 28 March 2009 1 http://www.stat.usu.edu/~jrstevens/pcmi
More informationCopula Regression RAHUL A. PARSA DRAKE UNIVERSITY & STUART A. KLUGMAN SOCIETY OF ACTUARIES CASUALTY ACTUARIAL SOCIETY MAY 18,2011
Copula Regression RAHUL A. PARSA DRAKE UNIVERSITY & STUART A. KLUGMAN SOCIETY OF ACTUARIES CASUALTY ACTUARIAL SOCIETY MAY 18,2011 Outline Ordinary Least Squares (OLS) Regression Generalized Linear Models
More informationTechnical Report 1004 Dept. of Biostatistics. Some Exact and Approximations for the Distribution of the Realized False Discovery Rate
Technical Report 14 Dept. of Biostatistics Some Exact and Approximations for the Distribution of the Realized False Discovery Rate David Gold ab, Jeffrey C. Miecznikowski ab1 a Department of Biostatistics,
More informationA Sequential Bayesian Approach with Applications to Circadian Rhythm Microarray Gene Expression Data
A Sequential Bayesian Approach with Applications to Circadian Rhythm Microarray Gene Expression Data Faming Liang, Chuanhai Liu, and Naisyin Wang Texas A&M University Multiple Hypothesis Testing Introduction
More informationEstimation of a Two-component Mixture Model
Estimation of a Two-component Mixture Model Bodhisattva Sen 1,2 University of Cambridge, Cambridge, UK Columbia University, New York, USA Indian Statistical Institute, Kolkata, India 6 August, 2012 1 Joint
More informationThe miss rate for the analysis of gene expression data
Biostatistics (2005), 6, 1,pp. 111 117 doi: 10.1093/biostatistics/kxh021 The miss rate for the analysis of gene expression data JONATHAN TAYLOR Department of Statistics, Stanford University, Stanford,
More informationBootstrap tests. Patrick Breheny. October 11. Bootstrap vs. permutation tests Testing for equality of location
Bootstrap tests Patrick Breheny October 11 Patrick Breheny STA 621: Nonparametric Statistics 1/14 Introduction Conditioning on the observed data to obtain permutation tests is certainly an important idea
More informationESTIMATING THE PROPORTION OF TRUE NULL HYPOTHESES UNDER DEPENDENCE
Statistica Sinica 22 (2012), 1689-1716 doi:http://dx.doi.org/10.5705/ss.2010.255 ESTIMATING THE PROPORTION OF TRUE NULL HYPOTHESES UNDER DEPENDENCE Irina Ostrovnaya and Dan L. Nicolae Memorial Sloan-Kettering
More informationThe bootstrap. Patrick Breheny. December 6. The empirical distribution function The bootstrap
Patrick Breheny December 6 Patrick Breheny BST 764: Applied Statistical Modeling 1/21 The empirical distribution function Suppose X F, where F (x) = Pr(X x) is a distribution function, and we wish to estimate
More informationResampling and the Bootstrap
Resampling and the Bootstrap Axel Benner Biostatistics, German Cancer Research Center INF 280, D-69120 Heidelberg benner@dkfz.de Resampling and the Bootstrap 2 Topics Estimation and Statistical Testing
More informationFrequentist Accuracy of Bayesian Estimates
Frequentist Accuracy of Bayesian Estimates Bradley Efron Stanford University RSS Journal Webinar Objective Bayesian Inference Probability family F = {f µ (x), µ Ω} Parameter of interest: θ = t(µ) Prior
More informationStatistical Applications in Genetics and Molecular Biology
Statistical Applications in Genetics and Molecular Biology Volume 5, Issue 1 2006 Article 28 A Two-Step Multiple Comparison Procedure for a Large Number of Tests and Multiple Treatments Hongmei Jiang Rebecca
More informationResampling and the Bootstrap
Resampling and the Bootstrap Axel Benner Biostatistics, German Cancer Research Center INF 280, D-69120 Heidelberg benner@dkfz.de Resampling and the Bootstrap 2 Topics Estimation and Statistical Testing
More informationLarge-Scale Multiple Testing of Correlations
Large-Scale Multiple Testing of Correlations T. Tony Cai and Weidong Liu Abstract Multiple testing of correlations arises in many applications including gene coexpression network analysis and brain connectivity
More informationStatistics Applied to Bioinformatics. Tests of homogeneity
Statistics Applied to Bioinformatics Tests of homogeneity Two-tailed test of homogeneity Two-tailed test H 0 :m = m Principle of the test Estimate the difference between m and m Compare this estimation
More informationBootstrap, Jackknife and other resampling methods
Bootstrap, Jackknife and other resampling methods Part III: Parametric Bootstrap Rozenn Dahyot Room 128, Department of Statistics Trinity College Dublin, Ireland dahyot@mee.tcd.ie 2005 R. Dahyot (TCD)
More informationFDR and ROC: Similarities, Assumptions, and Decisions
EDITORIALS 8 FDR and ROC: Similarities, Assumptions, and Decisions. Why FDR and ROC? It is a privilege to have been asked to introduce this collection of papers appearing in Statistica Sinica. The papers
More informationA G-Modeling Program for Deconvolution and Empirical Bayes Estimation
A G-Modeling Program for Deconvolution and Empirical Bayes Estimation Balasubramanian Narasimhan Stanford University Bradley Efron Stanford University Abstract Empirical Bayes inference assumes an unknown
More informationLarge-Scale Hypothesis Testing
Chapter 2 Large-Scale Hypothesis Testing Progress in statistics is usually at the mercy of our scientific colleagues, whose data is the nature from which we work. Agricultural experimentation in the early
More informationFall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.
1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n
More informationMath Review Sheet, Fall 2008
1 Descriptive Statistics Math 3070-5 Review Sheet, Fall 2008 First we need to know about the relationship among Population Samples Objects The distribution of the population can be given in one of the
More informationPROCEDURES CONTROLLING THE k-fdr USING. BIVARIATE DISTRIBUTIONS OF THE NULL p-values. Sanat K. Sarkar and Wenge Guo
PROCEDURES CONTROLLING THE k-fdr USING BIVARIATE DISTRIBUTIONS OF THE NULL p-values Sanat K. Sarkar and Wenge Guo Temple University and National Institute of Environmental Health Sciences Abstract: Procedures
More informationAsymptotic Statistics-VI. Changliang Zou
Asymptotic Statistics-VI Changliang Zou Kolmogorov-Smirnov distance Example (Kolmogorov-Smirnov confidence intervals) We know given α (0, 1), there is a well-defined d = d α,n such that, for any continuous
More informationComparison of the Empirical Bayes and the Significance Analysis of Microarrays
Comparison of the Empirical Bayes and the Significance Analysis of Microarrays Holger Schwender, Andreas Krause, and Katja Ickstadt Abstract Microarrays enable to measure the expression levels of tens
More informationA STUDY OF PRE-VALIDATION
A STUDY OF PRE-VALIDATION Holger Höfling Robert Tibshirani July 3, 2007 Abstract Pre-validation is a useful technique for the analysis of microarray and other high dimensional data. It allows one to derive
More informationThe assumptions are needed to give us... valid standard errors valid confidence intervals valid hypothesis tests and p-values
Statistical Consulting Topics The Bootstrap... The bootstrap is a computer-based method for assigning measures of accuracy to statistical estimates. (Efron and Tibshrani, 1998.) What do we do when our
More informationChapter 7: Model Assessment and Selection
Chapter 7: Model Assessment and Selection DD3364 April 20, 2012 Introduction Regression: Review of our problem Have target variable Y to estimate from a vector of inputs X. A prediction model ˆf(X) has
More informationStatistical Applications in Genetics and Molecular Biology
Statistical Applications in Genetics and Molecular Biology Volume 1, Issue 1 2002 Article 1 Pre-validation and inference in microarrays Robert J. Tibshirani Brad Efron Stanford University, tibs@stat.stanford.edu
More informationSTAT440/840: Statistical Computing
First Prev Next Last STAT440/840: Statistical Computing Paul Marriott pmarriott@math.uwaterloo.ca MC 6096 February 2, 2005 Page 1 of 41 First Prev Next Last Page 2 of 41 Chapter 3: Data resampling: the
More informationSanat Sarkar Department of Statistics, Temple University Philadelphia, PA 19122, U.S.A. September 11, Abstract
Adaptive Controls of FWER and FDR Under Block Dependence arxiv:1611.03155v1 [stat.me] 10 Nov 2016 Wenge Guo Department of Mathematical Sciences New Jersey Institute of Technology Newark, NJ 07102, U.S.A.
More informationRegularized Discriminant Analysis and Its Application in Microarrays
Biostatistics (2005), 1, 1, pp. 1 18 Printed in Great Britain Regularized Discriminant Analysis and Its Application in Microarrays By YAQIAN GUO Department of Statistics, Stanford University Stanford,
More informationCorrection for Tuning Bias in Resampling Based Error Rate Estimation
Correction for Tuning Bias in Resampling Based Error Rate Estimation Christoph Bernau & Anne-Laure Boulesteix Institute of Medical Informatics, Biometry and Epidemiology (IBE), Ludwig-Maximilians University,
More informationA Bias Correction for the Minimum Error Rate in Cross-validation
A Bias Correction for the Minimum Error Rate in Cross-validation Ryan J. Tibshirani Robert Tibshirani Abstract Tuning parameters in supervised learning problems are often estimated by cross-validation.
More informationOn testing the significance of sets of genes
On testing the significance of sets of genes Bradley Efron and Robert Tibshirani August 17, 2006 Abstract This paper discusses the problem of identifying differentially expressed groups of genes from a
More information4 Resampling Methods: The Bootstrap
4 Resampling Methods: The Bootstrap Situation: Let x 1, x 2,..., x n be a SRS of size n taken from a distribution that is unknown. Let θ be a parameter of interest associated with this distribution and
More informationFrom Histograms to Multivariate Polynomial Histograms and Shape Estimation. Assoc Prof Inge Koch
From Histograms to Multivariate Polynomial Histograms and Shape Estimation Assoc Prof Inge Koch Statistics, School of Mathematical Sciences University of Adelaide Inge Koch (UNSW, Adelaide) Poly Histograms
More informationWeek 9 The Central Limit Theorem and Estimation Concepts
Week 9 and Estimation Concepts Week 9 and Estimation Concepts Week 9 Objectives 1 The Law of Large Numbers and the concept of consistency of averages are introduced. The condition of existence of the population
More informationResearch Article Sample Size Calculation for Controlling False Discovery Proportion
Probability and Statistics Volume 2012, Article ID 817948, 13 pages doi:10.1155/2012/817948 Research Article Sample Size Calculation for Controlling False Discovery Proportion Shulian Shang, 1 Qianhe Zhou,
More informationUniversity of California San Diego and Stanford University and
First International Workshop on Functional and Operatorial Statistics. Toulouse, June 19-21, 2008 K-sample Subsampling Dimitris N. olitis andjoseph.romano University of California San Diego and Stanford
More informationChapter 3: Statistical methods for estimation and testing. Key reference: Statistical methods in bioinformatics by Ewens & Grant (2001).
Chapter 3: Statistical methods for estimation and testing Key reference: Statistical methods in bioinformatics by Ewens & Grant (2001). Chapter 3: Statistical methods for estimation and testing Key reference:
More informationSTAT 461/561- Assignments, Year 2015
STAT 461/561- Assignments, Year 2015 This is the second set of assignment problems. When you hand in any problem, include the problem itself and its number. pdf are welcome. If so, use large fonts and
More informationThe Generalized Higher Criticism for Testing SNP-sets in Genetic Association Studies
The Generalized Higher Criticism for Testing SNP-sets in Genetic Association Studies Ian Barnett, Rajarshi Mukherjee & Xihong Lin Harvard University ibarnett@hsph.harvard.edu June 24, 2014 Ian Barnett
More informationMutual fund performance: false discoveries, bias, and power
Ann Finance DOI 10.1007/s10436-010-0151-9 RESEARCH ARTICLE Mutual fund performance: false discoveries, bias, and power Nik Tuzov Frederi Viens Received: 17 July 2009 / Accepted: 17 March 2010 Springer-Verlag
More informationPackage FDRreg. August 29, 2016
Package FDRreg August 29, 2016 Type Package Title False discovery rate regression Version 0.1 Date 2014-02-24 Author James G. Scott, with contributions from Rob Kass and Jesse Windle Maintainer James G.
More informationProbabilistic Inference for Multiple Testing
This is the title page! This is the title page! Probabilistic Inference for Multiple Testing Chuanhai Liu and Jun Xie Department of Statistics, Purdue University, West Lafayette, IN 47907. E-mail: chuanhai,
More informationA moment-based method for estimating the proportion of true null hypotheses and its application to microarray gene expression data
Biostatistics (2007), 8, 4, pp. 744 755 doi:10.1093/biostatistics/kxm002 Advance Access publication on January 22, 2007 A moment-based method for estimating the proportion of true null hypotheses and its
More informationThe Generalized Higher Criticism for Testing SNP-sets in Genetic Association Studies
The Generalized Higher Criticism for Testing SNP-sets in Genetic Association Studies Ian Barnett, Rajarshi Mukherjee & Xihong Lin Harvard University ibarnett@hsph.harvard.edu August 5, 2014 Ian Barnett
More informationBayesian inference and the parametric bootstrap
Bayesian inference and the parametric bootstrap Bradley Efron Stanford University Abstract The parametric bootstrap can be used for the efficient computation of Bayes posterior distributions. Importance
More informationIntroduction to Computational Finance and Financial Econometrics Probability Theory Review: Part 2
Introduction to Computational Finance and Financial Econometrics Probability Theory Review: Part 2 Eric Zivot July 7, 2014 Bivariate Probability Distribution Example - Two discrete rv s and Bivariate pdf
More informationStatistics for exp. medical researchers Regression and Correlation
Faculty of Health Sciences Regression analysis Statistics for exp. medical researchers Regression and Correlation Lene Theil Skovgaard Sept. 28, 2015 Linear regression, Estimation and Testing Confidence
More informationOn prediction and density estimation Peter McCullagh University of Chicago December 2004
On prediction and density estimation Peter McCullagh University of Chicago December 2004 Summary Having observed the initial segment of a random sequence, subsequent values may be predicted by calculating
More informationSTAT 536: Genetic Statistics
STAT 536: Genetic Statistics Tests for Hardy Weinberg Equilibrium Karin S. Dorman Department of Statistics Iowa State University September 7, 2006 Statistical Hypothesis Testing Identify a hypothesis,
More informationResampling-based Multiple Testing with Applications to Microarray Data Analysis
Resampling-based Multiple Testing with Applications to Microarray Data Analysis DISSERTATION Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School
More informationIntroduction The framework Bias and variance Approximate computation of leverage Empirical evaluation Discussion of sampling approach in big data
Discussion of sampling approach in big data Big data discussion group at MSCS of UIC Outline 1 Introduction 2 The framework 3 Bias and variance 4 Approximate computation of leverage 5 Empirical evaluation
More informationSupervised Dimension Reduction:
Supervised Dimension Reduction: A Tale of Two Manifolds S. Mukherjee, K. Mao, F. Liang, Q. Wu, M. Maggioni, D-X. Zhou Department of Statistical Science Institute for Genome Sciences & Policy Department
More informationFalse discovery control for multiple tests of association under general dependence
False discovery control for multiple tests of association under general dependence Nicolai Meinshausen Seminar für Statistik ETH Zürich December 2, 2004 Abstract We propose a confidence envelope for false
More informationSpring 2012 Math 541B Exam 1
Spring 2012 Math 541B Exam 1 1. A sample of size n is drawn without replacement from an urn containing N balls, m of which are red and N m are black; the balls are otherwise indistinguishable. Let X denote
More informationThe automatic construction of bootstrap confidence intervals
The automatic construction of bootstrap confidence intervals Bradley Efron Balasubramanian Narasimhan Abstract The standard intervals, e.g., ˆθ ± 1.96ˆσ for nominal 95% two-sided coverage, are familiar
More informationBootstrap (Part 3) Christof Seiler. Stanford University, Spring 2016, Stats 205
Bootstrap (Part 3) Christof Seiler Stanford University, Spring 2016, Stats 205 Overview So far we used three different bootstraps: Nonparametric bootstrap on the rows (e.g. regression, PCA with random
More informationStatistical Applications in Genetics and Molecular Biology
Statistical Applications in Genetics and Molecular Biology Volume 6, Issue 1 2007 Article 28 A Comparison of Methods to Control Type I Errors in Microarray Studies Jinsong Chen Mark J. van der Laan Martyn
More informationThe purpose of this section is to derive the asymptotic distribution of the Pearson chi-square statistic. k (n j np j ) 2. np j.
Chapter 9 Pearson s chi-square test 9. Null hypothesis asymptotics Let X, X 2, be independent from a multinomial(, p) distribution, where p is a k-vector with nonnegative entries that sum to one. That
More informationL2: Review of probability and statistics
Probability L2: Review of probability and statistics Definition of probability Axioms and properties Conditional probability Bayes theorem Random variables Definition of a random variable Cumulative distribution
More informationInference with Transposable Data: Modeling the Effects of Row and Column Correlations
Inference with Transposable Data: Modeling the Effects of Row and Column Correlations Genevera I. Allen Department of Pediatrics-Neurology, Baylor College of Medicine, Jan and Dan Duncan Neurological Research