False Discovery Control in Spatial Multiple Testing


1 False Discovery Control in Spatial Multiple Testing. W. Sun (1), B. Reich (2), T. Cai (3), M. Guindani (4), and A. Schwartzman (2). WNAR, June. (1) University of Southern California, (2) North Carolina State University, (3) University of Pennsylvania, (4) MD Anderson

2 Spatial modeling of time trends in tropospheric ozone. The EPA uses a monitoring network to regulate ozone. Our objective is to identify areas with changing ozone. Other examples of spatial multiple testing: climate change, disease monitoring, neuroimaging, etc. [Figure: two maps with axes longitude and latitude. (a) First-stage estimates, $\hat{\beta}(s)$; (b) first-stage z-scores, $z(s) = \hat{\beta}(s)/w(s)$.]

3 New issues in spatial multiple testing. One observes data only at a discrete subset of locations but needs to make inference everywhere in the spatial domain. A finite approximation strategy is needed for inference in a continuous spatial domain; otherwise an uncountable number of tests would have to be conducted, which is impossible in practice. It is desirable to aggregate information from nearby locations to make cluster-wise inference, and to incorporate important spatial variables in the decision-making process.

4 Gaussian random field model. Let $X = \{X(s) : s \in S\}$ be a random field on a spatial domain $S$: $X(s) = \mu(s) + \epsilon(s)$ (1), where $\mu(s)$ is the unobserved random process and $\epsilon(s)$ is the noise process. An important special case is the Gaussian random field model, where the signals and errors are Gaussian processes with means $\bar{\mu}$ and $0$, and covariance functions $\rho_1$ and $\rho_2$, respectively. Let $\Theta$ denote the collection of all parameters in model (1).

5 Notation. Hypotheses: $H_0(s): \mu(s) \in A$ versus $H_1(s): \mu(s) \in A^c$, where $A$ is the indifference region, e.g. $A = \{\mu : \mu \le \mu_0\}$. True states: $\theta(s) = 0$ if $H_0(s)$ holds and $\theta(s) = 1$ if $H_1(s)$ holds. Null area: $S_0 = \{s \in S : \theta(s) = 0\}$. Non-null area: $S_1 = \{s \in S : \theta(s) = 1\}$. Decisions: $\delta(s) = 1$ if we reject and $\delta(s) = 0$ otherwise. Rejection region: $R = \{s \in S : \delta(s) = 1\}$. Error regions: the false positive area $S_{FP} = \{s \in S : \theta(s) = 0, \delta(s) = 1\}$ and the false negative area $S_{FN} = \{s \in S : \theta(s) = 1, \delta(s) = 0\}$.

6 False discovery measures. The key quantity to control is the false discovery proportion, $\mathrm{FDP} = \frac{\nu(S_{FP})}{\nu(R)} I\{\nu(R) > s_0\}$, where $\nu(\cdot)$ measures the area of a region and $s_0$ is a small positive value. FDR: typically, the false discovery rate is controlled so that $\mathrm{FDR} = E(\mathrm{FDP}) < \alpha$. FDX: we might also want to be reasonably confident that the FDP is less than some value, say $\tau \in (0, 1)$. Therefore, we also consider the false discovery exceedance, $\mathrm{FDX}_\tau = P(\mathrm{FDP} > \tau) < \alpha$. MDR: the power of a multiple testing procedure is summarized by the missed discovery rate, $\mathrm{MDR} = E\{\nu(S_{FN})\}$.
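
As a rough illustration of these definitions on a pixel grid, the sketch below computes one realized FDP and the missed area $\nu(S_{FN})$ from arrays of true states, decisions, and pixel areas. The function name, the array layout, and the default `s0=0.0` are assumptions made for illustration; MDR would be the expectation of the second returned value over repeated data sets.

```python
import numpy as np

def fdp_and_missed_area(theta, delta, area, s0=0.0):
    """Realized FDP and missed area for pixel-wise truth and decisions.

    theta, delta: 0/1 arrays of true states and decisions per pixel.
    area: pixel areas nu(S_i). All names here are illustrative.
    """
    theta = np.asarray(theta, dtype=bool)
    delta = np.asarray(delta, dtype=bool)
    area = np.asarray(area, dtype=float)
    rejected_area = area[delta].sum()             # nu(R)
    false_pos_area = area[delta & ~theta].sum()   # nu(S_FP)
    false_neg_area = area[~delta & theta].sum()   # nu(S_FN)
    # The indicator I{nu(R) > s0} sets FDP to zero for negligible rejection areas.
    fdp = false_pos_area / rejected_area if rejected_area > s0 else 0.0
    return fdp, false_neg_area
```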

7 Compound decision theory for spatial multiple testing. Under mild conditions, the multiple testing problem is equivalent to a weighted classification problem with loss $L(\theta, \delta) = \lambda \nu(S_{FP}) + \nu(S_{FN})$. There is a one-to-one relationship between $\lambda$ and $\alpha$. The optimal rule is to reject if the posterior probability of the null is smaller than a threshold $t$, i.e., $\delta(s) = I[T_{OR}(s) < t]$, where $T_{OR}(s) = P_\Theta\{\theta(s) = 0 \mid X\}$. The threshold $t$ is taken to be as large as possible (to increase power) while still maintaining $\mathrm{FDR}(t) \le \alpha$. $\mathrm{FDR}(t)$ is unknown and must be approximated.
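
One way to see why the optimal rule thresholds $T_{OR}(s)$ is to minimize the posterior expected loss pointwise. The short derivation below is a sketch under the assumption that expectation and integration over $S$ can be interchanged; it suggests the threshold $t = 1/(1+\lambda)$ for the weighted classification loss.

```latex
\begin{aligned}
E\{L(\theta,\delta)\mid X\}
 &= \int_S \Big[\lambda\,\delta(s)\,T_{OR}(s)
      + \{1-\delta(s)\}\{1-T_{OR}(s)\}\Big]\,\nu(ds),\\
\delta(s)=1 \text{ is optimal}
 &\iff \lambda\,T_{OR}(s) < 1 - T_{OR}(s)
 \iff T_{OR}(s) < \frac{1}{1+\lambda}.
\end{aligned}
```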

8 Discrete approximation. This problem boils down to estimating $\mathrm{FDR}(t)$, which is difficult because it is an integral over potentially infinitely many tests (spatial locations). Let $\bigcup_{i=1}^m S_i$ be a fine partition of $S$. Take a point $s_i$ from each $S_i$. Compute the probability of the null, $T_{OR}(s_i)$. Let $\{T_{OR}^{(i)} : i = 1, \dots, m\}$ be the ordered oracle statistics and $S_{(i)}$ the region corresponding to $T_{OR}^{(i)}$.

9 FDR control. Procedure: define $R_j = \bigcup_{i=1}^j S_{(i)}$ and $r = \max\left\{ j : \nu(R_j)^{-1} \sum_{i=1}^j T_{OR}^{(i)} \nu(S_{(i)}) \le \alpha \right\}$. The rejection area is given by $R = \bigcup_{i=1}^r S_{(i)}$. In other words, the decision is assumed constant within pixels, and the FDR is approximated by the area-weighted average of the posterior probabilities of the null over the pixels in the rejection region.
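
The step-up rule above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name is hypothetical, `t_or` holds the (estimated) values $T_{OR}(s_i)$, `area` holds the pixel areas $\nu(S_i)$, and all areas are assumed strictly positive.

```python
import numpy as np

def fdr_rejection_set(t_or, area, alpha):
    """Boolean mask of rejected pixels under the area-weighted FDR rule."""
    t_or = np.asarray(t_or, dtype=float)
    area = np.asarray(area, dtype=float)
    order = np.argsort(t_or, kind="stable")            # most significant pixels first
    cum_area = np.cumsum(area[order])                  # nu(R_j)
    cum_null = np.cumsum(t_or[order] * area[order])    # sum_i T_OR^(i) nu(S_(i))
    ok = cum_null / cum_area <= alpha                  # estimated FDR of R_j at most alpha
    r = ok.nonzero()[0].max() + 1 if ok.any() else 0   # largest admissible j
    reject = np.zeros(t_or.size, dtype=bool)
    reject[order[:r]] = True
    return reject
```

Because the statistics are sorted in increasing order, the running weighted average is non-decreasing in $j$, so the admissible indices form an initial segment.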

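10 FDX control. Similar to FDR control, we control FDX by the following procedure: define $\mathrm{FDX}^m_{\tau,j} = P_\Theta\left( \frac{I\{\nu(R_j) > 0\}}{\nu(R_j)} \sum_{s_i \in R_j} \{1 - \theta(s_i)\} \nu(S_i) > \tau - \epsilon_0 \,\Big|\, X \right)$ and $r = \max\{ j : \mathrm{FDX}^m_{\tau,j} \le \alpha \}$. Then the rejection region is given by $R = \bigcup_{r}$-th union $R = \bigcup_{i=1}^r S_{(i)}$.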
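
A hedged sketch of how this posterior probability might be approximated: given posterior draws of the states $\theta(s_i)$, compute the pixel-level FDP of each nested region $R_j$ in every draw and take the fraction of draws exceeding $\tau - \epsilon_0$. The Monte Carlo approximation, the function name, and the default `eps0=0.0` are assumptions for illustration; pixel areas are assumed strictly positive.

```python
import numpy as np

def fdx_rejection_set(theta_draws, area, tau, alpha, eps0=0.0):
    """Boolean mask of rejected pixels under a Monte Carlo FDX rule.

    theta_draws: (B, m) array of 0/1 posterior draws of theta(s_i).
    area: length-m array of pixel areas nu(S_i).
    """
    theta_draws = np.asarray(theta_draws, dtype=float)
    area = np.asarray(area, dtype=float)
    t_or = 1.0 - theta_draws.mean(axis=0)              # estimated T_OR(s_i)
    order = np.argsort(t_or, kind="stable")
    null_area = (1.0 - theta_draws[:, order]) * area[order]
    # FDP of the nested region R_j within each posterior draw, shape (B, m).
    fdp = np.cumsum(null_area, axis=1) / np.cumsum(area[order])
    fdx = (fdp > tau - eps0).mean(axis=0)              # estimate of FDX_{tau,j}
    ok = np.nonzero(fdx <= alpha)[0]
    r = ok.max() + 1 if ok.size else 0                 # largest admissible j
    reject = np.zeros(area.size, dtype=bool)
    reject[order[:r]] = True
    return reject
```

Unlike the FDR criterion, the estimated exceedance probability need not be monotone in $j$; the sketch follows the stated rule and takes the largest admissible $j$.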

11 The decision process on a continuous spatial domain can be described, within a small margin of error, by a finite number of decisions on a grid of pixels. Theorem: under conditions on the random field and the partition, (a) the FDR level of the FDR procedure satisfies $\mathrm{FDR} \le \alpha + o(1)$ when $m \to \infty$; (b) the FDX level of the FDX procedure at tolerance level $\tau$ satisfies $\mathrm{FDX}_\tau \le \alpha + o(1)$ when $m \to \infty$.

12 Computational algorithms. The numerical methods for model fitting and parameter estimation in spatial models have been extensively studied in a Bayesian computational framework. Suppose the MCMC samples are $\{\mu_b : b = 1, \dots, B\}$, where $\mu_b = [\mu_b(s_1), \dots, \mu_b(s_m)]$ is the $b$-th sample. Let $\theta_b(s_i) = I[\mu_b(s_i) \in A^c]$. Then $T_{OR}(s_i)$ can be estimated by $\hat{T}_{OR}(s_i) = \frac{1}{B} \sum_{b=1}^{B} [1 - \theta_b(s_i)]$.
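
The estimator above amounts to a column-wise average over the posterior draws. The sketch below assumes a one-sided indifference region $A = \{\mu : \mu \le \mu_0\}$, so that $\theta_b(s_i) = I[\mu_b(s_i) > \mu_0]$; the function name and the `(B, m)` array layout are illustrative choices, not part of the original slides.

```python
import numpy as np

def estimate_t_or(mu_draws, mu0=0.0):
    """Estimate T_OR(s_i) from MCMC draws of mu.

    mu_draws: (B, m) array, row b holds [mu_b(s_1), ..., mu_b(s_m)].
    Assumes A = {mu <= mu0}, so a draw is non-null when mu_b(s_i) > mu0.
    """
    theta_draws = np.asarray(mu_draws, dtype=float) > mu0   # theta_b(s_i)
    return 1.0 - theta_draws.mean(axis=0)                   # (1/B) sum_b [1 - theta_b(s_i)]
```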

13 Simulation setting. Generate data from the model $x(s) = \mu(s) + \epsilon(s)$. Both the signals and errors are generated as Gaussian processes. The signal process $\mu$ has mean $\bar{\mu}$ and exponential covariance $\mathrm{Cov}[\mu(s), \mu(s')] = \sigma_\mu^2 \exp[-\|s - s'\|/\rho_\mu]$. The error process $\epsilon$ has mean zero and covariance $\mathrm{Cov}[\epsilon(s), \epsilon(s')] = (1 - r) I(s = s') + r \exp[-\|s - s'\|/\rho_\epsilon]$. Choose $n = 1000$, $r = 0.9$, $\bar{\mu} = -1$, and $\sigma_\mu = 2$. The expected proportion of positive signals is 0.31, since $P\{N(-1, 2^2) > 0\} = 1 - \Phi(0.5) \approx 0.31$.
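
A minimal sketch of this data-generating process follows. Several details are assumptions made only to get runnable code: locations are drawn uniformly on the unit square, the ranges default to `rho_mu = rho_eps = 0.1`, the signal mean is taken as $-1$ (the value consistent with the stated proportion 0.31), and a small jitter is added before the Cholesky factorization for numerical stability.

```python
import math
import numpy as np

def simulate_field(n=1000, mu_bar=-1.0, sigma_mu=2.0, r=0.9,
                   rho_mu=0.1, rho_eps=0.1, seed=0):
    """Draw locations s, signal mu(s), and observations x(s) = mu(s) + eps(s)."""
    rng = np.random.default_rng(seed)
    s = rng.uniform(size=(n, 2))                                  # assumed spatial domain
    d = np.linalg.norm(s[:, None, :] - s[None, :, :], axis=-1)   # pairwise distances
    cov_mu = sigma_mu**2 * np.exp(-d / rho_mu)                    # exponential covariance
    cov_eps = (1.0 - r) * np.eye(n) + r * np.exp(-d / rho_eps)    # nugget plus spatial part
    jitter = 1e-8 * np.eye(n)                                     # numerical stabilizer
    mu = mu_bar + np.linalg.cholesky(cov_mu + jitter) @ rng.standard_normal(n)
    eps = np.linalg.cholesky(cov_eps) @ rng.standard_normal(n)
    return s, mu, mu + eps

def expected_positive_fraction(mu_bar=-1.0, sigma_mu=2.0):
    """P{N(mu_bar, sigma_mu^2) > 0}, the marginal chance that mu(s) > 0."""
    return 0.5 * math.erfc(-mu_bar / (sigma_mu * math.sqrt(2.0)))
```

With the defaults, `expected_positive_fraction()` evaluates $1 - \Phi(0.5)$, which is the marginal non-null proportion when $A = \{\mu : \mu \le 0\}$.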

14 Models. We compare seven methods: the non-spatial approach of Benjamini and Hochberg (BH), FDR; the non-spatial approach of Genovese and Wasserman (GW); the spatial approach of Pacifico et al. (PGVW), FDR and FDX; Oracle (our approach with hyperparameters fixed at true values), FDR and FDX; MC (our approach with hyperparameters estimated using MCMC), FDR and FDX. Uninformative priors: $\bar{\mu} \sim N(0, )$, $\sigma_\mu^2 \sim \mathrm{Gamma}(0.1, 0.1)$, and $r, \rho_\mu, \rho_\epsilon \sim \mathrm{Uniform}(0, 1)$.

15 False discovery rate (target is $\alpha = 0.1$) by $\rho_\mu$. [Figure: mean FDP versus spatial range for BH, GW, Oracle FDR, Oracle FDX, MC FDR, MC FDX, PGVW FDR, and PGVW FDX.]

16 Distribution of FDP by $\rho_\mu$. [Figure: distribution of FDP versus spatial range for Oracle FDR, Oracle FDX, MC FDR, and MC FDX.]

17 Missed discovery rate by $\rho_\mu$. [Figure: MDR versus spatial range.]

18 Summary of simulation. The oracle FDR procedure controls the FDR nearly perfectly. The MC FDR procedure with uninformative priors has good FDR control. The FDX methods are more conservative than the FDR methods. The BH, GW, and PGVW methods are very conservative. The MDR levels of the oracle and MC methods are much lower.

19 Ozone data analysis. [Figure: (a) posterior mean of $\mu(s)$; (b) posterior probability that $\mu(s) < 0.1$.]

20 Rejection region (black) using the FDX rule

21 Summary. The conventional view (e.g., Benjamini and Yekutieli (2001) and Sarkar (2002)) is that it is safe to apply standard methods as if the tests were independent. While standard methods control the FDR, incorporating the underlying dependency structure can dramatically improve the power. A continuous decision process can be described, within a small margin of error, by a finite number of decisions on a grid of pixels. FDR and FDX control problems can be solved in a unified theoretical and computational framework. We have also extended this to deal with spatial clusters.


More information

Large-Scale Multiple Testing of Correlations

Large-Scale Multiple Testing of Correlations University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 5-5-2016 Large-Scale Multiple Testing of Correlations T. Tony Cai University of Pennsylvania Weidong Liu Follow this

More information

Detection theory. H 0 : x[n] = w[n]

Detection theory. H 0 : x[n] = w[n] Detection Theory Detection theory A the last topic of the course, we will briefly consider detection theory. The methods are based on estimation theory and attempt to answer questions such as Is a signal

More information

Estimation of a Two-component Mixture Model

Estimation of a Two-component Mixture Model Estimation of a Two-component Mixture Model Bodhisattva Sen 1,2 University of Cambridge, Cambridge, UK Columbia University, New York, USA Indian Statistical Institute, Kolkata, India 6 August, 2012 1 Joint

More information

Hypothesis Testing. Econ 690. Purdue University. Justin L. Tobias (Purdue) Testing 1 / 33

Hypothesis Testing. Econ 690. Purdue University. Justin L. Tobias (Purdue) Testing 1 / 33 Hypothesis Testing Econ 690 Purdue University Justin L. Tobias (Purdue) Testing 1 / 33 Outline 1 Basic Testing Framework 2 Testing with HPD intervals 3 Example 4 Savage Dickey Density Ratio 5 Bartlett

More information

Advanced Statistical Methods: Beyond Linear Regression

Advanced Statistical Methods: Beyond Linear Regression Advanced Statistical Methods: Beyond Linear Regression John R. Stevens Utah State University Notes 3. Statistical Methods II Mathematics Educators Worshop 28 March 2009 1 http://www.stat.usu.edu/~jrstevens/pcmi

More information

STAT 518 Intro Student Presentation

STAT 518 Intro Student Presentation STAT 518 Intro Student Presentation Wen Wei Loh April 11, 2013 Title of paper Radford M. Neal [1999] Bayesian Statistics, 6: 475-501, 1999 What the paper is about Regression and Classification Flexible

More information

Announcements. Proposals graded

Announcements. Proposals graded Announcements Proposals graded Kevin Jamieson 2018 1 Hypothesis testing Machine Learning CSE546 Kevin Jamieson University of Washington October 30, 2018 2018 Kevin Jamieson 2 Anomaly detection You are

More information

FDR and ROC: Similarities, Assumptions, and Decisions

FDR and ROC: Similarities, Assumptions, and Decisions EDITORIALS 8 FDR and ROC: Similarities, Assumptions, and Decisions. Why FDR and ROC? It is a privilege to have been asked to introduce this collection of papers appearing in Statistica Sinica. The papers

More information

Bayesian Regression (1/31/13)

Bayesian Regression (1/31/13) STA613/CBB540: Statistical methods in computational biology Bayesian Regression (1/31/13) Lecturer: Barbara Engelhardt Scribe: Amanda Lea 1 Bayesian Paradigm Bayesian methods ask: given that I have observed

More information

An Introduction to Bayesian Linear Regression

An Introduction to Bayesian Linear Regression An Introduction to Bayesian Linear Regression APPM 5720: Bayesian Computation Fall 2018 A SIMPLE LINEAR MODEL Suppose that we observe explanatory variables x 1, x 2,..., x n and dependent variables y 1,

More information

arxiv: v1 [math.st] 31 Mar 2009

arxiv: v1 [math.st] 31 Mar 2009 The Annals of Statistics 2009, Vol. 37, No. 2, 619 629 DOI: 10.1214/07-AOS586 c Institute of Mathematical Statistics, 2009 arxiv:0903.5373v1 [math.st] 31 Mar 2009 AN ADAPTIVE STEP-DOWN PROCEDURE WITH PROVEN

More information

Lecture 7 Introduction to Statistical Decision Theory

Lecture 7 Introduction to Statistical Decision Theory Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7

More information

The Pennsylvania State University The Graduate School A BAYESIAN APPROACH TO FALSE DISCOVERY RATE FOR LARGE SCALE SIMULTANEOUS INFERENCE

The Pennsylvania State University The Graduate School A BAYESIAN APPROACH TO FALSE DISCOVERY RATE FOR LARGE SCALE SIMULTANEOUS INFERENCE The Pennsylvania State University The Graduate School A BAYESIAN APPROACH TO FALSE DISCOVERY RATE FOR LARGE SCALE SIMULTANEOUS INFERENCE A Thesis in Statistics by Bing Han c 2007 Bing Han Submitted in

More information

Research Article Sample Size Calculation for Controlling False Discovery Proportion

Research Article Sample Size Calculation for Controlling False Discovery Proportion Probability and Statistics Volume 2012, Article ID 817948, 13 pages doi:10.1155/2012/817948 Research Article Sample Size Calculation for Controlling False Discovery Proportion Shulian Shang, 1 Qianhe Zhou,

More information

Incorporation of Sparsity Information in Large-scale Multiple Two-sample t Tests

Incorporation of Sparsity Information in Large-scale Multiple Two-sample t Tests Incorporation of Sparsity Information in Large-scale Multiple Two-sample t Tests Weidong Liu October 19, 2014 Abstract Large-scale multiple two-sample Student s t testing problems often arise from the

More information

GAUSSIAN PROCESS REGRESSION

GAUSSIAN PROCESS REGRESSION GAUSSIAN PROCESS REGRESSION CSE 515T Spring 2015 1. BACKGROUND The kernel trick again... The Kernel Trick Consider again the linear regression model: y(x) = φ(x) w + ε, with prior p(w) = N (w; 0, Σ). The

More information

Methods for High Dimensional Inferences With Applications in Genomics

Methods for High Dimensional Inferences With Applications in Genomics University of Pennsylvania ScholarlyCommons Publicly Accessible Penn Dissertations Summer 8-12-2011 Methods for High Dimensional Inferences With Applications in Genomics Jichun Xie University of Pennsylvania,

More information

Lecture 28. Ingo Ruczinski. December 3, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University

Lecture 28. Ingo Ruczinski. December 3, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University Lecture 28 Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University December 3, 2015 1 2 3 4 5 1 Familywise error rates 2 procedure 3 Performance of with multiple

More information

Tests about a population mean

Tests about a population mean October 2 nd, 2017 Overview Week 1 Week 2 Week 4 Week 7 Week 10 Week 12 Chapter 1: Descriptive statistics Chapter 6: Statistics and Sampling Distributions Chapter 7: Point Estimation Chapter 8: Confidence

More information

A NEW APPROACH FOR LARGE SCALE MULTIPLE TESTING WITH APPLICATION TO FDR CONTROL FOR GRAPHICALLY STRUCTURED HYPOTHESES

A NEW APPROACH FOR LARGE SCALE MULTIPLE TESTING WITH APPLICATION TO FDR CONTROL FOR GRAPHICALLY STRUCTURED HYPOTHESES A NEW APPROACH FOR LARGE SCALE MULTIPLE TESTING WITH APPLICATION TO FDR CONTROL FOR GRAPHICALLY STRUCTURED HYPOTHESES By Wenge Guo Gavin Lynch Joseph P. Romano Technical Report No. 2018-06 September 2018

More information

This paper has been submitted for consideration for publication in Biometrics

This paper has been submitted for consideration for publication in Biometrics BIOMETRICS, 1 10 Supplementary material for Control with Pseudo-Gatekeeping Based on a Possibly Data Driven er of the Hypotheses A. Farcomeni Department of Public Health and Infectious Diseases Sapienza

More information

Bumpbars: Inference for region detection. Yuval Benjamini, Hebrew University

Bumpbars: Inference for region detection. Yuval Benjamini, Hebrew University Bumpbars: Inference for region detection Yuval Benjamini, Hebrew University yuvalbenj@gmail.com WHOA-PSI-2017 Collaborators Jonathan Taylor Stanford Rafael Irizarry Dana Farber, Harvard Amit Meir U of

More information

Statistical Methods for Particle Physics Lecture 1: parameter estimation, statistical tests

Statistical Methods for Particle Physics Lecture 1: parameter estimation, statistical tests Statistical Methods for Particle Physics Lecture 1: parameter estimation, statistical tests http://benasque.org/2018tae/cgi-bin/talks/allprint.pl TAE 2018 Benasque, Spain 3-15 Sept 2018 Glen Cowan Physics

More information

The miss rate for the analysis of gene expression data

The miss rate for the analysis of gene expression data Biostatistics (2005), 6, 1,pp. 111 117 doi: 10.1093/biostatistics/kxh021 The miss rate for the analysis of gene expression data JONATHAN TAYLOR Department of Statistics, Stanford University, Stanford,

More information

Evaluation. Andrea Passerini Machine Learning. Evaluation

Evaluation. Andrea Passerini Machine Learning. Evaluation Andrea Passerini passerini@disi.unitn.it Machine Learning Basic concepts requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain

More information