Multiple Dependent Hypothesis Tests in Geographically Weighted Regression

Size: px
Start display at page:

Download "Multiple Dependent Hypothesis Tests in Geographically Weighted Regression"

Transcription

1 Multiple Dependent Hypothesis Tests in Geographically Weighted Regression Graeme Byrne 1, Martin Charlton 2, and Stewart Fotheringham 3 1 La Trobe University, Bendigo, Victoria Austrlaia Telephone: Fax: g.byrne@latrobe.edu.au 2 National Centre for Geocomputation,National University of Ireland, Maynooth, Co. Kildare, Ireland Telephone: +353 (0) Fax: +353 (0) martin.charlton@nuim.ie 3 National Centre for Geocomputation,National University of Ireland, Maynooth, Co. Kildare, Ireland Telephone: +353 (0) Fax: +353 (0) stewart.fotheringham@nuim.ie 1. Introduction Geographically weighted regression (Fotheringham et al., 2002) is a method of modelling spatial variability in regression coefficients. The procedure yields a separate model for each spatial location in the study area with all models generated from the same data set using a differential weighting scheme. The weighting scheme, which allows for spatial variation in the model parameters, involves a bandwidth parameter which is usually determined from the data using a cross-validation procedure. Part of the main output is a set of location-specific parameter estimates and associated t statistics which can be used to test hypotheses about individual model parameters. If there are n spatial locations and p parameters in each model, there will be up to np hypotheses to be tested which in most applications defines a very high order multiple inference problem. Solutions to problems of this type usually involve an adjustment to the decision rule for individual tests designed to contain the overall risk of mistaking chance variation for a genuine effect. An undesirable by-product of achieving this control is a reduction in statistical power for individual tests, which may result in genuine effects going undetected. These two competing aspects of multiple inference have become known as the multiplicity problem. In this paper we develop a simple Bonferroni style adjustment for testing multiple hypotheses about GWR model coefficients. The adjustment takes advantage of the intrinsic dependency between local GWR models to contain the overall risk mentioned above, without the large sacrifice in power associated with the traditional Bonferroni correction. We illustrate this adjustment and a range of other corrective procedures on two data sets. The first models the determinants of educational attainment in the counties of Georgia USA. Using area based census data we examine the links between levels of educational attainment and four potential predictors: the proportion of elderly, the proportion who are foreign born, the proportion living below the poverty line and the proportion of ethnic

2 blacks. The second model is a geographically weighted hedonic house price model based on individual mortgage records in Greater London in In both models we show how the various corrections can be used to guide the interpretation of the spatial variations in the parameter estimates. Finally we compare the statistical power of the proposed method with Bonferroni/Sidak corrections and those based on false discovery rate control. 2. Controlling the family wise error rate The traditional approach to the multiplicity problem is to control the probability of a type I error over a set (family) of m related hypothesis tests. The probability of rejecting one or more true null hypotheses is called the family wise error rate which we denote ξ m and controlling this quantity is a standard goal of most traditional multiple inference procedures. If, in a set of m hypothesis tests, E i is the event a type I errors occurs in the ith test, the family wise error rate is given by ξ m = P ( m ) E i m P(E i ), (1) where the last term follows from Boole s inequality. If P(E i ) has the same value (say α) for all tests, the Bonforroni (1935) correction follows from equation 1 since then ξ m mα and so setting α = ξ m /m controls the family-wise error rate to be ξ m or less. This result does not require the tests to be independent however if they are independent, we can write ( ) ( ) m m m m ξ m = P E i = 1 P E i = 1 P(E i ) = 1 (1 P(E i )). Thus, for m independent tests with P(E i ) = α, the family wise error rate is ξ m = 1 (1 α) m and the Šidák (1967) correction follows by choosing α = 1 (1 ξ m ) 1/m which controls the family wise error rate at exactly ξ m. Both of the above approaches become very conservative when the number of hypotheses to be tested is large and when the tests are not independent (Abdi, 2007). For dependent tests Moyè (2003, p ) uses conditional probability and induction arguments to derive the following generalisation of Šidák s result, ξ m = 1 (1 α) ( 1 α ( 1 D 2)) m 1, (2) where the family wise error rate now depends on D [0,1] which is called the degree of dependency (assumed constant) across all tests. For a full discussion of the dependency parameter see Moyè (2003, Ch. 5). If D = 1 we need only test any one of the hypotheses in the set using the desired family wise error rate, since then the decision for this hypothesis completely determines the outcome of all remaining hypotheses in the set. If D = 0 the hypotheses are independent and equation 2 reduces to the Šidák result. Of course it still remains to choose a value for D and unfortunately there are few general guidelines. Moyè (2003, pp ) discusses the issue in the context of multiple endpoints in clinical trials and later in this paper we propose a method in the GWR context. It appears that the choice of D will always depend on the context of the study and in some cases the analyst will have no clear direction and more powerful alternatives such as the

3 0.05 Per testαvalue to achieveξ Dependency parameter D Figure 1: Plot of the per test α value against the dependency parameter D to achieve a family-wise error rate of ξ 100 = Holm Bonferonni procedure (Holm, 1979) can be employed. If one is willing to redefine the problem in terms of the false discovery rate (the expected fraction of incorrectly rejected hypotheses), the methods of Benjamini and Hochberg (1995) or, for dependent tests, Benjamini and Yekutieli (2001) may be appropriate. Applying the binomial theorem to equation 2 it is straight forward to show that ξ m α(1 + (m 1)(1 D 2 )) a result that can also be established using Boole s inequality as demonstrated by Moyè (2003). Therefore, assuming we have a value for D, the family wise error rate can be controlled at ξ m or less by choosing α = ξ m 1 + (m 1)(1 D 2 ) (3) which is a Bonferroni style correction for dependent tests. Moyè (2003, p. 188) warns against overestimation of D since then the family-wise error rate will not be preserved. The importance of this advice is illustrated by the steepness of the curve in fig. 1 for D > In this region even a small overestimate of D can lead to large overestimate of the per test α value resulting in the family-wise error rate being considerably larger than required. With this in mind it would be prudent to adopt a value of D at the lower end of its expected range. 3. Dependency in Geographically Weighted Regression Since the calibration procedure in geographically weighted regression uses the same data set to calibrate models at each spatial location, the method necessarily results in a certain

4 degree of dependency between the models, which must flow over to the t statistics and associated hypothesis tests. A measure of this dependency can be calculated from the GWR hat matrix S defined in Brunsdon et al. (1999). The effective number of parameters is defined by p e = 2tr(S) tr(s S) where tr( ) is the matrix trace operator. Now suppose that there are p parameters in each model to be calibrated and there are n spatial locations. A sensible definition for the dependency between models is D = 1 p e np, (4) which is justified as follows. First consider the (admittedly impossible) case of complete independence among the models for which p e = np giving D = 0. In the case of complete dependence among the models (i.e. the global model is appropriate) we would have p e = p giving D = 1 1/n which will be close to unity in most GWR applications since n is usually large. Therefore equation 4 satisfies the requirement 0 D 1 and on substituting into equation 3, the family wise error rate for testing hypotheses about GWR model coefficients is controlled at ξ m or less by selecting α = ξ m 1 + p e p. (5) e np In most applications p e will be much less than the maximum number of parameters (np) and so a significant gain in statistical power is expected when using this approach. 4. Summary The main result in this paper equation 5, provides a very simple method of preserving the family wise error rate when testing multiple hypotheses about GWR model coefficients. It avoids the large sacrifice of statistical power associated with the ordinary Bonferroni correction by incorporating the intrinsic dependency between the local models into the procedure. We note that the dependency parameter is a function of the effective number of parameters which is calculated from the trace of the hat matrix which itself is a function of the estimated bandwidth. Since the bandwidth is calculated from the data it is really a random variable with an unknown distribution. Therefore D and consequently α are also random variables with unknown distributions. If this distribution can be estimated (via bootstrap or Monte Carlo methods for example), confidence bounds could be developed for D and α which would enhance the procedure and provide some protection against an inflated family-wise error resulting from overestimating D. We leave this for future work. 5. Acknowledgements The first author thanks the National Centre for Geocomputation for providing resources during his time there working on this research and La Trobe University for funding study leave and travel costs. References Abdi, H. (2007). The Bonferonni and Šidàk corrections for multiple comparisons. In: N. Salkind, ed., Encyclopedia of Measurement and Statistics, Sage.

5 Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B (Methodological), 57(1): Benjamini, Y. and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. The Annals of Statistics, 29(4): Bonforroni, C.E. (1935). Il calcolo delle assicurazioni su gruppi di teste. In: Studi in Onore del Professore Salvatore Ortu Carboni, Rome, Italy. pp Brunsdon, C., Fotheringham, A.S. and Charlton, M. (1999). Some notes on parametric soignificance tests for geographically weighted regression. Journal of Regional Science, 39(3): Fotheringham, A.S., Brunsdon, C. and Charlton, M. (2002). Geographically Weighted Regression: The Analysis of Spatially Varying Relationships. John Wiley & Sons, Chichester. Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6: Moyè, L.A. (2003). Multiple Analyses in Clinical Trials: Fundamentals for Investigators. Springer. Šidák, Z. (1967). Rectangular confidence regions for the means of multivariate normal distributions. Journal of the American Statistical Association, 62(318):

Outline. Inference in regression. Critical values GEOGRAPHICALLY WEIGHTED REGRESSION

Outline. Inference in regression. Critical values GEOGRAPHICALLY WEIGHTED REGRESSION GEOGRAPHICALLY WEIGHTED REGRESSION Outline Multiple Hypothesis Testing : Setting critical values for local pseudo-t statistics stewart.fotheringham@nuim.ie http://ncg.nuim.ie ncg.nuim.ie/gwr/ Significance

More information

High-Throughput Sequencing Course. Introduction. Introduction. Multiple Testing. Biostatistics and Bioinformatics. Summer 2018

High-Throughput Sequencing Course. Introduction. Introduction. Multiple Testing. Biostatistics and Bioinformatics. Summer 2018 High-Throughput Sequencing Course Multiple Testing Biostatistics and Bioinformatics Summer 2018 Introduction You have previously considered the significance of a single gene Introduction You have previously

More information

Familywise Error Rate Controlling Procedures for Discrete Data

Familywise Error Rate Controlling Procedures for Discrete Data Familywise Error Rate Controlling Procedures for Discrete Data arxiv:1711.08147v1 [stat.me] 22 Nov 2017 Yalin Zhu Center for Mathematical Sciences, Merck & Co., Inc., West Point, PA, U.S.A. Wenge Guo Department

More information

This paper has been submitted for consideration for publication in Biometrics

This paper has been submitted for consideration for publication in Biometrics BIOMETRICS, 1 10 Supplementary material for Control with Pseudo-Gatekeeping Based on a Possibly Data Driven er of the Hypotheses A. Farcomeni Department of Public Health and Infectious Diseases Sapienza

More information

Exploratory Spatial Data Analysis (ESDA)

Exploratory Spatial Data Analysis (ESDA) Exploratory Spatial Data Analysis (ESDA) VANGHR s method of ESDA follows a typical geospatial framework of selecting variables, exploring spatial patterns, and regression analysis. The primary software

More information

CHL 5225H Advanced Statistical Methods for Clinical Trials: Multiplicity

CHL 5225H Advanced Statistical Methods for Clinical Trials: Multiplicity CHL 5225H Advanced Statistical Methods for Clinical Trials: Multiplicity Prof. Kevin E. Thorpe Dept. of Public Health Sciences University of Toronto Objectives 1. Be able to distinguish among the various

More information

Modified Simes Critical Values Under Positive Dependence

Modified Simes Critical Values Under Positive Dependence Modified Simes Critical Values Under Positive Dependence Gengqian Cai, Sanat K. Sarkar Clinical Pharmacology Statistics & Programming, BDS, GlaxoSmithKline Statistics Department, Temple University, Philadelphia

More information

Political Science 236 Hypothesis Testing: Review and Bootstrapping

Political Science 236 Hypothesis Testing: Review and Bootstrapping Political Science 236 Hypothesis Testing: Review and Bootstrapping Rocío Titiunik Fall 2007 1 Hypothesis Testing Definition 1.1 Hypothesis. A hypothesis is a statement about a population parameter The

More information

Specific Differences. Lukas Meier, Seminar für Statistik

Specific Differences. Lukas Meier, Seminar für Statistik Specific Differences Lukas Meier, Seminar für Statistik Problem with Global F-test Problem: Global F-test (aka omnibus F-test) is very unspecific. Typically: Want a more precise answer (or have a more

More information

False Discovery Rate

False Discovery Rate False Discovery Rate Peng Zhao Department of Statistics Florida State University December 3, 2018 Peng Zhao False Discovery Rate 1/30 Outline 1 Multiple Comparison and FWER 2 False Discovery Rate 3 FDR

More information

Prospect. February 8, Geographically Weighted Analysis - Review and. Prospect. Chris Brunsdon. The Basics GWPCA. Conclusion

Prospect. February 8, Geographically Weighted Analysis - Review and. Prospect. Chris Brunsdon. The Basics GWPCA. Conclusion bruary 8, 0 Regression (GWR) In a nutshell: A local statistical technique to analyse spatial variations in relationships Global averages of spatial data are not always helpful: climate data health data

More information

A Space-Time Model for Computer Assisted Mass Appraisal

A Space-Time Model for Computer Assisted Mass Appraisal RICHARD A. BORST, PHD Senior Research Scientist Tyler Technologies, Inc. USA Rich.Borst@tylertech.com A Space-Time Model for Computer Assisted Mass Appraisal A Computer Assisted Mass Appraisal System (CAMA)

More information

Geographically Weighted Regression as a Statistical Model

Geographically Weighted Regression as a Statistical Model Geographically Weighted Regression as a Statistical Model Chris Brunsdon Stewart Fotheringham Martin Charlton October 6, 2000 Spatial Analysis Research Group Department of Geography University of Newcastle-upon-Tyne

More information

High-throughput Testing

High-throughput Testing High-throughput Testing Noah Simon and Richard Simon July 2016 1 / 29 Testing vs Prediction On each of n patients measure y i - single binary outcome (eg. progression after a year, PCR) x i - p-vector

More information

Applying the Benjamini Hochberg procedure to a set of generalized p-values

Applying the Benjamini Hochberg procedure to a set of generalized p-values U.U.D.M. Report 20:22 Applying the Benjamini Hochberg procedure to a set of generalized p-values Fredrik Jonsson Department of Mathematics Uppsala University Applying the Benjamini Hochberg procedure

More information

False discovery rate and related concepts in multiple comparisons problems, with applications to microarray data

False discovery rate and related concepts in multiple comparisons problems, with applications to microarray data False discovery rate and related concepts in multiple comparisons problems, with applications to microarray data Ståle Nygård Trial Lecture Dec 19, 2008 1 / 35 Lecture outline Motivation for not using

More information

1Department of Demography and Organization Studies, University of Texas at San Antonio, One UTSA Circle, San Antonio, TX

1Department of Demography and Organization Studies, University of Texas at San Antonio, One UTSA Circle, San Antonio, TX Well, it depends on where you're born: A practical application of geographically weighted regression to the study of infant mortality in the U.S. P. Johnelle Sparks and Corey S. Sparks 1 Introduction Infant

More information

Geographically Weighted Regression and Kriging: Alternative Approaches to Interpolation A Stewart Fotheringham

Geographically Weighted Regression and Kriging: Alternative Approaches to Interpolation A Stewart Fotheringham Geographically Weighted Regression and Kriging: Alternative Approaches to Interpolation A Stewart Fotheringham National Centre for Geocomputation National University of Ireland, Maynooth http://ncg.nuim.ie

More information

y ˆ i = ˆ " T u i ( i th fitted value or i th fit)

y ˆ i = ˆ  T u i ( i th fitted value or i th fit) 1 2 INFERENCE FOR MULTIPLE LINEAR REGRESSION Recall Terminology: p predictors x 1, x 2,, x p Some might be indicator variables for categorical variables) k-1 non-constant terms u 1, u 2,, u k-1 Each u

More information

Statistical testing. Samantha Kleinberg. October 20, 2009

Statistical testing. Samantha Kleinberg. October 20, 2009 October 20, 2009 Intro to significance testing Significance testing and bioinformatics Gene expression: Frequently have microarray data for some group of subjects with/without the disease. Want to find

More information

Control of Directional Errors in Fixed Sequence Multiple Testing

Control of Directional Errors in Fixed Sequence Multiple Testing Control of Directional Errors in Fixed Sequence Multiple Testing Anjana Grandhi Department of Mathematical Sciences New Jersey Institute of Technology Newark, NJ 07102-1982 Wenge Guo Department of Mathematical

More information

The GWmodel R package: Further Topics for Exploring Spatial Heterogeneity using Geographically Weighted Models

The GWmodel R package: Further Topics for Exploring Spatial Heterogeneity using Geographically Weighted Models The GWmodel R package: Further Topics for Exploring Spatial Heterogeneity using Geographically Weighted Models Binbin Lu a*, Paul Harris a, Martin Charlton a, Chris Brunsdon b a. National Centre for Geocomputation,

More information

FDR-CONTROLLING STEPWISE PROCEDURES AND THEIR FALSE NEGATIVES RATES

FDR-CONTROLLING STEPWISE PROCEDURES AND THEIR FALSE NEGATIVES RATES FDR-CONTROLLING STEPWISE PROCEDURES AND THEIR FALSE NEGATIVES RATES Sanat K. Sarkar a a Department of Statistics, Temple University, Speakman Hall (006-00), Philadelphia, PA 19122, USA Abstract The concept

More information

Statistical Applications in Genetics and Molecular Biology

Statistical Applications in Genetics and Molecular Biology Statistical Applications in Genetics and Molecular Biology Volume 5, Issue 1 2006 Article 28 A Two-Step Multiple Comparison Procedure for a Large Number of Tests and Multiple Treatments Hongmei Jiang Rebecca

More information

Week 5 Video 1 Relationship Mining Correlation Mining

Week 5 Video 1 Relationship Mining Correlation Mining Week 5 Video 1 Relationship Mining Correlation Mining Relationship Mining Discover relationships between variables in a data set with many variables Many types of relationship mining Correlation Mining

More information

Basic Statistics. 1. Gross error analyst makes a gross mistake (misread balance or entered wrong value into calculation).

Basic Statistics. 1. Gross error analyst makes a gross mistake (misread balance or entered wrong value into calculation). Basic Statistics There are three types of error: 1. Gross error analyst makes a gross mistake (misread balance or entered wrong value into calculation). 2. Systematic error - always too high or too low

More information

MULTISTAGE AND MIXTURE PARALLEL GATEKEEPING PROCEDURES IN CLINICAL TRIALS

MULTISTAGE AND MIXTURE PARALLEL GATEKEEPING PROCEDURES IN CLINICAL TRIALS Journal of Biopharmaceutical Statistics, 21: 726 747, 2011 Copyright Taylor & Francis Group, LLC ISSN: 1054-3406 print/1520-5711 online DOI: 10.1080/10543406.2011.551333 MULTISTAGE AND MIXTURE PARALLEL

More information

Summary and discussion of: Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing

Summary and discussion of: Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing Summary and discussion of: Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing Statistics Journal Club, 36-825 Beau Dabbs and Philipp Burckhardt 9-19-2014 1 Paper

More information

Introductory Econometrics. Review of statistics (Part II: Inference)

Introductory Econometrics. Review of statistics (Part II: Inference) Introductory Econometrics Review of statistics (Part II: Inference) Jun Ma School of Economics Renmin University of China October 1, 2018 1/16 Null and alternative hypotheses Usually, we have two competing

More information

Step-down FDR Procedures for Large Numbers of Hypotheses

Step-down FDR Procedures for Large Numbers of Hypotheses Step-down FDR Procedures for Large Numbers of Hypotheses Paul N. Somerville University of Central Florida Abstract. Somerville (2004b) developed FDR step-down procedures which were particularly appropriate

More information

Alpha-Investing. Sequential Control of Expected False Discoveries

Alpha-Investing. Sequential Control of Expected False Discoveries Alpha-Investing Sequential Control of Expected False Discoveries Dean Foster Bob Stine Department of Statistics Wharton School of the University of Pennsylvania www-stat.wharton.upenn.edu/ stine Joint

More information

Post-Selection Inference

Post-Selection Inference Classical Inference start end start Post-Selection Inference selected end model data inference data selection model data inference Post-Selection Inference Todd Kuffner Washington University in St. Louis

More information

A Simple, Graphical Procedure for Comparing Multiple Treatment Effects

A Simple, Graphical Procedure for Comparing Multiple Treatment Effects A Simple, Graphical Procedure for Comparing Multiple Treatment Effects Brennan S. Thompson and Matthew D. Webb May 15, 2015 > Abstract In this paper, we utilize a new graphical

More information

Spatial Trends of unpaid caregiving in Ireland

Spatial Trends of unpaid caregiving in Ireland Spatial Trends of unpaid caregiving in Ireland Stamatis Kalogirou 1,*, Ronan Foley 2 1. NCG Affiliate, Thoukididi 20, Drama, 66100, Greece; Tel: +30 6977 476776; Email: skalogirou@gmail.com; Web: http://www.gisc.gr.

More information

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

More information

STAT 263/363: Experimental Design Winter 2016/17. Lecture 1 January 9. Why perform Design of Experiments (DOE)? There are at least two reasons:

STAT 263/363: Experimental Design Winter 2016/17. Lecture 1 January 9. Why perform Design of Experiments (DOE)? There are at least two reasons: STAT 263/363: Experimental Design Winter 206/7 Lecture January 9 Lecturer: Minyong Lee Scribe: Zachary del Rosario. Design of Experiments Why perform Design of Experiments (DOE)? There are at least two

More information

A multiple testing procedure for input variable selection in neural networks

A multiple testing procedure for input variable selection in neural networks A multiple testing procedure for input variable selection in neural networks MicheleLaRoccaandCiraPerna Department of Economics and Statistics - University of Salerno Via Ponte Don Melillo, 84084, Fisciano

More information

Running head: GEOGRAPHICALLY WEIGHTED REGRESSION 1. Geographically Weighted Regression. Chelsey-Ann Cu GEOB 479 L2A. University of British Columbia

Running head: GEOGRAPHICALLY WEIGHTED REGRESSION 1. Geographically Weighted Regression. Chelsey-Ann Cu GEOB 479 L2A. University of British Columbia Running head: GEOGRAPHICALLY WEIGHTED REGRESSION 1 Geographically Weighted Regression Chelsey-Ann Cu 32482135 GEOB 479 L2A University of British Columbia Dr. Brian Klinkenberg 9 February 2018 GEOGRAPHICALLY

More information

ON STEPWISE CONTROL OF THE GENERALIZED FAMILYWISE ERROR RATE. By Wenge Guo and M. Bhaskara Rao

ON STEPWISE CONTROL OF THE GENERALIZED FAMILYWISE ERROR RATE. By Wenge Guo and M. Bhaskara Rao ON STEPWISE CONTROL OF THE GENERALIZED FAMILYWISE ERROR RATE By Wenge Guo and M. Bhaskara Rao National Institute of Environmental Health Sciences and University of Cincinnati A classical approach for dealing

More information

Testing a secondary endpoint after a group sequential test. Chris Jennison. 9th Annual Adaptive Designs in Clinical Trials

Testing a secondary endpoint after a group sequential test. Chris Jennison. 9th Annual Adaptive Designs in Clinical Trials Testing a secondary endpoint after a group sequential test Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj 9th Annual Adaptive Designs in

More information

Statistical Inference. Why Use Statistical Inference. Point Estimates. Point Estimates. Greg C Elvers

Statistical Inference. Why Use Statistical Inference. Point Estimates. Point Estimates. Greg C Elvers Statistical Inference Greg C Elvers 1 Why Use Statistical Inference Whenever we collect data, we want our results to be true for the entire population and not just the sample that we used But our sample

More information

Table of Outcomes. Table of Outcomes. Table of Outcomes. Table of Outcomes. Table of Outcomes. Table of Outcomes. T=number of type 2 errors

Table of Outcomes. Table of Outcomes. Table of Outcomes. Table of Outcomes. Table of Outcomes. Table of Outcomes. T=number of type 2 errors The Multiple Testing Problem Multiple Testing Methods for the Analysis of Microarray Data 3/9/2009 Copyright 2009 Dan Nettleton Suppose one test of interest has been conducted for each of m genes in a

More information

Lectures 5 & 6: Hypothesis Testing

Lectures 5 & 6: Hypothesis Testing Lectures 5 & 6: Hypothesis Testing in which you learn to apply the concept of statistical significance to OLS estimates, learn the concept of t values, how to use them in regression work and come across

More information

The Design of a Survival Study

The Design of a Survival Study The Design of a Survival Study The design of survival studies are usually based on the logrank test, and sometimes assumes the exponential distribution. As in standard designs, the power depends on The

More information

Controlling the proportion of falsely-rejected hypotheses. when conducting multiple tests with climatological data

Controlling the proportion of falsely-rejected hypotheses. when conducting multiple tests with climatological data Controlling the proportion of falsely-rejected hypotheses when conducting multiple tests with climatological data Valérie Ventura 1 Department of Statistics and the Center for the Neural Basis of Cognition

More information

Rigorous Science - Based on a probability value? The linkage between Popperian science and statistical analysis

Rigorous Science - Based on a probability value? The linkage between Popperian science and statistical analysis /3/26 Rigorous Science - Based on a probability value? The linkage between Popperian science and statistical analysis The Philosophy of science: the scientific Method - from a Popperian perspective Philosophy

More information

On Generalized Fixed Sequence Procedures for Controlling the FWER

On Generalized Fixed Sequence Procedures for Controlling the FWER Research Article Received XXXX (www.interscience.wiley.com) DOI: 10.1002/sim.0000 On Generalized Fixed Sequence Procedures for Controlling the FWER Zhiying Qiu, a Wenge Guo b and Gavin Lynch c Testing

More information

Interim Monitoring of Clinical Trials: Decision Theory, Dynamic Programming. and Optimal Stopping

Interim Monitoring of Clinical Trials: Decision Theory, Dynamic Programming. and Optimal Stopping Interim Monitoring of Clinical Trials: Decision Theory, Dynamic Programming and Optimal Stopping Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj

More information

Statistica Sinica Preprint No: SS R1

Statistica Sinica Preprint No: SS R1 Statistica Sinica Preprint No: SS-2017-0072.R1 Title Control of Directional Errors in Fixed Sequence Multiple Testing Manuscript ID SS-2017-0072.R1 URL http://www.stat.sinica.edu.tw/statistica/ DOI 10.5705/ss.202017.0072

More information

An Open Source Geodemographic Classification of Small Areas In the Republic of Ireland Chris Brunsdon, Martin Charlton, Jan Rigby

An Open Source Geodemographic Classification of Small Areas In the Republic of Ireland Chris Brunsdon, Martin Charlton, Jan Rigby An Open Source Geodemographic Classification of Small Areas In the Republic of Ireland Chris Brunsdon, Martin Charlton, Jan Rigby National Centre for Geocomputation National University of Ireland, Maynooth

More information

Controlling Bayes Directional False Discovery Rate in Random Effects Model 1

Controlling Bayes Directional False Discovery Rate in Random Effects Model 1 Controlling Bayes Directional False Discovery Rate in Random Effects Model 1 Sanat K. Sarkar a, Tianhui Zhou b a Temple University, Philadelphia, PA 19122, USA b Wyeth Pharmaceuticals, Collegeville, PA

More information

Comparing Adaptive Designs and the. Classical Group Sequential Approach. to Clinical Trial Design

Comparing Adaptive Designs and the. Classical Group Sequential Approach. to Clinical Trial Design Comparing Adaptive Designs and the Classical Group Sequential Approach to Clinical Trial Design Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj

More information

CH.9 Tests of Hypotheses for a Single Sample

CH.9 Tests of Hypotheses for a Single Sample CH.9 Tests of Hypotheses for a Single Sample Hypotheses testing Tests on the mean of a normal distributionvariance known Tests on the mean of a normal distributionvariance unknown Tests on the variance

More information

Fundamental Probability and Statistics

Fundamental Probability and Statistics Fundamental Probability and Statistics "There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we know we don't know. But there are

More information

Multiple Testing. Tim Hanson. January, Modified from originals by Gary W. Oehlert. Department of Statistics University of South Carolina

Multiple Testing. Tim Hanson. January, Modified from originals by Gary W. Oehlert. Department of Statistics University of South Carolina Multiple Testing Tim Hanson Department of Statistics University of South Carolina January, 2017 Modified from originals by Gary W. Oehlert Type I error A Type I error is to wrongly reject the null hypothesis

More information

Sample Size and Power I: Binary Outcomes. James Ware, PhD Harvard School of Public Health Boston, MA

Sample Size and Power I: Binary Outcomes. James Ware, PhD Harvard School of Public Health Boston, MA Sample Size and Power I: Binary Outcomes James Ware, PhD Harvard School of Public Health Boston, MA Sample Size and Power Principles: Sample size calculations are an essential part of study design Consider

More information

Looking at the Other Side of Bonferroni

Looking at the Other Side of Bonferroni Department of Biostatistics University of Washington 24 May 2012 Multiple Testing: Control the Type I Error Rate When analyzing genetic data, one will commonly perform over 1 million (and growing) hypothesis

More information

Journal Club: Higher Criticism

Journal Club: Higher Criticism Journal Club: Higher Criticism David Donoho (2002): Higher Criticism for Heterogeneous Mixtures, Technical Report No. 2002-12, Dept. of Statistics, Stanford University. Introduction John Tukey (1976):

More information

Multivariate Process Control Chart for Controlling the False Discovery Rate

Multivariate Process Control Chart for Controlling the False Discovery Rate Industrial Engineering & Management Systems Vol, No 4, December 0, pp.385-389 ISSN 598-748 EISSN 34-6473 http://dx.doi.org/0.73/iems.0..4.385 0 KIIE Multivariate Process Control Chart for Controlling e

More information

Stat 206: Estimation and testing for a mean vector,

Stat 206: Estimation and testing for a mean vector, Stat 206: Estimation and testing for a mean vector, Part II James Johndrow 2016-12-03 Comparing components of the mean vector In the last part, we talked about testing the hypothesis H 0 : µ 1 = µ 2 where

More information

Dose-response modeling with bivariate binary data under model uncertainty

Dose-response modeling with bivariate binary data under model uncertainty Dose-response modeling with bivariate binary data under model uncertainty Bernhard Klingenberg 1 1 Department of Mathematics and Statistics, Williams College, Williamstown, MA, 01267 and Institute of Statistics,

More information

GeoDa-GWR Results: GeoDa-GWR Output (portion only): Program began at 4/8/2016 4:40:38 PM

GeoDa-GWR Results: GeoDa-GWR Output (portion only): Program began at 4/8/2016 4:40:38 PM New Mexico Health Insurance Coverage, 2009-2013 Exploratory, Ordinary Least Squares, and Geographically Weighted Regression Using GeoDa-GWR, R, and QGIS Larry Spear 4/13/2016 (Draft) A dataset consisting

More information

Department of Statistics University of Central Florida. Technical Report TR APR2007 Revised 25NOV2007

Department of Statistics University of Central Florida. Technical Report TR APR2007 Revised 25NOV2007 Department of Statistics University of Central Florida Technical Report TR-2007-01 25APR2007 Revised 25NOV2007 Controlling the Number of False Positives Using the Benjamini- Hochberg FDR Procedure Paul

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 16 Introduction

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 16 Introduction Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 16 Introduction ReCap. Parts I IV. The General Linear Model Part V. The Generalized Linear Model 16 Introduction 16.1 Analysis

More information

Non-specific filtering and control of false positives

Non-specific filtering and control of false positives Non-specific filtering and control of false positives Richard Bourgon 16 June 2009 bourgon@ebi.ac.uk EBI is an outstation of the European Molecular Biology Laboratory Outline Multiple testing I: overview

More information

How do we compare the relative performance among competing models?

How do we compare the relative performance among competing models? How do we compare the relative performance among competing models? 1 Comparing Data Mining Methods Frequent problem: we want to know which of the two learning techniques is better How to reliably say Model

More information

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages: Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the

More information

An Equivalency Test for Model Fit. Craig S. Wells. University of Massachusetts Amherst. James. A. Wollack. Ronald C. Serlin

An Equivalency Test for Model Fit. Craig S. Wells. University of Massachusetts Amherst. James. A. Wollack. Ronald C. Serlin Equivalency Test for Model Fit 1 Running head: EQUIVALENCY TEST FOR MODEL FIT An Equivalency Test for Model Fit Craig S. Wells University of Massachusetts Amherst James. A. Wollack Ronald C. Serlin University

More information

Using Spatial Statistics Social Service Applications Public Safety and Public Health

Using Spatial Statistics Social Service Applications Public Safety and Public Health Using Spatial Statistics Social Service Applications Public Safety and Public Health Lauren Rosenshein 1 Regression analysis Regression analysis allows you to model, examine, and explore spatial relationships,

More information

STAT 4385 Topic 01: Introduction & Review

STAT 4385 Topic 01: Introduction & Review STAT 4385 Topic 01: Introduction & Review Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso xsu@utep.edu Spring, 2016 Outline Welcome What is Regression Analysis? Basics

More information

Time: the late arrival at the Geocomputation party and the need for considered approaches to spatio- temporal analyses

Time: the late arrival at the Geocomputation party and the need for considered approaches to spatio- temporal analyses Time: the late arrival at the Geocomputation party and the need for considered approaches to spatio- temporal analyses Alexis Comber 1, Paul Harris* 2, Narumasa Tsutsumida 3 1 School of Geography, University

More information

Chapter 13 Section D. F versus Q: Different Approaches to Controlling Type I Errors with Multiple Comparisons

Chapter 13 Section D. F versus Q: Different Approaches to Controlling Type I Errors with Multiple Comparisons Explaining Psychological Statistics (2 nd Ed.) by Barry H. Cohen Chapter 13 Section D F versus Q: Different Approaches to Controlling Type I Errors with Multiple Comparisons In section B of this chapter,

More information

Parameter Estimation, Sampling Distributions & Hypothesis Testing

Parameter Estimation, Sampling Distributions & Hypothesis Testing Parameter Estimation, Sampling Distributions & Hypothesis Testing Parameter Estimation & Hypothesis Testing In doing research, we are usually interested in some feature of a population distribution (which

More information

STATISTICAL INFERENCE FOR SURVEY DATA ANALYSIS

STATISTICAL INFERENCE FOR SURVEY DATA ANALYSIS STATISTICAL INFERENCE FOR SURVEY DATA ANALYSIS David A Binder and Georgia R Roberts Methodology Branch, Statistics Canada, Ottawa, ON, Canada K1A 0T6 KEY WORDS: Design-based properties, Informative sampling,

More information

Mixtures of multiple testing procedures for gatekeeping applications in clinical trials

Mixtures of multiple testing procedures for gatekeeping applications in clinical trials Research Article Received 29 January 2010, Accepted 26 May 2010 Published online 18 April 2011 in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/sim.4008 Mixtures of multiple testing procedures

More information

Inferential Statistics Hypothesis tests Confidence intervals

Inferential Statistics Hypothesis tests Confidence intervals Inferential Statistics Hypothesis tests Confidence intervals Eva Riccomagno, Maria Piera Rogantin DIMA Università di Genova riccomagno@dima.unige.it rogantin@dima.unige.it Part G. Multiple tests Part H.

More information

The Pennsylvania State University The Graduate School Eberly College of Science GENERALIZED STEPWISE PROCEDURES FOR

The Pennsylvania State University The Graduate School Eberly College of Science GENERALIZED STEPWISE PROCEDURES FOR The Pennsylvania State University The Graduate School Eberly College of Science GENERALIZED STEPWISE PROCEDURES FOR CONTROLLING THE FALSE DISCOVERY RATE A Dissertation in Statistics by Scott Roths c 2011

More information

Econometrics. 4) Statistical inference

Econometrics. 4) Statistical inference 30C00200 Econometrics 4) Statistical inference Timo Kuosmanen Professor, Ph.D. http://nomepre.net/index.php/timokuosmanen Today s topics Confidence intervals of parameter estimates Student s t-distribution

More information

Rigorous Science - Based on a probability value? The linkage between Popperian science and statistical analysis

Rigorous Science - Based on a probability value? The linkage between Popperian science and statistical analysis /9/27 Rigorous Science - Based on a probability value? The linkage between Popperian science and statistical analysis The Philosophy of science: the scientific Method - from a Popperian perspective Philosophy

More information

MTMS Mathematical Statistics

MTMS Mathematical Statistics MTMS.01.099 Mathematical Statistics Lecture 12. Hypothesis testing. Power function. Approximation of Normal distribution and application to Binomial distribution Tõnu Kollo Fall 2016 Hypothesis Testing

More information

Contrasts and Multiple Comparisons Supplement for Pages

Contrasts and Multiple Comparisons Supplement for Pages Contrasts and Multiple Comparisons Supplement for Pages 302-323 Brian Habing University of South Carolina Last Updated: July 20, 2001 The F-test from the ANOVA table allows us to test the null hypothesis

More information

The Purpose of Hypothesis Testing

The Purpose of Hypothesis Testing Section 8 1A:! An Introduction to Hypothesis Testing The Purpose of Hypothesis Testing See s Candy states that a box of it s candy weighs 16 oz. They do not mean that every single box weights exactly 16

More information

Analysis of variance, multivariate (MANOVA)

Analysis of variance, multivariate (MANOVA) Analysis of variance, multivariate (MANOVA) Abstract: A designed experiment is set up in which the system studied is under the control of an investigator. The individuals, the treatments, the variables

More information

Multiple Testing of One-Sided Hypotheses: Combining Bonferroni and the Bootstrap

Multiple Testing of One-Sided Hypotheses: Combining Bonferroni and the Bootstrap University of Zurich Department of Economics Working Paper Series ISSN 1664-7041 (print) ISSN 1664-705X (online) Working Paper No. 254 Multiple Testing of One-Sided Hypotheses: Combining Bonferroni and

More information

Lecture 28. Ingo Ruczinski. December 3, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University

Lecture 28. Ingo Ruczinski. December 3, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University Lecture 28 Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University December 3, 2015 1 2 3 4 5 1 Familywise error rates 2 procedure 3 Performance of with multiple

More information

Superchain Procedures in Clinical Trials. George Kordzakhia FDA, CDER, Office of Biostatistics Alex Dmitrienko Quintiles Innovation

Superchain Procedures in Clinical Trials. George Kordzakhia FDA, CDER, Office of Biostatistics Alex Dmitrienko Quintiles Innovation August 01, 2012 Disclaimer: This presentation reflects the views of the author and should not be construed to represent the views or policies of the U.S. Food and Drug Administration Introduction We describe

More information

Type I error rate control in adaptive designs for confirmatory clinical trials with treatment selection at interim

Type I error rate control in adaptive designs for confirmatory clinical trials with treatment selection at interim Type I error rate control in adaptive designs for confirmatory clinical trials with treatment selection at interim Frank Bretz Statistical Methodology, Novartis Joint work with Martin Posch (Medical University

More information

Rigorous Science - Based on a probability value? The linkage between Popperian science and statistical analysis

Rigorous Science - Based on a probability value? The linkage between Popperian science and statistical analysis Rigorous Science - Based on a probability value? The linkage between Popperian science and statistical analysis The Philosophy of science: the scientific Method - from a Popperian perspective Philosophy

More information

Clinical Trials. Olli Saarela. September 18, Dalla Lana School of Public Health University of Toronto.

Clinical Trials. Olli Saarela. September 18, Dalla Lana School of Public Health University of Toronto. Introduction to Dalla Lana School of Public Health University of Toronto olli.saarela@utoronto.ca September 18, 2014 38-1 : a review 38-2 Evidence Ideal: to advance the knowledge-base of clinical medicine,

More information

CSISS Tools and Spatial Analysis Software

CSISS Tools and Spatial Analysis Software CSISS Tools and Spatial Analysis Software June 5, 2006 Stephen A. Matthews Associate Professor of Sociology & Anthropology, Geography and Demography Director of the Geographic Information Analysis Core

More information

Journal of Statistical Software

Journal of Statistical Software JSS Journal of Statistical Software MMMMMM YYYY, Volume VV, Issue II. doi: 10.18637/jss.v000.i00 GroupTest: Multiple Testing Procedure for Grouped Hypotheses Zhigen Zhao Abstract In the modern Big Data

More information

Sociology 6Z03 Review II

Sociology 6Z03 Review II Sociology 6Z03 Review II John Fox McMaster University Fall 2016 John Fox (McMaster University) Sociology 6Z03 Review II Fall 2016 1 / 35 Outline: Review II Probability Part I Sampling Distributions Probability

More information

Geographically Weighted Regression

Geographically Weighted Regression Geographically Weighted Regression Modelling spatially heterogenous processes Martin Charlton National Centre for Geocomputation National University of Ireland Maynooth Outline Introduction Spatial Data

More information

Group Sequential Designs: Theory, Computation and Optimisation

Group Sequential Designs: Theory, Computation and Optimisation Group Sequential Designs: Theory, Computation and Optimisation Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj 8th International Conference

More information

A Simple, Graphical Procedure for Comparing. Multiple Treatment Effects

A Simple, Graphical Procedure for Comparing. Multiple Treatment Effects A Simple, Graphical Procedure for Comparing Multiple Treatment Effects Brennan S. Thompson and Matthew D. Webb October 26, 2015 > Abstract In this paper, we utilize a new

More information

1 Overview. Coefficients of. Correlation, Alienation and Determination. Hervé Abdi Lynne J. Williams

1 Overview. Coefficients of. Correlation, Alienation and Determination. Hervé Abdi Lynne J. Williams In Neil Salkind (Ed.), Encyclopedia of Research Design. Thousand Oaks, CA: Sage. 2010 Coefficients of Correlation, Alienation and Determination Hervé Abdi Lynne J. Williams 1 Overview The coefficient of

More information

The General Linear Model. Guillaume Flandin Wellcome Trust Centre for Neuroimaging University College London

The General Linear Model. Guillaume Flandin Wellcome Trust Centre for Neuroimaging University College London The General Linear Model Guillaume Flandin Wellcome Trust Centre for Neuroimaging University College London SPM Course Lausanne, April 2012 Image time-series Spatial filter Design matrix Statistical Parametric

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

STAT 461/561- Assignments, Year 2015

STAT 461/561- Assignments, Year 2015 STAT 461/561- Assignments, Year 2015 This is the second set of assignment problems. When you hand in any problem, include the problem itself and its number. pdf are welcome. If so, use large fonts and

More information