Assessing the Effectiveness of Cumulative Sum Poisson- and Normal-based Tests for Detecting Rare Diseases

Similar documents
Improving Biosurveillance System Performance. Ronald D. Fricker, Jr. Virginia Tech July 7, 2015

Quality Control & Statistical Process Control (SPC)

Final Paper for MAT4996: Modelling the Impact of a Live Virus Vaccine for Tularemia

Directionally Sensitive Multivariate Statistical Process Control Methods

Surveillance of Infectious Disease Data using Cumulative Sum Methods

Lecture 13. Poisson Distribution. Text: A Course in Probability by Weiss 5.5. STAT 225 Introduction to Probability Models February 16, 2014

Accelerated warning of a bioterrorist event or endemic disease outbreak

NAVAL POSTGRADUATE SCHOOL THESIS

Practice Exam 1 (Sep. 21, 2015)

THE NATURE OF THE BIOTERRORISM THREAT

2018 TANEY COUNTY HEALTH DEPARTMENT. Communicable Disease. Annual Report

Early Detection of a Change in Poisson Rate After Accounting For Population Size Effects

Binomial and Poisson Probability Distributions

A Hypothesis Test for the End of a Common Source Outbreak

Early Detection of Important Animal Health Events

Risk Assessment of Staphylococcus aureus and Clostridium perfringens in ready to eat Egg Products

Encyclopedia of Bioterrorism Defense

Dr. habil. Anna Salek. Mikrobiologist Biotechnologist Research Associate

Core Courses for Students Who Enrolled Prior to Fall 2018

A Step Towards the Cognitive Radar: Target Detection under Nonstationary Clutter

EFFICIENT CHANGE DETECTION METHODS FOR BIO AND HEALTHCARE SURVEILLANCE

KEY CONCEPTS AND PROCESS SKILLS. 2. Most infectious diseases are caused by microbes.

Directionally Sensitive Multivariate Statistical Process Control Procedures with Application to Syndromic Surveillance

Lecture 3. Discrete Random Variables

EED Design & Performance Evaluation

Probability and Probability Distributions. Dr. Mohammed Alahmed

Biological terrorism preparedness evaluating the performance of the Early Aberration Reporting System (EARS) syndromic surveillance algorithms

If there exists a threshold k 0 such that. then we can take k = k 0 γ =0 and achieve a test of size α. c 2004 by Mark R. Bell,

Approximate Bayesian Computation

A Generic Multivariate Distribution for Counting Data

Surveillance of BiometricsAssumptions

Utilization of advanced clutter suppression algorithms for improved spectroscopic portal capability against radionuclide threats

Statistical challenges in Disease Ecology

Modelling spatio-temporal patterns of disease

USING CLUSTERING SOFTWARE FOR EXPLORING SPATIAL AND TEMPORAL PATTERNS IN NON-COMMUNICABLE DISEASES

Statistical Models and Algorithms for Real-Time Anomaly Detection Using Multi-Modal Data

Bugs in JRA-55 snow depth analysis

Continuous Probability Spaces

Chapter 4 Continuous Random Variables and Probability Distributions

Lecture 25. Ingo Ruczinski. November 24, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University

Entropy of Gaussian Random Functions and Consequences in Geostatistics

Bioterrorism Anthrax Guidebook: For Healthcare Workers, Public Officers (Allied Health, Nurses, Doctors, Public Health Workers, EMS Workers, Other...

Aerosol Generation and Characterisation for Nanotoxicology

Radioactive Decedents What is the risk?

PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH

Fighting Cholera With Maps

Lecture 14: Introduction to Poisson Regression

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Scientific Highlight October 2012

1. Poisson distribution is widely used in statistics for modeling rare events.

Bacillus subtilis strain GB03

Multivariate Charts for Multivariate. Poisson-Distributed Data. Busaba Laungrungrong

Applications of GIS in Health Research. West Nile virus

Approximate and Fiducial Confidence Intervals for the Difference Between Two Binomial Proportions

WM2011 Conference, February 27 - March 3, 2011, Phoenix, AZ

An introduction to biostatistics: part 1

What are Cells? How is this bacterium similar to a human? organism: a living thing. The cell is the basic unit of life.

Chapters 3.2 Discrete distributions

Pricing of Cyber Insurance Contracts in a Network Model

Lecture 3: Measures of effect: Risk Difference Attributable Fraction Risk Ratio and Odds Ratio

C h a p t e r 5 : W o r k p l a c e H a z a r d o u s M a t e r i a l s I n f o r m a t i o n S y s t e m ( W H M I S )

Downloaded from:

Name Unit 1 Study Guide: Nature of Biology Test Date: Collect/Analyze Your Data: During the experiment, you collect your data/measurements so that

Radiation Awareness Training. Stephen Price Office of Research Safety

EXAMINING THE DISTRIBUTION OF FRANCISELLA TULARENSIS, THE CAUSATIVE AGENT OF TULAREMIA, IN UKRAINE USING ECOLOGICAL NICHE MODELING

BIOAG'L SCI + PEST MGMT- BSPM (BSPM)

Background The events of fall 2001, with the deliberate

Bayes Formula. MATH 107: Finite Mathematics University of Louisville. March 26, 2014

UNIVERSITY OF SOUTH CAROLINA RADIATION SAFETY POLICY NO. 7. USC NOVEMBER 1985 (Revised May 2011) Preparation and Disposal of Radioactive Waste

Discrete distribution. Fitting probability models to frequency data. Hypotheses for! 2 test. ! 2 Goodness-of-fit test

Dr. Steven Koch Director, NOAA National Severe Storms Laboratory Chair, WRN Workshop Executive Committee. Photo Credit: Associated Press

Statistical Distributions and Uncertainty Analysis. QMRA Institute Patrick Gurian

Drought forecasting methods Blaz Kurnik DESERT Action JRC

Zoonotic Disease Report September 2 September 8, 2018 (Week 36)

Geographic Information Systems(GIS)

Brief Glimpse of Agent-Based Modeling

Estimating the Shape of Covert Networks

2. The CDF Technique. 1. Introduction. f X ( ).

PASS Sample Size Software. Poisson Regression

Spatial Variation in Local Road Pedestrian and Bicycle Crashes

Statistical Analysis of Spatio-temporal Point Process Data. Peter J Diggle

Regenerative Likelihood Ratio control schemes

1 Disease Spread Model

Multivariate Bayesian modeling of known and unknown causes of events An application to biosurveillance

Computational and Monte-Carlo Aspects of Systems for Monitoring Reliability Data. Emmanuel Yashchin IBM Research, Yorktown Heights, NY

A Modular NMF Matching Algorithm for Radiation Spectra

Global Harmonization System:

Local Likelihood Bayesian Cluster Modeling for small area health data. Andrew Lawson Arnold School of Public Health University of South Carolina

Pump failure data. Pump Failures Time

Lecture 20. Poisson Processes. Text: A Course in Probability by Weiss STAT 225 Introduction to Probability Models March 26, 2014

An Application of the Autoregressive Conditional Poisson (ACP) model

Count data and the Poisson distribution

Model comparison. Patrick Breheny. March 28. Introduction Measures of predictive power Model selection

Source Term Estimation for Hazardous Releases

N) manual. Biomaster Operating manual

Battling Bluetongue and Schmallenberg virus: Local scale behavior of transmitting vectors

PROF. DR HAB. PIOTR TRYJANOWSKI

A Framework for the Study of Urban Health. Abdullah Baqui, DrPH, MPH, MBBS Johns Hopkins University

Sequential Analysis & Testing Multiple Hypotheses,

Assessing Atmospheric Releases of Hazardous Materials

Transcription:

Assessing the Effectiveness of Cumulative Sum Poisson- and Normal-based Tests for Detecting Rare Diseases 16 November 2010 LCDR Manuel Ganuza Thesis Research Prof. Ronald Fricker Advisor Graduate School Operations & Information Sciences Department of Operations Research

Agenda Motivating Problem: Detecting Tularemia Research Question Background Normal and Poisson Distributions CUSUM Applying the CUSUM Measures of Performance Results & Conclusions 2

Motivating Problem: Detecting Tularemia Highly virulent, highly infectious disease [1] Caused by bacterium Francisella tularensis Also known as rabbit fever and deer fly fever Primary vectors are ticks and deer flies Primary reservoirs are small to medium-sized mammals Extremely rare in the United States [1] From 1990 to 2000, rate less than 1 per 1,000,000 Can be weaponized for aerosol release [2] Category A agent by CDC Category A agents have the "potential to pose a severe threat to public health and safety Sources: [1] Center for Disease Control and Prevention. Tularemia United States, 1990-2000. MMWR 2002; 51: 181 184. [2] Steven D. Cronquist. Tularemia: the disease and the weapon. Dermatologic Clinics 2004. 22(3): 313 320. 3

Motivating Problem: Detecting Tularemia Modes of Transmission [3] Bites by infected arthropods Direct contact with infected animals Handling of infectious animal tissues or fluids Ingestion of contaminated food, water, or soil Possibly direct contact with contaminated soil or water Inhalation of infectious aerosols Exposure in laboratory setting Clinical Syndromes [3] Glandular and Ulceroglandular tularemia Pneumonic tularemia Oculoglandular tularemia Oropharyngeal tularemia Typhodial tularemia Source: [5] Center for Infectious Diseases Research & Policy. Tularemia: Overview. University of Minnesota March 2010. 4

Some Tularemia Outbreaks in US Martha s Vineyard 2000 [4] : One fatality CDC investigated for possibility of aerosolized F tularensis Subsequently documented cases of tularemia resulted from lawn mowing Washington, D.C. 2005 [5] : Small amounts of Francisella tularensis were detected Morning after an anti-war demonstration, biohazard sensors were triggered at six locations surrounding the Capitol Mall While thousands of people were potentially exposed, no infections were reported Sources: [4] Katherine A. Feldman, et all. An Outbreak of Primary Pneumonic Tularemia on Martha s Vineyard. New England Journal of Medicine 2001. 345(22): 1601 1606. [5] Center for Infectious Diseases Research & Policy News. Tularemia agent found in DC air, but no cases seen. University of Minnesota October 2005. 5

Research Question Background: Biosurveillance systems often use Cumulative Sum (CUSUM) detection algorithm CUSUM derived from normal distribution is most commonly used ( normal-based CUSUM ) Rare diseases do not follow normal distribution Poisson distribution more likely and appropriate Research question: How does normal-based CUSUM perform compared to Poisson-based CUSUM for detecting rare diseases? 6

Background: Normal Distribution Univariate distribution for continuous data Applies to many phenomena but not all! Two parameters: mean μ and variance σ 2 7

Background: Poisson Distribution Univariate distribution for discrete data Often a good model for rare events One parameter: λ Distribution has mean λ and variance λ λ=5 8

Background: The CUSUM Sequential test for change (increase) in distribution mean (Page, 1954) Basic form: where C t max 0, C C t is the current value of the CUSUM on day t X t is the observed number of cases on day t C t-1 is the value of the CUSUM on day t-1 ln[f 1 (X)/f 0 (X)] is the log-likelihood ratio for X for the outbreak distribution f 1 and the non-outbreak distribution f 0 Threshold h: CUSUM signals at time t when t1 ln f f 1 0 ( X) ( X) Ct h 9

Forms of the CUSUM The basic form: C t max 0, C t1 ln For f 0 = N( μ 0, σ 2 ) and f 1 = N( μ 1, σ 2 ): f f 1 0 ( X) ( X) 1 0 Ct max 0, Ct 1 X t 2 For f 0 = Poisson(λ 0 ) and f 1 = Poisson(λ 1 ): Ct max 0, Ct 1 X t ln 1 0 ln 1 0 10

Comparing the Forms Normal-based and Poisson-based CUSUMs are exactly the same when the reference values match: 1 0 1 0 1 0 2 2 ln ln So, the CUSUMs are the same when ln ln 2 1 0 which is the same as when and which is also the same as when 1 7.40 0 2 1/ 0 e 7.3891 1 0 11

Comparing the Forms (cont d) Three cases of interest Case #1: Case #2: Case #3: 1 = 7.40 0 7.40 1 0 7.40 1 0 12

Performance Metric Average Time to False/First Signal (ATFS) When there is no outbreak, it is the average time between (false) signals When there is an outbreak, it is the average delay from start of outbreak to first (real) signal The ideal algorithm: Has large time between false signals when there is no outbreak Has a small delay to first real signal when there is an outbreak Comparison methodology Match time between false signals for a given λ 0 Assess ATFS across variety of outbreaks (λ* > λ 0 ) Method with smaller ATFS values is preferred 13

Scenario Disease: Tularemia Incubation period from 3 to 5 days (range 1 to 14 days) Cases occur according to Poisson(λ*) distribution Biosurveillance Systems System A Uses normal-based CUSUM Assume disease distribution is approximate normal exempli gratia, for X ~ Poisson(λ) it assumes X ~ N(λ, λ) Set threshold h A, so ATFS is large when no outbreak System B Uses Poisson-based CUSUM Set threshold h B, so ATFS is large when no outbreak 14

Results: Case #1 ( 7.40 ) 1 0 Biosurveillance Systems System A System B User set parameters: No outbreak, mean λ 0 = 0.100 Outbreak, mean λ 1 = 0.740 Threshold, h A = 0.64 h B = 1.36 Actual occurrence: Data Poisson Outbreak, mean λ * 15

Results: Case #1 ( ) 1 7.40 0 Biosurveillance Systems System A System B System A User set parameters: No outbreak, mean λ 0 = 0.100 Outbreak, mean λ 1 = 0.740 Threshold, h A = 0.64 h B = 1.36 h A = 1.08 Actual occurrence: Data Poisson Outbreak, mean λ * 16

Results: Case #2 ( 7.40 ) 1 0 for λ* = λ 0, ATFS is too high Biosurveillance Systems System A System B User set parameters: No outbreak, mean λ 0 = 0.100 Outbreak, mean λ 1 = 0.105 Threshold, h A = 8.11 h B = 2.39 Actual occurrence: Data Poisson Outbreak, mean λ * 17

Results: Case #2 ( 7.40 ) 1 0 for λ* = λ 0, ATFS match Biosurveillance Systems System A System B System A User set parameters: No outbreak, mean λ 0 = 0.100 Outbreak, mean λ 1 = 0.105 Threshold, h A = 8.11 h B = 2.39 h A = 7.73 Actual occurrence: Data Poisson Outbreak, mean λ * 18

Results: Case #3 ( 7.40 ) 1 0 Biosurveillance Systems System A System B User set parameters: No outbreak, mean λ 0 = 0.100 Outbreak, mean λ 1 = 1.480 Threshold, h A = 0.12 h B = 0.49 Actual occurrence: Data Poisson Outbreak, mean λ * 19

Results: Case #3 ( 7.40 ) 1 0 Biosurveillance Systems System A System B System A User set parameters: No outbreak, mean λ 0 = 0.100 Outbreak, mean λ 1 = 1.480 Threshold, h A = 0.12 h B = 0.49 h A = 0.31 Actual occurrence: Data Poisson Outbreak, mean λ * 20

Conclusion #1 When Rate of disease, λ 0, is very low, Occurrence counts are Poisson distributed, X ~ Pois(λ 0 ), Outbreak, λ 1, manifests as only a small increase in the rate of disease, 1 7.400 Then incorrect use of normal-based CUSUM can result in unacceptable delay in detection To monitor rare diseases, such as Tularemia, include Poisson-based CUSUM in biosurveillance systems Example here shows potential delays in signaling on the order of weeks 21

Conclusion #2 When Rate of disease, λ 0, is very low, Occurrence counts are Poisson distributed, X ~ Pois(λ 0 ), Outbreak rate, λ 1, is significantly larger than rate of disease, 1 7.400 Then the normal-based CUSUM performs as well as Poisson-based CUSUM If threshold h, is set appropriately to achieve equivalent ATFS performance at λ * = λ 0 With threshold h, set incorrectly, Normal-based CUSUM has excessively high false alarm rate A real problem with existing biosurveillance systems 22

Questions? 23

BACKUP SLIDES Assessing the Effectiveness of Cumulative Sum Poisson- and Normal-based Tests for Detecting Rare Diseases Photo Source: [6] Public Health Image Library. Francisella tularensis, ID#10526. Center for Disease Control and Prevention 1972. 24

Francisella tularensis Subspecies [3] F tularensis subsp tularensis (type A) Highly infectious, more virulent, more genetically diverse Found in North America F tularensis subsp holarctica (type B) Found in North America, Europe, Siberia, Far East, and Kazakhstan F tularensis subsp mediaasiatica Found in Central Asia Source: [3] Center for Infectious Diseases Research & Policy. Tularemia: Overview. University of Minnesota March 2010. 25

Background: Normal Distribution Probability density function (pdf) 2 1 ( x ) f( x;, ) exp 2 2 2 26

Background: Poisson Distribution Probability mass function (pmf) Pr( X x; ) x! x e λ = 5 27

Applying the CUSUM: Normal Data For f 0 = N( μ 0, σ 2 ) and f 1 = N( μ 1, σ 2 ) 1 0 Ct max 0, Ct 1 X t 2 In words, at each time period add the observed data minus one-half the difference between μ 1 and to μ 0 the cumulative total If the cumulative total is negative, set to zero This is the most common form of the CUSUM Often applied to data without realizing that the underlying assumption is normality 28

Applying the CUSUM: Poisson Data For f 0 = Poisson(λ 0 ) and f 1 = Poisson(λ 1 ) Ct max 0, Ct 1 X t ln Similar idea, at each time period add the observed data minus a quantity based on a function of λ 1 and λ 0 to the cumulative total If the cumulative total is negative, set to zero Much less commonly known form 1 0 But it is the correct form for Poisson data ln 1 0 29