Dynamic Disease Screening


Peihua Qiu, pqiu@ufl.edu
Department of Biostatistics, University of Florida
NCTS Lecture, Taiwan, December 10, 2014

Motivating Example
The SHARe Framingham Heart Study of the NHLBI involved many residents of Framingham, MA. Major risk factors for cardiovascular diseases include blood pressure, total cholesterol level (TCL), smoking, and obesity. The goal is to identify patients with irregular longitudinal patterns of these risk factors as early as possible, for early detection and prevention of the disease. This is the dynamic screening (DS) problem.

DS Problem
The DS problem arises in many settings. For example, many products (e.g., airplanes and cars) are checked, regularly or occasionally, on variables related to their quality and/or performance. If the observed values of a product are significantly worse than those of a typical well-functioning product of the same age, then adjustments or interventions should be made to avoid unpleasant consequences.

Possible Statistical Methods
Confidence intervals of the mean responses obtained by longitudinal data analysis. This approach relies on cross-sectional comparison: it does not make use of all the history data of a subject, and it cannot detect a shift sequentially.

Possible Statistical Methods (cont'd)
Statistical process control (SPC) methods monitor each subject sequentially and use all the history data of that subject. However, they monitor subjects separately and cannot compare different subjects. In addition, the process mean and variance may not be constant over time even when the subject is in control (IC), whereas conventional control charts assume a stable IC distribution.

Dynamic Screening System (DySS)
1. Estimate the regular longitudinal pattern from an IC dataset.
2. Standardize the observations of a new subject to be monitored.
3. Monitor the standardized observations with a control chart.

References
Qiu, P., and Xiang, D. (2014), Dynamic screening system: an approach for dynamically identifying irregular individuals, Technometrics, 56, 248-260.
Qiu, P., and Xiang, D. (2014), Surveillance of cardiovascular diseases using a multivariate dynamic screening system, revised for Statistics in Medicine.
Qiu, P., Zi, X., and Zou, C. (2014), Dynamic nonparametric curve monitoring, submitted.
Li, J., and Qiu, P. (2014), Nonparametric dynamic screening system for monitoring correlated longitudinal data, submitted.
Xiang, D., Qiu, P., and Pu, X. (2013), Nonparametric regression analysis of multivariate longitudinal data, Statistica Sinica, 23, 769-789.

MDySS
Qiu, P., and Xiang, D. (2014): multivariate dynamic screening system (MDySS).

Estimate the Regular Longitudinal Pattern
IC data: observations of $m$ well-functioning subjects. For $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, J_i$, with $t_{ij} \in [0,1]$,
$$y(t_{ij}) = \mu(t_{ij}) + \varepsilon(t_{ij}),$$
where $y(t_{ij}) = (y_1(t_{ij}), \ldots, y_q(t_{ij}))'$ and $\mu(t_{ij}) = (\mu_1(t_{ij}), \ldots, \mu_q(t_{ij}))'$.
Regular pattern: $\mu(t)$ and $\Sigma(s,t) = \mathrm{Cov}(y(s), y(t))$. Xiang, Qiu, and Pu (2013) provide estimators of $\mu(t)$ and $\Sigma(s,t)$.
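The estimators of Xiang, Qiu, and Pu (2013) are nonparametric smoothers that allow irregular, subject-specific observation times. The Python sketch below is only a simplified stand-in under a hypothetical assumption: all $m$ IC subjects are observed on a common grid of $J$ times, so that $\mu(t_j)$ and $\Sigma(t_j, t_j)$ can be estimated by the cross-sectional sample mean and covariance at each $t_j$. The function name `estimate_regular_pattern` is illustrative.

```python
import numpy as np

def estimate_regular_pattern(y_ic):
    """Pointwise estimates of mu(t_j) and Sigma(t_j, t_j).

    y_ic: array of shape (m, J, q) with q-variate observations of m IC
    subjects on a common grid of J times (a simplifying assumption; the
    lecture's estimators allow irregular, subject-specific times).
    Returns mu_hat of shape (J, q) and sigma_hat of shape (J, q, q).
    """
    y_ic = np.asarray(y_ic, dtype=float)
    m, J, q = y_ic.shape
    mu_hat = y_ic.mean(axis=0)  # cross-sectional means, shape (J, q)
    sigma_hat = np.empty((J, q, q))
    for j in range(J):
        # rowvar=False: rows are subjects, columns are the q variables
        sigma_hat[j] = np.cov(y_ic[:, j, :], rowvar=False)
    return mu_hat, sigma_hat
```

A smoother would replace the per-time averages when subjects are observed at different times, which is the situation the cited method is designed for.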

Standardize Observations
A new subject's $y$ values are observed at $t_1, t_2, \ldots$ over $[0,1]$. When the subject is IC,
$$y(t_j) = \mu(t_j) + \Sigma^{1/2}(t_j,t_j)\,\epsilon(t_j).$$
Standardized observations:
$$\hat\epsilon(t_j) = \hat\Sigma^{-1/2}(t_j,t_j)\,\{y(t_j) - \hat\mu(t_j)\}.$$
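A minimal sketch of the standardization step in Python, assuming $\hat\mu(t_j)$ and $\hat\Sigma(t_j,t_j)$ are already available at the new subject's observation times. It uses the symmetric inverse square root obtained from an eigendecomposition; the helper names `inverse_sqrt` and `standardize` are hypothetical.

```python
import numpy as np

def inverse_sqrt(sigma):
    """Symmetric inverse square root of a positive-definite matrix."""
    vals, vecs = np.linalg.eigh(sigma)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def standardize(y_new, mu_hat, sigma_hat):
    """eps_hat(t_j) = Sigma_hat^{-1/2}(t_j, t_j) {y(t_j) - mu_hat(t_j)}.

    y_new: (J, q) observations of the new subject; mu_hat: (J, q) and
    sigma_hat: (J, q, q), both evaluated at the new subject's times.
    """
    y_new = np.asarray(y_new, dtype=float)
    return np.stack([inverse_sqrt(s) @ (y - m)
                     for y, m, s in zip(y_new, mu_hat, sigma_hat)])
```

Because $\hat\Sigma^{-1/2}$ is symmetric, $\hat\epsilon(t_j)'\hat\epsilon(t_j)$ equals the Mahalanobis distance $\{y(t_j) - \hat\mu(t_j)\}'\hat\Sigma^{-1}(t_j,t_j)\{y(t_j) - \hat\mu(t_j)\}$.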

A Note
By using the standardized observations of the new subject, we have in effect compared its longitudinal pattern cross-sectionally with the estimated regular longitudinal pattern at the time points $t_1, t_2, \ldots$

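Sequential Monitoring
Zou and Qiu (2009): LASSO-based MEWMA chart. The MEWMA statistic is
$$U_j = \lambda_L\,\hat\epsilon(t_j) + (1-\lambda_L)\,U_{j-1}.$$
For each penalty value $\gamma_k$, $k = 1, \ldots, q$, compute
$$\hat\alpha_{j,\gamma_k} = \arg\min_{\alpha \in \mathbb{R}^q} (U_j - \alpha)'(U_j - \alpha) + \gamma_k \sum_{l=1}^{q} \frac{|\alpha_l|}{|U_{jl}|},$$
and let $W_{j,\gamma_k}$ be the test statistic based on $\hat\alpha_{j,\gamma_k}$. The chart signals when
$$Q_j = \max_{k=1,\ldots,q} \frac{W_{j,\gamma_k} - E(W_{j,\gamma_k})}{\sqrt{\mathrm{Var}(W_{j,\gamma_k})}} > h_L.$$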
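The recursion for $U_j$ and the penalized minimization can be sketched in Python as follows. Because the quadratic term is an unweighted squared distance, the problem separates over coordinates, and each $\hat\alpha_l$ is a soft-thresholding of $U_{jl}$ at $\gamma_k/(2|U_{jl}|)$. The slide does not define $W_{j,\gamma_k}$, its IC mean and variance, or how the $\gamma_k$ are chosen, so this sketch stops at $\hat\alpha_{j,\gamma_k}$; the penalty grid and the start $U_0 = 0$ are assumptions, and the function names are hypothetical.

```python
import numpy as np

def mewma_update(u_prev, eps, lam):
    """MEWMA recursion U_j = lam * eps(t_j) + (1 - lam) * U_{j-1}."""
    return lam * np.asarray(eps, dtype=float) + (1.0 - lam) * np.asarray(u_prev, dtype=float)

def adaptive_lasso_alpha(u, gamma):
    """Minimize (U - a)'(U - a) + gamma * sum_l |a_l| / |U_l| over a in R^q.

    The objective separates over coordinates; each coordinate is
    soft-thresholded at gamma / (2 |U_l|). A coordinate with U_l == 0 has
    an infinite penalty weight, so its estimate is 0.
    """
    u = np.asarray(u, dtype=float)
    abs_u = np.abs(u)
    alpha = np.zeros_like(u)
    nz = abs_u > 0
    shrunk = abs_u[nz] - gamma / (2.0 * abs_u[nz])
    alpha[nz] = np.sign(u[nz]) * np.maximum(shrunk, 0.0)
    return alpha

# Usage: U_0 = 0 (assumed), one update per standardized observation (q = 2).
u = np.zeros(2)
for eps in ([0.4, -0.1], [1.8, 0.2], [2.1, -0.3]):
    u = mewma_update(u, eps, lam=0.2)
    alphas = [adaptive_lasso_alpha(u, g) for g in (0.01, 0.05, 0.2)]  # hypothetical gamma_k grid
```

Larger $\gamma_k$ zero out more components, so the $q$ penalty values trade off sensitivity to shifts in few versus many variables.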

Performance Evaluation
Performance measures: the IC average run length ($ARL_0$) and the out-of-control (OC) average run length ($ARL_1$).

Performance Evaluation (cont'd)
If the observation times $\{t_j, j = 1, 2, \ldots\}$ are unequally spaced, $ARL_0$ and $ARL_1$ may not be appropriate, because run lengths count observations rather than elapsed time. Basic time unit $\omega$: the largest time unit such that all observation times are integer multiples of $\omega$. Define $n_j = t_j/\omega$ for $j = 0, 1, 2, \ldots$, where $n_0 = t_0 = 0$. Then $t_j = n_j\omega$ for all $j$.
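Finding $\omega$ is a greatest-common-divisor problem. A minimal Python sketch, assuming the observation times are supplied as integers or exact decimal strings (converted to `fractions.Fraction` to avoid floating-point error); the function names are hypothetical. For fractions in lowest terms, $\gcd(a/b, c/d) = \gcd(a, c)/\mathrm{lcm}(b, d)$.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def basic_time_unit(times):
    """Largest omega such that every observation time is an integer multiple of omega."""
    fracs = [f for f in map(Fraction, times) if f != 0]  # t_0 = 0 is a multiple of anything
    num = reduce(gcd, (f.numerator for f in fracs))
    den = reduce(lambda a, b: a * b // gcd(a, b), (f.denominator for f in fracs))  # lcm
    return Fraction(num, den)

def time_indices(times, omega):
    """n_j = t_j / omega, which are integers by construction of omega."""
    return [int(Fraction(t) / omega) for t in times]
```

For times 0, 0.5, 1.25, and 2, this gives $\omega = 1/4$ and $n_j = 0, 2, 5, 8$.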

Performance Evaluation (cont'd)
IC: if a signal is given at the $s$th observation time, then $E(n_s)$ measures the IC average time to signal (ATS), denoted $ATS_0$.
OC: if a shift occurs at the $\tau$th observation time and a signal is given at the $s$th observation time with $s \ge \tau$, then $E(n_s - n_\tau)$ is the OC ATS, denoted $ATS_1$.
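$ATS_0$ and $ATS_1$ are usually estimated by simulation. The Python sketch below is a hypothetical Monte Carlo setup, not the one used in the lecture: standardized observations are N(0, 1) while IC and N(δ, 1) from the τth observation on, the gaps $n_j - n_{j-1}$ are drawn uniformly from 1 to `max_gap`, and a conventional two-sided EWMA chart with illustrative λ and control limit `h` does the monitoring. Runs that signal before τ are treated as false alarms and are discarded when estimating $ATS_1$.

```python
import random

def run_ewma(lam, h, max_gap, rng, delta=0.0, tau=None, max_obs=10**6):
    """Simulate one monitored sequence and return (s, n).

    s is the index of the first observation with |EWMA| > h, and
    n[j] = t_j / omega with n[0] = 0. Observations are N(0, 1) before the
    tau-th time and N(delta, 1) from the tau-th time on (tau=None: always IC).
    """
    n = [0]
    ewma = 0.0
    for j in range(1, max_obs + 1):
        n.append(n[-1] + rng.randint(1, max_gap))  # unequally spaced times
        mean = delta if tau is not None and j >= tau else 0.0
        ewma = lam * rng.gauss(mean, 1.0) + (1.0 - lam) * ewma
        if abs(ewma) > h:
            return j, n
    raise RuntimeError("no signal within max_obs observations")

def estimate_ats(lam, h, max_gap, reps, seed=1, delta=0.0, tau=None):
    """Average n_s (IC, ATS_0) or n_s - n_tau (OC, ATS_1) over simulated runs."""
    rng = random.Random(seed)
    times = []
    while len(times) < reps:
        s, n = run_ewma(lam, h, max_gap, rng, delta, tau)
        if tau is None:
            times.append(n[s])
        elif s >= tau:
            times.append(n[s] - n[tau])
        # runs that signal before tau are false alarms and are discarded
    return sum(times) / len(times)
```

In practice `h` would be calibrated so that the estimated $ATS_0$ reaches a prespecified value before $ATS_1$ is compared across charts.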

SHARe Framingham Heart Study
m = 945 non-stroke patients (IC data) and 27 stroke patients (new subjects); each patient was followed 7 times (i.e., J = 7). Four medical indices: systolic blood pressure (mmHg), diastolic blood pressure (mmHg), total cholesterol level (mg/100ml), and glucose level (mg/100ml).

SHARe Framingham Heart Study (cont'd)
[Figure: charting statistic $Q_j$ versus observation index $j = 1, \ldots, 7$ for stroke patients 1 to 27.]

SHARe Framingham Heart Study (cont'd)
DySS approach: 26 of the 27 stroke patients received signals, and 131 of the 945 non-stroke patients received signals. The average signal time is 11.84 years.

Dynamic Curve Monitoring
Qiu, Zi, and Zou (2014). Model:
$$y(t_i) = \begin{cases} \mu(t_i) + \sigma(t_i)\varepsilon(t_i), & t_i \in [0,\tau],\\ \mu(t_i) + \sigma(t_i)g(t_i) + \sigma(t_i)\varepsilon(t_i), & t_i \in (\tau, T]. \end{cases}$$
After the transformation $\{y(t_i) - \mu(t_i)\}/\sigma(t_i)$, still denoted $y(t_i)$,
$$y(t_i) = \begin{cases} \varepsilon(t_i), & t_i \in [0,\tau],\\ g(t_i) + \varepsilon(t_i), & t_i \in (\tau, T]. \end{cases}$$
Hypotheses: $H_0: \tau > T$ versus $H_1: \tau \in [0, T]$.

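Test and Estimation of $g(t_i)$
Loss function, with weights $w_i(t_m) = (1-\lambda)^{t_m - t_i}$:
$$Q(t_m;\lambda) = \sum_{i=1}^{m} \{y(t_i) - g(t_i)\}^2 (1-\lambda)^{t_m - t_i}.$$
Estimate of $g(t_m)$:
$$\hat g_\lambda(t_m) = \arg\min_{a \in \mathbb{R}} \sum_{i=1}^{m} \{y(t_i) - a\}^2 (1-\lambda)^{t_m - t_i} = \frac{\sum_{i=1}^{m} w_i(t_m)\,y(t_i)}{\sum_{i=1}^{m} w_i(t_m)}.$$
Weighted sums of squares under the two hypotheses:
$$Q_{H_1}(t_m;\lambda) = \sum_{i=1}^{m} \{y(t_i) - \hat g(t_i)\}^2 w_i(t_m), \qquad Q_{H_0}(t_m;\lambda) = \sum_{i=1}^{m} y(t_i)^2\, w_i(t_m).$$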
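The estimator and the two weighted sums of squares can be computed directly from their definitions. A minimal Python sketch (quadratic cost, meant for checking rather than speed), assuming the observations have already been standardized; in $Q_{H_1}$, each $\hat g(t_i)$ is the estimate built from data up to $t_i$, which is what allows the recursive formulas for $W_\lambda$ to hold. The function names are hypothetical.

```python
def dewma_weights(times, m, lam):
    """w_i(t_m) = (1 - lam) ** (t_m - t_i) for i = 1, ..., m (0-based lists)."""
    return [(1.0 - lam) ** (times[m - 1] - times[i]) for i in range(m)]

def g_hat(times, y, m, lam):
    """g_hat_lambda(t_m): minimizer of sum_i {y(t_i) - a}^2 w_i(t_m), i.e. a weighted mean."""
    w = dewma_weights(times, m, lam)
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

def q_h0_h1(times, y, m, lam):
    """Q_H0(t_m; lam) and Q_H1(t_m; lam), with g_hat(t_i) built from data up to t_i."""
    w = dewma_weights(times, m, lam)
    g = [g_hat(times, y, i, lam) for i in range(1, m + 1)]
    q_h0 = sum(wi * yi ** 2 for wi, yi in zip(w, y))
    q_h1 = sum(wi * (yi - gi) ** 2 for wi, yi, gi in zip(w, y, g))
    return q_h0, q_h1
```

With $\lambda = 0$ all weights equal 1 and $\hat g_\lambda(t_m)$ is the ordinary mean of $y(t_1), \ldots, y(t_m)$.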

Test and Estimation of $g(t_i)$ (cont'd)
Weighted GLR test statistic (WGLR):
$$W_\lambda(t_m) = Q_{H_0}(t_m;\lambda) - Q_{H_1}(t_m;\lambda) = \sum_{i=1}^{m} w_i(t_m)\{2y(t_i) - \hat g(t_i)\}\hat g(t_i).$$
Recursive formulas:
$$W_\lambda(t_m) = w_{m-1}(t_m)\,W_\lambda(t_{m-1}) + \{2y(t_m) - \hat g(t_m)\}\hat g(t_m),$$
$$\hat g(t_m) = \{w_{m-1}(t_m)\,\alpha_{m-1}\,\hat g(t_{m-1}) + y(t_m)\}/\alpha_m,$$
where $\alpha_m = \sum_{i=1}^{m} w_i(t_m) = w_{m-1}(t_m)\,\alpha_{m-1} + 1$.
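The recursions can be checked numerically against the direct definition. Both follow from $w_i(t_m) = w_{m-1}(t_m)\,w_i(t_{m-1})$ and $w_m(t_m) = 1$, which is also why the $\hat g$ update carries the factor $w_{m-1}(t_m)$. In this Python sketch (function names hypothetical), `wglr_recursive` updates $\alpha_m$, $\hat g(t_m)$, and $W_\lambda(t_m)$ in constant time per new observation, while `wglr_direct` evaluates the sum for $W_\lambda(t_m)$ from scratch.

```python
def wglr_recursive(times, y, lam):
    """Return lists (W, g) with W[k] = W_lambda(t_{k+1}) and g[k] = g_hat(t_{k+1})."""
    W, g = [], []
    alpha_prev = g_prev = w_prev = 0.0
    for m, (t, y_m) in enumerate(zip(times, y)):
        # w_{m-1}(t_m) = (1 - lam)^(t_m - t_{m-1}); there is no history at the first time
        decay = (1.0 - lam) ** (t - times[m - 1]) if m > 0 else 0.0
        alpha = decay * alpha_prev + 1.0                      # alpha_m
        g_m = (decay * alpha_prev * g_prev + y_m) / alpha     # g_hat(t_m)
        w_m = decay * w_prev + (2.0 * y_m - g_m) * g_m        # W_lambda(t_m)
        W.append(w_m)
        g.append(g_m)
        alpha_prev, g_prev, w_prev = alpha, g_m, w_m
    return W, g

def wglr_direct(times, y, m, lam):
    """W_lambda(t_m) = sum_i w_i(t_m) {2 y(t_i) - g_hat(t_i)} g_hat(t_i), from the definition."""
    def w(i, k):  # w_i(t_k) with 0-based indices
        return (1.0 - lam) ** (times[k] - times[i])

    def g_at(k):  # g_hat(t_k) from y(t_1), ..., y(t_k)
        return (sum(w(i, k) * y[i] for i in range(k + 1))
                / sum(w(i, k) for i in range(k + 1)))

    k = m - 1
    return sum(w(i, k) * (2.0 * y[i] - g_at(i)) * g_at(i) for i in range(m))
```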

Test of $g(t_i)$ (cont'd)
Dynamic EWMA (DEWMA) chart: signal when
$$W^*_\lambda(t_m) = \frac{W_\lambda(t_m) - E_\lambda(t_m)}{\sqrt{V_\lambda(t_m)}} > L,$$
where $E_\lambda(t_m)$ and $V_\lambda(t_m)$ are the IC mean and variance of $W_\lambda(t_m)$. When the observation times are equally spaced, DEWMA reduces to the conventional EWMA chart.
Benefits: it accommodates unequally spaced observation times through the weights $(1-\lambda)^{t_m - t_i}$, and $W^*_\lambda(t_m)$ is robust when the values of $g(t)$ change substantially over time.
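The slide does not say how $E_\lambda(t_m)$ and $V_\lambda(t_m)$ are obtained. One simple option, sketched below in Python as an assumption rather than the lecture's procedure, is Monte Carlo: simulate many IC sequences with $y(t_i)$ i.i.d. N(0, 1) at the given observation times, compute $W_\lambda(t_m)$ by the recursive formulas, and take the mean and variance at each $t_m$. The function names and the number of replications are hypothetical.

```python
import random
import statistics

def wglr_path(times, y, lam):
    """W_lambda(t_1), ..., W_lambda(t_m) computed with the recursive formulas."""
    path = []
    alpha_prev = g_prev = w_stat = 0.0
    for m, (t, y_m) in enumerate(zip(times, y)):
        decay = (1.0 - lam) ** (t - times[m - 1]) if m > 0 else 0.0  # w_{m-1}(t_m)
        alpha = decay * alpha_prev + 1.0
        g_m = (decay * alpha_prev * g_prev + y_m) / alpha
        w_stat = decay * w_stat + (2.0 * y_m - g_m) * g_m
        alpha_prev, g_prev = alpha, g_m
        path.append(w_stat)
    return path

def null_moments(times, lam, reps=2000, seed=0):
    """Monte Carlo E_lambda(t_m), V_lambda(t_m), assuming y(t_i) iid N(0, 1) under H0."""
    rng = random.Random(seed)
    paths = [wglr_path(times, [rng.gauss(0.0, 1.0) for _ in times], lam)
             for _ in range(reps)]
    columns = list(zip(*paths))  # one column per observation time t_m
    return ([statistics.fmean(c) for c in columns],
            [statistics.variance(c) for c in columns])

def dewma_statistic(times, y, lam, e_null, v_null):
    """W*_lambda(t_m) = {W_lambda(t_m) - E_lambda(t_m)} / sqrt(V_lambda(t_m))."""
    return [(w - e) / v ** 0.5
            for w, e, v in zip(wglr_path(times, y, lam), e_null, v_null)]
```

The chart would then signal at the first $t_m$ where the standardized value exceeds $L$; because the moments are computed at the subject's own observation times, unequal spacing is handled automatically.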

Simulation Results
$t \in [0, 1000]$, $d = 1$.
IC model: $\mu(t) = 1 + 0.3t^{1/2}$, $\sigma^2(t) = \mu^2(t)$.
OC models:
(I) Step shift: $g(t) = \delta$ for $t > \tau$;
(II) Quadratic drift: $g(t) = (t-\tau)^2\delta$;
(III) Sine drift: $g(t) = \sin(0.003\pi(t-\tau))\,\delta$.

[Figure: panels (a) to (c) show simulated observations Y versus t under Models (I), (II), and (III) for several values of δ (δ = 0, 0.05, 0.1 for Models I and III; δ = 0, 0.5, 1 for Model II); panels (d) to (f) compare the true function with its local linear and EWMA estimates over time.]

[Figure: log(ATS) versus δ for DEWMA (λ = 0.05, 0.2) and conventional EWMA (λ = 0.05, 0.2) charts under Models I to III, with equally spaced observation times (panels a to c) and random observation times (panels d to f).]

Future Research
Autocorrelation; nonparametric charts; accommodation of covariates.