Nonparametric predictive inference with parametric copulas for combining bivariate diagnostic tests

Similar documents
Three-group ROC predictive analysis for ordinal outcomes

Semi-parametric predictive inference for bivariate data using copulas

Nonparametric predictive inference for ordinal data

Durham E-Theses. Predictive Inference with Copulas for Bivariate Data MUHAMMAD, NORYANTI

System reliability using the survival signature

The structure function for system reliability as predictive (imprecise) probability

NONPARAMETRIC PREDICTIVE INFERENCE FOR REPRODUCIBILITY OF TWO BASIC TESTS BASED ON ORDER STATISTICS

Estimating Optimum Linear Combination of Multiple Correlated Diagnostic Tests at a Fixed Specificity with Receiver Operating Characteristic Curves

Introduction to Imprecise Probability and Imprecise Statistical Methods

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

Lehmann Family of ROC Curves

Nonparametric Predictive Inference (An Introduction)

Non parametric ROC summary statistics

Machine Learning Linear Classification. Prof. Matteo Matteucci

Statistics: Learning models from data

COS513 LECTURE 8 STATISTICAL CONCEPTS

Modelling Receiver Operating Characteristic Curves Using Gaussian Mixtures

Statistical and Learning Techniques in Computer Vision Lecture 2: Maximum Likelihood and Bayesian Estimation Jens Rittscher and Chuck Stewart

Bayesian Decision Theory

Bayesian Reliability Demonstration

STAT 499/962 Topics in Statistics Bayesian Inference and Decision Theory Jan 2018, Handout 01

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008

Statistical Methods for Particle Physics Lecture 1: parameter estimation, statistical tests

Bayesian Inference. Introduction

Estimation of Optimally-Combined-Biomarker Accuracy in the Absence of a Gold-Standard Reference Test

Unsupervised Learning with Permuted Data

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016

University of California, Berkeley

CS 540: Machine Learning Lecture 2: Review of Probability & Statistics

Probability and Information Theory. Sargur N. Srihari

Topic 12 Overview of Estimation

Decision theory. 1 We may also consider randomized decision rules, where δ maps observed data D to a probability distribution over

Uniformly Most Powerful Bayesian Tests and Standards for Statistical Evidence

MODULE -4 BAYEIAN LEARNING

Bayesian decision theory. Nuno Vasconcelos ECE Department, UCSD

Carl N. Morris. University of Texas

Semiparametric Inferential Procedures for Comparing Multivariate ROC Curves with Interaction Terms

Naïve Bayes classification

Introduction to Signal Detection and Classification. Phani Chavali

Sample Size Calculations for ROC Studies: Parametric Robustness and Bayesian Nonparametrics

Estimation of Copula Models with Discrete Margins (via Bayesian Data Augmentation) Michael S. Smith

Robustness of a semiparametric estimator of a copula

Parametric Techniques Lecture 3

Discriminant Analysis with High Dimensional. von Mises-Fisher distribution and

Introduction to Machine Learning. Lecture 2

Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016

Ridge regression. Patrick Breheny. February 8. Penalized regression Ridge regression Bayesian interpretation

A Parametric ROC Model Based Approach for Evaluating the Predictiveness of Continuous Markers in Case-control Studies

Should all Machine Learning be Bayesian? Should all Bayesian models be non-parametric?

6.867 Machine Learning

Robustifying Trial-Derived Treatment Rules to a Target Population

Introduction. Chapter 1

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness

Parametric Techniques

FDR and ROC: Similarities, Assumptions, and Decisions

Lecture 21: October 19

Bayesian Linear Regression

Building a Prognostic Biomarker

Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood

DS-GA 1002 Lecture notes 11 Fall Bayesian statistics

The Bayesian Approach to Multi-equation Econometric Model Estimation

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014

Local Likelihood Bayesian Cluster Modeling for small area health data. Andrew Lawson Arnold School of Public Health University of South Carolina

Naïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability

Biomarker Evaluation Using the Controls as a Reference Population

Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project

Sparse Linear Models (10/7/13)

Introduction to Gaussian Processes

(1) Introduction to Bayesian statistics

BAYESIAN DECISION THEORY

Bios 6649: Clinical Trials - Statistical Design and Monitoring

Diagnostics. Gad Kimmel

Performance Evaluation and Comparison

Machine Learning Lecture 2

JACKKNIFE EMPIRICAL LIKELIHOOD BASED CONFIDENCE INTERVALS FOR PARTIAL AREAS UNDER ROC CURVES

Memorial Sloan-Kettering Cancer Center

COMP90051 Statistical Machine Learning

Sequential Monitoring of Clinical Trials Session 4 - Bayesian Evaluation of Group Sequential Designs

Statistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation

Group Sequential Tests for Delayed Responses. Christopher Jennison. Lisa Hampson. Workshop on Special Topics on Sequential Methodology

Gene Expression Data Classification with Revised Kernel Partial Least Squares Algorithm

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =

On the errors introduced by the naive Bayes independence assumption

Uncertain Inference and Artificial Intelligence

Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units

Calibration Estimation of Semiparametric Copula Models with Data Missing at Random

A Search and Jump Algorithm for Markov Chain Monte Carlo Sampling. Christopher Jennison. Adriana Ibrahim. Seminar at University of Kuwait

Classification. Classification is similar to regression in that the goal is to use covariates to predict on outcome.

The Volume of Bitnets

MINIMUM EXPECTED RISK PROBABILITY ESTIMATES FOR NONPARAMETRIC NEIGHBORHOOD CLASSIFIERS. Maya Gupta, Luca Cazzanti, and Santosh Srivastava

Structural Uncertainty in Health Economic Decision Models

ECE521 week 3: 23/26 January 2017

Part III Measures of Classification Accuracy for the Prediction of Survival Times

Principles of Statistical Inference

Principles of Statistical Inference

David Hughes. Flexible Discriminant Analysis Using. Multivariate Mixed Models. D. Hughes. Motivation MGLMM. Discriminant. Analysis.

Previous lecture. P-value based combination. Fixed vs random effects models. Meta vs. pooled- analysis. New random effects testing.

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016

CSC314 / CSC763 Introduction to Machine Learning

Concentration-based Delta Check for Laboratory Error Detection

Transcription:

Nonparametric predictive inference with parametric copulas for combining bivariate diagnostic tests Noryanti Muhammad, Universiti Malaysia Pahang, Malaysia, noryanti@ump.edu.my Tahani Coolen-Maturi, Durham University, UK, tahani.maturi@durham.ac.uk Frank P.A. Coolen, Durham University, UK, frank.coolen@durham.ac.uk Abstract. Measuring the accuracy of diagnostic tests is crucial in many application areas including medicine, machine learning and credit scoring. The receiver operating characteristic (ROC) curve is a useful tool to assess the ability of a diagnostic test to discriminate among two classes or groups. In practice, multiple diagnostic tests or biomarkers may be combined to improve diagnostic accuracy, e.g. by maximizing the area under the ROC curve. In this paper we present Nonparametric Predictive Inference (NPI) for best linear combination of two biomarkers, where the dependence of the two biomarkers is modelled using parametric copulas. NPI is a frequentist statistical method that is explicitly aimed at using few modelling assumptions, enabled through the use of lower and upper probabilities to quantify uncertainty. The combination of NPI for the individual biomarkers, combined with a basic parametric copula to take dependence into account, has good robustness properties and leads to quite straightforward computation. We briefly comment on the results of a simulation study to investigate the performance of the proposed method in comparison to the empirical method. An example with data from the literature is provided to illustrate the proposed method, and related research problems are briefly discussed. Keywords. Bivariate diagnostic tests; copulas; diagnostic accuracy; lower and upper probabilities; nonparametric predictive inference; ROC curve. 1 Introduction Measuring the accuracy of diagnostics tests is crucial in many application areas including medicine and health care. The Receiver Operating Characteristic (ROC) curve is a popular statistical tool for describing the performance of diagnostic tests. The area under the ROC curve (AUC) is often used as a measure of the overall performance of the diagnostics test [18].

2 However, one diagnostic test may not be enough to draw a useful conclusion, thus in practice multiple diagnostic tests or biomarkers may be combined to improve diagnostic accuracy [19]. There are several approaches in the literature which aim to find the best linear combination of biomarkers in order to maximize the area under the ROC curve, and thus to improve the diagnostic accuracy, see e.g. [13, 15, 17, 19]. These approaches either assume an underlying distribution or focus on estimation. In this paper, we instead use nonparametric predictive inference (NPI) for this setting with two biomarkers. NPI [1, 4] is a frequentist statistics method which explicitly considers prediction, which is an attractive alternative to the classical estimation perspectives in this context as one is mainly interested in the performance of a diagnostic test for a future patient or healthy person to whom the test is applied. We first present the basic concepts and notation used in this paper. Let D be a binary variable describing the disease status, i.e. D = 1 for disease and D = 0 for non-disease. Suppose that the result of a diagnostic test, X, is a continuous random quantity, such that large values of X are considered more indicative of disease. We assume that we have diagnostic test results for two groups, the first consisting of people known to have the disease, often referred to as patients or cases, and the second consisting of people known not to have the disease, referred to as non-patients or controls. Test results for members of these groups are denoted with superscripts corresponding to the value of the disease status D, so X 1 for the disease group and X 0 for the non-disease group. The Receiver Operating Characteristic (ROC) curve is defined through the combination of False Positive Fraction (FPF) and True Positive Fraction (TPF) over all values of the threshold c, i.e. ROC = {(FPF(c), TPF(c)), c (, )}, where FPF(c) = P (X 0 > c D = 0) and TPF(c) = P (X 1 > c D = 1). An ideal test completely separates the patients with and without the disease for a threshold c, i.e. FPF(c) = 0 and TPF(c) = 1. A useless or uninformative test fails to distinguish between patients and nonpatients for all thresholds c, which would be reflected by FPF(c) = TPF(c) for all thresholds c [18]. In many cases, a single numerical value or summary may be useful to represent the accuracy of a diagnostic test or to compare two or more ROC curves [18]. A useful summary is the area under the ROC curve, AUC. The AUC measures the overall performance of the diagnostic test. Higher AUC values indicate more accurate tests, with AUC = 1 for ideal tests and AUC = 0.5 for uninformative tests. The AUC is equal to the probability that the test results from a randomly selected pair of diseased and non-diseased subjects are correctly ordered [18], i.e. AUC = P [ X 1 > X 0]. To estimate the ROC curve for diagnostic tests with continuous results, the nonparametric empirical method is popular due to its flexibility to adapt fully to the available data. This method yields the empirical ROC curve, which will be considered in this paper for comparison to the NPI method which we introduce. Suppose that we have test data on n 1 individuals from a disease group and n 0 individuals from a non-disease group, denoted by {x 1 i, i = 1,..., n 1} and {x 0 j, j = 1,..., n 0}, respectively. Throughout this paper we assume that the two groups are fully independent, meaning that no information about any aspect related to one group contains information about any aspect of the other group. For the empirical method, these observations per group are assumed to be realisations of random quantities that are identically distributed as X 1 and X 0, for the disease {( and non-disease ) groups, respectively. } The empirical estimator of the ROC is ROC = FPF(c), TPF(c), c (, ) with TPF(c) = 1 n1 n 1 1 { x 1 i > c} and FPF(c) = 1 { n0 n 0 j=1 1 x 0 j }, > c where 1{A} is the indicator function

2. NPI FOR ROC ANALYSIS 3 which is equal to 1 if A is true and 0 else. The empirical estimator of the AUC is given by ÂUC = 1 n0 } n1 n 1 n 0 j=1 {x 1 1 i > x0 j [18]. In this paper we consider the linear combination of results of two diagnostic tests applied to the individuals from the disease and non-disease groups. Let X D and Y D be continuous random quantities representing the results of the two diagnostic tests for disease group D = 0, 1. Consider a weighted average of the two test results, T D (X D, Y D ) = αx D + (1 α)y D, where α [0, 1] and the coefficient α is chosen to maximize the AUC associated with the composite score T D, where the ROC curve is used with the combined test score T 1 for all patients and T 0 for all non-patients [19]. Suppose that we have two test results for each of the n 1 individuals from a disease group and n 0 individuals from a non-disease group, we denote these observations by { (x 1 i, y1 i ), i = 1,..., n } { } 1 and (x 0 j, y0 j ), j = 1,..., n 0, respectively. Consider a weighted average of the test results, t 1 i = αx 1 i + (1 α)y1 i for the disease group and t0 j = αx0 j + (1 α)y0 j for the non-disease group, where α [0, 1]. The empirical estimator of the ROC curve corresponding to the combined test score is {( ) } ROC = FPF(c), TPF(c), c (, ) (1) with TPF(c) = 1 n 1 1{t 1 1 i > c}, FPF(c) = n 1 n 0 n 0 j=1 1{t 0 j > c} The AUC associated with T D (X D, Y D ) is [19] ÂUC = 1 n 1 n 0 n 0 n 1 j=1 1 { αx 1 i + (1 α)yi 1 > αx 0 j + (1 α)yj 0 } (2) It is best to use that value of α which maximizes the AUC in Equation (2), we denote this by ˆα. This paper is organized as follows. Section 2 provides a brief introduction to NPI for ROC analysis with a single biomarker. Section 3 reviews the combination of NPI for marginals with a parametric copula [6] and applies this method to the bivariate diagnostic test setting introduced above. Section 4 briefly discusses initial insights from a simulation study investigating the performance of our method, followed by an example of the application of our method to data from the literature. Some concluding remarks are provided in Section 5. 2 NPI for ROC analysis In this paper we present Nonparametric Predictive Inference (NPI) for best linear combination of two biomarkers, where the dependence of the two biomarkers is modelled using parametric copulas. NPI is a frequentist statistical method that is explicitly aimed at using few modelling assumptions, enabled through the use of lower and upper probabilities to quantify uncertainty. NPI is based on the assumption A (n), proposed by Hill [12], which gives a direct conditional probability for a future real-valued random quantity, conditional on observed values of n related random quantities. Effectively, A (n) implies that the rank of the future observation among the observed values is equally likely to have each possible value 1,..., n + 1. Hence, this assumption is that the next observation has probability 1/(n + 1) to be in each interval of the partition of

4 the real line as created by the n observations. We assume here, for ease of presentation, that there are no tied observations (these can be dealt with by assuming that such observations differ by a very small amount, a common method to break ties in statistics). Inferences based on A (n) are predictive and nonparametric, and can be considered suitable if there is hardly any knowledge about the random quantity of interest, other than the n observations, or if one does not want to use any such further information in order to derive at inferences that are strongly based on the data. The assumption A (n) is not sufficient to derive precise probabilities for many events of interest, but it provides bounds for probabilities via the fundamental theorem of probability [10], which are lower and upper probabilities [1, 2]. Augustin and Coolen [1] proved that NPI has attractive inferential properties, it is also exactly calibrated from frequentist statistics perspective [14], which allows interpretation of the NPI lower and upper probabilities as bounds on the long-term ratio with which the event of interest occurs upon repeated application of this statistical procedure. NPI has been presented for assessing the accuracy of a classifier s ability to discriminate between two groups for binary data [7], for diagnostic tests with ordinal observations [11] and with real-valued observations [8]. NPI has also been presented for three-group ROC surfaces, with real-valued observations [9] and with ordinal observations [5], to assess the ability of a diagnostic test to discriminate among three ordered classes or groups. We briefly introduce NPI for diagnostic accuracy, following Coolen-Maturi et al. [8]. The NPI method is different from the nonparametric empirical method as it is explicitly predictive, considering a single next future observation given the past observations, instead of aiming at estimation for an entire assumed underlying population. In NPI uncertainty is quantified by lower and upper probabilities for events of interest. The NPI lower and upper ROC curves, and the corresponding lower and upper AUC, have been derived by Coolen-Maturi et al. [8], corresponding to the assumptions A (n1 ) for the disease group and A (n0 ) for the non-disease group, where the inferences consider one future patient from each group. Suppose that Xi 1, i = 1,..., n 1, n 1 + 1, are continuous and exchangeable random quantities from the disease group and Xj 0, j = 1,..., n 0, n 0 + 1 are continuous and exchangeable random quantities from the non-disease group, where Xn 1 and X0 n are the next observations from the disease and non-disease groups following n 1 and n 0 observations, respectively. As mentioned before, we assume that both groups are fully independent. Let x 1 1 <... < x1 n 1 be the ordered observed values for the first n 1 individuals from the disease group and x 0 1 <... < x0 n 0 the ordered observed values for the first n 0 individuals from the non-disease group. For ease of notation, let x 1 0 = x0 0 = and x1 n = x0 n =. We assume that there are no ties in the data. The NPI lower and upper ROC curves are [8] ROC = {( FPF(c), TPF(c) ), c (, ) } (3) ROC = {( FPF(c), TPF(c) ), c (, ) } (4)

3. NPI WITH PARAMETRIC COPULA FOR BIVARIATE DIAGNOSTIC TESTS 5 where T P F (c) =P (Xn 1 > c) = T P F (c) =P (Xn 1 > c) = F P F (c) =P (Xn 0 > c) = F P F (c) =P (Xn 0 > c) = n1 1 { x 1 i > c} n 1 + 1 n1 1 { x 1 i > c} + 1 n 1 + 1 n0 j=1 1 { x 0 j > c } n 0 + 1 { } n0 j=1 1 x 0 j > c + 1 n 0 + 1 (5) (6) (7) (8) where P and P are NPI lower and upper probabilities [1]. It is easily seen that FPF(c) FPF(c) FPF(c) and TPF(c) TPF(c) TPF(c) for all c, which implies that the empirical ROC curve is bounded by the NPI lower and upper ROC curves [8]. The NPI lower and upper AUC, which are the areas under the NPI lower and upper ROC curves given in (3) and (4), respectively, are also equal to the NPI lower and upper probabilities for the event that the test result for the next individual from the disease group is greater than the test result for the next individual from the non-disease group, and given by [8] AUC =P ( Xn 1 > Xn 0 ) 1 = (n 1 + 1)(n 0 + 1) AUC =P ( Xn 1 > Xn 0 ) 1 = (n 1 + 1)(n 0 + 1) n 0 n 1 j=1 1 { x 1 i > x 0 } j (9) n 0 n 1 1 { x 1 i > x 0 j} + n1 + n 0 + 1 (10) j=1 It is interesting to notice that the imprecision in these lower and upper AUCs, AUC AUC = n 1 +n (n )(n ), depends only on the sample sizes n 0 and n 1. 3 NPI with parametric copula for bivariate diagnostic tests In this section we present NPI for the weighted average of the two diagnostic tests to optimize the diagnostic accuracy with consideration of the dependence structure through the use of a parametric copula. Taking into account the dependence between two diagnostic test results for the same person is important when considering the combination of the bivariate test results, as it can influence the accuracy of detection of diseases [3]. We use the recently introduced predictive inference method for bivariate data which consists of NPI for each of the marginals in combination with a parametric copula, where the parameter is estimated using the available data [6]. This is a relatively straightforward method for prediction of a bivariate random quantity, where imprecision resulting from the use of NPI for the marginals provides robustness with regard to the assumed copula for small sample sizes. We introduce this method here immediately with application to the linear combination of two diagnostic test results, for more details on this predictive method we refer to Coolen-Maturi et al. [6]. Consider a bivariate random quantity of diagnostic test results, (X, Y ), let (Xn D D +1, Y n D D +1 ) be the next future bivariate random quantity of diagnostic test results and let Tn D D +1 = αxd n D +1 +

6 (1 α)yn D D +1, with α [0, 1], be the weighted average of the two test results for a future person from the group with disease status D. For the disease group, the lower probability for the event that the sum of the next future observations will exceed a particular threshold ξ is S 1 c(t) = P (Tn 1 > ξ) = h 1 il ( θ 1 ) (11) (i,l) L 1 t with L 1 t = {(i, l) : αx 1 i 1 + (1 α)y1 l 1 > ξ}, and the corresponding upper probability is S 1 c(t) = P (Tn 1 > ξ) = h 1 il ( θ 1 ) (12) (i,l) U 1 t with Ut 1 = {(i, l) : αx 1 i + (1 α)y1 l > ξ}, where ξ (, ) and S 1 c(t) and S 1 c(t) are the lower and upper survival functions for the sum of the next future observations, Tn 1, where dependence is taken into account as explained next, and a subscript c is added to key notations throughout this chapter to emphasize the use of an assumed copula. The quantities h 1 il ( θ 1 ) are crucial to this method, as they take the dependence between the two diagnostic test meausurements for the future person from the disease group into account, as estimated based on the available data for this group with an assumed parametric copula. They are defined as h 1 il ( θ 1 ) = P c ( X 1 n ( i 1 n 1 + 1, ) ( i l 1, n 1 + 1 Ỹ n 1 n 1 + 1, ) l n 1 + 1 θ 1 ) (13) for i, l = 1, 2,..., n, where P c ( ˆθ 1 ) represents the copula-based probability using a parametric copula, where ˆθ 1 is the estimated parameter value for this copula for the disease group. This parameter can be multi-dimensional, but we will restrict attention later to widely used symmetric parametric copulas with a single parameter. If one uses a multi-dimensional parameter, the computational aspects of this method remain quite straightforward with only the estimation of the parameter possibly becoming more complicated. The random quantities X n 1 and Ỹ n 1 are both uniformly distribution on [0, 1] and they are dependent, with the dependence modelled through the assumed copula. Note that X n 1 is related to the random quantity X1 n through a transformation based on the assumption A (n1 ) for Xn 1, as explained in detail by Coolen-Maturi et al. [6]. Similarly, for the non-disease group, the lower probability for the event that the sum of the next future observations will exceed a particular threshold ξ is S 0 c(t) = P (Tn 0 > ξ) = h 0 jk ( θ 0 ) (14) (j,k) L 0 t with L 0 t = {(j, k) : αx 0 j 1 + (1 α)y0 k 1 > ξ}, and the corresponding upper probability is S 0 c(t) = P (T 0 n > ξ) = (j,k) U 0 t h 0 jk ( θ 0 ) (15) with Ut 0 = {(j, k) : αx 0 j + (1 α)y0 k > ξ}, where ξ (, ) and S0 c(t) and S 0 c(t) are the lower and upper survival functions for the sum of the next future observation, Tn 0. The probabilities

3. NPI WITH PARAMETRIC COPULA FOR BIVARIATE DIAGNOSTIC TESTS 7 h 0 jk ( θ 0 ) are defined as h 0 jk ( θ 0 ) = P c ( X 0 n ( j 1 n 0 + 1, ) ( j k 1, n 0 + 1 Ỹ n 0 n 0 + 1, ) k n 0 + 1 θ 0 ) (16) for j, k = 1, 2,..., n 0 + 1 where P c ( ˆθ 0 ) represents the copula-based probability with estimated parameter ˆθ 0 for the non-disease group. The method used above combines NPI for the marginals with an assumed parametric copula, with its parameter estimated on the basis of available data [6, 16]. Effectively it requires computation of the (n 1 + 1) 2 joint probabilities h 1 il for the disease group, and the (n 0 + 1) 2 joint probabilities h 0 jk for the non-disease group, for the assumed parametric copula with estimated parameter value θ 1 and θ 0, respectively; these are straightforward to compute for commonly used parametric copulas. The parameters can be estimated straightforwardly as well, using any suitable estimation method, e.g. maximum likelihood estimation [6]. Note that this method is computationally far easier than the standard method with copula-based models, as through the use of NPI for the marginals there is no need to simultaneously estimate the marginals and the copula. In the approach outlined above, the transformation of the marginals is included through the sets L 1 t, Ut 1, L 0 t and Ut 0 in Equations (11), (12), (14) and (15), respectively. The use of these sets also leads to additional imprecision in the inferences, which provides additional robustness for the choice of the specific copula. The NPI lower and upper survival functions from Equations (11), (12), (14) and (15) are used to derive lower and upper FPF and TPF for the weighted average of the next future observations per group, for different threshold values ξ, and these are combined to derive the corresponding NPI lower and upper ROC curves. This leads to the following optimal bounds for the TPF and FPF when considering the dependence structure with the assumed parametric copula, TPF c (ξ) = S 1 c(ξ) = P (T 1 n > ξ) = TPF c (ξ) = S 1 c(ξ) = P (T 1 n > ξ) = (i,l) L 1 t h 1 il ( θ 1 ) (17) h 1 il ( θ 1 ) (18) FPF c (ξ) = S 0 c(ξ) = P (T 0 n > ξ) = (i,l) U 1 t h 0 jk ( θ 0 ) (19) FPF c (ξ) = S 0 c(ξ) = P (T 0 n > ξ) = (j,k) L 0 t (j,k) U 0 t h 0 jk ( θ 0 ) (20) The lower and upper ROC curves are again defined to be the optimal bounds for all such curves corresponding to any pair of survival functions Sc 1 (t) and Sc 0 (t) for Tn 1 and T n 0 in between their respective lower and upper survival functions, as given by Equations (17) - (20). The ROC curve with copula clearly depends monotonously on the survival functions, it is easily seen that the optimal bounds based on our method, which are lower and upper ROC curves, are ROC c = {( F P F c (ξ), T P F c (ξ) ), ξ (, ) } (21) ROC c = {( F P F c (ξ), T P F c (ξ) ), ξ (, ) } (22)

8 In order to optimize the diagnostic accuracy of the weighted average of the two future diagnostic test results, we maximize the area under either the lower or the upper ROC curve, by finding the value of α such that Tn D D +1 = αxd n D +1 +(1 α)y n D D +1 maximizes the respective AUC. These lower and upper AUCs are derived as follows. For each block Bil 1 = (x1 i 1, x1 i ) (y1 l 1, y1 l ), generated by the observed data, let t 1 i 1,l 1 = αx1 i 1 + (1 α)y1 l 1 be the combined weighted value corresponding to the left-bottom corner of the block, and t 1 i,l = αx 1 i + (1 α)y1 l the combined weighted value corresponding to the right-top corner of the block. The same can be defined for each block Bjk 0 = (x0 j 1, x0 j ) (y0 k 1, y0 k ), leading to t0 j 1,k 1 = αx0 j 1 + (1 α)y0 k 1 and t 0 j,k = αx0 j + (1 α)y0 k. In line with Equations (11) - (16), the lower AUC and upper AUC associated with the weighted average for the bivariate diagnostic test results with parametric copula can directly be defined as AUC c = P (Tn 1 > Tn 0 ) = h 1 il ( ˆθ 1 ) 1{t 0 j,k < t1 i 1,l 1 }h0 jk ( ˆθ 0 ) (23) AUC c = P (T 1 n > T 0 n ) = j=1 h 1 il ( ˆθ 1 ) n j=1 k=1 k=1 1{t 0 j 1,k 1 < t1 i,l }h0 jk ( ˆθ 0 ) (24) The derivations of these lower and upper probabilities, so the second equality in each of Equations (23) and (24), are given in the Appendix. The arguments that these lower and upper probabilities are indeed equal to the areas under the lower and upper ROCs, as given in Equations (21) and (22), are identical to the arguments for the corresponding equalities in Equations (9) and (10) as explained by Coolen-Maturi et al. [8]) and in the PhD thesis of Muhammed [16]. These lower and upper AUCs are quite easy to maximize by searching over α [0, 1], we denote the value of α which maximizes AUC by ˆα c L, and the value of α which maximizes AUC by ˆαc U. 4 Simulation and example We performed a simulation study to investigate the performance of the proposed method against the empirical method, the details of the simulation study are presented in the PhD thesis by Muhammed [16]. It should be noted that these simulations took very substantial computation time, because of the need to find maximum values for the linear combination coefficient α in every run. Compared to this, the computation time required for the use of the bivariate inference method described in Section 3 was neglectable. We considered several cases, always sampling 10,000 runs in each case and using bivariate Normal distributions for all (X, Y ), with variances 1 for both X and Y and correlation 0.5. We particularly wished to investigate if the new method with NPI for the marginals and an assumed parametric copula, with the parameter estimated using the data, would provide different weights α compared to the empirical method. We applied four commonly used parametric copulas, namely the Clayton, Frank, Gumbel and Normal copulas [16], the results were very similar for all of these. The results of the simulations showed that our new method tends to give slightly larger weight to the variable X or Y for which difference in the the means of the Normal distributions for the disease and non-disease groups is larger. This is a promising feature for the method, although the actual difference in the optimal weights was relatively small and tended to disappear with increasing sample sizes. Overall, one could say that the simulation study showed

4. SIMULATION AND EXAMPLE 9 slightly better performance of our method for small sample sizes (up to about 30 for each group) but that there was no noticeable difference between this method and the empirical method for larger sample sizes. One reason for the perhaps somewhat surprising fact that our method, taking into account dependence between the X and Y variables, did not perform substantially better than the empirical method is likely to be the fact that we only considered data from bivariate Normal distributions, hence with a linear dependence between the X and Y variables. Since the weighted linear combination of the X and Y measurements therefore can take this model dependence into account, there would be less benefit expected from the method taking dependence into account, in particular also because we only used symmetric parametric copulas. For a clearer perspective of our general method, it would be necessary to study its performance with different, that is non-linear, underlying dependence structures. However, this would only be really successful if there is good topic knowledge available which would guide the choice of parametric copula. We are more hopeful on developing our method further by the use of nonparametric copulas instead of a parametric copula. This will give far more flexibility on dependence structure that can be taken into account, which will be particularly useful in practical scenarios with possibly complicated, yet mostly unknown, non-linear dependence between the X and Y variables. This provides a challenging direction for future research. The PhD thesis by Muhammed [16] contains a first attempt towards combining NPI for the marginals with a nonparametric, kernel-based, copula, but this work requires further development before it can be applied to combination of multiple diagnostic tests. As an example of the application of our method, we use a data set from the literature resulting from a study of 90 pancreatic cancer patients and 51 control patients with pancreatitis [20]. Two serum markers were measured on these patients, the carbohydrate antigen CA19-9 (biomarker X in our method) and the cancer antigen CA125 (biomarker Y ). The marker values were transformed to a natural logarithmic scale and are displayed in Figure 1. To make sure that our biomarkers tests results are comparable we used standardized values (i.e. with mean 0 and variance 1) after the natural logarithmic transformation as inputs to our method. Objectives of this example were to see if a weighted average of these two biomarkers could provide a better diagnostic quantity than the individual biomarkers, and to see if the choice of parametric copula in our method would make a substantial difference. Using the individual biomarkers only, and the empirical ROC method, biomarker CA19-9 leads to empirical AUC equal to 0.8614 and biomarker CA125 to empirical AUC equal to 0.7056. Using Equations (9) and (10), the NPI lower and upper AUC values using only biomarker CA19-9 are 0.8347 and 0.8648, respectively, and when using only biomarker CA125 these values are 0.6883 and 0.7130, respectively. We applied our method with the Clayton, Frank, Gumbel and Normal copulas. The optimal lower and upper AUC values for our method, with corresponding values for α, are shown in Table 1. These results show that there is very little difference in our method for the four different parametric copulas considered. In terms of the AUC value, the Clayton copula performs best but differences are mainly only in the third decimal. These lower and upper AUC values also all bound the AUC value 0.8614 achieved by using only the best single biomarker CA19-9. For all copulas the performance is, of course, substantially better than for the worst single biomarker CA125, but we aim at using a weighted combination in order to improve on the best individual biomarker. We also applied the empirical method for the linear combination of these

10 log(ca125) 2 3 4 5 6 7 Cases Controls 2 4 6 8 10 log(ca19 9) Figure 1. Pancreatic cancer data set Copula ˆα L c AUC c ˆα U c AUC c Normal 0.7160 0.8306 0.7151 0.8896 Frank 0.7077 0.8324 0.7077 0.8920 Clayton 0.7066 0.8364 0.7061 0.8947 Gumbel 0.7215 0.8301 0.7226 0.8880 Table 1. Lower and upper AUC values and corresponding values of α two biomarkers, this lead to maximum AUC equal to 0.8939 for ˆα = 0.7188, which is very close to the upper AUC values achieved by our method, with only the Clayton copula providing a slightly better upper AUC value. More importantly for practical purposes is that the optimal values for α for all these methods is nearly identical. This example only indicates a hardly noticeable improvement in diagnostic accuracy by combining the two biomarkers over the use of only the single biomarker CA19-9. One possible reason for this is that taking the weighted average is too restrictive if the individual biomarkers for the disease group and for the non-disease group have correlations with different signs, but more importantly our current method with the use of these basic symmetric parametric copulas only considers linear dependence, which is also taking into account in the empirical combination method. The real benefit of our method is likely to show for more complex dependence structures, which will require more flexible copulas to be used, where nonparametric copulas are most likely to be most successful as long as the data sets are suitably large. 5 Concluding remarks This paper reported on intermediate results from an ongoing study into generalizing the NPI approach to multivariate data, in particular considering the use of such methods for bivariate

5. CONCLUDING REMARKS 11 diagnostic tests. The study has shown that the computational aspects of the method for this application are not a substantial problem, the computation time was mostly determined by the need to find optimal linear combination coefficients, so not for the underlying novel statistical methodology which combines NPI for the marginals with an estimated parametric copula. Simulation studies [16], which we briefly reported on, showed a small improvement of our method on the far simpler empirical method in case of small data sets, but this advantage disappeared for larger data sets. However, thus far only scenarios with linear dependences in the underlying data models have been studies, for which the dependence can be taken into account by the weighted average, also for the empirical method. Real benefit from our novel statistical method is expected when there are more complex dependence structures, but this would require either parametric copulas which reflect the dependence structure, and therefore may require detailed topic knowledge, or the use of flexible nonparametric copulas. We are continuing our research project mainly focussing on the second of these approaches, that is developing predictive inference methods for bivariate data by combining NPI for the marginals with nonparametric copulas, which can also be extended to multivariate data of higher dimension. In principle our approach can be similarly applied to combination of biomarkers for diagnostic tests if nonparametric copulas are used, so for this specific application a further research topic is in the use of other combination functions of the biomarkers than weighted averages. For practical applications, however, one may wish to stay with functions that lead to a reasonable interpretation of the combination of the biomarkers, it will be interesting to explore combinations of relatively straightforward combination functions with the use of nonparametric copulas in our method and investigate the performance of the resulting methods. Appendix Proof. The lower probability for the event Tn 1 > T n 0, given in Equation (23), is derived as follows P = P (T 1 n > T 0 n ) = = = P (T 0 n < αx 1 n + (1 α)y 1 n, X 1 n (x 1 i 1, x 1 i ), Y 1 n (y 1 l 1, y1 l )) P ( Tn 0 < αxn 1 + (1 α)yn 1, (Xn 1, Yn 1 ) Bil 1 ) h 1 il ( ˆθ 1 )P (T 0 n < t 1 i 1,l 1 ) h 1 il ( ˆθ 1 ) h 1 il ( ˆθ 1 ) j=1 j=1 k=1 k=1 P ( αxn 0 + (1 α)yn 0 < t 1 i 1,l 1, (X0 n, Yn 0 ) Bjk 0 ) 1{t 0 j,k < t1 i 1,l 1 }h0 jk ( ˆθ 0 ) For the lower probability, we want to make the probability for the event T 1 n > T 0 n as small as possible. To this end, the first inequality follows by putting the probability h 1 il ( ˆθ 1 )

12 corresponding to the block Bil 1 to the left-bottom of the block, for all i, l = 1,..., n 1 + 1. Thus the corresponding combined weighted value is t 1 i 1,l 1 = αx1 i 1 + (1 α)y1 l 1. The second inequality follows by putting the probability h 0 jk ( ˆθ 0 ) corresponding to the block Bjk 0 to the righttop of the block, for all j, k = 1,..., n 0 + 1, and the corresponding combined weighted value is t 0 j,k = αx0 j + (1 α)y0 k. Because these configurations of the respective probability masses can actually be achieved, this is the maximum possible lower bound for the probability of interest and hence it is a lower probability. The upper probability for the event Tn 1 > T n 0, given in Equation (24), is derived as follows P = P (T 1 n > T 0 n ) = = = P (T 0 n < αx 1 n + (1 α)y 1 n, X 1 n (x 1 i 1, x 1 i ), Y 1 n (y 1 l 1, y1 l )) P ( Tn 0 < αxn 1 + (1 α)yn 1, (Xn 1, Yn 1 ) Bil 1 ) h 1 il ( ˆθ 1 )P (T 0 n < t 1 i,l ) h 1 il ( ˆθ 1 ) h 1 il ( ˆθ 1 ) j=1 j=1 k=1 k=1 P ( αxn 0 + (1 α)yn 0 < t 1 i,l, (X0 n, Yn 0 ) Bjk 0 ) 1{t 0 j 1,k 1 < t1 i,l }h0 jk ( ˆθ 0 ) For the upper probability, we want to make the probability for the event Tn 1 > T n 0 as large as possible. To this end, the first inequality follows by putting the probability h 1 il ( ˆθ 1 ) corresponding to the block Bil 1 to the right-top of the block, for all i, l = 1,..., n 1 + 1. Thus the corresponding combined weighted value is t 1 i,l = αx 1 i + (1 α)y1 l. The second inequality follows by putting the probability h 0 jk ( ˆθ 0 ) corresponding to the block Bjk 0 to the left-bottom of the block, for all j, k = 1,..., n 0 + 1, and the corresponding combined weighted value is t 0 j 1,k 1 = αx0 j 1 + (1 α)y0 k 1. This is indeed an upper probability as it is the minimum possible upper bound for the probability of interest. Bibliography [1] Augustin T. and Coolen F.P.A. (2004). Nonparametric predictive inference and interval probability. Journal of Statistical Planning and Inference, 124, 251 272. [2] Augustin T., Coolen F.P.A., de Cooman G. and Troffaes M.C.M. (Eds) (2014). Introduction to Imprecise Probabilities. Chichester: Wiley. [3] Bansal A. and Sullivan Pepe M. (2013). When does combining markers improve classification performance and what are implications for practice? Statistics in Medicine, 32, 1877 1892.

5. CONCLUDING REMARKS 13 [4] Coolen F.P.A. (2006). On nonparametric predictive inference and objective Bayesianism. Journal of Logic, Language and Information, 15, 21 47. [5] Coolen-Maturi T. (2017). Three-group ROC predictive analysis for ordinal outcomes. Communications in Statistics - Theory and Methods, to appear. [6] Coolen-Maturi T., Coolen F.P.A. and Muhammad N. (2016). Predictive inference for bivariate data: combining nonparametric predictive inference for marginals with an estimated copula. Journal of Statistical Theory and Practice, 10, 515 538. [7] Coolen-Maturi T., Coolen-Schrijner P. and Coolen F.P.A. (2012a). Nonparametric predictive inference for binary diagnostic tests. Journal of Statistical Theory and Practice, 6, 665 680. [8] Coolen-Maturi T., Coolen-Schrijner P. and Coolen F.P.A. (2012b). Nonparametric predictive inference for diagnostic accuracy. Journal of Statistical Planning and Inference, 142, 1141 1150. [9] Coolen-Maturi T., Elkhafifi F.F. and Coolen F.P.A. (2014). Three-group ROC analysis: A nonparametric predictive approach. Computational Statistics & Data Analysis, 78, 69 81. [10] De Finetti B. (1974). Theory of Probability. London: Wiley. [11] Elkhafifi F.F. and Coolen F.P.A. (2012). Nonparametric predictive inference for accuracy of ordinal diagnostic tests. Journal of Statistical Theory and Practice, 6, 681 697. [12] Hill B.M. (1968). Posterior distribution of percentiles: Bayes theorem for sampling from a population. Journal of the American Statistical Association, 63, 677 691. [13] Kang L., Liu A. and Tian L. (2016). Linear combination methods to improve diagnostic/prognostic accuracy on future observations. Statistical Methods in Medical Research, 25, 1359 1380. [14] Lawless J.F. and Fredette M. (2005). Frequentist prediction intervals and predictive distributions. Biometrika, 92, 529 542. [15] Liu C., Liu A. and Halabi S. (2011). A min max combination of biomarkers to improve diagnostic accuracy. Statistics in Medicine, 30, 2005 2014. [16] Muhammad N. (2016). Predictive Inference with Copulas for Bivariate Data. PhD thesis, Durham University, UK (available from www.npi-statistics.com). [17] Su J.Q. and Liu J.S. (1993). Linear combinations of multiple diagnostic markers. Journal of the American Statistical Association, 88, 1350 1355. [18] Sullivan Pepe M. (2003). The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford: Oxford University Press. [19] Sullivan Pepe M. and Thompson M.L. (2000). Combining diagnostic test results to increase accuracy. Biostatistics, 1, 123 140.

14 [20] Wieand S., Gail M.H., James B.R. and James K.L. (1989). A family of nonparametric statistics for comparing diagnostic markers with paired or unpaired data. Biometrika, 76, 585 592.