Passing-Bablok Regression for Method Comparison

Similar documents
Analysis of Covariance (ANCOVA) with Two Groups

NCSS Statistical Software. Harmonic Regression. This section provides the technical details of the model that is fit by this procedure.

Fractional Polynomial Regression

Ratio of Polynomials Fit Many Variables

Hotelling s One- Sample T2

Two Correlated Proportions Non- Inferiority, Superiority, and Equivalence Tests

Analysis of 2x2 Cross-Over Designs using T-Tests

Group-Sequential Tests for One Proportion in a Fleming Design

One-Way Analysis of Covariance (ANCOVA)

Tests for Two Coefficient Alphas

Ratio of Polynomials Fit One Variable

Confidence Intervals for the Odds Ratio in Logistic Regression with One Binary X

Confidence Intervals for the Interaction Odds Ratio in Logistic Regression with Two Binary X s

Confidence Intervals for the Odds Ratio in Logistic Regression with Two Binary X s

Tests for the Odds Ratio of Two Proportions in a 2x2 Cross-Over Design

Tests for the Odds Ratio in a Matched Case-Control Design with a Quantitative X

Mixed Models No Repeated Measures

Tests for the Odds Ratio in Logistic Regression with One Binary X (Wald Test)

LAB 5 INSTRUCTIONS LINEAR REGRESSION AND CORRELATION

CHAPTER 10. Regression and Correlation

Confidence Intervals for One-Way Repeated Measures Contrasts

General Linear Models (GLM) for Fixed Factors

One-Way Repeated Measures Contrasts

Ratio of Polynomials Search One Variable

Superiority by a Margin Tests for One Proportion

1 A Review of Correlation and Regression

Non-Inferiority Tests for the Ratio of Two Proportions in a Cluster- Randomized Design

MATH 1150 Chapter 2 Notation and Terminology

MULTIPLE LINEAR REGRESSION IN MINITAB

Multiple Regression Basic

WISE Regression/Correlation Interactive Lab. Introduction to the WISE Correlation/Regression Applet

LOOKING FOR RELATIONSHIPS

Any of 27 linear and nonlinear models may be fit. The output parallels that of the Simple Regression procedure.

PASS Sample Size Software. Poisson Regression

Regression Analysis. Table Relationship between muscle contractile force (mj) and stimulus intensity (mv).

Item Reliability Analysis

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Midterm 2 - Solutions

Lecture # 31. Questions of Marks 3. Question: Solution:

Frequency Distribution Cross-Tabulation

Chapter 7: Correlation

STAT 350 Final (new Material) Review Problems Key Spring 2016

Tests for Two Correlated Proportions in a Matched Case- Control Design

Example name. Subgroups analysis, Regression. Synopsis

Ridge Regression. Chapter 335. Introduction. Multicollinearity. Effects of Multicollinearity. Sources of Multicollinearity

Basic Statistics. 1. Gross error analyst makes a gross mistake (misread balance or entered wrong value into calculation).

1 Introduction to Minitab

GROUPED DATA E.G. FOR SAMPLE OF RAW DATA (E.G. 4, 12, 7, 5, MEAN G x / n STANDARD DEVIATION MEDIAN AND QUARTILES STANDARD DEVIATION

REVIEW 8/2/2017 陈芳华东师大英语系

Investigating Models with Two or Three Categories

SPSS LAB FILE 1

Simple Linear Regression

Describing the Relationship between Two Variables

Irrational Thoughts. Aim. Equipment. Irrational Investigation: Teacher Notes

Understanding Inference: Confidence Intervals I. Questions about the Assignment. The Big Picture. Statistic vs. Parameter. Statistic vs.

IT 403 Statistics and Data Analysis Final Review Guide

M M Cross-Over Designs

BIOL Biometry LAB 6 - SINGLE FACTOR ANOVA and MULTIPLE COMPARISON PROCEDURES

LAB 3 INSTRUCTIONS SIMPLE LINEAR REGRESSION

Stat 101 Exam 1 Important Formulas and Concepts 1

Econometrics Review questions for exam

Using SPSS for One Way Analysis of Variance

Final Exam - Solutions

Computer simulation of radioactive decay

Business Statistics. Lecture 10: Course Review

One-Way ANOVA. Some examples of when ANOVA would be appropriate include:

Multiple Regression Introduction to Statistics Using R (Psychology 9041B)

Trendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues

Objective A: Mean, Median and Mode Three measures of central of tendency: the mean, the median, and the mode.

Lab 1 Uniform Motion - Graphing and Analyzing Motion

Inferences About the Difference Between Two Means

Economics 471: Econometrics Department of Economics, Finance and Legal Studies University of Alabama

Activity #12: More regression topics: LOWESS; polynomial, nonlinear, robust, quantile; ANOVA as regression

Advanced Quantitative Data Analysis

Contents. 13. Graphs of Trigonometric Functions 2 Example Example

Lesson 24: Using the Quadratic Formula,

Didacticiel - Études de cas. In this tutorial, we show how to use TANAGRA ( and higher) for measuring the association between ordinal variables.

y response variable x 1, x 2,, x k -- a set of explanatory variables

Lecture 7: OLS with qualitative information

Heteroskedasticity ECONOMETRICS (ECON 360) BEN VAN KAMMEN, PHD

y i s 2 X 1 n i 1 1. Show that the least squares estimators can be written as n xx i x i 1 ns 2 X i 1 n ` px xqx i x i 1 pδ ij 1 n px i xq x j x

Multiple Regression Analysis: Inference ECONOMETRICS (ECON 360) BEN VAN KAMMEN, PHD

Daniel Boduszek University of Huddersfield

Prof. Bodrero s Guide to Derivatives of Trig Functions (Sec. 3.5) Name:

Lab 07 Introduction to Econometrics

Quick Guide to OpenStat

Binary Dependent Variables

Final Exam - Solutions

AP Final Review II Exploring Data (20% 30%)

4:3 LEC - PLANNED COMPARISONS AND REGRESSION ANALYSES

DISCRETE RANDOM VARIABLES EXCEL LAB #3

Applied Statistics and Econometrics

Box-Cox Transformations

Midterm Exam 1, section 1. Thursday, September hour, 15 minutes

Lesson 1: Successive Differences in Polynomials

Contest Quiz 3. Question Sheet. In this quiz we will review concepts of linear regression covered in lecture 2.

BIOSTATS 640 Spring 2018 Unit 2. Regression and Correlation (Part 1 of 2) STATA Users

Linear Regression. Simple linear regression model determines the relationship between one dependent variable (y) and one independent variable (x).

OECD QSAR Toolbox v.4.1. Tutorial on how to predict Skin sensitization potential taking into account alert performance

Introduction and Single Predictor Regression. Correlation

Transcription:

Chapter 313 Passing-Bablok Regression for Method Comparison Introduction Passing-Bablok regression for method comparison is a robust, nonparametric method for fitting a straight line to two-dimensional data where both variables, X and Y, are measured with error. It is useful when you have two devices that should give the same measurements and you want to compare them. This is accomplished by estimating a linear regression line and testing whether the intercept is zero and the slope is one. The Passing- Bablok regression procedure fits the intercept (β 0) and the slope (β 1) of the linear equation YY = ββ 0 + ββ 1 XX. The estimate of the slope (B1) is calculated as the median of all slopes that can be formed from all possible pairs of data points, except those pairs that result in a slope of 0/0 or -1. To correct for estimation bias caused by the lack of independence of these slopes, the median is shifted by a factor K which is the number of slopes that are less than -1. This creates an approximately unbiased estimator. The estimate of the intercept (B0) is the median of {Y i B1 X i}. The intercept is interpreted as the systematic bias (difference) between the two methods. The slope measures the amount of proportional bias (difference) between the two methods. Experimental Design Typical designs suitable for Passing-Bablok regression include up to 3000 paired measurements, (xx ii, yy ii ), i = 1,, n, similar to the common input for simple linear regression. Typical data of this type are shown in the table below. Typical Data for Passing-Bablok regression Subject X Y 1 7 7.9 2 8.3 8.2 3 10.5 9.6 4 9 9 5 5.1 6.5 6 8.2 7.3 7 10.2 10.2 8 10.3 10.6 313-1

Technical Details The methods and results in this chapter are based on the formulas given in Passing and Bablok (1983, 1984) and Bablok, Passing, Bender, and Schneider (1988). Assumptions Passing-Bablok regression requires the following assumptions: 1. The variables X and Y are highly, positively correlated (for method comparison only). 2. The relationship between X and Y is linear (straight-line). 3. No special assumptions are made about the distributions (including the variances) of X and Y. Passing-Bablok Estimation Define XX ii and YY ii, i = 1,, n, as the values for two variables, each sampled with error to give the observed values xx ii and yy ii, respectively. For each of the NN = nn possible pairs of points, define the slope by 2 SS iiii = yy ii yy jj xx ii xx jj If xx ii = xx jj and yy ii = yy jj, the result is 0/0 which is undefined. Omit these pairs from the calculation of the slope. If xx ii = xx jj and yy ii > yy jj, the result is. Enter a large positive number. Since we are using the median of the Sij, the value won t matter. If xx ii = xx jj and yy ii < yy jj, the result is. Enter a large negative number. Since we are using the median of the Sij, the value won t matter. If Sij = -1, omit the pair from the calculation of the slope. This corrects for the bias caused by the lack of independence in the Sij. If Sij < -1, use the pair and add one to a counter K. This value will be used to shift the median. The slope B1 is the shifted median of Sij, where the median is shifted to the right K steps. Using this slope, calculate intercept B0 as the median of all n of the quantities yy ii BB1xx ii. Confidence Bounds Calculate the confidence bounds for β 1 as follows. Let zz αα/2 be the 1 α/2 quantile of the standard normal distribution. Let Now calculate nn(nn 1)(2nn + 5) CC αα/2 = zz αα/2 18 Here, [U] rounds U to the nearest integer. Also calculate MM 1 = NN CC αα/2 2 MM 2 = NN MM 1 + 1 313-2

Finally, a confidence interval for β 1 is given by Where the Sij s are sorted. SS (MM1 +KK) ββ 1 SS (MM2 +KK) Confidence limits for the intercept are calculated as follows. Let BB1 LL and BB1 UU be the lower and upper limits for the slope from the last calculation. Now calculate the limits for the intercept as and BB0 LL = median{yy ii BB1 UU xx ii } BB0 UU = median{yy ii BB1 LL xx ii } Cusum Test of the Linearity Assumption An important assumption is that the relationship between X and Y is linear. Passing and Bablok (1983) proposed testing this assumption using a modified cusum test. This test proceeds as follows. Let n pos represent the number of observations with y i > B0 + B1 x i. Similarly, let n neg represent the number of observations with y i < B0 + B1 x i. Step 1 Assign score r i to each point as follows. If yy ii > BB0 + BB1 xx ii then rr ii = nn nnnnnn /nn pppppp If yy ii < BB0 + BB1 xx ii then rr ii = nn pppppp /nn nnnnnn If yy ii = BB0 + BB1 xx ii then rr ii = 0 Step 2 Assign the distance scores D i to each observation as follows. D ii = yy ii+ xx ii BB1 BB0 1+1/BB1 2 Step 3 Sort the r i in the order of the D i. Label these sorted values r (i). Step 4 Create the cusum (i) statistics using the following. ccccccccmm (ii) = ii kk=1 rr (kk) Step 5 Find the maximum absolute value of cusum (i) across all values of i. Call this MaxCusum. Step 6 The null hypothesis of random arrangement of residuals and therefore a linear relationship between X and Y is tested using the test statistic HH = MMMMMMMMMMMMMMMM nn nnnnnn +1 This statistic is compared to the Kolmogorov-Smirnov distribution. Details are given in Passing and Bablok (1983). For example, the critical value for a 5% test is 1.36 and for a 1% test is 1.63. Note that we created an approximating polynomial the allows us to calculate p-values for any value of H. Kendall s Tau Test of the High Correlation Assumption Passing and Bablok (1983) recommended that a preliminary two-sided test be conducted to determine if Kendall s tau correlation between X and Y is significantly different from zero. They also indicate that this correlation must be positive. Kendall s tau correlation is well documented in the Correlation procedure. 313-3

Procedure Options This section describes the options available in this procedure. Variables Tab This panel specifies the variables and estimation parameters used in the analysis. Variables Y Variable Specify a single data column to be used for the Y variable. This column contains the measurements from one of the two measurement systems being compared. The X variable contains corresponding measurements from the other (X) measurement system. It is arbitrary which method is selected for X and which is selected for Y. The estimated equation will be of the form Y = B0 + B1 X The analysis will test whether B0 = 0 and B1 = 1. If they do, then if follows that Y = X. You can enter the column name or number directly, or click the button on the right to display a Column Selection window that will let you select the column from a list. X Variable Specify a single data column to be used for the X variable. This column contains the measurements from one of the two measurement systems being compared. The Y variable contains corresponding measurements from the other (Y) measurement system. It is arbitrary which method is selected for X and which is selected for Y. The estimated equation will be of the form Y = B0 + B1 X The analysis will test whether B0 = 0 and B1 = 1. If they do, then if follows that Y = X. You can enter the column name or number directly, or click the button on the right to display a Column Selection window that will let you select the column from a list. Grouping Variable (Optional) Grouping Variable Enter a single categorical grouping variable. The values of this variable indicate which category each subject belongs in. Values may be text or numeric. The grouping variable is optional. A separate analysis will be performed for each group. 313-4

Reports Tab This tab controls which statistical reports are displayed in the output. Significance Levels and Confidence Levels Confidence Level This confidence level, entered as a percentage, is used in the calculation of confidence intervals for regression coefficients. Typical confidence levels are 90%, 95%, and 99%, with 95% being the most common. Range 50 < Confidence Level < 100 Assumptions Alpha Specify the alpha value (significance level) used for tests of assumptions. Alpha is the probability of rejecting the null hypothesis when it is actually true. Recommended Often, an alpha of 0.05 is used. However, many statisticians recommend a higher value of alpha for tests of assumptions such as 0.10 or even 0.20. Range Typical choices for alpha range between 0.001 and 0.200. Select Reports Run Summary to Residuals Each of these options specifies whether the indicated report is displayed. Report Options The following options control the format of the reports. Variable Names Specify whether to use variable names, variable labels, or both to label output reports. In this discussion, the variables are the columns of the data table. Names Variable names are the column headings that appear on the data table. They may be modified by clicking the Column Info button on the Data window or by clicking the right mouse button while the mouse is pointing to the column heading. Labels This refers to the optional labels that may be specified for each column. Clicking the Column Info button on the Data window allows you to enter them. Both Both the variable names and labels are displayed. 313-5

Comments 1. Most reports are formatted to receive about 12 characters for variable names. 2. Variable Names cannot contain blanks or math symbols (like + - * /.,), but variable labels can. Value Labels Value Labels are used to make reports more legible by assigning meaningful labels to numbers and codes. The options are Data Values All data are displayed in their original format, regardless of whether a value label has been set or not. Value Labels All values of variables that have a value label variable designated are converted to their corresponding value label when they are output. This does not modify their value during computation. Both Both data value and value label are displayed. Example A variable named GENDER (used as a grouping variable) contains 1 s and 2 s. By specifying a value label for GENDER, the report can display Male instead of 1 and Female instead of 2. This option specifies whether (and how) to use the value labels. Report Options Decimal Places Item Decimal Places These decimal options allow the user to specify the number of decimal places for items in the output. Your choice here will not affect calculations; it will only affect the format of the output. Auto If one of the Auto options is selected, the ending zero digits are not shown. For example, if Auto (0 to 7) is chosen, 0.0500 is displayed as 0.05 1.314583689 is displayed as 1.314584 The output formatting system is not designed to accommodate Auto (0 to 13), and if chosen, this will likely lead to lines that run on to a second line. This option is included, however, for the rare case when a very large number of decimals is needed. 313-6

Plots Tab These options specify which plots are produced as well as the plot format. Passing-Bablok Regression Plots Passing-Bablok Regression Scatter Plot Indicate whether to display this plot. Click the plot format button to change the plot settings. Show Combined Plot (If Grouping Variable Present) If you have a grouping variable present, this option allows you to plot all groups on one plot for comparison. Passing-Bablok regression is still performed on each group separately. Residual Plot Residual Plot Indicate whether to display this plot. Click the plot format button to change the plot settings. 313-7

Example 1 This section presents an example of how to run a Passing-Bablok regression method comparison of the data in the PassBablok 1 dataset. This dataset contains measurements from two measurement methods on each of 30 items. The goal is to determine if the two measurement systems obtain equivalent measurements. You may follow along here by making the appropriate entries or load the completed template Example 1 by clicking on Open Example Template from the File menu of the window. 1 Open the PassBablok 1 example dataset. From the File menu of the NCSS Data window, select Open Example Data. Click on the file PassBablok 1.NCSS. Click Open. 2 Open the window. Using the Analysis menu or the Procedure Navigator, find and select the Passing-Bablok Regression for Method Comparison procedure. On the menus, select File, then New Template. This will fill the procedure with the default template. 3 Specify the variables. Select the Variables tab. For Y Variable, enter Method2. For X Variable, enter Method1. Leave the Grouping Variable blank. 4 Specify the reports. Select the Reports tab. In addition to the reports already selected, check Residuals. 5 Specify the plots. Select the Plots tab. Click on the Passing-Bablok Regression Scatter Plot button. Click on the Residuals Plot button. 6 Run the procedure. From the Run menu, select Run Procedure. Alternatively, just click the green Run button. Run Summary Run Summary Item Value Item Value Y Variable Method2 Rows Used 30 X Variable Method1 B0: Intercept -0.0092 B1: Slope 0.9986 This report gives a summary of the input and various descriptive measures about the Passing-Bablok regression. 313-8

Descriptive Statistics Item Y X Y - X Variable Method2 Method1 Difference N 30 30 30 Mean 62.99333 63.14-0.1466667 Std Dev 23.5886 23.6153 2.2578 Std Error 4.3067 4.3115 0.4122 COV 0.3745 0.3740-15.3942 Minimum 12.7 14.2-3.8 First Quartile 47.425 48.8-0.5 Median 64.15 61.4-0.1 Third Quartile 83.075 84.1 0.3 Maximum 98.6 99.1 10.2 This report gives descriptive statistics about the variables used in the regression. Kendall s Tau Correlation Confidence Interval and Hypothesis Test Hypothesis Test of Y = X Kendall's Lower 95%: Upper 95%: Z-Value Reject H0 Tau Conf. Limit Conf. Limit for Testing that ρ = 0 Correlation of Tau of Tau H0: ρ = 0 P-Value at α = 0.05? 0.9724 0.9546 0.9833 7.5289 0.0000 Yes This procedure assumes that Kendall's Tau is significantly greater than zero. This assumption cannot be rejected. The section reports an analysis of Kendall s tau correlation. It provides both a confidence interval and a significance test. The main thing to look for here is that the correlation (0.9724 in this example) is large and positive. This can be surmised from both the confidence interval and the p-value. Cusum Test of Linearity Cusum Test of Linearity Maximum Linearity Linearity Reject Linearity Cusum Z Value P-Value at α = 0.05? 4.0000 1.0000 0.2848 No This procedure assumes that only a positive linear relationship exists between the two measurements. This assumption cannot be rejected. The section reports the results of the cusum test of linearity. When linearity is rejected, the assumption that the data supports this assumption is rejected and the analysis should not be used. 313-9

Regression Coefficient Intercept Slope Item B0 B1 Regression Coefficient -0.0092 0.9986 Lower 95% Conf. Limit of β(i) -0.7486 0.9856 Upper 95% Conf. Limit of β(i) 0.7987 1.0099 Method Comparison Hypothesis Test Hypothesis being tested, H0: β0.lcl < 0 < β0.ucl and β1.lcl < 1 < β1.ucl. Cannot reject H0. There is not enough evidence to conclude that the methods are not equal. This section reports the regression coefficients, along with their analytic confidence limits. The conclusion is reported on the last line. This is the main conclusion desired of this report. Residuals Report Residuals Difference Predicted Residual Row X Y Y - X Yhat (Y-Yhat) 1 69.3 69.1-0.2000 69.1945 0.0945 2 27.1 26.7-0.4000 27.0531 0.3531 3 61.3 61.4 0.1000 61.2056-0.1944 4 50.8 51.2 0.4000 50.7202-0.4798 5 34.4 34.7 0.3000 34.3430-0.3570 6 92.3 88.5-3.8000 92.1626 3.6626 7 57.5 57.9 0.4000 57.4109-0.4891 8 45.5 45.1-0.4000 45.4276 0.3276 This section reports the residuals and the difference for each of the input data points. 313-10

Passing-Bablok Regression Scatter Plot This report shows both the 45-degree line (Y = X) and the fitted Passing-Bablok regression line. In this example, the regression line hides the 45-degree line. 313-11

Residual Plot This plot emphasizes the deviation of the points from the regression line. 313-12