Confidence Intervals for the Interaction Odds Ratio in Logistic Regression with Two Binary X's

Chapter 867

Confidence Intervals for the Interaction Odds Ratio in Logistic Regression with Two Binary X's

Introduction

Logistic regression expresses the relationship between a binary response variable and one or more independent variables called covariates. This procedure calculates the sample size for the case in which the logistic regression model contains two binary covariates (X and Z) and their interaction (XZ), and a Wald statistic is used to calculate a confidence interval for the interaction odds ratio (ORint).

Often, Y is called the response variable, the first binary covariate, X, is referred to as the exposure variable, and the second binary covariate, Z, is referred to as the confounder variable. For example, Y might refer to the presence or absence of cancer, X might indicate whether the subject smoked or not, and Z might be the presence or absence of a certain gene.

Sample Size Calculations

Using the logistic model, the probability of a binary event is

$$\Pr(Y = 1 \mid X, Z) = \frac{\exp(\beta_0 + \beta_1 X + \beta_2 Z + \beta_3 XZ)}{1 + \exp(\beta_0 + \beta_1 X + \beta_2 Z + \beta_3 XZ)}$$

This formula can be rearranged so that it is linear in X, Z, and XZ as follows

$$\log\left(\frac{\Pr(Y = 1 \mid X, Z)}{1 - \Pr(Y = 1 \mid X, Z)}\right) = \beta_0 + \beta_1 X + \beta_2 Z + \beta_3 XZ$$

In the logistic regression model, the magnitude of the association of XZ and Y is represented by the slope β3. Since XZ is binary, only two cases need be considered: XZ = 0 and XZ = 1.
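The model above can be illustrated with a minimal Python sketch. It is not part of PASS; the function names `event_probability` and `logit` and the coefficient values are assumptions chosen only for illustration. The sketch evaluates the event probability in each of the four (X, Z) cells and confirms that β3 is the difference of differences on the logit scale.

```python
import math

def event_probability(x, z, b0, b1, b2, b3):
    """Pr(Y = 1 | X = x, Z = z) under the logistic model with an XZ interaction."""
    eta = b0 + b1 * x + b2 * z + b3 * x * z  # linear predictor (the logit)
    return math.exp(eta) / (1.0 + math.exp(eta))

def logit(p):
    """Log odds of a probability p."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients (illustration only): baseline probability 0.05,
# exp(b1) = 1.5, exp(b2) = 2.0, exp(b3) = 0.5.
b0, b1, b2, b3 = math.log(0.05 / 0.95), math.log(1.5), math.log(2.0), math.log(0.5)

probs = {(x, z): event_probability(x, z, b0, b1, b2, b3)
         for x in (0, 1) for z in (0, 1)}
for (x, z), p in probs.items():
    print(f"X={x}, Z={z}: Pr(Y=1) = {p:.4f}, logit = {logit(p):.4f}")

# beta3 is the effect of X on the logit when Z = 1 minus its effect when Z = 0.
b3_recovered = ((logit(probs[1, 1]) - logit(probs[0, 1]))
                - (logit(probs[1, 0]) - logit(probs[0, 0])))
print(f"recovered beta3 = {b3_recovered:.4f}, exp(beta3) = {math.exp(b3_recovered):.4f}")
```

Because XZ equals 1 only in the cell X = 1, Z = 1, the interaction coefficient changes the probability in that cell alone.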

The logistic regression model defines the baseline probability as

$$P_0 = \Pr(Y = 1 \mid X = 0, Z = 0) = \frac{\exp(\beta_0)}{1 + \exp(\beta_0)}$$

The odds ratio between Y and XZ is defined as

$$OR_{int} = \exp(\beta_3)$$

It is well known that the distribution of the maximum likelihood estimate of β3 is asymptotically normal. A confidence interval for this slope is commonly formed from the Wald statistic

$$z = \frac{\hat{\beta}_3}{s_{\hat{\beta}_3}}$$

A 100(1 - α)% two-sided confidence interval for β3 is

$$\hat{\beta}_3 \pm z_{1-\alpha/2}\, s_{\hat{\beta}_3}$$

By transforming this interval into the odds ratio scale by exponentiating both limits, a 100(1 - α)% two-sided confidence interval for ORint is

$$(OR_{LL}, OR_{UL}) = \exp\left(\hat{\beta}_3 \pm z_{1-\alpha/2}\, s_{\hat{\beta}_3}\right)$$

Note that this interval is not symmetric about ORint.

Often, the goal during this part of the planning process is to find the sample size that reduces the width of the interval to a certain value

$$D = OR_{UL} - OR_{LL}$$

The sample size that achieves a given D is found using a simple search of possible values of N.

Usually, the value of $s_{\hat{\beta}_3}$ is not known before the study, so this quantity must be estimated. Demidenko (2008) gives a method for calculating an estimate of this variance from various quantities that can be set at the planning stage. Let p_x be the probability that X = 1 in the sample. Similarly, let p_z be the probability that Z = 1 in the sample. Define the relationship between X and Z as a logistic regression as follows

$$\Pr(X = 1 \mid Z) = \frac{\exp(\gamma_0 + \gamma_1 Z)}{1 + \exp(\gamma_0 + \gamma_1 Z)}$$

The value of γ0 is found from

$$\exp(\gamma_0) = \frac{Q + \sqrt{Q^2 + 4 p_x (1 - p_x) \exp(\gamma_1)}}{2 (1 - p_x) \exp(\gamma_1)}$$

$$Q = p_x \left(1 + \exp(\gamma_1)\right) + p_z \left(1 - \exp(\gamma_1)\right) - 1$$

The information matrix for this model is

$$I = \begin{bmatrix} L + F + J + R & F + R & J + R & R \\ F + R & F + R & R & R \\ J + R & R & J + R & R \\ R & R & R & R \end{bmatrix}$$

where

$$L = \frac{(1 - p_z) \exp(\beta_0)}{\left(1 + \exp(\gamma_0)\right) \left(1 + \exp(\beta_0)\right)^2}$$

$$F = \frac{(1 - p_z) \exp(\beta_0 + \beta_1 + \gamma_0)}{\left(1 + \exp(\gamma_0)\right) \left(1 + \exp(\beta_0 + \beta_1)\right)^2}$$

$$J = \frac{p_z \exp(\beta_0 + \beta_2)}{\left(1 + \exp(\gamma_0 + \gamma_1)\right) \left(1 + \exp(\beta_0 + \beta_2)\right)^2}$$

$$R = \frac{p_z \exp(\beta_0 + \beta_1 + \beta_2 + \beta_3 + \gamma_0 + \gamma_1)}{\left(1 + \exp(\gamma_0 + \gamma_1)\right) \left(1 + \exp(\beta_0 + \beta_1 + \beta_2 + \beta_3)\right)^2}$$

The value of $N s_{\hat{\beta}_3}^2$ is the (4, 4) element of the inverse of I.

The values of the regression coefficients are input as P0 and the following odds ratios:

$$OR_{int} = \exp(\beta_3), \quad OR_{yx} = \exp(\beta_1), \quad OR_{yz} = \exp(\beta_2), \quad OR_{xz} = \exp(\gamma_1)$$

The value of β0 is calculated from P0 using

$$\beta_0 = \log\left(\frac{P_0}{1 - P_0}\right)$$

Thus, the confidence interval can be specified in terms of several odds ratios and P0.

Of course, these results are only approximate. The width of the final confidence interval depends on the actual data values.

Procedure Options

This section describes the options that are specific to this procedure. These are located on the Design tab. For more information about the options of other tabs, go to the Procedure Window chapter.

Design Tab

The Design tab contains most of the parameters and options that you will be concerned with.

Solve For

Solve For
This option specifies the parameter to be solved for from the other parameters. The parameters that may be selected are Precision (Confidence Interval Width), Confidence Level, or Sample Size.

One-Sided or Two-Sided Interval

Interval Type
Specify whether the confidence interval will be two-sided, one-sided with an upper limit, or one-sided with a lower limit.

Confidence

Confidence Level (1 - Alpha)
This option specifies one or more values of the proportion of confidence intervals (constructed with this same confidence level, sample size, etc.) that would contain the true value of ORint. The range of possible values is between 0 and 1. However, the range is usually between 0.5 and 1. Common choices are 0.9, 0.95, and 0.99. You should select a value that expresses the needs of this study.
You can enter a single value such as 0.7 or a series of values such as 0.7 0.8 0.9 or 0.7 to 0.95 by 0.05.

Sample Size

N (Sample Size)
This option specifies the total number of observations in the sample. You may enter a single value or a list of values.

Precision

Distance from ORint to Limit
In a one-sided confidence interval (sometimes called a confidence bound), this is the distance between the upper or lower confidence limit of ORint and the value of ORint. As the sample size increases, this value decreases and thus the interval becomes more precise.
Since an odds ratio is typically between 0.2 and 10, it is reasonable that the value of this distance is also between 0.2 and 10. By definition, only positive values are possible.
You can enter a single value such as 1 or a series of values such as 0.5 1 1.5 or 0.5 to 1.5 by 0.2.

Width of ORint Confidence Interval
In a two-sided confidence interval, this is the difference between the upper and lower confidence limits of ORint. As the sample size increases, this width decreases and thus the interval becomes more precise.
Since an odds ratio is typically between 0.2 and 10, it is reasonable that the value of this width is also between 0.2 and 10. By definition, only positive values are possible.
You can enter a single value such as 1 or a series of values such as 0.5 1 1.5 or 0.5 to 1.5 by 0.2.

Baseline Probability

P0 [Pr(Y = 1 | X = 0, Z = 0)]
This gives the value of the baseline probability of a response, P0, when neither the exposure nor the confounder is present. P0 is a probability, so it must be between zero and one.

Odds Ratios

ORint (X,Z Interaction Odds Ratio)
Specify one or more values of the XZ-interaction odds ratio. This is the value that you expect to calculate from the data.
You can enter a single value such as 1.5 or a series of values such as 1.5 2 2.5 or 0.5 to 0.9 by 0.1.
The range of this parameter is 0 < ORint < ∞ (typically, 0.1 < ORint < 10).
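The two precision options can be sketched in Python for a given standard error of the estimated log odds ratio. This is a hedged illustration, not PASS code: the function name `or_interval_precision`, the interval-type labels, and the use of the z quantile at 1 - α for one-sided limits are assumptions of this sketch, and the standard error 0.4127 is a hypothetical input.

```python
import math
from statistics import NormalDist

def or_interval_precision(or_int, se_log_or, conf_level=0.95, interval_type="two-sided"):
    """Return (lower, upper, precision) on the odds ratio scale.

    For a two-sided interval, precision is the width (upper - lower).
    For a one-sided interval, the missing limit is None and precision is
    the distance from ORint to the single limit.
    """
    alpha = 1.0 - conf_level
    log_or = math.log(or_int)
    if interval_type == "two-sided":
        z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
        lower = math.exp(log_or - z * se_log_or)
        upper = math.exp(log_or + z * se_log_or)
        return lower, upper, upper - lower
    # Assumption: a one-sided limit puts all of alpha in one tail.
    z = NormalDist().inv_cdf(1.0 - alpha)
    if interval_type == "upper":
        upper = math.exp(log_or + z * se_log_or)
        return None, upper, upper - or_int
    if interval_type == "lower":
        lower = math.exp(log_or - z * se_log_or)
        return lower, None, or_int - lower
    raise ValueError("interval_type must be 'two-sided', 'upper', or 'lower'")

# Hypothetical standard error of the log odds ratio.
lower, upper, width = or_interval_precision(0.5, 0.4127, 0.95, "two-sided")
print(f"two-sided: ({lower:.3f}, {upper:.3f}), width = {width:.4f}")

_, upper_only, distance = or_interval_precision(0.5, 0.4127, 0.95, "upper")
print(f"one-sided upper limit: {upper_only:.3f}, distance = {distance:.4f}")
```

Because the limits are exponentiated, the upper limit lies farther from ORint than the lower limit does, so the two one-sided distances differ for the same standard error.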

ORyx (Y,X Odds Ratio)
Specify one or more values of the odds ratio of Y and X, a measure of the effect size (event rate) that is to be detected by the study. This is the ratio of the odds of the outcome Y = 1 given that the exposure X = 1 to the odds of Y = 1 given X = 0.
You can enter a single value such as 1.5 or a series of values such as 1.5 2 2.5 or 0.5 to 0.9 by 0.1.
The range of this parameter is 0 < ORyx < ∞ (typically, 0.1 < ORyx < 10).

ORyz (Y,Z Odds Ratio)
Specify one or more values of the odds ratio of Y and Z, a measure of the relationship between Y and Z. This is the ratio of the odds of the outcome Y = 1 given that the confounder Z = 1 to the odds of Y = 1 given Z = 0.
You can enter a single value such as 1.5 or a series of values such as 1.5 2 2.5 or 0.5 to 0.9 by 0.1.
The range of this parameter is 0 < ORyz < ∞ (typically, 0.1 < ORyz < 10).

ORxz (X,Z Odds Ratio)
Specify one or more values of the odds ratio of X and Z, a measure of the relationship between X and Z. This is the ratio of the odds of the exposure X = 1 given that the confounder Z = 1 to the odds of X = 1 given Z = 0.
You can enter a single value such as 1.5 or a series of values such as 1.5 2 2.5 or 0.5 to 0.9 by 0.1.
The range of this parameter is 0 < ORxz < ∞ (typically, 0.1 < ORxz < 10).

Prevalences

Percent with X = 1
This is the percentage of the sample in which X = 1. It is often called the prevalence of X. You can enter a single value or a range of values. The permissible range is 1 to 99.

Percent with Z = 1
This is the percentage of the sample in which Z = 1. It is often called the prevalence of Z. You can enter a single value or a range of values. The permissible range is 1 to 99.
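The ORxz and prevalence inputs jointly determine γ0 through the quadratic formula given in the Sample Size Calculations section. The following Python sketch (function names `solve_gamma0` and `marginal_prevalence_of_x` are assumptions of this illustration, and the inputs are example values) solves for γ0 and then checks that the implied marginal prevalence of X equals the value entered.

```python
import math

def solve_gamma0(p_x, p_z, or_xz):
    """Solve for gamma0 given the prevalences of X and Z and ORxz = exp(gamma1)."""
    g = or_xz  # exp(gamma1)
    q = p_x * (1.0 + g) + p_z * (1.0 - g) - 1.0
    exp_gamma0 = ((q + math.sqrt(q * q + 4.0 * p_x * (1.0 - p_x) * g))
                  / (2.0 * (1.0 - p_x) * g))
    return math.log(exp_gamma0)

def marginal_prevalence_of_x(gamma0, p_z, or_xz):
    """Pr(X = 1) implied by the logistic regression of X on Z."""
    gamma1 = math.log(or_xz)
    p_x_given_z0 = 1.0 / (1.0 + math.exp(-gamma0))
    p_x_given_z1 = 1.0 / (1.0 + math.exp(-(gamma0 + gamma1)))
    return (1.0 - p_z) * p_x_given_z0 + p_z * p_x_given_z1

# Example inputs: 40% with X = 1, 25% with Z = 1, ORxz = 1.5.
gamma0 = solve_gamma0(0.40, 0.25, 1.5)
print(f"gamma0 = {gamma0:.5f}")
print(f"implied Pr(X = 1) = {marginal_prevalence_of_x(gamma0, 0.25, 1.5):.5f}")
```

When ORxz = 1, X and Z are independent and exp(γ0) reduces to p_x / (1 - p_x).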

Example 1: Find Sample Size

A study is to be undertaken to study the association between the occurrence of a certain type of cancer (response variable) and the presence of a certain food in the diet. A second variable, the presence or absence of a certain gene, is also thought to impact the result. The researchers want a sample size large enough to guarantee that the width of a confidence interval about ORint is less than 0.90. The baseline cancer event rate is 5%. They estimate ORint will be 0.5 and they want a confidence level of 0.95. They want to look at the sensitivity of the analysis to the specification of the other odds ratios, so they want to obtain the results for ORyz = 1, 1.5, 2 and ORxz = 1, 1.5, 2. They estimate ORyx at 1.5. The researchers estimate that about 40% of the sample eat the food being studied and about 25% will have the gene of interest.

Setup

This section presents the values of each of the parameters needed to run this example. First, from the PASS Home window, load the Confidence Intervals for the Interaction Odds Ratio in Logistic Regression with Two Binary X's procedure. You may then make the appropriate entries as listed below, or open Example 1 by going to the File menu and choosing Open Example Template.

Option                                     Value
Design Tab
Solve For ................................ Sample Size
Interval Type ............................ Two-Sided
Confidence Level ......................... 0.95
Width of ORint Confidence Interval ....... 0.9
P0 [Pr(Y=1 | X=0, Z=0)] .................. 0.05
ORint (X, Z Interaction Odds Ratio) ...... 0.5
ORyx (Y, X Odds Ratio) ................... 1.5
ORyz (Y, Z Odds Ratio) ................... 1 1.5 2
ORxz (X, Z Odds Ratio) ................... 1 1.5 2
Percent with X = 1 ....................... 40
Percent with Z = 1 ....................... 25

Annotated Output

Click the Calculate button to perform the calculations and generate the following output.

Numeric Results

Numeric Results for Two-Sided Confidence Interval of ORint

                                Lower   Upper
Conf           C.I.             C.L.    C.L.                                    Pct    Pct
Level      N   Width    ORint   ORint   ORint   ORyx    ORyz    ORxz    P0      X=1    Z=1
0.950   2995   0.8999   0.500   0.223   1.123   1.500   1.000   1.000   0.050   40.0   25.0
0.950   2868   0.8999   0.500   0.223   1.123   1.500   1.000   1.500   0.050   40.0   25.0
0.950   2845   0.8999   0.500   0.223   1.123   1.500   1.000   2.000   0.050   40.0   25.0
0.950   2253   0.9000   0.500   0.223   1.123   1.500   1.500   1.000   0.050   40.0   25.0
0.950   2169   0.8999   0.500   0.223   1.123   1.500   1.500   1.500   0.050   40.0   25.0
0.950   2156   0.9000   0.500   0.223   1.123   1.500   1.500   2.000   0.050   40.0   25.0
0.950   1884   0.8998   0.500   0.223   1.123   1.500   2.000   1.000   0.050   40.0   25.0
0.950   1821   0.8998   0.500   0.223   1.122   1.500   2.000   1.500   0.050   40.0   25.0
0.950   1813   0.8999   0.500   0.223   1.123   1.500   2.000   2.000   0.050   40.0   25.0
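The first row of this table can be approximated with a short Python sketch of the calculation described in the Sample Size Calculations section. This is an illustration under stated assumptions, not the PASS implementation: the function names (`last_diagonal_of_inverse`, `unit_variance_beta3`, `ci_width`, `find_sample_size`), the Gaussian-elimination solver, and the one-at-a-time search starting at N = 5 are all choices made for this sketch.

```python
import math
from statistics import NormalDist

def last_diagonal_of_inverse(matrix):
    """Last diagonal element of the inverse, found by solving M x = e_n."""
    n = len(matrix)
    a = [row[:] + [1.0 if i == n - 1 else 0.0] for i, row in enumerate(matrix)]
    for col in range(n):  # forward elimination with partial pivoting
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x[n - 1]

def unit_variance_beta3(p0, or_int, or_yx, or_yz, or_xz, p_x, p_z):
    """N times the variance of the estimated beta3: (4, 4) element of inverse of I."""
    b0 = math.log(p0 / (1.0 - p0))
    b1, b2, b3, g1 = (math.log(v) for v in (or_yx, or_yz, or_int, or_xz))
    q = p_x * (1.0 + or_xz) + p_z * (1.0 - or_xz) - 1.0
    g0 = math.log((q + math.sqrt(q * q + 4.0 * p_x * (1.0 - p_x) * or_xz))
                  / (2.0 * (1.0 - p_x) * or_xz))

    def cell(weight, eta):
        # weight = Pr(X, Z cell); the second factor is p(1 - p) for that cell.
        return weight * math.exp(eta) / (1.0 + math.exp(eta)) ** 2

    L = cell((1 - p_z) / (1 + math.exp(g0)), b0)
    F = cell((1 - p_z) * math.exp(g0) / (1 + math.exp(g0)), b0 + b1)
    J = cell(p_z / (1 + math.exp(g0 + g1)), b0 + b2)
    R = cell(p_z * math.exp(g0 + g1) / (1 + math.exp(g0 + g1)), b0 + b1 + b2 + b3)
    info = [[L + F + J + R, F + R, J + R, R],
            [F + R, F + R, R, R],
            [J + R, R, J + R, R],
            [R, R, R, R]]
    return last_diagonal_of_inverse(info)

def ci_width(n, unit_var, or_int, conf_level=0.95):
    """Width of the two-sided Wald interval for ORint at sample size n."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf_level) / 2.0)
    half = z * math.sqrt(unit_var / n)
    return or_int * (math.exp(half) - math.exp(-half))

def find_sample_size(target_width, unit_var, or_int, conf_level=0.95):
    """Smallest n (searching upward from 5) whose width does not exceed the target."""
    n = 5
    while ci_width(n, unit_var, or_int, conf_level) > target_width:
        n += 1
    return n

# Example 1, first scenario: ORyz = 1 and ORxz = 1.
v = unit_variance_beta3(p0=0.05, or_int=0.5, or_yx=1.5, or_yz=1.0,
                        or_xz=1.0, p_x=0.40, p_z=0.25)
n = find_sample_size(0.90, v, 0.5)
print(n, round(ci_width(n, v, 0.5), 4))
```

For this scenario the search stops at N = 2995, in agreement with the first row of the table. The other scenarios can be explored by changing `or_yz` and `or_xz`.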

References

Demidenko, Eugene. 2007. 'Sample size determination for logistic regression revisited', Statistics in Medicine, Volume 26, pages 3385-3397.
Demidenko, Eugene. 2008. 'Sample size and optimal design for logistic regression with binary interaction', Statistics in Medicine, Volume 27, pages 36-46.
Rochon, James. 1989. 'The Application of the GSK Method to the Determination of Minimum Sample Sizes', Biometrics, Volume 45, pages 193-205.

Report Definitions

Logistic regression equation: Log(P/(1-P)) = β0 + β1 X + β2 Z + β3 XZ, where P = Pr(Y = 1 | X, Z) and X and Z are binary.
Confidence Level is the proportion of studies with the same settings that produce a confidence interval that includes the true ORint.
N is the sample size.
C.I. Width is the distance between the two boundaries of the confidence interval.
ORint is the expected sample value of the interaction odds ratio. It is the value of exp(β3).
ORyx is the expected sample value of the odds ratio of Y versus X. It is the value of exp(β1).
ORyz = Exp(β2) is the odds ratio of Y versus Z.
ORxz is the odds ratio of X versus Z in a logistic regression of X on Z.
C.I. of ORint Lower Limit is the lower limit of the confidence interval of ORint.
C.I. of ORint Upper Limit is the upper limit of the confidence interval of ORint.
P0 is the response probability at X = 0 and Z = 0. That is, P0 = Pr(Y = 1 | X = 0, Z = 0).
Percent X=1 is the percent of the sample in which the exposure is 1 (present).
Percent Z=1 is the percent of the sample in which the confounder is 1.

Summary Statements

A logistic regression of a binary response variable (Y) on two binary independent variables (X and Z) and their interaction with a sample size of 2995 observations at a 0.950 confidence level produces a two-sided confidence interval for the odds ratio of the interaction with a width of 0.8999. The sample odds ratio of the interaction is assumed to be 0.500. Other settings are ORyx = 1.500, ORyz = 1, ORxz = 1.000, and P0 (prevalence of Y given X = 0 and Z = 0) = 0.050. The prevalence of X is 40.0% and the prevalence of Z is 25.0%. A Wald statistic is used to construct the confidence interval.

This report shows the sample size for each of the scenarios.

Plot Section

These plots show the sample size for the various values of the other parameters.

Example 2: Validation for the Interaction

We could not find a direct validation result in the literature, so we will create one by hand. This is easy to do in this case because we can create a dataset, analyze it with a statistical program such as NCSS, and then compare these results to those obtained with the above formulas in PASS.

Here is a summary of the data that was used to generate this example. The numeric values are counts of the number of items in the corresponding cell.

Group         Y=1    Y=0    Total
X=1, Z=1        5     10       15
X=1, Z=0        3     21       24
X=0, Z=1       17      3       20
X=0, Z=0        9      7       16
Total          34     41       75

Here is a printout from NCSS showing the estimated odds ratio (0.79412) and confidence interval (0.08304 to 7.59380). The width is 7.51076.

Odds Ratios

Independent    Regression     Odds         Lower 95%     Upper 95%
Variable       Coefficient    Ratio        Confidence    Confidence
X              b(i)           Exp(b(i))    Limit         Limit
Intercept       0.43853       1.55042      0.64509        3.72629
(X=1)          -2.19722       0.11111      0.02331        0.52968
(Z=1)           1.48329       4.40741      0.91195       21.30077
(X=1)*(Z=1)    -0.23052       0.79412      0.08304        7.59380

Note that the value of P0 is 9 / 16 = 0.5625. The value of Percent with X = 1 is 100 x 39 / 75 = 52%. The value of Percent with Z = 1 is 100 x 35 / 75 = 46.67%. Also, a logistic regression of X on Z produced an ORxz of 0.50.

Setup

This section presents the values of each of the parameters needed to run this example. First, from the PASS Home window, load the Confidence Intervals for the Interaction Odds Ratio in Logistic Regression with Two Binary X's procedure. You may then make the appropriate entries as listed below, or open Example 2 by going to the File menu and choosing Open Example Template.

Option                                     Value
Design Tab
Solve For ................................ Precision (C.I. Width)
Interval Type ............................ Two-Sided
Confidence Level ......................... 0.95
N (Sample Size) .......................... 75
P0 [Pr(Y=1 | X=0, Z=0)] .................. 0.5625
ORint (Interaction Odds Ratio) ........... 0.79412
ORyx (Y, X Odds Ratio) ................... 0.1111
ORyz (Y, Z Odds Ratio) ................... 4.40741
ORxz (X, Z Odds Ratio) ................... 0.50
Percent with X = 1 ....................... 52
Percent with Z = 1 ....................... 46.6666667

Output

Click the Calculate button to perform the calculations and generate the following output.

Numeric Results

Numeric Results for Two-Sided Confidence Interval of ORint

                                 Lower     Upper
Conf         C.I.                C.L.      C.L.                                        Pct    Pct
Level    N   Width     ORint     ORint     ORint     ORyx      ORyz      ORxz      P0        X=1    Z=1
0.950   75   7.51103   0.79412   0.08304   7.59407   0.11110   4.40741   0.50000   0.56250   52.0   46.7

Using the above settings, PASS calculates the confidence interval to be (0.08304, 7.59407), which leads to a C.I. Width of 7.51103. This matches the results obtained from the data to within rounding.
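The data side of this validation can also be checked by hand. The Python sketch below works directly from the cell counts of the Example 2 data table. It relies on a standard result that is not stated in this chapter and is therefore an assumption of the sketch: for a model with X, Z, and XZ fit to four (X, Z) groups, the estimated log interaction odds ratio equals the log of the ratio of the two stratum odds ratios, and its estimated variance is the sum of the reciprocals of the eight cell counts.

```python
import math
from statistics import NormalDist

# Cell counts from the Example 2 data table: (Y=1, Y=0) for each (X, Z) group.
counts = {(1, 1): (5, 10), (1, 0): (3, 21), (0, 1): (17, 3), (0, 0): (9, 7)}

def odds(cell):
    """Odds of Y = 1 within one (X, Z) group."""
    events, non_events = cell
    return events / non_events

# Interaction odds ratio: (odds ratio for X when Z = 1) / (odds ratio for X when Z = 0).
or_int = ((odds(counts[1, 1]) / odds(counts[0, 1]))
          / (odds(counts[1, 0]) / odds(counts[0, 0])))

# Assumed standard error of log(ORint): square root of the sum of 1/count.
se = math.sqrt(sum(1.0 / c for cell in counts.values() for c in cell))

z = NormalDist().inv_cdf(0.975)  # two-sided 95% interval
lower = math.exp(math.log(or_int) - z * se)
upper = math.exp(math.log(or_int) + z * se)

print(f"ORint = {or_int:.5f}")
print(f"95% C.I. = ({lower:.5f}, {upper:.5f}), width = {upper - lower:.5f}")
```

The interval agrees with the NCSS printout (0.08304 to 7.59380). The small difference from the PASS width arises because the PASS inputs, such as ORyx = 0.1111, were rounded before entry.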