Confidence Intervals for the Odds Ratio in Logistic Regression with One Binary X


Chapter 864

Introduction

Logistic regression expresses the relationship between a binary response variable and one or more independent variables called covariates. This procedure calculates sample size for the case when there is only one binary covariate (X) in the logistic regression model and a Wald statistic is used to calculate a confidence interval for the odds ratio of Y to X. Often, Y is called the response variable and X is referred to as the exposure variable. For example, Y might refer to the presence or absence of cancer and X might indicate whether the subject smoked or not.

Sample Size Calculations

Using the logistic model, the probability of a binary event is

$$\Pr(Y = 1 \mid X) = \frac{\exp(\beta_0 + \beta_1 X)}{1 + \exp(\beta_0 + \beta_1 X)} = \frac{1}{1 + \exp(-\beta_0 - \beta_1 X)}$$

This formula can be rearranged so that it is linear in X as follows

$$\log\left(\frac{\Pr(Y = 1 \mid X)}{1 - \Pr(Y = 1 \mid X)}\right) = \beta_0 + \beta_1 X$$

Note that the left side is the logarithm of the odds of a response event (Y = 1) versus a response non-event (Y = 0). This is sometimes called the logit transformation of the probability. In the logistic regression model, the magnitude of the association of X and Y is represented by the slope β1. Since X is binary, only two cases need be considered: X = 0 and X = 1.

The logistic regression model lets us define two quantities

$$P_0 = \Pr(Y = 1 \mid X = 0) = \frac{\exp(\beta_0)}{1 + \exp(\beta_0)}$$

$$P_1 = \Pr(Y = 1 \mid X = 1) = \frac{\exp(\beta_0 + \beta_1)}{1 + \exp(\beta_0 + \beta_1)}$$
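The relationship between the coefficients and the two response probabilities can be checked numerically. The following is a minimal sketch, not part of PASS; the function names are hypothetical, and the illustrative values (P0 = 0.07 and an odds ratio of 2.0) are borrowed from Example 1 of this chapter.

```python
import math


def logistic_prob(beta0, beta1, x):
    """Pr(Y = 1 | X = x) under the logistic model."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))


def logit(p):
    """Log odds of a probability p."""
    return math.log(p / (1.0 - p))


# Illustrative coefficients (assumed values): a baseline probability of 0.07
# and a slope equal to log(2.0).
beta0 = logit(0.07)
beta1 = math.log(2.0)

p0 = logistic_prob(beta0, beta1, 0)  # Pr(Y = 1 | X = 0)
p1 = logistic_prob(beta0, beta1, 1)  # Pr(Y = 1 | X = 1)

# The difference of the two logits recovers the slope beta1.
slope_from_logits = logit(p1) - logit(p0)
print(p0, p1, slope_from_logits)
```

Because the logit is linear in X, moving from X = 0 to X = 1 changes the log odds by exactly β1, which is what the last line of the sketch verifies.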

These values are combined in the odds ratio (OR) of P1 to P0, resulting in

$$OR_{yx} = \exp(\beta_1)$$

or, by taking the logarithm of both sides, simply

$$\log(OR_{yx}) = \log\left(\frac{P_1 / (1 - P_1)}{P_0 / (1 - P_0)}\right) = \beta_1$$

Hence the relationship between Y and X can be quantified as a single regression coefficient. It is well known that the distribution of the maximum likelihood estimate of β1 is asymptotically normal. A significance test or confidence interval for this slope is commonly formed from the Wald statistic

$$z = \frac{\hat{\beta}_1}{s_{\hat{\beta}_1}}$$

A 100(1 - α)% two-sided confidence interval for β1 is

$$\hat{\beta}_1 \pm z_{1-\alpha/2}\, s_{\hat{\beta}_1}$$

By transforming this interval into the odds ratio scale by exponentiating both limits, a 100(1 - α)% two-sided confidence interval for OR is

$$(OR_{LL}, OR_{UL}) = \exp\left(\hat{\beta}_1 \pm z_{1-\alpha/2}\, s_{\hat{\beta}_1}\right)$$

Note that this interval is not symmetric about exp(β̂1).

Often, the goal during this part of the planning process is to find the sample size that reduces the width of the interval to a certain value D = OR_UL - OR_LL. The sample size that achieves a given D is found using a simple search of possible values of N.

Usually, the value of the standard error of β̂1 is not known before the study, so this quantity must be estimated. Demidenko (2007) gives a method for calculating an estimate of the variance from various quantities that can be set at the planning stage. Let p_x be the probability that X = 1 in the sample. The information matrix for this model is

$$I = \begin{bmatrix} \dfrac{p_x \exp(\beta_0 + \beta_1)}{\left(1 + \exp(\beta_0 + \beta_1)\right)^2} + \dfrac{(1 - p_x)\exp(\beta_0)}{\left(1 + \exp(\beta_0)\right)^2} & \dfrac{p_x \exp(\beta_0 + \beta_1)}{\left(1 + \exp(\beta_0 + \beta_1)\right)^2} \\[2ex] \dfrac{p_x \exp(\beta_0 + \beta_1)}{\left(1 + \exp(\beta_0 + \beta_1)\right)^2} & \dfrac{p_x \exp(\beta_0 + \beta_1)}{\left(1 + \exp(\beta_0 + \beta_1)\right)^2} \end{bmatrix}$$

The value of $N s_{\hat{\beta}_1}^2$ is the (2,2) element of the inverse of I; that is, the planning-stage variance of β̂1 is this element divided by N.

The values of β0 and β1 are calculated from ORyx and P0 using

$$\beta_0 = \log\left(\frac{P_0}{1 - P_0}\right)$$

$$\beta_1 = \log(OR_{yx}) = \log\left(\frac{P_1 / (1 - P_1)}{P_0 / (1 - P_0)}\right)$$

Thus, the confidence interval can be specified in terms of ORyx and P0.

Of course, these results are only approximate. The final confidence interval depends on the actual data values.
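The calculation described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the PASS implementation: the function name `planned_or_interval` is hypothetical, and the (2,2) element of the inverse information matrix is computed in closed form, using the fact that exp(β0 + β1)/(1 + exp(β0 + β1))² equals P1(1 - P1). The usage line plugs in the settings of the validation example later in this chapter.

```python
import math
from statistics import NormalDist


def planned_or_interval(n, p0, odds_ratio, percent_x1, conf_level=0.95):
    """Planning-stage two-sided Wald interval for the odds ratio.

    Hypothetical helper: n is the total sample size, p0 is Pr(Y=1 | X=0),
    and percent_x1 is the percentage of the sample with X = 1.
    """
    px = percent_x1 / 100.0
    # P1 follows from P0 and the odds ratio.
    odds1 = odds_ratio * p0 / (1.0 - p0)
    p1 = odds1 / (1.0 + odds1)
    # Information matrix entries: I = [[a + b, a], [a, a]].
    a = px * p1 * (1.0 - p1)
    b = (1.0 - px) * p0 * (1.0 - p0)
    # det(I) = a * b, so the (2,2) element of the inverse is (a + b) / (a * b).
    inv22 = (a + b) / (a * b)
    se = math.sqrt(inv22 / n)  # standard error of the slope estimate
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf_level) / 2.0)
    log_or = math.log(odds_ratio)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Settings taken from the validation example (N = 75, P0 = 26/36,
# ORyx = 80/806, 52% of the sample with X = 1).
lower, upper = planned_or_interval(n=75, p0=26 / 36, odds_ratio=80 / 806, percent_x1=52)
print(round(lower, 3), round(upper, 3), round(upper - lower, 3))
```

With these inputs the sketch should agree with the interval (0.034, 0.288) reported in the validation example, since both rest on the same variance formula.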

Procedure Options

This section describes the options that are specific to this procedure. These are located on the Design tab. For more information about the options of other tabs, go to the Procedure Window chapter.

Design Tab

The Design tab contains most of the parameters and options that you will be concerned with.

Solve For

Solve For
This option specifies the parameter to be solved for from the other parameters. The parameters that may be selected are Precision (Confidence Interval Width), Confidence Level, or Sample Size.

One-Sided or Two-Sided Interval

Interval Type
Specify whether the confidence interval will be two-sided, one-sided with an upper limit, or one-sided with a lower limit.

Confidence

Confidence Level (1 - Alpha)
This option specifies one or more values of the proportion of confidence intervals (constructed with this same confidence level, sample size, etc.) that would contain the true odds ratio. The range of possible values is between 0 and 1. However, the range is usually between 0.5 and 1. Common choices are 0.9, 0.95, and 0.99. You should select a value that expresses the needs of this study. You can enter a single value such as 0.7 or a series of values such as 0.7 0.8 0.9 or 0.7 to 0.95 by 0.05.

Sample Size

N (Sample Size)
This option specifies the total number of observations in the sample. You may enter a single value or a list of values.

Precision

Distance from ORyx to Limit
In a one-sided confidence interval (sometimes called a confidence bound), this is the distance between the upper or lower confidence limit of ORyx and the value of ORyx. As the sample size increases, this value decreases and thus the interval becomes more precise. Since an odds ratio is typically between 0.2 and 10, it is reasonable that the value of this distance is also between 0.2 and 10. By definition, only positive values are possible. You can enter a single value such as 1 or a series of values such as 0.5 1 1.5 or 0.5 to 1.5 by 0.2.

Width of ORyx Confidence Interval
In a two-sided confidence interval, this is the difference between the upper and lower confidence limits of ORyx. As the sample size increases, this width decreases and thus the interval becomes more precise. Since an odds ratio is typically between 0.2 and 10, it is reasonable that the value of this width is also between 0.2 and 10. By definition, only positive values are possible. You can enter a single value such as 1 or a series of values such as 0.5 1 1.5 or 0.5 to 1.5 by 0.2.

Baseline Probability

P0 [Pr(Y = 1 | X = 0)]
This gives the value of the baseline probability of a response, P0, when the exposure is not present. P0 is a probability, so it must be between zero and one. It cannot be equal to P1.

Odds Ratio (Confidence Interval Term)

ORyx (Y, X Odds Ratio)
Specify one or more values of the odds ratio of Y and X, the measure of effect size around which the confidence interval is constructed. This is the ratio of the odds of Y = 1 given that the exposure X = 1 to the odds of Y = 1 given X = 0. That is, odds(Y=1 | X=1) / odds(Y=1 | X=0). Note that odds(A) = Pr(A) / Pr(Not A). You can enter a single value such as 1.5 or a series of values such as 1.5 2 2.5 or 0.5 to 0.9 by 0.1. The range of this parameter is 0 < ORyx < ∞ (typically, 0.1 < ORyx < 10).

Prevalence

Percent with X = 1
This is the percentage of the sample in which X = 1. It is often called the prevalence of X. You can enter a single value or a range of values. The permissible range is 1 to 99.

Example 1: Finding the Sample Size

A study is to be undertaken to study the association between the occurrence of a certain type of cancer (response variable) and the presence of a certain food in the diet. The baseline cancer event rate is 7%. The researchers want a sample size large enough to create a confidence interval with a width of 0.9. They assume that the actual odds ratio will be 2.0. The confidence level is set to 0.95. They also want to look at the sensitivity of the analysis to the specification of the odds ratio, so they also want to obtain the results for odds ratios of 1.75 and 2.25. The researchers assume that between 25% and 50% of the sample eat the food being studied, so they want results for both of these values.

Setup

This section presents the values of each of the parameters needed to run this example. First, from the PASS Home window, load the procedure. You may then make the appropriate entries as listed below, or open Example 1 by going to the File menu and choosing Open Example Template.

Option                                 Value
Design Tab
Solve For                              Sample Size
Interval Type                          Two-Sided
Confidence Level                       0.95
Width of ORyx Confidence Interval      0.90
P0 [Pr(Y=1 | X=0)]                     0.07
Odds Ratio (Odds1/Odds0)               1.75 2.0 2.25
Percent with X = 1                     25 50

Annotated Output

Click the Calculate button to perform the calculations and generate the following output.

Numeric Results

Numeric Results for Two-Sided Confidence Interval of ORyx

                                       Lower        Upper
Confidence           C.I.              Conf Limit   Conf Limit            Percent
Level         N      Width     ORyx    of ORyx      of ORyx      P0       X=1
0.950      3525     0.8999    1.750    1.357        2.257        0.070    25.0
0.950      2979     0.8999    1.750    1.357        2.257        0.070    50.0
0.950      4294     0.9000    2.000    1.600        2.500        0.070    25.0
0.950      3727     0.9000    2.000    1.600        2.500        0.070    50.0
0.950      5136     0.9000    2.250    1.845        2.745        0.070    25.0
0.950      4561     0.9000    2.250    1.845        2.745        0.070    50.0

Report Definitions

Logistic regression equation: Log(P/(1-P)) = β0 + β1 X, where P = Pr(Y = 1 | X) and X is binary.
Confidence Level is the proportion of studies with the same settings that produce a confidence interval that includes the true ORyx.
N is the sample size.
C.I. Width is the distance between the two boundaries of the confidence interval.
ORyx is the expected sample value of the odds ratio. It is the value of exp(β1).
C.I. of ORyx Lower Limit is the lower limit of the confidence interval of ORyx.
C.I. of ORyx Upper Limit is the upper limit of the confidence interval of ORyx.
P0 is the response probability at X = 0. That is, P0 = Pr(Y = 1 | X = 0).
Percent X=1 is the percent of the sample in which the exposure is 1 (present).
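The sample sizes in the Numeric Results table can be approximated with a simple search over N, as described in the Sample Size Calculations section. The sketch below is an illustration only: the function names are hypothetical, and it assumes that the reported N is the smallest sample size whose planning-stage interval width does not exceed the target, which is consistent with the widths of 0.8999 in the table but is not confirmed by the documentation.

```python
import math
from statistics import NormalDist


def or_ci_width(n, p0, odds_ratio, percent_x1, conf_level=0.95):
    """Planning-stage width of the two-sided Wald interval for ORyx."""
    px = percent_x1 / 100.0
    odds1 = odds_ratio * p0 / (1.0 - p0)
    p1 = odds1 / (1.0 + odds1)
    a = px * p1 * (1.0 - p1)
    b = (1.0 - px) * p0 * (1.0 - p0)
    # (2,2) element of the inverse information matrix, divided by n.
    se = math.sqrt((a + b) / (a * b) / n)
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf_level) / 2.0)
    log_or = math.log(odds_ratio)
    return math.exp(log_or + z * se) - math.exp(log_or - z * se)


def smallest_n(target_width, p0, odds_ratio, percent_x1,
               conf_level=0.95, n_max=1_000_000):
    """Linear search for the smallest n whose width is at most target_width."""
    for n in range(2, n_max + 1):
        if or_ci_width(n, p0, odds_ratio, percent_x1, conf_level) <= target_width:
            return n
    raise ValueError("target width not reached within n_max observations")


# Example 1 settings: width 0.90, P0 = 0.07, three odds ratios, two prevalences.
results = {
    (odds_ratio, pct): smallest_n(0.90, 0.07, odds_ratio, pct)
    for odds_ratio in (1.75, 2.0, 2.25)
    for pct in (25, 50)
}
for key, n in results.items():
    print(key, n)
```

A linear search is used because the width decreases monotonically in N and the sample sizes involved are small; a bisection search would serve equally well for larger problems.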

Summary Statements

A logistic regression of a binary response variable (Y) on a binary independent variable (X) with a sample size of 3525 observations (of which 25.0% are in the group X=1) at a 0.950 confidence level produces a two-sided confidence interval with a width of 0.8999. The baseline response rate, P0, is assumed to be 0.070 and the sample odds ratio is assumed to be 1.750. A Wald statistic is used to construct the confidence interval.

This report shows the sample size for each of the scenarios.

Plot Section

These plots show the sample size for the various values of the other parameters.

Example 2: Validation for a Binary Covariate

We could not find a direct validation result in the literature, so we decided to create one. This is easy to do in this case because we can create a dataset, analyze it with a statistical program such as NCSS, and then compare these results to those obtained with the above formulas in PASS.

Here is a summary of the data that were used to generate this example. The numeric values are counts of the number of items in the corresponding cell.

Group    X=1   X=0   Total
Y=1        8    26      34
Y=0       31    10      41
Total     39    36      75

Here is a printout from NCSS showing the estimated odds ratio (0.09926) and confidence interval (0.03419 to 0.28816).

Odds Ratios

Independent   Regression    Odds        Lower 95%    Upper 95%
Variable      Coefficient   Ratio       Confidence   Confidence
X             b(i)          Exp(b(i))   Limit        Limit
Intercept      0.95551      2.60000     1.25383      5.39149
(X="B")       -2.31006      0.09926     0.03419      0.28816

Note that the simple odds ratio can also be calculated directly from the above table using the definition of the odds ratio. The formula gives (8 x 10) / (31 x 26) = 80 / 806 = 0.09926, which matches the value in the printout. Note that the value of P0 is 26 / 36 = 0.72222222 and Percent with X = 1 is 100 x 39 / 75 = 52%.

Setup

This section presents the values of each of the parameters needed to run this example. First, from the PASS Home window, load the procedure. You may then make the appropriate entries as listed below, or open Example 2 by going to the File menu and choosing Open Example Template.

Option                         Value
Design Tab
Solve For                      Precision
Interval Type                  Two-Sided
Confidence Level               0.95
Sample Size                    75
P0 [Pr(Y=1 | X=0)]             0.72222222
Odds Ratio (Odds1/Odds0)       0.09925558
Percent with X = 1             52

Output

Click the Calculate button to perform the calculations and generate the following output.

Numeric Results

Numeric Results for Two-Sided Confidence Interval of ORyx

                                       Lower        Upper
Confidence           C.I.              Conf Limit   Conf Limit            Percent
Level         N      Width     ORyx    of ORyx      of ORyx      P0       X=1
0.950        75     0.2540    0.099    0.034        0.288        0.722    52.0

Using the above settings, PASS also calculates the confidence interval to be (0.034, 0.288), which leads to a C.I. Width of 0.254. This validates the procedure with an independent calculation.
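The validation can also be rechecked by hand from the 2x2 counts of this example. The sketch below is not taken from NCSS or PASS. It relies on a standard result for a logistic regression with a single binary covariate: the standard error of the fitted slope equals the square root of the sum of the reciprocals of the four cell counts. The variable names are illustrative.

```python
import math
from statistics import NormalDist

# Cell counts from the Example 2 table.
y1_x1, y1_x0 = 8, 26   # Y = 1 row
y0_x1, y0_x0 = 31, 10  # Y = 0 row

# Odds ratio from its definition: odds(Y=1 | X=1) / odds(Y=1 | X=0).
odds_ratio = (y1_x1 * y0_x0) / (y0_x1 * y1_x0)

# Intercept and slope of the fitted logistic regression.
intercept = math.log(y1_x0 / y0_x0)
slope = math.log(odds_ratio)

# Standard error of the slope: root of the summed reciprocal cell counts.
se_slope = math.sqrt(1 / y1_x1 + 1 / y1_x0 + 1 / y0_x1 + 1 / y0_x0)

# 95% Wald limits, exponentiated back to the odds ratio scale.
z = NormalDist().inv_cdf(0.975)
lower = math.exp(slope - z * se_slope)
upper = math.exp(slope + z * se_slope)

# Inputs used in the PASS setup for this example.
p0 = y1_x0 / (y1_x0 + y0_x0)
percent_x1 = 100 * (y1_x1 + y0_x1) / (y1_x1 + y1_x0 + y0_x1 + y0_x0)

print(round(odds_ratio, 5), round(lower, 5), round(upper, 5))
print(round(intercept, 5), round(p0, 8), percent_x1)
```

The printed values can be compared against the NCSS printout (coefficients 0.95551 and -2.31006, odds ratio 0.09926, limits 0.03419 and 0.28816) and against the setup entries for P0 and Percent with X = 1.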