samplesizelogisticcasecontrol Package

Similar documents
CGEN(Case-control.GENetics) Package

Quantitative genetic (animal) model example in R

Introduction to pepxmltab

The NanoStringDiff package

Unstable Laser Emission Vignette for the Data Set laser of the R package hyperspec

Visualising very long data vectors with the Hilbert curve Description of the Bioconductor packages HilbertVis and HilbertVisGUI.

pensim Package Example (Version 1.2.9)

Special features of phangorn (Version 2.3.1)

Disease Ontology Semantic and Enrichment analysis

Processing microarray data with Bioconductor

coenocliner: a coenocline simulation package for R

Example analysis of biodiversity survey data with R package gradientforest

Conditional variable importance in R package extendedforest

Infer mirna-mrna interactions using paired expression data from a single sample

Numerically Stable Frank Copula Functions via Multiprecision: R Package Rmpfr

Duration of Unemployment - Analysis of Deviance Table for Nested Models

An Analysis. Jane Doe Department of Biostatistics Vanderbilt University School of Medicine. March 19, Descriptive Statistics 1

A not so short introduction to for Epidemiology

Package matrixnormal

Solution Anti-fungal treatment (R software)

Using the tmle.npvi R package

Follow-up data with the Epi package

The gpca Package for Identifying Batch Effects in High-Throughput Genomic Data

Chapter 4. Multivariate Distributions. Obviously, the marginal distributions may be obtained easily from the joint distribution:

Renormalizing Illumina SNP Cell Line Data

Cohen s s Kappa and Log-linear Models

Motif import, export, and manipulation

R-companion to: Estimation of the Thurstonian model for the 2-AC protocol

A guide to Dynamic Transcriptome Analysis (DTA)

(c) Interpret the estimated effect of temperature on the odds of thermal distress.

Package SEMModComp. R topics documented: February 19, Type Package Title Model Comparisons for SEM Version 1.0 Date Author Roy Levy

The mvtnorm Package. R topics documented: July 24, Title Multivariate Normal and T Distribution. Version 0.8-1

Chapter 4 Exercises 1. Data Analysis & Graphics Using R Solutions to Exercises (December 11, 2006)

Logistic regression: Why we often can do what we think we can do. Maarten Buis 19 th UK Stata Users Group meeting, 10 Sept. 2015

(Where does Ch. 7 on comparing 2 means or 2 proportions fit into this?)

First steps of multivariate data analysis

Analysis plan, data requirements and analysis of HH events in DK

Basic Medical Statistics Course

Lecture 12: Effect modification, and confounding in logistic regression

Package PoisNonNor. R topics documented: March 10, Type Package

Ch 6: Multicategory Logit Models

Suppose that we are concerned about the effects of smoking. How could we deal with this?

Tests for the Odds Ratio in a Matched Case-Control Design with a Quantitative X

ChemmineR-V2: Analysis of Small Molecule and Screening Data

MACAU 2.0 User Manual

Introduction to Computational Finance and Financial Econometrics Probability Theory Review: Part 2

Chapter 3: Testing alternative models of data

Linear regression models in matrix form

Multivariate and Multivariable Regression. Stella Babalola Johns Hopkins University

Winding journey down (in Gibbs energy)

MALA versus Random Walk Metropolis Dootika Vats June 4, 2017

Multivariate Analysis Homework 1

Introduction to lnmle: An R Package for Marginally Specified Logistic-Normal Models for Longitudinal Binary Data

Package sscor. January 28, 2016

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

Correlation and regression

Chapter 11 Sampling Distribution. Stat 115

Package plw. R topics documented: May 7, Type Package

Case-control studies C&H 16

R STATISTICAL COMPUTING

Review of Multinomial Distribution If n trials are performed: in each trial there are J > 2 possible outcomes (categories) Multicategory Logit Models

Introduction to mtm: An R Package for Marginalized Transition Models

Homework 1 Solutions

Matched Pair Data. Stat 557 Heike Hofmann

STAT 5500/6500 Conditional Logistic Regression for Matched Pairs

Leslie matrices and Markov chains.

Visualizing the Multivariate Normal, Lecture 9

Random Variables and Their Distributions

Generalized linear models

Confidence Intervals for the Odds Ratio in Logistic Regression with Two Binary X s

Package MultiRNG. January 10, 2018

Confidence Intervals for the Odds Ratio in Logistic Regression with One Binary X

Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification,

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data

Package mmm. R topics documented: February 20, Type Package

Continuous Time Analysis of Panel Data: An Illustration of the Exact Discrete Model (EDM)

Winding journey down (in Gibbs energy)

Confounding and effect modification: Mantel-Haenszel estimation, testing effect homogeneity. Dankmar Böhning

A comparison of 5 software implementations of mediation analysis

Introduction to R, Part I

Section IX. Introduction to Logistic Regression for binary outcomes. Poisson regression

Unit 11: Multiple Linear Regression

Chapter 4 Multiple Random Variables

Stat 5101 Lecture Notes

2.1 Linear regression with matrices

Biost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation

Logic Regression: Biological Motivation Cyclic Gene Study

Logistic Regression Models for Multinomial and Ordinal Outcomes

Bayesian methods for missing data: part 1. Key Concepts. Nicky Best and Alexina Mason. Imperial College London

Other hypotheses of interest (cont d)

First Year Examination Department of Statistics, University of Florida

Logistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University

Package mvtnorm. January 27, 2018

STAT 7030: Categorical Data Analysis

Latent random variables

Steno 2 study 20 year follow-up end of 2014

Testing for group differences in brain functional connectivity

Textbook Examples of. SPSS Procedure

Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011)

Course information: Instructor: Tim Hanson, Leconte 219C, phone Office hours: Tuesday/Thursday 11-12, Wednesday 10-12, and by appointment.

Transcription:

samplesizelogisticcasecontrol Package January 31, 2017 > library(samplesizelogisticcasecontrol) Random data generation functions Let X 1 and X 2 be two variables with a bivariate normal ditribution with mean (0, 0) and covariance [1, 0.5; 0.5, 2]. X 2 corresponds to the exposure of interest. Let X 3 = X 1 X 2 and define functions for generating random data from the distribution of (X 1, X 2 ) and (X 1, X 2, X 3 ). > mymvn <- function(n) { + mu <- c(0, 0) + sigma <- matrix(c(1, 0.5, 0.5, 2), byrow=true, nrow=2, ncol=2) + dat <- rmvnorm(n, mean=mu, sigma=sigma) + dat + } > myf <- function(n) { + dat <- mymvn(n) + dat <- cbind(dat, dat[, 1]*dat[, 2]) + dat + } Generate some data > data <- myf(200) > colnames(data) <- paste("x", 1:3, sep="") > data[1:5, ] X1 X2 X3 [1,] 0.2192626-0.5025393-0.1101881 [2,] 0.7680064-0.8420339-0.6466874 [3,] -0.1152033 1.0586934-0.1219650 [4,] 0.1691127-1.0049474-0.1699494 [5,] -0.4336896-1.2477247 0.5411253 Examples of univariate calculations We have the logistic model logit = µ + βx and are testing β = 0. Suppose the disease prevalence is 0.01, the log-odds ratio for the exposure X is 0.26 and that the exposure follows a Bernoulli(p) distribution with p = 0.15. 1

> prev <- 0.01 > logor <- 0.26 > p <- 0.15 Compute the sample sizes > samplesize_binary(prev, logor, probxeq1=p) [1] 4472 [1] 4498 [1] 4467 [1] 4441 The same result can be obtained assuming X is ordinal and passing in the 2 probabilities P (X = 0) and P (X = 1). > samplesize_ordinal(prev, logor, probx=c(1-p, p)) [1] 4472 [1] 4498 [1] 4467 [1] 4440 Let X be ordinal with 3 levels. The vector being passed into the probx argument below is (P (X = 0), P (X = 1), P (X = 2)). > samplesize_ordinal(prev, logor, probx=c(0.4, 0.35, 0.25)) [1] 975 [1] 985 [1] 973 [1] 963 2

Now let the exposure X be N(0, 1). > samplesize_continuous(prev, logor) [1] 625 [1] 644 [1] 621 [1] 602 For the univariate case with continuous exposure, we can specify the probability density function of X in different ways. Consider X to have a chi-squared distribution with 1 degree of freedom. Note that the domain of a chi-squared pdf is from 0 to infinity, and that the var(x) = 2. > samplesize_continuous(prev, logor, distf="dchisq(x, 1)", + distf.support=c(0,inf), distf.var=2) [1] 170 [1] 242 [1] 168 [1] 113 > f <- function(x) {dchisq(x, 1)} > samplesize_continuous(prev, logor, distf=f, distf.support=c(0, Inf), + distf.var=2) [1] 170 [1] 242 [1] 168 [1] 113 3

If we do not set distf.var, then the variance of X will be approximated by numerical integration and could yield slightly different results. > samplesize_continuous(prev, logor, distf="dchisq(x, 1)", distf.support=c(0,inf)) [1] 170 [1] 242 [1] 168 [1] 113 Let X have the distribution defined by column X 1 in data. > samplesize_data(prev, logor, data[, "X1", drop=false]) [1] 608 [1] 626 [1] 604 [1] 586 Examples with confounders We have the logit model logit = µ + β 1 X 1 + β 2 X 2 and are interested in testing β 2 = 0. Here we must have log-odds ratios for X 1 and X 2, and we will use the distribution function mymvn defined above to generate 200 random samples. Note that logor[1] corresponds to X 1 and logor[2] corresponds to X 2. > logor <- c(0.1, 0.13) > samplesize_data(prev, logor, mymvn(200)) [1] 1554 [1] 1569 [1] 1550 4

[1] 1535 Now we would like to perform a test of interaction, β 3 = 0, where logit = µ + β 1 X 1 + β 2 X 2 + β 3 X 3 and X 3 = X 1 X 2. The vector of log-odds ratios must be of length 3 and in the same order as (X 1, X 2, X 3 ). > logor <- c(0.1, 0.15, 0.11) > samplesize_data(prev, logor, myf(1000)) [1] 1330 [1] 1371 [1] 1319 [1] 1279 Pilot data from a file Suppose we want to compute sample sizes for a case-control study where we have pilot data from a previous study. The pilot data is stored in the file: > file <- system.file("sampledata", "data.txt", package="samplesizelogisticcasecontrol") > file [1] "/tmp/rtmpbcalxu/rinst5bc92229fd1/samplesizelogisticcasecontrol/sampledata/data.txt" Here the exposure variable is Treatment, and Gender Male is a dummy variable for the confounder gender. We will use the data from only the controls and define a new variable of interest which is the interaction of gender and treatment. In our model, both gender and treatment will be confounders. First, read in the data. > data <- read.table(file, header=1, sep="\t") Create the interaction variable > data[, "Interaction"] <- data[, "Gender_Male"]*data[, "Treatment"] > data[1:5, ] Casecontrol Gender_Male Treatment Interaction 1 0 0 0 0 2 1 1 0 0 3 0 0 0 0 4 1 1 1 1 5 1 1 0 0 5

Now subset the data to use only the controls > temp <- data[, "Casecontrol"] %in% 0 > data <- data[temp, ] The data that gets passed in should only contain the columns that will be used in the analysis with the variable of interest being the last column. > vars <- c("gender_male", "Treatment", "Interaction") > data <- data[, vars] Define the log-odds ratios for gender, treatment, and the interaction of gender and treatment. The order of these log-odds ratios must match the order of the columns in the data. > logor <- c(0.1, 0.13, 0.27) Compute the sample sizes > samplesize_data(prev, logor, data) [1] 9402 [1] 9404 [1] 9403 [1] 9400 Note that the same results can be obtained by not reading in the data and creating a new interaction variable, but by setting the input argument of data to be of type file.list. > data.list <- list(file=file, header=1, sep="\t", + covars=c("gender_male", "Treatment"), + exposure=c("gender_male", "Treatment")) > data.list$subsetdata <- list(list(var="casecontrol", operator="%in%", value=0)) > samplesize_data(prev, logor, data.list) [1] 9402 [1] 9404 [1] 9403 [1] 9400 6

Session Information > sessioninfo() R version 3.3.0 (2016-05-03) Platform: x86_64-pc-linux-gnu (64-bit) Running under: CentOS release 6.6 (Final) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grdevices utils datasets methods base other attached packages: [1] samplesizelogisticcasecontrol_0.0.6 mvtnorm_1.0-5 loaded via a namespace (and not attached): [1] tools_3.3.0 7