
Web-based Supplementary Materials for "Calibrating Sensitivity Analyses to Observed Covariates in Observational Studies" by Hsu and Small

Jesse Y. Hsu 1,2,3 and Dylan S. Small 1

1 Department of Statistics, The Wharton School, University of Pennsylvania, 400 Jon M. Huntsman Hall, 3730 Walnut Street, Philadelphia, Pennsylvania, U.S.A.
2 Center for Outcomes Research, The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, U.S.A.
3 hsu9@wharton.upenn.edu

A Details on Computation of p_i(u)

In this section, we provide details on the computation of p_i(u) and p_i^+ discussed in Section 2.3; see Gastwirth et al. (1998) and Gastwirth et al. (2000) for more details. Under model (2), in the case of pair matching, the conditional distribution of the treatment assignment within pair i is

$$
\pi_i = \Pr(Z_{i1}=1, Z_{i2}=0 \mid \tilde{y}, \mathcal{Z}, X)
= \frac{\Pr(Z_{i1}=1 \mid \tilde{y}, \mathcal{Z}, X)\,\Pr(Z_{i2}=0 \mid \tilde{y}, \mathcal{Z}, X)}
{\Pr(Z_{i1}=1 \mid \tilde{y}, \mathcal{Z}, X)\,\Pr(Z_{i2}=0 \mid \tilde{y}, \mathcal{Z}, X)
+ \Pr(Z_{i1}=0 \mid \tilde{y}, \mathcal{Z}, X)\,\Pr(Z_{i2}=1 \mid \tilde{y}, \mathcal{Z}, X)}
= \frac{\exp(\gamma u_{i1})}{\exp(\gamma u_{i1}) + \exp(\gamma u_{i2})}
= \frac{1}{1+\exp\{\gamma(u_{i2}-u_{i1})\}}. \quad (A.1)
$$

Similarly, under model (3), the conditional distribution of the response within pair i is

$$
\lambda_i = \Pr(y_{i1}=y_{i(2)}, y_{i2}=y_{i(1)} \mid \tilde{y}, \mathcal{Z}, X)
= \frac{\Pr(y_{i1}=y_{i(2)} \mid \tilde{y}, \mathcal{Z}, X)\,\Pr(y_{i2}=y_{i(1)} \mid \tilde{y}, \mathcal{Z}, X)}
{\Pr(y_{i1}=y_{i(2)} \mid \tilde{y}, \mathcal{Z}, X)\,\Pr(y_{i2}=y_{i(1)} \mid \tilde{y}, \mathcal{Z}, X)
+ \Pr(y_{i1}=y_{i(1)} \mid \tilde{y}, \mathcal{Z}, X)\,\Pr(y_{i2}=y_{i(2)} \mid \tilde{y}, \mathcal{Z}, X)}
= \frac{\exp\{\delta(y_{i(2)}u_{i1}+y_{i(1)}u_{i2})\}}
{\exp\{\delta(y_{i(2)}u_{i1}+y_{i(1)}u_{i2})\} + \exp\{\delta(y_{i(1)}u_{i1}+y_{i(2)}u_{i2})\}}
= \frac{1}{1+\exp\{\delta(y_{i(2)}-y_{i(1)})(u_{i2}-u_{i1})\}}. \quad (A.2)
$$

For pair i, the chance that the treated subject has the higher response under the null hypothesis is

$$
p_i(u) = \pi_i\lambda_i + (1-\pi_i)(1-\lambda_i)
= \frac{\exp\{\gamma(u_{i2}-u_{i1})\}\exp\{\delta(y_{i(2)}-y_{i(1)})(u_{i2}-u_{i1})\}+1}
{[1+\exp\{\gamma(u_{i2}-u_{i1})\}][1+\exp\{\delta(y_{i(2)}-y_{i(1)})(u_{i2}-u_{i1})\}]}. \quad (A.3)
$$
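As a quick numerical check of (A.1)-(A.3), the sketch below (in Python rather than the paper's R, purely for illustration; the function names are ours) computes p_i(u) from pi_i and lambda_i and compares it with the closed form in (A.3); it also evaluates p_i(u) at u_i1 = 0 and u_i2 = 1, the point at which (A.4) says the maximum is attained.

```python
import math

def pair_probs(gamma, delta, u1, u2, y_lo, y_hi):
    """Return (pi_i, lambda_i, p_i(u)) for one matched pair, following (A.1)-(A.3);
    y_lo and y_hi play the roles of y_i(1) <= y_i(2)."""
    pi_i = 1.0 / (1.0 + math.exp(gamma * (u2 - u1)))                   # (A.1)
    lam_i = 1.0 / (1.0 + math.exp(delta * (y_hi - y_lo) * (u2 - u1)))  # (A.2)
    p_i = pi_i * lam_i + (1.0 - pi_i) * (1.0 - lam_i)                  # (A.3), first form
    return pi_i, lam_i, p_i

def p_i_closed_form(gamma, delta, u1, u2, y_lo, y_hi):
    """Closed form on the right-hand side of (A.3)."""
    a = math.exp(gamma * (u2 - u1))
    b = math.exp(delta * (y_hi - y_lo) * (u2 - u1))
    return (a * b + 1.0) / ((1.0 + a) * (1.0 + b))

gamma, delta = math.log(2.0), math.log(3.0)   # illustrative values, not from the paper
p = pair_probs(gamma, delta, u1=0.3, u2=0.9, y_lo=0.0, y_hi=1.0)[2]
# (A.4): the maximum of p_i(u) is attained at u_i1 = 0, u_i2 = 1
p_max = pair_probs(gamma, delta, u1=0.0, u2=1.0, y_lo=0.0, y_hi=1.0)[2]
# here p_max = (2*3 + 1) / ((1 + 2)(1 + 3)) = 7/12
```

With a binary response (y_i(2) - y_i(1) = 1), p_max reduces to exactly the worst-case probability used later for McNemar's statistic.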

The maximum value of p_i(u), denoted p_i^+, is obtained by setting u_i1 = 0 and u_i2 = 1:

$$
p_i^+ = p_i(u_{i1}=0, u_{i2}=1)
= \frac{\exp(\gamma)\exp\{\delta(y_{i(2)}-y_{i(1)})\}+1}
{\{1+\exp(\gamma)\}[1+\exp\{\delta(y_{i(2)}-y_{i(1)})\}]}. \quad (A.4)
$$

B Empirical Results

In this section, we provide additional empirical results for the NHANES data, including a simultaneous sensitivity analysis, an approximation of Ω.05, calibration of a simultaneous sensitivity analysis to the observed covariates, and the sensitivity of estimates to the choice of (Γ, Δ) ∈ Ω.05.

B.1 Simultaneous Sensitivity Analysis to Hidden Bias

One type of sensitivity analysis, the simultaneous sensitivity analysis, uses two sensitivity parameters, Γ and Δ, to measure the degree of hidden bias due to an unobserved covariate in an observational study (Gastwirth et al., 1998). Suppose there is an unobserved covariate u that lies between 0 and 1. One sensitivity parameter, Γ, relates u to treatment; namely, the odds ratio of receiving treatment for two subjects with different values of u is at most Γ. The other parameter, Δ, relates u to response; namely, the odds ratio of having the higher response for two subjects with different values of u is at most Δ. The simultaneous sensitivity analysis finds the maximum p-value over all distributions of u for given values of Γ and Δ. Web Table 1 gives the simultaneous sensitivity analysis for the NHANES data.

[Web Table 1 about here.]

B.2 Approximation of Ω.05

Because the curve Ω.05 does not have a closed form, we suggest a grid search to approximate Ω.05. First, we expand the values of (Γ, Δ)

from 1.01 to 16 in increments of 0.01 and create a grid. Second, for each combination of (Γ, Δ) in the grid, we calculate the corresponding maximum p-value for McNemar's test statistic. Finally, for each Γ, we look for the Δ such that the maximum p-value is close to the significance level, say 0.05. In Web Table 2, we list 100 randomly selected values of (Γ, Δ) and their maximum p-values from the collection of (Γ, Δ) ∈ Ω.05, which is used to draw the curves in all figures.

[Web Table 2 about here.]

B.3 Calibrating the Sensitivity Analysis to Observed Covariates

Using the optim function in R, we maximize the two log-likelihood functions, $\log L\{\theta; Z, X, \gamma = \log(2.21)\}$ and $\log L\{\phi; y^{(0)}, X, \delta = \log(2.21)\}$, in (12) and (14) to obtain $\hat{\theta}_{\gamma}$ and $\hat{\phi}_{\delta}$, listed in the first two columns of Web Table 3. The last two columns of Web Table 3 show $\hat{\Theta}_{\gamma} = \exp(\hat{\theta}_{\gamma})$ and $\hat{\Phi}_{\delta} = \exp(\hat{\phi}_{\delta})$, which are the estimated effects of the observed covariates on smoking and high blood lead. As an example of how to read Web Table 3, if there are two subjects within a matched set and the subjects have the same values of all the observed covariates and the unobserved covariate except that one is male and the other is female, then the male subject has 2.36 times as high odds of smoking and 5.32 times as high odds of having high blood lead.

[Web Table 3 about here.]

B.4 Sensitivity of Estimates to the Choice of (Γ, Δ) ∈ Ω.05 for the NHANES Data

In this section, we examine how sensitive our proposed method is to different choices of (Γ, Δ) ∈ Ω.05 for the NHANES data. The general conjecture is discussed in Web Section C.2. Specifically, we empirically investigate the behavior of the estimates obtained from the

log-likelihood functions in (12) and (14) given different values of (Γ, Δ) ∈ Ω.05 (see Section B.2). In Web Table 2, we list 100 randomly selected values of (Γ, Δ) ∈ Ω.05. In this section, we pick 4 out of Web Table 2 along with the default choice of (Γ, Δ) = (2.21, 2.21) and obtain the corresponding estimates for the observed covariates given these 5 selected (Γ, Δ), which are (1.44, 8.52), (1.61, 4.12), (2.21, 2.21), (6.95, 1.47) and (11.28, 1.41). Web Figure 1 shows the calibration of the simultaneous sensitivity analysis to the observed covariates age, income-to-poverty level, gender, education and race for the 5 selected values of (Γ, Δ) ∈ Ω.05 in Web Table 2. The estimates are not sensitive to the choice of (Γ, Δ) ∈ Ω.05 for age, income-to-poverty level, gender and race. For education, the estimates reveal some variation. Even though these estimates vary in one dimension, mostly along the Γ-axis, at different values of (Γ, Δ) ∈ Ω.05, the conclusion remains consistent; i.e., the estimates fall in the shaded area where the maximum p-value for McNemar's test statistic is less than 0.05. Based on this empirical investigation, the calibration of the simultaneous sensitivity analysis to observed covariates is not sensitive to the choice of (Γ, Δ) ∈ Ω.05. Therefore, we suggest a default choice of Γ = Δ = 2.21 for the NHANES data.

[Web Figure 1 about here.]

C Simulation Studies

We consider the following setup for simulation studies such that the simulated data are similar to, but simpler than, the data from our motivating example described in Section 1.2. Let X1 be a continuous covariate (e.g., standardized age) with $X_{1ij} \sim N(0, 0.25)$, and let X2 be a binary covariate (e.g., male gender) with $X_{2ij} \sim \mathrm{Bernoulli}(0.5)$. Both X1 and X2 are observed. Let u denote a binary unobserved covariate such that $u_{ij} \sim \mathrm{Bernoulli}(0.5)$. All of X1, X2 and u have the same variance of 0.25. Consider the following data generating

processes for the binary treatment Z_ij and the binary response under control y_ij^(0):

$$
Z_{ij} \sim \mathrm{Bernoulli}(p_{ij}), \quad
p_{ij} = \frac{\exp(\theta_0+\theta_1 x_{1ij}+\theta_2 x_{2ij}+\gamma u_{ij})}
{1+\exp(\theta_0+\theta_1 x_{1ij}+\theta_2 x_{2ij}+\gamma u_{ij})}, \quad (C.5)
$$

and

$$
y_{ij}^{(0)} \sim \mathrm{Bernoulli}(\pi_{ij}), \quad
\pi_{ij} = \frac{\exp(\phi_0+\phi_1 x_{1ij}+\phi_2 x_{2ij}+\delta u_{ij})}
{1+\exp(\phi_0+\phi_1 x_{1ij}+\phi_2 x_{2ij}+\delta u_{ij})}. \quad (C.6)
$$

Parameters in (C.5) and (C.6) are set as follows: (θ0, θ1, θ2) = (−2, 0.5, 0.5), (φ0, φ1, φ2) = (−3, 0.2, 0.2), and (γ, δ) = (0.5, 0.2). The sample size for each replicate is 3,340. Suppose the treatment has a nonnegative effect, $y_{ij}^{(1)} \geq y_{ij}^{(0)}$: if $y_{ij}^{(0)} = 1$, then $y_{ij}^{(1)} = 1$; if $y_{ij}^{(0)} = 0$, then $y_{ij}^{(1)} = 0$ or $y_{ij}^{(1)} = 1$. We assume the attributable effect, the effect of the treatment on the treated subjects, is $A = \sum_{i=1}^{I}\sum_{j=1}^{n_i} Z_{ij}(y_{ij}^{(1)}-y_{ij}^{(0)})$. The parameter A is the number of treated responses actually caused by exposure to the treatment. In Section C.1, we compare the estimates obtained from two methods when the $y_{ij}^{(0)}$ in the log-likelihood function in (13) are only partially observed. In Section C.2, we examine the sensitivity of the estimates obtained from (14) to the choice of (Γ, Δ) ∈ Ω.05, as discussed in Section 3.2.

C.1 Sensitivity of Estimates to the Use of a Sub-Sample

In this simulation, we compare the estimates obtained from two methods when the $y_{ij}^{(0)}$ in the log-likelihood function in (13) are only partially observed. Following the discussion in Section 3.1, method I replaces the unobserved $y_{ij}^{(0)}$ with $y_{ij}^{(1)}$ for those who have $Z_{ij} = 1$, and method II uses data only from subjects whose $Z_{ij} = 0$. We consider the following four possible treatment effects: A ∈ {0, 50, 75, 100}, where A = 0 represents the null treatment effect. Web Table 4 shows the averaged estimates for the parameters φ1 and φ2 in (C.6) and their standard errors from 1,000 replicates. When Fisher's sharp null hypothesis is true (i.e., A = 0), both methods provide good estimates, although the estimates from method II lose some efficiency due to the use of a sub-sample.
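The data generating processes (C.5) and (C.6) are straightforward to reproduce. The sketch below (in Python rather than the paper's R, purely for illustration; the helper names are ours) draws one replicate of size 3,340 under the stated parameter values.

```python
import math
import random

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def simulate_replicate(n=3340, seed=1,
                       theta=(-2.0, 0.5, 0.5), phi=(-3.0, 0.2, 0.2),
                       gamma=0.5, delta=0.2):
    """Draw one replicate of (x1, x2, u, z, y0) from (C.5) and (C.6)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 0.5)       # X1 ~ N(0, 0.25): sd = 0.5
        x2 = int(rng.random() < 0.5)   # X2 ~ Bernoulli(0.5), observed
        u = int(rng.random() < 0.5)    # unobserved covariate u ~ Bernoulli(0.5)
        p = expit(theta[0] + theta[1] * x1 + theta[2] * x2 + gamma * u)  # (C.5)
        pi = expit(phi[0] + phi[1] * x1 + phi[2] * x2 + delta * u)       # (C.6)
        z = int(rng.random() < p)      # treatment
        y0 = int(rng.random() < pi)    # response under control
        rows.append((x1, x2, u, z, y0))
    return rows

sim = simulate_replicate()
treated_rate = sum(r[3] for r in sim) / len(sim)       # roughly expit(-1.5), about 0.18
control_resp_rate = sum(r[4] for r in sim) / len(sim)  # roughly expit(-2.8), about 0.06
```

A treatment effect A > 0 would then be imposed by flipping y0 = 0 to y1 = 1 for A randomly chosen treated subjects, consistent with the nonnegative-effect assumption above.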
When the alternative hypothesis is true (i.e., A = 50, 75, or

100), the estimates from method I start deviating from the true parameters as the treatment effect increases, whereas method II still provides consistent estimates. Based on the results of this set of simulation studies, we suggest the use of method II in our paper.

[Web Table 4 about here.]

C.2 Sensitivity of Estimates to the Choice of (Γ, Δ) ∈ Ω.05

In this section, we examine through simulation how sensitive the estimates for θ and φ are to the choice of (Γ, Δ) ∈ Ω.05. For each replicate, we choose 5 different values of (Γ, Δ) ∈ Ω.05, including the suggested default choice of Γ = Δ, and obtain 5 sets of estimates for θ1 and θ2 in (C.5) and φ1 and φ2 in (C.6) following the proposed method discussed in Section 3. We consider one treatment effect, A = 100, and generate 10 data sets following the setup described at the beginning of Section C. Web Figure 2 shows the calibration of the simultaneous sensitivity analysis to (a) the observed covariate X1 and (b) the observed covariate X2, at (Γ, Δ) ∈ Ω.05. Ten solid curves represent values of (Γ, Δ) ∈ Ω.05 from the 10 simulated data sets. For each simulated data set, five symbols (square, circle, diamond, up-triangle, and down-triangle) represent 5 different arbitrary values of (Γ, Δ) ∈ Ω.05 (on the curve) and their corresponding estimates for exp(θp) and exp(φp) for p = 1, 2 (off the curve); i.e., the calibration of the unobserved u to the observed Xp for p = 1, 2. For both the continuous (X1) and binary (X2) covariates, because the estimates are located close to each other among all 10 simulated data sets, the calibration of the simultaneous sensitivity analysis to observed covariates is not sensitive to the choice of (Γ, Δ) ∈ Ω.05. Therefore, in our paper, we suggest choosing a default value of Γ = Δ.

[Web Figure 2 about here.]
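The grid-search approximation of Ω.05 used throughout Sections B and C can also be sketched independently of the paper's R code. The Python illustration below (function names are ours) uses the worst-case discordant-pair probability p = π̄θ̄ + (1 − π̄)(1 − θ̄) from Gastwirth et al. (1998), with the NHANES counts I = 68 discordant pairs and T = 46 from the paper's Part 4 code, and steps Δ upward on a 0.01 grid until the maximum p-value reaches 0.05; for Γ = 2.21 the boundary Δ lands near 2.21, consistent with the suggested default Γ = Δ = 2.21.

```python
import math

def mcnemar_max_pvalue(I, T, Gamma, Delta):
    """Worst-case one-sided p-value for McNemar's statistic under (Gamma, Delta):
    each of the I discordant pairs is 'treated subject has the higher response'
    with probability at most p = pi_bar*theta_bar + (1-pi_bar)*(1-theta_bar)."""
    pi_bar = Gamma / (1.0 + Gamma)
    theta_bar = Delta / (1.0 + Delta)
    p = pi_bar * theta_bar + (1.0 - pi_bar) * (1.0 - theta_bar)
    # P(Bin(I, p) >= T), computed exactly
    return sum(math.comb(I, t) * p**t * (1.0 - p)**(I - t) for t in range(T, I + 1))

def boundary_delta(I, T, Gamma, alpha=0.05, step=0.01, dmax=16.0):
    """Smallest Delta on a 0.01 grid whose maximum p-value reaches alpha:
    one column of the grid search sketched in Web Section B.2."""
    D = 1.0
    while D <= dmax:
        if mcnemar_max_pvalue(I, T, Gamma, D) >= alpha:
            return round(D, 2)
        D += step
    return None  # this Gamma stays significant for every Delta <= dmax

I_pairs, T_pairs = 68, 46  # discordant-pair counts from the NHANES pair matching
d_boundary = boundary_delta(I_pairs, T_pairs, Gamma=2.21)
```

Repeating boundary_delta over a grid of Γ values traces out the approximation to the Ω.05 curve on which the five plotting symbols in Web Figure 2 sit.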

D Software

In this section, we provide the R code for the calibration of a simultaneous sensitivity analysis to the observed covariates using the NHANES data in the paper.

##############################################################################
# PART 1: Data (ref. ####
d <- read.table("nhanes2008.lead.smoking.txt", header = TRUE)
attach(d)
x <- cbind(age, male, edu.lt9, edu.9to11, edu.hischl, edu.somecol, edu.unknown,
           income, income.mis, black, mexicanam, otherhispan, otherrace)

##############################################################################
# PART 2: Pair Matching ####
library(optmatch)

# Matching functions
smahal <- function(z, X){
  # Calculate the rank-based Mahalanobis distance
  X <- as.matrix(X)
  n <- dim(X)[1]
  rownames(X) <- 1:n
  k <- dim(X)[2]
  m <- sum(z)
  for (j in 1:k) X[, j] <- rank(X[, j])
  cv <- cov(X)
  vuntied <- var(1:n)
  rat <- sqrt(vuntied/diag(cv))
  cv <- diag(rat) %*% cv %*% diag(rat)
  out <- matrix(NA, m, n - m)
  Xc <- X[z == 0, ]
  Xt <- X[z == 1, ]
  rownames(out) <- rownames(X)[z == 1]
  colnames(out) <- rownames(X)[z == 0]
  library(MASS)
  icov <- ginv(cv)
  for (i in 1:m) out[i, ] <- mahalanobis(Xc, Xt[i, ], icov, inverted = TRUE)
  out
}

addcaliper <- function(dmat, z, p, calipersd = 0.2, penalty = 1000){
  # add a propensity score caliper to a distance matrix
  sdp <- sd(p)
  adif <- abs(outer(p[z == 1], p[z == 0], "-"))
  adif <- (adif - (calipersd * sdp)) * (adif > (calipersd * sdp))
  dmat <- dmat + adif * penalty
  dmat
}

pair.vector <- function(pairmatchvec, treatment){
  # find out who is the matched treated and who is the matched control
  # NOTE: treated subjects need to be ordered at the beginning
  pairs.short <- substr(pairmatchvec, start = 3, stop = 10)
  pairsnumeric <- as.numeric(pairs.short)
  notreated <- sum(treatment)
  pairsvec <- rep(0, notreated)
  for (i in 1:notreated) {
    temp <- (pairsnumeric == i) * seq(1, length(pairsnumeric), 1)
    pairsvec[i] <- sum(temp, na.rm = TRUE) - i
  }
  pairsvec
}

# propensity score model
propscore.model <- glm(smoking ~ x, family = binomial, x = TRUE)
Xmat <- propscore.model$x[, -1]

colnames(Xmat) <- colnames(x)
propscore <- predict(propscore.model, type = "response")
# rank-based Mahalanobis distance
distmat <- smahal(smoking, Xmat)
# add caliper to distance matrix
distmat2 <- addcaliper(distmat, smoking, propscore)

##### pair matching #####
pairmatchvec <- pairmatch(distmat2)
# Create a vector saying which control unit each treated unit is matched to
# NOTE: treated subjects need to be ordered at the beginning
pairsvec <- pair.vector(pairmatchvec, smoking)

# prepare to put matched data together
matched.id <- seq(1, length(which(smoking == 1)))
Xmat <- as.data.frame(cbind(Xmat, edu.college, white))
Xmat[which(Xmat$income.mis == 1), "income"] <- NA
Xnames <- names(Xmat)
# smokers
id.s <- id[which(smoking == 1)]
Xmat.matched.smoker <- Xmat[which(smoking == 1), ]
colnames(Xmat.matched.smoker) <- paste(Xnames, sep = "", ".s")
lead.s <- lead[which(smoking == 1)]
# non-smokers
id.u <- id[pairsvec]
Xmat.matched.nonsmoker <- Xmat[pairsvec, ]
colnames(Xmat.matched.nonsmoker) <- paste(Xnames, sep = "", ".u")
lead.u <- lead[pairsvec]
# put matched data (smokers and non-smokers) together
pair.match.data <- as.data.frame(cbind(matched.id, id.s, Xmat.matched.smoker, lead.s,
                                       id.u, Xmat.matched.nonsmoker, lead.u))

##############################################################################
# PART 3: Covariate Balance ####
maketable.1 <- function(){
  # Table 1 in the paper
  ##### Standardized differences #####
  Xnames.stddif <- c("age", "income", "income.mis", "male", "edu.lt9", "edu.9to11",
                     "edu.hischl", "edu.somecol", "edu.college", "edu.unknown",
                     "white", "black", "mexicanam", "otherhispan", "otherrace")
  sd.s <- apply(Xmat[which(smoking == 1), ], 2, sd, na.rm = TRUE)
  sd.u <- apply(Xmat[which(smoking == 0), ], 2, sd, na.rm = TRUE)
  ### Before matching ###
  STD.DIFF.before <- NULL
  for (i in 1:length(Xnames.stddif)){
    name <- Xnames.stddif[i]
    x.s <- Xmat[which(smoking == 1), name]
    x.u <- Xmat[which(smoking == 0), name]
    if (name %in% c("age", "income")){
      mu.s <- mean(x.s, na.rm = TRUE)
      mu.u <- mean(x.u, na.rm = TRUE)
      std.diff <- (mu.s - mu.u)/sqrt((sd.s[name]^2 + sd.u[name]^2)/2)
      STD.DIFF.before <- rbind(STD.DIFF.before, data.frame(cbind(mu.s, mu.u, std.diff)))
    } else {
      mu.s <- mean(x.s)*100
      mu.u <- mean(x.u)*100
      std.diff <- (mu.s/100 - mu.u/100)/sqrt((sd.s[name]^2 + sd.u[name]^2)/2)
      STD.DIFF.before <- rbind(STD.DIFF.before, data.frame(cbind(mu.s, mu.u, std.diff)))
    }
  }
  lead.before <- cbind(round(mean(lead[smoking == 1] >= 5)*100, 1),
                       round(mean(lead[smoking == 0] >= 5)*100, 1))
  colnames(lead.before) <- c("mu.s", "mu.u")
  rownames(lead.before) <- c("lead.before")
  ### After matching ###
  # pair matching
  STD.DIFF.pair <- NULL
  for (i in 1:length(Xnames.stddif)){
    name <- Xnames.stddif[i]
    x.s <- pair.match.data[, paste(name, ".s", sep = "")]
    x.u <- pair.match.data[, paste(name, ".u", sep = "")]
    if (name %in% c("age", "income")){
      mu.s <- mean(x.s, na.rm = TRUE)
      mu.u <- mean(x.u, na.rm = TRUE)
      std.diff <- (mu.s - mu.u)/sqrt((sd.s[name]^2 + sd.u[name]^2)/2)
      STD.DIFF.pair <- rbind(STD.DIFF.pair, data.frame(cbind(mu.s, mu.u, std.diff)))
    } else {
      mu.s <- mean(x.s)*100
      mu.u <- mean(x.u)*100
      std.diff <- (mu.s/100 - mu.u/100)/sqrt((sd.s[name]^2 + sd.u[name]^2)/2)
      STD.DIFF.pair <- rbind(STD.DIFF.pair, data.frame(cbind(mu.s, mu.u, std.diff)))
    }
  }
  lead.after <- cbind(round(mean(pair.match.data$lead.s >= 5)*100, 1),
                      round(mean(pair.match.data$lead.u >= 5)*100, 1))
  colnames(lead.after) <- c("mu.s", "mu.u")
  rownames(lead.after) <- c("lead.after")
  list(before = STD.DIFF.before, after = STD.DIFF.pair,
       outcome = rbind(lead.before, lead.after))
}
maketable.1()

##############################################################################
# PART 4: Sensitivity Analysis ####
# Functions for sensitivity analysis for McNemar (binary) and Wilcoxon (continuous) statistics
McNemar.sens <- function(I, T, Gamma, Delta){
  # Simultaneous sensitivity analysis for a binary outcome and a binary treatment
  n.row <- length(Gamma)
  n.col <- length(Delta)
  p.value <- matrix(NA, n.row, n.col)
  rownames(p.value) <- paste("Gamma=", Gamma, sep = "")
  colnames(p.value) <- paste("Delta=", Delta, sep = "")
  for (i in 1:n.row){
    for (j in 1:n.col){
      gamma <- log(Gamma[i])
      delta <- log(Delta[j])
      pi.bar <- exp(abs(gamma))/(1 + exp(abs(gamma)))
      theta.bar <- exp(abs(delta))/(1 + exp(abs(delta)))
      if (gamma*delta >= 0) p <- pi.bar*theta.bar + (1 - pi.bar)*(1 - theta.bar)
      else p <- 1/2
      p.value[i, j] <- 1 - pbinom(T - 1, I, p)
    }
  }
  p.value
}

Wilcoxon.sens <- function(x, Gamma = 1, Delta = 1, Gastwirth = TRUE){
  # Simultaneous sensitivity analysis for a continuous outcome and a binary treatment
  # Default with adjustment from Gastwirth et al. (1998)
  n.row <- length(Gamma)
  n.col <- length(Delta)
  if (sum(x == 0) > 0){
    x <- x[x != 0]
  }
  Std.Dev <- matrix(NA, n.row, n.col)
  p.value <- matrix(NA, n.row, n.col)
  rownames(Std.Dev) <- paste("Gamma=", Gamma, sep = "")
  colnames(Std.Dev) <- paste("Delta=", Delta, sep = "")
  rownames(p.value) <- paste("Gamma=", Gamma, sep = "")
  colnames(p.value) <- paste("Delta=", Delta, sep = "")
  rank <- rank(abs(x))
  if (Gastwirth == TRUE){
    rank.new <- 2*rank/length(rank)
    T <- sum(rank.new[x > 0])
    for (i in 1:n.row){
      for (j in 1:n.col){
        gamma <- log(Gamma[i])
        delta <- log(Delta[j])
        pi.bar <- 1/(1 + exp(-abs(gamma)))
        theta.bar <- 1/(1 + exp(-abs(delta)*rank.new))
        if (gamma*delta >= 0) p <- pi.bar*theta.bar + (1 - pi.bar)*(1 - theta.bar)
        else p <- 1/2
        Std.Dev[i, j] <- (T - sum(p*rank.new))/sqrt(sum(p*(1 - p)*rank.new*rank.new))
        p.value[i, j] <- 1 - pnorm(Std.Dev[i, j])
      }
    }
  } else {
    T <- sum(rank[x > 0])
    for (i in 1:n.row){
      for (j in 1:n.col){
        gamma <- log(Gamma[i])
        delta <- log(Delta[j])
        pi.bar <- 1/(1 + exp(-abs(gamma)))
        theta.bar <- 1/(1 + exp(-abs(delta)*rank))
        if (gamma*delta >= 0) p <- pi.bar*theta.bar + (1 - pi.bar)*(1 - theta.bar)
        else p <- 1/2
        Std.Dev[i, j] <- (T - sum(p*rank))/sqrt(sum(p*(1 - p)*rank*rank))
        p.value[i, j] <- 1 - pnorm(Std.Dev[i, j])
      }
    }
  }
  p.value
}

##### binary outcome #####
# binary outcome: 1 if lead>=5 (CDC cutoff) vs 0 if lead<5
binary.lead.s <- 1*(pair.match.data$lead.s >= 5)
binary.lead.u <- 1*(pair.match.data$lead.u >= 5)
# I=68 (number of all discordant pairs)
# T=46 (number of discordant pairs in which the smoker had high blood lead but not the nonsmoker)
I <- sum(binary.lead.s != binary.lead.u)
T <- sum(binary.lead.s == 1 & binary.lead.u == 0)
# Test the null assuming no unobserved covariate (Gamma=1 & Delta=1) using McNemar's test
McNemar.sens(I, T, 1, 1)

omega <- function(I = NULL, T = NULL, r = NULL, length.out = 16, alpha = 0.05, type){
  # This function creates the curve of Omega at alpha
  # length.out: largest value of the sensitivity parameters used for the plot
  # type: type of outcome, "binary" or "continuous"
  if (type == "binary" & (!is.numeric(I) | !is.numeric(T))){
    stop("I or T cannot be NULL if binary")
  } else if (type == "continuous" & !is.numeric(r)){
    stop("r cannot be NULL if continuous")
  }
  # Find values of Gamma and Delta such that Gamma=Delta and the max. p-value is about 0.05
  if (type == "binary"){
    Gamma.eq.Delta <- floor(uniroot(function(g){McNemar.sens(I, T, g, g) - alpha},
                                    interval = c(1, 50))$root*1000)/1000
    # Starting from the point where Gamma=Delta,
    # create a list of Gamma and Delta that will lead to alpha;
    # this is the collection of points on the curve at alpha.
    # Rounding at one-hundredth, values tend to repeat at both ends
    Gamma.Delta <- rbind(
      cbind(round(unlist(lapply(rev(seq(ceiling(Gamma.eq.Delta*100)/100, length.out, .01)),
              function(d){uniroot(function(g){McNemar.sens(I, T, g, d) - alpha},
                                  interval = c(1, 50))$root})), 2),
            rev(seq(ceiling(Gamma.eq.Delta*100)/100, length.out, .01))),
      cbind(seq(ceiling(Gamma.eq.Delta*100)/100, length.out, .01),
            round(unlist(lapply(seq(ceiling(Gamma.eq.Delta*100)/100, length.out, .01),
              function(g){uniroot(function(d){McNemar.sens(I, T, g, d) - alpha},
                                  interval = c(1, 50))$root})), 2)))
    # max. p-values corresponding to the values of Gamma and Delta in Gamma.Delta
    p.alpha <- NULL
    for (i in 1:dim(Gamma.Delta)[1]){
      p.alpha <- c(p.alpha,
        as.numeric(lapply(Gamma.Delta[i, 1], function(g){
          McNemar.sens(I, T, g, Gamma.Delta[i, 2])})))
    }; rm(i)
  } else if (type == "continuous"){
    Gamma.eq.Delta <- floor(uniroot(function(g){Wilcoxon.sens(r, g, g) - alpha},
                                    interval = c(1, 50))$root*1000)/1000
    # create a list of Gamma and Delta that will lead to alpha
    Gamma.Delta <- rbind(
      cbind(round(unlist(lapply(rev(seq(ceiling(Gamma.eq.Delta*100)/100, length.out, .01)),
              function(d){uniroot(function(g){Wilcoxon.sens(r, g, d) - alpha},
                                  interval = c(1, 11))$root})), 2),
            rev(seq(ceiling(Gamma.eq.Delta*100)/100, length.out, .01))),
      cbind(seq(ceiling(Gamma.eq.Delta*100)/100, length.out, .01),
            round(unlist(lapply(seq(ceiling(Gamma.eq.Delta*100)/100, length.out, .01),
              function(g){uniroot(function(d){Wilcoxon.sens(r, g, d) - alpha},
                                  interval = c(1, 50))$root})), 2)))
    p.alpha <- NULL
    for (i in 1:dim(Gamma.Delta)[1]){
      p.alpha <- c(p.alpha,
        unlist(lapply(Gamma.Delta[i, 1], function(g){
          Wilcoxon.sens(r, g, Gamma.Delta[i, 2])})))
    }
  }
  # Redefine Gamma.Delta so that no value repeats
  a <- unique(Gamma.Delta[(Gamma.Delta[, 1] < Gamma.eq.Delta), 1])
  b <- c(a, unique(Gamma.Delta[Gamma.Delta[, 2] < Gamma.eq.Delta, 2]))
  Omega.Gamma.Delta <- matrix(0, nrow = length(b), ncol = 2)
  for (i in 1:length(a)){
    Omega.Gamma.Delta[i, ] <-
      matrix(Gamma.Delta[Gamma.Delta[, 1] == a[i], ])[abs(p.alpha[Gamma.Delta[, 1] == a[i]] - alpha)
        == min(abs(p.alpha[Gamma.Delta[, 1] == a[i]] - alpha)), ]
  }
  for (i in (length(a) + 1):length(b)){
    Omega.Gamma.Delta[i, ] <-
      matrix(Gamma.Delta[Gamma.Delta[, 2] == b[i], ])[abs(p.alpha[Gamma.Delta[, 2] == b[i]] - alpha)
        == min(abs(p.alpha[Gamma.Delta[, 2] == b[i]] - alpha)), ]
  }
  colnames(Omega.Gamma.Delta) <- c("Gamma", "Delta")
  # max. p-values corresponding to the values of Gamma and Delta in Omega.Gamma.Delta
  Omega.p.alpha <- NULL
  for (i in 1:dim(Omega.Gamma.Delta)[1]){
    Omega.p.alpha <- c(Omega.p.alpha,
      p.alpha[which(Gamma.Delta[, 1] == Omega.Gamma.Delta[i, 1] &
                    Gamma.Delta[, 2] == Omega.Gamma.Delta[i, 2])])
  }
  # the function returns a list of two objects:
  # (1) values of Gamma and Delta at alpha and (2) their corresponding max. p-values
  list(Gamma.Delta = Omega.Gamma.Delta, p.alpha = Omega.p.alpha)
}

# binary outcome requires:
# I (number of all discordant pairs) and T (number of discordant pairs in which the treated had the higher outcome)
binary.omega <- omega(I = I, T = T, type = "binary")

# Create a huge table (grid) of sensitivity analyses
sens.grid <- function(I = NULL, T = NULL, r = NULL, type, length.out){
  # length.out: largest value of the sensitivity parameters used for the plot
  # type: type of outcome, "binary" or "continuous"
  if (type == "binary" & (!is.numeric(I) | !is.numeric(T))){
    stop("I or T cannot be NULL if binary")
  } else if (type == "continuous" & !is.numeric(r)){
    stop("r cannot be NULL if continuous")
  }
  Gamma <- seq(101, length.out*100, by = 1)/100
  Delta <- seq(101, length.out*100, by = 1)/100
  if (type == "binary"){
    grid <- McNemar.sens(I, T, Gamma = Gamma, Delta = Delta)
  } else if (type == "continuous"){
    grid <- Wilcoxon.sens(r, Gamma = Gamma, Delta = Delta)
  }
  grid
}
binary.sens <- sens.grid(I = I, T = T, type = "binary", length.out = 12)

maketable.2 <- function(){
  # Web Table 1 in the supplementary materials
  sens <- round(McNemar.sens(I, T, c(1, 1.75, 2, 2.21, 2.25, 2.5),
                             c(1, 1.75, 2, 2.21, 2.25, 2.5)), 4)
  list(simultaneous.sens = sens)
}
maketable.2()

maketable.3 <- function(){
  # Web Table 2 in the supplementary materials
  # Only the first row is shown
  Gamma <- c(1.39, 1.88, 2.29, 2.95)
  out <- data.frame(binary.omega$Gamma.Delta[which(binary.omega$Gamma.Delta[, 1] %in% Gamma), ])
  out$p.value <- round(binary.omega$p.alpha[which(binary.omega$Gamma.Delta[, 1] %in% Gamma)], 4)
  out
}
maketable.3()
##############################################################################

# PART 5: Calibrating Sensitivity Analysis ####
# Standardize continuous covariates to mean 0 and sd 1/2 (Gelman, 2008)
stand.age <- (age - mean(age))/(2*sd(age))
stand.income <- (income - mean(income))/(2*sd(income))
binary.lead.outcome <- (lead >= 5)
# For all (Gamma, Delta) in binary.omega$Gamma.Delta, the max. p-value is about .05
# Set k such that k = Gamma = Delta
k <- as.numeric(apply(binary.omega$Gamma.Delta[
  which(abs(binary.omega$Gamma.Delta[, 1] - binary.omega$Gamma.Delta[, 2])
        == min(abs(binary.omega$Gamma.Delta[, 1] - binary.omega$Gamma.Delta[, 2]))), ],
  2, mean)[1])
# reset x with standardized continuous covariates
x <- cbind(stand.age, male, edu.lt9, edu.9to11, edu.hischl, edu.somecol, edu.unknown,
           stand.income, income.mis, black, mexicanam, otherhispan, otherrace)

calibration <- function(y, z, x, Gamma, Delta, type){
  # y: outcome
  # z: binary treatment (0 represents control)
  # x: matrix of covariates
  # Gamma: sensitivity parameter
  # Delta: sensitivity parameter
  # type: type of outcome, "binary" or "continuous"
  likefunc <- function(gamma, z, xmat, beta){
    ### Compute the log-likelihood for P(Z_ij=1|X_ij) (or P(Y_ij=1|X_ij)), where
    ### beta is known and u_ij=1 (0) with probability 1/2 for each X_ij
    marginal.model <- glm(z ~ xmat, family = binomial, x = TRUE)
    xmat.marginal.model <- marginal.model$x
    expit <- function(x){exp(x)/(1 + exp(x))}
    marginal.over.u.prob <- .5*expit(xmat.marginal.model %*% matrix(gamma, ncol = 1)) +
      .5*expit(xmat.marginal.model %*% matrix(gamma, ncol = 1) + beta)
    loglikefunc <- sum(z*log(marginal.over.u.prob/(1 - marginal.over.u.prob)) +
                       log(1 - marginal.over.u.prob))
    loglikefunc
  }
  likefunc.cont <- function(parameter, y, xmat, beta){
    ### variation of likefunc when the outcome is continuous
    ### Compute the log-likelihood for P(Y_ij=y_ij|X_ij), where
    ### beta is known and u_ij=1 (0) with probability 1/2 for each X_ij
    marginal.model <- glm(y ~ xmat, family = gaussian, x = TRUE)
    xmat.marginal.model <- marginal.model$x
    gamma <- parameter[1:(length(parameter) - 1)]
    mu0 <- xmat.marginal.model %*% matrix(gamma, ncol = 1)
    mu1 <- xmat.marginal.model %*% matrix(gamma, ncol = 1) + beta
    sigma2 <- parameter[length(parameter)]
    marginal.over.u.prob <- .5*((1/sqrt(2*pi*sigma2))*exp(-(y - mu0)^2/(2*sigma2))) +
      .5*((1/sqrt(2*pi*sigma2))*exp(-(y - mu1)^2/(2*sigma2)))
    loglikefunc <- sum(log(marginal.over.u.prob))
    loglikefunc
  }
  # Find the treatment model that maximizes the log-likelihood;
  # beta is set to the value of log(k) where Gamma=Delta=k
  treatmentmodel.start <- glm(z ~ x, family = binomial, x = TRUE)
  Xmat.treatment <- treatmentmodel.start$x[, -1]
  treatmentmodel.optim <- optim(coef(treatmentmodel.start), likefunc,
    control = list(maxit = 20000, fnscale = -1),
    z = z, xmat = Xmat.treatment, beta = log(Gamma))
  treatmentmodel.par <- treatmentmodel.optim$par[-1]
  select.s0 <- which(z == 0)  # only use controls
  if (type == "binary"){
    # Find the outcome model that maximizes the log-likelihood;
    # beta is set to the value of log(k) where Gamma=Delta=k
    outcomemodel.start <- glm(y ~ x, family = binomial, subset = select.s0, x = TRUE)
    Xmat.outcome <- outcomemodel.start$x[, -1]
    outcomemodel.optim <- optim(coef(outcomemodel.start), likefunc,
      control = list(maxit = 20000, fnscale = -1),
      z = y[select.s0], xmat = Xmat.outcome, beta = log(Delta))
  } else if (type == "continuous"){
    # Find the outcome model that maximizes the log-likelihood;
    # beta is set to the value of log(4.505) where Gamma=Delta=4.505
    outcomemodel.start <- glm(y ~ x, family = gaussian, subset = select.s0, x = TRUE)
    sigma.sq.start <- sum(residuals(outcomemodel.start)^2)/length(residuals(outcomemodel.start))
    Xmat.outcome <- outcomemodel.start$x[, -1]
    outcomemodel.optim <- optim(c(coef(outcomemodel.start), sigma.sq.start), likefunc.cont,
      control = list(maxit = 20000, fnscale = -1),
      y = y[select.s0], xmat = Xmat.outcome, beta = log(Delta))
  }
  outcomemodel.par <- outcomemodel.optim$par[-1]
  calibrate.obs <- data.frame(cbind(treatmentmodel.par, outcomemodel.par,
                                    exp(treatmentmodel.par), exp(outcomemodel.par)))
  colnames(calibrate.obs) <- c("gamma", "delta", "Gamma", "Delta")
  rownames(calibrate.obs) <- colnames(x)
  calibrate.obs
}
McNemar.calibrate.obs <- calibration(binary.lead.outcome, smoking, x,
                                     Gamma = k, Delta = k, "binary")

maketable.4 <- function(){
  # Web Table 3 in the supplementary materials
  out <- round(McNemar.calibrate.obs[c("stand.age", "stand.income", "male", "edu.lt9",
    "edu.9to11", "edu.hischl", "edu.somecol", "black", "mexicanam", "otherhispan",
    "otherrace"), ], 4)
  out
}
maketable.4()

References

Gastwirth, J. L., Krieger, A. M., and Rosenbaum, P. R. (1998). Dual and simultaneous sensitivity analysis for matched pairs. Biometrika 85, 907-920.

Gastwirth, J. L., Krieger, A. M., and Rosenbaum, P. R. (2000). Asymptotic separability in sensitivity analysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 62, 545-555.

[Web Figure 1 about here.]

Web Figure 1: Calibration of the simultaneous sensitivity analysis to observed covariates (i.e., age, income-to-poverty level, gender and race) in NHANES data given (Γ, Δ) = {(1.44, 8.52), (1.61, 4.12), (2.21, 2.21), (6.95, 1.47), (11.28, 1.41)} ⊂ Ω.05. Panels show Age (2 SD difference), Income-to-poverty level (2 SD difference), Male vs. Female, Less than 9th grade vs. College, 9th-11th grade vs. College, High school vs. College, Some College vs. College, Black vs. White, Mexican American vs. White, Other Hispanic vs. White, and Other races vs. White. The solid curves represent values of (Γ, Δ) ∈ Ω.05 where the maximum p-value is approximately equal to 0.05. The shaded area represents values of (Γ, Δ) where the maximum p-value is less than 0.05. The white area represents values of (Γ, Δ) ∈ Ω+.05 where the maximum p-value is greater than 0.05. The areas (Γ, Δ) ∈ {[0, 1] × [0, 1], [0, 1] × [1, 12], [1, 12] × [0, 1]} are magnified to enable clear display of covariates in these areas.

[Web Figure 2 about here.]

Web Figure 2: Sensitivity of estimates to the choice of (Γ, Δ) ∈ Ω.05 for (a) a continuous covariate X1; and (b) a binary covariate X2, among 10 simulated data sets. Solid curves are Ω.05 from the 10 simulated data sets. Five symbols (square, circle, diamond, up-triangle, and down-triangle) on Ω.05 are 5 arbitrary choices of (Γ, Δ) ∈ Ω.05 including the default choice of Γ = Δ. Symbols off the curves are estimates for the observed covariate from the 10 simulated data sets. Dashed lines are Γ = 1 or Δ = 1.

Web Table 1: The simultaneous sensitivity analysis for NHANES data of smoking and high blood lead: maximum p-values for McNemar's test statistic for hidden bias of various magnitudes, with rows Γ ∈ {1, 1.75, 2, 2.21, 2.25, 2.5} and columns Δ ∈ {1, 1.75, 2, 2.21, 2.25, 2.5}.

Web Table 2: One hundred randomly selected values of (Γ, Δ) ∈ Ω.05 and their corresponding maximum p-values for McNemar's test statistic, arranged in four column blocks of (Γ, Δ, p-value).

Web Table 3: Maximum likelihood estimates for (θ, φ) and (e^θ, e^φ) given (γ, δ) = {log(2.21), log(2.21)} from NHANES data of smoking and high blood lead. Columns give the estimates θ̂γ and φ̂δ and the corresponding e^θ̂γ and e^φ̂δ for each observed covariate: Age (old vs. young)*; Income-to-poverty level (high vs. low)*; Male vs. Female; Education: Less than 9th grade vs. College, 9th-11th grade vs. College, High school graduate vs. College, Some college vs. College; Race: Black vs. White, Mexican American vs. White, Other Hispanic vs. White, Other races vs. White.
* comparisons for a 2-standard-deviation difference

Web Table 4: Simulation studies for estimates and their standard errors of observed covariates obtained from both full and partial data under the null (H0: A = 0) and various alternative hypotheses (H1: A = 50, 75, and 100) from 1,000 replicates, where A is the effect of the treatment on the treated subjects. For each hypothesis (HYP), the table reports, for covariates (COV) X1 and X2 with parameters (PAR) φ1 = 0.2 and φ2 = 0.2, the Monte Carlo estimate (MCEST) and Monte Carlo standard error (MCSE), separately for Method I (use full data (y(0), X)) and Method II (use partial data (y(0), X) from subjects whose Zij = 0).
HYP: Hypothesis; COV: Covariate; PAR: Parameter; MCEST: Monte Carlo Estimate; MCSE: Monte Carlo Standard Error


Using Split Samples and Evidence Factors in an Observational Study of Neonatal Outcomes University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 2011 Using Split Samples and Evidence Factors in an Observational Study of Neonatal Outcomes Kai Zhang University

More information

STA 2101/442 Assignment 3 1

STA 2101/442 Assignment 3 1 STA 2101/442 Assignment 3 1 These questions are practice for the midterm and final exam, and are not to be handed in. 1. Suppose X 1,..., X n are a random sample from a distribution with mean µ and variance

More information

R Package glmm: Likelihood-Based Inference for Generalized Linear Mixed Models

R Package glmm: Likelihood-Based Inference for Generalized Linear Mixed Models R Package glmm: Likelihood-Based Inference for Generalized Linear Mixed Models Christina Knudson, Ph.D. University of St. Thomas user!2017 Reviewing the Linear Model The usual linear model assumptions:

More information

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014 LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R Liang (Sally) Shan Nov. 4, 2014 L Laboratory for Interdisciplinary Statistical Analysis LISA helps VT researchers

More information