Web-based Supplementary Materials for "Calibrating Sensitivity Analyses to Observed Covariates in Observational Studies" by Hsu and Small
Jesse Y. Hsu 1,2,3 and Dylan S. Small 1

1 Department of Statistics, The Wharton School, University of Pennsylvania, 400 Jon M. Huntsman Hall, 3730 Walnut Street, Philadelphia, Pennsylvania, U.S.A.
2 Center for Outcomes Research, The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, U.S.A.
3 hsu9@wharton.upenn.edu
A Details on Computation of p_i(u)

In this section, we provide details on the computation of p_i(u) and p_i^+ discussed in Section 2.3; see Gastwirth et al. (1998) and Gastwirth et al. (2000) for more details. Under model (2), in the case of pair matching, the conditional distribution of the treatment assignment within pair i is

\[
\begin{aligned}
\pi_i &= \Pr(Z_{i1}=1, Z_{i2}=0 \mid \tilde{y}, Z, X) \\
&= \frac{\Pr(Z_{i1}=1 \mid \tilde{y}, Z, X)\,\Pr(Z_{i2}=0 \mid \tilde{y}, Z, X)}{\Pr(Z_{i1}=1 \mid \tilde{y}, Z, X)\,\Pr(Z_{i2}=0 \mid \tilde{y}, Z, X) + \Pr(Z_{i1}=0 \mid \tilde{y}, Z, X)\,\Pr(Z_{i2}=1 \mid \tilde{y}, Z, X)} \\
&= \frac{\exp(\gamma u_{i1})}{\exp(\gamma u_{i1}) + \exp(\gamma u_{i2})}
= \frac{1}{1 + \exp\{\gamma(u_{i2} - u_{i1})\}}. \tag{A.1}
\end{aligned}
\]

Similarly, under model (3), the conditional distribution of the response within pair i is

\[
\begin{aligned}
\lambda_i &= \Pr(y_{i1}=y_{i(2)}, y_{i2}=y_{i(1)} \mid \tilde{y}, Z, X) \\
&= \frac{\Pr(y_{i1}=y_{i(2)} \mid \tilde{y}, Z, X)\,\Pr(y_{i2}=y_{i(1)} \mid \tilde{y}, Z, X)}{\Pr(y_{i1}=y_{i(2)} \mid \tilde{y}, Z, X)\,\Pr(y_{i2}=y_{i(1)} \mid \tilde{y}, Z, X) + \Pr(y_{i1}=y_{i(1)} \mid \tilde{y}, Z, X)\,\Pr(y_{i2}=y_{i(2)} \mid \tilde{y}, Z, X)} \\
&= \frac{\exp\{\delta(y_{i(2)} u_{i1} + y_{i(1)} u_{i2})\}}{\exp\{\delta(y_{i(2)} u_{i1} + y_{i(1)} u_{i2})\} + \exp\{\delta(y_{i(1)} u_{i1} + y_{i(2)} u_{i2})\}}
= \frac{1}{1 + \exp\{\delta(y_{i(2)} - y_{i(1)})(u_{i2} - u_{i1})\}}. \tag{A.2}
\end{aligned}
\]

For pair i, the chance that the treated subject has the higher response under the null hypothesis is

\[
p_i(u) = \pi_i \lambda_i + (1 - \pi_i)(1 - \lambda_i)
= \frac{\exp\{\gamma(u_{i2}-u_{i1})\}\exp\{\delta(y_{i(2)}-y_{i(1)})(u_{i2}-u_{i1})\} + 1}{[1 + \exp\{\gamma(u_{i2}-u_{i1})\}][1 + \exp\{\delta(y_{i(2)}-y_{i(1)})(u_{i2}-u_{i1})\}]}. \tag{A.3}
\]
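The quantities in (A.1)-(A.3) are straightforward to evaluate numerically. The sketch below is not part of the paper's software in Web Section D; the helper name p.pair is ours, and gamma and delta denote the log-scale sensitivity parameters (gamma = log Γ, delta = log Δ):

```r
# Sketch (not from the paper's code): evaluate pi_i, lambda_i and p_i(u) in (A.1)-(A.3)
p.pair <- function(gamma, delta, y1, y2, u1, u2){
  # y1 <= y2 are the ordered responses y_i(1), y_i(2); u1, u2 lie in [0, 1]
  pi.i <- 1/(1 + exp(gamma*(u2 - u1)))                 # (A.1)
  lambda.i <- 1/(1 + exp(delta*(y2 - y1)*(u2 - u1)))   # (A.2)
  p.i <- pi.i*lambda.i + (1 - pi.i)*(1 - lambda.i)     # (A.3)
  c(pi = pi.i, lambda = lambda.i, p = p.i)
}
# With no hidden bias (gamma = delta = 0), p_i(u) = 1/2 for any u:
p.pair(0, 0, y1 = 0, y2 = 1, u1 = 0, u2 = 1)["p"]  # 0.5
```

Setting u1 = 0 and u2 = 1 recovers the maximum p_i^+ in (A.4) below.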
The maximum value of p_i(u), p_i^+, can be obtained by setting u_i1 = 0 and u_i2 = 1, which is

\[
p_i^+ = p_i(u_{i1}=0, u_{i2}=1)
= \frac{\exp(\gamma)\exp\{\delta(y_{i(2)}-y_{i(1)})\} + 1}{[1 + \exp(\gamma)][1 + \exp\{\delta(y_{i(2)}-y_{i(1)})\}]}. \tag{A.4}
\]

B Empirical Results

In this section, we provide additional empirical results for the NHANES data, including a simultaneous sensitivity analysis, an approximation of Ω.05, calibration of a simultaneous sensitivity analysis to the observed covariates, and the sensitivity of estimates to the choice of (Γ, Δ) ∈ Ω.05.

B.1 Simultaneous Sensitivity Analysis to Hidden Bias

One type of sensitivity analysis, the simultaneous sensitivity analysis, uses two sensitivity parameters, Γ and Δ, to measure the degree of hidden bias due to the unobserved covariate in an observational study (Gastwirth et al., 1998). Suppose there is an unobserved covariate u that lies between 0 and 1. One sensitivity parameter, Γ, relates u to treatment; namely, the odds ratio of receiving treatment for two subjects with different values of u is at most Γ. The other parameter, Δ, relates u to response; namely, the odds ratio of having the higher response for two subjects with different values of u is at most Δ. The simultaneous sensitivity analysis finds the maximum p-value over all distributions of u for given values of Γ and Δ. Web Table 1 gives the simultaneous sensitivity analysis for the NHANES data.

[Web Table 1 about here.]

B.2 Approximation of Ω.05

Because the curve Ω.05 does not have a closed form, we suggest a grid search to approximate it. First, we expand the values of (Γ, Δ)
from 1.01 to 16 in increments of 0.01 and create a grid. Second, for each combination of (Γ, Δ) in the grid, we calculate the corresponding maximum p-value for McNemar's test statistic. Finally, for each Γ, we look for the Δ such that the maximum p-value is close to the significance level, say 0.05. In Web Table 2, we list 100 randomly selected values of (Γ, Δ) and their maximum p-values from the collection of (Γ, Δ) ∈ Ω.05, which is used to draw the curves in all figures.

[Web Table 2 about here.]

B.3 Calibrating the Sensitivity Analysis to Observed Covariates

Using the optim function in R, we maximize the two log-likelihood functions, logL{θ; Z, X, γ = log(2.21)} and logL{φ; y(0), X, δ = log(2.21)}, in (12) and (14) to obtain θ̂_γ and φ̂_δ, listed in the first two columns of Web Table 3. The last two columns of Web Table 3 show Θ̂_γ = exp(θ̂_γ) and Φ̂_δ = exp(φ̂_δ), which are the estimated effects of the observed covariates on smoking and high blood lead. As an example of how to read Web Table 3, if two subjects within a matched set have the same values of all the observed covariates and the unobserved covariate, except that one is male and the other is female, then the male subject has 2.36 times the odds of smoking and 5.32 times the odds of having high blood lead.

[Web Table 3 about here.]
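The approximation of Ω.05 described in Section B.2 can be sketched in R using the closed-form maximum p-value for McNemar's test statistic from Gastwirth et al. (1998), which also appears in the McNemar.sens and omega functions of Web Section D. Here I = 68 discordant pairs and T = 46 come from the NHANES match; max.pval and delta.at.alpha are our own helper names, and for speed this sketch solves for Δ at each Γ by uniroot rather than scanning the full two-dimensional grid:

```r
# Sketch of the Omega.05 approximation (assumes the McNemar upper bound
# of Gastwirth et al., 1998); I = 68, T = 46 are from the NHANES pair match
max.pval <- function(I, T, Gamma, Delta){
  pi.bar <- Gamma/(1 + Gamma)       # maximum treatment-assignment probability
  theta.bar <- Delta/(1 + Delta)    # maximum chance the treated unit has the higher response
  p.bar <- pi.bar*theta.bar + (1 - pi.bar)*(1 - theta.bar)
  1 - pbinom(T - 1, I, p.bar)       # upper bound on the one-sided p-value
}
I <- 68; T <- 46
# For a given Gamma (at or above the Gamma = Delta crossing), find the Delta
# whose maximum p-value equals 0.05
delta.at.alpha <- function(Gamma, alpha = 0.05){
  uniroot(function(d) max.pval(I, T, Gamma, d) - alpha, interval = c(1, 50))$root
}
Gamma.grid <- seq(2.21, 16, by = 0.01)
Omega.05 <- cbind(Gamma = Gamma.grid, Delta = sapply(Gamma.grid, delta.at.alpha))
```

At the default Γ = Δ = 2.21 the maximum p-value is approximately 0.05, consistent with the simultaneous sensitivity analysis in Web Table 1.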
B.4 Sensitivity of Estimates to the Choice of (Γ, Δ) ∈ Ω.05 for the NHANES Data

In this section, we examine how sensitive our proposed method is to different choices of (Γ, Δ) ∈ Ω.05 for the NHANES data. The general conjecture is discussed in Web Section C.2. Specifically, we empirically investigate the behavior of the estimates obtained from the log-likelihood functions in (12) and (14) given different values of (Γ, Δ) ∈ Ω.05 (see Section B.2). In Web Table 2, we list 100 randomly selected values of (Γ, Δ) ∈ Ω.05. In this section, we pick 4 out of Web Table 2 along with the default choice of (Γ, Δ) = (2.21, 2.21) and obtain the corresponding estimates for the observed covariates given these 5 selected (Γ, Δ), which are (1.44, 8.52), (1.61, 4.12), (2.21, 2.21), (6.95, 1.47) and (11.28, 1.41). Web Figure 1 shows the calibration of the simultaneous sensitivity analysis to the observed covariates age, income-to-poverty level, gender, education and race for the 5 selected values of (Γ, Δ) ∈ Ω.05 in Web Table 2. The estimates are not sensitive to the choice of (Γ, Δ) ∈ Ω.05 for age, income-to-poverty level, gender and race. For education, the estimates reveal some variation. Even though these estimates vary in one dimension, mostly along the Γ-axis, at different values of (Γ, Δ) ∈ Ω.05, the conclusion remains consistent; i.e., the estimates fall in the shaded area where the maximum p-value for McNemar's test statistic is less than 0.05. Based on this empirical investigation, the calibration of the simultaneous sensitivity analysis to observed covariates is not sensitive to the choice of (Γ, Δ) ∈ Ω.05. Therefore, we suggest a default choice of Γ = Δ = 2.21 for the NHANES data.

[Web Figure 1 about here.]

C Simulation Studies

We consider the following setup for simulation studies such that the simulated data are similar to, but simpler than, the data from our motivating example described in Section 1.2. Let X1 be a continuous covariate (e.g., standardized age) with X_{1ij} ~ N(0, 0.25) and X2 be a binary covariate (e.g., male gender) with X_{2ij} ~ Bernoulli(0.5). Both X1 and X2 are observed. Let u denote a binary unobserved covariate such that u_{ij} ~ Bernoulli(0.5). X1, X2 and u all have the same variance of 0.25. Consider the following data generating
processes for the binary treatment Z_ij and the binary response under control y_ij^(0):

\[
Z_{ij} \sim \mathrm{Bernoulli}(p_{ij}), \quad
p_{ij} = \frac{\exp(\theta_0 + \theta_1 x_{1ij} + \theta_2 x_{2ij} + \gamma u_{ij})}{1 + \exp(\theta_0 + \theta_1 x_{1ij} + \theta_2 x_{2ij} + \gamma u_{ij})}, \tag{C.5}
\]

and

\[
y_{ij}^{(0)} \sim \mathrm{Bernoulli}(\pi_{ij}), \quad
\pi_{ij} = \frac{\exp(\phi_0 + \phi_1 x_{1ij} + \phi_2 x_{2ij} + \delta u_{ij})}{1 + \exp(\phi_0 + \phi_1 x_{1ij} + \phi_2 x_{2ij} + \delta u_{ij})}. \tag{C.6}
\]

Parameters in (C.5) and (C.6) are set as follows: (θ0, θ1, θ2) = (−2, 0.5, 0.5), (φ0, φ1, φ2) = (−3, 0.2, 0.2), and (γ, δ) = (0.5, 0.2). The sample size for each replicate is 3,340. Suppose the treatment has a nonnegative effect on the response, y_ij^(1) ≥ y_ij^(0). If y_ij^(0) = 1, then y_ij^(1) = 1. If y_ij^(0) = 0, then y_ij^(1) = 0 or y_ij^(1) = 1. We assume the attributable effect, the effect of the treatment on the treated subjects, is A = Σ_{i=1}^{I} Σ_{j=1}^{n_i} Z_ij (y_ij^(1) − y_ij^(0)). The parameter A is the number of treated responses actually caused by exposure to the treatment. In Section C.1, we compare estimates obtained from two methods when the y_ij^(0) in the log-likelihood function in (13) are only partially observed. In Section C.2, we examine the sensitivity of estimates obtained from (14) to the choice of (Γ, Δ) ∈ Ω.05, which is discussed in Section 3.2.

C.1 Sensitivity of Estimates to the Use of a Sub-Sample

In this simulation, we compare estimates obtained from two methods when the y_ij^(0) in the log-likelihood function in (13) are only partially observed. Following the discussion in Section 3.1, method I replaces the unobserved y_ij^(0) with y_ij^(1) for those who have Z_ij = 1, and method II uses data only from subjects with Z_ij = 0. We consider the following four possible treatment effects: A ∈ {0, 50, 75, 100}, where A = 0 represents the null treatment effect. Web Table 4 shows the averaged estimates for the parameters φ1 and φ2 in (C.6) and their standard errors from 1,000 replicates. When Fisher's sharp null hypothesis is true (i.e., A = 0), both methods provide good estimates, although the estimates from method II lose some efficiency due to the use of a sub-sample.
When the alternative hypothesis is true (i.e., A = 50, 75, or
100), the estimates from method I start deviating from the true parameters as the treatment effect increases, but method II still provides consistent estimates. Based on the results of this set of simulation studies, we suggest the use of method II in our paper.

[Web Table 4 about here.]

C.2 Sensitivity of Estimates to the Choice of (Γ, Δ) ∈ Ω.05

In this section, we examine through simulation how sensitive the estimates of θ and φ are to the choice of (Γ, Δ) ∈ Ω.05. For each replicate, we choose 5 different values of (Γ, Δ) ∈ Ω.05, including the suggested default choice of Γ = Δ, and obtain 5 sets of estimates for θ1 and θ2 in (C.5) and φ1 and φ2 in (C.6) following the proposed method discussed in Section 3. We consider one treatment effect, A = 100, and generate 10 data sets following the setup described at the beginning of Section C. Web Figure 2 shows the calibration of the simultaneous sensitivity analysis to (a) the observed covariate X1 and (b) the observed covariate X2, at (Γ, Δ) ∈ Ω.05. Ten solid curves represent the values of (Γ, Δ) ∈ Ω.05 from the 10 simulated data sets. For each simulated data set, five symbols (square, circle, diamond, up-triangle, and down-triangle) represent 5 different arbitrary values of (Γ, Δ) ∈ Ω.05 (on the curve) and their corresponding estimates of exp(θ_p) and exp(φ_p) for p = 1, 2 (off the curve); i.e., the calibration of the unobserved u to the observed X_p for p = 1, 2. For both the continuous (X1) and binary (X2) covariates, because the estimates are located close to each other among all 10 simulated data sets, the calibration of the simultaneous sensitivity analysis to observed covariates is not sensitive to the choice of (Γ, Δ) ∈ Ω.05. Therefore, in our paper, we suggest choosing a default value of Γ = Δ.

[Web Figure 2 about here.]
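The setup at the beginning of this section can be sketched in R as follows. This is a minimal sketch, not the paper's exact implementation: the seed and the random placement of the A = 100 attributable responses among eligible treated subjects are illustrative choices of ours.

```r
# Sketch of one replicate from the data-generating process in (C.5)-(C.6)
set.seed(1)
n <- 3340
x1 <- rnorm(n, 0, sqrt(0.25))   # continuous observed covariate, variance 0.25
x2 <- rbinom(n, 1, 0.5)         # binary observed covariate, variance 0.25
u  <- rbinom(n, 1, 0.5)         # binary unobserved covariate, variance 0.25
theta <- c(-2, 0.5, 0.5); gamma <- 0.5
phi   <- c(-3, 0.2, 0.2); delta <- 0.2
expit <- function(x) exp(x)/(1 + exp(x))
z  <- rbinom(n, 1, expit(theta[1] + theta[2]*x1 + theta[3]*x2 + gamma*u))  # treatment, (C.5)
y0 <- rbinom(n, 1, expit(phi[1] + phi[2]*x1 + phi[3]*x2 + delta*u))        # control response, (C.6)
# Nonnegative effect: y1 >= y0; impose attributable effect A = 100 among
# treated subjects with y0 = 0 (illustrative placement)
y1 <- y0
eligible <- which(z == 1 & y0 == 0)
y1[sample(eligible, 100)] <- 1
A <- sum(z*(y1 - y0))
```

Method I would then fit the outcome model to all subjects with y0 replaced by y1 for the treated, while method II would restrict the fit to the subjects with z = 0, for whom y0 is observed.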
D Software

In this section, we provide the R code for the calibration of a simultaneous sensitivity analysis to the observed covariates using the NHANES data in the paper.

##############################################################################
# PART 1: Data ####
d <- read.table("nhanes2008.lead.smoking.txt", header = T)
attach(d)
x <- cbind(age, male, edu.lt9, edu.9to11, edu.hischl, edu.somecol, edu.unknown,
           income, income.mis, black, mexicanam, otherhispan, otherrace)

##############################################################################
# PART 2: Pair Matching ####
library(optmatch)

# Matching functions
smahal <- function(z, X){
  # Calculate rank-based Mahalanobis distance
  X <- as.matrix(X)
  n <- dim(X)[1]
  rownames(X) <- 1:n
  k <- dim(X)[2]
  m <- sum(z)
  for (j in 1:k) X[, j] <- rank(X[, j])
  cv <- cov(X)
  vuntied <- var(1:n)
  rat <- sqrt(vuntied/diag(cv))
  cv <- diag(rat) %*% cv %*% diag(rat)
  out <- matrix(NA, m, n - m)
  Xc <- X[z == 0, ]
  Xt <- X[z == 1, ]
  rownames(out) <- rownames(X)[z == 1]
  colnames(out) <- rownames(X)[z == 0]
  library(MASS)
  icov <- ginv(cv)
  for (i in 1:m) out[i, ] <- mahalanobis(Xc, Xt[i, ], icov, inverted = T)
  out
}

addcaliper <- function(dmat, z, p, calipersd = 0.2, penalty = 1000){
  # add a caliper to distance matrix
  sdp <- sd(p)
  adif <- abs(outer(p[z == 1], p[z == 0], "-"))
  adif <- (adif - (calipersd * sdp)) * (adif > (calipersd * sdp))
  dmat <- dmat + adif * penalty
  dmat
}

pair.vector <- function(pairmatchvec, treatment){
  # find out who is matched treated and who is matched control
  # NOTE: treated needs to be ordered at the beginning
  pairs.short <- substr(pairmatchvec, start = 3, stop = 10)
  pairsnumeric <- as.numeric(pairs.short)
  notreated <- sum(treatment)
  pairsvec <- rep(0, notreated)
  for (i in 1:notreated) {
    temp <- (pairsnumeric == i) * seq(1, length(pairsnumeric), 1)
    pairsvec[i] <- sum(temp, na.rm = T) - i
  }
  pairsvec
}

# propensity score model
propscore.model <- glm(smoking ~ x, family = binomial, x = TRUE)
Xmat <- propscore.model$x[, -1]
colnames(Xmat) <- colnames(x)
propscore <- predict(propscore.model, type = "response")
# rank based Mahalanobis distance
distmat <- smahal(smoking, Xmat)
# add caliper to distance matrix
distmat2 <- addcaliper(distmat, smoking, propscore)

##### pair matching #####
pairmatchvec <- pairmatch(distmat2)
# Create a vector saying which control unit each treated unit is matched to
# NOTE: treated needs to be ordered at the beginning
pairsvec <- pair.vector(pairmatchvec, smoking)

# prepare to put matched data together
matched.id <- seq(1, length(which(smoking == 1)))
Xmat <- as.data.frame(cbind(Xmat, edu.college, white))
Xmat[which(Xmat$income.mis == 1), "income"] <- NA
Xnames <- names(Xmat)
# smokers
id.s <- id[which(smoking == 1)]
Xmat.matched.smoker <- Xmat[which(smoking == 1), ]
colnames(Xmat.matched.smoker) <- paste(Xnames, sep = "", ".s")
lead.s <- lead[which(smoking == 1)]
# non-smokers
id.u <- id[pairsvec]
Xmat.matched.nonsmoker <- Xmat[pairsvec, ]
colnames(Xmat.matched.nonsmoker) <- paste(Xnames, sep = "", ".u")
lead.u <- lead[pairsvec]
# put matched data (smokers and non-smokers) together
pair.match.data <- as.data.frame(cbind(matched.id, id.s, Xmat.matched.smoker, lead.s,
                                       id.u, Xmat.matched.nonsmoker, lead.u))

##############################################################################
# PART 3: Covariates Balance ####
maketable.1 <- function(){
  # Table 1 in the paper
  ##### Standardized differences #####
  Xnames.stddif <- c("age","income","income.mis","male","edu.lt9","edu.9to11",
                     "edu.hischl","edu.somecol","edu.college","edu.unknown",
                     "white","black","mexicanam","otherhispan","otherrace")
  sd.s <- apply(Xmat[which(smoking == 1), ], 2, sd, na.rm = T)
  sd.u <- apply(Xmat[which(smoking == 0), ], 2, sd, na.rm = T)
  ### Before matching ###
  STD.DIFF.before <- NULL
  for(i in 1:length(Xnames.stddif)){
    name <- Xnames.stddif[i]
    x.s <- Xmat[which(smoking == 1), name]
    x.u <- Xmat[which(smoking == 0), name]
    if(name %in% c("age","income")){
      mu.s <- mean(x.s, na.rm = T)
      mu.u <- mean(x.u, na.rm = T)
      std.diff <- (mu.s - mu.u)/sqrt((sd.s[name]^2 + sd.u[name]^2)/2)
      STD.DIFF.before <- rbind(STD.DIFF.before, data.frame(cbind(mu.s, mu.u, std.diff)))
    } else{
      mu.s <- mean(x.s)*100
      mu.u <- mean(x.u)*100
      std.diff <- (mu.s/100 - mu.u/100)/sqrt((sd.s[name]^2 + sd.u[name]^2)/2)
      STD.DIFF.before <- rbind(STD.DIFF.before, data.frame(cbind(mu.s, mu.u, std.diff)))
    }
  }
  lead.before <- cbind(round(mean(lead[smoking == 1] >= 5)*100, 1),
                       round(mean(lead[smoking == 0] >= 5)*100, 1))
  colnames(lead.before) <- c("mu.s","mu.u")
  rownames(lead.before) <- c("lead.before")
  ### After matching ###
  # pair matching
  STD.DIFF.pair <- NULL
  for(i in 1:length(Xnames.stddif)){
    name <- Xnames.stddif[i]
    x.s <- pair.match.data[, paste(name, ".s", sep = "")]
    x.u <- pair.match.data[, paste(name, ".u", sep = "")]
    if(name %in% c("age","income")){
      mu.s <- mean(x.s, na.rm = T)
      mu.u <- mean(x.u, na.rm = T)
      std.diff <- (mu.s - mu.u)/sqrt((sd.s[name]^2 + sd.u[name]^2)/2)
      STD.DIFF.pair <- rbind(STD.DIFF.pair, data.frame(cbind(mu.s, mu.u, std.diff)))
    } else{
      mu.s <- mean(x.s)*100
      mu.u <- mean(x.u)*100
      std.diff <- (mu.s/100 - mu.u/100)/sqrt((sd.s[name]^2 + sd.u[name]^2)/2)
      STD.DIFF.pair <- rbind(STD.DIFF.pair, data.frame(cbind(mu.s, mu.u, std.diff)))
    }
  }
  lead.after <- cbind(round(mean(pair.match.data$lead.s >= 5)*100, 1),
                      round(mean(pair.match.data$lead.u >= 5)*100, 1))
  colnames(lead.after) <- c("mu.s","mu.u")
  rownames(lead.after) <- c("lead.after")
  list(before = STD.DIFF.before, after = STD.DIFF.pair,
       outcome = rbind(lead.before, lead.after))
}
maketable.1()

##############################################################################
# PART 4: Sensitivity Analysis ####
# Functions for Sensitivity analysis for McNemar (binary) and Wilcoxon (continuous) statistics
McNemar.sens <- function(I, T, Gamma, Delta){
  # Simultaneous sensitivity analysis for a binary outcome and a binary treatment
  n.row <- length(Gamma)
  n.col <- length(Delta)
  p.value <- matrix(NA, n.row, n.col)
  rownames(p.value) <- paste("Gamma=", Gamma, sep = "")
  colnames(p.value) <- paste("Delta=", Delta, sep = "")
  for(i in 1:n.row){
    for(j in 1:n.col){
      gamma <- log(Gamma[i])
      delta <- log(Delta[j])
      pi.bar <- exp(abs(gamma))/(1 + exp(abs(gamma)))
      theta.bar <- exp(abs(delta))/(1 + exp(abs(delta)))
      if(gamma*delta >= 0) p <- pi.bar*theta.bar + (1 - pi.bar)*(1 - theta.bar)
      else p <- 1/2
      p.value[i, j] <- 1 - pbinom(T - 1, I, p)
    }
  }
  p.value
}

Wilcoxon.sens <- function(x, Gamma = 1, Delta = 1, Gastwirth = TRUE){
  # Simultaneous sensitivity analysis for a continuous outcome and a binary treatment
  # Default with adjustment from Gastwirth et al. (1998)
  n.row <- length(Gamma)
  n.col <- length(Delta)
  if(sum(x == 0) > 0){
    x <- x[x != 0]
  }
  Std.Dev <- matrix(NA, n.row, n.col)
  p.value <- matrix(NA, n.row, n.col)
  rownames(Std.Dev) <- paste("Gamma=", Gamma, sep = "")
  colnames(Std.Dev) <- paste("Delta=", Delta, sep = "")
  rownames(p.value) <- paste("Gamma=", Gamma, sep = "")
  colnames(p.value) <- paste("Delta=", Delta, sep = "")
  rank <- rank(abs(x))
  if(Gastwirth == TRUE){
    rank.new <- 2*rank/length(rank)
    T <- sum(rank.new[x > 0])
    for(i in 1:n.row){
      for(j in 1:n.col){
        gamma <- log(Gamma[i])
        delta <- log(Delta[j])
        pi.bar <- 1/(1 + exp(-abs(gamma)))
        theta.bar <- 1/(1 + exp(-abs(delta)*rank.new))
        if(gamma*delta >= 0) p <- pi.bar*theta.bar + (1 - pi.bar)*(1 - theta.bar)
        else p <- 1/2
        Std.Dev[i, j] <- (T - sum(p*rank.new))/sqrt(sum(p*(1 - p)*rank.new*rank.new))
        p.value[i, j] <- 1 - pnorm(Std.Dev[i, j])
      }
    }
  } else{
    T <- sum(rank[x > 0])
    for(i in 1:n.row){
      for(j in 1:n.col){
        gamma <- log(Gamma[i])
        delta <- log(Delta[j])
        pi.bar <- 1/(1 + exp(-abs(gamma)))
        theta.bar <- 1/(1 + exp(-abs(delta)*rank))
        if(gamma*delta >= 0) p <- pi.bar*theta.bar + (1 - pi.bar)*(1 - theta.bar)
        else p <- 1/2
        Std.Dev[i, j] <- (T - sum(p*rank))/sqrt(sum(p*(1 - p)*rank*rank))
        p.value[i, j] <- 1 - pnorm(Std.Dev[i, j])
      }
    }
  }
  p.value
}

##### binary outcome #####
# binary outcome: 1 if lead>=5 (CDC cutoff) vs 0 if lead<5
binary.lead.s <- 1*(pair.match.data$lead.s >= 5)
binary.lead.u <- 1*(pair.match.data$lead.u >= 5)
# I=68 (number of all discordant pairs)
# T=46 (number of discordant pairs that smokers had high blood lead but not nonsmokers)
# Let qi=1 for i=1,...,68
I <- sum(binary.lead.s != binary.lead.u)
T <- sum(binary.lead.s == 1 & binary.lead.u == 0)
# Test the null assuming no unobserved covariate (Gamma=1 & Delta=1) using McNemar's test
McNemar.sens(I, T, 1, 1)

omega <- function(I = NULL, T = NULL, r = NULL, length.out = 16, alpha = 0.05, type){
  # This function creates the line of omega at alpha
  # length.out: value of sensitivity parameters that extends for plot
  # type: type of outcome "binary" or "continuous"
  if(type == "binary" & (!is.numeric(I) | !is.numeric(T))){
    stop("I or T cannot be NULL if binary")
  } else if(type == "continuous" & !is.numeric(r)){
    stop("r cannot be NULL if continuous")
  }
  # Find values of Gamma and Delta such that Gamma=Delta and max. p-value is about 0.05
  if(type == "binary"){
    Gamma.eq.Delta <- floor(uniroot(function(g){McNemar.sens(I, T, g, g) - alpha},
                                    interval = c(1, 50))$root*1000)/1000
    # Starting from the point where Gamma=Delta
    # Creating a list of Gamma and Delta that will lead to alpha
    # This is the collection of points on the curve of alpha
    # Rounding at one-hundredth, values tend to repeat at both ends
    Gamma.Delta <- rbind(
      cbind(round(unlist(lapply(rev(seq(ceiling(Gamma.eq.Delta*100)/100, length.out, .01)),
                                function(d){uniroot(function(g){McNemar.sens(I, T, g, d) - alpha},
                                                    interval = c(1, 50))$root})), 2),
            rev(seq(ceiling(Gamma.eq.Delta*100)/100, length.out, .01))),
      cbind(seq(ceiling(Gamma.eq.Delta*100)/100, length.out, .01),
            round(unlist(lapply(seq(ceiling(Gamma.eq.Delta*100)/100, length.out, .01),
                                function(g){uniroot(function(d){McNemar.sens(I, T, g, d) - alpha},
                                                    interval = c(1, 50))$root})), 2)))
    # max. p-values corresponding to values of Gamma and Delta from above Gamma.Delta
    p.alpha <- NULL
    for(i in 1:dim(Gamma.Delta)[1]){
      p.alpha <- c(p.alpha, as.numeric(lapply(Gamma.Delta[i, 1],
                     function(g){McNemar.sens(I, T, g, Gamma.Delta[i, 2])})))
    }; rm(i)
  } else if(type == "continuous"){
    Gamma.eq.Delta <- floor(uniroot(function(g){Wilcoxon.sens(r, g, g) - alpha},
                                    interval = c(1, 50))$root*1000)/1000
    # create a list of Gamma and Delta that will lead to alpha
    Gamma.Delta <- rbind(
      cbind(round(unlist(lapply(rev(seq(ceiling(Gamma.eq.Delta*100)/100, length.out, .01)),
                                function(d){uniroot(function(g){Wilcoxon.sens(r, g, d) - alpha},
                                                    interval = c(1, 11))$root})), 2),
            rev(seq(ceiling(Gamma.eq.Delta*100)/100, length.out, .01))),
      cbind(seq(ceiling(Gamma.eq.Delta*100)/100, length.out, .01),
            round(unlist(lapply(seq(ceiling(Gamma.eq.Delta*100)/100, length.out, .01),
                                function(g){uniroot(function(d){Wilcoxon.sens(r, g, d) - alpha},
                                                    interval = c(1, 50))$root})), 2)))
    p.alpha <- NULL
    for(i in 1:dim(Gamma.Delta)[1]){
      p.alpha <- c(p.alpha, unlist(lapply(Gamma.Delta[i, 1],
                     function(g){Wilcoxon.sens(r, g, Gamma.Delta[i, 2])})))
    }
  }
  # Redefine Gamma.Delta to make no repeat value
  a <- unique(Gamma.Delta[(Gamma.Delta[, 1] < Gamma.eq.Delta), 1])
  b <- c(a, unique(Gamma.Delta[Gamma.Delta[, 2] < Gamma.eq.Delta, 2]))
  Omega.Gamma.Delta <- matrix(0, nrow = length(b), ncol = 2)
  for(i in 1:length(a)){
    Omega.Gamma.Delta[i, ] <-
      matrix(Gamma.Delta[Gamma.Delta[, 1] == a[i], ])[abs(p.alpha[Gamma.Delta[, 1] == a[i]] - alpha)
        == min(abs(p.alpha[Gamma.Delta[, 1] == a[i]] - alpha)), ]
  }
  for(i in (length(a) + 1):length(b)){
    Omega.Gamma.Delta[i, ] <-
      matrix(Gamma.Delta[Gamma.Delta[, 2] == b[i], ])[abs(p.alpha[Gamma.Delta[, 2] == b[i]] - alpha)
        == min(abs(p.alpha[Gamma.Delta[, 2] == b[i]] - alpha)), ]
  }
  colnames(Omega.Gamma.Delta) <- c("Gamma","Delta")
  # max. p-values corresponding to values of Gamma and Delta from above Gamma.Delta
  Omega.p.alpha <- NULL
  for(i in 1:dim(Omega.Gamma.Delta)[1]){
    Omega.p.alpha <- c(Omega.p.alpha,
      p.alpha[which(Gamma.Delta[, 1] == Omega.Gamma.Delta[i, 1] &
                    Gamma.Delta[, 2] == Omega.Gamma.Delta[i, 2])])
  }
  # the function returns a list of two objects:
  # (1) values of Gamma and Delta at alpha and (2) their corresponding max. p-values
  list(Gamma.Delta = Omega.Gamma.Delta, p.alpha = Omega.p.alpha)
}

# binary outcome requires:
# I (number of all discordant pairs) and T (number of discordant pairs that treated had high outcome)
binary.omega <- omega(I = I, T = T, type = "binary")

# Create a huge table (grid) of sensitivity analysis
sens.grid <- function(I = NULL, T = NULL, r = NULL, type, length.out){
  # length.out: value of sensitivity parameters that extends for plot
  # type: type of outcome "binary" or "continuous"
  if(type == "binary" & (!is.numeric(I) | !is.numeric(T))){
    stop("I or T cannot be NULL if binary")
  } else if(type == "continuous" & !is.numeric(r)){
    stop("r cannot be NULL if continuous")
  }
  Gamma <- seq(101, length.out*100, by = 1)/100
  Delta <- seq(101, length.out*100, by = 1)/100
  if(type == "binary"){
    grid <- McNemar.sens(I, T, Gamma = Gamma, Delta = Delta)
  } else if(type == "continuous"){
    grid <- Wilcoxon.sens(r, Gamma = Gamma, Delta = Delta)
  }
  grid
}
binary.sens <- sens.grid(I = I, T = T, type = "binary", length.out = 12)

maketable.2 <- function(){
  # Web Table 1 in the supplementary materials
  sens <- round(McNemar.sens(I, T, c(1, 1.75, 2, 2.21, 2.25, 2.5),
                             c(1, 1.75, 2, 2.21, 2.25, 2.5)), 4)
  list(simultaneous.sens = sens)
}
maketable.2()

maketable.3 <- function(){
  # Web Table 2 in the supplementary materials
  # Only the first row is shown
  Gamma <- c(1.39, 1.88, 2.29, 2.95)
  out <- data.frame(binary.omega$Gamma.Delta[which(binary.omega$Gamma.Delta[, 1] %in% Gamma), ])
  out$p.value <- round(binary.omega$p.alpha[which(binary.omega$Gamma.Delta[, 1] %in% Gamma)], 4)
  out
}
maketable.3()

##############################################################################
# PART 5: Calibrating Sensitivity Analysis ####
# Standardized Continuous Covariates to mean 0 and sd 1/2 (Gelman, 2008)
stand.age <- (age - mean(age))/(2*sd(age))
stand.income <- (income - mean(income))/(2*sd(income))
binary.lead.outcome <- (lead >= 5)

# For all McNemar.gamma.delta, p-value is about .05
# Set k such that k=Gamma=Delta
k <- as.numeric(apply(binary.omega$Gamma.Delta[
  which(abs(binary.omega$Gamma.Delta[, 1] - binary.omega$Gamma.Delta[, 2])
        == min(abs(binary.omega$Gamma.Delta[, 1] - binary.omega$Gamma.Delta[, 2]))), ],
  2, mean)[1])

# reset x with standardized continuous covariates
x <- cbind(stand.age, male, edu.lt9, edu.9to11, edu.hischl, edu.somecol, edu.unknown,
           stand.income, income.mis, black, mexicanam, otherhispan, otherrace)

calibration <- function(y, z, x, Gamma, Delta, type){
  # y: outcome
  # z: binary treatment (0 represents control)
  # x: matrix of covariates
  # Gamma: sensitivity parameter
  # Delta: sensitivity parameter
  # type: type of outcome "binary" or "continuous"
  likefunc <- function(gamma, z, Xmat, beta){
    ### Compute the log-likelihood for P(Z_ij=1|X_ij) (or P(Y_ij=1|X_ij)), where
    ### beta is known and u_ij=1 (0) with probability 1/2 for each X_ij
    marginal.model <- glm(z ~ Xmat, family = binomial, x = TRUE)
    Xmat.marginal.model <- marginal.model$x
    expit <- function(x){ exp(x)/(1 + exp(x)) }
    marginal.over.u.prob <- .5*expit(Xmat.marginal.model %*% matrix(gamma, ncol = 1)) +
      .5*expit(Xmat.marginal.model %*% matrix(gamma, ncol = 1) + beta)
    loglikefunc <- sum(z*log(marginal.over.u.prob/(1 - marginal.over.u.prob)) +
                       log(1 - marginal.over.u.prob))
    loglikefunc
  }
  likefunc.cont <- function(parameter, y, Xmat, beta){
    ### variation of likefunc when outcome is continuous
    ### Compute the log-likelihood for P(Y_ij=y_ij|X_ij), where
    ### beta is known and u_ij=1 (0) with probability 1/2 for each X_ij
    marginal.model <- glm(y ~ Xmat, family = gaussian, x = TRUE)
    Xmat.marginal.model <- marginal.model$x
    gamma <- parameter[1:(length(parameter) - 1)]
    mu0 <- Xmat.marginal.model %*% matrix(gamma, ncol = 1)
    mu1 <- Xmat.marginal.model %*% matrix(gamma, ncol = 1) + beta
    sigma2 <- parameter[length(parameter)]
    marginal.over.u.prob <- .5*((1/sqrt(2*pi*sigma2))*exp(-(y - mu0)^2/(2*sigma2))) +
      .5*((1/sqrt(2*pi*sigma2))*exp(-(y - mu1)^2/(2*sigma2)))
    loglikefunc <- sum(log(marginal.over.u.prob))
    loglikefunc
  }
  # Find the treatment model which maximizes log likelihood,
  # beta is set to the value of log(k) where Gamma=Delta=k
  treatmentmodel.start <- glm(z ~ x, family = binomial, x = TRUE)
  Xmat.treatment <- treatmentmodel.start$x[, -1]
  treatmentmodel.optim <- optim(coef(treatmentmodel.start), likefunc,
                                control = list(maxit = 20000, fnscale = -1),
                                z = z, Xmat = Xmat.treatment, beta = log(Gamma))
  treatmentmodel.par <- treatmentmodel.optim$par[-1]
  select.s0 <- which(z == 0)  # only use controls
  if(type == "binary"){
    # Find the outcome model which maximizes log likelihood,
    # beta is set to the value of log(k) where Gamma=Delta=k
    outcomemodel.start <- glm(y ~ x, family = binomial, subset = select.s0, x = TRUE)
    Xmat.outcome <- outcomemodel.start$x[, -1]
    outcomemodel.optim <- optim(coef(outcomemodel.start), likefunc,
                                control = list(maxit = 20000, fnscale = -1),
                                z = y[select.s0], Xmat = Xmat.outcome, beta = log(Delta))
  } else if(type == "continuous"){
    # Find the outcome model which maximizes log likelihood,
    # beta is set to the value of log(4.505) where Gamma=Delta=4.505
    outcomemodel.start <- glm(y ~ x, family = gaussian, subset = select.s0, x = TRUE)
    sigma.sq.start <- sum(residuals(outcomemodel.start)^2)/length(residuals(outcomemodel.start))
    Xmat.outcome <- outcomemodel.start$x[, -1]
    outcomemodel.optim <- optim(c(coef(outcomemodel.start), sigma.sq.start), likefunc.cont,
                                control = list(maxit = 20000, fnscale = -1),
                                y = y[select.s0], Xmat = Xmat.outcome, beta = log(Delta))
  }
  outcomemodel.par <- outcomemodel.optim$par[-1]
  calibrate.obs <- data.frame(cbind(treatmentmodel.par, outcomemodel.par,
                                    exp(treatmentmodel.par), exp(outcomemodel.par)))
  colnames(calibrate.obs) <- c("gamma","delta","Gamma","Delta")
  rownames(calibrate.obs) <- colnames(x)
  calibrate.obs
}

McNemar.calibrate.obs <- calibration(binary.lead.outcome, smoking, x,
                                     Gamma = k, Delta = k, "binary")

maketable.4 <- function(){
  # Web Table 3 in the supplementary materials
  out <- round(McNemar.calibrate.obs[c("stand.age","stand.income","male",
                 "edu.lt9","edu.9to11","edu.hischl","edu.somecol",
                 "black","mexicanam","otherhispan","otherrace"), ], 2)
  out
}
maketable.4()

References

Gastwirth, J. L., Krieger, A. M., and Rosenbaum, P. R. (1998). Dual and simultaneous sensitivity analysis for matched pairs. Biometrika 85.
Gastwirth, J. L., Krieger, A. M., and Rosenbaum, P. R. (2000). Asymptotic separability in sensitivity analysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 62.
Web Figure 1: Calibration of the simultaneous sensitivity analysis to observed covariates (i.e., age, income-to-poverty level, gender, education and race) in the NHANES data given (Γ, Δ) = {(1.44, 8.52), (1.61, 4.12), (2.21, 2.21), (6.95, 1.47), (11.28, 1.41)} ⊂ Ω.05. The solid curves represent values of (Γ, Δ) ∈ Ω.05 where the maximum p-value is approximately equal to 0.05. The shaded area represents values of (Γ, Δ) ∈ Ω−.05 where the maximum p-value is less than 0.05. The white area represents values of (Γ, Δ) ∈ Ω+.05 where the maximum p-value is greater than 0.05. The areas (Γ, Δ) ∈ {[0,1] × [0,1], [0,1] × [1,12], [1,12] × [0,1]} are magnified to enable clear display of covariates in these areas.
Web Figure 2: Sensitivity of estimates to the choice of (Γ, Δ) ∈ Ω.05 for (a) a continuous covariate X1 and (b) a binary covariate X2, among 10 simulated data sets; panel (a) shows the calibration to the observed X1 and panel (b) the calibration to the observed X2. Solid curves are Ω.05 from the 10 simulated data sets. Five symbols (square, circle, diamond, up-triangle, and down-triangle) on Ω.05 are 5 arbitrary choices of (Γ, Δ) ∈ Ω.05, including the default choice of Γ = Δ. Symbols off the curves are estimates for the observed covariate from the 10 simulated data sets. Dashed lines are Γ = 1 or Δ = 1.
Web Table 1: The simultaneous sensitivity analysis for the NHANES data of smoking and high blood lead: maximum p-value for McNemar's test statistic for hidden bias of various magnitudes (rows Γ and columns Δ ranging over 1, 1.75, 2, 2.21, 2.25, 2.5, ...).
Web Table 2: One hundred randomly selected values of (Γ, Δ) ∈ Ω.05 and their corresponding maximum p-values for McNemar's test statistic.
Web Table 3: Maximum likelihood estimates for (θ, φ) and (e^θ, e^φ) given (γ, δ) = {log(2.21), log(2.21)} from the NHANES data of smoking and high blood lead. Columns: θ̂_γ, φ̂_δ, e^θ̂_γ, e^φ̂_δ. Rows: Age (old vs. young), Income-to-poverty level (high vs. low), Male vs. Female; Education: Less than 9th grade vs. College, 9th-11th grade vs. College, High school graduate vs. College, Some college vs. College; Race: Black vs. White, Mexican American vs. White, Other Hispanic vs. White, Other races vs. White. Non-binary comparisons are for a 2-standard-deviation difference.
Web Table 4: Simulation studies for estimates and their standard errors of observed covariates obtained from both full and partial data under the null (H0: A = 0) and various alternative hypotheses (H1: A = 50, 75, and 100) from 1,000 replicates, where A is the effect of the treatment on the treated subjects. Method I uses the full data (y(0), X); method II uses partial data from the subjects with Z_ij = 0. For each hypothesis and each covariate (X1, X2), the table reports the parameter (φ1, φ2), the Monte Carlo estimate, and the Monte Carlo standard error. HYP: Hypothesis; COV: Covariate; PAR: Parameter; MCEST: Monte Carlo Estimate; MCSE: Monte Carlo Standard Error.