
University of California, Berkeley
U.C. Berkeley Division of Biostatistics Working Paper Series
Year 2011, Paper 283

Targeted Maximum Likelihood Estimation of Conditional Relative Risk in a Semi-parametric Regression Model

Cathy Tuglus, University of California, Berkeley, ctuglus@gmail.com
Kristin E. Porter, University of California, Berkeley, kristinporter@berkeley.edu
Mark J. van der Laan, University of California - Berkeley, laan@berkeley.edu

This working paper is hosted by The Berkeley Electronic Press (bepress) and may not be commercially reproduced without the permission of the copyright holder. Copyright © 2011 by the authors.

Abstract

The conditional relative risk is an important measure in medical and epidemiological studies when the outcome of interest is binary (i.e. disease vs. no disease). When the outcome is common, estimation of conditional relative risk and related parameters can be problematic, especially when the exposure or covariates are continuous. We propose a new estimation procedure based on targeted maximum likelihood methodology that targets the parameters relating to the conditional relative risk for common outcomes under a log-linear, or multiplicative, semi-parametric model. In this paper, we present three possible targeted maximum likelihood estimators for relative risk parameters implied by such a model: log-binomial based, Poisson-based, and a general semi-parametric approach. We present the properties and trade-offs of each of these estimators, focusing in particular on the Poisson-based estimator, which is most practical for implementation. We show that the resulting estimator is double robust and asymptotically linear, and inference can be obtained using the corresponding influence curve. The robustness of our estimator is compared to alternative methods (e.g. log-linear, Poisson regression) through simulation under model misspecification and increasing violations of the positivity assumption. The estimation procedure is then applied to a study of HIV genetic susceptibility scores, which aims to determine the effects of different genetic susceptibility scores on viral response. Effect modification by other covariates in the model is also explored.

1 Introduction

In medical and epidemiological studies, when an outcome of interest is binary (i.e. disease vs. no disease), researchers are often interested in measuring the relative risk of an exposure or treatment. The conditional relative risk (RR), which adjusts for possible confounders, can be an important measure in such studies because of its easy interpretation as the increase in the probability of the outcome in an exposed group relative to a non-exposed group. This straightforward interpretation allows the researcher to communicate results to a variety of audiences. Given a rare outcome, the conditional RR can be approximated by the conditional odds ratio (OR) and easily estimated using logistic regression. However, when the outcome is common, estimating the conditional RR can be problematic, especially when the exposure or covariates are continuous (McNutt et al., 2003; Barros and Hirakata, 2003; Lumley et al., 2006). In this paper, we propose a new approach for estimating the conditional RR and its related parameters for a common outcome based on targeted maximum likelihood methodology (van der Laan and Rubin, 2006). This approach is developed under a flexible multiplicative semi-parametric model and can be implemented using standard statistical software.

In the literature, the most prevalent methods for estimating conditional relative risk parameters for a common outcome are parametric methods based on log-linear (e.g. Wacholder (1986), Skov et al. (1998) and Zocchetti et al. (1995)), Poisson (e.g. Zou (2004) and McNutt et al. (2003)), and Cox regression models (e.g. Lee and Chia (1993) and Lee (1994)). Overviews and comparisons of these methods can be found in papers by McNutt et al. (2003), Barros and Hirakata (2003) and Lumley et al. (2006). These three methods do not imply three unique estimators, however. As the above-mentioned papers point out, the estimators implied by using Poisson regression and Cox regression are equivalent.
In practice, Poisson regression is often preferred and easier to implement; therefore we focus our comparisons on Poisson regression. Some have also suggested methods to convert the OR to the RR. However, these methods are susceptible to bias in their estimates, tests, and confidence intervals (Localio et al., 2007; McNutt et al., 2003).

If the regression model is correctly specified, these parametric methods will provide consistent estimates of conditional relative risk parameters. Log-binomial regression estimates the conditional RR directly using maximum likelihood estimation. However, when continuous covariates are included in the model, log-binomial regression becomes highly susceptible to convergence issues and requires modifications to achieve more stable parameter estimates (e.g. Wacholder (1986), SAS (2003) and Lumley et al. (2006)). Poisson regression is not plagued by these convergence issues and provides consistent estimates of the conditional relative risk when the model is correct (Stijnen and Van Houwelingen, 1993; Zou, 2004; Carter et al., 2005). However, the standard errors provided by Poisson regression are overestimated, and alternative methods such as the sandwich estimator (Zou, 2004; Carter et al., 2005) or bootstrap (Barros and Hirakata, 2003) must be used to obtain correct inference.

Due to their dependence on the accuracy of a fully specified model, parameter estimates from the above parametric methods are often biased in observational studies. More flexible alternatives presented in the literature include semi-parametric counterparts to the log-linear and Poisson regression models. Typically these are built under either generalized partial linear models or generalized additive partial linear models, the latter being a less flexible approach where each covariate has a separate additive component in the model (Hastie and Tibshirani, 1990).
Estimation methods for the parameter under a partial linear model include profile likelihood methods (Speckman, 1988; Severini and Wong, 1992; Severini and Staniswalis, 1994) and backfitting algorithms (Hastie and Tibshirani, 1990). Under the additive partial linear model, estimation methods include backfitting (Buja et al., 1989) and marginal integration (Chen et al., 1996). Though these estimation methods are less dependent on model specification, they do not target estimation towards the parameter of interest and can still result in biased estimates and improper inference.

In this paper, we introduce double robust (DR) targeted maximum likelihood estimators (TMLEs) of conditional relative risk parameters. These estimators are developed under a multiplicative semi-parametric regression model, which is also referred to as a log-link generalized partial linear model. The model only requires specification of the model terms relating to the variable of interest (i.e. the exposure) and allows the remaining terms to be estimated as flexibly as possible, using a data-adaptive approach. Targeted maximum likelihood estimation updates an initial estimator of the density of the data under the assumed model in a direction that targets estimation towards the parameter of interest (van der Laan and Rubin, 2006). In this case, the parameter of interest is the coefficient (or coefficient vector) for the exposure variable (i.e. variable of interest) and any effect modifier (i.e. interaction) terms. The TMLE for these coefficients can be used to compute conditional relative risk. Previous applications of targeted maximum likelihood under a semi-parametric regression model can be found in Tuglus and van der Laan (2008, 2010), and additional applications of targeted maximum likelihood can be found in van der Laan et al. (September, 2009).

In this paper, we present three TMLEs for the parameter of interest in the multiplicative semi-parametric model, but focus in particular on one practical TMLE, which we recommend for use in practice. Each TMLE is based on the update of a different initial density and makes different assumptions on the distribution of the outcome. The three distributions for the TMLEs we present here are: (1) log-binomial, (2) Poisson, and (3) overdispersed exponential.

The first TMLE, based on an update of a log-binomial density, is the natural TMLE for parameters related to the conditional relative risk. It correctly assumes a binary outcome, and the resulting TMLE is double robust, asymptotically linear, and efficient. However, estimation requires log-binomial regression, which is often computationally unstable; therefore this TMLE is not typically feasible in practice.

The second TMLE is based on an update of a Poisson density. This TMLE assumes the outcome follows a Poisson distribution, which is incorrect for a binary outcome; therefore, given a binary outcome, the resulting estimator is double robust and asymptotically linear but no longer efficient. However, this TMLE is computationally stable and straightforward to implement with correct inference. Therefore, this TMLE is considered the most practical of the three estimators, and this paper will focus primarily on this TMLE for implementation and application.
The third TMLE makes no assumptions on the distribution of the outcome and is based on an update of a density in the overdispersed exponential family. This TMLE is the most general of the three, and the targeted maximum likelihood updates of the two prior methods can be derived directly from the update for this TMLE. However, this method is not straightforward to implement in practice using standard software.

The layout of the paper is as follows. In Section 2, we present the data structure. In Section 3, we present the assumed multiplicative semi-parametric model, and in Section 4 we formalize the parameter of interest, which is implied by the model. In Section 5, we present the three TMLEs mentioned above, followed by step-by-step implementation instructions for the more practical Poisson-based TMLE in Section 6. In Section 7, we demonstrate the performance of this Poisson-derived TMLE with results from a variety of simulations, and in Section 8, we apply this TMLE to an HIV viral response data set. We conclude by summarizing the value of our new approach in the discussion in Section 9.

2 Data Structure

Consider an observed point treatment data set consisting of n independent and identically distributed (i.i.d.) observations of $O = (W, A, Y) \sim P_0 \in \mathcal{M}$. W is a vector of baseline covariates, A is the exposure of interest, and $Y \in \{0, 1\}$ is a binary outcome. $P_0$ denotes the true distribution of O, from which all subjects are sampled. $P_0$ is an element of a statistical model $\mathcal{M}$, which is a semi-parametric model defined below in Section 3.

In this article, the subscript 0 (e.g. $\beta_0$) will represent a parameter or function under the observed data generating distribution, and the subscript n (e.g. $\beta_n$) will represent an estimate or estimator of a parameter or function determined from the sample population. Additionally, a superscript indicates the iteration value, where a superscript 0 designates an initial value, a superscript k designates the values at the k-th iteration, and a superscript $*$ indicates the final converged value.
Note that for causal effects, we assume that O is a missing data structure on a hypothetical full data structure $X = (W, (Y_a : a \in \mathcal{A}))$, which contains all counterfactual outcomes $Y_a$. We therefore view A as the missingness variable, as O contains only one of all possible counterfactual outcomes, $Y = Y_A$.

3 Multiplicative Semi-parametric Model

The likelihood of the observed data can be factorized as follows in terms of the observed data structure defined above:

$$P_0(O) = P_0(W)\, P_0(A \mid W)\, P_0(Y \mid A, W).$$

We make no assumptions about the distributions $P_0(W)$ or $g_0(A \mid W) = P_0(A \mid W)$ and assume the following semi-parametric multiplicative model for the mean of $P_0(Y \mid A, W)$:

$$P_0(Y = 1 \mid A, W) = e^{m_{\beta_0}(A, V)}\, P_0(Y = 1 \mid A = 0, W),$$

or

$$\log P_0(Y = 1 \mid A, W) = m_{\beta_0}(A, V) + \log P_0(Y = 1 \mid A = 0, W),$$

where $m_{\beta_0}(A, V)$ is a specified function of A and effect modifiers $V \subseteq W$, and $P_0(Y = 1 \mid A = 0, W)$ is unspecified. The model $m_{\beta_0}(A, V)$ can be of any form such that $m_{\beta_0}(A = 0, V) = 0$ for all values $v \in V$. We generally specify a linear function of A and work with the following two models: (1) simple main effect, $m_{\beta_0}(A, V) = \beta_0 A$, or (2) V-modified, $m_{\beta_0}(A, V) = \beta_{0(1)} A + \beta_{0(2)} A{:}V$, where the effect of exposure A is modified by covariate V. The latter model form is represented in terms of the parameter vector $\beta_0$ as $m_{\beta_0}(A, V) = \beta_0^T [A \;\; A{:}V]$.

Going forward, we define $\bar{Q}_0(A, W) \equiv P_0(Y = 1 \mid A, W)$. For convenience, we introduce separate notation for $P_0(Y = 1 \mid A = 0, W)$ and define $\theta_0(W) \equiv \bar{Q}_0(0, W)$. We can then rewrite the multiplicative semi-parametric model as:

$$\bar{Q}_0(A, W) = e^{m_{\beta_0}(A, V)}\, \theta_0(W). \qquad (1)$$

4 Parameter of Interest

We can represent conditional relative risk (RR) in terms of the observed data and the semi-parametric model as follows:

$$RR_0(a) = \frac{P_0(Y = 1 \mid A = a, W)}{P_0(Y = 1 \mid A = 0, W)} = \frac{\bar{Q}_0(a, W)}{\theta_0(W)} = e^{m_{\beta_0}(a, V)},$$

or on the log scale:

$$\log\left\{ \frac{\bar{Q}_0(a, W)}{\theta_0(W)} \right\} = m_{\beta_0}(a, V).$$

The model in (2) requires that both $\bar{Q}_0(A, W) > 0$ and $\theta_0(W) > 0$. In order for this to be true, we must have that $P_0(A = a \mid W) > 0$ for all a, w. This latter requirement is often referred to as the positivity assumption (Robins, 1986, 1987, 1999), or the experimental treatment assignment (ETA) assumption (Neugebauer and van der Laan, 2005).
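To make the parameterization concrete, the following minimal sketch evaluates the multiplicative model and the conditional relative risk it implies under the V-modified form. The function names, the coefficient values and the baseline risk are hypothetical and chosen only for illustration; they do not come from the paper.

```python
import math

def m_beta(a, v, beta):
    """V-modified exposure model m_beta(a, v) = b1*a + b2*a*v; it is 0 whenever a = 0."""
    b1, b2 = beta
    return b1 * a + b2 * a * v

def q_bar(a, v, theta_w, beta):
    """Multiplicative model (1): Q(a, w) = exp(m_beta(a, v)) * theta(w)."""
    return math.exp(m_beta(a, v, beta)) * theta_w

def conditional_rr(a, v, beta):
    """Conditional relative risk of exposure level a versus 0, given V = v."""
    return math.exp(m_beta(a, v, beta))

beta = (0.1, 0.05)   # hypothetical coefficients (main effect, interaction)
theta_w = 0.3        # hypothetical baseline risk P(Y = 1 | A = 0, W = w)

rr = conditional_rr(1.0, 2.0, beta)
print(rr)                                   # relative risk at a = 1, v = 2
print(q_bar(1.0, 2.0, theta_w, beta))       # implied risk among the exposed
```

Note that the unspecified baseline $\theta_0(W)$ cancels in the ratio, which is why the relative risk depends on the data only through $\beta_0$ and the effect modifier V.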
Therefore, the model in (2) is true only for a, w for which there is support in the data. So, to be more precise, we can write:

$$\frac{\bar{Q}_0(a, w)}{\theta_0(w)}\, I\big((a, w) \in \mathcal{A}\big) = e^{m_{\beta_0}(a, v)}, \qquad (2)$$

where $\mathcal{A}$ contains the subset of a, w for which the positivity assumption is not violated (i.e. for which $P_0(A = a \mid W) > 0$). This allows us to estimate the importance (i.e. risk) of exposure A in predicting the outcome Y, conditional on W, for those strata of W in which the data have sufficient support. The analyst may want to allow for extrapolation across all a, w, but this would be unwise if the model were not true across all a, w.

Based on the parameterization of the conditional relative risk shown above, we can define our parameter of interest as $\beta_0 = \Psi(P_0)$. We note that under the model defined in Section 3, the parameter of interest is only a function of $\bar{Q}_0(A, W) = P_0(Y = 1 \mid A, W)$. Therefore we can write $\beta_0 = \Psi(\bar{Q}_0)$. If we define $m_{\beta_0}(A, V) = \beta_0 A$, then the parameter of interest, $\beta_0$, is the change in the log of the conditional relative risk for a one-unit increment in A.

In order for the parameter of interest to have a causal interpretation, we require not only that O is a missing data structure on a hypothetical full data structure $X = (W, Y_0, Y_1)$, as described in Section 2, but also the randomization assumption: $A \perp (Y_0, Y_1) \mid W$. However, even in the case where these assumptions do not hold, the variable importance parameter defined above is a well-defined and meaningful parameter.

We note that one motivation for using our multiplicative semi-parametric model is that regardless of whether or not we believe our model, we can still accurately test the following strong null hypothesis:

$$H_0: \frac{P_0(Y = 1 \mid A, W)}{P_0(Y = 1 \mid A = 0, W)} = 1, \quad \text{for all } W.$$

Under this null, the model is always correct. Therefore, we can construct valid tests of this null hypothesis.

5 Targeted Maximum Likelihood Estimation

Targeted maximum likelihood estimation updates an initial estimator of the density in a direction that targets estimation towards the parameter of interest. The update is achieved by regressing the outcome on a clever covariate while setting the initial estimate as an offset.
This update is iterated until convergence. The clever covariate is derived such that the converged TMLE is also the solution to the double robust estimating equation for the parameter of interest. The converged TMLE is an asymptotically linear estimator of the parameter of interest; therefore inference can be based on the corresponding influence curve (van der Laan and Rubin, 2006).

For the parameter of interest, $\beta_0$, under the assumed semi-parametric multiplicative model defined in Section 3, three different TMLEs may be developed: (1) as an update to a log-binomial density, assuming the distribution for $P_0(Y \mid A, W)$ is binomial; (2) as an update to a Poisson density, assuming the distribution for $P_0(Y \mid A, W)$ is Poisson; or (3) as an update to a general density of the overdispersed exponential family, independent of distributional assumptions on $P_0(Y \mid A, W)$. The latter TMLE is the most general and assumes only a semi-parametric multiplicative model of general form as presented in (1). We note that TMLEs (1) and (2) are examples of TMLE (3) and can be derived from method (3) under their respective distributional assumptions (see Appendix A for details).

In the case of a binary outcome, where one is interested in estimating the conditional relative risk, all three TMLEs are double robust and asymptotically linear estimators. However, only TMLEs (1) and (3) respect the binary form of the outcome and are therefore efficient. The Poisson-based TMLE (2) assumes the outcome follows a Poisson distribution. Therefore, when the outcome is binary, the initial estimator for TMLE (2) is always misspecified, and the corresponding influence curve is not the efficient influence curve. Although TMLE (2) is not efficient, it remains DR; that is, the efficient score estimating function in the semi-parametric Poisson regression model is an unbiased DR estimating function for the parameter of interest in the semi-parametric conditional mean model, which does not assume a Poisson distribution.
Consequently, the Poisson-based TMLE (2) is consistent, and we can provide correct inference using the corresponding influence curve.

In practice, the Poisson-based TMLE (2) is more stable than TMLE (1), which requires updating a log-binomial regression. As mentioned previously, log-binomial regression is often unstable, especially given continuous covariates. Although TMLE (3) is more general, with no distributional assumptions on $P_0(Y \mid A, W)$, it cannot be implemented using standard software and is therefore a less practical estimator. Therefore, in practice, the Poisson-based TMLE (2) is preferred and will be applied in this paper in simulation and application.

5.1 Specifics of the TMLEs

For each of the three TMLEs we present the form of the density, its fluctuation model, and the derived clever covariate. In Appendix B we provide their respective efficient scores and estimating functions as well as the derivation of their respective clever covariates.

For all three TMLEs, the initial density $P^0(Y \mid A, W)$, with mean $E^0[Y \mid A, W] = P^0(Y = 1 \mid A, W) = \bar{Q}^0(A, W)$, is defined in terms of the semi-parametric model presented above, where $\log \bar{Q}^0(A, W) = m_{\beta^0}(A, V) + \log \theta^0(W)$. A class of submodels fluctuated with parameter $\epsilon$ is defined as $P^0(\epsilon)(Y \mid A, W)$ with corresponding mean

$$\log \bar{Q}^0(\epsilon)(A, W) = m_{\beta^0(\epsilon)}(A, V) + \log \theta^0(\epsilon)(W),$$

where $m_{\beta^0(\epsilon)}(A, V) = m_{\beta^0 + \epsilon}(A, V)$ and $\log \theta^0(\epsilon)(W) = \log \theta^0(W) + \epsilon\, r_{Q^0, g^0}(W)$. The form of $r_{Q, g}(W)$ is determined such that at $\epsilon = 0$, $\bar{Q}^0(\epsilon = 0)(A, W) = \bar{Q}^0(A, W)$, and the linear span of the score of the likelihood for $\bar{Q}^0(\epsilon)(A, W)$ with respect to $\epsilon$ at $\epsilon = 0$ includes the efficient score (van der Laan and Rubin, 2006). Given a model $m_\beta(A, V)$ that is linear in $\beta$, the model

$$\log \bar{Q}^0(\epsilon)(A, W) = m_{\beta^0(\epsilon)}(A, V) + \log \theta^0(\epsilon)(W)$$

can be rearranged as an update to the initial fit,

$$\log \bar{Q}^0(\epsilon)(A, W) = \log \bar{Q}^0(A, W) + \epsilon^T \frac{d}{d\beta^0} m_{\beta^0}(A, V) + \epsilon^T r_{Q^0, g^0}(W).$$

The update can be achieved by estimating $\epsilon$ with standard maximum likelihood estimation. For the log-binomial and Poisson methods, the update can be completed using generalized linear regression, setting the initial estimate, $\bar{Q}^0(A, W)$, as an offset and regressing Y onto the following clever covariate,

$$H_{Q^0, g^0}(A, W) = \frac{d}{d\beta^0} m_{\beta^0}(A, V) + r_{Q^0, g^0}(W).$$
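As a worked special case, assume the simple main-effect model $m_\beta(A, V) = \beta A$ with scalar $\beta$ (an assumption made here only for illustration). The rearrangement of the fluctuation model into an offset plus a single covariate can then be checked line by line:

```latex
\begin{align*}
\log \bar{Q}^0(\epsilon)(A, W)
  &= m_{\beta^0 + \epsilon}(A, V) + \log \theta^0(\epsilon)(W) \\
  &= (\beta^0 + \epsilon) A + \log \theta^0(W) + \epsilon\, r_{Q^0, g^0}(W) \\
  &= \underbrace{\beta^0 A + \log \theta^0(W)}_{\log \bar{Q}^0(A, W)}
     + \epsilon\, \underbrace{\bigl(A + r_{Q^0, g^0}(W)\bigr)}_{H_{Q^0, g^0}(A, W)} .
\end{align*}
```

In this case $\frac{d}{d\beta^0} m_{\beta^0}(A, V) = A$, so the fluctuation is a one-parameter log-linear regression of Y on $H_{Q^0, g^0}(A, W)$ with $\log \bar{Q}^0(A, W)$ held fixed as the offset.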
In practice the update process is iterated such that $\beta^{k+1} = \beta^k + \epsilon^k$ and $\log \theta^{k+1}(W) = \log \theta^k(W) + \epsilon^k r_{Q^k, g^k}(W)$, where $\epsilon^k$ is the update parameter for the k-th update (e.g. $\beta^1 = \beta^0 + \epsilon^0$). Convergence is achieved when $\epsilon^k \approx 0$. The final converged estimate is the solution to the robust estimating equation corresponding to the efficient score,

$$\frac{1}{n} \sum_{i=1}^{n} D_{h, \bar{Q}^*, g}(O_i) = 0.$$

Therefore, the TMLE, $\beta^*$, inherits the DR properties of the solution to the efficient estimating equation, and the efficient influence curve can be used to estimate the correct covariance and inference.

5.1.1 TMLE (1): Log-binomial model

Assuming a log-binomial distribution, the initial density is a binomial density defined as

$$P^0(Y \mid A, W) = \bar{Q}^0(A, W)^{Y} \big(1 - \bar{Q}^0(A, W)\big)^{1 - Y},$$

where $\bar{Q}^0(A, W) = P^0(Y = 1 \mid A, W) = \theta^0(W)\, e^{m_{\beta^0}(A, V)}$, with the associated fluctuation

$$P^0(\epsilon)(Y \mid A, W) = \bar{Q}^0(\epsilon)(A, W)^{Y} \big(1 - \bar{Q}^0(\epsilon)(A, W)\big)^{1 - Y},$$

given $\bar{Q}^0(\epsilon)(A, W) = \theta^0(\epsilon)(W)\, e^{m_{\beta^0(\epsilon)}(A, V)}$. The proper form of the fluctuation function $r_{Q^0, g^0}(W)$ is defined as follows:

$$r_{Q^0, g^0}(W) = - \frac{E\left[ \frac{\bar{Q}^0(A, W)}{1 - \bar{Q}^0(A, W)} \frac{d}{d\beta^0} m_{\beta^0}(A, V) \,\Big|\, W \right]}{E\left[ \frac{\bar{Q}^0(A, W)}{1 - \bar{Q}^0(A, W)} \,\Big|\, W \right]}.$$

The update is completed using log-binomial regression with an offset equal to the initial fit and a clever covariate defined as

$$H_{Q^0, g^0}(A, W) = \frac{d}{d\beta^0} m_{\beta^0}(A, V) - \frac{E\left[ \frac{\bar{Q}^0(A, W)}{1 - \bar{Q}^0(A, W)} \frac{d}{d\beta^0} m_{\beta^0}(A, V) \,\Big|\, W \right]}{E\left[ \frac{\bar{Q}^0(A, W)}{1 - \bar{Q}^0(A, W)} \,\Big|\, W \right]}.$$

5.1.2 TMLE (2): Poisson

Under the Poisson distribution, the initial density is a Poisson density defined as

$$P^0(Y \mid A, W) = \frac{\bar{Q}^0(A, W)^{Y} e^{-\bar{Q}^0(A, W)}}{Y!},$$

with the associated fluctuation

$$P^0(\epsilon)(Y \mid A, W) = \frac{\bar{Q}^0(\epsilon)(A, W)^{Y} e^{-\bar{Q}^0(\epsilon)(A, W)}}{Y!}.$$

The proper form for $r_{Q^0, g^0}(W)$ is

$$r_{Q^0, g^0}(W) = - \frac{E\left[ \frac{d}{d\beta^0} m_{\beta^0}(A, V)\, e^{m_{\beta^0}(A, V)} \,\Big|\, W \right]}{E\left[ e^{m_{\beta^0}(A, V)} \,\Big|\, W \right]}.$$

The update is completed using Poisson regression with an offset equal to the initial fit and a clever covariate defined as

$$H_{Q^0, g^0}(A, W) = \frac{d}{d\beta^0} m_{\beta^0}(A, V) - \frac{E\left[ \frac{d}{d\beta^0} m_{\beta^0}(A, V)\, e^{m_{\beta^0}(A, V)} \,\Big|\, W \right]}{E\left[ e^{m_{\beta^0}(A, V)} \,\Big|\, W \right]}.$$

In practice, the Poisson-based TMLE is recommended due to its computational stability. Therefore, subsequent implementation instructions and applications in this paper will be presented in terms of the Poisson TMLE.

5.1.3 TMLE (3): Overdispersed Exponential Family

A density of the overdispersed exponential family in canonical form is represented as follows (McCullagh and Nelder, 1989):

$$P^0(Y \mid A, W) = h_c(Y, \tau) \exp\left\{ \frac{\eta^0 Y - B(\eta^0)}{d(\tau)} \right\}.$$

For this subclass of models, $d(\tau)$ is the conditional residual variance, $\sigma^2_Y(A, W)$, $\eta^0 = \bar{Q}^0(A, W) = \theta^0(W) \exp(m_{\beta^0}(A, V))$ and $B'(\eta^0) = \bar{Q}^0(A, W)$. Therefore it can be rewritten as follows:

$$P^0(Y \mid A, W) = h_c(Y, \tau) \exp\left\{ \frac{\bar{Q}^0(A, W)\, Y - B(\bar{Q}^0(A, W))}{\sigma^2_Y(A, W)} \right\}.$$

We define a class of submodels, fluctuated by parameter $\epsilon$, as

$$P^0(\epsilon)(Y \mid A, W) = h_c(Y, \tau) \exp\left\{ \frac{\bar{Q}^0(\epsilon)(A, W)\, Y - B(\bar{Q}^0(\epsilon)(A, W))}{\sigma^2_Y(A, W)} \right\},$$

where $\bar{Q}^0(\epsilon)(A, W) = \theta(\epsilon)(W) \exp(m_{\beta(\epsilon)}(A, V))$, $\theta(\epsilon)(W) = \theta(W) \exp(\epsilon\, r_{Q^0, g^0}(W))$ and $\beta(\epsilon) = \beta + \epsilon$. The proper form of the fluctuation function, $r_{Q^0, g^0}(W)$, is defined as follows:

$$r_{Q^0, g^0}(W) = - \frac{E\left[ \frac{d}{d\beta^0} m_{\beta^0}(A, V) \frac{\bar{Q}^0(A, W)^2}{\sigma^2_Y(A, W)} \,\Big|\, W \right]}{E\left[ \frac{\bar{Q}^0(A, W)^2}{\sigma^2_Y(A, W)} \,\Big|\, W \right]},$$

and the clever covariate is defined as

$$H_{Q^0, g^0}(A, W) = \frac{d}{d\beta^0} m_{\beta^0}(A, V) - \frac{E\left[ \frac{d}{d\beta^0} m_{\beta^0}(A, V) \frac{\bar{Q}^0(A, W)^2}{\sigma^2_Y(A, W)} \,\Big|\, W \right]}{E\left[ \frac{\bar{Q}^0(A, W)^2}{\sigma^2_Y(A, W)} \,\Big|\, W \right]}.$$

5.2 Estimating the clever covariate

Unlike the TMLE for the additive semi-parametric regression model for a continuous outcome presented in Tuglus and van der Laan (2008, 2010), the clever covariate presented here is not only a function of the conditional mean of A, given W, but depends on $P(A \mid W)$ in a more complex way. When A is continuous, the clever covariate can be directly estimated using a data-adaptive regression algorithm to estimate the expectations in the numerator and denominator of the second term. This is the method that is used in the application in Section 8. Given a binary or categorical A with L levels, the clever covariate can also be determined directly as follows:

$$H_{Q, g}(A, W) = \frac{d}{d\beta} m_\beta(A, V) - \frac{\sum_{l=1}^{L} \frac{d}{d\beta} m_\beta(A = a_l, V)\, e^{m_\beta(A = a_l, V)}\, P(A = a_l \mid W)}{\sum_{l=1}^{L} e^{m_\beta(A = a_l, V)}\, P(A = a_l \mid W)}.$$

This method requires the estimation of $P(A = a_l \mid W)$ for all L levels. It is recommended that these values be estimated data-adaptively. This method can also be used to approximate the expectation given a continuous A if the analyst is willing to discretize A solely for the purpose of estimating this covariate.

5.3 Inference

An important feature of targeted maximum likelihood estimators is that they solve the corresponding robust estimating equation as presented in Section 5. Therefore, statistical inference for the converged estimate of $\beta_0$ can be based on the influence curve associated with this estimating equation. Covariance estimates based on the influence curve are consistent if the estimator for $g_0(A \mid W) = P_0(A \mid W)$ is correctly specified and the estimator for $\bar{Q}_0$ is misspecified.
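The closed-form clever covariate for a categorical exposure in Section 5.2 can be sketched as follows. This is a minimal illustration that assumes the main-effect model $m_\beta(a) = \beta a$ (so that $\frac{d}{d\beta} m_\beta(a) = a$) and takes the treatment probabilities $P(A = a_l \mid W)$ for one subject as given; the function name and the numerical values are hypothetical.

```python
import math

def clever_covariate(a, levels, probs, beta):
    """Poisson-TMLE clever covariate for categorical A under m_beta(a) = beta * a.

    levels: the L exposure levels a_l
    probs:  P(A = a_l | W) for this subject's W, in the same order as levels
    """
    num = sum(al * math.exp(beta * al) * pl for al, pl in zip(levels, probs))
    den = sum(math.exp(beta * al) * pl for al, pl in zip(levels, probs))
    return a - num / den

levels = [0, 1, 2]          # hypothetical exposure levels
probs = [0.5, 0.3, 0.2]     # hypothetical P(A = a_l | W) for one stratum of W
beta = 0.1                  # hypothetical current estimate of beta

h_values = [clever_covariate(al, levels, probs, beta) for al in levels]

# By construction the covariate is centered under the exp(m)-weighted
# treatment distribution: sum_l H(a_l, W) * exp(beta * a_l) * P(a_l | W) = 0.
weighted_mean = sum(h * math.exp(beta * al) * pl
                    for h, al, pl in zip(h_values, levels, probs))
print(h_values, weighted_mean)
```

At $\beta = 0$ the weights are all equal to one and the covariate reduces to $A - E[A \mid W]$, the familiar centered exposure.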
If $g_n$ is correctly specified, the covariance is known to be asymptotically conservative (van der Laan et al., September, 2009), except if $\theta_n$ is consistent or if $g_n = g_0$ is known. Covariance may also be estimated using the bootstrap. However, this TMLE requires iteration and can be computationally intensive.

For a parameter vector $\beta_0$ of length p, an estimate of the p by p covariance matrix, $\Sigma_n$, can be obtained using the influence curve defined for a single subject as

$$IC(O) = c^{-1} D_{h_0, \bar{Q}_0, g_0}(O),$$

given scale factor

$$c = -E_0\left[ \frac{d}{d\beta_0} D_{h_0, \bar{Q}_0, g_0}(O) \right],$$

where $IC(O)$ is a p by 1 vector for a parameter vector $\beta_0$ of length p. The empirical estimate for the covariance of $\beta_n$ is

$$\Sigma_n = \frac{1}{n} \sum_{i} \hat{IC}_n(O_i)\, \hat{IC}_n(O_i)^T,$$

and the normal approximation $\sqrt{n}(\beta_n - \beta_0) \approx N(0, \Sigma_n)$ can be used for the purpose of statistical inference. Using the estimated covariance matrix, hypothesis tests can be performed for a single component $\beta_n(j)$, where $j = 1, \ldots, p$, under the null hypothesis $H_0: \beta_0(j) = 0$, using a standard test statistic to obtain p-values, with estimated variance $\Sigma_n(j, j)$:

$$T_n(j) = \frac{\sqrt{n}\, \beta_n(j)}{\sqrt{\Sigma_n(j, j)}} \sim N(0, 1) \quad \text{as } n \to \infty.$$

Likewise, the hypothesis $H_0: c^T \beta_0 = 0$ can also be tested using a standard Wald test, where the covariance of $c^T \beta_n$ is $c^T \Sigma_n c$. In the application presented in this paper, effect modification is tested at various levels of V. In this situation the vector c for the Wald test is $(1, V_q)$, where $V_q$ is a specific quantile of effect modifier V.

6 Step-by-Step Implementation

In this section, we provide step-by-step instructions for implementing targeted maximum likelihood estimation for $\beta_0$ under the semi-parametric regression model presented in (1). Although the following steps are presented assuming a Poisson distribution, the TMLE under the log-binomial distribution as well as the more general TMLE based on the overdispersed exponential density can be implemented in a similar fashion by substituting in the appropriate clever covariate and using log-binomial regression for the update. For the following implementation steps, we assume the general linear model form $m_\beta(A, V) = A(\beta^T V)$.

(1) Obtain an initial estimate, $\bar{Q}^0_n(A, W)$, respecting the semi-parametric form $\log \bar{Q}^0_n = m_{\beta^0_n} + \log \theta^0_n$, which provides the initial estimate for the parameter of interest, $\beta^0_n$. This can be accomplished by fitting the semi-parametric model using methods such as those described in Speckman (1988), Severini and Wong (1992), and Hastie and Tibshirani (1990), or by using methods such as DSA (Sinisi and van der Laan, 2004) to fix the parametric portion of the model and allow the rest to be estimated data-adaptively.
A more flexible alternative uses an initial estimate for $\bar{Q}_0(A, W)$ of general model form obtained using data-adaptive machine learning algorithms such as super learner (van der Laan et al., 2007). We estimate $\bar{Q}_n(A = 0, W)$ using this general model fit and then regress Y onto the model $m_{\beta^0}(A, W)$, treating $\bar{Q}_n(A = 0, W)$ as a covariate. This results in our initial density estimate, $\bar{Q}^0(A, W)$, and initial estimate, $\beta^0$, under the correct model.

(2) Calculate the clever covariate,

$$H_{\bar{Q}^0_n, g_n}(A, W) = AV - \frac{E\left[ AV e^{A(\beta_n^{0T} V)} \mid W \right]}{E\left[ e^{A(\beta_n^{0T} V)} \mid W \right]}.$$

If A is discrete, $H_{\bar{Q}^0_n, g_n}$ can be calculated directly as described in Section 5.2, or by estimating the numerator and denominator of the second term ($E[AV e^{A(\beta_n^{0T} V)} \mid W]$ and $E[e^{A(\beta_n^{0T} V)} \mid W]$, respectively) using a flexible data-adaptive algorithm.

(3) Estimate the fluctuation parameter, $\epsilon^0$, using Poisson regression to project Y onto $H_{\bar{Q}^0_n, g_n}$, while setting the initial fitted values, $\bar{Q}^0_n(A, W)$, as an offset. The estimated coefficient associated with the clever covariate is the fluctuation parameter estimate, $\epsilon^0_n$. If completed in R, this is equivalent to fitting the regression formula Y ~ H(A, W) + offset(log(Q(A, W))) - 1 using glm, where H stands for $H_{\bar{Q}^0_n, g_n}$ and Q for $\bar{Q}^0_n$. The offset enters on the log scale, matching the log link and the update in step (4). Note that there is no intercept, only the offset value.

(4) Update the initial estimate by setting $\beta^1_n = \beta^0_n + \epsilon^0_n$ and setting the updated fit as $\log \bar{Q}^1_n(A, W) = \log \bar{Q}^0_n(A, W) + \epsilon^0_n H_{\bar{Q}^0_n, g_n}(A, W)$. These are the first-step updates.

(5) Iterate steps 2 through 4, recomputing the clever covariate at the current estimate. At each k-th iteration, set $\beta^k_n = \beta^{k-1}_n + \epsilon^{k-1}_n$ and $\log \bar{Q}^k_n(A, W) = \log \bar{Q}^{k-1}_n(A, W) + \epsilon^{k-1}_n H_{\bar{Q}^{k-1}_n, g_n}(A, W)$, until convergence ($\epsilon^k_n \approx 0$).

(6) Obtain standard errors and inference for the final converged estimate $\beta^*_n$ using the influence curve or bootstrap estimates, and then calculate standard errors, p-values and confidence intervals as described in Section 5.3.

7 Simulations

In this section, we assess the properties of the Poisson-derived TMLE of $\beta_0$ with simulations that cover a range of scenarios seen in actual data sets. With these simulations, we demonstrate the double robustness properties of this TMLE and differences in variability when we estimate $\bar{Q}_0(A, W)$ and/or $g_0(A \mid W)$ consistently or with super learner. We compare our results to those from common estimators in the literature (those obtained from parametric log-binomial and Poisson regression models), as well as to the estimator obtained from an original fit of the semi-parametric model, using super learner to estimate $\theta_0$. To evaluate the performance of all estimators, we focus on bias, variance, mean squared error and confidence intervals.

We show that the relative performance of the TMLE, when compared to the other estimators, depends partly on the level of sparsity in the data. As discussed earlier, within strata defined by W, we would like the probability of exposure A to be bounded away from zero and one. When A is binary or categorical and is perfectly randomized, this is guaranteed. The common methods in the literature perform well under this scenario. However, when A is continuous and in observational studies, analysts often face the challenge that $g_0(A \mid W)$ is very small for some strata of W. In other words, we have practical violations of the positivity assumption for all a, W. Different estimators are affected by positivity violations in different ways.

In the following simulations, we demonstrate the relative performance of the Poisson-derived TMLE of $\beta_0$, for both binary and continuous A, (1) when A is perfectly randomized, (2) when the relationship between A and Y is confounded by W but there are no positivity violations, and (3) when there are extreme positivity violations.
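The six implementation steps of Section 6 can be sketched end to end on a toy data set. This is a minimal illustration under stated assumptions, not the authors' implementation: A is binary, W is a single binary covariate (so $g_n$ and the initial $\theta^0_n$ are simple stratum proportions rather than data-adaptive fits), the model is $m_\beta(A) = \beta A$, and the Poisson update is solved with a hand-written Newton-Raphson step instead of a GLM routine. All function names and the data-generating values are hypothetical.

```python
import math
import random

def clever_covariate(a, g1, beta):
    # H = A - E[A e^{beta A} | W] / E[e^{beta A} | W] for binary A, g1 = P(A=1 | W)
    num = math.exp(beta) * g1
    return a - num / (num + (1.0 - g1))

def fit_epsilon(y, h, q, steps=50):
    # Newton-Raphson for the no-intercept Poisson regression of Y on H, offset log(Q)
    eps = 0.0
    for _ in range(steps):
        mu = [qi * math.exp(eps * hi) for qi, hi in zip(q, h)]
        score = sum(hi * (yi - mi) for hi, yi, mi in zip(h, y, mu))
        info = sum(hi * hi * mi for hi, mi in zip(h, mu))
        step = score / info
        eps += step
        if abs(step) < 1e-12:
            break
    return eps

def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)

def poisson_tmle(w, a, y, tol=1e-8, max_iter=100):
    n = len(y)
    rows = list(zip(w, a, y))
    strata = sorted(set(w))
    # g_n: stratum-specific P(A = 1 | W)
    g = {s: mean(ai for wi, ai, _ in rows if wi == s) for s in strata}
    # Step 1: initial fit Q0 = exp(beta0 * A) * theta0(W), with an unadjusted beta0
    theta = {s: mean(yi for wi, ai, yi in rows if wi == s and ai == 0) for s in strata}
    beta = math.log(mean(yi for _, ai, yi in rows if ai == 1)
                    / mean(yi for _, ai, yi in rows if ai == 0))
    q = [theta[wi] * math.exp(beta * ai) for wi, ai, _ in rows]
    eps = float("inf")
    for _ in range(max_iter):
        h = [clever_covariate(ai, g[wi], beta) for wi, ai, _ in rows]  # step 2
        eps = fit_epsilon(y, h, q)                                     # step 3
        beta += eps                                                    # step 4
        q = [qi * math.exp(eps * hi) for qi, hi in zip(q, h)]
        if abs(eps) < tol:                                             # step 5
            break
    # Step 6: influence-curve standard error, IC = D / c with D = H (Y - Q)
    h = [clever_covariate(ai, g[wi], beta) for wi, ai, _ in rows]
    c = mean(hi * ai * qi for hi, ai, qi in zip(h, a, q))
    ic = [hi * (yi - qi) / c for hi, yi, qi in zip(h, y, q)]
    se = math.sqrt(mean(v * v for v in ic) / n)
    return beta, se, eps

# Toy data in which the multiplicative model holds with beta_0 = 0.1
rng = random.Random(0)
W, A, Y = [], [], []
for _ in range(2000):
    wi = int(rng.random() < 0.5)
    ai = int(rng.random() < (0.6 if wi else 0.3))
    p = math.exp(0.1 * ai) * (0.5 if wi else 0.3)
    W.append(wi)
    A.append(ai)
    Y.append(int(rng.random() < p))

beta_hat, se_hat, last_eps = poisson_tmle(W, A, Y)
print(beta_hat, se_hat)
```

The initial $\beta^0_n$ is deliberately the unadjusted log risk ratio, which is confounded by W in this toy design, so the iterations show the targeting step moving the estimate; in practice the initial fit and $g_n$ would come from the data-adaptive methods described in step (1).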
In all simulations, the outcome variable $Y \in \{0, 1\}$ is an indicator, such as of disease status. Also in all of the simulations, W is a vector of five covariates, which were generated as follows:

$W_1 \sim \text{Binom}(1, 0.3)$
$W_2 \sim \text{Binom}(1, 0.65)$
$W_3 \sim N(0, 2)$
$W_4 \sim N(100, 10)$
$W_5 \sim N(1, 0.3)$.

We set

$$\bar{Q}_0(A, W) = e^{0.1 A}\, e^{I + 0.1 W_3 + 0.02 W_2 W_3 - 0.01 W_1 W_4 - 0.02 W_5}, \qquad (3)$$

where I takes the following values for the various simulations (Binary A: Simulations 1, 2 and 3; Continuous A: Simulations 1, 2 and 3): [...]

As (3) shows, $\beta_0 = 0.1$ in all simulations. Also, (3) shows that $m_{\beta_0}(A) = \beta_0 A$, so we have assumed there are no effect modifiers.

7.1 Simulations for Binary A

For a binary A, we consider the following three conditional distributions for $g_0(A \mid W)$:

1. For the first simulation, $A$ is perfectly randomized such that $g_0(A \mid W) = 0.5$. When $g_0(A \mid W)$ is misspecified for this simulation, $g_n(A \mid W) =$ [value missing].
2. For the second simulation, $A$ is dependent on $W$ such that $g_0(A \mid W) = 1 / (1 + \exp(-(0.1 W_3)))$. With this mechanism for exposure, the correlation between $A$ and $W_3$ is 0.10, and values of $g_0(A \mid W)$ range from 0.28 to 0.73, with a median of [value missing]. Therefore, we do not have positivity violations. For this simulation, when $g_n(A \mid W)$ is misspecified, the estimator depends only on $W$ [index missing].
3. For the third simulation, $A$ is again dependent on $W$, but now we have $g_0(A \mid W) = 1 / (1 + \exp(-(1.0 W_3)))$. This mechanism for exposure leads to positivity violations because $g_0(A \mid W) \in$ [[value missing], 1.0]. The median value is [value missing]. The correlation between $A$ and $W_3$ is now [value missing]. Misspecification of $g_n(A \mid W)$ again occurs by having the estimator depend only on $W$ [index missing].

7.2 Simulations for Continuous A

For continuous $A$, we again varied $g_0(A \mid W)$ three ways:

1. For the first simulation, $A$ is not dependent on $W$. It is normally distributed such that $A \sim N(1, 0.6)$.
2. For the second simulation, $A$ is dependent on $W$ such that $A \sim N(1, 0.6) + 0.1 W_3$. In this simulation, the correlation between $A$ and $W_3$ is [value missing].
3. For the third simulation, $A$ is dependent on $W$ such that $A \sim N(0, 0.6) - 0.8 W_3$. The correlation between $A$ and $W_3$ in this simulation is 0.6.

For all simulations, we generated 1000 samples of size [value missing]. All data were generated and all estimators were implemented using R (R Development Core Team, 2010).

7.3 Simulation Results

Tables 1 and 2 present results for estimating $\beta_0$ when $A$ is binary and when $A$ is continuous. For continuous $A$, we estimated the numerator and denominator of the clever covariate using the lars package in R (Efron et al., 2003). The first column of the tables presents the initial substitution estimator, $\beta_n^0$, based on the initial estimate of $Q_0$. The second column presents the TMLE, $\beta_n$, obtained by substitution after $k$ iterations of updating $Q_n^0$ to obtain $Q_n$.
The subsequent columns provide the bias, mean squared error (MSE) and empirical variance based on the estimates from 1000 samples. We also include the mean of the variance estimates calculated from the empirical variance of the efficient influence curve divided by the sample size of [value missing]. Finally, the last column shows the coverage probability (CP), or the percentage of the time that the 95% confidence interval contains the true value of $\beta_0 = 0.1$. Each panel in the tables corresponds to the simulations described above, and the rows in each panel indicate the specification of the estimators of $Q_0$ and $g_0$ on which the TMLE is based. For example, Qcgc indicates that the correct terms were included when estimating both $Q_0$ and $g_0$. Qcgw indicates that the estimator of $g_0$ was misspecified as described above, while the estimator for $Q_0$ included the correct terms; and Qwgc indicates that the estimator for $Q_0$ was misspecified as described above, while the correct terms were included when estimating $g_0$. Finally, Qslgsl indicates that super learner was used for the estimators of both $Q_0$ and $g_0$.

Table 1: Performance of Poisson-derived TMLE, binary A, by simulation
Columns: $\beta_n^0$, $\beta_n$, Bias, MSE, Var($\beta_n$), Var($IC_{eff}$)/n, CP
Rows (repeated for Simulations 1, 2 and 3): Qcgc, Qcgw, Qwgc, Qslgsl
[numeric entries missing]

Tables 1 and 2 illustrate the properties we expect to see for the TMLE of $\beta_0$:

- The TMLE is double robust. The finite-sample bias is close to zero if the estimator for either $Q_0$ or $g_0$ is consistent. We achieve this even under substantial confounding and extreme violations of positivity in Simulation 3.
- When the estimator for $g_0$ is consistent, the variance estimate obtained from the empirical variance of the efficient influence curve is approximately equal to the variance of the 1000 TMLEs and the coverage probability is approximately 95%. When the estimator $g_n$ is inconsistent, this variance estimate is asymptotically conservative.
- Using super learner to estimate both $Q_0$ and $g_0$ provides robust estimators of either $Q_0$ or $g_0$, so that we achieve bias and variance comparable to those obtained when correctly specifying the models for $Q_0$ and/or $g_0$. The one exception is when $A$ is continuous and we have extreme positivity violations.

Tables 3 and 4 compare the performance of the TMLE of $\beta_0$ to the common estimators in the literature when the initial working model for $Q_0(A, W)$ is incorrect. All of the estimators in the literature will perform well when the parametric models on which they rely are correctly specified. However, we are very doubtful that anyone can ever specify a parametric model correctly. Therefore, we present comparisons under a more realistic scenario.
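The summary columns described for Tables 1 and 2 can be computed from repeated-sample output along the following lines. This is a minimal sketch in Python rather than the R code used for the paper; the function name `summarize_simulation` and the synthetic inputs are assumptions for illustration, and the normal draws merely stand in for actual TMLE fits.

```python
import numpy as np

def summarize_simulation(estimates, ic_variances, truth, z=1.96):
    """Summarize repeated-sample estimates of a scalar parameter.

    estimates    : one point estimate per simulated sample
    ic_variances : one variance estimate per sample, i.e. Var(IC)/n
    truth        : true parameter value (0.1 for beta_0 in the text)
    """
    est = np.asarray(estimates, dtype=float)
    var_hat = np.asarray(ic_variances, dtype=float)
    se = np.sqrt(var_hat)
    # Wald-type 95% interval per sample, checked against the truth.
    covered = (est - z * se <= truth) & (truth <= est + z * se)
    return {
        "bias": est.mean() - truth,
        "mse": np.mean((est - truth) ** 2),
        "var": est.var(ddof=1),          # empirical variance of the estimates
        "mean_ic_var": var_hat.mean(),   # mean influence-curve-based variance
        "cp": covered.mean(),            # coverage probability
    }

# Synthetic illustration only: normal draws stand in for 1000 estimator fits.
rng = np.random.default_rng(0)
fake_estimates = rng.normal(loc=0.1, scale=0.05, size=1000)
fake_ic_vars = np.full(1000, 0.05 ** 2)
summary = summarize_simulation(fake_estimates, fake_ic_vars, truth=0.1)
print(summary)
```

Comparing `var` with `mean_ic_var` mirrors the comparison in the text between the variance of the 1000 estimates and the influence-curve-based variance estimate.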
Within each panel for each simulation, the first two rows present results (bias, variance and MSE) for the common methods in the literature, using log-binomial and Poisson regression to estimate the W-adjusted relative risk. The third and fourth rows present results for the initial estimate of $\beta_0$, $\beta_n^0$, when $Q_n^0(A, W)$ is incorrectly specified and when it is estimated by super learning. The last two rows then present results for the TMLE, $\beta_n$, when $Q_n^0(A, W)$ is incorrectly specified and when it is estimated by super learning. Figure 1 also compares the performance of the TMLE to other relative risk estimators.

Table 2: Performance of Poisson-derived TMLE, continuous A, by simulation
Columns: $\beta_n^0$, $\beta_n$, Bias, MSE, Var($\beta_n$), Var($IC_{eff}$)/n, CP
Rows (repeated for Simulations 1, 2 and 3): Qcgc, Qcgw, Qwgc, Qslgsl
[numeric entries missing]

Table 3: Relative performance of Poisson-derived TMLE, binary A
Columns: Bias, Var, MSE
Rows (repeated for Simulations 1, 2 and 3): Log binomial, incorrect; Poisson, incorrect; $\beta_n^0$, incorrect; $\beta_n^0$, SL; $\beta_n$, Qwgc; $\beta_n$, Qslgsl
[numeric entries missing]

Table 4: Relative performance of Poisson-derived TMLE, continuous A
Columns: Bias, Var, MSE
Rows (repeated for Simulations 1, 2 and 3): Log binomial, incorrect; Poisson, incorrect; $\beta_n^0$, incorrect; $\beta_n^0$, SL; $\beta_n$, Qwgc; $\beta_n$, Qslgsl
[numeric entries missing]

The following summarizes key observations from both the tables and the figure:

- In a randomized trial, as demonstrated in Simulation 1, all estimators perform comparably well, as expected, for both binary and continuous $A$.
- As the relationship between the true confounder and $A$ increases in Simulations 2 and 3, the estimators based on log-binomial regression and Poisson regression are increasingly biased, while the variance (of the 1000 sample estimates of $\beta_0$) remains at the same or a similar level (for binary $A$) or decreases (for continuous $A$).
- The TMLE of $\beta_0$ achieved the lowest MSE in both simulations with confounding (Simulations 2 and 3). We see a small trade-off in variance for removal of bias.
- Even with positivity violations, the TMLEs are robust.
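To make the simulation design of Sections 7 and 7.1 concrete, the binary-A data-generating mechanism can be sketched as below. This is an illustration in Python, not the authors' R code, and it rests on several assumptions: the second argument of each normal distribution is read as a standard deviation, the exposure mechanism is read as logistic in $W_3$, the two unsigned terms in equation (3) are taken as negative, and the intercept `intercept=-1.0` is a hypothetical placeholder because the values of $I$ are not available in the text.

```python
import numpy as np

def simulate_binary_a(n, g_slope=0.1, intercept=-1.0, beta0=0.1, seed=0):
    """Draw one data set (W, A, Y) with binary exposure A.

    g_slope=0.1 mimics Simulation 2, g_slope=1.0 mimics Simulation 3.
    intercept is a hypothetical stand-in for the unavailable value of I.
    """
    rng = np.random.default_rng(seed)
    w1 = rng.binomial(1, 0.3, n)
    w2 = rng.binomial(1, 0.65, n)
    w3 = rng.normal(0.0, 2.0, n)      # assumption: 2 is a standard deviation
    w4 = rng.normal(100.0, 10.0, n)
    w5 = rng.normal(1.0, 0.3, n)
    # Assumed logistic exposure mechanism depending only on W3.
    g = 1.0 / (1.0 + np.exp(-g_slope * w3))
    a = rng.binomial(1, g)
    # Log-linear outcome model; minus signs on the last two terms are assumed.
    log_q = (beta0 * a + intercept + 0.1 * w3 + 0.02 * w2 * w3
             - 0.01 * w1 * w4 - 0.02 * w5)
    q = np.minimum(np.exp(log_q), 1.0)  # guard: keep probabilities <= 1
    y = rng.binomial(1, q)
    return {"w3": w3, "g": g, "a": a, "y": y}

sim2 = simulate_binary_a(200_000, g_slope=0.1)
sim3 = simulate_binary_a(200_000, g_slope=1.0)
corr2 = np.corrcoef(sim2["a"], sim2["w3"])[0, 1]
print("Simulation 2: corr(A, W3) =", corr2,
      "g range =", sim2["g"].min(), sim2["g"].max())
print("Simulation 3: smallest g =", sim3["g"].min())
```

Under these assumptions the correlation between $A$ and $W_3$ in the Simulation 2 setting comes out close to the 0.10 reported in the text, while the Simulation 3 setting produces exposure probabilities very near zero, which is the practical positivity violation the section describes.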

[Figure 1. Panels: Binary A and Continuous A, each showing Simulations 1, 2 and 3. Vertical axis: Estimate. Horizontal axis: Estimation Method (Log binomial; Poisson; Initial Qm; Initial, SL; TMLE Qwgc; TMLE Qslgsl).]

Figure 1: Estimates and 95% confidence intervals by method

8 Application

Genotypic resistance testing has become a powerful tool for clinicians in determining the appropriate treatment regimen for people with HIV (Durant et al., 1999; Tural et al., 2002). However, interpretation of the resistance testing can be difficult. Over the years multiple interpretation algorithms have been developed

to provide more straightforward measures of resistance (Rhee et al., 2009). A study completed by Rhee et al. (2009) assessed the predictive ability of four genotypic resistance test interpretation algorithms (ANRS (Agence National de Recherche sur le SIDA (ANRS)), HIVdb (Liu and Shafer, 2006), Rega (Van Laethem et al., 2002), and ViroSeq (Eshleman et al., 2004)) to determine virologic response (VR), adjusting for additional baseline covariates. In this analysis the data are reanalyzed using targeted maximum likelihood methodology to determine the importance of each algorithm with respect to its predictive fit. In other words, we are determining the importance of each genotypic algorithm ($A$) on VR ($Y$), adjusting for all other covariates in the original analysis ($W$). Then, the genotypic algorithm with the highest importance measure is analyzed further by estimating the modification of its effect by each of the covariates included in the original model.

8.1 Background

For individuals infected with human immunodeficiency virus (HIV), effective antiretroviral (ARV) treatments carry the promise of a longer and more gratifying life. HIV infects the body's immune system and progressively destroys and impairs its ability to fight off infection. ARV treatments are designed to slow down HIV reproduction and stall its debilitating effects. A properly designed and administered treatment can prolong survival and increase overall quality of life (US Department of Health and Human Services Panel on Clinical Practices for Treatment of HIV Infection). There are many ARV drugs available. Some of the more common classes of ARV drugs are boosted protease inhibitors (PIs), nucleoside reverse transcriptase inhibitors (NRTIs) and non-nucleoside reverse transcriptase inhibitors (NNRTIs) (US Department of Health and Human Services Panel on Clinical Practices for Treatment of HIV Infection). Patients are generally placed on a combination of several drugs from multiple classes, called a treatment regimen.
However, HIV is a rapidly evolving virus and commonly develops drug-resistant mutations, rendering initially effective treatment regimens useless (US Department of Health and Human Services Panel on Clinical Practices for Treatment of HIV Infection). The rapid rate of mutation has made genotypic resistance testing essential to determining the appropriate treatment regimen for an individual patient (Durant et al., 1999; Tural et al., 2002). To facilitate the interpretation of these tests, interpretation algorithms have been developed. The algorithms are drug-specific and applied to a patient's baseline genotype (Rhee et al., 2009).

8.2 Data

Study subjects were selected according to the eligibility requirements outlined in Rhee et al. (2009) from 16 clinics of the Kaiser-Permanente Medical Care Program, Northern California. In the original study 734 valid treatment change episodes (TCEs) were recorded for 641 patients. In this study a valid TCE occurs when an individual has undergone a change in treatment regimen within 24 weeks of a genotypic resistance test and has received at least four weeks of a new salvage regimen. Though the original study uses all TCEs, to simplify our analysis, we randomly select only one TCE per patient.

Virologic response is measured according to plasma HIV-1 RNA levels. High plasma HIV-1 RNA levels indicate strong viral activity. The original study classifies VR into three classes: sustained, transient, and absent. Sustained VR is achieved when two subsequent tests show plasma RNA levels below the limit of quantification (BLQ). Transient refers to cases where only one subsequent test was at BLQ level, and absent to cases where no subsequent test is BLQ. For this analysis the sustained and transient classes are merged into a single class (VR=1) (see Rhee et al. (2009) for more details).

The original Rhee et al. (2009) study focused on four interpretation algorithms: ANRS (Agence National de Recherche sur le SIDA (ANRS)), HIVdb (Liu and Shafer, 2006), Rega (Van Laethem et al., 2002), and ViroSeq (Eshleman et al., 2004).
Each algorithm is applied to an individual patient's baseline genotype to determine the drug-specific genotypic susceptibility scores (GSSs) for each ARV. GSS measures range from 0 to 1, where a GSS of 1 indicates full susceptibility of HIV-1 to the particular ARV and a GSS of 0 indicates full resistance. The drug-specific GSSs are then combined into regimen-specific GSSs (rGSSs) through three alternative weighting schemes: boosted PI weighted, comprehensive weighted, and unweighted. The unweighted rGSS is the sum of all drug-specific GSSs, each weighted equally with 1.0. The boosted PI weighted rGSS increases the weight of the drug-specific GSSs for boosted PIs to 1.5. The comprehensive weighted rGSS increases the weight of the drug-specific GSSs for boosted PIs to 2.0 and decreases the weight of the drug-specific GSSs for NRTIs to 0.5. This results in a total of 12 rGSSs. We standardize each rGSS by subtracting its mean and dividing by its standard deviation to allow direct comparison of the importance measures.

Additional covariates are also included in the original prediction analysis, including individual demographics (age, sex, and race), features of ARV treatment prior to the TCE (e.g. duration of therapy, number of ARVs), and features of the salvage regimen (e.g. number of new ARVs, new ARV drug classes), as well as plasma HIV-1 RNA level and CD4 count at baseline.

8.3 Analysis

Targeted maximum likelihood estimation is first applied to estimate the variable importance of each rGSS with respect to its own prediction of VR. These estimates provide a measure of how much each variable changes the probability of VR on a relative scale. As stated previously in Section 5, TMLE for variable importance updates an initial regression estimate to target the parameter of interest. In this case, the initial regression estimate for a specific rGSS is the super learner estimate, predicting VR using the rGSS and additional covariates. The covariate set is consistent across all rGSS fits and is defined using univariate logistic regression. Covariates associated with VR with a p-value of 0.1 or less are included. This analysis focuses on the overall effect of the rGSS; therefore the model is the simple effect model $m_\beta(A, V) = \beta A$. For each importance measure, inference is obtained using the influence-curve-based estimate of the standard error.

8.4 Results

Results of the initial analysis show significant importance values for all rGSS algorithms, as expected. Note that though standardizing the scores allows us to directly compare the importance measures, the increment increase in RR is now relative to an increase of one in the z-score of the rGSS, or correspondingly an increase of one standard deviation of the original rGSS measure.
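The three weighting schemes of Section 8.2 and the standardization step can be sketched as follows. This is an illustrative Python sketch, not code from the original analysis: the class labels (`"boosted_PI"`, `"NRTI"`, `"other"`), the function names, and the use of a plain weighted sum are assumptions consistent with the description above, and the example regimen is made up.

```python
import numpy as np

# Weights per drug class for each scheme, as described in Section 8.2.
# Class labels are hypothetical; any class not listed gets weight 1.0.
WEIGHTS = {
    "unweighted":    {"boosted_PI": 1.0, "NRTI": 1.0},
    "boosted_pi":    {"boosted_PI": 1.5, "NRTI": 1.0},
    "comprehensive": {"boosted_PI": 2.0, "NRTI": 0.5},
}

def regimen_gss(drug_gss, drug_class, scheme):
    """Weighted sum of drug-specific GSSs (each in [0, 1]) for one regimen."""
    weights = WEIGHTS[scheme]
    return sum(weights.get(c, 1.0) * g for g, c in zip(drug_gss, drug_class))

def standardize(scores):
    """Z-score a vector of rGSS values: subtract the mean, divide by the SD."""
    x = np.asarray(scores, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)

# Made-up regimen: one boosted PI (fully susceptible) and two NRTIs.
gss = [1.0, 0.5, 1.0]
classes = ["boosted_PI", "NRTI", "NRTI"]
rgss = {s: regimen_gss(gss, classes, s) for s in WEIGHTS}
print(rgss)

z = standardize([2.75, 1.0, 3.5, 2.0])
print(z)
```

After standardization, a coefficient is interpreted per one-standard-deviation increase in the original rGSS, which is what makes the importance measures of the 12 scores directly comparable.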
From Figures 2 and 3, we see that in general weighting did have an effect on overall importance. Weighting scheme 1 seems to increase the coefficient for each algorithm over the unweighted scheme, and weighting scheme 2 increases it even more. Under any weighting scheme, ANRS seemed to have the highest coefficient and corresponding increment RR (i.e. the relative change in the probability of sustained virologic response for a one unit increase in the z-score of the rGSS). Of the twelve, the highest importance is attributed to ANRS with comprehensive weighting, with a coefficient $\beta =$ [value missing] corresponding to an increment RR of 1.23 and an associated p-value of 3.80e-06 before adjusting for multiple testing.

8.5 Secondary analysis

As a secondary analysis the rGSS with the highest importance, comprehensively weighted ANRS, is chosen and a V-modified variable importance analysis is performed, in which a covariate $V$ is included as an effect modifier of $A$. The model for this analysis is as follows:

$m_\beta(A, V) = A(\beta_A + \beta_V V)$,

where the parameter of interest is now measured as a function of two parameters with respect to a particular covariate, $V$. In this analysis, all other covariates are considered individually as effect modifiers for the selected rGSS, and the V-modified importance measure is estimated. The results are shown below.

The individual coefficient values estimated using TMLE, with their respective confidence intervals, are shown in Figure 4. Given only the coefficients it is difficult to interpret the results. To clarify, the increment RR change is calculated at varying levels, $V_q$, of each effect modifier $V$. This is defined as follows for value $V_q$ of any $V$:

$RR = e^{\beta_A + \beta_V V_q}$.

Calculating the corresponding standard error is achieved by first calculating the standard error of the linear combination $c^\top \beta = \beta_A + \beta_V V_q$ as $\sqrt{c^\top \Sigma_V c}$, where $\beta = (\beta_A, \beta_V)$, $c = (1, V_q)$, and $\Sigma_V$ is the influence-curve-based covariance estimate for $(\beta_A, \beta_V)$. Then, the delta method is used to calculate the corresponding standard error estimate for the exponential of this linear combination. The quantiles of $V$ (min, 25%, 50%, 75%, max) are chosen for $V_q$. Note that in some cases the covariate is binary, so there are only two possible values and therefore only two points. The results are shown in Figures 5 and 6.

From the results (Figures 5 and 6), it can be seen that the increase in the risk of sustained virologic response with respect to a change in the rGSS score of ANRS under comprehensive weighting is modified by many of the covariates. This is not surprising given the complexity of the body's response to HIV. Virologic response is without a doubt a combination of genetics, current viral load, and current and past treatment regimens. For instance, it is logical that increased baseline viral load would result in increased risk of virologic response, but through this type of analysis, it can also be seen that increased baseline viral load seems to modify the effect of the genetic score on the relative risk of VR (Figures 5 and 6). This type of analysis helps elucidate and interpret the complex set of interactions that results in the body's virologic response. Having a method which targets the effect and provides consistent and locally efficient estimates of the effect with formal inference is key to furthering the understanding and treatment of diseases such as HIV.
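The increment RR at a given level $V_q$ and its delta-method standard error, as described in Section 8.5, can be sketched as follows. The Python function `increment_rr` and all numeric inputs are hypothetical; the standard error of the linear combination is taken as the square root of $c^\top \Sigma_V c$, and the delta method gives $\mathrm{SE}(e^{c^\top\beta}) \approx e^{c^\top\beta}\,\mathrm{SE}(c^\top\beta)$. The symmetric interval around the RR is one plausible reading of how the plotted intervals were formed, not a confirmed detail.

```python
import numpy as np

def increment_rr(beta, cov, v_q, z=1.96):
    """RR = exp(beta_A + beta_V * v_q) with a delta-method standard error.

    beta : (beta_A, beta_V)
    cov  : 2x2 covariance estimate for (beta_A, beta_V)
    v_q  : level of the effect modifier V
    """
    beta = np.asarray(beta, dtype=float)
    cov = np.asarray(cov, dtype=float)
    c = np.array([1.0, float(v_q)])
    lin = c @ beta                      # beta_A + beta_V * v_q
    se_lin = np.sqrt(c @ cov @ c)       # SE of the linear combination
    rr = np.exp(lin)
    se_rr = rr * se_lin                 # delta method for exp(.)
    ci = (rr - z * se_rr, rr + z * se_rr)
    return rr, se_rr, ci

# Hypothetical coefficients, covariance and covariate values.
beta_hat = [0.2, 0.05]
cov_hat = [[0.01, 0.0], [0.0, 0.0004]]
v = np.array([0.0, 1.0, 2.0, 3.0, 8.0])
levels = np.quantile(v, [0.0, 0.25, 0.5, 0.75, 1.0])  # min, quartiles, max
results = [increment_rr(beta_hat, cov_hat, q) for q in levels]
for q, (rr, se, ci) in zip(levels, results):
    print(q, rr, se, ci)
```

An alternative that keeps the interval strictly positive would exponentiate the endpoints of the interval for the linear combination instead.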

[Figure 2. Panels: ANRS, HIVdb, Rega, ViroSeq. Vertical axis: Coefficients. Horizontal axis: Scores (W0, W1, W2).]

Figure 2: TMLEs of the coefficients for 4 algorithms, ANRS, HIVdb, Rega, and ViroSeq, under three different weighting schemes: unweighted (W0), boosted PI weighted (W1), and comprehensive weighted (W2). The brackets represent the 95% confidence intervals according to the influence-curve-based estimate of the standard error.

[Figure 3. Panels: ANRS, HIVdb, Rega, ViroSeq. Vertical axis: Increment RR. Horizontal axis: Scores (W0, W1, W2).]

Figure 3: TMLEs of the increment RR for 4 algorithms, ANRS, HIVdb, Rega, and ViroSeq, under three different weighting schemes: unweighted (W0), boosted PI weighted (W1), and comprehensive weighted (W2). The brackets represent the 95% confidence intervals according to the influence-curve-based estimate of the standard error, using the delta method.

[Figure 4. Panels: A and A:V. Vertical axis: Variable (V). Horizontal axis: Coefficients.]

Figure 4: TMLEs of the coefficients ($\beta$) for the ANRS algorithm under comprehensive weighting, adjusted by the other covariates. The left plot is the coefficient for the main effect, A, and the right plot is the coefficient of the effect modification or interaction term, A:V. The brackets represent the 95% confidence intervals according to the influence-curve-based estimate of the standard error. The covariates adjusted for are listed from top to bottom: history of virologic suppression prior to baseline (prev_vr_suppress), number of PIs in new regimen (NumPIs), number of regimens received prior to baseline (NumHXRegimens), number of PIs received prior to baseline (NumHXPIs), number of NRTIs received prior to baseline (NumHXNRTIs), number of non-HAART regimens received prior to baseline (NumHXNonHAART), number of NNRTIs received prior to baseline (NumHXNNRTIs), number of HAART regimens received prior to baseline (NumHXHAART), number of new ARVs in new regimen (num_newdr_newreg), number of ARVs received prior to baseline (num_dr_pastreg), number of ARVs in new regimen (num_dr_newreg), number of new ARV classes in new regimen (newdr_class), duration of new regimen in weeks (duration_newreg), plasma HIV-1 RNA level at baseline in log-copies/ml (bl_vload), CD4 count at baseline in cells/ml (bl_cd4), and age at baseline in years (age_baseline).

[Figure 5. One panel per covariate V. Vertical axis: Increment RR. Horizontal axis: Quantiles (0, 0.25, 0.5, 0.75, 1) of V.]

Figure 5: TMLEs of the increment RR for the ANRS algorithm under comprehensive weighting, modified by covariate $V$ at quantile levels $V_q$ = {0%, 25%, 50%, 75%, 100%} for each covariate $V$. The brackets represent the 95% confidence intervals according to the influence-curve-based estimate of the standard error. The covariates $V$ are as follows (left to right, top to bottom): duration of new regimen in weeks (duration_newreg), number of HAART regimens received prior to baseline (NumHXHAART), number of NNRTIs received prior to baseline (NumHXNNRTIs), number of non-HAART regimens received prior to baseline (NumHXNonHAART), number of NRTIs received prior to baseline (NumHXNRTIs), number of PIs received prior to baseline (NumHXPIs), number of regimens received prior to baseline (NumHXRegimens), and number of PIs in new regimen (NumPIs).

[Figure 6. One panel per covariate V. Vertical axis: Increment RR. Horizontal axis: Quantiles (0, 0.25, 0.5, 0.75, 1) of V.]

Figure 6: TMLEs of the increment RR for the ANRS algorithm under comprehensive weighting, modified by covariate $V$ at quantile levels $V_q$ = {0%, 25%, 50%, 75%, 100%} for each covariate $V$. The brackets represent the 95% confidence intervals according to the influence-curve-based estimate of the standard error. The covariates $V$ are as follows (left to right, top to bottom): age at baseline in years (age_baseline), CD4 count at baseline in cells/ml (bl_cd4), plasma HIV-1 RNA level at baseline in log-copies/ml (bl_vload), number of new ARV classes in new regimen (newdr_class), number of ARVs in new regimen (num_dr_newreg), number of ARVs received prior to baseline (num_dr_pastreg), number of new ARVs in new regimen (num_newdr_newreg), and history of virologic suppression prior to baseline (prev_vr_suppress).

9 Discussion

In this paper, we introduce three new estimators for the parameters of the conditional relative risk, developed using targeted maximum likelihood methodology under a flexible multiplicative semi-parametric model. The most prevalent estimators in the literature, especially in the field of epidemiology, rely on parametric models. However, in a world where models are often incorrect, this reliance on fully parametric models can result in biased estimates and inaccurate conclusions. TMLEs are developed to avoid reliance on these often incorrect models and to reduce bias by targeting estimation towards the parameter of interest.

The TMLEs presented here are all double robust estimators for the parameter of interest, making them more resilient to the effects of model misspecification. The double robust property states that the resulting estimate is unbiased if either the estimator for $Q_0(A, W)$ or the estimator for $g_0(A \mid W)$ is consistent. In the world of clinical trials and controlled experiments, where $g_0(A \mid W)$ is known, this property is especially beneficial. However, coupled with data-adaptive methods, such as super learner (van der Laan et al., 2007), these TMLEs are also beneficial in the world of observational studies.

All three estimators are developed under a multiplicative semi-parametric model, which conveniently accommodates both binary and continuous covariates and effect modifiers, while adjusting for additional covariates flexibly using data-adaptive methods. This model, though more flexible than the fully parametric models commonly used in these studies, still requires that the analyst specify the components of the model relating to the variable of interest. In this paper, we discuss and implement two possible model forms, the main effect model and the single effect modification model. It is important to note that though possible model forms are not restricted to these two models, they are restricted to models linear in the variable of interest ($A$).
We also allow models with additional effect modifiers, but as the dimension of the fluctuation parameter increases, a sequential update may be required (Tuglus and van der Laan, 2010). In addition, the corresponding targeted likelihood also provides a framework for model selection among effect modifiers (Tuglus and van der Laan, 2011).

The three TMLEs introduced in this paper are defined and developed according to three different initial densities: log-binomial, Poisson, and overdispersed exponential. As mentioned previously, for parameters of the conditional relative risk, the log-binomial density is the natural choice. However, analogous to its parametric counterpart, the log-binomial-based TMLE is plagued by the instability inherent in log-binomial regression. As in previous studies (e.g. Zou (2004) and McNutt et al. (2003)), we turn to Poisson regression as a more reliable alternative. Although the Poisson-based TMLE does not correctly assume a binary outcome, it still possesses the double robust property, and its corresponding influence curve provides correct inference for the parameter of interest. We consider the Poisson-based TMLE the most practical TMLE of the three presented and therefore focus our efforts on its implementation and recommend it for general application.

In an effort to avoid reliance on distributional assumptions of $P_0(Y \mid A, W)$, we developed the TMLE based on the more general density from the overdispersed exponential family. This TMLE is developed directly from the overall efficient score for the semi-parametric multiplicative regression model and makes no assumptions on the distributional form of the residuals, $Y - E[Y \mid A, W]$. We show in Appendix A that the efficient scores of the log-binomial and Poisson-based TMLEs can be derived directly from this more general overall efficient score. Although this TMLE is a more general TMLE for the parameter of interest, it is not easily implemented using standard software packages, which makes it less practical for general implementation and application.
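For readers unfamiliar with the parametric Poisson alternative to log-binomial regression mentioned above (Zou, 2004), the sketch below fits a log-linear Poisson working model to a binary outcome and uses a sandwich (robust) variance. It illustrates the parametric comparator only and is not the Poisson-based TMLE of this paper; the function name `modified_poisson`, the Newton-Raphson fitting loop, and the toy data-generating values are assumptions made for the example.

```python
import numpy as np

def modified_poisson(X, y, n_iter=50, tol=1e-10):
    """Poisson log-link regression for binary y with a sandwich variance.

    X : design matrix whose first column is assumed to be an intercept
    y : binary outcome (0/1)
    Returns (coefficients, robust covariance matrix).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())            # start at the marginal log-risk
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (y - mu)            # Poisson score equations
        info = X.T @ (X * mu[:, None])    # Fisher information
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = np.exp(X @ beta)
    bread = np.linalg.inv(X.T @ (X * mu[:, None]))
    meat = X.T @ (X * ((y - mu) ** 2)[:, None])
    return beta, bread @ meat @ bread     # sandwich covariance

# Toy data: true log relative risk for A is 0.1, common outcome.
rng = np.random.default_rng(1)
n = 20_000
w = rng.normal(size=n)
a = rng.binomial(1, 0.5, size=n)
p = np.minimum(np.exp(-1.0 + 0.1 * a + 0.2 * w), 1.0)
y = rng.binomial(1, p)
X = np.column_stack([np.ones(n), a, w])
beta_hat, cov_hat = modified_poisson(X, y)
print("log RR for A:", beta_hat[1], "robust SE:", np.sqrt(cov_hat[1, 1]))
```

The sandwich variance matters here because a binary outcome has variance smaller than its mean, so the model-based Poisson variance would overstate the uncertainty.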
In this paper, we outline implementation instructions for the Poisson-based TMLE and present a simulation study in which we verify its double robust properties. We also demonstrate its performance under ETA violations and increases in confounding among the covariates. Similar to previously presented TMLEs developed under a semi-parametric model (Tuglus and van der Laan, 2008, 2010), this TMLE shows strong resilience to ETA violations. This is attributed to its lack of dependence on inverse probability weighting, which is present in some of the previously developed TMLEs (van der Laan et al., 2009). As with its predecessors, this resilience does not nullify the positivity (or ETA) assumption from Section 4. The resilience comes from a natural extrapolation of the $P_0(A \mid W)$ mean process. When $A$ is continuous, the $P_0(A \mid W)$ mean process is estimated given the observed values of $A$ and $W$, and naturally extrapolates over

26 areas of lower support. These areas of lower support rely on the accuracy of the estimator for P 0 (A W ). Therefore in practice, it is important to acknowlege this reliance when strong TA violations are present. In application, we emonstrate the usefulness of the TML for estimation of the main effect of a particular variable, in this case a genotypic score, controlling for confouning This type of analysis is particularly useful in biomarker iscovery an variable importance analyses when we want to accurately test the importance of many variables (see Tuglus an van er Laan (2008)). We also took it one step further an showe how the moel can be augmente to test moifications of an association by a single or set of covariates, where the effect of a genotypic score on the relative risk of virologic response was moifie by a variable such as the baseline viral loa. Accurate an interpretable effect moification analysis such as this can be useful in clinical trials to test how the effect of a treatment (A) is moifie by a particular gene expression, for instance. In summary, conitional relative risk parameters are useful an interpretable in many epiemiology an meical stuies. In this paper, we have shown that with a semi-parametric multiplicative moel, one can obtain robust estimates of relative risk parameters, even uner strong confouning an TA violations. We introuce three possible TML s of parameters in the semi-parametric multiplicative moel, an we implemente the more practical of these TML s - base on a Poisson ensity. We emonstrate this TML s DR properties, illustrate that it achieves proper inference an showe its improve performance over existing methos. We also illustrate the TML s value in a real ata analysis. This TML can be applie using stanar statistical software an will be useful in a wie variety of applications. References Aluísio J D Barros an Vânia N Hirakata. 
Alternatives for logistic regression in cross-sectional stuies: an empirical comparison of moels that irectly estimate the prevalence ratio. BMC Me Res Methool, 3: 21, Oct oi: / A Buja, T Hastie, an R Tibshirani. Linear smoothers an aitive moels (with iscussion). Annals of Statistics, 17( ), R Carter, SR Lipsitz, an BC Tilley. Biostatistics, 6(1):39 44, Quasi-likelihoo estimation for relative risk regression moels. R Chen, W Harle, O Linton, an Severance-Lossin. stimation an variable selection in aitive nonparametric regression moels. In W Harle an M Schimek, eitors, Proceeings of the COMPSTAT Satellite Meeting Semmering 1994, Agence National e Recerche sur le SIDA (ANRS). Anrs genotypic resistence guielines (verison 13). URL J Durant, P Clevenbergh, P Halfon, an et al. Drug-resistance genotyping in hiv-1 therapy: the viraapt ranmise controlle trial. Lancet, 353: , B. fron, Hastie H., Johnstone, an Tibshirani. Least angle regression (with iscussion). Annals of Statistics, SH shleman, J Jr. Hackett, an P et al. Swanson. Performance of the celera iagnostics viroseq hiv-1 genotyping system for sequence-base analysis of iverse human immunoeficiency virus type 1 strains. J Clin Microbiology, 42(2711-7), T. J. Hastie an R.J. Tibshirani. Generalize Aitive Moels. Chapman an Hall, Lonon, J. Lee. Os ratio or relative risk for cross-sectional ata? Int J piemiol, 23:201 3, J Lee an KS Chia. stimation of prevalence rate ratios for cross-sectional ata: an example in occupational epiemiology. Br J In Me, 50:861 2,

27 TF Liu an RW. Shafer. Web resources for hiv type 1 genotypic-resistence test interpretation. Clin Infect Dis, 42: , AR Localio, DJ Margolis, an JA Berlin. Relative risks an confience intervals were easily compute inirectly from multivariate logisitic regression. Journal of Clinical piemiology, 60: , Thomas Lumley, Richar Kronmal, an Shuangge Ma. Relative risk regresssion in meical research: Moels, contrasts, estimators, an algorithms. Technical Report 293, University of Washington, P. McCullagh an J. A. Neler. Generalize Linear Moels. Number 37 in Monographs on Statistics an Applie Probability. Chapman & Hall, 2n eition, Louise-Anne McNutt, Chuntao hu, Xiaonana Xue, an Paul Hafner. stimating the relative risk in cohort stuies an clinical trials of common outcomes. Am J piemiol, 157(10): , R. Neugebauer an M.J. van er Laan. Why prefer ouble robust estimates. Journal of Statistical Planning an Inference, 129(1-2):405 26, US Department of Health an Human Services Panel on Clinical Practices for Treatment of HIV Infection A. Guielines for the use of antiretroviral agents in hiv-1-infecte aults an aolescents (the living ocument, june 26, 2010). URL SY Rhee, WJ Fessel, TF Liu, NM Marlowe, CM Rowlan, RA Roe, AM Vanamme, K Van Laethem, F Brun-Vezinet, V Calvez, J Taylor, L Hurley, M Horberg, an RW Shafer. Preictive value of hiv-1 genotypic resistance test interpretation algorithms. J Infect. Dis., 200(3):453 63, J.M. Robins. Robust estimation in sequentially ignorable missing ata an causal inference moels. In Proceeings of the American Statistical Association: Section on Bayesian Statistical Science, pages 6 10, J.M. Robins. A new approach to causal inference in mortality stuies with sustaine exposure perios - application to control of the healthy worker survivor effect. Mathematical Moelling, 7: , J.M. Robins. 
Aenum to: A new approach to causal inference in mortality stuies with a sustaine exposure perio application to control of the healthy worker survivor effect [Math. Moelling 7 (1986), no. 9-12, ; MR 87m:92078]. Comput. Math. Appl., 14(9-12): , ISSN stimation of prevalence ratios when PROC GNMOD oes not converge, number , Cary, NC, SAS Institute, Proceeings of the 28th Annual SAS Users Group International Conference, March 30-April 2, T Severini an W Wong. Profile likelihoo an conitionally parametric moels. Annals of statistics, 20: , TA Severini an JG Staniswalis. Quasi-likelihoo estimation in semiparametric moels. Journal of the American Statisical Association, 89: , S. Sinisi an M.J. van er Laan. The eletion/substitution/aition algorithm in loss function base estimation: Applications in genomics. Journal of Statistical Methos in Molecular Biology, 3(1), T Skov, J.A. Deens, an M.R. Petersen. Prevalence proportion ratios: estimation an hypothesis testing. Int J piemiol, 27:91 95, P Speckman. Regression analysis for partially linear moels. Journal of the Royal Statistical Society, Series B, 50: , T Stijnen an HC Van Houwelingen. Relative risk, risk ifference an rate ifference moels for sparse stratifie ata: A pseuo likelihoo approach. Statistics in Meicine, 12(24): , Hoste by The Berkeley lectronic Press

R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
C. Tuglus and M.J. van der Laan. Targeted methods for biomarker discovery, the search for a standard. UC Berkeley Working Paper Series.
C. Tuglus and M.J. van der Laan. Repeated measures semiparametric regression using targeted maximum likelihood methodology with application to transcription factor activity discovery. UC Berkeley Working Paper Series.
C. Tuglus and M.J. van der Laan. Robust Semiparametric Regression Estimation Using Targeted Maximum Likelihood with Application to Biomarker Discovery and Epidemiology. PhD thesis, UC Berkeley.
C Tural, L Ruiz, C Holtzer, et al. Clinical utility of HIV-1 genotyping and expert advice: the Havana trial. AIDS, 16:209-18.
M.J. van der Laan. Statistical inference for variable importance. International Journal of Biostatistics, 2(1).
M.J. van der Laan and D. Rubin. Targeted maximum likelihood learning. The International Journal of Biostatistics, 2(1).
M.J. van der Laan, E. Polley, and A. Hubbard. Super learner. Statistical Applications in Genetics and Molecular Biology, 6(25).
M.J. van der Laan, S. Rose, and S. Gruber. Readings on targeted maximum likelihood estimation. Technical report, working paper series, September.
K Van Laethem, A De Luca, A Antinori, A Cingolani, CF Perna, and AM Vandamme. A genotypic drug resistance interpretation algorithm that significantly predicts therapy response in HIV-1 infected patients. Antivir Ther, 7:123-9.
S Wacholder. Binomial regression in GLIM: estimating risk ratios and risk differences. Am J Epidemiol, 123.
C Zocchett, Dario Consonni, and Pier Alberto Bertazzi. Re: Estimation of prevalence rate ratios from cross-sectional data (letter). Int J Epidemiol, 24.
G. Zou. A modified Poisson regression approach to prospective studies with binary data. Am J Epidemiol, 159:702-6.

A Efficient influence curve for effect parameter of multiplicative semi-parametric model

We present the efficient influence curve for the multiplicative semi-parametric model parameter, and show that it can be represented as a score of the fluctuation in the overdispersed exponential family of canonical form. We first present the general form of the efficient influence curve, then present the overdispersed exponential family of canonical form, and finally construct a submodel in this exponential family with score equal to the efficient influence curve.

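A.1 Efficient influence curve for effect parameter of multiplicative semi-parametric model

Assume a multiplicative semi-parametric model of the following form

$$\bar{Q}(A, W)\,\tilde{m}_\beta(A, V) = \theta(W)$$

under the following constraints: $\tilde{m}(A = 0, V \mid \beta) = 1$ and $0 \le \tilde{m}_\beta(a, v)$ for all $\{a, v\} \in \{A, V\}$, where $\tilde{m}_\beta(A, V) = e^{-m_\beta(A, V)}$. The effect parameter of interest is defined as $\Psi(P) = \beta$, where $\Psi(P_0) = \beta_0$ is the true parameter defined under the true data generating distribution. As presented in van der Laan (2006), the orthogonal complement of the nuisance tangent space for estimation of $\beta$ is found to be of the form

$$T_{nuis}^{\perp}(P_0) = \left\{ h_{\bar{Q}_0,g_0}(A, W) - E_0[h_{\bar{Q}_0,g_0}(A, W) \mid W] \right\}\left( Y \tilde{m}_{\beta_0}(A, V) - \theta_0(W) \right)$$

with class of estimating functions

$$(O, \beta_0, \bar{Q}_0, g_0) \mapsto D_{h,\bar{Q}_0,g_0}(O \mid \beta_0) \equiv \left\{ h_{\bar{Q}_0,g_0}(A, W) - E_0[h_{\bar{Q}_0,g_0}(A, W) \mid W] \right\}\left( Y \tilde{m}_{\beta_0}(A, V) - \theta_0(W) \right)$$

for $\beta_0$ indexed by $h$, where $\theta_0 = E_0(Y \mid A = 0, W)$ and $g_0 = P_0(A \mid W)$. The corresponding influence curve is defined as

$$IC_{h,\bar{Q}_0,g_0}(O) = -\left\{ \frac{d}{d\beta_0} E_0\left[ D_{h,\bar{Q}_0,g_0}(O \mid \beta_0) \right] \right\}^{-1} D_{h,\bar{Q}_0,g_0}(O \mid \beta_0).$$

The optimal choice of $h$, $h_{opt}$, is such that for any vector $c$,

$$c^T \mathrm{Cov}(IC_{h_{opt},\bar{Q}_0,g_0})\, c \le c^T \mathrm{Cov}(IC_{h,\bar{Q}_0,g_0})\, c$$

for all possible $h_{\bar{Q}_0,g_0}(A, W)$, thus providing the most efficient estimating function. For effect parameter $\beta$ under the presented multiplicative semi-parametric model, $h_{opt,\bar{Q}_0,g_0}$ is defined as follows. Define the following terms:

$$H_0(O \mid \beta_0) \equiv Y \tilde{m}_{\beta_0}(A, V)$$

$$\epsilon(\beta_0) \equiv H_0(O \mid \beta_0) - E_0[H_0(O \mid \beta_0) \mid W]$$

$$\epsilon'(\beta_0 \mid A, W) \equiv \frac{d}{d\beta} E_0[\epsilon(\beta) \mid A, W] \Big|_{\beta = \beta_0}$$

$$\sigma^2(A, W) \equiv E_0[\epsilon^2(\beta_0) \mid A, W]$$

Then

$$h_{opt,\bar{Q}_0,g_0}(A, W) = \frac{1}{\sigma^2(A, W)} \left\{ \epsilon'(\beta_0 \mid A, W) - \frac{\sum_a \frac{\epsilon'(\beta_0 \mid a, W)}{\sigma^2(a, W)} P_0(a \mid W)}{\sum_a \frac{1}{\sigma^2(a, W)} P_0(a \mid W)} \right\}$$

or

$$h_{opt,\bar{Q}_0,g_0}(A, W) = \frac{1}{\sigma^2(A, W)} \left\{ \epsilon'(\beta_0 \mid A, W) - \frac{E_0\left[ \frac{\epsilon'(\beta_0 \mid A, W)}{\sigma^2(A, W)} \,\Big|\, W \right]}{E_0\left[ \frac{1}{\sigma^2(A, W)} \,\Big|\, W \right]} \right\}.$$

This can be rewritten as

$$h_{opt,\bar{Q}_0,g_0}(A, W) = \frac{1}{\sigma^2(A, W)} \left\{ \bar{Q}_0(A, W) \frac{d}{d\beta} \tilde{m}_{\beta}(A, V) \Big|_{\beta = \beta_0} - \frac{E_0\left[ \frac{\bar{Q}_0(A, W)\, \frac{d}{d\beta_0} \tilde{m}_{\beta_0}(A, V)}{\sigma^2(A, W)} \,\Big|\, W \right]}{E_0\left[ \frac{1}{\sigma^2(A, W)} \,\Big|\, W \right]} \right\}.$$

This results in the following efficient score/efficient estimating function:

$$D_{h_{opt},\bar{Q}_0,g_0}(A, W \mid \beta_0) = h_{opt,\bar{Q}_0,g_0}(A, W)\left( Y \tilde{m}_{\beta_0}(A, V) - \theta_0(W) \right).$$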
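The following short check is not part of the original derivation; it is a sketch of why the efficient score can be written as $h_{opt}$ times the residual with no additional centering term. It assumes $h_{opt}$ has the form $\frac{1}{\sigma^2}\{\epsilon' - E_0[\epsilon'/\sigma^2 \mid W] / E_0[1/\sigma^2 \mid W]\}$, with $\epsilon'$ and $\sigma^2$ as defined in this section. Taking the conditional expectation given $W$:

$$E_0\left[ h_{opt}(A, W) \mid W \right] = E_0\left[ \frac{\epsilon'(\beta_0 \mid A, W)}{\sigma^2(A, W)} \,\Big|\, W \right] - E_0\left[ \frac{1}{\sigma^2(A, W)} \,\Big|\, W \right] \frac{E_0\left[ \frac{\epsilon'(\beta_0 \mid A, W)}{\sigma^2(A, W)} \,\Big|\, W \right]}{E_0\left[ \frac{1}{\sigma^2(A, W)} \,\Big|\, W \right]} = 0.$$

Under that assumption, $h_{opt} - E_0[h_{opt} \mid W] = h_{opt}$, so $h_{opt}$ multiplied by the residual $Y \tilde{m}_{\beta_0}(A, V) - \theta_0(W)$ is itself a member of the class of estimating functions $D_h$.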

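Note that $\sigma^2(A, W) = \tilde{m}_{\beta_0}(A, V)^2\, \sigma_Y^2(A, W)$, where $\sigma_Y^2(A, W) = \mathrm{Var}(Y \mid A, W)$, so that the efficient estimating function can be simplified further as

$$D_{h_{opt},\bar{Q}_0,g_0}(A, W \mid \beta_0) = h^*_{opt,\bar{Q}_0,g_0}(A, W)\left( Y - \bar{Q}_0(A, W) \right)$$

$$h^*_{opt,\bar{Q}_0,g_0}(A, W) = \frac{\bar{Q}_0(A, W)}{\sigma_Y^2(A, W)} \left\{ \frac{d}{d\beta_0} m_{\beta_0}(A, V) - \frac{E_0\left[ \frac{d}{d\beta_0} m_{\beta_0}(A, V)\, \frac{1}{\sigma^2(A, W)} \,\Big|\, W \right]}{E_0\left[ \frac{1}{\sigma^2(A, W)} \,\Big|\, W \right]} \right\},$$

where the overall factor of $-1$ produced by $\frac{d}{d\beta} \tilde{m}_\beta = -\tilde{m}_\beta \frac{d}{d\beta} m_\beta$ is dropped, since it does not change the solution of the estimating equation. This can be rewritten as

$$D_{h_{opt},\bar{Q}_0,g_0}(A, W \mid \beta_0) = h^*_{opt}(A, W)\left( Y - \bar{Q}_0(A, W) \right) \tag{4}$$

$$h^*_{opt,\bar{Q}_0,g_0}(A, W) = \frac{\bar{Q}_0(A, W)}{\sigma_Y^2(A, W)} \left\{ \frac{d}{d\beta_0} m_{\beta_0}(A, V) - \frac{E_0\left[ \frac{d}{d\beta_0} m_{\beta_0}(A, V)\, \frac{\bar{Q}_0(A, W)^2}{\sigma_Y^2(A, W)} \,\Big|\, W \right]}{E_0\left[ \frac{\bar{Q}_0(A, W)^2}{\sigma_Y^2(A, W)} \,\Big|\, W \right]} \right\}. \tag{5}$$

Up to this point, no distributional assumptions have been made about the conditional distribution of $Y$. The above efficient estimating function can be simplified further by assuming a form for $\sigma_Y^2(A, W) = \mathrm{Var}(Y \mid A, W)$. If one assumes a Bernoulli or binomial outcome, $\sigma_Y^2(A, W) = \bar{Q}_0(A, W)(1 - \bar{Q}_0(A, W))$, and the above estimating function reduces to the estimating function used in the main text for the targeted maximum likelihood update of an initial log-binomial density. If one assumes a Poisson count outcome, then $\sigma_Y^2(A, W) = \bar{Q}_0(A, W)$, and the above estimating function reduces to the estimating function used in the main text for the targeted maximum likelihood update of an initial Poisson density.

A.2 Overdispersed exponential family

The density of the overdispersed exponential family in canonical form is represented as follows

$$P^0_{\eta,\tau}(Y \mid A, W) = h_c(Y, \tau) \exp\left\{ \frac{\eta Y - B(\eta)}{d(\tau)} \right\}.$$

We define $\eta = \bar{Q}_0(A, W) = \theta(W) \exp(m_{\beta_0}(A, V))$ and define a class of submodels, fluctuated by parameter $\epsilon$, as

$$P^0_{\eta,\tau}(\epsilon)(Y \mid A, W) = h_c(Y, \tau) \exp\left\{ \frac{\eta(\epsilon) Y - B(\eta(\epsilon))}{d(\tau)} \right\},$$

where $\eta(\epsilon) = \bar{Q}(\epsilon)(A, W) = \theta(\epsilon)(W) \exp(m_{\beta(\epsilon)}(A, V))$, $\theta(\epsilon)(W) = \theta(W) \exp(\epsilon\, r_{\bar{Q}_0,g_0}(W))$ and $\beta(\epsilon) = \beta + \epsilon$. Therefore the score of the above likelihood with respect to $\epsilon$ is as follows

$$\frac{d}{d\epsilon} \log P^0_{\eta,\tau}(\epsilon)(Y \mid A, W) = \frac{1}{d(\tau)} \left\{ \frac{d}{d\epsilon} \eta(\epsilon)\, Y - B'(\eta(\epsilon)) \frac{d}{d\epsilon} \eta(\epsilon) \right\},$$

which can be rearranged as

$$\frac{d}{d\epsilon} \log P^0_{\eta,\tau}(\epsilon)(Y \mid A, W) = \frac{1}{d(\tau)} \frac{d}{d\epsilon} \eta(\epsilon) \left( Y - B'(\eta(\epsilon)) \right).$$

By definition of the canonical form, $B'(\eta(\epsilon)) = \bar{Q}(\epsilon)(A, W)$, so that the score can be written as

$$\frac{d}{d\epsilon} \log P^0_{\tau}(\epsilon)(Y \mid A, W) = \frac{1}{d(\tau)} \frac{d}{d\epsilon} \bar{Q}(\epsilon)(A, W) \left( Y - \bar{Q}(\epsilon)(A, W) \right).$$

Given the form for $\bar{Q}(\epsilon)(A, W)$, it follows that

$$\frac{d}{d\epsilon} \bar{Q}(\epsilon)(A, W) = \bar{Q}(\epsilon)(A, W) \left\{ \frac{d}{d\beta} m_\beta(A, V) + r_{\bar{Q}_0,g_0}(W) \right\},$$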

so that the score at $\epsilon = 0$ is given by

$$\frac{d}{d\epsilon} \log P^0_{\tau}(\epsilon)(Y \mid A, W) \Big|_{\epsilon = 0} = \frac{\bar{Q}_0(A, W)}{d(\tau)} \left\{ \frac{d}{d\beta} m_\beta(A, V) + r_{\bar{Q}_0,g_0}(W) \right\} \left( Y - \bar{Q}_0(A, W) \right).$$

This score is equivalent to the form shown in Equation (6) when $d(\tau) = \sigma_Y^2(A, W)$. This proves that the estimating function/efficient score presented above is a score of a likelihood in the overdispersed exponential family. This provides another verification of the fact that the claimed efficient score is indeed an efficient score.

B Derivation of TMLEs

B.1 Log-binomial

Under the log-binomial distribution, the initial density is a binomial density defined as

$$P^0(Y \mid A, W) = \bar{Q}_0(A, W)^Y \left( 1 - \bar{Q}_0(A, W) \right)^{1 - Y},$$

where $\bar{Q}_0(A, W) = P^0(Y = 1 \mid A, W) = \theta_0(W) e^{m_{\beta_0}(A, V)}$, with the associated fluctuation

$$P^0(\epsilon)(Y \mid A, W) = \bar{Q}_0(\epsilon)(A, W)^Y \left( 1 - \bar{Q}_0(\epsilon)(A, W) \right)^{1 - Y},$$

given $\bar{Q}_0(\epsilon)(A, W) = \theta_0(\epsilon)(W) e^{m_{\beta_0(\epsilon)}(A, V)}$. The associated score for the above likelihood with respect to $\epsilon$ at $\epsilon = 0$ is as follows

$$S(r) = \frac{1}{1 - \bar{Q}_0(A, W)} \left\{ \frac{d}{d\beta_0} m_{\beta_0}(A, V) + r_{\bar{Q}_0,g_0}(W) \right\} \left( Y - \bar{Q}_0(A, W) \right).$$

The efficient score for effect parameter $\beta_0$ under the multiplicative semi-parametric model of Section 3, assuming a log-binomial distribution, is defined below under $P^0$:

$$D_{h_0,\bar{Q}_0}(O) = h_{\bar{Q}_0,g_0}(A, W) \left( Y - \bar{Q}_0(A, W) \right),$$

where

$$h_{\bar{Q}_0,g_0}(A, W) = \frac{1}{1 - \bar{Q}_0(A, W)} \left\{ \frac{d}{d\beta_0} m_{\beta_0}(A, V) - \frac{E_0\left[ \frac{\bar{Q}_0(A, W)}{1 - \bar{Q}_0(A, W)} \frac{d}{d\beta_0} m_{\beta_0}(A, V) \,\Big|\, W \right]}{E_0\left[ \frac{\bar{Q}_0(A, W)}{1 - \bar{Q}_0(A, W)} \,\Big|\, W \right]} \right\}.$$

This score is shown to belong to the general class of estimating functions for effect parameter $\beta$ under the multiplicative semi-parametric model in Appendix A, and equals the efficient score for a binary outcome. The proper form of the fluctuation function $r_{\bar{Q}_0,g_0}(W)$ is therefore as follows

$$r_{\bar{Q}_0,g_0}(W) = -\frac{E_0\left[ \frac{\bar{Q}_0(A, W)}{1 - \bar{Q}_0(A, W)} \frac{d}{d\beta_0} m_{\beta_0}(A, V) \,\Big|\, W \right]}{E_0\left[ \frac{\bar{Q}_0(A, W)}{1 - \bar{Q}_0(A, W)} \,\Big|\, W \right]}.$$

Given a model $m_\beta(A, V)$ that is linear in $\beta$, the model

$$\log \bar{Q}_0(\epsilon)(A, W) = m_{\beta_0(\epsilon)}(A, V) + \log \theta_0(\epsilon)(W)$$

can be rearranged as an update to the initial fit,

$$\log \bar{Q}_0(\epsilon)(A, W) = \log \bar{Q}_0(A, W) + \epsilon^T \frac{d}{d\beta_0} m_{\beta_0}(A, V) + \epsilon^T r_{\bar{Q}_0,g_0}(W).$$
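As an informal check of the log-binomial score $S(r)$, the sketch below compares a central finite-difference derivative of the fluctuated log-likelihood at $\epsilon = 0$ with the closed form $(dm + r)(Y - \bar{Q}_0)/(1 - \bar{Q}_0)$. It assumes a scalar $\beta$ and a model linear in $\beta$, so that $\bar{Q}_0(\epsilon) = \bar{Q}_0 \exp(\epsilon (dm + r))$. The function names and the numeric values of `q0`, `dm` and `r` are hypothetical and chosen only for illustration.

```python
import numpy as np

def log_binomial_loglik(eps, y, q0, dm, r):
    """Log-likelihood of one observation under Q(eps) = Q0 * exp(eps * (dm + r))."""
    q = q0 * np.exp(eps * (dm + r))
    return y * np.log(q) + (1 - y) * np.log(1 - q)

def score_at_zero(y, q0, dm, r):
    """Closed-form score S(r) = (dm + r) * (Y - Q0) / (1 - Q0)."""
    return (dm + r) * (y - q0) / (1 - q0)

# Hypothetical values: initial fit Q0, derivative dm = d/dbeta m, and r(W).
q0, dm, r, step = 0.3, 1.0, -0.4, 1e-6

numeric = {}
analytic = {}
for y in (0, 1):
    # Central finite difference of the log-likelihood around eps = 0.
    numeric[y] = (log_binomial_loglik(step, y, q0, dm, r)
                  - log_binomial_loglik(-step, y, q0, dm, r)) / (2 * step)
    analytic[y] = score_at_zero(y, q0, dm, r)
    print(y, numeric[y], analytic[y])
```

The two columns printed for each value of `y` should agree to within finite-difference error.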

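Therefore the update can be achieved by estimating $\epsilon$ with standard maximum likelihood estimation. The update can be completed using log-binomial regression, setting the initial estimate, $\bar{Q}_0(A, W)$, as an offset (entered on the log scale, i.e. $\log \bar{Q}_0(A, W)$) and regressing $Y$ onto the following clever covariate,

$$H_{\bar{Q}_0,g_0}(A, W) = \frac{d}{d\beta_0} m_{\beta_0}(A, V) - \frac{E_0\left[ \frac{\bar{Q}_0(A, W)}{1 - \bar{Q}_0(A, W)} \frac{d}{d\beta_0} m_{\beta_0}(A, V) \,\Big|\, W \right]}{E_0\left[ \frac{\bar{Q}_0(A, W)}{1 - \bar{Q}_0(A, W)} \,\Big|\, W \right]}.$$

B.2 Poisson

Under the Poisson distribution, the initial density is a Poisson density defined as

$$P^0(Y \mid A, W) = \frac{\bar{Q}_0(A, W)^Y e^{-\bar{Q}_0(A, W)}}{Y!},$$

with the associated fluctuation

$$P^0(\epsilon)(Y \mid A, W) = \frac{\bar{Q}_0(\epsilon)(A, W)^Y e^{-\bar{Q}_0(\epsilon)(A, W)}}{Y!}.$$

The associated score for the above likelihood with respect to $\epsilon$ at $\epsilon = 0$ is as follows

$$S(r) = \left\{ \frac{d}{d\beta_0} m_{\beta_0}(A, V) + r_{\bar{Q}_0,g_0}(W) \right\} \left( Y - \bar{Q}_0(A, W) \right).$$

The efficient score under the multiplicative semi-parametric model of Section 3, assuming a Poisson distribution, is defined below under $P^0$. This score is also shown to belong to the general class of estimating functions for the effect parameter $\beta$ under the multiplicative semi-parametric model in Appendix A, and is the efficient score for this effect parameter $\beta_0$ given a Poisson (i.e. count) outcome. We have

$$D_{h_0,\bar{Q}_0}(O) = h_{\bar{Q}_0,g_0}(A, W) \left( Y - \bar{Q}_0(A, W) \right),$$

where

$$h_{\bar{Q}_0,g_0}(A, W) = \frac{d}{d\beta_0} m_{\beta_0}(A, V) - \frac{E_0\left[ \frac{d}{d\beta_0} m_{\beta_0}(A, V)\, e^{m_{\beta_0}(A, V)} \,\Big|\, W \right]}{E_0\left[ e^{m_{\beta_0}(A, V)} \,\Big|\, W \right]}.$$

Comparing this with the score $S(r)$, it follows that the proper form for $r_{\bar{Q}_0,g_0}(W)$ is

$$r_{\bar{Q}_0,g_0}(W) = -\frac{E_0\left[ \frac{d}{d\beta_0} m_{\beta_0}(A, V)\, e^{m_{\beta_0}(A, V)} \,\Big|\, W \right]}{E_0\left[ e^{m_{\beta_0}(A, V)} \,\Big|\, W \right]}.$$

Given a simple linear model $m_\beta(A, V) = \beta A$, the above can be rewritten as

$$r_{\bar{Q}_0,g_0}(W) = -\frac{E_0\left[ A e^{\beta_0 A} \mid W \right]}{E_0\left[ e^{\beta_0 A} \mid W \right]}.$$

Similar to the log-binomial case, the update can be achieved by estimating $\epsilon$ with standard maximum likelihood estimation. The update is completed using Poisson regression with an offset equal to the initial fit (on the log scale) and clever covariate defined as

$$H_{\bar{Q}_0,g_0}(A, W) = \frac{d}{d\beta_0} m_{\beta_0}(A, V) - \frac{E_0\left[ \frac{d}{d\beta_0} m_{\beta_0}(A, V)\, e^{m_{\beta_0}(A, V)} \,\Big|\, W \right]}{E_0\left[ e^{m_{\beta_0}(A, V)} \,\Big|\, W \right]}.$$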
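The Poisson-based update can be sketched as follows for the simple model $m_\beta(A, V) = \beta A$ with a binary exposure. This is an illustrative single update step under stated assumptions, not the authors' implementation: the simulated data, the deliberately crude initial fit, the known exposure mechanism `g1` and the helper names `poisson_clever_covariate` and `fit_epsilon` are all hypothetical. The Poisson regression with offset $\log \bar{Q}_0$ and no intercept is carried out by a hand-written Newton-Raphson iteration so the block depends only on NumPy.

```python
import numpy as np

def poisson_clever_covariate(a, g1, beta0):
    """H(A, W) = A - E[A e^{beta0 A} | W] / E[e^{beta0 A} | W] for binary A.

    g1 holds P(A = 1 | W) evaluated at each observation's W.
    """
    num = g1 * np.exp(beta0)          # E[A e^{beta0 A} | W]
    den = num + (1.0 - g1)            # E[e^{beta0 A} | W]
    return a - num / den

def fit_epsilon(y, h, log_q0, max_iter=50, tol=1e-12):
    """Newton-Raphson for eps in log Q(eps) = log Q0 + eps * H (Poisson likelihood)."""
    eps = 0.0
    for _ in range(max_iter):
        q = np.exp(log_q0 + eps * h)
        score = np.sum(h * (y - q))   # derivative of the Poisson log-likelihood
        info = np.sum(h * h * q)      # minus the second derivative
        step = score / info
        eps += step
        if abs(step) < tol:
            break
    return eps

# Simulated data (illustration only): binary W, A, Y with a log-linear risk model.
rng = np.random.default_rng(0)
n = 5000
w = rng.binomial(1, 0.5, size=n)
g1 = 0.3 + 0.4 * w                    # exposure mechanism P(A = 1 | W), taken as known
a = rng.binomial(1, g1)
y = rng.binomial(1, (0.2 + 0.1 * w) * np.exp(0.5 * a))

# Crude initial fit: beta0 = 0 and a constant theta, i.e. Q0 = mean(Y).
beta0 = 0.0
log_q0 = np.full(n, np.log(y.mean()))

h = poisson_clever_covariate(a, g1, beta0)
eps = fit_epsilon(y, h, log_q0)
beta1 = beta0 + eps                   # updated coefficient, since beta(eps) = beta + eps
q1 = np.exp(log_q0 + eps * h)         # updated fit Q(eps)
print(beta1, np.sum(h * (y - q1)))
```

After the step, the empirical mean of the clever covariate times the residual is numerically zero, which is the estimating equation the update is designed to solve. In practice one might recompute the clever covariate at the updated coefficient and repeat; that iteration is omitted here.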

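B.3 General semi-parametric multiplicative model

Assuming only the semi-parametric multiplicative model, we define the initial density as a member of the overdispersed exponential family as outlined in Appendix A such that

$$P^0_{\tau}(Y \mid A, W) = h_c(Y, \tau) \exp\left\{ \frac{\bar{Q}_0 Y - B(\bar{Q}_0)}{d(\tau)} \right\},$$

and we define a class of submodels, fluctuated by parameter $\epsilon$, as

$$P^0_{\tau}(\epsilon)(Y \mid A, W) = h_c(Y, \tau) \exp\left\{ \frac{\bar{Q}_0(\epsilon) Y - B(\bar{Q}_0(\epsilon))}{d(\tau)} \right\},$$

where $\bar{Q}(\epsilon)(A, W) = \theta(\epsilon)(W) \exp(m_{\beta(\epsilon)}(A, V))$, $\theta(\epsilon)(W) = \theta(W) \exp(\epsilon\, r_{\bar{Q}_0,g_0}(W))$ and $\beta(\epsilon) = \beta + \epsilon$. The score of the above likelihood with respect to $\epsilon$ at $\epsilon = 0$ is as follows

$$S_\tau(r) = \frac{\bar{Q}_0(A, W)}{d(\tau)} \left\{ \frac{d}{d\beta} m_\beta(A, V) + r_{\bar{Q}_0,g_0}(W) \right\} \left( Y - \bar{Q}_0(A, W) \right).$$

The efficient score for the effect parameter $\beta_0$, assuming only the multiplicative semi-parametric model defined in Section 3 and as shown previously in Appendix A, is restated here:

$$D_{h_{opt},\bar{Q}_0,g_0}(A, W \mid \beta_0) = h^*_{opt}(A, W) \left( Y - \bar{Q}_0(A, W) \right) \tag{6}$$

$$h^*_{opt,\bar{Q}_0,g_0}(A, W) = \frac{\bar{Q}_0(A, W)}{\sigma_Y^2(A, W)} \left\{ \frac{d}{d\beta_0} m_{\beta_0}(A, V) - \frac{E_0\left[ \frac{d}{d\beta_0} m_{\beta_0}(A, V)\, \frac{\bar{Q}_0(A, W)^2}{\sigma_Y^2(A, W)} \,\Big|\, W \right]}{E_0\left[ \frac{\bar{Q}_0(A, W)^2}{\sigma_Y^2(A, W)} \,\Big|\, W \right]} \right\}. \tag{7}$$

It follows directly that

$$r_{\bar{Q}_0,g_0}(W) = -\frac{E_0\left[ \frac{d}{d\beta_0} m_{\beta_0}(A, V)\, \frac{\bar{Q}_0(A, W)^2}{\sigma_Y^2(A, W)} \,\Big|\, W \right]}{E_0\left[ \frac{\bar{Q}_0(A, W)^2}{\sigma_Y^2(A, W)} \,\Big|\, W \right]},$$

given $d(\tau) = \sigma_Y^2(A, W)$. Therefore the clever covariate can be defined as

$$H_{\bar{Q}_0,g_0}(A, W) = \frac{d}{d\beta_0} m_{\beta_0}(A, V) - \frac{E_0\left[ \frac{d}{d\beta_0} m_{\beta_0}(A, V)\, \frac{\bar{Q}_0(A, W)^2}{\sigma_Y^2(A, W)} \,\Big|\, W \right]}{E_0\left[ \frac{\bar{Q}_0(A, W)^2}{\sigma_Y^2(A, W)} \,\Big|\, W \right]}.$$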
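To illustrate how the general clever covariate relates to the two special cases, the sketch below evaluates it for a binary exposure and checks numerically that it reduces to the Poisson form when $\sigma_Y^2 = \bar{Q}_0$ and to the log-binomial form when $\sigma_Y^2 = \bar{Q}_0(1 - \bar{Q}_0)$. The function `general_clever_covariate`, the array layout (one row per value of $W$, one column per value of $A$) and all numeric inputs are assumptions made for this example.

```python
import numpy as np

def general_clever_covariate(dm, q, var_y, g1):
    """Return H(a, W) for a in {0, 1}; dm, q and var_y have shape (n, 2).

    Column 0 holds values at A = 0, column 1 values at A = 1; g1 is P(A = 1 | W).
    """
    weight = q ** 2 / var_y                       # Q^2 / sigma_Y^2 at a = 0, 1
    prob = np.column_stack([1.0 - g1, g1])        # P(A = a | W)
    num = np.sum(dm * weight * prob, axis=1)      # E[dm * Q^2 / sigma_Y^2 | W]
    den = np.sum(weight * prob, axis=1)           # E[Q^2 / sigma_Y^2 | W]
    return dm - (num / den)[:, None]

# Hypothetical inputs: m_beta(A, V) = beta * A, so d/dbeta m = A.
beta0 = 0.5
theta = np.array([0.2, 0.3])                      # theta(W) at two values of W
g1 = np.array([0.3, 0.7])                         # P(A = 1 | W) at those values
dm = np.tile([0.0, 1.0], (2, 1))
q = theta[:, None] * np.exp(beta0 * dm)           # Q0(a, W) = theta(W) exp(beta0 a)

h_poisson = general_clever_covariate(dm, q, q, g1)             # sigma_Y^2 = Q0
h_binomial = general_clever_covariate(dm, q, q * (1 - q), g1)  # sigma_Y^2 = Q0 (1 - Q0)

# Closed-form ratios from the Poisson and log-binomial sections for comparison.
ratio_pois = g1 * np.exp(beta0) / (g1 * np.exp(beta0) + 1 - g1)
odds = q / (1 - q)
ratio_bin = g1 * odds[:, 1] / (g1 * odds[:, 1] + (1 - g1) * odds[:, 0])
print(np.allclose(h_poisson[:, 1], 1 - ratio_pois))
print(np.allclose(h_binomial[:, 1], 1 - ratio_bin))
```

In the Poisson case $\theta(W)$ cancels from the ratio, which is why the closed form involves only $g_0$ and $\beta_0$; in the log-binomial case the weights are the odds $\bar{Q}_0/(1 - \bar{Q}_0)$.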

Survey Sampling. 1 Design-based Inference. Kosuke Imai Department of Politics, Princeton University. February 19, 2013

Survey Sampling. 1 Design-based Inference. Kosuke Imai Department of Politics, Princeton University. February 19, 2013 Survey Sampling Kosuke Imai Department of Politics, Princeton University February 19, 2013 Survey sampling is one of the most commonly use ata collection methos for social scientists. We begin by escribing

More information

A Modification of the Jarque-Bera Test. for Normality

A Modification of the Jarque-Bera Test. for Normality Int. J. Contemp. Math. Sciences, Vol. 8, 01, no. 17, 84-85 HIKARI Lt, www.m-hikari.com http://x.oi.org/10.1988/ijcms.01.9106 A Moification of the Jarque-Bera Test for Normality Moawa El-Fallah Ab El-Salam

More information

SYNCHRONOUS SEQUENTIAL CIRCUITS

SYNCHRONOUS SEQUENTIAL CIRCUITS CHAPTER SYNCHRONOUS SEUENTIAL CIRCUITS Registers an counters, two very common synchronous sequential circuits, are introuce in this chapter. Register is a igital circuit for storing information. Contents

More information

Computing Exact Confidence Coefficients of Simultaneous Confidence Intervals for Multinomial Proportions and their Functions

Computing Exact Confidence Coefficients of Simultaneous Confidence Intervals for Multinomial Proportions and their Functions Working Paper 2013:5 Department of Statistics Computing Exact Confience Coefficients of Simultaneous Confience Intervals for Multinomial Proportions an their Functions Shaobo Jin Working Paper 2013:5

More information

Topic 7: Convergence of Random Variables

Topic 7: Convergence of Random Variables Topic 7: Convergence of Ranom Variables Course 003, 2016 Page 0 The Inference Problem So far, our starting point has been a given probability space (S, F, P). We now look at how to generate information

More information

Survey-weighted Unit-Level Small Area Estimation

Survey-weighted Unit-Level Small Area Estimation Survey-weighte Unit-Level Small Area Estimation Jan Pablo Burgar an Patricia Dörr Abstract For evience-base regional policy making, geographically ifferentiate estimates of socio-economic inicators are

More information

The Role of Models in Model-Assisted and Model- Dependent Estimation for Domains and Small Areas

The Role of Models in Model-Assisted and Model- Dependent Estimation for Domains and Small Areas The Role of Moels in Moel-Assiste an Moel- Depenent Estimation for Domains an Small Areas Risto Lehtonen University of Helsini Mio Myrsylä University of Pennsylvania Carl-Eri Särnal University of Montreal

More information

'HVLJQ &RQVLGHUDWLRQ LQ 0DWHULDO 6HOHFWLRQ 'HVLJQ 6HQVLWLYLW\,1752'8&7,21

'HVLJQ &RQVLGHUDWLRQ LQ 0DWHULDO 6HOHFWLRQ 'HVLJQ 6HQVLWLYLW\,1752'8&7,21 Large amping in a structural material may be either esirable or unesirable, epening on the engineering application at han. For example, amping is a esirable property to the esigner concerne with limiting

More information

LATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION

LATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION The Annals of Statistics 1997, Vol. 25, No. 6, 2313 2327 LATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION By Eva Riccomagno, 1 Rainer Schwabe 2 an Henry P. Wynn 1 University of Warwick, Technische

More information

SIMPLE EXAMPLES OF ESTIMATING CAUSAL EFFECTS USING TARGETED MAXIMUM LIKELIHOOD ESTIMATION

SIMPLE EXAMPLES OF ESTIMATING CAUSAL EFFECTS USING TARGETED MAXIMUM LIKELIHOOD ESTIMATION Johns Hopkins University, Dept. of Biostatistics Working Papers 3-3-2011 SIMPLE EXAMPLES OF ESTIMATING CAUSAL EFFECTS USING TARGETED MAXIMUM LIKELIHOOD ESTIMATION Michael Rosenblum Johns Hopkins Bloomberg

More information

Time-of-Arrival Estimation in Non-Line-Of-Sight Environments

Time-of-Arrival Estimation in Non-Line-Of-Sight Environments 2 Conference on Information Sciences an Systems, The Johns Hopkins University, March 2, 2 Time-of-Arrival Estimation in Non-Line-Of-Sight Environments Sinan Gezici, Hisashi Kobayashi an H. Vincent Poor

More information

Logarithmic spurious regressions

Logarithmic spurious regressions Logarithmic spurious regressions Robert M. e Jong Michigan State University February 5, 22 Abstract Spurious regressions, i.e. regressions in which an integrate process is regresse on another integrate

More information

This module is part of the. Memobust Handbook. on Methodology of Modern Business Statistics

This module is part of the. Memobust Handbook. on Methodology of Modern Business Statistics This moule is part of the Memobust Hanbook on Methoology of Moern Business Statistics 26 March 2014 Metho: Balance Sampling for Multi-Way Stratification Contents General section... 3 1. Summary... 3 2.

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2010 Paper 260 Collaborative Targeted Maximum Likelihood For Time To Event Data Ori M. Stitelman Mark

More information

Online Appendix for Trade Policy under Monopolistic Competition with Firm Selection

Online Appendix for Trade Policy under Monopolistic Competition with Firm Selection Online Appenix for Trae Policy uner Monopolistic Competition with Firm Selection Kyle Bagwell Stanfor University an NBER Seung Hoon Lee Georgia Institute of Technology September 6, 2018 In this Online

More information

Copyright 2015 Quintiles

Copyright 2015 Quintiles fficiency of Ranomize Concentration-Controlle Trials Relative to Ranomize Dose-Controlle Trials, an Application to Personalize Dosing Trials Russell Reeve, PhD Quintiles, Inc. Copyright 2015 Quintiles

More information

under the null hypothesis, the sign test (with continuity correction) rejects H 0 when α n + n 2 2.

under the null hypothesis, the sign test (with continuity correction) rejects H 0 when α n + n 2 2. Assignment 13 Exercise 8.4 For the hypotheses consiere in Examples 8.12 an 8.13, the sign test is base on the statistic N + = #{i : Z i > 0}. Since 2 n(n + /n 1) N(0, 1) 2 uner the null hypothesis, the

More information

Estimation of District Level Poor Households in the State of. Uttar Pradesh in India by Combining NSSO Survey and

Estimation of District Level Poor Households in the State of. Uttar Pradesh in India by Combining NSSO Survey and Int. Statistical Inst.: Proc. 58th Worl Statistical Congress, 2011, Dublin (Session CPS039) p.6567 Estimation of District Level Poor Househols in the State of Uttar Praesh in Inia by Combining NSSO Survey

More information

. Using a multinomial model gives us the following equation for P d. , with respect to same length term sequences.

. Using a multinomial model gives us the following equation for P d. , with respect to same length term sequences. S 63 Lecture 8 2/2/26 Lecturer Lillian Lee Scribes Peter Babinski, Davi Lin Basic Language Moeling Approach I. Special ase of LM-base Approach a. Recap of Formulas an Terms b. Fixing θ? c. About that Multinomial

More information

Simple Tests for Exogeneity of a Binary Explanatory Variable in Count Data Regression Models

Simple Tests for Exogeneity of a Binary Explanatory Variable in Count Data Regression Models Communications in Statistics Simulation an Computation, 38: 1834 1855, 2009 Copyright Taylor & Francis Group, LLC ISSN: 0361-0918 print/1532-4141 online DOI: 10.1080/03610910903147789 Simple Tests for

More information

Cascaded redundancy reduction

Cascaded redundancy reduction Network: Comput. Neural Syst. 9 (1998) 73 84. Printe in the UK PII: S0954-898X(98)88342-5 Cascae reunancy reuction Virginia R e Sa an Geoffrey E Hinton Department of Computer Science, University of Toronto,

More information

MODELLING DEPENDENCE IN INSURANCE CLAIMS PROCESSES WITH LÉVY COPULAS ABSTRACT KEYWORDS

MODELLING DEPENDENCE IN INSURANCE CLAIMS PROCESSES WITH LÉVY COPULAS ABSTRACT KEYWORDS MODELLING DEPENDENCE IN INSURANCE CLAIMS PROCESSES WITH LÉVY COPULAS BY BENJAMIN AVANZI, LUKE C. CASSAR AND BERNARD WONG ABSTRACT In this paper we investigate the potential of Lévy copulas as a tool for

More information

Least-Squares Regression on Sparse Spaces

Least-Squares Regression on Sparse Spaces Least-Squares Regression on Sparse Spaces Yuri Grinberg, Mahi Milani Far, Joelle Pineau School of Computer Science McGill University Montreal, Canaa {ygrinb,mmilan1,jpineau}@cs.mcgill.ca 1 Introuction

More information

Research Article When Inflation Causes No Increase in Claim Amounts

Research Article When Inflation Causes No Increase in Claim Amounts Probability an Statistics Volume 2009, Article ID 943926, 10 pages oi:10.1155/2009/943926 Research Article When Inflation Causes No Increase in Claim Amounts Vytaras Brazauskas, 1 Bruce L. Jones, 2 an

More information

THE VAN KAMPEN EXPANSION FOR LINKED DUFFING LINEAR OSCILLATORS EXCITED BY COLORED NOISE

THE VAN KAMPEN EXPANSION FOR LINKED DUFFING LINEAR OSCILLATORS EXCITED BY COLORED NOISE Journal of Soun an Vibration (1996) 191(3), 397 414 THE VAN KAMPEN EXPANSION FOR LINKED DUFFING LINEAR OSCILLATORS EXCITED BY COLORED NOISE E. M. WEINSTEIN Galaxy Scientific Corporation, 2500 English Creek

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2015 Paper 334 Targeted Estimation and Inference for the Sample Average Treatment Effect Laura B. Balzer

More information

inflow outflow Part I. Regular tasks for MAE598/494 Task 1

inflow outflow Part I. Regular tasks for MAE598/494 Task 1 MAE 494/598, Fall 2016 Project #1 (Regular tasks = 20 points) Har copy of report is ue at the start of class on the ue ate. The rules on collaboration will be release separately. Please always follow the

More information

A note on asymptotic formulae for one-dimensional network flow problems Carlos F. Daganzo and Karen R. Smilowitz

A note on asymptotic formulae for one-dimensional network flow problems Carlos F. Daganzo and Karen R. Smilowitz A note on asymptotic formulae for one-imensional network flow problems Carlos F. Daganzo an Karen R. Smilowitz (to appear in Annals of Operations Research) Abstract This note evelops asymptotic formulae

More information

Improving Estimation Accuracy in Nonrandomized Response Questioning Methods by Multiple Answers

Improving Estimation Accuracy in Nonrandomized Response Questioning Methods by Multiple Answers International Journal of Statistics an Probability; Vol 6, No 5; September 207 ISSN 927-7032 E-ISSN 927-7040 Publishe by Canaian Center of Science an Eucation Improving Estimation Accuracy in Nonranomize

More information

Transmission Line Matrix (TLM) network analogues of reversible trapping processes Part B: scaling and consistency

Transmission Line Matrix (TLM) network analogues of reversible trapping processes Part B: scaling and consistency Transmission Line Matrix (TLM network analogues of reversible trapping processes Part B: scaling an consistency Donar e Cogan * ANC Eucation, 308-310.A. De Mel Mawatha, Colombo 3, Sri Lanka * onarecogan@gmail.com

More information

Designing of Acceptance Double Sampling Plan for Life Test Based on Percentiles of Exponentiated Rayleigh Distribution

Designing of Acceptance Double Sampling Plan for Life Test Based on Percentiles of Exponentiated Rayleigh Distribution International Journal of Statistics an Systems ISSN 973-675 Volume, Number 3 (7), pp. 475-484 Research Inia Publications http://www.ripublication.com Designing of Acceptance Double Sampling Plan for Life

More information

Flexible High-Dimensional Classification Machines and Their Asymptotic Properties

Flexible High-Dimensional Classification Machines and Their Asymptotic Properties Journal of Machine Learning Research 16 (2015) 1547-1572 Submitte 1/14; Revise 9/14; Publishe 8/15 Flexible High-Dimensional Classification Machines an Their Asymptotic Properties Xingye Qiao Department

More information

Error Floors in LDPC Codes: Fast Simulation, Bounds and Hardware Emulation

Error Floors in LDPC Codes: Fast Simulation, Bounds and Hardware Emulation Error Floors in LDPC Coes: Fast Simulation, Bouns an Harware Emulation Pamela Lee, Lara Dolecek, Zhengya Zhang, Venkat Anantharam, Borivoje Nikolic, an Martin J. Wainwright EECS Department University of

More information

The International Journal of Biostatistics

The International Journal of Biostatistics The International Journal of Biostatistics Volume 2, Issue 1 2006 Article 2 Statistical Inference for Variable Importance Mark J. van der Laan, Division of Biostatistics, School of Public Health, University

More information

Estimating Causal Direction and Confounding Of Two Discrete Variables

Estimating Causal Direction and Confounding Of Two Discrete Variables Estimating Causal Direction an Confouning Of Two Discrete Variables This inspire further work on the so calle aitive noise moels. Hoyer et al. (2009) extene Shimizu s ientifiaarxiv:1611.01504v1 [stat.ml]

More information

Gaussian processes with monotonicity information

Gaussian processes with monotonicity information Gaussian processes with monotonicity information Anonymous Author Anonymous Author Unknown Institution Unknown Institution Abstract A metho for using monotonicity information in multivariate Gaussian process

More information

arxiv: v1 [hep-lat] 19 Nov 2013

arxiv: v1 [hep-lat] 19 Nov 2013 HU-EP-13/69 SFB/CPP-13-98 DESY 13-225 Applicability of Quasi-Monte Carlo for lattice systems arxiv:1311.4726v1 [hep-lat] 19 ov 2013, a,b Tobias Hartung, c Karl Jansen, b Hernan Leovey, Anreas Griewank

More information

Modeling the effects of polydispersity on the viscosity of noncolloidal hard sphere suspensions. Paul M. Mwasame, Norman J. Wagner, Antony N.

Modeling the effects of polydispersity on the viscosity of noncolloidal hard sphere suspensions. Paul M. Mwasame, Norman J. Wagner, Antony N. Submitte to the Journal of Rheology Moeling the effects of polyispersity on the viscosity of noncolloial har sphere suspensions Paul M. Mwasame, Norman J. Wagner, Antony N. Beris a) epartment of Chemical

More information

Lecture Introduction. 2 Examples of Measure Concentration. 3 The Johnson-Lindenstrauss Lemma. CS-621 Theory Gems November 28, 2012

Lecture Introduction. 2 Examples of Measure Concentration. 3 The Johnson-Lindenstrauss Lemma. CS-621 Theory Gems November 28, 2012 CS-6 Theory Gems November 8, 0 Lecture Lecturer: Alesaner Mąry Scribes: Alhussein Fawzi, Dorina Thanou Introuction Toay, we will briefly iscuss an important technique in probability theory measure concentration

More information

Resilient Modulus Prediction Model for Fine-Grained Soils in Ohio: Preliminary Study

Resilient Modulus Prediction Model for Fine-Grained Soils in Ohio: Preliminary Study Resilient Moulus Preiction Moel for Fine-Graine Soils in Ohio: Preliminary Stuy by Teruhisa Masaa: Associate Professor, Civil Engineering Department Ohio University, Athens, OH 4570 Tel: (740) 59-474 Fax:

More information

Optimal Design for Hill Model

Optimal Design for Hill Model Optimal Design for Hill Moel DA 01 D-optimal; aaptive esign; personalize esign Russell Reeve, PhD Quintiles Durham, NC, USA Copyright 01 Quintiles Outline Motivating problem, from rheumatoi arthritis Hill

More information

Role of parameters in the stochastic dynamics of a stick-slip oscillator

Role of parameters in the stochastic dynamics of a stick-slip oscillator Proceeing Series of the Brazilian Society of Applie an Computational Mathematics, v. 6, n. 1, 218. Trabalho apresentao no XXXVII CNMAC, S.J. os Campos - SP, 217. Proceeing Series of the Brazilian Society

More information

Balancing Expected and Worst-Case Utility in Contracting Models with Asymmetric Information and Pooling

Balancing Expected and Worst-Case Utility in Contracting Models with Asymmetric Information and Pooling Balancing Expecte an Worst-Case Utility in Contracting Moels with Asymmetric Information an Pooling R.B.O. erkkamp & W. van en Heuvel & A.P.M. Wagelmans Econometric Institute Report EI2018-01 9th January

More information

Robust Forward Algorithms via PAC-Bayes and Laplace Distributions. ω Q. Pr (y(ω x) < 0) = Pr A k

Robust Forward Algorithms via PAC-Bayes and Laplace Distributions. ω Q. Pr (y(ω x) < 0) = Pr A k A Proof of Lemma 2 B Proof of Lemma 3 Proof: Since the support of LL istributions is R, two such istributions are equivalent absolutely continuous with respect to each other an the ivergence is well-efine

More information

Chapter 6: Energy-Momentum Tensors

Chapter 6: Energy-Momentum Tensors 49 Chapter 6: Energy-Momentum Tensors This chapter outlines the general theory of energy an momentum conservation in terms of energy-momentum tensors, then applies these ieas to the case of Bohm's moel.

More information

Nonparametric Additive Models

Nonparametric Additive Models Nonparametric Aitive Moels Joel L. Horowitz The Institute for Fiscal Stuies Department of Economics, UCL cemmap working paper CWP20/2 Nonparametric Aitive Moels Joel L. Horowitz. INTRODUCTION Much applie

More information

Tutorial on Maximum Likelyhood Estimation: Parametric Density Estimation

Tutorial on Maximum Likelyhood Estimation: Parametric Density Estimation Tutorial on Maximum Likelyhoo Estimation: Parametric Density Estimation Suhir B Kylasa 03/13/2014 1 Motivation Suppose one wishes to etermine just how biase an unfair coin is. Call the probability of tossing

More information

CONFIRMATORY FACTOR ANALYSIS

CONFIRMATORY FACTOR ANALYSIS 1 CONFIRMATORY FACTOR ANALYSIS The purpose of confirmatory factor analysis (CFA) is to explain the pattern of associations among a set of observe variables in terms of a smaller number of unerlying latent

More information

A Review of Multiple Try MCMC algorithms for Signal Processing

A Review of Multiple Try MCMC algorithms for Signal Processing A Review of Multiple Try MCMC algorithms for Signal Processing Luca Martino Image Processing Lab., Universitat e València (Spain) Universia Carlos III e Mari, Leganes (Spain) Abstract Many applications

More information

Robust Low Rank Kernel Embeddings of Multivariate Distributions

Robust Low Rank Kernel Embeddings of Multivariate Distributions Robust Low Rank Kernel Embeings of Multivariate Distributions Le Song, Bo Dai College of Computing, Georgia Institute of Technology lsong@cc.gatech.eu, boai@gatech.eu Abstract Kernel embeing of istributions

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2008 Paper 241 A Note on Risk Prediction for Case-Control Studies Sherri Rose Mark J. van der Laan Division

More information

THE EFFICIENCIES OF THE SPATIAL MEDIAN AND SPATIAL SIGN COVARIANCE MATRIX FOR ELLIPTICALLY SYMMETRIC DISTRIBUTIONS

THE EFFICIENCIES OF THE SPATIAL MEDIAN AND SPATIAL SIGN COVARIANCE MATRIX FOR ELLIPTICALLY SYMMETRIC DISTRIBUTIONS THE EFFICIENCIES OF THE SPATIAL MEDIAN AND SPATIAL SIGN COVARIANCE MATRIX FOR ELLIPTICALLY SYMMETRIC DISTRIBUTIONS BY ANDREW F. MAGYAR A issertation submitte to the Grauate School New Brunswick Rutgers,

More information

Parameter estimation: A new approach to weighting a priori information

Parameter estimation: A new approach to weighting a priori information Parameter estimation: A new approach to weighting a priori information J.L. Mea Department of Mathematics, Boise State University, Boise, ID 83725-555 E-mail: jmea@boisestate.eu Abstract. We propose a

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2004 Paper 155 Estimation of Direct and Indirect Causal Effects in Longitudinal Studies Mark J. van

More information

arxiv: v4 [math.pr] 27 Jul 2016

The Asymptotic Distribution of the Determinant of a Random Correlation Matrix. AM Hanea a, & GF Nane b. a Centre of Excellence for Biosecurity Risk Analysis, University of Melbourne,

Thermal runaway during blocking

[Plot: curves CES_stable, CES, ICES_stable, ICES; axis units mA and s; tick values omitted.] Thermal runaway during blocking. Application

arxiv:hep-th/ v1 3 Feb 1993

NBI-HE-9-89, PAR LPTHE 9-49, FTUAM 9-44, November 99. Matrix model calculations beyond the spherical limit. J. Ambjørn, The Niels Bohr Institute, Blegdamsvej 7, DK-00 Copenhagen Ø,

Examining Geometric Integration for Propagating Orbit Trajectories with Non-Conservative Forcing

Course Project for CDS 05 - Geometric Mechanics. John M. Carson III, California Institute of Technology, June

State observers and recursive filters in classical feedback control theory

State-feedback control example: second-order system. Consider the driven second-order system q q q u x q x q x x x x Here u could represent

Situation awareness of power system based on static voltage security region

The 6th International Conference on Renewable Power Generation (RPG), 19-20 October 2017. Fei Xiao, Zi-Qing Jiang, Qian Ai, Ran

Targeted Maximum Likelihood Estimation in Safety Analysis

Sam Lendle 1, Bruce Fireman 2, Mark van der Laan 1. 1 UC Berkeley, 2 Kaiser Permanente. ISPE Advanced Topics Session, Barcelona, August 2012

Forestry An International Journal of Forest Research

Forestry 2017; 90, 18-31, doi:10.1093/forestry/cpw042. Advance Access publication 24 October 2016. A demographic study of the exponential distribution applied to

On Characterizing the Delay-Performance of Wireless Scheduling Algorithms

Xiaojun Lin, Center for Wireless Systems and Applications, School of Electrical and Computer Engineering, Purdue University, West Lafayette,

Sparse Reconstruction of Systems of Ordinary Differential Equations

Manuel Mai a, Mark D. Shattuck b,c, Corey S. O'Hern c,a,d,e. a Department of Physics, Yale University, New Haven, Connecticut 06520, USA

28.1 Parametric Yield Estimation Considering Leakage Variability

Rajeev R. Rao, Anirudh Devgan*, David Blaauw, Dennis Sylvester. University of Michigan, Ann Arbor, MI; *IBM Corporation, Austin, TX. {rrrao, blaauw,

Separation of Variables

Physics 342 Lecture 1. Physics 342 Quantum Mechanics I, Monday, January 25th, 2010. There are three basic mathematical tools we need, and then we can begin working on the physical

M-Quantile Regression for Binary Data with Application to Small Area Estimation

University of Wollongong Research Online. Centre for Statistical & Survey Methodology Working Paper Series, Faculty of Engineering and Information Sciences, 2012.

ELEC3114 Control Systems 1

Linear Systems - Modelling - Some Issues. Session 2, 2007. Introduction: Linear systems may be represented in a number of different ways. Figure shows the relationship between various representations.

Maximal Causes for Non-linear Component Extraction

Journal of Machine Learning Research 9 (2008) 1227-1267. Submitted 5/07; Revised 11/07; Published 6/08. Jörg Lücke, Maneesh Sahani, Gatsby Computational Neuroscience

A comparison of small area estimators of counts aligned with direct higher level estimates

Giorgio E. Montanari, M. Giovanna Ranalli and Cecilia Vicarelli. Abstract: Indirect estimators for small areas use auxiliary

The total derivative. Chapter Lagrangian and Eulerian approaches

Chapter 5: The total derivative. 5.1 Lagrangian and Eulerian approaches. The representation of a fluid through scalar or vector fields means that each physical quantity under consideration is described as a function

Non-Linear Bayesian CBRN Source Term Estimation

Peter Robins, Hazard Assessment, Simulation and Prediction Group, Dstl, Porton Down, UK, probins@dstl.gov.uk. Paul Thomas, Hazard Assessment, Simulation and Prediction

Predictive Control of a Laboratory Time Delay Process Experiment

Print ISSN:3 6; Online ISSN: 367-5357; DOI:0478/itc-03-0005. S Enev. Key Words: Model predictive control; time delay process; experimental results

Introduction to Machine Learning

How do you estimate p(y | x)? Introduction to Machine Learning: Logistic Regression. Varun Chandola, April 9, 207. Contents: Generative vs. Discriminative Classifiers; Logistic Regression; Logistic Regression

Nonlinear Adaptive Ship Course Tracking Control Based on Backstepping and Nussbaum Gain

Jialu Du, Chen Guo. Abstract: A nonlinear adaptive controller combining adaptive Backstepping algorithm with Nussbaum gain

Math 342 Partial Differential Equations, Viktor Grigoryan

6 Wave equation: solution. In this lecture we will solve the wave equation on the entire real line x ∈ R. This corresponds to a string of infinite

Perturbation Analysis and Optimization of Stochastic Flow Networks

IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. XX, NO. Y, MMM 2004. Gang Sun, Christos G. Cassandras, Yorai Wardi, Christos G. Panayiotou,

University of California, Berkeley

U.C. Berkeley Division of Biostatistics Working Paper Series, Year 2011, Paper 290. Targeted Minimum Loss Based Estimation of an Intervention Specific Mean Outcome. Mark

19 Eigenvalues, Eigenvectors, Ordinary Differential Equations, and Control

This section introduces eigenvalues and eigenvectors of a matrix, and discusses the role of the eigenvalues in determining the behavior

Applications of First Order Equations

Viscous Friction. Consider a small mass that has been dropped into a thin vertical tube of viscous fluid like oil. The mass falls, due to the force of gravity, but falls more

Dissipative numerical methods for the Hunter-Saxton equation

Yan Xu and Chi-Wang Shu. Abstract: In this paper, we present further development of the local discontinuous Galerkin (LDG) method designed in [] and a

University of California, Berkeley

U.C. Berkeley Division of Biostatistics Working Paper Series, Year 2009, Paper 251. Nonparametric population average models: deriving the form of approximate population

WEIGHTING A RESAMPLED PARTICLE IN SEQUENTIAL MONTE CARLO. L. Martino, V. Elvira, F. Louzada

Dep. of Signal Theory and Communic., Universidad Carlos III de Madrid, Leganés (Spain). Institute of Mathematical Sciences

Proof of SPNs as Mixture of Trees

Theorem 1. If T is an induced SPN from a complete and decomposable SPN S, then T is a tree that is complete and decomposable. Proof. Argue by contradiction that T is not a

Optimization of Geometries by Energy Minimization

by Tracy P. Hamilton, Department of Chemistry, University of Alabama at Birmingham, Birmingham, AL 3594-140, hamilton@uab.edu. Copyright Tracy P. Hamilton, 1997.

Robust Bounds for Classification via Selective Sampling

Nicolò Cesa-Bianchi, DSI, Università degli Studi di Milano, Italy. Claudio Gentile, DICOM, Università dell'Insubria, Varese, Italy. Francesco Orabona, Idiap, Martigny, Switzerland. cesa-bianchi@dsi.unimi.it, claudio.gentile@uninsubria.it

A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks

Behnam Neyshabur, Srinadh Bhojanapalli, David McAllester, Nathan Srebro. Toyota Technological Institute at Chicago. {bneyshabur,

CUSTOMER REVIEW FEATURE EXTRACTION Heng Ren, Jingye Wang, and Tony Wu

Abstract: Popular products often have thousands of reviews that contain far too much information for customers to digest. Our goal for the

Analytische Qualitätssicherung Baden-Württemberg

Proficiency Test 1/13 TW S1: sweeteners and benzotriazoles in drinking water. acesulfam, cyclamate, saccharin, sucralose, 1H-benzotriazole, 4-methyl-1H-benzotriazole,

Multi-View Clustering via Canonical Correlation Analysis

Kamalika Chaudhuri, ITA, UC San Diego, 9500 Gilman Drive, La Jolla, CA. Sham M. Kakade, Karen Livescu, Karthik Sridharan, Toyota Technological Institute at Chicago, 6045 S. Kenwood Ave., Chicago, IL. kamalika@soe.ucsd.edu

Backward Bifurcation of Sir Epidemic Model with Non- Monotonic Incidence Rate under Treatment

IOSR Journal of Mathematics (IOSR-JM), e-ISSN: 78-578, p-ISSN: 39-765X. Volume, Issue 4 Ver. (Jul-Aug. 4), PP -3. D. Jasmine

3.2 Shot peening - modeling 3 PROCEEDINGS

Computer assisted coverage simulation. François-Xavier Abadie a, b. a FROHN, Germany, fx.abadie@frohn.com. b PEENING ACCESSORIES, Switzerland, info@peening.ch. Keywords:

Lecture 6: Generalized multivariate analysis of variance

Measuring association of the entire microbiome with other variables. Distance matrices capture some aspects of the data (e.g. microbiome composition,

TMA 4195 Matematisk modellering Exam Tuesday December 16, 2008 09:00-13:00 Problems and solution with additional comments

[Table: Dimension matrix with columns F, U, L, W, D, g and rows m, s, kg; entries not recoverable.] The necessary

On the Surprising Behavior of Distance Metrics in High Dimensional Space

Charu C. Aggarwal, Alexander Hinneburg 2, and Daniel A. Keim 2. IBM T. J. Watson Research Center, Yorktown Heights, NY 0598, USA. charu@watson.ibm.com

ensembles When working with density operators, we can use this connection to define a generalized Bloch vector: v x Tr x, v y Tr y

Ph195a lecture notes, 1/3/01. Density operators for spin-1/2 ensembles. So far in our discussion of spin-1/2 systems, we have restricted our attention to the case of pure states and Hamiltonian evolution. Today

DEGREE DISTRIBUTION OF SHORTEST PATH TREES AND BIAS OF NETWORK SAMPLING ALGORITHMS

SHANKAR BHAMIDI 1, JESSE GOODMAN 2, REMCO VAN DER HOFSTAD 3, AND JÚLIA KOMJÁTHY 3. Abstract. In this article, we explicitly

Construction of the Electronic Radial Wave Functions and Probability Distributions of Hydrogen-like Systems

Thomas S. Kuntzleman, Department of Chemistry, Spring Arbor University, Spring Arbor MI 498, tkuntzle@arbor.edu

A variance decomposition and a Central Limit Theorem for empirical losses associated with resampling designs

Mathias Fuchs, Norbert Krautenbacher. Technical Report Number 173, 2014. Department of Statistics

Multi-View Clustering via Canonical Correlation Analysis

Technical Report TTI-TR-2008-5. Kamalika Chaudhuri, UC San Diego; Sham M. Kakade, Toyota Technological Institute at Chicago. ABSTRACT: Clustering data in
