Simulation of Propensity Scoring Methods
Dee H. Wu, Ph.D., David M. Thompson, Ph.D., David Bard, Ph.D.
University of Oklahoma Health Sciences Center, Oklahoma City, OK

ABSTRACT
In certain clinical trials or observational studies, proper random assignment of treatment and control groups is not always possible, so selection bias may become an issue. Recent efforts to address nonrandom assignment, including a class of methods known as Propensity Scoring, offer alternatives that reduce bias in the estimation of treatment effects when assignment is not random. The original technique was described by Rosenbaum and Rubin in 1983. However, we have found that much of the current literature describes the technique using a large number of mathematical constructs. The multiplicity of mathematical approaches reduces the technique's accessibility to users from different departments, and failure to understand the statistical method deters proper usage. We first describe the technique through diagrams and simulations in SAS, to improve teaching and an intuitive understanding of the material. We hope that this introduction permits readers to enhance their own understanding of the relevant mathematical formalisms presented in the literature.

INTRODUCTION
On returning from a national SAS users group meeting, we presented the Propensity Scoring Method to our local student/faculty SAS group. It was apparent from the group's response that those who did not attend the national meeting were able to follow only parts of the complex presentation. We decided to create a simulation and visual presentation of the method in SAS for teaching, and to put the problem in the hands of students.
[Figure 1: Overview of the propensity scoring method in pictures, in a nutshell. Panel text: some covariates are highly correlated with treatment assignment. For simplicity in drawing (multiple covariates are kept in the actual problem), only two composite vectors are shown: one set that strongly affects treatment selection and one that does not. A propensity score (a scalar function) is then generated; it increases with the probability of treatment. Observations are grouped (matched) into strata I to V, and the analysis can then be done within each group.]
Our first goal was to describe the mathematics purely through images (see Figure 1). The method's multiple steps make propensity scoring methods (PSM) difficult to grasp. PSM involves projection of vectors, the assumption that a vector of covariates is highly correlated with treatment assignment, the use of logistic regression, some algorithm for matching or stratification (or classification, such as decision trees), and the construction of general linear models. Only when we created these pictures did intuition about the method become more apparent to us. We generated a test set based on 4 example models that we designed and selected to show the technique's strengths and weaknesses. The simulation procedure is described below.

Methods:

1. Generate four simulation sets (with covariates X and output Y).

[Figure 2: Histograms of the four exemplary distributions (panels 1 to 4).]

Although we produced these distributions with N=5000 for display, we simulated the test cases with N=500 to create larger sensitivity in the estimates. The parameters for the four cases were simulated in Matlab but can be easily run in SAS:

case 1: x1 = GenerateNHumpRand(x,[-4 4],[0 1 0],[0 0 0 ], [1 1 1]);
case 2: x1 = GenerateNHumpRand(x,[-4 4],[0.4 0 0.6],[-1 0 1], [0.5 0 0.3]);
case 3: x1 = GenerateNHumpRand(x,[-4 4],[0.3 0.2 0.5],[-1 0 1], [0.2 0.3 0.3]);
case 4: x1 = GenerateNHumpRand(x,[-4 4],[0.3 0 0.7],[-3.5 0 1], [2.1 0.1]);
The inputs to the function GenerateNHumpRand are:
The first vector, [-4 4], is the permissible range from which the simulation draws the x values from a standard normal distribution.
The second vector's size indicates the distribution's number of humps, and its fractional multipliers specify the relative size (share of observations) of each hump.
The third vector specifies the locations of the means of the distributions.
The fourth vector specifies the standard deviations.

2. Create 3 covariates (x2, x3, x4) that are uncorrelated with x1, which was generated above. Although the vector x1 may in reality be a combination of other vectors, we reduced x1 to a single one for simplicity of demonstration.

[SAS output listing: not recoverable from the source.]

3. Calculate a propensity score based on a linear combination (weights sum to 1) of X1 to X4, along with a small random normal error term. The variable 'propensity score' was generated by a linear combination of the covariates but could have been generated by any function.

propensity = 0.79*x1+0.15*x2+0.05*x3+0.01*x4+0.3*normal(0,1);

4. Assign observations to treatment groups based on the propensity score. Note that we use a standard normal probability (mean 0, std = 1) to assign the treatment groups, so x1, which carries most of the weight in the score, has a large influence on the treatment classification. We used the inverse probit transformation to control the treatment/control ratio (proportion) and to generate the proper threshold for classifying the treatment. The scores we set up typically ranged from -2 to 2.

5. Create the observation vector Y for each of the test cases:
Y = a1*x1 + a2*x2 + a3*x3 + a4*x4 + b*t + c1*t*x1 + c2*t*x2 + c3*t*x3 + c4*t*x4 + 0.4*Norm(0,1)

where a1=0.28, a2=0.13, a3=0.23, a4=0.352; b=0.4; c1=c2=c3=0.2, c4=0.02.

[Figure 3: Scatter plots of y against x1 and of y against x2, by treatment group (Treat = 0, 1).]
Figure 3: The treatment and control groups are plotted against x1. In many applications, several covariates may be associated with treatment assignment, not just a single one like x1; x1 can represent a modeled linear combination of covariates. Note that Figure 3 illustrates heterogeneity of regression that would challenge conventional ANCOVA methods but may not affect PSM.

Results and Discussion:

Distribution  Description  Naive  Quintile Method  Full Model  Error Quint  Error Full  Delta Quint  Delta Full
1             1lump        1.62   0.399            0.375       -0.3%        -6.3%       0.001        0.025
2             2lump        1.45   0.411            0.471       2.7%         17.8%       0.011        0.071
3             3lumpbroad   1.13   0.324            0.378       19.0%        -5.5%       -0.076       0.022
4             2asymmetric  1.64   0.67             0.28        67.5%        30.0%       0.27         -0.12

Table 1: We compare the Propensity Quintile method vs the Full Model. Errors and deltas are deviations of each estimate from the simulated treatment coefficient b = 0.4 (for example, 0.399 differs from 0.4 by 0.001, or about 0.3%).

Note that in cases where the propensity scores behave well and cover a wide range of values, the propensity quintile method outperformed the full regression model. However, when the propensity scores are more asymmetrically distributed or mixed, the propensity method was less useful. As we taught ourselves, the efficacy of the propensity scoring technique depended on its ability to create a well-behaved function, and on the matching technique. Matching can be improved with better classification schemes; we performed a simple quintile stratification.

Conclusion:
Our approach to demonstrating the Propensity Scoring technique by pictorial description rather than mathematical formulation was better received by faculty and students, helped our understanding,
and built collaboration. This demonstration provided a tool that students and faculty can use to better understand the propensity technique. This work can be extended to the many varieties of discriminant/classification schemes available for matching. Regardless, users of the technique should beware of its pitfalls and challenges once they have understood the base method.

The Propensity Method in Parsimonious Mathematical Formalism

Now we return to the mathematics, as extracted from our understanding and from an online search on the topic.

$$Y_i = \alpha + \tau W_i + X_i' \beta + \varepsilon_i$$

$\alpha$ is the grand mean.
$\tau$ is the treatment effect; it is the average treatment effect when assignment to the treatment or control group is random.
$W_i = 1$ for individuals in the treated group, and $W_i = 0$ for those in the control group.
$Y_i$ is the observed outcome of individual $i$.
$X_i$ includes a set of observed characteristics (covariates), some of which affect selection into treatment.
$\varepsilon_i$ is the error term denoting unobserved characteristics.
$Z_i$ is an observed variable that affects selection into the treatment.

Matching is necessary because of dependence between $\varepsilon_i$ and $W_i$ that is due to a set of observed covariates, $X_i$, that are associated with selection into the treatment. Matching specifically sets:

$$E(\varepsilon_i \mid W_i, X_i, Z_i) = E(\varepsilon_i \mid X_i, Z_i)$$

We want to estimate $W_i \mid Z_i$ (the probability of $W_i$ given $Z_i$) by constructing an independent vector that is orthogonal to it and is therefore free from selection or other bias. The matching condition can be described as

$$(Y_{i1}, Y_{i0}) \perp W_i \mid Z_i$$

and provides a vector from which we can calculate unbiased estimates of treatment effects.
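The notation above can be tied to the quantities computed in the appendix code. The following is a sketch under the assumption that the selection variables are the covariates x1 to x4 used in the PROC LOGISTIC step. The symbols $e_i$, $\gamma$, $w_i$, and $q(i)$ are introduced here for illustration only and do not come from the text above.

```latex
\begin{align*}
% Propensity score: predicted probability of treatment (phat in the appendix)
e_i &= P(W_i = 1 \mid X_i), &
\operatorname{logit}(e_i) &= \gamma_0 + X_i'\gamma \\
% Inverse probability of treatment weight (w in the appendix):
% 1/phat for treated observations, 1/(1 - phat) for controls
w_i &= \frac{W_i}{e_i} + \frac{1 - W_i}{1 - e_i} \\
% Quintile-adjusted model (model y = pquintile treat), where
% q(i) in {1,...,5} is the propensity-score quintile of observation i
Y_i &= \alpha_{q(i)} + \tau W_i + \varepsilon_i
\end{align*}
```

In the last line, the covariate term $X_i'\beta$ of the full model is replaced by a separate intercept for each propensity quintile, which is the simple quintile stratification described in the Results and Discussion.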
Appendix:

[SAS GLM output listing: not recoverable from the source.]
Table 2: The results from the 1 lump model (distribution 1) with the full model (including covariates and treatment effect).
/* 08-02-07 Dee Wu, David Thompson */
/* University of Oklahoma Health Sciences Center */
%macro r;
%do i=2 %to 2;

/*create working SAS data set from a lump tab-delimited text file*/
%let indata="f:\wu\latest\propensity&i.lump.txt";
data in;
  infile &indata firstobs=2 dlm='09'x;
  input num propensity x1 x2 x3 x4 a1 a2 a3 a4 b c1 c2 c3 c4 d1 sort1 sort2 sort3 Treat y T2 y2 Y3;
run;

goptions reset=all;
title "Distribution of Y versus confounder x1 for dataset &i";
proc gplot data=in;
  symbol v=plus i=rl;
  plot y*x1=treat;
run;

goptions reset=all;
title "Distribution of Y versus confounder x2 for dataset &i";
proc gplot data=in;
  symbol v=plus i=rl;
  plot y*x2=treat;
run;

goptions reset=all;
proc corr data=in;
  var y x1 x2 x3 x4 propensity;
run;

proc univariate data=in;
  title "Distribution of Y for dataset &i";
  var y;
  histogram y;
run;

/*This step calculates the means and SDs of the outcome Y for the two
observed treatment groups. While our initial attempts at discerning true
group membership in a mixture model used this information, it is less
important here.*/
proc means data=in mean std;
  class treat;
  var y;
  ods output summary=treatmeans;
run;

/*Logistic regression obtains PROPENSITY SCORES in the form of predicted
probabilities that an observation is assigned to treatment group 1,
conditional on its observed covariates. The regression function considers
just the covariates (including potential confounders), not the outcome Y.*/
proc logistic data=in;
  model treat (event='1')= x1 x2 x3 x4;
  output out=predprob p=phat;
run;

/*Observations can be grouped on the basis of the predicted probabilities
that are output from the previous step. One approach simply dichotomizes
the grouping. Because phat=p(treat=1), predgroup is 1 if phat is high
(.5 or higher), and 2 if phat is less than .5*/
data predict1;
  set predprob;
  predgroup=2-(phat ge .5);
run;

/*A more flexible approach groups observations on the basis of quantiles.*/
proc rank data=predprob groups=5 out=rank;
  ranks pquintile;
  var phat;
run;

proc freq data=rank;
  tables pquintile*treat;
run;

/*Sorting puts treat=1 first, so that ORDER=DATA below makes treat=0 the
reference level.*/
proc sort data=rank;
  by descending treat;
run;

/*Checking association among covariates and quantile assignment.
The PROC GLM statements for the models on the rank data set were missing
from the listing and are restored here.*/
proc glm data=rank order=data;
  title 'model x1 x2 x3 x4 = pquintile treat pquintile*treat';
  class pquintile treat;
  model x1 x2 x3 x4 = pquintile treat pquintile*treat;
run;

/*THIS INTERACTION MODEL PRODUCES DATA S1 WITH BETA ESTIMATES.
LOWCASE makes the filter independent of the case of the variable name.*/
proc glm data=rank order=data;
  title 'model y = pquintile treat pquintile*treat';
  class pquintile treat;
  model y = pquintile treat pquintile*treat / solution;
  ods output parameterestimates=s1 (where=(lowcase(substr(parameter,1,5))='treat'));
run;

/*THIS NO INTERACTION MODEL THAT ADJUSTS FOR THE PROPENSITY SCORE'S
QUINTILE PRODUCES DATA S2 WITH BETA ESTIMATES*/
proc glm data=rank order=data;
  title 'model y = pquintile treat';
  class pquintile treat;
  model y = pquintile treat / solution;
  ods output parameterestimates=s2 (where=(lowcase(substr(parameter,1,5))='treat'));
run;

/*THIS NAIVE MODEL MAKES NO ADJUSTMENT AND RECORDS BETA ESTIMATES IN DATA S3*/
proc glm data=rank order=data;
  title 'model y=treat';
  class treat;
  model y = treat / solution;
  ods output parameterestimates=s3 (where=(lowcase(substr(parameter,1,5))='treat'));
run;

/*WEIGHTING THE NAIVE MODEL BY THE PROPENSITY SCORE*/
proc glm data=rank order=data;
  weight phat;
  class treat;
  model y=treat / solution;
  ods output parameterestimates=s4 (where=(lowcase(substr(parameter,1,5))='treat'));
run;

/*TREATING THE PROPENSITY SCORE AS A SURROGATE FOR INFORMATION ON THE
COVARIATES*/
proc glm data=rank order=data;
  class treat;
  model y=treat phat / solution;
  ods output parameterestimates=s5 (where=(lowcase(substr(parameter,1,5))='treat'));
run;

/*A final method uses IPTW estimators by constructing inverse weights from
the original predicted probabilities, then using these in another
regression.*/
data weight;
  set predprob;
  if treat=1 then w=1/phat;
  if treat=0 then w=1/(1-phat);
run;

/*Sort as for the rank data set, so that ORDER=DATA again makes treat=0
the reference level.*/
proc sort data=weight;
  by descending treat;
run;

/*and run the regression model with IPTW taking the place of covariates
that might confound the exposure of interest.*/
proc glm data=weight order=data;
  class treat;
  model y = treat w / solution;
  ods output parameterestimates=s6 (where=(lowcase(substr(parameter,1,5))='treat'));
run;

/*INSTEAD OF USING THE IPTW ESTIMATOR AS A SURROGATE FOR THE COVARIATES,
USE IT AS A WEIGHT.*/
proc glm data=weight order=data;
  class treat;
  weight w;
  model y = treat / solution;
  ods output parameterestimates=s7 (where=(lowcase(substr(parameter,1,5))='treat'));
run;

/*THIS STEP COLLECTS THE BETA ESTIMATES IN A SINGLE DATASET and applies a
format to identify the model that produced the estimate.*/
proc format;
  value methf
    1='model y = pquintile treat pquintile*treat'
    2='model y = pquintile treat'
    3='unadjusted model y=treat'
    4='unadjusted model weighted by propensity score'
    5='model y=treat phat'
    6='IPTW estimator as surrogate for model y=treat w'
    7='IPTW estimator as weight in model y=treat'
  ;
run;

data result;
  set s1 s2 s3 s4 s5 s6 s7;
  if estimate ne 0;   /* drop the zero estimate of the reference level */
  method+1;           /* one remaining row per model, numbered 1 to 7 */
  format method methf.;
  drop dependent biased;
run;

%end;
%mend r;
%r;
quit;
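The macro above reads each simulated data set from a tab-delimited text file generated outside SAS. As the Methods section notes, the simulation can also be run in SAS. The DATA step below is a minimal sketch of steps 2 to 5 for the single-hump case (case 1), in which x1 is standard normal; it is not the authors' Matlab generator. The data set name sim, the seed, the treated proportion p_treat, the cut-off variable cut, and the use of the RAND function in place of NORMAL are assumptions made for illustration, and the threshold rule is only one reading of the inverse probit step described in the text.

```sas
/* Sketch only: simulate one data set for the single-hump case (case 1).  */
/* Assumed names: sim, p_treat, sd_p, cut. Coefficients are those given   */
/* in the Methods section.                                                */
data sim;
  call streaminit(20070802);        /* arbitrary seed, for reproducibility */
  p_treat = 0.5;                    /* assumed proportion treated          */

  /* SD of the propensity score when x1 to x4 are independent N(0,1) */
  sd_p = sqrt(0.79**2 + 0.15**2 + 0.05**2 + 0.01**2 + 0.3**2);

  /* Inverse probit gives the cut-off that leaves p_treat above it */
  cut = sd_p * probit(1 - p_treat);

  do num = 1 to 500;
    /* Steps 1 and 2: x1 and three covariates uncorrelated with it */
    x1 = rand('normal');
    x2 = rand('normal');
    x3 = rand('normal');
    x4 = rand('normal');

    /* Step 3: propensity score, weights sum to 1, plus a small error */
    propensity = 0.79*x1 + 0.15*x2 + 0.05*x3 + 0.01*x4 + 0.3*rand('normal');

    /* Step 4: assign treatment when the score exceeds the cut-off */
    treat = (propensity > cut);

    /* Step 5: outcome with main effects, treatment, and interactions */
    y = 0.28*x1 + 0.13*x2 + 0.23*x3 + 0.352*x4
      + 0.4*treat
      + 0.2*treat*x1 + 0.2*treat*x2 + 0.2*treat*x3 + 0.02*treat*x4
      + 0.4*rand('normal');
    output;
  end;
  keep num propensity x1-x4 treat y;
run;

/* Check the realized treatment/control ratio */
proc freq data=sim;
  tables treat;
run;
```

To run the appendix analysis on such a data set, the DATA step that reads the text file could be replaced by `data in; set sim; run;`. The multi-hump cases would need a mixture generator for x1 in place of the single `rand('normal')` call.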