Georgetown University
From the SelectedWorks of Mark J Meyer
2008
Using the estimated penetrances to determine the range of the underlying genetic model in case-control design
Mark J Meyer, Neal Jeffries, Gang Zheng
Available at: https://works.bepress.com/markjmeyer/4/
Using the estimated penetrances to determine the range of the underlying genetic model in case-control design

Mark Meyer, Neal Jeffries, Gang Zheng*

Office of Biostatistics Research, National Heart, Lung and Blood Institute, 6701 Rockledge Drive, MSC 7913, Bethesda, MD 20892-7913, USA

*Corresponding author

Email addresses:
MM: meyerm@nhlbi.nih.gov
NJ: nealjeff@nhlbi.nih.gov
GZ: zhengg@nhlbi.nih.gov
Abstract

It is well known that the penetrance cannot be estimated using retrospective case-control samples without making additional assumptions. In the literature, the estimation of the penetrance is based on the assumption that either the disease is rare or the disease prevalence is known. We propose an alternative approach that estimates the penetrance by assuming an underlying genetic model, even though it is unknown. With this assumption, we can obtain point estimates of the penetrances as functions of the genetic model, from which the range of underlying genetic models can be determined. We examine the performance of our results under various genetic models using simulation studies and the case-control dataset of GAW16.

Background

Penetrance is a useful parameter in genetic association studies. The penetrance is defined as the probability of having a disease given one of the three genotypes for a diallelic marker. Three penetrances are used in the literature, each corresponding to one genotype. Various underlying genetic models are also defined using the penetrances.

In case-control association studies, cases and controls are retrospectively sampled from the study populations. It is known that the penetrances cannot be estimated using retrospective case-control samples unless some additional assumptions are made. In the literature, one often assumes that the disease is rare or that the disease prevalence is known. Under these assumptions, the penetrances can be estimated using the retrospective case-control samples.

We propose an alternative approach by assuming an underlying genetic model for a SNP associated with the disease, even though the true genetic model is
unknown. Under this assumption we can write the penetrances as functions of the odds ratios and the underlying genetic model. Thus, the penetrances can be estimated without assuming a rare disease or a known disease prevalence. Because the penetrances are functions of the specified underlying genetic model, we can use the estimated penetrances to determine the range of underlying genetic models. We examine the performance of our results under various possible genetic models using simulation studies and case-control data from GAW16.

Methods

Notation

Denote the alleles of the diallelic marker (SNP) of interest as A and B, and the three genotypes as $G_0 = AA$, $G_1 = AB$, and $G_2 = BB$. The penetrance is then given by $f_i = \Pr(\text{disease} \mid G_i)$, $i = 0, 1, 2$. The genotype counts for $(G_0, G_1, G_2)$ in cases are denoted by $(r_0, r_1, r_2)$ and in controls by $(s_0, s_1, s_2)$. The odds ratio $OR_i$ is the odds of disease for $G_i$ relative to $G_0$, which can be written as

$$OR_i = \frac{f_i (1 - f_0)}{f_0 (1 - f_i)}, \quad i = 1, 2.$$

Assume that the genetic model is specified by $f_1 = (1 - x) f_0 + x f_2$, $x \in (0, 1)$. Then the following equations can be obtained:

$$f_0 = \frac{1 - x + x\,OR_2 - OR_1}{(1 - x)(OR_1 - 1)(OR_2 - 1)} \quad \text{and} \quad f_i = \frac{OR_i f_0}{1 - f_0 + f_0\,OR_i}, \quad i = 1, 2.$$

The second expression is the definition of $OR_i$ solved for $f_i$; substituting it into the model equation and discarding the degenerate roots $f_0 = 0$ and $f_0 = 1$ gives the first.

Estimating penetrances under the specified genetic model

From the previous section, the penetrances are written as functions of the two
odds ratios. Thus, denote $f_i(x) = h_i(OR_1, OR_2 \mid x)$ for $i = 0, 1, 2$. Then the point estimate of a penetrance is given by $\hat{f}_i(x) = h_i(\widehat{OR}_1, \widehat{OR}_2 \mid x)$, where $\widehat{OR}_1 = r_1 s_0 / (r_0 s_1)$ and $\widehat{OR}_2 = r_2 s_0 / (r_0 s_2)$ are based on the observed case-control data for that SNP.

Using the estimates to determine the underlying genetic model

Note that the estimates of the penetrances, $\hat{f}_i(x)$, as functions of $x$, should be between 0 and 1. However, $\hat{f}_i(x)$ is not necessarily between 0 and 1 for every $x$ in $(0, 1)$. Therefore, we can determine the range of $x$ in $(0, 1)$ by restricting $\hat{f}_i(x)$ to be between 0 and 1. The reason this approach can work is that, when one of the alleles has higher risk, i.e., $f_0 < f_1 < f_2$ and $OR_2 > OR_1 > 1$, we have

$$\frac{\partial f_0}{\partial x} = \frac{OR_2 - OR_1}{(1 - x)^2 (OR_1 - 1)(OR_2 - 1)} > 0$$

and

$$\frac{\partial f_i}{\partial x} = \frac{\partial f_i}{\partial f_0} \frac{\partial f_0}{\partial x} = \frac{OR_i}{(1 - f_0 + f_0\,OR_i)^2} \frac{\partial f_0}{\partial x} > 0.$$

Thus, a single interval, $x \in (a, b)$, can be obtained within which the three penetrances are between 0 and 1. We then expect the interval $(a, b)$ to cover the true underlying genetic model $x$.

Numerical Results

Simulation studies

We conduct simulation studies to examine the performance of the above results under the null and alternative hypotheses. In the simulations, we specify the minor allele frequency (MAF) $p$ = ., .3, and .5, the disease prevalence $k$ = ., and genotype relative risks (GRRs) $\lambda_1 = f_1 / f_0$ and $\lambda_2 = f_2 / f_0$ satisfying
$\lambda_1 = 1 - x + x\lambda_2$ for $x$ = .5, .5, and .75. We assume 5 cases and 5 controls are used and $\lambda_2$ = .. For each replicate, we determine whether or not the interval $(a, b)$ covers the true value of $x$; the coverage of the interval is defined as the percentage of replicates in which the interval covers the true value of $x$. The results are reported in Table 1 based on , replicates.

The results in Table 1 show that under the null hypothesis ($\lambda_2 = 1$), the coverage is about 10% regardless of the MAF and the true value of $x$, while under the alternative hypothesis ($\lambda_2$ = ), the coverage increases to above 50% when the MAF is small ($p$ = .) and the true value of $x$ is small or large, and to above 95% when the true value of $x$ is around the additive model ($x = 0.5$) for the moderate MAF. The simulation results indicate that the proposed approach provides some insight into the possible genetic model underlying the data.

Applications

The above method is also applied to the six associated SNPs reported in Problem 1 of GAW16, so that the ranges of possible genetic models for these SNPs can be obtained. The six SNPs are given in Table 2 along with the genes they belong to and their chromosome numbers. These SNPs were reported in Plenge et al. (2005) as linked to rheumatoid arthritis (RA) or selected from the candidate-gene studies. Our approach cannot be applied to the last SNP in Table 2 because its $x$ may be outside the range $(0, 1)$. This happens when the true genetic model does not lie between the recessive and dominant models. One such example is the overdominant model, whose $x$ is either less than 0 or greater than 1. The intervals for the underlying genetic models for the remaining SNPs are reported in Table 2 (last column). The plots of the three estimates of the penetrances are given
in Figure 1 for the first four SNPs.

Conclusion and Discussion

We studied how to infer the possible underlying genetic model from penetrances estimated using case-control data. No other approaches seem to be available that provide a range of possible genetic models. The usefulness of our results was demonstrated by the simulation studies and illustrated by applying the results to the GAW16 dataset. We would like to continue this research to study how the identified range of genetic models can be used to improve power for initial genome-wide association studies, compared with using the MAX of Freidlin et al. (2002), Zheng et al. (2007), and Li et al. (2008).

References

Freidlin et al. (2002). Trend tests for case-control studies of genetic markers: Power, sample size and robustness. Hum. Hered. 53, 146-152.
Li et al. (2008). MAX-rank: a simple and robust genome-wide scan for case-control association studies. Hum. Genet. 123, 617-623.
Plenge et al. (2008). TRAF1-C5 as a risk locus for rheumatoid arthritis - a genomewide study. NEJM 357, 1199-1209.
Zheng et al. (2007). Robust ranks of true associations in genome-wide case-control association studies. BMC Proc. 1 (Suppl 1), S165.
Table 1: Simulation of the coverage of the true underlying genetic model, based on , replicates, by restricting the estimated penetrances to lie between 0 and 1 under the null ($\lambda_2 = 1$) and alternative ($\lambda_2$ = ) hypotheses.

                 Coverage
MAF   True x     Null ($\lambda_2 = 1$)   Alternative ($\lambda_2$ = )
.     .5         .%                       54.8%
      .5         .8%                      77.6%
      .5         .6%                      73.%
      .75        .4%                      6.7%
      .95        .3%                      53.%
.3    .5         .6%                      66.%
      .5         .%                       95.8%
      .5         .%                       93.9%
      .75        9.6%                     76.%
      .95        9.9%                     55.%
.5    .5         9.5%                     63.4%
      .5         9.3%                     9.6%
      .5         9.7%                     98.3%
      .75        .%                       83.9%
      .95        9.6%                     56.4%
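The coverage computation summarized in the table above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes Hardy-Weinberg genotype frequencies with B as the risk allele at frequency p (neither is stated in the text), and all function names are hypothetical. Since (a, b) is defined as the set of x at which all three estimated penetrances lie in (0, 1), the sketch checks coverage by evaluating the estimates at the true x rather than computing the endpoints.

```python
import numpy as np


def case_control_probs(p, k, lam2, x):
    """Genotype distributions (G_0, G_1, G_2) in cases and in controls.

    Assumes Hardy-Weinberg proportions with risk allele B at frequency p
    (an assumption of this sketch, not stated in the text).
    """
    g = np.array([(1 - p) ** 2, 2 * p * (1 - p), p ** 2])
    lam1 = 1 - x + x * lam2              # model: lambda_1 = 1 - x + x * lambda_2
    rr = np.array([1.0, lam1, lam2])     # relative risks f_i / f_0
    f = k / np.dot(g, rr) * rr           # penetrances giving prevalence k
    return g * f / k, g * (1 - f) / (1 - k)


def covers(cases, controls, x):
    """True if all three estimated penetrances at x lie strictly in (0, 1)."""
    r0, r1, r2 = np.asarray(cases, dtype=float)
    s0, s1, s2 = np.asarray(controls, dtype=float)
    with np.errstate(all="ignore"):      # zero counts give inf/nan, i.e. no cover
        or1 = r1 * s0 / (r0 * s1)
        or2 = r2 * s0 / (r0 * s2)
        f0 = (1 - x + x * or2 - or1) / ((1 - x) * (or1 - 1) * (or2 - 1))
        f1 = or1 * f0 / (1 - f0 + f0 * or1)
        f2 = or2 * f0 / (1 - f0 + f0 * or2)
    f = np.array([f0, f1, f2])
    return bool(np.all((f > 0) & (f < 1)))


def coverage(p, k, lam2, x, n_cases, n_controls, n_rep, seed=0):
    """Fraction of simulated replicates whose admissible x-range contains x."""
    rng = np.random.default_rng(seed)
    p_case, p_ctrl = case_control_probs(p, k, lam2, x)
    cases = rng.multinomial(n_cases, p_case, size=n_rep)
    controls = rng.multinomial(n_controls, p_ctrl, size=n_rep)
    hits = sum(covers(r, s, x) for r, s in zip(cases, controls))
    return hits / n_rep
```

For example, `coverage(p=0.3, k=0.1, lam2=2.0, x=0.5, n_cases=500, n_controls=500, n_rep=1000)` estimates the coverage for one illustrative setting; these values are placeholders and are not claimed to be the settings used for the table.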
Table 2: The six SNPs selected from Problem 1 of GAW16.

SNP ID      Gene       Chromosome   Model interval
rs6457617   MHC        6            (.3, .94)
rs2476601   PTPN22     1            (.57, .78)
rs7574865   STAT4      2            (.8, .)
rs1061622   TNFRSF1b   1            (.57, .6)
rs2073838   SLC22A4    5            (.46, .53)
rs1248696   DLG5       10           NA

Figure 1: Plots of the estimates of the three penetrances over all possible genetic models with $x$ in $(0, 1)$.
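The penetrance curves and model intervals reported above can be computed in outline as follows. This is a sketch under stated assumptions, not the authors' implementation: the function names and the grid search are hypothetical choices, the genotype counts must all be positive, and the interval endpoints are only accurate to the grid spacing.

```python
import numpy as np


def odds_ratios(cases, controls):
    """Estimate OR_1 and OR_2 (relative to G_0) from genotype counts."""
    r0, r1, r2 = cases
    s0, s1, s2 = controls
    return (r1 * s0) / (r0 * s1), (r2 * s0) / (r0 * s2)


def penetrances(or1, or2, x):
    """Penetrances (f_0, f_1, f_2) implied by the odds ratios and model x."""
    x = np.asarray(x, dtype=float)
    f0 = (1 - x + x * or2 - or1) / ((1 - x) * (or1 - 1) * (or2 - 1))
    f1 = or1 * f0 / (1 - f0 + f0 * or1)
    f2 = or2 * f0 / (1 - f0 + f0 * or2)
    return f0, f1, f2


def model_interval(or1, or2, n_grid=9999):
    """Approximate (a, b): the x in (0, 1) where all penetrances lie in (0, 1).

    Returns None when no grid point is admissible.
    """
    x = np.linspace(0.0, 1.0, n_grid + 2)[1:-1]    # interior points of (0, 1)
    with np.errstate(all="ignore"):                # OR = 1 gives inf/nan
        f = np.vstack(penetrances(or1, or2, x))
    ok = np.all((f > 0) & (f < 1), axis=0)
    if not ok.any():
        return None
    return float(x[ok].min()), float(x[ok].max())
```

For instance, with made-up counts `cases = (100, 220, 180)` and `controls = (200, 200, 100)`, `model_interval(*odds_ratios(cases, controls))` returns the approximate range of x, and `penetrances` evaluated on a grid of x values gives the three curves to plot.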