Tian Zheng Department of Statistics Columbia University

Haplotype Transmission Association (HTA): An "Importance" Measure for Selecting Genetic Markers

This is joint work with Professor Shaw-Hwa Lo in the Department of Statistics at Columbia University.

Outline

The marker selection problem in genetic mapping for complex traits
HTA statistics
Marker screening based on HTA
Example
Summary

Genetic Mapping

Single-locus diseases: the risk of the disease is determined by differences at one gene. Conventional approach: marker-wise tests.

Common diseases: complex traits of common diseases are usually caused by multiple genes with possible interactions among them. Marker-wise tests cannot capture presumed interactions among disease genes, so the analysis should take into account the possible interactions among markers.

Problem

Fine mapping: a large number of genetic markers are available.
Complex traits: individual markers and their possible interactions must be analyzed jointly.
The number of markers and possible interactions greatly exceeds the number of patients in the study.

Solution: pre-select a small set of the most important markers, so that a detailed analysis involving interactions can be carried out using data of moderate size.
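To give a rough sense of scale, the sketch below counts the main effects and interaction terms up to a given order for a set of markers. The function name `interaction_terms` is hypothetical, and the marker counts are illustrative only (20 matches the size of the example later in these slides).

```python
from math import comb


def interaction_terms(num_markers, max_order):
    """Count marker subsets of size 1..max_order (main effects plus interactions)."""
    return sum(comb(num_markers, k) for k in range(1, max_order + 1))


print(interaction_terms(20, 2))   # 20 main effects + 190 pairs = 210
print(interaction_terms(20, 3))   # 210 + 1140 triples = 1350
print(interaction_terms(100, 2))  # 100 main effects + 4950 pairs = 5050
```

Even with pairwise terms only, the count grows quadratically in the number of markers, which is why a pre-selection step is needed before any detailed interaction analysis.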

Marker selection for complex traits

Start with a large set of candidate genetic markers.
Screen out the markers that carry little information about the disease traits.
Take into account possible interactions among the disease genes.
Be efficient in time and memory.

Data: genetic information on a random sample of $n$ patients and their parents, giving $2n$ parent-patient transmission pairs. Each pair consists of two haplotypes, one transmitted and the other untransmitted.

For the $l$-th pair, let $h_t^{(l)}$ be the haplotype transmitted to the diseased child, and $h_u^{(l)}$ be the untransmitted one. For each haplotype $h$, define the counts

$$n_h^t = \#\{l : h_t^{(l)} = h\}, \qquad n_h^u = \#\{l : h_u^{(l)} = h\}.$$
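A minimal sketch of the two counts, assuming each transmission pair is stored as a `(transmitted, untransmitted)` tuple and each haplotype as a tuple of allele labels. The data layout, the function name `transmission_counts`, and the four toy pairs are illustrative assumptions, not part of the original study.

```python
from collections import Counter


def transmission_counts(pairs):
    """Return (n_t, n_u): haplotype -> number of times transmitted / untransmitted."""
    n_t = Counter(t for t, _ in pairs)
    n_u = Counter(u for _, u in pairs)
    return n_t, n_u


# Toy data: four transmission pairs over two markers (alleles A/a and B/b).
pairs = [
    (("A", "B"), ("a", "b")),
    (("A", "B"), ("A", "b")),
    (("a", "b"), ("A", "B")),
    (("A", "B"), ("a", "b")),
]

n_t, n_u = transmission_counts(pairs)
print(n_t[("A", "B")], n_u[("A", "B")])  # 3 1
print(n_t[("a", "b")], n_u[("a", "b")])  # 1 2
```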

Haplotype transmission disequilibrium (HTD) is defined to measure the amount of linkage/LD information contained in the set of markers being tested:

$$\mathrm{HTD} = \sum_{h} (n_h^t - n_h^u)^2,$$

whose expectation under the null hypothesis is equal to the trace of the Fisher's information matrix parameterized by haplotype relative risks.
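As a sketch, HTD can be computed directly from the transmitted and untransmitted counts. The function name `htd` and the `(transmitted, untransmitted)` tuple layout are assumptions made for illustration, and the data are a toy example.

```python
from collections import Counter


def htd(pairs):
    """HTD = sum over haplotypes h of (n_t[h] - n_u[h]) ** 2."""
    n_t = Counter(t for t, _ in pairs)
    n_u = Counter(u for _, u in pairs)
    haplotypes = set(n_t) | set(n_u)
    return sum((n_t[h] - n_u[h]) ** 2 for h in haplotypes)


# Toy data: four transmission pairs over two markers (alleles A/a and B/b).
pairs = [
    (("A", "B"), ("a", "b")),
    (("A", "B"), ("A", "b")),
    (("a", "b"), ("A", "B")),
    (("A", "B"), ("a", "b")),
]

# Differences n_t - n_u: AB = 2, ab = -1, Ab = -1, so HTD = 4 + 1 + 1.
print(htd(pairs))  # 6
```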

Assume $m$ markers $S_M = \{M_1, M_2, \ldots, M_m\}$ are being tested. To evaluate the information contributed by the $i$-th marker $M_i$, which has alleles $a$ and $b$, consider $S_M^{-i} = S_M \setminus \{M_i\}$ (the $i$-th-deleted marker set). Let $H = \{h_1, h_2, \ldots, h_{|H|}\}$ be the set of haplotypes spanned by $S_M^{-i}$; the counts $n_h^t$ and $n_h^u$ can be defined as before.

Denote by $n_h^t(a)$ and $n_h^t(b)$ the numbers of transmissions of the enlarged haplotypes $h_a$ and $h_b$ (haplotype $h$ extended with allele $a$ or $b$ at $M_i$), respectively. Similarly, two counts $n_h^u(a)$ and $n_h^u(b)$ are defined for the non-transmissions of the enlarged haplotypes. It is easy to observe that

$$n_h^t = n_h^t(a) + n_h^t(b), \qquad n_h^u = n_h^u(a) + n_h^u(b).$$

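HTD for the $m$ markers, $S_M = \{M_1, M_2, \ldots, M_m\}$:

$$\mathrm{HTD}(m) = \sum_{h \in H} \left[ (n_h^t(a) - n_h^u(a))^2 + (n_h^t(b) - n_h^u(b))^2 \right]$$

HTD for the $m-1$ markers in the $i$-th-deleted marker set $S_M^{-i} = S_M \setminus \{M_i\}$:

$$\begin{aligned}
\mathrm{HTD}_{-i}(m-1) &= \sum_{h \in H} (n_h^t - n_h^u)^2 \\
&= \sum_{h \in H} \left( n_h^t(a) + n_h^t(b) - n_h^u(a) - n_h^u(b) \right)^2 \\
&= \sum_{h \in H} \left[ (n_h^t(a) - n_h^u(a))^2 + (n_h^t(b) - n_h^u(b))^2 \right] + 2 \sum_{h \in H} (n_h^t(a) - n_h^u(a))(n_h^t(b) - n_h^u(b)) \\
&= \mathrm{HTD}(m) + 2 \sum_{h \in H} (n_h^t(a) - n_h^u(a))(n_h^t(b) - n_h^u(b))
\end{aligned}$$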
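The decomposition can be checked numerically on a toy data set. The sketch below assumes haplotypes are tuples of allele labels and that marker positions are 0-based; the helper names `htd`, `drop_marker`, and `cross_term` are hypothetical.

```python
from collections import Counter


def htd(pairs):
    """HTD = sum over haplotypes h of (n_t[h] - n_u[h]) ** 2."""
    diff = Counter(t for t, _ in pairs)
    diff.subtract(u for _, u in pairs)
    return sum(d * d for d in diff.values())


def drop_marker(pairs, i):
    """Remove marker position i from every haplotype (the i-th-deleted set)."""
    return [(t[:i] + t[i + 1:], u[:i] + u[i + 1:]) for t, u in pairs]


def cross_term(pairs, i, a, b):
    """Sum over reduced haplotypes h of (n_t(a) - n_u(a)) * (n_t(b) - n_u(b))."""
    diff = Counter()  # keyed by (reduced haplotype, allele at marker i)
    for t, u in pairs:
        diff[(t[:i] + t[i + 1:], t[i])] += 1
        diff[(u[:i] + u[i + 1:], u[i])] -= 1
    reduced = {h for h, _ in diff}
    return sum(diff[(h, a)] * diff[(h, b)] for h in reduced)


pairs = [
    (("A", "B"), ("a", "b")),
    (("A", "B"), ("A", "b")),
    (("a", "b"), ("A", "B")),
    (("A", "B"), ("a", "b")),
]

full = htd(pairs)                       # HTD(m) with both markers
reduced = htd(drop_marker(pairs, 1))    # HTD after deleting the B/b marker
cross = cross_term(pairs, 1, "B", "b")
print(full, reduced, cross)             # 6 2 -2
print(reduced == full + 2 * cross)      # True
```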

Thus, the amount of information brought by marker $M_i$ can be evaluated using the HTD difference, the information drop:

$$\Delta\mathrm{HTD}_{-i}(m-1) = \mathrm{HTD}_{-i}(m-1) - \mathrm{HTD}(m) = 2 \sum_{h \in H} (n_h^t(a) - n_h^u(a))(n_h^t(b) - n_h^u(b)).$$

The Haplotype Transmission Association (HTA) statistic is

$$\mathrm{HTA}_i(m) = \sum_{h \in H} (n_h^t(a) - n_h^u(a))(n_h^t(b) - n_h^u(b)) + \sum_{h \in H} n(h_a, h_b).$$
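A minimal sketch of the HTA computation for one marker. The slide does not define the count $n(h_a, h_b)$; the code assumes it is the number of transmission pairs whose two haplotypes are exactly $h_a$ and $h_b$ (in either order), an interpretation chosen because it makes the statistic have zero expectation when transmission is random. The function name `hta`, the 0-based marker index, and the toy data are likewise assumptions.

```python
from collections import Counter


def hta(pairs, i, a, b):
    """HTA for the marker at position i with alleles a and b.

    cross: sum over reduced haplotypes h of (n_t(a) - n_u(a)) * (n_t(b) - n_u(b))
    n_ab:  assumed meaning of n(h_a, h_b): pairs made of h_a and h_b themselves
    """
    diff = Counter()  # keyed by (reduced haplotype, allele at marker i)
    n_ab = 0
    for t, u in pairs:
        h_t, h_u = t[:i] + t[i + 1:], u[:i] + u[i + 1:]
        diff[(h_t, t[i])] += 1
        diff[(h_u, u[i])] -= 1
        if h_t == h_u and {t[i], u[i]} == {a, b}:
            n_ab += 1
    reduced = {h for h, _ in diff}
    cross = sum(diff[(h, a)] * diff[(h, b)] for h in reduced)
    return cross + n_ab


pairs = [
    (("A", "B"), ("a", "b")),
    (("A", "B"), ("A", "b")),
    (("a", "b"), ("A", "B")),
    (("A", "B"), ("a", "b")),
]

print(hta(pairs, 0, "A", "a"))  # 1
print(hta(pairs, 1, "B", "b"))  # -1
```

On this toy data the B/b marker has the lower (negative) value, so under the sign convention of these slides it would be ranked as the more informative of the two.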

Properties of the HTA statistic, by the sign of the expectation of $\mathrm{HTA}_i(m)$:

Negative (most important): $M_i$ contributes important linkage information to the current marker set.
Zero: $M_i$ contains no linkage information, and no marker in the data set is associated with any disease susceptibility locus.
Positive (least important): $M_i$ contributes little linkage information but adds noise to the data, and dilutes the true linkage/association information carried by other markers.

Marker selection algorithm based on the HTA statistic

Data: $S_M = \{M_1, M_2, \ldots, M_K\}$, where $K$ is the total number of markers; $m$ is the number of markers retained in $S_M$.

1. For each $i = 1, 2, \ldots, m$, calculate $\mathrm{HTA}_i(m)$ for $M_i$.
2. Is any HTA non-negative?
   Yes: delete the marker with the highest $\mathrm{HTA}_i(m)$ from $S_M$ and continue the loop from step 1.
   No: return $S_M$ as the screening result.
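The loop above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: haplotypes are tuples of allele labels, every marker is biallelic with its two alleles listed in `alleles`, the count $n(h_a, h_b)$ is taken to be the number of transmission pairs made of $h_a$ and $h_b$ themselves, ties are broken by position, and the loop also stops if no markers remain. The names `hta` and `backward_screen` are hypothetical.

```python
from collections import Counter


def hta(pairs, i, a, b):
    """HTA for the marker at position i (alleles a, b); see assumptions above."""
    diff = Counter()  # keyed by (reduced haplotype, allele at marker i)
    n_ab = 0
    for t, u in pairs:
        h_t, h_u = t[:i] + t[i + 1:], u[:i] + u[i + 1:]
        diff[(h_t, t[i])] += 1
        diff[(h_u, u[i])] -= 1
        if h_t == h_u and {t[i], u[i]} == {a, b}:
            n_ab += 1
    reduced = {h for h, _ in diff}
    return sum(diff[(h, a)] * diff[(h, b)] for h in reduced) + n_ab


def backward_screen(pairs, alleles):
    """Repeatedly delete the marker with the highest non-negative HTA."""
    retained = list(range(len(alleles)))  # original marker indices
    while retained:
        # Restrict every haplotype to the markers still retained.
        sub = [(tuple(t[j] for j in retained), tuple(u[j] for j in retained))
               for t, u in pairs]
        scores = [hta(sub, pos, *alleles[j]) for pos, j in enumerate(retained)]
        worst = max(range(len(retained)), key=scores.__getitem__)
        if scores[worst] < 0:  # no non-negative HTA left: stop screening
            break
        del retained[worst]
    return retained


# Toy data: B is always transmitted over b; the A/a marker carries no signal.
pairs = [
    (("A", "B"), ("A", "b")),
    (("a", "B"), ("a", "b")),
    (("A", "B"), ("a", "b")),
    (("a", "B"), ("A", "b")),
    (("A", "B"), ("A", "b")),
    (("a", "B"), ("a", "b")),
]
alleles = [("A", "a"), ("B", "b")]

print(backward_screen(pairs, alleles))  # [1]
```

In the first pass the A/a marker scores 18 and the B/b marker scores -14, so A/a is deleted; in the second pass the remaining marker scores -30 and screening stops.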

Example: a complex disease with three susceptibility loci A, B, and E.

Haplotype   Haplotypic relative risk (HRR)
ABE         high
ABe         low
AbE         low
Abe         high
aBE         low
aBe         high
abE         high
abe         low

20 markers are generated, with 3 of them in strong linkage/disequilibrium with the disease loci, respectively.

Average HTA values for linked and unlinked markers, by loop

Loop   Linked   Unlinked
1      0        0
2      0        -1
3      0        -2
4      1        -1
5      0        -3
6      -2       0
7      0        -6
8      2        -6
9      14       -15
10     36       -41
11     93       -71
12     174      -114
13     390      -343
14     805      -716
15     1626     -1568
16     2864     -3317
17     5538     -6563
18     NA       -12549
19     NA       -968
20     NA       -1296

[Figure: HTD values during the screening. +: linked marker; o: unlinked marker. The point at which screening stopped is marked.]

Summary

HTA statistic and BHTA algorithm
HTA measures the importance of a marker in terms of the amount of information contributed by it.
The screening algorithm based on HTA is fast and haplotype-based.
It is able to handle complicated interactions among disease loci.

Reference paper: Lo, Shaw-Hwa, and Zheng, Tian. Backward Haplotype Transmission Association (BHTA) Algorithm: A Fast Multiple-Marker Screening Method. Hum Hered. To appear.