The TDT (Transmission Disequilibrium Test)
(Qualitative and quantitative traits)

Our aim is to test for linkage (and maybe also association) between a disease locus D and a marker locus M.

We know where the M locus is (i.e. on what chromosome, and where on that chromosome). Thus if the disease locus is linked to the marker locus, it is also on that chromosome.

We assume alleles $M_1$ and $M_2$ at the marker locus, and $D_1$ and $D_2$ at the disease locus. ($D_1$ is the disease, i.e. bad, allele.)

We have association (aka linkage disequilibrium) if

$$\mathrm{freq}(D_1 M_1) \neq \mathrm{freq}(D_1)\,\mathrm{freq}(M_1).$$

Define

$$\delta = \mathrm{freq}(D_1 M_1) - \mathrm{freq}(D_1)\,\mathrm{freq}(M_1).$$

$\delta > 0$ implies

freq($M_1$ among those with the disease) > freq($M_1$ among those free of the disease).

Why might we have a case where $\delta$ differs from 0? Two reasons.

1. Linkage. (Hence the term linkage disequilibrium.)
2. ... See later.

[Figure: marker locus M close to the original mutation at the disease locus, which arose long ago (on the order of a thousand years or more). Residual association between M and D remains at the present time.]

Null hypothesis: $\theta = \tfrac{1}{2}$ (disease and marker loci unlinked)
Alternative hypothesis: $\theta < \tfrac{1}{2}$ (disease and marker loci linked)

How can we test this null hypothesis?

1. Population-based tests. Of these the oldest, and most popular, is the case-control study.
2. Family-based tests. (Why? See later.)

The classic population-based test assesses whether the frequency of one marker allele in the case sample differs significantly from its frequency in the control sample. This is done by a standard two-by-two chi-square.

Qualitative traits (affected or not affected)

CASE / CONTROL ANALYSIS

                            M1      M2      Total
AFFECTED (CASES)            n11     n12     R1
NOT AFFECTED (CONTROLS)     n21     n22     R2
Total                       C1      C2      N

Compare $n_{11}$ with $R_1 C_1 / N$, using the z-score

$$z = \frac{n_{11} - R_1 C_1 / N}{\sqrt{R_1 R_2 C_1 C_2 / N^3}},$$

whose square is the standard two-by-two chi-square.

The main problem with the case-control approach is that association can arise through population stratification, as well as through linkage. Example: fair hair/blue eyes vs dark hair/brown eyes.

So about 30 years ago, the focus moved to family-based tests.
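As an illustration of the two-by-two comparison, here is a minimal Python sketch. The function name `case_control_z` and the counts are hypothetical, and the variance uses the form $R_1 R_2 C_1 C_2 / N^3$, which is the one whose square reproduces the usual Pearson chi-square; it is a sketch under those assumptions rather than a prescribed procedure.

```python
import math

def case_control_z(n11, n12, n21, n22):
    """z-score comparing the M1 count among cases with its expected value.

    Rows are cases/controls, columns are marker alleles M1/M2.
    """
    r1, r2 = n11 + n12, n21 + n22      # row totals (cases, controls)
    c1, c2 = n11 + n21, n12 + n22      # column totals (M1, M2)
    n = r1 + r2                        # grand total
    expected = r1 * c1 / n             # expected n11 under no association
    variance = r1 * r2 * c1 * c2 / n**3
    return (n11 - expected) / math.sqrt(variance)

# Hypothetical counts, for illustration only.
z = case_control_z(60, 40, 45, 55)
chi_square = z * z                     # equals the two-by-two Pearson chi-square
```

Squaring `z` gives the same value as the textbook formula $N(n_{11}n_{22} - n_{12}n_{21})^2 / (R_1 R_2 C_1 C_2)$.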

Of these, the most popular was the affected sib pair sharing approach. (No details given here.)

Problem: sibs share much of their genome, affected or not. Thus it is hard to fine-map the disease locus. There are also problems with complex diseases.

The TDT (transmission-disequilibrium test) is a family-based test, thus avoiding problems of population stratification, that uses association, thus allowing fine mapping.

How does it work?

The basic unit (although many variations and extensions on the theme exist) is the family trio of mother, father and affected child. In effect it compares the marker locus genes passed on to the affected child with those not passed on (to a "non-person").

[Figure: mother $M_1 M_2$, father $M_3 M_4$. The affected child carries one maternal allele and $M_4$; the non-person carries the other maternal allele and $M_3$.]

Combination of transmitted and non-transmitted marker alleles $M_1$ and $M_2$ among parents of affected children:

                        Non-transmitted allele
Transmitted allele      M1      M2      Total
M1                      n11     n12     n1.
M2                      n21     n22     n2.
Total                   n.1     n.2     2n

Probabilities for transmitted and non-transmitted marker alleles $M_1$ and $M_2$ among parents of affected children:

                        Non-transmitted allele
Transmitted allele      M1      M2      Total
M1                      P(11)   P(12)   P(1.)
M2                      P(21)   P(22)   P(2.)
Total                   P(.1)   P(.2)   1
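The counting behind the first table can be sketched in Python as follows. This is an illustrative sketch only: the function name `transmission_table`, the encoding of alleles $M_1$ and $M_2$ as the integers 1 and 2, and the example trios are all assumptions made here. For a two-allele marker, any assignment of transmitted alleles consistent with the child's genotype gives the same cell counts, so the first consistent assignment is used.

```python
from collections import Counter
from itertools import product

def transmission_table(trios):
    """Count (transmitted, non-transmitted) allele pairs over all parents.

    Each trio is (mother, father, child); each genotype is a pair of
    alleles coded 1 (for M1) or 2 (for M2).
    """
    counts = Counter()
    for mother, father, child in trios:
        # Find which allele each parent passed on to the child.
        match = next(
            ((i, j) for i, j in product((0, 1), repeat=2)
             if sorted((mother[i], father[j])) == sorted(child)),
            None,
        )
        if match is None:
            raise ValueError("child genotype inconsistent with parents")
        i, j = match
        counts[(mother[i], mother[1 - i])] += 1   # mother's contribution
        counts[(father[j], father[1 - j])] += 1   # father's contribution
    return counts

# Hypothetical trios, for illustration only.
trios = [
    ((1, 2), (1, 2), (1, 1)),
    ((1, 2), (2, 2), (1, 2)),
    ((1, 2), (1, 2), (1, 2)),
]
table = transmission_table(trios)
n12, n21 = table[(1, 2)], table[(2, 1)]
```

Homozygous parents land in the diagonal cells (`(1, 1)` or `(2, 2)`) and so carry no information about transmission.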

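With population stratification, the probabilities are

$$P(11) = \frac{\sum_k \alpha_k \left( p_k^2 q_k^2 + p_k \delta_k q_k \right)}{\sum_k \alpha_k p_k^2}$$

$$P(12) = \frac{\sum_k \alpha_k \left( p_k^2 q_k (1 - q_k) + p_k \delta_k (1 - \theta - q_k) \right)}{\sum_k \alpha_k p_k^2}$$

$$P(21) = \frac{\sum_k \alpha_k \left( p_k^2 q_k (1 - q_k) + p_k \delta_k (\theta - q_k) \right)}{\sum_k \alpha_k p_k^2}$$

$$P(22) = \frac{\sum_k \alpha_k \left( p_k^2 (1 - q_k)^2 - p_k \delta_k (1 - q_k) \right)}{\sum_k \alpha_k p_k^2}$$

where

$\alpha_k$ = relative size of subpopulation $k$
$\delta_k$ = linkage disequilibrium in subpopulation $k$
$p_k$ = frequency of $D_1$ in subpopulation $k$
$q_k$ = frequency of $M_1$ in subpopulation $k$

The initial choice of test statistic is $n_{12} - n_{21}$. When the null hypothesis is true, this has mean 0 and variance $n_{12} + n_{21}$, giving

$$z = \frac{n_{12} - n_{21}}{\sqrt{n_{12} + n_{21}}}.$$

Equivalently,

$$z^2 = \frac{(n_{12} - n_{21})^2}{n_{12} + n_{21}}.$$

This is the TDT statistic.

A property of the TDT procedure: when $H_0: \theta = \tfrac{1}{2}$ is true, transmissions of marker alleles to two or more affected sibs are independent. Therefore the TDT may be used as a test of this null hypothesis when the data contain families with two or more affected children.

Another property of the TDT is that it has increased power when association is higher. This is shown by the probabilities considered above when there is no stratification (given below): the larger $\delta$ is, the larger is the difference between P(12) and P(21), and thus the larger is the power to test the null hypothesis $\theta = \tfrac{1}{2}$.

$$P(11) = q^2 + q\delta/p$$

$$P(12) = q(1 - q) + (1 - \theta - q)\,\delta/p$$

$$P(21) = q(1 - q) + (\theta - q)\,\delta/p$$

$$P(22) = (1 - q)^2 - (1 - q)\,\delta/p$$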
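The TDT statistic and the no-stratification cell probabilities can be sketched in Python as below. The function names and all numerical values are hypothetical, chosen only for illustration; the p-value uses the one-degree-of-freedom chi-square tail, which equals `erfc(sqrt(x / 2))`.

```python
import math

def tdt_statistic(n12, n21):
    """Return (z, chi-square, p-value) for the TDT from heterozygous parents.

    n12: transmissions of M1 (M2 not transmitted); n21: the reverse.
    """
    z = (n12 - n21) / math.sqrt(n12 + n21)
    chi_square = z * z
    p_value = math.erfc(math.sqrt(chi_square / 2))  # chi-square tail, 1 df
    return z, chi_square, p_value

def cell_probabilities(p, q, delta, theta):
    """P(11), P(12), P(21), P(22) when there is no stratification."""
    p11 = q * q + q * delta / p
    p12 = q * (1 - q) + (1 - theta - q) * delta / p
    p21 = q * (1 - q) + (theta - q) * delta / p
    p22 = (1 - q) ** 2 - (1 - q) * delta / p
    return p11, p12, p21, p22

# Hypothetical counts and parameter values, for illustration only.
z, chi_square, p_value = tdt_statistic(60, 40)
p11, p12, p21, p22 = cell_probabilities(p=0.1, q=0.3, delta=0.02, theta=0.1)
difference = p12 - p21   # equals (1 - 2*theta) * delta / p
```

The difference P(12) - P(21) is $(1 - 2\theta)\delta/p$, so it vanishes when $\theta = \tfrac{1}{2}$ or $\delta = 0$ and grows with $\delta$, which is the power property described above.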

The next few slides show an immigration model checking this. Migrants come from various subpopulations to a new population and mate at random there. Different parameter values in the subpopulations create (after two generations in the new population) association, which increases the power of the TDT as a test of linkage.

[Figure: subpopulation $k$ has relative size $\alpha_k$ and coefficient of gametic disequilibrium $\delta_k$. Immediate admixture: parents of generation 1 mate only within their subpopulation; parents of generation 2 and of generation 3 mate at random throughout the population. The gametic disequilibrium is followed through generations 1, 2 and 3.]

[Figure: gradual admixture, with the gametic disequilibrium followed through generations 0, 1, 2, 3, 4, etc.]

The value of the TDT statistic in two models:

                          Generation 1   Generation 2   Generation 3   Generation 4
1. Immediate admixture    .48            .07            5.34           .43
2. Gradual admixture      .48            .07            8.53           6.99

The TDT as a test of association

The TDT is often used, and sometimes even mainly thought of, as a test of association. Why would we want to do this? The main use is to carry out fine mapping once linkage is not in question.

[Figure: affected sibs analysis. Sib 1 and sib 2 inherit the disease allele from their mother; the shared region is long (much sharing). Analysis of unrelated persons: person 1 and person 2 inherit the disease allele from the original mutation (or MRCA); the shared region is short (not much sharing).]

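What changes are needed to the testing procedure when the TDT is to be used for this purpose?

There is a problem with the TDT as a test of the hypothesis $\delta = 0$. Transmissions to affected sibs are not independent, even when the null hypothesis is true. Thus the binomial requirement #3 underlying the TDT procedure does not hold. This affects the variance of $n_{12} - n_{21}$.

Suppose that in family $j$, $M_1$ is transmitted $n_{1j}$ times and $M_2$ is transmitted $n_{2j}$ times from $M_1 M_2$ parents. Define $D_j$ as $n_{1j} - n_{2j}$. The test statistic is

$$T = \frac{\sum_j D_j}{\sqrt{\sum_j D_j^2}}.$$

Suppose that there is only one affected child in each family. Then $D_j = \pm 1$ (for all $j$), so that $T = z$. Equivalently, $T^2$ = TDT.

The many marker allele case

Combination of transmitted and non-transmitted marker alleles $M_1, M_2, \ldots, M_k$ among parents of affected children:

                        Non-transmitted allele
Transmitted allele      M1      M2      ...     Mk      Total
M1                      n11     n12     ...     n1k     n1.
M2                      n21     n22     ...     n2k     n2.
...
Mk                      nk1     nk2     ...     nkk     nk.
Total                   n.1     n.2     ...     n.k     2n

Because of the probable sparsity in the table, it is not clear what is the best test statistic to use in practice. One possible test statistic is maxTDT, defined by

$$\max_i \frac{(n_{i\cdot} - n_{\cdot i})^2}{n_{i\cdot} + n_{\cdot i} - 2 n_{ii}}.$$

However, this has a complicated null hypothesis distribution, which is unknown.

Define $d_i = n_{i\cdot} - n_{\cdot i}$ and $d = (d_1, d_2, d_3, \ldots, d_{k-1})'$, and let $V = \{v_{ij}\}$ be the matrix whose entries are built from the counts $n_{ij} + n_{ji}$. Another possible test statistic is

$$\text{chi-square} = d' V^{-1} d.$$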
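The family-based statistic $T = \sum_j D_j / \sqrt{\sum_j D_j^2}$ described above can be sketched in Python as follows. The function name `family_t` and the example counts are hypothetical; each family is represented here simply as a pair (number of $M_1$ transmissions, number of $M_2$ transmissions) from heterozygous parents.

```python
import math

def family_t(families):
    """T = sum(D_j) / sqrt(sum(D_j^2)), with D_j = n1j - n2j per family."""
    d = [n1 - n2 for n1, n2 in families]
    return sum(d) / math.sqrt(sum(x * x for x in d))

# Hypothetical data: one transmission per family, so every D_j is +1 or -1.
single = [(1, 0)] * 7 + [(0, 1)] * 3
t_single = family_t(single)
z_tdt = (7 - 3) / math.sqrt(7 + 3)   # ordinary TDT z on the same data

# Hypothetical data with several affected sibs per family.
multi = [(2, 0), (1, 1), (0, 1), (3, 0)]
t_multi = family_t(multi)
```

When every $D_j$ is $\pm 1$ the two values `t_single` and `z_tdt` coincide; with several affected sibs per family, the denominator uses the family totals rather than the count of individual transmissions.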

The sib-TDT

The TDT as described earlier requires knowledge of parental genotypes at the marker locus. What can be done when this information is not available, as might be the case for diseases such as Alzheimer's?

We use unaffected sibs as controls. How?

GENOTYPE        M1M1    M1M2    M2M2    Total
Affected        114     237     381     732
Unaffected      224     301     443     968
Total           338     538     824     1700

[Tables: original data and permuted data for one family, each giving the genotype of each child (M1M1, M1M2, M2M2) by affection status. Both tables have the same marginal totals: genotype totals 3, 3 and 5, with 6 affected and 5 unaffected children.]

In a permutation procedure, the marginal totals are fixed under permutation.

Example: for a given family, the marginal totals are:

Genotype        M1M1    M1M2    M2M2        Total
Total           r       s       t - r - s   t

Affected        a
Unaffected      u

Let $X$ be the number of $M_1$ genes in this family among the affected sibs. Then under permutation,

$$\text{Mean of } X = \frac{(2r + s)\,a}{t}, \qquad \text{Variance of } X = \frac{a u \{4r(t - r - s) + s(t - s)\}}{t^2 (t - 1)}.$$

Combined procedure

TDT: number of transmissions of $M_1$ from $M_1 M_2$ parents to affected sibs. With $n$ such transmissions,

Null hypothesis mean = $n/2$
Null hypothesis variance = $n/4$

$$\mu_{\text{comb}} = n/2 + \mu_{\text{S-TDT}}, \qquad \sigma^2_{\text{comb}} = n/4 + \sigma^2_{\text{S-TDT}}.$$

Now sum the $X$ values over all families to get a z score.
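The permutation mean and variance of $X$ can be checked numerically. The Python sketch below computes them from the formulas $(2r+s)a/t$ and $au\{4r(t-r-s)+s(t-s)\}/\{t^2(t-1)\}$, compares them with a brute-force enumeration of all ways of choosing which $a$ of the $t$ sibs are affected, and then sums over families to form a z score. All function names and the example families are hypothetical.

```python
import math
from fractions import Fraction
from itertools import combinations

def stdt_moments(r, s, t, a):
    """Permutation mean and variance of X (number of M1 genes in affected sibs)."""
    u = t - a
    mean = Fraction((2 * r + s) * a, t)
    var = Fraction(a * u * (4 * r * (t - r - s) + s * (t - s)), t * t * (t - 1))
    return mean, var

def brute_force_moments(r, s, t, a):
    """Same moments, by enumerating every choice of the a affected sibs."""
    genes = [2] * r + [1] * s + [0] * (t - r - s)   # M1 genes carried by each sib
    totals = [sum(c) for c in combinations(genes, a)]
    mean = Fraction(sum(totals), len(totals))
    var = Fraction(sum(x * x for x in totals), len(totals)) - mean * mean
    return mean, var

def stdt_z(families):
    """z score from families given as (observed X, r, s, t, a)."""
    total_x = sum(f[0] for f in families)
    moments = [stdt_moments(*f[1:]) for f in families]
    mean = sum(m for m, _ in moments)
    var = sum(v for _, v in moments)
    return float(total_x - mean) / math.sqrt(var)

# Hypothetical families, for illustration only.
formula = stdt_moments(1, 2, 5, 2)
enumerated = brute_force_moments(1, 2, 5, 2)
z = stdt_z([(3, 1, 2, 5, 2), (1, 0, 2, 4, 2)])
```

Exact fractions are used so that the formula and the enumeration can be compared without rounding error.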

If there had been only one family, then

$$z = \frac{X - (2r + s)\,a/t}{\sqrt{V}},$$

where $X$ is the total number of $M_1$ genes among affecteds, with $V$ defined by

$$V = \frac{a u \{4r(t - r - s) + s(t - s)\}}{t^2 (t - 1)}.$$

Now imagine that this table relates to ALL families in the data set, so that we replace $a$ by $A$, $r$ by $R$, etc. Then define

$$z = \frac{X - (2R + S)\,A/T}{\sqrt{V}},$$

where $X$ is the total number of $M_1$ genes among affecteds, summed over all families, and

$$V = \frac{A U \{4R(T - R - S) + S(T - S)\}}{T^2 (T - 1)}.$$

The quantity $z$ does not have the z distribution, because the mean and variance formulae are wrong. However, we calculate $z$ for many markers and see if the observed $z$ is an extreme member of the z scores so calculated. This is an example of the general genomic control method of Devlin and Roeder; see the next slide for more details.

The genomic control method is primarily used in the standard case-control method (see an earlier slide): the two-by-two chi-square is computed for many markers and the empirical distribution of the chi-square values obtained is used as the null hypothesis distribution.

Another approach is to use the standard two-by-two chi-square after estimating, and then correcting for, population stratification (Pritchard, Stephens and Donnelly).

If we do not observe either parent, but infer, from two affected sibs, one of whom is $M_1 M_1$ and the other $M_2 M_2$, that both parents are $M_1 M_2$, we may not use data from this family in the TDT.
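The idea of judging an observed score against scores computed at many markers can be sketched in Python as below. This is only an illustration of the empirical-ranking idea as described here: the function name, the add-one convention in the p-value, and the reference scores are assumptions, and the sketch does not implement any particular published genomic control procedure.

```python
def empirical_p_value(observed_z, reference_z):
    """Share of reference scores at least as extreme as the observed one.

    The +1 in numerator and denominator counts the observed score itself,
    a common convention that keeps the estimate away from exactly zero.
    """
    extreme = sum(1 for z in reference_z if abs(z) >= abs(observed_z))
    return (extreme + 1) / (len(reference_z) + 1)

# Hypothetical z scores from other markers, for illustration only.
reference = [0.5, -1.2, 2.0, -3.5, 0.1, 1.1, -0.7, 0.9, 2.9]
p_emp = empirical_p_value(3.0, reference)
```

In practice the reference set would contain scores from a large number of markers, so that the empirical distribution is well determined.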

We observe two affected sibs:

Sib 1: $M_1 M_1$        Sib 2: $M_2 M_2$

We can infer

Parent 1: $M_1 M_2$     Parent 2: $M_1 M_2$

But while the mean number of $M_1$ transmissions in this family is 2, the variance is 0.

Generalizations

There are many generalizations of the TDT procedure. Here we mention just a few.

The first is the PDT (pedigree disequilibrium test) (Martin et al. 2000). This actually has problems with pedigree data, so we consider it here only for the case where the pedigree is simply a family.

We assume a marker locus (maybe a SNP) with two alleles, denoted $M_1$ and $M_2$. Here we have two forms of data from any family:

(1) Triad data. A triad is a mother-father-affected child trio. For triad $j$ in the family, we compute $X_{Tj}$, defined by

$X_{Tj}$ = {# $M_1$ alleles transmitted from the parents in triad $j$} - {# $M_1$ alleles not transmitted from the parents in triad $j$}.

(Note that homozygous parents contribute zero to this number.)

(2) Discordant sib-pair data. Consider each discordant sib-pair (DSP) in any family (i.e. one sib affected, the other not affected). Define $X_{Sk}$ by

$X_{Sk}$ = {# $M_1$ genes in the affected sib (0, 1 or 2)} - {# $M_1$ genes in the unaffected sib (0, 1 or 2)}.

Now define $D$ by

$$D = \frac{\sum_{\text{all } j} X_{Tj} + \sum_{\text{all } k} X_{Sk}}{n_T + n_S}.$$

The sum is taken over all triads and over all DSPs in the family. (In this formula, $n_T$ is the number of triads in the family and $n_S$ is the number of discordant sib pairs in the family.) Write $D_i$ for the value of $D$ in family $i$.

The first shot at a test statistic is $\sum_i D_i$, summed over all families in the data set.

When the null hypothesis of no association between the marker and the disease alleles is true, each $D_i$ is a random variable having mean 0. Thus when this null hypothesis is true, the mean of $\sum_i D_i$ is also 0. However, the variance of $\sum_i D_i$ is unknown.
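The per-family quantity $D$ can be sketched in Python as follows. The function names, the coding of alleles $M_1$ and $M_2$ as the integers 1 and 2, and the example family are all hypothetical and serve only to illustrate the definitions of $X_{Tj}$, $X_{Sk}$ and $D$.

```python
def triad_score(transmitted, not_transmitted):
    """X_T: (# M1 alleles transmitted) - (# M1 alleles not transmitted).

    Each argument lists one allele per parent (1 for M1, 2 for M2).
    """
    return list(transmitted).count(1) - list(not_transmitted).count(1)

def dsp_score(affected, unaffected):
    """X_S: (# M1 genes in affected sib) - (# M1 genes in unaffected sib)."""
    return list(affected).count(1) - list(unaffected).count(1)

def family_d(triad_scores, dsp_scores):
    """D: sum of all triad and DSP scores divided by (n_T + n_S)."""
    total = sum(triad_scores) + sum(dsp_scores)
    return total / (len(triad_scores) + len(dsp_scores))

# Hypothetical family: two triads and one discordant sib pair.
x_t1 = triad_score(transmitted=(1, 1), not_transmitted=(2, 2))   # both parents pass on M1
x_t2 = triad_score(transmitted=(1, 2), not_transmitted=(1, 2))   # both parents homozygous
x_s1 = dsp_score(affected=(1, 1), unaffected=(1, 2))
d = family_d([x_t1, x_t2], [x_s1])
```

A homozygous parent transmits and withholds the same allele, so its contribution to `triad_score` cancels, as noted above.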

However, we estmate ths varace (usg the aalogy wth a t statstc) by D. Ths leads to a test statstc T [ D]/[ D ]. whch, whe the ull hypothess s true, s approxmately a Z (.e. ormal, mea 0, varace.) How does ths compare wth the stadard TDT statstc? To make ths comparso we have to gore all DSPs, sce the TDT does ot use these. Also, for smplcty, we assume oly oe affected chld each famly ad that both parets are heterozygotes. MM I ths case t ca be show that the TDT statstc ad the PDT statstc become, respectvely, TDT statstc = D /, where s the umber of famles the etre data set, ad, as before, T = D / D. Whe the ull hypothess s true, takes the values +, 0 or - wth respectve probabltes ¼. ½ ad ¼. Thus the mea values of s ad the mea value of +(/4) + 0 + (-)(/4) = 0, D s D D D, 0, I these formulae, famly, or accordg as to whether both parets pass o the affected chld, exactly oe does, or ether do. M to What does ths mea? 4(/4) + 0 + 4(/4) =. The Quattatve TDT (QTDT) (D. Allso) We start wth the followg smple example. We have a sample of famles where Paret s M M, Paret s uformatve, ad there s oe affected chld. t test x average measuremet for famles x where Paret trasmts M average measuremet for where Paret trasmts M t x x s famles

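Extreme-values test

$L$ = lower cut-off value
$U$ = upper cut-off value

Classify each child by the allele transmitted ($M_1$ or $M_2$) and by whether the measurement falls below $L$ or above $U$, and do a two-by-two table test.

Another approach via regression

As a simple case consider the children in a family where the father is homozygous (and thus uninteresting) and the mother is $M_1 M_2$.

Let $Y$ be the measurement (blood pressure, weight, etc.) of an affected child, and define $X = 1$ if the child receives $M_1$ from the mother and $X = 0$ otherwise. Then test the null hypothesis $\beta = 0$ in the regression model

$$Y = \alpha + \beta X + E.$$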
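The regression test can be sketched in Python with ordinary least squares written out by hand. The function name `slope_t` and the data are hypothetical, and the sketch assumes the simple model with an intercept and a single 0/1 predictor $X$.

```python
import math

def slope_t(x, y):
    """Least-squares slope and its t statistic for the model Y = a + b*X + E."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    beta = sxy / sxx                     # estimated slope
    alpha = my - beta * mx               # estimated intercept
    rss = sum((b - alpha - beta * a) ** 2 for a, b in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return beta, beta / se

# Hypothetical data: X = 1 if the child receives M1 from the mother.
x = [1, 1, 1, 0, 0, 0]
y = [5, 6, 7, 1, 2, 3]
beta_hat, t_value = slope_t(x, y)
```

With a 0/1 predictor the estimated slope is the difference between the two group means, and the t statistic equals the pooled two-sample t statistic, so this regression is another way of writing the two-group comparison.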