The TDT (Transmission Disequilibrium Test) (Qualitative and quantitative traits)


Our aim is to test for linkage (and possibly association as well) between a disease locus D and a marker locus M. We know where the M locus is (i.e. on which chromosome, and where on that chromosome). Thus if the disease locus is linked to the marker locus, it is also on that chromosome.

We assume alleles $M_1$ and $M_2$ at the marker locus, and $D_1$ and $D_2$ at the disease locus ($D_1$ is the disease, i.e. bad, allele).

We have association (aka linkage disequilibrium) if

$$\mathrm{freq}(D_1 M_1) \neq \mathrm{freq}(D_1) \times \mathrm{freq}(M_1).$$

$$\delta = \mathrm{freq}(D_1 M_1) - \mathrm{freq}(D_1) \times \mathrm{freq}(M_1) \neq 0$$

implies

freq($M_1$ among those with the disease) $\neq$ freq($M_1$ among those free of the disease).

Why might we have a case where $\delta$ differs from 0? Two reasons.

1. Linkage. (Hence the term linkage disequilibrium.)
2. ... See later.

[Figure: the original mutation at the disease locus arose, long ago, on a chromosome carrying a particular allele at marker locus M. Present-day chromosomes still show a residual association between M and D.]
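The disequilibrium coefficient defined above can be computed directly from haplotype counts. The following is only a minimal sketch: the function name `disequilibrium` and the dictionary layout are illustrative assumptions, not part of the lecture.

```python
def disequilibrium(hap_counts):
    """Return delta = freq(D1 M1) - freq(D1) * freq(M1).

    hap_counts maps (disease_allele, marker_allele) pairs, e.g. ("D1", "M1"),
    to observed haplotype counts. This layout is a hypothetical choice.
    """
    total = sum(hap_counts.values())
    f_d1m1 = hap_counts.get(("D1", "M1"), 0) / total
    f_d1 = sum(c for (d, _), c in hap_counts.items() if d == "D1") / total
    f_m1 = sum(c for (_, m), c in hap_counts.items() if m == "M1") / total
    return f_d1m1 - f_d1 * f_m1


# Made-up counts for illustration only.
counts = {("D1", "M1"): 30, ("D1", "M2"): 10,
          ("D2", "M1"): 20, ("D2", "M2"): 40}
delta = disequilibrium(counts)
```

With these made-up counts, freq(D1 M1) = 0.3 while freq(D1) freq(M1) = 0.4 x 0.5 = 0.2, so delta is positive and M1 is over-represented on disease chromosomes.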

Null hypothesis: $\theta = 1/2$ (disease and marker loci unlinked)
Alternative hypothesis: $\theta < 1/2$ (disease and marker loci linked)

Here $\theta$ is the recombination fraction between the two loci.

How can we test this null hypothesis?

1. Population-based tests. Of these the oldest, and most popular, is the case-control study.
2. Family-based tests. (Why? See later.)

The classic population-based test assesses whether the frequency of one marker allele in the case sample differs significantly from its frequency in the control sample. This is done by a standard two-by-two chi-square.

Qualitative traits (affected or not affected)

CASE / CONTROL ANALYSIS

                          M_1     M_2     Total
AFFECTED (CASES)          n_11    n_12    R_1
NOT AFFECTED (CONTROLS)   n_21    n_22    R_2
Total                     C_1     C_2     N

Compare $n_{11}/R_1$ with $n_{21}/R_2$, using the z-score

$$z = \frac{n_{11}/R_1 - n_{21}/R_2}{\sqrt{C_1 C_2 / (N R_1 R_2)}}, \qquad z^2 = \frac{N (n_{11} n_{22} - n_{12} n_{21})^2}{R_1 R_2 C_1 C_2}.$$

The main problem with the case-control approach is that association can arise through population stratification, as well as through linkage. Example: fair hair/blue eyes vs dark hair/brown eyes.

So about 30 years ago, the focus moved to family-based tests.
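As a rough illustration of the two-by-two case-control comparison, the sketch below computes the z-score and the equivalent chi-square from the four cell counts. The function name `case_control_z` and the cell labels n11, n12, n21, n22 (cases in the first row, controls in the second) are assumptions made for this example.

```python
import math


def case_control_z(n11, n12, n21, n22):
    """z-score and chi-square for a 2x2 table of marker allele counts.

    Row 1 = cases, row 2 = controls; column 1 = M1, column 2 = M2.
    """
    r1, r2 = n11 + n12, n21 + n22          # row totals
    c1, c2 = n11 + n21, n12 + n22          # column totals
    n = r1 + r2                            # grand total
    # Difference in M1 frequency between cases and controls,
    # standardized with the pooled variance estimate.
    z = (n11 / r1 - n21 / r2) / math.sqrt(c1 * c2 / (n * r1 * r2))
    chi2 = n * (n11 * n22 - n12 * n21) ** 2 / (r1 * r2 * c1 * c2)
    return z, chi2


# Made-up counts for illustration only.
z, chi2 = case_control_z(60, 40, 45, 55)
```

The square of z equals the usual one-degree-of-freedom chi-square, so either form can be referred to standard tables.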

Of these, the most popular was the affected sib pair sharing approach. (No details given here.)

Problem: sibs share much of their genome, affected or not. Thus it is hard to fine-map the disease locus. There are also problems with complex diseases.

The TDT (transmission-disequilibrium test) is a family-based test, thus avoiding problems of population stratification, that uses association, thus allowing fine mapping.

How does it work?

The basic unit (although many variations and extensions on the theme exist) is the family trio of mother, father and affected child. In effect it compares the marker locus genes passed on to the affected child with those not passed on (to a "non-person").

[Figure: a family trio with four distinguishable marker alleles. The affected child carries one allele from each parent (for example $M_1$ from an $M_1M_2$ mother and $M_4$ from an $M_3M_4$ father); the non-person carries the two alleles that were not transmitted (here $M_2$ and $M_3$).]

Combination of transmitted and non-transmitted marker alleles $M_1$ and $M_2$ among parents of affected children

                        Non-transmitted allele
Transmitted allele      M_1      M_2      Total
M_1                     n_11     n_12     n_1.
M_2                     n_21     n_22     n_2.
Total                   n_.1     n_.2     2n

Probabilities for transmitted and non-transmitted marker alleles $M_1$ and $M_2$ among parents of affected children

                        Non-transmitted allele
Transmitted allele      M_1      M_2      Total
M_1                     P(1,1)   P(1,2)   P(1,.)
M_2                     P(2,1)   P(2,2)   P(2,.)
Total                   P(.,1)   P(.,2)   1
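The bookkeeping behind the transmitted/non-transmitted table can be sketched as follows for a two-allele marker. This is only an illustration under stated assumptions: genotypes are coded as the number of M1 alleles carried (0, 1 or 2), and the function name `tally_transmissions` is hypothetical. Only heterozygous parents are counted, since they are the only ones that contribute to the off-diagonal cells.

```python
def tally_transmissions(trios):
    """Count transmissions from heterozygous (M1M2) parents to affected children.

    trios: iterable of (mother, father, child) genotypes, each coded as the
    number of M1 alleles carried (0, 1 or 2).
    Returns (n12, n21): n12 = M1 transmitted and M2 not transmitted,
    n21 = M2 transmitted and M1 not transmitted.
    """
    n12 = n21 = 0
    for mother, father, child in trios:
        parents = (mother, father)
        het = sum(1 for g in parents if g == 1)       # informative parents
        hom_m1 = sum(1 for g in parents if g == 2)    # must transmit M1
        # M1 alleles in the child that came from heterozygous parents.
        m1_from_het = child - hom_m1
        if not 0 <= m1_from_het <= het:
            raise ValueError("genotypes inconsistent with Mendelian inheritance")
        n12 += m1_from_het
        n21 += het - m1_from_het
    return n12, n21


# Made-up trios for illustration only.
n12, n21 = tally_transmissions([(1, 1, 2), (1, 0, 1), (2, 1, 1), (1, 1, 1)])
```

When both parents and the child are heterozygous it is not known which parent transmitted which allele, but the counts are unaffected: one parent transmitted M1 and the other M2.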

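$$P(1,1) = \frac{\sum_i \alpha_i \left( p_i q_i^2 + \delta_i q_i \right)}{\sum_i \alpha_i p_i}$$

$$P(1,2) = \frac{\sum_i \alpha_i \left( p_i q_i (1 - q_i) + \delta_i (1 - \theta - q_i) \right)}{\sum_i \alpha_i p_i}$$

$$P(2,1) = \frac{\sum_i \alpha_i \left( p_i q_i (1 - q_i) + \delta_i (\theta - q_i) \right)}{\sum_i \alpha_i p_i}$$

$$P(2,2) = \frac{\sum_i \alpha_i \left( p_i (1 - q_i)^2 - \delta_i (1 - q_i) \right)}{\sum_i \alpha_i p_i}$$

$\alpha_i$ = relative size of subpopulation $i$
$\delta_i$ = linkage disequilibrium in subpopulation $i$
$p_i$ = frequency of $D_1$ in subpopulation $i$
$q_i$ = frequency of $M_1$ in subpopulation $i$

The initial choice of test statistic is $n_{12} - n_{21}$. When the null hypothesis is true, this has mean 0 and variance $n_{12} + n_{21}$, giving

$$z = \frac{n_{12} - n_{21}}{\sqrt{n_{12} + n_{21}}}.$$

Equivalently,

$$z^2 = \frac{(n_{12} - n_{21})^2}{n_{12} + n_{21}}.$$

This is the TDT statistic.

A property of the TDT procedure: when $H_0$: $\theta = 1/2$ is true, transmissions of marker alleles to two or more affected sibs are independent. Therefore the TDT may be used as a test of this null hypothesis when the data contain families with two or more affected children.

Another property of the TDT is that it has increased power when association is higher. This is shown by the probabilities considered above when there is no stratification (see next slide): the larger $\delta$ is, the larger is the difference between $P(1,2)$ and $P(2,1)$, and thus the larger is the power to test the null hypothesis $\theta = 1/2$.

$$P(1,1) = q^2 + q\delta/p$$
$$P(1,2) = q(1 - q) + (1 - \theta - q)\delta/p$$
$$P(2,1) = q(1 - q) + (\theta - q)\delta/p$$
$$P(2,2) = (1 - q)^2 - (1 - q)\delta/p$$

Subtracting the third line from the second gives $P(1,2) - P(2,1) = (1 - 2\theta)\delta/p$, which is zero when $\theta = 1/2$ and grows with $\delta$ otherwise.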
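The TDT statistic and the no-stratification transmission probabilities can be checked numerically. The sketch below assumes the standard single-population forms of the four probabilities (with p the frequency of D1, q the frequency of M1, delta the disequilibrium coefficient and theta the recombination fraction); the function names `tdt` and `transmission_probs` are illustrative only.

```python
import math


def tdt(n12, n21):
    """Return (z, z**2) for the TDT, given the two off-diagonal counts."""
    z = (n12 - n21) / math.sqrt(n12 + n21)
    return z, z * z


def transmission_probs(p, q, delta, theta):
    """Cell probabilities P(1,1), P(1,2), P(2,1), P(2,2), no stratification."""
    p11 = q * q + q * delta / p
    p12 = q * (1 - q) + (1 - theta - q) * delta / p
    p21 = q * (1 - q) + (theta - q) * delta / p
    p22 = (1 - q) ** 2 - (1 - q) * delta / p
    return p11, p12, p21, p22


# Made-up parameter values for illustration only.
p11, p12, p21, p22 = transmission_probs(p=0.1, q=0.3, delta=0.02, theta=0.05)
gap = p12 - p21            # equals (1 - 2*theta) * delta / p
z, chi2 = tdt(60, 40)      # z = 2.0, chi2 = 4.0
```

The quantity `gap` vanishes at theta = 1/2 whatever the value of delta, which is why the statistic tests linkage, and it scales with delta, which is why association increases power.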

The next few slides show an immigration model checking this. Migrants come from various subpopulations to a new population and mate at random there. Different parameter values in the subpopulations create (after two generations in the new population) association, which increases the power of the TDT as a test of linkage.

Subpopulation                            1          2          ...   k
Relative size                            alpha_1    alpha_2    ...   alpha_k
Coefficient of gametic disequilibrium    delta_1    delta_2    ...   delta_k

[Figure: Generations 0 to 3 of the immediate admixture model, with the gametic disequilibrium shown for generations 1, 2 and 3. Parents of generation 1 mate only within their subpopulation. Parents of generation 2 mate at random throughout the population. Parents of generation 3 mate at random throughout the population.]

[Figure: Generations 0 to 4, etc., of the gradual admixture model, with the gametic disequilibrium shown for generations 1, 2 and 3.]

The value of the TDT statistic in two models

1. Immediate admixture
Generation 1    .48
Generation 2    .07
Generation 3    5.34
Generation 4    .43

2. Gradual admixture
Generation 1    .48
Generation 2    .07
Generation 3    8.53
Generation 4    6.99

The TDT as a test of association

The TDT is often used, and sometimes even mainly thought of, as a test of association. Why would we want to do this? The main use is to carry out fine mapping once linkage is not in question.

[Figure: Affected sibs analysis. Two sibs inherit the disease allele from the same parent, so the shared region around the disease locus is large (much sharing). Unrelateds analysis. Two unrelated persons inherit the disease allele from the original mutation (or MRCA), so the shared region is small (not much sharing).]

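What changes are needed to the testing procedure when the TDT is to be used for this purpose?

There is a problem with the TDT as a test of the hypothesis $\delta = 0$: transmissions to affected sibs are not independent, even when the null hypothesis is true. Thus the binomial requirement #3 underlying the TDT procedure does not hold. This affects the variance of $n_{12} - n_{21}$.

Suppose that in family $j$, $M_1$ is transmitted $n_{12j}$ times and $M_2$ is transmitted $n_{21j}$ times from $M_1M_2$ parents. Define $D_j$ as $n_{12j} - n_{21j}$. The test statistic is

$$T = \frac{\sum_j D_j}{\sqrt{\sum_j D_j^2}}, \qquad T^2 = \frac{\left( \sum_j D_j \right)^2}{\sum_j D_j^2}.$$

Suppose that there is only one affected child in each family. Then $D_j = \pm 1$ (for all $j$) and $T = z$. Equivalently, $T^2$ = TDT.

The many marker allele case

Combination of transmitted and non-transmitted marker alleles $M_1, M_2, \ldots, M_k$ among parents of affected children

                        Non-transmitted allele
Transmitted allele      M_1     M_2     ...   M_k     Total
M_1                     n_11    n_12    ...   n_1k    n_1.
M_2                     n_21    n_22    ...   n_2k    n_2.
...
M_k                     n_k1    n_k2    ...   n_kk    n_k.
Total                   n_.1    n_.2    ...   n_.k    2n

Because of the probable sparsity in the table, it is not clear what is the best test statistic to use in practice.

One possible test statistic is maxTDT, defined by

$$\max_i \frac{(n_{i.} - n_{.i})^2}{n_{i.} + n_{.i} - 2 n_{ii}}.$$

However, this has a complicated null hypothesis distribution, which is unknown.

Define $d_i = n_{i.} - n_{.i}$ and $d = (d_1, d_2, d_3, \ldots, d_{k-1})'$, together with

$$v_{ii} = n_{i.} + n_{.i} - 2 n_{ii}, \qquad v_{ij} = -(n_{ij} + n_{ji}) \ (i \neq j), \qquad V = \{v_{ij}\}.$$

Another possible test statistic is

$$\text{chi-square} = d' V^{-1} d.$$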
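A minimal sketch of the family-level statistic T follows, assuming each family is summarized by its pair of counts (number of M1 transmissions, number of M2 transmissions) from heterozygous parents. The function name `family_T` and this input layout are assumptions for illustration.

```python
import math


def family_T(families):
    """T = sum(D_j) / sqrt(sum(D_j**2)), with D_j = n12_j - n21_j per family.

    families: list of (n12_j, n21_j) pairs, one pair per family.
    """
    d = [n12_j - n21_j for n12_j, n21_j in families]
    denominator = sum(x * x for x in d)
    if denominator == 0:
        raise ValueError("no family contributes a non-zero D_j")
    return sum(d) / math.sqrt(denominator)


# One informative transmission per family, so every D_j is +1 or -1:
# six families transmit M1 and three transmit M2.
families = [(1, 0)] * 6 + [(0, 1)] * 3
T = family_T(families)
tdt = (6 - 3) ** 2 / (6 + 3)
```

In this special case the sum of squared D_j equals n12 + n21, so T squared coincides with the ordinary TDT statistic. With several affected sibs per family the two differ, because the denominator of T is built from whole-family totals.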

The sib-TDT

The TDT as described earlier requires knowledge of parental genotypes at the marker locus. What can be done when this information is not available, as might be the case for diseases such as Alzheimer's?

We use unaffected sibs as controls. How?

GENOTYPE      M_1M_1   M_1M_2   M_2M_2   Total
Affected      114      237      381      732
Unaffected    224      301      443      968
Total         338      538      824      1700

Original data: genotype of child

              M_1M_1   M_1M_2   M_2M_2   Total
Affected      2        3        1        6
Unaffected    1        0        4        5
Total         3        3        5        11

Permuted data: genotype of child. The affected and unaffected labels are reassigned among the 11 sibs, giving a new table with the same marginal totals (row totals 6 and 5; column totals 3, 3 and 5).

In a permutation procedure, the marginal totals are fixed under permutation.

Example: the marginal totals for family #1 are:

              M_1M_1   M_1M_2   M_2M_2       Total
Affected                                     a
Unaffected                                   u
Total         r        s        t - r - s    t

Let $X$ be the number of $M_1$ genes in this family among the affected sibs. Then under permutation,

$$\text{Mean of } X = \frac{(2r + s)a}{t}, \qquad \text{Variance of } X = \frac{au\{4r(t - r - s) + s(t - s)\}}{t^2 (t - 1)}.$$

Now sum the $X$ values over all families to get a z score.

Combined procedure

TDT: number of transmissions of $M_1$ from $M_1M_2$ parents to affected sibs. For each such transmission, the null hypothesis mean is 1/2 and the null hypothesis variance is 1/4. With $n$ such transmissions,

$$\mu_{\text{comb}} = n/2 + \mu_{\text{S-TDT}}, \qquad \sigma^2_{\text{comb}} = n/4 + \sigma^2_{\text{S-TDT}}.$$
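The permutation mean and variance of X for a single family can be evaluated exactly with rational arithmetic. The sketch below assumes the formulas Mean = (2r + s)a/t and Variance = au{4r(t - r - s) + s(t - s)}/{t^2 (t - 1)}, where r, s and t - r - s are the numbers of M1M1, M1M2 and M2M2 sibs, a is the number of affected sibs and u = t - a. The function name `stdt_moments` and the example family are illustrative assumptions.

```python
from fractions import Fraction


def stdt_moments(r, s, t, a):
    """Permutation mean and variance of X, the number of M1 genes among
    the a affected sibs in a sibship of size t (t >= 2).

    r = number of M1M1 sibs, s = number of M1M2 sibs.
    """
    u = t - a  # number of unaffected sibs
    mean = Fraction((2 * r + s) * a, t)
    variance = Fraction(a * u * (4 * r * (t - r - s) + s * (t - s)),
                        t * t * (t - 1))
    return mean, variance


# Illustrative sibship: 3 M1M1, 3 M1M2 and 5 M2M2 sibs, 6 of the 11 affected.
mean, variance = stdt_moments(r=3, s=3, t=11, a=6)
# mean == Fraction(54, 11), variance == Fraction(252, 121)
```

Summing the observed X, the means and the variances over families gives the z score (sum of X minus sum of means, divided by the square root of the summed variances).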

If there had been only one family, then

$$z = \frac{X - (2r + s)a/t}{\sqrt{V}},$$

where $X$ is the total number of $M_1$ genes among affecteds, with $V$ defined by

$$V = \frac{au\{4r(t - r - s) + s(t - s)\}}{t^2 (t - 1)}.$$

Now imagine that this table relates to ALL families in the data set, so that we replace $a$ by $A$, $r$ by $R$, etc. Then define

$$Z = \frac{X - (2R + S)A/T}{\sqrt{V}},$$

where $X$ is the total number of $M_1$ genes among affecteds, summed over all families, and

$$V = \frac{AU\{4R(T - R - S) + S(T - S)\}}{T^2 (T - 1)}.$$

The quantity $Z$ does not have the z distribution, because the mean and variance formulae are wrong. However, we calculate $Z$ for many markers and see if the observed $Z$ is an extreme member of the scores so calculated.

This is an example of the general genomic control method of Devlin and Roeder (see next slide for more details).

The genomic control method is primarily used in the standard case-control method (see an earlier slide): the two-by-two chi-square is computed for many markers, and the empirical distribution of the chi-square values obtained is used as the null hypothesis distribution.

Another approach is to use the standard two-by-two chi-square after estimating, and then correcting for, population stratification (Pritchard, Stephens and Donnelly).

If we do not observe either parent, but infer, from two affected sibs, one of whom is $M_1M_1$ and the other $M_2M_2$, that both parents are $M_1M_2$, we may not use data from this family in the TDT.
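The idea of judging an observed score against scores computed at many other markers can be sketched as an empirical p-value. This is a simplified illustration of the empirical-distribution idea described above, not the full Devlin and Roeder procedure; the function name `empirical_p_value`, the two-sided comparison and the add-one correction are assumptions of this sketch.

```python
def empirical_p_value(z_observed, z_reference):
    """Fraction of reference scores at least as extreme as the observed one.

    z_reference: scores computed in the same way at many other markers,
    used as the empirical null distribution. One is added to numerator and
    denominator so that the result is never exactly zero.
    """
    extreme = sum(1 for z in z_reference if abs(z) >= abs(z_observed))
    return (extreme + 1) / (len(z_reference) + 1)


# Made-up reference scores for illustration only.
reference = [0.1, -0.5, 1.2, -2.0, 0.7, 3.5, -0.3, 0.9, 1.1]
p = empirical_p_value(3.0, reference)  # (1 + 1) / (9 + 1) = 0.2
```

In practice the reference set would contain many more markers than in this toy example, so that small empirical p-values are attainable.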

We observe two affected sibs:

Sib 1     Sib 2
M_1M_1    M_2M_2

We can infer

Parent 1    Parent 2
M_1M_2      M_1M_2

But while the mean number of $M_1$ transmissions in this family is 2, the variance is 0.

Generalizations

There are many generalizations of the TDT procedure. Here we mention just a few.

The first is the PDT (pedigree disequilibrium test) (Martin et al. 2000). This actually has problems with pedigree data, so we consider it here only for the case where the pedigree is simply a family.

We assume a marker locus (maybe a SNP) with two alleles, denoted $M_1$ and $M_2$.

Here we have two forms of data from any family:

(1) Discordant sib-pair data.
(2) Triad data.

A triad is a mother-father-affected child trio. For triad $j$ in the family, we compute $X_{Tj}$, defined by

$X_{Tj}$ = {# $M_1$ alleles transmitted from the parents in triad $j$} - {# $M_1$ alleles not transmitted from the parents in triad $j$}.

(Note that homozygous parents contribute zero to this number.)

Consider each discordant sib-pair (DSP) in any family (i.e. one sib affected, the other not affected). Define $X_{Sk}$ by

$X_{Sk}$ = {# $M_1$ genes in the affected sib (0, 1 or 2)} - {# $M_1$ genes in the unaffected sib (0, 1 or 2)}.

Now define $D_i$ by

$$D_i = \frac{\sum_{\text{all } j} X_{Tj} + \sum_{\text{all } k} X_{Sk}}{n_T + n_S}.$$

The sum is taken over all triads and over all DSPs in family $i$ in the data set. (In this formula, $n_T$ is the number of triads in the family and $n_S$ is the number of discordant sib pairs in the family.)

The first shot at a test statistic is $\sum_i D_i$. When the null hypothesis of no association between the marker and the disease alleles is true, each $D_i$ is a random variable having mean 0. Thus when this null hypothesis is true, the mean of $\sum_i D_i$ is also 0. However, the variance of $\sum_i D_i$ is unknown.

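However, we estimate this variance (using the analogy with a t statistic) by $\sum_i D_i^2$. This leads to a test statistic

$$T = \frac{\sum_i D_i}{\sqrt{\sum_i D_i^2}},$$

which, when the null hypothesis is true, is approximately a Z (i.e. normal, mean 0, variance 1).

How does this compare with the standard TDT statistic?

To make this comparison we have to ignore all DSPs, since the TDT does not use these. Also, for simplicity, we assume only one affected child in each family and that both parents are $M_1M_2$ heterozygotes.

In this case it can be shown that the TDT statistic and the PDT statistic become, respectively,

$$\text{TDT statistic} = \frac{\left( \sum_i D_i \right)^2}{2n},$$

where $n$ is the number of families in the entire data set, and, as before,

$$T^2 = \frac{\left( \sum_i D_i \right)^2}{\sum_i D_i^2}.$$

In these formulae, $D_i = 2$, 0 or $-2$ in family $i$, according as to whether both parents pass on $M_1$ to the affected child, exactly one does, or neither does.

When the null hypothesis is true, $D_i$ takes the values $+2$, 0 or $-2$ with respective probabilities 1/4, 1/2 and 1/4. Thus the mean value of $D_i$ is

$$2(1/4) + 0 + (-2)(1/4) = 0,$$

and the mean value of $D_i^2$ is

$$4(1/4) + 0 + 4(1/4) = 2.$$

What does this mean? Under the null hypothesis the PDT denominator $\sum_i D_i^2$ has expected value $2n$, which is exactly the TDT denominator, so the two statistics are based on the same numerator and on denominators that agree on average.

The Quantitative TDT (QTDT) (D. Allison)

We start with the following simple example. We have a sample of families where Parent 1 is $M_1M_2$, Parent 2 is uninformative, and there is one affected child.

t test

$\bar{x}_1$ = average measurement for families where Parent 1 transmits $M_1$
$\bar{x}_2$ = average measurement for families where Parent 1 transmits $M_2$

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s \sqrt{1/n_1 + 1/n_2}},$$

where $n_1$ and $n_2$ are the numbers of families in the two groups and $s$ is the pooled standard deviation.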
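The comparison between the TDT and PDT statistics in the simplified setting (one affected child per family, both parents heterozygous, no discordant sib pairs) can be checked with a short sketch. The names `null_moments` and `tdt_and_pdt`, and the example data, are assumptions for illustration.

```python
from fractions import Fraction

# Null distribution of D for one family with two M1M2 parents.
D_NULL = {2: Fraction(1, 4), 0: Fraction(1, 2), -2: Fraction(1, 4)}


def null_moments():
    """Return (E[D], E[D**2]) under the null hypothesis."""
    mean = sum(d * prob for d, prob in D_NULL.items())
    mean_square = sum(d * d * prob for d, prob in D_NULL.items())
    return mean, mean_square


def tdt_and_pdt(d_values):
    """Return (TDT, T**2) from per-family D values (each 2, 0 or -2)."""
    n = len(d_values)
    total = sum(d_values)
    tdt = total ** 2 / (2 * n)                          # fixed denominator 2n
    pdt_squared = total ** 2 / sum(d * d for d in d_values)  # estimated denominator
    return tdt, pdt_squared


mean, mean_square = null_moments()          # 0 and 2
tdt, pdt_squared = tdt_and_pdt([2, 2, 2, 0])  # 4.5 and 3.0
```

The two statistics share a numerator; they differ only in that the TDT uses the null expectation 2n in the denominator while the PDT uses the observed sum of squares.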

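$\chi^2$ test

L = lower cut-off value
U = upper cut-off value

[Table: children classified by the allele transmitted ($M_1$ or $M_2$) and by whether their measurement falls below L or above U.]

Do a 2 x 2 table test.

Another approach: via regression

As a simple case consider the children in a family where the father is homozygous at the marker locus (and thus uninteresting) and the mother is $M_1M_2$.

Let $Y$ be the measurement (blood pressure, weight, etc.) of an affected child, and define $X = 1$ if the child receives $M_1$ from the mother and $X = 0$ otherwise.

Then test the null hypothesis $\beta = 0$ in the regression model

$$Y = \alpha + \beta X + E.$$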
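The regression test can be sketched with ordinary least squares in plain Python. This is a minimal illustration, assuming the model Y = alpha + beta X + E with X coded 1 when the child receives M1 from the heterozygous mother and 0 otherwise; the function name `slope_t_statistic` and the measurements are made up for the example.

```python
import math


def slope_t_statistic(x, y):
    """Least-squares fit of y = alpha + beta * x.

    Returns (alpha, beta, t), where t = beta / standard_error(beta) is
    referred to a t distribution with len(x) - 2 degrees of freedom.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    residual_ss = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    standard_error = math.sqrt(residual_ss / (n - 2) / sxx)
    return alpha, beta, beta / standard_error


# Made-up measurements: X = 1 if M1 was received from the mother.
x = [1, 1, 1, 0, 0, 0]
y = [130, 128, 135, 120, 125, 121]
alpha, beta, t = slope_t_statistic(x, y)
```

Because X takes only the values 0 and 1, the fitted slope is the difference between the two group means, so this test reproduces the two-sample t test with a pooled variance.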