Backward Haplotype Transmission Association (BHTA) Algorithm. Tian Zheng, Department of Statistics, Columbia University. February 5th, 2002

Transcription:

Backward Haplotype Transmission Association (BHTA) Algorithm: A Fast Multi-point Screening Method for Complex Traits
Tian Zheng, Department of Statistics, Columbia University. February 5th, 2002
This is joint work with Professor Shaw-Hwa Lo in the Department of Statistics at Columbia University.

Outline
Background: genetics terminology; biological basis of genetic mapping; current association methods; issues in mapping for complex traits.
Introduction: issues we addressed using the proposed method; illustrative example.
Algorithm: details.
Discussion: performance of the BHTA algorithm; problem and solution; future efforts.
Summary.

Chromosomes and Genes: the genetic information carrier
[Figure: nucleus of a body cell; one strand of a chromosome; one pair of chromosomes carrying Gene 1 (two copies of gene A, i.e. alleles) and Gene 2 (two copies of gene B, i.e. alleles).]

Meiosis: Crossover and Recombination, the basis of genetic mapping
[Figure labels: haplotype, crossover, recombinations.]

Linkage and Linkage Disequilibrium: the basis of current mapping algorithms to locate disease susceptibility loci
[Figure label: mutant allele, which increases the risk of the disease.]
Source: "Association Study Designs for Complex Diseases", Lon R. Cardon and John I. Bell. Nature Reviews Genetics, Volume 2, 2001

Association Methods for the mapping of disease susceptibility loci
Case-control study.
Use family-based controls: Transmission/Disequilibrium Test (TDT), the case-parent trio method (Spielman et al. 1993).
Assume a marker with alleles M and m is studied.

                  Untransmitted
Transmitted       M       m
M                 a       b
m                 c       d

The test statistic used is $T = \dfrac{(b - c)^2}{b + c}$, a $\chi^2$ statistic with d.f. = 1.
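The TDT statistic above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `tdt_statistic` and the example counts are hypothetical, and the p-value uses the identity that a chi-square variable with 1 degree of freedom has survival function erfc(sqrt(x/2)).

```python
import math


def tdt_statistic(b: int, c: int) -> tuple[float, float]:
    """Return (T, p) for the TDT with off-diagonal counts b and c.

    b: transmissions of M with m untransmitted
    c: transmissions of m with M untransmitted
    """
    if b + c == 0:
        raise ValueError("no informative (heterozygous-parent) transmissions")
    t = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square with 1 d.f.: P(X > t) = erfc(sqrt(t / 2))
    p = math.erfc(math.sqrt(t / 2.0))
    return t, p


# Hypothetical counts, chosen only for illustration.
t, p = tdt_statistic(b=30, c=10)
print(f"T = {t:.2f}, p = {p:.4f}")
```

Only the off-diagonal cells b and c enter the statistic, because homozygous parents carry no information about preferential transmission.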

Screening for Complex Traits
Complex diseases: "multifactorial diseases are caused by multiple genes interacting with each other and with environmental factors to create a gradient of genetic susceptibility to disease." (Weeks and Lathrop 1995)
Bardet-Biedl Syndrome (BBS): Burghes et al. (2001), Science Vol 293: 2213-2214

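Haplotypic Transmission Disequilibrium
Assume each marker has two alleles (M, m). Assume a parent of a patient has two haplotypes, $h_1$ and $h_2$ (illustrated on the slide across 10 markers). Then

\[
P(h_1 \text{ trans.},\, h_2 \text{ untrans.} \mid \text{diseased child})
= \frac{P(\text{diseased} \mid h_1 \text{ trans.},\, h_2 \text{ untrans.})}{P(\text{diseased})}\; P(h_1 \text{ trans.},\, h_2 \text{ untrans.})
\]
\[
= P(h_1 \text{ trans.},\, h_2 \text{ untrans.}) \sum_{\text{trans. alleles at loci } a, b, c}
\frac{P(\text{diseased} \mid \text{trans. alleles at loci } a, b, c)}{P(\text{diseased})}\;
p(\text{trans. alleles at } a, b, c \mid h_1 \text{ trans. \& } h_2 \text{ untrans.})
\]

The ratio $P(\text{diseased} \mid \cdot)/P(\text{diseased})$ reflects the interaction among disease genes; the factor $p(\cdot \mid h_1 \text{ trans. \& } h_2 \text{ untrans.})$ reflects linkage/LD between the markers and the disease loci.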

Current Genome Scan methods and Multipoint algorithms
Multiple individual linkage or association tests: hard to establish significance; cannot handle interaction among disease genes.
Haplotype TD tests for tightly linked markers: require the reconstruction of the transmission matrix, and the matrix is too large and sparse to carry out a valid $\chi^2$ test.
The call for more generalized haplotype-based methods:
"there seems to be a great need for the development of multilocus tests of association that make use of haplotype information, since these might prove to be much more efficient." J. Pritchard and M. Przeworski (June 2001). Am. J. Hum. Genet. 69: 1-14
"the most informative and cost effective method of LD mapping [is] that based on haplotypes." D. Clayton, J. Todd et al. (October 2001). Nature Genetics Volume 29: 233-237

BHTA Introduction
Backward Haplotype Transmission Association (BHTA) algorithm: a fast multi-point screening algorithm based on haplotypic transmission disequilibrium.
Multipoint screening.
Fast and memory-efficient.
Uses haplotypic transmission information, taking into account possible interactions among disease susceptibility loci.
Automatically selects a set of important markers as the screening result.

Results from the 2001 paper on Gilles de la Tourette Syndrome by Ingrid Simonic, Jurg Ott, et al. American Journal of Medical Genetics (Neuropsychiatric Genetics) 105:163-167 (2001)
Results Summary From TDT and HRR Analyses
Column groups in the original table: Chr; Locus; cM from p-tel; Original Case-Control Study (p); Follow-Up Familial TDT and HRR Study (1-Locus TDT p; Extended Haplotypes: 2-Locus TDT p, 2-Locus HRR p, 3-Locus HRR p).

Chr   Locus        cM from p-tel   p-values (in extracted order)
2     D2S139       101.56          0.039
2     D2S440       103.16          0.002, 0.734
2     D2S417       106.84          0.160
8     D8S1119      101.01          0.01, 0.349
8     T7-27        103.69          0.835
8     D8S270       103.69          0.823
8     GATA28F12    104.33          0.056
8     D8S257       111.68          0.01, 0.638
11    D11S1377     120.87          < 0.000001, 0.022
11    D11S1353     122.47          0.135
11    D11S933      124.07          0.0009, 0.535
20    D20S1085     82.07           0.0001, 0.240
20    D20S469      84.78           0.1, 0.411
21    GATA45C03    31.26           0.0004, 0.221
21    D21S1252     35.45           0.000008, 0.279

Extended-haplotype p-values (row alignment not recoverable): 0.007, 0.025, 0.059, 0.013, 0.109, 0.108, 0.011

Result from BHTA algorithm
5 markers identified by BHTA screening:
Chr 2: D2S440
Chr 8: GATA28F12, D8S559
Chr 11: D11S1377
Chr 20: D20S469
Evaluated using permutations under the null hypothesis: p-value of order $10^{-4}$.
[Figure: Haplotypic Transmission Disequilibrium. Observed frequency untransmitted (y-axis, 0 to 25) against observed frequency transmitted to the patient (x-axis, 0 to 25). Haplotypes are defined by the 5 markers selected.]

Data: genetic information of a random sample of $n$ patients and their parents.
This gives $2n$ parent-patient transmission pairs. Each pair consists of two haplotypes, one transmitted and the other untransmitted.
For the $k$-th pair, let $t_k$ be the haplotype transmitted to the diseased child, and $u_k$ be the untransmitted one.
Define $n_{i,j} = \#(t_k = h_i,\ u_k = h_j)$ as the number of transmissions in which $h_i$ was transmitted to the diseased child and $h_j$ was left untransmitted.

Counts summarizing the haplotype transmissions are also defined as follows:

\[
n_i^t = \#(t_k = h_i) = \sum_j n_{i,j}, \qquad
n_i^u = \#(u_k = h_i) = \sum_j n_{j,i}.
\]

A statistic, haplotype transmission disequilibrium (HTD), is defined to measure the amount of linkage/LD information contained in the set of markers being tested:

\[
\mathrm{HTD} = \sum_i \left(n_i^t - n_i^u\right)^2,
\]

whose expectation under the null hypothesis is proportional to the trace of the Fisher information matrix using haplotype relative risks.
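The counts and the HTD statistic defined above can be computed directly from the list of transmission pairs. The sketch below is illustrative only: it assumes each haplotype is encoded as a string with one character per marker, and the function name `htd` and the toy data are hypothetical.

```python
from collections import Counter


def htd(pairs):
    """HTD = sum over haplotypes h of (n_h^t - n_h^u)^2.

    pairs: list of (t_k, u_k), the transmitted and untransmitted haplotypes.
    """
    n_t = Counter(t for t, _ in pairs)  # n_h^t: times h was transmitted
    n_u = Counter(u for _, u in pairs)  # n_h^u: times h was left untransmitted
    haplotypes = set(n_t) | set(n_u)
    # A Counter returns 0 for haplotypes it has never seen.
    return sum((n_t[h] - n_u[h]) ** 2 for h in haplotypes)


# Toy data (hypothetical): four transmission pairs over two markers.
pairs = [("AB", "ab"), ("AB", "Ab"), ("aB", "ab"), ("AB", "aB")]
n_ij = Counter(pairs)  # n_{i,j}: joint counts of (transmitted, untransmitted)
print(htd(pairs))
```

Only the marginal counts are needed, so the full transmission matrix $n_{i,j}$ never has to be stored to evaluate HTD.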

Idea of screening: Pick out markers that contain little linkage information, one at a time, until all the remaining markers are important in terms of linkage/LD with the disease.
Assume $m$ markers $S = \{M_1, M_2, \dots, M_m\}$ are being tested, and we want to evaluate the information contained in the $i$-th marker $M_i$, which has 2 alleles $a$ and $b$.
Consider $S_{-i} = S \setminus \{M_i\}$ (the $i$-th-deleted marker set). Let $H_{-i} = \{h_1, h_2, \dots, h_{|H_{-i}|}\}$ be the set of haplotypes spanned by $S_{-i}$; the counts $n_h^t$ and $n_h^u$ for $h \in H_{-i}$ can be defined as before.

Denote by $n_h^t(a)$ and $n_h^t(b)$ the number of transmissions of the enlarged haplotypes $(h, a)$ and $(h, b)$, respectively. Similarly, two counts $n_h^u(a)$ and $n_h^u(b)$ are defined for the non-transmissions of the enlarged haplotypes. It is easy to observe that

\[
n_h^t = n_h^t(a) + n_h^t(b), \qquad
n_h^u = n_h^u(a) + n_h^u(b).
\]

For example (haplotype labels as extracted; the letter case of some labels appears to have been lost):

Haplotype   T   U
ABdE        1
AbDe        1
AbdE            1
Abde            1
abde        1   1
abde        1
abde            1

h     n^t   n^u   n^t(E)   n^t(e)   n^u(E)   n^u(e)
ABd   1     0     1        0        0        0
AbD   1     0     1        0        0        0
Abd   0     2     0        0        1        1
abd   1     1     0        1        0        1
abd   1     1     1        0        0        1

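HTD for the $m$ markers, $S = \{M_1, M_2, \dots, M_m\}$:

\[
\mathrm{HTD}(m) = \sum_{h \in H_{-i}} \left[ \left(n_h^t(a) - n_h^u(a)\right)^2 + \left(n_h^t(b) - n_h^u(b)\right)^2 \right]
\]

HTD for the $m-1$ markers in $S_{-i} = S \setminus \{M_i\}$ (the $i$-th-deleted marker set):

\[
\begin{aligned}
\mathrm{HTD}_{-i}(m-1) &= \sum_{h \in H_{-i}} \left(n_h^t - n_h^u\right)^2 \\
&= \sum_{h \in H_{-i}} \left(n_h^t(a) + n_h^t(b) - n_h^u(a) - n_h^u(b)\right)^2 \\
&= \sum_{h \in H_{-i}} \left[ \left(n_h^t(a) - n_h^u(a)\right)^2 + \left(n_h^t(b) - n_h^u(b)\right)^2 \right]
 + 2 \sum_{h \in H_{-i}} \left(n_h^t(a) - n_h^u(a)\right)\left(n_h^t(b) - n_h^u(b)\right) \\
&= \mathrm{HTD}(m) + 2 \sum_{h \in H_{-i}} \left(n_h^t(a) - n_h^u(a)\right)\left(n_h^t(b) - n_h^u(b)\right)
\end{aligned}
\]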

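Thus, the amount of information brought by marker $M_i$ can be evaluated using the HTD difference, the information drop:

\[
\Delta\mathrm{HTD}_{-i}(m-1) = \mathrm{HTD}_{-i}(m-1) - \mathrm{HTD}(m)
= 2 \sum_{h \in H_{-i}} \left(n_h^t(a) - n_h^u(a)\right)\left(n_h^t(b) - n_h^u(b)\right)
\]

Haplotype Transmission Association (HTA) is introduced as follows:

\[
\mathrm{HTA}_i(m) = \sum_{h \in H_{-i}} \left(n_h^t(a) - n_h^u(a)\right)\left(n_h^t(b) - n_h^u(b)\right)
+ \sum_{h \in H_{-i}} n\big((h, a), (h, b)\big),
\]

where $n\big((h, a), (h, b)\big)$ counts the transmission pairs made up of the two enlarged haplotypes $(h, a)$ and $(h, b)$.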
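The relation between the information drop and the cross-product term can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' implementation: haplotypes are strings with one character per marker, markers are biallelic, the helper names `project`, `htd`, `cross_term` and `hta` are hypothetical, and the correction term is assumed to count pairs whose two haplotypes agree on every other marker and differ at marker i, in either transmission order.

```python
from collections import Counter


def project(h, markers):
    """Restrict haplotype string h to the given marker positions."""
    return tuple(h[j] for j in markers)


def htd(pairs, markers):
    """HTD over the haplotypes spanned by `markers`."""
    n_t = Counter(project(t, markers) for t, _ in pairs)
    n_u = Counter(project(u, markers) for _, u in pairs)
    return sum((n_t[h] - n_u[h]) ** 2 for h in set(n_t) | set(n_u))


def cross_term(pairs, markers, i):
    """Return (sum_h (n^t(a)-n^u(a)) (n^t(b)-n^u(b)), correction count)."""
    rest = [j for j in markers if j != i]
    d = Counter()   # d[(h, allele)] = n_h^t(allele) - n_h^u(allele)
    same_h = 0      # pairs made of (h, a) and (h, b), either order (assumption)
    for t, u in pairs:
        ht, hu = project(t, rest), project(u, rest)
        d[(ht, t[i])] += 1
        d[(hu, u[i])] -= 1
        if ht == hu and t[i] != u[i]:
            same_h += 1
    alleles = sorted({allele for _, allele in d})
    if len(alleles) > 2:
        raise ValueError("marker is not biallelic")
    if len(alleles) < 2:
        return 0, same_h
    a, b = alleles
    return sum(d[(h, a)] * d[(h, b)] for h in {h for h, _ in d}), same_h


def hta(pairs, markers, i):
    cross, same_h = cross_term(pairs, markers, i)
    return cross + same_h


# Toy data (hypothetical): four transmission pairs over two markers.
pairs = [("AB", "ab"), ("AB", "Ab"), ("aB", "ab"), ("AB", "aB")]
markers = [0, 1]
for i in markers:
    rest = [j for j in markers if j != i]
    drop = htd(pairs, rest) - htd(pairs, markers)
    print(i, drop, 2 * cross_term(pairs, markers, i)[0], hta(pairs, markers, i))
```

For each marker the printed information drop equals twice the cross-product sum, which is the identity derived above; the HTA value then adds the assumed correction count.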

The properties of the HTA statistic, by the expectation of $\mathrm{HTA}_i(m)$ (from most important to least important marker):
Negative: $M_i$ contributes important linkage information to the current marker set.
Zero: $M_i$ only contains redundant linkage information that is already carried by some markers in the set $S_{-i}$ (the $i$-th-deleted marker set); or no marker in the data set is associated with any disease susceptibility loci.
Positive: $M_i$ contributes no linkage information but noise to the data, and dilutes the true linkage/association information carried by other markers.

BHTA algorithm based on the HTA statistic
Data: $S = \{M_1, M_2, \dots, M_K\}$, where $K$ is the total number of markers; $m$ is the number of markers retained in $S$.
1. For each $i = 1, 2, \dots, m$, calculate $\mathrm{HTA}_i(m)$ for $M_i$.
2. Any positive HTA? If yes, delete the marker with the highest $\mathrm{HTA}_i(m)$ in $S$ and continue in the loop.
3. If no, return $S$ as the screening result with an HTD score.
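The backward-elimination loop described on this slide can be sketched as follows. This is a minimal illustration, not the authors' software: haplotypes are assumed to be strings with one character per biallelic marker, all function names are hypothetical, and the HTA correction term is assumed to count pairs that agree on all other markers and differ at the marker being scored.

```python
from collections import Counter


def project(h, markers):
    """Restrict haplotype string h to the given marker positions."""
    return tuple(h[j] for j in markers)


def htd(pairs, markers):
    n_t = Counter(project(t, markers) for t, _ in pairs)
    n_u = Counter(project(u, markers) for _, u in pairs)
    return sum((n_t[h] - n_u[h]) ** 2 for h in set(n_t) | set(n_u))


def hta(pairs, markers, i):
    rest = [j for j in markers if j != i]
    d = Counter()   # d[(h, allele)] = n_h^t(allele) - n_h^u(allele)
    same_h = 0      # assumed correction: pairs differing only at marker i
    for t, u in pairs:
        ht, hu = project(t, rest), project(u, rest)
        d[(ht, t[i])] += 1
        d[(hu, u[i])] -= 1
        if ht == hu and t[i] != u[i]:
            same_h += 1
    alleles = sorted({allele for _, allele in d})
    if len(alleles) < 2:
        return same_h
    a, b = alleles[:2]  # markers are assumed biallelic
    return sum(d[(h, a)] * d[(h, b)] for h in {h for h, _ in d}) + same_h


def bhta(pairs, n_markers):
    """Delete the marker with the highest positive HTA until none is positive."""
    markers = list(range(n_markers))
    while markers:
        scores = {i: hta(pairs, markers, i) for i in markers}
        worst = max(scores, key=scores.get)
        if scores[worst] <= 0:
            break                 # no positive HTA left: stop screening
        markers.remove(worst)     # drop the least informative marker
    return markers, htd(pairs, markers)


# Toy data (hypothetical): four transmission pairs over two markers.
pairs = [("AB", "ab"), ("AB", "Ab"), ("aB", "ab"), ("AB", "aB")]
selected, score = bhta(pairs, n_markers=2)
print(selected, score)
```

Each pass rescans the transmission pairs once per retained marker, so no transmission matrix is rebuilt; the toy data are far too small to say anything about real screening behavior.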

Evaluation of the significance
1. Multipoint TDT tests
2. Permutation test (Lazzeroni and Lange 1998)
For the $k$-th patient-parent transmission pair $(t_k, u_k)$, $t_k$ is the haplotype that was actually transmitted to the diseased child, and $u_k$ is the untransmitted one. Permute the transmitted label under the null hypothesis for each pair to get a duplicate of the original data.
* Use independent validation data, since the BHTA screening algorithm tends to find the maximal transmission disequilibrium in the data.
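The label permutation described above can be sketched as below. It is an illustration under assumptions rather than the procedure used in the talk: the transmitted and untransmitted labels of each pair are swapped independently with probability 1/2, HTD is used as the test statistic, and the function names, the seed, and the number of permutations are all hypothetical choices.

```python
import random
from collections import Counter


def htd(pairs):
    """HTD = sum over haplotypes of (n_h^t - n_h^u)^2."""
    n_t = Counter(t for t, _ in pairs)
    n_u = Counter(u for _, u in pairs)
    return sum((n_t[h] - n_u[h]) ** 2 for h in set(n_t) | set(n_u))


def permutation_p_value(pairs, statistic=htd, n_perm=2000, seed=0):
    """Monte Carlo p-value from random swaps of the transmitted label."""
    rng = random.Random(seed)
    observed = statistic(pairs)
    hits = 0
    for _ in range(n_perm):
        # Under the null, either haplotype of a pair is equally likely
        # to have been the transmitted one.
        permuted = [(u, t) if rng.random() < 0.5 else (t, u) for t, u in pairs]
        if statistic(permuted) >= observed:
            hits += 1
    # The +1 terms count the observed data as one of the permutations.
    return (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): allele B transmitted in all 10 pairs.
p_strong = permutation_p_value([("B", "b")] * 10)
# Toy data with perfectly balanced transmissions.
p_null = permutation_p_value([("B", "b"), ("b", "B")])
print(p_strong, p_null)
```

When the statistic is evaluated on a marker set chosen by screening the same data, the selection step would need to be repeated on every permuted replicate, or independent validation data used, in line with the caution on the slide.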

Simulation using the BBS model
At each disease susceptibility locus, assume there are three markers (with strong linkage/LD) being tested.
The result from individual TDT:
[Figure: Marginal TDT performance, 150 patients. The acceptance region is marked; y-axis 0 to 8; x-axis markers 1 to 30.]

BHTA's power detecting true linkage:
Overall power detecting the 3 susceptibility loci (the resulting marker set covers all of them) is 75%.
For each of the loci, the detecting power of the BHTA screening algorithm using 150 patients is 90%.
[Figure: "30 mk, (nolk)", 150 patients, 1000 screenings; y-axis colsums(geneout04.data2)[1:30], 0 to 600; x-axis Index, 0 to 30. Overall power detecting the 3 loci: 75%.]

Information flow during BHTA screening
[Figure: information (y-axis, 0 to 14000) against screening steps (x-axis, 0 to 30).]
Red dot: marker deleted is unassociated. Blue dot: marker deleted is associated.
Black line: under true linkage (alternative). Blue/yellow line: no linkage/LD (null).
Vertical lines: where the screening stopped (alternative, null).

Problem: Sample size and the number of markers affect the performance of the screening. Sparseness at the beginning of the screening leads to random deletion.
Solution: two-stage screening.
[Figure: "100 mk, (lk)", 250 patients, 5000 simulations; y-axis colsums(geneout04.data2)[1:100], 100 to 600; x-axis Index, 0 to 100.]


Future efforts
1. Extension to more generalized complex traits: quantitative traits; onset age.
2. Further investigation of the two-stage screening procedure: reasonable threshold; performance.
3. Develop procedures based on this algorithm for large sets of markers, say more than 1000.
4. Design of an algorithm based on the same idea for patients with only affected sibling or relatives information.
5. Possible development of related software.

Summary
BHTA algorithm: a fast haplotype-based algorithm able to handle complicated interaction among disease loci.
Works without a specific disease model preassumed.
Able to handle 100+ markers at the same time.
Time efficient: no reconstruction of the transmission matrix, and avoids tedious computation.
Especially useful if some candidate loci or regions have been found.