Tian Zheng Department of Statistics Columbia University


Haplotype Transmission Association (HTA): An "Importance" Measure for Selecting Genetic Markers
Tian Zheng, Department of Statistics, Columbia University
This is joint work with Professor Shaw-Hwa Lo in the Department of Statistics at Columbia University.

Outline
- Marker selection problem in genetic mapping for complex traits
- HTA statistics
- Marker screening based on HTA
- Example
- Summary

Genetic Mapping
- Single-locus diseases: the risk of the disease is decided by the differences in one gene. Conventional approaches: marker-wise tests.
- Common diseases: complex traits of common diseases are usually caused by multiple genes with possible interactions among them. Marker-wise tests cannot capture presumed interactions among disease genes. Analysis that takes into account the possible interactions among markers should be used.

Problem
- Fine mapping: large number of available genetic markers.
- Complex traits: joint analysis of individual markers and the possible interactions.
- The number of markers and possible interactions greatly exceeds the number of patients in the study.
Solution: pre-select a small set of the most important markers so that detailed analysis that involves interactions can be carried out using data of moderate size.

Marker selection for complex traits
- Start with a large set of candidate genetic markers.
- Screen out the markers with little information regarding the disease traits.
- Take into account possible interactions among the disease genes.
- Time and memory efficient.

Data: genetic information of a random sample of n patients and their parents.
- 2n parent-patient transmission pairs.
- Each pair consists of two haplotypes, one transmitted and the other untransmitted.
For the $l$-th pair, let $h_t^{(l)}$ be the haplotype transmitted to the diseased child, and $h_u^{(l)}$ be the untransmitted one. For each haplotype $h$, define the counts

$$n_h^t = \#\{\, l : h_t^{(l)} = h \,\}, \qquad n_h^u = \#\{\, l : h_u^{(l)} = h \,\}.$$

Haplotype transmission disequilibrium (HTD) is defined to measure the amount of linkage/LD information contained in the set of markers being tested:

$$\mathrm{HTD} = \sum_{h} \left( n_h^t - n_h^u \right)^2,$$

whose expectation under the null hypothesis is equal to the trace of the Fisher information matrix parameterized by haplotype relative risks.
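The counts and the HTD sum can be sketched in a few lines of Python. This is only an illustration under stated assumptions: haplotypes are encoded as strings with one character per marker, `pairs` is a hypothetical list of (transmitted, untransmitted) tuples, and HTD is taken as the plain sum of squared count differences with no further normalization.

```python
from collections import Counter

def htd(pairs):
    """Sum over haplotypes of (transmitted count - untransmitted count)^2.

    `pairs` is a list of (transmitted, untransmitted) haplotype strings;
    this encoding is an assumption made for the sketch.
    """
    pairs = list(pairs)
    n_t = Counter(t for t, _ in pairs)  # transmission counts per haplotype
    n_u = Counter(u for _, u in pairs)  # non-transmission counts per haplotype
    haplotypes = set(n_t) | set(n_u)
    # Counter returns 0 for haplotypes never seen on one side.
    return sum((n_t[h] - n_u[h]) ** 2 for h in haplotypes)

# Toy data: four transmission pairs over two markers.
pairs = [("AB", "ab"), ("AB", "Ab"), ("aB", "AB"), ("AB", "ab")]
htd_value = htd(pairs)
```

Here haplotype `AB` is transmitted three times and untransmitted once, so it contributes (3 - 1)^2 to the sum.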

Assume $m$ markers $S_M = \{M_1, M_2, \ldots, M_m\}$ are being tested. To evaluate the information contributed by the $i$-th marker $M_i$, which has alleles $a_i$ and $b_i$, consider $S_M^{(i)} = S_M \setminus \{M_i\}$ (the $i$-th-deleted marker set). Let $H^{(i)} = \{h_1, h_2, \ldots, h_{|H^{(i)}|}\}$ be the set of haplotypes spanned by $S_M^{(i)}$; the counts $n_h^t$ and $n_h^u$ can be defined as before.

Denote by $n_h^t(a_i)$ and $n_h^t(b_i)$ the numbers of transmissions of the enlarged haplotypes $h a_i$ and $h b_i$, respectively. Similarly, two counts $n_h^u(a_i)$ and $n_h^u(b_i)$ are defined for the non-transmissions of the enlarged haplotypes. It is easy to observe that

$$n_h^t = n_h^t(a_i) + n_h^t(b_i), \qquad n_h^u = n_h^u(a_i) + n_h^u(b_i).$$

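HTD for the $m$ markers $S_M = \{M_1, M_2, \ldots, M_m\}$:

$$\mathrm{HTD}(m) = \sum_{h \in H^{(i)}} \left[ \left( n_h^t(a_i) - n_h^u(a_i) \right)^2 + \left( n_h^t(b_i) - n_h^u(b_i) \right)^2 \right]$$

HTD for the $m-1$ markers in the $i$-th-deleted marker set $S_M^{(i)} = S_M \setminus \{M_i\}$:

$$
\begin{aligned}
\mathrm{HTD}_i(m-1) &= \sum_{h \in H^{(i)}} \left( n_h^t - n_h^u \right)^2 \\
&= \sum_{h \in H^{(i)}} \left( n_h^t(a_i) + n_h^t(b_i) - n_h^u(a_i) - n_h^u(b_i) \right)^2 \\
&= \sum_{h \in H^{(i)}} \left[ \left( n_h^t(a_i) - n_h^u(a_i) \right)^2 + \left( n_h^t(b_i) - n_h^u(b_i) \right)^2 \right] \\
&\quad + 2 \sum_{h \in H^{(i)}} \left( n_h^t(a_i) - n_h^u(a_i) \right)\left( n_h^t(b_i) - n_h^u(b_i) \right) \\
&= \mathrm{HTD}(m) + 2 \sum_{h \in H^{(i)}} \left( n_h^t(a_i) - n_h^u(a_i) \right)\left( n_h^t(b_i) - n_h^u(b_i) \right)
\end{aligned}
$$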
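The decomposition can be checked numerically on toy data. The sketch below assumes haplotypes are strings with one character per marker and that marker `i` is biallelic with known allele characters `a` and `b`; the helper names `drop_marker` and `cross_term` are hypothetical, introduced only for this illustration.

```python
from collections import Counter

def htd(pairs):
    """Sum over haplotypes of (transmitted count - untransmitted count)^2."""
    n_t = Counter(t for t, _ in pairs)
    n_u = Counter(u for _, u in pairs)
    return sum((n_t[h] - n_u[h]) ** 2 for h in set(n_t) | set(n_u))

def drop_marker(hap, i):
    """Haplotype on the i-th-deleted marker set (position i removed)."""
    return hap[:i] + hap[i + 1:]

def cross_term(pairs, i, a, b):
    """Sum over reduced haplotypes h of
    (n_t(h,a) - n_u(h,a)) * (n_t(h,b) - n_u(h,b))."""
    n_t = Counter(t for t, _ in pairs)
    n_u = Counter(u for _, u in pairs)
    reduced = {drop_marker(h, i) for h in set(n_t) | set(n_u)}
    total = 0
    for h in reduced:
        ha = h[:i] + a + h[i:]  # h enlarged with allele a at position i
        hb = h[:i] + b + h[i:]  # h enlarged with allele b at position i
        total += (n_t[ha] - n_u[ha]) * (n_t[hb] - n_u[hb])
    return total

pairs = [("AB", "ab"), ("AB", "Ab"), ("aB", "AB"), ("AB", "ab")]
i = 1  # evaluate the second marker, with alleles "B" and "b"
reduced_pairs = [(drop_marker(t, i), drop_marker(u, i)) for t, u in pairs]

htd_full = htd(pairs)             # HTD(m)
htd_reduced = htd(reduced_pairs)  # HTD_i(m-1)
cross = cross_term(pairs, i, "B", "b")
```

On this toy data the reduced-set HTD equals the full-set HTD plus twice the cross term, as the expansion of the square requires.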

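Thus, the amount of information brought by marker $M_i$ can be evaluated using the HTD difference, the information drop:

$$
\begin{aligned}
\Delta\mathrm{HTD}_i(m-1) &= \mathrm{HTD}_i(m-1) - \mathrm{HTD}(m) \\
&= 2 \sum_{h \in H^{(i)}} \left( n_h^t(a_i) - n_h^u(a_i) \right)\left( n_h^t(b_i) - n_h^u(b_i) \right)
\end{aligned}
$$

Haplotype Transmission Association (HTA) is

$$\mathrm{HTA}_i(m) = \sum_{h \in H^{(i)}} \left( n_h^t(a_i) - n_h^u(a_i) \right)\left( n_h^t(b_i) - n_h^u(b_i) \right) + \sum_{h \in H^{(i)}} n(h a_i, h b_i)$$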

The properties of the HTA statistic, by the sign of the expectation of $\mathrm{HTA}_i(m)$:
- Negative (most important): $M_i$ contributes important linkage information to the current marker set.
- Zero: $M_i$ contains no linkage information and no marker in the data set is associated with any disease susceptibility loci.
- Positive (least important): $M_i$ contributes little linkage information but adds noise to the data, and dilutes the true linkage/association information carried by other markers.

Marker selection algorithm based on the HTA statistic
Data: $S_M = \{M_1, M_2, \ldots, M_K\}$. $K$ is the total number of markers; $m$ is the number of markers retained in $S_M$.
1. For each $i = 1, 2, \ldots, m$, calculate $\mathrm{HTA}_i(m)$ for $M_i$.
2. Any non-negative HTA?
   Yes: delete the marker with the highest $\mathrm{HTA}_i(m)$ in $S_M$ and continue in the loop (go to step 1).
   No: return $S_M$ as the screening result.
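The backward loop can be sketched as follows. This is a simplified illustration, not the authors' implementation: the default score uses only the cross-product sum $\sum_h (n_h^t(a_i) - n_h^u(a_i))(n_h^t(b_i) - n_h^u(b_i))$ as a stand-in for the full HTA statistic, haplotypes are assumed to be strings with one character per biallelic marker, and the names `cross_product_score` and `backward_screen` are hypothetical.

```python
from collections import Counter

def cross_product_score(pairs, i):
    """Stand-in score for marker i: the cross-product sum only.
    Any additional term of the full HTA statistic is NOT included here."""
    n_t = Counter(t for t, _ in pairs)
    n_u = Counter(u for _, u in pairs)
    observed = set(n_t) | set(n_u)
    alleles = sorted({h[i] for h in observed})
    if len(alleles) != 2:
        return 0  # monomorphic marker: no cross term (assumption of this sketch)
    a, b = alleles
    reduced = {h[:i] + h[i + 1:] for h in observed}
    total = 0
    for h in reduced:
        ha = h[:i] + a + h[i:]
        hb = h[:i] + b + h[i:]
        total += (n_t[ha] - n_u[ha]) * (n_t[hb] - n_u[hb])
    return total

def backward_screen(pairs, names, score=cross_product_score):
    """Repeatedly delete the marker with the highest score while any score
    is non-negative; return the names of the retained markers."""
    pairs = list(pairs)
    names = list(names)
    while names:
        scores = [score(pairs, i) for i in range(len(names))]
        worst = max(range(len(names)), key=scores.__getitem__)
        if scores[worst] < 0:
            break  # every retained marker has a negative score: stop
        # Remove the marker's position from every haplotype and its name.
        pairs = [(t[:worst] + t[worst + 1:], u[:worst] + u[worst + 1:])
                 for t, u in pairs]
        del names[worst]
    return names

# Toy data: allele "A" at M1 is always transmitted; M2 carries no signal.
pairs = [("AX", "ax"), ("Ax", "aX"), ("AX", "aX"), ("Ax", "ax")]
retained = backward_screen(pairs, ["M1", "M2"])
```

Passing the score as a parameter keeps the loop independent of how the statistic is computed, so a complete HTA implementation could be substituted without changing the screening logic.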

Example: a complex disease with three susceptibility loci A, B, and E.

Haplotype   Haplotypic Relative Risk (HRR)
ABE         high
ABe         low
AbE         low
Abe         high
aBE         low
aBe         high
abE         high
abe         low

20 markers are generated, with 3 of them in strong linkage/disequilibrium with the disease loci, respectively.

Average HTA values for linked and unlinked markers

loop   linked   unlinked
1      0        0
2      0        -1
3      0        -2
4      1        -1
5      0        -3
6      -2       0
7      0        -6
8      2        -6
9      14       -15
10     36       -41
11     93       -71
12     174      -114
13     390      -343
14     805      -716
15     1626     -1568
16     2864     -3317
17     5538     -6563
18     NA       -12549
19     NA       -968
20     NA       -1296

[Figure: HTD values during the screening. "+" marks a linked marker, "o" marks an unlinked marker; the point at which screening stopped is indicated.]

Summary
- HTA statistic and BHTA algorithm.
- HTA measures the importance of a marker in terms of the amount of information contributed by it.
- The screening algorithm based on HTA is fast and haplotype-based.
- It is able to handle complicated interactions among disease loci.
Reference paper: Lo, Shaw-Hwa, and Zheng, Tian. Backward Haplotype Transmission Association (BHTA) Algorithm: A Fast Multiple-Marker Screening Method. Hum Hered. To appear.