Calculation of IBD probabilities

Similar documents
Calculation of IBD probabilities

The Lander-Green Algorithm. Biostatistics 666 Lecture 22

Genotype Imputation. Biostatistics 666

Affected Sibling Pairs. Biostatistics 666

Tutorial Session 2. MCMC for the analysis of genetic data on pedigrees:

MCMC IN THE ANALYSIS OF GENETIC DATA ON PEDIGREES

Lecture 9. QTL Mapping 2: Outbred Populations

The Quantitative TDT

2. Map genetic distance between markers

For 5% confidence χ 2 with 1 degree of freedom should exceed 3.841, so there is clear evidence for disequilibrium between S and M.

Gene mapping, linkage analysis and computational challenges. Konstantin Strauch

The universal validity of the possible triangle constraint for Affected-Sib-Pairs

1 Springer. Nan M. Laird Christoph Lange. The Fundamentals of Modern Statistical Genetics

UNIT 8 BIOLOGY: Meiosis and Heredity Page 148

AFFECTED RELATIVE PAIR LINKAGE STATISTICS THAT MODEL RELATIONSHIP UNCERTAINTY

Inference on pedigree structure from genome screen data. Running title: Inference on pedigree structure. Mary Sara McPeek. The University of Chicago

Breeding Values and Inbreeding. Breeding Values and Inbreeding

p(d g A,g B )p(g B ), g B

The genomes of recombinant inbred lines

Prediction of the Confidence Interval of Quantitative Trait Loci Location

Natural Selection. Population Dynamics. The Origins of Genetic Variation. The Origins of Genetic Variation. Intergenerational Mutation Rate

Lecture 2: Genetic Association Testing with Quantitative Traits. Summer Institute in Statistical Genetics 2017

Variance Component Models for Quantitative Traits. Biostatistics 666

Resemblance between relatives

MEIOSIS, THE BASIS OF SEXUAL REPRODUCTION

Lecture WS Evolutionary Genetics Part I 1

Identification of Pedigree Relationship from Genome Sharing

Gene mapping in model organisms

Analytic power calculation for QTL linkage analysis of small pedigrees

Objectives. Announcements. Comparison of mitosis and meiosis

genome a specific characteristic that varies from one individual to another gene the passing of traits from one generation to the next

Mathematical models in population genetics II

Statistical issues in QTL mapping in mice

Chapter 6 Linkage Disequilibrium & Gene Mapping (Recombination)

GBLUP and G matrices 1

Lecture 6. QTL Mapping

Life Cycles, Meiosis and Genetic Variability24/02/2015 2:26 PM

Computational Approaches to Statistical Genetics

Evaluating the Performance of a Block Updating McMC Sampler in a Simple Genetic Application

I Have the Power in QTL linkage: single and multilocus analysis

Complexity and Approximation of the Minimum Recombination Haplotype Configuration Problem

SOLUTIONS TO EXERCISES FOR CHAPTER 9

SNP Association Studies with Case-Parent Trios

TOPICS IN STATISTICAL METHODS FOR HUMAN GENE MAPPING

Solutions to Problem Set 4

Introduction to population genetics & evolution

Lecture 11: Multiple trait models for QTL analysis

THEORETICAL ASPECTS OF PEDIGREE ANALYSIS

The E-M Algorithm in Genetics. Biostatistics 666 Lecture 8

Linkage and Linkage Disequilibrium

Resultants in genetic linkage analysis

Unit 6 Reading Guide: PART I Biology Part I Due: Monday/Tuesday, February 5 th /6 th

Homework Assignment, Evolutionary Systems Biology, Spring Homework Part I: Phylogenetics:

Dropping Your Genes. A Simulation of Meiosis and Fertilization and An Introduction to Probability

F1 Parent Cell R R. Name Period. Concept 15.1 Mendelian inheritance has its physical basis in the behavior of chromosomes

Population Genetics. with implications for Linkage Disequilibrium. Chiara Sabatti, Human Genetics 6357a Gonda

Lesson 4: Understanding Genetics

(Genome-wide) association analysis

Genetics (patterns of inheritance)

MODEL-FREE LINKAGE AND ASSOCIATION MAPPING OF COMPLEX TRAITS USING QUANTITATIVE ENDOPHENOTYPES

Association Testing with Quantitative Traits: Common and Rare Variants. Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5

Notes on Population Genetics

Outline. P o purple % x white & white % x purple& F 1 all purple all purple. F purple, 224 white 781 purple, 263 white

Methods for QTL analysis

Constructing a Pedigree

Use of hidden Markov models for QTL mapping

Optimal Allele-Sharing Statistics for Genetic Mapping Using Affected Relatives

Introduction to Genetics

Methods for Cryptic Structure. Methods for Cryptic Structure

Q Expected Coverage Achievement Merit Excellence. Punnett square completed with correct gametes and F2.

Quantitative Genetics & Evolutionary Genetics

Concept 15.1 Mendelian inheritance has its physical basis in the behavior of chromosomes

On Computation of P-values in Parametric Linkage Analysis

Modeling IBD for Pairs of Relatives. Biostatistics 666 Lecture 17

Quantitative Genetics

MIXED MODELS THE GENERAL MIXED MODEL

QTL mapping under ascertainment

Computation of Multilocus Prior Probability of Autozygosity for Complex Inbred Pedigrees

Introduction to Genetics

Covariance between relatives

Univariate Linkage in Mx. Boulder, TC 18, March 2005 Posthuma, Maes, Neale

Linkage analysis and QTL mapping in autotetraploid species. Christine Hackett Biomathematics and Statistics Scotland Dundee DD2 5DA

A mixed model based QTL / AM analysis of interactions (G by G, G by E, G by treatment) for plant breeding

Introduc)on to Gene)cs How to Analyze Your Own Genome Fall 2013

Meiosis vs Mitosis. How many times did it go through prophase-metaphase-anaphase-telophase?

MOLECULAR MAPS AND MARKERS FOR DIPLOID ROSES

Mapping quantitative trait loci in oligogenic models

Genetic Association Studies in the Presence of Population Structure and Admixture

Proportional Variance Explained by QLT and Statistical Power. Proportional Variance Explained by QTL and Statistical Power

Introduction to QTL mapping in model organisms

Lander-Green Algorithm. Biostatistics 666

Biol. 303 EXAM I 9/22/08 Name

Parts 2. Modeling chromosome segregation

BIOLOGY LTF DIAGNOSTIC TEST MEIOSIS & MENDELIAN GENETICS

Bayesian Inference of Interactions and Associations

CSci 8980: Advanced Topics in Graphical Models Analysis of Genetic Variation

Guided Notes Unit 6: Classical Genetics

Lecture 3: Probability

Heredity and Genetics WKSH

Causal Graphical Models in Systems Genetics

Transcription:

Calculation of IBD probabilities David Evans University of Bristol

This Session Identity by Descent (IBD) vs Identity by state (IBS) Why is IBD important? Calculating IBD probabilities Lander-Green Algorithm (MERLIN) Single locus probabilities Hidden Markov Model > Multipoint IBD Other ways of calculating IBD status Elston-Stewart Algorithm MCMC approaches MERLIN Practical Example IBD determination Information content mapping SNPs vs micro-satellite markers?

Identity By Descent (IBD) 2 3 4 2 3 3 4 3 2 Identical by Descent Identical by state only Two alleles are IBD if they are descended from the same ancestral allele

Example: IBD in Siblings Consider a mating between mother AB x father CD: Sib 2 Sib AC AD BC BD AC 2 0 AD 2 0 BC 0 2 BD 0 2 IBD 0 : : 2 25% : 50% : 25%

Why is IBD Sharing Important? 3/4 /2 3/4 /3 2/4 /4 4/4 Affected relatives not only share disease alleles IBD, but also tend to share marker alleles close to the disease locus IBD more often than chance 4/4 /4 IBD sharing forms the basis of nonparametric linkage statistics

Crossing over between homologous chromosomes

Cosegregation > Linkage A Q Parental genotype A Q Non-recombinant Parental genotypes (many, θ) A 2 Q 2 A Q 2 A 2 Q 2 Recombinant genotypes (few, θ) A 2 Q Alleles close together on the same chromosome tend to stay together in meiosis; therefore they tend be co-transmitted.

Segregating Chromosomes MARKER DISEASE GENE

Marker Shared Among Affecteds /2 3/4 3/4 /3 2/4 /4 4/4 4/4 /4 Genotypes for a marker with alleles {,2,3,4}

Linkage between QTL and marker QTL Marker IBD 0 IBD IBD 2

NO Linkage between QTL and marker Marker

IBD can be trivial / 2 / 2 IBD0 / 2 / 2

Two Other Simple Cases / 2 / 2 / 2 / 2 IBD2 / / 2 2 / 2 2 /

A little more complicated / 2 2 / 2 IBD (50% chance) / 2 / 2 IBD2 (50% chance)

And even more complicated IBD? / /

Bayes Theorem for IBD Probabilities j0,,2 j IBD G P j IBD P i IBD G P i IBD P G P i IBD G P i IBD P G P G i G i IBD P ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ), P(IBD ) (

P(Genotype IBD State) Sib Sib 2 P(observing genotypes k alleles IBD) k0 k k2 A A A A p 4 p 3 p 2 A A A A 2 2p 3 p 2 p 2 p 2 0 A A A 2 A 2 p 2 p 2 2 0 0 A A 2 A A 2p 3 p 2 p 2 p 2 0 A A 2 A A 2 4p 2 p 2 2 p p 2 2p p 2 A A 2 A 2 A 2 2p p 3 2 p p 2 2 0 A 2 A 2 A A p 2 p 2 2 0 0 A 2 A 2 A A 2 2p p 3 2 p p 2 2 0 A 2 A 2 A 2 A 2 p 4 2 p 3 2 p 2 2

Worked Example p 0.5 P( G IBD 0) P( G IBD ) P( G IBD 2) P( G) P( IBD 0 G) / P( IBD G) / P( IBD 2 G)

Worked Example / / 9 4 ) ( 4 ) 2 ( 9 4 ) ( 2 ) ( 9 ) ( 4 ) 0 ( 64 9 4 2 4 ) ( 4 2) ( 8 ) ( 6 0) ( 0.5 2 3 4 2 3 4 2 3 4 + + G P p G IBD P G P p G IBD P G P p G IBD P p p p G P p IBD G P p IBD G P p IBD G P p

For ANY PEDIGREE the inheritance pattern at any point in the genome can be completely described by a binary inheritance vector of length 2n: v(x) (p, m, p 2, m 2,,p n,m n ) whose coordinates describe the outcome of the paternal and maternal meioses giving rise to the n non-founders in the pedigree p i (m i ) is 0 if the grandpaternal allele transmitted p i (m i ) is if the grandmaternal allele is transmitted a / b c / d v(x) [0,0,,] a / c b / d

Inheritance Vector In practice, it is not possible to determine the true inheritance vector at every point in the genome, rather we represent partial information as a probability distribution of the possible inheritance vectors a b 2 a c p m p 2 3 a b 5 a c 4 m 2 b b Inheritance vector Prior Posterior ------------------------------------------------------------------- 0000 /6 /8 000 /6 /8 000 /6 0 00 /6 0 000 /6 /8 00 /6 /8 00 /6 0 0 /6 0 000 /6 /8 00 /6 /8 00 /6 0 0 /6 0 00 /6 /8 0 /6 /8 0 /6 0 /6 0

Computer Representation At each marker location l Define inheritance vector v l Meiotic outcomes specified in index bit Likelihood for each gene flow pattern Conditional on observed genotypes at location l 2 2n elements!!! 0000 000 000 00 000 00 00 0 000 00 00 0 00 0 0 L L L L L L L L L L L L L L L L

a) bit-indexed array 0000 000 000 00 000 00 00 0 000 00 00 0 00 0 0 L L 2 L L 2 L L 2 L L 2 b) packed tree L L 2 L L 2 L L 2 L L 2 c) sparse tree Legend Node with zero likelihood Node identical to sibling L L 2 L L 2 Likelihood for this branch Abecasis et al (2002) Nat Genet 30:97-0

Multipoint IBD IBD status may not be able to be ascertained with certainty because e.g. the mating is not informative, parental information is not available IBD information at uninformative loci can be made more precise by examining nearby linked loci

Multipoint IBD a / b / c / d 2 / IBD 0 a / c b / d / / 2 IBD 0 or IBD?

Complexity of the Problem in Larger Pedigrees For each person 2n meioses in pedigree with n non-founders Each meiosis has 2 possible outcomes Therefore 2 2n possibilities for each locus For each genetic locus One location for each of m genetic markers Distinct, non-independent meiotic outcomes Up to 4 nm distinct outcomes!!!

Example: Sib-pair Genotyped at 0 Markers Inheritance vector P(G 0000) ( θ) 4 0000 000 000 2 3 4 m 0 Marker (2 2xn ) m (2 2 x 2 ) 0 ~ 0 2 possible paths!!!

P(IBD) 2 at Marker Three IBD Inheritance vector (2) () () 0000 000 000 (2) 2 3 4 m 0 Marker (L[0000] + L[00] + L[00] + L[] ) / L[ALL]

P(IBD) 2 at arbitrary position on the chromosome Inheritance vector 0000 000 000 2 3 4 m 0 Marker (L[0000] + L[00] + L[00] + L[] ) / L[ALL]

Lander-Green Algorithm The inheritance vector at a locus is conditionally independent of the inheritance vectors at all preceding loci given the inheritance vector at the immediately preceding locus ( Hidden Markov chain )

Lander-Green Algorithm Inheritance vector 0000 000 000 2 3 4 m 0 Marker M(2 2n ) 2 0 x 6 2 2560 calculations

Lander-Green Algorithm Summary Factorize likelihood by marker Complexity m e n Strengths Large number of markers Relatively small pedigrees

Elston-Stewart Algorithm Factorize likelihood by individual Complexity n e m Small number of markers Large pedigrees With little inbreeding VITESSE, FASTLINK etc

Other methods Number of MCMC methods proposed ~Linear on # markers ~Linear on # people Hard to guarantee convergence on very large datasets Many widely separated local minima E.g. SIMWALK

MERLIN-- Multipoint Engine for Rapid Likelihood Inference

Capabilities Linkage Analysis NPL and K&C LOD Variance Components Haplotypes Most likely Sampling All IBD and info content Error Detection Most SNP typing errors are Mendelian consistent Recombination No. of recombinants per family per interval can be controlled Simulation

MERLIN Website www.sph.umich.edu/csg/abecasis/merlin Reference FAQ Source Tutorial Linkage Haplotyping Simulation Error detection IBD calculation Binaries

Input Files Pedigree File Relationships Genotype data Phenotype data Data File Describes contents of pedigree file Map File Records location of genetic markers

Example Pedigree File <contents of example.ped> 0 0 x 3 3 x x 2 0 0 2 x 4 4 x x 3 0 0 x 2 x x 4 2 2 x 4 3 x x 5 3 4 2 2.234 3 2 2 6 3 4 2 4.32 2 4 2 2 <end of example.ped> Encodes family relationships, marker and phenotype information

Data File Field Codes Code M A T C Z S[n] Description Marker Genotype Affection Status. Quantitative Trait. Covariate. Zygosity. Skip n columns.

Example Data File <contents of example.dat> T some_trait_of_interest M some_marker M another_marker <end of example.dat> Provides information necessary to decode pedigree file

Example Map File <contents of example.map> CHROMOSOME MARKER POSITION 2 D2S60 60.0 2 D2S308 65.0 <end of example.map> Indicates location of individual markers, necessary to derive recombination fractions between them

Worked Example p 0.5 P( IBD 0 G) P( IBD G) P( IBD 2 G) 9 4 9 4 9 / / merlin d example.dat p example.ped m example.map --ibd

Application: Information Content Mapping Information content: Provides a measure of how well a marker set approaches the goal of completely determining the inheritance outcome Based on concept of entropy E -ΣP i log 2 P i where P i is probability of the ith outcome I E (x) E(x)/E 0 Always lies between 0 and Does not depend on test for linkage Scales linearly with power

Application: Information Content Mapping Simulations ABI ( micro-satellite per 0cM) decode ( microsatellite per 3cM) Illumina ( SNP per 0.5cM) Affymetrix ( SNP per 0.2 cm) Which panel performs best in terms of extracting marker information? merlin d file.dat p file.ped m file.map --information

SNPs vs Microsatellites Information Content.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0. 0.0 Densities SNP microsat 0.2 cm 3 cm 0.5 cm 0 cm 0 0 20 30 40 50 60 70 80 90 00 Position (cm) SNPs + parents microsat + parents