Introduction to Linkage Disequilibrium

Introduction to Linkage Disequilibrium. September 10, 2014

Suppose we have two genes on a single chromosome, gene A and gene B, such that each gene has only two alleles:

A alleles: $A_1$ and $A_2$
B alleles: $B_1$ and $B_2$

Possible allele combinations: $A_1B_1$, $A_1B_2$, $A_2B_1$ and $A_2B_2$

$p_1$ = probability of seeing allele $A_1$
$p_2$ = probability of seeing allele $A_2$
$p_1 + p_2 = 1$

By the Hardy-Weinberg principle,
probability of genotype $A_1A_1$ is $p_1^2$
probability of genotype $A_1A_2$ is $2p_1p_2$
probability of genotype $A_2A_2$ is $p_2^2$

HW equilibrium is about a single locus (with two alleles). How do we generalize to two loci?
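As a quick illustration of the Hardy-Weinberg proportions, here is a minimal Python sketch. The function name `hardy_weinberg` and the allele frequency 0.3 are illustrative assumptions, not part of the lecture.

```python
def hardy_weinberg(p1):
    """Expected genotype probabilities (A1A1, A1A2, A2A2) under Hardy-Weinberg."""
    if not 0.0 <= p1 <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    p2 = 1.0 - p1  # the two allele frequencies sum to 1
    return p1 ** 2, 2 * p1 * p2, p2 ** 2


# 0.3 is an arbitrary illustrative allele frequency (assumption).
hom_11, het_12, hom_22 = hardy_weinberg(0.3)
print(hom_11, het_12, hom_22)
```

The three probabilities always sum to 1, because $p_1^2 + 2p_1p_2 + p_2^2 = (p_1 + p_2)^2$.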

Linkage Equilibrium: two sites/genes, each with two alleles

Linkage Equilibrium: random association between the two loci
Linkage Disequilibrium: correlation between the two loci

$p_{11}$ = probability of seeing the $A_1B_1$ haplotype
$p_{12}$ = probability of seeing the $A_1B_2$ haplotype
$p_{21}$ = probability of seeing the $A_2B_1$ haplotype
$p_{22}$ = probability of seeing the $A_2B_2$ haplotype

Write $q_1$ and $q_2$ for the probabilities of seeing alleles $B_1$ and $B_2$, with $q_1 + q_2 = 1$. The sites are in Linkage Equilibrium if
$p_{11} = p_1q_1$, $p_{12} = p_1q_2$, $p_{21} = p_2q_1$, $p_{22} = p_2q_2$.

Linkage Disequilibrium is a deviation from this equilibrium:
$$D = p_{11} - p_1q_1$$
Note that
$p_{11} = p_1q_1 + D$, $p_{12} = p_1q_2 - D$, $p_{21} = p_2q_1 - D$, $p_{22} = p_2q_2 + D$.
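The four identities above can be turned into a small Python sketch that rebuilds the haplotype frequencies from the allele frequencies and D. The function name `haplotype_frequencies` and the input values are illustrative assumptions.

```python
def haplotype_frequencies(p1, q1, d):
    """Haplotype frequencies implied by allele frequencies p1, q1 and LD coefficient d."""
    p2, q2 = 1.0 - p1, 1.0 - q1
    freqs = {
        "A1B1": p1 * q1 + d,
        "A1B2": p1 * q2 - d,
        "A2B1": p2 * q1 - d,
        "A2B2": p2 * q2 + d,
    }
    # Every haplotype frequency must be a valid probability.
    if any(f < -1e-12 for f in freqs.values()):
        raise ValueError("d is outside the range allowed by p1 and q1")
    return freqs


# Illustrative values (assumption): p1 = 0.5, q1 = 0.6, D = 0.1.
freqs = haplotype_frequencies(0.5, 0.6, 0.1)
print(freqs)
```

The +D and -D terms cancel in pairs, so the four frequencies sum to 1 for any admissible D, and the marginal allele frequencies are unchanged.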

Lemma: $D = p_{11}p_{22} - p_{12}p_{21}$.

Proof:
$$p_{11}p_{22} = (p_1q_1 + D)(p_2q_2 + D) = p_1q_1p_2q_2 + p_1q_1D + p_2q_2D + D^2$$
$$p_{12}p_{21} = (p_1q_2 - D)(p_2q_1 - D) = p_1q_1p_2q_2 - p_2q_1D - p_1q_2D + D^2$$
And by subtracting these, we obtain
$$p_{11}p_{22} - p_{12}p_{21} = D(p_1q_1 + p_2q_1 + p_2q_2 + p_1q_2) = D \cdot 1 = D,$$
where the bracket equals $(p_1 + p_2)(q_1 + q_2) = 1$.
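The lemma can also be checked numerically. The sketch below draws random haplotype frequencies and compares the definition of D with the product formula. The function names and the random seed are illustrative assumptions.

```python
import random


def d_from_definition(p11, p12, p21, p22):
    """D = p11 - p1*q1, with p1 and q1 obtained as marginal allele frequencies."""
    p1 = p11 + p12  # frequency of allele A1
    q1 = p11 + p21  # frequency of allele B1
    return p11 - p1 * q1


def d_from_lemma(p11, p12, p21, p22):
    """D = p11*p22 - p12*p21, the form stated in the lemma."""
    return p11 * p22 - p12 * p21


rng = random.Random(0)  # fixed seed so the check is reproducible
max_gap = 0.0
for _ in range(1000):
    weights = [rng.random() for _ in range(4)]
    total = sum(weights)
    freqs = [w / total for w in weights]  # normalise so the frequencies sum to 1
    gap = abs(d_from_definition(*freqs) - d_from_lemma(*freqs))
    max_gap = max(max_gap, gap)

print(max_gap)  # expected to be at floating-point rounding level
```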

What is the range of D?

Let
$$D_{\min} = \max\{-p_1q_1, -p_2q_2\}, \qquad D_{\max} = \min\{p_1q_2, p_2q_1\}$$
Now define:
$$D' = \begin{cases} D / D_{\max}, & D > 0 \\ D / D_{\min}, & D < 0 \end{cases}$$

Since $p_{11} = p_1q_1 + D$, and $p_1q_1 + D \ge 0$ (since $p_{11}$ is a probability), this implies $D \ge -p_1q_1$ (and similarly $D \ge -p_2q_2$, from $p_{22} \ge 0$).

For both to be satisfied, it must be that
$$D \ge \max\{-p_1q_1, -p_2q_2\}$$
Similarly, from $p_{12} \ge 0$ and $p_{21} \ge 0$,
$$D \le \min\{p_1q_2, p_2q_1\}$$

For the two loci that we are considering, each locus individually is in Hardy-Weinberg equilibrium, but together a disequilibrium exists.
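The bounds and the normalised measure D' can be sketched in Python as follows. The function names `d_bounds` and `d_prime` and the input values are illustrative assumptions; returning 0 when D = 0 is a convention chosen here, since the definition only covers D > 0 and D < 0.

```python
def d_bounds(p1, q1):
    """Return (D_min, D_max) for allele frequencies p1 (locus A) and q1 (locus B)."""
    p2, q2 = 1.0 - p1, 1.0 - q1
    d_min = max(-p1 * q1, -p2 * q2)
    d_max = min(p1 * q2, p2 * q1)
    return d_min, d_max


def d_prime(p1, q1, d):
    """Normalised LD: D / D_max for D > 0, D / D_min for D < 0."""
    d_min, d_max = d_bounds(p1, q1)
    if d > 0:
        return d / d_max
    if d < 0:
        return d / d_min  # d_min is negative, so the ratio is positive
    return 0.0  # convention for D = 0 (assumption)


# Illustrative values (assumption): p1 = 0.5, q1 = 0.6.
print(d_bounds(0.5, 0.6))
print(d_prime(0.5, 0.6, 0.1))
print(d_prime(0.5, 0.6, -0.1))
```

Because a negative D is divided by the negative bound D_min, D' always lies between 0 and 1.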

Example: Consider two SNPs in the coding regions of glycoprotein A and glycoprotein B that change the amino acid sequence. The genes for both proteins are on chromosome 4, and both proteins are found on the outside of red blood cells.

               SNP   Amino acid
For Protein A: A     Serine
               G     Leucine
For Protein B: T     Methionine
               C     Threonine

We have 1000 British people in the study (which means that there are 2000 chromosomes). The genotypes for each gene are as follows:

Protein A:  298 AA   489 AG   213 GG
Protein B:   99 TT   418 TC   483 CC

The loci are individually in HW equilibrium.

Next, we can estimate the allele frequencies:
$$A:\ p_A = \frac{2 \cdot 298 + 489}{2000} = 0.5425$$
$$G:\ q_a = \frac{489 + 2 \cdot 213}{2000} = 0.4575$$
Similarly, you can find that T: $p_B = 0.3080$ and C: $q_b = 0.6920$.
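The allele-frequency estimates, and the claim that each locus is individually in HW equilibrium, can be checked with a short Python sketch. The function names are illustrative assumptions. The single-locus HW test here is a standard goodness-of-fit chi-square with 1 degree of freedom (3 genotype classes, minus 1, minus 1 estimated allele frequency), compared against the usual 5% critical value 3.841.

```python
def allele_frequencies(n_hom1, n_het, n_hom2):
    """Estimate both allele frequencies from genotype counts at one locus."""
    n_chromosomes = 2 * (n_hom1 + n_het + n_hom2)
    p = (2 * n_hom1 + n_het) / n_chromosomes
    return p, 1.0 - p


def hw_chi_square(n_hom1, n_het, n_hom2):
    """Goodness-of-fit chi-square of the genotype counts against HW proportions."""
    n = n_hom1 + n_het + n_hom2
    p, q = allele_frequencies(n_hom1, n_het, n_hom2)
    observed = (n_hom1, n_het, n_hom2)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


p_A, q_a = allele_frequencies(298, 489, 213)  # Protein A: AA, AG, GG
p_B, q_b = allele_frequencies(99, 418, 483)   # Protein B: TT, TC, CC
print(f"{p_A:.4f} {q_a:.4f}")  # 0.5425 0.4575
print(f"{p_B:.4f} {q_b:.4f}")  # 0.3080 0.6920

CRITICAL_5PCT_1DF = 3.841  # 5% critical value of chi-square with 1 d.f.
chi2_A = hw_chi_square(298, 489, 213)
chi2_B = hw_chi_square(99, 418, 483)
print(chi2_A < CRITICAL_5PCT_1DF, chi2_B < CRITICAL_5PCT_1DF)
```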

If the haplotypes are in Linkage Equilibrium, then the probabilities of the haplotypes AT, AC, GT, and GC will be $p_Ap_B$, $p_Aq_b$, $q_ap_B$, and $q_aq_b$ respectively.

HAPLOTYPE   OBSERVED   EXPECTED
AT          474        (.5425)(.3080)(2000) = 334.2
AC          611        (.5425)(.6920)(2000) = 750.8
GT          142        (.4575)(.3080)(2000) = 281.8
GC          773        (.4575)(.6920)(2000) = 633.2

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$
where the degrees of freedom = #categories - 1 - #other dependencies, which in this case is 4 - 1 - 2 = 1.

$\chi^2 = 184.7$ with 1 d.f., which yields a P-value below 0.0001, so we can safely REJECT Linkage Equilibrium between the two SNPs.
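The test above can be reproduced with a minimal Python sketch. The variable names are illustrative assumptions. The P-value uses the identity that, for 1 degree of freedom only, the chi-square tail probability equals `erfc(sqrt(x / 2))`. Using unrounded expected counts gives a statistic that agrees with the slide's 184.7 up to rounding.

```python
import math

# Observed haplotype counts from the example (2000 chromosomes in total).
observed = {"AT": 474, "AC": 611, "GT": 142, "GC": 773}
n = sum(observed.values())

# Allele frequencies recovered from the haplotype counts.
p_A = (observed["AT"] + observed["AC"]) / n
p_B = (observed["AT"] + observed["GT"]) / n
q_a, q_b = 1.0 - p_A, 1.0 - p_B

# Expected counts under Linkage Equilibrium: product of allele frequencies.
expected = {
    "AT": p_A * p_B * n,
    "AC": p_A * q_b * n,
    "GT": q_a * p_B * n,
    "GC": q_a * q_b * n,
}

chi2 = sum((observed[h] - expected[h]) ** 2 / expected[h] for h in observed)

# Tail probability of a chi-square variable with exactly 1 d.f.
p_value = math.erfc(math.sqrt(chi2 / 2.0))

print(round(chi2, 1), p_value < 0.0001)
```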

What about the Linkage Disequilibrium? How much LD exists between the two loci?

$$\hat{P}_{AB} = \frac{474}{2000} = 0.2370 \qquad \hat{P}_{Ab} = \frac{611}{2000} = 0.3055$$
$$\hat{P}_{aB} = \frac{142}{2000} = 0.0710 \qquad \hat{P}_{ab} = \frac{773}{2000} = 0.3865$$

$$D = \hat{P}_{AB}\hat{P}_{ab} - \hat{P}_{aB}\hat{P}_{Ab} = 0.07$$
$$D_{\max} = \min\{p_Aq_b, q_ap_B\} = \min\{0.38, 0.14\} = 0.14$$
So
$$D' = \frac{D}{D_{\max}} = \frac{0.07}{0.14} = 50\%$$
This means that the LD is 50% of its theoretical maximum!
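The same arithmetic, starting from the haplotype counts, can be sketched in Python. The function name `ld_from_counts` is an illustrative assumption. With unrounded intermediate values the ratio comes out slightly below 0.5, which the slide rounds to 50%.

```python
def ld_from_counts(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, D_max, D') from haplotype counts; assumes D > 0."""
    n = n_AB + n_Ab + n_aB + n_ab
    P_AB, P_Ab, P_aB, P_ab = (c / n for c in (n_AB, n_Ab, n_aB, n_ab))
    d = P_AB * P_ab - P_aB * P_Ab
    if d <= 0:
        raise ValueError("this sketch only handles the D > 0 case of the example")
    p_A, p_B = P_AB + P_Ab, P_AB + P_aB  # marginal allele frequencies
    q_a, q_b = 1.0 - p_A, 1.0 - p_B
    d_max = min(p_A * q_b, q_a * p_B)
    return d, d_max, d / d_max


# Haplotype counts from the example: AT, AC, GT, GC.
d, d_max, d_prime_value = ld_from_counts(474, 611, 142, 773)
print(round(d, 2), round(d_max, 2), round(d_prime_value, 2))
```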

Summary: We reject Linkage Equilibrium by the $\chi^2$ test, so that means that LD exists. How much LD? 50% of the theoretical maximum.

Other Measures of LD

$$D' = \begin{cases} D / D_{\max}, & D > 0 \\ D / D_{\min}, & D < 0 \end{cases}$$

$$r^2 = \frac{D^2}{p_A q_a p_B q_b}$$

$r^2$ is the squared correlation coefficient between the alleles at the two loci. It has the convenient property that $\chi^2 = r^2 N$, where $N$ is the number of chromosomes in the sample (see the lecture on Introduction to $r^2$ for a proof).
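The relation between r² and the χ² statistic can be checked on the example data with a short Python sketch. The variable names are illustrative assumptions. The chi-square statistic is computed independently from observed and expected counts, then compared with r² times N.

```python
# Haplotype counts from the example: AT, AC, GT, GC.
counts = [474, 611, 142, 773]
n = sum(counts)
P_AB, P_Ab, P_aB, P_ab = (c / n for c in counts)

p_A, p_B = P_AB + P_Ab, P_AB + P_aB  # marginal allele frequencies
q_a, q_b = 1.0 - p_A, 1.0 - p_B

d = P_AB * P_ab - P_aB * P_Ab
r2 = d ** 2 / (p_A * q_a * p_B * q_b)

# Pearson chi-square against the Linkage Equilibrium expectation.
expected = [p_A * p_B * n, p_A * q_b * n, q_a * p_B * n, q_a * q_b * n]
chi2 = sum((o - e) ** 2 / e for o, e in zip(counts, expected))

print(round(r2, 3), round(r2 * n, 1), round(chi2, 1))
```

Note that r² is below 0.1 here even though D' is about 50%, so the two measures can give quite different impressions of the same data.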

$D'$ and $r^2$ are the most important measures of LD. $r^2$ is the favorite of the HapMap project.

Definition: The case $D' = 1$ is called Complete LD.

Intuition for Complete LD: the two SNPs have not been separated by recombination. In this case, at most 3 of the 4 possible haplotypes are present in the population.

Definition: The case $r^2 = 1$ is called Perfect LD.

Perfect LD happens if and only if the two SNPs have not been separated by recombination and also have the same allele frequencies.
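To make the distinction concrete, the following Python sketch evaluates D' and r² for two hypothetical populations: one with three haplotypes present (complete but not perfect LD) and one with only two (perfect LD). The function name `ld_measures` and both sets of haplotype frequencies are made up for illustration, and the sketch assumes both loci are polymorphic.

```python
def ld_measures(p11, p12, p21, p22):
    """Return (D', r^2) from the four haplotype frequencies."""
    p1, q1 = p11 + p12, p11 + p21  # allele frequencies of A1 and B1
    p2, q2 = 1.0 - p1, 1.0 - q1
    d = p11 * p22 - p12 * p21
    if d > 0:
        d_prime = d / min(p1 * q2, p2 * q1)
    elif d < 0:
        d_prime = d / max(-p1 * q1, -p2 * q2)
    else:
        d_prime = 0.0
    r2 = d ** 2 / (p1 * p2 * q1 * q2)
    return d_prime, r2


# Hypothetical: haplotype A1B2 is absent, so only 3 of 4 haplotypes occur.
complete = ld_measures(0.5, 0.0, 0.2, 0.3)
# Hypothetical: only A1B1 and A2B2 occur, so both loci share allele frequencies.
perfect = ld_measures(0.6, 0.0, 0.0, 0.4)

print(complete)
print(perfect)
```

In the first population the allele frequencies differ between the loci (0.5 versus 0.7), which keeps r² below 1 even though D' reaches 1.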