Expression QTLs and Mapping of Complex Trait Loci. Paul Schliekelman Statistics Department University of Georgia

Definitions: Genes, Loci and Alleles A gene codes for a protein. Proteins do everything. A gene's position on a chromosome is called a locus. The alternate forms of the gene at a locus are called alleles.

Definitions, continued: A causative locus for a trait is a locus with DNA sequence variation between individuals that contributes to variation in the trait between them. A quantitative trait locus (QTL) is a causative locus for a quantitative trait (e.g. height, IQ).

Definitions: Genetic Linkage Genes on the same chromosome tend to be inherited together. This is called linkage. The closer together two genes lie, the more often they are inherited together.

Genetic Markers A genetic marker is a known and detectable variation in a gene or other stretch of DNA. The presence of a marker can be detected in an individual organism's genome and its inheritance can be followed. A marker that is linked to a locus affecting a trait will appear more often in individuals with that trait.

Gene Mapping The goal of gene mapping is to locate genetic loci that are responsible for variation in traits of interest (e.g. occurrence of disease, intelligence). It is based on searching for correlations between markers and the trait of interest.

Gene Mapping Successes Mendelian traits have inheritance patterns consistent with a single causative locus. Hundreds of genes affecting known Mendelian traits (e.g. cystic fibrosis, breast cancer) in humans have been mapped. Some cases have led to new treatments and/or screening methods.

Gene Mapping Failures Complex traits do not follow simple one-locus Mendelian expectations. Assumed to be caused by mutations at multiple loci. Examples include many common diseases: asthma, bipolar disorder, diabetes, prostate cancer, etc.

Mapping of complex disease genes: little success. Altmuller et al., 2001 reviewed 101 full genome scans of 31 complex diseases. - 67% did not show significant linkage to any marker. - Of the significant linkages, very few have been reproduced.

Low power for mapping complex trait loci Power to detect genes decreases as the number of loci affecting the trait increases. By definition, complex diseases should be harder to map. We don't know how many genes underlie complex diseases, so we don't know whether to be surprised by the lack of success.

The central problem of human disease genetics is to solve this dilemma.

New Buzzword: Genetical Genomics: the combination of molecular marker data and genome-wide expression data to elucidate the genetics of complex traits.

Expression QTLs (eQTLs) Gene expression levels can be treated as quantitative traits and their quantitative trait loci (QTLs) mapped. Transcript: the gene whose expression level is being measured. An eQTL for that transcript is a genetic locus with DNA sequence variation that causes variation in the expression level.

Expression QTLs (eQTLs) The eQTL could be at the transcript locus itself, or elsewhere in the genome. By combining microarrays and marker data, eQTLs can be mapped simultaneously for thousands of gene expression levels.

Recent eQTL surveys Numerous recent eQTL surveys have found that expression levels frequently exhibit intermediate to high heritabilities. E.g. Schadt et al. 2003 found 4339 eQTLs over 3701 genes with log-of-odds (LOD) scores > 4.3 and 11,021 genes with eQTL LOD scores > 3.0.

Recent eQTL surveys Thus, it is relatively easy to map QTLs for many expression levels.

Will eQTLs be useful for dissecting complex physiological traits (e.g. disease)? Presumably, disease causative loci are also eQTLs for some gene expression levels. Two tasks must be accomplished: 1. Show that the transcript is related to disease. 2. Map the eQTL for the transcript. Then this eQTL should be a causative locus for disease.

[Diagram: causative loci, expression levels (Exp. Lev.), disease events, and disease.]

Model for Gene Expression Assume L disease causative loci. M expression levels ($X_1, \ldots, X_M$) tested. Each expression level $X_i$ depends on some subset of the L causative loci (often empty). T = indicator variable for disease status.

Model for Gene Expression
$$P(X \mid T=1) = \frac{P(T=1 \mid X)\,P(X)}{P(T=1)} = \frac{\sum_{G} P(T=1 \mid G)\,P(X \mid G)\,P(G)}{K}$$
K = Disease prevalence

Expression Levels Assume expression levels are normally distributed on some scale (e.g. log scale). Mean and variance determined by genotype at one or more of the L causative loci. Expression level distribution is then a mixture of normal distributions with weights determined by genotype distribution conditioned on disease status.

Multiplicative Model
$$P(T=1 \mid G) = P(T=1 \mid G_1)\,P(T=1 \mid G_2) \cdots P(T=1 \mid G_L)$$
Can show:
$$P(X \mid T=1) = \sum_{g_1,\ldots,g_c} P(X \mid g_1,\ldots,g_c)\,\frac{u_1(g_1)\,u_2(g_2) \cdots u_c(g_c)}{K_1 K_2 \cdots K_c}\,P(g_1)\,P(g_2) \cdots P(g_c)$$
$$P(X \mid T=0) = \sum_{g_1,\ldots,g_c} P(X \mid g_1,\ldots,g_c)\,\frac{1 - u_1(g_1)\,u_2(g_2) \cdots u_c(g_c)\,K_{c+1} \cdots K_L}{1 - K}\,P(g_1) \cdots P(g_c)$$
Where $K_i = \sum_{G_i} P(T=1 \mid G_i)\,P(G_i)$ and $u_i(G_i) = P(T=1 \mid G_i)$.
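The step behind "Can show" for the multiplicative model can be sketched as follows. This is a sketch under two assumptions not spelled out on the slide: genotypes at different loci are independent, and X depends only on the genotypes at its c controlling loci. Under these assumptions the prevalence factorizes as $P(T=1) = \prod_i K_i$.

```latex
\begin{align*}
P(X \mid T=1)
 &= \frac{1}{P(T=1)} \sum_{g_1,\ldots,g_L} P(X \mid g_1,\ldots,g_c)
    \prod_{i=1}^{L} u_i(g_i)\,P(g_i) \\
 &= \frac{1}{\prod_{i=1}^{L} K_i} \sum_{g_1,\ldots,g_c} P(X \mid g_1,\ldots,g_c)
    \prod_{i=1}^{c} u_i(g_i)\,P(g_i)
    \prod_{i=c+1}^{L} \underbrace{\sum_{g_i} u_i(g_i)\,P(g_i)}_{K_i} \\
 &= \sum_{g_1,\ldots,g_c} P(X \mid g_1,\ldots,g_c)
    \prod_{i=1}^{c} \frac{u_i(g_i)\,P(g_i)}{K_i}
\end{align*}
```

The prevalence terms of the loci that do not control X cancel, so only the c controlling loci remain in the case distribution.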

If c=1 (expression controlled by one locus) and Hardy-Weinberg Equilibrium (alleles independent):
$$P(X \mid T=1) = \frac{1}{K_1}\Big[(1-p_1)^2\,u_{dd}\,\phi(X;\mu_{dd},\sigma_{dd}) + 2p_1(1-p_1)\,u_{Dd}\,\phi(X;\mu_{Dd},\sigma_{Dd}) + p_1^2\,u_{DD}\,\phi(X;\mu_{DD},\sigma_{DD})\Big]$$
Where $p_1$ is the frequency of the disease allele D, d is the normal allele, and $\phi(X;\mu,\sigma)$ is the normal pdf.
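The one-locus case can be made concrete with a short Python sketch. It computes the genotype mixture weights among affected individuals and the resulting expression density, assuming Hardy-Weinberg genotype frequencies and division by the single-locus prevalence $K_1$. The function names and every parameter value below are hypothetical, chosen only for illustration.

```python
import math

GENOTYPES = ("dd", "Dd", "DD")

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def hwe_probs(p):
    """Hardy-Weinberg genotype frequencies for disease-allele frequency p."""
    return {"dd": (1 - p) ** 2, "Dd": 2 * p * (1 - p), "DD": p ** 2}

def case_weights(p, u):
    """Mixture weights P(genotype | T = 1); u maps genotype to penetrance."""
    prior = hwe_probs(p)
    k1 = sum(u[g] * prior[g] for g in GENOTYPES)  # single-locus prevalence K_1
    return {g: u[g] * prior[g] / k1 for g in GENOTYPES}

def case_density(x, p, u, mu, sigma):
    """P(X = x | T = 1) as a mixture of normals, one component per genotype."""
    w = case_weights(p, u)
    return sum(w[g] * normal_pdf(x, mu[g], sigma[g]) for g in GENOTYPES)

# Hypothetical parameter values, for illustration only.
u = {"dd": 0.01, "Dd": 0.05, "DD": 0.2}
mu = {"dd": 0.0, "Dd": 1.0, "DD": 2.0}
sigma = {"dd": 1.0, "Dd": 1.0, "DD": 1.0}
print(case_weights(0.1, u))
print(case_density(1.0, 0.1, u, mu, sigma))
```

Because the weights are normalized by $K_1$, they sum to one and the case density integrates to one.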

Additive Model
$$P(T=1 \mid G) = P(T=1 \mid G_1) + P(T=1 \mid G_2) + \cdots + P(T=1 \mid G_L)$$
Can show:
$$P(X \mid T=1) = \frac{1}{K}\sum_{G_1} P(X \mid G_1)\,\big(u_1(G_1) + K_2 + \cdots + K_L\big)\,P(G_1)$$
$$P(X \mid T=0) = \frac{1}{1-K}\sum_{G_1} P(X \mid G_1)\,\big(1 - u_1(G_1) - K_2 - K_3 - \cdots - K_L\big)\,P(G_1)$$

Expression Level Mean Expression level mean assumed to have either multiplicative or additive dependence on genotype:
$$\mu(g_1,\ldots,g_c) = \mu_1(g_1)\,\mu_2(g_2) \cdots \mu_c(g_c)$$
or
$$\mu(g_1,\ldots,g_c) = \mu_1(g_1) + \mu_2(g_2) + \cdots + \mu_c(g_c)$$

Power Calculations Two groups: disease affected and unaffected. A t-test for group difference is conducted for each gene. Calculate power to detect differential expression. Interested in relationship between power and genetic model.
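A power calculation of this kind can be approximated by simulation. The sketch below is not the calculation used in the talk. It assumes a single biallelic disease locus (L = 1) that also controls the expression level, Hardy-Weinberg genotype frequencies, a common within-genotype standard deviation, and a Welch t statistic compared against the large-sample critical value 1.96. All function names and parameter values are hypothetical.

```python
import math
import random
import statistics

def genotype_weights(p, u, affected):
    """P(genotype | disease status) for genotypes dd, Dd, DD under Hardy-Weinberg."""
    prior = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]
    like = [ui if affected else 1 - ui for ui in u]
    joint = [a * b for a, b in zip(prior, like)]
    total = sum(joint)
    return [j / total for j in joint]

def simulate_group(rng, n, weights, mu, sigma):
    """Draw n expression levels: genotype first, then a normal deviate."""
    genotypes = rng.choices(range(3), weights=weights, k=n)
    return [rng.gauss(mu[g], sigma) for g in genotypes]

def welch_t(a, b):
    """Welch two-sample t statistic."""
    va, vb = statistics.variance(a), statistics.variance(b)
    diff = statistics.fmean(a) - statistics.fmean(b)
    return diff / math.sqrt(va / len(a) + vb / len(b))

def estimate_power(p, u, mu, sigma, n_per_group, reps=500, crit=1.96, seed=0):
    """Fraction of simulated studies in which |t| exceeds the critical value."""
    rng = random.Random(seed)
    w_case = genotype_weights(p, u, affected=True)
    w_ctrl = genotype_weights(p, u, affected=False)
    hits = 0
    for _ in range(reps):
        cases = simulate_group(rng, n_per_group, w_case, mu, sigma)
        ctrls = simulate_group(rng, n_per_group, w_ctrl, mu, sigma)
        if abs(welch_t(cases, ctrls)) > crit:
            hits += 1
    return hits / reps

# Hypothetical values: penetrances u and genotype means mu for dd, Dd, DD.
print(estimate_power(p=0.2, u=[0.01, 0.2, 0.5], mu=[0.0, 1.0, 2.0],
                     sigma=1.0, n_per_group=100))
```

Varying the penetrances and genotype means shows how strongly the detectable group difference depends on the genetic model.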

[Diagram: causative loci, expression levels (Exp. Lev.) labeled C=1, C=5 and C=2, disease events, and disease.]

Additive Model

Detecting expression level differences: Conclusions Power to detect expression level differences is poor for a multiplicative model with c=1, but reasonable for c=2-5. Power to detect expression level differences is very poor for an additive model.

Power for Mapping eQTLs Power for mapping QTLs in natural populations deteriorates quickly as the number of QTLs increases. Power is poor even for c=2.

Conclusions Natural Populations Power to detect expression level differences extremely poor if disease probability is additive. For multiplicative model there is no value of c (number of controlling loci) where power is good for detecting both linkage and expression level differences.

Many Unknowns Distribution of relationships between expression level and causative loci. Dominance relationships. Form of gene interactions. Allele frequencies.

Much Potential For Improved Power There is much information in the joint behavior of expression levels. Better experimental designs are possible (e.g. utilize family structure).

How can we use joint expression information? Schadt et al 2003: Crossed two mouse strains: one susceptible to diabetes and weight gain on high fat diet and the other not. 111 offspring from cross were put on high fat diet for 4 months. Liver tissue from each was profiled using microarrays. Genotypes for several hundred markers were obtained for each mouse.

From Schadt et al., 2003 Genetics of gene expression surveyed in maize, mouse and man. Nature 422: 297-302.

Results: Microarray data split the high-FPM data into two groups. Gene mapping comparing the separate high-FPM groups to the low-FPM group produced significant linkage for an obesity locus. Gene mapping comparing the combined high-FPM groups to the low-FPM group did not find significant linkage.

Genetic Heterogeneity
[Table: affected individuals 1 to 6 (rows) by loci 1 to 14 (columns), with an x marking each disease mutation an individual carries.]
Major variation in which disease mutations are carried by affected individuals.

[Table: affected individuals 1 to 6 (rows) by loci in the order 2, 6, 14, 4, 8, 9, 12, 5, 7, 3, 11, 13, 10, 1 (columns), with an x marking each disease mutation an individual carries.]

Goal: A method for identifying the genetic heterogeneity using genome-wide expression data. Gene mapping power could be dramatically improved if this heterogeneity could be accounted for.

Proposed EM Algorithm Based Method: Data: $N_A$ affected and $N_U$ unaffected individuals. Microarray for each individual. Disease status for each individual.

Sample Indiv.   Disease Status   Transcript
                                 1      2      3      4      5      6
1               Y_1              X_11   X_12   X_13   X_14   X_15   X_16
2               Y_2              X_21   X_22   X_23   X_24   X_25   X_26
3               Y_3              X_31   X_32   X_33   X_34   X_35   X_36
4               Y_4              X_41   X_42   X_43   X_44   X_45   X_46
5               Y_5              X_51   X_52   X_53   X_54   X_55   X_56

Unobserved Variables $Z_i$ = controlling locus for transcript i, i = 1 to M. $V_{jk}$ = genotype on controlling locus j in individual k. Controlling loci 1 to L are the disease loci. Controlling loci L+1 to L+c are null loci that do not affect disease.

More specific goal: Cluster sample individuals by their values of $V_{jk}$ (that is, their genotype on the disease loci).

Likelihood
$$L(\theta \mid X, Y, Z, V) = P(X \mid Z, V, \theta)\,P(Y \mid V, \theta)\,P(Z, V \mid \theta) = \prod_{i=1}^{N}\prod_{j=1}^{M} P(X_{ij} \mid Z_j, V_{iZ_j}, \theta)\,\prod_{k=1}^{N} P(Y_k \mid V_k, \theta)\;P(Z, V \mid \theta)$$

$P(X_{ij} \mid Z_j, V_{iZ_j}, \theta)$ = Expression level distribution for gene j in individual i
$P(Y_k \mid V_k, \theta)$ = Disease probability for individual k
$P(Z, V \mid \theta)$ = Controlling locus and genotype probabilities

Parameter Estimates: Expression level mean for transcript a with controlling locus genotype b:
$$\hat{\mu}_{ab} = \frac{\sum_{t=1}^{L}\sum_{i=1}^{N} x_{ia}\,P(Z_a = t, V_{it} = b \mid X, Y, \theta_0)}{\sum_{t=1}^{L}\sum_{i=1}^{N} P(Z_a = t, V_{it} = b \mid X, Y, \theta_0)}$$

Expression level variance for transcript a with controlling locus genotype b:
$$\hat{\sigma}^2_{ab} = \frac{\sum_{t=1}^{L}\sum_{i=1}^{N} (x_{ia} - \mu_{ab})^2\,P(Z_a = t, V_{it} = b \mid X, Y, \theta_0)}{\sum_{t=1}^{L}\sum_{i=1}^{N} P(Z_a = t, V_{it} = b \mid X, Y, \theta_0)}$$

Probability that the expression of gene r has controlling locus q:
$$\hat{\alpha}_{rq} = P(Z_r = q \mid X, Y, \theta_0)$$

Probability that controlling locus s has genotype t:
$$\hat{\lambda}_{st} = \frac{\sum_{j=1}^{N} P(V_{js} = t \mid X, Y, \theta_0)}{N}$$
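The closed-form updates are posterior-weighted averages, which a short Python sketch can make concrete. It assumes the posterior probabilities from the E-step are already available as plain lists. The function names and the toy numbers are hypothetical, and the variance update here plugs in the freshly computed mean.

```python
def weighted_mean_and_variance(x, w):
    """Update the mean and variance for one transcript a and genotype b.

    x[i]    : expression level of transcript a in individual i
    w[t][i] : posterior P(Z_a = t, V_it = b | X, Y, theta_0)
    """
    total = sum(sum(row) for row in w)
    mean = sum(w_ti * x_i for row in w for w_ti, x_i in zip(row, x)) / total
    var = sum(w_ti * (x_i - mean) ** 2
              for row in w for w_ti, x_i in zip(row, x)) / total
    return mean, var

def genotype_frequency(post):
    """Update lambda_st, where post[j] = P(V_js = t | X, Y, theta_0)."""
    return sum(post) / len(post)

# Toy posteriors for three individuals and two candidate controlling loci.
x = [1.0, 2.0, 3.0]
w = [[0.5, 0.5, 0.0],
     [0.5, 0.5, 0.0]]
print(weighted_mean_and_variance(x, w))
print(genotype_frequency([0.2, 0.4, 0.9]))
```

Individuals with near-zero posterior weight for genotype b contribute almost nothing to that genotype's mean and variance, which is what lets the updates separate genotype clusters.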

Disease penetrance parameters The genotype specific risk parameters are found by numerical maximization.

Assigning Genotypes:
$$P(v_{ij} = k \mid Y_i, W_{ij}) = \frac{P(Y_i \mid v_{ij} = k)\,P(W_{ij} \mid v_{ij} = k)}{P(Y_i, W_{ij})}$$
$W_{ij}$ = vector containing the values of expression levels in individual i that are controlled by locus j.
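The genotype assignment step can be sketched in Python as a normalized product of a disease-status term and an expression term. This sketch assumes that disease status and the controlled expression levels are conditionally independent given the genotype, that expression levels are normal with a common standard deviation, and it carries an explicit genotype prior (uniform by default). The prior, the function names, and all numbers are assumptions made for illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def genotype_posterior(y, w, penetrance, means, sigma, prior=None):
    """Posterior over genotypes k at one controlling locus for one individual.

    y             : disease status (1 = affected, 0 = unaffected)
    w             : expression levels controlled by this locus
    penetrance[k] : P(Y = 1 | genotype k)
    means[k][m]   : mean of the m-th controlled transcript under genotype k
    """
    n_geno = len(penetrance)
    if prior is None:
        prior = [1.0 / n_geno] * n_geno  # assumption: uniform prior
    joint = []
    for k in range(n_geno):
        p_y = penetrance[k] if y == 1 else 1.0 - penetrance[k]
        p_w = math.prod(normal_pdf(x, m, sigma) for x, m in zip(w, means[k]))
        joint.append(prior[k] * p_y * p_w)
    total = sum(joint)  # normalizing constant, the role of P(Y_i, W_ij)
    return [j / total for j in joint]

# Hypothetical example: two transcripts controlled by the locus.
post = genotype_posterior(
    y=1, w=[2.1, 1.9],
    penetrance=[0.01, 0.05, 0.2],
    means=[[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]],
    sigma=0.5,
)
print(post)
```

Each individual can then be assigned to the genotype with the largest posterior, so individuals with similar controlled expression levels end up with the same genotype.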

Transcripts assigned to same controlling locus if their expression is correlated in a way that is consistent with genetic model. Sample individuals assigned to same genotype at controlling locus if controlled expression levels are similar.

How well will it work? Potentially much information about genotype available in expression levels. On the other hand: Highly assumption laden. Many parameters to estimate. High-level transcripts may overwhelm analysis.

[Diagram: causative loci, expression levels (Exp. Lev.) labeled C=1, C=5 and C=2, disease events, and disease.]

Mapping Power - multiplicative