Major Genes, Polygenes, and QTLs
Major genes --- genes that have a significant effect on the phenotype Polygenes --- a general term of the genes of small effect that influence a trait QTL, quantitative trait locus --- a particular gene underlying the trait. Usually used when a gene underlying a trait is mapped to a particular chromosomal region Candidate gene --- a particular known gene that is of interest as being a potential candidate for contributing to the variation in a trait Mendelizing allele. The allele has a sufficiently large effect that its impact is obvious when looking at phenotype
Major Genes Major morphological mutations of classical genetics that arose by spontaneous or induced mutation Genes of large effect have been found selected lines pygmy, obese, dwarf and hg alleles in mice booroola F in sheep halothane sensitivity in pigs Major genes tend to be deleterious and are at very low frequencies in unselected populations, and contribute little to Var(A)
Genes for Genetic modification of muscling Natural mutations in the myostatin gene in cattle
Natural mutation in the callipyge - gene in sheep
Booroola gene in sheep increasing ovulation rate Merino Sheep
Major genes for mouse body size The mutations ob or db cause deficiencies in leptin production, or leptin receptor deficiencies
Major Genes and Isoalleles What is the genetic basis for quantitative variation? Honest answer --- don t know. One hypothesis: isoalleles. A locus that has an allele of major effect may also have alleles of much smaller effect (isoalleles) that influence the trait of interest. Structural vs. regulatory changes Structural: change in an amino acid sequence Regulatory: change affecting gene regulation General assumption: regulatory changes are likely more important
Cis vs. trans effects Cis effect --- regulatory change only affects genes (tightly) linked on the same chromosome Cis-acting locus. The allele influences The regulation of a gene on the same DNA molecule
Trans effect --- a diffusible factor that can influence regulation of unlinked genes Trans-acting locus. This locus influences genes on other chromosomes and non-adjacent sites on the same chromosome
CIS-modifiers Genomic location of genes on array Genomic location of mrna level modifiers
TRANS-modifiers Genomic location of genes on array Genomic location of mrna level modifiers
MASTER modifiers Genomic location of genes on array Genomic location of mrna level modifiers
Polygenic Mutation For normal genes (i.e., those with large effects) simply giving a mutation rate is sufficient (e.g. the rate at which an dwarfing allele appears) For alleles contributing to quantitative variation, we must account for both the rate at which mutants appear as well as the phenotypic effect of each Mutational variance, V m or! 2 m - the amount of new additive genetic variance introduced by mutation each generation Typically V m is on the order of 10-3 V E
Simple Tests for the Presence of Major Genes Simple Visual tests: Phenotypes fall into discrete classes Multimodality --- distribution has several modes (peaks) Simple statistical tests Fit to a mixture model (LR test) p(z) = pr(qq)p(z QQ) +pr(qq)p(z Qq) + pr(qq)p(z qq) Heterogeneity of within-family variances Select and backcross
Mixture Models The distribution of trait value z is the weighted sum of n underlying distributions The probability that a random individual is from class i p(z) = n i=1 Pr(i) p i (z) p(z) = n i=1 The distribution of phenotypes z conditional of the individual belonging to class i Pr(i) ϕ(z, µi, σ 2 i ) The component distributions are typically assumed normal
The component distributions are typically assumed n p(z) = Pr(i) ϕ(z, µi, σi 2 ) i=1 ϕ(z, µ i,σ 2 i ) = 1 2πσ 2 i [ (z µi ) exp 2 2σ 2 i ] Normal with mean µ and variance! 2 3n-1 parameters: n-1 mixture proportions, n means, n variances Typically assume common variances -> 2n-1 parameters
In quantitative genetics, the underlying classes are typically different genotypes (e.g. QQ vs. Qq) although we could also model different environments in the same fashion Likelihood function for an individual under a mixture model l(z j ) = Pr(QQ) p QQ (z j ) + Pr(Qq) p Qq (z j ) + Pr(qq) p qq (z j ) = Pr(QQ) ϕ(z j,µ QQ,σ 2 ) + Pr(Qq)ϕ(z j, µ Qq, σ 2 ) + Pr(qq) ϕ(z j,µ qq, σ 2 ) Mixture proportions follow from Hardy-Weinberg, e.g. Pr(QQ) = p Q * p Q
l(z j ) = Pr(QQ) p QQ (z j ) + Pr(Qq)p Qq (z j ) + Pr(qq)p qq (z j ) = Pr(QQ) ϕ(z j,µ QQ,σ 2 ) + Pr(Qq) ϕ(z j, µ Qq, σ 2 ) + Pr(qq) ϕ(z j,µ qq, σ 2 ) Likelihood function for a random sample of m individuals l(z) = l(z 1,z 2,, z m ) = m l(z j ) j=1
Likelihood Ratio test for Mixtures Null hypothesis: A single normal distribution is adequate to fit the data. The maximum of the likelihood function under the null hypothesis is max l 0 (z 1, z 2,,z m ) = (2πS 2 ) m/2 exp 1 2S 2 S 2 = 1 m (z i z) 2 m (z j z ) 2 j=1 The LR test for a significantly better fit under a mixture is given by 2 ln (max { likelihood under mixture}/max l 0 ) The LR follows a chi-square distribution with n-2 df, where n-1 = number of fitted parameters for the mixture
Complex Segregation Analysis A significant fit to a mixture only suggests the possibility of a major gene. A much more formal demonstration of a major gene is given by the likelihood-based method of Complex Segregation Analysis (CSA) Testing the fit of a mixture model requires a sample of random individuals from the population. CSA requires a pedigree of individuals. CSA uses likelihood to formally test for the transmission of a major gene in the pedigree
Building the likelihood for CSA Start with a mixture model Difference is that the mixing proportions are not the same for each individual, but rather are a function of its parental (presumed) genotypes Phenotypic value of individual j in family i Sum is over all possible genotypes, indexed by g o =1,2,3 Mean of genotype g o l(zij gf, gm) = 3 g o =1 Pr(go gf, gm) ϕ(zij, µg o, σ 2 ) Major-locus genotypes of parents Transmission Probability of an offspring having genotype g o given the parental genotypes are g f, g m. Phenotypic variance conditioned on major-locus genotype
l(zij gf, gm) = 3 g o =1 Pr(go gf, gm) ϕ(zij, µg o, σ 2 ) Transmission Probability example: code qq=3, Qq=2,QQ=1 (g o = 3 g f = 1, g m = 2) = (qq g f = QQ, g m = Qq) = 0 (go = 2 gf = 1, gm = 2) = (Qq gf = QQ, gm = Qq) = 1/2 (go = 1 gf = 1, gm = 2) = (QQ gf = QQ, gm = Qq) = 1/2 Conditional family likelihood l(zi gf, gm) = n i j=1 l(zij gf, gm) Likelihood for family i 3 3 l(zi ) = g f =1 g m =1 l(zi gf, gm) (gf, gm)
Transmission Probabilities Explicitly model the transmission probabilities Pr( qq g f, g m ) = (1 τ gf )(1 τ gm ) Pr( Qq gf, gm) = τg f (1 τg m ) + τg m (1 τg f ) Pr(QQ gf, gm) = τg f τg m Probability that the father transmits Q Probability that the mother transmits Q Formal CSA test of a major gene (three steps):
Pr(qq g f, g m ) = (1 τ gf ) (1 τ gm ) Pr( Qq gf, gm) = τg f (1 τg m ) + τg m (1 τg f ) Pr( QQ gf, gm) = τg f τg m Formal CSA test of a major gene (three steps): Significantly better overall fit of a mixture model compared with a single normal Failure to reject the hypothesis of Mendelian segregation : " QQ = 1,, " Qq = 1/2, " qq = 0 Rejection of the hypothesis of equal transmission for all genotypes (" QQ = " Qq = " qq )
CSA Modification: Common Family Effects Families can share a common environmental effect Expected value for g o genotype, family i is µ go + c i Likelihood conditioned on common family effect c i l(z i g f, g m, c i ) = n i j=1 3 g oj =1 Pr(g oj g f, g m )ϕ(z ij, µ goj + c i, σ 2 ) Unconditional likelihood (average over all c) --- assumed Normally distributed with mean zero and variance! c 2
l(z i g f, g m, c i ) = n i j=1 3 g oj =1 Pr(g oj g f, g m )ϕ(z ij, µ goj + c i, σ 2 ) Unconditional likelihood (average over all c) --- assumed Normal with mean zero and variance! c 2 l(zi gf, gm) = l(zi gf, gm, c) ϕ(c, 0, σ 2 c ) dc Likelihood function with no major gene, but family effects l(z i ) = = l(z i c) ϕ(c, 0, σ 2 c) dc n i j=1 ϕ ( zij,µ + c, σ 2) ( ) ϕ c, 0, σ 2 c dc