Conditions under which genome-wide association studies will be positively misleading


Genetics: Published Articles Ahead of Print, published on September 2, 2010 as /genetics

Alexander Platt 1, Bjarni J. Vilhjálmsson 1, Magnus Nordborg 1,2

1 Molecular and Computational Biology, University of Southern California, Los Angeles, California, United States of America.
2 Gregor Mendel Institute, Austrian Academy of Sciences, Vienna, Austria.

Manuscript submitted to Genetics, August 2, 2010

Genome-wide association mapping is a popular method for using natural variation within a species to generate a genotype-phenotype map. Statistical association between an allele at a locus and the trait in question is used as evidence that variation at the locus is responsible for variation of the trait. Indirect association, however, can give rise to statistically significant results at loci unrelated to the trait. We use a haploid, three-locus, binary genetic model to describe the conditions under which these indirect associations become stronger than any of the causative associations in the organism, even to the point of representing the only associations present in the data. These indirect associations are the result of disequilibrium between multiple factors affecting a single trait. Epistasis and population structure can exacerbate the problem but are not required to create it. From a statistical point of view, indirect associations are true associations rather than the result of stochastic noise: they will not be ameliorated by increasing sample size or marker density and can be reproduced in independent studies.

Introduction

Genome-wide association mapping is a powerful tool that leverages the natural variation of a trait in a population to identify genetic factors that influence the trait.
The theory is that, due to the large number of recombination events in the genetic history of the population, only markers in tight linkage disequilibrium with loci responsible for the trait variation will exhibit significant statistical association with the trait. There are two ways in which genome-wide association mapping will fail by identifying loci which are not responsible for the variation in the trait (i.e., false positives): stochastic noise can generate an association in a sample that is not present in the larger population, or patterns of correlation among loci and factors causing trait variation can create indirect associations between markers and traits where no causal relation exists. While the former can be well quantified and managed with traditional sampling theory and replication, genomic control, and properly specified error terms in statistical models, these techniques do little to address the latter. As the association is true and not a statistical aberration, all accurate tests of association will point to the same non-causative loci; increasing sample sizes and marker densities will only heighten the misleading results; and these results can be reproduced in all follow-up studies.

It has long been recognized that population structure can cause these kinds of spurious, non-random associations (LI, 1969; LANDER and SCHORK, 1994), and considerable effort has been devoted to addressing this problem statistically (DEVLIN and ROEDER, 1999; PRITCHARD et al., 2000; YU et al., 2006; PRICE et al., 2006). However, attention has almost exclusively focused on the case where a non-causal marker is falsely identified as causal (or closely linked to a causal polymorphism) because both it and the trait are correlated with a single unobserved variable (e.g., geographic origin in a structured population). The effect of including multiple causal loci has not adequately been considered.

To whom correspondence should be addressed: alexander.platt@usc.edu
That this matters has been demonstrated by two recent papers. DICKSON et al. (2010) used simulations to show that the presence of two or more rare causal variants in disequilibrium, which cannot themselves be detected due to lack of statistical power, can produce spurious associations that are only distantly linked to the causal polymorphisms. ATWELL et al. (2010) showed that negative disequilibrium between two causal polymorphisms in the gene FRIGIDA interfered with the ability to find either of them but created strong signals at several distantly linked markers in a genome-wide association study in Arabidopsis thaliana. To understand these cases we need a model with at least three variables: a non-causal marker and two background, unobserved factors. Here we present the simplest possible model (a haploid model of three binary loci) and use it to illustrate what conditions give rise to misleading genome-wide association mapping results.

Results

The simplest model possible

Table 1 defines the model: C denotes the causative locus we are trying to identify; L is a latent variable, be it a second locus or environmental factor, that may also influence the organism's phenotype; and N is a non-causal marker locus. Parameters a, ..., h are the population frequencies of all the possible genotypes. $\beta_C$ and $\beta_L$ represent the additive component of the influence on phenotype of the designated causative allele and state of the latent variable, respectively. $\beta_{LC}$ is an epistatic term defined as the deviation from additivity of the combined effects of L and C. Without loss of generality, the causative alleles and latent variables are labeled so that $\beta_C$ and $\beta_L$ are both $\ge 0$ and the non-causal marker is labeled so that $\mathrm{cov}(N, P) \ge 0$. In every case we consider, the phenotype, P, is fully determined by L and C. There is no stochastic noise included in our analyses.

Table 1. Model specification.

Latent       Causative        Non-causal                                             Genotype
variable L   polymorphism C   marker N     Phenotype P                               frequency
0            0                0            0                                         a
0            0                1            0                                         b
0            1                0            $\beta_C$                                 c
0            1                1            $\beta_C$                                 d
1            0                0            $\beta_L$                                 e
1            0                1            $\beta_L$                                 f
1            1                0            $\beta_C + \beta_L + \beta_{LC}$          g
1            1                1            $\beta_C + \beta_L + \beta_{LC}$          h

The model is defined as a genotype of three binary factors, L, C, and N. Every combination of those factors perfectly describes a phenotype P and occurs with a frequency indicated by a, ..., h.

With this model we can describe simple traits with only a single factor influencing the phenotype by setting $\beta_L$ and $\beta_{LC}$ to 0. A trait governed by purely additive contributions from two factors is modeled by letting $\beta_C$ and $\beta_L$ vary freely but keeping $\beta_{LC}$ at 0. Varying $\beta_{LC}$ gives us a wide range of epistatic effects: positive values of $\beta_{LC}$ give synergistic epistasis and negative values antagonistic epistasis. Table 2 defines some useful parameterizations.

Table 2. Parameterization.
Symbol      Description                       Definition
$\rho_L$    frequency of variable L           $e + f + g + h$
$\rho_C$    frequency of allele C             $c + d + g + h$
$\rho_N$    frequency of allele N             $b + d + f + h$
$D_{NC}$    disequilibrium between N and C    $d + h - \rho_N \rho_C$
$D_{NL}$    disequilibrium between N and L    $f + h - \rho_N \rho_L$
$D_{LC}$    disequilibrium between L and C    $g + h - \rho_L \rho_C$
$D_{NLC}$   three-locus disequilibrium        $h - \rho_N \rho_L \rho_C$

Re-parameterizing the model in terms of the frequencies of individual factors and the disequilibrium between them facilitates biological understanding of what creates associations between factors and phenotypes.

In association mapping we are looking for non-independence between alleles and phenotypes. Non-independence can be quantified in many ways. Our analytical work will focus on covariance between proposed factors and observed phenotypes. A significantly non-zero covariance indicates an association between the trait and the marker being examined. The hope is that this indicates that the associated locus contributes biologically to variation in the trait, or is very closely linked to a locus that does. In our model, we want the covariance between the causal polymorphism and the trait, $\mathrm{cov}(C, P)$, to be high (or we will not be able to detect the causal association), and we want the covariance between the non-causal marker and the trait, $\mathrm{cov}(N, P)$, to be high if and only if the marker is tightly linked to the causal polymorphism. We do not want $\mathrm{cov}(N, P) > \mathrm{cov}(C, P)$, lest we misidentify the non-causal marker as causal. The covariance between the latent variable and the trait, $\mathrm{cov}(L, P)$, finally, is just a nuisance from the point of view of identifying the causal polymorphism. For our model, we have:

$$\mathrm{cov}(N, P) = \beta_C D_{NC} + \beta_L D_{NL} + \beta_{LC}\,(D_{NLC} - \rho_N D_{LC}) \tag{1}$$

$$\mathrm{cov}(C, P) = \beta_C \rho_C (1 - \rho_C) + \beta_L D_{LC} + \beta_{LC}\,(g + h)(1 - \rho_C) \tag{2}$$

$$\mathrm{cov}(L, P) = \beta_L \rho_L (1 - \rho_L) + \beta_C D_{LC} + \beta_{LC}\,(g + h)(1 - \rho_L) \tag{3}$$

By looking at these covariance terms in various settings we will illustrate when we can expect association mapping to be misleading.
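The three covariance expressions can be checked against a direct computation from the genotype table. The sketch below is only an illustration: the function names, the genotype frequencies, and the effect sizes are assumptions chosen for the example, not values from the study. It encodes genotypes as (L, C, N) triples in the order a, ..., h of Table 1 and uses exact fractions so that the closed forms and the definition of covariance can be compared without rounding error.

```python
from fractions import Fraction as F
from itertools import product

# Genotypes as (L, C, N) triples; product() yields them in the order a..h of Table 1.
GENOTYPES = list(product((0, 1), repeat=3))


def phenotype(l, c, beta_c, beta_l, beta_lc):
    """Phenotype fully determined by L and C, as in the model."""
    return beta_c * c + beta_l * l + beta_lc * l * c


def direct_cov(freqs, index, beta_c, beta_l, beta_lc):
    """cov(X, P) from its definition; index selects L (0), C (1) or N (2)."""
    e_x = e_p = e_xp = 0
    for genotype, weight in zip(GENOTYPES, freqs):
        p = phenotype(genotype[0], genotype[1], beta_c, beta_l, beta_lc)
        e_x += weight * genotype[index]
        e_p += weight * p
        e_xp += weight * genotype[index] * p
    return e_xp - e_x * e_p


def closed_form_cov(freqs, beta_c, beta_l, beta_lc):
    """cov(L, P), cov(C, P), cov(N, P) via the Table 2 parameterization."""
    a, b, c, d, e, f, g, h = freqs
    rho_l, rho_c, rho_n = e + f + g + h, c + d + g + h, b + d + f + h
    d_nc = d + h - rho_n * rho_c
    d_nl = f + h - rho_n * rho_l
    d_lc = g + h - rho_l * rho_c
    d_nlc = h - rho_n * rho_l * rho_c
    cov_n = beta_c * d_nc + beta_l * d_nl + beta_lc * (d_nlc - rho_n * d_lc)
    cov_c = beta_c * rho_c * (1 - rho_c) + beta_l * d_lc + beta_lc * (g + h) * (1 - rho_c)
    cov_l = beta_l * rho_l * (1 - rho_l) + beta_c * d_lc + beta_lc * (g + h) * (1 - rho_l)
    return cov_l, cov_c, cov_n


# Hypothetical frequencies (they sum to 1) and effect sizes, for illustration only.
freqs = [F(k, 36) for k in (1, 2, 3, 4, 5, 6, 7, 8)]
betas = (F(2), F(3), F(-1))  # beta_C, beta_L, beta_LC
closed = closed_form_cov(freqs, *betas)
direct = tuple(direct_cov(freqs, i, *betas) for i in range(3))
print(closed == direct)
```

Exact rational arithmetic is used here so that agreement between the two computations is an identity rather than a floating-point approximation.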
For clarity, we focus on expectations, and do not consider the stochastic error introduced by finite sample sizes.

Simple traits

Setting $\beta_L$ and $\beta_{LC}$ equal to 0 we describe a trait that is only influenced by a single causative polymorphism. In this case equations 1 and 2 reduce to

$$\mathrm{cov}(N, P) = \beta_C D_{NC}$$

and

$$\mathrm{cov}(C, P) = \beta_C \rho_C (1 - \rho_C),$$

respectively. The causative allele will give the most significant results when its effect on the phenotype is large and it is at an intermediate frequency in the sample. The non-causal marker will give significant results when the effect of the causative allele is large and there is disequilibrium between the two loci. In expectation, however, the non-causal marker should not give a more significant result than the causative polymorphism. Indeed,

$$\mathrm{cov}(N, P) \le \mathrm{cov}(C, P) \tag{4}$$

expands to

$$\beta_C (d + h - \rho_N \rho_C) \le \beta_C \rho_C (1 - \rho_C),$$

which simplifies to

$$(c + g - b - f)\,\rho_C \le c + g.$$

This is always true as c, g, b, f and $\rho_C$ are all defined on the interval [0, 1]. While disequilibrium can generate significant results for non-causal markers, with sufficient sample size the most significant results can be expected to be for the causative polymorphism or, if it is not present in the marker panel, the marker in greatest disequilibrium with it. Thus, while false positives, in the sense of significantly associated but unlinked non-causal markers, may exist (especially if population structure induces long-distance linkage disequilibrium across the genome), sufficiently powered association studies should always also locate the causal polymorphism if it exists. However, for traits with more than one contributing factor there is no such guarantee. This is the problem we turn to next. (Association studies can of course always be misleading if no causal polymorphism exists but non-causal markers co-vary with a non-genetic latent variable: this is readily seen by setting $\beta_C$ and $\beta_{LC}$ equal to 0 in our model.)

Complex traits

When two or more factors contribute to variation in a trait, association studies may be misleading in the sense that non-causal markers can be expected to be more strongly associated than either causal polymorphism. To see this we consider several scenarios, beginning with causative factors with only additive effects.
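As a numerical aside on the single-factor bound (equation 4), which serves as the baseline for the scenarios that follow, the sketch below draws random genotype frequencies and confirms that the marker covariance never exceeds the causal covariance when $\beta_L = \beta_{LC} = 0$. The sampling scheme, the helper names, and the choice $\beta_C = 1$ are assumptions made for illustration only.

```python
from fractions import Fraction
import random


def random_frequencies(rng):
    """Eight exact genotype frequencies a..h that sum to 1 (hypothetical sampler)."""
    while True:
        weights = [rng.randint(0, 9) for _ in range(8)]
        total = sum(weights)
        if total > 0:
            return [Fraction(w, total) for w in weights]


def simple_trait_covariances(freqs, beta_c=Fraction(1)):
    """cov(N, P) and cov(C, P) for a trait with a single causal factor."""
    a, b, c, d, e, f, g, h = freqs
    rho_c = c + d + g + h
    rho_n = b + d + f + h
    cov_n = beta_c * (d + h - rho_n * rho_c)
    cov_c = beta_c * rho_c * (1 - rho_c)
    return cov_n, cov_c


rng = random.Random(0)
violations = 0
for _ in range(5000):
    freqs = random_frequencies(rng)
    cov_n, cov_c = simple_trait_covariances(freqs)
    a, b, c, d, e, f, g, h = freqs
    rho_c = c + d + g + h
    # Simplified form of equation 4: (c + g - b - f) * rho_C <= c + g.
    simplified_holds = (c + g - b - f) * rho_c <= c + g
    if cov_n > cov_c or not simplified_holds:
        violations += 1
print("violations of equation 4:", violations)
```

A random search of this kind does not replace the algebraic argument; it only guards against transcription slips in the expanded and simplified forms.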
Additive effects, strong latent variable

In an extreme case where effects are additive ($\beta_{LC} = 0$) but $\beta_L \gg \beta_C$, equations 1 and 2 can be approximated by

$$\mathrm{cov}(N, P) \approx \beta_L D_{NL}$$

and

$$\mathrm{cov}(C, P) \approx \beta_L D_{LC}.$$

Under these conditions the causative polymorphism acts like a non-causal marker, and the most significant signals will come from whichever has the greatest disequilibrium with the latent variable that is responsible for most of the variation in the phenotype. If the latent variable is another genetic locus this is not a problematic result, as we have simply approximated the previously described case of a simple genetic trait. If the latent variable is an exogenous factor, however, we now see that we may erroneously ascribe its effect to a genetic locus that happens to be correlated with it.

Equivalent additive factors

Less trivially, setting $\beta_{LC} = 0$ and $\beta_L = \beta_C = \beta$ describes a trait controlled equally by two factors and gives us covariance terms:

$$\mathrm{cov}(N, P) = \beta\,(D_{NC} + D_{NL}) \tag{5}$$

$$\mathrm{cov}(C, P) = \beta\,[D_{LC} + \rho_C (1 - \rho_C)] \tag{6}$$

$$\mathrm{cov}(L, P) = \beta\,[D_{LC} + \rho_L (1 - \rho_L)]. \tag{7}$$

In this case, the non-causal marker is expected to have a more significant result than the causative allele whenever

$$D_{NC} + D_{NL} > D_{LC} + \rho_C (1 - \rho_C), \tag{8}$$

which makes it intuitive to see how rare causative alleles can give rise to the kind of synthetic association described by DICKSON et al. (2010). The term involving $\rho_C$ on the right becomes small, leaving ample opportunity for the two disequilibrium terms on the left to swamp the one disequilibrium term on the right. The specific pattern described in that paper is one where the latent variable is a second causative genetic variant at a locus. This creates strong negative covariance between the two causative factors and eliminates the opportunity for genetic interactions to play any role. In this case the only haplotypes that occur with appreciable frequencies correspond in our model to a, b, d, and f.
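A small numerical instance may help make the four-haplotype pattern concrete. The sketch below evaluates equations 5 and 6 when only a, b, d, and f are non-zero; the frequencies are hypothetical values chosen for illustration (two rare causal alleles that never co-occur), not estimates from any data set, and the function name is likewise an assumption.

```python
from fractions import Fraction as F


def additive_equal_covs(a, b, d, f, beta=1):
    """cov(N, P) and cov(C, P) from equations 5 and 6 with c = e = g = h = 0."""
    assert a + b + d + f == 1, "the four haplotype frequencies must sum to 1"
    rho_c, rho_l, rho_n = d, f, b + d + f
    d_nc = d - rho_n * rho_c   # d + h - rho_N * rho_C with h = 0
    d_nl = f - rho_n * rho_l   # f + h - rho_N * rho_L with h = 0
    d_lc = -rho_l * rho_c      # g + h - rho_L * rho_C with g = h = 0
    cov_n = beta * (d_nc + d_nl)
    cov_c = beta * (d_lc + rho_c * (1 - rho_c))
    return cov_n, cov_c


# Rare marker allele that tags both rare causal alleles: the marker wins.
rare_n, rare_c = additive_equal_covs(F(92, 100), F(2, 100), F(3, 100), F(3, 100))
print(rare_n > rare_c)

# Same causal alleles, but the marker allele is common (large b): the marker loses.
common_n, common_c = additive_equal_covs(F(40, 100), F(54, 100), F(3, 100), F(3, 100))
print(common_n > common_c)
```

In the first configuration the marker allele sits on almost exactly the haplotypes that carry either causal allele, so it summarizes both effects at once; in the second, most copies of the marker allele occur on haplotypes with no causal allele and the advantage disappears.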
Setting all other haplotype frequencies to 0 in equation 8 and simplifying shows us that under these conditions the strongest association will be expected at the non-causal locus whenever

$$\rho_N < 1 - \frac{bd}{f}.$$

For this scenario to cause problematic results, the non-causal marker cannot be too common, or it will not be in sufficiently strong linkage disequilibrium with the rare causative loci.

Epistasis

There are limits to the degree of confounding possible when interactions are purely additive. Within the restriction of additivity, even when the strongest signal in an association study is coming from a non-causal locus, we should expect at least one of the truly causative factors to exhibit at least some association. This is because the covariance between the non-causal marker and the phenotype will never be larger than the sum of the covariance between the causative locus and the phenotype and the covariance between the latent variable and the phenotype. From equations 1, 2, and 3,

$$\mathrm{cov}(N, P) \le \mathrm{cov}(C, P) + \mathrm{cov}(L, P)$$

expands to

$$\beta_C D_{NC} + \beta_L D_{NL} \le \beta_C \rho_C (1 - \rho_C) + \beta_C D_{LC} + \beta_L \rho_L (1 - \rho_L) + \beta_L D_{LC}.$$

From equation 4 it follows that

$$\beta_C D_{NC} \le \beta_C \rho_C (1 - \rho_C),$$

which is also true if you replace all the Cs with Ls. Doing so and substituting lets us cancel and get

$$0 \le (\beta_C + \beta_L)\,D_{LC},$$

which is always true. A non-zero interaction term does away with this upper bound for $\mathrm{cov}(N, P)$, however. Consider, for example, the case where $\beta_C = \beta_L = \beta$ but $\beta_{LC} = -\beta$ (negative epistasis: either causative allele is sufficient for the phenotype), $a = b = c = e = h = 0$ and $d = f = g = 1/3$ (negative covariance between the two causal factors). In this example, $\mathrm{cov}(C, P) = \mathrm{cov}(L, P) = 0$, but $\mathrm{cov}(N, P) = 2\beta/3$. In other words, the non-causal marker can have an arbitrarily large covariance with the trait even though there is no association for any of the truly causative factors, no matter how powerful the study.

Simulated example

To illustrate the behavior of our model using real polymorphism data, we use the data of ATWELL et al. (2010), who carried out a genome-wide association study using 216,130 single nucleotide polymorphism (SNP) markers in a set of 199 inbred lines of A. thaliana. The sample is characterized by complex population structure (PLATT et al., 2010), which makes it ideal for illustrative purposes. Many traits are strongly correlated with latitude in A. thaliana. This can come about through geographically distributed causative genetic polymorphisms of large effect, the combined effect of many causative polymorphisms of small effect, or non-genetic confounding factors.

We performed two sets of simulations. A first causative locus is picked at random from the 216,130 SNPs and a random allele is assigned an effect. The second causative factor is then either a SNP or a binary environmental factor, where both possibilities for an effect allele are used. This is repeated for 10% of the SNPs in the data set and a new trait is generated each time, resulting in 43,200 non-constant traits for each of the sets of simulations.
For the first set, the traits are correlated with the population structure of the organism, and the second causative variable is a latent indicator variable that identifies each individual as having been collected north of 50° latitude, a line that lies midway between London and Paris and which divides the sample roughly in half. In the second set of simulations, the second causative variable is another randomly selected SNP.

Phenotypes were calculated for three different trait architectures, letting $\beta_C = \beta_L = \beta$ with differing degrees of interaction (Table 3). Setting $\beta_{LC} = 0$ gives a purely additive model. With $\beta_{LC} = -\beta$ we get an "or" model where either causative factor is sufficient to create phenotypic change. When describing two genetic loci, this model can reflect the interaction between loss-of-function mutations in different genes in a common pathway. With an environmental cofactor, it represents a canalized trait whose genetic variation is only revealed phenotypically in certain environments. As described above, this kind of negative epistasis can give rise to situations where only the non-causal marker is correlated with the phenotype. Setting $\beta_{LC} = -2\beta$ gives us an "xor" model where individuals with zero and two labeled factors share a common phenotype but are different from those with only one (regardless of which it is). Genetically, this model can reflect the interaction between a compensatory pair of mutations, such as one in a transcription factor and one in a binding site. As an environmental effect, this scenario occurs whenever there are trade-offs between responses in different environments. Pathogen resistance is one example: functional resistance genes can raise seed production where pathogens are present but reduce it where they are not (KORVES and BERGELSON, 2004).

For each simulated phenotype we performed a genome-wide association study using the non-parametric Wilcoxon rank sum test on every marker.
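The three trait architectures and the per-marker test can be sketched in a few lines. What follows is a minimal illustration under stated assumptions, not the pipeline used for the simulations: the function names are hypothetical, the rank sum statistic uses the normal approximation with a tie correction, and the toy sample is a balanced, hand-built set of 40 haploid individuals in which the marker N happens to equal "L xor C".

```python
import math

# Interaction term in units of beta for the three architectures (Table 3).
BETA_LC = {"additive": 0.0, "or": -1.0, "xor": -2.0}


def simulate_phenotype(c, l, model, beta=1.0):
    """Phenotype for causal allele c and latent state l (both 0 or 1)."""
    return beta * (c + l) + BETA_LC[model] * beta * c * l


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def rank_sum_z(phenotypes, alleles):
    """z score of the Wilcoxon rank sum test (normal approximation, tie-corrected)."""
    n = len(alleles)
    n1 = sum(alleles)
    n0 = n - n1
    if n1 == 0 or n0 == 0:
        return 0.0
    ranks, tie_sizes = average_ranks(phenotypes)
    w = sum(r for r, g in zip(ranks, alleles) if g == 1)
    tie_term = sum(t ** 3 - t for t in tie_sizes) / (n * (n - 1))
    variance = n1 * n0 / 12 * ((n + 1) - tie_term)
    if variance <= 0:
        return 0.0  # constant phenotype: no marker can be associated
    return (w - n1 * (n + 1) / 2) / math.sqrt(variance)


# Toy sample: 10 individuals in each (L, C) class, marker N = L xor C.
L = [0] * 20 + [1] * 20
C = ([0] * 10 + [1] * 10) * 2
N = [l ^ c for l, c in zip(L, C)]
P = [simulate_phenotype(c, l, "xor") for c, l in zip(C, L)]

z_c = rank_sum_z(P, C)
z_n = rank_sum_z(P, N)
print(abs(z_c) < 1e-9, z_n > 5)
```

In this contrived sample the "xor" phenotype coincides with the marker, so the marker separates the phenotype classes perfectly while the causal allele shows no marginal signal at all. Because the tie correction depends only on the phenotype values, it rescales every marker's z score by the same factor and does not change their ranking.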
For the first set of simulations, where the latent variable is a North-South split, Figure 1 (A-C) shows how far down in the list of associated markers one would have to go to find the correct locus. In the purely additive simulations there are few problems (Fig. 1A). The correct locus is easily identified as one of the very strongest results in almost all cases, with the vast majority of exceptions being cases where the causative locus has a very low minor allele frequency. The "or" model exhibits greater confounding (Fig. 1B). The locus is perfectly identified less than half of the time and is sometimes missed even when the minor allele frequency is intermediate. The correct locus was essentially never found in the "xor" model, regardless of the minor allele frequency (Fig. 1C).

Measurements of the distance between the causative locus and the locus with the lowest P-value followed the same pattern. When the causative locus is among the highest ranked SNPs, it is near the locus with the lowest P-value. As its rank falls it tends to be farther and farther away, and by the time it is not within the top thousand SNPs it is often on the wrong chromosome. Figure 1 (D-F) shows the distribution of maximum distances to the causative SNP for all markers with association greater than or equal to that of the causative locus. It is evident that when the causative marker is not the most significant, a very distant marker usually is. This is true even in the simple additive case. In the "xor" model the causative marker is not significant most of the time.

Turning to the simulations with two randomly chosen causative loci, Figure 2 (A-C) shows the P-value rank distribution of the two causative alleles, both the top-ranking and the second-ranking. A true causative locus is essentially always found in the additive case (Fig. 2A), and the more weakly associated locus is often among the most significant ones.
For the epistatic "or" and "xor" models a true causative locus is missed one time in eight and two times out of five, respectively (Figs. 2B-C). The rank of the second-ranking causative locus also becomes lower in the epistatic models. Figure 2 (D-F) shows the distribution of maximum distances to the nearest causative SNP for all markers with association greater than the second-ranking causative locus. This demonstrates that there are often unlinked loci with greater significance than the second-ranking causative locus, even when both causative loci are significant. This is a particularly serious problem in the epistatic models (see also Table 4).

Table 3. Simulated phenotypes.

Genotype                        Phenotype
Latent      Causative           Additive             Or                        Xor
variable    polymorphism        $\beta_{LC} = 0$     $\beta_{LC} = -\beta$     $\beta_{LC} = -2\beta$
North       0                   0                    0                         0
South       0                   $\beta$              $\beta$                   $\beta$
North       1                   $\beta$              $\beta$                   $\beta$
South       1                   $2\beta$             $\beta$                   0

Model for generating phenotypes from data with one causative genetic locus and a non-genetic, geographic factor that is treated as a latent variable.

[Figure 1. Panels A-C: frequency against P-value rank of the causative SNP, colored by minor allele frequency (MAF) of the causative SNP. Panels D-F: frequency against distance to nearest causative (kb), colored by whether the causative SNP was significant.]

Figure 1: Simulation results for a geographical latent variable, a North-South split. A-C: Rank of the causative SNP. Illustration of how many markers had a stronger association than the causative SNP in a given analysis under (A) additive genetic model, (B) "or" model, and (C) "xor" model. Colors indicate the minor allele frequency of the causative SNP. D-F: Maximum distance to the causative SNP of all SNPs with greater or equal association than the causative SNP under (D) additive genetic model, (E) "or" model, and (F) "xor" model. Colors indicate whether the causative marker was found to be significant at the Bonferroni threshold. Only results where at least one SNP was found significant were included in the analysis.

Discussion

Causes of confounding

We have used a very simple three-locus model to clarify the conditions under which genome-wide association studies are expected to be reproducibly misleading. We believe there are three distinct problem sources: correlation between causal factors and (unlinked) non-causal markers; more than a single causal factor (especially if the factors themselves are correlated); and epistasis (i.e., non-linear interactions between causal factors in determining the phenotype). Consider each in turn.

Correlation with unlinked markers

Correlation between causal factors and unlinked, non-causal markers (note that all non-causal markers are unlinked if the causal factors are non-genetic) violates the basic assumption of GWAS and causes false positives. Population structure, by definition, causes genome-wide correlations between alleles (linkage disequilibrium), which can easily lead to genome-wide occurrence of false positives (ROSENBERG and NORDBORG, 2006), a problem that has long been recognized (LI, 1969; LANDER and SCHORK, 1994), and for which many statistical solutions have been proposed (DEVLIN and ROEDER, 1999; PRITCHARD et al., 2000; YU et al., 2006; PRICE et al., 2006). However, it is important to realize that associations at unlinked, non-causal markers can also arise because of pleiotropy. Consider, for example, a scenario in which one polymorphism affects both skin and eye color and another affects just skin color. If skin color variation is locally adaptive, then selection causes correlation (linkage disequilibrium) between the two loci. A GWAS for eye color would detect associations at both loci, even though one of them has nothing to do with this trait.
Unlike false positives caused by population structure, these types of false positives would not occur at random throughout the genome: they would only occur at non-causal markers correlated with causal factors through selection on pleiotropic traits. This might make them less common; it would certainly make them more difficult to eliminate through statistical methods.

More than a single causative factor

Whenever a trait is controlled by more than a single factor, it is possible that the strongest associations in the data are indirect ones. As biologically uninformative as these associations are, they are true associations and will respond as such to statistical tests, gaining significance with increased sampling and reproducing in multiple data sets. Without any population structure, strong indirect associations can arise at loci that are genetically linked to two or more causative factors, even if the causative factors are in equilibrium with each other. This linkage-only case has been well documented in the linkage mapping literature (MARTINEZ and CURNOW, 1992; HALEY and KNOTT, 1992). Here, two genetically linked quantitative trait loci combine to produce a false, or "ghost", peak of association between them. In the presence of natural selection it is no longer necessary for the indirectly associated marker to be linked to more than one causative locus (as in the ghost peak version), as there will already exist correlations between the causative factors. A marker linked to one is likely to be in disequilibrium with all of them. With population structure or selection and pleiotropy, however, these indirect associations can be far removed from all causative factors.

Epistasis

When the causative loci interact epistatically, it is possible that the only loci exhibiting any association with the phenotype are non-causal.
While it has long been recognized that epistatically interacting loci may be difficult to find due to lack of marginal effect (EAVES, 1994), correlated non-causal loci can serve as excellent markers for the joint state of several causative loci working in concert. Tests for association based on multi-locus haplotypes (or which model explicit interaction terms) will improve results but not completely ameliorate the problem. While we have mostly been describing the factors L, C, and N as single loci, they can just as easily represent arbitrarily complex combinations of loci (and external factors). A statistician who perfectly models the trait architecture, and knows that he or she has done so, will have effectively recast the complex trait as a simple trait (albeit with complex inputs). It would be guaranteed that no non-causal marker-complex will have a stronger association than the causative factor-complex, but there is nothing stopping non-causal marker-complexes from having associations just as strong as the causative ones. Even simple non-causal markers may have associations as strong as the causative marker-complex, which would mislead any sort of model-selection algorithm.

Conclusions

Our purpose in writing this paper has been to clarify the conditions under which GWAS are expected to be reproducibly misleading. As our simulation results demonstrate, severe problems may arise when we attempt to model traits that are really due to multiple, possibly correlated, possibly epistatically interacting factors using single-locus models that assume that unlinked, non-causal markers are not correlated with the causal factors. Not only do we face the well-known problem of false positives across the genome, we also see that the strongest associations may appear on chromosomes completely devoid of causative loci, and that the true positives may be undetectable.

[Figure 2. Panels A-C: frequency against P-value rank for the top-ranked causative SNP and the other causative SNP. Panels D-F: frequency against distance to nearest causative (kb), colored by whether neither, one, or both causative SNPs were significant.]

Figure 2: Simulation results for two causative SNPs, where both are chosen at random. A-C: Rank of the top-ranking causative SNP (blue), and the second-ranking causative SNP (orange) under (A) additive genetic model, (B) "or" model, and (C) "xor" model. D-F: Maximum distance to nearest causative SNP among SNPs with greater association than the more weakly associated causative SNP under (D) additive genetic model, (E) "or" model, and (F) "xor" model. Colors indicate whether two, one or none of the causative SNPs were found significant at the Bonferroni threshold. Only results where at least one SNP was found significant were included in the analysis.

Table 4. Summary of simulation results.

                                At least one significant? (a)    Top-ranking causal? (b)    Distant non-causal found? (c)
Model                           Additive   Or   Xor              Additive   Or   Xor        Additive   Or   Xor
Latent North-South variable
Two causal loci

(a) Fraction of results with at least one significant SNP (at a Bonferroni-corrected threshold of 5%) and that were used for subsequent analysis.
(b) Fraction of results in which the top-ranking association was a causal polymorphism (the causal polymorphism in the case of a latent variable).
(c) Fraction of results in which a SNP more strongly associated with the phenotype than a causal polymorphism (the causal polymorphism in the case of a latent variable) was more than 50 kb away from the nearest causal polymorphism.

In this light, the common practice of correcting for population structure may be misguided. The real goal should be correcting for the confounding effects of multiple causative factors. Some of the techniques currently employed as population structure correction actually do this very well. The mixed-model approach (YU et al., 2006), for instance, can be interpreted as removing the effect of a large number of unlinked, selectively neutral factors, each with an uninterestingly small effect on the studied trait (KANG et al., 2010). Approaches such as Structured Analysis (PRITCHARD et al., 2000) and PCA (PRICE et al., 2006), on the other hand, only aid in correcting for the correlations among multiple causative factors to the extent that clustering on global patterns of genetic variation approximates the distributions of the individual causative factors. Attempting to correct for population structure directly, as opposed to correcting for correlations among multiple causative factors, runs the risk of eliminating the effects of the largest, most interesting loci from the study. This will happen whenever alleles at those loci have a distribution similar to the genomic patterns of correlation. Such factors can easily and accurately be identified as being associated, though they will be in disequilibrium with many non-causal loci, making them difficult to locate with any precision.

This is not to say, however, that the presence of any of these confounding attributes of complex traits dooms a genome-wide association study to failure. All of them (multiple factors, natural selection, epistasis, and population structure) contribute to confounding in quantitative ways and in amounts that will be greatly influenced by their specific details. A carefully constructed human case-control study, for instance, may not suffer from appreciable population structure and would therefore only introduce an imprecision in the location of the cause of the associations.
Larger, population-based cohort studies, however, may soon find themselves running into the kinds of large-scale population structure inherent in the human species (FREEDMAN et al., 2004; NOVEMBRE et al., 2008). The results may still be mostly accurate if natural selection is weak and the additive effects of the majority of the causative loci are large, but may become questionable when considering highly polygenic traits under strong selection.

Genome-wide association studies applied to other organisms, however, may be considerably more problematic. The very worst situation is likely to arise in species that have undergone strong local adaptation or have experienced artificial selection to create numerous different phenotypes. In these cases the correlated effects of population structure and selection may well be expected to swamp any remaining causative associations with rampant and excessive indirect associations spread all across the genome. Organisms like Arabidopsis thaliana may be intermediate, with confounding ranging from almost non-existent to extremely problematic depending on the architecture of the trait. In organisms with high levels of confounding, it is necessary to proceed with caution and treat identified associations as hypotheses for follow-up confirmatory studies (ATWELL et al., 2010).

It is also worth noting that these indirectly associated sites confound not just the scientist attempting to discover the map between phenotype and genotype, but similarly interfere with the process of natural selection as well. In the example of epistasis described above, in which marginal effects of the causal factors are completely missing, any selection applied to the trait in question would change the allele frequency (producing a partial selective sweep) only at the non-causal, neutral locus, not at any of the loci that actually contribute to the phenotype.
Natural selection has one advantage over the scientist: the scientist is generally restricted to a snapshot of a population and its patterns of disequilibrium. Natural selection is a process that unfolds over successive generations and may have the opportunity to break apart disadvantageous correlations. Scientists can mimic this process in some cases by performing experimental crosses, genetic transformations, or pedigree or family-based analyses and thereby disrupting the extant patterns of disequilibrium, though this is often not feasible in clinical studies.

Acknowledgements

We are thankful to David Conti, Sergey Nuzhdin, Paul Marjoram, Juan Pablo Lewinger, Thomas Turner, Quingrun Zhang and Quan Long for helpful discussions. This work was supported by NSF DEB , NIH P50 HG002790, and by the Austrian Academy of Sciences.

References

ATWELL, S., Y. S. HUANG, B. J. VILHJÁLMSSON, G. WILLEMS, M. HORTON, et al., 2010 Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature 465:
DEVLIN, B., and K. ROEDER, 1999 Genomic control for association studies. Biometrics 55:
DICKSON, S. P., K. WANG, I. KRANTZ, H. HAKONARSON, and D. B. GOLDSTEIN, 2010 Rare variants create synthetic genome-wide associations. PLoS Biol 8: e
EAVES, L. J., 1994 Effect of genetic architecture on the power of human linkage studies to resolve the contribution of quantitative trait loci. Heredity 72:
FREEDMAN, M. L., D. REICH, K. L. PENNEY, G. J. MCDONALD, A. A. MIGNAULT, et al., 2004 Assessing the impact of population stratification on genetic association studies. Nat Genet 36:
HALEY, C. S., and S. A. KNOTT, 1992 Maximum-likelihood mapping of quantitative trait loci using full-sib families. Genetics 132:
KANG, H. M., J. H. SUL, S. K. SERVICE, N. A. ZAITLEN, S. YEE KONG, et al., 2010 Variance component model to account for sample structure in genome-wide association studies. Nat Genet 42:
KORVES, T., and J. BERGELSON, 2004 A novel cost of R gene resistance in the presence of disease. Am Nat 163:
LANDER, E. S., and N. J. SCHORK, 1994 Genetic dissection of complex traits. Science 265:
LI, C. C., 1969 Population subdivision with respect to multiple alleles. Ann Hum Genet 33:
MARTINEZ, O., and R. N. CURNOW, 1992 Estimating the locations and the sizes of the effects of quantitative trait loci using flanking markers. Theor Appl Genet 85:
NOVEMBRE, J., T. JOHNSON, K. BRYC, Z. KUTALIK, A. R. BOYKO, et al., 2008 Genes mirror geography within Europe. Nature 456:
PLATT, A., M. HORTON, Y. S. HUANG, Y. LI, A. E. ANASTASIO, et al., 2010 The scale of population structure in Arabidopsis thaliana. PLoS Genet 6: e
PRICE, A. L., N. J. PATTERSON, R. M. PLENGE, M. E. WEINBLATT, N. A. SHADICK, et al., 2006 Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 38:
PRITCHARD, J. K., M. STEPHENS, N. A. ROSENBERG, and P. DONNELLY, 2000 Association mapping in structured populations. Am J Hum Genet 67:
ROSENBERG, N., and M. NORDBORG, 2006 A general population-genetic model for the production by population structure of spurious genotype-phenotype associations in discrete, admixed, or spatially distributed populations. Genetics 173:
YU, J., G. PRESSOIR, W. BRIGGS, I. VROH BI, M. YAMASAKI, et al., 2006 A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat Genet 38:

p(d g A,g B )p(g B ), g B

p(d g A,g B )p(g B ), g B Supplementary Note Marginal effects for two-locus models Here we derive the marginal effect size of the three models given in Figure 1 of the main text. For each model we assume the two loci (A and B)

More information

Methods for Cryptic Structure. Methods for Cryptic Structure

Methods for Cryptic Structure. Methods for Cryptic Structure Case-Control Association Testing Review Consider testing for association between a disease and a genetic marker Idea is to look for an association by comparing allele/genotype frequencies between the cases

More information

Lecture WS Evolutionary Genetics Part I 1

Lecture WS Evolutionary Genetics Part I 1 Quantitative genetics Quantitative genetics is the study of the inheritance of quantitative/continuous phenotypic traits, like human height and body size, grain colour in winter wheat or beak depth in

More information

(Genome-wide) association analysis

(Genome-wide) association analysis (Genome-wide) association analysis 1 Key concepts Mapping QTL by association relies on linkage disequilibrium in the population; LD can be caused by close linkage between a QTL and marker (= good) or by

More information

Evolution of phenotypic traits

Evolution of phenotypic traits Quantitative genetics Evolution of phenotypic traits Very few phenotypic traits are controlled by one locus, as in our previous discussion of genetics and evolution Quantitative genetics considers characters

More information

Genotype Imputation. Biostatistics 666

Genotype Imputation. Biostatistics 666 Genotype Imputation Biostatistics 666 Previously Hidden Markov Models for Relative Pairs Linkage analysis using affected sibling pairs Estimation of pairwise relationships Identity-by-Descent Relatives

More information

Probability of Detecting Disease-Associated SNPs in Case-Control Genome-Wide Association Studies

Probability of Detecting Disease-Associated SNPs in Case-Control Genome-Wide Association Studies Probability of Detecting Disease-Associated SNPs in Case-Control Genome-Wide Association Studies Ruth Pfeiffer, Ph.D. Mitchell Gail Biostatistics Branch Division of Cancer Epidemiology&Genetics National

More information

Asymptotic distribution of the largest eigenvalue with application to genetic data

Asymptotic distribution of the largest eigenvalue with application to genetic data Asymptotic distribution of the largest eigenvalue with application to genetic data Chong Wu University of Minnesota September 30, 2016 T32 Journal Club Chong Wu 1 / 25 Table of Contents 1 Background Gene-gene

More information

Linear Regression (1/1/17)

Linear Regression (1/1/17) STA613/CBB540: Statistical methods in computational biology Linear Regression (1/1/17) Lecturer: Barbara Engelhardt Scribe: Ethan Hada 1. Linear regression 1.1. Linear regression basics. Linear regression

More information

Chapter 6 Linkage Disequilibrium & Gene Mapping (Recombination)

Chapter 6 Linkage Disequilibrium & Gene Mapping (Recombination) 12/5/14 Chapter 6 Linkage Disequilibrium & Gene Mapping (Recombination) Linkage Disequilibrium Genealogical Interpretation of LD Association Mapping 1 Linkage and Recombination v linkage equilibrium ²

More information

1.5.1 ESTIMATION OF HAPLOTYPE FREQUENCIES:

1.5.1 ESTIMATION OF HAPLOTYPE FREQUENCIES: .5. ESTIMATION OF HAPLOTYPE FREQUENCIES: Chapter - 8 For SNPs, alleles A j,b j at locus j there are 4 haplotypes: A A, A B, B A and B B frequencies q,q,q 3,q 4. Assume HWE at haplotype level. Only the

More information

1 Springer. Nan M. Laird Christoph Lange. The Fundamentals of Modern Statistical Genetics

1 Springer. Nan M. Laird Christoph Lange. The Fundamentals of Modern Statistical Genetics 1 Springer Nan M. Laird Christoph Lange The Fundamentals of Modern Statistical Genetics 1 Introduction to Statistical Genetics and Background in Molecular Genetics 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

More information

An Arabidopsis Example of Association Mapping in Structured Samples

An Arabidopsis Example of Association Mapping in Structured Samples An Arabidopsis Example of Association Mapping in Structured Samples Keyan Zhao 1, María José Aranzana 1, Sung Kim 1, Clare Lister 2, Chikako Shindo 2, Chunlao Tang 1, Christopher Toomajian 1, Honggang

More information

BTRY 7210: Topics in Quantitative Genomics and Genetics

BTRY 7210: Topics in Quantitative Genomics and Genetics BTRY 7210: Topics in Quantitative Genomics and Genetics Jason Mezey Biological Statistics and Computational Biology (BSCB) Department of Genetic Medicine jgm45@cornell.edu February 12, 2015 Lecture 3:

More information

Introduction to Genetics

Introduction to Genetics Introduction to Genetics The Work of Gregor Mendel B.1.21, B.1.22, B.1.29 Genetic Inheritance Heredity: the transmission of characteristics from parent to offspring The study of heredity in biology is

More information

Friday Harbor From Genetics to GWAS (Genome-wide Association Study) Sept David Fardo

Friday Harbor From Genetics to GWAS (Genome-wide Association Study) Sept David Fardo Friday Harbor 2017 From Genetics to GWAS (Genome-wide Association Study) Sept 7 2017 David Fardo Purpose: prepare for tomorrow s tutorial Genetic Variants Quality Control Imputation Association Visualization

More information

SWEEPFINDER2: Increased sensitivity, robustness, and flexibility

SWEEPFINDER2: Increased sensitivity, robustness, and flexibility SWEEPFINDER2: Increased sensitivity, robustness, and flexibility Michael DeGiorgio 1,*, Christian D. Huber 2, Melissa J. Hubisz 3, Ines Hellmann 4, and Rasmus Nielsen 5 1 Department of Biology, Pennsylvania

More information

MODEL-FREE LINKAGE AND ASSOCIATION MAPPING OF COMPLEX TRAITS USING QUANTITATIVE ENDOPHENOTYPES

MODEL-FREE LINKAGE AND ASSOCIATION MAPPING OF COMPLEX TRAITS USING QUANTITATIVE ENDOPHENOTYPES MODEL-FREE LINKAGE AND ASSOCIATION MAPPING OF COMPLEX TRAITS USING QUANTITATIVE ENDOPHENOTYPES Saurabh Ghosh Human Genetics Unit Indian Statistical Institute, Kolkata Most common diseases are caused by

More information

GENETICS - CLUTCH CH.1 INTRODUCTION TO GENETICS.

GENETICS - CLUTCH CH.1 INTRODUCTION TO GENETICS. !! www.clutchprep.com CONCEPT: HISTORY OF GENETICS The earliest use of genetics was through of plants and animals (8000-1000 B.C.) Selective breeding (artificial selection) is the process of breeding organisms

More information

Variance Component Models for Quantitative Traits. Biostatistics 666

Variance Component Models for Quantitative Traits. Biostatistics 666 Variance Component Models for Quantitative Traits Biostatistics 666 Today Analysis of quantitative traits Modeling covariance for pairs of individuals estimating heritability Extending the model beyond

More information

Bayesian Inference of Interactions and Associations

Bayesian Inference of Interactions and Associations Bayesian Inference of Interactions and Associations Jun Liu Department of Statistics Harvard University http://www.fas.harvard.edu/~junliu Based on collaborations with Yu Zhang, Jing Zhang, Yuan Yuan,

More information

Binomial Mixture Model-based Association Tests under Genetic Heterogeneity

Binomial Mixture Model-based Association Tests under Genetic Heterogeneity Binomial Mixture Model-based Association Tests under Genetic Heterogeneity Hui Zhou, Wei Pan Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455 April 30,

More information

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.01 Lecture 20: Epistasis and Alternative Tests in GWAS Jason Mezey jgm45@cornell.edu April 16, 2016 (Th) 8:40-9:55 None Announcements Summary

More information

Homework Assignment, Evolutionary Systems Biology, Spring Homework Part I: Phylogenetics:

Homework Assignment, Evolutionary Systems Biology, Spring Homework Part I: Phylogenetics: Homework Assignment, Evolutionary Systems Biology, Spring 2009. Homework Part I: Phylogenetics: Introduction. The objective of this assignment is to understand the basics of phylogenetic relationships

More information

Supplementary Information

Supplementary Information Supplementary Information 1 Supplementary Figures (a) Statistical power (p = 2.6 10 8 ) (b) Statistical power (p = 4.0 10 6 ) Supplementary Figure 1: Statistical power comparison between GEMMA (red) and

More information

Introduction to QTL mapping in model organisms

Introduction to QTL mapping in model organisms Introduction to QTL mapping in model organisms Karl W Broman Department of Biostatistics Johns Hopkins University kbroman@jhsph.edu www.biostat.jhsph.edu/ kbroman Outline Experiments and data Models ANOVA

More information

HERITABILITY ESTIMATION USING A REGULARIZED REGRESSION APPROACH (HERRA)

HERITABILITY ESTIMATION USING A REGULARIZED REGRESSION APPROACH (HERRA) BIRS 016 1 HERITABILITY ESTIMATION USING A REGULARIZED REGRESSION APPROACH (HERRA) Malka Gorfine, Tel Aviv University, Israel Joint work with Li Hsu, FHCRC, Seattle, USA BIRS 016 The concept of heritability

More information

HEREDITY AND VARIATION

HEREDITY AND VARIATION HEREDITY AND VARIATION OVERVIEW Students often do not understand the critical role of variation to evolutionary processes. In fact, variation is the only fundamental requirement for evolution to occur.

More information

Statistical Power of Model Selection Strategies for Genome-Wide Association Studies

Statistical Power of Model Selection Strategies for Genome-Wide Association Studies Statistical Power of Model Selection Strategies for Genome-Wide Association Studies Zheyang Wu 1, Hongyu Zhao 1,2 * 1 Department of Epidemiology and Public Health, Yale University School of Medicine, New

More information

Major questions of evolutionary genetics. Experimental tools of evolutionary genetics. Theoretical population genetics.

Major questions of evolutionary genetics. Experimental tools of evolutionary genetics. Theoretical population genetics. Evolutionary Genetics (for Encyclopedia of Biodiversity) Sergey Gavrilets Departments of Ecology and Evolutionary Biology and Mathematics, University of Tennessee, Knoxville, TN 37996-6 USA Evolutionary

More information

Backward Genotype-Trait Association. in Case-Control Designs

Backward Genotype-Trait Association. in Case-Control Designs Backward Genotype-Trait Association (BGTA)-Based Dissection of Complex Traits in Case-Control Designs Tian Zheng, Hui Wang and Shaw-Hwa Lo Department of Statistics, Columbia University, New York, New York,

More information

Principles of QTL Mapping. M.Imtiaz

Principles of QTL Mapping. M.Imtiaz Principles of QTL Mapping M.Imtiaz Introduction Definitions of terminology Reasons for QTL mapping Principles of QTL mapping Requirements For QTL Mapping Demonstration with experimental data Merit of QTL

More information

Natural Selection. Population Dynamics. The Origins of Genetic Variation. The Origins of Genetic Variation. Intergenerational Mutation Rate

Natural Selection. Population Dynamics. The Origins of Genetic Variation. The Origins of Genetic Variation. Intergenerational Mutation Rate Natural Selection Population Dynamics Humans, Sickle-cell Disease, and Malaria How does a population of humans become resistant to malaria? Overproduction Environmental pressure/competition Pre-existing

More information

NIH Public Access Author Manuscript Stat Sin. Author manuscript; available in PMC 2013 August 15.

NIH Public Access Author Manuscript Stat Sin. Author manuscript; available in PMC 2013 August 15. NIH Public Access Author Manuscript Published in final edited form as: Stat Sin. 2012 ; 22: 1041 1074. ON MODEL SELECTION STRATEGIES TO IDENTIFY GENES UNDERLYING BINARY TRAITS USING GENOME-WIDE ASSOCIATION

More information

Prediction of the Confidence Interval of Quantitative Trait Loci Location

Prediction of the Confidence Interval of Quantitative Trait Loci Location Behavior Genetics, Vol. 34, No. 4, July 2004 ( 2004) Prediction of the Confidence Interval of Quantitative Trait Loci Location Peter M. Visscher 1,3 and Mike E. Goddard 2 Received 4 Sept. 2003 Final 28

More information

Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Information #

Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Information # Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Details of PRF Methodology In the Poisson Random Field PRF) model, it is assumed that non-synonymous mutations at a given gene are either

More information

Expression QTLs and Mapping of Complex Trait Loci. Paul Schliekelman Statistics Department University of Georgia

Expression QTLs and Mapping of Complex Trait Loci. Paul Schliekelman Statistics Department University of Georgia Expression QTLs and Mapping of Complex Trait Loci Paul Schliekelman Statistics Department University of Georgia Definitions: Genes, Loci and Alleles A gene codes for a protein. Proteins due everything.

More information

Cheng-Ruei Lee 1 *, Jill T. Anderson 2, Thomas Mitchell-Olds 1,3. Abstract. Introduction

Cheng-Ruei Lee 1 *, Jill T. Anderson 2, Thomas Mitchell-Olds 1,3. Abstract. Introduction Unifying Genetic Canalization, Genetic Constraint, and Genotype-by-Environment Interaction: QTL by Genomic Background by Environment Interaction of Flowering Time in Boechera stricta Cheng-Ruei Lee 1 *,

More information

(Write your name on every page. One point will be deducted for every page without your name!)

(Write your name on every page. One point will be deducted for every page without your name!) POPULATION GENETICS AND MICROEVOLUTIONARY THEORY FINAL EXAMINATION (Write your name on every page. One point will be deducted for every page without your name!) 1. Briefly define (5 points each): a) Average

More information

Overview. Background

Overview. Background Overview Implementation of robust methods for locating quantitative trait loci in R Introduction to QTL mapping Andreas Baierl and Andreas Futschik Institute of Statistics and Decision Support Systems

More information

The Admixture Model in Linkage Analysis

The Admixture Model in Linkage Analysis The Admixture Model in Linkage Analysis Jie Peng D. Siegmund Department of Statistics, Stanford University, Stanford, CA 94305 SUMMARY We study an appropriate version of the score statistic to test the

More information

Implicit Causal Models for Genome-wide Association Studies

Implicit Causal Models for Genome-wide Association Studies Implicit Causal Models for Genome-wide Association Studies Dustin Tran Columbia University David M. Blei Columbia University Abstract Progress in probabilistic generative models has accelerated, developing

More information

Computational Approaches to Statistical Genetics

Computational Approaches to Statistical Genetics Computational Approaches to Statistical Genetics GWAS I: Concepts and Probability Theory Christoph Lippert Dr. Oliver Stegle Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen

More information

Combining SEM & GREML in OpenMx. Rob Kirkpatrick 3/11/16

Combining SEM & GREML in OpenMx. Rob Kirkpatrick 3/11/16 Combining SEM & GREML in OpenMx Rob Kirkpatrick 3/11/16 1 Overview I. Introduction. II. mxgreml Design. III. mxgreml Implementation. IV. Applications. V. Miscellany. 2 G V A A 1 1 F E 1 VA 1 2 3 Y₁ Y₂

More information

CS 4491/CS 7990 SPECIAL TOPICS IN BIOINFORMATICS

CS 4491/CS 7990 SPECIAL TOPICS IN BIOINFORMATICS CS 4491/CS 7990 SPECIAL TOPICS IN BIOINFORMATICS * Some contents are adapted from Dr. Hung Huang and Dr. Chengkai Li at UT Arlington Mingon Kang, Ph.D. Computer Science, Kennesaw State University Problems

More information

Genotype Imputation. Class Discussion for January 19, 2016

Genotype Imputation. Class Discussion for January 19, 2016 Genotype Imputation Class Discussion for January 19, 2016 Intuition Patterns of genetic variation in one individual guide our interpretation of the genomes of other individuals Imputation uses previously

More information

Partitioning Genetic Variance

Partitioning Genetic Variance PSYC 510: Partitioning Genetic Variance (09/17/03) 1 Partitioning Genetic Variance Here, mathematical models are developed for the computation of different types of genetic variance. Several substantive

More information

6.3 How the Associational Criterion Fails

6.3 How the Associational Criterion Fails 6.3. HOW THE ASSOCIATIONAL CRITERION FAILS 271 is randomized. We recall that this probability can be calculated from a causal model M either directly, by simulating the intervention do( = x), or (if P

More information

Association Testing with Quantitative Traits: Common and Rare Variants. Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5

Association Testing with Quantitative Traits: Common and Rare Variants. Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5 Association Testing with Quantitative Traits: Common and Rare Variants Timothy Thornton and Katie Kerr Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5 1 / 41 Introduction to Quantitative

More information

Research Statement on Statistics Jun Zhang

Research Statement on Statistics Jun Zhang Research Statement on Statistics Jun Zhang (junzhang@galton.uchicago.edu) My interest on statistics generally includes machine learning and statistical genetics. My recent work focus on detection and interpretation

More information

Inferring Genetic Architecture of Complex Biological Processes

Inferring Genetic Architecture of Complex Biological Processes Inferring Genetic Architecture of Complex Biological Processes BioPharmaceutical Technology Center Institute (BTCI) Brian S. Yandell University of Wisconsin-Madison http://www.stat.wisc.edu/~yandell/statgen

More information

Theoretical Population Biology

Theoretical Population Biology Theoretical Population Biology 87 (013) 6 74 Contents lists available at SciVerse ScienceDirect Theoretical Population Biology journal homepage: www.elsevier.com/locate/tpb Genotype imputation in a coalescent

More information

Computational Systems Biology: Biology X

Computational Systems Biology: Biology X Bud Mishra Room 1002, 715 Broadway, Courant Institute, NYU, New York, USA L#7:(Mar-23-2010) Genome Wide Association Studies 1 The law of causality... is a relic of a bygone age, surviving, like the monarchy,

More information

Lecture 11: Multiple trait models for QTL analysis

Lecture 11: Multiple trait models for QTL analysis Lecture 11: Multiple trait models for QTL analysis Julius van der Werf Multiple trait mapping of QTL...99 Increased power of QTL detection...99 Testing for linked QTL vs pleiotropic QTL...100 Multiple

More information

UNIT 8 BIOLOGY: Meiosis and Heredity Page 148

UNIT 8 BIOLOGY: Meiosis and Heredity Page 148 UNIT 8 BIOLOGY: Meiosis and Heredity Page 148 CP: CHAPTER 6, Sections 1-6; CHAPTER 7, Sections 1-4; HN: CHAPTER 11, Section 1-5 Standard B-4: The student will demonstrate an understanding of the molecular

More information

Calculation of IBD probabilities

Calculation of IBD probabilities Calculation of IBD probabilities David Evans and Stacey Cherny University of Oxford Wellcome Trust Centre for Human Genetics This Session IBD vs IBS Why is IBD important? Calculating IBD probabilities

More information

1. Understand the methods for analyzing population structure in genomes

1. Understand the methods for analyzing population structure in genomes MSCBIO 2070/02-710: Computational Genomics, Spring 2016 HW3: Population Genetics Due: 24:00 EST, April 4, 2016 by autolab Your goals in this assignment are to 1. Understand the methods for analyzing population

More information

Nature Genetics: doi: /ng Supplementary Figure 1. Number of cases and proxy cases required to detect association at designs.

Nature Genetics: doi: /ng Supplementary Figure 1. Number of cases and proxy cases required to detect association at designs. Supplementary Figure 1 Number of cases and proxy cases required to detect association at designs. = 5 10 8 for case control and proxy case control The ratio of controls to cases (or proxy cases) is 1.

More information

Lecture 24: Multivariate Response: Changes in G. Bruce Walsh lecture notes Synbreed course version 10 July 2013

Lecture 24: Multivariate Response: Changes in G. Bruce Walsh lecture notes Synbreed course version 10 July 2013 Lecture 24: Multivariate Response: Changes in G Bruce Walsh lecture notes Synbreed course version 10 July 2013 1 Overview Changes in G from disequilibrium (generalized Bulmer Equation) Fragility of covariances

More information

Heritability estimation in modern genetics and connections to some new results for quadratic forms in statistics

Heritability estimation in modern genetics and connections to some new results for quadratic forms in statistics Heritability estimation in modern genetics and connections to some new results for quadratic forms in statistics Lee H. Dicker Rutgers University and Amazon, NYC Based on joint work with Ruijun Ma (Rutgers),

More information

Class Copy! Return to teacher at the end of class! Mendel's Genetics

Class Copy! Return to teacher at the end of class! Mendel's Genetics Class Copy! Return to teacher at the end of class! Mendel's Genetics For thousands of years farmers and herders have been selectively breeding their plants and animals to produce more useful hybrids. It

More information

Eiji Yamamoto 1,2, Hiroyoshi Iwata 3, Takanari Tanabata 4, Ritsuko Mizobuchi 1, Jun-ichi Yonemaru 1,ToshioYamamoto 1* and Masahiro Yano 5,6

Eiji Yamamoto 1,2, Hiroyoshi Iwata 3, Takanari Tanabata 4, Ritsuko Mizobuchi 1, Jun-ichi Yonemaru 1,ToshioYamamoto 1* and Masahiro Yano 5,6 Yamamoto et al. BMC Genetics 2014, 15:50 METHODOLOGY ARTICLE Open Access Effect of advanced intercrossing on genome structure and on the power to detect linked quantitative trait loci in a multi-parent

More information

Powerful multi-locus tests for genetic association in the presence of gene-gene and gene-environment interactions

Powerful multi-locus tests for genetic association in the presence of gene-gene and gene-environment interactions Powerful multi-locus tests for genetic association in the presence of gene-gene and gene-environment interactions Nilanjan Chatterjee, Zeynep Kalaylioglu 2, Roxana Moslehi, Ulrike Peters 3, Sholom Wacholder

More information

Lecture 28: BLUP and Genomic Selection. Bruce Walsh lecture notes Synbreed course version 11 July 2013

Lecture 28: BLUP and Genomic Selection. Bruce Walsh lecture notes Synbreed course version 11 July 2013 Lecture 28: BLUP and Genomic Selection Bruce Walsh lecture notes Synbreed course version 11 July 2013 1 BLUP Selection The idea behind BLUP selection is very straightforward: An appropriate mixed-model

More information

Lecture 9. QTL Mapping 2: Outbred Populations

Lecture 9. QTL Mapping 2: Outbred Populations Lecture 9 QTL Mapping 2: Outbred Populations Bruce Walsh. Aug 2004. Royal Veterinary and Agricultural University, Denmark The major difference between QTL analysis using inbred-line crosses vs. outbred

More information

GENETIC association studies aim to map phenotypeinfluencing

GENETIC association studies aim to map phenotypeinfluencing Copyright Ó 2006 by the Genetics Society of America DOI: 10.1534/genetics.105.055335 A General Population-Genetic Model for the Production by Population Structure of Spurious Genotype Phenotype Associations

More information

1. they are influenced by many genetic loci. 2. they exhibit variation due to both genetic and environmental effects.

1. they are influenced by many genetic loci. 2. they exhibit variation due to both genetic and environmental effects. October 23, 2009 Bioe 109 Fall 2009 Lecture 13 Selection on quantitative traits Selection on quantitative traits - From Darwin's time onward, it has been widely recognized that natural populations harbor

More information

Classical Selection, Balancing Selection, and Neutral Mutations

Classical Selection, Balancing Selection, and Neutral Mutations Classical Selection, Balancing Selection, and Neutral Mutations Classical Selection Perspective of the Fate of Mutations All mutations are EITHER beneficial or deleterious o Beneficial mutations are selected

More information

QTL Mapping I: Overview and using Inbred Lines

QTL Mapping I: Overview and using Inbred Lines QTL Mapping I: Overview and using Inbred Lines Key idea: Looking for marker-trait associations in collections of relatives If (say) the mean trait value for marker genotype MM is statisically different

More information

Learning Your Identity and Disease from Research Papers: Information Leaks in Genome-Wide Association Study

Learning Your Identity and Disease from Research Papers: Information Leaks in Genome-Wide Association Study Learning Your Identity and Disease from Research Papers: Information Leaks in Genome-Wide Association Study Rui Wang, Yong Li, XiaoFeng Wang, Haixu Tang and Xiaoyong Zhou Indiana University at Bloomington

More information

Lecture 2: Genetic Association Testing with Quantitative Traits. Summer Institute in Statistical Genetics 2017

Lecture 2: Genetic Association Testing with Quantitative Traits. Summer Institute in Statistical Genetics 2017 Lecture 2: Genetic Association Testing with Quantitative Traits Instructors: Timothy Thornton and Michael Wu Summer Institute in Statistical Genetics 2017 1 / 29 Introduction to Quantitative Trait Mapping

More information

Introduction to QTL mapping in model organisms

Introduction to QTL mapping in model organisms Introduction to QTL mapping in model organisms Karl W Broman Department of Biostatistics Johns Hopkins University kbroman@jhsph.edu www.biostat.jhsph.edu/ kbroman Outline Experiments and data Models ANOVA

More information

Evolutionary quantitative genetics and one-locus population genetics

Evolutionary quantitative genetics and one-locus population genetics Evolutionary quantitative genetics and one-locus population genetics READING: Hedrick pp. 57 63, 587 596 Most evolutionary problems involve questions about phenotypic means Goal: determine how selection

More information

When one gene is wild type and the other mutant:

When one gene is wild type and the other mutant: Series 2: Cross Diagrams Linkage Analysis There are two alleles for each trait in a diploid organism In C. elegans gene symbols are ALWAYS italicized. To represent two different genes on the same chromosome:

More information
