BAYESIAN MAPPING OF MULTIPLE QUANTITATIVE TRAIT LOCI


BAYESIAN MAPPING OF MULTIPLE QUANTITATIVE TRAIT LOCI

By

DÁMARIS SANTANA MORANT

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2005

Copyright 2005 by Dámaris Santana Morant

To Tim, my parents and sister

ACKNOWLEDGMENTS

There are no words, time, or space to thank all who have contributed in some way to this journey. Everyone that I came across or spoke to all these years may be unaware of their contribution. A simple gesture, a word, a smile are sometimes the turning point in a particular day.

I recognize the Creator as the Source of all knowledge and thank Him for the opportunity of finishing this degree and all the blessings that I have received. I thank my advisor George Casella for sharing, so generously, his knowledge and time with me. I thank him for his support, encouragement and inspiration. He believed in me more than I believed in myself, one of the reasons that enabled me to reach the end of this journey. I thank the members of my committee, Jim Hobert, Sam Wu, Rongling Wu and John Davis, for their interest in my work and for their support. We had excellent committee meetings out of which great ideas emerged. I thank them for such an open learning environment. I thank Matias Kirst for joining us and for helping in finding the data sets. I thank the faculty, students and staff of the Department of Statistics at UF for all their help and support.

I thank my husband, Tim Porch, for all his support and encouragement. I thank my parents and my sister for all their love, support and prayers. I thank my huge family and all my friends for all their love, support and prayers. I cherish every call, meeting, flower, letter, meal, cup of coffee and visit. I finally thank Carlos Castillo-Chavez for inviting me to participate at his Institute at Cornell University, starting the chain of events that brought me to finishing this degree.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
TABLE
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION
    Basic Concepts of Genetics
    QTL Analysis

2 METHODS FOR QTL MAPPING
    Single QTL Methods
        Single Marker Analysis and Simple Linear Regression
        Interval Mapping and Regression Mapping
    Multiple QTL Methods
        Multiple Regression on Marker Genotypes and Composite Interval Mapping
        Bayesian Methods

3 SIMULTANEOUS QTL ESTIMATION
    Model 1
        Gibbs Sampler for Model 1
        Implementation
    Model 2
        Conditions for a Proper Posterior Distribution
        Full Conditional Distributions

4 PERFORMANCE EVALUATION
    Simulated Data
    Convergence and Results Presentation
    Simulation Results
    Model 2 with Full Conditionals Conditioned on β
    Data Analysis

5 CONCLUSIONS AND FURTHER RESEARCH

APPENDIX

A PRIORS FOR µ AND γ

B PARAMETERIZATION OF QTL POSITIONS
    B.1 Range of the $p_{1k}$'s and $p_{2k}$'s and Exploration of Their Prior Distribution
    B.2 Reparameterization of the Posterior Distribution in Terms of the Recombination Fractions

C FULL CONDITIONALS FOR MODEL 2 CONDITIONED ON β

D RUNNING MEANS FOR PERFORMANCE ANALYSIS

REFERENCES
BIOGRAPHICAL SKETCH

TABLE

2-1 Backcross Design, Model Assumptions

LIST OF FIGURES

3-1 Histograms of QTL positions. Top panel: only $r_{Q_k(k)}$ is calculated. Bottom panel: both $r_{Q_k(k)}$ and $r_{Q_k(k+1)}$ are calculated.

Simulated data, equally spaced markers at approximately .26 M. QTL at second and fourth intervals with equal effects of 1. $h^2 = .94$, $\sigma^2 = .04$. Model 1 with $\tau^2 = 1$ (top left), $\tau^2 = 1$ (top right) and Model 2 with 100 $\tau^2 = 1$ (bottom).

Simulated data, only a few individuals exceed $L = 3000$ and no more than .5% of the total number of iterations.

Example 1: Simulated data, equally spaced markers at approximately 26 cM. QTL located at the second and fourth interval with equal effects of 1. Top panel: $h^2 = .94$, $\sigma^2 = .04$. Bottom panel: $h^2 = .4$, $\sigma^2 = 1$. Left panel: Model 2. Right panel: composite interval mapping.

Example 2: Model 2 on simulated data at equally spaced markers at approximately 15 cM. QTL located at the second and fourth interval with equal effects.

Example 2: Composite interval mapping on simulated data at equally spaced markers at approximately 15 cM. QTL located at the second and fourth interval with equal effects.

Example 3: Model 2 on simulated data at equally spaced markers at approximately 15 cM. QTL with effects 1, .1, 1, .1 located at marker intervals 2, 5, 7 and 8, respectively.

Example 4: Model 2 on simulated data at equally spaced markers at approximately 5 cM. QTL with effects .1, 1, .1, 1 located at marker intervals 1, 5, 7 and 9, respectively.

Example 4: Composite interval mapping on simulated data at equally spaced markers at approximately 5 cM. QTL with effects .1, 1, .1, 1 located at marker intervals 1, 5, 7 and 9, respectively.

Simulated data, equally spaced markers at approximately 26 cM with $h^2 = .94$. QTL at second and fourth intervals with equal effects of 1. Gibbs sampler with full conditionals conditioned on β.

4-8 Top panel: results for Model 2 on Barley data using all markers. Bottom panel: results for Model 2 on Barley data using selected markers.

Running means for Model 2 on Barley data using all markers.

Running means for Model 2 on Barley data using selected markers.

B-1 Histograms of QTL positions. Each graph shows 5 intervals of equal length. Top panel: only $r_{Q_k(k+1)}$ is calculated. Bottom panel: only $r_{Q_k(k)}$ is calculated.

B-2 Histograms of QTL positions generated using the mixture distribution for $p_{1k} \mid p_{2k}$ at different interval lengths. Each graph shows 5 intervals of equal length.

D-1 Example 2: $h^2 = .4$, n =
D-2 Example 2: $h^2 = .2$, n =
D-3 Example 2: $h^2 = .2$, n =
D-4 Example 2: $h^2 = .1$, n =
D-5 Example 3: $h^2 = .4$, n =
D-6 Example 3: $h^2 = .3$, n =
D-7 Example 3: $h^2 = .3$, n =
D-8 Example 4: $h^2 = .3$, n =

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

BAYESIAN MAPPING OF MULTIPLE QUANTITATIVE TRAIT LOCI

By

Dámaris Santana Morant

December 2005

Chair: George Casella
Major Department: Statistics

We describe a method for the simultaneous estimation of the locations and the effects of quantitative trait loci (QTL) in a backcross population. We consider a mixed model that includes one QTL per interval and considers all markers as covariates. By including one possible QTL per interval it was possible to examine and account for all of the QTL effects. The marker information is included in the model using a term that takes into account the portion of the marker effects that is not accounted for by the QTL. The effects of the marker information are considered random effects in the model. We obtain the posterior distribution of the QTL effects along the genome using a Gibbs sampler. To determine the significant effects, 95% posterior confidence intervals are used. One advantage of this approach is that all markers are used as covariates, which eliminates the constraint of marker selection.

The performance of our method was studied using simulated data for equally spaced markers at different interval lengths. We considered examples with six and ten markers and with different heritability levels. We compared our results to results using composite interval mapping for some of the examples. The analysis of Barley (Hordeum vulgare) chromosome five is also presented.

CHAPTER 1
INTRODUCTION

Many agronomic traits in plants are classified as quantitative in nature; i.e., the observed phenotype is the joint result of the effects of a number of genetic and environmental factors. The genetics of quantitative traits are studied by estimating the effects of the genes contributing to the traits as well as by determining their location in the genome. Once a molecular location is determined for the genes, they are called quantitative trait loci (QTL). Knowledge about these loci assists in the selection of superior genotypes in a population for trait improvement (e.g., yield and disease resistance in crops).

Several methods for QTL analysis have been developed to determine the number, location and effects of QTL. These methods fall into two categories: those that model the effects of single QTL and those that model the effects of multiple QTL. The best approach to search for multiple QTL remains an open problem (Sen and Churchill, 2001). We developed a method for the simultaneous estimation of QTL effects and locations in the genome. We focused on experimental populations, particularly the backcross design.

In Chapter 2 we review some of the methods for QTL mapping presented in the literature. Our method for simultaneous QTL estimation in a Bayesian framework is presented in Chapter 3. The performance of our method on simulated data as well as on chromosome 5 of a Barley (Hordeum vulgare) dataset is presented in Chapter 4. A discussion of the results and a presentation of further research are given in Chapter 5. First, we introduce some basic concepts of genetics and a background on QTL analysis.

1.1 Basic Concepts of Genetics

We may ask ourselves, why do individuals look different? Aside from environmental factors, the differences between individuals are defined by their genetic makeup. This genetic information is coded in deoxyribonucleic acid (DNA). DNA is composed of four repeating units called nucleotides and forms double helix molecules, termed chromosomes. The information in the DNA is expressed through the translation of DNA into amino acids, which in turn form proteins that are responsible for the function and structure of cells. It is believed that only a small part of the DNA encodes for proteins. DNA that encodes for information is called a gene and the location of the gene on the chromosome is called a locus. Genes are therefore the inherited factors that control a trait's phenotype, or the observable form of the trait. A trait can be controlled by one gene or by multiple genes (quantitative traits). The variation in the phenotype of a particular trait corresponds to different forms of the gene, or alleles. Change in the sequence of nucleotides that compose a gene results in the formation of a new allele. Therefore, a gene may have many alleles.

Most animal and plant species are diploid; thus there are two copies of each chromosome, and individuals have two copies of each gene. Given alleles $A_1$ and $A_2$ for a specific gene, the individual will have one of three possible genotypes: $A_1A_1$, $A_1A_2$ or $A_2A_2$. An individual that has genotype $A_1A_1$ or $A_2A_2$ is called homozygous, indicating that the alleles are identical. Otherwise, the individual is heterozygous. If a trait is only controlled by a single gene, the expression of the phenotype is determined by the dominance relationship between the alleles. If the allele $A_1$ is completely dominant, the individuals with genotype $A_1A_1$ and $A_1A_2$ will be indistinguishable and will express the $A_1$ phenotype.
If $A_1$ is incompletely dominant, the heterozygotes will have a phenotype intermediate between the

two homozygotes. In this case, the alleles are said to be codominant because heterozygotes and homozygotes may be distinguished.

The transmission of genetic information from parents to offspring is through egg and sperm cells. These cells, called gametes, carry one complement of the chromosomes of each parent. During the formation of gametes in the process of meiosis each chromosome in the pair duplicates, resulting in four chromosomes. The duplicates interchange DNA, in events called crossovers, resulting in two new chromosomes that are a mosaic of the parental chromosomes. The four chromosomes then separate to form four new cells (gametes), each one with one chromosome. Two gametes result with copies identical to one of the chromosomes in the original pair, the other two with chromosomes that are a combination of both. The latter are called recombinant gametes. The diploid copy number in the cell is restored when the egg and the sperm unite. Thus, recombination is a key source of genetic variation.

Consider two diallelic genes in an individual with genotype AaBb, where AB are on one chromosome and ab on the other. There are 4 possible gametes: AB, ab, and the recombinants Ab and aB. By Mendel's rule of independent assortment, each of these gametes would have the same probability. This rule states that alleles of different genes segregate independently. It was discovered later that the frequency of gametes depends on the genetic distance between the genes. Genes that are close to each other are more likely to remain together in the process of meiosis. Genes that are further away are more likely to experience crossovers and recombination. Genetic distance is determined through use of linkage between two loci to calculate the recombination fraction, the ratio of the number of recombinant gametes to the total number of gametes. Recombination between two loci on the same chromosome is more likely the further the loci are apart.
For example, a recombination fraction of $r = 0$ between two loci means that they are completely linked, while $r = 1/2$ means

that they segregate independently and are unlinked. Mendel's rule of independent assortment applies to genes that are unlinked.

1.2 QTL Analysis

Linkage is the basis for QTL analysis. Genetic maps are constructed based on the recombination fraction between genetic markers. The relationship between phenotypes and the genotypes of these genetic markers is explored. If an association exists, it suggests that a gene controlling the trait is located near the marker. Experimental populations, like backcross populations and F2 populations, derived from the cross of pure inbred lines offer the ideal setting for the study of associations between phenotypes and markers because the progeny of the first filial generation (F1) are genetically identical. Because the population structure is controlled and the parental genotypes are known, genetic questions can be addressed precisely.

Molecular markers are small regions of DNA for which detectable heritable variation can be analyzed for individuals in a population. A genetic map consists of linearly ordered molecular markers and the genetic distance between them. The genetic map is constructed by analyzing the relationship of the marker genotypes for the individuals in a population by a process called linkage analysis (Liu 1997). Genetic markers are placed in linkage groups based on their linkage relationship defined by recombination fractions. Recombination fractions are then translated to genetic distances using a mapping function (Haldane 1919, Kosambi 1944). Genetic maps provide a representation of genome structure and have been developed for many plant species.

We focus on the backcross population structure for the development of a multiple QTL analysis method. Although the backcross population may be considered simple to analyze, it still presents big challenges. The extension of the methods developed for backcross experiments to other types of experimental

crosses is often not difficult. For the purpose of QTL analysis, the parental lines that form the population should be homozygous, but they must differ for the trait of interest. The two parental lines are crossed to get the F1 generation. Each F1 individual receives a copy of a chromosome from each of its two parents; thus they are heterozygous wherever the parental lines differ. Individuals in the F1 generation are genetically identical. The F1 generation can be backcrossed to the P1 or P2 parent to obtain the BC1 or BC2 population. The individuals in the backcross population have one of two genotypes at every locus, homozygous or heterozygous. After the population is generated, phenotypic information and the marker genotypes are obtained for all the individuals in the population as well as the two parents. Thus, the goal of the QTL analysis is to determine the association between the individual phenotypes and the alleles they received from their parents at various marker loci using the genetic map, the marker information, and the phenotypic data.
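As a rough illustration of the backcross design described above, the following sketch simulates marker genotypes for backcross individuals along one chromosome. It assumes Haldane's mapping function, $r = \frac{1}{2}(1 - e^{-2d})$ for a map distance $d$ in Morgans, and no crossover interference. The function names, the marker spacing of 0.26 M, the population size and the 1/0 coding (homozygote/heterozygote) are illustrative assumptions, not part of the original analysis.

```python
import math
import random


def haldane_r(d_morgans):
    """Haldane mapping function: recombination fraction for a distance in Morgans."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def simulate_backcross_markers(distances, rng):
    """Simulate marker genotypes for one backcross individual.

    distances: map distances (Morgans) between consecutive markers.
    Returns codes 1 (homozygote) or 0 (heterozygote), one per marker.
    Assumes no interference, so each interval recombines independently.
    """
    genotypes = [rng.choice([0, 1])]  # first marker: each class has probability 1/2
    for d in distances:
        recombined = rng.random() < haldane_r(d)
        previous = genotypes[-1]
        genotypes.append(1 - previous if recombined else previous)
    return genotypes


rng = random.Random(1)  # fixed seed so the sketch is reproducible
population = [simulate_backcross_markers([0.26] * 5, rng) for _ in range(2000)]

# Observed fraction of individuals recombinant between the first two markers,
# compared with the value implied by the mapping function.
observed = sum(ind[0] != ind[1] for ind in population) / len(population)
expected = haldane_r(0.26)
print(round(observed, 3), round(expected, 3))
```

The observed recombinant fraction between adjacent markers should be close to the value given by the mapping function, which is the relationship linkage analysis relies on when a genetic map is built.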

CHAPTER 2
METHODS FOR QTL MAPPING

Several methods have been described in the literature for QTL mapping. These methods fall essentially into two categories: those that estimate the effects of single QTL and those that estimate the effects of multiple QTL. We will briefly review some of the methods in both categories. Some of these methods perform the analysis at the marker locations while others use the marker information to estimate effects between markers. More extensive reviews are presented by Doerge (2002), Broman and Speed (1999) and Doerge et al. (1997).

2.1 Single QTL Methods

We will consider several methods: single marker analysis (Soller and Brody 1976), simple linear regression, interval mapping (Lander and Botstein 1989) and regression mapping (Haley and Knott 1992, Martinez and Curnow 1992).

2.1.1 Single Marker Analysis and Simple Linear Regression

Single marker analysis involves studying single genetic markers one at a time. Based on the putative QTL genotype of each individual, the population can be separated into two groups in the backcross design and their respective trait means can be compared. Unfortunately, the QTL genotypes are unknown. Thus the analysis is performed at the markers, where the genotypes are known.

We will assume that the phenotypes of parent P1, parent P2 and the first filial (F1) are distributed $N(\mu_{QQ}, \sigma^2)$, $N(\mu_{qq}, \sigma^2)$ and $N(\mu_{Qq}, \sigma^2)$, respectively (Table 2-1). The means of these populations are attributed to the QTL effect. Markers are assumed to have no effect on the trait. The backcross population between P1 and F1 will have marker-QTL genotypes $M_1Q/M_1Q$, $M_1Q/M_1q$, $M_1Q/M_2Q$, $M_1Q/M_2q$ with probabilities $\frac{1 - r_{MQ}}{2}$, $\frac{r_{MQ}}{2}$, $\frac{r_{MQ}}{2}$, $\frac{1 - r_{MQ}}{2}$, respectively. $r_{MQ}$ is the recombination fraction

between marker $M$ and the putative QTL. In this case, $Q$ represents the QTL genotype that is not observed. Two observable marker classes are obtained with the following mixture distributions:

\[
\begin{aligned}
M_1/M_1 &: \quad (1 - r_{MQ})\,N(\mu_{QQ}, \sigma^2) + r_{MQ}\,N(\mu_{Qq}, \sigma^2) \\
M_1/M_2 &: \quad r_{MQ}\,N(\mu_{QQ}, \sigma^2) + (1 - r_{MQ})\,N(\mu_{Qq}, \sigma^2).
\end{aligned}
\]

The difference in the means of these two populations is

\[
\mu_{M_1M_1} - \mu_{M_1M_2} = (1 - 2r_{MQ})(\mu_{QQ} - \mu_{Qq}).
\]

The usual t-test will then test the hypothesis

\[
H_0: \mu_{M_1M_1} = \mu_{M_1M_2} \qquad H_1: \mu_{M_1M_1} \neq \mu_{M_1M_2}
\]

which is equivalent to

\[
H_0: r_{MQ} = .5 \qquad H_1: r_{MQ} < .5
\]

that is, a test of whether a QTL is linked to the marker under consideration; under $H_0$ the QTL is unlinked to the marker. From the experimental design, it is known that $\mu_{QQ} - \mu_{Qq} \neq 0$, since we started with parental lines that differ at the trait of interest. However, the analysis is confounded by the effect of locus $Q$, since it is the product $(1 - 2r_{MQ})(\mu_{QQ} - \mu_{Qq})$ that is being tested for departures from zero. A QTL with a small effect that is close to the marker will give the same result as a QTL with a larger effect located further from the marker.

Care should be taken with the determination of critical values since the distributions of the populations in consideration, in this case the observable marker classes, are mixtures of normals. Churchill and Doerge (1994) discuss permutation theory for the calculation of empirical threshold values.
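The confounding of QTL effect and QTL position described above can be seen with a small numerical sketch. It computes the means of the two observable marker classes from the mixture weights and compares two hypothetical scenarios; the function names and the numerical values are chosen only for illustration.

```python
def marker_class_means(mu_QQ, mu_Qq, r):
    """Means of the two observable marker classes in a backcross.

    r is the recombination fraction between the marker and the QTL.
    """
    mean_m1m1 = (1 - r) * mu_QQ + r * mu_Qq
    mean_m1m2 = r * mu_QQ + (1 - r) * mu_Qq
    return mean_m1m1, mean_m1m2


def mean_difference(mu_QQ, mu_Qq, r):
    """Difference of marker-class means; algebraically (1 - 2r)(mu_QQ - mu_Qq)."""
    mean_m1m1, mean_m1m2 = marker_class_means(mu_QQ, mu_Qq, r)
    return mean_m1m1 - mean_m1m2


# Hypothetical scenario 1: small QTL effect, QTL located at the marker (r = 0).
close_small = mean_difference(mu_QQ=0.5, mu_Qq=0.0, r=0.0)

# Hypothetical scenario 2: QTL effect twice as large, but QTL further away (r = 0.25).
far_large = mean_difference(mu_QQ=1.0, mu_Qq=0.0, r=0.25)

# Unlinked QTL (r = 0.5): the marker classes have equal means.
unlinked = mean_difference(mu_QQ=1.0, mu_Qq=0.0, r=0.5)

print(close_small, far_large, unlinked)
```

Both linked scenarios give the same difference in marker-class means, so a test at the marker alone cannot separate effect size from distance.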

Table 2-1. Backcross Design, Model Assumptions

P1: $M_1Q/M_1Q$, $N(\mu_{QQ}, \sigma^2)$   x   P2: $M_2q/M_2q$, $N(\mu_{qq}, \sigma^2)$
F1: $M_1Q/M_2q$, $N(\mu_{Qq}, \sigma^2)$

B1 genotype     Probability          Observable marker class
$M_1Q/M_1Q$     $(1 - r_{MQ})/2$     1
$M_1Q/M_1q$     $r_{MQ}/2$           1
$M_1Q/M_2Q$     $r_{MQ}/2$           2
$M_1Q/M_2q$     $(1 - r_{MQ})/2$     2

Consider the linear regression model

\[
y_i = \beta_0 + \beta_1 x_i + \epsilon_i \tag{2-1}
\]

where $y_i$ is the trait value, $x_i$ is either 1 or 0 (in a backcross design) depending on the marker genotype, homozygote ($M_1M_1$) or heterozygote ($M_1M_2$) respectively, and $\epsilon_i$ is a $N(0, \sigma^2)$ random variable. The hypothesis $H_0: \beta_1 = 0$ is equivalent to the above hypothesis since the regression coefficient $\beta_1$ is the difference between the means of the observable marker classes, that is, $\beta_1 = (1 - 2r_{MQ})(\mu_{QQ} - \mu_{Qq})$.

2.1.2 Interval Mapping and Regression Mapping

Lander and Botstein (1989) introduced interval mapping to remedy the problems presented by single marker analysis, including the confounding of genetic locations of QTL and phenotypic effects. An additive model is assumed for the phenotype, with no epistasis (interactions among QTL). The phenotype results from summing the effects of individual QTL and normally distributed environmental noise. Interval mapping extends the idea of maximum likelihood in model (2-1). For this model, the likelihood function is

\[
L(\beta_0, \beta_1, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2\sigma^2} \bigl(y_i - (\beta_0 + \beta_1 x_i)\bigr)^2 \right\}.
\]
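The statement that the slope in model (2-1) is the difference between the marker-class means can be checked on a small example. The sketch below fits the simple regression by least squares and compares the estimated slope with the difference of the two sample class means; the six data points are hypothetical and the helper name `ols_simple` is introduced here only for illustration.

```python
def ols_simple(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Hypothetical toy data: x = 1 for marker genotype M1M1, x = 0 for M1M2.
x = [1, 1, 1, 0, 0, 0]
y = [2.1, 1.9, 2.3, 1.0, 1.2, 0.8]

b0, b1 = ols_simple(x, y)

# Sample means of the two observable marker classes.
mean_hom = sum(yi for xi, yi in zip(x, y) if xi == 1) / x.count(1)
mean_het = sum(yi for xi, yi in zip(x, y) if xi == 0) / x.count(0)

print(b1, mean_hom - mean_het)
```

With a 0/1 covariate the fitted intercept is the mean of the heterozygote class and the fitted slope is the difference of the two class means, so the regression and the two-sample comparison carry the same information.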

To test the hypothesis $H_0: \beta_1 = 0$ the test statistic typically used is

\[
\mathrm{LOD} = \log_{10} \frac{L(\hat\beta_0, \hat\beta_1, \hat\sigma^2)}{L(\tilde\beta_0, 0, \tilde\sigma^2)}
\]

where $\tilde\beta_0$ and $\tilde\sigma^2$ are the constrained MLEs obtained under the null hypothesis. The LOD score indicates how much more probable it is that the data arise from the situation of having a QTL present versus absent.

In interval mapping, the information provided by the genetic map is used to march along the chromosome and calculate the LOD score at different positions in the genome. The model is

\[
y_i = \beta_0 + \beta_1 z_i + \epsilon_i
\]

where $z_i$ is the genotype of the putative QTL. When calculating the LOD score at positions other than the markers, $z_i$ is unknown. By using the genetic map, the probability distribution of $z_i$ given the flanking markers is known (see Section 3.1). The probability distribution is based on the recombination fraction between the putative QTL and the flanking markers. Once at a particular position, these recombination fractions are determined from the genetic map. Thus, the likelihood is

\[
L(\beta_0, \beta_1, \sigma^2) = \prod_{i=1}^{n} \left[ P(z_i = 0)\, \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}(y_i - \beta_0)^2} + P(z_i = 1)\, \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}(y_i - (\beta_0 + \beta_1))^2} \right].
\]

Parameter estimation is done using the EM algorithm. The maximum LOD score over all studied positions in the genome is an indication of a single QTL if it is larger than some specified threshold value.

Martinez and Curnow (1992) used the same approach of marching along the chromosome. Their model is

\[
y_i = \beta_0 + \beta_1 P(z_i = 1) + \epsilon_i
\]
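To make the LOD statistic concrete, the sketch below evaluates it at a marker for the regression model (2-1), where the genotype is observed. It relies on a standard simplification: with normal errors and the variance estimated by maximum likelihood under each hypothesis, the log-likelihood ratio reduces to $(n/2)\log_{10}(\mathrm{RSS}_0/\mathrm{RSS}_1)$. It does not implement the EM step needed between markers; the data and function names are hypothetical.

```python
import math


def rss(y, fitted):
    """Residual sum of squares."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def lod_marker(x, y):
    """LOD score for H0: beta1 = 0 in y = beta0 + beta1 * x + error, x in {0, 1}.

    Uses LOD = (n / 2) * log10(RSS0 / RSS1), which holds when sigma^2 is
    replaced by its maximum likelihood estimate under each hypothesis.
    Both genotype classes must be present and RSS1 must be positive.
    """
    n = len(y)
    y_bar = sum(y) / n
    rss0 = rss(y, [y_bar] * n)  # null model: one common mean
    class_means = {}
    for g in (0, 1):
        group = [yi for xi, yi in zip(x, y) if xi == g]
        class_means[g] = sum(group) / len(group)
    rss1 = rss(y, [class_means[xi] for xi in x])  # alternative: one mean per class
    return 0.5 * n * math.log10(rss0 / rss1)


# Hypothetical toy data: x = 1 for M1M1, x = 0 for M1M2.
x = [1, 1, 1, 0, 0, 0]
y = [2.1, 1.9, 2.3, 1.0, 1.2, 0.8]
lod = lod_marker(x, y)

# A marker whose classes have identical means gives a LOD of zero.
lod_null = lod_marker([1, 0, 1, 0], [1.0, 1.0, 2.0, 2.0])

print(round(lod, 3), lod_null)
```

In a genome scan this quantity would be recomputed at each position and compared with a threshold; between markers the class means are replaced by the mixture likelihood given above.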

Note that $P(z_i = 1)$ is the expected value of $z_i$. As before, once at a particular position, these probabilities can be calculated using the genetic map; thus this is a simple regression model and the statistical analysis is straightforward. The minimum residual sum of squares over all studied positions in the genome is an indication of a single QTL if it is smaller than some specified threshold value.

The drawback of these two methods, as discussed by several authors including Martinez and Curnow (1992), is that they do not guard against ghost QTL. Ghost QTL occur when a QTL is located in a marker interval and neighbouring regions also exhibit significant test statistics. The problem is that these methods do not take into account the presence of other possible QTL in the genome. These methods have been shown to give accurate estimates of the QTL position and its effect when there is only one QTL segregating in the population. This problem brings us to the second category of methods, which take into account multiple QTL.

2.2 Multiple QTL Methods

Methods that account for multiple QTL allow for the separation of the effects of linked QTL and for the study of interactions among QTL. We will focus on Bayesian models, which are the framework for our research. We will first briefly describe multiple regression and composite interval mapping (Zeng 1993, 1994, Jansen and Stam 1994).

2.2.1 Multiple Regression on Marker Genotypes and Composite Interval Mapping

Multiple regression on marker genotypes is an extension of the model (2-1). Many markers are considered instead of one marker at a time. Let $t$ be the number of markers. The model is defined by

\[
y_i = \beta_0 + \sum_{j=1}^{t} \beta_j x_{ij} + \epsilon_i \tag{2-2}
\]

where $y_i$, $\epsilon_i$ and $x_{ij}$ are defined as in model (2-1), except that now we have $t$ markers. Zeng (1993, 1994) discussed properties of the multiple regression analysis. Among them, the partial regression coefficient for marker $j$ depends only on those QTL which are located between markers $j - 1$ and $j + 1$, under the assumption of no epistasis. He argues that this results in a test for the presence of QTL between markers $j - 1$ and $j + 1$, regardless of the presence of other QTL in the genome.

Zeng (1993) argued that the use of multiple regression alone is not appropriate since the estimates of QTL effects by the partial regression coefficients are biased. Thus, he introduced composite interval mapping, which is a combination of interval mapping and multiple regression. A genome scan using interval mapping is performed, but markers outside the interval under consideration are used as covariates to control for the effects of other QTL. At each position under consideration, LOD scores are calculated. The selection of the markers to use as covariates is a problem with this method. Conditioning on linked markers potentially increases the precision of the test and estimation, but with a possible decrease in statistical power (Zeng 1993). If there are QTL in the intervals immediately adjacent to the interval under consideration, this method has the potential to falsely indicate the presence of a QTL (Zeng 1994).

Broman and Speed (1999, 2002) stated that QTL mapping should be viewed as a problem of model selection and not of multiple testing. Instead of minimizing the prediction error, they seek to identify the subset of markers for which $\beta_j \neq 0$. They introduced a modified version of the Bayesian information criterion ($\mathrm{BIC}_\delta$) for model selection. They proceed in two stages. First, the space of models is searched in order to pick the best ones, those that would have been chosen if all models were fitted.
Second, the model with minimum $\mathrm{BIC}_\delta$ is chosen among the selected models. They discussed several methods to select the best models, including

backward and forward selection and an MCMC method, among others. Doerge and Churchill (1996) discuss other methods for model selection.

2.2.2 Bayesian Methods

In general, the methods presented in the literature that use a Bayesian approach consider the problem of mapping multiple QTL as a model selection problem. In some methods, the number of QTL is considered a parameter of the model and the dimension of the problem varies through the analysis. In other methods, the dimensionality is kept constant and models with different numbers of QTL are compared. The goal of Bayesian methods is to obtain the posterior distribution of the parameters in the model (QTL position, QTL effects and, in some approaches, QTL number) given the phenotype and marker information. Bayesian methods allow for missing marker genotypes, treating them as parameters in the model. The methods described below take into account missing marker information.

Satagopan et al. (1996) and Sillanpää and Arjas (1998) proposed models that, in a backcross design, are equivalent to

\[
y_i = \mu + \sum_{k=1}^{s} \gamma_k z_{ik} + \epsilon_i \tag{2-3}
\]

and

\[
y_i = \mu + \sum_{j=1}^{w} \beta_j x_{ij} + \sum_{k=1}^{s} \gamma_k z_{ik} + \epsilon_i \tag{2-4}
\]

respectively. The $x_{ij}$'s are a subset of $w$ markers selected as covariates from the total of $t$ markers, the $z_{ik}$'s are the unobserved QTL genotypes and $s$ is the number of QTL, also unknown. Although the $z_{ik}$'s are unknown, their probability distribution given the flanking markers is known (see Section 3.1). The probability distribution is based on the recombination fraction between the putative QTL and the flanking markers. Both articles developed MCMC algorithms to sample from

the posterior distribution of the parameters given the data. Let $\beta = (\beta_1, \ldots, \beta_w)$, $\gamma = (\gamma_1, \ldots, \gamma_s)$, $y = (y_1, \ldots, y_n)$, $X = \{x_{ij}\}$, $Z = \{z_{ik}\}$ and $\lambda = (\lambda_1, \ldots, \lambda_s)$ (the vector of QTL positions). Satagopan et al. sample from $\mu, \gamma, \sigma^2, Z, \lambda \mid y, X$ using a Gibbs sampler with Metropolis-Hastings steps to update QTL positions. They use Bayes factors to compare models of different sizes. Sillanpää and Arjas (1998) sample from $\mu, \beta, \gamma, \sigma^2, Z, \lambda, s \mid y, X$ using the Metropolis-Hastings algorithm and reversible-jump MCMC to move between models of different sizes. Satagopan and Yandell (1996) use a similar approach. Sillanpää and Arjas (1998) use stepwise regression to choose the markers that will be used as covariates.

Yi (2004) noted that methods that change the dimensionality of the problem by changing the number of QTL during the analysis have the disadvantage that the information about a QTL is lost as soon as it is removed from the model. The author also noted that reversible-jump MCMC is usually subject to poor mixing and slow convergence.

Yi et al. (2003) presented a Gibbs sampler for the multiple regression model (2-2) (for analysis at the markers) based on a variable selection method called stochastic search variable selection, developed by George and McCulloch (1993). As in Broman and Speed (1999, 2002), the objective is to identify the subset of markers for which $\beta_j \neq 0$ (model (2-2)). In this approach, the dimensionality of the problem is kept constant by limiting the posterior distribution of nonsignificant terms (markers with no effects) to a neighborhood of zero instead of removing them from the model. That is, they define $\delta = (\delta_1, \ldots, \delta_t)$, where $\delta_j = 1$ or 0 represents the presence or absence of covariate $j$ in the model. The marker effects $\beta_j$, $j = 1, \ldots, t$, are given the prior distribution

\[
\beta_j \mid \delta_j \sim (1 - \delta_j)\, N(0, \tau_j^2) + \delta_j\, N(0, c_j^2 \tau_j^2),
\]

where $c_j^2$ and $\tau_j^2$ are chosen so that $\tau_j^2$ is small and $c_j^2 \tau_j^2$ is large.
If $\delta_j = 0$, the effect $\beta_j$ is pushed toward zero. Based on this prior, the posterior distribution of

$(\beta_1, \ldots, \beta_t)$ is multivariate normal. The authors discussed that with sparse and irregularly spaced markers, the marker analysis will be biased, as noted by Zeng (1993). They proposed, without implementing or giving details, two ways to deal with this problem. One of them is to incorporate in their procedure the multiple imputation method proposed by Sen and Churchill (2001). Their idea is that if the complete genotype information on a dense set of markers is known, regressing the phenotype on each marker will give information about how likely it is that a QTL is close to that marker. Because, in practice, the markers may be widely spaced across the genome, they proposed to create a complete and dense set of markers by adding what they called pseudomarkers. The genotypic information for these pseudomarkers can be inferred using their assigned positions in the genome, the genetic map and the available marker data. Several versions of this complete genotype data are constructed and the LOD scores obtained from each of them are then combined to measure the evidence in favor of a QTL being near any given pseudomarker. To account for multiple QTL, they computed LOD scores for each pair of pseudomarkers for a two-dimensional genome scan and also implemented a three-dimensional genome scan.
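The two-component normal prior used in stochastic search variable selection, described earlier in this section, can be illustrated with a short sketch. It computes the conditional probability that $\delta_j = 1$ given a value of $\beta_j$, which is how a small effect ends up assigned to the narrow component and a large effect to the wide one. The prior inclusion probability `prior_w` and the numerical values of $\tau_j^2$ and $c_j^2$ are assumptions made for illustration; they are not taken from Yi et al. (2003).

```python
import math


def normal_pdf(x, var):
    """Density of N(0, var) evaluated at x."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)


def inclusion_probability(beta_j, tau2, c2, prior_w=0.5):
    """P(delta_j = 1 | beta_j) under the mixture prior
    beta_j | delta_j ~ (1 - delta_j) N(0, tau2) + delta_j N(0, c2 * tau2).

    prior_w = P(delta_j = 1) is an assumed hyperparameter (illustrative only).
    """
    wide = prior_w * normal_pdf(beta_j, c2 * tau2)      # delta_j = 1 component
    narrow = (1.0 - prior_w) * normal_pdf(beta_j, tau2)  # delta_j = 0 component
    return wide / (wide + narrow)


# Illustrative settings: tau2 small, c2 * tau2 large relative to tau2.
p_small = inclusion_probability(0.01, tau2=0.01, c2=100.0)
p_large = inclusion_probability(1.0, tau2=0.01, c2=100.0)

print(round(p_small, 3), round(p_large, 3))
```

An effect near zero is attributed mostly to the narrow component, while an effect far from zero is attributed almost entirely to the wide one, which is how the dimension of the model can stay fixed while negligible effects are shrunk toward zero.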

CHAPTER 3
SIMULTANEOUS QTL ESTIMATION

Our objective is to develop a genomewide search that accounts for multiple QTL. As noted previously, analysis at the markers will be biased when the markers are sparse and irregularly spaced. It was also mentioned that methods that change the dimensionality of the problem by changing the number of QTL during the analysis have the disadvantage that the information about a QTL is lost as soon as it is removed from the model. Our proposed method addresses these two problems by sampling the entire genome while keeping the dimension of the problem fixed. At this point, we are working under the assumptions of no interference and no epistasis. For now, we are not considering the case of missing marker observations. We will be working in a Bayesian framework; therefore our target is the posterior distribution of the locations and the effects of the QTL.

Two models will be considered. We will present both models along with detailed calculations and their implementation in the next two sections, although Model 2 (Section 3.2) is our final model. For a discussion of the performance of Model 2 see Chapter 4.

3.1 Model 1

We consider the model

\[
y_i = \mu + \sum_{j=1}^{t} \beta_j x_{ij} + \sum_{k=1}^{t-1} \gamma_k z_{ik} + \epsilon_i \tag{3-1}
\]

This model is based on model (2-4) with all markers as covariates and one QTL per marker interval, i.e. $t - 1$ QTL. The $z_{ik}$'s represent the unobserved QTL genotypes. In a backcross design, $z_{ik}$ is either 1 or 0 depending on the QTL

genotype, homozygote (QQ) or heterozygote (Qq). We will consider the $\beta$'s as random effects and will keep the dimension of the model constant by considering one QTL per interval. We believe that including one possible QTL per interval will allow us to examine and account for all their effects and that the effects that are important will be significant. Since we are not interested in prediction, we are not thinking about the problem as model selection.

This model can be written as
\[ y = \mu\mathbf{1} + X\beta + Z\gamma + \epsilon \]
where $X$ is an $n \times t$ matrix, $\beta$ is a vector of dimension $t$, $\epsilon$ and $\mu\mathbf{1}$ are of dimension $n$, $Z$ is an $n \times (t-1)$ matrix, and $\gamma$ is a vector of dimension $t-1$.

Although the $z_{ik}$'s are unobserved, their distribution given the flanking marker information $(x_{ik}, x_{i(k+1)})$ can be derived if the genetic distance, $\lambda_{Q_k(k)}$, between QTL $k$ and one of the flanking markers, say marker $k$, is known. Under the assumption of no interference, the genetic distance $\lambda_{Q_k(k)}$ is related to the recombination fraction $r_{Q_k(k)}$ (the recombination fraction between QTL $k$ and marker $k$) by the Haldane mapping function
\[ r_{Q_k(k)} = \tfrac{1}{2}\left(1 - e^{-2\lambda_{Q_k(k)}}\right). \]
Then for $i = 1, \ldots, n$ and $k = 1, \ldots, t-1$
\[ f(z_{ik} \mid p_k, x_{ik}, x_{i(k+1)}) = p_k^{z_{ik}} (1 - p_k)^{1 - z_{ik}} \tag{3-2} \]
where
\[
p_k =
\begin{cases}
p_{1k} = \dfrac{(1 - r_{Q_k(k)})(1 - r_{Q_k(k+1)})}{1 - r_{k(k+1)}} & \text{if } x_{ik} = 1,\ x_{i(k+1)} = 1 \\[2ex]
p_{2k} = \dfrac{(1 - r_{Q_k(k)})\, r_{Q_k(k+1)}}{r_{k(k+1)}} & \text{if } x_{ik} = 1,\ x_{i(k+1)} = 0 \\[2ex]
1 - p_{2k} = \dfrac{r_{Q_k(k)}(1 - r_{Q_k(k+1)})}{r_{k(k+1)}} & \text{if } x_{ik} = 0,\ x_{i(k+1)} = 1 \\[2ex]
1 - p_{1k} = \dfrac{r_{Q_k(k)}\, r_{Q_k(k+1)}}{1 - r_{k(k+1)}} & \text{if } x_{ik} = 0,\ x_{i(k+1)} = 0
\end{cases}
\tag{3-3}
\]
and $r_{k(k+1)}$ is the recombination fraction between markers $k$ and $k+1$, which is assumed to be known from the genetic map. Let $\rho$ be a $2 \times (t-1)$ matrix of the $p_{1k}$'s and $p_{2k}$'s.
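The conditional probabilities in equation (3-3) can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the function and argument names are hypothetical, distances are taken in Morgans and must be positive, and the recombination fraction between the two markers is obtained by adding the two distances, which is valid under the Haldane function and no interference.

```python
import math


def haldane(distance_morgans: float) -> float:
    """Haldane map function: genetic distance (Morgans) -> recombination fraction."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_morgans))


def qtl_genotype_prob(d_left: float, d_right: float, x_left: int, x_right: int) -> float:
    """P(z = 1 | flanking marker genotypes) in a backcross, following equation (3-3).

    d_left, d_right: distances (Morgans, > 0) from the QTL to the left and
    right flanking markers. These names are illustrative, not from the text.
    """
    r_left = haldane(d_left)                # r_{Q_k(k)}
    r_right = haldane(d_right)              # r_{Q_k(k+1)}
    r_markers = haldane(d_left + d_right)   # r_{k(k+1)}; distances add under no interference
    p1 = (1.0 - r_left) * (1.0 - r_right) / (1.0 - r_markers)
    p2 = (1.0 - r_left) * r_right / r_markers
    table = {(1, 1): p1, (1, 0): p2, (0, 1): 1.0 - p2, (0, 0): 1.0 - p1}
    return table[(x_left, x_right)]
```

For a QTL at the midpoint of an interval, a recombinant pair of flanking markers is uninformative, so `qtl_genotype_prob(0.1, 0.1, 1, 0)` evaluates to one half.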

Since our interest is the posterior distribution of locations and effects, we will use a Gibbs sampler. In the section Gibbs Sampler for Model 1 we show the posterior distribution for Model 1 and derive the full conditional posterior distributions for all the parameters in the model. The details and challenges of sampling from these distributions are discussed in the section Implementation.

Gibbs Sampler for Model 1

To set up a Gibbs sampler for the model in equation (3-1), note that the posterior distribution of the parameters given the data is
\[
\begin{aligned}
f(\mu, \gamma, Z, \rho, \sigma^2 \mid y) &= \int f(\mu, \beta, \gamma, Z, \rho, \sigma^2 \mid y)\, d\beta \\
&\propto \int \left[ f(y \mid \mu, \beta, \gamma, Z, \rho, \sigma^2)\, f(\mu, \beta, \gamma, Z, \rho, \sigma^2) \right] d\beta \\
&= \int \left[ f(y \mid \mu, \beta, \gamma, Z, \sigma^2)\, f(Z \mid \mu, \beta, \gamma, \rho, \sigma^2)\, f(\mu, \beta, \gamma, \rho, \sigma^2) \right] d\beta \\
&= \int \left[ f(y \mid \mu, \beta, \gamma, Z, \sigma^2)\, f(\mu) f(\gamma) f(\beta \mid \sigma^2) f(\sigma^2)\, f(Z \mid \rho) f(\rho) \right] d\beta \\
&= \left\{ \int \left[ f(y \mid \mu, \beta, \gamma, Z, \sigma^2)\, f(\mu) f(\gamma) f(\beta \mid \sigma^2) f(\sigma^2) \right] d\beta \right\} f(Z \mid \rho) f(\rho)
\end{aligned}
\tag{3-4}
\]
This factorization shows that, conditional on the QTL genotypes $Z$, the problem involving $y$, $\mu$, $\gamma$ and $\sigma^2$ can be solved independently from the problem involving $\rho$ (Sen and Churchill 2001).

To find equation (3-4), assume priors $N(\eta, v_\mu)$, $N(\xi, A)$, $IG(a, b)$ and $N(0, \tau^2 \sigma^2 I_t)$ for $\mu$, $\gamma$, $\sigma^2$ and $\beta$, respectively. $\tau^2$ is considered to be fixed. Assume a $\mathrm{Beta}(c_2, d_2)$ prior for $p_{2k}$. For $p_{1k}$,
\[ f(p_{1k} \mid p_{2k}) \propto p_{1k}^{c_1 - 1} (1 - p_{1k})^{d_1 - 1} \]
with range $\left(1 - p_{2k} \frac{r_{k(k+1)}}{1 - r_{k(k+1)}},\ 1\right)$ with probability $\frac{1}{2}$ and with range $(p_{2k}, 1)$ with probability $\frac{1}{2}$. Flat priors for $\mu$ and $\gamma$ were also considered for Model 2; the conditions needed to obtain a proper posterior distribution under these priors are presented in the section Conditions for a Proper Posterior Distribution, with proofs in Appendix A. See Appendix B for graphs that explore the shape of the prior distribution chosen for the $p_{1k}$'s and

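$p_{2k}$'s. The mixture provides a more uniform distribution of QTL locations in the intervals. Also, in Appendix B, a reparameterization of the posterior in terms of the recombination fraction is presented. The parameterization in terms of the $p_{1k}$'s and $p_{2k}$'s turned out to be simpler than the one using the recombination fractions.

With the priors mentioned above, the term inside the integral of equation (3-4) is
\[
\begin{aligned}
f(y \mid \mu, \beta, \gamma, Z, \sigma^2)\, f(\mu) f(\gamma) f(\beta \mid \sigma^2) f(\sigma^2) \propto{}& \left(\frac{1}{\sigma^2}\right)^{n/2} e^{-\frac{1}{2\sigma^2}(y - \mu\mathbf{1} - X\beta - Z\gamma)'(y - \mu\mathbf{1} - X\beta - Z\gamma)}\, e^{-\frac{1}{2}(\gamma - \xi)'A^{-1}(\gamma - \xi)}\, e^{-\frac{1}{2v_\mu}(\mu - \eta)^2} \\
& \times \left(\frac{1}{\tau^2\sigma^2}\right)^{t/2} e^{-\frac{1}{2\tau^2\sigma^2}\beta'\beta} \left(\frac{1}{\sigma^2}\right)^{a+1} e^{-\frac{1}{\sigma^2 b}}
\end{aligned}
\]
Thus,
\[
\begin{aligned}
\int f(y \mid \mu, \beta, \gamma, Z, \sigma^2)\, & f(\mu) f(\gamma) f(\beta \mid \sigma^2) f(\sigma^2)\, d\beta \\
={}& \left(\frac{1}{\sigma^2}\right)^{\frac{n+t}{2}+a+1} e^{-\frac{1}{\sigma^2 b}}\, e^{-\frac{1}{2}(\gamma - \xi)'A^{-1}(\gamma - \xi)}\, e^{-\frac{1}{2v_\mu}(\mu - \eta)^2} \int e^{-\frac{1}{2\sigma^2}(y - \mu\mathbf{1} - X\beta - Z\gamma)'(y - \mu\mathbf{1} - X\beta - Z\gamma)}\, e^{-\frac{1}{2\tau^2\sigma^2}\beta'\beta}\, d\beta \\
={}& \left(\frac{1}{\sigma^2}\right)^{\frac{n+t}{2}+a+1} e^{-\frac{1}{\sigma^2 b}}\, e^{-\frac{1}{2}(\gamma - \xi)'A^{-1}(\gamma - \xi)}\, e^{-\frac{1}{2v_\mu}(\mu - \eta)^2}\, e^{-\frac{1}{2\sigma^2}(y - \mu\mathbf{1} - Z\gamma)'(y - \mu\mathbf{1} - Z\gamma)} \\
& \times \int e^{-\frac{1}{2\sigma^2}\left\{\beta'X'X\beta - 2\beta'X'(y - \mu\mathbf{1} - Z\gamma) + \frac{\beta'\beta}{\tau^2}\right\}}\, d\beta \\
={}& \left(\frac{1}{\sigma^2}\right)^{\frac{n+t}{2}+a+1} e^{-\frac{1}{\sigma^2 b}}\, e^{-\frac{1}{2}(\gamma - \xi)'A^{-1}(\gamma - \xi)}\, e^{-\frac{1}{2v_\mu}(\mu - \eta)^2}\, e^{-\frac{1}{2\sigma^2}(y - \mu\mathbf{1} - Z\gamma)'(y - \mu\mathbf{1} - Z\gamma)} \\
& \times \int e^{-\frac{1}{2\sigma^2}\left\{\beta'\left(X'X + \frac{I_t}{\tau^2}\right)\beta - 2\beta'X'(y - \mu\mathbf{1} - Z\gamma)\right\}}\, d\beta \\
\propto{}& \left(\frac{1}{\sigma^2}\right)^{\frac{n+t}{2}+a+1} e^{-\frac{1}{\sigma^2 b}}\, e^{-\frac{1}{2}(\gamma - \xi)'A^{-1}(\gamma - \xi)}\, e^{-\frac{1}{2v_\mu}(\mu - \eta)^2}\, e^{-\frac{1}{2\sigma^2}\left\{(y - \mu\mathbf{1} - Z\gamma)'(I - XU^{-1}X')(y - \mu\mathbf{1} - Z\gamma)\right\}} \\
& \times \int e^{-\frac{1}{2\sigma^2}\left\{\left(\beta - U^{-1}X'(y - \mu\mathbf{1} - Z\gamma)\right)'U\left(\beta - U^{-1}X'(y - \mu\mathbf{1} - Z\gamma)\right)\right\}}\, d\beta \\
\propto{}& \left(\frac{1}{\sigma^2}\right)^{\frac{n+t}{2}+a+1} e^{-\frac{1}{\sigma^2 b}}\, e^{-\frac{1}{2}(\gamma - \xi)'A^{-1}(\gamma - \xi)}\, e^{-\frac{1}{2v_\mu}(\mu - \eta)^2}\, e^{-\frac{1}{2\sigma^2}\left\{(y - \mu\mathbf{1} - Z\gamma)'(I - XU^{-1}X')(y - \mu\mathbf{1} - Z\gamma)\right\}}\, (\sigma^2)^{t/2}\, |U|^{-1/2}
\end{aligned}
\]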

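where $U = X'X + \frac{I_t}{\tau^2}$. The last line follows since the term in the integral is the kernel of a $N_t\left(U^{-1}X'(y - \mu\mathbf{1} - Z\gamma),\ \sigma^2 U^{-1}\right)$. Therefore equation (3-4) becomes
\[
\begin{aligned}
f(\mu, \gamma, Z, \rho, \sigma^2 \mid y) \propto{}& e^{-\frac{1}{2\sigma^2}\left\{(y - \mu\mathbf{1} - Z\gamma)'(I - XU^{-1}X')(y - \mu\mathbf{1} - Z\gamma)\right\}}\, |U|^{-1/2}\, e^{-\frac{1}{2}(\gamma - \xi)'A^{-1}(\gamma - \xi)} \\
& \times e^{-\frac{1}{2v_\mu}(\mu - \eta)^2} \left(\frac{1}{\sigma^2}\right)^{\frac{n}{2}+a+1} e^{-\frac{1}{\sigma^2 b}}\, f(Z \mid \rho) f(\rho)
\end{aligned}
\tag{3-5}
\]
To obtain $f(Z \mid \rho) f(\rho)$, from equation (3-2)
\[
f(Z \mid \rho) \propto \prod_{i=1}^{n} \prod_{k=1}^{t-1} f(z_{ik} \mid p_k, x_{ik}, x_{i(k+1)}) \propto \prod_{i=1}^{n} \prod_{k=1}^{t-1} p_k^{z_{ik}} (1 - p_k)^{1 - z_{ik}}
\tag{3-6}
\]
Therefore (range of $p_{1k}$ omitted for simplicity),
\[
\begin{aligned}
f(Z \mid \rho) f(\rho) ={}& \prod_{k=1}^{t-1} p_{1k}^{z_{1k}} (1 - p_{1k})^{n_{1k} - z_{1k}}\, p_{2k}^{z_{2k}} (1 - p_{2k})^{n_{2k} - z_{2k}} \\
& \times \prod_{k=1}^{t-1} (1 - p_{2k})^{z_{3k}}\, p_{2k}^{n_{3k} - z_{3k}}\, (1 - p_{1k})^{z_{4k}}\, p_{1k}^{n_{4k} - z_{4k}} \\
& \times \prod_{k=1}^{t-1} p_{1k}^{c_1 - 1} (1 - p_{1k})^{d_1 - 1}\, p_{2k}^{c_2 - 1} (1 - p_{2k})^{d_2 - 1} \\
={}& \prod_{k=1}^{t-1} p_{1k}^{n_{4k} - z_{4k} + z_{1k} + c_1 - 1} (1 - p_{1k})^{n_{1k} - z_{1k} + z_{4k} + d_1 - 1} \\
& \times \prod_{k=1}^{t-1} p_{2k}^{n_{3k} - z_{3k} + z_{2k} + c_2 - 1} (1 - p_{2k})^{n_{2k} - z_{2k} + z_{3k} + d_2 - 1}
\end{aligned}
\tag{3-7}
\]
where $n_{1k}$ is the number of individuals with marker genotypes $x_{ik} = 1$, $x_{i(k+1)} = 1$, i.e. $n_{1k} = \sum_{i=1}^{n} I(x_{ik} = 1, x_{i(k+1)} = 1)$. Similarly,
\[
\begin{aligned}
n_{2k} &= \sum_{i=1}^{n} I(x_{ik} = 1, x_{i(k+1)} = 0) \\
n_{3k} &= \sum_{i=1}^{n} I(x_{ik} = 0, x_{i(k+1)} = 1) \\
n_{4k} &= \sum_{i=1}^{n} I(x_{ik} = 0, x_{i(k+1)} = 0)
\end{aligned}
\]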

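and
\[
\begin{aligned}
z_{1k} &= \sum_{i=1}^{n} z_{ik}\, I(x_{ik} = 1, x_{i(k+1)} = 1) \\
z_{2k} &= \sum_{i=1}^{n} z_{ik}\, I(x_{ik} = 1, x_{i(k+1)} = 0) \\
z_{3k} &= \sum_{i=1}^{n} z_{ik}\, I(x_{ik} = 0, x_{i(k+1)} = 1) \\
z_{4k} &= \sum_{i=1}^{n} z_{ik}\, I(x_{ik} = 0, x_{i(k+1)} = 0).
\end{aligned}
\]
Using equations (3-5), (3-6) and (3-7) we now calculate the full conditional distributions of all the parameters in our model.

Distribution of $\sigma^2 \mid \mu, \gamma, Z, \rho, y$

It is clear from equation (3-5) that
\[
\sigma^2 \mid \mu, \gamma, Z, \rho, y \sim IG\left(\frac{n}{2} + a,\ \frac{(y - \mu\mathbf{1} - Z\gamma)'(I - XU^{-1}X')(y - \mu\mathbf{1} - Z\gamma)}{2} + \frac{1}{b}\right).
\]

Distribution of $\mu \mid \gamma, Z, \rho, \sigma^2, y$

To obtain the distribution of $\mu \mid \gamma, Z, \rho, \sigma^2, y$ note that from equation (3-5)
\[
\begin{aligned}
f(\mu \mid \gamma, Z, \rho, \sigma^2, y) &\propto e^{-\frac{1}{2\sigma^2}\left\{(y - \mu\mathbf{1} - Z\gamma)'(I - XU^{-1}X')(y - \mu\mathbf{1} - Z\gamma)\right\}}\, e^{-\frac{1}{2v_\mu}(\mu - \eta)^2} \\
&\propto e^{-\frac{1}{2\sigma^2}\left\{\mu^2\, \mathbf{1}'(I - XU^{-1}X')\mathbf{1} - 2\mu\, \mathbf{1}'(I - XU^{-1}X')(y - Z\gamma)\right\}}\, e^{-\frac{1}{2v_\mu}(\mu^2 - 2\mu\eta)} \\
&= e^{-\frac{1}{2}\left\{\mu^2\left(\frac{\mathbf{1}'(I - XU^{-1}X')\mathbf{1}}{\sigma^2} + \frac{1}{v_\mu}\right) - 2\mu\left(\frac{\mathbf{1}'(I - XU^{-1}X')(y - Z\gamma)}{\sigma^2} + \frac{\eta}{v_\mu}\right)\right\}} \\
&= e^{-\frac{1}{2}\left(\frac{\mathbf{1}'(I - XU^{-1}X')\mathbf{1}}{\sigma^2} + \frac{1}{v_\mu}\right)\left\{\mu^2 - 2\mu\, \frac{v_\mu \mathbf{1}'(I - XU^{-1}X')(y - Z\gamma) + \eta\sigma^2}{v_\mu \mathbf{1}'(I - XU^{-1}X')\mathbf{1} + \sigma^2}\right\}} \\
&\propto e^{-\frac{1}{2}\left(\frac{\mathbf{1}'(I - XU^{-1}X')\mathbf{1}}{\sigma^2} + \frac{1}{v_\mu}\right)\left\{\mu - \frac{v_\mu \mathbf{1}'(I - XU^{-1}X')(y - Z\gamma) + \eta\sigma^2}{v_\mu \mathbf{1}'(I - XU^{-1}X')\mathbf{1} + \sigma^2}\right\}^2}
\end{aligned}
\]
Therefore,
\[
\mu \mid \gamma, Z, \rho, \sigma^2, y \sim N\left(\frac{v_\mu \mathbf{1}'(I - XU^{-1}X')(y - Z\gamma) + \eta\sigma^2}{v_\mu \mathbf{1}'(I - XU^{-1}X')\mathbf{1} + \sigma^2},\ \frac{\sigma^2 v_\mu}{v_\mu \mathbf{1}'(I - XU^{-1}X')\mathbf{1} + \sigma^2}\right).
\]

Distribution of $\gamma \mid \mu, Z, \rho, \sigma^2, y$

To obtain the distribution of $\gamma \mid \mu, Z, \rho, \sigma^2, y$ note that from equation (3-5)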

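\[
\begin{aligned}
f(\gamma \mid \mu, Z, \rho, \sigma^2, y) &\propto e^{-\frac{1}{2\sigma^2}\left\{(y - \mu\mathbf{1} - Z\gamma)'(I - XU^{-1}X')(y - \mu\mathbf{1} - Z\gamma)\right\}}\, e^{-\frac{1}{2}(\gamma - \xi)'A^{-1}(\gamma - \xi)} \\
&\propto e^{-\frac{1}{2}\left(\frac{\gamma'Z'(I - XU^{-1}X')Z\gamma}{\sigma^2} - \frac{2\gamma'Z'(I - XU^{-1}X')(y - \mu\mathbf{1})}{\sigma^2}\right)}\, e^{-\frac{1}{2}\left(\gamma'A^{-1}\gamma - 2\gamma'A^{-1}\xi\right)} \\
&= e^{-\frac{1}{2}\left(\gamma'\left(\frac{Z'(I - XU^{-1}X')Z}{\sigma^2} + A^{-1}\right)\gamma - \frac{2\gamma'Z'(I - XU^{-1}X')(y - \mu\mathbf{1})}{\sigma^2} - 2\gamma'A^{-1}\xi\right)} \\
&= e^{-\frac{1}{2}\left(\gamma'\left(\frac{Z'(I - XU^{-1}X')Z}{\sigma^2} + A^{-1}\right)\gamma - 2\gamma'\left(\frac{Z'(I - XU^{-1}X')(y - \mu\mathbf{1})}{\sigma^2} + A^{-1}\xi\right)\right)} \\
&\propto e^{-\frac{1}{2}\left\{\gamma - T^{-1}\left(\frac{Z'(I - XU^{-1}X')(y - \mu\mathbf{1})}{\sigma^2} + A^{-1}\xi\right)\right\}' T \left\{\gamma - T^{-1}\left(\frac{Z'(I - XU^{-1}X')(y - \mu\mathbf{1})}{\sigma^2} + A^{-1}\xi\right)\right\}}
\end{aligned}
\]
where $T = \frac{Z'(I - XU^{-1}X')Z}{\sigma^2} + A^{-1}$. Thus,
\[
\gamma \mid \mu, Z, \rho, \sigma^2, y \sim N\left(T^{-1}\left(\frac{Z'(I - XU^{-1}X')(y - \mu\mathbf{1})}{\sigma^2} + A^{-1}\xi\right),\ T^{-1}\right).
\]

Distribution of $Z \mid \mu, \gamma, \rho, \sigma^2, y$

To obtain the distribution of $Z \mid \mu, \gamma, \rho, \sigma^2, y$ note that from equations (3-5) and (3-6)
\[
\begin{aligned}
f(Z \mid \mu, \gamma, \rho, \sigma^2, y) &\propto e^{-\frac{1}{2\sigma^2}\left\{(y - \mu\mathbf{1} - Z\gamma)'(I - XU^{-1}X')(y - \mu\mathbf{1} - Z\gamma)\right\}} \prod_{i=1}^{n} \prod_{k=1}^{t-1} p_k^{z_{ik}} (1 - p_k)^{1 - z_{ik}} \\
&= e^{-\frac{1}{2}\left\{(Z\gamma - (y - \mu\mathbf{1}))'\, \frac{(I - XU^{-1}X')}{\sigma^2}\, (Z\gamma - (y - \mu\mathbf{1}))\right\}} \prod_{i=1}^{n} \prod_{k=1}^{t-1} p_k^{z_{ik}} (1 - p_k)^{1 - z_{ik}}
\end{aligned}
\]

Distribution of $p_{1k} \mid \mu, \beta, \gamma, Z, \sigma^2, y, p_{2k}$ and $p_{2k} \mid \mu, \beta, \gamma, Z, \sigma^2, y$

The distribution of $p_{1k} \mid \mu, \beta, \gamma, Z, \sigma^2, y$ and $p_{2k} \mid \mu, \beta, \gamma, Z, \sigma^2, y$ is obtained using equation (3-7). For $k = 1, \ldots, t-1$,
\[ p_{2k} \mid \mu, \beta, \gamma, Z, \sigma^2, y \sim \mathrm{Beta}(n_{3k} - z_{3k} + z_{2k} + c_2,\ n_{2k} - z_{2k} + z_{3k} + d_2) \]
and
\[ p_{1k} \mid \mu, \beta, \gamma, Z, \sigma^2, y, p_{2k} \propto p_{1k}^{n_{4k} - z_{4k} + z_{1k} + c_1 - 1} (1 - p_{1k})^{n_{1k} - z_{1k} + z_{4k} + d_1 - 1} \]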

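with range $\left(1 - p_{2k} \frac{r_{k(k+1)}}{1 - r_{k(k+1)}},\ 1\right)$, with probability $\frac{1}{2}$, and with range $(p_{2k}, 1)$, with probability $\frac{1}{2}$.

Implementation

Sampling from the distributions of $\sigma^2 \mid \mu, \gamma, Z, \rho, y$, $\mu \mid \gamma, Z, \rho, \sigma^2, y$ and $\gamma \mid \mu, Z, \rho, \sigma^2, y$ is straightforward. Sampling from the distributions of $Z \mid \mu, \gamma, \rho, \sigma^2, y$, $p_{1k} \mid \mu, \beta, \gamma, Z, \sigma^2, y, p_{2k}$ and $p_{2k} \mid \mu, \beta, \gamma, Z, \sigma^2, y$ is more challenging.

Sampling from the distribution of $Z \mid \mu, \gamma, \rho, \sigma^2, y$

The distribution of $Z \mid \mu, \gamma, \rho, \sigma^2, y$ is
\[
\begin{aligned}
f(Z \mid \mu, \gamma, \rho, \sigma^2, y) &\propto e^{-\frac{1}{2\sigma^2}\left\{(y - \mu\mathbf{1} - Z\gamma)'(I - XU^{-1}X')(y - \mu\mathbf{1} - Z\gamma)\right\}} \prod_{i=1}^{n} \prod_{k=1}^{t-1} p_k^{z_{ik}} (1 - p_k)^{1 - z_{ik}} \\
&= e^{-\frac{1}{2}\left\{(Z\gamma - (y - \mu\mathbf{1}))'\, \frac{(I - XU^{-1}X')}{\sigma^2}\, (Z\gamma - (y - \mu\mathbf{1}))\right\}} \prod_{i=1}^{n} \prod_{k=1}^{t-1} p_k^{z_{ik}} (1 - p_k)^{1 - z_{ik}}
\end{aligned}
\]
To sample from this distribution we went back to the expression
\[
f(\mu, \gamma, Z, \rho, \sigma^2 \mid y) \propto \left\{ \int \left[ f(y \mid \mu, \beta, \gamma, Z, \sigma^2)\, f(\mu) f(\beta \mid \sigma^2) f(\gamma) f(\sigma^2) \right] d\beta \right\} f(Z \mid \rho) f(\rho)
\]
Thus,
\[
\begin{aligned}
f(Z \mid \mu, \gamma, \rho, \sigma^2, y) &\propto \left\{ \int \left[ f(y \mid \mu, \beta, \gamma, Z, \sigma^2)\, f(\beta \mid \sigma^2) \right] d\beta \right\} f(Z \mid \rho) \\
&\propto \left\{ \int e^{-\frac{1}{2\sigma^2}(y - \mu\mathbf{1} - X\beta - Z\gamma)'(y - \mu\mathbf{1} - X\beta - Z\gamma)}\, e^{-\frac{1}{2\tau^2\sigma^2}\beta'\beta}\, d\beta \right\} \prod_{i=1}^{n} \prod_{k=1}^{t-1} p_k^{z_{ik}} (1 - p_k)^{1 - z_{ik}} \\
&= \left\{ \int e^{-\frac{1}{2\sigma^2}\left\{\sum_{i=1}^{n}\left(\sum_{k=1}^{t-1} z_{ik}\gamma_k - w_i\right)^2\right\}}\, e^{-\frac{1}{2\tau^2\sigma^2}\beta'\beta}\, d\beta \right\} \prod_{i=1}^{n} \prod_{k=1}^{t-1} p_k^{z_{ik}} (1 - p_k)^{1 - z_{ik}} \\
&= \int \prod_{i=1}^{n} \left[ e^{-\frac{1}{2\sigma^2}\left(\sum_{k=1}^{t-1} z_{ik}\gamma_k - w_i\right)^2} \prod_{k=1}^{t-1} p_k^{z_{ik}} (1 - p_k)^{1 - z_{ik}} \right] e^{-\frac{1}{2\tau^2\sigma^2}\beta'\beta}\, d\beta
\end{aligned}
\tag{3-8}
\]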

where $w_i = y_i - \mu - X_{(i)}\beta$ and $X_{(i)}$ represents the $i$th row of $X$. From here we can see that, given $\beta$, the $Z$'s can be generated independently for each individual. For individual $i$, the vector $(z_{i1}, \ldots, z_{i(t-1)})$ can be generated as a block. This can be done in two ways.

The first approach is to calculate the exact probability distribution of $(z_{i1}, \ldots, z_{i(t-1)})$ and sample from it. This is done by calculating the probability of each of the $2^{(t-1)}$ possible vectors. Since this is done for each individual at each iteration of the Gibbs sampler, it becomes computationally intensive as the numbers of markers and individuals increase.

A second way is to use the Accept-Reject algorithm with target distribution
\[ e^{-\frac{1}{2\sigma^2}\left(\sum_{k=1}^{t-1} z_{ik}\gamma_k - w_i\right)^2} \prod_{k=1}^{t-1} p_k^{z_{ik}} (1 - p_k)^{1 - z_{ik}}. \]
A candidate distribution will be
\[ \prod_{k=1}^{t-1} e^{-\frac{1}{2\omega^2}(a z_{ik}\gamma_k - b w_i)^2}\, p_k^{z_{ik}} (1 - p_k)^{1 - z_{ik}} \]
which implies that the $Z$'s are independent with
\[ P(z_{ik} = 1) \propto e^{-\frac{1}{2\omega^2}(a\gamma_k - b w_i)^2}\, p_k \quad \text{and} \quad P(z_{ik} = 0) \propto e^{-\frac{1}{2\omega^2}(b w_i)^2}\, (1 - p_k). \]
The supremum of the ratio of the target and candidate distributions, typically called $M$, has to be bounded from above, and should be close to 1 to have a good acceptance rate in the algorithm. $a$, $b$ and $\omega$ are free parameters that are used to get a closed form of $M$, which facilitates the implementation of the algorithm as well as the assessment of its performance. In this case,
\[
M = \sup_z \frac{\frac{u_1}{\sigma}\, e^{-\frac{1}{2\sigma^2}\left(\sum_{k=1}^{t-1} z_{ik}\gamma_k - w_i\right)^2}}{\frac{u_2}{\omega}\, e^{-\frac{1}{2\omega^2}\sum_{k=1}^{t-1}(a z_{ik}\gamma_k - b w_i)^2}},
\]
where $u_1$ and $u_2$ are constants.
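The first approach described above, exact enumeration of the $2^{(t-1)}$ genotype vectors for one individual given $\beta$, can be sketched in Python as follows. This is a minimal illustration and not the Ox implementation used in this work; the function names and the argument layout (`gamma`, `p`, `w`, `sigma2` for the $\gamma_k$'s, the $p_k$'s, $w_i$ and $\sigma^2$) are assumptions made for the example.

```python
import itertools
import math
import random


def exact_z_distribution(gamma, p, w, sigma2):
    """Enumerate all 0/1 vectors z and return (vectors, normalized probabilities).

    The unnormalized weight of z is
        exp(-(sum_k z_k*gamma_k - w)^2 / (2*sigma2)) * prod_k p_k^z_k (1-p_k)^(1-z_k),
    i.e. the per-individual target given beta.
    """
    vectors = list(itertools.product((0, 1), repeat=len(gamma)))
    weights = []
    for z in vectors:
        fit = sum(zk * gk for zk, gk in zip(z, gamma)) - w
        prior = math.prod(pk if zk == 1 else 1.0 - pk for zk, pk in zip(z, p))
        weights.append(math.exp(-fit * fit / (2.0 * sigma2)) * prior)
    total = sum(weights)
    return vectors, [wt / total for wt in weights]


def sample_z(gamma, p, w, sigma2, rng):
    """Draw one genotype vector from the exact distribution."""
    vectors, probs = exact_z_distribution(gamma, p, w, sigma2)
    return rng.choices(vectors, weights=probs, k=1)[0]
```

The cost grows as $2^{(t-1)}$ per individual and per iteration, which is the reason the text turns to the Accept-Reject alternative for larger numbers of intervals.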

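Note that
\[
\begin{aligned}
\sum_{k=1}^{t-1} (a z_{ik}\gamma_k - b w_i)^2 &= \sum_{k=1}^{t-1} \left(a z_{ik}\gamma_k - a\overline{Z\gamma} + a\overline{Z\gamma} - b w_i\right)^2 \\
&= a^2 \sum_{k=1}^{t-1} \left(z_{ik}\gamma_k - \overline{Z\gamma}\right)^2 + (t-1)\left(a\overline{Z\gamma} - b w_i\right)^2 \\
&= a^2 \sum_{k=1}^{t-1} \left(z_{ik}\gamma_k - \overline{Z\gamma}\right)^2 + (t-1) b^2 \left(\frac{a}{b(t-1)} \sum_{k=1}^{t-1} z_{ik}\gamma_k - w_i\right)^2
\end{aligned}
\]
where $\overline{Z\gamma} = \frac{1}{t-1} \sum_{k=1}^{t-1} z_{ik}\gamma_k$. Thus,
\[
M = \sup_z \frac{u_1 \omega}{u_2 \sigma} \exp\left\{ -\frac{1}{2\sigma^2}\left(\sum_{k=1}^{t-1} z_{ik}\gamma_k - w_i\right)^2 + \frac{(t-1) b^2}{2\omega^2}\left(\frac{a}{b(t-1)} \sum_{k=1}^{t-1} z_{ik}\gamma_k - w_i\right)^2 + \frac{a^2}{2\omega^2} S_z \right\}
\]
where $S_z = \sum_{k=1}^{t-1} \left(z_{ik}\gamma_k - \overline{Z\gamma}\right)^2$.

We examined various choices for $a$, $b$ and $\omega$ with the goal of minimizing $M$. For example, one possibility is $a = b(t-1)$ and $\frac{(t-1) b^2}{2\omega^2} = \frac{1}{2\sigma^2}$, giving
\[ M = \sup_z \frac{u_1 \omega}{u_2 \sigma}\, e^{\frac{(t-1)}{2\sigma^2} S_z} \]
which can give large values for $M$, as seen in simulation studies. After examining choices with the goal of making the value in the exponent as small as possible, we decided to set $a = b(t-1)$ and $\frac{(t-1) b^2}{2\omega^2} = \kappa \frac{1}{2\sigma^2}$, where $0 < \kappa < 1$. Then,
\[
M = \sup_z \frac{u_1 \omega}{u_2 \sigma} \exp\left\{ -\frac{1}{2\sigma^2}\left(\sum_{k=1}^{t-1} z_{ik}\gamma_k - w_i\right)^2 (1 - \kappa) + \kappa \frac{(t-1)}{2\sigma^2} S_z \right\}
\]
Finally, to ensure a negative exponent, we obtained $\kappa$ such that
\[
\kappa \frac{(t-1)}{2\sigma^2} S_z < \frac{1 - \kappa}{2\sigma^2} \min_z \left(\sum_{k=1}^{t-1} z_{ik}\gamma_k - w_i\right)^2
\]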

this is
\[
\frac{(t-1) S_z}{\min_z \left\{ \left(\sum_{k=1}^{t-1} z_{ik}\gamma_k - w_i\right)^2 \right\}} < \frac{1}{\kappa} - 1
\]
which implies that
\[
\kappa < \frac{\min_z \left\{ \left(\sum_{k=1}^{t-1} z_{ik}\gamma_k - w_i\right)^2 \right\}}{\min_z \left\{ \left(\sum_{k=1}^{t-1} z_{ik}\gamma_k - w_i\right)^2 \right\} + (t-1) S_z}
\]
for all $z$. Thus, we choose
\[
\kappa = \frac{\min_z \left\{ \left(\sum_{k=1}^{t-1} z_{ik}\gamma_k - w_i\right)^2 \right\}}{\min_z \left\{ \left(\sum_{k=1}^{t-1} z_{ik}\gamma_k - w_i\right)^2 \right\} + (t-1) \max\{S_z\}}.
\]
Now, choosing $\omega^2 = \sigma^2$, we have that
\[ M \le \frac{u_1}{u_2}. \]
As seen in simulations, this method could give a large $M$ for a few individuals while running the Gibbs sampler. For Model 2 (see Section 3.2), we will use a modified version of this Accept-Reject algorithm.

Note that to calculate the posterior and all the full conditionals, we integrated $\beta$ out, but to sample from $Z$ we go back to the expression inside the integral. An alternative approach, which we tried for Model 2, is to consider $\beta$ as part of the Gibbs sampler, i.e., the full conditional posterior distributions are also conditioned on $\beta$. The full conditional distributions for this case are shown in Appendix C. We show an example in Section 4.4 where the Gibbs sampler defined this way did not recover the effects of the simulated QTL.

Sampling from the distribution of $p_{1k} \mid \mu, \beta, \gamma, Z, \sigma^2, y, p_{2k}$ and $p_{2k} \mid \mu, \beta, \gamma, Z, \sigma^2, y$

For $k = 1, \ldots, t-1$,
\[ p_{2k} \mid \mu, \beta, \gamma, Z, \sigma^2, y \sim \mathrm{Beta}(n_{3k} - z_{3k} + z_{2k} + c_2,\ n_{2k} - z_{2k} + z_{3k} + d_2) \]

and
\[ p_{1k} \mid \mu, \beta, \gamma, Z, \sigma^2, y, p_{2k} \propto p_{1k}^{n_{4k} - z_{4k} + z_{1k} + c_1 - 1} (1 - p_{1k})^{n_{1k} - z_{1k} + z_{4k} + d_1 - 1} \]
with range $\left(1 - p_{2k} \frac{r_{k(k+1)}}{1 - r_{k(k+1)}},\ 1\right)$, with probability $\frac{1}{2}$, and with range $(p_{2k}, 1)$, with probability $\frac{1}{2}$.

The following sampling scheme was developed while analyzing simulated data. Recall that our objective is to sample a QTL position in interval $k$. Once we have a value of $p_{1k}$ and a value of $p_{2k}$, we calculate the recombination fraction $r_{Q_k(k)}$ or $r_{Q_k(k+1)}$. As can be seen from the definitions of $r_{Q_k(k)}$ and $r_{Q_k(k+1)}$, each results in different restrictions on the range of the $p$'s (see Appendix B). We started by sampling the $p_{1k}$'s using the range $\left(1 - p_{2k} \frac{r_{k(k+1)}}{1 - r_{k(k+1)}},\ 1\right)$ and calculating the $r_{Q_k(k)}$'s only. We noted that as the interval length increases, this method did not mix well. The algorithm mostly visited the right side of the interval and rarely the left side (see Figure 3-1, top). This is because only the left flanking marker was used to determine the location of the QTL. We therefore decided to use the right flanking marker as well, by also calculating the $r_{Q_k(k+1)}$'s. We now decide at random which flanking marker to use. Using this method, the distribution of the QTL positions looks like the bottom panel of Figure 3-1.

Sampling from $p_{2k} \mid \mu, \beta, \gamma, Z, \sigma^2, y$ is straightforward. Given $p_{2k}$, we sample $p_{1k}$ using the Accept-Reject algorithm with a uniform candidate on $(a, b)$, where $(a, b)$ is either $\left(1 - p_{2k} \frac{r_{k(k+1)}}{1 - r_{k(k+1)}},\ 1\right)$ or $(p_{2k}, 1)$. In this case, $M$ is easier to calculate. Let $s = b - a$, $w_1 = n_{4k} - z_{4k} + z_{1k} + c_1 - 1$ and $w_2 = n_{1k} - z_{1k} + z_{4k} + d_1 - 1$, then
\[
M = \sup_{s < p_{1k} < 1} \frac{p_{1k}^{w_1} (1 - p_{1k})^{w_2}}{\int_s^1 p_{1k}^{w_1} (1 - p_{1k})^{w_2}\, dp_{1k}}\; s
\]

where
\[
\sup_{s < p_{1k} < 1} p_{1k}^{w_1} (1 - p_{1k})^{w_2} =
\begin{cases}
1 & \text{if } w_1 = 0 \text{ or } w_2 = 0 \\[1ex]
\left(\dfrac{w_1}{w_1 + w_2}\right)^{w_1} \left(1 - \dfrac{w_1}{w_1 + w_2}\right)^{w_2} & \text{if } \dfrac{w_1}{w_1 + w_2} > s \\[2ex]
s^{w_1} (1 - s)^{w_2} & \text{if } \dfrac{w_1}{w_1 + w_2} < s
\end{cases}
\tag{3-9}
\]

3.2 Model 2

While examining the performance of Model 1 in simulations, we noted that the fixed parameter $\tau^2$ had to be very small to recover the simulated QTL effects in problems where the marker intervals were small. It seems that the information in the matrix of QTL genotypes $Z$ is very similar to the information in the matrix of marker genotypes $X$, so the QTL effects are unidentifiable. Choosing a very small $\tau^2$ forces the $\beta$'s to be zero while generating from the mixture in equation (3-8). This motivated Model 2, where we consider the model
\[ y = \mu\mathbf{1} + (I - H_z) X\beta + Z\gamma + \epsilon \tag{3-10} \]
where $H_z = Z(Z'Z)^{-1}Z'$. The idea is to account for the portion of the effects of the markers that is not being taken into account by the matrix of QTL genotypes $Z$. Although we are not considering this case here, if the matrix $Z$ is not of full column rank, the Gibbs sampler using $H_z = Z(Z'Z)^{-}Z'$, with a generalized inverse, will still work. This case is likely to arise in a genomewide search, where the number of marker intervals is likely to be greater than the number of individuals.

Figure 3-2 shows the graphs of the $\gamma$'s obtained by Model 1 with $\tau = 1$ and with a smaller $\tau$, and by Model 2 with $\tau = 1$. These are results on simulated data with true QTL at marker intervals 2 and 4. Model 1 could not recover the QTL effects when $\tau = 1$. The simulation setup will be discussed in Chapter 4.
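The role of the projection in equation (3-10) can be illustrated with a small Python sketch that computes $(I - H_z)X$ without forming $H_z$. It is an illustration only: the function name `residualize_on` is hypothetical, matrices are plain lists of rows, and the column space of $Z$ is handled through a Gram-Schmidt basis (an assumption of this sketch, not the method used in the text), which also covers a rank-deficient $Z$ because dependent columns are skipped.

```python
import math


def residualize_on(Z, X, tol=1e-10):
    """Return (I - H_z) X, where H_z projects onto the column space of Z.

    Z and X are lists of rows with the same number of rows.
    """
    n = len(Z)
    basis = []  # orthonormal basis of col(Z), built by modified Gram-Schmidt
    for j in range(len(Z[0])):
        v = [Z[i][j] for i in range(n)]
        for q in basis:
            c = sum(qi * vi for qi, vi in zip(q, v))
            v = [vi - c * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > tol:  # skip columns that are linearly dependent on earlier ones
            basis.append([vi / norm for vi in v])
    columns = []
    for j in range(len(X[0])):
        x = [X[i][j] for i in range(n)]
        for q in basis:
            c = sum(qi * xi for qi, xi in zip(q, x))
            x = [xi - c * qi for xi, qi in zip(x, q)]
        columns.append(x)
    # transpose back to a list of rows
    return [[columns[j][i] for j in range(len(columns))] for i in range(n)]
```

A marker column that coincides with a QTL genotype column is mapped to zero, so in Model 2 the marker effects only absorb variation that $Z$ cannot explain.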

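To set up a Gibbs sampler for this model, we assumed the same priors as in Model 1, except for $\mu$, for which a flat prior is assumed. The posterior distribution is
\[
f(\mu, \gamma, Z, \rho, \sigma^2 \mid y) \propto \left\{ \int \left[ f(y \mid \mu, \beta, \gamma, Z, \sigma^2)\, f(\mu) f(\gamma) f(\beta \mid \sigma^2) f(\sigma^2) \right] d\beta \right\} f(Z \mid \rho) f(\rho)
\]
where $f(Z \mid \rho) f(\rho)$ is as before and
\[
\begin{aligned}
f(y \mid \mu, \beta, \gamma, Z, \sigma^2)\, f(\mu) f(\gamma) f(\beta \mid \sigma^2) f(\sigma^2) \propto{}& \left(\frac{1}{\sigma^2}\right)^{n/2} e^{-\frac{1}{2\sigma^2}(y - \mu\mathbf{1} - (I - H_z)X\beta - Z\gamma)'(y - \mu\mathbf{1} - (I - H_z)X\beta - Z\gamma)}\, e^{-\frac{1}{2}(\gamma - \xi)'A^{-1}(\gamma - \xi)} \\
& \times \left(\frac{1}{\tau^2\sigma^2}\right)^{t/2} e^{-\frac{1}{2\tau^2\sigma^2}\beta'\beta} \left(\frac{1}{\sigma^2}\right)^{a+1} e^{-\frac{1}{\sigma^2 b}}
\end{aligned}
\]
Thus, using $(I - H_z)Z = 0$ in the third step,
\[
\begin{aligned}
\int f(y \mid \mu, \beta, \gamma, Z, \sigma^2)\, & f(\mu) f(\gamma) f(\beta \mid \sigma^2) f(\sigma^2)\, d\beta \\
={}& \left(\frac{1}{\sigma^2}\right)^{\frac{n+t}{2}+a+1} e^{-\frac{1}{\sigma^2 b}}\, e^{-\frac{1}{2}(\gamma - \xi)'A^{-1}(\gamma - \xi)} \int e^{-\frac{1}{2\sigma^2}(y - \mu\mathbf{1} - (I - H_z)X\beta - Z\gamma)'(y - \mu\mathbf{1} - (I - H_z)X\beta - Z\gamma)}\, e^{-\frac{1}{2\tau^2\sigma^2}\beta'\beta}\, d\beta \\
={}& \left(\frac{1}{\sigma^2}\right)^{\frac{n+t}{2}+a+1} e^{-\frac{1}{\sigma^2 b}}\, e^{-\frac{1}{2}(\gamma - \xi)'A^{-1}(\gamma - \xi)}\, e^{-\frac{1}{2\sigma^2}(y - \mu\mathbf{1} - Z\gamma)'(y - \mu\mathbf{1} - Z\gamma)} \\
& \times \int e^{-\frac{1}{2\sigma^2}\left\{\beta'X'(I - H_z)X\beta - 2\beta'X'(I - H_z)(y - \mu\mathbf{1} - Z\gamma) + \frac{\beta'\beta}{\tau^2}\right\}}\, d\beta \\
={}& \left(\frac{1}{\sigma^2}\right)^{\frac{n+t}{2}+a+1} e^{-\frac{1}{\sigma^2 b}}\, e^{-\frac{1}{2}(\gamma - \xi)'A^{-1}(\gamma - \xi)}\, e^{-\frac{1}{2\sigma^2}(y - \mu\mathbf{1} - Z\gamma)'(y - \mu\mathbf{1} - Z\gamma)} \\
& \times \int e^{-\frac{1}{2\sigma^2}\left\{\beta'\left(X'(I - H_z)X + \frac{I_t}{\tau^2}\right)\beta - 2\beta'X'(I - H_z)(y - \mu\mathbf{1})\right\}}\, d\beta \\
\propto{}& \left(\frac{1}{\sigma^2}\right)^{\frac{n+t}{2}+a+1} e^{-\frac{1}{\sigma^2 b}}\, e^{-\frac{1}{2}(\gamma - \xi)'A^{-1}(\gamma - \xi)}\, e^{-\frac{1}{2\sigma^2}(y - \mu\mathbf{1} - Z\gamma)'(y - \mu\mathbf{1} - Z\gamma)}\, e^{\frac{1}{2\sigma^2}(y - \mu\mathbf{1})'(I - H_z)XU^{-1}X'(I - H_z)(y - \mu\mathbf{1})} \\
& \times \int e^{-\frac{1}{2\sigma^2}\left\{\left(\beta - U^{-1}X'(I - H_z)(y - \mu\mathbf{1})\right)'U\left(\beta - U^{-1}X'(I - H_z)(y - \mu\mathbf{1})\right)\right\}}\, d\beta
\end{aligned}
\]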

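\[
\begin{aligned}
\propto{}& \left(\frac{1}{\sigma^2}\right)^{\frac{n+t}{2}+a+1} e^{-\frac{1}{\sigma^2 b}}\, e^{-\frac{1}{2}(\gamma - \xi)'A^{-1}(\gamma - \xi)}\, e^{-\frac{1}{2\sigma^2}(y - \mu\mathbf{1} - Z\gamma)'(y - \mu\mathbf{1} - Z\gamma)} \\
& \times e^{\frac{1}{2\sigma^2}(y - \mu\mathbf{1})'(I - H_z)XU^{-1}X'(I - H_z)(y - \mu\mathbf{1})}\, (\sigma^2)^{t/2}\, |U|^{-1/2}
\end{aligned}
\]
where $U = X'(I - H_z)X + \frac{I_t}{\tau^2}$. The last line follows since the term in the integral is the kernel of a $N_t\left(U^{-1}X'(I - H_z)(y - \mu\mathbf{1}),\ \sigma^2 U^{-1}\right)$. Therefore, the posterior distribution of the parameters is
\[
\begin{aligned}
f(\mu, \gamma, Z, \rho, \sigma^2 \mid y) \propto{}& e^{-\frac{1}{2\sigma^2}(y - \mu\mathbf{1} - Z\gamma)'(y - \mu\mathbf{1} - Z\gamma)}\, e^{\frac{1}{2\sigma^2}(y - \mu\mathbf{1})'(I - H_z)XU^{-1}X'(I - H_z)(y - \mu\mathbf{1})}\, |U|^{-1/2} \\
& \times e^{-\frac{1}{2}(\gamma - \xi)'A^{-1}(\gamma - \xi)} \left(\frac{1}{\sigma^2}\right)^{\frac{n}{2}+a+1} e^{-\frac{1}{\sigma^2 b}}\, f(Z \mid \rho) f(\rho)
\end{aligned}
\tag{3-11}
\]
which can be written as
\[
\begin{aligned}
f(\mu, \gamma, Z, \rho, \sigma^2 \mid y) \propto{}& e^{-\frac{1}{2\sigma^2}(y - \mu\mathbf{1} - Z\gamma)'\left[I - (I - H_z)XU^{-1}X'(I - H_z)\right](y - \mu\mathbf{1} - Z\gamma)}\, |U|^{-1/2} \\
& \times e^{-\frac{1}{2}(\gamma - \xi)'A^{-1}(\gamma - \xi)} \left(\frac{1}{\sigma^2}\right)^{\frac{n}{2}+a+1} e^{-\frac{1}{\sigma^2 b}}\, f(Z \mid \rho) f(\rho)
\end{aligned}
\]
The full conditional distributions are presented in the section Full Conditional Distributions. In the next section, we present results on the conditions needed to obtain a proper posterior distribution under flat priors for $\mu$ and $\gamma$.

Conditions for a Proper Posterior Distribution

The conditions needed to obtain a proper posterior distribution under flat priors for $\mu$ and $\gamma$ are summarized in the following theorems. Proofs can be found in Appendix A.

Theorem 3.1 The posterior distribution is proper if a flat prior on $\mu$ and a $N(\xi, A)$ prior on $\gamma$ are assumed.

Theorem 3.2 The posterior distribution is proper if a $N(\eta, v_\mu)$ prior on $\mu$ and a flat prior on $\gamma$ are assumed and $Z$ is of full column rank.

Theorem 3.3 The posterior distribution is proper if a flat prior on $\mu$ and $\gamma$ is assumed, $Z$ is of full column rank and $\mathbf{1}'(I - H_z)\mathbf{1} \neq 0$.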

For Model 2, we are assuming a flat prior for $\mu$ and a $N(\xi, A)$ prior for $\gamma$ (Theorem 3.1). The others were not considered because the matrix $Z$ is not necessarily of full column rank. This will be the case if there are fewer individuals than marker intervals or if the QTL genotypes are the same for all individuals in two or more marker intervals. However, we are not considering any of these cases in our models explicitly. The second case occurs, although rarely, while sampling the matrix $Z$. Since we are not using a generalized inverse, we drop these occurrences and resample.

Full Conditional Distributions

We now present the full conditional distributions for Model 2, which are obtained from equation (3-11). The calculations are similar to the ones performed for Model 1. In fact, $p_{1k} \mid \mu, \beta, \gamma, Z, \sigma^2, y, p_{2k}$ and $p_{2k} \mid \mu, \beta, \gamma, Z, \sigma^2, y$ do not change. Let $W = (I - H_z)XU^{-1}X'(I - H_z)$.

Distribution of $\sigma^2 \mid \mu, \gamma, Z, \rho, y$
\[
\sigma^2 \mid \mu, \gamma, Z, \rho, y \sim IG\left(\frac{n}{2} + a,\ \frac{(y - \mu\mathbf{1} - Z\gamma)'(y - \mu\mathbf{1} - Z\gamma) - (y - \mu\mathbf{1})'W(y - \mu\mathbf{1})}{2} + \frac{1}{b}\right).
\]

Distribution of $\mu \mid \gamma, Z, \rho, \sigma^2, y$
\[
\mu \mid \gamma, Z, \rho, \sigma^2, y \sim N\left(\frac{\mathbf{1}'(I - W)(y - Z\gamma)}{\mathbf{1}'(I - W)\mathbf{1}},\ \frac{\sigma^2}{\mathbf{1}'(I - W)\mathbf{1}}\right).
\]

Distribution of $\gamma \mid \mu, Z, \rho, \sigma^2, y$
\[
\gamma \mid \mu, Z, \rho, \sigma^2, y \sim N\left(T^{-1}\left(\frac{Z'(y - \mu\mathbf{1})}{\sigma^2} + A^{-1}\xi\right),\ T^{-1}\right)
\]
where $T = \frac{Z'Z}{\sigma^2} + A^{-1}$.

Distribution of $Z \mid \mu, \gamma, \rho, \sigma^2, y$
\[
f(Z \mid \mu, \gamma, \rho, \sigma^2, y) \propto \left| X'(I - H_z)X + \frac{I_t}{\tau^2} \right|^{-1/2} e^{-\frac{1}{2\sigma^2}\left\{(y - \mu\mathbf{1} - Z\gamma)'(I - W)(y - \mu\mathbf{1} - Z\gamma)\right\}} \prod_{i=1}^{n} \prod_{k=1}^{t-1} p_k^{z_{ik}} (1 - p_k)^{1 - z_{ik}}.
\]

Sampling from this last distribution is a challenge. As in Model 1, we have gone back to the equation
\[
\begin{aligned}
f(Z \mid \mu, \gamma, \rho, \sigma^2, y) &\propto \left\{ \int e^{-\frac{1}{2\sigma^2}(y - \mu\mathbf{1} - (I - H_z)X\beta - Z\gamma)'(y - \mu\mathbf{1} - (I - H_z)X\beta - Z\gamma)}\, e^{-\frac{1}{2\tau^2\sigma^2}\beta'\beta}\, d\beta \right\} \prod_{i=1}^{n} \prod_{k=1}^{t-1} p_k^{z_{ik}} (1 - p_k)^{1 - z_{ik}} \\
&= \int \left\{ e^{-\frac{1}{2\sigma^2}(y - \mu\mathbf{1} - (I - H_z)X\beta - Z\gamma)'(y - \mu\mathbf{1} - (I - H_z)X\beta - Z\gamma)} \prod_{i=1}^{n} \prod_{k=1}^{t-1} p_k^{z_{ik}} (1 - p_k)^{1 - z_{ik}} \right\} e^{-\frac{1}{2\tau^2\sigma^2}\beta'\beta}\, d\beta.
\end{aligned}
\]
Thus, given a $\beta$ we must sample from the distribution inside the integral. To this end we use the Metropolis-Hastings algorithm with candidate distribution
\[
e^{-\frac{1}{2\sigma^2}(y - \mu\mathbf{1} - Z\gamma)'(y - \mu\mathbf{1} - Z\gamma)} \prod_{i=1}^{n} \prod_{k=1}^{t-1} p_k^{z_{ik}} (1 - p_k)^{1 - z_{ik}}
\]
Recall that we described how to sample from this candidate using the Accept-Reject algorithm in the Implementation section for Model 1. We showed that the vector $(z_1, \ldots, z_{(t-1)})$ can be sampled for each individual independently. In this case, we must sample the entire matrix $Z$, the genotypes of all individuals, to have a candidate draw for the Metropolis-Hastings algorithm.

As we mentioned before, the Accept-Reject algorithm presented there gives a large $M$ for a few individuals while running the Gibbs sampler. Since in this case this distribution is a candidate distribution and not the distribution of interest, we decided to set a maximum number of trials for the Accept-Reject algorithm, say $L$, based on simulations. While generating the matrix $Z$, if an individual exceeds $L$, the vector $(z_1, \ldots, z_{(t-1)})$ from the previous iteration is kept in the matrix for that individual. This modification leads to sampling the vector $z = (z_1, \ldots, z_{(t-1)})$ from a mixture distribution of the form
\[ q\, g(z) + (1 - q)\, \delta_{z_{prev}} \]
where
\[ g(z) \propto e^{-\frac{1}{2\sigma^2}\left(\sum_{k=1}^{t-1} z_k\gamma_k - w\right)^2} \prod_{k=1}^{t-1} p_k^{z_k} (1 - p_k)^{1 - z_k} \]

and $\delta_{z_{prev}}$ is a point mass at the vector kept from the previous iteration. The value of $q$ is unknown, but is approximately 1 when $L$ is large enough. For simulated data using $L = 3000$, Figure 3-3 shows that only a few individuals exceed $L$, and no more than .5% of the time. Individual IDs are shown on the x axis, and the percentage of the total iterations in which that individual exceeded $L$ on the y axis. The individual with ID 558 exceeds $L$, keeping the vector $z$ from the previous iteration, in only 629 of the iterations, a little more than .4%. A few other individuals exceed $L$ much less than .4% of the time. Most of them never exceed $L$ in any trial.

The performance evaluation of Model 2 will be discussed in Chapter 4.
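The capped Accept-Reject step described above, where the previous vector is kept once the number of trials reaches $L$, can be sketched generically in Python. This is a hedged illustration rather than the implementation used here: `draw_candidate` and `accept_prob` are hypothetical callables supplied by the user, with `accept_prob(z)` standing for the usual ratio of target to $M$ times candidate.

```python
import random


def capped_accept_reject(draw_candidate, accept_prob, previous, max_trials, rng):
    """Accept-Reject with at most max_trials proposals.

    Returns (value, accepted). If no proposal is accepted within max_trials,
    the value from the previous iteration is returned unchanged, which gives
    the mixture q*g(z) + (1 - q)*(point mass at the previous value).
    """
    for _ in range(max_trials):
        z = draw_candidate(rng)
        if rng.random() < accept_prob(z):
            return z, True
    return previous, False
```

Counting how often the second return value is `False` for each individual gives the kind of summary reported in Figure 3-3.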

Figure 3-1. Histograms of QTL positions. Top panel: only $r_{Q_k(k)}$ is calculated. Bottom panel: both $r_{Q_k(k)}$ and $r_{Q_k(k+1)}$ are calculated.

Figure 3-2. Simulated data, equally spaced markers at approximately .26 M. QTL at second and fourth intervals with equal effects of 1. $h^2 = .94$, $\sigma^2 = .04$. Model 1 with $\tau^2 = 1$ (top left), $\tau^2 = \frac{1}{100}$ (top right) and Model 2 with $\tau^2 = 1$ (bottom).

Figure 3-3. Simulated data: only a few individuals exceed $L = 3000$, and for no more than .5% of the total number of iterations.

CHAPTER 4
PERFORMANCE EVALUATION

4.1 Simulated Data

The performance evaluation was done on data simulated from a backcross experiment. Several examples were generated for equally spaced markers at different interval lengths. We considered examples with six and ten markers and with different heritability levels.

To simulate the data we first defined the length of the marker intervals in terms of the recombination fraction. Then we defined the locations of the QTL by their recombination fractions from their left flanking markers. Then the marker genotypes (matrix $X$) and the QTL genotypes (matrix $Z$) were generated for the individuals. For each individual we first generate the genotype of the first marker using a Bernoulli($\frac{1}{2}$). Then the rest of the marker and QTL genotypes for that individual are generated sequentially. If the previous marker (QTL) genotype was 1, a Bernoulli($1 - p$) is used to generate the current genotype. If the previous marker (QTL) genotype was 0, a Bernoulli($p$) is used. Here $p$ is the recombination fraction from the previous left flanking marker or QTL, $0 \le p \le \frac{1}{2}$. Phenotypes ($y$) are generated by the equation $y = \mu + Za + \epsilon$, where $\mu$ is a fixed value, $a$ is the vector of fixed values for the QTL effects and $\epsilon \sim N(0, \sigma_E^2)$.

The magnitudes of the effects of the QTL ($a$) and the environmental variance ($\sigma_E^2$) were chosen so that we would have data from populations with high and low heritability. Heritability is defined as the ratio of the genetic variance to the total phenotypic variance,
\[ h^2 = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_E^2} \]

where for $L$ QTL with effects $a_i$
\[ \sigma_g^2 = \frac{1}{4} \sum_{i=1}^{L} a_i^2 + \frac{1}{4} \sum_{i \neq j} (1 - 2 r_{ij})\, a_i a_j, \]
and $r_{ij}$ is the recombination fraction between QTL $i$ and QTL $j$. It is expected that the effects of QTL will be easier to recover from data sets with high heritability.

The implementation was done in Ox version 3.30 (Doornik, 2002) and the graphs were created using the statistical software R.

4.2 Convergence and Results Presentation

We are interested in the effects of QTL at different locations in the genome for a specific trait of interest. We explore convergence by looking at the running means of the $\gamma$'s in small windows of size 1 cM or 2 cM. Recall that at each iteration of the Gibbs sampler a position for a QTL is generated in each interval, as well as a corresponding $\gamma$. Thus, the total number of $\gamma$'s varies from window to window, since a $\gamma$ will not necessarily be generated in a given window at every iteration of the Gibbs sampler. The running mean is weighted by the number of $\gamma$'s in the particular window and not by the number of iterations of the Gibbs sampler. The running means of all windows in an interval are shown in the same panel, so the x axis that corresponds to the iteration number is also scaled.

For the results we plot the posterior distribution of the $\gamma$'s, i.e., the estimated QTL effects against their QTL positions. The means of the $\gamma$'s in each of the small windows are shown, as well as the corresponding 5 and 95 percent cutoffs.

4.3 Simulation Results

We present the results obtained from Model 2 on simulated data. We considered examples with 6 and 10 markers spaced at 26 cM, 15 cM and 5 cM, as well as different levels of heritability. We present each example and its results separately. Graphs of the running means of the $\gamma$'s for each of the examples are shown in Appendix D. As expected, it is harder to separate QTL effects for the cases of

low heritability (.1, .2). The effectiveness of our method will depend on the number of individuals in the sample. Increasing the number of iterations may improve performance in certain cases. We compared the results of our method with those of composite interval mapping (CIM) in three of the examples.

Example 1: We considered simulated data for one chromosome with 6 equally spaced markers at 26 cM distance. QTL were located at marker intervals 2 and 4, both with effect equal to one. Phenotypic data were generated for 250 individuals. Heritabilities of .94 ($\sigma_E^2 = .04$) and .4 ($\sigma_E^2 = 1$) were considered. The results obtained from Model 2 were compared with those from composite interval mapping using all other markers as covariates.

The results are shown in Figure 4-1. The top panel shows the case with heritability .94, the bottom panel the case with heritability .4. The left panel shows the results from Model 2. The last 10,000 of 30,000 iterations are shown for the data with high heritability, and the last 10,000 of 60,000 iterations for the data with low heritability. Figure 4-1 shows that Model 2 was able to identify the effects of the multiple QTL, separating their effects successfully. The means of the $\gamma$'s in windows of size 2 cM are shown in black and the 5 and 95 percent cutoffs in red. Composite interval mapping did the same for the model with high heritability, but the LOD scores are above the threshold value for almost all intervals. Recall that CIM has the potential to falsely indicate the presence of a QTL if there are QTL in the intervals immediately adjacent to the interval under consideration (Zeng 1994). For the data with low heritability the separate effects were not recovered. The LOD scores show a possible QTL at marker interval 2, and at marker intervals 3 and 4.

Example 2: We considered simulated data for one chromosome with 6 equally spaced markers at 15 cM distance. QTL were located at marker intervals 2 and 4, both with effect equal to one. 250 individuals were considered for the case of heritability .4 ($\sigma_E^2 = 1$). 250 and 500 individuals were considered for the case of

heritability .2 ($\sigma_E^2 = 2.54$). For heritability .1 ($\sigma_E^2 = 5$) only 500 individuals were considered.

The results from Model 2 are shown in Figure 4-2. QTL were separated successfully for the case of heritability .4. For heritability .2, Model 2 was successful with 500 individuals. For heritability .1 our model was not successful. The last 70 thousand of 150 thousand iterations are shown. Composite interval mapping did not separate the QTL effects in any of the cases (Figure 4-3). In this example we used CIM with backward and forward regression to select the markers that are used as covariates.

Example 3: We considered simulated data for one chromosome with 10 equally spaced markers at 15 cM distance. QTL were located at marker intervals 2, 5, 7 and 8, with effects 1, .1, 1, and .1, respectively. Cases with heritability .4 and .3 were considered. 250 individuals were considered for the case of heritability .4 ($\sigma_E^2 = .82$). 600 and 800 individuals were considered for the case of heritability .3 ($\sigma_E^2 = 1.27$).

The results from Model 2 are shown in Figure 4-4. QTL of large effect were separated successfully for the case of heritability .4. For heritability .3, Model 2 was successful in detecting the QTL of large effect when the sample size was 800. It indicates a possible QTL at interval 8, but with an effect equal to that of the QTL in interval 7, suggesting that the model was not able to separate the effects. The last 70 thousand of 150 thousand iterations are shown. Increasing the number of iterations may improve the performance with 600 individuals as well as with 800.

Example 4: We considered simulated data for one chromosome with 10 equally spaced markers at 5 cM distance. QTL were located at marker intervals 1, 5, 7 and 9, with effects 1, .1, 1, and .1, respectively. The case with heritability .3 ($\sigma_E^2 = 1.31$) was considered with 600 individuals.
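The simulation design of Section 4.1 that underlies these examples can be sketched in Python as follows. This is only an illustrative sketch, not the Ox code used for the results: all function names are hypothetical, loci (markers and QTL) are assumed to be listed in map order with the recombination fraction between consecutive loci given in `recomb`, and `r[i][j]` holds the recombination fraction between QTL `i` and QTL `j`.

```python
import math
import random


def simulate_loci(recomb, n, rng):
    """Backcross genotypes (0/1) for n individuals at len(recomb) + 1 ordered loci."""
    rows = []
    for _ in range(n):
        g = [1 if rng.random() < 0.5 else 0]          # first locus: Bernoulli(1/2)
        for r in recomb:
            p_one = 1.0 - r if g[-1] == 1 else r      # Bernoulli(1 - p) after a 1, Bernoulli(p) after a 0
            g.append(1 if rng.random() < p_one else 0)
        rows.append(g)
    return rows


def genetic_variance(effects, r):
    """sigma_g^2 = (1/4) sum a_i^2 + (1/4) sum_{i != j} (1 - 2 r_ij) a_i a_j."""
    L = len(effects)
    own = sum(a * a for a in effects) / 4.0
    cross = sum((1.0 - 2.0 * r[i][j]) * effects[i] * effects[j]
                for i in range(L) for j in range(L) if i != j) / 4.0
    return own + cross


def heritability(effects, r, sigma2_e):
    """h^2 = sigma_g^2 / (sigma_g^2 + sigma_E^2)."""
    s2g = genetic_variance(effects, r)
    return s2g / (s2g + sigma2_e)


def simulate_phenotypes(qtl_rows, effects, mu, sigma2_e, rng):
    """y = mu + Z a + eps, with eps ~ N(0, sigma_E^2); qtl_rows holds the rows of Z."""
    sd = math.sqrt(sigma2_e)
    return [mu + sum(z * a for z, a in zip(row, effects)) + rng.gauss(0.0, sd)
            for row in qtl_rows]
```

In use, the QTL columns of the simulated genotype matrix would be split off as $Z$ and the marker columns as $X$, and `heritability` can be used to check that a chosen pair of effects and $\sigma_E^2$ gives the intended $h^2$.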

The results from Model 2 are shown in Figure 4-5. QTL of large effect were separated successfully. CIM with backward and forward regression to select the markers that are used as covariates did not separate the QTL effects successfully (Figure 4-6).

4.4 Model 2 with Full Conditionals Conditioned on β

Recall that to calculate the posterior and the full conditionals in Chapter 3 we integrated $\beta$ out, i.e., all conditionals were marginalized over $\beta$. But to sample from $Z$ we went back to the expression inside the integral. An alternative approach is to consider $\beta$ as part of the Gibbs sampler, i.e., the full conditional posterior distributions are also conditioned on $\beta$. We show the results of this approach on the simulated data set with $h^2 = .94$ and marker distances of 26 cM described in the previous section. Two QTL with effect of magnitude 1 were simulated in marker intervals 2 and 4. Figure 4-7 shows that with this approach the effects are not recovered after 30,000 iterations.

4.5 Data Analysis

We analyzed chromosome five of the Harrington x TR306 population of barley. This population is composed of 150 doubled haploid (DH) lines. The parents in this population are closely related and thus the level of polymorphism is relatively low. Harrington, a 2-row barley variety, has high malting quality. TR306 is a high yielding line, but is a non-malting type. From the population, a random sample of 150 doubled haploid lines was selected for genotypic analysis in order to generate the molecular map (Kleinhofs et al. 1993; Kasha et al. 1994) and for phenotypic analysis of traits such as heading date in replicated field trials. Days to heading is a measurement of the number of days between planting and flowering. The phenotypic data for this trait were collected in 29 different environments for 145 DH lines in the population. Previous work has found that the heritability for heading date in barley is high, for example (Ma et al., 2000) and 0.42 to 0.86

52 41 (Esparza Martnez and Foster, 1998). Thus, heading date is a robust trait that is suitable for testing QTL methods. The phenotypic values were averaged across the environments and to compare with the results obtained by Yi et al. (2003) were also standardized. There are 14 markers in the genetic map at locations: 0, 10.9, 18.5, 78.2, 91.2, 111.2, 114.7, 121.7, 125.2, 138.8, 143.7, 150.7, 154.2, cm. Individuals with missing marker information were removed from the analysis. For now, our method assumes no missing marker information. We analyzed the data using all markers. After removing the missing data, the population was reduced from 145 to 107 individuals. The results are shown in the top panel of Figure 4 8. The last hundred thousand of seven hundred thousand iterations are shown. The running means are shown in Figure 4 9, there is evidence that more iterations are needed. No QTL effects were detected except for marker interval 3 that is a very long interval. This chromosome was analyzed by Yi et al. (2003). They did account for missing marker information and used all markers. Recall that their method assumes that the QTL is at the marker. They estimated that marker 10 has posterior probability greater that.4 with an estimated effect between.2 and.4. This could be interpreted as saying that there are no QTL effects. Note that the markers are very close in certain regions, thus we decided to analyze selecting markers that were at least 13cM apart. The marker distances were now: 0, 18.5, 78.2, 91.2, 111.2, 125.2, 143.7, cm. After removing the missing data, the population was reduced from 145 to 114 individuals. The results are shown in the bottom panel of Figure 4 8. The running means are shown in Figure QTL are detected at all marker intervals, but if we look at the chromosome as a whole, and at the estimated effects, it seems that the effects of the QTL are canceling each other out resulting in no overall QTL effect.

This latter observation leads us to reassess the interpretation of our results. We have been interpreting the results interval by interval, when we should be concentrating on the overall picture. Quantitative traits reflect the joint effects of a number of genetic and environmental factors. Our model separates these factors and estimates the genetic ones. It seems that the appropriate way to read the plots of the posterior distribution of the effects is as a whole. In further studies we must explore this interpretation and the possibilities it presents for prediction and validation of our model. This analysis also makes evident that we must better understand how correlation between the markers affects the estimation of QTL effects.
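The marker thinning used in the barley analysis can be checked with a few lines of code. The sketch below only verifies the spacing of the marker locations quoted in Section 4.5; it does not reproduce how the markers were chosen, which the text does not describe. The last marker location is missing from the quoted lists and is therefore omitted here, and the function names are illustrative.

```python
def min_spacing(positions_cm):
    """Smallest gap (in cM) between consecutive marker positions."""
    ordered = sorted(positions_cm)
    if len(ordered) < 2:
        raise ValueError("need at least two marker positions")
    return min(b - a for a, b in zip(ordered, ordered[1:]))


def respects_min_gap(positions_cm, min_gap_cm, tol=1e-9):
    """True if every pair of adjacent markers is at least min_gap_cm apart."""
    return min_spacing(positions_cm) >= min_gap_cm - tol


# Marker locations as quoted in the text (final location not recoverable)
all_markers = [0, 10.9, 18.5, 78.2, 91.2, 111.2, 114.7,
               121.7, 125.2, 138.8, 143.7, 150.7, 154.2]
selected = [0, 18.5, 78.2, 91.2, 111.2, 125.2, 143.7]

print(respects_min_gap(all_markers, 13.0))
print(respects_min_gap(selected, 13.0))
```

The full map contains adjacent markers only a few cM apart, while every gap in the selected subset meets the 13 cM threshold. Closely spaced markers are highly correlated, which is the concern raised at the end of the section.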

Figure 4-1. Example 1: Simulated data, equally spaced markers at approximately 26 cM. QTL located in the second and fourth intervals with equal effects of 1. Top panel: h² = 0.94, σ² = 0.04. Bottom panel: h² = 0.4, σ² = 1. Left panel: Model 2. Right panel: Composite interval mapping.

Figure 4-2. Example 2: Model 2 on simulated data with equally spaced markers at approximately 15 cM. QTL located in the second and fourth intervals with equal effects of 1.

Figure 4-3. Example 2: Composite interval mapping on simulated data with equally spaced markers at approximately 15 cM. QTL located in the second and fourth intervals with equal effects of 1.

Figure 4-4. Example 3: Model 2 on simulated data with equally spaced markers at approximately 15 cM. QTL with effects 1, 0.1, 1, 0.1 located in marker intervals 2, 5, 7, and 8, respectively.

Figure 4-5. Example 4: Model 2 on simulated data with equally spaced markers at approximately 5 cM. QTL with effects 0.1, 1, 0.1, 1 located in marker intervals 1, 5, 7, and 9, respectively.

Figure 4-6. Example 4: Composite interval mapping on simulated data with equally spaced markers at approximately 5 cM. QTL with effects 0.1, 1, 0.1, 1 located in marker intervals 1, 5, 7, and 9, respectively.

Figure 4-7. Simulated data, equally spaced markers at approximately 26 cM with h² = 0.94. QTL in the second and fourth intervals with equal effects of 1. Gibbs sampler with full conditionals conditioned on β.

Figure 4-8. Top panel: Results for Model 2 on the barley data using all markers. Bottom panel: Results for Model 2 on the barley data using selected markers.

Multiple QTL mapping

Multiple QTL mapping Multiple QTL mapping Karl W Broman Department of Biostatistics Johns Hopkins University www.biostat.jhsph.edu/~kbroman [ Teaching Miscellaneous lectures] 1 Why? Reduce residual variation = increased power

More information

Introduction to QTL mapping in model organisms

Introduction to QTL mapping in model organisms Introduction to QTL mapping in model organisms Karl W Broman Department of Biostatistics and Medical Informatics University of Wisconsin Madison www.biostat.wisc.edu/~kbroman [ Teaching Miscellaneous lectures]

More information

Mapping multiple QTL in experimental crosses

Mapping multiple QTL in experimental crosses Human vs mouse Mapping multiple QTL in experimental crosses Karl W Broman Department of Biostatistics & Medical Informatics University of Wisconsin Madison www.biostat.wisc.edu/~kbroman www.daviddeen.com

More information

Introduction to QTL mapping in model organisms

Introduction to QTL mapping in model organisms Introduction to QTL mapping in model organisms Karl Broman Biostatistics and Medical Informatics University of Wisconsin Madison kbroman.org github.com/kbroman @kwbroman Backcross P 1 P 2 P 1 F 1 BC 4

More information

Introduction to QTL mapping in model organisms

Introduction to QTL mapping in model organisms Introduction to QTL mapping in model organisms Karl W Broman Department of Biostatistics Johns Hopkins University kbroman@jhsph.edu www.biostat.jhsph.edu/ kbroman Outline Experiments and data Models ANOVA

More information

Gene mapping in model organisms

Gene mapping in model organisms Gene mapping in model organisms Karl W Broman Department of Biostatistics Johns Hopkins University http://www.biostat.jhsph.edu/~kbroman Goal Identify genes that contribute to common human diseases. 2

More information

Introduction to QTL mapping in model organisms

Introduction to QTL mapping in model organisms Introduction to QTL mapping in model organisms Karl W Broman Department of Biostatistics Johns Hopkins University kbroman@jhsph.edu www.biostat.jhsph.edu/ kbroman Outline Experiments and data Models ANOVA

More information

QTL model selection: key players

QTL model selection: key players Bayesian Interval Mapping. Bayesian strategy -9. Markov chain sampling 0-7. sampling genetic architectures 8-5 4. criteria for model selection 6-44 QTL : Bayes Seattle SISG: Yandell 008 QTL model selection:

More information

Statistical issues in QTL mapping in mice

Statistical issues in QTL mapping in mice Statistical issues in QTL mapping in mice Karl W Broman Department of Biostatistics Johns Hopkins University http://www.biostat.jhsph.edu/~kbroman Outline Overview of QTL mapping The X chromosome Mapping

More information

Use of hidden Markov models for QTL mapping

Use of hidden Markov models for QTL mapping Use of hidden Markov models for QTL mapping Karl W Broman Department of Biostatistics, Johns Hopkins University December 5, 2006 An important aspect of the QTL mapping problem is the treatment of missing

More information

Introduction to QTL mapping in model organisms

Introduction to QTL mapping in model organisms Human vs mouse Introduction to QTL mapping in model organisms Karl W Broman Department of Biostatistics Johns Hopkins University www.biostat.jhsph.edu/~kbroman [ Teaching Miscellaneous lectures] www.daviddeen.com

More information

QTL Mapping I: Overview and using Inbred Lines

QTL Mapping I: Overview and using Inbred Lines QTL Mapping I: Overview and using Inbred Lines Key idea: Looking for marker-trait associations in collections of relatives If (say) the mean trait value for marker genotype MM is statisically different

More information

Methods for QTL analysis

Methods for QTL analysis Methods for QTL analysis Julius van der Werf METHODS FOR QTL ANALYSIS... 44 SINGLE VERSUS MULTIPLE MARKERS... 45 DETERMINING ASSOCIATIONS BETWEEN GENETIC MARKERS AND QTL WITH TWO MARKERS... 45 INTERVAL

More information

Fast Bayesian Methods for Genetic Mapping Applicable for High-Throughput Datasets

Fast Bayesian Methods for Genetic Mapping Applicable for High-Throughput Datasets Fast Bayesian Methods for Genetic Mapping Applicable for High-Throughput Datasets Yu-Ling Chang A dissertation submitted to the faculty of the University of North Carolina at Chapel Hill in partial fulfillment

More information

Mapping multiple QTL in experimental crosses

Mapping multiple QTL in experimental crosses Mapping multiple QTL in experimental crosses Karl W Broman Department of Biostatistics and Medical Informatics University of Wisconsin Madison www.biostat.wisc.edu/~kbroman [ Teaching Miscellaneous lectures]

More information

UNIT 8 BIOLOGY: Meiosis and Heredity Page 148

UNIT 8 BIOLOGY: Meiosis and Heredity Page 148 UNIT 8 BIOLOGY: Meiosis and Heredity Page 148 CP: CHAPTER 6, Sections 1-6; CHAPTER 7, Sections 1-4; HN: CHAPTER 11, Section 1-5 Standard B-4: The student will demonstrate an understanding of the molecular

More information

Lecture 8. QTL Mapping 1: Overview and Using Inbred Lines

Lecture 8. QTL Mapping 1: Overview and Using Inbred Lines Lecture 8 QTL Mapping 1: Overview and Using Inbred Lines Bruce Walsh. jbwalsh@u.arizona.edu. University of Arizona. Notes from a short course taught Jan-Feb 2012 at University of Uppsala While the machinery

More information

Lecture 9. QTL Mapping 2: Outbred Populations

Lecture 9. QTL Mapping 2: Outbred Populations Lecture 9 QTL Mapping 2: Outbred Populations Bruce Walsh. Aug 2004. Royal Veterinary and Agricultural University, Denmark The major difference between QTL analysis using inbred-line crosses vs. outbred

More information

Overview. Background

Overview. Background Overview Implementation of robust methods for locating quantitative trait loci in R Introduction to QTL mapping Andreas Baierl and Andreas Futschik Institute of Statistics and Decision Support Systems

More information

R/qtl workshop. (part 2) Karl Broman. Biostatistics and Medical Informatics University of Wisconsin Madison. kbroman.org

R/qtl workshop. (part 2) Karl Broman. Biostatistics and Medical Informatics University of Wisconsin Madison. kbroman.org R/qtl workshop (part 2) Karl Broman Biostatistics and Medical Informatics University of Wisconsin Madison kbroman.org github.com/kbroman @kwbroman Example Sugiyama et al. Genomics 71:70-77, 2001 250 male

More information

MULTIPLE-TRAIT MULTIPLE-INTERVAL MAPPING OF QUANTITATIVE-TRAIT LOCI ROBY JOEHANES

MULTIPLE-TRAIT MULTIPLE-INTERVAL MAPPING OF QUANTITATIVE-TRAIT LOCI ROBY JOEHANES MULTIPLE-TRAIT MULTIPLE-INTERVAL MAPPING OF QUANTITATIVE-TRAIT LOCI by ROBY JOEHANES B.S., Universitas Pelita Harapan, Indonesia, 1999 M.S., Kansas State University, 2002 A REPORT submitted in partial

More information

Meiosis and Mendel. Chapter 6

Meiosis and Mendel. Chapter 6 Meiosis and Mendel Chapter 6 6.1 CHROMOSOMES AND MEIOSIS Key Concept Gametes have half the number of chromosomes that body cells have. Body Cells vs. Gametes You have body cells and gametes body cells

More information

Tutorial Session 2. MCMC for the analysis of genetic data on pedigrees:

Tutorial Session 2. MCMC for the analysis of genetic data on pedigrees: MCMC for the analysis of genetic data on pedigrees: Tutorial Session 2 Elizabeth Thompson University of Washington Genetic mapping and linkage lod scores Monte Carlo likelihood and likelihood ratio estimation

More information

QTL model selection: key players

QTL model selection: key players QTL Model Selection. Bayesian strategy. Markov chain sampling 3. sampling genetic architectures 4. criteria for model selection Model Selection Seattle SISG: Yandell 0 QTL model selection: key players

More information

Name Class Date. KEY CONCEPT Gametes have half the number of chromosomes that body cells have.

Name Class Date. KEY CONCEPT Gametes have half the number of chromosomes that body cells have. Section 1: Chromosomes and Meiosis KEY CONCEPT Gametes have half the number of chromosomes that body cells have. VOCABULARY somatic cell autosome fertilization gamete sex chromosome diploid homologous

More information

QTL Model Search. Brian S. Yandell, UW-Madison January 2017

QTL Model Search. Brian S. Yandell, UW-Madison January 2017 QTL Model Search Brian S. Yandell, UW-Madison January 2017 evolution of QTL models original ideas focused on rare & costly markers models & methods refined as technology advanced single marker regression

More information

THE WORK OF GREGOR MENDEL

THE WORK OF GREGOR MENDEL GENETICS NOTES THE WORK OF GREGOR MENDEL Genetics-. - Austrian monk- the father of genetics- carried out his work on. Pea flowers are naturally, which means that sperm cells fertilize the egg cells in

More information

Reinforcement Unit 3 Resource Book. Meiosis and Mendel KEY CONCEPT Gametes have half the number of chromosomes that body cells have.

Reinforcement Unit 3 Resource Book. Meiosis and Mendel KEY CONCEPT Gametes have half the number of chromosomes that body cells have. 6.1 CHROMOSOMES AND MEIOSIS KEY CONCEPT Gametes have half the number of chromosomes that body cells have. Your body is made of two basic cell types. One basic type are somatic cells, also called body cells,

More information

Prediction of the Confidence Interval of Quantitative Trait Loci Location

Prediction of the Confidence Interval of Quantitative Trait Loci Location Behavior Genetics, Vol. 34, No. 4, July 2004 ( 2004) Prediction of the Confidence Interval of Quantitative Trait Loci Location Peter M. Visscher 1,3 and Mike E. Goddard 2 Received 4 Sept. 2003 Final 28

More information

Solutions to Problem Set 4

Solutions to Problem Set 4 Question 1 Solutions to 7.014 Problem Set 4 Because you have not read much scientific literature, you decide to study the genetics of garden peas. You have two pure breeding pea strains. One that is tall

More information

Heredity and Genetics WKSH

Heredity and Genetics WKSH Chapter 6, Section 3 Heredity and Genetics WKSH KEY CONCEPT Mendel s research showed that traits are inherited as discrete units. Vocabulary trait purebred law of segregation genetics cross MAIN IDEA:

More information

Lesson 4: Understanding Genetics

Lesson 4: Understanding Genetics Lesson 4: Understanding Genetics 1 Terms Alleles Chromosome Co dominance Crossover Deoxyribonucleic acid DNA Dominant Genetic code Genome Genotype Heredity Heritability Heritability estimate Heterozygous

More information

Life Cycles, Meiosis and Genetic Variability24/02/2015 2:26 PM

Life Cycles, Meiosis and Genetic Variability24/02/2015 2:26 PM Life Cycles, Meiosis and Genetic Variability iclicker: 1. A chromosome just before mitosis contains two double stranded DNA molecules. 2. This replicated chromosome contains DNA from only one of your parents

More information

Section 11 1 The Work of Gregor Mendel

Section 11 1 The Work of Gregor Mendel Chapter 11 Introduction to Genetics Section 11 1 The Work of Gregor Mendel (pages 263 266) What is the principle of dominance? What happens during segregation? Gregor Mendel s Peas (pages 263 264) 1. The

More information

Detection of multiple QTL with epistatic effects under a mixed inheritance model in an outbred population

Detection of multiple QTL with epistatic effects under a mixed inheritance model in an outbred population Genet. Sel. Evol. 36 (2004) 415 433 415 c INRA, EDP Sciences, 2004 DOI: 10.1051/gse:2004009 Original article Detection of multiple QTL with epistatic effects under a mixed inheritance model in an outbred

More information

Linear Regression (1/1/17)

Linear Regression (1/1/17) STA613/CBB540: Statistical methods in computational biology Linear Regression (1/1/17) Lecturer: Barbara Engelhardt Scribe: Ethan Hada 1. Linear regression 1.1. Linear regression basics. Linear regression

More information

KEY: Chapter 9 Genetics of Animal Breeding.

KEY: Chapter 9 Genetics of Animal Breeding. KEY: Chapter 9 Genetics of Animal Breeding. Answer each question using the reading assigned to you. You can access this information by clicking on the following URL: https://drive.google.com/a/meeker.k12.co.us/file/d/0b1yf08xgyhnad08xugxsnfvba28/edit?usp=sh

More information

PRINCIPLES OF MENDELIAN GENETICS APPLICABLE IN FORESTRY. by Erich Steiner 1/

PRINCIPLES OF MENDELIAN GENETICS APPLICABLE IN FORESTRY. by Erich Steiner 1/ PRINCIPLES OF MENDELIAN GENETICS APPLICABLE IN FORESTRY by Erich Steiner 1/ It is well known that the variation exhibited by living things has two components, one hereditary, the other environmental. One

More information

Introduction to Genetics

Introduction to Genetics Introduction to Genetics The Work of Gregor Mendel B.1.21, B.1.22, B.1.29 Genetic Inheritance Heredity: the transmission of characteristics from parent to offspring The study of heredity in biology is

More information

A Statistical Framework for Expression Trait Loci (ETL) Mapping. Meng Chen

A Statistical Framework for Expression Trait Loci (ETL) Mapping. Meng Chen A Statistical Framework for Expression Trait Loci (ETL) Mapping Meng Chen Prelim Paper in partial fulfillment of the requirements for the Ph.D. program in the Department of Statistics University of Wisconsin-Madison

More information

Family resemblance can be striking!

Family resemblance can be striking! Family resemblance can be striking! 1 Chapter 14. Mendel & Genetics 2 Gregor Mendel! Modern genetics began in mid-1800s in an abbey garden, where a monk named Gregor Mendel documented inheritance in peas

More information

MCMC IN THE ANALYSIS OF GENETIC DATA ON PEDIGREES

MCMC IN THE ANALYSIS OF GENETIC DATA ON PEDIGREES MCMC IN THE ANALYSIS OF GENETIC DATA ON PEDIGREES Elizabeth A. Thompson Department of Statistics, University of Washington Box 354322, Seattle, WA 98195-4322, USA Email: thompson@stat.washington.edu This

More information

Lecture 6. QTL Mapping

Lecture 6. QTL Mapping Lecture 6 QTL Mapping Bruce Walsh. Aug 2003. Nordic Summer Course MAPPING USING INBRED LINE CROSSES We start by considering crosses between inbred lines. The analysis of such crosses illustrates many of

More information

Inferring Genetic Architecture of Complex Biological Processes

Inferring Genetic Architecture of Complex Biological Processes Inferring Genetic Architecture of Complex Biological Processes BioPharmaceutical Technology Center Institute (BTCI) Brian S. Yandell University of Wisconsin-Madison http://www.stat.wisc.edu/~yandell/statgen

More information

Chapter 11 INTRODUCTION TO GENETICS

Chapter 11 INTRODUCTION TO GENETICS Chapter 11 INTRODUCTION TO GENETICS 11-1 The Work of Gregor Mendel I. Gregor Mendel A. Studied pea plants 1. Reproduce sexually (have two sex cells = gametes) 2. Uniting of male and female gametes = Fertilization

More information

Name Class Date. Pearson Education, Inc., publishing as Pearson Prentice Hall. 33

Name Class Date. Pearson Education, Inc., publishing as Pearson Prentice Hall. 33 Chapter 11 Introduction to Genetics Chapter Vocabulary Review Matching On the lines provided, write the letter of the definition of each term. 1. genetics a. likelihood that something will happen 2. trait

More information

is the scientific study of. Gregor Mendel was an Austrian monk. He is considered the of genetics. Mendel carried out his work with ordinary garden.

is the scientific study of. Gregor Mendel was an Austrian monk. He is considered the of genetics. Mendel carried out his work with ordinary garden. 11-1 The 11-1 Work of Gregor Mendel The Work of Gregor Mendel is the scientific study of. Gregor Mendel was an Austrian monk. He is considered the of genetics. Mendel carried out his work with ordinary

More information

Outline for today s lecture (Ch. 14, Part I)

Outline for today s lecture (Ch. 14, Part I) Outline for today s lecture (Ch. 14, Part I) Ploidy vs. DNA content The basis of heredity ca. 1850s Mendel s Experiments and Theory Law of Segregation Law of Independent Assortment Introduction to Probability

More information

The Admixture Model in Linkage Analysis

The Admixture Model in Linkage Analysis The Admixture Model in Linkage Analysis Jie Peng D. Siegmund Department of Statistics, Stanford University, Stanford, CA 94305 SUMMARY We study an appropriate version of the score statistic to test the

More information

Natural Selection. Population Dynamics. The Origins of Genetic Variation. The Origins of Genetic Variation. Intergenerational Mutation Rate

Natural Selection. Population Dynamics. The Origins of Genetic Variation. The Origins of Genetic Variation. Intergenerational Mutation Rate Natural Selection Population Dynamics Humans, Sickle-cell Disease, and Malaria How does a population of humans become resistant to malaria? Overproduction Environmental pressure/competition Pre-existing

More information

CSS 350 Midterm #2, 4/2/01

CSS 350 Midterm #2, 4/2/01 6. In corn three unlinked dominant genes are necessary for aleurone color. The genotypes B-D-B- are colored. If any of these loci is homozygous recessive the aleurone will be colorless. What is the expected

More information

Module 4: Bayesian Methods Lecture 9 A: Default prior selection. Outline

Module 4: Bayesian Methods Lecture 9 A: Default prior selection. Outline Module 4: Bayesian Methods Lecture 9 A: Default prior selection Peter Ho Departments of Statistics and Biostatistics University of Washington Outline Je reys prior Unit information priors Empirical Bayes

More information

A New Metric for Parental Selection in Plant Breeding

A New Metric for Parental Selection in Plant Breeding Graduate Theses and Dissertations Graduate College 2014 A New Metric for Parental Selection in Plant Breeding Ye Han Iowa State University Follow this and additional works at: http://libdriastateedu/etd

More information

Name: Period: EOC Review Part F Outline

Name: Period: EOC Review Part F Outline Name: Period: EOC Review Part F Outline Mitosis and Meiosis SC.912.L.16.17 Compare and contrast mitosis and meiosis and relate to the processes of sexual and asexual reproduction and their consequences

More information

Chapter 11 Meiosis and Genetics

Chapter 11 Meiosis and Genetics Chapter 11 Meiosis and Genetics Chapter 11 Meiosis and Genetics Grade:«grade» Subject:Biology Date:«date» 1 What are homologous chromosomes? A two tetrads, both from mom or both from dad B a matching pair

More information

Evaluating the Performance of a Block Updating McMC Sampler in a Simple Genetic Application

Evaluating the Performance of a Block Updating McMC Sampler in a Simple Genetic Application Evaluating the Performance of a Block Updating McMC Sampler in a Simple Genetic Application N.A. SHEEHAN B. GULDBRANDTSEN 1 D.A. SORENSEN 1 1 Markov chain Monte Carlo (McMC) methods have provided an enormous

More information

Chapter 6 Meiosis and Mendel

Chapter 6 Meiosis and Mendel UNIT 3 GENETICS Chapter 6 Meiosis and Mendel 1 hairy ears (hypertrichosis)- due to holandric gene. (Y chromosome)-only occurs in males. Appears in all sons. 2 Polydactyly- having extra fingers Wendy the

More information

Ch 11.Introduction to Genetics.Biology.Landis

Ch 11.Introduction to Genetics.Biology.Landis Nom Section 11 1 The Work of Gregor Mendel (pages 263 266) This section describes how Gregor Mendel studied the inheritance of traits in garden peas and what his conclusions were. Introduction (page 263)

More information

Mendelian Genetics. Introduction to the principles of Mendelian Genetics

Mendelian Genetics. Introduction to the principles of Mendelian Genetics + Mendelian Genetics Introduction to the principles of Mendelian Genetics + What is Genetics? n It is the study of patterns of inheritance and variations in organisms. n Genes control each trait of a living

More information

Introduction to Genetics

Introduction to Genetics Chapter 11 Introduction to Genetics Section 11 1 The Work of Gregor Mendel (pages 263 266) This section describes how Gregor Mendel studied the inheritance of traits in garden peas and what his conclusions

More information

Selecting explanatory variables with the modified version of Bayesian Information Criterion

Selecting explanatory variables with the modified version of Bayesian Information Criterion Selecting explanatory variables with the modified version of Bayesian Information Criterion Institute of Mathematics and Computer Science, Wrocław University of Technology, Poland in cooperation with J.K.Ghosh,

More information

genome a specific characteristic that varies from one individual to another gene the passing of traits from one generation to the next

genome a specific characteristic that varies from one individual to another gene the passing of traits from one generation to the next genetics the study of heredity heredity sequence of DNA that codes for a protein and thus determines a trait genome a specific characteristic that varies from one individual to another gene trait the passing

More information

Outline. P o purple % x white & white % x purple& F 1 all purple all purple. F purple, 224 white 781 purple, 263 white

Outline. P o purple % x white & white % x purple& F 1 all purple all purple. F purple, 224 white 781 purple, 263 white Outline - segregation of alleles in single trait crosses - independent assortment of alleles - using probability to predict outcomes - statistical analysis of hypotheses - conditional probability in multi-generation

More information

Guided Notes Unit 6: Classical Genetics

Guided Notes Unit 6: Classical Genetics Name: Date: Block: Chapter 6: Meiosis and Mendel I. Concept 6.1: Chromosomes and Meiosis Guided Notes Unit 6: Classical Genetics a. Meiosis: i. (In animals, meiosis occurs in the sex organs the testes

More information

Full file at CHAPTER 2 Genetics

Full file at   CHAPTER 2 Genetics CHAPTER 2 Genetics MULTIPLE CHOICE 1. Chromosomes are a. small linear bodies. b. contained in cells. c. replicated during cell division. 2. A cross between true-breeding plants bearing yellow seeds produces

More information

Unit 3 - Molecular Biology & Genetics - Review Packet

Unit 3 - Molecular Biology & Genetics - Review Packet Name Date Hour Unit 3 - Molecular Biology & Genetics - Review Packet True / False Questions - Indicate True or False for the following statements. 1. Eye color, hair color and the shape of your ears can

More information

Dropping Your Genes. A Simulation of Meiosis and Fertilization and An Introduction to Probability

Dropping Your Genes. A Simulation of Meiosis and Fertilization and An Introduction to Probability Dropping Your Genes A Simulation of Meiosis and Fertilization and An Introduction to To fully understand Mendelian genetics (and, eventually, population genetics), you need to understand certain aspects

More information

Cell Division: the process of copying and dividing entire cells The cell grows, prepares for division, and then divides to form new daughter cells.

Cell Division: the process of copying and dividing entire cells The cell grows, prepares for division, and then divides to form new daughter cells. Mitosis & Meiosis SC.912.L.16.17 Compare and contrast mitosis and meiosis and relate to the processes of sexual and asexual reproduction and their consequences for genetic variation. 1. Students will describe

More information

A new simple method for improving QTL mapping under selective genotyping

A new simple method for improving QTL mapping under selective genotyping Genetics: Early Online, published on September 22, 2014 as 10.1534/genetics.114.168385 A new simple method for improving QTL mapping under selective genotyping Hsin-I Lee a, Hsiang-An Ho a and Chen-Hung

More information

Multiple interval mapping for ordinal traits

Multiple interval mapping for ordinal traits Genetics: Published Articles Ahead of Print, published on April 3, 2006 as 10.1534/genetics.105.054619 Multiple interval mapping for ordinal traits Jian Li,,1, Shengchu Wang and Zhao-Bang Zeng,, Bioinformatics

More information

Causal Model Selection Hypothesis Tests in Systems Genetics

Causal Model Selection Hypothesis Tests in Systems Genetics 1 Causal Model Selection Hypothesis Tests in Systems Genetics Elias Chaibub Neto and Brian S Yandell SISG 2012 July 13, 2012 2 Correlation and Causation The old view of cause and effect... could only fail;

More information

Q Expected Coverage Achievement Merit Excellence. Punnett square completed with correct gametes and F2.

Q Expected Coverage Achievement Merit Excellence. Punnett square completed with correct gametes and F2. NCEA Level 2 Biology (91157) 2018 page 1 of 6 Assessment Schedule 2018 Biology: Demonstrate understanding of genetic variation and change (91157) Evidence Q Expected Coverage Achievement Merit Excellence

More information

Linkage Mapping. Reading: Mather K (1951) The measurement of linkage in heredity. 2nd Ed. John Wiley and Sons, New York. Chapters 5 and 6.

Linkage Mapping. Reading: Mather K (1951) The measurement of linkage in heredity. 2nd Ed. John Wiley and Sons, New York. Chapters 5 and 6. Linkage Mapping Reading: Mather K (1951) The measurement of linkage in heredity. 2nd Ed. John Wiley and Sons, New York. Chapters 5 and 6. Genetic maps The relative positions of genes on a chromosome can

More information

Advance Organizer. Topic: Mendelian Genetics and Meiosis

Advance Organizer. Topic: Mendelian Genetics and Meiosis Name: Row Unit 8 - Chapter 11 - Mendelian Genetics and Meiosis Advance Organizer Topic: Mendelian Genetics and Meiosis 1. Objectives (What should I be able to do?) a. Summarize the outcomes of Gregor Mendel's

More information

THE data in the QTL mapping study are usually composed. A New Simple Method for Improving QTL Mapping Under Selective Genotyping INVESTIGATION

THE data in the QTL mapping study are usually composed. A New Simple Method for Improving QTL Mapping Under Selective Genotyping INVESTIGATION INVESTIGATION A New Simple Method for Improving QTL Mapping Under Selective Genotyping Hsin-I Lee,* Hsiang-An Ho,* and Chen-Hung Kao*,,1 *Institute of Statistical Science, Academia Sinica, Taipei 11529,

More information

CHAPTER 23 THE EVOLUTIONS OF POPULATIONS. Section C: Genetic Variation, the Substrate for Natural Selection

CHAPTER 23 THE EVOLUTIONS OF POPULATIONS. Section C: Genetic Variation, the Substrate for Natural Selection CHAPTER 23 THE EVOLUTIONS OF POPULATIONS Section C: Genetic Variation, the Substrate for Natural Selection 1. Genetic variation occurs within and between populations 2. Mutation and sexual recombination

More information

Gibbs Sampling in Linear Models #2

Gibbs Sampling in Linear Models #2 Gibbs Sampling in Linear Models #2 Econ 690 Purdue University Outline 1 Linear Regression Model with a Changepoint Example with Temperature Data 2 The Seemingly Unrelated Regressions Model 3 Gibbs sampling

More information

The Chromosomal Basis of Inheritance

The Chromosomal Basis of Inheritance The Chromosomal Basis of Inheritance Mitosis and meiosis were first described in the late 800s. The chromosome theory of inheritance states: Mendelian genes have specific loci (positions) on chromosomes.

More information
