Hierarchical Generalized Linear Models for Multiple QTL Mapping

Genetics: Published Articles Ahead of Print, published on January 12, 2009

Nengjun Yi 1,* and Samprit Banerjee 2

1 Department of Biostatistics, Section on Statistical Genetics, University of Alabama at Birmingham, Birmingham, Alabama 35294

2 Division of Biostatistics and Epidemiology, Department of Public Health, Weill Medical College of Cornell University, New York, NY

* Corresponding author: Nengjun Yi, Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL. E-mail: nyi@ms.soph.uab.edu

Key words: Bayesian methods, Generalized linear models, Interactions, Quantitative trait loci, Shrinkage

Running title: Generalized QTL models

ABSTRACT

We develop hierarchical generalized linear models and computationally efficient algorithms for genome-wide analysis of quantitative trait loci (QTL) for various types of phenotypes in experimental crosses. The proposed models can fit a large number of effects, including covariates, main effects of numerous loci, and gene-gene (epistasis) and gene-environment (G×E) interactions. The key to the approach is the use of continuous prior distributions on coefficients that favor sparseness in the fitted model and facilitate computation. We develop a fast expectation-maximization (EM) algorithm to fit models by estimating the posterior modes of coefficients. We incorporate our algorithm into the iteratively weighted least squares for classical generalized linear models as implemented in the package R. We propose a model search strategy to build a parsimonious model. Our method takes advantage of the special correlation structure in QTL data. Simulation studies demonstrate reasonable power to detect true effects while controlling the rate of false positives. We illustrate the method with three real datasets and compare it to existing methods for multiple QTL mapping. Our method has been implemented in our freely available package R/qtlbim, providing a valuable addition to our previous Markov chain Monte Carlo (MCMC) approach.

INTRODUCTION

Most complex traits are influenced by interacting networks of multiple quantitative trait loci (QTL) and environmental factors (Carlborg and Haley 2004). The goal of QTL mapping is to infer which genomic loci are strongly associated with the complex trait, and to estimate the genetic effects of these loci, i.e., main effects, gene-gene (epistasis) and gene-environment (G×E) interactions. Owing to the multi-locus nature of complex traits, it is desirable to analyze multiple loci simultaneously rather than one (or a few) locus at a time. However, QTL mapping studies usually genotype hundreds or thousands of genomic loci (markers), leading to numerous variables and a huge number of possible models, and the dependence among genotypes on a chromosome results in many correlated variables. Therefore, mapping multiple QTL requires sophisticated methods that can handle problems with high-dimensional correlated variables.

The popular approaches to mapping multiple QTL are some form of variable selection. Such techniques involve identifying a subset of all possible genetic effects (a multiple QTL model) that best explains the phenotypic variation. Classical variable selection methods use forward or stepwise search procedures and selection criteria such as BIC (the Bayesian information criterion) or modified versions to find a multiple QTL model (Kao et al. 1999; Broman and Speed 2002; Bogdan et al. 2004; Baierl et al. 2006). Bayesian methods proceed by setting up a likelihood function for the observed data and prior distributions on unobserved quantities. Two types of prior distributions have been suggested for multiple QTL mapping. The first assumes a two-component mixture distribution for each genetic effect, typically a normal distribution with known or unknown variance and a point mass at zero. This discrete prior allows each effect to have

positive probability of dropping out of the model (Yi 2004; Yi and Shriner 2008). The second formulation takes continuous prior distributions for genetic effects that favor a sparse structure, with many of the effects having values close to zero and few with large values (Meuwissen et al. 2001; Xu 2003; Yi and Xu 2008). These Bayesian models are computed using Markov chain Monte Carlo (MCMC) algorithms to sample from the posterior distribution. Owing to the recent development of MCMC algorithms and associated computer software, Bayesian methods have become increasingly popular in QTL mapping (Yi et al. 2005; Yi et al. 2007; Yandell et al. 2007a, b; Yi and Shriner 2008). The Bayesian MCMC approaches can provide comprehensive information, but they are computationally intensive in interacting QTL analysis. Further, the existing methods rely primarily upon normal linear models.

We present a unified methodology for mapping multiple QTL based upon the generalized linear model framework. The generalized linear model is a main tool for routine statistical analysis, enjoying a body of well-developed theory, algorithms and software, and including various models as special cases. However, classical generalized linear models cannot simultaneously handle many correlated variables. We propose a Bayesian generalized linear model approach, placing continuous prior distributions on genetic effects, to deal with the high-dimensional problem. Although various priors can be used for this purpose (Griffin and Brown 2007; Park and Casella 2008), we consider the well-known Student-t distribution and recommend a choice of hyperparameters that yields a sharp mode at zero and induces a sparse model. Our unified framework incorporates the advantages of generalized linear models and hierarchical modeling into multiple QTL mapping, allowing us to deal with various types of

continuous and discrete phenotypes and to simultaneously analyze covariates, main effects of numerous loci, and epistatic and G×E interactions.

In principle, we can fit our hierarchical generalized linear model by adapting the MCMC algorithms of Yi and Xu (2008) for normal regressions. However, it is desirable to have a quick calculation that estimates the posterior mode of genetic effects (a point estimate that maximizes the posterior distribution). Although the posterior mode estimate provides less information than a fully Bayesian MCMC analysis, such analyses are similar to classical QTL practice, easy to understand, and can be valuable for identifying significant variables. Various methods have been proposed for this purpose, but they are built upon special models and optimization algorithms (Tibshirani 1996; Figueiredo 2003; Kiiveri 2003; Efron et al. 2004; Genkin et al. 2007) and cannot be easily set up for multiple QTL mapping. Recently, some mode-finding methods for mapping multiple QTL have been suggested (Zhang and Xu 2005; Xu 2007). However, these methods have dealt with only continuous traits and have not been effectively implemented in computer software.

We present a fast expectation-maximization (EM) algorithm to fit our hierarchical generalized linear models by estimating the posterior modes of coefficients. We develop our algorithm by making use of the two-level formulation of the Student-t density and expressing the prior information on coefficients as additional data points. This strategy allows us to incorporate our algorithm into the usual iteratively weighted least squares for generalized linear models as implemented in R (R Project 2006). This implementation takes advantage of well-developed existing algorithms and software, extending available tools for fitting generalized linear models to multiple QTL mapping.

Although in principle our hierarchical model and algorithm can fit all possible effects in a single model, we do not recommend this; rather, we propose a novel model search method that builds a parsimonious model by seeking significant genetic effects. This procedure provides a flexible and convenient way to deal with large-scale QTL data, and also has the advantage of accommodating the correlation structure in QTL data. Simulation studies demonstrate that our method provides reasonable power to detect true effects and can control the rate of extraneous effects. Real data analyses show that the proposed approach compares favorably to existing sophisticated methods.

METHODS

Generalized linear models of multiple QTL: We consider experimental crosses derived from two inbred lines (for example, F2, backcross and recombinant inbred lines). Observed data in QTL studies consist of phenotypic values of a complex trait, genetic markers across the genome, and/or some relevant environmental factors (covariates). The marker data include the genotypes and the genomic positions of the markers. The goal of QTL mapping is to identify genomic loci that are associated with the phenotype and to estimate their genetic effects. For simplicity, we describe our methods by treating the observed markers as potential QTL. The methods can be extended to consider loci between the observed markers.

We use the generalized linear model framework to analyze various types of complex traits. A generalized linear model consists of three components: the linear predictor, the link function, and the distribution of the outcome variable (McCullagh and Nelder 1989; Gelman et al. 2003). We simultaneously fit environmental effects, main

effects of markers, epistatic effects, and gene-environment (G×E) interactions. Therefore, the linear predictor can be expressed as

$$\eta = \beta_0 + X_E \beta_E + X_G \beta_G + X_{GG} \beta_{GG} + X_{GE} \beta_{GE} \equiv X\beta, \qquad (1)$$

where $\beta_0$ is the intercept, $\beta_E$ and $\beta_G$ represent the vectors of environmental effects and all possible main effects, respectively, $\beta_{GG}$ and $\beta_{GE}$ represent the vectors of all possible epistatic and G×E interactions, respectively, and $X_E$, $X_G$, $X_{GG}$, and $X_{GE}$ are the corresponding design matrices of effect predictors. We describe the construction of these design matrices in the next subsection.

The invertible link function $g$ relates the linear predictor $\eta$ to the mean of the outcome variable $y$:

$$E(y \mid X) = g^{-1}(\eta) = g^{-1}(X\beta), \qquad (2)$$

where the vector $y = (y_1, y_2, \ldots, y_n)^T$ represents the phenotypic values. The distribution of $y$ depends on the linear predictor $X\beta$ and possibly also on a dispersion (or variance) parameter $\phi$ (some models do not require $\phi$), and can be expressed as

$$p(y \mid X\beta, \phi) = \prod_{i=1}^{n} p(y_i \mid X_i\beta, \phi), \qquad (3)$$

where $X_i\beta = \eta_i$ is the linear predictor for the $i$th individual. Generalized linear modeling provides a unified framework for statistical analysis; by choosing appropriate link functions and data distributions, some commonly used models, e.g., normal linear (Gaussian), logistic, probit and Poisson regressions, become special cases.
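As a concrete illustration of components (1)-(3), the sketch below builds a small linear predictor and evaluates its mean under two links. This is a minimal sketch, not the paper's implementation; all design values, coefficients, and dimensions are invented for illustration.

```python
import math
import numpy as np

# Toy data (invented): 8 individuals, one covariate, three backcross markers.
rng = np.random.default_rng(1)
n = 8
X_E = rng.normal(size=(n, 1))                 # one environmental covariate
X_G = rng.choice([-0.5, 0.5], size=(n, 3))    # backcross main-effect contrasts
X = np.hstack([np.ones((n, 1)), X_E, X_G])    # intercept column + predictors

beta = np.array([0.2, 0.5, 1.0, 0.0, -0.8])   # beta_0, beta_E, beta_G (made up)
eta = X @ beta                                # linear predictor, eq. (1)

# Gaussian model: identity link, so E(y | X) = eta.
mu_gaussian = eta

# Probit model for a binary trait: g^{-1} is the standard normal CDF, eq. (2).
Phi = np.vectorize(lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0))))
mu_probit = Phi(eta)                          # P(y = 1 | X) per individual
```

Swapping the link and outcome distribution while keeping the same linear predictor is exactly what lets one framework cover Gaussian, probit, logistic, and Poisson traits.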

Construction of the design matrices: There is some discussion in the QTL mapping literature on the best way to construct contrasts for genetic predictors (Zeng et al. 2005). Among these, due to its orthogonality property, the Cockerham genetic model is widely used in QTL mapping and is applied in this study. For a backcross design, the Cockerham model defines the values of the main-effect contrast as -0.5 or 0.5 for the two genotypes at any locus. For an intercross (F2) design, there are two types of main effects, called additive and dominance effects; the original Cockerham model defines the additive contrast as -1, 0 and 1 for the three genotypes and the dominance contrast as -0.5 and 0.5 for homozygotes and heterozygotes, respectively. To put the additive and dominance contrasts on a common scale, however, we rescale the additive contrast as $-2^{-1/2}$, 0 and $2^{-1/2}$. For loci with missing genotypic values, we replace the values of the above contrasts by their conditional expectations given the observed marker data (Haley and Knott 1992). The conditional expectations can be calculated using the multipoint method (Jiang and Zeng 1997).

For each covariate, we transform the raw values to have a mean of 0 and a standard deviation of 0.5, by subtracting the mean and dividing by 2 × sd (twice the standard deviation of the raw values) (Gelman et al. 2008). This transformation standardizes all the covariates to a common scale, the scale of all the genetic main effects described above. Finally, the epistatic contrasts $X_{GG}$ and the G×E contrasts $X_{GE}$ are constructed by multiplying the two corresponding main-effect contrasts.

Prior and posterior distributions: Mapping QTL is equivalent to estimating the coefficients $\beta$ in the above model, including environmental effects, main effects of markers, and epistatic and G×E interactions. The number of coefficients in the model can be

large, and the predictors are highly correlated, precluding the use of classical maximum likelihood methods. This problem can be solved by setting up a prior distribution on $\beta$ that captures the notion that most of the components of $\beta$ are likely to be zero or at least negligible; such prior distributions are often referred to as shrinkage priors.

For the intercept $\beta_0$ and the dispersion parameter $\phi$ (if present), we use the uniform priors $p(\beta_0) \propto 1$ and $p(\phi) \propto 1$. We assume independent Student-t priors $t_\nu(0, s^2)$ on the coefficients $\beta_j$, with $\nu$ and $s$ chosen to give each coefficient a high probability of being near zero while still allowing for occasional large effects. We are motivated to use the t distribution because it allows for robust inference, shrinkage estimation and easy computation (Gelman et al. 2003). There is no easy way to estimate the coefficients directly using the t densities, but it is straightforward to deal with the two-level formulation of the t distribution (Gelman et al. 2003; Gelman et al. 2008). The t distribution $t_\nu(0, s^2)$ can be expressed as a mixture of normal distributions with mean 0 and variance distributed as scaled inverse-$\chi^2$:

$$\beta_j \mid \tau_j^2 \sim N(0, \tau_j^2), \quad \tau_j^2 \sim \mathrm{Inv}\text{-}\chi^2(\nu, s^2), \quad j = 1, 2, \ldots, J, \qquad (4)$$

where $J$ is the number of coefficients, and the hyperparameters $\nu > 0$ and $s^2 > 0$ represent the degrees of freedom and the scale of the distribution, respectively. The priors (4) introduce coefficient-specific variances, resulting in distinct shrinkage for different coefficients. The variances $\tau_j^2$ are not the parameters of interest, but they are useful intermediate quantities that allow easy and efficient computation. The hyperparameters $\nu$ and $s^2$ affect the amount of shrinkage in the coefficient estimates and

should be carefully chosen. We shall discuss our choice of $\nu$ and $s^2$ after describing our computational algorithm. Our algorithm highlights how these hyperparameters affect the estimates of the parameters.

With the above prior distributions, we can express the log-posterior distribution of the parameters $(\beta, \phi, \tau^2)$ as

$$\log p(\beta, \phi, \tau^2 \mid y, X) \propto \sum_{i=1}^{n} \log p(y_i \mid X_i\beta, \phi) + \sum_{j=1}^{J} \log p(\beta_j \mid \tau_j^2) + \sum_{j=1}^{J} \log p(\tau_j^2 \mid \nu, s^2)$$

$$\propto \sum_{i=1}^{n} \log p(y_i \mid X_i\beta, \phi) - \sum_{j=1}^{J} \left( \log \tau_j + \frac{\beta_j^2}{2\tau_j^2} \right) + \sum_{j=1}^{J} \left( \frac{\nu}{2}\log s^2 - \left(\frac{\nu}{2}+1\right)\log \tau_j^2 - \frac{\nu s^2}{2\tau_j^2} \right), \qquad (5)$$

where $\tau^2 = (\tau_1^2, \ldots, \tau_J^2)$.

EM algorithm for computing the posterior mode: Our hierarchical generalized linear model can be fit using MCMC algorithms applied to the joint posterior distribution $p(\beta, \phi, \tau^2 \mid y, X)$ (Gelman et al. 2003). However, it is desirable to have a faster computation that estimates the posterior mode of $\beta$ and $\phi$ by maximizing the marginal posterior $p(\beta, \phi \mid y, X)$ rather than the full posterior distribution. The two-level hierarchical model described above allows us to obtain the posterior mode by using an EM algorithm (Gelman et al. 2003; Gelman et al. 2008). Following Gelman et al. (2008), we develop our approach built upon the standard algorithm for classical generalized linear models.

The analysis of classical generalized linear models uses iterative weighted linear regression to obtain approximate maximum likelihood estimates for the parameters (as implemented in the R routine glm, for example). The basic method is to approximate the generalized linear model by a normal linear model and then apply the algorithm for

normal linear models to estimate the parameters $(\beta, \phi)$ (Gelman et al. 2003; Gelman et al. 2008). At each iteration, the algorithm proceeds by calculating pseudo-data $z_i$ and pseudo-variances $\sigma_i^2$ for each observation $i$ based on the current estimates of the parameters $(\hat\beta, \hat\phi)$, approximating the generalized likelihood $p(y_i \mid X_i\beta, \phi)$ by the weighted normal likelihood $N(z_i \mid X_i\beta, \sigma_i^2)$, and then updating the parameters $(\beta, \phi)$ by weighted linear regression. The iteration proceeds until convergence. The pseudo-data $z_i$ and pseudo-variances $\sigma_i^2$ are calculated by

$$z_i = \hat\eta_i - \frac{L'(y_i \mid \hat\eta_i, \hat\phi)}{L''(y_i \mid \hat\eta_i, \hat\phi)}, \quad \sigma_i^2 = -\frac{1}{L''(y_i \mid \hat\eta_i, \hat\phi)}, \qquad (6)$$

where $\hat\eta_i = X_i\hat\beta$, $L(y_i \mid \hat\eta_i, \hat\phi) = \log p(y_i \mid X_i\hat\beta, \hat\phi)$, $L'(y_i \mid \eta_i, \phi) = dL(y_i \mid \eta_i, \phi)/d\eta_i$, $L''(y_i \mid \eta_i, \phi) = d^2L(y_i \mid \eta_i, \phi)/d\eta_i^2$, and $\hat\beta$ and $\hat\phi$ are the current estimates of $\beta$ and $\phi$, respectively.

Given the variances $\tau_j^2$, the prior information on $\beta_j$ (i.e., $\beta_j \mid \tau_j^2 \sim N(0, \tau_j^2)$) can be included in the classical generalized linear model as $J$ additional observations of value 0, with predictor matrix $I_J$ and residual variance matrix $\Sigma_\beta = \mathrm{diag}(\tau_1^2, \ldots, \tau_J^2)$, where $I_J$ is the $J \times J$ identity matrix (Gelman et al. 2003; Gelman et al. 2008). Thus, the generalized linear model with the normal prior on $\beta$ can be approximated by

$$z_* \mid X_*, \beta, \phi \sim N(X_*\beta, \Sigma_*), \qquad (7)$$

where $z_* = \binom{z}{0}_{(n+J) \times 1}$, $X_* = \binom{X}{I_J}_{(n+J) \times J}$, and $\Sigma_* = \begin{pmatrix} \Sigma_z & 0 \\ 0 & \Sigma_\beta \end{pmatrix}$ with $\Sigma_z = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_n^2)$. Thus, we can use the standard iteratively weighted least squares computation to estimate $(\beta, \phi)$

by performing a weighted regression of the augmented data vector $z_*$ on the augmented predictor matrix $X_*$ with augmented weight vector $w_* = (\sigma_1^{-2}, \ldots, \sigma_n^{-2}, \tau_1^{-2}, \ldots, \tau_J^{-2})$.

To implement the EM algorithm, we treat the unknown variances $\tau^2 = (\tau_1^2, \ldots, \tau_J^2)$ as missing data, and average over them by replacing the terms involving both $\tau^2$ and $\beta$ in the joint posterior (5) by their expected values conditional on the current estimate $(\hat\beta, \hat\phi)$. Thus, we must evaluate the conditional expectation of $1/\tau_j^2$ for each $j$. It can easily be shown that the conditional posterior distribution of $\tau_j^2$ is $\mathrm{Inv}\text{-}\chi^2\!\left(1+\nu, \frac{\nu s^2 + \hat\beta_j^2}{1+\nu}\right)$, and thus the conditional expectation of $1/\tau_j^2$ equals $\frac{1+\nu}{\nu s^2 + \hat\beta_j^2}$. Therefore, the E-step of our EM algorithm is equivalent to replacing the variances by

$$\hat\tau_j^2 = \frac{\nu s^2 + \hat\beta_j^2}{1+\nu}, \quad j = 1, \ldots, J, \qquad (8)$$

where $\hat\beta_j$ is the current estimate of $\beta_j$.

We have incorporated our algorithm into the standard function glm in R for fitting generalized linear models. We altered the glm function by inserting the steps for calculating the augmented data (7) and updating the variances (8) into its iterative procedure. Our algorithm is initialized by setting each $\tau_j^2$ to a small value (say $\tau_j^2 = 0.1$) and $(\beta, \phi)$ to the starting value provided by the glm function. At each step of our algorithm, we average over the variances $\tau^2 = (\tau_1^2, \ldots, \tau_J^2)$ and then

update $(\beta, \phi)$ by maximizing the posterior density (5). In summary, our algorithm proceeds as follows:

1. Based on the current value of $(\beta, \phi)$, calculate the pseudo-data $z$ and pseudo-variances $\sigma^2$ using (6);

2. E-step: replace each variance $\tau_j^2$ by its conditional expectation using (8);

3. M-step: perform the weighted least squares regression based on the augmented data (7) to obtain the estimate $(\hat\beta, \hat\phi)$;

4. Repeat steps 1-3 until convergence.

We apply the criterion in the glm function to assess convergence. In practice our algorithm converges rapidly. At convergence of the algorithm, we obtain all the outputs produced by the glm function, including the latest estimates $\hat\beta_j$, their standard deviations and p-values (for testing $\beta_j = 0$), and some additional values (e.g., for the variances). The outputs are automatically stored as a standard glm object.

Choosing the hyperparameters: Before discussing our choice of the hyperparameters $\nu$ and $s^2$, we discuss the improper Jeffreys prior on the variances, $\pi(\tau_j^2) \propto 1/\tau_j^2$, which is equivalent to the uniform distribution on $\log \tau_j^2$ and is the limiting case of the scaled inverse-$\chi^2$ density with $\nu = 0$. The Jeffreys prior yields an improper posterior on each $\beta_j$ with an infinite mode at zero. As $\beta_j$ increases, however, a second finite local mode appears away from zero (e.g., Griffin and Brown 2007; ter Braak et al. 2005). This property of a model with the Jeffreys prior may make it problematic to fully explore the posterior, but it still formally allows the analysis of posterior modes (see Griffin

and Brown 2007). Much work has shown that the Jeffreys prior leads to good performance in MCMC and mode-finding analyses, yielding strong shrinkage for very small effects but weak shrinkage for large effects (Xu 2003; Bae and Mallick 2004; Figueiredo 2003; Kiiveri 2003; Griffin and Brown 2007).

We choose $\nu$ and $s^2$ to retain the good features of the Jeffreys prior while avoiding the drawback of impropriety of both prior and posterior. This can be achieved by setting $\nu$ and $s^2$ small enough that the variances $\hat\tau_j^2$ obtained from equation (8) are close to zero for near-zero coefficients ($\hat\beta_j^2 \approx 0$), but approximately equal $\hat\beta_j^2$ for large coefficients ($\hat\beta_j^2 \gg \nu s^2$). We set $\nu = 0.01$ and $s^2 = 10^{-4}$ in our applications. This prior works well and stably, and we find that small changes from these default values (e.g., $\nu$ from 0 to 0.1, $s^2$ from 0 to $10^{-3}$) do not affect the results. The small positive values of $\nu$ and $s^2$ yield a proper prior on $\beta_j$ that is sharply peaked near zero but approximately uniform away from zero, and thus lead to strong sparseness in the fitted model while allowing for occasional large coefficients. Finally, our scaled inverse-$\chi^2$ prior includes uniform priors for $\beta_j$ as a limiting case ($\nu, s^2 \to \infty$), and thus we can perform no shrinkage for certain variables (e.g., relevant covariates).

Model search strategy: In principle the above hierarchical models can handle any number of variables (see the augmented regression (7), where the number of data points (n + J) is larger than the number of variables J), and thus can simultaneously estimate genetic effects of a large number of markers. For biological interpretability, however, we prefer using only a subset of the genetic effects in the model. Furthermore, including all possible effects requires large memory and intensive computation.
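The EM iteration and default hyperparameters described above can be made concrete with a small sketch. Below, a Gaussian trait with identity link is simulated and fit with the dispersion fixed at $\phi = 1$ (a simplifying assumption made here to keep the example short; the paper estimates $\phi$). With the identity link the pseudo-data of (6) reduce to $y$ itself, so each M-step is a single weighted least-squares solve on the augmented data (7), and the E-step is the variance update (8). All data and dimensions below are invented for illustration.

```python
import numpy as np

# Toy data (invented): 100 individuals, 10 predictors, two true nonzero effects.
rng = np.random.default_rng(0)
n, J = 100, 10
X = rng.normal(size=(n, J))
beta_true = np.zeros(J)
beta_true[1], beta_true[6] = 2.0, -1.5       # two "QTL"; the rest are null
y = X @ beta_true + rng.normal(scale=0.3, size=n)

nu, s2 = 0.01, 1e-4                          # default hyperparameters from the text
tau2 = np.full(J, 0.1)                       # initial coefficient variances
beta = np.zeros(J)
for _ in range(200):
    # M-step: weighted least squares on the augmented data of eq. (7).
    # The J extra "observations" of 0 with variances tau2 act as
    # coefficient-specific ridge penalties 1/tau2.
    A = X.T @ X + np.diag(1.0 / tau2)
    beta_new = np.linalg.solve(A, X.T @ y)
    # E-step: replace each variance by its conditional expectation, eq. (8).
    tau2 = (nu * s2 + beta_new**2) / (1.0 + nu)
    if np.max(np.abs(beta_new - beta)) < 1e-10:
        beta = beta_new
        break
    beta = beta_new
# beta now approximates the posterior mode: the two true effects survive
# while the eight null coefficients are shrunk essentially to zero.
```

With $\phi = 1$ the augmented weighted regression reduces to solving $(X^TX + \mathrm{diag}(1/\tau^2))\beta = X^Ty$; for a non-Gaussian link one would recompute the pseudo-data $z$ and pseudo-variances $\sigma^2$ from (6) at every iteration before this solve.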

Therefore, we propose a model search strategy to build a parsimonious model, beginning with a model with no genetic effects but any relevant covariates, then gradually adding different types of genetic effects to the model and fitting the new model. It is expected that many variables have no effect on the phenotype and should be pruned from the model. We set a very small threshold value $t_1$ (say $10^{-8}$) and delete genetic effects satisfying $|\hat\beta_j| < t_1$ from the model. Our approach differs from classical forward stepwise methods by simultaneously adding many correlated variables and deleting many near-zero variables. As described below, our search strategy can take advantage of the special correlation structure in QTL data: genotypes on the same chromosome are correlated. In summary, our approach proceeds as follows:

1. Searching for main effects: for each chromosome c (c = 1, 2, ...), simultaneously add all possible main effects for markers on chromosome c to the current model, fit the model and then delete some main effects based on the above-mentioned criterion;

2. Searching for epistatic effects among the included main effects: simultaneously add all possible epistatic interactions among the included main effects to the current model, fit the model and delete some epistatic effects;

3. Searching for epistatic effects between the excluded and the included main effects: for each chromosome c (c = 1, 2, ...), simultaneously add all possible epistatic interactions between the remaining main effects (not included in the current model) on chromosome c and the included main effects to the current model, fit the model and delete some epistatic effects added in this step;

4. Searching for interactions between the covariate(s) and main effects: for each chromosome c (c = 1, 2, ...), simultaneously add all possible interactions between the covariate(s) and all possible main effects on chromosome c to the current model, fit the model and delete some G×E interactions added in this step.

After these steps, we obtain a final model with the preset covariates and the genetic effects satisfying $|\hat\beta_j| > t_1$. With our shrinkage prior, the majority of genetic effects are shrunk to zero, and thus the final model includes only a limited number of variables. As shown in our simulation studies, our model search procedure picks up true effects with reasonably high probability, but can occasionally include some extraneous effects. However, the inclusion of spurious effects has little influence on detecting and estimating true or strong effects, and the estimates of spurious effects (and the corresponding p-values) are smaller (larger) than those of the true effects, allowing us to easily identify strong causal effects. Although not necessary, it could be useful to have a formal procedure to remove extraneous effects from the final model. A simple way is to use the p-values for testing $\beta_j = 0$ (equivalent to the likelihood ratio test) in the final model; we set a small threshold value $t_2$ (say 0.001) and remove genetic effects with p-values larger than $t_2$ from the model. As described later, a small value of $t_2$ can greatly reduce the rate of false positives. However, the p-values should be interpreted with caution, as they are based on the final model.

Implementation in R/qtlbim: The freely available package R/qtlbim is an extensible and interactive environment for multiple QTL mapping in experimental crosses (Yandell et al. 2007), built on top of the widely used R/qtl (Broman et al. 2003). The previous version of R/qtlbim performs fully Bayesian

analyses via MCMC algorithms to simultaneously handle covariates, main effects, and epistatic and gene-environment interactions for continuous, binary and ordinal traits. We have incorporated the proposed method into R/qtlbim by creating new functions that implement our model fitting algorithm and model search procedure. The outputs for the final model are stored as a standard glm object and thus can be summarized and displayed conventionally.

SIMULATION STUDIES

Simulation design: Although our method can handle various types of phenotypes and models, we here evaluate its performance for a normal continuous trait and a binary trait. We expect that the basic conclusions hold for other cases. In our simulations, we generated the continuous phenotypes from different multiple QTL models and obtained the binary trait by transforming the continuous values into two categories with proportions of 40% and 60%, respectively. We generated a genome with 19 chromosomes, each of length 100 cM. Five percent of the marker genotypes were assumed to be randomly missing.

The first experiment was an F2 intercross composed of 400 progenies and having 11 equally spaced markers (placed 10 cM apart) on each chromosome. We considered a model with one binary covariate and five QTL with only main effects. The covariate effect explained ~4% of the variance of the continuous phenotype. Table 1a presents the simulated positions of the five QTL, their additive or dominance effects, and the approximate heritabilities (the proportion of the phenotypic variation explained by an effect). As can be seen, two of the five simulated QTL (Q2 and Q4) were positioned at markers, and the others

were in marker intervals. The second simulation was the same as the first except that each chromosome had 51 equally spaced markers (placed 2 cM apart). Therefore, these two simulations had 209 and 969 total markers, respectively.

Our third experiment was a backcross (BC) with 400 progenies and 11 equally spaced markers on each chromosome. We simulated one binary covariate and eight QTL with complex interactions; among the eight simulated QTL, five had main effects while the other three had no main effects but interacted with other QTL or with the covariate. The covariate effect explained ~2% of the variance of the continuous phenotype. Table 1b presents the positions of the simulated QTL, the genetic effects (main effects, epistasis and G×E), and the approximate heritabilities. The fourth simulation was the same as the third except that each chromosome had 51 equally spaced markers.

(insert Table 1 here)

Analysis and evaluation: Under each scenario, we carried out 200 simulation replicates. Each of the F2 simulated data sets was analyzed using two models; the first included a covariate term and the main effects of all markers, and the second allowed not only the covariate and main effects but also epistatic and G×E interactions. The aim of the second analysis was to examine whether we detect interactions when they are not present. For each of the simulated BC data sets, we fit a model with a covariate effect, main effects of all markers, and epistatic and G×E interactions. For all analyses, the continuous and binary traits were analyzed using Gaussian and probit models, respectively. We tried several values for the hyperparameters, $\nu$ from 0 to 0.1 and $s^2$ from 0 to $10^{-3}$, and obtained similar results. We display the results with $(\nu, s^2) = (0.01, 10^{-4})$. To

investigate the influence of the two thresholds $t_1$ and $t_2$ on our model search method (see the last section), we analyzed the data sets using several values ($t_1$ = $10^{-4}$, $10^{-6}$, $10^{-8}$, $10^{-10}$, $10^{-12}$, and $t_2$ = 1, 0.01, 0.005, 0.001). We evaluated the performance of our method by examining the frequency with which each effect was included in the final model, true positives (the frequency of true effects included in the final model), and false positives (the frequency of false effects included in the final model) over the 200 simulation replicates. The inclusion frequency of an effect represents the empirical power to detect the effect. A detected main or G×E effect (epistatic effect) is considered correct if the associated position(s) is within 10 cM of any true QTL.

Results: The threshold $t_1$ controls the minimal size of genetic effects allowed to enter the models. We found that the final models picked up almost the same sets of effects for all the above values of $t_1$ in all the simulation experiments, and thus the choice of $t_1$ had little influence on our model search method. Therefore, we only present the results for $t_1$ = $10^{-8}$ below.

Figure 1 displays the genetic effects included in the final models with frequencies greater than 0.01 for the four values of $t_2$ in the first simulation experiment. We can see that no false effect (defined above) was included in the final models with appreciable frequency. For the continuous trait, the simulated main effects located at markers, (1@60)d and (2@50)a, were detected with frequencies close to 1. As expected, the analyses of the binary trait had lower power than those of the continuous trait, especially for the two linked QTL on chromosome 2. For simulated effects located within marker intervals, our analyses picked up the corresponding effects of the flanking markers in the final models; for example, for the simulated effect (2@32)a, the final models included an effect at one

or the other flanking marker. For these simulated effects, the true positives, which roughly equal the sum of the inclusion frequencies of the corresponding flanking-marker effects, were also high. Although no interactions were present, the analyses allowing epistatic and G×E effects did not reduce the power for detecting the main effects and did not include any interactions with high frequency. Finally, it can be seen that a smaller threshold $t_2$ slightly reduced the power. However, this threshold controls the rate of false positives, as will be illustrated shortly.

(insert Figure 1 here)

Figure 2 shows the results of the analyses for the second experiment. The models including or excluding epistatic and G×E interactions again produced similar results. Our second experiment simulated 51 markers evenly distributed on each chromosome, resulting in much higher correlation among the variables. Consequently, the simulated effects were included with lower frequencies than in the first experiment. However, we were still able to identify the true effects with reasonably high power and true positives for both the continuous and the binary traits. For example, the first simulated effect (1@12)a was identified with power of ~60% (40%) and true positives of ~90% (80%) for the continuous (binary) trait.

(insert Figure 2 here)

The results for the BC simulations are illustrated in Figure 3. For the experiment with 11 markers on each chromosome, no false effect was included in the final models with appreciable frequency. For the continuous trait, all the simulated main effects, epistatic and G×E interactions were identified in almost all 200 replicates. For all the simulated main effects and epistatic interactions, the analyses of the binary trait had

21 lower power; the reduction of the power was larger for the main effects of the two relatively tightly linked QTL on chromosome and epistasis. For the fourth experiment where 51 markers were evenly placed on each chromosome, most of the simulated effects were included in the final models with frequencies greater than 60% for the continuous trait. Higher correlation among the variables resulted in the method to occasionally pick up effects with associated positions near to the true effects. Again, the true positives for all the simulated effects were high. Finally, it was observed that the threshold t had little influence on the power of detecting the true effects. (insert Figure 3 here) Figure 4 shows the false positives for the BC experiments under different p-values t. The other experiments gave similar results. With a large value t, the models included extraneous effects with high frequencies in the simulation replicates. Data with dense marker maps generally produced lower false positives. False effects were removed from the models rapidly as the threshold t decreased, indicating that false effects were associated with larger p-values. For all the cases, we observed that the estimates of the simulated effects were close to the corresponding true values. But the estimates of spurious effects were much smaller. (insert Figure 4 here) REAL DATA EXAMPLES We illustrate our method with three real QTL mapping experiments. These data sets have been extensively analyzed previously, and thus serve as good examples to compare our method with the existing methods and to illustrate the advantages of our 1

22 method in terms of statistical and computational efficiencies and modeling flexibility. We used several values for ν (between 0 and 0.1) and s (between 0 and 10-3 ), and obtained identical results. We displayed the results by setting the thresholds t 1 = 10-8 and t = 1. Barley dataset: The barley dataset (Tinker et al. 1996) was reanalyzed for demonstration. The data were collected from a doubled haploid population that contained n = 145 lines; each was grown in 5 different environments. The phenotype analyzed was the average value of kernel weight across environments. A total of 17 markers spanning seven chromosomes were genotyped. The data contained ~5% missing marker genotypes. We used a Gaussian model to fit main effects of all markers and their epistatic interactions. The total number of effects was 819 (17 main effects and 17 16/ = 8001 epistatic effects). Our analysis took ~0.05 minutes on a P4 computer. We obtained a final model with eight main effects, three on chromosome 1, two on chromosome, one on chromosomes 3, 4 and 7, respectively. We did not detect epistatic effects. The estimates of the main effects are displayed in the top panel of Figure 5. The p-values for these effects were 1.3e-07, 3.3e-15, 1.4e-03, 4.3e-05, 6.9e-04, 9.e-05, 3.3e-03, and 3.1e- 4, respectively. If we set t = 0.001, the effects 1@1.1 and 4@145.5 are removed from the final model. For comparison, we reanalyzed the data using the MCMC method (Yi et al. 005; Yi et al. 007a) implemented in R/qtlbim (Yandell et al. 007). Our MCMC analysis also did not detect epistatic effects, but multiple main effects with positions overlapping or close to those in the proposed method analysis (see the bottom panel of Figure 5). (insert Figure 5 here)
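The size of this regression problem is easy to verify. Below is an illustrative Python sketch (ours, not the authors' R/qtlbim implementation) that assembles a main-effect plus pairwise-epistasis design matrix for a doubled haploid panel with genotypes coded +1/-1; with 145 lines and 127 markers it yields the 8001 epistatic columns noted above. The random genotypes are a stand-in for the real marker data.

```python
import numpy as np

def epistatic_design(G):
    """G: (n_lines, n_markers) matrix of +1/-1 genotype codes.
    Returns [main-effect columns | all pairwise product columns]."""
    n, m = G.shape
    cols = [G[:, j] for j in range(m)]        # m main effects
    for j in range(m):
        for k in range(j + 1, m):             # m*(m-1)/2 epistatic terms
            cols.append(G[:, j] * G[:, k])
    return np.column_stack(cols)

rng = np.random.default_rng(0)
G = rng.choice([-1, 1], size=(145, 127))      # 145 lines, 127 markers
X = epistatic_design(G)
print(X.shape)  # (145, 8128): 127 main + 8001 epistatic effects
```

Even with thousands of columns the matrix is small (145 × 8128 doubles is under 10 MB), which is why an EM mode-finding pass over all effects is feasible here.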

This dataset has been extensively analyzed using different methods, including mode-finding algorithms (Xu 2007; Xu and Jia 2007) and shrinkage MCMC methods (Xu 2003; Yi and Xu 2008). Xu (2007) (see also Xu and Jia 2007) analyzed this dataset using an empirical Bayes method and the LASSO of Tibshirani (1996). Both analyses took ~2 minutes and obtained results similar to ours. The shrinkage MCMC algorithms also gave similar results, but took a few hours (Xu 2003; Yi and Xu 2008). Therefore, our method performs comparably with these alternative sophisticated techniques, and is computationally faster.

Listeria monocytogenes dataset: As a second study, we illustrate the application of our method by reanalyzing the mouse data of Boyartchuk et al. (2001). This dataset consisted of 116 female mice from an intercross (F2) between the BALB/cByJ and C57BL/6ByJ strains. Each animal was genotyped at 133 markers spanning 20 chromosomes. The phenotype of interest was the time to death following infection with Listeria monocytogenes. The animals that died from infection had a short mean survival time, while roughly 30% of the mice recovered from the infection and survived past the 264-hour time point. Several methods have been applied to this dataset, including the two-part model (Broman 2003), a parametric proportional hazards model (Diao et al. 2004), and a unified semiparametric framework (Jin et al. 2007). These analyses detected significant QTL on chromosomes 1, 5 and 13, and a suggestive QTL on chromosome 15. We applied our method to the data using the two-part model of Broman (2003) rather than developing a new model for this type of phenotype. The two-part model decomposes the phenotype into a binary trait, indicating whether the subject survived, and a continuous trait representing time to death for those dying within 264 hours (Broman 2003).
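The two-part decomposition can be sketched in a few lines. This is an illustrative Python version (the variable names and example survival times are hypothetical, not from the dataset): the spike phenotype is split into a binary survival indicator and the log survival times of the animals that died before the censoring point.

```python
import numpy as np

def two_part(time_to_death, censor_at):
    """Split a spike phenotype into a binary trait (reached the censoring
    point alive) and a continuous trait (log survival time of the deaths)."""
    t = np.asarray(time_to_death, dtype=float)
    survived = t >= censor_at                 # 1 = recovered from infection
    log_time = np.log(t[~survived])           # continuous part, deaths only
    return survived.astype(int), log_time

# Hypothetical survival times in hours; 264 marks the end of follow-up.
y = [30.0, 45.5, 264.0, 264.0, 52.0]
b, z = two_part(y, censor_at=264.0)
print(b)        # [0 0 1 1 0]
print(len(z))   # 3 log survival times for the deaths
```

The two parts are then mapped separately, which is why the paper fits a probit model to the binary trait and a Gaussian model to the log survival times.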

We used a probit regression model for the binary phenotype and a Gaussian model for the logarithm of the continuous phenotype, and fit a model with the main effects of all markers and their epistatic interactions. The total number of effects was 35378 (133 additive effects, 133 dominance effects, and 35112 epistatic effects). Our analyses took ~0.02 and 0.03 minutes on a P4 computer for the continuous and the binary traits, respectively. Our results were consistent with the previous ones (Figure 6). For the continuous trait, the final model included two additive effects, on chromosomes 1 and 15, respectively, and one dominance effect on chromosome 13. The p-values for these effects were 3.8e-05, 1.8e-02 and 1.6e-03, respectively. For the binary trait, the final model included two additive effects, on chromosomes 5 and 13, with p-values of 1.7e-07 and 1.9e-04, respectively. We did not detect epistatic effects.

(insert Figure 6 here)

We also analyzed the data using the Bayesian MCMC method (Yi et al. 2005; Yi et al. 2007a, b; Yandell et al. 2007). The MCMC algorithm detected additive effects on chromosomes 1 and 15 and a dominance effect on chromosome 13 for the continuous trait, and additive effects on chromosomes 5 and 13 for the binary trait (not shown here). The positions of these QTL were close to the markers detected by the proposed method. In addition to these main effects, the fully Bayesian analysis found evidence of epistatic effects: for the continuous trait, QTL on chromosomes 1, 5 and 9 showed evidence of epistasis, and for the binary trait, chromosomes 6 and 13 harbored epistatic QTL. However, these epistatic effects were not strong. The MCMC analyses took ~8 and 10 minutes for the continuous and the binary traits, respectively.
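The size of the full F2 epistatic model follows from simple combinatorics, which the following sketch (ours, for illustration) makes explicit: each of the 133 markers carries an additive and a dominance effect, and each marker pair carries four epistatic terms (additive-by-additive, additive-by-dominance, dominance-by-additive, and dominance-by-dominance).

```python
def f2_effect_count(m):
    """Number of effects in a full F2 model with m markers:
    2 per marker (additive + dominance) and 4 per marker pair."""
    main = 2 * m
    epistatic = 4 * (m * (m - 1) // 2)
    return main, epistatic, main + epistatic

main, epi, total = f2_effect_count(133)
print(main, epi, total)  # 266 35112 35378
```

This arithmetic makes clear why genome-wide epistatic scans are dominated by the quadratic number of interaction terms rather than by the main effects.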

Obesity dataset: We also applied our method to an obesity dataset of backcross mice. This dataset has been extensively analyzed using Bayesian MCMC methods (Yi et al. 2005; Yi et al. 2006; Yi et al. 2007a). The cross was produced from two highly divergent strains: M16i, consisting of large and moderately obese mice, and CAST/Ei, a wild strain of small mice with lean bodies. CAST/Ei males were mated to M16i females, and F1 males were backcrossed to M16i females, resulting in 421 mice (213 males and 208 females) reaching 12 wk of age. All mice were genotyped for 92 markers located on 19 autosomal chromosomes. We analyzed a continuous trait, Fat, the sum of the right gonadal and hindlimb subcutaneous fat pads. We included sex and body weight at the age of 12 weeks as covariates, and permitted the inclusion of epistatic and gene-sex interactions in a Gaussian regression model. The total number of genetic effects was 4370, including 92 main effects, 4186 epistatic effects and 92 gene-sex interactions.

Using the MCMC algorithm (which took ~15 minutes), Yi et al. (2007a) detected evidence of QTL on nine chromosomes (1, 2, 6, 8, 13, 14, 15, 18, and 19). These QTL showed a complex network of interactions. The strongest main-effect QTL was found on chromosome 2, and QTL on chromosomes 8, 13, 14, 15, and 19 also had main effects. Strong epistatic interactions between chromosomes 15 and 19 and between chromosomes 1 and 18 were detected, and evidence of epistasis was found between a few other chromosome pairs. The QTL on chromosome 2 was found to interact with the covariate sex.

Our hierarchical generalized linear model analysis took ~0.15 minutes and produced a final model including 12 main effects, five epistatic effects, and two gene-sex interactions. The estimates of the genetic effects and their p-values are displayed in Figure 7. Our results were basically consistent with the previous fully Bayesian analysis; all strong effects were detected by both analyses. All the main-effect QTL detected in Yi et al. (2007a) were also identified in our reanalysis, with positions in the regions detected previously. The p-values for most of the main effects were very small. The main effects of two linked markers on chromosome 2 were simultaneously included in the model; these were the largest among the detected main effects. The epistatic interactions between chromosomes 15 and 19 and between chromosomes 1 and 18 were the strongest, and were also detected previously. Main effects on chromosomes 1 and 18 were not included in the final model, indicating that our method can detect QTL with weak main effects but strong epistasis. Two gene-sex interactions, (2@73):SEX and (15@1):SEX, were included in the final model with p-values of 1.7e-03 and 4.4e-03, respectively.

(insert Figure 7 here)

DISCUSSION

We propose a new statistical framework for genome-wide multiple QTL mapping in experimental crosses. Our method is developed within the generalized linear model framework and thus can deal with the various phenotypes that standard generalized linear models can analyze. The proposed models assume sparseness-inducing priors and thus can simultaneously fit a large number of effects, including environmental effects, main effects of numerous genomic loci, and epistatic and G×E interactions. By locating the modes of the posterior distribution via the EM algorithm, we are able to quickly identify significant effects. Extensive simulation studies and real-data analyses have shown good performance.

We assign independent normal distributions with unknown variances to the coefficients, and devote our attention to the scaled inverse-χ² distribution for the variance parameters. Our prior includes the noninformative Jeffreys prior as a special case, which has the form of a scaled inverse-χ² density with 0 degrees of freedom. As shown in previous studies (Figueiredo 2003; Kiiveri 2003; Xu 2003; Bae and Mallick 2004; Griffin and Brown 2007) and in our experimental results, the Jeffreys prior usually yields good performance, inducing strong sparseness in the fitted model. However, we recommend choosing the hyperparameters in the scaled inverse-χ² to yield a proper prior on coefficients that is sharply peaked near zero but approximately uniform away from zero. Our prior induces strong shrinkage for near-zero effects but weak shrinkage for large effects. Our method can in principle be extended to models with other prior distributions, for example, exponential distributions with fixed hyperparameters. However, it would be appealing to treat the hyperparameters as unknowns and estimate them from the data, so that the model can shrink the coefficients as much as can be justified by the data. Yi and Xu (2008) developed MCMC algorithms to jointly estimate all hyperparameters and model parameters, and found that such procedures usually outperform those with fixed hyperparameters. Future research will extend the proposed EM algorithm to estimate hyperparameters.

Gelman et al. (2008) proposed an independent Cauchy distribution with mean 0 and scale 2.5 (i.e., β|τ² ~ N(µ, τ²) and τ² ~ Inv-χ²(1, 2.5²)) for routine data analysis, and implemented a procedure to fit generalized linear models with this prior by incorporating an EM algorithm into the usual iteratively weighted least squares. However, this Cauchy prior cannot shrink small effects to zero and thus cannot be used for our high-dimensional models. Our algorithm is similar to that of Gelman et al. (2008), but differs in regarding the variances rather than the coefficients as missing data. The trick of the algorithm is to express the prior information on the coefficients as additional data points, allowing us to take advantage of the existing iteratively weighted least squares algorithm. Various other efficient methods have been proposed recently, but they are built upon special algorithms and are thus more difficult to implement (e.g., Tibshirani 1996; Efron et al. 2004; Genkin et al. 2007; Xu 2007).

Rather than fitting all possible effects in one large model, we propose a novel model search strategy to build a parsimonious model. Our method begins with a model containing no genetic effects, and then adds different types of genetic effects into the model and deletes very small effects. Unlike traditional variable selection approaches, we simultaneously add or delete many variables and take care of the correlation among the variables. Kiiveri (2003) and Zhang and Xu (2005) adopted a similar criterion for pruning effects from the model, but used a backward selection procedure. Our search strategy avoids the need for large memory and the computational problems that arise when the number of effects is huge, and thus facilitates genome-wide interacting QTL analysis.

The fully Bayesian approach, which employs MCMC simulation to generate posterior samples from the joint posterior distribution of all unknowns, can provide more comprehensive posterior inferences (Gelman et al. 2003). In QTL mapping, fully Bayesian methods can easily treat missing genotypes and positions of QTL as unknowns and use model averaging to induce robustness and stability (Yi et al. 2005; Yi et al. 2007a, b; Yi and Shriner 2008). We describe our method by allowing only observed markers as potential QTL and replacing missing marker genotypes by their conditional expectations. We can extend our method to detect QTL within marker intervals by inserting loci within each marker interval (pseudo-markers) as possible positions of QTL. How the number and spacing of pseudo-markers affect the performance of the method deserves further investigation. The proposed method has computational advantages over the fully Bayesian MCMC methods. High-order interactions could be implemented using our approach, but this would substantially increase the model space to be explored and thus may need further investigation. Future research will extend our method to genome-wide association analysis and gene expression QTL (eQTL) mapping. Fast and efficient algorithms are an essential feature for the practical analysis of these more complicated cases.

ACKNOWLEDGMENTS

This work was supported by National Institutes of Health (NIH) grant R01 GM to NY and CTSC grant UL1-RR024996 to SB.

LITERATURE CITED

Baierl, A., M. Bogdan, F. Frommlet and A. Futschik, 2006 On locating multiple interacting quantitative trait loci in intercross designs. Genetics 173.

Bogdan, M., J. K. Ghosh and R. W. Doerge, 2004 Modifying the Schwarz Bayesian information criterion to locate multiple interacting quantitative trait loci. Genetics 167.

Broman, K. W., and T. P. Speed, 2002 A model selection approach for the identification of quantitative trait loci in experimental crosses. J. R. Statist. Soc. B 64.

Broman, K. W., H. Wu, Ś. Sen and G. A. Churchill, 2003 R/qtl: QTL mapping in experimental crosses. Bioinformatics 19.

Broman, K. W., 2003 Mapping quantitative trait loci in the case of a spike in the phenotype distribution. Genetics 163.

Bae, K., and B. K. Mallick, 2004 Gene selection using a two-level hierarchical Bayesian model. Bioinformatics 20.

Boyartchuk, V. L., K. W. Broman, R. E. Mosher, S. E. F. D'Orazio, M. N. Starnbach and W. F. Dietrich, 2001 Multigenic control of Listeria monocytogenes susceptibility in mice. Nature Genetics 27.

Carlborg, Ö., and C. Haley, 2004 Epistasis: too often neglected in complex trait studies? Nat. Rev. Genet. 5.

Diao, G., D. Y. Lin and F. Zou, 2004 Mapping quantitative trait loci with censored observations. Genetics 168.

Efron, B., T. Hastie, I. Johnstone and R. Tibshirani, 2004 Least angle regression. Annals of Statistics 32.

Figueiredo, M. A. T., 2003 Adaptive sparseness for supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 25.

Gelman, A., J. Carlin, H. Stern and D. Rubin, 2003 Bayesian Data Analysis. Chapman and Hall, London.

Gelman, A., A. Jakulin, M. G. Pittau and Y. S. Su, 2008 A weakly informative default prior distribution for logistic and other regression models. Annals of Applied Statistics (in press).

Genkin, A., D. D. Lewis and D. Madigan, 2007 Large-scale Bayesian logistic regression for text categorization. Technometrics 49.

Griffin, J. E., and P. J. Brown, 2007 Bayesian adaptive lassos with non-convex penalization. Technical report, University of Warwick.

Haley, C. S., and S. A. Knott, 1992 A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69.

Jiang, C., and Z.-B. Zeng, 1997 Mapping quantitative trait loci with dominant and missing markers in various crosses from two inbred lines. Genetica 101.

Jin, C., J. P. Fine and B. S. Yandell, 2007 A unified semiparametric framework for quantitative trait loci analysis, with application to spike phenotypes. Journal of the American Statistical Association 102.

Kao, C. H., Z.-B. Zeng and R. D. Teasdale, 1999 Multiple interval mapping for quantitative trait loci. Genetics 152.

Copyright © 2009 by the Genetics Society of America. DOI: 10.1534/genetics.108.099556. Hierarchical Generalized Linear Models for Multiple Quantitative Trait Locus Mapping. Nengjun Yi and Samprit Banerjee.

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California Texts in Statistical Science Bayesian Ideas and Data Analysis An Introduction for Scientists and Statisticians Ronald Christensen University of New Mexico Albuquerque, New Mexico Wesley Johnson University

More information

Partitioning Genetic Variance

Partitioning Genetic Variance PSYC 510: Partitioning Genetic Variance (09/17/03) 1 Partitioning Genetic Variance Here, mathematical models are developed for the computation of different types of genetic variance. Several substantive

More information

In Search of Desirable Compounds

In Search of Desirable Compounds In Search of Desirable Compounds Adrijo Chakraborty University of Georgia Email: adrijoc@uga.edu Abhyuday Mandal University of Georgia Email: amandal@stat.uga.edu Kjell Johnson Arbor Analytics, LLC Email:

More information

Bayesian Grouped Horseshoe Regression with Application to Additive Models

Bayesian Grouped Horseshoe Regression with Application to Additive Models Bayesian Grouped Horseshoe Regression with Application to Additive Models Zemei Xu 1,2, Daniel F. Schmidt 1, Enes Makalic 1, Guoqi Qian 2, John L. Hopper 1 1 Centre for Epidemiology and Biostatistics,

More information

Chris Fraley and Daniel Percival. August 22, 2008, revised May 14, 2010

Chris Fraley and Daniel Percival. August 22, 2008, revised May 14, 2010 Model-Averaged l 1 Regularization using Markov Chain Monte Carlo Model Composition Technical Report No. 541 Department of Statistics, University of Washington Chris Fraley and Daniel Percival August 22,

More information

Nonparametric Bayesian Methods (Gaussian Processes)

Nonparametric Bayesian Methods (Gaussian Processes) [70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent

More information

Regularization Parameter Selection for a Bayesian Multi-Level Group Lasso Regression Model with Application to Imaging Genomics

Regularization Parameter Selection for a Bayesian Multi-Level Group Lasso Regression Model with Application to Imaging Genomics Regularization Parameter Selection for a Bayesian Multi-Level Group Lasso Regression Model with Application to Imaging Genomics arxiv:1603.08163v1 [stat.ml] 7 Mar 016 Farouk S. Nathoo, Keelin Greenlaw,

More information

Variational Principal Components

Variational Principal Components Variational Principal Components Christopher M. Bishop Microsoft Research 7 J. J. Thomson Avenue, Cambridge, CB3 0FB, U.K. cmbishop@microsoft.com http://research.microsoft.com/ cmbishop In Proceedings

More information

Model Selection for Multiple QTL

Model Selection for Multiple QTL Model Selection for Multiple TL 1. reality of multiple TL 3-8. selecting a class of TL models 9-15 3. comparing TL models 16-4 TL model selection criteria issues of detecting epistasis 4. simulations and

More information

A general mixed model approach for spatio-temporal regression data

A general mixed model approach for spatio-temporal regression data A general mixed model approach for spatio-temporal regression data Thomas Kneib, Ludwig Fahrmeir & Stefan Lang Department of Statistics, Ludwig-Maximilians-University Munich 1. Spatio-temporal regression

More information

A new simple method for improving QTL mapping under selective genotyping

A new simple method for improving QTL mapping under selective genotyping Genetics: Early Online, published on September 22, 2014 as 10.1534/genetics.114.168385 A new simple method for improving QTL mapping under selective genotyping Hsin-I Lee a, Hsiang-An Ho a and Chen-Hung

More information

Bayesian model selection: methodology, computation and applications

Bayesian model selection: methodology, computation and applications Bayesian model selection: methodology, computation and applications David Nott Department of Statistics and Applied Probability National University of Singapore Statistical Genomics Summer School Program

More information

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and

More information

A Note on Bayesian Inference After Multiple Imputation

A Note on Bayesian Inference After Multiple Imputation A Note on Bayesian Inference After Multiple Imputation Xiang Zhou and Jerome P. Reiter Abstract This article is aimed at practitioners who plan to use Bayesian inference on multiplyimputed datasets in

More information

STA414/2104 Statistical Methods for Machine Learning II

STA414/2104 Statistical Methods for Machine Learning II STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements

More information

Standard Errors & Confidence Intervals. N(0, I( β) 1 ), I( β) = [ 2 l(β, φ; y) β i β β= β j

Standard Errors & Confidence Intervals. N(0, I( β) 1 ), I( β) = [ 2 l(β, φ; y) β i β β= β j Standard Errors & Confidence Intervals β β asy N(0, I( β) 1 ), where I( β) = [ 2 l(β, φ; y) ] β i β β= β j We can obtain asymptotic 100(1 α)% confidence intervals for β j using: β j ± Z 1 α/2 se( β j )

More information

25 : Graphical induced structured input/output models

25 : Graphical induced structured input/output models 10-708: Probabilistic Graphical Models 10-708, Spring 2013 25 : Graphical induced structured input/output models Lecturer: Eric P. Xing Scribes: Meghana Kshirsagar (mkshirsa), Yiwen Chen (yiwenche) 1 Graph

More information

Multiple interval mapping for ordinal traits

Multiple interval mapping for ordinal traits Genetics: Published Articles Ahead of Print, published on April 3, 2006 as 10.1534/genetics.105.054619 Multiple interval mapping for ordinal traits Jian Li,,1, Shengchu Wang and Zhao-Bang Zeng,, Bioinformatics

More information

The Bayesian Approach to Multi-equation Econometric Model Estimation

The Bayesian Approach to Multi-equation Econometric Model Estimation Journal of Statistical and Econometric Methods, vol.3, no.1, 2014, 85-96 ISSN: 2241-0384 (print), 2241-0376 (online) Scienpress Ltd, 2014 The Bayesian Approach to Multi-equation Econometric Model Estimation

More information

Generalized Linear Models for Non-Normal Data

Generalized Linear Models for Non-Normal Data Generalized Linear Models for Non-Normal Data Today s Class: 3 parts of a generalized model Models for binary outcomes Complications for generalized multivariate or multilevel models SPLH 861: Lecture

More information

Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model

Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model Centre for Molecular, Environmental, Genetic & Analytic (MEGA) Epidemiology School of Population

More information

arxiv: v1 [stat.me] 5 Aug 2015

arxiv: v1 [stat.me] 5 Aug 2015 Scalable Bayesian Kernel Models with Variable Selection Lorin Crawford, Kris C. Wood, and Sayan Mukherjee arxiv:1508.01217v1 [stat.me] 5 Aug 2015 Summary Nonlinear kernels are used extensively in regression

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate

More information

Generalized Elastic Net Regression

Generalized Elastic Net Regression Abstract Generalized Elastic Net Regression Geoffroy MOURET Jean-Jules BRAULT Vahid PARTOVINIA This work presents a variation of the elastic net penalization method. We propose applying a combined l 1

More information

Machine Learning Linear Classification. Prof. Matteo Matteucci

Machine Learning Linear Classification. Prof. Matteo Matteucci Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

Problems with Penalised Maximum Likelihood and Jeffrey s Priors to Account For Separation in Large Datasets with Rare Events

Problems with Penalised Maximum Likelihood and Jeffrey s Priors to Account For Separation in Large Datasets with Rare Events Problems with Penalised Maximum Likelihood and Jeffrey s Priors to Account For Separation in Large Datasets with Rare Events Liam F. McGrath September 15, 215 Abstract When separation is a problem in binary

More information

Computationally Efficient Estimation of Multilevel High-Dimensional Latent Variable Models

Computationally Efficient Estimation of Multilevel High-Dimensional Latent Variable Models Computationally Efficient Estimation of Multilevel High-Dimensional Latent Variable Models Tihomir Asparouhov 1, Bengt Muthen 2 Muthen & Muthen 1 UCLA 2 Abstract Multilevel analysis often leads to modeling

More information

Empirical Bayesian LASSO-logistic regression for multiple binary trait locus mapping

Empirical Bayesian LASSO-logistic regression for multiple binary trait locus mapping Huang et al. BMC Genetics 2013, 14:5 METHODOLOGY ARTICLE Open Access Empirical Bayesian LASSO-logistic regression for multiple binary trait locus mapping Anhui Huang 1, Shizhong Xu 2 and Xiaodong Cai 1*

More information

Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Information #

Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Information # Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Details of PRF Methodology In the Poisson Random Field PRF) model, it is assumed that non-synonymous mutations at a given gene are either

More information

Linear Model Selection and Regularization

Linear Model Selection and Regularization Linear Model Selection and Regularization Recall the linear model Y = β 0 + β 1 X 1 + + β p X p + ɛ. In the lectures that follow, we consider some approaches for extending the linear model framework. In

More information

Bayesian variable selection via. Penalized credible regions. Brian Reich, NCSU. Joint work with. Howard Bondell and Ander Wilson

Bayesian variable selection via. Penalized credible regions. Brian Reich, NCSU. Joint work with. Howard Bondell and Ander Wilson Bayesian variable selection via penalized credible regions Brian Reich, NC State Joint work with Howard Bondell and Ander Wilson Brian Reich, NCSU Penalized credible regions 1 Motivation big p, small n

More information

Regression Model In The Analysis Of Micro Array Data-Gene Expression Detection

Regression Model In The Analysis Of Micro Array Data-Gene Expression Detection Jamal Fathima.J.I 1 and P.Venkatesan 1. Research Scholar -Department of statistics National Institute For Research In Tuberculosis, Indian Council For Medical Research,Chennai,India,.Department of statistics

More information

Bayes methods for categorical data. April 25, 2017

Bayes methods for categorical data. April 25, 2017 Bayes methods for categorical data April 25, 2017 Motivation for joint probability models Increasing interest in high-dimensional data in broad applications Focus may be on prediction, variable selection,

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

PACKAGE LMest FOR LATENT MARKOV ANALYSIS

PACKAGE LMest FOR LATENT MARKOV ANALYSIS PACKAGE LMest FOR LATENT MARKOV ANALYSIS OF LONGITUDINAL CATEGORICAL DATA Francesco Bartolucci 1, Silvia Pandofi 1, and Fulvia Pennoni 2 1 Department of Economics, University of Perugia (e-mail: francesco.bartolucci@unipg.it,

More information

Prerequisite: STATS 7 or STATS 8 or AP90 or (STATS 120A and STATS 120B and STATS 120C). AP90 with a minimum score of 3

Prerequisite: STATS 7 or STATS 8 or AP90 or (STATS 120A and STATS 120B and STATS 120C). AP90 with a minimum score of 3 University of California, Irvine 2017-2018 1 Statistics (STATS) Courses STATS 5. Seminar in Data Science. 1 Unit. An introduction to the field of Data Science; intended for entering freshman and transfers.

More information

Mixture Cure Model with an Application to Interval Mapping of Quantitative Trait Loci

Mixture Cure Model with an Application to Interval Mapping of Quantitative Trait Loci Mixture Cure Model with an Application to Interval Mapping of Quantitative Trait Loci Abstract. When censored time-to-event data are used to map quantitative trait loci (QTL), the existence of nonsusceptible

More information

Causal Model Selection Hypothesis Tests in Systems Genetics

Causal Model Selection Hypothesis Tests in Systems Genetics 1 Causal Model Selection Hypothesis Tests in Systems Genetics Elias Chaibub Neto and Brian S Yandell SISG 2012 July 13, 2012 2 Correlation and Causation The old view of cause and effect... could only fail;

More information

LOGISTIC REGRESSION Joseph M. Hilbe

LOGISTIC REGRESSION Joseph M. Hilbe LOGISTIC REGRESSION Joseph M. Hilbe Arizona State University Logistic regression is the most common method used to model binary response data. When the response is binary, it typically takes the form of

More information

Penalized Loss functions for Bayesian Model Choice

Penalized Loss functions for Bayesian Model Choice Penalized Loss functions for Bayesian Model Choice Martyn International Agency for Research on Cancer Lyon, France 13 November 2009 The pure approach For a Bayesian purist, all uncertainty is represented

More information

Quantile POD for Hit-Miss Data

Quantile POD for Hit-Miss Data Quantile POD for Hit-Miss Data Yew-Meng Koh a and William Q. Meeker a a Center for Nondestructive Evaluation, Department of Statistics, Iowa State niversity, Ames, Iowa 50010 Abstract. Probability of detection

More information

what is Bayes theorem? posterior = likelihood * prior / C

what is Bayes theorem? posterior = likelihood * prior / C who was Bayes? Reverend Thomas Bayes (70-76) part-time mathematician buried in Bunhill Cemetary, Moongate, London famous paper in 763 Phil Trans Roy Soc London was Bayes the first with this idea? (Laplace?)

More information

A default prior distribution for logistic and other regression models

A default prior distribution for logistic and other regression models A default prior distribution for logistic and other regression models Andrew Gelman, Aleks Jakulin, Maria Grazia Pittau, and Yu-Sung Su September 12, 2006 Abstract We propose a new prior distribution for

More information

Linear Regression (1/1/17)

Linear Regression (1/1/17) STA613/CBB540: Statistical methods in computational biology Linear Regression (1/1/17) Lecturer: Barbara Engelhardt Scribe: Ethan Hada 1. Linear regression 1.1. Linear regression basics. Linear regression

More information

THE data in the QTL mapping study are usually composed. A New Simple Method for Improving QTL Mapping Under Selective Genotyping INVESTIGATION

THE data in the QTL mapping study are usually composed. A New Simple Method for Improving QTL Mapping Under Selective Genotyping INVESTIGATION INVESTIGATION A New Simple Method for Improving QTL Mapping Under Selective Genotyping Hsin-I Lee,* Hsiang-An Ho,* and Chen-Hung Kao*,,1 *Institute of Statistical Science, Academia Sinica, Taipei 11529,

More information

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A. Linero and M. Daniels UF, UT-Austin SRC 2014, Galveston, TX 1 Background 2 Working model

More information

Causal Graphical Models in Systems Genetics

Causal Graphical Models in Systems Genetics 1 Causal Graphical Models in Systems Genetics 2013 Network Analysis Short Course - UCLA Human Genetics Elias Chaibub Neto and Brian S Yandell July 17, 2013 Motivation and basic concepts 2 3 Motivation

More information

A Review of Bayesian Variable Selection Methods: What, How and Which

A Review of Bayesian Variable Selection Methods: What, How and Which Bayesian Analysis (2009) 4, Number 1, pp. 85 118 A Review of Bayesian Variable Selection Methods: What, How and Which R.B. O Hara and M. J. Sillanpää Abstract. The selection of variables in regression

More information

Quantile-based permutation thresholds for QTL hotspot analysis: a tutorial

Quantile-based permutation thresholds for QTL hotspot analysis: a tutorial Quantile-based permutation thresholds for QTL hotspot analysis: a tutorial Elias Chaibub Neto and Brian S Yandell September 18, 2013 1 Motivation QTL hotspots, groups of traits co-mapping to the same genomic

More information

Part 8: GLMs and Hierarchical LMs and GLMs

Part 8: GLMs and Hierarchical LMs and GLMs Part 8: GLMs and Hierarchical LMs and GLMs 1 Example: Song sparrow reproductive success Arcese et al., (1992) provide data on a sample from a population of 52 female song sparrows studied over the course

More information