Bayesian Multilocus Association Models for Prediction and Mapping of Genome-Wide Data
DOCTORAL THESIS IN ANIMAL SCIENCE

Hanni P. Kärkkäinen

ACADEMIC DISSERTATION

To be presented, with the permission of the Faculty of Agriculture and Forestry of the University of Helsinki, for public criticism in the Lecture Hall of Koetilantie 5, Helsinki, on November 15th 2013, at 12 o'clock noon.

Helsinki 2013

DEPARTMENT OF AGRICULTURAL SCIENCES PUBLICATIONS 20
Supervisor: Professor Mikko J. Sillanpää, University of Oulu, Department of Mathematical Sciences, Department of Biology and Biocenter Oulu, P.O. Box 3000, FIN Oulu, Finland

Co-supervisor: Adjunct Professor Jarmo Juga, University of Helsinki, Department of Agricultural Sciences, P.O. Box 27, FIN Helsinki, Finland

Reviewers: Professor Daniel Sorensen, Aarhus University, Department of Molecular Biology and Genetics, P.O. Box 50, DK 8830 Tjele, Denmark; Professor Otso Ovaskainen, University of Helsinki, Department of Biosciences, P.O. Box 56, FIN Helsinki, Finland

Opponent: Senior Researcher Luc Janss, Aarhus University, Department of Molecular Biology and Genetics, P.O. Box 50, DK 8830 Tjele, Denmark

ISBN (Paperback)
ISBN (PDF)
Electronic publication at
Unigrafia, Helsinki 2013
List of original publications

The following original papers are referred to in the text by their Roman numerals.

(I) Kärkkäinen, H. P. and M. J. Sillanpää, 2012 Back to basics for Bayesian model building in genomic selection. Genetics 191:
(II) Kärkkäinen, H. P. and M. J. Sillanpää, 2012 Robustness of Bayesian multilocus association models to cryptic relatedness. Ann. Hum. Genet. 76: Corrected by: Corrigendum. Ann. Hum. Genet. 77:275.
(III) Kärkkäinen, H. P. and M. J. Sillanpää, 2013 Fast genomic predictions via Bayesian G-BLUP and multilocus models of threshold traits including censored Gaussian data. G3 (Bethesda) 3:

The publications have been reprinted with the kind permission of their copyright holders. The contributions of the authors HPK and MJS can be detailed as follows:

I Both authors were involved in the conception and design of the study. HPK derived the fully conditional posterior distributions and the GEM algorithm, implemented the algorithm with Matlab, performed the data analyses and drafted the manuscript. Both authors participated in the interpretation of results and critically revised the manuscript.

II Both authors were involved in the conception and design of the study. HPK derived the fully conditional posterior distributions and the GEM algorithm, implemented the algorithm with Matlab, performed the data analyses and drafted the manuscript. Both authors participated in the interpretation of results and critically revised the manuscript.

III Both authors were involved in the conception and design of the study. HPK derived the fully conditional posterior distributions and the GEM algorithm, implemented the algorithm with Matlab, performed the data analyses and drafted the manuscript. Both authors participated in the interpretation of results and critically revised the manuscript.
Contents

1 Introduction
2 Objectives of the study
3 Hierarchical Bayesian model
  3.1 Gaussian likelihood
  3.2 Shrinkage inducing priors
    Hierarchical formulation of the prior densities
  3.3 Sub-models
    Polygenic component
    Indicator
    Hyperprior
  Student's t vs. Laplace prior
  Bayesian LASSO and its extensions
  Bayesian G-BLUP
  Fully conditional posterior densities
  Threshold model
    Binary response
    Censored Gaussian response
4 Parameter estimation
  Generalized expectation-maximization
  Prior selection in MAP estimation
  GEM-algorithm for a MAP estimate
5 Example analyses
  Data sets
    XIII QTL-MAS Workshop data
    Real pig (Sus scrofa) data
    Human HapMap data
  Discrete and censored data
  Pre-selection of the markers
  Genomic prediction
  Association mapping
  Decision making
  Diagnostics
  Of speed and convergence
6 Conclusions
  Current status
  What have we learned?
  What's next?

Foreword

Genome-wide marker data is used in animal and plant breeding for computing genomic breeding values, and in human genetics for identifying disease susceptibility genes, predicting unobserved phenotypes and assessing disease risks. While the tremendous number of markers available for easy and cost-effective genotyping is an invaluable asset in genetic research and in animal and plant breeding, the ever increasing data sets are placing heavy demands on the statistical analysis methodology. The statistical methods proposed for genomic selection are based either on traditional best linear unbiased prediction (BLUP) or on different Bayesian multilocus association models. In human genetics the most prevalent approach is a single-SNP association model. This thesis consists of three original articles that seek further understanding of the behavior of the different Bayesian multilocus association models and of the instances in which different methods work best, look for connections between the different Bayesian models, and develop a Bayesian multilocus association model framework, along with an efficient parameter estimation machinery, that can be utilized in phenotype prediction, genomic breeding value estimation and quantitative trait locus (QTL) location and effect estimation from a variety of genome-wide data.

1 Introduction

The advent of single nucleotide polymorphisms (SNPs), in conjunction with the utilization of microarray technology in high-throughput genotyping, has exploded the availability of genome-wide sets of molecular markers. Whole-genome SNP chips are available for a wide range of species, including humans, agriculturally important plant and animal species, and genetic model organisms.
In human genetics the common goal of a genome-wide association (GWA) study is to detect disease susceptibility genes, predict unobserved phenotypes, and assess disease risks at the individual level (Lee
et al. 2008; de los Campos et al. 2010). Animal and plant breeders, on the other hand, are mainly interested in estimating genomic breeding values for genomic selection (Eggen 2012; Nakaya and Isobe 2012). Genomic selection refers to marker-assisted selection that uses genome-wide marker information directly in predicting genomic breeding values, rather than first identifying the causal genes (Meuwissen et al. 2001). The basic principle of genomic selection includes a set of individuals, known as the training set or the reference population, with phenotypic records and genotypic information from a whole-genome SNP array, and a statistical model explaining the connection between the marker genotypes and the phenotypic observations. The training set data is employed in estimating the effects of the SNP markers or genotypes on the phenotype, that is, the parameters of the model. The acquired information is then used in predicting the heritable part of the phenotype, i.e. the genomic breeding value, of new individuals (the prediction set) that have only genotypic information available. In animal and plant breeding, the most commonly used approach to predict genomic breeding values based on molecular markers is genomic best linear unbiased prediction or G-BLUP, a direct descendant of the pedigree-based best linear unbiased prediction (BLUP) model (Henderson 1975). G-BLUP employs the marker information in estimating genomic relationships between the individuals, and utilizes the marker-estimated genomic relationship matrix in a mixed model context (e.g. VanRaden 2008; Powell et al. 2010). A relatively recent but promising contender for the BLUP type of model in the genomic selection field is to apply simultaneous estimation and variable selection or regularization in multilocus association models (e.g. Meuwissen et al. 2001; Xu 2003).
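The G-BLUP principle described above can be illustrated with a small numerical sketch. The thesis implementations were in Matlab; this is an illustrative NumPy version with invented dimensions and a simulated trait, and the residual-to-genetic variance ratio is treated as known rather than estimated:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 200                                    # individuals, SNP markers

# Genotypes coded as rare-allele counts 0/1/2, then column-standardized
X = rng.integers(0, 3, size=(n, p)).astype(float)
X = (X - X.mean(axis=0)) / X.std(axis=0)

# VanRaden-style genomic relationship matrix estimated from the markers
G = X @ X.T / p

# Simulate a trait controlled by 5 QTL plus noise
beta_true = np.zeros(p)
beta_true[:5] = 1.0
g_true = X @ beta_true                            # true genomic values
y = g_true + rng.normal(0.0, 1.0, size=n)

# G-BLUP of genomic values: u_hat = G (G + lam*I)^-1 (y - ybar),
# where lam = sigma_e^2 / sigma_g^2 is assumed known here
lam = 1.0
u_hat = G @ np.linalg.solve(G + lam * np.eye(n), y - y.mean())
```

Note how the markers enter only through the relationship matrix G, so every marker contributes equally to the covariance structure; this is exactly the constant-impact assumption contrasted with multilocus models below.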
A multilocus association model uses the marker information directly by assigning different, possibly zero, effects to the marker alleles and quantifies the genomic breeding value of an individual as the sum of the marker effects. The advantage of a multilocus association model over G-BLUP is that the former allows the estimated effect size to vary over the set of markers, while the latter assumes a constant impact throughout the genome. In human genetics the genome-wide association methods are mainly used for mapping complex genetic traits. Association mapping utilizes the linkage disequilibrium (LD) between the markers and the causal loci in locating the actual causal genes by searching for associations between the markers and the phenotype. Population-based association analyses are more powerful than within-family analyses in detecting the genetic loci associated with the phenotype of interest. As a drawback, population-based studies often suffer from an inflated rate of false positives due to population stratification (i.e. model misspecification in the presence of hidden population structure) and cryptic relatedness (i.e. model misspecification in the presence of sample structure) (see Kang et al. 2010). For example, if two populations in Hardy-Weinberg proportions with divergent allele frequencies are combined, the combined population may have a large amount of linkage disequilibrium simply due to the combination (e.g. Ewens and Spielman 1995). Equivalently, the sample structure of the data may lead to allelic association caused by close relatedness between the individuals rather than true association between the marker and the trait. As e.g. PLINK (Purcell et al. 2007) omits the sample and population structure from the model, the artificial linkage disequilibrium is likely to cause false positive and negative signals for marker loci without any connection to the studied trait. Although some other heavily used association methods, including e.g. TASSEL (Bradbury et al. 2007), GenABEL (Aulchenko et al. 2007), EMMA (Kang et al. 2008) and EMMAX (Kang et al. 2010), provide a sample structure correction, they consider only one marker at a time, ignoring the possible effects of the other major loci. This is less than ideal in a genome-wide study of a complex trait, as such traits are assumed to be affected by a multitude of genes (Weeks and Lathrop 1995). The problem with a multilocus association model applied to a genome-wide data set is oversaturation: since the number of SNP markers is usually orders of magnitude greater than the number of individuals, there are far more explanatory variables than observations in the model.
This leads to a situation where some kind of selection or regularization of the predictors is required, either by selecting a subset of the variables that explains a large proportion of the variation, by using orthogonal or non-orthogonal combinations of the variables, or by shrinking the effects of the variables towards zero (e.g. Sillanpää and Bhattacharjee 2005; Hoggart et al. 2008; O'Hara and Sillanpää 2009; Wu et al. 2009; Ayers and Cordell 2010; Cho et al. 2010). The appeal of the shrunken estimates is that these methods keep the dimension constant across the possible models by not actually selecting a subset of variables, but instead setting the effects of unimportant ones to (or near) zero. The drawback is that the estimates tend to be biased towards too small values. The methods discarding markers irrelevant to the phenotype are often referred to as variable selection, while the ones assigning a penalty term to shrink the marker effects towards zero are considered variable regularization.
Contrary to the frequentist way of deriving a shrinkage estimator by subtracting a penalty from the gain function (in other words, by adding a penalty to the loss function), in the Bayesian context the regularization mechanism is included in the model by specifying an appropriate prior density for the regression coefficients. A penalized maximum likelihood estimate for the regression coefficients β is acquired by maximizing the penalized gain function

$$\hat{\beta}_{PML} = \arg\max_{\beta} \left[ \log p(\text{data} \mid \beta) - \lambda J(\beta) \right], \quad (1.1)$$

where $\log p(\text{data} \mid \beta)$ is the log likelihood and $J(\beta)$ a penalty function. Commonly used penalty functions are derived from the L2 and L1 norms of the regression coefficients,

$$J(\beta) = \|\beta\|_2^2 = \sum_{j=1}^{p} \beta_j^2 \quad \text{and} \quad J(\beta) = \|\beta\|_1 = \sum_{j=1}^{p} |\beta_j|,$$

leading to Ridge Regression (Hoerl 1962) and LASSO (Tibshirani 1996) estimates, respectively. The frequentist penalty function is connected to the prior density of a Bayesian model, as the exponential of the function maximized in the frequentist method equals the product

$$\exp\left( \log p(\text{data} \mid \beta) - \lambda J(\beta) \right) = p(\text{data} \mid \beta)\, \exp(-\lambda J(\beta)), \quad (1.2)$$

where $p(\text{data} \mid \beta)$ is the likelihood and $\exp(-\lambda J(\beta))$ represents the prior density function. For example, it is easily seen that the Ridge Regression penalty corresponds to a Gaussian prior density, as $\exp(-\lambda \sum_{j=1}^{p} \beta_j^2)$ is the kernel of a Gaussian probability density function. Similarly the L1 penalty corresponds to a double exponential or Laplace density. Although it is clearly more logical to consider the assumptions about model sparseness as part of the model (the prior is part of the model) rather than part of the estimator (a penalty is part of the estimator), the difference may seem trivial in practice.
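The ridge-penalty/Gaussian-prior correspondence can be checked numerically: the closed-form ridge solution and a MAP estimate obtained by maximizing the log posterior directly should coincide. A minimal sketch with simulated data (not from the thesis; the noise variance is fixed at 1 so the penalty weight equals the prior precision):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 30, 10
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)
lam = 2.0

# Frequentist ridge: minimize ||y - Xb||^2 + lam*||b||^2,
# solved in closed form by the penalized normal equations
b_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Bayesian MAP: likelihood y | b ~ N(Xb, I), prior b_j ~ N(0, 1/lam).
# The log posterior is -0.5*||y - Xb||^2 - 0.5*lam*||b||^2 + const;
# maximize it by plain gradient ascent.
b_map = np.zeros(p)
for _ in range(5000):
    grad = X.T @ (y - X @ b_map) - lam * b_map   # gradient of log posterior
    b_map += 0.01 * grad
```

Both routes solve the same normal equations $(X'X + \lambda I)\beta = X'y$, which is the practical content of equation (1.2).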
However, the fact that in the Bayesian context the model includes all available information permits the estimator to always be the same, either the whole posterior density or a maximum a posteriori (MAP) point estimate, which in turn enables a straightforward translation of the model into an algorithm. In the Bayesian context variable regularization is included in the model by specifying a spike and slab prior for the regression coefficients, with the spike being the probability mass centered near zero and the slab the probability mass distributed over the nonzero values (see O'Hara and Sillanpää 2009). This prior represents the assumption that only a small proportion of the predictors have a non-negligible effect (the "slab"), while the majority of the effects are close to zero (the "spike"). The Bayesian models proposed in the literature differ with respect to the spike and slab prior densities given for the regression coefficients. The desired shape for the prior density may be acquired either as a mixture of two densities, in which case the model includes a dummy variable indicating whether the effect of a given explanatory variable comes from the spike or from the slab part of the prior, or alternatively a single prior density approximating the spike and slab shape may be assigned directly to the regression coefficients. In the latter case, the probability density functions commonly used for imitating the spike and slab shape are Student's t (e.g. BayesA by Meuwissen et al. 2001; Xu 2003; Yi and Banerjee 2009) and Laplace densities (e.g. Park and Casella 2008; Yi and Xu 2008; de los Campos et al. 2009; Xu 2010; Li et al. 2011). Due to the connection to the frequentist L1 penalty function, the models with a Laplace prior density are commonly denoted as Bayesian LASSO (Park and Casella 2008). Both Student's t and Laplace density functions possess several favorable features, including high kurtosis and heavy tails, that make them worthy candidates for shrinkage inducing priors. Compared to a Gaussian density, these functions place greater probability mass near zero and higher probability on large values, inducing strong shrinkage of intermediate-sized estimate values and proportionally less shrinkage of the large values and the values near zero. While a Gaussian prior density, or equivalently frequentist Ridge Regression, assigns the same penalty to all of the regression coefficients, the heavy-tailed functions produce a clearer distinction between large and small estimate values by pushing the intermediate-sized values in either direction.
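The tail comparison above is easy to verify numerically. The sketch below (illustrative, not from the thesis) compares a Gaussian and a Laplace density matched to the same variance and checks the three claims: more mass near zero, less mass at intermediate values, heavier tails:

```python
import numpy as np

# Laplace(0, b) has variance 2*b^2, so b = 1/sqrt(2) matches N(0, 1)
b = 1.0 / np.sqrt(2.0)

def gauss_pdf(x):
    return np.exp(-x**2 / 2.0) / np.sqrt(2.0 * np.pi)

def laplace_pdf(x):
    return np.exp(-np.abs(x) / b) / (2.0 * b)

# More mass near zero (tiny effects are shrunk hard) ...
near_zero = laplace_pdf(0.0) > gauss_pdf(0.0)
# ... less mass at intermediate values (they get pushed either way) ...
intermediate = laplace_pdf(1.0) < gauss_pdf(1.0)
# ... and heavier tails (large effects are shrunk proportionally less)
tails = laplace_pdf(4.0) > gauss_pdf(4.0)
```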
For this reason the method is sometimes denoted adaptive shrinkage. Several modifications of the indicator-type methods have been introduced, differing with respect to the mixture components (the distributions that are used to form the mixture distribution) set for the regression coefficients and the hierarchical structure of the prior (the dependency between the indicator and the marker effect, and the participation of the indicator in the likelihood). While the stochastic search variable selection (SSVS) models consider the spike and slab as a mixture of two normal distributions (George and McCulloch 1993; Verbyla et al. 2009), or two Student's t distributions (e.g. Yi et al. 2003), the majority of the methods straightforwardly set the regression coefficient to zero when the indicator is zero (so the spike is in fact a point mass located at zero). A prior consisting of a mixture
of a Student's t density and a point mass at zero has been used in several methods, including BayesB (Meuwissen et al. 2001), Hayashi and Iwata (2010) and Habier et al. (2011). A similar mixture based on a Laplace density has been used by Meuwissen et al. (2009) and Shepherd et al. (2010). The simplest hierarchical structure of the prior density, proposed by Kuo and Mallick (1998), determines the effect of marker j on the phenotype as a product of the indicator γ_j and the effect size β_j, and considers these two to be a priori independent. Hence the joint prior of the marker effect γ_j β_j becomes simply p(γ_j, β_j) = p(γ_j) p(β_j), where p(γ_j) is a Bernoulli density with a prior probability for a marker to be linked to the trait and p(β_j) is the Gaussian, Student's t or Laplace prior density given for the effect size. Other types of hierarchical structures presented in the literature include BayesB (Meuwissen et al. 2001), where the marker effect is given by β_j alone, since the likelihood does not include the indicator; instead, the indicator acts through the effect variance. In Gibbs variable selection, on the other hand, the marker effect is considered as a product of the indicator and the effect size, but the prior density of the effect size is dependent on the indicator (Dellaportas et al. 2002). Whether the model is based on a Student's t, Laplace, or a mixture prior density, the intensity of the shrinkage produced by the prior is determined by the prior parameters (i.e. hyperparameters) defining the shape of the prior density function. The models proposed in the literature differ from each other in terms of the procedures they use to determine the prior parameters. In the original BayesA and BayesB the parameters of the Student's t prior density were defined to produce the desired genetic variance (Meuwissen et al. 2001).
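The Kuo and Mallick construction, an a priori independent Bernoulli indicator multiplied by a continuous effect size, can be sketched by sampling from the prior (an illustrative simulation; the inclusion probability and effect distribution are invented):

```python
import numpy as np

rng = np.random.default_rng(2)
p, pi = 100_000, 0.05                  # markers, prior inclusion probability

gamma = rng.binomial(1, pi, size=p)    # indicator: is marker j in the model?
beta = rng.normal(0.0, 1.0, size=p)    # effect size, a priori independent
effect = gamma * beta                  # Kuo-Mallick marker effect gamma_j * beta_j

frac_zero = np.mean(effect == 0.0)     # the "spike": exact zeros
```

About 95% of the prior draws are exactly zero here, which is the point-mass spike; the remaining 5% follow the slab density assigned to β_j.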
The Xu (2003) method is otherwise similar to BayesA, except that the prior parameters are estimated instead of being set to constant values. Similar modifications of BayesB have been considered by e.g. Yi and Xu (2008) and Habier et al. (2011). Under the Bayesian LASSO the prior parameters are more commonly estimated from the data (e.g. Yi and Xu 2008; de los Campos et al. 2009; Sun et al. 2010; Shepherd et al. 2010) than given as constants (Xu 2010). While the Bayesian models have proven workable, efficient and flexible, the tremendous number of markers in modern genome-wide data sets makes the computational methods traditionally connected to Bayesian estimation, e.g. Markov chain Monte Carlo (MCMC), quite slow and cumbersome. For the same models, fast alternative estimation procedures have been proposed, most commonly based on estimation of the maximum point of the posterior density (the MAP estimate), rather than the whole posterior
distribution, by an expectation-maximization (EM) algorithm (Dempster et al. 1977; McLachlan and Krishnan 1997; for the methods see e.g. Yi and Banerjee 2009; Hayashi and Iwata 2010; Figueiredo 2003; Sun et al. 2010; Xu 2010; Meuwissen et al. 2009; Shepherd et al. 2010; Lee et al. 2010).

2 Objectives of the study

The objectives of this work are to 1) better understand the behavior of the different Bayesian multilocus association models, especially in the maximum a posteriori estimation context, and to obtain further information on the instances in which different methods work best, 2) seek connections between the different Bayesian models and try to see the different model variants as special cases or sub-models of a common model framework, 3) pay special attention to the significance of the parametrization and hierarchical structure of the model for elegant derivation and convergence properties of the estimation algorithm, and 4) develop a flexible and versatile Bayesian multilocus association model framework, along with an efficient parameter estimation machinery, that can be utilized in phenotype prediction, genomic breeding value estimation and QTL (quantitative trait locus) detection and effect estimation from a variety of genome-wide data. The original papers I–III contribute to the objectives in the following manner. In I we lay the foundation for our Bayesian model framework, examine the behavior and predictive performance of different sub-models and prior densities, including G-BLUP, and present a generalized expectation-maximization (GEM) algorithm for the parameter estimation. In II we apply selected parts of the model framework in a QTL mapping context and, in particular, consider the impact of an additional polygenic component on the performance of the model and the GEM algorithm. In III we generalize the model framework and the GEM algorithm to ordered categorical and censored Gaussian phenotypes.
3 Hierarchical Bayesian model

In Bayesian inference, learning from data is based on updating the prior belief concerning the model parameters into a posterior belief by applying Bayes' theorem. Let p(Θ) denote the joint prior density of the unknown parameters and p(data | Θ) the likelihood of the data given those parameters. The posterior density of the unknown parameters given the data is acquired from the Bayes formula

$$p(\Theta \mid \text{data}) = \frac{p(\text{data} \mid \Theta)\, p(\Theta)}{p(\text{data})} \propto p(\text{data} \mid \Theta)\, p(\Theta),$$

where the normalizing constant $p(\text{data}) = \int_{\Theta} p(\text{data} \mid \Theta)\, p(\Theta)\, d\Theta$ is the marginal likelihood of the data. As the marginal likelihood has a constant value, it is usually omitted from the computation, and the joint posterior density is considered to be proportional to the product of the likelihood and the joint prior density. In addition to the prior conception of the parameter values, the joint prior density expresses the mutual relationships of the parameters, e.g. whether the parameters are considered a priori independent or conditional on some other parameters. This definition is denoted the hierarchical structure of the Bayesian model. Let e.g. the parameter vector be Θ = (θ_1, θ_2), and let θ_1 be a priori dependent on θ_2. Then the joint prior is given by p(Θ) = p(θ_1 | θ_2) p(θ_2), and the dependent parameter θ_1 is said to be located on a lower layer of the model hierarchy. In its complete form our hierarchical Bayesian model framework, depicted as a directed acyclic graph in Figure 3.1, consists of two separate parts, the linear Gaussian model and the threshold model. Under the linear Gaussian model the phenotype measurements are assumed to be continuous and follow a Gaussian density, while the additional threshold model handles binary, ordinal and censored Gaussian observations. The hierarchical model has a total of six layers, two of which are optional.
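The prior-to-posterior update can be made concrete with the standard normal-normal conjugate example (an illustrative sketch, not a model from the thesis): observations y_i ~ N(θ, s²) with s² known and prior θ ~ N(m0, v0) give a normal posterior whose precision is the sum of the prior and data precisions:

```python
import numpy as np

# Posterior: theta | y ~ N(m1, v1) with
#   v1 = 1 / (1/v0 + n/s2),   m1 = v1 * (m0/v0 + sum(y)/s2)
rng = np.random.default_rng(3)
s2, m0, v0 = 1.0, 0.0, 10.0                 # known noise var, vague prior
y = rng.normal(2.0, np.sqrt(s2), size=50)   # simulated data, true mean 2
n = y.size

v1 = 1.0 / (1.0 / v0 + n / s2)
m1 = v1 * (m0 / v0 + y.sum() / s2)
```

With a vague prior the posterior mean m1 sits essentially at the sample mean, and the posterior variance v1 is much smaller than the prior variance v0: the data have updated the prior belief.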
The observed data, located on the 1st and 2nd layers in the graph, comprises phenotype and genotype information and, optionally, a known pedigree of a sample of related individuals. The continuous Gaussian phenotypes, denoted by a vector y, and the genetic data matrix X, consisting of the genotypes of biallelic SNP markers, are located on the observed data layer of the linear Gaussian model. As the binary, ordinal and censored Gaussian observations are handled via a latent variable parametrization, they are located on the optional observed layer of the threshold model in Figure 3.1. The possible pedigree information is given in the form of an additive genetic relationship matrix (Lange 1997), located on the optional observed layer
Figure 3.1: Hierarchical structure of the model framework. The ellipses represent random parameters and the rectangles fixed values, while the round-cornered rectangles may be either, depending on the selected model. Solid arrows indicate statistical dependency and dashed arrows functional relationship. The background boxes indicate the main modules of the model framework.
in the directed acyclic graph (Figure 3.1) to represent its non-compulsory nature. In the following sections we first consider the linear Gaussian model part, and only after that focus on the threshold model for discrete or censored data.

3.1 Gaussian likelihood

At the center of a Bayesian model is the likelihood function of the data given the model parameters. The likelihood is based on the probability model (sometimes called the sampling model) determining how the dependent variables or traits are connected to the explanatory variables. In our model framework the Gaussian phenotypes are connected to the marker and pedigree information with a linear Gaussian association model (see Figure 3.1)

$$y = \beta_0 + X\Gamma\beta + Zu + \varepsilon, \quad (3.1)$$

where y denotes the phenotypic records of n individuals, β_0 is the population intercept, and ε corresponds to the residuals, assumed normal and independent, ε ~ MVN(0, I_n σ_0²). If necessary, the intercept β_0 can easily be replaced with a vector of environmental variables. The second term on the right-hand side of equation (3.1) comprises the observed genotypes X and the allele substitution effects Γβ. The observed genotypes of the p biallelic SNP markers are coded with respect to the number of rare alleles (0, 1 and 2) and standardized to have zero mean and unit variance. In the complete model the allele substitution effect (see Marker effect in Figure 3.1) is modeled following Kuo and Mallick (1998) as a product of the size of the effect and a variable indicating whether the marker is linked to the phenotype. In equation (3.1), β denotes the additive effect sizes, and Γ is a diagonal matrix of indicator variables, whose jth diagonal element γ_j has value 1 if the jth SNP marker is included in the model, and 0 otherwise. As depicted in Figure 3.1, the indicator and the effect size are considered a priori independent.
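A forward simulation from model (3.1) makes the role of the indicator matrix Γ concrete (an illustrative NumPy sketch with invented dimensions; the optional polygenic term Zu is dropped here):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 40, 60

# Standardized rare-allele counts, as in the text
X = rng.integers(0, 3, size=(n, p)).astype(float)
X = (X - X.mean(axis=0)) / X.std(axis=0)

beta0 = 1.5                             # population intercept
beta = rng.normal(0.0, 1.0, size=p)     # effect sizes beta_j
gamma = np.zeros(p)
gamma[:3] = 1.0                         # only the first 3 markers are linked
Gamma = np.diag(gamma)
eps = rng.normal(0.0, 0.5, size=n)      # residuals ~ N(0, I_n * sigma0^2)

# y = beta0 + X Gamma beta + eps (polygenic term Z u omitted)
y = beta0 + X @ Gamma @ beta + eps
```

Markers with γ_j = 0 contribute nothing to y regardless of their β_j, which is exactly how the point-mass spike removes markers without changing the model dimension.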
The term u in equation (3.1) denotes the additive polygenic effects due to the combined effect of an infinite number of loci, and Z is a design matrix connecting the polygenic effects to the observed phenotypes. The individuals, or their phenotypic values y_i, are assumed conditionally independent given the genotype information X and the polygenic effect u. This assumption and the described linear marker association model (3.1)
gives a multivariate normal likelihood

$$p(y \mid \beta_0, \sigma_0^2, \beta, \Gamma, u, X, Z) \propto \det(I_n \sigma_0^2)^{-1/2} \exp\left( -\tfrac{1}{2} (y - \beta_0 - X\Gamma\beta - Zu)' (I_n \sigma_0^2)^{-1} (y - \beta_0 - X\Gamma\beta - Zu) \right) \quad (3.2)$$

for the phenotypes given the parameter vector. Due to the independence of the observations, the likelihood can also be interpreted as a univariate normal $N(\beta_0 + \sum_{j=1}^{p} \gamma_j \beta_j x_{ij} + u_i,\; \sigma_0^2)$ given a single observation y_i and the appropriate parameters. The parameters of the multilocus association model that are present in the likelihood function are located in the model parameters layer of the linear Gaussian model in Figure 3.1.

3.2 Shrinkage inducing priors

The second essential component of a Bayesian model consists of the prior densities for the model parameters. The prior for a given parameter represents the a priori understanding of the plausibility of different parameter values. In some cases there is no reason to believe that one parameter value would be more plausible than another, a conception expressed with a flat or uninformative prior density, e.g. by setting p(β_0) ∝ 1 and p(σ_0²) ∝ 1/σ_0² (note the noninformative uniform priors at layer 5 in Figure 3.1). In some cases, however, the prior density plays a most important role in the operation of the model. A central feature of handling an oversaturated model is the selection or regularization of the excess predictors. In the Bayesian context the regularization is included in the model by specifying a prior density for the regression coefficients that represents the a priori understanding that the majority of the predictors have only a negligible effect, while there are a few predictors with possibly large effect sizes. A prior that would evince this idea should consist of a probability mass centered near zero and a probability mass distributed over the nonzero values, including a reasonably high probability for large values. The probability density functions we have used for imitating this spike and slab shape are Student's t (following e.g.
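The equivalence between the multivariate form (3.2) with covariance I_n σ_0² and the product of univariate normals is easy to confirm numerically (an illustrative check; the mean vector stands in for β_0 + XΓβ + Zu):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20
mu = rng.normal(size=n)          # stands in for beta0 + X*Gamma*beta + Z*u
s2 = 0.7                         # residual variance sigma0^2
y = mu + rng.normal(0.0, np.sqrt(s2), size=n)

# Multivariate form: y ~ MVN(mu, I_n * s2); det(I_n*s2) = s2^n
ll_mvn = -0.5 * n * np.log(2.0 * np.pi * s2) - 0.5 * np.sum((y - mu)**2) / s2

# Sum of independent univariate normal log densities N(mu_i, s2)
ll_uni = np.sum(-0.5 * np.log(2.0 * np.pi * s2) - 0.5 * (y - mu)**2 / s2)
```

The two log likelihoods agree because the covariance matrix is diagonal, which is exactly the conditional-independence assumption stated above.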
Meuwissen et al. 2001; Xu 2003) and Laplace densities (following e.g. Park and Casella 2008; de los Campos et al. 2009), either alone or combined with a point mass at zero (e.g. Meuwissen et al. 2001; Shepherd et al. 2010). In our full model framework, equation (3.1) and Figure 3.1, the mixture prior with the point mass at zero is accomplished by adding a dummy variable to indicate whether the effect of a given predictor variable is included in
the model or not. Following Kuo and Mallick (1998) the marker effects are modeled as a product of the indicator variable γ_j and the effect size β_j, which are considered a priori independent; hence the joint prior of the marker effect becomes simply p(γ_j, β_j) = p(γ_j) p(β_j), where p(γ_j) is a Bernoulli density with a prior probability π = P(γ_j = 1) for a marker to be linked to the trait and p(β_j) is the prior density for the effect size.

3.2.1 Hierarchical formulation of the prior densities

The Student's t and the Laplace distribution can both be expressed as a scale mixture of normal distributions with a common mean and effect-specific variances. The hierarchical formulation of a Student's t-distribution with ν degrees of freedom, location μ and scale τ² is a scale mixture of normal densities with mean μ and variances following a scaled inverse-χ² distribution with ν degrees of freedom and scale τ²,

$$\left.\begin{aligned} \beta_j \mid \sigma_j^2 &\sim N(\mu, \sigma_j^2) \\ \sigma_j^2 \mid \nu, \tau^2 &\sim \text{Inv-}\chi^2(\nu, \tau^2) \end{aligned}\right\} \;\Longrightarrow\; \beta_j \sim t_\nu(\mu, \tau^2),$$

while a Laplace density with location μ and rate λ can be presented in a similar manner, the mixing distribution now being an exponential one,

$$\left.\begin{aligned} \beta_j \mid \sigma_j^2 &\sim N(\mu, \sigma_j^2) \\ \sigma_j^2 \mid \lambda^2 &\sim \text{Exp}(\lambda^2/2) \end{aligned}\right\} \;\Longrightarrow\; \beta_j \sim \text{Laplace}(\mu, \lambda).$$

The hierarchical representation of the prior densities bears a twofold advantage (I). First, the derivation of the fully conditional posterior densities, and hence the derivation of the estimation algorithm, simplifies greatly. Within the MCMC world, the hierarchical formulation of the prior densities, also known as model or parameter expansion, is a well-known device for simplifying computations by transforming the prior into a conjugate one and thus enabling Gibbs sampling. Conjugacy of a prior distribution means that the fully conditional posterior probability distribution of a given parameter will be of the same type as the prior distribution of that parameter, and hence we are guaranteed to get a closed-form fully conditional posterior with a known probability density function.
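The Laplace case of the scale-mixture representation can be verified numerically by integrating the normal density against the exponential mixing density over the variance (an illustrative check with μ = 0, evaluated at a single point β = 1; a plain Riemann sum suffices):

```python
import numpy as np

# Mixing N(0, s) over s ~ Exp(lam^2 / 2) should give Laplace(0, lam),
# whose density is (lam/2) * exp(-lam * |beta|).
lam, beta = 1.0, 1.0
s = np.linspace(1e-6, 80.0, 400_000)                  # grid over variances
normal_pdf = np.exp(-beta**2 / (2.0 * s)) / np.sqrt(2.0 * np.pi * s)
mixing_pdf = (lam**2 / 2.0) * np.exp(-lam**2 * s / 2.0)
marginal = np.sum(normal_pdf * mixing_pdf) * (s[1] - s[0])   # Riemann sum

laplace_val = (lam / 2.0) * np.exp(-lam * abs(beta))
```

The numerically integrated marginal matches the closed-form Laplace density, which is the identity the Gibbs sampler and the GEM algorithm exploit.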
The hierarchical formulation of a prior density is also known to accelerate the convergence of an MCMC sampler by adding more working parts and therefore more space for the random walk to move in (see e.g. Gilks et al. 1996; Gelman et al. 2004; Gelman 2004). In maximum a posteriori (MAP) estimation, on the other hand, a commonly adopted approach to simplifying the model is to integrate out the effect variances. However, the conjugacy maintained by preserving the intermediate variance
layer (layer 4 in Figure 3.1) is a valuable feature also for MAP estimation, as it enables the straightforward derivation of the fully conditional posterior density functions. Expressed as a scale mixture, the Student's t distribution leads to conjugate priors for the normal likelihood parameters, and hence is a perfect choice for a conjugate analysis. Although the decomposition of the Laplace prior does not provide conjugacy, it leads to a tractable fully conditional posterior density for the inverse of the effect variance. Second, the estimation algorithm is likely to behave better under a hierarchical model. Even though the marginal distributions of the marker effects are mathematically equivalent in hierarchical and non-hierarchical models, we noted in I that the parametrization and model structure alter the properties and behavior of the model, and thus influence the mixing and convergence properties of an estimation algorithm, and also the values of the actual estimates. We noted in I that in some cases the hierarchical Laplace model was clearly more accurate than its non-hierarchical counterpart. Also, contrary to the non-hierarchical version, the hierarchical Laplace model worked without the additional indicator variable, i.e. without a zero point mass in the prior of the marker effects. This simplification of the model leads not only to a more straightforward implementation and faster estimation, but also to easier and more accurate selection of the prior parameters.

3.3 Sub-models

As mentioned above, we like to consider the full model in Figure 3.1 as a framework incorporating a set of model variants, or sub-models, embodying different components of the model framework. In I we covered a multitude of such variants, and also showed how the model variants correspond to the Bayesian phenotype prediction and genomic breeding value estimation methods proposed in the literature.
The non-compulsory components of the multilocus association model comprise the polygenic component, the indicator variable, and the optional sixth hyperprior layer. The choice between the Student's t and the Laplace prior densities provides one means of modifying the prior assigned to the marker effects, while the inclusion or exclusion of the indicator and the hyperprior layer provides another. The polygenic component, on the other hand, is clearly an external addition to the multilocus association model.
3.3.1 Polygenic component

The polygenic component u is included in the model to represent genetic variation possibly not captured by the SNP markers and to account for putative residual dependencies between individuals (Yu et al. 2006). The sample or population structure is included in the model through the covariance matrix of the multivariate normal prior density given for the polygenic effect, u | (σ_u², A) ~ MVN(0, σ_u² A), where σ_u² is the polygenic variance component and A is the genetic relationship matrix. The genetic relationship matrix is either a pedigree-based additive genetic relationship matrix (see Lange 1997) (I and II) or, if no pedigree is available, a finite-locus approximation based on the markers not included in the actual multilocus association model, i.e. a genomic relationship matrix (II). The polygenic variance component σ_u² is given an Inv-χ²(ν_u, τ_u²) prior distribution with suitable data-specific parameter values. On the basis of the existing literature the need for an additional polygenic component within a multilocus association model is unclear. Many authors have found the polygenic component irrelevant (e.g. Calus and Veerkamp 2007; Pikkuhookana and Sillanpää 2009), while e.g. de los Campos et al. (2009) and Lund et al. (2009) see it as necessary. In I and II we examined the importance of the additional polygenic component in the genomic selection and association mapping contexts, respectively, with both simulated and real data. In these works the estimates of the polygenic component were negligible, and had no influence on either the prediction accuracy (I) or the gene localization ability (II) of the model. None of the Bayesian multilocus models seemed to benefit from the addition of the polygenic component with either simulated (I and II) or real data (I), the phenotype of the latter most likely being quite polygenic in nature.
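The multivariate normal prior above can be sampled from with a standard Cholesky construction. The sketch below is a toy illustration, not thesis code: the 3 × 3 relationship matrix A and the variance σ_u² are invented values, and the empirical covariance of many draws is checked against σ_u² A:

```python
import numpy as np

# Hypothetical sketch of drawing a polygenic effect u ~ MVN(0, sigma_u^2 * A)
# via the Cholesky factor of the covariance; the 3 x 3 relationship matrix A
# and the variance are invented toy values, not taken from the thesis data.
A = np.array([[1.0, 0.5, 0.5],
              [0.5, 1.0, 0.25],
              [0.5, 0.25, 1.0]])
sigma_u2 = 0.3

rng = np.random.default_rng(0)
L = np.linalg.cholesky(sigma_u2 * A)        # covariance = L @ L.T
u = L @ rng.standard_normal((3, 100_000))   # many draws to check the prior

emp_cov = np.cov(u)                          # empirical covariance of the draws
print(np.round(emp_cov, 2))                  # close to 0.3 * A
```

The same construction works with a pedigree-based A or a marker-based G; only the relationship matrix changes.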
The polygenic component did not capture extra information even when the task was made as easy as possible by generating the polygenic component of the data using the same relationship matrix that was also used in the analyses (II). Therefore, in our experience, the polygenic component can safely be omitted from the multilocus association model (3.1).

3.3.2 Indicator

The indicator variable is added to the model framework to act as a source of extra shrinkage in a mixture prior alongside the Student's t or the Laplace density. The usefulness of the indicator variable depends on the other source of shrinkage in the model. As mentioned above, the
hierarchical Laplace model does not seem to require the additional point mass at zero; on the contrary, the model's performance suffers if the indicator is added (Tables 2–5 in I). On the other hand, the Student's t model clearly benefits from the additional point mass. The latter observation is in strict concordance with the existing literature, as the superiority of BayesB (Student's t plus indicator) (Meuwissen et al. 2001) over BayesA (Student's t only) can be considered common knowledge. While the main purpose of the indicator variable within our model framework is to participate in the mixture prior with the Student's t or Laplace densities, in II we have also considered a pure indicator model. Under the Indicator model proposed in II, the prior for the effect sizes β_j is Gaussian with zero mean and a predetermined variance, and therefore the prior for the marker effects γ_j β_j is a mixture of a Gaussian density and a point mass at zero. As the Gaussian prior introduces a constant shrinkage to the estimates, variable selection relies solely on the indicator, and a Bayes factor based on the values of the indicators can be used to determine the significance of a marker effect. Contrary to phenotype or breeding value prediction, in gene mapping the significance of the individual marker effects is of importance. Nevertheless, the Indicator model in II is mainly considered a curiosity and a proof of the power of a multilocus association treatment, as even an extremely simple multilocus association method may exceed the performance of the most sophisticated single-marker method (Figure 1, A and B in II). The indicator has a Bernoulli prior with a prior probability π = P(γ_j = 1) that SNP j contributes to the trait. The value given for the probability π also represents our prior assumption of the proportion of SNP markers that are linked to the trait.
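The mixture prior on γ_j β_j described above can be sketched directly. The snippet below (toy parameter values of our own choosing, Gaussian slab as in the Indicator model) draws indicators and effect sizes independently and shows that the prior proportion of nonzero marker effects is π:

```python
import numpy as np

# Sketch of the Kuo-Mallick style mixture prior (assumed toy values): the
# indicator gamma_j ~ Bernoulli(pi) and the effect size beta_j ~ N(0, tau2)
# are a priori independent, so the marker effect gamma_j * beta_j has a
# point mass at zero with prior weight 1 - pi.
rng = np.random.default_rng(2)
p, pi, tau2 = 50_000, 0.01, 1.0

gamma = rng.binomial(1, pi, size=p)            # which markers are "linked"
beta = rng.normal(0.0, np.sqrt(tau2), size=p)  # effect sizes under the slab
effect = gamma * beta                          # spike-and-slab marker effects

print(np.mean(effect != 0.0))  # proportion of nonzero effects, close to pi
```

Setting π is thus an explicit statement about model sparseness: with p = 50 000 markers, π = 0.01 encodes a prior expectation of about 500 contributing loci.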
However, as the indicator affects the shrinkage of the marker effects concurrently with the shrinkage generated by the Student's t or the Laplace density, the parameters assigned to these densities affect the selection of π.

3.3.3 Hyperprior

The hyperprior layer (the 6th layer in Figure 3.1) constitutes another optional part of the model framework. The parameters of the prior densities (layer 5 in Figure 3.1) can be either predetermined or estimated simultaneously with the model parameters. As the prior densities for the effect size and the indicator are responsible for the regularization of the excess variables in the model, the impact of the parameter values of these priors is greater than that of the other prior densities in the model. Therefore the optional estimation of the prior parameters is limited to these two. The estimation of the prior parameters is depicted in Figure 3.1 by treating the priors for the indicator and the effect variance as random variables and adding the 6th layer to the model. If the parameters of the prior densities are considered fixed, the optional hyperprior layer is absent from the model. The fixed prior parameter values can be determined e.g. by cross-validation or by the Bayesian information criterion (see Sun et al. 2010). It is noteworthy that even if the prior parameters are estimated from the data, i.e. the 6th layer is present in the model, the need for predetermined values does not vanish but simply passes to the next layer of the model hierarchy. Hence, inevitably, at the very bottom of the model hierarchy the user has to determine some values prior to the actual parameter estimation. The hyperprior given for the effect size is a conjugate Gamma(κ, ξ) density for the scale τ² of the inverse-χ² density under the Student's t model, or, respectively, for the rate λ² of the exponential density under the Laplace model. There is neither a conjugate prior nor a closed form posterior density available for the degrees of freedom parameter of the Student's t model, and hence we have decided to consider it fixed (I). For the indicator variable, the prior probability π = P(γ_j = 1) of marker j being linked to the trait is estimated with either an uninformative uniform Beta(1,1) or an informative Beta(a, b) density. The informative beta prior embodies our a priori belief of the proportion of significant markers by considering a as the number of markers assumed to be linked to the trait and b as the number of markers assumed not to be linked (i.e. b = p − a, p being the number of markers in the data set).

3.3.4 Student's t vs.
Laplace prior

In the original work I, one of our main interests was to consider the pros and cons of the Student's t and Laplace prior densities. The advantage of the hierarchically formulated Student's t density as a prior is the extremely easy derivation of the fully conditional posterior densities. Although the hierarchical Laplace prior also leads to tractable fully conditional posterior functions, the derivation of the posterior for the effect variance is clearly more complicated than with the Student's t density. However, the Student's t prior has some shortcomings too. The first problem we encountered with the Student's t model was the estimation of the parameters of the prior densities (5th layer in Figure 3.1). We tried numerous hyperpriors for the effect variance and the indicator, but it appeared to be impossible to
select ones leading to a reasonable estimate. Hence, after several attempts, we decided to treat the prior parameters of the Student's t model as given. Under the Laplace model there were no such complications, and the prior parameters of the Laplace model are estimated from the data. Therefore, in the Laplace model the 6th layer of Figure 3.1 is always included in the model, while in the Student's t model it is always excluded. Due to its shape, the shrinking ability of the Student's t prior is weaker than that of the Laplace prior. While the hierarchical Laplace prior worked fine without the additional indicator variable, the Student's t prior required the additional point mass at zero in order to provide strong enough shrinkage (Tables 2–5 in I). As pointed out previously, a low number of parameters is a desirable characteristic in a model. Apart from a single data set (Table 2 in I), the prediction accuracy of the Laplace model was higher than that of the Student's t model (Tables 3–5 in I). The better performance of the Laplace model may be due partly to the easier, and hence more accurate, prior selection, and partly to the more favorable shape of the density itself. Also, as the prior parameters for the effect variance can be estimated, and hence there is an additional layer in the hierarchical model, the model may be more robust to the given hyperprior parameter values. Altogether, on the basis of our findings in I, we feel that the hierarchical Laplace model has an advantage over the Student's t model, and we therefore decided to concentrate on the former in II and III.

3.3.5 Bayesian LASSO and its extensions

The hierarchical Bayesian model with a Laplace prior density is commonly denoted the Bayesian LASSO (Park and Casella 2008), since it leads to a nearly identical estimate as the frequentist LASSO of Tibshirani (1996).
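The difference in shrinkage between the two priors can be made concrete by comparing the penalties they imply. The snippet below (values chosen for illustration, penalties written up to additive constants; not from the thesis) shows that the Laplace penalty grows linearly in the effect size while the Student's t penalty grows only logarithmically, so the t prior pulls large effects toward zero more weakly:

```python
import numpy as np

# Illustrative comparison of the shrinkage implied by the two priors: the
# "penalty" is the negative log prior density, up to additive constants.
beta = np.array([0.1, 1.0, 10.0])
df, scale = 4.0, 1.0   # arbitrary illustrative prior parameters

laplace_pen = np.abs(beta) / scale                  # -log Laplace density + c
t_pen = 0.5 * (df + 1.0) * np.log1p(beta**2 / df)   # -log Student t density + c

print(np.diff(laplace_pen))  # constant marginal pull: 0.9 then 9.0
print(np.diff(t_pen))        # the increase from 1.0 to 10.0 is smaller than Laplace's
```

This is the shape argument above in miniature: without a point mass at zero, the flat tails of the t penalty provide too little shrinkage, whereas the Laplace penalty keeps pulling at a constant rate.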
The Bayesian LASSO has been further modified by several authors, including Yi and Xu (2008), Mutshinda and Sillanpää (2010), Sun et al. (2010) and Fang et al. (2012). In II we considered a modification of the Bayesian LASSO introduced by Mutshinda and Sillanpää (2010), called the Extended Bayesian LASSO (EBL). Following the common hierarchical Bayesian LASSO, the Laplace prior is expressed as a scale mixture of normal densities with an exponential mixing distribution: the EBL assigns a normal prior with independent locus-specific variances to the regression parameters given the locus variances, β_j | σ_j² ~ N(0, σ_j²), and further an exponential prior to the variances, σ_j² | λ_j² ~ Exp(λ_j²/2). Unlike in the Bayesian LASSO, the regularization parameters λ_j² of
the EBL are locus-specific, and can be decomposed by setting λ_j = δη_j, where δ represents the model sparseness common to all loci, and η_j is a locus-specific deviation representing the shrinkage operating at locus j. The common Bayesian LASSO can now be seen as a special case of the EBL with the locus-specific components set to η_j = 1 for all j. Setting the common shrinkage parameter δ = 1 would lead to the Improved Bayesian LASSO proposed by Fang et al. (2012).

3.3.6 Bayesian G-BLUP

In addition to the multilocus association model, in I and III we have considered a Bayesian version of the genomic BLUP, or G-BLUP, a classical BLUP model where the numerator relationship matrix, estimated from the pedigree, is replaced by a genomic marker-based relationship matrix:

y = β_0 + Zu + ε.  (3.3)

In the model framework of Figure 3.1 the G-BLUP can be seen as a mirror image of the multilocus association model without the polygenic component, as here we have the polygene without the marker effects. The likelihood of the data under the G-BLUP is simply a multivariate normal with mean β_0 + Zu and covariance I_n σ_0². The priors for the genetic values u and the population intercept β_0 are a conjugate multivariate normal MVN(0, G σ_u²) and a uniform density, respectively, G being the genomic relationship matrix. The variances σ_0² and σ_u² have inverse-χ² priors, the uninformative p(σ_0²) ∝ 1/σ_0² and an Inv-χ²(ν_u, τ_u²) density, respectively. Under the G-BLUP the genetic marker data are incorporated into the model in the form of a genomic relationship matrix. There are numerous methods of generating the genomic relationship matrix; we have used the second method described in VanRaden (2008). This method is based on the identity by state (IBS) of the marker genotypes, and hence it measures the realized relationship between the individuals. The Bayesian approach differs from the frequentist G-BLUP in terms of handling the variance components.
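A marker-based relationship matrix in the spirit of VanRaden's (2008) second method can be sketched as follows. This is a hedged illustration, not the thesis implementation: genotypes are assumed coded 0/1/2, allele frequencies are taken from the sample itself, and monomorphic markers are simply dropped.

```python
import numpy as np

# Hedged sketch of a marker-based relationship matrix in the spirit of
# VanRaden's (2008) second method: each 0/1/2 genotype column is centred by
# twice its allele frequency and weighted by 1 / (2 p_j (1 - p_j)), then the
# cross-products are averaged over markers.
def genomic_relationship(X):
    """X: n x m matrix of genotype counts coded 0/1/2."""
    p = X.mean(axis=0) / 2.0               # allele frequency of each marker
    keep = (p > 0.0) & (p < 1.0)           # drop monomorphic markers
    Z = X[:, keep] - 2.0 * p[keep]         # centred genotypes
    w = 1.0 / (2.0 * p[keep] * (1.0 - p[keep]))
    return (Z * w) @ Z.T / Z.shape[1]

rng = np.random.default_rng(3)
X = rng.binomial(2, 0.4, size=(5, 2000)).astype(float)  # toy genotypes
G = genomic_relationship(X)
print(G.shape, np.allclose(G, G.T))  # a symmetric n x n matrix
```

The per-marker weighting is what distinguishes this variant from the simpler pooled scaling: rare alleles receive larger weight, so each marker contributes comparably to the realized relationships.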
While frequentist methods commonly estimate the genomic breeding values with known variance components, in the Bayesian approach the variance components are estimated simultaneously with the breeding values (Hallander et al. 2010). Therefore the Bayesian inference is always based on variances that are up to date and specific to the analyzed trait, and it also allows the uncertainty of the variance components to be incorporated into the estimates of the breeding values. Even though e.g. ASREML (Gilmour et al. 2009) estimates the variance
components from the data, and hence satisfies the up-to-date criterion, the variances are not estimated simultaneously with the breeding values; instead, the pre-estimated variance components are treated as constants while estimating the breeding values.

3.4 Fully conditional posterior densities

As depicted in Figure 3.1, the model parameters β_0, σ_0², β, γ and u, located at the 3rd layer, are considered a priori independent of each other. The prior independence of the indicator and the effect size, as suggested by Kuo and Mallick (1998), leads to the most straightforward parametrization of a mixture prior for the effects. In conjunction with conjugate, or otherwise well chosen, prior densities it enables an easy derivation of a closed form fully conditional posterior distribution for every parameter of the model framework. The joint posterior distribution of the parameters, given the data, is proportional to the product of the joint prior and the likelihood. We can extract the fully conditional posterior density of an individual parameter from the joint posterior by treating all other parameters as constants and retaining only the terms that include the parameter in question. For example, the fully conditional posterior distribution of a single regression coefficient β_j, given all other parameters and the data, is derived from the joint distribution simply by selecting only the terms including β_j, i.e. the likelihood and the conditional prior p(β_j | σ_j²).
Under the full multilocus association model (3.1) we get the following closed form fully conditional posterior distributions for the model parameters (for brevity, "·" denotes the data and all parameters except the one in question):

β_0 | · ~ N( (1/n) Σ_{i=1}^n ( y_i − Σ_{j=1}^p γ_j β_j x_ij − u_i ),  σ_0²/n ),  (3.4)

σ_0² | · ~ Inv-χ²( n, (1/n) Σ_{i=1}^n ( y_i − β_0 − Σ_{j=1}^p γ_j β_j x_ij − u_i )² ),  (3.5)

β_j | · ~ N(μ_j, s_j²),  (3.6)

where

μ_j = [ Σ_{i=1}^n γ_j x_ij ( y_i − β_0 − Σ_{l≠j} γ_l β_l x_il − u_i ) ] / [ Σ_{i=1}^n (γ_j x_ij)² + σ_0²/σ_j² ],

s_j² = σ_0² / [ Σ_{i=1}^n (γ_j x_ij)² + σ_0²/σ_j² ].
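The single-site update implied by (3.6) can be sketched as a Gibbs sweep over the coefficients. The snippet below is a minimal illustration on toy simulated data (our own invented values; intercept, indicators, polygenic part and variances held fixed), not the thesis implementation:

```python
import numpy as np

# Minimal sketch of one Gibbs sweep using the fully conditional posterior (3.6):
# each beta_j is drawn from N(mu_j, s_j^2), with mu_j and s_j^2 computed from
# partial residuals that exclude marker j's own contribution.
rng = np.random.default_rng(4)
n, p = 200, 10
X = rng.normal(size=(n, p))
y = 1.0 + 0.8 * X[:, 0] + rng.normal(scale=0.5, size=n)  # one true effect

beta0, u = 1.0, np.zeros(n)            # intercept and polygenic part held fixed
beta, gamma = np.zeros(p), np.ones(p)  # start from zero effects, all included
sigma0_2, sigma_j2 = 0.25, 1.0         # residual and effect variances (fixed)

for j in range(p):
    # partial residual: remove everything except marker j's contribution
    r = y - beta0 - X @ (gamma * beta) + gamma[j] * beta[j] * X[:, j] - u
    denom = np.sum((gamma[j] * X[:, j]) ** 2) + sigma0_2 / sigma_j2
    mu_j = gamma[j] * X[:, j] @ r / denom
    s_j2 = sigma0_2 / denom
    beta[j] = rng.normal(mu_j, np.sqrt(s_j2))

print(beta[0])  # should land near the simulated effect of 0.8
```

Iterating such sweeps, together with the updates (3.4) and (3.5) for β_0 and σ_0², gives the full sampler; the GEM algorithm of I–III replaces the draws with conditional maximizations of the same densities.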
More informationAssociation studies and regression
Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration
More informationLecture 2: Genetic Association Testing with Quantitative Traits. Summer Institute in Statistical Genetics 2017
Lecture 2: Genetic Association Testing with Quantitative Traits Instructors: Timothy Thornton and Michael Wu Summer Institute in Statistical Genetics 2017 1 / 29 Introduction to Quantitative Trait Mapping
More informationHERITABILITY ESTIMATION USING A REGULARIZED REGRESSION APPROACH (HERRA)
BIRS 016 1 HERITABILITY ESTIMATION USING A REGULARIZED REGRESSION APPROACH (HERRA) Malka Gorfine, Tel Aviv University, Israel Joint work with Li Hsu, FHCRC, Seattle, USA BIRS 016 The concept of heritability
More informationMS-C1620 Statistical inference
MS-C1620 Statistical inference 10 Linear regression III Joni Virta Department of Mathematics and Systems Analysis School of Science Aalto University Academic year 2018 2019 Period III - IV 1 / 32 Contents
More information(Genome-wide) association analysis
(Genome-wide) association analysis 1 Key concepts Mapping QTL by association relies on linkage disequilibrium in the population; LD can be caused by close linkage between a QTL and marker (= good) or by
More informationLarge-scale Ordinal Collaborative Filtering
Large-scale Ordinal Collaborative Filtering Ulrich Paquet, Blaise Thomson, and Ole Winther Microsoft Research Cambridge, University of Cambridge, Technical University of Denmark ulripa@microsoft.com,brmt2@cam.ac.uk,owi@imm.dtu.dk
More informationPattern Recognition and Machine Learning
Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability
More informationBayesian model selection: methodology, computation and applications
Bayesian model selection: methodology, computation and applications David Nott Department of Statistics and Applied Probability National University of Singapore Statistical Genomics Summer School Program
More informationBayesian Grouped Horseshoe Regression with Application to Additive Models
Bayesian Grouped Horseshoe Regression with Application to Additive Models Zemei Xu, Daniel F. Schmidt, Enes Makalic, Guoqi Qian, and John L. Hopper Centre for Epidemiology and Biostatistics, Melbourne
More information1 Springer. Nan M. Laird Christoph Lange. The Fundamentals of Modern Statistical Genetics
1 Springer Nan M. Laird Christoph Lange The Fundamentals of Modern Statistical Genetics 1 Introduction to Statistical Genetics and Background in Molecular Genetics 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
More informationPrinciples of Bayesian Inference
Principles of Bayesian Inference Sudipto Banerjee and Andrew O. Finley 2 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department
More informationThe Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations
The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations John R. Michael, Significance, Inc. and William R. Schucany, Southern Methodist University The mixture
More informationLecture 8 Genomic Selection
Lecture 8 Genomic Selection Guilherme J. M. Rosa University of Wisconsin-Madison Mixed Models in Quantitative Genetics SISG, Seattle 18 0 Setember 018 OUTLINE Marker Assisted Selection Genomic Selection
More informationGibbs Sampling in Linear Models #2
Gibbs Sampling in Linear Models #2 Econ 690 Purdue University Outline 1 Linear Regression Model with a Changepoint Example with Temperature Data 2 The Seemingly Unrelated Regressions Model 3 Gibbs sampling
More informationLecture 9. QTL Mapping 2: Outbred Populations
Lecture 9 QTL Mapping 2: Outbred Populations Bruce Walsh. Aug 2004. Royal Veterinary and Agricultural University, Denmark The major difference between QTL analysis using inbred-line crosses vs. outbred
More informationIntroduction to Probabilistic Machine Learning
Introduction to Probabilistic Machine Learning Piyush Rai Dept. of CSE, IIT Kanpur (Mini-course 1) Nov 03, 2015 Piyush Rai (IIT Kanpur) Introduction to Probabilistic Machine Learning 1 Machine Learning
More informationLECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b)
LECTURE 5 NOTES 1. Bayesian point estimators. In the conventional (frequentist) approach to statistical inference, the parameter θ Θ is considered a fixed quantity. In the Bayesian approach, it is considered
More informationPrinciples of Bayesian Inference
Principles of Bayesian Inference Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department
More informationBayesian variable selection via. Penalized credible regions. Brian Reich, NCSU. Joint work with. Howard Bondell and Ander Wilson
Bayesian variable selection via penalized credible regions Brian Reich, NC State Joint work with Howard Bondell and Ander Wilson Brian Reich, NCSU Penalized credible regions 1 Motivation big p, small n
More informationSome models of genomic selection
Munich, December 2013 What is the talk about? Barley! Steptoe x Morex barley mapping population Steptoe x Morex barley mapping population genotyping from Close at al., 2009 and phenotyping from cite http://wheat.pw.usda.gov/ggpages/sxm/
More informationQTL model selection: key players
Bayesian Interval Mapping. Bayesian strategy -9. Markov chain sampling 0-7. sampling genetic architectures 8-5 4. criteria for model selection 6-44 QTL : Bayes Seattle SISG: Yandell 008 QTL model selection:
More informationDAG models and Markov Chain Monte Carlo methods a short overview
DAG models and Markov Chain Monte Carlo methods a short overview Søren Højsgaard Institute of Genetics and Biotechnology University of Aarhus August 18, 2008 Printed: August 18, 2008 File: DAGMC-Lecture.tex
More informationIntroduction to Gaussian Processes
Introduction to Gaussian Processes Iain Murray murray@cs.toronto.edu CSC255, Introduction to Machine Learning, Fall 28 Dept. Computer Science, University of Toronto The problem Learn scalar function of
More informationHierarchical models. Dr. Jarad Niemi. August 31, Iowa State University. Jarad Niemi (Iowa State) Hierarchical models August 31, / 31
Hierarchical models Dr. Jarad Niemi Iowa State University August 31, 2017 Jarad Niemi (Iowa State) Hierarchical models August 31, 2017 1 / 31 Normal hierarchical model Let Y ig N(θ g, σ 2 ) for i = 1,...,
More informationStat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2
Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, 2010 Jeffreys priors Lecturer: Michael I. Jordan Scribe: Timothy Hunter 1 Priors for the multivariate Gaussian Consider a multivariate
More informationLinear Models A linear model is defined by the expression
Linear Models A linear model is defined by the expression x = F β + ɛ. where x = (x 1, x 2,..., x n ) is vector of size n usually known as the response vector. β = (β 1, β 2,..., β p ) is the transpose
More informationMotivation Scale Mixutres of Normals Finite Gaussian Mixtures Skew-Normal Models. Mixture Models. Econ 690. Purdue University
Econ 690 Purdue University In virtually all of the previous lectures, our models have made use of normality assumptions. From a computational point of view, the reason for this assumption is clear: combined
More informationCSC2535: Computation in Neural Networks Lecture 7: Variational Bayesian Learning & Model Selection
CSC2535: Computation in Neural Networks Lecture 7: Variational Bayesian Learning & Model Selection (non-examinable material) Matthew J. Beal February 27, 2004 www.variational-bayes.org Bayesian Model Selection
More informationBayesian Linear Models
Bayesian Linear Models Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is
More informationIntegrated Non-Factorized Variational Inference
Integrated Non-Factorized Variational Inference Shaobo Han, Xuejun Liao and Lawrence Carin Duke University February 27, 2014 S. Han et al. Integrated Non-Factorized Variational Inference February 27, 2014
More informationBayesian data analysis in practice: Three simple examples
Bayesian data analysis in practice: Three simple examples Martin P. Tingley Introduction These notes cover three examples I presented at Climatea on 5 October 0. Matlab code is available by request to
More informationMachine Learning Techniques for Computer Vision
Machine Learning Techniques for Computer Vision Part 2: Unsupervised Learning Microsoft Research Cambridge x 3 1 0.5 0.2 0 0.5 0.3 0 0.5 1 ECCV 2004, Prague x 2 x 1 Overview of Part 2 Mixture models EM
More informationOutline Lecture 2 2(32)
Outline Lecture (3), Lecture Linear Regression and Classification it is our firm belief that an understanding of linear models is essential for understanding nonlinear ones Thomas Schön Division of Automatic
More information