Bayesian Multilocus Association Models for Prediction and Mapping of Genome-Wide Data


Bayesian Multilocus Association Models for Prediction and Mapping of Genome-Wide Data

DOCTORAL THESIS IN ANIMAL SCIENCE
Hanni P. Kärkkäinen

ACADEMIC DISSERTATION

To be presented, with the permission of the Faculty of Agriculture and Forestry of the University of Helsinki, for public criticism in the Lecture Hall of Koetilantie 5, Helsinki, on November 15th 2013, at 12 o'clock noon.

Helsinki 2013
DEPARTMENT OF AGRICULTURAL SCIENCES PUBLICATIONS 20

Supervisor: Professor Mikko J. Sillanpää, University of Oulu, Department of Mathematical Sciences, Department of Biology and Biocenter Oulu, P.O. Box 3000, FIN Oulu, Finland

Co-supervisor: Adjunct Professor Jarmo Juga, University of Helsinki, Department of Agricultural Sciences, P.O. Box 27, FIN Helsinki, Finland

Reviewers: Professor Daniel Sorensen, Aarhus University, Department of Molecular Biology and Genetics, P.O. Box 50, DK-8830 Tjele, Denmark; Professor Otso Ovaskainen, University of Helsinki, Department of Biosciences, P.O. Box 56, FIN Helsinki, Finland

Opponent: Senior Researcher Luc Janss, Aarhus University, Department of Molecular Biology and Genetics, P.O. Box 50, DK-8830 Tjele, Denmark

ISBN (Paperback)
ISBN (PDF)
Electronic publication at
Unigrafia, Helsinki 2013

List of original publications

The following original papers are referred to in the text by their Roman numerals.

(I) Kärkkäinen, H. P. and M. J. Sillanpää, 2012 Back to basics for Bayesian model building in genomic selection. Genetics 191:

(II) Kärkkäinen, H. P. and M. J. Sillanpää, 2012 Robustness of Bayesian multilocus association models to cryptic relatedness. Ann. Hum. Genet. 76: Corrected by: Corrigendum. Ann. Hum. Genet. 77:275.

(III) Kärkkäinen, H. P. and M. J. Sillanpää, 2013 Fast genomic predictions via Bayesian G-BLUP and multilocus models of threshold traits including censored Gaussian data. G3 (Bethesda) 3:

The publications have been reprinted with the kind permission of their copyright holders. The contributions of the authors HPK and MJS can be detailed as follows:

I Both authors were involved in the conception and design of the study. HPK derived the fully conditional posterior distributions and the GEM algorithm, implemented the algorithm with Matlab, performed the data analyses and drafted the manuscript. Both authors participated in the interpretation of results and critically revised the manuscript.

II Both authors were involved in the conception and design of the study. HPK derived the fully conditional posterior distributions and the GEM algorithm, implemented the algorithm with Matlab, performed the data analyses and drafted the manuscript. Both authors participated in the interpretation of results and critically revised the manuscript.

III Both authors were involved in the conception and design of the study. HPK derived the fully conditional posterior distributions and the GEM algorithm, implemented the algorithm with Matlab, performed the data analyses and drafted the manuscript. Both authors participated in the interpretation of results and critically revised the manuscript.

Contents

1 Introduction
2 Objectives of the study
3 Hierarchical Bayesian model
   3.1 Gaussian likelihood
   3.2 Shrinkage inducing priors
      3.2.1 Hierarchical formulation of the prior densities
   3.3 Sub-models
      3.3.1 Polygenic component
      3.3.2 Indicator
      3.3.3 Hyperprior
      3.3.4 Student's t vs. Laplace prior
      3.3.5 Bayesian LASSO and its extensions
      3.3.6 Bayesian G-BLUP
   3.4 Fully conditional posterior densities
Threshold model
   Binary response
   Censored Gaussian response
Parameter estimation
   Generalized expectation-maximization
   Prior selection in MAP estimation
   GEM-algorithm for a MAP estimate
Example analyses
   Data sets
      XIII QTL-MAS Workshop data
      Real pig (Sus scrofa) data
      Human HapMap data
      Discrete and censored data
   Pre-selection of the markers
   Genomic prediction
   Association mapping
   Decision making
   Diagnostics
   Of speed and convergence

6 Conclusions
   Current status
   What have we learned?
   What's next?

Foreword

Genome-wide marker data is used in animal and plant breeding in computing genomic breeding values, and in human genetics in identifying disease susceptibility genes, predicting unobserved phenotypes and assessing disease risks. While the tremendous number of markers available for easy and cost-effective genotyping is an invaluable asset in genetic research and in animal and plant breeding, the ever increasing data sets are placing heavy demands on the statistical analysis methodology. The statistical methods proposed for genomic selection are based either on traditional best linear unbiased prediction (BLUP) or on different Bayesian multilocus association models. In human genetics the most prevalent approach is a single-SNP association model. The thesis consists of three original articles that aim to obtain further understanding of the behavior of the different Bayesian multilocus association models and of the instances in which different methods work best, to seek connections between the different Bayesian models, and to develop a Bayesian multilocus association model framework, along with an efficient parameter estimation machinery, that can be utilized in phenotype prediction, genomic breeding value estimation and quantitative trait locus (QTL) location and effect estimation from a variety of genome-wide data.

1 Introduction

The discovery of single nucleotide polymorphisms (SNPs), in conjunction with the utilization of microarray technology in high-throughput genotyping, has dramatically increased the availability of genome-wide sets of molecular markers. Whole genome SNP chips are available for a wide range of species, including humans, agriculturally important plant and animal species, and genetic model organisms. In human genetics the common goal of a genome-wide association (GWA) study is to detect disease susceptibility genes, predict unobserved phenotypes, and assess disease risks at the individual level (Lee

et al. 2008; de los Campos et al. 2010). The animal and plant breeders, on the other hand, are mainly interested in estimating genomic breeding values for genomic selection (Eggen 2012; Nakaya and Isobe 2012). Genomic selection refers to marker-assisted selection that uses genome-wide marker information directly in predicting genomic breeding values, rather than first identifying the causal genes (Meuwissen et al. 2001). The basic principle of genomic selection includes a set of individuals, known as the training set or the reference population, with phenotypic records and genotypic information from a whole-genome SNP array, and a statistical model explaining the connection between the marker genotypes and the phenotypic observations. The training set data is employed in estimating the effects of the SNP markers or genotypes on the phenotype, that is, the parameters of the model. The acquired information is then used in predicting the heritable part of the phenotype, i.e. the genomic breeding value, of new individuals (the prediction set) that have only genotypic information available. In animal and plant breeding, the most commonly used approach to predict genomic breeding values based on molecular markers is the genomic best linear unbiased prediction or G-BLUP, a direct descendant of the pedigree-based best linear unbiased prediction (BLUP) model (Henderson 1975). G-BLUP employs the marker information in estimating genomic relationships between the individuals, and utilizes the marker-estimated genomic relationship matrix in a mixed model context (e.g. VanRaden 2008; Powell et al. 2010). A relatively recent but promising contender to the BLUP type of model in the genomic selection field is to apply simultaneous estimation and variable selection or regularization to multilocus association models (e.g. Meuwissen et al. 2001; Xu 2003). A multilocus association model uses the marker information directly by assigning different, possibly zero, effects to the marker alleles and quantifies the genomic breeding value of an individual as the sum of the marker effects. The advantage of a multilocus association model over G-BLUP is that the former allows the estimated effect size to vary over the set of markers, while the latter assumes a constant impact throughout the genome. In human genetics the genome-wide association methods are mainly used for mapping of complex genetic traits. Association mapping utilizes the linkage disequilibrium (LD) between the markers and the causal loci in locating the actual causal genes by searching for associations between the markers and the phenotype. Population-based association analyses are more powerful than within-family analyses in detecting the genetic loci associated with the phenotype of interest. As a drawback, the population-

based studies often suffer from an inflated rate of false positives due to population stratification (i.e. model misspecification in the presence of hidden population structure) and cryptic relatedness (i.e. model misspecification in the presence of sample structure) (see Kang et al. 2010). For example, if two populations in Hardy-Weinberg proportions with divergent allele frequencies are combined, the combined population may have a large amount of linkage disequilibrium simply due to the combination (e.g. Ewens and Spielman 1995). Equivalently, the sample structure of the data may lead to allelic association caused by close relatedness between the individuals rather than true association between the marker and the trait. As e.g. PLINK (Purcell et al. 2007) omits the sample and population structure from the model, the artificial linkage disequilibrium is likely to cause false positive and negative signals for marker loci without any connection to the studied trait. Although some other heavily-used association methods, including e.g. TASSEL (Bradbury et al. 2007), GenABEL (Aulchenko et al. 2007), EMMA (Kang et al. 2008) and EMMAX (Kang et al. 2010), provide a sample structure correction, they consider only one marker at a time, ignoring the possible effects of the other major loci. This is less than ideal in a genome-wide study of a complex trait, as such traits are assumed to be affected by a multitude of genes (Weeks and Lathrop 1995). The problem with a multilocus association model applied to a genome-wide data set is oversaturation: since usually the number of SNP markers is orders of magnitude greater than the number of individuals, there are far more explanatory variables than observations in the model. This leads to a situation where some kind of selection or regularization of the predictors is required, either by selecting a subset of the variables that explains a large proportion of the variation, by using orthogonal or nonorthogonal combinations of the variables, or by shrinking the effects of the variables towards zero (e.g. Sillanpää and Bhattacharjee 2005; Hoggart et al. 2008; O'Hara and Sillanpää 2009; Wu et al. 2009; Ayers and Cordell 2010; Cho et al. 2010). The appeal of the shrunken estimates is that these methods keep the dimension constant across the possible models by not actually selecting a subset of variables, but instead setting the effect of unimportant ones to (or near) zero. The drawback is that the estimates tend to be biased towards too small values. The methods discarding markers irrelevant to the phenotype are often referred to as variable selection, while the ones assigning a penalty term to shrink the marker effects towards zero are considered as variable regularization.

Contrary to the frequentist way of deriving a shrinkage estimator by subtracting a penalty from the gain function (in other words, by adding a penalty to the loss function), in the Bayesian context the regularization mechanism is included in the model by specifying an appropriate prior density for the regression coefficients. A penalized maximum likelihood estimate for the regression coefficients β is acquired by maximizing the penalized gain function

$$\hat{\beta}_{\text{PML}} = \arg\max_{\beta}\,\bigl[\log(p(\text{data}\mid\beta)) - \lambda J(\beta)\bigr], \qquad (1.1)$$

where log(p(data | β)) is the log-likelihood and J(β) a penalty function. Commonly used penalty functions are derived from the L2 and L1 norms of the regression coefficients,

$$J(\beta) = \|\beta\|_2^2 = \sum_{j=1}^{p}\beta_j^2 \qquad\text{and}\qquad J(\beta) = \|\beta\|_1 = \sum_{j=1}^{p}|\beta_j|,$$

leading to Ridge Regression (Hoerl 1962) and LASSO (Tibshirani 1996) estimates, respectively. The frequentist penalty function is connected to the prior density of a Bayesian model, as the exponent of the function maximized in the frequentist method equals the product

$$\exp\bigl(\log(p(\text{data}\mid\beta)) - \lambda J(\beta)\bigr) = p(\text{data}\mid\beta)\,\exp(-\lambda J(\beta)), \qquad (1.2)$$

where p(data | β) is the likelihood and exp(−λJ(β)) represents the prior density function. For example, it can easily be seen that the Ridge Regression penalty corresponds to a Gaussian prior density, as exp(−λ Σ_{j=1}^p β_j²) is a kernel of a Gaussian probability density function. Similarly the L1 penalty corresponds to a double exponential or Laplace density. Although it is clearly more logical to consider the assumptions about the model sparseness as a part of the model (the prior is a part of the model) rather than a part of the estimator (a penalty is a part of the estimator), the difference may seem trivial in practice. However, the fact that in the Bayesian context the model includes all available information permits the estimator to always be the same, either the whole posterior density or a maximum a posteriori (MAP) point estimate, which in turn enables a straightforward translation of the model into an algorithm. In the Bayesian context the variable regularization is included in the model by specifying a spike and slab prior for the regression coefficients, with the spike being the probability mass centered near zero and the slab the probability mass distributed over the nonzero values (see O'Hara and Sillanpää 2009).
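Equation (1.2) says that maximizing a penalized log-likelihood is the same as maximizing a posterior built from the likelihood and the prior kernel exp(−λJ(β)). The following small Python sketch checks this numerically for the Ridge case: the numerically maximized penalized gain function and the closed-form Gaussian-prior MAP solution coincide. The toy data and all numerical values are hypothetical, not from the thesis.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, p = 60, 8
X = rng.standard_normal((n, p))
beta_true = np.concatenate([[2.0, -1.5, 1.0], np.zeros(p - 3)])
y = X @ beta_true + rng.standard_normal(n)
lam = 4.0

def neg_penalized_gain(beta):
    # negative log-likelihood (up to a constant) plus lam * J(beta), J = squared L2 norm
    resid = y - X @ beta
    return 0.5 * resid @ resid + lam * beta @ beta

beta_pml = minimize(neg_penalized_gain, np.zeros(p)).x
# Closed-form ridge solution of the same objective: (X'X + 2*lam*I)^{-1} X'y,
# i.e. the MAP estimate under an i.i.d. zero-mean Gaussian prior on beta
beta_ridge = np.linalg.solve(X.T @ X + 2.0 * lam * np.eye(p), X.T @ y)
print(np.max(np.abs(beta_pml - beta_ridge)))   # should be essentially zero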

This prior represents the assumption that only a small proportion of the predictors have a non-negligible effect (the "slab"), while the majority of the effects are close to zero (the "spike"). The Bayesian models proposed in the literature differ with respect to the spike and slab prior densities given for the regression coefficients. The desired shape for the prior density may be acquired either as a mixture of two densities, in which case the model includes a dummy variable indicating whether the effect of a given explanatory variable comes from the spike or from the slab part of the prior, or alternatively a single prior density approximating the spike and slab shape may be assigned directly to the regression coefficients. In the latter case, the probability density functions commonly used for imitating the spike and slab shape are the Student's t (e.g. BayesA by Meuwissen et al. 2001; Xu 2003; Yi and Banerjee 2009) and Laplace densities (e.g. Park and Casella 2008; Yi and Xu 2008; de los Campos et al. 2009; Xu 2010; Li et al. 2011). Due to the connection to the frequentist L1 penalty function, the models with a Laplace prior density are commonly denoted as the Bayesian LASSO (Park and Casella 2008). Both the Student's t and Laplace density functions possess several favorable features, including high kurtosis and heavy tails, that make them worthy candidates for shrinkage inducing priors. Compared to the Gaussian density, these functions place a greater probability mass near zero and a higher probability on large values, inducing strong shrinkage on the intermediate sized estimate values and proportionally less shrinkage on the large values and the values near zero. While a Gaussian prior density, or equivalently frequentist Ridge Regression, assigns the same penalty to all of the regression coefficients, the heavy-tailed functions produce a clearer distinction between large and small estimate values by pushing the intermediate sized values in either direction. For this reason the method is sometimes denoted as adaptive shrinkage. Several modifications of the indicator-type methods have been introduced, differing with respect to the mixture components (the distributions that are used to form the mixture distribution) set for the regression coefficients and the hierarchical structure of the prior (the dependency between the indicator and the marker effect, and the participation of the indicator in the likelihood). While the stochastic search variable selection (SSVS) models consider the spike and slab as a mixture of two normal distributions (George and McCulloch 1993; Verbyla et al. 2009), or two Student's t distributions (e.g. Yi et al. 2003), the majority of the methods straightforwardly set the regression coefficient to zero when the indicator is zero (so the spike is in fact a point mass located at zero). A prior consisting of a mixture

of a Student's t density and a point mass at zero has been used in several methods, including BayesB (Meuwissen et al. 2001), Hayashi and Iwata (2010) and Habier et al. (2011). A similar mixture based on a Laplace density has been used by Meuwissen et al. (2009) and Shepherd et al. (2010). The simplest hierarchical structure of the prior density, proposed by Kuo and Mallick (1998), determines the effect of the marker j on the phenotype as a product of the indicator γ_j and the effect size β_j, and considers these two to be a priori independent. Hence the joint prior of the marker effect γ_j β_j becomes simply p(γ_j, β_j) = p(γ_j)p(β_j), where p(γ_j) is a Bernoulli density with a prior probability for a marker to be linked to the trait and p(β_j) is the Gaussian, Student's t or Laplace prior density given for the effect size. Other types of hierarchical structures presented in the literature include BayesB (Meuwissen et al. 2001), where the marker effect is given by β_j alone since the likelihood does not include the indicator; instead, the indicator acts through the effect variance. In Gibbs variable selection, on the other hand, the marker effect is considered as a product of the indicator and the effect size, but the prior density of the effect size is dependent on the indicator (Dellaportas et al. 2002). Whether the model is based on a Student's t, Laplace, or a mixture prior density, the intensity of the shrinkage produced by the prior is determined by the prior parameters (i.e. hyperparameters) defining the shape of the prior density function. The models proposed in the literature differ from each other in terms of the procedures they use to determine the prior parameters. In the original BayesA and BayesB the parameters of the Student's t prior density were defined to produce the desired genetic variance (Meuwissen et al. 2001). The Xu (2003) method is otherwise similar to BayesA, except that the prior parameters are estimated instead of being set to constant values. Similar modifications of BayesB have been considered by e.g. Yi and Xu (2008) and Habier et al. (2011). Under the Bayesian LASSO the prior parameters are more commonly estimated from the data (e.g. Yi and Xu 2008; de los Campos et al. 2009; Sun et al. 2010; Shepherd et al. 2010) than given as constants (Xu 2010). While the Bayesian models have proven workable, efficient and flexible, the tremendous number of markers in the modern genome-wide data sets makes the computational methods traditionally connected to Bayesian estimation, e.g. Markov chain Monte Carlo (MCMC), quite slow and cumbersome.
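The Kuo and Mallick (1998) parametrization described above is easy to simulate from, which is a convenient way to see what the implied spike and slab marker-effect prior looks like. The Python sketch below (hypothetical parameter values, not taken from papers I-III) draws indicators from a Bernoulli(π) prior and effect sizes from a Student's t density expressed as a scale mixture of normals, so the realized marker effects γ_j β_j are exactly zero for most markers and occasionally large.

import numpy as np

rng = np.random.default_rng(2)
p, pi = 10_000, 0.01                         # markers and prior inclusion probability
gamma = rng.binomial(1, pi, size=p)          # Bernoulli(pi) indicators

# Effect sizes via the scale mixture: sigma2_j ~ scaled Inv-chi^2(nu, tau2),
# beta_j | sigma2_j ~ N(0, sigma2_j), giving a Student's t marginal for beta_j
nu, tau2 = 4.0, 0.01
sigma2_j = nu * tau2 / rng.chisquare(nu, size=p)
beta = rng.normal(0.0, np.sqrt(sigma2_j))

effects = gamma * beta                       # spike and slab marker effects
print((effects == 0).mean())                 # close to 1 - pi
print(np.sort(np.abs(effects))[-5:])         # a few markers carry sizeable effects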

For the same models, fast alternative estimation procedures have been proposed, most commonly based on estimating the maximum point of the posterior density (the MAP estimate), rather than the whole posterior distribution, by an expectation-maximization (EM) algorithm (Dempster et al. 1977; McLachlan and Krishnan 1997; for the methods see e.g. Yi and Banerjee 2009; Hayashi and Iwata 2010; Figueiredo 2003; Sun et al. 2010; Xu 2010; Meuwissen et al. 2009; Shepherd et al. 2010; Lee et al. 2010).

2 Objectives of the study

The objectives of this work are to 1) better understand the behavior of the different Bayesian multilocus association models, especially in the maximum a posteriori estimation context, and to obtain further information on the instances in which different methods work best, 2) seek connections between the different Bayesian models and try to see the different model variants as special cases or sub-models of a common model framework, 3) pay special attention to the significance of the parametrization and hierarchical structure of the model for elegant derivation and convergence properties of the estimation algorithm, and 4) develop a flexible and versatile Bayesian multilocus association model framework, along with an efficient parameter estimation machinery, that can be utilized in phenotype prediction, genomic breeding value estimation and QTL (quantitative trait loci) detection and effect estimation from a variety of genome-wide data. The original papers I–III contribute to the objectives in the following manner. In I we lay the foundation for our Bayesian model framework, examine the behavior and predictive performance of different sub-models and prior densities, including G-BLUP, and present a generalized expectation-maximization (GEM) algorithm for the parameter estimation. In II we apply selected parts of the model framework in a QTL mapping context and, in particular, consider the impact of an additional polygenic component on the performance of the model and the GEM algorithm. In III we generalize the model framework and the GEM algorithm for ordered categorical and censored Gaussian phenotypes.

3 Hierarchical Bayesian model

In Bayesian inference the learning from data is based on updating the prior belief concerning the model parameters into the posterior belief by applying the Bayes theorem. Let p(Θ) denote the joint prior density for the unknown parameters and p(data | Θ) the likelihood of the data given those parameters. Now the posterior density for the unknown parameters, given the data, is acquired from the Bayes formula

$$p(\Theta \mid \text{data}) = \frac{p(\text{data} \mid \Theta)\,p(\Theta)}{p(\text{data})} \;\propto\; p(\text{data} \mid \Theta)\,p(\Theta),$$

where the normalizing constant p(data) = ∫_Θ p(data | Θ) p(Θ) dΘ is the marginal likelihood of the data. As the marginal likelihood has a constant value, it is usually omitted from the computation, and the joint posterior density is considered to be proportional to the product of the likelihood and the joint prior density. In addition to the prior conception of the parameter values, the joint prior density expresses the mutual relationships of the parameters, e.g. whether the parameters are considered a priori independent or conditional on some other parameters. This definition is denoted as the hierarchical structure of the Bayesian model. Let, for example, the parameter vector be Θ = (θ_1, θ_2), and let θ_1 be a priori dependent on θ_2. Now the joint prior is given by p(Θ) = p(θ_1 | θ_2) p(θ_2), and the dependent parameter θ_1 is said to be located on a lower layer of the model hierarchy.

In its complete form our hierarchical Bayesian model framework, depicted as a directed acyclic graph in Figure 3.1, consists of two separate parts, the linear Gaussian model and the threshold model. Under the linear Gaussian model the phenotype measurements are assumed to be continuous and to follow a Gaussian density, while the additional threshold model handles binary, ordinal and censored Gaussian observations. The hierarchical model has a total of six layers, two of which are optional. The observed data, located on the 1st and 2nd layers in the graph, comprises phenotype and genotype information and, optionally, a known pedigree of a sample of related individuals. The continuous Gaussian phenotypes, denoted by a vector y, and the genetic data matrix X consisting of the genotypes of biallelic SNP markers, are located on the observed data layer of the linear Gaussian model. As the binary, ordinal and censored Gaussian observations are handled via a latent variable parametrization, they are located on the optional observed layer of the threshold model in Figure 3.1. The possible pedigree information is given in the form of an additive genetic relationship matrix (Lange 1997), located on the optional observed layer in the directed acyclic graph (Figure 3.1) to represent its non-compulsory nature.
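As a concrete, deliberately tiny illustration of the Bayes formula above, the following Python sketch evaluates likelihood times prior on a grid for a Gaussian mean with known variance and compares the normalized result to the conjugate closed-form posterior; the marginal likelihood p(data) never needs to be computed. The example and its numbers are hypothetical and not taken from the thesis.

import numpy as np

rng = np.random.default_rng(3)
sigma = 2.0                                    # known observation standard deviation
y = rng.normal(1.5, sigma, size=20)            # data with unknown mean
mu0, tau0 = 0.0, 10.0                          # Gaussian prior on the mean

grid = np.linspace(-5.0, 5.0, 2001)
log_lik = -0.5 * ((y[:, None] - grid[None, :]) ** 2).sum(axis=0) / sigma**2
log_prior = -0.5 * (grid - mu0) ** 2 / tau0**2
log_post = log_lik + log_prior                 # posterior proportional to likelihood x prior
post = np.exp(log_post - log_post.max())
post /= post.sum() * (grid[1] - grid[0])       # normalize numerically; p(data) is a constant

# Conjugate closed-form posterior mean for comparison
prec = 1.0 / tau0**2 + len(y) / sigma**2
mu_post = (mu0 / tau0**2 + y.sum() / sigma**2) / prec
print(mu_post, grid[np.argmax(post)])          # closed form vs. grid-based mode agree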

Figure 3.1: Hierarchical structure of the model framework. The ellipses represent random parameters and rectangles fixed values, while the round-cornered rectangles may be either, depending on the selected model. Solid arrows indicate statistical dependency and dashed arrows a functional relationship. The background boxes indicate the main modules of the model framework.

In the following sections we will first consider the linear Gaussian model part, and only after that focus on the threshold model for the discrete or censored data.

3.1 Gaussian likelihood

At the center of a Bayesian model is the likelihood function of the data given the model parameters. The likelihood is based on the probability model (sometimes called the sampling model) determining how the dependent variables, i.e. the traits, are connected to the explanatory variables. In our model framework the Gaussian phenotypes are connected to the marker and pedigree information with a linear Gaussian association model (see Figure 3.1)

y = β_0 + XΓβ + Zu + ε,    (3.1)

where y denotes the phenotypic records of n individuals, β_0 is the population intercept, and ε corresponds to the residuals, which are assumed normal and independent, ε ~ MVN(0, I_n σ_0²). If necessary, the intercept β_0 can easily be replaced with a vector of environmental variables. The second term on the right-hand side of equation (3.1) comprises the observed genotypes X and the allele substitution effects Γβ. The observed genotypes of the p biallelic SNP markers are coded with respect to the number of copies of the rarer allele (0, 1 and 2) and standardized to have zero mean and unit variance. In the complete model the allele substitution effect (see Marker effect in Figure 3.1) is modeled, following Kuo and Mallick (1998), as a product of the size of the effect and a variable indicating whether the marker is linked to the phenotype. In equation (3.1), β denotes the additive effect sizes, and Γ is a diagonal matrix of indicator variables, whose jth diagonal element γ_j has value 1 if the jth SNP marker is included in the model, and 0 otherwise. As depicted in Figure 3.1, the indicator and the effect size are considered a priori independent. The term u in equation (3.1) denotes the additive polygenic effects due to the combined effect of an infinite number of loci, and Z is a design matrix connecting the polygenic effects to the observed phenotypes. The individuals, or their phenotypic values y_i, are assumed conditionally independent given the genotype information X and the polygenic effect u.
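To make the roles of X, Γ, β and ε in model (3.1) concrete, the following Python sketch simulates a small data set under the model, without the polygenic term. All dimensions, allele frequencies and parameter values are hypothetical and chosen for illustration only.

import numpy as np

rng = np.random.default_rng(4)
n, p = 200, 1000
freq = rng.uniform(0.1, 0.5, size=p)                    # allele frequencies per marker
X = rng.binomial(2, freq, size=(n, p)).astype(float)    # 0/1/2 genotype codes
X = (X - X.mean(axis=0)) / X.std(axis=0)                # standardize each marker column

gamma = rng.binomial(1, 0.01, size=p)                   # indicators: few markers are linked
beta = rng.normal(0.0, 0.5, size=p)                     # additive effect sizes
beta0, sigma0 = 10.0, 1.0                               # intercept and residual sd
y = beta0 + X @ (gamma * beta) + rng.normal(0.0, sigma0, size=n)
print(y[:5])                                            # simulated phenotypic records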

This assumption and the linear marker association model (3.1) described above give a multivariate normal likelihood

$$p(y \mid \beta_0, \sigma_0^2, \beta, \Gamma, u, X, Z) \;\propto\; \det(I_n\sigma_0^2)^{-1/2}\, \exp\!\Bigl(-\tfrac{1}{2}\,(y - \beta_0 - X\Gamma\beta - Zu)'\,(I_n\sigma_0^2)^{-1}\,(y - \beta_0 - X\Gamma\beta - Zu)\Bigr) \qquad (3.2)$$

for the phenotypes given the parameter vector. Due to the independence of the observations, the likelihood can also be interpreted as a univariate normal N(β_0 + Σ_{j=1}^p γ_j β_j x_ij + u_i, σ_0²) given a single observation y_i and the appropriate parameters. The parameters of the multilocus association model that are present in the likelihood function are located on the model parameters layer of the linear Gaussian model in Figure 3.1.

3.2 Shrinkage inducing priors

The second essential component of a Bayesian model consists of the prior densities for the model parameters. The prior for a given parameter represents the a priori understanding of the plausibility of different parameter values. In some cases there is no reason to believe that one parameter value would be more plausible than another, a conception that is expressed with a flat or uninformative prior density, e.g. by setting p(β_0) ∝ 1 and p(σ_0²) ∝ 1/σ_0² (note the Noninformative uniform priors at layer 5 in Figure 3.1). In some cases, however, the prior density plays a most important role in the operation of the model. A central feature of handling an oversaturated model is the selection or regularization of the excess predictors. In the Bayesian context the regularization is included in the model by specifying such a prior density for the regression coefficients that it represents the a priori understanding that the majority of the predictors have only a negligible effect, while there are a few predictors with possibly large effect sizes. A prior that would evince this idea should consist of a probability mass centered near zero and a probability mass distributed over the nonzero values, including a reasonably high probability for large values. The probability density functions we have used for imitating this spike and slab shape are the Student's t (following e.g. Meuwissen et al. 2001; Xu 2003) and Laplace densities (following e.g. Park and Casella 2008; de los Campos et al. 2009), either alone or combined with a point mass at zero (e.g. Meuwissen et al. 2001; Shepherd et al. 2010). In our full model framework, (3.1) and Figure 3.1, the mixture prior with the point mass at zero is accomplished by adding a dummy variable to indicate whether the effect of a given predictor variable is included into

the model or not. Following Kuo and Mallick (1998), the marker effects are modeled as a product of the indicator variable γ_j and the effect size β_j, which are considered a priori independent; hence the joint prior of the marker effect becomes simply p(γ_j, β_j) = p(γ_j)p(β_j), where p(γ_j) is a Bernoulli density with a prior probability π = P(γ_j = 1) for a marker to be linked to the trait and p(β_j) is the prior density for the effect size.

3.2.1 Hierarchical formulation of the prior densities

The Student's t and the Laplace distribution can both be expressed as a scale mixture of normal distributions with a common mean and effect specific variances. The hierarchical formulation of a Student's t distribution with ν degrees of freedom, location µ and scale τ² is a scale mixture of normal densities with mean µ and variances following a scaled inverse-χ² distribution with ν degrees of freedom and scale τ²,

$$\beta_j \mid \sigma_j^2 \sim N(\mu, \sigma_j^2), \quad \sigma_j^2 \mid \nu, \tau^2 \sim \text{Inv-}\chi^2(\nu, \tau^2) \;\;\Longrightarrow\;\; \beta_j \sim t_\nu(\mu, \tau^2),$$

while a Laplace density with location µ and rate λ can be presented in a similar manner, the mixing distribution now being an exponential one,

$$\beta_j \mid \sigma_j^2 \sim N(\mu, \sigma_j^2), \quad \sigma_j^2 \mid \lambda^2 \sim \text{Exp}(\lambda^2/2) \;\;\Longrightarrow\;\; \beta_j \sim \text{Laplace}(\mu, \lambda).$$

The hierarchical representation of the prior densities bears a twofold advantage (I). First, the derivation of the fully conditional posterior densities, and hence the derivation of the estimation algorithm, simplifies greatly. Within the MCMC world, the hierarchical formulation of the prior densities, also known as model or parameter expansion, is a well known device for simplifying computations by transforming the prior into a conjugate one and thus enabling Gibbs sampling. Conjugacy of a prior distribution means that the fully conditional posterior probability distribution of a given parameter will be of the same type as the prior distribution of that parameter, and hence we are guaranteed to get a closed form fully conditional posterior with a known probability density function. The hierarchical formulation of a prior density is also known to accelerate the convergence of an MCMC sampler by adding more working parts and therefore more space for the random walk to move in (see e.g. Gilks et al. 1996; Gelman et al. 2004; Gelman 2004). In maximum a posteriori (MAP) estimation, on the other hand, a commonly adopted approach to try to simplify the model is to integrate out the effect variances.
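The scale-mixture identities above can be checked by simulation. The Python sketch below (hypothetical parameter values) draws effects both through the two-stage hierarchy and directly from the marginal Student's t and Laplace densities, and compares quantiles; here the exponential mixing distribution is parameterized by its rate λ²/2, i.e. mean 2/λ², as in the formulas above.

import numpy as np

rng = np.random.default_rng(5)
m = 200_000

# Student's t as a scale mixture: sigma2 ~ scaled Inv-chi^2(nu, tau2) = nu*tau2 / chi2_nu
nu, tau2 = 4.0, 0.5
sigma2 = nu * tau2 / rng.chisquare(nu, size=m)
beta_mix_t = rng.normal(0.0, np.sqrt(sigma2))
beta_t = rng.standard_t(nu, size=m) * np.sqrt(tau2)      # direct t_nu(0, tau2) draws

# Laplace as a scale mixture: sigma2 ~ Exp(rate = lam^2/2), i.e. scale 2/lam^2
lam = 2.0
sigma2_l = rng.exponential(scale=2.0 / lam**2, size=m)
beta_mix_l = rng.normal(0.0, np.sqrt(sigma2_l))
beta_lap = rng.laplace(0.0, 1.0 / lam, size=m)           # direct Laplace(0, 1/lam) draws

q = [0.05, 0.25, 0.5, 0.75, 0.95]
print(np.quantile(beta_mix_t, q), np.quantile(beta_t, q))   # quantiles agree up to MC error
print(np.quantile(beta_mix_l, q), np.quantile(beta_lap, q))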

However, the conjugacy maintained by preserving the intermediate variance layer (layer 4 in Figure 3.1) is a valuable feature also for MAP estimation, as it enables the straightforward derivation of the fully conditional posterior density functions. Expressed as a scale mixture, the Student's t distribution leads to conjugate priors for the normal likelihood parameters, and hence is a perfect choice for a conjugate analysis. Although the decomposition of the Laplace prior does not provide conjugacy, it leads to a tractable fully conditional posterior density for the inverse of the effect variance. Second, the estimation algorithm is likely to behave better under a hierarchical model. Even though the marginal distributions of the marker effects are mathematically equivalent in hierarchical and non-hierarchical models, we noted in I that the parametrization and model structure alter the properties and behavior of the model, and thus have an influence on the mixing and convergence properties of an estimation algorithm, and also on the values of the actual estimates. We noted in I that in some cases the hierarchical Laplace model was clearly more accurate than its non-hierarchical counterpart. Also, contrary to the non-hierarchical version, the hierarchical Laplace model worked without the additional indicator variable, i.e. without a zero point mass in the prior of the marker effects. This simplification of the model leads not only to more straightforward implementation and faster estimation, but also to easier and more accurate selection of the prior parameters.

3.3 Sub-models

As mentioned above, we like to consider the full model in Figure 3.1 as a framework incorporating a set of model variants, or sub-models, embodying different components of the model framework. In I we covered a multitude of such variants, and also showed how the model variants correspond to the Bayesian phenotype prediction and genomic breeding value estimation methods proposed in the literature. The non-compulsory components of the multilocus association model comprise the polygenic component, the indicator variable and the 6th, optional hyperprior layer. The selection between the Student's t and the Laplace prior densities forms one means of modifying the prior density assigned for the marker effects, while the inclusion or exclusion of the indicator and the hyperprior layer forms another. The polygenic component, on the other hand, is clearly an external addition to the multilocus association model.

3.3.1 Polygenic component

The polygenic component u is included in the model to represent the genetic variation possibly not captured by the SNP markers and to account for putative residual dependencies between individuals (Yu et al. 2006). The sample or population structure is included in the model through the covariance matrix of the multivariate normal prior density given for the polygenic effect, u | σ_u², A ~ MVN(0, σ_u² A), where σ_u² is the polygenic variance component and A is the genetic relationship matrix. The genetic relationship matrix is either a pedigree-based additive genetic relationship matrix (see Lange 1997) (I and II), or, if there is no pedigree available, a finite locus approximation based on the markers not included in the actual multilocus association model, i.e. a genomic relationship matrix (II). The polygenic variance component σ_u² has been given an Inv-χ²(ν_u, τ_u²) prior distribution with suitable data specific parameter values. On the basis of the existing literature the need for an additional polygenic component within a multilocus association model is unclear. Many authors have found the polygenic component irrelevant (e.g. Calus and Veerkamp 2007; Pikkuhookana and Sillanpää 2009), while e.g. de los Campos et al. (2009) and Lund et al. (2009) see it as necessary. In I and II we examined the importance of the additional polygenic component in the genomic selection and association mapping contexts, respectively, with both simulated and real data. Within these works the estimates of the polygenic component were negligible, and had no influence on either the prediction accuracy (I) or the gene localization ability (II) of the model. None of the Bayesian multilocus models seemed to benefit from the addition of the polygenic component with either simulated (I and II) or real data (I), the phenotype of the latter most likely being quite polygenic in nature. The polygenic component did not find extra information even when the task was made as easy as possible by generating the polygenic component of the data using the same relationship matrix that was also used in the analyses (II). Therefore, in our experience, the polygenic component can safely be omitted from the multilocus association model (3.1).
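Given a relationship matrix A, the polygenic prior above is easy to work with through its Cholesky factor. The following Python sketch uses a hypothetical four-individual pedigree (two unrelated parents and their two full-sib offspring), not data from I or II, and draws one realization of u ~ MVN(0, σ_u² A).

import numpy as np

rng = np.random.default_rng(6)
# Toy additive relationship matrix: individuals 1 and 2 are unrelated parents,
# individuals 3 and 4 are their full-sib offspring (no inbreeding).
A = np.array([[1.0, 0.0, 0.5, 0.5],
              [0.0, 1.0, 0.5, 0.5],
              [0.5, 0.5, 1.0, 0.5],
              [0.5, 0.5, 0.5, 1.0]])
sigma2_u = 2.0
L = np.linalg.cholesky(sigma2_u * A)     # covariance factorized as L L'
u = L @ rng.standard_normal(4)           # one draw from MVN(0, sigma2_u * A)
print(u)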

19 hierarchical Laplace model does not seem to require the additional point mass at zero, on the contrary the model efficiency sustains damage if the indicator is added (Tables 2 5 in I). On the other hand, the Student s t model clearly benefits from the additional point mass. The latter observation is in strict concordance with the existing literature, as the superiority of BayesB (Student s t plus indicator) (Meuwissen et al. 2001) over BayesA (only Student s t) can be considered as common knowledge. While the main purpose of the indicator variable within our model framework is to participate in the mixture prior with the Student s t or Laplace densities, in II we have considered a pure indicator model. Under the Indicator model proposed in II, the prior for the effect sizes β j is Gaussian with zero mean and a predetermined variance, and therefore the prior for the marker effects γ j β j is a mixture of a Gaussian density and a point mass at zero. As the Gaussian prior introduces a constant shrinkage to the estimates, and hence the variable selection relies solely on the indicator, a Bayes factor based on the values of the indicators can be used in determining the significance of a marker effect. Contrary to phenotype or breeding value prediction, in gene mapping the significance of the individual marker effects is of importance. Nevertheless, the Indicator model in II is mainly considered as a curiosity and a proof of the power of a multilocus association treatment, as even an extremely simple multilocus association method may exceed the performance of a most sophisticated single marker method (Figure 1, A and B in II). The indicator has a Bernoulli prior with a prior probability π = P(γ j = 1) for the SNP j contributing to the trait. The value given for the probability π also represents our prior assumption of the proportion of the SNP markers that are linked to the trait. However, as the indicator affects the shrinkage of the marker effects concurrent with the shrinkage generated by the Student s t or the Laplace density, the parameters assigned for these densities affect the selection of π Hyperprior The optional hyperprior layer (the 6th layer in Figure 3.1) composes another facultative part of the model framework. The parameters of the prior densities (layer 5 in Figure 3.1) can be either predetermined or estimated simultaneously to the model parameters. As the prior densities for the effect size and the indicator are responsible for the regularization of the excess variables in the model, the impact of the parameter values of these priors is greater than of the other prior densities in the model. There- 19
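The indicator-based significance measure can be illustrated with a small numerical sketch: an estimated posterior inclusion probability for a marker is converted into a Bayes factor by comparing the posterior odds to the prior odds implied by π. The numbers below are hypothetical, and the exact Bayes factor formulation used in II is not reproduced here.

pi = 0.01                      # prior P(gamma_j = 1)
post_incl = 0.62               # estimated posterior P(gamma_j = 1 | data) for marker j

prior_odds = pi / (1.0 - pi)
posterior_odds = post_incl / (1.0 - post_incl)
bayes_factor = posterior_odds / prior_odds
print(bayes_factor)            # values much greater than 1 favour inclusion of the marker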

3.3.3 Hyperprior

The optional hyperprior layer (the 6th layer in Figure 3.1) composes another facultative part of the model framework. The parameters of the prior densities (layer 5 in Figure 3.1) can be either predetermined or estimated simultaneously with the model parameters. As the prior densities for the effect size and the indicator are responsible for the regularization of the excess variables in the model, the impact of the parameter values of these priors is greater than that of the other prior densities in the model. Therefore the putative estimation of the prior parameters is limited to these two parameters. The estimation of the prior parameters is depicted in Figure 3.1 by considering the priors for the indicator and the effect variance as random variables, and adding the 6th layer into the model. If the parameters of the prior densities are considered fixed, the optional hyperprior layer is absent from the model. The fixed prior parameter values can be determined e.g. by cross validation or by the Bayesian information criterion (see Sun et al. 2010). It is noteworthy that even if the prior parameters are estimated from the data, i.e. the 6th layer is present in the model, the need for predetermined values does not vanish, but simply passes to the next layer of the model hierarchy. Hence, inevitably, at the very bottom of the model hierarchy the user has to determine some values prior to the actual parameter estimation. The hyperprior given for the effect size is a conjugate Gamma(κ, ξ) density for the scale τ² of the inverse-χ² density under the Student's t model, or, respectively, for the rate λ² of the exponential density under the Laplace model. There is neither a conjugate prior nor a closed form posterior density available for the degrees of freedom parameter of the Student's t model, and hence we have decided to consider it as fixed (I). For the indicator variable, the prior probability π = P(γ_j = 1) of the marker j being linked to the trait is estimated with either an uninformative uniform Beta(1,1) or an informative Beta(a, b) density. The informative beta prior embodies our a priori assumed belief of the proportion of significant markers by considering a as the number of markers assumed to be linked to the trait and b as the number of markers assumed not to be linked (i.e. b = p − a, p being the number of markers in the data set).
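As a small sketch of the informative Beta(a, b) choice described above, with hypothetical counts, the prior mean of π is a/p and its spread follows from the usual beta moments.

import numpy as np

p = 10_000          # markers in the data set
a = 20              # markers assumed a priori to be linked to the trait
b = p - a           # markers assumed not to be linked
prior_mean = a / (a + b)                                        # equals a / p
prior_sd = np.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))        # Beta(a, b) standard deviation
print(prior_mean, prior_sd)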

3.3.4 Student's t vs. Laplace prior

In the original work I one of our main interests was to consider the pros and cons of the Student's t and Laplace prior densities. The advantage of the hierarchically formulated Student's t density as a prior is the extremely easy derivation of the fully conditional posterior densities. Although the hierarchical Laplace prior also leads to tractable fully conditional posterior functions, the derivation of the posterior for the effect variance is clearly more complicated than with the Student's t density. However, the Student's t prior has some shortcomings too. The first problem we encountered with the Student's t model was the estimation of the parameters of the prior densities (5th layer in Figure 3.1). We tried numerous hyperpriors for the effect variance and the indicator, but it appeared to be impossible to select ones leading to a reasonable estimate. Hence, after several attempts, we decided on treating the prior parameters of the Student's t model as given. Under the Laplace model there were no such complications, and the prior parameters of the Laplace model are estimated from the data. Therefore, in the Laplace model the 6th layer of Figure 3.1 is always included in the model, while in the Student's t model it is always excluded from the model. Due to its shape, the shrinking ability of the Student's t prior is weaker than that of the Laplace prior. While the hierarchical Laplace prior worked fine without the additional indicator variable, the Student's t prior required the additional point mass at zero in order to provide a strong enough shrinkage (Tables 2–5 in I). As pointed out previously, a low number of parameters is a desirable characteristic in a model. Apart from a single data set (Table 2 in I), the prediction accuracy of the Laplace model was higher compared to the Student's t model (Tables 3–5 in I). The better performance of the Laplace model may be partially due to the easier and hence more accurate prior selection, and partially due to the more favorable shape of the density itself. Also, as the prior parameters for the effect variance can be estimated, and hence there is an additional layer in the hierarchical model, the model may be more robust to the given hyperprior parameter values. Altogether, on the basis of our findings in I, we feel that the hierarchical Laplace model appears to have an advantage over the Student's t model, and we therefore decided to concentrate on the former in II and in III.

3.3.5 Bayesian LASSO and its extensions

The hierarchical Bayesian model with a Laplace prior density is commonly denoted as the Bayesian LASSO (Park and Casella 2008), since it leads to a nearly identical estimate as the frequentist LASSO by Tibshirani (1996). The Bayesian LASSO has been further modified by several authors, including Yi and Xu (2008), Mutshinda and Sillanpää (2010), Sun et al. (2010) and Fang et al. (2012). In II we considered a modification of the Bayesian LASSO introduced by Mutshinda and Sillanpää (2010) called the Extended Bayesian LASSO (EBL). Following the common hierarchical Bayesian LASSO, the Laplace prior is expressed as a scale mixture of normal densities with an exponential mixing distribution, so the EBL assigns a normal prior with independent locus-specific variances to the regression parameters given the locus variances, β_j | σ_j² ~ N(0, σ_j²), and further an exponential prior to the variances, σ_j² | λ_j² ~ Exp(λ_j²/2).

Unlike the Bayesian LASSO, the regularization parameters λ_j² of the EBL are locus specific, and can be decomposed by setting λ_j = δη_j, where δ represents the model sparseness common to all loci, and η_j is a locus-specific deviation representing the shrinkage working at locus j. Now the common Bayesian LASSO can be seen as a special case of the EBL with the locus-specific component set to η_j = 1 for all j. Setting the common shrinkage parameter δ = 1 would lead to the Improved Bayesian LASSO proposed by Fang et al. (2012).

3.3.6 Bayesian G-BLUP

In addition to the multilocus association model, in I and III we have considered a Bayesian version of the genomic or G-BLUP, a classical BLUP model where the numerator relationship matrix, estimated from the pedigree, is replaced by a genomic marker-based relationship matrix,

y = β_0 + Zu + ε.    (3.3)

In the model framework in Figure 3.1 the G-BLUP can be seen as a mirror image of the multilocus association model without the polygenic component, as here we have the polygene without the marker effects. The likelihood of the data under the G-BLUP is simply a multivariate normal with mean β_0 + Zu and covariance I_n σ_0². The priors for the genetic values u and the population intercept β_0 are a conjugate multivariate normal MVN(0, Gσ_u²) and a uniform density, respectively, G being the genomic relationship matrix. The variances σ_0² and σ_u² have inverse-χ² priors, an uninformative p(σ_0²) ∝ 1/σ_0² and an Inv-χ²(ν_u, τ_u²), respectively. Under the G-BLUP the genetic marker data is incorporated into the model in the form of a genomic relationship matrix. There are numerous methods of generating the genomic relationship matrix; we have used the second method described in VanRaden (2008). This method is based on the identity by state (IBS) of the marker genotypes, and hence it measures the realized relationship between the individuals. The Bayesian approach differs from the frequentist G-BLUP in terms of the handling of the variance components. While the frequentist methods commonly estimate the genomic breeding values with known variance components, in a Bayesian approach the variance components are estimated simultaneously with the breeding values (Hallander et al. 2010). Therefore the Bayesian inference is always based on variances that are up to date and specific to the analyzed trait, letting also the uncertainty of the variance components be incorporated into the estimates of the breeding values.
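The following Python sketch computes a marker-based genomic relationship matrix from simulated genotypes. It follows one common formulation in which each centred marker is weighted by its own expected variance 2p_j(1 − p_j) and the result is averaged over markers; this is in the spirit of VanRaden (2008), but it is not asserted to be exactly the variant used in I and III, and all data here are simulated.

import numpy as np

rng = np.random.default_rng(7)
n, m = 100, 5000
freq = rng.uniform(0.05, 0.5, size=m)
M = rng.binomial(2, freq, size=(n, m)).astype(float)   # 0/1/2 genotype codes

p_hat = M.mean(axis=0) / 2.0                # observed allele frequencies
Z = M - 2.0 * p_hat                         # centred genotypes
weights = 1.0 / (2.0 * p_hat * (1.0 - p_hat))
G = (Z * weights) @ Z.T / m                 # Z diag(weights) Z' / m
print(G.shape, np.diag(G).mean())           # diagonal elements average close to 1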

Even though e.g. ASREML (Gilmour et al. 2009) estimates the variance components from the data, and hence satisfies the up-to-date criterion, the variances are not estimated simultaneously with the breeding values; instead, the pre-estimated variance components are considered constant while estimating the breeding values.

3.4 Fully conditional posterior densities

As depicted in Figure 3.1, the model parameters β_0, σ_0², β, γ and u, located at the 3rd layer, are considered a priori independent of each other. The prior independence of the indicator and the effect size, as suggested by Kuo and Mallick (1998), leads to the most straightforward parametrization of a mixture prior for the effects. In conjunction with the conjugate, or otherwise well chosen, prior densities it enables an easy derivation of a closed form fully conditional posterior distribution for every parameter of the model framework. The joint posterior distribution of the parameters, given the data, is proportional to the product of the joint prior and the likelihood. We can easily extract the fully conditional posterior densities of individual parameters from the joint posterior by handling all other parameters as constants and leaving them out, and hence selecting only the terms including the parameter in question. For example, the fully conditional posterior distribution of a single regression coefficient β_j, given all other parameters and the data, is derived from the joint distribution simply by selecting only the terms including β_j, i.e. the likelihood and the conditional prior p(β_j | σ_j²). Under the full multilocus association model (3.1) we get the following closed form fully conditional posterior distributions for the model parameters (for simplicity, "·" denotes the data and all parameters except the one in question):

$$\beta_0 \mid \cdot \;\sim\; N\!\left(\frac{1}{n}\sum_{i=1}^{n}\Bigl(y_i - \sum_{j=1}^{p}\gamma_j\beta_j x_{ij} - u_i\Bigr),\; \frac{\sigma_0^2}{n}\right), \qquad (3.4)$$

$$\sigma_0^2 \mid \cdot \;\sim\; \text{Inv-}\chi^2\!\left(n,\; \frac{1}{n}\sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p}\gamma_j\beta_j x_{ij} - u_i\Bigr)^{2}\right), \qquad (3.5)$$

$$\beta_j \mid \cdot \;\sim\; N(\mu_j, s_j^2), \qquad (3.6)$$

where

$$\mu_j = \gamma_j \sum_{i=1}^{n} x_{ij}\Bigl(y_i - \beta_0 - \sum_{l \neq j}\gamma_l\beta_l x_{il} - u_i\Bigr) \Big/ \left(\sum_{i=1}^{n}(\gamma_j x_{ij})^2 + \frac{\sigma_0^2}{\sigma_j^2}\right),$$

$$s_j^2 = \sigma_0^2 \Big/ \left(\sum_{i=1}^{n}(\gamma_j x_{ij})^2 + \frac{\sigma_0^2}{\sigma_j^2}\right).$$
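The closed-form conditionals above translate directly into simple coordinate-wise updates. The following Python sketch is a schematic illustration only, not the GEM implementation used in I-III: it sets each parameter to the mean, or a simple plug-in value, of its fully conditional density while holding the indicators, the effect variances and the polygenic effects fixed.

import numpy as np

def conditional_sweep(y, X, gamma, beta, beta0, sigma2_0, sigma2_j, u):
    n, p = X.shape
    # (3.4): set beta0 to the mean of its fully conditional normal density
    beta0 = np.mean(y - X @ (gamma * beta) - u)
    # (3.5): use the scale of the Inv-chi^2 conditional (mean squared residual)
    # as a simple plug-in value for sigma2_0
    resid = y - beta0 - X @ (gamma * beta) - u
    sigma2_0 = np.mean(resid ** 2)
    # (3.6): update each marker effect to the mean mu_j of its conditional
    for j in range(p):
        r_j = y - beta0 - X @ (gamma * beta) - u + gamma[j] * beta[j] * X[:, j]
        denom = np.sum((gamma[j] * X[:, j]) ** 2) + sigma2_0 / sigma2_j[j]
        beta[j] = gamma[j] * (X[:, j] @ r_j) / denom
    return beta0, sigma2_0, beta

# Tiny demonstration with simulated inputs (all values hypothetical)
rng = np.random.default_rng(8)
n, p = 50, 20
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
gamma = rng.binomial(1, 0.2, size=p).astype(float)
beta0, sigma2_0, beta = conditional_sweep(y, X, gamma, np.zeros(p), 0.0, 1.0,
                                          np.ones(p), np.zeros(n))
print(beta0, sigma2_0, beta[:5])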


More information

Bayesian construction of perceptrons to predict phenotypes from 584K SNP data.

Bayesian construction of perceptrons to predict phenotypes from 584K SNP data. Bayesian construction of perceptrons to predict phenotypes from 584K SNP data. Luc Janss, Bert Kappen Radboud University Nijmegen Medical Centre Donders Institute for Neuroscience Introduction Genetic

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate

More information

Principles of Bayesian Inference

Principles of Bayesian Inference Principles of Bayesian Inference Sudipto Banerjee University of Minnesota July 20th, 2008 1 Bayesian Principles Classical statistics: model parameters are fixed and unknown. A Bayesian thinks of parameters

More information

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) = Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,

More information

MIXED MODELS THE GENERAL MIXED MODEL

MIXED MODELS THE GENERAL MIXED MODEL MIXED MODELS This chapter introduces best linear unbiased prediction (BLUP), a general method for predicting random effects, while Chapter 27 is concerned with the estimation of variances by restricted

More information

Machine Learning for OR & FE

Machine Learning for OR & FE Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com

More information

COS513 LECTURE 8 STATISTICAL CONCEPTS

COS513 LECTURE 8 STATISTICAL CONCEPTS COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions

More information

. Also, in this case, p i = N1 ) T, (2) where. I γ C N(N 2 2 F + N1 2 Q)

. Also, in this case, p i = N1 ) T, (2) where. I γ C N(N 2 2 F + N1 2 Q) Supplementary information S7 Testing for association at imputed SPs puted SPs Score tests A Score Test needs calculations of the observed data score and information matrix only under the null hypothesis,

More information

Introduction: MLE, MAP, Bayesian reasoning (28/8/13)

Introduction: MLE, MAP, Bayesian reasoning (28/8/13) STA561: Probabilistic machine learning Introduction: MLE, MAP, Bayesian reasoning (28/8/13) Lecturer: Barbara Engelhardt Scribes: K. Ulrich, J. Subramanian, N. Raval, J. O Hollaren 1 Classifiers In this

More information

Estimation of Parameters in Random. Effect Models with Incidence Matrix. Uncertainty

Estimation of Parameters in Random. Effect Models with Incidence Matrix. Uncertainty Estimation of Parameters in Random Effect Models with Incidence Matrix Uncertainty Xia Shen 1,2 and Lars Rönnegård 2,3 1 The Linnaeus Centre for Bioinformatics, Uppsala University, Uppsala, Sweden; 2 School

More information

David Giles Bayesian Econometrics

David Giles Bayesian Econometrics David Giles Bayesian Econometrics 1. General Background 2. Constructing Prior Distributions 3. Properties of Bayes Estimators and Tests 4. Bayesian Analysis of the Multiple Regression Model 5. Bayesian

More information

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California Texts in Statistical Science Bayesian Ideas and Data Analysis An Introduction for Scientists and Statisticians Ronald Christensen University of New Mexico Albuquerque, New Mexico Wesley Johnson University

More information

Or How to select variables Using Bayesian LASSO

Or How to select variables Using Bayesian LASSO Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO On Bayesian Variable Selection

More information

STA 4273H: Sta-s-cal Machine Learning

STA 4273H: Sta-s-cal Machine Learning STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 2 In our

More information

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and

More information

25 : Graphical induced structured input/output models

25 : Graphical induced structured input/output models 10-708: Probabilistic Graphical Models 10-708, Spring 2016 25 : Graphical induced structured input/output models Lecturer: Eric P. Xing Scribes: Raied Aljadaany, Shi Zong, Chenchen Zhu Disclaimer: A large

More information

Consistent high-dimensional Bayesian variable selection via penalized credible regions

Consistent high-dimensional Bayesian variable selection via penalized credible regions Consistent high-dimensional Bayesian variable selection via penalized credible regions Howard Bondell bondell@stat.ncsu.edu Joint work with Brian Reich Howard Bondell p. 1 Outline High-Dimensional Variable

More information

p L yi z n m x N n xi

p L yi z n m x N n xi y i z n x n N x i Overview Directed and undirected graphs Conditional independence Exact inference Latent variables and EM Variational inference Books statistical perspective Graphical Models, S. Lauritzen

More information

Dynamic System Identification using HDMR-Bayesian Technique

Dynamic System Identification using HDMR-Bayesian Technique Dynamic System Identification using HDMR-Bayesian Technique *Shereena O A 1) and Dr. B N Rao 2) 1), 2) Department of Civil Engineering, IIT Madras, Chennai 600036, Tamil Nadu, India 1) ce14d020@smail.iitm.ac.in

More information

Alternative implementations of Monte Carlo EM algorithms for likelihood inferences

Alternative implementations of Monte Carlo EM algorithms for likelihood inferences Genet. Sel. Evol. 33 001) 443 45 443 INRA, EDP Sciences, 001 Alternative implementations of Monte Carlo EM algorithms for likelihood inferences Louis Alberto GARCÍA-CORTÉS a, Daniel SORENSEN b, Note a

More information

GWAS IV: Bayesian linear (variance component) models

GWAS IV: Bayesian linear (variance component) models GWAS IV: Bayesian linear (variance component) models Dr. Oliver Stegle Christoh Lippert Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen Summer 2011 Oliver Stegle GWAS IV: Bayesian

More information

Heritability estimation in modern genetics and connections to some new results for quadratic forms in statistics

Heritability estimation in modern genetics and connections to some new results for quadratic forms in statistics Heritability estimation in modern genetics and connections to some new results for quadratic forms in statistics Lee H. Dicker Rutgers University and Amazon, NYC Based on joint work with Ruijun Ma (Rutgers),

More information

Outline lecture 2 2(30)

Outline lecture 2 2(30) Outline lecture 2 2(3), Lecture 2 Linear Regression it is our firm belief that an understanding of linear models is essential for understanding nonlinear ones Thomas Schön Division of Automatic Control

More information

Linear Model Selection and Regularization

Linear Model Selection and Regularization Linear Model Selection and Regularization Recall the linear model Y = β 0 + β 1 X 1 + + β p X p + ɛ. In the lectures that follow, we consider some approaches for extending the linear model framework. In

More information

GENOMIC SELECTION WORKSHOP: Hands on Practical Sessions (BL)

GENOMIC SELECTION WORKSHOP: Hands on Practical Sessions (BL) GENOMIC SELECTION WORKSHOP: Hands on Practical Sessions (BL) Paulino Pérez 1 José Crossa 2 1 ColPos-México 2 CIMMyT-México September, 2014. SLU,Sweden GENOMIC SELECTION WORKSHOP:Hands on Practical Sessions

More information

Choosing the Summary Statistics and the Acceptance Rate in Approximate Bayesian Computation

Choosing the Summary Statistics and the Acceptance Rate in Approximate Bayesian Computation Choosing the Summary Statistics and the Acceptance Rate in Approximate Bayesian Computation COMPSTAT 2010 Revised version; August 13, 2010 Michael G.B. Blum 1 Laboratoire TIMC-IMAG, CNRS, UJF Grenoble

More information

Bayesian methods in economics and finance

Bayesian methods in economics and finance 1/26 Bayesian methods in economics and finance Linear regression: Bayesian model selection and sparsity priors Linear Regression 2/26 Linear regression Model for relationship between (several) independent

More information

Lecture : Probabilistic Machine Learning

Lecture : Probabilistic Machine Learning Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning

More information

Bayesian Inference: Probit and Linear Probability Models

Bayesian Inference: Probit and Linear Probability Models Utah State University DigitalCommons@USU All Graduate Plan B and other Reports Graduate Studies 5-1-2014 Bayesian Inference: Probit and Linear Probability Models Nate Rex Reasch Utah State University Follow

More information

STAT 518 Intro Student Presentation

STAT 518 Intro Student Presentation STAT 518 Intro Student Presentation Wen Wei Loh April 11, 2013 Title of paper Radford M. Neal [1999] Bayesian Statistics, 6: 475-501, 1999 What the paper is about Regression and Classification Flexible

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Linear Regression Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574 1

More information

Nonparametric Bayesian Methods (Gaussian Processes)

Nonparametric Bayesian Methods (Gaussian Processes) [70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent

More information

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 Lecture 2: Linear Models Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 1 Quick Review of the Major Points The general linear model can be written as y = X! + e y = vector

More information

Contents. Part I: Fundamentals of Bayesian Inference 1

Contents. Part I: Fundamentals of Bayesian Inference 1 Contents Preface xiii Part I: Fundamentals of Bayesian Inference 1 1 Probability and inference 3 1.1 The three steps of Bayesian data analysis 3 1.2 General notation for statistical inference 4 1.3 Bayesian

More information

Association studies and regression

Association studies and regression Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration

More information

Lecture 2: Genetic Association Testing with Quantitative Traits. Summer Institute in Statistical Genetics 2017

Lecture 2: Genetic Association Testing with Quantitative Traits. Summer Institute in Statistical Genetics 2017 Lecture 2: Genetic Association Testing with Quantitative Traits Instructors: Timothy Thornton and Michael Wu Summer Institute in Statistical Genetics 2017 1 / 29 Introduction to Quantitative Trait Mapping

More information

HERITABILITY ESTIMATION USING A REGULARIZED REGRESSION APPROACH (HERRA)

HERITABILITY ESTIMATION USING A REGULARIZED REGRESSION APPROACH (HERRA) BIRS 016 1 HERITABILITY ESTIMATION USING A REGULARIZED REGRESSION APPROACH (HERRA) Malka Gorfine, Tel Aviv University, Israel Joint work with Li Hsu, FHCRC, Seattle, USA BIRS 016 The concept of heritability

More information

MS-C1620 Statistical inference

MS-C1620 Statistical inference MS-C1620 Statistical inference 10 Linear regression III Joni Virta Department of Mathematics and Systems Analysis School of Science Aalto University Academic year 2018 2019 Period III - IV 1 / 32 Contents

More information

(Genome-wide) association analysis

(Genome-wide) association analysis (Genome-wide) association analysis 1 Key concepts Mapping QTL by association relies on linkage disequilibrium in the population; LD can be caused by close linkage between a QTL and marker (= good) or by

More information

Large-scale Ordinal Collaborative Filtering

Large-scale Ordinal Collaborative Filtering Large-scale Ordinal Collaborative Filtering Ulrich Paquet, Blaise Thomson, and Ole Winther Microsoft Research Cambridge, University of Cambridge, Technical University of Denmark ulripa@microsoft.com,brmt2@cam.ac.uk,owi@imm.dtu.dk

More information

Pattern Recognition and Machine Learning

Pattern Recognition and Machine Learning Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability

More information

Bayesian model selection: methodology, computation and applications

Bayesian model selection: methodology, computation and applications Bayesian model selection: methodology, computation and applications David Nott Department of Statistics and Applied Probability National University of Singapore Statistical Genomics Summer School Program

More information

Bayesian Grouped Horseshoe Regression with Application to Additive Models

Bayesian Grouped Horseshoe Regression with Application to Additive Models Bayesian Grouped Horseshoe Regression with Application to Additive Models Zemei Xu, Daniel F. Schmidt, Enes Makalic, Guoqi Qian, and John L. Hopper Centre for Epidemiology and Biostatistics, Melbourne

More information

1 Springer. Nan M. Laird Christoph Lange. The Fundamentals of Modern Statistical Genetics

1 Springer. Nan M. Laird Christoph Lange. The Fundamentals of Modern Statistical Genetics 1 Springer Nan M. Laird Christoph Lange The Fundamentals of Modern Statistical Genetics 1 Introduction to Statistical Genetics and Background in Molecular Genetics 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

More information

Principles of Bayesian Inference

Principles of Bayesian Inference Principles of Bayesian Inference Sudipto Banerjee and Andrew O. Finley 2 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department

More information

The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations

The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations John R. Michael, Significance, Inc. and William R. Schucany, Southern Methodist University The mixture

More information

Lecture 8 Genomic Selection

Lecture 8 Genomic Selection Lecture 8 Genomic Selection Guilherme J. M. Rosa University of Wisconsin-Madison Mixed Models in Quantitative Genetics SISG, Seattle 18 0 Setember 018 OUTLINE Marker Assisted Selection Genomic Selection

More information

Gibbs Sampling in Linear Models #2

Gibbs Sampling in Linear Models #2 Gibbs Sampling in Linear Models #2 Econ 690 Purdue University Outline 1 Linear Regression Model with a Changepoint Example with Temperature Data 2 The Seemingly Unrelated Regressions Model 3 Gibbs sampling

More information

Lecture 9. QTL Mapping 2: Outbred Populations

Lecture 9. QTL Mapping 2: Outbred Populations Lecture 9 QTL Mapping 2: Outbred Populations Bruce Walsh. Aug 2004. Royal Veterinary and Agricultural University, Denmark The major difference between QTL analysis using inbred-line crosses vs. outbred

More information

Introduction to Probabilistic Machine Learning

Introduction to Probabilistic Machine Learning Introduction to Probabilistic Machine Learning Piyush Rai Dept. of CSE, IIT Kanpur (Mini-course 1) Nov 03, 2015 Piyush Rai (IIT Kanpur) Introduction to Probabilistic Machine Learning 1 Machine Learning

More information

LECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b)

LECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b) LECTURE 5 NOTES 1. Bayesian point estimators. In the conventional (frequentist) approach to statistical inference, the parameter θ Θ is considered a fixed quantity. In the Bayesian approach, it is considered

More information

Principles of Bayesian Inference

Principles of Bayesian Inference Principles of Bayesian Inference Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department

More information

Bayesian variable selection via. Penalized credible regions. Brian Reich, NCSU. Joint work with. Howard Bondell and Ander Wilson

Bayesian variable selection via. Penalized credible regions. Brian Reich, NCSU. Joint work with. Howard Bondell and Ander Wilson Bayesian variable selection via penalized credible regions Brian Reich, NC State Joint work with Howard Bondell and Ander Wilson Brian Reich, NCSU Penalized credible regions 1 Motivation big p, small n

More information

Some models of genomic selection

Some models of genomic selection Munich, December 2013 What is the talk about? Barley! Steptoe x Morex barley mapping population Steptoe x Morex barley mapping population genotyping from Close at al., 2009 and phenotyping from cite http://wheat.pw.usda.gov/ggpages/sxm/

More information

QTL model selection: key players

QTL model selection: key players Bayesian Interval Mapping. Bayesian strategy -9. Markov chain sampling 0-7. sampling genetic architectures 8-5 4. criteria for model selection 6-44 QTL : Bayes Seattle SISG: Yandell 008 QTL model selection:

More information

DAG models and Markov Chain Monte Carlo methods a short overview

DAG models and Markov Chain Monte Carlo methods a short overview DAG models and Markov Chain Monte Carlo methods a short overview Søren Højsgaard Institute of Genetics and Biotechnology University of Aarhus August 18, 2008 Printed: August 18, 2008 File: DAGMC-Lecture.tex

More information

Introduction to Gaussian Processes

Introduction to Gaussian Processes Introduction to Gaussian Processes Iain Murray murray@cs.toronto.edu CSC255, Introduction to Machine Learning, Fall 28 Dept. Computer Science, University of Toronto The problem Learn scalar function of

More information

Hierarchical models. Dr. Jarad Niemi. August 31, Iowa State University. Jarad Niemi (Iowa State) Hierarchical models August 31, / 31

Hierarchical models. Dr. Jarad Niemi. August 31, Iowa State University. Jarad Niemi (Iowa State) Hierarchical models August 31, / 31 Hierarchical models Dr. Jarad Niemi Iowa State University August 31, 2017 Jarad Niemi (Iowa State) Hierarchical models August 31, 2017 1 / 31 Normal hierarchical model Let Y ig N(θ g, σ 2 ) for i = 1,...,

More information

Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2

Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2 Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, 2010 Jeffreys priors Lecturer: Michael I. Jordan Scribe: Timothy Hunter 1 Priors for the multivariate Gaussian Consider a multivariate

More information

Linear Models A linear model is defined by the expression

Linear Models A linear model is defined by the expression Linear Models A linear model is defined by the expression x = F β + ɛ. where x = (x 1, x 2,..., x n ) is vector of size n usually known as the response vector. β = (β 1, β 2,..., β p ) is the transpose

More information

Motivation Scale Mixutres of Normals Finite Gaussian Mixtures Skew-Normal Models. Mixture Models. Econ 690. Purdue University

Motivation Scale Mixutres of Normals Finite Gaussian Mixtures Skew-Normal Models. Mixture Models. Econ 690. Purdue University Econ 690 Purdue University In virtually all of the previous lectures, our models have made use of normality assumptions. From a computational point of view, the reason for this assumption is clear: combined

More information

CSC2535: Computation in Neural Networks Lecture 7: Variational Bayesian Learning & Model Selection

CSC2535: Computation in Neural Networks Lecture 7: Variational Bayesian Learning & Model Selection CSC2535: Computation in Neural Networks Lecture 7: Variational Bayesian Learning & Model Selection (non-examinable material) Matthew J. Beal February 27, 2004 www.variational-bayes.org Bayesian Model Selection

More information

Bayesian Linear Models

Bayesian Linear Models Bayesian Linear Models Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is

More information

Integrated Non-Factorized Variational Inference

Integrated Non-Factorized Variational Inference Integrated Non-Factorized Variational Inference Shaobo Han, Xuejun Liao and Lawrence Carin Duke University February 27, 2014 S. Han et al. Integrated Non-Factorized Variational Inference February 27, 2014

More information

Bayesian data analysis in practice: Three simple examples

Bayesian data analysis in practice: Three simple examples Bayesian data analysis in practice: Three simple examples Martin P. Tingley Introduction These notes cover three examples I presented at Climatea on 5 October 0. Matlab code is available by request to

More information

Machine Learning Techniques for Computer Vision

Machine Learning Techniques for Computer Vision Machine Learning Techniques for Computer Vision Part 2: Unsupervised Learning Microsoft Research Cambridge x 3 1 0.5 0.2 0 0.5 0.3 0 0.5 1 ECCV 2004, Prague x 2 x 1 Overview of Part 2 Mixture models EM

More information

Outline Lecture 2 2(32)

Outline Lecture 2 2(32) Outline Lecture (3), Lecture Linear Regression and Classification it is our firm belief that an understanding of linear models is essential for understanding nonlinear ones Thomas Schön Division of Automatic

More information