DERIVATIONS OF THE FULLY CONDITIONAL POSTERIOR DENSITIES

The joint posterior distribution of the unknown parameters and hidden variables, given the data, is proportional to the product of the joint prior and the likelihood, and the fully conditional posterior of each parameter can be determined by selecting from the joint posterior the terms that include the parameter in question. For simplicity, the conditioning notation "\ldots" below denotes the data and all parameters except the one in question.

The fully conditional posterior distribution of the population intercept \beta_0 is proportional to the likelihood, since the prior of \beta_0 is proportional to one, so

  p(\beta_0 \mid \ldots) \propto \prod_{i=1}^{n} \exp\left\{ -\frac{1}{2\sigma_0^2}\Big(y_i - \beta_0 - \sum_{j=1}^{p} \gamma_j \beta_j x_{ij} - u_i\Big)^2 \right\}.    (S1)

This is a product of n kernels of normal distributions with a common variance \sigma_0^2 and means y_i - \sum_{j=1}^{p} \gamma_j \beta_j x_{ij} - u_i, i = 1, \ldots, n. The set of Gaussian functions is closed under multiplication, i.e. a product of normal densities is itself proportional to a normal density, the mean and the variance of the product density being

  \mu = \frac{\sum_i \mu_i / \sigma_i^2}{\sum_i 1/\sigma_i^2} \quad \text{and} \quad \sigma^2 = \Big(\sum_i 1/\sigma_i^2\Big)^{-1},

respectively. Hence, since in this case the variance of the factors in the product is constant, the mean of the product density reduces to the sum of the means of the individual distributions divided by n, while the variance of the product density is simply the variance of the individual distributions, \sigma_0^2, divided by n.
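To make the \beta_0 step concrete, the following sketch (not part of the original derivation) implements the Gibbs draw implied by (S1) with numpy. The names y, X, gamma, beta, u, and sigma0_sq (for \sigma_0^2) are assumed placeholders, and the single-index shorthand \sum_j \gamma_j \beta_j x_{ij} is used for the linear predictor.

```python
import numpy as np

def sample_beta0(y, X, gamma, beta, u, sigma0_sq, rng):
    """Gibbs draw for the intercept beta_0 from (S1): the product of n normal
    kernels with common variance sigma_0^2 collapses to a single normal whose
    mean is the average partial residual and whose variance is sigma_0^2 / n."""
    resid = y - X @ (gamma * beta) - u   # y_i - sum_j gamma_j beta_j x_ij - u_i
    return rng.normal(resid.mean(), np.sqrt(sigma0_sq / len(y)))

# Toy usage with simulated data (all sizes and values hypothetical).
rng = np.random.default_rng(1)
n, p = 50, 10
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)
gamma = rng.binomial(1, 0.3, p).astype(float)
u = rng.normal(scale=0.5, size=n)
y = 2.0 + X @ (gamma * beta) + u + rng.normal(scale=0.3, size=n)
print(sample_beta0(y, X, gamma, beta, u, 0.09, rng))
```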

The fully conditional posterior distribution of a regression coefficient \beta_{jk} is proportional to the product of the likelihood and the conditional prior p(\beta_{jk} \mid \sigma_j^2),

  p(\beta_{jk} \mid \ldots) \propto \prod_{i=1}^{n} \exp\left\{ -\frac{1}{2\sigma_0^2}\Big(y_i - \beta_0 - \sum_{l}\sum_{m} \gamma_l \beta_{lm} x_{ilm} - u_i\Big)^2 \right\} \exp\left\{ -\frac{\beta_{jk}^2}{2\sigma_j^2} \right\},    (S2)

that is, a product of two types of kernels of normal distributions. One distribution comes from the prior; it has mean 0 and variance \sigma_j^2. The other part comes from the likelihood: regarding \beta_{jk} there are n kernels of normal distributions with means

  \frac{y_i - \beta_0 - \sum_{(l,m) \neq (j,k)} \gamma_l \beta_{lm} x_{ilm} - u_i}{\gamma_j x_{ijk}}, \quad i = 1, \ldots, n,

and variances \sigma_0^2 / (\gamma_j x_{ijk})^2, i = 1, \ldots, n. Hence we get as a product a normal distribution with mean

  \frac{\gamma_j \sum_i x_{ijk} \big(y_i - \beta_0 - \sum_{(l,m) \neq (j,k)} \gamma_l \beta_{lm} x_{ilm} - u_i\big) / \sigma_0^2}{\gamma_j \sum_i x_{ijk}^2 / \sigma_0^2 + 1/\sigma_j^2}

and variance

  \Big(\gamma_j \sum_i x_{ijk}^2 / \sigma_0^2 + 1/\sigma_j^2\Big)^{-1},

which equals the variance given in the Appendix when each term is multiplied by \sigma_0^2.

The conditional posterior distribution of the polygenic effect u is multivariate normal, as it is a product of the multivariate normal prior distribution u \mid \sigma_u^2 \sim N_n(0, A\sigma_u^2) and a normal likelihood,

  p(u \mid \ldots) \propto p(u \mid \sigma_u^2)\, p(y \mid \ldots) \propto \exp\left\{ -\frac{1}{2} u' (A\sigma_u^2)^{-1} u \right\} \prod_{i=1}^{n} \exp\left\{ -\frac{1}{2\sigma_0^2}\Big(y_i - \beta_0 - \sum_{l}\sum_{m} \gamma_l \beta_{lm} x_{ilm} - u_i\Big)^2 \right\}.

The kernel of the likelihood part of the posterior can also be interpreted as a kernel of a multivariate normal density regarding the polygenic effect, u \sim N_n(y - \beta_0 \mathbf{1} - X\Gamma\beta,\, I_n \sigma_0^2), so we can simply use the same multiplication rule of normal densities as in the previous cases, and we therefore get the conditional posterior mean

  \Big(\frac{I_n}{\sigma_0^2} + \frac{A^{-1}}{\sigma_u^2}\Big)^{-1} \frac{y - \beta_0 \mathbf{1} - X\Gamma\beta}{\sigma_0^2}    (S3)

and covariance

  \Big(\frac{I_n}{\sigma_0^2} + \frac{A^{-1}}{\sigma_u^2}\Big)^{-1}.
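The \beta and u draws can be sketched in the same style. Again this is illustrative code rather than the authors' implementation: it uses the single-index shorthand (one coefficient per locus), assumes A_inv holds a precomputed inverse of the relationship matrix A, and exploits the fact that \gamma_j^2 = \gamma_j for a binary indicator.

```python
import numpy as np

def sample_beta_j(j, y, X, gamma, beta, beta0, u, sigma0_sq, sigmaj_sq, rng):
    """Gibbs draw for one effect from (S2): a normal with precision
    gamma_j * sum_i x_ij^2 / sigma_0^2 + 1 / sigma_j^2. When gamma_j = 0 the
    likelihood part drops out and the draw comes from the prior N(0, sigma_j^2)."""
    # Partial residual excluding the j-th term of the linear predictor.
    partial = y - beta0 - (X @ (gamma * beta) - gamma[j] * beta[j] * X[:, j]) - u
    prec = gamma[j] * (X[:, j] ** 2).sum() / sigma0_sq + 1.0 / sigmaj_sq
    mean = gamma[j] * (X[:, j] @ partial) / sigma0_sq / prec
    return rng.normal(mean, np.sqrt(1.0 / prec))

def sample_u(y, X, gamma, beta, beta0, A_inv, sigma0_sq, sigmau_sq, rng):
    """Gibbs draw for the polygenic effect from (S3): multivariate normal with
    precision matrix I/sigma_0^2 + A^{-1}/sigma_u^2."""
    n = len(y)
    cov = np.linalg.inv(np.eye(n) / sigma0_sq + A_inv / sigmau_sq)
    mean = cov @ ((y - beta0 - X @ (gamma * beta)) / sigma0_sq)
    return rng.multivariate_normal(mean, cov)
```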

The variance parameters of the model have inverse-\chi^2 priors which, due to conjugacy, lead to inverse-\chi^2 posteriors. The prior of the residual variance \sigma_0^2 is proportional to 1/\sigma_0^2, which leads to the posterior density

  p(\sigma_0^2 \mid \ldots) \propto (\sigma_0^2)^{-(1 + n/2)} \exp\left\{ -\frac{1}{2\sigma_0^2} \sum_{i=1}^{n} \Big(y_i - \beta_0 - \sum_{j} \gamma_j \beta_j x_{ij} - u_i\Big)^2 \right\}.    (S4)

Regarding \sigma_0^2, this is an unnormalized probability density function of a scaled inverse-\chi^2 distribution with n degrees of freedom and scale parameter equal to

  \frac{1}{n} \sum_{i=1}^{n} \Big(y_i - \beta_0 - \sum_{j} \gamma_j \beta_j x_{ij} - u_i\Big)^2.

The inverse-\chi^2 posteriors of the other variance parameters, \sigma_j^2 and \sigma_u^2, are derived with identical logic.
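A scaled inverse-\chi^2 draw is conveniently obtained by dividing the sum of squared residuals by a \chi^2(n) variate. A minimal sketch with the same assumed variable names as above:

```python
import numpy as np

def sample_sigma0_sq(y, X, gamma, beta, beta0, u, rng):
    """Gibbs draw for sigma_0^2 from (S4): scaled inverse-chi^2 with n degrees
    of freedom and scale equal to the mean squared residual. A scaled
    inv-chi^2(nu, s^2) draw is nu * s^2 / chi^2(nu), and here nu * s^2 is
    just the sum of squared residuals."""
    resid = y - beta0 - X @ (gamma * beta) - u
    return (resid ** 2).sum() / rng.chisquare(len(y))
```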

In the case of the Laplace(0, \lambda) prior for the effect sizes, the effect variance \sigma_j^2 \sim \mathrm{Exp}(\lambda^2/2), and hence the fully conditional posterior of \sigma_j^2 is proportional to the product of the exponential prior and the normal conditional prior of the effect size \beta_j \mid \sigma_j^2,

  p(\sigma_j^2 \mid \ldots) \propto p(\beta_j \mid \sigma_j^2)\, p(\sigma_j^2) \propto (\sigma_j^2)^{-1/2} \exp\left\{ -\frac{\beta_j^2}{2\sigma_j^2} \right\} \frac{\lambda^2}{2} \exp\left\{ -\frac{\lambda^2 \sigma_j^2}{2} \right\}.

Since the exponential density is not conjugate to the normal, we need to consider the inverse of the variance,

  p(1/\sigma_j^2 \mid \ldots) \propto (\sigma_j^2)^{-1/2} \exp\left\{ -\frac{\beta_j^2}{2\sigma_j^2} - \frac{\lambda^2 \sigma_j^2}{2} \right\} (\sigma_j^2)^2,

where the last term is the Jacobian of the transformation \sigma_j^2 \mapsto 1/\sigma_j^2. By rearranging the terms and completing the numerator of the exponent into a square, we get

  p(1/\sigma_j^2 \mid \ldots) \propto (\sigma_j^2)^{3/2} \exp\left\{ -\frac{(\lambda \sigma_j^2 - |\beta_j|)^2}{2\sigma_j^2} - \lambda |\beta_j| \right\}.

\sigma_j^2 cancels out of the last term of the exponent, so that term is constant and can be left out, after which the exponent is expanded by \beta_j^2/\lambda^2 and we get

  p(1/\sigma_j^2 \mid \ldots) \propto (1/\sigma_j^2)^{-3/2} \exp\left\{ -\frac{\lambda^2 \big(1/\sigma_j^2 - \lambda/|\beta_j|\big)^2}{2 (\lambda/|\beta_j|)^2 (1/\sigma_j^2)} \right\},    (S5)

that is, an inverse-Gaussian probability density function with mean \mu = \sqrt{\lambda^2/\beta_j^2} = \lambda/|\beta_j| and shape \lambda' = \lambda^2, the parametrization of the inverse-Gaussian density being

  f(x) \propto x^{-3/2} \exp\left\{ -\frac{\lambda' (x - \mu)^2}{2\mu^2 x} \right\}.

The fully conditional posterior distribution of the indicators \gamma_j is Bernoulli. Directly from Bayes' formula we get

  p(\gamma_j \mid y, \ldots) = \frac{p(\gamma_j)\, p(y \mid \gamma_j, \ldots)}{p(\gamma_j = 0)\, p(y \mid \gamma_j = 0, \ldots) + p(\gamma_j = 1)\, p(y \mid \gamma_j = 1, \ldots)} = \frac{(\pi R_j)^{\gamma_j} (1 - \pi)^{1 - \gamma_j}}{1 - \pi + \pi R_j},

so that p(\gamma_j = 1 \mid y, \ldots) = \pi R_j / (1 - \pi + \pi R_j), where \pi = p(\gamma_j = 1) = 1 - p(\gamma_j = 0) and

  R_j = \frac{p(y \mid \gamma_j = 1, \ldots)}{p(y \mid \gamma_j = 0, \ldots)}
      = \frac{\prod_{i=1}^{n} \exp\big\{ -\frac{1}{2\sigma_0^2}\big(y_i - \beta_0 - \sum_{h \neq j} \gamma_h \beta_h x_{ih} - \beta_j x_{ij} - u_i\big)^2 \big\}}{\prod_{i=1}^{n} \exp\big\{ -\frac{1}{2\sigma_0^2}\big(y_i - \beta_0 - \sum_{h \neq j} \gamma_h \beta_h x_{ih} - u_i\big)^2 \big\}}
      = \exp\left\{ -\frac{1}{2\sigma_0^2} \sum_{i=1}^{n} \Big[ \beta_j^2 x_{ij}^2 - 2 \beta_j x_{ij} \Big(y_i - \beta_0 - \sum_{h \neq j} \gamma_h \beta_h x_{ih} - u_i\Big) \Big] \right\}.    (S6)
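The two remaining updates can be sketched likewise. This is again illustrative code under the same assumed names; note that numpy's wald generator is the inverse-Gaussian distribution with arguments (mean, shape).

```python
import numpy as np

def sample_sigmaj_sq(beta_j, lam, rng):
    """Draw 1/sigma_j^2 from the inverse-Gaussian in (S5), with mean
    lam / |beta_j| and shape lam^2, then invert to obtain sigma_j^2."""
    return 1.0 / rng.wald(lam / abs(beta_j), lam ** 2)

def sample_gamma_j(j, y, X, gamma, beta, beta0, u, sigma0_sq, pi, rng):
    """Draw the indicator gamma_j from the Bernoulli posterior (S6) with
    success probability pi * R_j / (1 - pi + pi * R_j)."""
    # Residual with marker j excluded from the linear predictor.
    base = y - beta0 - (X @ (gamma * beta) - gamma[j] * beta[j] * X[:, j]) - u
    xb = beta[j] * X[:, j]
    log_Rj = -((xb ** 2).sum() - 2.0 * (xb @ base)) / (2.0 * sigma0_sq)
    # pi * R_j / (1 - pi + pi * R_j), rewritten to avoid overflow for large R_j.
    prob = pi / (pi + (1.0 - pi) * np.exp(-log_Rj))
    return float(rng.random() < prob)
```

Cycling through these draws (beta_0, each beta_j, u, the variances, sigma_j^2, and each gamma_j) constitutes one full scan of the Gibbs sampler described by (S1)-(S6).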

                          Hierarchical Laplace              Non-hierarchical Laplace
                     with indicator      no indicator     with indicator   no indicator
nqtl               30    30    30    00      –      –        *       *        –      –
ξ                   3   0.5     0    00     00     00       00
λ   Original       75    75    74    43     05     64       44      37       39     44
    Mean           75    75    75    43     05     50       44      39       39     45
    Max            75    75    75    43     06     75       45       4       40     45
    Min            74    75    74    43     05     06        4      37       38     44
    Std          0.05  0.05  0.05  0.03   0.07      7      0.6      .4     0.37     0.
π   Original     0.96   0.0   0.5  0.69      –      –      0.0    0.03        –      –
    Mean         0.96   0.0   0.7  0.69      –      –     0.03    0.03        –      –
    Max          0.97   0.0   0.3  0.83      –      –      0.0    0.04        –      –
    Min          0.94   0.0   0.5  0.58      –      –      0.0    0.03        –      –
    Std         0.007 0.0005 0.038 0.047     –      –     0.05  0.0008        –      –

Table S1: Estimated hyperparameter values of λ and π in the original QTL-MAS data, together with the mean, maximum, minimum, and standard deviation of the estimates in the analyses of the 100 replicated data sets. nqtl and ξ denote the alternative hyperprior parameter values for the prior of the indicator, π ~ Beta(nqtl, p − nqtl), and for the rate parameter, λ² ~ Gamma(1, ξ) under the hierarchical Laplace model and λ ~ Gamma(1, ξ) under the non-hierarchical Laplace model, with and without the indicator variable; * denotes the optional Beta(1, 1) prior of the indicator.