Integrated Analysis of Genomics Data


Elizabeth Jennings

July 3, 2012

Abstract

In this project, we integrate data from several genomic platforms in a model that incorporates the biological relationships between platforms to more precisely identify the molecular mechanisms that affect the clinical outcome of cancer. We present the model and describe the estimation procedure. We then summarize simulation results that validate our choice of parameterization and assess the performance of our method. Finally, we apply our method to a Glioblastoma Multiforme (GBM) data set and discuss the results.

1 Introduction

The Central Dogma of Biology summarizes the steps involved in the expression of a gene at the molecular level: DNA is transcribed to messenger RNA (mRNA), which is then translated to a protein, which carries out an action. Alterations and interferences can also occur at the DNA and/or mRNA levels and affect the ultimate expression. In this project we consider the platforms of methylation (which occurs at the DNA level and typically results in silencing of the gene), copy number (an attribute at the DNA level that affects mRNA expression), and mRNA expression (which affects protein expression). Other platforms can be incorporated in a straightforward manner, and we plan to add microRNA (miRNA) data (which can mute mRNA or affect mRNA expression directly) in the near future.

The process above describes the expression of a single gene, but the mechanism of cancer is believed to involve multiple genes. Research has found that genes interact and are related through certain pathways, and for this project we focus on genes from a single pathway that is believed to affect GBM processes [1]. Our goal is to integrate data from several genomic platforms in a model that incorporates the biological relationships between platforms, identifying not only which genes have a significant effect on survival, but also which platform(s) of those genes is (are) modulating the effect.

2 Model

We employ a two-step, hierarchical model. The first component can be considered the mechanistic model, and the second can be considered the clinical model. This hierarchical setup has recently been introduced as an integrative Bayesian analysis of genomics data (iBAG) model [4].

2.1 iBAG Model

The mechanistic model is the first component, and it models the effect of methylation and copy number on gene expression. The clinical model subsequently models the effects of these pieces on the clinical outcome. The model can be expressed as:

    mRNA_i = M_i + CN_i + O_i,   where i = 1, ..., max(p_j); j = 1, ..., k,    (1)

    Y = M β_1 + CN β_2 + O β_3 + ε,    (2)

with definitions as follows. Let n = number of patients, k = number of platforms, and p_j = number of genes with data from platform j.

- mRNA_i is the level of gene expression for gene i and has dimension (n × 1).
- M_i is the part of gene i expression that is due to methylation and has dimension (n × 1); M has dimension (n × p_1).
- CN_i is the part of gene i expression that is due to copy number and has dimension (n × 1); CN has dimension (n × p_2).
- O_i is the remaining ("other") part of gene i expression, i.e., the part due to neither methylation nor copy number, and has dimension (n × 1); O has dimension (n × p_3).
- Y is the clinical outcome (survival in days from diagnosis) and has dimension (n × 1).
- β_i is the effect of platform i on the clinical outcome and has dimension (p_i × 1).
- ε is the error term and has dimension (n × 1).
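To make the two-level structure of (1) and (2) concrete, the following is a minimal R sketch that simulates toy data with exactly this decomposition. The dimensions, generating distributions, and effect sizes below are illustrative assumptions, not settings taken from the report.

```r
## Toy data with the iBAG structure: mechanistic model (1), then clinical model (2).
## All generating choices (standard normal pieces, normal effects) are illustrative.
set.seed(1)
n <- 100; p <- 5                       # patients and genes (illustrative sizes)
M  <- matrix(rnorm(n * p), n, p)       # methylation-driven part of expression
CN <- matrix(rnorm(n * p), n, p)       # copy-number-driven part of expression
O  <- matrix(rnorm(n * p), n, p)       # remaining ("other") part of expression
mRNA <- M + CN + O                     # mechanistic model: mRNA_i = M_i + CN_i + O_i

beta1 <- rnorm(p); beta2 <- rnorm(p); beta3 <- rnorm(p)  # platform-specific effects
sigma <- 1
Y <- M %*% beta1 + CN %*% beta2 + O %*% beta3 + rnorm(n, 0, sigma)  # clinical model
```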

2.2 Estimation

To estimate M_i, CN_i, and O_i in the mechanistic model, we first carry out two Principal Component Analyses (PCAs) for gene i: one on the methylation data for gene i and one on the copy number data for gene i. Then we perform a least squares regression of mRNA_i on the methylation and copy number PC scores that account for 90% of the variation. We use the fitted pieces and the residuals from this regression to estimate the vectors M_i, CN_i, and O_i. This process is repeated for each gene to yield the matrices M, CN, and O (a short code sketch of this step is given below).

Since we believe there is sparsity in the parameters of the clinical model, we estimate them by implementing the Bayesian Lasso. (This approach is also beneficial because it allows us to estimate the variances of the parameters.) We represent the clinical model with the following hierarchy (based on the suggestion of Park and Casella [3]):

    n = number of samples; k = number of platforms; p_i = number of predictors for platform i;
    p = Σ_{i=1}^{k} p_i = total number of predictors in the clinical model;

    Y = M β_1 + CN β_2 + O β_3 + ε = Xβ + ε, where Y is mean-centered and ε ~ Normal(0_n, σ² I_n);
    thus Y ~ Normal(Xβ, σ² I_n);
    β ~ Normal(0_p, σ² D_τ), where D_τ = diag(τ²_{1,1}, ..., τ²_{1,p_1}, ..., τ²_{k,1}, ..., τ²_{k,p_k});
    τ²_{i,j} ~ Negative Exponential with mean 2/λ_i², i.e., density (λ_i²/2) exp(−λ_i² τ²_{i,j}/2);
    σ² ~ InverseGamma(a, b), i.e., density b^a (σ²)^{−(a+1)} exp(−b/σ²)/Γ(a);
    λ_i² ~ Gamma(r, δ), i.e., density δ^r (λ_i²)^{r−1} exp(−δ λ_i²)/Γ(r).
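Before turning to the complete conditionals, here is a minimal R sketch of the per-gene PCA and least-squares decomposition described at the start of this subsection. It is an assumed implementation rather than the code used for the report: the function and variable names are ours, and folding the intercept into the "other" piece is one reasonable choice among several.

```r
## Decompose the expression of a single gene i into methylation-, copy-number-,
## and "other"-attributable pieces via PCA plus least squares (assumed sketch).
decompose_gene <- function(mrna_i, meth_i, cn_i, var_prop = 0.90) {
  # keep the leading PC scores explaining at least var_prop of the variation
  keep_scores <- function(x) {
    pc <- prcomp(x, center = TRUE, scale. = TRUE)
    m  <- which(cumsum(pc$sdev^2) / sum(pc$sdev^2) >= var_prop)[1]
    pc$x[, seq_len(m), drop = FALSE]
  }
  Zm <- keep_scores(meth_i)               # methylation PC scores for gene i
  Zc <- keep_scores(cn_i)                 # copy number PC scores for gene i
  fit <- lm(mrna_i ~ Zm + Zc)             # least squares regression of expression
  cf  <- coef(fit)
  M_i  <- drop(Zm %*% cf[grep("^Zm", names(cf))])    # methylation piece
  CN_i <- drop(Zc %*% cf[grep("^Zc", names(cf))])    # copy number piece
  O_i  <- drop(residuals(fit) + cf["(Intercept)"])   # remaining ("other") piece
  list(M = M_i, CN = CN_i, O = O_i)
}
```

Repeating this over all genes and binding the resulting vectors column-wise gives the matrices M, CN, and O used to form X in the clinical model.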

From this representation, we derive the complete conditionals (see the Appendix for details). Note that we will be using the parameterization involving the precision (γ² = 1/τ²) as opposed to the parameterization with τ²:

    β | rest ~ Normal{ (XᵀX + D_τ⁻¹)⁻¹ XᵀY, σ² (XᵀX + D_τ⁻¹)⁻¹ }

    σ² | rest ~ Inv.Gamma( a* = a + (n + p)/2, b* = b + {(Y − Xβ)ᵀ(Y − Xβ) + βᵀ D_τ⁻¹ β}/2 )

    λ_i² | rest ~ Gamma( a* = p_i + r, b* = δ + Σ_{j=1}^{p_i} τ²_{i,j}/2 )

    τ²_{i,j} | rest ~ Gen.Inv.Gaussian( a = λ_i², b = β²_{i,j}/σ², p = 1/2 ),
        where V ~ Gen.Inv.Gaussian(a, b, p) has density (a/b)^{p/2} v^{p−1} exp{−(a v + b/v)/2} / {2 K_p(√(ab))},
        and K_p(·) is a modified Bessel function of the second kind.

    γ²_{i,j} | rest = (1/τ²_{i,j}) | rest ~ Inv.Gaussian( ν = (σ² λ_i²/β²_{i,j})^{1/2}, λ = λ_i² ),
        where X ~ Inv.Gaussian(ν, λ) has density {λ/(2π)}^{1/2} x^{−3/2} exp{−λ(x − ν)²/(2 ν² x)}.

Since the complete conditionals are all in closed form, we can use a Gibbs sampler (with block updates for β and γ²) for estimation. The initial values and hyperparameters are chosen as follows:

    The initial β is the estimate from the frequentist Lasso with a single shrinkage parameter.
    The initial σ² is the MLE of σ².
    Each initial λ_i² is the square of the penalty parameter chosen by cross-validation for the frequentist Lasso. (So the initial λ_i²'s are all equal.)
    The initial γ²_{i,j}'s are all set to 1.
    The hyperparameters for σ² are a = b = 0.001, so as to be uninformative.
    The hyperparameters for each λ_i² are r = 1 and δ = 0.1, which results in a posterior that is relatively flat with high posterior probability near the MLE [2].

We also consider three alternative parameterization options that provide equivalent models but different estimation properties; we discuss those in the Simulation section.
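As an illustration of how these updates fit together, the sketch below implements one sweep of the Gibbs sampler under the γ² (precision) parameterization. It is an assumed implementation, not the code used for the report: the function and argument names are ours, `plat` is a hypothetical vector mapping each predictor to its platform, and the `statmod` and `MASS` packages are assumed for the inverse Gaussian and multivariate normal draws.

```r
## One Gibbs sweep for the Bayesian Lasso clinical model (assumed sketch).
library(statmod)   # rinvgauss(n, mean, shape)
library(MASS)      # mvrnorm(n, mu, Sigma)

gibbs_sweep <- function(Y, X, beta, sigma2, lambda2, gamma2, plat,
                        a = 0.001, b = 0.001, r = 1, delta = 0.1) {
  n <- nrow(X); p <- ncol(X)
  # beta | rest ~ Normal((X'X + D_tau^-1)^-1 X'Y, sigma2 (X'X + D_tau^-1)^-1),
  # where D_tau^-1 = diag(gamma2) since gamma2 = 1/tau2
  A    <- crossprod(X) + diag(gamma2, nrow = p)
  Ainv <- chol2inv(chol(A))
  beta <- as.vector(mvrnorm(1, drop(Ainv %*% crossprod(X, Y)), sigma2 * Ainv))
  # sigma2 | rest ~ Inv.Gamma(a + (n + p)/2, b + {RSS + beta' D_tau^-1 beta}/2)
  rss    <- sum((Y - X %*% beta)^2)
  sigma2 <- 1 / rgamma(1, shape = a + (n + p) / 2,
                          rate  = b + (rss + sum(gamma2 * beta^2)) / 2)
  # gamma2_{i,j} | rest ~ Inv.Gaussian(nu = sqrt(sigma2 * lambda2_i / beta_{i,j}^2), lambda2_i)
  gamma2 <- rinvgauss(p, mean = sqrt(sigma2 * lambda2[plat] / beta^2),
                         shape = lambda2[plat])
  # lambda2_i | rest ~ Gamma(p_i + r, delta + sum_j tau2_{i,j}/2), with tau2 = 1/gamma2
  for (i in seq_along(lambda2)) {
    idx <- which(plat == i)
    lambda2[i] <- rgamma(1, shape = length(idx) + r,
                            rate  = delta + sum(1 / gamma2[idx]) / 2)
  }
  list(beta = beta, sigma2 = sigma2, lambda2 = lambda2, gamma2 = gamma2)
}
```

Iterating this sweep, discarding a burn-in, and retaining the remaining draws gives posterior samples of the kind summarized in the Simulation and Results sections.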

3 Simulations

We use simulations to compare four parameterization options for the clinical model. After choosing the best parameterization, we then further assess its estimation properties.

3.1 Choosing a parameterization

Park and Casella argue for σ² to appear in the β prior because it results in a unimodal full posterior [3], but we were concerned that this might also inflate the correlation between MCMC samples, so we investigate the option of giving β a Normal(0_p, D_τ) prior. In each of those two cases (with and without σ² in the β prior), we also compare the τ² versus γ² parameterizations. We were interested in this aspect because we were somewhat suspicious about the accuracy of the R commands used to sample from the Inverse Gaussian and Generalized Inverse Gaussian distributions; some of the parameter values of the complete conditional distributions arising in the MCMC samples were extreme, and we wanted to make sure that the variation seen in the output of the R commands was just a result of this, and not an error in the sampling mechanism.

We adjusted the Gibbs sampler to reflect the differences in the complete conditional distributions for each parameterization. Then, for each different choice of β, we simulated a training data set with n = 100, k = 1, p_1 = 90, σ² = 1, each X entry drawn from Normal(0, 1), and Y ~ Normal(Xβ, σ² I_n), as well as a test data set with the same settings except n = 400. The results from the simulations that most closely reflect what we anticipate in the data (i.e., sparsity, and number of predictors close to number of samples) are shown in the tables below. The MSE ratio is the MSE from least squares divided by the MSE from our method.

Summary for β = (0, 3, 0, 0, 3, 0, 3, 0, 0) with each entry repeated 10 times:

Summary for β = 90 values sampled from Laplace(λ = 1):

Since the estimates of σ² seem to be very inaccurate when we leave σ² out of the β prior, we choose to include it. Since the results do not appear to be impacted by the choice of γ² versus τ², we choose the parameterization with γ², because the code using the Inverse Gaussian distribution runs more than ten times faster than the code using the Generalized Inverse Gaussian distribution. The high correlation between σ² and λ² is somewhat concerning, and we address this in the next subsection.
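For reference, the MSE ratio reported above can be computed as in the following short sketch (our notation; it assumes the MSE in question is prediction error on the data set at hand, with `beta_ls` the least squares estimate and `beta_post` the posterior mean from our method).

```r
## MSE ratio: least squares MSE divided by the MSE from our method,
## evaluated on either the training or the test data (assumed sketch).
mse_ratio <- function(Y, X, beta_ls, beta_post) {
  mse <- function(b) mean((Y - X %*% b)^2)
  mse(beta_ls) / mse(beta_post)   # values > 1 favor our method
}
```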

3.2 Assessing the Gibbs Sampler

We ran many simulations to assess the sampler. We tried different combinations of p_i (number of genes per platform) and k (number of platforms), resulting in cases that ranged from p << n to p > n, focusing on the case where p is close to n, since that is what we anticipate seeing in real data sets. (Recall that p = Σ_{i=1}^{k} p_i = number of predictors in the clinical model.) We also investigated the results when the true β values come from a mixture of 0 and ±3 (as in the Casella simulation [2]), as well as true β values sampled from a Laplace distribution with various λ values (to increase sparsity). We also simulated data using different σ² values. After running 10,000 iterations of the Gibbs sampler, using 500 as burn-in, we examined results such as trace plots, posterior means, credible intervals for β, shrinkage plots, correlation between σ² and the λ_i²'s, and MSE efficiency for each of these scenarios. The training and test data were simulated in the same manner as in the previous simulations, with n = 100 for the training data and n = 400 for the test data, and other settings (σ², β, etc.) specific to the particular simulation.

We generally saw excellent mixing of the β_{i,j}'s, except for some distinct autocorrelation when we tried updating the β_{i,j}'s in a few blocks as opposed to a single block. The shrinkage plots showed what we wanted: the effects close to zero were shrunk even closer to 0, and the larger effects had minimal shrinkage. The trace plots of σ² and the λ_i²'s looked good for n = 100 and p = 30, but we started seeing autocorrelation with p = 60, and even stronger autocorrelation when p was increased to 90. The estimates still seemed reasonable, but we wanted to investigate further to ensure that we were covering the entire parameter space. In investigating this issue, we realized that in the cases where the trace plots showed high autocorrelation, there was also high correlation between σ² and λ_i²; this makes sense because, with our parameterization, the scale parameter of the Laplace prior for β_i is σ/λ_i. So we checked the trace plots of σ/λ_i and found that the ratio showed no distinct autocorrelation even when p = 90. We also ran simulations with the same data (simulated with k = 1) but different starting values for λ_1² to ensure that the posterior means and convergence were not highly dependent on the initial value; with a true λ_1² of 9, we used initial values of 0.1, 1, 10, 30, and 100 and obtained almost identical results from each. At this point we were satisfied that the sampler was performing well.
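The diagnostic just described can be scripted along the following lines (an assumed sketch; `sigma2_draws` and `lambda2_draws` are hypothetical objects holding the retained draws of σ² and of the λ_i², one column per platform).

```r
## Trace and autocorrelation of the ratio sigma/lambda_i (the Laplace prior scale),
## which mixes well even when sigma2 and lambda2_i are individually sticky.
check_ratio_mixing <- function(sigma2_draws, lambda2_draws, i = 1) {
  ratio <- sqrt(sigma2_draws) / sqrt(lambda2_draws[, i])
  op <- par(mfrow = c(1, 2)); on.exit(par(op))
  plot(ratio, type = "l", xlab = "iteration", ylab = paste0("sigma/lambda_", i))
  acf(ratio, main = paste0("ACF of sigma/lambda_", i))
  invisible(ratio)
}
```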

Results and plots from two relevant simulations are shown below. In each, we set n = 100, σ² = 1, and λ_i = 3 (i = 1, ..., k), and sampled from Laplace(λ_i) to set β.

Simulation 1: k = 3, p_1 = p_2 = p_3 = 30 (3 shrinkage parameters)

Posterior means: σ̂² = …, λ̂_1² = …, λ̂_2² = …, λ̂_3² = …; σ̂/λ̂_1 = …, σ̂/λ̂_2 = 0.581, σ̂/λ̂_3 = …. MSE efficiency: on training data = …; on test data = ….

Simulation 2: k = 1, p_1 = 90 (1 shrinkage parameter)

Posterior means: σ̂² = 0.761, λ̂_1² = …, σ̂/λ̂_1 = …. MSE efficiency: on training data = …; on test data = ….

The patterns discussed above can be seen in the plots from the simulations. We observe that the ratio σ/λ_i seems to be estimated more accurately than the individual parameters. We also note that the MSE efficiency (the MSE from least squares divided by the MSE from our method) is < 1 on the training data but > 1 on the test data. This is consistent with the idea that, with so much true sparsity, the least squares estimates overfit the training data, while our method provides estimates that are more applicable to the population. Overall, we are pleased with the sampler and believe that it is mixing well and converging to the true values. We are confident in moving forward and applying it to a real data set.

4 Analysis of the Data

Glioblastoma Multiforme (GBM) is one of the most common and most malignant brain tumors. The data used in this project are GBM data from The Cancer Genome Atlas (TCGA). Among other things, the data set contains information on mRNA expression, methylation, copy number, and survival for 33 patients. We are using the data corresponding to 49 genes in a single signaling pathway.

4.1 Description

The bioanalyst extracted the relevant data from a much larger set that included information for many more genes. Four patients had multiple samples on at least one platform; the bioanalyst believes the repeats were done to ensure the data are consistent. Under the assumption that this is the rationale (and not that there was some problem with the first sample), we decided to use the average of these repeats. We then reformatted the extracted data into several structures:

1. OurSurvival (33 × 1), containing days of survival after diagnosis for each patient (with no missing data).
2. OurMRNA (33 × 49), containing mRNA expression levels for each gene (columns) for each patient (rows) (with no missing data).
3. OurMeth (33 × 176), containing data on the methylation markers (columns) for each patient (rows). There can be multiple markers per gene (the number varies from gene to gene); the columns are ordered by gene: all markers for gene 1, then all markers for gene 2, etc. There are 40 missing values (< 0.1% of the entries). There is one gene with no methylation data, so we set the corresponding coefficient to 0 later in the analysis.

4. OurCopyNumber (33 × 54), containing copy number data (columns) for each patient (rows). Again, there are multiple values per gene (ranging from 1 to 43), and the columns are ordered by gene. There are 676 missing values (5.5% of the entries).

After imputing the missing values (see the following subsection), we perform two PCAs for each gene: one on the associated methylation markers and one on the associated copy number locations, each time keeping enough PC scores to account for at least 90% of the variation. Then (still for the single gene) we regress the mRNA expression on the PC scores by least squares, and we use the predicted pieces and the residuals from this regression to estimate M_i, CN_i, and O_i in the mechanistic model. After repeating this for each gene, we put all the M_i, CN_i, and O_i vectors into a matrix X (33 × 147). Each row of X corresponds to a patient, and the columns consist of 49 columns for the M_i's, then 49 for the CN_i's, and then 49 for the O_i's. One gene has no methylation data, so we remove that column from the X matrix, which essentially sets that effect to 0. Any effect that may be due to methylation for that gene would then be captured by the O predictor in the clinical model.

Since we are analyzing survival data, we choose to use a log-normal model, and thus we mean-center log(OurSurvival) to obtain our response vector. After standardizing the columns of X, we have our final X matrix and response vector. We run the Gibbs sampler with k = 3, so there is a separate shrinkage parameter for methylation, copy number, and other.

4.2 Missing Data

Since the percentage of missing data is so low, we choose to impute using the following algorithm for both the methylation data and the copy number data (a code sketch follows the list):

(1) For each marker, replace any NAs with the mean of the other patients. Call the resulting matrix Temp.
(2) Use Temp to calculate a correlation matrix between markers.
(3) For each marker with missing value(s), regress it on the 3 markers with which it is most highly positively correlated (using the Temp matrix for the predictors to avoid further complications from missing data).
(4) Substitute the predicted value for the missing value in the original matrix.
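A sketch of this imputation algorithm in R is given below. It is our own rendering of steps (1)–(4), not the code used for the report; in particular, the handling of ties and of markers whose top correlates also have missing entries is a simplifying assumption.

```r
## Regression-based single imputation following steps (1)-(4) above (assumed sketch).
## 'dat' is a patients x markers matrix with NAs for the missing values.
impute_markers <- function(dat, n_top = 3) {
  # (1) mean-impute each marker to get a complete working copy, Temp
  Temp <- apply(dat, 2, function(x) { x[is.na(x)] <- mean(x, na.rm = TRUE); x })
  # (2) correlation matrix between markers, computed from Temp
  R <- cor(Temp)
  out <- dat
  for (j in which(colSums(is.na(dat)) > 0)) {
    # (3) regress marker j on its n_top most positively correlated markers,
    #     fitting on the patients where marker j is observed and using Temp
    #     for the predictors
    top <- setdiff(order(R[, j], decreasing = TRUE), j)[seq_len(n_top)]
    obs <- !is.na(dat[, j])
    fit <- lm(dat[obs, j] ~ Temp[obs, top, drop = FALSE])
    pred <- cbind(1, Temp[, top, drop = FALSE]) %*% coef(fit)
    # (4) substitute predicted values for the missing entries of marker j
    out[!obs, j] <- pred[!obs]
  }
  out
}
```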

4.3 Results

Plots and output summarizing the results of the above steps are shown below.

First, we see that there does not appear to be an issue of autocorrelation in the trace plots of σ² or the λ_i²'s. The β_{i,j}'s also seem to be mixing well. The posterior mean of σ² is 0.445, and the posterior means of λ_1², λ_2², and λ_3² are 6.6, 83.0, and 71, respectively. There are six 95% credible intervals that do not contain 0; they correspond to the effects of M_33 (which has an estimated effect less than 0) and of CN_40, CN_45, O_15, O_20, and one further O term (which have estimated effects greater than 1). Genes 33, 40, 45, 15, and 20 are GRB2, CCND1, MDM2, SRC, and PDGFRB, respectively; the remaining significant O effect corresponds to ERBB2.
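The significant effects above are read off the posterior samples as in the following sketch (our notation; `beta_draws` is a hypothetical iterations × predictors matrix of retained draws of β, with columns named by platform and gene).

```r
## Flag effects whose 95% credible interval excludes zero (assumed sketch).
significant_effects <- function(beta_draws, level = 0.95) {
  alpha <- 1 - level
  ci <- apply(beta_draws, 2, quantile, probs = c(alpha / 2, 1 - alpha / 2))
  keep <- ci[1, ] > 0 | ci[2, ] < 0       # interval entirely above or below zero
  data.frame(effect = colnames(beta_draws)[keep],
             lower  = ci[1, keep],
             upper  = ci[2, keep],
             mean   = colMeans(beta_draws[, keep, drop = FALSE]))
}
```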

5 Conclusions & Future Work

We have identified six genes that appear to have a significant effect on survival, and we have also identified the mechanism of each effect. However, the shrinkage plot from the GBM analysis shows that the effects with the largest least squares estimates are shrunk much more than we would like; it appears that the single parameter in the β prior is not sufficient to capture both the mass around zero and the mass in the tails. Our next step is to try a two-parameter prior for β, such as the NEG (Normal-Exponential-Gamma) or NG (Normal-Gamma) prior.

There are several more things we are planning to investigate:

- We will check diagnostic plots to ensure the validity of the model.
- Instead of doing the mRNA regression first and then the Bayesian Lasso, we may consider connecting the two steps in a fully unified Bayesian framework.
- We could incorporate the functional aspect of the copy number data, using chromosome location information, and do functional PCA instead of standard PCA.
- We may consider other methods of handling the missing values, such as multi-step imputation or a variant of PCA designed to handle missing data.
- We plan to include miRNA as another platform once we receive those data. The holdup is the biological question of which miRNAs to associate with which genes; the bioanalyst is trying to obtain association scores that we could use to make this determination.
- We may look into incorporating additional platforms, such as proteomic data.
- Eventually, our goal is to incorporate multiple gene pathways into one model.

References

[1] Memorial Sloan-Kettering Cancer Center. Pathway analysis of genetic alterations in glioblastoma (TCGA).

[2] Minjung Kyung, Jeff Gill, Malay Ghosh, and George Casella. Penalized regression, standard errors, and Bayesian lassos. Bayesian Analysis, 5(2):369–412, 2010.

[3] Trevor Park and George Casella. The Bayesian Lasso. Journal of the American Statistical Association, 103(482):681–686, 2008.

[4] Wenting Wang, Veera Baladandayuthapani, Jeffrey S. Morris, Bradley M. Broom, Ganiraju C. Manyam, and Kim-Anh Do. Integrative Bayesian analysis of high-dimensional multi-platform genomics data. Submitted to Bioinformatics, May 2012.

Appendix

Derivations of Complete Conditional Distributions:

Inverse Gamma: V ~ IG(a, b) has density f_V(v) = b^a v^{−(a+1)} exp(−b/v)/Γ(a).

Generalized Inverse Gaussian: V ~ GIG(a, b, p) has density (a/b)^{p/2} v^{p−1} exp{−(a v + b/v)/2}/{2 K_p(√(ab))}, where K_p(·) is a modified Bessel function of the second kind.

Inverse Gaussian: X ~ InvGauss(ν, λ) has density f_X(x) = {λ/(2π)}^{1/2} x^{−3/2} exp{−λ(x − ν)²/(2 ν² x)}.

Parameterization of the model:

    k = number of predictor sets; p_i = number of predictors for set i;
    p = Σ_{i=1}^{k} p_i = total number of predictors; n = number of samples;

    Y = Xβ + ε, where Y is mean-centered, β = (β_{1,1}, ..., β_{1,p_1}, ..., β_{k,1}, ..., β_{k,p_k}), and ε ~ Normal(0_n, σ² I_n);
    thus Y ~ Normal(Xβ, σ² I_n);
    β ~ Normal(0_p, σ² D_τ), where D_τ = diag(τ²_{1,1}, ..., τ²_{1,p_1}, ..., τ²_{k,1}, ..., τ²_{k,p_k});
    τ²_{i,j} ~ Negative Exponential with mean 2/λ_i², i.e., density (λ_i²/2) exp(−λ_i² τ²_{i,j}/2);
    σ² ~ InverseGamma(a, b), i.e., density b^a (σ²)^{−(a+1)} exp(−b/σ²)/Γ(a);
    λ_i² ~ Gamma(r, δ), i.e., density δ^r (λ_i²)^{r−1} exp(−δ λ_i²)/Γ(r).

The joint likelihood of (Y, X, β, σ², τ², λ²) is proportional to

    (σ²)^{−n/2} exp{−(Y − Xβ)ᵀ(Y − Xβ)/(2σ²)}
    × (σ²)^{−p/2} [Π_{i=1}^{k} Π_{j=1}^{p_i} (τ²_{i,j})^{−1/2}] exp{−βᵀ D_τ⁻¹ β/(2σ²)}
    × (σ²)^{−(a+1)} exp(−b/σ²)
    × Π_{i=1}^{k} (λ_i²)^{r−1} exp(−δ λ_i²)
    × Π_{i=1}^{k} Π_{j=1}^{p_i} (λ_i²/2) exp(−λ_i² τ²_{i,j}/2).

The distribution of β given the data and (τ², σ², λ²). This should be normal.

    [β | rest] ∝ exp{−(Y − Xβ)ᵀ(Y − Xβ)/(2σ²)} exp{−βᵀ D_τ⁻¹ β/(2σ²)}
              ∝ exp{Yᵀ Xβ/σ² − βᵀ (XᵀX + D_τ⁻¹) β/(2σ²)}

Now we can apply the shortcut that when a density is ∝ exp(−βᵀ Σ⁻¹ β/2 + Cβ), then β ~ Normal(Σ Cᵀ, Σ), to find:

    β | rest ~ Normal{ (XᵀX + D_τ⁻¹)⁻¹ XᵀY, σ² (XᵀX + D_τ⁻¹)⁻¹ }.

The distribution of τ² given the data and (β, σ², λ²). This should be generalized inverse Gaussian.

    [τ² | rest] ∝ ( Π_{i=1}^{k} Π_{j=1}^{p_i} (τ²_{i,j})^{−1/2} ) exp{−βᵀ D_τ⁻¹ β/(2σ²)} × Π_{i=1}^{k} Π_{j=1}^{p_i} exp(−λ_i² τ²_{i,j}/2)
               = Π_{i=1}^{k} Π_{j=1}^{p_i} [ (τ²_{i,j})^{−1/2} exp{−β²_{i,j}/(2σ² τ²_{i,j})} exp(−λ_i² τ²_{i,j}/2) ]
               = g(τ²_{1,1}) g(τ²_{1,2}) ⋯ g(τ²_{k,p_k}),

where g(τ²_{i,j}) = (τ²_{i,j})^{−1/2} exp[−{λ_i² τ²_{i,j} + β²_{i,j}/(σ² τ²_{i,j})}/2].

So we can see that the τ²_{i,j}'s are conditionally independent, with

    τ²_{i,j} | rest ~ GIG(a = λ_i², b = β²_{i,j}/σ², p = 1/2).

Also, defining the precision γ²_{i,j} = 1/τ²_{i,j} and applying the change-of-variable formula, we obtain:

    [γ² | rest] ∝ Π_{i=1}^{k} Π_{j=1}^{p_i} (γ²_{i,j})⁻² g(1/γ²_{i,j}) = h(γ²_{1,1}) ⋯ h(γ²_{k,p_k}),

where

    h(γ²_{i,j}) = (γ²_{i,j})^{−3/2} exp{−β²_{i,j} γ²_{i,j}/(2σ²) − λ_i²/(2γ²_{i,j})}
                ∝ (γ²_{i,j})^{−3/2} exp{−β²_{i,j} γ²_{i,j} λ_i²/(2σ² λ_i²) + λ_i² (σ² λ_i²/β²_{i,j})^{−1/2} − λ_i²/(2γ²_{i,j})}
                = (γ²_{i,j})^{−3/2} exp[−λ_i² {γ²_{i,j} − (σ² λ_i²/β²_{i,j})^{1/2}}² / {2 (σ² λ_i²/β²_{i,j}) γ²_{i,j}}].

So we can see that the γ²_{i,j}'s are conditionally independent, with

    γ²_{i,j} | rest ~ InvGaussian( ν = (σ² λ_i²/β²_{i,j})^{1/2}, λ = λ_i² ).
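Given the earlier concern about the accuracy of the R routines for these distributions, the equivalence just derived can be checked numerically along the following lines. This is our own sanity-check sketch, assuming the GIGrvg and statmod packages; the values of a and b are arbitrary illustrations.

```r
## Monte Carlo check that 1/tau^2 drawn from the GIG full conditional matches gamma^2
## drawn from the inverse Gaussian full conditional (assumed sketch, not from the report).
library(GIGrvg)    # rgig(n, lambda, chi, psi): density ~ x^(lambda-1) exp(-(chi/x + psi*x)/2)
library(statmod)   # rinvgauss(n, mean, shape)

set.seed(1)
a <- 4; b <- 0.25                                        # a = lambda_i^2, b = beta_ij^2/sigma^2
tau2   <- rgig(1e5, lambda = 1/2, chi = b, psi = a)      # tau^2 | rest ~ GIG(a, b, p = 1/2)
gamma2 <- rinvgauss(1e5, mean = sqrt(a / b), shape = a)  # gamma^2 | rest ~ InvGauss(nu, lambda_i^2)
qqplot(1 / tau2, gamma2,
       xlab = "1/tau^2 from GIG draws", ylab = "gamma^2 from inverse Gaussian draws")
abline(0, 1)
```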

The distribution of σ² given the data and (τ², β, λ²). This should be inverse Gamma.

    [σ² | rest] ∝ (σ²)^{−n/2} exp{−(Y − Xβ)ᵀ(Y − Xβ)/(2σ²)} × (σ²)^{−p/2} exp{−βᵀ D_τ⁻¹ β/(2σ²)} × (σ²)^{−(a+1)} exp(−b/σ²)
                = (σ²)^{−{(n+p)/2 + a + 1}} exp[−{(Y − Xβ)ᵀ(Y − Xβ) + βᵀ D_τ⁻¹ β + 2b}/(2σ²)].

Thus, we see that

    σ² | rest ~ InvGamma( a* = a + (n + p)/2, b* = b + {(Y − Xβ)ᵀ(Y − Xβ) + βᵀ D_τ⁻¹ β}/2 ).

The distribution of λ² given the data and (τ², σ², β). This should be Gamma.

    [λ² | rest] ∝ Π_{i=1}^{k} Π_{j=1}^{p_i} (λ_i²/2) exp(−λ_i² τ²_{i,j}/2) × Π_{i=1}^{k} (λ_i²)^{r−1} exp(−δ λ_i²)
                ∝ Π_{i=1}^{k} (λ_i²)^{p_i + r − 1} exp{−λ_i² (δ + Σ_{j=1}^{p_i} τ²_{i,j}/2)}.

So we can see that the λ_i²'s are conditionally independent, with

    λ_i² | rest ~ Gamma( a* = p_i + r, b* = δ + Σ_{j=1}^{p_i} τ²_{i,j}/2 ).
