A temporal hidden Markov regression model for the analysis of gene regulatory networks


Mayetri Gupta, Pingping Qu, and Joseph G. Ibrahim

Department of Biostatistics, University of North Carolina at Chapel Hill, NC 27599, U.S.A.; gupta@bios.unc.edu

February 20, 2007

Abstract

We propose a novel hierarchical hidden Markov regression model for determining gene regulatory networks from genomic sequence and temporally collected gene expression microarray data. The statistical challenge is to simultaneously determine the groupings of genes and the subsets of motifs involved in their regulation, when the groupings may vary over time and a large number of potential regulators are available. We devise a hybrid Monte Carlo methodology to estimate parameters under two classes of latent structure, one arising from the unobservable state identity of genes, and the other from the unknown set of covariates influencing the response within a state. The effectiveness of this method is demonstrated through a simulation study and an application to a yeast cell-cycle data set.

1 Introduction

With the increased availability of large-scale genomic data, discovering the roles of various components in the genome has become an important goal in the biomedical sciences. Many biological processes in complex organisms are regulated by combinations of genes forming a pathway or network (Wang et al., 2005). A fundamental question is to determine how genes interact to regulate biological pathways.

Making robust statistical inference from diverse types of genomic data to discover pathways of biological function presents a formidable challenge, especially as various latent correlations exist in gene behavior, and these behaviors and relationships may change over time. The first step in gene regulation is the binding of certain proteins, called transcription factors (TFs), to TF binding sites (TFBSs) on the DNA sequence, which activates the transcription of the downstream genes into messenger RNA (mRNA). The pattern common to the TFBSs for a TF is termed a motif, and genes that are regulated by the same TF often share a common motif. A popular strategy for studying gene regulation is a two-step procedure: searching for motifs upstream of genes after clustering the genes by similar expression patterns. However, a major problem with this approach is that if the initial clustering is inaccurate, discovery of the correct motifs may be heavily biased.

1.1 Gene regulatory processes and statistical modeling

We first describe some terminology and current approaches. A motif of length $w$ is represented through a $4 \times w$ matrix called a position-specific weight matrix (PSWM), each column denoting the probabilities (or frequencies) of observing the four letters A, C, G, or T in that position ($w$ typically ranges between 8 and 20). For example, binding sites for the yeast motif RAP1 can be represented through a PSWM of counts with rows labeled (A), (C), (G), and (T). [The numeric entries of the example count matrix were lost in extraction.] Successful computational strategies to discover functional motifs in biological processes have mostly involved searching for conserved motifs in the promoter sequence adjacent to each of a set of co-regulated genes (Liu et al., 1995), for example, genes exhibiting similar expression patterns across a number of experimental conditions in a microarray experiment. More recently, approaches that incorporate more biological information, such as weighting promoters by expression-based scores before motif discovery, have been proposed (Liu et al., 2002).
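To make the PSWM representation concrete, the following Python sketch builds column probabilities from a count matrix and scores a single candidate site against a uniform background. The counts are hypothetical stand-ins (the RAP1 entries were not reproduced above), and the function name is ours.

```python
import numpy as np

# Hypothetical 8-column count matrix for a motif; rows are A, C, G, T.
# These numbers are illustrative only, not the RAP1 counts from the paper.
counts = np.array([
    [12,  0,  1, 14,  2,  0, 13,  3],   # A
    [ 1, 13,  0,  0,  1, 12,  0,  2],   # C
    [ 1,  1, 12,  0, 11,  1,  1,  2],   # G
    [ 0,  0,  1,  0,  0,  1,  0,  7],   # T
])

# Convert each column of counts to the column probabilities theta_k of a
# PSWM, with a small pseudocount to avoid zero probabilities.
pswm = (counts + 0.5) / (counts + 0.5).sum(axis=0, keepdims=True)

base_index = {"A": 0, "C": 1, "G": 2, "T": 3}
theta0 = np.full(4, 0.25)   # 0th-order uniform background frequencies

def site_log_odds(site: str) -> float:
    """Log-likelihood ratio of one w-letter site under the PSWM columns
    (independent multinomials) versus the background frequencies."""
    idx = [base_index[b] for b in site]
    pos = np.arange(len(site))
    return float(np.sum(np.log(pswm[idx, pos]) - np.log(theta0[idx])))

print(site_log_odds("ACGAGCAT"))
```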

Linear model-based approaches (Conlon et al., 2003) relate expression values to the estimated propensity of motif occurrences (summarized by a sequence motif score), assuming that the presence of a motif site contributes additively to the gene expression level. These approaches assume a single set of gene expression patterns, which makes it difficult to discover motifs having a positive effect on a subset of genes and a negative or neutral effect on another, a problem exacerbated in the case of temporal gene expression. Probabilistic relational models, based on the specification of multiple conditional distributions (Segal et al., 2001), have been used for modeling gene regulation; however, such models involve a degree of parametric complexity that hinders model validation approaches. One promising approach involved iterative model-based clustering and motif determination (Holmes and Bruno, 2000) but avoided a direct model-based link between expression and sequence.

1.2 A joint approach for time-dependent regulatory networks

In this article we present a method that determines groups of genes and the TFs regulating them, allowing for the pattern of regulation to change temporally. This approach addresses the discovery of motifs temporally involved in gene regulation, unlike methods which either ignore the time-dependence or consider each time point separately. It may not be reasonable to assume that genes can be grouped into clusters that behave similarly as a group over time. Our model allows correlations to exist between gene measurements over different time points as well as between separate genes on the same array. Several patterns of gene behavior may exist in a single experiment, where genes may enter or leave a particular pattern at a certain time point in the study. We present a novel statistical approach that can integrate the information from expression measurements over time and genomic sequence data through a Bayesian hidden Markov model (HMM) framework, and simultaneously uncover relationships between genes and the TFs that regulate them. We generalize the regression model proposed in Conlon et al. (2003) to a framework that can accommodate temporally varying motif effects.

In our model, the temporal structure of dependence in gene behavior is modeled by a latent Markov process, where at every time point the observed expression measurement is modeled as a state-dependent function of the time and motif covariates. Using a regression framework provides interpretability of the resulting coefficients and a systematic way to model interactions between various factors. Unlike other two-step clustering and motif-finding approaches, our method provides a unified framework giving simultaneous estimates of gene co-regulation and time-dependent motif effects. We first describe the general Bayesian model framework (Section 2) and present a hybrid Monte Carlo procedure for model fitting, parameter estimation, and variable selection under a two-layered latent structure (Section 3). Defining the model within a hierarchical framework, we show that problems of parameter estimability, identifiability, and model validation can be addressed in a robust way. A Bayesian criterion for selecting the number of model states is discussed (Section 4), which is seen to outperform criteria such as the AIC or BIC. Finally, simulation studies and an application to a yeast cell-cycle data set demonstrate the feasibility of this model in a real biological scenario (Sections 5 and 6).

1.3 A yeast cell-cycle gene expression data set

Our motivating data set is from a set of yeast microarray experiments (Spellman et al., 1998), an extensive study of genes regulated in a periodic manner coincident with the cell cycle. cDNA microarrays were used to analyze mRNA levels in cell cultures that had been synchronized by three independent methods. Gene expression was recorded as the logarithm of the ratio of expression of that gene in the sample to a baseline control. The data set was preprocessed by computing the fluorescence ratios through a local background correction (intensities of the weakest 2% of the pixels in each box) and normalized such that the average log-ratio over the course of the experiments equaled zero.

Cell-cycle synchronization was inferred for genes (i) whose mRNA levels were significantly correlated with mRNA levels of genes previously known to be cell-cycle regulated, and (ii) which showed a high periodicity based on a numerical score derived from Fourier analysis. A total of 800 genes were identified as being cell-cycle regulated. For our analysis, we restrict our focus to the common group of genes between the three synchronization methods. In Section 6 we describe the analysis of the elutriation data set, which consists of measurements of 477 genes over 12 time points corresponding to approximately two cell cycles. For deriving the motif scores corresponding to each gene, position-specific weight matrices (PSWMs) and upstream promoter regions of each gene (about a 1 Kb region) were downloaded from the Saccharomyces cerevisiae promoter database (Zhu and Zhang, 1999). The motif scoring method is described in Section 2.2. The cell-cycle data set is available from the Stanford Microarray Database.

2 A hidden Markov regression model

We now motivate the development of the model framework. Let $Y_i = (Y_{i1}, \ldots, Y_{iT})$, $(i = 1, \ldots, N)$, denote the vector of $T$ expression measurements made on gene $i$. Assume that each observation may have been generated from one of $K$ classes, indexed by the corresponding latent vector $Z_i = (Z_{i1}, \ldots, Z_{iT})$. Further, assume that we have measurements on a set of $p$ motif covariates for each observation, the design matrix being denoted by $X = (X_{il})$, $(i = 1, \ldots, N;\ l = 1, \ldots, p)$. Let $\beta_k = (\beta_{k1}, \ldots, \beta_{kp})$ denote the regression coefficient vector for class $k$. First, assume the $Y_{ij}$'s are independently generated from class $k$ $(k = 1, \ldots, K)$, with the relationship between $Y_i$ and $X_i = (X_{i1}, \ldots, X_{ip})$ given by

$$Y_{ij} \mid Z_{ij} = k \ \sim\ N(\zeta_k(X_i), \sigma_k^2) \equiv f_k(\cdot \mid \beta_k, \sigma_k^2),$$

where $\zeta_k(\cdot)$ is shorthand for a function specifying the regression relationship, i.e., for a linear regression model, $\zeta_k(X_i) = X_i^T \beta_k$.

If class memberships are unknown a priori, with $\pi_k$ $(1 \le k \le K)$ denoting the prior probability of being in class $k$, the $Y_{ij}$'s can be assumed to be generated from a mixture of regression models, each component of the mixture encapsulating a separate regression. However, if the measurements are made over time, the observations $Y_{i1}, \ldots, Y_{iT}$ are intrinsically dependent. By setting the class membership to account for the dependence in $Y_i$, we capture this dependence through a hidden Markov model, introducing a stochastic relationship in the latent class indicator $Z_i$. Assuming that the dependence structure is homogeneous over time, we model the dependence in $Z_{ij}$ $(j = 1, \ldots, T)$ using a transition matrix $\tau = ((\tau_{kl}))$, where

$$\tau_{kl} = P(Z_{ij} = l \mid Z_{i,j-1} = k), \quad j \in \{2, \ldots, T\};\ k, l \in \{1, \ldots, K\}. \quad (2.1)$$

At the initial time point $(j = 1)$, we assume the prior probability of states is given by $\pi = (\pi_1, \ldots, \pi_K)$, where $P(Z_{i1} = k) = \pi_k$ $(k = 1, \ldots, K)$. We denote this model as a hidden Markov regression model (HMRM), where the conditional distribution of $Y_{ij}$ at time $j$ for unit $i$ can be written in the form:

$$Y_{ij} \mid Z_{ij} = k \ \sim\ N(\zeta_k(X_i, \phi(j)), \sigma_k^2). \quad (2.2)$$

In (2.2), we have included an extra covariate $\phi(j)$ for the $j$-th time point, to account for time-dependence. Equations (2.2) and (2.1) completely specify the hidden Markov regression model, up to the functional forms of $\zeta_k(\cdot)$ and $\phi(j)$. In the following section, we describe the model components in more detail.

2.1 Modeling cell-cycle effects

The two goals of interest are to: (i) determine the variables $X$ influencing the observed $Y$, and (ii) find the optimal number of classes $K$. We assume that the observed gene expression measurement $Y_{ij}$ at time $j$ is a composite of two effects: a fixed motif-dependent effect $\zeta_k(X_i, \phi(j))$, and a random effect $\psi_{ijk}$ due to gene-specific differences in expression magnitude, where $\psi_{ijk} \mid Z_{ij} = k \sim N(\mu_{kj}, \xi_0 \sigma_k^2)$, with $\xi_0$ denoting a variance inflation factor.

If a sufficient number of replicate measurements are available, gene-specific variances can be modeled by using a hierarchical prior. Since such replication is often not available (as in the yeast data set), genes in a common class are currently assumed to have the same variance. Integrating out the random effect gives the marginal distribution of $Y_{ij}$ as

$$Y_{ij} \mid Z_{ij} = k \ \sim\ N\big(\mu_{kj} + \zeta_k(X_i, \phi(j)),\ (1 + \xi_0)\,\sigma_k^2\big). \quad (2.3)$$

We next choose appropriate functional forms for $\zeta(\cdot)$ and $\phi(\cdot)$. For regulatory networks over time, the covariates of interest are motif scores corresponding to each gene. We assign each promoter sequence region a score covariate $X_{il}$ $(1 \le l \le D;\ 1 \le i \le N)$ with respect to the set of $D$ motif patterns, indicating its propensity to contain one or more binding sites for that motif (Section 2.2). Most previous work relating sequence effects to gene expression assumes a linear relationship (Bussemaker et al., 2001; Conlon et al., 2003), which may be a simplification of the real biological model. However, to balance the trade-off between parameter identifiability and model sophistication in the face of increased model complexity, we also currently consider only linear covariate effects of the motifs. For the time covariate, the effect is cyclical, depending on which functional cluster each gene lies within and the point of the cell cycle being observed. With a total set of $D$ motif covariates, we assume

$$\zeta_k(X_i, \phi(j)) = \alpha_{1k} \sin \phi(j) + \alpha_{2k} \cos \phi(j) + \sum_{l=1}^{D} \beta_{kl} X_{il}, \quad (2.4)$$

where $\phi(j)$ is a suitably chosen function that maps time point $j$ into a phase of the cell cycle. The state-dependent sinusoidal coefficients allow the flexibility of gene clusters varying in the amplitude of their effects, as well as in frequency. This may help, for example, in identifying clusters having opposing time-dependent effects over the cycle. Since the data set was partially synchronized before analysis, we limited the number of harmonic components to one to avoid over-parametrization. More stringent tests could be carried out to determine an adequate number of harmonic components (see, for example, Quinn (1989)).
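To make the generative model concrete, here is a minimal simulation sketch of the HMRM defined by (2.1)-(2.4). The phase map phi, the parameter values, and the stand-in motif scores are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(1)

N, T, K, D = 400, 8, 2, 5              # genes, time points, states, motifs
xi0 = 0.1                               # variance inflation factor

def phi(j):                             # assumed phase map: one cycle over T points
    return 2 * np.pi * j / T

X = rng.normal(size=(N, D))             # stand-in motif scores
pi0 = np.array([0.5, 0.5])              # initial state probabilities
tau = np.array([[0.9, 0.1],             # transition matrix; rows sum to 1
                [0.2, 0.8]])
alpha = rng.normal(size=(K, 2))         # sinusoidal coefficients (alpha_1k, alpha_2k)
beta = rng.normal(size=(K, D))          # motif effects beta_kl
mu = rng.normal(size=(K, T))            # state- and time-specific means mu_kj
sigma2 = np.array([0.2, 0.4])

Z = np.empty((N, T), dtype=int)
Y = np.empty((N, T))
for i in range(N):
    for j in range(T):
        # latent state: initial draw from pi, then Markov transition (eq 2.1)
        Z[i, j] = rng.choice(K, p=pi0 if j == 0 else tau[Z[i, j - 1]])
        k = Z[i, j]
        # state-dependent mean (eqs 2.3-2.4), random effect integrated out
        zeta = (alpha[k, 0] * np.sin(phi(j)) + alpha[k, 1] * np.cos(phi(j))
                + X[i] @ beta[k])
        Y[i, j] = rng.normal(mu[k, j] + zeta, np.sqrt((1 + xi0) * sigma2[k]))
```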

Next, we summarize the general data structure. To simplify the notation, unless indicated otherwise, the subscript ranges in the following are: $i = 1, \ldots, N$; $j = 1, \ldots, T$; $l = 1, \ldots, p$, with $p = 2 + D$. Let $Y_{(k)} = \{Y_{ij} - \mu_{kj} : Z_{ij} = k\}$ and $x = \{x_{ijl}\}$, where $x_{ij} = (x_{ij1}, \ldots, x_{ijp}) = (\sin \phi(j), \cos \phi(j), X_{i1}, \ldots, X_{iD})$. Further, let $\mathbf{X} = ((X_{il}))$, $(1 \le i \le NT)$, denote the stack of $\{x_{ij},\ 1 \le i \le N;\ 1 \le j \le T\}$. $\mathbf{X}$ is of dimension $NT \times p$, where each row corresponds to a realization of one gene ($i$) at one time point ($j$). We denote the subset of $\mathbf{X}$ corresponding to observations in state $k$ as $X_{(k)} = \{\mathbf{X} : Z_{ij} = k\}$. Let $u = (u_1, \ldots, u_p)$ denote a $p$-dimensional vector, where $u_l = 1\ (0)$ denotes that covariate $X_l$ is present (absent) in the model. The subset of $\mathbf{X}$ indexed by the selected variables $u$ is $X^{(u)} = \{X_{il} \in \mathbf{X} : u_l = 1;\ 1 \le i \le NT\}$. Then, the submatrix of $\mathbf{X}$ formed by the rows $(i, j)$ corresponding to $Z_{ij} = k$, and the $|u| = \sum_{l=1}^{p} u_l$ columns such that $u_l = 1$, is given by $X^{(u)}_{(k)} = \{X^{(u)} : Z_{ij} = k\}$. Also, for the set of coefficients $\beta_k = (\alpha_{1k}, \alpha_{2k}, \beta_{k1}, \ldots, \beta_{kD})^T$ corresponding to the subset of variables in state $k$ indexed by $u$, we use the notation $\beta_k^{(u)} = \{\beta_{kl} : u_l = 1\}$.

2.2 Motif scoring model and covariates

Each upstream sequence is next given a score with respect to each position-specific weight matrix (PSWM), $\Theta_j$ $(1 \le j \le D)$, of width $w_j$ columns, to get an $N \times D$ covariate matrix of gene-sequence scores $X$, where each row $X_i = (X_{i1}, \ldots, X_{iD})$ is the score vector for gene $i$ $(1 \le i \le N)$. As mentioned in Section 1.1, a PSWM is characterized as $\Theta = ((\theta_{jk}))$ $(1 \le j \le 4;\ 1 \le k \le w)$, denoting a motif of $w$ columns, where each column of the matrix, $\theta_k = (\theta_{1k}, \ldots, \theta_{4k})^T$, denotes the relative frequencies of each letter in that position. Let $\theta_0 = (\theta_{01}, \ldots, \theta_{04})$ denote the relative frequencies of the four nucleotides characterizing the background sequence (not containing motifs), assuming that the sequence is generated from a Markov process of order zero.
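Returning briefly to the data-structure notation above, the following sketch assembles the $NT \times p$ stacked design matrix $\mathbf{X}$, one row per gene-time pair; the helper name and the phi argument are ours, not the paper's.

```python
import numpy as np

def stack_design(X_motif, T, phi):
    """Build the NT x p design matrix of Section 2.1: each row (i, j) is
    (sin phi(j), cos phi(j), X_i1, ..., X_iD), stacked over genes and time
    points, so p = 2 + D."""
    N, D = X_motif.shape
    rows = []
    for i in range(N):
        for j in range(T):
            rows.append(np.concatenate(([np.sin(phi(j)), np.cos(phi(j))],
                                        X_motif[i])))
    return np.asarray(rows)            # shape (N*T, 2 + D)
```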

If the probability of motif occurrence is assumed uniform over the sequence, then the score $X_{ij}$ for sequence $i$, relative to motif $j$, is taken to be the logarithm of the likelihood ratio between the $j$-th motif model and the background model, assuming that any number of motif sites can occur in the sequence, with the constraint that there is no overlap between sites. Let $x_{[l:m]}$ denote the sequence $\{x_l, x_{l+1}, \ldots, x_m\}$. For the motif covariate $X_{ij}$ for gene $i$ and motif $j$, given the position-specific weight matrix $\Theta_j$, the likelihood ratio is $P(x_{[1:L_i]} \mid \Theta_j)\,/\,P(x_{[1:L_i]} \mid \theta_0)$, where $L_i$ is the length of the upstream sequence for gene $i$. Calculation of the denominator is straightforward, for both an independent and a Markovian model. However, since the actual partition of the sequence into individual sites and background is not known, evaluating the numerator (with $\Theta_j$ fixed) involves a sum of exponential order, including all possible segmentations into possible motif sites and background sequence. Fortunately, for more efficient computation we can use a recursive technique that has been formulated in detail in Gupta and Liu (2003) in the context of motif discovery. Let $\Phi_k(\Theta)$ be the sum of probabilities over all legitimate partitions of the partial sequence $x_{[1:k]}$. Then, we can recursively evaluate $\Phi_k$ as

$$\Phi_k(\Theta) = \sum_{l \in \{1, w_j\}} \rho\big(x_{[(k-l+1):k]}\big)\,\Phi_{k-l}(\Theta), \quad (2.5)$$

where

$$\rho(x_{[l:m]}) = P(x_{[l:m]} \mid \Theta_j)^{\mathbf{1}[m-l+1 = w_j]}\; P(x_{[l:m]} \mid \theta_0)^{\mathbf{1}[m-l+1 = 1]}$$

evaluates the probability that the previous segment is either a motif site or a letter generated from the background, and $\mathbf{1}[X = a]$ denotes an indicator variable that takes value 1 only if $X = a$. $P(x_{[l:m]} \mid \Theta_j)$ is calculated assuming the letter frequencies in each position are independent multinomials with parameters corresponding to the columns of $\Theta_j$. By recursively evaluating expression (2.5), the likelihood of the entire sequence is calculated as $\Phi_{L_i}(\Theta)$ for sequence $i$ $(1 \le i \le N)$. We note that Conlon et al. (2003) use a simpler version of this score under the assumption that a sequence can contain only a single motif site.

2.3 A hierarchical prior framework

In a linear regression framework, an attractive choice of prior for the regression coefficient $\beta$ is the conjugate g-prior (Zellner, 1986).
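A minimal log-space sketch of the recursion (2.5) is given below, with $\Phi_k$ taken over the first $k$ letters as above. It weights the two segment choices only through their model probabilities, which may differ in detail from the implementation in Gupta and Liu (2003); the function names are ours.

```python
import numpy as np

def log_prob_site(seq, pswm, base_index):
    """log P(segment | Theta) for a segment of exactly w letters, assuming
    independent multinomial columns."""
    idx = [base_index[b] for b in seq]
    return float(np.sum(np.log(pswm[idx, np.arange(len(seq))])))

def motif_score(seq, pswm, theta0):
    """Log-likelihood-ratio motif score X_ij: sum over all non-overlapping
    segmentations of `seq` into motif sites (width w) and single background
    letters, relative to an all-background model (eq 2.5), in log space."""
    base_index = {"A": 0, "C": 1, "G": 2, "T": 3}
    w = pswm.shape[1]
    L = len(seq)
    log_phi = np.full(L + 1, -np.inf)   # log Phi_k over the first k letters
    log_phi[0] = 0.0                    # empty prefix: probability 1
    for k in range(1, L + 1):
        # previous segment is a single background letter ...
        terms = [np.log(theta0[base_index[seq[k - 1]]]) + log_phi[k - 1]]
        # ... or a motif site of width w, if enough letters precede position k
        if k >= w:
            terms.append(log_prob_site(seq[k - w:k], pswm, base_index)
                         + log_phi[k - w])
        log_phi[k] = np.logaddexp.reduce(terms)
    # background-only log-likelihood of the whole sequence (the denominator)
    log_bg = sum(np.log(theta0[base_index[b]]) for b in seq)
    return log_phi[L] - log_bg
```

With the PSWM from the earlier sketch, `motif_score(promoter, pswm, np.full(4, 0.25))` would return the score covariate for one promoter sequence.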

With the error variance denoted by $\sigma^2$, the g-prior has covariance of the form $g\sigma^2 \mathcal{I}^{-1}$, where $\mathcal{I}$ denotes the Fisher information matrix $X^T X$, and $g$ is a scalar constant. However, when the observations belong to one of $K$ unknown classes, i.e., in a mixture or HMM framework, the density is no longer from a regular exponential family, and the information matrix cannot be derived in an analytical form. In our model, the entire set of regression coefficients for state $k$ $(1 \le k \le K)$ is denoted by $\beta_k = (\alpha_{1k}, \alpha_{2k}, \beta_{k1}, \ldots, \beta_{kD})^T$, and $\beta = (\beta_1, \ldots, \beta_K)$. Following the general form of the g-prior, we choose a conjugate prior for $\beta$, as $\beta_k \sim N(\beta_0,\ g\sigma_k^2 (\mathbf{X}^T \mathbf{X})^{-1})$. This approximates the complete-data g-prior with covariance $g\sigma_k^2 (\sum_{i: Z_i = k} X_i X_i^T)^{-1}$, which cannot be directly evaluated since $Z$ is a latent variable. Our modified g-prior preserves a weakened form of the correlation structure of the likelihood, in comparison to using a simpler (e.g., independent) form, and leads to several attractive properties of the posterior estimates (Section 3.1.1). Also, by not assuming an a priori independence structure on $\beta$, we avoid imposing posterior independence among the TF effects, which appears biologically more meaningful. The choice of the scalar $g$, which controls the penalty for choosing models, is discussed further in Section 3.2. For the other parameters, we use standard conjugate prior formulations. Let $\mu = ((\mu_{kj}))$ and $\sigma^2 = (\sigma_1^2, \ldots, \sigma_K^2)$, $(1 \le j \le T;\ 1 \le k \le K)$. We take $\mu_{kj} \sim N(m_{k0}, v_{k0}^2)$; $\sigma_k^{-2} \sim \text{Gamma}(w_0/2,\ S_0/2)$; $(\tau_{k1}, \tau_{k2}, \ldots, \tau_{kK}) \sim \text{Dirichlet}(\omega_k = (\omega_{k1}, \ldots, \omega_{kK}))$; and $(\pi_1, \ldots, \pi_K) \sim \text{Dirichlet}(\alpha_0 = (\alpha_{01}, \ldots, \alpha_{0K}))$. As a hierarchical prior for $u$, we set $u_l \mid \eta \sim \text{Bernoulli}(\eta)$ $(1 \le l \le p)$ and $\eta \sim \text{Beta}(\epsilon_1, \epsilon_2)$.

3 Parameter estimation in the HMRM

Having developed the model in Section 2, we now introduce a hybrid Monte Carlo procedure for efficient model fitting and parameter estimation. Let $\theta = (\mu, \beta, \sigma^2, \tau)$ denote the set of all parameters in the model. To start with, assume that the total number of states $K$ is fixed and known.

The complete-data posterior distribution, integrating out the random effects $\psi$, is given by

$$P(\theta \mid Y, X, Z, u) \ \propto\ P(Y \mid X, Z, u, \theta)\,P(Z \mid \theta)\,P(u \mid X, \theta)\,P(\theta)$$
$$\propto\ \left[\prod_{k=1}^{K} \prod_{j=1}^{T} \prod_{i: Z_{ij} = k} N\big(y_{ij};\ \mu_{kj} + \zeta_k(X_i, \phi(j)),\ (1 + \xi_0)\sigma_k^2\big)\right] \left[\prod_{i=1}^{N} \prod_{j=2}^{T} P(Z_{ij} \mid Z_{i,j-1}, \tau)\right] \text{IBet}_{\epsilon_1, \epsilon_2}(|u|, D - |u|)\ p(\mu, \beta, \sigma^2, \tau). \quad (3.1)$$

We update the model parameters through the following iterative procedure: (i) Covariate selection: update $u \mid Y, X, Z, \theta$; (ii) State updating: update $Z \mid Y, X, u, \theta$; (iii) Parameter updating: update $\theta \mid Y, X, u, Z$. Details of each step are provided in the next section.

3.1 Covariate selection

The initial number $p$ of covariates included in the model may potentially be very large. Covariates that are not significantly correlated with the response may introduce noise and result in inaccurate fitting, especially as the clusters are determined by the state-specific regression relationship between the response and covariates. We thus include a variable selection step using the evolutionary Monte Carlo (EMC) procedure, which has been shown to be highly efficient in high-dimensional problems (Liang and Wong, 2000).

3.1.1 Evolutionary Monte Carlo procedure

EMC is a population-based Monte Carlo procedure that involves (i) sampling simultaneously from parallel Monte Carlo chains using tempered versions of the target distribution, ranging from the lowest (the target distribution) to the highest (the "flattest" distribution), to maximize mixing, and (ii) using local Metropolis-Hastings moves of mutation and crossover; a minimal sketch of these moves is given below. In the variable selection step, we need to compare models of dimensions $|u|$ and $|v|$, where $|u| = \sum_{l=1}^{p} u_l$.
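The sketch below shows two of the EMC moves just described for the inclusion vector $u$: a mutation (single-bit flip) within each tempered chain, and an exchange between chains at adjacent temperatures. The crossover move is omitted for brevity, `log_post` is a stand-in for the log marginalized posterior $\log H(u \mid \cdot)$ derived next, and the temperature ladder is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)

def emc_select(log_post, p, n_iter=5000, temps=(1.0, 1.5, 2.25, 3.4)):
    """Minimal evolutionary Monte Carlo over inclusion vectors u in {0,1}^p;
    temps[0] = 1 is the target distribution, higher entries are flatter."""
    M = len(temps)
    U = rng.integers(0, 2, size=(M, p))            # one u per chain
    lp = np.array([log_post(u) for u in U])
    samples = []
    for _ in range(n_iter):
        # mutation: flip one random coordinate in each chain (MH step)
        for m in range(M):
            u_new = U[m].copy()
            u_new[rng.integers(p)] ^= 1
            lp_new = log_post(u_new)
            if np.log(rng.uniform()) < (lp_new - lp[m]) / temps[m]:
                U[m], lp[m] = u_new, lp_new
        # exchange: propose swapping states of two adjacent-temperature chains
        m = rng.integers(M - 1)
        log_r = (lp[m] - lp[m + 1]) * (1 / temps[m + 1] - 1 / temps[m])
        if np.log(rng.uniform()) < log_r:
            U[[m, m + 1]] = U[[m + 1, m]]
            lp[[m, m + 1]] = lp[[m + 1, m]]
        samples.append(U[0].copy())                # keep the cold chain
    return np.array(samples)
```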

We first calculate a marginalized likelihood, integrating out the parameters whose dimensions vary with $u$. With conjugate priors for $\beta_k$ and $\sigma_k^2$, we get

$$H(u \mid X, Z, \mu) = P(Y \mid X^{(u)}, Z, \mu, u) \quad (3.2)$$
$$= \int P(Y \mid X^{(u)}, Z, \mu, u, \beta^{(u)}, \sigma^2, \tau)\,P(u \mid \eta)\,p(\beta^{(u)}, \sigma^2, \eta, \tau)\ d\beta^{(u)}\,d\sigma^2\,d\eta\,d\tau$$
$$= \text{IBet}_{\epsilon_1, \epsilon_2}(|u|, D - |u|)\ \text{IDir}_{\alpha_0}(m)\ \prod_{k=1}^{K} \left[ \text{IDir}_{\omega_k}(t_k)\ \text{IIGamma}_{\frac{w_0}{2}, \frac{S_0}{2}}\!\left(\frac{n_k}{2}, \frac{R_k^{(u)}}{2}\right) \frac{1}{[2\pi(1 + \xi_0)]^{n_k/2}}\ \frac{\big|\mathbf{X}^{(u)T} \mathbf{X}^{(u)}/g\big|^{1/2}}{\Big|\frac{X_{(k)}^{(u)T} X_{(k)}^{(u)}}{1 + \xi_0} + \frac{\mathbf{X}^{(u)T} \mathbf{X}^{(u)}}{g}\Big|^{1/2}} \right], \quad (3.3)$$

where $\text{IBet}_{a,b}(c, d)$ denotes the inverse ratio of the normalizing constants for the Beta distributions $\text{Beta}(a + c, b + d)$ and $\text{Beta}(a, b)$, and

$$\text{IIGamma}_{a,b}(c, d) = \frac{(b + d)^{a+c}\,\Gamma(a)}{b^{a}\,\Gamma(a + c)}$$

denotes the ratio of normalizing constants for the Inverse-Gamma distributions $\text{IG}(a + c, b + d)$ and $\text{IG}(a, b)$. $\text{IDir}_{\omega_k}(t_k)$ denotes the ratio of normalizing constants for the Dirichlet distributions $\text{Dir}(t_k + \omega_k)$ and $\text{Dir}(\omega_k)$, where $t_k = (t_{k1}, \ldots, t_{kK})$, with $t_{kl} = \sum_{i=1}^{N} \sum_{j=2}^{T} \mathbf{1}[Z_{i,j-1} = k,\, Z_{ij} = l]$ the number of observed transitions between states $k$ and $l$. Also, $m = (m_1, \ldots, m_K)$, with $m_k = \sum_{i=1}^{N} \mathbf{1}[Z_{i1} = k]$, and $n_k = \sum_{i=1}^{N} \sum_{j=1}^{T} \mathbf{1}[Z_{ij} = k]$. Further,

$$R_k^{(u)} = \frac{Y_{(k)}^T Y_{(k)}}{1 + \xi_0} + \frac{1}{g}\,\beta_0^{(u)T} \mathbf{X}^{(u)T} \mathbf{X}^{(u)} \beta_0^{(u)} - \bar\beta_k^{(u)T}\,\Sigma_{\beta k(u)}^{-1}\,\bar\beta_k^{(u)}, \quad (3.4)$$

where the posterior estimates for the covariance and mean of $\beta_k$ are

$$\Sigma_{\beta k(u)} = \left[ \frac{X_{(k)}^{(u)T} X_{(k)}^{(u)}}{1 + \xi_0} + \frac{\mathbf{X}^{(u)T} \mathbf{X}^{(u)}}{g} \right]^{-1} \quad \text{and} \quad \bar\beta_k^{(u)} = \Sigma_{\beta k(u)} \left[ \frac{X_{(k)}^{(u)T} Y_{(k)}}{1 + \xi_0} + \frac{\mathbf{X}^{(u)T} \mathbf{X}^{(u)}}{g}\,\beta_0^{(u)} \right]. \quad (3.5)$$

$R_k$ can be interpreted as a cluster-specific residual (see Appendix A). Also, for a large value of $g$, the $R_k$ term has the attractive property of reducing to a scaled version of the frequentist residual sum of squares in the regression framework (Appendix B). Conversely, for a small value of $g$, implying a strongly informative prior for $\beta$, $R_k$ represents a sum of squares scaled by the involutory matrix $(I - 2H_k)$, where $(I - 2H_k)^T (I - 2H_k) = I$ (Harville, 1997). It is interesting to note that while $Y_{(k)}^T (I - H_k) Y_{(k)} = Y_{(k)}^T (I - H_k)(I - H_k) Y_{(k)}$ represents the residual sum of squares, $Y_{(k)}^T (I - 2H_k)(I - 2H_k) Y_{(k)} = Y_{(k)}^T Y_{(k)}$, the total sum of squares.
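A direct transcription of the posterior moments (3.5), up to the factor $\sigma_k^2$ in the covariance, might look as follows; the function and argument names are ours.

```python
import numpy as np

def beta_posterior(Xk, Yk, Xu, beta0, xi0, g):
    """Posterior covariance (up to sigma_k^2) and mean of beta_k^(u), eq (3.5).
    Xk, Yk: design rows and centred responses y_ij - mu_kj assigned to state k
    (selected columns only); Xu: the full NT x |u| design matrix X^(u)."""
    prior_prec = Xu.T @ Xu / g                     # modified g-prior precision
    Sigma = np.linalg.inv(Xk.T @ Xk / (1 + xi0) + prior_prec)
    mean = Sigma @ (Xk.T @ Yk / (1 + xi0) + prior_prec @ beta0)
    return Sigma, mean
```

For large $g$ the prior precision term vanishes and the posterior mean approaches the least-squares estimate based on state $k$'s observations alone, consistent with the large-$g$ behavior of $R_k$ noted above.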

Since $(I - 2H_k)$ is not positive definite (though it is nonsingular), taking a very small value of $g$ may lead to instability or inestimability of the parameter estimates in case $Y_{(k)}^T (I - 2H_k) Y_{(k)}$ is negative. Thus it is desirable to perform a few pilot runs before selecting an appropriate $g$ for the analysis. Further details of the EMC procedure are given in Appendix C. A remaining task is to sample the states and parameters. A Gibbs sampling procedure may be used to sample the states, one gene at a time, successively from the conditional distributions $P(Z_{i,t} \mid Z_{i,1}, \ldots, Z_{i,t-1}, Z_{i,t+1}, \ldots, Z_{i,T}, X, u, \theta)$. However, a more efficient method in this scenario is a recursive data augmentation (DA) step, based on the forward-sum, backward-sampling techniques used in computing the likelihood in hidden Markov-type models (Gupta and Liu, 2003). This represents a grouped sampling step rather than a conditional update, and has been shown to have better convergence properties than the Gibbs sampler (Liu et al., 1994). Since conjugate priors are used, each of the updating steps has a closed analytical form and is straightforward. The state and parameter updating steps are detailed in Appendices D and E.
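A generic sketch of the forward-sum, backward-sampling update for one gene's state path is shown below, assuming the emission log-likelihoods have been precomputed; the details specific to the paper's DA step are left to Appendix D.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_states(log_lik, pi0, tau):
    """Draw one gene's path Z_i by forward sums and backward sampling.
    log_lik[j, k] = log N(y_ij; mu_kj + zeta_k(x_i, phi(j)), (1 + xi0) sigma_k^2),
    computed beforehand; pi0 and tau are the initial and transition probabilities."""
    T, K = log_lik.shape
    log_f = np.empty((T, K))                       # forward sums
    log_f[0] = np.log(pi0) + log_lik[0]
    for j in range(1, T):
        # log-sum-exp over the previous state, for each current state
        prev = log_f[j - 1][:, None] + np.log(tau)
        log_f[j] = np.logaddexp.reduce(prev, axis=0) + log_lik[j]
    z = np.empty(T, dtype=int)
    # draw Z_T from the final forward distribution ...
    w = np.exp(log_f[-1] - log_f[-1].max())
    z[-1] = rng.choice(K, p=w / w.sum())
    # ... then each earlier state conditional on the state just drawn
    for j in range(T - 2, -1, -1):
        lw = log_f[j] + np.log(tau[:, z[j + 1]])
        w = np.exp(lw - lw.max())
        z[j] = rng.choice(K, p=w / w.sum())
    return z
```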

3.2 Sensitivity of covariate selection to the choice of g

A large $g$ in the prior for $\beta_k$ results in robust posterior inference for $\beta$, but over-penalizes larger models. As $g \to \infty$, the Bayes factor for comparing any choice of predictors to the null model tends to zero, making the model selection procedure inconsistent (see Appendix F and Bartlett (1957)). For regular families, the choice of $g = n$ represents a unit-information prior, leading to Bayes factors that behave like the BIC (Kass and Wasserman, 1995). In the case of HMMs this result does not hold, as the observations are correlated. Our empirical studies suggest that the BIC tends to overestimate the number of components. One alternative is to take a hyperprior on $g$: assuming $g \sim \text{Inv-}\chi^2(\nu)$ gives, marginally, $(\beta_k - \beta_0) \sim t_\nu(\,\cdot\,;\ \sigma_k^2 (\mathbf{X}^T \mathbf{X})^{-1})$, a scaled t-distribution with $\nu$ degrees of freedom. The robustness of the t-distribution makes it an attractive alternative to using a fixed-$g$ prior. In the case of the marginal distribution (3.2), integrating out $g$ is analytically intractable; however, one can sample $g$ from its posterior distribution during the MCMC procedure. This involves adding a sampling step for $g$, using:

$$\frac{1}{g} \,\Big|\, Y, X, u, Z, \beta \ \sim\ \text{Gamma}\!\left(\frac{\nu + K|u|}{2},\ \frac{1}{2}\left(1 + \sum_{k=1}^{K} \sigma_k^{-2}\,(\beta_k^{(u)} - \beta_0^{(u)})^T \mathbf{X}^{(u)T} \mathbf{X}^{(u)} (\beta_k^{(u)} - \beta_0^{(u)})\right)\right).$$

In applications, a numerically stable procedure appears to be initiating the algorithm with a large $g$ value and sampling $g$ with the other parameters once the algorithm stabilizes. In general, one must check that the diagonal elements of $g(\mathbf{X}^T \mathbf{X})^{-1}$ are not too small.

4 Bayesian criterion-based model selection

To estimate the number of states $K$ in the HMRM, a usual approach would be to calculate the Bayes factor for models with different $K$. The HMRM involves the latent variables $u$ and $Z$, relating to the included covariates and the hidden states. To evaluate the Bayes factor, we need to compute the marginal probabilities

$$P(Y \mid M, X) = \int_{\theta} \sum_{u} \sum_{Z} P(Y \mid M, X, \theta, Z, u)\,P(\theta \mid M)\,P(Z)\,P(u)\ d\theta$$

for each model $M$, integrating out the latent variables. These sums are analytically intractable, and computational methods for calculating the Bayes factor (Meng and Wong, 1996; Chen and Shao, 1997; Green, 1995) are typically either computationally expensive or unstable, due to the irregular nature of the likelihood function. Instead, we use an alternative approach based on a Bayesian goodness-of-fit statistic constructed from the posterior predictive distribution of the data. The L-measure (Ibrahim et al., 2001) and its calibration distribution allow the formal comparison of two competing models. Let $W_i$ $(i = 1, \ldots, N)$ denote the future values from an imagined replicate experiment, with the same sampling distribution as $Y_i$ $(i = 1, \ldots, N)$ in Section 2. The generalized L-measure (Ibrahim et al., 2001) is

$$L(y, \nu) = \mathrm{E}\big[(W - \mathrm{E}(W \mid y))(W - \mathrm{E}(W \mid y))^T\big] + \nu\,(\mathrm{E}(W \mid y) - y)(\mathrm{E}(W \mid y) - y)^T, \quad (4.1)$$

where the expectations are taken with respect to the posterior predictive distribution $P(W \mid y)$, and $0 \le \nu \le 1$ (see Appendix G for details).
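Given draws $W$ from the posterior predictive distribution, the trace of (4.1), which is what the estimated L-measures below report, can be estimated by simple Monte Carlo averages; a minimal sketch with our function names:

```python
import numpy as np

def l_measure_trace(y, W, nu=0.5):
    """Monte Carlo estimate of the trace of the L-measure in eq (4.1).
    y: observed vector of length n; W: posterior predictive draws, shape (S, n).
    Returns tr Var(W | y) + nu * ||E(W | y) - y||^2."""
    mean_w = W.mean(axis=0)
    var_term = W.var(axis=0).sum()     # trace of the predictive covariance
    bias_term = np.sum((mean_w - y) ** 2)
    return var_term + nu * bias_term
```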

The two terms in (4.1) can be interpreted as corresponding to a variance and a bias term; hence choosing $\nu = 1$ would reduce (4.1) to the matrix of Euclidean distances between $y$ and $W$. An attractive feature of the L-measure approach, compared to Bayes factors, is that this criterion gives an absolute judgment of whether the selected model provides an adequate fit to the data, rather than only a comparison between models. To calibrate the L-measure, we compute the empirical distribution of $D(y, \nu) = \hat{L}_c(y, \nu) - \hat{L}_t(y, \nu)$, where $\hat{L}_c$ and $\hat{L}_t$ denote the estimated trace of the L-measure for the candidate and true models. In practice, as the true model is unknown, we follow Ibrahim et al. (2001) in replacing the true model by the criterion-minimizing model, and generating samples $\tilde{y}$ from the prior predictive distribution of $y$.

5 Simulation studies

As a proof-of-principle test for the new approach, simulation studies were conducted to test (i) the consistency of the method in estimating parameters in the presence of noise and increasing dimensionality of the data, and (ii) the robustness of the method to misspecification of hyperparameters. Data were generated from a two-cluster and a five-cluster model with 400 response variables (genes) and 5 true covariates (motifs), with measurements over 8 time points. The motif scores were simulated from a set of real yeast PSWMs through the scoring function described in Section 2.2. We also generated a set of 35 dummy covariates corresponding to each gene, some of which resembled true motif scores not significantly correlated with the response, and the remainder as noise.

Given the motif scores and the cluster identity $Z_{ij}$ at time point $j$ for gene $i$, the distribution of the gene expression scores was given by

$$Y_{ij} \mid Z_{ij} = k \ \sim\ N\!\left(\mu_{kj} + \alpha_{1k} \sin \phi(j) + \alpha_{2k} \cos \phi(j) + \sum_{l=1}^{5} \beta_{kl} X_{il},\ (1 + \xi_0)\,\sigma_k^2\right).$$

We tested the algorithm both by fixing $g$ such that the diagonal elements of $g(\mathbf{X}^T \mathbf{X})^{-1}$ ranged between 1 and 100, and by allowing $g$ to vary once the algorithm had achieved some stability; either way, we observed no significant variability in the posterior estimates. The next tests for consistency and robustness were done assuming the total number of true states $K$ was known. ACFs for all parameters, for different specifications of $\xi_0$, were seen to have a maximum autocorrelation of 0.05 at lag 5 (Supplementary Material), so the algorithm was assumed to have converged to the stationary distribution over the range of iterations being considered (with a burn-in of about 5000). A typical run of 50,000 iterations on a data set of the size mentioned took a median CPU time on the order of hours on an IBM BladeCenter cluster with dual Intel Xeon 2.4GHz nodes running RedHat Linux 7.3.

Tests for consistency. We tested the performance of the algorithm under increasing levels of noise, by introducing sets of noise variables into the algorithm: 15, 25, and 35 variables in turn. Results are presented for the $K = 5$ data set; the $K = 2$ data set gives even more accurate results due to the decreased complexity of the model, and these are not shown here. Table 1 shows that increasing the number of covariates has almost no effect on the selection of the correct ones. This observation suggests that the algorithm is robust to the mis-inclusion of covariates which have no effect on the response, and thus should be able to efficiently select covariates even when the total dimensionality is high, as long as the true covariates have a significant effect on the response. To estimate the performance of the algorithm when unknown or unmodeled motifs interact with the pathway, we also ran a similar analysis excluding one and two true motifs from the model fitting step (Table 1). This leads to a very slight increase in the errors of variable selection. The parameter estimates are consistent, with the exception of $\sigma^2$, for which the informative prior appears to have a biasing effect for clusters of very small sizes.

[Table 1 about here]

Tests for robustness to hyperparameter specification. We tested the performance of the algorithm under varying amounts of noise from the random-effects component $\xi_0$, and under varying degrees of misspecification of $\xi_0$. Results (Figure 1 in the Appendix) show that the algorithm performs consistently well for a range of values of $\xi_0$ (given on the log scale), but, as expected, at extremely high values of $\xi_0$ the performance declines as more noise is introduced into the data. However, the performance is extremely robust to the parameter settings: for lower true values of $\xi_0$, the magnitude of the set values of $\xi_0$ appears to have no effect on the MSEs of parameter estimates, misclassification rates, or correct selection of covariates.

Model selection using the L-measure. For the simulated data set, we next applied the model selection method based on the L-measure to test whether the method could recover the true value of $K$. The results here are presented for the $K = 2$ data set. We used the posterior estimates to construct the L-measure for values of $\nu$ varying between 0 and 1. It can be seen that the true model with $K = 2$ outperforms all the other models (Table 2). To check whether the fitted model was not significantly different from the true one, we constructed the calibration distribution for the difference $D(y, \nu) = \hat{L}_{K=2}(y, \nu) - \hat{L}_t(y, \nu)$, where $\hat{L}_t(y, \nu)$ denotes the L-measure for the true model with the true parameter values. For almost all values of $\nu$ (except for very small ones), it appeared that the model fits quite well. As a comparison, we also computed the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) model selection criteria, based on the likelihood calculated through recursion (Appendix D) at the modal parameter estimates for $K = 2, 3, 4$. As can be seen in Table 2, while the AIC and BIC tend to slightly overfit the data, the L-measure clearly picks out the correct number of states.

Since the effective sample size (ESS) is an issue in correctly calculating the BIC for dependent data, we computed the BIC at both extremes, with sample size equal to $N$ (an under-estimate, assuming each gene contributes an ESS of 1) and $NT$ (an over-estimate, assuming measurements for a gene over all time points are independent); the true ESS would lie between these two extremes. However, since over-fitting occurs at both extremes, it appears that the L-measure is probably a more accurate model selection criterion in this case.

[Table 2 about here]

6 Case study: Analysis of yeast cell-cycle data

In this section we describe the analysis of the yeast cell-cycle data set (Spellman et al., 1998) using the new methodology. The HMRM was fitted to the yeast cell-cycle data over a range of values for the total number of states $K$, with a starting set of 24 motif covariates, and the L-measure was used to determine the optimal number of states. We chose the hyperparameter settings as suggested by our simulation studies, $\xi_0$ in the range 0.1 to 5 and $g$ in the range 1 to 100; the results were virtually identical within this range. We report here the results for the settings $\xi_0 = 0.1$, $g = 100$, $\epsilon_1 = \epsilon_2 = 2$, $\omega_{kk} = 11$, $\omega_{k,k'} = 9$ $(k' \neq k;\ k = 1, \ldots, K)$, and data-dependent priors $S_0 = \text{Var}(Y)$ and $w_0 = N$. Table 3 indicates that $K = 3$ or $4$ are almost identical to the model with the lowest value of the L-criterion ($K = 5$).

[Table 3 about here]

6.1 Data analysis

Posterior convergence diagnostics were computed using the R CODA package. Autocorrelation plots for all parameters and trace plots of the number of selected variables (Figure 3 in the Supplementary Material) showed adequate convergence on taking about 50,000 iterations of the sampler.

Posterior samples were used to make inference regarding the classification of genes and to judge whether there were significant effects of certain transcription factors on groups of genes. Figure 1 gives the motifs that showed significant effects (posterior probability of selection greater than 50%) and the 95% confidence (posterior) intervals for the regression coefficient in each state, for $K = 3$ and $K = 5$. For almost all selected motifs, the posterior intervals do not overlap between states, and in at least one state show a highly significant positive or negative effect on gene expression. For $K = 5$, significant positive effects (details in Supplementary Table 2) are shown by the TFs GAL4, GCR1, MATalpha2, MIG1, and XBP1 (state 1); CSRE, GCN4, MCM1, PHO2, and UASPHR (state 2); and MCB, PHO4, RAP1, STE12, and SWI5 (state 4); while significant negative effects are shown by the CAR1 repressor, CSRE, MCM1, PHO4, RLM1, ROX1, STE12, and UASPHR (state 1); GCR1, MCB, MIG1, RAP1, SWI5, and XBP1 (state 2); CSRE, GAL4, GCN4, GCR1, MATalpha2, and UASPHR (state 4); and SMP1 (state 5). State 3 overall seems to show weaker motif effects. The significant transcription factors picked out with $K = 3$ states include many of the same motifs, suggesting that the extra states are formed by subdividing some of the previous states (e.g., state 1 is similar for the two sets, while state 2 for the $K = 5$ model is similar to state 3 in the $K = 3$ model). By comparing to the motifs listed in Spellman et al. (1998), we see that a number of factors known to have strong effects on the cell cycle, such as MCB, MCM1, and SWI5, are detected by the method, as well as motifs that are active at specific time points (Table 4). By jointly modeling temporal expression with motif sequence scores, we thus get a simultaneous picture of the groups of genes regulated by certain transcription factors, which may have opposing effects in different groups. Since the groups are allowed to change over time, this implies that we can uncover the pattern of regulation by transcription factors over the cell cycle, without being limited by a fixed grouping of genes.

[Figure 1 about here]

6.2 Biological validation

We compared our results with previous inference on the same data set (Spellman et al., 1998) and also re-analyzed the data using a stepwise multiple regression approach along the lines of Conlon et al. (2003). The stepwise method uses each time point separately, and assumes all genes form a single group without clusters. It uses the AIC and a forward-backward procedure to select significant covariates. In comparison, the HMRM detects the overall influence of motifs over time, and allows genes to belong to different states at different times. Table 4 indicates that the HMRM succeeds in uncovering more motif effects, which are also observed to be cluster-specific. Motifs that are not found significant by the HMRM are not found by stepwise regression either, with the exception of TBP. The stepwise method misses the known cell-cycle regulators MCB and SWI5, while the MCM1 signals are weak (picked up at only one time point). One reason why they might not appear significant is that they have opposing effects in sub-groups of genes at different time points (e.g., SWI5 has a negative effect in group 1 and a positive effect in group 3 for the $K = 3$ model), the opposing effects nullifying the overall one. Also, although a few transcription factors show continual effects over a phase of the cell cycle (e.g., GCN4, MIG1), in many cases most motifs show significant effects only sparsely, as no information is borrowed over neighboring time points to judge whether the overall effect over a period of time is significant, which would be biologically more meaningful.

[Table 4 about here]

We compared how the clusters, based jointly on motif effects and gene expression, correlate with groups of genes categorized by their involvement in a functional pathway over certain points in the cell cycle, as shown in Figure 7 of Spellman et al. (1998).

Our results indicate that a number of functional groupings at particular phases of the cell cycle show high over-representation in certain clusters (Table 5): for example, genes involved in DNA repair (G1), DNA synthesis (G1), cell-cycle control (G1), budding (G1 and M/G1), mitosis (M), nutrition (M), and mating (M/G1). Even more promising is the fact that the analyses of gene clusters are remarkably consistent between the $K = 3$ and $K = 5$ models; the pathway-related genes are found to be grouped together at the relevant phase irrespective of the model chosen, which is important for robust inference. The only significant difference is for the budding:fatty acids pathway, for which the $K = 3$ and $K = 5$ models pick out two different groups, which are active at different points of the cell cycle, hence are still consistent. It can be noted here that a single-cluster-based analysis, such as the stepwise regression approach, does not provide a mechanism for attempting this kind of pathway inference, which can generate useful biological hypotheses for further testing.

[Table 5 about here]

7 Discussion

Treating gene expression clustering as a temporal variable may help in discovering relationships between functional sequence motifs and the groups of genes they regulate. Differing groups of genes may behave as a cluster at different time points, influenced by different groups of transcription factors. If two groups of genes are differentially regulated by a TF, e.g., in one group the TF acts by inducing the response, while in the other it has a repressive effect, grouping the genes together will lead to losing the effect of the TF for the entire set. In order to uncover the relationships between TFs and genes that are involved in biological pathways of interest, it is thus desirable to determine how these TF effects may vary between groups and over time. The hidden Markov regression model framework allows for (i) determining covariates (motifs and phase of the cell cycle) that have significant effects on the response (gene expression); (ii) covariate effects varying between states; and (iii) gene clusters varying over time.

The hierarchical model framework also leads to nicely interpretable properties of the posterior estimates and asymptotic comparability to the frequentist framework. In addition, although the model as discussed here does not directly find novel motifs, a de novo motif discovery method may be formulated that uses the model predictions to generate groups of genes for motif discovery. Our approach also seems to induce a tighter clustering of genes, by grouping noisy genes which are not significantly affected by the covariates into a separate cluster, as seen in the yeast cell-cycle application. However, a principled way to induce tight clustering under a model-based framework still needs to be studied in detail. Our approach provides a novel, and to our belief the first, model-based method for determining groups of genes influenced by separate sets of covariates over time. Application to the yeast cell-cycle data succeeds in detecting regulatory modules, i.e., groups of genes regulated by sets of TFs, that match previous biological knowledge. The joint modeling approach also succeeds in discovering more known functional TFs than the stepwise method in the yeast data (e.g., MCB, SWI5); one likely reason for this is the separation of the effects of groups of TFs that have differential effects on groups of genes. Our approach to determining the number of clusters $K$ is based on a Bayesian model choice criterion, the L-measure. The use of this approach is supported by the facts that (i) the number of biologically interesting clusters lies within an approximately known, small range of values, and (ii) inference for the significant covariates appears consistent over a small range of $K$ around the optimal value. For instance, for the yeast cell-cycle data, if we choose five states instead of three, the gene-cluster allocation and the cluster-specific covariates remain essentially the same, with a few new genes (having weaker covariate effects) being sub-stratified. Another promising direction for avoiding the model choice issue is to consider extensions to models such as the infinite mixture model based on the Dirichlet process, which we intend to explore further in future work.

Acknowledgments

This research is supported by funding from the National Institutes of Health and the Environmental Protection Agency. The authors are grateful to two anonymous referees and the editor, whose comments significantly improved the content and presentation of this article.

References

Bartlett, M. S. (1957). A comment on D. V. Lindley's statistical paradox. Biometrika, 44.

Bussemaker, H. J., Li, H., and Siggia, E. D. (2001). Regulatory element detection using correlation with expression. Nature Genetics, 27.

Chen, M.-H. and Shao, Q.-M. (1997). On Monte Carlo methods for estimating ratios of normalizing constants. The Annals of Statistics, 25.

Conlon, E. M., Liu, X. S., Lieb, J. D., and Liu, J. S. (2003). Integrating regulatory motif discovery and genome-wide expression analysis. Proc. Natl Acad. Sci. USA, 100(6).

Green, P. J. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82.

Gupta, M. and Liu, J. S. (2003). Discovery of conserved sequence patterns using a stochastic dictionary model. J. Am. Stat. Assoc., 98(461).

Harville, D. A. (1997). Matrix Algebra from a Statistician's Perspective. Springer-Verlag.

Holmes, I. and Bruno, W. (2000). Finding regulatory elements using joint likelihoods for sequence and expression profile data. Proc. Int. Conf. Intell. Syst. Mol. Biol., 8.

Ibrahim, J. G., Chen, M.-H., and Sinha, D. (2001). Criterion-based methods for Bayesian model assessment. Statistica Sinica, 11(2).

Kass, R. E. and Wasserman, L. (1995). A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion. J. Am. Stat. Assoc., 90.

Liang, F. and Wong, W. H. (2000). Evolutionary Monte Carlo: applications to C_p model sampling and change point problem. Statistica Sinica, 10.

Liu, J. S., Neuwald, A. F., and Lawrence, C. E. (1995). Bayesian models for multiple local sequence alignment and Gibbs sampling strategies. J. Am. Stat. Assoc., 90.

Liu, J. S., Wong, W. H., and Kong, A. (1994). Covariance structure of the Gibbs sampler with applications to the comparisons of estimators and augmentation schemes. Biometrika, 81.

Liu, X., Brutlag, D. L., and Liu, J. S. (2002). An algorithm for finding protein-DNA binding sites with applications to chromatin-immunoprecipitation microarray experiments. Nat. Biotech., 20(8).

Meng, X. L. and Wong, W. (1996). Simulating ratios of normalising constants via a simple identity: a theoretical exploration. Statistica Sinica, 6.

Quinn, B. G. (1989). Estimating the number of terms in a sinusoidal regression. J. Time Ser. Anal., 10(1):71-75.

Segal, E., Taskar, B., Gasch, A., Friedman, N., and Koller, D. (2001). Rich probabilistic models for gene expression. Bioinformatics, 17(Suppl. 1).

Spellman, P., Sherlock, G., Zhang, M., Iyer, V., Anders, K., Eisen, M., Brown, P., Botstein, D., and Futcher, B. (1998). Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell, 9(12).

Wang, W., Cherry, J. M., Nochomovitz, Y., Jolly, E., Botstein, D., and Li, H. (2005). Inference of combinatorial regulation in yeast transcriptional networks: a case study of sporulation. Proc. Natl Acad. Sci. USA, 102(6).

Zellner, A. (1986). On assessing prior distributions and Bayesian regression analysis with g-prior distributions. In Goel, P. K. and Zellner, A. (Eds.), Bayesian Inference and Decision Techniques: Essays in Honor of Bruno de Finetti, p. 233. North-Holland, Amsterdam.

Zhu, J. and Zhang, M. (1999). SCPD: a promoter database of the yeast Saccharomyces cerevisiae. Bioinformatics, 15(7).

Table 1: Comparison of misclassification and variable selection rates for the simulation studies, with the number of states in the HMRM equal to 5. Var sel denotes the results of variable selection: Sens denotes the sensitivity, or proportion of correct variables found; Spec denotes the specificity, or proportion of selected variables that are correctly selected. Misc(%) is 1 minus the misclassification rate, based on state allocation by the sampler. Columns correspond to fits with 15, 25, and 35 added noise variables and to fits with true motif 2, motif 5, or motifs 2 and 5 excluded; rows report Var sel Sens(%), Spec(%), Misc(%), and the bias (variance) of the estimates of $\beta$, $\sigma^2$, $\tau$, and $\mu$. [The numeric entries of the table body were lost in extraction.]

Table 2: Comparison of model selection criteria for the simulation study. The columns give the number of states $K$; the mode of the log-likelihood; the AIC; the BIC with sample size $N$ (and $NT$); and the L-measure evaluated at $\nu = 0.5$. The optimal choice of $K$ for each criterion is highlighted, showing that only the L-measure gives the correct selection of $K$, while the AIC and BIC overfit by choosing 3 states. [The numeric entries of the table body were lost in extraction.]

Table 3: Model selection based on the L-criterion for the yeast cell-cycle data. Based solely on the L-measure, the optimal model choice is $K = 5$. Using the calibration distribution for $D(y, \nu)$ shows that the models with $K = 3$, $4$, or $5$ are essentially indistinguishable (the 95% posterior intervals of the difference are almost symmetric about zero); hence we may choose either the most parsimonious model, with $K = 3$, or $K = 5$ for inference. [The numeric entries of the table body were lost in extraction.]

Table 4: Transcription factors selected by (i) stepwise regression separately over each time point and (ii) the HMRM. The + and - signs denote positive and negative motif effects, based on the 95% confidence (posterior) interval for the regression coefficient. The confidence (posterior) intervals are given in Tables 1-3 of the Supplementary Material. The rows cover the TFs ABF1, CSRE, GAL4, GCN4, GCR1, MATalpha2, MCB, MCM1, MIG1, PHO2, PDR1/PDR3, PHO4, REB1, ROX1, RAP1, RLM1, the CAR1 repressor, SMP1, SWI5, STE12, TBP, UASPHR, and XBP1; the columns give the stepwise effect and the time point(s) at which it was found, the HMRM effect with its cluster, and the posterior percentage of iterations in which the motif was selected. [The signs and numeric entries of the table body were garbled in extraction and are not reproduced.]


More information

Modeling Gene Expression from Microarray Expression Data with State-Space Equations. F.X. Wu, W.J. Zhang, and A.J. Kusalik

Modeling Gene Expression from Microarray Expression Data with State-Space Equations. F.X. Wu, W.J. Zhang, and A.J. Kusalik Modeling Gene Expression from Microarray Expression Data with State-Space Equations FX Wu, WJ Zhang, and AJ Kusalik Pacific Symposium on Biocomputing 9:581-592(2004) MODELING GENE EXPRESSION FROM MICROARRAY

More information

Recursive Deviance Information Criterion for the Hidden Markov Model

Recursive Deviance Information Criterion for the Hidden Markov Model International Journal of Statistics and Probability; Vol. 5, No. 1; 2016 ISSN 1927-7032 E-ISSN 1927-7040 Published by Canadian Center of Science and Education Recursive Deviance Information Criterion for

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is

More information

Fuzzy Clustering of Gene Expression Data

Fuzzy Clustering of Gene Expression Data Fuzzy Clustering of Gene Data Matthias E. Futschik and Nikola K. Kasabov Department of Information Science, University of Otago P.O. Box 56, Dunedin, New Zealand email: mfutschik@infoscience.otago.ac.nz,

More information

Penalized Loss functions for Bayesian Model Choice

Penalized Loss functions for Bayesian Model Choice Penalized Loss functions for Bayesian Model Choice Martyn International Agency for Research on Cancer Lyon, France 13 November 2009 The pure approach For a Bayesian purist, all uncertainty is represented

More information

Bayesian non-parametric model to longitudinally predict churn

Bayesian non-parametric model to longitudinally predict churn Bayesian non-parametric model to longitudinally predict churn Bruno Scarpa Università di Padova Conference of European Statistics Stakeholders Methodologists, Producers and Users of European Statistics

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Jae-Kwang Kim 1 Iowa State University June 26, 2013 1 Joint work with Shu Yang Introduction 1 Introduction

More information

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California Texts in Statistical Science Bayesian Ideas and Data Analysis An Introduction for Scientists and Statisticians Ronald Christensen University of New Mexico Albuquerque, New Mexico Wesley Johnson University

More information

O 3 O 4 O 5. q 3. q 4. Transition

O 3 O 4 O 5. q 3. q 4. Transition Hidden Markov Models Hidden Markov models (HMM) were developed in the early part of the 1970 s and at that time mostly applied in the area of computerized speech recognition. They are first described in

More information

Markov Chain Monte Carlo methods

Markov Chain Monte Carlo methods Markov Chain Monte Carlo methods By Oleg Makhnin 1 Introduction a b c M = d e f g h i 0 f(x)dx 1.1 Motivation 1.1.1 Just here Supresses numbering 1.1.2 After this 1.2 Literature 2 Method 2.1 New math As

More information

Multiple QTL mapping

Multiple QTL mapping Multiple QTL mapping Karl W Broman Department of Biostatistics Johns Hopkins University www.biostat.jhsph.edu/~kbroman [ Teaching Miscellaneous lectures] 1 Why? Reduce residual variation = increased power

More information

Matrix-based pattern discovery algorithms

Matrix-based pattern discovery algorithms Regulatory Sequence Analysis Matrix-based pattern discovery algorithms Jacques.van.Helden@ulb.ac.be Université Libre de Bruxelles, Belgique Laboratoire de Bioinformatique des Génomes et des Réseaux (BiGRe)

More information

Physical network models and multi-source data integration

Physical network models and multi-source data integration Physical network models and multi-source data integration Chen-Hsiang Yeang MIT AI Lab Cambridge, MA 02139 chyeang@ai.mit.edu Tommi Jaakkola MIT AI Lab Cambridge, MA 02139 tommi@ai.mit.edu September 30,

More information

STA 414/2104: Machine Learning

STA 414/2104: Machine Learning STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 9 Sequential Data So far

More information

7. Estimation and hypothesis testing. Objective. Recommended reading

7. Estimation and hypothesis testing. Objective. Recommended reading 7. Estimation and hypothesis testing Objective In this chapter, we show how the election of estimators can be represented as a decision problem. Secondly, we consider the problem of hypothesis testing

More information

Scaling up Bayesian Inference

Scaling up Bayesian Inference Scaling up Bayesian Inference David Dunson Departments of Statistical Science, Mathematics & ECE, Duke University May 1, 2017 Outline Motivation & background EP-MCMC amcmc Discussion Motivation & background

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project

More information

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008 MIT OpenCourseWare http://ocw.mit.edu 6.047 / 6.878 Computational Biology: Genomes, Networks, Evolution Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

More information

November 2002 STA Random Effects Selection in Linear Mixed Models

November 2002 STA Random Effects Selection in Linear Mixed Models November 2002 STA216 1 Random Effects Selection in Linear Mixed Models November 2002 STA216 2 Introduction It is common practice in many applications to collect multiple measurements on a subject. Linear

More information

Bayesian nonparametrics

Bayesian nonparametrics Bayesian nonparametrics 1 Some preliminaries 1.1 de Finetti s theorem We will start our discussion with this foundational theorem. We will assume throughout all variables are defined on the probability

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Lecture 3. Hypothesis testing. Goodness of Fit. Model diagnostics GLM (Spring, 2018) Lecture 3 1 / 34 Models Let M(X r ) be a model with design matrix X r (with r columns) r n

More information

Learning in Bayesian Networks

Learning in Bayesian Networks Learning in Bayesian Networks Florian Markowetz Max-Planck-Institute for Molecular Genetics Computational Molecular Biology Berlin Berlin: 20.06.2002 1 Overview 1. Bayesian Networks Stochastic Networks

More information

Mixtures and Hidden Markov Models for analyzing genomic data

Mixtures and Hidden Markov Models for analyzing genomic data Mixtures and Hidden Markov Models for analyzing genomic data Marie-Laure Martin-Magniette UMR AgroParisTech/INRA Mathématique et Informatique Appliquées, Paris UMR INRA/UEVE ERL CNRS Unité de Recherche

More information

Fractal functional regression for classification of gene expression data by wavelets

Fractal functional regression for classification of gene expression data by wavelets Fractal functional regression for classification of gene expression data by wavelets Margarita María Rincón 1 and María Dolores Ruiz-Medina 2 1 University of Granada Campus Fuente Nueva 18071 Granada,

More information

Bayesian model selection: methodology, computation and applications

Bayesian model selection: methodology, computation and applications Bayesian model selection: methodology, computation and applications David Nott Department of Statistics and Applied Probability National University of Singapore Statistical Genomics Summer School Program

More information

Related Concepts: Lecture 9 SEM, Statistical Modeling, AI, and Data Mining. I. Terminology of SEM

Related Concepts: Lecture 9 SEM, Statistical Modeling, AI, and Data Mining. I. Terminology of SEM Lecture 9 SEM, Statistical Modeling, AI, and Data Mining I. Terminology of SEM Related Concepts: Causal Modeling Path Analysis Structural Equation Modeling Latent variables (Factors measurable, but thru

More information

Introduction to Bioinformatics

Introduction to Bioinformatics CSCI8980: Applied Machine Learning in Computational Biology Introduction to Bioinformatics Rui Kuang Department of Computer Science and Engineering University of Minnesota kuang@cs.umn.edu History of Bioinformatics

More information

MCMC Sampling for Bayesian Inference using L1-type Priors

MCMC Sampling for Bayesian Inference using L1-type Priors MÜNSTER MCMC Sampling for Bayesian Inference using L1-type Priors (what I do whenever the ill-posedness of EEG/MEG is just not frustrating enough!) AG Imaging Seminar Felix Lucka 26.06.2012 , MÜNSTER Sampling

More information

Analysis Methods for Supersaturated Design: Some Comparisons

Analysis Methods for Supersaturated Design: Some Comparisons Journal of Data Science 1(2003), 249-260 Analysis Methods for Supersaturated Design: Some Comparisons Runze Li 1 and Dennis K. J. Lin 2 The Pennsylvania State University Abstract: Supersaturated designs

More information

Lecture 8 Learning Sequence Motif Models Using Expectation Maximization (EM) Colin Dewey February 14, 2008

Lecture 8 Learning Sequence Motif Models Using Expectation Maximization (EM) Colin Dewey February 14, 2008 Lecture 8 Learning Sequence Motif Models Using Expectation Maximization (EM) Colin Dewey February 14, 2008 1 Sequence Motifs what is a sequence motif? a sequence pattern of biological significance typically

More information

Bayesian linear regression

Bayesian linear regression Bayesian linear regression Linear regression is the basis of most statistical modeling. The model is Y i = X T i β + ε i, where Y i is the continuous response X i = (X i1,..., X ip ) T is the corresponding

More information

A note on Reversible Jump Markov Chain Monte Carlo

A note on Reversible Jump Markov Chain Monte Carlo A note on Reversible Jump Markov Chain Monte Carlo Hedibert Freitas Lopes Graduate School of Business The University of Chicago 5807 South Woodlawn Avenue Chicago, Illinois 60637 February, 1st 2006 1 Introduction

More information

Chapter 8. Regulatory Motif Discovery: from Decoding to Meta-Analysis. 1 Introduction. Qing Zhou Mayetri Gupta

Chapter 8. Regulatory Motif Discovery: from Decoding to Meta-Analysis. 1 Introduction. Qing Zhou Mayetri Gupta Chapter 8 Regulatory Motif Discovery: from Decoding to Meta-Analysis Qing Zhou Mayetri Gupta Abstract Gene transcription is regulated by interactions between transcription factors and their target binding

More information

Power-Expected-Posterior Priors for Variable Selection in Gaussian Linear Models

Power-Expected-Posterior Priors for Variable Selection in Gaussian Linear Models Power-Expected-Posterior Priors for Variable Selection in Gaussian Linear Models Ioannis Ntzoufras, Department of Statistics, Athens University of Economics and Business, Athens, Greece; e-mail: ntzoufras@aueb.gr.

More information

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for

More information

Predicting Protein Functions and Domain Interactions from Protein Interactions

Predicting Protein Functions and Domain Interactions from Protein Interactions Predicting Protein Functions and Domain Interactions from Protein Interactions Fengzhu Sun, PhD Center for Computational and Experimental Genomics University of Southern California Outline High-throughput

More information

Cluster Analysis of Gene Expression Microarray Data. BIOL 495S/ CS 490B/ MATH 490B/ STAT 490B Introduction to Bioinformatics April 8, 2002

Cluster Analysis of Gene Expression Microarray Data. BIOL 495S/ CS 490B/ MATH 490B/ STAT 490B Introduction to Bioinformatics April 8, 2002 Cluster Analysis of Gene Expression Microarray Data BIOL 495S/ CS 490B/ MATH 490B/ STAT 490B Introduction to Bioinformatics April 8, 2002 1 Data representations Data are relative measurements log 2 ( red

More information

Model comparison and selection

Model comparison and selection BS2 Statistical Inference, Lectures 9 and 10, Hilary Term 2008 March 2, 2008 Hypothesis testing Consider two alternative models M 1 = {f (x; θ), θ Θ 1 } and M 2 = {f (x; θ), θ Θ 2 } for a sample (X = x)

More information

Bayesian time series classification

Bayesian time series classification Bayesian time series classification Peter Sykacek Department of Engineering Science University of Oxford Oxford, OX 3PJ, UK psyk@robots.ox.ac.uk Stephen Roberts Department of Engineering Science University

More information

ST 740: Markov Chain Monte Carlo

ST 740: Markov Chain Monte Carlo ST 740: Markov Chain Monte Carlo Alyson Wilson Department of Statistics North Carolina State University October 14, 2012 A. Wilson (NCSU Stsatistics) MCMC October 14, 2012 1 / 20 Convergence Diagnostics:

More information

10-810: Advanced Algorithms and Models for Computational Biology. Optimal leaf ordering and classification

10-810: Advanced Algorithms and Models for Computational Biology. Optimal leaf ordering and classification 10-810: Advanced Algorithms and Models for Computational Biology Optimal leaf ordering and classification Hierarchical clustering As we mentioned, its one of the most popular methods for clustering gene

More information

13: Variational inference II

13: Variational inference II 10-708: Probabilistic Graphical Models, Spring 2015 13: Variational inference II Lecturer: Eric P. Xing Scribes: Ronghuo Zheng, Zhiting Hu, Yuntian Deng 1 Introduction We started to talk about variational

More information

Markov Chain Monte Carlo in Practice

Markov Chain Monte Carlo in Practice Markov Chain Monte Carlo in Practice Edited by W.R. Gilks Medical Research Council Biostatistics Unit Cambridge UK S. Richardson French National Institute for Health and Medical Research Vilejuif France

More information

Markov Chains and Hidden Markov Models

Markov Chains and Hidden Markov Models Chapter 1 Markov Chains and Hidden Markov Models In this chapter, we will introduce the concept of Markov chains, and show how Markov chains can be used to model signals using structures such as hidden

More information

The Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision

The Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision The Particle Filter Non-parametric implementation of Bayes filter Represents the belief (posterior) random state samples. by a set of This representation is approximate. Can represent distributions that

More information

Shu Yang and Jae Kwang Kim. Harvard University and Iowa State University

Shu Yang and Jae Kwang Kim. Harvard University and Iowa State University Statistica Sinica 27 (2017), 000-000 doi:https://doi.org/10.5705/ss.202016.0155 DISCUSSION: DISSECTING MULTIPLE IMPUTATION FROM A MULTI-PHASE INFERENCE PERSPECTIVE: WHAT HAPPENS WHEN GOD S, IMPUTER S AND

More information

Regularization Parameter Selection for a Bayesian Multi-Level Group Lasso Regression Model with Application to Imaging Genomics

Regularization Parameter Selection for a Bayesian Multi-Level Group Lasso Regression Model with Application to Imaging Genomics Regularization Parameter Selection for a Bayesian Multi-Level Group Lasso Regression Model with Application to Imaging Genomics arxiv:1603.08163v1 [stat.ml] 7 Mar 016 Farouk S. Nathoo, Keelin Greenlaw,

More information

Multilevel Statistical Models: 3 rd edition, 2003 Contents

Multilevel Statistical Models: 3 rd edition, 2003 Contents Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction

More information

Supplemental Information for Pramila et al. Periodic Normal Mixture Model (PNM)

Supplemental Information for Pramila et al. Periodic Normal Mixture Model (PNM) Supplemental Information for Pramila et al. Periodic Normal Mixture Model (PNM) The data sets alpha30 and alpha38 were analyzed with PNM (Lu et al. 2004). The first two time points were deleted to alleviate

More information

Introduction to Bayesian methods in inverse problems

Introduction to Bayesian methods in inverse problems Introduction to Bayesian methods in inverse problems Ville Kolehmainen 1 1 Department of Applied Physics, University of Eastern Finland, Kuopio, Finland March 4 2013 Manchester, UK. Contents Introduction

More information

Chapter 4 Dynamic Bayesian Networks Fall Jin Gu, Michael Zhang

Chapter 4 Dynamic Bayesian Networks Fall Jin Gu, Michael Zhang Chapter 4 Dynamic Bayesian Networks 2016 Fall Jin Gu, Michael Zhang Reviews: BN Representation Basic steps for BN representations Define variables Define the preliminary relations between variables Check

More information

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A. Linero and M. Daniels UF, UT-Austin SRC 2014, Galveston, TX 1 Background 2 Working model

More information

The Bayesian Choice. Christian P. Robert. From Decision-Theoretic Foundations to Computational Implementation. Second Edition.

The Bayesian Choice. Christian P. Robert. From Decision-Theoretic Foundations to Computational Implementation. Second Edition. Christian P. Robert The Bayesian Choice From Decision-Theoretic Foundations to Computational Implementation Second Edition With 23 Illustrations ^Springer" Contents Preface to the Second Edition Preface

More information

Nonparametric Bayes tensor factorizations for big data

Nonparametric Bayes tensor factorizations for big data Nonparametric Bayes tensor factorizations for big data David Dunson Department of Statistical Science, Duke University Funded from NIH R01-ES017240, R01-ES017436 & DARPA N66001-09-C-2082 Motivation Conditional

More information

Computational statistics

Computational statistics Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated

More information

ISyE 691 Data mining and analytics

ISyE 691 Data mining and analytics ISyE 691 Data mining and analytics Regression Instructor: Prof. Kaibo Liu Department of Industrial and Systems Engineering UW-Madison Email: kliu8@wisc.edu Office: Room 3017 (Mechanical Engineering Building)

More information

Using Bayesian Priors for More Flexible Latent Class Analysis

Using Bayesian Priors for More Flexible Latent Class Analysis Using Bayesian Priors for More Flexible Latent Class Analysis Tihomir Asparouhov Bengt Muthén Abstract Latent class analysis is based on the assumption that within each class the observed class indicator

More information

Bayesian Linear Regression

Bayesian Linear Regression Bayesian Linear Regression Sudipto Banerjee 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. September 15, 2010 1 Linear regression models: a Bayesian perspective

More information

Regression, Ridge Regression, Lasso

Regression, Ridge Regression, Lasso Regression, Ridge Regression, Lasso Fabio G. Cozman - fgcozman@usp.br October 2, 2018 A general definition Regression studies the relationship between a response variable Y and covariates X 1,..., X n.

More information

Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model

Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model UNIVERSITY OF TEXAS AT SAN ANTONIO Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model Liang Jing April 2010 1 1 ABSTRACT In this paper, common MCMC algorithms are introduced

More information

arxiv: v1 [stat.co] 18 Feb 2012

arxiv: v1 [stat.co] 18 Feb 2012 A LEVEL-SET HIT-AND-RUN SAMPLER FOR QUASI-CONCAVE DISTRIBUTIONS Dean Foster and Shane T. Jensen arxiv:1202.4094v1 [stat.co] 18 Feb 2012 Department of Statistics The Wharton School University of Pennsylvania

More information

Modelling geoadditive survival data

Modelling geoadditive survival data Modelling geoadditive survival data Thomas Kneib & Ludwig Fahrmeir Department of Statistics, Ludwig-Maximilians-University Munich 1. Leukemia survival data 2. Structured hazard regression 3. Mixed model

More information

Gene Regula*on, ChIP- X and DNA Mo*fs. Statistics in Genomics Hongkai Ji

Gene Regula*on, ChIP- X and DNA Mo*fs. Statistics in Genomics Hongkai Ji Gene Regula*on, ChIP- X and DNA Mo*fs Statistics in Genomics Hongkai Ji (hji@jhsph.edu) Genetic information is stored in DNA TCAGTTGGAGCTGCTCCCCCACGGCCTCTCCTCACATTCCACGTCCTGTAGCTCTATGACCTCCACCTTTGAGTCCCTCCTC

More information

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions Pattern Recognition and Machine Learning Chapter 2: Probability Distributions Cécile Amblard Alex Kläser Jakob Verbeek October 11, 27 Probability Distributions: General Density Estimation: given a finite

More information

Probabilistic Graphical Models

Probabilistic Graphical Models School of Computer Science Probabilistic Graphical Models Infinite Feature Models: The Indian Buffet Process Eric Xing Lecture 21, April 2, 214 Acknowledgement: slides first drafted by Sinead Williamson

More information

Bayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence

Bayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence Bayesian Inference in GLMs Frequentists typically base inferences on MLEs, asymptotic confidence limits, and log-likelihood ratio tests Bayesians base inferences on the posterior distribution of the unknowns

More information

9. Model Selection. statistical models. overview of model selection. information criteria. goodness-of-fit measures

9. Model Selection. statistical models. overview of model selection. information criteria. goodness-of-fit measures FE661 - Statistical Methods for Financial Engineering 9. Model Selection Jitkomut Songsiri statistical models overview of model selection information criteria goodness-of-fit measures 9-1 Statistical models

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate

More information

Learning Bayesian network : Given structure and completely observed data

Learning Bayesian network : Given structure and completely observed data Learning Bayesian network : Given structure and completely observed data Probabilistic Graphical Models Sharif University of Technology Spring 2017 Soleymani Learning problem Target: true distribution

More information