Correction for Hidden Confounders in the Genetic Analysis of Gene Expression


Supplemental Information for: Correction for Hidden Confounders in the Genetic Analysis of Gene Expression

Jennifer Listgarten, Carl Kadie, Eric E. Schadt, and David Heckerman

Proceedings of the National Academy of Sciences, 2010

Supplemental Methods

We review linear mixed-effects models, how they have been used for correction of population structure (PS) or expression heterogeneity (EH) in the genetic association literature, how a problem can arise in the application of EH models to data like our human and mouse data, and how to fix it. Then, we introduce our model for joint correction of EH and PS, called LMM-EH-PS.

Linear mixed-effects models can be understood at an intuitive level from a variety of viewpoints. We view the linear mixed-effects model as a probabilistic generative model [1-3] in which a target variable, y_g, in our case a vector of gene-probe expression levels across all individuals for one gene probe g (y_g is of dimension N × 1 for N individuals), is modeled as the linear combination of three quantities: (1) fixed effects X (dimension N × Q), which contain the effects of S SNPs and Q − S other fixed effects (e.g., a bias/offset term and covariates such as gender and age), (2) random effects, u_g (of dimension N × 1 for N individuals), which model hidden effects such as PS or EH (and will be integrated out given a suitable distribution), and (3) independent Gaussian noise, e_g (dimension N × 1):

    y_g = X β_g + τ_g u_g + σ_g e_g,

where τ_g and σ_g (scalars) are, respectively, the magnitudes of the confounding structure and of the residual noise, and β_g (dimension Q × 1) are the coefficients of the fixed effects. The conditional likelihood is also sometimes written in the following equivalent manner, more convenient for exposition of our method,

    p(y_g | u_g, X, β_g, τ_g, σ_g) = N(y_g | X β_g + u_g τ_g, I σ_g²),

where N(r | m, Σ) denotes a Gaussian distribution for r with mean m and covariance matrix Σ, and I denotes the N × N identity matrix.

Shortly we will describe how to jointly model all gene probes. Note that because in our work we test for only one SNP at a time, we do not incorporate all SNPs into the model jointly, instead restricting the fixed effects to include just one SNP at a time.

When it is assumed that some individuals share the same random effects, u_g, for example in the case where there are multiple individuals from the same strain in a genetic association study (who by definition have identical genetics), one can augment the model with an indicator matrix, Z, for which Z_nt = 1 if and only if individual n belongs to strain t, and is zero otherwise, so that

    p(y_g | u_g, X, β_g, Z, τ_g, σ_g) = N(y_g | X β_g + Z u_g τ_g, I σ_g²),

where now u_g is of dimension T × 1 for T strains. We use this more general model going forward, because it also encompasses the case where each individual comes from a unique strain (T = N). As an aside, one could instead add a level to the hierarchy so that the jth individual from strain t gets a random effect w_g^tj which is a noisy version of u_g^t, the hidden variable for strain t; that is, p(w_g^tj) = N(w_g^tj | u_g^t, σ_g^t²). In some circumstances this addition could improve the model, although we did not use it here.

The distribution of u_g is assumed to be a zero-mean Gaussian distribution with covariance matrix K of dimension T × T,

    p(u_g | K) = N(u_g | 0, K).    (1)

Intuitively, this distribution captures similarity between strains/individuals through the covariance matrix, K, which (to the extent that it has off-diagonal elements) couples the observations among individuals (Equation 5). In so doing, the model can correct for the fact that individuals in a study are not perfectly balanced for various factors such as race. Note that data for individuals are independent from one another given u_g; that is, knowing the value of the hidden variables decouples the observations in the model. Of course, in practice u_g is hidden and this decoupling does not occur.

When correcting for population structure, an Identity-By-Descent (IBD), Identity-By-State (IBS), or covariance matrix has been used for K (e.g., [4-7]). In separate experiments, we found them to yield comparable results. Thus we used only the covariance matrix in this study [4,7]. When modeling expression heterogeneity, Kang et al. use the covariance matrix of the gene expression data in their ICE model [8]. The K we use for modeling expression heterogeneity is described in the next section.

The likelihood, L, for the mixed-effects model can be found by integrating out the random effects, u_g, giving

    L_g ≡ p(y_g | X, β_g, τ_g, σ_g, Z, K)    (2)
        = ∫ p(y_g, u_g | X, β_g, τ_g, σ_g, Z, K) du_g    (3)
        = ∫ N(y_g | X β_g + Z u_g τ_g, I σ_g²) p(u_g | K) du_g    (4)
        = N(y_g | X β_g, τ_g² Z K Z' + I σ_g²),    (5)

where Z' denotes the transpose of Z.

Now let us consider the joint model for all gene probes, g = 1, ..., G. We assume that, conditioned on K, Z, X and β_g, τ_g, σ_g for all g, the expression data for each gene are mutually independent. That is, the likelihood of Y = [y_1, y_2, ..., y_G] (dimension N × G) is given by

    p(Y | X, {β_g, τ_g, σ_g}, Z, K) = ∏_g p(y_g | X, β_g, τ_g, σ_g, Z, K) = ∏_g N(y_g | X β_g, τ_g² Z K Z' + I σ_g²),    (6)

where {β_g, τ_g, σ_g} denotes the parameters for all gene probes. This generative model is shown as a graphical model in Figure 1. Note that when K is unknown, the gene probes are coupled together. Later we show that, in the case of expression heterogeneity correction, we treat K as an unknown parameter and learn it from all of the gene expression data jointly. In the meantime, we treat K as known and thus model gene probes independently.

The parameters {β_g, τ_g, σ_g} can be estimated using the principle of REML (Restricted Maximum Likelihood), which is a degrees-of-freedom correction to the principle of ML (Maximum Likelihood) parameter fitting and can be viewed as integrating out the fixed effects when estimating the variance parameters [9]. Alternatively, one can simply use the principle of Maximum Likelihood. We use ML parameter estimation, having found no discernible empirical difference between ML and REML in experiments when only a handful of fixed effects are used, as is the case in this paper. The extensions to linear mixed-effects models that we develop here can easily be adapted to the REML framework. It turns out that the ML parameter estimates for β_g, τ_g, σ_g for a given g are an analytic function of δ_g ≡ σ_g²/τ_g² (see, e.g., [4]). Thus, the likelihood L_g can be written in terms of δ_g only, and can be maximized with respect to δ_g alone using numerical optimization. Once this value for δ_g is identified, the values for the remaining parameters follow.
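To make this parameterization concrete, the following is a minimal numpy sketch (our illustration, not code from the paper) of the profiled negative log-likelihood of Equation 5 as a function of δ_g alone, with β_g and τ_g² eliminated analytically; the function name and interface are ours.

    import numpy as np

    def neg_log_marginal_likelihood(delta, y, X, ZKZt):
        """Profiled -log L_g(delta) for y ~ N(X beta, tau^2 (ZKZ' + delta I)).

        beta and tau^2 are profiled out in closed form (ML), so the objective
        depends on delta = sigma^2 / tau^2 only (cf. Equation 5 and [4])."""
        N = y.shape[0]
        H = ZKZt + delta * np.eye(N)            # covariance up to the overall scale tau^2
        Hinv = np.linalg.inv(H)                 # naive O(N^3); the spectral trick below avoids this
        beta = np.linalg.solve(X.T @ Hinv @ X, X.T @ Hinv @ y)  # generalized least squares
        r = y - X @ beta
        tau2 = (r @ Hinv @ r) / N               # ML estimate of tau^2 given delta
        sign, logdetH = np.linalg.slogdet(H)
        return 0.5 * (N * np.log(2 * np.pi * tau2) + logdetH + N)

Minimizing this function over δ_g with a one-dimensional optimizer, and then reading off β_g, τ_g², and σ_g² = δ_g τ_g² at the minimizer, reproduces the ML fit described above.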

Figure 1. Graphical model depiction of a linear mixed-effects model with one random effect, for all gene probes. Letters enclosed in a circle denote random variables. Observed variables are shaded. Indexes 1, ..., N denote N individuals. The square annotated with "gene probe" is a plate, denoting that the sub-model within the plate is repeated for each gene probe. Letters β, σ², τ² within the plate represent gene-probe-specific parameters (by convention we have dropped the g subscript used elsewhere for these parameters because the plate makes this dependence apparent). Parameter K, residing outside the plate, is a covariance matrix describing the similarity among the hidden variables ("confounder coefficients"), u = [u_1, ..., u_N], and is shared across all gene probes. The variable x_n denotes the slice of the design matrix, X = [x_1, ..., x_N], for individual n and includes SNP features, a bias/offset term, and other covariates. y = [y_1, ..., y_N] are the measurements for one gene probe and all individuals.

Additionally, it has been shown that this numerical optimization can be made orders of magnitude faster than the naive approach by way of an algebraic equivalence that changes the time complexity for W iterations from O(W N³) to O(W N + N³) [4,10]. By re-writing the likelihood as a function of the matrix H_g = H_g(δ_g) ≡ Z K Z' + I δ_g, and then using a singular value decomposition (SVD) of K, the computation of |H_g(δ_g)| (where |K| denotes the determinant of K) and of H_g(δ_g)^{-1} becomes only O(N) (conditioned on having computed the SVD of K), rather than the O(N³) required by the naive approach. This reformulation enables more starting points to be used for the optimization over δ_g, thereby lessening the problem of local maxima in the likelihood function during parameter estimation. This approach can be found in the EMMA algorithm [4], which forms a core component of our algorithms.

When using mixed-effects models to look for associations between SNPs and gene expression, one obtains a p-value for each hypothesis in the usual way, that is, by fitting the model parameters under a null model (without SNPs) and under an alternative model (with SNPs), and then using, for example, a likelihood ratio test (LRT) or F-test [4]. In our empirical studies, we have found little difference between the two tests and use the LRT in our results.
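As a concrete illustration of this testing step (a sketch under the usual asymptotic assumptions, not the authors' implementation), the LRT p-value for one SNP-gene-probe hypothesis can be computed from the two maximized log-likelihoods as follows; loglik_null and loglik_alt are assumed to come from fits of the model without and with the SNP in X.

    from scipy.stats import chi2

    def lrt_pvalue(loglik_null, loglik_alt, df=1):
        """Likelihood ratio test: 2*(LL_alt - LL_null) ~ chi^2(df) under the null.

        df=1 corresponds to adding a single SNP coefficient to the fixed effects."""
        stat = 2.0 * (loglik_alt - loglik_null)
        return chi2.sf(max(stat, 0.0), df)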

Pitfall of Pre-Computing K for EH Correction

As we have seen, the parameters {β_g, σ_g, τ_g} are estimated using, say, ML, whereas the matrix K has been computed independently and ahead of time in a pre-processing step, often from the data itself (i.e., SNP or expression data). However, in the case of an EH-correction model such as ICE [8], there is a theoretical and practical concern with such an approach. The concern arises because the mixed-effects model, in the case of EH correction, is a generative model for the expression data, and the parameter K is being estimated from the expression data, but in a way that is not, in general, consistent with the generative model. That is, the ICE estimate for K is not a consistent estimator for the model being used (e.g., an ML or REML estimate), as specified by the joint likelihood of the data given in Equation 6. Readers familiar with graphical models will see from Figure 1 that, roughly speaking, the parameter K should be estimated from the hidden variables u_g (Equations 12-17) rather than from the observed variables y_g, as is done in ICE.

One consistent estimate for K is the ML estimate for the model specified in Equation 6. This estimate is precisely what we use. Note, however, that one must have enough gene probes to support such an estimate, because the number of parameters in K scales quadratically with the number of individuals. In many applications of mixed-effects models, estimating K with no constraints as we do here would not be feasible. Because of the large number of simultaneous mixed-effects sub-models (one for each gene probe g), all sharing K in our application, however, we can take such an approach.

As an aside, it is interesting to note that given a null model for mean-centered data as in [8], where the parameters σ_g and τ_g are the same for all gene probes, the parameters in the covariance of the observed data, K τ_g² + I σ_g², are not identifiable. Consequently, we can set τ_g = 1 and σ_g = 0, or equivalently u_g = y_g, without loss of generality, and the ICE procedure in [8] is in that case a consistent estimator. Such a model, however, is neither flexible nor likely to be useful in practice.

The issue of using an estimator that is not consistent is not just a theoretical one, but can have a practical, negative effect on our ability to uncover eQTL associations in data. The issue can manifest itself by giving rise to seriously deflated p-value distributions, which in turn leads to overly conservative FDR (False Discovery Rate) estimates [11]. Additionally, by modeling K correctly in EH models, one can increase power. Both of these points are demonstrated in the main paper. An experiment we performed that highlights the issue of the inconsistent estimator was to (1) estimate the ICE model parameters on real data (with no SNP data), (2) generate synthetic data from these ICE models with no SNP fixed effects (i.e., strictly null eQTL data), and (3) test a set of randomly selected SNPs with ICE to obtain a p-value distribution that theoretically should be uniform on [0, 1], because only null hypotheses were generated. In our experiments of this nature, we repeatedly found that the resulting p-value distribution was deflated; for example, we typically observed clearly deflated values of λ. This phenomenon is of course data-dependent, but we have broadly observed this trend in experiments based on real data. Interestingly, the ICE algorithm applied to the HapMap data analyzed in the ICE paper [8] shows no deflation.
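The flavor of such a calibration check can be sketched as follows; this is a deliberately simplified, self-contained illustration (null traits drawn from an assumed covariance and tested with a plain OLS-based LRT, purely for brevity), not a reproduction of the ICE experiment, and all names are ours.

    import numpy as np
    from scipy.stats import chi2

    def simulate_null_and_lambda(X, Sigma, n_probes=500, seed=0):
        """Draw strictly null traits y ~ N(X beta, Sigma), test one random SNP per
        trait, and return lambda = median(chi^2 statistic) / median of chi^2_1."""
        rng = np.random.default_rng(seed)
        N = X.shape[0]
        L = np.linalg.cholesky(Sigma)
        stats = []
        for _ in range(n_probes):
            y = X @ rng.normal(size=X.shape[1]) + L @ rng.normal(size=N)  # no SNP effect
            snp = rng.integers(0, 3, size=N).astype(float)                # random test SNP
            rss0 = np.sum((y - X @ np.linalg.lstsq(X, y, rcond=None)[0]) ** 2)
            X1 = np.column_stack([X, snp])
            rss1 = np.sum((y - X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]) ** 2)
            stats.append(N * np.log(rss0 / rss1))                         # 1-dof LRT statistic
        return np.median(stats) / chi2.ppf(0.5, 1)                        # approx. 0.456

A well-calibrated procedure should give λ near 1 on such strictly null data; values well below 1 correspond to the deflation discussed above.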

In the next section, we introduce our LMM-EH-PS model and describe our method for estimating K for EH, including the special case where only correction for EH is required (no PS). Note that the issue we have highlighted is not restricted to mixed-effects models, but in general may apply to any method that tackles the problem of expression heterogeneity using a two-step statistical modeling approach, in which some parameters related to confounding EH structure are first estimated from expression data and then plugged into a generative model for that same expression data. If the first step does not use a consistent estimator for the parameters in the generative model used in the second step, then the estimation of K will be inconsistent. Surrogate Variable Analysis (SVA) suffers from this inconsistency. In particular, the prescribed way to use SVA for our eQTL study is to first compute the SVs from the expression data and then construct a linear regression model, independently for each gene probe, that uses the SVs as covariates and the gene expression data as targets. Because our own statistical approach in this paper is centered on a mixed-effects model, we did not investigate this issue in great detail with SVA. However, we did find that SVA used in this manner on our data performed poorly in terms of calibration of p-values, the ability to detect cis-eQTL more strongly than trans-eQTL on real data, and power to detect eQTL on synthetic data (see Results in the main paper and below).

LMM-EH-PS: Mixed-Effects Model for Joint Correction of EH and PS

In our development of the standard linear mixed-effects model, we assumed a single random effect, u_g, arising from a single distribution with covariance matrix K. Indeed, this is the model that has been used to date for PS correction, and separately for EH correction. However, to model both EH and PS, we must include multiple random effects (or, multiple variance components) in our model. Although we present such a model with two components (for EH and PS), the model and associated algorithms (i.e., parameter estimation) are easily extended to any number of components should the need arise. Our model, LMM-EH-PS, for a single vector of gene-probe expression measurements across all individuals, is fully specified by

    p(y_g | u_g, v_g, X, β_g, τ_PS,g, Z, τ_EH,g, σ_g) = N(y_g | X β_g + Z u_g τ_PS,g + v_g τ_EH,g, I σ_g²)
    p(u_g | K_PS) = N(u_g | 0, K_PS)
    p(v_g | K_EH) = N(v_g | 0, K_EH),

so that the likelihood is given by

    L_g ≡ p(y_g | X, β_g, τ_PS,g, Z, τ_EH,g, σ_g, K_EH, K_PS)    (7)
        = ∫∫ N(y_g | X β_g + Z u_g τ_PS,g + v_g τ_EH,g, I σ_g²) N(u_g | 0, K_PS) N(v_g | 0, K_EH) du_g dv_g    (8)
        = ∫ N(y_g | X β_g + Z u_g τ_PS,g, τ_EH,g² K_EH + I σ_g²) N(u_g | 0, K_PS) du_g    (9)
        = N(y_g | X β_g, τ_EH,g² K_EH + τ_PS,g² Z K_PS Z' + I σ_g²)    (10)
        = N(y_g | X β_g, τ_g² [w_g K_EH + (1 − w_g) Z K_PS Z'] + I σ_g²),    (11)

where K_EH and K_PS encode the EH and PS structure, respectively, and v_g and u_g are the corresponding random-effect vectors. In the last line we re-write the likelihood in a more convenient form, where w_g ≡ τ_EH,g² / (τ_EH,g² + τ_PS,g²) ∈ [0, 1] denotes the relative proportion of K_EH (versus K_PS) to be used in the model for gene probe g, and τ_g² ≡ τ_EH,g² + τ_PS,g² scales K_EH and K_PS appropriately. Equation 10 is obtained by sequential application of the transformation from Equation 4 to Equation 5 for a single component.

We use the form in Equation 11 rather than Equation 10 because it is convenient for parameter estimation. Note that K_PS is of dimension T × T as before, whereas K_EH is of dimension N × N, because each individual has different gene expression measurements even if they belong to the same strain. It is useful to note that Equation 11, which was derived from a mixed-effects model with two random effects, can also be interpreted as a mixed-effects model with a single random effect with covariance K ≡ w_g K_EH + (1 − w_g) Z K_PS Z'. This fact becomes useful later in deriving our algorithm for estimating K_EH. Note too that this equivalence generalizes to any number of random effects (variance components).

Parameter Estimation for Multiple Random Effects

As discussed, we estimate parameters using the ML criterion. First, let us consider the (inconsistent) case where we compute K_EH and K_PS ahead of time, using, for example, the ICE covariance matrix and the SNP covariance matrix, respectively. In the following section, we will then consider the (consistent) case where K_EH and the other parameters are estimated together. Given K_EH and K_PS, we can estimate the parameters β_g, τ_g, w_g and σ_g independently for each gene probe g. We do so by wrapping EMMA inside a numerical optimization algorithm that searches for the best linear combination of K_EH and K_PS. Specifically, we apply Brent's Method [12] to optimize the likelihood as a function of w_g, L_g(w_g). Each time Brent's Method is called, it evaluates L_g(w_g) for a new value of w_g using EMMA with fixed K = w_g K_EH + (1 − w_g) Z K_PS Z'. In our experiments, we use Brent's Method to search for the optimal w_g within six evenly-spaced intervals in [0, 1]. We also compute the values of L_g(0) and L_g(1) to ensure that a model with only a single one of our random effects cannot, by virtue of local maxima, do better than the mixture. Within each call to EMMA, we use 1 starting points for δ_g evenly spaced in log space between -1 and 1.

We note that when there are R > 2 random effects, one must optimize multiple mixing coefficients analogous to w_g, namely w_g ≡ (w_g1, w_g2, ..., w_gR). One option is to use a generalization of Brent's Method. Another is to abandon the use of EMMA as the inner loop of a Brent optimization and instead use a more general-purpose numerical optimizer, such as the L-BFGS quasi-Newton method [13], to simultaneously optimize β_g, w_g (for R random effects), and σ_g. Such an approach requires as input the partial derivatives of L_g(w_g) with respect to each of the parameters, along with L_g(w_g). This approach can also be used when there are only two random effects, but we found that the Brent's Method approach performed roughly as well or better in all cases. Both approaches for estimating parameters in our LMM-EH-PS model (with two random effects) have time complexity O(W N³) for W iterations and N individuals. The L-BFGS approach scales linearly in the number of components, with time complexity O(R W N³) for R variance components, whereas Brent-like alternatives increase more steeply.
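The outer search over w_g can be sketched with scipy's bounded (Brent-style) scalar minimizer as follows; this is our own illustrative reconstruction, not the authors' code, it reuses the neg_log_marginal_likelihood sketch given earlier in place of EMMA, and the grid of six sub-intervals and the log-δ search range are assumptions.

    import numpy as np
    from scipy.optimize import minimize_scalar

    def fit_weight(y, X, K_EH, ZK_PS_Zt, n_intervals=6):
        """Search for w in [0, 1] mixing K = w*K_EH + (1-w)*Z K_PS Z' (Equation 11)."""
        def neg_loglik_given_w(w):
            K = w * K_EH + (1.0 - w) * ZK_PS_Zt
            # Inner 1-D fit over delta; several restarts would be used in practice.
            inner = minimize_scalar(
                lambda log_delta: neg_log_marginal_likelihood(np.exp(log_delta), y, X, K),
                bounds=(-10.0, 10.0), method="bounded")
            return inner.fun

        candidates = []
        edges = np.linspace(0.0, 1.0, n_intervals + 1)
        for lo, hi in zip(edges[:-1], edges[1:]):         # Brent-style search per sub-interval
            res = minimize_scalar(neg_loglik_given_w, bounds=(lo, hi), method="bounded")
            candidates.append((res.fun, res.x))
        for w in (0.0, 1.0):                              # also evaluate the single-kernel endpoints
            candidates.append((neg_loglik_given_w(w), w))
        best_val, best_w = min(candidates)
        return best_w, best_val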

Estimation of Parameters Including K_EH

As discussed earlier, use of the covariance of the expression data for K_EH, as is done in ICE, is not consistent with the mixed-effects model and, in our experience, leads to deflation of p-values and loss of power (see main paper). To avoid these practical problems, we estimate K_EH along with the remaining parameters from our model and the data. Because the estimate of K_EH depends on all gene probes, we must perform this estimation across all gene probes simultaneously (although, conditioned on a fixed value of K_EH, the other parameters can still be learned independently for each gene probe). We do not include SNP fixed effects when learning K_EH, similar to the approaches taken in [8,14]. Thus, in practice, we first learn K_EH without SNPs, then assume that K_EH is fixed and apply our mixed-model approach as outlined earlier. We have found our method for learning K_EH to be robust to this approximation of ignoring the SNPs, as shown later in the section "Robustness of Estimating K_EH in the Presence of True Associations".

The joint parameter estimation can be accomplished using a variety of algorithms. The one we describe here builds on the parameter estimation for {β_g, τ_g, w_g, σ_g} developed in the previous section. In particular, our algorithm uses a coordinate ascent approach and alternates between

1. identifying the ML values of {β_g, τ_g, w_g, σ_g}, conditioned on a fixed value of K_EH, and
2. identifying the ML value of K_EH, conditioned on fixed values for {β_g, τ_g, w_g, σ_g}.

In each step the likelihood will either increase or remain the same. As we have already described step 1 earlier in the exposition, we next consider step 2. Note that step 1 can be done independently for each gene probe, as in the usual mixed-model framework, whereas step 2 requires use of all of the gene expression data simultaneously.

When we want to do joint modeling of EH and PS, as for our LMM-EH-PS model described in Equation 11, we must estimate K_EH in the context of a fixed K_PS; that is, we include K_PS in the model and find the K_EH that gives the best likelihood over all gene probes, when each gene probe is allowed to have its own value for w_g. If not for this latter flexibility, there would be no point in including K_PS during estimation, as the final estimate for K would be the same in either case (i.e., K = (1 − w_g) Z K_PS Z' + w_g K_EH = w K_EH, where w K_EH would be what was learned if K_PS were not included during parameter learning, and (1 − w_g) Z K_PS Z' + w_g K_EH would be what was learned if K_PS were included). When we model only EH, that is, when PS is not present, we say that we are estimating K_EH with no context. Although it would be possible to infer K_EH with no context and then add K_PS after the fact to model the data, we have found that such an approach is inadequate.

Let us now consider the algorithm where K_EH is estimated in the context of K_PS. The no-context algorithm is a special case. We use the Expectation-Maximization (EM) algorithm [15] to fit the parameter K_EH in step 2, which proceeds roughly as follows. One starts with an initialization of the parameter K_EH (we initialize with the ICE estimate in our experiments). Then one computes, for each gene probe g, the distribution of v_g given the data y_g and the parameters β_g, τ_g, σ_g, w_g, K_EH, K_PS (E-step). Then, given these distributions, one re-estimates the parameter of interest, K_EH (M-step). The algorithm iterates between these two steps, in our case stopping after a fixed number of iterations. The E-step is given by

    p(v_g | y_g) ∝ p(y_g | v_g) p(v_g) = N(v_g | c_g, C_g), where    (12)
    C_g = (K_EH^{-1} + A_g^{-1})^{-1}, and c_g = C_g A_g^{-1} a_g, and    (13)
    A_g = (I σ_g² + (1 − w_g) τ_g² Z K_PS Z') / (τ_g² w_g), and a_g = (y_g − X β_g) / (τ_g √w_g),    (14)

where we have used the Gaussian identities from, respectively, [16] and [17], which state

    p(x | a, A) p(x | b, B) = p(x | c, C) p(a | b, A + B), where C = (A^{-1} + B^{-1})^{-1} and c = C(A^{-1} a + B^{-1} b),    (15)
    N(A x | m, M) ∝ N(x | A^{-1} m, A^{-1} M A^{-T}).    (16)

The M-step is given by

    K̂_EH(i, j) = (1/G) Σ_g [C_g(i, j) + c_g(i) c_g(j)].    (17)

A single E- and M-step has time complexity O(G N³) for G gene probes and N individuals. In practice, we have found that the following schedule of alternating optimizations yields good estimates after 1-2 repetitions:

i. Determine the ML values of {β_g, τ_g, w_g, σ_g}, conditioned on a fixed value of K_EH.
ii. Use 3 iterations of EM to improve the value of the likelihood with respect to K_EH, conditioned on fixed values of {β_g, τ_g, w_g, σ_g}.
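A direct, naive-O(GN³) numpy transcription of one such EM pass for K_EH, following Equations 12-17, might look as follows; it assumes the per-probe parameters from step i have already been fit, and the variable names and interface are ours.

    import numpy as np

    def em_update_K_EH(Y, X, betas, tau2, w, sigma2, K_EH, ZK_PS_Zt):
        """One E+M pass for K_EH (Equations 12-17), in the context of a fixed K_PS.

        Y: N x G expression matrix; betas: Q x G; tau2, w, sigma2: length-G arrays."""
        N, G = Y.shape
        K_EH_inv = np.linalg.inv(K_EH)
        K_new = np.zeros((N, N))
        for g in range(G):
            # E-step: posterior N(v_g | c_g, C_g) of the EH random effect (Equations 12-14).
            A_g = (sigma2[g] * np.eye(N) + (1.0 - w[g]) * tau2[g] * ZK_PS_Zt) / (tau2[g] * w[g])
            a_g = (Y[:, g] - X @ betas[:, g]) / np.sqrt(tau2[g] * w[g])
            A_g_inv = np.linalg.inv(A_g)
            C_g = np.linalg.inv(K_EH_inv + A_g_inv)
            c_g = C_g @ (A_g_inv @ a_g)
            # M-step accumulation: E[v_g v_g'] = C_g + c_g c_g' (Equation 17).
            K_new += C_g + np.outer(c_g, c_g)
        return K_new / G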

If we want to estimate K_EH in a model that does not include K_PS, we simply remove the relevant terms from the equations above. In particular, we need only change Equation 14 to

    A_g = I σ_g² / τ_g², and a_g = (y_g − X β_g) / τ_g,

and in this case, because A_g is now diagonal, it is possible to use the same SVD trick as in [4,10] to factorize K_EH, so that the naive computation of C_g = (K_EH^{-1} + A_g^{-1})^{-1} across all gene probes can be changed from O(G N³) to O(N³). After K_EH is learned in this manner (without SNP data), we then treat K_EH as fixed and use the mixed-model parameter learning presented earlier, with LRT tests, to obtain a p-value for each single-SNP gene-probe hypothesis.

Time Complexity and Running Time

The total time complexity for the K_EH estimation procedure in a PS context is O(I W G N³), for W iterations to estimate {β_g, τ_g, w_g, σ_g}, G gene probes, N individuals, and I iterations of steps i and ii outlined above. With no PS context, the K_EH estimation procedure becomes O(I G W N + N³). For our mouse data set, where we estimated K_EH with a PS context, with N = 188, I = 2, and G gene probes, estimation of K_EH took approximately 1 hours on a 11 processor cluster. For our human data set, where we estimated K_EH with no context, with N = 378 and I = 2, estimation of K_EH took approximately 5 hours on a 11 processor cluster. Because all parts of parameter estimation are trivially parallelizable (step 1 above can be done independently for each gene probe, and the E-step in step 2 can be broken down into independent parts for each gene probe), running the algorithm in parallel is straightforward. Instead of specifying W directly, we used a tolerance criterion of 1e-5 on the log likelihood in the Brent searches, which in turn dictated W.

Note that there are approaches in related areas (e.g., Gaussian Process Regression [16]) to speed up the O(N³) computations. We have also considered the idea of using an approximation to the full covariance matrix K obtained by forcing zeros in the corresponding precision matrix, K^{-1} (which in turn corresponds to the use of a non-fully-connected graphical model), for which computations can be less than cubic in N. These approaches may be fruitful avenues for further investigation.

Supplemental Experiments

Power Experiments

Figure 2 shows the results of power experiments on the synthetic mouse data from the main paper, which contained 5% associations at strengths found in the real data as well as strengths multiplied by a factor of 3 and 5. Only factor 3 was shown in the main paper (Figure 4 in the main paper).

[Figure 2 comprises ROC panels (TPR versus FPR) at association strengths 1, 3, and 5 comparing LMM-EH-PS, ICE-PS, ICE, LMM-EH, LINREG, LMM-PS, SVA, and SVA-PS, plus a panel of number of hits versus estimated FDR at strength 3 for LMM-EH-PS and ICE-PS.]

Figure 2. Power curves for synthetic mouse data. Clockwise, the first three plots show ROC curves for different strengths of association. The red line denotes what random guessing would have achieved. For the lowest strength of associations (first plot), all models recovered only a few associations. For stronger strengths of association, our model, LMM-EH-PS, achieved maximum power. The final plot illustrates how the best ICE-based model (ICE-PS), which yielded deflated p-values, penalized itself because of its overly conservative estimated FDR.
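For reference, ROC curves like those in Figure 2 can be traced from per-hypothesis p-values and the known synthetic labels roughly as follows; this is our own sketch, and the names are illustrative.

    import numpy as np

    def roc_curve_from_pvalues(pvalues, is_true_association):
        """Sweep a p-value threshold and return (FPR, TPR) points for an ROC curve."""
        p = np.asarray(pvalues)
        truth = np.asarray(is_true_association, dtype=bool)
        order = np.argsort(p)                  # most significant hypotheses first
        hits = truth[order]
        tpr = np.cumsum(hits) / max(truth.sum(), 1)
        fpr = np.cumsum(~hits) / max((~truth).sum(), 1)
        return fpr, tpr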

Genomic Control

Given the substantial deflation observed with ICE-based models, one might consider the application of Genomic Control (GC) [18] to alleviate deflation in the ICE results. However, it is worth noting that Genomic Control was developed theoretically to mitigate inflation, not deflation, of p-values, and in fact prescribes that no adjustment be made when λ < 1. Nevertheless, we applied GC (making the correction even when λ < 1) to the p-values obtained by all models presented in the main paper, with results shown in Figures 3 and 4. One can see that, on the mouse data, where a large amount of deflation was observed for ICE-based models, a Genomic Control correction did not fix the problem. For minor deflation, such as in the human data set, GC appears to have made the distribution more reasonable. However, note that because GC only re-scales all test statistic values by a single factor, GC cannot change the rank order of hypotheses and thus has no effect on two of the three power experiments (i.e., the ROC curve and the non-parametric permutation test), nor on the cis-enrichment test.

[Figure 3 comprises p-value histograms, each with λ = 1.0 after correction, for LINREG, SVA, ICE, and LMM-EH.]

Figure 3. Application of Genomic Control to histograms of p-values from the real (left) and synthetic (right) human data. The corresponding histograms without GC are shown in Figure 1 of the main paper.

[Figure 4 comprises p-value histograms, each with λ = 1.0 after correction, for SVA-PS, SVA, LMM-PS, LMM-EH, LINREG, ICE, ICE-PS, and LMM-EH-PS.]

Figure 4. Application of Genomic Control to histograms of p-values from the real (left) and synthetic (right) mouse data. The corresponding histograms without GC are shown in Figure 3 of the main paper. Although λ = 1 now, distributions which were non-uniform before correction remain non-uniform.
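The GC rescaling applied here (including when λ < 1, as noted above) amounts to the following; a short sketch using 1-d.o.f. χ² statistics derived from the p-values, with illustrative names.

    import numpy as np
    from scipy.stats import chi2

    def genomic_control(pvalues):
        """Convert p-values to 1-dof chi^2 statistics, rescale by lambda, convert back.

        Unlike the usual prescription, the rescaling is applied even when lambda < 1,
        matching the experiment described in the text."""
        p = np.clip(np.asarray(pvalues), 1e-300, 1.0)
        stats = chi2.isf(p, 1)                     # chi^2 statistic per hypothesis
        lam = np.median(stats) / chi2.ppf(0.5, 1)  # inflation/deflation factor
        return chi2.sf(stats / lam, 1), lam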

Robustness of Estimating K_EH in the Presence of True Associations

A concern one might have in estimating K_EH as we have done is whether the approximation of ignoring SNP fixed effects during estimation is a good one in practice. To address this concern, we plotted, as shown in Figure 5, only the true nulls from the synthetic mouse power experiments. In these experiments, K_EH was in each case estimated from all data, including the 5% of gene-probe-SNP pairs that were not null (a reasonable upper bound for a real data set). Even as we increased the strength multiplier of the SNP fixed effects up to five (producing extremely strong associations, as can be seen in the ROC curves in Figure 2), the estimated K_EH continued to yield calibrated p-values.

Experiment with Data Generated from SVA-PS

To further assess the robustness of our method, we also generated null data from a linear mixed model with a PS correction and SVA covariates (SVA-PS) fitted to the real data. The SVA algorithm found 29 SVs on the real data (so we generated using 29), and then found 24 on the synthetically generated data. (In both cases we set SVA to use 1 permutations over a range of eigengene significance from .1 to .5; it always achieved the same number of SVs across this range.) Figure 6 shows the resulting p-value histograms. As expected, linear regression (LINREG) and a linear mixed model with just a PS correction (LMM-PS) showed inflation. Interestingly, SVA-PS, the model from which we generated the data, also showed inflation (λ = 2.5), which may result from overfitting. In particular, although SVA found 24 statistically significant SVs when looking at all of the expression data jointly, when we then used these 24 covariates in each gene probe's model, there were only 188 data points per gene probe with which to fit these 24 parameters. This hypothesis of overfitting is reinforced by the fact that, when we performed a similar experiment using linear regression with SV covariates, we observed similar inflation, but when we then tripled the sample size, the inflation disappeared. Lastly, we note that our model was able to successfully capture the confounding structure generated by SVA-PS.

[Figure 5 comprises four QQ plots, one per strength multiplier (0, 1, 3, 5), each labeled with its λ (all close to 1).]

Figure 5. QQ plots of (natural) log p-values for synthetic mouse null data where K_EH was estimated in the presence of increasingly strong associations. The vertical axis displays the observed quantiles, while the horizontal axis displays the expected quantiles. Each figure is based on synthetic data of which 3% are true associations, increasing in strength multiplier with each figure. The plots show the p-values only for the null tests, to demonstrate that even in the presence of strong associations, our model was able to infer a good value for K_EH such that the p-values remained calibrated. Error bars on QQ plots of log p-values, shown in dashed red, were computed as follows. We assumed that there are M independent p-values, drawn from a uniform distribution on [0, 1]. The kth smallest p-value from this draw is the kth order statistic, which is known to have a Beta(k, M − k + 1) distribution. The mean of this distribution is k/(M + 1). Thus, at (log) k/(M + 1) on the x-axis, we plot as error bars the (log) 90% confidence error envelope of this Beta distribution.

A two-sided, one-sample KS test against uniformity indicated that the p-value distribution of our model was not significantly different from the theoretical null distribution (p = .24), but that the other three models' distributions were significantly different at a significance level of α = 1e-3. This result reinforces the view that our model is robust to various types of confounding structure, and also that SVA easily induces spurious associations, even on data conforming to its own assumptions.
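The QQ-plot error envelope described in the caption of Figure 5 can be computed as in the following sketch of the stated Beta order-statistic construction; the 90% band and M independent p-values are as in the caption, and the function name is ours.

    import numpy as np
    from scipy.stats import beta

    def qq_envelope(M, level=0.90):
        """Expected quantiles and a confidence envelope for M uniform p-values.

        The kth smallest of M uniforms is Beta(k, M - k + 1); we return its mean
        k/(M+1) together with the central `level` interval, on the natural log scale."""
        k = np.arange(1, M + 1)
        expected = k / (M + 1.0)
        lower = beta.ppf((1.0 - level) / 2.0, k, M - k + 1)
        upper = beta.ppf(1.0 - (1.0 - level) / 2.0, k, M - k + 1)
        return np.log(expected), np.log(lower), np.log(upper)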

[Figure 6 comprises p-value histograms for SVA-PS (λ = 2.53), LMM-PS (λ = 1.4), LINREG (λ = 5.4), and LMM-EH-PS (λ = .94).]

Figure 6. P-value histograms on null synthetic mouse data generated from SVA-PS. All models, except for ours (LMM-EH-PS), show inflation. Even the model from which the data were generated, SVA-PS, shows inflation.

Modification of SVA

Because SVA contains a number of steps beyond using just projections into eigengene space, we also extended our evaluation from the main paper to include a model that omits these steps and instead directly projects the expression data of each individual into eigengene space and then uses these projections as covariates, which we refer to as Eigenstrat Extension covariates, in different models (using the same number of eigengenes as deemed significant by SVA). We use EE to denote a linear regression model with Eigenstrat Extension covariates. We use EE-PS to denote a mixed model with PS correction and these covariates. As in the main paper, we continue to use SVA and SVA-PS to denote, respectively, linear regression with SVA covariates, and a mixed model with PS correction and SVA covariates.

Figure 7 shows p-value histograms on synthetic mouse data containing no associations (the same data as used in the main paper for Figures 3 and 4). We see that EE has less inflation than SVA, and that EE-PS has less inflation than SVA-PS, suggesting that the EE-based models perform better than the SVA-based models. Furthermore, Figure 8 shows that EE-based models have better power than their corresponding SVA-based models. Also, note that all four models perform worse than linear regression. In contrast, we see that our model and others outperform linear regression (see main paper).
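One natural reading of the Eigenstrat Extension construction described above is sketched below: project each individual's (centered) expression profile onto the top eigengenes via an SVD and use the resulting scores as fixed-effect covariates. This is our illustration, not the authors' code, and the number of eigengenes is assumed to come from SVA, as in the text.

    import numpy as np

    def eigengene_covariates(Y, n_eigengenes):
        """Per-individual scores on the top eigengenes of an N x G expression matrix Y.

        The returned N x n_eigengenes matrix would be appended to the fixed effects."""
        Yc = Y - Y.mean(axis=0, keepdims=True)           # center each gene probe
        U, S, Vt = np.linalg.svd(Yc, full_matrices=False)
        return U[:, :n_eigengenes] * S[:n_eigengenes]    # eigengene-space projections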

[Figure 7 comprises p-value histograms for SVA (λ = 3.8), EE (λ = 3.3), SVA (λ = 3.58), EE (λ = 3.52), SVA-PS (λ = 2.77), EE-PS (λ = 2.38), SVA-PS (λ = 3.13), and EE-PS (λ = 2.25).]

Figure 7. P-value histograms on mouse data (as described in the main paper). Comparison of SVA-based algorithms and EE-based algorithms for the real data (left) and the synthetic null data (right). EE and EE-PS show less inflation respectively than SVA and SVA-PS.

[Figure 8 shows ROC curves (TPR versus FPR) at strength 3 for LINREG, EE, SVA, EE-PS, and SVA-PS.]

Figure 8. Power curves for synthetic mouse data. Comparison of EE-based algorithms to SVA-based algorithms and linear regression (all but EE and EE-PS are shown in Figure 4 of the main paper, which also shows our model outperforming linear regression). Linear regression performed better than EE and SVA based models. EE and SVA performed better when not combined with a PS correction. EE-based models performed better than SVA-based models.

[Panels: LMM-EH, ICE, LINREG, and SVA.]

Figure 9. Human 2D gene-probe-SNP map. Data shown are from chromosome 1, as described in the main paper (real, not synthetic). The horizontal axis denotes enumerated SNPs and the vertical axis denotes enumerated gene probes (both displayed according to relative ordering along the chromosome, not absolute position). The darker a pixel, the more significant the hypotheses in that region. A cis diagonal band is visible in all but linear regression. We know from the cis-enrichment experiments in the main paper that the models are detecting an appreciable number of cis associations. Trans-bands, presumably mostly spurious, are visually apparent in linear regression, and more subtly in SVA, but not in the other models. Because the density of hypotheses is much higher than screen or paper resolution, naive plotting of these data is sub-optimal. Thus, to improve visual clarity, we used an averaging filter of window size 20 × 20 (i.e., each p-value becomes the average of the p-values for all hypotheses within a 20 × 20 gene-probe-SNP window in the original matrix of p-values). Then these averaged values are sub-sampled, taking every twentieth averaged value in either direction (SNP and gene probe). Finally, we took the log of this matrix and displayed it using Matlab's image mapping functionality (imagesc with default settings).
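The image construction described in the caption of Figure 9 (block averaging of the p-value matrix, sub-sampling, log transform, and image display) can be sketched as follows; this is our own approximation of the procedure, with the window size as a parameter.

    import numpy as np
    import matplotlib.pyplot as plt

    def plot_eqtl_map(pvalue_matrix, window=20):
        """Block-average a (gene probe x SNP) matrix of p-values and display its log.

        Averaging within window x window blocks and keeping one value per block
        roughly combines the smoothing and sub-sampling steps in the caption."""
        P = np.asarray(pvalue_matrix, dtype=float)
        n_rows = (P.shape[0] // window) * window
        n_cols = (P.shape[1] // window) * window
        blocks = P[:n_rows, :n_cols].reshape(n_rows // window, window, n_cols // window, window)
        averaged = blocks.mean(axis=(1, 3))
        plt.imshow(np.log(averaged), aspect="auto", cmap="gray")  # analogous to Matlab's imagesc
        plt.xlabel("SNPs (ordered along chromosome)")
        plt.ylabel("gene probes (ordered along chromosome)")
        plt.colorbar()
        plt.show()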

[Panels: LMM-EH-PS, ICE-PS, LMM-EH, ICE, LMM-PS, LINREG, SVA-PS, and SVA.]

Figure 10. Mouse 2D gene-probe-SNP map. Data shown are from chromosome 1, as described in the main paper (real, not synthetic). The horizontal axis denotes SNPs and the vertical axis denotes gene probes, as in Figure 9. A cis diagonal band is subtly visible for LMM-EH-PS, ICE-PS, LMM-EH, and ICE, but less so or not at all for the others. We know from the cis-enrichment experiments in the main paper that the models are detecting an appreciable number of cis associations. Trans-bands, presumably mostly spurious, are visually apparent in linear regression and LMM-PS, and more subtly in SVA and SVA-PS, but not in the other models. See the caption of Figure 9 for a note about the construction of these images, although here we used an averaging window size of 15 and a subsample rate of 1.

References

1. Bishop, C. M. Pattern Recognition and Machine Learning (Information Science and Statistics) (Springer, 2006).
2. Heckerman, D. A tutorial on learning with Bayesian networks. In Learning in Graphical Models (Kluwer, 1998).
3. Pearl, J. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference (Morgan Kaufmann, San Mateo, CA, 1988).
4. Kang, H. M. et al. Efficient control of population structure in model organism association mapping. Genetics (2008).
5. Yu, J. et al. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat. Genet. 38 (2006).
6. Zhao, K. et al. An Arabidopsis example of association mapping in structured samples. PLoS Genet. 3, e4 (2007).
7. Price, A. L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38 (2006).
8. Kang, H. M., Ye, C. & Eskin, E. Accurate discovery of expression quantitative trait loci under confounding from spurious and genuine regulatory hotspots. Genetics 180 (2008).
9. Harville, D. Bayesian inference for variance components using only error contrasts. Biometrika (1974).
10. Patterson, H. & Thompson, R. Recovery of inter-block information when block sizes are unequal. Biometrika 58 (1971).
11. Storey, J. D. & Tibshirani, R. Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. 100 (2003).
12. Brent, R. P. Algorithms for Minimization without Derivatives (Prentice-Hall, Englewood Cliffs, New Jersey, 1973).

13. Nocedal, J. & Wright, S. J. Numerical Optimization (Springer, 2006).
14. Leek, J. T. & Storey, J. D. Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet. 3, e161 (2007).
15. Dempster, A. P., Laird, N. M. & Rubin, D. B. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39, 1-38 (1977).
16. Rasmussen, C. E. & Williams, C. K. I. Gaussian Processes for Machine Learning (The MIT Press, 2006).
17. Petersen, K. B. & Pedersen, M. S. The Matrix Cookbook. Tech. Rep. (2006).
18. Devlin, B. & Roeder, K. Genomic control for association studies. Biometrics 55 (1999).


Mixtures of Gaussians with Sparse Regression Matrices. Constantinos Boulis, Jeffrey Bilmes Mixtures of Gaussians with Sparse Regression Matrices Constantinos Boulis, Jeffrey Bilmes {boulis,bilmes}@ee.washington.edu Dept of EE, University of Washington Seattle WA, 98195-2500 UW Electrical Engineering

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 25: Markov Chain Monte Carlo (MCMC) Course Review and Advanced Topics Many figures courtesy Kevin

More information

Machine Learning Lecture 7

Machine Learning Lecture 7 Course Outline Machine Learning Lecture 7 Fundamentals (2 weeks) Bayes Decision Theory Probability Density Estimation Statistical Learning Theory 23.05.2016 Discriminative Approaches (5 weeks) Linear Discriminant

More information

L11: Pattern recognition principles

L11: Pattern recognition principles L11: Pattern recognition principles Bayesian decision theory Statistical classifiers Dimensionality reduction Clustering This lecture is partly based on [Huang, Acero and Hon, 2001, ch. 4] Introduction

More information

Using Expectation-Maximization for Reinforcement Learning

Using Expectation-Maximization for Reinforcement Learning NOTE Communicated by Andrew Barto and Michael Jordan Using Expectation-Maximization for Reinforcement Learning Peter Dayan Department of Brain and Cognitive Sciences, Center for Biological and Computational

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project

More information

Self Adaptive Particle Filter

Self Adaptive Particle Filter Self Adaptive Particle Filter Alvaro Soto Pontificia Universidad Catolica de Chile Department of Computer Science Vicuna Mackenna 4860 (143), Santiago 22, Chile asoto@ing.puc.cl Abstract The particle filter

More information

Probabilistic Graphical Models

Probabilistic Graphical Models 2016 Robert Nowak Probabilistic Graphical Models 1 Introduction We have focused mainly on linear models for signals, in particular the subspace model x = Uθ, where U is a n k matrix and θ R k is a vector

More information

Clustering VS Classification

Clustering VS Classification MCQ Clustering VS Classification 1. What is the relation between the distance between clusters and the corresponding class discriminability? a. proportional b. inversely-proportional c. no-relation Ans:

More information

PCA and admixture models

PCA and admixture models PCA and admixture models CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar, Alkes Price PCA and admixture models 1 / 57 Announcements HW1

More information

Latent Variable Models and EM algorithm

Latent Variable Models and EM algorithm Latent Variable Models and EM algorithm SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic 3.1 Clustering and Mixture Modelling K-means and hierarchical clustering are non-probabilistic

More information

Gaussian Process Regression

Gaussian Process Regression Gaussian Process Regression 4F1 Pattern Recognition, 21 Carl Edward Rasmussen Department of Engineering, University of Cambridge November 11th - 16th, 21 Rasmussen (Engineering, Cambridge) Gaussian Process

More information

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions Pattern Recognition and Machine Learning Chapter 2: Probability Distributions Cécile Amblard Alex Kläser Jakob Verbeek October 11, 27 Probability Distributions: General Density Estimation: given a finite

More information

A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables

A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables Niharika Gauraha and Swapan Parui Indian Statistical Institute Abstract. We consider the problem of

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression

More information

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics STA414/2104 Lecture 11: Gaussian Processes Department of Statistics www.utstat.utoronto.ca Delivered by Mark Ebden with thanks to Russ Salakhutdinov Outline Gaussian Processes Exam review Course evaluations

More information

Uncertainty quantification and visualization for functional random variables

Uncertainty quantification and visualization for functional random variables Uncertainty quantification and visualization for functional random variables MascotNum Workshop 2014 S. Nanty 1,3 C. Helbert 2 A. Marrel 1 N. Pérot 1 C. Prieur 3 1 CEA, DEN/DER/SESI/LSMR, F-13108, Saint-Paul-lez-Durance,

More information

Lecture 10. Announcement. Mixture Models II. Topics of This Lecture. This Lecture: Advanced Machine Learning. Recap: GMMs as Latent Variable Models

Lecture 10. Announcement. Mixture Models II. Topics of This Lecture. This Lecture: Advanced Machine Learning. Recap: GMMs as Latent Variable Models Advanced Machine Learning Lecture 10 Mixture Models II 30.11.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ Announcement Exercise sheet 2 online Sampling Rejection Sampling Importance

More information

Mathematical Formulation of Our Example

Mathematical Formulation of Our Example Mathematical Formulation of Our Example We define two binary random variables: open and, where is light on or light off. Our question is: What is? Computer Vision 1 Combining Evidence Suppose our robot

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression

More information

Alternative implementations of Monte Carlo EM algorithms for likelihood inferences

Alternative implementations of Monte Carlo EM algorithms for likelihood inferences Genet. Sel. Evol. 33 001) 443 45 443 INRA, EDP Sciences, 001 Alternative implementations of Monte Carlo EM algorithms for likelihood inferences Louis Alberto GARCÍA-CORTÉS a, Daniel SORENSEN b, Note a

More information

AN ALTERNATING MINIMIZATION ALGORITHM FOR NON-NEGATIVE MATRIX APPROXIMATION

AN ALTERNATING MINIMIZATION ALGORITHM FOR NON-NEGATIVE MATRIX APPROXIMATION AN ALTERNATING MINIMIZATION ALGORITHM FOR NON-NEGATIVE MATRIX APPROXIMATION JOEL A. TROPP Abstract. Matrix approximation problems with non-negativity constraints arise during the analysis of high-dimensional

More information

Estimation of linear non-gaussian acyclic models for latent factors

Estimation of linear non-gaussian acyclic models for latent factors Estimation of linear non-gaussian acyclic models for latent factors Shohei Shimizu a Patrik O. Hoyer b Aapo Hyvärinen b,c a The Institute of Scientific and Industrial Research, Osaka University Mihogaoka

More information

Hidden Markov Models Part 1: Introduction

Hidden Markov Models Part 1: Introduction Hidden Markov Models Part 1: Introduction CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Modeling Sequential Data Suppose that

More information

CSCI-567: Machine Learning (Spring 2019)

CSCI-567: Machine Learning (Spring 2019) CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Mar. 19, 2019 March 19, 2019 1 / 43 Administration March 19, 2019 2 / 43 Administration TA3 is due this week March

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

Non-specific filtering and control of false positives

Non-specific filtering and control of false positives Non-specific filtering and control of false positives Richard Bourgon 16 June 2009 bourgon@ebi.ac.uk EBI is an outstation of the European Molecular Biology Laboratory Outline Multiple testing I: overview

More information

Final Overview. Introduction to ML. Marek Petrik 4/25/2017

Final Overview. Introduction to ML. Marek Petrik 4/25/2017 Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,

More information

Lecture 4: Probabilistic Learning. Estimation Theory. Classification with Probability Distributions

Lecture 4: Probabilistic Learning. Estimation Theory. Classification with Probability Distributions DD2431 Autumn, 2014 1 2 3 Classification with Probability Distributions Estimation Theory Classification in the last lecture we assumed we new: P(y) Prior P(x y) Lielihood x2 x features y {ω 1,..., ω K

More information

CS281 Section 4: Factor Analysis and PCA

CS281 Section 4: Factor Analysis and PCA CS81 Section 4: Factor Analysis and PCA Scott Linderman At this point we have seen a variety of machine learning models, with a particular emphasis on models for supervised learning. In particular, we

More information

Nonparametric Bayesian Methods (Gaussian Processes)

Nonparametric Bayesian Methods (Gaussian Processes) [70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent

More information

Machine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall

Machine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall Machine Learning Gaussian Mixture Models Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall 2012 1 The Generative Model POV We think of the data as being generated from some process. We assume

More information

Linear Classifiers as Pattern Detectors

Linear Classifiers as Pattern Detectors Intelligent Systems: Reasoning and Recognition James L. Crowley ENSIMAG 2 / MoSIG M1 Second Semester 2014/2015 Lesson 16 8 April 2015 Contents Linear Classifiers as Pattern Detectors Notation...2 Linear

More information

Pattern Recognition and Machine Learning

Pattern Recognition and Machine Learning Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability

More information

Machine Learning, Midterm Exam: Spring 2008 SOLUTIONS. Q Topic Max. Score Score. 1 Short answer questions 20.

Machine Learning, Midterm Exam: Spring 2008 SOLUTIONS. Q Topic Max. Score Score. 1 Short answer questions 20. 10-601 Machine Learning, Midterm Exam: Spring 2008 Please put your name on this cover sheet If you need more room to work out your answer to a question, use the back of the page and clearly mark on the

More information

Today. Probability and Statistics. Linear Algebra. Calculus. Naïve Bayes Classification. Matrix Multiplication Matrix Inversion

Today. Probability and Statistics. Linear Algebra. Calculus. Naïve Bayes Classification. Matrix Multiplication Matrix Inversion Today Probability and Statistics Naïve Bayes Classification Linear Algebra Matrix Multiplication Matrix Inversion Calculus Vector Calculus Optimization Lagrange Multipliers 1 Classical Artificial Intelligence

More information

Fast and Accurate Causal Inference from Time Series Data

Fast and Accurate Causal Inference from Time Series Data Fast and Accurate Causal Inference from Time Series Data Yuxiao Huang and Samantha Kleinberg Stevens Institute of Technology Hoboken, NJ {yuxiao.huang, samantha.kleinberg}@stevens.edu Abstract Causal inference

More information

Chapter 9. Non-Parametric Density Function Estimation

Chapter 9. Non-Parametric Density Function Estimation 9-1 Density Estimation Version 1.2 Chapter 9 Non-Parametric Density Function Estimation 9.1. Introduction We have discussed several estimation techniques: method of moments, maximum likelihood, and least

More information

DS-GA 1002 Lecture notes 12 Fall Linear regression

DS-GA 1002 Lecture notes 12 Fall Linear regression DS-GA Lecture notes 1 Fall 16 1 Linear models Linear regression In statistics, regression consists of learning a function relating a certain quantity of interest y, the response or dependent variable,

More information

Probabilistic Latent Semantic Analysis

Probabilistic Latent Semantic Analysis Probabilistic Latent Semantic Analysis Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr

More information

Introduction. Chapter 1

Introduction. Chapter 1 Chapter 1 Introduction In this book we will be concerned with supervised learning, which is the problem of learning input-output mappings from empirical data (the training dataset). Depending on the characteristics

More information

Lecture 16: Small Sample Size Problems (Covariance Estimation) Many thanks to Carlos Thomaz who authored the original version of these slides

Lecture 16: Small Sample Size Problems (Covariance Estimation) Many thanks to Carlos Thomaz who authored the original version of these slides Lecture 16: Small Sample Size Problems (Covariance Estimation) Many thanks to Carlos Thomaz who authored the original version of these slides Intelligent Data Analysis and Probabilistic Inference Lecture

More information