Latent Variable models for GWAs

Size: px
Start display at page:

Download "Latent Variable models for GWAs"

Transcription

1 Latent Variable models for GWAs Oliver Stegle Machine Learning and Computational Biology Research Group Max-Planck-Institutes Tübingen, Germany September 2011 O. Stegle Latent variable models for GWAs Tübingen 1

2 Motivation Why latent variables? Causal influences on phenotypes Genotype Primary variable of interest Known confounding factors Covariates Population structure Unknown (latent) confounders Sample handling Sample history Subtle environmental perturbations Genome? Phenome individuals ATGACCTGAAACTGGGGGACTGACGTGGAACGGT ATGACCTGCAACTGGGGGACTGACGTGCAACGGT ATGACCTGCAACTGGGGGACTGACGTGCAACGGT ATGACCTGAAACTGGGGGATTGACGTGGAACGGT ATGACCTGCAACTGGGGGATTGACGTGCAACGGT ATGACCTGCAACTGGGGGATTGACGTGCAACGGT yy 1 yy 2... y N phenotypes SNPs O. Stegle Latent variable models for GWAs Tübingen 2

3 Motivation Why latent variables? Causal influences on phenotypes Genotype Primary variable of interest Known confounding factors Covariates Population structure Unknown (latent) confounders Sample handling Sample history Subtle environmental perturbations Genome? Phenome individuals ATGACCTGAAACTGGGGGACTGACGTGGAACGGT ATGACCTGCAACTGGGGGACTGACGTGCAACGGT ATGACCTGCAACTGGGGGACTGACGTGCAACGGT ATGACCTGAAACTGGGGGATTGACGTGGAACGGT ATGACCTGCAACTGGGGGATTGACGTGCAACGGT ATGACCTGCAACTGGGGGATTGACGTGCAACGGT yy 1 yy 2... y N phenotypes SNPs Covariates Population y O. Stegle Latent variable models for GWAs Tübingen 2

4 Motivation Why latent variables? Causal influences on phenotypes Genotype Primary variable of interest Known confounding factors Covariates Population structure Unknown (latent) confounders Sample handling Sample history Subtle environmental perturbations Genome? Phenome individuals ATGACCTGAAACTGGGGGACTGACGTGGAACGGT ATGACCTGCAACTGGGGGACTGACGTGCAACGGT ATGACCTGCAACTGGGGGACTGACGTGCAACGGT ATGACCTGAAACTGGGGGATTGACGTGGAACGGT ATGACCTGCAACTGGGGGATTGACGTGCAACGGT ATGACCTGCAACTGGGGGATTGACGTGCAACGGT yy 1 yy 2... y N phenotypes SNPs Covariates Population y Confounders y O. Stegle Latent variable models for GWAs Tübingen 2

5 Outline Outline O. Stegle Latent variable models for GWAs Tübingen 3

6 Dimension reduction and the Gaussian Process Latent Variable Model (GPLVM) Outline Motivation Dimension reduction and the Gaussian Process Latent Variable Model (GPLVM) Modeling hidden confounders in GWAs Model Applications Modeling unobserved cellular phenotypes in genetic analyses Model Applications A unifying view Summary O. Stegle Latent variable models for GWAs Tübingen 4

7 Dimension reduction and the Gaussian Process Latent Variable Model (GPLVM) Manifolds and dimension reduction (from Olivier Grisel, Generated using the Modular Data Processing toolkit and matplotlib.) O. Stegle Latent variable models for GWAs Tübingen 5

8 Dimension reduction and the Gaussian Process Latent Variable Model (GPLVM) Linear dimension reduction Map G dimensional data on K dimensional manifold; K << G Y }{{} NxG = }{{} H }{{} W NxK KxG + Ψ }{{} NxG H: latent factors in low-dimensional space W: weights for factors on data dimensions Ψ: noise, ψn,g N (0, σ 2 ). Challenge: neither W nor H known! Depending on assumptions on W and H: Principle component analysis (PCA) Independent component analysis (ICA)... O. Stegle Latent variable models for GWAs Tübingen 6

9 Dimension reduction and the Gaussian Process Latent Variable Model (GPLVM) Linear dimension reduction Map G dimensional data on K dimensional manifold; K << G Y }{{} NxG = }{{} H }{{} W NxK KxG + Ψ }{{} NxG H: latent factors in low-dimensional space W: weights for factors on data dimensions Ψ: noise, ψ n,g N (0, σ 2 ). Challenge: neither W nor H known! Depending on assumptions on W and H: Principle component analysis (PCA) Independent component analysis (ICA)... O. Stegle Latent variable models for GWAs Tübingen 6

10 Dimension reduction and the Gaussian Process Latent Variable Model (GPLVM) Linear dimension reduction Map G dimensional data on K dimensional manifold; K << G Y }{{} NxG = }{{} H }{{} W NxK KxG + Ψ }{{} NxG H: latent factors in low-dimensional space W: weights for factors on data dimensions Ψ: noise, ψ n,g N (0, σ 2 ). Challenge: neither W nor H known! Depending on assumptions on W and H: Principle component analysis (PCA) Independent component analysis (ICA)... O. Stegle Latent variable models for GWAs Tübingen 6

11 Dimension reduction and the Gaussian Process Latent Variable Model (GPLVM) Linear dimension reduction PCA PCA is corresponds to a noise-free version of the model Y }{{} NxG = }{{} H }{{} W NxK KxG PCA components (H) correspond to directions of maximum data variance in the original dataset: Covariance matrix: C = YY T Eigenvalue/Eigen vectors Cvi = λ i v i Projection matrix P = [v1,..., v K ] Principle components Hn = P Y n. O. Stegle Latent variable models for GWAs Tübingen 7

12 Dimension reduction and the Gaussian Process Latent Variable Model (GPLVM) Linear dimension reduction PCA PCA is corresponds to a noise-free version of the model Y }{{} NxG = }{{} H }{{} W NxK KxG PCA components (H) correspond to directions of maximum data variance in the original dataset: Covariance matrix: C = YY T Eigenvalue/Eigen vectors Cvi = λ i v i Projection matrix P = [v1,..., v K ] Principle components Hn = P Y n. O. Stegle Latent variable models for GWAs Tübingen 7

13 Dimension reduction and the Gaussian Process Latent Variable Model (GPLVM) Linear dimension reduction PCA PCA is corresponds to a noise-free version of the model Y }{{} NxG = }{{} H }{{} W NxK KxG PCA components (H) correspond to directions of maximum data variance in the original dataset: Covariance matrix: C = YY T Eigenvalue/Eigen vectors Cvi = λ i v i Projection matrix P = [v1,..., v K ] Principle components Hn = P Y n. O. Stegle Latent variable models for GWAs Tübingen 7

14 Dimension reduction and the Gaussian Process Latent Variable Model (GPLVM) Linear dimension reduction PCA PCA is corresponds to a noise-free version of the model Y }{{} NxG = }{{} H }{{} W NxK KxG PCA components (H) correspond to directions of maximum data variance in the original dataset: Covariance matrix: C = YY T Eigenvalue/Eigen vectors Cvi = λ i v i Projection matrix P = [v1,..., v K ] Principle components Hn = P Y n. O. Stegle Latent variable models for GWAs Tübingen 7

15 Dimension reduction and the Gaussian Process Latent Variable Model (GPLVM) Linear dimension reduction PCA PCA is corresponds to a noise-free version of the model Y }{{} NxG = }{{} H }{{} W NxK KxG PCA components (H) correspond to directions of maximum data variance in the original dataset: Covariance matrix: C = YY T Eigenvalue/Eigen vectors Cvi = λ i v i Projection matrix P = [v1,..., v K ] Principle components Hn = P Y n. O. Stegle Latent variable models for GWAs Tübingen 7

16 Dimension reduction and the Gaussian Process Latent Variable Model (GPLVM) Linear dimension reduction Bayesian PCA and GPLVM Assumption: data dimensions or sample dimension independent given H and W. Probabilistic PCA GPLVM N ( p(y H, W) = N y n hnw, σ 2 ) I n=1 N ( ) 2 p(h) = N h n 0, σ h I n=1 N ( 2 p(y W) = N y n 0, σ h WWT + σ 2 ) I n=1 [Tipping and Bishop, 1999] G ( p(y H, W) = N y :,g Hwg, σ 2 ) I g=1 G ( ) 2 p(w) = N w :,g 0, σ h I g=1 G p(y H) = N y:,g 0, σ 2 h g=1 } HHT +σ 2 I {{} NxN [Lawrence, 2005] O. Stegle Latent variable models for GWAs Tübingen 8

17 Dimension reduction and the Gaussian Process Latent Variable Model (GPLVM) Linear dimension reduction Bayesian PCA and GPLVM Assumption: data dimensions or sample dimension independent given H and W. Probabilistic PCA GPLVM N ( p(y H, W) = N y n hnw, σ 2 ) I n=1 N ( ) 2 p(h) = N h n 0, σ h I n=1 N ( 2 p(y W) = N y n 0, σ h WWT + σ 2 ) I n=1 [Tipping and Bishop, 1999] G ( p(y H, W) = N y :,g Hwg, σ 2 ) I g=1 G ( ) 2 p(w) = N w :,g 0, σ h I g=1 G p(y H) = N y:,g 0, σ 2 h g=1 } HHT +σ 2 I {{} NxN [Lawrence, 2005] O. Stegle Latent variable models for GWAs Tübingen 8

18 Dimension reduction and the Gaussian Process Latent Variable Model (GPLVM) GPLVM Marginal likelihood p(y H) = G N ( y :,g 0, σh 2 HHT + σ 2 I ) g=1 Inference of most probable hidden factors H: Ĥ = argmax H log G N ( y :,g 0, σh 2 HHT + σ 2 I ) g=1 O. Stegle Latent variable models for GWAs Tübingen 9

19 Dimension reduction and the Gaussian Process Latent Variable Model (GPLVM) GPLVM Marginal likelihood p(y H) = G N ( y :,g 0, σh 2 HHT + σ 2 I ) g=1 Inference of most probable hidden factors H: Ĥ = argmax H log G N ( y :,g 0, σh 2 HHT + σ 2 I ) g=1 O. Stegle Latent variable models for GWAs Tübingen 9

20 Dimension reduction and the Gaussian Process Latent Variable Model (GPLVM) Are linear relationships sufficient? (from Olivier Grisel, Generated using the Modular Data Processing toolkit and matplotlib.) Non-linear generalizations, introducing general kernel function instead of linear covariance Ĥ = argmax H log G N ( y :,g 0, σh 2 HHT + σ 2 I ) g=1 O. Stegle Latent variable models for GWAs Tübingen 10

21 Dimension reduction and the Gaussian Process Latent Variable Model (GPLVM) Are linear relationships sufficient? (from Olivier Grisel, Generated using the Modular Data Processing toolkit and matplotlib.) Non-linear generalizations, introducing general kernel function instead of linear covariance Ĥ = argmax H log G N ( y :,g 0, σh 2 HHT + σ 2 I ) g=1 O. Stegle Latent variable models for GWAs Tübingen 10

22 Dimension reduction and the Gaussian Process Latent Variable Model (GPLVM) Are linear relationships sufficient? (from Olivier Grisel, Generated using the Modular Data Processing toolkit and matplotlib.) Non-linear generalizations, introducing general kernel function instead of linear covariance Ĥ = argmax H log G N ( y :,g 0, σh 2 K H,H + σ 2 I ) g=1 O. Stegle Latent variable models for GWAs Tübingen 10

23 Modeling hidden confounders in GWAs Outline Motivation Dimension reduction and the Gaussian Process Latent Variable Model (GPLVM) Modeling hidden confounders in GWAs Model Applications Modeling unobserved cellular phenotypes in genetic analyses Model Applications A unifying view Summary O. Stegle Latent variable models for GWAs Tübingen 11

24 Modeling hidden confounders in GWAs Confounders in eqtl studies Confounders in eqtl studies Experimental procedures Gene regulation (Translation) Standard to take known factors (gender, population structure) into account. It is key to account for hidden factors as well. Sample preparation Sample history Genome? Phenome individuals ATGACCTGAAACTGGGGGACTGACGTGGAACGGT ATGACCTGCAACTGGGGGACTGACGTGCAACGGT ATGACCTGCAACTGGGGGACTGACGTGCAACGGT ATGACCTGAAACTGGGGGATTGACGTGGAACGGT ATGACCTGCAACTGGGGGATTGACGTGCAACGGT ATGACCTGCAACTGGGGGATTGACGTGCAACGGT yy1 yy2... yn y phenotypes SNPs Covariates Confounders Population y y O. Stegle Latent variable models for GWAs Tübingen 12

25 Modeling hidden confounders in GWAs [ Confounders in eqtl studies Confounders in eqtl studies Experimental procedures Gene regulation (Translation) Standard to take known factors (gender, population structure) into account. It is key to account for hidden factors as well. Sample preparation Sample history DNA transcription mrna [ translation proteins SNPs ATGACCTGAAACTGGGGGACTGACGTGGAACGGT ATGACCTGCAACTGGGGGACTGACGTGCAACGGT ATGACCTGCAACTGGGGGACTGACGTGCAACGGT ATGACCTGAAACTGGGGGATTGACGTGGAACGGT ATGACCTGCAACTGGGGGATTGACGTGCAACGGT ATGACCTGCAACTGGGGGATTGACGTGCAACGGT observed & hidden external influenes experimental procedures environment gene regulation alternative splicing transcription factors small RNAs O. Stegle Latent variable models for GWAs Tübingen 12

26 Modeling hidden confounders in GWAs [ Confounders in eqtl studies Confounders in eqtl studies Experimental procedures Gene regulation (Translation) Standard to take known factors (gender, population structure) into account. It is key to account for hidden factors as well. Sample preparation Sample history DNA transcription mrna [ translation proteins SNPs ATGACCTGAAACTGGGGGACTGACGTGGAACGGT ATGACCTGCAACTGGGGGACTGACGTGCAACGGT ATGACCTGCAACTGGGGGACTGACGTGCAACGGT ATGACCTGAAACTGGGGGATTGACGTGGAACGGT ATGACCTGCAACTGGGGGATTGACGTGCAACGGT ATGACCTGCAACTGGGGGATTGACGTGCAACGGT observed & hidden external influenes experimental procedures environment gene regulation alternative splicing transcription factors small RNAs O. Stegle Latent variable models for GWAs Tübingen 12

27 Modeling hidden confounders in GWAs [ Confounders in eqtl studies Confounders in eqtl studies Experimental procedures Gene regulation (Translation) Standard to take known factors (gender, population structure) into account. It is key to account for hidden factors as well. Sample preparation Sample history DNA transcription mrna [ translation proteins SNPs ATGACCTGAAACTGGGGGACTGACGTGGAACGGT ATGACCTGCAACTGGGGGACTGACGTGCAACGGT ATGACCTGCAACTGGGGGACTGACGTGCAACGGT ATGACCTGAAACTGGGGGATTGACGTGGAACGGT ATGACCTGCAACTGGGGGATTGACGTGCAACGGT ATGACCTGCAACTGGGGGATTGACGTGCAACGGT observed & hidden external influenes experimental procedures environment gene regulation alternative splicing transcription factors small RNAs O. Stegle Latent variable models for GWAs Tübingen 12

28 Modeling hidden confounders in GWAs Motivating examples HapMAP II, 3 populations, 90 individuals each, 40K genes, 3 million SNPs (here: small region in chromosome 7) 15 SLC35B4 (a) Standard eqtl LOD 10 5 FPR 0.01% FPR 0.01% (b) VBFA accounting for hidden factors VBFA eqtl LOD SLC35B4 SLC35B4 FPR 0.01% FPR 0.01% Position in chr. 7 x 10 8 O. Stegle Latent variable models for GWAs Tübingen 13

29 Modeling hidden confounders in GWAs Model Association model Direct effects of confounders Start with standard association model. Include (K, few) global hidden factors (confounders) in the model. Factors H = {h :,1,..., h :,K } need to be learned from the expression data. How to control model complexity, i.e. choose the number of factors? Genome? Phenome individuals ATGACCTGAAACTGGGGGACTGACGTGGAACGGT ATGACCTGCAACTGGGGGACTGACGTGCAACGGT ATGACCTGCAACTGGGGGACTGACGTGCAACGGT ATGACCTGAAACTGGGGGATTGACGTGGAACGGT ATGACCTGCAACTGGGGGATTGACGTGCAACGGT ATGACCTGCAACTGGGGGATTGACGTGCAACGGT yy1 yy2... yn y phenotypes SNPs genetic {}}{ N y :,g = (θ n,g s n ) + n=1 ψ }{{} noise Covariates Population y O. Stegle Latent variable models for GWAs Tübingen 14

30 Modeling hidden confounders in GWAs Model Association model Direct effects of confounders Start with standard association model. Include (K, few) global hidden factors (confounders) in the model. Factors H = {h :,1,..., h :,K } need to be learned from the expression data. How to control model complexity, i.e. choose the number of factors? Genome? Phenome individuals ATGACCTGAAACTGGGGGACTGACGTGGAACGGT ATGACCTGCAACTGGGGGACTGACGTGCAACGGT ATGACCTGCAACTGGGGGACTGACGTGCAACGGT ATGACCTGAAACTGGGGGATTGACGTGGAACGGT ATGACCTGCAACTGGGGGATTGACGTGCAACGGT ATGACCTGCAACTGGGGGATTGACGTGCAACGGT yy1 yy2... yn y phenotypes SNPs genetic {}}{ N y :,g = (θ n,g s n ) + n=1 K w g,k h k k=1 }{{} hidden factors Covariates Confounders Population y + ψ }{{} noise y O. Stegle Latent variable models for GWAs Tübingen 14

31 Modeling hidden confounders in GWAs Model Association model Direct effects of confounders Start with standard association model. Include (K, few) global hidden factors (confounders) in the model. Factors H = {h :,1,..., h :,K } need to be learned from the expression data. How to control model complexity, i.e. choose the number of factors? Genome? Phenome individuals ATGACCTGAAACTGGGGGACTGACGTGGAACGGT ATGACCTGCAACTGGGGGACTGACGTGCAACGGT ATGACCTGCAACTGGGGGACTGACGTGCAACGGT ATGACCTGAAACTGGGGGATTGACGTGGAACGGT ATGACCTGCAACTGGGGGATTGACGTGCAACGGT ATGACCTGCAACTGGGGGATTGACGTGCAACGGT yy1 yy2... yn y phenotypes SNPs genetic {}}{ N y :,g = (θ n,g s n ) + n=1 K w g,k h k k=1 }{{} hidden factors Covariates Confounders Population y + ψ }{{} noise y O. Stegle Latent variable models for GWAs Tübingen 14

32 Modeling hidden confounders in GWAs Model Association model Gaussian process formulation Data likelihood of the linear generative model assuming Gaussian noise ( ) G S K p(y X, H, θ, W, Θ K ) = N y :,g x s θ s + w g,k h k, σei 2 g=1 s=1 Specify Gaussian priors on the factor weight P (w g,k ) = N ( w g,k 0, σh 2 ) Integrating out the weights yields a Gaussian process marginal likelihood, factorizing over genes given H ( G S ) K P (Y X, H, Θ K ) = N y :,g x s θ s, σh 2 h k h T k + σei 2 g=1 s=1 k=1 k=1 O. Stegle Latent variable models for GWAs Tübingen 15

33 Modeling hidden confounders in GWAs Model Association model Gaussian process formulation Data likelihood of the linear generative model assuming Gaussian noise ( ) G S K p(y X, H, θ, W, Θ K ) = N y :,g x s θ s + w g,k h k, σei 2 g=1 s=1 Specify Gaussian priors on the factor weight P (w g,k ) = N ( w g,k 0, σh 2 ) Integrating out the weights yields a Gaussian process marginal likelihood, factorizing over genes given H ( G S ) K P (Y X, H, Θ K ) = N y :,g x s θ s, σh 2 h k h T k + σei 2 g=1 s=1 k=1 k=1 O. Stegle Latent variable models for GWAs Tübingen 15

34 Modeling hidden confounders in GWAs Model Association model Gaussian process formulation Data likelihood of the linear generative model assuming Gaussian noise ( ) G S K p(y X, H, θ, W, Θ K ) = N y :,g x s θ s + w g,k h k, σei 2 g=1 s=1 Specify Gaussian priors on the factor weight P (w g,k ) = N ( w g,k 0, σh 2 ) Integrating out the weights yields a Gaussian process marginal likelihood, factorizing over genes given H ( G S ) K P (Y X, H, Θ K ) = N y :,g x s θ s, σh 2 h k h T k + σei 2 g=1 s=1 k=1 k=1 O. Stegle Latent variable models for GWAs Tübingen 15

35 Modeling hidden confounders in GWAs Model Association model Including population structure P (Y X, H, Θ K ) = G N y :,g g=1 S K x s θ s, σh 2 h k h T k + σ2 ei k=1 }{{} NxN s=1 Including additional covariance term for population structure. Association test for single SNP. Challenge: Refitting σ g, σ h, H for every SNP. O. Stegle Latent variable models for GWAs Tübingen 16

36 Modeling hidden confounders in GWAs Model Association model Including population structure G P (Y X, H, Θ K ) = N g=1 y :,g S K x s θ s, σh 2 h k h T k + σ gxx T + σei 2 k=1 }{{} NxN s=1 Including additional covariance term for population structure. Association test for single SNP. Challenge: Refitting σ g, σ h, H for every SNP. O. Stegle Latent variable models for GWAs Tübingen 16

37 Modeling hidden confounders in GWAs Model Association model Including population structure P (Y X, H, Θ K ) = G N y K :,g x i θ i, σh 2 h k h T k + σ gxx T + σ 2 ei k=1 }{{} NxN g=1 Including additional covariance term for population structure. Association test for single SNP. Challenge: Refitting σ g, σ h, H for every SNP. O. Stegle Latent variable models for GWAs Tübingen 16

38 Modeling hidden confounders in GWAs Model Association model Including population structure P (Y X, H, Θ K ) = G N y K :,g x i θ i, σh 2 h k h T k + σ gxx T + σ 2 ei k=1 }{{} NxN g=1 Including additional covariance term for population structure. Association test for single SNP. Challenge: Refitting σ g, σ h, H for every SNP. O. Stegle Latent variable models for GWAs Tübingen 16

39 Modeling hidden confounders in GWAs Model Association model Fit parameter once on null model Ĥ, ˆσ g, σˆ h, ˆσ e = argmax P (Y X, H, Θ K ) Ĥ, ˆσ e, σˆ g, ˆσ e Association testing for fixed hyperparameters ( ) p(y :,g x i, H, Θ K ) = N y :,g xi θ i, α 2 ( ˆσ e ĤĤT + ˆσ g XX T ) + σei 2 O. Stegle Latent variable models for GWAs Tübingen 17

40 Modeling hidden confounders in GWAs Model Association model Fit parameter once on null model Ĥ, ˆσ g, σˆ h, ˆσ e = argmax P (Y X, H, Θ K ) Ĥ, ˆσ e, σˆ g, ˆσ e Association testing for fixed hyperparameters ( ) p(y :,g x i, H, Θ K ) = N y :,g xi θ i, α 2 ( ˆσ e ĤĤT + ˆσ g XX T ) + σei 2 O. Stegle Latent variable models for GWAs Tübingen 17

41 Modeling hidden confounders in GWAs Applications Evaluation on real data CIS gene: YJL213W Promoter S ATGACCTGAAACTGGGCGGATGACGTGGAACGGTATGACCTGAAACTGGGGGACTGACGTGGAACGGTATGACCTGAAACT TRANS gene: YJL213W S Promoter ATGA...ACTGGGCGGATGACGTGGAACGGTATGACCTGAAACTGGGGGACTGACGTGGAACGGTATGACCTGAAACT O. Stegle Latent variable models for GWAs Tübingen 18

42 Modeling hidden confounders in GWAs Applications Evaluation on real data CIS gene: YJL213W Promoter S ATGACCTGAAACTGGGCGGATGACGTGGAACGGTATGACCTGAAACTGGGGGACTGACGTGGAACGGTATGACCTGAAACT (a) Yeast TRANS S cis eqtls gene: YJL213W Promoter ATGA...ACTGGGCGGATGACGTGGAACGGTATGACCTGAAACTGGGGGACTGACGTGGAACGGTATGACCTGAAACT (b) Yeast O. Stegle Latent variable models for GWAs trans eqtls Tu bingen 18

43 Modeling hidden confounders in GWAs Applications Evaluation on real data Application to HapMap II expression analysis O. Stegle Latent variable models for GWAs Tübingen 19

44 Modeling hidden confounders in GWAs Applications Evaluation on real data Application to HapMap II expression analysis (a) Probes with a VBeQTL in pooled population O. Stegle (f) cis VBeQTL location and strength relative to gene start Latent variable models for GWAs Tu bingen 19

45 Modeling unobserved cellular phenotypes in genetic analyses Outline Motivation Dimension reduction and the Gaussian Process Latent Variable Model (GPLVM) Modeling hidden confounders in GWAs Model Applications Modeling unobserved cellular phenotypes in genetic analyses Model Applications A unifying view Summary O. Stegle Latent variable models for GWAs Tübingen 20

46 Modeling unobserved cellular phenotypes in genetic analyses Model [ Association studies Regulatory and external factors Confounding factors Accounting for hidden confounders genetic {}}{ ( ) y g,j = b n,g θn,gsn,j + v gf j + w gx j + ψ n,g. }{{} }{{} }{{} known factors hidden factors noise Accounting for regulatory factors? DNA transcription mrna [ translation proteins SNPs ATGACCTGAAACTGGGGGACTGACGTGGAACGGT ATGACCTGCAACTGGGGGACTGACGTGCAACGGT ATGACCTGCAACTGGGGGACTGACGTGCAACGGT ATGACCTGAAACTGGGGGATTGACGTGGAACGGT ATGACCTGCAACTGGGGGATTGACGTGCAACGGT ATGACCTGCAACTGGGGGATTGACGTGCAACGGT observed & hidden external influenes experimental procedures environment gene regulation alternative splicing transcription factors small RNAs O. Stegle Latent variable models for GWAs Tübingen 21

47 Modeling unobserved cellular phenotypes in genetic analyses Model [ Association studies Regulatory and external factors Confounding factors Accounting for hidden confounders genetic {}}{ ( ) y g,j = b n,g θn,gsn,j + v gf j + w gx j + ψ n,g. }{{} }{{} }{{} known factors hidden factors noise Accounting for regulatory factors? DNA transcription mrna [ translation proteins SNPs ATGACCTGAAACTGGGGGACTGACGTGGAACGGT ATGACCTGCAACTGGGGGACTGACGTGCAACGGT ATGACCTGCAACTGGGGGACTGACGTGCAACGGT ATGACCTGAAACTGGGGGATTGACGTGGAACGGT ATGACCTGCAACTGGGGGATTGACGTGCAACGGT ATGACCTGCAACTGGGGGATTGACGTGCAACGGT observed & hidden external influenes experimental procedures environment gene regulation alternative splicing transcription factors small RNAs O. Stegle Latent variable models for GWAs Tübingen 21

48 Modeling unobserved cellular phenotypes in genetic analyses Model [ Association studies Regulatory factors Account for regulatory factors: Transcription factors Pathway components Hypothesis: intermediate cellular factors mediate the observed association signals of target genes. Measuring X? Difficult and expensive. Learn the unobserved factors X. DNA transcription mrna [ translation proteins SNPs ATGACCTGAAACTGGGGGACTGACGTGGAACGGT ATGACCTGCAACTGGGGGACTGACGTGCAACGGT ATGACCTGCAACTGGGGGACTGACGTGCAACGGT ATGACCTGAAACTGGGGGATTGACGTGGAACGGT ATGACCTGCAACTGGGGGATTGACGTGCAACGGT ATGACCTGCAACTGGGGGATTGACGTGCAACGGT observed & hidden external influenes experimental procedures environment gene regulation alternative splicing small RNAs [Parts et al., 2011] O. Stegle Latent variable models for GWAs Tübingen 22

49 Modeling unobserved cellular phenotypes in genetic analyses Model [ Association studies Regulatory factors Account for regulatory factors: Transcription factors Pathway components Hypothesis: intermediate cellular factors mediate the observed association signals of target genes. Measuring X? Difficult and expensive. Learn the unobserved factors X. DNA transcription mrna [ translation proteins SNPs ATGACCTGAAACTGGGGGACTGACGTGGAACGGT ATGACCTGCAACTGGGGGACTGACGTGCAACGGT ATGACCTGCAACTGGGGGACTGACGTGCAACGGT ATGACCTGAAACTGGGGGATTGACGTGGAACGGT ATGACCTGCAACTGGGGGATTGACGTGCAACGGT ATGACCTGCAACTGGGGGATTGACGTGCAACGGT observed & hidden external influenes experimental procedures environment gene regulation alternative splicing small RNAs [Parts et al., 2011] O. Stegle Latent variable models for GWAs Tübingen 22

50 Modeling unobserved cellular phenotypes in genetic analyses Model [ Association studies Regulatory factors Account for regulatory factors: Transcription factors Pathway components Hypothesis: intermediate cellular factors mediate the observed association signals of target genes. Measuring X? Difficult and expensive. Learn the unobserved factors X. DNA transcription mrna [ translation proteins SNPs ATGACCTGAAACTGGGGGACTGACGTGGAACGGT ATGACCTGCAACTGGGGGACTGACGTGCAACGGT ATGACCTGCAACTGGGGGACTGACGTGCAACGGT ATGACCTGAAACTGGGGGATTGACGTGGAACGGT ATGACCTGCAACTGGGGGATTGACGTGCAACGGT ATGACCTGCAACTGGGGGATTGACGTGCAACGGT observed & hidden external influenes experimental procedures environment gene regulation alternative splicing small RNAs [Parts et al., 2011] O. Stegle Latent variable models for GWAs Tübingen 22

51 Modeling unobserved cellular phenotypes in genetic analyses Model Association studies Transcription factor effects Inference of regulatory factors using a bilinear model Y }{{} Expr. = W }{{} Weights X }{{} Factors + Ψ }{{} Noise. DNA SNPs ATGACCTGAAACTGGGGGACTGACGTGGAACGGT ATGACCTGCAACTGGGGGACTGACGTGCAACGGT ATGACCTGCAACTGGGGGACTGACGTGCAACGGT ATGACCTGAAACTGGGGGATTGACGTGGAACGGT ATGACCTGCAACTGGGGGATTGACGTGCAACGGT ATGACCTGCAACTGGGGGATTGACGTGCAACGGT W is sparse; each factor regulates only a subset of all genes. Incorporate biological prior knowledge to that factor are interpretable: Transcription factor binding affinities. Inferred factors summarize co-expression clusters. transcription mrna [ translation proteins [ observed & hidden external influenes experimental procedures environment gene regulation alternative splicing small RNAs O. Stegle Latent variable models for GWAs Tübingen 23

52 Modeling unobserved cellular phenotypes in genetic analyses Model Association studies Transcription factor effects Inference of regulatory factors using a bilinear model Y }{{} Expr. = W }{{} Weights X }{{} Factors + Ψ }{{} Noise. W is sparse; each factor regulates only a subset of all genes. Incorporate biological prior knowledge to that factor are interpretable: Transcription factor binding affinities. Inferred factors summarize co-expression clusters. TF: PHO4 Promoter gene: YJL213W ATGACCTGAAACTGGGCGGATGACGTGGAACGGTATGACCTGAAACTGGGGGACTGACGTGGAACGGTATGACCTGAAACT O. Stegle Latent variable models for GWAs Tübingen 23

53 Modeling unobserved cellular phenotypes in genetic analyses Model Association studies Transcription factor effects Inference of regulatory factors using a bilinear model Y }{{} Expr. = W }{{} Weights X }{{} Factors + Ψ }{{} Noise. Gene expression W is sparse; each factor regulates only a subset of all genes. Incorporate biological prior knowledge to that factor are interpretable: Transcription factor binding affinities. Inferred factors summarize co-expression clusters. PHO4 Known targets from Yeastract Segregants (218) PHO4 factor activation Segregants (218) O. Stegle Latent variable models for GWAs Tübingen 23 Genes (5493)

54 Modeling unobserved cellular phenotypes in genetic analyses Model Sparse factor analysis Probabilistic model Graphical model for Y = W X + Ψ. Indicators z g,k determine the sparsity pattern: P (w g,k z g,k = 0) = N ( w g,k 0, σ 2 0 ) P (w g,k z g,k = 1) = N ( w g,k 0, σ 2 1 ). Prior knowledge is encoded in π g,k = P (z g,k = 1). Standard conjugate priors for the remaining random variables. O. Stegle Latent variable models for GWAs Tübingen 24

55 Modeling unobserved cellular phenotypes in genetic analyses Model Sparse factor analysis Probabilistic model Graphical model for Y = W X + Ψ. Indicators z g,k determine the sparsity pattern: P (w g,k z g,k = 0) = N ( w g,k 0, σ 2 0 ) P (w g,k z g,k = 1) = N ( w g,k 0, σ 2 1 ). Prior knowledge is encoded in π g,k = P (z g,k = 1). Standard conjugate priors for the remaining random variables. O. Stegle Latent variable models for GWAs Tübingen 24

56 Modeling unobserved cellular phenotypes in genetic analyses Model Sparse factor analysis Probabilistic model Graphical model for Y = W X + Ψ. Indicators z g,k determine the sparsity pattern: P (w g,k z g,k = 0) = N ( w g,k 0, σ 2 0 ) P (w g,k z g,k = 1) = N ( w g,k 0, σ 2 1 ). Prior knowledge is encoded in π g,k = P (z g,k = 1). Standard conjugate priors for the remaining random variables. O. Stegle Latent variable models for GWAs Tübingen 24

57 Modeling unobserved cellular phenotypes in genetic analyses Model Sparse factor analysis Inference Scale is challenging: thousands of genes, hundreds of TFs. Efficient deterministic approximate inference: 1. Variational Bayesian inference for the core model. 2. Expectation Propagation for the sparsity submodel. O. Stegle Latent variable models for GWAs Tübingen 25

58 Modeling unobserved cellular phenotypes in genetic analyses Model Sparse factor analysis Inference Scale is challenging: thousands of genes, hundreds of TFs. Efficient deterministic approximate inference: 1. Variational Bayesian inference for the core model. 2. Expectation Propagation for the sparsity submodel. O. Stegle Latent variable models for GWAs Tübingen 25

59 Modeling unobserved cellular phenotypes in genetic analyses Model Sparse factor analysis Comparison to MCMC on (small!) simulated dataset Recovery of the true simulated network structure. O. Stegle Latent variable models for GWAs Tübingen 26

60 Modeling unobserved cellular phenotypes in genetic analyses Model Sparse factor analysis Comparison to MCMC on (small!) simulated dataset Recovery of the true simulated factor activations. O. Stegle Latent variable models for GWAs Tübingen 26

61 Modeling unobserved cellular phenotypes in genetic analyses Applications Application to yeast Dataset Applied the approach to 108 yeast strains (crosses), genotyped and expression profiled in 2 conditions (ethanol and glucose). Alternative types of prior information available TF binding affinities (Yeastract). Pathway information (KEGG). [Smith and Kruglyak, 2008] O. Stegle Latent variable models for GWAs Tübingen 27

62 Modeling unobserved cellular phenotypes in genetic analyses Applications [ Application to yeast Factor associations Biological hypotheses Genetic variation (SNPs) may regulate factor activations. Similarly for the environment condition (glucose/ethanol). Interaction effects between SNPs, factors and genes. DNA transcription mrna [ translation proteins SNPs ATGACCTGAAACTGGGGGACTGACGTGGAACGGT ATGACCTGCAACTGGGGGACTGACGTGCAACGGT ATGACCTGCAACTGGGGGACTGACGTGCAACGGT ATGACCTGAAACTGGGGGATTGACGTGGAACGGT ATGACCTGCAACTGGGGGATTGACGTGCAACGGT ATGACCTGCAACTGGGGGATTGACGTGCAACGGT observed & hidden external influenes experimental procedures environment gene regulation alternative splicing small RNAs O. Stegle Latent variable models for GWAs Tübingen 28

63 Modeling unobserved cellular phenotypes in genetic analyses Applications [ Application to yeast Factor associations Biological hypotheses Genetic variation (SNPs) may regulate factor activations. Similarly for the environment condition (glucose/ethanol). Interaction effects between SNPs, factors and genes. DNA transcription mrna [ translation proteins SNPs ATGACCTGAAACTGGGGGACTGACGTGGAACGGT ATGACCTGCAACTGGGGGACTGACGTGCAACGGT ATGACCTGCAACTGGGGGACTGACGTGCAACGGT ATGACCTGAAACTGGGGGATTGACGTGGAACGGT ATGACCTGCAACTGGGGGATTGACGTGCAACGGT ATGACCTGCAACTGGGGGATTGACGTGCAACGGT observed & hidden external influenes experimental procedures environment gene regulation alternative splicing small RNAs O. Stegle Latent variable models for GWAs Tübingen 28

64 Modeling unobserved cellular phenotypes in genetic analyses Applications [ Application to yeast Factor associations Biological hypotheses Genetic variation (SNPs) may regulate factor activations. Similarly for the environment condition (glucose/ethanol). Interaction effects between SNPs, factors and genes. DNA transcription mrna [ translation proteins SNPs ATGACCTGAAACTGGGGGACTGACGTGGAACGGT ATGACCTGCAACTGGGGGACTGACGTGCAACGGT ATGACCTGCAACTGGGGGACTGACGTGCAACGGT ATGACCTGAAACTGGGGGATTGACGTGGAACGGT ATGACCTGCAACTGGGGGATTGACGTGCAACGGT ATGACCTGCAACTGGGGGATTGACGTGCAACGGT observed & hidden external influenes experimental procedures environment gene regulation alternative splicing small RNAs O. Stegle Latent variable models for GWAs Tübingen 28

65 Modeling unobserved cellular phenotypes in genetic analyses Applications [ Application to yeast Factor associations Biological hypotheses Genetic variation (SNPs) may regulate factor activations. Similarly for the environment condition (glucose/ethanol). Interaction effects between SNPs, factors and genes. DNA transcription mrna [ translation proteins SNPs ATGACCTGAAACTGGGGGACTGACGTGGAACGGT ATGACCTGCAACTGGGGGACTGACGTGCAACGGT ATGACCTGCAACTGGGGGACTGACGTGCAACGGT ATGACCTGAAACTGGGGGATTGACGTGGAACGGT ATGACCTGCAACTGGGGGATTGACGTGCAACGGT ATGACCTGCAACTGGGGGATTGACGTGCAACGGT observed & hidden external influenes experimental procedures environment gene regulation alternative splicing small RNAs O. Stegle Latent variable models for GWAs Tübingen 28

66 Modeling unobserved cellular phenotypes in genetic analyses Applications Application to yeast Factor associations Genome-wide association density. # of associations 10 1 Yeastract KEGG Freeform Trans genes # of associations I II III IV V VI VII VIII IX X XI XII XIII XIV XV XVI O. Stegle Latent Chromosomal variable models for position GWAs Tübingen 29

67 Modeling unobserved Growth cellular condition phenotypes in genetic analyses Application to yeast Factor interactions Segregating loci (2956) Genotype... Segregants (218) Applications Interaction tests between all gene/snp/factor triplets. Segregant count PHO4 activation (c) PHO4 factor activation Genotype-factor interaction Segregating loci (2956) Genes (5493) Genotype... Gene expression... Segregants (218) YJL213W expression Interaction LOD = 35.0 (PHO84 locus) BY Glu RM Glu BY Eth RM Eth PHO4 activation O. Stegle Latent variable models for GWAs Tübingen 30

68 Modeling unobserved cellular phenotypes in genetic analyses Applications Application to yeast Factor interactions Example interactions. (a) YAP1-IRA2 interaction (b) PHO4-PHO84 interaction (c) MAT-YOX1 interaction O. Stegle Latent variable models for GWAs Tübingen 30

69 Modeling unobserved cellular phenotypes in genetic analyses Applications Application to yeast Factor interactions Genome-wide interaction density. # of interacting genes 1500 OAF AMN1 Environment Yeastract KEGG Freeform PHO84 HAP1 I II III IV V VI VII VIII IX X XI XII Chromosomal position XIII XIV XV XVI IRA2 MKT1 CIN5 O. Stegle Latent variable models for GWAs Tübingen 30

70 A unifying view Outline Motivation Dimension reduction and the Gaussian Process Latent Variable Model (GPLVM) Modeling hidden confounders in GWAs Model Applications Modeling unobserved cellular phenotypes in genetic analyses Model Applications A unifying view Summary O. Stegle Latent variable models for GWAs Tübingen 31

71 A unifying view Matrix variate normal distributions S Confounders induces structure between samples. Gene regulation induces structure between genes. Matrix variate models for joint correction. samples 1 genes confounders hidden known O. Stegle Latent variable models for GWAs Tübingen 32

72 A unifying view Matrix variate normal distributions Confounders induces structure between samples. Gene regulation induces structure between genes. Matrix variate models for joint correction. samples S genes O. Stegle Latent variable models for GWAs Tübingen 32

73 A unifying view Matrix variate normal distributions Confounders induces structure between samples. Gene regulation induces structure between genes. Matrix variate models for joint correction. O. Stegle Latent variable models for GWAs Tübingen 32

74 A unifying view Matrix variate normal distributions S samples genes Row covariances (samples) K Column covariances (genes) Λ 1 O. Stegle Latent variable models for GWAs Tübingen 33

75 A unifying view Matrix variate normal distributions S samples genes Row covariances (samples) K 0 sampless Column covariances (genes) Λ sampless O. Stegle Latent variable models for GWAs Tübingen 33

76 A unifying view Matrix variate normal distributions S samples genes Row covariances (samples) K sampless Column covariances (genes) Λ 1 genes sampless O. Stegle Latent variable models for GWAs genes Tübingen 33

77 A unifying view Matrix variate normal distributions S samples genes Row covariances (samples) K sampless Column covariances (genes) Λ 1 genes sampless O. Stegle Latent variable models for GWAs genes Tübingen 33

78 A unifying view Matrix variate distributions Matrix variate distribution with column covariance Λ 1 and row covariance K p(y µ = 0, Λ 1, K) exp ( 0.5tr [ ΛY T K 1 Y ]) Equivalent to Kronecker covariance structure on vectorized data p(vecy µ = 0, Λ 1, K) = N ( vecy 0, Λ 1 K ) O. Stegle Latent variable models for GWAs Tübingen 34

79 A unifying view Matrix variate distributions Matrix variate distribution with column covariance Λ 1 and row covariance K p(y µ = 0, Λ 1, K) exp ( 0.5tr [ ΛY T K 1 Y ]) Equivalent to Kronecker covariance structure on vectorized data p(vecy µ = 0, Λ 1, K) = N ( vecy 0, Λ 1 K ) Drawing samples from a MN Kronecker matrix product 1. sample R Y r=1 g=1 G N (0, 1) 2. Y = chol(k) Y 3. Y = Y chol(λ 1 ) T a 11 B a 12 B... a 1H B a 21 B a 22 B... a 2H B A B = a G1 B a G2 B... a GH B O. Stegle Latent variable models for GWAs Tübingen 34

80 A unifying view Simulated example better recovery of the true graph Recovery of simulated network. Weak confounding variation (20% variance in row covariance). Precision GLasso Kronecker GLasso Ideal GLasso Recall ground truth GLasso O. Stegle Latent variable models for GWAs Tübingen 35

81 A unifying view Simulated example better recovery of the true graph Recovery of simulated network. Weak confounding variation (20% variance in row covariance). Precision GLasso Kronecker GLasso Ideal GLasso ground truth GLasso Recall Kornecker GLasso Ideal GLasso O. Stegle Latent variable models for GWAs Tübingen 35

82 Summary Outline Motivation Dimension reduction and the Gaussian Process Latent Variable Model (GPLVM) Modeling hidden confounders in GWAs Model Applications Modeling unobserved cellular phenotypes in genetic analyses Model Applications A unifying view Summary O. Stegle Latent variable models for GWAs Tübingen 36

83 Summary Summary Latent variables can have a dramatic effect in GWAs. In expression studies, confounders can be estimated from data. Cellular features can be learnt from the expression profiles. Duality of regulatory relationships and confounders in matrix variate normal models. O. Stegle Latent variable models for GWAs Tübingen 37

84 Summary References I N. Lawrence. Probabilistic non-linear principal component analysis with gaussian process latent variable models. The Journal of Machine Learning Research, 6: , L. Parts, O. Stegle, J. Winn, and R. Durbin. Joint genetic analysis of gene expression data with inferred cellular phenotypes. PLoS genetics, 7(1):e , E. Smith and L. Kruglyak. Gene environment interaction in yeast gene expression. PLoS biology, 6(4):e83, M. Tipping and C. Bishop. Probabilistic principal component analysis. Journal of the Royal Statistical Society. Series B, Statistical Methodology, pages , O. Stegle Latent variable models for GWAs Tübingen 38

GWAS IV: Bayesian linear (variance component) models

GWAS IV: Bayesian linear (variance component) models GWAS IV: Bayesian linear (variance component) models Dr. Oliver Stegle Christoh Lippert Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen Summer 2011 Oliver Stegle GWAS IV: Bayesian

More information

Introduction to Bioinformatics

Introduction to Bioinformatics CSCI8980: Applied Machine Learning in Computational Biology Introduction to Bioinformatics Rui Kuang Department of Computer Science and Engineering University of Minnesota kuang@cs.umn.edu History of Bioinformatics

More information

GWAS V: Gaussian processes

GWAS V: Gaussian processes GWAS V: Gaussian processes Dr. Oliver Stegle Christoh Lippert Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen Summer 2011 Oliver Stegle GWAS V: Gaussian processes Summer 2011

More information

Computational Genomics. Systems biology. Putting it together: Data integration using graphical models

Computational Genomics. Systems biology. Putting it together: Data integration using graphical models 02-710 Computational Genomics Systems biology Putting it together: Data integration using graphical models High throughput data So far in this class we discussed several different types of high throughput

More information

Causal Graphical Models in Systems Genetics

Causal Graphical Models in Systems Genetics 1 Causal Graphical Models in Systems Genetics 2013 Network Analysis Short Course - UCLA Human Genetics Elias Chaibub Neto and Brian S Yandell July 17, 2013 Motivation and basic concepts 2 3 Motivation

More information

Pattern Recognition and Machine Learning

Pattern Recognition and Machine Learning Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability

More information

Causal Model Selection Hypothesis Tests in Systems Genetics

Causal Model Selection Hypothesis Tests in Systems Genetics 1 Causal Model Selection Hypothesis Tests in Systems Genetics Elias Chaibub Neto and Brian S Yandell SISG 2012 July 13, 2012 2 Correlation and Causation The old view of cause and effect... could only fail;

More information

Inferring Transcriptional Regulatory Networks from High-throughput Data

Inferring Transcriptional Regulatory Networks from High-throughput Data Inferring Transcriptional Regulatory Networks from High-throughput Data Lectures 9 Oct 26, 2011 CSE 527 Computational Biology, Fall 2011 Instructor: Su-In Lee TA: Christopher Miles Monday & Wednesday 12:00-1:20

More information

Probabilistic Models for Learning Data Representations. Andreas Damianou

Probabilistic Models for Learning Data Representations. Andreas Damianou Probabilistic Models for Learning Data Representations Andreas Damianou Department of Computer Science, University of Sheffield, UK IBM Research, Nairobi, Kenya, 23/06/2015 Sheffield SITraN Outline Part

More information

Inferring Transcriptional Regulatory Networks from Gene Expression Data II

Inferring Transcriptional Regulatory Networks from Gene Expression Data II Inferring Transcriptional Regulatory Networks from Gene Expression Data II Lectures 9 Oct 26, 2011 CSE 527 Computational Biology, Fall 2011 Instructor: Su-In Lee TA: Christopher Miles Monday & Wednesday

More information

GLOBEX Bioinformatics (Summer 2015) Genetic networks and gene expression data

GLOBEX Bioinformatics (Summer 2015) Genetic networks and gene expression data GLOBEX Bioinformatics (Summer 2015) Genetic networks and gene expression data 1 Gene Networks Definition: A gene network is a set of molecular components, such as genes and proteins, and interactions between

More information

Learning in Bayesian Networks

Learning in Bayesian Networks Learning in Bayesian Networks Florian Markowetz Max-Planck-Institute for Molecular Genetics Computational Molecular Biology Berlin Berlin: 20.06.2002 1 Overview 1. Bayesian Networks Stochastic Networks

More information

CS 4491/CS 7990 SPECIAL TOPICS IN BIOINFORMATICS

CS 4491/CS 7990 SPECIAL TOPICS IN BIOINFORMATICS CS 4491/CS 7990 SPECIAL TOPICS IN BIOINFORMATICS * Some contents are adapted from Dr. Hung Huang and Dr. Chengkai Li at UT Arlington Mingon Kang, Ph.D. Computer Science, Kennesaw State University Problems

More information

Chapter 15 Active Reading Guide Regulation of Gene Expression

Chapter 15 Active Reading Guide Regulation of Gene Expression Name: AP Biology Mr. Croft Chapter 15 Active Reading Guide Regulation of Gene Expression The overview for Chapter 15 introduces the idea that while all cells of an organism have all genes in the genome,

More information

Tutorial on Gaussian Processes and the Gaussian Process Latent Variable Model

Tutorial on Gaussian Processes and the Gaussian Process Latent Variable Model Tutorial on Gaussian Processes and the Gaussian Process Latent Variable Model (& discussion on the GPLVM tech. report by Prof. N. Lawrence, 06) Andreas Damianou Department of Neuro- and Computer Science,

More information

Complete all warm up questions Focus on operon functioning we will be creating operon models on Monday

Complete all warm up questions Focus on operon functioning we will be creating operon models on Monday Complete all warm up questions Focus on operon functioning we will be creating operon models on Monday 1. What is the Central Dogma? 2. How does prokaryotic DNA compare to eukaryotic DNA? 3. How is DNA

More information

Bayesian learning of sparse factor loadings

Bayesian learning of sparse factor loadings Magnus Rattray School of Computer Science, University of Manchester Bayesian Research Kitchen, Ambleside, September 6th 2008 Talk Outline Brief overview of popular sparsity priors Example application:

More information

Association studies and regression

Association studies and regression Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration

More information

Linear Regression (1/1/17)

Linear Regression (1/1/17) STA613/CBB540: Statistical methods in computational biology Linear Regression (1/1/17) Lecturer: Barbara Engelhardt Scribe: Ethan Hada 1. Linear regression 1.1. Linear regression basics. Linear regression

More information

Expression QTLs and Mapping of Complex Trait Loci. Paul Schliekelman Statistics Department University of Georgia

Expression QTLs and Mapping of Complex Trait Loci. Paul Schliekelman Statistics Department University of Georgia Expression QTLs and Mapping of Complex Trait Loci Paul Schliekelman Statistics Department University of Georgia Definitions: Genes, Loci and Alleles A gene codes for a protein. Proteins due everything.

More information

Bayesian Machine Learning

Bayesian Machine Learning Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 4 Occam s Razor, Model Construction, and Directed Graphical Models https://people.orie.cornell.edu/andrew/orie6741 Cornell University September

More information

Inferring Genetic Architecture of Complex Biological Processes

Inferring Genetic Architecture of Complex Biological Processes Inferring Genetic Architecture of Complex Biological Processes BioPharmaceutical Technology Center Institute (BTCI) Brian S. Yandell University of Wisconsin-Madison http://www.stat.wisc.edu/~yandell/statgen

More information

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008 MIT OpenCourseWare http://ocw.mit.edu 6.047 / 6.878 Computational Biology: Genomes, Networks, Evolution Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

More information

Computational Approaches to Statistical Genetics

Computational Approaches to Statistical Genetics Computational Approaches to Statistical Genetics GWAS I: Concepts and Probability Theory Christoph Lippert Dr. Oliver Stegle Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen

More information

Machine Learning Techniques for Computer Vision

Machine Learning Techniques for Computer Vision Machine Learning Techniques for Computer Vision Part 2: Unsupervised Learning Microsoft Research Cambridge x 3 1 0.5 0.2 0 0.5 0.3 0 0.5 1 ECCV 2004, Prague x 2 x 1 Overview of Part 2 Mixture models EM

More information

Predicting Protein Functions and Domain Interactions from Protein Interactions

Predicting Protein Functions and Domain Interactions from Protein Interactions Predicting Protein Functions and Domain Interactions from Protein Interactions Fengzhu Sun, PhD Center for Computational and Experimental Genomics University of Southern California Outline High-throughput

More information

Computational Biology: Basics & Interesting Problems

Computational Biology: Basics & Interesting Problems Computational Biology: Basics & Interesting Problems Summary Sources of information Biological concepts: structure & terminology Sequencing Gene finding Protein structure prediction Sources of information

More information

Research Statement on Statistics Jun Zhang

Research Statement on Statistics Jun Zhang Research Statement on Statistics Jun Zhang (junzhang@galton.uchicago.edu) My interest on statistics generally includes machine learning and statistical genetics. My recent work focus on detection and interpretation

More information

Deep learning / Ian Goodfellow, Yoshua Bengio and Aaron Courville. - Cambridge, MA ; London, Spis treści

Deep learning / Ian Goodfellow, Yoshua Bengio and Aaron Courville. - Cambridge, MA ; London, Spis treści Deep learning / Ian Goodfellow, Yoshua Bengio and Aaron Courville. - Cambridge, MA ; London, 2017 Spis treści Website Acknowledgments Notation xiii xv xix 1 Introduction 1 1.1 Who Should Read This Book?

More information

25 : Graphical induced structured input/output models

25 : Graphical induced structured input/output models 10-708: Probabilistic Graphical Models 10-708, Spring 2016 25 : Graphical induced structured input/output models Lecturer: Eric P. Xing Scribes: Raied Aljadaany, Shi Zong, Chenchen Zhu Disclaimer: A large

More information

Network Biology-part II

Network Biology-part II Network Biology-part II Jun Zhu, Ph. D. Professor of Genomics and Genetic Sciences Icahn Institute of Genomics and Multi-scale Biology The Tisch Cancer Institute Icahn Medical School at Mount Sinai New

More information

BTRY 7210: Topics in Quantitative Genomics and Genetics

BTRY 7210: Topics in Quantitative Genomics and Genetics BTRY 7210: Topics in Quantitative Genomics and Genetics Jason Mezey Biological Statistics and Computational Biology (BSCB) Department of Genetic Medicine jgm45@cornell.edu February 12, 2015 Lecture 3:

More information

Factor Analysis (10/2/13)

Factor Analysis (10/2/13) STA561: Probabilistic machine learning Factor Analysis (10/2/13) Lecturer: Barbara Engelhardt Scribes: Li Zhu, Fan Li, Ni Guan Factor Analysis Factor analysis is related to the mixture models we have studied.

More information

Bioinformatics 2 - Lecture 4

Bioinformatics 2 - Lecture 4 Bioinformatics 2 - Lecture 4 Guido Sanguinetti School of Informatics University of Edinburgh February 14, 2011 Sequences Many data types are ordered, i.e. you can naturally say what is before and what

More information

Physical network models and multi-source data integration

Physical network models and multi-source data integration Physical network models and multi-source data integration Chen-Hsiang Yeang MIT AI Lab Cambridge, MA 02139 chyeang@ai.mit.edu Tommi Jaakkola MIT AI Lab Cambridge, MA 02139 tommi@ai.mit.edu September 30,

More information

Modelling Transcriptional Regulation with Gaussian Processes

Modelling Transcriptional Regulation with Gaussian Processes Modelling Transcriptional Regulation with Gaussian Processes Neil Lawrence School of Computer Science University of Manchester Joint work with Magnus Rattray and Guido Sanguinetti 8th March 7 Outline Application

More information

Identifying Signaling Pathways

Identifying Signaling Pathways These slides, excluding third-party material, are licensed under CC BY-NC 4.0 by Anthony Gitter, Mark Craven, Colin Dewey Identifying Signaling Pathways BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2018

More information

Probability models for machine learning. Advanced topics ML4bio 2016 Alan Moses

Probability models for machine learning. Advanced topics ML4bio 2016 Alan Moses Probability models for machine learning Advanced topics ML4bio 2016 Alan Moses What did we cover in this course so far? 4 major areas of machine learning: Clustering Dimensionality reduction Classification

More information

Latent Variable Methods for the Analysis of Genomic Data

Latent Variable Methods for the Analysis of Genomic Data John D. Storey Center for Statistics and Machine Learning & Lewis-Sigler Institute for Integrative Genomics Latent Variable Methods for the Analysis of Genomic Data http://genomine.org/talks/ Data m variables

More information

The Monte Carlo Method: Bayesian Networks

The Monte Carlo Method: Bayesian Networks The Method: Bayesian Networks Dieter W. Heermann Methods 2009 Dieter W. Heermann ( Methods)The Method: Bayesian Networks 2009 1 / 18 Outline 1 Bayesian Networks 2 Gene Expression Data 3 Bayesian Networks

More information

A Penalized Regression Model for the Joint Estimation of eqtl Associations and Gene Network Structure

A Penalized Regression Model for the Joint Estimation of eqtl Associations and Gene Network Structure A Penalized Regression Model for the Joint Estimation of eqtl Associations and Gene Network Structure Micol Marchetti-Bowick Machine Learning Department Carnegie Mellon University micolmb@cs.cmu.edu Abstract

More information

Non-Parametric Bayes

Non-Parametric Bayes Non-Parametric Bayes Mark Schmidt UBC Machine Learning Reading Group January 2016 Current Hot Topics in Machine Learning Bayesian learning includes: Gaussian processes. Approximate inference. Bayesian

More information

Quantile based Permutation Thresholds for QTL Hotspots. Brian S Yandell and Elias Chaibub Neto 17 March 2012

Quantile based Permutation Thresholds for QTL Hotspots. Brian S Yandell and Elias Chaibub Neto 17 March 2012 Quantile based Permutation Thresholds for QTL Hotspots Brian S Yandell and Elias Chaibub Neto 17 March 2012 2012 Yandell 1 Fisher on inference We may at once admit that any inference from the particular

More information

PCA and admixture models

PCA and admixture models PCA and admixture models CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar, Alkes Price PCA and admixture models 1 / 57 Announcements HW1

More information

Course 495: Advanced Statistical Machine Learning/Pattern Recognition

Course 495: Advanced Statistical Machine Learning/Pattern Recognition Course 495: Advanced Statistical Machine Learning/Pattern Recognition Goal (Lecture): To present Probabilistic Principal Component Analysis (PPCA) using both Maximum Likelihood (ML) and Expectation Maximization

More information

Bayesian Regression (1/31/13)

Bayesian Regression (1/31/13) STA613/CBB540: Statistical methods in computational biology Bayesian Regression (1/31/13) Lecturer: Barbara Engelhardt Scribe: Amanda Lea 1 Bayesian Paradigm Bayesian methods ask: given that I have observed

More information

Discovering molecular pathways from protein interaction and ge

Discovering molecular pathways from protein interaction and ge Discovering molecular pathways from protein interaction and gene expression data 9-4-2008 Aim To have a mechanism for inferring pathways from gene expression and protein interaction data. Motivation Why

More information

Predicting causal effects in large-scale systems from observational data

Predicting causal effects in large-scale systems from observational data nature methods Predicting causal effects in large-scale systems from observational data Marloes H Maathuis 1, Diego Colombo 1, Markus Kalisch 1 & Peter Bühlmann 1,2 Supplementary figures and text: Supplementary

More information

CS-E3210 Machine Learning: Basic Principles

CS-E3210 Machine Learning: Basic Principles CS-E3210 Machine Learning: Basic Principles Lecture 4: Regression II slides by Markus Heinonen Department of Computer Science Aalto University, School of Science Autumn (Period I) 2017 1 / 61 Today s introduction

More information

Introduction to Probabilistic Machine Learning

Introduction to Probabilistic Machine Learning Introduction to Probabilistic Machine Learning Piyush Rai Dept. of CSE, IIT Kanpur (Mini-course 1) Nov 03, 2015 Piyush Rai (IIT Kanpur) Introduction to Probabilistic Machine Learning 1 Machine Learning

More information

Latent Variable Models and EM Algorithm

Latent Variable Models and EM Algorithm SC4/SM8 Advanced Topics in Statistical Machine Learning Latent Variable Models and EM Algorithm Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/atsml/

More information

Written Exam 15 December Course name: Introduction to Systems Biology Course no

Written Exam 15 December Course name: Introduction to Systems Biology Course no Technical University of Denmark Written Exam 15 December 2008 Course name: Introduction to Systems Biology Course no. 27041 Aids allowed: Open book exam Provide your answers and calculations on separate

More information

Expression Data Exploration: Association, Patterns, Factors & Regression Modelling

Expression Data Exploration: Association, Patterns, Factors & Regression Modelling Expression Data Exploration: Association, Patterns, Factors & Regression Modelling Exploring gene expression data Scale factors, median chip correlation on gene subsets for crude data quality investigation

More information

Gaussian Process Dynamical Models Jack M Wang, David J Fleet, Aaron Hertzmann, NIPS 2005

Gaussian Process Dynamical Models Jack M Wang, David J Fleet, Aaron Hertzmann, NIPS 2005 Gaussian Process Dynamical Models Jack M Wang, David J Fleet, Aaron Hertzmann, NIPS 2005 Presented by Piotr Mirowski CBLL meeting, May 6, 2009 Courant Institute of Mathematical Sciences, New York University

More information

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature

More information

Integrated Non-Factorized Variational Inference

Integrated Non-Factorized Variational Inference Integrated Non-Factorized Variational Inference Shaobo Han, Xuejun Liao and Lawrence Carin Duke University February 27, 2014 S. Han et al. Integrated Non-Factorized Variational Inference February 27, 2014

More information

Causal Discovery by Computer

Causal Discovery by Computer Causal Discovery by Computer Clark Glymour Carnegie Mellon University 1 Outline 1. A century of mistakes about causation and discovery: 1. Fisher 2. Yule 3. Spearman/Thurstone 2. Search for causes is statistical

More information

Relevance Vector Machines

Relevance Vector Machines LUT February 21, 2011 Support Vector Machines Model / Regression Marginal Likelihood Regression Relevance vector machines Exercise Support Vector Machines The relevance vector machine (RVM) is a bayesian

More information

Multiple regression. CM226: Machine Learning for Bioinformatics. Fall Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar

Multiple regression. CM226: Machine Learning for Bioinformatics. Fall Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Multiple regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Multiple regression 1 / 36 Previous two lectures Linear and logistic

More information

Introduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin

Introduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin 1 Introduction to Machine Learning PCA and Spectral Clustering Introduction to Machine Learning, 2013-14 Slides: Eran Halperin Singular Value Decomposition (SVD) The singular value decomposition (SVD)

More information

Linear Factor Models. Sargur N. Srihari

Linear Factor Models. Sargur N. Srihari Linear Factor Models Sargur N. srihari@cedar.buffalo.edu 1 Topics in Linear Factor Models Linear factor model definition 1. Probabilistic PCA and Factor Analysis 2. Independent Component Analysis (ICA)

More information

Variable Selection in Structured High-dimensional Covariate Spaces

Variable Selection in Structured High-dimensional Covariate Spaces Variable Selection in Structured High-dimensional Covariate Spaces Fan Li 1 Nancy Zhang 2 1 Department of Health Care Policy Harvard University 2 Department of Statistics Stanford University May 14 2007

More information

Recent Advances in Bayesian Inference Techniques

Recent Advances in Bayesian Inference Techniques Recent Advances in Bayesian Inference Techniques Christopher M. Bishop Microsoft Research, Cambridge, U.K. research.microsoft.com/~cmbishop SIAM Conference on Data Mining, April 2004 Abstract Bayesian

More information

REVIEW SESSION. Wednesday, September 15 5:30 PM SHANTZ 242 E

REVIEW SESSION. Wednesday, September 15 5:30 PM SHANTZ 242 E REVIEW SESSION Wednesday, September 15 5:30 PM SHANTZ 242 E Gene Regulation Gene Regulation Gene expression can be turned on, turned off, turned up or turned down! For example, as test time approaches,

More information

What is the expectation maximization algorithm?

What is the expectation maximization algorithm? primer 2008 Nature Publishing Group http://www.nature.com/naturebiotechnology What is the expectation maximization algorithm? Chuong B Do & Serafim Batzoglou The expectation maximization algorithm arises

More information

Graphical Models and Kernel Methods

Graphical Models and Kernel Methods Graphical Models and Kernel Methods Jerry Zhu Department of Computer Sciences University of Wisconsin Madison, USA MLSS June 17, 2014 1 / 123 Outline Graphical Models Probabilistic Inference Directed vs.

More information

Eukaryotic Gene Expression

Eukaryotic Gene Expression Eukaryotic Gene Expression Lectures 22-23 Several Features Distinguish Eukaryotic Processes From Mechanisms in Bacteria 123 Eukaryotic Gene Expression Several Features Distinguish Eukaryotic Processes

More information

Markov Chain Monte Carlo Algorithms for Gaussian Processes

Markov Chain Monte Carlo Algorithms for Gaussian Processes Markov Chain Monte Carlo Algorithms for Gaussian Processes Michalis K. Titsias, Neil Lawrence and Magnus Rattray School of Computer Science University of Manchester June 8 Outline Gaussian Processes Sampling

More information

Asymptotic distribution of the largest eigenvalue with application to genetic data

Asymptotic distribution of the largest eigenvalue with application to genetic data Asymptotic distribution of the largest eigenvalue with application to genetic data Chong Wu University of Minnesota September 30, 2016 T32 Journal Club Chong Wu 1 / 25 Table of Contents 1 Background Gene-gene

More information

Bayesian Sparse Correlated Factor Analysis

Bayesian Sparse Correlated Factor Analysis Bayesian Sparse Correlated Factor Analysis 1 Abstract In this paper, we propose a new sparse correlated factor model under a Bayesian framework that intended to model transcription factor regulation in

More information

BME 5742 Biosystems Modeling and Control

BME 5742 Biosystems Modeling and Control BME 5742 Biosystems Modeling and Control Lecture 24 Unregulated Gene Expression Model Dr. Zvi Roth (FAU) 1 The genetic material inside a cell, encoded in its DNA, governs the response of a cell to various

More information

Heritability estimation in modern genetics and connections to some new results for quadratic forms in statistics

Heritability estimation in modern genetics and connections to some new results for quadratic forms in statistics Heritability estimation in modern genetics and connections to some new results for quadratic forms in statistics Lee H. Dicker Rutgers University and Amazon, NYC Based on joint work with Ruijun Ma (Rutgers),

More information

TOWARD ROBUST GROUP-WISE EQTL MAPPING VIA INTEGRATING MULTI-DOMAIN HETEROGENEOUS DATA. Wei Cheng. Chapel Hill 2015

TOWARD ROBUST GROUP-WISE EQTL MAPPING VIA INTEGRATING MULTI-DOMAIN HETEROGENEOUS DATA. Wei Cheng. Chapel Hill 2015 TOWARD ROBUST GROUP-WISE EQTL MAPPING VIA INTEGRATING MULTI-DOMAIN HETEROGENEOUS DATA Wei Cheng A dissertation submitted to the faculty of the University of North Carolina at Chapel Hill in partial fulfillment

More information

Nonparameteric Regression:

Nonparameteric Regression: Nonparameteric Regression: Nadaraya-Watson Kernel Regression & Gaussian Process Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro,

More information

Lecture 6: April 19, 2002

Lecture 6: April 19, 2002 EE596 Pat. Recog. II: Introduction to Graphical Models Spring 2002 Lecturer: Jeff Bilmes Lecture 6: April 19, 2002 University of Washington Dept. of Electrical Engineering Scribe: Huaning Niu,Özgür Çetin

More information

Prediction of double gene knockout measurements

Prediction of double gene knockout measurements Prediction of double gene knockout measurements Sofia Kyriazopoulou-Panagiotopoulou sofiakp@stanford.edu December 12, 2008 Abstract One way to get an insight into the potential interaction between a pair

More information

CELL BIOLOGY. by the numbers. Ron Milo. Rob Phillips. illustrated by. Nigel Orme

CELL BIOLOGY. by the numbers. Ron Milo. Rob Phillips. illustrated by. Nigel Orme CELL BIOLOGY by the numbers Ron Milo Rob Phillips illustrated by Nigel Orme viii Detailed Table of Contents List of Estimates xii Preface xv Acknowledgments xiii The Path to Biological Numeracy Why We

More information

Quantitative Biology Lecture 3

Quantitative Biology Lecture 3 23 nd Sep 2015 Quantitative Biology Lecture 3 Gurinder Singh Mickey Atwal Center for Quantitative Biology Summary Covariance, Correlation Confounding variables (Batch Effects) Information Theory Covariance

More information

ECE521 Tutorial 11. Topic Review. ECE521 Winter Credits to Alireza Makhzani, Alex Schwing, Rich Zemel and TAs for slides. ECE521 Tutorial 11 / 4

ECE521 Tutorial 11. Topic Review. ECE521 Winter Credits to Alireza Makhzani, Alex Schwing, Rich Zemel and TAs for slides. ECE521 Tutorial 11 / 4 ECE52 Tutorial Topic Review ECE52 Winter 206 Credits to Alireza Makhzani, Alex Schwing, Rich Zemel and TAs for slides ECE52 Tutorial ECE52 Winter 206 Credits to Alireza / 4 Outline K-means, PCA 2 Bayesian

More information

Inferring Latent Functions with Gaussian Processes in Differential Equations

Inferring Latent Functions with Gaussian Processes in Differential Equations Inferring Latent Functions with in Differential Equations Neil Lawrence School of Computer Science University of Manchester Joint work with Magnus Rattray and Guido Sanguinetti 9th December 26 Outline

More information

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.01 Lecture16: Population structure and logistic regression I Jason Mezey jgm45@cornell.edu April 11, 2017 (T) 8:40-9:55 Announcements I April

More information

Proteomics. 2 nd semester, Department of Biotechnology and Bioinformatics Laboratory of Nano-Biotechnology and Artificial Bioengineering

Proteomics. 2 nd semester, Department of Biotechnology and Bioinformatics Laboratory of Nano-Biotechnology and Artificial Bioengineering Proteomics 2 nd semester, 2013 1 Text book Principles of Proteomics by R. M. Twyman, BIOS Scientific Publications Other Reference books 1) Proteomics by C. David O Connor and B. David Hames, Scion Publishing

More information

CSci 8980: Advanced Topics in Graphical Models Gaussian Processes

CSci 8980: Advanced Topics in Graphical Models Gaussian Processes CSci 8980: Advanced Topics in Graphical Models Gaussian Processes Instructor: Arindam Banerjee November 15, 2007 Gaussian Processes Outline Gaussian Processes Outline Parametric Bayesian Regression Gaussian

More information

Based on slides by Richard Zemel

Based on slides by Richard Zemel CSC 412/2506 Winter 2018 Probabilistic Learning and Reasoning Lecture 3: Directed Graphical Models and Latent Variables Based on slides by Richard Zemel Learning outcomes What aspects of a model can we

More information

Dimension Reduction (PCA, ICA, CCA, FLD,

Dimension Reduction (PCA, ICA, CCA, FLD, Dimension Reduction (PCA, ICA, CCA, FLD, Topic Models) Yi Zhang 10-701, Machine Learning, Spring 2011 April 6 th, 2011 Parts of the PCA slides are from previous 10-701 lectures 1 Outline Dimension reduction

More information

Efficient Variational Inference in Large-Scale Bayesian Compressed Sensing

Efficient Variational Inference in Large-Scale Bayesian Compressed Sensing Efficient Variational Inference in Large-Scale Bayesian Compressed Sensing George Papandreou and Alan Yuille Department of Statistics University of California, Los Angeles ICCV Workshop on Information

More information

Gene Regulatory Networks II Computa.onal Genomics Seyoung Kim

Gene Regulatory Networks II Computa.onal Genomics Seyoung Kim Gene Regulatory Networks II 02-710 Computa.onal Genomics Seyoung Kim Goal: Discover Structure and Func;on of Complex systems in the Cell Identify the different regulators and their target genes that are

More information

GAUSSIAN PROCESS REGRESSION

GAUSSIAN PROCESS REGRESSION GAUSSIAN PROCESS REGRESSION CSE 515T Spring 2015 1. BACKGROUND The kernel trick again... The Kernel Trick Consider again the linear regression model: y(x) = φ(x) w + ε, with prior p(w) = N (w; 0, Σ). The

More information

Grouping of correlated feature vectors using treelets

Grouping of correlated feature vectors using treelets Grouping of correlated feature vectors using treelets Jing Xiang Department of Machine Learning Carnegie Mellon University Pittsburgh, PA 15213 jingx@cs.cmu.edu Abstract In many applications, features

More information

Latent Variable Models and EM algorithm

Latent Variable Models and EM algorithm Latent Variable Models and EM algorithm SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic 3.1 Clustering and Mixture Modelling K-means and hierarchical clustering are non-probabilistic

More information

Causal inference (with statistical uncertainty) based on invariance: exploiting the power of heterogeneous data

Causal inference (with statistical uncertainty) based on invariance: exploiting the power of heterogeneous data Causal inference (with statistical uncertainty) based on invariance: exploiting the power of heterogeneous data Peter Bühlmann joint work with Jonas Peters Nicolai Meinshausen ... and designing new perturbation

More information

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics STA414/2104 Lecture 11: Gaussian Processes Department of Statistics www.utstat.utoronto.ca Delivered by Mark Ebden with thanks to Russ Salakhutdinov Outline Gaussian Processes Exam review Course evaluations

More information

Manifold Learning for Signal and Visual Processing Lecture 9: Probabilistic PCA (PPCA), Factor Analysis, Mixtures of PPCA

Manifold Learning for Signal and Visual Processing Lecture 9: Probabilistic PCA (PPCA), Factor Analysis, Mixtures of PPCA Manifold Learning for Signal and Visual Processing Lecture 9: Probabilistic PCA (PPCA), Factor Analysis, Mixtures of PPCA Radu Horaud INRIA Grenoble Rhone-Alpes, France Radu.Horaud@inria.fr http://perception.inrialpes.fr/

More information

Modelling gene expression dynamics with Gaussian processes

Modelling gene expression dynamics with Gaussian processes Modelling gene expression dynamics with Gaussian processes Regulatory Genomics and Epigenomics March th 6 Magnus Rattray Faculty of Life Sciences University of Manchester Talk Outline Introduction to Gaussian

More information

Linear Regression and Discrimination

Linear Regression and Discrimination Linear Regression and Discrimination Kernel-based Learning Methods Christian Igel Institut für Neuroinformatik Ruhr-Universität Bochum, Germany http://www.neuroinformatik.rub.de July 16, 2009 Christian

More information

Introduction to Gaussian Processes

Introduction to Gaussian Processes Introduction to Gaussian Processes Neil D. Lawrence GPSS 10th June 2013 Book Rasmussen and Williams (2006) Outline The Gaussian Density Covariance from Basis Functions Basis Function Representations Constructing

More information

Prokaryotic Gene Expression (Learning Objectives)

Prokaryotic Gene Expression (Learning Objectives) Prokaryotic Gene Expression (Learning Objectives) 1. Learn how bacteria respond to changes of metabolites in their environment: short-term and longer-term. 2. Compare and contrast transcriptional control

More information

Data Analysis and Manifold Learning Lecture 6: Probabilistic PCA and Factor Analysis

Data Analysis and Manifold Learning Lecture 6: Probabilistic PCA and Factor Analysis Data Analysis and Manifold Learning Lecture 6: Probabilistic PCA and Factor Analysis Radu Horaud INRIA Grenoble Rhone-Alpes, France Radu.Horaud@inrialpes.fr http://perception.inrialpes.fr/ Outline of Lecture

More information

Chris Bishop s PRML Ch. 8: Graphical Models

Chris Bishop s PRML Ch. 8: Graphical Models Chris Bishop s PRML Ch. 8: Graphical Models January 24, 2008 Introduction Visualize the structure of a probabilistic model Design and motivate new models Insights into the model s properties, in particular

More information

Gaussian with mean ( µ ) and standard deviation ( σ)

Gaussian with mean ( µ ) and standard deviation ( σ) Slide from Pieter Abbeel Gaussian with mean ( µ ) and standard deviation ( σ) 10/6/16 CSE-571: Robotics X ~ N( µ, σ ) Y ~ N( aµ + b, a σ ) Y = ax + b + + + + 1 1 1 1 1 1 1 1 1 1, ~ ) ( ) ( ), ( ~ ), (

More information