Some Statistical Models and Algorithms for Change-Point Problems in Genomics


S. Robin, UMR 518 AgroParisTech / INRA, Applied Mathematics & Computational Sciences. Journées SMAI-MAIRCI, Grenoble, September 2012.

Outline
1. Biological Motivation
2. Frequentist Inference: One Profile, Several Series
3. Bayesian Inference

Biological (and Statistical) Motivation

Genomic Alterations in Cancer Cells
Genomic alterations are associated with (responsible for?) various types of cancers. (Figure: karyotypes of a normal cell vs. a tumor cell; source: Hupé (2008).)

Array CGH Technology: Principle (figure only).

Segmentation: The Basic Problem
Data at hand, for a given individual:
- a sequence of known positions along the reference genome, labeled $t = 1, 2, \dots, n$;
- a signal measured at each position, $Y_t$ = signal at position $t$, with $Y_t \approx f(\text{relative copy number at position } t)$: log-fluorescence ratio, sum of the two allele signals for a given SNP, etc.
(Figure: a log2-ratio profile along the genome, showing amplified, deleted and unaltered segments.)

Cancer Cell Genomic Alterations
(Figure: zoom on a CGH profile for chromosomes 1 and 17, with the corresponding karyotype.) CGH technology gives access to copy number variations (CNV), but does not allow the positioning of translocations (Hupé (2008)).

Genome Annotation

A bit of biology: gene coding sequences are organized in exons / introns.
RNA-seq: data provided by NGS should make it possible to locate exon boundaries precisely. (Figure: RNA-seq counts along a gene vs. known exons; source: genomebiology.com.)
Reconsidering annotation: hence the official annotation can be reconsidered.

(Statistical) Questions
Biological questions: How many change points? Where are the change points (and how precise is the location)? Do the change points have the same location in samples A and B? ...
Statistical issues: statistical modeling, computational efficiency, model selection.

Basic Segmentation Model
Statistical model: Signal = f(position).
- Breakpoint positions: $\tau_1, \tau_2, \dots, \tau_{K-1}$;
- Mean signal (e.g. relative copy number) within each segment: $\theta_1, \theta_2, \dots, \theta_K$;
- Observed signal: independent variables with segment-specific means, i.e. if $t \in r_k$, then $Y_t \sim F(\theta_k)$.
Possible choices for the distribution $F$: CGH arrays give a continuous signal (fluorescence), hence a Gaussian distribution; NGS gives a discrete signal (counts), hence a Poisson or negative binomial distribution.
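
To make the model concrete, here is a minimal simulation sketch (illustrative, not from the slides; the function name and parameters are ours): segment boundaries and means are fixed, and a Gaussian or Poisson observation is drawn at each position.

    import numpy as np

    def simulate_profile(n, breakpoints, means, family="gaussian", sigma=0.2, seed=0):
        """Simulate a piecewise-constant signal Y_1..Y_n.
        breakpoints: change-point positions tau_1 < ... < tau_{K-1} (0 < tau < n).
        means: segment means theta_1..theta_K (len(breakpoints) + 1 values)."""
        rng = np.random.default_rng(seed)
        bounds = [0] + list(breakpoints) + [n]
        y = np.empty(n)
        for k, theta in enumerate(means):
            lo, hi = bounds[k], bounds[k + 1]
            if family == "gaussian":       # array CGH: continuous log-ratios
                y[lo:hi] = rng.normal(theta, sigma, hi - lo)
            else:                          # NGS: counts (means must be positive)
                y[lo:hi] = rng.poisson(theta, hi - lo)
        return y

    y = simulate_profile(500, breakpoints=[120, 200, 430], means=[0.0, 0.8, -1.0, 0.1])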

Some Specificities
Two different types of parameters in these models:
- Segmentation (change-point locations): $T = (\tau_1, \dots, \tau_{K-1})$, a discrete parameter;
- Distribution parameters (within-segment means): $\Theta = (\theta_1, \dots, \theta_K)$, a continuous parameter.
Segmentation space: the number of possible segmentations $T$ with $K$ segments is huge, $|\mathcal{M}_K| = \binom{n-1}{K-1}$, so the systematic exploration of $\mathcal{M}_K$ is impossible.
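
A quick numerical illustration (ours, not from the slides) of how fast this space grows:

    from math import comb
    n, K = 10**5, 10
    print(comb(n - 1, K - 1))   # about 2.8e39 possible segmentations

Even for a modest profile of 100,000 positions and 10 segments, enumerating all segmentations is hopeless, which is why dynamic programming over segment boundaries is needed.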

Frequentist Framework
Frequentist principle: the parameters $T$ and $\Theta$ are fixed, and we want to provide estimates $\widehat{T}$ and $\widehat{\Theta}$ of their true values.
One classical approach: maximum likelihood estimation (MLE),
$(\widehat{T}, \widehat{\Theta}) = \arg\max_{T,\Theta} \log P(Y; T, \Theta)$,
where $Y$ stands for the whole data set $Y = (Y_1, \dots, Y_n)$. This is a hybrid optimization problem with both discrete and continuous arguments.

One Profile
Likelihood. Thanks to the independence assumptions, we have
$\log P(Y; T, \Theta) = \sum_k \sum_{t \in r_k} \log P(Y_t; \theta_k)$.
Parameter estimates $\widehat{\Theta}$. For any segment $r_k$, the MLE of $\theta_k$ is
$\widehat{\theta}_k = \arg\max_{\theta_k} \sum_{t \in r_k} \log P(Y_t; \theta_k)$, e.g. $\widehat{\theta}_k$ = mean of the signal within segment $r_k$.
Segmentation estimate $\widehat{T}$. Finding $\widehat{T}$ is equivalent to a shortest-path problem where $-\sum_{t \in r_k} \log P(Y_t; \widehat{\theta}_k)$ is the cost of segment $r_k$. Because the cost is additive over segments, the problem can be solved via dynamic programming with time complexity $O(Kn^2)$ and space complexity $O(Kn)$.
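
A minimal sketch of this dynamic program for the Gaussian case (illustrative; the function name and the prefix-sum trick for the segment cost are ours): the cost of segment (i, j] is the residual sum of squares around the segment mean, obtained in O(1) from cumulative sums.

    import numpy as np

    def dp_segmentation(y, K):
        """Optimal segmentation of y into K segments (Gaussian cost),
        via dynamic programming in O(K n^2) time."""
        n = len(y)
        s1 = np.concatenate(([0.0], np.cumsum(y)))        # prefix sums
        s2 = np.concatenate(([0.0], np.cumsum(y ** 2)))   # prefix sums of squares

        def cost(i, j):  # residual sum of squares of segment y[i:j], j > i
            m = j - i
            return s2[j] - s2[i] - (s1[j] - s1[i]) ** 2 / m

        C = np.full((K + 1, n + 1), np.inf)   # C[k, j] = best cost of y[:j] in k segments
        back = np.zeros((K + 1, n + 1), dtype=int)
        C[0, 0] = 0.0
        for k in range(1, K + 1):
            for j in range(k, n + 1):
                cands = [C[k - 1, i] + cost(i, j) for i in range(k - 1, j)]
                i_best = int(np.argmin(cands)) + (k - 1)
                C[k, j], back[k, j] = cands[i_best - (k - 1)], i_best
        # backtrack the change points
        tau, j = [], n
        for k in range(K, 0, -1):
            j = back[k, j]
            tau.append(j)
        return sorted(tau)[1:], C[K, n]   # change points tau_1..tau_{K-1}, total cost

With the simulated profile above, tau_hat, cost = dp_segmentation(y, K=4) should recover change points close to the true ones (120, 200, 430).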

How Many Change-Points?
The number of segments $K$ is not known a priori, and the fit of the segmentation improves as $K$ increases. Penalized likelihood is the most common strategy:
$\widehat{K} = \arg\min_K \{ -\log P(Y; K) + \mathrm{pen}(K) \}$.
(Figure: contrast as a function of $K$.) A huge literature has been developed (Lebarbier (2005), Lavielle (2005), Zhang and Siegmund (2007), ...) to ensure that $\Pr\{\widehat{K} = K\} \to 1$ when $n \to \infty$, or to approximate $P(K \mid Y)$.
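
As an illustration (ours, not one of the penalties from the cited papers), the dynamic program above can be run for K = 1..Kmax and the K minimizing a penalized contrast retained; the BIC-like penalty constant below is a rough placeholder.

    import numpy as np

    def select_K(y, K_max, penalty_weight=None):
        """Pick the number of segments by penalized contrast (illustrative sketch)."""
        n = len(y)
        if penalty_weight is None:
            penalty_weight = 2.0 * np.var(y) * np.log(n)   # crude BIC-like constant
        crit = []
        for K in range(1, K_max + 1):
            _, rss = dp_segmentation(y, K)                  # contrast from the DP sketch above
            crit.append(rss + penalty_weight * K)
        return int(np.argmin(crit)) + 1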

Speeding up Dynamic Programming
Quadratic complexity $O(Kn^2)$ is not acceptable for very large signals (e.g. NGS for DNA-seq, where $n = 10^6$ or more). Pruned DP can strongly reduce the computational burden (Rigaill (2010)): its theoretical worst-case complexity is the same as regular DP, but its mean empirical complexity is almost linear, $O(Kn \log n)$.
Trick: do not compute the parameter estimates $\widehat{\Theta}$ (see the One Profile slide above) before performing the segmentation. This is only applicable for nice models (e.g. Gaussian, Poisson, negative binomial with known $\phi$, ...).

Breast Cancer Data (collaboration Curie / Servier)
(Figure: chromosomes 10 and 8 of two breast carcinomas (TNBC); Rigaill (2011).)

Several Series
Cancer studies are most often carried out on cohorts involving numerous CGH profiles ($m \simeq 100$). The goal is then to detect recurrent alterations among patients carrying the same type of cancer. As all the profiles are obtained with the same technology (same CGH probes, same NGS sequences), correlations between profiles are likely to exist (Picard et al. (2011b)).

Algorithmic Issue (graphical model representation)
- One series: handled with classical DP.
- Independent series: handled with two-stage DP.
- Correlation, same change points in all series: same complexity as for one series, classical DP.
- Correlation, different change points: the likelihood is no longer additive, so DP cannot be applied as such.

Modeling the Dependency
Joint segmentation model. In the Gaussian framework, we can write
$Y_{tm} = \mu_{km} + F_{tm}$, for $t \in r_k^m$,
where $m$ indexes the series (profiles) and $r_k^m$ is the $k$-th segment of series $m$.
Factor model (random effect model), an old trick in multivariate Gaussian models:
- an unobserved random effect $Z_t = (Z_{t1}, \dots, Z_{tQ}) \sim \mathcal{N}(0, I)$ is associated with each position (probe);
- the residual terms $F_t$ at position $t$ all depend on this effect: $F_t = B Z_t + E_t$, with $\{E_t\}$ i.i.d. $\mathcal{N}(0, \sigma^2 I)$;
- so the residuals at position $t$ have covariance matrix $\Sigma = B B^\top + \sigma^2 I$. This allows any covariance structure to be captured (e.g. taking $Q = m - 1$).
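
A small sketch (ours) of the residual factor model: correlated residuals across the m series are generated from Q latent probe effects, and the implied covariance across series is B B' + sigma^2 I.

    import numpy as np

    def simulate_factor_residuals(n, m, Q, sigma=0.2, seed=1):
        """Residuals F_t = B Z_t + E_t shared across m series at n positions."""
        rng = np.random.default_rng(seed)
        B = rng.normal(size=(m, Q))                 # loading matrix (m x Q)
        Z = rng.normal(size=(n, Q))                 # latent probe effects, N(0, I)
        E = rng.normal(scale=sigma, size=(n, m))    # independent noise
        F = Z @ B.T + E                             # n x m residual matrix
        Sigma = B @ B.T + sigma**2 * np.eye(m)      # implied covariance across series
        return F, Sigma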

Breaking Down the Dependency
- Marginal likelihood: $\log p(Y)$ is not additive w.r.t. the segments $r_k^m$.
- Joint likelihood: $\log p(Y, Z)$ is still not additive.
- Conditional likelihood: $\log p(Y \mid Z)$ is additive w.r.t. the segments.

E-M Algorithm
Incomplete data model: in the factor model, the random effect $Z$ is unobserved.
E-M algorithm. To perform maximum likelihood inference, first notice that the target is
$(\widehat{T}, \widehat{\Theta}, \widehat{\Sigma}) = \arg\max_{T,\Theta,\Sigma} \log P(Y; T, \Theta, \Sigma)$, with
$\log P(Y; T, \Theta) = E[\log P(Y, Z; T, \Theta) \mid Y] - E[\log P(Z \mid Y; T, \Theta) \mid Y]$;
then, iteratively,
- E-step: compute the conditional distribution $P(Z \mid Y; \widehat{T}, \widehat{\Theta}, \widehat{\Sigma})$;
- M-step: update the estimates as
$(\widehat{T}, \widehat{\Theta}, \widehat{\Sigma}) = \arg\max_{T,\Theta,\Sigma} E[\log P(Y, Z; T, \Theta, \Sigma) \mid Y] = \arg\max_{T,\Theta,\Sigma} \{ E[\log P(Z; \Sigma) \mid Y] + E[\log P(Y \mid Z; T, \Theta) \mid Y] \}$,
which can be achieved with dynamic programming (the conditional term is additive over segments).
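
A schematic and deliberately simplified sketch of such an EM loop (ours; it is not the CGHseg algorithm): the E-step computes E[Z_t | Y] under the Gaussian factor model, and the M-step re-segments each probe-effect-corrected series with the one-profile DP and crudely refreshes the loadings and noise variance.

    import numpy as np

    def em_joint_segmentation(Y, K, Q, n_iter=20, sigma2=0.05):
        """Schematic EM for joint segmentation with a residual factor model.
        Y: n x m matrix of profiles. Simplified sketch: each series keeps its
        own K segments, and the B / sigma2 updates are crude least squares."""
        n, m = Y.shape
        rng = np.random.default_rng(0)
        B = 0.1 * rng.normal(size=(m, Q))      # small random init to avoid a degenerate start
        mu = np.zeros_like(Y, dtype=float)
        for _ in range(n_iter):
            # E-step: E[Z_t | Y] = B' Sigma^{-1} F_t with F_t = Y_t - mu_t
            F = Y - mu
            Sigma = B @ B.T + sigma2 * np.eye(m)
            EZ = F @ np.linalg.solve(Sigma, B)          # n x Q
            # M-step (i): segment each probe-effect-corrected series with the DP sketch above
            Yc = Y - EZ @ B.T
            for j in range(m):
                tau, _ = dp_segmentation(Yc[:, j], K)
                bounds = [0] + list(tau) + [n]
                for lo, hi in zip(bounds[:-1], bounds[1:]):
                    mu[lo:hi, j] = Yc[lo:hi, j].mean()
            # M-step (ii): crude updates of the loadings and the noise variance
            B = np.linalg.lstsq(EZ, Y - mu, rcond=None)[0].T
            sigma2 = np.mean((Y - mu - EZ @ B.T) ** 2)
        return mu, B, sigma2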

Bladder Cancer Data (collaboration Curie)
Some large probe effects $Z_t$ are detected: poor probe affinity? wrong annotation? polymorphism? Breakpoints near such positions are likely to be artifacts, and the global segmentation is affected by this correction. (Figure: independent segmentation vs. multivariate mixed segmentation.)

CGH-seg Package
CGHseg is an R package dedicated to the analysis of CGH profiles (Picard et al. (2011a)). Linear model framework:
- Segmentation, regression on the unknown $T$: $Y = T\mu + E$;
- Correlation, factor model (or uniform correlation): $Y = T\mu + BZ + E$;
- Correction for fixed covariate effects $\beta$: $Y = T\mu + X\beta + E$;
- Calling, with a cross-tabulation table $C$: $Y = TCm + E$;
- Combinations, e.g. fixed + random effects: $Y = T\mu + X\beta + LZ + E$, or fixed effects + calling: $Y = TCm + X\beta + E$.

Bayesian Framework
Limitation of the frequentist approach: due to the discrete nature of the change points, their inference is complex and confidence intervals cannot easily be derived.
Bayesian principle: all parameters $(K, T, \Theta)$ are random.
- Prior distribution of the parameters: $P(K)$, $P(T \mid K)$, $P(\Theta \mid T, K)$;
- Likelihood of the data given the parameters: $P(Y \mid K, T, \Theta)$;
- Posterior distribution of the parameters:
$P(K, T, \Theta \mid Y) = P(K, T, \Theta, Y) / P(Y)$, where
$P(K, T, \Theta, Y) = P(K)\, P(T \mid K)\, P(\Theta \mid T, K)\, P(Y \mid K, T, \Theta)$ and
$P(Y) = \sum_K \sum_{T \in \mathcal{M}_K} \int_\Theta P(K, T, \Theta, Y)\, d\Theta$.

Posterior Distributions
Biological questions and their Bayesian counterparts:
- How many change points? $P(K \mid Y)$;
- Where are the change points? $P(T \mid Y, K)$ or $P(T \mid Y)$;
- How precise is the location? $P(\tau_1 \mid Y, K)$ or $P(\tau_1 \mid Y)$;
- Do the change points have the same location in samples A and B? $P(\tau_1^A = \tau_1^B \mid Y^A, Y^B)$.
One common issue: the calculation of all these quantities requires that of
$P(Y \mid K) = \sum_{T \in \mathcal{M}_K} P(Y, T \mid K)$,
where $\mathcal{M}_K$ stands for the set of all segmentations with $K$ segments.

Calculating Exact Posterior Distributions
1 - Factorization assumption:
$P(T) = \prod_{r \in T} f(r)$, $\quad P(\Theta \mid T) = \prod_{r \in T} f(\theta_r)$, $\quad P(Y \mid T, \Theta) = \prod_{r \in T} f(Y^r, \theta_r)$.
2 - Conjugate prior: $\int P(Y^r \mid \theta_r)\, P(\theta_r)\, d\theta_r$ has a closed-form expression.
Sum-product form: the total probability can then be rewritten as
$P(Y \mid K) = \sum_{T \in \mathcal{M}_K} \int_\Theta P(Y, T, \Theta \mid K)\, d\Theta = \sum_{T \in \mathcal{M}_K} \prod_{r \in T} P(Y^r)$.
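
A concrete instance of the conjugacy requirement (our example, not from the slides): for Poisson counts in a segment $r$ with a Gamma$(a, b)$ prior on the segment mean, the segment marginal has a closed form,

    P(Y^r) = \int \prod_{t \in r} \frac{\theta^{Y_t} e^{-\theta}}{Y_t!}
             \cdot \frac{b^a}{\Gamma(a)}\, \theta^{a-1} e^{-b\theta}\, d\theta
           = \frac{b^a}{\Gamma(a)} \cdot \frac{\Gamma(a + S_r)}{(b + n_r)^{a + S_r}}
             \prod_{t \in r} \frac{1}{Y_t!},
    \qquad S_r = \sum_{t \in r} Y_t, \quad n_r = |r|.

Similar closed forms exist for the Gaussian case and for the negative binomial with known dispersion.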

Exploring the Segmentation Space
Computational consequence: $\sum_{T \in \mathcal{M}_K} \prod_{r \in T} P(Y^r)$ can be computed as $[A^K]_{0,n}$, where $A_{ij} = P(Y_{i+1}, \dots, Y_j)$, with complexity $O(Kn^2)$ (Rigaill et al. (2011)).
We are hence able to compute, with quadratic complexity:
- the probability that there is a breakpoint at position $t$;
- the probability for a segment $r = [t_1, t_2]$ to be part of the segmentation;
- the posterior entropy of $T$ within a given dimension $K$.
This is similar to an algorithm proposed by Guédon (2007) in the HMM context with known parameters.
Algorithmic comment: replacing products with sums (i.e. working with log-probabilities) and sums with min gives the dynamic programming algorithm used above for one-profile segmentation.
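
A minimal sketch (ours) of this matrix computation in the Poisson–Gamma case: A[i, j] holds the segment marginal P(Y_{i+1..j}) from the closed form above, and repeated products of A give P(Y | K) and posterior breakpoint probabilities; the prior on T and the normalization conventions are simplified relative to Rigaill et al. (2011).

    import numpy as np
    from scipy.special import gammaln, logsumexp

    def segment_matrix(y, a=1.0, b=1.0):
        """logA[i, j] = log P(Y_{i+1..j}) for Poisson counts with a Gamma(a, b) prior."""
        n = len(y)
        s = np.concatenate(([0.0], np.cumsum(y)))
        lfact = np.concatenate(([0.0], np.cumsum(gammaln(np.asarray(y) + 1.0))))
        logA = np.full((n + 1, n + 1), -np.inf)
        for i in range(n):
            for j in range(i + 1, n + 1):
                S, m = s[j] - s[i], j - i
                logA[i, j] = (a * np.log(b) - gammaln(a)
                              + gammaln(a + S) - (a + S) * np.log(b + m)
                              - (lfact[j] - lfact[i]))
        return logA

    def evidence_and_breakpoints(logA, K):
        """log P(Y | K) = log [A^K]_{0,n}, plus posterior breakpoint probabilities (K >= 2)."""
        n = logA.shape[0] - 1
        # F[k, j] = log sum over k-segment segmentations of Y_{1..j} of the product of P(Y^r)
        F = np.full((K + 1, n + 1), -np.inf); F[0, 0] = 0.0
        # G[k, i] = same quantity for Y_{i+1..n} split into k segments
        G = np.full((K + 1, n + 1), -np.inf); G[0, n] = 0.0
        for k in range(1, K + 1):
            for j in range(k, n + 1):
                F[k, j] = logsumexp(F[k - 1, :j] + logA[:j, j])
            for i in range(n - k, -1, -1):
                G[k, i] = logsumexp(logA[i, i + 1:] + G[k - 1, i + 1:])
        log_evidence = F[K, n]          # = log [A^K]_{0,n}
        # P(breakpoint at t | Y, K): sum over the number of segments before position t
        bp = np.array([logsumexp([F[k, t] + G[K - k, t] for k in range(1, K)])
                       for t in range(1, n)]) - log_evidence
        return log_evidence, np.exp(bp)

For instance, evidence_and_breakpoints(segment_matrix(counts), K=4) returns log P(Y | K = 4) and the posterior probability of a breakpoint at each interior position of a vector of counts.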

A CGH Profile: K = 3, 4 (figure only).

Presence of an Alteration
To assess the gain or loss of a given locus (e.g. a gene), one needs to know whether it is included within an alteration (see the breast cancer profiles above). The posterior probability of a segment can be computed as well:
$f(t_1, t_2) = \Pr\{\exists k : r_k = [t_1, t_2] \mid Y, K\}$.
(Figure: segment probabilities for K = 3 and K = 4.)

Model Selection: Exact BIC and ICL
Standard model selection criteria can be computed exactly (without, say, a Laplace approximation).
- Choice of $K$: exact version of $\mathrm{BIC}(K) = \log P(Y, K)$.
- Choice of $T$: exact version of $\mathrm{BIC}(T) = \log P(Y, T)$.
- Alternative choice of $K$: exact version of $\mathrm{ICL}(K)$ (Biernacki et al. (2000)),
$\mathrm{ICL}(K) = \log P(Y, K) + \sum_{T \in \mathcal{M}_K} P(T \mid Y, K) \log P(T \mid Y, K)$,
where the second term is (minus) the posterior entropy of $T$ within $\mathcal{M}_K$; this criterion favors dimensions $K$ for which the best segmentation $\widehat{T}_K$ clearly outperforms the others.

Comparison BIC/ICL: Simulation Study
Simulations: Poisson signal with alternating means $\lambda_0$ and $\lambda_1$, $n = 100$, true number of segments $K = 5$; the criterion is the percentage of recovery of the true number of segments, plotted against the contrast between $\lambda_1$ and $\lambda_0$ (with $\lambda_0 = 1$). BIC(T) achieves (much) better performance than BIC(K), and ICL(K) outperforms both BIC(K) and BIC(T) (Rigaill et al. (2011)).

Re-annotation
The posterior distribution of the change-point locations can be used to assess the genome annotation. It also makes it possible to study variations of the transcription start due to changes in conditions (Cleynen (2012)).

Alternative Splicing (collaboration Univ. Berkeley)
Alternative splicing refers to the modification of exon boundaries (or to exon skipping). Based on the posterior distribution of exon boundaries, a test for alternative splicing can be derived. The EBS package performs exact Bayesian segmentation and returns all the quantities described above: cran.r-project.org/web/packages/ebs

Discussion
Statistics are needed to assess the existence and the location of genomic alterations, or to support genome annotation. Algorithmic complexity is one major limitation of such analyses: how to manage profiles over $10^8$ nucleotides for a cohort of $10^3$ or $10^4$ patients?
Some related questions: assess the relation between an alteration with a given length and frequency in a cohort and a certain type of cancer; study the influence of various experimental conditions on alternative splicing, for genes with a large number of exons (10 or 20).

References
Biernacki, C., Celeux, G. and Govaert, G. (2000). Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans. Pattern Anal. Machine Intel. 22(7).
Guédon, Y. (2007). Exploring the state sequence space for hidden Markov and semi-Markov chains. Comput. Statist. and Data Analysis.
Hupé, P. (2008). Biostatistical algorithms for omics data in oncology: Application to DNA copy number microarray experiments. PhD thesis, AgroParisTech.
Lavielle, M. (2005). Using penalized contrasts for the change-point problem. Signal Proc.
Lebarbier, E. (2005). Detecting multiple change-points in the mean of Gaussian process by model selection. Signal Proc.
Picard, F., Lebarbier, E., Hoebeke, M., Rigaill, G., Thiam, B. and Robin, S. (2011a). Joint segmentation, calling, and normalization of multiple CGH profiles. Biostatistics. doi:10.1093/biostatistics/kxq075.
Picard, F., Lebarbier, E., Budinska, E. and Robin, S. (2011b). Joint segmentation of multivariate Gaussian processes using mixed linear models. Comput. Statist. and Data Analysis 55(2).
Rigaill, G. (2010). Pruned dynamic programming for optimal multiple change-point detection. Technical report, arXiv.
Rigaill, G. (2011). Statistical and algorithmic developments for the analysis of Triple Negative Breast Cancers. PhD thesis, ABIES.
Rigaill, G., Lebarbier, E. and Robin, S. (2011). Exact posterior distributions over the segmentation space and model selection for multiple change-point detection problem. Stat. Comp.
Zhang, N. R. and Siegmund, D. O. (2007). A modified Bayes information criterion with applications to the analysis of comparative genomic hybridization data. Biometrics 63(1).


More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning Empirical Bayes, Hierarchical Bayes Mark Schmidt University of British Columbia Winter 2017 Admin Assignment 5: Due April 10. Project description on Piazza. Final details coming

More information

Computational Genomics. Systems biology. Putting it together: Data integration using graphical models

Computational Genomics. Systems biology. Putting it together: Data integration using graphical models 02-710 Computational Genomics Systems biology Putting it together: Data integration using graphical models High throughput data So far in this class we discussed several different types of high throughput

More information

Variable Selection in Structured High-dimensional Covariate Spaces

Variable Selection in Structured High-dimensional Covariate Spaces Variable Selection in Structured High-dimensional Covariate Spaces Fan Li 1 Nancy Zhang 2 1 Department of Health Care Policy Harvard University 2 Department of Statistics Stanford University May 14 2007

More information

Lecture 8 Learning Sequence Motif Models Using Expectation Maximization (EM) Colin Dewey February 14, 2008

Lecture 8 Learning Sequence Motif Models Using Expectation Maximization (EM) Colin Dewey February 14, 2008 Lecture 8 Learning Sequence Motif Models Using Expectation Maximization (EM) Colin Dewey February 14, 2008 1 Sequence Motifs what is a sequence motif? a sequence pattern of biological significance typically

More information

Learning Bayesian network : Given structure and completely observed data

Learning Bayesian network : Given structure and completely observed data Learning Bayesian network : Given structure and completely observed data Probabilistic Graphical Models Sharif University of Technology Spring 2017 Soleymani Learning problem Target: true distribution

More information

CS Homework 3. October 15, 2009

CS Homework 3. October 15, 2009 CS 294 - Homework 3 October 15, 2009 If you have questions, contact Alexandre Bouchard (bouchard@cs.berkeley.edu) for part 1 and Alex Simma (asimma@eecs.berkeley.edu) for part 2. Also check the class website

More information

Linear Dynamical Systems

Linear Dynamical Systems Linear Dynamical Systems Sargur N. srihari@cedar.buffalo.edu Machine Learning Course: http://www.cedar.buffalo.edu/~srihari/cse574/index.html Two Models Described by Same Graph Latent variables Observations

More information

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features Yangxin Huang Department of Epidemiology and Biostatistics, COPH, USF, Tampa, FL yhuang@health.usf.edu January

More information

Hidden Markov Models. Terminology, Representation and Basic Problems

Hidden Markov Models. Terminology, Representation and Basic Problems Hidden Markov Models Terminology, Representation and Basic Problems Data analysis? Machine learning? In bioinformatics, we analyze a lot of (sequential) data (biological sequences) to learn unknown parameters

More information

Bayesian Machine Learning

Bayesian Machine Learning Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 2: Bayesian Basics https://people.orie.cornell.edu/andrew/orie6741 Cornell University August 25, 2016 1 / 17 Canonical Machine Learning

More information

Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Lior Wolf

Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Lior Wolf 1 Introduction to Machine Learning Maximum Likelihood and Bayesian Inference Lecturers: Eran Halperin, Lior Wolf 2014-15 We know that X ~ B(n,p), but we do not know p. We get a random sample from X, a

More information

Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Yishay Mansour, Lior Wolf

Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Yishay Mansour, Lior Wolf 1 Introduction to Machine Learning Maximum Likelihood and Bayesian Inference Lecturers: Eran Halperin, Yishay Mansour, Lior Wolf 2013-14 We know that X ~ B(n,p), but we do not know p. We get a random sample

More information

LECTURE 15 Markov chain Monte Carlo

LECTURE 15 Markov chain Monte Carlo LECTURE 15 Markov chain Monte Carlo There are many settings when posterior computation is a challenge in that one does not have a closed form expression for the posterior distribution. Markov chain Monte

More information

Machine Learning Summer School

Machine Learning Summer School Machine Learning Summer School Lecture 3: Learning parameters and structure Zoubin Ghahramani zoubin@eng.cam.ac.uk http://learning.eng.cam.ac.uk/zoubin/ Department of Engineering University of Cambridge,

More information

Module 22: Bayesian Methods Lecture 9 A: Default prior selection

Module 22: Bayesian Methods Lecture 9 A: Default prior selection Module 22: Bayesian Methods Lecture 9 A: Default prior selection Peter Hoff Departments of Statistics and Biostatistics University of Washington Outline Jeffreys prior Unit information priors Empirical

More information

Different points of view for selecting a latent structure model

Different points of view for selecting a latent structure model Different points of view for selecting a latent structure model Gilles Celeux Inria Saclay-Île-de-France, Université Paris-Sud Latent structure models: two different point of views Density estimation LSM

More information

Sparse Linear Models (10/7/13)

Sparse Linear Models (10/7/13) STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine

More information

Segmentation of the mean of heteroscedastic data via cross-validation

Segmentation of the mean of heteroscedastic data via cross-validation Segmentation of the mean of heteroscedastic data via cross-validation 1 UMR 8524 CNRS - Université Lille 1 2 SSB Group, Paris joint work with Sylvain Arlot GDR Statistique et Santé Paris, October, 21 2009

More information

Gaussian discriminant analysis Naive Bayes

Gaussian discriminant analysis Naive Bayes DM825 Introduction to Machine Learning Lecture 7 Gaussian discriminant analysis Marco Chiarandini Department of Mathematics & Computer Science University of Southern Denmark Outline 1. is 2. Multi-variate

More information

Estimation of reliability parameters from Experimental data (Parte 2) Prof. Enrico Zio

Estimation of reliability parameters from Experimental data (Parte 2) Prof. Enrico Zio Estimation of reliability parameters from Experimental data (Parte 2) This lecture Life test (t 1,t 2,...,t n ) Estimate θ of f T t θ For example: λ of f T (t)= λe - λt Classical approach (frequentist

More information

Markov Chains and Hidden Markov Models

Markov Chains and Hidden Markov Models Chapter 1 Markov Chains and Hidden Markov Models In this chapter, we will introduce the concept of Markov chains, and show how Markov chains can be used to model signals using structures such as hidden

More information

STA 414/2104: Machine Learning

STA 414/2104: Machine Learning STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 9 Sequential Data So far

More information

Bayesian model selection: methodology, computation and applications

Bayesian model selection: methodology, computation and applications Bayesian model selection: methodology, computation and applications David Nott Department of Statistics and Applied Probability National University of Singapore Statistical Genomics Summer School Program

More information

10 : HMM and CRF. 1 Case Study: Supervised Part-of-Speech Tagging

10 : HMM and CRF. 1 Case Study: Supervised Part-of-Speech Tagging 10-708: Probabilistic Graphical Models 10-708, Spring 2018 10 : HMM and CRF Lecturer: Kayhan Batmanghelich Scribes: Ben Lengerich, Michael Kleyman 1 Case Study: Supervised Part-of-Speech Tagging We will

More information

Principles of Bayesian Inference

Principles of Bayesian Inference Principles of Bayesian Inference Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department

More information

9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering

9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering Types of learning Modeling data Supervised: we know input and targets Goal is to learn a model that, given input data, accurately predicts target data Unsupervised: we know the input only and want to make

More information

Multivariate Survival Analysis

Multivariate Survival Analysis Multivariate Survival Analysis Previously we have assumed that either (X i, δ i ) or (X i, δ i, Z i ), i = 1,..., n, are i.i.d.. This may not always be the case. Multivariate survival data can arise in

More information

The Expectation Maximization or EM algorithm

The Expectation Maximization or EM algorithm The Expectation Maximization or EM algorithm Carl Edward Rasmussen November 15th, 2017 Carl Edward Rasmussen The EM algorithm November 15th, 2017 1 / 11 Contents notation, objective the lower bound functional,

More information

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Maximilian Kasy Department of Economics, Harvard University 1 / 37 Agenda 6 equivalent representations of the

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning Expectation Maximization Mark Schmidt University of British Columbia Winter 2018 Last Time: Learning with MAR Values We discussed learning with missing at random values in data:

More information

Hidden Markov Models. Aarti Singh Slides courtesy: Eric Xing. Machine Learning / Nov 8, 2010

Hidden Markov Models. Aarti Singh Slides courtesy: Eric Xing. Machine Learning / Nov 8, 2010 Hidden Markov Models Aarti Singh Slides courtesy: Eric Xing Machine Learning 10-701/15-781 Nov 8, 2010 i.i.d to sequential data So far we assumed independent, identically distributed data Sequential data

More information

Lecture 6: Graphical Models: Learning

Lecture 6: Graphical Models: Learning Lecture 6: Graphical Models: Learning 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering, University of Cambridge February 3rd, 2010 Ghahramani & Rasmussen (CUED)

More information

Latent Variable models for GWAs

Latent Variable models for GWAs Latent Variable models for GWAs Oliver Stegle Machine Learning and Computational Biology Research Group Max-Planck-Institutes Tübingen, Germany September 2011 O. Stegle Latent variable models for GWAs

More information

Chapter 3: Maximum-Likelihood & Bayesian Parameter Estimation (part 1)

Chapter 3: Maximum-Likelihood & Bayesian Parameter Estimation (part 1) HW 1 due today Parameter Estimation Biometrics CSE 190 Lecture 7 Today s lecture was on the blackboard. These slides are an alternative presentation of the material. CSE190, Winter10 CSE190, Winter10 Chapter

More information