A segmentation-clustering problem for the analysis of array CGH data

1 F. Picard, S. Robin, E. Lebarbier, J-J. Daudin
UMR INA P-G / ENGREF / INRA MIA 518
Applied Stochastic Models and Data Analysis, Brest, May 24

2 Microarray CGH technology
- The effects of large chromosomal aberrations (e.g. trisomy) are known. Experimental tool: the karyotype (resolution: the chromosome).
- Change of scale: what are the effects of small deletions or amplifications of DNA sequences? Experimental tool: "conventional" CGH (resolution: 1 Mb).
- CGH = Comparative Genomic Hybridization: a method for the comparative measurement of relative DNA copy numbers between two samples (normal/disease, test/reference).
- Application of microarray technology to CGH: the latest generation of chips has a resolution of 1 kb.

3 The principle of microarray technology

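4 Interpretation of a CGH profile
[Figure: a CGH profile showing an amplified segment, a "normal" segment and a deleted segment.]
Each dot on the graph represents the ratio $\frac{\text{copies of BAC}(t)\text{ in the test genome}}{\text{copies of BAC}(t)\text{ in the reference genome}}$, usually displayed on a log2 scale.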
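As a quick arithmetic check, assume a diploid reference genome with two copies of each BAC. The snippet below computes the expected log2 ratios of a single-copy loss, a normal segment and a single-copy gain. The function name is illustrative.

```python
import math


def expected_log2_ratio(test_copies: int, reference_copies: int = 2) -> float:
    """log2 of (copies of a BAC in the test genome / copies in the reference genome)."""
    if test_copies <= 0 or reference_copies <= 0:
        # a homozygous deletion (0 copies) has no finite log2 ratio
        raise ValueError("copy numbers must be positive")
    return math.log2(test_copies / reference_copies)


for label, copies in [("deleted", 1), ("normal", 2), ("amplified", 3)]:
    print(f"{label:>9}: {expected_log2_ratio(copies):+.3f}")
```

This prints -1.000, +0.000 and +0.585 for the three cases.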

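5 First step of the statistical analysis: break-point detection in a Gaussian signal
- $Y = (Y_1, \dots, Y_n)$ is a random process such that $Y_t \sim \mathcal{N}(\mu_t, \sigma_t^2)$.
- Suppose that the parameters of the distribution of the $Y_t$ are affected by $K-1$ abrupt changes at unknown coordinates $T = (t_1, \dots, t_{K-1})$.
- These break-points define a partition of the data into $K$ segments $I_k = \{t : t_{k-1} < t \le t_k\}$ (with $t_0 = 0$ and $t_K = n$) of sizes $n_k$. Write $Y^k = \{Y_t, t \in I_k\}$.
- Suppose that these parameters are constant between two changes: $\forall t \in I_k$, $Y_t \sim \mathcal{N}(\mu_k, \sigma_k^2)$.
- The parameters of this model are $T = (t_1, \dots, t_{K-1})$ and $\Theta = (\theta_1, \dots, \theta_K)$, with $\theta_k = (\mu_k, \sigma_k^2)$.
- Break-point detection aims at studying the spatial structure of the signal.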
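Below is a minimal simulation sketch of this model, using only the Python standard library. `segment_ends` lists $(t_1, \dots, t_K)$ with $t_K = n$, so the $K-1$ break-points are all entries except the last. The function name and the example values are illustrative and are not taken from the authors' code.

```python
import random


def simulate_piecewise_gaussian(segment_ends, means, sds, seed=None):
    """Draw Y_1..Y_n with K segments; segment k covers positions ]t_{k-1}, t_k].

    segment_ends: (t_1, ..., t_K), t_0 = 0 implicit and t_K = n.
    means, sds:   per-segment parameters mu_k and sigma_k.
    """
    if not (len(segment_ends) == len(means) == len(sds)):
        raise ValueError("need one (mean, sd) pair per segment")
    rng = random.Random(seed)
    y, start = [], 0
    for end, mu, sd in zip(segment_ends, means, sds):
        if end <= start:
            raise ValueError("segment ends must be strictly increasing")
        # all n_k = end - start observations of segment k share (mu_k, sigma_k)
        y.extend(rng.gauss(mu, sd) for _ in range(end - start))
        start = end
    return y


# illustrative values: segments of sizes 40, 20 and 40 on a log2-ratio scale
signal = simulate_piecewise_gaussian([40, 60, 100], [0.0, 0.585, -1.0], [0.2, 0.2, 0.3], seed=1)
```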

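6 Estimating the parameters in a model of abrupt-change detection
- Log-likelihood: $\mathcal{L}_K(T, \Theta) = \sum_{k=1}^{K} \sum_{t \in I_k} \log f(y_t; \theta_k)$.
- Estimating the parameters with $K$ fixed by maximum likelihood: joint estimation of $T$ and $\Theta$ with dynamic programming.
- Necessary property of the likelihood: additivity over segments (it is a sum of local likelihoods computed on each segment).
- Model selection, i.e. the choice of $K$: penalized likelihood, $\hat{K} = \arg\max_K \{\hat{\mathcal{L}}_K - \beta\,\mathrm{pen}(K)\}$, with $\mathrm{pen}(K)$ increasing with $K$.
- $\beta$ is estimated adaptively from the data (Lavielle (2003)).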
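The dynamic-programming step can be sketched as follows. To keep it short, the sketch assumes a homoscedastic variant in which all segments share one variance. In that case, maximizing the Gaussian log-likelihood amounts to minimizing the within-segment residual sum of squares. The heteroscedastic model would need a different segment cost and a minimum segment length.

The penalty in `select_k` ($\mathrm{pen}(K) = K$ with a user-supplied $\beta$) is an illustrative assumption, not the adaptive choice of $\beta$ cited above. All function names are hypothetical.

```python
import math


def _prefix_sums(y):
    """Cumulative sums of y and y**2, so any segment statistic costs O(1)."""
    s1, s2 = [0.0], [0.0]
    for v in y:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)
    return s1, s2


def _rss(s1, s2, i, j):
    """Residual sum of squares of y[i:j] around its own mean."""
    n, s = j - i, s1[j] - s1[i]
    return max(s2[j] - s2[i] - s * s / n, 0.0)  # clip tiny negative round-off


def segment_all_k(y, k_max):
    """Best segmentation for every K <= k_max: {K: (RSS_K, segment_ends)}."""
    n = len(y)
    k_max = min(k_max, n)
    s1, s2 = _prefix_sums(y)
    inf = float("inf")
    # cost[k][j]: smallest RSS of y[:j] cut into k segments; back[k][j]: start of the last one
    cost = [[inf] * (n + 1) for _ in range(k_max + 1)]
    back = [[0] * (n + 1) for _ in range(k_max + 1)]
    cost[0][0] = 0.0
    for k in range(1, k_max + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + _rss(s1, s2, i, j)
                if c < cost[k][j]:
                    cost[k][j], back[k][j] = c, i
    results = {}
    for k in range(1, k_max + 1):
        ends, j = [], n
        for kk in range(k, 0, -1):  # follow the back-pointers from the last segment
            ends.append(j)
            j = back[kk][j]
        results[k] = (cost[k][n], ends[::-1])
    return results


def select_k(results, n, beta):
    """argmax_K of -n/2 * log(RSS_K / n) - beta * K (Gaussian log-likelihood up to constants)."""
    def criterion(k):
        rss = max(results[k][0], 1e-12)  # guard against log(0) on noiseless data
        return -0.5 * n * math.log(rss / n) - beta * k
    return max(results, key=criterion)


y = [0.0, 0.1, -0.1, 0.0, 5.0, 5.1, 4.9, -3.0, -3.1, -2.9, -3.0]
results = segment_all_k(y, k_max=5)
k_hat = select_k(results, len(y), beta=5.0)
```

On this toy profile, `results[3]` holds the segment ends `[4, 7, 11]` and `k_hat` is 3. The triple loop costs $O(K n^2)$, which is the usual price of exact segmentation by dynamic programming.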

7 Example of segmentation on array CGH data
[Figure: log2 ratio vs. genomic position for the BT474 cell line, chromosome 1 and chromosome 9.]

8 Considering the biologists' objective and the need for a new model
[Figure: schematic signals illustrating structure on the $y$ values, structure on the positions $t$ (segmentation: the spatial structure of the signal), and segmentation/clustering (structure on both).]

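9 A new model for segmentation-clustering purposes
- We suppose there exists a secondary underlying structure of the segments into $P$ populations with weights $\pi_1, \dots, \pi_P$ ($\sum_p \pi_p = 1$).
- We introduce hidden variables $Z_{kp}$, which indicate the population of origin of segment $k$.
- These variables are supposed independent, with multinomial distribution: $(Z_{k1}, \dots, Z_{kP}) \sim \mathcal{M}(1; \pi_1, \dots, \pi_P)$.
- Conditionally on the hidden variables, we know the distribution of $Y$: $Y_t \mid \{t \in I_k, Z_{kp} = 1\} \sim \mathcal{N}(\mu_p, \sigma_p^2)$.
- This is a segmentation/clustering model.
- The parameters of this model are $T$ and $\Psi = (\pi_1, \dots, \pi_P; \theta_1, \dots, \theta_P)$, with $\theta_p = (\mu_p, \sigma_p^2)$.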
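Sampling from this model makes its two levels of structure explicit. A label is drawn for each segment, and then all observations of that segment are drawn from the corresponding population. This is a minimal standard-library sketch; names and parameter values are illustrative.

```python
import random


def simulate_segmentation_clustering(segment_ends, weights, means, sds, seed=None):
    """Draw a population label per segment, then the observations given that label.

    weights:    mixing proportions (pi_1, ..., pi_P)
    means, sds: population parameters (mu_p, sigma_p), shared by every segment in p
    Returns (labels, y), where labels[k] is the population of segment k.
    """
    rng = random.Random(seed)
    labels, y, start = [], [], 0
    for end in segment_ends:
        # Z_k ~ Multinomial(1; pi_1, ..., pi_P), independently across segments
        p = rng.choices(range(len(weights)), weights=weights)[0]
        labels.append(p)
        # conditionally on Z_k, Y_t ~ N(mu_p, sigma_p^2) for every t in segment k
        y.extend(rng.gauss(means[p], sds[p]) for _ in range(end - start))
        start = end
    return labels, y


# illustrative: 4 segments drawn from "deleted", "normal" and "amplified" populations
labels, y = simulate_segmentation_clustering(
    [30, 50, 90, 120], [0.2, 0.6, 0.2], [-1.0, 0.0, 0.585], [0.2, 0.2, 0.2], seed=7
)
```

Two non-adjacent segments can share a label, which is exactly what a pure segmentation model cannot express.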

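10 Likelihood and statistical units of the model
- Mixture model of segments: the statistical units are the segments $Y^k$, and the density of $Y^k$ is a mixture density, $f(y^k; \Psi) = \sum_{p=1}^{P} \pi_p f(y^k; \theta_p)$, with $f(y^k; \theta_p) = \prod_{t \in I_k} \phi(y_t; \mu_p, \sigma_p^2)$, where $\phi$ is the Gaussian density. If the $Y^k$s are independent, we have $\mathcal{L}_{KP}(T, \Psi) = \sum_{k=1}^{K} \log \sum_{p=1}^{P} \pi_p f(y^k; \theta_p)$.
- Classical mixture model: the statistical units are the individual observations $Y_t$, and $\mathcal{L}(\Psi) = \sum_{t=1}^{n} \log \sum_{p=1}^{P} \pi_p \phi(y_t; \mu_p, \sigma_p^2)$.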
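The two likelihoods can be compared numerically. This sketch uses the standard library only, and its names are illustrative. It assumes all weights are strictly positive. The classical mixture is recovered as the special case in which every segment contains a single point.

```python
import math


def log_normal(v, mu, var):
    """log phi(v; mu, var) for the Gaussian density."""
    return -0.5 * (math.log(2 * math.pi * var) + (v - mu) ** 2 / var)


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def segment_mixture_loglik(y, segment_ends, weights, means, variances):
    """sum_k log sum_p pi_p prod_{t in I_k} phi(y_t; mu_p, sigma_p^2)."""
    total, start = 0.0, 0
    for end in segment_ends:
        seg = y[start:end]
        # one mixture term per segment: the whole vector Y^k shares a label
        total += logsumexp([
            math.log(w) + sum(log_normal(v, m, s2) for v in seg)
            for w, m, s2 in zip(weights, means, variances)
        ])
        start = end
    return total


def point_mixture_loglik(y, weights, means, variances):
    """Classical mixture: every observation y_t is its own statistical unit."""
    return sum(
        logsumexp([math.log(w) + log_normal(v, m, s2)
                   for w, m, s2 in zip(weights, means, variances)])
        for v in y
    )
```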

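11 A hybrid algorithm for the optimization of the likelihood
$K$ and $P$ are known. Alternate the parameter estimation:
1. When $T$ is fixed, the EM algorithm estimates $\Psi$: $\hat{\Psi}^{(h+1)} = \arg\max_{\Psi} \mathcal{L}_{KP}(\hat{T}^{(h)}, \Psi)$.
2. When $\Psi$ is fixed, dynamic programming estimates $T$: $\hat{T}^{(h+1)} = \arg\max_{T} \mathcal{L}_{KP}(T, \hat{\Psi}^{(h+1)})$.
This yields a non-decreasing sequence of likelihoods: $\mathcal{L}_{KP}(\hat{T}^{(h+1)}, \hat{\Psi}^{(h+1)}) \ge \mathcal{L}_{KP}(\hat{T}^{(h)}, \hat{\Psi}^{(h)})$.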
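The alternation does not depend on how each step is implemented. The sketch below takes the two conditional updates as functions, with hypothetical names: `fit_params` for the EM step and `fit_ends` for the dynamic-programming step. It records the likelihood sequence as it goes.

It assumes `fit_params` starts from the current parameters and never decreases the likelihood. It also assumes `fit_ends` returns an exact maximizer.

```python
def alternate(init_ends, init_params, fit_params, fit_ends, loglik, max_iter=100, tol=1e-8):
    """Alternate the two conditional maximizations of L_KP(T, Psi).

    fit_params(ends, params) -> new params, started from `params` (EM step, T fixed)
    fit_ends(params)         -> segmentation maximizing loglik for fixed params (DP step)
    loglik(ends, params)     -> incomplete log-likelihood L_KP(T, Psi)
    """
    ends, params = init_ends, init_params
    history = [loglik(ends, params)]
    for _ in range(max_iter):
        params = fit_params(ends, params)  # step 1: T fixed, update Psi
        ends = fit_ends(params)            # step 2: Psi fixed, update T
        history.append(loglik(ends, params))
        # both steps are ascent steps, so history never decreases; stop when it stalls
        if history[-1] - history[-2] < tol:
            break
    return ends, params, history
```

`history` is non-decreasing, but as with EM in general, the limit may be a local maximum that depends on the initial segmentation.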

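12 Mixture model when the segmentation is known
Mixture model parameter estimators, with $\hat{\tau}_{kp}$ the posterior probability that segment $k$ belongs to population $p$:
- The estimator of the mixing proportions is $\hat{\pi}_p = \frac{1}{K} \sum_{k=1}^{K} \hat{\tau}_{kp}$.
- In the Gaussian case, $\hat{\mu}_p = \frac{\sum_k \hat{\tau}_{kp} \sum_{t \in I_k} y_t}{\sum_k \hat{\tau}_{kp} n_k}$ and $\hat{\sigma}_p^2 = \frac{\sum_k \hat{\tau}_{kp} \sum_{t \in I_k} (y_t - \hat{\mu}_p)^2}{\sum_k \hat{\tau}_{kp} n_k}$.
- Long segments (large vectors) have a greater impact on the estimation of the parameters, via the term $\hat{\tau}_{kp} n_k$.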
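These closed-form updates, the M-step of EM when $T$ is fixed, translate directly into code. The sketch assumes `tau` already holds the posterior probabilities. It also assumes every population keeps a positive total weight; otherwise the denominator vanishes, which is one form of EM degeneracy. The function name is illustrative.

```python
def m_step(segments, tau):
    """Updates of (pi_p, mu_p, sigma_p^2) given posteriors tau[k][p].

    segments: list of K lists of observations (the segmentation is known)
    tau:      K x P posterior probabilities that segment k belongs to population p
    """
    K, P = len(segments), len(tau[0])
    pis, mus, variances = [], [], []
    for p in range(P):
        w = [tau[k][p] for k in range(K)]
        pis.append(sum(w) / K)  # pi_p = (1/K) sum_k tau_kp
        # segment k enters with weight tau_kp * n_k: long segments count more
        denom = sum(w[k] * len(segments[k]) for k in range(K))  # assumed > 0
        mu = sum(w[k] * sum(segments[k]) for k in range(K)) / denom
        var = sum(w[k] * sum((v - mu) ** 2 for v in segments[k]) for k in range(K)) / denom
        mus.append(mu)
        variances.append(var)
    return pis, mus, variances
```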

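13 Influence of the vector size on the assignment (MAP)
- The density of $Y^k$ given population $p$ can be written as follows: $\log f(y^k; \theta_p) = -\frac{n_k}{2} \log(2\pi\sigma_p^2) - \frac{n_k}{2\sigma_p^2} \left[ (\bar{y}_k - \mu_p)^2 + s_k^2 \right]$. Here $(\bar{y}_k - \mu_p)^2$ is the distance from the mean of vector $k$ to population $p$, and $s_k^2 = \frac{1}{n_k} \sum_{t \in I_k} (y_t - \bar{y}_k)^2$ is the intra-vector variability.
- Because these terms are scaled by $n_k$, large individuals are assigned with near certainty to the closest population.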
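The effect of $n_k$ can be seen on a toy case. Take a segment whose points all sit at 0.4, between two equally weighted populations centred at 0 and 1 with unit variance. The sketch computes the posterior probabilities from the density written in terms of the segment mean and the intra-segment variance. Names and values are illustrative.

```python
import math


def segment_log_density(seg, mu, var):
    """log f(y^k; theta_p) via the segment mean and the intra-segment variance."""
    n = len(seg)
    mean = sum(seg) / n
    s2 = sum((v - mean) ** 2 for v in seg) / n  # intra-vector variability
    # both the distance to mu_p and s2 are multiplied by n_k
    return -0.5 * n * math.log(2 * math.pi * var) - n * ((mean - mu) ** 2 + s2) / (2 * var)


def posteriors(seg, weights, means, variances):
    """tau_kp proportional to pi_p f(y^k; theta_p), normalized over p."""
    logs = [math.log(w) + segment_log_density(seg, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    expd = [math.exp(l - top) for l in logs]
    return [e / sum(expd) for e in expd]


for n in (1, 10, 50):
    tau = posteriors([0.4] * n, [0.5, 0.5], [0.0, 1.0], [1.0, 1.0])
    print(n, round(tau[0], 3))
```

The segment mean is the same in all three cases. Yet the posterior probability of the closer population rises from about 0.52 for a single point to above 0.99 for 50 points.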

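14 Segmentation with a fixed mixture: back to dynamic programming
- The incomplete mixture log-likelihood can be written as a sum of local log-likelihoods: $\mathcal{L}_{KP}(T, \Psi) = \sum_{k=1}^{K} \ell_{KP}(y^k; \Psi)$.
- The local log-likelihood of segment $k$ is the mixture log-density of vector $Y^k$: $\ell_{KP}(y^k; \Psi) = \log \sum_{p=1}^{P} \pi_p \prod_{t \in I_k} \phi(y_t; \mu_p, \sigma_p^2)$.
- $\mathcal{L}_{KP}(T, \Psi)$ can therefore be optimized in $T$, with $\Psi$ fixed, by dynamic programming.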
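With $\Psi$ fixed, the dynamic program is the same as for plain segmentation; only the segment cost changes to the local mixture log-likelihood. The sketch uses prefix sums so that each local log-likelihood costs $O(P)$. It assumes $K \le n$ and strictly positive weights, and the names are illustrative.

```python
import math


def _local_loglik(s1, s2, i, j, weights, means, variances):
    """Mixture log-density of y[i:j]: log sum_p pi_p prod_t phi(y_t; mu_p, sigma_p^2)."""
    n, a, b = j - i, s1[j] - s1[i], s2[j] - s2[i]
    # sum_t (y_t - mu)^2 = b - 2*mu*a + n*mu^2, so each population term is O(1)
    logs = [math.log(w) - 0.5 * n * math.log(2 * math.pi * v) - (b - 2 * m * a + n * m * m) / (2 * v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))


def segment_fixed_mixture(y, K, weights, means, variances):
    """Segment ends maximizing the sum of local mixture log-likelihoods, Psi fixed."""
    n = len(y)
    s1, s2 = [0.0], [0.0]
    for v in y:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)
    ninf = float("-inf")
    # best[k][j]: highest log-likelihood of y[:j] cut into k segments
    best = [[ninf] * (n + 1) for _ in range(K + 1)]
    back = [[0] * (n + 1) for _ in range(K + 1)]
    best[0][0] = 0.0
    for k in range(1, K + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = best[k - 1][i] + _local_loglik(s1, s2, i, j, weights, means, variances)
                if c > best[k][j]:
                    best[k][j], back[k][j] = c, i
    ends, j = [], n
    for k in range(K, 0, -1):  # follow the back-pointers from the last segment
        ends.append(j)
        j = back[k][j]
    return best[K][n], ends[::-1]
```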

15 A decreasing log-likelihood?
[Figure: evolution of the incomplete log-likelihood with respect to the number of segments.]

16 What is going on?
Once the true number of segments (6) is reached, additional break-points cut existing segments at their edges.

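17 Explaining the behavior of the likelihood
Optimization of the incomplete likelihood with dynamic programming: $\hat{T} = \arg\max_{T} \mathcal{L}_{KP}(T, \Psi)$.
Hypotheses:
1. We suppose that the true number of segments is $K^*$ and that the partitions are nested for $K \ge K^*$: segment $Y^k$ is cut into $(Y^{k_1}, Y^{k_2})$.
2. We suppose that if $Y^k$ belongs to population $p$, then $Y^{k_1}$ and $Y^{k_2}$ also belong to population $p$.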

18 An intrinsic penalty
Under hypotheses 1 and 2, the log-likelihood is decomposed into two terms:
- a term of fit that increases with $K$ and is constant from a certain $K$ on (nested partitions);
- a term of differences of entropies that decreases with $K$ and plays the role of a penalty for the choice of $K$.
Choosing the number of segments when $P$ is fixed can therefore be done with a penalized likelihood.

19 Incomplete likelihood behavior with respect to the number of segments
[Figure: incomplete log-likelihood vs. number of segments, for P = 2, 3, 4, 5, …]
The incomplete log-likelihood decreases beyond a certain number of segments.

20 Decomposition of the log-likelihood
[Figure: two panels, the term of fit and the differences of entropies, vs. number of segments, for P = 2, 3, 4, 5, …]

21 Resulting clusters
[Figure: log2 ratio vs. genomic position; segmentation/clustering compared with segmentation alone.]

22 Resulting clusters
[Figure: log2 ratio vs. genomic position; segmentation/clustering compared with segmentation alone.]

23 Perspective: simultaneous choice of $K$ and $P$
[Figure: incomplete log-likelihood as a function of $K$ and $P$.]

24 This is the end
Conclusions:
- Definition of a new model that takes into account the prior knowledge we have about the biological phenomena under study.
- Development of a hybrid algorithm (EM/dynamic programming) for parameter estimation (problems linked to EM: initialization, local maxima, degeneracy).
- Still waiting for another data set to assess the performance of the clustering.
Perspectives:
- Modeling: comparison with hidden Markov models.
- Model choice: develop an adaptive procedure for two components.
- Other application field: DNA sequences (in progress).


More information

Fast Approximate MAP Inference for Bayesian Nonparametrics

Fast Approximate MAP Inference for Bayesian Nonparametrics Fast Approximate MAP Inference for Bayesian Nonparametrics Y. Raykov A. Boukouvalas M.A. Little Department of Mathematics Aston University 10th Conference on Bayesian Nonparametrics, 2015 1 Iterated Conditional

More information

A New Smoothing Model for Analyzing Array CGH Data

A New Smoothing Model for Analyzing Array CGH Data A New Smoothing Model for Analyzing Array CGH Data Nha Nguyen, Heng Huang, Soontorn Oraintara and An Vo Department of Electrical Engineering, University of Texas at Arlington Email: nhn3175@exchange.uta.edu,

More information

IEOR E4570: Machine Learning for OR&FE Spring 2015 c 2015 by Martin Haugh. The EM Algorithm

IEOR E4570: Machine Learning for OR&FE Spring 2015 c 2015 by Martin Haugh. The EM Algorithm IEOR E4570: Machine Learning for OR&FE Spring 205 c 205 by Martin Haugh The EM Algorithm The EM algorithm is used for obtaining maximum likelihood estimates of parameters when some of the data is missing.

More information

The lasso: some novel algorithms and applications

The lasso: some novel algorithms and applications 1 The lasso: some novel algorithms and applications Newton Institute, June 25, 2008 Robert Tibshirani Stanford University Collaborations with Trevor Hastie, Jerome Friedman, Holger Hoefling, Gen Nowak,

More information

Gibbs Sampling Methods for Multiple Sequence Alignment

Gibbs Sampling Methods for Multiple Sequence Alignment Gibbs Sampling Methods for Multiple Sequence Alignment Scott C. Schmidler 1 Jun S. Liu 2 1 Section on Medical Informatics and 2 Department of Statistics Stanford University 11/17/99 1 Outline Statistical

More information

Hidden Markov Models. Aarti Singh Slides courtesy: Eric Xing. Machine Learning / Nov 8, 2010

Hidden Markov Models. Aarti Singh Slides courtesy: Eric Xing. Machine Learning / Nov 8, 2010 Hidden Markov Models Aarti Singh Slides courtesy: Eric Xing Machine Learning 10-701/15-781 Nov 8, 2010 i.i.d to sequential data So far we assumed independent, identically distributed data Sequential data

More information

STA 414/2104, Spring 2014, Practice Problem Set #1

STA 414/2104, Spring 2014, Practice Problem Set #1 STA 44/4, Spring 4, Practice Problem Set # Note: these problems are not for credit, and not to be handed in Question : Consider a classification problem in which there are two real-valued inputs, and,

More information

Independent Component Analysis and Unsupervised Learning. Jen-Tzung Chien

Independent Component Analysis and Unsupervised Learning. Jen-Tzung Chien Independent Component Analysis and Unsupervised Learning Jen-Tzung Chien TABLE OF CONTENTS 1. Independent Component Analysis 2. Case Study I: Speech Recognition Independent voices Nonparametric likelihood

More information

PART I INTRODUCTION The meaning of probability Basic definitions for frequentist statistics and Bayesian inference Bayesian inference Combinatorics

PART I INTRODUCTION The meaning of probability Basic definitions for frequentist statistics and Bayesian inference Bayesian inference Combinatorics Table of Preface page xi PART I INTRODUCTION 1 1 The meaning of probability 3 1.1 Classical definition of probability 3 1.2 Statistical definition of probability 9 1.3 Bayesian understanding of probability

More information

Sean Escola. Center for Theoretical Neuroscience

Sean Escola. Center for Theoretical Neuroscience Employing hidden Markov models of neural spike-trains toward the improved estimation of linear receptive fields and the decoding of multiple firing regimes Sean Escola Center for Theoretical Neuroscience

More information

Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Lior Wolf

Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Lior Wolf 1 Introduction to Machine Learning Maximum Likelihood and Bayesian Inference Lecturers: Eran Halperin, Lior Wolf 2014-15 We know that X ~ B(n,p), but we do not know p. We get a random sample from X, a

More information

EVOLUTIONARY DISTANCES

EVOLUTIONARY DISTANCES EVOLUTIONARY DISTANCES FROM STRINGS TO TREES Luca Bortolussi 1 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste luca@dmi.units.it Trieste, 14 th November 2007 OUTLINE 1 STRINGS:

More information

Package MixGLasso. August 31, 2013

Package MixGLasso. August 31, 2013 Package MixGLasso August 31, 2013 Type Package Title The MixGLasso-package is a powerful tool for clustering high-dimensional data. Version 1.0 Date 2013-08-15 Author Nicolas Staedler Maintainer Nicolas

More information

Lecture 14. Clustering, K-means, and EM

Lecture 14. Clustering, K-means, and EM Lecture 14. Clustering, K-means, and EM Prof. Alan Yuille Spring 2014 Outline 1. Clustering 2. K-means 3. EM 1 Clustering Task: Given a set of unlabeled data D = {x 1,..., x n }, we do the following: 1.

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2004 Paper 147 Multiple Testing Methods For ChIP-Chip High Density Oligonucleotide Array Data Sunduz

More information

Undirected Graphical Models

Undirected Graphical Models Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Properties Properties 3 Generative vs. Conditional

More information

Inferring Transcriptional Regulatory Networks from High-throughput Data

Inferring Transcriptional Regulatory Networks from High-throughput Data Inferring Transcriptional Regulatory Networks from High-throughput Data Lectures 9 Oct 26, 2011 CSE 527 Computational Biology, Fall 2011 Instructor: Su-In Lee TA: Christopher Miles Monday & Wednesday 12:00-1:20

More information

Approximate inference in Energy-Based Models

Approximate inference in Energy-Based Models CSC 2535: 2013 Lecture 3b Approximate inference in Energy-Based Models Geoffrey Hinton Two types of density model Stochastic generative model using directed acyclic graph (e.g. Bayes Net) Energy-based

More information