Clustering gene expression data & the EM algorithm

CG, Fall 2011-12. Clustering gene expression data & the EM algorithm. Ron Shamir.

How Gene Expression Data Looks
Entries of the raw data matrix: ratio values or absolute values.
Row = a gene's expression pattern / fingerprint vector.
Column = an experiment's/condition's profile.
(Figure: genes × conditions matrix, "Expression levels, Raw Data".)

Data Preprocessing
Input: real-valued raw data matrix (genes × conditions expression levels).
Compute the similarity matrix (dot product / correlation / ...), or alternatively distances.
From the raw data matrix we compute the similarity matrix S; $S_{ij}$ reflects the similarity of the expression patterns of gene i and gene j.
(Figure: raw data matrix and the derived similarity matrix.)
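
As a concrete illustration (not from the slides), here is a minimal sketch of this preprocessing step in Python/NumPy, assuming the raw data is a genes × conditions array; the function name similarity_matrix and the method choices are my own, matching the options listed above.

    import numpy as np

    def similarity_matrix(raw, method="correlation"):
        """Genes x genes similarity matrix S from a genes x conditions raw-data matrix.

        S[i, j] reflects how similar the expression patterns of gene i and gene j are.
        """
        if method == "correlation":
            return np.corrcoef(raw)      # Pearson correlation between gene rows
        if method == "dot":
            return raw @ raw.T           # plain dot product of expression vectors
        raise ValueError(f"unknown method: {method}")

    # toy example: 4 genes measured under 5 conditions
    raw = np.random.default_rng(0).normal(size=(4, 5))
    S = similarity_matrix(raw)
    D = 1.0 - S                          # one common way to turn similarity into distance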

DNA chips: Applications
Deducing functions of unknown genes (similar expression pattern → similar function).
Identifying disease profiles.
Deciphering regulatory mechanisms (co-expression → co-regulation).
Classification of biological conditions.
Drug development.
Analysis requires clustering of genes/conditions.

Clustering: Objective
Group elements (genes) into clusters satisfying:
Homogeneity: elements inside a cluster are highly similar to each other.
Separation: elements from different clusters have low similarity to each other.
This needs formal objective functions; most useful versions are NP-hard.

The Clustering Bazaar

Hierarchical clustering

An Alternative View
Instead of a partition into clusters, form a tree hierarchy of the input elements satisfying: more similar elements are placed closer together along the tree.
Or: tree distances reflect element similarity.

Hierarchical Representation
Dendrogram: a rooted tree, usually binary; all leaf-root distances are equal.
(Figure: example dendrogram over elements 1-4, with levels 2.8, 4.5 and 5.0.)

Hierarchical Clustering: Average Linkage (Sokal & Michener '58, Lance & Williams '67)
Input: distance matrix $(D_{ij})$.
Iterative algorithm. Initially each element is a cluster; $n_r$ = size of cluster r.
Find the minimum element $D_{rs}$ in D; merge clusters r and s.
Delete elements r, s; add a new element t with
$$D_{it} = D_{ti} = \frac{n_r}{n_r+n_s} D_{ir} + \frac{n_s}{n_r+n_s} D_{is}$$
Repeat.
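
A minimal sketch (not from the slides) of this iterative average-linkage procedure in Python, assuming a symmetric distance matrix D as input; scipy.cluster.hierarchy.linkage offers the same scheme as method='average', but the loop below mirrors the update rule on the slide. The function name average_linkage is my own.

    import numpy as np

    def average_linkage(D):
        """Agglomerative average-linkage clustering on a symmetric distance matrix D.

        Returns the merges as (members of cluster r, members of cluster s, merge distance).
        """
        D = D.astype(float).copy()
        n = D.shape[0]
        clusters = {i: [i] for i in range(n)}   # cluster id -> member elements
        active = list(range(n))
        merges = []
        while len(active) > 1:
            # find the closest pair (r, s) among the active clusters
            r, s = min(((a, b) for a in active for b in active if a < b),
                       key=lambda p: D[p[0], p[1]])
            d_rs = D[r, s]
            n_r, n_s = len(clusters[r]), len(clusters[s])
            # average-linkage update: D_it = n_r/(n_r+n_s) * D_ir + n_s/(n_r+n_s) * D_is
            for i in active:
                if i not in (r, s):
                    D[r, i] = D[i, r] = (n_r * D[r, i] + n_s * D[s, i]) / (n_r + n_s)
            merges.append((clusters[r][:], clusters[s][:], d_rs))
            clusters[r] = clusters[r] + clusters[s]   # reuse slot r for the merged cluster t
            active.remove(s)
        return merges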

A General Framework (Lance & Williams '67)
Find the minimum element $D_{rs}$; merge clusters r and s.
Delete elements r, s; add a new element t with
$$D_{it} = D_{ti} = \alpha_r D_{ir} + \alpha_s D_{is} + \gamma\,|D_{ir} - D_{is}|$$
Single linkage: $D_{it} = \min\{D_{ir}, D_{is}\}$
Complete linkage: $D_{it} = \max\{D_{ir}, D_{is}\}$
Note: there is an analogous formulation in terms of a similarity matrix (rather than distances).
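
A small sketch (my own, following the slide's parameterization) of the Lance-Williams update; single linkage corresponds to $\alpha_r = \alpha_s = 1/2,\ \gamma = -1/2$, and complete linkage to $\alpha_r = \alpha_s = 1/2,\ \gamma = +1/2$.

    def lance_williams_update(D_ri, D_si, alpha_r, alpha_s, gamma):
        """Distance from the merged cluster t = r ∪ s to another cluster i."""
        return alpha_r * D_ri + alpha_s * D_si + gamma * abs(D_ri - D_si)

    # special cases from the slide:
    single   = lambda a, b: lance_williams_update(a, b, 0.5, 0.5, -0.5)  # == min(a, b)
    complete = lambda a, b: lance_williams_update(a, b, 0.5, 0.5, +0.5)  # == max(a, b)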

Hierarchical clustering of GE data (Eisen et al., PNAS 1998)
Growth response: starved human fibroblast cells, serum added.
Monitored 8,600 genes over 13 time points.
$t_{ij}$ = fluorescence level of gene i in condition j; $r_{ij}$ = the same for the reference.
$s_{ij} = \log(t_{ij}/r_{ij})$
$S_{kl} = \left(\sum_j s_{kj} s_{lj}\right) / \left(\|s_k\|\,\|s_l\|\right)$ (cosine of the angle)
Applied the average-linkage method.
Ordered the leaves by increasing element weight: average expression level, time of maximal induction, or other criteria.
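
A minimal sketch (not from the paper) of this preprocessing in Python, under the assumptions above: t and r are genes × time-points arrays of test and reference fluorescence; the function name eisen_similarity is my own.

    import numpy as np

    def eisen_similarity(t, r):
        """Log-ratio transform followed by cosine similarity between gene rows."""
        s = np.log(t / r)                                   # s_ij = log(t_ij / r_ij)
        u = s / np.linalg.norm(s, axis=1, keepdims=True)    # unit-length expression vectors
        return u @ u.T                                      # S_kl = cosine of the angle between genes k and l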

Eisengrams for the same data randomly permuted within rows (1), columns (2), and both (3).

Yeast stress data

Comments
Distinct measurements of the same genes cluster together.
Genes of similar function cluster together.
Many cluster-specific functional insights.
Interpretation is a REAL biological challenge.

More on hierarchical methods
Agglomerative vs. the more natural divisive.
Advantages: gives a single coherent global picture; intuitive for biologists (from phylogeny).
Disadvantages: no single partition, no specific clusters; forces all elements to fit a tree hierarchy.

Non-Hierarchical Clustering

K-means (Lloyd '57, MacQueen '67)
Input: a vector $v_i$ for each element i; number of clusters = k.
Define the centroid $c_p$ of a cluster $C_p$ as its average vector.
Goal: minimize $\sum_{\text{clusters } p}\ \sum_{i \in \text{cluster } p} d(v_i, c_p)$
The objective measures homogeneity only (k is fixed).
NP-hard already for k = 2.

K-means algorithm
Initialize an arbitrary partition P into k clusters.
Repeat the following till convergence:
Update the centroids (optimize over c, with P fixed).
Assign each point to its closest centroid (optimize over P, with c fixed).
Can be shown to have polynomial expected running time under various assumptions on the data distribution.
A variant: perform a single best modification (the one that decreases the score the most).
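
A minimal sketch (not from the slides) of Lloyd's algorithm in Python/NumPy, assuming the data points are the rows of X and using squared Euclidean distance as d; the function name kmeans is my own.

    import numpy as np

    def kmeans(X, k, n_iter=100, seed=0):
        """Lloyd's algorithm: alternate centroid updates and closest-centroid assignments."""
        rng = np.random.default_rng(seed)
        centroids = X[rng.choice(len(X), size=k, replace=False)]   # arbitrary initialization
        for _ in range(n_iter):
            # assignment step: each point goes to its closest centroid
            dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            labels = dists.argmin(axis=1)
            # update step: each centroid becomes the average of its cluster
            new_centroids = np.array([X[labels == p].mean(axis=0) if np.any(labels == p)
                                      else centroids[p] for p in range(k)])
            if np.allclose(new_centroids, centroids):
                break
            centroids = new_centroids
        return labels, centroids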

A Soft Version
Based on a probabilistic model of the data as coming from a mixture of Gaussians:
$P(z_i = j) = \pi_j, \qquad P(x_i \mid z_i = j) \sim N(\mu_j, \sigma^2 I)$
Goal: evaluate the parameters θ (assume σ is known).
Method: apply EM to maximize the likelihood of the data:
$$L(\theta) \propto \prod_i \sum_j \pi_j \exp\!\left(-\frac{d(x_i, \mu_j)^2}{2\sigma^2}\right)$$
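
A small sketch (not from the slides) that evaluates this mixture likelihood, up to the constant dropped above, assuming X holds the data points as rows, mu the k mean vectors as rows, pi the mixing proportions, and a known shared sigma; all names are my own.

    import numpy as np

    def soft_clustering_log_likelihood(X, mu, pi, sigma):
        """log L(theta) for an isotropic Gaussian mixture with known, shared sigma
        (normalization constants omitted, as on the slide)."""
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)    # d(x_i, mu_j)^2
        per_point = (pi[None, :] * np.exp(-d2 / (2 * sigma ** 2))).sum(axis=1)
        return np.log(per_point).sum()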

EM, soft version
Iteratively, compute a soft assignment and use it to derive expectations of π and μ:
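
For concreteness (this block is mine, not verbatim from the slide), the standard E-step responsibilities and M-step updates for this model, with isotropic Gaussians and a known shared σ, are:

$$w_{ij} = P(z_i = j \mid x_i, \theta^t) = \frac{\pi_j^t \exp\!\left(-d(x_i,\mu_j^t)^2/2\sigma^2\right)}{\sum_{j'} \pi_{j'}^t \exp\!\left(-d(x_i,\mu_{j'}^t)^2/2\sigma^2\right)}$$

$$\pi_j^{t+1} = \frac{1}{n}\sum_i w_{ij}, \qquad \mu_j^{t+1} = \frac{\sum_i w_{ij}\, x_i}{\sum_i w_{ij}}$$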

Soft vs. hard clustering
The soft version maximizes:
$$L(\theta) \propto \prod_i \sum_j \pi_j \exp\!\left(-\frac{d(x_i, \mu_j)^2}{2\sigma^2}\right)$$
If we assume that each element is in exactly one cluster (hard assignment), then:
$$\log L(\theta) \approx -\sum_i d(x_i, \mu_{c(i)})^2 \quad \text{(up to constants)}$$
This is exactly the k-means criterion!
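
Spelling out the hard-assignment step (my own note, with c(i) denoting the cluster of element i and the π terms absorbed into the constant):

$$L(\theta) \propto \prod_i \pi_{c(i)} \exp\!\left(-\frac{d(x_i,\mu_{c(i)})^2}{2\sigma^2}\right)
\;\Rightarrow\;
\log L(\theta) = \text{const} - \frac{1}{2\sigma^2}\sum_i d(x_i,\mu_{c(i)})^2$$

so maximizing the hard-assignment log-likelihood is minimizing $\sum_i d(x_i, \mu_{c(i)})^2$, the k-means score.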

Expectation-maximization: the probabilistic setting
Input: data x coming from a probabilistic model with hidden information y.
Goal: learn the model's parameters so that the likelihood of the data is maximized.
Example: a mixture of two Gaussians:
$$P(y_i = 1) = p_1; \qquad P(y_i = 2) = p_2 = 1 - p_1$$
$$P(x_i \mid y_i = j) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x_i - \mu_j)^2}{2\sigma^2}\right)$$

The likelihood function
$$P(y_i = 1) = p_1; \qquad P(y_i = 2) = p_2 = 1 - p_1$$
$$P(x_i \mid y_i = j) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x_i - \mu_j)^2}{2\sigma^2}\right)$$
$$L(\theta) = P(x \mid \theta) = \prod_i \sum_j P(x_i, y_i = j \mid \theta)$$
$$\log L(\theta) = \sum_i \log \sum_j \frac{p_j}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x_i - \mu_j)^2}{2\sigma^2}\right)$$

The EM algorithm
Goal: maximize $\log P(x \mid \theta) = \log \sum_y P(x, y \mid \theta)$.
Assume we have a model $\theta^t$ which we wish to improve.
Note: $P(x \mid \theta) = P(x, y \mid \theta) / P(y \mid x, \theta)$. Taking logs, multiplying by $P(y \mid x, \theta^t)$ and summing over y (the left-hand side is unchanged, since $\sum_y P(y \mid x, \theta^t) = 1$):
$$\log P(x \mid \theta) = \sum_y P(y \mid x, \theta^t) \log P(x, y \mid \theta) - \sum_y P(y \mid x, \theta^t) \log P(y \mid x, \theta)$$
$$\log P(x \mid \theta^t) = \sum_y P(y \mid x, \theta^t) \log P(x, y \mid \theta^t) - \sum_y P(y \mid x, \theta^t) \log P(y \mid x, \theta^t)$$
Subtracting, with Q as defined on the next slide:
$$\log P(x \mid \theta) - \log P(x \mid \theta^t) = Q(\theta \mid \theta^t) - Q(\theta^t \mid \theta^t) + \sum_y P(y \mid x, \theta^t) \log \frac{P(y \mid x, \theta^t)}{P(y \mid x, \theta)}$$
Here $Q(\theta^t \mid \theta^t)$ is a constant and the last sum is a relative entropy, hence ≥ 0: any θ that increases Q also increases the likelihood.

The EM algorithm (cont.)
Main component:
$$Q(\theta \mid \theta^t) = \sum_y P(y \mid x, \theta^t) \log P(x, y \mid \theta) = E_{y \mid x, \theta^t}\big[\log P(x, y \mid \theta)\big]$$
i.e., the expectation of $\log P(x, y \mid \theta)$ over the distribution of y given by the current parameters $\theta^t$.
The algorithm:
E-step: calculate the Q function.
M-step: maximize $Q(\theta \mid \theta^t)$ with respect to θ.

Application to the mixture model
$$Q(\theta \mid \theta^t) = \sum_y P(y \mid x, \theta^t) \log P(x, y \mid \theta) = E_{y \mid x, \theta^t}\big[\log P(x, y \mid \theta)\big]$$
Introduce indicators $y_{ij} = 1$ if $y_i = j$ and $y_{ij} = 0$ otherwise. Then:
$$P(x, y \mid \theta) = \prod_i P(x_i, y_i \mid \theta) = \prod_i \prod_j P(x_i, y_i = j \mid \theta)^{y_{ij}}$$
$$\log P(x, y \mid \theta) = \sum_i \sum_j y_{ij} \log P(x_i, y_i = j \mid \theta)$$
$$E^t[\log P(x, y \mid \theta)] = \sum_i \sum_j E^t[y_{ij}] \log P(x_i, y_i = j \mid \theta)$$

Application (cont.)
$$E^t[\log P(x, y \mid \theta)] = \sum_i \sum_j E^t[y_{ij}] \log P(x_i, y_i = j \mid \theta)$$
$$w_{ij} := E^t[y_{ij}] = P(y_{ij} = 1 \mid x, \theta^t) = \frac{P(x_i, y_i = j \mid \theta^t)}{\sum_{j'} P(x_i, y_i = j' \mid \theta^t)}$$
$$Q(\theta \mid \theta^t) = \sum_i \sum_j w_{ij} \left[ \log \frac{1}{\sqrt{2\pi}} - \log \sigma - \frac{(x_i - \mu_j)^2}{2\sigma^2} + \log p_j \right]$$
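
Putting the pieces together, a minimal sketch (not from the slides) of EM for this one-dimensional two-component mixture, with σ assumed known: w holds the responsibilities $w_{ij}$, and the M-step maximizes Q in closed form ($p_j$ = mean of $w_{ij}$ over i, $\mu_j$ = weighted mean of $x_i$). The function name em_two_gaussians is my own.

    import numpy as np

    def em_two_gaussians(x, sigma, n_iter=100, seed=0):
        """EM for a 1-D mixture of two Gaussians with known, shared sigma.

        Returns the mixing probabilities p (length 2) and the means mu (length 2).
        """
        rng = np.random.default_rng(seed)
        p = np.array([0.5, 0.5])
        mu = rng.choice(x, size=2, replace=False)   # initialize the means from the data
        for _ in range(n_iter):
            # E-step: responsibilities w_ij = P(y_i = j | x_i, theta^t)
            # (the shared 1/(sigma*sqrt(2*pi)) factor cancels in the ratio)
            dens = p[None, :] * np.exp(-(x[:, None] - mu[None, :]) ** 2 / (2 * sigma ** 2))
            w = dens / dens.sum(axis=1, keepdims=True)
            # M-step: closed-form maximization of Q(theta | theta^t)
            p = w.mean(axis=0)
            mu = (w * x[:, None]).sum(axis=0) / w.sum(axis=0)
        return p, mu

    # toy usage: points drawn around two centers
    rng = np.random.default_rng(3)
    x = np.concatenate([rng.normal(0, 1, 200), rng.normal(5, 1, 100)])
    p, mu = em_two_gaussians(x, sigma=1.0)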