Ensemble Non-negative Matrix Factorization Methods for Clustering Protein-Protein Interactions
1 Ensemble Non-negative Matrix Factorization Methods for Clustering Protein-Protein Interactions
Derek Greene, Gerard Cagney, Nevan Krogan, Pádraig Cunningham
1: School of Computer Science & Informatics, UCD
2: Department of Cellular & Molecular Pharmacology, UCSF
2 Outline
- Protein Interaction Data
- Existing Cluster Analysis Techniques
  - Hierarchical Clustering
  - Non-negative Matrix Factorization (NMF)
- Objectives for Clustering
- Ensemble NMF Clustering Algorithm
  - Generation Phase
  - Integration Phase
- Experimental Evaluation
- NMF Tree Browser Application
3 Analysing Protein Interaction Data
- Large biological datasets comprising thousands of protein-protein interactions have been assembled.
- Cataloguing and analysing interaction data is a first step toward understanding the biological basis of the interactions and the role of any network structure that underlies them.
- In recent years, the size and density of these datasets have presented a barrier to analysis, even for individuals with extensive knowledge of the proteins, e.g. 18,324 physically interacting protein pairs in the Saccharomyces cerevisiae proteome alone (Salwinski et al., 2004).
- Cluster analysis techniques are often used to explore and organize large biological datasets.
4 Hierarchical Clustering
- Constructs a binary tree by iteratively merging the most similar clusters.
- Applied to identify functional groupings in protein interaction data (Collins et al., 2007).
Drawbacks:
- Each data object can reside in only a single branch of the tree at a given level.
- In protein networks, proteins may be associated with multiple biological processes.
- A protein should therefore be able to belong to multiple distinct branches in the natural cluster hierarchy of the data.
5 NMF Clustering
- Non-negative Matrix Factorization (NMF) algorithms (Lee & Seung, 1999) have been used to discover overlapping groups.
- NMF produces a low-dimensional approximation of a non-negative data matrix, which can be interpreted as a "soft" clustering.
Symmetric NMF (Ding & He, 2005): $S \approx V V^{T}$, where $S$ is the $n \times n$ non-negative similarity matrix and $V$ is the $n \times k$ factor matrix (the clustering), so that $V^{T}$ is $k \times n$.
- $S_{ij}$: strength of association between protein $i$ and protein $j$.
- $V_{ij}$: real-valued membership weight for protein $i$ in cluster $j$.
6 NMF Clustering (example)
Symmetric NMF (Ding & He, 2005) with $k = 2$: the $n \times n$ pairwise similarity matrix $S$ is approximated by $V V^{T}$, where the $n \times k$ factor matrix $V$ describes two clusters with significant overlap.
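The factorization $S \approx V V^{T}$ can be sketched in a few lines. This is a minimal illustration, assuming NumPy and a damped multiplicative update rule (one common choice for symmetric NMF; the slides do not state which update rule was used). The function name `symmetric_nmf`, its parameters, and the toy similarity matrix are assumptions made for this example.

```python
import numpy as np

def symmetric_nmf(S, k, n_iter=300, beta=0.5, seed=0, eps=1e-9):
    """Approximate a non-negative symmetric matrix S (n x n) by V @ V.T with V >= 0 (n x k)."""
    rng = np.random.default_rng(seed)
    V = rng.random((S.shape[0], k)) + eps      # random non-negative starting point
    for _ in range(n_iter):
        numer = S @ V
        denom = V @ (V.T @ V) + eps            # eps avoids division by zero
        # Damped multiplicative update: every factor is non-negative, so V stays >= 0.
        V *= (1.0 - beta) + beta * numer / denom
    return V

# Toy data (hypothetical): proteins 0-2 form one group, proteins 2-4 another,
# so protein 2 belongs to both groups.
M = np.array([[1, 0], [1, 0], [1, 1], [0, 1], [0, 1]], dtype=float)
S = M @ M.T                                    # pairwise similarity matrix

V = symmetric_nmf(S, k=2)
print(np.round(V, 2))                          # row i = membership weights of protein i
print("reconstruction error:", np.linalg.norm(S - V @ V.T))
```

Each row of `V` holds the membership weights of one protein, so a protein with sizeable weights in more than one column is a candidate member of overlapping groups.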
7 NMF Clustering - Analysis
Advantages:
- Solutions can represent overlapping clusters.
- Often produces a sparse factor matrix, which can identify small, localised clusters and can eliminate irrelevant and outlying instances.
Disadvantages:
- Output depends on the initial matrix used to seed the algorithm.
- Does not discover hierarchical relations between clusters.
- No intuitive visualisation for the output: how are these clusters related?
- Parameter selection can be difficult: how many clusters k should the factor matrix contain?
8 Objectives for Clustering
Q. What features do we require in a cluster analysis procedure when working with protein interaction data?
1. Clusters similar to known protein complex compositions.
2. Clusters should be presented in an intuitive visual format.
3. Provision of meaningful hierarchical structure.
4. Identify shared subunits and "moonlighting" proteins.
5. Assignment of putative protein function.
For analysing protein interaction networks, we propose a new algorithm that combines:
- The ability of NMF to accurately identify overlapping groups.
- The organisational and visualisation benefits of hierarchical clustering.
9 Soft Hierarchical Clustering
- An alternative binary tree representation that supports overlapping groups.
- Proteins can be associated with multiple nodes in the tree to different degrees.
10 Ensemble NMF Algorithm
Key idea: ensemble algorithms combine the output of multiple machine learning procedures to produce a superior result.
The algorithm involves a two-phase process:
1. Generation phase: produce a collection of NMF factorizations (i.e. the members of the ensemble).
2. Integration phase: combine the factorizations to produce an improved clustering.
Pipeline: Original Dataset -> Symmetric NMF -> NMF Factorizations -> Integration Function -> Consensus Solution
NB: The consensus solution is a soft hierarchical clustering.
11 Algorithm: Generation Phase
Q. How do we generate an ensemble of factorizations?
Repeatedly apply Symmetric NMF to a pairwise similarity matrix representing our data:
Pairwise similarity matrix $S$ -> Symmetric NMF -> factor matrices $V_1, V_2, V_3, V_4$ (a large collection of ensemble members)
Ensemble techniques are most effective when combining a diverse collection of solutions (Opitz & Shavlik, 1996). To introduce diversity in the generation phase:
- Initialise Symmetric NMF with a random solution.
- Randomly select the number of factors k from a fixed range.
The fixed range can be chosen "roughly", which simplifies the NMF model selection problem.
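The generation phase described above can be sketched as follows. This is an illustrative sketch only, assuming NumPy and the same damped multiplicative update as a stand-in for whichever Symmetric NMF variant was actually used. The names `generate_ensemble`, `k_min` and `k_max`, and the toy data, are assumptions for this example.

```python
import numpy as np

def symmetric_nmf(S, k, rng, n_iter=100, beta=0.5, eps=1e-9):
    """One Symmetric NMF run from a random non-negative start (S ~ V @ V.T)."""
    V = rng.random((S.shape[0], k)) + eps
    for _ in range(n_iter):
        V *= (1.0 - beta) + beta * (S @ V) / (V @ (V.T @ V) + eps)
    return V

def generate_ensemble(S, n_members, k_min, k_max, seed=0):
    """Build a diverse ensemble: random initialisation and a random k per member."""
    rng = np.random.default_rng(seed)
    ensemble = []
    for _ in range(n_members):
        k = int(rng.integers(k_min, k_max + 1))   # k drawn uniformly from [k_min, k_max]
        ensemble.append(symmetric_nmf(S, k, rng))
    return ensemble

# Toy similarity matrix (hypothetical): two groups sharing protein 2.
M = np.array([[1, 0], [1, 0], [1, 1], [0, 1], [0, 1]], dtype=float)
S = M @ M.T

ensemble = generate_ensemble(S, n_members=10, k_min=2, k_max=4)
print([V.shape for V in ensemble])
```

Because k only needs to fall within a rough range, no single run has to get the number of clusters exactly right; the integration phase is what reconciles the members.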
12 Algorithm: Integration Phase
Q. How do we combine an ensemble of factorizations to produce a final "consensus" clustering of the data?
- Construct a dataset from all clusters present in the ensemble.
- Apply "min-max" hierarchical clustering to produce a meta-clustering (i.e. a clustering of clusters).
Pipeline: Ensemble of Factorizations ($V_1, V_2, V_3, V_4$) -> Build Matrix -> Matrix of Clusters as columns ($n \times l$) -> Transpose Matrix -> Matrix of Clusters as rows ($l \times n$) -> Min-Max Clustering -> Meta-Clustering
NB: We can construct a soft hierarchical clustering of the original proteins from the meta-clustering by taking the mean vector for each tree node in the meta-clustering.
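The integration steps above can be sketched roughly as below. The slides do not give the details of the "min-max" linkage, so this sketch swaps in plain average linkage on cosine similarity; it illustrates the build, transpose, meta-cluster and mean-vector steps, not the authors' exact procedure. All function names and the two hand-written factor matrices are hypothetical.

```python
import numpy as np

def build_cluster_matrix(ensemble):
    """Stack all factor-matrix columns (n x l), transpose to l x n, L2-normalise rows."""
    X = np.hstack(ensemble).T
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, 1e-12)

def meta_cluster(X):
    """Agglomerative clustering of base clusters (average linkage, an assumed stand-in).

    Returns one (member_indices, mean_vector) pair per internal tree node; the mean
    vector gives soft membership weights of the n proteins for that node.
    """
    sim = X @ X.T                                   # cosine similarity between base clusters
    groups = {i: [i] for i in range(len(X))}
    nodes = []
    while len(groups) > 1:
        keys = list(groups)
        best, pair = -np.inf, None
        for pos, a in enumerate(keys):
            for b in keys[pos + 1:]:
                s = sim[np.ix_(groups[a], groups[b])].mean()
                if s > best:
                    best, pair = s, (a, b)
        a, b = pair
        merged = groups.pop(a) + groups.pop(b)
        groups[max(keys) + 1] = merged              # fresh key for the new tree node
        nodes.append((merged, X[merged].mean(axis=0)))
    return nodes

# Two hypothetical ensemble members for 5 proteins (k = 2 and k = 3).
V1 = np.array([[0.9, 0.0], [0.8, 0.0], [0.5, 0.5], [0.0, 0.9], [0.0, 0.8]])
V2 = np.array([[1.0, 0.0, 0.0], [0.9, 0.0, 0.1], [0.4, 0.6, 0.0],
               [0.0, 0.8, 0.0], [0.0, 0.9, 0.1]])

X = build_cluster_matrix([V1, V2])                  # l = 5 base clusters, n = 5 proteins
nodes = meta_cluster(X)
for members, weights in nodes:
    print(members, np.round(weights, 2))
```

Each printed node lists the base clusters it merges, followed by per-protein weights; a protein can carry non-zero weight in several nodes, which is what makes the resulting hierarchy "soft".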
13 Experimental Evaluation
- We used an extensive and high-quality assembly of binary interactions for 2390 proteins (Collins et al., 2007).
- This dataset provides a confidence score measuring the evidence that the proteins do indeed co-purify, referred to as Purification Enrichment (PE).
- We apply Ensemble NMF to the corresponding PE score matrix $S$, where $S_{ij}$ is the strength of evidence that there is a genuine positive or negative interaction between protein $i$ and protein $j$.
- Baseline approach: we also applied average-linkage hierarchical clustering to the PE score matrix.
14 Evaluation: External Validation
- External validation: compare a clustering to a "gold standard" classification, if available.
- For protein interaction data we use functional groupings provided by the MIPS database.
We consider two well-known validation measures:
- Precision: fraction of proteins in a given cluster that pertain to a specific MIPS class.
- Recall: fraction of the proteins from a given MIPS class that were recovered in a given cluster.
Ideally we want a cluster analysis procedure that recovers known protein complex compositions with high precision and recall.
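The two measures defined above reduce to simple set arithmetic. The sketch below assumes a cluster has already been turned into a plain set of proteins (for example by thresholding membership weights, which is an assumption here); the protein identifiers are placeholders, not real MIPS annotations.

```python
def precision_recall(cluster, reference_class):
    """Precision and recall of one cluster against one reference class (both as sets)."""
    cluster, reference_class = set(cluster), set(reference_class)
    overlap = len(cluster & reference_class)
    precision = overlap / len(cluster) if cluster else 0.0
    recall = overlap / len(reference_class) if reference_class else 0.0
    return precision, recall

# Placeholder identifiers: 3 of the 4 clustered proteins are in the 5-member class.
cluster = {"P1", "P2", "P3", "P9"}
reference_class = {"P1", "P2", "P3", "P4", "P5"}

precision, recall = precision_recall(cluster, reference_class)
print(precision, recall)
```

Here 3 of 4 cluster members belong to the class (precision 3/4) and 3 of 5 class members are recovered (recall 3/5).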
15 External Validation Results
- The structures uncovered by Ensemble NMF seem to be far more informative than those identified using the baseline approach.
- This is reflected in the substantially improved scores on both validation measures, based on MIPS classes.

Table 1. Validation scores for 20 most significant clusters identified by Ensemble NMF on Collins protein interaction data. (Columns: Class, Precision, Recall)
Classes: 20S proteasome; Anaphase promoting complex (APC); H+-transporting ATPase vacuolar; Post-replication complex; Pre-replication complex (pre-RC); Replication complex; Replication initiation complex; Septin filaments; TRAPP complex; RNA polymerase I; SWI/SNF activator complex; COPI; Exocyst complex; Kornberg's mediator (SRB) complex; Signal recognition particle (SRP); Gim complexes; TFIIIC; /22S regulator; Arp2p/Arp3p complex

Table 2. Validation scores for 20 most significant clusters identified by average-linkage hierarchical clustering on Collins protein interaction data. (Columns: Class, Precision, Recall)
Classes: Geranylgeranyltransferase II; v-SNAREs; NEF3 complex; RNA polymerase I; RNase MRP; RNase P; Replication factor C complex; mRNA splicing; Other respiration chain complexes; RSC complex; SWI/SNF transcription activator complex; SAGA complex; rRNA splicing; Dam1 protein complex; S proteasome; RNA polymerase III; ADA complex; RNA polymerase II; TRAPP complex
16 Evaluation: Discussion
Provision of meaningful hierarchical structure:
- The soft hierarchical clustering produced by Ensemble NMF lends itself to the identification of sub-complexes.
- Example: the COMA subcomplex (Ame1, Okp1, Mcm21, Ctf19) of the larger CTF19 central kinetochore complex can be resolved.
Identification of shared subunits and "moonlighting" proteins:
- Ensemble NMF successfully accommodates proteins that are present in two or more groupings.
- Example: the three chromatin remodelling complexes SWR-C, INO80, and NuA4 all contain actin and the actin-related protein Arp4.
Assignment of putative protein function:
- The uncharacterised protein YNR024W is grouped within a tree node that contains all twelve members of the exosome complex.
- YNR024W may be a previously undescribed component of this complex, and/or participate in these processes.
17 NMF Tree Browser Application
We developed the NMF Tree Browser, a cross-platform Java application for visually inspecting a soft hierarchy produced by the Ensemble NMF algorithm.
Screenshot labels: zoom controls; statistics for selected node; class correlations for selected node; currently selected node; tree root node; membership weights for selected node.
18 NMF Tree Browser Application
The application includes a range of data exploration tools:
- Class sizes and correlations
- Precision & Recall scores
- List of most significant class/node combinations
- Membership weights for proteins in selected node
Clustering and Tree Browser software is freely available:
19 Conclusions
- We have presented a new clustering approach that involves aggregating a collection of matrix factorizations generated using NMF-like techniques.
- In evaluations on high-quality protein interaction data, we have observed that Ensemble NMF can:
  - Improve our ability to identify groupings that accurately reflect known protein complex compositions.
  - Help discover overlapping groups and multi-function or "moonlighting" proteins.
  - Provide an intuitive, tree-like organisation of the data.
- We have developed the NMF Tree Browser application, which supports cluster visualisation and labelling of previously uncharacterised proteins.
- Many other potential applications exist, e.g. discovering structures in genetic interaction data and gene microarray data.
20 References
- Salwinski, L., Miller, C. S., Smith, A. J., Pettit, F. K., Bowie, J. U., and Eisenberg, D. (2004). The Database of Interacting Proteins: 2004 update. Nucleic Acids Res, 32(Database issue), pp
- Collins, S. R., Kemmeren, P., Zhao, X.-C., Greenblatt, J. F., Spencer, F., Holstege, F. C., Weissman, J. S., and Krogan, N. J. (2007). Towards a comprehensive atlas of the physical interactome of Saccharomyces cerevisiae. Mol Cell Proteomics.
- Strehl, A. and Ghosh, J. (2002). Cluster ensembles - a knowledge reuse framework for combining partitionings. In Proc. Conference on Artificial Intelligence (AAAI 02), pp
- Ding, C. and He, X. (2005). On the Equivalence of Nonnegative Matrix Factorization and Spectral Clustering. In Proc. SIAM International Conference on Data Mining (SDM 05), pp
- Lee, D. D. and Seung, H. S. (1999). Learning the parts of objects by nonnegative matrix factorization. Nature, 401, pp
- Opitz, D. W. and Shavlik, J. W. (1996). Generating accurate and diverse members of a neural-network ensemble. NIPS 8, pp
BIOINFORMATICS ORIGINAL PAPER Vol. 24 no. 15 2008, pages 1722-1728 doi:10.1093/bioinformatics/btn286 Data and text mining Ensemble non-negative matrix factorization methods for clustering protein-protein
More informationDiscovering modules in expression profiles using a network
Discovering modules in expression profiles using a network Igor Ulitsky 1 2 Protein-protein interactions (PPIs) Low throughput measurements: accurate, scarce High throughput: more abundant, noisy Large,
More informationRobust Community Detection Methods with Resolution Parameter for Complex Detection in Protein Protein Interaction Networks
Robust Community Detection Methods with Resolution Parameter for Complex Detection in Protein Protein Interaction Networks Twan van Laarhoven and Elena Marchiori Institute for Computing and Information
More informationAnalysis and visualization of protein-protein interactions. Olga Vitek Assistant Professor Statistics and Computer Science
1 Analysis and visualization of protein-protein interactions Olga Vitek Assistant Professor Statistics and Computer Science 2 Outline 1. Protein-protein interactions 2. Using graph structures to study
More informationarxiv: v3 [cs.lg] 18 Mar 2013
Hierarchical Data Representation Model - Multi-layer NMF arxiv:1301.6316v3 [cs.lg] 18 Mar 2013 Hyun Ah Song Department of Electrical Engineering KAIST Daejeon, 305-701 hyunahsong@kaist.ac.kr Abstract Soo-Young
More informationHub Gene Selection Methods for the Reconstruction of Transcription Networks
for the Reconstruction of Transcription Networks José Miguel Hernández-Lobato (1) and Tjeerd. M. H. Dijkstra (2) (1) Computer Science Department, Universidad Autónoma de Madrid, Spain (2) Institute for
More informationBME 5742 Biosystems Modeling and Control
BME 5742 Biosystems Modeling and Control Lecture 24 Unregulated Gene Expression Model Dr. Zvi Roth (FAU) 1 The genetic material inside a cell, encoded in its DNA, governs the response of a cell to various
More informationBioinformatics Chapter 1. Introduction
Bioinformatics Chapter 1. Introduction Outline! Biological Data in Digital Symbol Sequences! Genomes Diversity, Size, and Structure! Proteins and Proteomes! On the Information Content of Biological Sequences!
More informationMarkov Random Field Models of Transient Interactions Between Protein Complexes in Yeast
Markov Random Field Models of Transient Interactions Between Protein Complexes in Yeast Boyko Kakaradov Department of Computer Science, Stanford University June 10, 2008 Motivation: Mapping all transient
More informationDiscovering molecular pathways from protein interaction and ge
Discovering molecular pathways from protein interaction and gene expression data 9-4-2008 Aim To have a mechanism for inferring pathways from gene expression and protein interaction data. Motivation Why
More informationTowards Detecting Protein Complexes from Protein Interaction Data
Towards Detecting Protein Complexes from Protein Interaction Data Pengjun Pei 1 and Aidong Zhang 1 Department of Computer Science and Engineering State University of New York at Buffalo Buffalo NY 14260,
More informationFast Nonnegative Matrix Factorization with Rank-one ADMM
Fast Nonnegative Matrix Factorization with Rank-one Dongjin Song, David A. Meyer, Martin Renqiang Min, Department of ECE, UCSD, La Jolla, CA, 9093-0409 dosong@ucsd.edu Department of Mathematics, UCSD,
More informationIntegrating Ontological Prior Knowledge into Relational Learning
Stefan Reckow reckow@mpipsykl.mpg.de Max Planck Institute of Psychiatry, Proteomics and Biomarkers, 80804 Munich, Germany Volker Tresp volker.tresp@siemens.com Siemens AG, Corporate Research & Technology,
More information2MHR. Protein structure classification is important because it organizes the protein structure universe that is independent of sequence similarity.
Protein structure classification is important because it organizes the protein structure universe that is independent of sequence similarity. A global picture of the protein universe will help us to understand
More informationAN ENHANCED INITIALIZATION METHOD FOR NON-NEGATIVE MATRIX FACTORIZATION. Liyun Gong 1, Asoke K. Nandi 2,3 L69 3BX, UK; 3PH, UK;
213 IEEE INERNAIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING, SEP. 22 25, 213, SOUHAMPON, UK AN ENHANCED INIIALIZAION MEHOD FOR NON-NEGAIVE MARIX FACORIZAION Liyun Gong 1, Asoke K. Nandi 2,3
More informationZhongyi Xiao. Correlation. In probability theory and statistics, correlation indicates the
Character Correlation Zhongyi Xiao Correlation In probability theory and statistics, correlation indicates the strength and direction of a linear relationship between two random variables. In general statistical
More informationIntroduction to Bioinformatics
CSCI8980: Applied Machine Learning in Computational Biology Introduction to Bioinformatics Rui Kuang Department of Computer Science and Engineering University of Minnesota kuang@cs.umn.edu History of Bioinformatics
More informationOn the Equivalence of Nonnegative Matrix Factorization and Spectral Clustering
On the Equivalence of Nonnegative Matrix Factorization and Spectral Clustering Chris Ding, Xiaofeng He, Horst D. Simon Published on SDM 05 Hongchang Gao Outline NMF NMF Kmeans NMF Spectral Clustering NMF
More informationEvidence for dynamically organized modularity in the yeast protein-protein interaction network
Evidence for dynamically organized modularity in the yeast protein-protein interaction network Sari Bombino Helsinki 27.3.2007 UNIVERSITY OF HELSINKI Department of Computer Science Seminar on Computational
More informationMIPCE: An MI-based protein complex extraction technique
MIPCE: An MI-based protein complex extraction technique PRIYAKSHI MAHANTA 1, *, DHRUBA KR BHATTACHARYYA 1 and ASHISH GHOSH 2 1 Department of Computer Science and Engineering, Tezpur University, Napaam
More informationV 5 Robustness and Modularity
Bioinformatics 3 V 5 Robustness and Modularity Mon, Oct 29, 2012 Network Robustness Network = set of connections Failure events: loss of edges loss of nodes (together with their edges) loss of connectivity
More informationPredicting Protein Functions and Domain Interactions from Protein Interactions
Predicting Protein Functions and Domain Interactions from Protein Interactions Fengzhu Sun, PhD Center for Computational and Experimental Genomics University of Southern California Outline High-throughput
More informationBayesian Hierarchical Classification. Seminar on Predicting Structured Data Jukka Kohonen
Bayesian Hierarchical Classification Seminar on Predicting Structured Data Jukka Kohonen 17.4.2008 Overview Intro: The task of hierarchical gene annotation Approach I: SVM/Bayes hybrid Barutcuoglu et al:
More informationOn Spectral Basis Selection for Single Channel Polyphonic Music Separation
On Spectral Basis Selection for Single Channel Polyphonic Music Separation Minje Kim and Seungjin Choi Department of Computer Science Pohang University of Science and Technology San 31 Hyoja-dong, Nam-gu
More informationLearning in Bayesian Networks
Learning in Bayesian Networks Florian Markowetz Max-Planck-Institute for Molecular Genetics Computational Molecular Biology Berlin Berlin: 20.06.2002 1 Overview 1. Bayesian Networks Stochastic Networks
More informationDetecting temporal protein complexes from dynamic protein-protein interaction networks
Detecting temporal protein complexes from dynamic protein-protein interaction networks Le Ou-Yang, Dao-Qing Dai, Xiao-Li Li, Min Wu, Xiao-Fei Zhang and Peng Yang 1 Supplementary Table Table S1: Comparative
More informationComputational Biology: Basics & Interesting Problems
Computational Biology: Basics & Interesting Problems Summary Sources of information Biological concepts: structure & terminology Sequencing Gene finding Protein structure prediction Sources of information
More informationAn Efficient Algorithm for Protein-Protein Interaction Network Analysis to Discover Overlapping Functional Modules
An Efficient Algorithm for Protein-Protein Interaction Network Analysis to Discover Overlapping Functional Modules Ying Liu 1 Department of Computer Science, Mathematics and Science, College of Professional
More informationNetwork by Weighted Graph Mining
2012 4th International Conference on Bioinformatics and Biomedical Technology IPCBEE vol.29 (2012) (2012) IACSIT Press, Singapore + Prediction of Protein Function from Protein-Protein Interaction Network
More information6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008
MIT OpenCourseWare http://ocw.mit.edu 6.047 / 6.878 Computational Biology: Genomes, Networks, Evolution Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
More informationInferring Transcriptional Regulatory Networks from Gene Expression Data II
Inferring Transcriptional Regulatory Networks from Gene Expression Data II Lectures 9 Oct 26, 2011 CSE 527 Computational Biology, Fall 2011 Instructor: Su-In Lee TA: Christopher Miles Monday & Wednesday
More informationData Mining: Concepts and Techniques. (3 rd ed.) Chapter 8. Chapter 8. Classification: Basic Concepts
Data Mining: Concepts and Techniques (3 rd ed.) Chapter 8 1 Chapter 8. Classification: Basic Concepts Classification: Basic Concepts Decision Tree Induction Bayes Classification Methods Rule-Based Classification
More informationCellular Neuroanatomy I The Prototypical Neuron: Soma. Reading: BCP Chapter 2
Cellular Neuroanatomy I The Prototypical Neuron: Soma Reading: BCP Chapter 2 Functional Unit of the Nervous System The functional unit of the nervous system is the neuron. Neurons are cells specialized
More informationMULTIPLE SEQUENCE ALIGNMENT FOR CONSTRUCTION OF PHYLOGENETIC TREE
MULTIPLE SEQUENCE ALIGNMENT FOR CONSTRUCTION OF PHYLOGENETIC TREE Manmeet Kaur 1, Navneet Kaur Bawa 2 1 M-tech research scholar (CSE Dept) ACET, Manawala,Asr 2 Associate Professor (CSE Dept) ACET, Manawala,Asr
More informationSimulation of Gene Regulatory Networks
Simulation of Gene Regulatory Networks Overview I have been assisting Professor Jacques Cohen at Brandeis University to explore and compare the the many available representations and interpretations of
More informationEUSIPCO
EUSIPCO 2013 1569741067 CLUSERING BY NON-NEGAIVE MARIX FACORIZAION WIH INDEPENDEN PRINCIPAL COMPONEN INIIALIZAION Liyun Gong 1, Asoke K. Nandi 2,3 1 Department of Electrical Engineering and Electronics,
More informationNote on Algorithm Differences Between Nonnegative Matrix Factorization And Probabilistic Latent Semantic Indexing
Note on Algorithm Differences Between Nonnegative Matrix Factorization And Probabilistic Latent Semantic Indexing 1 Zhong-Yuan Zhang, 2 Chris Ding, 3 Jie Tang *1, Corresponding Author School of Statistics,
More informationNon-Negative Factorization for Clustering of Microarray Data
INT J COMPUT COMMUN, ISSN 1841-9836 9(1):16-23, February, 2014. Non-Negative Factorization for Clustering of Microarray Data L. Morgos Lucian Morgos Dept. of Electronics and Telecommunications Faculty
More informationFuzzy Clustering of Gene Expression Data
Fuzzy Clustering of Gene Data Matthias E. Futschik and Nikola K. Kasabov Department of Information Science, University of Otago P.O. Box 56, Dunedin, New Zealand email: mfutschik@infoscience.otago.ac.nz,
More informationA Complex-based Reconstruction of the Saccharomyces cerevisiae Interactome* S
Research Author s Choice A Complex-based Reconstruction of the Saccharomyces cerevisiae Interactome* S Haidong Wang, Boyko Kakaradov, Sean R. Collins **, Lena Karotki, Dorothea Fiedler **, Michael Shales,
More informationA New Method to Build Gene Regulation Network Based on Fuzzy Hierarchical Clustering Methods
International Academic Institute for Science and Technology International Academic Journal of Science and Engineering Vol. 3, No. 6, 2016, pp. 169-176. ISSN 2454-3896 International Academic Journal of
More informationCluster Analysis of Gene Expression Microarray Data. BIOL 495S/ CS 490B/ MATH 490B/ STAT 490B Introduction to Bioinformatics April 8, 2002
Cluster Analysis of Gene Expression Microarray Data BIOL 495S/ CS 490B/ MATH 490B/ STAT 490B Introduction to Bioinformatics April 8, 2002 1 Data representations Data are relative measurements log 2 ( red
More informationBMD645. Integration of Omics
BMD645 Integration of Omics Shu-Jen Chen, Chang Gung University Dec. 11, 2009 1 Traditional Biology vs. Systems Biology Traditional biology : Single genes or proteins Systems biology: Simultaneously study
More informationComparative Genomics II
Comparative Genomics II Advances in Bioinformatics and Genomics GEN 240B Jason Stajich May 19 Comparative Genomics II Slide 1/31 Outline Introduction Gene Families Pairwise Methods Phylogenetic Methods
More informationChapter 16. Clustering Biological Data. Chandan K. Reddy Wayne State University Detroit, MI
Chapter 16 Clustering Biological Data Chandan K. Reddy Wayne State University Detroit, MI reddy@cs.wayne.edu Mohammad Al Hasan Indiana University - Purdue University Indianapolis, IN alhasan@cs.iupui.edu
More informationMCB 110. "Molecular Biology: Macromolecular Synthesis and Cellular Function" Spring, 2018
MCB 110 "Molecular Biology: Macromolecular Synthesis and Cellular Function" Spring, 2018 Faculty Instructors: Prof. Jeremy Thorner Prof. Qiang Zhou Prof. Eva Nogales GSIs:!!!! Ms. Samantha Fernandez Mr.
More informationSUPPLEMENTARY INFORMATION
Supplementary information S3 (box) Methods Methods Genome weighting The currently available collection of archaeal and bacterial genomes has a highly biased distribution of isolates across taxa. For example,
More informationClustering and Network
Clustering and Network Jing-Dong Jackie Han jdhan@picb.ac.cn http://www.picb.ac.cn/~jdhan Copy Right: Jing-Dong Jackie Han What is clustering? A way of grouping together data samples that are similar in
More informationConstraint-based Subspace Clustering
Constraint-based Subspace Clustering Elisa Fromont 1, Adriana Prado 2 and Céline Robardet 1 1 Université de Lyon, France 2 Universiteit Antwerpen, Belgium Thursday, April 30 Traditional Clustering Partitions
More informationApproximating the Partition Function by Deleting and then Correcting for Model Edges (Extended Abstract)
Approximating the Partition Function by Deleting and then Correcting for Model Edges (Extended Abstract) Arthur Choi and Adnan Darwiche Computer Science Department University of California, Los Angeles
More informationIntroduction to clustering methods for gene expression data analysis
Introduction to clustering methods for gene expression data analysis Giorgio Valentini e-mail: valentini@dsi.unimi.it Outline Levels of analysis of DNA microarray data Clustering methods for functional
More informationUE Praktikum Bioinformatik
UE Praktikum Bioinformatik WS 08/09 University of Vienna 7SK snrna 7SK was discovered as an abundant small nuclear RNA in the mid 70s but a possible function has only recently been suggested. Two independent
More informationMulti Omics Clustering. ABDBM Ron Shamir
Multi Omics Clustering ABDBM Ron Shamir 1 Outline Introduction Cluster of Clusters (COCA) icluster Nonnegative Matrix Factorization (NMF) Similarity Network Fusion (SNF) Multiple Kernel Learning (MKL)
More informationIntegration of functional genomics data
Integration of functional genomics data Laboratoire Bordelais de Recherche en Informatique (UMR) Centre de Bioinformatique de Bordeaux (Plateforme) Rennes Oct. 2006 1 Observations and motivations Genomics
More informationInferring Transcriptional Regulatory Networks from High-throughput Data
Inferring Transcriptional Regulatory Networks from High-throughput Data Lectures 9 Oct 26, 2011 CSE 527 Computational Biology, Fall 2011 Instructor: Su-In Lee TA: Christopher Miles Monday & Wednesday 12:00-1:20
More informationXiaosi Zhang. A thesis submitted to the graduate faculty. in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE
GENE EXPRESSION PATTERN ANALYSIS Xiaosi Zhang A thesis submitted to the graduate faculty in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE Major: Bioinformatics and Computational
More informationChallenges and Rewards of Interaction Proteomics
MCP Papers in Press. Published on September 17, 2008 as Manuscript R800014-MCP200 Challenges and Rewards of Interaction Proteomics Shoshana J. Wodak 1,2,3 #, Shuye Pu 1, James Vlasblom 1, and Bertrand
More information10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison
10-810: Advanced Algorithms and Models for Computational Biology microrna and Whole Genome Comparison Central Dogma: 90s Transcription factors DNA transcription mrna translation Proteins Central Dogma:
More informationCOMS 4721: Machine Learning for Data Science Lecture 18, 4/4/2017
COMS 4721: Machine Learning for Data Science Lecture 18, 4/4/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University TOPIC MODELING MODELS FOR TEXT DATA
More informationIntroduction to clustering methods for gene expression data analysis
Introduction to clustering methods for gene expression data analysis Giorgio Valentini e-mail: valentini@dsi.unimi.it Outline Levels of analysis of DNA microarray data Clustering methods for functional
More informationEnsembles of Classifiers.
Ensembles of Classifiers www.biostat.wisc.edu/~dpage/cs760/ 1 Goals for the lecture you should understand the following concepts ensemble bootstrap sample bagging boosting random forests error correcting
More informationCS6220: DATA MINING TECHNIQUES
CS6220: DATA MINING TECHNIQUES Mining Graph/Network Data Instructor: Yizhou Sun yzsun@ccs.neu.edu November 16, 2015 Methods to Learn Classification Clustering Frequent Pattern Mining Matrix Data Decision
More informationarxiv: v1 [stat.ml] 23 Dec 2015
k-means Clustering Is Matrix Factorization Christian Bauckhage arxiv:151.07548v1 [stat.ml] 3 Dec 015 B-IT, University of Bonn, Bonn, Germany Fraunhofer IAIS, Sankt Augustin, Germany http://mmprec.iais.fraunhofer.de/bauckhage.html