Università della Calabria


Università della Calabria, Facoltà di Ingegneria
BIOINFORMATICS TECHNIQUES AND METHODOLOGIES
Research group coordinated by Prof. Luigi Palopoli
Lecturer: Simona Rombo

OUTLINE
1. Introduction to Bioinformatics
2. Pattern discovery
   - Strings
   - Images
3. Biological Networks Analysis
   - Network alignment
   - Network clustering

Introduction to Bioinformatics
Donald Knuth, 1993: "It is hard for me to say confidently that, after fifty more years of explosive growth of computer science, there will still be a lot of fascinating unsolved problems at people's fingertips, that it won't be pretty much working on refinement of well-explored things. Maybe all of the simple stuff and the really great stuff has been discovered. It may not be true, but I can't predict an unending growth. I can't be as confident about computer science as I can about biology. Biology easily has 500 years of exciting problems to work on."

Introduction to Bioinformatics
There are several facts about biology that are important to keep in mind:
- In biology there are no rules without exceptions.
- When reasoning with biological structures, looking for generalizations may often be misleading.
- It is often impossible to look at a biological phenomenon in isolation, because it may occur only while other related phenomena occur as well, and these must be taken into account too.
- Reasoning with incomplete information is the rule rather than the exception.
- When reasoning about biological structures and functions, it is important to bear in mind the pervasive role of evolution.

Introduction to Bioinformatics
A definition: "Bioinformatics is the combination of biology and information technology. It is the branch of science that deals with computer-based analysis of large biological data sets. Bioinformatics incorporates the development of databases to store and search data, and statistical tools and algorithms to analyze and determine relationships between biological data sets, such as macromolecular sequences, structures, expression profiles and biochemical pathways." (R.M. Twyman)
In most cases, the computer-based tools developed in bioinformatics require expert human intervention to solve the problems they address.

Introduction to Bioinformatics
Generally speaking, the aim of bioinformatics is to help biologists gather and process biological data, and to aid in the study of protein structures and interactions in order to enable optimal drug design.

Introduction to Bioinformatics
Here is a summary of CS methods and techniques relevant to bioinformatics:
- String algorithms, grammars and automata
- Indexing methods and query optimization
- Integration techniques
- Optimization techniques
- Dynamic programming and heuristics
- Data mining and machine learning techniques
- Probability- and statistics-based methods
- Computational geometry methods
- Text mining

Introduction to Bioinformatics
Two main points of view:
1. Cellular components (e.g., DNA, RNA, proteins)
2. Interactions of cellular components (e.g., metabolic pathways, protein-protein interactions)

Introduction to Bioinformatics: Cellular Components

Introduction to Bioinformatics: Cellular Components, DNA

Introduction to Bioinformatics
Cellular Components: amino acids
Proteins are the core structures determining the cell life cycle; they are made up of elementary units called amino acids, or residues. Proteins are built from 20 standard amino acids (few exceptions exist).

Introduction to Bioinformatics
Interactions of components
Another perspective is the analysis of mutual protein interactions: proteins are involved in complexes that perform specific biological functions. (Figure: interaction network of Saccharomyces cerevisiae.)

Pattern Discovery

Pattern discovery: efficient data structures
Trie
- A tree data structure used to store strings.
- Each edge has a label representing a symbol.
- Two edges out of the same node have distinct labels.
- Each node, except the root, is associated with a string: concatenating all the symbols on the path from the root to a node n yields the string corresponding to n.
- All the descendants of the same node n are associated with strings having a common prefix, namely the string corresponding to n.

Pattern discovery
Example: a trie storing the words {to, te, tea, ten, hi, he, her} (each node shows the string it is associated with):
(root)
  t
    to
    te
      tea
      ten
  h
    hi
    he
      her
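The trie definition and the example above can be illustrated with a minimal Python sketch (class and method names are our own, not from the slides). Each node keeps a dictionary of outgoing edges keyed by symbol, which enforces the rule that two edges out of the same node carry distinct labels; listing the words below a node returns exactly the stored strings sharing that node's string as a prefix.

```python
from typing import Dict, List, Optional


class TrieNode:
    """A trie node: outgoing edges keyed by symbol, plus an end-of-word flag."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_word = False


class Trie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for symbol in word:
            # Dict keys are unique, so edges out of a node have distinct labels.
            node = node.children.setdefault(symbol, TrieNode())
        node.is_word = True

    def _find(self, prefix: str) -> Optional[TrieNode]:
        """Follow the path spelling `prefix`; None if it leaves the trie."""
        node = self.root
        for symbol in prefix:
            node = node.children.get(symbol)
            if node is None:
                return None
        return node

    def contains(self, word: str) -> bool:
        node = self._find(word)
        return node is not None and node.is_word

    def with_prefix(self, prefix: str) -> List[str]:
        """All stored words in the subtree of the node spelling `prefix`."""
        start = self._find(prefix)
        if start is None:
            return []
        found: List[str] = []
        stack = [(start, prefix)]
        while stack:
            node, spelled = stack.pop()
            if node.is_word:
                found.append(spelled)
            for symbol, child in node.children.items():
                stack.append((child, spelled + symbol))
        return sorted(found)


trie = Trie()
for w in ["to", "te", "tea", "ten", "hi", "he", "her"]:
    trie.insert(w)
print(trie.with_prefix("te"))               # ['te', 'tea', 'ten']
print(trie.contains("he"), trie.contains("h"))  # True False
```

Note that "h" is a node of the trie but not a stored word, which is why the sketch keeps an explicit end-of-word flag.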

Pattern discovery: efficient data structures
Suffix Tree
- Given a string s of n characters over the alphabet Σ, a suffix tree T associated with s can be defined as a trie containing all the n suffixes of s.
- For each leaf i of T, the concatenation of the edge labels on the path from the root to leaf i exactly spells out the suffix s_i of s.
- For any pair of suffixes of s, the path associated with their longest common prefix is shared in T.
(Example on the string abbababbab.)
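A naive way to see these properties is to build an uncompacted suffix trie by inserting every suffix, as in the Python sketch below on the slide's example string. It rests on our own assumptions: a terminator `$` is appended so that every suffix ends at its own leaf, and construction takes quadratic time and space. An actual suffix tree compacts unary paths into single edges and can be built in linear time (e.g., with Ukkonen's or McCreight's algorithm).

```python
from typing import List


def build_suffix_trie(s: str, terminator: str = "$") -> dict:
    """Naive suffix trie: insert every suffix of s + terminator (O(n^2) time).

    The terminator ensures no suffix is a prefix of another, so each suffix
    ends at its own leaf; the leaf records the suffix's start position.
    """
    text = s + terminator
    root: dict = {}
    for i in range(len(text)):
        node = root
        for symbol in text[i:]:
            node = node.setdefault(symbol, {})
        node["leaf"] = i  # multi-character key, so it never clashes with a symbol
    return root


def occurrences(root: dict, pattern: str) -> List[int]:
    """Start positions of `pattern`: the leaves below the node that spells it."""
    node = root
    for symbol in pattern:
        if symbol not in node:
            return []
        node = node[symbol]
    found: List[int] = []
    stack = [node]
    while stack:
        current = stack.pop()
        for key, child in current.items():
            if key == "leaf":
                found.append(child)
            else:
                stack.append(child)
    return sorted(found)


tree = build_suffix_trie("abbababbab")
print(occurrences(tree, "bab"))  # [2, 4, 7]
```

Every occurrence of a pattern corresponds to a leaf below the node spelling it; on a true suffix tree this gives substring queries in time proportional to the pattern length plus the number of occurrences.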


Pattern Discovery
Problem: the size of the output is often exponential in the input size.
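To see why the output can explode, the brute-force Python sketch below enumerates every motif with don't cares (written `.`, our own convention) that starts and ends with a solid symbol and occurs at least twice. It is an illustration only, not a motif discovery algorithm: on the string a^n the count is 2^(n-2), because every choice of don't cares between the first and last solid symbol still matches.

```python
from itertools import product
from typing import Set

DONT_CARE = "."  # our notation for the don't care symbol


def occurs_at(pattern: str, text: str, pos: int) -> bool:
    """True if every solid symbol of `pattern` matches `text` starting at `pos`."""
    return all(c == DONT_CARE or c == text[pos + k] for k, c in enumerate(pattern))


def repeated_motifs(text: str, min_occ: int = 2) -> Set[str]:
    """Brute force: all patterns over the text alphabet plus DONT_CARE that
    start and end with a solid symbol and occur at least `min_occ` times."""
    symbols = sorted(set(text)) + [DONT_CARE]
    motifs: Set[str] = set()
    for length in range(1, len(text) + 1):
        for body in product(symbols, repeat=length):
            pattern = "".join(body)
            if DONT_CARE in (pattern[0], pattern[-1]):
                continue
            hits = sum(occurs_at(pattern, text, i)
                       for i in range(len(text) - length + 1))
            if hits >= min_occ:
                motifs.add(pattern)
    return motifs


for n in range(3, 9):
    # The count doubles with each extra symbol: 2 ** (n - 2).
    print(n, len(repeated_motifs("a" * n)))
```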


Pattern Discovery: 2D Array


Definition of maximal motif
A motif can fail to be MAXIMAL in two ways: it may be not maximal in composition, or not maximal in length.


BASIS
- A basis of an image I is a set of irredundant motifs able to generate all the other motifs of I.
- It is possible to prove that each image has ONLY ONE basis: the basis is unique.
- The size of the basis is linear in the size of the image: if I has size N, the number of motifs in the basis is O(N).
- In general, the number of motifs with don't cares in I is exponential in N.
- An important problem is therefore the extraction of the basis from I.

A key concept: autocorrelation
Autocorrelations: the meets between I and all of its shifted copies.
(Figure: arrays P, A and Q taken from an image, and the meet between P and Q, which keeps the symbols on which P and Q agree and places a don't care everywhere else.)
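The meet and the autocorrelations can be sketched in Python as follows, assuming images are lists of equal-length strings and `.` stands for the don't care (both our own conventions). The meet keeps a symbol where both arrays agree and puts a don't care elsewhere; an autocorrelation is the meet of the image with a copy of itself displaced by (di, dj), restricted to the region where the copies overlap. For brevity this sketch only handles non-negative offsets; enumerating all autocorrelations would also require negative column offsets.

```python
from typing import List

DONT_CARE = "."  # assumed don't care symbol
Grid = List[str]  # an image as a list of equal-length rows


def meet(p: Grid, q: Grid) -> Grid:
    """Position-wise meet of two equal-sized arrays: keep the symbols on which
    they agree, put a don't care everywhere else."""
    if len(p) != len(q) or any(len(a) != len(b) for a, b in zip(p, q)):
        raise ValueError("meet requires arrays of the same shape")
    return ["".join(x if x == y else DONT_CARE for x, y in zip(rp, rq))
            for rp, rq in zip(p, q)]


def autocorrelation(image: Grid, di: int, dj: int) -> Grid:
    """Meet of `image` with a copy of itself shifted by (di, dj), di, dj >= 0,
    restricted to the region where the two copies overlap."""
    rows, cols = len(image), len(image[0])
    if not (0 <= di < rows and 0 <= dj < cols):
        raise ValueError("offset outside the image")
    original = [row[: cols - dj] for row in image[: rows - di]]
    shifted = [row[dj:] for row in image[di:]]
    return meet(original, shifted)


image = ["abab",
         "baba",
         "abab"]
print(autocorrelation(image, 1, 1))  # ['aba', 'bab']
print(autocorrelation(image, 1, 0))  # ['....', '....']
```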

Consensus, Meet, Autocorrelation
(Figure: projection at (i1, j1) and (i2, j2).)

Basic Approach
Theorem: the basis is a subset of the set of autocorrelations.
Three steps:
1. Generate all the autocorrelations of the input image I: $O(N^2)$
2. Compute the lists of occurrences of the autocorrelations: ?
3. Discard redundant motifs: $O(N^2)$
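Step 2 (computing occurrence lists) can be made concrete with a naive Python sketch that tests every placement, using the same assumptions as before (images as lists of strings, `.` as the don't care). This brute-force baseline costs time proportional to the image size times the motif area for each motif; it only illustrates what the step computes, and the approaches listed for the second step are designed to be much faster.

```python
from typing import List, Tuple

DONT_CARE = "."  # assumed don't care symbol
Grid = List[str]


def occurrences_2d(image: Grid, motif: Grid) -> List[Tuple[int, int]]:
    """Naive occurrence list: every top-left corner (i, j) at which all solid
    symbols of `motif` match the image (don't cares match anything)."""
    h, w = len(motif), len(motif[0])
    rows, cols = len(image), len(image[0])
    hits: List[Tuple[int, int]] = []
    for i in range(rows - h + 1):
        for j in range(cols - w + 1):
            if all(m == DONT_CARE or m == image[i + r][j + c]
                   for r, motif_row in enumerate(motif)
                   for c, m in enumerate(motif_row)):
                hits.append((i, j))
    return hits


image = ["abab",
         "baba",
         "abab"]
print(occurrences_2d(image, ["a.", ".a"]))  # [(0, 0), (0, 2), (1, 1)]
```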

Second step
1. Fischer & Paterson: $O(N^2 \log n \log\log n)$
2. Incremental building of the set of irredundant motifs: $O(N^3)$
   (Figure: rows i and j of an image and the incremental step from $B_{ij}$ to $B_{ij+1}$.)
3. Exploiting some properties of don't cares: $O(N^2)$, but only for binary alphabets

Optimal Approach
Exploit some properties holding for $|\Sigma| = 2$ (e.g., $\Sigma = \{a, b\}$).

Optimal Approach: Example
$d_1 = 2$
Is (2, 2) an occurrence of $A_{34}$? $d_2 = 0$, $d_3 = 2$
Is (2, 4) an occurrence of $A_{34}$? $d_2 = 1$, $d_3 = 1$

Optimal Approach
Three steps:
1. Generate all the autocorrelations of the input image I: $O(N^2)$
2. Compute the lists of occurrences of the autocorrelations: $O(N^2)$ (only black-and-white images)
3. Discard redundant motifs: $O(N^2)$
Overall cost: $O(N^2)$

Image Compression
Main idea: exploit the motif basis as 2D patches.


Pattern discovery: references
- A. Amelio, A. Apostolico and S. E. Rombo. Image Compression by 2D Motif Basis. In Proceedings of IEEE Data Compression Conference (DCC 2011), IEEE CS Press, Snowbird, UT, USA, 2011 (forthcoming).
- A. Apostolico, L. Parida and S. E. Rombo. Motif Patterns in 2D. Theoretical Computer Science.
- S. E. Rombo. Optimal extraction of motif patterns in 2D. Inf. Process. Lett. 109(17) (2009).
- A. Apostolico and L. Parida. Incremental Paradigms of Motif Discovery. J. of Comp. Biol. 11:1 (2004).
- A. Amir and M. Farach. Two-dimensional dictionary matching. Inf. Process. Lett. 44:5 (1992).
- M.J. Fischer and M.S. Paterson. String Matching and Other Products. In: R.M. Karp (Ed.), Complexity of Computation (SIAM-AMS Proceedings, v.7), 1974, pp

Pattern discovery
Further topics (from 2009 onward):
- Image compression
- Analysis of biological images
- Pattern discovery/matching on images with rotations, scaling and other variants
- Techniques applied to searching for similarity between images
- Pattern discovery (motif extraction) on biological strings

Biological Networks Analysis
PPI networks similarity search
- Evolution influences protein-protein interactions.
- Proteins cannot be analyzed independently.
- Both high-throughput and computational methods contribute to discovering and predicting protein-protein interactions.

Biological Networks Analysis
The interaction network of an organism: nodes = proteins, edges = interactions.

Biological Networks Analysis
Why search for similarity between proteins belonging to different PPI networks? To identify functional conservation across species.

Biological Networks Analysis
Our basic idea: two proteins p1 and p2 in two different PPI networks may be considered similar if:
- p1 and p2 have similar sequences, and
- the proteins that p1 and p2 are connected with, i.e., their neighborhoods, have similar sequences.

Biological Networks Analysis
Refining protein similarities: S = sequence similarity

Biological Networks Analysis
Refining protein similarities: S' = refined similarity

Biological Networks Analysis
The Graph Network
- P = a set of nodes labeled by protein ids
- I = a set of undirected labeled edges <w, c>, with w, c ∈ [0, 1], where w = weakness and c = confidence
Graph Network: GN = <P, I>

Biological Networks Analysis
Interaction Path_i (I-Path_i): a path such that $F(i-1) < \sum_u w_u \le F(i)$, with $i \ge 1$ and $F(0) = 0$, where u ranges over the edges of the path.
Example (figure: a graph network on proteins p1 to p9 with labeled edges), with $F(x) = x^2$ and i = 1:
- <p2, p1, p4>: satisfied
- <p3, p4, p5, p6>: satisfied
- <p4, p5, p9>: not satisfied

Biological Networks Analysis
Cumulative Confidence: given an I-Path_i, $C = \prod_u c_u$.
Example (same network, $F(x) = x^2$, i = 1): for the path <p2, p1, p4>, C = 0.4 * 0.7 = 0.28.
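Both path-level definitions can be checked directly. The Python sketch below uses a small hypothetical graph network whose edge labels are our own choice, and it assumes the reading $F(i-1) < \sum_u w_u \le F(i)$ of the I-Path condition (strict lower bound, inclusive upper bound).

```python
import math
from typing import Callable, Dict, FrozenSet, List, Tuple

# Hypothetical graph network: each undirected edge maps to its label <w, c>.
Labels = Dict[FrozenSet[str], Tuple[float, float]]

labels: Labels = {
    frozenset(("p1", "p2")): (0.2, 0.7),
    frozenset(("p1", "p4")): (0.3, 0.4),
    frozenset(("p4", "p5")): (0.6, 0.2),
    frozenset(("p5", "p9")): (0.9, 0.4),
}


def path_labels(path: List[str], labels: Labels) -> List[Tuple[float, float]]:
    """Labels <w, c> of the consecutive edges of a path given as a node list."""
    return [labels[frozenset((a, b))] for a, b in zip(path, path[1:])]


def is_i_path(path: List[str], labels: Labels, i: int,
              F: Callable[[float], float]) -> bool:
    """Assumed I-Path_i test: F(i - 1) < (sum of edge weaknesses) <= F(i)."""
    total = sum(w for w, _ in path_labels(path, labels))
    return F(i - 1) < total <= F(i)


def cumulative_confidence(path: List[str], labels: Labels) -> float:
    """C = product of the edge confidences c_u along the path."""
    return math.prod(c for _, c in path_labels(path, labels))


def F(x: float) -> float:
    return x ** 2  # F(0) = 0, F(1) = 1, F(2) = 4


print(is_i_path(["p2", "p1", "p4"], labels, 1, F))  # True  (0.2 + 0.3 = 0.5)
print(is_i_path(["p4", "p5", "p9"], labels, 1, F))  # False (0.6 + 0.9 = 1.5)
print(round(cumulative_confidence(["p2", "p1", "p4"], labels), 2))  # 0.28
```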

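Biological Networks Analysis
i-th Neighborhood: given a node p in GN = <P, I>,
$N(p,i) = \{\, q \mid q \in P,\ q \neq p,\ \langle p, q \rangle \text{ is an I-Path}_i \text{ in } GN \text{ with minimum } \sum_u w_u \,\}$,
i.e., the nodes q whose minimum-weight path from p is an I-Path_i.
Example (figure: a network on proteins p1 to p6, with $F(x) = x^2$ and i = 1): N(p, 1) contains four proteins.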
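Because weaknesses are non-negative, the minimum $\sum_u w_u$ from p to every other node can be computed with Dijkstra's algorithm; N(p, i) then collects the nodes whose minimum weight falls in $(F(i-1), F(i)]$. The Python sketch below follows that reading; the graph and helper names are hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Set, Tuple

Graph = Dict[str, List[Tuple[str, float]]]  # node -> [(neighbor, weakness w)]


def add_edge(graph: Graph, a: str, b: str, w: float) -> None:
    """Add an undirected edge with weakness w."""
    graph.setdefault(a, []).append((b, w))
    graph.setdefault(b, []).append((a, w))


def min_weakness(graph: Graph, source: str) -> Dict[str, float]:
    """Dijkstra: minimum sum of edge weaknesses from `source` to each reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def neighborhood(graph: Graph, p: str, i: int,
                 F: Callable[[float], float]) -> Set[str]:
    """N(p, i): nodes q != p whose minimum-weakness path from p satisfies the
    assumed I-Path_i condition F(i - 1) < sum of w <= F(i)."""
    dist = min_weakness(graph, p)
    return {q for q, d in dist.items() if q != p and F(i - 1) < d <= F(i)}


def F(x: float) -> float:
    return x ** 2


# Hypothetical network, not the one in the slide's figure.
g: Graph = {}
for a, b, w in [("p1", "p2", 0.3), ("p2", "p3", 0.6), ("p1", "p4", 0.5),
                ("p4", "p5", 0.9), ("p3", "p6", 0.7)]:
    add_edge(g, a, b, w)

print(sorted(neighborhood(g, "p1", 1, F)))  # ['p2', 'p3', 'p4']
print(sorted(neighborhood(g, "p1", 2, F)))  # ['p5', 'p6']
```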

Biological Networks Analysis
The Bi-GRAPPIN Algorithm
Let GN1 and GN2 be the graph networks of two different organisms, with n1 and n2 nodes respectively. Align each pair of proteins (p, p'), with p ∈ GN1 and p' ∈ GN2 (e.g., by the BLAST 2 Sequences algorithm).

Biological Networks Analysis
The Bi-GRAPPIN Algorithm
INPUT: a sequence similarity dictionary SSD storing all the triplets <p, p', f0>, with p ∈ GN1, p' ∈ GN2, f0 ∈ [0, 1]; f0 is obtained from the sequence alignment parameters.
OUTPUT: a dictionary FSD storing the triplets <p, p', fp>, with p ∈ GN1, p' ∈ GN2, fp ∈ [0, 1]; fp is the functional similarity.

Biological Networks Analysis
The Bi-GRAPPIN Algorithm
FSD = SSD
for each <p, p', f0> ∈ SSD
    if (f0 > f_cut-off)
        set i = 1
        while i < i_max    (i_max: a fixed threshold value corresponding to the maximum percentage of the network to be analyzed)
            generate N(p, i) and N(p', i)
            compute a bipartite graph maximum weight matching between N(p, i) and N(p', i)
            refine f0, obtaining a new value fp, according to the objective function of the maximum weight matching
            i = i + 1
return FSD
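The core subroutine of the loop is a maximum weight matching between the two neighborhoods. The Python sketch below is a stand-in of our own, not the authors' implementation: an exact dynamic program over subsets of the right-hand side, exponential in its size and therefore usable only for small neighborhoods. For realistic sizes one would use the Hungarian algorithm (for instance SciPy's `linear_sum_assignment` with `maximize=True`). Pairs with no recorded similarity are treated as non-edges.

```python
from functools import lru_cache
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def max_weight_matching(left: List[str], right: List[str],
                        sim: Dict[Pair, float]) -> Tuple[float, List[Pair]]:
    """Exact maximum weight bipartite matching by dynamic programming over
    subsets of `right` (bitmask). Exponential in len(right): small inputs only.
    Pairs missing from `sim` (or with weight <= 0) are not edges."""

    @lru_cache(maxsize=None)
    def best(k: int, used: int) -> Tuple[float, Tuple[Pair, ...]]:
        if k == len(left):
            return 0.0, ()
        score, pairs = best(k + 1, used)          # leave left[k] unmatched
        for j, r in enumerate(right):
            w = sim.get((left[k], r), 0.0)
            if w > 0 and not used & (1 << j):     # r is still free
                s, rest = best(k + 1, used | (1 << j))
                if s + w > score:
                    score, pairs = s + w, ((left[k], r),) + rest
        return score, pairs

    total, pairs = best(0, 0)
    return total, list(pairs)


# Hypothetical neighborhoods N(p, i) = {a1, a2} and N(p', i) = {b1, b2}.
sim = {("a1", "b1"): 0.9, ("a1", "b2"): 0.8, ("a2", "b1"): 0.7}
total, pairs = max_weight_matching(["a1", "a2"], ["b1", "b2"], sim)
print(total, pairs)  # 1.5 [('a1', 'b2'), ('a2', 'b1')]
```

The example shows why a matching, rather than greedy pairing, is needed: greedily taking the best edge (a1, b1) would leave a2 unmatched and score only 0.9.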

Biological Networks Analysis
Example (1/3): yeast vs. fly. Target proteins p (yeast) and p' (fly) and their neighborhoods N(·, 1).
Parameters: i_max = 4, f0(p, p') > f_cut-off, F(x) = identity, <w, c> = <1, 1>.

Biological Networks Analysis
Example (2/3): bipartite graph maximum weight matching between N(p, 1) and N(p', 1).
(Figure: yeast and fly neighborhood proteins linked by edges weighted with their similarities, ranging from 0.22 to 0.89.)

The refined similarity after the first step is
$f_p(1) = \delta(1)\cdot\mu\big(N(p,1), N(p',1), FSD, \alpha\big) + [1-\delta(1)]\cdot f_0(p,p')$
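The refinement is a convex combination of a matching-based score and the current similarity. The slides do not define µ or δ, so the Python sketch below uses a hypothetical µ (the matching weight divided by the size of the larger neighborhood, which keeps µ in [0, 1] when similarities are in [0, 1]) and takes δ as a given number in [0, 1]; α is omitted. Passing the current value instead of f0 at later iterations is also our assumption.

```python
def mu(matching_weight: float, size_p: int, size_q: int) -> float:
    """Hypothetical µ: total weight of the maximum matching divided by the size
    of the larger neighborhood (stays in [0, 1] if similarities do)."""
    largest = max(size_p, size_q)
    return matching_weight / largest if largest else 0.0


def refine(f_current: float, matching_weight: float, size_p: int, size_q: int,
           delta: float) -> float:
    """f_p(i) = delta(i) * mu(...) + [1 - delta(i)] * f_current."""
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    return delta * mu(matching_weight, size_p, size_q) + (1.0 - delta) * f_current


print(refine(0.5, 1.5, 2, 2, delta=0.5))  # 0.625: similar neighborhoods raise f0
print(refine(0.9, 0.2, 2, 2, delta=0.5))  # about 0.5: dissimilar ones lower it
```

With this choice the refined value moves toward the neighborhood score, rising above f0 for similar neighborhoods and dropping below it for dissimilar ones, which matches the qualitative behaviour on the synthetic data slides.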


Biological Networks Analysis
Example (3/3): the same steps are repeated on N(·, 2) and N(·, 3); finally, <p, p', fp(3)> is stored in FSD.
Parameters: yeast vs. fly, i_max = 4, f0(p, p') > f_cut-off, F(x) = identity, <w, c> = <1, 1>.

Biological Networks Analysis
Synthetic data (1/3): very similar neighborhoods: final fp greater than f0.

Biological Networks Analysis
Synthetic data (2/3): high f0 but very dissimilar neighborhoods: final fp lower than f0.

Biological Networks Analysis
Synthetic data (3/3): high f0, not very similar N(·, 1) but very similar N(·, 2): final fp greater than f0.

Functional Orthologs
- S. Bandyopadhyay, R. Sharan, and T. Ideker. Systematic identification of functional orthologs based on protein network comparison. Genome Research, 16(3).
- R. Singh, J. Xu, and B. Berger. Pairwise global alignment of protein interaction networks by matching neighborhood topology. In RECOMB LNB

Biological Networks Analysis
Further experiments
- Query the D. melanogaster PPI network with Abp1, for which no evident homolog has been detected.
- The most similar protein based on sequence homology is CG10083 (a drebrin-like protein).
- Abp1 is an actin-binding protein regulating actin nucleation.
Is it possible to find other proteins involved in actin reorganization by comparing the subnetwork made of Abp1 together with its first two neighborhoods against the entire Drosophila network?

Biological Networks Analysis
Further experiments
- Best match according to our refined similarity: CG10083 (confirming the pairwise sequence similarity). Abp1 and CG10083 are both actin-binding proteins.
- Other proteins of unknown function showing low sequence similarity with Abp1 may share a similar function, e.g., CG6873-PA, a cofilin-like protein possibly involved in cytoskeleton shaping.
SSD: <Abp1, CG6873-PA, 0.287>
FSD: <Abp1, CG6873-PA, >

Biological Networks Analysis
Asymmetric Alignment
- Master network: guides the alignment process.
- Slave network: is aligned to the master.
Some organisms are well characterized (e.g., Saccharomyces cerevisiae); this is not the case for many other organisms.
Advantage: the results retain the structural characteristics of the master network (so they are "sound").

Biological Networks Analysis
Asymmetric Alignment
Linearization of the slave network: translation of the network into a sequence of symbols. Given a linearization of the slave, find the portion of the master that can be associated with it.
Motivations:
- Only the slave network is linearized, so all the structural information about the master network is kept.
- The approximation allows us to find similar groups of proteins, not just isomorphic structures.
- The resulting algorithm has polynomial time complexity.

Biological Networks Analysis
Asymmetric Alignment
Alignment model of the master network: a weighted finite-state automaton whose states correspond to proteins, e.g.:
- (p1, 0), (p2, 1),..., (p3, 0): score 1
- (p1, 0), (*, 1),..., (*, 0): score 2
Find the maximum-scoring path (among the states of the master) for the linearization of the slave network: Viterbi algorithm.
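The last step is a standard Viterbi decoding. The Python sketch below shows it on a hypothetical model that is ours, not the paper's scoring scheme: states are master proteins, transitions are allowed (score 0) only along master interactions, emission scores stand for similarities between a master protein and a slave symbol, and scores are additive (e.g., log-probabilities). The (*, ·) gap states shown above are not modeled.

```python
from typing import Dict, List, Sequence, Tuple


def viterbi(observations: Sequence[str], states: List[str],
            start: Dict[str, float], trans: Dict[Tuple[str, str], float],
            emit: Dict[Tuple[str, str], float]) -> Tuple[float, List[str]]:
    """Maximum-scoring state path with additive scores.
    Missing start/transition/emission entries are forbidden (-inf)."""
    NEG = float("-inf")
    score = {s: start.get(s, NEG) + emit.get((s, observations[0]), NEG)
             for s in states}
    back: List[Dict[str, str]] = []
    for obs in observations[1:]:
        new_score: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for s in states:
            # Best predecessor r of s, given the scores of the previous column.
            prev, best = max(((r, score[r] + trans.get((r, s), NEG)) for r in states),
                             key=lambda item: item[1])
            new_score[s] = best + emit.get((s, obs), NEG)
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    last = max(states, key=lambda s: score[s])
    path = [last]
    for pointers in reversed(back):     # follow back-pointers to the start
        path.append(pointers[path[-1]])
    return score[last], path[::-1]


master_states = ["m1", "m2", "m3"]            # hypothetical master proteins
trans: Dict[Tuple[str, str], float] = {}
for a, b in [("m1", "m2"), ("m2", "m3")]:     # hypothetical master interactions
    trans[(a, b)] = trans[(b, a)] = 0.0
start = {s: 0.0 for s in master_states}
emit = {("m1", "s1"): 2.0, ("m1", "s2"): 0.5, ("m1", "s3"): 0.1,
        ("m2", "s1"): 0.3, ("m2", "s2"): 3.0, ("m2", "s3"): 0.2,
        ("m3", "s1"): 1.0, ("m3", "s2"): 0.4, ("m3", "s3"): 1.5}
slave = ["s1", "s2", "s3"]                    # hypothetical slave linearization
score, path = viterbi(slave, master_states, start, trans, emit)
print(score, path)  # 6.5 ['m1', 'm2', 'm3']
```

The dynamic program runs in time proportional to the length of the linearization times the square of the number of master states, which is consistent with the polynomial complexity claimed for the approach.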

Biological Networks Analysis
Asymmetric Alignment
(Figure: global alignment of yeast (master) and fly (slave).)

Biological Networks Analysis
Asymmetric Alignment
- Yeast (as the master) vs. fly: 945 protein pairings.
- Fly (as the master) vs. yeast: 707 protein pairings.
Possible explanations:
- The yeast network is better characterized than the fly network, so with yeast as the slave much structural information gets lost.
- More regions of the yeast network have been conserved in the fly than vice versa, since the fly is more complex.

Biological Networks Analysis
PPI networks clustering
Aim: clustering the dense regions of a given PPI network, since biologists have observed that groups of highly interacting proteins may be involved in common biological processes.

Biological Networks Analysis
Search for functional modules in PPI networks
The network is modeled by a matrix representing the interactions. The algorithm introduces the concept of quality of a sub-matrix and applies a greedy technique to discover compact regions of the network.
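A greedy search over sub-matrices can be sketched as follows. The quality function here is a hypothetical stand-in of our own (the fraction of possible interactions present among the selected proteins, i.e., the density of a principal sub-matrix of the adjacency matrix), not the quality measure defined by the authors; the growth rule and the `min_quality` threshold are likewise assumptions.

```python
from typing import List, Set


def quality(adj: List[List[int]], nodes: Set[int]) -> float:
    """Hypothetical quality of the sub-matrix induced by `nodes`: the fraction of
    possible interactions that are present (1.0 for a clique)."""
    k = len(nodes)
    if k < 2:
        return 0.0
    present = sum(adj[a][b] for a in nodes for b in nodes if a < b)
    return present / (k * (k - 1) / 2)


def greedy_module(adj: List[List[int]], seed: int,
                  min_quality: float = 0.8) -> Set[int]:
    """Grow a module from `seed`: at each step add the protein giving the
    highest quality (ties -> smallest index); stop below `min_quality`."""
    module = {seed}
    candidates = set(range(len(adj))) - module
    while candidates:
        best = max(candidates, key=lambda v: (quality(adj, module | {v}), -v))
        if quality(adj, module | {best}) < min_quality:
            break
        module.add(best)
        candidates.discard(best)
    return module


# Hypothetical 6-protein network: 0-3 form a clique, 4 hangs off 3, 5 off 4.
n = 6
adj = [[0] * n for _ in range(n)]
for a, b in [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]:
    adj[a][b] = adj[b][a] = 1

print(greedy_module(adj, 0))  # {0, 1, 2, 3}
print(greedy_module(adj, 5))  # {4, 5}
```

Starting from protein 0 the search recovers the clique and stops, because adding protein 4 would lower the density to 0.7.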


Biological Networks Analysis
Validation

Biological Networks Analysis: references
1. N. Ferraro, L. Palopoli, S. Panni and S. E. Rombo. Master-Slave Biological Network Alignment. In Proceedings of the 6th International Symposium on Bioinformatics Research and Applications (ISBRA 2010), Connecticut, USA.
2. F. Bruno, L. Palopoli and S. E. Rombo. New trends in graph mining: Structural and Node-colored network motifs. International Journal of Knowledge Discovery in Bioinformatics, 1(1), 81-99.
3. C. Pizzuti and S. E. Rombo. Multi-functional Protein Clustering in PPI Networks. BIRD
4. V. Fionda, S. Panni, L. Palopoli and S. E. Rombo. Bi-GRAPPIN: Bipartite graph based protein-protein interaction networks similarity search. In Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine (BIBM'07). Silicon Valley, USA.
5. C. Pizzuti and S. E. Rombo. PINCoC: a Co-Clustering based Method to Analyze Protein-Protein Interaction Networks. In Proceedings of the 8th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL'07). Birmingham, UK, 16th-19th December.
6. S. Bandyopadhyay, R. Sharan, and T. Ideker. Systematic identification of functional orthologs based on protein network comparison. Genome Research, 16(3).

Biological Networks Analysis
Further topics (from 2009 onward):
- Alignment of biological networks
- Integration and cleaning of biological networks
- Querying of biological databases/networks
- Biological networks clustering
- RNA structure prediction
- RNA sequence/structure alignment

Comparative Network Analysis

Comparative Network Analysis Comparative Network Analysis BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2016 Anthony Gitter gitter@biostat.wisc.edu These slides, excluding third-party material, are licensed under CC BY-NC 4.0 by

More information

Computational methods for predicting protein-protein interactions

Computational methods for predicting protein-protein interactions Computational methods for predicting protein-protein interactions Tomi Peltola T-61.6070 Special course in bioinformatics I 3.4.2008 Outline Biological background Protein-protein interactions Computational

More information

Network alignment and querying

Network alignment and querying Network biology minicourse (part 4) Algorithmic challenges in genomics Network alignment and querying Roded Sharan School of Computer Science, Tel Aviv University Multiple Species PPI Data Rapid growth

More information

Biological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor

Biological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor Biological Networks:,, and via Relative Description Length By: Tamir Tuller & Benny Chor Presented by: Noga Grebla Content of the presentation Presenting the goals of the research Reviewing basic terms

More information

Pattern Matching (Exact Matching) Overview

Pattern Matching (Exact Matching) Overview CSI/BINF 5330 Pattern Matching (Exact Matching) Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Pattern Matching Exhaustive Search DFA Algorithm KMP Algorithm

More information

Network Alignment 858L

Network Alignment 858L Network Alignment 858L Terms & Questions A homologous h Interolog = B h Species 1 Species 2 Are there conserved pathways? What is the minimum set of pathways required for life? Can we compare networks

More information

Algorithms for Molecular Biology

Algorithms for Molecular Biology Algorithms for Molecular Biology BioMed Central Research A polynomial time biclustering algorithm for finding approximate expression patterns in gene expression time series Sara C Madeira* 1,2,3 and Arlindo

More information

Computational Genomics. Systems biology. Putting it together: Data integration using graphical models

Computational Genomics. Systems biology. Putting it together: Data integration using graphical models 02-710 Computational Genomics Systems biology Putting it together: Data integration using graphical models High throughput data So far in this class we discussed several different types of high throughput

More information

Multiple Sequence Alignment, Gunnar Klau, December 9, 2005, 17:

Multiple Sequence Alignment, Gunnar Klau, December 9, 2005, 17: Multiple Sequence Alignment, Gunnar Klau, December 9, 2005, 17:50 5001 5 Multiple Sequence Alignment The first part of this exposition is based on the following sources, which are recommended reading:

More information

Quantifying sequence similarity

Quantifying sequence similarity Quantifying sequence similarity Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, February 16 th 2016 After this lecture, you can define homology, similarity, and identity

More information

Computational Systems Biology

Computational Systems Biology Computational Systems Biology Vasant Honavar Artificial Intelligence Research Laboratory Bioinformatics and Computational Biology Graduate Program Center for Computational Intelligence, Learning, & Discovery

More information

CISC 636 Computational Biology & Bioinformatics (Fall 2016)

CISC 636 Computational Biology & Bioinformatics (Fall 2016) CISC 636 Computational Biology & Bioinformatics (Fall 2016) Predicting Protein-Protein Interactions CISC636, F16, Lec22, Liao 1 Background Proteins do not function as isolated entities. Protein-Protein

More information

Understanding Science Through the Lens of Computation. Richard M. Karp Nov. 3, 2007

Understanding Science Through the Lens of Computation. Richard M. Karp Nov. 3, 2007 Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007 The Computational Lens Exposes the computational nature of natural processes and provides a language for their description.

More information

Graph Alignment and Biological Networks

Graph Alignment and Biological Networks Graph Alignment and Biological Networks Johannes Berg http://www.uni-koeln.de/ berg Institute for Theoretical Physics University of Cologne Germany p.1/12 Networks in molecular biology New large-scale

More information

BIOINFORMATICS. Improved Network-based Identification of Protein Orthologs. Nir Yosef a,, Roded Sharan a and William Stafford Noble b

BIOINFORMATICS. Improved Network-based Identification of Protein Orthologs. Nir Yosef a,, Roded Sharan a and William Stafford Noble b BIOINFORMATICS Vol. no. 28 Pages 7 Improved Network-based Identification of Protein Orthologs Nir Yosef a,, Roded Sharan a and William Stafford Noble b a School of Computer Science, Tel-Aviv University,

More information

Discovering Binding Motif Pairs from Interacting Protein Groups

Discovering Binding Motif Pairs from Interacting Protein Groups Discovering Binding Motif Pairs from Interacting Protein Groups Limsoon Wong Institute for Infocomm Research Singapore Copyright 2005 by Limsoon Wong Plan Motivation from biology & problem statement Recasting

More information

Bioinformatics. Dept. of Computational Biology & Bioinformatics

Bioinformatics. Dept. of Computational Biology & Bioinformatics Bioinformatics Dept. of Computational Biology & Bioinformatics 3 Bioinformatics - play with sequences & structures Dept. of Computational Biology & Bioinformatics 4 ORGANIZATION OF LIFE ROLE OF BIOINFORMATICS

More information

Comparative Genomics: Sequence, Structure, and Networks. Bonnie Berger MIT

Comparative Genomics: Sequence, Structure, and Networks. Bonnie Berger MIT Comparative Genomics: Sequence, Structure, and Networks Bonnie Berger MIT Comparative Genomics Look at the same kind of data across species with the hope that areas of high correlation correspond to functional

More information

Bioinformatics: Network Analysis

Bioinformatics: Network Analysis Bioinformatics: Network Analysis Comparative Network Analysis COMP 572 (BIOS 572 / BIOE 564) - Fall 2013 Luay Nakhleh, Rice University 1 Biomolecular Network Components 2 Accumulation of Network Components

More information

On the Monotonicity of the String Correction Factor for Words with Mismatches

On the Monotonicity of the String Correction Factor for Words with Mismatches On the Monotonicity of the String Correction Factor for Words with Mismatches (extended abstract) Alberto Apostolico Georgia Tech & Univ. of Padova Cinzia Pizzi Univ. of Padova & Univ. of Helsinki Abstract.

More information

Module 9: Tries and String Matching

Module 9: Tries and String Matching Module 9: Tries and String Matching CS 240 - Data Structures and Data Management Sajed Haque Veronika Irvine Taylor Smith Based on lecture notes by many previous cs240 instructors David R. Cheriton School

More information

CSE182-L7. Protein Sequence Analysis Patterns (regular expressions) Profiles HMM Gene Finding CSE182

CSE182-L7. Protein Sequence Analysis Patterns (regular expressions) Profiles HMM Gene Finding CSE182 CSE182-L7 Protein Sequence Analysis Patterns (regular expressions) Profiles HMM Gene Finding 10-07 CSE182 Bell Labs Honors Pattern matching 10-07 CSE182 Just the Facts Consider the set of all substrings

More information

Phylogenetic Analysis of Molecular Interaction Networks 1

Phylogenetic Analysis of Molecular Interaction Networks 1 Phylogenetic Analysis of Molecular Interaction Networks 1 Mehmet Koyutürk Case Western Reserve University Electrical Engineering & Computer Science 1 Joint work with Sinan Erten, Xin Li, Gurkan Bebek,

More information

Foreword. Grammatical inference. Examples of sequences. Sources. Example of problems expressed by sequences Switching the light

Foreword. Grammatical inference. Examples of sequences. Sources. Example of problems expressed by sequences Switching the light Foreword Vincent Claveau IRISA - CNRS Rennes, France In the course of the course supervised symbolic machine learning technique concept learning (i.e. 2 classes) INSA 4 Sources s of sequences Slides and

More information

A New Dominant Point-Based Parallel Algorithm for Multiple Longest Common Subsequence Problem

A New Dominant Point-Based Parallel Algorithm for Multiple Longest Common Subsequence Problem A New Dominant Point-Based Parallel Algorithm for Multiple Longest Common Subsequence Problem Dmitry Korkin This work introduces a new parallel algorithm for computing a multiple longest common subsequence

More information

Linear Classifiers (Kernels)

Linear Classifiers (Kernels) Universität Potsdam Institut für Informatik Lehrstuhl Linear Classifiers (Kernels) Blaine Nelson, Christoph Sawade, Tobias Scheffer Exam Dates & Course Conclusion There are 2 Exam dates: Feb 20 th March

More information

Introduction to Bioinformatics. Shifra Ben-Dor Irit Orr

Introduction to Bioinformatics. Shifra Ben-Dor Irit Orr Introduction to Bioinformatics Shifra Ben-Dor Irit Orr Lecture Outline: Technical Course Items Introduction to Bioinformatics Introduction to Databases This week and next week What is bioinformatics? A

More information

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences Jianlin Cheng, PhD Department of Computer Science University of Missouri 2008 Free for Academic

More information

Tandem Mass Spectrometry: Generating function, alignment and assembly

Tandem Mass Spectrometry: Generating function, alignment and assembly Tandem Mass Spectrometry: Generating function, alignment and assembly With slides from Sangtae Kim and from Jones & Pevzner 2004 Determining reliability of identifications Can we use Target/Decoy to estimate

More information

Protein Complex Identification by Supervised Graph Clustering

Protein Complex Identification by Supervised Graph Clustering Protein Complex Identification by Supervised Graph Clustering Yanjun Qi 1, Fernanda Balem 2, Christos Faloutsos 1, Judith Klein- Seetharaman 1,2, Ziv Bar-Joseph 1 1 School of Computer Science, Carnegie

More information

Introduction to Bioinformatics

Introduction to Bioinformatics CSCI8980: Applied Machine Learning in Computational Biology Introduction to Bioinformatics Rui Kuang Department of Computer Science and Engineering University of Minnesota kuang@cs.umn.edu History of Bioinformatics

More information

Hidden Markov Models

Hidden Markov Models Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training

More information

Grundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson

Grundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson Grundlagen der Bioinformatik, SS 10, D. Huson, April 12, 2010 1 1 Introduction Grundlagen der Bioinformatik Summer semester 2010 Lecturer: Prof. Daniel Huson Office hours: Thursdays 17-18h (Sand 14, C310a)

More information

Bioinformatics Chapter 1. Introduction

Bioinformatics Chapter 1. Introduction Bioinformatics Chapter 1. Introduction Outline! Biological Data in Digital Symbol Sequences! Genomes Diversity, Size, and Structure! Proteins and Proteomes! On the Information Content of Biological Sequences!

More information

PROTEINS form the basic building blocks of all living
