Feature Based Gene Summary Extraction with Re-ranking


Samir Gupta
Computer and Information Sciences
University of Delaware
Newark, DE, USA

Abstract

Due to the vast and growing volume of biomedical literature, searching medical databases for information about genes is becoming cumbersome. Searching PubMed with a gene name as the query returns thousands of results, including irrelevant ones. The Gene Ontology (GO) and UniProtKB databases provide an indication of relevant terms associated with a gene, but this is not enough for a quick understanding of the different properties of the gene. Moreover, these resources are manually written and curated, which is both labor-intensive and time-consuming. Automatically generating summaries for genes would help biologists get an overall picture of a gene quickly. In this paper we adapt generic feature-based extractive summarization techniques and augment them with biomedical domain-specific features. We also use the concept of novelty to reduce redundancy in the extracted summary. Our results show that the inclusion of domain-specific features and redundancy removal improve the content of the summary significantly.

1 Introduction

Biomedical databases such as PubMed (McEntyre and Lipman, 2001) and BioMed Central are expanding rapidly and contain millions of articles. Because of this vast amount of information, biologists spend a large amount of time searching and reading articles to find relevant information. One information need that life scientists frequently have is gene-specific information. A quick overview of the different properties, functions and other aspects of a gene would be very useful. Efforts have been made to construct databases such as Entrez Gene (Maglott et al., 2005), Gene Ontology and UniProtKB, which provide important information about a gene. But these databases are manually created and require curation and regular updates, which is labor-intensive. This motivates the development of an automatic gene summary extractor.
In this paper we describe an approach that expands on the generic features used in summary extraction by including domain-specific features. Feature-based summary extraction techniques were explored by Edmundson (1969) and Kupiec et al. (1995) for generic domains. We augment these features with domain-specific features such as the presence of the gene name and of certain biological cue phrases. We also use a variant of Maximal Marginal Relevance (Carbonell and Goldstein, 1998) to reduce the redundancy in the final summary. We use different modules of eGIFT (Tudor et al., 2010), a gene information mining tool, to extract the set of abstracts relating to a gene, compute descriptive words, and extract gene name variations. The major contributions of this paper are:

Applying the generic features used by Edmundson (1969) to the biomedical domain.
Augmenting the generic features with biomedical domain-specific features.
Using terms provided by the Gene Ontology and UniProtKB databases to re-rank sentences based on information novelty.

2 Approach

In this section we discuss the details of the gene summarization system. The input to this system is a gene identifier, the same as the one used in the eGIFT system (Tudor et al., 2010). Given a gene

identifier, we first extract a set of abstracts from Medline. The retrieval of relevant abstracts for a given gene is done using the eGRAB (Extractor of Gene-Relevant ABstracts) module of eGIFT. The eGRAB module considers all gene names, synonyms, and aliases to query the Medline database and returns a set of abstracts for the given gene. Each sentence in the set of abstracts is scored based on a number of features. A subset of these features, such as term frequency, sentence position, presence of title words and sentence length, are similar to the ones used by Edmundson (1969) and Kupiec et al. (1995). In addition to these, we use features such as the presence of the gene name and of certain biological phrases to adapt the generic techniques to the biological domain. After several iterations on some test genes, we manually assigned a weight to each feature and computed a final score for each sentence as the weighted combination of its feature scores. The top-ranking sentences are then selected for inclusion in the summary.

We have also explored the notion of information novelty to reduce redundancy across the selected sentences. This approach is based on the Maximal Marginal Relevance (MMR) model of Carbonell and Goldstein (1998), but differs in how novelty is computed and how it is used. Based on the MMR model, we re-rank a subset of the sentences returned by the feature-based system. In the next two subsections we discuss the features and the re-ranking system in detail.

2.1 Computing Sentence Importance

The set of abstracts returned by the eGRAB module is preprocessed and segmented into sentences. A set of features is used to score the sentences and compute their importance. Based on the weighted score, the sentences are ranked and the top-ranking sentences are included in the summary. The first four features are used by generic extractive summarizers. We have added two new features that are more specific to the biomedical domain. We call the system that uses only the first four features System A.
System A will help us understand whether, and how well, generic approaches adapt to a particular domain. We hope to see a significant improvement when the last two domain-specific features are added (System B).

Sentence Position Feature

This feature encodes positional information about a sentence in an abstract. The sentence position can be one of the following: title, first, last or middle sentence. As argued in the early work on extractive summarization by Edmundson (1969), first and last sentences are typically more important than other sentences. Thus higher scores are assigned to the first and last sentence positions than to the middle or title positions.

Title Words Feature

This feature assigns a score between 0 and 1 to a sentence based on the presence of title words in the sentence. The title of the abstract is decomposed into words, and the words are stemmed. These words are regarded as descriptive words, and each sentence is scored based on the frequency of occurrence of title words in it. The score is divided by the length of the sentence and then normalized.

Sentence Length Feature

Kupiec et al. (1995) used sentence length as one of the features for summarization. In their implementation the feature was true if the sentence length was above a certain threshold, thereby giving less importance to very short sentences. In our system, we use both a low and a high threshold to assign low scores to very short and very long sentences. Very long sentences, along with some relevant information, contain unnecessary information (noise) and, we argue, should also be given a low score. This helps us select shorter sentences in which noise is minimal and which are therefore more informative to the user. It also helps in the second phase, the re-ranking step, by allowing more relevant and novel sentences to be selected.
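The three features just described, together with the manually weighted combination mentioned at the start of this section, can be sketched in Python as follows. This is a minimal sketch rather than the system's implementation: the position scores, the length thresholds, the weights and all function names are hypothetical, since the paper states only that they were assigned manually. Stemming of title words is omitted for brevity.

```python
from typing import Dict, List

# Hypothetical values: the paper does not list its position scores or thresholds.
POSITION_SCORES = {"first": 1.0, "last": 1.0, "middle": 0.5, "title": 0.5}
MIN_WORDS, MAX_WORDS = 8, 40


def position_score(position: str) -> float:
    """First and last sentences score higher than middle or title sentences."""
    return POSITION_SCORES[position]


def title_score(words: List[str], title_words: List[str]) -> float:
    """Occurrences of title words in the sentence, divided by sentence length."""
    if not words:
        return 0.0
    title = {w.lower() for w in title_words}
    hits = sum(1 for w in words if w.lower() in title)
    return hits / len(words)


def length_score(words: List[str]) -> float:
    """Low score for very short or very long sentences, high score otherwise."""
    return 1.0 if MIN_WORDS <= len(words) <= MAX_WORDS else 0.0


def combined_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of the feature scores of one sentence."""
    return sum(weights[name] * value for name, value in features.items())


words = "smad2 mediates tgf-beta signaling in endothelial cells today".split()
features = {
    "position": position_score("first"),
    "title": title_score(words, ["smad2", "signaling"]),
    "length": length_score(words),
}
# Hypothetical weights, standing in for the manually tuned ones.
weights = {"position": 0.5, "title": 1.0, "length": 0.5}
print(combined_score(features, weights))
```

Sentences are then ranked by the combined score; the remaining features add further entries to the same `features` and `weights` dictionaries.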
Frequency Based Feature

This feature assigns a score between 0 and 1 to a sentence, indicating the presence of descriptive words in the sentence. Most of the early work on summarization used term frequency and its variations to identify the most descriptive words of a document. Term Frequency*Inverse Document Frequency (TF*IDF) has been used in the field of Information Retrieval (Salton and Buckley, 1988; Jones, 1972) as a measure for identifying descriptive words in a document. We use eGIFT's (Tudor et al., 2010) iTerm scores, a variant of TF*IDF weights, to extract descriptive words from the set of abstracts relating to a gene. eGIFT automatically computes informative terms, or iTerms, and associates them with a gene based

on frequency information from the set of abstracts returned by the eGRAB module, which is called the About Set for the gene. It assigns scores to unigrams and bigrams, excluding stop-words, as well as to a set of biomedical terms, matched in the text, that were extracted from different knowledge bases, including Entrez Gene, Gene Ontology, NCBI Taxonomy, UMLS, and MeSH. The terms are converted to base form for scoring purposes. Each term is assigned a score depending on its frequency in the About Set, contrasted with its frequency in the Background Set. The Background Set is the set of all abstracts in the biomedical database. For each term $t$, a score $s(t)$ is assigned as follows:

$$s(t) = \left( \frac{df_a(t)}{N_a} - \frac{df_b(t)}{N_b} \right) \ln\left( \frac{N_b}{df_b(t)} \right)$$

where $df_a(t)$ and $df_b(t)$ are the number of abstracts containing term $t$ in the About Set for the gene and in the Background Set, respectively, and $N_a$ and $N_b$ are the total numbers of abstracts in these two sets. The difference between the normalized document frequencies, $\frac{df_a(t)}{N_a} - \frac{df_b(t)}{N_b}$, rewards terms that occur more frequently in the About Set, and $\ln\left( \frac{N_b}{df_b(t)} \right)$ penalizes terms that are very frequent across all documents. An important thing to note is that eGIFT considers document frequency rather than term frequency in a specific document. This is because iTerms are descriptive terms across a set of abstracts, not a single document, and document frequency therefore better reflects the relevance of a term to a gene. Given the score for each term, a set of top-ranking informative terms, or iTerms, is computed for the gene. We score each sentence in the About Set of a gene by considering the occurrences of iTerms and their scores. The final score is divided by the number of words in the sentence and normalized.

Gene Feature

The abstracts returned by the eGRAB module are related to the gene whose summary is to be extracted. This feature indicates the presence of the gene name in a sentence.
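The term score $s(t)$ and the two sentence-level features introduced so far (the frequency-based feature and the binary gene-name indicator) can be sketched as follows. This is an illustrative sketch under stated assumptions, not eGIFT's code: the function names, the unigram-only lookup, the whole-word matching rule and the example counts and name list are all hypothetical.

```python
import math
import re
from typing import Dict, Iterable, List


def iterm_score(df_about: int, n_about: int, df_background: int, n_background: int) -> float:
    """s(t) = (df_a/N_a - df_b/N_b) * ln(N_b / df_b)."""
    if df_background == 0:
        raise ValueError("the term must occur in at least one background abstract")
    difference = df_about / n_about - df_background / n_background
    return difference * math.log(n_background / df_background)


def frequency_feature(words: List[str], iterm_scores: Dict[str, float]) -> float:
    """Sum of the scores of the iTerms in the sentence, divided by sentence length.

    Assumption: only unigrams are looked up; bigrams are ignored in this sketch.
    """
    if not words:
        return 0.0
    return sum(iterm_scores.get(w.lower(), 0.0) for w in words) / len(words)


def gene_feature(sentence: str, gene_names: Iterable[str]) -> float:
    """1.0 if any name variant occurs as a whole word (case-insensitive), else 0.0."""
    for name in gene_names:
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        if re.search(pattern, sentence, flags=re.IGNORECASE):
            return 1.0
    return 0.0


# Hypothetical counts: a term in 50 of 100 About Set abstracts
# and in 1,000 of 1,000,000 Background Set abstracts.
print(iterm_score(50, 100, 1000, 1_000_000))
print(gene_feature("Activated TbetaRI phosphorylates Smad2.", ["smad2", "madr2"]))
```

A term that appears in every background abstract receives a score of zero, because the logarithmic factor vanishes; this is the penalty on very frequent terms described above.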
A sentence may or may not contain the gene name, and its presence can be used as an indicator of the sentence's importance. This feature assigns a score of 1 to sentences that contain the gene name and 0 otherwise, which boosts the score of sentences that mention the gene. A gene is referred to in the biomedical literature by several names and abbreviations. For example, SMAD2 has variations such as Smad family member 2, smad-2, madr2 and xsmad2. eGIFT provides APIs which, given a gene identifier, return all the variations of the gene name. It uses the official gene names provided by Entrez Gene (Maglott et al., 2005), synonyms, and word sense disambiguation techniques to return the different variations.

Biological Cue Phrase Feature

This feature assigns a score between 0 and 1 depending on the presence of certain phrases in the sentence. The approach is based on the observation that certain phrases in a document indicate sentence importance. Authors of technical documents follow certain writing styles, using particular phrases to indicate important relations between different entities in the text. These writing styles are domain-dependent, and identifying them requires a study of the documents. We argue that some phrases are stronger indicators of sentence importance than others because they convey very strong relations between the entities in the text. Entrez Gene (Maglott et al., 2005) contains manually created summaries for some genes. We carried out a preliminary study of the human-written summaries from Entrez Gene in order to understand what types of information are typically conveyed in a summary. We identified several aspects that are covered in almost every summary:

ATTRIBUTE: The different properties or attributes associated with a gene.
FAMILY: The gene family the gene belongs to.
FUNCTION: The various biological functions or processes the gene is involved in.
DOMAIN: The domains the gene contains.
INTERACTION: The interaction of this gene with other genes or proteins.
DISEASE: Diseases caused by this gene.

An aspect may span multiple sentences, and a single sentence may mention several aspects. For the purposes of this paper we explored the first three aspects. In the following paragraphs we examine these three aspects in some detail and discuss the biological phrases associated with each.

ATTRIBUTE: A gene typically has some well-known properties that need to be captured in a summary. These are typically is-a relations between a gene and a noun phrase. For example, sentence fragments such as ".. groucho proteins are transcriptional corepressors .." and ".. groucho homolog tle-4, a corepressor .." both indicate that the gene groucho is a corepressor. Thus for this aspect we look for phrases such as "is a", appositives and relative clauses. The pattern should be immediately preceded by the gene in question for this feature to be considered.

FAMILY: Almost every gene belongs to a family of genes that share certain common characteristics. Including the family information helps biologists ascertain certain important attributes of the gene. For example, sentence fragments such as "The Drosophila Groucho (Gro) protein is the defining member of a family of metazoan corepressors .." and "Groucho (Gro) is the founding member of a family of transcriptional co-repressor .." indicate that groucho belongs to a family of genes that are corepressors. For this aspect we look for phrases such as "belongs to" and "member of". As with the ATTRIBUTE patterns, the pattern should be immediately preceded by the gene in question for this feature to be considered.

FUNCTION: Most of the sentences in the human-written summaries contain this aspect. They indicate the different biological processes and functions that the gene is involved in, required for, and so on. These are typically mentioned together with other aspects, for example following an INTERACTION aspect. Identifying the different functions of a gene is very important, and sentences that mention such relations should be included in a summary. From the following sentence fragments we can easily determine that groucho is related to biological functions such as notch signaling, segmentation and neural development: "Groucho is a transcriptional repressor implicated in notch signaling ..", ".. Groucho .. involved in neural development and segmentation in drosophila", "Groucho is required for Drosophila neurogenesis, segmentation .." and "Gro/TLE proteins play a role in the repression of target genes". We look for the relation phrases in these sentences (for example "implicated in", "involved in", "required for" and "play a role in") when assigning this bio-feature.
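A minimal sketch of how the cue phrases for the three explored aspects might be matched is given below. It is not the system's implementation: the phrase lists are limited to the phrases quoted above, appositives and relative clauses are not handled, and the gap windows, the 1/(1 + gap) decay, the averaging over aspects and all names are assumptions made for illustration. The gene is assumed to appear as a single token.

```python
import re
from typing import Dict, List, Optional

# Phrases quoted in the text; the gap windows (in tokens) are hypothetical.
CUE_PHRASES: Dict[str, List[str]] = {
    "ATTRIBUTE": ["is a", "is an"],
    "FAMILY": ["belongs to", "member of"],
    "FUNCTION": ["implicated in", "involved in", "required for",
                 "play a role in", "plays a role in"],
}
MAX_GAP = {"ATTRIBUTE": 1, "FAMILY": 5, "FUNCTION": 15}


def find_phrase(tokens: List[str], phrase: str, start: int) -> Optional[int]:
    """Index of the first occurrence of the phrase at or after `start`, or None."""
    parts = phrase.split()
    for i in range(start, len(tokens) - len(parts) + 1):
        if tokens[i:i + len(parts)] == parts:
            return i
    return None


def cue_score(sentence: str, gene: str) -> float:
    """Average over aspects of the best cue-phrase score; 0.0 without a gene mention."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    if gene.lower() not in tokens:
        return 0.0
    gene_end = tokens.index(gene.lower()) + 1
    aspect_scores = []
    for aspect, phrases in CUE_PHRASES.items():
        best = 0.0
        for phrase in phrases:
            position = find_phrase(tokens, phrase, gene_end)
            if position is None:
                continue
            gap = position - gene_end  # tokens between the gene and the phrase
            if gap <= MAX_GAP[aspect]:
                best = max(best, 1.0 / (1 + gap))  # closer phrases score higher
        aspect_scores.append(best)
    return sum(aspect_scores) / len(aspect_scores)


print(cue_score("Groucho is a transcriptional repressor implicated in notch signaling.", "groucho"))
print(cue_score("Groucho is required for Drosophila neurogenesis.", "groucho"))
```

The first sentence matches both an ATTRIBUTE phrase directly after the gene and a FUNCTION phrase a few tokens later, so it scores higher than the second, which matches only a FUNCTION phrase.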
The gene need not immediately precede the pattern for the FUNCTION aspect, but the farther the gene is from the phrase, the lower the score.

Each sentence in the About Set for a gene is searched for the patterns described above. The sentence must also contain the gene name. The lexical distance between the gene mention and the pattern or phrase is considered when assigning the score for this feature. The distance should be small for the FAMILY and ATTRIBUTE aspects, and may be longer for the FUNCTION aspect. The scores for each bio-feature in a sentence are added and the total is normalized.

2.2 Re-Ranking Based on Novelty

A gene summary should contain information that is as diverse as possible, thereby reducing redundancy, while maintaining maximal relevance to the gene. Because the About Set for a gene can contain a very large number of abstracts, sentences extracted on the basis of feature scores alone may contain a large amount of redundant information. Hence the removal of redundant information is necessary: redundant sentences should not be selected when producing the final summary. The main intuition behind this method comes from Maximal Marginal Relevance (MMR) (Carbonell and Goldstein, 1998): a sentence that is similar to a sentence already selected should be penalized. A weighted combination of the feature score and the novelty score is used to select sentences that are maximally diverse and maximally relevant to the gene. Algorithm 1 provides the pseudo-code for the re-ranking system. Our re-ranking system takes as input the set of ranked sentences returned by the feature-based method discussed in section 2.1. For every selected sentence a set of important terms is computed. These include GO terms and UniProtKB keywords. The Gene Ontology (GO) project is a major bioinformatics initiative with the aim of standardizing the representation of gene and gene product attributes across species and databases.
The project provides a controlled vocabulary of terms for describing gene product characteristics, as well as gene product annotation data from GO Consortium members. The ontology covers three domains: cellular component, molecular function and biological process. The UniProt Knowledgebase (UniProtKB) is the central hub for the collection of functional information on proteins, with accurate, consistent and rich annotation. UniProtKB gene entries are tagged with keywords relating to the gene. Instead of computing and minimizing the similarity between two sentences, as in MMR, we compute a novelty score for each sentence. When a sentence is selected, its GO terms and UniProt keywords are added to the set selectedTerms. The novelty score for a sentence is assigned based on the number of new GO terms and UniProt keywords contained in the sentence. The final score is a weighted combination of the feature score and the novelty score, controlled by a user-tunable parameter λ. The sentence with the highest final score is added to the set of re-ranked sentences and deleted from the original ranking. Finally, its GO terms and UniProt keywords are added to the set selectedTerms. A λ value closer to 1 yields a relevance-based ranking, while a λ value closer to 0 yields a novelty-based ranking. When the initial ranked set of sentences is empty, the algorithm stops and returns the new ranking of the sentences.

Algorithm 1: Novelty Based Re-Rank

Input: set of ranked sentences D; tuning parameter λ
Output: set of re-ranked sentences R
selectedTerms ← empty
R ← empty
while D is not empty do
    foreach sentence s in D do
        fscore_s ← feature score for s
        extract GO terms for s
        extract UniProt keywords for s
        add extracted terms to currTerms_s
        newTerms_s ← diff(currTerms_s, selectedTerms)
        nscore_s ← novelscore(newTerms_s)
        score_s ← λ · fscore_s + (1 − λ) · nscore_s
    end
    determine the sentence s for which score_s is maximal
    delete s from D
    add s to R
    add newTerms_s to selectedTerms
end
return R

Table 1: Feature-Based Ranking: Summary Phrase Matches

Gene     System A   System B   Improvement
SMAD                           %
VPS                            %
BRI                            %
BAG3     0          3          NA
LTBP                           %
KAT2A                          %

3 Results

In this section we present the results of our evaluation. We used six genes for evaluation purposes. The Entrez Gene summaries for these genes were used as the gold set. We measured the number of phrases in the extracted sentences that matched phrases in the gold summary. While matching phrases we also considered the relation between the phrase and the gene. A phrase in an extracted summary sentence was said to match if it matched a phrase in the gold set and had the same relation with the gene as in the gold set. For example, for the gene KAT2A a gold summary sentence is: "KAT2A, or GCN5, is a histone acetyltransferase (HAT) that functions primarily as a transcriptional activator .."
The re-ranking system with λ = 0 extracted the following sentence: "histone acetyltransferases ( hats ) such as gcn5 play a role in transcriptional activation." The phrase "transcriptional activation" is marked as matched because it has the same relation with the gene, that is, the same function. Figure 1 shows the matching phrases for the gene SMAD2 in the summary extracted by the feature-based system. System A refers to the output generated using only generic features, while System B refers to the output generated after adding the bio-domain-specific features. The matched phrases are shown in bold text. Figure 2 shows the matching phrases in the summary extracted by the re-ranking system with λ = 0, 0.3 and 0.7. A λ value closer to 0 gives more importance to information novelty.

Table 1 compares System A and System B with respect to the number of phrase matches each system achieved. The last column indicates the improvement of System B over System A, that is, the improvement after adding bio-domain-specific features. The results indicate that adding domain-specific features increases the number of phrase matches and thus improves the summary content. Table 2 shows the number of matched phrases for the re-ranking system over different values of λ. The first column, with λ = 1, is the same as System B in Table 1. In the evaluation of the re-ranking system we used only the set of ranked sentences returned by System B. The results indicate that a λ value closer to 0 yields the best results for most of the genes. For example, for the gene BRI1 the set of summary sentences "BRI1 ligand is brassinolide which binds at the extracellular domain. Binding results in phosphorylation of the kinase domain which activates the BRI1 protein leading to BR responses." is accurately captured by the sentence that the re-ranking system (with λ = 0) extracted: "brassinosteroids ( brs ) bind to the extracellular domain of the receptor kinase bri1 to activate a signal transduction cascade that regulates nuclear gene expression and plant development." A similar example occurs for the gene SMAD2 with the extracted sentence "activated tbetari phosphorylates smad2, which then heterodimerizes with smad4, translocates into the nucleus, and subsequently effects gene transcription.", which captures a set of the gold summary sentences (see Figure 1).

Table 2: Novelty-Based Re-ranking: Summary Phrase Matches

Gene     λ = 1   λ = 0.9   λ = 0.7   λ = 0.3   λ = 0   Max Improvement over System B
SMAD                                                   %
VPS                                                    %
BRI                                                    %
BAG                                                    %
LTBP                                                   %
KAT2A                                                  %

4 Conclusion

We combine generic features for computing sentence importance with biomedical domain-specific features such as the presence of the gene name and of biological cue phrases. We also use GO terms and UniProt keywords as a novelty measure to re-rank sentences and remove redundant information. Our evaluation suggests that the system augmented with biomedical features and redundancy removal extracts more informative summaries. One of the problems of these extractive approaches is the presence of noise alongside the relevant information in the extracted sentences. For example, consider an extracted summary sentence for SMAD2: "second, the role of smad 2, an intracellular mediator of activin and tgf-beta, in oocyte maturation was investigated." Only the fragment describing smad 2 as an intracellular mediator of activin and tgf-beta is relevant, and there is no need to include the entire sentence. In future work, we hope that the biological relation patterns discussed in section 2.1 will help us identify only the relevant portions of a sentence. These patterns will help us create an intermediate representation of the set of sentences, such as smad2 [isa] intracellular mediator OF(activin). Instead of just extracting representative sentences from the About Set, these relations will help us generate phrases and move toward abstractive summarization. We could also combine different relations into a single sentence depending on certain causal links, such as an INTERACTION aspect followed by a FUNCTION aspect.
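As a companion to Algorithm 1 in section 2.2, the novelty-based re-ranking loop can be sketched in Python. This is a sketch under stated assumptions rather than the system's code: term extraction is reduced to case-insensitive substring matching against a small hypothetical vocabulary standing in for GO terms and UniProt keywords, and the novelty score is assumed to be the number of new terms divided by the largest such count among the remaining candidates, since the paper does not define novelscore precisely.

```python
from typing import List, Set, Tuple


def extract_terms(sentence: str, vocabulary: Set[str]) -> Set[str]:
    """Vocabulary terms occurring in the sentence (case-insensitive substring match)."""
    text = sentence.lower()
    return {term for term in vocabulary if term.lower() in text}


def rerank(ranked: List[Tuple[str, float]], vocabulary: Set[str], lam: float) -> List[str]:
    """Greedy re-ranking: score = lam * feature score + (1 - lam) * novelty score."""
    candidates = [(s, f, extract_terms(s, vocabulary)) for s, f in ranked]
    selected_terms: Set[str] = set()
    reranked: List[str] = []
    while candidates:
        # Number of not-yet-covered terms in each remaining sentence.
        new_counts = [len(terms - selected_terms) for _, _, terms in candidates]
        max_new = max(new_counts) or 1  # avoid division by zero
        scores = [lam * f + (1 - lam) * n / max_new
                  for (_, f, _), n in zip(candidates, new_counts)]
        best = scores.index(max(scores))  # ties keep the original order
        sentence, _, terms = candidates.pop(best)
        reranked.append(sentence)
        selected_terms |= terms
    return reranked


# Hypothetical feature scores and vocabulary.
ranked = [
    ("smad2 is phosphorylated by tgf-beta receptors", 0.9),
    ("phosphorylation of smad2 by tgf-beta receptors", 0.8),
    ("smad2 regulates apoptosis", 0.5),
]
vocabulary = {"tgf-beta", "apoptosis", "nucleus"}
print(rerank(ranked, vocabulary, lam=1.0))  # relevance only: input order is kept
print(rerank(ranked, vocabulary, lam=0.0))  # novelty only: the apoptosis sentence moves up
```

With λ = 0 the second sentence drops to the end because its only vocabulary term has already been covered by the first selected sentence, which is the redundancy penalty the algorithm is designed to apply.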
References

[Carbonell and Goldstein, 1998] Jaime Carbonell and Jade Goldstein. 1998. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM.

[Edmundson, 1969] H. P. Edmundson. 1969. New methods in automatic extracting. J. ACM, 16(2), April.

[Jones, 1972] Karen Sparck Jones. 1972. A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28(1).

[Kupiec et al., 1995] Julian Kupiec, Jan Pedersen, and Francine Chen. 1995. A trainable document summarizer. In Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 95, pages 68-73, New York, NY, USA. ACM.

[Maglott et al., 2005] Donna Maglott, Jim Ostell, Kim D. Pruitt, and Tatiana Tatusova. 2005. Entrez Gene: gene-centered information at NCBI. Nucleic Acids Research, 33(suppl 1):D54-D58.

[McEntyre and Lipman, 2001] Johanna McEntyre and David Lipman. 2001. PubMed: bridging the information gap. Canadian Medical Association Journal, 164(9).

[Radev et al., 2002] Dragomir R. Radev, Eduard Hovy, and Kathleen McKeown. 2002. Introduction to the special issue on summarization. Comput. Linguist., 28(4), December.

[Salton and Buckley, 1988] Gerard Salton and Christopher Buckley. 1988. Term-weighting approaches in automatic text retrieval. Information Processing & Management, 24(5).

[Tudor et al., 2010] Catalina O. Tudor, Carl J. Schmidt, and K. Vijay-Shanker. 2010. eGIFT: Mining gene information from the literature. BMC Bioinformatics, 11(1):418.

Gene: SMAD2

Entrez Summary:
The protein encoded by this gene belongs to the SMAD, a family of proteins similar to the gene products of the Drosophila gene 'mothers against decapentaplegic' (Mad) and the C. elegans gene Sma. SMAD proteins are signal transducers and transcriptional modulators that mediate multiple signaling pathways. This protein mediates the signal of the transforming growth factor (TGF)-beta, and thus regulates multiple cellular processes, such as cell proliferation, apoptosis, and differentiation. This protein is recruited to the TGF-beta receptors through its interaction with the SMAD anchor for receptor activation (SARA) protein. In response to TGF-beta signal, this protein is phosphorylated by the TGF-beta receptors. The phosphorylation induces the dissociation of this protein with SARA and the association with the family member SMAD4. The association with SMAD4 is important for the translocation of this protein into the nucleus, where it binds to target promoters and forms a transcription repressor complex with other cofactors. This protein can also be phosphorylated by activin type 1 receptor kinase, and mediates the signal from the activin.

System A (without Bio-Features):
smad2 overexpression suppressed osteocalcin mrna expression in ros17/2.8 cells.
tgfbeta signaling is initiated when the type i receptor phosphorylates the mad-related protein, smad2, on c-terminal serine residues.
mad-related genes on chromosome 18q21.1 are altered infrequently in escc.
activation of transforming growth factor-beta ( tgf-beta ) receptors triggers phosphorylation of smad2 and smad3.
cells that lack smad2 may escape from tgf-beta-mediated growth inhibition and promote cancer progression.
phosphorylation-dependent activation of the transcription factors smad2 and smad3 plays an important role in tgfbeta-dependent
furthermore, we observed a strong correlation between sustained smad2 phosphorylation and resistance to tgf-beta1-mediated growth inhibition.

System B (with Bio-Features):
phosphorylation-dependent activation of the transcription factors smad2 and smad3 plays an important role in tgfbeta-dependent
we report that smad2, a transcription factor activated by tgfbeta, mediates tgf-beta induction of enos in endothelial cells.
identification of smad2, a human mad-related protein in the transforming growth factor beta signaling pathway.
conclusions : the results suggest that mutation of smad2 does not play a key role in human stomach carcinogenesis.
second, the role of smad 2, an intracellular mediator of activin and tgf-beta, in oocyte maturation was investigated.
thus, heteromeric complex formation of smad2 with smad4 is required for nuclear translocation of smad4.
evidence that smad2 is a tumor suppressor implicated in the control of cellular invasion.

Figure 1: Feature-Based Ranked Summaries for SMAD2 for System A and B

λ = 0
1. phosphorylation-dependent activation of the transcription factors smad2 and smad3 plays an important role in tgfbeta-dependent
2. second, the role of smad 2, an intracellular mediator of activin and tgf-beta, in oocyte maturation was investigated.
3. smad2 and smad3 are signalling proteins that are involved in mediating the transcriptional regulation of target genes downstream of transforming growth factor-beta and activin receptors.
4. activated tbetari phosphorylates smad2, which then heterodimerizes with smad4, translocates into the nucleus, and subsequently effects gene transcription.
5. identification of smad2, a human mad-related protein in the transforming growth factor beta signaling pathway.
6. xmad2, a recently identified tgf-beta signal transducer, forms a complex with the transcription factor in an activin-dependent fashion to generate an activated are-binding complex.
7. ligation of the t cell receptor complex results in phosphorylation of smad2 in t lymphocytes.

λ = 0.3
1. phosphorylation-dependent activation of the transcription factors smad2 and smad3 plays an important role in tgfbeta-dependent
2. second, the role of smad 2, an intracellular mediator of activin and tgf-beta, in oocyte maturation was investigated.
3. smad2 and smad3 are signalling proteins that are involved in mediating the transcriptional regulation of target genes downstream of transforming growth factor-beta and activin receptors.
4. identification of smad2, a human mad-related protein in the transforming growth factor beta signaling pathway.
5. thus, heteromeric complex formation of smad2 with smad4 is required for nuclear translocation of smad4.
6. ubiquitination of smad2 is a consequence of its accumulation in the nucleus.
7. xmad2, a recently identified tgf-beta signal transducer, forms a complex with the transcription factor in an activin-dependent fashion to generate an activated are-binding complex.

λ = 0.7
1. phosphorylation-dependent activation of the transcription factors smad2 and smad3 plays an important role in tgfbeta-dependent
2. second, the role of smad 2, an intracellular mediator of activin and tgf-beta, in oocyte maturation was investigated.
3. identification of smad2, a human mad-related protein in the transforming growth factor beta signaling pathway.
4. thus, heteromeric complex formation of smad2 with smad4 is required for nuclear translocation of smad4.
5. we report that smad2, a transcription factor activated by tgf-beta, mediates tgf-beta induction of enos in endothelial cells.
6. conclusions : the results suggest that mutation of smad2 does not play a key role in human stomach carcinogenesis.
7. evidence that smad2 is a tumor suppressor implicated in the control of cellular invasion.

Figure 2: Re-Ranked Summaries for SMAD2 with λ = 0, 0.3, 0.7


More information

Big Idea 3: Living systems store, retrieve, transmit and respond to information essential to life processes. Tuesday, December 27, 16

Big Idea 3: Living systems store, retrieve, transmit and respond to information essential to life processes. Tuesday, December 27, 16 Big Idea 3: Living systems store, retrieve, transmit and respond to information essential to life processes. Enduring understanding 3.B: Expression of genetic information involves cellular and molecular

More information

Bio-Medical Text Mining with Machine Learning

Bio-Medical Text Mining with Machine Learning Sumit Madan Department of Bioinformatics - Fraunhofer SCAI Textual Knowledge PubMed Journals, Books Patents EHRs What is Bio-Medical Text Mining? Phosphorylation of glycogen synthase kinase 3 beta at Threonine,

More information

Using the Biological Taxonomy to Access. Biological Literature with PathBinderH

Using the Biological Taxonomy to Access. Biological Literature with PathBinderH Bioinformatics, 21(10) (May 2005), pp. 2560-2562. Using the Biological Taxonomy to Access Biological Literature with PathBinderH J. Ding 1,2, K. Viswanathan 2,3,4, D. Berleant 1,2,5,*, L. Hughes, 1,2 E.

More information

The Protein Ontology: An Evolution

The Protein Ontology: An Evolution The rotein Ontology: An Evolution 251 st ACS National Meeting & Exposition Chemistry, Data & the Semantic Web: An Important Triple to Advance Science San Diego, CA Darren A. Natale, h.d. rotein Science

More information

Reception The target cell s detection of a signal coming from outside the cell May Occur by: Direct connect Through signal molecules

Reception The target cell s detection of a signal coming from outside the cell May Occur by: Direct connect Through signal molecules Why Do Cells Communicate? Regulation Cells need to control cellular processes In multicellular organism, cells signaling pathways coordinate the activities within individual cells that support the function

More information

Signal Transduction. Dr. Chaidir, Apt

Signal Transduction. Dr. Chaidir, Apt Signal Transduction Dr. Chaidir, Apt Background Complex unicellular organisms existed on Earth for approximately 2.5 billion years before the first multicellular organisms appeared.this long period for

More information

Gene Ontology and overrepresentation analysis

Gene Ontology and overrepresentation analysis Gene Ontology and overrepresentation analysis Kjell Petersen J Express Microarray analysis course Oslo December 2009 Presentation adapted from Endre Anderssen and Vidar Beisvåg NMC Trondheim Overview How

More information

CLRG Biocreative V

CLRG Biocreative V CLRG ChemTMiner @ Biocreative V Sobha Lalitha Devi., Sindhuja Gopalan., Vijay Sundar Ram R., Malarkodi C.S., Lakshmi S., Pattabhi RK Rao Computational Linguistics Research Group, AU-KBC Research Centre

More information

S1 Gene ontology (GO) analysis of the network alignment results

S1 Gene ontology (GO) analysis of the network alignment results 1 Supplementary Material for Effective comparative analysis of protein-protein interaction networks by measuring the steady-state network flow using a Markov model Hyundoo Jeong 1, Xiaoning Qian 1 and

More information

Tiffany Samaroo MB&B 452a December 8, Take Home Final. Topic 1

Tiffany Samaroo MB&B 452a December 8, Take Home Final. Topic 1 Tiffany Samaroo MB&B 452a December 8, 2003 Take Home Final Topic 1 Prior to 1970, protein and DNA sequence alignment was limited to visual comparison. This was a very tedious process; even proteins with

More information

16 The Cell Cycle. Chapter Outline The Eukaryotic Cell Cycle Regulators of Cell Cycle Progression The Events of M Phase Meiosis and Fertilization

16 The Cell Cycle. Chapter Outline The Eukaryotic Cell Cycle Regulators of Cell Cycle Progression The Events of M Phase Meiosis and Fertilization The Cell Cycle 16 The Cell Cycle Chapter Outline The Eukaryotic Cell Cycle Regulators of Cell Cycle Progression The Events of M Phase Meiosis and Fertilization Introduction Self-reproduction is perhaps

More information

5- Semaphorin-Plexin-Neuropilin

5- Semaphorin-Plexin-Neuropilin 5- Semaphorin-Plexin-Neuropilin 1 SEMAPHORINS-PLEXINS-NEUROPILINS ligands receptors co-receptors semaphorins and their receptors are known signals for: -axon guidance -cell migration -morphogenesis -immune

More information

Differential Modeling for Cancer Microarray Data

Differential Modeling for Cancer Microarray Data Differential Modeling for Cancer Microarray Data Omar Odibat Department of Computer Science Feb, 01, 2011 1 Outline Introduction Cancer Microarray data Problem Definition Differential analysis Existing

More information

Activation of a receptor. Assembly of the complex

Activation of a receptor. Assembly of the complex Activation of a receptor ligand inactive, monomeric active, dimeric When activated by growth factor binding, the growth factor receptor tyrosine kinase phosphorylates the neighboring receptor. Assembly

More information

Information Extraction from Biomedical Text

Information Extraction from Biomedical Text Information Extraction from Biomedical Text BMI/CS 776 www.biostat.wisc.edu/bmi776/ Mark Craven craven@biostat.wisc.edu February 2008 Some Important Text-Mining Problems hypothesis generation Given: biomedical

More information

Regulation of Transcription in Eukaryotes

Regulation of Transcription in Eukaryotes Regulation of Transcription in Eukaryotes Leucine zipper and helix-loop-helix proteins contain DNA-binding domains formed by dimerization of two polypeptide chains. Different members of each family can

More information

BMD645. Integration of Omics

BMD645. Integration of Omics BMD645 Integration of Omics Shu-Jen Chen, Chang Gung University Dec. 11, 2009 1 Traditional Biology vs. Systems Biology Traditional biology : Single genes or proteins Systems biology: Simultaneously study

More information

Co-ordination occurs in multiple layers Intracellular regulation: self-regulation Intercellular regulation: coordinated cell signalling e.g.

Co-ordination occurs in multiple layers Intracellular regulation: self-regulation Intercellular regulation: coordinated cell signalling e.g. Gene Expression- Overview Differentiating cells Achieved through changes in gene expression All cells contain the same whole genome A typical differentiated cell only expresses ~50% of its total gene Overview

More information

9/4/2015 INDUCTION CHAPTER 1. Neurons are similar across phyla Thus, many different model systems are used in developmental neurobiology. Fig 1.

9/4/2015 INDUCTION CHAPTER 1. Neurons are similar across phyla Thus, many different model systems are used in developmental neurobiology. Fig 1. INDUCTION CHAPTER 1 Neurons are similar across phyla Thus, many different model systems are used in developmental neurobiology Fig 1.1 1 EVOLUTION OF METAZOAN BRAINS GASTRULATION MAKING THE 3 RD GERM LAYER

More information

A Database of human biological pathways

A Database of human biological pathways A Database of human biological pathways Steve Jupe - sjupe@ebi.ac.uk 1 Rationale Journal information Nature 407(6805):770-6.The Biochemistry of Apoptosis. Caspase-8 is the key initiator caspase in the

More information

Written Exam 15 December Course name: Introduction to Systems Biology Course no

Written Exam 15 December Course name: Introduction to Systems Biology Course no Technical University of Denmark Written Exam 15 December 2008 Course name: Introduction to Systems Biology Course no. 27041 Aids allowed: Open book exam Provide your answers and calculations on separate

More information

Complete all warm up questions Focus on operon functioning we will be creating operon models on Monday

Complete all warm up questions Focus on operon functioning we will be creating operon models on Monday Complete all warm up questions Focus on operon functioning we will be creating operon models on Monday 1. What is the Central Dogma? 2. How does prokaryotic DNA compare to eukaryotic DNA? 3. How is DNA

More information

Sig2GRN: A Software Tool Linking Signaling Pathway with Gene Regulatory Network for Dynamic Simulation

Sig2GRN: A Software Tool Linking Signaling Pathway with Gene Regulatory Network for Dynamic Simulation Sig2GRN: A Software Tool Linking Signaling Pathway with Gene Regulatory Network for Dynamic Simulation Authors: Fan Zhang, Runsheng Liu and Jie Zheng Presented by: Fan Wu School of Computer Science and

More information

Networks & pathways. Hedi Peterson MTAT Bioinformatics

Networks & pathways. Hedi Peterson MTAT Bioinformatics Networks & pathways Hedi Peterson (peterson@quretec.com) MTAT.03.239 Bioinformatics 03.11.2010 Networks are graphs Nodes Edges Edges Directed, undirected, weighted Nodes Genes Proteins Metabolites Enzymes

More information

Grundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson

Grundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson Grundlagen der Bioinformatik, SS 10, D. Huson, April 12, 2010 1 1 Introduction Grundlagen der Bioinformatik Summer semester 2010 Lecturer: Prof. Daniel Huson Office hours: Thursdays 17-18h (Sand 14, C310a)

More information

Supplementary Materials for mplr-loc Web-server

Supplementary Materials for mplr-loc Web-server Supplementary Materials for mplr-loc Web-server Shibiao Wan and Man-Wai Mak email: shibiao.wan@connect.polyu.hk, enmwmak@polyu.edu.hk June 2014 Back to mplr-loc Server Contents 1 Introduction to mplr-loc

More information

Regulation of Gene Expression

Regulation of Gene Expression Chapter 18 Regulation of Gene Expression Edited by Shawn Lester PowerPoint Lecture Presentations for Biology Eighth Edition Neil Campbell and Jane Reece Lectures by Chris Romero, updated by Erin Barley

More information

Probabilistic Information Retrieval

Probabilistic Information Retrieval Probabilistic Information Retrieval Sumit Bhatia July 16, 2009 Sumit Bhatia Probabilistic Information Retrieval 1/23 Overview 1 Information Retrieval IR Models Probability Basics 2 Document Ranking Problem

More information

Sequence analysis and comparison

Sequence analysis and comparison The aim with sequence identification: Sequence analysis and comparison Marjolein Thunnissen Lund September 2012 Is there any known protein sequence that is homologous to mine? Are there any other species

More information

Gene mention normalization in full texts using GNAT and LINNAEUS

Gene mention normalization in full texts using GNAT and LINNAEUS Gene mention normalization in full texts using GNAT and LINNAEUS Illés Solt 1,2, Martin Gerner 3, Philippe Thomas 2, Goran Nenadic 4, Casey M. Bergman 3, Ulf Leser 2, Jörg Hakenberg 5 1 Department of Telecommunications

More information

Welcome to Class 21!

Welcome to Class 21! Welcome to Class 21! Introductory Biochemistry! Lecture 21: Outline and Objectives l Regulation of Gene Expression in Prokaryotes! l transcriptional regulation! l principles! l lac operon! l trp attenuation!

More information

16 CONTROL OF GENE EXPRESSION

16 CONTROL OF GENE EXPRESSION 16 CONTROL OF GENE EXPRESSION Chapter Outline 16.1 REGULATION OF GENE EXPRESSION IN PROKARYOTES The operon is the unit of transcription in prokaryotes The lac operon for lactose metabolism is transcribed

More information

Plant Molecular and Cellular Biology Lecture 10: Plant Cell Cycle Gary Peter

Plant Molecular and Cellular Biology Lecture 10: Plant Cell Cycle Gary Peter Plant Molecular and Cellular Biology Lecture 10: Plant Cell Cycle Gary Peter 9/10/2008 1 Learning Objectives Explain similarities and differences between fungal, mammalian and plant cell cycles Explain

More information

1 Information retrieval fundamentals

1 Information retrieval fundamentals CS 630 Lecture 1: 01/26/2006 Lecturer: Lillian Lee Scribes: Asif-ul Haque, Benyah Shaparenko This lecture focuses on the following topics Information retrieval fundamentals Vector Space Model (VSM) Deriving

More information

Name Period The Control of Gene Expression in Prokaryotes Notes

Name Period The Control of Gene Expression in Prokaryotes Notes Bacterial DNA contains genes that encode for many different proteins (enzymes) so that many processes have the ability to occur -not all processes are carried out at any one time -what allows expression

More information

Bioinformatics. Dept. of Computational Biology & Bioinformatics

Bioinformatics. Dept. of Computational Biology & Bioinformatics Bioinformatics Dept. of Computational Biology & Bioinformatics 3 Bioinformatics - play with sequences & structures Dept. of Computational Biology & Bioinformatics 4 ORGANIZATION OF LIFE ROLE OF BIOINFORMATICS

More information

Bioinformatics Exercises

Bioinformatics Exercises Bioinformatics Exercises AP Biology Teachers Workshop Susan Cates, Ph.D. Evolution of Species Phylogenetic Trees show the relatedness of organisms Common Ancestor (Root of the tree) 1 Rooted vs. Unrooted

More information

GENE ONTOLOGY (GO) Wilver Martínez Martínez Giovanny Silva Rincón

GENE ONTOLOGY (GO) Wilver Martínez Martínez Giovanny Silva Rincón GENE ONTOLOGY (GO) Wilver Martínez Martínez Giovanny Silva Rincón What is GO? The Gene Ontology (GO) project is a collaborative effort to address the need for consistent descriptions of gene products in

More information

Context dependent visualization of protein function

Context dependent visualization of protein function Article III Context dependent visualization of protein function In: Juho Rousu, Samuel Kaski and Esko Ukkonen (eds.). Probabilistic Modeling and Machine Learning in Structural and Systems Biology. 2006,

More information

BIOINFORMATICS LAB AP BIOLOGY

BIOINFORMATICS LAB AP BIOLOGY BIOINFORMATICS LAB AP BIOLOGY Bioinformatics is the science of collecting and analyzing complex biological data. Bioinformatics combines computer science, statistics and biology to allow scientists to

More information

UNIT 6 PART 3 *REGULATION USING OPERONS* Hillis Textbook, CH 11

UNIT 6 PART 3 *REGULATION USING OPERONS* Hillis Textbook, CH 11 UNIT 6 PART 3 *REGULATION USING OPERONS* Hillis Textbook, CH 11 REVIEW: Signals that Start and Stop Transcription and Translation BUT, HOW DO CELLS CONTROL WHICH GENES ARE EXPRESSED AND WHEN? First of

More information

Improving Diversity in Ranking using Absorbing Random Walks

Improving Diversity in Ranking using Absorbing Random Walks Improving Diversity in Ranking using Absorbing Random Walks Andrew B. Goldberg with Xiaojin Zhu, Jurgen Van Gael, and David Andrzejewski Department of Computer Sciences, University of Wisconsin, Madison

More information

Gene Control Mechanisms at Transcription and Translation Levels

Gene Control Mechanisms at Transcription and Translation Levels Gene Control Mechanisms at Transcription and Translation Levels Dr. M. Vijayalakshmi School of Chemical and Biotechnology SASTRA University Joint Initiative of IITs and IISc Funded by MHRD Page 1 of 9

More information

Cell-Cell Communication in Development

Cell-Cell Communication in Development Biology 4361 - Developmental Biology Cell-Cell Communication in Development October 2, 2007 Cell-Cell Communication - Topics Induction and competence Paracrine factors inducer molecules Signal transduction

More information

Research Article HomoKinase: A Curated Database of Human Protein Kinases

Research Article HomoKinase: A Curated Database of Human Protein Kinases ISRN Computational Biology Volume 2013, Article ID 417634, 5 pages http://dx.doi.org/10.1155/2013/417634 Research Article HomoKinase: A Curated Database of Human Protein Kinases Suresh Subramani, Saranya

More information

Chapter 15 Active Reading Guide Regulation of Gene Expression

Chapter 15 Active Reading Guide Regulation of Gene Expression Name: AP Biology Mr. Croft Chapter 15 Active Reading Guide Regulation of Gene Expression The overview for Chapter 15 introduces the idea that while all cells of an organism have all genes in the genome,

More information

CSCE555 Bioinformatics. Protein Function Annotation

CSCE555 Bioinformatics. Protein Function Annotation CSCE555 Bioinformatics Protein Function Annotation Why we need to do function annotation? Fig from: Network-based prediction of protein function. Molecular Systems Biology 3:88. 2007 What s function? The

More information

23 april th nordic conference on scholarly publishing lund, sweden

23 april th nordic conference on scholarly publishing lund, sweden 23 april 2008 4th nordic conference on scholarly publishing lund, sweden river blindness river blindness The drug Mectazin has been used effectively, but has to be taken over 15 years and does not kill

More information

Introduction to the EMBL-EBI Ontology Lookup Service

Introduction to the EMBL-EBI Ontology Lookup Service Introduction to the EMBL-EBI Ontology Lookup Service Simon Jupp jupp@ebi.ac.uk, @simonjupp Samples, Phenotypes and Ontologies Team European Bioinformatics Institute Cambridge, UK. Ontologies in the life

More information

Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus:

Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus: m Eukaryotic mrna processing Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus: Cap structure a modified guanine base is added to the 5 end. Poly-A tail

More information

Plant Molecular and Cellular Biology Lecture 8: Mechanisms of Cell Cycle Control and DNA Synthesis Gary Peter

Plant Molecular and Cellular Biology Lecture 8: Mechanisms of Cell Cycle Control and DNA Synthesis Gary Peter Plant Molecular and Cellular Biology Lecture 8: Mechanisms of Cell Cycle Control and DNA Synthesis Gary Peter 9/10/2008 1 Learning Objectives Explain why a cell cycle was selected for during evolution

More information

Synthesis of Biological Models from Mutation Experiments

Synthesis of Biological Models from Mutation Experiments Synthesis of Biological Models from Mutation Experiments Ali Sinan Köksal, Saurabh Srivastava, Rastislav Bodík, UC Berkeley Evan Pu, MIT Jasmin Fisher, Microsoft Research Cambridge Nir Piterman, University

More information

Predicting Protein Functions and Domain Interactions from Protein Interactions

Predicting Protein Functions and Domain Interactions from Protein Interactions Predicting Protein Functions and Domain Interactions from Protein Interactions Fengzhu Sun, PhD Center for Computational and Experimental Genomics University of Southern California Outline High-throughput

More information

Francisco M. Couto Mário J. Silva Pedro Coutinho

Francisco M. Couto Mário J. Silva Pedro Coutinho Francisco M. Couto Mário J. Silva Pedro Coutinho DI FCUL TR 03 29 Departamento de Informática Faculdade de Ciências da Universidade de Lisboa Campo Grande, 1749 016 Lisboa Portugal Technical reports are

More information

Lipniacki 2004 Ground Truth

Lipniacki 2004 Ground Truth Abstract Lipniacki 2004 Ground Truth The two-feedback-loop regulatory module of nuclear factor kb (NF-kB) signaling pathway is modeled by means of ordinary differential equations. signaling pathway: https://en.wikipedia.org/wiki/signaling_pathway

More information

Hands-On Nine The PAX6 Gene and Protein

Hands-On Nine The PAX6 Gene and Protein Hands-On Nine The PAX6 Gene and Protein Main Purpose of Hands-On Activity: Using bioinformatics tools to examine the sequences, homology, and disease relevance of the Pax6: a master gene of eye formation.

More information

A Protein Ontology from Large-scale Textmining?

A Protein Ontology from Large-scale Textmining? A Protein Ontology from Large-scale Textmining? Protege-Workshop Manchester, 07-07-2003 Kai Kumpf, Juliane Fluck and Martin Hofmann Instructive mistakes: a narrative Aim: Protein ontology that supports

More information

Programmed Cell Death

Programmed Cell Death Programmed Cell Death Dewajani Purnomosari Department of Histology and Cell Biology Faculty of Medicine Universitas Gadjah Mada d.purnomosari@ugm.ac.id What is apoptosis? a normal component of the development

More information

Leucine-rich repeat receptor-like kinases (LRR-RLKs), HAESA, ERECTA-family

Leucine-rich repeat receptor-like kinases (LRR-RLKs), HAESA, ERECTA-family Leucine-rich repeat receptor-like kinases (LRR-RLKs), HAESA, ERECTA-family GENES & DEVELOPMENT (2000) 14: 108 117 INTRODUCTION Flower Diagram INTRODUCTION Abscission In plant, the process by which a plant

More information

Lecture 10: Cyclins, cyclin kinases and cell division

Lecture 10: Cyclins, cyclin kinases and cell division Chem*3560 Lecture 10: Cyclins, cyclin kinases and cell division The eukaryotic cell cycle Actively growing mammalian cells divide roughly every 24 hours, and follow a precise sequence of events know as

More information

Eukaryotic vs. Prokaryotic genes

Eukaryotic vs. Prokaryotic genes BIO 5099: Molecular Biology for Computer Scientists (et al) Lecture 18: Eukaryotic genes http://compbio.uchsc.edu/hunter/bio5099 Larry.Hunter@uchsc.edu Eukaryotic vs. Prokaryotic genes Like in prokaryotes,

More information

Bi 1x Spring 2014: LacI Titration

Bi 1x Spring 2014: LacI Titration Bi 1x Spring 2014: LacI Titration 1 Overview In this experiment, you will measure the effect of various mutated LacI repressor ribosome binding sites in an E. coli cell by measuring the expression of a

More information

Mathematical Modeling and Analysis of Crosstalk between MAPK Pathway and Smad-Dependent TGF-β Signal Transduction

Mathematical Modeling and Analysis of Crosstalk between MAPK Pathway and Smad-Dependent TGF-β Signal Transduction Processes 2014, 2, 570-595; doi:10.3390/pr2030570 Article OPEN ACCESS processes ISSN 2227-9717 www.mdpi.com/journal/processes Mathematical Modeling and Analysis of Crosstalk between MAPK Pathway and Smad-Dependent

More information

Genomes and Their Evolution

Genomes and Their Evolution Chapter 21 Genomes and Their Evolution PowerPoint Lecture Presentations for Biology Eighth Edition Neil Campbell and Jane Reece Lectures by Chris Romero, updated by Erin Barley with contributions from

More information

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison 10-810: Advanced Algorithms and Models for Computational Biology microrna and Whole Genome Comparison Central Dogma: 90s Transcription factors DNA transcription mrna translation Proteins Central Dogma:

More information

Midterm 1. Average score: 74.4 Median score: 77

Midterm 1. Average score: 74.4 Median score: 77 Midterm 1 Average score: 74.4 Median score: 77 NAME: TA (circle one) Jody Westbrook or Jessica Piel Section (circle one) Tue Wed Thur MCB 141 First Midterm Feb. 21, 2008 Only answer 4 of these 5 problems.

More information

Sequences, Structures, and Gene Regulatory Networks

Sequences, Structures, and Gene Regulatory Networks Sequences, Structures, and Gene Regulatory Networks Learning Outcomes After this class, you will Understand gene expression and protein structure in more detail Appreciate why biologists like to align

More information

MTopGO: a tool for module identification in PPI Networks

MTopGO: a tool for module identification in PPI Networks MTopGO: a tool for module identification in PPI Networks Danila Vella 1,2, Simone Marini 3,4, Francesca Vitali 5,6,7, Riccardo Bellazzi 1,4 1 Clinical Scientific Institute Maugeri, Pavia, Italy, 2 Department

More information

III.6 Advanced Query Types

III.6 Advanced Query Types III.6 Advanced Query Types 1. Query Expansion 2. Relevance Feedback 3. Novelty & Diversity Based on MRS Chapter 9, BY Chapter 5, [Carbonell and Goldstein 98] [Agrawal et al 09] 123 1. Query Expansion Query

More information

Cells to Tissues. Peter Takizawa Department of Cell Biology

Cells to Tissues. Peter Takizawa Department of Cell Biology Cells to Tissues Peter Takizawa Department of Cell Biology From one cell to ensembles of cells. Multicellular organisms require individual cells to work together in functional groups. This means cells

More information

Drosophila melanogaster- Morphogen Gradient

Drosophila melanogaster- Morphogen Gradient NPTEL Biotechnology - Systems Biology Drosophila melanogaster- Morphogen Gradient Dr. M. Vijayalakshmi School of Chemical and Biotechnology SASTRA University Joint Initiative of IITs and IISc Funded by

More information

Chem Lecture 10 Signal Transduction

Chem Lecture 10 Signal Transduction Chem 452 - Lecture 10 Signal Transduction 111202 Here we look at the movement of a signal from the outside of a cell to its inside, where it elicits changes within the cell. These changes are usually mediated

More information

RNA Synthesis and Processing

RNA Synthesis and Processing RNA Synthesis and Processing Introduction Regulation of gene expression allows cells to adapt to environmental changes and is responsible for the distinct activities of the differentiated cell types that

More information

REVIEW SESSION. Wednesday, September 15 5:30 PM SHANTZ 242 E

REVIEW SESSION. Wednesday, September 15 5:30 PM SHANTZ 242 E REVIEW SESSION Wednesday, September 15 5:30 PM SHANTZ 242 E Gene Regulation Gene Regulation Gene expression can be turned on, turned off, turned up or turned down! For example, as test time approaches,

More information

Chapter 18 Lecture. Concepts of Genetics. Tenth Edition. Developmental Genetics

Chapter 18 Lecture. Concepts of Genetics. Tenth Edition. Developmental Genetics Chapter 18 Lecture Concepts of Genetics Tenth Edition Developmental Genetics Chapter Contents 18.1 Differentiated States Develop from Coordinated Programs of Gene Expression 18.2 Evolutionary Conservation

More information

Eukaryotic Gene Expression

Eukaryotic Gene Expression Eukaryotic Gene Expression Lectures 22-23 Several Features Distinguish Eukaryotic Processes From Mechanisms in Bacteria 123 Eukaryotic Gene Expression Several Features Distinguish Eukaryotic Processes

More information

Chemical Data Retrieval and Management

Chemical Data Retrieval and Management Chemical Data Retrieval and Management ChEMBL, ChEBI, and the Chemistry Development Kit Stephan A. Beisken What is EMBL-EBI? Part of the European Molecular Biology Laboratory International, non-profit

More information

Measuring TF-DNA interactions

Measuring TF-DNA interactions Measuring TF-DNA interactions How is Biological Complexity Achieved? Mediated by Transcription Factors (TFs) 2 Regulation of Gene Expression by Transcription Factors TF trans-acting factors TF TF TF TF

More information

CS612 - Algorithms in Bioinformatics

CS612 - Algorithms in Bioinformatics Fall 2017 Databases and Protein Structure Representation October 2, 2017 Molecular Biology as Information Science > 12, 000 genomes sequenced, mostly bacterial (2013) > 5x10 6 unique sequences available

More information

Genome Annotation. Bioinformatics and Computational Biology. Genome sequencing Assembly. Gene prediction. Protein targeting.

Genome Annotation. Bioinformatics and Computational Biology. Genome sequencing Assembly. Gene prediction. Protein targeting. Genome Annotation Bioinformatics and Computational Biology Genome Annotation Frank Oliver Glöckner 1 Genome Analysis Roadmap Genome sequencing Assembly Gene prediction Protein targeting trna prediction

More information

Honors Biology Reading Guide Chapter 11

Honors Biology Reading Guide Chapter 11 Honors Biology Reading Guide Chapter 11 v Promoter a specific nucleotide sequence in DNA located near the start of a gene that is the binding site for RNA polymerase and the place where transcription begins

More information

FOXO1 transcription factor and its role in Tamoxifen- resistant breast cancer. Christina Warner. Department of Biochemistry and Molecular Biophysics

FOXO1 transcription factor and its role in Tamoxifen- resistant breast cancer. Christina Warner. Department of Biochemistry and Molecular Biophysics FOXO1 transcription factor and its role in Tamoxifen- resistant breast cancer Christina Warner Department of Biochemistry and Molecular Biophysics University of Arizona 85726 Abstract: 136 words Main Body:

More information

Lecture 4: Transcription networks basic concepts

Lecture 4: Transcription networks basic concepts Lecture 4: Transcription networks basic concepts - Activators and repressors - Input functions; Logic input functions; Multidimensional input functions - Dynamics and response time 2.1 Introduction The

More information

APGRU6L2. Control of Prokaryotic (Bacterial) Genes

APGRU6L2. Control of Prokaryotic (Bacterial) Genes APGRU6L2 Control of Prokaryotic (Bacterial) Genes 2007-2008 Bacterial metabolism Bacteria need to respond quickly to changes in their environment STOP u if they have enough of a product, need to stop production

More information

Introduction Biology before Systems Biology: Reductionism Reduce the study from the whole organism to inner most details like protein or the DNA.

Introduction Biology before Systems Biology: Reductionism Reduce the study from the whole organism to inner most details like protein or the DNA. Systems Biology-Models and Approaches Introduction Biology before Systems Biology: Reductionism Reduce the study from the whole organism to inner most details like protein or the DNA. Taxonomy Study external

More information

Comparative Document Analysis for Large Text Corpora

Comparative Document Analysis for Large Text Corpora Comparative Document Analysis for Large Text Corpora Xiang Ren Yuanhua Lv Kuansan Wang Jiawei Han University of Illinois at Urbana-Champaign, Urbana, IL, USA Microsoft Research, Redmond, WA, USA {xren7,

More information

Regulation of gene expression. Premedical - Biology

Regulation of gene expression. Premedical - Biology Regulation of gene expression Premedical - Biology Regulation of gene expression in prokaryotic cell Operon units system of negative feedback positive and negative regulation in eukaryotic cell - at any

More information

Bioinformatics: Network Analysis

Bioinformatics: Network Analysis Bioinformatics: Network Analysis Kinetics of Gene Regulation COMP 572 (BIOS 572 / BIOE 564) - Fall 2013 Luay Nakhleh, Rice University 1 The simplest model of gene expression involves only two steps: the

More information

The EGF Signaling Pathway! Introduction! Introduction! Chem Lecture 10 Signal Transduction & Sensory Systems Part 3. EGF promotes cell growth

The EGF Signaling Pathway! Introduction! Introduction! Chem Lecture 10 Signal Transduction & Sensory Systems Part 3. EGF promotes cell growth Chem 452 - Lecture 10 Signal Transduction & Sensory Systems Part 3 Question of the Day: Who is the son of Sevenless? Introduction! Signal transduction involves the changing of a cell s metabolism or gene

More information

Answer Key. Cell Growth and Division

Answer Key. Cell Growth and Division Cell Growth and Division Answer Key SECTION 1. THE CELL CYCLE Cell Cycle: (1) Gap1 (G 1): cells grow, carry out normal functions, and copy their organelles. (2) Synthesis (S): cells replicate DNA. (3)

More information