CLUSTER, FUNCTION AND PROMOTER: ANALYSIS OF YEAST EXPRESSION ARRAY

J. ZHU, M. Q. ZHANG
Cold Spring Harbor Lab, P. O. Box 100, Cold Spring Harbor, NY

Gene clusters can be derived from expression profiles, function categorization and promoter regions. To obtain a thorough understanding of gene expression and regulation, the three aspects should be combined. In this study, we explored possible ways to analyze large-scale gene expression data. Three approaches were used to analyze yeast temporal expression data: 1) start from clustering on the expression profiles, followed by function categorization and promoter analysis; 2) start from function categorization, followed by clustering on expression profiles and promoter analysis; and 3) start from clustering on the promoter region, followed by clustering on expression profiles. For clustering analysis of the time-series data, we developed a largest-first algorithm, which provides a mechanism for quality control of clusters. For promoter analysis, we developed a core-extension algorithm.

Introduction

DNA chip technology enables the study of gene expression on a large scale. Yeast has been a popular model system for large-scale gene expression studies. Its advantages over other organisms include the availability of the entire genomic sequence, a relatively small genome and a large body of biomedical and genetic information. Up to now, half of its 6200 ORFs have been characterized (1). In addition, the promoter regions of more than 200 genes have been investigated in detail and more than 40 regulatory elements are well defined. With this information, it is possible to combine function categorization and promoter analysis with large-scale gene expression studies to gain an in-depth view of genome structure and gene regulation.
Large-scale gene expression experiments are used to determine drug targets, identify co-regulated genes (2, 3, 4), and study the response to environmental conditions (4, 5) and the effect of a single gene on the entire genome (5, 7). Genes respond to experimental conditions in a dynamic way, and temporal expression experiments capture this by collecting expression data over a time course. Examples of temporal experiments can be found in the studies of diauxic shift (5), sporulation (2) and the cell cycle (3). Co-regulated genes may share similar expression profiles, may be involved in related functions or may be regulated by common regulatory elements. There are different approaches to analyzing large-scale gene expression data. The essence

is to identify gene clusters. For example, one can start from clustering on the expression profiles and then, for genes with similar expression patterns, identify their functions and the putative regulatory elements in their promoter regions. The study of the promoter region helps to understand gene co-regulation at the transcription level. Function categorization provides information on protein interactions and pathways.

As an alternative, one can start from function categorization and, for genes with related functions, study their expression patterns. Gene expression and regulation are complex biological processes. Genes involved in the same pathway or in related functions do not necessarily have the same expression patterns. It is important to understand what expression patterns are associated with a specific function.

Finally, one can start from clustering genes based on their promoters. Genes carrying putative sites of a transcription factor may be related or unrelated at the transcription level. Few studies of regulatory elements at the genome level have been performed (8). Clustering genes that share regulatory elements may provide clues on issues such as the conditions under which those elements are active, their roles in activation and repression, and their interactions with each other.

Since each approach focuses on a different aspect of the genome, they are equally important. We tested the above three approaches using data from the sporulation experiment (2), which contains seven time steps. Before introducing the three approaches, we describe the methods for clustering on expression profiles, function categorization and promoter analysis.

Methods

Largest-first clustering algorithm.

Clustering methods are essential to the discovery of expression patterns in time-series data. Two clustering methods are mainly used for this purpose. One is the classic hierarchical clustering method (9). The other is the self-organizing map, SOM (10, 11). Both methods provide overviews of the entire data set.
Researchers are also interested in clusters with special features, for example clusters in which the similarity between each pair of genes is higher than a certain cutoff. To address this problem, we developed an algorithm based on the density search method. Because the largest cluster always came out first, it was termed largest-first. Using this approach, clusters could be derived according to their quality requirements.

For this algorithm, a data point corresponded to the temporal expression measurements of a gene, represented as a multi-dimensional vector. The similarity between two data points was measured by the Pearson correlation, which ranges from -1 to 1. Prior to clustering, a similarity matrix was generated. At the beginning, all data points were placed in an initial data point pool, which was a collection of unclustered data points. The algorithm proceeded in iterations, each of which generated one cluster. Clustered data points withdrew from the initial pool,

unclustered data points remained in the pool. In each round, the density of neighbors of each data point in the pool was determined by counting the data points whose similarity to it was above a certain cutoff, a parameter given by the user. The data point with the largest number of neighbors was selected as the core of a new cluster. The next step was to grow the new cluster. For a data point to join a cluster, its similarity to the cluster had to be greater than the cutoff. Once a cluster was updated, cluster members were screened against each other; those that failed to meet the same criterion withdrew from the cluster and returned to the pool. The algorithm stopped when no more clusters could be found.

The major advantage of this algorithm was that it provided a mechanism for controlling the quality of clusters. The quality of an output cluster could be controlled by the similarity cutoff: the higher the cutoff, the higher the quality. The algorithm was implemented in C.

We tested the largest-first algorithm using a random sample that contained 1000 data points with 7 time steps. With the similarity cutoff = 0.9, the largest cluster found contained only three data points. With the similarity cutoff = 0.8, the size of the largest cluster was 12. For a given data set, tests on random data of the same dimensions are needed to determine the minimal size for a cluster to be considered significant. Besides providing a mechanism to control the quality of clusters, the largest-first algorithm therefore also provides a way to test the significance of a cluster, based on its size. From the random test, the maximum expected size of clusters could be determined. For a cluster to be significant, its size should be several times larger than the maximum expected size.

Function categorization.

MIPS provides a function catalogue of all yeast genes.
Among the 6350 genes and ORFs documented in MIPS, 3529 are assigned to at least one function category. Twelve major function categories were selected. Refer to Tables 2 and 3 for their names and the number of ORFs in each.

For an identified cluster, two parameters were needed to decide which function category was most related. The first parameter, designated N, was the number of cluster genes within a function category. The second, designated -ln P, was the negative natural logarithm of the probability of finding N genes in a function category. The frequency of a function category was determined by dividing the size of the function category by the total number of genes. The expected number of genes belonging to a function category (E) was equal to the cluster's size times the function category's frequency. The probability was computed assuming a Poisson distribution. The most related function category was the one with the largest -ln P value and N much greater than E. As an alternative, Z-scores could be used to determine the most significant function categories.
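The scoring of a function category described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `neg_log_poisson` and the example numbers are hypothetical, and P is taken here as the Poisson probability of observing exactly N genes, since the text does not say whether a point probability or a tail probability was used.

```python
import math

def neg_log_poisson(n_observed, cluster_size, category_size, total_genes):
    """Return (E, -ln P) for a function category within one cluster.

    E is the expected number of category genes in the cluster.
    P is the Poisson probability of seeing exactly n_observed genes
    (assumption: point probability, not a tail probability).
    """
    frequency = category_size / total_genes   # category frequency
    expected = cluster_size * frequency       # E = cluster size x frequency
    # ln P(X = N) = N ln E - E - ln N!   (lgamma(N + 1) equals ln N!)
    log_p = (n_observed * math.log(expected)
             - expected
             - math.lgamma(n_observed + 1))
    return expected, -log_p

# Hypothetical example: a 100-gene cluster containing 40 genes from a
# category of 346 genes, out of 6350 genes in total.
expected, score = neg_log_poisson(40, 100, 346, 6350)
print(f"E = {expected:.2f}, -ln P = {score:.1f}")
```

A category would be reported for a cluster when its -ln P is the largest among the categories and N is much greater than E. Working in log space avoids underflow when P is very small.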

Promoter analysis.

The availability of the yeast genomic sequence enables promoter analysis in the context of clusters. Table 1 shows transcription factors and regulatory elements provided by SCPD (12). Consensus sequences and their functions are also given. They are divided into two groups, uni-core and multi-core, according to the nature of their recognition sites.

Table 1. Known transcription factor binding sites and regulatory elements

Name     Consensus sequence   Related function
Uni-core
ADR1     TCTTC                Carbohydrate utilization
GCN4     TGACTC               Amino-acid metabolism
GCR1     GMWTCCW              Carbohydrate utilization
GLN3     GATAAG               Nitrogen utilization
HAP2     ACCAATNA             CCAAT-binding factor
INO2     ATGTGAAWW            Lipid biosynthesis
MAC1     TTTGCTC              Reduction and utilization Fe, Cu
MATA1    TGATGTWR             Repress haploid genes in diploid cells
MATα1    WCAAYGNCAG           Activates alpha-specific genes
MATα2    TCNTGT               Turn off a-specific genes
MCB      WCGCGW               Cell cycle control
MIG1     CCCCRSWWWWW          Glucose-repression
MSE      CACAAAA              Middle sporulation element
PDR1     TCCGYGGA             Detoxification
PHO4     CACGTK               Phosphate utilization
RAP1     RMACCCA              Transcription control
REB1     CCGGGTARNNR          Transcription control
RME1     GAACCTCAA            Meiosis and mitosis
ROX1     YYNATTGTTY           Repressor of hypoxic genes
SCB      CNCGAAA              Cell cycle control
STE12    ATGAAAC              Mating
SWI5     WACCAKY              Cell cycle control
UME6     WCGGCGGCWA           Nitrogen repression and induction of meiosis
YAP1     TTACTAA              Oxidative stress response
TBP      TATAWAW              TATA binding protein
SFF      GTMAACAA             Swi five factor, function with MCM1
ECB      GGAAAAD              Early cell cycle box
STRE     AGGGG                Stress response element
Multi-core
ABF1     TCRNNNNNNACG         DNA-replication and transcriptional regulation
HAP1     CGGNNNTANNCGG        Heme-dependent activation
GAL4     CGGNNNNNNNNNNNCCG    Galactose-induction
LEU3     CCGGNNCCGG           Branched chain amino acid biosynthesis pathways
MCM1     DCCNNNWWRGG          Recruits coregulatory proteins for both gene activation and repression at a variety of loci
PUT3     CGGNNNNNNNNNNCCG     Proline utilization pathway
PPR1     CGGNNNNNNCCG         Regulating pyrimidine pathway

Uni-core motifs contain only one core region to be recognized by transcription factors. Multi-core motifs contain several core regions.
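The consensus sequences in Table 1 are written with IUPAC degenerate nucleotide codes (W = A or T, N = any base, and so on). The following minimal sketch shows one way to scan a promoter sequence for putative sites of such a consensus. The function names are hypothetical, and the optional reverse-strand search is an assumption: the text does not state whether both strands were scanned.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also covers the degenerate codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

def reverse_complement(consensus):
    return consensus.translate(COMPLEMENT)[::-1]

def consensus_to_regex(consensus):
    return "".join(IUPAC[base] for base in consensus)

def find_sites(promoter, consensus, both_strands=False):
    """Return sorted 0-based start positions of consensus matches.

    With both_strands=True (an assumption, not stated in the text),
    matches to the reverse complement are reported as well.
    """
    patterns = {consensus}
    if both_strands:
        patterns.add(reverse_complement(consensus))
    hits = set()
    for pattern in patterns:
        # A lookahead reports overlapping matches too.
        regex = re.compile("(?=" + consensus_to_regex(pattern) + ")")
        hits.update(m.start() for m in regex.finditer(promoter))
    return sorted(hits)

# MCB (WCGCGW) in a short made-up sequence.
print(find_sites("TTACGCGTAA", "WCGCGW"))
```

Multi-core motifs such as GAL4 need no special handling in this sketch, because the spacer between the cores is expressed with N.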

The background frequency of a promoter element was determined using the promoter regions of all yeast ORFs. The promoter region ranged from -500 to -1 relative to the start of the coding region. For an identified cluster, the promoter regions of all cluster genes were gathered as well. The putative sites of each promoter element were determined by searching the promoter regions for matches to the consensus sequences. The cluster frequency of a promoter element was determined by dividing the number of putative sites by the number of genes in the cluster. The expected number of putative sites could be estimated from the background frequency. The probability (P) of finding a certain number of putative sites could be calculated assuming a Poisson distribution. The significant putative motifs were those having a cluster frequency much higher than the background frequency and a large -ln P. Using this method, non-specific signals such as the TATA box and poly(A) could be easily filtered out.

There is also great interest in identifying unknown promoter elements. Multiple sequence alignments are commonly used to find common motifs in promoter regions. Examples include the Gibbs sampler (13), Consensus (14) and MEME (16). Prior information on motif length and motif distribution is required for these approaches. Here, we present an algorithm called core-extension, which is based on k-tuple analysis of the promoter region and requires no assumption about motif length or distribution. It is similar to the approach described in (15).

Three types of k-tuples were employed. A 5-tuple was a 5-mer with no mismatch. A degenerate 5-tuple was a 5-mer with one mismatch at position 2, 3 or 4, e.g. TNACT, TGNCT and TGANT, where N represents {A, T, C, G}. A degenerate 6-tuple was a 6-mer with one mismatch at position 2, 3, 4 or 5. The core-extension algorithm first selected significant k-tuples from the promoter regions of a gene cluster.
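The three k-tuple types defined above can be enumerated and counted as in the sketch below. The function names are hypothetical, and the sketch covers only the counting step. The resulting counts for a cluster would then be compared with the counts over the promoter regions of all ORFs, a comparison not shown here because the text does not give the exact Z-score formula.

```python
from collections import Counter

def degenerate_forms(kmer):
    """Return the k-mer with each interior position replaced by N in turn.

    For a 5-mer these are positions 2, 3 and 4 (1-based);
    for a 6-mer, positions 2, 3, 4 and 5.
    """
    return [kmer[:i] + "N" + kmer[i + 1:] for i in range(1, len(kmer) - 1)]

def count_ktuples(sequences, k=5, degenerate=False):
    """Count exact or degenerate k-tuples over a set of promoter sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) - set("ACGT"):
                continue  # skip windows containing ambiguous bases
            if degenerate:
                counts.update(degenerate_forms(kmer))
            else:
                counts[kmer] += 1
    return counts

print(degenerate_forms("TGACT"))
```

Counting degenerate forms by expanding each exact k-mer keeps the scan to a single pass over every promoter sequence.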
The selection procedure was similar to that for selecting known promoter elements. To demonstrate the procedure, the 5-tuple is used as an example. The distribution of 5-tuples in the promoter regions of all ORFs was used as a control. Over-represented 5-mers in a gene cluster were selected based on their Z-scores. The selected 5-mers were used as cores for sequence motifs. The initial motif was assumed to have the 5-tuple in the middle, with extensions of 5 nucleotides on both sides. All sequences matching this motif were selected, and a matrix was built from them. Each cell of the matrix contained the frequency count of a nucleotide at the corresponding position. For each position, a position score was computed as the standard deviation of nucleotide counts in all rows divided by 4. The score ranged from 0 to 1 and measured how conserved a position was: a score close to 1 indicated that one nucleotide was dominant at that position. To validate a matrix, its ends were checked. If a position at either end of the matrix had a score lower than 0.3, it was dropped from the matrix. The validation procedure continued until both end positions had scores greater than 0.3. The matrix was then used to select new putative sites using a cutoff. New putative sites were used to update the matrix, and the matrix was validated and applied to the next

round of searching. Updating and searching continued until the matrix reached a stable state. A consensus sequence was generated from the final matrix.

The core-extension algorithm was suitable for finding motifs containing a single core. It could find conserved flanking regions around the core and incorporate mismatches in the core region. Another advantage was that it required no prior knowledge of motif length or distribution. Degenerate 5-tuples and 6-tuples could be used to find motifs with less conserved cores.

Approach 1: start from expression profiles

For this approach, gene clusters were derived from expression profiles. For each cluster, the most significant function category was determined and promoter elements were identified. The sporulation experiment (2) contained seven time steps. The original data set included measurements on more than 6000 ORFs. Among them, 1870 represented characterized genes and contained no null time point data. Using the largest-first clustering method with a similarity cutoff of 0.9 and a cluster size larger than 30, 8 clusters were identified. The average expression profiles of each cluster are shown in Figure 1. Clusters 1, 3, 4, 5 and 6 consisted of genes mainly repressed during sporulation. The other clusters contained induced genes, most of which reached their highest expression level in the middle of sporulation.

Table 2 summarizes the results of function categorization. The significance measurements of each function category are given in Table 3. Table 4 shows the most significant function category and the identified putative regulatory elements for each cluster. Genes in cluster 2 were induced in the middle of sporulation, and most of them carried MSE, the middle sporulation element, in their promoter region. This is consistent with previous studies (17). Function categorization indicated that 30% of them were related to cell growth.
In cluster 4, 72 genes were related to protein synthesis, and 67 of them were ribosomal proteins. Putative RAP1 sites were identified in the promoter regions of most of them. RAP1 repressed the expression of ribosomal proteins during sporulation (18).

Clusters with similar profiles may have totally different related functions and regulatory elements. For example, clusters 2 and 8 both contained genes induced in the middle of sporulation. The dominant function for cluster 2 was cell growth, whereas no major function was found for cluster 8. Their major promoter elements were also different: MSE for cluster 2 versus CCCCC for cluster 8. To understand the complexity of gene expression, one has to combine the analysis of expression patterns, function categorization and promoter analysis.

Approach 2: start from function categorization

This approach was used to identify the different expression profiles associated with a function category. Expression data were first sorted according to the function catalogue prior to clustering. The clustering method was the largest-first algorithm described above. Promoter analysis was performed on the identified clusters. Using the same sporulation data described above, the expression data were divided into 12 function categories. Here, we use genes related to cell growth as an example to demonstrate the procedure. Among the 1870 genes selected from the sporulation data, 685 genes were related to cell growth. The result of largest-first clustering on the expression profiles of these genes is given in Figure 2. Clusters were identified with similarity cutoff = 0.8 and size greater than 10. The size of each cluster and the identified putative regulatory elements are given in Table 5.

Figure 1. Clustering on the expression profiles using the largest-first algorithm. The similarity cutoff is 0.9. The minimal size of a cluster is 30.
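The largest-first procedure described in the Methods, and used with the cutoff and minimal size quoted in the Figure 1 caption, can be sketched as follows. This is a simplified illustration, not the authors' C implementation. It assumes that a candidate joins a cluster only if its Pearson correlation with every current member exceeds the cutoff, which makes the separate screening step unnecessary. The function names, the tie-breaking by lowest index and the handling of flat profiles are also assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a flat profile is treated as uncorrelated
    return sxy / math.sqrt(sxx * syy)

def largest_first(profiles, cutoff=0.9, min_size=2):
    """Return clusters as lists of row indices, in order of discovery."""
    n = len(profiles)
    sim = [[pearson(profiles[i], profiles[j]) for j in range(n)]
           for i in range(n)]
    pool = set(range(n))  # unclustered data points
    clusters = []
    while pool:
        # Density: neighbours still in the pool with similarity above cutoff.
        neighbours = {i: [j for j in pool if j != i and sim[i][j] > cutoff]
                      for i in pool}
        core = max(sorted(pool), key=lambda i: len(neighbours[i]))
        if len(neighbours[core]) + 1 < min_size:
            break  # no remaining point can seed a large enough cluster
        members = [core]
        # Grow the cluster, most similar candidates first (assumed rule:
        # a candidate must exceed the cutoff against every member).
        for g in sorted(neighbours[core], key=lambda j: -sim[core][j]):
            if all(sim[g][m] > cutoff for m in members):
                members.append(g)
        pool -= set(members)  # clustered points leave the pool
        if len(members) >= min_size:
            clusters.append(members)
    return clusters
```

Running the same function on randomly generated profiles of the same dimensions gives the baseline cluster sizes against which real clusters are judged, in the spirit of the random test on 1000 data points with 7 time steps described in the Methods.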

Table 2. Function categorization of clusters in Figure 1
Columns: Cluster; Total number of genes; Metabolism (1047)*; Energy (246); Cell growth (785); Transcription (742); Protein synthesis (346); Protein destination (538); Transportation facilitation (303); Intracellular transport (450); Cellular biogenesis (188); Signal transduction (126); Cell rescue, defense, death, aging (352); Ionic homeostasis (121)
*The number in parentheses shows the number of genes related to that function.

Table 3. Significance measurement (-ln P) of function categories for clusters in Figure 1
Columns: Cluster; Metabolism (1047); Energy (246); Cell growth (785); Transcription (742); Protein synthesis (346); Protein destination (538); Transportation facilitation (303); Intracellular transport (450); Cellular biogenesis (188); Signal transduction (126); Cell rescue, defense, death, aging (352); Ionic homeostasis (121)
P is the probability of finding N genes (as shown in Table 2) belonging to a function category in a cluster.

Clusters 1, 4 and 6 had different expression profiles, but they seemed to be controlled by the same regulatory element, MCB, with consensus ACGCGT. Clusters 1 and 6 were mainly repressed during sporulation, while cluster 4 was induced in the later stage. This indicated that MCB might function differently in different gene clusters. Cluster 2 also consisted of repressed genes; however, most of them carried sequences similar to REB1's binding sites. Compared to cluster 6, cluster 2 had a lagged repression. This might be due to the difference between the functions of MCB and REB1. Cluster 7 genes were induced in the middle of sporulation. Instead of MSE, STRE was dominant in their promoter region. STRE

was the dominant element in the promoter region of cluster 1 in Figure 1, which was repressed during sporulation. The same element was involved in repression as well as induction under the same condition. The results indicated that a transcription factor may play different roles in different gene clusters.

Approach 3: start from promoter

It is important to find the relationship between promoter elements and expression profiles. Approach 1 presents a way to identify putative promoter elements for a given expression pattern, which is only one aspect of the problem. Another aspect is to identify the expression patterns for a given promoter element. For this approach, genes containing certain motifs were selected first, and then clustering was performed. MSE was identified as the major regulatory element for cluster 2 in Figure 1, which was induced in the middle of sporulation. Among the 1870 genes selected from the sporulation data, 528 genes contained MSE (CACAAAA) in their promoter region. Figure 3 shows the result of largest-first clustering on the MSE-containing genes. The similarity cutoff was 0.8. Besides clusters induced in the middle of sporulation, there were clusters corresponding to induction in the early and late stages, and clusters repressed at various stages. This indicated that MSE alone was not enough to determine the expression pattern. It might function through interactions with other elements. For example, cluster 5 was induced in the early middle of sporulation. Promoter analysis showed that cluster 5 was also rich in STRE (AGGGG) besides MSE.

Conclusion

Clustering analysis, function categorization and promoter analysis together help to gain a thorough overview of the expression data. There is no logical order that defines which analysis should come first. Currently, the most commonly used approach is Approach 1 described in the text. However, the other approaches also provide useful information. No single method is good enough; it is important to combine different approaches.
Acknowledgement This work was supported by NIH grant HG01696 to M. Q. Zhang.

Figure 2. Clustering on expression profiles of 685 cell growth related genes using the largest-first algorithm. Similarity cutoff is 0.8.

Figure 3. Clustering on expression profiles of 528 MSE-containing genes using the largest-first algorithm. Similarity cutoff is 0.8.

Table 4. Functions and regulatory elements for clusters in Figure 1

Cluster  Function*          Regulatory elements
1        Energy             AGGGG (STRE)
2        Cell growth        CACAAAA (MSE)
3        Metabolism         TNCCACAC
4        Protein synthesis  ACCCATACAT (RAP1)
5
6        Metabolism         GCGCAAAA
7        Cell growth        AGGCGCCT
8                           CCCCC

*The most significant function was determined by combining information in Tables 2 and 3.

Table 5. Size and promoter elements for clusters in Figure 2

Cluster  Size  Promoter elements
               ACGCGT (MCB)
2        89    TTACCCG (REB1)
               ACGCGW (MCB)
               ACGCGT (MCB)
7        23    AGGGG (STRE)

References

1. J.M. Cherry, C. Adler, C. Ball, S.A. Chervitz, S.S. Dwight, E.T. Hester, Y. Jia, G. Juvik, T. Roe, M. Schroeder, S. Weng, D. Botstein. Nucleic Acids Res 1998, 26.
2. S. Chu, J. DeRisi, M. Eisen, J. Mulholland, D. Botstein, P.O. Brown, I. Herskowitz. Science 1998, 282.
3. P.T. Spellman, G. Sherlock, M.Q. Zhang, V.R. Iyer, K. Anders, M.B. Eisen, P.O. Brown, D. Botstein, B. Futcher. Mol Biol Cell 1998, 9.
4. F.P. Roth, J.D. Hughes, P.W. Estep, G.M. Church. Nat Biotechnol 1998, 16.
5. J.L. DeRisi, V.R. Iyer, P.O. Brown. Science 1997, 278.
6. E.G. Jennings, R.A. Young. Trends Genet 1999, 15.
7. D.A. Pearce, T. Ferea, S.A. Nosel, B. Das, F. Sherman. Nat Genet 1999, 22.
8. M.Q. Zhang. Genome Res 1999, in press.
9. M.B. Eisen, P.T. Spellman, P.O. Brown, D. Botstein. Proc Natl Acad Sci U S A 1998, 95.
10. P. Tamayo, D. Slonim, J. Mesirov, Q. Zhu, S. Kitareewan, E. Dmitrovsky, E.S. Lander, T. Golub. Proc Natl Acad Sci U S A 1999, 96.
11. P. Toronen, M. Kolehmainen, G. Wong, E. Castren. FEBS Lett 1999, 451.
12. J. Zhu, M.Q. Zhang. Bioinformatics, in press.
13. C.E. Lawrence, S.F. Altschul, M.S. Boguski, J.S. Liu, A.F. Neuwald, J.C. Wootton. Science 1993, 262.
14. G.Z. Hertz, G.W. Hartzell, G.D. Stormo. Comput Appl Biosci 1990, 6.
15. M.Q. Zhang. Comp & Chem 1999, 23.
16. T.L. Bailey, C. Elkan. Ismb 1995, 3.
17. N. Ozsarac, M.J. Straffon, H.E. Dalton, I.W. Dawes. Mol Cell Biol 1997, 17.
18. K. Mizuta, R. Tsujii, J.R. Warner, M. Nishiyama. Nucleic Acids Res 1998, 26.


More information

Integration of functional genomics data

Integration of functional genomics data Integration of functional genomics data Laboratoire Bordelais de Recherche en Informatique (UMR) Centre de Bioinformatique de Bordeaux (Plateforme) Rennes Oct. 2006 1 Observations and motivations Genomics

More information

T H E J O U R N A L O F C E L L B I O L O G Y

T H E J O U R N A L O F C E L L B I O L O G Y T H E J O U R N A L O F C E L L B I O L O G Y Supplemental material Breker et al., http://www.jcb.org/cgi/content/full/jcb.201301120/dc1 Figure S1. Single-cell proteomics of stress responses. (a) Using

More information

Number of questions TEK (Learning Target) Biomolecules & Enzymes

Number of questions TEK (Learning Target) Biomolecules & Enzymes Unit Biomolecules & Enzymes Number of questions TEK (Learning Target) on Exam 8 questions 9A I can compare and contrast the structure and function of biomolecules. 9C I know the role of enzymes and how

More information

Inferring Transcriptional Regulatory Networks from Gene Expression Data II

Inferring Transcriptional Regulatory Networks from Gene Expression Data II Inferring Transcriptional Regulatory Networks from Gene Expression Data II Lectures 9 Oct 26, 2011 CSE 527 Computational Biology, Fall 2011 Instructor: Su-In Lee TA: Christopher Miles Monday & Wednesday

More information

NOISE-ROBUST SOFT CLUSTERING OF GENE EXPRESSION TIME-COURSE DATA

NOISE-ROBUST SOFT CLUSTERING OF GENE EXPRESSION TIME-COURSE DATA Journal of Bioinformatics and Computational Biology Vol., No. 4 (5) 965 988 c Imperial College Press NOISE-ROBUST SOFT CLUSTERING OF GENE EXPRESSION TIME-COURSE DATA MATTHIAS E. FUTSCHIK, Institute of

More information

In-Depth Assessment of Local Sequence Alignment

In-Depth Assessment of Local Sequence Alignment 2012 International Conference on Environment Science and Engieering IPCBEE vol.3 2(2012) (2012)IACSIT Press, Singapoore In-Depth Assessment of Local Sequence Alignment Atoosa Ghahremani and Mahmood A.

More information

Cluster Analysis of Gene Expression Microarray Data. BIOL 495S/ CS 490B/ MATH 490B/ STAT 490B Introduction to Bioinformatics April 8, 2002

Cluster Analysis of Gene Expression Microarray Data. BIOL 495S/ CS 490B/ MATH 490B/ STAT 490B Introduction to Bioinformatics April 8, 2002 Cluster Analysis of Gene Expression Microarray Data BIOL 495S/ CS 490B/ MATH 490B/ STAT 490B Introduction to Bioinformatics April 8, 2002 1 Data representations Data are relative measurements log 2 ( red

More information

Position-specific scoring matrices (PSSM)

Position-specific scoring matrices (PSSM) Regulatory Sequence nalysis Position-specific scoring matrices (PSSM) Jacques van Helden Jacques.van-Helden@univ-amu.fr Université d ix-marseille, France Technological dvances for Genomics and Clinics

More information

Linear models to discover cis-regulatory modules. Statistics 246 Spring 2006 Week 16 Lecture 1

Linear models to discover cis-regulatory modules. Statistics 246 Spring 2006 Week 16 Lecture 1 Linear models to discover cis-regulatory modules Statistics 246 Spring 26 Week 6 Lecture Finding motifs/modules using gene expression data Linear regression methods Model log gene expression (or log ratios

More information

Exploratory statistical analysis of multi-species time course gene expression

Exploratory statistical analysis of multi-species time course gene expression Exploratory statistical analysis of multi-species time course gene expression data Eng, Kevin H. University of Wisconsin, Department of Statistics 1300 University Avenue, Madison, WI 53706, USA. E-mail:

More information

Whole-genome analysis of GCN4 binding in S.cerevisiae

Whole-genome analysis of GCN4 binding in S.cerevisiae Whole-genome analysis of GCN4 binding in S.cerevisiae Lillian Dai Alex Mallet Gcn4/DNA diagram (CREB symmetric site and AP-1 asymmetric site: Song Tan, 1999) removed for copyright reasons. What is GCN4?

More information

De novo identification of motifs in one species. Modified from Serafim Batzoglou s lecture notes

De novo identification of motifs in one species. Modified from Serafim Batzoglou s lecture notes De novo identification of motifs in one species Modified from Serafim Batzoglou s lecture notes Finding Regulatory Motifs... Given a collection of genes that may be regulated by the same transcription

More information

Introduction to Bioinformatics

Introduction to Bioinformatics CSCI8980: Applied Machine Learning in Computational Biology Introduction to Bioinformatics Rui Kuang Department of Computer Science and Engineering University of Minnesota kuang@cs.umn.edu History of Bioinformatics

More information

Tiffany Samaroo MB&B 452a December 8, Take Home Final. Topic 1

Tiffany Samaroo MB&B 452a December 8, Take Home Final. Topic 1 Tiffany Samaroo MB&B 452a December 8, 2003 Take Home Final Topic 1 Prior to 1970, protein and DNA sequence alignment was limited to visual comparison. This was a very tedious process; even proteins with

More information

Chapter 15 Active Reading Guide Regulation of Gene Expression

Chapter 15 Active Reading Guide Regulation of Gene Expression Name: AP Biology Mr. Croft Chapter 15 Active Reading Guide Regulation of Gene Expression The overview for Chapter 15 introduces the idea that while all cells of an organism have all genes in the genome,

More information

Predicting Protein Functions and Domain Interactions from Protein Interactions

Predicting Protein Functions and Domain Interactions from Protein Interactions Predicting Protein Functions and Domain Interactions from Protein Interactions Fengzhu Sun, PhD Center for Computational and Experimental Genomics University of Southern California Outline High-throughput

More information

Boolean models of gene regulatory networks. Matthew Macauley Math 4500: Mathematical Modeling Clemson University Spring 2016

Boolean models of gene regulatory networks. Matthew Macauley Math 4500: Mathematical Modeling Clemson University Spring 2016 Boolean models of gene regulatory networks Matthew Macauley Math 4500: Mathematical Modeling Clemson University Spring 2016 Gene expression Gene expression is a process that takes gene info and creates

More information

2. Yeast two-hybrid system

2. Yeast two-hybrid system 2. Yeast two-hybrid system I. Process workflow a. Mating of haploid two-hybrid strains on YPD plates b. Replica-plating of diploids on selective plates c. Two-hydrid experiment plating on selective plates

More information

Dynamic modular architecture of protein-protein interaction networks beyond the dichotomy of date and party hubs

Dynamic modular architecture of protein-protein interaction networks beyond the dichotomy of date and party hubs Dynamic modular architecture of protein-protein interaction networks beyond the dichotomy of date and party hubs Xiao Chang 1,#, Tao Xu 2,#, Yun Li 3, Kai Wang 1,4,5,* 1 Zilkha Neurogenetic Institute,

More information

Gene regulation I Biochemistry 302. Bob Kelm February 25, 2005

Gene regulation I Biochemistry 302. Bob Kelm February 25, 2005 Gene regulation I Biochemistry 302 Bob Kelm February 25, 2005 Principles of gene regulation (cellular versus molecular level) Extracellular signals Chemical (e.g. hormones, growth factors) Environmental

More information

Deciphering the cis-regulatory network of an organism is a

Deciphering the cis-regulatory network of an organism is a Identifying the conserved network of cis-regulatory sites of a eukaryotic genome Ting Wang and Gary D. Stormo* Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110 Edited

More information

A temporal hidden Markov regression model for the analysis. of gene regulatory networks

A temporal hidden Markov regression model for the analysis. of gene regulatory networks A temporal hidden Markov regression model for the analysis of gene regulatory networks Mayetri Gupta, Pingping Qu, and Joseph G. Ibrahim February 20, 2007 Abstract We propose a novel hierarchical hidden

More information

Shrinkage-Based Similarity Metric for Cluster Analysis of Microarray Data

Shrinkage-Based Similarity Metric for Cluster Analysis of Microarray Data Shrinkage-Based Similarity Metric for Cluster Analysis of Microarray Data Vera Cherepinsky, Jiawu Feng, Marc Rejali, and Bud Mishra, Courant Institute, New York University, 5 Mercer Street, New York, NY

More information

Self Similar (Scale Free, Power Law) Networks (I)

Self Similar (Scale Free, Power Law) Networks (I) Self Similar (Scale Free, Power Law) Networks (I) E6083: lecture 4 Prof. Predrag R. Jelenković Dept. of Electrical Engineering Columbia University, NY 10027, USA {predrag}@ee.columbia.edu February 7, 2007

More information

Xiaosi Zhang. A thesis submitted to the graduate faculty. in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE

Xiaosi Zhang. A thesis submitted to the graduate faculty. in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE GENE EXPRESSION PATTERN ANALYSIS Xiaosi Zhang A thesis submitted to the graduate faculty in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE Major: Bioinformatics and Computational

More information

Biology Assessment. Eligible Texas Essential Knowledge and Skills

Biology Assessment. Eligible Texas Essential Knowledge and Skills Biology Assessment Eligible Texas Essential Knowledge and Skills STAAR Biology Assessment Reporting Category 1: Cell Structure and Function The student will demonstrate an understanding of biomolecules

More information

STAAR Biology Assessment

STAAR Biology Assessment STAAR Biology Assessment Reporting Category 1: Cell Structure and Function The student will demonstrate an understanding of biomolecules as building blocks of cells, and that cells are the basic unit of

More information

56:198:582 Biological Networks Lecture 10

56:198:582 Biological Networks Lecture 10 56:198:582 Biological Networks Lecture 10 Temporal Programs and the Global Structure The single-input module (SIM) network motif The network motifs we have studied so far all had a defined number of nodes.

More information

Lecture 5: November 19, Minimizing the maximum intracluster distance

Lecture 5: November 19, Minimizing the maximum intracluster distance Analysis of DNA Chips and Gene Networks Spring Semester, 2009 Lecture 5: November 19, 2009 Lecturer: Ron Shamir Scribe: Renana Meller 5.1 Minimizing the maximum intracluster distance 5.1.1 Introduction

More information

Multiple Choice Review- Eukaryotic Gene Expression

Multiple Choice Review- Eukaryotic Gene Expression Multiple Choice Review- Eukaryotic Gene Expression 1. Which of the following is the Central Dogma of cell biology? a. DNA Nucleic Acid Protein Amino Acid b. Prokaryote Bacteria - Eukaryote c. Atom Molecule

More information

Central postgenomic. Transcription regulation: a genomic network. Transcriptome: the set of all mrnas expressed in a cell at a specific time.

Central postgenomic. Transcription regulation: a genomic network. Transcriptome: the set of all mrnas expressed in a cell at a specific time. Transcription regulation: a genomic network Nicholas Luscombe Laboratory of Mark Gerstein Department of Molecular Biophysics and Biochemistry Yale University Understand Proteins, through analyzing populations

More information

Comparative RNA-seq analysis of transcriptome dynamics during petal development in Rosa chinensis

Comparative RNA-seq analysis of transcriptome dynamics during petal development in Rosa chinensis Title Comparative RNA-seq analysis of transcriptome dynamics during petal development in Rosa chinensis Author list Yu Han 1, Huihua Wan 1, Tangren Cheng 1, Jia Wang 1, Weiru Yang 1, Huitang Pan 1* & Qixiang

More information

IDENTIFYING BIOLOGICAL PATHWAYS VIA PHASE DECOMPOSITION AND PROFILE EXTRACTION

IDENTIFYING BIOLOGICAL PATHWAYS VIA PHASE DECOMPOSITION AND PROFILE EXTRACTION IDENTIFYING BIOLOGICAL PATHWAYS VIA PHASE DECOMPOSITION AND PROFILE EXTRACTION 2691 Yi Zhang and Zhidong Deng * Department of Computer Science, Tsinghua University Beijing, 100084, China * Email: michael@tsinghua.edu.cn

More information

3.B.1 Gene Regulation. Gene regulation results in differential gene expression, leading to cell specialization.

3.B.1 Gene Regulation. Gene regulation results in differential gene expression, leading to cell specialization. 3.B.1 Gene Regulation Gene regulation results in differential gene expression, leading to cell specialization. We will focus on gene regulation in prokaryotes first. Gene regulation accounts for some of

More information

Welcome to Class 21!

Welcome to Class 21! Welcome to Class 21! Introductory Biochemistry! Lecture 21: Outline and Objectives l Regulation of Gene Expression in Prokaryotes! l transcriptional regulation! l principles! l lac operon! l trp attenuation!

More information

Dynamics of cellular level function and regulation derived from murine expression array data

Dynamics of cellular level function and regulation derived from murine expression array data Dynamics of cellular level function and regulation derived from murine expression array data Benjamin de Bivort*, Sui Huang, and Yaneer Bar-Yam* *Department of Molecular and Cellular Biology, Harvard University,

More information

Introduction to clustering methods for gene expression data analysis

Introduction to clustering methods for gene expression data analysis Introduction to clustering methods for gene expression data analysis Giorgio Valentini e-mail: valentini@dsi.unimi.it Outline Levels of analysis of DNA microarray data Clustering methods for functional

More information

Bacterial Genetics & Operons

Bacterial Genetics & Operons Bacterial Genetics & Operons The Bacterial Genome Because bacteria have simple genomes, they are used most often in molecular genetics studies Most of what we know about bacterial genetics comes from the

More information

Evidence for dynamically organized modularity in the yeast protein-protein interaction network

Evidence for dynamically organized modularity in the yeast protein-protein interaction network Evidence for dynamically organized modularity in the yeast protein-protein interaction network Sari Bombino Helsinki 27.3.2007 UNIVERSITY OF HELSINKI Department of Computer Science Seminar on Computational

More information

RNA & PROTEIN SYNTHESIS. Making Proteins Using Directions From DNA

RNA & PROTEIN SYNTHESIS. Making Proteins Using Directions From DNA RNA & PROTEIN SYNTHESIS Making Proteins Using Directions From DNA RNA & Protein Synthesis v Nitrogenous bases in DNA contain information that directs protein synthesis v DNA remains in nucleus v in order

More information

SI Materials and Methods

SI Materials and Methods SI Materials and Methods Gibbs Sampling with Informative Priors. Full description of the PhyloGibbs algorithm, including comprehensive tests on synthetic and yeast data sets, can be found in Siddharthan

More information

Gibbs Sampling Methods for Multiple Sequence Alignment

Gibbs Sampling Methods for Multiple Sequence Alignment Gibbs Sampling Methods for Multiple Sequence Alignment Scott C. Schmidler 1 Jun S. Liu 2 1 Section on Medical Informatics and 2 Department of Statistics Stanford University 11/17/99 1 Outline Statistical

More information

INTERACTIVE CLUSTERING FOR EXPLORATION OF GENOMIC DATA

INTERACTIVE CLUSTERING FOR EXPLORATION OF GENOMIC DATA INTERACTIVE CLUSTERING FOR EXPLORATION OF GENOMIC DATA XIUFENG WAN xw6@cs.msstate.edu Department of Computer Science Box 9637 JOHN A. BOYLE jab@ra.msstate.edu Department of Biochemistry and Molecular Biology

More information

Quantitative Bioinformatics

Quantitative Bioinformatics Chapter 9 Class Notes Signals in DNA 9.1. The Biological Problem: since proteins cannot read, how do they recognize nucleotides such as A, C, G, T? Although only approximate, proteins actually recognize

More information

Formative/Summative Assessments (Tests, Quizzes, reflective writing, Journals, Presentations)

Formative/Summative Assessments (Tests, Quizzes, reflective writing, Journals, Presentations) Biology Curriculum Map 2017-18 2 Weeks- Introduction to Biology: Scientific method, lab safety, organizing and analyzing data, and psuedoscience. This unit establishes the fundamental nature of scientific

More information

Chapter 12. Genes: Expression and Regulation

Chapter 12. Genes: Expression and Regulation Chapter 12 Genes: Expression and Regulation 1 DNA Transcription or RNA Synthesis produces three types of RNA trna carries amino acids during protein synthesis rrna component of ribosomes mrna directs protein

More information

Clustering & microarray technology

Clustering & microarray technology Clustering & microarray technology A large scale way to measure gene expression levels. Thanks to Kevin Wayne, Matt Hibbs, & SMD for a few of the slides 1 Why is expression important? Proteins Gene Expression

More information

CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Giri Narasimhan

CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Giri Narasimhan CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools Giri Narasimhan ECS 254; Phone: x3748 giri@cis.fiu.edu www.cis.fiu.edu/~giri/teach/bioinfs15.html Describing & Modeling Patterns

More information

Prokaryotic Gene Expression (Learning Objectives)

Prokaryotic Gene Expression (Learning Objectives) Prokaryotic Gene Expression (Learning Objectives) 1. Learn how bacteria respond to changes of metabolites in their environment: short-term and longer-term. 2. Compare and contrast transcriptional control

More information

Introduction to Biology

Introduction to Biology Introduction to Biology Course Description Introduction to Biology is an introductory course in the biological sciences. Topics included are biological macromolecules, cell biology and metabolism, DNA

More information

Gene regulation II Biochemistry 302. February 27, 2006

Gene regulation II Biochemistry 302. February 27, 2006 Gene regulation II Biochemistry 302 February 27, 2006 Molecular basis of inhibition of RNAP by Lac repressor 35 promoter site 10 promoter site CRP/DNA complex 60 Lewis, M. et al. (1996) Science 271:1247

More information

GCD3033:Cell Biology. Transcription

GCD3033:Cell Biology. Transcription Transcription Transcription: DNA to RNA A) production of complementary strand of DNA B) RNA types C) transcription start/stop signals D) Initiation of eukaryotic gene expression E) transcription factors

More information

Variable Selection in Structured High-dimensional Covariate Spaces

Variable Selection in Structured High-dimensional Covariate Spaces Variable Selection in Structured High-dimensional Covariate Spaces Fan Li 1 Nancy Zhang 2 1 Department of Health Care Policy Harvard University 2 Department of Statistics Stanford University May 14 2007

More information

Lecture 3: A basic statistical concept

Lecture 3: A basic statistical concept Lecture 3: A basic statistical concept P value In statistical hypothesis testing, the p value is the probability of obtaining a result at least as extreme as the one that was actually observed, assuming

More information

Topic 4 - #14 The Lactose Operon

Topic 4 - #14 The Lactose Operon Topic 4 - #14 The Lactose Operon The Lactose Operon The lactose operon is an operon which is responsible for the transport and metabolism of the sugar lactose in E. coli. - Lactose is one of many organic

More information

On the Monotonicity of the String Correction Factor for Words with Mismatches

On the Monotonicity of the String Correction Factor for Words with Mismatches On the Monotonicity of the String Correction Factor for Words with Mismatches (extended abstract) Alberto Apostolico Georgia Tech & Univ. of Padova Cinzia Pizzi Univ. of Padova & Univ. of Helsinki Abstract.

More information

Computational Genomics. Systems biology. Putting it together: Data integration using graphical models

Computational Genomics. Systems biology. Putting it together: Data integration using graphical models 02-710 Computational Genomics Systems biology Putting it together: Data integration using graphical models High throughput data So far in this class we discussed several different types of high throughput

More information

2015 FALL FINAL REVIEW

2015 FALL FINAL REVIEW 2015 FALL FINAL REVIEW Biomolecules & Enzymes Illustrate table and fill in parts missing 9A I can compare and contrast the structure and function of biomolecules. 9C I know the role of enzymes and how

More information

Prokaryotic Gene Expression (Learning Objectives)

Prokaryotic Gene Expression (Learning Objectives) Prokaryotic Gene Expression (Learning Objectives) 1. Learn how bacteria respond to changes of metabolites in their environment: short-term and longer-term. 2. Compare and contrast transcriptional control

More information

Data visualization and clustering: an application to gene expression data

Data visualization and clustering: an application to gene expression data Data visualization and clustering: an application to gene expression data Francesco Napolitano Università degli Studi di Salerno Dipartimento di Matematica e Informatica DAA Erice, April 2007 Thanks to

More information

Lecture 4: Transcription networks basic concepts

Lecture 4: Transcription networks basic concepts Lecture 4: Transcription networks basic concepts - Activators and repressors - Input functions; Logic input functions; Multidimensional input functions - Dynamics and response time 2.1 Introduction The

More information

Introduction to clustering methods for gene expression data analysis

Introduction to clustering methods for gene expression data analysis Introduction to clustering methods for gene expression data analysis Giorgio Valentini e-mail: valentini@dsi.unimi.it Outline Levels of analysis of DNA microarray data Clustering methods for functional

More information

Initiation of translation in eukaryotic cells:connecting the head and tail

Initiation of translation in eukaryotic cells:connecting the head and tail Initiation of translation in eukaryotic cells:connecting the head and tail GCCRCCAUGG 1: Multiple initiation factors with distinct biochemical roles (linking, tethering, recruiting, and scanning) 2: 5

More information

Lecture 18 June 2 nd, Gene Expression Regulation Mutations

Lecture 18 June 2 nd, Gene Expression Regulation Mutations Lecture 18 June 2 nd, 2016 Gene Expression Regulation Mutations From Gene to Protein Central Dogma Replication DNA RNA PROTEIN Transcription Translation RNA Viruses: genome is RNA Reverse Transcriptase

More information

Bayesian networks for reconstructing transcriptional regulatory networks

Bayesian networks for reconstructing transcriptional regulatory networks Bayesian networks for reconstructing transcriptional regulatory networks Prof Su-In Lee Computer Science & Genome Sciences University of Washington, Seattle GENOME 541 Introduction to Computational Molecular

More information

Curriculum Map. Biology, Quarter 1 Big Ideas: From Molecules to Organisms: Structures and Processes (BIO1.LS1)

Curriculum Map. Biology, Quarter 1 Big Ideas: From Molecules to Organisms: Structures and Processes (BIO1.LS1) 1 Biology, Quarter 1 Big Ideas: From Molecules to Organisms: Structures and Processes (BIO1.LS1) Focus Standards BIO1.LS1.2 Evaluate comparative models of various cell types with a focus on organic molecules

More information

4. Why not make all enzymes all the time (even if not needed)? Enzyme synthesis uses a lot of energy.

4. Why not make all enzymes all the time (even if not needed)? Enzyme synthesis uses a lot of energy. 1 C2005/F2401 '10-- Lecture 15 -- Last Edited: 11/02/10 01:58 PM Copyright 2010 Deborah Mowshowitz and Lawrence Chasin Department of Biological Sciences Columbia University New York, NY. Handouts: 15A

More information

Gene Control Mechanisms at Transcription and Translation Levels

Gene Control Mechanisms at Transcription and Translation Levels Gene Control Mechanisms at Transcription and Translation Levels Dr. M. Vijayalakshmi School of Chemical and Biotechnology SASTRA University Joint Initiative of IITs and IISc Funded by MHRD Page 1 of 9

More information

Supplementary materials Quantitative assessment of ribosome drop-off in E. coli

Supplementary materials Quantitative assessment of ribosome drop-off in E. coli Supplementary materials Quantitative assessment of ribosome drop-off in E. coli Celine Sin, Davide Chiarugi, Angelo Valleriani 1 Downstream Analysis Supplementary Figure 1: Illustration of the core steps

More information

Supplemental Material

Supplemental Material Supplemental Material Protocol S1 The PromPredict algorithm considers the average free energy over a 100nt window and compares it with the the average free energy over a downstream 100nt window separated

More information

REVIEW SESSION. Wednesday, September 15 5:30 PM SHANTZ 242 E

REVIEW SESSION. Wednesday, September 15 5:30 PM SHANTZ 242 E REVIEW SESSION Wednesday, September 15 5:30 PM SHANTZ 242 E Gene Regulation Gene Regulation Gene expression can be turned on, turned off, turned up or turned down! For example, as test time approaches,

More information

2. Cellular and Molecular Biology

2. Cellular and Molecular Biology 2. Cellular and Molecular Biology 2.1 Cell Structure 2.2 Transport Across Cell Membranes 2.3 Cellular Metabolism 2.4 DNA Replication 2.5 Cell Division 2.6 Biosynthesis 2.1 Cell Structure What is a cell?

More information