Functional Redundancy and Expression Divergence among Gene Duplicates in Yeast


by Zineng Yuan

A thesis submitted in conformity with the requirements for the degree of Master of Science
Department of Molecular Genetics
University of Toronto

Copyright by Zineng Yuan, 2010

Abstract

Zineng Yuan
Master of Science
Department of Molecular Genetics
University of Toronto
2010

My research focused on functional redundancy and expression divergence among gene duplicates, with the aim of addressing currently unsolved problems. We employed a method based on GO terms to measure functional overlap between paralogs, and established that functional similarity between duplicate genes is the key determinant of their backup capacity. We then investigated expression divergence. Recent studies suggest that only a small proportion of expression variation can be explained by transcriptional variation between paralogs. Here, the contribution of diverged TF regulation was re-examined, and differential promoter chromatin status was also found to be an important contributor to expression divergence. To understand the role of gene duplication in greater detail, a case study was performed on the yeast chaperone system, which includes many gene duplicates. Taken together, this study sheds light on the roles of redundancy and divergence in the long-term retention of gene duplicates.

Acknowledgments

I would like to first express my sincere gratitude to my supervisor, Professor Zhaolei Zhang, for his invaluable guidance and strong support throughout my M.Sc. program at the University of Toronto. I also thank my committee members, Dr. John Parkinson and Dr. Walid A. Houry. I have benefited tremendously from their outstanding vision, technical insight and practical sensibility. I deeply appreciate their strict training and precious advice in all aspects of my academic development.

I thank all my colleagues, Jingjing Li, Xiao Li, Dong Dong, Lee Zamparo and Renqiang Min, for their valuable discussions on research, and for the suggestions and friendship that have supported me throughout the past two years. I wish them all the best in pursuing their dreams in the future.

I would like to thank my friends Colin and Vestraea, who are funny guys, for their company and support. I appreciate all the comfort they gave me when I felt horrible, and all the joy they have brought to me. I feel so blessed to have them by my side, unconditionally supporting me in my efforts towards my goals.

Finally, I wish to send my gratitude to my parents. Their unreserved love and support have been the most important force sustaining my perseverance throughout the past years. Without their care and encouragement from a distance, I would not have been so focused in pursuing my studies. This thesis is dedicated to them.

Table of Contents

Abstract
Acknowledgments
List of Tables
List of Figures
List of Abbreviations
Chapter 1 Introduction
  1.1 Overview of gene duplication
    1.1.1 Background and overview
  1.2 Whole Genome Duplication (WGD) and Small Scale Duplication (SSD)
    1.2.1 Two scales of gene duplications
    1.2.2 Intrinsic difference between WGD and SSD genes
    1.2.3 Different evolutionary constraints on WGD and SSD genes
  1.3 Evolutionary fates of gene duplicates
    1.3.1 Nonfunctionalization (Pseudogenization)
    1.3.2 Subfunctionalization
    1.3.3 Neofunctionalization
    1.3.4 Expression divergence contributing to functional divergence between paralogs
  1.4 Genetic redundancy by gene duplication
  1.5 S. cerevisiae as a model organism to study gene duplicates
  1.6 Thesis rationale
Chapter 2 Functional Redundancy and Expression Divergence in Gene Duplicates
  2.1 Functional overlap in gene duplicates
    2.1.1 Introductions and Motivations
    2.1.2 Data and methods
    2.1.3 Prevalent and strong genetic backup between duplicate paralogs
    2.1.4 Partial functional overlap as a key determinant of backup capacity between paralogs
    2.1.5 Molecular basis of genetic buffering between duplicated genes
  2.2 The contribution of cis-elements to expression divergence between duplicated genes
    2.2.1 Introduction and motivations
    2.2.2 Data and Methods
    2.2.3 Transcription factors divergence explains expression divergence between paralogs
    2.2.4 Divergence in promoter chromatin structure between paralogs
    2.2.5 Better explanation of expression divergence between paralogs by the composite divergence of TF regulation and promoter chromatin status
  2.3 Gene duplicates in the chaperone system
    2.3.1 Introduction and motivations
    2.3.2 Functional divergence and redundancy in the chaperone system
    2.3.3 Functional divergence leading to functional specificity
Chapter 3 Conclusion
  3.1 Summary of research findings
  3.2 Long-term retention of gene duplicates
  3.3 Future work
Reference

List of Tables

Table 2-1 Summary of chaperones in budding yeast S. cerevisiae
Appendix Table 1 Gene duplicates in the chaperone system

List of Figures

Figure 1-1 Models of divergence in duplication paralogs
Figure 1-2 Functional dispersal of a duplicated gene
Figure 2-1 An example of calculating GO-div between gene A and B
Figure 2-2 Aggregating genetic interactions among gene duplicates
Figure 2-3 Genetic buffering between duplicates resulting from functional redundancy
Figure 2-4 Prediction of backup capacity between paralogs
Figure 2-5 Comparison of the co-cluster in both backup and non-backup pairs
Figure 2-6 A schematic figure representing the neural network applied in this study
Figure 2-7 Comparison of age distribution in this study and yeast genome background
Figure 2-8 Comparison of open in ancient and recent pairs
Figure 2-9 Divergence in nuc-sequences, open and transcription factors
Figure 2-10 Chromatin divergence accounting for expression divergence
Figure 2-11 Correlation comparison between expression divergence and TF divergence
Figure 2-12 Distinct functional enrichment of negative interactors for CPR6/CPR7
Figure 2-13 Functional study of CPR7 linking its role to transportation

List of Abbreviations

open: promoter chromatin status
Arabidopsis: Arabidopsis thaliana
BP: biological process
ChIP: chromatin immunoprecipitation
DDC: duplication-degeneration-complementation
DivTF: TF divergence
GCA: growth curve analysis
GO: gene ontology
GO-div: gene ontology divergence
KDE: kernel density estimation
Ka: non-synonymous substitution rate
Ks: synonymous substitution rate
K. waltii: Kluyveromyces waltii (also known as Lachancea waltii)
MIPS: Munich information center for protein sequences database
MSE: mean square error
PCC: Pearson correlation coefficient
PNDR: promoter nucleosome-depleted region
RSA: random spore analysis
S. cerevisiae: Saccharomyces cerevisiae
SGA: synthetic genetic array
SGD: Saccharomyces Genome Database
SSD: small-scale duplication
SVM: support vector machine
TF: transcription factor
WGD: whole-genome duplication

Chapter 1 Introduction

1.1 Overview of gene duplication

1.1.1 Background and overview

Gene duplication has long been considered a major driver of genetic novelty. Forty years ago, in his classic work Evolution by Gene Duplication, Susumu Ohno hypothesized that natural selection merely modifies, while redundancy creates (Ohno, 1970). Gene duplication is critical to the development of novel cellular functionality because one copy is free to evolve novel functions while the other retains the ancestral function (Lynch and Conery, 2000a). In recent decades, extensive studies have been carried out to clarify the evolutionary fate of duplicated genes.

In this introduction, I review previous work on gene duplication. I first introduce the two classes of gene duplication in yeast, namely whole-genome duplication (WGD) and small-scale duplication (SSD), and highlight their major differences. I then focus on the evolutionary fates of gene duplicates. Finally, I outline the thesis rationale and underscore its potential contribution to addressing currently unsolved problems.

1.2 Whole Genome Duplication (WGD) and Small Scale Duplication (SSD)

1.2.1 Two scales of gene duplications

Gene duplication can occur on two distinct scales. One is Whole Genome Duplication (WGD), in which the chromosome set is doubled, with the chromosomes derived either from a single species (autopolyploidy) or from different species (allopolyploidy) (Chen, 2007). The other is Small Scale Duplication (SSD), which occurs locally and involves a single gene or a group of genes within a chromosomal segment.

Shortly after the genome sequence of S. cerevisiae was completed, Wolfe and colleagues proposed that this species had undergone WGD, on the basis of large, non-overlapping blocks in the genome sequence (Wolfe and Shields 1997). They further provided criteria that a yeast WGD should satisfy, including conserved gene order in non-overlapping, paired chromosomal blocks with an approximately 2:1 orthologous relationship to an outgroup species (Skrabanek and Wolfe, 1998). Kellis and colleagues provided the strongest evidence confirming the genome-scale duplication (Kellis et al, 2004). They compared the whole genome sequence of S. cerevisiae with that of a related species, K. waltii, and found that a large number of aligned blocks exist in two copies in S. cerevisiae.

In addition to budding yeast, a number of other eukaryotes have been found to have undergone genome-wide duplications. For example, the Arabidopsis genome has duplicated three times (Arabidopsis Genome Initiative, 2000), and the ancestor of extant vertebrates underwent two rounds of WGD (Putnam et al, 2008). Given the prevalence of WGD events in many species, researchers have studied their contribution to the genetic, functional, and phenotypic diversity of the host organisms. WGD increases the number of regulators, which could facilitate the evolution of a more complex regulatory system (Freeling and Thomas, 2006). As Maere and colleagues pointed out, WGD could explain more than 90% of the increase in regulatory genes in Arabidopsis (Maere et al, 2005). In vertebrates, WGD has been found to contribute to the expansion of the homeobox (HOX) genes, insulin receptors, and nuclear receptors (Maere et al, 2005). It has also been pointed out that WGD could increase species diversity, since the loss of different sister genes after duplication in separated populations might give rise to reproductive isolation, yielding new species (Lynch and Force, 2000b).

In comparison with WGD, Small Scale Duplication (SSD) occurs locally and involves individual genes within a chromosomal segment. It results from unequal crossing over during meiosis, in which chromosomes exchange segments of nucleotide sequence unequally. As a result, zygosity for genes varies in the exchanged regions of the daughter cells.

Since WGD and SSD genes have different origins, they might be subject to distinct evolutionary constraints (DeLuna et al, 2008; Guan et al, 2007; Hakes et al, 2007). It is therefore worthwhile to first explore the differences between WGD and SSD genes. In the following section I elaborate on the differences between these two evolutionary events.

1.2.2 Intrinsic difference between WGD and SSD genes

Several groups have compared WGD and SSD genes (Davis and Petrov, 2005; Guan et al, 2007). Davis and Petrov found that WGD and SSD genes are enriched for different functional categories (Davis and Petrov, 2005). Guan et al. found that WGD gene pairs have higher functional similarity than SSD gene pairs after applying a Bayesian data integration method to quantify functional associations (Guan et al, 2007). Later, similar conclusions were drawn by comparing shared physical interactions and by gauging Gene Ontology (GO) annotations (Hakes et al, 2007). Guan et al. also found that WGD pairs show more divergence in both regulatory regions and expression patterns (Guan et al, 2007).

In addition to these differences, the half-life of WGD and SSD genes, i.e., the time required for the number of duplicates to be reduced to half of its initial value, is also distinct. The half-life of SSD-derived genes was estimated to be about 4 million years, whereas the estimated half-life of WGD-derived genes is about 33 million years in S. cerevisiae (Lynch et al, 2000a). Taken together, these observations suggest that WGD and SSD genes are subject to different evolutionary pressures.

The above conclusions about the difference between WGD and SSD genes also reconciled some previously inconsistent results, in which conclusions regarding duplicated genes were drawn without distinguishing between the two categories (Guan et al, 2007). For example, studying a small dataset of WGD pairs, Wagner pointed out that there is no correlation between the fitness of single-gene knockouts and the sequence similarity of duplicated pairs (Wagner, 2000). However, a positive correlation was found by Gu et al in a large gene pool including both SSD and WGD pairs (Gu et al, 2003). Likewise, Wagner found no coupling between sequence divergence and expression divergence, whereas two other groups found a positive correlation using both WGD and SSD pairs (Gu et al, 2005; Gu et al, 2002; Zhang et al, 2004). These disputes were resolved by treating WGD and SSD genes separately (Guan et al, 2007).
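To give a feel for what the two half-life estimates quoted in this section imply, the following minimal Python sketch computes the fraction of duplicates expected to remain after a given time. It assumes simple exponential decay, which is an illustrative assumption; the function name `surviving_fraction` and the 100-million-year horizon are hypothetical choices and are not taken from the cited studies.

```python
def surviving_fraction(elapsed_my: float, half_life_my: float) -> float:
    """Fraction of duplicates remaining after `elapsed_my` million years,
    assuming exponential decay with the given half-life (an assumption)."""
    if half_life_my <= 0:
        raise ValueError("half-life must be positive")
    return 0.5 ** (elapsed_my / half_life_my)


# Half-life estimates quoted in the text, in million years.
SSD_HALF_LIFE_MY = 4.0
WGD_HALF_LIFE_MY = 33.0

if __name__ == "__main__":
    horizon_my = 100.0  # hypothetical time span, chosen only for illustration
    for label, half_life in (("SSD", SSD_HALF_LIFE_MY), ("WGD", WGD_HALF_LIFE_MY)):
        fraction = surviving_fraction(horizon_my, half_life)
        print(f"{label}: fraction remaining after {horizon_my:g} My = {fraction:.3g}")
```

Under this assumption, the longer half-life means that a much larger share of WGD-derived duplicates than of SSD-derived duplicates would still be present after the same span of time.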

1.2.3 Different evolutionary constraints on WGD and SSD genes

Because WGD and SSD genes are under different constraints, the mechanisms underlying their intrinsic differences have been examined. Dosage effect, especially dosage balance, is regarded as playing an important role in determining the subsequent selective pressure on gene duplicates (Papp et al, 2003a). Consider a protein complex comprising two subunits, A and B: excessive abundance of either member will often break the equilibrium and lower fitness. For example, extra copies of A might compete with other AB-binding regulatory subunits, thereby interfering with AB's normal function. Alternatively, extra A might form non-functional homodimers rather than functional AB heterodimers. Therefore, the stoichiometry between components of protein complexes or pathways must be maintained to avoid potential dosage disruption (Papp et al, 2003a).

This idea could largely explain the different evolutionary pressures on WGD and SSD genes. The entire protein complex is duplicated in WGD but not in SSD. WGD genes are therefore more likely to be retained in protein complexes, as a synchronous increase in dosage brings minimal harm to the overall dosage equilibrium. In contrast, an SSD-derived gene product may disrupt the stoichiometry between components when it is part of a protein complex, and strong purifying selection will then rapidly eliminate the extra copy (Papp et al, 2003a). The dosage balance hypothesis also explains the observation that SSD genes are rarely retained in the ribosome, where dosage balance is of particular importance (Papp et al, 2003a).

In addition to dosage balance, an alternative explanation for the long-term retention of WGD genes is that duplication of an entire protein complex has a greater chance of bringing immediate benefits. If a certain complex is dosage sensitive and the increased dosage is beneficial, selection will operate on its members immediately after duplication, and the whole complex might be retained (Aury et al, 2006). Taken together, dosage effect is a crucial factor leading to distinct evolutionary constraints on WGD genes and SSD genes.
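The dosage-balance argument above can be illustrated with a toy calculation. The sketch below assumes a 1:1 heterodimer AB and counts any subunit left without a partner as unpaired excess. The function name, the copy numbers, and the idea of scoring imbalance by unpaired subunits are illustrative assumptions, not part of the cited model.

```python
def unpaired_excess(a_copies: int, b_copies: int) -> int:
    """Number of subunits left without a partner when A and B pair 1:1."""
    if a_copies < 0 or b_copies < 0:
        raise ValueError("copy numbers must be non-negative")
    return abs(a_copies - b_copies)


# Hypothetical gene copy numbers for subunits (A, B).
ANCESTRAL = (1, 1)   # one gene each for A and B
AFTER_WGD = (2, 2)   # the whole complex is duplicated together
AFTER_SSD = (2, 1)   # only the gene for A is duplicated

if __name__ == "__main__":
    for label, (a, b) in (("ancestral", ANCESTRAL),
                          ("after WGD", AFTER_WGD),
                          ("after SSD", AFTER_SSD)):
        print(f"{label}: A={a}, B={b}, unpaired excess={unpaired_excess(a, b)}")
```

In this toy setting the WGD case keeps the A:B ratio unchanged, whereas the SSD case leaves one copy of A without a partner, which is the kind of imbalance the hypothesis expects purifying selection to act against.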

1.3 Evolutionary fates of gene duplicates

Different evolutionary models have been proposed to explain the evolutionary fates of duplicated genes. In neofunctionalization, one copy of a duplicate pair evolves novel functions, while the other copy retains the progenitor's function (Kellis et al, 2004; Lynch et al, 2000a; Wolfe and Li, 2003). Nevertheless, not all paralogs can gain novel functions, since the acquisition of beneficial mutations is uncommon (Takuno and Innan, 2009). An alternative model, subfunctionalization, argues that gene duplicates evolve merely by partitioning their ancestor's function rather than by creating functional novelty. This partition model has been observed in multifunctional genes, where the daughter copies divide the progenitor's functions immediately after duplication (Force et al, 1999). Here, I categorize and summarize the well-established models of the evolutionary fate of gene duplicates.

1.3.1 Nonfunctionalization (Pseudogenization)

Nonfunctionalization, in which one of the two copies becomes silenced, is the most likely fate of duplicates. For example, in S. cerevisiae, only one copy has been retained for ~90% of WGD pairs since the ancient WGD event (Kellis et al, 2004). At the time of duplication, the two copies are completely identical. This state is not stable because the two paralogs are functionally redundant, which allows the accumulation of degenerative (loss-of-function) mutations. In many cases, one of the two copies finally becomes a pseudogene. By exploring the genomic data of several eukaryotes, Lynch and colleagues proposed and confirmed the rapid loss of gene duplicates, and also found that the number of remaining pairs can be fitted by the survivorship function in (1.1):

$N_S = N_0 e^{-dS}$    (1.1)

$N_S$ is the number of duplicates observed at divergence level $S$.
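Taking logarithms of (1.1) gives ln N_S = ln N_0 - dS, a straight line in S, so the two constants can be estimated by ordinary least squares on the log-transformed counts. The following is a minimal sketch under that reading; the function name `fit_survivorship` and the synthetic data are hypothetical and are not taken from the analysis of Lynch and colleagues.

```python
import math


def fit_survivorship(s_values, counts):
    """Fit N_S = N_0 * exp(-d * S) by least squares on ln(N_S).

    Returns the pair (n0, d). All counts must be positive so that the
    logarithm is defined.
    """
    if len(s_values) != len(counts) or len(s_values) < 2:
        raise ValueError("need at least two (S, N_S) points of equal length")
    logs = [math.log(n) for n in counts]
    n_points = len(s_values)
    mean_s = sum(s_values) / n_points
    mean_log = sum(logs) / n_points
    sxx = sum((s - mean_s) ** 2 for s in s_values)
    if sxx == 0:
        raise ValueError("S values must not all be identical")
    sxy = sum((s - mean_s) * (y - mean_log) for s, y in zip(s_values, logs))
    slope = sxy / sxx                      # slope of ln(N_S) against S is -d
    intercept = mean_log - slope * mean_s  # intercept is ln(N_0)
    return math.exp(intercept), -slope


if __name__ == "__main__":
    # Synthetic, noise-free data generated with hypothetical N_0 = 500, d = 7.
    s_bins = [0.05, 0.10, 0.15, 0.20, 0.25]
    counts = [500 * math.exp(-7 * s) for s in s_bins]
    n0, d = fit_survivorship(s_bins, counts)
    print(f"estimated N_0 = {n0:.1f}, estimated d = {d:.2f}")
```

With real, noisy counts the estimates would of course not recover the generating values exactly; the synthetic input is used here only to show that the log-linear fit inverts the functional form.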
$N_0$ and $d$ are constants, which are fitted by linear regression of the log-transformed data (Lynch et al, 2000a).

1.3.2 Subfunctionalization

Though the nonfunctionalization model could explain the fate of most gene duplicates, researchers still observe many extant gene duplicates, especially those following the ancient polyploidy event. Several models have been introduced (Figure 1-1) to explain the long-term preservation of gene duplicates. In the subfunctionalization model, gene duplicates are preserved through a process in which the ancestor's multiple functions are divided between its daughter copies. Specifically, Force et al. introduced the duplication-degeneration-complementation (DDC) model (Force et al, 1999). A gene accumulates degenerative mutations, which are compensated for by its paralogous copy. As a result, through subfunctionalization, paralogous copies together fulfill the function of their ancestor by taking on complementary roles.

1.3.3 Neofunctionalization

Though the DDC model could explain the retention of duplicate genes, recent studies suggest that it is inadequate to explain all preserved gene duplicates (He and Zhang, 2005; Li et al, 2005). For example, the DDC model makes two predictions when applied to cis-element motifs: 1) the total number of cis-element motifs in duplicated pairs should decrease with time owing to degeneration; 2) genes with more paralogs should tend to have fewer regulatory motifs, because cis-element motifs are dispersed over multiple rounds of duplication. Surprisingly, Papp and colleagues found that, in contrast to these predictions, the total number of motifs in duplicated pairs remains constant across evolutionary time (calibrated by Ks), and genes with numerous paralogs do not have particularly low numbers of regulatory motifs (Papp et al, 2003b). Therefore, in order to maintain the number of motifs, new motifs must emerge in gene duplicates. Such a process requires the acquisition of beneficial mutations conferring new functions (neofunctionalization) (Papp et al, 2003b).

After examining genome-wide protein-protein interaction data in budding yeast and comparing the interaction partners of paralogs, He and Zhang showed that the DDC model is inadequate to explain the constant number of shared partners over time and confirmed the existence of neofunctionalization. As a result, they proposed a new theory, predicting that a large number of gene duplicates have passed through rapid subfunctionalization followed by prolonged and substantial neofunctionalization (He et al, 2005). In a recent study, Hittinger and Carroll provided experimental evidence in support of this hypothesis (Hittinger and Carroll, 2007). The bi-functional ancestral gene GAL1, which did not experience duplication, is still present in some yeast species. In Saccharomyces cerevisiae, however, the ancestral function of GAL1 was split and is carried out by GAL1 and GAL3 as galactokinase and co-inducer respectively, indicating subfunctionalization. Furthermore, adaptive evolution was observed in one of these sister paralogs, GAL1, indicating neofunctionalization (Hittinger et al, 2007).

1.3.4 Expression divergence contributing to functional divergence between paralogs

Extant gene duplicates diverge in function by either subfunctionalization or neofunctionalization, and the underlying mechanisms have been studied by researchers. Ohno proposed that expression divergence is an important step in the functional divergence between paralogs (Ohno, 1970). It is known that expression divergence following gene duplication can result in expression specialization in particular tissues or developmental processes, which is a sign of evolving adaptive functions (Huminiecki and Wolfe, 2004). For example, in humans and apes, GLUD2, a glutamate dehydrogenase, shows strong evidence of adaptive evolution and differs from the ancestral form GLUD1 (Plaitakis et al, 2003). This process seems to have been initiated by changes in its expression pattern. Consequently, GLUD2 has specific changes in allosteric sensitivity and seems better adapted to its new location, neurons (Plaitakis et al, 2003). Moreover, Huminiecki and Wolfe found that recent lineage-specific duplicates increase the expression divergence between human and mouse in orthologous tissues (Huminiecki et al, 2004). They also found that a specialized expression pattern is a general trend stemming from gene duplication, leading to functional specificity (Huminiecki et al, 2004).

Figure 1-1 Evolutionary models following a gene duplication event

Evolutionary models following a gene duplication event are illustrated by instances of random mutations in cis-regulatory motifs. The small coloured boxes represent functional regulatory elements, while the white boxes denote non-functional elements. The large black boxes denote the transcribed regions. In the first two steps after duplication, one of the copies harbours a null mutation in its regulatory region. On the left, one copy acquires null mutations in each element and eventually becomes a pseudogene (denoted by the white boxes). The central part depicts the neofunctionalization model, in which one copy acquires a beneficial new motif. The right shows the subfunctionalization model, in which the two copies function complementarily to perform the ancestral functions.

1.4 Genetic redundancy by gene duplication

Extant gene duplicates have to diverge through subfunctionalization or neofunctionalization because complete redundancy is not favoured by evolution (Kitano, 2004). However, functional overlap does exist between extant paralogs (Musso et al, 2007; Wagner, 2000). It is therefore intriguing to investigate the contribution of these redundant copies to the fitness of the host organism. Wagner first introduced the idea that widespread gene duplication could be responsible for robustness against genetic perturbations, but he was not able to offer convincing evidence owing to the limited data available at the time (Wagner, 2000).

The idea of backup by gene duplicates has since been supported by a number of observations of backup circuits in real networks (Kafri et al, 2005), in which one gene changes its expression profile to compensate for its lost paralog. Such expression reprogramming has been verified for the two isoenzymes Acs1 and Acs2. Despite their dissimilar expression under normal conditions, Acs1 acquires an Acs2-like response to glucose upon the deletion of Acs2 (Van den Berg, 1996). A similar case involves NHP6A and NHP6B, in which deletion of NHP6A gives rise to a three-fold increase in NHP6B synthesis (Kolodrubetz et al, 2001). Recently, using high-throughput flow cytometry, DeLuna et al performed a genome-scale study of paralog responsiveness (DeLuna et al, 2010). By comparing protein abundance in wild-type strains with that in paralog-knockout strains, they found that paralog responsiveness is need-based and appears only when the gene function is required (DeLuna et al, 2010).

In addition, an aggravating interaction in the Synthetic Genetic Array (SGA) experiment (Tong et al, 2001) indicates genetic backup; a clear prediction of genetic backup is therefore that buffering genes should exhibit strong aggravating interactions with their paralogs. Recent studies reported a high prevalence of genetic buffering between duplicates in subsets of yeast gene duplicates (Ihmels et al, 2007; Musso et al, 2008), which confirms that gene duplicates are indeed responsible for genetic robustness. Moreover, maintaining these duplicates could confer robustness under other conditions: duplicated pairs show condition-dependent aggravating interactions or responsiveness, which differ considerably across conditions (DeLuna et al, 2010; Musso et al, 2008).

However, it is worth mentioning that, rather than conferring robustness, the maintenance of redundant paralogs is also open to alternative explanations. It might be attributable merely to dosage effect, that is, to keeping gene dosage in balance (Papp et al, 2003a).

1.5 S. cerevisiae as a model organism to study gene duplicates

Several reasons make the budding yeast S. cerevisiae a valuable model organism for scientific research. First, studying budding yeast can provide meaningful insights into our own genome, as this tiny organism contains orthologs of many human genes. Second, the genome of budding yeast can be manipulated easily, owing to the ease of cell culture and the compactness of the genome. In the past few years, many large-scale experiments have been undertaken in S. cerevisiae, covering almost every aspect of functional categorization. For example, S. cerevisiae was the first eukaryote to be sequenced (Goffeau et al, 1996), and extensive microarray experiments have been performed under multiple conditions to obtain mRNA expression profiles (Hughes et al, 2000; Spellman et al, 1998). The fitness contribution of each gene in budding yeast was determined through a comprehensive analysis of gene-deletion phenotypes (Giaever et al, 2002). A global analysis of protein localization in budding yeast is also available from a large-scale fluorescence labelling study (Huh et al, 2003). S. cerevisiae was the first eukaryote to be studied in large-scale protein interaction screens by Yeast-2-Hybrid (Y2H) (Gavin et al, 2002) and by Tandem Affinity Purification followed by Mass Spectrometry (TAP-MS) (Krogan et al, 2006). The high-throughput Synthetic Genetic Array (SGA) experiment was also developed to detect synthetic lethal or synthetic sick interactions between gene pairs, which provide information about their functions (Costanzo et al, 2010; Tong et al, 2001).
Gene regulation through transcription factor (TF) binding at regulatory elements was studied by ChIP-chip experiments (Harbison et al, 2004; Lee et al, 2002). Moreover, ChIP-chip and ChIP-Seq experiments on nucleosome occupancy and nucleosome dynamics have been carried out (Lee et al, 2007; Shivaswamy et al, 2008), since the chromatin structure of the promoter also regulates gene expression by undergoing a remodelling process prior to TF binding (Jiang and Pugh, 2009). In this process, nucleosomes are removed to leave regulatory regions physically accessible to regulatory factors (Jiang et al, 2009). These high-throughput experiments, together with studies in traditional biochemistry, have resulted in comprehensive functional annotations. Most of them have been collected and categorized in the publicly available Saccharomyces Genome Database (SGD). Several datasets are of specific relevance to my research.

1.6 Thesis rationale

Long-term retention of paralogs is the key issue in the study of gene duplication. Given sufficient evolutionary time, the states of extant paralogs ought to have stabilized. Such states might embrace both functional redundancy and functional divergence (see Figure 1-2). Functional redundancy contributes to genetic robustness, whereas functional differentiation produces genetic novelty and complexity. The study of functional redundancy and divergence between extant paralogs is therefore a prerequisite for understanding the long-term preservation of gene duplicates. However, many questions still lack definitive answers: (1) What are the underlying determinants of genetic buffering between gene duplicates? (2) Although it has long been postulated that expression evolution is an important step in the functional differentiation between paralogs, what are the underlying determinants of expression divergence?

This study addresses the above questions on the basis of a large body of extant gene duplicates. Redundancy resulting from gene duplication is investigated using a comprehensive dataset that includes both WGD- and SSD-derived genes. We established that functional similarity between duplicate genes, measured by Gene Ontology (GO) terms, is a key determinant of their backup capacity and is highly predictive of it. This study next investigated mechanisms that lead to expression divergence.
Here, transcription factor (TF) divergence is re-evaluated using a more comprehensive dataset than in previous studies, and we demonstrate that differential TF regulation plays a more important role in the expression divergence of paralogs than previously appreciated. Moreover, the role of chromatin structure in determining expression evolution between paralogs is clarified and highlighted. In the last part of this thesis, gene duplicates are studied in the chaperone system, an essential quality-control system in S. cerevisiae with many gene duplicates. Such a case study in a familiar biological system serves to examine in depth the functional association between duplicated pairs, how extensive functional dispersal is, and what role it has played in the long-term retention of gene duplicates. We note that duplicates in the chaperone system are not merely redundant; instead, they are divergent in their functions, and such divergence might have led to their preservation.

Figure 1-2 Functional dispersal of a duplicated gene

The schematic figure shows functional divergence and redundancy over long spans of evolutionary time. Functions are symbolized by the areas of rectangles. $F_{overlap}$ is the functional overlap between the two genes, while $F_x$ and $F_x'$ represent the divergent functions of the two genes respectively. The total function $F_{total}$ is the sum of $F_x$, $F_x'$ and $F_{overlap}$. At the time of gene duplication, the two copies are functionally identical. After a long evolutionary time, paralogs may diverge while retaining partial functional overlap, as indicated in dark red.

21 Chapter 2 Functional Redundancy and Expression Divergence in Gene Duplicates 2.1 Functional overlap in gene duplicates Introductions and Motivations It has been long hypothesized that a duplicated copy provided by gene duplication could buffer perturbations on its progenitor copy (Wagner, 2000). However, controversy remains. On one hand, duplicated genes do show markedly elevated dispensability than singleton genes through the single deletion profile study, which has been speculated to result from mutual compensation between paralogs (Gu et al, 2003); on the other hand, He and Zhang proposed that less important genes are more likely to duplicate through the comparison between different species (He and Zhang, 2006). Therefore, the observed elevated dispensability of duplicates by Gu et al might merely result from the intrinsically higher duplicability of these less important genes rather than from the compensation between paralogs. Therefore, in order to reconcile the controversy, a systematic interrogation of genetic interaction data is an effective way to determine the extent to which yeast paralogs could buffer each other. Based on recent studies, high prevalence of mutual genetic buffering by duplicates was consistently observed on a small subset of yeast paralogs, suggesting that paralogous copies do serve to backup each other (Dean et al, 2008; DeLuna et al, 2008; Ihmels et al, 2007; Musso et al, 2008). However, when determining the characteristic of buffering paralogs, researchers found little functional similarity between paralogs, leading to the hypothesis buffering without redundancy (Ihmels et al, 2007). Similarly, Musso wrote Epistatic paralog pairs could not generally be shown to have more shared functional overlap (as gauged by physical interactions) than comparable non-epistatic paralogs and functional compensation can not necessarily be predicted based on the conservation of duplicated pairs; direct assay of function is required (Musso, 2010). 
Using the largest SGA dataset, with a total of ~5.4 million gene pairs screened (Costanzo et al, 2010), we examined the genetic buffering between duplicate pairs.

2.1.2 Data and methods

Compiling gene duplicates
To examine duplicated genes, we employed the dataset from Guan and colleagues (Guan et al, 2007). In this dataset, gene pairs with sequence similarity of at least 20% were identified as paralogs on the basis of reciprocal best matches. WGD pairs were further identified following Kellis et al. (Kellis et al, 2004). We excluded ribosome-related proteins from our analysis because they tend to display disproportionately high levels of conservation (Papp et al, 2003a). Of 374 WGD pairs and 483 SSD pairs, only 266 WGD pairs and 228 SSD pairs were present in the dataset from Costanzo et al. (Costanzo et al, 2010). The scoring scheme for the SGA experiments is described in the original paper. A significant negative interaction (ε<0 and p<0.05) between a gene pair is defined as genetic buffering.

Compiling protein complexes
Protein complexes were curated by merging annotations from the Saccharomyces Genome Database (SGD), the Gene Ontology (GO) and the Munich Information Center for Protein Sequences (MIPS).

Synonymous and non-synonymous substitution rates per nucleotide
The non-synonymous (Ka) and synonymous (Ks) substitution rates are useful and straightforward metrics for measuring sequence variation between homologs. A non-synonymous substitution is a nucleotide substitution that changes the encoded amino acid, while a synonymous substitution does not cause an amino acid replacement. The coding sequences of the above pairs were obtained from the Ensembl database, and Ka and Ks were calculated between duplicated gene pairs using the PAML package (Yang, 1997).

Measurement of functional associations
To measure functional association, we used the method of Guo et al (Guo et al, 2006), which adopts the concept of information content. Each term of Biological Process (BP) in GO represents a corpus and each gene is annotated within this corpus.
Functional similarity is defined by semantic similarity as follows. Suppose there are two genes G1 and G2, where G1 is annotated with M terms and G2 is annotated with N terms. The semantic similarity between any two terms m and n, where m is one of the M terms of G1 and n is one of the N terms of G2, is derived as shown in Equation (2.1) (Guo et al, 2006).

$$T(m,n) = \frac{2 \ln\left(\min_{x \in S(m,n)} \{p(x)\}\right)}{\ln p(m) + \ln p(n)} \qquad (2.1)$$

where S(m,n) is the set of parent terms shared by m and n, and p(x) is the frequency with which term x or any of its child terms occurs. The numerator gives the information content of the most specific parent term(s) shared by m and n, and the denominator is a normalization constant that scales the score between zero and one. Thus, if two terms are both specific (in the bottom layers of the GO tree) and their common ancestor term is also very specific, the two terms receive a high score T, indicating high semantic similarity. For a pair of paralogs, the score was calculated for all possible combinations of their GO terms, and the maximal score, that of the best matched GO-term pair, was taken as the functional similarity (see an example in Figure 2-1). As a result, as long as two genes share some very specific functions, they are assigned a high score regardless of their divergence in other functions. In other words, this method has the capacity to capture partial functional overlap, which is highly relevant to this study.
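As an illustration of Equation (2.1) and of the GO-div score derived from it, the following is a minimal sketch rather than the implementation used in this study. The toy ontology, the term names and the frequencies are hypothetical, and the sketch assumes that the set of shared parent terms S(m,n) includes the terms themselves, so that identical terms score one.

```python
import math

def term_similarity(m, n, ancestors, freq):
    """Equation (2.1) for two GO terms m and n.

    ancestors[t]: set of ancestor terms of t, assumed to include t itself.
    freq[t]: p(t), the frequency of t or any of its child terms.
    """
    shared = ancestors[m] & ancestors[n]
    if not shared:
        return 0.0
    denom = math.log(freq[m]) + math.log(freq[n])
    if denom == 0.0:  # both terms are the root (p = 1), no information
        return 0.0
    p_min = min(freq[x] for x in shared)  # most specific shared parent
    return 2.0 * math.log(p_min) / denom

def go_div(terms_a, terms_b, ancestors, freq):
    """One minus the best term-pair similarity between two genes."""
    best = max(term_similarity(m, n, ancestors, freq)
               for m in terms_a for n in terms_b)
    return 1.0 - best

# Hypothetical toy ontology: root -> A -> {B, C}, root -> D
ancestors = {
    "root": {"root"},
    "A": {"A", "root"},
    "B": {"B", "A", "root"},
    "C": {"C", "A", "root"},
    "D": {"D", "root"},
}
freq = {"root": 1.0, "A": 0.5, "B": 0.1, "C": 0.1, "D": 0.2}
```

In this toy example, a gene annotated with B and D and a gene annotated with C share only the parent A as their most specific common term, so the best matched pair (B, C) determines GO-div and the divergent annotation D has no effect on the score.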

Figure 2-1 An example of calculating GO-div between gene A and B
(A) A table shows all combinations of GO terms for a gene pair A and B. For each combination of GO terms, we first calculate functional similarity using equation (2.1) and then assign one minus the best score as GO-div. (B) An illustration of how equation (2.1) works. For GO terms m and n in the GO tree, the red node indicates the most specific common ancestral term.

2.1.3 Prevalent and strong genetic backup between duplicate paralogs

Among the assayed duplicate pairs, we found that 39.5% (105/266) of the WGD paralogs have significant aggravating interactions, compared with 18.4% (42/228) of the SSD paralogs. The percentage of backup pairs for WGD is comparable to what was previously reported (~35%) (Musso et al, 2008). We designed two control sets to examine whether duplicate pairs have excess backup capacity. First, random gene pairs with genetic interaction data were chosen, and we found that only 7% of them have aggravating interactions (Figure 2-2). Second, we took all the duplicated genes and randomly grouped them into pairs, and found that only 6.6% of these random pairs have aggravating interactions; this ruled out the possibility that duplicate genes intrinsically have more aggravating genetic interactions. Thus, the analysis established that duplicates indeed have excess backup capacity. We also studied the backup strength between paralogs. Compared with both control sets, the interaction strength between duplicate pairs is much stronger with average scores of and for WGD and SSD, respectively, in sharp contrast to and for the two random controls, respectively (P= for WGD, P= for SSD and P=0.06 between WGD and SSD, Wilcoxon rank-sum test). Notably, these findings are in agreement with what was previously reported (Dean et al, 2008; DeLuna et al, 2008; Ihmels et al, 2007; Musso et al, 2008). Taken together, our analysis established that strong genetic buffering capacity is prevalent between both WGD and SSD paralogs, which provides enhanced genetic robustness in yeast cells.

Figure 2-2 Aggravating genetic interactions among gene duplicates
Left: Percentage of aggravating interactions among SSD pairs, WGD pairs and random simulations. In each simulation, 1000 pairs were randomly chosen as a group and the percentage of pairs in the group with a negative genetic interaction was determined. This was repeated 1000 times, and the distribution of the percentages was estimated from the 1000 randomized controls. Right: Buffering strength between duplicates is stronger than between randomly paired genes. The x-axis represents the strength of the aggravating interaction and the y-axis denotes the cumulative density from zero to one. WGD pairs have much stronger backup strength than SSD pairs.
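The randomized control described in this caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in this study: the function name, the representation of buffering pairs as a set of `frozenset` gene pairs, and sampling the two genes of each pair uniformly from a gene list are all hypothetical choices.

```python
import random

def random_pair_control(genes, buffering_pairs, n_pairs=1000, n_rep=1000, seed=0):
    """Fraction of randomly drawn gene pairs that show a negative interaction.

    genes: list of gene identifiers to draw from.
    buffering_pairs: set of frozenset({gene_a, gene_b}) with a significant
        negative interaction (hypothetical input format).
    Returns one fraction per repetition, giving an empirical distribution.
    """
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_rep):
        hits = 0
        for _ in range(n_pairs):
            a, b = rng.sample(genes, 2)  # two distinct genes
            if frozenset((a, b)) in buffering_pairs:
                hits += 1
        fractions.append(hits / n_pairs)
    return fractions
```

The observed percentages for WGD and SSD pairs can then be compared against the spread of the returned fractions.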

2.1.4 Partial functional overlap as a key determinant of backup capacity between paralogs

Intuitively, the ability of paralogous genes to back each other up should be correlated with their functional similarity. However, based on small datasets, previous work suggested that functional redundancy between buffering duplicates is minimal and no greater than that between paralogs without backup capacity (Ihmels et al, 2007; Musso, 2010). We noted that in these early studies, functional similarity between paralogs was calculated indirectly from divergence in expression profiles, protein interactions, or genetic interaction profiles (Ihmels et al, 2007; Musso et al, 2008). However, paralogs might buffer each other as long as they keep a minimal functional overlap (see Figure 1-2). Therefore, interaction profiles are likely an inappropriate measure of partial functional overlap. To uncover partial functional overlap, we employed GO-div to gauge functional overlap between paralogs directly from their respective GO annotations (Guo et al, 2006). Conceptually, GO-div measures the semantic similarity between the sets of GO annotations associated with a pair of genes (Guo et al, 2006). GO-div is calculated from the similarity between the best matched GO terms of the two paralogs (see methods in section 2.1.2). A higher GO-div indicates less functional overlap between paralogs, while a lower GO-div indicates that the paralogs share at least some very specific functions even though they may have diverged in other functions. To increase the reliability of our analysis, electronic annotations (evidence code IEA) were removed because they are assigned without manual examination. Complementary to GO-div (Li, 1997), the non-synonymous substitution rate per site (Ka) was also calculated to indicate coding sequence evolution between paralogs.
Among the 494 duplicate pairs, we found compelling evidence against the previous statement that backup between paralogs does not require functional redundancy (Ihmels et al, 2007). We found that substantial functional overlap between paralogs (for both WGD and SSD duplicates) is a key determinant of their genetic backup capacity. First, as revealed by Figure 2-3 (A, B), duplicate pairs (either WGD or SSD) are more likely to buffer each other if their functions are less diverged; this trend holds whether functional divergence is estimated by the direct measure (GO-div) or by the indirect one (Ka). Second, for the buffering pairs in both WGD and SSD, buffering strength between the paralogs is significantly correlated with their functional divergence (Figure 2-3 C and D) scored by GO-div, with Pearson's R=0.34, P= for WGD pairs and R=0.37, P=0.01 for SSD pairs. The correlation is also significant when Ka is used to approximate functional divergence between paralogs in both WGD and SSD, with Pearson's R=0.41, P= for WGD pairs and R=0.33, P=0.03 for SSD pairs. Expression divergence between duplicates is significantly correlated with their buffering strength for SSD paralogs (R=0.33, P=0.03), but not for WGD pairs, consistent with previous work showing little difference in expression divergence between backup and non-backup WGD pairs (Musso et al, 2008).

Since buffering pairs tend to be more functionally similar, it is natural to ask whether the buffering potential of any paralog pair can be accurately predicted from its functional similarity. To test the predictability of backup capacity between paralogs, we pooled the WGD and SSD duplicates and labeled the 147 backup pairs and the remaining non-backup pairs as positive and negative samples, respectively. We characterized each pair with a feature vector, each element being a metric scoring functional divergence: Ka, sequence identity, expression divergence and GO-div. A support vector machine (SVM) was then trained to classify these paralogs as either backup or non-backup pairs. With 3-fold cross-validation, as demonstrated in Figure 2-4, functional similarities were found sufficient to distinguish backup pairs from non-backup pairs with AUC=0.74±0.05. This predictive power further strengthens our argument that backup between paralogs stems from their functional redundancy.
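The 3-fold cross-validation over the pooled pairs can be sketched as below. The text does not state how the folds were drawn, so the stratification by class (keeping the proportion of backup pairs equal across folds) is an assumption, and `stratified_folds` is a hypothetical helper, not part of any SVM library. The class sizes follow from the text: 147 backup pairs and 494 - 147 = 347 non-backup pairs.

```python
import random

def stratified_folds(labels, k=3, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal indices out round-robin
    return folds

# 1 = backup pair, 0 = non-backup pair
labels = [1] * 147 + [0] * 347
folds = stratified_folds(labels, k=3)
```

Each fold serves once as the held-out set while a classifier is trained on the other two, and the spread of the resulting AUC values gives an error estimate of the kind reported above.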
It is also important to note that GO-div, which scores the specificity of the best matched functions between paralogs, is a strong indicator of backup capacity: when this feature was removed and prediction was based on Ka, sequence identity and expression divergence alone, the AUC was substantially reduced to 0.67±0.04. Taken together, the tight coupling between buffering strength and functional overlap, and the predictive power of functional overlap, demonstrate that the compensatory effect between paralogs is indeed maintained by their functional overlap and that less diverged pairs tend to have stronger buffering strength. It is also important to note that WGD and SSD paralogs have different origins and functional propensities (Davis et al, 2005; Guan et al, 2007). The consistent observations in these two classes of duplicates therefore suggest that the above conclusion is not biased towards particular functional categories.

Figure 2-3 Genetic buffering between gene duplicates resulting from functional redundancy
A and B indicate that functionally similar genes are more likely to back each other up for WGD (A) and SSD (B) paralogs, respectively, where functional similarity was measured by the overlap of GO annotations (GO-div) and coding sequence divergence (Ka). C and D indicate that buffering strength between paralogs is on average proportional to their functional similarity for WGD (C) and SSD (D) paralogs, respectively.

Figure 2-4 Prediction of backup capacity between paralogs
The receiver operating characteristic (ROC) curve for the prediction of backup capacity between paralogs based on functional similarities. The ROC curve (Marzban, 2004) is a two-dimensional measure of classification performance based on the true positive rate (TPR) and the false positive rate (FPR). The area under the ROC curve (AUC) is a scalar for assessing classification performance. A higher AUC indicates better overall performance. The blue diagonal line represents a random control, which has no classification capability. This curve, together with the AUC score, is from one random realization of the 3-fold cross-validation.
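To make the AUC concrete, the sketch below computes it from classifier scores using its rank interpretation: the AUC equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. The function name and the input format are illustrative assumptions; this is not the evaluation code used for the figure.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve from decision scores and 0/1 labels."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))
```

A classifier that ranks every backup pair above every non-backup pair gives an AUC of one, while uninformative scores give about 0.5, which corresponds to the diagonal line in the figure.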

2.1.5 Molecular basis of genetic buffering between duplicated genes

In the above analysis, we saw that an appreciable proportion of paralogs, especially WGD sister paralogs, still buffer each other after a long evolutionary time. The next step is to determine the molecular mechanisms by which backup achieves long-term retention. Since a number of gene duplicates have lost their mutual buffering, we only considered ancient WGD and SSD duplicates with backup capacity, and regarded their backup as stabilized. The WGD paralogs were derived from a single WGD event that occurred ~100 million years ago, and we considered this time sufficiently long for the sequence and regulation of the paralogs to diverge and become fixed. We contrasted the 105 WGD duplicates with retained backup capacity against the 161 WGD pairs that had lost their mutual compensation. For SSD paralogs, we only considered paralog pairs with Ks greater than 2. In the end, we were able to compare 32 ancient SSD backup pairs (Ks>2) with the 163 non-backup pairs within the same age range (Ks>2). We first compared the sequence divergence between the non-buffering and the buffering paralog pairs. The buffering paralogs have significantly lower sequence divergence (~20% lower, p<1e-3, Wilcoxon rank-sum test). Regardless of WGD or SSD origin, we also found that the buffering pairs share some characteristics beyond the sequence level. With a total of 392 literature-curated protein complexes examined, both WGD and SSD buffering pairs are more likely to reside in the same protein complex, at ~18% for the buffering pairs compared with only ~5-8% for the non-buffering pairs (Figure 2-5). Taken together, this reveals an elevated propensity for backup duplicate pairs to be in the same complex.
It is also worth mentioning that although both WGD and SSD paralogs can have buffering capacity, Figure 2-6 reveals a substantial difference in their rates of functional divergence. WGD pairs clearly include far more buffering pairs than SSD paralogs and maintain much stronger buffering strength. We reasoned that this might result from the different evolutionary modes of WGD and SSD paralogs (Davis et al, 2005). It is known that dosage balance plays an important role in WGD retention (Davis et al, 2005; Papp et al, 2003b); thus WGD paralogs are expected to be under stronger functional constraints (see Figure 2-6), which reduce the rate of functional divergence (for example, lower sequence divergence than SSD pairs, as shown in Figure 2-6).

Figure 2-5 Comparison of co-clustering in backup and non-backup pairs
Both WGD and SSD buffering pairs are more likely to be associated in the same complex (indicated by the asterisks, Chi-square test, p<0.05). This result suggests that co-clustering in the same complex provides more functional constraints.

Figure 2-6 Functional divergence measured by Ka and GO-div in WGD and SSD
WGD (A) and SSD (B) buffering paralogs have reduced sequence divergence and more specific overlapping GO annotations. WGD buffering pairs have more conserved sequences than SSD buffering pairs.

2.2 The contribution of cis-elements to expression divergence between duplicated genes

2.2.1 Introduction and motivations

Extensive backup by duplicated genes was confirmed in the above analysis, and a coupling between functional overlap and genetic buffering was observed for duplicated pairs. Despite the observed prevalence of mutual compensation between paralogs, a majority of the paralogs (>80% of SSD pairs and >60% of WGD pairs) have lost their mutual buffering, suggesting that they might have kept minimal functional overlap and have diverged in function. Expression divergence has long been regarded as an important factor leading to functional divergence (Ohno, 1970). Examination of the expression patterns of duplicated genes in budding yeast has shown that expression divergence scales with evolutionary time at a rapid rate, suggesting that alterations in transcription are critical in the functional dispersal of paralogs and, subsequently, their long-term retention (Gu et al, 2005; Gu et al, 2002). However, the mechanisms underlying expression divergence remain incompletely understood, as recent studies could explain only a very small proportion of expression variation. It was reported that only 2-3% (Zhang et al, 2004) or 8% (Leach et al, 2007) of the expression variation between paralogs could be explained by examining regulatory motifs recognized by TFs. We noted that these studies were based on small datasets, which may bias the results. Here, we examined a more comprehensive dataset of regulatory interactions. In addition to differential TF regulation, it is likely that other cis-influences on gene expression could further explain the lack of observed correlation between TF binding and expression divergence for paralogs. Specifically, prior to TF binding, the chromatin structure of promoters has to undergo a remodelling process, whereby nucleosomes are removed to leave regulatory regions physically accessible to regulatory factors (Jiang et al, 2009).
As a large fraction of nucleosome occupancy is encoded by the flanking cis-elements (Field et al, 2009; Field et al, 2008; Kaplan et al, 2009), deviation in the cis-elements could affect nucleosome positioning and drive divergence in gene expression without observable differences in TF binding sites (Tirosh et al, 2008). Here, to further understand expression evolution between paralogs and to test the above hypothesis, we explored the chromatin structure of the promoter regions of paralogs.

2.2.2 Data and Methods

Compiling gene duplicates, transcription regulation and expression data
To comprehensively examine the above hypothesis, we applied a compendium of regulatory and gene expression data with much greater coverage than that used in previous studies. The known regulatory interactions in budding yeast from two genome-wide ChIP-chip studies (Harbison et al, 2004; Lee et al, 2002) were collected, and small-scale biochemical experiments were also curated from recent literature (Balaji et al, 2006; Yu and Gerstein, 2006). This dataset covers 4,684 yeast genes, among which 298 are TFs mediating 15,451 regulatory interactions. Expression data came from three large microarray datasets spanning a total of 549 physiological conditions (Gasch et al, 2000; Hughes et al, 2000; Spellman et al, 1998). Of all the gene pairs from Guan and colleagues (Guan et al, 2007) (see methods in section 2.1.2), 606 pairs were available in which both genes had annotated regulatory interactions and corresponding expression data.

Measurement of TF divergence, expression divergence and nucleosome occupancy divergence
TF regulatory divergence is the fraction of diverged TFs for a duplicated pair, calculated as one minus the fraction of shared TFs between paralogs. The fraction of shared TFs is measured by the Jaccard index (see equation (2.2)), which for two sets is defined as the size of their intersection divided by the size of their union. Here, writing A and B for the sets of TFs regulating the two paralogs, the numerator is the number of shared TFs and the denominator is the total number of TFs regulating either paralog. One minus this fraction is defined as the TF regulatory divergence.

$$J(A,B) = \frac{|A \cap B|}{|A \cup B|} \qquad (2.2)$$
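The TF regulatory divergence defined above reduces to a few lines of set arithmetic. The sketch below is illustrative only: the function name and the placeholder TF identifiers are hypothetical, and it assumes each paralog comes with the set of TFs annotated as regulating it.

```python
def tf_divergence(tfs_a, tfs_b):
    """One minus the Jaccard index of the two paralogs' regulator sets."""
    a, b = set(tfs_a), set(tfs_b)
    union = a | b
    if not union:
        raise ValueError("both genes need at least one annotated TF overall")
    return 1.0 - len(a & b) / len(union)
```

For example, a pair regulated by {TF1, TF2, TF3} and {TF2, TF3, TF4} shares two of four regulators in total, giving a divergence of 0.5. The empty-union case does not arise for the 606 pairs analyzed here, since both genes of each pair have annotated regulatory interactions.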

Expression divergence is quantified as one minus the Pearson's correlation coefficient of expression between sister paralogs across all 549 physiological conditions. In equation (2.3), r is the Pearson's correlation coefficient of genes X and Y across a total of N conditions (N=549 here), where X_i and Y_i are the expression levels of the two genes in condition i and the barred symbols are their means. One minus r is the expression divergence.

$$r = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{N} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}} \qquad (2.3)$$

Nucleosome occupancy divergence is calculated from promoter nucleosome-depleted region (PNDR) scores, which Field and colleagues devised to quantify the openness (or the lack of nucleosome presence) of the promoter region of each yeast gene (Field et al, 2009). In essence, this score represents the lowest average nucleosome occupancy across any 100bp region within the nuc-sequence, with a higher score indicating a more closed promoter. In this framework, nucleosome occupancy at each base within the nuc-sequence was calculated by Field et al. using a probabilistic model that considers information from flanking sequences, and the calculation was highly predictive of the experimentally assayed nucleosome organization (Field et al, 2008). The PNDR scores for each gene within the set of 606 duplicate pairs were combined and further normalized using kernel density estimation (KDE) of the cumulative distribution function (with a Gaussian window) across the entire genome of 5,778 genes. In this way, the scaled PNDR score represents the estimated fraction of genes with scores lower than that of a given gene. This normalization procedure neither changed the PNDR score rankings across the genome background nor distorted the original score distribution. The divergence in open promoter status (denoted by Δopen) between sister paralogs was then taken as the absolute difference of their scaled PNDR scores.

Construction of neural network
We explored the coupling between TF-regulatory divergence and chromatin status divergence (Δopen) in determining the corresponding expression divergence.
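Before the network itself is described, the expression divergence and Δopen measures defined earlier in this section can be sketched as follows. This is a minimal illustration, not the code used in this study. The function names are hypothetical, and the kernel bandwidth is an explicit assumption because the text specifies only a Gaussian window.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient, as in equation (2.3)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("expression profile has zero variance")
    return cov / math.sqrt(var_x * var_y)

def expression_divergence(x, y):
    """One minus the correlation of two expression profiles."""
    return 1.0 - pearson_r(x, y)

def kde_cdf(value, genome_scores, bandwidth):
    """Gaussian-kernel estimate of the fraction of genes scoring below value.

    bandwidth is an assumed smoothing parameter (not given in the text).
    """
    total = sum(0.5 * (1.0 + math.erf((value - s) / (bandwidth * math.sqrt(2.0))))
                for s in genome_scores)
    return total / len(genome_scores)

def delta_open(score_a, score_b, genome_scores, bandwidth):
    """Absolute difference of the scaled PNDR scores of two paralogs."""
    return abs(kde_cdf(score_a, genome_scores, bandwidth)
               - kde_cdf(score_b, genome_scores, bandwidth))
```

Because the smoothed cumulative function is monotonic in the raw score, scaling preserves the genome-wide ranking of PNDR scores, in line with the property noted above.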
Since TF regulation and chromatin status do not necessarily combine linearly in determining expression divergence, we sought a regression method capable of modeling a potentially non-linear relationship. A neural network is a non-linear statistical data modeling tool that is well known for finding non-linear patterns in data (Bishop, 1995). Therefore, we employed a neural network here. A neural network comprises a set of highly interconnected processing elements. The learning results rely heavily on the quality of the initial input data, the architecture of the connecting units and the efficacy of the input-output function. One of the most widely used structures is the back-propagation (BP) network, which is known for its well-designed multi-layer pattern. The BP neural network applied here comprises one hidden layer of 4 neurons with a hyperbolic tangent sigmoid transfer function and one output layer of 1 neuron with a linear transfer function. The loss function is the mean square error (MSE) (Figure 2-7). In the training procedure, the 606 duplicate pairs were randomly partitioned into training, validation and test sets 100 times, and the regression from TF divergence and Δopen to expression divergence was learned. The network was fitted on the training sets (60%, 364 of 606 pairs), and the validation sets (20%, 121 of 606 pairs) were used to prevent the network from over-fitting the data. The learned composite divergence derived from TF regulation and chromatin status was then independently and blindly tested on the test sets (20%, 121 of 606 pairs).
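The architecture and data partition described above can be sketched in plain Python as follows. This is an illustrative forward pass and split only, under stated assumptions: the helper names are hypothetical, the weights would come from training (which is not implemented here), and the hyperbolic tangent sigmoid is taken to be the mathematical tanh function.

```python
import math
import random

def split_sizes(n, fractions=(0.6, 0.2, 0.2)):
    """Training, validation and test set sizes for n pairs."""
    n_train = round(n * fractions[0])
    n_val = round(n * fractions[1])
    return n_train, n_val, n - n_train - n_val

def random_split(n, seed):
    """Randomly partition indices 0..n-1 into the three sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train, n_val, _ = split_sizes(n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """2 inputs -> 4 tanh hidden neurons -> 1 linear output neuron.

    x: (TF divergence, delta-open) for one pair.
    w_hidden: 4 weight pairs; b_hidden: 4 biases; w_out: 4 weights.
    """
    hidden = [math.tanh(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

def mse(predictions, targets):
    """Mean square error used as the loss function."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)
```

Calling `random_split(606, seed)` with 100 different seeds reproduces the repeated 364/121/121 partition; the correlation between the network output and the observed expression divergence on each test set then yields an empirical distribution.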

Figure 2-7 A schematic figure representing the neural network applied in this study.
The input layer is the TF and promoter chromatin structure divergence. The hidden layer consists of 4 neurons with a hyperbolic tangent sigmoid transfer function. The output layer contains 1 neuron with a linear transfer function. The output layer yields a value corresponding to the expression divergence learned from the divergence of both TF regulation and chromatin status.

2.2.3 Transcription factors divergence explains expression divergence between paralogs

Using the collected expression and TF-regulatory datasets, we observed a more significant correlation between expression divergence and TF regulatory divergence for paralogs (Pearson's R=0.367, P<10-41 and Spearman's ρ=0.372, P<10-41 ). This result suggests a stronger association between these two variables than previously reported (e.g., R=0.15 or 0.27; (Zhang et al, 2004) and (Leach et al, 2007), respectively). However, even for all duplicates, the proportion of expression divergence explained by TF divergence was still very low (~13.5%, i.e., the square of Pearson's R, that is 0.367² ≈ 0.135). We thus investigated whether the divergence in promoter chromatin structure between paralogs could help explain the pattern of expression divergence.

2.2.4 Divergence in promoter chromatin structure between paralogs

In yeast, the accessibility of a gene's promoter is usually determined by nucleosome occupancy over the 200bp region upstream of its translation start site (Field et al, 2009; Lee et al, 2007; Shivaswamy et al, 2008), and these upstream sequences are referred to as nuc-sequences. The normalized PNDR score described in the methods was applied to assess the divergence in chromatin structure, and nucleosome occupancy was then compared between paralogs. Reliable (i.e. non-saturated) Ks values could be obtained for 147 paralog pairs (Ks≤2) from the calculated synonymous (Ks) and non-synonymous (Ka) substitution rates per nucleotide between sister paralogs. While not a complete set, these 147 pairs do not demonstrate any noticeable bias when compared to the complete paralog set and were thus considered a representative set. First, we explored the relationship between divergence in promoter nucleosome status and Ks. Divergence in promoter nucleosome status between sister paralogs was significantly correlated with Ks (R=0.24, P= and Spearman's ρ=0.28, P= ). Moreover, as Ks is a good indicator of time (Gu et al, 2002; Li, 1997), we investigated whether chromatin status scales with divergence time. The differential promoter chromatin status and Ks of 101 SSD pairs (Ks < 2) were significantly correlated (Spearman's ρ=0.39, P= ), which implies that promoter chromatin structure is identical at the time of gene duplication and gradually diverges afterwards as duplicates age. Next, we compared recent gene duplicates with ancient ones. In all, 79 relatively ancient duplicates (1<Ks≤2) and 31 very recent duplicates (Ks≤0.1) were compared, and the ancient duplicates were much more highly diverged in promoter openness (Figure 2-8; median Δopen=0.02 for the recent pairs versus Δopen=0.23 for the ancient duplicates, P= ; Wilcoxon ranksum test).
Nonetheless, these ancient duplicates were still more similar in promoter openness than unrelated pairs of genes sampled from the genome (median Δopen=0.30 for 1,000 randomly paired genes, P=0.02, Wilcoxon ranksum test). Notably, among these ancient pairs, there is an excess of sister paralogs with little diverged promoter status, with Δopen≤0.05 (P<1×10⁻⁴, chi-square test), which suggests the presence of selection for maintaining similar promoter chromatin structures even between distant paralogs.

Figure 2-8 Comparison of Δopen in ancient and recent pairs
Δopen in ancient and recent pairs is compared using cumulative density functions. The x-axis denotes the normalized Δopen and the y-axis denotes the cumulative density. The Wilcoxon ranksum test indicates a significant difference between the two samples (P= ), with recent pairs more similar in chromatin structure.

We then investigated whether diverged nucleosome occupancy could be explained by sequence conservation of the nuc-sequences (i.e. the promoter regions where nucleosomes reside). The Kimura distance is a well-established and widely used measure of DNA difference based on the number of nucleotide substitutions, taking into account the difference between transitional and transversional substitutions (Kimura, 1980). The Kimura distance between all paralogous nuc-sequences was calculated, and this sequence divergence is henceforth termed Knuc. Not surprisingly, for the 31 recent duplicate pairs (Ks≤0.1), Knuc is highly correlated with Δopen (Pearson's R=0.83, P= and Spearman's ρ=0.90, P= ). The significant correlation remains observable for pairs with intermediate divergence time (0.1≤Ks<1); however, for the 79 ancient duplicate pairs (1<Ks≤2), the correlation is substantially reduced (Pearson's R=0.18, P=0.11 and Spearman's ρ=0.20, P=0.08; Figure 2-9). This observation indicates that most recent duplicate pairs have relatively little divergence in both nuc-sequences and chromatin structure (Figure 2-9; left panel). For ancient pairs (Figure 2-9; middle panel), however, although the nuc-sequences have substantially diverged, an appreciable proportion of sister paralogs show little difference in promoter chromatin status. We further highlighted this discordance by comparing the distributions of conservation values for these ancient duplicates. As shown in Figure 2-9 (right panel), the nuc-sequences are clearly more divergent than promoter openness between paralogs for the ancient pairs. On the other hand, examination of TF regulation for these ancient duplicates shows that TF regulation has completely diverged for most ancient duplicate pairs, apparently consistent with the highly degenerate nature of TF binding sites.
Therefore, even though ancient duplicates share few common transcription factors and have highly diverged nuc-sequences, their chromatin status is still fairly conserved. It is likely that, despite their divergence in molecular function, these sister paralogs retain the chromatin states of their promoters and are still regulated in the same manner because they may remain in the same broad functional categories.

Figure 2-9 Divergence in nuc-sequences, promoter chromatin status (Δopen) and transcription factors
Divergence in nuc-sequences, promoter chromatin status (Δopen) and transcription factors between sister paralogs. Knuc and Δopen were normalized between 0 and 1 by dividing by their respective maximums. The left panel shows the comparison for recent duplicates (Ks < 0.1), with each row representing a duplicate pair. The middle panel shows the same comparison for ancient duplicates (1 < Ks < 2). A histogram of Δopen, Knuc and TF divergence (DivTF) between sister paralogs is shown in the right panel. These results show conserved promoter chromatin structure between ancient paralogs.

2.2.5 Better explanation of expression divergence between paralogs by the composite divergence of TF regulation and promoter chromatin status

The divergence of nucleosome occupancy between duplicated genes scales with time. The next step is to investigate the potential involvement of chromatin divergence in expression evolution between paralogs. It is reasonable to postulate that diverged chromatin structure plays an important role in differential expression dynamics. Using the same 606 duplicated pairs analyzed in section 2.2.3, we found that expression divergence between sister paralogs is significantly correlated with Δopen (Pearson's R=0.21, P=2.29×10⁻⁷ and Spearman's ρ=0.19, P=3.84×10⁻⁶). Although this correlation is weaker than the comparable correlation between expression divergence and divergence in TF regulation (R=0.36, see section 2.2.3), it is likely that the observed expression divergence can be better accounted for when information from both TF regulation and chromatin status is combined. As the two play different roles in regulating expression, it is worthwhile to demonstrate this idea with an appropriate model. As the coupling of divergence in TF regulation and differential chromatin status (Δopen) between duplicates is not necessarily linear, a neural network using Levenberg-Marquardt back-propagation (BP) was trained to learn the relationship between TF regulation and chromatin status as described above (see methods in section 2.2.2). Through 100 simulations with randomly chosen training, validation and test sets, we derived an empirical distribution of the estimated correlation between the composite divergence and expression divergence. The composite divergence derived from divergence in TF regulation and chromatin status was found to be significantly correlated with expression divergence.
As shown in Figure 2-10, one random realization from the 100 simulations indicated that ~22% of expression divergence could be explained by the composite divergence between TF regulation and promoter structure, with Pearson's R=0.47, P=5.7 × 10^-8 and Spearman's ρ=0.44, P=4.7 × 10^-7. The distribution of Pearson's correlation for the composite divergence from the 100 simulations is also shown in Figure 2-10B (the red bars), in contrast to the weaker correlation for TFs alone (the blue bars), which was derived from a matched control with the same sampling protocol. We recognize that these results might not explain expression divergence between paralogs completely; other potential mechanisms are therefore discussed below (see Future Work in section (3.3)).

To demonstrate that the superiority of the combined TF and nucleosome occupancy approach in determining expression divergence is not a by-product of the differential non-linear regression of the two metrics, TF divergence was regressed against expression divergence using the same neural network, instead of directly computing their Pearson correlation coefficient (PCC). After the same 100 blind tests, we again found that the correlation of TF divergence after neural network mapping (median R=0.38) was significantly lower than that of the composite divergence (P=6 × 10^-5, Wilcoxon rank-sum test, see Figure 2-11). Next, to ensure that the observed superiority of the combined metric was indeed due to the differential chromatin structure of sister paralogs, we matched the TF divergence for each pair with a randomly selected open value and computed the correlation of the randomized composite divergence with expression divergence through 100 blind tests. The randomized correlation is substantially lower than that of the real composite divergence (P=6 × 10^-6, Wilcoxon rank-sum test, see Figure 2-11), establishing the role of chromatin structure in expression divergence between paralogs.

Since WGD and SSD pairs are of different origins, we performed additional experiments to ensure that the above results are not the result of sampling bias towards a particular group. If our experiment were biased towards solely WGD or SSD genes, we would not expect to see the correlation when the regression model is trained on one data set (WGD or SSD) and tested on the other (SSD or WGD).
However, even when pairs of one origin are used for training, the correlation on the test set of the other origin is still highly significant, with R=0.35, P= when using SSD for testing and WGD for training, and R=0.34, P= when using WGD for testing and SSD for training. In addition, in our analysis the test sets were randomly chosen from all the duplicate pairs 100 times, and the regression results were estimated from the distribution of the 100 random samplings. This protocol minimizes the sampling bias towards a particular group (WGD or SSD). Therefore, this conclusion applies to duplicate pairs of different origins. Taken together, these results demonstrate that expression divergence between paralogs can be better explained by the combination of TF regulation and promoter chromatin structure than by TF regulation alone.

Figure 2-10 Chromatin divergence accounting for expression divergence. (A) The correlation between expression divergence and the composite divergence in TF regulation and chromatin structure. Data are from one of the 100 simulations in a blind test. The reference line Y=X (the thin blue line) is also shown to represent a perfect linear correlation. (B) Histogram of the correlations between expression divergence and TF divergence (blue bars) or the composite (TF + chromatin) divergence (red bars), derived from 100 randomizations. A low P-value (Wilcoxon rank-sum test) indicates that the composite divergence is significantly more correlated with expression divergence than TF divergence alone.

Figure 2-11 Comparison of correlation between expression divergence and TF divergence. The cumulative density of the correlation between expression divergence and TF divergence alone (after neural network mapping; the green curve), the randomized composite divergence derived from TF divergence and the shuffled open (the blue curve), and the real composite divergence derived from paired TF and open (the red curve). The lower curve indicates its higher correlation with expression divergence.

2.3 Gene duplicates in the chaperone system

2.3.1 Introduction and motivations

The chaperone system is an essential quality-control system that is conserved across the three domains of life. Chaperones assist in protein folding and engage in multiple biological processes, such as protein assembly, intracellular protein transport and protein degradation (Hartl and Hayer-Hartl, 2009). Specifically, when proteins tend to aggregate because of unfavourable environmental conditions, chaperones maintain the equilibrium of the system by refolding or degrading the misfolded proteins. Recently, the importance of some chaperones has been further highlighted. For example, HSP90 was found to act as a phenotypic buffer that can counter unfavourable genetic mutations (Rutherford and Lindquist, 1998). HSP90 also contributes to the evolution of new phenotypes (Cowen and Lindquist, 2005).

In this study, we collected 75 known chaperones of S. cerevisiae from the literature. Some of them are regarded as cofactors/co-chaperones, which function together with chaperones. These cofactors either regulate chaperones' activities or transfer substrates to chaperones (Caplan et al, 2003; Gong et al, 2009). The chaperones are divided into families according to their signature domain structure or known functions (see Table 2-1).

Here, a more focused study of gene duplication was performed in the yeast chaperone system. Such a case study in a familiar biological system serves to examine in depth the functional overlap between paralogs and the role that functional dispersal has played in the long-term preservation of gene duplicates. Moreover, a close examination of this system from an evolutionary perspective could also shed new light on chaperone functions. Therefore, studying gene duplicates in this system is of particular interest.

Table 2-1 Summary of chaperones in budding yeast S. cerevisiae

Family                      Total Number   Standard Name
Hsp70                       14             ECM10, KAR2, LHS1, SSA1, SSA2, SSA3, SSA4, SSB1, SSB2, SSC1, SSE1, SSE2, SSQ1, SSZ1
Hsp40                       22             APJ1, CAJ1, CWC23, DJP1, ERJ5, JAC1, JEM1, JID1, JJJ1, JJJ2, JJJ3, MDJ1, MDJ2, HLJ1, PAM18, SCJ1, SEC63, SIS1, SWA2, XDJ1, YDJ1, ZUO1
Small heat shock proteins   7              HSP12, HSP26, HSP31, HSP32, HSP33, HSP42, SNO4
CCT/TRiC complex            8              CCT2, CCT3, CCT4, CCT5, CCT6, CCT7, CCT8, TCP1
Prefoldin/GimC              6              GIM3, GIM4, GIM5, PAC10, PFD1, YKE2
AAA+ family                 3              HSP78, HSP104, MCX1
HSP90                       2              HSC82, HSP82
HSP60                       1              HSP60
HSP90 cofactor              11             AHA1, CDC37, CNS1, CPR6, CPR7, HCH1, PIH1, PPT1, SBA1, STI1, TAH1
HSP60 cofactor              1              HSP10

The above table lists the name and family of S. cerevisiae chaperones. Hsp40s and Hsp70s function together as the Hsp70 system, and they play crucial roles including protein folding, protein translocation and heat shock response in different compartments. The Hsp90 family has multiple members, with HSP82 and HSC82 at its centre. Hsp90s can buffer the effects of genetic mutations, and they are involved in signal transduction, chromatin remodelling and transport processes. Prefoldin (PFD) helps to transfer protein substrates to CCT/chaperonin. The CCT complex is the chaperonin system in the cytosol, while Hsp60 and Hsp10 work together as the chaperonin system in mitochondria. The AAA+ family and small heat shock proteins are involved in multiple processes including protein disaggregation and protein degradation.

2.3.2 Functional divergence and redundancy in the chaperone system

It is known that functional redundancy exists between many paralogs in the yeast chaperone system, such as HSP82/HSC82, SSA1/SSA2 and SSB1/SSB2 (Gautschi et al, 2002; Gong et al, 2009; Matsumoto et al, 2006). We performed a more comprehensive study of the gene duplicates using the same dataset from Guan et al (Guan et al, 2007) as discussed in section (2.1.2). In total, 17 gene duplicates were found (see Appendix Table 1). We next examined the genetic buffering between these duplicates. Of the ten pairs that have genetic interaction scores, six have lost their mutual buffering, indicating functional divergence. For the remaining four pairs, which show aggravating genetic interactions, it is intriguing to investigate the extent of functional overlap between the sister paralogs. We found that three of these four buffering pairs belong to the cytoplasmic HSP70 family. Therefore, we next studied the gene duplicates of the cytoplasmic HSP70 family. The four cytoplasmic pairs in the HSP70 family appear highly redundant, as they share high sequence similarity (>75%) and maintain strong aggravating interactions (SSA1/SSA2-0.76, SSB1/SSB2-0.95, SSE1/SSE2-0.36, SSA3/SSA4 is unavailable). We next asked whether these pairs are merely redundant. Several lines of evidence indicate that they are in fact functionally divergent. In the literature, substrate specificity and different deletion phenotypes have been observed for both the SSA1/SSA2 and SSA3/SSA4 pairs, indicating their functional divergence (Kabani and Martineau, 2008). Evolutionary analysis also found sites under strong selection in one copy of each of the SSA1/SSA2 and SSB1/SSB2 pairs (Takuno et al, 2009). In our analysis, we found that each gene has its own genetic interactors, which suggests functional disparity. 
This is because if one gene were functionally identical to its paralog, we would not expect to detect genetic interactions between it and other genes, owing to compensation from its paralogous copy. Thus, for sister paralogs, genetic interactions only develop on the subset of diverged functions (Figure 1-2). For example, for the SSE1/SSE2 pair, SSE1 has ~350 negative interactors while SSE2 has only ~50. Similarly, the members of the SSA1/SSA2 and SSB1/SSB2 pairs have their own genetic interactors. In addition, we found that these pairs are under different transcriptional regulation (the PCC of expression patterns is less than 0.10 for both SSE1/SSE2 and SSB1/SSB2), indicating that they are used differently under varying conditions. Taken together, these results lend support to functional divergence in the cytoplasmic HSP70 family, suggesting that duplicates in the chaperone system are not merely redundant.

Functional divergence leading to functional specificity

We next analyzed cases in which gene duplicates could have developed substantial functional diversity over long spans of evolutionary time. We found completely diverged functions in the CPR7/CPR6 pair, whose members serve as cofactors in the HSP90 machinery. Previously, CPR6 and CPR7 were known as peptidyl-prolyl cis-trans isomerases that catalyze the cis-trans isomerization of peptide bonds N-terminal to proline residues, and they were considered functionally identical (Mayr et al, 2000). However, the substantial sequence divergence between this pair (Ka=0.64) and the potentially long evolutionary time (Ks=4.06) motivated us to study their functional divergence. First, we found that this paralogous pair has lost genetic buffering (ε>0), indicating minimal functional overlap. Second, we examined their genetic interaction profiles to further explore their functions, as genetic interaction profiles have proved useful in the study of gene function (Collins et al, 2007; Fiedler et al, 2009). CPR6 and CPR7 have different interaction partners, suggesting functional differentiation. The negative hits of CPR7 are enriched in the transport/vesicle-mediated transport and vesicle organization categories, while CPR6's negative hits are enriched in cell-cycle related categories (Figure 2-12). 
A detailed analysis was then performed to further clarify their distinct functions through a direct comparison of their genetic interaction profiles. We found a strong correlation between the profiles of CPR7 and cog2-1 (PCC=0.43, p<1e-20) (Figure 2-13 A). CPR7 also has strong positive correlations with other components of the COG complex (PCC~0.2, p<1e-8). The high correlations suggest functional similarity between CPR7 and the COG complex. Moreover, we drew an integrated interaction map of CPR7 and its interactors in the Golgi and vesicle transport functional categories. These interactors were grouped in accordance with their functional subgroups (Figure 2-13 B). Such interactions with many genes in the transport pathway strongly indicate CPR7's functional involvement in it. CPR6, in contrast, has no positive correlations with COG complex components (PCC= to -0.06) and has no such genetic or physical interactors in this category.

In contrast to CPR7, the physical and genetic interaction data strongly suggest a function for CPR6 in spindle dynamics. CPR6's genetic interaction profile is similar to that of CIN8 (PCC~0.2, p<1e-8), a kinesin motor protein involved in mitotic spindle assembly and chromosome segregation (Geiser et al, 1997; Gerson-Gurwitz et al, 2009). In addition, CPR6 has genetic interaction profiles similar to those of several other genes, including CLB4, KIP3, STU2, STU1 and KAR9, which are all relevant to spindle dynamics. Specifically, CPR6 has negative interactions with multiple components of the microtubule motor, including PAC11, DYN3, DYN1, NIP100, KAR3 and NUM1 (Lee et al, 2005), indicating its functional overlap with this microtubule motor. Moreover, CPR6 has physical interactions with CIN8 and SLI15 (Figure 2-14).

Figure 2-12 Distinct functional enrichment of negative interactors for CPR6/CPR7. The y-axis represents significant GO-slim BP categories, and the x-axis represents the fold enrichment. The enrichment is calculated using the hypergeometric distribution. GO terms with a p-value smaller than 0.01 are presented here.
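The enrichment calculation named in the caption can be sketched as below. This is a minimal illustration, assuming the usual one-sided hypergeometric test (the probability of seeing at least the observed number of annotated genes among the interactors); the exact test settings used for the figure are not stated, and the function name and the counts in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment(pop_size, pop_hits, sample_size, sample_hits):
    """Return (fold enrichment, P(X >= sample_hits)) under the hypergeometric null.

    pop_size    -- number of genes in the background set
    pop_hits    -- background genes annotated to the GO category
    sample_size -- number of negative interactors of the query gene
    sample_hits -- interactors annotated to the GO category
    """
    fold = (sample_hits / sample_size) / (pop_hits / pop_size)
    upper = min(pop_hits, sample_size)
    total = comb(pop_size, sample_size)
    # Sum the upper tail exactly with integer arithmetic, then divide once.
    tail = sum(
        comb(pop_hits, k) * comb(pop_size - pop_hits, sample_size - k)
        for k in range(sample_hits, upper + 1)
    )
    return fold, tail / total


# Hypothetical counts: 5000 background genes, 100 in the category,
# 50 negative interactors, 8 of which fall in the category.
fold, p_value = hypergeom_enrichment(5000, 100, 50, 8)
print(f"fold enrichment = {fold:.1f}, p = {p_value:.2e}")
```

With a threshold such as the p < 0.01 used in the figure, a category would be reported only when the observed overlap is unlikely under random sampling from the background.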

Figure 2-13 Functional study of CPR7 linking its role in transportation. (A) Genetic interaction profiles of CPR7 and cog2-1; the PCC is 0.43, p<1e-20. (B) An integrated physical and genetic interaction map reveals many of CPR7's targets in the Golgi trafficking and transportation categories. These interactors were grouped in accordance with their functional subgroups.

Predicting Protein Functions and Domain Interactions from Protein Interactions

Predicting Protein Functions and Domain Interactions from Protein Interactions Predicting Protein Functions and Domain Interactions from Protein Interactions Fengzhu Sun, PhD Center for Computational and Experimental Genomics University of Southern California Outline High-throughput

More information

comparative genomics of high throughput data between species and evolution of function

comparative genomics of high throughput data between species and evolution of function Comparative Interactomics comparative genomics of high throughput data between species and evolution of function Function prediction, for what aspects of function from model organism to e.g. human is orthology

More information

Divergence Pattern of Duplicate Genes in Protein-Protein Interactions Follows the Power Law

Divergence Pattern of Duplicate Genes in Protein-Protein Interactions Follows the Power Law Divergence Pattern of Duplicate Genes in Protein-Protein Interactions Follows the Power Law Ze Zhang,* Z. W. Luo,* Hirohisa Kishino,à and Mike J. Kearsey *School of Biosciences, University of Birmingham,

More information

Bio 1B Lecture Outline (please print and bring along) Fall, 2007

Bio 1B Lecture Outline (please print and bring along) Fall, 2007 Bio 1B Lecture Outline (please print and bring along) Fall, 2007 B.D. Mishler, Dept. of Integrative Biology 2-6810, bmishler@berkeley.edu Evolution lecture #5 -- Molecular genetics and molecular evolution

More information

Computational methods for predicting protein-protein interactions

Computational methods for predicting protein-protein interactions Computational methods for predicting protein-protein interactions Tomi Peltola T-61.6070 Special course in bioinformatics I 3.4.2008 Outline Biological background Protein-protein interactions Computational

More information

Computational approaches for functional genomics

Computational approaches for functional genomics Computational approaches for functional genomics Kalin Vetsigian October 31, 2001 The rapidly increasing number of completely sequenced genomes have stimulated the development of new methods for finding

More information

Evidence for dynamically organized modularity in the yeast protein-protein interaction network

Evidence for dynamically organized modularity in the yeast protein-protein interaction network Evidence for dynamically organized modularity in the yeast protein-protein interaction network Sari Bombino Helsinki 27.3.2007 UNIVERSITY OF HELSINKI Department of Computer Science Seminar on Computational

More information

An Optimal System for Evolutionary Cell Biology: the genus Paramecium

An Optimal System for Evolutionary Cell Biology: the genus Paramecium An Optimal System for Evolutionary Cell Biology: the genus Paramecium Presence of a transcriptionally silent germline (micronucleus) and an expression-active somatic macronucleus. Geographically ubiquitous,

More information

Network Centralities and the Retention of Genes Following Whole Genome Duplication in Saccharomyces cerevisiae

Network Centralities and the Retention of Genes Following Whole Genome Duplication in Saccharomyces cerevisiae Network Centralities and the Retention of Genes Following Whole Genome Duplication in Saccharomyces cerevisiae by Matthew J. Imrie B.Sc., University of Victoria, 2010 A Thesis Submitted in Partial Fulfillment

More information

Outline. Genome Evolution. Genome. Genome Architecture. Constraints on Genome Evolution. New Evolutionary Synthesis 11/8/16

Outline. Genome Evolution. Genome. Genome Architecture. Constraints on Genome Evolution. New Evolutionary Synthesis 11/8/16 Genome Evolution Outline 1. What: Patterns of Genome Evolution Carol Eunmi Lee Evolution 410 University of Wisconsin 2. Why? Evolution of Genome Complexity and the interaction between Natural Selection

More information

Impact of recurrent gene duplication on adaptation of plant genomes

Impact of recurrent gene duplication on adaptation of plant genomes Impact of recurrent gene duplication on adaptation of plant genomes Iris Fischer, Jacques Dainat, Vincent Ranwez, Sylvain Glémin, Jacques David, Jean-François Dufayard, Nathalie Chantret Plant Genomes

More information

Outline. Genome Evolution. Genome. Genome Architecture. Constraints on Genome Evolution. New Evolutionary Synthesis 11/1/18

Outline. Genome Evolution. Genome. Genome Architecture. Constraints on Genome Evolution. New Evolutionary Synthesis 11/1/18 Genome Evolution Outline 1. What: Patterns of Genome Evolution Carol Eunmi Lee Evolution 410 University of Wisconsin 2. Why? Evolution of Genome Complexity and the interaction between Natural Selection

More information

Drosophila melanogaster and D. simulans, two fruit fly species that are nearly

Drosophila melanogaster and D. simulans, two fruit fly species that are nearly Comparative Genomics: Human versus chimpanzee 1. Introduction The chimpanzee is the closest living relative to humans. The two species are nearly identical in DNA sequence (>98% identity), yet vastly different

More information

Introduction to Bioinformatics

Introduction to Bioinformatics CSCI8980: Applied Machine Learning in Computational Biology Introduction to Bioinformatics Rui Kuang Department of Computer Science and Engineering University of Minnesota kuang@cs.umn.edu History of Bioinformatics

More information

How to detect paleoploidy?

How to detect paleoploidy? Genome duplications (polyploidy) / ancient genome duplications (paleopolyploidy) How to detect paleoploidy? e.g. a diploid cell undergoes failed meiosis, producing diploid gametes, which selffertilize

More information

REVIEWS. The evolution of gene duplications: classifying and distinguishing between models

REVIEWS. The evolution of gene duplications: classifying and distinguishing between models The evolution of gene duplications: classifying and distinguishing between models Hideki Innan* and Fyodor Kondrashov bstract Gene duplications and their subsequent divergence play an important part in

More information

Bioinformatics Chapter 1. Introduction

Bioinformatics Chapter 1. Introduction Bioinformatics Chapter 1. Introduction Outline! Biological Data in Digital Symbol Sequences! Genomes Diversity, Size, and Structure! Proteins and Proteomes! On the Information Content of Biological Sequences!

More information

Chapter 18 Lecture. Concepts of Genetics. Tenth Edition. Developmental Genetics

Chapter 18 Lecture. Concepts of Genetics. Tenth Edition. Developmental Genetics Chapter 18 Lecture Concepts of Genetics Tenth Edition Developmental Genetics Chapter Contents 18.1 Differentiated States Develop from Coordinated Programs of Gene Expression 18.2 Evolutionary Conservation

More information

Molecular evolution - Part 1. Pawan Dhar BII

Molecular evolution - Part 1. Pawan Dhar BII Molecular evolution - Part 1 Pawan Dhar BII Theodosius Dobzhansky Nothing in biology makes sense except in the light of evolution Age of life on earth: 3.85 billion years Formation of planet: 4.5 billion

More information

5/4/05 Biol 473 lecture

5/4/05 Biol 473 lecture 5/4/05 Biol 473 lecture animals shown: anomalocaris and hallucigenia 1 The Cambrian Explosion - 550 MYA THE BIG BANG OF ANIMAL EVOLUTION Cambrian explosion was characterized by the sudden and roughly simultaneous

More information

The geneticist s questions. Deleting yeast genes. Functional genomics. From Wikipedia, the free encyclopedia

The geneticist s questions. Deleting yeast genes. Functional genomics. From Wikipedia, the free encyclopedia From Wikipedia, the free encyclopedia Functional genomics..is a field of molecular biology that attempts to make use of the vast wealth of data produced by genomic projects (such as genome sequencing projects)

More information

Evolution by duplication

Evolution by duplication 6.095/6.895 - Computational Biology: Genomes, Networks, Evolution Lecture 18 Nov 10, 2005 Evolution by duplication Somewhere, something went wrong Challenges in Computational Biology 4 Genome Assembly

More information

C3020 Molecular Evolution. Exercises #3: Phylogenetics

C3020 Molecular Evolution. Exercises #3: Phylogenetics C3020 Molecular Evolution Exercises #3: Phylogenetics Consider the following sequences for five taxa 1-5 and the known outgroup O, which has the ancestral states (note that sequence 3 has changed from

More information

Network Biology: Understanding the cell s functional organization. Albert-László Barabási Zoltán N. Oltvai

Network Biology: Understanding the cell s functional organization. Albert-László Barabási Zoltán N. Oltvai Network Biology: Understanding the cell s functional organization Albert-László Barabási Zoltán N. Oltvai Outline: Evolutionary origin of scale-free networks Motifs, modules and hierarchical networks Network

More information

2 Genome evolution: gene fusion versus gene fission

2 Genome evolution: gene fusion versus gene fission 2 Genome evolution: gene fusion versus gene fission Berend Snel, Peer Bork and Martijn A. Huynen Trends in Genetics 16 (2000) 9-11 13 Chapter 2 Introduction With the advent of complete genome sequencing,

More information

Genome Annotation. Bioinformatics and Computational Biology. Genome sequencing Assembly. Gene prediction. Protein targeting.

Genome Annotation. Bioinformatics and Computational Biology. Genome sequencing Assembly. Gene prediction. Protein targeting. Genome Annotation Bioinformatics and Computational Biology Genome Annotation Frank Oliver Glöckner 1 Genome Analysis Roadmap Genome sequencing Assembly Gene prediction Protein targeting trna prediction

More information

Lecture 4: Yeast as a model organism for functional and evolutionary genomics. Part II

Lecture 4: Yeast as a model organism for functional and evolutionary genomics. Part II Lecture 4: Yeast as a model organism for functional and evolutionary genomics Part II A brief review What have we discussed: Yeast genome in a glance Gene expression can tell us about yeast functions Transcriptional

More information

Dr. Amira A. AL-Hosary

Dr. Amira A. AL-Hosary Phylogenetic analysis Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic Basics: Biological

More information

The geneticist s questions

The geneticist s questions The geneticist s questions a) What is consequence of reduced gene function? 1) gene knockout (deletion, RNAi) b) What is the consequence of increased gene function? 2) gene overexpression c) What does

More information

THE EVOLUTION OF DUPLICATED GENES CONSIDERING PROTEIN STABILITY CONSTRAINTS

THE EVOLUTION OF DUPLICATED GENES CONSIDERING PROTEIN STABILITY CONSTRAINTS THE EVOLUTION OF DUPLICATED GENES CONSIDERING PROTEIN STABILITY CONSTRAINTS D.M. TAVERNA*, R.M. GOLDSTEIN* *Biophysics Research Division, Department of Chemistry, University of Michigan, Ann Arbor, MI

More information

Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata.

Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata. Supplementary Note S2 Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata. Phylogenetic trees reconstructed by a variety of methods from either single-copy orthologous loci (Class

More information

Computational Genomics. Systems biology. Putting it together: Data integration using graphical models

Computational Genomics. Systems biology. Putting it together: Data integration using graphical models 02-710 Computational Genomics Systems biology Putting it together: Data integration using graphical models High throughput data So far in this class we discussed several different types of high throughput

More information

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison 10-810: Advanced Algorithms and Models for Computational Biology microrna and Whole Genome Comparison Central Dogma: 90s Transcription factors DNA transcription mrna translation Proteins Central Dogma:

More information

Genomes and Their Evolution

Genomes and Their Evolution Chapter 21 Genomes and Their Evolution PowerPoint Lecture Presentations for Biology Eighth Edition Neil Campbell and Jane Reece Lectures by Chris Romero, updated by Erin Barley with contributions from

More information

Molecular Evolution & the Origin of Variation

Molecular Evolution & the Origin of Variation Molecular Evolution & the Origin of Variation What Is Molecular Evolution? Molecular evolution differs from phenotypic evolution in that mutations and genetic drift are much more important determinants

More information

Molecular Evolution & the Origin of Variation

Molecular Evolution & the Origin of Variation Molecular Evolution & the Origin of Variation What Is Molecular Evolution? Molecular evolution differs from phenotypic evolution in that mutations and genetic drift are much more important determinants

More information

Measuring TF-DNA interactions

Measuring TF-DNA interactions Measuring TF-DNA interactions How is Biological Complexity Achieved? Mediated by Transcription Factors (TFs) 2 Regulation of Gene Expression by Transcription Factors TF trans-acting factors TF TF TF TF

More information

Understanding relationship between homologous sequences

Understanding relationship between homologous sequences Molecular Evolution Molecular Evolution How and when were genes and proteins created? How old is a gene? How can we calculate the age of a gene? How did the gene evolve to the present form? What selective

More information

8/23/2014. Phylogeny and the Tree of Life

8/23/2014. Phylogeny and the Tree of Life Phylogeny and the Tree of Life Chapter 26 Objectives Explain the following characteristics of the Linnaean system of classification: a. binomial nomenclature b. hierarchical classification List the major

More information

18.4 Embryonic development involves cell division, cell differentiation, and morphogenesis

18.4 Embryonic development involves cell division, cell differentiation, and morphogenesis 18.4 Embryonic development involves cell division, cell differentiation, and morphogenesis An organism arises from a fertilized egg cell as the result of three interrelated processes: cell division, cell

More information

Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Information #

Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Information # Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Details of PRF Methodology In the Poisson Random Field PRF) model, it is assumed that non-synonymous mutations at a given gene are either

More information

Grade 11 Biology SBI3U 12

Grade 11 Biology SBI3U 12 Grade 11 Biology SBI3U 12 } We ve looked at Darwin, selection, and evidence for evolution } We can t consider evolution without looking at another branch of biology: } Genetics } Around the same time Darwin

More information

METHODS FOR DETERMINING PHYLOGENY. In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task.

METHODS FOR DETERMINING PHYLOGENY. In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task. Chapter 12 (Strikberger) Molecular Phylogenies and Evolution METHODS FOR DETERMINING PHYLOGENY In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task. Modern

More information
