RELATIONSHIPS BETWEEN GENES/PROTEINS HOMOLOGUES

Molecular Biology-2018

Definitions:

Heterologues: Genes or proteins that possess different sequences and activities.

Homologues: Genes or proteins that share a threshold level of similarity, as determined by aligning matching bases or amino acids. Specifically, nucleotide sequences whose percent similarity is 70% or greater are termed homologous, whereas amino acid sequences are termed homologous when their percent similarity is 25% or greater.

Similarity is a quantitative term that describes the degree of sequence match between two compared sequences. Two aligned genes or sequence segments that are homologous may show varying degrees of similarity, based on the number of identical bases in the alignment. In alignment A below, the sequences are identical and therefore show 39 matches out of 39 aligned positions, or 100% similarity. In alignment B, the aligned sequences contain 28 matches out of 39 positions, so the degree of similarity is 28/39, or 72%. In both cases the sequences are homologous.

A   atgcctgaaggcctattgtttcccagtcgattggctgct...
    atgcctgaaggcctattgtttcccagtcgattggctgct...    39 of 39 matches

B   atgcctgaaggcctattgtttcccagtcgattggctgct...
    atgcctcggcttatattgtatcccagtccattggcagcg...    28 of 39 matches

Analogues: Genes or proteins that display the same activity but lack sufficient similarity to be homologues (less than 70% for nucleotide sequences, or less than 25% for protein sequences).

Paralogues: Homologous genes or proteins produced by gene duplication. Because gene duplication occurs within a single organism or species, paralogues are sequences that share a high degree of similarity within the same species. They may have similar or different activities.

Orthologues: Homologous genes or proteins separated by a speciation event. After speciation, one copy of the sequence sorts with one species and the other copy with the other species, and each copy then diverges within its own species. Consequently, orthologues are genes or proteins that share a high degree of similarity between different species.
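The percent similarity calculation used in the definitions can be checked with a few lines of Python. This is only a minimal sketch: it assumes the two sequences are already aligned, have the same length and contain no gaps, and the function name percent_identity is our own choice rather than part of any tool used in this exercise. The two sequences are those of alignment B.

```python
def percent_identity(seq_a, seq_b):
    """Count identical positions in two pre-aligned, equal-length sequences.

    Returns (number of matches, percent identity).
    Assumption: no gaps, and position i of seq_a is aligned with position i of seq_b.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.lower(), seq_b.lower()))
    return matches, 100.0 * matches / len(seq_a)


# Sequences from alignment B in the definitions above.
query = "atgcctgaaggcctattgtttcccagtcgattggctgct"
subject_b = "atgcctcggcttatattgtatcccagtccattggcagcg"

matches, percent = percent_identity(query, subject_b)
print(f"{matches} of {len(query)} matches ({percent:.0f}%)")

# Threshold for nucleotide sequences as stated in the definitions (70%).
print("homologous" if percent >= 70.0 else "not homologous")
```

Comparing a sequence with itself, as in alignment A, gives 39 matches and 100%.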

FINDING HOMOLOGS STARTING FROM A NUCLEOTIDE SEQUENCE

1. For this exercise you will use the sequence with the mRNA accession number NM_000558. Obtain the accession number, source organism and FASTA sequence.
2. From the nucleotide record, click on the link "Run BLAST" under the heading "Analyze this sequence" on the right side of the page. This should bring you to the following page:

   [Figure: BLAST search page, with the options to select marked by red boxes]

3. Choose the options indicated by the red boxes in the figure above. Then click on "Algorithm parameters", at the bottom of the page, to display more options.
4. Change the following parameters: set "Max target sequences" to 1000 and "Expect threshold" to 100. Click on BLAST to start the search.
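The FASTA record requested in step 1 can also be retrieved by script instead of through the web page. The sketch below is one possible approach, not part of the exercise itself: it assumes access to the NCBI E-utilities efetch service and uses only the Python standard library. The helper names (efetch_fasta_url, fetch_fasta, parse_fasta) are our own, and the toy record at the end is made up purely to show the parser.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the NCBI E-utilities efetch service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession, db="nuccore"):
    """Build the efetch URL that returns a record in FASTA format.

    db="nuccore" is for nucleotide records; db="protein" is for protein records.
    """
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH_URL + "?" + urlencode(params)


def fetch_fasta(accession, db="nuccore"):
    """Download the FASTA text for one accession (requires network access)."""
    with urlopen(efetch_fasta_url(accession, db)) as response:
        return response.read().decode("utf-8")


def parse_fasta(text):
    """Return (header, sequence) for the first record in a FASTA text."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("text does not start with a FASTA header line")
    sequence_lines = []
    for line in lines[1:]:
        if line.startswith(">"):  # stop at the next record
            break
        sequence_lines.append(line)
    return lines[0][1:], "".join(sequence_lines)


# URL for the accession used in this exercise (nothing is downloaded here).
print(efetch_fasta_url("NM_000558"))

# Hypothetical toy record, only to illustrate parse_fasta.
header, sequence = parse_fasta(">toy_record example only\nATGC\nGGTA\n")
```

To download the record itself, call fetch_fasta("NM_000558") and pass the result to parse_fasta. The header line of the returned record names the accession and the source organism.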

5. Once you have obtained the BLAST results, as shown in the figure below, click on "Taxonomy reports" to display the different organisms in which sequence similarities were found.

   [Figure: BLAST results page]

6. A new page will appear. Find the entry for Bos taurus (domestic cow). Note the number of hits and click on it to list those records.
7. Find the first nucleotide record for hemoglobin, alpha 2 (HBA) mRNA. Obtain the accession number, source organism and FASTA sequence.
8. Use the same approach to obtain the first nucleotide record for hemoglobin, alpha 1 mRNA from Gallus gallus (chicken).
9. Use the approach we have seen in the first bioinformatics exercise to obtain the nucleotide record for human fetal hemoglobin (hemoglobin subunit gamma 1). Obtain the accession number, source organism and FASTA sequence.

DETERMINING THE LEVEL OF SIMILARITY AT THE NUCLEOTIDE LEVEL

1. You should have saved four nucleotide sequences: two from humans, one from the domestic cow and one from the chicken. To determine the level of similarity at the nucleotide level, we will use the program Clustal Omega to perform a sequence alignment. Copy and paste each of the nucleotide sequences in FASTA format into the query box. Make sure to choose the option DNA.
2. Click Submit to view the alignment.
3. On the menu at the top of the page, click on "Results summary" and then on "Percent identity matrix" to obtain the percentage of identity between the different sequences. See the example below.

   Percent Identity Matrix - created by Clustal2.1

   1: gi 302408715   100.00    18.58    64.66
   2: gi 189202936    18.58   100.00    62.75
   3: gi 185698558    64.66    62.75   100.00

4. These results are pairwise comparisons between the different sequences: the value in row i, column j is the percentage identity between sequence i and sequence j. Obtain from this file the percentage identity between each of the following pairs: human-alpha to human-gamma, human-alpha to cow-alpha, and human-alpha to chicken-alpha.
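Reading a pairwise value out of the percent identity matrix amounts to a row and column lookup. The sketch below parses the example matrix shown in step 3 and looks up one pair. It assumes each row has the form "index: label values..." and that lines starting with "#" are comments. The function names are our own, not part of Clustal Omega.

```python
# Rows of the example matrix from step 3.
EXAMPLE_MATRIX = """\
1: gi 302408715 100.00 18.58 64.66
2: gi 189202936 18.58 100.00 62.75
3: gi 185698558 64.66 62.75 100.00
"""


def parse_identity_matrix(text):
    """Return (labels, matrix) from percent identity matrix text.

    Assumption: the matrix is square, so the last N tokens of each of the
    N rows are the numeric values and the tokens before them (after the
    leading "1:", "2:", ... index) form the sequence label.
    """
    rows = [
        line.split()
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    size = len(rows)
    labels = [" ".join(tokens[1:-size]) for tokens in rows]
    matrix = [[float(value) for value in tokens[-size:]] for tokens in rows]
    return labels, matrix


def pairwise_identity(labels, matrix, label_a, label_b):
    """Look up the percentage identity between two labelled sequences."""
    return matrix[labels.index(label_a)][labels.index(label_b)]


labels, matrix = parse_identity_matrix(EXAMPLE_MATRIX)
value = pairwise_identity(labels, matrix, "gi 302408715", "gi 185698558")
print(value)
```

The matrix is symmetric, so the order of the two labels does not matter, and every sequence is 100% identical to itself on the diagonal.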

DETERMINING THE LEVEL OF SIMILARITY AT THE PROTEIN LEVEL

1. From each of the nucleotide records obtained above, obtain the corresponding protein record and its FASTA sequence.
2. Repeat the alignment in Clustal Omega with the protein sequences. Make sure to choose the option Protein this time.
3. You will notice that the alignment is displayed somewhat differently this time. The symbols shown with the alignment are interpreted as follows:
   "*"  The amino acids are identical.
   ":"  A conserved substitution is observed: a different amino acid that shares the same charge and shape.
   "."  A semi-conserved substitution is observed: a different amino acid that shares either the same charge or the same shape.
4. Obtain the percentage identity between each of the following protein pairs: human-alpha to human-gamma, human-alpha to cow-alpha, and human-alpha to chicken-alpha.
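The symbols described in step 3 can be tallied to get a quick overview of how conserved an alignment is. The following is a rough sketch only: it assumes you have copied the symbol line of the alignment into a string, that a blank position means no consensus at that column, and the consensus string used here is a made-up toy example, not the output of a real alignment.

```python
# Meaning of each symbol on the consensus line, as described in step 3.
# Assumption: a space marks a column with no conservation.
SYMBOL_MEANING = {
    "*": "identical",
    ":": "conserved",
    ".": "semi-conserved",
    " ": "no consensus",
}


def summarize_consensus(consensus):
    """Count how many alignment columns fall into each conservation class."""
    summary = {meaning: 0 for meaning in SYMBOL_MEANING.values()}
    for symbol in consensus:
        if symbol not in SYMBOL_MEANING:
            raise ValueError(f"unexpected consensus symbol: {symbol!r}")
        summary[SYMBOL_MEANING[symbol]] += 1
    return summary


# Hypothetical consensus line covering seven alignment columns.
toy_consensus = "**:*. *"
summary = summarize_consensus(toy_consensus)
for meaning, count in summary.items():
    print(f"{meaning}: {count}")
```

Note that this tally describes columns across all aligned sequences at once. The pairwise percentages requested in step 4 still come from the percent identity matrix.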

FINDING PROTEIN HOMOLOGS STARTING FROM A PROTEIN SEQUENCE

1. For this exercise you will use the sequence with the protein accession number AAA82165. Obtain the corresponding source organism and FASTA sequence.
2. From the protein record page, click on the link "Run BLAST" under the heading "Analyze this sequence" on the right side of the page.
3. Using the same parameters as in the previous exercise, use blastp to find and obtain the FASTA protein sequences for each of the following organisms:
   Bos taurus (domestic cow)
   Mus musculus (mouse)
4. Use the approach we have seen in the first bioinformatics exercise to obtain the protein record for the alcohol dehydrogenase of Saccharomyces cerevisiae (yeast; hint: it is classified as a fungus).
5. As you did previously, use Clustal Omega to determine the percentage identity between the following pairs of proteins: human to cow, human to mouse, and human to yeast.

FINDING NUCLEOTIDE SEQUENCES WHICH CODE FOR PROTEINS WITH SIMILAR FUNCTIONS STARTING FROM A PROTEIN SEQUENCE

1. For this exercise you will use the sequence with the protein accession number AAA82165. Obtain the corresponding source organism and FASTA sequence.
2. From the protein record page, click on the link "Run BLAST" under the heading "Analyze this sequence" on the right side of the page.
3. This time, choose the option tblastn among the different BLAST options.
4. Using the same algorithm parameters as before, find and obtain the FASTA nucleotide sequence for ADH 7 of Myotis ricketti (bats), as well as the corresponding FASTA protein sequence.
5. Obtain the FASTA nucleotide sequence and the FASTA protein sequence from the record with the accession number U09623.
6. As you did previously, use Clustal Omega to determine the percentage identity between the nucleotide sequences and between the protein sequences, then answer the following questions:
   a. What type of homologues are the nucleotide sequences?
   b. What type of homologues are the protein sequences?
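The decision you make for questions a and b follows directly from the definitions at the start of this handout. The sketch below writes that decision out as a small function. It is a simplification under stated assumptions: it uses only the 70% and 25% thresholds and the same-species or different-species rule given in the definitions, the function name classify_pair is our own, and the percentages in the example calls are made-up values, not results from this exercise.

```python
# Homology thresholds from the definitions (percent similarity).
THRESHOLDS = {"nucleotide": 70.0, "protein": 25.0}


def classify_pair(percent_identity, molecule, same_species, same_activity):
    """Classify a pair of sequences using the handout's definitions.

    molecule: "nucleotide" or "protein"
    same_species: True if both sequences come from the same species
    same_activity: True if both gene products display the same activity
    """
    if molecule not in THRESHOLDS:
        raise ValueError("molecule must be 'nucleotide' or 'protein'")
    if percent_identity >= THRESHOLDS[molecule]:
        # Homologous: paralogues within a species, orthologues between species.
        return "paralogues" if same_species else "orthologues"
    # Below the threshold: same activity gives analogues, otherwise heterologues.
    return "analogues" if same_activity else "heterologues"


# Hypothetical values, for illustration only.
print(classify_pair(72.0, "nucleotide", same_species=False, same_activity=True))
print(classify_pair(20.0, "protein", same_species=False, same_activity=True))
```

The same pair of genes can fall into different classes at the nucleotide and protein levels, because the two thresholds differ.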