Profiles. Evolutionarily related sequences (orthologs and paralogs) are often identified with local alignment programs such as BLAST, FASTA, and SSEARCH.

Profiles
Tore Samuelsson, Nov 9

Background

Evolutionarily related sequences (orthologs and paralogs) are often identified with local alignment programs such as BLAST, FASTA, and SSEARCH. However, these methods are not always sufficient. In many cases the amino acid sequences of related proteins have diverged significantly, although the fold of the proteins is preserved.

Amino acid sequences may change rapidly during evolution although 3D structure is preserved

Species A  ATGGCAAAACTTGAAAAACTGAATCAAGCAGGCCTGATGGTCGCTGGT
           M A K L E K L N Q A G L M V A G   %
Species B  ATGGCTAGGTTGGAGAAGAUAAACCAAGCTGGGATAATAGTTGCAGGA
           M V R L E K I N Q A G L L V A G   9%
Species C  M V R I Q K I N E K G A L L A G   8%
Species D  Q V R I Q K I Y E K G A L L A A   9% ("twilight zone")
Species E  Q V R I Q K I Y E K T A L L F A   % ("midnight zone")

In a BLAST search, evolutionarily related proteins may have very poor E-values

Sequences producing significant alignments:                Score (bits)  E Value

SRC_HUMAN (P9) Proto-oncogene tyrosine-protein kinase Src (E... e-
YES_HUMAN (P9) Proto-oncogene tyrosine-protein kinase Yes (E... e-8
FYN_HUMAN (P) Proto-oncogene tyrosine-protein kinase Fyn (E... e-
FGR_HUMAN (P99) Proto-oncogene tyrosine-protein kinase FGR (E... 9 e-
HCK_HUMAN (P8) Tyrosine-protein kinase HCK (EC...) (p... e-
LCK_HUMAN (P9) Proto-oncogene tyrosine-protein kinase LCK (E... 8e-
LYN_HUMAN (P98) Tyrosine-protein kinase Lyn (EC...) e-
BLK_HUMAN (P) Tyrosine-protein kinase BLK (EC...) (B... e-
FRK_HUMAN (P8) Tyrosine-protein kinase FRK (EC...) (N... 9 e
SHC_HUMAN (Q99) SHC transforming protein (SH domain prote....
SHA_HUMAN (Q9NP) SH domain protein A (T cell-specific adap....
CHIN_HUMAN (P88) N-chimaerin (NC) (N-chimerin) (Alpha chimeri....
APS_HUMAN (O9) SH and PH domain-containing adapter protein....
CISH_HUMAN (Q9NSE) Cytokine-inducible SH-containing protein (C....
SOCS_HUMAN (O) Suppressor of cytokine signaling (SOCS-)
CHIO_HUMAN (P) Beta-chimaerin (Beta-chimerin) (Rho-GTPase-a
LIPL_HUMAN (P88) Lipoprotein lipase precursor (EC...) (
SOCS_HUMAN (O9) Cytokine inducible SH-containing protein....
ATPF_HUMAN (Q8NM) ATP synthase mitochondrial F complex assem....
TENS_HUMAN (Q9HBL) Tensin.8
FBWA_HUMAN (Q9Y9) F-box/WD-repeat protein A (F-box and WD-re....8
STAT_HUMAN (P) Signal transducer and activator of transcri....
SOCS_HUMAN (O) Suppressor of cytokine signaling (SOCS-)....

Profile-based searches are more effective at identifying remote sequence similarity.

Profiles are generated from multiple alignments, and they are one of many applications of multiple alignments:

* Profiles
* Identifying conserved motifs and patterns (PROSITE)
* Phylogenetic studies
* Prediction of protein secondary structure

Multiple alignments are generated by methods like ClustalW and T-Coffee.

Terminology

Profile. A matrix where the numbers reflect the probabilities of characters appearing in a certain position in a multiple alignment.

PSSM. "Position-specific scoring matrix". More or less synonymous with "profile", but sometimes a "profile" refers to a matrix where gaps are also taken into account. Sometimes also called a weight matrix or frequency matrix.

Position-Specific Scoring Matrix (PSSM)

Multiple alignment of 5' splice site sequences:

 1  GAGGTAAAC
 2  TCCGTAAGT
 3  CAGGTTGGA
 4  ACAGTCAGT
 5  TAGGTCATT
 6  TAGGTACTG
 7  ATGGTAACT
 8  CAGGTATAC
 9  TGTGTGAGT
10  AAGGTAAGT

Calculate the absolute frequency of each nucleotide at each position:

     1   2   3   4   5   6   7   8   9
A    3   6   1   0   0   6   7   2   1
C    2   2   1   0   0   2   1   1   2
G    1   1   7  10   0   1   1   5   1
T    4   1   1   0  10   1   1   2   6
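The counting step can be sketched in a few lines of Python. This is only an illustration: the names `ALIGNMENT` and `count_matrix` are hypothetical and not part of the slides, and the input is the ten splice-site sequences listed in the alignment.

```python
from collections import Counter

# The ten splice-site sequences from the multiple alignment on the slide.
ALIGNMENT = [
    "GAGGTAAAC", "TCCGTAAGT", "CAGGTTGGA", "ACAGTCAGT", "TAGGTCATT",
    "TAGGTACTG", "ATGGTAACT", "CAGGTATAC", "TGTGTGAGT", "AAGGTAAGT",
]
ALPHABET = "ACGT"


def count_matrix(seqs, alphabet=ALPHABET):
    """Return {base: [count at position 1, count at position 2, ...]}."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    # zip(*seqs) walks the alignment column by column.
    columns = [Counter(column) for column in zip(*seqs)]
    # A Counter returns 0 for bases that never occur in a column.
    return {base: [col[base] for col in columns] for base in alphabet}


counts = count_matrix(ALIGNMENT)
for base in ALPHABET:
    print(base, counts[base])
```

Dividing each count by the number of sequences (10 here) gives the relative frequencies used in the next step.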

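Calculate the relative frequency of each nucleotide at each position:

     1    2    3    4    5    6    7    8    9
A   0.3  0.6  0.1  0.0  0.0  0.6  0.7  0.2  0.1
C   0.2  0.2  0.1  0.0  0.0  0.2  0.1  0.1  0.2
G   0.1  0.1  0.7  1.0  0.0  0.1  0.1  0.5  0.1
T   0.4  0.1  0.1  0.0  1.0  0.1  0.1  0.2  0.6

What is the probability of finding CAGGTTGGA? It is the product of the frequency of each nucleotide at each position:

0.2 * 0.6 * 0.7 * 1.0 * 1.0 * 0.1 * 0.1 * 0.5 * 0.1 = 0.000042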
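As a rough check of this product, the following Python sketch recomputes the relative frequencies from the ten splice-site sequences and multiplies them along CAGGTTGGA. The function names are hypothetical, chosen only for this illustration.

```python
from math import prod

# The ten splice-site sequences from the multiple alignment on the slide.
ALIGNMENT = [
    "GAGGTAAAC", "TCCGTAAGT", "CAGGTTGGA", "ACAGTCAGT", "TAGGTCATT",
    "TAGGTACTG", "ATGGTAACT", "CAGGTATAC", "TGTGTGAGT", "AAGGTAAGT",
]


def relative_frequencies(seqs, alphabet="ACGT"):
    """Return one {base: frequency} dict per alignment column."""
    n = len(seqs)
    return [{b: col.count(b) / n for b in alphabet} for col in zip(*seqs)]


def sequence_probability(seq, freqs):
    """Multiply the per-position frequencies of the bases in seq."""
    return prod(freqs[pos][base] for pos, base in enumerate(seq))


freqs = relative_frequencies(ALIGNMENT)
p = sequence_probability("CAGGTTGGA", freqs)
print(f"{p:.2e}")
```

A sequence containing any base never observed at its position (for example anything other than G at position 4) gets probability exactly zero, which is the problem pseudocounts address later.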

Compute the log-odds ratios log(M_ij / P_i), where

M_ij = probability of nucleotide i at position j
P_i = background probability of nucleotide i

For this example we assume P_i = 0.25, the same for all four nucleotides.

Scoring with a PSSM

We want to analyze the sequence GTAGTAGAAGGTAAGTGTCCGTAG with the profile. We examine a window the size of the profile, moving it along the sequence one position at a time.
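The log-odds conversion and the sliding-window scan can be sketched as follows. Several choices here are assumptions rather than statements from the slides: the natural logarithm is used (the transcription does not preserve the base), the background is uniform at 0.25, zero frequencies score minus infinity because no pseudocounts are applied, and the names `log_odds_matrix` and `scan` are hypothetical.

```python
import math

# The ten splice-site sequences from the multiple alignment on the slide.
ALIGNMENT = [
    "GAGGTAAAC", "TCCGTAAGT", "CAGGTTGGA", "ACAGTCAGT", "TAGGTCATT",
    "TAGGTACTG", "ATGGTAACT", "CAGGTATAC", "TGTGTGAGT", "AAGGTAAGT",
]
BACKGROUND = 0.25  # assumed uniform background probability P_i


def log_odds_matrix(seqs, background=BACKGROUND, alphabet="ACGT"):
    """Return one {base: log(M_ij / P_i)} dict per alignment column."""
    n = len(seqs)
    matrix = []
    for col in zip(*seqs):
        scores = {}
        for base in alphabet:
            freq = col.count(base) / n
            # Natural log is an assumption; unseen bases score -infinity.
            scores[base] = math.log(freq / background) if freq > 0 else -math.inf
        matrix.append(scores)
    return matrix


def scan(sequence, matrix):
    """Yield (start, window, score) for every window of profile width."""
    width = len(matrix)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i][base] for i, base in enumerate(window))
        yield start, window, score


pssm = log_odds_matrix(ALIGNMENT)
hits = list(scan("GTAGTAGAAGGTAAGTGTCCGTAG", pssm))
best_start, best_window, best_score = max(hits, key=lambda hit: hit[2])
print(best_start, best_window, round(best_score, 2))
```

Because log-odds scores add, summing them over a window is equivalent to taking the log of the product of frequency ratios, which is why the scan uses a sum instead of a product.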

Find the score for each window of GTAGTAGAAGGTAAGTGTCCGTAG.

[Slide figure: the PSSM applied to three windows of GTAGTAGAAGGTAAGTGTCCGTAG, each shown with its score.]

Pseudocounts

[Slide figure: PSSM tables illustrating pseudocounts.]

Position-Specific Scoring Matrix (PSSM): Pseudocounts

With a very large number of sequences in the multiple alignment, an observed amino acid/nucleotide frequency is expected to be approximately equal to the actual probability of finding that amino acid/nucleotide. However, in most cases the number of sequences is limited, so that for some amino acids/nucleotides the observed frequency = 0 whereas the actual probability should be > 0. For this reason fake counts, pseudocounts, are added to avoid zero probabilities. For instance, one simple solution is to add 1 to all counts.

A more sophisticated approach:

$$ q_{u,a} = \frac{n_{u,a} + \beta \, p_a}{N_{seq} + \beta} $$

where
q_{u,a} = estimated probability of residue type a occurring in column u
p_a = frequency of occurrence of residue type a (based on the composition of proteins/DNA)
n_{u,a} = count of residues a in column u
N_seq = total number of sequences
\beta = scaling parameter
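A minimal sketch of the scaled pseudocount estimate, assuming the form q = (n + beta * p) / (N_seq + beta) with a single scaling parameter beta. The value beta = 1.0, the uniform DNA background, and the function name are illustrative assumptions, not values taken from the slides.

```python
def pseudocount_probabilities(column, background, beta=1.0):
    """Estimate q_{u,a} = (n_{u,a} + beta * p_a) / (N_seq + beta) for one column.

    column     -- string holding the residues of one alignment column
    background -- {residue: p_a}, background frequencies summing to 1
    beta       -- scaling parameter (1.0 is an arbitrary illustrative choice)
    """
    n_seq = len(column)
    return {
        residue: (column.count(residue) + beta * p_a) / (n_seq + beta)
        for residue, p_a in background.items()
    }


# Assumed uniform background for DNA.
UNIFORM_DNA = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

# A fully conserved column: ten sequences, all G.
column_4 = "GGGGGGGGGG"
q = pseudocount_probabilities(column_4, UNIFORM_DNA, beta=1.0)
for residue, value in q.items():
    print(residue, round(value, 4))
```

With these inputs no residue has probability zero any more, and the estimates still sum to 1 because the background frequencies do. A larger beta pulls the estimates further toward the background composition.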

Representing a profile as a sequence logo

Amount of uncertainty in column u:

$$ H_u = -\sum_a f_{u,a} \log_2 f_{u,a} $$

where H_u is the uncertainty at position u, a is one of the four bases or, in the case of proteins, one of the amino acids, and f_{u,a} is the frequency of base (amino acid) a in column u.

Total information at position u is represented by the decrease in uncertainty:

$$ I_u = \log_2 20 - H_u \quad \text{(proteins)} $$
$$ I_u = \log_2 4 - H_u \quad \text{(DNA)} $$

where I_u is the amount of information present at column u, and log2 20 (or log2 4) is the maximum uncertainty at any given position. The entire set of I_u values forms a curve that represents the importance of the various positions. The height of this curve is the height of the logo at that position. The size of each base/amino acid printed in a logo is determined by multiplying its frequency by the total information at that position:

Height of base/amino acid a at position u = f_{u,a} * I_u

The bases/amino acids are then stacked on top of each other in increasing order of their frequencies and plotted.

Sequence logo example. Consider the simple amino acid multiple sequence alignment:

Seq1  A A A
Seq2  A A S
Seq3  A G T
Seq4  A G G

We use H_u = -\sum_a f_{u,a} \log_2 f_{u,a} for each of the columns of the multiple alignment:

H_1 = -(1 * log2 1) = 0
H_2 = -((0.5 * log2 0.5) + (0.5 * log2 0.5)) = 1
H_3 = -(4 * 0.25 * log2 0.25) = 2

Total height of columns: I_u = log2 20 - H_u

I_1 = 4.32 - 0
I_2 = 4.32 - 1
I_3 = 4.32 - 2

Sequence logo example, cont.

Height of A at position 1 = f_A * I_1 = 1 * I_1
Height of A at position 2 = f_A * I_2 = 0.5 * I_2
Height of G at position 2 = f_G * I_2 = 0.5 * I_2
Height of A at position 3 = f_A * I_3 = 0.25 * I_3
etc.

I_1 = 4.32
I_2 = 3.32
I_3 = 2.32

This is the sequence logo obtained if the alignment above is used and "small sample correction" is deselected.

[Slide figure: sequence logo for the example alignment.]

[Slide figure: sequence logo, 5' splice site example.]
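The worked logo example can be reproduced with a short Python sketch. It assumes the example alignment reads as the four sequences AAA, AAS, AGT and AGG, uses base-2 logarithms with a maximum uncertainty of log2 20 for proteins, and applies no small-sample correction. The function name `column_heights` is hypothetical.

```python
import math

# The example alignment, read as four three-residue sequences (an assumption
# about how the flattened slide text should be laid out).
ALIGNMENT = ["AAA", "AAS", "AGT", "AGG"]


def column_heights(column, alphabet_size=20):
    """Return (I_u, {residue: height}) for one alignment column.

    H_u = -sum(f * log2 f), I_u = log2(alphabet_size) - H_u,
    and the height of residue a is f_{u,a} * I_u.
    """
    n = len(column)
    freqs = {a: column.count(a) / n for a in set(column)}
    uncertainty = -sum(f * math.log2(f) for f in freqs.values())
    information = math.log2(alphabet_size) - uncertainty
    return information, {a: f * information for a, f in freqs.items()}


results = [column_heights(col) for col in zip(*ALIGNMENT)]
for position, (information, heights) in enumerate(results, start=1):
    print(position, round(information, 2),
          {a: round(h, 2) for a, h in sorted(heights.items())})
```

Passing `alphabet_size=4` gives the DNA case, where a fully conserved column reaches the maximum of 2 bits.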

[Slide figure: sequence logo, translation start site in bacteria.]

Methods that take into account position-specific information from multiple alignments

1997: PSI-BLAST (Altschul et al.)
1990s: Profile HMMs (S. Eddy) => HMMER software

Principle of PSI-BLAST

Query sequence -> "normal" BLAST search -> database hits -> use hits above the E-value cutoff -> PSSM -> BLAST search with PSSM -> database hits -> use hits above the cutoff -> iterate

PSI-BLAST

PSI-BLAST is an important tool to identify remote protein similarity. It proceeds by way of the following steps:

(1) PSI-BLAST takes as an input a single protein sequence and compares it to a protein database, using the gapped BLAST program.
(2) The program constructs a multiple alignment, and then a profile, from any significant local alignments found. The original query sequence serves as a template for the multiple alignment and profile, whose lengths are identical to that of the query.
(3) The profile is compared to the protein database, again seeking local alignments. After a few minor modifications, the BLAST algorithm can be used for this directly.
(4) PSI-BLAST estimates the statistical significance of the local alignments found. Because profile substitution scores are constructed to a fixed scale, and gap scores remain independent of position, the statistical theory and parameters for gapped BLAST alignments remain applicable to profile alignments.
(5) Finally, PSI-BLAST iterates, by returning to step (2), an arbitrary number of times or until convergence.

Profile-alignment statistics allow PSI-BLAST to proceed as a natural extension of BLAST; the results produced in iterative search steps are comparable to those produced from the first pass.

Advantage: Unlike most profile-based search methods, PSI-BLAST runs as one program, starting with a single protein sequence, and the intermediate steps of multiple alignment and profile construction are invisible to the user.

PSI-BLAST: constructing the profile

Pairwise alignments from PSI-BLAST:

  Query  MKDRNLGEK
  Sbjct  MKD-NLAEK

  Query  MKD-RNLGEK
  Sbjct  MKEARNLAEK

Query-anchored multiple alignment:

  Query  MKD-RNLGEK
  Sbjct  MKD--NLAEK
  Sbjct  MKEARNLAEK
            ^ disregarded

The column in which a database sequence has an insertion relative to the query is disregarded, so the profile keeps the same length as the query.
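The query-anchored construction can be illustrated with a small Python sketch. It is not PSI-BLAST's actual code: it simply drops every column in which the query has a gap, and it assumes each pairwise alignment spans the whole query (true for the two alignments shown, but not for local alignments in general). The function name is hypothetical.

```python
def query_anchored_rows(pairwise_alignments):
    """Stack subject sequences under the ungapped query.

    pairwise_alignments -- list of (aligned_query, aligned_subject) pairs,
    each pair of equal length with '-' marking gaps.
    Columns where the query has a gap (insertions in the subject) are dropped.
    """
    query = pairwise_alignments[0][0].replace("-", "")
    rows = [query]
    for aligned_query, aligned_subject in pairwise_alignments:
        if len(aligned_query) != len(aligned_subject):
            raise ValueError("aligned sequences must have equal length")
        if aligned_query.replace("-", "") != query:
            raise ValueError("every alignment must cover the same full query")
        rows.append("".join(
            s for q, s in zip(aligned_query, aligned_subject) if q != "-"
        ))
    return rows


pairs = [
    ("MKDRNLGEK", "MKD-NLAEK"),
    ("MKD-RNLGEK", "MKEARNLAEK"),
]
rows = query_anchored_rows(pairs)
for row in rows:
    print(row)
```

Every row now has the length of the query, so the counts for the profile can be taken column by column exactly as in the PSSM example.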

PSI-BLAST tutorial


"This analysis illustrates not only how the search for sequence relatives can reveal the function of a protein, but also how similarity searching serves to unify formerly disparate members of a database"

Yeast: Pop Pop Pop Rpp Rpr Pop Pop Pop8 Pop
Man: hpop Rpp9 hpop Rpp Rpp Rpp Rpp8 Rpp Rpp

The outcome of PSI-BLAST depends on the query sequence: only some Pop homologues, when used as query, identify Rpp

Results from round

Sequences producing significant alignments:
Sequences used in model and found again:                   Score (bits)  E Value

POP_Pichia_stipitis 8_pichia_stipitis_FM.aa.fasta unnamed p... e-
ref XP_9. PREDICTED: similar to ribonuclease P kda subu... e-
ref XP_98. PREDICTED: similar to ribonuclease P (predicte... e-
ref NP_. ribonuclease P kda subunit [Homo sapiens] >gi... e-
ref XP_. PREDICTED: similar to RPP protein [Pan troglo... 9 e-
dbj BAA98. unnamed protein product [Homo sapiens] 9 e-
POP_Pichia_guilliermondii supercont_ Minus (of... e-
ref XP_. PREDICTED: ribonuclease P (predicted) [Rattus... e-
gb AAH8. Ribonuclease P kda subunit [Mus musculus] >gi... e-
ref XP_8. PREDICTED: similar to ribonuclease P kda subu... e-
gb AAH8. MGC8 protein [Xenopus laevis] e-8
ref XP_898. PREDICTED: similar to RPP protein [Gallus gal... 9e-8

Profiles: Example with HTH (helix turn helix) motif

TPRQKVAIIY DVGVSTLYKR FP
IPRKQVAIIY DVAVSTLYKK FP
HPRQQLAIIF GIGVSTLYRY FP
GSKTKLAQAA GIRLASLYSW KG
TTFKQIALES GLSTGTISSF IN
IPYQEFAKLI GKSTGAVRRM ID
VTLQQFAELE GVSERTAYRW TT
FTYNQYAQMM NISRENAYGV LA
LGASHISKTM NIARSTYVKV IN
TGATEIAHQL SIARSTVYKI LE
ISISAIAREF NTTRQTILRV KA
GNISALADAE NISRKIITRC IN
MVLADIAQAV EMHESTISRV TT
LVLHDIAEAV GMHESTISRV TT
LNLRIVADAI KMHESTVSRV TS
MTRGDIGNYL GLTVETISRL LG
LSLSALSRQF GYAPTTLANA LE
MSLAELGRSN GLSSSTLKNA LD
FDIASVAQHV CLSPSRLSHL FR
LRIDEVARHV CLSPSRLAHL FR
VTLEALADQV AMSPFHLHRL FK
VLYPDIAKKF NTTASRVERA IR

Result of scoring with HTH profile

>lcl AADR_RHOPA (Q98) Transcriptional activatory protein aadr (Anaerobic aromatic degradation regulator)
>lcl AGLR_RHIME (Q9ZR) HTH-type transcriptional regulator aglr
>lcl ANR_PSEAE (P9) Transcriptional activator protein anr
>lcl ARAC_ERWCH (P) Arabinose operon regulatory protein
>lcl ASCG_ECOLI (P) HTH-type transcriptional regulator ascg (Cryptic asc operon repressor)
>lcl CCPA_BACME (P88) Glucose-resistance amylase regulator (Catabolite control protein)
>lcl CCPA_BACSU (P) Catabolite control protein A (Glucose-resistance amylase regulator)
>lcl CCPA_STRMU (O9) Probable catabolite control protein A
>lcl CCPB_BACSU (P) Catabolite control protein B
>lcl CENPB_CRIGR (P8988) Major centromere autoantigen B (Centromere protein B) (CENP-B)
>lcl CENPB_HUMAN (P99) Major centromere autoantigen B (Centromere protein B) (CENP-B)
>lcl CENPB_MOUSE (P9) Major centromere autoantigen B (Centromere protein B) (CENP-B)
>lcl DEGA_BACSU (P9) HTH-type transcriptional regulator dega (Degradation activator)
>lcl DEOR_BACSU (P9) Deoxyribonucleoside regulator
>lcl EBGR_ECOLI (P8) HTH-type transcriptional regulator ebgr (Ebg operon repressor)
>lcl ENDR_PAEPO (P8) Probable HTH-type transcriptional regulator endr
>lcl ETRA_SHEON (P8) Electron transport regulator A
>lcl FECI_ECOLI (P8) Probable RNA polymerase sigma factor feci
>lcl FLP_LACCA (P98) Probable transcriptional regulator flp
>lcl FNRA_PSEST (P) Transcriptional activator protein fnra
>lcl FNRN_RHILV (P9) Probable transcriptional activator (ORF-)
>lcl FNR_ACTAC (Q9EXQ) Anaerobic regulatory protein
>lcl FNR_ECO (PA9E) Fumarate and nitrate reduction regulatory protein
>lcl FNR_ECOL (PA9E) Fumarate and nitrate reduction regulatory protein
>lcl FNR_ECOLI (PA9E) Fumarate and nitrate reduction regulatory protein
>lcl FNR_HAEIN (P99) Anaerobic regulatory protein
>lcl FNR_KLEOX (Q9AQ) Fumarate nitrate reduction regulatory protein
>lcl FNR_PASMU (Q9CMY) Anaerobic regulatory protein

Methods that take into account position-specific information from multiple alignments

1997: PSI-BLAST (Altschul et al.)
1990s: Profile HMMs (S. Eddy) => HMMER software

Profile HMMs: HMMER software package

hmmbuild   Build a model from a multiple sequence alignment.
hmmpfam    Search an HMM database for matches to a query sequence.
hmmsearch  Search a sequence database for matches to a single profile HMM.

Pfam database: an attempt to completely and accurately classify protein families and domains

"All science is either physics or stamp collecting" (Ernest Rutherford)

Profile HMMs: Pfam database

Pfam is a collection of multiple sequence alignments and profile hidden Markov models (HMMs). Each Pfam HMM represents a protein family or domain. By searching a protein sequence against the Pfam library of HMMs you can find out its domain architecture. Pfam may also be used to analyse proteomes and domain architectures.

Two categories of families:

Pfam-A. Pfam-A families are manually curated HMM-based families which are built using an alignment of a small number of representative sequences (the "seed" alignment). A threshold is manually set for each HMM, and this determines the minimum score a sequence must attain to belong to the family. The HMMs are searched against the UniProt database, and all sequences that score above the cut-off value for a particular family are included in the family's full alignment. Pfam-A matches are very unlikely to be false matches.

Pfam-B. To complement the Pfam-A families, Pfam-B families are automatically generated using the PRODOM database. Pfam-B families are formed by taking alignments of sequence segments from PRODOM and removing any Pfam-A residues from them. (PRODOM is a database of protein domain sequence families constructed using PSI-BLAST analysis of protein sequences as well as information from the SCOP database.)

All families in Pfam are non-overlapping, such that no amino acid belongs to more than one family/domain.

Two HMMs for each Pfam entry. For each Pfam entry two HMMs are built, one to represent full-length matches (ls model), and one to represent fragment matches (fs model).

Complexity of Pfam

, PfamA families
,, protein sequences in Uniprot analyzed
=> on average ~. PfamA domains/protein
a total of, different architectures

Databases related to Pfam

PRODOM
CDD
SMART (smart.embl-heidelberg.de/)
INTERPRO, which combines information from: Pfam, Prints, SMART, Prosite, PRODOM, CDD

SMART

Databases like InterPro have aided considerably in the annotation of the human genome

Exercises

Compare pairwise alignment methods (BLAST, FASTA, SSEARCH) to profile methods (PSI-BLAST, HMMER).

Protein domain studied: the SH2 domain, originally found in the oncoproteins Src and Fps. SH2 domains are found in many proteins taking part in signal transduction pathways. The function of SH2 domains is to specifically recognize the phosphorylated state of tyrosine residues, thereby allowing SH2 domain-containing proteins to localize to tyrosine-phosphorylated sites.

SH2 domain

Step 1: Extract the SH2 domain from the human SRC protein
  % extractseq src_human.fa

Step 2: Pairwise alignment methods
  1) BLAST: see the PSI-BLAST step below
  2) FASTA
     % fasta [input_file] [database] > result
  3) SSEARCH
     % ssearch [input_file] [database] > result

Step 3: Profile-based methods
  PSI-BLAST
     % blastpgp -i [input_file] -d [database] -j -o output_file
     (1st round: normal BLAST search)
  HMMER
     % hmmsearch shfs.hmm [database] > result_file
     where shfs.hmm is the HMM profile for the SH2 domain

HIT proteins / uridylyltransferases

The Histidine Triad (HIT) motif, His-phi-His-phi-His-phi-phi (phi, a hydrophobic amino acid), was identified as being highly conserved in a variety of species. Proteins in the HIT superfamily are conserved as nucleotide-binding proteins, and are structurally related to a family of enzymes that includes GalT, a uridylyltransferase. This relationship was first revealed by structural analysis, but may also be detected using PSI-BLAST.

Relationship of ATP- and NAD-dependent DNA ligases

ATP-dependent DNA ligases: Eukarya, Archaea
NAD-dependent DNA ligases: Eubacteria

Previously these enzymes were believed to be evolutionarily unrelated, but PSI-BLAST provides evidence that they are related.


More information

Bioinformatics. Dept. of Computational Biology & Bioinformatics

Bioinformatics. Dept. of Computational Biology & Bioinformatics Bioinformatics Dept. of Computational Biology & Bioinformatics 3 Bioinformatics - play with sequences & structures Dept. of Computational Biology & Bioinformatics 4 ORGANIZATION OF LIFE ROLE OF BIOINFORMATICS

More information

Functional Annotation

Functional Annotation Functional Annotation Outline Introduction Strategy Pipeline Databases Now, what s next? Functional Annotation Adding the layers of analysis and interpretation necessary to extract its biological significance

More information

An Introduction to Sequence Similarity ( Homology ) Searching

An Introduction to Sequence Similarity ( Homology ) Searching An Introduction to Sequence Similarity ( Homology ) Searching Gary D. Stormo 1 UNIT 3.1 1 Washington University, School of Medicine, St. Louis, Missouri ABSTRACT Homologous sequences usually have the same,

More information

Homology Modeling (Comparative Structure Modeling) GBCB 5874: Problem Solving in GBCB

Homology Modeling (Comparative Structure Modeling) GBCB 5874: Problem Solving in GBCB Homology Modeling (Comparative Structure Modeling) Aims of Structural Genomics High-throughput 3D structure determination and analysis To determine or predict the 3D structures of all the proteins encoded

More information

2MHR. Protein structure classification is important because it organizes the protein structure universe that is independent of sequence similarity.

2MHR. Protein structure classification is important because it organizes the protein structure universe that is independent of sequence similarity. Protein structure classification is important because it organizes the protein structure universe that is independent of sequence similarity. A global picture of the protein universe will help us to understand

More information

Syllabus of BIOINF 528 (2017 Fall, Bioinformatics Program)

Syllabus of BIOINF 528 (2017 Fall, Bioinformatics Program) Syllabus of BIOINF 528 (2017 Fall, Bioinformatics Program) Course Name: Structural Bioinformatics Course Description: Instructor: This course introduces fundamental concepts and methods for structural

More information

Introductory course on Multiple Sequence Alignment Part I: Theoretical foundations

Introductory course on Multiple Sequence Alignment Part I: Theoretical foundations Sequence Analysis and Structure Prediction Service Centro Nacional de Biotecnología CSIC 8-10 May, 2013 Introductory course on Multiple Sequence Alignment Part I: Theoretical foundations Course Notes Instructor:

More information

Sequence Analysis, '18 -- lecture 9. Families and superfamilies. Sequence weights. Profiles. Logos. Building a representative model for a gene.

Sequence Analysis, '18 -- lecture 9. Families and superfamilies. Sequence weights. Profiles. Logos. Building a representative model for a gene. Sequence Analysis, '18 -- lecture 9 Families and superfamilies. Sequence weights. Profiles. Logos. Building a representative model for a gene. How can I represent thousands of homolog sequences in a compact

More information

Sequence Alignments. Dynamic programming approaches, scoring, and significance. Lucy Skrabanek ICB, WMC January 31, 2013

Sequence Alignments. Dynamic programming approaches, scoring, and significance. Lucy Skrabanek ICB, WMC January 31, 2013 Sequence Alignments Dynamic programming approaches, scoring, and significance Lucy Skrabanek ICB, WMC January 31, 213 Sequence alignment Compare two (or more) sequences to: Find regions of conservation

More information

Comparative Features of Multicellular Eukaryotic Genomes

Comparative Features of Multicellular Eukaryotic Genomes Comparative Features of Multicellular Eukaryotic Genomes C elegans A thaliana O. Sativa D. melanogaster M. musculus H. sapiens Size (Mb) 97 115 389 120 2500 2900 # Genes 18,425 25,498 37,544 13,601 30,000

More information

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748 CAP 5510: Introduction to Bioinformatics Giri Narasimhan ECS 254; Phone: x3748 giri@cis.fiu.edu www.cis.fiu.edu/~giri/teach/bioinfs07.html 2/8/07 CAP5510 1 Pattern Discovery 2/8/07 CAP5510 2 Patterns Nature

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION Supplementary information S3 (box) Methods Methods Genome weighting The currently available collection of archaeal and bacterial genomes has a highly biased distribution of isolates across taxa. For example,

More information

HIDDEN MARKOV MODELS FOR REMOTE PROTEIN HOMOLOGY DETECTION

HIDDEN MARKOV MODELS FOR REMOTE PROTEIN HOMOLOGY DETECTION From THE CENTER FOR GENOMICS AND BIOINFORMATICS Karolinska Institutet, Stockholm, Sweden HIDDEN MARKOV MODELS FOR REMOTE PROTEIN HOMOLOGY DETECTION Markus Wistrand Stockholm 2005 All previously published

More information

Bioinformatics Chapter 1. Introduction

Bioinformatics Chapter 1. Introduction Bioinformatics Chapter 1. Introduction Outline! Biological Data in Digital Symbol Sequences! Genomes Diversity, Size, and Structure! Proteins and Proteomes! On the Information Content of Biological Sequences!

More information

SI Materials and Methods

SI Materials and Methods SI Materials and Methods Gibbs Sampling with Informative Priors. Full description of the PhyloGibbs algorithm, including comprehensive tests on synthetic and yeast data sets, can be found in Siddharthan

More information

Alignment principles and homology searching using (PSI-)BLAST. Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU)

Alignment principles and homology searching using (PSI-)BLAST. Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) Alignment principles and homology searching using (PSI-)BLAST Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) http://ibivu.cs.vu.nl Bioinformatics Nothing in Biology makes sense except in

More information

Graph Alignment and Biological Networks

Graph Alignment and Biological Networks Graph Alignment and Biological Networks Johannes Berg http://www.uni-koeln.de/ berg Institute for Theoretical Physics University of Cologne Germany p.1/12 Networks in molecular biology New large-scale

More information

Chapter 12. Genes: Expression and Regulation

Chapter 12. Genes: Expression and Regulation Chapter 12 Genes: Expression and Regulation 1 DNA Transcription or RNA Synthesis produces three types of RNA trna carries amino acids during protein synthesis rrna component of ribosomes mrna directs protein

More information

Introduction to Bioinformatics Online Course: IBT

Introduction to Bioinformatics Online Course: IBT Introduction to Bioinformatics Online Course: IBT Multiple Sequence Alignment Building Multiple Sequence Alignment Lec1 Building a Multiple Sequence Alignment Learning Outcomes 1- Understanding Why multiple

More information

Structure to Function. Molecular Bioinformatics, X3, 2006

Structure to Function. Molecular Bioinformatics, X3, 2006 Structure to Function Molecular Bioinformatics, X3, 2006 Structural GeNOMICS Structural Genomics project aims at determination of 3D structures of all proteins: - organize known proteins into families

More information

Bioinformatics 1--lectures 15, 16. Markov chains Hidden Markov models Profile HMMs

Bioinformatics 1--lectures 15, 16. Markov chains Hidden Markov models Profile HMMs Bioinformatics 1--lectures 15, 16 Markov chains Hidden Markov models Profile HMMs target sequence database input to database search results are sequence family pseudocounts or background-weighted pseudocounts

More information

Sequence Alignment Techniques and Their Uses

Sequence Alignment Techniques and Their Uses Sequence Alignment Techniques and Their Uses Sarah Fiorentino Since rapid sequencing technology and whole genomes sequencing, the amount of sequence information has grown exponentially. With all of this

More information

We have: We will: Assembled six genomes Made predictions of most likely gene locations. Add a layers of biological meaning to the sequences

We have: We will: Assembled six genomes Made predictions of most likely gene locations. Add a layers of biological meaning to the sequences Recap We have: Assembled six genomes Made predictions of most likely gene locations We will: Add a layers of biological meaning to the sequences Start with Biology This will motivate the choices we make

More information

Template-Based 3D Structure Prediction

Template-Based 3D Structure Prediction Template-Based 3D Structure Prediction Sequence and Structure-based Template Detection and Alignment Issues The rate of new sequences is growing exponentially relative to the rate of protein structures

More information

Tools and Algorithms in Bioinformatics

Tools and Algorithms in Bioinformatics Tools and Algorithms in Bioinformatics GCBA815, Fall 2013 Week3: Blast Algorithm, theory and practice Babu Guda, Ph.D. Department of Genetics, Cell Biology & Anatomy Bioinformatics and Systems Biology

More information

Sequence Analysis and Databases 2: Sequences and Multiple Alignments

Sequence Analysis and Databases 2: Sequences and Multiple Alignments 1 Sequence Analysis and Databases 2: Sequences and Multiple Alignments Jose María González-Izarzugaza Martínez CNIO Spanish National Cancer Research Centre (jmgonzalez@cnio.es) 2 Sequence Comparisons:

More information

Hidden Markov Models and Their Applications in Biological Sequence Analysis

Hidden Markov Models and Their Applications in Biological Sequence Analysis Hidden Markov Models and Their Applications in Biological Sequence Analysis Byung-Jun Yoon Dept. of Electrical & Computer Engineering Texas A&M University, College Station, TX 77843-3128, USA Abstract

More information

Hidden Markov Models

Hidden Markov Models Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training

More information

Gene function annotation

Gene function annotation Gene function annotation Paul D. Thomas, Ph.D. University of Southern California What is function annotation? The formal answer to the question: what does this gene do? The association between: a description

More information

Tiffany Samaroo MB&B 452a December 8, Take Home Final. Topic 1

Tiffany Samaroo MB&B 452a December 8, Take Home Final. Topic 1 Tiffany Samaroo MB&B 452a December 8, 2003 Take Home Final Topic 1 Prior to 1970, protein and DNA sequence alignment was limited to visual comparison. This was a very tedious process; even proteins with

More information

Multiple Sequence Alignments

Multiple Sequence Alignments Multiple Sequence Alignments...... Elements of Bioinformatics Spring, 2003 Tom Carter http://astarte.csustan.edu/ tom/ March, 2003 1 Sequence Alignments Often, we would like to make direct comparisons

More information

Procedure to Create NCBI KOGS

Procedure to Create NCBI KOGS Procedure to Create NCBI KOGS full details in: Tatusov et al (2003) BMC Bioinformatics 4:41. 1. Detect and mask typical repetitive domains Reason: masking prevents spurious lumping of non-orthologs based

More information

A Protein Ontology from Large-scale Textmining?

A Protein Ontology from Large-scale Textmining? A Protein Ontology from Large-scale Textmining? Protege-Workshop Manchester, 07-07-2003 Kai Kumpf, Juliane Fluck and Martin Hofmann Instructive mistakes: a narrative Aim: Protein ontology that supports

More information

CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I)

CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I) CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I) Contents Alignment algorithms Needleman-Wunsch (global alignment) Smith-Waterman (local alignment) Heuristic algorithms FASTA BLAST

More information

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types Exp 11- THEORY Sequence Alignment is a process of aligning two sequences to achieve maximum levels of identity between them. This help to derive functional, structural and evolutionary relationships between

More information

Practical search strategies

Practical search strategies Computational and Comparative Genomics Similarity Searching II Practical search strategies Bill Pearson wrp@virginia.edu 1 Protein Evolution and Sequence Similarity Similarity Searching I What is Homology

More information

Hands-On Nine The PAX6 Gene and Protein

Hands-On Nine The PAX6 Gene and Protein Hands-On Nine The PAX6 Gene and Protein Main Purpose of Hands-On Activity: Using bioinformatics tools to examine the sequences, homology, and disease relevance of the Pax6: a master gene of eye formation.

More information

Conditional Graphical Models

Conditional Graphical Models PhD Thesis Proposal Conditional Graphical Models for Protein Structure Prediction Yan Liu Language Technologies Institute University Thesis Committee Jaime Carbonell (Chair) John Lafferty Eric P. Xing

More information

PROTEIN FUNCTION PREDICTION WITH AMINO ACID SEQUENCE AND SECONDARY STRUCTURE ALIGNMENT SCORES

PROTEIN FUNCTION PREDICTION WITH AMINO ACID SEQUENCE AND SECONDARY STRUCTURE ALIGNMENT SCORES PROTEIN FUNCTION PREDICTION WITH AMINO ACID SEQUENCE AND SECONDARY STRUCTURE ALIGNMENT SCORES Eser Aygün 1, Caner Kömürlü 2, Zafer Aydin 3 and Zehra Çataltepe 1 1 Computer Engineering Department and 2

More information

CISC 636 Computational Biology & Bioinformatics (Fall 2016)

CISC 636 Computational Biology & Bioinformatics (Fall 2016) CISC 636 Computational Biology & Bioinformatics (Fall 2016) Predicting Protein-Protein Interactions CISC636, F16, Lec22, Liao 1 Background Proteins do not function as isolated entities. Protein-Protein

More information

Computational Biology: Basics & Interesting Problems

Computational Biology: Basics & Interesting Problems Computational Biology: Basics & Interesting Problems Summary Sources of information Biological concepts: structure & terminology Sequencing Gene finding Protein structure prediction Sources of information

More information

Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment

Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment Substitution score matrices, PAM, BLOSUM Needleman-Wunsch algorithm (Global) Smith-Waterman algorithm (Local) BLAST (local, heuristic) E-value

More information

Sequence and Structure Alignment Z. Luthey-Schulten, UIUC Pittsburgh, 2006 VMD 1.8.5

Sequence and Structure Alignment Z. Luthey-Schulten, UIUC Pittsburgh, 2006 VMD 1.8.5 Sequence and Structure Alignment Z. Luthey-Schulten, UIUC Pittsburgh, 2006 VMD 1.8.5 Why Look at More Than One Sequence? 1. Multiple Sequence Alignment shows patterns of conservation 2. What and how many

More information

Quantitative Bioinformatics

Quantitative Bioinformatics Chapter 9 Class Notes Signals in DNA 9.1. The Biological Problem: since proteins cannot read, how do they recognize nucleotides such as A, C, G, T? Although only approximate, proteins actually recognize

More information

Mitochondrial Genome Annotation

Mitochondrial Genome Annotation Protein Genes 1,2 1 Institute of Bioinformatics University of Leipzig 2 Department of Bioinformatics Lebanese University TBI Bled 2015 Outline Introduction Mitochondrial DNA Problem Tools Training Annotation

More information

Introduction to protein alignments

Introduction to protein alignments Introduction to protein alignments Comparative Analysis of Proteins Experimental evidence from one or more proteins can be used to infer function of related protein(s). Gene A Gene X Protein A compare

More information

Bioinformatics for Biologists

Bioinformatics for Biologists Bioinformatics for Biologists Sequence Analysis: Part I. Pairwise alignment and database searching Fran Lewitter, Ph.D. Head, Biocomputing Whitehead Institute Bioinformatics Definitions The use of computational

More information

Homology Modeling. Roberto Lins EPFL - summer semester 2005

Homology Modeling. Roberto Lins EPFL - summer semester 2005 Homology Modeling Roberto Lins EPFL - summer semester 2005 Disclaimer: course material is mainly taken from: P.E. Bourne & H Weissig, Structural Bioinformatics; C.A. Orengo, D.T. Jones & J.M. Thornton,

More information

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison 10-810: Advanced Algorithms and Models for Computational Biology microrna and Whole Genome Comparison Central Dogma: 90s Transcription factors DNA transcription mrna translation Proteins Central Dogma:

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION Supplementary information S1 (box). Supplementary Methods description. Prokaryotic Genome Database Archaeal and bacterial genome sequences were downloaded from the NCBI FTP site (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/)

More information

Bioinformatics Exercises

Bioinformatics Exercises Bioinformatics Exercises AP Biology Teachers Workshop Susan Cates, Ph.D. Evolution of Species Phylogenetic Trees show the relatedness of organisms Common Ancestor (Root of the tree) 1 Rooted vs. Unrooted

More information

Some Problems from Enzyme Families

Some Problems from Enzyme Families Some Problems from Enzyme Families Greg Butler Department of Computer Science Concordia University, Montreal www.cs.concordia.ca/~faculty/gregb gregb@cs.concordia.ca Abstract I will discuss some problems

More information

Orthology Part I concepts and implications Toni Gabaldón Centre for Genomic Regulation (CRG), Barcelona

Orthology Part I concepts and implications Toni Gabaldón Centre for Genomic Regulation (CRG), Barcelona Orthology Part I concepts and implications Toni Gabaldón Centre for Genomic Regulation (CRG), Barcelona Toni Gabaldón Contact: tgabaldon@crg.es Group website: http://gabaldonlab.crg.es Science blog: http://treevolution.blogspot.com

More information