Example questions. Z:\summer_10_teaching\bioinfo\Beispiel_frage_bioinformatik.doc [1 / 5]

Similar documents
Lecture 1, 31/10/2001: Introduction to sequence alignment. The Needleman-Wunsch algorithm for global sequence alignment: description and properties

Introduction to sequence alignment. Local alignment the Smith-Waterman algorithm

Quantifying sequence similarity

Sequence comparison: Score matrices

Tools and Algorithms in Bioinformatics

Week 10: Homology Modelling (II) - HHpred

Tools and Algorithms in Bioinformatics

Sequence comparison: Score matrices. Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas

Sequence comparison: Score matrices. Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas

5. MULTIPLE SEQUENCE ALIGNMENT BIOINFORMATICS COURSE MTAT

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Administration. ndrew Torda April /04/2008 [ 1 ]

Warm-Up- Review Natural Selection and Reproduction for quiz today!!!! Notes on Evidence of Evolution Work on Vocabulary and Lab

Copyright 2000 N. AYDIN. All rights reserved. 1

Administration. Sprache? zu verhandeln. Zweite Hälfte aber ein Kurs. Topics Proteins / DNA / RNA alignments, conservation,.. Andrew Torda June 2018

Sequence Analysis, '18 -- lecture 9. Families and superfamilies. Sequence weights. Profiles. Logos. Building a representative model for a gene.

Phylogenetic inference

Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment

Bioinformatics. Scoring Matrices. David Gilbert Bioinformatics Research Centre

Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM)

Tree Building Activity

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject)

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT

Large-Scale Genomic Surveys

Protein structures and comparisons ndrew Torda Bioinformatik, Mai 2008

Lecture 2, 5/12/2001: Local alignment the Smith-Waterman algorithm. Alignment scoring schemes and theory: substitution matrices and gap models

Chapter 5. Proteomics and the analysis of protein sequence Ⅱ

Multiple Sequence Alignment, Gunnar Klau, December 9, 2005, 17:

Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences

CONCEPT OF SEQUENCE COMPARISON. Natapol Pornputtapong 18 January 2018

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9

Homology Modeling. Roberto Lins EPFL - summer semester 2005

Sequence Bioinformatics. Multiple Sequence Alignment Waqas Nasir

Page 1. Evolutionary Trees. Why build evolutionary tree? Outline

Computational Biology

Pairwise sequence alignments

Neural Networks for Protein Structure Prediction Brown, JMB CS 466 Saurabh Sinha

Biochemistry 324 Bioinformatics. Pairwise sequence alignment

Exercise 5. Sequence Profiles & BLAST

Dr. Amira A. AL-Hosary

Lecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM).

Syllabus of BIOINF 528 (2017 Fall, Bioinformatics Program)

Sequence Analysis 17: lecture 5. Substitution matrices Multiple sequence alignment

Substitution matrices

Overview Multiple Sequence Alignment

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

Moreover, the circular logic

MULTIPLE SEQUENCE ALIGNMENT FOR CONSTRUCTION OF PHYLOGENETIC TREE

Bioinformatics and BLAST

Sequence Alignment: Scoring Schemes. COMP 571 Luay Nakhleh, Rice University

Multiple Sequence Alignment

SCIENTIFIC EVIDENCE TO SUPPORT THE THEORY OF EVOLUTION. Using Anatomy, Embryology, Biochemistry, and Paleontology

First generation sequencing and pairwise alignment (High-tech, not high throughput) Analysis of Biological Sequences

Sequence analysis and Genomics

CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I)

Phylogeny Tree Algorithms

Sequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University

Collected Works of Charles Dickens

Pairwise sequence alignments. Vassilios Ioannidis (From Volker Flegel )

Sequence Alignment Techniques and Their Uses

Introduction to Bioinformatics

Molecular Modeling. Prediction of Protein 3D Structure from Sequence. Vimalkumar Velayudhan. May 21, 2007

Bioinformatics. Dept. of Computational Biology & Bioinformatics

Cladistics and Bioinformatics Questions 2013

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

An Introduction to Bioinformatics Algorithms Hidden Markov Models

Bioinformatics for Computer Scientists (Part 2 Sequence Alignment) Sepp Hochreiter

Sequence Alignments. Dynamic programming approaches, scoring, and significance. Lucy Skrabanek ICB, WMC January 31, 2013

Inferring phylogeny. Constructing phylogenetic trees. Tõnu Margus. Bioinformatics MTAT

Sequence Database Search Techniques I: Blast and PatternHunter tools

Single alignment: Substitution Matrix. 16 march 2017

Lecture Notes: Markov chains

Hidden Markov Models

1.5 Sequence alignment

Algorithms in Bioinformatics

Lecture 5,6 Local sequence alignment

Introduction to Comparative Protein Modeling. Chapter 4 Part I

Basic Local Alignment Search Tool

Chapter 7: Rapid alignment methods: FASTA and BLAST

Sequence Analysis '17- lecture 8. Multiple sequence alignment

Phylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz

Constructing Evolutionary/Phylogenetic Trees

BLAST. Varieties of BLAST

An ant colony algorithm for multiple sequence alignment in bioinformatics

BLAST Database Searching. BME 110: CompBio Tools Todd Lowe April 8, 2010

Accurate prediction for atomic-level protein design and its application in diversifying the near-optimal sequence space

Pairwise & Multiple sequence alignments

Sequence Alignment (chapter 6)

Berg Tymoczko Stryer Biochemistry Sixth Edition Chapter 1:

Sequence and Structure Alignment Z. Luthey-Schulten, UIUC Pittsburgh, 2006 VMD 1.8.5

Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment

Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche

Genomics and bioinformatics summary. Finding genes -- computer searches

Building Phylogenetic Trees UPGMA & NJ

BLAST: Target frequencies and information content Dannie Durand

Motivating the need for optimal sequence alignments...

Advanced topics in bioinformatics

Finding the Best Biological Pairwise Alignment Through Genetic Algorithm Determinando o Melhor Alinhamento Biológico Através do Algoritmo Genético

Transcription:

Example questions for Bioinformatics, first semester half Sommersemester 00 ote The schriftliche Klausur wurde auf deutsch geschrieben The questions will be based on material from the Übungen and the Lectures. We missed most of the first lecture. If we do not finish the planned lectures, the last few questions are not relevant and the last topics will not appear in the final Prüfung. Example questions. You are given two D sequences to align GTTTTT and GTTTG You have a scoring scheme where a match gives you + a mismatch gives you 0 gap opening costs 0 Write down the best alignment of the two sequences. You have a scoring scheme where match gives you + a mismatch gives you opening a gap costs you Write down the best alignment for the same two D sequences. Z:\summer_0_teaching\bioinfo\Beispiel_frage_bioinformatik.doc 0.05.00 [ / 5]

3. You are aligning protein sequences using a substitution matrix : R D Q E G H I L K M F P S T W Y V 4 - - - 0 - - 0 - - - - - - - 0-3 - 0 R - 5 0 - -3 0-0 -3 - - -3 - - - -3 - -3-0 6-3 0 0 0-3 -3 0 - -3-0 -4 - -3 D - - 6-3 0 - - -3-4 - -3-3 - 0 - -4-3 -3 0-3 -3-3 9-3 -4-3 -3 - - -3 - - -3 - - - - - Q - 0 0-3 5-0 -3-0 -3-0 - - - - E - 0 0-4 5-0 -3-3 - -3-0 - -3 - - G 0-0 - -3 - - 6 - -4-4 - -3-3 - 0 - - -3-3 H - 0 - -3 0 0-8 -3-3 - - - - - - - -3 I - -3-3 -3 - -3-3 -4-3 4-3 0-3 - - -3-3 L - - -3-4 - - -3-4 -3 4-0 -3 - - - - K - 0 - -3 - - -3-5 - -3-0 - -3 - - M - - - -3-0 - -3 - - 5 0 - - - - - F - -3-3 -3 - -3-3 -3-0 0-3 0 6-4 - - 3 - P - - - - -3 - - - - -3-3 - - -4 7 - - -4-3 - S - 0-0 0 0 - - - 0 - - - 4-3 - - T 0-0 - - - - - - - - - - - - 5 - - 0 W -3-3 -4-4 - - -3 - - -3 - -3 - -4-3 - -3 Y - - - -3 - - - -3 - - - - 3-3 - - 7 - V 0-3 -3-3 - - - -3-3 3 - - - - 0-3 - 4 Gap opening costs -8. Gap widening (extension) costs -. You are given an alignment DQRST -D-RST 4. Given DQRST -D--ST 5. DQRST -D-SST 6. You have calculated a score matrix for a pair of D sequences. You have performed the traceback calculation and found a result like: T T T Write down the corresponding sequence alignment with gaps in the correct positions. 7. Outline the steps used to find values for a BLOSUM amino acid similarity matrix. Z:\summer_0_teaching\bioinfo\Beispiel_frage_bioinformatik.doc 0.05.00 [ / 5]

8. What is the advantage of a eedleman-wunsch alignment compared to a seeded alignment? 9. What is the advantage of a seeded method like BLST compared to a eedleman-wunsch alignment? 0. ame an application where you would use a method like BLST and not a eedleman-wunsch alignment.. ame an application where you would use a slow method like eedleman and Wunsch, rather than a BLST-like method.. You have two proteins with weak, remote similarity. You know their sequences. You also know the sequences of the original D. Why would you expect a better alignment using the protein sequences? 3. In protein alignments, we do not just look at match/mismatches. We look at the similarities between amino acids. How are these represented? 4. There are several different substitution matrices used in protein alignments. Why would you prefer one over another? 5. What is the difference between an iterated blast (psi-blast) search and a simple blast search? 6. What is the advantage of an iterated blast search compared to a simple blast search? 7. I have written a program that generates random D sequences. I expect to see 5 % sequence identity between pairs of random sequences in gapped alignments. When I try the calculation, I usually see more than 5 % sequence identity. Why? 8. If I take random biological sequences from a data bank, I see even more sequence similarity. Why? 9. I have two proteins with 0 % sequence identity. I ask you if this is likely to be significant. What other simple piece of information do you need to answer this question properly? 0. If you use the program "chimera", what representation would you pick in order to see the secondary structure?. I want to calculate a multiple sequence alignment for seq sequences. How many pair-wise alignments will I have to calculate?. In a multiple sequence alignment, you want to maximise a score, score = words? seq seq S a b b a a=, What is this score in 3. In a multiple sequence alignment, I want to build a "guide tree". What determines the order in which I join the nodes together? 4. I have 3 sequences,, B,. The sequences B and are both related to, but I cannot get a good alignment score when I align B and. What could be a reason? Draw a diagram if it is easier to explain. 5. I have calculated a multiple sequence alignment. I want to find which sites in the alignment are conserved and which vary. I would like to make a plot of variability/conservation as a function of sequence position. You remember a formula S = states i= p i ln p i What is the meaning of p i? Z:\summer_0_teaching\bioinfo\Beispiel_frage_bioinformatik.doc 0.05.00 [3 / 5]

6. From a multiple sequence alignment, I have calculated variability/conservation as a function of sequence position: conserved. Why might they be important residues? Residues 37 and 43 seem to be very 7. In the picture above, some sites are not very conserved. I say these residues cannot be important to the function of the protein. Why may I be wrong? 8. I have calculated a sequence alignment of 400 tyrosine kinases and I find that very few sites seem to be conserved in evolution. How could I change my results, so that more sites seem to be conserved? 9. We have a family of sequences and all pair-wise alignments. I can count the number of differences (mutations) between any two sequences and calculate the fraction of residues that have changed : diff p mut =. I would like to estimate evolutionary time by saying t = k p mut for some constant k. length This is not a good measure. Why not? 30. I want to use aligned D sequences to build a phylogenetic tree. ame two reasons that the branches in the tree may not be reliable. 3. I have built a phylogenetic tree using a neighbour joining method. Describe a general approach I could use to see how reliable the tree is. 3. It is believed that protein sequence evolves and changes faster than protein structure. What could be an evolutionary explanation for this. 33. structure determined by X-ray crystallography has a resolution of.5 Å. When I look at the coordinates, I find every backbone - distance is.3 Å. How can the coordinates seem so accurate if the measurement is so imprecise? 34. I want to compare two protein structures using the root mean square difference of coordinates, given by res res i= r a i 3.5 S.5 0.5 r b i where res is the number of residues in a protein and a r i is the coordinate vector of the α of residue i in protein a. What steps do I have to perform on the coordinates of protein a or b before I can apply the formula? 3.5 0 residue number 0 50 00 Z:\summer_0_teaching\bioinfo\Beispiel_frage_bioinformatik.doc 0.05.00 [4 / 5]

35. I have two closely related proteins from different organisms with similar structures and sequences. The proteins are of slightly different sizes. Describe a set of steps one would need to apply the root mean square difference formula given above. 36. Which graphical representation would you use in order to emphasize the secondary structure content of protein? all-atom, chaintrace, ribbon,...? 37. You have a protein of unknown function from a bacterium. You have made a knock-out mutant, but the bacteria die immediately without the corresponding gene. You have sequenced the protein. What steps would you take to guess the function of the protein? What kind of information would you look for? 38. n welchem Merkmal erkennt man phylogenetische Bäume, die mit der UPGM Methode erstellt wurden? 39. Sie haben Multiple Sequenzalignment aus a) 0 Sequenzen aus sehr ähnlichen Quellen b) 967 ähnlichen Sequenzen c) 3 sehr diversen Sequenzen Welche Methode würden Sie in jeden oben genannten Fällen wählen um aus den lignments phylogenetische Bäume zu berechnen? Warum? 40. Wie würden Sie vorgehen um in einem Multiplen Sequenzalignment potentiell katalytisch wichtige Seitenketten im aktiven Zentrum zu identifizieren? Z:\summer_0_teaching\bioinfo\Beispiel_frage_bioinformatik.doc 0.05.00 [5 / 5]