BIOINFORMATICS TRIAL EXAMINATION MASTERS KT-OR


BIOINFORMATICS KT
Maastricht University
Faculty of Humanities and Science
Knowledge Engineering Study

TRIAL EXAMINATION MASTERS KT-OR

Examiner: R.L. Westra
Date: March 30, 2007
Time: 13:30-15:30
Place: TS06, room 1.014

Notes:
1. The exam is an open-book exam.
2. The textbooks and lecture slides can be used during the exam.
3. The exam consists of x pages (including this page).
4. The exam time is 3 hours (180 minutes).
5. The number of exam questions is 5.
6. The number of points for each question is given (in bold).
7. The maximum number of points is 10.
8. The final exam grade is the sum of the points of the questions answered correctly.
9. The final course grade is the sum of the final exam grade plus the bonus grade that you earned from the student lectures, mini-exams, and skills class hand-ins. The final course grade will be rounded to 10.
10. Before answering the questions, please first read all the exam questions, and then make a plan for spending the three hours.
11. When answering the questions, please do not forget to write your name and student number on each answer page, to number the answers, and to number the answer pages.

EXERCISE 1: Short questions: 2 points

i. What is a change point, and how could you detect it?
ii. What is the importance of the KA/KS ratio?
iii. How can you find the root on an unrooted phylogenetic tree?
iv. Which operations on a chromosome can change block synteny?
v. Why is average linkage better than single linkage?

EXERCISE 2: Statistical sequence analysis: 2 points

Suppose that we analyse a DNA sequence with the following observed multinomial distribution:

p(A) = 0.5    p(C) = 0.1    p(G) = 0.3    p(T) = 0.1

Moreover, suppose that we also observe the following di-nucleotide frequencies (row = first nucleotide, column = second nucleotide):

        *A      *C      *G      *T
A*      0.10    0.10    0.09    0.10
C*      0.02    0.08    0.05    0.08
G*      0.07    0.05    0.07    0.02
T*      0.05    0       0.08    0.04

Use this information to determine unusual dimers (= di-nucleotides).
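One way to carry out the comparison asked for in Exercise 2 is to divide each observed dimer frequency by the product of the two single-nucleotide frequencies and flag ratios far from 1. The following is a minimal sketch of that idea, not a prescribed solution: the function names and the cutoff `factor=2.0` are hypothetical choices made for illustration, and only the frequencies come from the exercise.

```python
# Single-nucleotide and di-nucleotide frequencies copied from Exercise 2.
p1 = {"A": 0.5, "C": 0.1, "G": 0.3, "T": 0.1}
p2 = {  # p2[x][y] is the observed frequency of the dimer xy
    "A": {"A": 0.10, "C": 0.10, "G": 0.09, "T": 0.10},
    "C": {"A": 0.02, "C": 0.08, "G": 0.05, "T": 0.08},
    "G": {"A": 0.07, "C": 0.05, "G": 0.07, "T": 0.02},
    "T": {"A": 0.05, "C": 0.00, "G": 0.08, "T": 0.04},
}


def odds_ratios(p1, p2):
    """Observed/expected ratio per dimer xy, with expected = p1[x] * p1[y]."""
    return {x + y: p2[x][y] / (p1[x] * p1[y]) for x in p2 for y in p2[x]}


def unusual_dimers(ratios, factor=2.0):
    """Dimers whose ratio is >= factor or <= 1/factor (hypothetical cutoff)."""
    return sorted(d for d, r in ratios.items()
                  if r >= factor or r <= 1.0 / factor)


ratios = odds_ratios(p1, p2)
for dimer in unusual_dimers(ratios):
    print(dimer, round(ratios[dimer], 2))
```

With these numbers CC and CT come out strongly over-represented (ratio 8), while TC is absent altogether (ratio 0). What counts as "considerably" different from 1 is a judgment call, which is why the cutoff is left as a parameter.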

EXERCISE 3: Genetic distance: 2 points

As the time of divergence between two sequences grows, the count of differences, d, increases. The true genetic distance, K, between the sequences, however, is not equal to d.

a. Explain this phenomenon and describe a method for correcting d to estimate K.
b. Describe how distinguishing between transitions and transversions can improve this estimate of K.
c. Suppose that two sequences of equal length, 1000 nucleotides, differ at 150 positions. Estimate, following the Jukes-Cantor model, the true genetic distance K, including the magnitude of the error.
d. Determine when, for two sequences of equal length n, following the Jukes-Cantor model, determining the genetic distance becomes entirely inconclusive because the variance exceeds the estimate K.

EXERCISE 4: Sequence alignment: 2 points

In sequence alignment, sequences are compared in order to determine their degree of resemblance.

a. Describe the difference(s) between local and global alignment.

Suppose that in global alignment we use the following simple scoring function:

σ(-,a) = σ(a,-) = σ(a,b) = -1 for all a ≠ b
σ(a,b) = 1 if a = b

for all letters a,b in {A,C,G,T}, and with - representing an indel. Now, suppose we have the following two amino acid sequences:

s1 = FILM
s2 = FEIM

b. Write down the dynamic programming table resulting from the global alignment of the two strings s1 and s2 using the Needleman-Wunsch algorithm.
c. Write down the optimal global alignment of s1 and s2 according to the dynamic programming table obtained above.
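The table fill and traceback of Exercise 4 can be sketched as follows. This is an illustrative implementation under the scoring stated in the exercise (match +1, mismatch -1, indel -1); the function name and its parameters are assumptions, and the scoring is applied to arbitrary letters because the strings FILM and FEIM are not over {A,C,G,T}.

```python
def needleman_wunsch(s1, s2, match=1, mismatch=-1, gap=-1):
    """Global alignment of s1 and s2; returns (table, aligned_s1, aligned_s2)."""
    rows, cols = len(s1) + 1, len(s2) + 1
    table = [[0] * cols for _ in range(rows)]
    # First column and first row: a prefix aligned entirely against indels.
    for i in range(1, rows):
        table[i][0] = i * gap
    for j in range(1, cols):
        table[0][j] = j * gap
    # Each cell is the best of: substitution, indel in s2, indel in s1.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            table[i][j] = max(table[i - 1][j - 1] + sub,
                              table[i - 1][j] + gap,
                              table[i][j - 1] + gap)
    # Traceback from the bottom-right corner to the top-left corner.
    a1, a2 = [], []
    i, j = len(s1), len(s2)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
        if i > 0 and j > 0 and table[i][j] == table[i - 1][j - 1] + sub:
            a1.append(s1[i - 1])
            a2.append(s2[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and table[i][j] == table[i - 1][j] + gap:
            a1.append(s1[i - 1])
            a2.append("-")
            i -= 1
        else:
            a1.append("-")
            a2.append(s2[j - 1])
            j -= 1
    return table, "".join(reversed(a1)), "".join(reversed(a2))


table, top, bottom = needleman_wunsch("FILM", "FEIM")
for row in table:
    print(row)
print(top)
print(bottom)
```

For FILM and FEIM the bottom-right entry of the table is 1, and the traceback gives F-ILM over FEI-M. When several optimal alignments exist, this sketch returns only one of them, preferring a substitution over an indel.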

EXERCISE 5: Hidden Markov model: 2 points

Consider an odorant receptor, i.e. a protein molecule that sits in the cell membrane, extending into both the interior and the exterior of the cell. This is visualised in the picture below, where the fat curved line represents the folded protein. The molecule floats in the cell membrane because some of its parts like to be in the membrane (hydrophobic) and other parts definitely do not (hydrophilic). Therefore, we consider that a part of the molecule can be in one of two states: H+ = hydrophilic and H- = hydrophobic. We employ a Hidden Markov model to estimate the hydrophobic and hydrophilic parts of the molecule, represented as:

Transition probabilities as labelled in the picture: H+ 0.3, H- 0.2

Emission probabilities:

        A      R      N      D      C
H+      0.1    0.2    0      0.5    0.2
H-      0.3    0.2    0.4    0      0.1

Suppose that we have the following protein sequence:

NARNRDCCRN

Determine the most likely hidden sequence of hydrophobicity states over the molecule using the Viterbi algorithm.
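A Viterbi decoding of this sequence can be sketched as below. The picture that defines the model is not available here, so several inputs are assumptions and are marked as such in the code: 0.3 is read as the probability of moving from H+ to H-, 0.2 as the probability of moving from H- to H+, the self-transitions are the complements, the start distribution is uniform, and the first emission set is taken to belong to H+. Only the emission values and the sequence come from the exercise.

```python
STATES = ("H+", "H-")

# Emission probabilities from the exercise (first set assumed to be H+).
EMIT = {
    "H+": {"A": 0.1, "R": 0.2, "N": 0.0, "D": 0.5, "C": 0.2},
    "H-": {"A": 0.3, "R": 0.2, "N": 0.4, "D": 0.0, "C": 0.1},
}

# ASSUMPTION: 0.3 = P(H+ -> H-), 0.2 = P(H- -> H+); self-loops by complement.
TRANS = {
    "H+": {"H+": 0.7, "H-": 0.3},
    "H-": {"H+": 0.2, "H-": 0.8},
}

# ASSUMPTION: uniform start distribution (not stated in the exercise).
START = {"H+": 0.5, "H-": 0.5}


def viterbi(seq, states, start, trans, emit):
    """Return (most probable state path, joint probability of that path)."""
    # v[t][s]: probability of the best path that ends in state s at position t.
    v = [{s: start[s] * emit[s][seq[0]] for s in states}]
    back = []  # back[t - 1][s]: best predecessor of state s at position t
    for symbol in seq[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda r: v[-1][r] * trans[r][s])
            pointers[s] = prev
            scores[s] = v[-1][prev] * trans[prev][s] * emit[s][symbol]
        v.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final state.
    last = max(states, key=lambda s: v[-1][s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, v[-1][last]


path, prob = viterbi("NARNRDCCRN", STATES, START, TRANS, EMIT)
print(" ".join(path))
print(prob)
```

Because N has emission probability 0 in H+ and D has emission probability 0 in H-, every N must be decoded as H- and the D as H+ whatever the transition values are; the assumed transitions only decide where the switches between those anchors fall. Plain probabilities are used instead of logarithms since the sequence has only ten residues.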

ANSWERS

ANSWER to EXERCISE 1

See the book.

ANSWER to EXERCISE 2

Divide each dimer frequency p2[i,j] by the product p1[i]*p1[j] of the corresponding single-nucleotide frequencies, and look for entries where this ratio deviates considerably from 1.

ANSWER to EXERCISE 3: The Jukes-Cantor Correction

As the time of divergence between two sequences increases, the probability of a second substitution at any one nucleotide site increases, and the increase in the count of differences slows down. This makes these counts an undesirable measure of distance; the slowdown must somehow be accounted for. The solution to this problem was first noted by Jukes and Cantor (1969; Evol. of Protein Molecules, Academic Press). Instead of calculating distance as a simple count, take the distance as

K = -(3/4) ln(1 - 4D/3)

where D is the observed proportion of sites that differ, with variance

Var(K) = D(1 - D) / (n (1 - 4D/3)^2)

for sequences of length n (Kimura and Ohta 1972; J.Mol.Evol.2:87-90).

A plot of this function for the same range of parameters as in Figure 1 is given in Figure 2. This figure shows that this distance measure increases linearly with time (this is one property that is desirable for a distance measure). This is termed the Jukes & Cantor correction to distance and clearly indicates that divergence is a logarithmic function of time.

Observe the large increase in the variance as time increases. As D gets closer and closer over time to 0.75, the variance increases. In the limit as D approaches 0.75, the variance approaches infinity. This is an indication that the measure of distance becomes increasingly less reliable as time increases.

Note that in expectation D is less than 0.75, but in reality D can be equal to or greater than 0.75. If this is the case, then a Jukes-Cantor correction cannot be done: it is undefined because the argument of the logarithm will be zero or negative. In this case you can apply a method developed by Tajima (1993, MBE 10:677-688). He suggests using a modified estimator (the equations defining the estimator, its auxiliary quantities, and its variance are missing from this transcript). This is actually just a different formulation of the same quantity, using a Taylor series expansion to avoid the logarithm. This estimator of distance is defined for all parameter values and actually has less bias than Jukes and Cantor's original correction for small levels of divergence.
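The correction and its variance can be checked numerically for the data of Exercise 3c. This is a minimal sketch that assumes the standard Jukes-Cantor form K = -(3/4) ln(1 - 4d/3) with variance d(1 - d) / (n (1 - 4d/3)^2), where d is the proportion of differing sites; the function name is a hypothetical choice.

```python
import math


def jukes_cantor(d, n):
    """Return (K, Var(K)) per site for a proportion d of differences over n sites."""
    arg = 1.0 - 4.0 * d / 3.0
    if arg <= 0.0:
        # The logarithm is undefined once d reaches 0.75.
        raise ValueError("Jukes-Cantor distance is undefined for d >= 0.75")
    k = -0.75 * math.log(arg)
    var_k = d * (1.0 - d) / (n * arg ** 2)
    return k, var_k


# Exercise 3c: 150 differences over 1000 sites.
k, var_k = jukes_cantor(150 / 1000, 1000)
print(f"K = {k:.4f} +/- {math.sqrt(var_k):.4f} substitutions per site")
```

This prints K = 0.1674 +/- 0.0141 substitutions per site, which corresponds to about 167 +/- 14 substitutions over the 1000 sites.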

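a. Jukes-Cantor correction.
b. Kimura 2-parameter model.
c. d = 150/1000 = 0.15, so K = -(3/4) ln(1 - 4d/3) = 0.1673577 substitutions per site, i.e. 167.3577 over the 1000 sites. Var(K) = d(1-d) / (n (1-4d/3)^2) ≈ 1.9922*10^-4, so the error over n = 1000 sites is n*sqrt(Var(K)) = 1000*sqrt(1.9922*10^-4) ≈ 14.1145. Hence K = 167 +/- 14.
d. Setting Var(K) = K gives n = -4d(1-d) / (3 (1-4d/3)^2 ln(1-4d/3)); for sequences shorter than this, the variance exceeds the estimate K.

ANSWER to EXERCISE 4

a. See the book.
b. See book pages 55-58, especially p. 58.
c.
    F - I L M
    |   |   |
    F E I - M

ANSWER to EXERCISE 5

See book page 75.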