Evolutionary Change in Nucleotide Sequences. Lecture 3
- Emil Morrison
- 6 years ago
Transcription
1 Evolutionary Change in Nucleotide Sequences Lecture 3 1
2 So far, we described the evolutionary process as a series of gene substitutions in which new alleles, each arising as a mutation in a single individual, progressively increase their frequency and ultimately become fixed in the population.
3 We may look at the process from a different point of view. An allele that becomes fixed is different in its sequence from the allele that it replaces. That is, the substitution of a new allele for an old one is the substitution of a new sequence for a previous sequence. 3
4 If we use a time scale in which one time unit is larger than the time of fixation, then the DNA sequence at any given locus will appear to change with time:
actgggggtaaactatcggtatagatcataa
actgggggttaactatcggtatagatcataa
actgggggttaactatcggtatagatcataa
actgggggttaactatcggtatagatcataa
actgggggtgaactatcggtatagatcataa
actgggggtgaactatcggtacagatcataa
5 To study the dynamics of nucleotide substitution, we must make several assumptions regarding the probability of substitution of a nucleotide by another. 5
6 Jukes & Cantor's one-parameter model
7 Assumption: Substitutions occur with equal probabilities among the four nucleotide types. 7
8 If the nucleotide residing at a certain site in a DNA sequence is A at time 0, what is the probability, $P_{A(t)}$, that this site will be occupied by A at time t?
9 Since we start with A, $P_{A(0)} = 1$. At time 1, the probability of still having A at this site is $P_{A(1)} = 1 - 3\alpha$, where $3\alpha$ is the probability of A changing to T, C, or G, and $1 - 3\alpha$ is the probability that A has remained unchanged.
10 To derive the probability of having A at time 2, we consider two possible scenarios: 10
11 1. The nucleotide has remained unchanged from time 0 to time 2. 11
12 2. The nucleotide has changed to T, C, or G at time 1, but has subsequently reverted to A at time 2. 12
13 $P_{A(2)} = (1 - 3\alpha)P_{A(1)} + \alpha\left[1 - P_{A(1)}\right]$
14 The following equation applies to any t and any t + 1: $P_{A(t+1)} = (1 - 3\alpha)P_{A(t)} + \alpha\left[1 - P_{A(t)}\right]$
15 We can rewrite the equation in terms of the amount of change in $P_{A(t)}$ per unit time as: $\Delta P_{A(t)} = P_{A(t+1)} - P_{A(t)} = -3\alpha P_{A(t)} + \alpha\left[1 - P_{A(t)}\right] = -4\alpha P_{A(t)} + \alpha$
16 We approximate the discrete-time process by a continuous-time model, by regarding $\Delta P_{A(t)}$ as the rate of change at time t: $\frac{dP_{A(t)}}{dt} = -4\alpha P_{A(t)} + \alpha$
17 The solution is: $P_{A(t)} = \frac{1}{4} + \left(P_{A(0)} - \frac{1}{4}\right)e^{-4\alpha t}$
18 $P_{A(t)} = \frac{1}{4} + \left(P_{A(0)} - \frac{1}{4}\right)e^{-4\alpha t}$. If we start with A, the probability that the site has A at time 0 is 1. Thus, $P_{A(0)} = 1$, and consequently, $P_{A(t)} = \frac{1}{4} + \frac{3}{4}e^{-4\alpha t}$
19 $P_{A(t)} = \frac{1}{4} + \left(P_{A(0)} - \frac{1}{4}\right)e^{-4\alpha t}$. If we start with non-A, the probability that the site has A at time 0 is 0. Thus, $P_{A(0)} = 0$, and consequently, $P_{A(t)} = \frac{1}{4} - \frac{1}{4}e^{-4\alpha t}$
20 In the Jukes and Cantor model, the probability of each of the four nucleotides at equilibrium ($t = \infty$) is 1/4. For $P_{A(0)} = 1$: $P_{A(t)} = \frac{1}{4} + \frac{3}{4}e^{-4\alpha t}$. For $P_{A(0)} = 0$: $P_{A(t)} = \frac{1}{4} - \frac{1}{4}e^{-4\alpha t}$
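The Jukes-Cantor recurrence and its continuous-time solution can be compared numerically. The following is a minimal sketch; the function names and the rate value are illustrative choices, not part of the lecture.

```python
import math

def jc_closed_form(alpha, t, p0=1.0):
    """Continuous-time solution: P_A(t) = 1/4 + (P_A(0) - 1/4) * exp(-4*alpha*t)."""
    return 0.25 + (p0 - 0.25) * math.exp(-4.0 * alpha * t)

def jc_recurrence(alpha, steps, p0=1.0):
    """Iterate P_A(t+1) = (1 - 3*alpha) * P_A(t) + alpha * (1 - P_A(t))."""
    p = p0
    for _ in range(steps):
        p = (1.0 - 3.0 * alpha) * p + alpha * (1.0 - p)
    return p

if __name__ == "__main__":
    alpha = 1e-3  # hypothetical substitution rate per time unit
    for t in (0, 100, 1000, 10000):
        # Both columns approach the equilibrium value 1/4 as t grows.
        print(t, round(jc_recurrence(alpha, t), 4), round(jc_closed_form(alpha, t), 4))
```

For small α the two columns agree closely, which is why the continuous-time approximation is acceptable here.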
21 So far, we treated P A(t) as a probability. However, P A(t) can also be interpreted as the frequency of A in a DNA sequence at time t. For example, if we start with a sequence made of adenines only, then P A(0) = 1, and P A(t) is the expected frequency of A in the sequence at time t. The expected frequency of A in the sequence at equilibrium will be 1/4, and so will the expected frequencies of T, C, and G. 21
22 After reaching equilibrium no further change in the nucleotide frequencies is expected to occur. However, the actual frequencies of the nucleotides will remain unchanged only in DNA sequences of infinite length. In practice, fluctuations in nucleotide frequencies are likely to occur. 22
26 NUMBER OF NUCLEOTIDE SUBSTITUTIONS BETWEEN TWO DNA SEQUENCES 26
27 After two nucleotide sequences diverge from each other, each of them will start accumulating nucleotide substitutions. If two sequences of length N differ from each other at n sites, then the proportion of differences, n/N, is referred to as the degree of divergence or Hamming distance. Degrees of divergence are usually expressed as percentages (n/N × 100%).
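As a small illustration of the degree of divergence, the sketch below counts the differing sites between two equal-length sequences. The function name is an illustrative choice; the two example sequences are the first and last sequences shown on slide 4.

```python
def proportion_of_differences(seq1, seq2):
    """Return n/N, the proportion of sites at which two aligned sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    if not seq1:
        raise ValueError("sequences must not be empty")
    n = sum(1 for a, b in zip(seq1, seq2) if a != b)
    return n / len(seq1)

if __name__ == "__main__":
    first = "actgggggtaaactatcggtatagatcataa"
    last = "actgggggtgaactatcggtacagatcataa"
    p = proportion_of_differences(first, last)
    print(f"degree of divergence: {100 * p:.1f}%")
```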
29 The observed number of differences is likely to be smaller than the actual number of substitutions due to multiple hits at the same site.
30 13 mutations = 3 differences 30
32 Number of substitutions between two noncoding (NOT protein coding) sequences 32
33 The one-parameter model. In this model, it is sufficient to consider only $I_{(t)}$, which is the probability that the nucleotide at a given site at time t is the same in both sequences.
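34 $I_{(t)} = \frac{1}{4} + \frac{3}{4}e^{-8\alpha t}$, where $I_{(t)}$ is the proportion of identical nucleotides between two sequences that diverged t time units ago. Compare with the single-sequence result $P_{A(t)} = \frac{1}{4} + \frac{3}{4}e^{-4\alpha t}$.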
35 The probability that the two sequences are different at a site at time t is $p = 1 - I_{(t)}$, that is, $p = \frac{3}{4}\left(1 - e^{-8\alpha t}\right)$. t is usually not known and, thus, we cannot estimate α. Instead, we compute K, which is the number of substitutions per site since the time of divergence between the two sequences.
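The slide that gave the formula for K did not survive transcription. The sketch below uses the standard Jukes-Cantor result, which follows from the equation for p: each lineage accumulates 3αt substitutions per site, so K = 6αt, and solving p = (3/4)(1 - e^(-8αt)) for 8αt gives K = -(3/4) ln(1 - 4p/3). The variance expression is the usual large-sample approximation and is an addition here, as are the function names.

```python
import math

def jc_distance(p):
    """K = -(3/4) * ln(1 - (4/3) * p), the Jukes-Cantor corrected distance."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def jc_distance_variance(p, L):
    """Approximate sampling variance of K for L compared sites."""
    return p * (1.0 - p) / (L * (1.0 - 4.0 * p / 3.0) ** 2)

if __name__ == "__main__":
    p, L = 0.3, 500  # hypothetical observed divergence and number of sites
    print(jc_distance(p), jc_distance_variance(p, L))
```

Note that K is always at least as large as p, because the correction adds back the substitutions hidden by multiple hits.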
37 $p = \frac{3}{4}\left(1 - e^{-8\alpha t}\right)$. L = number of sites compared between the two sequences.
38 Jukes & Cantor's one-parameter model
40 Kimura's two-parameter model
41 Assumptions: The rate of transitional substitution at each nucleotide site is α per unit time. The rate of each type of transversional substitution is β per unit time. 41
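42 [Figure: substitution scheme among the four nucleotides, with transitions labeled α and transversions labeled β]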
43 If the nucleotide residing at a certain site in a DNA sequence is A at time 0, what is the probability, $P_{AA(t)}$, that this site will be occupied by A at time t?
44 After one time unit the probability of A changing into G is α, the probability of A changing into C is β, and the probability of A changing into T is β. Thus, the probability of A remaining unchanged after one time unit is: $P_{AA(1)} = 1 - \alpha - 2\beta$
45 To derive the probability of having A at time 2, we consider four possible scenarios: 45
46 1. A remained unchanged at t = 1 and t = 2 46
47 2. A changed into G at t = 1 and reverted by a transition to A at t = 2 47
48 3. A changed into C at t = 1 and reverted by a transversion to A at t = 2 48
49 4. A changed into T at t = 1 and reverted by a transversion to A at t = 2 49
50 $P_{AA(2)} = (1 - \alpha - 2\beta)P_{AA(1)} + \beta P_{TA(1)} + \beta P_{CA(1)} + \alpha P_{GA(1)}$
51 By extension we obtain the following recurrence equation for the general case: $P_{AA(t+1)} = (1 - \alpha - 2\beta)P_{AA(t)} + \beta P_{TA(t)} + \beta P_{CA(t)} + \alpha P_{GA(t)}$
52 After rewriting this equation as the amount of change in $P_{AA(t)}$ per unit time, and after approximating the discrete-time model by the continuous-time model, we obtain the following differential equation: $\frac{dP_{AA(t)}}{dt} = -(\alpha + 2\beta)P_{AA(t)} + \beta P_{TA(t)} + \beta P_{CA(t)} + \alpha P_{GA(t)}$
53 Similarly, we can obtain equations for $P_{TA(t)}$, $P_{CA(t)}$, and $P_{GA(t)}$, and from this set of four equations, we arrive at the following solution: $P_{AA(t)} = \frac{1}{4} + \frac{1}{4}e^{-4\beta t} + \frac{1}{2}e^{-2(\alpha+\beta)t}$. Compare with the Jukes-Cantor result: $P_{A(t)} = \frac{1}{4} + \frac{3}{4}e^{-4\alpha t}$.
54 In the Jukes-Cantor model: P AA(t) = P GG(t) = P CC(t) = P TT(t) Because of the symmetry of the substitution scheme, this equality also holds for Kimura's two-parameter model. 54
55 $X_{(t)}$ = the probability that a nucleotide at a site at time t is identical to that at time 0: $X_{(t)} = \frac{1}{4} + \frac{1}{4}e^{-4\beta t} + \frac{1}{2}e^{-2(\alpha+\beta)t}$. At equilibrium, the equation reduces to $X_{(\infty)} = 1/4$. Thus, as in the case of Jukes and Cantor's model, the equilibrium frequencies of the four nucleotides are 1/4.
56 $Y_{(t)}$ = the probability that the initial nucleotide and the nucleotide at time t differ from each other by a transition. Because of the symmetry of the substitution scheme, $Y_{(t)} = P_{AG(t)} = P_{GA(t)} = P_{TC(t)} = P_{CT(t)}$: $Y_{(t)} = \frac{1}{4} + \frac{1}{4}e^{-4\beta t} - \frac{1}{2}e^{-2(\alpha+\beta)t}$
57 $Z_{(t)}$ = the probability that the nucleotide at time t and the initial nucleotide differ by a specific type of transversion: $Z_{(t)} = \frac{1}{4} - \frac{1}{4}e^{-4\beta t}$
58 Each nucleotide is subject to two types of transversion, but only one type of transition. Therefore, the probability that the initial nucleotide and the nucleotide at time t differ by a transversion is twice the probability that they differ by a specific type of transversion, i.e., $2Z_{(t)}$. The three probabilities satisfy $X_{(t)} + Y_{(t)} + 2Z_{(t)} = 1$.
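The three probabilities of Kimura's two-parameter model can be evaluated directly, which makes it easy to check that they sum to 1 and that all four nucleotides approach a frequency of 1/4. This is a minimal sketch; the function name and the rate values are illustrative assumptions.

```python
import math

def k2p_probabilities(alpha, beta, t):
    """Return (X, Y, Z): same nucleotide, transition, one specific transversion."""
    e1 = math.exp(-4.0 * beta * t)
    e2 = math.exp(-2.0 * (alpha + beta) * t)
    x = 0.25 + 0.25 * e1 + 0.5 * e2
    y = 0.25 + 0.25 * e1 - 0.5 * e2
    z = 0.25 - 0.25 * e1
    return x, y, z

if __name__ == "__main__":
    alpha, beta = 2e-3, 1e-3  # hypothetical transition and transversion rates
    for t in (0, 100, 1000, 100000):
        x, y, z = k2p_probabilities(alpha, beta, t)
        print(t, round(x, 4), round(y, 4), round(z, 4), round(x + y + 2 * z, 4))
```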
59 Number of substitutions between two noncoding (NOT protein coding) sequences 59
60 The differences between two sequences are classified into transitions and transversions. P = proportion of transitional differences Q = proportion of transversional differences 60
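64 $V(K) = \frac{1}{L}\left[P\left(\frac{1}{1-2P-Q}\right)^2 + Q\left(\frac{1}{2-4P-2Q} + \frac{1}{2-4Q}\right)^2 - \left(\frac{P}{1-2P-Q} + \frac{Q}{2-4P-2Q} + \frac{Q}{2-4Q}\right)^2\right]$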
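The slides that gave K for the two-parameter model were images and are missing from this transcript. The sketch below assumes the standard Kimura two-parameter estimate, K = (1/2) ln[1/(1 - 2P - Q)] + (1/4) ln[1/(1 - 2Q)], together with the variance formula on slide 64. The function names and the example proportions are illustrative.

```python
import math

def k2p_distance(P, Q):
    """Substitutions per site from transitional (P) and transversional (Q) proportions."""
    a = 1.0 - 2.0 * P - Q
    b = 1.0 - 2.0 * Q
    if a <= 0.0 or b <= 0.0:
        raise ValueError("P and Q are too large for the correction to be defined")
    return 0.5 * math.log(1.0 / a) + 0.25 * math.log(1.0 / b)

def k2p_variance(P, Q, L):
    """Sampling variance V(K) for L compared sites, as on slide 64."""
    a = 1.0 / (1.0 - 2.0 * P - Q)
    b = 1.0 / (2.0 - 4.0 * P - 2.0 * Q) + 1.0 / (2.0 - 4.0 * Q)
    return (a * a * P + b * b * Q - (a * P + b * Q) ** 2) / L

if __name__ == "__main__":
    P, Q, L = 0.10, 0.05, 1000  # hypothetical observed proportions
    print(k2p_distance(P, Q), k2p_variance(P, Q, L))
```

When P = p/3 and Q = 2p/3 (the proportions expected if α = β), this estimate coincides with the one-parameter correction, which is a useful consistency check.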
66 Numerical example (2P-model) 66
67 There are substitution schemes with more than two parameters!
68 Number of substitutions between two protein-coding genes 68
69 Computing the number of substitutions between two protein-coding sequences is more complicated, because a distinction should be made between synonymous and nonsynonymous substitutions.
70 Two ratios are needed: (number of synonymous substitutions) / (number of synonymous sites), and (number of nonsynonymous substitutions) / (number of nonsynonymous sites).
72 Aims: 1. Compute two numerators: The numbers of synonymous and nonsynonymous substitutions. 2. Compute two denominators: The numbers of synonymous and nonsynonymous sites.
73 Difficulties with denominator: 1. The classification of a site changes with time: For example, the third position of CGG (Arg) is synonymous. However, if the first position changes to T, then the third position of the resulting codon, TGG (Trp), becomes nonsynonymous.
74 Difficulties with denominator: 2. Many sites are neither completely synonymous nor completely nonsynonymous. For example, a transition in the third position of GAT (Asp) will be synonymous, while a transversion to GAG or GAA will alter the amino acid. 74
75 Difficulties with numerator: 1. The classification of the change depends on the order in which the substitutions had occurred. 75
76 Difficulties with numerator: 1. When two homologous codons differ from each other by two substitutions or more, the order of the substitutions must be known in order to classify substitutions into synonymous and nonsynonymous. Example: CCC in sequence 1 and CAA in sequence 2. Pathway I: CCC (Pro) → CCA (Pro) → CAA (Gln): 1 synonymous and 1 nonsynonymous.
77 Difficulties with numerator: 2. Transitions occur with different frequencies than transversions. 3. The type of substitution depends on the mutation. Transitions result more frequently in synonymous substitutions than transversions.
78 Miyata & Yasunaga (1980) and Nei & Gojobori (1986) method 78
79 1. Classification of sites. Consider a particular position in a codon. Let i be the number of possible synonymous changes at this site. Then this site is counted as i/3 synonymous and (3 − i)/3 nonsynonymous.
80 In TTT (Phe), the first two positions are nonsynonymous, because no synonymous change can occur in them, and the third position is 1/3 synonymous and 2/3 nonsynonymous because one of the three possible changes is synonymous. 80
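The site-classification rule can be sketched in a few lines. The code below builds the standard genetic code from a compact string (bases in the order T, C, A, G) and counts i/3 for each codon position. The names are illustrative, and one simplification is an assumption made here: a change that produces a stop codon is simply counted as nonsynonymous, whereas published implementations may treat stop codons differently.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def synonymous_sites(codon):
    """Sum of i/3 over the three positions, where i counts synonymous single changes."""
    total = 0.0
    for pos in range(3):
        i = 0
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                i += 1
        total += i / 3.0
    return total

def nonsynonymous_sites(codon):
    """Each codon has three sites in total."""
    return 3.0 - synonymous_sites(codon)

if __name__ == "__main__":
    for codon in ("TTT", "CGG", "TGG"):
        print(codon, CODON_TABLE[codon], synonymous_sites(codon), nonsynonymous_sites(codon))
```

For TTT this reproduces the 1/3 synonymous site described above, and it shows the CGG versus TGG contrast from slide 73: the third position of CGG is fully synonymous, while TGG has no synonymous sites.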
81 2. Count the number of synonymous and nonsynonymous sites in each sequence and compute the averages between the two sequences. The average number of synonymous sites is $N_S$ and that of nonsynonymous sites is $N_A$.
82 3. Classify nucleotide differences into synonymous and nonsynonymous differences.
83 For two codons that differ by only one nucleotide, the difference is easily inferred. For example, the difference between the two codons GTC (Val) and GTT (Val) is synonymous, while the difference between the two codons GTC (Val) and GCC (Ala) is nonsynonymous. 83
85 For two codons that differ by two or more nucleotides, the estimation problem is more complicated, because we need to determine the order in which the substitutions occurred. 85
86 Pathway (1) requires one synonymous and one nonsynonymous change, whereas pathway (2) requires two nonsynonymous changes.
87 There are two approaches to deal with multiple substitutions at a codon: 87
88 The unweighted method: Average the numbers of the different types of substitutions for all the possible scenarios. For example, if we assume that the two pathways are equally likely, then the number of nonsynonymous differences is (1 + 2)/2 = 1.5, and the number of synonymous differences is (1 + 0)/2 = 0.5.
89 The weighted method: Employ a priori criteria to assign the probability of each pathway. For instance, if the weight of pathway 1 is 0.9, and the weight for pathway 2 is 0.1, then the number of nonsynonymous differences between the two codons is (0.9 × 1) + (0.1 × 2) = 1.1, and the number of synonymous differences is (0.9 × 1) + (0.1 × 0) = 0.9.
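The unweighted method can be sketched by enumerating every order in which the differing positions could have changed and averaging the counts. The names below are illustrative, and the sketch makes one simplifying assumption: pathways that pass through a stop codon are not excluded.

```python
from itertools import permutations, product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def count_differences(codon1, codon2):
    """Return (synonymous, nonsynonymous) differences, averaged over all pathways."""
    positions = [i for i in range(3) if codon1[i] != codon2[i]]
    if not positions:
        return 0.0, 0.0
    pathways = list(permutations(positions))
    syn = nonsyn = 0
    for order in pathways:
        current = codon1
        for pos in order:
            # Apply one substitution at a time and classify that single step.
            nxt = current[:pos] + codon2[pos] + current[pos + 1:]
            if CODON_TABLE[current] == CODON_TABLE[nxt]:
                syn += 1
            else:
                nonsyn += 1
            current = nxt
    return syn / len(pathways), nonsyn / len(pathways)

if __name__ == "__main__":
    print(count_differences("CCC", "CAA"))  # the two-pathway example from the slides
    print(count_differences("GTC", "GTT"))
    print(count_differences("GTC", "GCC"))
```

A weighted variant would replace the plain average with a sum of per-pathway counts multiplied by pathway weights, as in the 0.9 and 0.1 example above.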
91 4. The numbers of synonymous and nonsynonymous differences between the two protein-coding sequences are $M_S$ and $M_A$, respectively.
92 The number of synonymous differences per synonymous site is $p_S = M_S / N_S$. The number of nonsynonymous differences per nonsynonymous site is $p_A = M_A / N_A$.
93 If we take into account the effect of multiple hits at the same site, we can make corrections by using Jukes and Cantor's formula: 93
95 $K_S = -\frac{3}{4}\ln\left(1 - \frac{4M_S}{3N_S}\right)$
96 $K_A = -\frac{3}{4}\ln\left(1 - \frac{4M_A}{3N_A}\right)$
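A short numerical sketch of the two corrections follows. The counts of differences and sites are hypothetical values chosen only for illustration, and the helper name is not from the lecture.

```python
import math

def jc_correct(differences, sites):
    """Apply K = -(3/4) * ln(1 - 4M / (3N)) to M differences over N sites."""
    p = differences / sites
    if p >= 0.75:
        raise ValueError("proportion of differences too large to correct")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

if __name__ == "__main__":
    M_S, N_S = 10.0, 100.0  # hypothetical synonymous differences and sites
    M_A, N_A = 12.0, 300.0  # hypothetical nonsynonymous differences and sites
    K_S = jc_correct(M_S, N_S)
    K_A = jc_correct(M_A, N_A)
    print(f"K_S = {K_S:.4f}, K_A = {K_A:.4f}")
```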
98 Number of Amino-Acid Replacements between Two Proteins The observed proportion of different amino acids between the two sequences (p) is p = n /L n = number of amino acid differences between the two sequences L = length of the aligned sequences. 98
100 Number of Amino-Acid Replacements between Two Proteins. The Poisson model is used to convert p into the number of amino-acid replacements between two sequences (d): $d = -\ln(1 - p)$. The variance of d is estimated as $V(d) = \frac{p}{L(1 - p)}$.
101 ALIGNMENT OF NUCLEOTIDE & AMINO-ACID SEQUENCES
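The Poisson correction and its variance are simple enough to compute directly. This is a minimal sketch with hypothetical counts; the function name is illustrative.

```python
import math

def poisson_distance(n, L):
    """Return (d, V(d)) for n amino-acid differences over L aligned positions."""
    p = n / L
    if p >= 1.0:
        raise ValueError("all positions differ; the correction is undefined")
    d = -math.log(1.0 - p)
    variance = p / (L * (1.0 - p))
    return d, variance

if __name__ == "__main__":
    d, v = poisson_distance(20, 100)  # hypothetical: 20 differences in 100 positions
    print(f"d = {d:.4f}, V(d) = {v:.4f}")
```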
103 Homology: The term was coined by Richard Owen in Definition: Similarity resulting from common ancestry. 103
104 Homology: A qualitative statement. Homology designates a relationship of common descent between entities. Two genes are either homologs or not. It doesn't make sense to say two genes are 43% homologous, just as it doesn't make sense to say Linda is 24% pregnant.
105 Homology By comparing homologous characters, we can reconstruct the evolutionary events that have led to the formation of the extant sequences from the common ancestor. 105
106 Homology. When dealing with sequences, we are interested in POSITIONAL HOMOLOGY. We identify positional homology by ALIGNMENT.
107 Ancestral sequence: ACTGGGCCCAAATC
Descendant 1 (1 deletion, 1 substitution): CTGGGCCCAGATC
Descendant 2 (1 insertion, 1 substitution): AACAGGGCCCAAATC
Correct alignment:
--CTGGGCCCAGATC
AACAGGGCCCAAATC
Incorrect alignment:
CTGGGCCCAGATC--
AACAGGGCCCAAATC
108 Ancestral sequence: unknown! (unknown processes in both lineages)
CTGGGCCCAGATC
AACAGGGCCCAAATC
Correct alignment?
--CTGGGCCCAGATC
AACAGGGCCCAAATC
Incorrect alignment?
CTGGGCCCAGATC--
AACAGGGCCCAAATC
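109 Ancestral sequence: ACCTGAATTTGCCC
Events labeled in the figure: T9, -A6, G5T, -A7, +ACA12, T8A, +G2
Descendant sequences: ACCTTAATTGCACACC and AGCCTGATTGCCC
Alignment:
ACCTTAATTGCACACC
AGCCTGATTGCCC---
Changes inferred from the alignment: C2G, T4C, A6G, A12C, -ACC14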
110 Alignment: A hypothesis concerning positional homology among residues in a sequence. Positional homology = A pair of nucleotides from two aligned sequences that have descended from one nucleotide in the ancestor of the two sequences.
111 An alignment consists of a series of paired bases, one base from each sequence. There are three types of pairs: (1) matches = the same nucleotide appears in both sequences. (2) mismatches = different nucleotides are found in the two sequences. (3) gaps = a base in one sequence and a null base in the other.
GCGGCCCATCAGGTAGTTGGTG-G
GCGTTCCATC--CTGGTTGGTGTG
***..*****  .*.******* *
112 Sequence alignment = The identification of the location of deletions or insertions that might have occurred in either of the two lineages since their divergence from a common ancestor. Insertion + Deletion = Indel or Gap
113 Sequence alignment 1. Pairwise alignment 2. Multiple alignment 113
114 - Two DNA sequences: A and B. - Lengths are m and n, respectively. - The number of matched pairs is x. - The number of mismatched pairs is y. - Total number of bases in gaps is z.
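The quantities x, y, and z can be counted directly from a pairwise alignment. The sketch below uses the alignment shown on slide 111; the function name is an illustrative choice, and columns with a gap in either row are counted toward z.

```python
def alignment_counts(row_a, row_b, gap="-"):
    """Return (x, y, z): matched pairs, mismatched pairs, and bases opposite a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    x = y = z = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            z += 1
        elif a == b:
            x += 1
        else:
            y += 1
    return x, y, z

if __name__ == "__main__":
    row_a = "GCGGCCCATCAGGTAGTTGGTG-G"
    row_b = "GCGTTCCATC--CTGGTTGGTGTG"
    x, y, z = alignment_counts(row_a, row_b)
    m = len(row_a.replace("-", ""))
    n = len(row_b.replace("-", ""))
    # Every matched or mismatched pair uses one base from each sequence,
    # and every gapped column uses exactly one base, so m + n = 2(x + y) + z.
    print(x, y, z, m + n == 2 * (x + y) + z)
```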
115 There are terminal and internal gaps. GCGG-CCATCAGGTAGTTGGTG-- GCGTTCCATC--CTGGTTGGTGTG 115
116 A terminal gap may indicate missing data. GCGG-CCATCAGGTAGTTGGTG-- GCGTTCCATC--CTGGTTGGTGTG 116
117 An internal gap indicates that a deletion or an insertion has occurred in one of the two lineages. GCGG-CCATCAGGTAGTTGGTG-- GCGTTCCATC--CTGGTTGGTGTG 117
118 The alignment is the first step in many evolutionary and functional studies. Errors in alignment tend to amplify in later computational stages. 118
119 Methods of alignment: 1. Manual 2. Dot matrix 3. Distance Matrix 4. Combined (Distance + Manual) 119
120 Manual alignment. When there are few gaps and the two sequences are not too different from each other, a reasonable alignment can be obtained by visual inspection. GCG-TCCATCAGGTAGTTGGTGTG GCGTTCCATCAGGTGGTTGGTGTG *** **********.********* 120
121 Advantages of manual alignment: (1) use of a powerful and trainable tool (the brain, well, some brains). (2) ability to integrate additional data, e.g., domain structure, biological function. 121
124 Protein Alignment may be guided by Tertiary Structures Escherichia coli DjlA protein Homo sapiens DjlA protein 124
125 Disadvantages of manual alignment: (1) The method is subjective and unscalable. 125
126 The dot-matrix method: The two sequences are written out as column and row headings of a two-dimensional matrix. A dot is put in the dot-matrix plot at a position where the nucleotides in the two sequences are identical.
127 The alignment is defined by a path from the upper-left element to the lower-right element.
128 There are 4 possible steps in the path: (1) a diagonal step through a dot = match. (2) a diagonal step through an empty element of the matrix = mismatch. (3) a horizontal step = a gap in the sequence on the top of the matrix. (4) a vertical step = a gap in the sequence on the left of the matrix. 128
129 forbidden directions allowed directions 129
130 A dot matrix may become cluttered. With DNA sequences, ~25% of the elements will be occupied by dots by chance alone. 130
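A dot matrix with a sliding window and a stringency threshold can be sketched as follows. With window = 1 and stringency = 1 every identical pair gets a dot, which for random DNA fills about a quarter of the matrix; larger windows with a higher stringency thin out such chance matches. The function names and the example sequences are illustrative assumptions.

```python
def dot_matrix(seq_top, seq_left, window=1, stringency=1):
    """Return the set of (i, j) where the windows starting at seq_top[i] and
    seq_left[j] share at least `stringency` identical positions."""
    dots = set()
    for i in range(len(seq_top) - window + 1):
        for j in range(len(seq_left) - window + 1):
            matches = sum(seq_top[i + k] == seq_left[j + k] for k in range(window))
            if matches >= stringency:
                dots.add((i, j))
    return dots

def render(seq_top, seq_left, dots):
    """Draw the matrix as text: seq_top across the top, seq_left down the side."""
    lines = ["  " + seq_top]
    for j, base in enumerate(seq_left):
        row = "".join("*" if (i, j) in dots else "." for i in range(len(seq_top)))
        lines.append(base + " " + row)
    return "\n".join(lines)

if __name__ == "__main__":
    a = "GCGTTCCATCAGGTGG"  # hypothetical sequences
    b = "GCGTCCATCAGGTAGG"
    print(render(a, b, dot_matrix(a, b)))
    print()
    print(render(a, b, dot_matrix(a, b, window=3, stringency=3)))
```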
131 window size = 1, stringency = 1, alphabet size = 4. The number of spurious matches is determined by: window size, stringency, & alphabet size.
132 Left panel: window size = 1, stringency = 1, alphabet size = 4. Right panel: window size = 3, stringency = 2, alphabet size = 4.
133 window size = 1 stringency = 1 alphabet size =
134 Dot-matrix methods: Advantages: May unravel information on the evolution of sequences. 134
135 Window size = 60 amino acids; Stringency = 24 matches Advantages: Highlighting Information The vertical gap indicates that a coding region corresponding to ~75 amino acids has either been deleted from the human gene or inserted into the bacterial gene. 135
136 Window size = 60 amino acids; Stringency = 24 matches Advantages: Highlighting Information The two diagonally oriented parallel lines most probably indicate that a small internal duplication has occurred in the bacterial gene. 136
137 Dot-matrix methods: Disadvantage: May not identify the best alignment. 137
138 Distance and similarity methods 138
139 The best possible alignment (optimal alignment) is the one in which the numbers of mismatches and gaps are minimized according to certain criteria.
140 Unfortunately, reducing the number of mismatches results in an increase in the number of gaps, and vice versa. 140
141 α = matches, β = mismatches, γ = nucleotides in gaps, δ = gaps
142 Gap penalty (or cost) is a factor (or a set of factors) by which the gap values (numbers and lengths of gaps) are multiplied to make the gaps equivalent in value to the mismatches. The gap penalties are based on our assessment of how frequently different types of insertions and deletions occur in evolution in comparison with the frequency of occurrence of point substitutions.
143 Mismatch penalty is an assessment of how frequently substitutions occur. 143
144 The distance (dissimilarity) index (D) between two sequences in an alignment is $D = \sum_i m_i y_i + \sum_k w_k z_k$, where $y_i$ is the number of mismatches of type i, $m_i$ is the mismatch penalty for an i-type of mismatch, $z_k$ is the number of gaps of length k, and $w_k$ is a positive number representing the penalty for gaps of length k.
145 The similarity index (S) between two sequences in an alignment is $S = x - \sum_k w_k z_k$, where x is the number of matches, $z_k$ is the number of gaps of length k, and $w_k$ is a positive number representing the penalty for gaps of length k.
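Both indices can be evaluated once the matches, mismatches, and gaps of an alignment have been tallied. The sketch below uses the counts from the alignment on slide 111 (17 matches, 2 transitions, 2 transversions, one gap of length 1, one gap of length 2). The penalty values and the function names are hypothetical and chosen only to show the arithmetic.

```python
def distance_index(mismatch_counts, mismatch_penalties, gap_counts, gap_penalty):
    """D = sum_i m_i * y_i + sum_k w_k * z_k."""
    d = sum(mismatch_penalties[kind] * y for kind, y in mismatch_counts.items())
    d += sum(gap_penalty(k) * z for k, z in gap_counts.items())
    return d

def similarity_index(matches, gap_counts, gap_penalty):
    """S = x - sum_k w_k * z_k."""
    return matches - sum(gap_penalty(k) * z for k, z in gap_counts.items())

if __name__ == "__main__":
    mismatch_counts = {"transition": 2, "transversion": 2}         # y_i
    mismatch_penalties = {"transition": 1.0, "transversion": 2.0}  # m_i (hypothetical)
    gap_counts = {1: 1, 2: 1}                                      # z_k: gaps of length k
    w = lambda k: 2.0 + 1.0 * k                                    # w_k (hypothetical)
    print(distance_index(mismatch_counts, mismatch_penalties, gap_counts, w))
    print(similarity_index(17, gap_counts, w))
```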
146 The gap penalty has two components: a gap-opening penalty and a gap-extension penalty. 146
147 Three main systems: (1) Fixed gap-penalty system = 0 gap-extension costs. (2) Linear gap-penalty system = the gap-extension cost is calculated by multiplying the gap length minus 1 by a constant representing the gap-extension penalty for increasing the gap by 1. (3) Logarithmic gap-penalty system = the gap-extension penalty increases with the logarithm of the gap length, i.e., slower. 147
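The three gap-penalty systems differ only in how the cost grows with gap length k. The sketch below is one possible reading of the descriptions above: the opening and extension values are hypothetical, and using the natural logarithm for the logarithmic system is an assumption.

```python
import math

def fixed_gap_penalty(k, opening=5.0):
    """Fixed system: the cost does not depend on gap length (no extension cost)."""
    return opening

def linear_gap_penalty(k, opening=5.0, extension=1.0):
    """Linear system: opening cost plus (k - 1) times a constant extension cost."""
    return opening + extension * (k - 1)

def logarithmic_gap_penalty(k, opening=5.0, extension=1.0):
    """Logarithmic system: the extension cost grows with ln(k), i.e., more slowly."""
    return opening + extension * math.log(k)

if __name__ == "__main__":
    for k in (1, 2, 5, 10, 50):
        print(k, fixed_gap_penalty(k), linear_gap_penalty(k),
              round(logarithmic_gap_penalty(k), 3))
```

All three agree for a gap of length 1; for longer gaps the logarithmic cost lies between the fixed and the linear cost, so long indels are penalized less severely than under the linear system.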
149 Further complications: Distinguishing among different matches and mismatches. For example, a mismatched pair consisting of Leu & Ile, which are very similar biochemically to each other, may be given a lesser penalty than a mismatched pair consisting of Arg & Glu, which are very dissimilar from each other.
151 Alignment algorithms 151
152 Aim: Find the alignment associated with the smallest D (or largest S) from among all possible alignments. 152
153 The number of possible alignments may be astronomical. For example, when two sequences 300 residues long each are compared, there are possible alignments. In comparison, the number of elementary particles in the universe is only ~
154 There are computer algorithms for finding the optimal alignment between two sequences that do not require an exhaustive search of all the possibilities. 154
155 The Needleman-Wunsch algorithm uses Dynamic Programming
156 Dynamic programming = a computational technique. It is applicable when large searches can be divided into a succession of small stages, such that (1) the solution of the initial search stage is trivial, (2) each partial solution in a later stage can be calculated by reference to only a small number of solutions in an earlier stage, and (3) the last stage contains the overall solution.
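The three properties of dynamic programming map directly onto a global alignment: the first row and column of the score matrix are trivial, each cell depends only on three neighboring cells, and the last cell holds the optimal score. The following is a minimal Needleman-Wunsch sketch that maximizes a similarity score. The scoring values (match = 1, mismatch = -1, and a per-base gap cost of -1) are hypothetical and simpler than the gap-penalty systems discussed earlier.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    m, n = len(a), len(b)
    score = [[0] * (n + 1) for _ in range(m + 1)]
    # Stage 1 (trivial): aligning a prefix against nothing costs only gaps.
    for i in range(1, m + 1):
        score[i][0] = i * gap
    for j in range(1, n + 1):
        score[0][j] = j * gap
    # Stage 2: each cell is computed from three previously solved cells.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Stage 3: the last cell holds the optimal score; trace one path back.
    row_a, row_b = [], []
    i, j = m, n
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            row_a.append(a[i - 1]); row_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1]); row_b.append("-"); i -= 1
        else:
            row_a.append("-"); row_b.append(b[j - 1]); j -= 1
    return score[m][n], "".join(reversed(row_a)), "".join(reversed(row_b))

if __name__ == "__main__":
    s, top, bottom = needleman_wunsch("GCGGCCATCAGG", "GCGTTCCATCCT")
    print(s)
    print(top)
    print(bottom)
```

The matrix has (m + 1)(n + 1) cells, so the work grows with the product of the sequence lengths rather than with the number of possible alignments.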
157 Multiple Sequence Alignment
158 Alignments can be easy or difficult
Easy:
GCGGCCCA TCAGGTAGTT GGTGG
GCGGCCCA TCAGGTAGTT GGTGG
GCGTTCCA TCAGCTGGTT GGTGG
GCGTCCCA TCAGCTAGTT GGTGG
GCGGCGCA TTAGCTAGTT GGTGA
***...** *.**.*.*** ****.
Difficult:
TTGACATG CCGGGG---A AACCG
T-GACATG CCGGTG--GTGT AAGCC
TTGGCATG -CTAGG---A ACGCG
TTGACATG -CTAGGGAAC ACGCG
TTGACATC -CTCTG---A ACGCG
160 Multiple Alignment 2 methods: Dynamic programming (exhaustive, exact) Consider 2 protein sequences of 100 amino acids in length. If it takes seconds to exhaustively align these sequences, then it will take seconds to align 3 sequences, to align 4 sequences...etc. More time than the universe has existed to align 20 sequences exhaustively. Progressive alignment (heuristic, approximate) 160
161 Progressive Alignment. Devised by Feng and Doolittle. Essentially a heuristic method and as such is not guaranteed to find the optimal alignment. Requires (n−1) + (n−2) + ... + (n−n+1) = n(n−1)/2 pairwise alignments as a starting point. Most successful implementation is Clustal (Des Higgins).
162 Overview of Clustal Procedure, illustrated with five sequences (Hbb_Human, Hbb_Horse, Hba_Human, Hba_Horse, Myg_Whale):
1. Quick pairwise alignments give a distance for each pair, collected in a distance matrix.
2. A neighbor-joining tree (guide tree) is built from the distance matrix.
3. Progressive alignment follows the guide tree:
1 PEEKSAVTALWGKVN--VDEVGG
2 GEEKAAVLALWDKVN--EEEVGG
3 PADKTNVKAAWGKVGAHAGEYGA
4 AADKTNVKAAWSKVGGHAGEYGA
5 EHEWQLVLHVWAKVEADVAGHGQ
163 Clustal: good points/bad points. Advantages: Speed. Disadvantages: No way of knowing if the alignment is correct.
164 Effect of gap penalties on amino-acid alignment. Human pancreatic hormone precursor versus chicken pancreatic hormone. (a) Penalty for gaps is 0. (b) Penalty for a gap of size k nucleotides is w k = k. (c) The same alignment as in (b), only the similarity between the two sequences is further enhanced by showing pairs of biochemically similar amino acids.
165 An Alignment
GCGGCTCA TCAGGTAGTT GGTG-G Spinach
GCGGCCCA TCAGGTAGTT GGTG-G Rice
GCGTTCCA TC--CT-GTT GGTGTG Mosquito
GCGTCCCA TCAGCTAGTT GTTG-G Monkey
GCGGCGCA TTAGCTAGTT GGTG-A Human
***...** *.  .* *** *.** .
More information3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT
3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT.03.239 25.09.2012 SEQUENCE ANALYSIS IS IMPORTANT FOR... Prediction of function Gene finding the process of identifying the regions of genomic DNA that encode
More informationSequence Alignment (chapter 6)
Sequence lignment (chapter 6) he biological problem lobal alignment Local alignment Multiple alignment Introduction to bioinformatics, utumn 6 Background: comparative genomics Basic question in biology:
More informationSimilarity or Identity? When are molecules similar?
Similarity or Identity? When are molecules similar? Mapping Identity A -> A T -> T G -> G C -> C or Leu -> Leu Pro -> Pro Arg -> Arg Phe -> Phe etc If we map similarity using identity, how similar are
More informationSequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University
Sequence Alignment: A General Overview COMP 571 - Fall 2010 Luay Nakhleh, Rice University Life through Evolution All living organisms are related to each other through evolution This means: any pair of
More informationSequence Alignments. Dynamic programming approaches, scoring, and significance. Lucy Skrabanek ICB, WMC January 31, 2013
Sequence Alignments Dynamic programming approaches, scoring, and significance Lucy Skrabanek ICB, WMC January 31, 213 Sequence alignment Compare two (or more) sequences to: Find regions of conservation
More informationPairwise sequence alignments
Pairwise sequence alignments Volker Flegel VI, October 2003 Page 1 Outline Introduction Definitions Biological context of pairwise alignments Computing of pairwise alignments Some programs VI, October
More informationModelling and Analysis in Bioinformatics. Lecture 1: Genomic k-mer Statistics
582746 Modelling and Analysis in Bioinformatics Lecture 1: Genomic k-mer Statistics Juha Kärkkäinen 06.09.2016 Outline Course introduction Genomic k-mers 1-Mers 2-Mers 3-Mers k-mers for Larger k Outline
More informationThe use of molecular tools for taxonomic research in zoology & botany
The use of molecular tools for taxonomic research in zoology & botany Outline Why employ molecular genetic markers? Brief historical overview of DN research Molecular techniques for genetic analysis DN
More informationBioinformatics. Scoring Matrices. David Gilbert Bioinformatics Research Centre
Bioinformatics Scoring Matrices David Gilbert Bioinformatics Research Centre www.brc.dcs.gla.ac.uk Department of Computing Science, University of Glasgow Learning Objectives To explain the requirement
More informationEVOLUTIONARY DISTANCE MODEL BASED ON DIFFERENTIAL EQUATION AND MARKOV PROCESS
August 0 Vol 4 No 005-0 JATIT & LLS All rights reserved ISSN: 99-8645 wwwjatitorg E-ISSN: 87-95 EVOLUTIONAY DISTANCE MODEL BASED ON DIFFEENTIAL EUATION AND MAKOV OCESS XIAOFENG WANG College of Mathematical
More informationBio 1B Lecture Outline (please print and bring along) Fall, 2007
Bio 1B Lecture Outline (please print and bring along) Fall, 2007 B.D. Mishler, Dept. of Integrative Biology 2-6810, bmishler@berkeley.edu Evolution lecture #5 -- Molecular genetics and molecular evolution
More informationGENETICS - CLUTCH CH.22 EVOLUTIONARY GENETICS.
!! www.clutchprep.com CONCEPT: OVERVIEW OF EVOLUTION Evolution is a process through which variation in individuals makes it more likely for them to survive and reproduce There are principles to the theory
More informationLecture Notes: Markov chains
Computational Genomics and Molecular Biology, Fall 5 Lecture Notes: Markov chains Dannie Durand At the beginning of the semester, we introduced two simple scoring functions for pairwise alignments: a similarity
More information3. Evolution makes sense of homologies. 3. Evolution makes sense of homologies. 3. Evolution makes sense of homologies
Richard Owen (1848) introduced the term Homology to refer to structural similarities among organisms. To Owen, these similarities indicated that organisms were created following a common plan or archetype.
More informationAoife McLysaght Dept. of Genetics Trinity College Dublin
Aoife McLysaght Dept. of Genetics Trinity College Dublin Evolution of genome arrangement Evolution of genome content. Evolution of genome arrangement Gene order changes Inversions, translocations Evolution
More information20 Grundlagen der Bioinformatik, SS 08, D. Huson, May 27, Global and local alignment of two sequences using dynamic programming
20 Grundlagen der Bioinformatik, SS 08, D. Huson, May 27, 2008 4 Pairwise alignment We will discuss: 1. Strings 2. Dot matrix method for comparing sequences 3. Edit distance 4. Global and local alignment
More information8 Grundlagen der Bioinformatik, SS 09, D. Huson, April 28, 2009
8 Grundlagen der Bioinformatik, SS 09, D. Huson, April 28, 2009 2 Pairwise alignment We will discuss: 1. Strings 2. Dot matrix method for comparing sequences 3. Edit distance and alignment 4. The number
More informationSSR ( ) Vol. 48 No ( Microsatellite marker) ( Simple sequence repeat,ssr),
48 3 () Vol. 48 No. 3 2009 5 Journal of Xiamen University (Nat ural Science) May 2009 SSR,,,, 3 (, 361005) : SSR. 21 516,410. 60 %96. 7 %. (),(Between2groups linkage method),.,, 11 (),. 12,. (, ), : 0.
More informationSUPPORTING INFORMATION FOR. SEquence-Enabled Reassembly of β-lactamase (SEER-LAC): a Sensitive Method for the Detection of Double-Stranded DNA
SUPPORTING INFORMATION FOR SEquence-Enabled Reassembly of β-lactamase (SEER-LAC): a Sensitive Method for the Detection of Double-Stranded DNA Aik T. Ooi, Cliff I. Stains, Indraneel Ghosh *, David J. Segal
More informationEVOLUTIONARY DISTANCES
EVOLUTIONARY DISTANCES FROM STRINGS TO TREES Luca Bortolussi 1 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste luca@dmi.units.it Trieste, 14 th November 2007 OUTLINE 1 STRINGS:
More informationSequence Divergence & The Molecular Clock. Sequence Divergence
Sequence Divergence & The Molecular Clock Sequence Divergence v simple genetic distance, d = the proportion of sites that differ between two aligned, homologous sequences v given a constant mutation/substitution
More informationCSE 549: Computational Biology. Substitution Matrices
CSE 9: Computational Biology Substitution Matrices How should we score alignments So far, we ve looked at arbitrary schemes for scoring mutations. How can we assign scores in a more meaningful way? Are
More informationCharacterization of Pathogenic Genes through Condensed Matrix Method, Case Study through Bacterial Zeta Toxin
International Journal of Genetic Engineering and Biotechnology. ISSN 0974-3073 Volume 2, Number 1 (2011), pp. 109-114 International Research Publication House http://www.irphouse.com Characterization of
More informationMolecular evolution 2. Please sit in row K or forward
Molecular evolution 2 Please sit in row K or forward RBFD: cat, mouse, parasite Toxoplamsa gondii cyst in a mouse brain http://phenomena.nationalgeographic.com/2013/04/26/mind-bending-parasite-permanently-quells-cat-fear-in-mice/
More informationCollected Works of Charles Dickens
Collected Works of Charles Dickens A Random Dickens Quote If there were no bad people, there would be no good lawyers. Original Sentence It was a dark and stormy night; the night was dark except at sunny
More informationAn Introduction to Bioinformatics Algorithms Hidden Markov Models
Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training
More informationCHAPTERS 24-25: Evidence for Evolution and Phylogeny
CHAPTERS 24-25: Evidence for Evolution and Phylogeny 1. For each of the following, indicate how it is used as evidence of evolution by natural selection or shown as an evolutionary trend: a. Paleontology
More informationIntroduction to Comparative Protein Modeling. Chapter 4 Part I
Introduction to Comparative Protein Modeling Chapter 4 Part I 1 Information on Proteins Each modeling study depends on the quality of the known experimental data. Basis of the model Search in the literature
More informationMoreover, the circular logic
Moreover, the circular logic How do we know what is the right distance without a good alignment? And how do we construct a good alignment without knowing what substitutions were made previously? ATGCGT--GCAAGT
More informationIMPLEMENTING HIERARCHICAL CLUSTERING METHOD FOR MULTIPLE SEQUENCE ALIGNMENT AND PHYLOGENETIC TREE CONSTRUCTION
IMPLEMENTING HIERARCHICAL CLUSTERING METHOD FOR MULTIPLE SEQUENCE ALIGNMENT AND PHYLOGENETIC TREE CONSTRUCTION Harmandeep Singh 1, Er. Rajbir Singh Associate Prof. 2, Navjot Kaur 3 1 Lala Lajpat Rai Institute
More informationSingle alignment: Substitution Matrix. 16 march 2017
Single alignment: Substitution Matrix 16 march 2017 BLOSUM Matrix BLOSUM Matrix [2] (Blocks Amino Acid Substitution Matrices ) It is based on the amino acids substitutions observed in ~2000 conserved block
More informationWeek 5: Distance methods, DNA and protein models
Week 5: Distance methods, DNA and protein models Genome 570 February, 2016 Week 5: Distance methods, DNA and protein models p.1/69 A tree and the expected distances it predicts E A 0.08 0.05 0.06 0.03
More informationProtein Threading. Combinatorial optimization approach. Stefan Balev.
Protein Threading Combinatorial optimization approach Stefan Balev Stefan.Balev@univ-lehavre.fr Laboratoire d informatique du Havre Université du Havre Stefan Balev Cours DEA 30/01/2004 p.1/42 Outline
More informationMotivating the need for optimal sequence alignments...
1 Motivating the need for optimal sequence alignments... 2 3 Note that this actually combines two objectives of optimal sequence alignments: (i) use the score of the alignment o infer homology; (ii) use
More informationPairwise sequence alignment
Department of Evolutionary Biology Example Alignment between very similar human alpha- and beta globins: GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKL G+ +VK+HGKKV A+++++AH+D++ +++++LS+LH KL GNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKL
More informationBackground: comparative genomics. Sequence similarity. Homologs. Similarity vs homology (2) Similarity vs homology. Sequence Alignment (chapter 6)
Sequence lignment (chapter ) he biological problem lobal alignment Local alignment Multiple alignment Background: comparative genomics Basic question in biology: what properties are shared among organisms?
More informationComputational Biology
Computational Biology Lecture 6 31 October 2004 1 Overview Scoring matrices (Thanks to Shannon McWeeney) BLAST algorithm Start sequence alignment 2 1 What is a homologous sequence? A homologous sequence,
More informationMolecular Population Genetics
Molecular Population Genetics The 10 th CJK Bioinformatics Training Course in Jeju, Korea May, 2011 Yoshio Tateno National Institute of Genetics/POSTECH Top 10 species in INSDC (as of April, 2011) CONTENTS
More information8 Grundlagen der Bioinformatik, SoSe 11, D. Huson, April 18, 2011
8 Grundlagen der Bioinformatik, SoSe 11, D. Huson, April 18, 2011 2 Pairwise alignment We will discuss: 1. Strings 2. Dot matrix method for comparing sequences 3. Edit distance and alignment 4. The number
More informationSupplemental Table 1. Primers used for cloning and PCR amplification in this study
Supplemental Table 1. Primers used for cloning and PCR amplification in this study Target Gene Primer sequence NATA1 (At2g393) forward GGG GAC AAG TTT GTA CAA AAA AGC AGG CTT CAT GGC GCC TCC AAC CGC AGC
More informationAlignment & BLAST. By: Hadi Mozafari KUMS
Alignment & BLAST By: Hadi Mozafari KUMS SIMILARITY - ALIGNMENT Comparison of primary DNA or protein sequences to other primary or secondary sequences Expecting that the function of the similar sequence
More informationSupplementary Information for
Supplementary Information for Evolutionary conservation of codon optimality reveals hidden signatures of co-translational folding Sebastian Pechmann & Judith Frydman Department of Biology and BioX, Stanford
More informationBioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics
Bioinformatics 1 Biology, Sequences, Phylogenetics Part 4 Sepp Hochreiter Klausur Mo. 30.01.2011 Zeit: 15:30 17:00 Raum: HS14 Anmeldung Kusss Contents Methods and Bootstrapping of Maximum Methods Methods
More informationPairwise Sequence Alignment
Introduction to Bioinformatics Pairwise Sequence Alignment Prof. Dr. Nizamettin AYDIN naydin@yildiz.edu.tr Outline Introduction to sequence alignment pair wise sequence alignment The Dot Matrix Scoring
More informationSequence Alignment: Scoring Schemes. COMP 571 Luay Nakhleh, Rice University
Sequence Alignment: Scoring Schemes COMP 571 Luay Nakhleh, Rice University Scoring Schemes Recall that an alignment score is aimed at providing a scale to measure the degree of similarity (or difference)
More informationPhylogenetic Tree Reconstruction
I519 Introduction to Bioinformatics, 2011 Phylogenetic Tree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Evolution theory Speciation Evolution of new organisms is driven
More informationLarge-Scale Genomic Surveys
Bioinformatics Subtopics Fold Recognition Secondary Structure Prediction Docking & Drug Design Protein Geometry Protein Flexibility Homology Modeling Sequence Alignment Structure Classification Gene Prediction
More informationLecture 1, 31/10/2001: Introduction to sequence alignment. The Needleman-Wunsch algorithm for global sequence alignment: description and properties
Lecture 1, 31/10/2001: Introduction to sequence alignment The Needleman-Wunsch algorithm for global sequence alignment: description and properties 1 Computational sequence-analysis The major goal of computational
More informationPhylogenetic inference
Phylogenetic inference Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, March 7 th 016 After this lecture, you can discuss (dis-) advantages of different information types
More informationAn Introduction to Sequence Similarity ( Homology ) Searching
An Introduction to Sequence Similarity ( Homology ) Searching Gary D. Stormo 1 UNIT 3.1 1 Washington University, School of Medicine, St. Louis, Missouri ABSTRACT Homologous sequences usually have the same,
More informationThanks to Paul Lewis, Jeff Thorne, and Joe Felsenstein for the use of slides
hanks to Paul Lewis, Jeff horne, and Joe Felsenstein for the use of slides Hennigian logic reconstructs the tree if we know polarity of characters and there is no homoplasy UPM infers a tree from a distance
More informationpart 4: phenomenological load and biological inference. phenomenological load review types of models. Gαβ = 8π Tαβ. Newton.
2017-07-29 part 4: and biological inference review types of models phenomenological Newton F= Gm1m2 r2 mechanistic Einstein Gαβ = 8π Tαβ 1 molecular evolution is process and pattern process pattern MutSel
More informationSupplementary Information for Hurst et al.: Causes of trends of amino acid gain and loss
Supplementary Information for Hurst et al.: Causes of trends of amino acid gain and loss Methods Identification of orthologues, alignment and evolutionary distances A preliminary set of orthologues was
More informationLecture 8 Multiple Alignment and Phylogeny
Introduction to Bioinformatics for Medical Research Gideon Greenspan gdg@cs.technion.ac.il Lecture 8 Multiple Alignment and Phylogeny Multiple Alignment & Phylogeny Multiple Alignment Scoring Complexity
More informationLecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM).
1 Bioinformatics: In-depth PROBABILITY & STATISTICS Spring Semester 2011 University of Zürich and ETH Zürich Lecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM). Dr. Stefanie Muff
More informationHidden Markov Models
Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training
More informationBioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment
Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment Substitution score matrices, PAM, BLOSUM Needleman-Wunsch algorithm (Global) Smith-Waterman algorithm (Local) BLAST (local, heuristic) E-value
More informationEstimating Phylogenies (Evolutionary Trees) II. Biol4230 Thurs, March 2, 2017 Bill Pearson Jordan 6-057
Estimating Phylogenies (Evolutionary Trees) II Biol4230 Thurs, March 2, 2017 Bill Pearson wrp@virginia.edu 4-2818 Jordan 6-057 Tree estimation strategies: Parsimony?no model, simply count minimum number
More informationPractical considerations of working with sequencing data
Practical considerations of working with sequencing data File Types Fastq ->aligner -> reference(genome) coordinates Coordinate files SAM/BAM most complete, contains all of the info in fastq and more!
More informationPhylogenetics: Distance Methods. COMP Spring 2015 Luay Nakhleh, Rice University
Phylogenetics: Distance Methods COMP 571 - Spring 2015 Luay Nakhleh, Rice University Outline Evolutionary models and distance corrections Distance-based methods Evolutionary Models and Distance Correction
More informationComparing whole genomes
BioNumerics Tutorial: Comparing whole genomes 1 Aim The Chromosome Comparison window in BioNumerics has been designed for large-scale comparison of sequences of unlimited length. In this tutorial you will
More informationCopyright 2000 N. AYDIN. All rights reserved. 1
Introduction to Bioinformatics Prof. Dr. Nizamettin AYDIN naydin@yildiz.edu.tr Multiple Sequence Alignment Outline Multiple sequence alignment introduction to msa methods of msa progressive global alignment
More informationLecture Notes: BIOL2007 Molecular Evolution
Lecture Notes: BIOL2007 Molecular Evolution Kanchon Dasmahapatra (k.dasmahapatra@ucl.ac.uk) Introduction By now we all are familiar and understand, or think we understand, how evolution works on traits
More informationO 3 O 4 O 5. q 3. q 4. Transition
Hidden Markov Models Hidden Markov models (HMM) were developed in the early part of the 1970 s and at that time mostly applied in the area of computerized speech recognition. They are first described in
More informationWhy do more divergent sequences produce smaller nonsynonymous/synonymous
Genetics: Early Online, published on June 21, 2013 as 10.1534/genetics.113.152025 Why do more divergent sequences produce smaller nonsynonymous/synonymous rate ratios in pairwise sequence comparisons?
More informationPhylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center
Phylogenetic Analysis Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center Outline Basic Concepts Tree Construction Methods Distance-based methods
More informationMultiple Sequence Alignment. Sequences
Multiple Sequence Alignment Sequences > YOR020c mstllksaksivplmdrvlvqrikaqaktasglylpe knveklnqaevvavgpgftdangnkvvpqvkvgdqvl ipqfggstiklgnddevilfrdaeilakiakd > crassa mattvrsvksliplldrvlvqrvkaeaktasgiflpe
More informationNature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1
Supplementary Figure 1 Zn 2+ -binding sites in USP18. (a) The two molecules of USP18 present in the asymmetric unit are shown. Chain A is shown in blue, chain B in green. Bound Zn 2+ ions are shown as
More information