Multiple Sequence Alignment in Historical Linguistics

Size: px
Start display at page:

Download "Multiple Sequence Alignment in Historical Linguistics"

Transcription

1 1 / 32 Multiple Sequence Alignment in Historical Linguistics A Sound Class Based Approach Johann-Mattis List Institute for Romance Languages and Literature Heinrich Heine University Düsseldorf 2011/01/06

2 2 / 32 Structure of the Talk Sequences Alignments Pairwise Sequence Alignment Multiple Sequence Alignment Similarity Sound Classes Main Ideas Working Principle Scoring TPPSR

3 3 / 32 Sequences Alignments - Sequences - - Alignments -

4 4 / 32 Sequences Alignments Sequences Sets Sets are unordered lists of unique objects Sets are compared by comparing the objects of different sets Sequences Sequences are ordered lists of non-unique objects Sequences are compared by comparing both the objects (segments) and the structure of different sequences

5 5 / 32 Alignments Sequences Alignments Sequence Alignment In alignment analyses, the corresponding segments of two or more sequences are ordered in such a way that they are set against each other Segments which do not correspond to any other segments are marked by gaps (-) In this way, both, the structure and the segments of two or more sequences can be compared

6 Alignments Sequences Alignments ʧ ɪ l ɐ vʲ ɛ k ʧ o v ɛ k 1 6 / 32

7 Alignments Sequences Alignments ʧ ɪ l ɐ vʲ ɛ k ʧ o v ɛ k 1 6 / 32

8 Alignments Sequences Alignments ʧ ɪ l ɐ vʲ ɛ k ʧ - - o v ɛ k 1 6 / 32

9 Pairwise Sequence Alignment Multiple Sequence Alignment hjärta cordis herz heart h j - ä r t a - h - e - r z - - h - e a r t - - c - - o r d i s 1 7 / 32

10 8 / 32 Pairwise Sequence Alignment Multiple Sequence Alignment Pairwise Sequence Alignment Create a matrix which confronts all segments of two sequences, either with each other, or with gaps Seek the path through the matrix which is of the lowest cost (or the highest score) Calculate the cost (or the score) cumulatively by scoring the matching of segments with segments and with gaps by means of a specific scoring function

11 9 / 32 Pairwise Sequence Alignment Pairwise Sequence Alignment Multiple Sequence Alignment T E S T T E S T 8

12 9 / 32 Pairwise Sequence Alignment Pairwise Sequence Alignment Multiple Sequence Alignment T E S T T E S T 6

13 9 / 32 Pairwise Sequence Alignment Pairwise Sequence Alignment Multiple Sequence Alignment T E S - T T - E S T 8

14 9 / 32 Pairwise Sequence Alignment Pairwise Sequence Alignment Multiple Sequence Alignment T E S T T - E S T 7

15 9 / 32 Pairwise Sequence Alignment Pairwise Sequence Alignment Multiple Sequence Alignment T E - S T T - - E S T 8

16 9 / 32 Pairwise Sequence Alignment Pairwise Sequence Alignment Multiple Sequence Alignment T E S T T - - E S T 7

17 9 / 32 Pairwise Sequence Alignment Pairwise Sequence Alignment Multiple Sequence Alignment T - E S T T E S T 8

18 9 / 32 Pairwise Sequence Alignment Pairwise Sequence Alignment Multiple Sequence Alignment T E S T T E S T 6

19 9 / 32 Pairwise Sequence Alignment Pairwise Sequence Alignment Multiple Sequence Alignment T E S T - - T - - E S T 5

20 9 / 32 Pairwise Sequence Alignment Pairwise Sequence Alignment Multiple Sequence Alignment T E S - T - - T - - E - S T 6

21 9 / 32 Pairwise Sequence Alignment Pairwise Sequence Alignment Multiple Sequence Alignment T E S T - - T - E - S T 5

22 9 / 32 Pairwise Sequence Alignment Pairwise Sequence Alignment Multiple Sequence Alignment T E - S T - - T - E - - S T 6

23 9 / 32 Pairwise Sequence Alignment Pairwise Sequence Alignment Multiple Sequence Alignment T E S T - - T E - - S T 4

24 9 / 32 Pairwise Sequence Alignment Pairwise Sequence Alignment Multiple Sequence Alignment T E S T - T E - S T 3

25 9 / 32 Pairwise Sequence Alignment Pairwise Sequence Alignment Multiple Sequence Alignment T E - S T - T E S - - T 4

26 9 / 32 Pairwise Sequence Alignment Pairwise Sequence Alignment Multiple Sequence Alignment T E S T - T E S - T 2

27 9 / 32 Pairwise Sequence Alignment Pairwise Sequence Alignment Multiple Sequence Alignment T E S T T E S T 0

28 10 / 32 Pairwise Sequence Alignment Multiple Sequence Alignment Multiple Sequence Alignments Guide Tree Heuristics Due to computational restrictions, multiple sequence alignment (MSA) is based on heuristics Heuristics based on guide-trees are the most common ones used in computational biology Based on pairwise alignment scores, a guide-tree is reconstructed, and the sequences are stepwise added to the MSA along it (Feng & Dolittle 1987)

29 11 / 32 Multiple Sequence Alignment Pairwise Sequence Alignment Multiple Sequence Alignment čelovek human člověk human człowiek human čovek human Russian Czech Polish Bulgarian

30 11 / 32 Pairwise Sequence Alignment Multiple Sequence Alignment Multiple Sequence Alignment ʧɪlɐvʲɛk Russian ʧlovʲɛk Czech ʧwɔvʲɛk Polish ʧovɛk Bulgarian

31 11 / 32 Pairwise Sequence Alignment Multiple Sequence Alignment Multiple Sequence Alignment ʧɪlɐvʲɛk Russian ʧlovʲɛk Czech ʧwɔvʲɛk Polish ʧovɛk Bulgarian

32 11 / 32 Pairwise Sequence Alignment Multiple Sequence Alignment Multiple Sequence Alignment ʧɪlɐvʲɛk Russian ʧlovʲɛk Czech ʧwɔvʲɛk Polish ʧovɛk Bulgarian

33 11 / 32 Pairwise Sequence Alignment Multiple Sequence Alignment Multiple Sequence Alignment ʧɪlɐvʲɛk Russian ʧlovʲɛk Czech ʧwɔvʲɛk Polish ʧovɛk Bulgarian

34 11 / 32 Pairwise Sequence Alignment Multiple Sequence Alignment Multiple Sequence Alignment ʧ ɪ l ɐ vʲ ɛ k ʧ - l o vʲ ɛ k ʧɪlɐvʲɛk Russian ʧlovʲɛk Czech ʧwɔvʲɛk Polish ʧovɛk Bulgarian

35 11 / 32 Pairwise Sequence Alignment Multiple Sequence Alignment Multiple Sequence Alignment ʧ ɪ l ɐ vʲ ɛ k ʧ - l o vʲ ɛ k ʧ w ɔ vʲ ɛ k ʧ - o v ɛ k ʧɪlɐvʲɛk Russian ʧlovʲɛk Czech ʧwɔvʲɛk Polish ʧovɛk Bulgarian

36 11 / 32 Pairwise Sequence Alignment Multiple Sequence Alignment Multiple Sequence Alignment ʧ ɪ l ɐ vʲ ɛ k ʧ l o vʲ ɛ k ʧ w ɔ vʲ ɛ k ʧ - - o v ɛ k ʧ ɪ l ɐ vʲ ɛ k ʧ - l o vʲ ɛ k ʧ w ɔ vʲ ɛ k ʧ - o v ɛ k ʧɪlɐvʲɛk Russian ʧlovʲɛk Czech ʧwɔvʲɛk Polish ʧovɛk Bulgarian

37 12 / 32 Pairwise Sequence Alignment Multiple Sequence Alignment Multiple Sequence Alignment Profiles The guide-tree heuristic can be enhanced by the application of profiles A profile consists of the relative frequency of all segments of an MSA in all its positions, thus, a profile represents an MSA as a sequence of vectors Aligning profiles to profiles instead of aligning two representative sequences of two given MSA yields better results, since more information can be taken into account

38 13 / 32 Pairwise Sequence Alignment Multiple Sequence Alignment Multiple Sequence Alignment ʧ ɪ l ɐ vʲ ɛ k ʧ l o vʲ ɛ k ʧ w ɔ vʲ ɛ k ʧ - - o v ɛ k ʧ ɪ l ɐ vʲ ɛ k ʧ - l o vʲ ɛ k ʧ w ɔ vʲ ɛ k ʧ - o v ɛ k ʧɪlɐvʲɛk Russian ʧlovʲɛk Czech ʧwɔvʲɛk Polish ʧovɛk Bulgarian

39 13 / 32 Multiple Sequence Alignment Pairwise Sequence Alignment Multiple Sequence Alignment ʧ ɪ l ɐ vʲ ɛ k ʧ l o vʲ ɛ k ʧ w ɔ vʲ ɛ k ʧ - - o v ɛ k

40 13 / 32 Multiple Sequence Alignment Pairwise Sequence Alignment Multiple Sequence Alignment ʧ 10 ʧ ɪ l ɐ vʲ ɛ k ʧ l o vʲ ɛ k ʧ w ɔ vʲ ɛ k ʧ - - o v ɛ k

41 13 / 32 Multiple Sequence Alignment Pairwise Sequence Alignment Multiple Sequence Alignment ʧ ɪ l ɐ vʲ ɛ k ʧ l o vʲ ɛ k ʧ w ɔ vʲ ɛ k ʧ - - o v ɛ k ʧ 10 ɪ 25-75

42 13 / 32 Multiple Sequence Alignment Pairwise Sequence Alignment Multiple Sequence Alignment ʧ ɪ l ɐ vʲ ɛ k ʧ l o vʲ ɛ k ʧ w ɔ vʲ ɛ k ʧ - - o v ɛ k ʧ 10 ɪ l 5 w 25

43 13 / 32 Multiple Sequence Alignment Pairwise Sequence Alignment Multiple Sequence Alignment ʧ ɪ l ɐ vʲ ɛ k ʧ l o vʲ ɛ k ʧ w ɔ vʲ ɛ k ʧ - - o v ɛ k ʧ 10 ɪ l 5 w 25 o 5 ɔ 25 ɐ 25

44 13 / 32 Multiple Sequence Alignment Pairwise Sequence Alignment Multiple Sequence Alignment ʧ ɪ l ɐ vʲ ɛ k ʧ l o vʲ ɛ k ʧ w ɔ vʲ ɛ k ʧ - - o v ɛ k ʧ 10 ɪ l 5 w 25 o 5 ɔ 25 ɐ 25 vʲ 75 v 25

45 13 / 32 Multiple Sequence Alignment Pairwise Sequence Alignment Multiple Sequence Alignment ʧ ɪ l ɐ vʲ ɛ k ʧ l o vʲ ɛ k ʧ w ɔ vʲ ɛ k ʧ - - o v ɛ k ʧ 10 ɪ l 5 w 25 o 5 ɔ 25 ɐ 25 vʲ 75 v 25 ɛ 10

46 13 / 32 Multiple Sequence Alignment Pairwise Sequence Alignment Multiple Sequence Alignment ʧ ɪ l ɐ vʲ ɛ k ʧ l o vʲ ɛ k ʧ w ɔ vʲ ɛ k ʧ - - o v ɛ k ʧ 10 ɪ l 5 w 25 o 5 ɔ 25 ɐ 25 vʲ 75 v 25 ɛ 10 k 10

47 13 / 32 Multiple Sequence Alignment Pairwise Sequence Alignment Multiple Sequence Alignment ʧ ɪ l ɐ vʲ ɛ k ʧ l o vʲ ɛ k ʧ w ɔ vʲ ɛ k ʧ - - o v ɛ k ʧ 10 ɪ l 5 w 25 o 5 ɔ 25 ɐ 25 vʲ 75 v 25 ɛ 10 k 10

48 14 / 32 Similarity Sound Classes *ph2tēr *faθēr father 1

49 15 / 32 Similarity Similarity Sound Classes Synchronic Similarity Sounds in different languages are judged to be similar, if they show resemblences regarding the way they are produced or perceived Diachronic Similarity Sounds in different languages are judged to be similar, if they go back to a common ancestor

50 Similarity Sound Classes Similarity Language Word Meaning Mandarin ma⁵⁵ma³ mother German mama mother Russian tak in this way German tʰaːk day 16 / 32

51 Similarity Sound Classes Similarity Language Word Meaning German ʦʰaːn tooth English tʊːθ tooth Italian dɛntɛ tooth French dɑ tooth 16 / 32

52 Similarity Similarity Sound Classes German ʦʰ aː n - * Proto-Germanic t a n d English t ʊː θ - ** Proto-Indo-European d o n t Italian d ɛ n t ɛ * Proto-Romance d e n t French d a - - funktionier endlich! 17 / 32

53 Similarity Similarity Sound Classes German ʦʰ aː n - * Proto-Germanic t a n d English t ʊː θ - ** Proto-Indo-European d o n t Italian d ɛ n t ɛ * Proto-Romance d e n t French d a - - funktionier endlich! 17 / 32

54 Similarity Similarity Sound Classes German ʦʰ aː n - * Proto-Germanic t a n d English t ʊː - θ ** Proto-Indo-European d o n t Italian d ɛ n t ə * Proto-Romance d e n t French d a - - funktionier endlich! 17 / 32

55 Similarity Similarity Sound Classes German ʦʰ aː n - * Proto-Germanic t a n θ English t ʊː - θ ** Proto-Indo-European d o n t Italian d ɛ n t ə * Proto-Romance d e n t French d a - - funktionier endlich! 17 / 32

56 Similarity Similarity Sound Classes German ʦʰ aː n - * Proto-Germanic t a n θ English t ʊː - θ ** Proto-Indo-European d o n t Italian d ɛ n t ə * Proto-Romance d e n t French d a - - funktionier endlich! 17 / 32

57 Similarity Similarity Sound Classes German ʦʰ aː n - * Proto-Germanic t a n θ English t ʊː - θ ** Proto-Indo-European d o n t Italian d ɛ n t ə * Proto-Romance d e n t French d a - - funktionier endlich! 17 / 32

58 Similarity Similarity Sound Classes German ʦʰ aː n - * Proto-Germanic t a n d English t ʊː - θ ** Proto-Indo-European d o n t Italian d ɛ n t ə * Proto-Romance d e n t French d a - - funktionier endlich! 17 / 32

59 Sound Classes Similarity Sound Classes Correspondence Classes In sound class approaches, sounds are divided into several types and thereby distinguished in such a way that phonetic correspondences inside a type are more regular than those between different types (Dolgopolsky 1986: 35) Diachronic Similarity Similarity is not based on synchronic resemblances of sounds but on class-membership: two sounds, how dissimilar they may be from a synchronic perspective, may still belong to the same class Class membership indicates that the probability that sounds occur in a correspondence relationship in genetically related languages is considerably high 18 / 32

60 19 / 32 Sound Classes Similarity Sound Classes k g p b ʧ ʤ f v t d ʃ ʒ θ ð s z

61 19 / 32 Sound Classes Similarity Sound Classes k g p b ʧ ʤ f v t d ʃ ʒ θ ð s z

62 19 / 32 Sound Classes Similarity Sound Classes k g p b ʧ ʤ f v t d ʃ ʒ θ ð s z

63 19 / 32 Sound Classes Similarity Sound Classes K P T S

64 20 / 32 Main Ideas Working Principle Scoring

65 21 / 32 Main Ideas Working Principle Scoring A Python Library for Sequence Alignment (wwwlingulistde/lingpy) is a suite of open source Python modules for sequence comparison, and distance analyses in quantitative historical linguistics The library allows to carry out both pairwise and multiple alignments of strings encoded in IPA or X-Sampa, using different methods and algorithms, such as global (Needleman & Wunsch 1970) and local (Smith & Waterman 1981) pairwise alignments, multiple alignments based on guide trees (Feng & Doolittle 1987), profiles (Thompson et al 1994), or iteration (Barton & Sternberg 1987)

66 22 / 32 Main Ideas Main Ideas Working Principle Scoring Alignment of Sound Class Sequences In contrast to previous approaches, which base the alignment on the sequences as they are given from the input, within the sound class approach, the input strings are first converted to sound classes before they are aligned Transitions Between Sound Classes In contrast to previous sound class approaches (cf eg Turchin et al 2010), which do not allow for transitions between sound classes, this approach is based on a specific scoring function, which defines (diachronic) similarity among different sound classes

67 23 / 32 Working Principle Main Ideas Working Principle Scoring INPUT ʧɪlɐvʲɛk ʧovɛk

68 23 / 32 Working Principle Main Ideas Working Principle Scoring INPUT ʧɪlɐvʲɛk ʧovɛk TOKENIZATION ʧ, ɪ, l, ɐ, vʲ, ɛ, k ʧ, o, v, ɛ, k

69 23 / 32 Working Principle Main Ideas Working Principle Scoring INPUT ʧɪlɐvʲɛk ʧovɛk TOKENIZATION ʧ, ɪ, l, ɐ, vʲ, ɛ, k ʧ, o, v, ɛ, k CONVERSION CILAWEK COWEK

70 23 / 32 Working Principle Main Ideas Working Principle Scoring INPUT ʧɪlɐvʲɛk ʧovɛk TOKENIZATION ʧ, ɪ, l, ɐ, vʲ, ɛ, k ʧ, o, v, ɛ, k CONVERSION CILAWEK COWEK ALIGNMENT C I L A W E K C - - O W E K

71 23 / 32 Working Principle Main Ideas Working Principle Scoring INPUT ʧɪlɐvʲɛk ʧovɛk TOKENIZATION ʧ, ɪ, l, ɐ, vʲ, ɛ, k ʧ, o, v, ɛ, k CONVERSION CILAWEK COWEK ALIGNMENT C I L A W E K C - - O W E K OUTPUT ʧ ɪ l ɐ vʲ ɛ k ʧ - - o v ɛ k

72 Scoring Main Ideas Working Principle Scoring Directionality of Sound Changes One crucial characteristic of certain well-known sound changes is their directionality, ie if certain sounds change, this change will go into a certain direction and the reverse change can rarely be attested Directionality and Sound Correspondences While the nature of certain sound changes may be directional, sound correspondences do not directly reflect this directionality, and neither do scoring functions for sequence alignments, since these are not directional per definitionem, since the distance or similarity between two segments is always the same, regardless from which segment we start to compare 24 / 32

73 25 / 32 Scoring Main Ideas Working Principle Scoring Reflecting Directionality in Undirected Networks In this approach, the directionality of certain sound changes is accounted for by creating a non-metric scoring function While in a metric scoring function the distance between two segments A and B would depend on the distance of A and B to a third segment C in such a way that, according to the triangle inequality the distance from A to B could not exceed the sum of the distances from A to C and from B to C, this does not hold for the probability of those sound correspondences, which occur as a product of directional sound change

74 Scoring Main Ideas Working Principle Scoring 10 dentals affricates fricatives velars / 32

75 Scoring Main Ideas Working Principle Scoring 10 dentals affricates fricatives velars / 32

76 Scoring Main Ideas Working Principle Scoring 10 dentals affricates fricatives velars / 32

77 Scoring Main Ideas Working Principle Scoring 10 dentals affricates fricatives velars / 32

78 Scoring Main Ideas Working Principle Scoring 10 dentals affricates fricatives velars / 32

79 Scoring Main Ideas Working Principle Scoring 10 dentals affricates fricatives velars / 32

80 Scoring Main Ideas Working Principle Scoring 10 dentals affricates fricatives velars / 32

81 27 / 32 TPPSR * * * * * * * * * * * * * v o l - d e m o r t v - l a d i m i r - v a l - d e m a r - 1

82 TPPSR 28 / 32

83 TPPSR >>> from lingpycompareseqcom import Multiple 28 / 32

84 TPPSR >>> from lingpycompareseqcom import Multiple >>> mult = Multiple(['ʧwovʲɛk', 'ʧovɛk',\ 'ʧlɔvʲɛk', 'ʧɪlɐvʲɛk']) 28 / 32

85 TPPSR >>> from lingpycompareseqcom import Multiple >>> mult = Multiple(['ʧwovʲɛk', 'ʧovɛk',\ 'ʧlɔvʲɛk', 'ʧɪlɐvʲɛk']) >>> print ', 'join(multipt_seqs) 28 / 32

86 TPPSR >>> from lingpycompareseqcom import Multiple >>> mult = Multiple(['ʧwovʲɛk', 'ʧovɛk',\ 'ʧlɔvʲɛk', 'ʧɪlɐvʲɛk']) >>> print ', 'join(multipt_seqs) ʧwɔvʲɛk, ʧovɛk, ʧlovʲɛk, ʧɪlɐvʲɛk 28 / 32

87 TPPSR >>> from lingpycompareseqcom import Multiple >>> mult = Multiple(['ʧwovʲɛk', 'ʧovɛk',\ 'ʧlɔvʲɛk', 'ʧɪlɐvʲɛk']) >>> print ', 'join(multipt_seqs) ʧwɔvʲɛk, ʧovɛk, ʧlovʲɛk, ʧɪlɐvʲɛk >>> multprog_align(method='sca',mode='profile') 28 / 32

88 TPPSR >>> from lingpycompareseqcom import Multiple >>> mult = Multiple(['ʧwovʲɛk', 'ʧovɛk',\ 'ʧlɔvʲɛk', 'ʧɪlɐvʲɛk']) >>> print ', 'join(multipt_seqs) ʧwɔvʲɛk, ʧovɛk, ʧlovʲɛk, ʧɪlɐvʲɛk >>> multprog_align(method='sca',mode='profile') ʧ - w ɔ vʲ ɛ k ʧ - - o v ɛ k ʧ - l o vʲ ɛ k ʧ ɪ l ɐ vʲ ɛ k 28 / 32

89 TPPSR >>> from lingpycompareseqcom import Multiple >>> mult = Multiple(['ʧwovʲɛk', 'ʧovɛk',\ 'ʧlɔvʲɛk', 'ʧɪlɐvʲɛk']) >>> print ', 'join(multipt_seqs) ʧwɔvʲɛk, ʧovɛk, ʧlovʲɛk, ʧɪlɐvʲɛk >>> multprog_align(method='sca',mode='profile') ʧ - w ɔ vʲ ɛ k ʧ - - o v ɛ k ʧ - l o vʲ ɛ k ʧ ɪ l ɐ vʲ ɛ k >>> multshow_guide_tree() 28 / 32

90 TPPSR >>> from lingpycompareseqcom import Multiple >>> mult = Multiple(['ʧwovʲɛk', 'ʧovɛk',\ 'ʧlɔvʲɛk', 'ʧɪlɐvʲɛk']) >>> print ', 'join(multipt_seqs) ʧwɔvʲɛk, ʧovɛk, ʧlovʲɛk, ʧɪlɐvʲɛk >>> multprog_align(method='sca',mode='profile') ʧ - w ɔ vʲ ɛ k ʧ - - o v ɛ k ʧ - l o vʲ ɛ k ʧ ɪ l ɐ vʲ ɛ k >>> multshow_guide_tree() /-0:ʧwɔvʲɛk / \-1:ʧovɛk /-3:ʧlovʲɛk \ \-2:ʧɪlɐvʲɛk 28 / 32

91 TPPSR >>> from lingpycompareseqcom import Multiple >>> mult = Multiple(['ʧwovʲɛk', 'ʧovɛk',\ 'ʧlɔvʲɛk', 'ʧɪlɐvʲɛk']) >>> print ', 'join(multipt_seqs) ʧwɔvʲɛk, ʧovɛk, ʧlovʲɛk, ʧɪlɐvʲɛk >>> multprog_align(method='sca',mode='profile') ʧ - w ɔ vʲ ɛ k ʧ - - o v ɛ k ʧ - l o vʲ ɛ k ʧ ɪ l ɐ vʲ ɛ k >>> multshow_guide_tree() /-0:ʧwɔvʲɛk / \-1:ʧovɛk /-3:ʧlovʲɛk \ \-2:ʧɪlɐvʲɛk >>> print ', 'join([seqcls_str for seq in \ multlingpy_seqs]) 28 / 32

92 TPPSR >>> from lingpycompareseqcom import Multiple >>> mult = Multiple(['ʧwovʲɛk', 'ʧovɛk',\ 'ʧlɔvʲɛk', 'ʧɪlɐvʲɛk']) >>> print ', 'join(multipt_seqs) ʧwɔvʲɛk, ʧovɛk, ʧlovʲɛk, ʧɪlɐvʲɛk >>> multprog_align(method='sca',mode='profile') ʧ - w ɔ vʲ ɛ k ʧ - - o v ɛ k ʧ - l o vʲ ɛ k ʧ ɪ l ɐ vʲ ɛ k >>> multshow_guide_tree() /-0:ʧwɔvʲɛk / \-1:ʧovɛk /-3:ʧlovʲɛk \ \-2:ʧɪlɐvʲɛk >>> print ', 'join([seqcls_str for seq in \ multlingpy_seqs]) CWOWEK, COWEK, CLOWEK, CILAWEK 28 / 32

93 TPPSR 29 / 32

94 >>> multflat_cluster(03,method='sca') TPPSR 29 / 32

95 >>> multflat_cluster(03,method='sca') [1, 1, 1, 1] TPPSR 29 / 32

96 TPPSR >>> multflat_cluster(03,method='sca') [1, 1, 1, 1] >>> multprog_align(method='sca',mode='profile')\ # profile-based alignment 29 / 32

97 TPPSR >>> multflat_cluster(03,method='sca') [1, 1, 1, 1] >>> multprog_align(method='sca',mode='profile')\ # profile-based alignment ʧ - w ɔ vʲ ɛ k ʧ - - o v ɛ k ʧ - l o vʲ ɛ k ʧ ɪ l ɐ vʲ ɛ k 29 / 32

98 TPPSR >>> multflat_cluster(03,method='sca') [1, 1, 1, 1] >>> multprog_align(method='sca',mode='profile')\ # profile-based alignment ʧ - w ɔ vʲ ɛ k ʧ - - o v ɛ k ʧ - l o vʲ ɛ k ʧ ɪ l ɐ vʲ ɛ k >>> multsum_of_pairs() 29 / 32

99 TPPSR >>> multflat_cluster(03,method='sca') [1, 1, 1, 1] >>> multprog_align(method='sca',mode='profile')\ # profile-based alignment ʧ - w ɔ vʲ ɛ k ʧ - - o v ɛ k ʧ - l o vʲ ɛ k ʧ ɪ l ɐ vʲ ɛ k >>> multsum_of_pairs() / 32

100 TPPSR >>> multflat_cluster(03,method='sca') [1, 1, 1, 1] >>> multprog_align(method='sca',mode='profile')\ # profile-based alignment ʧ - w ɔ vʲ ɛ k ʧ - - o v ɛ k ʧ - l o vʲ ɛ k ʧ ɪ l ɐ vʲ ɛ k >>> multsum_of_pairs() >>> multprog_align(method='sca',mode='fd') \ # simple guide-tree alignment 29 / 32

101 TPPSR >>> multflat_cluster(03,method='sca') [1, 1, 1, 1] >>> multprog_align(method='sca',mode='profile')\ # profile-based alignment ʧ - w ɔ vʲ ɛ k ʧ - - o v ɛ k ʧ - l o vʲ ɛ k ʧ ɪ l ɐ vʲ ɛ k >>> multsum_of_pairs() >>> multprog_align(method='sca',mode='fd') \ # simple guide-tree alignment ʧ w - ɔ vʲ ɛ k ʧ - - o v ɛ k ʧ - l o vʲ ɛ k ʧ ɪ l ɐ vʲ ɛ k 29 / 32

102 TPPSR >>> multflat_cluster(03,method='sca') [1, 1, 1, 1] >>> multprog_align(method='sca',mode='profile')\ # profile-based alignment ʧ - w ɔ vʲ ɛ k ʧ - - o v ɛ k ʧ - l o vʲ ɛ k ʧ ɪ l ɐ vʲ ɛ k >>> multsum_of_pairs() >>> multprog_align(method='sca',mode='fd') \ # simple guide-tree alignment ʧ w - ɔ vʲ ɛ k ʧ - - o v ɛ k ʧ - l o vʲ ɛ k ʧ ɪ l ɐ vʲ ɛ k >>> multiterate() 29 / 32

103 TPPSR >>> multflat_cluster(03,method='sca') [1, 1, 1, 1] >>> multprog_align(method='sca',mode='profile')\ # profile-based alignment ʧ - w ɔ vʲ ɛ k ʧ - - o v ɛ k ʧ - l o vʲ ɛ k ʧ ɪ l ɐ vʲ ɛ k >>> multsum_of_pairs() >>> multprog_align(method='sca',mode='fd') \ # simple guide-tree alignment ʧ w - ɔ vʲ ɛ k ʧ - - o v ɛ k ʧ - l o vʲ ɛ k ʧ ɪ l ɐ vʲ ɛ k >>> multiterate() Old SoP score: New SoP score: / 32

104 TPPSR >>> multflat_cluster(03,method='sca') [1, 1, 1, 1] >>> multprog_align(method='sca',mode='profile')\ # profile-based alignment ʧ - w ɔ vʲ ɛ k ʧ - - o v ɛ k ʧ - l o vʲ ɛ k ʧ ɪ l ɐ vʲ ɛ k >>> multsum_of_pairs() >>> multprog_align(method='sca',mode='fd') \ # simple guide-tree alignment ʧ w - ɔ vʲ ɛ k ʧ - - o v ɛ k ʧ - l o vʲ ɛ k ʧ ɪ l ɐ vʲ ɛ k >>> multiterate() Old SoP score: New SoP score: ʧ - w ɔ vʲ ɛ k ʧ - - o v ɛ k ʧ - l o vʲ ɛ k ʧ ɪ l ɐ vʲ ɛ k 29 / 32

105 TPPSR TPPSR IPA-Encoding of the TPPSR The Tableaux phonétiques des patois suisses romand (TPPSR, Gauchat et al 1925) is a collection of phonetic dialect data, which was digitized in an earlier research project of the Institute for Romance Languages and Literature (Heinrich Heine University Düsseldorf) The original data was converted to IPA in order make it suitable for alignment analyses using the library The dataset consists of 480 charts (480 words and phrases) which contain phonetic information for 62 dialect points Analysis within The analysis within is done via a simple terminal-based interface which takes text-files as input and outputs the results of the alignment analyses as text-files 30 / 32

106 31 / 32 TPPSR TPPSR tppsr 69,sont tout près,pressu 2 sɔ tɔpriː 3 isɔ tɔʤeː 5 eisəɔ tɔprei 8 sɔ pre 11 sɔ tɔpruːʦɔ 18 sɔ pre 19 sɔ tɔpre 30 ʃʊnpre 31 ʃɔñtɔprei 34 isɔ tɔpre 54 ɛsɔ tɔprɛ 55 prɛj 56 asa o tɔdkoːt 57 sɔ tɔpreː 58 asɔ tɔpreŋ Interesting Site!

107 31 / 32 TPPSR TPPSR tppsr 69,sont tout près,pressu s ɔ - t ɔ p r iː isɔ tɔʤeː 5 1 ei s əɔ - t ɔ p r ei s ɔ p r e s ɔ - t ɔ p r uː ʦ ɔ s ɔ p r e s ɔ - t ɔ p r e ʃ ʊ n - - p r e ʃ ɔ n t ɔ p r ei i s ɔ - t ɔ p r e ɛ s ɔ - t ɔ p r ɛ p r ɛ j asa o tɔdkoːt s ɔ - t ɔ p r eː a s ɔ - t ɔ p r e ŋ -

108 31 / 32 TPPSR TPPSR tppsr 69,sont tout près,pressu s ɔ - t ɔ p r iː isɔ tɔʤeː 5 1 ei s əɔ - t ɔ p r ei s ɔ p r e s ɔ - t ɔ p r uː ʦ ɔ s ɔ p r e s ɔ - t ɔ p r e ʃ ʊ n - - p r e ʃ ɔ n t ɔ p r ei i s ɔ - t ɔ p r e ɛ s ɔ - t ɔ p r ɛ p r ɛ j asa o tɔdkoːt s ɔ - t ɔ p r eː a s ɔ - t ɔ p r e ŋ - Taxon-ID

109 31 / 32 TPPSR TPPSR tppsr 69,sont tout près,pressu s ɔ - t ɔ p r iː isɔ tɔʤeː 5 1 ei s əɔ - t ɔ p r ei s ɔ p r e s ɔ - t ɔ p r uː ʦ ɔ s ɔ p r e s ɔ - t ɔ p r e ʃ ʊ n - - p r e ʃ ɔ n t ɔ p r ei i s ɔ - t ɔ p r e ɛ s ɔ - t ɔ p r ɛ p r ɛ j asa o tɔdkoːt s ɔ - t ɔ p r eː a s ɔ - t ɔ p r e ŋ - Cluster-ID

110 31 / 32 TPPSR TPPSR tppsr 69,sont tout près,pressu s ɔ - t ɔ p r iː isɔ tɔʤeː Singleton 5 1 ei s əɔ - t ɔ p r ei s ɔ p r e s ɔ - t ɔ p r uː ʦ ɔ s ɔ p r e s ɔ - t ɔ p r e ʃ ʊ n - - p r e ʃ ɔ n t ɔ p r ei i s ɔ - t ɔ p r e ɛ s ɔ - t ɔ p r ɛ p r ɛ j asa o tɔdkoːt Singleton s ɔ - t ɔ p r eː a s ɔ - t ɔ p r e ŋ - Cluster-ID Taxon-ID

111 31 / 32 TPPSR TPPSR tppsr 69,sont tout près,pressu s ɔ - t ɔ p r iː - - Singleton 5 1 ei s əɔ - t ɔ p r ei s ɔ p r e s ɔ - t ɔ p r uː ʦ ɔ s ɔ p r e s ɔ - t ɔ p r e ʃ ʊ n - - p r e ʃ ɔ n t ɔ p r ei i s ɔ - t ɔ p r e ɛ s ɔ - t ɔ p r ɛ p r ɛ j - Singleton s ɔ - t ɔ p r eː a s ɔ - t ɔ p r e ŋ - Cluster-ID Taxon-ID

112 31 / 32 TPPSR TPPSR tppsr 69,sont tout près,pressu s ɔ - t ɔ p r iː - - Boring Site! 5 1 ei s əɔ - t ɔ p r ei s ɔ p r e s ɔ - t ɔ p r uː ʦ ɔ s ɔ p r e s ɔ - t ɔ p r e ʃ ʊ n - - p r e ʃ ɔ n t ɔ p r ei i s ɔ - t ɔ p r e ɛ s ɔ - t ɔ p r ɛ p r ɛ j s ɔ - t ɔ p r eː a s ɔ - t ɔ p r e ŋ - Cluster-ID Taxon-ID

113 31 / 32 TPPSR TPPSR tppsr 69,sont tout près,pressu s ɔ - t ɔ p r iː - - Interesting Site! 5 1 ei s əɔ - t ɔ p r ei s ɔ p r e s ɔ - t ɔ p r uː ʦ ɔ s ɔ p r e s ɔ - t ɔ p r e ʃ ʊ n - - p r e ʃ ɔ n t ɔ p r ei i s ɔ - t ɔ p r e ɛ s ɔ - t ɔ p r ɛ p r ɛ j s ɔ - t ɔ p r eː a s ɔ - t ɔ p r e ŋ -

114 31 / 32 TPPSR TPPSR tppsr 66,est étroite,stricta 1 ɛetrɑːt 2 ɛɛtræːtɛ 3 ɛɛtreːta 5 ɛɛtraɛːta 8 ɛɛtrɑːɛt 11 lɛɛtræːtə 19 lɛetrɑːtə 30 lɛθɛθreiti 31 lʲɛɛhriːti 34 ɛteːtraːto 55 ɛɛtræit 56 ɛɛtrɑːət 57 ɛɛtrɛt 58 jɛɛtreːt Interesting Site!

115 31 / 32 TPPSR TPPSR tppsr 66,est étroite,stricta ɛ - e t r ɑː t ɛ - ɛ t r æː t ɛ ɛ - ɛ t r eː t a ɛ - ɛ t r aɛː t a ɛ - ɛ t r ɑːɛ t l ɛ - ɛ t r æː t ə 19 1 l ɛ - e t r ɑː t ə 30 1 l ɛ θ ɛ θ r ei t i 31 1 lʲ ɛ - ɛ h r iː t i ɛ t eː t r aː t o ɛ - ɛ t r æi t ɛ - ɛ t r ɑːə t ɛ - ɛ t r ɛ t j ɛ - ɛ t r eː t -

116 31 / 32 TPPSR TPPSR tppsr 195,une feuille,folia 1 ɔñafɔlʲ 2 nafolʲɛ 3 nafɔlʲ 5 unafɔlʲə 8 ɔñafɔjə 11 ɔñafɔlʲə 19 naføðə 30 folʲe 31 fɔłe 34 nafwolʲ 55 ɔnfɔdʲ 56 ɛnfuj 57 ɛnfuj 58 ɛnfœj

117 31 / 32 TPPSR TPPSR tppsr 195,une feuille,folia 1 1 ɔ n a f - ɔ lʲ n a f - o lʲ ɛ n a f - ɔ lʲ u n a f - ɔ lʲ ə 8 1 ɔ n a f - ɔ j ə 11 1 ɔ n a f - ɔ lʲ ə n a f - ø ð ə f - o lʲ e f - ɔ ł e n a f w o lʲ ɔ n - f - ɔ dʲ ɛ n - f - u j ɛ n - f - u j ɛ n - f - œ j -

118 TPPSR Thank You for Listening! Special thanks to the German Federal Ministry of Education and Research (BMBF) for funding our research project on evolution and classification in biology, linguistics, and the history of science (EvoClass) 1 32 / 32

Sound Correspondences in the World's Languages: Online Supplementary Materials

Sound Correspondences in the World's Languages: Online Supplementary Materials Sound Correspondences in the World's Languages: Online Supplementary Materials Cecil H. Brown, Eric W. Holman, Søren Wichmann Language, Volume 89, Number 1, March 2013, pp. s1-s76 (Article) Published by

More information

Phonetic Word Search One - The Weather

Phonetic Word Search One - The Weather Phonetic Word Search One - The Weather At the bottom of the page you have a list of thirty-three words that are concerned with the weather. Hidden in the block of phonetic letters in the middle of the

More information

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types Exp 11- THEORY Sequence Alignment is a process of aligning two sequences to achieve maximum levels of identity between them. This help to derive functional, structural and evolutionary relationships between

More information

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9 Lecture 5 Alignment I. Introduction. For sequence data, the process of generating an alignment establishes positional homologies; that is, alignment provides the identification of homologous phylogenetic

More information

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject)

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) Bioinformática Sequence Alignment Pairwise Sequence Alignment Universidade da Beira Interior (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) 1 16/3/29 & 23/3/29 27/4/29 Outline

More information

Tools and Algorithms in Bioinformatics

Tools and Algorithms in Bioinformatics Tools and Algorithms in Bioinformatics GCBA815, Fall 2015 Week-4 BLAST Algorithm Continued Multiple Sequence Alignment Babu Guda, Ph.D. Department of Genetics, Cell Biology & Anatomy Bioinformatics and

More information

Effects of Gap Open and Gap Extension Penalties

Effects of Gap Open and Gap Extension Penalties Brigham Young University BYU ScholarsArchive All Faculty Publications 200-10-01 Effects of Gap Open and Gap Extension Penalties Hyrum Carroll hyrumcarroll@gmail.com Mark J. Clement clement@cs.byu.edu See

More information

Phylogenies Scores for Exhaustive Maximum Likelihood and Parsimony Scores Searches

Phylogenies Scores for Exhaustive Maximum Likelihood and Parsimony Scores Searches Int. J. Bioinformatics Research and Applications, Vol. x, No. x, xxxx Phylogenies Scores for Exhaustive Maximum Likelihood and s Searches Hyrum D. Carroll, Perry G. Ridge, Mark J. Clement, Quinn O. Snell

More information

Unsupervised Vocabulary Induction

Unsupervised Vocabulary Induction Infant Language Acquisition Unsupervised Vocabulary Induction MIT (Saffran et al., 1997) 8 month-old babies exposed to stream of syllables Stream composed of synthetic words (pabikumalikiwabufa) After

More information

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot

More information

Single alignment: Substitution Matrix. 16 march 2017

Single alignment: Substitution Matrix. 16 march 2017 Single alignment: Substitution Matrix 16 march 2017 BLOSUM Matrix BLOSUM Matrix [2] (Blocks Amino Acid Substitution Matrices ) It is based on the amino acids substitutions observed in ~2000 conserved block

More information

CONCEPT OF SEQUENCE COMPARISON. Natapol Pornputtapong 18 January 2018

CONCEPT OF SEQUENCE COMPARISON. Natapol Pornputtapong 18 January 2018 CONCEPT OF SEQUENCE COMPARISON Natapol Pornputtapong 18 January 2018 SEQUENCE ANALYSIS - A ROSETTA STONE OF LIFE Sequence analysis is the process of subjecting a DNA, RNA or peptide sequence to any of

More information

A greedy, graph-based algorithm for the alignment of multiple homologous gene lists

A greedy, graph-based algorithm for the alignment of multiple homologous gene lists A greedy, graph-based algorithm for the alignment of multiple homologous gene lists Jan Fostier, Sebastian Proost, Bart Dhoedt, Yvan Saeys, Piet Demeester, Yves Van de Peer, and Klaas Vandepoele Bioinformatics

More information

Supporting Information

Supporting Information Supporting Information Blasi et al. 10.1073/pnas.1605782113 SI Materials and Methods Positional Test. We simulate, for each language and signal, random positions of the relevant signal-associated symbol

More information

EVOLUTIONARY DISTANCES

EVOLUTIONARY DISTANCES EVOLUTIONARY DISTANCES FROM STRINGS TO TREES Luca Bortolussi 1 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste luca@dmi.units.it Trieste, 14 th November 2007 OUTLINE 1 STRINGS:

More information

Collected Works of Charles Dickens

Collected Works of Charles Dickens Collected Works of Charles Dickens A Random Dickens Quote If there were no bad people, there would be no good lawyers. Original Sentence It was a dark and stormy night; the night was dark except at sunny

More information

2014 ATAR & WACE COURSE STATISTICS

2014 ATAR & WACE COURSE STATISTICS 2014 & WACE COURSE STATISTICS Table 1.1 Table 1.2 Table 1.3 Table 1.4 Table 2.1 Table 2.2 Table 2.3 Table 2.4 Table 3.1, by Range - Total Year 12 student population, by Range - School Leaver population,

More information

Prenominal Modifier Ordering via MSA. Alignment

Prenominal Modifier Ordering via MSA. Alignment Introduction Prenominal Modifier Ordering via Multiple Sequence Alignment Aaron Dunlop Margaret Mitchell 2 Brian Roark Oregon Health & Science University Portland, OR 2 University of Aberdeen Aberdeen,

More information

Overview Multiple Sequence Alignment

Overview Multiple Sequence Alignment Overview Multiple Sequence Alignment Inge Jonassen Bioinformatics group Dept. of Informatics, UoB Inge.Jonassen@ii.uib.no Definition/examples Use of alignments The alignment problem scoring alignments

More information

Phonological Correspondence as a Tool for Historical Analysis: An Algorithm for Word Alignment

Phonological Correspondence as a Tool for Historical Analysis: An Algorithm for Word Alignment Phonological Correspondence as a Tool for Historical Analysis: An Algorithm for Word Alignment Daniel M. Albro March 11, 1997 1 Introduction In historical linguistics research it is often necessary to

More information

3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT

3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT 3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT.03.239 25.09.2012 SEQUENCE ANALYSIS IS IMPORTANT FOR... Prediction of function Gene finding the process of identifying the regions of genomic DNA that encode

More information

An Efficient Sequential Monte Carlo Algorithm for Coalescent Clustering

An Efficient Sequential Monte Carlo Algorithm for Coalescent Clustering An Efficient Sequential Monte Carlo Algorithm for Coalescent Clustering Dilan Görür Gatsby Unit University College London dilan@gatsby.ucl.ac.uk Yee Whye Teh Gatsby Unit University College London ywteh@gatsby.ucl.ac.uk

More information

NUMB3RS Activity: DNA Sequence Alignment. Episode: Guns and Roses

NUMB3RS Activity: DNA Sequence Alignment. Episode: Guns and Roses Teacher Page 1 NUMB3RS Activity: DNA Sequence Alignment Topic: Biomathematics DNA sequence alignment Grade Level: 10-12 Objective: Use mathematics to compare two strings of DNA Prerequisite: very basic

More information

Pairwise sequence alignment

Pairwise sequence alignment Department of Evolutionary Biology Example Alignment between very similar human alpha- and beta globins: GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKL G+ +VK+HGKKV A+++++AH+D++ +++++LS+LH KL GNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKL

More information

Introduction to Comparative Protein Modeling. Chapter 4 Part I

Introduction to Comparative Protein Modeling. Chapter 4 Part I Introduction to Comparative Protein Modeling Chapter 4 Part I 1 Information on Proteins Each modeling study depends on the quality of the known experimental data. Basis of the model Search in the literature

More information

Multiple Sequence Alignment, Gunnar Klau, December 9, 2005, 17:

Multiple Sequence Alignment, Gunnar Klau, December 9, 2005, 17: Multiple Sequence Alignment, Gunnar Klau, December 9, 2005, 17:50 5001 5 Multiple Sequence Alignment The first part of this exposition is based on the following sources, which are recommended reading:

More information

Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment

Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment Introduction to Bioinformatics online course : IBT Jonathan Kayondo Learning Objectives Understand

More information

Distances and inference for covariance operators

Distances and inference for covariance operators Royal Holloway Probability and Statistics Colloquium 27th April 2015 Distances and inference for covariance operators Davide Pigoli This is part of joint works with J.A.D. Aston I.L. Dryden P. Secchi J.

More information

Bioinformatics and BLAST

Bioinformatics and BLAST Bioinformatics and BLAST Overview Recap of last time Similarity discussion Algorithms: Needleman-Wunsch Smith-Waterman BLAST Implementation issues and current research Recap from Last Time Genome consists

More information

Multiple sequence alignment

Multiple sequence alignment Multiple sequence alignment Multiple sequence alignment: today s goals to define what a multiple sequence alignment is and how it is generated; to describe profile HMMs to introduce databases of multiple

More information

Sequence Analysis and Databases 2: Sequences and Multiple Alignments

Sequence Analysis and Databases 2: Sequences and Multiple Alignments 1 Sequence Analysis and Databases 2: Sequences and Multiple Alignments Jose María González-Izarzugaza Martínez CNIO Spanish National Cancer Research Centre (jmgonzalez@cnio.es) 2 Sequence Comparisons:

More information

Sequence Bioinformatics. Multiple Sequence Alignment Waqas Nasir

Sequence Bioinformatics. Multiple Sequence Alignment Waqas Nasir Sequence Bioinformatics Multiple Sequence Alignment Waqas Nasir 2010-11-12 Multiple Sequence Alignment One amino acid plays coy; a pair of homologous sequences whisper; many aligned sequences shout out

More information

Distance and inference for covariance functions

Distance and inference for covariance functions Distance and inference for covariance functions Davide Pigoli SuSTaIn Workshop High dimensional and dependent functional data Bristol, 11/09/2012 This work is in collaboration with: Dr. J.A.D. Aston, Department

More information

BMI/CS 776 Lecture #20 Alignment of whole genomes. Colin Dewey (with slides adapted from those by Mark Craven)

BMI/CS 776 Lecture #20 Alignment of whole genomes. Colin Dewey (with slides adapted from those by Mark Craven) BMI/CS 776 Lecture #20 Alignment of whole genomes Colin Dewey (with slides adapted from those by Mark Craven) 2007.03.29 1 Multiple whole genome alignment Input set of whole genome sequences genomes diverged

More information

Pairwise Alignment. Guan-Shieng Huang. Dept. of CSIE, NCNU. Pairwise Alignment p.1/55

Pairwise Alignment. Guan-Shieng Huang. Dept. of CSIE, NCNU. Pairwise Alignment p.1/55 Pairwise Alignment Guan-Shieng Huang shieng@ncnu.edu.tw Dept. of CSIE, NCNU Pairwise Alignment p.1/55 Approach 1. Problem definition 2. Computational method (algorithms) 3. Complexity and performance Pairwise

More information

Regression Clustering

Regression Clustering Regression Clustering In regression clustering, we assume a model of the form y = f g (x, θ g ) + ɛ g for observations y and x in the g th group. Usually, of course, we assume linear models of the form

More information

Aspects of Tree-Based Statistical Machine Translation

Aspects of Tree-Based Statistical Machine Translation Aspects of Tree-Based Statistical Machine Translation Marcello Federico Human Language Technology FBK 2014 Outline Tree-based translation models: Synchronous context free grammars Hierarchical phrase-based

More information

8. Classification and Pattern Recognition

8. Classification and Pattern Recognition 8. Classification and Pattern Recognition 1 Introduction: Classification is arranging things by class or category. Pattern recognition involves identification of objects. Pattern recognition can also be

More information

Dr. Amira A. AL-Hosary

Dr. Amira A. AL-Hosary Phylogenetic analysis Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic Basics: Biological

More information

Dublin City University (DCU) GCS/GCSE Recognised Subjects

Dublin City University (DCU) GCS/GCSE Recognised Subjects Dublin City University (DCU) GCS/GCSE Recognised Subjects Subjects listed below are recognised for entry to DCU. Unless otherwise indicated only one subject from each group may be presented. Applied A

More information

Computational Biology Lecture 5: Time speedup, General gap penalty function Saad Mneimneh

Computational Biology Lecture 5: Time speedup, General gap penalty function Saad Mneimneh Computational Biology Lecture 5: ime speedup, General gap penalty function Saad Mneimneh We saw earlier that it is possible to compute optimal global alignments in linear space (it can also be done for

More information

Sequence analysis and Genomics

Sequence analysis and Genomics Sequence analysis and Genomics October 12 th November 23 rd 2 PM 5 PM Prof. Peter Stadler Dr. Katja Nowick Katja: group leader TFome and Transcriptome Evolution Bioinformatics group Paul-Flechsig-Institute

More information

EM (cont.) November 26 th, Carlos Guestrin 1

EM (cont.) November 26 th, Carlos Guestrin 1 EM (cont.) Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University November 26 th, 2007 1 Silly Example Let events be grades in a class w 1 = Gets an A P(A) = ½ w 2 = Gets a B P(B) = µ

More information

Fall 2008 CSE Qualifying Exam. September 13, 2008

Fall 2008 CSE Qualifying Exam. September 13, 2008 Fall 2008 CSE Qualifying Exam September 13, 2008 1 Architecture 1. (Quan, Fall 2008) Your company has just bought a new dual Pentium processor, and you have been tasked with optimizing your software for

More information

More on Unsupervised Learning

More on Unsupervised Learning More on Unsupervised Learning Two types of problems are to find association rules for occurrences in common in observations (market basket analysis), and finding the groups of values of observational data

More information

Multiple Alignment. Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis

Multiple Alignment. Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis Multiple Alignment Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis gorm@cbs.dtu.dk Refresher: pairwise alignments 43.2% identity; Global alignment score: 374 10 20

More information

DOWNLOAD OR READ : CHINESE BIBLE PHONETIC ALPHABET SIMPLIFIED CHINESE PIN YIN PINYIN UNION NEW TESTAMENT PDF EBOOK EPUB MOBI

DOWNLOAD OR READ : CHINESE BIBLE PHONETIC ALPHABET SIMPLIFIED CHINESE PIN YIN PINYIN UNION NEW TESTAMENT PDF EBOOK EPUB MOBI DOWNLOAD OR READ : CHINESE BIBLE PHONETIC ALPHABET SIMPLIFIED CHINESE PIN YIN PINYIN UNION NEW TESTAMENT PDF EBOOK EPUB MOBI Page 1 Page 2 chinese bible phonetic alphabet simplified chinese pin yin pinyin

More information

Pairwise alignment, Gunnar Klau, November 9, 2005, 16:

Pairwise alignment, Gunnar Klau, November 9, 2005, 16: Pairwise alignment, Gunnar Klau, November 9, 2005, 16:36 2012 2.1 Growth rates For biological sequence analysis, we prefer algorithms that have time and space requirements that are linear in the length

More information

Sequence Alignment Techniques and Their Uses

Sequence Alignment Techniques and Their Uses Sequence Alignment Techniques and Their Uses Sarah Fiorentino Since rapid sequencing technology and whole genomes sequencing, the amount of sequence information has grown exponentially. With all of this

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 218 Outlines Overview Introduction Linear Algebra Probability Linear Regression 1

More information

Copyright 2000 N. AYDIN. All rights reserved. 1

Copyright 2000 N. AYDIN. All rights reserved. 1 Introduction to Bioinformatics Prof. Dr. Nizamettin AYDIN naydin@yildiz.edu.tr Multiple Sequence Alignment Outline Multiple sequence alignment introduction to msa methods of msa progressive global alignment

More information

National Centre for Language Technology School of Computing Dublin City University

National Centre for Language Technology School of Computing Dublin City University with with National Centre for Language Technology School of Computing Dublin City University Parallel treebanks A parallel treebank comprises: sentence pairs parsed word-aligned tree-aligned (Volk & Samuelsson,

More information

APPMODULE WEATHER App Documentation

APPMODULE WEATHER App Documentation REAL SMART HOME GmbH APPMODULE WEATHER App Version: 1.2.0 Type: Application Article No.: BAB-022 version I Actual state 03/2017 Date: 30. Mai 2017 EN WEATHER App REAL SMART HOME GmbH STILWERK Dortmund

More information

Constriction Degree and Sound Sources

Constriction Degree and Sound Sources Constriction Degree and Sound Sources 1 Contrasting Oral Constriction Gestures Conditions for gestures to be informationally contrastive from one another? shared across members of the community (parity)

More information

Google s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

Google s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation Google s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation Y. Wu, M. Schuster, Z. Chen, Q.V. Le, M. Norouzi, et al. Google arxiv:1609.08144v2 Reviewed by : Bill

More information

V Short, Voiced, - Low, Central. V Open, Voiced, Low, Central Front, Voiced, High, Unrounded, Short. High, Unrounded, Long.

V Short, Voiced, - Low, Central. V Open, Voiced, Low, Central Front, Voiced, High, Unrounded, Short. High, Unrounded, Long. S. Phoneme/ No Allophone ontext [3,4] B.T Desription [4] IPA [1,2] 1 V Short, ^ - Low, entral 2 ఆ V Open, α 3 3(a) ఇ - - vowel ఇ V Low, entral Front, High, Unronded, Short ί ARPA- BET ah a ih X notation

More information

Parsing Beyond Context-Free Grammars: Tree Adjoining Grammars

Parsing Beyond Context-Free Grammars: Tree Adjoining Grammars Parsing Beyond Context-Free Grammars: Tree Adjoining Grammars Laura Kallmeyer & Tatiana Bladier Heinrich-Heine-Universität Düsseldorf Sommersemester 2018 Kallmeyer, Bladier SS 2018 Parsing Beyond CFG:

More information

K-means-based Feature Learning for Protein Sequence Classification

K-means-based Feature Learning for Protein Sequence Classification K-means-based Feature Learning for Protein Sequence Classification Paul Melman and Usman W. Roshan Department of Computer Science, NJIT Newark, NJ, 07102, USA pm462@njit.edu, usman.w.roshan@njit.edu Abstract

More information

5. MULTIPLE SEQUENCE ALIGNMENT BIOINFORMATICS COURSE MTAT

5. MULTIPLE SEQUENCE ALIGNMENT BIOINFORMATICS COURSE MTAT 5. MULTIPLE SEQUENCE ALIGNMENT BIOINFORMATICS COURSE MTAT.03.239 03.10.2012 ALIGNMENT Alignment is the task of locating equivalent regions of two or more sequences to maximize their similarity. Homology:

More information

Pairwise & Multiple sequence alignments

Pairwise & Multiple sequence alignments Pairwise & Multiple sequence alignments Urmila Kulkarni-Kale Bioinformatics Centre 411 007 urmila@bioinfo.ernet.in Basis for Sequence comparison Theory of evolution: gene sequences have evolved/derived

More information

CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I)

CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I) CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I) Contents Alignment algorithms Needleman-Wunsch (global alignment) Smith-Waterman (local alignment) Heuristic algorithms FASTA BLAST

More information

Statistical Machine Translation

Statistical Machine Translation Statistical Machine Translation -tree-based models (cont.)- Artem Sokolov Computerlinguistik Universität Heidelberg Sommersemester 2015 material from P. Koehn, S. Riezler, D. Altshuler Bottom-Up Decoding

More information

Expectation Maximization (EM)

Expectation Maximization (EM) Expectation Maximization (EM) The Expectation Maximization (EM) algorithm is one approach to unsupervised, semi-supervised, or lightly supervised learning. In this kind of learning either no labels are

More information

Global alignments - review

Global alignments - review Global alignments - review Take two sequences: X[j] and Y[j] M[i-1, j-1] ± 1 M[i, j] = max M[i, j-1] 2 M[i-1, j] 2 The best alignment for X[1 i] and Y[1 j] is called M[i, j] X[j] Initiation: M[,]= pply

More information

A Syntax-based Statistical Machine Translation Model. Alexander Friedl, Georg Teichtmeister

A Syntax-based Statistical Machine Translation Model. Alexander Friedl, Georg Teichtmeister A Syntax-based Statistical Machine Translation Model Alexander Friedl, Georg Teichtmeister 4.12.2006 Introduction The model Experiment Conclusion Statistical Translation Model (STM): - mathematical model

More information

Bioinformatics with pen and paper: building a phylogenetic tree

Bioinformatics with pen and paper: building a phylogenetic tree Image courtesy of hometowncd / istockphoto Bioinformatics with pen and paper: building a phylogenetic tree Bioinformatics is usually done with a powerful computer. With help from Cleopatra Kozlowski, however,

More information

Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project

Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project Devin Cornell & Sushruth Sastry May 2015 1 Abstract In this article, we explore

More information

Learning in Bayesian Networks

Learning in Bayesian Networks Learning in Bayesian Networks Florian Markowetz Max-Planck-Institute for Molecular Genetics Computational Molecular Biology Berlin Berlin: 20.06.2002 1 Overview 1. Bayesian Networks Stochastic Networks

More information

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic analysis Phylogenetic Basics: Biological

More information

Phylogenetic Tree Reconstruction

Phylogenetic Tree Reconstruction I519 Introduction to Bioinformatics, 2011 Phylogenetic Tree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Evolution theory Speciation Evolution of new organisms is driven

More information

Chapter 5. Proteomics and the analysis of protein sequence Ⅱ

Chapter 5. Proteomics and the analysis of protein sequence Ⅱ Proteomics Chapter 5. Proteomics and the analysis of protein sequence Ⅱ 1 Pairwise similarity searching (1) Figure 5.5: manual alignment One of the amino acids in the top sequence has no equivalent and

More information

NLTK and Lexical Information

NLTK and Lexical Information NLTK and Lexical Information Marina Sedinkina - Folien von Desislava Zhekova - CIS, LMU marina.sedinkina@campus.lmu.de December 12, 2017 Marina Sedinkina- Folien von Desislava Zhekova - Language Processing

More information

Maja Popović Humboldt University of Berlin Berlin, Germany 2 CHRF and WORDF scores

Maja Popović Humboldt University of Berlin Berlin, Germany 2 CHRF and WORDF scores CHRF deconstructed: β parameters and n-gram weights Maja Popović Humboldt University of Berlin Berlin, Germany maja.popovic@hu-berlin.de Abstract Character n-gram F-score (CHRF) is shown to correlate very

More information

Practical considerations of working with sequencing data

Practical considerations of working with sequencing data Practical considerations of working with sequencing data File Types Fastq ->aligner -> reference(genome) coordinates Coordinate files SAM/BAM most complete, contains all of the info in fastq and more!

More information

arxiv: v1 [cs.cl] 21 May 2017

arxiv: v1 [cs.cl] 21 May 2017 Spelling Correction as a Foreign Language Yingbo Zhou yingbzhou@ebay.com Utkarsh Porwal uporwal@ebay.com Roberto Konow rkonow@ebay.com arxiv:1705.07371v1 [cs.cl] 21 May 2017 Abstract In this paper, we

More information

Large-Scale Genomic Surveys

Large-Scale Genomic Surveys Bioinformatics Subtopics Fold Recognition Secondary Structure Prediction Docking & Drug Design Protein Geometry Protein Flexibility Homology Modeling Sequence Alignment Structure Classification Gene Prediction

More information

Alignment principles and homology searching using (PSI-)BLAST. Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU)

Alignment principles and homology searching using (PSI-)BLAST. Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) Alignment principles and homology searching using (PSI-)BLAST Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) http://ibivu.cs.vu.nl Bioinformatics Nothing in Biology makes sense except in

More information

Sequence Alignments. Dynamic programming approaches, scoring, and significance. Lucy Skrabanek ICB, WMC January 31, 2013

Sequence Alignments. Dynamic programming approaches, scoring, and significance. Lucy Skrabanek ICB, WMC January 31, 2013 Sequence Alignments Dynamic programming approaches, scoring, and significance Lucy Skrabanek ICB, WMC January 31, 213 Sequence alignment Compare two (or more) sequences to: Find regions of conservation

More information

Thanks to Paul Lewis, Jeff Thorne, and Joe Felsenstein for the use of slides

Thanks to Paul Lewis, Jeff Thorne, and Joe Felsenstein for the use of slides hanks to Paul Lewis, Jeff horne, and Joe Felsenstein for the use of slides Hennigian logic reconstructs the tree if we know polarity of characters and there is no homoplasy UPM infers a tree from a distance

More information

DECISION TREE BASED QUALITATIVE ANALYSIS OF OPERATING REGIMES IN INDUSTRIAL PRODUCTION PROCESSES *

DECISION TREE BASED QUALITATIVE ANALYSIS OF OPERATING REGIMES IN INDUSTRIAL PRODUCTION PROCESSES * HUNGARIAN JOURNAL OF INDUSTRIAL CHEMISTRY VESZPRÉM Vol. 35., pp. 95-99 (27) DECISION TREE BASED QUALITATIVE ANALYSIS OF OPERATING REGIMES IN INDUSTRIAL PRODUCTION PROCESSES * T. VARGA, F. SZEIFERT, J.

More information

Sequence comparison: Score matrices. Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas

Sequence comparison: Score matrices. Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas Sequence comparison: Score matrices Genome 559: Introduction to Statistical and omputational Genomics Prof James H Thomas Informal inductive proof of best alignment path onsider the last step in the best

More information

statistical machine translation

statistical machine translation statistical machine translation P A R T 3 : D E C O D I N G & E V A L U A T I O N CSC401/2511 Natural Language Computing Spring 2019 Lecture 6 Frank Rudzicz and Chloé Pou-Prom 1 University of Toronto Statistical

More information

G4120: Introduction to Computational Biology

G4120: Introduction to Computational Biology ICB Fall 2003 G4120: Introduction to Computational Biology Oliver Jovanovic, Ph.D. Columbia University Department of Microbiology Copyright 2003 Oliver Jovanovic, All Rights Reserved. Bioinformatics and

More information

Introduction to Evolutionary Concepts

Introduction to Evolutionary Concepts Introduction to Evolutionary Concepts and VMD/MultiSeq - Part I Zaida (Zan) Luthey-Schulten Dept. Chemistry, Beckman Institute, Biophysics, Institute of Genomics Biology, & Physics NIH Workshop 2009 VMD/MultiSeq

More information

GIS Visualization: A Library s Pursuit Towards Creative and Innovative Research

GIS Visualization: A Library s Pursuit Towards Creative and Innovative Research GIS Visualization: A Library s Pursuit Towards Creative and Innovative Research Justin B. Sorensen J. Willard Marriott Library University of Utah justin.sorensen@utah.edu Abstract As emerging technologies

More information

Multi-theme Sentiment Analysis using Quantified Contextual

Multi-theme Sentiment Analysis using Quantified Contextual Multi-theme Sentiment Analysis using Quantified Contextual Valence Shifters Hongkun Yu, Jingbo Shang, MeichunHsu, Malú Castellanos, Jiawei Han Presented by Jingbo Shang University of Illinois at Urbana-Champaign

More information

In-Depth Assessment of Local Sequence Alignment

In-Depth Assessment of Local Sequence Alignment 2012 International Conference on Environment Science and Engieering IPCBEE vol.3 2(2012) (2012)IACSIT Press, Singapoore In-Depth Assessment of Local Sequence Alignment Atoosa Ghahremani and Mahmood A.

More information

Comparison of Three Fugal ITS Reference Sets. Qiong Wang and Jim R. Cole

Comparison of Three Fugal ITS Reference Sets. Qiong Wang and Jim R. Cole RDP TECHNICAL REPORT Created 04/12/2014, Updated 08/08/2014 Summary Comparison of Three Fugal ITS Reference Sets Qiong Wang and Jim R. Cole wangqion@msu.edu, colej@msu.edu In this report, we evaluate the

More information

Multiple Sequence Alignment

Multiple Sequence Alignment Multiple Sequence Alignment Multiple Alignment versus Pairwise Alignment Up until now we have only tried to align two sequences.! What about more than two? And what for?! A faint similarity between two

More information

Introduction to Sequence Alignment. Manpreet S. Katari

Introduction to Sequence Alignment. Manpreet S. Katari Introduction to Sequence Alignment Manpreet S. Katari 1 Outline 1. Global vs. local approaches to aligning sequences 1. Dot Plots 2. BLAST 1. Dynamic Programming 3. Hash Tables 1. BLAT 4. BWT (Burrow Wheeler

More information

Characteristics of non-pre-vocalic ejectives in Yakima Sahaptin. NoWPhon, UO, Eugene,

Characteristics of non-pre-vocalic ejectives in Yakima Sahaptin. NoWPhon, UO, Eugene, Characteristics of non-pre-vocalic ejectives in Yakima Sahaptin Sharon Hargus University of Washington Virginia Beavert University of Oregon NoWPhon, UO, Eugene, 5-13-16 1 Organization Ejective background

More information

Sequence Alignment (chapter 6)

Sequence Alignment (chapter 6) Sequence lignment (chapter 6) he biological problem lobal alignment Local alignment Multiple alignment Introduction to bioinformatics, utumn 6 Background: comparative genomics Basic question in biology:

More information

Evaluation Measures of Multiple Sequence Alignments. Gaston H. Gonnet, *Chantal Korostensky and Steve Benner. Institute for Scientic Computing

Evaluation Measures of Multiple Sequence Alignments. Gaston H. Gonnet, *Chantal Korostensky and Steve Benner. Institute for Scientic Computing Evaluation Measures of Multiple Sequence Alignments Gaston H. Gonnet, *Chantal Korostensky and Steve Benner Institute for Scientic Computing ETH Zurich, 8092 Zuerich, Switzerland phone: ++41 1 632 74 79

More information

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics - in deriving a phylogeny our goal is simply to reconstruct the historical relationships between a group of taxa. - before we review the

More information

Lecture 2: Pairwise Alignment. CG Ron Shamir

Lecture 2: Pairwise Alignment. CG Ron Shamir Lecture 2: Pairwise Alignment 1 Main source 2 Why compare sequences? Human hexosaminidase A vs Mouse hexosaminidase A 3 www.mathworks.com/.../jan04/bio_genome.html Sequence Alignment עימוד רצפים The problem:

More information

Resolving Entity Coreference in Croatian with a Constrained Mention-Pair Model. Goran Glavaš and Jan Šnajder TakeLab UNIZG

Resolving Entity Coreference in Croatian with a Constrained Mention-Pair Model. Goran Glavaš and Jan Šnajder TakeLab UNIZG Resolving Entity Coreference in Croatian with a Constrained Mention-Pair Model Goran Glavaš and Jan Šnajder TakeLab UNIZG BSNLP 2015 @ RANLP, Hissar 10 Sep 2015 Background & Motivation Entity coreference

More information

Multiple Sequence Alignment

Multiple Sequence Alignment Multiple Sequence Alignment BMI/CS 576 www.biostat.wisc.edu/bmi576.html Colin Dewey cdewey@biostat.wisc.edu Multiple Sequence Alignment: Tas Definition Given a set of more than 2 sequences a method for

More information

Protein Threading. BMI/CS 776 Colin Dewey Spring 2015

Protein Threading. BMI/CS 776  Colin Dewey Spring 2015 Protein Threading BMI/CS 776 www.biostat.wisc.edu/bmi776/ Colin Dewey cdewey@biostat.wisc.edu Spring 2015 Goals for Lecture the key concepts to understand are the following the threading prediction task

More information

Introduction to Bioinformatics

Introduction to Bioinformatics Introduction to Bioinformatics Jianlin Cheng, PhD Department of Computer Science Informatics Institute 2011 Topics Introduction Biological Sequence Alignment and Database Search Analysis of gene expression

More information

Sequence comparison: Score matrices

Sequence comparison: Score matrices Sequence comparison: Score matrices http://facultywashingtonedu/jht/gs559_2013/ Genome 559: Introduction to Statistical and omputational Genomics Prof James H Thomas FYI - informal inductive proof of best

More information