4 Original Sentence It was a dark and stormy night; the night was dark except at sunny intervals, when it was checked by a stormy gust of wind which made the night darker in the streets, fiercely agitating the scanty flame of the lamps that struggled against the darkness.

5 Problem
Are similar phrases present in the sentence?
Where in the sentence are these similar phrases?
Very important: How will you help the user visualize this similarity?
Not important: How similar are they, exactly? What is the extent of similarity?

6 Dark and Stormy Night It was a dark and stormy night; the night was dark except at sunny intervals, when it was checked by a stormy gust of wind which made the night darker in the streets, fiercely agitating the scanty flame of the lamps that struggled against the darkness.

7 Visualizing Similarities
[Figure: two dot plots of the sentence fragment "It was a dark and stormy night; the night was dark except at sunny intervals" (axis label: Sentence) against the phrase "dark and stormy night" (axis label: Phrase). One panel uses Window = 1; the other uses Window = 4, Threshold = 4.]

8 Dot Plots: used to visualize similarity between sequences. [Figure: example dot plot, Window = 200 bp]
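The window and threshold idea from the dot plot slides can be sketched in a few lines of Python. This is a minimal illustration, not the tool used to draw the slides: the function names `dot_plot` and `render` are hypothetical, and a dot is placed at (i, j) when the windows starting at those positions agree in at least `threshold` characters.

```python
def dot_plot(seq_a, seq_b, window=1, threshold=1):
    """Return the set of (i, j) whose windows share >= threshold identical characters."""
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            # Compare the two windows position by position.
            matches = sum(seq_a[i + k] == seq_b[j + k] for k in range(window))
            if matches >= threshold:
                dots.add((i, j))
    return dots


def render(seq_a, seq_b, dots):
    """Draw the plot as text: one row per position in seq_b, '*' marks a dot."""
    rows = []
    for j in range(len(seq_b)):
        rows.append("".join("*" if (i, j) in dots else "." for i in range(len(seq_a))))
    return "\n".join(rows)


sentence = "It was a dark and stormy night; the night was dark except at sunny intervals"
phrase = "dark and stormy night"

noisy = dot_plot(sentence, phrase, window=1, threshold=1)  # every single-letter match
clean = dot_plot(sentence, phrase, window=4, threshold=4)  # only 4-letter exact runs
print(render(sentence, phrase, clean))
```

With Window = 1 every shared letter produces a dot, so the plot is noisy; raising the window and threshold to 4 keeps only runs such as "dark" and "night", which show up as short diagonals.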

9 Unit Outline: Dot Plots; Simple Alignments; Gaps; Scoring Matrices; Needleman and Wunsch Algorithm; Database Searches

10 Simple Alignment
Pairwise match: Match score (1) and Mismatch score (0)
Seq 1: AAGATA, Seq 2: AATCTATA
Alignments: [Figure: AGGCTA placed under AGTCTCTA, shown three times]
Scores?
\[ \text{Score} = \sum_{i=1}^{n} \begin{cases} \text{match score}, & \text{if } \text{seq1}_i = \text{seq2}_i \\ \text{mismatch score}, & \text{if } \text{seq1}_i \neq \text{seq2}_i \end{cases} \]
Substring Problem in rosalind.info: SUBS
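The match/mismatch sum above can be tried out by sliding the shorter sequence under the longer one and scoring each placement. The sketch below assumes the pair AGTCTCTA and AGGCTA from the slide and a hypothetical helper name `ungapped_score`; only overlapping columns are scored.

```python
def ungapped_score(seq1, seq2, offset, match=1, mismatch=0):
    """Score seq2 placed under seq1 starting at column `offset`, with no gaps."""
    score = 0
    for i, base in enumerate(seq2):
        j = offset + i
        if 0 <= j < len(seq1):  # ignore columns that hang over the end of seq1
            score += match if seq1[j] == base else mismatch
    return score


seq1 = "AGTCTCTA"
seq2 = "AGGCTA"

# Try every placement where seq2 fits entirely under seq1.
scores = [ungapped_score(seq1, seq2, offset) for offset in range(len(seq1) - len(seq2) + 1)]
for offset, score in enumerate(scores):
    print(" " * offset + seq2, "score =", score)
```

For this pair, the placement at offset 0 matches the front of the sequences and the placement at offset 2 matches the tail, which is the motivation for allowing gaps in the next step.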

11 Gaps
All possible alignments with 2 consecutive gaps:
AGTCTCTA
--AGGCTA
A--GGCTA
AG--GCTA
AGG--CTA
AGGC--TA
AGGCT--A
AGGCTA--
\[ \text{Score} = \sum_{i=1}^{n} \begin{cases} \text{gap penalty}, & \text{if } \text{seq1}_i = \text{'-'} \text{ or } \text{seq2}_i = \text{'-'} \\ \text{match score}, & \text{if no gaps and } \text{seq1}_i = \text{seq2}_i \\ \text{mismatch score}, & \text{if no gaps and } \text{seq1}_i \neq \text{seq2}_i \end{cases} \]
Match = 1, Mismatch = 0, Gap penalty = -1
AGG CT A, AG GCT A, AG GCTA
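The gap-aware sum can be checked by enumerating the seven placements of two consecutive gaps and scoring each one. This is a small sketch with hypothetical function names; it uses "-" as the gap character and the slide's values Match = 1, Mismatch = 0, Gap penalty = -1.

```python
def score_alignment(row1, row2, match=1, mismatch=0, gap=-1):
    """Column-by-column score of two equal-length alignment rows."""
    if len(row1) != len(row2):
        raise ValueError("alignment rows must have the same length")
    score = 0
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            score += gap
        elif a == b:
            score += match
        else:
            score += mismatch
    return score


def consecutive_gap_alignments(short, total_len):
    """All ways to insert one run of gaps so that `short` reaches `total_len`."""
    gap_len = total_len - len(short)
    return [short[:k] + "-" * gap_len + short[k:] for k in range(len(short) + 1)]


seq1 = "AGTCTCTA"
candidates = consecutive_gap_alignments("AGGCTA", len(seq1))
gap_scores = [score_alignment(seq1, row) for row in candidates]
for row, score in zip(candidates, gap_scores):
    print(row, score)
```

Several placements tie for the best score here, which is one reason a single per-position gap penalty is often refined into separate origination and length penalties.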

12 Homologs: sequences that share a common ancestor
Terms: Point Mutations, Indel events
Contiguous indels of nucleotides are more likely: AGG CTA vs. AG G CTA
Origination Penalty (-2) and Length Penalty (-1). Calculate scores now.
Counting Point Mutations Problem: HAMM
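A scoring scheme with an origination penalty and a length penalty might be sketched as follows. The convention assumed here (not stated on the slide) is that a run of L gaps costs origination + L × length, so a single gap costs -3 with the slide's values. The function names and the two gapped rows in the usage lines are illustrative inputs, not taken from the slide. A one-line Hamming distance is included because the HAMM problem counts point mutations between equal-length sequences.

```python
def affine_score(row1, row2, match=1, mismatch=0, origination=-2, length=-1):
    """Score two alignment rows; each new gap run pays `origination` once,
    and every gap column pays `length` (assumed convention)."""
    score = 0
    previous_gap_row = 0  # 0 = no gap, 1 = gap in row1, 2 = gap in row2
    for a, b in zip(row1, row2):
        gap_row = 1 if a == "-" else 2 if b == "-" else 0
        if gap_row:
            if gap_row != previous_gap_row:
                score += origination  # a new gap run starts here
            score += length
        else:
            score += match if a == b else mismatch
        previous_gap_row = gap_row
    return score


def hamming_distance(s1, s2):
    """Number of positions at which two equal-length sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(s1, s2))


contiguous = affine_score("AGTCTCTA", "AGG--CTA")  # one run of two gaps
scattered = affine_score("AGTCTCTA", "AGG-CT-A")   # two runs of one gap
print(contiguous, scattered)
print(hamming_distance("AGTCTC", "AGGCTC"))
```

Under this convention the contiguous placement scores higher than the scattered one, which reflects the slide's point that contiguous indels are more likely.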

13 Likely Substitutions
In a nucleotide mismatch, which substitutions are more likely to occur?
AGTCTC vs. AGGCTC
AGTCTC vs. AGCCTC
Transitions and Transversions Problem: TRAN
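The two mismatches on this slide can be classified programmatically. A transition swaps a purine for a purine (A, G) or a pyrimidine for a pyrimidine (C, T); a transversion swaps between the two groups. The helper names below are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(a, b):
    """Return 'match', 'transition', or 'transversion' for one aligned pair."""
    if a == b:
        return "match"
    same_group = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_group else "transversion"


def count_substitutions(s1, s2):
    """Return (transitions, transversions) for two equal-length DNA strings."""
    kinds = [classify(a, b) for a, b in zip(s1, s2)]
    return kinds.count("transition"), kinds.count("transversion")


print(count_substitutions("AGTCTC", "AGGCTC"))  # T vs G
print(count_substitutions("AGTCTC", "AGCCTC"))  # T vs C
```

In the first pair the T/G mismatch is a transversion; in the second pair the T/C mismatch is a transition.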

14 Scoring Matrices
For DNA Sequences:
[Table: BLAST Matrix, rows and columns A, T, C, G]
[Table: Transition-Transversion Matrix, rows and columns A, T, C, G]
Amino Acids: Polar, Non-polar, Acidic, Basic Residues; Hydrophobicity, Charge, Electronegativity, and Size; Based on observations
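A DNA scoring matrix can be represented as a lookup table keyed by pairs of bases. The sketch below builds two such tables: one that treats every mismatch alike and one that penalizes transversions more than transitions. The numeric values are illustrative assumptions only, since the matrix entries from the slide are not available here, and `build_dna_matrix` is a hypothetical helper.

```python
def build_dna_matrix(match, transition, transversion):
    """Build a {(base, base): score} table over A, T, C, G."""
    purines = {"A", "G"}
    matrix = {}
    for a in "ATCG":
        for b in "ATCG":
            if a == b:
                matrix[a, b] = match
            elif (a in purines) == (b in purines):
                matrix[a, b] = transition    # purine<->purine or pyrimidine<->pyrimidine
            else:
                matrix[a, b] = transversion  # purine<->pyrimidine
    return matrix


def score_with_matrix(s1, s2, matrix):
    """Sum matrix scores over the aligned columns of two ungapped sequences."""
    return sum(matrix[a, b] for a, b in zip(s1, s2))


# Assumed example values, not the entries shown on the slide.
uniform = build_dna_matrix(match=5, transition=-4, transversion=-4)
ti_tv = build_dna_matrix(match=1, transition=-1, transversion=-5)

print(score_with_matrix("AGTCTC", "AGCCTC", uniform))
print(score_with_matrix("AGTCTC", "AGCCTC", ti_tv))  # one transition
print(score_with_matrix("AGTCTC", "AGGCTC", ti_tv))  # one transversion
```

With the second table, the pair that differs by a transition scores higher than the pair that differs by a transversion, even though both have exactly one mismatch.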

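15 Needleman and Wunsch Algorithm
Gap Penalty = -1, Match Score = +1, Mismatch Score = 0
Matrix with ACTCG across the columns and ACAGTAG down the rows; the initial column holds the cumulative gap penalties:
         A    C    T    C    G
A   -1
C   -2
A   -3
G   -4
T   -5
A   -6
G   -7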

16-18 Needleman and Wunsch Algorithm [Figure sequence: the matrix for ACTCG (columns) and ACAGTAG (rows) filled in row by row. Gap Penalty = -1, Match Score = +1, Mismatch Score = 0]

19 Needleman and Wunsch Algorithm
[Figure: completed matrix for ACTCG (columns) and ACAGTAG (rows)]
Gap Penalty = -1, Match Score = +1, Mismatch Score = 0
Alignment of ACAGTAG with ACTCG
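The matrix fill and traceback shown across these slides can be sketched as below, using the slide's parameters (Gap Penalty = -1, Match Score = +1, Mismatch Score = 0) and the sequences ACAGTAG and ACTCG. This is a minimal textbook-style global alignment, not a reproduction of the slide's exact matrix; when several alignments tie, the traceback order chosen here (diagonal, then up, then left) is an arbitrary assumption.

```python
def needleman_wunsch(seq1, seq2, match=1, mismatch=0, gap=-1):
    """Global alignment. Returns (score, aligned_seq1, aligned_seq2)."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    F = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        F[i][0] = i * gap  # initial column: -1, -2, -3, ...
    for j in range(1, cols):
        F[0][j] = j * gap  # initial row
    for i in range(1, rows):
        for j in range(1, cols):
            diag = F[i - 1][j - 1] + (match if seq1[i - 1] == seq2[j - 1] else mismatch)
            F[i][j] = max(diag, F[i - 1][j] + gap, F[i][j - 1] + gap)

    # Trace back from the bottom-right cell to the top-left cell.
    a1, a2 = [], []
    i, j = len(seq1), len(seq2)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = match if seq1[i - 1] == seq2[j - 1] else mismatch
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + step:
            a1.append(seq1[i - 1]); a2.append(seq2[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            a1.append(seq1[i - 1]); a2.append("-"); i -= 1
        else:
            a1.append("-"); a2.append(seq2[j - 1]); j -= 1
    return F[-1][-1], "".join(reversed(a1)), "".join(reversed(a2))


score, top, bottom = needleman_wunsch("ACAGTAG", "ACTCG")
print(score)
print(top)
print(bottom)
```

The bottom-right cell holds the optimal global score; the two returned rows have equal length and contain "-" wherever a gap was introduced.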

20 Semi-Global Alignment
Terminal gaps are not penalized
Matrix with TAG across the columns and CAGTAGCA down the rows; the initial column is all zeros:
         T    A    G
C    0
A    0
G    0
T    0
A    0
G    0
C    0
A    0
Sequences: CAGTAGCA and TAG
Gap Penalty = -1, Match Score = +1, Mismatch Score = 0
No Gap Penalty in the last row and column

21-22 Semi-Global Alignment
Terminal gaps are not penalized
[Figure sequence: the matrix for TAG (columns) and CAGTAGCA (rows) being filled in]
Sequences: CAGTAGCA and TAG
Gap Penalty = -1, Match Score = +1, Mismatch Score = 0
No Gap Penalty in the last row and column
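One common way to realize "terminal gaps are not penalized" is sketched below: the first row and column start at zero (leading gaps are free), the interior is filled as in the global method, and the score is read as the maximum over the last row and last column (trailing gaps are free). Taking that maximum is an implementation choice assumed here; it gives the same score as charging no gap penalty along the last row and column. The function name is hypothetical.

```python
def semi_global_score(seq1, seq2, match=1, mismatch=0, gap=-1):
    """Semi-global alignment score with free leading and trailing gaps.
    Returns (best_score, matrix)."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    F = [[0] * cols for _ in range(rows)]  # first row and column stay at 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = F[i - 1][j - 1] + (match if seq1[i - 1] == seq2[j - 1] else mismatch)
            F[i][j] = max(diag, F[i - 1][j] + gap, F[i][j - 1] + gap)
    # Free trailing gaps: the alignment may end anywhere in the last row or column.
    best = max(max(F[-1]), max(row[-1] for row in F))
    return best, F


best, matrix = semi_global_score("CAGTAGCA", "TAG")
print(best)
for row in matrix:
    print(row)
```

For CAGTAGCA and TAG the short sequence matches a stretch inside the long one, so the unmatched ends of the long sequence cost nothing.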

23 Use the Semi-Global Alignment
Sequences: AACCTATAGCT and GCGATATA
[Figure: empty matrix with AACCTATAGCT along one axis and GCGATATA along the other]
Modify the previous method: Replace negative values with zero

24 Smith and Waterman Algorithm [Figure: alignment matrix for AACCTATAGCT and GCGATATA]
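Replacing negative values with zero turns the matrix fill into a local alignment. The sketch below assumes the same scoring as the earlier slides (Match = +1, Mismatch = 0, Gap Penalty = -1), which this slide does not restate. It starts the traceback at the highest-scoring cell and stops at the first zero; the tie-breaking order is an arbitrary choice, so other equally good local alignments may exist.

```python
def smith_waterman(seq1, seq2, match=1, mismatch=0, gap=-1):
    """Local alignment. Returns (best_score, aligned_part1, aligned_part2)."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if seq1[i - 1] == seq2[j - 1] else mismatch)
            # Negative values are replaced with zero.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero is reached.
    a1, a2 = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        step = match if seq1[i - 1] == seq2[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + step:
            a1.append(seq1[i - 1]); a2.append(seq2[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            a1.append(seq1[i - 1]); a2.append("-"); i -= 1
        else:
            a1.append("-"); a2.append(seq2[j - 1]); j -= 1
    return best, "".join(reversed(a1)), "".join(reversed(a2))


local_score, part1, part2 = smith_waterman("AACCTATAGCT", "GCGATATA")
print(local_score)
print(part1)
print(part2)
```

The returned rows cover only the best-matching region of each sequence, not the full sequences.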

25 Databases and Multiple Sequences
BLAST: BLASTP, BLASTN, BLASTX, PSI-BLAST
FASTA: FASTX
Multiple Sequence Alignments: CLUSTAL Algorithm

