Welcome to CS262: Computational Genomics
|
|
- Felicia Little
- 5 years ago
- Views:
Transcription
1 Instructor: Welcome to CS262: Computational Genomics Serafim Batzoglou TA: Paul Chen Tuesdays & Thursdays 12:50-2:05pm Clark S361
2 Goals of this course Introduction to Computational Biology & Genomics Basic concepts and scientific questions Why does it matter? Basic biology for computer scientists In-depth coverage of algorithmic techniques Current active areas of research Useful algorithms Dynamic programming String algorithms HMMs and other graphical models for sequence analysis
3 Topics in CS262 Part 1: Basic Algorithms Dynamic Programming & sequence alignment HMMs, CRFs & sequence modeling Sequence indexing; Burrows-Wheeler transform, De Brujin graphs Part 2: Topics in computational genomics and areas of active research DNA sequencing and assembly Comparative genomics Human genome resequencing Alignment Compression Human genome variation Cancer genomics Functional genomics Population genomics
4 Course responsibilities Homeworks 4 challenging problem sets, 4-5 problems/pset Due at beginning of class Up to 3 late days (24-hr periods) for the quarter Collaboration allowed please give credit Teams of 2 or 3 students Individual writeups If individual (no team) then drop score of worst problem per problem set (Optional) Scribing Due one week after the lecture, except special permission Scribing grade replaces 2 lowest problems from all problem sets First-come first-serve, staff list to sign up
5 Reading material Main Reading: Lecture notes Papers Optional: Biological sequence analysis by Durbin, Eddy, Krogh, Mitchison Chapters 1-4, 6, 7-8, 9-10
6 Birth of Molecular Biology Phosphate Group Sugar Nitrogenous Base A, C, G, T DNA A G C G A C T T C G C T G A G C Physicist Ornithologist
7 Genetics in the 20 th Century
8 Human Genome Project 1990: Start 3 billion basepairs $3 billion 2000: Bill Clinton: 2001: Draft 2003: Finished now what? most important scien.fic discovery in the 20th century
9 Sequencing Growth Cost of one human genome 2004: $30,000, : $100, : $10, : $1,000???: $300 How much would you pay for a smartphone?
10 Uses of Genomes Medicine Prenatal/Mendelian diseases Drug dosage (eg. Warfarin) Disease risk Diagnosis of infections Ancestry Genealogy Nutrition? Psychology? Baby Engineering???... Ethical Issues
11 How soon will we all be sequenced? Cost Killer apps Roadblocks? Applications Cost 2015? 2020? Time
12 Intro to Biology
13 Sequence Alignment
14 Evolution CT Amemiya et al. Nature 496, (2013) doi: /nature12027
15 Evolution at the DNA level Deletion Mutation ACGGTGCAGTTACCA SEQUENCE EDITS AC----CAGTCCACCA REARRANGEMENTS Inversion Translocation Duplication
16 Evolutionary Rates next generation OK OK OK X X Still OK?
17 Sequence conservation implies function Alignment is the key to Finding important regions Determining function Uncovering evolutionary events
18 Sequence Alignment AGGCTATCACCTGACCTCCAGGCCGATGCCC TAGCTATCACGACCGCGGTCGATTTGCCCGAC -AGGCTATCACCTGACCTCCAGGCCGA--TGCCC--- TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC Definition Given two strings x = x 1 x 2...x M, y = y 1 y 2 y N, an alignment is an assignment of gaps to positions 0,, N in x, and 0,, N in y, so as to line up each letter in one sequence with either a letter, or a gap in the other sequence
19 What is a good alignment? AGGCTAGTT, AGCGAAGTTT AGGCTAGTT- 6 matches, 3 mismatches, 1 gap AGCGAAGTTT AGGCTA-GTT- 7 matches, 1 mismatch, 3 gaps AG-CGAAGTTT AGGC-TA-GTT- AG-CG-AAGTTT 7 matches, 0 mismatches, 5 gaps
20 Scoring Function Sequence edits: Mutations Insertions Deletions Scoring Function: Match: +m Mismatch: -s Gap: -d AGGCCTC AGGACTC AGGGCCTC AGG. CTC Alternative definition: minimal edit distance Given two strings x, y, find minimum # of edits (insertions, deletions, mutations) to transform one string to the other Score F = (# matches) m - (# mismatches) s (#gaps) d
21 How do we compute the best alignment? AGTGCCCTGGAACCCTGACGGTGGGTCACAAAACTTCTGGA AGTGACCTGGGAAGACCCTGACCCTGGGTCACAAAACTC Too many possible alignments: (exercise) >> 2 N
22 Alignment is additive Observation: The score of aligning x 1 x M y 1 y N is additive Say that x 1 x i x i+1 x M aligns to y 1 y j y j+1 y N The two scores add up: F(x[1:M], y[1:n]) = F(x[1:i], y[1:j]) + F(x[i+1:M], y[j+1:n])
23 Dynamic Programming There are only a polynomial number of subproblems Align x 1 x i to y 1 y j Original problem is one of the subproblems Align x 1 x M to y 1 y N Each subproblem is easily solved from smaller subproblems We will show next Then, we can apply Dynamic Programming!!! Let F(i, j) = optimal score of aligning x 1 x i y 1 y j F is the DP Matrix or Table Memoization
24 Dynamic Programming (cont d) Notice three possible cases: 1. x i aligns to y j x 1 x i-1 x i y 1 y j-1 y j F(i, j) = F(i 1, j 1) + m, if x i = y j -s, if not 2. x i aligns to a gap x 1 x i-1 x i y 1 y j - F(i, j) = F(i 1, j) d 3. y j aligns to a gap x 1 x i - y 1 y j-1 y j F(i, j) = F(i, j 1) d
25 Dynamic Programming (cont d) How do we know which case is correct? Inductive assumption: F(i, j 1), F(i 1, j), F(i 1, j 1) are optimal Then, F(i, j) = max F(i 1, j 1) + s(x i, y j ) F(i 1, j) d F(i, j 1) d Where s(x i, y j ) = m, if x i = y j ; -s, if not
26 Example x = AGTA m = 1 Procedure to output y = ATA s = -1 Alignment d = -1 F(i,j) i = A G T A j = A T Follow the backpointers F(1, 1) = When max{f(0,0) diagonal, + s(a, A), OUTPUT x F(0, 1) i, y j d, F(1, 0) d} = When up, OUTPUT max{0 + 1, y j -1 1, When left, -1 1} = 1 OUTPUT x i 3 A A A G - T T A A
27 The Needleman-Wunsch Matrix x 1 x M y 1 y N Every nondecreasing path from (0,0) to (M, N) corresponds to an alignment of the two sequences An optimal alignment is composed of optimal subalignments
28 The Needleman-Wunsch Algorithm Initialization. F(0, 0) = 0 F(0, j) = - j d F(i, 0) = - i d Main Iteration. Filling-in partial alignments For each i = 1 M For each j = 1 N F(i 1,j 1) + s(x i, y j ) [case 1] F(i, j) = max F(i 1, j) d [case 2] F(i, j 1) d [case 3] DIAG, if [case 1] Ptr(i, j) = LEFT, if [case 2] UP, if [case 3] 3. Termination. F(M, N) is the optimal score, and from Ptr(M, N) can trace back optimal alignment
29 Performance Time: O(NM) Space: O(NM) Later we will cover more efficient methods
30 A variant of the basic algorithm: Maybe it is OK to have an unlimited # of gaps in the beginning and end: CTATCACCTGACCTCCAGGCCGATGCCCCTTCCGGC GCGAGTTCATCTATCAC--GACCGC--GGTCG Then, we don t want to penalize gaps in the ends
31 Different types of overlaps Example: 2 overlapping reads from a sequencing project Example: Search for a mouse gene within a human chromosome
32 The Overlap Detection variant x 1 x M Changes: y 1 y N 1. Initialization For all i, j, F(i, 0) = 0 F(0, j) = 0 2. Termination max i F(i, N) F OPT = max max j F(M, j)
33 The local alignment problem Given two strings x = x 1 x M, y = y 1 y N Find substrings x, y whose similarity (optimal global alignment value) is maximum x = aaaacccccggggtta y = ttcccgggaaccaacc
34 Why local alignment Genes are shuffled between genomes
35 Cross-species genome similarity 98% of genes are conserved between any two mammals >70% average similarity in protein sequence hum_a : 57331/ mus_a : 78560/ rat_a : / fug_a : 36008/68174 hum_a : 57381/ mus_a : 78610/ rat_a : / fug_a : 36058/68174 hum_a : 57431/ mus_a : 78659/ rat_a : / fug_a : 36084/68174 atoh enhancer in human, mouse, rat, fugu fish hum_a : 57481/ mus_a : 78708/ rat_a : / fug_a : 36097/68174
36 The Smith-Waterman algorithm Idea: Ignore badly aligning regions Modifications to Needleman-Wunsch: Initialization: F(0, j) = F(i, 0) = 0 Iteration: F(i, j) = max F(i 1, j) d 0 F(i, j 1) d F(i 1, j 1) + s(x i, y j )
37 The Smith-Waterman algorithm Termination: 1. If we want the best local alignment F OPT = max i,j F(i, j) Find F OPT and trace back 2. If we want all local alignments scoring > t?? For all i, j find F(i, j) > t, and trace back? Complicated by overlapping local alignments Waterman Eggert 87: find all non-overlapping local alignments with minimal recalculation of the DP matrix
38 Scoring the gaps more accurately Current model: γ(n) Gap of length n incurs penalty n d However, gaps usually occur in bunches Concave gap penalty function γ(n) (aka Convex -γ(n)): γ(n) γ(n): for all n, γ(n + 1) - γ(n) γ(n) - γ(n 1)
39 Convex gap dynamic programming Initialization: same Iteration: F(i 1, j 1) + s(x i, y j ) F(i, j) = max max k=0 i-1 F(k, j) γ(i k) max k=0 j-1 F(i, k) γ(j k) Termination: same Running Time: O(N 2 M) (assume N>M) Space: O(NM)
40 Compromise: affine gaps γ(n) = d + (n 1) e gap gap open extend To compute optimal alignment, γ(n) d e At position i, j, need to remember best score if gap is open best score if gap is not open F(i, j): score of alignment x 1 x i to y 1 y j if x i aligns to y j G(i, j): score if x i aligns to a gap after y j H(i, j): score if y j aligns to a gap after x i V(i, j) = best score of alignment x 1 x i to y 1 y j
41 Needleman-Wunsch with affine gaps Why do we need matrices F, G, H? Because, perhaps x i aligns to y j G(i, j) < V(i, j) x 1 x i-1 x i x i+1 y 1 y j-1 y j - (it is best to align x i to y j if we were aligning only x 1 x i to y 1 y j and not the rest of x, y), but on the contrary Add -d G(i+1, j) = F(i, j) d G(i, j) x e i aligns > V(i, j) to d a gap after y j x 1 x i-1 x i x i+1 y 1 y j - - (i.e., had we fixed our decision that x i aligns to y j, we could regret it at the next step when aligning x 1 x i+1 to y 1 y j ) Add -e G(i+1, j) = G(i, j) e
42 Needleman-Wunsch with affine gaps Initialization: V(i, 0) = d + (i 1) e V(0, j) = d + (j 1) e Iteration: V(i, j) = max{ F(i, j), G(i, j), H(i, j) } F(i, j) = V(i 1, j 1) + s(x i, y j ) G(i, j) = max V(i 1, j) d G(i 1, j) e Termination: V(i, j 1) d H(i, j) = max H(i, j 1) e V(i, j) has the best alignment Time? Space?
43 To generalize a bit think of how you would compute optimal alignment with this gap function γ(n).in time O(MN)
44 Bounded Dynamic Programming Assume we know that x and y are very similar Assumption: # gaps(x, y) < k(n) x i Then, implies i j < k(n) y j We can align x and y more efficiently: Time, Space: O(N k(n)) << O(N 2 )
45 Bounded Dynamic Programming y 1 y N x 1 x M k(n) Initialization: F(i,0), F(0,j) undefined for i, j > k Iteration: For i = 1 M For j = max(1, i k) min(n, i+k) F(i 1, j 1)+ s(x i, y j ) F(i, j) = max F(i, j 1) d, if j > i k(n) F(i 1, j) d, if j < i + k(n) Termination: same Easy to extend to the affine gap case
46 Outline Linear-Space Alignment BLAST local alignment search Ultra-fast alignment for (human) genome resequencing
47 Linear-Space Alignment
48 Subsequences and Substrings Definition A string x is a substring of a string x, if x = ux v for some prefix string u and suffix string v (similarly, x = x i x j, for some 1 i j x ) A string x is a subsequence of a string x if x can be obtained from x by deleting 0 or more letters (x = x i1 x ik, for some 1 i 1 i k x ) Note: a substring is always a subsequence Example: x = abracadabra y = cadabr; substring z = brcdbr; subseqence, not substring
49 Hirschberg s algortihm Given a set of strings x, y,, a common subsequence is a string u that is a subsequence of all strings x, y, Longest common subsequence Given strings x = x 1 x 2 x M, y = y 1 y 2 y N, Find longest common subsequence u = u 1 u k Algorithm: F(i 1, j) F(i, j) = max F(i, j 1) F(i 1, j 1) + [1, if x i = y j ; 0 otherwise] Ptr(i, j) = (same as in N-W) Termination: trace back from Ptr(M, N), and prepend a letter to u whenever Ptr(i, j) = DIAG and F(i 1, j 1) < F(i, j) Hirschberg s original algorithm solves this in linear space
50 Introduction: Compute optimal score It is easy to compute F(M, N) in linear space Allocate ( column[1] ) Allocate ( column[2] ) F(i,j) For If i = 1.M i > 1, then: Free( column[ i 2 ] ) Allocate( column[ i ] ) For j = 1 N F(i, j) =
51 Linear-space alignment To compute both the optimal score and the optimal alignment: Divide & Conquer approach: Notation: x r, y r : reverse of x, y E.g. x = accgg; x r = ggcca F r (i, j): optimal score of aligning x r 1 xr i & yr 1 yr j same as aligning x M-i+1 x M & y N-j+1 y N
52 Linear-space alignment Lemma: (assume M is even) F(M, N) = max k=0 N ( F(M/2, k) + F r (M/2, N k) ) x y M/2 F(M/2, k) F r (M/2, N k) k * Example: ACC-GGTGCCCAGGACTG--CAT ACCAGGTG----GGACTGGGCAG k * = 8
53 Linear-space alignment Now, using 2 columns of space, we can compute for k = 1 M, F(M/2, k), F r (M/2, N k) PLUS the backpointers x 1 x M/2 x M x 1 x M/2+1 x M y 1 y 1 y N y N
54 Linear-space alignment Now, we can find k * maximizing F(M/2, k) + F r (M/2, N-k) Also, we can trace the path exiting column M/2 from k * 0 1 M/2 M/2+1 M M+1 k * k * +1
55 Linear-space alignment Iterate this procedure to the left and right! k * N-k * M/2 M/2
56 Linear-space alignment Hirschberg s Linear-space algorithm: MEMALIGN(l, l, r, r ): (aligns x l x l with y r y r ) 1. Let h = (l -l)/2 2. Find (in Time O((l l) (r r)), Space O(r r)) the optimal path, L h, entering column h 1, exiting column h Let k 1 = pos n at column h 2 where L h enters k 2 = pos n at column h + 1 where L h exits 3. MEMALIGN(l, h 2, r, k 1 ) 4. Output L h 5. MEMALIGN(h + 1, l, k 2, r ) Top level call: MEMALIGN(1, M, 1, N)
57 Linear-space alignment Time, Space analysis of Hirschberg s algorithm: To compute optimal path at middle column, For box of size M N, Space: 2N Time: cmn, for some constant c Then, left, right calls cost c( M/2 k * + M/2 (N k * ) ) = cmn/2 All recursive calls cost Total Time: cmn + cmn/2 + cmn/4 +.. = 2cMN = O(MN) Total Space: O(N) for computation, O(N + M) to store the optimal alignment
Evolution. CT Amemiya et al. Nature 496, (2013) doi: /nature12027
Sequence Alignment Evolution CT Amemiya et al. Nature 496, 311-316 (2013) doi:10.1038/nature12027 Evolutionary Rates next generation OK OK OK X X Still OK? Sequence conservation implies function Alignment
More informationLinear-Space Alignment
Linear-Space Alignment Subsequences and Substrings Definition A string x is a substring of a string x, if x = ux v for some prefix string u and suffix string v (similarly, x = x i x j, for some 1 i j x
More informationMinimum Edit Distance. Defini'on of Minimum Edit Distance
Minimum Edit Distance Defini'on of Minimum Edit Distance How similar are two strings? Spell correc'on The user typed graffe Which is closest? graf gra@ grail giraffe Computa'onal Biology Align two sequences
More informationLecture 2: Pairwise Alignment. CG Ron Shamir
Lecture 2: Pairwise Alignment 1 Main source 2 Why compare sequences? Human hexosaminidase A vs Mouse hexosaminidase A 3 www.mathworks.com/.../jan04/bio_genome.html Sequence Alignment עימוד רצפים The problem:
More informationMoreover, the circular logic
Moreover, the circular logic How do we know what is the right distance without a good alignment? And how do we construct a good alignment without knowing what substitutions were made previously? ATGCGT--GCAAGT
More informationEECS730: Introduction to Bioinformatics
EECS730: Introduction to Bioinformatics Lecture 03: Edit distance and sequence alignment Slides adapted from Dr. Shaojie Zhang (University of Central Florida) KUMC visit How many of you would like to attend
More informationBio nformatics. Lecture 3. Saad Mneimneh
Bio nformatics Lecture 3 Sequencing As before, DNA is cut into small ( 0.4KB) fragments and a clone library is formed. Biological experiments allow to read a certain number of these short fragments per
More informationPairwise Alignment. Guan-Shieng Huang. Dept. of CSIE, NCNU. Pairwise Alignment p.1/55
Pairwise Alignment Guan-Shieng Huang shieng@ncnu.edu.tw Dept. of CSIE, NCNU Pairwise Alignment p.1/55 Approach 1. Problem definition 2. Computational method (algorithms) 3. Complexity and performance Pairwise
More informationAnalysis and Design of Algorithms Dynamic Programming
Analysis and Design of Algorithms Dynamic Programming Lecture Notes by Dr. Wang, Rui Fall 2008 Department of Computer Science Ocean University of China November 6, 2009 Introduction 2 Introduction..................................................................
More informationPractical considerations of working with sequencing data
Practical considerations of working with sequencing data File Types Fastq ->aligner -> reference(genome) coordinates Coordinate files SAM/BAM most complete, contains all of the info in fastq and more!
More informationSequence Alignment (chapter 6)
Sequence lignment (chapter 6) he biological problem lobal alignment Local alignment Multiple alignment Introduction to bioinformatics, utumn 6 Background: comparative genomics Basic question in biology:
More informationCISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I)
CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I) Contents Alignment algorithms Needleman-Wunsch (global alignment) Smith-Waterman (local alignment) Heuristic algorithms FASTA BLAST
More informationIntroduction to Bioinformatics Algorithms Homework 3 Solution
Introduction to Bioinformatics Algorithms Homework 3 Solution Saad Mneimneh Computer Science Hunter College of CUNY Problem 1: Concave penalty function We have seen in class the following recurrence for
More informationBioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment
Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment Substitution score matrices, PAM, BLOSUM Needleman-Wunsch algorithm (Global) Smith-Waterman algorithm (Local) BLAST (local, heuristic) E-value
More informationLecture 4: September 19
CSCI1810: Computational Molecular Biology Fall 2017 Lecture 4: September 19 Lecturer: Sorin Istrail Scribe: Cyrus Cousins Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes
More informationPairwise sequence alignment
Department of Evolutionary Biology Example Alignment between very similar human alpha- and beta globins: GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKL G+ +VK+HGKKV A+++++AH+D++ +++++LS+LH KL GNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKL
More informationText Algorithms (4AP) Biological measures of similarity. Jaak Vilo 2008 fall. MTAT Text Algorithms. Complete genomes
Text Algorithms (4AP) Biological measures of similarity Jaak Vilo 2008 fall Jaak Vilo MTAT.03.190 Text Algorithms 1 Complete genomes Lecture 1, Tuesday April 1, 2003 Serafim Batzoglou http://ai.stanford.edu/~serafim/
More informationSequence analysis and Genomics
Sequence analysis and Genomics October 12 th November 23 rd 2 PM 5 PM Prof. Peter Stadler Dr. Katja Nowick Katja: group leader TFome and Transcriptome Evolution Bioinformatics group Paul-Flechsig-Institute
More information6.6 Sequence Alignment
6.6 Sequence Alignment String Similarity How similar are two strings? ocurrance o c u r r a n c e - occurrence First model the problem Q. How can we measure the distance? o c c u r r e n c e 6 mismatches,
More informationPairwise alignment, Gunnar Klau, November 9, 2005, 16:
Pairwise alignment, Gunnar Klau, November 9, 2005, 16:36 2012 2.1 Growth rates For biological sequence analysis, we prefer algorithms that have time and space requirements that are linear in the length
More informationCS 580: Algorithm Design and Analysis
CS 58: Algorithm Design and Analysis Jeremiah Blocki Purdue University Spring 28 Announcement: Homework 3 due February 5 th at :59PM Midterm Exam: Wed, Feb 2 (8PM-PM) @ MTHW 2 Recap: Dynamic Programming
More informationLocal Alignment: Smith-Waterman algorithm
Local Alignment: Smith-Waterman algorithm Example: a shared common domain of two protein sequences; extended sections of genomic DNA sequence. Sensitive to detect similarity in highly diverged sequences.
More informationSara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject)
Bioinformática Sequence Alignment Pairwise Sequence Alignment Universidade da Beira Interior (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) 1 16/3/29 & 23/3/29 27/4/29 Outline
More information20 Grundlagen der Bioinformatik, SS 08, D. Huson, May 27, Global and local alignment of two sequences using dynamic programming
20 Grundlagen der Bioinformatik, SS 08, D. Huson, May 27, 2008 4 Pairwise alignment We will discuss: 1. Strings 2. Dot matrix method for comparing sequences 3. Edit distance 4. Global and local alignment
More informationBioinformatics and BLAST
Bioinformatics and BLAST Overview Recap of last time Similarity discussion Algorithms: Needleman-Wunsch Smith-Waterman BLAST Implementation issues and current research Recap from Last Time Genome consists
More informationCSE 202 Dynamic Programming II
CSE 202 Dynamic Programming II Chapter 6 Dynamic Programming Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved. 1 Algorithmic Paradigms Greed. Build up a solution incrementally,
More informationAlgorithms in Bioinformatics: A Practical Introduction. Sequence Similarity
Algorithms in Bioinformatics: A Practical Introduction Sequence Similarity Earliest Researches in Sequence Comparison Doolittle et al. (Science, July 1983) searched for platelet-derived growth factor (PDGF)
More informationPair Hidden Markov Models
Pair Hidden Markov Models Scribe: Rishi Bedi Lecturer: Serafim Batzoglou January 29, 2015 1 Recap of HMMs alphabet: Σ = {b 1,...b M } set of states: Q = {1,..., K} transition probabilities: A = [a ij ]
More informationLecture 5: September Time Complexity Analysis of Local Alignment
CSCI1810: Computational Molecular Biology Fall 2017 Lecture 5: September 21 Lecturer: Sorin Istrail Scribe: Cyrus Cousins Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes
More informationAlgorithm Design and Analysis
Algorithm Design and Analysis LECTURE 18 Dynamic Programming (Segmented LS recap) Longest Common Subsequence Adam Smith Segmented Least Squares Least squares. Foundational problem in statistic and numerical
More informationBackground: comparative genomics. Sequence similarity. Homologs. Similarity vs homology (2) Similarity vs homology. Sequence Alignment (chapter 6)
Sequence lignment (chapter ) he biological problem lobal alignment Local alignment Multiple alignment Background: comparative genomics Basic question in biology: what properties are shared among organisms?
More informationNUMB3RS Activity: DNA Sequence Alignment. Episode: Guns and Roses
Teacher Page 1 NUMB3RS Activity: DNA Sequence Alignment Topic: Biomathematics DNA sequence alignment Grade Level: 10-12 Objective: Use mathematics to compare two strings of DNA Prerequisite: very basic
More informationAlgorithms in Bioinformatics
Algorithms in Bioinformatics Sami Khuri Department of omputer Science San José State University San José, alifornia, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri Pairwise Sequence Alignment Homology
More informationQuantifying sequence similarity
Quantifying sequence similarity Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, February 16 th 2016 After this lecture, you can define homology, similarity, and identity
More informationComputational Biology Lecture 5: Time speedup, General gap penalty function Saad Mneimneh
Computational Biology Lecture 5: ime speedup, General gap penalty function Saad Mneimneh We saw earlier that it is possible to compute optimal global alignments in linear space (it can also be done for
More informationInDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9
Lecture 5 Alignment I. Introduction. For sequence data, the process of generating an alignment establishes positional homologies; that is, alignment provides the identification of homologous phylogenetic
More informationComputational Biology
Computational Biology Lecture 6 31 October 2004 1 Overview Scoring matrices (Thanks to Shannon McWeeney) BLAST algorithm Start sequence alignment 2 1 What is a homologous sequence? A homologous sequence,
More information1.5 Sequence alignment
1.5 Sequence alignment The dramatic increase in the number of sequenced genomes and proteomes has lead to development of various bioinformatic methods and algorithms for extracting information (data mining)
More informationPairwise & Multiple sequence alignments
Pairwise & Multiple sequence alignments Urmila Kulkarni-Kale Bioinformatics Centre 411 007 urmila@bioinfo.ernet.in Basis for Sequence comparison Theory of evolution: gene sequences have evolved/derived
More informationSequence Comparison. mouse human
Sequence Comparison Sequence Comparison mouse human Why Compare Sequences? The first fact of biological sequence analysis In biomolecular sequences (DNA, RNA, or amino acid sequences), high sequence similarity
More informationAn Introduction to Bioinformatics Algorithms Hidden Markov Models
Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training
More informationMore Dynamic Programming
Algorithms & Models of Computation CS/ECE 374, Fall 2017 More Dynamic Programming Lecture 14 Tuesday, October 17, 2017 Sariel Har-Peled (UIUC) CS374 1 Fall 2017 1 / 48 What is the running time of the following?
More information3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT
3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT.03.239 25.09.2012 SEQUENCE ANALYSIS IS IMPORTANT FOR... Prediction of function Gene finding the process of identifying the regions of genomic DNA that encode
More informationString Matching Problem
String Matching Problem Pattern P Text T Set of Locations L 9/2/23 CAP/CGS 5991: Lecture 2 Computer Science Fundamentals Specify an input-output description of the problem. Design a conceptual algorithm
More informationModule: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment
Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment Introduction to Bioinformatics online course : IBT Jonathan Kayondo Learning Objectives Understand
More informationMore Dynamic Programming
CS 374: Algorithms & Models of Computation, Spring 2017 More Dynamic Programming Lecture 14 March 9, 2017 Chandra Chekuri (UIUC) CS374 1 Spring 2017 1 / 42 What is the running time of the following? Consider
More informationAside: Golden Ratio. Golden Ratio: A universal law. Golden ratio φ = lim n = 1+ b n = a n 1. a n+1 = a n + b n, a n+b n a n
Aside: Golden Ratio Golden Ratio: A universal law. Golden ratio φ = lim n a n+b n a n = 1+ 5 2 a n+1 = a n + b n, b n = a n 1 Ruta (UIUC) CS473 1 Spring 2018 1 / 41 CS 473: Algorithms, Spring 2018 Dynamic
More informationCSE 431/531: Analysis of Algorithms. Dynamic Programming. Lecturer: Shi Li. Department of Computer Science and Engineering University at Buffalo
CSE 431/531: Analysis of Algorithms Dynamic Programming Lecturer: Shi Li Department of Computer Science and Engineering University at Buffalo Paradigms for Designing Algorithms Greedy algorithm Make a
More informationAlgorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment
Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot
More information6. DYNAMIC PROGRAMMING II
6. DYNAMIC PROGRAMMING II sequence alignment Hirschberg's algorithm Bellman-Ford algorithm distance vector protocols negative cycles in a digraph Lecture slides by Kevin Wayne Copyright 2005 Pearson-Addison
More informationChapter 6. Dynamic Programming. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.
Chapter 6 Dynamic Programming Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved. 1 Algorithmic Paradigms Greed. Build up a solution incrementally, myopically optimizing
More informationAlignment Algorithms. Alignment Algorithms
Midterm Results Big improvement over scores from the previous two years. Since this class grade is based on the previous years curve, that means this class will get higher grades than the previous years.
More informationHidden Markov Models
Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training
More informationCS173 Lecture B, November 3, 2015
CS173 Lecture B, November 3, 2015 Tandy Warnow November 3, 2015 CS 173, Lecture B November 3, 2015 Tandy Warnow Announcements Examlet 7 is a take-home exam, and is due November 10, 11:05 AM, in class.
More informationPractical Bioinformatics
5/2/2017 Dictionaries d i c t i o n a r y = { A : T, T : A, G : C, C : G } d i c t i o n a r y [ G ] d i c t i o n a r y [ N ] = N d i c t i o n a r y. h a s k e y ( C ) Dictionaries g e n e t i c C o
More informationWhole Genome Alignments and Synteny Maps
Whole Genome Alignments and Synteny Maps IINTRODUCTION It was not until closely related organism genomes have been sequenced that people start to think about aligning genomes and chromosomes instead of
More informationChapter 6. Dynamic Programming. CS 350: Winter 2018
Chapter 6 Dynamic Programming CS 350: Winter 2018 1 Algorithmic Paradigms Greedy. Build up a solution incrementally, myopically optimizing some local criterion. Divide-and-conquer. Break up a problem into
More informationDynamic Programming. Weighted Interval Scheduling. Algorithmic Paradigms. Dynamic Programming
lgorithmic Paradigms Dynamic Programming reed Build up a solution incrementally, myopically optimizing some local criterion Divide-and-conquer Break up a problem into two sub-problems, solve each sub-problem
More informationIntroduction to Bioinformatics
Introduction to Bioinformatics Lecture : p he biological problem p lobal alignment p Local alignment p Multiple alignment 6 Background: comparative genomics p Basic question in biology: what properties
More information8 Grundlagen der Bioinformatik, SS 09, D. Huson, April 28, 2009
8 Grundlagen der Bioinformatik, SS 09, D. Huson, April 28, 2009 2 Pairwise alignment We will discuss: 1. Strings 2. Dot matrix method for comparing sequences 3. Edit distance and alignment 4. The number
More informationFirst generation sequencing and pairwise alignment (High-tech, not high throughput) Analysis of Biological Sequences
First generation sequencing and pairwise alignment (High-tech, not high throughput) Analysis of Biological Sequences 140.638 where do sequences come from? DNA is not hard to extract (getting DNA from a
More informationCHEM 181: Chemical Biology
Instructor Prof. Jane M. Liu (SN-216) jane.liu@pomona.edu CHEM 181: Chemical Biology Office Hours Anytime my office door is open or by appointment COURSE OVERVIEW Class TR 8:10-9:25 am Prerequisite: CHEM115
More information8 Grundlagen der Bioinformatik, SoSe 11, D. Huson, April 18, 2011
8 Grundlagen der Bioinformatik, SoSe 11, D. Huson, April 18, 2011 2 Pairwise alignment We will discuss: 1. Strings 2. Dot matrix method for comparing sequences 3. Edit distance and alignment 4. The number
More information/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Dynamic Programming II Date: 10/12/17
601.433/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Dynamic Programming II Date: 10/12/17 12.1 Introduction Today we re going to do a couple more examples of dynamic programming. While
More informationSequence Alignment Techniques and Their Uses
Sequence Alignment Techniques and Their Uses Sarah Fiorentino Since rapid sequencing technology and whole genomes sequencing, the amount of sequence information has grown exponentially. With all of this
More informationLecture 5,6 Local sequence alignment
Lecture 5,6 Local sequence alignment Chapter 6 in Jones and Pevzner Fall 2018 September 4,6, 2018 Evolution as a tool for biological insight Nothing in biology makes sense except in the light of evolution
More informationChapter 6. Weighted Interval Scheduling. Dynamic Programming. Algorithmic Paradigms. Dynamic Programming Applications
lgorithmic Paradigms hapter Dynamic Programming reedy. Build up a solution incrementally, myopically optimizing some local criterion. Divide-and-conquer. Break up a problem into sub-problems, solve each
More informationMATHEMATICAL MODELS - Vol. III - Mathematical Modeling and the Human Genome - Hilary S. Booth MATHEMATICAL MODELING AND THE HUMAN GENOME
MATHEMATICAL MODELING AND THE HUMAN GENOME Hilary S. Booth Australian National University, Australia Keywords: Human genome, DNA, bioinformatics, sequence analysis, evolution. Contents 1. Introduction:
More informationMotivating the need for optimal sequence alignments...
1 Motivating the need for optimal sequence alignments... 2 3 Note that this actually combines two objectives of optimal sequence alignments: (i) use the score of the alignment o infer homology; (ii) use
More informationPairwise sequence alignment and pair hidden Markov models
Pairwise sequence alignment and pair hidden Markov models Martin C. Frith April 13, 2012 ntroduction Pairwise alignment and pair hidden Markov models (phmms) are basic textbook fare [2]. However, there
More informationCollected Works of Charles Dickens
Collected Works of Charles Dickens A Random Dickens Quote If there were no bad people, there would be no good lawyers. Original Sentence It was a dark and stormy night; the night was dark except at sunny
More informationList of Code Challenges. About the Textbook Meet the Authors... xix Meet the Development Team... xx Acknowledgments... xxi
Contents List of Code Challenges xvii About the Textbook xix Meet the Authors................................... xix Meet the Development Team............................ xx Acknowledgments..................................
More informationBioinformatics for Computer Scientists (Part 2 Sequence Alignment) Sepp Hochreiter
Bioinformatics for Computer Scientists (Part 2 Sequence Alignment) Institute of Bioinformatics Johannes Kepler University, Linz, Austria Sequence Alignment 2. Sequence Alignment Sequence Alignment 2.1
More informationSequence analysis and comparison
The aim with sequence identification: Sequence analysis and comparison Marjolein Thunnissen Lund September 2012 Is there any known protein sequence that is homologous to mine? Are there any other species
More information17 Non-collinear alignment Motivation A B C A B C A B C A B C D A C. This exposition is based on:
17 Non-collinear alignment This exposition is based on: 1. Darling, A.E., Mau, B., Perna, N.T. (2010) progressivemauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS One 5(6):e11147.
More informationCopyright 2000 N. AYDIN. All rights reserved. 1
Introduction to Bioinformatics Prof. Dr. Nizamettin AYDIN naydin@yildiz.edu.tr Multiple Sequence Alignment Outline Multiple sequence alignment introduction to msa methods of msa progressive global alignment
More informationIn-Depth Assessment of Local Sequence Alignment
2012 International Conference on Environment Science and Engieering IPCBEE vol.3 2(2012) (2012)IACSIT Press, Singapoore In-Depth Assessment of Local Sequence Alignment Atoosa Ghahremani and Mahmood A.
More informationSequence Alignments. Dynamic programming approaches, scoring, and significance. Lucy Skrabanek ICB, WMC January 31, 2013
Sequence Alignments Dynamic programming approaches, scoring, and significance Lucy Skrabanek ICB, WMC January 31, 213 Sequence alignment Compare two (or more) sequences to: Find regions of conservation
More informationComputational Molecular Biology (
Computational Molecular Biology (http://cmgm cmgm.stanford.edu/biochem218/) Biochemistry 218/Medical Information Sciences 231 Douglas L. Brutlag, Lee Kozar Jimmy Huang, Josh Silverman Lecture Syllabus
More informationCHEM 121: Chemical Biology
Instructors Prof. Jane M. Liu (HS-212) jliu3@drew.edu x3303 Office Hours Anytime my office door is open CHEM 121: Chemical Biology Class MF 2:30-3:45 pm PRE-REQUISITES: CHEM 117 COURSE OVERVIEW This upper-level
More informationAn Introduction to Sequence Similarity ( Homology ) Searching
An Introduction to Sequence Similarity ( Homology ) Searching Gary D. Stormo 1 UNIT 3.1 1 Washington University, School of Medicine, St. Louis, Missouri ABSTRACT Homologous sequences usually have the same,
More informationSEQUENCE ALIGNMENT BACKGROUND: BIOINFORMATICS. Prokaryotes and Eukaryotes. DNA and RNA
SEQUENCE ALIGNMENT BACKGROUND: BIOINFORMATICS 1 Prokaryotes and Eukaryotes 2 DNA and RNA 3 4 Double helix structure Codons Codons are triplets of bases from the RNA sequence. Each triplet defines an amino-acid.
More informationSingle alignment: Substitution Matrix. 16 march 2017
Single alignment: Substitution Matrix 16 march 2017 BLOSUM Matrix BLOSUM Matrix [2] (Blocks Amino Acid Substitution Matrices ) It is based on the amino acids substitutions observed in ~2000 conserved block
More informationWhole Genome Alignment. Adam Phillippy University of Maryland, Fall 2012
Whole Genome Alignment Adam Phillippy University of Maryland, Fall 2012 Motivation cancergenome.nih.gov Breast cancer karyotypes www.path.cam.ac.uk Goal of whole-genome alignment } For two genomes, A and
More informationTandem Mass Spectrometry: Generating function, alignment and assembly
Tandem Mass Spectrometry: Generating function, alignment and assembly With slides from Sangtae Kim and from Jones & Pevzner 2004 Determining reliability of identifications Can we use Target/Decoy to estimate
More informationGlobal alignments - review
Global alignments - review Take two sequences: X[j] and Y[j] M[i-1, j-1] ± 1 M[i, j] = max M[i, j-1] 2 M[i-1, j] 2 The best alignment for X[1 i] and Y[1 j] is called M[i, j] X[j] Initiation: M[,]= pply
More informationIntroduction to sequence alignment. Local alignment the Smith-Waterman algorithm
Lecture 2, 12/3/2003: Introduction to sequence alignment The Needleman-Wunsch algorithm for global sequence alignment: description and properties Local alignment the Smith-Waterman algorithm 1 Computational
More informationMultiple Sequence Alignment using Profile HMM
Multiple Sequence Alignment using Profile HMM. based on Chapter 5 and Section 6.5 from Biological Sequence Analysis by R. Durbin et al., 1998 Acknowledgements: M.Sc. students Beatrice Miron, Oana Răţoi,
More informationImplementing Approximate Regularities
Implementing Approximate Regularities Manolis Christodoulakis Costas S. Iliopoulos Department of Computer Science King s College London Kunsoo Park School of Computer Science and Engineering, Seoul National
More informationHMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder
HMM applications Applications of HMMs Gene finding Pairwise alignment (pair HMMs) Characterizing protein families (profile HMMs) Predicting membrane proteins, and membrane protein topology Gene finding
More informationEfficient High-Similarity String Comparison: The Waterfall Algorithm
Efficient High-Similarity String Comparison: The Waterfall Algorithm Alexander Tiskin Department of Computer Science University of Warwick http://go.warwick.ac.uk/alextiskin Alexander Tiskin (Warwick)
More informationEffects of Gap Open and Gap Extension Penalties
Brigham Young University BYU ScholarsArchive All Faculty Publications 200-10-01 Effects of Gap Open and Gap Extension Penalties Hyrum Carroll hyrumcarroll@gmail.com Mark J. Clement clement@cs.byu.edu See
More informationEvolutionary Models. Evolutionary Models
Edit Operators In standard pairwise alignment, what are the allowed edit operators that transform one sequence into the other? Describe how each of these edit operations are represented on a sequence alignment
More informationPairwise alignment. 2.1 Introduction GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSD----LHAHKL
2 Pairwise alignment 2.1 Introduction The most basic sequence analysis task is to ask if two sequences are related. This is usually done by first aligning the sequences (or parts of them) and then deciding
More informationAreas. ! Bioinformatics. ! Control theory. ! Information theory. ! Operations research. ! Computer science: theory, graphics, AI, systems,.
lgorithmic Paradigms hapter Dynamic Programming reed Build up a solution incrementally, myopically optimizing some local criterion Divide-and-conquer Break up a problem into two sub-problems, solve each
More informationAlignment Strategies for Large Scale Genome Alignments
Alignment Strategies for Large Scale Genome Alignments CSHL Computational Genomics 9 November 2003 Algorithms for Biological Sequence Comparison algorithm value scoring gap time calculated matrix penalty
More informationLecture 2, 5/12/2001: Local alignment the Smith-Waterman algorithm. Alignment scoring schemes and theory: substitution matrices and gap models
Lecture 2, 5/12/2001: Local alignment the Smith-Waterman algorithm Alignment scoring schemes and theory: substitution matrices and gap models 1 Local sequence alignments Local sequence alignments are necessary
More informationSequence Alignment. Johannes Starlinger
Sequence Alignment Johannes Starlinger his Lecture Approximate String Matching Edit distance and alignment Computing global alignments Local alignment Johannes Starlinger: Bioinformatics, Summer Semester
More informationCSE 241 Class 1. Jeremy Buhler. August 24,
CSE 41 Class 1 Jeremy Buhler August 4, 015 Before class, write URL on board: http://classes.engineering.wustl.edu/cse41/. Also: Jeremy Buhler, Office: Jolley 506, 314-935-6180 1 Welcome and Introduction
More information2 Pairwise alignment. 2.1 References. 2.2 Importance of sequence alignment. Introduction to the pairwise sequence alignment problem.
2 Pairwise alignment Introduction to the pairwise sequence alignment problem Dot plots Scoring schemes The principle of dynamic programming Alignment algorithms based on dynamic programming 2.1 References
More information