NUMB3RS Activity: DNA Sequence Alignment. Episode: Guns and Roses

Similar documents
NUMB3RS Activity: How to Get a Date. Episode: Bones of Contention

NUMB3RS Activity: What Are You Implying? Episode: Backscatter

NUMB3RS Activity: Clearing All Obstacles. Episode: Hot Shot

Sequence analysis and Genomics

NUMB3RS Activity: Fresh Air and Parabolas. Episode: Pandora s Box

Practical Bioinformatics

Pairwise & Multiple sequence alignments

New Jersey Center for Teaching and Learning. Progressive Mathematics Initiative

3.3 It All Adds Up. A Develop Understanding Task

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9

Bioinformatics and BLAST

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Please bring the task to your first physics lesson and hand it to the teacher.

Use Newton s law of cooling to narrow down the number of suspects by determining when the victim was killed.

NAME: DATE: MATHS: Higher Level Algebra. Maths. Higher Level Algebra

Maths Higher Level Algebra

Pairwise Alignment. Guan-Shieng Huang. Dept. of CSIE, NCNU. Pairwise Alignment p.1/55

Knots, Coloring and Applications

AP PHYSICS C Mechanics - SUMMER ASSIGNMENT FOR

NUMB3RS Activity: Magnetism. Episode: Provenance

Analysis and Design of Algorithms Dynamic Programming

Pairwise sequence alignment

In-Depth Assessment of Local Sequence Alignment

Alex s Guide to Word Problems and Linear Equations Following Glencoe Algebra 1

Comparing whole genomes

Collected Works of Charles Dickens

Algorithms in Bioinformatics

Lecture 3: Markov chains.

2017 Summer Break Assignment for Students Entering Geometry

SEQUENCE ALIGNMENT BACKGROUND: BIOINFORMATICS. Prokaryotes and Eukaryotes. DNA and RNA

(

NUMB3RS Activity: Where s the Source? Episode: Undercurrents

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject)

STATC141 Spring 2005 The materials are from Pairwise Sequence Alignment by Robert Giegerich and David Wheeler

Confidence intervals

Computational Biology

MATH STUDENT BOOK. 6th Grade Unit 9

DEVELOPING MATH INTUITION

Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment

The PROMYS Math Circle Problem of the Week #3 February 3, 2017

1 Modular Arithmetic Grade Level/Prerequisites: Time: Materials: Preparation: Objectives: Navajo Nation Math Circle Connection

Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment

PolarSync Quick Start

Pittsfield High School Summer Assignment Contract Intensive 9 / Honors 9 / Honors Physics

Lecture 1, 31/10/2001: Introduction to sequence alignment. The Needleman-Wunsch algorithm for global sequence alignment: description and properties

Gravity: How fast do objects fall? Teacher Advanced Version (Grade Level: 8 12)

Collisions and Momentum in 1D Teacher s Guide

LASER TEAM ANTI-BULLYING PROGRAM STUDENT WORKSHEET MASTERS AND ANSWER KEYS

Alignment Algorithms. Alignment Algorithms

Bio nformatics. Lecture 3. Saad Mneimneh

BSCS Science: An Inquiry Approach Level 3

How Do We Know Where an Earthquake Originated? Teacher's Guide

SUMMER PACKET. Accelerated Algebra Name Date. Teacher Semester

Multiplying Inequalities by Negative Numbers

Tree Building Activity

Section 29: What s an Inverse?

NUMB3RS Activity: Traveling on Good Circles Episode: Spree, Part I

TEACHER NOTES SCIENCE NSPIRED

TIPS TO PREPARE FOR THE BIOLOGY 2 nd SEMESTER FINAL EXAM:

MPM1D - Practice Mastery Test #6

Polynomials; Add/Subtract

Lecture 2: Pairwise Alignment. CG Ron Shamir

AP STATISTICS: Summer Math Packet

Edexcel AS and A Level Mathematics Year 1/AS - Pure Mathematics

Lecture 5,6 Local sequence alignment

Mathematics 1104B. Systems of Equations and Inequalities, and Matrices. Study Guide. Text: Mathematics 11. Alexander and Kelly; Addison-Wesley, 1998.

CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I)

Practical considerations of working with sequencing data

Evolutionary Models. Evolutionary Models

Student Instruction Sheet: Unit 2, Lesson 2. Equations of Lines, Part 2

CONCEPT OF SEQUENCE COMPARISON. Natapol Pornputtapong 18 January 2018

Quinsigamond Community College School of Math and Science

Expressions and Equations 6.EE.9

Clues to Earth s Past

Serena: I don t think that works because if n is 20 and you do 6 less than that you get 20 6 = 14. I think we should write! 6 > 4

Biochemistry 324 Bioinformatics. Pairwise sequence alignment

Similarity or Identity? When are molecules similar?

Partial Fraction Decomposition

Mathematics Revision Guide. Algebra. Grade C B

USING BLAST TO IDENTIFY PROTEINS THAT ARE EVOLUTIONARILY RELATED ACROSS SPECIES

3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT

M.A.P. Matrix Algebra Procedures. by Mary Donovan, Adrienne Copeland, & Patrick Curry

17.1 Binary Codes Normal numbers we use are in base 10, which are called decimal numbers. Each digit can be 10 possible numbers: 0, 1, 2, 9.

Chapter 19: Taxonomy, Systematics, and Phylogeny

MegAlign Pro Pairwise Alignment Tutorials

Grade 6 Math Circles October 9 & Visual Vectors

Lecture 5: September Time Complexity Analysis of Local Alignment

When they compared their results, they had an interesting discussion:

Student pages-1. Biological Evolution and Classification Scientific Evidence of Common Ancestry. Purpose. Before You Begin.

Math 9 Practice Final Exam #2

Math 138: Introduction to solving systems of equations with matrices. The Concept of Balance for Systems of Equations

Computational Biology Lecture 5: Time speedup, General gap penalty function Saad Mneimneh

Introduction to sequence alignment. Local alignment the Smith-Waterman algorithm

Homework 9: Protein Folding & Simulated Annealing : Programming for Scientists Due: Thursday, April 14, 2016 at 11:59 PM

Elementary Algebra - Problem Drill 01: Introduction to Elementary Algebra

LECSS Physics 11 Introduction to Physics and Math Methods 1 Revised 8 September 2013 Don Bloomfield

The Math Learning Center PO Box 12929, Salem, Oregon Math Learning Center

GRADE 6 Projections Masters

Exploring Graphs of Polynomial Functions

Lecture 4: September 19

Transcription:

Teacher Page 1 NUMB3RS Activity: DNA Sequence Alignment Topic: Biomathematics DNA sequence alignment Grade Level: 10-12 Objective: Use mathematics to compare two strings of DNA Prerequisite: very basic knowledge about DNA Time: 20-30 minutes Introduction Charlie Eppes learns that the FBI lifted a number of DNA samples from a victim s apartment. He wants to try to determine something about the suspect s ancestry based on the samples. Although examining the DNA would not enable the FBI to identify a specific person, it is possible to identify certain traits like birth defects, red hair, or freckles. A company called DNA Print has created a statistical index that can take information from a DNA sequence and use it to more closely identify its owner. Discuss with Students Sequence alignment is particularly useful in tracing the evolution of sequences from a common ancestor, using either protein sequences or actual DNA sequences. It also is used for comparing sequences of successive generations of the same species, or two species believed to be very closely related through evolution. Changes to the code naturally occur from one generation to the other. A fundamental tool for comparing two closely related sequences of DNA is called global alignment. This is one way of determining if two sequences share a common ancestor. DNA is made up of pairs of nucleotides (A, C, G, T, where A pairs with T, and C pairs with G). The nucleotides of the two sequences are compared one at a time to see how well they align. Example: Compare CAGCT and ATCT We write these strings so that the letters line up in some way, and so that as many letters as possible match. Because these strings are different lengths we ll need to use a gap symbol like -. Here are some sample results: CAGCT CAGCT CAGCT CAGCT -CAGCT CA-GCT -ATCT A-TCT AT-CT ATC-T ATCT-- -AT-CT Notice that gaps can be put in one or both strings to improve the alignment. In each step of the alignment, there are three choices: match a letter from the first string with a gap (1st and 6th example), match a letter from each string (2nd, 3rd, and 4th example), or match a gap from the first string with a letter from the second string (5th example). Because the same three choices exist until all of the letters of the shorter string have been matched with something, there are at least 3 k possible matchings, where k is the length of the shorter string. So, which alignment is considered the best? The purpose of this activity is for students to understand the use of a mathematical algorithm that determines the best alignment for a pair of DNA sequences. It makes use of a scoring system for matches and mismatches, and is a variation of Dijkstra s algorithm for finding the shortest path through a weighted graph (see Extensions). It may

Teacher Page 2 be necessary for you to work through and discuss with students the first part of the student page to help students understand the meaning of the entries in the table and how the scoring system works. Student Page Answers: 1. (3 10 9 )(0.001) = 3,000,000 2. 4 3,000,000 10 602060. (This could be a fun one to try to write in scientific notation.) 3. There is no need to. Both gaps could be taken out and the alignment is still the same. 4. 3 3, or 27; 3 k. 5. ( 2) + ( 1) + (+2) + (+2) = +1; ( 1) + ( 2) + (+2) + (+2) = +1; ( 1) + ( 1) + ( 2) + (+2) = 2; ( 1) + ( 1) + ( 1) + ( 2) = 5; ( 1) + ( 2) + (+2) + ( 2) + ( 2) = 5. 6. - A T G C T -2-1 0-2 -4 G -4-3 -2 2 0 A -6-2 -4 0 1 C -8-4 -3-2 2 The best alignment is: ATG-C -TGAC

Student Page 1 Name: Date: NUMB3RS Activity: DNA Sequence Alignment Charlie Eppes learns that the FBI lifted a number of DNA samples from a victim s apartment. He wants to try to determine something about the suspect s ancestry based on the samples. Although examining the DNA would not enable the FBI to identify a specific person, it is possible to identify certain traits like birth defects, red hair, or freckles. A company called DNA Print has created a statistical index that can take information from a DNA sequence and use it to more closely identify its owner. In this activity you will learn how math is used to perform global alignment on a pair of DNA sequences. This is one way of determining if two sequences share a common ancestor. Global alignment is a fundamental tool for comparing two closely related sequences of DNA. Each strand of DNA is made up of pairs of nucleotides (A, C, G, T, where A pairs with T, and C pairs with G), aligned in a double helix. The nucleotides of two sequences are compared one at a time to see how well they align. We write these strings so that the letters line up in some way, with the underlying goal to match up as many letters as possible (without rearranging them, of course). Human DNA contains approximately three billion (3 10 9 ) base pairs of nucleotides. 1. If all human DNA is 99.9% identical, how many base pairs are left to make different DNA sequences (meaning different humans)? 2. Because bases always pair up the same way (A with T, G with C) it is only necessary to consider one side of the sequence (or one side of the double helix). If each base can be any of the four letters, and they are arranged in a sequence of 3 10 6 letters long, how many different sequences are mathematically possible? Consider the following two short sample sequences that we want to align: ACTG and TTG. Since they are different lengths, we need to assign a gap symbol - to assist in the alignment. (The gap symbol can be used in either or both strings.) Some possible alignments are: String 1: ACTG ACTG ACTG ACTG -ACTG String 2: -TTG T-TG TT-G TTG T-T-G (etc.) When aligning the bases, we always have three choices for each step: align a letter from String 1 with a gap in String 2, align a letter from each string, or align a gap in String 1 with a letter in String 2. (Picture a tree diagram to go with this idea.) 3. Why would we not align a gap in String 1 with a gap in String 2? 4. Because the shorter string is three letters long, what is the least possible number of different matchings? If the shorter string has a length of k, how many would this be? 5. Which alignment is the best one? Suppose we use the following scoring system: Add 2 for each alignment of matching letters Subtract 1 for each alignment of mismatched letters Subtract 2 for each alignment of a letter with a gap. What is the total score for each of the alignments above?

Student Page 2 The scoring system used in this activity is actually arbitrary, since we don t know the actual score that nature uses. This particular scoring system is useful for the sake of these examples, and is typical. The example below uses a matrix to do the alignment. String 1 is on top and String 2 is down the left. Note that both strings start with a gap in the matrix, just to get things going. Use the same scoring as before. The goal is to generate a path from the upper left to the lower right. The connections are made and scored in the following way: Horizontal connection a gap is inserted in string #2 (subtract 2) Vertical connection a gap is inserted in string #1 (subtract 2) Diagonal connection a base is paired with a base (add 2 if they match, subtract 1 if they are different) For most of the boxes, there are three ways to get to that box vertically, diagonally, and horizontally. We only mark the connection that produces the highest total score. For example: T -2 T -2-1 T -4 T -4-3 G -6 G -6-5 The connections made and scores shown from inserting gaps. (Subtract 2 each time a gap is used.) Remember to mark only the best connection. The box with a score of 1 means that aligning A with T produces a higher score than the other possible choices for the first letter. (That is, because 0 1 = 1, and 2 2 = 4, mark the connection only diagonally.) T -2-1 -3 T -2-1 -3-2 T -4-3 -2 T -4-3 -2-1 G -6-5 -4 G -6-5 -4-3 For the 3 entry in the C column, note that if there is a tie for best choice, all are marked (here it is marked both diagonally and horizontally, because 2 1 = 3 (align C with T) and 1 2 = 3 (align C with a gap). Matches are +2, so matching T with T diagonally increases the total score by 2 ( 4 + 2 = 2) as shown in the T column (rather than align T with a gap horizontally or a gap with T vertically). In the 3rd row, the highest possible total is 3 + 2 = 1 (align T with T).

Student Page 3 T -2-1 -3-2 -4 T -4-3 -2-1 -3 G -6-5 -4-3 1 The total score for this alignment is 1 (in the bottom right box). To determine the alignment(s), start at the end and trace all paths that lead back to the beginning. Since this example has two different paths, this means that the optimal alignments are: ACTG -and- ACTG -TTG T-TG 6. Use the method described above to globally align strings ATGC and TGAC in the space provided below. Trace backwards to show the alignment (this one only has one). - A T G C T -2 G -4 A -6 C -8

The goal of this activity is to give your students a short and simple snapshot into a very extensive mathematical topic. TI and NCTM encourage you and your students to learn more about this topic using the extensions provided below and through your own independent research. Extensions Introduction Global alignment is mainly used for aligning sequences that are closely related. Due to many factors, evolution can cause chunks of a sequence to relocate elsewhere in the string, and even be reversed. These relationships cannot be found using global alignment. Other algorithms are used to perform local alignment, in which shorter sequences are identified and aligned within longer strings. This is one way of determining if two sequences share a common ancestor. Scientists associate probabilities with various differences between two strings. This in turn gives them much information about how the two strings are related. Much research is being done to develop and understand mathematical models related to understanding DNA. Additional Resources The algorithm used in this activity is called Needleman-Wunsch global alignment. For information about local alignment algorithms, such as Smith-Waterman or Framesearch, as well as a lot more information about the biology related to the mathematics, visit: http://en.wikipedia.org/wiki/sequence_alignment The algorithm used for aligning strings is based on Dijkstra s algorithm, used to determine the shortest path between vertices in a graph (like two locations on a map). Dijkstra s algorithm is at the heart of every Internet service that computes driving directions. For more about this algorithm and a demonstration of how it works, visit: http://www.cs.sunysb.edu/~skiena/combinatorica/animations/dijkstra.html The Center for Discrete Mathematics and Theoretical Computer Science (DIMACS) is teaching mathematics and biology teachers about the interface between biology and mathematics (called Biomathematics) and exploring ways of how to teach it. One of the first teaching tools developed is a Java applet developed in 2005 by high school teacher Jim Kupetz that demonstrates global (and local) string alignment. It is available for download at: http://dimacs.rutgers.edu/dci/2005/education.html Just click the Global Alignment Program link.