Reconstructing the History of Large-scale Genomic Changes. Jian Ma

1 Reconstructing the History of Large-scale Genomic Changes Jian Ma

2 The Human Genome: the blueprint of our body
Initial sequencing and analysis of the human genome. International Human Genome Sequencing Consortium. Nature 2001.
"The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence."
Finishing the euchromatic sequence of the human genome. International Human Genome Sequencing Consortium. Nature 2004.
Phase 2: Interpret the genetic code, i.e., how does the human genome work?
[Slide background: a long block of raw DNA sequence (GTCGCGTTCCTGAAACGCAGATGTGCC...).]

3 Sequence conservation implies function
[Figure: genome browser view of DLX1 on human chr2 (q31.1). Tracks: UCSC Genes (based on RefSeq, UniProt, GenBank, CCDS and Comparative Genomics) and Vertebrate Multiz Alignment & PhastCons Conservation (28 Species). The base-level view shows the amino acids K P R T I Y S S L Q L Q A L N above closely matching aligned nucleotide rows for human, chimp, rhesus, bushbaby, tree shrew, mouse, rat, guinea pig, shrew, hedgehog, dog, cat, horse, cow, armadillo, elephant, tenrec, opossum, platypus, lizard, chicken, zebrafish, Tetraodon, fugu, stickleback and medaka.]

4 Chromosomal differences between human and mouse
[Figure: two panels labeled Human and Mouse.]

5 Mammalian evolution
[Figure: mammalian phylogenetic tree.
Species: Human, Chimpanzee, Orangutan, Macaque, Baboon, Colobus monkey, Owl monkey, Marmoset, Dusky titi, Mouse lemur, Galago, Mouse, Rat, Rabbit, Cow, Bat, Shrew, Dog, Hedgehog, Armadillo, Elephant, Tenrec, Monodelphis, Platypus.
Labeled ancestors: Hominini, Hominidae, Catarrhini, Primate, Euarchontoglires, Boreoeutherian, Eutherian, Mammalian.
Labeled clades: Primates, Glires, Laurasiatheria, Xenarthra, Afrotheria.]

6 Reconstruction provides additional dimension for Comparative Genomics
[Figure: browser view with a TransMap RefSeq Genes track (NM_) and reconstructed ancestral sequences, top to bottom:]
boreoeutherian  GCCGTGGGCTGGGTCATCTTTGC  A V G W V I F A
euarc           GCCGTGGGCTGGGTCATCTTTGC  A V G W V I F A
primate         GCTGTGGGCTGGGTCCTCTTTGC  A V G W V L F A
ape             GCTGTGGGCTGAGTCCTCTTTGC  A V G V L F A
human           GCTGTGGGCTGAGTCCTCTTTGC  A V G * V L F A
ACYL3: a gene lost in humans and chimps. Zhu et al., PLoS Comp Bio

7 Base-level ancestral reconstruction
Multiple alignments of sequences from different species often contain gaps.
human    ATCAGC------GGCGAT
chimp    ATCAGC------GGCGAT
macaque  ATCAGCCGGATCGGCGAT
mouse    ATCAGCCGGATCGGCGAT
rat      ATCAGCCGGATCGGCGAT
dog      ATCAGCCGGATCGGCGAT
cow      ATCAGCCGGATCGGCGAT

8 Base-level ancestral reconstruction
Substitutions (point mutations)
Small insertions and deletions (indels)
ARMADILLO   TGCTACTAATAT-----T-TAGTA-CATAGAG-CC-CAGGGGTGCTGCTGAAA GTCTTAAAATGCACA
MOUSELEMUR  ATCACAG-TTGGGGGATGCCACTGGCCT-----C-AAGTG-GGTAGAG-AA-CAGGGAGGCTGAAAACC ACCCTGCAGAGCACG
ORANGUTAN   GTCACGATTTGGGAGATGCTTCTGGCTC-----G-ACTTG-GGTAGAG-AAGCGGGGATGCTTATAATC ATCCAACAGTGCACA
COW         GCCTCTCTTT CTGCCCTGCAGGC-TAGAA-TGTATCA-CT-TAGATGTTCCAA ATCAGAAAGTGTTCA
HORSE       GTCACAATTTAGGAAGTGCCACTGGCCT-----C-TAGAG-GGTAGAA-GA-CAGGGATGCTAATAATCATCCCACGTCATCCTACAGTGCTCA
CAT         GTCACAGTTTAGGGGGTACTACTGGCAT-----C-TATCG-GGTGGAG-GA-TAGGGATACTGATAATC ATTCTACAGTGCACA
DOG         GTCACAATTTGGGGGATACTACTGGCAT-----C-TAATG-GGTAGAG-GA-CAGGGATACTGATAATT GCTTTACAGTGCACA
HEDGEHOG    GTCATAGTTT----GATTATATGGGCTT-----CTTAGTA-GACAAAGAAA-AAGATGTTCTGGTAGTC ATTCTGCTTTCCATA
MOUSE       GTCACAGTTTGGAGGATGTTACTGACAT-----C-TAGAG-AGTAGAC-TT-TAAAGATACTGATAGTC ACCCCATTGTGCAC-
RAT         GTCACAATTTGGAGGATGTTACTGGCAT-----C-TAGAG-AGTAGAC-TT-TAAGGACACTGATAATC ATACTATGCTGCAC-
RABBIT      ATCACAATTTGGGGAACACCACTGGCAT-----C-TCGGGTAGCAGGC----CAGGCATGCTGGTAATT ATACTACAGTGCACA
LEMUR       ATCACAA-TTGGGGG-TGCCACGGTCCT-----C-CAGTG-GGTAGAG-AA-CAGGGAGGCTGATAACC ACCCTGCAGTGCACA
VERVET      GTCAGAATTTGGGGGATGCTTCTGGCTC-----T-ACTTG-GGTAGAG-AAACAGGGATGCTTATAATC ATCCTACAGTGCACA
MACAQUE     GTCAGAATTTGGGGGATGCTTCTGGCTC-----T-ACTTG-GGTAGAG-AAACAGGAATGCTTATAATC ATCCTACAGTGCACA
BABOON      GTCAGAATTTGGGGGATGCTTCTGGCTC-----T-ACTTG-GGTAGAA-AAACAGGGATGCTTATAATC ATCCTACAGTGCACA
GORILLA     GTCACGATTTGGGGGATGCTTCTGGCTC-----A-ACTTG-GGTAGAG-AAGTGGGGATGCTTATACTC ATCCTACAGTGCACA
CHIMP       GTCACGATTTGGGGGATGCTTCTGGCTC-----A-ACTTG-GGTAGAG-AAGCGGGGATGCTTATAATC ATCCTACAGTGCACA
HUMAN       GTCACGATTTGGGGGATGCTTCTGGCTC-----A-ACTTG-GGTAGAG-AAGCGGGGATGCTTATAATC ATCCTACAGTGCACA
Blanchette et al., Genome Res

10 Base-level ancestral reconstruction
Substitutions (point mutations)
Small insertions and deletions (indels)
[Same 18-species alignment as on slide 8, followed by an unlabeled row of Ns standing in for the sequence to be reconstructed:]
NNNNNNNNNNNNNNNNNNNNNNNNNNNN-----N-NNNNN-NNNNNNN-NN-NNNNNNNNNNNNNNNNN NNNNNNNNNNNNNN
Blanchette et al., Genome Res

11 Base-level ancestral reconstruction
Substitutions (point mutations)
Small insertions and deletions (indels)
[Same 18-species alignment as on slide 8, followed by the unlabeled row of Ns and a final unlabeled row, apparently the inferred ancestral sequence:]
NNNNNNNNNNNNNNNNNNNNNNNNNNNN-----N-NNNNN-NNNNNNN-NN-NNNNNNNNNNNNNNNNN NNNNNNNNNNNNNN
GTCACAATTTGGGGGATGCTACTGGCAT-----C-TAGTG-GGTAGAG-AA-CAGGGATGCTGATAATC ATCCTACAGTGCAC
Blanchette et al., Genome Res

12 Large-scale structural genomic changes
[Figure: chromosome rearrangement diagrams overlaid with text fragments; each green or red rectangle is a chromosome, shown before and after a rearrangement. The numeric examples are not recoverable.]
Mathematical model of chromosome evolution: a chromosome is a string of numbers (a permutation), and a genome is a set of these strings. Numbers on a chromosome could be any genomic content, e.g. a single base, a gene, or a larger piece of DNA sequence. Numbers may have signs, either + or -, which indicate the relative orientation of the genomic content.
Rearrangements in this model: inversion (in the bioinformatics literature also called reversal), translocation (reciprocal, Robertsonian), fusion, fission.
It has been suggested that Robertsonian translocation also played a role in mammalian genome evolution.
Overlapping or nested operations form composite operations, for example two overlapping inversions followed by a fission.

13 Large-scale structural genomic changes
[Figure: the same rearrangement diagrams and overlaid text fragments as on slide 12, highlighting fusion and fission. The numeric examples are not recoverable.]
A chromosome is modeled as a string of signed numbers (a signed permutation) and a genome as a set of these strings. Numbers on a chromosome could be any genomic content, e.g. a single base, a gene, or a larger piece of DNA sequence; the signs indicate the relative orientation of the genomic content.
Rearrangements in this model: inversion (also called reversal), translocation, fusion, fission. Overlapping or nested operations form composite operations, for example two overlapping inversions followed by a fission.
Synteny blocks: identifying the genomic content that signed permutations can represent.
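The signed-permutation model on these slides can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the function names, the list-of-lists genome representation, and the example chromosome 1..7 are hypothetical and do not come from the slides, whose own numeric examples did not survive transcription.

```python
from typing import List

Chromosome = List[int]      # signed block ids, e.g. [1, -3, 2] (assumed encoding)
Genome = List[Chromosome]   # a genome is a collection of chromosomes


def inversion(chrom: Chromosome, i: int, j: int) -> Chromosome:
    """Reverse the segment chrom[i:j] and flip the sign of every block in it."""
    return chrom[:i] + [-b for b in reversed(chrom[i:j])] + chrom[j:]


def fission(chrom: Chromosome, k: int) -> Genome:
    """Split one chromosome into two at position k."""
    return [chrom[:k], chrom[k:]]


def fusion(left: Chromosome, right: Chromosome) -> Chromosome:
    """Join two chromosomes end to end."""
    return left + right


def translocation(a: Chromosome, b: Chromosome, i: int, j: int) -> Genome:
    """Reciprocal translocation: exchange the tails a[i:] and b[j:]."""
    return [a[:i] + b[j:], b[:j] + a[i:]]


if __name__ == "__main__":
    chrom = [1, 2, 3, 4, 5, 6, 7]        # hypothetical example chromosome
    step1 = inversion(chrom, 1, 4)       # invert blocks 2..4
    step2 = inversion(step1, 3, 6)       # a second inversion overlapping the first
    print(step2)
    print(fission(step2, 4))             # composite: two inversions, then a fission
```

Returning new lists rather than mutating in place keeps each intermediate genome available, which is convenient when composing operations as in the composite example.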

14 Partition the genomes into synteny blocks
[Figure: (A) browser view of human chr13 (q12.13-q13.3) with six-level Alignment Net tracks for Dog (May 2005/canFam2), Rat (Nov. 2004/rn4) and Mouse (July 2007/mm9); (B) panel labeled human, dog, rat and mouse.]

15 Duplications and other structural changes
Transposition
Duplication: (A) tandem duplication; (B) segmental duplication

16 Operation-based ancestral reconstruction
The parsimony problem: given a set of present-day genomes, find the evolutionary history for them with the minimum number of operations.
Pairwise case: Hannenhalli-Pevzner theory for inversions. Example (the permutations A and B shown on the slide): $d(A, B) = 7$.
Median problem: given genomes A, B and C, find a genome M that minimizes $d(M, A) + d(M, B) + d(M, C)$.
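To make the pairwise distance and the median objective concrete, here is a minimal Python sketch. It is an assumption-laden illustration, not the Hannenhalli-Pevzner algorithm: it finds the inversion distance by exhaustive breadth-first search, which is only practical for very small single-chromosome genomes, and the function names and example permutations are hypothetical.

```python
from collections import deque
from typing import Iterator, Sequence, Tuple

Perm = Tuple[int, ...]   # one chromosome as a signed permutation (assumed encoding)


def neighbors(p: Perm) -> Iterator[Perm]:
    """Yield every signed permutation reachable from p by one inversion."""
    n = len(p)
    for i in range(n):
        for j in range(i + 1, n + 1):
            yield p[:i] + tuple(-b for b in reversed(p[i:j])) + p[j:]


def inversion_distance(a: Perm, b: Perm) -> int:
    """Minimum number of inversions turning a into b, by exhaustive BFS."""
    if sorted(map(abs, a)) != sorted(map(abs, b)):
        raise ValueError("both genomes must contain the same blocks")
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        p, d = queue.popleft()
        if p == b:
            return d
        for q in neighbors(p):
            if q not in seen:
                seen.add(q)
                queue.append((q, d + 1))
    raise AssertionError("inversions connect all signed permutations")


def median_score(m: Perm, genomes: Sequence[Perm]) -> int:
    """Objective of the median problem: sum of distances from m to each genome."""
    return sum(inversion_distance(m, g) for g in genomes)


if __name__ == "__main__":
    print(inversion_distance((1, 2, 3, 4), (1, -3, -2, 4)))
    print(median_score((1, 2, 3), [(1, 2, 3), (-1, 2, 3), (1, 2, -3)]))
```

The search space has n! * 2^n states, so this brute-force approach stops being usable beyond a handful of blocks; that limitation is the reason polynomial-time theory for the pairwise case matters.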

17 Adjacency-based ancestral reconstruction
[Figure: browser view of human chr13 (q21.1) with six-level Alignment Net tracks for Opossum (Jan. 2006/monDom4), Dog (May 2005/canFam2), Rat (Nov. 2004/rn4), Mouse (July 2007/mm9) and Rhesus (Jan. 2006/rheMac2); panel (B) lists the block orders for human, opossum, dog, rat, mouse and rhesus.]
How to determine the ancestral orders?

18 Adjacency-based ancestral reconstruction
[Figure: diagram with the species labels opossum, dog, mouse, rat, rhesus and human and the block labels 1, -2, 3.]

20 Fitch's algorithm
Goal: to infer the minimum number of character changes in a specified tree topology.
[Figure: example tree with leaves human {A}, chimp {G}, mouse {T} and dog {A}, and internal candidate sets {A G}, {A G T} and {A}.]
The algorithm works sequentially, in two stages. For each position, in a bottom-up fashion, it first determines a set $M_\pi$ of candidate nucleotides at each node $\pi$ in the tree according to the following rule: if $\pi$ is a leaf, $M_\pi$ contains just its nucleotide character; otherwise, if $\pi$ has children $\tau$ and $\phi$, then $M_\pi$ equals $M_\tau \cap M_\phi$ if the two sets share a character, and $M_\tau \cup M_\phi$ if they are disjoint. That is,
if $|M_\tau \cap M_\phi| \neq 0$ then $M_\pi \leftarrow M_\tau \cap M_\phi$ else $M_\pi \leftarrow M_\tau \cup M_\phi$,
where $|X|$ denotes the number of items in the set $X$.
Then, in a top-down fashion, it assigns a character $b_\pi$ from $M_\pi$ to $\pi$ according to the following rule: let $\rho$ be the parent of $\pi$; if the character $b_\rho$ assigned to $\rho$ belongs to $M_\pi$, then $b_\pi = b_\rho$. Otherwise, set $b_\pi$ to be any character in $M_\pi$. Although the character assignment in this second stage may not be unique, any assignment gives an evolutionary history with the minimum number of substitution events.
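The two stages described above can be sketched in Python as follows. The `Node` class and function names are hypothetical, and the tree topology in the demo, (((human, chimp), mouse), dog), is an assumption inferred from the candidate sets on the slide rather than something the transcript states.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Node:
    name: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    state: Optional[str] = None                  # observed (leaf) or assigned character
    candidates: Set[str] = field(default_factory=set)


def bottom_up(node: Node) -> int:
    """Stage 1: fill node.candidates; return the number of substitutions implied."""
    if node.left is None or node.right is None:  # leaf: its own character only
        node.candidates = {node.state}
        return 0
    cost = bottom_up(node.left) + bottom_up(node.right)
    common = node.left.candidates & node.right.candidates
    if common:
        node.candidates = common
    else:                                        # disjoint sets: take the union
        node.candidates = node.left.candidates | node.right.candidates
        cost += 1
    return cost


def top_down(node: Node, parent_state: Optional[str] = None) -> None:
    """Stage 2: keep the parent's character when allowed, else pick any candidate."""
    if parent_state in node.candidates:
        node.state = parent_state
    else:
        node.state = min(node.candidates)        # arbitrary but deterministic choice
    for child in (node.left, node.right):
        if child is not None:
            top_down(child, node.state)


if __name__ == "__main__":
    hc = Node("human-chimp", Node("human", state="A"), Node("chimp", state="G"))
    hcm = Node("hc-mouse", hc, Node("mouse", state="T"))
    root = Node("root", hcm, Node("dog", state="A"))
    print(bottom_up(root), root.candidates)
    top_down(root)
    print(hc.state, hcm.state, root.state)
```

Using `min` in the second stage is only a way to make the arbitrary choice reproducible; any member of the candidate set would give the same minimum cost.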

21 Generalize the algorithm to track all the synteny blocks
In our case, we deal with sequences of signed integers rather than nucleotide or amino-acid characters. Instead of keeping track of the letter at a particular sequence position, we track, for each synteny block, the blocks at its immediately adjacent positions. Following the same logic, for a given ancestor we can infer the most parsimonious neighbors of each of its synteny blocks. Finally, we connect the synteny blocks in the ancestor into chromosomes based on these possible neighboring relationships.

22 The algorithm
For a genome $g$ and a signed block $i$, the predecessor $p_g(i)$ is defined as the signed block that immediately precedes $i$; the successor $s_g(i)$ of $i$ is defined analogously. [The example chromosome given on the slide is not recoverable.]
Thus, for any genome $g$, we associate with each block $i$ two sets of signed blocks, denoted $P_g(i)$ and $S_g(i)$, giving the potential predecessors and successors of $i$ relative to the chromosomes of $g$. If $g$ is a modern genome, $P_g(i) = \{p_g(i)\}$ and $S_g(i) = \{s_g(i)\}$ for each $i$. If $g$ does not contain $i$, then both sets are empty.
GET-PREDECESSOR-SUCCESSOR($\pi$)
  if $\pi$ is a non-leaf node (with children $\tau$ and $\phi$)
    then GET-PREDECESSOR-SUCCESSOR($\tau$)
         GET-PREDECESSOR-SUCCESSOR($\phi$)
         for $i \leftarrow -N$ to $N$ ($i \neq 0$)
           do if $|P_\tau(i) \cap P_\phi(i)| \neq 0$
                then $P_\pi(i) \leftarrow P_\tau(i) \cap P_\phi(i)$
                else $P_\pi(i) \leftarrow P_\tau(i) \cup P_\phi(i)$
              if $|S_\tau(i) \cap S_\phi(i)| \neq 0$
                then $S_\pi(i) \leftarrow S_\tau(i) \cap S_\phi(i)$
                else $S_\pi(i) \leftarrow S_\tau(i) \cup S_\phi(i)$
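A runnable Python sketch of the bottom-up pass above is given here, under several assumptions that are not in the slides: the tree is a nested tuple of species names, the value 0 marks a chromosome end, the neighbors of block -i are obtained by reading the chromosome on the opposite strand, and the three toy genomes are hypothetical.

```python
from typing import Dict, List, Set, Tuple, Union

Chromosome = List[int]
Genome = List[Chromosome]
Adj = Dict[int, Set[int]]                  # signed block -> candidate neighbors
Tree = Union[str, Tuple["Tree", "Tree"]]   # leaf name or (left, right) pair
END = 0                                    # assumption: 0 marks a chromosome end


def leaf_sets(genome: Genome) -> Tuple[Adj, Adj]:
    """Singleton predecessor and successor sets for a modern genome."""
    pred: Adj = {}
    succ: Adj = {}
    for chrom in genome:
        padded = [END] + chrom + [END]
        for k in range(1, len(padded) - 1):
            before, block, after = padded[k - 1], padded[k], padded[k + 1]
            pred[block], succ[block] = {before}, {after}
            # Reading the opposite strand swaps and negates the neighbors.
            pred[-block], succ[-block] = {-after}, {-before}
    return pred, succ


def merge(a: Adj, b: Adj) -> Adj:
    """Fitch-style rule per block: intersection if non-empty, otherwise union."""
    out: Adj = {}
    for i in set(a) | set(b):
        x, y = a.get(i, set()), b.get(i, set())   # absent block -> empty set
        out[i] = (x & y) or (x | y)
    return out


def get_predecessor_successor(node: Tree, genomes: Dict[str, Genome]) -> Tuple[Adj, Adj]:
    """Bottom-up pass returning (P, S) for the ancestor at this node."""
    if isinstance(node, str):
        return leaf_sets(genomes[node])
    p_left, s_left = get_predecessor_successor(node[0], genomes)
    p_right, s_right = get_predecessor_successor(node[1], genomes)
    return merge(p_left, p_right), merge(s_left, s_right)


if __name__ == "__main__":
    genomes = {"human": [[1, 2, 3]], "mouse": [[1, -2, 3]], "dog": [[1, 2, 3]]}
    P, S = get_predecessor_successor((("human", "mouse"), "dog"), genomes)
    print(S[1], P[2])
```

This covers only the candidate-set stage; choosing among candidates and linking blocks into ancestral chromosomes, as described on the previous slide, would need a further step that is not sketched here.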

23 Reconstructing the Boreoeutherian common ancestor
[Figure: the reconstructed contiguous ancestral regions (CARs), numbered CAR 1 upward, drawn alongside chromosome-arm labels such as 8p, 21q, 15q, 14q, X, 22q, 8q, 2q, 19q and 13q; the individual pairings are not recoverable.]
Ma et al., Genome Res

24 Structural genomic variation between human individuals
Variation | Rearrangement type | Size range
Single base-pair changes | Single nucleotide polymorphisms, point mutations | 1 bp
Small insertions/deletions | Binary insertion/deletion events of short sequences (majority <10 bp in size) | 1 to 50 bp
Short tandem repeats | Microsatellites and other simple repeats | [value not recovered] bp
Fine-scale structural variation | Deletions, duplications, tandem repeats, inversions | 50 bp to 5 kb
Retroelement insertions | SINEs, LINEs, LTRs, ERVs | 300 bp to 10 kb
Intermediate-scale structural variation | Deletions, duplications, tandem repeats, inversions | 5 kb to 50 kb
Large-scale structural variation | Deletions, duplications, large tandem repeats, inversions | 50 kb to 5 Mb
Chromosomal variation | Euchromatic variants, large cytogenetically visible deletions, duplications, translocations, inversions, and aneuploidy | 5 Mb to entire chromosomes
Sharp et al. Annu. Rev. Genomics Hum. Genet

25 Comparing Venter's genome with the reference genome from the public HGP
[Figure: panels (a) to (f) comparing Assembly 1 with Assembly 2, with the categories Matched, Mismatch, Unmatched, Copy-unmatched, Inversion and Gap; a chromosome overview through X and Y in which green bars mark unmatched regions and red bars mark copy-unmatched regions.]
Khaja et al. Nature Genetics

26 Robertsonian translocation can cause Down syndrome
Down syndrome is a chromosomal disorder that causes physical and intellectual developmental delays. It occurs when there are three copies of chromosome 21, resulting in 47 total chromosomes instead of the normal

27 Chromosomal aberrations in cancer
Cancer is another group of genetic diseases associated with a massive amount of structural genomic change.
[Background: excerpt from an article on cancer genome sequencing.] "Roughly 75 cancer genomes have been sequenced to some extent and published; researchers expect to have several hundred completed sequences by the end of the year. The efforts are certainly creating bigger haystacks. Comparing the gene sequence of any tumour to that of a normal cell reveals dozens of single-letter changes, or point mutations, along with repeated, deleted, swapped or inverted sequences (see 'Genomes at a glance'). [...] most tumours seem to differ genetically. This stymies efforts to distinguish the mutations that cause and accelerate cancers (the drivers) from the accidental by-products of a cancer's growth and thwarted DNA-repair mechanisms (the passengers). Researchers can look for mutations that pop up again and again, or they can identify key pathways that are mutated at different points. But the projects are providing more questions than answers."
[Figure 14: Fusion genes in cancer genomes. (A) CACNA2D4-WDR43 fusion gene identified in the NCI-H2171 lung cancer cell line. The 5' portion of the CACNA2D4 gene is amplified. A rearrangement breaks the gene in exon 36, fusing it into intron 3 of WDR43. Also shown: the ETV6-ITPR2 fusion gene in the breast cancer PD3668a (Stephens et al. Nature 2009).]

28 Summary
Structural genomic changes can happen:
1) between different species
2) between different individuals in the same population
3) in disease genomes
As genomic data grow exponentially, ancestral genome reconstruction is an elegant way to organize a large number of related species, creating a vertical map so that we can navigate the genomes and trace their history from past to present.


More information

17 Non-collinear alignment Motivation A B C A B C A B C A B C D A C. This exposition is based on:

17 Non-collinear alignment Motivation A B C A B C A B C A B C D A C. This exposition is based on: 17 Non-collinear alignment This exposition is based on: 1. Darling, A.E., Mau, B., Perna, N.T. (2010) progressivemauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS One 5(6):e11147.

More information

Analysis of Genome Evolution and Function, University of Toronto, Toronto, ON M5R 3G4 Canada

Analysis of Genome Evolution and Function, University of Toronto, Toronto, ON M5R 3G4 Canada Multiple Whole Genome Alignments Without a Reference Organism Inna Dubchak 1,2, Alexander Poliakov 1, Andrey Kislyuk 3, Michael Brudno 4* 1 Genome Sciences Division, Lawrence Berkeley National Laboratory,

More information

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 2009 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues

More information

Phylogenetic Tree Reconstruction

Phylogenetic Tree Reconstruction I519 Introduction to Bioinformatics, 2011 Phylogenetic Tree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Evolution theory Speciation Evolution of new organisms is driven

More information

Reading for Lecture 13 Release v10

Reading for Lecture 13 Release v10 Reading for Lecture 13 Release v10 Christopher Lee November 15, 2011 Contents 1 Evolutionary Trees i 1.1 Evolution as a Markov Process...................................... ii 1.2 Rooted vs. Unrooted Trees........................................

More information

I519 Introduction to Bioinformatics, Genome Comparison. Yuzhen Ye School of Informatics & Computing, IUB

I519 Introduction to Bioinformatics, Genome Comparison. Yuzhen Ye School of Informatics & Computing, IUB I519 Introduction to Bioinformatics, 2011 Genome Comparison Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Whole genome comparison/alignment Build better phylogenies Identify polymorphism

More information

BMI/CS 776 Lecture #20 Alignment of whole genomes. Colin Dewey (with slides adapted from those by Mark Craven)

BMI/CS 776 Lecture #20 Alignment of whole genomes. Colin Dewey (with slides adapted from those by Mark Craven) BMI/CS 776 Lecture #20 Alignment of whole genomes Colin Dewey (with slides adapted from those by Mark Craven) 2007.03.29 1 Multiple whole genome alignment Input set of whole genome sequences genomes diverged

More information

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 200 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues

More information

Genomes Comparision via de Bruijn graphs

Genomes Comparision via de Bruijn graphs Genomes Comparision via de Bruijn graphs Student: Ilya Minkin Advisor: Son Pham St. Petersburg Academic University June 4, 2012 1 / 19 Synteny Blocks: Algorithmic challenge Suppose that we are given two

More information

A Phylogenetic Network Construction due to Constrained Recombination

A Phylogenetic Network Construction due to Constrained Recombination A Phylogenetic Network Construction due to Constrained Recombination Mohd. Abdul Hai Zahid Research Scholar Research Supervisors: Dr. R.C. Joshi Dr. Ankush Mittal Department of Electronics and Computer

More information

Plan: Evolutionary trees, characters. Perfect phylogeny Methods: NJ, parsimony, max likelihood, Quartet method

Plan: Evolutionary trees, characters. Perfect phylogeny Methods: NJ, parsimony, max likelihood, Quartet method Phylogeny 1 Plan: Phylogeny is an important subject. We have 2.5 hours. So I will teach all the concepts via one example of a chain letter evolution. The concepts we will discuss include: Evolutionary

More information

Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center

Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center Phylogenetic Analysis Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center Outline Basic Concepts Tree Construction Methods Distance-based methods

More information

METHODS FOR DETERMINING PHYLOGENY. In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task.

METHODS FOR DETERMINING PHYLOGENY. In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task. Chapter 12 (Strikberger) Molecular Phylogenies and Evolution METHODS FOR DETERMINING PHYLOGENY In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task. Modern

More information

- mutations can occur at different levels from single nucleotide positions in DNA to entire genomes.

- mutations can occur at different levels from single nucleotide positions in DNA to entire genomes. February 8, 2005 Bio 107/207 Winter 2005 Lecture 11 Mutation and transposable elements - the term mutation has an interesting history. - as far back as the 17th century, it was used to describe any drastic

More information

Major Gene Families in Humans and Their Evolutionary History Prof. Yoshihito Niimura Prof. Masatoshi Nei

Major Gene Families in Humans and Their Evolutionary History Prof. Yoshihito Niimura Prof. Masatoshi Nei Major Gene Families in Humans Yoshihito Niimura Tokyo Medical and Dental University and Masatoshi Nei Pennsylvania State University 1 1. Multigene family Contents 2. Olfactory receptors (ORs) 3. OR genes

More information

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot

More information

28-Way vertebrate alignment and conservation track in the UCSC Genome Browser

28-Way vertebrate alignment and conservation track in the UCSC Genome Browser Resource 28-Way vertebrate alignment and conservation track in the UCSC Genome Browser Webb Miller, 1,11 Kate Rosenbloom, 2 Ross C. Hardison, 1 Minmei Hou, 1 James Taylor, 3 Brian Raney, 2 Richard Burhans,

More information

Evolutionary Tree Analysis. Overview

Evolutionary Tree Analysis. Overview CSI/BINF 5330 Evolutionary Tree Analysis Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Backgrounds Distance-Based Evolutionary Tree Reconstruction Character-Based

More information

Investigation 3: Comparing DNA Sequences to Understand Evolutionary Relationships with BLAST

Investigation 3: Comparing DNA Sequences to Understand Evolutionary Relationships with BLAST Investigation 3: Comparing DNA Sequences to Understand Evolutionary Relationships with BLAST Introduction Bioinformatics is a powerful tool which can be used to determine evolutionary relationships and

More information

Whole Genome Alignments and Synteny Maps

Whole Genome Alignments and Synteny Maps Whole Genome Alignments and Synteny Maps IINTRODUCTION It was not until closely related organism genomes have been sequenced that people start to think about aligning genomes and chromosomes instead of

More information

Bio 1B Lecture Outline (please print and bring along) Fall, 2007

Bio 1B Lecture Outline (please print and bring along) Fall, 2007 Bio 1B Lecture Outline (please print and bring along) Fall, 2007 B.D. Mishler, Dept. of Integrative Biology 2-6810, bmishler@berkeley.edu Evolution lecture #5 -- Molecular genetics and molecular evolution

More information

AN EXACT SOLVER FOR THE DCJ MEDIAN PROBLEM

AN EXACT SOLVER FOR THE DCJ MEDIAN PROBLEM AN EXACT SOLVER FOR THE DCJ MEDIAN PROBLEM MENG ZHANG College of Computer Science and Technology, Jilin University, China Email: zhangmeng@jlueducn WILLIAM ARNDT AND JIJUN TANG Dept of Computer Science

More information

Handling Rearrangements in DNA Sequence Alignment

Handling Rearrangements in DNA Sequence Alignment Handling Rearrangements in DNA Sequence Alignment Maneesh Bhand 12/5/10 1 Introduction Sequence alignment is one of the core problems of bioinformatics, with a broad range of applications such as genome

More information

The Contribution of Bioinformatics to Evolutionary Thought

The Contribution of Bioinformatics to Evolutionary Thought The Contribution of Bioinformatics to Evolutionary Thought A demonstration of the abilities of Entrez, BLAST, and UCSC s Genome Browser to provide information about common ancestry. American Scientific

More information

Sequence motif analysis

Sequence motif analysis Sequence motif analysis Alan Moses Associate Professor and Canada Research Chair in Computational Biology Departments of Cell & Systems Biology, Computer Science, and Ecology & Evolutionary Biology Director,

More information

1 Introduction. Abstract

1 Introduction. Abstract CBS 530 Assignment No 2 SHUBHRA GUPTA shubhg@asu.edu 993755974 Review of the papers: Construction and Analysis of a Human-Chimpanzee Comparative Clone Map and Intra- and Interspecific Variation in Primate

More information

Molecular evolution - Part 1. Pawan Dhar BII

Molecular evolution - Part 1. Pawan Dhar BII Molecular evolution - Part 1 Pawan Dhar BII Theodosius Dobzhansky Nothing in biology makes sense except in the light of evolution Age of life on earth: 3.85 billion years Formation of planet: 4.5 billion

More information

NIH Public Access Author Manuscript Pac Symp Biocomput. Author manuscript; available in PMC 2009 October 6.

NIH Public Access Author Manuscript Pac Symp Biocomput. Author manuscript; available in PMC 2009 October 6. NIH Public Access Author Manuscript Published in final edited form as: Pac Symp Biocomput. 2009 ; : 162 173. SIMULTANEOUS HISTORY RECONSTRUCTION FOR COMPLEX GENE CLUSTERS IN MULTIPLE SPECIES * Yu Zhang,

More information

The combinatorics and algorithmics of genomic rearrangements have been the subject of much

The combinatorics and algorithmics of genomic rearrangements have been the subject of much JOURNAL OF COMPUTATIONAL BIOLOGY Volume 22, Number 5, 2015 # Mary Ann Liebert, Inc. Pp. 425 435 DOI: 10.1089/cmb.2014.0096 An Exact Algorithm to Compute the Double-Cutand-Join Distance for Genomes with

More information

Analysis of Gene Order Evolution beyond Single-Copy Genes

Analysis of Gene Order Evolution beyond Single-Copy Genes Analysis of Gene Order Evolution beyond Single-Copy Genes Nadia El-Mabrouk Département d Informatique et de Recherche Opérationnelle Université de Montréal mabrouk@iro.umontreal.ca David Sankoff Department

More information

TE content correlates positively with genome size

TE content correlates positively with genome size TE content correlates positively with genome size Mb 3000 Genomic DNA 2500 2000 1500 1000 TE DNA Protein-coding DNA 500 0 Feschotte & Pritham 2006 Transposable elements. Variation in gene numbers cannot

More information

Algorithms for Bioinformatics

Algorithms for Bioinformatics Adapted from slides by Alexandru Tomescu, Leena Salmela, Veli Mäkinen, Esa Pitkänen 582670 Algorithms for Bioinformatics Lecture 5: Combinatorial Algorithms and Genomic Rearrangements 1.10.2015 Background

More information

Evolutionary Models. Evolutionary Models

Evolutionary Models. Evolutionary Models Edit Operators In standard pairwise alignment, what are the allowed edit operators that transform one sequence into the other? Describe how each of these edit operations are represented on a sequence alignment

More information

Perfect Sorting by Reversals and Deletions/Insertions

Perfect Sorting by Reversals and Deletions/Insertions The Ninth International Symposium on Operations Research and Its Applications (ISORA 10) Chengdu-Jiuzhaigou, China, August 19 23, 2010 Copyright 2010 ORSC & APORC, pp. 512 518 Perfect Sorting by Reversals

More information

FUNDAMENTALS OF MOLECULAR EVOLUTION

FUNDAMENTALS OF MOLECULAR EVOLUTION FUNDAMENTALS OF MOLECULAR EVOLUTION Second Edition Dan Graur TELAVIV UNIVERSITY Wen-Hsiung Li UNIVERSITY OF CHICAGO SINAUER ASSOCIATES, INC., Publishers Sunderland, Massachusetts Contents Preface xiii

More information

Supplementary information

Supplementary information Supplementary information Superoxide dismutase 1 is positively selected in great apes to minimize protein misfolding Pouria Dasmeh 1, and Kasper P. Kepp* 2 1 Harvard University, Department of Chemistry

More information

Phylogeny and Evolution. Gina Cannarozzi ETH Zurich Institute of Computational Science

Phylogeny and Evolution. Gina Cannarozzi ETH Zurich Institute of Computational Science Phylogeny and Evolution Gina Cannarozzi ETH Zurich Institute of Computational Science History Aristotle (384-322 BC) classified animals. He found that dolphins do not belong to the fish but to the mammals.

More information

Primate Diversity & Human Evolution (Outline)

Primate Diversity & Human Evolution (Outline) Primate Diversity & Human Evolution (Outline) 1. Source of evidence for evolutionary relatedness of organisms 2. Primates features and function 3. Classification of primates and representative species

More information

Molecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016

Molecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016 Molecular phylogeny - Using molecular sequences to infer evolutionary relationships Tore Samuelsson Feb 2016 Molecular phylogeny is being used in the identification and characterization of new pathogens,

More information

CLADOGRAMS & GENETIC PHYLOGENIES

CLADOGRAMS & GENETIC PHYLOGENIES CLADOGRAMS & GENETIC PHYLOGENIES INTRODUCTION Taxonomists since Linnaeus have used relative similarities and differences to group species into a taxonomic hierarchy of genera, families, orders, etc. Darwin

More information

Understanding relationship between homologous sequences

Understanding relationship between homologous sequences Molecular Evolution Molecular Evolution How and when were genes and proteins created? How old is a gene? How can we calculate the age of a gene? How did the gene evolve to the present form? What selective

More information

8.2% of the Human Genome Is Constrained: Variation in Rates of Turnover across Functional Element Classes in the Human Lineage

8.2% of the Human Genome Is Constrained: Variation in Rates of Turnover across Functional Element Classes in the Human Lineage 8.2% of the Human Genome Is Constrained: Variation in Rates of Turnover across Functional Element Classes in the Human Lineage Chris M. Rands 1, Stephen Meader 1, Chris P. Ponting 1 *, Gerton Lunter 2

More information

Phylogenomic Resources at the UCSC Genome Browser

Phylogenomic Resources at the UCSC Genome Browser 9 Phylogenomic Resources at the UCSC Genome Browser Kate Rosenbloom, James Taylor, Stephen Schaeffer, Jim Kent, David Haussler, and Webb Miller Summary The UC Santa Cruz Genome Browser provides a number

More information

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic analysis Phylogenetic Basics: Biological

More information

Emily Blanton Phylogeny Lab Report May 2009

Emily Blanton Phylogeny Lab Report May 2009 Introduction It is suggested through scientific research that all living organisms are connected- that we all share a common ancestor and that, through time, we have all evolved from the same starting

More information

Evolution of Tandemly Arrayed Genes in Multiple Species

Evolution of Tandemly Arrayed Genes in Multiple Species Evolution of Tandemly Arrayed Genes in Multiple Species Mathieu Lajoie 1, Denis Bertrand 1, and Nadia El-Mabrouk 1 DIRO - Université de Montréal - H3C 3J7 - Canada {bertrden,lajoimat,mabrouk}@iro.umontreal.ca

More information

Evolution by duplication

Evolution by duplication 6.095/6.895 - Computational Biology: Genomes, Networks, Evolution Lecture 18 Nov 10, 2005 Evolution by duplication Somewhere, something went wrong Challenges in Computational Biology 4 Genome Assembly

More information

Phylogeny: traditional and Bayesian approaches

Phylogeny: traditional and Bayesian approaches Phylogeny: traditional and Bayesian approaches 5-Feb-2014 DEKM book Notes from Dr. B. John Holder and Lewis, Nature Reviews Genetics 4, 275-284, 2003 1 Phylogeny A graph depicting the ancestor-descendent

More information

Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata.

Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata. Supplementary Note S2 Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata. Phylogenetic trees reconstructed by a variety of methods from either single-copy orthologous loci (Class

More information

Multiple Alignment of Genomic Sequences

Multiple Alignment of Genomic Sequences Ross Metzger June 4, 2004 Biochemistry 218 Multiple Alignment of Genomic Sequences Genomic sequence is currently available from ENTREZ for more than 40 eukaryotic and 157 prokaryotic organisms. As part

More information

Dr. Amira A. AL-Hosary

Dr. Amira A. AL-Hosary Phylogenetic analysis Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic Basics: Biological

More information

Realism and Instrumentalism. in models of. molecular evolution

Realism and Instrumentalism. in models of. molecular evolution Galileo Realism and Instrumentalism in models of molecular evolution David Penny Montpellier, June 08 Overview sites free to vary summing sources of error rates of molecular evolution estimates of time

More information

Phylogenetic inference: from sequences to trees

Phylogenetic inference: from sequences to trees W ESTFÄLISCHE W ESTFÄLISCHE W ILHELMS -U NIVERSITÄT NIVERSITÄT WILHELMS-U ÜNSTER MM ÜNSTER VOLUTIONARY FUNCTIONAL UNCTIONAL GENOMICS ENOMICS EVOLUTIONARY Bioinformatics 1 Phylogenetic inference: from sequences

More information

Multiple Whole Genome Alignment

Multiple Whole Genome Alignment Multiple Whole Genome Alignment BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 206 Anthony Gitter gitter@biostat.wisc.edu These slides, excluding third-party material, are licensed under CC BY-NC 4.0 by

More information

Regions with Duplications ABSTRACT

Regions with Duplications ABSTRACT JOURNAL OF COMPUTATIONAL BIOLOGY Volume 15, Number 8, 2008 Mary Ann Liebert, Inc. Pp. 1 21 DOI: 10.1089/cmb.2008.0069 DUPCAR: Reconstructing Contiguous Ancestral Regions with Duplications JIAN MA, 1 AAKROSH

More information

Comparative genomics: Overview & Tools + MUMmer algorithm

Comparative genomics: Overview & Tools + MUMmer algorithm Comparative genomics: Overview & Tools + MUMmer algorithm Urmila Kulkarni-Kale Bioinformatics Centre University of Pune, Pune 411 007. urmila@bioinfo.ernet.in Genome sequence: Fact file 1995: The first

More information

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5.

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5. Five Sami Khuri Department of Computer Science San José State University San José, California, USA sami.khuri@sjsu.edu v Distance Methods v Character Methods v Molecular Clock v UPGMA v Maximum Parsimony

More information

Nature Genetics: doi:0.1038/ng.2768

Nature Genetics: doi:0.1038/ng.2768 Supplementary Figure 1: Graphic representation of the duplicated region at Xq28 in each one of the 31 samples as revealed by acgh. Duplications are represented in red and triplications in blue. Top: Genomic

More information

HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM

HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM I529: Machine Learning in Bioinformatics (Spring 2017) HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington

More information

Conservation of Human Microsatellites across 450 Million Years of Evolution

Conservation of Human Microsatellites across 450 Million Years of Evolution Conservation of Human Microsatellites across 450 Million Years of Evolution Emmanuel Buschiazzo*,1,2 and Neil J. Gemmell 1,3 1 School of Biological Sciences, University of Canterbury, Christchurch, New

More information

Lecture 7 Mutation and genetic variation

Lecture 7 Mutation and genetic variation Lecture 7 Mutation and genetic variation Thymidine dimer Natural selection at a single locus 2. Purifying selection a form of selection acting to eliminate harmful (deleterious) alleles from natural populations.

More information

Stat 529 (Winter 2011) A simple linear regression (SLR) case study. Mammals brain weights and body weights

Stat 529 (Winter 2011) A simple linear regression (SLR) case study. Mammals brain weights and body weights Stat 529 (Winter 2011) A simple linear regression (SLR) case study Reading: Sections 8.1 8.4, 8.6, 8.7 Mammals brain weights and body weights Questions of interest Scatterplots of the data Log transforming

More information

Ensembl Exercise Answers Adapted from Ensembl tutorials presented by Dr. Bert Overduin, EBI

Ensembl Exercise Answers Adapted from Ensembl tutorials presented by Dr. Bert Overduin, EBI Ensembl Exercise Answers Adapted from Ensembl tutorials presented by Dr. Bert Overduin, EBI Exercise 1 Exploring the human MYH9 gene (a) Go to the Ensembl homepage (http://www.ensembl.org). Select Search:

More information

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree) I9 Introduction to Bioinformatics, 0 Phylogenetic ree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & omputing, IUB Evolution theory Speciation Evolution of new organisms is driven by

More information

Supplemental Figure 1.

Supplemental Figure 1. Supplemental Material: Annu. Rev. Genet. 2015. 49:213 42 doi: 10.1146/annurev-genet-120213-092023 A Uniform System for the Annotation of Vertebrate microrna Genes and the Evolution of the Human micrornaome

More information

Phylogenetic Networks, Trees, and Clusters

Phylogenetic Networks, Trees, and Clusters Phylogenetic Networks, Trees, and Clusters Luay Nakhleh 1 and Li-San Wang 2 1 Department of Computer Science Rice University Houston, TX 77005, USA nakhleh@cs.rice.edu 2 Department of Biology University

More information

Evidence of Evolution by Natural Selection (Ch. 16.4) Dodo bird

Evidence of Evolution by Natural Selection (Ch. 16.4) Dodo bird Evidence of Evolution by Natural Selection (Ch. 16.4) Dodo bird Evidence supporting evolution Fossil record Anatomical record Molecular record Artificial selection Fossil record Layers of sedimentary rock

More information

Annotation and Nomenclature: A Zebrafish Example. Ingo Braasch, Julian Catchen and John Postlethwait

Annotation and Nomenclature: A Zebrafish Example. Ingo Braasch, Julian Catchen and John Postlethwait Annotation and Nomenclature: A Zebrafish Example Ingo Braasch, Julian Catchen and John Postlethwait Annotation and Nomenclature: An Example: Zebrafish The goal Solutions Annotation and Nomenclature: An

More information

CONTENTS. P A R T I Genomes 1. P A R T II Gene Transcription and Regulation 109

CONTENTS. P A R T I Genomes 1. P A R T II Gene Transcription and Regulation 109 CONTENTS ix Preface xv Acknowledgments xxi Editors and contributors xxiv A computational micro primer xxvi P A R T I Genomes 1 1 Identifying the genetic basis of disease 3 Vineet Bafna 2 Pattern identification

More information

Bioinformatics Exercises

Bioinformatics Exercises Bioinformatics Exercises AP Biology Teachers Workshop Susan Cates, Ph.D. Evolution of Species Phylogenetic Trees show the relatedness of organisms Common Ancestor (Root of the tree) 1 Rooted vs. Unrooted

More information

3/1/17. Content. TWINSCAN model. Example. TWINSCAN algorithm. HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM

3/1/17. Content. TWINSCAN model. Example. TWINSCAN algorithm. HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM I529: Machine Learning in Bioinformatics (Spring 2017) Content HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM Yuzhen Ye School of Informatics and Computing Indiana University,

More information

Molecular evolution 2. Please sit in row K or forward

Molecular evolution 2. Please sit in row K or forward Molecular evolution 2 Please sit in row K or forward RBFD: cat, mouse, parasite Toxoplamsa gondii cyst in a mouse brain http://phenomena.nationalgeographic.com/2013/04/26/mind-bending-parasite-permanently-quells-cat-fear-in-mice/

More information

Reconstruction of Ancestral Genome subject to Whole Genome Duplication, Speciation, Rearrangement and Loss

Reconstruction of Ancestral Genome subject to Whole Genome Duplication, Speciation, Rearrangement and Loss Reconstruction of Ancestral Genome subject to Whole Genome Duplication, Speciation, Rearrangement and Loss Denis Bertrand, Yves Gagnon, Mathieu Blanchette 2, and Nadia El-Mabrouk DIRO, Université de Montréal,

More information

Evidence for Evolution by Natural Selection. Raven Chapters 1 & 22

Evidence for Evolution by Natural Selection. Raven Chapters 1 & 22 Evidence for Evolution by Natural Selection Raven Chapters 1 & 22 2006-2007 Science happens within a culture What was the doctrine of the time? TINTORETTO The Creation of the Animals 1550 Then along comes

More information

Group activities: Making animal model of human behaviors e.g. Wine preference model in mice

Group activities: Making animal model of human behaviors e.g. Wine preference model in mice Lecture schedule 3/30 Natural selection of genes and behaviors 4/01 Mouse genetic approaches to behavior 4/06 Gene-knockout and Transgenic technology 4/08 Experimental methods for measuring behaviors 4/13

More information

Anthro 101: Human Biological Evolution. Lecture 7: Taxonomy/Primate Adaptations. Prof. Kenneth Feldmeier

Anthro 101: Human Biological Evolution. Lecture 7: Taxonomy/Primate Adaptations. Prof. Kenneth Feldmeier Anthro 101: Human Biological Evolution Lecture 7: Taxonomy/Primate Adaptations Prof. Kenneth Feldmeier Here is the PLAN Listen to this lecture and read about Taxonomy in the text I will ask you a question(s)

More information

Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment

Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment Introduction to Bioinformatics online course : IBT Jonathan Kayondo Learning Objectives Understand

More information

Organizing Life s Diversity

Organizing Life s Diversity 17 Organizing Life s Diversity section 2 Modern Classification Classification systems have changed over time as information has increased. What You ll Learn species concepts methods to reveal phylogeny

More information

Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome

Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome Article Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome Elliott H. Margulies, 2,7,8,21 Gregory M. Cooper, 2,3,9 George Asimenos, 2,10 Daryl J. Thomas,

More information