Chromosomal rearrangements in mammalian genomes : characterising the breakpoints. Claire Lemaitre

Similar documents
Genome Rearrangements In Man and Mouse. Abhinav Tiwari Department of Bioengineering

Genomes and Their Evolution

A Methodological Framework for the Reconstruction of Contiguous Regions of Ancestral Genomes and Its Application to Mammalian Genomes

The Fragile Breakage versus Random Breakage Models of Chromosome Evolution

The 3D Organization of Chromatin Explains Evolutionary Fragile Genomic Regions

Handling Rearrangements in DNA Sequence Alignment

17 Non-collinear alignment Motivation A B C A B C A B C A B C D A C. This exposition is based on:

Analysis of Gene Order Evolution beyond Single-Copy Genes

BMI/CS 776 Lecture #20 Alignment of whole genomes. Colin Dewey (with slides adapted from those by Mark Craven)

Multiple Whole Genome Alignment

Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata.

TE content correlates positively with genome size

Ensembl focuses on metazoan (animal) genomes. The genomes currently available at the Ensembl site are:

Comparative genomics: Overview & Tools + MUMmer algorithm

I519 Introduction to Bioinformatics, Genome Comparison. Yuzhen Ye School of Informatics & Computing, IUB

Chromosomal Breakpoint Reuse in Genome Sequence Rearrangement ABSTRACT

Molecular evolution - Part 1. Pawan Dhar BII

Algorithms for Bioinformatics

The combinatorics and algorithmics of genomic rearrangements have been the subject of much

Example of Function Prediction

1 Introduction. Abstract

Comparative Genomics. Chapter for Human Genetics - Principles and Approaches - 4 th Edition

NIH Public Access Author Manuscript Pac Symp Biocomput. Author manuscript; available in PMC 2009 October 6.

Whole Genome Alignments and Synteny Maps

Genomes Comparision via de Bruijn graphs

Greedy Algorithms. CS 498 SS Saurabh Sinha

Molecular Evolution & the Origin of Variation

Molecular Evolution & the Origin of Variation

Adaptive Evolution of Conserved Noncoding Elements in Mammals

Identifying Positional Homologs as Bidirectional Best Hits of Sequence and Gene Context Similarity

Bio 1B Lecture Outline (please print and bring along) Fall, 2007

Evolution s cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes

- mutations can occur at different levels from single nucleotide positions in DNA to entire genomes.

BIO GENETICS CHROMOSOME MUTATIONS

Reconstructing contiguous regions of an ancestral genome

Breakpoint Graphs and Ancestral Genome Reconstructions

Drosophila melanogaster and D. simulans, two fruit fly species that are nearly

Computational Genetics Winter 2013 Lecture 10. Eleazar Eskin University of California, Los Angeles

Multiple Alignment of Genomic Sequences

WHAT fraction of new mutations in the genome are

FUNDAMENTALS OF MOLECULAR EVOLUTION

Multi-Break Rearrangements and Breakpoint Re- Uses: From Circular to Linear Genomes

Inference of mutation parameters and selective constraint in mammalian. coding sequences by approximate Bayesian computation

Genome Rearrangement. 1 Inversions. Rick Durrett 1

Effects of chromosomal rearrangements on human-chimpanzee molecular evolution

Detection of gene expression changes at chromosomal rearrangement breakpoints in evolution

RGP finder: prediction of Genomic Islands

A PARSIMONY APPROACH TO ANALYSIS OF HUMAN SEGMENTAL DUPLICATIONS

A sequence-dependent physical model of nucleosome occupancy along the eukaryotic chromatin fiber. Alain Arneodo

7 Multiple Genome Alignment

Ancestral reconstruction of segmental duplications reveals punctuated cores of human genome evolution

Understanding relationship between homologous sequences

Dr. Amira A. AL-Hosary

Inferring positional homologs with common intervals of sequences

AN EXACT SOLVER FOR THE DCJ MEDIAN PROBLEM

Principles of Genetics

Supporting Information

Mobilomics in Saccharomyces cerevisiae strains

I519 Introduction to Bioinformatics, Genome Comparison. Yuzhen Ye School of Informatics & Computing, IUB

Conservation of Human Microsatellites across 450 Million Years of Evolution

Statistical Analyses and Markov Modeling of Duplication in Genome Evolution

Comparative Genomics II

Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Information #

Changes in gene expression near evolutionary breakpoints

Intro Gene regulation Synteny The End. Today. Gene regulation Synteny Good bye!

CONTENTS. P A R T I Genomes 1. P A R T II Gene Transcription and Regulation 109

Figure S2. The distribution of the sizes (in bp) of syntenic regions of humans and chimpanzees on human chromosome 21.

Introduction to Bioinformatics

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Phylogenetic analysis. Characters

COMPARATIVE PRIMATE GENOMICS

8.2% of the Human Genome Is Constrained: Variation in Rates of Turnover across Functional Element Classes in the Human Lineage

Outline of lectures 23-26

WHERE DOES THE VARIATION COME FROM IN THE FIRST PLACE?

Visit to BPRC. Data is crucial! Case study: Evolution of AIRE protein 6/7/13

Hidden Markov models in population genetics and evolutionary biology

Annotation and Nomenclature: A Zebrafish Example. Ingo Braasch, Julian Catchen and John Postlethwait

CELL CYCLE UNIT GUIDE- Due January 19, 2016

Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium. November 12, 2012

SWEEPFINDER2: Increased sensitivity, robustness, and flexibility

Analysis of Genome Evolution and Function, University of Toronto, Toronto, ON M5R 3G4 Canada

Genome Rearrangement Problems with Single and Multiple Gene Copies: A Review

Lecture 7 Mutation and genetic variation

Whole-Genome Alignments and Polytopes for Comparative Genomics

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison

Phylogenetics in the Age of Genomics: Prospects and Challenges

Pairwise & Multiple sequence alignments

and both play a significant role in the rise of variable size gene families originating

Supporting Information

How much non-coding DNA do eukaryotes require?

Gene expression differences in human and chimpanzee cerebral cortex

Outline. Genome Evolution. Genome. Genome Architecture. Constraints on Genome Evolution. New Evolutionary Synthesis 11/8/16

Evolution by duplication

Jay Moore,, Graham King, James Lynn. Data integration for Brassica comparative genomics

Isolating - A New Resampling Method for Gene Order Data

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other?

Techniques for Multi-Genome Synteny Analysis to Overcome Assembly Limitations

Sequence analysis and comparison

Interstitial telomere-like repeats in the Arabidopsis thaliana genome

Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment

Transcription:

PhD defense Chromosomal rearrangements in mammalian genomes : characterising the breakpoints Claire Lemaitre Laboratoire de Biométrie et Biologie Évolutive Université Claude Bernard Lyon 1 6 novembre 2008

Genome dynamics Point mutations: insertion, deletion, substitution Large-scale modifications: chromosomal rearrangements inversions, translocations, transpositions, fusions, fissions, duplications, deletions 2

Genome dynamics Functional impacts of rearrangements duplication / deletion breakage in functional sequences modification of the genome organisation Rearrangements are found in: inherited diseases polymorphism evolution Also : cancer, speciation 3

Genome evolution Structural differences between species: in germ-line cell inheritance fixation in the population Examples in mammals: number of chromosomes: 2n=6 2n=102 Human-chimpanzee: 1.2 % sequence divergence 9 large inversions and 1 fusion and many smaller rearrangements 4

Chromosomal evolution : example Human vs Mouse Chr X Human Mouse 5 http://cinteny.cchmc.org

Open questions in chromosomal evolution Diversity in rates and types of rearrangements in different lineages Localisation of rearrangements along the genomes: Random Breakage Model size of conserved segments (Nadeau & Taylor, 1984) Fragile Breakage Model more data thanks to whole genome sequencing many small segments re-use of breakpoints in different lineages Pevzner, 2003 Kent, 2003 Murphy, 2005 6...

Motivations breakpoints Do the breakpoint sequences show some characteristics? Is it possible to characterise the breakpoint sequences? base composition, repeated elements, motifs... Is the breakpoint distribution along the genome linked to some genome organisation? isochores, gene distribution, recombination, replication, chromatin structure... 7

Strategy 1. Localising very precisely the breakpoints along one genome 2. Analysing: breakpoint sequences breakpoint distribution 8

Strategy 1. Localising very precisely the breakpoints along one genome review on the computational methods to detect rearrangement breakpoints whole genome sequences: precision expected... but disappointing Lemaitre & Sagot. A small trip in the untranquil world of genomes. TCS 2008 9

Strategy 1. Localising very precisely the breakpoints on a genome: development of a method in 2 steps 1. detecting broadly the synteny blocks 2. refining the breakpoints mouse 10 human chr. 11

Synteny blocks detection Def: orthologous regions between 2 genomes which have not been rearranged => conserved order and orientation of orthologous markers Our contribution: formal definition of synteny blocks flexibility blocks without conflicts (no overlap) markers = genes 11

Method to refine breakpoints INPUT: the synteny blocks between 2 genomes Gr and Go the sequences of genomes Gr and Go OUTPUT: the breakpoints regions on Gr 12

Breakpoint refinement The breakpoint = between 2 consecutive synteny blocks on Gr, rearranged on Go Asymmetry + origin of the breakage event 13

Alignments Alignment of the inter-block sequences alignment: local, sensitive and fast -> BlastZ 14

Alignments (2) 2 lists of hits: hits between Sr and SoA hits between Sr and SoB The hits are mapped on sequence Sr Ar Br Sr 15

Segmentation Coding the hits information: only the positions covered by hits numerical sequence I Ar Br Sr Sr' I = 1111110-1011111-1-1-1-110-1-1 16

Segmentation (2) looking for 3 segments: segment 1: homology with SoA segment 2: breakpoint segment 3: homology with SoB 1 u1 u2 0 I u3-1 x1 x2 17

1 Segmentation (3) u1 u2 0 u3 I -1 x1 3 segments: { segment 1 : u1 = x2 mean (I[1..x1]) if > 0 + otherwise segment 2 : u2 = 0 segment 3 : u3 = { mean (I[x2+1..n]) if < 0 + otherwise Find x1 et x2 such that: 3 min f x 1, x 2 = xj j =1 k = x j 1 1 2 I [ k ] u j (with x0=0 and x3=n) 18

Segmentation - algorithm Classical algorithm: dynamic programming => O(n2) Speed-up: two independent minimisations => O(n) Evaluation: estimation of a p-value random sequences (I) by shuffling the hits 19

Results Comparison human-mouse 354 breakpoints breakpoints reduced on average by 536 Kb median = 268 Kb median = 51 Kb 171 breakpoints < 50 Kb 20

Comparisons with other methods Whole genome alignments: pairwise and multiple human-mouse Breakpoint size (Kb) median mean 51 129 Grimm2 (Pevzner & Tesler, 2003) 156 364 Grimm3 (Bourque, 2004) Ensembl (Hubbard, 2007) 268 95 454 223 Refined Lemaitre et al. 2008. BMC Bioinformatics 21

Discussion Better precision because: limitation of the search space (2 steps), rather than whole genome alignments pairwise, rather than multiple G1-G2-G3 G1-G3 G1-G2 G1 G2 G3 22

Discussion Better precision because: limitation of the search space (2 steps), rather than whole genome alignments pairwise, rather than multiple G1-G2-G3 G1-G3 G1 G1-G2 G2 G3 23

Characterising the breakpoints In a systematic way: mammalian breakpoints: human vs mouse, rat, dog, macaque, chimpanzee compared to: artificial breakpoints segmentation properties flanking sequences sequence characteristics randomised points distribution 24

Artificial breakpoints Artificial breakpoints 25

Artificial breakpoints or a null model of breakage Artificial breakpoints model = breakage + sequence evolution as if no rearrangement 26

Segmentation differences Size of the «breakpoint» : point or region? Similarity of the adjacent sequences (seg. 1 & 3) Breakpoints Artificial points Size of segment 2 Hits coverage over segments 1 and 3 128 Kb 7 Kb 50 % 59 % human-mouse Ar Br Sr seg. 2 27

Investigating large breakpoints large breakpoint hits from SoA and SoB? Yes No duplication 28

Duplication at the breakpoint explaining the size of some breakpoints related to the molecular mechanisms of rearrangement 29

Duplication at the breakpoint (2) 2 mechanisms of rearrangements involving duplications ectopic recombination Casals, 2007 staggered breaks 30

Duplication at the breakpoint (3) Application to the human X-Y comparison: identification of 2 duplications = footprints of 2 inversions + temporal ordering hypothesis of sex differenciation by inversions 40 Kb 70 Kb 31 Lemaitre et al. (M. Braga, G. Marais). 2008. sub. to Mol Biol Evol.

Investigating large breakpoints large breakpoint hits from SoA and SoB? 17 % Yes No 83 % duplication no similarity with Go similarity elsewhere in Go other rearr. 32

Similarity elsewhere UCSC whole genome alignments no similarity: rapid divergence insertion of new sequence in human deletion in mouse 33

Other characteristics Human segmental duplications Transposable elements mean = 44 % mean = 53 % 34

Breakpoints features Results : Loss of similarity inside and outside breakpoints Duplications and repeated elements Complexity of breakpoints : not only punctual breakage sequence evolution more complex «after» the rearrangement or: sequence properties «before» the rearrangement 35

Distribution of breakpoints along the human genome Are the breakpoints distributed uniformly and independently along the genome? Random Breakage Model Are there some «forbidden» regions? negative selection preventing breakage inside genes and functional regions Intergenic Breakage Model Are there some «fragile» regions? neutral model, regions more prone to breakage Hotspots or Fragile Breakage Model 36

Breakpoint data Mammalian breakpoints: 5 pairwise comparisons Human X X= mouse, rat, dog, macaque, chimpanzee 622 breakpoints mapped on the human genome median size of 26.6 Kb Simulations: random breakage model: 1000 data sets: 622 breakpoints uniformly redistributed on the human genome (same size, without overlap) 37

Breakpoints and genes Comparison of the genic fraction of breakpoints 57 breakpoints fully genic no simulation with 57 p-value < 0.001 under-representation of breakpoints inside genes RBM + negative selection 38

Breakpoints and genes (2) Intergenic breakpoints Inter-gene size breakpoints are located in smaller inter-genes Can not be explained by random breakage + natural selection 39

The isochore organisation Genomic landscape isochores : homogeneous GC content correlations with other genomic features SINEs content GC content intron length -1 Gene density Part of human chromosome 9 Versteeg et al. 2003 40

Other correlations breakpoints sim. points test (p-value) GC content 44 % 41 % e-12 Gene density (#/Mb) 14.9 8.3 <2e-16 19.2 % 12.6 % e-13 SINEs Isochores 41

Replication origins Detection in silico, based on GC + AT skew profiles breakpoint 578 N-domains 1060 putative origins ORI d ORI N-Domain ~20% of the genome breakpoints are over-represented close to the ORIs Collaboration with A. Arneodo's team at ENS of Lyon Huvet et al., 2007. 42

Replication origins (2) ORIs contain small intergenes and are richer in G+C 43

A new model Breakpoints are over-represented in regions with : high transcriptional activity, replication initiation, Open chromatin hypothesis: these regions are «open» and thus more susceptible to breakage Model: neutral mutational bias + natural selection in genes Lemaitre et al. (Arneodo's team) 2008. sub. to Genome Res. 44

Conclusion and future work A method allowing to analyse precisely breakpoint structure and distribution automatically detection of duplications analysing the similarity decrease around the breakpoint Characterisation of breakpoints: duplications, loss of similarity, motifs... comparing different types of rearrangements take into account the evolutionary origin 45

To continue... A new model of breakpoint localisation along the human genome expression and chromatin data investigating cases of breakage inside genes 1D 3D spatial genome organisation inside the nucleus Branco and Pombo, 2006 46

merci!!! 47

48

Synteny blocks Flexibility colinearity distance criteria size criteria genome 2 Chaining principle: génome 1 genome 1 49

Conflicts genome 2 orthology at the extremities genome 1 genome 2 overlaps between blocks genome 1 block 1 50 block 2

Human X-Y divergence stair-shape of the divergence between X-Y genes along X => several steps of recombination suppression Skaletsky et al. 2003 Lahn and Page, 1999 51

Combining pairwise datasets Example : mouse rat human human dog chimp mouse macaque rat 52