The Saguaro Genome: Toward the Ecological Genomics of a Sonoran Desert Icon
Dr. Dario Copetti
June 30, 2015
STEMAZing workshop, TCSS
Why study a genome?
- The genome contains the genetic information of an organism, stored as DNA sequences (in the nucleus and organelles)
- The four DNA bases form a linear sequence of genes and non-coding regions that stores and specifies the biological information
- Genes define the features of an organism: how to produce energy, the color of flowers, the type of protein, how many fingers, and so on
- Genes are inherited by the next generations, and the DNA sequence may change over time
Sequencing a genome means reading all the DNA bases and assigning their positions on the chromosomes.
Knowing the genome sequence of an organism allows us to:
- unravel the genetic basis underlying its traits
- determine relationships between individuals
- compare organisms and study their evolution
Plant genome sequencing
Important technological achievements have allowed us to sequence DNA faster and more cheaply.
The number of plant genomes sequenced has grown exponentially in the last few years.
Not only model or crop plants, but many other organisms can be fully studied... cheaply!
The Arizona Genomics Institute has a 10+ year history of plant genome sequencing.
We are pushing this to the next level: platinum standard genome assemblies.
PacBio RSII: Single Molecule Real Time (SMRT) sequencing
- 15 Gb/run (4.7 human genomes)
- 60,000 reads per SMRT cell
- reads 2,000 to 40,000 bp long
Federally listed threatened and endangered plant species
In Arizona, 9 of the 20 threatened or endangered species are cacti, and 8 of the 14 endangered species are cacti.
But there are no genetic resources to study them.
Furthermore, succulent plants such as agaves and yuccas have peculiar adaptations to survive in extreme environments, just like many cacti.
http://www.fws.gov/endangered/
The Saguaro Genome Project
In summer 2013 the Universidad Nacional Autonoma de Mexico (UNAM) and the University of Arizona established small grants for research in adaptation and sustainability in arid areas of Northwest Mexico and the Southwestern United States.
Our proposal for building a genome assembly of the saguaro was funded.
We are adopting the latest sequencing and assembly technologies to reconstruct the gene space of a member of Cactaceae, a family for which only a very limited amount of sequence data is available.
This will provide an unprecedented amount of genetic data for this family.
The project will extend to population studies, collecting specimens in both states.
How do we sequence the saguaro genome?
Illumina sequencing technology
- 200M read pairs/lane
- 300 Gb/run (~100 human genomes)
- reads 100 to 300 bp long
Sequencing and assembly
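The "human genomes" figures quoted for the PacBio RSII and Illumina runs can be reproduced with simple arithmetic: divide the yield of a run by the size of the genome. The sketch below assumes a human genome of about 3.2 Gb; the slides do not state which value was used, and the function name is purely illustrative.

```python
# Assumption: one human genome is about 3.2 Gb. The slides do not give the
# value behind their "human genomes" figures, so this is only a sanity check.
HUMAN_GENOME_GB = 3.2


def genome_equivalents(yield_gb: float, genome_size_gb: float = HUMAN_GENOME_GB) -> float:
    """Return how many genomes of the given size a run could cover once (1x)."""
    if genome_size_gb <= 0:
        raise ValueError("genome size must be positive")
    return yield_gb / genome_size_gb


pacbio = genome_equivalents(15)     # 15 Gb/run, PacBio RSII figure from the slides
illumina = genome_equivalents(300)  # 300 Gb/run, Illumina figure from the slides

print(f"PacBio RSII: {pacbio:.1f} human genome equivalents per run")
print(f"Illumina:    {illumina:.0f} human genome equivalents per run")
```

Under this assumption the PacBio figure comes out at 4.7, matching the slide, and the Illumina figure comes out a little under the rounded "~100" quoted above.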
Genome sequencing 101
- DNA extraction: isolate and purify DNA from an organism
- Library preparation: modify the DNA to prepare it for sequencing
- Sequencing: determine the order of the consecutive bases of one DNA strand
- Data pre-processing: remove low-quality data, correct errors, convert formats
- Assembly: reconstruct the sequence of the chromosomes by merging overlapping reads
- Annotation: predict features in the sequence and assign them a function
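The assembly step, merging overlapping reads, can be illustrated with a toy greedy algorithm. This is a minimal sketch for teaching purposes, not the assembler used in the project: it assumes error-free reads from a single strand, ignores repeats and reads contained inside other reads, and all names and example reads are made up.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the two sequences that share the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, -1, -1
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing overlaps any more: the remaining contigs stay separate
        # Join the pair, writing the shared bases only once.
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Three made-up reads sampled from the sequence ATGGCGTACCTGA.
reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
print(greedy_assemble(reads))  # ['ATGGCGTACCTGA']
```

Real assemblers must also cope with sequencing errors, both DNA strands, and repeated regions, which is why long reads and dedicated software are needed for a plant genome.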
Saguaro Genome Sequencing Project: status and future work
- The sequence of the chloroplast genome revealed a peculiar structure and gene content
- We are currently assembling the nuclear genome
- We will then use RNA-Seq data to annotate the sequence
- This will also allow us to detect genes differentially expressed between organs or phases
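As a rough illustration of what "differentially expressed" means, read counts per gene can be normalized by library size and compared between two samples as a log2 fold change. This is a simplified sketch, not the project's analysis: the gene names and counts are invented, and a real study would use replicates and a statistical test.

```python
import math


def counts_per_million(counts: dict[str, int]) -> dict[str, float]:
    """Scale raw read counts so that each sample sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {gene: count * 1e6 / total for gene, count in counts.items()}


def log2_fold_change(sample_a: dict[str, int], sample_b: dict[str, int],
                     pseudocount: float = 1.0) -> dict[str, float]:
    """log2(B / A) per gene on normalized counts; positive means higher in B."""
    if sample_a.keys() != sample_b.keys():
        raise ValueError("both samples must report the same genes")
    cpm_a = counts_per_million(sample_a)
    cpm_b = counts_per_million(sample_b)
    # The pseudocount avoids dividing by zero for genes with no reads.
    return {
        gene: math.log2((cpm_b[gene] + pseudocount) / (cpm_a[gene] + pseudocount))
        for gene in sample_a
    }


# Hypothetical counts for two hypothetical genes in two organs.
stem = {"gene1": 100, "gene2": 900}
flower = {"gene1": 400, "gene2": 600}
for gene, lfc in log2_fold_change(stem, flower).items():
    print(f"{gene}: log2 fold change (flower vs stem) = {lfc:+.2f}")
```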
Datasets to be produced:
- Annotated genome assembly
- Collection of transcripts from different tissues and stages
- Low coverage sequencing of 5 saguaro populations: at each location, two individuals were resequenced to detect the genetic variants that characterize each population
- Low coverage sequencing of other columnar cacti
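The idea of detecting the variants that characterize a population can be sketched by comparing each resequenced individual with the reference and keeping the differences both individuals share. This is a toy example under strong assumptions: the sequences are already aligned, have the same length, and contain no insertions or deletions. All sequences and names are invented. Real low coverage data is analyzed from read alignments with dedicated variant callers.

```python
Variant = tuple[int, str, str]  # (1-based position, reference base, sample base)


def call_snps(reference: str, sample: str) -> list[Variant]:
    """List single-base differences between an aligned sample and the reference."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (position, ref_base, alt_base)
        for position, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != alt_base and alt_base != "N"  # N = base not determined
    ]


def shared_variants(reference: str, individuals: list[str]) -> set[Variant]:
    """Variants found in every individual sampled from one population."""
    variant_sets = [set(call_snps(reference, seq)) for seq in individuals]
    return set.intersection(*variant_sets) if variant_sets else set()


# Hypothetical reference and two hypothetical individuals from one location.
reference = "ATGGCGTACCTGA"
individual_1 = "ATGGCATACCTGA"
individual_2 = "ATGGCATACCTGC"

print(call_snps(reference, individual_2))
print(shared_variants(reference, [individual_1, individual_2]))
```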
Immediate and long-term goals of the project
- Characterize and describe the saguaro (and cactus) genome at the molecular level
- Describe the ecology, phylogeny, and physiology of the family
- Apply modern research technologies to non-model, endangered species to develop ecological genomics studies
- Train and educate researchers and the community about the importance of knowing and preserving local resources
Saguaro Genome Sequencing Project - Investigators
- Prof. Michael J. Sanderson, Prof. Shelley McMahon, Dr. Derrick Zwickl, Joseph Charboneau (EEB, UofA): comparative evolutionary biology, computational biology
- Prof. Rod A. Wing, Dr. Dario Copetti (CALS-EEB, UofA): structural and evolutionary biology of crop genomes, genome sequencing and physical mapping
- Prof. Alberto Burquez, Dr. Enriquena Bustamante (IDE, UNAM): population and community ecology, plant-animal relationships, biogeography and land-use change in Sonora
- Prof. Luis E. Eguiarte (IDE, UNAM): evolution of Agave and cacti, population and conservation genetics of Central American plants
- Prof. Martin F. Wojciechowski, Prof. Kelly Steele, Prof. Sudhir Kumar (SOLS, ASU): molecular phylogenetics and evolution of plants, gene and genome evolution
Thanks to the UofA-UNAM consortium, the Tumamoc Hill Desert Laboratory, and TCSS