Integrating Globus into a Science Gateway for Cryo-EM Michael Cianfrocco Life Sciences Institute Department of Biological Chemistry University of Michigan Globus World 2018
Impact & growth of Executive structural biology summary A revolution has occurred in structural biology - these questions can now be answered with cryo-electron microscopy (cryo-em) But Biologists need atomic-detailed structures of proteins in order to understand healthy and diseased states of organisms at the molecular level. Size of structure database Cryo-EM generating is a big data technology, 20+TB per person per project So We incorporated Globus into a web-platform to allow PNAS 73,128-312 () point-and-click data upload & analysis on HPC resources
2: a platform for determine cryo-em protein structures COSMIC Impact & growth of structural biology Cryo-EM Open Source Multiplatform Infrastructure for Cloud Computing PNAS 73,128-312 ()
Understanding biology requires an Impact & growth of structural biology understanding of the microcosmos' Example human cell 3m 30 μm https://www.thinglink.com times smaller PNAS 73,128-312 ()
Understanding biology requires an Impact & growth of structural biology understanding of the microcosmos' Example human cell **Model** illustrations Cytoplasm Plasma membrane Illustrations: David Goodsell, TSRI 3 μm Nucleus PNAS 73,128-312 ()
Understanding biology requires an Impact & growth of structural biology understanding of the microcosmos' Example human cell 3 μm Example subset of proteins (~100) PNAS 73,128-312 ()
Understanding biology requires an Impact & growth of structural biology understanding of the microcosmos' Amino acid 3 μm Diagram of ~100 proteins / molecules Example human cell Example subset of proteins (~100) Humans have 2 genes Predicted to form different types of proteins are PNAS 73,128-312 () Proteins linear chains of amino acids strung together (Usually > 300) Illustrations: David Goodsell, TSRI
Cryo-EM utilizes transmission electron microscopes to take images of proteins New Old generation
Cryo-EM: 30 second overview - core hours 3D RECONSTRUCTION EM IMAGING Class Average X Class Average Y Class Average Z AVERAGING NOISE Class X Class Y Class Z c/o Andres Leschziner CLASSIFICATION ALIGNMENT
Cryo-EM relies on movies instead of images of protein samples 4000 x 4000 pixels ~50 frames 1-5 GB / movie 1 movie / minute (24/7) Bai et al. 2015
Cryo-EM data is low SNR Grouped into classes and averaged to create averaged images: Averaged image Individual particles in averaged image
but can solve atomic protein structures Combine data together in 3D dimensions Atomic protein structure from ~400,000 images (12 TB) Allows atomic model to be built
Nobel Prize in Chemistry Impact & growth of structural biology PNAS 73,128-312 ()
undergoing a rapid expansion Impact &Cryo-EM growth ofisstructural biology # of cryo-em users Expert level knowledge (aka command-line experience) Time Instrument collects ~2 TB / day (24/7) -> Each individual requires collecting ~10-20 TB ~60 instruments in the US, growing quickly PNAS 73,128-312 ()
sciencebiology gateway for cryo-em Impact & Building growth of astructural Short term: 1. Remove command-line interface for cryo-em job submission Remove decisions regarding HPC job environment 2. 3. Create centralized location for data analysis software Long term: 1. Connect users to cloud storage 2. Become platform for training and implementing advanced algorithm developmen - e.g. Machine learning / neural networks 3. Integrate educational materials to train next generation of structural biologists PNAS 73,128-312 ()
Globus a central hub for COSMIC2 Impact & growth of as structural biology SDSC virtual machine XSEDE Comet supercomputer user storage Cloud storage (Google, Box, etc) Globus Auth & Transfer services user gateway storage PNAS 73,128-312 ()
Comet supercomputer at San Diego Supercomputer Center Title Impact & growth of structural biology 46,656 CPUs 288 GPUs PNAS 73,128-312 ()
Computing allocations Impact & growth of structural biology Current allocation on COSMIC2: ~25,000 GPU-hours All users: 500 GPU hours (free, no questions asked) If you run out: 1957 1959 2 -Apply for supplement from COSMIC 1961 XSEDE -Receiveguidance onhow to submit computing allocation from COSMIC2 PNAS 73,128-312 ()
Impact & growth of structural biology PNAS 73,128-312 ()
Impact & growth of structural biology PNAS 73,128-312 ()
Impact & growth of structural biology PNAS 73,128-312 ()
Impact & growth of structural biology PNAS 73,128-312 ()
Impact & growth of structural biology PNAS 73,128-312 ()
Impact & growth of structural biology PNAS 73,128-312 ()
Impact & growth of structural biology PNAS 73,128-312 ()
Impact & growth of structural biology PNAS 73,128-312 ()
Impact & growth of structural biology PNAS 73,128-312 ()
Impact & growth of structural biology PNAS 73,128-312 ()
Lessons learned Impact & growth of structural biology Globus integration was straightforward User account management & authentication Very easy data movement Next steps: Linking up to cloud storage Moving data between NSF HPC resources using Globus PNAS 73,128-312 ()
For morebiology information Impact & growth of structural Website: cosmic-cryoem.org PEARC17 paper: https://goo.gl/gnq79a ECSS Github repo: https://github.com/cianfrocco-lab/cosmic-cryoem-gateway PNAS 73,128-312 ()