Integrating Globus into a Science Gateway for Cryo-EM

Similar documents
Decision-making, inference, and learning theory. ECE 830 & CS 761, Spring 2016

NIH Center for Macromolecular Modeling and Bioinformatics Developer of VMD and NAMD. Beckman Institute

Statistical Clustering of Vesicle Patterns Practical Aspects of the Analysis of Large Datasets with R

NIH Center for Macromolecular Modeling and Bioinformatics Developer of VMD and NAMD. Beckman Institute

Introduction to Spatial Big Data Analytics. Zhe Jiang Office: SEC 3435

Breaking News English.com Ready-to-Use English Lessons by Sean Banville

Data Intensive Computing meets High Performance Computing

Grundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson

Introduction to Portal for ArcGIS. Hao LEE November 12, 2015

LIGO Grid Applications Development Within GriPhyN

Introduction to Portal for ArcGIS

What Can Physics Say About Life Itself?

Mark Neubauer. Enabling Discoveries at the LHC through Advanced Computation and Machine Learning. University of Illinois at Urbana-Champaign

Michael L. Norman Chief Scientific Officer, SDSC Distinguished Professor of Physics, UCSD

Quantum Artificial Intelligence and Machine Learning: The Path to Enterprise Deployments. Randall Correll. +1 (703) Palo Alto, CA

Three-Dimensional Electron Microscopy of Macromolecular Assemblies

Eligible Content This is what the State of Pennsylvania wants your students to know and be able to do by the end of the unit.

Introduction to Chemoinformatics and Drug Discovery

Cyberinfrastructure and CyberGIS: Recent Advances and Key Themes

CONTEMPORARY ANALYTICAL ECOSYSTEM PATRICK HALL, SAS INSTITUTE

Institute for Functional Imaging of Materials (IFIM)

Introduction to Protein Structures - Molecular Graphics Tool

Ivana Zinno, Francesco Casu, Claudio De Luca, Riccardo Lanari, Michele Manunta. CNR IREA, Napoli, Italy

PRACTICAL ANALYTICS 7/19/2012. Tamás Budavári / The Johns Hopkins University

Biology Slide 1 of 31

ArcGIS GeoAnalytics Server: An Introduction. Sarah Ambrose and Ravi Narayanan

Integrated Electricity Demand and Price Forecasting

Section II Understanding the Protein Data Bank

(I)LC Software and the grid

5.1 Cell Division and the Cell Cycle

Algorithms in Computational Biology (236522) spring 2008 Lecture #1

Introduction to Protein Structures - Molecular Graphics Tool

ArcGIS Enterprise: What s New. Philip Heede Shannon Kalisky Melanie Summers Shreyas Shinde

Interstate Data Moving and the Last Block Problem: Lessons Learned in the CAPS Spring Experiment 2014

NASA EARTH EXCHANGE (NEX)

CHAPTER 22 GEOGRAPHIC INFORMATION SYSTEMS

AstroPortal: A Science Gateway for Large-scale Astronomy Data Analysis

Open Data meets Big Data

Computational Structural Biology and Molecular Simulation. Introduction to VMD Molecular Visualization and Analysis

Environmental Response Management Application

ArcGIS Enterprise: What s New. Philip Heede Shannon Kalisky Melanie Summers Sam Williamson

Spatial Analysis in CyberGIS

MSc Drug Design. Module Structure: (15 credits each) Lectures and Tutorials Assessment: 50% coursework, 50% unseen examination.

CSD. Unlock value from crystal structure information in the CSD

Topology based deep learning for biomolecular data

BIOLOGY IB DIPLOMA PROGRAM PARENTS ORIENTATION DAY

Resources for Spatial Thinking and Analysis

Harvard Center for Geographic Analysis Geospatial on the MOC

Investigation 3: Comparing DNA Sequences to Understand Evolutionary Relationships with BLAST

Editorial Introduction: Special Issue on Grids and Geospatial Information Systems

Search for the Dubai in the remap search bar:

Your web browser (Safari 7) is out of date. For more security, comfort and the best experience on this site: Update your browser Ignore

Chem Compute Science Gateway for Undergraduates. Mark J. Perri, M.S. Reeves, R.M. Whitnell

Cloud-based WRF Downscaling Simulations at Scale using Community Reanalysis and Climate Datasets

Hands-on Course in Computational Structural Biology and Molecular Simulation BIOP590C/MCB590C. Course Details

Computing Infrastructure for Gravitational Wave Data Analysis

What is Systems Biology

VMD: Visual Molecular Dynamics. Our Microscope is Made of...

How GIS based Visualizations Support Land Use and Transportation Modeling

BMD645. Integration of Omics

RNA & PROTEIN SYNTHESIS. Making Proteins Using Directions From DNA

Application of HUBzero platform for the educational process in astroparticle physics

Continuous Machine Learning

The Application of 3D Web GIS In Land Administration - 3D Building Model System

Large Scale Evaluation of Chemical Structure Recognition 4 th Text Mining Symposium in Life Sciences October 10, Dr.

Deep Learning. Convolutional Neural Networks Applications

Application of WebGIS and VGI for Community Based Resources Inventory. Jihn-Fa Jan Department of Land Economics National Chengchi University

Portal for ArcGIS: An Introduction

Outcome 3 is an assessment of practical skills where evidence could be recorded in the form of a checklist.

Quantum computing with superconducting qubits Towards useful applications

Copyright Mark Brandt, Ph.D A third method, cryogenic electron microscopy has seen increasing use over the past few years.

Cell Alive Homeostasis Plants Animals Fungi Bacteria. Loose DNA DNA Nucleus Membrane-Bound Organelles Humans

Nobel Prize in Physiology or Medicine 2017

for Effective Land Administration

The Fruit Fly Brain Observatory

04/05/2017. Cell Biology. AQA 2016 Syllabus

Portal for ArcGIS: An Introduction. Catherine Hynes and Derek Law

Computational Biology From The Perspective Of A Physical Scientist

Spatial Data Availability Energizes Florida s Citizens

Measuring earthquake-generated surface offsets from high-resolution digital topography

Start of the maintenance:

The Changing Landscape of Land Administration

Biology (age 11-14) Subject map

CSISS Resources for Research and Teaching

Role of GIS in Tracking and Controlling Spread of Disease

Form a Hypothesis. Variables in an Experiment Dependent Variable what is being measured (data) Form a Hypothesis 2. Form a Hypothesis 3 15:03 DRY MIX

Introduction to Three Dimensional Structure Determination of Macromolecules by Cryo-Electron Microscopy

Advanced Weather Technology

Recommended Grade Level: 8 Earth/Environmental Science Weather vs. Climate

Keystone Exams: Biology Assessment Anchors and Eligible Content. Pennsylvania Department of Education

USING BLAST TO IDENTIFY PROTEINS THAT ARE EVOLUTIONARILY RELATED ACROSS SPECIES

Collaborative WRF-based research and education enabled by software containers

STEM Research Literacy

Statistical Machine Learning Methods for Bioinformatics IV. Neural Network & Deep Learning Applications in Bioinformatics

Introduction to Google Mapping Tools

Product Quality README file for GOME Level 1b version 5.1 dataset

Marla Meehl Manager of NCAR/UCAR Networking and Front Range GigaPoP (FRGP)

Protein Structure Prediction and Display

A GIS-Model to Identify Flood Affected Zones Using Landsat Images

CAWa Central Asian Water. Training Course Geographical Information Systems in Hydrology

Transcription:

Integrating Globus into a Science Gateway for Cryo-EM Michael Cianfrocco Life Sciences Institute Department of Biological Chemistry University of Michigan Globus World 2018

Impact & growth of Executive structural biology summary A revolution has occurred in structural biology - these questions can now be answered with cryo-electron microscopy (cryo-em) But Biologists need atomic-detailed structures of proteins in order to understand healthy and diseased states of organisms at the molecular level. Size of structure database Cryo-EM generating is a big data technology, 20+TB per person per project So We incorporated Globus into a web-platform to allow PNAS 73,128-312 () point-and-click data upload & analysis on HPC resources

2: a platform for determine cryo-em protein structures COSMIC Impact & growth of structural biology Cryo-EM Open Source Multiplatform Infrastructure for Cloud Computing PNAS 73,128-312 ()

Understanding biology requires an Impact & growth of structural biology understanding of the microcosmos' Example human cell 3m 30 μm https://www.thinglink.com times smaller PNAS 73,128-312 ()

Understanding biology requires an Impact & growth of structural biology understanding of the microcosmos' Example human cell **Model** illustrations Cytoplasm Plasma membrane Illustrations: David Goodsell, TSRI 3 μm Nucleus PNAS 73,128-312 ()

Understanding biology requires an Impact & growth of structural biology understanding of the microcosmos' Example human cell 3 μm Example subset of proteins (~100) PNAS 73,128-312 ()

Understanding biology requires an Impact & growth of structural biology understanding of the microcosmos' Amino acid 3 μm Diagram of ~100 proteins / molecules Example human cell Example subset of proteins (~100) Humans have 2 genes Predicted to form different types of proteins are PNAS 73,128-312 () Proteins linear chains of amino acids strung together (Usually > 300) Illustrations: David Goodsell, TSRI

Cryo-EM utilizes transmission electron microscopes to take images of proteins New Old generation

Cryo-EM: 30 second overview - core hours 3D RECONSTRUCTION EM IMAGING Class Average X Class Average Y Class Average Z AVERAGING NOISE Class X Class Y Class Z c/o Andres Leschziner CLASSIFICATION ALIGNMENT

Cryo-EM relies on movies instead of images of protein samples 4000 x 4000 pixels ~50 frames 1-5 GB / movie 1 movie / minute (24/7) Bai et al. 2015

Cryo-EM data is low SNR Grouped into classes and averaged to create averaged images: Averaged image Individual particles in averaged image

but can solve atomic protein structures Combine data together in 3D dimensions Atomic protein structure from ~400,000 images (12 TB) Allows atomic model to be built

Nobel Prize in Chemistry Impact & growth of structural biology PNAS 73,128-312 ()

undergoing a rapid expansion Impact &Cryo-EM growth ofisstructural biology # of cryo-em users Expert level knowledge (aka command-line experience) Time Instrument collects ~2 TB / day (24/7) -> Each individual requires collecting ~10-20 TB ~60 instruments in the US, growing quickly PNAS 73,128-312 ()

sciencebiology gateway for cryo-em Impact & Building growth of astructural Short term: 1. Remove command-line interface for cryo-em job submission Remove decisions regarding HPC job environment 2. 3. Create centralized location for data analysis software Long term: 1. Connect users to cloud storage 2. Become platform for training and implementing advanced algorithm developmen - e.g. Machine learning / neural networks 3. Integrate educational materials to train next generation of structural biologists PNAS 73,128-312 ()

Globus a central hub for COSMIC2 Impact & growth of as structural biology SDSC virtual machine XSEDE Comet supercomputer user storage Cloud storage (Google, Box, etc) Globus Auth & Transfer services user gateway storage PNAS 73,128-312 ()

Comet supercomputer at San Diego Supercomputer Center Title Impact & growth of structural biology 46,656 CPUs 288 GPUs PNAS 73,128-312 ()

Computing allocations Impact & growth of structural biology Current allocation on COSMIC2: ~25,000 GPU-hours All users: 500 GPU hours (free, no questions asked) If you run out: 1957 1959 2 -Apply for supplement from COSMIC 1961 XSEDE -Receiveguidance onhow to submit computing allocation from COSMIC2 PNAS 73,128-312 ()

Impact & growth of structural biology PNAS 73,128-312 ()

Impact & growth of structural biology PNAS 73,128-312 ()

Impact & growth of structural biology PNAS 73,128-312 ()

Impact & growth of structural biology PNAS 73,128-312 ()

Impact & growth of structural biology PNAS 73,128-312 ()

Impact & growth of structural biology PNAS 73,128-312 ()

Impact & growth of structural biology PNAS 73,128-312 ()

Impact & growth of structural biology PNAS 73,128-312 ()

Impact & growth of structural biology PNAS 73,128-312 ()

Impact & growth of structural biology PNAS 73,128-312 ()

Lessons learned Impact & growth of structural biology Globus integration was straightforward User account management & authentication Very easy data movement Next steps: Linking up to cloud storage Moving data between NSF HPC resources using Globus PNAS 73,128-312 ()

For morebiology information Impact & growth of structural Website: cosmic-cryoem.org PEARC17 paper: https://goo.gl/gnq79a ECSS Github repo: https://github.com/cianfrocco-lab/cosmic-cryoem-gateway PNAS 73,128-312 ()