Cloud-scale RNA-sequencing differential expression analysis with Myrna
1 Cloud-scale RNA-sequencing differential expression analysis with Myrna Jeff Leek Johns Hopkins Bloomberg School of Public Health e: t: myrna:
2 Acknowledgements Kasper Hansen Ben Langmead
3 RNA-Sequencing Experiments Sample A TGGTATTTTCGTCTGGGGGGTATGCACGCGATAGCATTGCGAGACGCTGGAGCCGGAGCACCCTATTTGATTCCTGCCTCATCCTATTATTTATCGCACCTAC Sample B
4 Big Picture Ideas A Problem: The growth in data from sequencing is outstripping the growth in processing speed/storage of individual computers A solution: Take advantage of economies of scale and rent scalable computing resources for sequencing analysis Today: Myrna Tri-mode software for RNA-sequencing analysis
5 Intimidating trends GA II 1.6 billion bp per day (old) GA IIx 5 billion bp per day (current) HiSeq billion bp per day (for release later in 2010) Images from:
6 Computational throughput Moore's Law: The number of transistors that can be placed inexpensively on an integrated circuit doubles approximately every two years.
7 Intimidating Trends Stein Genome Biology (2010) doi: /gb
8 Intimidating trends Kahvejian et al. Nature Biotechnology (2008) doi: /nbt1494
9 Difficulties With Storage/Support
10 Throughput growth gap: sequencing throughput grows > 4-5x per year; computing throughput grows 2x per 2 years
11 Throughput growth gap = Idle
12 Throughput growth gap = Faster algorithms
13 Throughput growth gap
14 Throughput growth gap
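The gap on slides 10-14 can be made concrete with a little arithmetic. The sketch below compounds the two growth rates quoted on slide 10 and reports how far sequencing output pulls ahead of single-computer throughput. The function name and the choice of 4x per year (the lower end of the "> 4-5x" range) are assumptions made for illustration.

```python
def growth_gap(years, seq_factor_per_year=4.0, cpu_doubling_years=2.0):
    """Ratio of sequencing growth to computing growth after `years`.

    seq_factor_per_year: assumed yearly multiplier for sequencing throughput
                         (slide 10 quotes "> 4-5x per year"; 4.0 is the low end).
    cpu_doubling_years:  years per doubling of computing throughput
                         (slide 10 quotes "2x per 2 years").
    """
    sequencing = seq_factor_per_year ** years
    computing = 2.0 ** (years / cpu_doubling_years)
    return sequencing / computing


if __name__ == "__main__":
    for n in (1, 2, 5):
        print(f"after {n} year(s): gap factor {growth_gap(n):.1f}")
```

Even at the low end of the quoted range the gap compounds every year, which is the motivation for the options on the following slides.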
15 Cloud computing Rent; don't buy. : Cloud vendor : Electric Company
16 Renting Computing Time Easier access to ever larger economies of scale Columbia river for cheap hydroelectric power & cooling
17 Cloud computing Pros & Cons. Why? Cost?; handles demand that grows, shrinks dramatically; no hardware maintenance; no alternative? Why not? Cost?; harder to program; less user-friendly; data movement is inconvenient & can outpace network; privacy (e.g. IRB concerns)
18 Amazon Web Services
19 Cloud computing: 1.7 GB RAM, 1 32-bit virtual processor core clocked at ~1.2 GHz, ~160 GB local storage; 70 GB RAM, 8 64-bit virtual processor cores clocked at ~4.0 GHz each, ~1.5 TB local storage; 7 GB RAM, 8 64-bit virtual processor cores clocked at ~2.6 GHz each, ~1.5 TB local storage
20 RNA-Sequencing Experiments Sample A TGGTATTTTCGTCTGGGGGGTATGCACGCGATAGCATTGCGAGACGCTGGAGCCGGAGCACCCTATTTGATTCCTGCCTCATCCTATTATTTATCGCACCTAC Sample B
21 Simple Goal: Determine If Gene 1 Is Differentially Expressed Sample A TGGTATTTTCGTCTGGGGGGTATGCACGCGATAGCATTGCGAGACGCTGGAGCCGGAGCACCCTATTTGATTCCTGCCTCATCCTATTATTTATCGCACCTAC Sample B
22 RNA-Sequencing Experiments [Figure: short reads from Sample A and Sample B placed along the same reference sequence; both samples show the same set of reads]
23-25 RNA-Sequencing Experiments [Figure: the same layout with more reads added; Sample A gains many more reads than Sample B]
26 RNA-Sequencing Experiments [Figure: the same layout as slides 23-25] Gene 1 differentially expressed?: YES p-value:
27 The Myrna Pipeline
28 Bet-hedging architecture [Diagram: three parallel stacks. Cloud mode: Cloud driver script, Wrapper, Hadoop, bowtie, soapsnp, Postprocess. Hadoop mode: Hadoop driver script, Wrapper, Hadoop, bowtie, soapsnp, Postprocess. Single-computer mode: Singleton driver script, Wrapper, Perl, fork, sort, bowtie, soapsnp, Postprocess.]
29 Pickrell Study: 69 lymphoblastoid HapMap cell lines; sequenced in two labs, Argonne and Yale; total of 1.1 billion 35 bp reads
30 Runtime/Costs On EC2 Myrna Runtime, Cost for 1.1 billion reads from Pickrell et al study
EC2 Nodes                                    1 master, 10 workers   1 master, 20 workers   1 master, 40 workers
Worker CPU cores
Wall clock time                              4h:20m                 2h:32m                 1h:38m
Cluster setup                                4m                     4m                     3m
Align                                        2h:56m                 1h:31m                 54m
Overlap                                      52m                    31m                    16m
Normalize                                    6m                     7m                     6m
Statistics                                   9m                     6m                     6m
Summarize & Postprocess                      13m                    14m                    13m
Approximate cost (N. Virginia / Elsewhere)   $44.00 / $49.50        $50.40 / $56.70        $65.60 / $73.80
Table 1. Timing and cost for a Myrna experiment with 1.1 billion 35 bp unpaired reads from the Pickrell et al study as input. Costs are approximate and based on the pricing as of this writing, that is, $0.68 per extra-large high-cpu EC2 node per hour in the Northern Virginia zone and $0.78 in other zones, plus a $0.12 per-node-per-hour surcharge for Elastic MapReduce in all zones. Times can vary subject to, for example, congestion and Internet traffic conditions.
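The cost row of Table 1 can be reproduced from the node counts, wall clock times and hourly rates quoted in the caption. The sketch below is a back-of-the-envelope check rather than a description of how the vendor bills. It assumes the master node is charged at the same rate as a worker and that each node is billed for whole hours, rounded up. Both are assumptions, chosen because they match the table. The function and constant names are made up for this example.

```python
import math

# Hourly rates quoted in the Table 1 caption (US dollars per node per hour).
NODE_RATE = {"N. Virginia": 0.68, "Elsewhere": 0.78}
EMR_SURCHARGE = 0.12  # Elastic MapReduce surcharge, all zones


def cluster_cost(workers, wall_clock_minutes, zone, masters=1):
    """Approximate cost of one run.

    Assumptions (not stated in the source): the master is billed like a
    worker, and every node is billed for whole hours, rounded up.
    """
    nodes = workers + masters
    billed_hours = math.ceil(wall_clock_minutes / 60)
    return nodes * billed_hours * (NODE_RATE[zone] + EMR_SURCHARGE)


if __name__ == "__main__":
    # (workers, wall clock time in minutes) taken from Table 1
    runs = [(10, 4 * 60 + 20), (20, 2 * 60 + 32), (40, 1 * 60 + 38)]
    for workers, minutes in runs:
        costs = [cluster_cost(workers, minutes, zone) for zone in NODE_RATE]
        print(workers, "workers:", " / ".join(f"${c:.2f}" for c in costs))
```

Under these assumptions the larger clusters finish sooner but cost more, because the extra nodes outweigh the hours saved.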
31 Compute Time Nearly Linear in CPU Cores
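One way to check the scaling claim against Table 1 is to compute the speedup relative to the 10-worker run, both for the whole job and for the alignment stage alone. This is a minimal sketch. The dictionary and function names are illustrative, and the timings are copied from the table.

```python
# Minutes per run, keyed by number of workers (values from Table 1).
WALL_CLOCK = {10: 4 * 60 + 20, 20: 2 * 60 + 32, 40: 1 * 60 + 38}
ALIGN = {10: 2 * 60 + 56, 20: 1 * 60 + 31, 40: 54}


def speedup(timings, workers, base_workers=10):
    """How many times faster the run with `workers` is than the base run."""
    return timings[base_workers] / timings[workers]


def efficiency(timings, workers, base_workers=10):
    """Speedup divided by the increase in workers (1.0 = perfectly linear)."""
    return speedup(timings, workers, base_workers) / (workers / base_workers)


if __name__ == "__main__":
    for label, timings in (("wall clock", WALL_CLOCK), ("align", ALIGN)):
        for workers in (20, 40):
            print(f"{label:10s} {workers} workers: "
                  f"speedup {speedup(timings, workers):.2f}, "
                  f"efficiency {efficiency(timings, workers):.2f}")
```

The alignment stage scales more closely with the number of workers than the total wall clock time does, since stages such as cluster setup and postprocessing take roughly the same time at every cluster size.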
32 Three Myrna Vignettes 1. Comparing models for differential expression 2. Biological variation in RNA-seq & microarrays 3. Identifying batch effects in sequencing
33 Vignette 1 Comparing RNA-Seq DE Models: 69 lymphoblastoid HapMap cell lines; sequenced in two labs, Argonne and Yale; total of 1.1 billion 35 bp reads; randomly assigned samples to 2 groups
34 Gene by Gene Statistical Model $g(E[f(c_{ij}) \mid y_j]) = b_{i0} + \eta_i \log(q_j) + b_{i1} y_j$
35 Gene by Gene Statistical Model $g(E[f(c_{ij}) \mid y_j]) = b_{i0} + \eta_i \log(q_j) + b_{i1} y_j$, where $c_{ij}$: normalized counts for gene i, sample j; $q_j$: normalization constant for sample j; $g$: link function; $y_j$: group indicator; $b_{i1}$: parameter we test
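To make the model concrete, here is a minimal sketch of one special case: a Poisson model with a log link, $f$ the identity, and $\eta_i = 1$, so that $\log(q_j)$ acts as a fixed offset. With a two-level group indicator the maximum-likelihood fits have a closed form, and $b_{i1} = 0$ can be tested with a likelihood-ratio statistic compared to a chi-square distribution with one degree of freedom. The function names and the choice of a likelihood-ratio test are assumptions made for illustration; this is not a description of Myrna's implementation.

```python
import math


def xlogy(x, y):
    """x * log(y), using the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def poisson_lrt(counts, q, y):
    """Test b_i1 = 0 for a single gene under the model

        log E[c_ij | y_j] = b_i0 + log(q_j) + b_i1 * y_j   (Poisson, eta_i = 1)

    counts: read counts c_ij for one gene, one entry per sample
    q:      normalization constants q_j (all > 0)
    y:      group indicators y_j, each 0 or 1 (both groups must be present)

    Returns (estimate of b_i1, p-value). The estimate is NaN when either
    group has zero total count.
    """
    total = [0.0, 0.0]   # summed counts per group
    offset = [0.0, 0.0]  # summed normalization constants per group
    for c, qj, yj in zip(counts, q, y):
        total[yj] += c
        offset[yj] += qj

    # Closed-form rate estimates: per group (full model) and pooled (null).
    rate = [total[g] / offset[g] for g in (0, 1)]
    pooled = sum(total) / sum(offset)

    # Likelihood-ratio statistic; terms common to both models cancel.
    lr = 2.0 * (xlogy(total[0], rate[0]) + xlogy(total[1], rate[1])
                - xlogy(sum(total), pooled))
    lr = max(lr, 0.0)  # guard against tiny negative rounding error

    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(lr / 2.0))
    b1 = math.log(rate[1] / rate[0]) if min(rate) > 0 else float("nan")
    return b1, p_value


if __name__ == "__main__":
    b1, p = poisson_lrt([5, 5, 50, 50], [1.0, 1.0, 1.0, 1.0], [0, 0, 1, 1])
    print(f"b_i1 = {b1:.3f}, p = {p:.3g}")
```

Running the function once per gene gives one p-value per gene, which is the gene-by-gene structure the slide describes. Estimating $\eta_i$, or using a Gaussian or permutation model, would need a different fitting routine.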
36 Different Models = Different Results [Figure panels: Poisson Model, Gaussian Model, Permutation Model] Left Column: $\eta_i = 1$, Bullard et al. (2010) BMC Bioinformatics doi: / Right Column: $\eta_i$ estimated, Langmead et al. (2010) Genome Biology doi: /gb r83
37 P-values By Log-Count [Figure panels: Poisson Model, Gaussian Model, Permutation Model] Left Column: $\eta_i = 1$, Bullard et al. (2010) BMC Bioinformatics doi: / Right Column: $\eta_i$ estimated, Langmead et al. (2010) Genome Biology doi: /gb r83
38 Vignette 2 - Biological Variability in Sequencing Stranger et al. (2007) vs. Montgomery et al. (2010) Choy et al. (2008) vs. Pickrell et al. (2010)
39 RNA-Sequencing Studies With Small n
40 Vignette 3 Batch Effects
41 Batch Effect in 1,000 Genomes. Input: 2 billion reads from the 1000 Genomes Project. Blue: 3 SDs below the mean. Orange: 3 SDs above the mean. Horizontal lines delimit process dates. Human chromosome 16.
42 Batch Effects Are Strong
43 Big Picture Ideas A Problem: The growth in data from sequencing is outstripping the growth in processing speed/storage of individual computers A solution: Take advantage of economies of scale and rent scalable computing resources for sequencing analysis Today: Myrna Tri-mode software for RNA-sequencing analysis
44 Acknowledgments. Myrna: Ben Langmead, Kasper Hansen. Biological Variability: Rafael Irizarry, Kasper Hansen, Zhijin Wu. Batch Effects: Robert Scharpf, Hector Corrada-Bravo, David Simcha, Benjamin Langmead, W. Evan Johnson, Donald Geman, Keith Baggerly, Rafael A. Irizarry
45 References + Further Information
Leek Group Twitter Feed:
Myrna Website:
Langmead B, Hansen KD, Leek JT (2010), "Cloud-scale RNA-sequencing differential expression analysis with Myrna." Genome Biology 11:R83
Leek JT, Scharpf RB, Corrada Bravo H, Simcha D, Langmead B, Johnson WE, Geman D, Baggerly K, Irizarry RA (2010), "Tackling the widespread and critical impact of batch effects in high-throughput data." Nature Reviews Genetics, 11:
Hansen KD, Wu Z, Irizarry RA, Leek JT (Submitted), "Sequencing technology does not eliminate biological variability."
L16: Power Dissipation in Digital Systems 1 Problem #1: Power Dissipation/Heat Power (Watts) 100000 10000 1000 100 10 1 0.1 4004 80088080 8085 808686 386 486 Pentium proc 18KW 5KW 1.5KW 500W 1971 1974
More informationFrequent Itemsets and Association Rule Mining. Vinay Setty Slides credit:
Frequent Itemsets and Association Rule Mining Vinay Setty vinay.j.setty@uis.no Slides credit: http://www.mmds.org/ Association Rule Discovery Supermarket shelf management Market-basket model: Goal: Identify
More informationSAT in Bioinformatics: Making the Case with Haplotype Inference
SAT in Bioinformatics: Making the Case with Haplotype Inference Inês Lynce 1 and João Marques-Silva 2 1 IST/INESC-ID, Technical University of Lisbon, Portugal ines@sat.inesc-id.pt 2 School of Electronics
More informationExperiments with. Cascading Design. Ben Kovitz Fluid Analogies Research Group Indiana University START
START + -1.715 + 6.272 + 6.630 + 1.647 Experiments with + 8.667-5.701 Cascading Design P1 P2 P3 P4 Ben Kovitz Fluid Analogies Research Group Indiana University The coordination problem An old creationist
More informationBigBOSS Data Reduction Software
BigBOSS Data Reduction Software The University of Utah Department of Physics & Astronomy The Premise BigBOSS data reduction software is as important as BigBOSS data collection hardware to the scientific
More informationIMA Preprint Series # 2176
SOLVING LARGE DOUBLE DIGESTION PROBLEMS FOR DNA RESTRICTION MAPPING BY USING BRANCH-AND-BOUND INTEGER LINEAR PROGRAMMING By Zhijun Wu and Yin Zhang IMA Preprint Series # 2176 ( October 2007 ) INSTITUTE
More informationSmartDairy Catalog HerdMetrix Herd Management Software
SmartDairy Catalog HerdMetrix Herd Management Quality Milk Through Technology Sort Gate Hoof Care Feeding Station ISO RFID SmartControl Meter TouchPoint System Management ViewPoint Catalog March 2011 Quality
More informationWhat is Systems Biology
What is Systems Biology 2 CBS, Department of Systems Biology 3 CBS, Department of Systems Biology Data integration In the Big Data era Combine different types of data, describing different things or the
More informationArcGIS Enterprise: What s New. Philip Heede Shannon Kalisky Melanie Summers Shreyas Shinde
ArcGIS Enterprise: What s New Philip Heede Shannon Kalisky Melanie Summers Shreyas Shinde ArcGIS Enterprise is the new name for ArcGIS for Server ArcGIS Enterprise Software Components ArcGIS Server Portal
More informationGS Analysis of Microarray Data
GS01 0163 Analysis of Microarray Data Keith Baggerly and Kevin Coombes Section of Bioinformatics Department of Biostatistics and Applied Mathematics UT M. D. Anderson Cancer Center kabagg@mdanderson.org
More informationComputational Frameworks. MapReduce
Computational Frameworks MapReduce 1 Computational complexity: Big data challenges Any processing requiring a superlinear number of operations may easily turn out unfeasible. If input size is really huge,
More informationLecture: Mixture Models for Microbiome data
Lecture: Mixture Models for Microbiome data Lecture 3: Mixture Models for Microbiome data Outline: - - Sequencing thought experiment Mixture Models (tangent) - (esp. Negative Binomial) - Differential abundance
More informationMSc Drug Design. Module Structure: (15 credits each) Lectures and Tutorials Assessment: 50% coursework, 50% unseen examination.
Module Structure: (15 credits each) Lectures and Assessment: 50% coursework, 50% unseen examination. Module Title Module 1: Bioinformatics and structural biology as applied to drug design MEDC0075 In the
More informationExploring Human Mobility with Multi-Source Data at Extremely Large Metropolitan Scales. ACM MobiCom 2014, Maui, HI
Exploring Human Mobility with Multi-Source Data at Extremely Large Metropolitan Scales Desheng Zhang & Tian He University of Minnesota, USA Jun Huang, Ye Li, Fan Zhang, Chengzhong Xu Shenzhen Institute
More informationCMP 334: Seventh Class
CMP 334: Seventh Class Performance HW 5 solution Averages and weighted averages (review) Amdahl's law Ripple-carry adder circuits Binary addition Half-adder circuits Full-adder circuits Subtraction, negative
More informationCRYPTOGRAPHIC COMPUTING
CRYPTOGRAPHIC COMPUTING ON GPU Chen Mou Cheng Dept. Electrical Engineering g National Taiwan University January 16, 2009 COLLABORATORS Daniel Bernstein, UIC, USA Tien Ren Chen, Army Tanja Lange, TU Eindhoven,
More information