Big Data Analytics. Lucas Rego Drumond
Transcription
1 Map Reduce I
  Lucas Rego Drumond
  Information Systems and Machine Learning Lab (ISMLL)
  Institute of Computer Science
  University of Hildesheim, Germany
  Map Reduce I 1 / 32
2 Outline
  1. Introduction
  2. Parallel Computing
  3. Parallel programming paradigms
3 1. Introduction
4 1. Introduction: Overview
  - Part III: Machine Learning Algorithms
  - Part II: Large Scale Computational Models
  - Part I: Distributed Database, Distributed File System
5 2. Parallel Computing
6 2. Parallel Computing: Why do we need a Computational Model?
  - Our data is nicely stored in a distributed infrastructure.
  - We have a number of computers at our disposal.
  - We want our analytics software to take advantage of all this computing power.
  - When programming, we want to focus on understanding our data and not our infrastructure.
7 2. Parallel Computing: Shared Memory Infrastructure
  [Figure: three processors attached to one shared memory]
8 2. Parallel Computing: Distributed Infrastructure
  [Figure: four processors, each with its own memory, connected by a network]
9 3. Parallel programming paradigms
10 3. Parallel programming paradigms: Parallel Computing principles
  - We have p processors available to execute a task T.
  - Ideally: the more processors, the faster a task is executed.
  - Reality: synchronisation and communication costs.
  - Speedup $s(T, p)$ of a task T by using p processors:
    - Let $t(T, p)$ be the time needed to execute T using p processors.
    - Speedup is given by: $s(T, p) = \frac{t(T, 1)}{t(T, p)}$
11 3. Parallel programming paradigms: Parallel Computing principles
  - We have p processors available to execute a task T.
  - Efficiency $e(T, p)$ of a task T by using p processors: $e(T, p) = \frac{t(T, 1)}{p \cdot t(T, p)}$
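The two definitions above can be checked with a small calculation. The following is a minimal sketch, not part of the lecture: the class name `ParallelMetrics` and the timings in `main` are made up for illustration. With an assumed sequential time of 120 s and 20 s on 8 processors, the speedup is 120 / 20 = 6 and the efficiency is 6 / 8 = 0.75.

```java
// Hypothetical helper for illustration only; names and numbers are assumptions.
final class ParallelMetrics {

    /** Speedup s(T, p) = t(T, 1) / t(T, p). */
    static double speedup(double t1, double tp) {
        return t1 / tp;
    }

    /** Efficiency e(T, p) = t(T, 1) / (p * t(T, p)). */
    static double efficiency(double t1, double tp, int p) {
        return t1 / (p * tp);
    }

    public static void main(String[] args) {
        double t1 = 120.0; // assumed run time on one processor, in seconds
        double tp = 20.0;  // assumed run time on p processors, in seconds
        int p = 8;
        System.out.println("speedup    = " + speedup(t1, tp));
        System.out.println("efficiency = " + efficiency(t1, tp, p));
    }
}
```

An efficiency below 1 reflects the synchronisation and communication costs mentioned on the slide.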
12 3. Parallel programming paradigms: Considerations
  - It is not worth using a lot of processors for solving small problems.
  - Algorithms should increase efficiency with problem size.
13 3. Parallel programming paradigms: Paradigms - Shared Memory
  - All the processors have access to all the data $D := \{d_1, \ldots, d_n\}$.
  - Pieces of data can be overwritten.
  - Processors need to lock data points before using them.
  For each processor p:
    1. lock($d_i$)
    2. process($d_i$)
    3. unlock($d_i$)
14 3. Parallel programming paradigms: Word Count Example
  Given a corpus of text documents $D := \{d_1, \ldots, d_n\}$, each containing a sequence of words $w_1, \ldots, w_m$ drawn from a set W of possible words, the task is to generate word counts for each word in the corpus.
15 3. Parallel programming paradigms: Word Count - Shared Memory
  Shared vector for word counts: $c \in \mathbb{R}^{|W|}$, initialised as $c \leftarrow \{0\}^{|W|}$
  Each processor:
    1. access a document $d \in D$
    2. for each word $w_i$ in document d:
    3.   lock($c_i$)
    4.   $c_i \leftarrow c_i + 1$
    5.   unlock($c_i$)
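The shared-memory word count above might be sketched in Java as follows. This is an illustration under assumptions, not the lecture's implementation: threads stand in for processors, one `ReentrantLock` guards each counter $c_i$, and the class name `SharedMemoryWordCount` and the sample data are hypothetical. Counts are stored as `long` rather than real numbers.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: one thread per document, one lock per counter c_i.
final class SharedMemoryWordCount {

    static long[] count(List<String> vocabulary, List<String> documents) {
        long[] c = new long[vocabulary.size()];              // shared vector c, all zeros
        ReentrantLock[] locks = new ReentrantLock[c.length];
        for (int i = 0; i < locks.length; i++) {
            locks[i] = new ReentrantLock();
        }

        Thread[] workers = new Thread[documents.size()];
        for (int t = 0; t < workers.length; t++) {
            final String doc = documents.get(t);             // 1. access a document d
            workers[t] = new Thread(() -> {
                for (String w : doc.split("\\s+")) {         // 2. for each word w_i in d
                    int i = vocabulary.indexOf(w);
                    if (i < 0) {
                        continue;                            // word outside W is ignored
                    }
                    locks[i].lock();                         // 3. lock(c_i)
                    try {
                        c[i]++;                              // 4. c_i <- c_i + 1
                    } finally {
                        locks[i].unlock();                   // 5. unlock(c_i)
                    }
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();                                    // wait for all processors
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return c;
    }

    public static void main(String[] args) {
        List<String> vocabulary = Arrays.asList("love", "crying", "rain");
        List<String> documents = Arrays.asList(
                "looking for love", "crying in the rain", "is this love");
        System.out.println(Arrays.toString(count(vocabulary, documents))); // [2, 1, 1]
    }
}
```

Every increment of a frequent word such as "love" contends for the same lock, which is the waiting-time problem with this paradigm.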
16 3. Parallel programming paradigms: Paradigms - Shared Memory
  - Inefficient in a distributed scenario.
  - Results of a process can easily be overwritten.
  - Possibly long waiting times for a piece of data because of the lock mechanism.
17 3. Parallel programming paradigms: Paradigms - Message passing
  - Each processor sees only one part of the data: $\pi(D, p) := \{d_p, \ldots, d_{p + \frac{n}{p} - 1}\}$
  - Each processor works on its partition.
  - Results are exchanged between processors (message passing).
  For each processor p:
    1. For each $d \in \pi(D, p)$
    2.   process(d)
    3. Communicate results
18 3. Parallel programming paradigms: Word Count - Message passing
  We need to define two types of processes:
    1. Slave - counts the words on a subset of documents and informs the master
    2. Master - gathers counts from the slaves and sums them up
19 3. Parallel programming paradigms: Word Count - Message passing
  Slave:
  Local memory:
    1. Subset of documents: $\pi(D, p) := \{d_p, \ldots, d_{p + \frac{n}{p} - 1}\}$
    2. Address of the master: $addr_{master}$
    3. Local word counts: $c \in \mathbb{R}^{|W|}$
  Procedure:
    $c \leftarrow \{0\}^{|W|}$
    1. for each document $d \in \pi(D, p)$
    2.   for each word $w_i$ in document d:
    3.     $c_i \leftarrow c_i + 1$
    Send message: send($addr_{master}$, c)
20 3. Parallel programming paradigms: Word Count - Message passing
  Master:
  Local memory:
    1. Global word counts: $c^{global} \in \mathbb{R}^{|W|}$
    2. List of slaves: S
  Procedure:
    $c^{global} \leftarrow \{0\}^{|W|}$
    $s \leftarrow \{0\}^{|S|}$
    For each received message $(p, c_p)$:
    1. $c^{global} \leftarrow c^{global} + c_p$
    2. $s_p \leftarrow 1$
    3. if $\|s\|_1 = |S|$ return $c^{global}$
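The master and slave roles above can be imitated on a single machine. The sketch below makes several assumptions: threads play the role of processors, a `BlockingQueue` plays the role of the master's address, counts are kept in hash maps instead of dense vectors, and the name `MessagePassingWordCount` is hypothetical. A real deployment would exchange messages over a network, for example through an MPI implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical single-machine sketch of the master/slave word count.
final class MessagePassingWordCount {

    /** A message (p, c_p): the sender id and that sender's local counts. */
    static final class Message {
        final int sender;
        final Map<String, Integer> counts;

        Message(int sender, Map<String, Integer> counts) {
            this.sender = sender;
            this.counts = counts;
        }
    }

    /** Slave: counts words in its own partition, then sends one message to the master. */
    static Thread slave(int id, List<String> partition, BlockingQueue<Message> master) {
        return new Thread(() -> {
            Map<String, Integer> c = new HashMap<>();
            for (String doc : partition) {
                for (String w : doc.split("\\s+")) {
                    if (!w.isEmpty()) {
                        c.merge(w, 1, Integer::sum);          // c_i <- c_i + 1
                    }
                }
            }
            master.add(new Message(id, c));                   // send(addr_master, c)
        });
    }

    /** Master: partitions the data by hand, then sums messages until all slaves reported. */
    static Map<String, Integer> run(List<String> documents, int slaves) {
        BlockingQueue<Message> inbox = new LinkedBlockingQueue<>();
        int n = documents.size();
        int chunk = (n + slaves - 1) / slaves;                // contiguous blocks of about n/p documents
        for (int p = 0; p < slaves; p++) {
            int from = Math.min(p * chunk, n);
            int to = Math.min(from + chunk, n);
            slave(p, documents.subList(from, to), inbox).start();
        }

        Map<String, Integer> global = new HashMap<>();        // c_global
        boolean[] s = new boolean[slaves];                    // s_p marks slaves that reported
        int reported = 0;
        try {
            while (reported < slaves) {
                Message m = inbox.take();
                m.counts.forEach((w, k) -> global.merge(w, k, Integer::sum));
                if (!s[m.sender]) {
                    s[m.sender] = true;
                    reported++;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return global;
    }
}
```

Even in this small sketch, the partitioning and the bookkeeping of which slaves have reported are written by hand.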
21 3. Parallel programming paradigms: Paradigms - Message passing
  - We need to manually assign master and slave roles for each processor.
  - Partition of the data needs to be done manually.
  - Implementations like OpenMPI only provide services to exchange messages.
22 Outline
  1. Introduction
  2. Parallel Computing
  3. Parallel programming paradigms
23 Map-Reduce
  - Builds on the distributed message passing paradigm.
  - Assumes the data is partitioned over the nodes.
  - Pipelined procedure:
    1. Map phase
    2. Reduce phase
  - High-level abstraction: the programmer only specifies a map and a reduce routine.
24 Map-Reduce
  - No need to worry about how many processors are available.
  - No need to specify which ones will be mappers and which ones will be reducers.
25 Key-Value input data
  - Map-Reduce requires the data to be stored in a key-value format.
  - Natural if one works with column databases.
  Examples:
    Key       Value
    document  array of words
    document  word
    user      movies
    user      friends
    user      tweet
26 The Paradigm - Formally
  Given:
  - A set of input keys I
  - A set of output keys O
  - A set of input values X
  - A set of intermediate values V
  - A set of output values Y
  We can define:
    $map : I \times X \to \mathcal{P}(O \times V)$
  and
    $reduce : O \times \mathcal{P}(V) \to O \times Y$
  where $\mathcal{P}$ denotes the powerset.
27 The Paradigm - Informal
  1. Each mapper transforms a set of key-value pairs into a list of (output key, intermediate value) pairs.
  2. All intermediate values are grouped according to their output keys.
  3. Each reducer receives all the intermediate values associated with a given key.
  4. Each reducer associates one final value with each key.
28 Word Count Example
  Map:
  - Input: document-word list pairs
  - Output: word-count pairs
  - $(d_k, w_1, \ldots, w_m) \mapsto [(w_i, c_i)]$
  Reduce:
  - Input: word-(count list) pairs
  - Output: word-count pairs
  - $(w_i, [c_i]) \mapsto (w_i, \sum_{c \in [c_i]} c)$
29 Word Count Example
  [Figure: data flow from input documents through mappers to reducers]
  Input documents:
    (d1, "love ain't no stranger")
    (d2, "crying in the rain")
    (d3, "looking for love")
    (d4, "I'm crying")
    (d5, "the deeper the love")
    (d6, "is this love")
    (d7, "Ain't no love")
  Mappers emit partial counts such as (love, 1), (love, 2), (crying, 1), (ain't, 1), (stranger, 1), (rain, 1), (looking, 1), (deeper, 1), (this, 1).
  Reducers emit the final counts, including (love, 5), (crying, 2) and (ain't, 2); the other words shown (stranger, rain, looking, deeper, this) each have count 1.
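The map, group and reduce steps of this example can be traced in a few lines of plain Java. The sketch below is sequential and runs in memory, so it illustrates only the data flow, not the parallel execution. The class `InMemoryWordCount` and its methods are hypothetical names. Unlike the figure, it lowercases words and does not drop any of them, so words such as "the" and "no" also appear in its output.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sequential sketch of the map -> group -> reduce data flow.
final class InMemoryWordCount {

    /** Map: (document id, text) -> list of (word, 1) pairs. */
    static List<Map.Entry<String, Integer>> map(String docId, String text) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!w.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<String, Integer>(w, 1));
            }
        }
        return out;
    }

    /** Reduce: (word, list of counts) -> sum of the counts. */
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    /** Applies map to every document, groups values by key, then reduces each group. */
    static Map<String, Integer> run(Map<String, String> documents) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> d : documents.entrySet()) {
            for (Map.Entry<String, Integer> kv : map(d.getKey(), d.getValue())) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> g : grouped.entrySet()) {
            result.put(g.getKey(), reduce(g.getKey(), g.getValue()));
        }
        return result;
    }

    /** The seven documents from the slide. */
    static Map<String, String> exampleCorpus() {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("d1", "love ain't no stranger");
        docs.put("d2", "crying in the rain");
        docs.put("d3", "looking for love");
        docs.put("d4", "I'm crying");
        docs.put("d5", "the deeper the love");
        docs.put("d6", "is this love");
        docs.put("d7", "Ain't no love");
        return docs;
    }

    public static void main(String[] args) {
        System.out.println(run(exampleCorpus()));
    }
}
```

On the slide's corpus this yields love = 5, crying = 2 and ain't = 2, matching the reducer outputs in the figure.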
30 Map
      // Nested in class WordCount. Uses the old Hadoop "mapred" API and needs
      // java.io.IOException, java.util.StringTokenizer,
      // org.apache.hadoop.io.* and org.apache.hadoop.mapred.*.
      public static class Map extends MapReduceBase
              implements Mapper<LongWritable, Text, Text, IntWritable> {
          private final static IntWritable one = new IntWritable(1);
          private Text word = new Text();

          // Emits (word, 1) for every token of the input line.
          public void map(LongWritable key, Text value,
                          OutputCollector<Text, IntWritable> output, Reporter reporter)
                  throws IOException {
              String line = value.toString();
              StringTokenizer tokenizer = new StringTokenizer(line);
              while (tokenizer.hasMoreTokens()) {
                  word.set(tokenizer.nextToken());
                  output.collect(word, one);
              }
          }
      }
31 Reduce
      // Nested in class WordCount. Additionally needs java.util.Iterator.
      public static class Reduce extends MapReduceBase
              implements Reducer<Text, IntWritable, Text, IntWritable> {
          // Sums all counts received for one word.
          public void reduce(Text key, Iterator<IntWritable> values,
                             OutputCollector<Text, IntWritable> output, Reporter reporter)
                  throws IOException {
              int sum = 0;
              while (values.hasNext()) {
                  sum += values.next().get();
              }
              output.collect(key, new IntWritable(sum));
          }
      }
32 Execution snippet
      // Additionally needs org.apache.hadoop.fs.Path.
      public static void main(String[] args) throws Exception {
          JobConf conf = new JobConf(WordCount.class);
          conf.setJobName("wordcount");

          // Types of the final (key, value) output pairs.
          conf.setOutputKeyClass(Text.class);
          conf.setOutputValueClass(IntWritable.class);

          // The reducer doubles as a combiner because summing is associative.
          conf.setMapperClass(Map.class);
          conf.setCombinerClass(Reduce.class);
          conf.setReducerClass(Reduce.class);

          conf.setInputFormat(TextInputFormat.class);
          conf.setOutputFormat(TextOutputFormat.class);

          // args[0]: input path, args[1]: output path.
          FileInputFormat.setInputPaths(conf, new Path(args[0]));
          FileOutputFormat.setOutputPath(conf, new Path(args[1]));

          JobClient.runJob(conf);
      }
      } // closes the enclosing WordCount class
33 Considerations
  - Maps are executed in parallel.
  - Reduces are executed in parallel.
  - Bottleneck: reducers can only execute after all the mappers are finished.
34 Fault tolerance
  - When the master node detects node failures:
    - Re-executes completed and in-progress map() tasks
    - Re-executes in-progress reduce tasks
  - When the master node detects particular key-value pairs that cause mappers to crash:
    - Problematic pairs are skipped in the execution
35 Parallel Efficiency of Map-Reduce
  - We have p processors for performing map and reduce operations.
  - Time to perform a task T on data D: $t(T, 1) = wD$
  - Time for producing intermediate data $\sigma D$ after the map phase: $t(T_{inter}, 1) = \sigma D$
  - Overheads:
    - intermediate data per mapper: $\frac{\sigma D}{p}$
    - each of the p reducers needs to read one p-th of the data written by each of the p mappers: $\frac{\sigma D}{p} \cdot \frac{1}{p} \cdot p = \frac{\sigma D}{p}$
  - Time for performing the task with Map-Reduce: $t_{MR}(T, p) = \frac{wD}{p} + 2K \frac{\sigma D}{p}$
  - K: constant representing the overhead of IO operations (reading and writing data to disk)
36 Parallel Efficiency of Map-Reduce
  - Time for performing the task on one processor: $wD$
  - Time for performing the task with p processors on Map-Reduce: $t_{MR}(T, p) = \frac{wD}{p} + 2K \frac{\sigma D}{p}$
  - Efficiency equation: $e(T, p) = \frac{t(T, 1)}{p \cdot t(T, p)}$
  - Efficiency of Map-Reduce: $e_{MR}(T, p) = \frac{wD}{p \left( \frac{wD}{p} + 2K \frac{\sigma D}{p} \right)}$
37 Parallel Efficiency of Map-Reduce
  $e_{MR}(T, p) = \frac{wD}{p \left( \frac{wD}{p} + 2K \frac{\sigma D}{p} \right)}$
  $= \frac{wD}{wD + 2K \sigma D}$
  $= \frac{wD \cdot \frac{1}{wD}}{wD \cdot \frac{1}{wD} + 2K \sigma D \cdot \frac{1}{wD}}$
  $= \frac{1}{1 + 2K \frac{\sigma}{w}}$
38 Parallel Efficiency of Map-Reduce
  $e_{MR}(T, p) = \frac{1}{1 + 2K \frac{\sigma}{w}}$
  - Apparently the efficiency is independent of p.
  - High efficiency can be achieved even with a large number of processors.
  - If $\sigma$ is high (too much intermediate data), the efficiency suffers.
  - In many cases $\sigma$ depends on p.
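The closed form above can be checked numerically against the definition $e(T, p) = \frac{t(T, 1)}{p \cdot t(T, p)}$. The following sketch assumes arbitrary illustrative values for w, $\sigma$, K and D; the class name `MapReduceEfficiency` is hypothetical. It treats $\sigma$ as a constant, so it does not model the case where $\sigma$ depends on p.

```java
// Hypothetical numeric check of the efficiency formula; all values are assumptions.
final class MapReduceEfficiency {

    /** t_MR(T, p) = wD/p + 2K * sigma*D/p. */
    static double timeMapReduce(double w, double sigma, double k, double d, int p) {
        return (w * d) / p + 2.0 * k * (sigma * d) / p;
    }

    /** e_MR(T, p) = t(T, 1) / (p * t_MR(T, p)), with t(T, 1) = wD. */
    static double efficiency(double w, double sigma, double k, double d, int p) {
        return (w * d) / (p * timeMapReduce(w, sigma, k, d, p));
    }

    /** Closed form: 1 / (1 + 2K * sigma / w). */
    static double closedForm(double w, double sigma, double k) {
        return 1.0 / (1.0 + 2.0 * k * sigma / w);
    }

    public static void main(String[] args) {
        double w = 10.0;     // assumed work per unit of data
        double sigma = 1.0;  // assumed intermediate data per unit of input
        double k = 2.5;      // assumed IO overhead constant
        double d = 1e6;      // assumed data size
        for (int p : new int[] {1, 10, 100, 1000}) {
            System.out.println("p = " + p + ", efficiency = " + efficiency(w, sigma, k, d, p));
        }
        System.out.println("closed form = " + closedForm(w, sigma, k));
    }
}
```

With these assumed values, $2K\frac{\sigma}{w} = 0.5$, so the efficiency is about 0.67 for every p. Doubling $\sigma$ to 2 would lower it to 0.5.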
MapReduce in Spark. Krzysztof Dembczyński. Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland
MapReduce in Spark Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Software Development Technologies Master studies, second semester
More informationMapReduce in Spark. Krzysztof Dembczyński. Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland
MapReduce in Spark Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Software Development Technologies Master studies, first semester
More informationDistributed Architectures
Distributed Architectures Software Architecture VO/KU (707023/707024) Roman Kern KTI, TU Graz 2015-01-21 Roman Kern (KTI, TU Graz) Distributed Architectures 2015-01-21 1 / 64 Outline 1 Introduction 2 Independent
More informationTiming Results of a Parallel FFTsynth
Purdue University Purdue e-pubs Department of Computer Science Technical Reports Department of Computer Science 1994 Timing Results of a Parallel FFTsynth Robert E. Lynch Purdue University, rel@cs.purdue.edu
More informationCactus Tools for Petascale Computing
Cactus Tools for Petascale Computing Erik Schnetter Reno, November 2007 Gamma Ray Bursts ~10 7 km He Protoneutron Star Accretion Collapse to a Black Hole Jet Formation and Sustainment Fe-group nuclei Si
More informationCS425: Algorithms for Web Scale Data
CS425: Algorithms for Web Scale Data Most of the slides are from the Mining of Massive Datasets book. These slides have been modified for CS425. The original slides can be accessed at: www.mmds.org Challenges
More informationAgreement Protocols. CS60002: Distributed Systems. Pallab Dasgupta Dept. of Computer Sc. & Engg., Indian Institute of Technology Kharagpur
Agreement Protocols CS60002: Distributed Systems Pallab Dasgupta Dept. of Computer Sc. & Engg., Indian Institute of Technology Kharagpur Classification of Faults Based on components that failed Program
More information4th year Project demo presentation
4th year Project demo presentation Colm Ó héigeartaigh CASE4-99387212 coheig-case4@computing.dcu.ie 4th year Project demo presentation p. 1/23 Table of Contents An Introduction to Quantum Computing The
More informationLab Course: distributed data analytics
Lab Course: distributed data analytics 01. Threading and Parallelism Nghia Duong-Trung, Mohsan Jameel Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim, Germany International
More informationHigh Performance Computing
Master Degree Program in Computer Science and Networking, 2014-15 High Performance Computing 2 nd appello February 11, 2015 Write your name, surname, student identification number (numero di matricola),
More informationModern Optimization Techniques
Modern Optimization Techniques Lucas Rego Drumond Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Stochastic Gradient Descent Stochastic
More informationOverview: Synchronous Computations
Overview: Synchronous Computations barriers: linear, tree-based and butterfly degrees of synchronization synchronous example 1: Jacobi Iterations serial and parallel code, performance analysis synchronous
More informationUsing R for Iterative and Incremental Processing
Using R for Iterative and Incremental Processing Shivaram Venkataraman, Indrajit Roy, Alvin AuYoung, Robert Schreiber UC Berkeley and HP Labs UC BERKELEY Big Data, Complex Algorithms PageRank (Dominant
More informationHow to deal with uncertainties and dynamicity?
How to deal with uncertainties and dynamicity? http://graal.ens-lyon.fr/ lmarchal/scheduling/ 19 novembre 2012 1/ 37 Outline 1 Sensitivity and Robustness 2 Analyzing the sensitivity : the case of Backfilling
More informationPERFORMANCE METRICS. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah
PERFORMANCE METRICS Mahdi Nazm Bojnordi Assistant Professor School of Computing University of Utah CS/ECE 6810: Computer Architecture Overview Announcement Jan. 17 th : Homework 1 release (due on Jan.
More informationRainfall data analysis and storm prediction system
Rainfall data analysis and storm prediction system SHABARIRAM, M. E. Available from Sheffield Hallam University Research Archive (SHURA) at: http://shura.shu.ac.uk/15778/ This document is the author deposited
More informationBig Data Analytics. Lucas Rego Drumond
Big Data Analytics Lucas Rego Drumond Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Predictive Models Predictive Models 1 / 34 Outline
More informationTIME DEPENDENCE OF SHELL MODEL CALCULATIONS 1. INTRODUCTION
Mathematical and Computational Applications, Vol. 11, No. 1, pp. 41-49, 2006. Association for Scientific Research TIME DEPENDENCE OF SHELL MODEL CALCULATIONS Süleyman Demirel University, Isparta, Turkey,
More informationPatent Searching using Bayesian Statistics
Patent Searching using Bayesian Statistics Willem van Hoorn, Exscientia Ltd Biovia European Forum, London, June 2017 Contents Who are we? Searching molecules in patents What can Pipeline Pilot do for you?
More informationProgram Performance Metrics
Program Performance Metrics he parallel run time (par) is the time from the moment when computation starts to the moment when the last processor finished his execution he speedup (S) is defined as the
More informationClojure Concurrency Constructs, Part Two. CSCI 5828: Foundations of Software Engineering Lecture 13 10/07/2014
Clojure Concurrency Constructs, Part Two CSCI 5828: Foundations of Software Engineering Lecture 13 10/07/2014 1 Goals Cover the material presented in Chapter 4, of our concurrency textbook In particular,
More informationDivisible Load Scheduling
Divisible Load Scheduling Henri Casanova 1,2 1 Associate Professor Department of Information and Computer Science University of Hawai i at Manoa, U.S.A. 2 Visiting Associate Professor National Institute
More informationCPU SCHEDULING RONG ZHENG
CPU SCHEDULING RONG ZHENG OVERVIEW Why scheduling? Non-preemptive vs Preemptive policies FCFS, SJF, Round robin, multilevel queues with feedback, guaranteed scheduling 2 SHORT-TERM, MID-TERM, LONG- TERM
More informationCS 347. Parallel and Distributed Data Processing. Spring Notes 11: MapReduce
CS 347 Parallel and Distributed Data Processing Spring 2016 Notes 11: MapReduce Motivation Distribution makes simple computations complex Communication Load balancing Fault tolerance Not all applications
More information= ( 1 P + S V P) 1. Speedup T S /T V p
Numerical Simulation - Homework Solutions 2013 1. Amdahl s law. (28%) Consider parallel processing with common memory. The number of parallel processor is 254. Explain why Amdahl s law for this situation
More informationCRYSTAL in parallel: replicated and distributed (MPP) data. Why parallel?
CRYSTAL in parallel: replicated and distributed (MPP) data Roberto Orlando Dipartimento di Chimica Università di Torino Via Pietro Giuria 5, 10125 Torino (Italy) roberto.orlando@unito.it 1 Why parallel?
More informationComputer Architecture
Lecture 2: Iakovos Mavroidis Computer Science Department University of Crete 1 Previous Lecture CPU Evolution What is? 2 Outline Measurements and metrics : Performance, Cost, Dependability, Power Guidelines
More informationPerformance and Scalability. Lars Karlsson
Performance and Scalability Lars Karlsson Outline Complexity analysis Runtime, speedup, efficiency Amdahl s Law and scalability Cost and overhead Cost optimality Iso-efficiency function Case study: matrix
More informationTime. Today. l Physical clocks l Logical clocks
Time Today l Physical clocks l Logical clocks Events, process states and clocks " A distributed system a collection P of N singlethreaded processes without shared memory Each process p i has a state s
More informationClock Synchronization
Today: Canonical Problems in Distributed Systems Time ordering and clock synchronization Leader election Mutual exclusion Distributed transactions Deadlock detection Lecture 11, page 7 Clock Synchronization
More informationOur Problem. Model. Clock Synchronization. Global Predicate Detection and Event Ordering
Our Problem Global Predicate Detection and Event Ordering To compute predicates over the state of a distributed application Model Clock Synchronization Message passing No failures Two possible timing assumptions:
More informationOne Optimized I/O Configuration per HPC Application
One Optimized I/O Configuration per HPC Application Leveraging I/O Configurability of Amazon EC2 Cloud Mingliang Liu, Jidong Zhai, Yan Zhai Tsinghua University Xiaosong Ma North Carolina State University
More informationAdministrivia. Course Objectives. Overview. Lecture Notes Week markem/cs333/ 2. Staff. 3. Prerequisites. 4. Grading. 1. Theory and application
Administrivia 1. markem/cs333/ 2. Staff 3. Prerequisites 4. Grading Course Objectives 1. Theory and application 2. Benefits 3. Labs TAs Overview 1. What is a computer system? CPU PC ALU System bus Memory
More informationScience Analysis Tools Design
Science Analysis Tools Design Robert Schaefer Software Lead, GSSC July, 2003 GLAST Science Support Center LAT Ground Software Workshop Design Talk Outline Definition of SAE and system requirements Use
More informationNCU EE -- DSP VLSI Design. Tsung-Han Tsai 1
NCU EE -- DSP VLSI Design. Tsung-Han Tsai 1 Multi-processor vs. Multi-computer architecture µp vs. DSP RISC vs. DSP RISC Reduced-instruction-set Register-to-register operation Higher throughput by using
More informationParallelization of the QC-lib Quantum Computer Simulator Library
Parallelization of the QC-lib Quantum Computer Simulator Library Ian Glendinning and Bernhard Ömer September 9, 23 PPAM 23 1 Ian Glendinning / September 9, 23 Outline Introduction Quantum Bits, Registers
More informationOn the Optimal Recovery Threshold of Coded Matrix Multiplication
1 On the Optimal Recovery Threshold of Coded Matrix Multiplication Sanghamitra Dutta, Mohammad Fahim, Farzin Haddadpour, Haewon Jeong, Viveck Cadambe, Pulkit Grover arxiv:1801.10292v2 [cs.it] 16 May 2018
More informationDistributed Box-Constrained Quadratic Optimization for Dual Linear SVM
Distributed Box-Constrained Quadratic Optimization for Dual Linear SVM Lee, Ching-pei University of Illinois at Urbana-Champaign Joint work with Dan Roth ICML 2015 Outline Introduction Algorithm Experiments
More informationTime. To do. q Physical clocks q Logical clocks
Time To do q Physical clocks q Logical clocks Events, process states and clocks A distributed system A collection P of N single-threaded processes (p i, i = 1,, N) without shared memory The processes in
More informationModel Order Reduction via Matlab Parallel Computing Toolbox. Istanbul Technical University
Model Order Reduction via Matlab Parallel Computing Toolbox E. Fatih Yetkin & Hasan Dağ Istanbul Technical University Computational Science & Engineering Department September 21, 2009 E. Fatih Yetkin (Istanbul
More informationWHITE PAPER ON QUANTUM COMPUTING AND QUANTUM COMMUNICATION
WHITE PAPER ON QUANTUM COMPUTING AND QUANTUM COMMUNICATION Based on the discussion during the respective workshop at the ZEISS Symposium Optics in the Quantum World on 18 April 2018 in Oberkochen, Germany
More informationSection 6 Fault-Tolerant Consensus
Section 6 Fault-Tolerant Consensus CS586 - Panagiota Fatourou 1 Description of the Problem Consensus Each process starts with an individual input from a particular value set V. Processes may fail by crashing.
More informationParallel PIPS-SBB Multi-level parallelism for 2-stage SMIPS. Lluís-Miquel Munguia, Geoffrey M. Oxberry, Deepak Rajan, Yuji Shinano
Parallel PIPS-SBB Multi-level parallelism for 2-stage SMIPS Lluís-Miquel Munguia, Geoffrey M. Oxberry, Deepak Rajan, Yuji Shinano ... Our contribution PIPS-PSBB*: Multi-level parallelism for Stochastic
More informationECEN 651: Microprogrammed Control of Digital Systems Department of Electrical and Computer Engineering Texas A&M University
ECEN 651: Microprogrammed Control of Digital Systems Department of Electrical and Computer Engineering Texas A&M University Prof. Mi Lu TA: Ehsan Rohani Laboratory Exercise #4 MIPS Assembly and Simulation
More informationIII. Naïve Bayes (pp.70-72) Probability review
III. Naïve Bayes (pp.70-72) This is a short section in our text, but we are presenting more material in these notes. Probability review Definition of probability: The probability of an even E is the ratio
More informationParallelization of the QC-lib Quantum Computer Simulator Library
Parallelization of the QC-lib Quantum Computer Simulator Library Ian Glendinning and Bernhard Ömer VCPC European Centre for Parallel Computing at Vienna Liechtensteinstraße 22, A-19 Vienna, Austria http://www.vcpc.univie.ac.at/qc/
More informationCPU Scheduling. CPU Scheduler
CPU Scheduling These slides are created by Dr. Huang of George Mason University. Students registered in Dr. Huang s courses at GMU can make a single machine readable copy and print a single copy of each
More informationDegradable Agreement in the Presence of. Byzantine Faults. Nitin H. Vaidya. Technical Report #
Degradable Agreement in the Presence of Byzantine Faults Nitin H. Vaidya Technical Report # 92-020 Abstract Consider a system consisting of a sender that wants to send a value to certain receivers. Byzantine
More informationCOMPUTER SCIENCE TRIPOS
CST.2016.2.1 COMPUTER SCIENCE TRIPOS Part IA Tuesday 31 May 2016 1.30 to 4.30 COMPUTER SCIENCE Paper 2 Answer one question from each of Sections A, B and C, and two questions from Section D. Submit the
More informationWeatherHawk Weather Station Protocol
WeatherHawk Weather Station Protocol Purpose To log atmosphere data using a WeatherHawk TM weather station Overview A weather station is setup to measure and record atmospheric measurements at 15 minute
More informationCS 700: Quantitative Methods & Experimental Design in Computer Science
CS 700: Quantitative Methods & Experimental Design in Computer Science Sanjeev Setia Dept of Computer Science George Mason University Logistics Grade: 35% project, 25% Homework assignments 20% midterm,
More informationReview for the Midterm Exam
Review for the Midterm Exam 1 Three Questions of the Computational Science Prelim scaled speedup network topologies work stealing 2 The in-class Spring 2012 Midterm Exam pleasingly parallel computations
More informationDistributed Systems Byzantine Agreement
Distributed Systems Byzantine Agreement He Sun School of Informatics University of Edinburgh Outline Finish EIG algorithm for Byzantine agreement. Number-of-processors lower bound for Byzantine agreement.
More informationComputation Theory Finite Automata
Computation Theory Dept. of Computing ITT Dublin October 14, 2010 Computation Theory I 1 We would like a model that captures the general nature of computation Consider two simple problems: 2 Design a program
More informationLogical Time. 1. Introduction 2. Clock and Events 3. Logical (Lamport) Clocks 4. Vector Clocks 5. Efficient Implementation
Logical Time Nicola Dragoni Embedded Systems Engineering DTU Compute 1. Introduction 2. Clock and Events 3. Logical (Lamport) Clocks 4. Vector Clocks 5. Efficient Implementation 2013 ACM Turing Award:
More informationUML. Design Principles.
.. Babes-Bolyai University arthur@cs.ubbcluj.ro November 20, 2018 Overview 1 2 3 Diagrams Unified Modeling Language () - a standardized general-purpose modeling language in the field of object-oriented
More informationKnowledge Discovery and Data Mining 1 (VO) ( )
Knowledge Discovery and Data Mining 1 (VO) (707.003) Map-Reduce Denis Helic KTI, TU Graz Oct 24, 2013 Denis Helic (KTI, TU Graz) KDDM1 Oct 24, 2013 1 / 82 Big picture: KDDM Probability Theory Linear Algebra
More informationAGREEMENT PROBLEMS (1) Agreement problems arise in many practical applications:
AGREEMENT PROBLEMS (1) AGREEMENT PROBLEMS Agreement problems arise in many practical applications: agreement on whether to commit or abort the results of a distributed atomic action (e.g. database transaction)
More informationOn the Fundamental Limits of Coded Data Shuffling for Distributed Learning Systems
1 On the undamental Limits of oded Data Shuffling for Distributed Learning Systems del lmahdy, and Soheil Mohajer Department of lectrical and omputer ngineering, University of Minnesota, Minneapolis, MN,
More informationONLINE SCHEDULING OF MALLEABLE PARALLEL JOBS
ONLINE SCHEDULING OF MALLEABLE PARALLEL JOBS Richard A. Dutton and Weizhen Mao Department of Computer Science The College of William and Mary P.O. Box 795 Williamsburg, VA 2317-795, USA email: {radutt,wm}@cs.wm.edu
More informationChe-Wei Chang Department of Computer Science and Information Engineering, Chang Gung University
Che-Wei Chang chewei@mail.cgu.edu.tw Department of Computer Science and Information Engineering, Chang Gung University } 2017/11/15 Midterm } 2017/11/22 Final Project Announcement 2 1. Introduction 2.
More informationScalable Asynchronous Gradient Descent Optimization for Out-of-Core Models
Scalable Asynchronous Gradient Descent Optimization for Out-of-Core Models Chengjie Qin 1, Martin Torres 2, and Florin Rusu 2 1 GraphSQL, Inc. 2 University of California Merced August 31, 2017 Machine
More informationCIS 4930/6930: Principles of Cyber-Physical Systems
CIS 4930/6930: Principles of Cyber-Physical Systems Chapter 11 Scheduling Hao Zheng Department of Computer Science and Engineering University of South Florida H. Zheng (CSE USF) CIS 4930/6930: Principles
More informationComputational Frameworks. MapReduce
Computational Frameworks MapReduce 1 Computational complexity: Big data challenges Any processing requiring a superlinear number of operations may easily turn out unfeasible. If input size is really huge,
More informationMotors Automation Energy Transmission & Distribution Coatings. Servo Drive SCA06 V1.5X. Addendum to the Programming Manual SCA06 V1.
Motors Automation Energy Transmission & Distribution Coatings Servo Drive SCA06 V1.5X SCA06 V1.4X Series: SCA06 Language: English Document Number: 10003604017 / 01 Software Version: V1.5X Publication Date:
More informationUsing a CUDA-Accelerated PGAS Model on a GPU Cluster for Bioinformatics
Using a CUDA-Accelerated PGAS Model on a GPU Cluster for Bioinformatics Jorge González-Domínguez Parallel and Distributed Architectures Group Johannes Gutenberg University of Mainz, Germany j.gonzalez@uni-mainz.de
More informationModule 5: CPU Scheduling