Distributed Architectures

Software Architecture VO/KU (707023/707024)
Roman Kern (KTI, TU Graz)

Outline
1. Introduction
2. Independent operations
3. Distributed operations
4. Summary

Introduction
Why distributed architecture?

Distributed Architectures
- The goal is a scalable infrastructure that scales horizontally (scale out)
- Different levels of complexity, depending on the systems and the required attributes
- Certain approaches have evolved
- Frameworks have been developed

Parallel computing vs distributed computing
- In parallel computing, all components share a common memory; typically they are threads within a single program
- In distributed computing, each component has its own memory
- In distributed computing, the individual components are typically connected over a network
- Dedicated programming languages (or extensions) exist for parallel computing


Different levels of complexity
- Lowest complexity for operations that can easily be distributed: they are independent and short enough to be executed independently of each other
- Higher complexity for operations that compute a single result on multiple nodes

Independent operations
In the best case

Independent operations
- In a simple scenario, the system contains only separate, independent operations
- No operation requires complex interactions
- Input data are typically small chunks
- Shared repository: all the data is available on all nodes


Independent operations
There are still a number of issues to address:
1. Group membership
2. Leader election
3. Queues (distribution of workload)
4. Distributed locks
5. Barriers
6. Shared resources
7. Configuration

Independent operations - Group membership
When a single node comes online:
- How does it know where to connect to?
- How do the other members learn about the added node?

Independent operations - Group membership
Peer-to-peer architectural style
- Each node is a client as well as a server
- Group membership is part of the bootstrapping mechanism
- Dynamic vs static:
  - Fully dynamic via broadcast/multicast within local area networks (UDP)
  - Centralised P2P, e.g. central login components/servers
  - Static lists of group members (need to be configurable)

Independent operations - Leader Election
- Not all nodes are equal, e.g. centralised components in P2P networks
- A single node acts as master, the others are workers
- Some nodes have additional responsibilities (supernodes)
- Having centralised components makes some functionality easier to implement, e.g. assigning the workload
- Disadvantage: might lead to a single point of failure

Independent operations - Leader Election
Client-server architectural style
- Once the leader has been elected, it takes over the role of the server
- All other group members then act as clients
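The election itself is not spelled out on the slides. As a minimal, single-process sketch, one common convention is assumed here: the live node with the smallest id acts as leader (server), and all others are clients. The class name `LeaderElectionSketch` and its methods are hypothetical and only illustrate the idea; a real system would also need failure detection and agreement over the network.

```java
import java.util.Optional;
import java.util.TreeSet;

/** In-process sketch: the live node with the smallest id acts as leader. */
final class LeaderElectionSketch {
    // Sorted set of the ids of all nodes currently considered alive.
    private final TreeSet<Integer> liveNodes = new TreeSet<>();

    /** A node comes online and joins the group. */
    synchronized void join(int nodeId) {
        liveNodes.add(nodeId);
    }

    /** A node leaves the group or is detected as failed. */
    synchronized void leave(int nodeId) {
        liveNodes.remove(nodeId);
    }

    /** Returns the current leader, or empty if no node is alive. */
    synchronized Optional<Integer> leader() {
        return liveNodes.isEmpty() ? Optional.empty() : Optional.of(liveNodes.first());
    }

    public static void main(String[] args) {
        LeaderElectionSketch group = new LeaderElectionSketch();
        group.join(7);
        group.join(3);
        group.join(5);
        System.out.println("leader: " + group.leader()); // smallest id among 3, 5, 7
        group.leave(3);                                   // the current leader fails
        System.out.println("leader: " + group.leader()); // next smallest id takes over
    }
}
```

Because the leader is derived from the membership list, a failed leader is replaced as soon as its departure is noticed, which softens (but does not remove) the single point of failure mentioned above.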


Independent operations - Queues
- Queues are an important component in many distributed systems
- Two types of nodes: the manager of the queue and the workers
- Incoming requests are collected at a single point and stored as items in a queue
- Many client nodes consume items from the queue

Independent operations - Queues
- Queues are often FIFO (first-in, first-out)
- Sometimes specific items are of higher priority
- The crucial aspect is the coordinated access to the queue
- Each item is processed by only a single client
- What if the client crashes while processing an item from the queue?
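One possible answer to the crash question, sketched here under assumptions that are not on the slides: the queue manager hands out each item with a lease, and an item whose lease expires without an acknowledgement is put back at the head of the queue. The class `LeasedQueueSketch`, its methods and the explicit `now` parameter (used instead of a real clock to keep the sketch deterministic) are hypothetical; items are assumed to be unique strings.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Optional;

/** FIFO work queue whose items return to the queue if a worker never acknowledges them. */
final class LeasedQueueSketch {
    private final ArrayDeque<String> pending = new ArrayDeque<>();
    private final Map<String, Long> leases = new HashMap<>(); // item -> lease expiry time
    private final long leaseMillis;

    LeasedQueueSketch(long leaseMillis) {
        this.leaseMillis = leaseMillis;
    }

    /** The queue manager stores an incoming request. */
    synchronized void offer(String item) {
        pending.addLast(item);
    }

    /** A worker takes the next item; the item stays leased until ack() or expiry. */
    synchronized Optional<String> take(long now) {
        requeueExpired(now);
        String item = pending.pollFirst();
        if (item == null) {
            return Optional.empty();
        }
        leases.put(item, now + leaseMillis);
        return Optional.of(item);
    }

    /** A worker reports success; returns false if the lease was unknown or already expired. */
    synchronized boolean ack(String item) {
        return leases.remove(item) != null;
    }

    // Items whose worker presumably crashed are moved back to the head of the queue.
    private void requeueExpired(long now) {
        Iterator<Map.Entry<String, Long>> it = leases.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> lease = it.next();
            if (lease.getValue() <= now) {
                pending.addFirst(lease.getKey());
                it.remove();
            }
        }
    }
}
```

With this design an item may be processed more than once (for example if a slow worker finishes after its lease expired), so the processing should be idempotent.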

Independent operations - Queues
Publish-subscribe architectural style
- Basically a producer-consumer pattern
- Each worker client registers itself
- The queue manager notifies the workers of new items
- How should the workers be scheduled, i.e. which one should be picked next?

Independent operations - Locks
Distributed Locks
- Restrict access to a shared resource to a single node at a time
- E.g. allow only a single node to write to a file
- May cause many non-trivial problems, for example deadlocks or race conditions
- Distributed locks without a central component are very complex to realise

Independent operations - Locks
Blackboard architectural style
- The shared repository is responsible for orchestrating the access to the locks
- It notifies waiting nodes once the lock has been lifted
- This functionality is often coupled with the elected leader

Independent operations - Barriers
- A specific type of distributed lock
- Synchronises multiple nodes
- E.g. multiple nodes should wait until a certain state has been reached
- Used when one part of the processing can be done in parallel and other parts cannot be distributed
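As a single-process analogy (threads standing in for nodes, which is an assumption of this sketch and not a distributed implementation), the standard `java.util.concurrent.CyclicBarrier` shows the behaviour: every worker finishes its parallel phase, waits at the barrier, and continues only once all workers have arrived. The class `BarrierSketch` and the method `runPhases` are hypothetical names.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

/** Threads play the role of nodes that must all finish phase one before phase two starts. */
final class BarrierSketch {
    /** Returns how many workers observed a completed phase one after passing the barrier. */
    static int runPhases(int nodes) {
        AtomicInteger phaseOneDone = new AtomicInteger();
        AtomicInteger sawCompletePhaseOne = new AtomicInteger();
        CyclicBarrier barrier = new CyclicBarrier(nodes);
        Thread[] workers = new Thread[nodes];
        for (int i = 0; i < nodes; i++) {
            workers[i] = new Thread(() -> {
                try {
                    phaseOneDone.incrementAndGet(); // the part that runs in parallel
                    barrier.await();                // wait until every worker got here
                    if (phaseOneDone.get() == nodes) {
                        sawCompletePhaseOne.incrementAndGet();
                    }
                } catch (InterruptedException | BrokenBarrierException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return sawCompletePhaseOne.get();
    }

    public static void main(String[] args) {
        System.out.println(runPhases(4) + " workers passed the barrier");
    }
}
```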

Independent operations - Shared Resources
- Needed if all nodes must be able to access a common data structure
- Read-only vs read-write
- If read-write, the complexity rises due to synchronisation issues

Apache Zookeeper
- Apache Zookeeper is a framework/library for coordination in distributed systems
- Used by Yahoo!, LinkedIn, Facebook
- Initially developed by Yahoo!, now managed by Apache
- Alternative approaches: Google Chubby, Microsoft Centrifuge

Apache Zookeeper
Components of Zookeeper
- Coordination kernel
- File-system like API
- Synchronisation, Watches, Locks
- Configuration
- Shared data
Example taken from: 2/zookeeperTutorialhtml

Example of a Barrier with Zookeeper

    Barrier(String address, String name, int size)
            throws KeeperException, InterruptedException, UnknownHostException {
        super(address);
        this.root = name;
        this.size = size;
        // Create the barrier root node if it does not exist yet
        Stat s = zk.exists(root, false);
        if (s == null)
            zk.create(root, new byte[0], Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        // My node name
        this.name = new String(InetAddress.getLocalHost().getCanonicalHostName().toString());
    }

Example of a Barrier with Zookeeper

    boolean enter() throws KeeperException, InterruptedException {
        // Register this node as an ephemeral child of the barrier node
        zk.create(root + "/" + name, new byte[0], Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
        while (true) {
            synchronized (mutex) {
                // Re-read the children and leave a watch that wakes up the mutex
                List<String> list = zk.getChildren(root, true);
                if (list.size() < size)
                    mutex.wait();
                else
                    return true;
            }
        }
    }

Example of a Queue with Zookeeper

    int consume() throws KeeperException, InterruptedException {
        int result = -1;
        Stat stat = null;
        while (true) {
            // Get the first element available
            synchronized (mutex) {
                List<String> list = zk.getChildren(root, true);
                if (!list.isEmpty()) {
                    // Child names are "element" plus a number; find the smallest number
                    Integer min = Integer.valueOf(list.get(0).substring(7));
                    for (String s : list) {
                        Integer tempValue = Integer.valueOf(s.substring(7));
                        if (tempValue < min) min = tempValue;
                    }
                    byte[] b = zk.getData(root + "/element" + min, false, stat);
                    zk.delete(root + "/element" + min, 0);
                    ByteBuffer buffer = ByteBuffer.wrap(b);
                    result = buffer.getInt();
                    return result;
                }
                mutex.wait(); // Going to wait
            }
        }
    }

Distributed operations
Split up the work into separate tasks

Distributed Operations
- If the processing cannot be split into separate, independent operations
- If the data is too big to fit on a single machine
- Then a single operation needs to be processed in a distributed manner

Contemporary Computing Environment
Hardware basics
- Access to data in memory is much faster than access to data on disk (or online)
- Disk seeks: no data is transferred from disk while the disk head is being positioned
- Therefore, transferring one large chunk of data from disk to memory is faster than transferring many small chunks
- Disk I/O is block-based: entire blocks are read and written (as opposed to smaller chunks)

Map/Reduce
Distributed indexing at Google
- For web-scale indexing, a distributed computing cluster must be used
- Individual machines are fault-prone: they can unpredictably slow down or fail
- Based on a distributed file system:
  - Files are stored among different machines
  - Redundant storage
  - Information about the storage is available to other components

Map/Reduce
- MapReduce (Dean and Ghemawat 2004) is a robust and conceptually simple framework for distributed computing
- Motivated by the indexing system at Google, which consists of a number of phases, each implemented in MapReduce
- Approach: bring the code to the data
- Distributed computing without having to write code for the distribution part

Google Infrastructure
- Google data centres mainly contain commodity machines
- Data centres are distributed around the world
- Estimate: a total of 1 million servers, 3 million processors/cores (Gartner 2007)
- Estimate: Google installs 100,000 servers each quarter
- Based on expenditures of million dollars per year
- This would be 10% of the computing capacity of the world

Map/Reduce
[Figure: the input data is divided into splits (Split 1 to Split 5); map workers (Map 1 to Map 3) turn the splits into intermediate data; reduce workers (Reduce 1, Reduce 2) turn the intermediate data into the output data (Output 1, Output 2).]

Map/Reduce
- Task of the mapper: read a chunk of the input data and generate intermediate keys plus values
- Task of the reducer: process a tuple of an intermediate key plus its values and write the output
- Note: often a number of additional functions need to be provided as well

Function   Input           Output
Mapper     k1, v1          list(k2, v2)
Reducer    k2, list(v2)    list(k3, v3)

Example of a Mapper
Word counting without Map/Reduce, for comparison:

    void countWordsOldSchool() {
        Map<String, Integer> wordToCountMap = new HashMap<String, Integer>();
        List<File> fileList = dir.listFiles();
        for (File file : fileList) {
            String content = IOUtils.readFileToString(file);
            List<String> wordList = tokenizeIntoWords(content);
            for (String word : wordList) {
                increment(word, 1);
            }
        }
        writeToFile(wordToCountMap);
    }

Example of a Mapper

    void map(int documentId, String content) {
        List<String> wordList = tokenizeIntoWords(content);
        for (String word : wordList) {
            yield(word, 1);
        }
    }

Example of a Reducer

    void reduce(String word, List<Integer> countList) {
        int counter = 0;
        for (Integer count : countList) {
            counter += count;
        }
        write(word, counter);
    }
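To see how the mapper and reducer signatures fit together, the following single-process sketch plays the role of the framework: it calls the mapper for every input record, groups and sorts the intermediate pairs by key, and calls the reducer once per key. Everything here is an illustrative assumption (the class `MiniMapReduce`, the `run` method, the use of `Map.Entry` for key/value pairs); it is not the Hadoop API and does no distribution at all.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;

/** Single-process stand-in for a Map/Reduce framework. */
final class MiniMapReduce {
    /** mapper: (k1, v1) -> list(k2, v2); reducer: (k2, list(v2)) -> one output record. */
    static <K1, V1, K2 extends Comparable<K2>, V2, R> List<R> run(
            List<Map.Entry<K1, V1>> input,
            BiFunction<K1, V1, List<Map.Entry<K2, V2>>> mapper,
            BiFunction<K2, List<V2>, R> reducer) {
        // "Shuffle and sort": group all intermediate values by key, keys in sorted order.
        TreeMap<K2, List<V2>> groups = new TreeMap<>();
        for (Map.Entry<K1, V1> record : input) {
            for (Map.Entry<K2, V2> kv : mapper.apply(record.getKey(), record.getValue())) {
                groups.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        // One reducer call per intermediate key.
        List<R> output = new ArrayList<>();
        for (Map.Entry<K2, List<V2>> group : groups.entrySet()) {
            output.add(reducer.apply(group.getKey(), group.getValue()));
        }
        return output;
    }

    /** Word count expressed with the mapper and reducer from the slides. */
    static List<String> wordCount(List<String> documents) {
        List<Map.Entry<Integer, String>> input = new ArrayList<>();
        for (int i = 0; i < documents.size(); i++) {
            input.add(new SimpleEntry<>(i, documents.get(i)));
        }
        return MiniMapReduce.<Integer, String, String, Integer, String>run(
                input,
                (documentId, content) -> {
                    List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
                    for (String word : content.toLowerCase().split("\\W+")) {
                        if (!word.isEmpty()) {
                            pairs.add(new SimpleEntry<>(word, 1)); // yield(word, 1)
                        }
                    }
                    return pairs;
                },
                (word, countList) -> {
                    int counter = 0;
                    for (int count : countList) {
                        counter += count;
                    }
                    return word + "=" + counter; // write(word, counter)
                });
    }

    public static void main(String[] args) {
        List<String> documents = new ArrayList<>();
        documents.add("a b a");
        documents.add("b c");
        System.out.println(wordCount(documents));
    }
}
```

In a real framework the mapper calls run on different machines and the grouping step moves data over the network; the user code, however, stays as small as the two lambdas above.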


Overview Inverted Index
- Input: documents to be indexed; the input documents are parsed and the text is extracted
  3 Friends, Romans, countrymen
- Tokenizer: produces a token stream from the text
  3 Friends Romans countrymen
- Linguistic models: analyse and modify the tokens
  3 friends romans countrymen
- Indexer: collects the tokens and inverts the data structure
  countrymen 2 3 friends romans 3 9
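The pipeline above can be sketched in a few lines of Java. The sketch makes simplifying assumptions that are not on the slides: the tokenizer splits on anything that is not a letter, the "linguistic model" is reduced to lower-casing, and the index maps each term to a sorted set of document numbers. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Tokenizer, normalisation and indexer of a toy inverted index. */
final class InvertedIndexSketch {
    /** Tokenizer: produces a token stream from the text. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Linguistic model, reduced here to case folding (an assumption of this sketch). */
    static String normalize(String token) {
        return token.toLowerCase(Locale.ROOT);
    }

    /** Indexer: inverts document -> terms into term -> sorted document numbers. */
    static TreeMap<String, TreeSet<Integer>> index(Map<Integer, String> documents) {
        TreeMap<String, TreeSet<Integer>> postings = new TreeMap<>();
        for (Map.Entry<Integer, String> document : documents.entrySet()) {
            for (String token : tokenize(document.getValue())) {
                postings.computeIfAbsent(normalize(token), term -> new TreeSet<>())
                        .add(document.getKey());
            }
        }
        return postings;
    }

    public static void main(String[] args) {
        Map<Integer, String> documents = new TreeMap<>();
        documents.put(3, "Friends, Romans, countrymen");
        documents.put(9, "Romans go home"); // hypothetical second document
        System.out.println(index(documents));
    }
}
```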


Detail Inverted Index
Step 1: Build the term-document table

Document 1: I did enact Julius Caesar I was killed in the Capitol; Brutus killed me
Document 2: So let it be with Caesar The noble Brutus hath told you Caesar was ambitious

Term       Doc #
i          1
did        1
enact      1
julius     1
caesar     1
i          1
was        1
killed     1
in         1
the        1
capitol    1
brutus     1
killed     1
me         1
so         2
let        2
it         2
be         2
with       2
caesar     2


Detail Inverted Index
Step 2: Sort by terms
The term-document table from step 1 is sorted by term:

Term       Doc #
ambitious  2
be         2
brutus     1
brutus     2
capitol    1
caesar     1
caesar     2
caesar     2
did        1
enact      1
hath       2
i          1
i          1
in         1
it         2
julius     1
killed     1
killed     1
let        2
me         1


Detail Inverted Index
Step 3: Add the term frequency (TF); multiple entries from a single document get merged

Term       Doc #  TF
ambitious  2      1
be         2      1
brutus     1      1
brutus     2      1
capitol    1      1
caesar     1      1
caesar     2      2
did        1      1
enact      1      1
hath       2      1
i          1      2
in         1      1
it         2      1
julius     1      1
killed     1      2
let        2      1
me         1      1
noble      2      1
so         2      1
the        1      1


Detail Inverted Index
Step 4: The result is split into a dictionary file and a postings file
The (term, Doc #, TF) table from step 3 is the input. DF is the number of documents containing the term, CF the total number of occurrences; both follow from the postings.

Dictionary
#  Term       DF  CF
0  ambitious  1   1
1  be         1   1
2  brutus     2   2
3  capitol    1   1
4  caesar     2   3
5  did        1   1
6  enact      1   1
7  hath       1   1
8  i          1   2

Postings
Term #  Postings {Doc #, TF}
0       2,1
1       2,1
2       1,1  2,1
3       1,1
4       1,1  2,2
5       1,1
6       1,1
7       2,1
8       1,2

Index Construction
What is the role of the Map/Reduce framework when building such an index?

Index Construction
Recall step 1 of inverted index creation:

Document 1: I did enact Julius Caesar I was killed in the Capitol; Brutus killed me
Document 2: So let it be with Caesar The noble Brutus hath told you Caesar was ambitious

Term/Doc # pairs: i 1, did 1, enact 1, julius 1, caesar 1, i 1, was 1, killed 1, in 1, the 1, capitol 1, brutus 1, killed 1, me 1, so 2, let 2, it 2, be 2, with 2, caesar 2

Index Creation
- After all documents have been parsed, the inverted file is sorted by terms
- There might be many items to sort

Before sorting (Term Doc #): i 1, did 1, enact 1, julius 1, caesar 1, i 1, was 1, killed 1, in 1, the 1, capitol 1, brutus 1, killed 1, me 1, so 2, let 2, it 2, be 2, with 2, caesar 2
After sorting (Term Doc #): ambitious 2, be 2, brutus 1, brutus 2, capitol 1, caesar 1, caesar 2, caesar 2, did 1, enact 1, hath 2, i 1, i 1, in 1, it 2, julius 1, killed 1, killed 1, let 2, me 1

Index Construction
- Map step: parse the documents and yield the terms as keys
- Framework: sort the keys from the mappers
- Reduce step: collect all keys and write out the inverted index
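The three steps can be tried out on the two example documents with a small single-process sketch. It is only an illustration under stated assumptions: the class `IndexConstructionSketch` is hypothetical, a `TreeMap` stands in for the sorting done by the framework, terms are lower-cased and split on non-letters, and the reducer writes each term as a line of `doc:tf` postings.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Map, sort and reduce steps of index construction, run in one process. */
final class IndexConstructionSketch {
    /** Map step: parse one document and yield (term, docId) pairs. */
    static List<Map.Entry<String, Integer>> map(int docId, String content) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String term : content.toLowerCase().split("[^a-z]+")) {
            if (!term.isEmpty()) {
                pairs.add(new SimpleEntry<>(term, docId));
            }
        }
        return pairs;
    }

    /** Reduce step: merge the document ids of one term into "doc:tf" postings. */
    static String reduce(String term, List<Integer> docIds) {
        TreeMap<Integer, Integer> termFrequency = new TreeMap<>();
        for (int docId : docIds) {
            termFrequency.merge(docId, 1, Integer::sum);
        }
        StringBuilder line = new StringBuilder(term).append(" ->");
        for (Map.Entry<Integer, Integer> posting : termFrequency.entrySet()) {
            line.append(' ').append(posting.getKey()).append(':').append(posting.getValue());
        }
        return line.toString();
    }

    /** Stand-in for the framework: run the mappers, sort by key, run the reducers. */
    static List<String> build(List<String> documents) {
        TreeMap<String, List<Integer>> sorted = new TreeMap<>();
        for (int i = 0; i < documents.size(); i++) {
            for (Map.Entry<String, Integer> pair : map(i + 1, documents.get(i))) {
                sorted.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        List<String> index = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> entry : sorted.entrySet()) {
            index.add(reduce(entry.getKey(), entry.getValue()));
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> documents = new ArrayList<>();
        documents.add("I did enact Julius Caesar I was killed in the Capitol; Brutus killed me");
        documents.add("So let it be with Caesar The noble Brutus hath told you Caesar was ambitious");
        for (String line : build(documents)) {
            System.out.println(line);
        }
    }
}
```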

Map/Reduce Framework
- Existing open-source framework: Apache Hadoop
- Implemented in Java
- Initially developed by Yahoo!
- Now used by many companies and organisations

Big Data Framework
- Map/Reduce is well suited for batch processing, less so for online processing, e.g. an incoming stream of Twitter messages
- Need for a distributed realtime computation system

Big Data Framework
Storm framework:
- Scalable, fault-tolerant, guaranteed message processing
- Multi-language support: Thrift definitions, JSON-based protocol (for non-JVM languages)
- Uses ZeroMQ for message passing and Zookeeper for cluster setup
- No storage capabilities

Storm
- Topologies: the "job"; defines how spouts and bolts are connected
- Spouts: sources of streams; deliver data to bolts
- Bolts: processing units (can produce input for other bolts)


Storm Topology

    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("sentences", new SentenceSpout(), 5);
    builder.setBolt("split", new SplitSentence(), 8)
           .shuffleGrouping("sentences");
    builder.setBolt("count", new WordCount(), 12)
           .fieldsGrouping("split", new Fields("word"));

Storm Spout

    public class SentenceSpout extends BaseRichSpout {
        public void nextTuple() {
            Sentence s = queue.poll();
            if (s == null) {
                // Nothing to emit right now, back off briefly
                Utils.sleep(50);
            } else {
                _collector.emit(new Values(s));
            }
        }
    }

Storm Bolt #1

    public class SplitSentence extends BaseRichBolt {
        public void execute(Tuple tuple) {
            String row = tuple.getString(0);
            String[] words = row.split(" ");
            for (String word : words) {
                // Anchor each word to the input tuple
                collector.emit(tuple, new Values(word));
            }
            collector.ack(tuple);
        }

        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word"));
        }
    }

Storm Bolt #2

    public class WordCount implements IBasicBolt {
        private Map<String, Integer> _counts = new HashMap<String, Integer>();

        public void execute(Tuple tuple, BasicOutputCollector collector) {
            String word = tuple.getString(0);
            int count;
            if (_counts.containsKey(word)) {
                count = _counts.get(word);
            } else {
                count = 0;
            }
            count++;
            _counts.put(word, count);
            collector.emit(new Values(word, count));
        }

        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word", "count"));
        }
    }

Summary
Main things to watch out for

Summary
- If the system needs to be scalable, it needs to be designed appropriately
- In a simple scenario, the load is distributed via individual operations
- For more demanding operations, specific approaches are necessary

Summary
The simple scenario
- Scalability is often limited by dedicated central components, e.g. the master node
- Performance bottlenecks for shared resources
- No guarantee on execution order
- Limited suitability for interactive applications

Summary
The scenario with a complex operation
- Scalability is very good
- High complexity when implementing
- Not suited for interactive applications

The End
Next: Examination

Information Retrieval Using Boolean Model SEEM5680

Information Retrieval Using Boolean Model SEEM5680 Information Retrieval Using Boolean Model SEEM5680 1 Unstructured (text) vs. structured (database) data in 1996 2 2 Unstructured (text) vs. structured (database) data in 2009 3 3 The problem of IR Goal

More information

MapReduce in Spark. Krzysztof Dembczyński. Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland

MapReduce in Spark. Krzysztof Dembczyński. Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland MapReduce in Spark Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Software Development Technologies Master studies, second semester

More information

CS425: Algorithms for Web Scale Data

CS425: Algorithms for Web Scale Data CS425: Algorithms for Web Scale Data Most of the slides are from the Mining of Massive Datasets book. These slides have been modified for CS425. The original slides can be accessed at: www.mmds.org Challenges

More information

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2016)

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2016) Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2016) Week 12: Real-Time Data Analytics (2/2) March 31, 2016 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo

More information

boolean queries Inverted index query processing Query optimization boolean model January 15, / 35

boolean queries Inverted index query processing Query optimization boolean model January 15, / 35 boolean model January 15, 2017 1 / 35 Outline 1 boolean queries 2 3 4 2 / 35 taxonomy of IR models Set theoretic fuzzy extended boolean set-based IR models Boolean vector probalistic algebraic generalized

More information

MapReduce in Spark. Krzysztof Dembczyński. Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland

MapReduce in Spark. Krzysztof Dembczyński. Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland MapReduce in Spark Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Software Development Technologies Master studies, first semester

More information

Knowledge Discovery and Data Mining 1 (VO) ( )

Knowledge Discovery and Data Mining 1 (VO) ( ) Knowledge Discovery and Data Mining 1 (VO) (707.003) Map-Reduce Denis Helic KTI, TU Graz Oct 24, 2013 Denis Helic (KTI, TU Graz) KDDM1 Oct 24, 2013 1 / 82 Big picture: KDDM Probability Theory Linear Algebra

More information

V.4 MapReduce. 1. System Architecture 2. Programming Model 3. Hadoop. Based on MRS Chapter 4 and RU Chapter 2 IR&DM 13/ 14 !74

V.4 MapReduce. 1. System Architecture 2. Programming Model 3. Hadoop. Based on MRS Chapter 4 and RU Chapter 2 IR&DM 13/ 14 !74 V.4 MapReduce. System Architecture 2. Programming Model 3. Hadoop Based on MRS Chapter 4 and RU Chapter 2!74 Why MapReduce? Large clusters of commodity computers (as opposed to few supercomputers) Challenges:

More information

The Boolean Model ~1955

The Boolean Model ~1955 The Boolean Model ~1955 The boolean model is the first, most criticized, and (until a few years ago) commercially more widespread, model of IR. Its functionalities can often be found in the Advanced Search

More information

Spatial Analytics Workshop

Spatial Analytics Workshop Spatial Analytics Workshop Pete Skomoroch, LinkedIn (@peteskomoroch) Kevin Weil, Twitter (@kevinweil) Sean Gorman, FortiusOne (@seangorman) #spatialanalytics Introduction The Rise of Spatial Analytics

More information

Big Data Analytics. Lucas Rego Drumond

Big Data Analytics. Lucas Rego Drumond Lucas Rego Drumond Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Map Reduce I Map Reduce I 1 / 32 Outline 1. Introduction 2. Parallel

More information

Che-Wei Chang Department of Computer Science and Information Engineering, Chang Gung University

Che-Wei Chang Department of Computer Science and Information Engineering, Chang Gung University Che-Wei Chang chewei@mail.cgu.edu.tw Department of Computer Science and Information Engineering, Chang Gung University } 2017/11/15 Midterm } 2017/11/22 Final Project Announcement 2 1. Introduction 2.

More information

Rainfall data analysis and storm prediction system

Rainfall data analysis and storm prediction system Rainfall data analysis and storm prediction system SHABARIRAM, M. E. Available from Sheffield Hallam University Research Archive (SHURA) at: http://shura.shu.ac.uk/15778/ This document is the author deposited

More information

CS 347. Parallel and Distributed Data Processing. Spring Notes 11: MapReduce

CS 347. Parallel and Distributed Data Processing. Spring Notes 11: MapReduce CS 347 Parallel and Distributed Data Processing Spring 2016 Notes 11: MapReduce Motivation Distribution makes simple computations complex Communication Load balancing Fault tolerance Not all applications

More information

Data-Intensive Distributed Computing

Data-Intensive Distributed Computing Data-Intensive Distributed Computing CS 451/651 431/631 (Winter 2018) Part 9: Real-Time Data Analytics (2/2) March 29, 2018 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo

More information

Lecture 4: Process Management

Lecture 4: Process Management Lecture 4: Process Management Process Revisited 1. What do we know so far about Linux on X-86? X-86 architecture supports both segmentation and paging. 48-bit logical address goes through the segmentation

More information

Distributed Systems Principles and Paradigms. Chapter 06: Synchronization

Distributed Systems Principles and Paradigms. Chapter 06: Synchronization Distributed Systems Principles and Paradigms Maarten van Steen VU Amsterdam, Dept. Computer Science Room R4.20, steen@cs.vu.nl Chapter 06: Synchronization Version: November 16, 2009 2 / 39 Contents Chapter

More information

Distributed Systems Principles and Paradigms

Distributed Systems Principles and Paradigms Distributed Systems Principles and Paradigms Chapter 6 (version April 7, 28) Maarten van Steen Vrije Universiteit Amsterdam, Faculty of Science Dept. Mathematics and Computer Science Room R4.2. Tel: (2)

More information

Clustering algorithms distributed over a Cloud Computing Platform.

Clustering algorithms distributed over a Cloud Computing Platform. Clustering algorithms distributed over a Cloud Computing Platform. SEPTEMBER 28 TH 2012 Ph. D. thesis supervised by Pr. Fabrice Rossi. Matthieu Durut (Telecom/Lokad) 1 / 55 Outline. 1 Introduction to Cloud

More information

SDS developer guide. Develop distributed and parallel applications in Java. Nathanaël Cottin. version
