
Introduction to Column Stores with MonetDB and Benchmark

Seminar Database Systems
Master of Science in Engineering, Major Software and Systems
HSR Hochschule für Technik Rapperswil

Supervisor: Prof. Stefan Keller
Author: Jannis Grimm

Rapperswil, February 2016

Abstract

Column-store databases (also: column-oriented databases) are a new direction for database management systems. This paper explains the differences to traditional row-oriented databases, goes into greater detail with MonetDB as a column-store database example and concludes with a comparative benchmark between PostgreSQL and MonetDB.

Contents

1. Introduction
2. Column-Store Databases
   2.1. Main differences to row-based databases
   2.2. Advantages and Disadvantages to Row-based Databases
3. MonetDB Introduction
4. MonetDB's Column-Store Implementation
   4.1. Vertical Fragmentation
   4.2. Execution Engine
   4.3. Recycling and Updates
   4.4. Adaptive Indexes (Database Cracking)
   4.5. Data Compression
   4.6. Best Practices
5. Benchmark Adaption From PostgreSQL to MonetDB
   5.1. Dropping existing tables
   5.2. Create tables
   5.3. Import from CSV
   5.4. echo
   5.5. Rename fields
6. Benchmark Execution Environment
   6.1. Physical Machine
   6.2. Virtual Machine
7. Benchmark Execution Results
   7.1. Table gnis
        Query 1a and 1b
        Query 2a and 2b
        Query 3a and 3b
        Query 4a and 4b
   7.2. Table osm_poi_ch
        Query 10x, 10a, 10b, and 10c
        Query 11a, 11b, and 11c
        Query 12a, 12b, and 12c
        Query 13a, 13b, and 13c
        Query 14a, 14b, and 14c
8. Conclusion
References
A. List of Figures
B. Bachelor/Master Lab Lesson
   B.1. Installation
   B.2. Create and Import
   B.3. Insert, Update and Delete
   B.4. Comparison
   B.5. Database Cracking
C. PostgreSQL Scripts
   C.1. gnis_load.sql
   C.2. osm_poi_ch_load.sql
   C.3. bm_prepare.sql
   C.4. bm.sql
D. MonetDB Scripts
   D.1. gnis_load.sql
   D.2. osm_poi_ch_load.sql
   D.3. bm_prepare.sql
   D.4. bm.sql
E. PostgreSQL Benchmark Commands and Output
F. MonetDB Benchmark Commands and Output

1. Introduction

Column-store databases (also: column-oriented databases) are a new direction for database management systems. This paper explains the differences to traditional row-oriented databases, goes into greater detail with MonetDB as a column-store database example and concludes with a comparative benchmark between PostgreSQL and MonetDB.

Chapter 2 starts with the introduction to column-store databases, together with a comparison to traditional row-based databases. Chapter 3 introduces MonetDB as an example of a column-store database. The implementation of the column-store theory in MonetDB is the topic of Chapter 4; this chapter also covers the best practices for working with MonetDB. Chapters 5, 6, and 7 cover the benchmark: while Chapter 5 explains how the given PostgreSQL benchmark implementation was ported to a MonetDB version, Chapter 6 goes into greater detail about the environment in which the benchmark was executed. Chapter 7 shows the results, together with diagrams and an explanation of how the differences between the PostgreSQL and MonetDB times arise. Finally, Chapter 8 gives a conclusion and summarizes the most important findings of this paper.

2. Column-Store Databases

2.1. Main differences to row-based databases

Most databases store data tables in a row-based format, i.e. in every table, each tuple follows the other: the values of all of a tuple's columns are stored sequentially for one row before the values for the next row. If there is a table with the columns 1, 2, 3, and 4, the database would internally store the values of columns 1 to 4 for the first tuple, followed by the values of columns 1 to 4 for the second tuple, and so on. This is shown in Figure 1.

Figure 1: The storage of a table in the row-based format

Because primary storage (e.g. a hard disk) is accessed in blocks of consecutive data, this means that full rows are loaded into CPU registers for processing.

Column-oriented databases follow the opposite storage pattern, i.e. vertical fragmentation: all values of one column are stored before the values of the next column. With a table with the columns 1, 2, 3, and 4, the database would internally store all values of column 1, then all values of column 2, and so on. This is shown in Figure 2.

Figure 2: The storage of a table in the column-oriented format

This allows aggregating or searching single columns without the need to load, and afterwards discard, the unneeded other columns in memory. More details about the concept and implementation of column stores can be found in [ABH+13]. This paper explains the implementation using MonetDB as the example in Chapter 4.

2.2. Advantages and Disadvantages to Row-based Databases

Column-oriented databases are faster when searching for values in a single column, because all values of a single column are saved together. This is especially noticeable with big tables, because more values fit together in one block, which means fewer hard disk accesses are needed. With row-based databases, each value is followed by unneeded values from other columns, so each hard disk block contains fewer values from the column to search in.

Additionally, column-oriented databases are faster when building aggregate values (e.g. sums) over few columns but many rows, because unlike with row-based databases, not every column has to be read. The values are stored together, so both the access time and the computation time are lower than when the values are spread out in the internal data storage.

Furthermore, column-oriented databases are faster when a column of every row has to be changed, because the values are stored next to each other, so single hard disk blocks can be overwritten instead of having to parse the whole table to find the values to change.
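As a rough illustration of these access patterns, consider the gnis table used later in this paper (the key value below is a made-up example). The first query aggregates a single column over all rows and therefore favors a column store; the second fetches every column of one row by its key and therefore favors a row store:

-- Column-store friendly: only the elevation column has to be read.
SELECT avg(elevation) FROM gnis;

-- Row-store friendly: all columns of a single row, found in one
-- consecutive block in a row-based layout (12345 is a hypothetical key).
SELECT * FROM gnis WHERE fid = 12345;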

Finally, column-oriented databases can be compressed better, because similar values are grouped together. If multiple tuples contain the same value for a column, the value only needs to be stored once together with the row numbers, instead of being repeated for every tuple. Most often, however, we are more interested in the speed savings than in the storage savings.

On the other side, row-oriented databases are faster when many columns of single rows are needed and when inserting new rows with values for every column.

In summary, it heavily depends on the data usage which storage system is faster. In general, column-oriented databases are preferred for statistical usage and when most data queries touch only a small subset of columns, whereas row-based databases are faster when most columns are generally read or when there are more insertions of new rows with many values than data queries.

3. MonetDB Introduction

MonetDB is an open-source column-store database management system. It has been developed at the CWI database architectures research group since 1993. As described in [IGN+12, Ch. 1], the primary target for MonetDB was warehouse applications; it is also used for e-science, in health care, in telecommunications and in sciences such as astronomy. MonetDB supports the SQL:2003 standard.[1]

[1] Supported SQL features can be found at [Mon15c], unsupported features at [Mon15d].

As a column-oriented database management system, it uses vertical fragmentation and has an execution engine tailored for columnar execution. It is designed to exploit modern hardware, e.g. the large main memories of modern computer systems, by deploying cache-conscious data structures and algorithms that make use of hierarchical memory systems. Notably, in contrast to most other database management systems, MonetDB is optimized to minimize CPU cache misses rather than I/Os, because it was found that CPU speed advances have outpaced advances in memory latency, as described in [BMK99]. It mainly focuses on analytical and scientific workloads that are read-dominated and where updates mostly consist of appending new data to the database in large chunks at a time.

One main algorithmic principle is supporting a priori unknown or rapidly changing workloads over large data volumes. Examples for this are the intermediate result caching technique called recycling and the adaptive indexing technique called database cracking, which are explained in more detail in Chapter 4. They require only minimal overhead to provide benefit for the actual workload and hot data.

MonetDB also supports extensibility. Both its core and the SQL syntax may be extended in C or in MonetDB's own MAL language. This allows efficient exploitation of domain-specific data characteristics or special application requirements that go beyond the SQL standard. This extensibility is not a topic of this paper.

4. MonetDB's Column-Store Implementation

This chapter goes into greater detail and briefly explains how the column-store principle is implemented in MonetDB, with explanations targeted at the database user. The MonetDB documentation can be found at [Mon15b], but it does not go into depth about the technical side of the implementation. More details on the technical side can be found in [IGN+12, Ch. 2 & 3] or, with a focus on spatial applications, in [VQKN08].

4.1. Vertical Fragmentation

As a column-store database, MonetDB's core concept is vertical fragmentation. Instead of storing all attributes of each relational tuple together in one record (which would be called a row store), each column is stored in a separate table, called a BAT (Binary Association Table). Each BAT has two physical columns: the left column holds the object identifiers (identifying the relational tuple), while the right column holds the actual values. Because the object identifier can be seen as an array index (where the actual values are the array's content), it is not materialized, which also saves storage space (and thus data access time), according to [ABH+13, Ch. 3.2].
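As an illustrative sketch (the values and layout below are made up for illustration and are not actual MonetDB storage syntax), a three-column slice of the gnis table would be split into one BAT per column, all sharing the same object identifiers:

-- Relational view (schematic):
--   tuple 0: fid = 12345, name = 'Alpha Peak', elevation = 810
--   tuple 1: fid = 12346, name = 'Beta Lake',  elevation = 1025
--
-- Stored as one BAT per column (oid -> value); the oids are not materialized:
--   gnis_fid:        0 -> 12345,        1 -> 12346
--   gnis_name:       0 -> 'Alpha Peak', 1 -> 'Beta Lake'
--   gnis_elevation:  0 -> 810,          1 -> 1025
--
-- A query that touches a single attribute therefore only reads one BAT:
SELECT elevation FROM gnis WHERE elevation > 1000;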

4.2. Execution Engine

The MonetDB execution engine evaluates queries with a low-level two-column relational algebra. Because it is known that all physical tables follow the same layout, this algebra can be highly optimized; the technical details can be found in [BK99]. The algebraic operations on the BATs are compiled to MonetDB's MAL language and executed with the operator-at-a-time principle: each operation is evaluated to completion over its entire input data before the subsequent data-dependent operation is executed. This allows exploiting the architecture of modern CPUs by cycling through tight loops over the same data types. (Traditional database management systems use a tuple-at-a-time model, where calculations are done on a per-tuple basis.)

A key aspect of the execution engine is its reliance on hardware-conscious algorithms. MonetDB uses its own algorithms, mainly for joins. The algorithms are optimized to make good use of the CPU cache in order to avoid the "memory wall", i.e. the bottleneck of main memory access time. The algorithms are designed to avoid CPU cache misses, which mainly means that random data accesses are restricted to regions that fit into the cache. To fulfill this requirement, MonetDB's optimizer creates a cost model which takes the memory access cost into account.

MonetDB uses late tuple materialization: the columns are converted back to tuples as late as possible. Every operation works only on BATs and produces new BATs in memory. This allows MonetDB to use a single data structure (the BAT) to manipulate widely different data sets. The algebra is simple (e.g., in contrast to traditional database management systems, the operator functions are not executed with so-called complex parameters) and thus very fast.

4.3. Recycling and Updates

The newly created intermediate BATs are kept as long as they fit into storage and as long as they are hot. This allows their reuse in similar queries. MonetDB avoids touching the base BATs and uses already created intermediate BATs whenever possible.

One main area where vertical fragmentation (column store) is slower than a row store is updates involving many columns, because even simple tuple insertions would require updates to multiple BATs. To avoid this performance problem, MonetDB uses update BATs: for each base BAT there are also update BATs in which only the changes to the base BAT are stored. When an operation uses a base BAT, it is joined with its update BATs. This allows MonetDB to postpone the actual update of the base BATs to a later moment, when multiple values can be changed at once.
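A minimal sketch of the kind of query pair that can benefit from recycling (the predicate value is made up; whether an intermediate is actually reused depends on the recycler's policies and on available memory): both statements filter gnis on the same county, so the intermediate BAT produced for the first WHERE clause is a candidate for reuse by the second query instead of rescanning the base column.

-- First query: produces an intermediate BAT of the tuples with county = 'Texas'.
SELECT count(*) FROM gnis WHERE county = 'Texas';

-- Similar follow-up query: only the aggregation differs, so the cached
-- intermediate for county = 'Texas' can be reused.
SELECT avg(elevation) FROM gnis WHERE county = 'Texas';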

4.4. Adaptive Indexes (Database Cracking)

As dynamic data storage environments often lack a priori workload knowledge and have little idle time to spend on reorganizing data (e.g. building indexes), traditional approaches to index building and maintenance do not apply to MonetDB. (MonetDB ignores index creation SQL statements.)

To solve this problem, MonetDB was the first implementation to use database cracking (as proposed and described in [IKM+07]). This technique adaptively, continuously and automatically creates and maintains indexes according to the workload at hand, without human intervention. This avoids both the need to know which indexes will be needed by future queries and the time spent maintaining indexes whose benefit is lower than their maintenance cost. With database cracking, indexes are created incrementally, partially and on demand. The more queries are processed, the more the relevant indexes are refined. An example for database cracking and what is meant by adaptive partial indexes can be found in [ABH+13, Ch. 4.8].
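To make the idea concrete, a minimal sketch (the range bounds are made up): the first execution of a range selection physically reorganizes ("cracks") the x column around the requested bounds as a side effect of answering the query; repeating the same or an overlapping selection can then skip most of the column, so the query gets faster with every repetition, and no CREATE INDEX statement is ever issued.

-- Run the same range selection repeatedly; no index is created manually.
SELECT count(*) FROM gnis WHERE x > -100.0 AND x < -95.0;
-- 1st run: scans the whole x column and cracks it around -100.0 and -95.0.
-- Later runs: only the matching piece of the cracked column is inspected,
-- so the response time drops with each repetition.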

4.5. Data Compression

Data compression in MonetDB is based on optimized storage structures instead of compression algorithms. As MonetDB's speed focus is on using fewer CPU cycles, the data is not altered for compression, which would trade the saved storage space for additional CPU cycles spent on decompression. The principles and effects behind this decision are described at [Mon15a].

Instead, most space savings are gained by using the smallest possible data structures (plain C arrays for the BATs): dense arrays with the least possible number of bytes per value (1 to 8 bytes). This allows efficient reading and direct mapping from storage into the CPU cache. No overhead is produced by the storage technique (as would be the case with B-trees and others), as the values directly follow each other.

For strings, MonetDB uses dictionary encoding. This saves storage space by only storing dictionary indexes in the BAT (again only 1 to 8 bytes per value) while allowing the same relational algebra on the physical BATs as with numbers. Only with large dictionaries do the maintenance costs grow higher than the query savings can justify, which is why MonetDB then switches to a non-compressed string representation.

4.6. Best Practices

From these core concepts follow some best practices for database administrators when working with MonetDB:

- Think about whether a column-store database management system really is the right choice for the application (i.e. mostly querying or running aggregation functions over single or few columns, data inserts mostly in big blocks).
- Query only the needed columns (e.g. no SELECT *).
- Avoid updating or inserting data with many columns. (It is slow.)
- Forget everything you know about indexes and let MonetDB handle them by itself.
- Use a 64-bit computer architecture, so that MonetDB can address more than 3 gigabytes of data at once.

With these best practices, it is possible to get the most potential (and the highest speed gains over traditional row-based database management systems) out of MonetDB. They show especially when working with huge (multiple gigabytes) data sets.

5. Benchmark Adaption From PostgreSQL to MonetDB

One main task for this paper was to run a MonetDB benchmark to draw a comparison to PostgreSQL. The PostgreSQL scripts given by Prof. Stefan Keller are in Appendix C. These scripts were adapted to MonetDB. This chapter explains the most important changes that were needed to create the benchmark implementation for MonetDB.

5.1. Dropping existing tables

A rather classical procedure when importing whole tables is to drop an existing table with the same name. This allows repeating the same statements multiple times without getting an error that the table already exists.

PostgreSQL:

DROP TABLE IF EXISTS gnis;

The IF EXISTS clause does not exist in MonetDB. It had to be deleted:

MonetDB:

DROP TABLE gnis;

This throws a warning on the first import, when the table does not exist yet, but not an error, and thus can be ignored.

5.2. Create tables

All data types and keywords used in the benchmark implementation for PostgreSQL also exist in MonetDB, so no changes were needed for the data types. The following data types and keywords were used:

- not null
- primary key
- integer
- double precision
- text
- character varying

Hence the table creation statements for PostgreSQL and MonetDB are identical for this benchmark.

PostgreSQL & MonetDB:

CREATE TABLE gnis (
  x double precision not null,
  y double precision not null,
  fid integer primary key,
  name text,
  class text,
  state text,
  county text,
  elevation integer,
  map text
);

5.3. Import from CSV

The syntax to import data from CSV into a database table differs between the two database management systems, because it is not part of the SQL standard.

PostgreSQL:

\COPY osm_poi_tag_ch FROM osm_poi_tag_ch.csv DELIMITER ';' QUOTE '"' CSV HEADER;

MonetDB:

COPY <n> RECORDS INTO osm_poi_tag_ch FROM
  '/path/osm_poi_tag_ch.csv' DELIMITERS ';', '\n', '"' NULL AS '';

Especially notable: MonetDB needs the number of rows to import, as well as an absolute (instead of a relative) path. Empty fields lead to an error, because the keyword null is expected for empty values. To allow the handling of empty fields as null values, an explicit NULL AS definition in the COPY statement is needed, as shown above. Furthermore, it is not possible to automatically ignore the first line when importing, which in this benchmark holds the column names. Thus the first line has to be removed from every CSV file.

5.4. echo

PostgreSQL:

\echo \n=== Table osm_poi_ch

There is no echo command in MonetDB. A SELECT was used for echoing strings.

MonetDB:

SELECT '\n=== Table osm_poi_ch';

5.5. Rename fields

PostgreSQL:

SELECT
  max(version) "version"
FROM osm_poi_ch;

In MonetDB, an explicit AS is needed to rename fields in a query:

MonetDB:

SELECT
  max(version) AS "version"
FROM osm_poi_ch;

6. Benchmark Execution Environment

The commands used to set up the database tables in PostgreSQL and MonetDB, to execute the benchmark, and their output on the command line are listed in Appendix E (PostgreSQL) and Appendix F (MonetDB).

6.1. Physical Machine

The physical machine was used to run the benchmark without too many other processes impacting the result. It represents a database server in this benchmark. The benchmark was run 3 times, taking the median time.

Hardware: Fujitsu Celsius W530, 3.40 GHz Intel Xeon CPU E v3, 16 GB 1600 MHz DDR3 RAM, 256 GB Samsung SSD 840
Software: Ubuntu Server 64-bit 15.10, PostgreSQL 9.4.5, MonetDB Jul2015-SP

6.2. Virtual Machine

The virtual machine was used as a development machine. Based on the feedback at the presentation of this paper, and since the times show interesting differences to the physical machine, these times were kept in this paper. They represent a low-memory computer after multiple executions. The benchmark was run 25 times, taking the median time of the last 3 runs. Because of the multiple runs, the query times are subject to database cracking.

Host hardware: MacBook Pro (Retina, Mid 2012), 2.3 GHz Intel Core i7, 16 GB 1600 MHz DDR3 RAM, 768 GB Apple SSD SM768E
Host software: OS X El Capitan Beta (15C47a), VMware Fusion
Guest simulated hardware: 1 core, 4096 MB RAM, 20 GB hard disk
Guest software: Ubuntu Server 64-bit, PostgreSQL, MonetDB Jul2015-SP1

7. Benchmark Execution Results

In this chapter the benchmark results are shown and explained. An overview of the result times (on the physical machine) is shown in Figure 3.

Figure 3: Benchmark results overview on physical machine, times in milliseconds

7.1. Table gnis

The first queries target the table gnis, a table containing points with x and y coordinates and information about these points. It contains ... tuples. In this part of the benchmark, each query is run twice, hence the two numbers for each category. Because of caching, the second run is faster every time. It is interesting to see that the virtual machine (which had more runs) is faster in every MonetDB query. Database cracking is easy to see here: because the queries are simple value comparisons, the BATs already have the automatic index applied due to the higher number of runs on the virtual machine.

Query 1a and 1b

SELECT name, county, state FROM gnis t WHERE t.fid = ...;

Physical machine: PostgreSQL ... ms, ... ms
Physical machine: MonetDB ... ms, ... ms
Virtual machine: PostgreSQL ... ms, ... ms
Virtual machine: MonetDB ... ms, ... ms

As this query is a primary key search, it simply reduces to an index search test. Since PostgreSQL already has the needed index built while MonetDB is still in the process of building the needed adaptive index, PostgreSQL is faster. A diagram of the result times (on the physical machine) is shown in Figure 4.

Figure 4: Diagram for queries 1a and 1b on physical machine (times in milliseconds)

Query 2a and 2b

SELECT name, county, state FROM gnis t WHERE t.county = 'Texas';

Physical machine: PostgreSQL ... ms, ... ms
Physical machine: MonetDB ... ms, ... ms
Virtual machine: PostgreSQL ... ms, ... ms
Virtual machine: MonetDB ... ms, ... ms

Here the column-oriented storage pays off: to search for one value, MonetDB can exploit its ability to load only the values of one column into the CPU cache. Also, MonetDB sees that this is a good candidate for an index, as there are multiple points with county 'Texas', and moves those tuples to the beginning of the BAT to build a partial index. A diagram of the result times (on the physical machine) is shown in Figure 5.

Figure 5: Diagram for queries 2a and 2b on physical machine (times in milliseconds)

Query 3a and 3b

SELECT avg(t.elevation) FROM gnis t
WHERE t.x > ... and t.y > ... and t.x < ... and t.y < 33.460;

Physical machine: PostgreSQL ... ms, ... ms
Physical machine: MonetDB ... ms, ... ms
Virtual machine: PostgreSQL ... ms, ... ms
Virtual machine: MonetDB ... ms, ... ms

Searching a big data set on few columns is the specialty of MonetDB. It can shine again with its adaptive indexes (as can be seen from the different times of the physical and the virtual MonetDB runs, even with the latter having restricted main memory). A diagram of the result times (on the physical machine) is shown in Figure 6.

Figure 6: Diagram for queries 3a and 3b on physical machine (times in milliseconds)

Query 4a and 4b

SELECT count(*), class FROM gnis GROUP BY class;

Physical machine: PostgreSQL ... ms, ... ms
Physical machine: MonetDB ... ms, ... ms
Virtual machine: PostgreSQL ... ms, ... ms
Virtual machine: MonetDB ... ms, ... ms

Same as above. A diagram of the result times (on the physical machine) is shown in Figure 7.

Figure 7: Diagram for queries 4a and 4b on physical machine (times in milliseconds)

7.2. Table osm_poi_ch

Table osm_poi_ch contains lat/lon coordinates with additional information. For the benchmark, this table is copied into three smaller tables: osm_poi_ch_1mio contains the first 1 million entries, osm_poi_ch_2mio contains the first 2 million entries, and osm_poi_ch_3mio contains the first 3 million entries. The benchmark queries use each table to see which effect the table size has.

Query 10x, 10a, 10b, and 10c

select id, version, lon, lat from osm_poi_ch_1mio where id = 'pt...';
-- Query 10b uses table osm_poi_ch_2mio
-- Query 10c uses table osm_poi_ch_3mio

Physical machine: PostgreSQL 1mio: ... ms, ... ms, 2mio: 0.152, 3mio: ...
Physical machine: MonetDB 1mio: ... ms, ... ms, 2mio: 0.820, 3mio: ...
Virtual machine: PostgreSQL 1mio: ... ms, ... ms, 2mio: 0.523, 3mio: ...
Virtual machine: MonetDB 1mio: ... ms, ... ms, 2mio: 0.759, 3mio: ...

This is the same case as the very first query: PostgreSQL is fast because it can use its prebuilt index, whereas MonetDB would have to build its index at query time and decides against doing so, because it would not be worth the time for just one result (it considers it improbable that the same id will be queried again shortly after). In this query, the 1mio table was queried twice (10x and 10a) to see the effect of result caching. A diagram of the result times (on the physical machine) is shown in Figure 8.

Figure 8: Diagram for queries 10a, 10b, and 10c on physical machine (times in milliseconds)

Query 11a, 11b, and 11c

select id, version, lon, lat from osm_poi_ch_1mio where version > 300 order by version desc limit 10;
-- Query 11b uses table osm_poi_ch_2mio
-- Query 11c uses table osm_poi_ch_3mio

Physical machine: PostgreSQL 1mio: ... ms, 2mio: ..., 3mio: ...
Physical machine: MonetDB 1mio: ... ms, 2mio: 3.872, 3mio: ...
Virtual machine: PostgreSQL 1mio: ... ms, 2mio: ..., 3mio: ...
Virtual machine: MonetDB 1mio: ... ms, 2mio: ..., 3mio: ...

Only one column to filter on and lots of data: the ideal world for MonetDB. A diagram of the result times (on the physical machine) is shown in Figure 9.

Figure 9: Diagram for queries 11a, 11b, and 11c on physical machine (times in milliseconds)

Query 12a, 12b, and 12c

select id, version, lon, lat
from osm_poi_ch_1mio
where lon > ... and lat > ... and lon < ... and lat < ...;
-- Query 12b uses table osm_poi_ch_2mio
-- Query 12c uses table osm_poi_ch_3mio

Physical machine: PostgreSQL 1mio: ... ms, 2mio: ..., 3mio: ...
Physical machine: MonetDB 1mio: ... ms, 2mio: 6.756, 3mio: ...
Virtual machine: PostgreSQL 1mio: ... ms, 2mio: ..., 3mio: ...
Virtual machine: MonetDB 1mio: ... ms, 2mio: ..., 3mio: ...

Even with two columns involved, MonetDB works a lot faster. Where PostgreSQL almost doubles its time compared to the last query, MonetDB's time stays almost the same. A diagram of the result times (on the physical machine) is shown in Figure 10.

Figure 10: Diagram for queries 12a, 12b, and 12c on physical machine (times in milliseconds)

Query 13a, 13b, and 13c

select count(id), uid
from osm_poi_ch_1mio
group by uid having count(id) > 1
order by 1 desc limit 10;
-- Query 13b uses table osm_poi_ch_2mio
-- Query 13c uses table osm_poi_ch_3mio

Physical machine: PostgreSQL 1mio: ... ms, 2mio: ..., 3mio: ...
Physical machine: MonetDB 1mio: ... ms, 2mio: ..., 3mio: ...
Virtual machine: PostgreSQL 1mio: ... ms, 2mio: ..., 3mio: ...
Virtual machine: MonetDB 1mio: ... ms, 2mio: ..., 3mio: ...

As with the last queries, this shows that MonetDB is fast, no matter how many elements the SQL query processes, as long as few columns are in use. A diagram of the result times (on the physical machine) is shown in Figure 11.

Figure 11: Diagram for queries 13a, 13b, and 13c on physical machine (times in milliseconds)

Query 14a, 14b, and 14c

select e.id, av2.value as name, av3.value as cuisine, lon, lat
from osm_poi_ch_1mio as e
join osm_poi_tag_ch as av on e.id = av.id
left outer join osm_poi_tag_ch as av2 on e.id = av2.id
left outer join osm_poi_tag_ch as av3 on e.id = av3.id
where
  av.key = 'amenity' and av.value = 'restaurant'
  and av2.key = 'name'
  and av3.key = 'cuisine'
  and e.lon > ... and e.lat > ... and e.lon < ... and e.lat < ...
order by name
limit 10;
-- Query 14b uses table osm_poi_ch_2mio
-- Query 14c uses table osm_poi_ch_3mio

Physical machine: PostgreSQL 1mio: ... ms, 2mio: ..., 3mio: ...
Physical machine: MonetDB 1mio: ... ms, 2mio: ..., 3mio: ...
Virtual machine: PostgreSQL 1mio: ... ms, 2mio: ..., 3mio: ...
Virtual machine: MonetDB 1mio: ... ms, 2mio: ..., 3mio: ...

As many columns are involved here, MonetDB cannot fully use its optimization for few columns and is slower than PostgreSQL (which makes use of its indexes on the ids). Interesting to see, however, is that the time rises much more slowly in MonetDB than in PostgreSQL: MonetDB uses its recycling technique to build the intermediate BATs for this query, which it can reuse for the av2 and av3 joins. Together with its database cracking on those intermediate BATs, it can profit hugely from its optimizations. A diagram of the result times (on the physical machine) is shown in Figure 12.

Figure 12: Diagram for queries 14a, 14b, and 14c on physical machine (times in milliseconds)

8. Conclusion

This paper explained column-store database management systems in detail, with MonetDB as the example. It explained what vertical fragmentation looks like in MonetDB, what BATs are, how recycling and updates work in MonetDB, and how database cracking and data compression are used. The benchmark in particular showed where the strengths of MonetDB lie and in which applications and queries it can outperform traditional database management systems. The benchmark showed the same timing patterns for similar queries:

For index searches, where one row is selected by its primary key, PostgreSQL is much faster, because it can use its prebuilt index. Because MonetDB is designed for unknown future queries, it does not build indexes when creating or filling a table. And because starting a partial index for a single row would take more time than it would save in the future, this time does not improve when repeating the query. In the benchmark, queries 1a, 1b, 10x, 10a, 10b, and 10c are examples of index searches. The timing differences for this type of query are shown in Figure 13.

Figure 13: Diagram showing index search query timing differences on the physical machine (times in milliseconds)

The second query category could be named value searches. Here the database tables are filtered by a single value, which returns multiple rows. It is important to note that PostgreSQL has no index here; it has to scan the whole table. It loads whole rows but only needs certain columns, which leads to multiple iterations of data loading. MonetDB, on the other side, can make use of vertical fragmentation and loads just the right column, so that more data fits into each loading step. Also, after filtering once, it starts saving the results as a partial index and thus gets much faster with every repetition, whereas PostgreSQL will never build an index on its own.

On the whole, this makes MonetDB much faster for this type of query, as shown in Figure 14, the timing overview for value search queries, using queries 2a/2b and 11a/11b/11c as examples.

Figure 14: Diagram showing value search query timing differences on the physical machine (times in milliseconds)

The last type of query one could categorize the benchmark queries into is multi-column, where many columns over multiple tables are used. In the benchmark, queries 14a/14b/14c are the example. This is slow in both database management systems. Interesting here is that MonetDB gets faster after each query because it can reuse its intermediate BATs, which makes it beat PostgreSQL over the three queries, even though query 14a was much slower. This is shown in Figure 15.

In conclusion, the two key points of this paper are: MonetDB is fast when working with lots[2] of data in a mostly read environment with few columns in single queries. And MonetDB is optimized for modern CPU architectures and for unknown workloads, where it is not known beforehand which queries will be run. As the benchmark shows, in these two areas MonetDB has the largest speed advantage over PostgreSQL. There are more queries with these characteristics in the benchmark, and they show the same results. In reality, for data warehouses or scientific data evaluations, this will be the most common type of query, and it shows why MonetDB is built on the principle of assuming unknown future queries and relying on database cracking.

[2] In this benchmark: a few hundred megabytes; the differences would be even clearer with bigger data sets of tens to hundreds of gigabytes.

Figure 15: Diagram showing multi column query timing differences on the physical machine (times in milliseconds)

References

[ABH+13] Daniel Abadi, Peter Boncz, Stavros Harizopoulos, Stratos Idreos, et al. The design and implementation of modern column-oriented database systems. Now Publishers, 2013.

[BK99] Peter A. Boncz and Martin L. Kersten. MIL primitives for querying a fragmented world. The VLDB Journal: The International Journal on Very Large Data Bases, 8(2), 1999.

[BMK99] Peter A. Boncz, Stefan Manegold, and Martin L. Kersten. Database architecture optimized for the new bottleneck: Memory access. In VLDB, volume 99, pages 54-65, 1999.

[IGN+12] Stratos Idreos, Fabian Groffen, Niels Nes, Stefan Manegold, et al. MonetDB: Two decades of research in column-oriented database architectures. Data Engineering, page 40, 2012.

[IKM+07] Stratos Idreos, Martin L. Kersten, Stefan Manegold, et al. Database cracking. In CIDR, volume 3, pages 1-8, 2007.

[Mon15a] MonetDB B.V. Data compression - MonetDB. www.monetdb.org/documentation/Guide/Compression. Last visited: February 18.

[Mon15b] MonetDB B.V. Documentation - MonetDB. www.monetdb.org/documentation. Last visited: February 18.

[Mon15c] MonetDB B.V. Supported SQL features - MonetDB. www.monetdb.org/documentation/Manuals/SQLreference/Features/Supported. Last visited: February 18.

[Mon15d] MonetDB B.V. Unsupported SQL features - MonetDB. www.monetdb.org/documentation/Manuals/SQLreference/Features/unsupported. Last visited: February 18.

[VQKN08] Maarten Vermeij, Wilko Quak, Martin Kersten, and Niels Nes. MonetDB, a novel spatial column-store DBMS. In Academic Proceedings of the 2008 Free and Open Source for Geospatial (FOSS4G) Conference, OSGeo, 2008.

A. List of Figures

1. The storage of a table in the row-based format
2. The storage of a table in the column-oriented format
3. Benchmark results overview on physical machine, times in milliseconds
4. Diagram for queries 1a and 1b on physical machine (times in milliseconds)
5. Diagram for queries 2a and 2b on physical machine (times in milliseconds)
6. Diagram for queries 3a and 3b on physical machine (times in milliseconds)
7. Diagram for queries 4a and 4b on physical machine (times in milliseconds)
8. Diagram for queries 10a, 10b, and 10c on physical machine (times in milliseconds)
9. Diagram for queries 11a, 11b, and 11c on physical machine (times in milliseconds)
10. Diagram for queries 12a, 12b, and 12c on physical machine (times in milliseconds)
11. Diagram for queries 13a, 13b, and 13c on physical machine (times in milliseconds)
12. Diagram for queries 14a, 14b, and 14c on physical machine (times in milliseconds)
13. Diagram showing index search query timing differences on the physical machine (times in milliseconds)
14. Diagram showing value search query timing differences on the physical machine (times in milliseconds)
15. Diagram showing multi column query timing differences on the physical machine (times in milliseconds)

B. Bachelor/Master Lab Lesson

One part of the task for this paper was designing a Bachelor/Master lab lesson, which is attached in this chapter.

B.1. Installation

For this lab, you need PostgreSQL and MonetDB. It is assumed that you have already installed PostgreSQL. For MonetDB, please go to the MonetDB download page.

There you will find the download instructions for every common operating system. If using a virtual machine for the lab, please ensure that the VM has at least 4 gigabytes of RAM and uses a 64-bit architecture. The statements given here assume MonetDB on Linux.

After installing MonetDB, create a database farm. In MonetDB, database farms are used to group databases, which allows having different storage paths for different databases.

$ monetdbd create mydbfarm
$ monetdbd start mydbfarm

Next, we need to create a database (note that we now use the monetdb client, i.e. no "d" at the end):

$ monetdb create labdb
$ monetdb release labdb

We can now connect to the database (assuming you kept the default user monetdb during the installation) with the following command:

$ mclient -u monetdb -d labdb

B.2. Create and Import

Create a table in both PostgreSQL and MonetDB with the following statement:

CREATE TABLE gnis (
  x double precision not null,
  y double precision not null,
  fid integer primary key,
  name text,
  class text,
  state text,
  county text,
  elevation integer,
  map text
);

In PostgreSQL, you can import the data for this table from the file gnis_names09.csv with the following command:

\COPY gnis FROM gnis_names09.csv DELIMITER ';' QUOTE '"' CSV HEADER;

Ex. 1: What does the corresponding statement look like for MonetDB? (Hint: There is no option to ignore the first row of the CSV; you have to delete it manually. You also have to delete ';' characters inside strings, because MonetDB cannot handle them.)

B.3. Insert, Update and Delete

To get accustomed to MonetDB and the table in use, please do the following. Because MonetDB uses standard SQL:2003, this should not be hard.

Ex. 2: Insert a new row. What does the statement look like?

Ex. 3: Update the elevation of that row. What does the statement look like?

Ex. 4: Delete the row. What does the statement look like?

B.4. Comparison

Consider the following statements:

SELECT name, county, state FROM gnis t WHERE t.fid = ...;

SELECT name, county, state FROM gnis t WHERE t.county = 'Texas';

SELECT avg(t.elevation) FROM gnis t
WHERE t.x > ... and t.y > ... and t.x < ... and t.y < 33.460;

Ex. 5: Think about each statement: for which statements do you expect PostgreSQL to be faster, and for which MonetDB?

Ex. 6: Execute the statements in both PostgreSQL and MonetDB and note down the execution times. Why is PostgreSQL or MonetDB faster for each statement?

B.5. Database Cracking

You have learned that MonetDB uses database cracking.

Ex. 7: What is database cracking? How do future queries on the same data get faster, and under which circumstances?

Ex. 8: For each of the queries 1 to 3, note down how much faster you expect it to be after 50 iterations.

Ex. 9: Execute each query 50 times and compare the 50th run with your estimates. Explain the differences.

C. PostgreSQL Scripts

C.1. gnis_load.sql

-- gnis_load.sql
-- Tested on PostgreSQL 9.4 using psql. SK

-- Create table gnis:
DROP TABLE IF EXISTS gnis;

CREATE TABLE gnis (
  x double precision not null,
  y double precision not null,
  fid integer primary key,
  name text,
  class text,
  state text,
  county text,
  elevation integer,
  map text
);

-- Copy data from CSV file to database:
\COPY gnis FROM gnis_names09.csv DELIMITER ';' QUOTE '"' CSV HEADER;

C.2. osm_poi_ch_load.sql

-- osm_poi_ch_load.sql
-- Tested on PostgreSQL 9.4 using psql. SK

\echo osm_poi_ch_loader

-- Create new table
DROP TABLE IF EXISTS osm_poi_ch;

CREATE TABLE osm_poi_ch (
  id character varying(64) not null,  -- no primary key since OSM id maybe not unique
  lastchanged character varying(35),
  changeset integer,
  version integer,
  uid integer,
  lon double precision not null,
  lat double precision not null
);

-- Copy data from CSV file to database
\COPY osm_poi_ch FROM osm_poi_ch.csv DELIMITER ';' QUOTE '"' CSV HEADER;

select count(*) from osm_poi_ch;

-- Create new table
DROP TABLE IF EXISTS osm_poi_tag_ch;

CREATE TABLE osm_poi_tag_ch (
  id character varying(64) not null,
  key text not null,
  value text
  -- primary key (id, key). It is not really true. A duplication found.
);

-- Copy data from CSV file to the temporary table
\COPY osm_poi_tag_ch FROM osm_poi_tag_ch.csv DELIMITER ';' QUOTE '"' CSV HEADER;

select count(*) from osm_poi_tag_ch;

C.3. bm_prepare.sql

-- Benchmark
-- Tested on PostgreSQL 9.4 using psql. SK

-- Requirements:
-- Tables gnis, osm_poi_ch and osm_poi_tag_ch exist and are loaded.

\echo Preparing tables. Pls. wait...

\timing on

\echo \n=== Table gnis
-- Preparing index:
DROP INDEX IF EXISTS gnis_fid_idx CASCADE;
CREATE UNIQUE INDEX gnis_fid_idx ON gnis (fid);
CLUSTER gnis USING gnis_fid_idx;
-- Refreshing statistics:
VACUUM FULL ANALYZE gnis;

\echo \n=== Table osm_poi_ch
DROP INDEX IF EXISTS osm_poi_ch_id_idx CASCADE;
CREATE INDEX osm_poi_ch_id_idx ON osm_poi_ch (id);
CLUSTER osm_poi_ch USING osm_poi_ch_id_idx;
VACUUM FULL ANALYZE osm_poi_ch;

\echo \n=== Table osm_poi_tag_ch
DROP INDEX IF EXISTS osm_poi_tag_ch_id_idx;
CREATE INDEX osm_poi_tag_ch_id_idx ON osm_poi_tag_ch (id);  -- 103 sec
CLUSTER osm_poi_tag_ch USING osm_poi_tag_ch_id_idx;
VACUUM FULL ANALYZE osm_poi_tag_ch;

\echo \n=== Table osm_poi_ch_3mio
DROP TABLE IF EXISTS osm_poi_ch_3mio CASCADE;
CREATE TABLE osm_poi_ch_3mio AS
  select
    id,
    max(version) "version",
    max(lastchanged) lastchanged,
    max(uid) uid,
    max(changeset) changeset,
    max(lon) lon,
    max(lat) lat
  from osm_poi_ch
  group by id
  ORDER BY 1 LIMIT 3000000;
ALTER TABLE osm_poi_ch_3mio ADD CONSTRAINT osm_poi_ch_3mio_pk PRIMARY KEY(id);  -- 47 sec
CREATE UNIQUE INDEX osm_poi_ch_3mio_pk_idx ON osm_poi_ch_3mio (id);  -- 38 sec
CLUSTER osm_poi_ch_3mio USING osm_poi_ch_3mio_pk_idx;  -- 112 sec
VACUUM FULL ANALYZE osm_poi_ch_3mio;

\echo \n=== Table osm_poi_ch_2mio
DROP TABLE IF EXISTS osm_poi_ch_2mio CASCADE;
CREATE TABLE osm_poi_ch_2mio AS
  SELECT * FROM osm_poi_ch_3mio
  ORDER BY 1 LIMIT 2000000;
ALTER TABLE osm_poi_ch_2mio ADD CONSTRAINT osm_poi_ch_2mio_pk PRIMARY KEY(id);
CREATE UNIQUE INDEX osm_poi_ch_2mio_pk_idx ON osm_poi_ch_2mio (id);
CLUSTER osm_poi_ch_2mio USING osm_poi_ch_2mio_pk_idx;
VACUUM FULL ANALYZE osm_poi_ch_2mio;

\echo \n=== Table osm_poi_ch_1mio
DROP TABLE IF EXISTS osm_poi_ch_1mio CASCADE;
CREATE TABLE osm_poi_ch_1mio AS
  SELECT * FROM osm_poi_ch_3mio
  ORDER BY 1 LIMIT 1000000;
ALTER TABLE osm_poi_ch_1mio ADD CONSTRAINT osm_poi_ch_1mio_pk PRIMARY KEY(id);
CREATE UNIQUE INDEX osm_poi_ch_1mio_pk_idx ON osm_poi_ch_1mio (id);
CLUSTER osm_poi_ch_1mio USING osm_poi_ch_1mio_pk_idx;
VACUUM FULL ANALYZE osm_poi_ch_1mio;

\echo \nok.

C.4. bm.sql

-- Benchmark
-- Tested on PostgreSQL 9.4 using psql. SK

-- Local configuration
\pset format
\pset pager off

-- Redirect query output to file
\set OUTFILE bm_out.txt
\o :OUTFILE

-- This is a dummy query to fill the cache with (other) tuples
SELECT count(*) FROM osm_poi_tag_ch;

\echo \n=== Table gnis

-- Simple equality search with a single tuple in the return set
\timing off
SELECT count(*) FROM osm_poi_tag_ch;
\timing on
\echo ;1a
SELECT name, county, state FROM gnis t WHERE t.fid = ...;
\echo ;1b
SELECT name, county, state FROM gnis t WHERE t.fid = ...;

-- Simple equality search on county 'Texas':
\timing off
SELECT count(*) FROM osm_poi_tag_ch;
\timing on
\echo ;2a
SELECT name, county, state FROM gnis t WHERE t.county = 'Texas';
\echo ;2b
SELECT name, county, state FROM gnis t WHERE t.county = 'Texas';

-- Range search with aggregate function
\timing off

SELECT count(*) FROM osm_poi_tag_ch;
\timing on
\echo ;3a
SELECT avg(t.elevation)::int FROM gnis t
  WHERE t.x > ... and t.y > ... and t.x < ... and t.y < 33.460;
\echo ;3b
SELECT avg(t.elevation)::int FROM gnis t
  WHERE t.x > ... and t.y > ... and t.x < ... and t.y < 33.460;

-- Group by query
\timing off
SELECT count(*) FROM osm_poi_tag_ch;
\timing on
\echo ;4a
SELECT count(*), class FROM gnis GROUP BY class
  ORDER BY 1 DESC;
\echo ;4b
SELECT count(*), class FROM gnis GROUP BY class
  ORDER BY 1 DESC;

\echo \n=== Table osm_poi_ch

-- Query with equality condition
\timing off
SELECT count(*) FROM gnis;
\timing on
\echo ;10x
select id, version, lon, lat from osm_poi_ch_1mio where id = 'pt...';
\echo ;10a
select id, version, lon, lat from osm_poi_ch_1mio where id = 'pt...';
\echo ;10b
select id, version, lon, lat from osm_poi_ch_2mio where id = 'pt...';
\echo ;10c
select id, version, lon, lat from osm_poi_ch_3mio where id = 'pt...';

-- Query with range condition
\timing off
SELECT count(*) FROM gnis;
\timing on

\echo ;11a
select id, version, lon, lat from osm_poi_ch_1mio where version > 300 order by version desc
  limit 10;
\echo ;11b
select id, version, lon, lat from osm_poi_ch_2mio where version > 300 order by version desc
  limit 10;
\echo ;11c
select id, version, lon, lat from osm_poi_ch_3mio where version > 300 order by version desc
  limit 10;

-- Query with range condition II.
\timing off
SELECT count(*) FROM gnis;
\timing on
\echo ;12a
select id, version, lon, lat
from osm_poi_ch_1mio
where lon > ... and lat > ... and lon < ... and lat < ...
order by version desc;
\echo ;12b
select id, version, lon, lat
from osm_poi_ch_2mio
where lon > ... and lat > ... and lon < ... and lat < ...
order by version desc;
\echo ;12c
select id, version, lon, lat
from osm_poi_ch_3mio
where lon > ... and lat > ... and lon < ... and lat < ...
order by version desc;

-- Query with group by
\timing off
SELECT count(*) FROM gnis;
\timing on
\echo ;13a
select count(id), uid
from osm_poi_ch_1mio
group by uid having count(id) > 1
order by 1 desc limit 10;

\echo ;13b
select count(id), uid
from osm_poi_ch_2mio
group by uid having count(id) > 1
order by 1 desc limit 10;
\echo ;13c
select count(id), uid
from osm_poi_ch_3mio
group by uid having count(id) > 1
order by 1 desc limit 10;

-- Query with 3 joins
-- All restaurants with id, name and cuisine type (if available):
\timing off
SELECT count(*) FROM gnis;
\timing on
\echo ;14a
select e.id, av2.value as name, av3.value as cuisine, lon, lat
from osm_poi_ch_1mio as e
join osm_poi_tag_ch as av on e.id = av.id
left outer join osm_poi_tag_ch as av2 on e.id = av2.id
left outer join osm_poi_tag_ch as av3 on e.id = av3.id
where
  av.key = 'amenity' and av.value = 'restaurant'
  and av2.key = 'name'
  and av3.key = 'cuisine'
  and e.lon > ... and e.lat > ... and e.lon < ... and e.lat < ...
order by name
limit 10;
\echo ;14b
select e.id, av2.value as name, av3.value as cuisine, lon, lat
from osm_poi_ch_2mio as e
join osm_poi_tag_ch as av on e.id = av.id
left outer join osm_poi_tag_ch as av2 on e.id = av2.id
left outer join osm_poi_tag_ch as av3 on e.id = av3.id
where
  av.key = 'amenity' and av.value = 'restaurant'
  and av2.key = 'name'
  and av3.key = 'cuisine'
  and e.lon > ... and e.lat > ... and e.lon < ... and e.lat < ...


More information

Arup Nanda Starwood Hotels

Arup Nanda Starwood Hotels Arup Nanda Starwood Hotels Why Analyze The Database is Slow! Storage, CPU, memory, runqueues all affect the performance Know what specifically is causing them to be slow To build a profile of the application

More information

Introduction to Randomized Algorithms III

Introduction to Randomized Algorithms III Introduction to Randomized Algorithms III Joaquim Madeira Version 0.1 November 2017 U. Aveiro, November 2017 1 Overview Probabilistic counters Counting with probability 1 / 2 Counting with probability

More information

Reaxys Pipeline Pilot Components Installation and User Guide

Reaxys Pipeline Pilot Components Installation and User Guide 1 1 Reaxys Pipeline Pilot components for Pipeline Pilot 9.5 Reaxys Pipeline Pilot Components Installation and User Guide Version 1.0 2 Introduction The Reaxys and Reaxys Medicinal Chemistry Application

More information

Impression Store: Compressive Sensing-based Storage for. Big Data Analytics

Impression Store: Compressive Sensing-based Storage for. Big Data Analytics Impression Store: Compressive Sensing-based Storage for Big Data Analytics Jiaxing Zhang, Ying Yan, Liang Jeff Chen, Minjie Wang, Thomas Moscibroda & Zheng Zhang Microsoft Research The Curse of O(N) in

More information

StreamSVM Linear SVMs and Logistic Regression When Data Does Not Fit In Memory

StreamSVM Linear SVMs and Logistic Regression When Data Does Not Fit In Memory StreamSVM Linear SVMs and Logistic Regression When Data Does Not Fit In Memory S.V. N. (vishy) Vishwanathan Purdue University and Microsoft vishy@purdue.edu October 9, 2012 S.V. N. Vishwanathan (Purdue,

More information

15-451/651: Design & Analysis of Algorithms September 13, 2018 Lecture #6: Streaming Algorithms last changed: August 30, 2018

15-451/651: Design & Analysis of Algorithms September 13, 2018 Lecture #6: Streaming Algorithms last changed: August 30, 2018 15-451/651: Design & Analysis of Algorithms September 13, 2018 Lecture #6: Streaming Algorithms last changed: August 30, 2018 Today we ll talk about a topic that is both very old (as far as computer science

More information

CS 700: Quantitative Methods & Experimental Design in Computer Science

CS 700: Quantitative Methods & Experimental Design in Computer Science CS 700: Quantitative Methods & Experimental Design in Computer Science Sanjeev Setia Dept of Computer Science George Mason University Logistics Grade: 35% project, 25% Homework assignments 20% midterm,

More information

Scalable Asynchronous Gradient Descent Optimization for Out-of-Core Models

Scalable Asynchronous Gradient Descent Optimization for Out-of-Core Models Scalable Asynchronous Gradient Descent Optimization for Out-of-Core Models Chengjie Qin 1, Martin Torres 2, and Florin Rusu 2 1 GraphSQL, Inc. 2 University of California Merced August 31, 2017 Machine

More information

Homework Assignment 2. Due Date: October 17th, CS425 - Database Organization Results

Homework Assignment 2. Due Date: October 17th, CS425 - Database Organization Results Name CWID Homework Assignment 2 Due Date: October 17th, 2017 CS425 - Database Organization Results Please leave this empty! 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 2.10 2.11 2.12 2.15 2.16 2.17 2.18 2.19 Sum

More information

A Comparison Between MongoDB and MySQL Document Store Considering Performance

A Comparison Between MongoDB and MySQL Document Store Considering Performance A Comparison Between MongoDB and MySQL Document Store Considering Performance Erik Andersson and Zacharias Berggren Erik Andersson and Zacharias Berggren VT 2017 Examensarbete, 15 hp Supervisor: Kai-Florian

More information

Geography 281 Map Making with GIS Project Four: Comparing Classification Methods

Geography 281 Map Making with GIS Project Four: Comparing Classification Methods Geography 281 Map Making with GIS Project Four: Comparing Classification Methods Thematic maps commonly deal with either of two kinds of data: Qualitative Data showing differences in kind or type (e.g.,

More information

HYCOM and Navy ESPC Future High Performance Computing Needs. Alan J. Wallcraft. COAPS Short Seminar November 6, 2017

HYCOM and Navy ESPC Future High Performance Computing Needs. Alan J. Wallcraft. COAPS Short Seminar November 6, 2017 HYCOM and Navy ESPC Future High Performance Computing Needs Alan J. Wallcraft COAPS Short Seminar November 6, 2017 Forecasting Architectural Trends 3 NAVY OPERATIONAL GLOBAL OCEAN PREDICTION Trend is higher

More information

Using the File Geodatabase API. Lance Shipman David Sousa

Using the File Geodatabase API. Lance Shipman David Sousa Using the File Geodatabase API Lance Shipman David Sousa Overview File Geodatabase API - Introduction - Supported Tasks - API Overview - What s not supported - Updates - Demo File Geodatabase API Provide

More information

Administering your Enterprise Geodatabase using Python. Jill Penney

Administering your Enterprise Geodatabase using Python. Jill Penney Administering your Enterprise Geodatabase using Python Jill Penney Assumptions Basic knowledge of python Basic knowledge enterprise geodatabases and workflows You want code Please turn off or silence cell

More information

ECEN 248: INTRODUCTION TO DIGITAL SYSTEMS DESIGN. Week 9 Dr. Srinivas Shakkottai Dept. of Electrical and Computer Engineering

ECEN 248: INTRODUCTION TO DIGITAL SYSTEMS DESIGN. Week 9 Dr. Srinivas Shakkottai Dept. of Electrical and Computer Engineering ECEN 248: INTRODUCTION TO DIGITAL SYSTEMS DESIGN Week 9 Dr. Srinivas Shakkottai Dept. of Electrical and Computer Engineering TIMING ANALYSIS Overview Circuits do not respond instantaneously to input changes

More information

The conceptual view. by Gerrit Muller University of Southeast Norway-NISE

The conceptual view. by Gerrit Muller University of Southeast Norway-NISE by Gerrit Muller University of Southeast Norway-NISE e-mail: gaudisite@gmail.com www.gaudisite.nl Abstract The purpose of the conceptual view is described. A number of methods or models is given to use

More information

Let s now begin to formalize our analysis of sequential machines Powerful methods for designing machines for System control Pattern recognition Etc.

Let s now begin to formalize our analysis of sequential machines Powerful methods for designing machines for System control Pattern recognition Etc. Finite State Machines Introduction Let s now begin to formalize our analysis of sequential machines Powerful methods for designing machines for System control Pattern recognition Etc. Such devices form

More information

Lineage implementation in PostgreSQL

Lineage implementation in PostgreSQL Lineage implementation in PostgreSQL Andrin Betschart, 09-714-882 Martin Leimer, 09-728-569 3. Oktober 2013 Contents Contents 1. Introduction 3 2. Lineage computation in TPDBs 4 2.1. Lineage......................................

More information

High-Performance Scientific Computing

High-Performance Scientific Computing High-Performance Scientific Computing Instructor: Randy LeVeque TA: Grady Lemoine Applied Mathematics 483/583, Spring 2011 http://www.amath.washington.edu/~rjl/am583 World s fastest computers http://top500.org

More information

High-performance Technical Computing with Erlang

High-performance Technical Computing with Erlang High-performance Technical Computing with Erlang Alceste Scalas Giovanni Casu Piero Pili Center for Advanced Studies, Research and Development in Sardinia ACM ICFP 2008 Erlang Workshop September 27th,

More information

COMPUTER SCIENCE TRIPOS

COMPUTER SCIENCE TRIPOS CST.2016.2.1 COMPUTER SCIENCE TRIPOS Part IA Tuesday 31 May 2016 1.30 to 4.30 COMPUTER SCIENCE Paper 2 Answer one question from each of Sections A, B and C, and two questions from Section D. Submit the

More information

Moving into the information age: From records to Google Earth

Moving into the information age: From records to Google Earth Moving into the information age: From records to Google Earth David R. R. Smith Psychology, School of Life Sciences, University of Hull e-mail: davidsmith.butterflies@gmail.com Introduction Many of us

More information

CS 243 Lecture 11 Binary Decision Diagrams (BDDs) in Pointer Analysis

CS 243 Lecture 11 Binary Decision Diagrams (BDDs) in Pointer Analysis CS 243 Lecture 11 Binary Decision Diagrams (BDDs) in Pointer Analysis 1. Relations in BDDs 2. Datalog -> Relational Algebra 3. Relational Algebra -> BDDs 4. Context-Sensitive Pointer Analysis 5. Performance

More information

Geodatabase Best Practices. Dave Crawford Erik Hoel

Geodatabase Best Practices. Dave Crawford Erik Hoel Geodatabase Best Practices Dave Crawford Erik Hoel Geodatabase best practices - outline Geodatabase creation Data ownership Data model Data configuration Geodatabase behaviors Data integrity and validation

More information

INF2270 Spring Philipp Häfliger. Lecture 8: Superscalar CPUs, Course Summary/Repetition (1/2)

INF2270 Spring Philipp Häfliger. Lecture 8: Superscalar CPUs, Course Summary/Repetition (1/2) INF2270 Spring 2010 Philipp Häfliger Summary/Repetition (1/2) content From Scalar to Superscalar Lecture Summary and Brief Repetition Binary numbers Boolean Algebra Combinational Logic Circuits Encoder/Decoder

More information

What are the five components of a GIS? A typically GIS consists of five elements: - Hardware, Software, Data, People and Procedures (Work Flows)

What are the five components of a GIS? A typically GIS consists of five elements: - Hardware, Software, Data, People and Procedures (Work Flows) LECTURE 1 - INTRODUCTION TO GIS Section I - GIS versus GPS What is a geographic information system (GIS)? GIS can be defined as a computerized application that combines an interactive map with a database

More information

Data Structures. Outline. Introduction. Andres Mendez-Vazquez. December 3, Data Manipulation Examples

Data Structures. Outline. Introduction. Andres Mendez-Vazquez. December 3, Data Manipulation Examples Data Structures Introduction Andres Mendez-Vazquez December 3, 2015 1 / 53 Outline 1 What the Course is About? Data Manipulation Examples 2 What is a Good Algorithm? Sorting Example A Naive Algorithm Counting

More information

ArcGIS GeoAnalytics Server: An Introduction. Sarah Ambrose and Ravi Narayanan

ArcGIS GeoAnalytics Server: An Introduction. Sarah Ambrose and Ravi Narayanan ArcGIS GeoAnalytics Server: An Introduction Sarah Ambrose and Ravi Narayanan Overview Introduction Demos Analysis Concepts using GeoAnalytics Server GeoAnalytics Data Sources GeoAnalytics Server Administration

More information

Factorized Relational Databases Olteanu and Závodný, University of Oxford

Factorized Relational Databases   Olteanu and Závodný, University of Oxford November 8, 2013 Database Seminar, U Washington Factorized Relational Databases http://www.cs.ox.ac.uk/projects/fd/ Olteanu and Závodný, University of Oxford Factorized Representations of Relations Cust

More information

COMPUTER SCIENCE TRIPOS

COMPUTER SCIENCE TRIPOS CST0.2017.2.1 COMPUTER SCIENCE TRIPOS Part IA Thursday 8 June 2017 1.30 to 4.30 COMPUTER SCIENCE Paper 2 Answer one question from each of Sections A, B and C, and two questions from Section D. Submit the

More information

6.830 Lecture 11. Recap 10/15/2018

6.830 Lecture 11. Recap 10/15/2018 6.830 Lecture 11 Recap 10/15/2018 Celebration of Knowledge 1.5h No phones, No laptops Bring your Student-ID The 5 things allowed on your desk Calculator allowed 4 pages (2 pages double sided) of your liking

More information

Analysis of Algorithms [Reading: CLRS 2.2, 3] Laura Toma, csci2200, Bowdoin College

Analysis of Algorithms [Reading: CLRS 2.2, 3] Laura Toma, csci2200, Bowdoin College Analysis of Algorithms [Reading: CLRS 2.2, 3] Laura Toma, csci2200, Bowdoin College Why analysis? We want to predict how the algorithm will behave (e.g. running time) on arbitrary inputs, and how it will

More information

csci 210: Data Structures Program Analysis

csci 210: Data Structures Program Analysis csci 210: Data Structures Program Analysis 1 Summary Summary analysis of algorithms asymptotic analysis big-o big-omega big-theta asymptotic notation commonly used functions discrete math refresher READING:

More information

Dot-Product Join: Scalable In-Database Linear Algebra for Big Model Analytics

Dot-Product Join: Scalable In-Database Linear Algebra for Big Model Analytics Dot-Product Join: Scalable In-Database Linear Algebra for Big Model Analytics Chengjie Qin 1 and Florin Rusu 2 1 GraphSQL, Inc. 2 University of California Merced June 29, 2017 Machine Learning (ML) Is

More information

Computation Theory Finite Automata

Computation Theory Finite Automata Computation Theory Dept. of Computing ITT Dublin October 14, 2010 Computation Theory I 1 We would like a model that captures the general nature of computation Consider two simple problems: 2 Design a program

More information

In-Database Factorised Learning fdbresearch.github.io

In-Database Factorised Learning fdbresearch.github.io In-Database Factorised Learning fdbresearch.github.io Mahmoud Abo Khamis, Hung Ngo, XuanLong Nguyen, Dan Olteanu, and Maximilian Schleich December 2017 Logic for Data Science Seminar Alan Turing Institute

More information

Aalto University 2) University of Oxford

Aalto University 2) University of Oxford RFID-Based Logistics Monitoring with Semantics-Driven Event Processing Mikko Rinne 1), Monika Solanki 2) and Esko Nuutila 1) 23rd of June 2016 DEBS 2016 1) Aalto University 2) University of Oxford Scenario:

More information

Prosurv LLC Presents

Prosurv LLC Presents Prosurv LLC Presents An Enterprise-Level Geo-Spatial Data Visualizer Part IV Upload Data Upload Data Click the Upload Data menu item to access the uploading data page. Step #1: Select a Project All Projects

More information

41. Sim Reactions Example

41. Sim Reactions Example HSC Chemistry 7.0 41-1(6) 41. Sim Reactions Example Figure 1: Sim Reactions Example, Run mode view after calculations. General This example contains instruction how to create a simple model. The example

More information

CPU Scheduling. CPU Scheduler

CPU Scheduling. CPU Scheduler CPU Scheduling These slides are created by Dr. Huang of George Mason University. Students registered in Dr. Huang s courses at GMU can make a single machine readable copy and print a single copy of each

More information

OBEUS. (Object-Based Environment for Urban Simulation) Shareware Version. Itzhak Benenson 1,2, Slava Birfur 1, Vlad Kharbash 1

OBEUS. (Object-Based Environment for Urban Simulation) Shareware Version. Itzhak Benenson 1,2, Slava Birfur 1, Vlad Kharbash 1 OBEUS (Object-Based Environment for Urban Simulation) Shareware Version Yaffo model is based on partition of the area into Voronoi polygons, which correspond to real-world houses; neighborhood relationship

More information

Scripting Languages Fast development, extensible programs

Scripting Languages Fast development, extensible programs Scripting Languages Fast development, extensible programs Devert Alexandre School of Software Engineering of USTC November 30, 2012 Slide 1/60 Table of Contents 1 Introduction 2 Dynamic languages A Python

More information

Supplementary Material

Supplementary Material Supplementary Material Contents 1 Keywords of GQL 2 2 The GQL grammar 3 3 THE GQL user guide 4 3.1 The environment........................................... 4 3.2 GQL projects.............................................

More information

Event Operators: Formalization, Algorithms, and Implementation Using Interval- Based Semantics

Event Operators: Formalization, Algorithms, and Implementation Using Interval- Based Semantics Department of Computer Science and Engineering University of Texas at Arlington Arlington, TX 76019 Event Operators: Formalization, Algorithms, and Implementation Using Interval- Based Semantics Raman

More information

Algorithms for Data Science

Algorithms for Data Science Algorithms for Data Science CSOR W4246 Eleni Drinea Computer Science Department Columbia University Tuesday, December 1, 2015 Outline 1 Recap Balls and bins 2 On randomized algorithms 3 Saving space: hashing-based

More information

Multiple-Site Distributed Spatial Query Optimization using Spatial Semijoins

Multiple-Site Distributed Spatial Query Optimization using Spatial Semijoins 11 Multiple-Site Distributed Spatial Query Optimization using Spatial Semijoins Wendy OSBORN a, 1 and Saad ZAAMOUT a a Department of Mathematics and Computer Science, University of Lethbridge, Lethbridge,

More information

Quiz 2. Due November 26th, CS525 - Advanced Database Organization Solutions

Quiz 2. Due November 26th, CS525 - Advanced Database Organization Solutions Name CWID Quiz 2 Due November 26th, 2015 CS525 - Advanced Database Organization s Please leave this empty! 1 2 3 4 5 6 7 Sum Instructions Multiple choice questions are graded in the following way: You

More information

CHAPTER 2 EXTRACTION OF THE QUADRATICS FROM REAL ALGEBRAIC POLYNOMIAL

CHAPTER 2 EXTRACTION OF THE QUADRATICS FROM REAL ALGEBRAIC POLYNOMIAL 24 CHAPTER 2 EXTRACTION OF THE QUADRATICS FROM REAL ALGEBRAIC POLYNOMIAL 2.1 INTRODUCTION Polynomial factorization is a mathematical problem, which is often encountered in applied sciences and many of

More information

Fundamentals of Computational Science

Fundamentals of Computational Science Fundamentals of Computational Science Dr. Hyrum D. Carroll August 23, 2016 Introductions Each student: Name Undergraduate school & major Masters & major Previous research (if any) Why Computational Science

More information

Geodatabase An Introduction

Geodatabase An Introduction 2013 Esri International User Conference July 8 12, 2013 San Diego, California Technical Workshop Geodatabase An Introduction David Crawford and Jonathan Murphy Session Path The Geodatabase What is it?

More information

csci 210: Data Structures Program Analysis

csci 210: Data Structures Program Analysis csci 210: Data Structures Program Analysis Summary Topics commonly used functions analysis of algorithms experimental asymptotic notation asymptotic analysis big-o big-omega big-theta READING: GT textbook

More information

B629 project - StreamIt MPI Backend. Nilesh Mahajan

B629 project - StreamIt MPI Backend. Nilesh Mahajan B629 project - StreamIt MPI Backend Nilesh Mahajan March 26, 2013 Abstract StreamIt is a language based on the dataflow model of computation. StreamIt consists of computation units called filters connected

More information

Analysis of Algorithms

Analysis of Algorithms Presentation for use with the textbook Data Structures and Algorithms in Java, 6th edition, by M. T. Goodrich, R. Tamassia, and M. H. Goldwasser, Wiley, 2014 Analysis of Algorithms Input Algorithm Analysis

More information

GPU Acceleration of Cutoff Pair Potentials for Molecular Modeling Applications

GPU Acceleration of Cutoff Pair Potentials for Molecular Modeling Applications GPU Acceleration of Cutoff Pair Potentials for Molecular Modeling Applications Christopher Rodrigues, David J. Hardy, John E. Stone, Klaus Schulten, Wen-Mei W. Hwu University of Illinois at Urbana-Champaign

More information

COMPUTER SCIENCE TRIPOS

COMPUTER SCIENCE TRIPOS CST.2016.6.1 COMPUTER SCIENCE TRIPOS Part IB Thursday 2 June 2016 1.30 to 4.30 COMPUTER SCIENCE Paper 6 Answer five questions. Submit the answers in five separate bundles, each with its own cover sheet.

More information

1 ListElement l e = f i r s t ; / / s t a r t i n g p o i n t 2 while ( l e. next!= n u l l ) 3 { l e = l e. next ; / / next step 4 } Removal

1 ListElement l e = f i r s t ; / / s t a r t i n g p o i n t 2 while ( l e. next!= n u l l ) 3 { l e = l e. next ; / / next step 4 } Removal Präsenzstunden Today In the same room as in the first week Assignment 5 Felix Friedrich, Lars Widmer, Fabian Stutz TA lecture, Informatics II D-BAUG March 18, 2014 HIL E 15.2 15:00-18:00 Timon Gehr (arriving

More information

Universal Turing Machine. Lecture 20

Universal Turing Machine. Lecture 20 Universal Turing Machine Lecture 20 1 Turing Machine move the head left or right by one cell read write sequentially accessed infinite memory finite memory (state) next-action look-up table Variants don

More information

GIS-BASED DISASTER WARNING SYSTEM OF LOW TEMPERATURE AND SPARE SUNLIGHT IN GREENHOUSE

GIS-BASED DISASTER WARNING SYSTEM OF LOW TEMPERATURE AND SPARE SUNLIGHT IN GREENHOUSE GIS-BASED DISASTER WARNING SYSTEM OF LOW TEMPERATURE AND SPARE SUNLIGHT IN GREENHOUSE 1,2,* 1,2 Ruijiang Wei, Chunqiang Li, Xin Wang 1, 2 1 Hebei Provincial Institute of Meteorology, Shijiazhuang, Hebei

More information

Knowledge Discovery and Data Mining 1 (VO) ( )

Knowledge Discovery and Data Mining 1 (VO) ( ) Knowledge Discovery and Data Mining 1 (VO) (707.003) Map-Reduce Denis Helic KTI, TU Graz Oct 24, 2013 Denis Helic (KTI, TU Graz) KDDM1 Oct 24, 2013 1 / 82 Big picture: KDDM Probability Theory Linear Algebra

More information

McBits: Fast code-based cryptography

McBits: Fast code-based cryptography McBits: Fast code-based cryptography Peter Schwabe Radboud University Nijmegen, The Netherlands Joint work with Daniel Bernstein, Tung Chou December 17, 2013 IMA International Conference on Cryptography

More information

From BASIS DD to Barista Application in Five Easy Steps

From BASIS DD to Barista Application in Five Easy Steps Y The steps are: From BASIS DD to Barista Application in Five Easy Steps By Jim Douglas our current BASIS Data Dictionary is perfect raw material for your first Barista-brewed application. Barista facilitates

More information