QR decomposition integration into DBMS

Size: px
Start display at page:

Download "QR decomposition integration into DBMS"

Transcription

1 Department of Informatics, University of Zürich BSc Thesis QR decomposition integration into DBMS Dzmitry Katsiuba Matrikelnummer: July 2, 2017 supervised by Prof. Dr. M. Böhlen and Oksana Dolmatova

2

3 Acknowledgements Most of all, I would like to express my sincere gratitude to my supervisor Oksana Dolmatova for her patience, helpfulness and support. I am very thankful for the feedback and guidance she provided. Also, I would like to thank Prof. Dr. Michael Böhlen for giving me the opportunity to write my bachelor thesis at the Database Technology Group of the University of Zurich.

4

5 Abstract The demand to analyse data stored in DBMSs has increased significantly during the last few years. Since the analysis of scientific data is mostly based on statistical and linear algebra operations (e.g. vector multiplication, operations on matrices), the computation of the latter plays a big role in data processing. However, the current approach to deal with statistics is to export data from a DBMS to a math program, like R or MATLAB. This implies additional time and memory costs. At the same time the column-store approach has become popular and a number of hybrid or pure column-store systems, such as MonetDB, Apache Cassandra etc. are available. I investigate the benefits of incorporating a linear operation into a column-oriented DBMS. For this purpose, I integrate the QR decomposition in MonetDB, analyse the complexity of the implementation and empirically compare the performance with the existing R-solution (exporting the data with UDFs). The results of experimental evaluation show, that the embedded R solution works faster than the QR integration in MonetDB, when a virtualized environment used. On an unvirtualized host MonetDB has a significantly better performance, exceeding the results of R. The implemented Q QR function allows calculation on relations directly in the DBMS. It uses the SQL syntax, which makes the usage of the function easy and intuitive. The user doesn t require any additional skills, while writing of UDF functions for different sets of parameters means a certain programmer effort.

6

7 Zusammenfassung Die Nachfrage nach Datenanalyse der in Datenbanken gespeicherter Information ist in den letzten Jahren deutlich gewachsen. Die Auswertung der wissenschaftlichen Daten basiert meist auf statistischen und linearen Algebraoperationen (z.b. Vektorrechnungen, Matrizen-Operationen usw.). Dabei erhält die Möglichkeit, direkt in Datenbanken diese Operationen auszuführen, ein immer grösseres Gewicht. Die verbreitete Vorgehensweise besteht darin, Daten in ein mathematisches oder statistisches Programm, wie beispielsweise R oder MATLAB, zu exportieren. Das Problem dabei: Es verursacht zusätzlichen Zeit- und Speicheraufwand. Gleichzeitig werden spaltenorientierte Datenbanken immer populärer. Auch sind auf dem Markt reine oder hybride spaltenorientierte Datenbanken (wie MonetDB, Apache Cassandra) verfügbar. Meine Arbeit untersucht die Vorteile der Integration von linear algebraischen Operationen in einer spaltenorientierten Datenbank. Dazu integriere ich die QR-Zerlegung in MonetDB, analysiere die Komplexität der Implementierung und vergleiche empirisch ihre Performanz mit der existierenden R-Lösung (unter Anwendung von UDFs für den Datenexport). Die Ergebnisse meiner experimentellen Analyse zeigen, dass in einer virtuellen Umgebung die eingebettete R-Lösung eine grössere Schnelligkeit aufweist, als die QR-Integration in MonetDB. Anders verhält es sich in einer nicht virtuellen Umgebung: Da weist MonetDB eine signifikant bessere Performanz auf und überholt gar die R-Lösung. Die implementierte Q QR-Funktion ermöglicht die Berechnungen auf Relationen direkt in der Datenbank. Die dabei genutzte, übliche SQL-Syntax macht den Gebrauch dieser Funktion einfach und intuitiv. Dafür benötigt der Benutzer keine zusätzlichen Kenntnisse, während das Schreiben von UDF-Funktionen für verschiedene Parameterkombinationen einen zusätzlichen Programmieraufwand bedeutet.

8

9 Table of Contents 1 Introduction Motivation Problem definition Related Work QR Decomposition Matrix operations in a DBMS Task description 7 4 Approach MonetDB Implementation Symbol tree Relation tree Statement tree Translation to MAL-Plan Complexity Analysis Complexity of the Gram-Schmidt algorithm Complexity of the implementation Experimental evaluation Setup Test Case Selection and Metrics Data Set Results W ith ordering mode W ithout ordering mode Operating environment Change of complexity Optimization Summary and future work 43

10 x Table of Contents A Code 47 A.1 sql parser.y A.2 rel bin.c A.3 sql gencode.c B Relation-Plan 55 C MAL-Plan 57 C.1 MAL-Plan for with ordering mode C.2 MAL-Plan for without ordering mode D UDF 63 D.1 UDF for with ordering mode D.2 UDF for without ordering mode D.3 UDF without Q QR E Detailed test results 65 x

11 1 Introduction 1.1 Motivation The amount of data produced by scientists has increased significantly during the last decades. The demand for structured and practical approach to processing and storing these data is thus even higher. Database management systems (DBMSs) represent an efficient storage solution with integrated set of mathematical and statistical operations (e.g. minimum, maximum, average, count etc.). However, scientific data often require a more complicated evaluation and computations, such as linear algebra operations, which cannot always be processed directly in DBMSs. Math and statistics programs, like R or MATLAB, allow eluding the lack of statistical functionality in popular DBMSs. At the same time column-oriented DBMSs have attracted a lot of attention. These databases differ in the way information is stored, namely the attribute values belonging to the same column are stored separately, contiguously and densely packed, while the traditional row-oriented database systems store the entire records (rows) one by one [1]. The column-store approach has become popular and a number of hybrid or pure column-store systems, such as MonetDB, Kdb, VectorWise, Infobright, Exasol are now available. Column-oriented systems have some important advantages compared to rowstores, e.g. compression of the repeating values in columns, working with linear sets of values, especially when those sets are much larger than available RAM. Specifically due to these advantages column-oriented databases offer a great flexibility for analytical workloads. The goal of my thesis is to expand MonetDB functionality by integrating one of the most well-known linear algebra operations, the QR decomposition (QRD), and to investigate the benefits of incorporating a linear operation into a column-oriented DBMS. 1.2 Problem definition Many mathematical and statistical problems arising during scientific research are solved with the help of linear algebra operations, which require matrix and array operations. QRD is one of the popular operations. Calculations of eigenvalue algorithm, the QR algorithm, the linear least squares problem and many other problems are often solved

12 2 CHAPTER 1. INTRODUCTION using the QRD. Unfortunately, it doesn t belong to the standard set of database functions and additional steps for its execution are required. Moreover, the QRD, as any linear operation, is defined over matrices, but not relations. A relation should be adjusted in order to be suitable for the QRD. The current approach to perform this and other statistical calculation is to export data from a DBMS, perform the operation using a math tool and after that import the results back into the DBMS. This process implies additional time and memory costs. Moreover, it is a potential source of errors. The integration of the QRD in a column-oriented DBMS can be done by implementing a new function, which returns the matrix Q from the QRD (Q QR function). This approach allows doing the decomposition directly in the database, using existing relations, without additional export and import operations. It also saves resources and excludes possible non-calculation errors. 2

13 2 Related Work In this section I give an overview about QRD algorithms, describe DBMSs that allow working with linear algebra operations. I also mention the possibility of processing these operations on database relations. 2.1 QR Decomposition QR Decomposition (QRD) of a matrix, also known as the QR factorization, is a decomposition of matrix A into a product A = QR, where Q is an orthogonal matrix, and R is an upper triangular matrix. I assume, that the matrix A is of size mxn, where m is a number of rows, n is a number of columns and m n. QRD is used for solving several mathematical problems, such as: computation of matrix inversion, determination of the numerical rank of a given matrix, singular value decomposition, least squares problem, etc. Algorithm 1 Classical Gram-Schmidt algorithm [2] 1: for k:=1 to n do 2: for i:=1 to k-1 do 3: s := 0 4: for j:=1 to m do 5: s := s + a j,i a j,k 6: r i,k := s 7: for i:=1 to k-1 do 8: for j:=1 to m do 9: a j,k := a j,k a j,i r i,k 10: s := 0 11: for j:=1 to m do 12: s := s + a 2 j,k 13: r k,k := sqrt(s) 14: for j:=1 to m do 15: a j,k := a j,k /r k,k

14 4 CHAPTER 2. RELATED WORK There are two common algorithms for executing QR-decomposition: Householder transformations Gram-Schmidt algorithm (see Algorithm 1) These algorithms differ in terms of performance and possibility of parallelizing their execution. Transforming parts of the classical Gram-Schmidt algorithm into euclidean norm and dot product functions produces the vector-based version of the algorithm (see Algorithm 2). The algorithm normalizes the vectors in sequence, doing orthogonalization for the rest of the vectors after every normalization (reorthogonalizing them by already normalized vectors). Algorithm 2 Vector-based Gram-Schmidt algorithm 1: for k:=1 to n number of vectors do 2: r k,k := euclidean norm(a k ) normalization 3: q k := a k /r k,k 4: for j:=k+1 to n do 5: r := dot product(q k, a j ) orthogonalization 6: a j := a j r q k The reorthogonalization doubles the computational effort and for this reason the method of Householder is usually preferred. The advantage of Gram-Schmidt algorithm is that it is possible to perform the orthogonalization of one vector to as many other vectors as it is needed [2]. If, as according to the task (see Chapter 3), only the matrix Q is needed, then the modified Gram-Schmidt algorithm is much more efficient [9]. 2.2 Matrix operations in a DBMS Data-intensive research fields rely on the ability to efficiently process massive amounts of experimental data using database technologies. A DBMS works mostly with relations and not with arrays or matrices, as it is often necessary for scientific purposes. Outsourcing those computations to other programs represents a solution for this computation gap. The most DBMSs have only a limited set of algebra operations and functions, such as count, minimum, maximum, average etc. These are functions that are not dependent on the order of the values in the attributes. More complex operations that respect the order of attributes are not built-in and can only be performed using user-defined functions (UDF), embedded packages or by exporting the data into external solutions. One of the biggest relational database Oracle has an R Enterprise solution, which integrates R with Oracle Database [10]. MonetDB has an embedded R package, which also allows using the full functionality of R inside of MonetDB. However, this approach needs programming effort and involves writing of single UDF function for each set of input parameters (e.g by changing the number or data type of input columns, etc.). The 4

15 2.2. MATRIX OPERATIONS IN A DBMS 5 implemented Q QR function allows processing the linear algebra operation directly in DBMS, without exporting and re-importing of the data. To bridge the gap between the needs of the scientific world and the current DBMS technologies, the first SQL-based query language (SciQL) for scientific applications was introduced. SciQL s key innovation is the introduction of an array type, which works on the same level as a table. It uses both tables and arrays as first-class citizens and provides relational algebra-like operations on them [15]. The difference between these two types is their structure: tables are represented by a set of tuples, while an array is identified by attributes (dimensions) and their constraints. Combination of indeces (values of these attributes) denotes a cell, where a value of one non-dimensional attribute is stored. Thus, one tuple from a table is represented in arrays by a set of cell indices and the non-dimensional value in this cell. SciQL uses MonetDB as the target platform. Its column-store architecture corresponds well to the array representation, which significantly reduces the impedance mismatch between query language and array manipulation and, as a result, the development effort [16]. To address arrays in queries, the user applies the table syntax. That makes the transition from SQL to SciQL easier [6]. The implemented Q QR function allows working directly on relations, without paying attention to the array representation or learning a new query language. SciDB is an open-source analytical database that provides not only data management, but also complex analytics. In contrast to SciQL, SciDB was completely new developed. SciDB is oriented towards the data management needs of scientists, fulfilling their requirements regarding complex analytics and working with the large data sets [14]. As such it mixes statistical and linear algebra operations with data management operations (e.g. create, delete, update values, managing of constrains, etc.). SciDB takes natural nested multi-dimensional array data model as basis that eliminates the conversion between tables and arrays. For this goal a new functional (AFL) and a SQL-like query (AQL) languages were invented. SciDB provides an extensibility framework (for user-defined data types, functions, aggregates but also array operators). That makes it possible to apply highly specific algorithms in conjunction with some pre-processing handled by SciDB s built-in or user-defined array operators [13]. In contrast to SciDB for employing the new defined function in MonetDB, the user doesn t need to learn a new query language or deal with a new data model. The Q QR function runs directly on existing relations. 5

16

17 3 Task description In this chapter I describe the function that has to be implemented, including its structure, input parameters and constraints. I give a running example for the input relation, on which I show the defined parameters. Finally, using the running example, I define the expected results of the function. The goal of this thesis is to integrate the QR decomposition (QRD) into columnoriented DBMS by extending it with the new Q QR function. So that afterwards the following SQL query can be processed: SELECT F ROM Q QR (relation name ON on attributes ORDER BY order by attributes); (3.1) This query returns a relation with the same schema as the input relation and represents the Q matrix from the QRD. Since I want only the Q matrix as the result, QRD can be implemented according to the Gram-Schmidt Algorithm (see Chapter 2.1). Figure 3.1 illustrates the structure of the input relation. Descriptive part {}}{ Application part {}}{ o 1 o 2 o... o n d 1 d 2 d... d n a 1 a 2 a... a n } {{ } Order By part Figure 3.1: Structure of the input relation The application part (in (3.1) on attributes) includes the set of only numeric-domain attributes, which represents the matrix A for the QR decomposition (see Chapter 2.1). The relationship between the size of the application part and the whole number of attributes in the relation forms the ratio of the application part. Attributes, which are not in the application part, form the descriptive part of the input relation. The Order By part is a subset of descriptive part (in (3.1) order by attributes). It includes

18 8 CHAPTER 3. TASK DESCRIPTION the attributes, the tuples in the output relation and the corresponding matrix A for the QRD have to be sorted by. The attributes in Order By part can also be of non-numeric data types. The ratio of the Order By part is a relationship between its size and the number of attributes in the descriptive part. For the implementation I use MonetDB (Jun2016-SP2). The column-oriented architecture of MonetDB works with the attributes as arrays. It allows me to use the vector-based version of the algorithm (see Algorithm 2), which handles the values of one attribute together as one vector. The DBMS has to recognize ambiguous or not existing names of attributes and the input relation in the passed query, and, if applicable, it returns an appropriate error message. The input relation can be an existing relation or a result of an arbitrary select subquery. As a running example I take a relation r, which is displayed in Figure 3.2. Descriptive part {}}{ Application part {}}{ a b c x y z t t t t t t } {{ } Order By part Figure 3.2: Relation r An example SQL query for this relation is shown in Query (3.2) SELECT F ROM Q QR (r ON x, y, z ORDER BY a); (3.2) QRD is performed over 3 attributes (x, y, z) from the application part of r. After ordering the attributes by a, the relation r and the matrix A look as shown in Figure 3.3. a b c x y z t t t t t t A = Figure 3.3: Relation r ordered by attribute a and the corresponding matrix A 8

19 9 a b c x y z Figure 3.4: Results of the query (3.2) The Order By part in (3.2) is presented only by attribute a, the ratio of Order By part is thus 1/3, while the ratio of the application part is 50%. Figure 3.4 demonstrates what the results of the Q QR function applied on the relation r has to look like. 9

20

21 4 Approach In this chapter I give a description of the architecture and features of the used DBMS (MonetDB). Afterwards I describe the necessary implementation steps. 4.1 MonetDB MonetDB is a column-oriented database management system developed at the Centrum Wiskunde & Informatica (CWI) in the Netherlands. It is actively used in businesses such as health care, telecommunications as well as in sciences such as astronomy. Innovations at all layers of a DBMS such as storage model based on vertical fragmentation, a modern architecture of CPU-tuned query execution, different ways of optimizations, adaptive indexing and a modular software architecture allow MonetDB to achieve significant speed up compared to traditional designs [5]. It supports SQL:2003 standard on client side. Architecture MonetDB has a three-level software stack: SQL front-end: The top layer of the architecture provides the access point for the user. The task of front-end is to map the user s data to the MonetDB s internal structure (BAT) and to translate user queries to MonetDB Assembly Language, known as MAL. The user query is transformed from a SQL query to an internal relational algebra representation, which is then optimized using domain-specific rules and heuristics (e.g. reducing the size of intermediate results, by pushing selections down in the tree). This represents the strategic optimization of the parsed query [5]. At this level I need to extend the parser with the new syntax, and ensure that all internal transformations and the strategic optimizations are done. Tactical-optimizers: In the middle layer the resulting MAL-plan is going through a row of tactical optimizations. It is inspired more by the programming language than by classical database query optimization. Each optimizer, sprinkled with resource management or flow of control directives, takes a MAL plan and transforms it into a more efficient one. It provides facilities ranging from symbolic processing up to just-in-time

22 12 CHAPTER 4. APPROACH r a b c x y z OID a OID b OID c OID x OID y OID z o 1 7 o 1 8 o 1 3 o 1 0 o 1 2 o o 2 1 o 2 2 o 2 0 o 2 3 o 2 4 o = o 3 4 o 3 1 o 3 8 o 3 2 o 3 1 o o 4 2 o 4 4 o 4 1 o 4 8 o 4 9 o o 5 9 o 5 0 o 5 8 o 5 2 o 5 9 o o 6 4 o 6 5 o 6 2 o 6 1 o 6 0 o 6 1 } {{ } BAT s Figure 4.1: Internal (BAT) representation of relation r data distribution [5]. Columnar abstract-machine kernel: The bottom layer of MonetDB stores each attribute of a relation as columns (in fact as arrays) known as BAT. It includes the library of highly optimized implementations of the binary relational algebra operators, stored in corresponding BAT modules. Those operators have access to the whole metadata of BATs, it allows them to perform operational optimization. The interface is described in the MAL module sections. For the implementation of the task I can reuse already existing BAT algebra modules. Nothing additional needs to be done at this level for the purpose of optimization. BAT MonetDB stores each attribute of a relation in a separate table, called a BAT (Binary Association Table). Each BAT consist of two columns: the head column, where the object-identifiers (OID) are stored, and the tail column, where the actual attribute values are stored. Due to this approach every relation is internally represented as a collection of BATs. Physically BATs are represented as consecutive C arrays. They can be up to hundreds of megabytes. The operating system swaps them into memory and compresses on the disk upon need [3]. The example relation r consists of 6 attributes, for these attributes 6 BATs exist. Each BAT stores the respective attribute as (OID, value) pairs. The internal representation of r is illustrated in Figure 4.1. OIDs (o 1, o 2,...,o 6 ) are generated from the system and identify the tuple the value belongs to. The values from the same tuple are assigned the same OID (e.g. in 6 BATs that represent relation r values 7, 8, 3, 0, 2, 5 at the very top of each BAT have the same OID o 1. That means, that all of them belong to the same tuple t 1 ). The head OID column is not materialised, but rather implicitly given by the position of the value, so all values of one tuple will be on the same position in their respective column representation. The position, in turn, is determined by the inserting order of tuples [5]. Relational algebra plans are translated into simple BAT algebra operations and com- 12

23 4.1. MONETDB 13 piled to MAL programs. BAT algebra operators perform simple operations on an entire column of values ( bulk processing ). MAL MonetDB Assembly Language (MAL) is the language used for the abstract machine of the MonetDB kernel. It is the target language for front-end query compilers. Every SQL query generates an execution plan, consisting of simple MAL instructions. These instructions represent all actions, which are necessary in order to provide and deliver the results (e.g. actions to ensure binding data and transaction control, the instructions to produce the results, and the administration steps for preparing and delivering the resulting relation to the front-end). BAT algebra operators correspond to simple MAL instructions, thus building its core. MAL instructions have only BATs as input, on which corresponding simple BAT operations are performed. Complex operators can be broken down into a sequence of BAT algebra operators, which in their turn are mapped to simple operations on arrays. This approach allows avoiding additional expression interpretations. An example of such implementation of a select operator in MonetDB is shown in Figure 4.2 [5]. The select value (V) is compared to the values, stored in the tail part of the input BAT (B), which in its turn is stored as a C-Array. If the value is found, its OID is stored in the tail part of the resulting BAT (R). for ( i=j=0 ; i<n ; i++ ) select (B:bat:[:oid:int], V:int) if ( B.tail[i] == V) R.tail[j++] = i; Figure 4.2: Implementation of select operation The for-loops containing no external function calls provide high instruction locality. That enables better compiler optimization and leads to reduction in the number of instruction cache misses. MonetDB provides an internal function to show the MAL-Plan of a passed query. The operations in it are executed one at a time. That means, that subsequent operations are invoked only after the operation completed the calculation over its entire input data. The idea of BAT operations is always to have a materialized result after its execution. Query Processing Symbol tree: A query passed into the client is parsed in order to detect the specified keywords (tokens). The tokens help the Yacc parser generator to generate a shift-reduce parser. The input to Yacc is a grammar (sql parser.y) with snippets of C++ code ( actions ) attached to rules of the grammar. As soon as the rule is recognised, the parser 13

24 14 CHAPTER 4. APPROACH calls the code snippets associated with this rule. During the execution of these snippets nodes of different types are built and connected together in a symbol tree. Symbol tree represents only the structure of parsed query. Relation tree: The resulting symbol tree is used to build a relation tree. The relation tree consists of nodes, representing a single operation on input tables, containing necessary information for this operation. The nodes of the symbol tree are parsed, analysed, grouped together and converted into corresponding relation tree node, depending on tokens in them. Each node in the relation tree may have up to two children. Afterwards the strategic optimization is carried out. Statement tree: A statement tree is a special feature of MonetDB. It is an intermediate step between a relation tree and a low-level execution plan (MAL-Plan). Comparing to the resulting relation tree, which nodes represent operations on or between relations, each statement in the statement tree represents the sequence of operations on attributes and returns one BAT as a result. Each operation is represented by a MAL-expression that in its turn corresponds to simple BAT algebra operations. Putting it all together, MonetDB provides a good extensibility framework. That allows to extend its functionality, SQL syntax and to make changes at all levels of architecture. The BAT structure using the OID guarantees, that the values from the same tuple belong together, despite the order of the tuples in BATs. This mechanism allows algebra and statistical operations, for which the order and the relation between the values are essential. Breaking down the operations to the simple operations on arrays and the set of existing BAT algebra operations provide a good basis for implementation of sophisticated and complex calculations. 14

25 4.2. IMPLEMENTATION Implementation In this subchapter I describe the changes in MonetDB, which has to be done in order to implement the task query. I also show the examples of trees built by MonetDB during the processing of Q QR function, using the example SQL query (3.2) Symbol tree To parse the task query I expand the existing parser grammar by a new token Q QR. For processing the new token I define additional rules and snippets of code for these rules. The extended part of the grammar can be found in Appendix A.1. Figure 4.3 illustrates the symbol tree, which is produced after parsing and executing the corresponding code. SELECT * FROM Q QR (r ON x, y, z ORDER BY a); SelectNode F ROM Q QR r Application part ORDER BY x y z a Figure 4.3: Symbol tree for the query (3.2) The symbols are build according to the tokens recognized in the passed query. SelectN ode is a route symbol-node for all SELECT-queries. It is connected to all symbols containing the information, which is necessary to produce the results from the input relation and to deliver the needed result relation (e.g. symbols like W HERE, GROUP BY, ORDER BY, HAV ING etc.). The query (3.2) produces the SelectN ode connected only to one symbol namely one with the information about the source relation (F ROM). The new Q QR token also produces a symbol, connected to the symbols with input parameters: relation r, Application part and ORDER BY. The last two parameters are lists, containing the COLUMN-symbols. The Application part contains the columns (x, y, z) for the matrix used during the QRD and ORDER BY contains the column (a) for determining the order of values in the resulting relation. The information in COLU M N-symbols must correspond with the attributes in the input relation r. The result of the Q QR-symbol represents the only source relation for the F ROM-symbol. 15

26 16 CHAPTER 4. APPROACH Relation tree To implement the task I add a new function that deals with the new Q QR-symbol in the symbol tree. This function uses the Q QR-symbol as well as the related symbols with corresponding information to build a proper relation tree node for the Q QR function. The resulting relation tree has tree nodes, as shown in Figure 4.4. SELECT * FROM Q QR (r ON x, y, z ORDER BY a); π a,b,c,x,y,z q qr a x,y,z r Figure 4.4: Relation tree for the query (3.2) After conversion of symbol tree nodes into relation tree nodes the information about input relation r is stored in a separate node. For storing the lists of columns for QRD (x, y, z) and ordering (a) I define a new type of a relation tree node (q qr relation). The resulting relation tree corresponds to the relation plan, produced by the internal MonetDB function (see Appendix B) Statement tree To fulfil the translation of the new q qr relation into MAL-Plan and make the calculation of the QRD possible, I need to extend the existing translation function. It takes the information from the q qr relation and, following the steps of the vector-based Gram- Schmidt Algorithm, translates it in a sequence of statements. The pseudocode of the translation is shown in Algorithm 3. The extended part of the function can be found in Appendix A.2. The resulting sequence of statements can be also presented in a tree form. Figure 4.5 displays such a statement tree for the q qr relation of the query (3.2). It shows both the sequence and the origin of the data required for each particular statement. Each statement tree node is a single BAT, representing the results of the done statement calculations. At the beginning I use the information from the q qr relation in order to create two lists of BATs for the descriptive (a, b, c) and the application parts (x, y, z) of the input relation. Using the Order By part (a), I determine the order of tuples in the output relation. For this purpose I reuse the operations of the existing statements for ordering and reordering of data in BATs. These statements are translated without changes into 16

27 4.2. IMPLEMENTATION 17 Algorithm 3 Q QR 1: procedure Q QR(relation, on attrs, order by attrs) 2: attrs attributes of relation 3: descriptive part attrs on attrs 4: if order by attrs then 5: order by order of order by attrs 6: for descriptive attr in descriptive part do 7: sort descriptive attr by order by 8: add descriptive attr to list output 9: for on attr in on attrs do 10: sort on attr by order by 11: add on attr to on attrs sorted 12: rest attrs on attrs sorted 13: for on attr in on attrs sorted do 14: normalize on attr 15: add on attr to list output 16: rest attrs rest attrs on attr 17: for rest attr in rest attrs do 18: orthogonalize rest attr to on attr return list output low-level MAL-plan and represented through existing BAT algebra operations. The ordering statement takes the BAT, by which it must be sorted, as input and returns a BAT containing the determined order of tuples (O list order attributes in Figure 4.5 O a ). After that the attributes in the descriptive and the application parts are ordered by this BAT. As a result a new ordered BAT for each of the attributes is produced (O order by attrs (on attr) in Figure 4.5 O a (a), O a (b),..., O a (z)). The ordered results of the descriptive part are directly appended to the output list. The ordered application part is stored as an intermediate list that is used for the Q QR calculation afterwards. Q QR calculation is defined by a sequence of normalization and orthogonalization operations on attributes. The operations of the corresponding statements and their results are not implemented in MonetDB and have to be defined. Detailed description of how these two operations work and how the resulting BATs (N orm(attribute) and Orth attribute2 (attribute 1 ), e.g. in Figure 4.5 Norm(x) and Orth x (y) or Orth y (z) respectively) are produced is given in the Chapter I append the BAT, representing the normalized attribute, to the output list. After all operations have been done and all attributes have been appended, the output list is passed as an argument to the next node of the relation tree. The projection of the results to the user is processed by existing statements, and doesn t require changes at this step. The MAL-Plan constructed for the query (3.2) is shown in the Appendix C.1. 17

28 18 CHAPTER 4. APPROACH SELECT * FROM Q QR (r ON x, y, z ORDER BY a); a a b a c a x a y a z a Output Norm(z) Orth y(z) Norm(y) Q QR Orth x(y) Orth x(z) Norm(x) x a y a z a O a(a) O a(b) O a(c) O a(x) O a(y) O a(z) Ordering O a a b c x y z a b c x y z }{{} Order By part } {{ } Descriptive part } {{ } Application part Input Translation to MAL-Plan Figure 4.5: Statement tree for the query (3.2) Two additional statements, required for the implementation of Gram-Schmidt algorithm and used by me for the translation of the q qr relation into low-level MAL instructions, have to be implemented. These statements execute the operations of normalization and orthogonalization on attributes (see Algorithm 2). To get the results of these two statements, I use BAT operations from existing BAT modules, such as: algebra calculations (multiplication, exponentiation, division, subtraction) aggregate functions (sum) mathematical functions (square root). Both statements run over the ordered application part of the relation that is stored after the ordering operation in an intermediate list. 18

29 4.2. IMPLEMENTATION 19 Normalization: Normalization statement has one BAT as input, which has to be normalized. First the euclidean norm for the attribute is calculated. The calculation is performed by multiplying the corresponding BAT of the attribute by itself. The values in the resulting BAT are summed and the square root of the sum is taken. Each of the aggregate (SUM()) and mathematical (SQRT ()) functions returns a single number, stored as a separate temporal BAT. This BAT is used in subsequent calculations later. Division of each attribute value by its euclidean norm delivers the normalized form of the attribute. It is stored as a new BAT (Norm(x)) that is also appended to the output list of attributes. Figure 4.6 illustrates the processes of normalization statement, on the example of the attribute x from the query (3.2). List of ordered application part OID Oa(x) o 2 3 o 4 8 o 3 2 o 6 1 o 1 0 o 5 2 OID Oa(y) o 2 4 o 4 9 o 3 1 o 6 0 o 1 2 o 5 9 OID Oa(z) o 2 5 o 4 6 o 3 2 o 6 1 o 1 5 o 5 3 OID Oa(x) o 2 3 o 4 8 o 3 2 o 6 1 o 1 0 o 5 2 OID Oa(x) o 2 3 o 4 8 o 3 2 o 6 1 o 1 0 o 5 2 = OID temp1 o 2 9 o 4 64 o 3 4 o 6 1 o 1 0 o 5 4 Euclidean norm SUM( OID temp1 o 2 9 o 4 64 o 3 4 o 6 1 o 1 0 o 5 4 ) = OID temp2 o t2 82 SQRT ( OID temp2 o t2 82 ) = OID temp3 o t3 9,05539 Normalization operation OID Oa(x) o 2 3 o 4 8 o 3 2 o 6 1 o 1 0 o 5 2 / OID temp3 o t3 9,05539 = OID Norm(x) o 2 0,33129 o 4 0,88345 o 3 0,22086 o 6 0,11043 o 1 0 o 5 0,22086 Figure 4.6: Processes of normalization statement, on the example of the attribute x from the query (3.2) Orthogonalization: After each operation of normalization the rest of non-normalized attributes in the application part of the relation has to be orthogonolized to the last normalized vector. For this purpose the statement Orth attribute2 (attribute 1 ) takes two 19

30 20 CHAPTER 4. APPROACH attributes as an input: the attribute that has to be orthogonalized (attribute 1 ) and the attribute, which it has to be orthogonalized to (attribute 2 ). An example of orthogonalization procedure for the attribute y (orthogonalized to attribute x) is displayed in Figure 4.7. List of ordered application part OID Norm(x) o 2 0,33129 o 4 0,88345 o 3 0,22086 o 6 0,11043 o 1 0 o 5 0,22086 OID Oa(y) o 2 4 o 4 9 o 3 1 o 6 0 o 1 2 o 5 9 OID Oa(z) o 2 5 o 4 6 o 3 2 o 6 1 o 1 5 o 5 3 Dot product OID Norm(x) o 2 0,33129 o 4 0,88345 o 3 0,22086 o 6 0,11043 o 1 0 o 5 0,22086 OID Oa(y) o 2 4 o 4 9 o 3 1 o 6 0 o 1 2 o 5 9 = OID temp1 o 2 1,32518 o 4 7,95107 o 3 0,22086 o 6 0 o 1 0 o 5 1,98777 SUM( OID temp1 o 2 1,32518 o 4 7,95107 o 3 0,22086 o 6 0 o 1 0 o 5 1,98777 ) = OID temp2 o t2 11,48488 Orthogonalization process OID Norm(x) o 2 0,33129 o 4 0,88345 o 3 0,22086 o 6 0,11043 o 1 0 o 5 0,22086 OID o t2 temp2 11,48488 = OID Oa(y) o 2 4 o 4 9 o 3 1 o 6 0 o 1 2 o 5 9 OID temp3 o 2 3,80488 o 4 10,14634 o 3 2,53659 o 6 1,26829 o 1 0 o 5 2,53659 OID temp3 o 2 3,80488 o 4 10,14634 o 3 2,53659 o 6 1,26829 o 1 0 o 5 2,53659 = OID Orthx(y) o 2 0,19512 o 4-1,14634 o 3-1,53659 o 6-1,26829 o 1 2 o 5 6,46341 Figure 4.7: Processes of orthogonalization statement, on the example of the attribute y from the query (3.2) (orthogonalized to attribute x) After the orthogonalization of the non-normalized attributes is completed, as shown in Figure 4.8, the next attribute can be normalized and the orthogonalization procedure iterated until there is only one attribute left. After the normalization of the last attribute, the latter is appended to the output list, where it forms the output relation, together with the attributes from the descriptive part, as shown in Figure 3.4. One of the advantages of the application of Gram-Schmidt algorithm for the matrix 20

31 4.2. IMPLEMENTATION 21 OID Norm(x) o 2 0,33129 o 4 0,88345 o 3 0,22086 o 6 0,11043 o 1 0 o 5 0,22086 OID Orthx(y) o 2 0,19512 o 4-1,14634 o 3-1,53659 o 6-1,26829 o 1 2 o 5 6,46341 OID Orthx(z) o 2 2,29268 o 4-1,21951 o 3 0,19512 o 6 0,09756 o 1 5 o 5 1,19512 Figure 4.8: Stand of BATs after all attributes from the query (3.2) are orthogonalized to attribute x Q calculation is the possibility of parallelizing its execution. Since there are no dependencies between the attributes and no overwriting operations of them during the orthogonalization, the latter can be done concurrently. The described approach represents the processes and the order of their executions according to the used algorithm. However, during the low-level MAL-optimization of MonetDB, these processes are re-organized, so that the entire set of operations (the required number of orthogonalizations and the subsequent normalization) are performed consequently for each attribute. This is reflected in the produced MAL-Plan for Q QR function (see Appendix C). Such optimization helps to reduce the number of cache misses, since more information needed for all optimizations and the normalization is stored in the cache. That reduces the number of requests to the slow main memory and improves the runtime of the entire Q QR function. The code for the extended statements is shown in Appendix A.3. 21

32

33 5 Complexity Analysis In this chapter the complexity of the original Gram-Schmidt algorithm and the implementation are estimated and compared. 5.1 Complexity of the Gram-Schmidt algorithm The vector-based Gram-Schmidt algorithm, used for the implementation, comprises operations of normalization and orthogonalization. Using a matrix of size mxn, where m is the number of tuples and n is the number of attributes, the numbers of executions of the algorithm operations and their suboperations can be analyzed. Table 5.1 lists the estimated numbers of these operations. The normalization is executed only once for each vector, while orthogonalization is to be done for the rest (all non-normalized) vectors after every normalization. As a result, the execution number of orthogonalization equals to (n 1)n/2. Normalization Orthogonalization Operation Euclidian norm(e n ) Devision by e n Dot product Subtraction Sub -operation Number of executions n 2 n(m) + n(m-1) n(1) n / n(m) (n-1)n/2 ((n-1)n/2)(m) + ((n-1)n/2)(m-1) ((n-1)n/2) / ((n-1)n/2)(m) ((n-1)n/2)(m) Table 5.1: Operations in vector-based Gram-Schmidt algorithm and number of their executions

34 24 CHAPTER 5. COMPLEXITY ANALYSIS The total number of calculated QRD operations is summarized in Equation (n 1)n m + 2 (n 1)n (m 1) + 2 nm + n(m 1) + 1n + nm+ (n 1)n m + 2 (n 1)n m 2 (5.1) mn + 2mn n n (5.2) According to the simplified version of Equation (5.2), the algorithm has the total complexity of O(mn 2 ). 5.2 Complexity of the implementation According to the task, the output relation of the implemented Q QR function is ordered by the defined set of attributes. It means that the complexity of the Gram-Schmidt algorithm has to be extended by the complexity of additional operations of lists building, determination of order and ordering itself. In the first step I build the lists of attributes for the application and descriptive parts of the relation, which I need to define the matrix for the QRD. For this purpose I iterate through all attributes of the input relation, what has a complexity of O(n 2 ). The order of the attributes is to be determined according to the order by attributes passed in the query. If only one attribute is passed, the complexity is at least O(m log m). If there are more than one attribute in the order by attributes, the following attributes are only used in case, if the first order by attribute has at least two tuples with exactly same values. For ordering those tuples additional information is needed. That means, that the complexity of such additional ordering equals to O(k log k), where k is the number of tuples with same values and k m. The worst case of ordering is when all tuples of an order by attribute have the same value (k = m). The complexity of ordering in this case is at least O(sm log m), where s is the number of attributes with the same values. After the order is determined, all attributes in the relation have to be sorted according to it. This results in the ordering complexity of O(m log m n). The total complexity of the Q QR function is summarized in Equation(5.3). O(n 2 ) + O(m log m) + O(m log m n) + O(mn 2 ) (5.3) For the met assumption, that m n, the total complexity of the Q QR function equals to O(mn 2 ). If it is assumed, that m n, the effort for ordering overtakes the cost of the Q QR calculations, what leads to the resulting complexity of O(m log m n). 24

35 6 Experimental evaluation In this chapter I describe the experimental evaluation. It includes the explanation of the setup, test cases, data set, the results of the experiments and their discussion. The goal of the experimental evaluation is to analyse the complete runtime of the solution and to compare the performance of MonetDB with the existing R-solution empirically. Additionally to the complete runtime I analyse time spent on building the execution plan and for Q QR execution itself. 6.1 Setup The experiments are carried out on an Ubuntu (16.04 LTS) virtual machine having 5,5 GB RAM. The virtual machine (5.1.14) was running on MacOS Sierra ( ) having 2.70 GHz Intel Core i5-5257u CPU and 8 GB 1867 MHz DDR3 memory. To check the environmental dependency of MonetDB I use a server with 2 x Intel(R) Xeon(R), CPU E GH, 64GB main memory, hard disks 4x320GB, SATA, 15000rpm and OS: CentOS Test Case Selection and Metrics The implemented Q QR function has only one input relation and three additional parameters, which influence the execution performance and can be manipulated during the evaluation (see Figure 3.1): 1. number of tuples in the input relation (m) 2. number of attributes in the application part (n a ) 3. number of attributes in the descriptive part (n d ) 4. ratio of the Order By part in the descriptive part (ratio) The distribution of values is irrelevant. To get a different size of the Order By part of the relation I combine different Order By ratios with the size of the descriptive part.

36 26 CHAPTER 6. EXPERIMENTAL EVALUATION The input relation itself is an existing relation or a result of a passed sub-query. Since the processing of sub-query is not influenced during the implementation and it is not directly connected with the performance of the Q QR function, I refrain from manipulating this parameter. Experimental evaluation consists of two separate experiments: MonetDB vs. R: It is a comparison between the implementation in MonetDB and the existing R-solution. For this experiment, the embedded R package for MonetDB is used and the data is exported to R via UDF (see Appendix D). Internal: It provides the detailed information about the time consumption of the implementation function in MonetDB Detailed information on chosen values of input parameters for these experiments is presented in Table 6.1. Parameter Values m , , n a 20, 40, 60, 80, 100 n d 1 1, 10, 50 2 ratio 100%, 30%, 60%, 90% 3 Table 6.1: Values of the input parameters for experimental evaluation A combination of all four parameters represents a test case to be performed. For each of them a complete runtime is measured using the internal MonetDB timing function. For the Internal evaluation I record the time spent on creation of execution plan (preparation time), additionally to the complete runtime. The combination of complete runtime and preparation time provides the information about execution time. In order to evaluate time spent on ordering the relation, I run every defined test case in two modes (with and without ordering). For the without ordering mode I optimize the Q QR function so that the order is not determined and no ordering of attributes is performed. For each test case the difference between runtimes in two modes provides the information about time spent on ordering all attributes in the relation. 6.3 Data Set Every single test case has 3 runs that are performed one by one. The mean value is calculated based on the measurement results of each test case run. For each test case a new relation with m tuples and n attributes (n = n a + n d ) is generated. MonetDB and R solution use the same relation by each run. 1 Descriptive part with 1 attribute has always ratio of 100% 2 In the without ordering mode the descriptive part consists of only 1 attribute 3 In the without ordering mode the ratio is for all test cases 100% 26

37 6.3. DATA SET 27 By executing the same query more than one time, the execution plan is created only once and stored in the cache. To avoid its reusing for measuring the preparation time 3 runs are done on different relations. Test cases, where the Order By ratio parameter is manipulated, are performed on the same relation. Queries for creating and filling tables are generated using a python script program that takes random integers between 0 and

38 28 CHAPTER 6. EXPERIMENTAL EVALUATION 6.4 Results In this section I present the test results for both experiments. At first I present the results of the tests in with ordering mode, which are followed by the results of the without ordering mode. The measurements are evaluated and comparison and evaluation of the results is given. Detailed test results are listed in Appendix E W ith ordering mode In this part of the experimental evaluation I test the implemented solution in with ordering mode. The order of data in a relation can be essential for some statistical and linear algebra calculations, so the resulting relation is sorted by its Order By part. MonetDB vs. R The first set of test cases compares the performance of MonetDB with existing R- solution. That is done using UDF and embedded R package in MonetDB. The results of test cases with ordering by one attribute are illustrated in Figure 6.1. It is recognisable, that for R the number of tuples doesn t play a significant role, while for MonetDB it is a decisive parameter. Runtime(sec) MonetDB 100k tuples MonetDB 1M tuples R 100k tuples R 1M tuples Number of attributes in the application part Figure 6.1: Complete runtime in the MonetDB vs. R experiment (ordered by 1 attribute) The performance of R is over 4 times better than the performance of MonetDB for all combinations of the number of tuples and the size of the application part. And the more attributes in the application part are taken, the bigger is the absolute performance difference. By increasing the number of attributes in the application part and using the high number of tuples in the relation, the complete runtime in MonetDB changes 28

Optimization of Mixed Queries in MonetDB System

Optimization of Mixed Queries in MonetDB System Department of Informatics, University of Zürich BSc Thesis Optimization of Mixed Queries in MonetDB System Alphonse Mariyagnanaseelan Matrikelnummer: 15-712-698 Email: alphonse.mariyagnanaseelan@uzh.ch

More information

Linear Algebra and Eigenproblems

Linear Algebra and Eigenproblems Appendix A A Linear Algebra and Eigenproblems A working knowledge of linear algebra is key to understanding many of the issues raised in this work. In particular, many of the discussions of the details

More information

(Mathematical Operations with Arrays) Applied Linear Algebra in Geoscience Using MATLAB

(Mathematical Operations with Arrays) Applied Linear Algebra in Geoscience Using MATLAB Applied Linear Algebra in Geoscience Using MATLAB (Mathematical Operations with Arrays) Contents Getting Started Matrices Creating Arrays Linear equations Mathematical Operations with Arrays Using Script

More information

Designing Information Devices and Systems I Spring 2018 Lecture Notes Note Introduction to Linear Algebra the EECS Way

Designing Information Devices and Systems I Spring 2018 Lecture Notes Note Introduction to Linear Algebra the EECS Way EECS 16A Designing Information Devices and Systems I Spring 018 Lecture Notes Note 1 1.1 Introduction to Linear Algebra the EECS Way In this note, we will teach the basics of linear algebra and relate

More information

Temporal Integrity Constraints in Databases With Ongoing Timestamps

Temporal Integrity Constraints in Databases With Ongoing Timestamps Department of Informatics, University of Zürich BSc Thesis Temporal Integrity Constraints in Databases With Ongoing Timestamps Jasmin Ebner Matrikelnummer: 13-747-159 Email: jasmin.ebner@uzh.ch August

More information

Weather Research and Forecasting (WRF) Performance Benchmark and Profiling. July 2012

Weather Research and Forecasting (WRF) Performance Benchmark and Profiling. July 2012 Weather Research and Forecasting (WRF) Performance Benchmark and Profiling July 2012 Note The following research was performed under the HPC Advisory Council activities Participating vendors: Intel, Dell,

More information

Compiling Techniques

Compiling Techniques Lecture 11: Introduction to 13 November 2015 Table of contents 1 Introduction Overview The Backend The Big Picture 2 Code Shape Overview Introduction Overview The Backend The Big Picture Source code FrontEnd

More information

B553 Lecture 5: Matrix Algebra Review

B553 Lecture 5: Matrix Algebra Review B553 Lecture 5: Matrix Algebra Review Kris Hauser January 19, 2012 We have seen in prior lectures how vectors represent points in R n and gradients of functions. Matrices represent linear transformations

More information

NEC PerforCache. Influence on M-Series Disk Array Behavior and Performance. Version 1.0

NEC PerforCache. Influence on M-Series Disk Array Behavior and Performance. Version 1.0 NEC PerforCache Influence on M-Series Disk Array Behavior and Performance. Version 1.0 Preface This document describes L2 (Level 2) Cache Technology which is a feature of NEC M-Series Disk Array implemented

More information

Designing Information Devices and Systems I Fall 2018 Lecture Notes Note Introduction to Linear Algebra the EECS Way

Designing Information Devices and Systems I Fall 2018 Lecture Notes Note Introduction to Linear Algebra the EECS Way EECS 16A Designing Information Devices and Systems I Fall 018 Lecture Notes Note 1 1.1 Introduction to Linear Algebra the EECS Way In this note, we will teach the basics of linear algebra and relate it

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences)

AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences) AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences) Lecture 19: Computing the SVD; Sparse Linear Systems Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical

More information

Dot-Product Join: Scalable In-Database Linear Algebra for Big Model Analytics

Dot-Product Join: Scalable In-Database Linear Algebra for Big Model Analytics Dot-Product Join: Scalable In-Database Linear Algebra for Big Model Analytics Chengjie Qin 1 and Florin Rusu 2 1 GraphSQL, Inc. 2 University of California Merced June 29, 2017 Machine Learning (ML) Is

More information

Applied Linear Algebra in Geoscience Using MATLAB

Applied Linear Algebra in Geoscience Using MATLAB Applied Linear Algebra in Geoscience Using MATLAB Contents Getting Started Creating Arrays Mathematical Operations with Arrays Using Script Files and Managing Data Two-Dimensional Plots Programming in

More information

Applied Linear Algebra in Geoscience Using MATLAB

Applied Linear Algebra in Geoscience Using MATLAB Applied Linear Algebra in Geoscience Using MATLAB Contents Getting Started Creating Arrays Mathematical Operations with Arrays Using Script Files and Managing Data Two-Dimensional Plots Programming in

More information

This ensures that we walk downhill. For fixed λ not even this may be the case.

This ensures that we walk downhill. For fixed λ not even this may be the case. Gradient Descent Objective Function Some differentiable function f : R n R. Gradient Descent Start with some x 0, i = 0 and learning rate λ repeat x i+1 = x i λ f(x i ) until f(x i+1 ) ɛ Line Search Variant

More information

Remainders. We learned how to multiply and divide in elementary

Remainders. We learned how to multiply and divide in elementary Remainders We learned how to multiply and divide in elementary school. As adults we perform division mostly by pressing the key on a calculator. This key supplies the quotient. In numerical analysis and

More information

Improvements for Implicit Linear Equation Solvers

Improvements for Implicit Linear Equation Solvers Improvements for Implicit Linear Equation Solvers Roger Grimes, Bob Lucas, Clement Weisbecker Livermore Software Technology Corporation Abstract Solving large sparse linear systems of equations is often

More information

Sparse Polynomial Multiplication and Division in Maple 14

Sparse Polynomial Multiplication and Division in Maple 14 Sparse Polynomial Multiplication and Division in Maple 4 Michael Monagan and Roman Pearce Department of Mathematics, Simon Fraser University Burnaby B.C. V5A S6, Canada October 5, 9 Abstract We report

More information

Sparse BLAS-3 Reduction

Sparse BLAS-3 Reduction Sparse BLAS-3 Reduction to Banded Upper Triangular (Spar3Bnd) Gary Howell, HPC/OIT NC State University gary howell@ncsu.edu Sparse BLAS-3 Reduction p.1/27 Acknowledgements James Demmel, Gene Golub, Franc

More information

CHAPTER 11. A Revision. 1. The Computers and Numbers therein

CHAPTER 11. A Revision. 1. The Computers and Numbers therein CHAPTER A Revision. The Computers and Numbers therein Traditional computer science begins with a finite alphabet. By stringing elements of the alphabet one after another, one obtains strings. A set of

More information

Comp 11 Lectures. Mike Shah. July 26, Tufts University. Mike Shah (Tufts University) Comp 11 Lectures July 26, / 45

Comp 11 Lectures. Mike Shah. July 26, Tufts University. Mike Shah (Tufts University) Comp 11 Lectures July 26, / 45 Comp 11 Lectures Mike Shah Tufts University July 26, 2017 Mike Shah (Tufts University) Comp 11 Lectures July 26, 2017 1 / 45 Please do not distribute or host these slides without prior permission. Mike

More information

2.6 Complexity Theory for Map-Reduce. Star Joins 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51

2.6 Complexity Theory for Map-Reduce. Star Joins 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51 Star Joins A common structure for data mining of commercial data is the star join. For example, a chain store like Walmart keeps a fact table whose tuples each

More information

CS425: Algorithms for Web Scale Data

CS425: Algorithms for Web Scale Data CS425: Algorithms for Web Scale Data Most of the slides are from the Mining of Massive Datasets book. These slides have been modified for CS425. The original slides can be accessed at: www.mmds.org Challenges

More information

Event Operators: Formalization, Algorithms, and Implementation Using Interval- Based Semantics

Event Operators: Formalization, Algorithms, and Implementation Using Interval- Based Semantics Department of Computer Science and Engineering University of Texas at Arlington Arlington, TX 76019 Event Operators: Formalization, Algorithms, and Implementation Using Interval- Based Semantics Raman

More information

Skriptsprachen. Numpy und Scipy. Kai Dührkop. Lehrstuhl fuer Bioinformatik Friedrich-Schiller-Universitaet Jena

Skriptsprachen. Numpy und Scipy. Kai Dührkop. Lehrstuhl fuer Bioinformatik Friedrich-Schiller-Universitaet Jena Skriptsprachen Numpy und Scipy Kai Dührkop Lehrstuhl fuer Bioinformatik Friedrich-Schiller-Universitaet Jena kai.duehrkop@uni-jena.de 24. September 2015 24. September 2015 1 / 37 Numpy Numpy Numerische

More information

4th year Project demo presentation

4th year Project demo presentation 4th year Project demo presentation Colm Ó héigeartaigh CASE4-99387212 coheig-case4@computing.dcu.ie 4th year Project demo presentation p. 1/23 Table of Contents An Introduction to Quantum Computing The

More information

Databases 2011 The Relational Algebra

Databases 2011 The Relational Algebra Databases 2011 Christian S. Jensen Computer Science, Aarhus University What is an Algebra? An algebra consists of values operators rules Closure: operations yield values Examples integers with +,, sets

More information

USING SINGULAR VALUE DECOMPOSITION (SVD) AS A SOLUTION FOR SEARCH RESULT CLUSTERING

USING SINGULAR VALUE DECOMPOSITION (SVD) AS A SOLUTION FOR SEARCH RESULT CLUSTERING POZNAN UNIVE RSIY OF E CHNOLOGY ACADE MIC JOURNALS No. 80 Electrical Engineering 2014 Hussam D. ABDULLA* Abdella S. ABDELRAHMAN* Vaclav SNASEL* USING SINGULAR VALUE DECOMPOSIION (SVD) AS A SOLUION FOR

More information

6. Iterative Methods for Linear Systems. The stepwise approach to the solution...

6. Iterative Methods for Linear Systems. The stepwise approach to the solution... 6 Iterative Methods for Linear Systems The stepwise approach to the solution Miriam Mehl: 6 Iterative Methods for Linear Systems The stepwise approach to the solution, January 18, 2013 1 61 Large Sparse

More information

Administering your Enterprise Geodatabase using Python. Jill Penney

Administering your Enterprise Geodatabase using Python. Jill Penney Administering your Enterprise Geodatabase using Python Jill Penney Assumptions Basic knowledge of python Basic knowledge enterprise geodatabases and workflows You want code Please turn off or silence cell

More information

Vector, Matrix, and Tensor Derivatives

Vector, Matrix, and Tensor Derivatives Vector, Matrix, and Tensor Derivatives Erik Learned-Miller The purpose of this document is to help you learn to take derivatives of vectors, matrices, and higher order tensors (arrays with three dimensions

More information

EXAMPLES OF CLASSICAL ITERATIVE METHODS

EXAMPLES OF CLASSICAL ITERATIVE METHODS EXAMPLES OF CLASSICAL ITERATIVE METHODS In these lecture notes we revisit a few classical fixpoint iterations for the solution of the linear systems of equations. We focus on the algebraic and algorithmic

More information

CS 542G: Conditioning, BLAS, LU Factorization

CS 542G: Conditioning, BLAS, LU Factorization CS 542G: Conditioning, BLAS, LU Factorization Robert Bridson September 22, 2008 1 Why some RBF Kernel Functions Fail We derived some sensible RBF kernel functions, like φ(r) = r 2 log r, from basic principles

More information

Applied Linear Algebra in Geoscience Using MATLAB

Applied Linear Algebra in Geoscience Using MATLAB Applied Linear Algebra in Geoscience Using MATLAB Contents Getting Started Creating Arrays Mathematical Operations with Arrays Using Script Files and Managing Data Two-Dimensional Plots Programming in

More information

The File Geodatabase API. Craig Gillgrass Lance Shipman

The File Geodatabase API. Craig Gillgrass Lance Shipman The File Geodatabase API Craig Gillgrass Lance Shipman Schedule Cell phones and pagers Please complete the session survey we take your feedback very seriously! Overview File Geodatabase API - Introduction

More information

MapOSMatic: city maps for the masses

MapOSMatic: city maps for the masses MapOSMatic: city maps for the masses Thomas Petazzoni Libre Software Meeting July 9th, 2010 Outline 1 The story 2 MapOSMatic 3 Behind the web page 4 Pain points 5 Future work 6 Conclusion Thomas Petazzoni

More information

Lineage implementation in PostgreSQL

Lineage implementation in PostgreSQL Lineage implementation in PostgreSQL Andrin Betschart, 09-714-882 Martin Leimer, 09-728-569 3. Oktober 2013 Contents Contents 1. Introduction 3 2. Lineage computation in TPDBs 4 2.1. Lineage......................................

More information

Computational Linear Algebra

Computational Linear Algebra Computational Linear Algebra PD Dr. rer. nat. habil. Ralf Peter Mundani Computation in Engineering / BGU Scientific Computing in Computer Science / INF Winter Term 2017/18 Part 2: Direct Methods PD Dr.

More information

Numerical Methods I Non-Square and Sparse Linear Systems

Numerical Methods I Non-Square and Sparse Linear Systems Numerical Methods I Non-Square and Sparse Linear Systems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 September 25th, 2014 A. Donev (Courant

More information

Abstract parsing: static analysis of dynamically generated string output using LR-parsing technology

Abstract parsing: static analysis of dynamically generated string output using LR-parsing technology Abstract parsing: static analysis of dynamically generated string output using LR-parsing technology Kyung-Goo Doh 1, Hyunha Kim 1, David A. Schmidt 2 1. Hanyang University, Ansan, South Korea 2. Kansas

More information

QR Decomposition in a Multicore Environment

QR Decomposition in a Multicore Environment QR Decomposition in a Multicore Environment Omar Ahsan University of Maryland-College Park Advised by Professor Howard Elman College Park, MD oha@cs.umd.edu ABSTRACT In this study we examine performance

More information

Applications of Randomized Methods for Decomposing and Simulating from Large Covariance Matrices

Applications of Randomized Methods for Decomposing and Simulating from Large Covariance Matrices Applications of Randomized Methods for Decomposing and Simulating from Large Covariance Matrices Vahid Dehdari and Clayton V. Deutsch Geostatistical modeling involves many variables and many locations.

More information

Impression Store: Compressive Sensing-based Storage for. Big Data Analytics

Impression Store: Compressive Sensing-based Storage for. Big Data Analytics Impression Store: Compressive Sensing-based Storage for Big Data Analytics Jiaxing Zhang, Ying Yan, Liang Jeff Chen, Minjie Wang, Thomas Moscibroda & Zheng Zhang Microsoft Research The Curse of O(N) in

More information

MATH.2720 Introduction to Programming with MATLAB Vector and Matrix Algebra

MATH.2720 Introduction to Programming with MATLAB Vector and Matrix Algebra MATH.2720 Introduction to Programming with MATLAB Vector and Matrix Algebra A. Vectors A vector is a quantity that has both magnitude and direction, like velocity. The location of a vector is irrelevant;

More information

Factorized Relational Databases Olteanu and Závodný, University of Oxford

Factorized Relational Databases   Olteanu and Závodný, University of Oxford November 8, 2013 Database Seminar, U Washington Factorized Relational Databases http://www.cs.ox.ac.uk/projects/fd/ Olteanu and Závodný, University of Oxford Factorized Representations of Relations Cust

More information

Linear Algebra, part 3 QR and SVD

Linear Algebra, part 3 QR and SVD Linear Algebra, part 3 QR and SVD Anna-Karin Tornberg Mathematical Models, Analysis and Simulation Fall semester, 2012 Going back to least squares (Section 1.4 from Strang, now also see section 5.2). We

More information

A comparison of sequencing formulations in a constraint generation procedure for avionics scheduling

A comparison of sequencing formulations in a constraint generation procedure for avionics scheduling A comparison of sequencing formulations in a constraint generation procedure for avionics scheduling Department of Mathematics, Linköping University Jessika Boberg LiTH-MAT-EX 2017/18 SE Credits: Level:

More information

Analysis of Algorithms [Reading: CLRS 2.2, 3] Laura Toma, csci2200, Bowdoin College

Analysis of Algorithms [Reading: CLRS 2.2, 3] Laura Toma, csci2200, Bowdoin College Analysis of Algorithms [Reading: CLRS 2.2, 3] Laura Toma, csci2200, Bowdoin College Why analysis? We want to predict how the algorithm will behave (e.g. running time) on arbitrary inputs, and how it will

More information

15-780: LinearProgramming

15-780: LinearProgramming 15-780: LinearProgramming J. Zico Kolter February 1-3, 2016 1 Outline Introduction Some linear algebra review Linear programming Simplex algorithm Duality and dual simplex 2 Outline Introduction Some linear

More information

BLAS: Basic Linear Algebra Subroutines Analysis of the Matrix-Vector-Product Analysis of Matrix-Matrix Product

BLAS: Basic Linear Algebra Subroutines Analysis of the Matrix-Vector-Product Analysis of Matrix-Matrix Product Level-1 BLAS: SAXPY BLAS-Notation: S single precision (D for double, C for complex) A α scalar X vector P plus operation Y vector SAXPY: y = αx + y Vectorization of SAXPY (αx + y) by pipelining: page 8

More information

Revenue Maximization in a Cloud Federation

Revenue Maximization in a Cloud Federation Revenue Maximization in a Cloud Federation Makhlouf Hadji and Djamal Zeghlache September 14th, 2015 IRT SystemX/ Telecom SudParis Makhlouf Hadji Outline of the presentation 01 Introduction 02 03 04 05

More information

Linear Algebra for Machine Learning. Sargur N. Srihari

Linear Algebra for Machine Learning. Sargur N. Srihari Linear Algebra for Machine Learning Sargur N. srihari@cedar.buffalo.edu 1 Overview Linear Algebra is based on continuous math rather than discrete math Computer scientists have little experience with it

More information

Johns Hopkins Math Tournament Proof Round: Automata

Johns Hopkins Math Tournament Proof Round: Automata Johns Hopkins Math Tournament 2018 Proof Round: Automata February 9, 2019 Problem Points Score 1 10 2 5 3 10 4 20 5 20 6 15 7 20 Total 100 Instructions The exam is worth 100 points; each part s point value

More information

ECS289: Scalable Machine Learning

ECS289: Scalable Machine Learning ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 27, 2015 Outline Linear regression Ridge regression and Lasso Time complexity (closed form solution) Iterative Solvers Regression Input: training

More information

Numerical Linear Algebra

Numerical Linear Algebra Numerical Linear Algebra By: David McQuilling; Jesus Caban Deng Li Jan.,31,006 CS51 Solving Linear Equations u + v = 8 4u + 9v = 1 A x b 4 9 u v = 8 1 Gaussian Elimination Start with the matrix representation

More information

AM 205: lecture 8. Last time: Cholesky factorization, QR factorization Today: how to compute the QR factorization, the Singular Value Decomposition

AM 205: lecture 8. Last time: Cholesky factorization, QR factorization Today: how to compute the QR factorization, the Singular Value Decomposition AM 205: lecture 8 Last time: Cholesky factorization, QR factorization Today: how to compute the QR factorization, the Singular Value Decomposition QR Factorization A matrix A R m n, m n, can be factorized

More information

The Kernel Trick, Gram Matrices, and Feature Extraction. CS6787 Lecture 4 Fall 2017

The Kernel Trick, Gram Matrices, and Feature Extraction. CS6787 Lecture 4 Fall 2017 The Kernel Trick, Gram Matrices, and Feature Extraction CS6787 Lecture 4 Fall 2017 Momentum for Principle Component Analysis CS6787 Lecture 3.1 Fall 2017 Principle Component Analysis Setting: find the

More information

Lecture 4: Linear Algebra 1

Lecture 4: Linear Algebra 1 Lecture 4: Linear Algebra 1 Sourendu Gupta TIFR Graduate School Computational Physics 1 February 12, 2010 c : Sourendu Gupta (TIFR) Lecture 4: Linear Algebra 1 CP 1 1 / 26 Outline 1 Linear problems Motivation

More information

Efficient random number generation on FPGA-s

Efficient random number generation on FPGA-s Proceedings of the 9 th International Conference on Applied Informatics Eger, Hungary, January 29 February 1, 2014. Vol. 1. pp. 313 320 doi: 10.14794/ICAI.9.2014.1.313 Efficient random number generation

More information

Intel Math Kernel Library (Intel MKL) LAPACK

Intel Math Kernel Library (Intel MKL) LAPACK Intel Math Kernel Library (Intel MKL) LAPACK Linear equations Victor Kostin Intel MKL Dense Solvers team manager LAPACK http://www.netlib.org/lapack Systems of Linear Equations Linear Least Squares Eigenvalue

More information

ww.padasalai.net

ww.padasalai.net t w w ADHITHYA TRB- TET COACHING CENTRE KANCHIPURAM SUNDER MATRIC SCHOOL - 9786851468 TEST - 2 COMPUTER SCIENC PG - TRB DATE : 17. 03. 2019 t et t et t t t t UNIT 1 COMPUTER SYSTEM ARCHITECTURE t t t t

More information

LU Factorization. LU factorization is the most common way of solving linear systems! Ax = b LUx = b

LU Factorization. LU factorization is the most common way of solving linear systems! Ax = b LUx = b AM 205: lecture 7 Last time: LU factorization Today s lecture: Cholesky factorization, timing, QR factorization Reminder: assignment 1 due at 5 PM on Friday September 22 LU Factorization LU factorization

More information

Linear Algebra Methods for Data Mining

Linear Algebra Methods for Data Mining Linear Algebra Methods for Data Mining Saara Hyvönen, Saara.Hyvonen@cs.helsinki.fi Spring 2007 1. Basic Linear Algebra Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki Example

More information

Some Formal Analysis of Rocchio s Similarity-Based Relevance Feedback Algorithm

Some Formal Analysis of Rocchio s Similarity-Based Relevance Feedback Algorithm Some Formal Analysis of Rocchio s Similarity-Based Relevance Feedback Algorithm Zhixiang Chen (chen@cs.panam.edu) Department of Computer Science, University of Texas-Pan American, 1201 West University

More information

Matrix Computations: Direct Methods II. May 5, 2014 Lecture 11

Matrix Computations: Direct Methods II. May 5, 2014 Lecture 11 Matrix Computations: Direct Methods II May 5, 2014 ecture Summary You have seen an example of how a typical matrix operation (an important one) can be reduced to using lower level BS routines that would

More information

Chapter 2. Solving Systems of Equations. 2.1 Gaussian elimination

Chapter 2. Solving Systems of Equations. 2.1 Gaussian elimination Chapter 2 Solving Systems of Equations A large number of real life applications which are resolved through mathematical modeling will end up taking the form of the following very simple looking matrix

More information

A Simple Left-to-Right Algorithm for Minimal Weight Signed Radix-r Representations

A Simple Left-to-Right Algorithm for Minimal Weight Signed Radix-r Representations A Simple Left-to-Right Algorithm for Minimal Weight Signed Radix-r Representations James A. Muir School of Computer Science Carleton University, Ottawa, Canada http://www.scs.carleton.ca/ jamuir 23 October

More information

AM205: Assignment 2. i=1

AM205: Assignment 2. i=1 AM05: Assignment Question 1 [10 points] (a) [4 points] For p 1, the p-norm for a vector x R n is defined as: ( n ) 1/p x p x i p ( ) i=1 This definition is in fact meaningful for p < 1 as well, although

More information

QR FACTORIZATIONS USING A RESTRICTED SET OF ROTATIONS

QR FACTORIZATIONS USING A RESTRICTED SET OF ROTATIONS QR FACTORIZATIONS USING A RESTRICTED SET OF ROTATIONS DIANNE P. O LEARY AND STEPHEN S. BULLOCK Dedicated to Alan George on the occasion of his 60th birthday Abstract. Any matrix A of dimension m n (m n)

More information

Performance of Random Sampling for Computing Low-rank Approximations of a Dense Matrix on GPUs

Performance of Random Sampling for Computing Low-rank Approximations of a Dense Matrix on GPUs Performance of Random Sampling for Computing Low-rank Approximations of a Dense Matrix on GPUs Théo Mary, Ichitaro Yamazaki, Jakub Kurzak, Piotr Luszczek, Stanimire Tomov, Jack Dongarra presenter 1 Low-Rank

More information

Course Announcements. Bacon is due next Monday. Next lab is about drawing UIs. Today s lecture will help thinking about your DB interface.

Course Announcements. Bacon is due next Monday. Next lab is about drawing UIs. Today s lecture will help thinking about your DB interface. Course Announcements Bacon is due next Monday. Today s lecture will help thinking about your DB interface. Next lab is about drawing UIs. John Jannotti (cs32) ORMs Mar 9, 2017 1 / 24 ORMs John Jannotti

More information

CS 310 Advanced Data Structures and Algorithms

CS 310 Advanced Data Structures and Algorithms CS 310 Advanced Data Structures and Algorithms Runtime Analysis May 31, 2017 Tong Wang UMass Boston CS 310 May 31, 2017 1 / 37 Topics Weiss chapter 5 What is algorithm analysis Big O, big, big notations

More information

In-Database Factorised Learning fdbresearch.github.io

In-Database Factorised Learning fdbresearch.github.io In-Database Factorised Learning fdbresearch.github.io Mahmoud Abo Khamis, Hung Ngo, XuanLong Nguyen, Dan Olteanu, and Maximilian Schleich December 2017 Logic for Data Science Seminar Alan Turing Institute

More information

If you buy 4 apples for a total cost of 80 pence, how much does each apple cost?

If you buy 4 apples for a total cost of 80 pence, how much does each apple cost? Introduction If you buy 4 apples for a total cost of 80 pence, how much does each apple cost? Cost of one apple = Total cost Number 80 pence = 4 = 20 pence In maths we often use symbols to represent quantities.

More information

Binary Decision Diagrams and Symbolic Model Checking

Binary Decision Diagrams and Symbolic Model Checking Binary Decision Diagrams and Symbolic Model Checking Randy Bryant Ed Clarke Ken McMillan Allen Emerson CMU CMU Cadence U Texas http://www.cs.cmu.edu/~bryant Binary Decision Diagrams Restricted Form of

More information

Sparse Linear Systems. Iterative Methods for Sparse Linear Systems. Motivation for Studying Sparse Linear Systems. Partial Differential Equations

Sparse Linear Systems. Iterative Methods for Sparse Linear Systems. Motivation for Studying Sparse Linear Systems. Partial Differential Equations Sparse Linear Systems Iterative Methods for Sparse Linear Systems Matrix Computations and Applications, Lecture C11 Fredrik Bengzon, Robert Söderlund We consider the problem of solving the linear system

More information

Scripting Languages Fast development, extensible programs

Scripting Languages Fast development, extensible programs Scripting Languages Fast development, extensible programs Devert Alexandre School of Software Engineering of USTC November 30, 2012 Slide 1/60 Table of Contents 1 Introduction 2 Dynamic languages A Python

More information

Communication-avoiding parallel and sequential QR factorizations

Communication-avoiding parallel and sequential QR factorizations Communication-avoiding parallel and sequential QR factorizations James Demmel, Laura Grigori, Mark Hoemmen, and Julien Langou May 30, 2008 Abstract We present parallel and sequential dense QR factorization

More information

An Introduction to NeRDS (Nearly Rank Deficient Systems)

An Introduction to NeRDS (Nearly Rank Deficient Systems) (Nearly Rank Deficient Systems) BY: PAUL W. HANSON Abstract I show that any full rank n n matrix may be decomposento the sum of a diagonal matrix and a matrix of rank m where m < n. This decomposition

More information

ST-Links. SpatialKit. Version 3.0.x. For ArcMap. ArcMap Extension for Directly Connecting to Spatial Databases. ST-Links Corporation.

ST-Links. SpatialKit. Version 3.0.x. For ArcMap. ArcMap Extension for Directly Connecting to Spatial Databases. ST-Links Corporation. ST-Links SpatialKit For ArcMap Version 3.0.x ArcMap Extension for Directly Connecting to Spatial Databases ST-Links Corporation www.st-links.com 2012 Contents Introduction... 3 Installation... 3 Database

More information

Preliminaries and Complexity Theory

Preliminaries and Complexity Theory Preliminaries and Complexity Theory Oleksandr Romanko CAS 746 - Advanced Topics in Combinatorial Optimization McMaster University, January 16, 2006 Introduction Book structure: 2 Part I Linear Algebra

More information

P Q1 Q2 Q3 Q4 Q5 Tot (60) (20) (20) (20) (60) (20) (200) You are allotted a maximum of 4 hours to complete this exam.

P Q1 Q2 Q3 Q4 Q5 Tot (60) (20) (20) (20) (60) (20) (200) You are allotted a maximum of 4 hours to complete this exam. Exam INFO-H-417 Database System Architecture 13 January 2014 Name: ULB Student ID: P Q1 Q2 Q3 Q4 Q5 Tot (60 (20 (20 (20 (60 (20 (200 Exam modalities You are allotted a maximum of 4 hours to complete this

More information

Matrix Assembly in FEA

Matrix Assembly in FEA Matrix Assembly in FEA 1 In Chapter 2, we spoke about how the global matrix equations are assembled in the finite element method. We now want to revisit that discussion and add some details. For example,

More information

Bindel, Fall 2016 Matrix Computations (CS 6210) Notes for

Bindel, Fall 2016 Matrix Computations (CS 6210) Notes for 1 Logistics Notes for 2016-09-14 1. There was a goof in HW 2, problem 1 (now fixed) please re-download if you have already started looking at it. 2. CS colloquium (4:15 in Gates G01) this Thurs is Margaret

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra)

AMS526: Numerical Analysis I (Numerical Linear Algebra) AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 7: More on Householder Reflectors; Least Squares Problems Xiangmin Jiao SUNY Stony Brook Xiangmin Jiao Numerical Analysis I 1 / 15 Outline

More information

Preconditioned Parallel Block Jacobi SVD Algorithm

Preconditioned Parallel Block Jacobi SVD Algorithm Parallel Numerics 5, 15-24 M. Vajteršic, R. Trobec, P. Zinterhof, A. Uhl (Eds.) Chapter 2: Matrix Algebra ISBN 961-633-67-8 Preconditioned Parallel Block Jacobi SVD Algorithm Gabriel Okša 1, Marián Vajteršic

More information

OBEUS. (Object-Based Environment for Urban Simulation) Shareware Version. Itzhak Benenson 1,2, Slava Birfur 1, Vlad Kharbash 1

OBEUS. (Object-Based Environment for Urban Simulation) Shareware Version. Itzhak Benenson 1,2, Slava Birfur 1, Vlad Kharbash 1 OBEUS (Object-Based Environment for Urban Simulation) Shareware Version Yaffo model is based on partition of the area into Voronoi polygons, which correspond to real-world houses; neighborhood relationship

More information

M. Matrices and Linear Algebra

M. Matrices and Linear Algebra M. Matrices and Linear Algebra. Matrix algebra. In section D we calculated the determinants of square arrays of numbers. Such arrays are important in mathematics and its applications; they are called matrices.

More information

A Simple Left-to-Right Algorithm for Minimal Weight Signed Radix-r Representations

A Simple Left-to-Right Algorithm for Minimal Weight Signed Radix-r Representations IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. XX, NO. X, MONTH 2007 1 A Simple Left-to-Right Algorithm for Minimal Weight Signed Radix-r Representations James A. Muir Abstract We present a simple algorithm

More information

Fast Multipole Methods: Fundamentals & Applications. Ramani Duraiswami Nail A. Gumerov

Fast Multipole Methods: Fundamentals & Applications. Ramani Duraiswami Nail A. Gumerov Fast Multipole Methods: Fundamentals & Applications Ramani Duraiswami Nail A. Gumerov Week 1. Introduction. What are multipole methods and what is this course about. Problems from physics, mathematics,

More information

Review: Linear and Vector Algebra

Review: Linear and Vector Algebra Review: Linear and Vector Algebra Points in Euclidean Space Location in space Tuple of n coordinates x, y, z, etc Cannot be added or multiplied together Vectors: Arrows in Space Vectors are point changes

More information

Algorithms and Their Complexity

Algorithms and Their Complexity CSCE 222 Discrete Structures for Computing David Kebo Houngninou Algorithms and Their Complexity Chapter 3 Algorithm An algorithm is a finite sequence of steps that solves a problem. Computational complexity

More information

QALGO workshop, Riga. 1 / 26. Quantum algorithms for linear algebra.

QALGO workshop, Riga. 1 / 26. Quantum algorithms for linear algebra. QALGO workshop, Riga. 1 / 26 Quantum algorithms for linear algebra., Center for Quantum Technologies and Nanyang Technological University, Singapore. September 22, 2015 QALGO workshop, Riga. 2 / 26 Overview

More information

(Linear equations) Applied Linear Algebra in Geoscience Using MATLAB

(Linear equations) Applied Linear Algebra in Geoscience Using MATLAB Applied Linear Algebra in Geoscience Using MATLAB (Linear equations) Contents Getting Started Creating Arrays Mathematical Operations with Arrays Using Script Files and Managing Data Two-Dimensional Plots

More information

Environment (Parallelizing Query Optimization)

Environment (Parallelizing Query Optimization) Advanced d Query Optimization i i Techniques in a Parallel Computing Environment (Parallelizing Query Optimization) Wook-Shin Han*, Wooseong Kwak, Jinsoo Lee Guy M. Lohman, Volker Markl Kyungpook National

More information

Enumeration and generation of all constitutional alkane isomers of methane to icosane using recursive generation and a modified Morgan s algorithm

Enumeration and generation of all constitutional alkane isomers of methane to icosane using recursive generation and a modified Morgan s algorithm The C n H 2n+2 challenge Zürich, November 21st 2015 Enumeration and generation of all constitutional alkane isomers of methane to icosane using recursive generation and a modified Morgan s algorithm Andreas

More information

COMS 6100 Class Notes

COMS 6100 Class Notes COMS 6100 Class Notes Daniel Solus September 20, 2016 1 General Remarks The Lecture notes submitted by the class have been very good. Integer division seemed to be a common oversight when working the Fortran

More information

Behavioral Simulations in MapReduce

Behavioral Simulations in MapReduce Behavioral Simulations in MapReduce Guozhang Wang, Marcos Vaz Salles, Benjamin Sowell, Xun Wang, Tuan Cao, Alan Demers, Johannes Gehrke, Walker White Cornell University 1 What are Behavioral Simulations?

More information

SPATIAL DATA MINING. Ms. S. Malathi, Lecturer in Computer Applications, KGiSL - IIM

SPATIAL DATA MINING. Ms. S. Malathi, Lecturer in Computer Applications, KGiSL - IIM SPATIAL DATA MINING Ms. S. Malathi, Lecturer in Computer Applications, KGiSL - IIM INTRODUCTION The main difference between data mining in relational DBS and in spatial DBS is that attributes of the neighbors

More information