
REDISTRIBUTION OF TENSORS FOR DISTRIBUTED CONTRACTIONS

THESIS

Presented in Partial Fulfillment of the Requirements for the Degree Master of Science in the Graduate School of the Ohio State University

By Akshay Nikam, B.Tech.

Graduate Program in Computer Science and Engineering

The Ohio State University, 2014

Master's Examination Committee: Dr. P. Sadayappan, Advisor; Dr. Atanas Rountev

Copyright by Akshay Nikam, 2014

ABSTRACT

In computational quantum chemistry and nuclear physics, tensor contractions are frequently required and computationally expensive operations. Methods involving tensor contractions often require them to run in series. Efficient contraction algorithms require the tensors to be distributed in certain ways. Hence, a redistribution operation on tensors is essential between two contractions in series. In this thesis, an efficient method to redistribute tensors on a multi-dimensional torus grid of processors is proposed. Multiple ways of redistribution exist, and the most efficient one for the given pair of distributions can be picked. The choice is mainly driven by replication of tensor data in the processor grid. Redistribution approaches are developed based on the replication scheme; they involve division of the processor grid into smaller hyperplanar sub-grids that can handle the communication in parallel. Some approaches involve data broadcasts along grid dimensions to replicate data as required by the new distribution. Wherever such efficient schemes are not possible, a point-to-point communication approach is proposed.

I dedicate my work to my parents, sisters and friends

ACKNOWLEDGMENTS

I would like to convey my gratitude to my advisor Dr. P. Sadayappan for giving me the opportunity to be a part of his vibrant and active research group. He has been a great source of inspiration for me throughout my master's program, and his guidance has been of immense help in learning about High Performance Computing, especially Distributed and Parallel Computing. I express my deep gratitude to Samyam Rajbhandari, with whom I worked for most of my time in the High Performance Computing Lab. I have lost track of the countless days and nights we worked together on implementing contractions of distributed tensors, and I will always appreciate his hard work and willingness to help me with the hurdles I came across. I would also like to thank Kevin Stock for being a good critic and helping me improve my programming practices. I cannot thank enough my friends Sonali and Ashwin for keeping me motivated about the dreams of my life and always being there for me. I am short of words to appreciate the efforts my parents have taken to help me achieve my dreams. I thank my sisters Manjusha, Archana and Kranti, who always motivated me to keep up the hard work. I express my deep gratitude to my brother-in-law Mr. Baban Shinde; I attribute a significant portion of my success to the help he has provided for my education.

VITA

B.Tech. in Computer Engineering, College of Engineering, Pune, India.
Software Development Engineering Intern, IBM India Software Labs, Pune, India.
Associate Software Engineer, IBM India Software Labs, Pune, India.
Graduate Research Associate, High Performance Computing Lab, The Ohio State University, Columbus OH, USA.
Software Development Engineering Intern, Microsoft Corporation, Redmond WA, USA.

PUBLICATIONS

Hukerikar, Saumil; Tumma, Ashwin; Nikam, Akshay; Attar, Vahida. SkewBoost: An Algorithm for Classifying Imbalanced Datasets. Proceedings of ICCCT 2011, IEEE, pp. 45-52.

FIELDS OF STUDY

Major Field: Computer Science and Engineering
Specialization: High Performance Computing

TABLE OF CONTENTS

Abstract
Dedication
Acknowledgments
Vita
List of Figures
List of Algorithms

1 Introduction
2 Background and Motivation
    Matrices and Tensors
    Contractions of Tensors
    Symmetry in Tensors
    Tensor Contraction on a Processor Grid
        Tensor Mapping on Processor Grid
        Tensor Index Distribution On Grid Dimensions
        Contraction of Distributed Tensors
    Need for Tensor Redistribution
3 Redistribution of Tensors
    Identifying Communication Patterns
    Parallelizing Communication
        Replication in Current Distribution
    Redistribution within a Hyperplane
        Broadcast Communication
        Point-to-Point Communication
4 Experimental Setup and Results
    Experimental Setup
    Results
        Grid-wide Broadcast Communication
        Grid-wide Point-to-Point Communication
        Broadcast Communication in Hyperplanes
        Point-to-Point Communication in Hyperplanes
5 Conclusion and Future Work
Bibliography

LIST OF FIGURES

2.1 3D Torus grid by Fujitsu [5]
2.2 One-to-one mapping of tensor A[i, j, k] on a 3D grid
2.3 Mapping tensor A[i, j, k, l] on a 3D grid with l serialized
2.4 Mapping tensor A[i, j] on a 3D grid with one of the dimensions replicated
2.5 Mapping tensor A[i, j, k] on a 3D grid with k serialized and one of the dimensions replicated
2.6 12 elements to be distributed along 3 nodes
2.7 Block Distribution
2.8 Cyclic Distribution
2.9 Block-Cyclic Distribution
3.1 Virtual splitting of grid into 2D planes along the replicated dimension
3.2 Broadcast groups of the form P[I,*,K]
3.3 Broadcast in an Intra-Communicator
3.4 Broadcast in an Inter-Communicator
4.1 Bandwidth achieved for different pairs of distributions for broadcast communication across the entire grid
4.2 Bandwidth achieved for different pairs of distributions for point-to-point communication across the entire grid
4.3 Bandwidth achieved for different pairs of distributions for broadcast communication in hyperplanes
4.4 Bandwidth achieved for different pairs of distributions for point-to-point communication in hyperplanes

LIST OF ALGORITHMS

2.1 Tensor Contraction: C[a,b,c,d] = A[a,b,k,l] B[l,k,c,d]
3.1 Redistribution with Broadcast Communication
3.2 Redistribution with Point-to-point Communication
3.3 Redistribution Scheme

CHAPTER 1

INTRODUCTION

The topic of this thesis stems from the area of Quantum Chemistry. This field applies the notions of quantum mechanics to chemical many-body systems. It involves the use of a variety of computational methods to solve problems. One of the widely used methods in Quantum Chemistry is the Coupled Cluster family of methods [1] [2]. Used for modeling many-body systems in chemistry, coupled cluster methods are computationally expensive and often require the computational power of supercomputers. With the benefit of high performance computing, multi-electron wavefunctions can be more accurately modeled for molecules. The types of coupled cluster methods are decided by the number of excitations permitted: Coupled Cluster Doubles (CCD) allows only double excitations, Coupled Cluster Singles and Doubles (CCSD) allows single and double excitations, and so on. In CCSD, a series of equations is derived using algebraic and diagrammatic techniques [2]. These equations involve operations on tensor objects. Tensors can be stored as high-dimensional matrices in computer memory. The frequent operations on tensors in CCSD include contractions. From a computational perspective, a contraction of two tensors is essentially a higher dimensional generalization of a matrix multiplication. Since the dimensionality and size of tensors can be large in CCSD, contraction operations tend to be compute-intensive.

14 architectures to accurately compute CCSD equations. Tensors are distributed across processor grids and contractions take place in distributed fashion. Efficient contraction algorithms require tensors to be distributed and mapped to the processor grid in a certain manner. Since tensors are often reused in multiple equations in one CCSD sequence of operations, efficient execution of CCSD needs to redistribute tensors as required by the contraction algorithm for efficient execution [4]. This thesis presents an algorithm for redistributing tensors on a multi-dimensional torus grid based on the data replication patterns in the grid. This algorithm assists the CCSD contractions for an efficient execution. 2

CHAPTER 2

BACKGROUND AND MOTIVATION

In Coupled Cluster Singles and Doubles (CCSD), a large number of contractions need to be performed on tensors efficiently. This section defines various concepts such as tensors, their mapping on a processor grid, data distribution and tensor contractions, and thereafter shows why an efficient redistribution scheme is required.

2.1 Matrices and Tensors

A matrix A[M × N] can be defined as a set of M × N numbers organized in a rectangle (2 dimensions) with M rows and N columns. From a computer storage viewpoint, matrix A is essentially a rectangular array of size M × N, with M rows and N columns. Whereas a matrix is a 2-dimensional array, tensors can be defined as higher dimensional generalizations of matrices from the storage and computational perspective. Since tensor is a general term, a 2-dimensional matrix can also be called a 2-dimensional tensor. For the CCSD method, we only have to work with tensors of dimensionality 4 or lower. A dimension of a tensor is referred to as an index. To represent matrix A of size M × N as a tensor, we need to use two indices representing its dimensions, as A[i, j]. Here there are M elements along index i and N elements along index j.

Although an order is decided for the indices of a tensor for the purpose of storage, the order does not hold any significance in the physical interpretation of the tensor. A 4D tensor B[i, j, k, l] has four indices, namely i, j, k and l, and the tensor can also be represented as B[k, i, l, j] or any other permutation of the indices. However, the storage of the tensor only follows one of all the possible permutations.

2.2 Contractions of Tensors

The CCSD equations involve addition and contraction operations between multiple tensors. Addition of tensors is analogous to matrix addition. A contraction operation between tensors is a slightly more complicated higher dimensional generalization of matrix multiplication. It is helpful to discuss matrix multiplication from another point of view before discussing tensor contractions. Matrix multiplication is an operation on two matrices A[M × K] and B[K × N] that generates an output matrix C[M × N], such that the element of C at the intersection of the i-th row and j-th column, identified as C[i][j], is a dot product of the i-th row of A and the j-th column of B. (The notation C[i][j] is used to denote an element, where i and j mean values of the respective indices, while C[i, j] is used to denote the tensor, where i and j indicate the indices of the tensor.) The rows of A and the columns of B are vectors of length K. These vectors represent one of the two dimensions or indices of their parent matrix. Thus one of the two indices of both input matrices (the index with size K) vanishes due to the dot product after all values in C are computed. The remaining indices (of sizes M and N) appear in the output matrix C. In tensor terminology, we say that the index with size K has contracted. Thus this matrix multiplication also represents a contraction of tensors A[i, k] and B[k, j] that contracts the index k and yields an output tensor C[i, j]. From the indices of the tensors, it can be observed that index i of A and index j of B are retained in C after the contraction. These are called external indices of the input tensors.

However, the index k of both input tensors is contracted and thus does not appear in the output tensor. It is termed a contracting index of the input tensors. This tensor contraction can be represented by the equation

C[i, j] = Σ_k A[i, k] B[k, j]    (2.2.1)

In higher dimensional tensor contractions, there can be more than one contracting index in each input tensor. For example, contraction of two tensors A[a, b, k, l] and B[l, k, c, d] will yield an output tensor C[a, b, c, d] such that the contracting indices k and l are contracted, while the indices a, b, c and d are retained in C as external indices. Listing 2.1 shows a code snippet for contracting tensors A and B to yield C. Note that there is one loop for each external index as well as for each contracting index. The loops for the contracting indices k and l are innermost in the code. Suppose A is a d_a-dimensional tensor and B is a d_b-dimensional tensor, and they are contracted to yield a d_c-dimensional tensor C. Then the total number of loops or iterators in a contraction algorithm is equal to (d_a + d_b + d_c)/2. If n is the size of each tensor (number of elements) in each dimension, their contraction takes O(n^((d_a + d_b + d_c)/2)) time. For example, contracting two 4D tensors into a 4D tensor (d_a = d_b = d_c = 4) costs O(n^6).

Listing 2.1: Tensor Contraction: C[a,b,c,d] = A[a,b,k,l] B[l,k,c,d]

for (int a = 0; a < n; a++) {
  for (int b = 0; b < n; b++) {
    for (int c = 0; c < n; c++) {
      for (int d = 0; d < n; d++) {
        for (int k = 0; k < n; k++) {
          for (int l = 0; l < n; l++) {
            C[a,b,c,d] += A[a,b,k,l] * B[l,k,c,d];
          }
        }
      }
    }
  }
}
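Listing 2.1 uses tensor-style index notation rather than compilable C++. Purely as an illustration of the same loop nest, a runnable version over linearized arrays might look like the following sketch; the helper name idx4, the function name contract4d and the row-major layout are assumptions made here, not part of the thesis code.

#include <vector>

// Hypothetical helper: linear offset of element (a,b,k,l) in a row-major
// n x n x n x n array. The layout choice is an assumption for illustration.
inline std::size_t idx4(std::size_t a, std::size_t b, std::size_t k,
                        std::size_t l, std::size_t n) {
  return ((a * n + b) * n + k) * n + l;
}

// C[a,b,c,d] += sum over k,l of A[a,b,k,l] * B[l,k,c,d]
void contract4d(const std::vector<double>& A, const std::vector<double>& B,
                std::vector<double>& C, std::size_t n) {
  for (std::size_t a = 0; a < n; ++a)
    for (std::size_t b = 0; b < n; ++b)
      for (std::size_t c = 0; c < n; ++c)
        for (std::size_t d = 0; d < n; ++d)
          for (std::size_t k = 0; k < n; ++k)
            for (std::size_t l = 0; l < n; ++l)
              C[idx4(a, b, c, d, n)] +=
                  A[idx4(a, b, k, l, n)] * B[idx4(l, k, c, d, n)];
}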

2.3 Symmetry in Tensors

Symmetry in tensors can be generalized from that in 2-dimensional matrices. A 2D symmetric matrix can be defined as a matrix A[M × N] such that A[i][j] = A[j][i] for all i, j. Thus if we divide the matrix into two triangles separated by the diagonal, the triangles are mirror images of each other and the diagonal elements are unique. This type of symmetry in a 2D matrix can also be represented by the notation A[i > j], indicating that only elements with indices i ≥ j are stored and the elements with indices j > i are the same as those with i > j. Only the lower triangle of the matrix needs to be stored in memory to avoid redundancy, and it can be used for all full-matrix operations. Thus, with a 2-index symmetry in a 2D matrix, we can save half of the total storage space required for a non-symmetric 2D matrix. When the number of dimensions increases, the concept of symmetry can be further generalized for tensors. A group of two or more indices can be involved in a symmetry. Moreover, there can be more than one independent symmetry group in a higher dimensional tensor. For example, there are five cases for a 4D tensor A[i, j, k, l]:

CASE I: No symmetry. The tensor can be represented simply as A[i, j, k, l].

CASE II: Two of the indices are symmetric with each other. Without loss of generality, consider i and j to be symmetric. Tensor A can be represented as A[i > j, k, l].

CASE III: Three of the indices are symmetric with each other. Without loss of generality, consider i, j and k to be symmetric. Tensor A can be represented as A[i > j > k, l].

CASE IV: Two disjoint symmetry groups, each involving two indices of the tensor. Consider i and j to be symmetric with each other while k and l are symmetric with each other. In this case, tensor A is represented as A[i > j, k > l].

CASE V: All four indices are symmetric with each other. Although this case does not occur in the tensors in CCSD equations, it is a possible scenario for symmetry in a 4D tensor. In this case, tensor A can be represented as A[i > j > k > l].

With d dimensions involved in a symmetry group, we only need about 1/d! of the storage space required for a non-symmetric tensor.

2.4 Tensor Contraction on a Processor Grid

Contraction of tensors can be parallelized on a cluster of processor nodes with distributed memory. The processor grid considered for this study is a multi-dimensional torus. A torus-shaped grid has nodes linked in series in each dimension, with a link between the two extreme nodes. In a d-dimensional torus, every node has two neighbor nodes in each dimension and a total of 2d neighbors. Each processor in the grid can be identified by its d-dimensional coordinates. Thus for a 3D grid, a processor P[I, J, K] is the I-th processor in the 0-th dimension, J-th in the 1st dimension and K-th in the 2nd dimension, where dimension indexing starts at 0. Figure 2.1 shows an example of a 3D torus grid of 64 processors by Fujitsu. There are four nodes along each dimension of the grid. Each of these nodes is connected to two neighbors per dimension and thus each one has a total of six neighbors. Note that the nodes at either end of each dimension are connected to each other by a link, forming a loop of four nodes in each dimension. An advantage of a torus grid over a non-torus grid of the same dimensionality is that it is efficient to shift and rotate data in all dimensions. This type of communication pattern is highly desirable for the most efficient tensor contraction algorithms.
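To make the torus topology above concrete, the two neighbors of a node in each dimension can be found with wrap-around (modular) arithmetic on its coordinates. The following is only a small illustrative sketch under the assumption of p nodes per dimension; the function name is not part of the thesis code.

#include <array>

// Neighbors of a node in a d-dimensional torus with p nodes per dimension.
// Each coordinate wraps around modulo p, so every node has exactly two
// neighbors per dimension (2d in total).
template <int d>
std::array<std::array<int, d>, 2 * d> torus_neighbors(
    const std::array<int, d>& coord, int p) {
  std::array<std::array<int, d>, 2 * d> result{};
  for (int dim = 0; dim < d; ++dim) {
    std::array<int, d> minus = coord, plus = coord;
    minus[dim] = (coord[dim] - 1 + p) % p;  // wrap-around "previous" node
    plus[dim] = (coord[dim] + 1) % p;       // wrap-around "next" node
    result[2 * dim] = minus;
    result[2 * dim + 1] = plus;
  }
  return result;
}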

Figure 2.1: 3D Torus grid by Fujitsu [5]

2.4.1 Tensor Mapping on Processor Grid

A tensor can be stored in a processor grid of the same or different dimensionality as that of the tensor. Usually, tensor indices are mapped to the grid dimensions in order to distribute the tensor. This mapping is also referred to as an index-dimension mapping from here onwards. It can be represented by a vector of the same dimensionality as the tensor, where the value at the i-th position gives the physical grid dimension to which the i-th index of the tensor is mapped. We need to consider some constraints and general facts about distributing a tensor on a processor grid.

One-to-one Index-Dimension Mapping Constraint

Only one index from a tensor can be mapped to a grid dimension.

This is because the set of elements of a tensor is the product set formed by a cross product between the indices of the tensor. Thus, if there are 2 indices and the size of the tensor is n along each index, the total number of elements formed by the product is n × n = n². Generalizing to p indices, the total number of elements is n × n × ... × n (p times) = n^p. It is not possible to have the entire product set of elements if we map more than one index to a physical grid dimension. If we map two indices i and j to a dimension and distribute the index values in the same fashion along the dimension, the only elements that can be formed are the diagonal elements where i = j. It is not possible to store the elements where i ≠ j, since each node along the dimension gets the same values of i and j.

Distribution or Serialization of a Tensor Index

There are two ways to deal with each tensor index when distributing a tensor on a grid. An index can either be distributed across some physical dimension or it can be serialized. There are several ways of distributing an index along a dimension, which will be discussed in a later section. However, when an index is serialized, it means the index is not mapped to any of the physical dimensions, i.e. all the elements along this index are fully stored in one processor rather than being spread across several processors in a dimension. It is possible to serialize all indices of the tensor, as a result of which we will have the entire tensor stored in each processor node of the grid.

Distribution or Replication along a Grid Dimension

Another perspective for understanding tensor mapping on a grid is from the viewpoint of a grid dimension. Again, there are two possibilities with respect to a physical grid dimension: either a tensor index is distributed across the dimension, or the tensor data is replicated along it.

If an index is mapped to this dimension, data along that index is divided into chunks and each chunk is stored in one node along this grid dimension. However, if no index is mapped to this dimension, all processors along the dimension hold the same data, resulting in replication and redundant storage. Keeping the above facts and constraints in mind, there are various ways of mapping a tensor. Let us consider a few examples of mapping a d-dimensional tensor on a δ-dimensional grid.

Example 1: One-to-one index-dimension mapping. Since the mapping is exhaustive in this example, consider a d-dimensional tensor mapped on a d-dimensional grid such that each index is mapped to one and only one dimension. No two indices are mapped to the same dimension. The index-dimension map acts as a one-to-one function. Consider an example dimensionality d = 3; now 3 indices are mapped to 3 dimensions. There are 6 possible permutations of the index-dimension map and thus 6 possible ways of mapping indices to the dimensions in a one-to-one fashion. Figure 2.2 shows one such possible mapping of the tensor A[i, j, k] on a 3D physical grid. The 3D cube represents the processor grid, while the arrows indicate the mapping of a tensor index i, j or k to a physical dimension. The index-dimension map for this example can be written as <0, 1, 2>, meaning index i is mapped to dimension 0, index j to dimension 1 and k to dimension 2.

Example 2: Serialization. Consider the case when the grid dimensionality is less than that of the tensor. Now, only δ of the d indices can be mapped to the δ-dimensional grid, while the remaining d - δ indices have to be serialized. For example, consider a 4D tensor A[i, j, k, l] being distributed along a 3D processor grid. We will try to map as many indices as possible to the various dimensions, but only 3 of them can be mapped in a one-to-one mapping; the fourth one has to be serialized. There are 4 ways of choosing the serialized index and 6 ways of mapping the rest to the 3 dimensions.

Figure 2.2: One-to-one mapping of tensor A[i, j, k] on a 3D grid

Hence we have 4 × 6 = 24 ways of distributing the tensor (ignoring possibilities involving replication along dimensions). Note that there are more possible distributions if we allow replication along dimensions. One of the 24 possible ways of mapping A[i, j, k, l] on a 3D grid is shown in Figure 2.3. Here the index l is serialized since there is no dimension that it can be mapped to. The index-dimension map for this case is <0, 1, 2, serial>, meaning indices i, j and k are mapped to the dimensions 0, 1 and 2 respectively, while index l is serialized.

Example 3: Replication. Consider the case when the tensor dimensionality is less than that of the grid. We can map all the tensor indices to the dimensions in a one-to-one fashion, but some of the dimensions will be left unmapped. Data will be replicated along these dimensions. For example, consider a 2D tensor A[i, j] being mapped to a 3D grid. If we try to map as many indices as possible to different dimensions, we can map both tensor indices to any 2 of the 3 grid dimensions and leave one of the dimensions replicated.

Figure 2.3: Mapping tensor A[i, j, k, l] on a 3D grid with l serialized

There are 3 ways to choose a replicated dimension and 2 ways to map the indices onto the remaining 2 dimensions. Hence, we have 3 × 2 = 6 ways of distributing the tensor (ignoring the possibilities involving serialization of tensor indices). One of the possible ways of mapping A[i, j] on a 3D grid is shown in Figure 2.4. Note that data is replicated along one of the physical dimensions since there is no index that can be mapped there. The index-dimension map can be represented as <0, 1>. Note that, unlike serialization of indices, it is not evident from the index-dimension map whether a physical dimension is replicated or not.

Example 4: Serialization and Replication. Although we saw an exhaustive one-to-one mapping of a 3D tensor A[i, j, k] on a 3D grid, there are more possibilities. For example, even though the three indices can be mapped to the dimensions in a one-to-one fashion, we can choose to serialize one or more indices, and as a result the same number of dimensions will have replication along them.

Figure 2.4: Mapping tensor A[i, j] on a 3D grid with one of the dimensions replicated

Figure 2.5 shows an example where the index k is serialized, while i and j are distributed along two of the dimensions. Since one of the dimensions is left unmapped, data is replicated along it. The index-dimension map for this example is <0, 1, serial>.

Figure 2.5: Mapping tensor A[i, j, k] on a 3D grid with k serialized and one of the dimensions replicated
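As an illustration of the index-dimension maps used in the examples above, one possible in-memory representation is a vector with one entry per tensor index and a sentinel for serialized indices; a grid dimension to which no index is mapped is then replicated. The struct and the SERIAL sentinel below are illustrative assumptions, not the thesis's actual data structures.

#include <vector>

constexpr int SERIAL = -1;  // marks a tensor index that is not mapped (serialized)

// map[i] = grid dimension that tensor index i is mapped to, or SERIAL.
// Example 4 above, A[i, j, k] with map <0, 1, serial> on a 3D grid:
//   IndexDimensionMap m{{0, 1, SERIAL}};
struct IndexDimensionMap {
  std::vector<int> map;

  // A grid dimension is replicated if no tensor index is mapped to it.
  bool is_replicated(int grid_dim) const {
    for (int d : map)
      if (d == grid_dim) return false;
    return true;
  }
};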

2.4.2 Tensor Index Distribution On Grid Dimensions

Before delving into tensor redistribution algorithms, it is necessary to understand how a tensor index can be distributed on a grid dimension. Assume the size of the tensor along an index to be n, while the size of the processor grid in one dimension is p. Now, the problem of distributing a tensor index along a grid dimension can be reduced to the problem of distributing n elements across p nodes. There are several possible ways of doing this, a subset of which is defined below.

Figure 2.6: 12 elements to be distributed along 3 nodes

Block Distribution

In this type of distribution, we divide the sequence of n elements equally into p parts, so that each part contains n/p elements (assuming p divides n). Each part is stored on one node. As an example, consider n = 12 and p = 3. The 12 elements to be distributed are shown in Figure 2.6. After distributing, each of the three nodes gets n/p = 4 elements: the first node gets the first four, the second node gets the next four, and the third node gets the last four elements. Figure 2.7 illustrates this example. Although block distribution is easy to understand and implement, it does not provide good load balancing in the case of symmetric tensors, where you would only want to store the unique elements.

Figure 2.7: Block Distribution

Cyclic Distribution

Here, n elements are cyclically divided among p nodes, such that the first element is stored in the first node, the second element in the second node, ..., and the p-th element in the p-th node. The next, i.e. (p + 1)-th, element is stored in the first node, and further elements are similarly distributed in a cyclic fashion. In general, the i-th element is stored on the (i % p)-th node (with 0-based numbering). Figure 2.8 shows how the same 12 elements can be distributed across 3 nodes in a cyclic fashion. Cyclic distribution is a good fit for symmetric tensors, since it can properly load balance the storage across the nodes in the dimension where a symmetric index is mapped.

Figure 2.8: Cyclic Distribution

Block-Cyclic Distribution

This distribution combines the concepts of the above two distributions into one. The elements are first divided into s equal blocks, where s is an integer that divides the number of elements and is divisible by the number of processors p. These s blocks of elements are then cyclically distributed across the p processors. Figure 2.9 shows how blocks of 2 elements can be cyclically distributed across 3 nodes. This scheme is good for load balancing of symmetric tensors and is a good fit for distributed contractions, since the blocks can be utilized directly in the parallel matrix multiplication kernel of tensor contractions. The implementation for this thesis uses a block-cyclic distribution of tensors.

Figure 2.9: Block-Cyclic Distribution
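To make the three schemes concrete, the sketch below computes which node owns a given element index under each distribution, assuming 0-based numbering of elements, blocks and nodes; the function names are illustrative assumptions rather than thesis code.

// Owner node of element i under a block distribution of n elements
// over p nodes (assumes p divides n): consecutive chunks of n/p elements.
int block_dist_owner(int i, int n, int p) { return i / (n / p); }

// Owner node of element i under a cyclic distribution over p nodes.
int cyclic_owner(int i, int p) { return i % p; }

// Owner node under a block-cyclic distribution: elements are grouped into
// s blocks of size n/s, and the blocks are dealt out cyclically to p nodes.
int block_cyclic_owner(int i, int n, int s, int p) {
  int block_id = i / (n / s);  // which of the s blocks element i falls in
  return block_id % p;         // blocks are distributed cyclically
}

For the example of Figure 2.9 (n = 12, s = 6 blocks of 2 elements, p = 3), block_cyclic_owner places elements 0-1 on node 0, elements 2-3 on node 1, elements 4-5 on node 2, elements 6-7 back on node 0, and so on.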

2.4.3 Contraction of Distributed Tensors

CCSD equations involve computations with very large tensors. Contraction of large tensors is a computationally demanding operation. Contracting a d_a-dimensional and a d_b-dimensional tensor, resulting in a d_c-dimensional tensor, takes O(n^((d_a + d_b + d_c)/2)) time, where n is the size of the tensor in each dimension. Since a large percentage of CCSD equations involve up to 4D tensors, this cost often turns out exorbitant for single-processor execution. However, with distributed and parallel computing this task can be made much more efficient. The computationally intensive contraction operation can be effectively parallelized on a torus-shaped grid of processors. As discussed before, the tensors can be stored on the grid in a distributed fashion with various possible index-dimension mappings. The mapping of input tensors on the processor grid is the primary decisive factor in the choice of the most efficient contraction algorithm. The tensor distribution across the processor grid dictates the inter-processor data communication pattern for the contraction algorithm. The data communication patterns tend to differ based on the contraction algorithm. Some tensor distributions make the most efficient contraction algorithms possible, while others tend to force a more time-intensive algorithm. Hence, the initial distribution of the input and output tensors in the contraction process is very important.

2.5 Need for Tensor Redistribution

CCSD equations involve repeated contraction operations on several tensors. These operations are broken down into a series of contractions and additions of two intermediate tensors. The output of one tensor contraction operation is often used as input to the next contraction. However, although this tensor is involved in two different contraction operations, it often needs to have different data distributions on the grid. The contraction that generates the tensor could generate it in a certain index-dimension mapping, but for another contraction where it is used as input, its current distribution may not be optimal for the contraction to work most efficiently. Hence, the tensor needs to be redistributed with a new index-dimension mapping on the grid. As an example, let us consider a scenario that often mandates a redistribution of the input tensors depending on the initial distribution of the output tensors.

External indices (those that are retained in the output tensor after the contraction) are mapped to a physical dimension or serialized in the initial distribution of the output tensor. If an external index in the output tensor is mapped along a particular dimension, we normally would like that external index in the input tensor to be mapped along the same dimension before starting the contraction, unless contractions can be done with rotational communication patterns. Also, we usually require the external indices from different input tensors to be orthogonal to each other, i.e. they are to be mapped to different physical dimensions (those where the same external indices in the output tensor are mapped). Now, consider five 4D tensors distributed on a 4D processor grid, all with the same index-dimension map <0, 1, 2, 3>, i.e. the i-th index of each tensor is mapped to the i-th dimension:

A[a1, l1, k1, k2]
B[k1, k2, l2, b1]
C[a1, l1, l2, b1]
D[d1, l1, l2, d2]
E[d1, a1, d2, b1]

These tensors undergo the following two contractions:

1. A[a1, l1, k1, k2] B[k1, k2, l2, b1] = C[a1, l1, l2, b1]
2. C[a1, l1, l2, b1] D[d1, l1, l2, d2] = E[d1, a1, d2, b1]

In contraction 1, according to the given index-dimension map, all the external indices of tensors A and B (the input tensors) are mapped to the same dimension where the corresponding external indices in tensor C (the output tensor) are mapped. For example, according to the index-dimension map <0, 1, 2, 3>, index l1 in C is mapped to dimension 1; since tensor A has the same map, index l1 in A is mapped to dimension 1 as well. This type of initial distribution is highly desirable for the contraction algorithm to be efficient. However, in contraction 2, index a1 is mapped to dimension 1 in E (the output tensor), while in tensor C it is mapped to dimension 0. Similarly, index d2 in E is mapped to dimension 2, but in D it is mapped to dimension 3. Since we would normally want them to be mapped to the same dimension in both input and output tensors, it is desirable to redistribute the input tensors so that the condition is fulfilled. Hence, before starting contraction 2, a preferable index-dimension map for tensor C would be <1, 0, 2, 3> and that for D would be <0, 3, 1, 2>. With such a distribution, the external indices in C and D are mapped to the dimensions where those in E are mapped. To achieve this, we need to redistribute the tensors C and D after contraction 1 and before contraction 2.
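One simple way to derive such a preferred map for an input tensor is to copy, for each external index, the grid dimension assigned to it by the output tensor's map and to place the contracting indices on whatever dimensions remain. The sketch below only illustrates that reasoning; the names and the arbitrary placement of contracting indices are assumptions, not the thesis's actual procedure.

#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Given the output tensor's index names and index-dimension map, derive a map
// for an input tensor: external indices inherit the output's dimension, and
// the remaining (contracting) indices get the leftover dimensions.
std::vector<int> preferred_input_map(const std::vector<std::string>& in_idx,
                                     const std::vector<std::string>& out_idx,
                                     const std::vector<int>& out_map) {
  std::vector<int> result(in_idx.size(), -1);
  std::vector<int> used;
  for (std::size_t i = 0; i < in_idx.size(); ++i) {
    auto it = std::find(out_idx.begin(), out_idx.end(), in_idx[i]);
    if (it != out_idx.end()) {                    // external index
      result[i] = out_map[it - out_idx.begin()];  // copy the output's dimension
      used.push_back(result[i]);
    }
  }
  int next = 0;
  for (std::size_t i = 0; i < result.size(); ++i) {  // contracting indices
    if (result[i] != -1) continue;
    while (std::find(used.begin(), used.end(), next) != used.end()) ++next;
    result[i] = next;
    used.push_back(next);
  }
  return result;
}

For tensor D in the example this reproduces d1 mapped to dimension 0 and d2 to dimension 2; the contracting indices l1 and l2 may land on different dimensions than in the map <0, 3, 1, 2> quoted above, since only the external indices are constrained here.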

CHAPTER 3

REDISTRIBUTION OF TENSORS

Redistribution of tensors is an essential operation that occurs between two contraction operations in CCSD. Such frequent occurrence of redistribution increases the need for it to be performed efficiently. This section describes the details of the redistribution algorithm. The redistribution algorithm takes the tensor to be redistributed and a new index-dimension map as inputs and redistributes the tensor according to the new index-dimension map. The algorithm considers both the current (or old) and the new index-dimension maps to decide how to communicate data in the grid in order to redistribute the tensor. The details are mainly driven by the idea of replication along physical dimensions.

3.1 Identifying Communication Patterns

Since redistribution of tensors is mainly a data communication task, the first problem encountered is which processors send data and which ones receive it. Since a block-cyclic distribution of tensors is used, the best way to identify the data held by a processor is in terms of the blocks of the tensor. In block-cyclic distribution, a tensor is split into blocks along each index and the blocks are cyclically distributed along the dimensions the respective indices are mapped to.

Each block has a unique vector address of the same dimensionality as the tensor. For example, a block in a 4D tensor A[i, j, k, l] can be identified by an address of the form <i, j, k, l>, where i, j, k and l represent the position of the block along each index. Given an index-dimension map, the dimensionality and size of the processor grid and the tensor block size, we can determine which blocks a particular node in the grid holds. It is possible to enumerate all the block addresses held at a processor given the processor's address in the grid. Thus we can find out which blocks each processor will hold after redistribution, given the new index-dimension map. From these block addresses, each processor can determine where each of them currently resides based on the current index-dimension map. Hence, each processor can find out which other processors will send blocks to it. Also, the addresses of currently held blocks are already stored in each processor. From these addresses, the processor can compute at which other processor each block will reside after redistribution. Hence, each processor can compute where to send each block, as sketched below.
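The owner computation described above can be sketched as follows, assuming the block-cyclic scheme of Section 2.4.2, i.e. block b of an index mapped to grid dimension d resides at coordinate b % p along d, while serialized indices impose no constraint. The function and variable names are illustrative assumptions rather than the thesis's actual code.

#include <cstddef>
#include <vector>

constexpr int SERIAL = -1;  // tensor index not mapped to any grid dimension

// Grid coordinates of the processor that owns a tensor block, given the
// block's vector address and an index-dimension map. Dimensions with no
// index mapped to them are replicated; coordinate 0 is returned for them
// here, standing for "every coordinate along that dimension holds a copy".
std::vector<int> owner_coords(const std::vector<int>& block_addr,
                              const std::vector<int>& idx_dim_map,
                              int grid_dims, int nodes_per_dim) {
  std::vector<int> coords(grid_dims, 0);
  for (std::size_t i = 0; i < block_addr.size(); ++i) {
    int dim = idx_dim_map[i];
    if (dim == SERIAL) continue;                  // serialized index: no constraint
    coords[dim] = block_addr[i] % nodes_per_dim;  // cyclic placement of blocks
  }
  return coords;
}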

3.2 Parallelizing Communication

Although redistribution overall seems to be a grid-wide communication algorithm, in some cases it can be parallelized by dividing the grid into groups of nodes that handle the communication within each group. This idea of parallelizing communication stems from the replication scenario of the current distribution in the grid.

3.2.1 Replication in Current Distribution

Communication can be parallelized and handled independently by dividing the grid into groups of nodes such that each group contains a full copy of the tensor in the current distribution. Multiple copies of the tensor exist in the grid only if there is replication along at least one dimension. The division of the grid based on the replication scenario is discussed below.

Replication along one dimension

If there is replication along one of the dimensions as per the current index-dimension map, we can divide the grid into hyperplanes of d - 1 dimensions, where d is the dimensionality of the grid. The division is such that each processor along the replicated dimension is in a different hyperplane and each hyperplane contains a copy of the entire tensor. Hence, if there are p nodes along each dimension of the grid, there will be p hyperplanes. Figure 3.1 shows a 3D grid with p = 4. One of the dimensions of the grid has replication along it, while the other two dimensions each have some tensor index distributed along them. As shown, the grid can be virtually split into 2D planes along the replicated dimension. Since all the dimensions that have some index distributed along them form the plane together, each plane contains a copy of the full tensor. An interesting outcome of this is that we do not need to send data from nodes in one plane to those in any other plane in order to redistribute the tensor. Each plane can handle communication independently of the other planes. This not only greatly simplifies the communication pattern for redistribution but also makes sure that nodes only communicate with the nearest other nodes for redistribution.

Replication along multiple dimensions

If there are d_r replicated dimensions in the grid according to the current index-dimension map, the hyperplane splits will occur along all d_r dimensions. This will yield hyperplanes of d - d_r dimensions, formed by those dimensions that have indices mapped to them. Again, each hyperplane will have a copy of the entire tensor, and communication can be worked out in each hyperplane independently of the others. If there are p nodes along each dimension of the grid, there will be p^(d_r) different hyperplanes formed.

Figure 3.1: Virtual splitting of grid into 2D planes along the replicated dimension

No Replication in Current Distribution

If there is no replication of data in the grid, the entire grid has only one copy of the tensor, distributed across all nodes. In this case, since no two nodes hold the same block of tensor data, communication cannot be parallelized; it has to be grid-wide. One point to note is that this communication will resemble the communication happening within a hyperplane, if the grid were divided into hyperplanes each containing one and only one copy of the tensor. Since the entire d-dimensional grid contains only one complete copy of the tensor, it can be viewed as a d-dimensional hyperplane itself. Hence, all arguments made for a hyperplane are valid for the whole grid when there is no replication in it with respect to the current distribution of the tensor.
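As a sketch of how the hyperplane groups of this section could be formed with MPI (the implementation environment named in Section 3.3), the grid communicator can be split using a color built from a processor's coordinates along the replicated dimensions, so that processors sharing those coordinates, i.e. members of the same hyperplane, end up in the same sub-communicator. The use of MPI_Comm_split and the variable names are assumptions, not necessarily how the thesis implementation builds its groups.

#include <mpi.h>
#include <cstddef>
#include <vector>

// Split the grid communicator into one sub-communicator per hyperplane.
// Processors that share the same coordinates along all replicated dimensions
// belong to the same hyperplane, so those coordinates are flattened into the
// "color" passed to MPI_Comm_split.
MPI_Comm make_hyperplane_comm(MPI_Comm grid_comm,
                              const std::vector<int>& my_coords,
                              const std::vector<bool>& is_replicated,
                              int nodes_per_dim) {
  int color = 0;
  for (std::size_t d = 0; d < my_coords.size(); ++d)
    if (is_replicated[d])
      color = color * nodes_per_dim + my_coords[d];  // flatten replicated coords

  int my_rank;
  MPI_Comm_rank(grid_comm, &my_rank);

  MPI_Comm hyperplane_comm;
  MPI_Comm_split(grid_comm, color, my_rank, &hyperplane_comm);
  return hyperplane_comm;
}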

3.3 Redistribution within a Hyperplane

Irrespective of whether the hyperplane spans the entire grid or a subset of it, the algorithm applies in the same manner. Redistribution can be performed in one of two possible ways inside a hyperplane. Since we know the new index-dimension map to redistribute the tensor to, we can tell which dimensions of the hyperplane (or grid) will have replication after redistribution. The replication scenario in the new distribution of the tensor decides which algorithm to pick.

3.3.1 Broadcast Communication

If a dimension is replicated in the new index-dimension map while some index is distributed along it in the current map, then there will be a broadcast of the data that is to be replicated along this dimension. All the processors in this dimension are supposed to receive the same data from the only processor in the hyperplane (or grid) that currently holds it. This sender processor may or may not be a part of the group of processors that will be holding this data after the broadcast. The previously defined notation to identify processors (P[I, J, K, ...]) will be used to discuss data broadcasts. This notation can also be used to identify multiple processors along a dimension by using a wildcard (*). For example, in a 3D grid, a processor can be identified by its coordinates as P[I, J, K]. The processors in dimension 1 that share the other coordinates with this processor can be represented by P[I, *, K], which means any processor that is I-th in the 0-th dimension and K-th in the 2nd dimension is included, regardless of what value it takes in place of the wildcard (*). This group of processors represented by P[I, *, K] forms a 1-dimensional torus in the grid, which can also be seen as a ring of processors.

Broadcast Groups

A group of processors that will receive the broadcast (also called a broadcast group) can be represented using the above defined notation. If the 1st dimension in a 3D grid has replication in the new distribution but not in the old one, it becomes a broadcast dimension. A broadcast group along the dimension can be represented as P[I, *, K]. If there are p processors in each dimension of the grid, there will be p × p = p² disjoint broadcast groups, because I and K can each take p possible values and form p² possible combinations for P[I, *, K]. Mathematically generalizing the idea, consider d to be the dimensionality of the torus grid, with p nodes in each dimension. If there are b broadcast dimensions, then each broadcast group will have p^b nodes and there will be p^(d-b) such broadcast groups. Figure 3.2 shows these broadcast groups of processors for p = 4. The groups look like lines of processors (1D) along the 1st dimension, which is the dimension that will have replication in the new index-dimension map. Although we discussed only one replicated dimension, there can be more than one dimension that has replication in the new distribution but not in the old one. In this case, the broadcast groups span all these dimensions, forming hyperplanes instead of just one line of processors. For example, consider a 4D processor grid [I, J, K, L] where two of the dimensions, J and L, have replication in the new distribution but some indices are mapped to them in the current distribution. Now the broadcast groups can be represented as P[I, *, K, *]. Again, the processor that sends the broadcast may or may not be a part of the broadcast group.

Implementation of Broadcast Groups

C++ and the Message Passing Interface (MPI) were used for implementing redistribution of tensors.

Figure 3.2: Broadcast groups of the form P[I,*,K]

Some MPI features allow broadcast data communication within the same group or between different groups. Since broadcast is a collective operation in MPI, it needs to be done within a defined group of processes. A communicator is created for the group of processors involved in the collective operation, and all the group members participate in the operation. An MPI communicator can be of two types:

Intra-Communicator: allows communication within a single group of processes. A process that is a part of the sole group of an intra-communicator can broadcast data to all the processes in that group. This scenario is depicted in Figure 3.3. The cloud represents an MPI group of processes and the circles in the cloud represent processes. The large rectangle that wraps the group is an intra-communicator. As shown, process 0 sends a broadcast message to all the other processes in the group.

Figure 3.3: Broadcast in an Intra-Communicator

Inter-Communicator: allows communication between two disjoint groups of processes. A process in one group can broadcast data to all the processes in the other group that is part of the inter-communicator. Figure 3.4 represents this setup. The two clouds represent two groups that together form an inter-communicator. Process 0 from Group1 broadcasts data to all processes in Group2.

If a processor P[I1, J1, K1] in a 3D processor grid needs to broadcast data to a group of processors P[I2, *, K2] where I1 = I2 and K1 = K2, then the sender processor is a part of the broadcast group. In this case, an intra-communicator is used. However, if I1 ≠ I2 or K1 ≠ K2, then the sender processor is not a part of the broadcast receiver group. Here, an inter-communicator is created such that one of the groups contains only the sender processor P[I1, J1, K1], while the other group contains all the processors P[I2, *, K2] that will receive the broadcast.
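The following is a minimal sketch of the intra-communicator case described above, assuming the broadcast group is carved out of the grid communicator with MPI_Comm_split; the color computation, buffer type and function name are illustrative assumptions, and the inter-communicator case (sender outside the group) is not shown.

#include <mpi.h>
#include <vector>

// Broadcast one block of tensor data within a broadcast group P[I, *, K].
// All processors sharing coordinates I and K get the same color, so
// MPI_Comm_split places each ring of processors in its own communicator.
void broadcast_block(MPI_Comm grid_comm, int coord_I, int coord_K,
                     int nodes_per_dim, int root_rank_in_group,
                     std::vector<double>& block) {
  int my_rank;
  MPI_Comm_rank(grid_comm, &my_rank);

  // One color per (I, K) pair: p * p disjoint broadcast groups in a 3D grid.
  int color = coord_I * nodes_per_dim + coord_K;

  MPI_Comm group_comm;
  MPI_Comm_split(grid_comm, color, my_rank, &group_comm);

  // The processor holding the block (rank root_rank_in_group inside the
  // group) sends it; every other member of the group receives a copy.
  MPI_Bcast(block.data(), static_cast<int>(block.size()), MPI_DOUBLE,
            root_rank_in_group, group_comm);

  MPI_Comm_free(&group_comm);
}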

Figure 3.4: Broadcast in an Inter-Communicator

Redistribution with Broadcast

Listing 3.1 presents the algorithm for redistribution with broadcast. In this scheme, all processors that receive tensor blocks always receive them as part of a broadcast message. A broadcast message is directed from a sender processor to a broadcast group. The sender can have several blocks that may need to be broadcast to different broadcast groups. Thus the sender needs to find out which broadcast group needs which set of blocks. This can be computed with the help of the current (or old) and new index-dimension maps, the processor grid size, the tensor block size, etc. Each of the broadcast groups can be represented as a generic processor address with a wildcard for the broadcast dimension. This generic address can be translated into a single number that can act as a generic rank of the broadcast group.

A send_map is created that maps this generic rank of the broadcast group to the list of addresses (or numbers) of blocks that need to be sent to that broadcast group. Since the sender processor knows the receiver broadcast groups it will broadcast its blocks to, it can determine whether it is a part of those broadcast groups or not. Based on this information, it can decide whether to form an intra-communicator or an inter-communicator with the receiver group. Similarly, each processor finds out which blocks it will receive based on the new index-dimension map, the processor grid size, the tensor block size, etc. From the block addresses to receive, the receiver processor can find which processor will send each of them. The receiver also knows the broadcast group it is part of, and thus it can determine whether the sender is in the broadcast group or not. Thus, the receiver can decide whether to form an intra-communicator with the broadcast group, or an inter-communicator with the broadcast group as one group and the sender processor as the other group with a sole member. Now that each processor knows which blocks to receive from which processor, a recv_map is created that maps the sender processor's rank to the list of addresses (or numbers) of blocks this processor will receive from that sender. Once the send_map and recv_map are ready on each processor, the communicators are created. When the sender is a part of the broadcast group, all the processors in the broadcast group as well as the sender know about it and they mutually form an intra-communicator. When the sender is outside the receiving broadcast group, both parties know about this and they mutually form an inter-communicator. Since creation of a communicator involves internal message passing between all the processors involved, a haphazard order of communicator creation may result in a deadlock. Hence, the function call to create a communicator is made by each processor involved (sender and broadcast receivers) in ascending order of the sender processor's rank in the world (or global) communicator.
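The generic rank mentioned above can be obtained, for example, by flattening the non-wildcard coordinates of the broadcast group's address into a single integer; this is only one possible encoding, and the WILDCARD sentinel and function name are assumptions made for illustration rather than the thesis's actual scheme.

#include <vector>

constexpr int WILDCARD = -1;  // stands for '*' in an address such as P[I, *, K]

// Translate a broadcast-group address with wildcards into a single integer
// that can serve as the group's generic rank (a key of send_map). Wildcard
// positions contribute nothing; the fixed coordinates are flattened in
// mixed-radix fashion with p nodes per dimension.
int generic_rank(const std::vector<int>& group_addr, int p) {
  int rank = 0;
  for (int c : group_addr)
    if (c != WILDCARD) rank = rank * p + c;
  return rank;
}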

Listing 3.1: Redistribution with Broadcast Communication

// send_map:  maps the generic ranks of processor groups that will receive the
//            broadcast to the list of block numbers to broadcast
// recv_map:  maps the ranks of processors to receive blocks from to the list
//            of block numbers to receive
// rank:      rank of this processor in the processor grid
// send_comm: array of communicators for messages to send
// recv_comm: array of communicators for messages to receive

redistribute_broadcast(bcast_dims):
    send_map = generate_send_map();
    recv_map = generate_recv_map();
    send_comm = create_bcast_send_comms(send_map, bcast_dims);
    recv_comm = create_bcast_recv_comms(recv_map, bcast_dims);
    sends_done = false;
    for (r in range(0, recv_map.size))
        sender = recv_map[r].key;
        block_ids_to_recv = recv_map[r].value;
        if (sender >= rank AND sends_done == false)
            send_bcast(send_map);
            sends_done = true;
        if (sender != rank)
            received_blocks = recv_bcast(recv_comm[r]);
    if (sends_done == false)
        send_bcast(send_map);
    tensor.blocks = received_blocks;

send_bcast(send_map):
    for (s in range(0, send_map.size))
        recv_group = send_map[s].key;
        block_ids_to_send = send_map[s].value;
        blocks_to_send = gather_blocks(block_ids_to_send);
        bcast(blocks_to_send, send_comm[s]);
        if (this processor is in recv_group)
            received_blocks.add(blocks_to_send);

After all the maps and communicators are ready, the processors are ready to exchange blocks. Again, since each processor can be involved in multiple broadcasts as a sender or a receiver, a haphazard ordering of the calls to send and receive broadcasts can result in a deadlock. A scheme similar to the one used for the creation of communicators is used: the calls to send or receive broadcasts are made in ascending order of the sender's rank in the world (or global) communicator. Since we are following the sender's rank order, we would like to access each key in the recv_map (which is nothing but the sender's rank) in ascending order. Iterating over the map yields the keys in ascending order. The processor starts posting broadcast receive calls for each entry in recv_map. When the processor finds that it has received data from all the processors with rank less than its own, it is time to send the blocks it holds to the processors expecting them. Hence, the processor iterates through the send_map and posts broadcast sends for each entry. Once it is done with all sends, the remaining broadcast receives are posted in ascending order of senders. This finishes the broadcast of blocks, and at the end of this exchange every processor holds the blocks that it should as per the new index-dimension map.

3.3.2 Point-to-Point Communication

This scenario arises when none of the dimensions in the hyperplane have replication along them in the new distribution. The statement is also valid for the entire grid when there is no replication in the current distribution either, meaning there is only one copy of the tensor in the entire grid. This idea can also be represented with the following cases:

Case I: There is no dimension that has replication as per the new distribution, irrespective of its replication scenario in the current distribution. However, it


More information

Performance Evaluation of the Matlab PCT for Parallel Implementations of Nonnegative Tensor Factorization

Performance Evaluation of the Matlab PCT for Parallel Implementations of Nonnegative Tensor Factorization Performance Evaluation of the Matlab PCT for Parallel Implementations of Nonnegative Tensor Factorization Tabitha Samuel, Master s Candidate Dr. Michael W. Berry, Major Professor Abstract: Increasingly

More information

LINEAR ALGEBRA KNOWLEDGE SURVEY

LINEAR ALGEBRA KNOWLEDGE SURVEY LINEAR ALGEBRA KNOWLEDGE SURVEY Instructions: This is a Knowledge Survey. For this assignment, I am only interested in your level of confidence about your ability to do the tasks on the following pages.

More information

A comparison of sequencing formulations in a constraint generation procedure for avionics scheduling

A comparison of sequencing formulations in a constraint generation procedure for avionics scheduling A comparison of sequencing formulations in a constraint generation procedure for avionics scheduling Department of Mathematics, Linköping University Jessika Boberg LiTH-MAT-EX 2017/18 SE Credits: Level:

More information

Symmetric Pivoting in ScaLAPACK Craig Lucas University of Manchester Cray User Group 8 May 2006, Lugano

Symmetric Pivoting in ScaLAPACK Craig Lucas University of Manchester Cray User Group 8 May 2006, Lugano Symmetric Pivoting in ScaLAPACK Craig Lucas University of Manchester Cray User Group 8 May 2006, Lugano Introduction Introduction We wanted to parallelize a serial algorithm for the pivoted Cholesky factorization

More information

SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices)

SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Chapter 14 SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Today we continue the topic of low-dimensional approximation to datasets and matrices. Last time we saw the singular

More information

How to Optimally Allocate Resources for Coded Distributed Computing?

How to Optimally Allocate Resources for Coded Distributed Computing? 1 How to Optimally Allocate Resources for Coded Distributed Computing? Qian Yu, Songze Li, Mohammad Ali Maddah-Ali, and A. Salman Avestimehr Department of Electrical Engineering, University of Southern

More information

B-Spline Interpolation on Lattices

B-Spline Interpolation on Lattices B-Spline Interpolation on Lattices David Eberly, Geometric Tools, Redmond WA 98052 https://www.geometrictools.com/ This work is licensed under the Creative Commons Attribution 4.0 International License.

More information

Clojure Concurrency Constructs, Part Two. CSCI 5828: Foundations of Software Engineering Lecture 13 10/07/2014

Clojure Concurrency Constructs, Part Two. CSCI 5828: Foundations of Software Engineering Lecture 13 10/07/2014 Clojure Concurrency Constructs, Part Two CSCI 5828: Foundations of Software Engineering Lecture 13 10/07/2014 1 Goals Cover the material presented in Chapter 4, of our concurrency textbook In particular,

More information

Divisible Load Scheduling

Divisible Load Scheduling Divisible Load Scheduling Henri Casanova 1,2 1 Associate Professor Department of Information and Computer Science University of Hawai i at Manoa, U.S.A. 2 Visiting Associate Professor National Institute

More information

Discriminative Direction for Kernel Classifiers

Discriminative Direction for Kernel Classifiers Discriminative Direction for Kernel Classifiers Polina Golland Artificial Intelligence Lab Massachusetts Institute of Technology Cambridge, MA 02139 polina@ai.mit.edu Abstract In many scientific and engineering

More information

Lab 2 Worksheet. Problems. Problem 1: Geometry and Linear Equations

Lab 2 Worksheet. Problems. Problem 1: Geometry and Linear Equations Lab 2 Worksheet Problems Problem : Geometry and Linear Equations Linear algebra is, first and foremost, the study of systems of linear equations. You are going to encounter linear systems frequently in

More information

3 Matrix Algebra. 3.1 Operations on matrices

3 Matrix Algebra. 3.1 Operations on matrices 3 Matrix Algebra A matrix is a rectangular array of numbers; it is of size m n if it has m rows and n columns. A 1 n matrix is a row vector; an m 1 matrix is a column vector. For example: 1 5 3 5 3 5 8

More information

CS 598: Communication Cost Analysis of Algorithms Lecture 9: The Ideal Cache Model and the Discrete Fourier Transform

CS 598: Communication Cost Analysis of Algorithms Lecture 9: The Ideal Cache Model and the Discrete Fourier Transform CS 598: Communication Cost Analysis of Algorithms Lecture 9: The Ideal Cache Model and the Discrete Fourier Transform Edgar Solomonik University of Illinois at Urbana-Champaign September 21, 2016 Fast

More information

Linear Algebra I. Ronald van Luijk, 2015

Linear Algebra I. Ronald van Luijk, 2015 Linear Algebra I Ronald van Luijk, 2015 With many parts from Linear Algebra I by Michael Stoll, 2007 Contents Dependencies among sections 3 Chapter 1. Euclidean space: lines and hyperplanes 5 1.1. Definition

More information

Roberto s Notes on Linear Algebra Chapter 10: Eigenvalues and diagonalization Section 3. Diagonal matrices

Roberto s Notes on Linear Algebra Chapter 10: Eigenvalues and diagonalization Section 3. Diagonal matrices Roberto s Notes on Linear Algebra Chapter 10: Eigenvalues and diagonalization Section 3 Diagonal matrices What you need to know already: Basic definition, properties and operations of matrix. What you

More information

SDS developer guide. Develop distributed and parallel applications in Java. Nathanaël Cottin. version

SDS developer guide. Develop distributed and parallel applications in Java. Nathanaël Cottin. version SDS developer guide Develop distributed and parallel applications in Java Nathanaël Cottin sds@ncottin.net http://sds.ncottin.net version 0.0.3 Copyright 2007 - Nathanaël Cottin Permission is granted to

More information

Chapter Two Elements of Linear Algebra

Chapter Two Elements of Linear Algebra Chapter Two Elements of Linear Algebra Previously, in chapter one, we have considered single first order differential equations involving a single unknown function. In the next chapter we will begin to

More information

Cofactors and Laplace s expansion theorem

Cofactors and Laplace s expansion theorem Roberto s Notes on Linear Algebra Chapter 5: Determinants Section 3 Cofactors and Laplace s expansion theorem What you need to know already: What a determinant is. How to use Gauss-Jordan elimination to

More information

Linear Algebra Section 2.6 : LU Decomposition Section 2.7 : Permutations and transposes Wednesday, February 13th Math 301 Week #4

Linear Algebra Section 2.6 : LU Decomposition Section 2.7 : Permutations and transposes Wednesday, February 13th Math 301 Week #4 Linear Algebra Section. : LU Decomposition Section. : Permutations and transposes Wednesday, February 1th Math 01 Week # 1 The LU Decomposition We learned last time that we can factor a invertible matrix

More information

THESIS. Presented in Partial Fulfillment of the Requirements for the Degree Master of Science in the Graduate School of The Ohio State University

THESIS. Presented in Partial Fulfillment of the Requirements for the Degree Master of Science in the Graduate School of The Ohio State University The Hasse-Minkowski Theorem in Two and Three Variables THESIS Presented in Partial Fulfillment of the Requirements for the Degree Master of Science in the Graduate School of The Ohio State University By

More information

Preconditioned Parallel Block Jacobi SVD Algorithm

Preconditioned Parallel Block Jacobi SVD Algorithm Parallel Numerics 5, 15-24 M. Vajteršic, R. Trobec, P. Zinterhof, A. Uhl (Eds.) Chapter 2: Matrix Algebra ISBN 961-633-67-8 Preconditioned Parallel Block Jacobi SVD Algorithm Gabriel Okša 1, Marián Vajteršic

More information

LAKELAND COMMUNITY COLLEGE COURSE OUTLINE FORM

LAKELAND COMMUNITY COLLEGE COURSE OUTLINE FORM LAKELAND COMMUNITY COLLEGE COURSE OUTLINE FORM ORIGINATION DATE: 8/2/99 APPROVAL DATE: 3/22/12 LAST MODIFICATION DATE: 3/28/12 EFFECTIVE TERM/YEAR: FALL/ 12 COURSE ID: COURSE TITLE: MATH2800 Linear Algebra

More information

Computational Approaches to Finding Irreducible Representations

Computational Approaches to Finding Irreducible Representations Computational Approaches to Finding Irreducible Representations Joseph Thomas Research Advisor: Klaus Lux May 16, 2008 Introduction Among the various branches of algebra, linear algebra has the distinctions

More information

Parallel Singular Value Decomposition. Jiaxing Tan

Parallel Singular Value Decomposition. Jiaxing Tan Parallel Singular Value Decomposition Jiaxing Tan Outline What is SVD? How to calculate SVD? How to parallelize SVD? Future Work What is SVD? Matrix Decomposition Eigen Decomposition A (non-zero) vector

More information

Efficient implementation of the overlap operator on multi-gpus

Efficient implementation of the overlap operator on multi-gpus Efficient implementation of the overlap operator on multi-gpus Andrei Alexandru Mike Lujan, Craig Pelissier, Ben Gamari, Frank Lee SAAHPC 2011 - University of Tennessee Outline Motivation Overlap operator

More information

Lecture 19. Architectural Directions

Lecture 19. Architectural Directions Lecture 19 Architectural Directions Today s lecture Advanced Architectures NUMA Blue Gene 2010 Scott B. Baden / CSE 160 / Winter 2010 2 Final examination Announcements Thursday, March 17, in this room:

More information

Communication Lower Bounds for Programs that Access Arrays

Communication Lower Bounds for Programs that Access Arrays Communication Lower Bounds for Programs that Access Arrays Nicholas Knight, Michael Christ, James Demmel, Thomas Scanlon, Katherine Yelick UC-Berkeley Scientific Computing and Matrix Computations Seminar

More information

Numerical Methods Lecture 2 Simultaneous Equations

Numerical Methods Lecture 2 Simultaneous Equations CGN 42 - Computer Methods Numerical Methods Lecture 2 Simultaneous Equations Topics: matrix operations solving systems of equations Matrix operations: Adding / subtracting Transpose Multiplication Adding

More information

STRONG FORMS OF ORTHOGONALITY FOR SETS OF HYPERCUBES

STRONG FORMS OF ORTHOGONALITY FOR SETS OF HYPERCUBES The Pennsylvania State University The Graduate School Department of Mathematics STRONG FORMS OF ORTHOGONALITY FOR SETS OF HYPERCUBES A Dissertation in Mathematics by John T. Ethier c 008 John T. Ethier

More information

Parallelization of the QC-lib Quantum Computer Simulator Library

Parallelization of the QC-lib Quantum Computer Simulator Library Parallelization of the QC-lib Quantum Computer Simulator Library Ian Glendinning and Bernhard Ömer VCPC European Centre for Parallel Computing at Vienna Liechtensteinstraße 22, A-19 Vienna, Austria http://www.vcpc.univie.ac.at/qc/

More information

Katholieke Universiteit Leuven Department of Computer Science

Katholieke Universiteit Leuven Department of Computer Science On the maximal cycle and transient lengths of circular cellular automata Kim Weyns, Bart Demoen Report CW 375, December 2003 Katholieke Universiteit Leuven Department of Computer Science Celestijnenlaan

More information

INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR. NPTEL National Programme on Technology Enhanced Learning. Probability Methods in Civil Engineering

INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR. NPTEL National Programme on Technology Enhanced Learning. Probability Methods in Civil Engineering INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR NPTEL National Programme on Technology Enhanced Learning Probability Methods in Civil Engineering Prof. Rajib Maity Department of Civil Engineering IIT Kharagpur

More information

Parallel Programming. Parallel algorithms Linear systems solvers

Parallel Programming. Parallel algorithms Linear systems solvers Parallel Programming Parallel algorithms Linear systems solvers Terminology System of linear equations Solve Ax = b for x Special matrices Upper triangular Lower triangular Diagonally dominant Symmetric

More information

A GENETIC ALGORITHM FOR FINITE STATE AUTOMATA

A GENETIC ALGORITHM FOR FINITE STATE AUTOMATA A GENETIC ALGORITHM FOR FINITE STATE AUTOMATA Aviral Takkar Computer Engineering Department, Delhi Technological University( Formerly Delhi College of Engineering), Shahbad Daulatpur, Main Bawana Road,

More information

On queueing in coded networks queue size follows degrees of freedom

On queueing in coded networks queue size follows degrees of freedom On queueing in coded networks queue size follows degrees of freedom Jay Kumar Sundararajan, Devavrat Shah, Muriel Médard Laboratory for Information and Decision Systems, Massachusetts Institute of Technology,

More information

Numerical Linear Algebra

Numerical Linear Algebra Numerical Linear Algebra By: David McQuilling; Jesus Caban Deng Li Jan.,31,006 CS51 Solving Linear Equations u + v = 8 4u + 9v = 1 A x b 4 9 u v = 8 1 Gaussian Elimination Start with the matrix representation

More information

Review of linear algebra

Review of linear algebra Review of linear algebra 1 Vectors and matrices We will just touch very briefly on certain aspects of linear algebra, most of which should be familiar. Recall that we deal with vectors, i.e. elements of

More information

Algebra Exam. Solutions and Grading Guide

Algebra Exam. Solutions and Grading Guide Algebra Exam Solutions and Grading Guide You should use this grading guide to carefully grade your own exam, trying to be as objective as possible about what score the TAs would give your responses. Full

More information

Chapter 2. Matrix Arithmetic. Chapter 2

Chapter 2. Matrix Arithmetic. Chapter 2 Matrix Arithmetic Matrix Addition and Subtraction Addition and subtraction act element-wise on matrices. In order for the addition/subtraction (A B) to be possible, the two matrices A and B must have the

More information

Class President: A Network Approach to Popularity. Due July 18, 2014

Class President: A Network Approach to Popularity. Due July 18, 2014 Class President: A Network Approach to Popularity Due July 8, 24 Instructions. Due Fri, July 8 at :59 PM 2. Work in groups of up to 3 3. Type up the report, and submit as a pdf on D2L 4. Attach the code

More information

Additional Constructions to Solve the Generalized Russian Cards Problem using Combinatorial Designs

Additional Constructions to Solve the Generalized Russian Cards Problem using Combinatorial Designs Additional Constructions to Solve the Generalized Russian Cards Problem using Combinatorial Designs Colleen M. Swanson Computer Science & Engineering Division University of Michigan Ann Arbor, MI 48109,

More information

Contents. Preface... xi. Introduction...

Contents. Preface... xi. Introduction... Contents Preface... xi Introduction... xv Chapter 1. Computer Architectures... 1 1.1. Different types of parallelism... 1 1.1.1. Overlap, concurrency and parallelism... 1 1.1.2. Temporal and spatial parallelism

More information

c 2011 Nisha Somnath

c 2011 Nisha Somnath c 2011 Nisha Somnath HIERARCHICAL SUPERVISORY CONTROL OF COMPLEX PETRI NETS BY NISHA SOMNATH THESIS Submitted in partial fulfillment of the requirements for the degree of Master of Science in Aerospace

More information

Modelling and implementation of algorithms in applied mathematics using MPI

Modelling and implementation of algorithms in applied mathematics using MPI Modelling and implementation of algorithms in applied mathematics using MPI Lecture 3: Linear Systems: Simple Iterative Methods and their parallelization, Programming MPI G. Rapin Brazil March 2011 Outline

More information

Counting Clusters on a Grid

Counting Clusters on a Grid Dartmouth College Undergraduate Honors Thesis Counting Clusters on a Grid Author: Jacob Richey Faculty Advisor: Peter Winkler May 23, 2014 1 Acknowledgements There are a number of people who have made

More information

Lecture 5: Web Searching using the SVD

Lecture 5: Web Searching using the SVD Lecture 5: Web Searching using the SVD Information Retrieval Over the last 2 years the number of internet users has grown exponentially with time; see Figure. Trying to extract information from this exponentially

More information

Vector Spaces. 9.1 Opening Remarks. Week Solvable or not solvable, that s the question. View at edx. Consider the picture

Vector Spaces. 9.1 Opening Remarks. Week Solvable or not solvable, that s the question. View at edx. Consider the picture Week9 Vector Spaces 9. Opening Remarks 9.. Solvable or not solvable, that s the question Consider the picture (,) (,) p(χ) = γ + γ χ + γ χ (, ) depicting three points in R and a quadratic polynomial (polynomial

More information

MATH 2030: MATRICES ,, a m1 a m2 a mn If the columns of A are the vectors a 1, a 2,...,a n ; A is represented as A 1. .

MATH 2030: MATRICES ,, a m1 a m2 a mn If the columns of A are the vectors a 1, a 2,...,a n ; A is represented as A 1. . MATH 030: MATRICES Matrix Operations We have seen how matrices and the operations on them originated from our study of linear equations In this chapter we study matrices explicitely Definition 01 A matrix

More information

5.1 Banded Storage. u = temperature. The five-point difference operator. uh (x, y + h) 2u h (x, y)+u h (x, y h) uh (x + h, y) 2u h (x, y)+u h (x h, y)

5.1 Banded Storage. u = temperature. The five-point difference operator. uh (x, y + h) 2u h (x, y)+u h (x, y h) uh (x + h, y) 2u h (x, y)+u h (x h, y) 5.1 Banded Storage u = temperature u= u h temperature at gridpoints u h = 1 u= Laplace s equation u= h u = u h = grid size u=1 The five-point difference operator 1 u h =1 uh (x + h, y) 2u h (x, y)+u h

More information

Lecture Notes 1: Vector spaces

Lecture Notes 1: Vector spaces Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector

More information

Power System Analysis Prof. A. K. Sinha Department of Electrical Engineering Indian Institute of Technology, Kharagpur. Lecture - 21 Power Flow VI

Power System Analysis Prof. A. K. Sinha Department of Electrical Engineering Indian Institute of Technology, Kharagpur. Lecture - 21 Power Flow VI Power System Analysis Prof. A. K. Sinha Department of Electrical Engineering Indian Institute of Technology, Kharagpur Lecture - 21 Power Flow VI (Refer Slide Time: 00:57) Welcome to lesson 21. In this

More information

ALGEBRA AND GEOMETRY. Cambridge University Press Algebra and Geometry Alan F. Beardon Frontmatter More information

ALGEBRA AND GEOMETRY. Cambridge University Press Algebra and Geometry Alan F. Beardon Frontmatter More information ALGEBRA AND GEOMETRY This text gives a basic introduction and a unified approach to algebra and geometry. It covers the ideas of complex numbers, scalar and vector products, determinants, linear algebra,

More information

Cyclops Tensor Framework: reducing communication and eliminating load imbalance in massively parallel contractions

Cyclops Tensor Framework: reducing communication and eliminating load imbalance in massively parallel contractions Cyclops Tensor Framework: reducing communication and eliminating load imbalance in massively parallel contractions Edgar Solomonik 1, Devin Matthews 3, Jeff Hammond 4, James Demmel 1,2 1 Department of

More information

26 Group Theory Basics

26 Group Theory Basics 26 Group Theory Basics 1. Reference: Group Theory and Quantum Mechanics by Michael Tinkham. 2. We said earlier that we will go looking for the set of operators that commute with the molecular Hamiltonian.

More information

STUDY OF PERMUTATION MATRICES BASED LDPC CODE CONSTRUCTION

STUDY OF PERMUTATION MATRICES BASED LDPC CODE CONSTRUCTION EE229B PROJECT REPORT STUDY OF PERMUTATION MATRICES BASED LDPC CODE CONSTRUCTION Zhengya Zhang SID: 16827455 zyzhang@eecs.berkeley.edu 1 MOTIVATION Permutation matrices refer to the square matrices with

More information

Gershgorin s Circle Theorem for Estimating the Eigenvalues of a Matrix with Known Error Bounds

Gershgorin s Circle Theorem for Estimating the Eigenvalues of a Matrix with Known Error Bounds Gershgorin s Circle Theorem for Estimating the Eigenvalues of a Matrix with Known Error Bounds Author: David Marquis Advisors: Professor Hans De Moor Dr. Kathryn Porter Reader: Dr. Michael Nathanson May

More information

13 Searching the Web with the SVD

13 Searching the Web with the SVD 13 Searching the Web with the SVD 13.1 Information retrieval Over the last 20 years the number of internet users has grown exponentially with time; see Figure 1. Trying to extract information from this

More information

Fortran program + Partial data layout specifications Data Layout Assistant.. regular problems. dynamic remapping allowed Invoked only a few times Not part of the compiler Can use expensive techniques HPF

More information

PRECONDITIONING IN THE PARALLEL BLOCK-JACOBI SVD ALGORITHM

PRECONDITIONING IN THE PARALLEL BLOCK-JACOBI SVD ALGORITHM Proceedings of ALGORITMY 25 pp. 22 211 PRECONDITIONING IN THE PARALLEL BLOCK-JACOBI SVD ALGORITHM GABRIEL OKŠA AND MARIÁN VAJTERŠIC Abstract. One way, how to speed up the computation of the singular value

More information

2 - Strings and Binomial Coefficients

2 - Strings and Binomial Coefficients November 14, 2017 2 - Strings and Binomial Coefficients William T. Trotter trotter@math.gatech.edu Basic Definition Let n be a positive integer and let [n] = {1, 2,, n}. A sequence of length n such as

More information

Core Connections Algebra 2 Checkpoint Materials

Core Connections Algebra 2 Checkpoint Materials Core Connections Algebra 2 Note to Students (and their Teachers) Students master different skills at different speeds. No two students learn eactly the same way at the same time. At some point you will

More information

MATH 433 Applied Algebra Lecture 22: Review for Exam 2.

MATH 433 Applied Algebra Lecture 22: Review for Exam 2. MATH 433 Applied Algebra Lecture 22: Review for Exam 2. Topics for Exam 2 Permutations Cycles, transpositions Cycle decomposition of a permutation Order of a permutation Sign of a permutation Symmetric

More information

In biological terms, memory refers to the ability of neural systems to store activity patterns and later recall them when required.

In biological terms, memory refers to the ability of neural systems to store activity patterns and later recall them when required. In biological terms, memory refers to the ability of neural systems to store activity patterns and later recall them when required. In humans, association is known to be a prominent feature of memory.

More information

An Integrative Model for Parallelism

An Integrative Model for Parallelism An Integrative Model for Parallelism Victor Eijkhout ICERM workshop 2012/01/09 Introduction Formal part Examples Extension to other memory models Conclusion tw-12-exascale 2012/01/09 2 Introduction tw-12-exascale

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

Experimental designs for multiple responses with different models

Experimental designs for multiple responses with different models Graduate Theses and Dissertations Graduate College 2015 Experimental designs for multiple responses with different models Wilmina Mary Marget Iowa State University Follow this and additional works at:

More information

Parallel programming using MPI. Analysis and optimization. Bhupender Thakur, Jim Lupo, Le Yan, Alex Pacheco

Parallel programming using MPI. Analysis and optimization. Bhupender Thakur, Jim Lupo, Le Yan, Alex Pacheco Parallel programming using MPI Analysis and optimization Bhupender Thakur, Jim Lupo, Le Yan, Alex Pacheco Outline l Parallel programming: Basic definitions l Choosing right algorithms: Optimal serial and

More information

SUPERDENSE CODING AND QUANTUM TELEPORTATION

SUPERDENSE CODING AND QUANTUM TELEPORTATION SUPERDENSE CODING AND QUANTUM TELEPORTATION YAQIAO LI This note tries to rephrase mathematically superdense coding and quantum teleportation explained in [] Section.3 and.3.7, respectively (as if I understood

More information