Efficient Parallel Algorithms for Template Matching

Sanguthevar Rajasekaran
Department of CISE, University of Florida

Abstract. The parallel complexity of template matching has been well studied. In this paper we present algorithms that are more work-efficient than the existing ones. Our algorithms are based on FFT primitives. We consider the following models of computing: the PRAM and the hypercube.

1 Introduction

The problem of template matching (also known as convolution) is a basic operation in computer vision and image processing and hence is of vital importance. For instance, template matching is used for edge and object detection, filtering, and image registration [8]. The parallel complexity of this problem has been well studied. In the 2-D version of this problem, we are given an image I[0 : n-1, 0 : n-1] and a template T[0 : m-1, 0 : m-1] of pixel values. The problem is to compute C[0 : n-1, 0 : n-1], where

  C[i, j] = \sum_{q=0}^{m-1} \sum_{r=0}^{m-1} I[(i+q) mod n, (j+r) mod n] · T[q, r].

Fang and Ni [1] showed that 2-D template matching can be done in O(m^2) time using n^2 hypercube processors. They also extended this algorithm to the mesh and shuffle-exchange networks, obtaining similar time and processor bounds. Following this work, Prasanna Kumar and Krishnan [5] presented hypercube algorithms for 2-D template matching. They considered the cases in which the memory per processor is O(m) and O(1), respectively. Their algorithm runs in time O(m^2/k^2 + log n) using n^2 k^2 hypercube processors, for any 1 ≤ k ≤ m. Ranka and Sahni [6, 7] have presented 2-D template matching algorithms for both SIMD and MIMD versions of the hypercube. They too considered the cases in which the memory per processor is O(m) and O(1), respectively. For example, if each processor has O(m) memory, their algorithm on the MIMD model runs in time O(m^2 + log(n/m)) using n^2 processors.
All of the above algorithms are such that the total work done by them is Ω(n^2 m^2). (The work done by a parallel algorithm that uses p processors and takes t time is defined to be p · t.) In this paper we present algorithms whose total work is either O(n^2 m log m) or O(n^2 m log n). If m > log n, our algorithms perform better than the previous ones. We apply FFT techniques.

2 Models of Computing

In a Parallel Random Access Machine (PRAM), a number of processors work synchronously. They communicate with each other through a common block of global memory that is accessible to all of them. Communication is performed by writing into and/or reading from the common memory. The EREW (Exclusive Read Exclusive Write) PRAM is a version in which no concurrent read or write is allowed on any cell of the global memory. A CREW (Concurrent Read Exclusive Write) PRAM permits concurrent reads but not concurrent writes. The CRCW PRAM model allows both concurrent reads and concurrent writes.

A hypercube of dimension d has 2^d processors. Each processor can be labeled with a d-bit binary number. Let v^(i) stand for the binary number that differs from v only in the i-th bit. We shall use the same symbol to denote a processor and its label. Any processor v in the hypercube is connected only to the processors v^(i), for i = 1, 2, ..., d. There are two variants of the hypercube. In the first version, known as the sequential hypercube, single-port hypercube, or SIMD hypercube, it is assumed that in one unit of time a processor can communicate with only one of its neighbors. In contrast, the second version, known as the parallel hypercube, multiport hypercube, or MIMD hypercube, assumes that in one unit of time a processor can communicate with all of its d neighbors. We assume the sequential hypercube model wherein each processor has only O(1) local memory.

3 1-D Template Matching

In the one-dimensional case, the problem is stated as follows: I[0 : n-1] is a given image and T[0 : m-1] is a template.
Given I and T, compute C[0 : n-1], where

  C[i] = \sum_{j=0}^{m-1} I[(i+j) mod n] · T[j],  for 0 ≤ i ≤ n-1.
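As an illustration (not part of the original paper), the definition above and the reduction to polynomial products used in this section can be checked in a few lines of Python. Schoolbook polynomial multiplication stands in for the FFT here; that changes only the running time, not the coefficients. The sketch assumes, as the section does w.l.o.g., that m divides n.

```python
def match_1d(I, T):
    # Direct evaluation of the definition: C[i] = sum_j I[(i+j) mod n] * T[j].
    n, m = len(I), len(T)
    return [sum(I[(i + j) % n] * T[j] for j in range(m)) for i in range(n)]

def poly_mul(a, b):
    # Schoolbook polynomial product; an FFT-based product yields the same coefficients.
    d = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            d[i + j] += ai * bj
    return d

def match_1d_poly(I, T):
    # Block reduction for n = k*m: multiply each block polynomial A_i by the
    # reversed template B, then read C off the product coefficients:
    #   C[i*m]     = d^i_{m-1}
    #   C[i*m + j] = d^i_{m-1+j} + d^{(i+1) mod k}_{j-1},  1 <= j <= m-1.
    n, m = len(I), len(T)
    k = n // m
    B = T[::-1]
    D = [poly_mul(I[i * m:(i + 1) * m], B) for i in range(k)]
    C = [0] * n
    for i in range(k):
        C[i * m] = D[i][m - 1]
        for j in range(1, m):
            C[i * m + j] = D[i][m - 1 + j] + D[(i + 1) % k][j - 1]
    return C
```

For example, `match_1d([1, 2, 3, 4], [1, 1])` returns `[3, 5, 7, 5]`, and `match_1d_poly` agrees on the same input.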
First consider the case n = m. This computation can be reduced to polynomial multiplication and hence to FFT computation. Let

  A(x) = I[0] + I[1] x + I[2] x^2 + ... + I[m-1] x^{m-1}  and
  B(x) = T[m-1] + T[m-2] x + T[m-3] x^2 + ... + T[0] x^{m-1}.

If D(x) = A(x) · B(x) = d_0 + d_1 x + ... + d_{2m-2} x^{2m-2}, then C[0] = d_{m-1} and C[i] = d_{i+m-1} + d_{i-1}, for i = 1, 2, ..., m-1. Thus, when n = m, the C[i]'s can be computed in O(m log m) time sequentially using fast Fourier transform (FFT) algorithms (see, e.g., [2]).

Now consider the case n > m. Obtaining C[ ] can be reduced to ⌈n/m⌉ steps, where in each step two polynomials of degree ≤ m-1 are multiplied. Assume w.l.o.g. that n = km for some integer k. Let

  A_i(x) = I[im] + I[im+1] x + I[im+2] x^2 + ... + I[im+m-1] x^{m-1}, for i = 0, 1, ..., k-1, and
  B(x) = T[m-1] + T[m-2] x + T[m-3] x^2 + ... + T[0] x^{m-1}.

Multiply A_i(x) with B(x), for 0 ≤ i ≤ k-1. This takes a total of O(n log m) time. Let D_i(x) = A_i(x) · B(x) = d^i_0 + d^i_1 x + ... + d^i_{2m-2} x^{2m-2}, for 0 ≤ i ≤ k-1. Then C[im] = d^i_{m-1}, for 0 ≤ i ≤ k-1. Also,

  C[im + j] = d^i_{m-1+j} + d^{(i+1) mod k}_{j-1},  for 0 ≤ i ≤ k-1 and 1 ≤ j ≤ m-1.

Thus, given the coefficients of all the D_i(x)'s, C[ ] can be computed in O(n) additional time.

3.1 1-D Template Matching on the PRAM

The following lemma pertains to computing FFTs on the PRAM [3].

Lemma 3.1 The FFT of a vector of length m (and hence the product of two polynomials of degree m-1 each) can be computed in O(log m) time using m EREW PRAM processors.

As a corollary to Lemma 3.1, we get:

Theorem 3.1 The 1-D template matching problem can be solved in O(log m) time using n EREW PRAM processors.

Since the work done by the above algorithm is O(n log m), it is work-optimal.

3.2 1-D Template Matching on the Hypercube

The following lemma will prove helpful in our presentation [4]:
Lemma 3.2 The following problems can be solved on an n-node hypercube in O(log n) time: 1) monotone routing (see [4] or [2] for a definition), when the data is of length n; 2) broadcasting a data item; 3) computing the FFT of a vector of length n; and 4) computing the product of two polynomials of degree n each.

Let H_q denote a hypercube of dimension q. Assume that m = 2^ℓ and n = 2^r for some integers ℓ and r. Consider a hypercube H_r. Let I[ ] be stored one element per node of H_r. Also let T[ ] be input one element per node of a subcube H_ℓ of H_r. If we fix some q bits and vary the remaining bits of an r-bit number, the corresponding processors form a subcube H_{r-q} of H_r. Thus H_r consists of 2^q copies of the subcube H_{r-q}. Call the subcube whose first r-ℓ bits have the value i row i, for 0 ≤ i ≤ 2^{r-ℓ} - 1. Note also that all the nodes whose last ℓ bits are the same form a subcube H_{r-ℓ}. Call the subcube whose last ℓ bits have the value j column j, for 0 ≤ j ≤ 2^ℓ - 1.

Let T[ ] be input in row 0. Broadcast T[ ] so that every row has a copy of T[ ]. This can be done by broadcasting T[j] in column j, for 0 ≤ j ≤ 2^ℓ - 1. Broadcasting takes O(r - ℓ) = O(log n) time. Now row i has A_i(x) and B(x). Let row i multiply A_i(x) and B(x) to obtain D_i(x) using Lemma 3.2. The time needed is O(ℓ) = O(log n). Let the result be stored two coefficients per node in row i. In particular, D_i[0] and D_i[m] will be in node 0 of row i, D_i[1] and D_i[m+1] will be in node 1 of row i, and so on. C[ ] can now be obtained by an appropriate shift of D[ ]. The shift itself corresponds to a monotone routing step and hence can be completed in O(log n) time. As a result we arrive at:

Theorem 3.2 The 1-D template matching problem can be solved on an n-node hypercube in O(log n) time. Here n is the size of I[ ] and m is the size of T[ ].

4 2-D Template Matching

In the 2-D version of template matching, we are given an image I[0 : n-1, 0 : n-1] and a template T[0 : m-1, 0 : m-1] of pixel values.
The problem is to compute C[0 : n-1, 0 : n-1], where

  C[i, j] = \sum_{q=0}^{m-1} \sum_{r=0}^{m-1} I[(i+q) mod n, (j+r) mod n] · T[q, r].

Here also, we can employ FFT techniques to obtain efficient parallel algorithms.
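As a sketch (again, not from the paper itself), the 2-D definition can be evaluated directly and checked against the decomposition into row-wise 1-D matchings that this section relies on, namely C[i, j] = \sum_{ℓ=0}^{m-1} Q^{((i+ℓ) mod n, ℓ)}[j]:

```python
def match_1d(row_I, row_T):
    # 1-D circular matching of an image row against a template row:
    # Q[j] = sum_t row_I[(j+t) mod n] * row_T[t].
    n, m = len(row_I), len(row_T)
    return [sum(row_I[(j + t) % n] * row_T[t] for t in range(m)) for j in range(n)]

def match_2d(I, T):
    # Direct evaluation of the 2-D definition.
    n, m = len(I), len(T)
    return [[sum(I[(i + q) % n][(j + r) % n] * T[q][r]
                 for q in range(m) for r in range(m))
             for j in range(n)] for i in range(n)]

def match_2d_by_rows(I, T):
    # Row decomposition: Q^(q,r) matches row q of I against row r of T,
    # and C[i][j] = sum_{l=0}^{m-1} Q^((i+l) mod n, l)[j].
    n, m = len(I), len(T)
    Q = {(q, r): match_1d(I[q], T[r]) for q in range(n) for r in range(m)}
    return [[sum(Q[((i + l) % n, l)][j] for l in range(m)) for j in range(n)]
            for i in range(n)]
```

Both functions produce the same array; the decomposition simply regroups the double sum of the definition by rows of the template.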
The idea is to perform a series of 1-D template matchings. In particular, compute the convolution of the q-th row of I[ ] with the r-th row of T[ ], for 0 ≤ q ≤ n-1 and 0 ≤ r ≤ m-1. Let Q^{(q,r)}[0 : n-1] be the result. Sequentially, this computation will take O(n^2 m log m) time. Now compute

  C[i, j] = \sum_{ℓ=0}^{m-1} Q^{((i+ℓ) mod n, ℓ)}[j].

Given the Q^{(q,r)}'s, it takes only an additional O(n^2 m) time to compute all the C[i, j]'s. Thus the total sequential running time is O(n^2 m log m).

4.1 2-D Template Matching on the PRAM

We assume the EREW PRAM model. For a given q and r, Q^{(q,r)}[0 : n-1] can be computed in O(log m) time using n processors (cf. Theorem 3.1). Thus, if we have n^2 m processors, all the Q^{(q,r)}[0 : n-1]'s can be computed in O(log m) time. After this step, assign m processors to compute each of the C[i, j]'s. The m processors corresponding to any C[i, j] add up the m relevant items using the prefix algorithm in O(log m) time. As a result, we have:

Theorem 4.1 The 2-D template matching problem can be solved in O(log m) time using n^2 m EREW PRAM processors. Here n × n is the image size and m × m is the template size.

4.2 2-D Template Matching on the Hypercube

Assume that n = 2^r and m = 2^ℓ for some integers r and ℓ. We employ a hypercube with n^2 m = 2^{2r+ℓ} nodes. Let (i, j, *) stand for the subcube whose first r bits have the value i, whose next r bits equal j, and whose last ℓ bits vary (0 ≤ i, j ≤ 2^r - 1). Similarly define (i, *, k) and (*, j, k). Let (i, *, *) stand for the subcube whose first r bits have the value i and whose other bits vary (0 ≤ i ≤ 2^r - 1). Similarly define (*, j, *) and (*, *, k).

The image I[ ] is input in the subcube (*, *, 0). The template is also input in the same subcube. Broadcast I[ ] and T[ ] so that each subcube (*, *, k) has a copy of I[ ] and T[ ], for 0 ≤ k ≤ 2^ℓ - 1. This can be done in O(r) time. Subcube (*, *, k) has n^2 processors. These processors compute Q^{(u,k)}[0 : n-1] for u = 0, 1, ..., n-1. This is done in parallel for each k. The time taken is O(r).
Now the C[i, j]'s can be computed by appropriate shifts along the "j-axis" in O(r) time and a prefix computation along the "k-axis" in O(ℓ) time. Thus the total running time is O(r + ℓ) = O(log n). Thus we have shown:

Theorem 4.2 2-D template matching can be solved in O(log n) time using n^2 m hypercube processors. Here n × n is the size of the image and m × m is the size of the template.

In contrast to Theorem 4.2, Prasanna Kumar and Krishnan's [5] algorithm uses n^2 m^2 processors and runs in time O(log n).

References

[1] Z. Fang and L. Ni, Parallel Algorithms for 2-D Convolution, Proc. International Conference on Parallel Processing, 1986.
[2] E. Horowitz, S. Sahni, and S. Rajasekaran, Computer Algorithms/C++, W. H. Freeman Press.
[3] J. JáJá, An Introduction to Parallel Algorithms, Addison-Wesley Publishing Company.
[4] F. T. Leighton, Introduction to Parallel Algorithms and Architectures: Arrays-Trees-Hypercubes, Morgan Kaufmann Publishers.
[5] V. Prasanna Kumar and V. Krishnan, Efficient Template Matching on SIMD Arrays, Proc. International Conference on Parallel Processing, 1987.
[6] S. Ranka and S. Sahni, Image Template Matching on SIMD Hypercube Multicomputers, Proc. International Conference on Parallel Processing, 1988.
[7] S. Ranka and S. Sahni, Image Template Matching on MIMD Hypercube Multicomputers, Proc. International Conference on Parallel Processing, 1988.
[8] A. Rosenfeld and A. C. Kak, Digital Picture Processing, Academic Press.
This is a repository copy of Improving the associative rule chaining architecture. White Rose Research Online URL for this paper: http://eprints.whiterose.ac.uk/75674/ Version: Accepted Version Book Section:
More informationNotes on Computer Theory Last updated: November, Circuits
Notes on Computer Theory Last updated: November, 2015 Circuits Notes by Jonathan Katz, lightly edited by Dov Gordon. 1 Circuits Boolean circuits offer an alternate model of computation: a non-uniform one
More informationIntroduction to Quantum Computing
Introduction to Quantum Computing The lecture notes were prepared according to Peter Shor s papers Quantum Computing and Polynomial-Time Algorithms for Prime Factorization and Discrete Logarithms on a
More informationCS 310 Advanced Data Structures and Algorithms
CS 310 Advanced Data Structures and Algorithms Runtime Analysis May 31, 2017 Tong Wang UMass Boston CS 310 May 31, 2017 1 / 37 Topics Weiss chapter 5 What is algorithm analysis Big O, big, big notations
More informationA General Lower Bound on the I/O-Complexity of Comparison-based Algorithms
A General Lower ound on the I/O-Complexity of Comparison-based Algorithms Lars Arge Mikael Knudsen Kirsten Larsent Aarhus University, Computer Science Department Ny Munkegade, DK-8000 Aarhus C. August
More informationcompare to comparison and pointer based sorting, binary trees
Admin Hashing Dictionaries Model Operations. makeset, insert, delete, find keys are integers in M = {1,..., m} (so assume machine word size, or unit time, is log m) can store in array of size M using power:
More informationCME342 Parallel Methods in Numerical Analysis. Matrix Computation: Iterative Methods II. Sparse Matrix-vector Multiplication.
CME342 Parallel Methods in Numerical Analysis Matrix Computation: Iterative Methods II Outline: CG & its parallelization. Sparse Matrix-vector Multiplication. 1 Basic iterative methods: Ax = b r = b Ax
More information15.1 Proof of the Cook-Levin Theorem: SAT is NP-complete
CS125 Lecture 15 Fall 2016 15.1 Proof of the Cook-Levin Theorem: SAT is NP-complete Already know SAT NP, so only need to show SAT is NP-hard. Let L be any language in NP. Let M be a NTM that decides L
More informationA GENERAL POLYNOMIAL SIEVE
A GENERAL POLYNOMIAL SIEVE SHUHONG GAO AND JASON HOWELL Abstract. An important component of the index calculus methods for finding discrete logarithms is the acquisition of smooth polynomial relations.
More informationCPSC 518 Introduction to Computer Algebra Schönhage and Strassen s Algorithm for Integer Multiplication
CPSC 518 Introduction to Computer Algebra Schönhage and Strassen s Algorithm for Integer Multiplication March, 2006 1 Introduction We have now seen that the Fast Fourier Transform can be applied to perform
More informationBayes Classifiers. CAP5610 Machine Learning Instructor: Guo-Jun QI
Bayes Classifiers CAP5610 Machine Learning Instructor: Guo-Jun QI Recap: Joint distributions Joint distribution over Input vector X = (X 1, X 2 ) X 1 =B or B (drinking beer or not) X 2 = H or H (headache
More informationDetermine the size of an instance of the minimum spanning tree problem.
3.1 Algorithm complexity Consider two alternative algorithms A and B for solving a given problem. Suppose A is O(n 2 ) and B is O(2 n ), where n is the size of the instance. Let n A 0 be the size of the
More informationA Lower Bound Technique for Nondeterministic Graph-Driven Read-Once-Branching Programs and its Applications
A Lower Bound Technique for Nondeterministic Graph-Driven Read-Once-Branching Programs and its Applications Beate Bollig and Philipp Woelfel FB Informatik, LS2, Univ. Dortmund, 44221 Dortmund, Germany
More informationChapter 5 Divide and Conquer
CMPT 705: Design and Analysis of Algorithms Spring 008 Chapter 5 Divide and Conquer Lecturer: Binay Bhattacharya Scribe: Chris Nell 5.1 Introduction Given a problem P with input size n, P (n), we define
More informationOptimizing the Dimensional Method for Performing Multidimensional, Multiprocessor, Out-of-Core FFTs
Dartmouth College Computer Science Technical Report TR2001-402 Optimizing the Dimensional Method for Performing Multidimensional, Multiprocessor, Out-of-Core FFTs Jeremy T Fineman Dartmouth College Department
More informationGeneralized DCell Structure for Load-Balanced Data Center Networks
Generalized DCell Structure for Load-Balanced Data Center Networks Markus Kliegl, Jason Lee, Jun Li, Xinchao Zhang, Chuanxiong Guo, David Rincón Swarthmore College, Duke University, Fudan University, Shanghai
More informationHAMMING DISTANCE FROM IRREDUCIBLE POLYNOMIALS OVER F Introduction and Motivation
HAMMING DISTANCE FROM IRREDUCIBLE POLYNOMIALS OVER F 2 GILBERT LEE, FRANK RUSKEY, AND AARON WILLIAMS Abstract. We study the Hamming distance from polynomials to classes of polynomials that share certain
More informationOn Pattern Matching With Swaps
On Pattern Matching With Swaps Fouad B. Chedid Dhofar University, Salalah, Oman Notre Dame University - Louaize, Lebanon P.O.Box: 2509, Postal Code 211 Salalah, Oman Tel: +968 23237200 Fax: +968 23237720
More informationOptimal Folding Of Bit Sliced Stacks+
Optimal Folding Of Bit Sliced Stacks+ Doowon Paik University of Minnesota Sartaj Sahni University of Florida Abstract We develop fast polynomial time algorithms to optimally fold stacked bit sliced architectures
More informationNCU EE -- DSP VLSI Design. Tsung-Han Tsai 1
NCU EE -- DSP VLSI Design. Tsung-Han Tsai 1 Multi-processor vs. Multi-computer architecture µp vs. DSP RISC vs. DSP RISC Reduced-instruction-set Register-to-register operation Higher throughput by using
More informationAlgorithm Design and Analysis
Algorithm Design and Analysis LECTURE 14 Divide and Conquer Fast Fourier Transform Sofya Raskhodnikova 10/7/2016 S. Raskhodnikova; based on slides by K. Wayne. 5.6 Convolution and FFT Fast Fourier Transform:
More informationRemainders. We learned how to multiply and divide in elementary
Remainders We learned how to multiply and divide in elementary school. As adults we perform division mostly by pressing the key on a calculator. This key supplies the quotient. In numerical analysis and
More information