Introduction to HPC. Lecture 20
- Aubrey Carpenter
- 6 years ago
Transcription
1 COSC Introduction to HPC, Lecture 20. Dept of Computer Science. COSC Fast Fourier Transform
2 COSC Applications: image processing in electron microscopy and medical imaging; signal processing in radio astronomy; modeling and solution of PDEs. COSC FFT: O(P log P) algorithms. Cooley-Tukey with a fixed radix; mixed-radix Cooley-Tukey (MR); Prime Factor Algorithm (PFA); split-radix (SR); Rader's algorithm. Transforms on real data. Sine and cosine transforms: transforms on real data with odd and even symmetry, respectively (cosine transforms are used for JPEG, MPEG, ...).
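3 COSC The Fast Fourier Transform. The Discrete Fourier Transform (DFT):
$$X(m) = \sum_{j=0}^{P-1} \omega_P^{mj}\, x(j), \quad \forall m \in [0, P-1], \quad \omega_P = e^{-2\pi i/P}$$
The Inverse Discrete Fourier Transform (IDFT):
$$x(j) = \frac{1}{P} \sum_{m=0}^{P-1} \omega_P^{-mj}\, X(m), \quad \forall j \in [0, P-1], \quad \omega_P = e^{-2\pi i/P}$$
COSC The DFT in matrix form:
$$\begin{pmatrix} X(0) \\ X(1) \\ X(2) \\ \vdots \\ X(P-1) \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & \omega_P & \omega_P^{2} & \cdots & \omega_P^{P-1} \\ 1 & \omega_P^{2} & \omega_P^{4} & \cdots & \omega_P^{2(P-1)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \omega_P^{P-1} & \omega_P^{2(P-1)} & \cdots & \omega_P^{(P-1)(P-1)} \end{pmatrix} \begin{pmatrix} x(0) \\ x(1) \\ x(2) \\ \vdots \\ x(P-1) \end{pmatrix}$$
X = Wx. The DFT is a matrix-vector multiplication.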
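The matrix-vector view of the DFT can be sketched directly in code. This is a minimal illustration, assuming the sign convention ω_P = e^{-2πi/P} for the forward transform; the names `dft_matrix` and `dft` are hypothetical and not taken from the lecture.

```python
import cmath


def dft_matrix(P):
    """Return the P x P matrix W with W[m][j] = omega_P ** (m * j)."""
    omega = cmath.exp(-2j * cmath.pi / P)  # assumed forward sign convention
    return [[omega ** (m * j) for j in range(P)] for m in range(P)]


def dft(x):
    """Compute X = W x by direct matrix-vector multiplication, O(P^2) work."""
    P = len(x)
    W = dft_matrix(P)
    return [sum(W[m][j] * x[j] for j in range(P)) for m in range(P)]


x = [1, 2, 3, 4]
X = dft(x)  # approximately [10, -2+2j, -2, -2-2j]
```

The quadratic cost of this direct product is what the O(P log P) algorithms on the following slides avoid.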
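4 COSC The Inverse DFT in matrix form:
$$\begin{pmatrix} x(0) \\ x(1) \\ x(2) \\ \vdots \\ x(P-1) \end{pmatrix} = \frac{1}{P} \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & \omega_P^{-1} & \omega_P^{-2} & \cdots & \omega_P^{-(P-1)} \\ 1 & \omega_P^{-2} & \omega_P^{-4} & \cdots & \omega_P^{-2(P-1)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \omega_P^{-(P-1)} & \omega_P^{-2(P-1)} & \cdots & \omega_P^{-(P-1)(P-1)} \end{pmatrix} \begin{pmatrix} X(0) \\ X(1) \\ X(2) \\ \vdots \\ X(P-1) \end{pmatrix} = W^{-1} X$$
Thus, up to the factor 1/P, the elements of W^{-1} are the inverses of the elements of W. Proof:
$$\frac{1}{P} \sum_{k=0}^{P-1} \omega_P^{ik}\, \omega_P^{-kj} = \frac{1}{P} \sum_{k=0}^{P-1} \omega_P^{(i-j)k} = \begin{cases} 1 & \text{if } i = j \\ \dfrac{1}{P} \cdot \dfrac{\omega_P^{(i-j)P} - 1}{\omega_P^{(i-j)} - 1} = 0 & \text{if } i \neq j \end{cases}$$
COSC FFT variants: decimation-in-time (DIT); decimation-in-frequency (DIF); ordered vs. scrambled (bit-reversed); self-sorting; in-place.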
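The orthogonality argument behind the inverse DFT can be checked numerically. The sketch below assumes ω_P = e^{-2πi/P} and a 1/P normalization on the inverse; `omega` and `orthogonality` are hypothetical helper names.

```python
import cmath


def omega(P, exponent):
    """Return omega_P ** exponent with omega_P = exp(-2*pi*i/P) (assumed convention)."""
    return cmath.exp(-2j * cmath.pi * exponent / P)


def orthogonality(P, i, j):
    """Evaluate (1/P) * sum_k omega_P^(i*k) * omega_P^(-k*j)."""
    return sum(omega(P, i * k) * omega(P, -k * j) for k in range(P)) / P


P = 8
# The sum should be 1 on the diagonal (i == j) and 0 elsewhere.
identity_ok = all(
    abs(orthogonality(P, i, j) - (1 if i == j else 0)) < 1e-9
    for i in range(P)
    for j in range(P)
)
```

Here `identity_ok` being true means the matrix of inverse elements, scaled by 1/P, multiplies W to the identity for this P.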
5 COSC DIF FFT: splitting of the DFT sum (equations not preserved in the transcription). Now consider even and odd l: two half-sized DFTs! COSC DIF FFT: the first computation step, the butterfly (figure).
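A sketch of the standard radix-2 DIF splitting that the slide appears to refer to, assuming P is even, ω_P = e^{-2πi/P}, and m as the output index (the slide's own notation was lost, so the symbols here are assumptions):

```latex
\begin{aligned}
X(m) &= \sum_{j=0}^{P-1} \omega_P^{mj}\, x(j)
      = \sum_{j=0}^{P/2-1} \omega_P^{mj} \left[ x(j) + \omega_P^{mP/2}\, x(j + P/2) \right],
      \qquad \omega_P^{mP/2} = (-1)^m \\
X(2l) &= \sum_{j=0}^{P/2-1} \omega_{P/2}^{lj} \left[ x(j) + x(j + P/2) \right] \\
X(2l+1) &= \sum_{j=0}^{P/2-1} \omega_{P/2}^{lj} \left\{ \omega_P^{j} \left[ x(j) - x(j + P/2) \right] \right\}
\end{aligned}
```

The even and odd outputs are each a DFT of size P/2: one applied to the sums, the other to the twiddled differences. That pair of operations is the butterfly.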
6 COSC DIF FFT: the first computation step (figure). COSC DIF FFT: result after recursive application of the DIF splitting formula (figure).
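Applying the DIF splitting recursively can be written as an in-place loop over stages. The following is a minimal sketch, assuming a power-of-two length and ω_P = e^{-2πi/P}; `fft_dif_inplace` is a hypothetical name, and the output is left in bit-reversed order.

```python
import cmath


def fft_dif_inplace(a):
    """Radix-2 DIF FFT, in place: normal-order input, bit-reversed output."""
    P = len(a)
    assert P > 0 and P & (P - 1) == 0, "length must be a power of two"
    span = P // 2  # distance between the two butterfly operands
    while span >= 1:
        step = P // (2 * span)  # twiddle stride: 1, 2, 4, ... per stage
        for start in range(0, P, 2 * span):
            for l in range(span):
                w = cmath.exp(-2j * cmath.pi * (l * step) / P)
                u = a[start + l]
                v = a[start + l + span]
                a[start + l] = u + v  # sum feeds the even outputs
                a[start + l + span] = (u - v) * w  # twiddled difference feeds the odd outputs
        span //= 2
    return a


scrambled = fft_dif_inplace([1, 2, 3, 4])
# Holds X(0), X(2), X(1), X(3): approximately [10, -2, -2+2j, -2-2j]
```

The first stage pairs elements P/2 apart and uses every twiddle. Each later stage halves the span and doubles the twiddle stride.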
7 COSC DIF FFT: input in normal order, output in bit-reversed ("scrambled") order (figure). COSC DIF FFT twiddle factors: only half as many as the number of data points (half of the unit circle). The first stage uses all P/2 rotations of ω_P; the 2nd stage uses every other twiddle factor; the 3rd stage uses every fourth; and so on. First stage, twiddle exponent: if msb = 1, the remaining bits define the exponent. 2nd stage: the same rule applies within each half-sized transform, and the remaining lower-order bits define the exponent of ω_{P/2}.
8 COSC Normal and bit-reversed orders: table with columns Index, Binary code, Reversed binary code, Bit-reversed index (entries not preserved). COSC DIT FFT: now consider l and l + P/2 (equations not preserved).
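The table of normal and bit-reversed indices can be regenerated with a few lines of code. This is a minimal sketch for 3-bit indices (8 data points, an assumed size); `bit_reverse` is a hypothetical helper name.

```python
def bit_reverse(index, nbits):
    """Reverse the lowest nbits bits of index."""
    reversed_index = 0
    for _ in range(nbits):
        reversed_index = (reversed_index << 1) | (index & 1)
        index >>= 1
    return reversed_index


nbits = 3
# Rows of (index, binary code, reversed binary code, bit-reversed index).
table = [
    (i, format(i, "03b"), format(bit_reverse(i, nbits), "03b"), bit_reverse(i, nbits))
    for i in range(1 << nbits)
]
order = [row[3] for row in table]  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Applying `bit_reverse` twice returns the original index, which is why a forward transform followed by an inverse transform can restore the original order without an explicit reordering pass.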
9 COSC DIT FFT: the DIT butterfly, the last computation step (figure). COSC DIT FFT: result after recursive application of the DIT splitting formula (figure).
10 COSC DIT FFT: input in bit-reversed ("scrambled") order, output in normal order (figure). COSC DIT FFT twiddle factors: only half as many as the number of data points (half of the unit circle). The last stage uses all P/2 rotations of ω_P; the 2nd-to-last stage uses every other twiddle factor; the 3rd-to-last stage uses every fourth; and so on. Last stage, twiddle exponent: if msb = 1, the remaining bits define the exponent. 2nd-to-last stage: the same rule applies to the result index within each half-sized transform, and the remaining lower-order bits define the exponent of ω_{P/2}.
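The DIT counterpart takes bit-reversed input and produces normal-order output, with the full set of twiddles used in the last stage. The following is a minimal sketch under the same assumptions as before (power-of-two length, ω_P = e^{-2πi/P}); the function names are hypothetical.

```python
import cmath


def bit_reverse(index, nbits):
    """Reverse the lowest nbits bits of index."""
    result = 0
    for _ in range(nbits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def fft_dit_inplace(a):
    """Radix-2 DIT FFT, in place: bit-reversed input, normal-order output."""
    P = len(a)
    assert P > 0 and P & (P - 1) == 0, "length must be a power of two"
    span = 1
    while span < P:
        step = P // (2 * span)  # twiddle stride: P/2, ..., 2, 1 per stage
        for start in range(0, P, 2 * span):
            for l in range(span):
                w = cmath.exp(-2j * cmath.pi * (l * step) / P)
                u = a[start + l]
                v = a[start + l + span] * w  # twiddle applied before the add/subtract
                a[start + l] = u + v
                a[start + l + span] = u - v
        span *= 2
    return a


x = [1, 2, 3, 4]
nbits = len(x).bit_length() - 1
scrambled_input = [x[bit_reverse(i, nbits)] for i in range(len(x))]
X = fft_dit_inplace(scrambled_input)  # approximately [10, -2+2j, -2, -2-2j]
```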
11 COSC DIF vs DIT FFT. DIF: normal input order, bit-reversed output. All P/2 twiddles are used in the first stage, every 2nd twiddle in the second stage, every fourth in the 3rd stage, etc. The twiddle exponent is computed based on the source index. DIT: bit-reversed input, normal output. All P/2 twiddles are used in the last stage, every 2nd twiddle in the second-to-last stage, every fourth in the third-to-last stage, etc. The twiddle index is computed based on the result index. Both compute butterflies on successively lower order bits of the source index! COSC DIF FFT: normal to bit-reversed order; bit-reversed to normal order (figures). DIF can be used for either normal or bit-reversed input order! The output order is always the bit-reverse of the input order!
12 COSC DIT FFT: bit-reversed to normal order; normal to bit-reversed order (figures). DIT can be used for either normal or bit-reversed input order! The output order is always the bit-reverse of the input order! COSC FFT followed by inverse FFT, DIF then DIT: use inverse twiddles for the inverse FFT. No bit-reversal necessary!
13 COSC FFT followed by inverse FFT, DIF then DIF: use inverse twiddles for the inverse FFT. No bit-reversal necessary! But the twiddles: the allocation for forward and inverse is different! COSC FFT followed by inverse FFT, DIT then DIT: use inverse twiddles for the inverse FFT. No bit-reversal necessary! But the twiddles: the allocation for forward and inverse is different!
14 COSC FFT followed by inverse FFT, DIT then DIF: use inverse twiddles for the inverse FFT. No bit-reversal necessary! The twiddle allocation for forward and inverse is the same! COSC Radix DIF FFT: index rewriting (radix value and equations not preserved).
15 COSC FFT followed by inverse FFT. DIF followed by DIT, or DIT followed by DIF, have the same twiddle allocation for the forward and inverse transforms, which is important in parallel computation. DIT followed by DIT, or DIF followed by DIF, have different twiddle allocations for the forward and inverse FFT; this is a problem in parallel computation (more twiddle storage than necessary). We illustrated this for normal input order; the same is true for bit-reversed input order. Since bit-reversal is its own inverse, no explicit bit-reversal is necessary to restore the input order for a forward FFT followed by an inverse FFT. COSC Radix DIF FFT
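The DIF-then-DIT pairing can be sketched as a round trip with no reordering pass. The code below assumes a power-of-two length, forward twiddles e^{-2πi k/P}, inverse twiddles e^{+2πi k/P}, and a final division by P; the names `dif_stages`, `dit_stages`, and `twiddle` are hypothetical.

```python
import cmath


def twiddle(P, exponent, sign):
    """Forward twiddles use sign=-1, inverse twiddles use sign=+1 (assumed convention)."""
    return cmath.exp(sign * 2j * cmath.pi * exponent / P)


def dif_stages(a, sign):
    """In-place radix-2 DIF: normal-order input, bit-reversed output."""
    P = len(a)
    span = P // 2
    while span >= 1:
        step = P // (2 * span)
        for start in range(0, P, 2 * span):
            for l in range(span):
                u, v = a[start + l], a[start + l + span]
                a[start + l] = u + v
                a[start + l + span] = (u - v) * twiddle(P, l * step, sign)
        span //= 2


def dit_stages(a, sign):
    """In-place radix-2 DIT: bit-reversed input, normal-order output."""
    P = len(a)
    span = 1
    while span < P:
        step = P // (2 * span)
        for start in range(0, P, 2 * span):
            for l in range(span):
                u = a[start + l]
                v = a[start + l + span] * twiddle(P, l * step, sign)
                a[start + l] = u + v
                a[start + l + span] = u - v
        span *= 2


x = [complex(k * k % 7, -k) for k in range(8)]
work = list(x)
dif_stages(work, sign=-1)  # forward FFT, result left in bit-reversed order
dit_stages(work, sign=+1)  # inverse FFT consumes that order directly
recovered = [value / len(x) for value in work]
```

In this sketch the DIF stage with operand distance `span` and the DIT stage with the same `span` use the same twiddle exponents at the same array positions, differing only in sign. This is one way to see the shared twiddle allocation the slide describes.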
16 COSC Radix DIT FFT: index rewriting (radix value and equations not preserved). COSC Radix DIT FFT (figure).
17 COSC Radix DIF FFT (figure). COSC Radix DIT FFT (figure).
18 COSC FFT: arithmetic and memory operations. Two tables, one for butterflies and one for the full FFT, each with one row per radix (three radices) and columns Add/Sub, Mult, Total (arithmetic operations) and Data, Twiddles, Total (storage references). The numeric entries are not preserved; in the full-FFT table every entry is a multiple of Pp.
COSC Parallel FFT: data allocation. Example: power-of-two data set, power-of-two number of processors, with consecutive or cyclic data allocation. Communication:
Input order    Consecutive       Cyclic
Normal         First n stages    Last n stages
Bit-reversed   Last n stages     First n stages
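The communication table can be reproduced by checking, stage by stage, whether butterfly partners live on the same processor. The sketch below assumes normal-order input, radix-2 stages processed from the msb to the lsb of the data index, and n equal to the number of processor address bits; all names are hypothetical.

```python
def owner_consecutive(index, num_points, num_procs):
    """Consecutive (block) allocation: the high-order index bits select the processor."""
    return index // (num_points // num_procs)


def owner_cyclic(index, num_points, num_procs):
    """Cyclic allocation: the low-order index bits select the processor."""
    return index % num_procs


def stages_needing_communication(num_points, num_procs, owner):
    """Return the 1-based stages whose butterfly partners sit on different processors."""
    nbits = num_points.bit_length() - 1
    stages = []
    for stage in range(nbits):
        bit = nbits - 1 - stage  # stage 1 pairs indices that differ in the msb
        remote = any(
            owner(i, num_points, num_procs) != owner(i ^ (1 << bit), num_points, num_procs)
            for i in range(num_points)
        )
        if remote:
            stages.append(stage + 1)
    return stages


consecutive = stages_needing_communication(16, 4, owner_consecutive)  # [1, 2]
cyclic = stages_needing_communication(16, 4, owner_cyclic)  # [3, 4]
```

With 16 points on 4 processors (n = 2), consecutive allocation communicates in the first two stages and cyclic allocation in the last two, matching the "Normal" row of the table.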
19 COSC Parallel radix-2 DIT FFT (figure: four processors, with stages marked as communication or local). COSC Parallel radix-2 FFT + inverse FFT, DIT then DIF (figure): each processor's twiddle factor subset is the same in the forward and inverse FFT.
20 COSC Parallel radix-2 FFT + inverse FFT, DIF then DIT: use inverse twiddles for the inverse FFT. No bit-reversal necessary! COSC Permutation-based parallel FFT, block allocation (figure: initial allocation per processor). The first radix-2 stage cannot be performed locally! The data for the first stage is half the processor address range apart.
21 COSC Permutation-based parallel FFT, block allocation (figure: initial allocation and allocation after the exchange). After the exchange, the first radix-2 stage can be performed concurrently without communication! No further stages can be performed locally. COSC Permutation-based parallel FFT, block allocation (figure: initial allocation and allocation after the 1st and 2nd exchanges). After the 2nd exchange, the 2nd radix-2 stage can be performed concurrently without communication! No further stages can be performed locally.
22 COSC Permutation-based parallel FFT, block allocation (figure: initial allocation and allocation after the 1st through 3rd exchanges). After the 3rd exchange, the 3rd radix-2 stage can be performed concurrently without communication! The last stage cannot be performed locally. COSC Permutation-based parallel FFT, block allocation (figure: initial allocation and allocation after the 1st through 4th exchanges). After the 4th exchange, the last (4th) radix-2 stage can be performed concurrently without communication! Note that the last exchange stage is the same as the first!
23 COSC Permutation-based parallel FFT, block allocation (figure: initial allocation and allocation after the 1st through 4th exchanges). Four exchanges, even though only three bits are used for processor addresses! All butterflies local! Data permuted: unshuffle on the nonlocal index (left cyclic shift)! (All indices shown are for the source; the output index is the bit-reverse of the indices above.) COSC The unshuffle: how did it happen? (Figure: index bit pattern for the initial allocation and after each of the four exchange steps.)
24 COSC Permutation-based parallel FFT, block allocation (figure: initial allocation and allocation after the 1st through 4th exchanges). All butterflies local! Four exchanges! Data permuted: unshuffle on the nonlocal index (left cyclic shift)! (All indices shown are for the source; the output index is the bit-reverse of the indices above.) COSC Permutation-based FFT (figure: initial allocation and allocation after the 1st through 4th exchanges): exchanges based on blocks defined by the local msb, with index bits shown as (paddr maddr).
25 COSC Permutation-based FFT (figure: initial allocation and allocation after the 1st through 4th exchanges): exchanges based on blocks defined by the local lsb, with index bits shown as (paddr maddr). COSC Minimizing the number of permutations (figure: alternative sequences of (paddr maddr) bit assignments): three permutations instead of four, possible in many ways.
26 COSC Permutation-based FFT (figure: 16 processors, initial allocation and allocation after the 1st through 3rd exchanges, with the exchange sequence). COSC Permutation-based parallel FFT, cyclic data allocation (figure: initial allocation and allocation after the 1st through 3rd exchanges). All butterflies local! Three exchanges! Data permuted: consecutive order after the exchange sequence!
27 COSC Permutation-based parallel FFT, cyclic data allocation (figure: initial allocation and allocation after the 1st through 3rd exchanges, with the exchange sequence shown as (maddr paddr) bit assignments). Consecutive order after the exchange sequence! COSC Permutation-based FFT. FFT computations are carried out from msb to lsb in the data index (always). To make all computations local, the permutation depends on the data allocation (and on whether the FFT is self-sorting). Exchanges affect memory access strides both in carrying out the permutations and in carrying out the butterfly computations.
28 COSC Parallel unordered FFT: communication requirements (figure). COSC Permutation-based parallel FFT, cyclic data allocation, DIF without permutations, twiddles (figure: input index per processor and twiddle exponents per processor for the 1st, 2nd, and 3rd stages).
29 COSC Permutation-based parallel FFT, cyclic data allocation, DIF without permutations (figure: twiddle exponents per processor for the 4th and 5th stages). COSC Twiddle factor allocation (figure).
30 COSC Permutation-based FFT, DIT permutation-based FFT twiddles (figure: 16 processors, initial allocation and twiddle exponents for the 1st through 4th stages). COSC Permutation-based FFT, DIT permutation-based FFT twiddles (figure: 16 processors, twiddle exponents for the 5th and 6th stages).
31 COSC Four-step FFT. Constructing the four-step FFT: factorize N into two equal (palindrome) factors, N = √N × √N. 1. Compute the first rank: √N FFTs of size √N. 2. Multiply with twiddle factors. 3. Transpose the √N × √N matrix. 4. Compute the last rank: √N FFTs of size √N. (Figure: rows of FFT blocks.) COSC Square Transpose (c)
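The four steps can be sketched for a general factorization N = n1 × n2, of which the slide's equal-factor case is n1 = n2 = √N. The code below assumes ω_N = e^{-2πi/N}, a row-major input layout, and a direct O(n²) `dft` helper standing in for the small FFTs; all names are hypothetical.

```python
import cmath


def dft(x):
    """Direct DFT, used here as a stand-in for the small FFTs of each rank."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * m * j / n) for j in range(n))
        for m in range(n)
    ]


def four_step_fft(x, n1, n2):
    """Four-step FFT of length n1*n2, with input index j = j1*n2 + j2."""
    n = n1 * n2
    assert len(x) == n
    # Step 1: first rank, n2 transforms of size n1 (one per column j2).
    cols = [dft([x[j1 * n2 + j2] for j1 in range(n1)]) for j2 in range(n2)]
    # Step 2: multiply element (j2, m1) by the twiddle omega_N ** (j2 * m1).
    for j2 in range(n2):
        for m1 in range(n1):
            cols[j2][m1] *= cmath.exp(-2j * cmath.pi * j2 * m1 / n)
    # Step 3: transpose so that each m1 owns a contiguous row over j2.
    rows = [[cols[j2][m1] for j2 in range(n2)] for m1 in range(n1)]
    # Step 4: last rank, n1 transforms of size n2; output index m = m1 + n1*m2.
    out = [0j] * n
    for m1 in range(n1):
        y = dft(rows[m1])
        for m2 in range(n2):
            out[m1 + n1 * m2] = y[m2]
    return out


x = [complex(k % 5, k % 3) for k in range(16)]
result = four_step_fft(x, 4, 4)
reference = dft(x)
max_error = max(abs(a - b) for a, b in zip(result, reference))
```

The transpose is the only step that moves data between rows, which is why its memory behavior gets separate attention on the slides that follow.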
32 COSC Square Transpose: memory hierarchy of the Xeon Clovertown and Opteron systems (table: memory bandwidth in GB/s and latency in cycles; for each cache level, size per core or per dual core, line size in bytes, associativity, and latency in cycles; numeric values not preserved). COSC Parallel FFT on binary n-cube
33 COSC Pipelined FFT on n-cube: the first four steps of a pipelined, in-place FFT on a cube (figure: time steps by memory location and processor). The table entry is the network dimension, starting with the dimension corresponding to the msb of the paddr. COSC Pipelined bisection FFT on n-cube: the first four steps of a pipelined, in-place FFT on a cube (figure: time steps by memory location and processor).
34 COSC Pipelined, 1-D, in-place FFT on cube: performance of a pipelined, one-dimensional, in-place, radix FFT on a cube as a function of data allocation (mesh shape of the cube) (plot). COSC Pipelined, 2-D, in-place FFT on cube: performance of a pipelined, two-dimensional, in-place, radix FFT on a cube as a function of data allocation (mesh shape of the cube) (plot).
More informationFast Fourier Transforms. Collection Editor: C. Sidney Burrus
Fast Fourier Transforms ollection Editor:. Sidney Burrus Fast Fourier Transforms ollection Editor:. Sidney Burrus Authors:. Sidney Burrus Matteo Frigo Steven G. Johnson Markus Pueschel Ivan Selesnick
More informationarxiv: v1 [cs.na] 8 Feb 2016
Toom-Coo Multiplication: Some Theoretical and Practical Aspects arxiv:1602.02740v1 [cs.na] 8 Feb 2016 M.J. Kronenburg Abstract Toom-Coo multiprecision multiplication is a well-nown multiprecision multiplication
More informationToward High Performance Matrix Multiplication for Exact Computation
Toward High Performance Matrix Multiplication for Exact Computation Pascal Giorgi Joint work with Romain Lebreton (U. Waterloo) Funded by the French ANR project HPAC Séminaire CASYS - LJK, April 2014 Motivations
More informationSection Summary. Sequences. Recurrence Relations. Summations Special Integer Sequences (optional)
Section 2.4 Section Summary Sequences. o Examples: Geometric Progression, Arithmetic Progression Recurrence Relations o Example: Fibonacci Sequence Summations Special Integer Sequences (optional) Sequences
More informationSequences. 1. Number sequences. 2. Arithmetic sequences. Consider the illustrated pattern of circles:
Sequences 1. Number sequences Consider the illustrated pattern of circles: The first layer has just one blue ball. The second layer has three pink balls. The third layer has five black balls. The fourth
More informationComputing With Tensors: Potential Applications of Physics-Motivated Mathematics to Computer Science
Computing With Tensors: Potential Applications of Physics-Motivated Mathematics to Computer Science Martine Ceberio and Vladik Kreinovich Department of Computer Science University of Texas at El Paso El
More informationNumbering Systems. Computational Platforms. Scaling and Round-off Noise. Special Purpose. here that is dedicated architecture
Computational Platforms Numbering Systems Basic Building Blocks Scaling and Round-off Noise Computational Platforms Viktor Öwall viktor.owall@eit.lth.seowall@eit lth Standard Processors or Special Purpose
More informationOutline. MSRI-UP 2009 Coding Theory Seminar, Week 2. The definition. Link to polynomials
Outline MSRI-UP 2009 Coding Theory Seminar, Week 2 John B. Little Department of Mathematics and Computer Science College of the Holy Cross Cyclic Codes Polynomial Algebra More on cyclic codes Finite fields
More informationMatrix Multiplication
Matrix Multiplication Matrix Multiplication Matrix multiplication. Given two n-by-n matrices A and B, compute C = AB. n c ij = a ik b kj k=1 c 11 c 12 c 1n c 21 c 22 c 2n c n1 c n2 c nn = a 11 a 12 a 1n
More informationIntroduction to Fourier Analysis Part 2. CS 510 Lecture #7 January 31, 2018
Introduction to Fourier Analysis Part 2 CS 510 Lecture #7 January 31, 2018 OpenCV on CS Dept. Machines 2/4/18 CSU CS 510, Ross Beveridge & Bruce Draper 2 In the extreme, a square wave Graphic from http://www.mechatronics.colostate.edu/figures/4-4.jpg
More informationThe tangent FFT. D. J. Bernstein University of Illinois at Chicago
The tangent FFT D. J. Bernstein University of Illinois at Chicago Advertisement SPEED: Software Performance Enhancement for Encryption and Decryption A workshop on software speeds for secret-key cryptography
More informationFast Fourier Transform
Fast Fourier Transform December 8, 2016 FFT JPEG RGB Y C B C R (luma (brightness), chroma 2 (color)) chroma resolution is reduced image is split in blocks 8 8 pixels JPEG RGB Y C B C R (luma (brightness),
More informationInteger factorization, part 1: the Q sieve. part 2: detecting smoothness. D. J. Bernstein
Integer factorization, part 1: the Q sieve Integer factorization, part 2: detecting smoothness D. J. Bernstein The Q sieve factors by combining enough -smooth congruences ( + ). Enough log. Plausible conjecture:
More informationFaster arithmetic for number-theoretic transforms
University of New South Wales 7th October 2011, Macquarie University Plan for talk 1. Review number-theoretic transform (NTT) 2. Discuss typical butterfly algorithm 3. Improvements to butterfly algorithm
More informationArithmetic. Integers: Any positive or negative whole number including zero
Arithmetic Integers: Any positive or negative whole number including zero Rules of integer calculations: Adding Same signs add and keep sign Different signs subtract absolute values and keep the sign of
More informationAPPENDIX A. The Fourier integral theorem
APPENDIX A The Fourier integral theorem In equation (1.7) of Section 1.3 we gave a description of a signal defined on an infinite range in the form of a double integral, with no explanation as to how that
More informationModule 1: Analyzing the Efficiency of Algorithms
Module 1: Analyzing the Efficiency of Algorithms Dr. Natarajan Meghanathan Professor of Computer Science Jackson State University Jackson, MS 39217 E-mail: natarajan.meghanathan@jsums.edu What is an Algorithm?
More informationMARN 5898 Fourier Analysis.
MAR 5898 Fourier Analysis. Dmitriy Leykekhman Spring 2010 Goals Fourier Series. Discrete Fourier Transforms. D. Leykekhman - MAR 5898 Parameter estimation in marine sciences Linear Least Squares 1 Complex
More informationExascale Computing for Radio Astronomy: GPU or FPGA?
Exascale Computing for Radio Astronomy: GPU or FPGA? Kees van Berkel MPSoC 2016, Nara, Japan 2016, July 14 Radio Astronomy: Herculus A (a.k.a. 3C 348) optically invisible jets, one-and-a-half million light-years
More informationCSCE 564, Fall 2001 Notes 6 Page 1 13 Random Numbers The great metaphysical truth in the generation of random numbers is this: If you want a function
CSCE 564, Fall 2001 Notes 6 Page 1 13 Random Numbers The great metaphysical truth in the generation of random numbers is this: If you want a function that is reasonably random in behavior, then take any
More informationMicrocontrollers. Fast - Fourier - Transformation. þ additional file AP EXE available
Microcontrollers Apote AP634 þ additional file AP634.EXE available ast - ourier - Transformation The ast ourier Transformation (T is an algorithm frequently used in various applications, like telecommunication,
More information1 Determinants. 1.1 Determinant
1 Determinants [SB], Chapter 9, p.188-196. [SB], Chapter 26, p.719-739. Bellow w ll study the central question: which additional conditions must satisfy a quadratic matrix A to be invertible, that is to
More information