CS 179: GPU Programming. Lecture 9 / Homework 3

Recap
Some algorithms are "less obviously parallelizable":
- Reduction
- Sorts
- FFT (and certain recursive algorithms)

Parallel FFT structure (radix-2)
[Figure: radix-2 FFT butterfly network with bit-reversed input access, followed by Stage 1, Stage 2, and Stage 3. Source: http://staff.ustc.edu.cn/~csli/graduate/algorithms/book6/chap32.htm]

cuFFT 1D example
Correction: remember to call cufftDestroy(plan) when finished with transforms.

Today
- Homework 3
- Large-kernel convolution

Systems
Given input signal(s), produce output signal(s).

LTI system review (Week 1)
- Linear time-invariant (LTI) systems: lots of them!
- Can be characterized entirely by the "impulse response" h[n]
- Output is given from the input by convolution:
$$y[n] = \sum_{k=-\infty}^{\infty} x[k]\, h[n-k]$$

Parallelization
$$y[n] = \sum_{k=-\infty}^{\infty} x[k]\, h[n-k]$$
Convolution is parallelizable!
Sequential pseudocode (ignoring boundary conditions):
    (set all y[i] to 0)
    for (i from 0 through x.length - 1)
        for (j from 0 through h.length - 1)
            y[i] += (appropriate terms from x and h)
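
A minimal host-side C++ sketch of the sequential pseudocode above, with the boundary conditions filled in. The function name `convolve_direct` and the choice to return the full output of length x.length + h.length - 1 are assumptions made for illustration, not part of the homework code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Direct ("small-kernel") linear convolution: y[n] = sum_k x[k] * h[n - k].
// x and h are treated as zero outside their stored ranges, so the full
// output has x.size() + h.size() - 1 samples.
std::vector<float> convolve_direct(const std::vector<float>& x,
                                   const std::vector<float>& h) {
    assert(!x.empty() && !h.empty());
    std::vector<float> y(x.size() + h.size() - 1, 0.0f);
    // Each y[i] depends only on x and h, never on another y[j], so on a GPU
    // the outer loop could become one thread per output sample.
    for (std::size_t i = 0; i < y.size(); ++i) {
        for (std::size_t j = 0; j < h.size(); ++j) {
            // Skip terms where x[i - j] would fall outside x.
            if (i >= j && i - j < x.size()) {
                y[i] += x[i - j] * h[j];
            }
        }
    }
    return y;
}
```

The nested loops make the O(n*m) cost visible: every output sample touches every kernel tap.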

A problem
- This worked for small impulse responses, e.g. h[n], 0 ≤ n ≤ 20 in HW 1
- Homework 1 was "small-kernel convolution"
- (Vocab alert: impulse responses are often called "kernels"!)

A problem
$$y[n] = \sum_{k=-\infty}^{\infty} x[k]\, h[n-k]$$
- Sequential runtime: O(n*m) (n: size of x, m: size of h)
- Troublesome for large m (i.e. large impulse responses)!
    (set all y[i] to 0)
    for (i from 0 through x.length - 1)
        for (j from 0 through h.length - 1)
            y[i] += (appropriate terms from x and h)

DFT/FFT
- Same problem with the Discrete Fourier Transform!
- Successfully optimized and GPU-accelerated: $O(n^2)$ to $O(n \log n)$
- Can we do the same here?

Circular convolution
The "equivalent" of convolution for periodic signals.

Circular convolution
Linear convolution:
$$y[n] = \sum_{k=-\infty}^{\infty} x[k]\, h[n-k]$$
Circular convolution:
$$y[n] = \sum_{k=0}^{N-1} x[k]\, h[(n-k) \bmod N]$$
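
The circular formula can be sketched directly in host-side C++. The name `convolve_circular` is hypothetical, and the sketch assumes both inputs are already stored with the same length N.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Direct circular convolution: y[n] = sum_{k=0}^{N-1} x[k] * h[(n - k) mod N].
// Both inputs must already have the same length N.
std::vector<float> convolve_circular(const std::vector<float>& x,
                                     const std::vector<float>& h) {
    assert(!x.empty() && x.size() == h.size());
    const std::size_t N = x.size();
    std::vector<float> y(N, 0.0f);
    for (std::size_t n = 0; n < N; ++n) {
        for (std::size_t k = 0; k < N; ++k) {
            // Adding N before subtracting keeps the unsigned index
            // non-negative; the result equals (n - k) mod N.
            y[n] += x[k] * h[(n + N - k) % N];
        }
    }
    return y;
}
```

For x = {1, 2, 3, 4} and h = {1, 1, 0, 0} this gives y[0] = 5 rather than the linear value 1, because the term x[3]h[1] wraps around.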

Example: x[0..3], h[0..1]
Linear convolution, $y[n] = \sum_{k=-\infty}^{\infty} x[k]\, h[n-k]$:
    y[0] = x[0]h[0]
    y[1] = x[0]h[1] + x[1]h[0]
    y[2] = x[1]h[1] + x[2]h[0]
    y[3] = x[2]h[1] + x[3]h[0]
    y[4] = x[3]h[1] + x[4]h[0]    (x[4] = 0)
Circular convolution, $y[n] = \sum_{k=0}^{N-1} x[k]\, h[(n-k) \bmod N]$, with N = 4 and h stored as h0 h1 0 0:
    y[0] = x[0]h[0] + x[1]h[3] + x[2]h[2] + x[3]h[1]
    y[1] = x[0]h[1] + x[1]h[0] + x[2]h[3] + x[3]h[2]
    y[2] = x[1]h[1] + x[2]h[0] + x[3]h[3] + x[0]h[2]
    y[3] = x[2]h[1] + x[3]h[0] + x[0]h[3] + x[1]h[2]
Since h[2] = h[3] = 0, the circular y[0] reduces to x[0]h[0] + x[3]h[1]: the term x[3]h[1] wraps around, so it differs from the linear y[0].
[Figure: reversed h sliding along zero-padded x (linear case); h wrapping around the ends of x (circular case)]

Circular Convolution Theorem*
The circular convolution
$$y[n] = \sum_{k=0}^{N-1} x[k]\, h[(n-k) \bmod N]$$
can be calculated by IFFT( FFT(x) .* FFT(h) ), i.e.:
    X = FFT(x)                         O(n log n)
    H = FFT(h)                         O(m log m)
    For all i: Y[i] = X[i] H[i]        O(n)
    y = IFFT(Y)                        O(n log n)
Total: O(n log n), assuming n > m
* DFT case
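
As a small check of the theorem, the sketch below computes IDFT( DFT(x) .* DFT(h) ) in host-side C++. It deliberately uses a naive O(N^2) DFT in place of an FFT to stay short, so it illustrates the identity, not the speedup. The names `dft` and `circular_via_dft` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Naive O(N^2) DFT. sign = -1 gives the forward transform, sign = +1 the
// unnormalized inverse. A real implementation would call an FFT here.
std::vector<cplx> dft(const std::vector<cplx>& a, int sign) {
    const std::size_t N = a.size();
    const double pi = std::acos(-1.0);
    std::vector<cplx> out(N);
    for (std::size_t k = 0; k < N; ++k) {
        cplx sum(0.0, 0.0);
        for (std::size_t n = 0; n < N; ++n) {
            // Reduce k*n mod N first so the angle stays small and accurate.
            const double angle = sign * 2.0 * pi *
                static_cast<double>((k * n) % N) / static_cast<double>(N);
            sum += a[n] * std::polar(1.0, angle);
        }
        out[k] = sum;
    }
    return out;
}

// Circular convolution via the theorem: y = IDFT( DFT(x) .* DFT(h) ).
std::vector<double> circular_via_dft(const std::vector<double>& x,
                                     const std::vector<double>& h) {
    assert(!x.empty() && x.size() == h.size());
    const std::size_t N = x.size();
    std::vector<cplx> X = dft(std::vector<cplx>(x.begin(), x.end()), -1);
    std::vector<cplx> H = dft(std::vector<cplx>(h.begin(), h.end()), -1);
    std::vector<cplx> Y(N);
    for (std::size_t i = 0; i < N; ++i) {
        Y[i] = X[i] * H[i];  // pointwise product in the frequency domain
    }
    std::vector<cplx> y = dft(Y, +1);
    std::vector<double> out(N);
    for (std::size_t n = 0; n < N; ++n) {
        out[n] = y[n].real() / static_cast<double>(N);  // 1/N normalization
    }
    return out;
}
```

Up to rounding error, the result matches the direct circular sum for the same inputs.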

What if x[n] and h[n] have different lengths? How can we compute a linear convolution using circular convolution?

Padding
- x[n] and h[n] are presumed zero where not defined
- Computationally: store x[n] and h[n] as larger arrays
- Pad both to at least x.length + h.length - 1

Example: (Padding) x[0..3], h[0..1]
N is now (4 + 2 - 1) = 5
Linear convolution, $y[n] = \sum_{k=-\infty}^{\infty} x[k]\, h[n-k]$:
    y[0] = x[0]h[0]
    y[1] = x[0]h[1] + x[1]h[0]
    y[2] = x[1]h[1] + x[2]h[0]
    y[3] = x[2]h[1] + x[3]h[0]
    y[4] = x[3]h[1] + x[4]h[0]
Circular convolution, $y[n] = \sum_{k=0}^{N-1} x[k]\, h[(n-k) \bmod N]$, with N = 5:
    y[0] = x[0]h[0] + x[1]h[4] + x[2]h[3] + x[3]h[2] + x[4]h[1]
    y[1] = x[0]h[1] + x[1]h[0] + x[2]h[4] + x[3]h[3] + x[4]h[2]
    y[2] = x[1]h[1] + x[2]h[0] + x[3]h[4] + x[4]h[3] + x[0]h[2]
    y[3] = x[2]h[1] + x[3]h[0] + x[4]h[4] + x[0]h[3] + x[1]h[2]
    y[4] = x[3]h[1] + x[4]h[0] + x[0]h[4] + x[1]h[3] + x[2]h[2]
With x[4] = 0 and h[2] = h[3] = h[4] = 0, every wrapped-around term vanishes, so the circular result equals the linear one.
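
Putting padding and the theorem together, here is a hedged host-side C++ sketch of FFT-based linear convolution. It uses a small recursive radix-2 FFT as a stand-in for cuFFT, so it pads to the next power of two that is at least x.length + h.length - 1 (padding beyond the minimum is harmless). The names `fft` and `convolve_fft` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Recursive radix-2 FFT, in place. a.size() must be a power of two.
// invert = true computes the unnormalized inverse transform.
void fft(std::vector<cplx>& a, bool invert) {
    const std::size_t n = a.size();
    if (n <= 1) return;
    std::vector<cplx> even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) {
        even[i] = a[2 * i];
        odd[i] = a[2 * i + 1];
    }
    fft(even, invert);
    fft(odd, invert);
    const double pi = std::acos(-1.0);
    const double sign = invert ? 1.0 : -1.0;
    for (std::size_t k = 0; k < n / 2; ++k) {
        const double angle = sign * 2.0 * pi *
            static_cast<double>(k) / static_cast<double>(n);
        const cplx t = std::polar(1.0, angle) * odd[k];  // twiddle * odd part
        a[k] = even[k] + t;
        a[k + n / 2] = even[k] - t;
    }
}

// Linear convolution via zero-padding and the circular convolution theorem.
std::vector<double> convolve_fft(const std::vector<double>& x,
                                 const std::vector<double>& h) {
    assert(!x.empty() && !h.empty());
    const std::size_t out_len = x.size() + h.size() - 1;
    std::size_t N = 1;
    while (N < out_len) N *= 2;    // smallest power of two >= out_len
    std::vector<cplx> X(N), H(N);  // value-initialized, i.e. zero-padded
    for (std::size_t i = 0; i < x.size(); ++i) X[i] = x[i];
    for (std::size_t i = 0; i < h.size(); ++i) H[i] = h[i];
    fft(X, false);
    fft(H, false);
    for (std::size_t i = 0; i < N; ++i) X[i] *= H[i];  // pointwise product
    fft(X, true);
    std::vector<double> y(out_len);
    for (std::size_t i = 0; i < out_len; ++i) {
        y[i] = X[i].real() / static_cast<double>(N);   // 1/N normalization
    }
    return y;
}
```

Only the first x.length + h.length - 1 samples of the inverse transform are kept; the remaining padded samples are zero up to rounding error.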

Summary
- Alternate algorithm for large impulse response convolution!
- Serial: O(n log n) vs. O(mn)
- Small vs. large m determines algorithm choice
- Runtime does "carry over" to parallel situations (to some extent)

Homework 3, Part 1
- Implement FFT ("large-kernel") convolution
- Use cuFFT for FFT/IFFT (if brave, try your own)
- Use the "batch" variable to save FFT calculations
  Correction: good practice in general, but results in poor performance on Homework 3
- Don't forget padding
- Complex multiplication kernel: multiply the FFT values pointwise!

Complex numbers
- cufftComplex: the cuFFT complex number type
- Example usage:
    cufftComplex a;
    a.x = 3;    // Real part
    a.y = 4;    // Imaginary part
- Complex multiplication: (a + bi)(c + di) = (ac - bd) + (ad + bc)i
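
The multiplication rule above can be sketched on the host in C++. `Complex2f` is a hypothetical stand-in that only mirrors the x/y layout of cufftComplex so the sketch compiles without the CUDA toolkit, and `complex_mul` and `pointwise_mul` are illustrative names. In an actual kernel the loop body would be executed by one thread per index.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for cufftComplex: x is the real part, y the
// imaginary part.
struct Complex2f {
    float x;
    float y;
};

// (a + bi)(c + di) = (ac - bd) + (ad + bc)i
Complex2f complex_mul(Complex2f a, Complex2f b) {
    return Complex2f{a.x * b.x - a.y * b.y, a.x * b.y + a.y * b.x};
}

// Pointwise product Y[i] = X[i] * H[i] of two equally long spectra.
void pointwise_mul(const std::vector<Complex2f>& X,
                   const std::vector<Complex2f>& H,
                   std::vector<Complex2f>& Y) {
    assert(X.size() == H.size());
    Y.resize(X.size());
    for (std::size_t i = 0; i < X.size(); ++i) {
        Y[i] = complex_mul(X[i], H[i]);
    }
}
```

For example, (3 + 4i)(1 + 2i) = (3 - 8) + (6 + 4)i = -5 + 10i.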

Homework 3, Part 2
- For a normalized Gaussian filter, output values cannot be larger than the largest input
- Not true in general
[Figure: low-pass filter test signal and response]
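
One way to see why the bound holds, assuming the Gaussian kernel has non-negative taps $h[k] \ge 0$ that sum to 1 (taken here as the meaning of "normalized"):

$$|y[n]| = \Big|\sum_k x[k]\, h[n-k]\Big| \le \sum_k |x[k]|\, h[n-k] \le \max_j |x[j]| \sum_k h[n-k] = \max_j |x[j]|$$

The first inequality needs $h \ge 0$ and the final equality needs $\sum_k h[k] = 1$. A kernel with negative taps, or one whose taps sum to more than 1, can exceed the input peak, which is why the claim is not true in general.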

Normalization
- Amplitudes must lie in the range [-1, 1]
- Normalize such that the maximum magnitude is 1 (or 1 - ε)
- How to find the maximum amplitude?

Reduction
- This time, maximum (instead of sum)
- Lecture 7 strategies
- "Optimizing Parallel Reduction in CUDA" (Harris)
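
A host-side C++ emulation of a tree-style maximum reduction, offered as a rough sketch of the access pattern and not as the lecture's kernel. Each pass folds the upper half of the active range onto the lower half. On a GPU, each inner-loop iteration would correspond to one thread, with a synchronization between passes. The name `max_abs_tree` is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Tree reduction computing max |v[i]|. Takes v by value so the caller's
// data is left untouched while the working copy is overwritten.
float max_abs_tree(std::vector<float> v) {
    assert(!v.empty());
    for (float& s : v) s = std::fabs(s);  // compare magnitudes, not signed values
    std::size_t active = v.size();
    while (active > 1) {
        const std::size_t half = (active + 1) / 2;  // round up for odd counts
        // Element i absorbs element i + half; the iterations of this loop
        // are independent of each other within one pass.
        for (std::size_t i = 0; i + half < active; ++i) {
            v[i] = std::fmax(v[i], v[i + half]);
        }
        active = half;
    }
    return v[0];
}
```

The number of passes grows as log2 of the input length, which is what makes the pattern attractive for parallel hardware.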

Homework 3, Part 2
- Implement GPU-accelerated normalization
- Find the maximum (reduction)
- The maximum amplitude may come from a negative sample, so compare absolute values
- Divide by the maximum to normalize
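
The two steps can be sketched sequentially in host-side C++. The function name `normalize` and the decision to leave an all-zero signal unchanged are assumptions made for illustration. In the homework the maximum would come from a GPU reduction and the division from a separate kernel.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Scale samples in place so that the largest magnitude becomes 1.
void normalize(std::vector<float>& samples) {
    assert(!samples.empty());
    float max_abs = 0.0f;
    for (float s : samples) {
        // Use |s| so a large negative sample is counted as the peak.
        max_abs = std::max(max_abs, std::fabs(s));
    }
    if (max_abs == 0.0f) return;  // all-zero signal: nothing to scale
    for (float& s : samples) {
        s /= max_abs;
    }
}
```

For the input {0.5, -2, 1} the peak magnitude is 2, so the result is {0.25, -1, 0.5}.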

(Demonstration) Rooms can be modeled as LTI systems!

Other notes
- Machines:
  Normal mode: haru, mako, mx, minuteman
  Audio mode: haru, mako
- Due date: Wednesday (4/20), 3 PM