CORDIC, Divider, Square Root

Similar documents
A HIGH-SPEED PROCESSOR FOR RECTANGULAR-TO-POLAR CONVERSION WITH APPLICATIONS IN DIGITAL COMMUNICATIONS *

Part VI Function Evaluation

On-Line Hardware Implementation for Complex Exponential and Logarithm

Lecture 11. Advanced Dividers

Fixed-Point Trigonometric Functions on FPGAs

9. Datapath Design. Jacob Abraham. Department of Electrical and Computer Engineering The University of Texas at Austin VLSI Design Fall 2017

Using the CORDIC Algorithm for Fast Trigonometric Operations in Hardware

Analysis and Synthesis of Weighted-Sum Functions

Cost/Performance Tradeoff of n-select Square Root Implementations

THE discrete sine transform (DST) and the discrete cosine

EECS150 - Digital Design Lecture 25 Shifters and Counters. Recap

CPS 104 Computer Organization and Programming Lecture 11: Gates, Buses, Latches. Robert Wagner

Svoboda-Tung Division With No Compensation

FPGA accelerated multipliers over binary composite fields constructed via low hamming weight irreducible polynomials

FPGA Implementation of a Predictive Controller

On LUT Cascade Realizations of FIR Filters

Design and FPGA Implementation of Radix-10 Algorithm for Division with Limited Precision Primitives

General CORDIC Description (1A)

DSP Design Lecture 7. Unfolding cont. & Folding. Dr. Fredrik Edman.

Low-Complexity Multiplierless Constant Rotators Based on Combined Coefficient Selection and Shift-and-Add Implementation (CCSSI)

como trabajan las calculadoras?

Logic and Computer Design Fundamentals. Chapter 8 Sequencing and Control

Transformation Techniques for Real Time High Speed Implementation of Nonlinear Algorithms

Complex Logarithmic Number System Arithmetic Using High-Radix Redundant CORDIC Algorithms

A 32-bit Decimal Floating-Point Logarithmic Converter

Implementation Of Digital Fir Filter Using Improved Table Look Up Scheme For Residue Number System

EECS150 - Digital Design Lecture 11 - Shifters & Counters. Register Summary

An Area Efficient Enhanced Carry Select Adder

Radix-4 Vectoring CORDIC Algorithm and Architectures. July 1998 Technical Report No: UMA-DAC-98/20

Microtrend Systems Inc. Fixed Point Two s Complement CORDIC Arithmetic on MSP430

Chapter 5: Solutions to Exercises

Circuit Analysis-III. Circuit Analysis-II Lecture # 3 Friday 06 th April, 18

Test Pattern Generator for Built-in Self-Test using Spectral Methods

IMPLEMENTATION OF SIGNAL POWER ESTIMATION METHODS

Design at the Register Transfer Level

VLSI Arithmetic. Lecture 9: Carry-Save and Multi-Operand Addition. Prof. Vojin G. Oklobdzija University of California

Binary Multipliers. Reading: Study Chapter 3. The key trick of multiplication is memorizing a digit-to-digit table Everything else was just adding

CMPEN 411 VLSI Digital Circuits Spring Lecture 21: Shifters, Decoders, Muxes

Design and Comparison of Wallace Multiplier Based on Symmetric Stacking and High speed counters

AN IMPROVED LOW LATENCY SYSTOLIC STRUCTURED GALOIS FIELD MULTIPLIER

EE216B: VLSI Signal Processing. FFT Processors. Prof. Dejan Marković FFT: Background

Logic Design II (17.342) Spring Lecture Outline

Design and Study of Enhanced Parallel FIR Filter Using Various Adders for 16 Bit Length

Determining Appropriate Precisions for Signals in Fixed-Point IIR Filters

A Digit-Serial Systolic Multiplier for Finite Fields GF(2 m )

SOLVED PROBLEMS ON TAYLOR AND MACLAURIN SERIES

FIXED WIDTH BOOTH MULTIPLIER BASED ON PEB CIRCUIT

EECS150 - Digital Design Lecture 24 - Arithmetic Blocks, Part 2 + Shifters

Digital Logic: Boolean Algebra and Gates. Textbook Chapter 3

DSP Configurations. responded with: thus the system function for this filter would be

How Do Calculators Calculate? Helmut Knaust Department of Mathematical Sciences University of Texas at El Paso

Digital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.

Example: vending machine

GF(2 m ) arithmetic: summary

Mark Redekopp, All rights reserved. Lecture 1 Slides. Intro Number Systems Logic Functions

Combinational Logic Design Combinational Functions and Circuits

Advanced Hardware Architecture for Soft Decoding Reed-Solomon Codes

doi: /TCAD

Reduce-by-Feedback: Timing resistant and DPA-aware Modular Multiplication plus: How to Break RSA by DPA

Multi-Level Logic Optimization. Technology Independent. Thanks to R. Rudell, S. Malik, R. Rutenbar. University of California, Berkeley, CA

Physics 6303 Lecture 22 November 7, There are numerous methods of calculating these residues, and I list them below. lim

Chapter 1: Solutions to Exercises

A High-Speed Realization of Chinese Remainder Theorem

King Fahd University of Petroleum and Minerals College of Computer Science and Engineering Computer Engineering Department

PLA Minimization for Low Power VLSI Designs

LOGIC CIRCUITS. Basic Experiment and Design of Electronics

ECEN 248: INTRODUCTION TO DIGITAL SYSTEMS DESIGN. Week 9 Dr. Srinivas Shakkottai Dept. of Electrical and Computer Engineering

ELCT201: DIGITAL LOGIC DESIGN

LUTMIN: FPGA Logic Synthesis with MUX-Based and Cascade Realizations

EECS 579: Logic and Fault Simulation. Simulation

A New Bit-Serial Architecture for Field Multiplication Using Polynomial Bases

CprE 281: Digital Logic

A COMBINED 16-BIT BINARY AND DUAL GALOIS FIELD MULTIPLIER. Jesus Garcia and Michael J. Schulte

6. Finite State Machines

Implementation of Particle Filter-based Target Tracking

A VLSI Algorithm for Modular Multiplication/Division

FILTERING IN THE FREQUENCY DOMAIN

Analytic Trigonometry. Copyright Cengage Learning. All rights reserved.

Quasi Analog Formal Neuron and Its Learning Algorithm Hardware

A Novel LUT Using Quaternary Logic

Solutions to Laplace s Equations

ALU (3) - Division Algorithms

EECS150 - Digital Design Lecture 16 Counters. Announcements

Numbering Systems. Computational Platforms. Scaling and Round-off Noise. Special Purpose. here that is dedicated architecture

Design Method for Numerical Function Generators Based on Polynomial Approximation for FPGA Implementation

AREA EFFICIENT MODULAR ADDER/SUBTRACTOR FOR RESIDUE MODULI

For smaller NRE cost For faster time to market For smaller high-volume manufacturing cost For higher performance

Introduction to Quantum Computing

VLSI Signal Processing

Sequential Logic Worksheet

Lecture 17: Designing Sequential Systems Using Flip Flops

Memory Elements I. CS31 Pascal Van Hentenryck. CS031 Lecture 6 Page 1

Multivariate Gaussian Random Number Generator Targeting Specific Resource Utilization in an FPGA

Parallelization of the QC-lib Quantum Computer Simulator Library

Chapter 8. Low-Power VLSI Design Methodology

Review: Designing with FSM. EECS Components and Design Techniques for Digital Systems. Lec09 Counters Outline.

EECS150 - Digital Design Lecture 10 - Combinational Logic Circuits Part 1

A Low-Error Statistical Fixed-Width Multiplier and Its Applications

Hardware Operator for Simultaneous Sine and Cosine Evaluation

Ch 7. Finite State Machines. VII - Finite State Machines Contemporary Logic Design 1

Transcription:

4// EE6B: VLSI Signal Processing CORDIC, Divider, Square Root Prof. Dejan Marković ee6b@gmail.com Iterative algorithms CORDIC Division Square root Lecture Overview Topics covered include Algorithms and their implementation Convergence analysis Speed of convergence The choice of initial condition 7.

4// CORDIC To perform the following transformation y(t) = y R + j y I y e jφ and the inverse, we use the CORDIC algorithm CORDIC - COordinate Rotation DIgital Computer 7.3 CORDIC: Idea Use rotations to implement a variety of functions Examples: x j y x y e ( / ) j tan y x z x y z cos( y / x) z tan( y / x) z x / y z sin( y / x) z sinh( y / x) z tan ( y / x) z cos () y 7.4

4// CORDIC (Cont.) How to do it? Start with general rotation by φ x = x cos(φ) y sin(φ) y = y cos(φ) + x sin(φ) x = cos(φ) [x y tan(φ)] y = cos(φ) [y + x tan(φ)] The trick is to only do rotations by values of tan(φ) which are powers of 7.5 CORDIC (Cont.) Rotation Number φ tan(φ) k i 45 o 6.565 o 4.36 o 3 7.5 o 3 4 3 3.576 o 4 5 4.79 o 5 6 5.895 o 6 7 6 To rotate to any arbitrary angle, we do a sequence of rotations to get to that value 7.6 3

4// Basic CORDIC Iteration x i+ = (K i ) [x i y i d i i ] y i+ = (K i ) [y i + x i d i i ] K i = cos(tan ( i )) = /( + i ).5 d i = ± The d i is chosen to rotate by φ If we don t multiply (x i+, y i+ ) by K i we get a gain error which is independent of the direction of the rotation The error converges to.6 - May not need to compensate for it We also can accumulate the rotation angle: z i+ = z i d i tan ( i ) 7.7 Example Initial vector is described by x and y coordinates y φ Initial vector x We want to find φ and (x + y ).5 7.8 4

4// Step : Check the Angle / Sign of y If positive, rotate by 45 o If negative, rotate by +45 o y y 45 o d = (y > ) x = x + y / y = y x / x x 7.9 If positive, rotate by 6.57 o If negative, rotate by +6.57 o Step : Check the Sign of y d = (y > ) y 6 o x = x + y /4 y = y x /4 x y x 7. 5

4// Repeat Step for Each Rotation k Until y n = y n = x n = A n (x + y ).5 y n x n accumulated gain 7. Gain accumulation: G = G G =.44 G G G =.58 G G G G 3 =.63 G G G G 3 G 4 =.64 The Gain Factor So, start with x, y ; end up with: Shift & adds of x z 3 = 7 o, y (x + y ).5 =.64 ( ) We did the rectangular-to-polar coordinate conversion 7. 6

4// Rectangular-to-Polar Conversion: Summary Start with vector on x-axis A = A e jφ x = A y =, z = φ z i <, d i = z i >, d i = + y x z i+ = z i d i tan ( i ) 7.3 CORDIC Algorithm d i =, z i < +, z i > Rotation mode (rotate by specified angle) Minimize residual angle Result x n = A n [x cos(z ) y sin(z )] y n = A n [y cos(z ) + x sin(z )] z n = x i+ = x i y i d i i y i+ = y i + x i d i i z i+ = z i d i tan ( i ), y i > d i = +, y i < n A n = (+ i ).5.647 Vectoring mode (align with the x-axis) Minimize y component Result x n = A n (x + y ).5 y n = z n = z + tan (y /x ) 7.4 7

4// Vectoring Example 45 o It: 4.4 o It: 3.58 o It: 4 It: 5 7.3 o It: 3 Acc. Gain K = K =.44 K =.58 K 3 =.63 K 4 =.64 K 5 =.646 Etc. 6.57 o Residual angle φ = 3 o φ = 5 o φ =.57 o φ =.47 o φ = 4.65 o φ =.8 o It: 7.5 Vectoring Example: Best-Case Convergence 45 o It: It: Acc. Gain Residual angle K = φ = 45 o K =.44 φ = o K =.58 φ = o K 3 =.63 φ = o K 4 =.64 φ = o K 5 =.646 φ = o Etc. In the best case (φ = 45 o ), we can converge in one iteration 7.6 8

4// Calculating Sine and Cosine To calculate sin and cos: Start with x = /.64, y = Rotate by φ sin(φ) y n = sin(φ) x n = cos(φ) φ cos(φ) (/.64) 7.7 Functions Rotation mode sin/cos z = angle y =, x = /A n x n = A n x cos(z ) y n = A n x sin(z ) (=) Vectoring mode tan z = z n = z + tan (y /x ) Vector/Magnitude x n = A n (x + y ).5 Polar Rectangular x n = r cos(φ) y n = r sin(φ) x = r z = φ y = Rectangular Polar r = (x + y ).5 φ = tan (y /x ) 7.8 9

4// CORDIC Divider To do a divide, change CORDIC rotations to a linear function calculator x i+ = x i y i d i i = x i y i+ = y i + x i d i i z i+ = z i d i ( i ) 7.9 Generalized CORDIC x i+ = x i m y i d i i y i+ = y i + x i d i i z i+ = z i d i e i Rotation d i =, z i < d i = +, z i > d i Vectoring d i =, y i > d i = +, y i < sign(z i ) sign(y i ) Mode m e i Circular + tan ( i ) Linear i Hyperbolic tanh ( i ) 7.

4// An FPGA Implementation [] x Reg Vectoring sgn(y i ) y Rotation sgn(z i ) z Reg Reg >>n >>n ROM ± ± ± m d i d i d i x n Cir/Lin/Hyp y n z n Three difference equations directly mapped to hardware The decision d i is driven by the sign of the y or z register Vectoring: d i = sign(y i ) Rotation: d i = sign(z i ) The initial values loaded via muxes On each clock cycle Register values are passed through shifters and add/sub and the values placed in registers The shifters are modified on each iteration to cause the desired shift (state machine) Elementary angle stored in ROM Last iteration: results read from reg [] R. Andraka, A survey of CORDIC algorithms for FPGA based computers, in Proc. Int. Symp. Field Programmable Gate Arrays, Feb. 998, pp. 9-. 7. Iterative Sqrt and Division [] ~5k gates Inputs: a (4 bits), reset (active high) Outputs: zs (6 bits), zd (6 bits) Total: 3 bits [] C. Ramamoorthy, J. Goodman, and K. Kim, "Some Properties of Iterative Square-Rooting Methods Using High-Speed Multiplication," IEEE Trans. Computers, vol. C-, no. 8, pp. 837 847, Aug. 97. 7.

4// Iterative /sqrt(z): Simulink XSG Model rst Z wordlength latency x s (k) x s (k + ) = x s (k) / (3 Z x s (k)) x s (k+) User defined parameters User defined Wordlength parameters: (#bits, binary pt) - data type - wordlength Quantization, (#bits, binary pt) overflow - quantization Latency, sample period - overflow The - latency choice of initial condition - sample period Determines # iterations and convergence 7.3 Quadratic Convergence: /sqrt(n) x k x () k s s( ) (3 N xs( k) ) x () s k N k ys() k ys( k ) (3 ys( k) ) ys( k) k e ( k) y ( k) s s e k e k e k s( ) s( ) (3 s( )) 7.4

4// Quadratic Convergence: /N x ( k ) x ( k) ( N x ( k)) d d d x () d k N k y ( k ) y ( k) ( y ( k)) y ( k) d d d d k e ( k) y ( k) d d e ( k ) e ( k) d d 7.5 Initial Condition: /sqrt(n) ys() k ys( k ) (3 ys( k) ) ys( k) k 4 y s (k+) 5 3 Convergence: Conv. stripes: () 3 y s 3 () 5 y s - -4-4 - 4 y s (k) Divergence: y () 5 s 7.6 3

4// Initial Condition: /N y ( k ) y ( k) ( y ( k)) y ( k) d d d d k 4 y d (k+) - Convergence: Divergence: () y d otherwise -4-4 - 4 y d (k) 7.7 /sqrt(n): Convergence Analysis xn x n 3 N xn xn N n x n y n N Error: e n y n y y n n 3 y n [3] 3 e e e 3 n n n [3] D. Marković, A Power/Area Optimal Approach to VLSI Signal Processing, Ph.D. Thesis, University of California, Berkeley, 6. 7.8 4

y s (k+) 4// /sqrt(n): Convergence Analysis (Cont.) y y n n 3 y y n n n 4 3 - - -3 5 3 y 3 3y 5 y 5 convergence conv. stripes divergence -4-4 - 4 y s (k) 7.9 Error 6 5 4 divergence e e n+ n+ 3 conv. stripes.5 convergence -.5 -.5 -.5.5 e e n n 7.3 5

Solution Solution 4// Choosing the Initial Condition.5.5.5.5.5.5 sqrt(5) δ sqrt(5) δ δ sqrt(3) + δ + δ sqrt(3) + δ + δ δ 5 5 5 Iteration 7.3 Initial Condition > sqrt(5) Results in Divergence.5.5.5.5.5.5 Iteration sqrt(5) + δ sqrt(5) δ sqrt(5) δ δ 4 6 8 4 6 8 7.3 6

4// /sqrt(): Picking Initial Condition x x n n 3 N xn xn N n Equilibriums, N Initial condition Take: V( xn) xn N Find x such that: V( xn ) V( x ) n ( n,,,...) Solution: S { x : V( x ) a} n Level set V(x ) = a is a convergence bound Local convergence (3 equilibriums) (6.) (6.) (6.3) 7.33 Descending Absolute Error.6.4 /sqrt 3.6.4 Div sqrt, V(x ). div, V(x ). -. -.5+4.5 -. 3 3 initial condition, x initial condition, x x Vs ( x) ( x ) ( x ) ( x x 4) Descending error: E( x ) ( ) k 4 V ( x ) x ( x ) ( x ) d V( x ) E( x ) E( x ) k xk k k k 7.34 7

4// /sqrt(n): Picking Initial Condition (Cont.) (6.) x 5 N 3 ( N x) x x N f3.5.5 -.5 Numbers below: N N=/64 N=/6 N=/4 N= N=4 N=6 /64 /6 /4 4 6 Roots: x N x,3 N 7 - -.5 - - initial condition x Max: x M 5 3N 7.35 Initial Condition Circuit (6.3) a x a N N N >6 >4 > >/4 >/6 x = /4 x = / x = x = x = 4 >/64 x = 8 7.36 8

4// Left: Internal Node, Right: Sampled Output Internal node and output Sampled output of the left plot sampled out internal Zoom in: convergence in 8 iterations 8 iterations Sampled output of the left plot sampled out Internal divide by extends range Sampled output of the left plot extended range internal sampled out 7.37 Convergence Speed # iterations required for specified accuracy Target relative error (%).% % 5% % e : 5%, # iter (sqrt/div) 5 / 4 5 / 3 4 / 3 3 / e : 5%, # iter (sqrt/div) 3 / 3 3 / / / Adaptive algorithm current result.ic for next iteration N k+ N k.ic Iterative algorithm (y k ) /sqrt(n) 7.38 9

4// Summary Iterative algorithms can be use for a variety of DSP functions CORDIC uses angular rotations to compute trigonometric and hyperbolic functions as well as divide and other operations One bit of resolution is resolved in each iteration Netwon-Raphson algorithms for square root and division have faster convergence than CORDIC Two bits of resolution are resolved in each iteration (the algorithm has quadratic error convergence) Convergence speed greatly depends on the initial condition The choice of initial condition can be made as to guarantee decreasing absolute error in each iteration For slowly varying inputs, adaptive algorithms can use the result of current iteration as the initial condition Hardware latency depends on the initial condition and accuracy 7.39 References R. Andraka, "A Survey of CORDIC Algorithms for FPGA based Computers," in Proc. Int. Symp. Field Programmable Gate Arrays, Feb. 998, pp. 9-. C. Ramamoorthy, J. Goodman, and K. Kim, "Some Properties of Iterative Square-Rooting Methods Using High-Speed Multiplication," IEEE Trans. Computers, vol. C-, no. 8, pp. 837 847, Aug. 97. D. Marković, A Power/Area Optimal Approach to VLSI Signal Processing, Ph.D. Thesis, University of California, Berkeley, 6. 7.4

4// Additional References S. Oberman and M. Flynn, "Division Algorithms and Implementations," IEEE Trans. Computers, vol. 47, no. 8, pp. 833 854, Aug. 997. T. Lang and E. Antelo, "CORDIC Vectoring with Arbitrary Target Value," IEEE Trans. Computers, vol. 47, no. 7, pp. 736 749, July 998. J. Valls, M. Kuhlmann, and K.K. Parhi, "Evaluation of CORDIC Algorithms for FPGA Design," J. VLSI Signal Processing 3 (Kluwer Academic Publishers), pp. 7, Nov.. 7.4