CORDIC, Divider, Square Root - PDF Free Download

4// EE6B: VLSI Signal Processing CORDIC, Divider, Square Root Prof. Dejan Marković ee6b@gmail.com Iterative algorithms CORDIC Division Square root Lecture Overview Topics covered include Algorithms and their implementation Convergence analysis Speed of convergence The choice of initial condition 7.

4// CORDIC To perform the following transformation y(t) = y R + j y I y e jφ and the inverse, we use the CORDIC algorithm CORDIC - COordinate Rotation DIgital Computer 7.3 CORDIC: Idea Use rotations to implement a variety of functions Examples: x j y x y e ( / ) j tan y x z x y z cos( y / x) z tan( y / x) z x / y z sin( y / x) z sinh( y / x) z tan ( y / x) z cos () y 7.4

4// CORDIC (Cont.) How to do it? Start with general rotation by φ x = x cos(φ) y sin(φ) y = y cos(φ) + x sin(φ) x = cos(φ) [x y tan(φ)] y = cos(φ) [y + x tan(φ)] The trick is to only do rotations by values of tan(φ) which are powers of 7.5 CORDIC (Cont.) Rotation Number φ tan(φ) k i 45 o 6.565 o 4.36 o 3 7.5 o 3 4 3 3.576 o 4 5 4.79 o 5 6 5.895 o 6 7 6 To rotate to any arbitrary angle, we do a sequence of rotations to get to that value 7.6 3

4// Basic CORDIC Iteration x i+ = (K i ) [x i y i d i i ] y i+ = (K i ) [y i + x i d i i ] K i = cos(tan ( i )) = /( + i ).5 d i = ± The d i is chosen to rotate by φ If we don t multiply (x i+, y i+ ) by K i we get a gain error which is independent of the direction of the rotation The error converges to.6 - May not need to compensate for it We also can accumulate the rotation angle: z i+ = z i d i tan ( i ) 7.7 Example Initial vector is described by x and y coordinates y φ Initial vector x We want to find φ and (x + y ).5 7.8 4

4// Step : Check the Angle / Sign of y If positive, rotate by 45 o If negative, rotate by +45 o y y 45 o d = (y > ) x = x + y / y = y x / x x 7.9 If positive, rotate by 6.57 o If negative, rotate by +6.57 o Step : Check the Sign of y d = (y > ) y 6 o x = x + y /4 y = y x /4 x y x 7. 5

4// Repeat Step for Each Rotation k Until y n = y n = x n = A n (x + y ).5 y n x n accumulated gain 7. Gain accumulation: G = G G =.44 G G G =.58 G G G G 3 =.63 G G G G 3 G 4 =.64 The Gain Factor So, start with x, y ; end up with: Shift & adds of x z 3 = 7 o, y (x + y ).5 =.64 ( ) We did the rectangular-to-polar coordinate conversion 7. 6

4// Rectangular-to-Polar Conversion: Summary Start with vector on x-axis A = A e jφ x = A y =, z = φ z i <, d i = z i >, d i = + y x z i+ = z i d i tan ( i ) 7.3 CORDIC Algorithm d i =, z i < +, z i > Rotation mode (rotate by specified angle) Minimize residual angle Result x n = A n [x cos(z ) y sin(z )] y n = A n [y cos(z ) + x sin(z )] z n = x i+ = x i y i d i i y i+ = y i + x i d i i z i+ = z i d i tan ( i ), y i > d i = +, y i < n A n = (+ i ).5.647 Vectoring mode (align with the x-axis) Minimize y component Result x n = A n (x + y ).5 y n = z n = z + tan (y /x ) 7.4 7

4// Vectoring Example 45 o It: 4.4 o It: 3.58 o It: 4 It: 5 7.3 o It: 3 Acc. Gain K = K =.44 K =.58 K 3 =.63 K 4 =.64 K 5 =.646 Etc. 6.57 o Residual angle φ = 3 o φ = 5 o φ =.57 o φ =.47 o φ = 4.65 o φ =.8 o It: 7.5 Vectoring Example: Best-Case Convergence 45 o It: It: Acc. Gain Residual angle K = φ = 45 o K =.44 φ = o K =.58 φ = o K 3 =.63 φ = o K 4 =.64 φ = o K 5 =.646 φ = o Etc. In the best case (φ = 45 o ), we can converge in one iteration 7.6 8

4// Calculating Sine and Cosine To calculate sin and cos: Start with x = /.64, y = Rotate by φ sin(φ) y n = sin(φ) x n = cos(φ) φ cos(φ) (/.64) 7.7 Functions Rotation mode sin/cos z = angle y =, x = /A n x n = A n x cos(z ) y n = A n x sin(z ) (=) Vectoring mode tan z = z n = z + tan (y /x ) Vector/Magnitude x n = A n (x + y ).5 Polar Rectangular x n = r cos(φ) y n = r sin(φ) x = r z = φ y = Rectangular Polar r = (x + y ).5 φ = tan (y /x ) 7.8 9

4// CORDIC Divider To do a divide, change CORDIC rotations to a linear function calculator x i+ = x i y i d i i = x i y i+ = y i + x i d i i z i+ = z i d i ( i ) 7.9 Generalized CORDIC x i+ = x i m y i d i i y i+ = y i + x i d i i z i+ = z i d i e i Rotation d i =, z i < d i = +, z i > d i Vectoring d i =, y i > d i = +, y i < sign(z i ) sign(y i ) Mode m e i Circular + tan ( i ) Linear i Hyperbolic tanh ( i ) 7.

4// An FPGA Implementation [] x Reg Vectoring sgn(y i ) y Rotation sgn(z i ) z Reg Reg >>n >>n ROM ± ± ± m d i d i d i x n Cir/Lin/Hyp y n z n Three difference equations directly mapped to hardware The decision d i is driven by the sign of the y or z register Vectoring: d i = sign(y i ) Rotation: d i = sign(z i ) The initial values loaded via muxes On each clock cycle Register values are passed through shifters and add/sub and the values placed in registers The shifters are modified on each iteration to cause the desired shift (state machine) Elementary angle stored in ROM Last iteration: results read from reg [] R. Andraka, A survey of CORDIC algorithms for FPGA based computers, in Proc. Int. Symp. Field Programmable Gate Arrays, Feb. 998, pp. 9-. 7. Iterative Sqrt and Division [] ~5k gates Inputs: a (4 bits), reset (active high) Outputs: zs (6 bits), zd (6 bits) Total: 3 bits [] C. Ramamoorthy, J. Goodman, and K. Kim, "Some Properties of Iterative Square-Rooting Methods Using High-Speed Multiplication," IEEE Trans. Computers, vol. C-, no. 8, pp. 837 847, Aug. 97. 7.

4// Iterative /sqrt(z): Simulink XSG Model rst Z wordlength latency x s (k) x s (k + ) = x s (k) / (3 Z x s (k)) x s (k+) User defined parameters User defined Wordlength parameters: (#bits, binary pt) - data type - wordlength Quantization, (#bits, binary pt) overflow - quantization Latency, sample period - overflow The - latency choice of initial condition - sample period Determines # iterations and convergence 7.3 Quadratic Convergence: /sqrt(n) x k x () k s s( ) (3 N xs( k) ) x () s k N k ys() k ys( k ) (3 ys( k) ) ys( k) k e ( k) y ( k) s s e k e k e k s( ) s( ) (3 s( )) 7.4

4// Quadratic Convergence: /N x ( k ) x ( k) ( N x ( k)) d d d x () d k N k y ( k ) y ( k) ( y ( k)) y ( k) d d d d k e ( k) y ( k) d d e ( k ) e ( k) d d 7.5 Initial Condition: /sqrt(n) ys() k ys( k ) (3 ys( k) ) ys( k) k 4 y s (k+) 5 3 Convergence: Conv. stripes: () 3 y s 3 () 5 y s - -4-4 - 4 y s (k) Divergence: y () 5 s 7.6 3

4// Initial Condition: /N y ( k ) y ( k) ( y ( k)) y ( k) d d d d k 4 y d (k+) - Convergence: Divergence: () y d otherwise -4-4 - 4 y d (k) 7.7 /sqrt(n): Convergence Analysis xn x n 3 N xn xn N n x n y n N Error: e n y n y y n n 3 y n [3] 3 e e e 3 n n n [3] D. Marković, A Power/Area Optimal Approach to VLSI Signal Processing, Ph.D. Thesis, University of California, Berkeley, 6. 7.8 4

y s (k+) 4// /sqrt(n): Convergence Analysis (Cont.) y y n n 3 y y n n n 4 3 - - -3 5 3 y 3 3y 5 y 5 convergence conv. stripes divergence -4-4 - 4 y s (k) 7.9 Error 6 5 4 divergence e e n+ n+ 3 conv. stripes.5 convergence -.5 -.5 -.5.5 e e n n 7.3 5

Solution Solution 4// Choosing the Initial Condition.5.5.5.5.5.5 sqrt(5) δ sqrt(5) δ δ sqrt(3) + δ + δ sqrt(3) + δ + δ δ 5 5 5 Iteration 7.3 Initial Condition > sqrt(5) Results in Divergence.5.5.5.5.5.5 Iteration sqrt(5) + δ sqrt(5) δ sqrt(5) δ δ 4 6 8 4 6 8 7.3 6

4// /sqrt(): Picking Initial Condition x x n n 3 N xn xn N n Equilibriums, N Initial condition Take: V( xn) xn N Find x such that: V( xn ) V( x ) n ( n,,,...) Solution: S { x : V( x ) a} n Level set V(x ) = a is a convergence bound Local convergence (3 equilibriums) (6.) (6.) (6.3) 7.33 Descending Absolute Error.6.4 /sqrt 3.6.4 Div sqrt, V(x ). div, V(x ). -. -.5+4.5 -. 3 3 initial condition, x initial condition, x x Vs ( x) ( x ) ( x ) ( x x 4) Descending error: E( x ) ( ) k 4 V ( x ) x ( x ) ( x ) d V( x ) E( x ) E( x ) k xk k k k 7.34 7

4// /sqrt(n): Picking Initial Condition (Cont.) (6.) x 5 N 3 ( N x) x x N f3.5.5 -.5 Numbers below: N N=/64 N=/6 N=/4 N= N=4 N=6 /64 /6 /4 4 6 Roots: x N x,3 N 7 - -.5 - - initial condition x Max: x M 5 3N 7.35 Initial Condition Circuit (6.3) a x a N N N >6 >4 > >/4 >/6 x = /4 x = / x = x = x = 4 >/64 x = 8 7.36 8

4// Left: Internal Node, Right: Sampled Output Internal node and output Sampled output of the left plot sampled out internal Zoom in: convergence in 8 iterations 8 iterations Sampled output of the left plot sampled out Internal divide by extends range Sampled output of the left plot extended range internal sampled out 7.37 Convergence Speed # iterations required for specified accuracy Target relative error (%).% % 5% % e : 5%, # iter (sqrt/div) 5 / 4 5 / 3 4 / 3 3 / e : 5%, # iter (sqrt/div) 3 / 3 3 / / / Adaptive algorithm current result.ic for next iteration N k+ N k.ic Iterative algorithm (y k ) /sqrt(n) 7.38 9

4// Summary Iterative algorithms can be use for a variety of DSP functions CORDIC uses angular rotations to compute trigonometric and hyperbolic functions as well as divide and other operations One bit of resolution is resolved in each iteration Netwon-Raphson algorithms for square root and division have faster convergence than CORDIC Two bits of resolution are resolved in each iteration (the algorithm has quadratic error convergence) Convergence speed greatly depends on the initial condition The choice of initial condition can be made as to guarantee decreasing absolute error in each iteration For slowly varying inputs, adaptive algorithms can use the result of current iteration as the initial condition Hardware latency depends on the initial condition and accuracy 7.39 References R. Andraka, "A Survey of CORDIC Algorithms for FPGA based Computers," in Proc. Int. Symp. Field Programmable Gate Arrays, Feb. 998, pp. 9-. C. Ramamoorthy, J. Goodman, and K. Kim, "Some Properties of Iterative Square-Rooting Methods Using High-Speed Multiplication," IEEE Trans. Computers, vol. C-, no. 8, pp. 837 847, Aug. 97. D. Marković, A Power/Area Optimal Approach to VLSI Signal Processing, Ph.D. Thesis, University of California, Berkeley, 6. 7.4

4// Additional References S. Oberman and M. Flynn, "Division Algorithms and Implementations," IEEE Trans. Computers, vol. 47, no. 8, pp. 833 854, Aug. 997. T. Lang and E. Antelo, "CORDIC Vectoring with Arbitrary Target Value," IEEE Trans. Computers, vol. 47, no. 7, pp. 736 749, July 998. J. Valls, M. Kuhlmann, and K.K. Parhi, "Evaluation of CORDIC Algorithms for FPGA Design," J. VLSI Signal Processing 3 (Kluwer Academic Publishers), pp. 7, Nov.. 7.4