DSP Design Lecture 7. Unfolding cont. & Folding. Dr. Fredrik Edman.


1 DSP Design Lecture 7: Unfolding cont. & Folding. Dr. Fredrik Edman, fredrik.edman@eit.lth.se

2 Unfolding. Unfolding creates a program that describes more than one iteration of the original program; J is the unfolding factor. Unfolding is a structured way to achieve parallel processing. Applications: revealing hidden concurrencies so that the program can be scheduled to a smaller iteration period; parallel processing; bit-serial and digit-serial architectures. Unfolding is the same as loop unrolling in assembly programming and compiler theory.

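3 General Algorithm for Unfolding. Step 1. For each node U in the original DFG, draw J nodes $U_0, U_1, U_2, \dots, U_{J-1}$. Step 2. For each edge $U \to V$ with w delays in the original DFG, draw the J edges $U_i \to V_{(i+w)\%J}$ with $\lfloor (i+w)/J \rfloor$ delays, for $i = 0, 1, \dots, J-1$. Example (J = 4, an edge $U \to V$ with w = 37 delays): $\lfloor (i+37)/4 \rfloor = 9$ for i = 0, 1, 2 and 10 for i = 3, giving the edges $U_0 \to V_1$, $U_1 \to V_2$ and $U_2 \to V_3$ with 9 delays each and $U_3 \to V_0$ with 10 delays.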
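The two steps can be sketched in Python as below. This is a minimal sketch, assuming a DFG stored as a list of (source, destination, delays) tuples; the function name `unfold_edges` and the copy names "U0", "U1", ... are our own illustrative choices. It reproduces the slide's example of an edge with 37 delays unfolded with J = 4 (the value consistent with the 9- and 10-delay edges in the figure).

```python
from typing import List, Tuple

# An edge is (source node, destination node, number of delays w).
Edge = Tuple[str, str, int]


def unfold_edges(edges: List[Edge], J: int) -> List[Edge]:
    """J-unfold the edges of a DFG.

    Step 1: node U becomes the copies U0, U1, ..., U{J-1}.
    Step 2: edge U -> V with w delays becomes the J edges
            U_i -> V_{(i+w) % J} carrying floor((i+w)/J) delays.
    """
    unfolded: List[Edge] = []
    for u, v, w in edges:
        for i in range(J):
            unfolded.append((f"{u}{i}", f"{v}{(i + w) % J}", (i + w) // J))
    return unfolded


# Slide example: a single edge U -> V with 37 delays, unfolded by J = 4.
for edge in unfold_edges([("U", "V", 37)], 4):
    print(edge)
```

Note that the total number of delays on the four new edges is 9 + 9 + 9 + 10 = 37, as required by the delay-preservation property.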

4 Properties of Unfolding. Unfolding preserves the number of delays in a DFG: $\lfloor w/J \rfloor + \lfloor (w+1)/J \rfloor + \dots + \lfloor (w+J-1)/J \rfloor = w$. Unfolding preserves precedence constraints. J-unfolding of a loop with $w_l$ delays in the original DFG leads to $\gcd(w_l, J)$ loops in the unfolded DFG; each loop contains $w_l/\gcd(w_l, J)$ delays and $J/\gcd(w_l, J)$ copies of each node. Unfolding a DFG with iteration bound $T_\infty$ results in a J-unfolded DFG with iteration bound $J T_\infty$.
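The first and third properties can be checked numerically. The sketch below (function names are our own, hypothetical) sums the delays of the unfolded edges, and unfolds a loop $N_0 \to N_1 \to \dots \to N_{k-1} \to N_0$ with the Step 2 rule, then traces the resulting loops. It is a brute-force check of the stated properties on small cases, not a proof.

```python
from math import gcd
from typing import List, Tuple


def unfolded_delay_total(w: int, J: int) -> int:
    """Sum of floor((w+i)/J) over the J edges that replace one w-delay edge."""
    return sum((w + i) // J for i in range(J))


def unfolded_loops(loop_delays: List[int], J: int) -> List[Tuple[int, int]]:
    """J-unfold a loop N_0 -> N_1 -> ... -> N_{k-1} -> N_0.

    loop_delays[j] is the number of delays on edge N_j -> N_{(j+1) % k}.
    Returns one (copies of each node, delays) pair per loop of the unfolded DFG.
    """
    k = len(loop_delays)
    # Successor of copy i of node j, together with the delays on that edge.
    succ = {}
    for j, w in enumerate(loop_delays):
        for i in range(J):
            succ[(j, i)] = (((j + 1) % k, (i + w) % J), (i + w) // J)
    visited = set()
    loops = []
    for start in succ:
        if start in visited:
            continue
        node, length, delays = start, 0, 0
        # Every copy has exactly one in- and one out-edge in the loop,
        # so following successors from `start` traces one closed loop.
        while node not in visited:
            visited.add(node)
            nxt, d = succ[node]
            node = nxt
            length += 1
            delays += d
        loops.append((length // k, delays))
    return loops


# Delay preservation: sum_i floor((w+i)/J) = w.
print(all(unfolded_delay_total(w, J) == w for w in range(40) for J in range(1, 9)))

# A loop with w_l = 2 + 4 = 6 delays unfolded by J = 4: expect gcd(6, 4) = 2
# loops, each with 6/2 = 3 delays and 4/2 = 2 copies of each node.
print(unfolded_loops([2, 4], 4))
```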

5 Unfolding of Switches. The following assumptions are made when unfolding an edge $U \to V$ containing a switch: the wordlength W is a multiple of the unfolding factor J, i.e. $W = W'J$, and all edges into and out of the switch have no delays. If so, an edge $U \to V$ that is switched at time instance $Wl + u$ can be unfolded as follows: write the switching instance as $Wl + u = J(W'l + \lfloor u/J \rfloor) + (u\%J)$, and draw an edge from node $U_{u\%J}$ to $V_{u\%J}$, which is switched at time instance $W'l + \lfloor u/J \rfloor$.
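The rule above reduces to an integer division per switching instance. A minimal sketch, assuming the two stated conditions hold; `unfold_switch` is our own name. The usage applies it to the carry switch of the lecture's bit-serial adder (W = 4, reset from Z at 4l + 0, carry from D at 4l + 1, 2, 3) with J = 2.

```python
from typing import Dict, Iterable, List, Tuple


def unfold_switch(W: int, J: int, instances: Iterable[int]) -> Tuple[int, Dict[int, List[int]]]:
    """Unfold a switch on edge U -> V that closes at time instances W*l + u.

    Assumes, as on the slide, that W is a multiple of J and that the switched
    edge has no delays. Returns (W', copies): for every k in copies, the edge
    U_k -> V_k is switched at the time instances W'*l + t, t in copies[k].
    """
    if W % J != 0:
        raise ValueError("the wordlength W must be a multiple of J")
    W_prime = W // J
    copies: Dict[int, List[int]] = {}
    for u in sorted(instances):
        # W*l + u = J*(W'*l + u // J) + (u % J)
        copies.setdefault(u % J, []).append(u // J)
    return W_prime, copies


# Carry switch of the bit-serial adder, W = 4, unfolded by J = 2.
for name, inst in [("Z -> X", [0]), ("D -> X", [1, 2, 3])]:
    W_prime, copies = unfold_switch(4, 2, inst)
    for k, ts in sorted(copies.items()):
        print(f"{name} copy {k}: closed at {W_prime}l + {ts}")
```

A copy that is closed at every one of the W' instances (here D1 → X1 at 2l + 0 and 2l + 1) is simply always closed, and a copy that never appears (here Z1 → X1) connects a dead node.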

6 Unfolding (cont.) Chapter 5

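7 What about Switches with Delays? Unfolding a DFG containing an edge that has both a switch and a positive number of delays is done by introducing a dummy node. [Figure: node C is fed from A at switching instances 6l + 1, 5 and from B at 6l + 0, 2, 3, 4; one of the switched edges carries 2 delays. Inserting a dummy node moves the delays onto a separate edge, so that the switched edge itself has no delays.] After unfolding, remove the dummy node!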

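8 Example: How to Unfold a Bit-serial Adder. [Figure: bit-serial adder with inputs A and B and output S. Node X computes the sum bit $s_i$ and the carry $c_{out,i}$ from $a_i$, $b_i$ and the incoming carry. The carry is fed back through one delay and a switch that selects the reset value Carry = 0 (node Z) at time 4l + 0 and the fed-back carry at 4l + 1, 2, 3.]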

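9 Example: How to Unfold a Bit-serial Adder. [Figure: the same adder with a dummy node D inserted between the carry delay and the switch, so that the switched edges have no delays. The switch selects Carry = 0 (Z) at 4l + 0 and the output of D at 4l + 1, 2, 3.]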

10 Unfold Bit-serial Adder, J = 2. [Figure: nodes A0, A1, B0, B1, X0, X1, S0, S1, D0, D1 and Z0, Z1.] For each node U in the original DFG, draw J nodes $U_0, U_1, \dots, U_{J-1}$.

11 Unfold Bit-serial Adder, J = 2. For each edge $U \to V$ with w delays in the original DFG, draw the J edges $U_i \to V_{(i+w)\%J}$ with $\lfloor (i+w)/J \rfloor$ delays, for $i = 0, 1, \dots, J-1$. If an edge has w = 0, this gives $U_i \to V_i$ with 0 delays.

12 Unfold Bit-serial Adder, J = 2. The carry edge X → D has w = 1: for i = 0 it becomes $X_0 \to D_1$ with 0 delays, and for i = 1 it becomes $X_1 \to D_0$ with 1 delay.

13 Unfold the Switch, J = 2. Here W = 4, so W' = W/J = 2. Write each switching instance as $Wl + u = J(W'l + \lfloor u/J \rfloor) + (u\%J)$: 4l + 0 = 2(2l + 0) + 0 (switch from Z), and 4l + 1 = 2(2l + 0) + 1, 4l + 2 = 2(2l + 1) + 0, 4l + 3 = 2(2l + 1) + 1 (switch from D).

14 Unfold the Switch, J = 2. Hence Z0 → X0 is switched at time 2l + 0, and D0 → X0 at time 2l + 1.

15 Unfold the Switch, J = 2. Z0 → X0 at time 2l + 0; D0 → X0 at time 2l + 1; D1 → X1 at times 2l + 0 and 2l + 1, i.e. always closed.

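16 Unfold the Switch, J = 2. [Figure: unfolded adder in which X0 takes its carry from Z0 at 2l + 0 and from D0 at 2l + 1, while X1 is connected directly to D1. Z1 is never switched in, so it is a dead node.] Z0 → X0 at time 2l + 0; D0 → X0 at time 2l + 1; D1 → X1 at 2l + 0, 1, i.e. always closed.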

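17 Remove Dead and Dummy Nodes. [Figure: the J = 2 adder after removing the dead node and the dummy nodes. The carry input of X0 is switched between Carry = 0 (Z0) at 2l + 0 and the delayed carry from X1 at 2l + 1, while X1 receives the carry from X0 directly.]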

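18 The Digit-Serial Adder. [Figure: 2-bit digit-serial adder: X0 adds A0 and B0 to give S0, X1 adds A1 and B1 to give S1.] A word addition $a_3a_2a_1a_0 + b_3b_2b_1b_0 = s_3s_2s_1s_0$ is processed as digit 1 (bits 1, 0) followed by digit 2 (bits 3, 2). The carry from X0 to X1 stays within the iteration; the carry from X1 to X0 goes to the next iteration through 1 delay and is reset to 0 at 2l + 0.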
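The behaviour of the unfolded adder can be modelled at the bit level. The sketch below is a behavioural model under our own assumptions (words given as Python integers, LSB digit first, W a multiple of J), not a netlist; `digit_serial_add` is a hypothetical name. With J = 1 it behaves like the bit-serial adder, with J = 2 like the digit-serial adder above, and with J = W like the bit-parallel adder.

```python
from typing import List


def digit_serial_add(a_words: List[int], b_words: List[int], W: int, J: int) -> List[int]:
    """Behavioural model of the J-unfolded bit-serial adder.

    Each clock cycle processes one J-bit digit (LSB digit first) with J
    full-adder copies X_0 ... X_{J-1}. The carry between copies stays within
    the cycle; the carry out of X_{J-1} is kept in a register (the delay) for
    the next cycle, and the switch resets it to 0 at the first digit of every
    word. J = 1 gives the bit-serial adder and J = W the bit-parallel adder.
    """
    if W % J != 0:
        raise ValueError("this sketch assumes W is a multiple of J")
    carry = 0  # carry register, persists across clock cycles (and words)
    sums = []
    for a, b in zip(a_words, b_words):
        s = 0
        for cycle in range(W // J):      # W' = W / J clock cycles per word
            if cycle == 0:
                carry = 0                # switch selects Carry = 0 at W'l + 0
            for k in range(J):           # full-adder copy X_k
                bit = cycle * J + k
                ai, bi = (a >> bit) & 1, (b >> bit) & 1
                s |= (ai ^ bi ^ carry) << bit
                carry = (ai & bi) | (ai & carry) | (bi & carry)
        sums.append(s)                   # final carry (overflow) is discarded
    return sums


# 4-bit words; the results wrap modulo 2**4 = 16 for every J.
for J in (1, 2, 4):
    print(J, digit_serial_add([3, 9, 15], [5, 7, 15], W=4, J=J))
```

The second word pair (9 + 7) produces an overflow carry; without the reset switch that carry would leak into the third word and give 15 instead of 14.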

19 Fully Parallel Adder, i.e. J = 4. [Figure: nodes A0 to A3, B0 to B3, X0 to X3, S0 to S3, D0 to D3 and Z0 to Z3, from LSB (index 0) to MSB (index 3).] For each node U in the original DFG, draw J nodes $U_0, U_1, \dots, U_{J-1}$. For each edge $U \to V$ with w delays in the original DFG, draw the J edges $U_i \to V_{(i+w)\%J}$ with $\lfloor (i+w)/J \rfloor$ delays, for $i = 0, 1, \dots, J-1$.

20 Unfold the Switch, J = 4. Now W' = W/J = 1. Write each switching instance as $Wl + u = J(W'l + \lfloor u/J \rfloor) + (u\%J)$: 4l + 0 = 4(1l + 0) + 0, 4l + 1 = 4(1l + 0) + 1, 4l + 2 = 4(1l + 0) + 2, 4l + 3 = 4(1l + 0) + 3.

21 Unfold the Switch, J = 4. There is only one time instance, l + 0, i.e. the adder is fully parallel: Z0 → X0, D1 → X1, D2 → X2 and D3 → X3 are always closed.

22 Bit-parallel Adder. [Figure: the J = 4 unfolded adder.] Only one time instance, l + 0, i.e. fully parallel: Z0 → X0, D1 → X1, D2 → X2 and D3 → X3.

23 Bit-parallel Adder. [Figure: the J = 4 unfolded adder from LSB to MSB; the nodes that are never switched into the datapath are marked as dead nodes.]

24 Remove Dead and Dummy Nodes. [Figure: the J = 4 unfolded adder.] Dead nodes can be removed; dummy nodes can be removed.

25 Bit-parallel Adder. The result is a 4-bit carry-ripple adder computing $s_3s_2s_1s_0 = a_3a_2a_1a_0 + b_3b_2b_1b_0$ with Carry = 0 at the LSB. The carry out of the MSB can be used as an overflow bit or, if the adder is to be used as a 4-bit module, fed back through the switch.

26 If the Wordlength Is Not a Multiple of J. Determine $L = \mathrm{lcm}\{W, J\}$ (lcm = least common multiple). Replace each switching instance $Wl + u$ with the $L/W$ instances $Ll + u + wW$, for $w = 0, \dots, L/W - 1$, i.e. the switching periodicity is changed from W to L. Perform the unfolding as previously. Identify the correspondence between the original instances and the expanded instances.
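A minimal sketch of the lcm expansion followed by the regular switch unfolding, using the lecture's bit-serial adder with W = 4 and J = 3. The function names `expand_instances` and `unfold_switch` are our own; the second one is the $W = W'J$ rule applied to the new period L.

```python
from math import gcd
from typing import Dict, Iterable, List, Tuple


def expand_instances(W: int, J: int, instances: Iterable[int]) -> Tuple[int, List[int]]:
    """Replace each instance W*l + u by L*l + u + w*W, w = 0 .. L/W - 1."""
    L = W * J // gcd(W, J)               # L = lcm{W, J}
    expanded = sorted(u + w * W for u in instances for w in range(L // W))
    return L, expanded


def unfold_switch(period: int, J: int, instances: Iterable[int]) -> Tuple[int, Dict[int, List[int]]]:
    """Unfold instances period*l + u (period a multiple of J), as for W = W'J."""
    copies: Dict[int, List[int]] = {}
    for u in sorted(instances):
        copies.setdefault(u % J, []).append(u // J)
    return period // J, copies


# Bit-serial adder, W = 4, unfolded by J = 3 (L = 12, W' = L / J = 4).
for name, inst in [("Z -> X", [0]), ("D -> X", [1, 2, 3])]:
    L, expanded = expand_instances(4, 3, inst)
    W_prime, copies = unfold_switch(L, 3, expanded)
    print(name, "12l +", expanded, "->", {k: f"{W_prime}l + {ts}" for k, ts in copies.items()})
```

The output gives Z_k → X_k at 4l + k and, for the carry, D0 → X0 at 4l + 1, 2, 3, D1 → X1 at 4l + 0, 2, 3 and D2 → X2 at 4l + 0, 1, 3.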

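27 Example: Unfold Bit-serial Adder by J = 3. [Figure: bit-serial adder with switch instances 4l + 0 (Z) and 4l + 1, 2, 3 (D).] The wordlength W = 4 is not a multiple of the unfolding factor J = 3. Determine $L = \mathrm{lcm}\{W, J\} = \mathrm{lcm}\{4, 3\} = 12$. Replace $Wl + u$ with $Ll + u + wW$ for $w = 0, \dots, L/W - 1 = 2$.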

28 Example: Unfold Bit-serial Adder by J = 3. After the replacement, the switch selects Z at 12l + 0, 4, 8 and D at 12l + 1, 2, 3, 5, 6, 7, 9, 10, 11. 4l + 0 is now equivalent to 12l + 0, 12l + 4 and 12l + 8, and so on. Unfold as in the regular case.

29 Example: Unfold Bit-serial Adder by J = 3. [Figure: nodes A0 to A2, B0 to B2, X0 to X2, S0 to S2, D0 to D2 and Z0 to Z2.]

30 Example: Unfold Bit-serial Adder by J = 3. With L = 12, W' = L/J = 4. Write each switching instance as $Ll + u = J(W'l + \lfloor u/J \rfloor) + (u\%J)$; for the switch from Z: 12l + 0 = 3(4l + 0) + 0, 12l + 4 = 3(4l + 1) + 1, 12l + 8 = 3(4l + 2) + 2. Draw an edge from node $U_{u\%J}$ to $V_{u\%J}$, switched at time instance $W'l + \lfloor u/J \rfloor$: Z0 → X0 at 4l + 0, Z1 → X1 at 4l + 1, Z2 → X2 at 4l + 2.

31 Example: Unfold Bit-serial Adder by J = 3. [Figure: Z0 → X0 switched at 4l + 0, Z1 → X1 at 4l + 1 and Z2 → X2 at 4l + 2.]

32 Example: Unfold Bit-serial Adder by J = 3. The switch from D closes at 12l + 1, 2, 3, 5, 6, 7, 9, 10, 11: 12l + 1 = 3(4l + 0) + 1, 12l + 2 = 3(4l + 0) + 2, 12l + 3 = 3(4l + 1) + 0, 12l + 5 = 3(4l + 1) + 2, 12l + 6 = 3(4l + 2) + 0, 12l + 7 = 3(4l + 2) + 1, 12l + 9 = 3(4l + 3) + 0, 12l + 10 = 3(4l + 3) + 1, 12l + 11 = 3(4l + 3) + 2. The instances with u%J = 0 (12l + 3, 6, 9) give D0 → X0 at 4l + 1, 2, 3.

33 Example: Unfold Bit-serial Adder by J = 3. [Figure: the carry input of X0 is switched to Z0 at 4l + 0 and to D0 at 4l + 1, 2, 3; X1 to Z1 at 4l + 1 and to D1 at 4l + 0, 2, 3; X2 to Z2 at 4l + 2 and to D2 at 4l + 0, 1, 3.]

34 Example: Unfold Bit-serial Adder by J = 3. Of the instances 12l + 1, 2, 3, 5, 6, 7, 9, 10, 11, those with u%J = 1 are 12l + 1 = 3(4l + 0) + 1, 12l + 7 = 3(4l + 2) + 1 and 12l + 10 = 3(4l + 3) + 1, giving D1 → X1 at 4l + 0, 2, 3.


36 Example: Unfold Bit-serial Adder by J = 3. Of the instances 12l + 1, 2, 3, 5, 6, 7, 9, 10, 11, those with u%J = 2 are 12l + 2 = 3(4l + 0) + 2, 12l + 5 = 3(4l + 1) + 2 and 12l + 11 = 3(4l + 3) + 2, giving D2 → X2 at 4l + 0, 1, 3.

37 Example: Unfold Bit-serial Adder by J = 3. [Figure: the complete J = 3 unfolded adder with the switches X0: Z0 at 4l + 0, D0 at 4l + 1, 2, 3; X1: Z1 at 4l + 1, D1 at 4l + 0, 2, 3; X2: Z2 at 4l + 2, D2 at 4l + 0, 1, 3. It processes 3 bits per clock cycle, so consecutive 4-bit words (operands a, b and c, d with sums s) straddle clock-cycle boundaries.]

38 End of Unfolding

39 Folding Chapter 6

40 What Is Folding? Folding is the inverse of unfolding. Folding by N (N = folding factor) maps the nodes $A_0, A_1, \dots, A_{J-1}$ onto a single node A; unfolding by J maps node A onto the J nodes $A_0, \dots, A_{J-1}$.

41 Folding? Folding is used to minimize silicon area (trading area for time). It is a systematic way to determine the control circuits in DSP architectures: in the folding transformation, multiple algorithm operations are time-multiplexed onto a single functional unit. It is used for synthesis of DSP architectures that can be operated with a single clock or multiple clocks, and to reduce the number of hardware functional units (FUs), such as adders and multipliers, by a factor of N at the expense of increasing the computation time by a factor of N. However, folding leads to architectures that use a large number of registers, so a register minimization technique sometimes needs to be applied.

42 Hardware-Mapped vs. Time-Multiplexed. FIR filter: $y(n) = \sum_{k=0}^{N-1} h(k)\,x(n-k)$. Hardware mapped: 1 sample/cc, N fixed multipliers, N − 1 adders. Time multiplexed (a MUX selects the coefficients h0, h1, h2, h3 and a register accumulates the result): N cc/sample, 1 generalized multiplier, 1 adder, 1 coefficient memory + control.
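The trade-off can be illustrated with two functionally equivalent models. This is a sketch with our own (hypothetical) function names and example coefficients: the hardware-mapped version forms all N products of y(n) at once, while the time-multiplexed version uses one multiply-accumulate per clock cycle and therefore needs N cycles per output sample.

```python
from typing import List, Tuple


def fir_hardware_mapped(h: List[int], x: List[int]) -> List[int]:
    """N multipliers and N-1 adders: all products of y(n) in one sample period."""
    N = len(h)
    return [sum(h[k] * x[n - k] for k in range(N) if n - k >= 0) for n in range(len(x))]


def fir_time_multiplexed(h: List[int], x: List[int]) -> Tuple[List[int], int]:
    """One multiplier, one adder and an accumulator register: N cc per sample.

    Returns the outputs and the number of clock cycles used.
    """
    N = len(h)
    taps = [0] * N                 # x(n), x(n-1), ..., x(n-N+1)
    outputs, cycles = [], 0
    for sample in x:
        taps = [sample] + taps[:-1]
        acc = 0                    # accumulator register, cleared per sample
        for k in range(N):         # cycle k: MUX selects h(k) and x(n-k)
            acc += h[k] * taps[k]
            cycles += 1
        outputs.append(acc)
    return outputs, cycles


h = [1, 2, 3, 4]                   # example coefficients h0..h3
x = [1, 0, 0, 0, 5, -1]
print(fir_hardware_mapped(h, x))
print(fir_time_multiplexed(h, x))
```

Both produce the same outputs; the time-multiplexed model reports N · (number of samples) = 24 clock cycles, i.e. N cc/sample.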

43 Hardware-Mapped vs. Time-Multiplexed/Microcoded Biquad Filter. Hardware mapped: 5 multipliers with fixed coefficients, 4 adders, 2 delays; latency = 1 cc. Microcoded: 1 multiplier, 1 adder, coefficient memory, 3 registers, MUXes and a controller; latency = 5 cc.

44 Folding: Time-Shared Architecture. $y(n) = a(n) + b(n) + c(n)$. Folding is a technique to reduce silicon area by time-multiplexing many operations onto single functional units. The right-hand figure shows a 2-times folded architecture in which the 2 additions are folded, i.e. time-multiplexed, onto a single adder whose input switches change at 2l + 0 and 2l + 1. Folding introduces registers/storage. The computation time increases, e.g. one output sample every 2 cc (one sample of each input signal consumed every 2 cc).

45 Folding Example: a More Detailed Look. $y(n) = a(n) + b(n) + c(n)$. Cycle 0 (2l + 0): the adder computes a(0) + b(0), while the output still holds y(−1). Cycle 1 (2l + 1): c(0) is added to the stored a(0) + b(0), giving y(0) = a(0) + b(0) + c(0). Cycles 2 and 3 repeat the pattern with a(1), b(1) and c(1).
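The cycle-by-cycle schedule can be simulated directly. A minimal sketch (the function name `folded_add3` is ours): a single adder plus one register computes y(n) = a(n) + b(n) + c(n), with the phase `cycle % 2` playing the role of the 2l + 0 / 2l + 1 control signal.

```python
from typing import List, Tuple


def folded_add3(a: List[int], b: List[int], c: List[int]) -> Tuple[List[int], List[str]]:
    """y(n) = a(n) + b(n) + c(n) computed on a single adder folded by N = 2.

    A two-state controller (cycle % 2) drives the input switches:
    state 2l+0 adds a(n) + b(n) into a register, state 2l+1 adds c(n).
    """
    reg = 0                       # stores a(n) + b(n) between the two cycles
    y: List[int] = []
    trace: List[str] = []
    for cycle in range(2 * len(a)):
        n, state = divmod(cycle, 2)
        if state == 0:            # time 2l + 0
            reg = a[n] + b[n]
            trace.append(f"cycle {cycle}: a({n}) + b({n}) = {reg}")
        else:                     # time 2l + 1
            y.append(reg + c[n])
            trace.append(f"cycle {cycle}: + c({n}) -> y({n}) = {y[-1]}")
    return y, trace


y, trace = folded_add3([1, 10], [2, 20], [3, 30])
print("\n".join(trace))
print(y)                          # one output every 2 clock cycles
```

The register `reg` is the extra storage introduced by folding, and the two-valued `state` is exactly what the two-state FSM of the control unit generates.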

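46 Control Unit: Finite State Machine. [Figure: the folded adder, whose switches at 2l + 0 and 2l + 1 are driven by a two-state FSM (S0, S1) that outputs control signal 0 in one state and 1 in the other, toggling every clock cycle.] Control units can be complex in large systems!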

47 Folding. N-folding reduces the hardware by a factor of N, while the computation time increases by a factor of N. Extremes: fully parallel (= 1 unit per algorithmic operation) and fully time-multiplexed. Folding requires extra registers (i.e. extra storage), a more complex control unit, and more latency.

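48 Folding Transformation. N = folding factor = number of operations folded onto a single unit. [Figure: an edge $U \to V$ with $w(e)$ delays in the original graph. In the folded graph, U is executed on hardware unit $H_U$ at time $Nl + u$, and $H_U$ has $P_U$ pipeline stages; its output passes through $D_F(U \to V)$ delays to hardware unit $H_V$, which executes V at time $Nl + v$.] Here l = iteration, $P_U$ = level of pipelining of $H_U$, $D_F(U \to V)$ = delays in the folded graph, and u and v are the folding orders, i.e. the scheduled times, $0 \le u, v \le N - 1$.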

49 Folding Transformation. $H_U$ is pipelined by $P_U$ stages, and its output for iteration l is available at $Nl + u + P_U$. Edge $U \to V$ has $w(e)$ delays, so the l-th iteration of U is used by the $(l + w(e))$-th iteration of node V, which is executed at $N(l + w(e)) + v$. So the result must be stored for $D_F(U \to V) = [N(l + w(e)) + v] - [Nl + P_U + u] = N w(e) - P_U + v - u$, which is independent of l.
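The derivation can be checked numerically by computing the storage time from the schedule (consumption time minus production time) for many iterations l and comparing it with the closed form. A small sketch; the function names and the example edge parameters (N = 4, w(e) = 2, P_U = 1, u = v = 3) are our own choices.

```python
def folded_delay(N: int, w: int, P_u: int, u: int, v: int) -> int:
    """D_F(U -> V) = N*w(e) - P_U + v - u."""
    return N * w - P_u + v - u


def folded_delay_from_schedule(N: int, w: int, P_u: int, u: int, v: int, l: int) -> int:
    """Same quantity from the schedule of iteration l: the time at which
    iteration l + w(e) of V executes minus the time U's output is ready."""
    consume = N * (l + w) + v
    produce = N * l + u + P_u
    return consume - produce


# Example edge: N = 4, w(e) = 2, P_U = 1, folding orders u = 3, v = 3.
print(folded_delay(4, 2, 1, 3, 3))
print({folded_delay_from_schedule(4, 2, 1, 3, 3, l) for l in range(10)})  # one value: independent of l
```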

50 Folding Set. A folding set is an ordered set of operations to be executed on the same functional unit. Folding sets are typically obtained from a scheduling and allocation algorithm (ref. Appendix B). The folding set represents the underlying folding transformation. Each set contains N entries, N = folding factor, and the position of an operation in the set is its folding order, $0, \dots, N-1$. Example with N = 3: $S_1 = \{A_1, \emptyset, A_2\}$, where $\emptyset$ is a null operation; $A_1$ belongs to folding set $S_1$ with folding order 0, written $(S_1|0)$.

51 Ex. Folding of Biquad Filter. [Figure: biquad filter DFG with adders 1, 2, 3, 4, multipliers 5, 6, 7, 8 (coefficients a, b, c, d), input In and output Out.]

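52 Ex. Folding of Biquad Filter. Additions: $T_{adder} = P_{adder} = 1$; multiplications: $T_{mult} = P_{mult} = 2$. Folding sets (N = 4): $S_1 = \{4, 2, 3, 1\}$ for the additions and $S_2 = \{5, 8, 6, 7\}$ for the multiplications, so node 4 is $(S_1|0)$, node 2 $(S_1|1)$, node 3 $(S_1|2)$, node 1 $(S_1|3)$, node 5 $(S_2|0)$, node 8 $(S_2|1)$, node 6 $(S_2|2)$ and node 7 $(S_2|3)$.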

53 Folding of Biquad Filter, N = 4. Using $D_F(U \to V) = N w(e) - P_U + v - u$: $D_F(1 \to 2) = -3$, $D_F(1 \to 5) = 0$, $D_F(1 \to 6) = 2$, $D_F(1 \to 7) = 7$, $D_F(1 \to 8) = 5$, $D_F(3 \to 1) = 0$, $D_F(4 \to 2) = 0$, $D_F(5 \to 3) = 0$, $D_F(6 \to 4) = -4$, $D_F(7 \to 3) = -3$, $D_F(8 \to 4) = -3$. The number of delays on an edge cannot be negative: $D_F(U \to V) < 0$ means the folding is not valid.
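The table of folded delays can be generated mechanically from the folding sets. In the sketch below the folding sets and pipeline depths are from the slides, but the edge delays w(e) are not legible in the extracted figure: we back-computed them from the listed $D_F$ values, so treat the `w` dictionary as our inference.

```python
N = 4
P = {"add": 1, "mult": 2}            # P_adder = 1, P_mult = 2
S1 = [4, 2, 3, 1]                    # additions, folding orders 0..3
S2 = [5, 8, 6, 7]                    # multiplications, folding orders 0..3
order = {node: pos for S in (S1, S2) for pos, node in enumerate(S)}
unit = {node: ("add" if node in S1 else "mult") for node in S1 + S2}

# w(e) of the original biquad DFG, inferred from the D_F values on the slide.
w = {(1, 2): 0, (1, 5): 1, (1, 6): 1, (1, 7): 2, (1, 8): 2, (3, 1): 0,
     (4, 2): 0, (5, 3): 0, (6, 4): 0, (7, 3): 0, (8, 4): 0}

# D_F(U -> V) = N*w(e) - P_U + v - u
D_F = {(U, V): N * we - P[unit[U]] + order[V] - order[U] for (U, V), we in w.items()}
for (U, V), d in D_F.items():
    print(f"D_F({U}->{V}) = {d:3d}" + ("   <- negative, not valid" if d < 0 else ""))
print("valid folding:", all(d >= 0 for d in D_F.values()))
```

Four edges (1 → 2, 6 → 4, 7 → 3, 8 → 4) come out negative, which is why the graph has to be retimed before it can be folded with these folding sets.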

54 Retiming: Folding of Biquad Filter, N = 4. Retiming: split and move delays (e.g. by feedforward cutset pipelining) so that every folded delay becomes non-negative.

55 Folding of Retimed Biquad Filter. [Figure: retimed biquad DFG with the same folding-set assignment as before.]

56 Systematic Way of Retiming for Folding. Let $D'_F(U \to V)$ be the folded delays of edge $U \to V$ in the retimed graph, where $w_r(e) = w(e) + r(V) - r(U)$. For a valid folding we need $D'_F(U \to V) \ge 0$, i.e. $N w_r(e) - P_U + v - u \ge 0$, i.e. $N(w(e) + r(V) - r(U)) - P_U + v - u \ge 0$, i.e. $N(r(U) - r(V)) \le N w(e) - P_U + v - u = D_F(U \to V)$, so $r(U) - r(V) \le \lfloor D_F(U \to V)/N \rfloor$ (floor, since retiming values are integers). Then solve the system of inequalities!
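The slide leaves the solver open. One standard way (our choice here, not necessarily the course's) is to treat each inequality $r(U) - r(V) \le c$ as an edge $V \to U$ of weight c in a constraint graph and run Bellman-Ford from a virtual source. The sketch uses only the $D_F$ values listed for the unretimed biquad; the function name `retime_for_folding` is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[int, int]


def retime_for_folding(D_F: Dict[Edge, int], nodes: List[int], N: int) -> Optional[Dict[int, int]]:
    """Find integer r with r(U) - r(V) <= floor(D_F(U->V) / N) for every edge.

    Each inequality r(U) - r(V) <= c is a constraint-graph edge V -> U of
    weight c. Bellman-Ford from a virtual source (0-weight edge to every
    node) gives shortest distances that satisfy all inequalities; a negative
    cycle means no retiming exists (returns None).
    """
    constraints = [(V, U, d // N) for (U, V), d in D_F.items()]  # // floors
    r = {n: 0 for n in nodes}           # virtual-source edges already relaxed
    for _ in range(len(nodes) - 1):
        for src, dst, c in constraints:
            if r[src] + c < r[dst]:
                r[dst] = r[src] + c
    for src, dst, c in constraints:     # extra pass: detect a negative cycle
        if r[src] + c < r[dst]:
            return None
    return r


N = 4
D_F = {(1, 2): -3, (1, 5): 0, (1, 6): 2, (1, 7): 7, (1, 8): 5, (3, 1): 0,
       (4, 2): 0, (5, 3): 0, (6, 4): -4, (7, 3): -3, (8, 4): -3}   # unretimed biquad
r = retime_for_folding(D_F, list(range(1, 9)), N)
# Retiming adds r(V) - r(U) delays to edge U -> V, so D_F grows by N*(r(V) - r(U)).
D_F_r = {(U, V): d + N * (r[V] - r[U]) for (U, V), d in D_F.items()}
print(r)
print(D_F_r)
print("valid folding:", all(d >= 0 for d in D_F_r.values()))
```

Any constant added to all r values is an equally valid retiming; for this graph the resulting folded delays are 1, 0, 2, 3, 5, 0, 0, 0, 0, 1, 1, all non-negative.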

57 Folding of Retimed Biquad Filter, N = 4. Additions: $T_{adder} = P_{adder} = 1$, $S_1 = \{4, 2, 3, 1\}$. Multiplications: $T_{mult} = P_{mult} = 2$, $S_2 = \{5, 8, 6, 7\}$.

58 Folding of Retimed Biquad Filter, N = 4. $D_F(U \to V) = N w_r(e) - P_U + v - u$, with the retimed delay count $w_r(e)$ in parentheses: $D_F(1 \to 2) = 1$ (1), $D_F(1 \to 5) = 0$ (1), $D_F(1 \to 6) = 2$ (1), $D_F(1 \to 7) = 3$ (1), $D_F(1 \to 8) = 5$ (2), $D_F(3 \to 1) = 0$ (0), $D_F(4 \to 2) = 0$ (0), $D_F(5 \to 3) = 0$ (0), $D_F(6 \to 4) = 0$ (1), $D_F(7 \to 3) = 1$ (1), $D_F(8 \to 4) = 1$ (1). All $D_F(U \to V) \ge 0$: a valid folding.

59 Folding of Retimed Biquad Filter, N = 4 (cont.). The same folded delays, now annotated in the folded architecture with the pipelined adder ($P_{adder} = 1$): all $D_F(U \to V) \ge 0$, a valid folding.

60 Folding of Retimed Biquad Filter, N = 4. Additions: $S_1 = \{4, 2, 3, 1\}$; multiplications: $S_2 = \{5, 8, 6, 7\}$. Using $D_F(U \to V) = N w_r(e) - P_U + v - u$: $D_F(1 \to 8) = 5$ ($w_r = 2$), i.e. a path from the adder to the multiplier with 5 delays. Node 8 has folding order 1, so the switch closes at time 4l + 1.

61 Folding of Retimed Biquad Filter. Additions: $S_1 = \{4, 2, 3, 1\}$; multiplications: $S_2 = \{5, 8, 6, 7\}$. $D_F(3 \to 1) = 0$ ($w_r = 0$): a path from the adder back to the adder with 0 delays. Node 1 has folding order 3, so the switch closes at time 4l + 3. Node 1 is also connected to the input.

62 Folding of Retimed Biquad Filter. Additions: $S_1 = \{4, 2, 3, 1\}$; multiplications: $S_2 = \{5, 8, 6, 7\}$. Execution of node 2 (inputs from nodes 1 and 4): $D_F(1 \to 2) = 1$ ($w_r = 1$), a path from adder to adder with 1 delay, and $D_F(4 \to 2) = 0$ ($w_r = 0$), a path from adder to adder with 0 delays. Node 2 has folding order 1, so both switches close at time 4l + 1.

63 Folding & Register Minimization Chapter 6.3

64 Register/Storage Minimization. Folding inserts registers. Lifetime analysis is used for register minimization in DSP hardware. A variable is live from the time it is produced until the time it is consumed; after that it is dead. Linear lifetime chart: represents the lifetimes of the variables in a linear fashion. Convention: a variable is not live during the clock cycle in which it is produced, but is live during the clock cycle in which it is consumed. [Figure: linear lifetime chart for one iteration of 6 cc, N = 6.]

65 Register Minimization. Max. number of live variables = min. number of registers. Use the previous iteration to avoid drawing the lifetime chart over several iterations. [Figure: 2 live variables within one iteration, but 3 when several iterations overlap.]

66 Register Minimization. Max. number of live variables = min. number of registers. [Figure: the same chart drawn over two consecutive iterations of 6 cc each: 2 live variables within one iteration, but 3 when several iterations overlap.]

67 Example of a Systematic Way of Working with Lifetime Charts: 3×3 Matrix Transpose. [Figure: a matrix transposer receives the samples a, b, c, d, e, f, g, h, i (row by row) and outputs them in the order a, d, g, b, e, h, c, f, i (column by column).] One iteration = 9 clock cycles.

68 Lifetime Table, 3×3 Matrix Transpose. Input times $T_{in}$ and zero-latency output times $T_{zlout}$: a 0/0, b 1/3, c 2/6, d 3/1, e 4/4, f 5/7, g 6/2, h 7/5, i 8/8. Sample d would have to be output 2 cycles before it is input ($T_{zlout} - T_{in} = -2$).

69 Lifetime Table, 3×3 Matrix Transpose. $T_{diff} = T_{zlout} - T_{in}$, where $T_{zlout}$ is the output time with zero latency; e.g. c has $T_{diff} = 6 - 2 = 4$ and e has $T_{diff} = 0$.

70 3×3 Matrix Transpose. If $T_{diff} < 0$, the transposer is not causal: add a latency equal to the magnitude of the most negative $T_{diff}$ over all samples, here 4 (sample g: $T_{diff} = 2 - 6 = -4$).

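71 3×3 Matrix Transpose. If $T_{diff} < 0$ for some sample, the transposer is not causal: add the latency $T_{lat}$ = magnitude of the most negative $T_{diff}$ over all samples (here 4), and set $T_{out} = T_{zlout} + T_{lat}$. The lifetime of each sample runs from $T_{in}$ to $T_{out}$:
Sample  T_in  T_zlout  T_diff  T_out  Life
a       0     0         0      4      0 → 4
b       1     3         2      7      1 → 7
c       2     6         4     10      2 → 10
d       3     1        -2      5      3 → 5
e       4     4         0      8      4 → 8
f       5     7         2     11      5 → 11
g       6     2        -4      6      6 → 6
h       7     5        -2      9      7 → 9
i       8     8         0     12      8 → 12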

72 Lifetime Chart, 3×3 Matrix Transpose. One iteration = 9 clock cycles. [Figure: lifetime chart of samples a to i over the clock cycles, with #live per cycle and the contribution from the next iteration.]

73 Lifetime Chart, 3×3 Matrix Transpose. One iteration = 9 clock cycles. Adding the contribution from the next iteration to #live gives, e.g., 3 + 1 = 4, 2 + 2 = 4 and 1 + 3 = 4. The total is 4 live variables in every cycle.

74 Lifetime Chart. One iteration = 9 clock cycles. [Figure: #live per cycle before and after adding the contribution from the next iteration.]

75 Lifetime Chart, 3×3 Matrix Transpose. Max #live = number of registers = 4.
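The whole procedure (lifetime table, latency, lifetime chart folded onto one iteration) fits in a few lines. A minimal sketch, assuming one sample in and one sample out per clock cycle and a continuous stream of iterations; `lifetime_analysis` is our own name. It applies the lecture's convention that a variable is live in cycles $T_{in} + 1, \dots, T_{out}$.

```python
from typing import Dict, List, Tuple


def lifetime_analysis(order_in: List[str], order_out: List[str]) -> Tuple[Dict[str, int], int, Dict[str, int], List[int]]:
    """Lifetime table and #live per cycle for a block that reorders a sample stream.

    One sample enters and one leaves per clock cycle; one iteration is
    len(order_in) cycles. A variable is not live in the cycle it is produced
    but is live in the cycle it is consumed: cycles T_in + 1 .. T_out.
    """
    T_in = {s: t for t, s in enumerate(order_in)}
    T_zlout = {s: t for t, s in enumerate(order_out)}
    T_diff = {s: T_zlout[s] - T_in[s] for s in order_in}
    T_lat = max(0, -min(T_diff.values()))      # makes every sample causal
    T_out = {s: T_zlout[s] + T_lat for s in order_in}
    period = len(order_in)
    live = [0] * period
    for s in order_in:
        for t in range(T_in[s] + 1, T_out[s] + 1):
            live[t % period] += 1              # later iterations overlap this one
    return T_diff, T_lat, T_out, live


T_diff, T_lat, T_out, live = lifetime_analysis(list("abcdefghi"), list("adgbehcfi"))
print("T_lat =", T_lat)
print("T_out =", T_out)
print("#live per cycle =", live, "-> registers needed:", max(live))
```

Taking the cycle index modulo the iteration length is what adds the "contribution from the next iteration" to the chart; the maximum of the folded counts is the minimum number of registers (4 here).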

76 Circular Lifetime Chart. Useful for representing the periodic nature of DSP programs. The circle is divided into N pie slices, 0 to N − 1; the number in parentheses represents the number of live variables at each time instance.
