ENEE350 Lecture Notes-Weeks 14 and 15
- Gladys Anderson
1 Pipelining & Amdahl's Law. Pipelining is a method of processing in which a problem is divided into a number of subproblems, and the solutions of the subproblems for different instances of the problem are then overlapped.
2 Example: a[i] = b[i] + c[i] + d[i] + e[i] + f[i], i = 1, 2, 3, ..., n. [Figure: a four-stage tree of adders computing b[i] + c[i] + d[i] + e[i] + f[i], with the inputs for successive values of i streamed through the tree.] Adders have delay D to compute. Computation time = 4D + (n-1)D = nD + 3D. Speed-up = 4nD/{3D + nD} -> 4 for large n.
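The fill-then-stream arithmetic above can be sketched in Python (a hypothetical helper; `stages` generalizes the 4 adder levels of the example):

```python
# Speed-up of a staged pipeline over fully serial evaluation:
# serial time is stages*n (one operation at a time), pipelined time is
# stages cycles to fill the pipe plus one result per cycle thereafter.
def pipeline_speedup(n, stages=4):
    serial = stages * n
    pipelined = stages + (n - 1)
    return serial / pipelined

print(pipeline_speedup(1))      # 1.0: a single result gains nothing
print(pipeline_speedup(1000))   # already close to the stage count, 4
```

As n grows the ratio 4n/(n+3) approaches 4, the number of stages, exactly as the slide states.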
3 We can describe the computation process in an n-segment pipeline algorithmically. There are three distinct phases to this computation: (a) filling the pipeline, (b) running the pipeline in the filled state until the last input arrives, and (c) emptying the pipeline.
4 Example: Pipelined Ripple Adder. [Figure: an n-bit ripple-carry adder pipelined with one full adder (FA) and a D latch per stage, clocked; operand bit pairs u[i,j], v[i,j] for successive words are skewed in time so that a new pair of words u[m-1:0], v[m-1:0] enters the pipeline every clock cycle.]
5 Instruction pipelines. Goal: (i) to increase the throughput (number of instructions/sec) in executing programs, (ii) to reduce the execution time (clock cycles/instruction, etc.).
clock  fetch  decode  execute
0      I1     -       -
1      I2     I1      -
2      I3     I2      I1
3      I4     I3      I2
4      -      I4      I3
6 A 5-stage (MIPS) pipeline
clock  fetch  decode  execute  memory  write back
0      I1     -       -        -       -
1      I2     I1      -        -       -
2      I3     I2      I1       -       -
3      I4     I3      I2       I1      -
4      I5     I4      I3       I2      I1
7 Speed-up of pipelined execution of instructions over a sequential execution:
S(5) = T_u / T_p = (CPI_u N_u / f_u) / (CPI_p N_p / f_p)
N_u: the number of instructions executed by the serial system
N_p: the number of instructions executed by the pipelined system
CPI_u: number of clock cycles per instruction for the serial system
CPI_p: number of clock cycles per instruction for the pipelined system
f_u: clock frequency of the serial system
f_p: clock frequency of the pipelined system
Assuming that the serial and pipelined systems both operate at the same clock rate and execute the same number of instructions: S(5) = CPI_u / CPI_p
8 Example: Suppose that the instruction mix of programs executed on the serial and pipelined machines is 40% ALU, 20% branch, and 40% memory instructions, with 4, 2, and 4 cycles per instruction in the three classes respectively. Then, under ideal conditions (no stalls due to hazards), S(5) = CPI_u / CPI_p = 3.3. If the clock cycle time needs to be increased for the pipeline implementation, then the speed-up will have to be scaled down accordingly using the formula on the previous slide.
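A quick sketch of the weighted-CPI arithmetic (the fractions and cycle counts are the ones given on the slide; note that this mix gives CPI_u = 3.6, so the quoted speed-up of 3.3 corresponds to a pipeline CPI of about 1.1 rather than the ideal 1):

```python
# Weighted serial CPI from the slide's instruction mix:
# each class contributes (fraction of instructions) * (cycles per instruction).
mix = {"alu": (0.40, 4), "branch": (0.20, 2), "memory": (0.40, 4)}
cpi_u = sum(frac * cycles for frac, cycles in mix.values())  # ~3.6 cycles/instr

# Speed-up over a pipelined machine with a given CPI_p:
def speedup(cpi_u, cpi_p):
    return cpi_u / cpi_p
```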
9 Instruction Pipelines MIPS (Hennessy & Patterson)
10 MIPS Pipeline. Register operations: IF ID EX WB. Register/Memory operations: IF ID EX ME WB.
11 Hazards 1 Structural Hazards 2 Data Hazards 3 Control Hazards
12 Structural Hazards: They arise when limited resources are scheduled to operate concurrently on different streams during the same clock period. Example: a memory conflict (data fetch + instruction fetch) or a datapath conflict (arithmetic operation + PC update).
Clock  IF  ID  EX  ME  WB
0      I1  -   -   -   -
1      I2  I1  -   -   -
2      I3  I2  I1  -   -
3      I4  I3  I2  I1  -
4      I5  I4  I3  I2  I1
5      I6  I5  I4  I3  I2
6      I7  I6  I5  I4  I3
13 Fix: duplicate the hardware (too expensive), or stall the pipeline (serialize the operation; too slow). [Pipeline chart: whenever the conflict arises the later instruction is held back, so instructions enter the pipeline only every other cycle and a new instruction completes only every two clock cycles.]
14 Speed up = T_serial / T_pipeline = 5n t_s / {(2n + 2) t_s} for odd n, = 5n t_s / {(2n + 1) t_s} for even n -> 5/2 as the number of instructions, n, tends to infinity. Thus, we lose half the throughput due to stalls. Note: The pipeline execution time can be computed using the recurrences T_1 = 4, T_i = T_{i-1} + 1 for even i, T_i = T_{i-1} + 3 for odd i. So T_1 = 4, T_2 = 4 + 1 = 5, T_3 = 5 + 3 = 8, T_4 = 8 + 1 = 9, T_5 = 9 + 3 = 12, T_6 = 12 + 1 = 13, ..., and in general T_n = 2n + 2 for odd n, with T_{n+1} = 2n + 2 + 1 = 2n + 3.
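The recurrence for the stalled pipeline can be checked against the closed form numerically (a sketch; `completion_clock` is a hypothetical helper name):

```python
# Completion clock of the n-th instruction in the stalled pipeline:
# T_1 = 4; T_i = T_{i-1} + 1 for even i, T_{i-1} + 3 for odd i.
def completion_clock(n):
    t = 4
    for i in range(2, n + 1):
        t += 1 if i % 2 == 0 else 3
    return t

# Closed form: 2n + 2 for odd n, 2n + 1 for even n.
for n in range(1, 40):
    expected = 2 * n + 2 if n % 2 == 1 else 2 * n + 1
    assert completion_clock(n) == expected
```

Since completing n instructions takes about 2n cycles instead of the ideal n, the asymptotic speed-up over the 5-cycle serial machine is 5/2 rather than 5.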
15 Data Hazards: They occur when the executions of two instructions may result in the incorrect reading of operands and/or writing of a result. Read After Write (RAW) hazard (data dependency); Write After Read (WAR) hazard (data anti-dependency); Write After Write (WAW) hazard (output dependency).
16 RAW Hazards: They occur when reads are early and writes are late. [Pipeline chart: I1: R1 = R1 + R2 writes R1 in its WB stage; I2: R3 = R1 + R2 reads R1 in its earlier ID stage, before I1's write-back completes, so I2 reads the stale value of R1.]
17 RAW Hazards (cont'd): They can be avoided by stalling the reads, but this increases the execution time. A better approach is to use data forwarding: [Pipeline chart: the result of I1: R1 = R1 + R2 is forwarded directly to I2: R3 = R1 + R2 as soon as it is computed, so I2 obtains the new R1 without waiting for I1's WB stage.]
18 WAR Hazards: They occur when writes are early and reads are late. [Pipeline chart with two execution paths of different lengths: I1 (R2 = R2 + R3; R9 = R3 + R4) reads R3 late on the long path, while I2 (R3 = R7 + R5; R6 = R2 + R8) writes R3 early on the short path, so I2's write of R3 can overtake I1's read of R3.]
19 Branch Prediction in Pipelined Instruction Sequencing: One of the major issues in pipelined instruction processing is scheduling conditional branch instructions. When a pipeline controller encounters a conditional branch instruction, it must choose between two instruction streams. If the branch condition is met, execution continues from the target of the conditional branch instruction; otherwise, it continues with the instruction that follows the conditional branch instruction. As there are other instructions moving behind a conditional branch instruction, it is necessary to have a system that can flush the pipeline in case the branch is mispredicted.
20 Example: Suppose that we execute the following assembly code on a 5-stage pipeline (IF, ID, EX, ME, WB): LDI R0 = 20; JCD R0 < 10, add; SUB R0,R1; JMP D,halt; add: ADD R0,R1; halt: HLT; If we assume that R0 < 10, then the SUB instruction would have been incorrectly fetched during the second clock cycle, and we would have to execute another fetch cycle to fetch the ADD instruction.
21 Classification of branch prediction algorithms. Static Branch Prediction: the branch decision does not change over time; we use a fixed branching policy. Dynamic Branch Prediction: the branch decision does change over time; we use a branching policy that varies over time.
22 Static Branch Prediction Algorithms: 1. Don't predict (stall the pipeline); 2. Never take the branch; 3. Always take the branch; 4. Delayed branch.
23 1. Stall the pipeline by 1 clock cycle: This allows us to determine the target of the branch instruction before the next fetch. [Pipeline chart: JCD goes through IF ID EX ME WB; the fetch of the following instruction (SUB or ADD) is delayed one cycle while the branch is decided, after which the pipeline proceeds with one of the two instructions.]
24 Pipeline Execution Speed (stall case): Assuming only branch hazards, we can compute the average number of clock cycles per instruction (CPI) as CPI of the pipeline = CPI of ideal pipeline + the number of idle cycles/instruction = 1 + branch penalty x branch frequency = 1 + branch frequency (for a one-cycle penalty). In general, CPI of the pipeline > 1 + branch frequency because of data and possibly structural hazards. Pros: Straightforward to implement. Cons: The time overhead is high when the instruction mix includes a high percentage of branch instructions.
25 2. Never take the branch: The instruction in the pipeline is flushed if it is determined, after the ID stage is carried out, that the branch should have been taken. [Pipeline chart: JCD, then SUB and IOR (the fall-through path, executed if the branch fails); XOR is the branch target.] The SUB instruction is always fetched; either it is decoded and executed next, or it is flushed and XOR is fetched and executed.
26 Pipeline Execution Speed (never-take-the-branch case): Assuming only branch hazards, we can compute the average number of clock cycles per instruction (CPI) as CPI of the pipeline = CPI of ideal pipeline + the number of idle cycles/instruction = 1 + branch penalty x branch frequency x misprediction rate = 1 + branch frequency x misprediction rate (for a one-cycle penalty). Pros: If the prediction is highly accurate, the pipeline can operate close to its full throughput. Cons: Implementation is not as straightforward, and it requires flushing if decoding the branch address takes more than 1 clock cycle.
27 3. Always take the branch: The instruction in the pipeline is flushed if it is determined, after the ID stage is carried out, that the branch should not have been taken. [Pipeline chart: JCD; the target XOR is fetched while the branch address is computed; SUB and IOR are the fall-through path.] The XOR instruction is always fetched; either it is decoded and executed next, or it is flushed and SUB is fetched and executed. An extra clock cycle is needed to set the PC back to PC+1 during the EX cycle (because it was altered during the ID step to point to the XOR instruction) in case SUB must be fetched.
28 Pipeline Execution Speed (always-take-the-branch case): Assuming only branch hazards, we can compute the average number of clock cycles per instruction (CPI) as CPI of the pipeline = CPI of ideal pipeline + the number of idle cycles/instruction = 1 + branch penalty x branch frequency x prediction rate + branch penalty x branch frequency x misprediction rate = 1 + branch frequency x prediction rate + 2 x branch frequency x misprediction rate. Pros: No clear advantage, other than being better suited for the execution of typical loops without the compiler's intervention (but this can generally be overcome; see the next slide). Cons: Implementation is not as straightforward, and it has a higher misprediction penalty and an overall expected CPI that is worse than the stall method.
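The expected CPI of the three static policies above can be compared with a small sketch (the branch frequency and misprediction rate are free parameters; the one-cycle and two-cycle penalties follow the models on these slides):

```python
# Expected CPI under the three static branch policies described above.
def cpi_stall(bf):
    # Every branch stalls the pipeline one cycle.
    return 1 + bf

def cpi_never_take(bf, miss):
    # Flush (one cycle) only when the branch was actually taken.
    return 1 + bf * miss

def cpi_always_take(bf, miss):
    # One extra cycle when right (target computation), two when wrong.
    return 1 + bf * (1 - miss) + 2 * bf * miss

bf, miss = 0.2, 0.4   # illustrative numbers, not from the slides
print(cpi_stall(bf), cpi_never_take(bf, miss), cpi_always_take(bf, miss))
```

With these illustrative numbers never-take is cheapest, and always-take is indeed worse than stalling, matching the slide's "Cons".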
29 Example: for (i = 0; i < 10; i++) a[i] = a[i] + 1;
Branch-always will not work well without the compiler's help:
CLR R0;
loop: JCD R0 >= 10,exit;
LDD R1,R0;
ADD R1,1;
ST+ R1,R0;
JMP D,loop;
exit:
Branch-always will work well with the compiler's help (the loop is rewritten so that the backward branch is almost always taken):
CLR R0;
loop: LDD R1,R0;
ADD R1,1;
ST+ R1,R0;
JCD R0 < 10,loop;
30 4. Delayed branch: Insert an instruction after a branch instruction, and always execute it whether or not the branch condition applies. Of course, this must be an instruction that can be executed without any side effects on the correctness of the program. Pros: The pipeline is never stalled or flushed, and with the correct choice of branch-delay-slot instruction, performance can approach that of an ideal pipeline. Cons: It is not always possible to find a delay-slot instruction, in which case a NOP instruction may have to be inserted into the delay slot to make sure that the program's integrity is not violated. It makes compilers work harder.
31 Which instruction should be placed into the branch delay slot? 4.1 Choose an instruction from before the branch, but make sure that the branch does not depend on the moved instruction. If such an instruction can be found, this always pays off. Example: ADD R1,R2; JCD R2 > 10,exit; can be rescheduled as JCD R2 > 10,exit; ADD R1,R2; (delay slot)
32 4.2 Choose an instruction from the target of the branch, but make sure that the moved instruction is executable when the branch is not taken. Example:
ADD R1,R2; JCD R2 > 10,sub; JMP D,add; ... sub: SUB R4,R5; add: ADI R3,5;
can be rescheduled as
ADD R1,R2; JCD R2 > 10,sub; ADI R3,5; (delay slot) ... sub: SUB R4,R5;
33 4.3 Choose an instruction from the anti-target (fall-through path) of the branch, but make sure that the moved instruction is executable when the branch is taken. Example:
ADD R1,R2; JCD R2 > 10,exit; ADD R3,R2; exit: SUB R4,R5; // ADD R4,R3;
can be rescheduled as
ADD R1,R2; JCD R2 > 10,exit; ADD R3,R2; (delay slot; schedule it for execution only if it does not alter the program flow or output) exit: SUB R4,R5;
34 Dynamic Branch Prediction: Dynamic branch prediction relies on the history of how branch conditions were resolved in the past. The history of branches is kept in a buffer. To keep this buffer reasonably small and easy to access, it is indexed by some fixed number of low-order bits of the address of the branch instruction in the program space. The assumption is that these low-order address bits are unique enough to prevent frequent collisions or overrides. Thus, if we are trying to predict branches in a program that remains within a block of 256 locations, 8 bits should suffice. [Figure: branch instructions at addresses between x and x + 256 map to distinct buffer entries via their low-order 8 bits.]
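The low-order-bit indexing described above can be sketched as follows (a hypothetical `history_index` helper; with an 8-bit index, branch addresses 256 locations apart map to the same entry and override each other, which is exactly the collision the slide wants to avoid):

```python
# Index a 256-entry branch-history buffer by the low-order 8 bits
# of the branch instruction's address.
BUFFER_SIZE = 256

def history_index(branch_addr):
    return branch_addr % BUFFER_SIZE   # keep only the low-order 8 bits

x = 0x4000
print(history_index(x) == history_index(x + 256))   # True: a collision
print(history_index(x) == history_index(x + 100))   # False: distinct entries
```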
35 Branch instructions in the instruction cache include a branch prediction field that is used to predict whether the branch will be taken:
Memory location  Program             Branch prediction field
x                Branch instruction  0 (branch was not taken)
x+4              -                   -
x+8              Branch instruction  0 (branch was not taken)
x+12             -                   -
x+16             -                   -
x+20             Branch instruction  1 (branch was taken)
36 Branch prediction: In the simplest case, the field is a 1-bit tag: 0 <=> branch was not taken last time (state A); 1 <=> branch was taken last time (state B). State A moves to B on a taken branch and stays in A on a not-taken branch; state B moves to A on a not-taken branch and stays in B on a taken branch. While in state A, predict the branch as not to be taken; while in state B, predict the branch as to be taken.
37 This works relatively well: it accurately predicts the branches in loops in all but at most two of the iterations.
CLR R0;
loop: LDD R1,R0;
ADD R1,1;
ST+ R1,R0;
JCD R0 < 10,loop;
Assuming that we begin in state A, prediction fails when R0 = 1 (branch is not taken when it should be) and when R0 = 10 (branch is taken when it should not be). Assuming that we begin in state B, prediction fails only when R0 = 10 (branch is taken when it should not be).
38 We can modify the loop to make the branch prediction algorithm fail twice when we begin in state B as well.
CLR R0;
loop: LDD R1,R0;
ADD R1,1;
ST+ R1,R0;
JCD R0 >= 10,exit;
JMP D,loop;
exit:
Assuming that we begin in state B, prediction fails when R0 = 1 (branch is taken when it should not be) and when R0 = 10 (branch is not taken when it should be).
39 What is worse is that we can make this branch prediction algorithm fail each time it makes a prediction:
LDI R0,1;
loop: JCD R0 > 0,neg;
LDI R0,1;
JMP D,loop;
neg: LDI R0,-1;
JMP D,loop;
Assuming that we begin in state A, prediction fails when R0 = 1 (branch is not taken when it should be), then when R0 = -1 (branch is taken when it should not be), then when R0 = 1 again (branch is not taken when it should be), and so on.
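The 1-bit scheme of the last three slides is easy to simulate; the sketch below (hypothetical helper names) reproduces the failure counts claimed above: 2 misses for the loop branch starting in state A, 1 starting in state B, and a miss on every prediction for the pathological alternating branch:

```python
# 1-bit predictor: predict whatever the branch did last time.
# start_taken=False corresponds to state A, True to state B.
def mispredictions_1bit(outcomes, start_taken=False):
    predict, misses = start_taken, 0
    for taken in outcomes:
        if taken != predict:
            misses += 1
        predict = taken            # remember only the last outcome
    return misses

loop = [True] * 9 + [False]        # JCD R0 < 10 for R0 = 1..10
print(mispredictions_1bit(loop, start_taken=False))  # 2 (start in state A)
print(mispredictions_1bit(loop, start_taken=True))   # 1 (start in state B)

alternating = [True, False] * 5    # the pathological example above
print(mispredictions_1bit(alternating, start_taken=False))  # 10: every one
```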
40 2-bit prediction (a more reluctant flip in decision): the four states A1, A2, B2, B1 form a saturating chain. A taken branch moves the state toward B1 (A1 -> A2 -> B2 -> B1), and a not-taken branch moves it toward A1 (B1 -> B2 -> A2 -> A1). While in states A1 and A2, predict the branch as not to be taken; while in states B1 and B2, predict the branch as to be taken. Two consecutive mispredictions are thus needed before the prediction reverses.
41 CLR R0;
loop: LDD R1,R0;
ADD R1,1;
ST+ R1,R0;
JCD R0 < 10,loop;
Assuming that we begin in state A1, prediction fails when R0 = 1, 2 (branch is not taken when it should be) and when R0 = 10 (branch is taken when it should not be). Assuming that we begin in state B1, prediction fails only when R0 = 10 (branch is taken when it should not be).
42 2-bit predictors are more resilient to branch inversions (predictions are reversed only when they are missed twice):
LDI R0,1;
loop: JCD R0 > 0,neg;
LDI R0,1;
JMP D,loop;
neg: LDI R0,-1;
JMP D,loop;
Assuming that we begin in state B1, prediction succeeds when R0 = 1 (branch is taken when it should be), fails when R0 = -1 (branch is taken when it should not be), succeeds when R0 = 1, fails when R0 = -1, and so on.
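The same experiment with a 2-bit saturating counter (one common encoding of the four states above: 0 = A1, 1 = A2 predict not taken; 2 = B2, 3 = B1 predict taken) reproduces the counts on these slides:

```python
# 2-bit saturating counter predictor: states 0..3, predict taken iff state >= 2.
def mispredictions_2bit(outcomes, state):
    misses = 0
    for taken in outcomes:
        if (state >= 2) != taken:
            misses += 1
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses

loop = [True] * 9 + [False]          # JCD R0 < 10 for R0 = 1..10
print(mispredictions_2bit(loop, 0))  # 3: fails at R0 = 1, 2 and 10 (from A1)
print(mispredictions_2bit(loop, 3))  # 1: fails only at R0 = 10 (from B1)

alternating = [True, False] * 5      # the branch-inversion example
print(mispredictions_2bit(alternating, 3))  # 5: only the not-taken ones miss
```

On the alternating branch the prediction never flips, so only half the predictions miss, compared with all of them for the 1-bit scheme.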
43 Amdahl's Law (Fixed-Load Speed-up): Let q be the fraction of a load L that cannot be sped up by introducing more processors, and let T(p) be the amount of time it takes to execute L on p processors, assuming the parallelizable work scales linearly, p >= 1. Then
T(p) >= q T(1) + (1 - q) T(1) / p
S(p) = T(1) / T(p) <= 1 / (q + (1 - q)/p) -> 1/q as p -> infinity.
All this means is that the maximum speed-up of a system is limited by the fraction of the work that must be completed sequentially. The execution time of the work using p processors can be reduced to q T(1) under the best of circumstances, and the speed-up cannot exceed 1/q.
44 Example: A 4-processor computer executes instructions that are fetched from a random-access memory over a shared bus, as shown below:
45 The task to be performed is divided into two parts:
1. Fetch instruction (serial part): it takes 30 microseconds.
2. Execute instruction (parallel part): it takes 10 microseconds.
So the serial fraction is q = 30/40 = 0.75, and S(4) = T(1)/T(4) = 1/(0.75 + 0.25/4) = 4/3.25 = 1.23
46 Now, suppose that the number of processors is doubled. Then S(8) = T(1)/T(8) = 1/(0.75 + 0.25/8) = 8/6.25 = 1.28. Suppose that the number of processors is doubled again. Then S(16) = T(1)/T(16) = 1/(0.75 + 0.25/16) = 16/12.25 = 1.31.
47 What is the limit? S(p) = T(1)/T(p) = 1/(0.75 + 0.25/p) -> 1/0.75 = 1.33 as p -> infinity.
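The whole shared-bus example can be reproduced with a two-line Amdahl helper (a sketch; q = 30/40 per the serial fetch time above):

```python
# Amdahl's Law: serial fraction q, parallel fraction (1 - q) spread over p.
def amdahl(q, p):
    return 1 / (q + (1 - q) / p)

q = 30 / 40                            # 0.75: the serial fetch fraction
for p in (4, 8, 16):
    print(p, round(amdahl(q, p), 2))   # 1.23, 1.28, 1.31
print(round(1 / q, 2))                 # 1.33: the limit as p grows
```

Quadrupling the processor count from 4 to 16 buys less than 7% more speed-up, which is the point of the slide.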
48 Alternate Forms of Amdahl's Law:
S = T(1) / (T_unenhanced + T_enhanced) = T(1) / (T(1)(q + (1 - q)/s)) = 1 / (q + (1 - q)/s) -> 1/q as s -> infinity,
where s is the speed-up of the part of the computation that can be enhanced and q is the fraction that cannot.
49 Example: Suppose that you've upgraded your computer from a 2 GHz processor to a 4 GHz processor. What is the maximum speed-up you can expect in executing a typical program, assuming that (1) the speed of fetching each instruction is directly proportional to the speed of reading an instruction from the primary memory of your computer, and reading an instruction takes four times longer than executing it, and (2) the speed of executing each instruction is directly proportional to the clock speed of the processor of your computer? Using Amdahl's Law with q = 0.8 and s = 2, we have S = 1/(0.8 + 0.2/2) = 1/0.9 = 1.11. Very disappointing, as you are likely to have paid quite a bit of money for the upgrade!
50 Generalized Amdahl's Law: In general, a task may be partitioned into a set of subtasks, with each subtask requiring a designated number of processors to execute. In this case, the speed-up of the parallel execution of the task over its sequential execution can be characterized by the following, more general formula:
S(p_1, p_2, ..., p_k) = T(1) / T(p_1, p_2, ..., p_k) <= T(1) / (q_1 T(1)/p_1 + q_2 T(1)/p_2 + ... + q_k T(1)/p_k) = 1 / (q_1/p_1 + q_2/p_2 + ... + q_k/p_k),
where q_1 + q_2 + ... + q_k = 1.
When k = 2, q_1 = q, q_2 = 1 - q, p_1 = 1, p_2 = p, this formula reduces to Amdahl's Law.
51 Remark: The generalized Amdahl's Law can also be rewritten to express the speed-up due to different amounts of speed enhancement s_i that can be made to different parts of a system:
S_e(s_1, s_2, ..., s_k) = T(1) / T(s_1, s_2, ..., s_k) = T(1) / (q_1 T(1)/s_1 + q_2 T(1)/s_2 + ... + q_k T(1)/s_k) = 1 / (q_1/s_1 + q_2/s_2 + ... + q_k/s_k),
where q_1 + q_2 + ... + q_k = 1.
52 Example: Suppose that your computer executes a program that has the following profile of execution: (a) 30% integer operations, (b) 20% floating-point operations, (c) 50% memory-reference instructions. How much speed-up will you expect if you double the speed of the floating-point unit of your computer? Using the formula above: S_e = 1/(0.3/1 + 0.2/2 + 0.5/1) = 1/0.9 = 1.1
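The generalized formula lends itself to a direct sketch (hypothetical `speedup` helper; the fractions must sum to 1, and an unenhanced part has s_i = 1):

```python
# Generalized Amdahl speed-up: fraction q_i of the work is sped up by s_i.
def speedup(fractions, speedups):
    return 1 / sum(q / s for q, s in zip(fractions, speedups))

# Profile above: 30% integer, 20% floating point, 50% memory reference;
# only the floating-point unit is doubled (s = 2), the rest stay at s = 1.
s = speedup([0.3, 0.2, 0.5], [1, 2, 1])
print(round(s, 2))   # 1.11
```

Doubling a unit that accounts for only 20% of execution time removes at most 10% of the total time, hence the modest 1.1x result.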
53 Example: Suppose that you have a fixed budget of $500 to upgrade each of the computers in your laboratory, and you find out that the computations you perform on your computers require (a) 40% integer operations and (b) 60% floating-point operations. If every dollar spent on the integer unit after the first $50 decreases its execution time by 2%, and every dollar spent on the floating-point unit after the first $100 decreases its execution time by 1%, how would you spend the $500?
54 Example (Continued):
S = T(1) / (T_i(x_1) + T_f(x_2)), where x_1 + x_2 = 350 (the $500 minus the $50 and $100 fixed costs)
T_i(x_1) = (1 - 0.02) T_i(x_1 - 1), so T_i(x_1) = 0.98^x_1 T_i(0)
T_f(x_2) = (1 - 0.01) T_f(x_2 - 1), so T_f(x_2) = 0.99^x_2 T_f(0)
T_i(0) = 0.4 T(1), T_f(0) = 0.6 T(1)
Substituting these into the generalized Amdahl speed-up expression gives:
S = T(1) / (0.98^x_1 x 0.4 T(1) + 0.99^x_2 x 0.6 T(1)) = 1 / (0.4 x 0.98^x_1 + 0.6 x 0.99^x_2)
55 Example (Continued): So we maximize 1 / (0.4 x 0.98^x_1 + 0.6 x 0.99^x_2) subject to x_1 + x_2 = 350, or equivalently minimize 0.4 x 0.98^x_1 + 0.6 x 0.99^(350 - x_1) subject to 0 <= x_1 <= 350.
56 Example (Continued): Computing the values in the neighborhood of x_1 = 120 reveals that the speed-up is maximized when x_1 = 126. From Mathematica:
Table[1/(0.4 * 0.98^x + 0.6 * 0.99^(350 - x)), {x, 120, 128, 1}]
{ ..., 10.574}
Note: It is possible to have a higher speed-up with all of the money invested in one of the units if the fixed cost for one of the units becomes sufficiently large.
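Rather than tabulating by hand, the whole budget split can be searched by brute force (a Python sketch of the slide's objective; `total_speedup` is a hypothetical name):

```python
# Search every split of the remaining $350 (the $500 budget minus the
# $50 + $100 fixed costs) between the integer and floating-point units.
# Model from the slides: T_i shrinks by 2% per dollar, T_f by 1% per dollar.
def total_speedup(x1):
    x2 = 350 - x1
    return 1 / (0.4 * 0.98 ** x1 + 0.6 * 0.99 ** x2)

best = max(range(351), key=total_speedup)
print(best)                           # 126, matching the slide
print(round(total_speedup(best), 2))  # 10.58
```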
57 Addendum: If the changes in performance due to upgrades are specified in terms of speed rather than time, we can then use the following formulation. With t = L/s,
Δt/Δx = (Δt/Δs)(Δs/Δx) = -(L/s^2)(Δs/Δx) = -(t/s)(Δs/Δx),
so for a one-dollar increment (Δx = 1),
Δt = T(x) - T(x - 1) = -T(x - 1)(Δs/s), i.e., T(x) = (1 - Δs/s) T(x - 1),
where Δs/s denotes the percentage change in speed.
Register Alloca.on CMPT 379: Compilers Instructor: Anoop Sarkar anoopsarkar.github.io/compilers-class 1 Register Alloca.on Intermediate code uses unlimited temporaries Simplifying code genera.on and op.miza.on
More informationLogic and Computer Design Fundamentals. Chapter 8 Sequencing and Control
Logic and Computer Design Fundamentals Chapter 8 Sequencing and Control Datapath and Control Datapath - performs data transfer and processing operations Control Unit - Determines enabling and sequencing
More informationChapter 5. Digital Design and Computer Architecture, 2 nd Edition. David Money Harris and Sarah L. Harris. Chapter 5 <1>
Chapter 5 Digital Design and Computer Architecture, 2 nd Edition David Money Harris and Sarah L. Harris Chapter 5 Chapter 5 :: Topics Introduction Arithmetic Circuits umber Systems Sequential Building
More informationLecture 9: Control Hazard and Resolution. James C. Hoe Department of ECE Carnegie Mellon University
18 447 Lectre 9: Control Hazard and Resoltion James C. Hoe Department of ECE Carnegie ellon University 18 447 S18 L09 S1, James C. Hoe, CU/ECE/CALC, 2018 Yor goal today Hosekeeping simple control flow
More informationCS-683: Advanced Computer Architecture Course Introduction
CS-683: Advanced Computer Architecture Course Introduction Virendra Singh Associate Professor Computer Architecture and Dependable Systems Lab Department of Electrical Engineering Indian Institute of Technology
More informationMeasurement & Performance
Measurement & Performance Timers Performance measures Time-based metrics Rate-based metrics Benchmarking Amdahl s law Topics 2 Page The Nature of Time real (i.e. wall clock) time = User Time: time spent
More informationMeasurement & Performance
Measurement & Performance Topics Timers Performance measures Time-based metrics Rate-based metrics Benchmarking Amdahl s law 2 The Nature of Time real (i.e. wall clock) time = User Time: time spent executing
More informationPerformance, Power & Energy. ELEC8106/ELEC6102 Spring 2010 Hayden Kwok-Hay So
Performance, Power & Energy ELEC8106/ELEC6102 Spring 2010 Hayden Kwok-Hay So Recall: Goal of this class Performance Reconfiguration Power/ Energy H. So, Sp10 Lecture 3 - ELEC8106/6102 2 PERFORMANCE EVALUATION
More informationDepartment of Electrical and Computer Engineering University of Wisconsin - Madison. ECE/CS 752 Advanced Computer Architecture I.
Last (family) name: Solution First (given) name: Student I.D. #: Department of Electrical and Computer Engineering University of Wisconsin - Madison ECE/CS 752 Advanced Computer Architecture I Midterm
More informationEECS150. Arithmetic Circuits
EE5 ection 8 Arithmetic ircuits Fall 2 Arithmetic ircuits Excellent Examples of ombinational Logic Design Time vs. pace Trade-offs Doing things fast may require more logic and thus more space Example:
More informationLecture 12: Pipelined Implementations: Control Hazards and Resolutions
18-447 Lectre 12: Pipelined Implementations: Control Hazards and Resoltions S 09 L12-1 James C. Hoe Dept of ECE, CU arch 2, 2009 Annoncements: Spring break net week!! Project 2 de the week after spring
More informationTEST 1 REVIEW. Lectures 1-5
TEST 1 REVIEW Lectures 1-5 REVIEW Test 1 will cover lectures 1-5. There are 10 questions in total with the last being a bonus question. The questions take the form of short answers (where you are expected
More informationEECS150 - Digital Design Lecture 11 - Shifters & Counters. Register Summary
EECS50 - Digital Design Lecture - Shifters & Counters February 24, 2003 John Wawrzynek Spring 2005 EECS50 - Lec-counters Page Register Summary All registers (this semester) based on Flip-flops: q 3 q 2
More informationCounters. We ll look at different kinds of counters and discuss how to build them
Counters We ll look at different kinds of counters and discuss how to build them These are not only examples of sequential analysis and design, but also real devices used in larger circuits 1 Introducing
More informationLoop Scheduling and Software Pipelining \course\cpeg421-08s\topic-7.ppt 1
Loop Scheduling and Software Pipelining 2008-04-24 \course\cpeg421-08s\topic-7.ppt 1 Reading List Slides: Topic 7 and 7a Other papers as assigned in class or homework: 2008-04-24 \course\cpeg421-08s\topic-7.ppt
More informationClock-driven scheduling
Clock-driven scheduling Also known as static or off-line scheduling Michal Sojka Czech Technical University in Prague, Faculty of Electrical Engineering, Department of Control Engineering November 8, 2017
More informationPerformance of Computers. Performance of Computers. Defining Performance. Forecast
Performance of Computers Which computer is fastest? Not so simple scientific simulation - FP performance program development - Integer performance commercial work - I/O Performance of Computers Want to
More informationCPE100: Digital Logic Design I
Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu CPE100: Digital Logic Design I Final Review http://www.ee.unlv.edu/~b1morris/cpe100/ 2 Logistics Tuesday Dec 12 th 13:00-15:00 (1-3pm) 2 hour
More informationECE290 Fall 2012 Lecture 22. Dr. Zbigniew Kalbarczyk
ECE290 Fall 2012 Lecture 22 Dr. Zbigniew Kalbarczyk Today LC-3 Micro-sequencer (the control store) LC-3 Micro-programmed control memory LC-3 Micro-instruction format LC -3 Micro-sequencer (the circuitry)
More informationA Second Datapath Example YH16
A Second Datapath Example YH16 Lecture 09 Prof. Yih Huang S365 1 A 16-Bit Architecture: YH16 A word is 16 bit wide 32 general purpose registers, 16 bits each Like MIPS, 0 is hardwired zero. 16 bit P 16
More informationSystem Data Bus (8-bit) Data Buffer. Internal Data Bus (8-bit) 8-bit register (R) 3-bit address 16-bit register pair (P) 2-bit address
Intel 8080 CPU block diagram 8 System Data Bus (8-bit) Data Buffer Registry Array B 8 C Internal Data Bus (8-bit) F D E H L ALU SP A PC Address Buffer 16 System Address Bus (16-bit) Internal register addressing:
More informationImplementing the Controller. Harvard-Style Datapath for DLX
6.823, L6--1 Implementing the Controller Laboratory for Computer Science M.I.T. http://www.csg.lcs.mit.edu/6.823 6.823, L6--2 Harvard-Style Datapath for DLX Src1 ( j / ~j ) Src2 ( R / RInd) RegWrite MemWrite
More informationComputer Architecture. ECE 361 Lecture 5: The Design Process & ALU Design. 361 design.1
Computer Architecture ECE 361 Lecture 5: The Design Process & Design 361 design.1 Quick Review of Last Lecture 361 design.2 MIPS ISA Design Objectives and Implications Support general OS and C- style language
More informationLecture 3, Performance
Lecture 3, Performance Repeating some definitions: CPI Clocks Per Instruction MHz megahertz, millions of cycles per second MIPS Millions of Instructions Per Second = MHz / CPI MOPS Millions of Operations
More informationCprE 281: Digital Logic
CprE 28: Digital Logic Instructor: Alexander Stoytchev http://www.ece.iastate.edu/~alexs/classes/ Simple Processor CprE 28: Digital Logic Iowa State University, Ames, IA Copyright Alexander Stoytchev Digital
More informationEE 660: Computer Architecture Out-of-Order Processors
EE 660: Computer Architecture Out-of-Order Processors Yao Zheng Department of Electrical Engineering University of Hawaiʻi at Mānoa Based on the slides of Prof. David entzlaff Agenda I4 Processors I2O2
More informationL07-L09 recap: Fundamental lesson(s)!
L7-L9 recap: Fundamental lesson(s)! Over the next 3 lectures (using the IPS ISA as context) I ll explain:! How functions are treated and processed in assembly! How system calls are enabled in assembly!
More informationTDDB68 Concurrent programming and operating systems. Lecture: CPU Scheduling II
TDDB68 Concurrent programming and operating systems Lecture: CPU Scheduling II Mikael Asplund, Senior Lecturer Real-time Systems Laboratory Department of Computer and Information Science Copyright Notice:
More informationPriority-driven Scheduling of Periodic Tasks (1) Advanced Operating Systems (M) Lecture 4
Priority-driven Scheduling of Periodic Tasks (1) Advanced Operating Systems (M) Lecture 4 Priority-driven Scheduling Assign priorities to jobs, based on their deadline or other timing constraint Make scheduling
More informationParallel Performance Theory - 1
Parallel Performance Theory - 1 Parallel Computing CIS 410/510 Department of Computer and Information Science Outline q Performance scalability q Analytical performance measures q Amdahl s law and Gustafson-Barsis
More informationMicro-architecture Pipelining Optimization with Throughput- Aware Floorplanning
Micro-architecture Pipelining Optimization with Throughput- Aware Floorplanning Yuchun Ma* Zhuoyuan Li* Jason Cong Xianlong Hong Glenn Reinman Sheqin Dong* Qiang Zhou *Department of Computer Science &
More informationDSP Design Lecture 7. Unfolding cont. & Folding. Dr. Fredrik Edman.
SP esign Lecture 7 Unfolding cont. & Folding r. Fredrik Edman fredrik.edman@eit.lth.se Unfolding Unfolding creates a program with more than one iteration, J=unfolding factor Unfolding is a structured way
More informationEE382 Processor Design Winter 1999 Chapter 2 Lectures Clocking and Pipelining
Slide 1 EE382 Processor Design Winter 1999 Chapter 2 Lectures Clocking and Pipelining Slide 2 Topics Clocking Clock Parameters Latch Types Requirements for reliable clocking Pipelining Optimal pipelining
More informationEECS150 - Digital Design Lecture 23 - FFs revisited, FIFOs, ECCs, LSFRs. Cross-coupled NOR gates
EECS150 - Digital Design Lecture 23 - FFs revisited, FIFOs, ECCs, LSFRs April 16, 2009 John Wawrzynek Spring 2009 EECS150 - Lec24-blocks Page 1 Cross-coupled NOR gates remember, If both R=0 & S=0, then
More informationDigital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.
Digital Integrated Circuits A Design Perspective Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic Arithmetic Circuits January, 2003 1 A Generic Digital Processor MEMORY INPUT-OUTPUT CONTROL DATAPATH
More informationBranch Prediction using Advanced Neural Methods
Branch Prediction using Advanced Neural Methods Sunghoon Kim Department of Mechanical Engineering University of California, Berkeley shkim@newton.berkeley.edu Abstract Among the hardware techniques, two-level
More informationSpiral 2-1. Datapath Components: Counters Adders Design Example: Crosswalk Controller
2-. piral 2- Datapath Components: Counters s Design Example: Crosswalk Controller 2-.2 piral Content Mapping piral Theory Combinational Design equential Design ystem Level Design Implementation and Tools
More informationCMPEN 411 VLSI Digital Circuits Spring Lecture 19: Adder Design
CMPEN 411 VLSI Digital Circuits Spring 2011 Lecture 19: Adder Design [Adapted from Rabaey s Digital Integrated Circuits, Second Edition, 2003 J. Rabaey, A. Chandrakasan, B. Nikolic] Sp11 CMPEN 411 L19
More informationLecture 3, Performance
Repeating some definitions: Lecture 3, Performance CPI MHz MIPS MOPS Clocks Per Instruction megahertz, millions of cycles per second Millions of Instructions Per Second = MHz / CPI Millions of Operations
More informationCMSC 451: Lecture 7 Greedy Algorithms for Scheduling Tuesday, Sep 19, 2017
CMSC CMSC : Lecture Greedy Algorithms for Scheduling Tuesday, Sep 9, 0 Reading: Sects.. and. of KT. (Not covered in DPV.) Interval Scheduling: We continue our discussion of greedy algorithms with a number
More informationCMP 338: Third Class
CMP 338: Third Class HW 2 solution Conversion between bases The TINY processor Abstraction and separation of concerns Circuit design big picture Moore s law and chip fabrication cost Performance What does
More informationIntroduction The Nature of High-Performance Computation
1 Introduction The Nature of High-Performance Computation The need for speed. Since the beginning of the era of the modern digital computer in the early 1940s, computing power has increased at an exponential
More informationReview: Designing with FSM. EECS Components and Design Techniques for Digital Systems. Lec09 Counters Outline.
Review: Designing with FSM EECS 150 - Components and Design Techniques for Digital Systems Lec09 Counters 9-28-04 David Culler Electrical Engineering and Computer Sciences University of California, Berkeley
More informationProcessor Design & ALU Design
3/8/2 Processor Design A. Sahu CSE, IIT Guwahati Please be updated with http://jatinga.iitg.ernet.in/~asahu/c22/ Outline Components of CPU Register, Multiplexor, Decoder, / Adder, substractor, Varity of
More informationECEN 248: INTRODUCTION TO DIGITAL SYSTEMS DESIGN. Week 9 Dr. Srinivas Shakkottai Dept. of Electrical and Computer Engineering
ECEN 248: INTRODUCTION TO DIGITAL SYSTEMS DESIGN Week 9 Dr. Srinivas Shakkottai Dept. of Electrical and Computer Engineering TIMING ANALYSIS Overview Circuits do not respond instantaneously to input changes
More informationTiming analysis and timing predictability
Timing analysis and timing predictability Caches in WCET Analysis Reinhard Wilhelm 1 Jan Reineke 2 1 Saarland University, Saarbrücken, Germany 2 University of California, Berkeley, USA ArtistDesign Summer
More informationCaches in WCET Analysis
Caches in WCET Analysis Jan Reineke Department of Computer Science Saarland University Saarbrücken, Germany ARTIST Summer School in Europe 2009 Autrans, France September 7-11, 2009 Jan Reineke Caches in
More informationChe-Wei Chang Department of Computer Science and Information Engineering, Chang Gung University
Che-Wei Chang chewei@mail.cgu.edu.tw Department of Computer Science and Information Engineering, Chang Gung University } 2017/11/15 Midterm } 2017/11/22 Final Project Announcement 2 1. Introduction 2.
More informationPipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12 (2) Lecture notes from MKP, H. H. Lee and S.
Pipelined Datapath Lectre notes from KP, H. H. Lee and S. Yalamanchili Sections 4.5 4. Practice Problems:, 3, 8, 2 Reading (2) Pipeline Performance Assme time for stages is v ps for register read or write
More informationPattern History Table. Global History Register. Pattern History Table. Branch History Pattern Pattern History Bits
An Enhanced Two-Level Adaptive Multiple Branch Prediction for Superscalar Processors Jong-bok Lee, Soo-Mook Moon and Wonyong Sung fjblee@mpeg,smoon@altair,wysung@dspg.snu.ac.kr School of Electrical Engineering,
More informationA Detailed Study on Phase Predictors
A Detailed Study on Phase Predictors Frederik Vandeputte, Lieven Eeckhout, and Koen De Bosschere Ghent University, Electronics and Information Systems Department Sint-Pietersnieuwstraat 41, B-9000 Gent,
More informationVLSI Physical Design Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur
VLSI Physical Design Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture - 54 Design for Testability So, in the last lecture we talked
More informationEECS150 - Digital Design Lecture 21 - Design Blocks
EECS150 - Digital Design Lecture 21 - Design Blocks April 3, 2012 John Wawrzynek Spring 2012 EECS150 - Lec21-db3 Page 1 Fixed Shifters / Rotators fixed shifters hardwire the shift amount into the circuit.
More information