Department of Electrical and Computer Engineering The University of Texas at Austin

Similar documents
CMU Introduction to Computer Architecture, Spring 2015 HW 2: ISA Tradeoffs, Microprogramming and Pipelining

Design of Digital Circuits Lecture 14: Microprogramming. Prof. Onur Mutlu ETH Zurich Spring April 2017

ECE290 Fall 2012 Lecture 22. Dr. Zbigniew Kalbarczyk

Combinational vs. Sequential. Summary of Combinational Logic. Combinational device/circuit: any circuit built using the basic gates Expressed as

Computer Architecture Lecture 8: Pipelining. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 2/4/2013

Figure 4.9 MARIE s Datapath

UNIVERSITY OF WISCONSIN MADISON

ECE 3401 Lecture 23. Pipeline Design. State Table for 2-Cycle Instructions. Control Unit. ISA: Instruction Specifications (for reference)

4. (3) What do we mean when we say something is an N-operand machine?

3. (2) What is the difference between fixed and hybrid instructions?

Project Two RISC Processor Implementation ECE 485

CPSC 3300 Spring 2017 Exam 2

[2] Predicting the direction of a branch is not enough. What else is necessary?

Computer Engineering Department. CC 311- Computer Architecture. Chapter 4. The Processor: Datapath and Control. Single Cycle

Pipelining. Traditional Execution. CS 365 Lecture 12 Prof. Yih Huang. add ld beq CS CS 365 2

1. (2 )Clock rates have grown by a factor of 1000 while power consumed has only grown by a factor of 30. How was this accomplished?

CSCI-564 Advanced Computer Architecture

CHAPTER log 2 64 = 6 lines/mux or decoder 9-2.* C = C 8 V = C 8 C * 9-4.* (Errata: Delete 1 after problem number) 9-5.

[2] Predicting the direction of a branch is not enough. What else is necessary?

CENG3420 Lab 3-1: LC-3b Datapath

CMP 338: Third Class

Administrivia. Course Objectives. Overview. Lecture Notes Week markem/cs333/ 2. Staff. 3. Prerequisites. 4. Grading. 1. Theory and application

CSE Computer Architecture I

ECEN 651: Microprogrammed Control of Digital Systems Department of Electrical and Computer Engineering Texas A&M University

Logic and Computer Design Fundamentals. Chapter 8 Sequencing and Control

Enrico Nardelli Logic Circuits and Computer Architecture

CMP 334: Seventh Class

Verilog HDL:Digital Design and Modeling. Chapter 11. Additional Design Examples. Additional Figures

Basic Computer Organization and Design Part 3/3

Unit 6: Branch Prediction

System Data Bus (8-bit) Data Buffer. Internal Data Bus (8-bit) 8-bit register (R) 3-bit address 16-bit register pair (P) 2-bit address

CS/ECE 252: INTRODUCTION TO COMPUTER ENGINEERING COMPUTER SCIENCES DEPARTMENT UNIVERSITY OF WISCONSIN-MADISON

Simple Instruction-Pipelining (cont.) Pipelining Jumps

Design at the Register Transfer Level

Portland State University ECE 587/687. Branch Prediction

Review. Combined Datapath

Table of Content. Chapter 11 Dedicated Microprocessors Page 1 of 25

Simple Instruction-Pipelining. Pipelined Harvard Datapath

Lecture: Pipelining Basics

Worst-Case Execution Time Analysis. LS 12, TU Dortmund

CprE 281: Digital Logic

On my honor, as an Aggie, I have neither given nor received unauthorized aid on this academic work

A Second Datapath Example YH16

CSE. 1. In following code. addi. r1, skip1 xor in r2. r3, skip2. counter r4, top. taken): PC1: PC2: PC3: TTTTTT TTTTTT

Fall 2011 Prof. Hyesoon Kim

Laboratory Exercise #11 A Simple Digital Combination Lock

CMP N 301 Computer Architecture. Appendix C

Computer Architecture

Building a Computer. Quiz #2 on 10/31, open book and notes. (This is the last lecture covered) I wonder where this goes? L16- Building a Computer 1

ECE20B Final Exam, 200 Point Exam Closed Book, Closed Notes, Calculators Not Allowed June 12th, Name

CSE 200 Lecture Notes Turing machine vs. RAM machine vs. circuits

COVER SHEET: Problem#: Points

Virtualization. Introduction. G. Lettieri A/A 2014/15. Dipartimento di Ingegneria dell Informazione Università di Pisa. G. Lettieri Virtualization

Load. Load. Load 1 0 MUX B. MB select. Bus A. A B n H select S 2:0 C S. G select 4 V C N Z. unit (ALU) G. Zero Detect.

Lecture 3, Performance

Practice Homework Solution for Module 4

Lecture 3, Performance

Latches. October 13, 2003 Latches 1

INF2270 Spring Philipp Häfliger. Lecture 8: Superscalar CPUs, Course Summary/Repetition (1/2)

Processor Design & ALU Design

ALU A functional unit

Computer Organization I CRN Test 3/Version 1 CMSC 2833 CRN Spring 2017

NCU EE -- DSP VLSI Design. Tsung-Han Tsai 1

Microprocessor Power Analysis by Labeled Simulation

EE201l Homework # 6. Microprogrammed Control Unit Design

Simple Instruction-Pipelining. Pipelined Harvard Datapath

Measurement & Performance

Measurement & Performance

Automata Theory CS S-12 Turing Machine Modifications

Logic Design. CS 270: Mathematical Foundations of Computer Science Jeremy Johnson

Computer Architecture

L07-L09 recap: Fundamental lesson(s)!

EE 660: Computer Architecture Out-of-Order Processors

CPU DESIGN The Single-Cycle Implementation

Department of Electrical and Computer Engineering University of Wisconsin - Madison. ECE/CS 752 Advanced Computer Architecture I.

Shannon-Fano-Elias coding

Lecture 13: Sequential Circuits, FSM

EECS150 - Digital Design Lecture 11 - Shifters & Counters. Register Summary

CMPT-150-e1: Introduction to Computer Design Final Exam

EC 413 Computer Organization

2

Review for Final Exam

Lecture 13: Sequential Circuits, FSM

Computer Science. Questions for discussion Part II. Computer Science COMPUTER SCIENCE. Section 4.2.

Counters. We ll look at different kinds of counters and discuss how to build them

ECEN 449: Microprocessor System Design Department of Electrical and Computer Engineering Texas A&M University

State & Finite State Machines

Distributed Systems Part II Solution to Exercise Sheet 10

CA Compiler Construction

Philadelphia University Student Name: Student Number:

Models of Computation, Recall Register Machines. A register machine (sometimes abbreviated to RM) is specified by:

CMPEN 411 VLSI Digital Circuits Spring Lecture 19: Adder Design

Parallel Numerics. Scope: Revise standard numerical methods considering parallel computations!

CMPE12 - Notes chapter 1. Digital Logic. (Textbook Chapter 3)

Evaluating Overheads of Multi-bit Soft Error Protection Techniques at Hardware Level Sponsored by SRC and Freescale under SRC task number 2042

Introduction The Nature of High-Performance Computation

ECE QUALIFYING EXAM I (QE l) January 5, 2015

ICS 233 Computer Architecture & Assembly Language

Dynamic Semantics. Dynamic Semantics. Operational Semantics Axiomatic Semantics Denotational Semantic. Operational Semantics

State and Finite State Machines

Transcription:

Department of Electrical and Computer Engineering The University of Texas at Austin EE 360N, Fall 2004 Yale Patt, Instructor Aater Suleman, Huzefa Sanjeliwala, Dam Sunwoo, TAs Exam 1, October 6, 2004 Name: Problem 1 (20 points): Problem 2 (15 points): Problem 3 (20 points): Problem 4 (15 points): Problem 5 (30 points): Total (100 points): Note: Please be sure that your answers to all questions (and all supporting work that is required) are contained in the space provided. Note: Please be sure your name is recorded on each sheet of the exam. GOOD LUCK!

Problem 1 (20 points): Part a (4 points): The string move instruction copies a block of information from one region of memory to another. This instruction can usually be used to copy up to 64KB of data with a single instruction, although admittedly, it takes a long time to execute that instruction. This instruction is usually accomplished by using an inner loop that copies data, increments the two pointers, and decrements the counter indicating how much more is left to copy. This implementation detail (the inner loop) allows the string move instruction to be stopped after some but not all of the data has been copied, in order to service and interrupt, which improves interrupt latency. Part b (4 points): Some ISAs have a fixed length, uniform decode instruction format. Other ISAs opt for a variable length format. The advantages of a fixed length, uniform decode format (in 20 words or fewer): Decode logic is simpler The disadvantages of a fixed length, uniform decode format (in 20 words or fewer): Lower utilization of instruction bits. Limited number of opcodes, addressing modes, and operands. Please put both answers in the above boxes. Part c (4 points): The classical condition code mechanism gives you an extra piece of work tacked on to the instruction, thereby potentially decreasing the number of dynamic instructions needed to execute the program. A significant potential disadvantage to this mechanism is (in 20 words or fewer): The branch instruction needs to be executed right after the instruction that generates the condition codes. Part d (4 points): Critical path design requires you to look at the longest speed path and try to shorten it, thereby decreasing the cycle time. MIPS found on one of their early machines that the speed path associated with the cache access was twice as long as the ALU path and the control path. That is, the ALU was n nanoseconds, the control path 0.80n nanoseconds, and the cache access approximately 2n nanoseconds. The microarchitect set the cycle to what? How did he manage that? Answer in the box, 20 words or fewer: n nanoseconds. Make the cache access a two cycle operation. Part e (4 points): Mike Flynn observed that it did not matter how sophisticated we made our pipeline, or how many stages we had in it, we would never be able to complete on average more than one instruction each cycle as long as: (Finish your answer in the box in fewer than 20 words.) we fetch only one instruction per cycle. 2

Problem 2 (15 Points): Little Computer Inc. produces a machine in which all instructions execute in one cycle. However, the ISA does not include a multiply (MUL) instruction. To perform a multiplication, it takes 5 instructions from the existing ISA. Little Computer Inc. proposes a new ISA, which adds a MUL instruction. Everything else stays the same. In the new ISA, all instructions take one cycle, including the new MUL instruction. However, in order to accomplish this, the implementation of the proposed ISA has an increase in cycle time of 50%. Your job: to find out if the proposed ISA is a good idea. Part a (8 points): Little Computer Inc simulates the proposed ISA and finds that on a set of representative benchmarks, 20% of the instructions executed on average are MUL instructions. Would it be a good idea to add the MUL to the ISA? Explain. 0.2 x 5 + 0.8 = 1.8 Since 1.8 > 1.5, it is a good idea Part b (7 points): What if, instead of 20%, only 10% of the instructions exectuted are MUL instructions? Good idea or bad idea? Explain. 0.1 x 5 + 0.9 = 1.4 Since 1.4 < 1.5, it is a bad idea 3

Problem 3 (20 points): A two-way interleaved, byte addressable memory is shown in the figure. It takes TEN cycles after the MAR is loaded to load the MDR with the contents of the relevant memory locations. The processor supports unaligned memory accesses. 4

Problem 3 (Continued): Part a (4 points) LDW R3, B loads a 32-bit word into R3. Consider the case where the address B is an integer of the form (8 k + 2), where k is an integer. What is the number of cycles required to get the contents of B into the MDR, after the cycle in which the processor puts the requested address on the bus. Part b (4 points) Repeat part (a) where B is (8 k + 6), k is an integer. 10 11 Part c (12 points) Construct the truth table to implement the combinational logic that controls the unaligned memory system shown in the figure for read accesses. You DO NOT need to worry about the datasize BYTE. R/W SIZE 1st/2nd ADDR[2:0] LD MAR[3:0] ROT[1:0] SEL[3:0] R H 1 000 0011 00 xx00 R H 2 000 0000 xx xxxx R H 1 001 0011 01 x00x R H 2 001 0000 xx xxxx R H 1 010 0011 10 00xx R H 2 010 0000 xx xxxx R H 1 011 0011 11 0xx1 R H 2 011 0000 xx xxxx R H 1 100 0011 00 xx11 R H 2 100 0000 xx xxxx R H 1 101 0011 01 x11x R H 2 101 0000 xx xxxx R H 1 110 0011 10 11xx R H 2 110 0000 xx xxxx R H 1 111 0001 11 1xxx R H 2 111 0010 11 xxx0 R W 1 000 1111 00 0000 R W 2 000 0000 xx xxxx R W 1 001 1111 01 0001 R W 2 001 0000 xx xxxx R W 1 010 1111 10 0011 R W 2 010 0000 xx xxxx R W 1 011 1111 11 0111 R W 2 011 0000 xx xxxx R W 1 100 1111 00 1111 R W 2 100 0000 xx xxxx R W 1 101 0111 01 111x R W 2 101 1000 01 xxx0 R W 1 110 0011 10 11xx R W 2 110 1100 10 xx00 R W 1 111 0001 11 1xxx R W 2 111 1110 11 x000 5

Problem 4 (15 points): A KB virtual address space is made up of 1KB pages. For this anemic system, operating system code and data structures are contained within the process virtual address space. A snapshot of physical memory right now is shown below: Page 10 000000000000 Page 0 Page 4 Page Table 111111111111 All pages that are resident belong to this process. The process consists of 12 pages of virtual address space. Since the last time the reference bits in all PTEs were cleared, the following memory accesses were made: R-10, R-10, W-10, R-10, R-0, R-0, R-0, R-10, R-0, where R-n means Read from page n, W-m means Write to page m. 6

Problem 4 (Continued): Your job: Construct a snapshot of the COMPLETE page table for this process as it exists right now. Note, we have provided more than enough space to contain the COMPLETE page table. Use only what you need. Each page table entry should consist of the minimum integer number of bytes needed to do the job. You may assign fields to the bits of the PTE anyway you wish. Assume memory is protected on a page level of granularity. Assume two levels of privilege {Kernel, User}. Assume three kinds of access {None, Read, Write}. In the current process Kernel and User are allowed to read and write to all pages. Feel free to choose any code you feel is appropriate to represent the fact that Kernel and User can read and write to a page. V M R Prot PFN 0 1 0 1 0 0 0 0 1 Page Table Base Register 1 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 4 1 0 0 0 0 0 1 0 5 0 0 0 0 0 0 0 0 6 0 0 0 0 0 0 0 0 7 0 0 0 0 0 0 0 0 8 0 0 0 0 0 0 0 0 9 0 0 0 0 0 0 0 0 10 1 1 1 0 0 0 0 0 11 0 0 0 0 0 0 0 0 7

Problem 5 (30 points): We wish to add to the LC-3b the new instruction MWORDCPY, which copies the contents of k consecutive words to another set of k consecutive words. We will use the unused opcode 1010 for this purpose. Encoding 15 12 11 9 8 6 5 4 2 0 MWORDCPY DBaseR, SBaseR, CntR 1010 DBaseR SBaseR 0 0 0 CntR Operation while(cntr > 0){ MEM[DBaseR] = MEM[SBaseR]; DBaseR = DBaseR + 2; SBaseR = SBaseR + 2; CntR = CntR - 1; } We will assume for simplicity here that DBaseR and SBaseR are different registers and that CntR contains a nonnegative integer. You do not have to test for either of these. In order to implement this instruction, we have added some hardware to the LC-3b datapath and some new logic in the microsequencer. 8

Problem 5 (Continued): MEMORY OUTPUT INPUT KBDR ADDR. CTL. LOGIC MDR INMUX MAR L MAR[0] MAR[0] DATA.SIZE R DATA.SIZE D.MAR 2 KBSR MEM.EN R.W MIO.EN GatePC GateMARMUX LD.CC SR2MUX SEXT SEXT [8:0] [10:0] SEXT SEXT [5:0] +2 PC LD.PC + [7:0] LSHF1 [4:0] GateALU SHF GateSHF 6 IR[5:0] LOGIC GateMDR N Z P SR2 OUT SR1 OUT REG FILE MARMUX 3 0 R ADDR2MUX 2 ZEXT & LSHF1 3 ALU ALUK 2 A B PCMUX 2 SR1 SR2 LD.REG IR LD.IR CONTROL DDR DSR MIO.EN LOGIC LOGIC SIZE DATA. WE1WE0 [0] WE LOGIC ADDR1MUX 3 DR DRMUX IR[11:9] M 111 N O 0x0002 3 3 BMUX AMUX A_MUX B_MUX 9

Problem 5 (Continued): Part a (3 points) What are M, N, and O (see datapath on previous page). Be specific. M -1 or 0xFFFF N IR [8:6] O IR [2:0] Part b (2 points) What signal does X correspond to in the Microsequencer diagram shown below? (Hint: It is one of the condition code registers). Part c (5 points) What is the Control Store address for C in the state machine on the next page. P 26 COND2 COND1 COND0 X BEN R IR[11] Branch Ready Addr. Mode J[5] J[4] J[3] J[2] J[1] J[0] 0,0,IR[15:12] 6 IRD 6 Address of Next State 10

Problem 5 (Continued): Part d (9 points) We show below the begining of the state diagram necessary to implement MWORDCPY. Using the notation of the LC-3b State Diagram, add the bubbles you need to implement the MWORDCPY instruction. Describe inside each bubble what happens in each state. You should be able to implement this in fewer than 15 states. (A TA found a solution that required only 8 states). from 32 A CntR< CntR setcc 10 To 18 B [P] C MAR< SBaseR D MDR< M[MAR] R E MAR< DBaseR F M[MAR]< MDR R G DBaseR< DBaseR+2 H SBaseR< SBaseR+2 I CntR< CntR 1 setcc 11

Problem 5 (Continued): Part e (2 points) Give the values of the COND bits (COND0, COND1, COND2) for the state labeled B. COND2 1 COND1 0 COND0 0 Part f (9 points) The processing in each state you just added is controlled by asserting or negating each control signal. Enter a 1 or a 0 as appropriate for the microinstructions corresponding to the states you have added. LD.MAR LD.MDR LD.REG LD.CC GateMDR GateALU GateMARMUX SR1MUX AMUX BMUX DRMUX ADDR2MUX ADDR1MUX MARMUX ALUK R.W MIO.EN A B C D E F G H I 0 0 1 1 0 1 0 0 1 0 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 0 1 0 1 1 0 0 0 0 0 0 0 0 12