Fall 2011 Prof. Hyesoon Kim


add r1, r2, r3
sub r4, r1, r3
mul r5, r2, r3
[Figure: pipeline diagram of the FE, ID, EX, MEM, and WB stages (each followed by a latch, L) for the three instructions above, with add taking 2 cycles.]

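0x800          br target
0x804          add r1, r2, r3
0x900  target: sub r2, r3, r4

FE_stage PC (latch) per cycle:
cycle        1      2      3      4      5      6
PC (latch)   0x800  0x804  0x804  0x804  0x900  0x904
[Figure: FE, ID, EX, MEM, and WB stage contents per cycle. The PC holds at 0x804 in cycles 2 to 4 and is redirected to the target 0x900 in cycle 5.]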

- Eliminate branches: predication (more on this later)
- Delayed branch slot: SPARC, MIPS
- Dual-path execution (more on this later)
- Or predict?

Branches are very frequent
- Approx. 20% of all instructions
Cannot wait until we know where the branch goes
- Long pipelines: the branch outcome is known after B cycles, and there is no scheduling past the branch until the outcome is known
- Superscalars (e.g., 4-way): a branch every cycle or so! One cycle of work, then bubbles for ~B cycles?

Predict branches
- And predict them well!
Fetch, decode, etc. on the predicted path
- Option 1: no execution until the branch is resolved
- Option 2: execute anyway (speculation)
Recover from mispredictions
- Restart fetch from the correct path

Need to know two things
- Whether the branch is taken or not (direction)
- The target address if it is taken (target)
Direct jumps, function calls: unconditional branches
- Direction known (always taken), target easy to compute
Conditional branches (typically PC-relative)
- Direction difficult to predict, target easy to compute
Indirect jumps, function returns
- Direction known (always taken), target difficult to predict

Direction prediction is needed for conditional branches
- Most branches are of this type
Many, many kinds of predictors for this
- Static: fixed rule, or compiler annotation (e.g., br.bwh, the branch whether hint in IA-64)
- Dynamic: hardware prediction
Dynamic prediction is usually history-based
- Example: predict that the direction is the same as the last time this branch was executed

Always predict N (not taken)
- Easy to implement
- 30-40% accuracy: not so good
Always predict T (taken)
- 60-70% accuracy
BTFNT (backward taken, forward not taken)
- Loops usually have a few iterations, so this is like always predicting that the loop is taken
- Don't know the target until decode

- K bits of the branch instruction address index a branch history table of 2^K entries, 1 bit per entry
- Use the indexed entry to predict this branch: 0: predict not taken; 1: predict taken
- When the branch direction is resolved, go back into the table and update the entry: 0 if not taken, 1 if taken
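The table described above can be sketched in a few lines of Python. This is a minimal illustration, not the course's simulator: the class name `LastTimePredictor`, the choice of `k_bits=10`, and dropping the two low PC bits (assuming 4-byte instructions) are all assumptions made for the example.

```python
class LastTimePredictor:
    """1-bit branch history table indexed by K bits of the branch PC."""

    def __init__(self, k_bits):
        self.mask = (1 << k_bits) - 1
        self.table = [0] * (1 << k_bits)  # 0: predict not taken, 1: predict taken

    def _index(self, pc):
        # Assumption: instructions are 4 bytes, so the two low bits carry no information.
        return (pc >> 2) & self.mask

    def predict(self, pc):
        return self.table[self._index(pc)] == 1

    def update(self, pc, taken):
        # Called once the branch direction is resolved.
        self.table[self._index(pc)] = 1 if taken else 0


bht = LastTimePredictor(k_bits=10)
first = bht.predict(0xDC08)      # entries start at 0, so the first guess is "not taken"
bht.update(0xDC08, taken=True)
second = bht.predict(0xDC08)     # now the entry repeats the last outcome
print(first, second)
```

Two branches whose PCs share the same K index bits would share one entry in this sketch, which is the usual cost of an untagged table.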

0xDC08: for (i = 0; i < 100000; i++) {
0xDC44:     if ((i % 100) == 0) tick();
0xDC50:     if ((i & 1) == 1) odd();
        }

Example: short loop (8 iterations)
- Taken 7 times, then not taken once
- Not-taken mispredicted (was taken previously)
Act:  TTTTTTTNTTTTTTTNTTTTTTTNT
Pred: XTTTTTTTNTTTTTTTNTTTTTTTN
Corr: XooooooMMooooooMMooooooMM
(X: no history yet, o: correct, M: mispredict)
Misprediction rate: 2/8 = 25%
Execute the same loop again
- First always mispredicted (previous outcome was not taken)
- Then 6 predicted correctly
- Then last one mispredicted again
Each fluke/anomaly in a stable pattern results in two mispredicts per loop

How often is branch outcome != previous outcome?

Branch  Outcome pattern                                outcome != previous  Prediction rate
DC08    T T T ... T N (100,000 iterations)             2 / 100,000          99.998%
DC44    changes direction once every 100 iterations    2 / 100              98.0%
DC50    alternates between T and N                     2 / 2                0.0%
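The rates above can be checked with a short simulation. The sketch below makes several assumptions that the slides do not state: each of the three branches is modeled as "taken" when its condition is true, the loop branch DC08 is taken 99,999 times and then falls through once, and the predictor starts out predicting not taken. The helper name `last_time_mispredicts` is made up for this example.

```python
def last_time_mispredicts(outcomes, last=False):
    """Count mispredictions when each prediction repeats the previous outcome."""
    misses = 0
    for taken in outcomes:
        if taken != last:
            misses += 1
        last = taken
    return misses


N = 100_000
branches = {
    "DC08": [True] * (N - 1) + [False],          # loop branch: taken until the final exit
    "DC44": [(i % 100) == 0 for i in range(N)],  # true once every 100 iterations
    "DC50": [(i & 1) == 1 for i in range(N)],    # alternates every iteration
}
misses = {name: last_time_mispredicts(seq) for name, seq in branches.items()}
for name, m in misses.items():
    print(f"{name}: {m} mispredictions, {100 * (1 - m / N):.3f}% correct")
```

Under these assumptions DC44 is wrong twice per 100 iterations (once when the condition becomes true, once when it goes back), and DC50 is wrong on nearly every execution because the previous outcome is always the opposite of the next one.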

[Figure: state diagrams. Legend: "predict N" states and "predict T" states; transition on T outcome, transition on N outcome. FSM for last-time prediction: states 0 and 1. FSM for 2bC (2-bit counter): states 0, 1, 2, and 3.]

Counter state after each outcome (initial training/warm-up, then steady state):
1bC: 0 1 1 1 1 1 1 ... (N) 0 1 1 ...
2bC: 0 1 2 3 3 3 3 ... (N) 2 3 3 ... (N) 2 3 ...
Only 1 mispredict per N branches now!
DC08: 99.999%
DC44: 99.0%
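To compare the two counters on the 8-iteration loop example, here is a small sketch. It assumes counters start at 0 and that the upper half of the states predicts taken (states 2 and 3 for the 2-bit counter); `count_mispredicts` is an illustrative name, not something from the slides.

```python
def count_mispredicts(outcomes, bits):
    """Saturating-counter predictor: bits=1 is last-time, bits=2 is the 2bC."""
    top = (1 << bits) - 1        # highest counter state
    threshold = 1 << (bits - 1)  # states >= threshold predict taken
    state = 0
    misses = 0
    for taken in outcomes:
        if (state >= threshold) != taken:
            misses += 1
        # Saturating update: move toward `top` on taken, toward 0 on not taken.
        state = min(state + 1, top) if taken else max(state - 1, 0)
    return misses


loop = ([True] * 7 + [False]) * 1000  # 8-iteration loop, executed 1000 times
one_bit = count_mispredicts(loop, bits=1)
two_bit = count_mispredicts(loop, bits=2)
print(one_bit, two_bit)
```

After warm-up the 2-bit counter only drops from state 3 to state 2 on the loop exit, so it still predicts taken when the loop is entered again, and the second mispredict per loop execution disappears.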

These are good. We can live with these. This is bad!

98% to 99%: who cares?
- Actually, it's a 2% misprediction rate going to 1%
- That's a halving of the number of mispredictions
So what?
- Suppose a pipeline can fetch 5 instructions per cycle and the branch resolution time is 20 cycles
- To fetch 500 instructions:
  - 100% accuracy: 100 cycles
  - 98% accuracy: 100 (correct fetch) + 20 (misprediction penalty) * 10 = 300 cycles
  - 99% accuracy: 100 (correct fetch) + 20 (misprediction penalty) * 5 = 200 cycles
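The arithmetic above is easy to reproduce. The sketch below follows the slide's simplified model, in which the number of mispredictions is taken as (100 - accuracy)% of all 500 fetched instructions; the function name `fetch_cycles` and its parameters are assumptions for illustration.

```python
def fetch_cycles(instructions, width, penalty, accuracy_pct):
    """Cycles to fetch `instructions`, using the slide's simplified model."""
    base = instructions // width
    # Simplification carried over from the slide: mispredictions are counted
    # as (100 - accuracy)% of all fetched instructions.
    mispredictions = instructions * (100 - accuracy_pct) // 100
    return base + penalty * mispredictions


results = {acc: fetch_cycles(500, width=5, penalty=20, accuracy_pct=acc)
           for acc in (100, 98, 99)}
print(results)
```

Going from 98% to 99% removes 100 of the 300 cycles in this model, which is why a one-point accuracy gain matters for wide, deep pipelines.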

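[Figure: two-level predictor. The BHR (branch history register, e.g. 1 1 ... 1 0, with the previous outcome in the last position) provides the index into the Pattern History Table, whose entries 00...00, 00...01, 00...10, ..., 11...11 each hold a 2-bit counter (states 0 to 3).] (Yeh & Patt '92)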

[Figure: the BHR as a shift register of history-length bits (here 0 0 0 0 0 0), with old history on one side, new history shifted in on the other, and an initialization value of 0 or 1.]
1: branch is taken; 0: branch is not taken
New BHR = (old BHR << 1) | br_dir
Example
BHR: 00000
Br 1: taken      BHR: 00001
Br 2: not taken  BHR: 00010
Br 3: taken      BHR: 00101
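The BHR update can be written as a one-line shift-and-mask. The sketch below replays the three-branch example with a 5-bit history; `update_bhr` and the explicit `length` parameter are assumptions added so that the register stays at a fixed width.

```python
def update_bhr(bhr, taken, length):
    """Shift the new outcome (1 = taken) into the low bit and keep `length` bits."""
    return ((bhr << 1) | int(taken)) & ((1 << length) - 1)


bhr = 0b00000
history = []
for taken in (True, False, True):  # Br 1 taken, Br 2 not taken, Br 3 taken
    bhr = update_bhr(bhr, taken, length=5)
    history.append(format(bhr, "05b"))
print(history)
```

The mask is what drops the oldest outcome once more than `length` branches have been recorded.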

Yeh and Patt 3-letter naming scheme
- Type of history collected: G (global), P (per branch), S (per set)
- PHT type: A (adaptive), S (static)
- PHT organization: g (global), p (per branch), s (per set)

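[Figure: GAp and PAp organizations. GAp: a single GBHR, with the PC selecting among per-branch PHTs. PAp: a BHR table indexed by the PC, with the PC also selecting among per-branch PHTs.]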

Local behavior
- What is the predicted direction of Branch A given the outcomes of previous instances of Branch A?
Global behavior
- What is the predicted direction of Branch Z given the outcomes of all* previous branches A, B, ..., X and Y?
* The number of previous branches tracked is limited by the history length.

Branches are correlated
Branch X: if (cond1) ...
Branch Y: if (cond2) ...
Branch Z: if (cond1 and cond2) ...

The BHR holds the outcomes of Branch X and Branch Y and indexes the PHT entry used for Branch Z:
Branch X  Branch Y  Branch Z
1         0         0
1         1         1
0         1         0
0         0         0
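A small experiment shows why global history helps with Branch Z. The sketch below is an assumption-heavy illustration: it uses a 2-bit global history, a separate 2-bit counter per (branch, history) pair (roughly a GAp-style organization), random values for cond1 and cond2, and treats each branch as taken when its condition is true. The class name `GlobalHistoryPredictor` is made up for this example.

```python
import random


class GlobalHistoryPredictor:
    """Global BHR plus one 2-bit counter per (branch, history) pair."""

    def __init__(self, hist_len):
        self.mask = (1 << hist_len) - 1
        self.bhr = 0
        self.counters = {}  # (branch name, history) -> counter state 0..3

    def predict_and_update(self, name, taken):
        """Predict from the current history, then train. Returns True on a hit."""
        key = (name, self.bhr)
        state = self.counters.get(key, 0)
        hit = (state >= 2) == taken
        self.counters[key] = min(state + 1, 3) if taken else max(state - 1, 0)
        self.bhr = ((self.bhr << 1) | int(taken)) & self.mask
        return hit


rng = random.Random(0)
bp = GlobalHistoryPredictor(hist_len=2)
z_misses = 0
for _ in range(1000):
    cond1, cond2 = rng.random() < 0.5, rng.random() < 0.5
    bp.predict_and_update("X", cond1)
    bp.predict_and_update("Y", cond2)
    if not bp.predict_and_update("Z", cond1 and cond2):
        z_misses += 1
print("Z mispredictions:", z_misses)
```

X and Y are random here and cannot be predicted, but when Z is reached the two history bits are exactly the outcomes of X and Y, so Z's direction is fully determined and only the warm-up of the history-11 counter can miss.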

[Figure: gshare. The BHR (e.g. 1 1 ... 1 0) is XORed with the PC (e.g. 0x809000) to form the index into a table of 2-bit counters (2bc).] (McFarling '93)
Predictor size: 2^(history length) * 2 bits

predict_func(pc, actual_dir) {
    index = pc xor BHR
    taken = 2bit_counters[index] >= 2 ? 1 : 0              // states 2 and 3 predict taken
    correctly_predicted = (actual_dir == taken) ? 1 : 0    // stats
}

updated_func(pc, actual_dir) {
    index = pc xor BHR
    if (actual_dir) SAT_INC(2bit_counters[index])          // saturating increment
    else SAT_DEC(2bit_counters[index])                     // saturating decrement
    BHR = (BHR << 1) | actual_dir
}
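For readers who want to run the gshare pseudocode, here is one possible Python rendering. It is a sketch under assumptions not stated on the slide: the class name `Gshare`, an 8-bit history, counters initialized to 0, and masking the XOR result to the table size. The test pattern is the alternating DC50 branch from the earlier loop example, modeled as taken when its condition is true.

```python
class Gshare:
    """gshare sketch: index = (PC xor BHR), table of 2-bit saturating counters."""

    def __init__(self, hist_len):
        self.mask = (1 << hist_len) - 1
        self.bhr = 0
        self.counters = [0] * (1 << hist_len)  # each entry is a counter in 0..3

    def _index(self, pc):
        # Assumption: keep only `hist_len` bits so the index fits the table.
        return (pc ^ self.bhr) & self.mask

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2  # states 2 and 3 predict taken

    def update(self, pc, taken):
        i = self._index(pc)
        c = self.counters[i]
        self.counters[i] = min(c + 1, 3) if taken else max(c - 1, 0)
        self.bhr = ((self.bhr << 1) | int(taken)) & self.mask


bp = Gshare(hist_len=8)
outcomes = [(i & 1) == 1 for i in range(10_000)]  # alternating pattern
hits = 0
for taken in outcomes:
    hits += bp.predict(0xDC50) == taken
    bp.update(0xDC50, taken)
accuracy = hits / len(outcomes)
print(accuracy)
```

The alternating branch that defeats the last-time predictor becomes almost fully predictable here, because the history bits distinguish "previous outcome was taken" from "previous outcome was not taken" and each case gets its own counter.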