Chapter 4 Number Representations


Chapter 4: Number Representations
SKEE2263 Digital Systems
Mun'im/ismahani/izam {munim@utm.my,e-izam@utm.my,ismahani@fke.utm.my}
February 9, 2016

Table of Contents
1 Fundamentals
2 Signed Numbers
3 Fixed-Point Numbers
4 Floating Point

Taxonomy of Number Systems
Numbers
    Integers
        Unsigned
        Signed
            Signed-Magnitude
            Ones' Complement
            Two's Complement
    Reals
        Fixed Point
        Floating Point

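Integers

Number of bits   Number of values         Machine
4                16                       Intel 4004
8                256                      8080, 6800
16               65536                    PDP11, 8086, 68000
32               $4 \times 10^{9}$        68020, VAX11, IEEE single
48               $2.8 \times 10^{14}$     Unisys
64               $1.8 \times 10^{19}$     Cray, IEEE double

The number of values for an N-bit word is $2^N$; the last three entries are rounded.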

Integers
Value of an N-bit unsigned integer bit pattern:

$$V_{unsigned} = \sum_{i=0}^{N-1} b_i 2^i$$

Example, $10110_2$:

$$10110_2 = 1 \cdot 2^4 + 0 \cdot 2^3 + 1 \cdot 2^2 + 1 \cdot 2^1 + 0 \cdot 2^0 = 22_{10}$$
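The positional sum can be checked with a few lines of Java. This is only a minimal sketch: the class name `UnsignedValue` and the method `valueOf` are illustrative names, not part of the slides, and the input is assumed to be a string of '0' and '1' characters written most significant bit first.

```java
// Minimal sketch: evaluate an unsigned bit pattern as the sum of b_i * 2^i.
// UnsignedValue and valueOf are hypothetical names chosen for this example.
final class UnsignedValue {
    // bits is written most significant bit first, e.g. "10110"
    static long valueOf(String bits) {
        long value = 0;
        int n = bits.length();
        for (int i = 0; i < n; i++) {
            int b = bits.charAt(n - 1 - i) - '0'; // b_i, where i = 0 is the rightmost bit
            value += (long) b << i;               // add b_i * 2^i
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(valueOf("10110")); // the slide's example pattern
    }
}
```

For the slide's pattern 10110 the method returns 22, matching the worked sum.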

Signed-Magnitude
Value of an N-bit signed-magnitude pattern:

$$V_{SM} = (-1)^{b_{N-1}} \sum_{i=0}^{N-2} b_i 2^i$$

Example, $1010_{SM}$:

$$V_{SM} = (-1)^{b_3} \left[ b_2 2^2 + b_1 2^1 + b_0 2^0 \right] = (-1)^1 \left[ 0(4) + 1(2) + 0(1) \right] = -1 \times 2 = -2$$

Signed-Magnitude

Signed Integer   Signed-Magnitude
+5               0000 0101
+4               0000 0100
+3               0000 0011
+2               0000 0010
+1               0000 0001
0                0000 0000 and 1000 0000
-1               1000 0001
-2               1000 0010
-3               1000 0011
-4               1000 0100
-5               1000 0101

Ones' Complement
Value of an N-bit ones' complement pattern:

$$V_{1C} = -b_{N-1} 2^{N-1} + \sum_{i=0}^{N-2} b_i 2^i + b_{N-1}$$

Example, $1010_{1C}$:

$$V_{1C} = -b_3 2^3 + b_2 2^2 + b_1 2^1 + b_0 2^0 + b_3 = -1(8) + 0(4) + 1(2) + 0(1) + 1 = -8 + 2 + 1 = -5$$

Ones' Complement

Signed Integer   Ones' Complement
+127             0111 1111
+126             0111 1110
...              ...
+2               0000 0010
+1               0000 0001
0                0000 0000 and 1111 1111
-1               1111 1110
-2               1111 1101
-3               1111 1100
...              ...
-126             1000 0001
-127             1000 0000

Two's Complement
Value of an N-bit two's complement pattern:

$$V_{2C} = -b_{N-1} 2^{N-1} + \sum_{i=0}^{N-2} b_i 2^i$$

Example, $1010_{2C}$:

$$V_{2C} = -b_3 2^3 + b_2 2^2 + b_1 2^1 + b_0 2^0 = -1(8) + 0(4) + 1(2) + 0(1) = -8 + 2 = -6$$

Two's Complement

Signed Integer   Two's Complement
+127             0111 1111
+126             0111 1110
...              ...
+2               0000 0010
+1               0000 0001
0                0000 0000
-1               1111 1111
-2               1111 1110
...              ...
-126             1000 0010
-127             1000 0001
-128             1000 0000
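The three signed value formulas can be compared side by side by decoding the same pattern under each of them. The sketch below is illustrative only: the class `SignedDecode` and its method names are assumptions made for this example, and the word length `n` is assumed to lie between 2 and 31 so that everything fits in a Java `int`.

```java
// Minimal sketch: decode one N-bit pattern as signed-magnitude, ones' complement
// and two's complement. All names here are hypothetical; 2 <= n <= 31 is assumed.
final class SignedDecode {
    // Sum of b_i * 2^i for i = 0 .. N-2 (everything except the sign bit)
    static int lowBits(int pattern, int n) {
        return pattern & ((1 << (n - 1)) - 1);
    }

    // b_{N-1}, the most significant bit
    static int signBit(int pattern, int n) {
        return (pattern >>> (n - 1)) & 1;
    }

    static int signedMagnitude(int pattern, int n) {
        int magnitude = lowBits(pattern, n);
        return signBit(pattern, n) == 1 ? -magnitude : magnitude;
    }

    static int onesComplement(int pattern, int n) {
        int s = signBit(pattern, n);
        return -s * (1 << (n - 1)) + lowBits(pattern, n) + s;
    }

    static int twosComplement(int pattern, int n) {
        return -signBit(pattern, n) * (1 << (n - 1)) + lowBits(pattern, n);
    }

    public static void main(String[] args) {
        int pattern = 0b1010; // the 4-bit example used on the slides
        System.out.println("SM = " + signedMagnitude(pattern, 4));
        System.out.println("1C = " + onesComplement(pattern, 4));
        System.out.println("2C = " + twosComplement(pattern, 4));
    }
}
```

For 1010 the three methods give -2, -5 and -6, the same results as the worked examples.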

Sign Extension
Sign extension converts a number to a larger format: copy the sign bit into all of the new high-order bits.

+100 in 8-bit two's complement binary:    0110 0100
+100 in 16-bit two's complement binary:   0000 0000 0110 0100
-100 in 8-bit two's complement binary:    1001 1100
-100 in 16-bit two's complement binary:   1111 1111 1001 1100
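The copy-the-sign-bit rule might be written as follows. This is a sketch under stated assumptions: `SignExtend` and `signExtend` are hypothetical names, and the widths are assumed to satisfy 1 <= from < to <= 31.

```java
// Minimal sketch of sign extension on raw bit patterns.
// SignExtend/signExtend are hypothetical names; 1 <= from < to <= 31 is assumed.
final class SignExtend {
    // Widen a 'from'-bit two's complement pattern to 'to' bits.
    static int signExtend(int pattern, int from, int to) {
        int result = pattern & ((1 << from) - 1);        // keep the original bits
        int sign = (pattern >>> (from - 1)) & 1;         // the sign bit of the narrow format
        if (sign == 1) {
            int fill = ((1 << (to - from)) - 1) << from; // ones in every new high-order bit
            result |= fill;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(String.format("%04X", signExtend(0b01100100, 8, 16))); // +100
        System.out.println(String.format("%04X", signExtend(0b10011100, 8, 16))); // -100
    }
}
```

The two calls produce the hexadecimal patterns 0064 and FF9C, which are the 16-bit rows of the slide written in hex.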

Offset Binary
- Also known as biased-K representation
- Variation of two's complement
- Uses a value K as the biasing value
- Applications:
    - Exponent of a floating-point number (biased-127 or biased-1023)
    - Analog interfacing
    - Excess-3 code (actual value = binary - 3)
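Encoding and decoding with a bias are a single addition or subtraction. The sketch below uses hypothetical names (`OffsetBinary`, `encode`, `decode`) and assumes the caller picks a bias K that keeps the stored value non-negative.

```java
// Minimal sketch of biased-K (offset binary) encoding.
// Names are hypothetical; range checking is left out on purpose.
final class OffsetBinary {
    // Stored pattern = actual value + K
    static int encode(int value, int k) {
        return value + k;
    }

    // Actual value = stored pattern - K
    static int decode(int stored, int k) {
        return stored - k;
    }

    public static void main(String[] args) {
        // 4-bit offset binary with K = 8 covers -8 .. +7
        System.out.println(Integer.toBinaryString(encode(7, 8)));
        // A biased-127 exponent field holding 129 stands for an exponent of 2
        System.out.println(decode(129, 127));
    }
}
```

With K = 8 the value 7 is stored as 1111 and 0 as 1000, and a biased-127 field of 129 decodes to 2.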

Comparing Number Systems

Decimal   Signed-Magnitude   Ones' Complement   Two's Complement   Offset Binary
 7        0111               0111               0111               1111
 6        0110               0110               0110               1110
 5        0101               0101               0101               1101
 4        0100               0100               0100               1100
 3        0011               0011               0011               1011
 2        0010               0010               0010               1010
 1        0001               0001               0001               1001
 0        0000               0000               0000               1000
-0        1000               1111
-1        1001               1110               1111               0111
-2        1010               1101               1110               0110
-3        1011               1100               1101               0101
-4        1100               1011               1100               0100
-5        1101               1010               1011               0011
-6        1110               1001               1010               0010
-7        1111               1000               1001               0001
-8                                              1000               0000

Signed Systems Compared

           Unsigned     Signed-Magnitude     Ones' Complement     Two's Complement
Smallest   $0$          $-(2^{n-1} - 1)$     $-(2^{n-1} - 1)$     $-2^{n-1}$
Largest    $2^n - 1$    $+(2^{n-1} - 1)$     $+(2^{n-1} - 1)$     $+(2^{n-1} - 1)$

Real Numbers

Number System    Format              Characteristics
Fixed-point      $\pm i.f$           Low precision
Rational         $\pm p/q$           Difficult to work with
Floating-point   $\pm m \times b^e$  Most common way to handle reals

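Fixed-Point Numbers
The general expression for an N-bit fixed-point two's complement number is

$$x = \left( -b_{N-1} 2^{N-1} + \sum_{i=0}^{N-2} b_i 2^i \right) \times 2^{-f}$$

where:
N = total number of bits
f = number of bits in the fraction ($0 \le f \le N-1$)

Bit layout, from bit N-1 down to bit 0: S | int | frac, with an imaginary binary point between the int and frac fields.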

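Expression for Two's Complement
Value of an N-bit two's complement integer, f = 0:

$$x = \left( -b_{N-1} 2^{N-1} + \sum_{i=0}^{N-2} b_i 2^i \right) \times 2^{0} = -b_{N-1} 2^{N-1} + b_{N-2} 2^{N-2} + \cdots + b_1 2^1 + b_0 2^0$$

Bit layout, from bit N-1 down to bit 0: S | int, with the imaginary binary point to the right of bit 0.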

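Expression for Q-Format
Value of an N-bit fixed-point number, f = N-1:

$$x = \left( -b_{N-1} 2^{N-1} + \sum_{i=0}^{N-2} b_i 2^i \right) \times 2^{-(N-1)}$$

Written out with the bits renumbered from the most significant end, so that $b_0$ is the sign bit:

$$x = -b_0 + b_1 2^{-1} + b_2 2^{-2} + \cdots + b_{N-1} 2^{-(N-1)}$$

Bit layout, from bit N-1 down to bit 0: S | frac, with the imaginary binary point immediately to the right of the sign bit.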

N = 8, f = 4

Weights     $2^3$  $2^2$  $2^1$  $2^0$  $2^{-1}$  $2^{-2}$  $2^{-3}$  $2^{-4}$
Bit value   0      1      0      1      1         1         0         0

$$0101.1100_2 = 2^2 + 2^0 + 2^{-1} + 2^{-2} = 4 + 1 + 0.5 + 0.25 = 5.75_{10}$$

OR, treating the pattern as an integer and scaling by $2^{-4}$:

Weights     $2^7$  $2^6$  $2^5$  $2^4$  $2^3$  $2^2$  $2^1$  $2^0$
Bit value   0      1      0      1      1      1      0      0

$$x = (2^6 + 2^4 + 2^3 + 2^2) \times 2^{-4} = (64 + 16 + 8 + 4) \div 16 = 5.75_{10}$$

N = 8, f = 7 (Q7 format)

Weights     $-2^0$  $2^{-1}$  $2^{-2}$  $2^{-3}$  $2^{-4}$  $2^{-5}$  $2^{-6}$  $2^{-7}$
Bit value   0       1         1         1         1         1         1         1

$$max = 2^{-1} + 2^{-2} + 2^{-3} + 2^{-4} + 2^{-5} + 2^{-6} + 2^{-7} = (2^6 + 2^5 + 2^4 + 2^3 + 2^2 + 2^1 + 2^0) \times 2^{-7} = 127/128 = 0.9921875$$

Weights     $-2^0$  $2^{-1}$  $2^{-2}$  $2^{-3}$  $2^{-4}$  $2^{-5}$  $2^{-6}$  $2^{-7}$
Bit value   1       0         0         0         0         0         0         0

$$min = -2^0 = -1$$
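The fixed-point examples above all follow the same recipe: read the pattern as a two's complement integer, then scale by $2^{-f}$. A minimal Java sketch of that recipe follows. The class `FixedPoint` and the method `valueOf` are hypothetical names, and 2 <= n <= 31 with 0 <= f <= n-1 is assumed.

```java
// Minimal sketch: value of an n-bit two's complement fixed-point pattern
// with f fraction bits. Names are hypothetical; 2 <= n <= 31, 0 <= f <= n-1 assumed.
final class FixedPoint {
    static double valueOf(int pattern, int n, int f) {
        int sign = (pattern >>> (n - 1)) & 1;           // b_{N-1}
        int low = pattern & ((1 << (n - 1)) - 1);       // bits N-2 .. 0
        int asInteger = -sign * (1 << (n - 1)) + low;   // two's complement integer value
        return asInteger / (double) (1 << f);           // scale by 2^-f
    }

    public static void main(String[] args) {
        System.out.println(valueOf(0b01011100, 8, 4)); // N = 8, f = 4 example
        System.out.println(valueOf(0b01111111, 8, 7)); // largest Q7 value
        System.out.println(valueOf(0b10000000, 8, 7)); // smallest Q7 value
    }
}
```

The three calls return 5.75, 0.9921875 and -1.0, in agreement with the worked examples.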

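Multiplying Q15 Numbers
Q15 × Q15 = Q30: two 16-bit Q15 operands (sign bit S at bit 15, fraction in bits 14 to 0) give a 32-bit product in Q30 format, whose two top bits (31 and 30) are both sign bits.

[Figure: layout of the 32-bit Q30 product. The product is rounded by adding a '1' at bit 14 (a 1 followed by 14 zeros), just below the part of the product that is kept.]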
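One common way to realize this multiplication in software is sketched below. Several things here are assumptions rather than content from the slide: the names `Q15`, `multiply` and `toDouble`, the choice of bits 30 down to 15 of the rounded product as the Q15 result, and the saturation of the single overflow case (-1 × -1).

```java
// Minimal sketch of a rounded Q15 x Q15 multiply. Names, the result field
// (bits 30..15) and the saturation step are assumptions for illustration.
final class Q15 {
    static short multiply(short a, short b) {
        int product = a * b;                 // 32-bit Q30 product with two sign bits
        int rounded = product + (1 << 14);   // rounding: add a '1' at bit 14
        int result = rounded >> 15;          // arithmetic shift back to Q15 scaling
        if (result > Short.MAX_VALUE) {      // only -1 * -1 can overflow
            result = Short.MAX_VALUE;        // saturate to the largest Q15 value
        }
        return (short) result;
    }

    // Interpret a Q15 pattern as a real number in [-1, 1)
    static double toDouble(short q) {
        return q / 32768.0;
    }

    public static void main(String[] args) {
        short half = 16384;                  // 0.5 in Q15
        System.out.println(toDouble(multiply(half, half)));
    }
}
```

Multiplying 0.5 by 0.5 this way yields the Q15 pattern 8192, which is 0.25.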

What is Floating Point?

$$5.25_{10} = 101.01_2 \times 2^0 = 10.101_2 \times 2^1 = 1.0101_2 \times 2^2 = 0.10101_2 \times 2^3$$

- The binary point floats to a pre-defined position
- The process is called normalization

Floating Point Parts
[Figure: the number $-0.4321 \times 10^{-2}$ with its parts labelled: sign of mantissa (-), location of decimal point, mantissa (4321), radix (10), sign of exponent (-), exponent (2).]

$$\pm X = m \times b^e$$

where m = mantissa, b = number base and e = exponent.

IEEE Single-Precision Format

Bit 31   Bits 30-23    Bits 22-0
S        8-bit exp     23-bit frac

$$\pm X = (-1)^s \times 1.m \times 2^{e - 127}$$

- Sign field: 0 for positive numbers ($(-1)^0 = +1$), 1 for negative numbers ($(-1)^1 = -1$).
- Exponent field: unsigned 8-bit, biased-127.
- Mantissa field: the bits to the right of the binary point of the normalized binary number.

IEEE Single-Precision Format
Example: 1 | 1000 0001 | 110 0000 0000 0000 0000 0000

$$X = (-1)^1 \times 1.11_2 \times 2^{129 - 127} = -1 \times 1.75_{10} \times 2^2 = -7$$
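The field-by-field decoding can be reproduced in Java and cross-checked against the standard library call `Float.intBitsToFloat`. The class `DecodeSingle` and its `decode` method are hypothetical names, and the sketch assumes a normalized number (exponent field between 1 and 254); zeros, subnormals, infinities and NaN are not handled.

```java
// Minimal sketch: decode a normalized IEEE single-precision pattern by hand.
// DecodeSingle/decode are hypothetical names; special values are not handled.
final class DecodeSingle {
    static double decode(int bits) {
        int s = bits >>> 31;                 // sign field, bit 31
        int e = (bits >>> 23) & 0xFF;        // biased exponent, bits 30-23
        int m = bits & 0x7FFFFF;             // fraction field, bits 22-0
        double significand = 1.0 + m / (double) (1 << 23); // 1.m with the hidden leading 1
        double sign = (s == 0) ? 1.0 : -1.0;
        return sign * significand * Math.pow(2, e - 127);
    }

    public static void main(String[] args) {
        int bits = 0xC0E00000; // 1 | 1000 0001 | 110 0000 0000 0000 0000 0000
        System.out.println(decode(bits));
        System.out.println(Float.intBitsToFloat(bits)); // library decoding of the same bits
    }
}
```

Both lines print -7.0 for the slide's bit pattern.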

IEEE Double-Precision Format
Double precision is more common.

Bit 63   Bits 62-52    Bits 51-0
S        11-bit exp    52-bit frac

$$\pm X = (-1)^s \times 1.m \times 2^{e - 1023}$$

    public class TryFP {
        public static void main(String[] args) {
            double d = 1/3.;   // Java defaults to double precision: 3. is a double literal
            float f = 1f/3f;   // the f suffix forces single precision
            System.out.println("Value of d="+d);
            System.out.println("Value of f="+f);
        }
    }

Output:

    Value of d=0.3333333333333333
    Value of f=0.33333334

FX vs FP

Fixed-Point Arithmetic
- Simple circuit
- Small area and faster
- Less accurate (the result is truncated if it exceeds the size)
- Smaller range of values can be handled

Floating-Point Arithmetic
- Complex circuit (due to rounding and normalization)
- Large area and slower
- More accurate (high precision)
- Wider range of values can be handled