EAD 115 Numerical Solution of Engineering and Scientific Problems David M. Rocke Department of Applied Science
Computer Representation of Numbers Counting numbers (unsigned integers) are the numbers 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, … In almost all computers, these numbers are represented in binary (base 2) rather than decimal. Counting in binary goes 0, 1, 10, 11, 100, 101, 110, 111, 1000, 1001, …
Fixed Length Integers Data storage is generally in bytes, where 1 byte = 8 bits. With one-byte integers, the smallest integer that can be stored is 0, and the largest is 11111111_2 = 2^8 - 1 = 255. Internet IP addresses consist of four bytes, so that no part of an IP address exceeds 255 (UC Davis is 168.150.243.2).
The IP address 168.150.243.2 looks like this in binary: 10101000 10010110 11110011 00000010
More Unsigned Integers Two-byte or 16-bit short integers can represent any whole number from 0 to 65,535. Long integers of four bytes or 32 bits can represent any whole number from 0 to 4,294,967,295. If each disk block is addressed by a long integer, and each disk block holds 4,096 bytes, then the disk can hold 2^32 × 4,096 bytes = 16 TB.
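These ranges, the disk-capacity arithmetic, and the binary form of the IP address above can all be checked in a few lines (the helper name `unsigned_max` is mine, introduced for illustration):

```python
# Largest value storable in n unsigned bytes: 2^(8n) - 1.
def unsigned_max(n_bytes):
    return 2 ** (8 * n_bytes) - 1

print(unsigned_max(1))  # 255 (one byte)
print(unsigned_max(2))  # 65535 (short)
print(unsigned_max(4))  # 4294967295 (long)

# 2^32 addressable blocks of 4,096 bytes each = 2^44 bytes = 16 TB (binary)
print(2 ** 32 * 4096 == 16 * 2 ** 40)  # True

# 168.150.243.2, one byte per dotted field
print(" ".join(f"{int(part):08b}" for part in "168.150.243.2".split(".")))
# 10101000 10010110 11110011 00000010
```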
Application: Digital Audio Uncompressed digital audio can be represented as a sequence of loudness levels. A pure tone has a sequence that evolves as a sine wave. The loudness levels can be represented as unsigned integers, with the integer range giving all possible values.
Pure Tone
6-bit Audio
Sampling Rate The sampling rate is the number of times per second that a loudness measure is taken. CDs are sampled 44,100 times per second (44.1 kHz). Digital recordings are typically 44.1, 48, 96, or 192 kHz.
Word Length 8-bit audio has loudness levels quantized to 2^8 = 256 discrete levels, which is crude. 16-bit audio has 2^16 = 65,536 loudness levels; this is what is used for CDs. Audio is now often recorded in 24-bit form, which has 2^24 = 16,777,216 levels and is difficult to distinguish from the smooth original.
Loudest Sound In 16-bit audio, the loudest sound that can be recorded has a numerical value of 65,535. If the input in a recording goes over this level, it is still recorded as 65,535. This clipping leads to distorted sound, which is much more unpleasant than analog overload distortion (as with Jimi Hendrix).
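The clipping effect can be sketched in a few lines. This is an illustration, not material from the slide: samples here use the signed 16-bit convention common in audio files (ceiling 32,767 rather than 65,535), and the 1.2× amplitude is an arbitrary "too hot" input level.

```python
import math

FULL_SCALE = 32767  # loudest 16-bit signed sample

def record(amplitude, n=16):
    """Sample one cycle of a sine wave, clipping into the 16-bit range."""
    wave = (amplitude * FULL_SCALE * math.sin(2 * math.pi * t / n) for t in range(n))
    return [max(-FULL_SCALE - 1, min(FULL_SCALE, round(s))) for s in wave]

clean = record(0.5)  # peaks near half scale; wave shape preserved
hot = record(1.2)    # overdriven: peaks pinned at 32767 (flat-topped wave)
print(max(clean), max(hot))
```

The pinned samples in `hot` are the flat tops that make digital clipping sound harsh.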
Pure Tone with no Headroom
Signed Integers 16-bit signed integers in two's-complement form can represent any whole number from -32,768 to 32,767.
Integer Overflow Suppose we are using one-byte signed integers, which can represent any whole number from -128 to 127. What happens when we add 100 and 100? The answer should be 200, but 01100100 + 01100100 = 11001000, which sets the sign bit; interpreted as a signed byte, this pattern is -56, not 200.
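A minimal sketch of the wraparound: mask the sum to eight bits and reinterpret the result as a two's-complement signed byte (the helper name `to_int8` is mine):

```python
def to_int8(n):
    """Interpret the low 8 bits of n as a two's-complement signed byte."""
    n &= 0xFF  # the carry out of bit 7 is simply lost
    return n - 256 if n >= 128 else n

print(bin(200))            # 0b11001000 -- the sign bit is now set
print(to_int8(100 + 100))  # -56, not the mathematically correct 200
```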
Decimal Numbers Decimal numbers, or floating point (vs. fixed point), are represented in scientific notation: 1,437,526 = 0.1437526 × 10^7, with exponent +7 and mantissa +1437526. We represent this in binary on a computer.
Typical single/double precision: 1 sign bit; 8/11 exponent bits (one of which is a sign); 23/52-bit mantissa.
Hypothetical 7-bit Reals 1 sign bit, 3 exponent bits, 3 mantissa bits. The mantissa is normalized to be between 0.5 and 1 to avoid wasting bits: we don't want to use a mantissa of 001 when we could use 100 instead, since 100 and 101 (for example) look the same when truncated. (We could also omit storing the leading 1.)
Smallest Positive Number Sign 0 (positive); exponent sign 1 (negative); exponent magnitude 11 (3 in decimal). The smallest normalized mantissa is 100 (the next smaller, 011, has a leading 0); 100 represents 2^-1 = 0.5 in decimal. The smallest positive number is therefore 0.5 × 2^-3 = 2^-4 = 1/16. If we divide this by 2 we get 0 (underflow)!
Largest Positive Number Sign 0 (positive); exponent sign 0 (positive); exponent magnitude 11 (3 in decimal). The largest mantissa is 111, and 111_2 = 2^-1 + 2^-2 + 2^-3 = 0.875 in decimal. The largest positive number is therefore 0.875 × 2^3 = 7. If we multiply this by 2 we get overflow!
Many Numbers Cannot be Represented Exactly In our 7-bit reals, 1/3 has the representation 0 1 01 101 (sign, exponent sign, exponent, mantissa). This is 0.3125 instead of 0.3333333, because that is as close as the format can get. When multiplied by 3, the result is 0.9375 instead of 1: (3)(1/3) = 0.9375!
Limitations of Floating Point There is a limited range of quantities that can be represented. There is only a finite number of quantities that can be represented in a given range. Chopping is the truncation or rounding of numbers that cannot be represented exactly.
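A two-line illustration in ordinary double precision: 0.1 has no finite binary expansion, so it is chopped/rounded on entry, and the resulting error surfaces in simple arithmetic.

```python
# 0.1 cannot be represented exactly in binary floating point
print(0.1 + 0.2 == 0.3)        # False
print(f"{0.1:.20f}")           # the stored value is slightly above 0.1
print(abs((0.1 + 0.2) - 0.3))  # tiny but nonzero roundoff error
```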
Machine Epsilon Machine epsilon is the largest computer number ε such that (1 + ε) - 1 evaluates to 0. Excel uses double precision, which has a 52-bit mantissa, so machine epsilon is about 2^-52 ≈ 2.2 × 10^-16.
Some Excel Arithmetic
ε        (1 + ε) - 1
1E-13    1E-13
1E-14    0.999E-14
5E-15    5.11E-15
1E-15    0
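The table's behavior can be reproduced by halving a candidate ε until 1 + ε rounds back to 1. A minimal sketch in double precision (Python doubles have the same 52-bit mantissa as Excel's, though Excel adds some display rounding of its own):

```python
# Find the smallest power of two eps with 1 + eps > 1 in double precision.
eps = 1.0
while 1.0 + eps / 2 > 1.0:
    eps /= 2
print(eps)              # 2.220446049250313e-16
print(eps == 2 ** -52)  # True
```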
Precision and Accuracy Precision refers to the variability between estimates. Accuracy refers to the amount of deviation between the estimate and the true value.
Errors of Approximation True Value = Approximation + Error. The true error is E_T = TV - Approx. The (true) relative error is ε_T = E_T / TV. The absolute (relative) error is the absolute value of the (relative) error. The approximate relative error is ε_A = E_A / Approximation. Both the error and the relative error can matter.
Example True Value = 20, Approximation = 20.5. E_T = TV - Approximation = -0.5. The (true) relative error is ε_T = E_T / TV = -0.5/20 = -0.025, or -2.5%. The absolute error is |E_T| = 0.5, and the absolute relative error is |ε_T| = 0.025, or 2.5%.
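The example's arithmetic as a tiny helper (the function name is mine):

```python
def true_errors(true_value, approx):
    """Return the true error E_T and the true relative error eps_T."""
    e_t = true_value - approx
    return e_t, e_t / true_value

e_t, eps_t = true_errors(20, 20.5)
print(e_t, eps_t)            # -0.5 -0.025
print(abs(e_t), abs(eps_t))  # absolute versions: 0.5 and 0.025
```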
(Series) Truncation Error
e^x = 1 + x + x^2/2! + x^3/3! + …
e^x ≈ 1 + x + x^2/2! + x^3/3!
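Truncating the series after a few terms gives a concrete view of the truncation error; a sketch (helper name mine) comparing partial sums of e^x with `math.exp`:

```python
import math

def exp_partial_sum(x, n_terms):
    """First n_terms terms of the Taylor series for e^x about 0."""
    return sum(x ** k / math.factorial(k) for k in range(n_terms))

x = 1.0
for n in (2, 4, 8):
    approx = exp_partial_sum(x, n)
    print(n, approx, abs(math.exp(x) - approx))  # error shrinks as terms are added
```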
Roundoff Error Results from the approximate representation of numbers in a computer. It accumulates over many computations and is aggravated by the addition or subtraction of small and large numbers.
x̄ = n^-1 Σ_{i=1}^n x_i
s^2 = (n-1)^-1 Σ_{i=1}^n (x_i - x̄)^2
s^2 = (n-1)^-1 Σ (x_i^2 - 2 x̄ x_i + x̄^2)
s^2 = (n-1)^-1 Σ x_i^2 - 2 (n-1)^-1 x̄ Σ x_i + (n-1)^-1 n x̄^2
s^2 = (n-1)^-1 Σ x_i^2 - 2 (n-1)^-1 n x̄^2 + (n-1)^-1 n x̄^2
s^2 = (n-1)^-1 Σ x_i^2 - (n-1)^-1 n x̄^2
Shortcut or Mistake? The variance of the data set {1, 2, 3, 4, 5} is 2.5. The variance of the data set {100,000,001, 100,000,002, …} is the same, because the spacing has not changed. But the shortcut formula gives 2 for the variance in Excel. If the sequence starts at 1,000,000,001, the variance by the shortcut is 0!
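The failure is easy to demonstrate in double precision (which is what Excel uses). A sketch comparing the definitional two-pass formula with the one-pass shortcut s^2 = (Σ x_i^2 - n x̄^2)/(n-1): for data near 10^9 the shortcut subtracts two nearly equal quantities of size about 10^18, and the cancellation destroys every significant digit.

```python
def var_two_pass(xs):
    """Definitional variance: sum of squared deviations from the mean."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def var_shortcut(xs):
    """One-pass 'shortcut': (sum of x^2 - n*xbar^2) / (n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    return (sum(x * x for x in xs) - n * xbar * xbar) / (n - 1)

data = [1_000_000_001 + i for i in range(5)]  # same spacing as {1,...,5}
print(var_two_pass(data))  # 2.5
print(var_shortcut(data))  # 0.0 -- catastrophic cancellation
```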
Taylor s Theorem Can often approximate a function by a polynomial The error in the approximation is related to the first omitted term There are several forms for the error We will use this kind of analysis extensively in this course
f(x) = f(a) + f'(a)(x-a) + f''(a)/2! (x-a)^2 + … + f^(n)(a)/n! (x-a)^n + R_n
R_n = ∫_a^x (x-t)^n/n! f^(n+1)(t) dt
R_n = (x-a)^(n+1)/(n+1)! f^(n+1)(ξ), where ξ is between x and a
Equivalently, f(x+h) = f(x) + f'(x)h + f''(x)/2! h^2 + … + f^(n)(x)/n! h^n + R_n
R_n = h^(n+1)/(n+1)! f^(n+1)(ξ)
Series Truncation Error In general, the more terms in a Taylor series, the smaller the error. In general, the smaller the step size h, the smaller the error. The error is O(h^(n+1)), so halving the step size should reduce the error by a factor on the order of 2^(n+1). In general, the smoother the function, the smaller the error.
Taylor Series Approximation of a Polynomial
f(x) = -0.1x^4 - 0.15x^3 - 0.5x^2 - 0.25x + 1.2
f(0) = 1.2; true value f(1) = 0.2
Order 0: f_0(1) = f(0) = 1.2
f'(0) = -0.25, so order 1: f_1(1) = f(0) - 0.25(1) = 1.2 - 0.25 = 0.95
f''(0) = -1, so order 2: f_2(1) = 1.2 - 0.25 - (1/2!)(1)^2 = 0.95 - 0.5 = 0.45
f(x) = -0.1x^4 - 0.15x^3 - 0.5x^2 - 0.25x + 1.2
f(0) = 1.2; f'(0) = -0.25; f''(0) = -1; f'''(0) = -0.9; f''''(0) = -2.4; f^(n)(0) = 0 for n > 4
f(x) = 1.2 - 0.25x + (-1/2)x^2 + (-0.9/6)x^3 + (-2.4/24)x^4
f_2(x) = 1.2 - 0.25x + (-1/2)x^2
f_2(x) = -0.5x^2 - 0.25x + 1.2
f(x) = -0.1x^4 - 0.15x^3 - 0.5x^2 - 0.25x + 1.2
f(1) = 0.2; f'(1) = -2.1; f''(1) = -3.1; f'''(1) = -3.3; f''''(1) = -2.4; f^(n)(1) = 0 for n > 4
f(x) = 0.2 - 2.1(x-1) + (-3.1/2)(x-1)^2 + (-3.3/6)(x-1)^3 + (-2.4/24)(x-1)^4
f_2(x) = 0.2 - 2.1(x-1) - 1.55(x-1)^2
f_2(x) = -1.55x^2 + x + 0.75
Approximating Polynomials Any fourth degree polynomial has a fifth derivative that is identically zero. The remainder term for the order four Taylor series contains the fifth derivative at a point. Thus the order four Taylor series approximation is exact; that is, it is the polynomial itself.
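The expansions about a = 0 can be checked numerically; a sketch (helper names mine) evaluating the order-0 through order-4 approximations at x = 1:

```python
import math

def f(x):
    return -0.1 * x**4 - 0.15 * x**3 - 0.5 * x**2 - 0.25 * x + 1.2

DERIVS_AT_0 = [1.2, -0.25, -1.0, -0.9, -2.4]  # f(0), f'(0), ..., f''''(0)

def taylor_at_0(x, order):
    """Order-n Taylor approximation of f about a = 0."""
    return sum(DERIVS_AT_0[k] / math.factorial(k) * x ** k
               for k in range(order + 1))

for n in range(5):
    print(n, taylor_at_0(1.0, n))  # 1.2, 0.95, 0.45, 0.3, then exact
print(f(1.0))                      # 0.2
```

The order-4 value agrees with f(1) because the fifth derivative vanishes identically.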
The Taylor approximation of order n to a function f(x) at a point a, written f̂_n(x; a), is the best polynomial approximation to f() at a in the following sense: It is a polynomial. It is of order n or less (no terms higher than x^n). It matches the value and first n derivatives of f() at a.
Taylor Series and Euler's Method
dv/dt = g - (c/m)v
v(t+h) = v(t) + v'(t)h + v''(t)/2! h^2 + v'''(t)/3! h^3 + …
v'(t) = g - (c/m)v(t)
v''(t) = -(c/m)v'(t) = -gc/m + (c/m)^2 v(t)
dv/dt = g - (c/m)v
v(t_{i+1}) = v(t_i) + (dv/dt)(t_i)(t_{i+1} - t_i) + R_1
v(t_{i+1}) ≈ v(t_i) + (g - (c/m)v(t_i))(t_{i+1} - t_i)
R_1 = v''(ξ)/2! (t_{i+1} - t_i)^2 = v''(ξ)/2 h^2 = O(h^2)
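Dropping R_1 gives Euler's method. A sketch with classic falling-object parameter values (g = 9.8 m/s^2, drag coefficient c = 12.5 kg/s, mass m = 68.1 kg; these numbers are assumed for illustration, not given on the slide):

```python
import math

def euler_velocity(t_end, h, g=9.8, c=12.5, m=68.1, v0=0.0):
    """March v(t_{i+1}) = v(t_i) + (g - (c/m) v(t_i)) h from t = 0 to t_end."""
    v = v0
    for _ in range(round(t_end / h)):
        v += (g - (c / m) * v) * h
    return v

def exact_velocity(t, g=9.8, c=12.5, m=68.1):
    """Analytical solution of dv/dt = g - (c/m) v with v(0) = 0."""
    return g * m / c * (1.0 - math.exp(-(c / m) * t))

for h in (2.0, 0.5, 0.1):
    print(h, euler_velocity(12, h), exact_velocity(12))
# shrinking h brings the Euler estimate toward the exact solution
```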
Nonlinearity and Step Size For the first-order Taylor approximation, the more nearly linear the function is, the better the approximation The smaller the step size, the better the approximation
f(x) = x^m, f'(x) = m x^(m-1)
f(x+h) = f(x) + f'(x)h + R_1
R_1 = f''(ξ)/2! h^2 = m(m-1)ξ^(m-2)/2! h^2
Numerical Differentiation
f(x_{i+1}) = f(x_i) + f'(x_i)(x_{i+1} - x_i) + O((x_{i+1} - x_i)^2)
f'(x_i)(x_{i+1} - x_i) = f(x_{i+1}) - f(x_i) + O((x_{i+1} - x_i)^2)
f'(x_i) = (f(x_{i+1}) - f(x_i)) / (x_{i+1} - x_i) + O(x_{i+1} - x_i)
f'(x_i) = Δf_i/h + O(h)
First Forward Difference
f(x_{i-1}) = f(x_i) - f'(x_i)h + O(h^2)
f'(x_i) = (f(x_i) - f(x_{i-1}))/h + O(h)
f'(x_i) = ∇f_i/h + O(h)
First Backward Difference
f(x_{i+1}) = f(x_i) + f'(x_i)h + 0.5 f''(x_i)h^2 + O(h^3)
f(x_{i-1}) = f(x_i) - f'(x_i)h + 0.5 f''(x_i)h^2 + O(h^3)
f(x_{i+1}) - f(x_{i-1}) = 2 f'(x_i)h + O(h^3)
f'(x_i) = (f(x_{i+1}) - f(x_{i-1}))/(2h) + O(h^2)
First Centered Difference
f(x) = -0.1x^4 - 0.15x^3 - 0.5x^2 - 0.25x + 1.2; h = 0.5; x_i = 0.5
f(0.5) = 0.925; f'(0.5) = -0.9125; f(0) = 1.2; f(1) = 0.2
Forward: f'(0.5) ≈ (0.2 - 0.925)/0.5 = -1.45; ε = |(-0.9125 + 1.45)/0.9125| = 0.589
Backward: f'(0.5) ≈ (0.925 - 1.2)/0.5 = -0.55; ε = |(-0.9125 + 0.55)/0.9125| = 0.397
Centered: f'(0.5) ≈ (0.2 - 1.2)/(2 · 0.5) = -1.00; ε = |(-0.9125 + 1.00)/0.9125| = 0.096
f(x) = -0.1x^4 - 0.15x^3 - 0.5x^2 - 0.25x + 1.2; h = 0.25; x_i = 0.5
f(0.5) = 0.925; f'(0.5) = -0.9125; f(0.25) = 1.10351563; f(0.75) = 0.63632813
Forward: f'(0.5) ≈ (0.63632813 - 0.925)/0.25 = -1.155; ε = |(-0.9125 + 1.155)/0.9125| = 0.265
Backward: f'(0.5) ≈ (0.925 - 1.10351563)/0.25 = -0.714; ε = |(-0.9125 + 0.714)/0.9125| = 0.217
Centered: f'(0.5) ≈ (0.63632813 - 1.10351563)/(2 · 0.25) = -0.934; ε = |(-0.9125 + 0.934)/0.9125| = 0.024
Summary of Example (Relative Error)
           h = 0.5    h = 0.25
Forward    0.589      0.265
Backward   0.397      0.217
Centered   0.096      0.024
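The table can be reproduced directly; a sketch (function names mine) applying the three difference formulas to the example polynomial at x = 0.5:

```python
def f(x):
    return -0.1 * x**4 - 0.15 * x**3 - 0.5 * x**2 - 0.25 * x + 1.2

def fprime(x):
    """Exact derivative, for measuring the relative errors."""
    return -0.4 * x**3 - 0.45 * x**2 - x - 0.25

def rel_errors(x, h):
    """Relative errors of the forward, backward, and centered estimates."""
    true = fprime(x)
    fwd = (f(x + h) - f(x)) / h
    bwd = (f(x) - f(x - h)) / h
    ctr = (f(x + h) - f(x - h)) / (2 * h)
    return [abs((true - est) / true) for est in (fwd, bwd, ctr)]

for h in (0.5, 0.25):
    print(h, [round(e, 3) for e in rel_errors(0.5, h)])
# h=0.5  -> [0.589, 0.397, 0.096]
# h=0.25 -> [0.265, 0.217, 0.024]
```

The centered estimate is O(h^2), so it improves roughly fourfold when h is halved, versus roughly twofold for the one-sided O(h) formulas.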
Second Differences
f''(x_i) ≈ Δ^2 f(x_i)/h^2, where Δ^2 f(x_i) = Δf(x_{i+1}) - Δf(x_i)
f''(x_i) ≈ h^-2 [(f(x_{i+2}) - f(x_{i+1})) - (f(x_{i+1}) - f(x_i))]
f''(x_i) ≈ h^-2 [f(x_{i+2}) - 2 f(x_{i+1}) + f(x_i)]
Δf_i = f_{i+1} - f_i; If_i = f_i; Ff_i = f_{i+1}
(F - I)f_i = f_{i+1} - f_i, so Δ = F - I
Δ^2 = (F - I)^2 = F^2 - 2FI + I^2 = F^2 - 2F + I
∇f_i = f_i - f_{i-1} = (I - B)f_i
∇^2 f_i = (I - B)^2 f_i = (I - 2B + B^2)f_i
∇^2 f_i = f_i - 2 f_{i-1} + f_{i-2}
Δ^2 B = (F^2 - 2F + I)B = F - 2I + B (since FB = BF = I)
∇^2 F = (I - 2B + B^2)F = F - 2I + B
δf_i = f_{i+1} - f_{i-1} = (F - B)f_i
δ^2 f_i = (F - B)^2 f_i = (F^2 - 2FB + B^2)f_i
δ^2 f_i = f_{i+2} - 2 f_i + f_{i-2}
Second Derivatives
f(x_{i+2}) = f(x_i) + f'(x_i)(2h) + 0.5 f''(x_i)(2h)^2 + O(h^3)
f(x_{i+1}) = f(x_i) + f'(x_i)h + 0.5 f''(x_i)h^2 + O(h^3)
2 f(x_{i+1}) = 2 f(x_i) + 2 f'(x_i)h + f''(x_i)h^2 + O(h^3)
f(x_{i+2}) - 2 f(x_{i+1}) = -f(x_i) + f''(x_i)h^2 + O(h^3)
f''(x_i) = (f(x_{i+2}) - 2 f(x_{i+1}) + f(x_i))/h^2 + O(h)
f''(x_i) = Δ^2 f_i/h^2 + O(h)
Second Forward Difference
Second Derivatives
f''(x_i) = (f(x_i) - 2 f(x_{i-1}) + f(x_{i-2}))/h^2 + O(h)
f''(x_i) = ∇^2 f_i/h^2 + O(h)
Second Backward Difference
Second Derivatives
f''(x_i) = (f(x_{i+1}) - 2 f(x_i) + f(x_{i-1}))/h^2 + O(h^2)
f''(x_i) = δ^2 f_i/h^2 + O(h^2)
Second Centered Difference
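A quick numerical check of the centered formula and its O(h^2) behavior, on f(x) = sin x (my choice of test function), whose second derivative is -sin x:

```python
import math

def second_centered(f, x, h):
    """Centered second difference: (f(x+h) - 2 f(x) + f(x-h)) / h^2."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h ** 2

x = 1.0
errs = []
for h in (0.1, 0.05):
    est = second_centered(math.sin, x, h)
    errs.append(abs(est - (-math.sin(x))))
    print(h, est)
print(errs[0] / errs[1])  # about 4: halving h divides the O(h^2) error by ~4
```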
Propagation of Error Suppose that we have an approximation of the quantity x, and we then transform the value of x by a function f(x). How is the error in f(x) related to the error in x? How can we determine this if f is a function of several inputs?
Suppose x̃ ≈ x with x̃ = x + e.
f(x̃) = f(x) + f'(x)e + f''(x)/2! e^2 + …
f(x̃) - f(x) ≈ f'(x)e
If the error is bounded, |e| < B, then |f(x̃) - f(x)| ⪅ |f'(x)|B.
If the error is random with standard deviation SD(x̃) = σ, then SD(f(x̃)) ≈ |f'(x)|σ.
x̃_1 ≈ x_1 with x̃_1 = x_1 + e_1; x̃_2 ≈ x_2 with x̃_2 = x_2 + e_2
f(x̃_1, x̃_2) = f(x_1, x_2) + f_1(x_1, x_2)e_1 + f_2(x_1, x_2)e_2 + …
f(x̃_1, x̃_2) - f(x_1, x_2) ≈ f_1(x_1, x_2)e_1 + f_2(x_1, x_2)e_2
If the errors are bounded, |e_i| < B_i, then |f(x̃_1, x̃_2) - f(x_1, x_2)| ⪅ |f_1(x_1, x_2)|B_1 + |f_2(x_1, x_2)|B_2.
Stability and Condition If small changes in the input produce large changes in the answer, the problem is said to be ill conditioned or unstable. Numerical methods should be able to cope with ill conditioned problems; naïve methods may not meet this requirement. The condition number is the ratio of the relative error of the output to the relative error of the input.
x̃ = x + e. The error of the input is e, and e/x is the relative error of the input.
The error of the output is f(x̃) - f(x) ≈ e f'(x), and the relative error of the output is (f(x̃) - f(x))/f(x) ≈ e f'(x)/f(x) ≈ e f'(x̃)/f(x̃).
The ratio of the output relative error to the input relative error is
[(f(x̃) - f(x))/f(x)] / (e/x) ≈ [e f'(x)/f(x)] / (e/x) = x f'(x)/f(x) ≈ x̃ f'(x̃)/f(x̃).
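As a concrete illustration (my example, not the slide's): for f(x) = tan x, the condition number x f'(x)/f(x) blows up as x approaches π/2, where tiny relative input errors produce huge relative output errors.

```python
import math

def condition_number(f, fprime, x):
    """Relative condition number |x f'(x) / f(x)|."""
    return abs(x * fprime(x) / f(x))

dtan = lambda x: 1.0 / math.cos(x) ** 2  # derivative of tan x
for x in (0.5, 1.5, 1.57):
    print(x, condition_number(math.tan, dtan, x))
# grows without bound as x -> pi/2: evaluating tan there is ill conditioned
```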