Error Detection & Correction

Techniques for communicating over noisy channels: error detection (with retransmission of the damaged data) and error correction (reconstruction of the damaged data). Both rely on adding redundancy, e.g. checksums, to the transmitted data.
Errors

Errors that have no relation to each other are called random-bit errors. A burst error is a large error, disrupting perhaps thousands of bits. An important characteristic of any error-correction system is the burst length, that is, the maximum number of adjacent erroneous bits that can be corrected.

The bit-error rate (BER) is the number of bits received in error divided by the total number of bits received. The block error rate (BLER) measures the number of blocks or frames of data per second that have at least one occurrence of uncorrected data. The burst-error length (BERL) counts the number of consecutive blocks in error.

Probability calculations for small probabilities p, q — the probability of either event:
- estimation: p + q
- exact value: 1 - (1-p)(1-q) = p + q - pq
Example: with BER 1:10^7, what is the probability of at least one error in a 10000-bit packet?
- estimate: 10^4 / 10^7 = 10^-3
- exact: 1 - (1 - 10^-7)^10000 = 0.0009995
Error limiting coding

Message u := (u_1, u_2, ..., u_K); coded message c := (c_1, c_2, ..., c_N), with c := f(u).

Example: K=2, N=3 — the four 2-bit messages 00, 01, 10, 11 map to four of the eight corners 000, 001, ..., 111 of the 3-bit code cube (figure: code cube).

Hamming distance: d(c, v) := |{ i : c_i ≠ v_i }|, the number of positions in which two words differ.

Parameters

Messages are fixed-sized blocks from an alphabet Σ, typically Σ = {0,1}.
- q — size of the alphabet, |Σ|; typically 2
- n — block length; the code is a subset of Σ^n
- k — information length; |Code| = |Σ|^k
- d — the minimum distance of the code
- R = k/n — the rate of the code (the ratio of message bits to block bits); (n - log2(n)) / n is the theoretic limit
- δ = d/n — relative distance (or normalized distance)
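The Hamming distance definition above can be computed directly; a small sketch (the function name is ours):

```c
#include <stddef.h>

/* Hamming distance d(c, v): the number of symbol positions in which
   two words of equal length n differ. */
size_t hamming_distance(const char *c, const char *v, size_t n) {
    size_t d = 0;
    for (size_t i = 0; i < n; i++)
        if (c[i] != v[i])
            d++;
    return d;
}
```

On the K=2, N=3 cube, d(011, 111) = 1 (adjacent corners) and d(000, 111) = 3 (opposite corners).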
Parity bits

Given a binary number, a residue bit can be formed by casting out 2s. This extra bit, known as a parity bit, permits error detection, but not correction. An even parity bit is formed with a simple rule: if the number of 1s in the data word is even, the parity bit is a 0; if the number of 1s in the word is odd, the parity bit is a 1.
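The even-parity rule above amounts to XOR-ing all the bits of the word together; a minimal sketch (the function name is ours):

```c
#include <stdint.h>

/* Even parity bit for a data word: 0 if the number of 1s is even,
   1 if it is odd, so word-plus-parity always has an even 1-count. */
uint8_t even_parity(uint32_t word) {
    uint8_t p = 0;
    while (word) {
        p ^= word & 1;   /* flip the parity for every 1 bit */
        word >>= 1;
    }
    return p;
}
```

For example, even_parity(0x5) is 0 (two 1s) and even_parity(0x7) is 1 (three 1s).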
Error detection performance

K=8, N=9 binary coding (1 parity bit), BER 1:10^4.
- 8-bit message, probability of an error (all undetected without parity): 1 - (1 - 10^-4)^8 = 0.000799
- with parity, all 1-bit errors can be detected; probability of a detected 1-bit error: 9·10^-4·(1 - 10^-4)^8
- probability of an undetected (multi-bit) error with parity: 1 - (1 - 10^-4)^9 - 9·10^-4·(1 - 10^-4)^8 = 0.000000359
Two-Dimensional Parity

Each row of data bits gets a parity bit, and a parity byte covers each column:

    Data     Parity
    0101001  1
    1101001  0
    1011110  1
    0001110  1
    0110100  1
    1011111  0
    -------  -
    1111011  0    (parity byte)

Cyclic codes

A cyclic rotation of a code word also gives a code word:

    (c_1, c_2, ..., c_n) ∈ C  ⇒  (c_n, c_1, c_2, ..., c_{n-1}) ∈ C

Over F_q, identify vectors with polynomials:

    (a_0, a_1, ..., a_{n-1}) ∈ F_q^n  ↔  a_0 + a_1·x + ... + a_{n-1}·x^{n-1} ∈ F_q[x] / (x^n - 1)

where F_q[x] / (x^n - 1) = { a_0 + a_1·x + ... + a_{n-1}·x^{n-1} | a_i ∈ F_q, 0 ≤ i < n }.
Internet Checksum Algorithm

Idea: view the message as a sequence of 16-bit integers. Add these integers together using 16-bit ones complement arithmetic, and then take the ones complement of the result. That 16-bit number is the checksum.

    u_short cksum(u_short *buf, int count)
    {
        register u_long sum = 0;

        while (count--) {
            sum += *buf++;
            if (sum & 0xFFFF0000) {
                /* carry occurred, so wrap around */
                sum &= 0xFFFF;
                sum++;
            }
        }
        return ~(sum & 0xFFFF);
    }
Cyclic Redundancy Check

Add k bits of redundant data to an n-bit message.
- Represent the n-bit message as a degree n-1 polynomial; e.g., MSG = 10011010 corresponds to M(x) = x^7 + x^4 + x^3 + x^1.
- Let k be the degree of some divisor polynomial C(x); e.g., C(x) = x^3 + x^2 + 1.
- Transmit a polynomial P(x) that is evenly divisible by C(x); the receiver gets P(x) + E(x), where E(x) = 0 implies no errors.
- The recipient divides (P(x) + E(x)) by C(x); the remainder will be zero in only two cases: E(x) was zero (i.e. there was no error), or E(x) is exactly divisible by C(x). Choose C(x) to make the second case extremely rare.
Sender:
- multiply M(x) by x^k; for our example, we get x^10 + x^7 + x^6 + x^4 (10011010000);
- divide the result by C(x) (1101):

            11111001        quotient
           ------------
      1101)10011010000      message × x^3
           1101
            1001
            1101
             1000
             1101
              1011
              1101
               1100
               1101
                  1000
                  1101
                   101      remainder

- send 10011010000 - 101 = 10011010101 (subtraction in GF(2) is XOR), since this must be exactly divisible by C(x).

Want to ensure that C(x) does not divide evenly into the error polynomial E(x). A CRC then detects:
- all single-bit errors, as long as the x^k and x^0 terms of C(x) have non-zero coefficients;
- all double-bit errors, as long as C(x) has a factor with at least three terms;
- any odd number of errors, as long as C(x) contains the factor (x + 1);
- any burst error (i.e. a sequence of consecutive errored bits) for which the length of the burst is less than k bits. Most burst errors longer than k bits can also be detected.
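The long division above is just repeated XOR with the shifted generator wherever the leading bit is set; a minimal sketch for short generators (the function name and bit-packed representation are ours):

```c
#include <stdint.h>

/* CRC remainder by long division in GF(2): append k zero bits to the
   message, then XOR the generator in wherever the current leading bit
   is set. gen is the (k+1)-bit generator pattern, e.g. 0xD = 1101. */
uint32_t crc_remainder(uint32_t msg, int msg_bits, uint32_t gen, int k) {
    uint32_t rem = msg << k;                 /* M(x) * x^k */
    for (int i = msg_bits - 1; i >= 0; i--)
        if (rem & (1u << (i + k)))           /* leading bit set? */
            rem ^= gen << i;                 /* subtract (XOR) C(x) */
    return rem;                              /* k-bit remainder */
}
```

crc_remainder(0x9A /* 10011010 */, 8, 0xD /* 1101 */, 3) yields 0x5 (101), so the transmitted codeword is 10011010101, matching the worked division.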
Common polynomials for C(x):

    CRC        C(x)
    CRC-8      x^8 + x^2 + x + 1
    CRC-10     x^10 + x^9 + x^5 + x^4 + x + 1
    CRC-12     x^12 + x^11 + x^3 + x^2 + x + 1
    CRC-16     x^16 + x^15 + x^2 + 1
    CRC-CCITT  x^16 + x^12 + x^5 + 1
    CRC-32     x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1

CRC calculation with a shift register: if the term x^n is in the generator, put an XOR gate before the corresponding shift register stage. (Figure: shift register for generator x^3 + x^2 + 1, with the message fed in and XOR gates before the x^0 and x^2 stages.)
CRC as cyclic binary systematic code

    c(x) = g(x) · u(x)
    h(x) = (x^N - 1) / g(x)
    c(x) · h(x) = 0  mod (x^N - 1)

systematic form:

    c(x) = u(x) · x^(N-K) - [ u(x) · x^(N-K) mod g(x) ]

Fletcher's checksum

Software CRC generation and testing is processor intensive — approx. 50 machine instructions per byte of data. Fletcher's checksum is nearly equal in error detection, but requires much less computation (4-5 machine instructions per byte of data). It detects:
- all single-bit errors
- double-bit errors if they are close together (65535 vs. 2040 bits)
Fletcher's checksum

    integer i, sum1, sum2
    sum1 = 0
    sum2 = 0
    for i from 1 to msg_length do
        sum1 = (sum1 + msg[i]) mod 255
        sum2 = (sum2 + sum1) mod 255
    end for
    check1 = 255 - ((sum1 + sum2) mod 255)
    msg[msg_length+1] = check1
    check2 = 255 - ((sum1 + check1) mod 255)
    msg[msg_length+2] = check2

Checksum performance

CRC-32 (32 bit):
- 32-bit continuous bursts
- 2-bit errors in 64K blocks
- all odd numbers of bit errors

Internet checksum (16 bit):
- all 1-bit errors
- order independent
- most efficient calculation

Fletcher's checksum (16 bit — two 8-bit sums):
- all 1-bit errors
- order dependent
- unaffected by starting or trailing zeros
- 15-bit bursts
- 2-bit errors in 2040-bit blocks
- efficient calculation
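A direct C transcription of the pseudocode above, with the one-based indexing replaced by zero-based indexing (the function name is ours):

```c
#include <stdint.h>
#include <stddef.h>

/* Fletcher's checksum: two running 8-bit sums mod 255. Appends the two
   check bytes to msg (which must have room for len + 2 bytes) so that
   a receiver running the same sums over all len + 2 bytes gets zero. */
void fletcher_append(uint8_t *msg, size_t len) {
    uint32_t sum1 = 0, sum2 = 0;
    for (size_t i = 0; i < len; i++) {
        sum1 = (sum1 + msg[i]) % 255;
        sum2 = (sum2 + sum1) % 255;
    }
    uint8_t check1 = (uint8_t)(255 - ((sum1 + sum2) % 255));
    msg[len] = check1;
    uint8_t check2 = (uint8_t)(255 - ((sum1 + check1) % 255));
    msg[len + 1] = check2;
}
```

The choice of check1 cancels sum2 and the choice of check2 cancels sum1, which is why the receiver's two sums both come out to zero on an undamaged message.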
Checksum performance observations
- Checksum distributions on modest amounts of real data are substantially different from the distributions one would anticipate for random data.
- For the TCP checksum (and apparently Fletcher's checksum as well), the skewed distribution makes failure in the face of combined or reshuffled data more likely.
- Compressing data clearly improves the performance of checksums.
- There is a strong suggestion that the common practice of adjusting checksum fields to cause a packet or segment's checksum to sum to zero is a bad idea.

Error correction

The Hamming distance defines a ball around each code word; nonintersecting balls are what permit error correction (up to the error correction distance):

    d_min > 2t
    t_corr = int[ (d_min - 1) / 2 ]

Failure models: a simple error (position unknown) vs. an erasure (position known):

    t_erasure = d_min - 1
Error correction codes

Linear coding scheme; systematic coding scheme. With generator matrix G and parity matrix H:

    c = u · G          (encoding)
    G · H^T = 0
    c · H^T = 0        (valid code words)

For a received word v, the error is e := v - c, and the syndrome is

    s^T = H · v^T = H · e^T

so the syndrome depends only on the error pattern, not on the code word sent.
Parity-based encodings: Hamming codes

An error detecting and correcting code that
- detects all single and double bit errors,
- corrects all single bit errors,
- requires 2^m ≥ n + 1 (m check bits in a block of n bits).
Hamming codes

Calculation (encoding):
- Write the data bits in positions whose binary representation has at least two 1 bits.
- Set the bits whose positions are powers of 2 so that the parity of odd-numbered positions is even, the parity of positions that are 2 or 3 mod 4 is even, and so on.

Decoding:
- Compute the parity of all odd-numbered positions, the parity of positions that are 2 or 3 mod 4, etc.
- Interpret these parities as a number, which is a bit position; flip the bit in that position.
- Read out the data bits.

(7,4) Hamming code: layout DDDPDPP (positions 7..1; data bits D in positions 7, 6, 5, 3; parity bits P in positions 4, 2, 1), each parity bit covering the data bits whose position number contains it. The minimum distance between any two valid code words is 3.
(7,4) Hamming code as systematic code

Code b as k = b·G, with b = [b_0 b_1 b_2 b_3] and

        1 0 0 0 0 1 1
    G = 0 1 0 0 1 0 1
        0 0 1 0 1 1 0
        0 0 0 1 1 1 1

To check, calculate k·H^T with

          0 0 1
          0 1 0
          0 1 1
    H^T = 1 0 0
          1 0 1
          1 1 0
          1 1 1

(row i of H^T is the binary representation of i, so the syndrome of a single-bit error is exactly the error position).

Hamming code example (positions 15..0; the x is bit 0; parity bits at positions 1, 2, 4, 8):

    data placed:    0111110x100x0xxx
    parity filled:  011111011000001x

Transmit it with an error:

    received:       0111010110000010

Compute the parities (odd positions; positions 2 or 3 mod 4; etc.): they spell 1011 in binary, so bit 11 is in error. Flip it:

    corrected:      0111110110000010

Extract the data: 0111110 100 0
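Because row i of H^T is the binary representation of i, decoding reduces to XOR-ing together the positions of all 1 bits; a sketch of the systematic (7,4) code (function names and the bit-packed layout, with bit i of a byte holding codeword position i+1, are ours):

```c
#include <stdint.h>

/* Systematic (7,4) Hamming code over the G and H^T above:
   data b0..b3 in positions 1-4, parity in positions 5-7. */
uint8_t hamming74_encode(uint8_t data) {          /* data = 4 bits b0..b3 */
    uint8_t b0 = data & 1, b1 = (data >> 1) & 1;
    uint8_t b2 = (data >> 2) & 1, b3 = (data >> 3) & 1;
    uint8_t p5 = b1 ^ b2 ^ b3;                    /* column 5 of G */
    uint8_t p6 = b0 ^ b2 ^ b3;                    /* column 6 of G */
    uint8_t p7 = b0 ^ b1 ^ b3;                    /* column 7 of G */
    return (uint8_t)(data | p5 << 4 | p6 << 5 | p7 << 6);
}

int hamming74_syndrome(uint8_t code) {            /* 0 = clean, else position */
    int s = 0;
    for (int pos = 1; pos <= 7; pos++)
        if ((code >> (pos - 1)) & 1)
            s ^= pos;          /* row `pos` of H^T is `pos` in binary */
    return s;
}

uint8_t hamming74_correct(uint8_t code) {
    int s = hamming74_syndrome(code);
    if (s)
        code ^= (uint8_t)(1u << (s - 1));         /* flip the flagged bit */
    return code;
}
```

Encoding 4 data bits, flipping any single codeword bit, and correcting recovers the original data, since the syndrome names the flipped position directly.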
Reed-Solomon coding

Systematic coding built from a Vandermonde matrix; handles both failure models (errors and erasures); operates on s-bit symbols using finite field arithmetic.

Reed-Solomon codes exclusively use polynomials derived using finite field mathematics, known as Galois Fields, to encode and decode block data. Either multiplication or addition can be used to combine elements, and the result of adding or multiplying two elements is always a third element contained in the field. In addition, there exists at least one element such that every other nonzero element can be expressed as a power of this element.
Finite fields

(K, +, *) is called a finite field (or Galois field) if:
1. + and * are closed on K,
2. for any elements a and b of K, a+b = b+a and a*b = b*a,
3. for any elements a, b and c of K, (a+b)+c = a+(b+c) and (a*b)*c = a*(b*c),
4. for any elements a, b and c of K, a*(b+c) = a*b + a*c,
5. there exist a unique additive identity 0 and a unique multiplicative identity 1,
6. for any element a of K, there exists a unique additive inverse -a, and for any nonzero element b of K, there exists a unique multiplicative inverse b^-1.

Finite fields (cont.)
- There exists a finite field of order q if and only if q is a prime or a prime power.
- (K, + (mod p), * (mod p)) is the finite field GF(p).
- Fields with q = p^n elements, p prime, are called extension fields.
- A polynomial over a GF which is unfactorable is called an irreducible polynomial.
- If an element α of GF(q) satisfies α^i ≠ 1 for all i = 1, 2, ..., q-2, then α is called a primitive element of GF(q).
Finite fields (cont.): calculations
- p = 2: sum bit-by-bit modulo 2 — the sum is XOR
- multiplication & division: via the primitive element and its logarithm

GF Example

Consider a GF(2^3) code comprising 3-bit symbols. α is the primitive element and is the solution to the equation F(x) = x^3 + x + 1 = 0, that is, α^3 + α + 1 = 0. The elements can be represented as ordinary polynomials:

    000 = 0        100 = x^2
    001 = 1        101 = x^2 + 1
    010 = x        110 = x^2 + x
    011 = x + 1    111 = x^2 + x + 1

Using the properties of the Galois Field and modulo-2 addition (where 1+1 = α+α = α^2+α^2 = 0):

    0 = 000,  1 = 001,  α = 010,  α^2 = 100,  α^3 = α + 1 = 011,
    α^4 = α(α + 1) = α^2 + α = 110,  α^5 = α^2 + α + 1 = 111,
    α^6 = (α^3)^2 = α^2 + 1 = 101,  α^7 = α^3 + α = 2α + 1 = 1 = 001
GF Example (cont.)

Elements can be multiplied by simply adding exponents (mod 2^s - 1), always resulting in another element in the Galois Field:

    α · α = α^2:      (010)(010) = 100
    α^2 · α^3 = α^5:  (100)(011) = 111

RS(n,k) with s-bit symbols:
- n = 2^s - 1
- k = n - 2t (t = number of correctable symbol errors)
- coding: codeword c(x) = g(x) · a(x)
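The exponent table for GF(2^3) above turns multiplication into addition of exponents mod 7; a sketch (the table and function names are ours, the field is the one built from x^3 + x + 1 above):

```c
#include <stdint.h>

/* GF(2^3) from the primitive polynomial x^3 + x + 1:
   gf8_exp[i] holds alpha^i as a 3-bit value (see the table above). */
static const uint8_t gf8_exp[7] = {1, 2, 4, 3, 6, 7, 5};

static int gf8_log(uint8_t v) {            /* inverse of gf8_exp; v != 0 */
    int i = 0;
    while (gf8_exp[i] != v) i++;
    return i;
}

/* Multiply two field elements by adding their exponents mod 2^3 - 1. */
uint8_t gf8_mul(uint8_t a, uint8_t b) {
    if (a == 0 || b == 0) return 0;        /* 0 has no logarithm */
    return gf8_exp[(gf8_log(a) + gf8_log(b)) % 7];
}
```

gf8_mul(0x2, 0x2) reproduces (010)(010) = 100, and gf8_mul(0x4, 0x3) gives 111, as in the example.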
Coding (cont.)

Coding with a Vandermonde matrix produces the codewords.

Reed-Solomon codes example

The size of the Galois Field, which determines the number of symbols in the code, is based on the number of bits comprising a symbol; 8-bit symbols are commonly used. A primitive polynomial often used in GF(2^8) systems is x^8 + x^4 + x^3 + x^2 + 1. The code can use the input word to generate two types of parity, P and Q. The P parity can be a modulo-2 sum of the symbols. The Q parity multiplies each input word by a different power of the GF primitive element. If one symbol is erroneous, the P parity gives a nonzero syndrome S1. The Q parity yields a syndrome S2. By checking the relationship between S1 and S2, the RS code can locate the error. Correction is performed by adding S1 to the designated location.
RS example (cont.)

Suppose that A, B, C, and D are data symbols and P and Q are parity symbols of RS(6,4). The RS code will satisfy the following equations:

    A + B + C + D + P + Q = 0
    α^6·A + α^5·B + α^4·C + α^3·D + α^2·P + α^1·Q = 0

Solving for the parity symbols:

    P = α^1·A + α^2·B + α^5·C + α^3·D
    Q = α^3·A + α^6·B + α^4·C + α^1·D

If A = 001 = 1, B = 101 = α^6, C = 011 = α^3, D = 100 = α^2, then P = 101 = α^6, Q = 110 = α^4.

Assume A, B, C, D, P, and Q are the received data:

    S1 = A + B + C + D + P + Q
    S2 = α^6·A + α^5·B + α^4·C + α^3·D + α^2·P + α^1·Q

Each possible error pattern is expressed by E_i:

    S1 = E_A + E_B + E_C + E_D + E_P + E_Q
    S2 = α^6·E_A + α^5·E_B + α^4·E_C + α^3·E_D + α^2·E_P + α^1·E_Q

- If there is no error, then S1 = S2 = 0.
- If symbol A is erroneous, S1 = E_A and S2 = α^6·S1
- If symbol B is erroneous, S1 = E_B and S2 = α^5·S1
- If symbol C is erroneous, S1 = E_C and S2 = α^4·S1
- If symbol D is erroneous, S1 = E_D and S2 = α^3·S1
- If symbol P is erroneous, S1 = E_P and S2 = α^2·S1
- If symbol Q is erroneous, S1 = E_Q and S2 = α^1·S1

In other words, an error results in nonzero syndromes; the erroneous symbol can be located from the difference of the weighting between S1 and S2, and its value corrected.
RS example (cont.)

If the received data is A = 001 = 1, B = 101 = α^6, C = 001 = 1 (erroneous), D = 100 = α^2, P = 101 = α^6, Q = 110 = α^4, then

    S1 = α = 010
    S2 = α^2 + α + 1 = α^5 = 111

Since S2 = α^4·S1, symbol C must be erroneous, and because S1 = E_C = 010:

    C = C + E_C = 001 + 010 = 011
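The whole worked example can be replayed in a few lines over the GF(2^3) table above; a sketch of single-error correction for this RS(6,4) layout (helper and function names are ours):

```c
#include <stdint.h>

/* GF(2^3) from x^3 + x + 1: EXP[i] = alpha^i as a 3-bit value. */
static const uint8_t EXP[7] = {1, 2, 4, 3, 6, 7, 5};

static uint8_t gmul_pow(int e, uint8_t v) {   /* alpha^e * v, v may be 0 */
    if (v == 0) return 0;
    int lv = 0;
    while (EXP[lv] != v) lv++;
    return EXP[(e + lv) % 7];
}

/* Single-error correction for the RS(6,4) example: the six symbols
   A,B,C,D,P,Q carry weights alpha^6..alpha^1 in the second syndrome.
   Returns the corrected index, -1 if clean, -2 if uncorrectable. */
int rs64_correct(uint8_t w[6]) {
    uint8_t s1 = 0, s2 = 0;
    for (int i = 0; i < 6; i++) {
        s1 ^= w[i];                      /* S1: plain modulo-2 sum */
        s2 ^= gmul_pow(6 - i, w[i]);     /* S2: weighted sum */
    }
    if (s1 == 0 && s2 == 0) return -1;   /* no error */
    for (int i = 0; i < 6; i++)
        if (s2 == gmul_pow(6 - i, s1)) { /* S2 = alpha^(6-i) * S1 ? */
            w[i] ^= s1;                  /* add E = S1 back in */
            return i;
        }
    return -2;  /* more than one symbol in error */
}
```

Fed the received data above (with C = 001 instead of 011), it reports index 2 erroneous and restores C to 011.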