Improving Bit Flip Reduction for Biased and Random Data

Seyed Mohammad Seyedzadeh, Rakan Maddah, Donald Kline Jr, Student Member, IEEE, Alex K. Jones, Senior Member, IEEE, Rami Melhem, Fellow, IEEE

Abstract: Nonvolatile memory technologies such as Spin-Transfer Torque Random Access Memory (STT-RAM) and Phase Change Memory (PCM) are emerging as promising replacements for DRAM. Before STT-RAM and PCM can be deployed in functional systems, a number of challenges must still be addressed. Specifically, both require relatively high write energy, STT-RAM suffers from high bit error rates, and PCM suffers from low endurance. A common solution to these challenges is to minimize the number of bits changed per write. In this paper, we propose and evaluate the hybrid coset encoder to efficiently improve and balance bit flip reduction for biased and unbiased data. The core of the coset encoder consists of biased and unbiased vectors that map the input data to a larger set of data vectors. Subsequently, the intermediate data vector that yields the fewest differences when compared to the currently stored data is selected. Our evaluation shows that the hybrid coset encoder reduces bit flips by up to 25% over a baseline differential writing scheme. Further, our proposed scheme reduces bit flips by up to 20% over the leading bit-flip minimization scheme for biased data, while achieving very low decoding overhead similar to the Flip-N-Write scheme.

Index Terms: Non-Volatile Memory, Coset Coding, Reliability.

1 INTRODUCTION

As the number of cores per chip continues to increase, the memory system is becoming more than ever a defining component for the performance of computer systems. A large memory capacity operating under stringent quality-of-service requirements is required to respond to memory access requests of executing cores within acceptable latencies. Unfortunately, DRAM, which currently forms the building block of the memory system, is becoming limited by power and scalability challenges, thus endangering the evolution of the memory system. This has turned the attention of architects and researchers to alternative memory technologies [1-4]. Among several candidates, both Phase Change Memory (PCM) and Spin-Transfer Torque Random Access Memory (STT-RAM) are receiving considerable attention as potential replacements for DRAM. Assessments and evaluations of PCM [5, 6] show that it can compete with DRAM in terms of performance while providing improved scalability and power efficiency. Multi-level cell techniques for STT-RAM [7] suggest the potential for near-DRAM densities while retaining near-SRAM performance (potentially faster than DRAM). Yet, both technologies suffer from a number of challenges that must be resolved to become viable for high volume manufacturing. Specifically, PCM suffers from low endurance [8] (10^6 to 10^8 writes on average) and STT-RAM suffers from high write bit error rates [9-11].

(Author affiliations: S.M. Seyedzadeh and R. Melhem are with the Department of Computer Science, University of Pittsburgh, Pittsburgh, PA; seyedzadeh@cs.pitt.edu, melhem@cs.pitt.edu. Rakan Maddah is with Intel Corporation; this work was completed while the author was a Ph.D. student at the University of Pittsburgh; rakan.maddah@intel.com. D. Kline Jr and A.K. Jones are with the Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA; dek61@pitt.edu, akjones@ece.pitt.edu.)
To address the challenges faced by PCM and STT-RAM, servicing write requests while minimizing the actual number of bits written to memory is a promising approach. This reduces the effective wear-out rate of PCM cells and lowers the effective bit error rate of STT-RAM. In its simplest form, minimizing the actual number of bit flips can be achieved through a concept called differential write [12]: a bit-by-bit comparison between the new data to be written and the data currently stored within the memory block, followed by writing only to the cells whose stored bit values differ from their new bit values. Clearly, the higher the similarity of the new data to the currently stored data, the lower the number of required bit changes. In this realm, coset coding techniques [13-17] have been used to encode the new data into a form that exhibits high similarity to the currently stored data. The encoding process consists of mapping the new data vector into several other candidate vectors and subsequently picking the candidate that minimizes the bit flips. Therefore, the type of the encoding can play a significant role in minimizing bit flips. Consequently, in this paper, we consider two fundamental problems: (1) finding efficient encodings that reduce not only the number of bit flips but also the encoding hardware overhead, and (2) finding encodings that apply to both unbiased and biased data.

We advance the notion of coset coding to deliver solutions to these problems and provide major enhancements to the understanding of coset coding and how it may be implemented in a simple fashion with reduced computational complexity. We make the following contributions to data encoding for bit flip minimization:

- We observe that increasing the randomness of encoded vector candidates decreases the number of bit flips required for unbiased (i.e., apparently random) data sets.
- We demonstrate mathematically and by simulation that random encoding outperforms the leading bit-minimization encoding approach based on partially inverting bits.
- We introduce a new type of base coset called the hybrid coset, which includes unbiased and biased vectors suitable for unbiased and biased data, respectively.
- We propose low-overhead encoders and decoders that only use bitwise exclusive-OR operations.

The remainder of this paper is organized as follows. Section 2 presents fundamentals on bit flip reduction techniques. Section 3 describes write minimization using a randomly generated coset. Section 4 discusses two different ways to construct random cosets. Section 5 illustrates a pseudo-random encoding scheme [18] suitable for unbiased data. Sections 6 and 7 explain the low-overhead and flexible encoders/decoders that can be applied to biased and unbiased data. Section 8 presents an experimental evaluation of hybrid coset coding for biased and unbiased data. Section 9 compares the encoding and decoding overheads of the schemes. Finally, Section 10 concludes our work.

2 BACKGROUND: WRITE MINIMIZATION USING COSET THEORY

As mentioned in the previous section, differential write [12] is a technique to minimize bits written by only writing bits that change value during the write operation. Consider an n-bit data block B that is to be written to an n-cell memory block that already stores data D. A traditional differential write first reads D, compares it with B and only writes the bits of B that are different from their corresponding bits in D. Thus, for entirely random data, the distribution of zeros and ones in both B and D is random, and hence, on average, n/2 bits are written to memory instead of n bits.

Coset theory [19] attempts to increase the number of bits that are identical in B and D by exploring a number K of possible encodings of B, where K = 2^k and k is an integer. The overall number of written bits can be reduced by selecting the encoding that has the minimum Hamming distance to D (the fewest differing bits). To recover the original data B, it is necessary to have a unique decoding path. To accomplish this, one coset approach defines a fixed set of K n-bit vectors, C_0, ..., C_{K-1}, and uses these vectors to generate K different encodings of B through bitwise XOR. We denote these K different encodings of B as W_0, ..., W_{K-1}, where each W_i, 0 <= i <= K-1, is an (n+k)-bit vector whose first n bits equal C_i ⊕ B and whose last k bits record the binary representation of the index i. Any W_i among the K possible encodings can be decoded back to B by recovering i from the last k bits of W_i and XORing C_i with the first n bits of W_i, i.e., (C_i ⊕ B) ⊕ C_i = B. To minimize the number of bits written, as previously mentioned, the encoding W_i that has the minimum Hamming distance from D (the data already stored in memory) is used. However, each W_i and D contains n+k bits, while the data block, B, contains only n bits.
The additional k bits, which store the index i, represent the coding overhead needed to reduce the Hamming distance and minimize the number of bits written during a differential write. We denote these overhead bits as h_0, ..., h_{k-1}. In the rest of this paper, we refer to the set C_0, ..., C_{K-1} as the base coset.

Flip-N-Write (FNW) proposes a method to reduce the number of bits written using differential write by selectively inverting blocks of the data to be written [16]. In general, for any number of k bits, Flip-N-Write divides B into k equal partitions and writes each partition directly or inverted, whichever minimizes the number of written bits. It uses k overhead bits h_0, ..., h_{k-1} to track which partitions are inverted, allowing retrieval of the original data. FNW is a special case of the coset approach in which each n-bit vector C_i in the base coset is constructed from k sub-vectors, C_{i,j}, j = 0, ..., k-1, each consisting of n/k bits. Specifically, if the overhead bits h_0, ..., h_{k-1} store the binary representation of the index i, then the n/k bits of C_{i,j} are all zeroes if h_j = 0 and all ones if h_j = 1. For example, if k = 1 (that is, K = 2), the base coset for Flip-N-Write contains the two vectors 0...0 and 1...1, which means that the data vector B can either be written as is (B ⊕ 0...0) or written inverted (B ⊕ 1...1), with one overhead bit, h_0, indicating which of the two options is used. For k = 2 (that is, K = 4), the base coset for Flip-N-Write contains the 4 vectors 0...00...0, 0...01...1, 1...10...0 and 1...11...1, which means that B is divided into two halves and each half can either be inverted or not, with one overhead bit, h_0, indicating whether the first half is inverted and another overhead bit, h_1, indicating whether the second half is inverted.

Another approach for write minimization, FlipMin, proposes using a special instance of coset theory to encode B using the dual of the Hamming (72,64) code [20] to obtain W_0, ..., W_{K-1} and to decode any W_i back to B [17]. FlipMin performs a one-to-many mapping through its base coset from each dataword to a coset of vectors and then picks for writing the vector that minimizes the number of bit flips.
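To make the mechanics concrete, the following minimal Python sketch (our illustration, not code from the paper) implements the generic base-coset encode/decode of this section, instantiated with the k = 1 FNW base coset. The block values and the placement of the index in the low-order bits are illustrative assumptions.

    # Minimal sketch (ours) of base-coset encoding with differential write.
    # Words are modeled as Python ints: n data bits plus k index bits.

    def hamming(a, b):
        return bin(a ^ b).count("1")

    def coset_encode(B, D, base_coset, k):
        # Candidates W_i = (C_i XOR B) with the index i in the low k bits;
        # keep the candidate closest to the currently stored word D.
        best = None
        for i, C in enumerate(base_coset):
            W = ((C ^ B) << k) | i
            if best is None or hamming(W, D) < hamming(best, D):
                best = W
        return best

    def coset_decode(W, base_coset, k):
        i = W & ((1 << k) - 1)            # recover index i from the k index bits
        return (W >> k) ^ base_coset[i]   # (C_i XOR B) XOR C_i = B

    # k = 1 FNW base coset for n = 8: the all-zeroes and all-ones vectors.
    fnw_coset = [0b00000000, 0b11111111]
    B, D = 0b10111110, 0b000000001        # hypothetical new data and stored word
    W = coset_encode(B, D, fnw_coset, k=1)
    assert coset_decode(W, fnw_coset, k=1) == B

The number of cells actually programmed by the differential write is exactly hamming(W, D), which is what the candidate selection minimizes.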

The key difference between FNW and FlipMin can be conceptually described in terms of the base coset used to map the dataword. Specifically, FNW uses simple patterns such as 0...0 and 1...1 to construct the base coset, while FlipMin utilizes the generators of linear codes as the base coset. In the next section, we propose a third option for the base coset.

3 WRITE MINIMIZATION USING A RANDOMLY GENERATED COSET

In this section, we take a different approach to building the base coset by randomly generating each vector C_0, ..., C_{K-1}. We mathematically demonstrate that a randomly generated coset can complete write requests while inducing fewer bit flips than FNW. To derive a formula for the number of written bits when the base coset consists of K randomly generated vectors, C_0, ..., C_{K-1}, we compute the Hamming distance between each encoded vector C_i ⊕ B and the currently stored data, D. We then estimate the expected value of the minimum of this distance over i = 0, ..., K-1. The Hamming distance between C_i ⊕ B and D is equal to the Hamming weight of C_i ⊕ B ⊕ D. Moreover, because B, D and C_i are random, C_i ⊕ B ⊕ D is also random. This implies that the number of written bits (NWB) is equivalent to the expected value of the minimum Hamming weight of K random vectors.

The random vector C_i can be any element of the set of 2^n possible strings of zeroes and ones. We divide the set of 2^n possible elements into 2^{n-k} distinct groups so that each group has 2^k elements. The number of groups can range from one group (k = n) to 2^n groups (k = 0). The weight of a word is the number of ones it contains, denoted by the parameter w, where 0 <= w <= n. The total number of elements with weight w in the set of 2^n elements is $\binom{n}{w}$. Since the number of elements in each group is 2^k, the number of elements with weight w in a group of 2^k elements ranges over 1 <= l <= 2^k. In the set of 2^n elements, there are $\binom{2^n}{2^k}$ different combinations forming a group of 2^k elements. To obtain the average minimum weight of groups, we follow steps 1-3 as follows:

Step 1. Find the number of combinations with minimum weight w (NC_w), i.e., combinations that have at least one and at most 2^k elements with weight w, with all remaining elements having weight greater than w:

$$NC_w = \sum_{l=1}^{2^k} \binom{\binom{n}{w}}{l} \binom{\sum_{j=w+1}^{n} \binom{n}{j}}{2^k - l} \quad (1)$$

Step 2. Multiply each NC_w by the corresponding weight and sum:

$$NC = \sum_{w=0}^{n} w \cdot NC_w \quad (2)$$

Step 3. Calculate the average number of bits updated per write (denoted NWB_{Random}) as:

$$NWB_{Random} = \frac{NC}{\binom{2^n}{2^k}} \quad (3)$$

where 0 <= w <= n, 1 <= l <= 2^k and 0 <= k <= n.

For comparison with FNW [16], we note that given a memory block of size n and k auxiliary bits, the number of bits written by FNW can be expressed as:

$$NWB_{FNW} = k\left(\frac{1}{2^{n/k-1}} \sum_{i=0}^{\frac{n}{2k}-1} i\binom{n/k}{i} + \frac{n}{2k} \cdot \frac{\binom{n/k}{n/2k}}{2^{n/k}} + \frac{1}{2}\right) \quad (4)$$

Figure 1 shows the bit flip reduction over differential write (n/2 bit flips) achieved by randomly generated cosets and FNW cosets, derived through Eq. (3) and Eq. (4). The randomly generated coset achieves considerably higher flip reduction rates than an FNW coset across the different block sizes.

[Fig. 1: Weight average difference of random cosets and FNW with 4 auxiliary bits and block sizes of 32 (12.5%), 64 (6.25%), 128 (3.125%), 256 (1.5625%) and 512 (0.78125%) bits. The value in parentheses represents space overhead.]
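Eqs. (3) and (4) can be cross-checked by direct Monte Carlo estimation; the sketch below (ours, with an arbitrary seed and trial count) draws random B, D and base cosets and averages the minimum Hamming distance over the K candidates.

    import random

    def hamming(a, b):
        return bin(a ^ b).count("1")

    def avg_written_bits_random_coset(n, k, trials=20000, seed=1):
        # Expected cells written per update when K = 2^k random base-coset
        # vectors are tried and the candidate closest to D is selected.
        rng = random.Random(seed)
        K, total = 1 << k, 0
        for _ in range(trials):
            B = rng.getrandbits(n)
            D = rng.getrandbits(n + k)    # stored word includes the k index bits
            coset = [rng.getrandbits(n) for _ in range(K)]
            total += min(hamming(((C ^ B) << k) | i, D)
                         for i, C in enumerate(coset))
        return total / trials

    # For n = 32 and k = 4 the estimate lands well below the (n + k)/2 = 18
    # expected flips of rewriting a random 36-bit word without encoding.
    print(avg_written_bits_random_coset(32, 4))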
Unfortunately, it is infeasible to determine an analytical model that estimates the number of bit flips required for FlipMin to complete a write request. Thus, we rely on Monte Carlo simulations to compare random cosets against FlipMin. Our analysis from these simulations indicates that random cosets can achieve significantly fewer bit flips than FlipMin for unbiased data, as reported in Section 8.1.

4 MECHANISMS OF THE RANDOM COSET GENERATION

In Section 2, we introduced cosets that can encode a value B into a value W_i that reduces the bit transitions when writing to a memory block containing a value D using differential write. We can also refer to the encoded value W_i as a codeword. The analysis provided in Section 3 demonstrates that using a randomly generated coset for unbiased data provides a higher reduction in bit flips than the leading existing flip minimization schemes. However, to decode a codeword encoded with a random coset, we need to know the random vector C_i that was used to encode the data. Since the inverse generator matrix concept used in FlipMin is not a valid option

for a random coset, we are left with the following two options:

- Use a random number generator at encoding time to generate the random coset element, C_i, from B. The hardware overhead of generating these random elements with traditional random generators is high, and the generation is typically irreversible at decoding time. To address these difficulties, in the next section we propose a Pseudo-Random Encoding Scheme, PRES, to generate pseudo-random vectors directly from the value B. PRES codewords are quickly recoverable using an efficient decoding mechanism that is less complex than FlipMin.
- Generate the random coset elements in advance and store them in both the encoder and the decoder. This option leads to a low-overhead encoder and decoder that is more flexible and efficient than the first option. Details are explained in Sections 6 and 7.

5 COSET GENERATION BASED ON PRES

PRES uses a tree-structured pseudo-random encoding model to generate pseudo-random cosets. We define conditions for our proposed tree-structure model to guarantee that the generated pseudo-random cosets are demonstrably close to true random under several standard tests. We describe the PRES scheme in detail in the following sections.

5.1 PREM: A Pseudo-Random Encoding Model

We first define a pseudo-random encoding model (PREM) to decorrelate a data block B as:

    P_i = P_{n-1} ⊕ B_i    for i = 0
    P_i = B_{i-1} ⊕ B_i    for i = 1                (5)
    P_i = P_{i-1} ⊕ B_i    for i = 2, ..., n-1

where B_i and P_i represent the i-th element of the data block and the pseudo-random vector, respectively, and the parameter n is the number of memory cells to be encoded. As expressed in Eq. (5), P_i for 1 <= i <= n-1 is generated first, followed by P_0, which is produced from P_{n-1} of the previous step and B_0. Eq. (5) is designed so that every B_i has the potential to be updated by each encoding process: since the probability of a cell holding a 0 or a 1 is 1/2, the probability of the cell being updated is also 1/2. That is, the probability that corresponding cells hold different values is 1/2, because two of the four possible combinations (01 and 10, out of 00, 01, 10 and 11) differ. Therefore, Eq. (5) produces vectors that are pseudo-random with high probability. The corresponding decoding algorithm for Eq. (5) can be expressed as:

    B_i = P_{n-1} ⊕ P_i    for i = 0
    B_i = B_{i-1} ⊕ P_i    for i = 1                (6)
    B_i = P_{i-1} ⊕ P_i    for i = 2, ..., n-1

[Fig. 2: Overview of PREM for encoding (B_0, ..., B_{n-1} to P_0, ..., P_{n-1}) and decoding.]

Figure 2 shows the feedback path from the left side to the right side that causes the P_i to be produced serially in the encoder. As shown in Figure 2, there is no such feedback path between P_i and B_i in the decoder; the advantage of this configuration is that all cells in the decoder can be decoded simultaneously. Thus, read accesses, which are typically on the critical path for processor performance and require decoding, can be streamlined. Table 1 illustrates the encoding and the decoding for n = 4.

TABLE 1: The encoding and the decoding for n = 4

    Input:    B = (B_0, B_1, B_2, B_3) = (0, 1, 0, 0)
    Encoding: P_1 = B_1 ⊕ B_0 = 1 ⊕ 0 = 1
              P_2 = B_2 ⊕ P_1 = 0 ⊕ 1 = 1
              P_3 = B_3 ⊕ P_2 = 0 ⊕ 1 = 1
              P_0 = B_0 ⊕ P_3 = 0 ⊕ 1 = 1
    Decoding: B_0 = P_0 ⊕ P_3 = 1 ⊕ 1 = 0
              B_1 = P_1 ⊕ B_0 = 1 ⊕ 0 = 1
              B_2 = P_2 ⊕ P_1 = 1 ⊕ 1 = 0
              B_3 = P_3 ⊕ P_2 = 1 ⊕ 1 = 0

Further, several pseudo-random encodings can be obtained by applying PREM in different bit orderings to expand the number of candidate encodings. For example, if the feedback path in Figure 2 is used from the right side to the left side (i.e., reversed), a different pseudo-random vector, P', can be generated. Thus, one bit pattern can be used in two different directions to build P and P'.
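A compact Python rendering of Eqs. (5) and (6), our sketch with cells modeled as a Python bit list, makes the serial encode and the parallelizable decode explicit:

    def prem_encode(B):
        # Eq. (5): P_1 first, then P_2..P_{n-1} serially, and P_0 last
        # from P_{n-1}, so the encoder produces its bits in series.
        n = len(B)
        P = [0] * n
        P[1] = B[0] ^ B[1]
        for i in range(2, n):
            P[i] = P[i - 1] ^ B[i]
        P[0] = P[n - 1] ^ B[0]
        return P

    def prem_decode(P):
        # Eq. (6): every bit except B_1 depends only on P, so in hardware
        # all cells can be recovered simultaneously.
        n = len(P)
        B = [0] * n
        B[0] = P[n - 1] ^ P[0]
        B[1] = B[0] ^ P[1]
        for i in range(2, n):
            B[i] = P[i - 1] ^ P[i]
        return B

    assert prem_decode(prem_encode([0, 1, 0, 0])) == [0, 1, 0, 0]  # Table 1

Note that in prem_decode only B_1 depends on another decoded bit (B_0), and B_0 itself depends only on P, so the whole block decodes in two XOR levels.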
To produce additional pseudo-random codewords, the encoding process can be applied in different patterns, such as different block-interleaved orderings. We describe this process in the following section and use it to build an indexable set of pseudo-random vectors for write-minimized encoding.

5.2 PRES: A Pseudo-Random Encoding Scheme

We can create p different patterns by subdividing B into sub-blocks conceptually represented by the rows (or columns) of a two-dimensional matrix. Thus, PRES can simultaneously and independently encode these sub-blocks using PREM in two opposite directions to generate two different pseudo-random codewords. By constructing p different matrices we can generate 2p codewords. For a partition-based pseudo-random generator model to be functional, there are certain requirements on how each matrix is partitioned and encoded. Each particular partitioning corresponds to a particular pattern and results in a unique pseudo-random encoding. However, partitions should group bits into sub-blocks such that each partition groups unique bits together and does not repeat bits grouped together in the sub-blocks of other partitions. If overlapping occurs, those bits are guaranteed to have the same values in the two different codewords, which decreases the randomness of the encoding candidates. Ensuring that the patterns have minimal overlap makes it possible to build a partition-based pseudo-random generator model that closely mimics an ideal random encoding generator. One way to do this is to use a single matrix and to apply PREM along different dimensions using different orderings, such as along rows, then columns, diagonals, etc.

In PRES, we propose a two-phase tree structure that generates 2p codewords in the first phase and from each of these generates 2p - 1 further codewords in the second phase, resulting in a total of 2p + 2p(2p - 1) = 4p^2 codewords. We assume that each matrix partitions the n bits of B into a square m x m matrix, such that each row (or column) of the matrix can be considered a sub-block containing m bits. (It is relatively straightforward to extend this idea to support non-square matrices through repartitioning.) The encoding process is as follows:

Step 1. The encoder uses the p given patterns in the tree structure to partition B into equal sub-blocks and generate p new matrices. PREM is independently applied to the sub-blocks of the p new matrices in two different directions to generate 2p pseudo-random codewords. The generated codewords differ in the encoding direction or the pattern used. Then, each codeword can be re-partitioned into p new matrices to produce 2p - 1 further codewords by PREM. Note that the direction of the original PREM used to generate a first-phase codeword should not be reused in the second phase, as it would essentially reproduce several bits of the original block B.

Step 2. The encoder utilizes k-bit indices (k = log_2(4p^2)) to label the 2^k codewords (C_0, ..., C_{2^k - 1}) generated in Step 1 and compares each codeword, concatenated with its corresponding index (i.e., W_i), with D. The W_i that has the minimum Hamming distance to D is selected.

Step 3. W_i is written to memory.

To retrieve B, the decoder uses the index i in memory to find the corresponding patterns (matrix partitionings) and encoding directions used for the codeword written in memory. Finally, using Eq. (6), the decoder restores the original data block by reversing the two phases of the encoding.

To clarify the process, Figure 3 provides a detailed example that generates 16 pseudo-random codewords from a new 16-bit data block storing B = 0xBC07 and then minimizes the number of bits flipped during the write to the memory that currently stores the value D = 0x00000. Recall that D is the concatenation of a previously stored codeword (16 bits) and index (4 bits), requiring 20 bits of storage for a 16-bit value.

[Fig. 3: PRES 16-bit example: (a) the 4x4 block, (b) the generation of parents, (c) the generation of children.]

Let us assume that the bits of B are arranged in a 4 x 4 matrix [Figure 3(a)]. In the first phase of PRES, PREM is applied to the matrix from Left-to-Right (LR), Right-to-Left (RL), Top-to-Bottom (TB), and Bottom-to-Top (BT). Note that TB is equivalent to applying LR to the transposed matrix. Each function generates one codeword that we call a parent [Figure 3(b)]. In the next phase, each parent uses the three other functions to generate three additional codewords, or children. For instance, the parent generated by LR uses the RL, BT and TB functions to create three children [Figure 3(c)]. Each parent and child is assigned a unique 4-bit index. PRES then compares the 16 generated codewords concatenated with their indices (i.e., W_0, ..., W_15) with D (i.e., 0x00000) and selects the W_i with the minimum Hamming weight. According to Figure 3, the codeword with the minimum Hamming weight is at index i = 4, so W_4 is written to memory. It is important to note that PRES actually offers 17 codeword candidates, because the original data B can itself be a codeword.
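Reusing the PREM sketch of Section 5.1, a possible rendering of the first PRES phase on a 4 x 4 matrix is sketched below. The exact bit ordering within rows and the treatment of the reversed directions are our assumptions, since the paper specifies only the four directions LR, RL, TB and BT.

    def prem_encode(B):
        # PREM, Eq. (5): P_1 first, P_2..P_{n-1} serially, P_0 last.
        n = len(B)
        P = [0] * n
        P[1] = B[0] ^ B[1]
        for i in range(2, n):
            P[i] = P[i - 1] ^ B[i]
        P[0] = P[n - 1] ^ B[0]
        return P

    def encode_rows(M, reverse=False):
        # Apply PREM to each row, scanning left-to-right or right-to-left.
        out = []
        for row in M:
            r = row[::-1] if reverse else row
            p = prem_encode(r)
            out.append(p[::-1] if reverse else p)
        return out

    def transpose(M):
        return [list(col) for col in zip(*M)]

    def pres_parents(M):
        # First-phase parents: LR, RL, and TB/BT as LR/RL on the transpose.
        return [
            encode_rows(M),
            encode_rows(M, reverse=True),
            transpose(encode_rows(transpose(M))),
            transpose(encode_rows(transpose(M), reverse=True)),
        ]

    # B = 0xBC07 laid out row-major in a 4x4 matrix, as in Figure 3(a).
    bits = [(0xBC07 >> (15 - i)) & 1 for i in range(16)]
    M = [bits[r * 4:(r + 1) * 4] for r in range(4)]
    parents = pres_parents(M)

Each parent would then be re-encoded with the three remaining direction functions to produce its children, giving the 16 indexed candidates W_0, ..., W_15 that are compared against D.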

If there is a desire to use the original data as one of the codeword candidates, rather than using an additional index bit, we can systematically replace one of the generated codewords with B. In this example, only p = 2 patterns (horizontal and vertical) were used to generate 16 pseudo-random codewords. With additional index bits (e.g., 6 bits), PRES can utilize other patterns, such as southwest-to-northeast and southeast-to-northwest diagonals, to increase the number of generated pseudo-random codewords from 16 to 64. In this case, although the computational complexity of the decoder does not change, the computational complexity of the encoder increases significantly. In the next section, we propose an encoder and decoder based on storing base coset elements, which has lower computational complexity than PRES.

6 RANDOMLY GENERATED COSET (RGC)

In this section, we use pure random data to build a base coset from which other cosets are derived. The base coset is stored in both the encoder and the decoder. Without loss of generality, we assume that a base coset has 2^k unique elements. Each (n+k)-bit element consists of two parts: a k-bit unique index, i, and n-bit random data, C_i. We provide a simple example to clarify the coset generation using random data. Assume we have a base coset as shown in Figure 4.a.1, consisting of 2^k (n+k)-bit elements, where k = 2 and n = 4. For each row in the base coset, although the value C_i is randomly selected, a unique index i is assigned.

[Fig. 4: The generation of 16 cosets using the base coset (4.a.1). The elements of the first coset, coset_0000, containing the vectors C_0 = 1100, C_1 = 1001, C_2 = 0110, and C_3 = 0001, are randomly selected from the 16 possible 4-bit vectors.]

The 4-bit input data, B, can be any element of the set of 2^4 possible strings of zeros and ones. XORing the bits of C_i with B, 2^4 unique cosets (coset_B = C_i ⊕ B; i = 0, ..., 3) are generated. For example, all elements of the base coset are mapped from Figure 4.a.1 to Figure 4.a.2 by XORing C_i with B = 0001 to create coset_0001. Moreover, the set of all possible bit vectors is split into 2^4 cosets indexed by all possible values of B, so that each coset contains 2^2 unique elements indexable by i. (The base coset is sometimes referred to as the zero coset because coset_0000 is identical to the base coset.)

6.1 Coset Encoder

We now propose an architecture for minimizing writes using the randomly generated coset. As shown in Figure 5, the main core of the encoder and the decoder is the base coset. The base coset includes K = 2^k blocks that store n-bit random data. The role of the base coset in the encoder is to map datawords from the set of 2^n possibilities to a larger set of 2^{n+k} possible (n+k)-bit codewords. The larger set of codewords gives the encoder much more flexibility to select the codeword W_i to be written to the memory block such that the Hamming distance to the codeword D already stored in the memory block is minimized. The middle part of the encoder, i.e., the algorithm part, finds the codeword with the minimum Hamming weight.
[Fig. 5: Coset Encoder and Coset Decoder.]

The decoder utilizes the base coset, along with the index stored in the codeword, to find the base coset element used in the encoder and retrieve the original data block. To encode the dataword B, the encoder maps B to the suitable coset (i.e., coset_B) from an arbitrary base coset containing unique random coset elements.

Let us present an algorithm to encode B, assuming that the random base coset is stored in both the encoder and decoder. Given the base coset with 2^k blocks, 2^k n-bit random vectors are stored, where C_{i,j} denotes the j-th bit of the vector C_i. B is an n-bit data block that is to be written to an (n+k)-cell memory block that already stores data D, such that the first n cells of the block contain data and the last k cells store the corresponding codeword index. To select the codeword W_i encoded from B to store in the memory block, Algorithm 1 is applied. In each iteration i of the algorithm (line 2), C_i (the i-th element of the base coset) is selected to be XORed with B. The results of the n XOR operations (lines 3 to 4) form the first n bits of W_i. The last k bits of W_i record the binary representation of the index i (lines 5 to 7), where % and / denote the modulo and integer division operations. Thus, W is coset_B of the base coset. To minimize the number of bits flipped between the new write and the old contents, the (n+k)-cell memory block D is XORed with each W_i (lines 8 to 10) to form the two-dimensional array X_{i,j}, which records the bit differences between the old data, D, and the elements of the new coset, W_i. Then, the encoder computes the Hamming weight of each X_i (lines 11 and 12) and finds the X_i with the minimum Hamming weight (line 13). Finally, the (n+k)-bit codeword W_i is stored into memory (line 14). Sometimes more than one i is discovered at line 13, meaning that multiple coset elements minimize the number of bits flipped per write. In this case, Algorithm 1 selects the smallest such i.

Algorithm 1: Coset Encoder
 1  begin
 2    for i <- 0 to 2^k - 1 do
 3      for j <- 0 to n - 1 do
 4        W_{i,j} <- C_{i,j} ⊕ B_j
 5      i' <- i; for j <- n to n + k - 1 do
 6        W_{i,j} <- i' % 2
 7        i' <- i' / 2
 8    for i <- 0 to 2^k - 1 do
 9      for j <- 0 to n + k - 1 do
10        X_{i,j} <- W_{i,j} ⊕ D_j
11    for i <- 0 to 2^k - 1 do
12      S_i <- sum(X_{i,j}, j = 0, ..., n + k - 1)
13    if S_i <= S_{i'} for all i' != i then
14      Out <- W_i

6.2 Coset Decoder

To retrieve the dataword, Algorithm 2 is applied. To determine which coset element was used to form the codeword, lines 1 to 3 retrieve the index i used in the encoder from the last k bits. Then, XORing the corresponding bits of W_{i,j} and C_{i,j} recovers the original dataword, B (lines 4 to 5).

Algorithm 2: Coset Decoder
 1  begin
 2    for j <- n to n + k - 1 do
 3      i <- i + W_{i,j} * 2^{j-n}
 4    for j <- 0 to n - 1 do
 5      B_j <- W_{i,j} ⊕ C_{i,j}

Scrutinizing the coset encoder in Algorithm 1, we observe that the main part of the encoder can be implemented in parallel because the base coset elements, C_i, are independent and there is no feedback between neighboring bits of C_i. Accordingly, lines 2 to 7 can be merged with lines 8 to 10 to produce the first n bits of W_{i,j} directly. Likewise, while finding the index of the base coset element used in the encoder, Algorithm 2 can simultaneously retrieve the original bits of the dataword.
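A direct, unoptimized Python transcription of Algorithms 1 and 2 (our sketch, with bits stored as Python lists and ties broken toward the smallest i, as the text prescribes):

    def rgc_encode(B, D, coset):
        # Algorithm 1: try every base coset element, append the index bits,
        # and keep the candidate with the fewest differences from D.
        K = len(coset)                  # assumed to be a power of two (2^k)
        k = K.bit_length() - 1
        best, best_flips = None, None
        for i in range(K):
            W = [c ^ b for c, b in zip(coset[i], B)]     # lines 3-4
            W += [(i >> j) & 1 for j in range(k)]        # lines 5-7: index bits
            flips = sum(w ^ d for w, d in zip(W, D))     # lines 8-12
            if best_flips is None or flips < best_flips: # line 13, smallest i wins
                best, best_flips = W, flips
        return best

    def rgc_decode(W, coset, n):
        k = len(W) - n
        i = sum(W[n + j] << j for j in range(k))         # lines 2-3: recover index
        return [w ^ c for w, c in zip(W[:n], coset[i])]  # lines 4-5

    # Base coset of Fig. 4: C_0=1100, C_1=1001, C_2=0110, C_3=0001 (k=2, n=4).
    coset = [[1, 1, 0, 0], [1, 0, 0, 1], [0, 1, 1, 0], [0, 0, 0, 1]]
    B, D = [1, 0, 1, 1], [0, 0, 0, 0, 0, 0]
    W = rgc_encode(B, D, coset)
    assert rgc_decode(W, coset, 4) == B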
7 HYBRID COSET (HC)

Recall that write minimization results depend on the input dataset. While FNW greatly outperforms FlipMin on biased data, FlipMin has been shown to do better with unbiased data [17]. Moreover, we will show in Section 8.1 that the random coset does better than FNW with unbiased data. This discrepancy reveals the lack of tunability and flexibility of existing base cosets across different types of input datasets. This motivates a new type of base coset, the hybrid coset, which combines the randomly generated coset and the FNW coset. Our hypothesis is that by replacing a percentage of randomly generated coset elements with FNW coset elements, we can significantly improve the bit flip reduction for biased data while suffering only a negligible degradation of bit flip reduction for unbiased data compared to an entirely randomly generated coset. As explained in Section 2, each FNW coset element is constructed from k sub-vectors, C_{i,j}, j = 0, ..., k-1, each consisting of n/k bits: given the overhead bits h_0, ..., h_{k-1}, the n/k bits of C_{i,j} are all zeroes if h_j = 0 and all ones if h_j = 1. For example, if k = 4 (that is, K = 16), the FNW base coset contains 16 vectors, identifiable by the corresponding overhead-bit patterns 0000, 0001, 0010, ..., 1111, respectively. We choose 2^alpha FNW coset elements and 2^k - 2^alpha randomly generated coset elements from the 2^k base coset elements.

[Fig. 6: Access frequency distribution of the 16 FNW coset elements (k = 4) across astar, gcc, leslie3d, mcf, soplex, zeusmp, canneal, lbm, libquantum, omnetpp, wrf and milc.]

Therefore, the proportion of FNW coset elements to randomly generated coset elements can play a significant role in the results for random and biased inputs. To find the 2^alpha FNW coset elements, we ranked the FNW coset elements based on the access frequency distribution [21] for biased data and selected the elements with the highest access frequency. Figure 6 shows the access frequency distribution for the 64-bit FNW coset elements with k = 4 on 12 different applications. The first four FNW patterns from the left cover roughly up to 71.50% of the access frequency distribution. Also, Figure 7 shows that the first 64 FNW coset elements achieve up to 80.25% of the access frequency distribution when k = 8 (the number of coset elements increases from 16 to 256). Note that each bit in Figures 6 and 7 represents the content of two bytes and one byte, respectively. Based on Figures 6 and 7, we conclude that a hybrid coset can be tuned with 25% FNW coset elements (4 of 16 patterns for k = 4 and 64 of 256 patterns for k = 8), with the remaining 75% being randomly generated coset elements, in order to cover at least 71% of the best codewords from the entire FNW coset for biased data while still providing good coverage of the unbiased data space from the randomly generated coset. In the next section, we show that the hybrid coset with 75% randomly generated coset elements and 25% FNW coset elements obtains competitive performance for both biased and unbiased datasets. Note that this ratio is sensitive to the datasets used. While selecting more RGC-based coset elements improves the performance (bit flip reduction) for datasets with random data patterns, it sacrifices performance for datasets with biased patterns. Similarly, more FNW-based coset elements improve the performance for datasets with biased patterns.

8 EXPERIMENTAL RESULTS

In this section, we evaluate our proposed schemes, specifically PRES, RGC, and HC, against the leading bit minimization techniques of FlipMin and FNW for unbiased and biased datasets, respectively. Our experiments span different budgets for space overhead. The results show that HC is the best candidate for systems that process both unbiased and biased datasets. Moreover, we measure the degree of randomness of the data vectors generated by the encoder of each scheme and consider the implications of these results.

8.1 Bit Flip Reduction for Unbiased Data

To compare the reduction in bit flips achieved by each scheme with respect to differential write as a baseline for unbiased data, we generated 100 million random inputs. We assign 4 auxiliary bits (k = 4) to RGC, i.e., the base coset encompasses 16 different elements. In addition, we consider p = 2 for PRES to generate 16 pseudo-random codewords from the input data. Finally, we compared with HC. Recall from Section 7 that HC can be tuned through the combination of randomly generated coset elements and FNW coset elements and that 25% from FNW and 75% from RGC provided good coverage. Thus, four FNW coset elements were combined with 12 randomly generated coset elements for k = 4.
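This 4 + 12 combination is easy to express in code. The sketch below is ours; the four FNW patterns passed in are hypothetical stand-ins for the four most frequent patterns read off Figure 6, and the RNG seed is arbitrary.

    import random

    def fnw_element(pattern, n, k):
        # Expand a k-bit inversion pattern into an n-bit FNW coset element:
        # sub-vector j is all ones iff bit j (MSB first) of the pattern is 1.
        m = n // k
        bits = []
        for j in range(k):
            bits += [(pattern >> (k - 1 - j)) & 1] * m
        return bits

    def hybrid_coset(n, k, fnw_patterns, seed=7):
        # 2^alpha FNW elements plus 2^k - 2^alpha random elements,
        # e.g. 4 + 12 for k = 4.
        rng = random.Random(seed)
        coset = [fnw_element(p, n, k) for p in fnw_patterns]
        while len(coset) < (1 << k):
            coset.append([rng.getrandbits(1) for _ in range(n)])
        return coset

    # Hypothetical choice of the four most frequent inversion patterns;
    # the RGC encoder of Section 6 is reused on this base coset as is.
    coset = hybrid_coset(64, 4, [0b0000, 0b1111, 0b0001, 0b1000])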
Finally, we vary the data block size from 32 to 512 bits, covering an encoding overhead range from 12.5% to less than 1%. Figure 8 shows that RGC, HC and PRES require flipping fewer bits and outperform both FlipMin and FNW across the different block sizes and space overheads. RGC requires 15-25% fewer bit flips than FlipMin and FNW, depending on the block size configuration. Since both RGC and PRES generate random codewords, they have roughly the same bit flip reduction across the different overheads. Moreover, Figure 8 shows that replacing four randomly generated coset elements with four FNW coset elements still maintains the positive bit flip minimization effect on the random dataset.

To achieve higher flip reduction rates, the number of elements of the base coset can be increased to allow more encoding candidates to be considered. Accordingly, we extend the number of auxiliary bits to eight (256 elements for RGC) and consider HC with the combination of the 64 FNW coset elements shown in Figure 7 and 192 randomly generated coset elements; we plot our findings in Figure 9. The results show that for the same space overhead, the flip reduction capabilities of all five schemes improve significantly. For a space overhead of 12.5%, the flip reduction capability of RGC increased from 21% to 25%, which amounts to a 20% improvement, and the flip reduction capability of HC reached 24.2%, an 18% improvement. For a space overhead of 0.78125%, while RGC and HC require up to 6% fewer bit flips than PRES, they obtain up to 30% fewer bit flips than either FlipMin or FNW.

[Fig. 7: The selection of the 64 FNW coset elements with the highest access frequency from the 256 possible FNW coset elements (k = 8).]

[Fig. 8: Bit flip reduction over differential write for unbiased data with 4 auxiliary bits and block sizes of 32 (12.5%), 64 (6.25%), 128 (3.125%), 256 (1.5625%) and 512 (0.78125%) bits. The value in parentheses represents space overhead.]

[Fig. 9: Bit flip reduction over differential write for unbiased data with 8 auxiliary bits and block sizes of 64 (12.5%), 128 (6.25%), 256 (3.125%), 512 (1.5625%) and 1024 (0.78125%) bits. The value in parentheses represents space overhead.]

We note that with a larger base coset, FlipMin gets closer to the capability of RGC and HC for blocks with higher space overhead. However, the performance overhead of FlipMin is significantly larger than that of RGC, HC, PRES and FNW, as we discuss in Section 9.

8.2 Bit Flip Reduction for Biased Data

We evaluated the performance of RGC, HC, FlipMin and FNW for biased inputs in Figures 10 and 11. (We use twelve write-intensive benchmarks from SPEC CPU2006 and SPEC JBB2005 and the only write-intensive benchmark, canneal, from the PARSEC suite.) Since both RGC and PRES generate essentially random codewords regardless of the type of input data, i.e., biased or unbiased, both have a similar effect on biased data. On average, RGC produces a 33.8% bit flip reduction over the baseline (differential write) for blocks with four auxiliary bits, while FlipMin and FNW produce 47.2% and 56.4% bit flip reductions over the baseline, respectively. Compared to RGC, HC on average achieves up to 55.92% bit flip reduction over the baseline for a space overhead of 6.25%. For eight auxiliary bits, the flip reduction capability of the randomly generated coset increased from 33.8% to 38.0%, while the flip reduction capabilities of FlipMin and FNW increased from 47.4% and 57.0% to 48.8% and 59.6%, respectively. Figure 11 shows that the performance of HC improves from 56.0% to 58.7% on average. On all benchmarks, FNW and HC outperform RGC and FlipMin, even though RGC outperforms FNW on unbiased inputs. This discrepancy highlights how write minimization results depend on the inputs. According to Figures 10 and 11, using HC is much more effective than using random elements alone when biased inputs are employed. Therefore, employing the combination of 75% randomly generated coset elements and 25% FNW coset elements in HC generally improves performance for biased inputs over purely random encoding strategies. This is supported by the results demonstrating that the HC scheme achieves the best bit flip reduction for biased inputs in comparison to FlipMin and RGC (56.4% for a space overhead of 6.25% and 58.7% for a space overhead of 12.5%) and almost the same bit flip reduction as FNW. We conclude that using the combination of randomly generated coset elements and FNW coset elements in the base coset is better than either coset alone in terms of bit flip reduction for a system that handles both biased and unbiased inputs.
Compression was proposed as a method to reduce the overhead of writing to nonvolatile memory [22]. It is well known that compression increases the randomness of the data and hence will benefit our proposed schemes.
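The compression scheme used in the experiments below is Base-Delta-Immediate (BDI) [22]. The toy sketch that follows captures only its core base-plus-narrow-deltas idea; it is our simplification, since real BDI tries several base sizes and delta widths and handles all-zero and repeated-value blocks specially.

    def bdi_compress(values, delta_bytes=1):
        # Toy base + delta: keep the first word as the base and store each
        # value as a narrow signed delta from it; bail out if any delta
        # does not fit (the block is then stored uncompressed).
        base = values[0]
        deltas = [v - base for v in values]
        limit = 1 << (8 * delta_bytes - 1)
        if all(-limit <= d < limit for d in deltas):
            return base, deltas        # ~4 + len(values) * delta_bytes bytes
        return None                    # incompressible block

    assert bdi_compress([100, 101, 98, 103]) == (100, [0, 1, -2, 3])

The compressed output is then fed to the encoders exactly like an uncompressed block.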

[Fig. 10: Comparison of write minimization schemes (RGC, FlipMin, HC, FNW) using 4 auxiliary bits for SPEC CPU2006 and SPEC JBB2005 inputs (number of bit flips; less is better).]

[Fig. 11: Comparison of write minimization schemes (RGC, FlipMin, HC, FNW) using 8 auxiliary bits for SPEC CPU2006 and SPEC JBB2005 inputs (number of bit flips; less is better).]

To verify this hypothesis, we compressed 32-byte blocks of the different applications using Base-Delta-Immediate (BDI) compression [22] and then encoded the compressed applications using the different encoding schemes. Since compressing data increases randomness, the experimental results show that the combination of random coset elements and FNW-based coset elements on average significantly minimizes the number of bit flips compared to the other schemes. Note that some applications compress very well while others cannot be compressed by the compression algorithm. Figures 12 and 13 show the average number of bit flips for the different schemes when data is compressed before encoding with 4 and 8 auxiliary bits, respectively. According to the experimental results, compressing data before encoding improves HC performance relative to the other schemes. On average, HC, FNW, FlipMin and RGC with 4 auxiliary bits achieve bit flip reductions of about 77%, 56%, 63% and 52% against the differential write scheme, respectively. With 8 auxiliary bits, HC, FNW, FlipMin and RGC improve the bit flip reduction by up to 79%, 60%, 65% and 56% against the differential write scheme, respectively.

8.3 Randomness Measurement

Figures 8 and 9 showed that RGC outperforms both FlipMin and FNW in bit flip reduction capability for unbiased data. We attribute those results to the fact that RGC generates a base coset with elements that are more random than the elements of the base cosets of FlipMin and FNW. The NIST SP 800-22 tests [23] are well-known quantitative tests that measure the randomness of a set of data vectors. Accordingly, we have used these tests to measure the randomness of the codewords produced by the encoders of RGC, HC, PRES, FlipMin and FNW. Our measurements reveal that the data vectors generated by RGC, HC, PRES, FlipMin and FNW pass 100%, 95%, 99%, 96% and 89% of the tests, respectively. These findings support our rationale that the higher the randomness of the base coset elements, the higher the bit flip reduction that can be achieved for unbiased inputs. To achieve satisfactory results when both unbiased and biased inputs are employed, a small percentage of random coset elements can be replaced by targeted FNW coset elements at the cost of slightly reducing the randomness of the base coset elements. However, high randomness is not necessarily valuable for biased data. FlipMin also attempts to address general datasets through coding theory and has a randomness similar to HC; however, the results from Sections 8.1 and 8.2 indicate that HC is significantly more effective for actual bit flip reduction due to the better targeted nature of its base coset elements.
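As a flavor of what the NIST SP 800-22 suite measures, its simplest member, the frequency (monobit) test, fits in a few lines. This is our sketch of that one standard test, not the full suite used for the measurements above.

    import math

    def monobit_pvalue(bits):
        # NIST SP 800-22 frequency (monobit) test: the normalized excess of
        # ones over zeros should look Gaussian for a random sequence.
        n = len(bits)
        s = sum(2 * b - 1 for b in bits)   # +1 for each one, -1 for each zero
        return math.erfc(abs(s) / math.sqrt(2 * n))

    # Example: a bit stream of concatenated encoder outputs; the sequence
    # passes at the usual threshold p >= 0.01.
    assert monobit_pvalue([0, 1] * 500) > 0.01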
9 CODING COMPLEXITY

We applied Synopsys Design Compiler to synthesize the encoders and decoders of the different schemes using a 45nm FreePDK cell library. To store the coset elements of FlipMin and the Hybrid Coset (HC), we used CACTI [24], modified to model SRAM cells and estimate the area and delay of the storage. We assume that FlipMin, HC and PRES take 64-bit inputs and utilize 4 auxiliary bits for encoding the input data, and that their decoders retrieve 64-bit original datawords. Note that the overhead of FlipMin is similar to that of HC, since they leverage the same mechanisms for encoding and decoding when the coset elements are stored in SRAM.

[Fig. 12: Comparison of write minimization schemes using 4 auxiliary bits after compressing SPEC CPU2006 and SPEC JBB2005 inputs (number of bit flips; less is better).]

[Fig. 13: Comparison of write minimization schemes using 8 auxiliary bits after compressing SPEC CPU2006 and SPEC JBB2005 inputs (number of bit flips; less is better).]

[TABLE 2: The encoder and decoder overheads (delay in ns, area in µm^2, number of cells, and number of cells in the critical path) of the PRES (16 counters / 1 counter), HC/FlipMin (16 counters / 1 counter) and FNW encoders, and of the PRES, HC/FlipMin and FNW decoders.]

To compare against the overhead of FNW, which is 16 bits wide, we multiply the area overhead of FNW by 4. Table 2 shows the overheads of the different schemes. The dominant portion of the encoding cost (area and latency) for FlipMin, PRES, and HC is in the ones counter that determines the best codeword to write. Thus, we implemented an entirely parallel version (16 encoders and ones counters) and a sequential version (an individual encoder and ones counter that is pipelined). The number of cells used in the PRES encoder for the parallel version is about the same as the number of cells used in HC/FlipMin, and the number of cells on the critical path of PRES is 23.5% smaller than in HC/FlipMin because, in PRES, it is possible to count the number of ones in the codeword while the bits are being computed by the XOR operations (see Figure 2). Accordingly, the PRES encoder reduces the delay by up to 20.9% compared to HC/FlipMin. However, the most critical latency is that of decoding, since this is typically the limiting factor in performance; encoding happens during writing, which is typically not on the critical path due to memory buffers. According to Table 2, although the HC/FlipMin encoder incurs a higher overhead than the FNW encoder, the HC/FlipMin decoder achieves an overhead similar to FNW. Meanwhile, the HC/FlipMin decoder decreases the delay by up to 89.18% compared to PRES, and the number of cells used in the HC/FlipMin decoder is 47 times smaller than in the PRES decoder. In the sequential version, HC/FlipMin has performance identical to PRES, since we pipelined the design of both encoders with the ones counter in the critical stage of the pipeline; the encoding and the ones counting occur in separate cycles that dominate the delay. Extending the number of auxiliary bits provides higher flip reduction because the number of coset elements in HC/FlipMin and of pseudo-random codewords in PRES can be increased to allow the currently stored data to be compared against more data vectors.

This improvement comes at a higher computational overhead, as a larger codeword set requires more elements to compare against. For k = 8, the storage space and gate counts of HC/FlipMin increase by up to 16 times compared to k = 4, since the number of coset elements increases from 16 to 256. Meanwhile, PRES requires an increase in the number of codewords by up to 16 times, which amounts to 16 times the space overhead. On the other hand, using one counter or 16 counters during the encoding process does not change much the space occupied by each counter in comparison to the 4 auxiliary bit case.

10 CONCLUSION

The relatively high write energy is one of the major weaknesses of emerging non-volatile memories. Accordingly, bit change reduction schemes are a particularly successful approach to reduce the impact of this overhead through the minimization of the number of bits changed per write. In this paper, we show that the effectiveness of coset-based write minimization techniques is directly correlated with how well the biased and unbiased codewords in the encoder match the biased and unbiased nature of the dataset to be encoded. We further show that random elements forming the base coset work effectively on unbiased datasets, while FNW approaches work effectively for biased datasets. Finally, we find that replacing a number of randomly generated coset elements with selected FNW coset elements can dramatically improve the randomly generated coset approach for biased data while maintaining the positive effect of the random coset elements on random datasets. Our analyses and experimental results showed that the codewords generated by the hybrid coset encoder lead to fewer bit flips than those generated by the other schemes. Also, the HC decoder not only decreases the delay in the critical path by up to 89.18% compared to PRES, but also achieves a very low overhead similar to FNW. Overall, the hybrid coset encoder contributes to overcoming the dynamic energy and reliability challenges of emerging non-volatile memories, advancing their eventual deployment in commercial systems.

ACKNOWLEDGMENT

The authors would like to thank the anonymous referees for their valuable comments and suggestions to improve this paper.

REFERENCES

[1] M. K. Qureshi, S. Gurumurthi, and B. Rajendran, "Phase change memory: From devices to systems," Synthesis Lectures on Computer Architecture, vol. 6, no. 4.
[2] E. Chen, D. Apalkov, Z. Diao, A. Driskill-Smith, D. Druist, D. Lottis, V. Nikitin, X. Tang, S. Watts, S. Wang et al., "Advances and future prospects of spin-transfer torque random access memory," IEEE Transactions on Magnetics, vol. 46, no. 6.
[3] M. Rasquinha, D. Choudhary, S. Chatterjee, S. Mukhopadhyay, and S. Yalamanchili, "An energy efficient cache design using spin torque transfer (STT) RAM," in Proceedings of the 16th ACM/IEEE International Symposium on Low Power Electronics and Design, 2010.
[4] X. Guo, E. Ipek, and T. Soyata, "Resistive computation: avoiding the power wall with low-leakage, STT-MRAM based computing," ACM SIGARCH Computer Architecture News, vol. 38, no. 3.
[5] B. C. Lee, E. Ipek, O. Mutlu, and D. Burger, "Architecting phase change memory as a scalable DRAM alternative," ACM SIGARCH Computer Architecture News, vol. 37, no. 3, pp. 2-13.
[6] M. K. Qureshi, V. Srinivasan, and J. A. Rivers, "Scalable high performance main memory system using phase-change memory technology," ACM SIGARCH Computer Architecture News, vol. 37, no. 3.
[7] Y. Zhang, L. Zhang, W. Wen, G. Sun, and Y. Chen, "Multi-level cell STT-RAM: Is it realistic or just a dream?" in IEEE/ACM International Conference on Computer-Aided Design, 2012.
[8] S. Raoux, G. W. Burr, M. J. Breitwisch, C. T. Rettner, Y.-C. Chen, R. M. Shelby, M. Salinga, D. Krebs, S.-H. Chen, H.-L. Lung et al., "Phase-change random access memory: A scalable technology," IBM Journal of Research and Development, vol. 52, no. 4.5.
[9] W. Wen, Y. Zhang, Y. Chen, Y. Wang, and Y. Xie, "PS3-RAM: A fast portable and scalable statistical STT-RAM reliability analysis method," in Proceedings of the 49th Annual IEEE Design Automation Conference, 2012.
[10] Y. Zhang, W. Wen, and Y. Chen, "The prospect of STT-RAM scaling from readability perspective," IEEE Transactions on Magnetics, vol. 48, no. 11.
[11] R. Maddah, S. M. Seyedzadeh, and R. Melhem, "CAFO: Cost aware flip optimization for asymmetric memories," in 21st IEEE International Symposium on High Performance Computer Architecture, 2015.
[12] B. Lee, P. Zhou, J. Yang, Y. Zhang, B. Zhao, E. Ipek, O. Mutlu, and D. Burger, "Phase-change technology and the future of main memory," IEEE Micro.
[13] B.-D. Yang, J.-E. Lee, J.-S. Kim, J. Cho, S.-Y. Lee, and B.-G. Yu, "A low power phase-change random access memory using a data-comparison write scheme," in IEEE International Symposium on Circuits and Systems, 2007.
[14] A. N. Jacobvitz, R. Calderbank, and D. J. Sorin, "Writing cosets of a convolutional code to increase the lifetime of flash memory," in 50th Annual Allerton Conference on Communication, Control, and Computing, 2012.
[15] J. Li and K. Mohanram, "Write-once-memory-code phase change memory," in Design, Automation and Test in Europe Conference and Exhibition, 2014.
[16] S. Cho and H. Lee, "Flip-N-Write: a simple deterministic technique to improve PRAM write performance, energy and endurance," in 42nd Annual IEEE/ACM International Symposium on Microarchitecture, 2009.
[17] A. N. Jacobvitz, R. Calderbank, and D. J. Sorin, "Coset coding to extend the lifetime of memory," in 19th IEEE International Symposium on High Performance Computer Architecture, 2013.
[18] S. M. Seyedzadeh, R. Maddah, A. Jones, and R. Melhem, "PRES: Pseudo-random encoding scheme to increase the bit flip reduction in the memory," in 52nd ACM/EDAC/IEEE Design Automation Conference, 2015.
[19] G. D. Forney Jr, "Coset codes. I. Introduction and geometrical classification," IEEE Transactions on Information Theory, vol. 34, no. 5.
[20] I. Reed, "A class of multiple-error-correcting codes and the decoding scheme," Transactions of the IRE Professional Group on Information Theory, vol. 4, no. 4.
[21] Y. Du, M. Zhou, B. R. Childers, D. Mossé, and R. Melhem, "Bit mapping for balanced PCM cell programming," in ACM SIGARCH Computer Architecture News, vol. 41, no. 3, 2013.
[22] G. Pekhimenko, V. Seshadri, O. Mutlu, P. B. Gibbons, M. A. Kozuch, and T. C. Mowry, "Base-delta-immediate compression: practical data compression for on-chip caches," in Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques, 2012.
[23] A. Rukhin et al., "NIST Special Publication 800-22: A statistical test suite for random and pseudorandom number generators for cryptographic applications."

[24] N. Muralimanohar, R. Balasubramonian, and N. P. Jouppi, "CACTI 6.0: A tool to understand large caches," University of Utah and Hewlett-Packard Laboratories, Tech. Rep.

Seyed Mohammad Seyedzadeh received a B.S. degree in Electrical Engineering from Shiraz University of Technology in 2007 and an M.S. degree in Electrical Engineering from Iran University of Science and Technology. He is currently a Ph.D. student in computer engineering at the University of Pittsburgh. His main research interests include computer architecture, fault tolerance, and coding theory.

Rakan Maddah received his B.S. and M.S. degrees in Computer Science from the Lebanese American University in 2007 and 2009, respectively. He joined the Computer Science Department at the University of Pittsburgh as a Ph.D. student in 2010 and earned his degree there. He is now a senior engineer with Intel's Non-Volatile Solutions Group, working on memory and storage products. His research interests are in computer architecture, systems and fault tolerance.

Donald Kline, Jr received his Bachelor of Science in Computer Engineering from the University of Pittsburgh in the spring of 2015, and is currently pursuing his Ph.D. in Electrical and Computer Engineering at the University of Pittsburgh under the guidance of Dr. Alex Jones and Dr. Rami Melhem. His research interests currently include computer architecture, memories, compilers, and machine learning.

Alex K. Jones received the BS degree in 1998 in physics from the College of William and Mary in Williamsburg, Virginia, and the MS and PhD degrees in 2000 and 2002, respectively, in electrical and computer engineering from Northwestern University. He is currently the Director of Computer Engineering and an Associate Professor of Electrical and Computer Engineering and Computer Science at the University of Pittsburgh, Pennsylvania. He is a Walter P. Murphy Fellow of Northwestern University and a senior member of the IEEE and ACM. Dr. Jones' research interests include compilation techniques for configurable systems and architectures, behavioral and low-power synthesis, parallel architectures and networks, radio frequency identification (RFID) and sensor networks, sustainable computing, and embedded computing for medical instruments. He is the author of more than 100 publications in these areas. His research is funded by the U.S. National Science Foundation, DARPA, CCC, and industry. Dr. Jones' contributions have received several awards, including the 2010 ACM/SIGDA Distinguished Service Award and recognition of a top 25 paper from the first 20 years of FCCM. Recently, Dr. Jones led a visioning effort for the electronic design automation community funded by the Computing Community Consortium (CCC). Dr. Jones is also actively involved in efforts to improve the scientific method for experiments in computer science and engineering, to develop methods for reproducible research, and to build a centralized hub for computer architecture simulators, emulators, benchmarks and experiments.

Rami Melhem received a B.E. in Electrical Engineering from Cairo University in 1976, an M.A. degree in Mathematics and an M.S. degree in Computer Science from the University of Pittsburgh in 1981, and a Ph.D.
degree in Computer Science from the University of Pittsburgh in He was an Assistant Professor at Purdue University prior to joining the faculty of The University of Pittsburgh in 1986, where he is currently a Professor in the Computer Science Department which he chaired from 2000 to His research interests include Power Management, Parallel Computer Architectures, Fault-Tolerant Systems, Optical Networks and High Performance Computing. Dr. Melhem served and is serving on program committees of numerous conferences and workshops and on the editorial boards of the IEEE Transactions on Computers, the IEEE Transactions on Parallel and Distributed systems, the Computer Architecture Letters, the Journal of Parallel and Distributed Computing and the Journal of Sustainable Computing, Informatics and Systems. Dr. Melhem is a fellow of IEEE and a member of the ACM.
