Complexity of Biomolecular Sequences

1 Complexity of Biomolecular Sequences
Institute of Signal Processing, Tampere University of Technology

2 Outline
➀ Introduction
➁ Biological Preliminaries
➂ Compression Preliminaries
➃ The Biocompress Program
➄ NML Model for Discrete Regression
➅ The GeNML Algorithm
➆ Results

3 Introduction
Complete DNA sequences of many organisms are known, and their number is rapidly increasing. The sequences are huge:
- E. coli: 4,639,221 bp
- H. sapiens: about 3 billion bp
With a 4-letter alphabet, raw storage takes 2 bits/bp. Computational aid is necessary for processing these sequences.

4 Introduction 2
The objective of computational investigations is twofold:
- Compress the DNA sequences: cut down storage space and transmission costs. Here algorithm complexity is critical.
- Model the statistical properties of the data: find patterns and structure within them. This is closely related to compression, but algorithm complexity is less important.
DNA compression is therefore of paramount importance in studying biomolecular sequences.

5 Introduction 3
Biological studies reveal important statistical information:
- DNA sequences contain many approximate tandem repeats.
- Many essential genes have many copies.
- There are only about 1,000 basic protein folding patterns.
- Genes duplicate themselves for evolutionary purposes.
These facts suggest that DNA sequences are well compressible.

6 Introduction 4
Unfortunately, other known facts hamper efficient compression:
- Regularities are often blurred by random mutation, translocation, crossover, reversal events and sequencing errors.
- Only about 10% of a sequence contains genes; the rest is considered non-coding.
Conclusion: compression of DNA is difficult!

7 Biological Preliminaries
Important facts about DNA sequences from biological studies:
- Sequences have a 3D structure, but can be reversibly unfolded into a string of symbols.
- There are four kinds of nucleotides: A, C, G, T (U in RNA).
- There are links between complementary bases: A–T, C–G.
- Some pairs of complementary subsequences are mapped together; such pairs of subsequences are called palindromes.

8 Biological Preliminaries 2
Figure 1: Secondary structure of an RNA sequence.

9 Compression Preliminaries
Further observations about DNA sequences:
➀ Repetitions are (a) very sparse, (b) relatively long, and (c) roughly half the time palindromes.
➁ Long subsequences can often be matched approximately, allowing for (a) deletions, (b) substitutions, and (c) insertions.
➂ Repetitions can occur far from each other.
➃ Contextual correlation is not too significant.

10 Compression Preliminaries 2
Problems with existing technologies:
- General-purpose coders are not good because of ➀c, ➁, ➂, ➃.
- PPM and its derivatives are not good because of ➀a, ➀b, ➁.
- BWT is not good because of ➀a, ➀c, ➁.
- Substitution-based methods have difficulties with ➁, ➃.
Observation: among these algorithm classes, substitution-based methods offer the best performance on DNA sequences, which is why substitutional methods are popular in DNA encoders.

11 Compression Preliminaries 3
The pair of factors $f$ and $f_\alpha^{-1}$ is called a palindrome, where
- $f$ denotes a sequence $a_1, a_2, \ldots, a_n$,
- $f^{-1}$ is the sequence in reverse order: $a_n, a_{n-1}, \ldots, a_1$,
- $f_\alpha$ complements each character in the sequence: A ↔ T, C ↔ G.
E.g., if $f$ = AAACGT, then $f_\alpha^{-1}$ = ACGTTT.
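
To make the definition concrete, here is a minimal Python sketch of the operation (names are illustrative, not from the talk):

```python
# Minimal sketch: the palindrome of a factor is obtained by complementing
# every base and reversing the order.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def palindrome(f: str) -> str:
    """Return f_alpha^{-1} for the factor f."""
    return "".join(COMPLEMENT[a] for a in reversed(f))

assert palindrome("AAACGT") == "ACGTTT"  # the example from the slide
```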

12 Compression Preliminaries 4
We can measure the complexity of DNA sequences by DNA compression: provide a compact representation of a DNA sequence from which an exact replica of the original can be restored.
- Practical considerations are important: running time and complexity (memory, code).
- We are not concerned with lossy compression.

13 Compression Preliminaries 5
Another approach to estimating the complexity is entropy estimation: provide a reliable entropy estimate that asymptotically converges to the actual entropy.
- Practical requirements are less significant, but usability is lower as well.
- Entropy estimates are difficult to justify.

14 The Biocompress Program
S. Grumbach, F. Tahi, "Compression of DNA sequences," Proc. Data Compression Conference (DCC '93), 1993.
- Encodes a text over the four-letter alphabet and produces a binary sequence.
- Follows the LZSS scenario; the window has the size of the input.
- Substitutes factors with earlier occurrences; occurrences are either identical or palindromes.
- References must be shorter than the factors they refer to.
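
The following toy Python sketch illustrates the kind of match search this scenario implies: a brute-force search for the longest earlier occurrence of the upcoming text, either verbatim or as a palindrome. It is only an illustration of the idea, not the actual Biocompress implementation:

```python
# Toy LZSS-style match search with palindromic matches (illustrative only).

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def longest_match(s: str, pos: int, min_len: int = 7):
    """Return (start, length, kind) of the best match for s[pos:] in s[:pos]."""
    best = None
    for start in range(pos):
        # direct match against s[start:], staying inside the already-seen part
        l = 0
        while pos + l < len(s) and start + l < pos and s[start + l] == s[pos + l]:
            l += 1
        if l >= min_len and (best is None or l > best[1]):
            best = (start, l, "direct")
        # palindrome match: s[pos+j] must equal the complement of s[start-j]
        l = 0
        while pos + l < len(s) and start - l >= 0 and COMPLEMENT[s[start - l]] == s[pos + l]:
            l += 1
        if l >= min_len and (best is None or l > best[1]):
            best = (start, l, "palindrome")
    return best
```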

15 The Biocompress Program 2
Representing numbers:
- Sequence lengths: reversed Fibonacci representation followed by a 1.
- Match lengths: matches shorter than 7 are discarded; matches between 7 and 38 are written in 5 bits; matches beyond 38 are followed by the Fibonacci code of the remainder.
- Match positions: the shorter of the binary or Fibonacci form.
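
As an illustration of the Fibonacci representation mentioned above, the sketch below produces the standard Fibonacci (Zeckendorf) code of a positive integer; whether Biocompress uses exactly this variant is not stated on the slides:

```python
# Sketch of a Fibonacci (Zeckendorf) code: write n as a sum of
# non-consecutive Fibonacci numbers, emit the bits starting from the
# smallest Fibonacci term, and terminate with an extra 1.

def fibonacci_code(n: int) -> str:
    assert n >= 1
    fibs = [1, 2]
    while fibs[-1] < n:
        fibs.append(fibs[-1] + fibs[-2])
    bits = []
    for f in reversed(fibs):        # greedy largest-first gives Zeckendorf form
        if f <= n:
            bits.append("1")
            n -= f
        else:
            bits.append("0")
    code = "".join(bits).lstrip("0")[::-1]   # smallest term first
    return code + "1"

# e.g. 4 = 3 + 1 -> Zeckendorf bits (1,0,1) -> code "1011"
assert fibonacci_code(4) == "1011"
```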

16 The Biocompress Program 3
S. Grumbach, F. Tahi, "A new challenge for compression algorithms: Genetic sequences," Inform. Process. Manage., vol. 30, no. 6, 1994.
An improved version of Biocompress is Biocompress-2:
- Literal and LZSS coding remain the same as in Biocompress.
- An order-2 context coder with arithmetic coding has been added.
- The best of the three methods is chosen, signalled by a small prefix code.

17 Discrete Regression
- Encode a vector using another, known vector from a sample space.
- Block-based decomposition; the regressor is chosen from past, finite data.
- For genomic data and a given regressor: the bit mask of matching symbols is deemed to be the output of a memoryless source, and each non-matching symbol takes each of the other symbols with equal probability.
- This decomposes the problem into encoding bit patterns (see the sketch below).
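
A tiny illustrative sketch of the decomposition (names are mine, not from the talk):

```python
# Given a block y and a regressor x, the encoder needs only (i) the binary
# mask of matching positions and (ii) the identities of the non-matching
# symbols, each of which can take one of the M - 1 = 3 other values.

def decompose(y: str, x: str):
    assert len(y) == len(x)
    mask = [int(a == b) for a, b in zip(y, x)]
    mismatches = [a for a, b in zip(y, x) if a != b]
    return mask, mismatches

mask, mism = decompose("ACGTAC", "ACGAAC")
assert mask == [1, 1, 1, 0, 1, 1] and mism == ["T"]
```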

18 Discrete Regression 2

19 The NML Model for Discrete Regression
I. Tabus, G. Korodi, J. Rissanen, "DNA sequence compression using the normalized maximum likelihood model for discrete regression," Proc. Data Compression Conference (DCC '03), 2003.
- A practical encoder with lightweight requirements.
- Effectively combines NML, context and clear coding with run-length encoding (RLE) plus arithmetic encoding (AE).
- Instances of the NML and context coders are used with different parameters.
- The best of the methods is chosen, signalled by a small prefix code.

20 The NML Model for Discrete Regression 2
Objective: encode a sequence $y^n$ based on a known discrete regressor sequence $x^n$.
- Choose a parametric probability model $P(y^n \mid x^n; \theta)$.
- Obtain the maximized likelihood $P(y^n \mid x^n; \hat{\theta}(y^n, x^n))$.
- Obtain the universal NML model $\hat{P}(y^n \mid x^n)$ by normalization of the maximized likelihood.
- Using this model, $y^n$ is encoded by an arithmetic coder.
$x^n$ is chosen so that the Hamming distance between $x^n$ and $y^n$ is minimized.

21 The NML Model for Discrete Regression 3
Benefits:
- Inherently suitable for ➀, ➁b, ➂.
- Practical encoders are feasible with low complexity.
Drawbacks:
- Cannot efficiently handle ➁a, ➁c; some remedy is provided by the block-based decomposition.
- Not optimal with ➀b; solution: run-length coding is added.
- Difficulties with ➃; solution: a low-order context coder is added.

22 The NML Model for Discrete Regression 4
Denoting the block to be encoded by $y^n$ and the regressor block by $x^n$,
$$P(y_i \mid x_i; \theta) = \begin{cases} \theta & \text{if } y_i = x_i \\ \psi & \text{if } y_i \neq x_i, \end{cases} \qquad \text{with } \psi = \frac{1 - \theta}{M - 1}.$$
We extend this model to blocks as
$$P(y^n \mid x^n; \theta) = \theta^{\sum_{i=0}^{n-1} \chi(y_i = x_i)} \, \psi^{\sum_{i=0}^{n-1} \chi(y_i \neq x_i)} = \theta^{n_m} \psi^{\,n - n_m}.$$

23 The NML Model for Discrete Regression 5
Since $\hat{\theta}(y^n, x^n) = n_m/n$, the maximized likelihood is
$$P(y^n \mid x^n; \hat{\theta}(y^n, x^n)) = \left(\frac{n_m}{n}\right)^{n_m} \left(\frac{n - n_m}{n(M-1)}\right)^{n - n_m}.$$
For normalization, use only blocks similar enough to $x^n$:
$$\hat{P}(y^n \mid x^n) = \frac{\left(\frac{n_m}{n}\right)^{n_m} \left(\frac{n - n_m}{n(M-1)}\right)^{n - n_m}}{\sum_{m \in \Lambda_n} \binom{n}{m} (M-1)^{n-m} \left(\frac{m}{n}\right)^{m} \left(\frac{n - m}{n(M-1)}\right)^{n - m}}.$$

24 The NML Model for Discrete Regression 6
The set $\Lambda_n = \{N(w, n), \ldots, n\}$ is computed from
$$N(w, n) = \min\{\, n_m \mid L_{NML}(n_m, N(w, n)) + \log_2 w + 1 < 2n \,\}.$$
Introducing
$$C_{n,N} = \sum_{m \geq N} \binom{n}{m} \left(\frac{m}{n}\right)^{m} \left(\frac{n - m}{n}\right)^{n - m},$$
the NML code length is
$$L_{NML}(n_m, N) = \log_2 C_{n,N} - n_m \log_2 \frac{n_m}{n} - (n - n_m) \log_2 \frac{n - n_m}{n} + (n - n_m) \log_2 (M-1).$$
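
Assuming the formulas above, a small Python sketch can evaluate $C_{n,N}$ and $L_{NML}(n_m, N)$ directly:

```python
# Sketch: the normalizing constant C_{n,N} and the NML code length
# L_NML(n_m, N) for block size n and alphabet size M = 4.

from math import comb, log2

def C_nN(n: int, N: int) -> float:
    total = 0.0
    for m in range(N, n + 1):
        p = (m / n) ** m                     # note: 0.0 ** 0 evaluates to 1.0
        q = ((n - m) / n) ** (n - m)
        total += comb(n, m) * p * q
    return total

def L_NML(n_m: int, n: int, N: int, M: int = 4) -> float:
    L = log2(C_nN(n, N))
    if n_m > 0:
        L -= n_m * log2(n_m / n)
    if n_m < n:
        L -= (n - n_m) * log2((n - n_m) / n)
        L += (n - n_m) * log2(M - 1)
    return L

print(L_NML(40, 48, 30))  # e.g. a 48-symbol block with 40 matching bases
```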

25 The NML Model for Discrete Regression 7
Figure 2: NML code length vs. the number of matching bases $n_m$, when $\mathcal{Y}$ contains all possible blocks (unconstrained normalization) and when $\mathcal{Y}_{x^n}$ contains only the blocks with a number of correct matches larger than $N = 30$, for $n = 48$; the clear representation is shown for reference.

26 The NML Model for Discrete Regression 8
Encoding a block $y^n$ with the NML model proceeds as follows:
➀ Find the best regressor $x^n$.
➁ Encode the position and direction (normal or palindrome) of $x^n$.
➂ Encode the binary mask $b^n$, where $b_i = \chi(y_i = x_i)$.
➃ Correct the non-matching characters indicated by $b^n$.

27 The NML Model for Discrete Regression 9
For finding the best regressor:
- Both normal and palindrome matches are used.
- The regressor must lie fully inside the window $s_{\max\{ln - w + 1,\, 0\}}, \ldots, s_{ln - 1}$ preceding the current block.
- A contiguous run of $k$ matching symbols is required in $b^n$; this requirement is used to speed up the search (see the sketch below).
- Increasing $k$ makes the search much faster, with little loss in compression.
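
A plausible sketch of the seed-based acceleration, under the assumption that candidates are gathered through a hash index of length-$k$ substrings (the slides do not specify the data structure):

```python
# Index every length-k substring (seed) of the window, then evaluate only
# those regressor positions that share an aligned seed with the current block.

from collections import defaultdict

def candidate_positions(window: str, block: str, k: int):
    index = defaultdict(list)
    for i in range(len(window) - k + 1):
        index[window[i:i + k]].append(i)
    candidates = set()
    for j in range(len(block) - k + 1):
        for i in index.get(block[j:j + k], ()):
            start = i - j                    # regressor start if seeds align
            if 0 <= start <= len(window) - len(block):
                candidates.add(start)
    return sorted(candidates)
```

Larger $k$ shrinks the candidate set and speeds up the search, at the price of occasionally missing the best regressor, which is the trade-off quantified in Figures 3 and 4.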

28 The NML Model for Discrete Regression 10
Figure 3: Acceleration of the search using seeds of length $r$, when the block size is $n = 48$ (exact formula $A(n, r)$ and lower bound $D(n, r)$).

29 The NML Model for Discrete Regression 11
Figure 4: Relative reduction in compression ratio $\frac{E L_{I_2} - E L_{I_1}}{E L_{I_1}}$ when using seeds of length $r$, against the case of exhaustive search, when the block size is $n = 48$. Circular marks show $P(\text{miss} \mid n_m)$ obtained on a random sequence, but with probabilities $P(n_m)$ collected from the DNA sequence HUMGHCSA. Triangular marks show the change in performance of GeNML on the same file.

30 The NML Model for Discrete Regression 12
Next, the position and direction of the best match are encoded:
- Normally this takes $\log_2 \min\{(l-1)n + 1, w\} + 1$ bits.
- Long approximate matches are efficiently coded with match prediction.
- Long exact matches are coded with run lengths.

31 The NML Model for Discrete Regression 13
The binary mask $b^n$ can be encoded in two steps:
➀ The number of matching bases $n_m$ is encoded according to
$$P(n_m) = \sum_{b^n :\, \sum_i b_i = n_m} \hat{P}(b^n) = \frac{\binom{n}{n_m} \left(\frac{n_m}{n}\right)^{n_m} \left(\frac{n - n_m}{n}\right)^{n - n_m}}{C_{n,N}}.$$
➁ The binary mask $b^n$ is encoded bit-wise with the distribution
$$P(b_k = 0) = \frac{n - k - n(k)}{n - k}, \qquad P(b_k = 1) = \frac{n(k)}{n - k}, \qquad \text{where } n(k) = \sum_{j=k}^{n-1} b_j.$$
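
A quick numerical check of step ➁: coding $b^n$ bit-wise with $P(b_k = 1) = n(k)/(n-k)$ spends exactly $\log_2 \binom{n}{n_m}$ bits in total, i.e., once $n_m$ is known it enumerates the masks with $n_m$ ones uniformly. A sketch:

```python
# Verify that the sequential bit-wise distribution above costs
# log2(n choose n_m) bits for any mask b with n_m ones.

from math import comb, log2

def mask_code_length(b):
    n, total = len(b), 0.0
    ones_left = sum(b)                       # n(k) for k = 0
    for k, bit in enumerate(b):
        p1 = ones_left / (n - k)
        total += -log2(p1 if bit else 1.0 - p1)
        ones_left -= bit
    return total

b = [1, 1, 0, 1, 0, 1, 1, 0]
assert abs(mask_code_length(b) - log2(comb(len(b), sum(b)))) < 1e-9
```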

32 The NML Model for Discrete Regression 14
The overall code length for the NML coder at position $ln$, with window size $w$, is
$$L_1(y^n) = \log_2 C_{n,N} - n_m \log_2 \frac{n_m}{n} - (n - n_m) \log_2 \frac{n - n_m}{n} + (n - n_m) \log_2 (M-1) + \log_2 \min\{(l-1)n + 1, w\} + 1.$$

33 The GeNML Algorithm
G. Korodi and I. Tabus, "An efficient normalized maximum likelihood algorithm for DNA sequence compression," ACM Trans. on Information Systems, vol. 23, no. 1, pp. 3–34, 2005.
- Improved compression efficiency with practical resource requirements.
- A complex model of several algorithms, each with several instances.

34 The GeNML Algorithm 2
The context coder serves as an auxiliary coder that complements NML in performance: NML cannot capture redundancy concentrated in a small area. For such blocks an order-1 context coder is used, with parameters set as
$$\eta(a_k \mid a_j) = \frac{n(a_k a_j)}{\sum_a n(a\, a_j)}.$$
The overall code length for the context coder is
$$L_2(y^n) = -\sum_{i=1}^{n} \log_2 \eta(y_i \mid y_{i-1}).$$
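
A minimal sketch of such an order-1 context coder; the add-one smoothing for unseen pairs is my assumption, as the slide gives only the raw frequency ratio:

```python
# eta is the empirical frequency of a symbol given the previous one,
# estimated from already-coded history; the block then costs
# L2 = -sum_i log2 eta(y_i | y_{i-1}) bits.

from collections import Counter
from math import log2

def context_code_length(y: str, history: str, alphabet: str = "ACGT") -> float:
    pair = Counter(zip(history, history[1:]))   # counts n(a_k a_j)
    ctx = Counter(history[:-1])                 # context totals sum_a n(a a_j)
    total, prev = 0.0, history[-1]
    for c in y:
        # add-one smoothing (my assumption) keeps unseen pairs codable
        eta = (pair[(prev, c)] + 1) / (ctx[prev] + len(alphabet))
        total += -log2(eta)
        prev = c
    return total

print(context_code_length("ACGT", "ACGTACGTACGT"))
```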

35 The GeNML Algorithm 3
DNA data often appear statistically random. Since such parts cannot be compressed, a clear (verbatim) encoding is used. The code length for the clear representation is $L_3(y^n) = 2n$.

36 The GeNML Algorithm 4
The GeNML algorithm is outlined as follows:
- The DNA sequence is split into macroblocks.
- One instance each of the NML, context and clear coders forms a group; objects in different groups have different parameters.
- The best group is selected to compress the next macroblock.
- Inside the macroblock, compression proceeds block-wise: the best algorithm of the group compresses the next block.

37 The GeNML Algorithm 5
Step 1. Set parameters $n_0$, $H_0$, $\delta$, $C$. Let $m = \delta^{C-1} n_0$.
Step 2. For each macroblock $M_k = s_{km}, \ldots, s_{(k+1)m-1}$:
  Step 2.1. Let $n = n_0$, $H = H_0$.
  Step 2.2. Let $L_n = 0$.
  Step 2.3. For each block $y^n$ in $M_k$:
    Step 2.3.1. Compute $L_1$, $L_2$, $L_3$.
    Step 2.3.2. Let $L_n = L_n + \min\{L_1, L_2, L_3\}$.
  Step 2.4. If $n < m$, then let $n = \delta n$, $H = H/\delta$, and go to Step 2.2; else proceed to the next step.
  Step 2.5. Find $n_b$ for which $L_{n_b} = \min_n \{L_n\}$; let $H_b = nH/n_b$. Signal $n_b$ in the compressed stream.
  Step 2.6. For each block $y^{n_b}$ in $M_k$:
    Step 2.6.1. Repeat Step 2.3.1.
    Step 2.6.2. Signal the best algorithm found in the previous step and encode $y^{n_b}$.
Figure 5: The specification of the GeNML algorithm.
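
A runnable Python paraphrase of Figure 5, under stated assumptions: `code_lengths(block)` is a hypothetical callback returning the triple $(L_1, L_2, L_3)$ in bits, and the parameter $H$ and the actual arithmetic coding are omitted. It is a sketch of the selection logic, not the reference implementation:

```python
# Select the best block size n_b for a macroblock, then the best coder
# (0 = NML, 1 = context, 2 = clear) for each block of that size.

def compress_macroblock(mb, n0, delta, C, code_lengths):
    m = delta ** (C - 1) * n0                 # Step 1: macroblock size
    assert len(mb) == m
    totals, n = {}, n0
    while n <= m:                             # Steps 2.1-2.4: try block sizes
        totals[n] = sum(min(code_lengths(mb[i:i + n]))
                        for i in range(0, m, n))
        n *= delta
    n_b = min(totals, key=totals.get)         # Step 2.5: best block size
    plan = []
    for i in range(0, m, n_b):                # Step 2.6: best coder per block
        Ls = code_lengths(mb[i:i + n_b])
        plan.append((i, min(range(3), key=lambda j: Ls[j])))
    return n_b, plan
```

The decoder recovers $n_b$ and the per-block method flags from the compressed stream, so the selection needs no side channel.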

38 The GeNML Algorithm 6
Figure 6: (a) The number of times the clear representation, the order-1 context coder and NML with $n = 48$ prove to be the best. (b) The number of times the best match is used, the predicted match is used, and the best match is predicted.

39 Results
Table 1: Comparison of the compression (in bits per base) obtained by the algorithms Biocompress-2 (Bio2), GenCompress-2 (Gen2), CTW-LZ (CTW), DNACompress (DNA) and GeNML on the sequences CHMPXX, CHNTXX, HEHCMVCG, HUMDYSTROP, HUMGHCSA, HUMHDABCD, HUMHPRTB, MPOMTCG, MTPACG and VACCG.

40 Results 2
Table 2: Comparison of the compression (in bits per base) obtained by the algorithms Cfact, GenCompress-2 and GeNML on the sequences Atatsgs, Atef1a, Atrdnaf, Atrdnai, Celk07e, HSG6PDGEN, Mmzp3g and Xlxfg.

41 Human Genome Compression
The GeNML program is suitable for compressing the entire human genome:
- Original size: 3,070,521,116 bases (732 Mbytes at 2 bits/base).
- Number of specified bases: 2,832,183,… (about 675 Mbytes at 2 bits/base).
- GeNML window size: 1 Mbyte; seed length: 8.
- Compressed size: 589,323,192 bytes (about 562 Mbytes).
- Compression ratio for the specified bases: about 1.66 bpb.
