Data Compression LZ77. Jens Müller Universität Stuttgart

Transcription:

Data Compression LZ77. Jens Müller, Universität Stuttgart, 2008-11-25

Outline
Introduction
Principle of dictionary methods
LZ77
Sliding window
Examples
Optimization
Performance comparison
Applications/Patents
Jens Müller - IPVS Universität Stuttgart

Principle of dictionary methods
Compressing multiple strings can be more efficient than compressing single symbols only (e.g. Huffman encoding).
Strings of symbols are added to a dictionary. Later occurrences are referenced.
Static dictionary: Entries are predefined and constant according to the application of the text.
Adaptive dictionary: Entries are taken from the text itself and created on-the-fly.

LZ77
First paper by Ziv and Lempel in 1977 about lossless compression with an adaptive dictionary.
Goes through the text in a sliding window consisting of a search buffer and a look-ahead buffer.
[Figure: the text "this is a text that is being read through the window" passing through the window, with the search buffer and the look-ahead buffer marked.]
The search buffer is used as dictionary.
Sizes of these buffers are parameters of the implementation.
Assumption: Patterns in text occur within range of the search buffer.

LZ77 Example (Encoding)
Encoding of the string: abracadabrad
Output tuple: (offset, length, symbol)
Search buffer positions are numbered 7 6 5 4 3 2 1, counted back from the boundary with the look-ahead buffer.
Output: (0,0,a) (0,0,b) (0,0,r) (3,1,c) (2,1,d) (7,4,d)
12 characters compressed into 6 tuples.
Compression rate: (12*8)/(6*(5+2+3)) = 96/60 = 1.6, so the output takes 62.5% of the original size.

Size of output
Size for each output tuple (offset, length, symbol) when using fixed-length storage:
$\log_2 S + \log_2(S+L) + \log_2 A$
where S is the length of the search buffer, L the length of the look-ahead window, A the size of the alphabet.
Why S+L and not only S? See next slide.
Worst case if no symbol repeats in the search buffer: blow-up to $n(\log_2 S + \log_2(S+L) + \log_2 A)$ instead of $n \log_2 A$.
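The tuple-size formula can be checked numerically. The following is a minimal sketch, assuming each field is rounded up to a whole number of bits; the function names and the parameter values S = 4096, L = 16, A = 256 are illustrative assumptions, not taken from the slides.

```python
import math

def tuple_bits(S, L, A):
    """Bits for one fixed-length (offset, length, symbol) tuple.

    Assumption: every field is rounded up to whole bits.
    """
    offset_bits = math.ceil(math.log2(S))       # offset into the search buffer
    length_bits = math.ceil(math.log2(S + L))   # a match may run into the look-ahead buffer
    symbol_bits = math.ceil(math.log2(A))       # one symbol of the alphabet
    return offset_bits + length_bits + symbol_bits

def worst_case_bits(n, S, L, A):
    """Output size when none of the n symbols finds a match (one tuple per symbol)."""
    return n * tuple_bits(S, L, A)

# Hypothetical parameters chosen only for illustration.
S, L, A = 4096, 16, 256
print(tuple_bits(S, L, A))             # 12 + 13 + 8 = 33
print(worst_case_bits(1000, S, L, A))  # 33000 bits instead of 8000
```

With these assumed values an incompressible input grows by roughly a factor of four, which is the blow-up the slide warns about.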

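Encoding reaches into look-ahead buffer
Special case: a match that starts in the search buffer may continue into the look-ahead buffer.
[Figure: sliding window (search buffer positions 7 6 5 4 3 2 1, then look-ahead buffer) over a text ending in "HAHAHAHA!".]
Output: (0,0,H) (0,0,A) (2,4,H) (2,1,!)
In (2,4,H) the match length 4 is larger than the offset 2, so the copied symbols partly come from the look-ahead buffer itself. This is why the length field has to cover S+L and not only S.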
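Why a match longer than its offset still works can be seen by copying one symbol at a time. The sketch below is illustrative only; `copy_match` is a hypothetical helper name, not something from the slides.

```python
def copy_match(out, offset, length):
    """Append `length` symbols, starting `offset` positions back, one at a time.

    Copying symbol by symbol lets the copy read symbols it has just written,
    which is what makes length > offset possible.
    """
    start = len(out) - offset
    for i in range(length):
        out.append(out[start + i])
    return out

out = list("HA")                    # already decoded: "HA"
copy_match(out, offset=2, length=4)
print("".join(out))                 # HAHAHA

# A plain slice only sees the two symbols that existed before the copy.
print("".join(list("HA")[-2:][:4]))  # HA
```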

Encoding: pseudo code algorithm
while look-ahead buffer is not empty
    go backwards in search buffer to find longest match of the look-ahead buffer
    if match found
        print: (offset from window boundary, length of match, next symbol in look-ahead buffer);
        shift window by length+1;
    else
        print: (0, 0, first symbol in look-ahead buffer);
        shift window by 1;
    fi
end while
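The encoding loop can be sketched in Python as follows. This is a minimal illustration, not the author's implementation: the function name, the brute-force search, the buffer sizes (search buffer 7, look-ahead buffer 5) and the test string `abracadabrad` are all assumptions made for the example. On ties it keeps the match with the smallest offset.

```python
def lz77_encode(text, search_size=7, lookahead_size=5):
    """Encode text into a list of (offset, length, symbol) tuples."""
    tokens = []
    pos = 0  # boundary between search buffer and look-ahead buffer
    while pos < len(text):
        best_offset, best_length = 0, 0
        # Keep one symbol of the look-ahead buffer for the 'next symbol' field.
        max_length = min(lookahead_size, len(text) - pos) - 1
        # Go backwards through the search buffer, offset 1 is the nearest symbol.
        for offset in range(1, min(search_size, pos) + 1):
            length = 0
            while (length < max_length
                   and text[pos - offset + length] == text[pos + length]):
                length += 1  # the match may run into the look-ahead buffer
            if length > best_length:
                best_offset, best_length = offset, length
        # With no match this is (0, 0, first symbol in look-ahead buffer).
        tokens.append((best_offset, best_length, text[pos + best_length]))
        pos += best_length + 1  # shift window by length+1
    return tokens

print(lz77_encode("abracadabrad"))
# [(0, 0, 'a'), (0, 0, 'b'), (0, 0, 'r'), (3, 1, 'c'), (2, 1, 'd'), (7, 4, 'd')]
```

The linear scan over all offsets is the simplest possible search; it is chosen here for readability, not speed.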

Example (Decoding)
Input: (0,0,a) (0,0,b) (0,0,r) (3,1,c) (2,1,d) (7,4,d)
[Figure: search buffer, positions 7 6 5 4 3 2 1, filling up as each tuple is decoded.]
Decoded output: abracadabrad

Decoding: pseudo code algorithm
for each token (offset, length, symbol)
    if offset = 0 then
        print symbol;
    else
        go reverse in previous output by offset characters and copy character-wise for length symbols;
        print symbol;
    fi
next
LZ77 is asymmetric: encoding is more difficult than decoding, as it needs to find the longest match.
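A decoder along the lines of the pseudo code fits in a few lines of Python. This is a sketch under the assumption that tokens arrive as (offset, length, symbol) tuples in a list; the function name and the sample tokens are illustrative.

```python
def lz77_decode(tokens):
    """Rebuild the text from (offset, length, symbol) tuples."""
    out = []
    for offset, length, symbol in tokens:
        # Go back `offset` symbols in the previous output and copy
        # character-wise; for offset = 0 the length is 0 and nothing is copied.
        start = len(out) - offset
        for i in range(length):
            out.append(out[start + i])
        out.append(symbol)
    return "".join(out)

tokens = [(0, 0, 'a'), (0, 0, 'b'), (0, 0, 'r'), (3, 1, 'c'), (2, 1, 'd'), (7, 4, 'd')]
print(lz77_decode(tokens))  # abracadabrad
```

The decoder does no searching at all, which illustrates why decoding is so much cheaper than encoding.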

Optimizations
Successors following LZ77 use different optimizations:
Use variable-size offset and length fields in the tuples instead of fixed-length ones. Better if small offsets and sizes prevail.
Don't output a (0,0,x) token when a character is not found, but instead differentiate using a flag bit: 0 x or 1 o,l (flag 0 followed by the literal symbol x, or flag 1 followed by offset o and length l).
Use a better suited data structure (e.g. tree, hash set) for the buffers. This allows faster search and/or larger buffers.
Additional Huffman coding of tuples/references.
-> LZSS, LZB, LZH, LZR, LZFG, LZMA, Deflate, ...
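The flag-bit idea can be sketched as follows. This is an illustration in the spirit of LZSS, not a faithful implementation of any of the named formats: the function names, the token layout as Python tuples, the buffer sizes and the minimum match length of 2 are all assumptions.

```python
def flagged_encode(text, search_size=7, lookahead_size=5, min_match=2):
    """Emit (0, symbol) for a literal or (1, offset, length) for a reference."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_offset, best_length = 0, 0
        max_length = min(lookahead_size, len(text) - pos)
        for offset in range(1, min(search_size, pos) + 1):
            length = 0
            while (length < max_length
                   and text[pos - offset + length] == text[pos + length]):
                length += 1
            if length > best_length:
                best_offset, best_length = offset, length
        if best_length >= min_match:
            tokens.append((1, best_offset, best_length))  # flag 1: offset, length
            pos += best_length
        else:
            tokens.append((0, text[pos]))                 # flag 0: literal symbol
            pos += 1
    return tokens

def flagged_decode(tokens):
    """Inverse of flagged_encode."""
    out = []
    for token in tokens:
        if token[0] == 0:
            out.append(token[1])
        else:
            _, offset, length = token
            start = len(out) - offset
            for i in range(length):
                out.append(out[start + i])
    return "".join(out)

tokens = flagged_encode("HAHAHAHA!")
print(tokens)
print(flagged_decode(tokens))
```

Compared with plain LZ77, a literal no longer drags along unused offset and length fields, and a reference no longer carries a trailing symbol.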

Performance
[Figure: compression in bits/symbol (axis from 0 to 8) for LZ77, LZR, LZSS and LZH on the benchmark files bib, book1, book2, geo, news, obj1, obj2, paper1, pic, progc, progl, progp, trans.]
(From Bell/Cleary/Witten: Text Compression)

Applications, Patents
Unlike LZ78, LZ77 has not been patented. This may be a reason why its successors based on LZ77 are so widely used:
Deflate is a combination of LZSS together with Huffman encoding and uses a window size of 32 kB. This algorithm is open source and used in what is widely known as ZIP compression (although the ZIP format itself is only a container format, like AVI, and can be used with several algorithms), and by the formats PNG, TIFF, PDF and many others.
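Deflate can be tried out directly with Python's standard `zlib` module. The snippet below is a small sketch; the sample data and the compression level are arbitrary choices, and `wbits=15` selects a 2^15 byte (32 KiB) window inside a zlib container.

```python
import zlib

# Highly repetitive sample data, chosen only for illustration.
data = b"abracadabrad" * 100

# wbits=15 requests a 32 KiB sliding window.
compressor = zlib.compressobj(level=9, wbits=15)
compressed = compressor.compress(data) + compressor.flush()

restored = zlib.decompress(compressed)
print(len(data), len(compressed), restored == data)
```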

References
SALOMON, D.: Data Compression, The Complete Reference. Springer, New York, 1998.
BELL, T. C., CLEARY, J. G., WITTEN, I. H.: Text Compression. Prentice Hall Advanced Reference Series, 1990.
SAYOOD, K.: Introduction to Data Compression. Academic Press, San Diego, CA, 1996, 2000.
ZIV, J., LEMPEL, A.: A universal algorithm for sequential data compression. IEEE Transactions on Information Theory 23 (1977), 337-343.