Lecture 18: Compression

CS2: Great Insights in Computer Science
Michael L. Littman, Spring 2006

Overview
When we decide how to represent something in bits, there are some competing interests:
  easily manipulated/processed
  short
Common to use two representations:
  one direct, to allow for easy processing
  one terse (compressed), to save storage and communication costs

Plan
I'm going to try to describe one neat idea implicit in Chapter 6: Huffman coding.
For more information, see Wikipedia: http://en.wikipedia.org/wiki/Huffman_coding

Gettysburg Address
Four score and seven years ago our fathers brought forth on this continent a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal.

Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived and so dedicated, can long endure. We are met on a great battlefield of that war. We have come to dedicate a portion of that field as a final resting place for those who here gave their lives that that nation might live. It is altogether fitting and proper that we should do this.

But, in a larger sense, we can not dedicate, we can not consecrate, we can not hallow this ground. The brave men, living and dead, who struggled here, have consecrated it, far above our poor power to add or detract. The world will little note, nor long remember, what we say here, but it can never forget what they did here. It is for us, the living, rather, to be dedicated here to the unfinished work which they who fought here have thus far so nobly advanced. It is rather for us to be here dedicated to the great task remaining before us, that from these honored dead we take increased devotion to that cause for which they gave the last full measure of devotion, that we here highly resolve that these dead shall not have died in vain, that this nation, under God, shall have a new birth of freedom, and that government of the people, by the people, for the people, shall not perish from the earth.

Character Counts
For simplicity, let's turn the uppercase letters into lowercase letters. That leaves us with:
[Table: the count of each character, listed alphabetically. Recoverable counts: <s> 282, ? 0, a 102, d 58, e 165, f 27, h 80, i 68, j 0, n 77, o 93, r 79, t 126, x 0, y 10, z 0.]

Attempt #1: ASCII
The standard format for representing characters uses 8 bits per character. The address is 1482 characters long, so a total of 11856 bits is needed using this representation.
  8 bits per character
  11856 total bits
  100% the size of ASCII representation
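As a rough illustration of the counting step and the 8-bits-per-character baseline, here is a minimal Python sketch. The names `character_counts` and `ascii_bits` and the short sample string are illustrative choices, not part of the lecture.

```python
from collections import Counter


def character_counts(text: str) -> Counter:
    """Count each character after folding uppercase letters to lowercase."""
    return Counter(text.lower())


def ascii_bits(text: str) -> int:
    """Bits needed when every character is stored in 8 bits."""
    return 8 * len(text)


# Hypothetical sample: the opening words of the address.
sample = "Four score and seven years ago"
counts = character_counts(sample)
print(counts["e"], counts[" "], ascii_bits(sample))
```

Applied to a text of 1482 characters, `ascii_bits` gives 8 × 1482 = 11856, the figure on the slide.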

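Attempt #2: Compact
Note that, at least in its lowercase form, there are only 32 different characters needed. Therefore, each can be assigned a 5-bit code (32 different 5-bit patterns).
  5 bits per character
  7410 total bits
  62.5% the size of ASCII representation

5-bit Patterns
00000 <s>   00001 <>    00010 to 00100 (three punctuation marks)   00101 ?
00110 a     00111 b     01000 c     01001 d     01010 e     01011 f
01100 g     01101 h     01110 i     01111 j     10000 k     10001 l
10010 m     10011 n     10100 o     10101 p     10110 q     10111 r
11000 s     11001 t     11010 u     11011 v     11100 w     11101 x
11110 y     11111 z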
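A fixed-length code like this one can be sketched in a few lines of Python. The alphabet below is an assumption: a newline stands in for the slide's `<>` symbol, and comma, hyphen and period stand in for the three punctuation marks. `fixed_code` is a hypothetical helper name.

```python
import string

# Assumed 32-symbol alphabet: space, newline (for <>), three punctuation
# marks, the question mark, then the 26 lowercase letters.
ALPHABET = " \n,-.?" + string.ascii_lowercase


def fixed_code(alphabet: str) -> dict:
    """Give every symbol a distinct bit pattern of the same width."""
    # Smallest width that can number all symbols: 5 bits for 32 symbols.
    width = max(1, (len(alphabet) - 1).bit_length())
    return {ch: format(i, f"0{width}b") for i, ch in enumerate(alphabet)}


code = fixed_code(ALPHABET)
print(code[" "], code["a"], code["z"])
print(5 * 1482)  # total bits for a 1482-character text
```

Because every pattern has the same width, decoding only needs to cut the bit string into 5-bit pieces.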

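Attempt #3: Variable Length
Some characters are much more common than others. Give the 4 most common characters a 3-bit code and the remaining 28 a 6-bit code. How many bits do we need now?

Variable Length Patterns
000 <s>     001 e       010 t       011 a
100000 o    100001 h    100010 r    100011 n    100100 i    100101 d
100110 s    100111 l    101000 c    101001 w    101010 g    101011 f
101100 v    101101 to 110110 (the ten next most common characters)
110111 <>   111000 ?    111001 j    111010 x    111011 z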

Decodability
Note that the code was chosen so that the first bit of each character tells you whether the code is short (0) or long (1). This choice ensures that a message can actually be decoded:
100001 100100 000 010 100001 001 100010 001 ...
h      i      <s> t   h      e   r      e   ...
42 bits, not 45
But harder to work with

What Gives?
We had assigned all 32 characters 5-bit codes. Now we've got 4 that have 3-bit codes and 28 that are 6-bit codes. So, more than half of the characters have actually gotten longer. How can that change help?
Need to factor in how many of each character there are.
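The first-bit rule from the Decodability slide can be sketched as a small decoder. This is an illustration under assumptions: the `CODE` table holds only a few of the variable-length patterns (with the leading ones restored, as inferred from the slide), and `encode` and `decode` are hypothetical helper names.

```python
# A few of the variable-length patterns: 3-bit codes start with 0,
# 6-bit codes start with 1 (assumed reading of the slide's table).
CODE = {
    " ": "000", "e": "001", "t": "010", "a": "011",
    "o": "100000", "h": "100001", "r": "100010", "n": "100011",
    "i": "100100", "d": "100101",
}
DECODE = {bits: ch for ch, bits in CODE.items()}


def encode(text: str) -> str:
    """Concatenate the pattern of each character."""
    return "".join(CODE[ch] for ch in text)


def decode(bits: str) -> str:
    """Use the first bit of each code to decide how many bits to read."""
    out = []
    pos = 0
    while pos < len(bits):
        width = 3 if bits[pos] == "0" else 6
        out.append(DECODE[bits[pos:pos + width]])
        pos += width
    return "".join(out)


message = encode("hi there")
print(message, len(message))
print(decode(message))
```

The eight characters of "hi there" take 36 bits here, against 40 bits with 5-bit codes.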

Adding Up the Bits
How many bits to write down just the letter y? Well, there are 10 y's and each takes 6 bits. So, 60 bits. (It was 50 before.)
How about t? There are 126 and each takes 3 bits. That's 378 (was 630).
So, how do we total them all? Let c be a character, freq(c) the number of times it appears, and len(c) its encoding length.
Total bits = Σ_c freq(c) × len(c)

Summing It Up
282×3 + 165×3 + 126×3 + 102×3 + 93×6 + 80×6 + 79×6 + ... + 0×6 + 0×6 = 6867
[Table: characters sorted by decreasing count: <s> 282, e 165, t 126, a 102, o 93, h 80, r 79, n 77, i 68, d 58, then s, l, c, w, g, f (27), v and the remaining characters, ending with ?, j, x and z at 0.]
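The total-bits formula translates directly into code. In this sketch the four short-code characters use the counts from the slides, and the other 28 characters are lumped into a single hypothetical entry `"other"` with their combined count (1482 minus the four counts); `total_bits` is an illustrative name.

```python
def total_bits(freq: dict, length: dict) -> int:
    """Sum freq(c) * len(c) over every character c."""
    return sum(freq[c] * length[c] for c in freq)


# Counts for the four 3-bit characters; everything else is lumped together
# because all remaining characters share the same 6-bit length.
freq = {" ": 282, "e": 165, "t": 126, "a": 102}
freq["other"] = 1482 - sum(freq.values())
length = {" ": 3, "e": 3, "t": 3, "a": 3, "other": 6}

print(total_bits(freq, length))  # 6867
```

The lumping works only because every remaining character has the same code length; with a code of many different lengths, each character needs its own entry.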

Attempt #3: Summary
Total for this example:
  4.6 bits per character
  6867 total bits
  57.9% the size of ASCII representation

Attempt #4: Sorted
0 <s>   10 e   110 t   1110 a   11110 o   ...
Total for this example:
  7.1 bits per character
  10467 total bits
  88.3% the size of ASCII representation
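One way to read the sorted attempt is as a code in which the most common character gets `0`, the next `10`, then `110`, and so on. That reading is an assumption, and the helper names `sorted_code` and `decode_sorted` are hypothetical. A minimal sketch:

```python
def sorted_code(chars_by_frequency: str) -> dict:
    """Rank k (starting at 0) gets k ones followed by a single zero."""
    return {ch: "1" * rank + "0" for rank, ch in enumerate(chars_by_frequency)}


def decode_sorted(bits: str, code: dict) -> str:
    """Every code ends at its first 0, so a 0 always closes a character."""
    lookup = {pattern: ch for ch, pattern in code.items()}
    out = []
    current = ""
    for bit in bits:
        current += bit
        if bit == "0":
            out.append(lookup[current])
            current = ""
    return "".join(out)


code = sorted_code(" etao")
encoded = "".join(code[ch] for ch in "tea toe")
print(code)
print(decode_sorted(encoded, code))
```

Such a code is easy to decode, but the codes for rare characters grow long quickly, which is consistent with the poor total on the slide.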

Attempt #5: Your Turn
Make sure it is decodable!
[Table: characters and counts in decreasing order of count, from <s> 282 and e 165 down to ?, j, x and z at 0.]

Can We Do Better?
Shannon invented information theory, which talks about bits and randomness and encodings.
Fano and Shannon worked together on finding minimal size codes. They found a good heuristic.
Fano assigned the problem to his class. Huffman solved it, not knowing his prof had unsuccessfully struggled with it.

Tree (Prefix) Code
First notice that a code can be drawn as a tree. Left = 0, right = 1. So, e = 001, w = 101001.
Tree structure ensures code is decodable: bits tell you unambiguously which character.
[Figure: the variable-length code drawn as a binary tree, with leaves <s>, e, t, a, o, h, r, n, i, d, s, l, c, w, g, f, v, ..., <>, ?, j, x, z.]

Huffman Coding
Make each character a subtree ("block") with count equal to its frequency.
Take two blocks with smallest counts and merge them into left and right branches. The count for the new block is the sum of the counts of the blocks it is made out of.
Repeat until all blocks have been merged into one big block (single tree).
Read the code off the branches in the tree.
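The merging steps above can be sketched with a priority queue. This is one common way to implement the procedure, not necessarily how it was presented in class; `huffman_code` is a hypothetical name, the toy frequencies are made up, and the input is assumed to be non-empty.

```python
import heapq
from itertools import count


def huffman_code(freq: dict) -> dict:
    """Build a prefix code by repeatedly merging the two smallest blocks."""
    tiebreak = count()  # keeps heap entries comparable when counts are equal
    heap = [(n, next(tiebreak), ch) for ch, n in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # a lone character still needs one bit
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        # The new block's count is the sum of the two merged blocks.
        heapq.heappush(heap, (n1 + n2, next(tiebreak), (left, right)))

    code = {}

    def walk(node, prefix):
        if isinstance(node, tuple):  # internal block: left = 0, right = 1
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:  # leaf: a single character
            code[node] = prefix

    walk(heap[0][2], "")
    return code


toy = {"a": 5, "b": 2, "c": 1, "d": 1}
toy_code = huffman_code(toy)
print(toy_code)
print(sum(toy[ch] * len(toy_code[ch]) for ch in toy))  # total bits
```

Which of two merged blocks goes left or right does not change the code lengths, only the particular bit patterns.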

Partial Example
[Figure: intermediate steps of the merging. The lowest-count blocks are combined into subtrees, for example a (102) and o (93) into a block of count 195, and n (77) and i (68) into a block of 145 that then joins t (126) to give 271.]

Partial Example
[Figure: step-by-step growth of the subtree containing <>.]

Completed Code Tree
[Figure: the completed tree. The root has count 1482 and splits into blocks of 876 and 606; the 606 block holds <s> (282), e (165), h (80) and r (79).]

Created Code
[Table: the code read off the tree for each character, with the most common characters receiving the shortest codes.]

Huffman: Summary
Total for this example:
  4.1 bits per character
  6135 total bits
  51.7% the size of ASCII representation
Minimal for this type of code.

Other Codes
error detecting: Know if something has been modified (bit flip).
error correcting: Know which bit has been modified.
multicharacter: Encode sequences (like "the") with their own codes. Can get much closer to minimum possible code length: Shannon's entropy.
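For reference, Shannon's entropy for a table of counts can be computed as below. This is the standard formula, H = -Σ p(c) log2 p(c), applied with each character treated independently; `entropy_bits` and the toy counts are illustrative, not from the slides.

```python
import math


def entropy_bits(freq: dict) -> float:
    """Average bits per character for an ideal code on these frequencies."""
    total = sum(freq.values())
    return -sum(
        (n / total) * math.log2(n / total)
        for n in freq.values()
        if n > 0  # characters that never appear contribute nothing
    )


toy = {"a": 2, "b": 1, "c": 1}
print(entropy_bits(toy))
```

A code that assigns a whole number of bits to each single character cannot average fewer bits than this value, which is why encoding longer sequences is one route to getting closer to it.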

What To Know
construct a Huffman code from frequencies
decode a message using a Huffman code
encode a message using a Huffman code
(Let's try some examples as time permits.)

Next Time
Hillis Chapter 8