
Algorithm Design and Analysis
LECTURE 8: Greedy Algorithms V
Huffman Codes
Adam Smith

Review Questions
Let G be a connected undirected graph with distinct edge weights. Answer true or false:
Let e be the cheapest edge in G. Some MST of G contains e? (Answer: True, by the Cut Property.)
Let e be the most expensive edge in G. No MST of G contains e? (Answer: False. Counterexample: if G is a tree, all of its edges are in the MST.)

Review Exercise
Given an undirected graph G, consider the spanning trees produced by four algorithms: the BFS tree, the DFS tree, the shortest-paths tree (Dijkstra), and the MST. Find a graph where (a) all four trees are the same, and (b) all four trees must be different. (Note: the BFS/DFS trees may depend on exploration order.)

Non-distinct edge weights? Read in the text.

Implementing MST algorithms
Prim: similar to Dijkstra.
Kruskal: requires an efficient data structure to keep track of "islands" (connected components): the Union-Find data structure. We may revisit this later in the course.
You should know how to implement Prim in O(m log_{m/n} n) time.

Implementation of Prim(G, w)
IDEA: Maintain V − S as a priority queue Q (as in Dijkstra). Key each vertex in Q with the weight of the least-weight edge connecting it to a vertex in S.

Q ← V
key[v] ← ∞ for all v ∈ V
key[s] ← 0 for some arbitrary s ∈ V
while Q ≠ ∅
    do u ← EXTRACT-MIN(Q)
       for each v ∈ Adjacency-list[u]
           do if v ∈ Q and w(u, v) < key[v]
                  then key[v] ← w(u, v)    ⊳ implicit DECREASE-KEY
                       π[v] ← u

At the end, {(v, π[v])} forms the MST.
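
This pseudocode maps directly onto a binary-heap implementation. Below is a minimal Python sketch, not from the slides: the function name prim_mst and the adjacency-list format are my own choices, and since Python's heapq has no DECREASE-KEY, the sketch pushes a fresh entry per key decrease and skips stale entries on extraction, which preserves the O(m log n) bound.

    import heapq

    def prim_mst(n, adj, s=0):
        # Vertices are 0..n-1; adj[u] is a list of (v, weight) pairs.
        key = [float("inf")] * n   # cheapest known edge into the tree
        parent = [None] * n        # pi[v] in the slide's notation
        in_tree = [False] * n
        key[s] = 0
        pq = [(0, s)]              # priority queue keyed by key[v]
        mst = []
        while pq:
            k, u = heapq.heappop(pq)
            if in_tree[u]:
                continue           # stale entry: u was already extracted
            in_tree[u] = True
            if parent[u] is not None:
                mst.append((parent[u], u, k))
            for v, w in adj[u]:
                if not in_tree[v] and w < key[v]:
                    key[v] = w     # the implicit DECREASE-KEY
                    parent[v] = u
                    heapq.heappush(pq, (w, v))
        return mst

    # Example: 4 vertices; the MST has weight 1 + 2 + 3 = 6.
    adj = [[(1, 2), (2, 1)],
           [(0, 2), (2, 4), (3, 3)],
           [(0, 1), (1, 4), (3, 5)],
           [(1, 3), (2, 5)]]
    print(prim_mst(4, adj))        # [(0, 2, 1), (0, 1, 2), (1, 3, 3)]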

Analysis of Prim

Q ← V                                      ⊳ Θ(n) total
key[v] ← ∞ for all v ∈ V
key[s] ← 0 for some arbitrary s ∈ V
while Q ≠ ∅                                ⊳ n times
    do u ← EXTRACT-MIN(Q)
       for each v ∈ Adj[u]                 ⊳ degree(u) times
           do if v ∈ Q and w(u, v) < key[v]
                  then key[v] ← w(u, v)
                       π[v] ← u

Handshaking Lemma ⇒ Θ(m) implicit DECREASE-KEYs. Time: as in Dijkstra.

Analysis of Prim (continued)
The while loop runs n times (one EXTRACT-MIN per vertex), and by the Handshaking Lemma there are Θ(m) implicit DECREASE-KEYs, so the total time depends on the priority queue:

PQ Operation    Calls in Prim   Array   Binary heap   d-way heap   Fib heap
EXTRACT-MIN     n               n       log n         HW3          log n
DECREASE-KEY    m               1       log n         HW3          1
Total                           n²      m log n       m log_{m/n} n   m + n log n

Individual Fibonacci-heap operations are amortized bounds.

Greedy Algorithms for MST
Kruskal's: Start with T = ∅. Consider edges in ascending order of weight. Insert edge e into T unless doing so would create a cycle.
Reverse-Delete: Start with T = E. Consider edges in descending order of weight. Delete edge e from T unless doing so would disconnect T.
Prim's: Start with some root node s. Grow a tree T from s outward. At each step, add to T the cheapest edge e with exactly one endpoint in T.

Union-Find Data Structures

Operation                                         Array + linked lists with sizes      Balanced trees
Find (worst-case)                                 Θ(1)                                 Θ(log n)
Union of sets A, B (worst-case)                   Θ(min(|A|, |B|)), as large as Θ(n)   Θ(log n)
Amortized: k unions and k finds from singletons   Θ(k log k)                           Θ(k log k)

With modifications (path compression), the amortized time for the tree structure is O(n α(n)) for n operations, where α(n), the inverse Ackermann function, grows much more slowly than log n. See KT Chapter 4.6.
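
For concreteness, here is a sketch of Kruskal's algorithm on top of a union-find with union by size and path compression, corresponding to the modified tree structure above. It assumes vertices labeled 0..n-1; the names kruskal_mst, find, and union are illustrative, not an interface from the course.

    def kruskal_mst(n, edges):
        # edges is a list of (weight, u, v) triples.
        parent = list(range(n))
        size = [1] * n

        def find(x):
            root = x
            while parent[root] != root:
                root = parent[root]
            while parent[x] != root:           # path compression
                parent[x], x = root, parent[x]
            return root

        def union(a, b):
            ra, rb = find(a), find(b)
            if ra == rb:
                return False                   # same "island": edge would create a cycle
            if size[ra] < size[rb]:            # union by size
                ra, rb = rb, ra
            parent[rb] = ra
            size[ra] += size[rb]
            return True

        mst = []
        for w, u, v in sorted(edges):          # ascending order of weight
            if union(u, v):
                mst.append((u, v, w))
        return mst

    print(kruskal_mst(4, [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)]))
    # [(0, 1, 1), (1, 2, 2), (2, 3, 4)]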

Huffman codes

Prefix-free codes
A binary code maps characters in an alphabet (say {A, …, Z}) to binary strings.
Prefix-free code: no codeword is a prefix of any other.
ASCII: prefix-free (all symbols have the same length).
Not prefix-free: a → 0, b → 1, c → 00, d → 01.
Why is prefix-free good?
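
Answer: a prefix-free stream can be decoded greedily, left to right, with no separators, while a code like the one above is ambiguous. A small illustrative check in Python (the helper decode_prefix_free and the example code are mine, not from the slides):

    def decode_prefix_free(bits, code):
        # Greedy left-to-right decoding; valid only for prefix-free codes,
        # where the first codeword matching the buffer is the only match.
        inverse = {cw: sym for sym, cw in code.items()}
        out, buf = [], ""
        for b in bits:
            buf += b
            if buf in inverse:
                out.append(inverse[buf])
                buf = ""
        assert buf == "", "input is not a concatenation of codewords"
        return "".join(out)

    # A prefix-free code decodes unambiguously:
    print(decode_prefix_free("010110111",
                             {"a": "0", "b": "10", "c": "110", "d": "111"}))  # abcd
    # The non-prefix-free code above (a -> 0, b -> 1, c -> 00, d -> 01) is
    # ambiguous: "001" could be read as "aab", "ad", or "cb".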

A prefix-free code for a few letters
[Figure: a code tree; e.g. e → 00, p → 10011. Source: Wikipedia.]

A prefix-free code
[Figure: a code tree over the full alphabet with letter frequencies; e.g. T → 1001, U → 1000011. Source: Jeff Erickson's notes.]

How good is a prefix-free code?
Given a text, let f[i] = the number of occurrences of letter i.
Total number of bits needed: Σ_i f[i] · depth(i), where depth(i) is the depth of letter i in the code tree, i.e. the length of its codeword.
How do we pick the best prefix-free code?

Huffman's Algorithm (1952)
Given individual letter frequencies f[1, …, n]:
1. Find the two least frequent letters i, j.
2. Merge them into a symbol with frequency f[i] + f[j].
3. Repeat.
E.g.: a: 6, b: 6, c: 4, d: 3, e: 2.
Theorem: Huffman's algorithm finds an optimal prefix-free code.
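
A minimal sketch of this merge loop in Python, with heapq as the priority queue; the function name huffman_code and the tuple-based tree representation are illustrative choices, not from the slides.

    import heapq

    def huffman_code(freq):
        # freq maps symbol -> frequency; returns symbol -> codeword.
        # The counter field only keeps heap entries comparable on ties.
        heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
        heapq.heapify(heap)
        count = len(heap)
        while len(heap) > 1:
            f1, _, t1 = heapq.heappop(heap)   # two least frequent symbols
            f2, _, t2 = heapq.heappop(heap)
            heapq.heappush(heap, (f1 + f2, count, (t1, t2)))  # merged symbol
            count += 1
        _, _, tree = heap[0]
        code = {}
        def walk(node, path):
            if isinstance(node, tuple):       # internal node: (left, right)
                walk(node[0], path + "0")
                walk(node[1], path + "1")
            else:                             # leaf: an original symbol
                code[node] = path or "0"      # single-symbol edge case
        walk(tree, "")
        return code

    # The slide's example:
    print(huffman_code({"a": 6, "b": 6, "c": 4, "d": 3, "e": 2}))
    # e.g. {'c': '00', 'e': '010', 'd': '011', 'a': '10', 'b': '11'}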

Warming up
Lemma 0: Every optimal prefix-free code corresponds to a full binary tree. (Full = every node has 0 or 2 children.)
Lemma 1: Let x and y be two least frequent characters. There is an optimal code in which x and y are siblings (at the deepest level of the tree).

Huffman codes are optimal
Proof by induction!
Base case: two symbols; there is only one full tree.
Induction step: Suppose f[1], f[2] are the smallest in f[1, …, n], and T is an optimal code for {1, …, n}.
Lemma 1 ⇒ we can choose T in which 1, 2 are siblings.
Number the merged symbol n+1, with f[n+1] = f[1] + f[2].
Let T′ be the code obtained from T by merging 1, 2 into n+1.

Cost of T in terms of T′:

cost(T) = Σ_{i=1}^{n} f[i] · depth(i)
        = Σ_{i=3}^{n+1} f[i] · depth(i) + f[1] · depth(1) + f[2] · depth(2) − f[n+1] · depth(n+1)
        = cost(T′) + f[1] · depth(1) + f[2] · depth(2) − f[n+1] · depth(n+1)
        = cost(T′) + (f[1] + f[2]) · depth(T) − f[n+1] · (depth(T) − 1)
        = cost(T′) + f[1] + f[2]

(The last two steps use that 1 and 2 are siblings at depth depth(T), so their parent n+1 sits at depth depth(T) − 1, and f[n+1] = f[1] + f[2].)

Let H be the Huffman code for {1, …, n} and H′ the Huffman code for {3, …, n+1}.
Induction hypothesis: cost(H′) ≤ cost(T′).
Then cost(H) = cost(H′) + f[1] + f[2] ≤ cost(T′) + f[1] + f[2] = cost(T). QED

Notes
See Jeff Erickson's lecture notes on greedy algorithms: http://theory.cs.uiuc.edu/~jeffe/teaching/algorithms/

Data compression for real?
Generally, we don't use letter-by-letter encoding; instead, we find frequently repeated substrings.
The Lempel-Ziv algorithm is extremely common and also has deep connections to entropy.
If we have time for string algorithms, we'll cover this.

Huffman codes and entropy
Given a set of frequencies, consider the probabilities p[i] = f[i] / (f[1] + … + f[n]).
Entropy(p) = Σ_i p[i] · log(1/p[i])
A Huffman code has expected depth within 1 of the entropy:
Entropy(p) ≤ Σ_i p[i] · depth(i) ≤ Entropy(p) + 1
To prove the upper bound, find some prefix-free code where depth(i) ≤ ⌈log(1/p[i])⌉ ≤ log(1/p[i]) + 1 for every symbol i. Exercise! The bound applies to Huffman codes too, by optimality.
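
The exercise can at least be sanity-checked numerically: the Shannon lengths ⌈log(1/p[i])⌉ satisfy Kraft's inequality Σ_i 2^(−l[i]) ≤ 1, so a prefix-free code with those depths exists, and its expected length is below Entropy(p) + 1. A small sketch, reusing the frequencies from the earlier example (this checks the bound on one instance; it is not a proof):

    import math

    def entropy(p):
        # H(p) = sum_i p[i] * log2(1 / p[i])
        return sum(q * math.log2(1 / q) for q in p if q > 0)

    f = [6, 6, 4, 3, 2]
    p = [x / sum(f) for x in f]
    lengths = [math.ceil(math.log2(1 / q)) for q in p]   # Shannon lengths
    print("H(p)            =", round(entropy(p), 3))     # ~2.212
    print("Kraft sum       =", sum(2 ** -l for l in lengths))   # 0.8125 <= 1
    print("expected length =",
          round(sum(q * l for q, l in zip(p, lengths)), 3))     # ~2.524 < H(p) + 1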

Prefix-free encoding of arbitrary-length strings
What if I want to send you a text, but you don't know ahead of time how long it is?
Option 1: put the length at the beginning: n + log(n) bits, but this requires me to know the length in advance.
Option 2: every B bits, put a special bit indicating whether or not we're done: n(1 + 1/B) + B − 1 bits.
Can we do better?
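
To make the second count concrete, here is a toy sketch of that scheme; the function name and block layout are my own, and a real encoder would also need a convention for locating the end of the message inside the padded final block.

    def length_free_encode(bits, B=4):
        # After every B message bits, append one flag bit:
        # 1 = more blocks follow, 0 = this is the last block.
        # Total: about n(1 + 1/B) + B - 1 bits for an n-bit message.
        blocks = [bits[i:i + B] for i in range(0, len(bits), B)] or [""]
        out = []
        for i, block in enumerate(blocks):
            flag = "0" if i == len(blocks) - 1 else "1"
            out.append(block.ljust(B, "0") + flag)
        return "".join(out)

    print(length_free_encode("1011001110"))   # '10111' + '00111' + '10000'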