Compressing Kinetic Data From Sensor Networks. Sorelle A. Friedler (Swat '04). Joint work with David Mount, University of Maryland, College Park.

Motivation

Motivation
Computer Science: Graphics (image and video segmentation, animation); Databases (maintenance over time); Sensor Networks (data analysis)
Physics: Simulations
Biology: Mathematical ecology (migratory paths); HIV strain analysis
Engineering: Traffic patterns and identification

Motivation
Develop a framework for kinetic data from sensors:
No advance knowledge of object motion
No restrictions on object motion
Reasonable assumptions about what a sensor can know
Efficiency analysis that is motion sensitive

Motivation
Kinetic data: data generated by moving objects
Sensors collect data, in large amounts
We want to analyze it later, but don't know in advance what questions we'll want to ask
Hence: lossless data compression

Entropy
[Figure: two point sets, labeled Low Entropy and High Entropy]

Entropy
Low Entropy: point motion is predictable; representing the motion of the system requires few bits
High Entropy: point motion is close to random; representing the motion of the system requires many bits

Entropy
Given a random process, entropy measures its uncertainty, its information content, its lack of predictability:
-Σ_x pr(x) log pr(x), summed over all outcomes x
Example:
A weighted coin that's always heads: -(1 log 1) = 0
A normal coin: -(½ log ½ + ½ log ½) = 1
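
To make the formula concrete, here is a minimal Python sketch of the entropy computation for the two coin examples above (the function name and the base-2 logarithm are my choices, not from the slides):

```python
from math import log2

def entropy(probs):
    # H = -sum_x pr(x) * log2(pr(x)); outcomes with probability 0 contribute nothing
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([1.0]))        # weighted coin, always heads: 0.0
print(entropy([0.5, 0.5]))   # normal coin: 1.0
```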

Entropy
Represents the number of bits needed to encode a character of a string generated by a random process
Normalized entropy: for strings of length n, 1/n times the entropy:
-(1/n) Σ_x pr(x) log pr(x), summed over all events x (the possible strings of length n)
Example (strings of length 2):
A weighted coin that's always heads, e.g. HH: -½(1 log 1) = 0
A normal coin, e.g. HT: -½(¼ log ¼ + ¼ log ¼ + ¼ log ¼ + ¼ log ¼) = 1

Entropy
Joint entropy measures the joint information content of two strings generated by random processes, based on the joint probabilities of a set of events:
-Σ_{x,y} pr(x,y) log pr(x,y), summed over all events x, y
Normalized joint entropy: -(1/n) Σ_{x,y} pr(x,y) log pr(x,y)
Example: joint entropy -(½ log ½ + ⅛ log ⅛ + ⅛ log ⅛ + ¼ log ¼) = 1¾
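
A quick numerical check of the slide's joint entropy example, assuming the four listed values are the joint probabilities pr(x, y):

```python
from math import log2

joint_probs = [1/2, 1/8, 1/8, 1/4]
print(-sum(p * log2(p) for p in joint_probs))   # 1.75 bits, i.e. 1 3/4
```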

Data Compression
Encoded strings are compressed to a shorter length than the original:
Swarthmore College → SC, [(S, Swarthmore), (C, College)]
Swarthmore College → Swat, [(Swat, Swarthmore College)]
Consider a string generated by a random process:
Optimal compression algorithms over a single string achieve a per-character encoding rate (in bits) equal to the normalized entropy
Optimal compression algorithms over a set of strings achieve a per-character encoding rate equal to the normalized joint entropy [Shannon48]
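
As a rough illustration of how achievable compression tracks entropy, the snippet below compresses a highly predictable string and a near-random string over the same alphabet with zlib (a DEFLATE-based sliding-window coder); the exact byte counts depend on zlib's implementation and are not a tight entropy bound:

```python
import random
import zlib

random.seed(0)
low_entropy = "AB" * 500                                         # predictable, low entropy
high_entropy = "".join(random.choice("AB") for _ in range(1000))  # close to 1 bit per character

print(len(zlib.compress(low_entropy.encode())))    # small: the repetition compresses away
print(len(zlib.compress(high_entropy.encode())))   # much larger: little redundancy to exploit
```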

Data Compression Options
Lossless: data is completely retrievable; compression bounds are theoretically provable; sliding-window Lempel-Ziv algorithm, 1977 (used by gzip)
Lossy: some data may be lost; can compress the data to fit in the space you have

Lempel-Ziv Sliding-Window Algorithm
Look back within the window to find a repeating phrase; store a pointer to the most recent previous occurrence.
Examples:
window size 2: ABAABA → (0,A)(0,B)(2,1)(1,1)(0,B)(2,1)
window size 3: ABAABA → (0,A)(0,B)(2,1)(3,3)
window size 3: ABAABAABA → (0,A)(0,B)(2,1)(3,6)
Shown to be an entropy encoding algorithm [WynerZiv94]
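
Below is a hedged Python sketch of a sliding-window encoder in the token format used on the slide: (0, c) emits the literal character c, and (d, l) copies l characters starting d positions back, with overlapping copies allowed; ties are broken toward the most recent occurrence. It reproduces the three examples above.

```python
def lz_encode(s, window):
    """Sliding-window LZ encoder: (0, c) is a literal, (d, l) copies l
    characters starting d positions back (overlap allowed)."""
    tokens, i, n = [], 0, len(s)
    while i < n:
        best_d, best_l = 0, 0
        for d in range(1, min(window, i) + 1):   # smaller d = more recent occurrence
            l = 0
            while i + l < n and s[i + l - d] == s[i + l]:
                l += 1
            if l > best_l:
                best_d, best_l = d, l
        if best_l == 0:
            tokens.append((0, s[i]))             # no match in the window: emit a literal
            i += 1
        else:
            tokens.append((best_d, best_l))      # back-reference to the repeated phrase
            i += best_l
    return tokens

print(lz_encode("ABAABA", 2))      # [(0,'A'), (0,'B'), (2,1), (1,1), (0,'B'), (2,1)]
print(lz_encode("ABAABA", 3))      # [(0,'A'), (0,'B'), (2,1), (3,3)]
print(lz_encode("ABAABAABA", 3))   # [(0,'A'), (0,'B'), (2,1), (3,6)]
```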

Lempel-Ziv Sliding-Window Algorithm
Decoding takes an encoded string and translates it back to the original.
Examples:
(0,A)(0,B)(2,1)(3,6) → ABAABAABA
(0,A)(1,1)(0,B)(1,1)(4,2)(5,2) → AABBAAAB
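
And a matching decoder sketch for the same token format, replaying literals and back-copies (copies may overlap the output being built):

```python
def lz_decode(tokens):
    out = []
    for d, x in tokens:
        if d == 0:
            out.append(x)          # literal character
        else:
            for _ in range(x):     # copy x characters from d positions back
                out.append(out[-d])
    return "".join(out)

print(lz_decode([(0, "A"), (0, "B"), (2, 1), (3, 6)]))                  # ABAABAABA
print(lz_decode([(0, "A"), (1, 1), (0, "B"), (1, 1), (4, 2), (5, 2)]))  # AABBAAAB
```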

Motivation
Kinetic data: data generated by moving objects
Sensors collect data, in large amounts
We want to analyze it later, but don't know in advance what questions we'll want to ask
Hence: lossless data compression

Existing Frameworks for Kinetic Data
Atallah (1983): polynomial motion of degree k; motion known in advance; points lie in R^d, analysis in R^(d+1)
Kahan (1991): bounds on point velocity; update function provided; queries limited to the function
Kinetic Data Structures (Basch, Guibas, Hershberger 97): points have flight plans (algebraic expressions) that can change

Existing Frameworks for Sensor Data
Minimal sensor assumptions [GandhiKumarSuri08]: sensors can count objects within their detection region
Traffic detection assumptions [GuittonSkordylisTrigoni07]: sensors can calculate traffic flow (cars/time) and occupancy (cars/area)
Exact state of object knowledge [Kastrinaki03 survey]: sensors can calculate object speed, change in angle, etc.

Our Framework
Detection region around each sensor (sensors are stationary)
Point motion unrestricted; no advance knowledge about motion
Each sensor reports the count of points within its region at each synchronized time step
k-local: sensor outputs are statistically dependent only on those of their k nearest neighbors
[Figure: sensors with their detection regions (sensor balls)]

Data Collection
Data based on underlying geometric motion
[Figure: sensors A through E with their detection regions; points move through them over time]
Sensor data streams (one row per time step):
A B C D E
1 0 0 0 0
2 0 0 0 0
2 1 0 0 0
0 2 0 0 0
0 0 0 1 0
0 0 1 1 0

What is Optimal?
Optimal: the smallest lossless encoding of the sensor data
Joint entropy chain rule, for X = {X_1, X_2, ..., X_S}:
H(X) = H(X_1) + H(X_2 | X_1) + ... + H(X_S | X_1, ..., X_{S-1})
k-local entropy (H_k): the normalized joint entropy of a set of streams, each of which depends only on up to k streams from among its k nearest neighbors
Under the k-locality assumption, the optimal compression of the sensor streams is H(X) = H_k(X)
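
A small numerical sanity check of the chain rule on two made-up streams, using empirical symbol frequencies in place of true probabilities (the data and helper names are illustrative, not from the talk):

```python
from collections import Counter
from math import log2

def H(samples):
    """Empirical entropy of a sequence of symbols."""
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())

x1 = "00011011"                 # toy stream X1
x2 = "01101001"                 # toy stream X2, aligned per time step
pairs = list(zip(x1, x2))

# H(X2 | X1) = sum over values a of pr(X1 = a) * H(X2 restricted to times where X1 = a)
cond = sum((cnt / len(x1)) * H([b for a2, b in pairs if a2 == a])
           for a, cnt in Counter(x1).items())

print(H(pairs))        # joint entropy H(X1, X2) = 2.0
print(H(x1) + cond)    # chain rule: H(X1) + H(X2 | X1) = 2.0
```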

Data Compression Algorithm
The optimal bound is the joint entropy of the set of streams
Compressing each stream separately doesn't reach this bound
Compressing all streams together reaches the bound, but the window size needed to capture the required repetition is too large to be practical
Key: statistically dependent streams need to be compressed together
Since H(X) = H_k(X), we want groups of roughly k streams

Data Compression Algorithm: Partitioning Lemma
k-clusterable: a point set that can be clustered into subsets of size at most k+1 so that if p and q are among each other's k nearest neighbors, then they are in the same cluster.
[Figure: a 2-clusterable example]

Data Compression Algorithm: Partitioning Lemma
k-clusterable: a point set that can be clustered into subsets of size at most k+1 so that if p and q are among each other's k nearest neighbors, then they are in the same cluster.
[Figure: points labeled 0 through 7, an example that is not 2-clusterable]

Data Compression Algorithm: Partitioning Lemma Lemma: There exists an integral constant c such that for all k>0 any point set can be partitioned into c partitions that are each k-clusterable.

Partitioning Algorithm
for all points p in P:
    find r_k(p) = distance from p to its k-th nearest neighbor
    NN_k(p) = the k nearest neighbors of p
while P is nonempty:
    unmark all points in P
    create a new empty partition P_i
    while there are unmarked points:
        r = minimum r_k(p) over unmarked p
        q = an unmarked point with r_k(q) = r
        add q and its NN_k(q) to P_i and remove them from P
        mark all points within 3r of q
return {P_1, P_2, ..., P_c}
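
A rough Python sketch of this partitioning step, under the assumptions that points are given as distinct coordinate tuples and that each point has at least k other points to serve as neighbors (the function and variable names are mine):

```python
import math

def partition_points(points, k):
    """Repeatedly peel off partitions, each intended to be k-clusterable (sketch)."""
    def dist(p, q):
        return math.dist(p, q)

    # Precompute each point's k nearest neighbors and r_k, the distance to the k-th one.
    nn, r_k = {}, {}
    for p in points:
        others = sorted((q for q in points if q != p), key=lambda q: dist(p, q))
        nn[p] = others[:k]
        r_k[p] = dist(p, others[k - 1])

    remaining = set(points)
    partitions = []
    while remaining:
        marked = set()
        part = []                                                  # new partition P_i: a list of clusters
        while remaining - marked:
            q = min(remaining - marked, key=lambda p: r_k[p])      # smallest r_k among unmarked points
            r = r_k[q]
            cluster = [q] + [p for p in nn[q] if p in remaining]   # q plus its still-available k-NN
            part.append(cluster)
            remaining -= set(cluster)                              # remove the cluster from P
            marked |= {p for p in remaining if dist(p, q) <= 3 * r}
        partitions.append(part)
    return partitions

# Example: two well-separated groups of three points, k = 2 -> one partition, two clusters.
print(partition_points([(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)], 2))
```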

Partitioning Lemma Proof Sketch
By the marking rule and the order of clustering, every partition is k-clusterable
Choosing points in increasing order of r_k(p) ensures that non-mutual NN_k relations are separated into different clusters

Partitioning Lemma Proof Sketch
There are c partitions
In each round, every point is either marked or removed from P
By bounding the outer radius within which a point can be marked and the separation distance between the marking points, a packing argument bounds the number of times a point can be marked, giving c = O(1 + 12^O(1)) = O(1)
[Figure: packing argument with radii 12r and r]

Data Compression Algorithm
Partition and cluster the sensors, then compress:
for each partition P_i:
    for each cluster in P_i:
        combine the cluster's streams into one stream over longer characters and compress it
return the union of the compressed streams
Example: streams 1120, 1003, 2201 → combined stream (112)(102)(200)(031)
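
A minimal sketch of the combine-and-compress step for one cluster, using zlib's DEFLATE as a stand-in for the sliding-window Lempel-Ziv coder (the parenthesized super-character formatting just mirrors the slide's example):

```python
import zlib

def compress_cluster(streams):
    # Combine the streams character-by-character into super-characters, then compress.
    combined = "".join("(" + "".join(chars) + ")" for chars in zip(*streams))
    return zlib.compress(combined.encode())

blob = compress_cluster(["1120", "1003", "2201"])
print(zlib.decompress(blob).decode())   # (112)(102)(200)(031)
```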

Data Compression Algorithm
Proof sketch: the joint entropy of the streams is the optimal length
Recall: H(X) = H_k(X)
Sensor outputs are k-local, so each compressed partition has optimal length: statistically dependent streams are compressed together
There are c partitions, so the total length is at most c times optimal
Recall: c is O(1)

Summary of Results
Framework for kinetic sensor data: no assumptions about object motion or advance knowledge; motion-sensitive analysis; relies on minimal sensor abilities
Lossless compression algorithm that compresses the data to c·H(X), which is O(optimal)
Assumes sensor outputs depend only on their k nearest neighbors
Assumes sensor outputs can be modeled by an underlying random process

Recent and Future Work
Extend the analysis of the compression algorithm to empirical entropy (no underlying random process)
Retrieval without decompressing the data: range searching, e.g. given a time period and spatial range, what is the aggregated count?
Statistical analysis without decompressing the data
Lossy compression
Experimental evaluation
Application in non-sensor contexts

Thank you! Questions?