Estimating Entropy and Entropy Norm on Data Streams

Amit Chakrabarti¹, Khanh Do Ba¹, and S. Muthukrishnan²

¹ Department of Computer Science, Dartmouth College, Hanover, NH 03755, USA
² Department of Computer Science, Rutgers University, Piscataway, NJ 08854, USA

[Support: an NSF CAREER award and Dartmouth College startup funds; work partly done while visiting DIMACS in the REU program, supported by NSF ITR 0220280, DMS 0354600, and a Dean of Faculty Fellowship from Dartmouth College; NSF ITR 0220280 and DMS 0354600.]

Abstract. We consider the problem of computing information theoretic functions such as entropy on a data stream, using sublinear space. Our first result deals with a measure we call the entropy norm of an input stream: it is closely related to entropy but is structurally similar to the well-studied notion of frequency moments. We give a polylogarithmic space one-pass algorithm for estimating this norm under certain conditions on the input stream. We also prove a lower bound that rules out such an algorithm if these conditions do not hold. Our second group of results are for estimating the empirical entropy of an input stream. We first present a sublinear space one-pass algorithm for this problem. For a stream of $m$ items and a given real parameter $\alpha$, our algorithm uses space $\tilde O(m^{2\alpha})$ and provides an approximation of $1/\alpha$ in the worst case and $(1+\varepsilon)$ in "most" cases. We then present a two-pass polylogarithmic space $(1+\varepsilon)$-approximation algorithm. All our algorithms are quite simple.

1 Introduction

Algorithms for computational problems on data streams have been the focus of plenty of recent research in several communities, such as theory, databases and networks [1, 6, 2, 13]. In this model of computation, the input is a stream of items that is too long to be stored completely in memory, and a typical problem involves computing some statistics on this stream. The main challenge is to design algorithms that are efficient not only in terms of running time, but also in terms of space (i.e., memory usage): sublinear space is mandatory and polylogarithmic space is often the goal.

The seminal paper of Alon, Matias and Szegedy [1] considered the problem of estimating the frequency moments of the input stream: if a stream contains $m_i$ occurrences of item $i$ (for $1 \le i \le n$), its $k$th frequency moment is denoted $F_k$ and is defined by $F_k := \sum_{i=1}^n m_i^k$. Alon et al. showed that $F_k$ could be estimated arbitrarily well in sublinear space for all nonnegative integers $k$, and in polylogarithmic (in $m$ and $n$) space for $k \in \{0, 1, 2\}$. Their algorithmic results were subsequently improved by Coppersmith and Kumar [4] and Indyk and Woodruff [10].

In this work, we first consider a somewhat related statistic of the input stream, inspired by the classic information theoretic notion of entropy. We consider the entropy norm of the input stream, denoted $F_H$ and defined by $F_H := \sum_{i=1}^n m_i \lg m_i$.³ We prove (see Theorem 2.2) that $F_H$ can be estimated arbitrarily well in polylogarithmic space provided its value is not too small, a condition that is satisfied if, e.g., the input stream is at least twice as long as the number of distinct items in it. We also prove (see Theorem 2.5) that $F_H$ cannot be estimated well in polylogarithmic space if its value is too small.

³ Throughout this paper lg denotes logarithm to the base 2.

Second, we consider the estimation of entropy itself, as opposed to the entropy norm. Any input stream implicitly defines an empirical probability distribution on the set of items it contains, the probability of item $i$ being $m_i/m$, where $m$ is the length of the stream. The empirical entropy of the stream, denoted $H$, is defined to be the entropy of this probability distribution:

$$H := \sum_{i=1}^n (m_i/m) \lg(m/m_i) = \lg m - F_H/m. \quad (1)$$

An algorithm that computes $F_H$ exactly clearly suffices to compute $H$ as well. However, since we are only able to approximate $F_H$ in the data stream model, we need new techniques to estimate $H$. We prove (see Theorem 3.1) that $H$ can be approximated using sublinear space. Although the space usage is not polylogarithmic in general, our algorithm provides a tradeoff between space and approximation factor and can be tuned to use space arbitrarily close to polylogarithmic. The standard data stream model allows us only one pass over the input. If, however, we are allowed two passes over the input but still restricted to small space, we have an algorithm that approximates $H$ to within a $(1+\varepsilon)$ factor and uses polylogarithmic space.

Both entropy and entropy norm are natural statistics to approximate on data streams. Arguably, entropy-related measures are even more natural than $L_p$ norms or frequency moments $F_k$. In addition, they have direct applications. The quintessential need arises in analyzing IP network traffic at packet level on high speed routers. In monitoring IP traffic, one cares about anomalies. In general, anomalies are hard to define and detect, since there are subtle intrusions, sophisticated dependence amongst network events, and agents gaming the attacks. A number of recent results in the networking community use entropy as an approach [7, 14, 15] to detect sudden changes in network behavior and as an indicator of anomalous events. The rationale is well explained elsewhere, chiefly in Section 2 of [14]. The current research in this area [14, 7, 15] relies on full space algorithms for entropy calculation; this is a serious bottleneck in high speed routers where high speed memory is at a premium. Indeed, this is the bottleneck that motivated data stream algorithms and their applications to IP network analysis [6, 13]. Our small-space algorithms can immediately make entropy estimation at line speed practical on high speed routers.

To the best of our knowledge, our upper and lower bound results for the entropy norm are the first of their kind. Recently Guha, McGregor and Venkatasubramanian [8] considered approximation algorithms for the entropy of a given distribution under various models, including the data stream model. They obtain a $(\frac{e}{e-1} + \varepsilon)$-approximation for the entropy $H$ of an input stream provided $H$ is at least a sufficiently large constant, using space $\tilde O(1/(\varepsilon^2 H))$, where the $\tilde O$-notation hides factors polylogarithmic in $m$ and $n$. Our work shows that $H$ can be $(1+\varepsilon)$-approximated in $\tilde O(1/\varepsilon^2)$ space for $H \ge 1$ (see the remark after Theorem 3.1); our space bounds are independent of $H$. Guha et al. [8] also give a two-pass $(1+\varepsilon)$-approximation algorithm for entropy, using $\tilde O(1/(\varepsilon^2 H))$ space. We do the same using only $\tilde O(1/\varepsilon^2)$ space. Finally, Guha et al. consider the entropy estimation problem in the random streams model, where it is assumed that the items in the input stream are presented in a uniform random order. Under this assumption, they obtain a $(1+\varepsilon)$-approximation using $\tilde O(1/\varepsilon^2)$ space. We study adversarial data stream inputs only.

2 Estimating the Entropy Norm

In this section we present a polylogarithmic space $(1+\varepsilon)$-approximation algorithm for the entropy norm that assumes the norm is sufficiently large, and prove a matching lower bound if the norm is in fact not as large.

2.1 Upper Bound

Our algorithm is inspired by the work of Alon et al. [1]. Their first algorithm, for the frequency moments $F_k$, has the following nice structure to it (some of the terminology is ours). A subroutine computes a basic estimator, which is a random variable $X$ whose mean is exactly the quantity we seek and whose variance is small. The algorithm itself uses this subroutine to maintain $s_1 s_2$ independent basic estimators $\{X_{ij} : 1 \le i \le s_1,\ 1 \le j \le s_2\}$, where each $X_{ij}$ is distributed identically to $X$. It then outputs a final estimator $Y$ defined by

$$Y := \operatorname{median}_{1 \le j \le s_2} \left( \frac{1}{s_1} \sum_{i=1}^{s_1} X_{ij} \right).$$

Lemma 2.1 below, implicit in [1], gives a guarantee on the quality of this final estimator.
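To make the construction concrete, here is a minimal Python sketch of the median-of-means combining step (the names draw_basic, s1 and s2 are illustrative, not from the paper); Lemma 2.1 below justifies the parameter choices:

import statistics

def final_estimator(draw_basic, s1, s2):
    # Y := median over s2 groups of the mean of s1 independent
    # basic estimators per group (median-of-means).
    group_means = [sum(draw_basic() for _ in range(s1)) / s1
                   for _ in range(s2)]
    return statistics.median(group_means)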

Lemma 2.1. Let $\mu := \mathrm{E}[X]$. For any $\varepsilon, \delta \in (0, 1]$, if $s_1 \ge 8\,\mathrm{Var}[X]/(\varepsilon^2 \mu^2)$ and $s_2 = 4 \lg(1/\delta)$, then the above final estimator deviates from $\mu$ by no more than $\varepsilon\mu$ with probability at least $1 - \delta$. The above algorithm can be implemented to use space $O(S \log(1/\delta)\,\mathrm{Var}[X]/(\varepsilon^2 \mu^2))$, provided the basic estimator can be computed using space at most $S$.

Proof. The claim about the space usage is immediate from the structure of the algorithm. Let $Y_j = \frac{1}{s_1}\sum_{i=1}^{s_1} X_{ij}$. Then $\mathrm{E}[Y_j] = \mu$ and $\mathrm{Var}[Y_j] = \mathrm{Var}[X]/s_1 \le \varepsilon^2\mu^2/8$. Applying Chebyshev's Inequality gives us $\Pr[|Y_j - \mu| \ge \varepsilon\mu] \le 1/8$. Now, if fewer than $s_2/2$ of the $Y_j$'s deviate by as much as $\varepsilon\mu$ from $\mu$, then $Y$ must be within $\varepsilon\mu$ of $\mu$. So we upper bound the probability that this does not happen. Define $s_2$ indicator random variables $I_j$, where $I_j = 1$ iff $|Y_j - \mu| \ge \varepsilon\mu$, and let $W = \sum_{j=1}^{s_2} I_j$. Then $\mathrm{E}[W] \le s_2/8$. A standard Chernoff bound (see, e.g., [12, Theorem 4.1]) gives

$$\Pr\big[|Y - \mu| \ge \varepsilon\mu\big] \le \Pr\Big[W \ge \frac{s_2}{2}\Big] \le \left(\frac{e^3}{4^4}\right)^{s_2/8} = \left(\frac{e^3}{4^4}\right)^{\frac{1}{2}\lg(1/\delta)} \le \delta,$$

which completes the proof. □

We use the following subroutine to compute a basic estimator $X$ for the entropy norm $F_H$.

Input stream: $A = \langle a_1, a_2, \ldots, a_m \rangle$, where each $a_i \in \{1, \ldots, n\}$.
1. Choose $p$ uniformly at random from $\{1, \ldots, m\}$.
2. Let $r = |\{q : a_q = a_p,\ p \le q \le m\}|$. Note that $r \ge 1$.
3. Let $X = m\,\big(r \lg r - (r-1)\lg(r-1)\big)$, with the convention that $0 \lg 0 = 0$.

Our algorithm for estimating the entropy norm outputs a final estimator based on this basic estimator, as described above. This gives us the following theorem.

Theorem 2.2. For any $\Delta > 0$, if $F_H \ge m/\Delta$, the above one-pass algorithm can be implemented so that its output deviates from $F_H$ by no more than $\varepsilon F_H$ with probability at least $1 - \delta$, and so that it uses space

$$O\!\left(\frac{\log(1/\delta)}{\varepsilon^2}\,\Delta \log m\,(\log m + \log n)\right).$$

In particular, taking $\Delta$ to be a constant, we have a polylogarithmic space algorithm that works on streams whose $F_H$ is not too small.
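Before the proof, here is a small self-contained sketch of the basic estimator, assuming for illustration that the whole stream fits in memory (the genuinely one-pass version is discussed around Theorem 2.3). Averaging many independent draws should approach the exact $F_H$, matching the unbiasedness computation in the proof below:

import math
import random
from collections import Counter

def lg_term(x):
    # f(x) = x lg x, with the convention 0 lg 0 = 0.
    return x * math.log2(x) if x > 0 else 0.0

def basic_estimator(stream):
    # Choose a position p uniformly; r counts occurrences of a_p
    # from position p to the end of the stream.
    m = len(stream)
    p = random.randrange(m)
    r = sum(1 for q in range(p, m) if stream[q] == stream[p])
    return m * (lg_term(r) - lg_term(r - 1))

def entropy_norm(stream):
    # Exact F_H = sum_i m_i lg m_i, for comparison.
    return sum(lg_term(c) for c in Counter(stream).values())

stream = [1, 1, 1, 2, 2, 3] * 50
est = sum(basic_estimator(stream) for _ in range(20000)) / 20000
print(est, entropy_norm(stream))   # the two values should be close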

Proof. We first check that the expected value of $X$ is indeed the desired quantity:

$$\mathrm{E}[X] = \sum_{v=1}^n \sum_{r=1}^{m_v} \big(r \lg r - (r-1)\lg(r-1)\big) = \sum_{v=1}^n (m_v \lg m_v - 0 \lg 0) = F_H.$$

The approximation guarantee of the algorithm now follows from Lemma 2.1. To bound the space usage, we must bound the variance $\mathrm{Var}[X]$, and for this we bound $\mathrm{E}[X^2]$. Let $f(r) := r \lg r$, with $f(0) := 0$, so that $X$ can be expressed as $X = m(f(r) - f(r-1))$. Then

$$\mathrm{E}[X^2] = m \sum_{v=1}^n \sum_{r=1}^{m_v} \big(f(r) - f(r-1)\big)^2 \le m \cdot \max_{1 \le r \le m} \big|f(r) - f(r-1)\big| \cdot \sum_{v=1}^n \sum_{r=1}^{m_v} \big(f(r) - f(r-1)\big)$$
$$\le m \cdot \sup\{f'(x) : x \in (0, m]\} \cdot F_H \quad (2)$$
$$= m(\lg e + \lg m)\, F_H \quad (3)$$
$$\le \Delta(\lg e + \lg m)\, F_H^2,$$

where (2) follows from the Mean Value Theorem. Thus, $\mathrm{Var}[X]/\mathrm{E}[X]^2 = O(\Delta \lg m)$. Moreover, the basic estimator can be implemented using space $O(\log m + \log n)$: $O(\log m)$ to count $m$ and $r$, and $O(\log n)$ to store the value of $a_p$. Plugging these bounds into Lemma 2.1 yields the claimed upper bound on the space of our algorithm. □

Let $F_0$ denote the number of distinct items in the input stream (this notation deliberately coincides with that for frequency moments). Let $f(x) := x \lg x$ as used in the proof above. Observe that $f$ is convex on $(0, \infty)$, whence, via Jensen's inequality, we obtain

$$F_H = F_0 \sum_{v=1}^n \frac{f(m_v)}{F_0} \ge F_0\, f\!\left(\frac{1}{F_0}\sum_{v=1}^n m_v\right) = m \lg\frac{m}{F_0}. \quad (4)$$

Thus, if the input stream satisfies $m \ge 2F_0$ (or the simpler, but stronger, condition $m \ge 2n$), then we have $F_H \ge m$. As a direct corollary of Theorem 2.2 (for $\Delta = 1$) we obtain a $(1+\varepsilon)$-approximation algorithm for the entropy norm in space $O((\log(1/\delta)/\varepsilon^2) \log m\,(\log m + \log n))$. However, we can do slightly better.

Theorem 2.3. If $m \ge 2F_0$ then the above one-pass, $(1+\varepsilon)$-approximation algorithm can be implemented in space

$$O\!\left(\frac{\log(1/\delta)}{\varepsilon^2} \log m \log n\right)$$

without a priori knowledge of the stream length $m$.
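The proof below notes that each basic estimator can be maintained without prior knowledge of $m$. A minimal single-estimator sketch of one standard way to do this, a reservoir-style update in the spirit of the implementation of Alon et al. [1] (names illustrative; the full algorithm runs $s_1 s_2$ such estimators in parallel):

import math
import random

class BasicEstimator:
    # Maintains one basic estimator for F_H in a single pass,
    # without knowing the stream length m in advance.
    def __init__(self):
        self.m = 0        # items seen so far
        self.item = None  # currently sampled item a_p
        self.r = 0        # occurrences of a_p from position p onward

    def update(self, a):
        self.m += 1
        # Replace the sampled position with probability 1/m, so that it
        # stays uniformly distributed over all positions seen so far.
        if random.randrange(self.m) == 0:
            self.item, self.r = a, 1
        elif a == self.item:
            self.r += 1

    def estimate(self):
        f = lambda x: x * math.log2(x) if x > 0 else 0.0
        return self.m * (f(self.r) - f(self.r - 1))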

Proof. We follow the proof of Theorem 2.2 up to the bound (3) to obtain $\mathrm{Var}[X] \le (2m \lg m) F_H$, for large enough $m$. We now make the following claim:

$$\lg m \le \lg(m/F_0) \cdot 2\max\{\lg F_0,\, 1\}. \quad (5)$$

Assuming the truth of this claim and using (4), we obtain

$$\mathrm{Var}[X] \le (2m \lg m)\, F_H \le \frac{2\lg m}{\lg(m/F_0)}\, F_H^2 \le 4\max\{\lg F_0,\, 1\}\, F_H^2 \le (4\lg n)\, F_H^2.$$

Plugging this into Lemma 2.1 and proceeding as before, we obtain the desired space upper bound. Note that we no longer need to know $m$ before starting the algorithm, because the number of basic estimators used by the algorithm is now independent of $m$. Although maintaining each basic estimator seems, at first, to require prior knowledge of $m$, a careful implementation can avoid this, as shown by Alon et al. [1].

We turn to proving our claim (5). We will need the assumption $m \ge 2F_0$. If $m \le F_0^2$, then $\lg m \le 2\lg F_0 = 2\lg F_0 \cdot \lg(2F_0/F_0) \le 2\lg F_0 \cdot \lg(m/F_0)$ and we are done. On the other hand, if $m \ge F_0^2$, then $F_0 \le m^{1/2}$, so that $\lg(m/F_0) \ge \lg m^{1/2} = (1/2)\lg m$ and we are done as well. □

Remark 2.4. Theorem 2.2 generalizes to estimating quantities of the form $\hat\mu = \sum_{v=1}^n \hat f(m_v)$, for any monotone increasing (on integer values), differentiable function $\hat f$ that satisfies $\hat f(0) = 0$. Assuming $\hat\mu \ge m/\Delta$, it gives us a one-pass $(1+\varepsilon)$-approximation algorithm that uses $\tilde O(\Delta \hat f'(m))$ space. For instance, this space usage is polylogarithmic in $m$ if $\hat f(x) = x\,\mathrm{polylog}(x)$.

2.2 Lower Bound

The following lower bound shows that the algorithm of Theorem 2.2 is optimal, up to factors polylogarithmic in $m$ and $n$.

Theorem 2.5. Suppose $m$, $\Delta$ and $c$ are integers with $4 \le \Delta = o(m)$ and $0 \le c \le m/\Delta$. On input streams of size at most $m$, a randomized algorithm able to distinguish between $F_H \le 2c$ and $F_H \ge c + 2m/\Delta$ must use space at least $\Omega(\Delta)$. In particular, the upper bound in Theorem 2.2 is tight in its dependence on $\Delta$.

Proof. We present a reduction from the classic problem of (two-party) Set Disjointness in communication complexity [11]. Suppose Alice has a subset $X$ and Bob a subset $Y$ of $\{1, 2, \ldots, \Delta - 1\}$, such that $X$ and $Y$ either are disjoint or intersect at exactly one point. Let us define the mapping

$$\varphi : x \mapsto \left\{ \frac{(m - 2c)x}{\Delta} + i \;:\; i \in \mathbb{Z},\ 0 \le i < \frac{m - 2c}{\Delta} \right\}.$$

Alice creates a stream $A$ by listing all elements in $\bigcup_{x \in X} \varphi(x)$ and concatenating the $c$ special elements $m+1, \ldots, m+c$. Similarly, Bob creates a stream $B$ by listing all elements in $\bigcup_{y \in Y} \varphi(y)$ and concatenating the same $c$ special elements $m+1, \ldots, m+c$. Now, Alice can process her stream (with the hypothetical entropy norm estimation algorithm) and send over her memory contents to Bob, who can then finish the processing. Note that the length of the combined stream $A \circ B$ is at most $2c + |X \cup Y| \cdot ((m-2c)/\Delta)$. We now show that, based on the output of the algorithm, Alice and Bob can tell whether or not $X$ and $Y$ intersect. Since the Set Disjointness problem has communication complexity $\Omega(\Delta)$, we get the desired space lower bound.

Suppose $X$ and $Y$ are disjoint. Then the items in $A \circ B$ are all distinct except for the $c$ special elements, which appear twice each. So $F_H(A \circ B) = c\,(2 \lg 2) = 2c$. Now suppose $X \cap Y = \{z\}$. Then the items in $A \circ B$ are all distinct except for the $\lfloor (m-2c)/\Delta \rfloor$ elements in $\varphi(z)$ and the $c$ special elements, each of which appears twice. So $F_H(A \circ B) = 2(c + \lfloor (m-2c)/\Delta \rfloor) \ge c + 2m/\Delta$, since $\Delta \ge 4$. □

Remark 2.6. Notice that the above theorem rules out even a polylogarithmic space constant factor approximation to $F_H$ that can work on streams with small $F_H$. This can be seen by setting $\Delta = m^\gamma$ for some constant $\gamma > 0$.

3 Estimating the Empirical Entropy

We now turn to the estimation of the empirical entropy $H$ of a data stream, defined as in equation (1): $H = \sum_{i=1}^n (m_i/m) \lg(m/m_i)$. Although $H$ can be computed exactly from $F_H$, as shown in (1), a $(1+\varepsilon)$-approximation of $F_H$ can yield a poor estimate of $H$ when $H$ is small (sublinear in its maximum value, $\lg m$). We therefore present a different sublinear space, one-pass algorithm that directly computes entropy.

Our data structure takes a user parameter $\alpha > 0$ and consists of three components. The first (A1) is a sketch in the manner of Section 2, with basic estimator

$$X = m\left(\frac{r}{m}\lg\frac{m}{r} - \frac{r-1}{m}\lg\frac{m}{r-1}\right), \quad (6)$$

and a final estimator derived from this basic estimator using $s_1 = (8/\varepsilon^2)\, m^{2\alpha} \lg^2 m$ and $s_2 = 4 \lg(1/\delta)$. The second component (A2) is an array of $m^{2\alpha}$ counters (each counting from 1 to $m$) used to keep exact counts of the first $m^{2\alpha}$ distinct items seen in the input stream. The third component (A3) is a Count-Min Sketch, as described by Cormode and Muthukrishnan [5], which we use to estimate $k$, defined to be the number of items in the stream that are different from the most frequent item; i.e., $k = m - \max\{m_i : 1 \le i \le n\}$.

The algorithm itself works as follows. Recall that $F_0$ denotes the number of distinct items in the stream.

1. Maintain A1, A2, A3 as described above.
2. When queried (or at end of input):
3.   if $F_0 \le m^{2\alpha}$ then return exact $H$ from A2
4.   else let $\hat k$ := estimate of $k$ from A3
5.     if $\hat k \ge (1-\varepsilon)\, m^{1-\alpha}$ then return final estimator, $Y$, of A1
6.     else return $(\hat k \lg m)/m$
7. end

Theorem 3.1. The above algorithm uses

$$O\!\left(\frac{\log(1/\delta)}{\varepsilon^2}\, m^{2\alpha} \log^2 m\, (\log m + \log n)\right)$$

space and outputs a random variable $Z$ that satisfies the following properties.
1. If $k \le m^{2\alpha} - 1$, then $Z = H$.
2. If $k \ge m^{1-\alpha}$, then $\Pr[\,|Z - H| \ge \varepsilon H\,] \le \delta$.
3. Otherwise (i.e., if $m^{2\alpha} \le k < m^{1-\alpha}$), $Z$ is a $(1/\alpha)$-approximation of $H$.

Remark 3.2. Under the assumption $H \ge 1$, an algorithm that uses only the basic estimator in A1 and sets $s_1 = (8/\varepsilon^2) \lg^2 m$ suffices to give a $(1+\varepsilon)$-approximation in $\tilde O(\varepsilon^{-2} \log^2 m)$ space.

Proof. The space bound is clear from the specifications of A1, A2 and A3, and Lemma 2.1. We now prove the three claimed properties of $Z$ in sequence.

Property 1: This follows directly from the fact that $F_0 \le k + 1$.

Property 2: The Count-Min sketch guarantees that $\hat k \le k$ and, with probability at least $1 - \delta$, $\hat k \ge (1-\varepsilon)k$. The condition in Property 2 therefore implies that $\hat k \ge (1-\varepsilon)\,m^{1-\alpha}$, that is, $Z = Y$, with probability at least $1 - \delta$. Here we need the following lemma.

Lemma 3.3. Given that the most frequent item in the input stream $A$ has count $m - k$, the minimum entropy $H_{\min}$ is achieved when all the remaining $k$ items are identical, and the maximum $H_{\max}$ is achieved when they are all distinct. Therefore,

$$H_{\min} = \frac{m-k}{m}\lg\frac{m}{m-k} + \frac{k}{m}\lg\frac{m}{k}, \quad\text{and}\quad H_{\max} = \frac{m-k}{m}\lg\frac{m}{m-k} + \frac{k}{m}\lg m.$$

(The proof of this lemma follows the code sketch below.)
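Steps 2-7 above translate directly into code. A minimal sketch of the query-time logic (the maintenance of A1, A2 and A3 is assumed to happen elsewhere; all names are illustrative):

import math

def entropy_query(m, F0, exact_H, k_hat, Y, alpha, eps):
    # m: stream length; F0: number of distinct items seen;
    # exact_H: entropy computed exactly from the A2 counters;
    # k_hat: Count-Min estimate of k; Y: final estimator from A1.
    if F0 <= m ** (2 * alpha):
        return exact_H                    # few distinct items: A2 is exact
    if k_hat >= (1 - eps) * m ** (1 - alpha):
        return Y                          # entropy not too small: use A1
    return k_hat * math.log2(m) / m       # fallback estimate via A3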

Proof. Consider a minimum-entropy stream $A_{\min}$ and suppose that, apart from its most frequent item, it has at least two other items with positive count. Without loss of generality, let $m_1 = m - k$ and $m_2, m_3 \ge 1$. Modify $A_{\min}$ to $A'$ by letting $m'_2 = m_2 + m_3$ and $m'_3 = 0$, and keeping all other counts the same. Then

$$H(A') - H(A_{\min}) = \big(\lg m - F_H(A')/m\big) - \big(\lg m - F_H(A_{\min})/m\big) = \big(F_H(A_{\min}) - F_H(A')\big)/m$$
$$= \big(m_2 \lg m_2 + m_3 \lg m_3 - (m_2 + m_3)\lg(m_2 + m_3)\big)/m < 0,$$

since $x \lg x$ is convex and monotone increasing (on integer values), giving us a contradiction. The proof of the maximum-entropy distribution is similar. □

Now, consider equation (6) and note that for any $r$, $|X| \le \lg m$. Thus, if $\mathrm{E}[X] = H \ge 1$, then $\mathrm{Var}[X]/\mathrm{E}[X]^2 \le \mathrm{E}[X^2] \le \lg^2 m$ and our choice of $s_1$ is sufficiently large to give us the desired $(1+\varepsilon)$-approximation, by Lemma 2.1.⁴ On the other hand, if $H < 1$, then $k < m/2$, by a simple argument similar to the proof of Lemma 3.3. Using the expression for $H_{\min}$ from Lemma 3.3, we then have

$$H \ge H_{\min} \ge \frac{k}{m}\lg\frac{m}{k} \ge \frac{k}{m} \ge m^{-\alpha}$$

(the last two steps using $k < m/2$ and $k \ge m^{1-\alpha}$), which gives us $\mathrm{Var}[X]/\mathrm{E}[X]^2 \le \mathrm{E}[X^2]/m^{-2\alpha} \le (\lg^2 m)\, m^{2\alpha}$. Again, plugging this and our choice of $s_1$ into Lemma 2.1 gives us the desired $(1+\varepsilon)$-approximation.

Property 3: By assumption, $k < m^{1-\alpha}$. If $\hat k \ge (1-\varepsilon)\,m^{1-\alpha}$, then $Z = Y$ and the analysis proceeds as for Property 2. Otherwise, $Z = (\hat k \lg m)/m \le (k \lg m)/m$. This time, again by Lemma 3.3, we have

$$H_{\min} \ge \frac{k}{m}\lg\frac{m}{k} \ge \frac{k}{m}\lg(m^{\alpha}) = \alpha\,\frac{k}{m}\lg m,$$

and

$$H_{\max} = \frac{m-k}{m}\lg\frac{m}{m-k} + \frac{k}{m}\lg m = \lg\frac{m}{m-k} + \frac{k}{m}\lg(m-k) \le \frac{k}{m}\lg m + O\!\left(\frac{k}{m}\right),$$

which, for large $m$, implies $H - o(H) \le Z \le H/\alpha$ and gives us Property 3. Note that we did not use the inequality $m^{2\alpha} \le k$ in the proof of this property. □

The ideas involved in the proof of Theorem 3.1 can be used to yield a very efficient two-pass algorithm for estimating $H$, which can be found in [3].

⁴ This observation, that $H \ge 1 \Rightarrow \mathrm{Var}[X] \le \lg^2 m$, proves the statement in the remark following Theorem 3.1.
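As a quick numeric illustration of Lemma 3.3 and the window used in Property 3 (a sketch with illustrative values, not from the paper):

import math

def h_min(m, k):
    # Remaining k items identical (Lemma 3.3).
    return ((m - k) / m) * math.log2(m / (m - k)) + (k / m) * math.log2(m / k)

def h_max(m, k):
    # Remaining k items all distinct (Lemma 3.3).
    return ((m - k) / m) * math.log2(m / (m - k)) + (k / m) * math.log2(m)

m, k = 10**6, 10**3          # here k = m**0.5 < m**(1-alpha) for any alpha < 1/2
z = k * math.log2(m) / m     # the fallback estimate, taking k_hat = k for simplicity
print(h_min(m, k), z, h_max(m, k))
# prints roughly 0.0114, 0.0199, 0.0214: z is sandwiched between
# H_min and H_max, consistent with the (1/alpha)-approximation claim.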

4 Conclusions

Entropy and entropy norms are natural measures with direct applications in IP network traffic analysis, for which one-pass streaming algorithms are needed. We have presented one-pass sublinear space algorithms for approximating the entropy norm as well as the empirical entropy. We have also presented a two-pass algorithm for empirical entropy that has a stronger approximation guarantee and space bound. We believe our algorithms will be of interest in the practice of data stream systems. It will be of interest to study these problems on streams in the presence of inserts and deletes.

Note: Very recently, we have learned of a work in progress [9] that may lead to a one-pass polylogarithmic space algorithm for approximating $H$ to within a $(1+\varepsilon)$-factor.

References

1. N. Alon, Y. Matias and M. Szegedy. The space complexity of approximating the frequency moments. Proc. ACM STOC, 20-29, 1996.
2. B. Babcock, S. Babu, M. Datar, R. Motwani and J. Widom. Models and issues in data stream systems. Proc. ACM PODS, 1-16, 2002.
3. A. Chakrabarti, K. Do Ba and S. Muthukrishnan. Estimating entropy and entropy norm on data streams. DIMACS Technical Report 2005-33.
4. D. Coppersmith and R. Kumar. An improved data stream algorithm for frequency moments. Proc. ACM-SIAM SODA, 151-156, 2004.
5. G. Cormode and S. Muthukrishnan. An improved data stream summary: the count-min sketch and its applications. J. Algorithms, 55(1): 58-75, April 2005.
6. C. Estan and G. Varghese. New directions in traffic measurement and accounting: Focusing on the elephants, ignoring the mice. ACM Trans. Comput. Syst., 21(3): 270-313, 2003.
7. Y. Gu, A. McCallum and D. Towsley. Detecting anomalies in network traffic using maximum entropy estimation. Proc. Internet Measurement Conference, 2005.
8. S. Guha, A. McGregor and S. Venkatasubramanian. Streaming and sublinear approximation of entropy and information distances. Proc. ACM-SIAM SODA, to appear, 2006.
9. P. Indyk. Personal e-mail communication, September 2005.
10. P. Indyk and D. Woodruff. Optimal approximations of the frequency moments of data streams. Proc. ACM STOC, 202-208, 2005.
11. E. Kushilevitz and N. Nisan. Communication Complexity. Cambridge University Press, Cambridge, 1997.
12. R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, New York, 1995.
13. S. Muthukrishnan. Data Streams: Algorithms and Applications. Manuscript, available online at http://www.cs.rutgers.edu/~muthu/stream-1-1.ps
14. A. Wagner and B. Plattner. Entropy based worm and anomaly detection in fast IP networks. Proc. 14th IEEE International Workshops on Enabling Technologies: Infrastructures for Collaborative Enterprises (WET ICE), STCA Security Workshop, Linköping, Sweden, June 2005.
15. K. Xu, Z. Zhang and S. Bhattacharya. Profiling internet backbone traffic: Behavior models and applications. Proc. ACM SIGCOMM, 2005.