Space- Efficient Es.ma.on of Sta.s.cs over Sub- Sampled Streams
|
|
- Edith Peters
- 5 years ago
- Views:
Transcription
1 Space- Efficient Es.ma.on of Sta.s.cs over Sub- Sampled Streams Andrew McGregor University of Massachuse7s A. Pavan Iowa State University Srikanta Tirthapura Iowa State University David Woodruff IBM Research Almaden
2 Sampled IP Packet Streams IP Traffic Monitoring 40Gbps 100s of Gbps Cisco NeOlow standard for monitoring Sampled NeOlow Network Monitor sees only a random sample of the original packet stream Different Types of Sampling Used IETF Working Group (psamp) EsTmaTon from Sub- Sampled Streams 2
3 Sampled Streams What can we compute over a stream by observing only a random sample of the stream? Two Constraints: Only observe a random sample Streaming, Memory Bound ComputaTon EsTmaTon from Sub- Sampled Streams 3
4 Model: Bernoulli Sampling Original Stream P a 1,a 2,...,a n a i {1, 2, 3,...,m} Bernoulli Sampler selects each a i with probability p Sampled Stream L Aggregates and Analysis of Original Stream P EsTmaTon from Sub- Sampled Streams 4
5 Aggregates The stream is a sequence of items What ma7ers is the vector frequency of i f = (a 1,a 2,...,a n ) f 1, f 2,..., f m where f i is Frequency Moments Number of DisTnct Elements Empirical Entropy Heavy Hi7ers H( f ) = F k ( f ) = m i=1 m i=1 f i k f i n lg n f i EsTmaTon from Sub- Sampled Streams 5
6 Preliminaries Let g i be the frequency of item i in the substream g i = B(f i,p) L contains the frequency vector g 1,g 2,...,g m Randomized multplicatve approximaton, for parameters 1 Pr α X X α 1 δ EsTmaTon from Sub- Sampled Streams 6
7 Results Number of DisTnct Elements (Known) Upper Bound on Result Quality Simple Streaming Algorithm that meets the bound Frequency Moments, F k, k > 0 Smaller values of p increase streaming space complexity Matching upper and lower bounds (w.r.t p) Tradeoff between processing Tme and space Entropy Matching Upper and Lower Bounds for AddiTve Error RelaTve Error Impossible in small space Heavy Hi7ers Sampling is a good fit EsTmaTon from Sub- Sampled Streams 7
8 Related Work Duffield, Lund, Thorup, ProperTes and predicton of flow statstcs from sampled packet streams, IMC 2002 Duffield, Lund, and Thorup, EsTmaTng flow distributons from sampled flow statstcs, SIGCOMM 2003 Rusu and Dobra, Sketching sampled data streams ICDE 2009 Bar- Yossef, Sampling Lower Bounds via InformaTon Theory, STOC 2003 EsTmaTon from Sub- Sampled Streams 8
9 Results: Number of DisTnct Elements For constant p, estmate F 0 (P) from observing L Any algorithm must have relatve error Ω 1 (Charikar et al. PODS 2000) p A simple streaming algorithm has relatve error Ο 1 p Other EsTmators, such as GEE (Generalized Error EsTmator) also possible in single pass EsTmaTon from Sub- Sampled Streams 9
10 Frequency Moments F k Theorem (Upper Bound): There is a one pass streaming algorithm which observes L and outputs a (1 +, δ) - estmator to F k (P ) where using space. assuming k 2 Õ 1 p m1 2/k p = Ω(min(m, n) 1/k ) EsTmaTon from Sub- Sampled Streams 10
11 CompuTng F 2 : Why is This Hard? n n sampler F 2 (P ) n + n =2n p n pn F 2 (L) =p 2 n + pn =(p 2 + p)n EsTmaTon from Sub- Sampled Streams 11
12 Algorithm 1 for F 2 Z = F 2(L) F 1 (L) p 2 E[Z] =F 2 (P ) Algorithm F 2 (L) 1. EsTmate using streaming algorithm on L 2. Use in above formula for Z EsTmaTon from Sub- Sampled Streams 12
13 Algorithm 1 for F 2 (contd) To get needed is (1 +, δ) Õ - approximaton for Z, space Issue: Need to estmate F 2 (L) with very high accuracy to get good relatve error for F 2 (L) F 1 (L) p 2 1 p 2 EsTmaTon from Sub- Sampled Streams 13
14 Algorithm 2 for F 2 Collisions. The number of 2- wise collisions in P m fi C 2 (P )= 2 i=1 ObservaTon: F 2 (P )=2C 2 (P )+F 1 (P ) Algorithm: C 2 (P ),F 1 (P ) EsTmate and with mult. error (1 + ) EsTmaTon from Sub- Sampled Streams 14
15 EsTmaTng C 2 (P ) ObservaTon: E[C 2 (L)] = p 2 C 2 (P ) Var(C 2 (L)) = O(p 3 F ) If C 2 (L) estmated accurately, we are done But this is hard in general, in small space EsTmaTon from Sub- Sampled Streams 15
16 EsTmaTng C 2 (L) F 2 (P )=2C 2 (P )+F 1 (P ) Case I: Case 2: C 2 (L) =Ω(p 2 F 2 (P )) C 2 (L) =O(p 2 F 2 (P )) C 2 (L) C 2 (P ) EsTmate with good relatve error hence C 2 (P ) F 2 (P ) Accurate estmate of not needed Get an estmate within a multplicatve error of 3 C 2 (L) EsTmaTon from Sub- Sampled Streams 16
17 EsTmaTng C 2 (L) EsTmate with good relatve error when C 2 (L) =Ω(p 2 F 2 (P )) Technique due to Indyk & Woodruff (STOC 2005) Divide items {1,2..,m} into classes based on frequency EsTmate the size of different classes that contribute to the final result EsTmaTon from Sub- Sampled Streams 17
18 Tight Lower Bound for F k Theorem: Any constant- pass streaming algo. that (1 +, δ) - approximates F k, for a sufficiently small constants δ by observing a sampled stream, in the Bernoulli sampling model, requires Ω m 1 2/k /p bits of space EsTmaTon from Sub- Sampled Streams 18
19 F k Time- Space Tradeoff With sampled stream at probability p Processing Time: Streaming Space: Õ(pn) Õ m 1 2/k p Product: Õ(nm 1 2/k ) EsTmaTon from Sub- Sampled Streams 19
20 Entropy No multplicatve error approximaton possible with probability 9/10, even if p > ½ The entropy of sampled stream could be zero, while that of the original stream non- zero If p = Ω(n 1/3 ) there is an approximaton H to H(f) such that with prob. at least 99/100 H 100H(f) H H(f)/2 o(1) EsTmaTon from Sub- Sampled Streams 20
21 Heavy Hi7ers The frequency of heavy hi7ers are (approximately) proportonately maintained in the sampled stream Precise upper and lower bounds in paper EsTmaTon from Sub- Sampled Streams 21
22 Conclusions Extreme Volume Data Streams F k Upper and Lower Bounds for F k, k > 0 Smooth Space- Time tradeoff Number of distnct Elements, Entropy Sampling harms, streaming does not Heavy Hi7ers: Sampling and Streaming are both ok EsTmaTon from Sub- Sampled Streams 22
Tight Bounds for Distributed Streaming
Tight Bounds for Distributed Streaming (a.k.a., Distributed Functional Monitoring) David Woodruff IBM Research Almaden Qin Zhang MADALGO, Aarhus Univ. STOC 12 May 22, 2012 1-1 The distributed streaming
More informationTight Bounds for Distributed Functional Monitoring
Tight Bounds for Distributed Functional Monitoring Qin Zhang MADALGO, Aarhus University Joint with David Woodruff, IBM Almaden NII Shonan meeting, Japan Jan. 2012 1-1 The distributed streaming model (a.k.a.
More informationDeclaring Independence via the Sketching of Sketches. Until August Hire Me!
Declaring Independence via the Sketching of Sketches Piotr Indyk Andrew McGregor Massachusetts Institute of Technology University of California, San Diego Until August 08 -- Hire Me! The Problem The Problem
More informationAn Optimal Algorithm for l 1 -Heavy Hitters in Insertion Streams and Related Problems
An Optimal Algorithm for l 1 -Heavy Hitters in Insertion Streams and Related Problems Arnab Bhattacharyya, Palash Dey, and David P. Woodruff Indian Institute of Science, Bangalore {arnabb,palash}@csa.iisc.ernet.in
More informationDistributed Statistical Estimation of Matrix Products with Applications
Distributed Statistical Estimation of Matrix Products with Applications David Woodruff CMU Qin Zhang IUB PODS 2018 June, 2018 1-1 The Distributed Computation Model p-norms, heavy-hitters,... A {0, 1} m
More informationBottom-k and Priority Sampling, Set Similarity and Subset Sums with Minimal Independence. Mikkel Thorup University of Copenhagen
Bottom-k and Priority Sampling, Set Similarity and Subset Sums with Minimal Independence Mikkel Thorup University of Copenhagen Min-wise hashing [Broder, 98, Alta Vita] Jaccard similary of sets A and B
More informationMining Data Streams-Estimating Frequency Moment
Mnng Data Streams-Estmatng Frequency Moment Barna Saha October 26, 2017 Frequency Moment Computng moments nvolves dstrbuton of frequences of dfferent elements n the stream. Frequency Moment Computng moments
More informationRange-efficient computation of F 0 over massive data streams
Range-efficient computation of F 0 over massive data streams A. Pavan Dept. of Computer Science Iowa State University pavan@cs.iastate.edu Srikanta Tirthapura Dept. of Elec. and Computer Engg. Iowa State
More informationA Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window
A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch 1 and Srikanta Tirthapura 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY
More informationSketching and Streaming for Distributions
Sketching and Streaming for Distributions Piotr Indyk Andrew McGregor Massachusetts Institute of Technology University of California, San Diego Main Material: Stable distributions, pseudo-random generators,
More informationBig Data. Big data arises in many forms: Common themes:
Big Data Big data arises in many forms: Physical Measurements: from science (physics, astronomy) Medical data: genetic sequences, detailed time series Activity data: GPS location, social network activity
More informationOptimal Random Sampling from Distributed Streams Revisited
Optimal Random Sampling from Distributed Streams Revisited Srikanta Tirthapura 1 and David P. Woodruff 2 1 Dept. of ECE, Iowa State University, Ames, IA, 50011, USA. snt@iastate.edu 2 IBM Almaden Research
More informationLecture 10. Sublinear Time Algorithms (contd) CSC2420 Allan Borodin & Nisarg Shah 1
Lecture 10 Sublinear Time Algorithms (contd) CSC2420 Allan Borodin & Nisarg Shah 1 Recap Sublinear time algorithms Deterministic + exact: binary search Deterministic + inexact: estimating diameter in a
More informationA Near-Optimal Algorithm for Computing the Entropy of a Stream
A Near-Optimal Algorithm for Computing the Entropy of a Stream Amit Chakrabarti ac@cs.dartmouth.edu Graham Cormode graham@research.att.com Andrew McGregor andrewm@seas.upenn.edu Abstract We describe a
More informationThe space complexity of approximating the frequency moments
The space complexity of approximating the frequency moments Felix Biermeier November 24, 2015 1 Overview Introduction Approximations of frequency moments lower bounds 2 Frequency moments Problem Estimate
More information1 Estimating Frequency Moments in Streams
CS 598CSC: Algorithms for Big Data Lecture date: August 28, 2014 Instructor: Chandra Chekuri Scribe: Chandra Chekuri 1 Estimating Frequency Moments in Streams A significant fraction of streaming literature
More informationComputing the Entropy of a Stream
Computing the Entropy of a Stream To appear in SODA 2007 Graham Cormode graham@research.att.com Amit Chakrabarti Dartmouth College Andrew McGregor U. Penn / UCSD Outline Introduction Entropy Upper Bound
More informationCommunication-Efficient Computation on Distributed Noisy Datasets. Qin Zhang Indiana University Bloomington
Communication-Efficient Computation on Distributed Noisy Datasets Qin Zhang Indiana University Bloomington SPAA 15 June 15, 2015 1-1 Model of computation The coordinator model: k sites and 1 coordinator.
More informationCMSC 858F: Algorithmic Lower Bounds: Fun with Hardness Proofs Fall 2014 Introduction to Streaming Algorithms
CMSC 858F: Algorithmic Lower Bounds: Fun with Hardness Proofs Fall 2014 Introduction to Streaming Algorithms Instructor: Mohammad T. Hajiaghayi Scribe: Huijing Gong November 11, 2014 1 Overview In the
More informationB669 Sublinear Algorithms for Big Data
B669 Sublinear Algorithms for Big Data Qin Zhang 1-1 2-1 Part 1: Sublinear in Space The model and challenge The data stream model (Alon, Matias and Szegedy 1996) a n a 2 a 1 RAM CPU Why hard? Cannot store
More informationLeveraging Big Data: Lecture 13
Leveraging Big Data: Lecture 13 http://www.cohenwang.com/edith/bigdataclass2013 Instructors: Edith Cohen Amos Fiat Haim Kaplan Tova Milo What are Linear Sketches? Linear Transformations of the input vector
More informationLecture 2. Frequency problems
1 / 43 Lecture 2. Frequency problems Ricard Gavaldà MIRI Seminar on Data Streams, Spring 2015 Contents 2 / 43 1 Frequency problems in data streams 2 Approximating inner product 3 Computing frequency moments
More informationLecture 3 Frequency Moments, Heavy Hitters
COMS E6998-9: Algorithmic Techniques for Massive Data Sep 15, 2015 Lecture 3 Frequency Moments, Heavy Hitters Instructor: Alex Andoni Scribes: Daniel Alabi, Wangda Zhang 1 Introduction This lecture is
More informationAn Optimal Algorithm for Large Frequency Moments Using O(n 1 2/k ) Bits
An Optimal Algorithm for Large Frequency Moments Using O(n 1 2/k ) Bits Vladimir Braverman 1, Jonathan Katzman 2, Charles Seidell 3, and Gregory Vorsanger 4 1 Johns Hopkins University, Department of Computer
More informationSpace efficient tracking of persistent items in a massive data stream
Electrical and Computer Engineering Publications Electrical and Computer Engineering 2014 Space efficient tracking of persistent items in a massive data stream Impetus Technologies Srikanta Tirthapura
More informationEfficient Sketches for Earth-Mover Distance, with Applications
Efficient Sketches for Earth-Mover Distance, with Applications Alexandr Andoni MIT andoni@mit.edu Khanh Do Ba MIT doba@mit.edu Piotr Indyk MIT indyk@mit.edu David Woodruff IBM Almaden dpwoodru@us.ibm.com
More informationStochastic Streams: Sample Complexity vs. Space Complexity
Stochastic Streams: Sample Complexity vs. Space Complexity Michael Crouch 1, Andrew McGregor 2, Gregory Valiant 3, and David P. Woodruff 4 1 Bell Labs Ireland 2 University of Massachusetts Amherst 3 Stanford
More informationSpace-efficient Tracking of Persistent Items in a Massive Data Stream
Space-efficient Tracking of Persistent Items in a Massive Data Stream Bibudh Lahiri Dept. of ECE Iowa State University Ames, IA, USA 50011 bibudh@iastate.edu* Jaideep Chandrashekar Intel Labs Berkeley
More informationZero-One Laws for Sliding Windows and Universal Sketches
Zero-One Laws for Sliding Windows and Universal Sketches Vladimir Braverman 1, Rafail Ostrovsky 2, and Alan Roytman 3 1 Department of Computer Science, Johns Hopkins University, USA vova@cs.jhu.edu 2 Department
More informationData Streams & Communication Complexity
Data Streams & Communication Complexity Lecture 1: Simple Stream Statistics in Small Space Andrew McGregor, UMass Amherst 1/25 Data Stream Model Stream: m elements from universe of size n, e.g., x 1, x
More informationAdaptive Matrix Vector Product
Adaptive Matrix Vector Product Santosh S. Vempala David P. Woodruff November 6, 2016 Abstract We consider the following streaming problem: given a hardwired m n matrix A together with a poly(mn)-bit hardwired
More informationAlgorithms for Querying Noisy Distributed/Streaming Datasets
Algorithms for Querying Noisy Distributed/Streaming Datasets Qin Zhang Indiana University Bloomington Sublinear Algo Workshop @ JHU Jan 9, 2016 1-1 The big data models The streaming model (Alon, Matias
More information25.2 Last Time: Matrix Multiplication in Streaming Model
EE 381V: Large Scale Learning Fall 01 Lecture 5 April 18 Lecturer: Caramanis & Sanghavi Scribe: Kai-Yang Chiang 5.1 Review of Streaming Model Streaming model is a new model for presenting massive data.
More information1-Pass Relative-Error L p -Sampling with Applications
-Pass Relative-Error L p -Sampling with Applications Morteza Monemizadeh David P Woodruff Abstract For any p [0, 2], we give a -pass poly(ε log n)-space algorithm which, given a data stream of length m
More informationLower Bound Techniques for Multiparty Communication Complexity
Lower Bound Techniques for Multiparty Communication Complexity Qin Zhang Indiana University Bloomington Based on works with Jeff Phillips, Elad Verbin and David Woodruff 1-1 The multiparty number-in-hand
More informationMaximum Likelihood Estimation of the Flow Size Distribution Tail Index from Sampled Packet Data
Maximum Likelihood Estimation of the Flow Size Distribution Tail Index from Sampled Packet Data Patrick Loiseau 1, Paulo Gonçalves 1, Stéphane Girard 2, Florence Forbes 2, Pascale Vicat-Blanc Primet 1
More informationTabulation-Based Hashing
Mihai Pǎtraşcu Mikkel Thorup April 23, 2010 Applications of Hashing Hash tables: chaining x a t v f s r Applications of Hashing Hash tables: chaining linear probing m c a f t x y t Applications of Hashing
More informationFinding Frequent Items in Probabilistic Data
Finding Frequent Items in Probabilistic Data Qin Zhang, Hong Kong University of Science & Technology Feifei Li, Florida State University Ke Yi, Hong Kong University of Science & Technology SIGMOD 2008
More informationSparser Johnson-Lindenstrauss Transforms
Sparser Johnson-Lindenstrauss Transforms Jelani Nelson Princeton February 16, 212 joint work with Daniel Kane (Stanford) Random Projections x R d, d huge store y = Sx, where S is a k d matrix (compression)
More informationEstimating Dominance Norms of Multiple Data Streams
Estimating Dominance Norms of Multiple Data Streams Graham Cormode and S. Muthukrishnan Abstract. There is much focus in the algorithms and database communities on designing tools to manage and mine data
More informationTesting k-wise Independence over Streaming Data
Testing k-wise Independence over Streaming Data Kai-Min Chung Zhenming Liu Michael Mitzenmacher Abstract Following on the work of Indyk and McGregor [5], we consider the problem of identifying correlations
More informationCharging from Sampled Network Usage
Charging from Sampled Network Usage Nick Duffield Carsten Lund Mikkel Thorup AT&T Labs-Research, Florham Park, NJ 1 Do Charging and Sampling Mix? Usage sensitive charging charge based on sampled network
More informationFully Understanding the Hashing Trick
Fully Understanding the Hashing Trick Lior Kamma, Aarhus University Joint work with Casper Freksen and Kasper Green Larsen. Recommendation and Classification PG-13 Comic Book Super Hero Sci Fi Adventure
More informationClassic Network Measurements meets Software Defined Networking
Classic Network s meets Software Defined Networking Ran Ben Basat, Technion, Israel Joint work with Gil Einziger and Erez Waisbard (Nokia Bell Labs) Roy Friedman (Technion) and Marcello Luzieli (UFGRS)
More informationNear-Optimal Lower Bounds on the Multi-Party Communication Complexity of Set Disjointness
Near-Optimal Lower Bounds on the Multi-Party Communication Complexity of Set Disjointness Amit Chakrabarti School of Mathematics Institute for Advanced Study Princeton, NJ 08540 amitc@ias.edu Subhash Khot
More informationUniversal Sketches for the Frequency Negative Moments and Other Decreasing Streaming Sums
Universal Sketches for the Frequency Negative Moments and Other Decreasing Streaming Sums Vladimir Braverman 1 and Stephen R. Chestnut 2 1 Department of Computer Science, Johns Hopkins University 3400
More informationTowards a Universal Sketch for Origin-Destination Network Measurements
Towards a Universal Sketch for Origin-Destination Network Measurements Haiquan (Chuck) Zhao 1, Nan Hua 1, Ashwin Lall 2, Ping Li 3, Jia Wang 4, and Jun (Jim) Xu 5 1 Georgia Tech {chz,nanhua,jx}@cc.gatech.edu,
More informationA Tight Lower Bound for High Frequency Moment Estimation with Small Error
A Tight Lower Bound for High Frequency Moment Estimation with Small Error Yi Li Department of EECS University of Michigan, Ann Arbor leeyi@umich.edu David P. Woodruff IBM Research, Almaden dpwoodru@us.ibm.com
More informationCSC2420 Fall 2012: Algorithm Design, Analysis and Theory Lecture 9
CSC2420 Fall 2012: Algorithm Design, Analysis and Theory Lecture 9 Allan Borodin March 13, 2016 1 / 33 Lecture 9 Announcements 1 I have the assignments graded by Lalla. 2 I have now posted five questions
More informationSparse Johnson-Lindenstrauss Transforms
Sparse Johnson-Lindenstrauss Transforms Jelani Nelson MIT May 24, 211 joint work with Daniel Kane (Harvard) Metric Johnson-Lindenstrauss lemma Metric JL (MJL) Lemma, 1984 Every set of n points in Euclidean
More informationSparse Fourier Transform (lecture 1)
1 / 73 Sparse Fourier Transform (lecture 1) Michael Kapralov 1 1 IBM Watson EPFL St. Petersburg CS Club November 2015 2 / 73 Given x C n, compute the Discrete Fourier Transform (DFT) of x: x i = 1 x n
More informationLinear Index Codes are Optimal Up To Five Nodes
Linear Index Codes are Optimal Up To Five Nodes Lawrence Ong The University of Newcastle, Australia March 05 BIRS Workshop: Between Shannon and Hamming: Network Information Theory and Combinatorics Unicast
More informationCS 473: Algorithms. Ruta Mehta. Spring University of Illinois, Urbana-Champaign. Ruta (UIUC) CS473 1 Spring / 32
CS 473: Algorithms Ruta Mehta University of Illinois, Urbana-Champaign Spring 2018 Ruta (UIUC) CS473 1 Spring 2018 1 / 32 CS 473: Algorithms, Spring 2018 Universal Hashing Lecture 10 Feb 15, 2018 Most
More informationSketching Sampled Data Streams
Sketchng Sampled Data Streams Florn Rusu and Aln Dobra CISE Department Unversty of Florda March 31, 2009 Motvaton & Goal Motvaton Multcore processors How to use all the processng power? Parallel algorthms
More informationLower Bounds for Quantile Estimation in Random-Order and Multi-Pass Streaming
Lower Bounds for Quantile Estimation in Random-Order and Multi-Pass Streaming Sudipto Guha 1 and Andrew McGregor 2 1 University of Pennsylvania sudipto@cis.upenn.edu 2 University of California, San Diego
More informationCSCI8980 Algorithmic Techniques for Big Data September 12, Lecture 2
CSCI8980 Algorithmic Techniques for Big Data September, 03 Dr. Barna Saha Lecture Scribe: Matt Nohelty Overview We continue our discussion on data streaming models where streams of elements are coming
More informationThe Communication Complexity of Correlation. Prahladh Harsha Rahul Jain David McAllester Jaikumar Radhakrishnan
The Communication Complexity of Correlation Prahladh Harsha Rahul Jain David McAllester Jaikumar Radhakrishnan Transmitting Correlated Variables (X, Y) pair of correlated random variables Transmitting
More informationData Stream Methods. Graham Cormode S. Muthukrishnan
Data Stream Methods Graham Cormode graham@dimacs.rutgers.edu S. Muthukrishnan muthu@cs.rutgers.edu Plan of attack Frequent Items / Heavy Hitters Counting Distinct Elements Clustering items in Streams Motivating
More informationLecture Lecture 9 October 1, 2015
CS 229r: Algorithms for Big Data Fall 2015 Lecture Lecture 9 October 1, 2015 Prof. Jelani Nelson Scribe: Rachit Singh 1 Overview In the last lecture we covered the distance to monotonicity (DTM) and longest
More informationNew Characterizations in Turnstile Streams with Applications
New Characterizations in Turnstile Streams with Applications Yuqing Ai 1, Wei Hu 1, Yi Li 2, and David P. Woodruff 3 1 The Institute for Theoretical Computer Science (ITCS), Institute for Interdisciplinary
More informationPing Li Compressed Counting, Data Streams March 2009 DIMACS Compressed Counting. Ping Li
Ping Li Compressed Counting, Data Streams March 2009 DIMACS 2009 1 Compressed Counting Ping Li Department of Statistical Science Faculty of Computing and Information Science Cornell University Ithaca,
More informationLecture 4: Hashing and Streaming Algorithms
CSE 521: Design and Analysis of Algorithms I Winter 2017 Lecture 4: Hashing and Streaming Algorithms Lecturer: Shayan Oveis Gharan 01/18/2017 Scribe: Yuqing Ai Disclaimer: These notes have not been subjected
More informationWavelet decomposition of data streams. by Dragana Veljkovic
Wavelet decomposition of data streams by Dragana Veljkovic Motivation Continuous data streams arise naturally in: telecommunication and internet traffic retail and banking transactions web server log records
More informationSome notes on streaming algorithms continued
U.C. Berkeley CS170: Algorithms Handout LN-11-9 Christos Papadimitriou & Luca Trevisan November 9, 016 Some notes on streaming algorithms continued Today we complete our quick review of streaming algorithms.
More informationA Tight Lower Bound for Dynamic Membership in the External Memory Model
A Tight Lower Bound for Dynamic Membership in the External Memory Model Elad Verbin ITCS, Tsinghua University Qin Zhang Hong Kong University of Science & Technology April 2010 1-1 The computational model
More informationOptimal Combination of Sampled Network Measurements
Optimal Combination of Sampled Network Measurements Nick Duffield Carsten Lund Mikkel Thorup AT&T Labs Research, 8 Park Avenue, Florham Park, New Jersey, 7932, USA {duffield,lund,mthorup}research.att.com
More informationModeling Residual-Geometric Flow Sampling
Modeling Residual-Geometric Flow Sampling Xiaoming Wang Joint work with Xiaoyong Li and Dmitri Loguinov Amazon.com Inc., Seattle, WA April 13 th, 2011 1 Agenda Introduction Underlying model of residual
More informationLower Bounds for Dynamic Connectivity (2004; Pǎtraşcu, Demaine)
Lower Bounds for Dynamic Connectivity (2004; Pǎtraşcu, Demaine) Mihai Pǎtraşcu, MIT, web.mit.edu/ mip/www/ Index terms: partial-sums problem, prefix sums, dynamic lower bounds Synonyms: dynamic trees 1
More informationEstimating Entropy and Entropy Norm on Data Streams
Estiating Entropy and Entropy Nor on Data Streas Ait Chakrabarti 1, Khanh Do Ba 1, and S. Muthukrishnan 2 1 Departent of Coputer Science, Dartouth College, Hanover, NH 03755, USA 2 Departent of Coputer
More informationTitle: Count-Min Sketch Name: Graham Cormode 1 Affil./Addr. Department of Computer Science, University of Warwick,
Title: Count-Min Sketch Name: Graham Cormode 1 Affil./Addr. Department of Computer Science, University of Warwick, Coventry, UK Keywords: streaming algorithms; frequent items; approximate counting, sketch
More informationFast Moment Estimation in Data Streams in Optimal Space
Fast Moment Estimation in Data Streams in Optimal Space Daniel M. Kane Jelani Nelson Ely Porat David P. Woodruff Abstract We give a space-optimal algorithm with update time O(log 2 (1/ε) log log(1/ε))
More informationOn the tightness of the Buhrman-Cleve-Wigderson simulation
On the tightness of the Buhrman-Cleve-Wigderson simulation Shengyu Zhang Department of Computer Science and Engineering, The Chinese University of Hong Kong. syzhang@cse.cuhk.edu.hk Abstract. Buhrman,
More information15-451/651: Design & Analysis of Algorithms September 13, 2018 Lecture #6: Streaming Algorithms last changed: August 30, 2018
15-451/651: Design & Analysis of Algorithms September 13, 2018 Lecture #6: Streaming Algorithms last changed: August 30, 2018 Today we ll talk about a topic that is both very old (as far as computer science
More informationTurnstile Streaming Algorithms Might as Well Be Linear Sketches
Turnstile Streaming Algorithms Might as Well Be Linear Sketches Yi Li Max-Planck Institute for Informatics yli@mpi-inf.mpg.de David P. Woodruff IBM Research, Almaden dpwoodru@us.ibm.com Huy L. Nguy ên
More informationSolutions to COMP9334 Week 8 Sample Problems
Solutions to COMP9334 Week 8 Sample Problems Problem 1: Customers arrive at a grocery store s checkout counter according to a Poisson process with rate 1 per minute. Each customer carries a number of items
More informationTests of Single Linear Coefficient Restrictions: t-tests and F-tests. 1. Basic Rules. 2. Testing Single Linear Coefficient Restrictions
ECONOMICS 35* -- NOTE ECON 35* -- NOTE Tests of Sngle Lnear Coeffcent Restrctons: t-tests and -tests Basc Rules Tests of a sngle lnear coeffcent restrcton can be performed usng ether a two-taled t-test
More informationHierarchical Sampling from Sketches: Estimating Functions over Data Streams
Hierarchical Sampling from Sketches: Estimating Functions over Data Streams Sumit Ganguly 1 and Lakshminath Bhuvanagiri 2 1 Indian Institute of Technology, Kanpur 2 Google Inc., Bangalore Abstract. We
More informationarxiv:quant-ph/ v1 29 May 2003
Quantum Lower Bounds for Collision and Element Distinctness with Small Range arxiv:quant-ph/0305179v1 29 May 2003 Andris Ambainis Abstract We give a general method for proving quantum lower bounds for
More informationIntroduction to Distributed Data Streams Graham Cormode
Introduction to Distributed Data Streams Graham Cormode graham@research.att.com Data is Massive Data is growing faster than our ability to store or index it There are 3 Billion Telephone Calls in US each
More informationImproved Algorithms for Distributed Entropy Monitoring
Improved Algorithms for Distributed Entropy Monitoring Jiecao Chen Indiana University Bloomington jiecchen@umail.iu.edu Qin Zhang Indiana University Bloomington qzhangcs@indiana.edu Abstract Modern data
More informationThe Magic of Random Sampling: From Surveys to Big Data
The Magic of Random Sampling: From Surveys to Big Data Edith Cohen Google Research Tel Aviv University Disclaimer: Random sampling is classic and well studied tool with enormous impact across disciplines.
More informationData stream algorithms via expander graphs
Data stream algorithms via expander graphs Sumit Ganguly sganguly@cse.iitk.ac.in IIT Kanpur, India Abstract. We present a simple way of designing deterministic algorithms for problems in the data stream
More informationFast Dimension Reduction
Fast Dimension Reduction MMDS 2008 Nir Ailon Google Research NY Fast Dimension Reduction Using Rademacher Series on Dual BCH Codes (with Edo Liberty) The Fast Johnson Lindenstrauss Transform (with Bernard
More informationOn the Optimality of the Dimensionality Reduction Method
On the Optimality of the Dimensionality Reduction Method Alexandr Andoni MIT andoni@mit.edu Piotr Indyk MIT indyk@mit.edu Mihai Pǎtraşcu MIT mip@mit.edu Abstract We investigate the optimality of (1+)-approximation
More informationRandom Sampling on Big Data: Techniques and Applications Ke Yi
: Techniques and Applications Ke Yi Hong Kong University of Science and Technology yike@ust.hk Big Data in one slide The 3 V s: Volume Velocity Variety Integers, real numbers Points in a multi-dimensional
More informationU.C. Berkeley CS294: Beyond Worst-Case Analysis Handout 3 Luca Trevisan August 31, 2017
U.C. Berkeley CS294: Beyond Worst-Case Analysis Handout 3 Luca Trevisan August 3, 207 Scribed by Keyhan Vakil Lecture 3 In which we complete the study of Independent Set and Max Cut in G n,p random graphs.
More information6.854 Advanced Algorithms
6.854 Advanced Algorithms Homework Solutions Hashing Bashing. Solution:. O(log U ) for the first level and for each of the O(n) second level functions, giving a total of O(n log U ) 2. Suppose we are using
More informationAn Optimal Algorithm for l 1 -Heavy Hitters in Insertion Streams and Related Problems
An Optimal Algorithm for l 1 -Heavy Hitters in Insertion Streams and Related Problems Arnab Bhattacharyya Indian Institute of Science, Bangalore arnabb@csa.iisc.ernet.in Palash Dey Indian Institute of
More informationMaintaining Significant Stream Statistics over Sliding Windows
Maintaining Significant Stream Statistics over Sliding Windows L.K. Lee H.F. Ting Abstract In this paper, we introduce the Significant One Counting problem. Let ε and θ be respectively some user-specified
More informationThe quantum complexity of approximating the frequency moments
The quantum complexity of approximating the frequency moments Ashley Montanaro May 1, 2015 Abstract The th frequency moment of a sequence of integers is defined as F = j n j, where n j is the number of
More informationOptimal Bounds for Johnson-Lindenstrauss Transforms and Streaming Problems with Subconstant Error
Optimal Bounds for Johnson-Lindenstrauss Transforms and Streaming Problems with Subconstant Error T. S. JAYRAM and DAVID P. WOODRUFF, IBM Almaden The Johnson-Lindenstrauss transform is a dimensionality
More informationFacility Location in Sublinear Time
Facility Location in Sublinear Time Mihai Bădoiu 1, Artur Czumaj 2,, Piotr Indyk 1, and Christian Sohler 3, 1 MIT Computer Science and Artificial Intelligence Laboratory, Stata Center, Cambridge, Massachusetts
More informationSuccessive Derivatives and Integer Sequences
2 3 47 6 23 Journal of Integer Sequences, Vol 4 (20, Article 73 Successive Derivatives and Integer Sequences Rafael Jaimczu División Matemática Universidad Nacional de Luján Buenos Aires Argentina jaimczu@mailunlueduar
More informationTime-Decaying Aggregates in Out-of-order Streams
Time-Decaying Aggregates in Out-of-order Streams Graham Cormode AT&T Labs Research graham@research.att.com Flip Korn AT&T Labs Research flip@research.att.com Srikanta Tirthapura Iowa State University snt@iastate.edu
More informationarxiv: v1 [cs.ds] 24 May 2010
arxiv:1005.4344v1 [cs.ds] 24 May 2010 Max stable sketches: estimation of l α norms, dominance norms and point queries for non negative signals Stilian A. Stoev Department of Statistics University of Michigan,
More informationTIGHT BOUNDS FOR DATA STREAM ALGORITHMS AND COMMUNICATION PROBLEMS
TIGHT BOUNDS FOR DATA STREAM ALGORITHMS AND COMMUNICATION PROBLEMS by Mert Sağlam B.A., Sabanci University, 2008 a Thesis submitted in partial fulfillment of the requirements for the degree of Master of
More informationGraph & Geometry Problems in Data Streams
Graph & Geometry Problems in Data Streams 2009 Barbados Workshop on Computational Complexity Andrew McGregor Introduction Models: Graph Streams: Stream of edges E = {e 1, e 2,..., e m } describe a graph
More informationLecture 6 September 13, 2016
CS 395T: Sublinear Algorithms Fall 206 Prof. Eric Price Lecture 6 September 3, 206 Scribe: Shanshan Wu, Yitao Chen Overview Recap of last lecture. We talked about Johnson-Lindenstrauss (JL) lemma [JL84]
More informationLecture 1: Introduction to Sublinear Algorithms
CSE 522: Sublinear (and Streaming) Algorithms Spring 2014 Lecture 1: Introduction to Sublinear Algorithms March 31, 2014 Lecturer: Paul Beame Scribe: Paul Beame Too much data, too little time, space for
More informationCS47300: Web Information Search and Management
CS47300: Web Informaton Search and Management Probablstc Retreval Models Prof. Chrs Clfton 7 September 2018 Materal adapted from course created by Dr. Luo S, now leadng Albaba research group 14 Why probabltes
More information