Lecture 3 Frequency Moments, Heavy Hitters

Size: px
Start display at page:

Download "Lecture 3 Frequency Moments, Heavy Hitters"

Transcription

1 COMS E6998-9: Algorithmic Techniques for Massive Data Sep 15, 2015 Lecture 3 Frequency Moments, Heavy Hitters Instructor: Alex Andoni Scribes: Daniel Alabi, Wangda Zhang 1 Introduction This lecture is about the second frequency moment and heavy hitters. First, e present the Tug-of-War algorithm by Alon-Matias-Szegedy to obtain a 1 + ɛ approximation using O( 1 ɛ log n) space. Second, e define heavy hitters and present the CountMin algorithm hich can be used to obtain heavy hitters. 2 Second Frequency Moment Assume as in previous lectures that e are given a stream of length m from hich e ant to obtain a frequency vector f = (f 1, f 2,..., f n ) (n distinct elements) here f i is the frequency of the ith distinct element in the stream, then the kth frequency moment of the stream is F k = fi k (1) In Lectures 1 and 2, e estimated F 0 and F 1 corresponding to the number of distinct elements and the length of the stream respectively. No e ish to obtain the second moment: F 2 = fi 2 (2) The idea is to use i.i.d random variables r i (= r(i)) here P (r i = 1) = P (r i = 1) = 1 2 (Rademacher random variables) to get an estimator. Thus, E[r i ] = 0 and Var[r i ] = 1 (since r 2 i = 1 and Var[r i] = E[r 2 i ] (E[r i]) 2 ). We can think of r i as a random variable defined for every distinct element so that r : [n] { 1, +1} (3) As an example, let s consider hen the entries of the frequency vector is all 1s ( m = n). Let z = r i (4) Then E[z] = E[r i ] = 0 by linearity of expectation and using the distribution of r i. Similarly, Var[z] = Var[ r i ] = m. We can apply Chebyshev to obtain that z 0 = z O( m) ith constant probability. Turns out that this bound is tight. 1

2 No, let s consider the more general case and define z r i f i (5) Claim 1. E[z 2 ] = f 2 i = F 2 Proof. E[z 2 ] = E[( r i f i ) 2 ] (6) = E[ri 2 ]fi 2 + E[r i ]E[r j ]f i f j (7) i i j f 2 i E[r 2 i ] (8) f 2 i = F 2 (9) (7) holds because for i j, r i and r j are independent, hile (8) holds because E[r i ] = 0 for all i. Finally, (9) holds because r 2 i = 1( E[r2 i ] = 1). Claim 2. Var[z 2 ] O(1) F 2 2 Proof. First, let s compute E[z 4 ] E[z 4 ] = E[( r i f i ) 4 ] (10) i = E[ r i f i r r k f k r l f l ] (11) i,j,k,l = f i f j f k f l E[r i r j r k r l ] (12) i,j,k,l f 4 i + 6 i<j f 2 i f 2 j (13) O(1) ( i f 2 i ) 2 (14) (13) holds because: Of independence of the random variables The terms ith odd poers of r i evaluate to zero There are ( 4 2) = 6 terms of the form r 2 i r 2 j Finally, e obtain that Var[z 2 ] E[z 4 ] O(1) F 2 2 Having bounded E[z 2 ] and Var[z 2 ], e present the algorithm belo 2

3 2.1 Tug-of-War (Alon-Matias-Szegedy 1996) We maintain a counter z. 1. Initialize z = 0 2. When e see element i: z = z + r(i) 3. Return the estimator z 2 Earlier, e defined r i in the form r : [n] { 1, +1} here r i acts like a hash function. The hash function for the r i s should be 4-ise independent. Next, e apply the average trick using k = O( 1 ɛ 2 ) parallel runs of the algorithm to obtain a 1 + ɛ approximation in O( 1 ɛ 2 log n) space. 2.2 Linearity So far e have only considered simple estimators, and e next consider something more complex. Suppose e have to parts of a stream seen by to different estimators, and e ant to estimate the union of these to parts (i.e., ho e can combine them). Claim 3. Linearity: given estimates z for f and z for f, e can combine them as z = z + z for f = f + f. Proof. Since z = r i f i and z = r i f i, e have z + z = r i (f i + f i ). Note that e need to use the same randomness for to estimates. Similarly, e can use (z z ) 2 to estimate (f i f i )2. Hoever, e cannot use linearity in a similar ay for f i f i, and ill discuss this pointer later in class. 2.3 General Streaming Model We no consider a more generalized model: at each moment, e have an update (i, δ i ) to increase the i-th entry by δ i. (δ i may be negative) A linear algorithm S handles this easily, S(f + e i δ i ) = S(f) + S(e i δ i ). We call S a sketch. According to [Nguyen-Li-Woodruff 14], any algorithm for general streaming might as ell be linear. 3 Heavy Hitters No that e are able to compute many types of frequency counts, e onder if e can also compute the max frequency in a stream. It turns out that e cannot: it is impossible to approximate the max frequency in sublinear space. Therefore, e ill solve a more modest problem here e ant to detect the max-frequency element if it is very heavy. We ill sho that e can find these heavy hitters in space O(1/φ). Definition 4. i is φ-heavy if f i φ. 3

4 The basic idea is still to use hash functions. A first-attempt method uses a single hash function h mapping from [n] to [] randomly, here = O(1/φ). Then each element i goes to bucket i, and e sum up the frequencies in each of the buckets. We denote the sum of each bucket as S. So the estimator for f i is ˆf = S(h(i)). For example, consider a stream of 2, 5, 7, 5, 5. If the hash function h 1 orks as h 1 (2) = 2, h 1 (5) = 1, h 1 (7) = 2, then e ill obtain the folloing estimates: ˆf2 = 2, ˆf 5 = 3, ˆf 7 = 2. Hoever, for an element that never appears in the stream, e.g. 11, this method also estimates its frequency as f ˆ 11 = 2, assuming h 1 (11) = 2. Claim 5. S(h(i)) is a biased estimator. Proof. Analyzing this estimator, e have ˆf i = S(h(i)) = f i + {j:h(j)=h(i)} f j. Let C = {j:h(j)=h(i)} f j. Thus, E[C] = P r[h(j) = h(i)] f j = f j 0 j j i Hoever, it is easy to see that E[C] f i /. By Markov Inequality, e have Cle 10 ( P r[ ˆf i f i < O. So the bias is at most /, hich is small for ith at least 90% probability, i.e. ) ] 0.9 ( ) For = O 1 and f i φ, e have C ɛf i. That is, ˆf i is a (1 + ɛ) approximation (ith 90% probability). Still, there are to issues ith this estimator: (1) only constant probability; (2) overestimate for many indices (10%). Fundamentally, there is a conflict beteen avoiding many collisions and reducing space used by the hash table. 3.1 CountMin This motivates us to use the median trick. We can use L = O(log n) hash tables ith hash functions h j. The CountMin algorithm orks as follos: Initialize (r, L): array S[L][] L hash functions h 1,..., h L, into {1,..., } Process ( int i): for (j = 0; j < L; ++j) S[j][h j (i)] += 1 Estimator : foreach i in PossibleIP : ˆf i = median j (S[j][h j (i)] 4

5 Claim 6. The median is a ± estimator ith (1 1/n 2 ) probability. Proof. For an index i, each ro (out of the L ros) gives ˆf i = f i ± ith 90% probability. By Median Trick (see Lecture 2), the median gives an estimator of ± ith (1 1/n 2 ) probability. Alternatively, e can take the min, instead of median, since all counts are overestimated. 3.2 Output Heavy Hitters We can no identify the heavy hitters by iterating over all i s and outputing those ith particular, for true frequencies f i s, if if f i f i if φ(1 ɛ) φ(1 ɛ), then i is not in output; φ(1 + ɛ), then i is reported as a heavy hitter; f i φ(1 + ɛ), then i may or may not be reported as a heavy hitter. ˆf i φ. In If e really care about those elements in beteen, then e could take more space to further narro the gap. ( ) The space used is O log 2 n bits. Since e iterate over all i s, the time complexity is Ω(n). We can improve this time complexity, at the cost of increasing space to O ( log 3 n ) bits. The idea is to use dyadic intervals. For each level, e maintain its on sketch, and find the heavy hitters by folloing don the subtrees of heavy hitters in intermediary. 5

Lecture 6 September 13, 2016

Lecture 6 September 13, 2016 CS 395T: Sublinear Algorithms Fall 206 Prof. Eric Price Lecture 6 September 3, 206 Scribe: Shanshan Wu, Yitao Chen Overview Recap of last lecture. We talked about Johnson-Lindenstrauss (JL) lemma [JL84]

More information

CS 598CSC: Algorithms for Big Data Lecture date: Sept 11, 2014

CS 598CSC: Algorithms for Big Data Lecture date: Sept 11, 2014 CS 598CSC: Algorithms for Big Data Lecture date: Sept 11, 2014 Instructor: Chandra Cheuri Scribe: Chandra Cheuri The Misra-Greis deterministic counting guarantees that all items with frequency > F 1 /

More information

Lecture 3 Sept. 4, 2014

Lecture 3 Sept. 4, 2014 CS 395T: Sublinear Algorithms Fall 2014 Prof. Eric Price Lecture 3 Sept. 4, 2014 Scribe: Zhao Song In today s lecture, we will discuss the following problems: 1. Distinct elements 2. Turnstile model 3.

More information

Lecture 10. Sublinear Time Algorithms (contd) CSC2420 Allan Borodin & Nisarg Shah 1

Lecture 10. Sublinear Time Algorithms (contd) CSC2420 Allan Borodin & Nisarg Shah 1 Lecture 10 Sublinear Time Algorithms (contd) CSC2420 Allan Borodin & Nisarg Shah 1 Recap Sublinear time algorithms Deterministic + exact: binary search Deterministic + inexact: estimating diameter in a

More information

Lecture 2. Frequency problems

Lecture 2. Frequency problems 1 / 43 Lecture 2. Frequency problems Ricard Gavaldà MIRI Seminar on Data Streams, Spring 2015 Contents 2 / 43 1 Frequency problems in data streams 2 Approximating inner product 3 Computing frequency moments

More information

Lecture 4: Hashing and Streaming Algorithms

Lecture 4: Hashing and Streaming Algorithms CSE 521: Design and Analysis of Algorithms I Winter 2017 Lecture 4: Hashing and Streaming Algorithms Lecturer: Shayan Oveis Gharan 01/18/2017 Scribe: Yuqing Ai Disclaimer: These notes have not been subjected

More information

Data Streams & Communication Complexity

Data Streams & Communication Complexity Data Streams & Communication Complexity Lecture 1: Simple Stream Statistics in Small Space Andrew McGregor, UMass Amherst 1/25 Data Stream Model Stream: m elements from universe of size n, e.g., x 1, x

More information

Block Heavy Hitters Alexandr Andoni, Khanh Do Ba, and Piotr Indyk

Block Heavy Hitters Alexandr Andoni, Khanh Do Ba, and Piotr Indyk Computer Science and Artificial Intelligence Laboratory Technical Report -CSAIL-TR-2008-024 May 2, 2008 Block Heavy Hitters Alexandr Andoni, Khanh Do Ba, and Piotr Indyk massachusetts institute of technology,

More information

15-451/651: Design & Analysis of Algorithms September 13, 2018 Lecture #6: Streaming Algorithms last changed: August 30, 2018

15-451/651: Design & Analysis of Algorithms September 13, 2018 Lecture #6: Streaming Algorithms last changed: August 30, 2018 15-451/651: Design & Analysis of Algorithms September 13, 2018 Lecture #6: Streaming Algorithms last changed: August 30, 2018 Today we ll talk about a topic that is both very old (as far as computer science

More information

Lecture 2: Streaming Algorithms

Lecture 2: Streaming Algorithms CS369G: Algorithmic Techniques for Big Data Spring 2015-2016 Lecture 2: Streaming Algorithms Prof. Moses Chariar Scribes: Stephen Mussmann 1 Overview In this lecture, we first derive a concentration inequality

More information

Lecture 2 Sept. 8, 2015

Lecture 2 Sept. 8, 2015 CS 9r: Algorithms for Big Data Fall 5 Prof. Jelani Nelson Lecture Sept. 8, 5 Scribe: Jeffrey Ling Probability Recap Chebyshev: P ( X EX > λ) < V ar[x] λ Chernoff: For X,..., X n independent in [, ],

More information

An Optimal Algorithm for l 1 -Heavy Hitters in Insertion Streams and Related Problems

An Optimal Algorithm for l 1 -Heavy Hitters in Insertion Streams and Related Problems An Optimal Algorithm for l 1 -Heavy Hitters in Insertion Streams and Related Problems Arnab Bhattacharyya, Palash Dey, and David P. Woodruff Indian Institute of Science, Bangalore {arnabb,palash}@csa.iisc.ernet.in

More information

Randomized Smoothing Networks

Randomized Smoothing Networks Randomized Smoothing Netorks Maurice Herlihy Computer Science Dept., Bron University, Providence, RI, USA Srikanta Tirthapura Dept. of Electrical and Computer Engg., Ioa State University, Ames, IA, USA

More information

1 Estimating Frequency Moments in Streams

1 Estimating Frequency Moments in Streams CS 598CSC: Algorithms for Big Data Lecture date: August 28, 2014 Instructor: Chandra Chekuri Scribe: Chandra Chekuri 1 Estimating Frequency Moments in Streams A significant fraction of streaming literature

More information

These notes give a quick summary of the part of the theory of autonomous ordinary differential equations relevant to modeling zombie epidemics.

These notes give a quick summary of the part of the theory of autonomous ordinary differential equations relevant to modeling zombie epidemics. NOTES ON AUTONOMOUS ORDINARY DIFFERENTIAL EQUATIONS MARCH 2017 These notes give a quick summary of the part of the theory of autonomous ordinary differential equations relevant to modeling zombie epidemics.

More information

Some notes on streaming algorithms continued

Some notes on streaming algorithms continued U.C. Berkeley CS170: Algorithms Handout LN-11-9 Christos Papadimitriou & Luca Trevisan November 9, 016 Some notes on streaming algorithms continued Today we complete our quick review of streaming algorithms.

More information

Lecture Lecture 25 November 25, 2014

Lecture Lecture 25 November 25, 2014 CS 224: Advanced Algorithms Fall 2014 Lecture Lecture 25 November 25, 2014 Prof. Jelani Nelson Scribe: Keno Fischer 1 Today Finish faster exponential time algorithms (Inclusion-Exclusion/Zeta Transform,

More information

Big Data. Big data arises in many forms: Common themes:

Big Data. Big data arises in many forms: Common themes: Big Data Big data arises in many forms: Physical Measurements: from science (physics, astronomy) Medical data: genetic sequences, detailed time series Activity data: GPS location, social network activity

More information

12 Count-Min Sketch and Apriori Algorithm (and Bloom Filters)

12 Count-Min Sketch and Apriori Algorithm (and Bloom Filters) 12 Count-Min Sketch and Apriori Algorithm (and Bloom Filters) Many streaming algorithms use random hashing functions to compress data. They basically randomly map some data items on top of each other.

More information

The space complexity of approximating the frequency moments

The space complexity of approximating the frequency moments The space complexity of approximating the frequency moments Felix Biermeier November 24, 2015 1 Overview Introduction Approximations of frequency moments lower bounds 2 Frequency moments Problem Estimate

More information

14.1 Finding frequent elements in stream

14.1 Finding frequent elements in stream Chapter 14 Streaming Data Model 14.1 Finding frequent elements in stream A very useful statistics for many applications is to keep track of elements that occur more frequently. It can come in many flavours

More information

Minimize Cost of Materials

Minimize Cost of Materials Question 1: Ho do you find the optimal dimensions of a product? The size and shape of a product influences its functionality as ell as the cost to construct the product. If the dimensions of a product

More information

Bloom Filters and Locality-Sensitive Hashing

Bloom Filters and Locality-Sensitive Hashing Randomized Algorithms, Summer 2016 Bloom Filters and Locality-Sensitive Hashing Instructor: Thomas Kesselheim and Kurt Mehlhorn 1 Notation Lecture 4 (6 pages) When e talk about the probability of an event,

More information

Frequency Estimators

Frequency Estimators Frequency Estimators Outline for Today Randomized Data Structures Our next approach to improving performance. Count-Min Sketches A simple and powerful data structure for estimating frequencies. Count Sketches

More information

Data Stream Methods. Graham Cormode S. Muthukrishnan

Data Stream Methods. Graham Cormode S. Muthukrishnan Data Stream Methods Graham Cormode graham@dimacs.rutgers.edu S. Muthukrishnan muthu@cs.rutgers.edu Plan of attack Frequent Items / Heavy Hitters Counting Distinct Elements Clustering items in Streams Motivating

More information

Logistic Regression. Machine Learning Fall 2018

Logistic Regression. Machine Learning Fall 2018 Logistic Regression Machine Learning Fall 2018 1 Where are e? We have seen the folloing ideas Linear models Learning as loss minimization Bayesian learning criteria (MAP and MLE estimation) The Naïve Bayes

More information

CMSC 858F: Algorithmic Lower Bounds: Fun with Hardness Proofs Fall 2014 Introduction to Streaming Algorithms

CMSC 858F: Algorithmic Lower Bounds: Fun with Hardness Proofs Fall 2014 Introduction to Streaming Algorithms CMSC 858F: Algorithmic Lower Bounds: Fun with Hardness Proofs Fall 2014 Introduction to Streaming Algorithms Instructor: Mohammad T. Hajiaghayi Scribe: Huijing Gong November 11, 2014 1 Overview In the

More information

Sparse Fourier Transform (lecture 1)

Sparse Fourier Transform (lecture 1) 1 / 73 Sparse Fourier Transform (lecture 1) Michael Kapralov 1 1 IBM Watson EPFL St. Petersburg CS Club November 2015 2 / 73 Given x C n, compute the Discrete Fourier Transform (DFT) of x: x i = 1 x n

More information

Bucket handles and Solenoids Notes by Carl Eberhart, March 2004

Bucket handles and Solenoids Notes by Carl Eberhart, March 2004 Bucket handles and Solenoids Notes by Carl Eberhart, March 004 1. Introduction A continuum is a nonempty, compact, connected metric space. A nonempty compact connected subspace of a continuum X is called

More information

ECE380 Digital Logic. Synchronous sequential circuits

ECE380 Digital Logic. Synchronous sequential circuits ECE38 Digital Logic Synchronous Sequential Circuits: State Diagrams, State Tables Dr. D. J. Jackson Lecture 27- Synchronous sequential circuits Circuits here a clock signal is used to control operation

More information

Mining Data Streams-Estimating Frequency Moment

Mining Data Streams-Estimating Frequency Moment Mnng Data Streams-Estmatng Frequency Moment Barna Saha October 26, 2017 Frequency Moment Computng moments nvolves dstrbuton of frequences of dfferent elements n the stream. Frequency Moment Computng moments

More information

CSC2420 Fall 2012: Algorithm Design, Analysis and Theory Lecture 9

CSC2420 Fall 2012: Algorithm Design, Analysis and Theory Lecture 9 CSC2420 Fall 2012: Algorithm Design, Analysis and Theory Lecture 9 Allan Borodin March 13, 2016 1 / 33 Lecture 9 Announcements 1 I have the assignments graded by Lalla. 2 I have now posted five questions

More information

6.1 Occupancy Problem

6.1 Occupancy Problem 15-859(M): Randomized Algorithms Lecturer: Anupam Gupta Topic: Occupancy Problems and Hashing Date: Sep 9 Scribe: Runting Shi 6.1 Occupancy Problem Bins and Balls Throw n balls into n bins at random. 1.

More information

Lecture 1 September 3, 2013

Lecture 1 September 3, 2013 CS 229r: Algorithms for Big Data Fall 2013 Prof. Jelani Nelson Lecture 1 September 3, 2013 Scribes: Andrew Wang and Andrew Liu 1 Course Logistics The problem sets can be found on the course website: http://people.seas.harvard.edu/~minilek/cs229r/index.html

More information

Homework Set 2 Solutions

Homework Set 2 Solutions MATH 667-010 Introduction to Mathematical Finance Prof. D. A. Edards Due: Feb. 28, 2018 Homeork Set 2 Solutions 1. Consider the ruin problem. Suppose that a gambler starts ith ealth, and plays a game here

More information

Parallel Implementation of Proposed One Way Hash Function

Parallel Implementation of Proposed One Way Hash Function UDC:004.421.032.24:003.26 Parallel Implementation of Proposed One Way Hash Function Berisha A 1 1 University of Prishtina, Faculty of Mathematics and Natural Sciences, Kosovo artan.berisha@uni-pr.edu Abstract:

More information

Computing the Entropy of a Stream

Computing the Entropy of a Stream Computing the Entropy of a Stream To appear in SODA 2007 Graham Cormode graham@research.att.com Amit Chakrabarti Dartmouth College Andrew McGregor U. Penn / UCSD Outline Introduction Entropy Upper Bound

More information

Science Degree in Statistics at Iowa State College, Ames, Iowa in INTRODUCTION

Science Degree in Statistics at Iowa State College, Ames, Iowa in INTRODUCTION SEQUENTIAL SAMPLING OF INSECT POPULATIONSl By. G. H. IVES2 Mr.. G. H. Ives obtained his B.S.A. degree at the University of Manitoba in 1951 majoring in Entomology. He thm proceeded to a Masters of Science

More information

A turbulence closure based on the maximum entropy method

A turbulence closure based on the maximum entropy method Advances in Fluid Mechanics IX 547 A turbulence closure based on the maximum entropy method R. W. Derksen Department of Mechanical and Manufacturing Engineering University of Manitoba Winnipeg Canada Abstract

More information

1 Effects of Regularization For this problem, you are required to implement everything by yourself and submit code.

1 Effects of Regularization For this problem, you are required to implement everything by yourself and submit code. This set is due pm, January 9 th, via Moodle. You are free to collaborate on all of the problems, subject to the collaboration policy stated in the syllabus. Please include any code ith your submission.

More information

2 How many distinct elements are in a stream?

2 How many distinct elements are in a stream? Dealing with Massive Data January 31, 2011 Lecture 2: Distinct Element Counting Lecturer: Sergei Vassilvitskii Scribe:Ido Rosen & Yoonji Shin 1 Introduction We begin by defining the stream formally. Definition

More information

Estimating Dominance Norms of Multiple Data Streams Graham Cormode Joint work with S. Muthukrishnan

Estimating Dominance Norms of Multiple Data Streams Graham Cormode Joint work with S. Muthukrishnan Estimating Dominance Norms of Multiple Data Streams Graham Cormode graham@dimacs.rutgers.edu Joint work with S. Muthukrishnan Data Stream Phenomenon Data is being produced faster than our ability to process

More information

Tight Bounds for Distributed Streaming

Tight Bounds for Distributed Streaming Tight Bounds for Distributed Streaming (a.k.a., Distributed Functional Monitoring) David Woodruff IBM Research Almaden Qin Zhang MADALGO, Aarhus Univ. STOC 12 May 22, 2012 1-1 The distributed streaming

More information

Lecture 8 January 30, 2014

Lecture 8 January 30, 2014 MTH 995-3: Intro to CS and Big Data Spring 14 Inst. Mark Ien Lecture 8 January 3, 14 Scribe: Kishavan Bhola 1 Overvie In this lecture, e begin a probablistic method for approximating the Nearest Neighbor

More information

On the approximation of real powers of sparse, infinite, bounded and Hermitian matrices

On the approximation of real powers of sparse, infinite, bounded and Hermitian matrices On the approximation of real poers of sparse, infinite, bounded and Hermitian matrices Roman Werpachoski Center for Theoretical Physics, Al. Lotnikó 32/46 02-668 Warszaa, Poland Abstract We describe a

More information

Econ 201: Problem Set 3 Answers

Econ 201: Problem Set 3 Answers Econ 20: Problem Set 3 Ansers Instructor: Alexandre Sollaci T.A.: Ryan Hughes Winter 208 Question a) The firm s fixed cost is F C = a and variable costs are T V Cq) = 2 bq2. b) As seen in class, the optimal

More information

Outline of lectures 3-6

Outline of lectures 3-6 GENOME 453 J. Felsenstein Evolutionary Genetics Autumn, 013 Population genetics Outline of lectures 3-6 1. We ant to kno hat theory says about the reproduction of genotypes in a population. This results

More information

1 Maintaining a Dictionary

1 Maintaining a Dictionary 15-451/651: Design & Analysis of Algorithms February 1, 2016 Lecture #7: Hashing last changed: January 29, 2016 Hashing is a great practical tool, with an interesting and subtle theory too. In addition

More information

B669 Sublinear Algorithms for Big Data

B669 Sublinear Algorithms for Big Data B669 Sublinear Algorithms for Big Data Qin Zhang 1-1 2-1 Part 1: Sublinear in Space The model and challenge The data stream model (Alon, Matias and Szegedy 1996) a n a 2 a 1 RAM CPU Why hard? Cannot store

More information

STATC141 Spring 2005 The materials are from Pairwise Sequence Alignment by Robert Giegerich and David Wheeler

STATC141 Spring 2005 The materials are from Pairwise Sequence Alignment by Robert Giegerich and David Wheeler STATC141 Spring 2005 The materials are from Pairise Sequence Alignment by Robert Giegerich and David Wheeler Lecture 6, 02/08/05 The analysis of multiple DNA or protein sequences (I) Sequence similarity

More information

Race car Damping Some food for thought

Race car Damping Some food for thought Race car Damping Some food for thought The first article I ever rote for Racecar Engineering as ho to specify dampers using a dual rate damper model. This as an approach that my colleagues and I had applied

More information

Consider this problem. A person s utility function depends on consumption and leisure. Of his market time (the complement to leisure), h t

Consider this problem. A person s utility function depends on consumption and leisure. Of his market time (the complement to leisure), h t VI. INEQUALITY CONSTRAINED OPTIMIZATION Application of the Kuhn-Tucker conditions to inequality constrained optimization problems is another very, very important skill to your career as an economist. If

More information

Lecture Notes 8

Lecture Notes 8 14.451 Lecture Notes 8 Guido Lorenzoni Fall 29 1 Stochastic dynamic programming: an example We no turn to analyze problems ith uncertainty, in discrete time. We begin ith an example that illustrates the

More information

PHY 335 Data Analysis for Physicists

PHY 335 Data Analysis for Physicists PHY 335 Data Analysis or Physicists Instructor: Kok Wai Ng Oice CP 171 Telephone 7 1782 e mail: kng@uky.edu Oice hour: Thursday 11:00 12:00 a.m. or by appointment Time: Tuesday and Thursday 9:30 10:45

More information

Minimizing and maximizing compressor and turbine work respectively

Minimizing and maximizing compressor and turbine work respectively Minimizing and maximizing compressor and turbine ork respectively Reversible steady-flo ork In Chapter 3, Work Done during a rocess as found to be W b dv Work Done during a rocess It depends on the path

More information

Lecture 5: Hashing. David Woodruff Carnegie Mellon University

Lecture 5: Hashing. David Woodruff Carnegie Mellon University Lecture 5: Hashing David Woodruff Carnegie Mellon University Hashing Universal hashing Perfect hashing Maintaining a Dictionary Let U be a universe of keys U could be all strings of ASCII characters of

More information

Linear Regression Linear Regression with Shrinkage

Linear Regression Linear Regression with Shrinkage Linear Regression Linear Regression ith Shrinkage Introduction Regression means predicting a continuous (usually scalar) output y from a vector of continuous inputs (features) x. Example: Predicting vehicle

More information

compare to comparison and pointer based sorting, binary trees

compare to comparison and pointer based sorting, binary trees Admin Hashing Dictionaries Model Operations. makeset, insert, delete, find keys are integers in M = {1,..., m} (so assume machine word size, or unit time, is log m) can store in array of size M using power:

More information

The Count-Min-Sketch and its Applications

The Count-Min-Sketch and its Applications The Count-Min-Sketch and its Applications Jannik Sundermeier Abstract In this thesis, we want to reveal how to get rid of a huge amount of data which is at least dicult or even impossible to store in local

More information

Lecture 9 Nearest Neighbor Search: Locality Sensitive Hashing.

Lecture 9 Nearest Neighbor Search: Locality Sensitive Hashing. COMS 4995-3: Advanced Algorithms Feb 15, 2017 Lecture 9 Nearest Neighbor Search: Locality Sensitive Hashing. Instructor: Alex Andoni Scribes: Weston Jackson, Edo Roth 1 Introduction Today s lecture is

More information

Linear models: the perceptron and closest centroid algorithms. D = {(x i,y i )} n i=1. x i 2 R d 9/3/13. Preliminaries. Chapter 1, 7.

Linear models: the perceptron and closest centroid algorithms. D = {(x i,y i )} n i=1. x i 2 R d 9/3/13. Preliminaries. Chapter 1, 7. Preliminaries Linear models: the perceptron and closest centroid algorithms Chapter 1, 7 Definition: The Euclidean dot product beteen to vectors is the expression d T x = i x i The dot product is also

More information

Adaptive Noise Cancellation

Adaptive Noise Cancellation Adaptive Noise Cancellation P. Comon and V. Zarzoso January 5, 2010 1 Introduction In numerous application areas, including biomedical engineering, radar, sonar and digital communications, the goal is

More information

Problem 1: (Chernoff Bounds via Negative Dependence - from MU Ex 5.15)

Problem 1: (Chernoff Bounds via Negative Dependence - from MU Ex 5.15) Problem 1: Chernoff Bounds via Negative Dependence - from MU Ex 5.15) While deriving lower bounds on the load of the maximum loaded bin when n balls are thrown in n bins, we saw the use of negative dependence.

More information

Homework 6 Solutions

Homework 6 Solutions Homeork 6 Solutions Igor Yanovsky (Math 151B TA) Section 114, Problem 1: For the boundary-value problem y (y ) y + log x, 1 x, y(1) 0, y() log, (1) rite the nonlinear system and formulas for Neton s method

More information

Logic Effort Revisited

Logic Effort Revisited Logic Effort Revisited Mark This note ill take another look at logical effort, first revieing the asic idea ehind logical effort, and then looking at some of the more sutle issues in sizing transistors..0

More information

Towards a Theory of Societal Co-Evolution: Individualism versus Collectivism

Towards a Theory of Societal Co-Evolution: Individualism versus Collectivism Toards a Theory of Societal Co-Evolution: Individualism versus Collectivism Kartik Ahuja, Simpson Zhang and Mihaela van der Schaar Department of Electrical Engineering, Department of Economics, UCLA Theorem

More information

Lecture 2 Sep 5, 2017

Lecture 2 Sep 5, 2017 CS 388R: Randomized Algorithms Fall 2017 Lecture 2 Sep 5, 2017 Prof. Eric Price Scribe: V. Orestis Papadigenopoulos and Patrick Rall NOTE: THESE NOTES HAVE NOT BEEN EDITED OR CHECKED FOR CORRECTNESS 1

More information

3 - Vector Spaces Definition vector space linear space u, v,

3 - Vector Spaces Definition vector space linear space u, v, 3 - Vector Spaces Vectors in R and R 3 are essentially matrices. They can be vieed either as column vectors (matrices of size and 3, respectively) or ro vectors ( and 3 matrices). The addition and scalar

More information

Elements of Economic Analysis II Lecture VII: Equilibrium in a Competitive Market

Elements of Economic Analysis II Lecture VII: Equilibrium in a Competitive Market Elements of Economic Analysis II Lecture VII: Equilibrium in a Competitive Market Kai Hao Yang 10/31/2017 1 Partial Equilibrium in a Competitive Market In the previous lecture, e derived the aggregate

More information

6.842 Randomness and Computation March 3, Lecture 8

6.842 Randomness and Computation March 3, Lecture 8 6.84 Randomness and Computation March 3, 04 Lecture 8 Lecturer: Ronitt Rubinfeld Scribe: Daniel Grier Useful Linear Algebra Let v = (v, v,..., v n ) be a non-zero n-dimensional row vector and P an n n

More information

Ch. 2 Math Preliminaries for Lossless Compression. Section 2.4 Coding

Ch. 2 Math Preliminaries for Lossless Compression. Section 2.4 Coding Ch. 2 Math Preliminaries for Lossless Compression Section 2.4 Coding Some General Considerations Definition: An Instantaneous Code maps each symbol into a codeord Notation: a i φ (a i ) Ex. : a 0 For Ex.

More information

Improved Concentration Bounds for Count-Sketch

Improved Concentration Bounds for Count-Sketch Improved Concentration Bounds for Count-Sketch Gregory T. Minton 1 Eric Price 2 1 MIT MSR New England 2 MIT IBM Almaden UT Austin 2014-01-06 Gregory T. Minton, Eric Price (IBM) Improved Concentration Bounds

More information

The Dot Product

The Dot Product The Dot Product 1-9-017 If = ( 1,, 3 ) and = ( 1,, 3 ) are ectors, the dot product of and is defined algebraically as = 1 1 + + 3 3. Example. (a) Compute the dot product (,3, 7) ( 3,,0). (b) Compute the

More information

How Long Does it Take to Catch a Wild Kangaroo?

How Long Does it Take to Catch a Wild Kangaroo? Ho Long Does it Take to Catch a Wild Kangaroo? Ravi Montenegro Prasad Tetali ABSTRACT The discrete logarithm problem asks to solve for the exponent x, given the generator g of a cyclic group G and an element

More information

Linear Sketches A Useful Tool in Streaming and Compressive Sensing

Linear Sketches A Useful Tool in Streaming and Compressive Sensing Linear Sketches A Useful Tool in Streaming and Compressive Sensing Qin Zhang 1-1 Linear sketch Random linear projection M : R n R k that preserves properties of any v R n with high prob. where k n. M =

More information

COS 511: Theoretical Machine Learning. Lecturer: Rob Schapire Lecture # 12 Scribe: Indraneel Mukherjee March 12, 2008

COS 511: Theoretical Machine Learning. Lecturer: Rob Schapire Lecture # 12 Scribe: Indraneel Mukherjee March 12, 2008 COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture # 12 Scribe: Indraneel Mukherjee March 12, 2008 In the previous lecture, e ere introduced to the SVM algorithm and its basic motivation

More information

COS 424: Interacting with Data. Lecturer: Rob Schapire Lecture #15 Scribe: Haipeng Zheng April 5, 2007

COS 424: Interacting with Data. Lecturer: Rob Schapire Lecture #15 Scribe: Haipeng Zheng April 5, 2007 COS 424: Interacting ith Data Lecturer: Rob Schapire Lecture #15 Scribe: Haipeng Zheng April 5, 2007 Recapitulation of Last Lecture In linear regression, e need to avoid adding too much richness to the

More information

A ROBUST BEAMFORMER BASED ON WEIGHTED SPARSE CONSTRAINT

A ROBUST BEAMFORMER BASED ON WEIGHTED SPARSE CONSTRAINT Progress In Electromagnetics Research Letters, Vol. 16, 53 60, 2010 A ROBUST BEAMFORMER BASED ON WEIGHTED SPARSE CONSTRAINT Y. P. Liu and Q. Wan School of Electronic Engineering University of Electronic

More information

Ad Placement Strategies

Ad Placement Strategies Case Study 1: Estimating Click Probabilities Tackling an Unknown Number of Features with Sketching Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox 2014 Emily Fox January

More information

4/26/2017. More algorithms for streams: Each element of data stream is a tuple Given a list of keys S Determine which tuples of stream are in S

4/26/2017. More algorithms for streams: Each element of data stream is a tuple Given a list of keys S Determine which tuples of stream are in S Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit

More information

Lecture 01 August 31, 2017

Lecture 01 August 31, 2017 Sketching Algorithms for Big Data Fall 2017 Prof. Jelani Nelson Lecture 01 August 31, 2017 Scribe: Vinh-Kha Le 1 Overview In this lecture, we overviewed the six main topics covered in the course, reviewed

More information

On A Comparison between Two Measures of Spatial Association

On A Comparison between Two Measures of Spatial Association Journal of Modern Applied Statistical Methods Volume 9 Issue Article 3 5--00 On A Comparison beteen To Measures of Spatial Association Faisal G. Khamis Al-Zaytoonah University of Jordan, faisal_alshamari@yahoo.com

More information

y(x) = x w + ε(x), (1)

y(x) = x w + ε(x), (1) Linear regression We are ready to consider our first machine-learning problem: linear regression. Suppose that e are interested in the values of a function y(x): R d R, here x is a d-dimensional vector-valued

More information

Business Cycles: The Classical Approach

Business Cycles: The Classical Approach San Francisco State University ECON 302 Business Cycles: The Classical Approach Introduction Michael Bar Recall from the introduction that the output per capita in the U.S. is groing steady, but there

More information

1-Pass Relative-Error L p -Sampling with Applications

1-Pass Relative-Error L p -Sampling with Applications -Pass Relative-Error L p -Sampling with Applications Morteza Monemizadeh David P Woodruff Abstract For any p [0, 2], we give a -pass poly(ε log n)-space algorithm which, given a data stream of length m

More information

EXAMINATION PAPER NUMBER OF PAGES: 4

EXAMINATION PAPER NUMBER OF PAGES: 4 EXAMINATION PAPER SUBJECT: CERTIFICATE IN STRATA CONTROL (COAL) SUBJECT CODE: EXAMINATION DATE: 18 OCTOBER 011 TIME: 14h30 17h30 EXAMINER: D. NEAL MODERATOR: B. MADDEN TOTAL MARKS: [100] PASS MARK: 60%

More information

Lecture 13 October 6, Covering Numbers and Maurey s Empirical Method

Lecture 13 October 6, Covering Numbers and Maurey s Empirical Method CS 395T: Sublinear Algorithms Fall 2016 Prof. Eric Price Lecture 13 October 6, 2016 Scribe: Kiyeon Jeon and Loc Hoang 1 Overview In the last lecture we covered the lower bound for p th moment (p > 2) and

More information

Title: Count-Min Sketch Name: Graham Cormode 1 Affil./Addr. Department of Computer Science, University of Warwick,

Title: Count-Min Sketch Name: Graham Cormode 1 Affil./Addr. Department of Computer Science, University of Warwick, Title: Count-Min Sketch Name: Graham Cormode 1 Affil./Addr. Department of Computer Science, University of Warwick, Coventry, UK Keywords: streaming algorithms; frequent items; approximate counting, sketch

More information

Lecture 1: Introduction to Sublinear Algorithms

Lecture 1: Introduction to Sublinear Algorithms CSE 522: Sublinear (and Streaming) Algorithms Spring 2014 Lecture 1: Introduction to Sublinear Algorithms March 31, 2014 Lecturer: Paul Beame Scribe: Paul Beame Too much data, too little time, space for

More information

The Probability of Pathogenicity in Clinical Genetic Testing: A Solution for the Variant of Uncertain Significance

The Probability of Pathogenicity in Clinical Genetic Testing: A Solution for the Variant of Uncertain Significance International Journal of Statistics and Probability; Vol. 5, No. 4; July 2016 ISSN 1927-7032 E-ISSN 1927-7040 Published by Canadian Center of Science and Education The Probability of Pathogenicity in Clinical

More information

Problem 1. Possible Solution. Let f be the 2π-periodic functions defined by f(x) = cos ( )

Problem 1. Possible Solution. Let f be the 2π-periodic functions defined by f(x) = cos ( ) Problem Let f be the π-periodic functions defined by f() = cos ( ) hen [ π, π]. Make a draing of the function f for the interval [ 3π, 3π], and compute the Fourier series of f. Use the result to compute

More information

CMPSCI 711: More Advanced Algorithms

CMPSCI 711: More Advanced Algorithms CMPSCI 711: More Advanced Algorithms Section 1-1: Sampling Andrew McGregor Last Compiled: April 29, 2012 1/14 Concentration Bounds Theorem (Markov) Let X be a non-negative random variable with expectation

More information

1 Approximate Quantiles and Summaries

1 Approximate Quantiles and Summaries CS 598CSC: Algorithms for Big Data Lecture date: Sept 25, 2014 Instructor: Chandra Chekuri Scribe: Chandra Chekuri Suppose we have a stream a 1, a 2,..., a n of objects from an ordered universe. For simplicity

More information

15-855: Intensive Intro to Complexity Theory Spring Lecture 16: Nisan s PRG for small space

15-855: Intensive Intro to Complexity Theory Spring Lecture 16: Nisan s PRG for small space 15-855: Intensive Intro to Complexity Theory Spring 2009 Lecture 16: Nisan s PRG for small space For the next few lectures we will study derandomization. In algorithms classes one often sees clever randomized

More information

Fundamentals of DC Testing

Fundamentals of DC Testing Fundamentals of DC Testing Aether Lee erigy Japan Abstract n the beginning of this lecture, Ohm s la, hich is the most important electric la regarding DC testing, ill be revieed. Then, in the second section,

More information

Estimating Frequencies and Finding Heavy Hitters Jonas Nicolai Hovmand, Morten Houmøller Nygaard,

Estimating Frequencies and Finding Heavy Hitters Jonas Nicolai Hovmand, Morten Houmøller Nygaard, Estimating Frequencies and Finding Heavy Hitters Jonas Nicolai Hovmand, 2011 3884 Morten Houmøller Nygaard, 2011 4582 Master s Thesis, Computer Science June 2016 Main Supervisor: Gerth Stølting Brodal

More information

An Analog Analogue of a Digital Quantum Computation

An Analog Analogue of a Digital Quantum Computation An Analog Analogue of a Digital Quantum Computation arxiv:quant-ph/9612026v1 6 Dec 1996 Edard Farhi Center for Theoretical Physics Massachusetts Institute of Technology Cambridge, MA 02139 Sam Gutmann

More information

A proof of topological completeness for S4 in (0,1)

A proof of topological completeness for S4 in (0,1) A proof of topological completeness for S4 in (,) Grigori Mints and Ting Zhang 2 Philosophy Department, Stanford University mints@csli.stanford.edu 2 Computer Science Department, Stanford University tingz@cs.stanford.edu

More information

A Generalization of a result of Catlin: 2-factors in line graphs

A Generalization of a result of Catlin: 2-factors in line graphs AUSTRALASIAN JOURNAL OF COMBINATORICS Volume 72(2) (2018), Pages 164 184 A Generalization of a result of Catlin: 2-factors in line graphs Ronald J. Gould Emory University Atlanta, Georgia U.S.A. rg@mathcs.emory.edu

More information

11 Heavy Hitters Streaming Majority

11 Heavy Hitters Streaming Majority 11 Heavy Hitters A core mining problem is to find items that occur more than one would expect. These may be called outliers, anomalies, or other terms. Statistical models can be layered on top of or underneath

More information