Lecture 3 Frequency Moments, Heavy Hitters
COMS E6998-9: Algorithmic Techniques for Massive Data                Sep 15, 2015

Instructor: Alex Andoni                    Scribes: Daniel Alabi, Wangda Zhang

1 Introduction

This lecture is about the second frequency moment and heavy hitters. First, we present the Tug-of-War algorithm of Alon-Matias-Szegedy, which obtains a 1 + ε approximation to the second moment using O((1/ε²) log n) space. Second, we define heavy hitters and present the CountMin algorithm, which can be used to find them.

2 Second Frequency Moment

Assume, as in previous lectures, that we are given a stream of length m from which we want to obtain a frequency vector f = (f_1, f_2, ..., f_n) over n distinct elements, where f_i is the frequency of the i-th distinct element in the stream. The k-th frequency moment of the stream is

    F_k = \sum_i f_i^k                                              (1)

In Lectures 1 and 2 we estimated F_0 and F_1, corresponding to the number of distinct elements and the length of the stream, respectively. Now we wish to obtain the second moment:

    F_2 = \sum_i f_i^2                                              (2)

The idea is to use i.i.d. random variables r_i (= r(i)) with P(r_i = 1) = P(r_i = -1) = 1/2 (Rademacher random variables) to build an estimator. Thus E[r_i] = 0 and Var[r_i] = 1 (since r_i^2 = 1 and Var[r_i] = E[r_i^2] - (E[r_i])^2). We can think of r_i as a random variable defined for every distinct element, so that

    r : [n] → {-1, +1}                                              (3)

As an example, consider the case where the entries of the frequency vector are all 1s (so m = n). Let

    z = \sum_i r_i                                                  (4)

Then E[z] = \sum_i E[r_i] = 0 by linearity of expectation and the distribution of the r_i's. Similarly, Var[z] = \sum_i Var[r_i] = m. We can apply Chebyshev's inequality to obtain that z = 0 ± O(√m) with constant probability. It turns out that this bound is tight.
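The all-ones example admits an exact check without any sampling: z = 2B − n where B ~ Binomial(n, 1/2), so its full distribution is computable in closed form. The sketch below (not part of the original notes; `n = 100` is an arbitrary choice) verifies Var[z] = m exactly and shows that the mass within 2√m of zero is well above the 3/4 that Chebyshev guarantees.

```python
from math import comb, sqrt

n = 100                        # all-ones frequency vector, so m = n
# z = sum_i r_i = 2*B - n with B ~ Binomial(n, 1/2): enumerate B exactly.
# Exact integer identity sum_b C(n,b) * (2b - n)^2 = n * 2^n, i.e. Var[z] = m.
total = sum(comb(n, b) * (2 * b - n) ** 2 for b in range(n + 1))
assert total == n * 2 ** n

# Probability mass of |z| <= 2*sqrt(m): Chebyshev promises at least 3/4.
within = sum(comb(n, b) for b in range(n + 1)
             if abs(2 * b - n) <= 2 * sqrt(n)) / 2 ** n
print(within)                  # well above the Chebyshev bound of 0.75
```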
Now, let's consider the more general case and define

    z = \sum_i r_i f_i                                              (5)

Claim 1. E[z^2] = \sum_i f_i^2 = F_2.

Proof.

    E[z^2] = E[(\sum_i r_i f_i)^2]                                  (6)
           = \sum_i E[r_i^2] f_i^2 + \sum_{i ≠ j} E[r_i] E[r_j] f_i f_j    (7)
           = \sum_i f_i^2 E[r_i^2]                                  (8)
           = \sum_i f_i^2 = F_2                                     (9)

(7) holds because for i ≠ j, r_i and r_j are independent, while (8) holds because E[r_i] = 0 for all i. Finally, (9) holds because r_i^2 = 1 (hence E[r_i^2] = 1).

Claim 2. Var[z^2] ≤ O(1) · F_2^2.

Proof. First, let's compute E[z^4]:

    E[z^4] = E[(\sum_i r_i f_i)^4]                                  (10)
           = E[\sum_{i,j,k,l} r_i f_i r_j f_j r_k f_k r_l f_l]      (11)
           = \sum_{i,j,k,l} f_i f_j f_k f_l E[r_i r_j r_k r_l]      (12)
           = \sum_i f_i^4 + 6 \sum_{i < j} f_i^2 f_j^2              (13)
           ≤ O(1) · (\sum_i f_i^2)^2                                (14)

(13) holds because:

- the random variables are independent;
- the terms with odd powers of some r_i evaluate to zero;
- there are (4 choose 2) = 6 terms of the form r_i^2 r_j^2 for each pair i < j.

Finally, we obtain Var[z^2] ≤ E[z^4] ≤ O(1) · F_2^2.

Having bounded E[z^2] and Var[z^2], we present the algorithm below.
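Claims 1 and 2 can be verified exactly on a small instance by enumerating all 2^n Rademacher sign vectors, so the expectations below are computed with no randomness at all. This is a sanity-check sketch, not part of the original notes; the frequency vector `f` is an arbitrary example.

```python
from itertools import product

def exact_moments_of_z(f):
    """Enumerate all 2^n sign vectors r in {-1,+1}^n and return the exact
    values of E[z^2] and E[z^4] for z = sum_i r_i * f_i."""
    n = len(f)
    z2_sum, z4_sum = 0, 0
    for signs in product([-1, 1], repeat=n):
        z = sum(r * fi for r, fi in zip(signs, f))
        z2_sum += z ** 2
        z4_sum += z ** 4
    return z2_sum / 2 ** n, z4_sum / 2 ** n

f = [3, 1, 4, 1, 5]                      # small example frequency vector
F2 = sum(fi ** 2 for fi in f)            # = 52
Ez2, Ez4 = exact_moments_of_z(f)

assert Ez2 == F2                         # Claim 1: E[z^2] = F_2
# Equation (13): E[z^4] = sum_i f_i^4 + 6 * sum_{i<j} f_i^2 f_j^2
expected_z4 = (sum(fi ** 4 for fi in f)
               + 6 * sum(f[i] ** 2 * f[j] ** 2
                         for i in range(len(f)) for j in range(i + 1, len(f))))
assert Ez4 == expected_z4
print(Ez2, Ez4)
```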
2.1 Tug-of-War (Alon-Matias-Szegedy 1996)

We maintain a counter z.

1. Initialize z = 0.
2. When we see element i: z = z + r(i).
3. Return the estimator z^2.

Earlier we defined the r_i's via the map r : [n] → {-1, +1}, so r acts like a hash function. The hash function generating the r_i's should be 4-wise independent. Next, we apply the averaging trick using k = O(1/ε²) parallel runs of the algorithm to obtain a 1 + ε approximation in O((1/ε²) log n) space.

2.2 Linearity

So far we have only considered simple estimators; next we consider something more complex. Suppose two parts of a stream are seen by two different estimators, and we want to estimate the union of these two parts (i.e., we want to combine the estimators).

Claim 3 (Linearity). Given an estimate z for f and an estimate z' for f', we can combine them as z + z', an estimate for f + f'.

Proof. Since z = \sum_i r_i f_i and z' = \sum_i r_i f_i', we have z + z' = \sum_i r_i (f_i + f_i'). Note that we need to use the same randomness r for the two estimates.

Similarly, we can use (z - z')^2 to estimate \sum_i (f_i - f_i')^2. However, we cannot use linearity in a similar way for \sum_i f_i f_i'; we will discuss this point later in class.

2.3 General Streaming Model

We now consider a more general model: at each moment, we receive an update (i, δ_i) that increases the i-th entry by δ_i (δ_i may be negative). A linear algorithm S handles this easily: S(f + e_i δ_i) = S(f) + S(e_i δ_i). We call S a sketch. According to [Nguyen-Li-Woodruff 14], any algorithm for general streaming might as well be linear.

3 Heavy Hitters

Now that we are able to compute many types of frequency statistics, we wonder whether we can also compute the maximum frequency in a stream. It turns out that we cannot: it is impossible to approximate the max frequency in sublinear space. Therefore, we will solve a more modest problem where we want to detect the max-frequency element only if it is very heavy. We will show that we can find these heavy hitters in space O(1/φ).

Definition 4. Element i is φ-heavy if f_i ≥ φ·m.
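The steps above can be sketched in a few lines. This is an illustrative sketch, not the notes' exact construction: for simplicity the signs r_j(i) come from a seeded PRNG rather than a true 4-wise independent hash family, and the averaging trick takes a plain mean of the k repetitions.

```python
import random

class TugOfWar:
    """AMS Tug-of-War sketch for F_2 with k parallel repetitions.
    Signs r_j(i) are derived deterministically from (seed, j, i); a real
    implementation would use a 4-wise independent hash family instead."""
    def __init__(self, k, seed=0):
        self.k = k
        self.seed = seed
        self.z = [0] * k

    def _sign(self, j, i):
        return 1 if random.Random((self.seed, j, i)).getrandbits(1) else -1

    def update(self, i, delta=1):          # supports general (i, delta) updates
        for j in range(self.k):
            self.z[j] += self._sign(j, i) * delta

    def estimate(self):
        # averaging trick: mean of the k independent z_j^2 estimates
        return sum(zj * zj for zj in self.z) / self.k

stream = [1, 2, 2, 3, 3, 3, 3]             # frequencies: f_1=1, f_2=2, f_3=4
s = TugOfWar(k=400)
for x in stream:
    s.update(x)
true_F2 = 1**2 + 2**2 + 4**2               # = 21
print(s.estimate(), true_F2)

# Linearity (Claim 3): two sketches of stream parts, built with the SAME
# randomness (same seed), add coordinate-wise to the sketch of the whole stream.
s1, s2 = TugOfWar(k=400), TugOfWar(k=400)
for x in stream[:3]: s1.update(x)
for x in stream[3:]: s2.update(x)
assert [a + b for a, b in zip(s1.z, s2.z)] == s.z
```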
The basic idea is still to use hash functions. A first attempt uses a single hash function h mapping [n] to [w] at random, where w = O(1/φ). Each element i goes to bucket h(i), and we sum up the frequencies landing in each bucket. Denote the array of bucket sums by S; the estimator for f_i is f̂_i = S[h(i)].

For example, consider the stream 2, 5, 7, 5, 5. If the hash function h_1 maps h_1(2) = 2, h_1(5) = 1, h_1(7) = 2, then we obtain the estimates f̂_2 = 2, f̂_5 = 3, f̂_7 = 2. However, for an element that never appears in the stream, e.g. 11, this method also estimates its frequency as f̂_11 = 2, assuming h_1(11) = 2.

Claim 5. S[h(i)] is a biased estimator.

Proof. Analyzing this estimator, we have f̂_i = S[h(i)] = f_i + C, where C = \sum_{j ≠ i : h(j) = h(i)} f_j. Thus

    E[C] = \sum_{j ≠ i} Pr[h(j) = h(i)] · f_j = \sum_{j ≠ i} f_j / w > 0,

so the estimator is biased upward. However, it is easy to see that E[C] ≤ m/w, so the bias is at most m/w, which is small for large w. By Markov's inequality, C ≤ 10m/w with at least 90% probability, i.e.

    Pr[ f̂_i - f_i ≤ O(m/w) ] ≥ 0.9.

For w = O(1/(εφ)) and f_i ≥ φ·m, we have C ≤ ε·f_i (with 90% probability). That is, f̂_i is a (1 + ε) approximation (with 90% probability).

Still, there are two issues with this estimator: (1) it succeeds only with constant probability; (2) it overestimates many indices (about 10% of them). Fundamentally, there is a conflict between avoiding many collisions and reducing the space used by the hash table.

3.1 CountMin

This motivates us to use the median trick. We use L = O(log n) hash tables with hash functions h_j. The CountMin algorithm works as follows:

    Initialize(w, L):
        array S[L][w], initialized to 0
        L hash functions h_1, ..., h_L : [n] → {1, ..., w}
    Process(int i):
        for (j = 0; j < L; ++j)
            S[j][h_j(i)] += 1
    Estimator:
        foreach i in PossibleIP:
            f̂_i = median_j( S[j][h_j(i)] )
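The pseudocode above translates directly to a runnable sketch. This is a minimal illustration, not a production implementation: the per-row hash functions are derived from SHA-1 with a row salt, standing in for the pairwise-independent families assumed by the analysis.

```python
import hashlib

class CountMin:
    """CountMin sketch: L rows of w counters, one hash function per row."""
    def __init__(self, w, L):
        self.w, self.L = w, L
        self.S = [[0] * w for _ in range(L)]

    def _h(self, j, i):
        # Row-salted SHA-1, reduced mod w (a stand-in for a pairwise
        # independent hash family).
        digest = hashlib.sha1(f"{j}:{i}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.w

    def process(self, i):
        for j in range(self.L):
            self.S[j][self._h(j, i)] += 1

    def estimate(self, i):
        # Every row overestimates f_i, so take the min (the median also works).
        return min(self.S[j][self._h(j, i)] for j in range(self.L))

stream = [2, 5, 7, 5, 5]                 # the example stream from the notes
cm = CountMin(w=16, L=4)
for x in stream:
    cm.process(x)
# The sketch never underestimates: estimate(5) >= 3 and at most m = 5.
print(cm.estimate(5), cm.estimate(2), cm.estimate(11))
```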
Claim 6. The median is a ±O(m/w) estimator with probability 1 - 1/n².

Proof. For a fixed index i, each row (out of the L rows) gives f̂_i = f_i ± O(m/w) with 90% probability. By the Median Trick (see Lecture 2), the median of the L rows gives an estimator within ±O(m/w) with probability 1 - 1/n².

Alternatively, we can take the min instead of the median, since all counts are overestimates.

3.2 Outputting Heavy Hitters

We can now identify the heavy hitters by iterating over all i's and outputting those with f̂_i ≥ φ·m. In particular, in terms of the true frequencies f_i:

- if f_i < φ(1 - ε)·m, then i is not in the output;
- if f_i > φ(1 + ε)·m, then i is reported as a heavy hitter;
- if φ(1 - ε)·m ≤ f_i ≤ φ(1 + ε)·m, then i may or may not be reported as a heavy hitter.

If we really care about the elements in between, we can use more space to further narrow the gap.

The space used is O((1/(εφ)) · log² n) bits. Since we iterate over all i's, the query time is Ω(n). We can improve this time complexity, at the cost of increasing the space to O((1/(εφ)) · log³ n) bits. The idea is to use dyadic intervals: for each level of the dyadic decomposition of [n] we maintain its own sketch, and we find the heavy hitters by following down the subtrees of the intervals that are themselves heavy at intermediate levels.
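The dyadic-interval descent can be illustrated as follows. To keep the example self-contained, each level stores exact interval counts in place of the per-level CountMin sketch the notes call for, and n is assumed to be a power of two; the point is the tree search, which touches only O((1/φ) log n) intervals instead of all n items.

```python
from collections import Counter

def dyadic_heavy_hitters(stream, n, phi):
    """Find all phi-heavy elements of a stream over [0, n), n a power of two,
    by descending the dyadic-interval tree. Exact per-level counters stand in
    for the per-level sketches described in the notes."""
    m = len(stream)
    levels = []                  # (width, counts): counts[v] = total frequency
    width = n                    # of the interval [v*width, (v+1)*width)
    while width >= 1:
        levels.append((width, Counter(x // width for x in stream)))
        width //= 2

    frontier = [0]               # candidate interval indices at current level
    for width, counts in levels:
        # Keep only intervals that are heavy; a heavy element forces every
        # ancestor interval to be heavy, so no heavy hitter is pruned.
        frontier = [v for v in frontier if counts.get(v, 0) >= phi * m]
        if width == 1:
            return sorted(frontier)          # leaves = heavy elements
        frontier = [c for v in frontier for c in (2 * v, 2 * v + 1)]

stream = [3, 3, 3, 3, 7, 1, 3, 7, 7, 2, 3, 3]     # m = 12 items over [0, 8)
# f_3 = 7 and f_7 = 3, both at least phi*m = 3:
print(dyadic_heavy_hitters(stream, n=8, phi=0.25))  # → [3, 7]
```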