CS 591, Lecture 9 Data Analytics: Theory and Applications Boston University

Size: px
Start display at page:

Download "CS 591, Lecture 9 Data Analytics: Theory and Applications Boston University"

Transcription

1 CS 591, Lecture 9 Data Analytics: Theory and Applications Boston University Babis Tsourakakis February 22nd, 2017

2 Announcement We will cover the Monday s 2/20 lecture (President s day) by appending half an hour to next Monday s and Wednesday s lectures. Next week s office hours for next week: pm Babis Tsourakakis CS 591 Data Analytics, Lecture 9 2 / 25

3 Reminder: Reservoir Sampling Input: Stream of n items that does not fit in memory x 1,..., x n Goal: Keep a uniform random sample Simple: Draw a random integer {1,..., n}, and select that item. Issue: We don t know the length n of the stream a priori Problem: Find a uniform sample s from a stream of unknown length Babis Tsourakakis CS 591 Data Analytics, Lecture 9 3 / 25

4 Algorithm: Reservoir Sampling 1 Initially we set s x 1 2 When i-th element arrives, we set s x i with probability 1 i Intuition: Correctness: What is the probability that s = x i at some time t i? Babis Tsourakakis CS 591 Data Analytics, Lecture 9 4 / 25

5 Algorithm: Reservoir Sampling Correctness: What is the probability that s = x i at some time t i? Pr [s = x i ] = 1 i ( 1 1 ) ( ) = 1 i + 1 t t. Question: What about the space complexity? Babis Tsourakakis CS 591 Data Analytics, Lecture 9 5 / 25

6 Extension: Weighted Reservoir Sampling Input: Stream of w 1, w 2,... Goal: Select index i with probability proportional to w i Again, we don t know the length n of the stream a priori Define for each index i W i = w w i. Any ideas? Babis Tsourakakis CS 591 Data Analytics, Lecture 9 6 / 25

7 Extension: Weighted Reservoir Sampling Intuition: Babis Tsourakakis CS 591 Data Analytics, Lecture 9 7 / 25

8 Algorithm: Weighted Reservoir Sampling 1 Initially we set s x 1 2 When i-th element arrives, we set s x i with probability w i W i 1 +w i Correctness: On seeing w i+1 : We switch to w i+1 with probability w i+1 W i+1 If we don t switch the winner j i remains with probability Pr [s = w j ] = (1 w i+1 W i+1 ) w j W i = w j W i+1. Question: What about the space complexity? Babis Tsourakakis CS 591 Data Analytics, Lecture 9 8 / 25

9 Reservoir Sampling, k items Input: Stream of items x 1, x 2,..., Parameter k 1 Goal: Keep k uniform random samples Ideas? Babis Tsourakakis CS 591 Data Analytics, Lecture 9 9 / 25

10 Extension: Weighted Reservoir Sampling Intuition: # k-sets that do not contain i = ( i 1 k ) ( i k) = 1 k i. Babis Tsourakakis CS 591 Data Analytics, Lecture 9 10 / 25

11 Python implementation import random def reservoir_sampling ( stream, k): S =[] counter = 1 for x in stream : if(counter <=k): S. append (x) else : c = random. randint (0, counter -1); if( c<k): S[c] = x counter +=1 return S Babis Tsourakakis CS 591 Data Analytics, Lecture 9 11 / 25

12 Probabilistic Counting Researchers at Bell Labs needed to count overlapping substrings of three letters. Why? To develop statistic-based spell-checker. This spellchecker (typo) was included in early Unix distributions Robert Morris came up with an elegant way of counting each item approximately [Morris, 1978] Why approximately? 8-bit counters can hold counts up to 255 (too small range)... Babis Tsourakakis CS 591 Data Analytics, Lecture 9 12 / 25

13 Probabilistic Counting Increments performed in a probabilistic manner Suppose we use constant probabilities, i.e., we increase the counter from i to i + 1 with probability p Why is this problematic? Morris idea: probability Pr [i i + 1] = f (i) depends on the actual counter value Outline: Morris Morris+ Morris We will follow Jelani Nelson s exposition (M,M+,M++). Babis Tsourakakis CS 591 Data Analytics, Lecture 9 13 / 25

14 Morris algorithm 1 Initialize X 0. 2 For each event, increment X with probability 1 2 X. 3 Output ñ = 2 X 1. Notice that our data structure just maintains an integer! We will see that this integer grows as log n, hence we need O(log log n) bits. Babis Tsourakakis CS 591 Data Analytics, Lecture 9 14 / 25

15 Example: Morris algorithm Source: [Flajolet, 1985] Pr [ñ = 1] = 8 38, Pr [ñ = 2] =, Pr [ñ = 3] = 17 1, Pr [ñ = 4] = Babis Tsourakakis CS 591 Data Analytics, Lecture 9 15 / 25

16 Morris algorithm: Analysis X n := Morris counter after n updates Let s compute E [ 2 Xn ]. Idea 1: Let s apply the formula E [ 2 Xn] = n 2 l p n,l. l=0 We have to compute Pr [X n = l] = p n,l Babis Tsourakakis CS 591 Data Analytics, Lecture 9 16 / 25

17 Morris algorithm: Analysis Suppose we reached state l via n 1 transitions from state 1 to state 1, n 2 from state 2 to state 2,..., n l from state l to l (1 2 1 ) n (1 2 2 ) n (1 2 (l 1) ) n l 1 2 (l 1) (1 2 l ) n l. with the condition that n n l + l 1 = n. Tedious approach... Let s try an inductive approach. Babis Tsourakakis CS 591 Data Analytics, Lecture 9 17 / 25

18 Morris algorithm : Analysis 1 Base case. It s obviously true for n = 0. 2 Induction step. E [ 2 ] X n+1 = = = Pr [X n = j] E [ 2 X n+1 X n = j ] j=0 Pr [X n = j] (2 j (1 1 2 ) j+1 ) j 2 j j=0 Pr [X n = j] 2 j + j=0 j = E [ 2 Xn] + 1 = (n + 1) + 1 Pr [X n = j] Babis Tsourakakis CS 591 Data Analytics, Lecture 9 18 / 25

19 Morris algorithm : Analysis In a similar way, we can show that E [ 2 2Xn] = 3 2 n n + 1. Therefore Var [ 2 Xn] < 1 2 n2. Chebyshev s inequality: Pr [ ñ n ɛn] < 1 2ɛ 2. Babis Tsourakakis CS 591 Data Analytics, Lecture 9 19 / 25

20 Details For the second moment, again we use induction: E [ 2 2Xn] = j 0 = j 0 2 2j Pr [X n = j] 2 2j ( Pr [X n 1 = j] ( 1 2 j) + Pr [X n 1 = j 1]2 (j 1) ) =... = E [ 2 2X n 1 ] + 3E [ 2 X n 1 ] = 3 2 (n 1)2 + 3 (n 1) n 2 = 3 2 n n + 1 Babis Tsourakakis CS 591 Data Analytics, Lecture 9 20 / 25

21 Morris algorithm Source: [Flajolet, 1985] Exact probability distribution for Morris counter for n = 1024 Variance is not bad, but we want to do better! Morris+ Babis Tsourakakis CS 591 Data Analytics, Lecture 9 21 / 25

22 Morris+ algorithm We instantiate s independent copies of Morris and average their outputs. s = 1 2ɛ 2 δ Our estimate becomes ñ = 1 s s ñ i. i=1 Now Chebyshev s inequality gives Next step is Morris++! Pr [ ñ n ɛn] < 1 2sɛ 2 < δ. Babis Tsourakakis CS 591 Data Analytics, Lecture 9 22 / 25

23 Morris++ algorithm Run t = 36 lg 1 δ copies of Morris+ with failure probability δ = 1 3 Output the median estimate from the t copies of Morris+ In other words, we have t coin tosses, each results in 1 with probability 1 3, and in 0 with probability 2 3. Failure if median is 0, otherwise success Chernoff! Babis Tsourakakis CS 591 Data Analytics, Lecture 9 23 / 25

24 Morris++ algorithm analysis Y i = By Chernoff bound, { 1, if i-th Morris+ fails. 0, otherwise. [ ] [ [ ] ] Pr Y i > t Pr Y i E Y i > t < δ. 2 6 i i i Morris++ estimate ñ (1 ± ɛ)-approximates n with probability at least (1 δ). Question: What is the space complexity that we achieved? Babis Tsourakakis CS 591 Data Analytics, Lecture 9 24 / 25

25 references I Flajolet, P. (1985). Approximate counting: a detailed analysis. BIT Numerical Mathematics, 25(1): Morris, R. (1978). Counting large numbers of events in small registers. Communications of the ACM, 21(10): Babis Tsourakakis CS 591 Data Analytics, Lecture 9 25 / 25

Lecture 1 September 3, 2013

Lecture 1 September 3, 2013 CS 229r: Algorithms for Big Data Fall 2013 Prof. Jelani Nelson Lecture 1 September 3, 2013 Scribes: Andrew Wang and Andrew Liu 1 Course Logistics The problem sets can be found on the course website: http://people.seas.harvard.edu/~minilek/cs229r/index.html

More information

Lecture 01 August 31, 2017

Lecture 01 August 31, 2017 Sketching Algorithms for Big Data Fall 2017 Prof. Jelani Nelson Lecture 01 August 31, 2017 Scribe: Vinh-Kha Le 1 Overview In this lecture, we overviewed the six main topics covered in the course, reviewed

More information

Lecture 2 Sept. 8, 2015

Lecture 2 Sept. 8, 2015 CS 9r: Algorithms for Big Data Fall 5 Prof. Jelani Nelson Lecture Sept. 8, 5 Scribe: Jeffrey Ling Probability Recap Chebyshev: P ( X EX > λ) < V ar[x] λ Chernoff: For X,..., X n independent in [, ],

More information

CS 591, Lecture 7 Data Analytics: Theory and Applications Boston University

CS 591, Lecture 7 Data Analytics: Theory and Applications Boston University CS 591, Lecture 7 Data Analytics: Theory and Applications Boston University Babis Tsourakakis February 13th, 2017 Bloom Filter Approximate membership problem Highly space-efficient randomized data structure

More information

The space complexity of approximating the frequency moments

The space complexity of approximating the frequency moments The space complexity of approximating the frequency moments Felix Biermeier November 24, 2015 1 Overview Introduction Approximations of frequency moments lower bounds 2 Frequency moments Problem Estimate

More information

Lecture 3 Sept. 4, 2014

Lecture 3 Sept. 4, 2014 CS 395T: Sublinear Algorithms Fall 2014 Prof. Eric Price Lecture 3 Sept. 4, 2014 Scribe: Zhao Song In today s lecture, we will discuss the following problems: 1. Distinct elements 2. Turnstile model 3.

More information

Lecture 5: Probabilistic tools and Applications II

Lecture 5: Probabilistic tools and Applications II T-79.7003: Graphs and Networks Fall 2013 Lecture 5: Probabilistic tools and Applications II Lecturer: Charalampos E. Tsourakakis Oct. 11, 2013 5.1 Overview In the first part of today s lecture we will

More information

1 Approximate Quantiles and Summaries

1 Approximate Quantiles and Summaries CS 598CSC: Algorithms for Big Data Lecture date: Sept 25, 2014 Instructor: Chandra Chekuri Scribe: Chandra Chekuri Suppose we have a stream a 1, a 2,..., a n of objects from an ordered universe. For simplicity

More information

1 Estimating Frequency Moments in Streams

1 Estimating Frequency Moments in Streams CS 598CSC: Algorithms for Big Data Lecture date: August 28, 2014 Instructor: Chandra Chekuri Scribe: Chandra Chekuri 1 Estimating Frequency Moments in Streams A significant fraction of streaming literature

More information

15-451/651: Design & Analysis of Algorithms September 13, 2018 Lecture #6: Streaming Algorithms last changed: August 30, 2018

15-451/651: Design & Analysis of Algorithms September 13, 2018 Lecture #6: Streaming Algorithms last changed: August 30, 2018 15-451/651: Design & Analysis of Algorithms September 13, 2018 Lecture #6: Streaming Algorithms last changed: August 30, 2018 Today we ll talk about a topic that is both very old (as far as computer science

More information

2 How many distinct elements are in a stream?

2 How many distinct elements are in a stream? Dealing with Massive Data January 31, 2011 Lecture 2: Distinct Element Counting Lecturer: Sergei Vassilvitskii Scribe:Ido Rosen & Yoonji Shin 1 Introduction We begin by defining the stream formally. Definition

More information

Lecture 4: Sampling, Tail Inequalities

Lecture 4: Sampling, Tail Inequalities Lecture 4: Sampling, Tail Inequalities Variance and Covariance Moment and Deviation Concentration and Tail Inequalities Sampling and Estimation c Hung Q. Ngo (SUNY at Buffalo) CSE 694 A Fun Course 1 /

More information

Some notes on streaming algorithms continued

Some notes on streaming algorithms continued U.C. Berkeley CS170: Algorithms Handout LN-11-9 Christos Papadimitriou & Luca Trevisan November 9, 016 Some notes on streaming algorithms continued Today we complete our quick review of streaming algorithms.

More information

Stanford University CS254: Computational Complexity Handout 8 Luca Trevisan 4/21/2010

Stanford University CS254: Computational Complexity Handout 8 Luca Trevisan 4/21/2010 Stanford University CS254: Computational Complexity Handout 8 Luca Trevisan 4/2/200 Counting Problems Today we describe counting problems and the class #P that they define, and we show that every counting

More information

Discrete Random Variables

Discrete Random Variables Discrete Random Variables An Undergraduate Introduction to Financial Mathematics J. Robert Buchanan 2014 Introduction The markets can be thought of as a complex interaction of a large number of random

More information

Lecture 2: Streaming Algorithms

Lecture 2: Streaming Algorithms CS369G: Algorithmic Techniques for Big Data Spring 2015-2016 Lecture 2: Streaming Algorithms Prof. Moses Chariar Scribes: Stephen Mussmann 1 Overview In this lecture, we first derive a concentration inequality

More information

14.1 Finding frequent elements in stream

14.1 Finding frequent elements in stream Chapter 14 Streaming Data Model 14.1 Finding frequent elements in stream A very useful statistics for many applications is to keep track of elements that occur more frequently. It can come in many flavours

More information

CSCI8980 Algorithmic Techniques for Big Data September 12, Lecture 2

CSCI8980 Algorithmic Techniques for Big Data September 12, Lecture 2 CSCI8980 Algorithmic Techniques for Big Data September, 03 Dr. Barna Saha Lecture Scribe: Matt Nohelty Overview We continue our discussion on data streaming models where streams of elements are coming

More information

Introduction to Randomized Algorithms III

Introduction to Randomized Algorithms III Introduction to Randomized Algorithms III Joaquim Madeira Version 0.1 November 2017 U. Aveiro, November 2017 1 Overview Probabilistic counters Counting with probability 1 / 2 Counting with probability

More information

CPSC 490 Problem Solving in Computer Science

CPSC 490 Problem Solving in Computer Science CPSC 490 Problem Solving in Computer Science Lecture 6: Recovering Solutions from DP, Bitmask DP, Matrix Exponentiation David Zheng and Daniel Du Based on CPSC490 slides from 2014-2017 2018/01/23 Announcements

More information

Lecture 2. Frequency problems

Lecture 2. Frequency problems 1 / 43 Lecture 2. Frequency problems Ricard Gavaldà MIRI Seminar on Data Streams, Spring 2015 Contents 2 / 43 1 Frequency problems in data streams 2 Approximating inner product 3 Computing frequency moments

More information

ECE 302: Probabilistic Methods in Electrical Engineering

ECE 302: Probabilistic Methods in Electrical Engineering ECE 302: Probabilistic Methods in Electrical Engineering Test I : Chapters 1 3 3/22/04, 7:30 PM Print Name: Read every question carefully and solve each problem in a legible and ordered manner. Make sure

More information

Frequency Estimators

Frequency Estimators Frequency Estimators Outline for Today Randomized Data Structures Our next approach to improving performance. Count-Min Sketches A simple and powerful data structure for estimating frequencies. Count Sketches

More information

Discrete Random Variables

Discrete Random Variables Discrete Random Variables An Undergraduate Introduction to Financial Mathematics J. Robert Buchanan Introduction The markets can be thought of as a complex interaction of a large number of random processes,

More information

Discrete Mathematics for CS Spring 2007 Luca Trevisan Lecture 20

Discrete Mathematics for CS Spring 2007 Luca Trevisan Lecture 20 CS 70 Discrete Mathematics for CS Spring 2007 Luca Trevisan Lecture 20 Today we shall discuss a measure of how close a random variable tends to be to its expectation. But first we need to see how to compute

More information

Lecture 2 Sep 5, 2017

Lecture 2 Sep 5, 2017 CS 388R: Randomized Algorithms Fall 2017 Lecture 2 Sep 5, 2017 Prof. Eric Price Scribe: V. Orestis Papadigenopoulos and Patrick Rall NOTE: THESE NOTES HAVE NOT BEEN EDITED OR CHECKED FOR CORRECTNESS 1

More information

CSE548, AMS542: Analysis of Algorithms, Spring 2014 Date: May 12. Final In-Class Exam. ( 2:35 PM 3:50 PM : 75 Minutes )

CSE548, AMS542: Analysis of Algorithms, Spring 2014 Date: May 12. Final In-Class Exam. ( 2:35 PM 3:50 PM : 75 Minutes ) CSE548, AMS54: Analysis of Algorithms, Spring 014 Date: May 1 Final In-Class Exam ( :35 PM 3:50 PM : 75 Minutes ) This exam will account for either 15% or 30% of your overall grade depending on your relative

More information

Lecture 6 September 13, 2016

Lecture 6 September 13, 2016 CS 395T: Sublinear Algorithms Fall 206 Prof. Eric Price Lecture 6 September 3, 206 Scribe: Shanshan Wu, Yitao Chen Overview Recap of last lecture. We talked about Johnson-Lindenstrauss (JL) lemma [JL84]

More information

Data Streams & Communication Complexity

Data Streams & Communication Complexity Data Streams & Communication Complexity Lecture 1: Simple Stream Statistics in Small Space Andrew McGregor, UMass Amherst 1/25 Data Stream Model Stream: m elements from universe of size n, e.g., x 1, x

More information

Hierarchical Sampling from Sketches: Estimating Functions over Data Streams

Hierarchical Sampling from Sketches: Estimating Functions over Data Streams Hierarchical Sampling from Sketches: Estimating Functions over Data Streams Sumit Ganguly 1 and Lakshminath Bhuvanagiri 2 1 Indian Institute of Technology, Kanpur 2 Google Inc., Bangalore Abstract. We

More information

Data Stream Methods. Graham Cormode S. Muthukrishnan

Data Stream Methods. Graham Cormode S. Muthukrishnan Data Stream Methods Graham Cormode graham@dimacs.rutgers.edu S. Muthukrishnan muthu@cs.rutgers.edu Plan of attack Frequent Items / Heavy Hitters Counting Distinct Elements Clustering items in Streams Motivating

More information

CS483 Design and Analysis of Algorithms

CS483 Design and Analysis of Algorithms CS483 Design and Analysis of Algorithms Chapter 2 Divide and Conquer Algorithms Instructor: Fei Li lifei@cs.gmu.edu with subject: CS483 Office hours: Room 5326, Engineering Building, Thursday 4:30pm -

More information

6 Filtering and Streaming

6 Filtering and Streaming Casus ubique valet; semper tibi pendeat hamus: Quo minime credas gurgite, piscis erit. [Luck affects everything. Let your hook always be cast. Where you least expect it, there will be a fish.] Publius

More information

CS5314 Randomized Algorithms. Lecture 15: Balls, Bins, Random Graphs (Hashing)

CS5314 Randomized Algorithms. Lecture 15: Balls, Bins, Random Graphs (Hashing) CS5314 Randomized Algorithms Lecture 15: Balls, Bins, Random Graphs (Hashing) 1 Objectives Study various hashing schemes Apply balls-and-bins model to analyze their performances 2 Chain Hashing Suppose

More information

Lecture 10. Sublinear Time Algorithms (contd) CSC2420 Allan Borodin & Nisarg Shah 1

Lecture 10. Sublinear Time Algorithms (contd) CSC2420 Allan Borodin & Nisarg Shah 1 Lecture 10 Sublinear Time Algorithms (contd) CSC2420 Allan Borodin & Nisarg Shah 1 Recap Sublinear time algorithms Deterministic + exact: binary search Deterministic + inexact: estimating diameter in a

More information

CS 591, Lecture 6 Data Analytics: Theory and Applications Boston University

CS 591, Lecture 6 Data Analytics: Theory and Applications Boston University CS 591, Lecture 6 Data Analytics: Theory and Applications Boston University Babis Tsourakakis February 8th, 2017 Universal hash family Notation: Universe U = {0,..., u 1}, index space M = {0,..., m 1},

More information

CS 237: Probability in Computing

CS 237: Probability in Computing CS 237: Probability in Computing Wayne Snyder Computer Science Department Boston University Lecture 11: Geometric Distribution Poisson Process Poisson Distribution Geometric Distribution The Geometric

More information

Lecture 1: Introduction to Sublinear Algorithms

Lecture 1: Introduction to Sublinear Algorithms CSE 522: Sublinear (and Streaming) Algorithms Spring 2014 Lecture 1: Introduction to Sublinear Algorithms March 31, 2014 Lecturer: Paul Beame Scribe: Paul Beame Too much data, too little time, space for

More information

X = X X n, + X 2

X = X X n, + X 2 CS 70 Discrete Mathematics for CS Fall 2003 Wagner Lecture 22 Variance Question: At each time step, I flip a fair coin. If it comes up Heads, I walk one step to the right; if it comes up Tails, I walk

More information

Testing k-wise Independence over Streaming Data

Testing k-wise Independence over Streaming Data Testing k-wise Independence over Streaming Data Kai-Min Chung Zhenming Liu Michael Mitzenmacher Abstract Following on the work of Indyk and McGregor [5], we consider the problem of identifying correlations

More information

Recitation 6. Randomization. 6.1 Announcements. RandomLab has been released, and is due Monday, October 2. It s worth 100 points.

Recitation 6. Randomization. 6.1 Announcements. RandomLab has been released, and is due Monday, October 2. It s worth 100 points. Recitation 6 Randomization 6.1 Announcements RandomLab has been released, and is due Monday, October 2. It s worth 100 points. FingerLab will be released after Exam I, which is going to be on Wednesday,

More information

Lecture Lecture 25 November 25, 2014

Lecture Lecture 25 November 25, 2014 CS 224: Advanced Algorithms Fall 2014 Lecture Lecture 25 November 25, 2014 Prof. Jelani Nelson Scribe: Keno Fischer 1 Today Finish faster exponential time algorithms (Inclusion-Exclusion/Zeta Transform,

More information

Parity Checker Example. EECS150 - Digital Design Lecture 9 - Finite State Machines 1. Formal Design Process. Formal Design Process

Parity Checker Example. EECS150 - Digital Design Lecture 9 - Finite State Machines 1. Formal Design Process. Formal Design Process Parity Checker Example A string of bits has even parity if the number of 1 s in the string is even. Design a circuit that accepts a bit-serial stream of bits and outputs a 0 if the parity thus far is even

More information

Topic: Sampling, Medians of Means method and DNF counting Date: October 6, 2004 Scribe: Florin Oprea

Topic: Sampling, Medians of Means method and DNF counting Date: October 6, 2004 Scribe: Florin Oprea 15-859(M): Randomized Algorithms Lecturer: Shuchi Chawla Topic: Sampling, Medians of Means method and DNF counting Date: October 6, 200 Scribe: Florin Oprea 8.1 Introduction In this lecture we will consider

More information

Improved Concentration Bounds for Count-Sketch

Improved Concentration Bounds for Count-Sketch Improved Concentration Bounds for Count-Sketch Gregory T. Minton 1 Eric Price 2 1 MIT MSR New England 2 MIT IBM Almaden UT Austin 2014-01-06 Gregory T. Minton, Eric Price (IBM) Improved Concentration Bounds

More information

CMPSCI 240: Reasoning Under Uncertainty

CMPSCI 240: Reasoning Under Uncertainty CMPSCI 240: Reasoning Under Uncertainty Lecture 5 Prof. Hanna Wallach wallach@cs.umass.edu February 7, 2012 Reminders Pick up a copy of B&T Check the course website: http://www.cs.umass.edu/ ~wallach/courses/s12/cmpsci240/

More information

Lecture 4 Thursday Sep 11, 2014

Lecture 4 Thursday Sep 11, 2014 CS 224: Advanced Algorithms Fall 2014 Lecture 4 Thursday Sep 11, 2014 Prof. Jelani Nelson Scribe: Marco Gentili 1 Overview Today we re going to talk about: 1. linear probing (show with 5-wise independence)

More information

variance of independent variables: sum of variances So chebyshev predicts won t stray beyond stdev.

variance of independent variables: sum of variances So chebyshev predicts won t stray beyond stdev. Announcements No class monday. Metric embedding seminar. Review expectation notion of high probability. Markov. Today: Book 4.1, 3.3, 4.2 Chebyshev. Remind variance, standard deviation. σ 2 = E[(X µ X

More information

CS 591, Lecture 2 Data Analytics: Theory and Applications Boston University

CS 591, Lecture 2 Data Analytics: Theory and Applications Boston University CS 591, Lecture 2 Data Analytics: Theory and Applications Boston University Charalampos E. Tsourakakis January 25rd, 2017 Probability Theory The theory of probability is a system for making better guesses.

More information

Cell-Probe Lower Bounds for Prefix Sums and Matching Brackets

Cell-Probe Lower Bounds for Prefix Sums and Matching Brackets Cell-Probe Lower Bounds for Prefix Sums and Matching Brackets Emanuele Viola July 6, 2009 Abstract We prove that to store strings x {0, 1} n so that each prefix sum a.k.a. rank query Sumi := k i x k can

More information

Lecture 13 March 7, 2017

Lecture 13 March 7, 2017 CS 224: Advanced Algorithms Spring 2017 Prof. Jelani Nelson Lecture 13 March 7, 2017 Scribe: Hongyao Ma Today PTAS/FPTAS/FPRAS examples PTAS: knapsack FPTAS: knapsack FPRAS: DNF counting Approximation

More information

Intractable Problems Part Two

Intractable Problems Part Two Intractable Problems Part Two Announcements Problem Set Five graded; will be returned at the end of lecture. Extra office hours today after lecture from 4PM 6PM in Clark S250. Reminder: Final project goes

More information

CS5112: Algorithms and Data Structures for Applications

CS5112: Algorithms and Data Structures for Applications CS5112: Algorithms and Data Structures for Applications Lecture 14: Exponential decay; convolution Ramin Zabih Some content from: Piotr Indyk; Wikipedia/Google image search; J. Leskovec, A. Rajaraman,

More information

Finite State Machines. CS 447 Wireless Embedded Systems

Finite State Machines. CS 447 Wireless Embedded Systems Finite State Machines CS 447 Wireless Embedded Systems Outline Discrete systems Finite State Machines Transitions Timing Update functions Determinacy and Receptiveness 1 Discrete Systems Operates in sequence

More information

Lecture 4: September Reminder: convergence of sequences

Lecture 4: September Reminder: convergence of sequences 36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 4: September 6 In this lecture we discuss the convergence of random variables. At a high-level, our first few lectures focused

More information

Lecture 2 September 4, 2014

Lecture 2 September 4, 2014 CS 224: Advanced Algorithms Fall 2014 Prof. Jelani Nelson Lecture 2 September 4, 2014 Scribe: David Liu 1 Overview In the last lecture we introduced the word RAM model and covered veb trees to solve the

More information

Bernoulli and Binomial

Bernoulli and Binomial Bernoulli and Binomial Will Monroe July 1, 217 image: Antoine Taveneaux with materials by Mehran Sahami and Chris Piech Announcements: Problem Set 2 Due this Wednesday, 7/12, at 12:3pm (before class).

More information

2. The Binomial Distribution

2. The Binomial Distribution 1 of 11 7/16/2009 6:39 AM Virtual Laboratories > 11. Bernoulli Trials > 1 2 3 4 5 6 2. The Binomial Distribution Basic Theory Suppose that our random experiment is to perform a sequence of Bernoulli trials

More information

Fast Sorting and Selection. A Lower Bound for Worst Case

Fast Sorting and Selection. A Lower Bound for Worst Case Presentation for use with the textbook, Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, 0 Fast Sorting and Selection USGS NEIC. Public domain government image. A Lower Bound

More information

Exponential Distribution and Poisson Process

Exponential Distribution and Poisson Process Exponential Distribution and Poisson Process Stochastic Processes - Lecture Notes Fatih Cavdur to accompany Introduction to Probability Models by Sheldon M. Ross Fall 215 Outline Introduction Exponential

More information

CS 237: Probability in Computing

CS 237: Probability in Computing CS 237: Probability in Computing Wayne Snyder Computer Science Department Boston University Lecture 13: Normal Distribution Exponential Distribution Recall that the Normal Distribution is given by an explicit

More information

U.C. Berkeley CS294: Beyond Worst-Case Analysis Handout 3 Luca Trevisan August 31, 2017

U.C. Berkeley CS294: Beyond Worst-Case Analysis Handout 3 Luca Trevisan August 31, 2017 U.C. Berkeley CS294: Beyond Worst-Case Analysis Handout 3 Luca Trevisan August 3, 207 Scribed by Keyhan Vakil Lecture 3 In which we complete the study of Independent Set and Max Cut in G n,p random graphs.

More information

Randomized Time Space Tradeoffs for Directed Graph Connectivity

Randomized Time Space Tradeoffs for Directed Graph Connectivity Randomized Time Space Tradeoffs for Directed Graph Connectivity Parikshit Gopalan, Richard Lipton, and Aranyak Mehta College of Computing, Georgia Institute of Technology, Atlanta GA 30309, USA. Email:

More information

Example continued. Math 425 Intro to Probability Lecture 37. Example continued. Example

Example continued. Math 425 Intro to Probability Lecture 37. Example continued. Example continued : Coin tossing Math 425 Intro to Probability Lecture 37 Kenneth Harris kaharri@umich.edu Department of Mathematics University of Michigan April 8, 2009 Consider a Bernoulli trials process with

More information

Lecture Examples of problems which have randomized algorithms

Lecture Examples of problems which have randomized algorithms 6.841 Advanced Complexity Theory March 9, 2009 Lecture 10 Lecturer: Madhu Sudan Scribe: Asilata Bapat Meeting to talk about final projects on Wednesday, 11 March 2009, from 5pm to 7pm. Location: TBA. Includes

More information

CS173 Lecture B, November 3, 2015

CS173 Lecture B, November 3, 2015 CS173 Lecture B, November 3, 2015 Tandy Warnow November 3, 2015 CS 173, Lecture B November 3, 2015 Tandy Warnow Announcements Examlet 7 is a take-home exam, and is due November 10, 11:05 AM, in class.

More information

COS 341: Discrete Mathematics

COS 341: Discrete Mathematics COS 341: Discrete Mathematics Final Exam Fall 2006 Print your name General directions: This exam is due on Monday, January 22 at 4:30pm. Late exams will not be accepted. Exams must be submitted in hard

More information

Skip Lists. What is a Skip List. Skip Lists 3/19/14

Skip Lists. What is a Skip List. Skip Lists 3/19/14 Presentation for use with the textbook Data Structures and Algorithms in Java, 6 th edition, by M. T. Goodrich, R. Tamassia, and M. H. Goldwasser, Wiley, 2014 Skip Lists 15 15 23 10 15 23 36 Skip Lists

More information

Notes on Discrete Probability

Notes on Discrete Probability Columbia University Handout 3 W4231: Analysis of Algorithms September 21, 1999 Professor Luca Trevisan Notes on Discrete Probability The following notes cover, mostly without proofs, the basic notions

More information

ELEG 3143 Probability & Stochastic Process Ch. 2 Discrete Random Variables

ELEG 3143 Probability & Stochastic Process Ch. 2 Discrete Random Variables Department of Electrical Engineering University of Arkansas ELEG 3143 Probability & Stochastic Process Ch. 2 Discrete Random Variables Dr. Jingxian Wu wuj@uark.edu OUTLINE 2 Random Variable Discrete Random

More information

CS Communication Complexity: Applications and New Directions

CS Communication Complexity: Applications and New Directions CS 2429 - Communication Complexity: Applications and New Directions Lecturer: Toniann Pitassi 1 Introduction In this course we will define the basic two-party model of communication, as introduced in the

More information

13.2 Fully Polynomial Randomized Approximation Scheme for Permanent of Random 0-1 Matrices

13.2 Fully Polynomial Randomized Approximation Scheme for Permanent of Random 0-1 Matrices CS71 Randoness & Coputation Spring 018 Instructor: Alistair Sinclair Lecture 13: February 7 Disclaier: These notes have not been subjected to the usual scrutiny accorded to foral publications. They ay

More information

CS483 Design and Analysis of Algorithms

CS483 Design and Analysis of Algorithms CS483 Design and Analysis of Algorithms Lecture 6-8 Divide and Conquer Algorithms Instructor: Fei Li lifei@cs.gmu.edu with subject: CS483 Office hours: STII, Room 443, Friday 4:00pm - 6:00pm or by appointments

More information

EXPERIMENT 2 Reaction Time Objectives Theory

EXPERIMENT 2 Reaction Time Objectives Theory EXPERIMENT Reaction Time Objectives to make a series of measurements of your reaction time to make a histogram, or distribution curve, of your measured reaction times to calculate the "average" or mean

More information

Expectation, inequalities and laws of large numbers

Expectation, inequalities and laws of large numbers Chapter 3 Expectation, inequalities and laws of large numbers 3. Expectation and Variance Indicator random variable Let us suppose that the event A partitions the sample space S, i.e. A A S. The indicator

More information

Finite State Machines CS 64: Computer Organization and Design Logic Lecture #15 Fall 2018

Finite State Machines CS 64: Computer Organization and Design Logic Lecture #15 Fall 2018 Finite State Machines CS 64: Computer Organization and Design Logic Lecture #15 Fall 2018 Ziad Matni, Ph.D. Dept. of Computer Science, UCSB Administrative The Last 2 Weeks of CS 64: Date L # Topic Lab

More information

Welcome to MAT 137! Course website:

Welcome to MAT 137! Course website: Welcome to MAT 137! Course website: http://uoft.me/ Read the course outline Office hours to be posted here Online forum: Piazza Precalculus review: http://uoft.me/precalc If you haven t gotten an email

More information

Notes 6 : First and second moment methods

Notes 6 : First and second moment methods Notes 6 : First and second moment methods Math 733-734: Theory of Probability Lecturer: Sebastien Roch References: [Roc, Sections 2.1-2.3]. Recall: THM 6.1 (Markov s inequality) Let X be a non-negative

More information

Lecture 26. Sequence Algorithms (Continued)

Lecture 26. Sequence Algorithms (Continued) Lecture 26 Sequence Algorithms (Continued) Announcements for This Lecture Lab/Finals Lab 12 is the final lab Can use Consulting hours Due next Wednesday 9:30 Final: Dec 10 th 2:00-4:30pm Study guide is

More information

20.1 2SAT. CS125 Lecture 20 Fall 2016

20.1 2SAT. CS125 Lecture 20 Fall 2016 CS125 Lecture 20 Fall 2016 20.1 2SAT We show yet another possible way to solve the 2SAT problem. Recall that the input to 2SAT is a logical expression that is the conunction (AND) of a set of clauses,

More information

Big Data. Big data arises in many forms: Common themes:

Big Data. Big data arises in many forms: Common themes: Big Data Big data arises in many forms: Physical Measurements: from science (physics, astronomy) Medical data: genetic sequences, detailed time series Activity data: GPS location, social network activity

More information

Lecture 16. Error-free variable length schemes (contd.): Shannon-Fano-Elias code, Huffman code

Lecture 16. Error-free variable length schemes (contd.): Shannon-Fano-Elias code, Huffman code Lecture 16 Agenda for the lecture Error-free variable length schemes (contd.): Shannon-Fano-Elias code, Huffman code Variable-length source codes with error 16.1 Error-free coding schemes 16.1.1 The Shannon-Fano-Elias

More information

Math489/889 Stochastic Processes and Advanced Mathematical Finance Solutions for Homework 7

Math489/889 Stochastic Processes and Advanced Mathematical Finance Solutions for Homework 7 Math489/889 Stochastic Processes and Advanced Mathematical Finance Solutions for Homework 7 Steve Dunbar Due Mon, November 2, 2009. Time to review all of the information we have about coin-tossing fortunes

More information

1 Randomized Computation

1 Randomized Computation CS 6743 Lecture 17 1 Fall 2007 1 Randomized Computation Why is randomness useful? Imagine you have a stack of bank notes, with very few counterfeit ones. You want to choose a genuine bank note to pay at

More information

CS 361: Probability & Statistics

CS 361: Probability & Statistics February 19, 2018 CS 361: Probability & Statistics Random variables Markov s inequality This theorem says that for any random variable X and any value a, we have A random variable is unlikely to have an

More information

Lecture Lecture 9 October 1, 2015

Lecture Lecture 9 October 1, 2015 CS 229r: Algorithms for Big Data Fall 2015 Lecture Lecture 9 October 1, 2015 Prof. Jelani Nelson Scribe: Rachit Singh 1 Overview In the last lecture we covered the distance to monotonicity (DTM) and longest

More information

Linear Sketches A Useful Tool in Streaming and Compressive Sensing

Linear Sketches A Useful Tool in Streaming and Compressive Sensing Linear Sketches A Useful Tool in Streaming and Compressive Sensing Qin Zhang 1-1 Linear sketch Random linear projection M : R n R k that preserves properties of any v R n with high prob. where k n. M =

More information

Outline. We will cover (over the next few weeks) Induction Strong Induction Constructive Induction Structural Induction

Outline. We will cover (over the next few weeks) Induction Strong Induction Constructive Induction Structural Induction Outline We will cover (over the next few weeks) Induction Strong Induction Constructive Induction Structural Induction Induction P(1) ( n 2)[P(n 1) P(n)] ( n 1)[P(n)] Why Does This Work? I P(1) ( n 2)[P(n

More information

Entropy. Probability and Computing. Presentation 22. Probability and Computing Presentation 22 Entropy 1/39

Entropy. Probability and Computing. Presentation 22. Probability and Computing Presentation 22 Entropy 1/39 Entropy Probability and Computing Presentation 22 Probability and Computing Presentation 22 Entropy 1/39 Introduction Why randomness and information are related? An event that is almost certain to occur

More information

Lecture 2: Discrete Probability Distributions

Lecture 2: Discrete Probability Distributions Lecture 2: Discrete Probability Distributions IB Paper 7: Probability and Statistics Carl Edward Rasmussen Department of Engineering, University of Cambridge February 1st, 2011 Rasmussen (CUED) Lecture

More information

ECE302 Spring 2015 HW10 Solutions May 3,

ECE302 Spring 2015 HW10 Solutions May 3, ECE32 Spring 25 HW Solutions May 3, 25 Solutions to HW Note: Most of these solutions were generated by R. D. Yates and D. J. Goodman, the authors of our textbook. I have added comments in italics where

More information

POISSON PROCESSES 1. THE LAW OF SMALL NUMBERS

POISSON PROCESSES 1. THE LAW OF SMALL NUMBERS POISSON PROCESSES 1. THE LAW OF SMALL NUMBERS 1.1. The Rutherford-Chadwick-Ellis Experiment. About 90 years ago Ernest Rutherford and his collaborators at the Cavendish Laboratory in Cambridge conducted

More information

Lecture Notes 17. Randomness: The verifier can toss coins and is allowed to err with some (small) probability if it is unlucky in its coin tosses.

Lecture Notes 17. Randomness: The verifier can toss coins and is allowed to err with some (small) probability if it is unlucky in its coin tosses. CS 221: Computational Complexity Prof. Salil Vadhan Lecture Notes 17 March 31, 2010 Scribe: Jonathan Ullman 1 Interactive Proofs ecall the definition of NP: L NP there exists a polynomial-time V and polynomial

More information

Lecture 4: LMN Learning (Part 2)

Lecture 4: LMN Learning (Part 2) CS 294-114 Fine-Grained Compleity and Algorithms Sept 8, 2015 Lecture 4: LMN Learning (Part 2) Instructor: Russell Impagliazzo Scribe: Preetum Nakkiran 1 Overview Continuing from last lecture, we will

More information

Combinatorial Proof of the Hot Spot Theorem

Combinatorial Proof of the Hot Spot Theorem Combinatorial Proof of the Hot Spot Theorem Ernie Croot May 30, 2006 1 Introduction A problem which has perplexed mathematicians for a long time, is to decide whether the digits of π are random-looking,

More information

CMPT 882 Machine Learning

CMPT 882 Machine Learning CMPT 882 Machine Learning Lecture Notes Instructor: Dr. Oliver Schulte Scribe: Qidan Cheng and Yan Long Mar. 9, 2004 and Mar. 11, 2004-1 - Basic Definitions and Facts from Statistics 1. The Binomial Distribution

More information

MATH 341, Section 001 FALL 2014 Introduction to the Language and Practice of Mathematics

MATH 341, Section 001 FALL 2014 Introduction to the Language and Practice of Mathematics MATH 341, Section 001 FALL 2014 Introduction to the Language and Practice of Mathematics Class Meetings: MW 9:30-10:45 am in EMS E424A, September 3 to December 10 [Thanksgiving break November 26 30; final

More information

Math 180A. Lecture 16 Friday May 7 th. Expectation. Recall the three main probability density functions so far (1) Uniform (2) Exponential.

Math 180A. Lecture 16 Friday May 7 th. Expectation. Recall the three main probability density functions so far (1) Uniform (2) Exponential. Math 8A Lecture 6 Friday May 7 th Epectation Recall the three main probability density functions so far () Uniform () Eponential (3) Power Law e, ( ), Math 8A Lecture 6 Friday May 7 th Epectation Eample

More information

Lectures One Way Permutations, Goldreich Levin Theorem, Commitments

Lectures One Way Permutations, Goldreich Levin Theorem, Commitments Lectures 11 12 - One Way Permutations, Goldreich Levin Theorem, Commitments Boaz Barak March 10, 2010 From time immemorial, humanity has gotten frequent, often cruel, reminders that many things are easier

More information

Probability Theory and Simulation Methods

Probability Theory and Simulation Methods Feb 28th, 2018 Lecture 10: Random variables Countdown to midterm (March 21st): 28 days Week 1 Chapter 1: Axioms of probability Week 2 Chapter 3: Conditional probability and independence Week 4 Chapters

More information