CS 125 Section #12 (More) Probability and Randomized Algorithms 11/24/14

1 Probability

First, recall a couple of useful facts from last time about probability:

- Linearity of expectation: $E(aX + bY) = aE(X) + bE(Y)$, even when $X$ and $Y$ are not independent.
- Geometric sums: $\sum_{i=0}^{\infty} p^i = \frac{1}{1-p}$ (for $0 < p < 1$) and $\sum_{i=0}^{n} p^i = \frac{1 - p^{n+1}}{1-p}$ (for all $p \neq 1$).
- For random variables $X$ which only take on nonnegative integer values, $E(X) = \sum_{j=0}^{\infty} P(X > j)$.
- Geometric random variables: If some event happens with probability $p$, the expected number of tries we need in order to get that event to occur is $1/p$.
- Markov's inequality: For any nonnegative random variable $X$ and $\lambda > 0$, $P(X > \lambda\, E(X)) < \frac{1}{\lambda}$.

Exercise (Fixed points in permutations). Suppose we pick a uniformly random permutation on $n$ elements. First, how would we do it? (One standard approach is sketched in code after the next exercise.) And then, what is the expected number of fixed points it has?

Exercise (Special Dice). Consider the following game: there are three dice, each with its faces labeled by integers. You are allowed to choose what you think is the best die, after which your opponent will choose another die. Each of you rolls your die, and the person who rolls the larger number receives $1 from the other player. Should you always play?
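Returning to the fixed-points exercise: one standard way to generate a uniformly random permutation is the Fisher-Yates shuffle. The sketch below (not part of the original handout; the function names are my own) samples permutations this way and estimates the expected number of fixed points empirically.

```python
import random

def random_permutation(n):
    """Fisher-Yates shuffle: returns a uniformly random permutation of 0..n-1."""
    perm = list(range(n))
    for i in range(n - 1, 0, -1):
        j = random.randint(0, i)          # uniform over {0, ..., i}
        perm[i], perm[j] = perm[j], perm[i]
    return perm

def count_fixed_points(perm):
    """Number of positions i with perm[i] == i."""
    return sum(1 for i, x in enumerate(perm) if i == x)

if __name__ == "__main__":
    n, trials = 10, 100_000
    avg = sum(count_fixed_points(random_permutation(n)) for _ in range(trials)) / trials
    # Linearity of expectation predicts an average of 1, independent of n.
    print(f"average fixed points over {trials} trials: {avg:.3f}")
```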

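For the Special Dice exercise, the handout does not specify the face labels. As a stand-in, the sketch below uses one well-known set of labels (an assumption for illustration only) and compares any two dice by enumerating pairs of faces; you can swap in whatever labels you like.

```python
from itertools import product

def prob_beats(die_a, die_b):
    """Probability that a roll of die_a is strictly larger than a roll of die_b."""
    wins = sum(1 for a, b in product(die_a, die_b) if a > b)
    return wins / (len(die_a) * len(die_b))

# One classic labeling (an illustrative assumption, not taken from the handout).
A = (2, 2, 4, 4, 9, 9)
B = (1, 1, 6, 6, 8, 8)
C = (3, 3, 5, 5, 7, 7)

for name, (x, y) in [("A vs B", (A, B)), ("B vs C", (B, C)), ("C vs A", (C, A))]:
    print(name, prob_beats(x, y))   # each matchup prints 5/9 ~ 0.556
```

With these labels, whichever die you pick, your opponent can pick one that beats it more than half the time, which is worth keeping in mind when answering "should you always play?"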
Exercise (Coupon Collector). There are $n$ coupons $C_1, \ldots, C_n$. Every time you buy a magazine, a uniformly random coupon $C_i$ is inside. How many magazines do you have to buy, in expectation, to get all $n$ coupons? How many magazines do you have to buy to get all $n$ coupons with a failure probability of at most $P$? (A short simulation appears after the next exercise.)

Exercise (Election Problem (Challenge)). America is having its next presidential election between two candidates, and for each of the $n$ states you know the probability that candidate 1 wins that state is $P_i$. For simplicity we'll assume that independent candidates don't exist, so the probability candidate 2 wins is $1 - P_i$. Because of the electoral college, state $i$ is worth $S_i$ points: when a candidate wins a state, he/she gets all $S_i$ points. To win, a candidate must get a strict majority of the electoral votes. Assuming a tie isn't possible, what's the probability that candidate 1 wins?
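As a sanity check for the Coupon Collector exercise, the following sketch (an illustration added here, not from the handout) simulates the process and compares the empirical average to $n H_n = n \sum_{i=1}^{n} 1/i$, the standard harmonic-sum answer.

```python
import random

def coupons_needed(n):
    """Simulate buying magazines until all n coupon types have been seen."""
    seen, bought = set(), 0
    while len(seen) < n:
        seen.add(random.randrange(n))   # each magazine holds a uniformly random coupon
        bought += 1
    return bought

if __name__ == "__main__":
    n, trials = 50, 10_000
    avg = sum(coupons_needed(n) for _ in range(trials)) / trials
    harmonic = n * sum(1 / i for i in range(1, n + 1))   # n * H_n
    print(f"empirical average: {avg:.1f}, n*H_n: {harmonic:.1f}")
```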

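For the Election Problem, one natural approach is dynamic programming over the number of electoral votes candidate 1 has won so far. The sketch below is my own (the function name and the tiny example inputs are hypothetical, not from the handout); it computes the exact probability in time proportional to $n$ times the total number of electoral votes.

```python
def prob_candidate1_wins(p, s):
    """p[i]: probability candidate 1 wins state i; s[i]: electoral votes of state i.
    Returns the probability candidate 1 gets a strict majority of all votes."""
    total = sum(s)
    # dp[v] = probability that candidate 1 has exactly v votes from the states processed so far
    dp = [0.0] * (total + 1)
    dp[0] = 1.0
    for pi, si in zip(p, s):
        new = [0.0] * (total + 1)
        for v, prob in enumerate(dp):
            if prob == 0.0:
                continue
            new[v + si] += prob * pi        # candidate 1 wins state i
            new[v] += prob * (1 - pi)       # candidate 2 wins state i
        dp = new
    return sum(dp[v] for v in range(total // 2 + 1, total + 1))  # strict majority

# Tiny hypothetical example with three states.
print(prob_candidate1_wins([0.6, 0.5, 0.9], [3, 2, 4]))
```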
2 Probabilistic Algorithms

We have looked at probabilistic algorithms with bounded running time, but that may be incorrect with a small probability. More specifically:

- RP is the class of languages for which there exists a polynomial-time randomized algorithm which always returns correctly for inputs in the language, and which returns correctly at least half the time for inputs not in the language.
- BPP is the class of languages for which there exists a polynomial-time randomized algorithm which returns correctly at least 2/3 of the time on any input.

Note that the randomness is over the coins of the algorithm, not over the input.

Oftentimes, we can decrease the failure probability by running the algorithm multiple times. For example, if we use our algorithm to check some computation but can be fooled with some probability, we can just run it multiple times. Indeed, this can be used to show that it is sufficient for a BPP algorithm to be correct at least $\frac{1}{2} + \frac{1}{p(n)}$ of the time for any polynomial $p(n)$ in the length of the input.

2.1 Hashing

Remember from class that a hash function is a mapping $h : \{0, \ldots, n-1\} \to \{0, \ldots, m-1\}$. In general, because one of the goals of a hash function is to save space, we will see that $m \ll n$. Hashing-based data structures are useful since they ideally allow for constant-time operations (lookup, adding, and deletion), although collisions can change that if, for example, $n$ is too large or you use a poorly chosen hash function. (Why?) In practice, there are several ways to deal with collisions while hashing, such as:

- Chaining: Each bucket holds a set of all items that are hashed to it. Simply add $x$ to this set. (A small sketch follows this list.)
- Linear probing: If $h(x)$ already has an item, try $h(x) + 1$, $h(x) + 2$, etc. until you find an empty location (all taken mod $m$).
- Perfect hashing: If the set is known and fixed, generate a hash function for that specific set. However, this does not support insertions and deletions.
- Automatic resizing: Once the number of elements exceeds $cm$ for some constant $c$, make the hash table larger by doubling $m$. In practice, a load factor of $c = 0.75$ is fairly common.
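To make the chaining strategy concrete, here is a toy hash table with chained buckets and automatic resizing at load factor 0.75. It is a minimal sketch with made-up names, not the data structure from class; all operations are expected constant time as long as the load factor stays bounded.

```python
class ChainedHashTable:
    """Toy hash table that resolves collisions by chaining (a list per bucket)."""

    def __init__(self, m=8):
        self.m = m
        self.buckets = [[] for _ in range(m)]
        self.size = 0

    def _bucket(self, key):
        return self.buckets[hash(key) % self.m]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # overwrite an existing key
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        self.size += 1
        if self.size > 0.75 * self.m:    # automatic resizing: double m
            self._resize(2 * self.m)

    def lookup(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None

    def delete(self, key):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                self.size -= 1
                return True
        return False

    def _resize(self, new_m):
        items = [(k, v) for bucket in self.buckets for (k, v) in bucket]
        self.m, self.buckets, self.size = new_m, [[] for _ in range(new_m)], 0
        for k, v in items:
            self.insert(k, v)
```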

2.2 Document Similarity

Consider two sets of numbers, $A$ and $B$. For concreteness, we will assume that $A$ and $B$ are subsets of 64-bit numbers. We may define the resemblance of $A$ and $B$ as $\mathrm{resemblance}(A, B) = R(A, B) = \frac{|A \cap B|}{|A \cup B|}$. The resemblance is a real number between 0 and 1. Intuitively, the resemblance accurately captures how close the two sets are. To calculate this, we could sort the sets and compare, but this is still rather slow, so we'd like to do better. We will do this by computing an approximation to the set resemblance, rather than the exact value.

Let $\pi_1$ be some random permutation of the numbers $1, 2, \ldots, 2^{64}$ (that is, it is a bijection $\{1, \ldots, 2^{64}\} \to \{1, \ldots, 2^{64}\}$). Let $\pi_1(A) = \{\pi_1(x) : x \in A\}$.

Exercise. When does $\min\{\pi_1(A)\} = \min\{\pi_1(B)\}$?

Exercise. What is $\Pr[\min\{\pi_1(A)\} = \min\{\pi_1(B)\}]$?

This gives us a way to estimate the resemblance. Instead of taking just one permutation, we take many, say 100. For each set $A$, we preprocess by computing $\min\{\pi_j(A)\}$ for $j = 1$ to 100, and store these values. To estimate the resemblance of two sets $A$ and $B$, we count how often the minima are the same, and divide by 100. It is like each permutation gives us a coin flip, where the probability of a heads (a match) is exactly the resemblance $R(A, B)$ of the two sets. If you have some more experience with this type of probability, you might think about how we could bound the accuracy of this method; that is, you want to make some statement of the form "the algorithm is within $\epsilon$ of the correct set resemblance with probability at least $1 - \delta$" for some $\epsilon, \delta \in [0, 1]$.

Now, we can use this to think about similarity between documents via a method called shingling. That is, divide the document into pieces of consecutive words and hash the results into a set $S_D$. Once we have the shingles for the document, we associate a document sketch with each document. The sketch of a document $S_D$ is a list of, say, 100 numbers: $(\min\{\pi_1(S_D)\}, \min\{\pi_2(S_D)\}, \min\{\pi_3(S_D)\}, \ldots, \min\{\pi_{100}(S_D)\})$.

Exercise. How well does this method do? More concretely, suppose that we mark two documents as similar if their sketches match on at least 90 values. What is the probability that this happens if the set resemblance of the two sets of shingles is $r$?
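Below is a minimal sketch of this estimator. In place of truly random permutations of $\{1, \ldots, 2^{64}\}$ it uses salted hashes, a common practical substitution that is my assumption rather than something the handout specifies; for documents, `items` would be the set of hashed shingles $S_D$.

```python
import random

def make_sketch(items, num_perms=100, seed=0):
    """Return (min pi_1(S), ..., min pi_k(S)), approximating each random
    permutation pi_j by a salted hash of the item."""
    rng = random.Random(seed)
    salts = [rng.getrandbits(64) for _ in range(num_perms)]
    return [min(hash((salt, x)) for x in items) for salt in salts]

def estimated_resemblance(sketch_a, sketch_b):
    """Fraction of coordinates on which the two sketches agree."""
    matches = sum(1 for a, b in zip(sketch_a, sketch_b) if a == b)
    return matches / len(sketch_a)

# Toy example: true resemblance is |A ∩ B| / |A ∪ B| = 60/100 = 0.6.
A = set(range(0, 80))
B = set(range(20, 100))
print(estimated_resemblance(make_sketch(A), make_sketch(B)))
```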

3 Random Walks

A random walk is an iterative process on a set of vertices $V$. In each step, you move from the current vertex $v_0$ to each $v \in V$ with some probability. The simplest version of a random walk is a one-dimensional random walk in which the vertices are the integers, you start at 0, and at each step you either move up one (with probability 1/2) or down one (with probability 1/2).

2-SAT: In lecture, we gave the following randomized algorithm for solving 2-SAT. Start with some truth assignment, say by setting all the variables to false. Find some clause that is not yet satisfied. Randomly choose one of the variables in that clause, say by flipping a coin, and change its value. Continue this process until either all clauses are satisfied or you get tired of flipping coins. (A code sketch of this walk appears after the exercises below.)

We used a random walk with a completely reflecting boundary at 0 to model our randomized solution to 2-SAT. Fix some solution $S$ and keep track of the number of variables $k$ consistent with the solution $S$. In each step, we either increase or decrease $k$ by one. Using this model, we showed that the expected running time of our algorithm is $O(n^2)$.

Exercise. This weekend, you decide to go to a casino and gamble. You start with $k$ dollars, and you decide that if you ever have $n \geq k$ dollars, you will take your winnings and go home. Assuming that at each step you either win $1 or lose $1 (with equal probability), what is the probability that you instead lose all your money?

Exercise. Now suppose that there are $n + 1$ people in a circle numbered $0, 1, \ldots, n$. Person 0 starts with a bag of candy. At each step, the person with the bag of candy passes it either left or right with equal probability. The last person to receive the bag wins (and gets to keep all the candy). So if you were playing, then you would want to receive the bag only after everyone else has received the bag at least once. What is the probability that person $i$ wins?
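Here is a minimal sketch of the randomized 2-SAT walk described above. The clause encoding (each literal as a (variable index, desired truth value) pair) and the step budget are my own choices, not the handout's.

```python
import random

def random_walk_2sat(n, clauses, max_steps=None):
    """Randomized 2-SAT: clauses is a list of ((var, val), (var, val)) pairs,
    satisfied if either variable takes its listed value. Returns a satisfying
    assignment, or None if we 'get tired of flipping coins'."""
    if max_steps is None:
        max_steps = 100 * n * n          # generous multiple of the O(n^2) expected steps
    assignment = [False] * n             # start with all variables false
    for _ in range(max_steps):
        unsat = [c for c in clauses
                 if not any(assignment[var] == val for var, val in c)]
        if not unsat:
            return assignment
        # pick an unsatisfied clause, then flip a coin over its two variables
        var, _ = random.choice(random.choice(unsat))
        assignment[var] = not assignment[var]
    return None

# Tiny hypothetical instance: (x0 or x1) and (not x0 or x1) and (x0 or not x1).
clauses = [((0, True), (1, True)), ((0, False), (1, True)), ((0, True), (1, False))]
print(random_walk_2sat(2, clauses))      # expected output: [True, True]
```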
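And a short simulation for the casino exercise, handy for checking your intuition against the exact answer (the particular values of $k$ and $n$ below are arbitrary):

```python
import random

def goes_broke(k, n):
    """Start with k dollars; win or lose $1 with equal probability each step.
    Return True if we hit 0 before reaching n dollars."""
    money = k
    while 0 < money < n:
        money += random.choice((1, -1))
    return money == 0

k, n, trials = 3, 10, 20_000
print(sum(goes_broke(k, n) for _ in range(trials)) / trials)   # compare with your exact answer
```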