CS155: Probability and Computing: Randomized Algorithms and Probabilistic Analysis

CS155: Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Instructor: Eli Upfal (Eli Upfal@brown.edu), Office: 319. TAs: Lorenzo De Stefani and Sorin Vatasoiu (cs155tas@cs.brown.edu).

It is remarkable that this science, which originated in the consideration of games and chances, should have become the most important object of human knowledge... The most important questions of life are, for the most part, really only problems of probability. Pierre Simon, Marquis de Laplace (1749-1827).

Why Probability in Computing? The ideal, naive computation paradigm: a deterministic algorithm efficiently computes a correct output for any input. In reality: Algorithms are often randomized (there is no deterministic algorithm, or the randomized algorithm is simpler and more efficient). The input is assumed to come from a distribution - probabilistic analysis (no efficient worst-case solution, or iterative computation on unknown future input). The correct output is not known and must be learned from examples.

Why Probability in Computing? Almost any advanced computing application today has some randomization/statistical/machine learning components: network security, cryptography, web search and web advertising, spam filtering, social network tools, recommendation systems (Amazon, Netflix, ...), communication protocols, computational finance, systems biology, DNA sequencing and analysis, data mining.

Probability and Computing. Randomized algorithms - random steps help! - cryptography and security, fast algorithms, simulations. Probabilistic analysis of algorithms - why problems that are hard to solve in theory are often not that hard in practice. Statistical inference - machine learning, data mining, ... All are based on the same (mostly discrete) probability theory principles and techniques.

Course Details - Main Topics: Review of basic probability theory through analysis of randomized algorithms; large deviations: Chernoff and Hoeffding bounds; martingales (in discrete space); the theory of learning, PAC learning and VC-dimension; Monte Carlo methods, the Metropolis algorithm, ...; convergence of Markov chain Monte Carlo methods; the probabilistic method; the Poisson process and some queueing theory; ...

Course Details. Prerequisite: CS 145, CS 45, or equivalent (first three chapters in the textbook). Textbook:

Homeworks, Midterm and Final: Weekly assignments. Typeset in LaTeX (or readable, as if typed) - template on the website. Concise and correct proofs. Can work together - but write in your own words. Work must be submitted on time. Midterm and final: take-home exams, absolutely no collaboration, cheaters get a C.

Verifying Matrix Multiplication. Given three n × n matrices A, B, and C over a Boolean field, we want to verify that AB = C. Standard method: matrix multiplication takes Θ(n^3) operations (Θ(n^{2.37}) with fast matrix multiplication).

Randomized algorithm:
1. Choose a random vector r = (r_1, r_2, ..., r_n) ∈ {0,1}^n.
2. Compute Br.
3. Compute A(Br).
4. Compute Cr.
5. If A(Br) ≠ Cr return AB ≠ C, else return AB = C.
The algorithm takes Θ(n^2) time. Theorem: If AB ≠ C, and r is chosen uniformly at random from {0,1}^n, then Pr(ABr = Cr) ≤ 1/2.
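As an illustration, here is a minimal Python sketch of one round of this verifier (the function name freivalds_check and the use of NumPy are our own choices, not from the slides; arithmetic is taken mod 2 to match the Boolean field in the problem statement):

import numpy as np

def freivalds_check(A, B, C):
    # One round of randomized verification of AB = C (entries mod 2).
    # A False return certifies AB != C; a True return may be wrong
    # with probability at most 1/2 when AB != C.
    n = C.shape[0]
    r = np.random.randint(0, 2, size=n)  # r chosen uniformly from {0,1}^n
    Br = B.dot(r) % 2                    # compute Br      -- Theta(n^2)
    ABr = A.dot(Br) % 2                  # compute A(Br)   -- Theta(n^2)
    Cr = C.dot(r) % 2                    # compute Cr      -- Theta(n^2)
    return np.array_equal(ABr, Cr)

Each round costs only three matrix-vector products, i.e. Θ(n^2) time, rather than a full matrix multiplication.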

Probability Space. Definition: A probability space has three components:
1. A sample space Ω, which is the set of all possible outcomes of the random process modeled by the probability space;
2. A family of sets F representing the allowable events, where each set in F is a subset of the sample space Ω;
3. A probability function Pr : F → R, satisfying the definition below.
An element of Ω is a simple event. In a discrete probability space we use F = 2^Ω.

Probability Function. Definition: A probability function is any function Pr : F → R that satisfies the following conditions:
1. For any event E, 0 ≤ Pr(E) ≤ 1;
2. Pr(Ω) = 1;
3. For any finite or countably infinite sequence of pairwise mutually disjoint events E_1, E_2, E_3, ..., Pr(∪_{i≥1} E_i) = Σ_{i≥1} Pr(E_i).
The probability of an event is the sum of the probabilities of its simple events.

Independent Events. Definition: Two events E and F are independent if and only if Pr(E ∩ F) = Pr(E) · Pr(F). More generally, events E_1, E_2, ..., E_k are mutually independent if and only if for any subset I ⊆ [1, k], Pr(∩_{i∈I} E_i) = ∏_{i∈I} Pr(E_i).

Verifying Matrix Multiplication. Randomized algorithm:
1. Choose a random vector r = (r_1, r_2, ..., r_n) ∈ {0,1}^n.
2. Compute Br.
3. Compute A(Br).
4. Compute Cr.
5. If A(Br) ≠ Cr return AB ≠ C, else return AB = C.
The algorithm takes Θ(n^2) time. Theorem: If AB ≠ C, and r is chosen uniformly at random from {0,1}^n, then Pr(ABr = Cr) ≤ 1/2.

Lemma: Choosing r = (r_1, r_2, ..., r_n) ∈ {0,1}^n uniformly at random is equivalent to choosing each r_i independently and uniformly from {0, 1}. Proof. If each r_i is chosen independently and uniformly at random, each of the 2^n possible vectors r is chosen with probability 2^{-n}, giving the lemma.

Proof: Let D = AB - C ≠ 0. ABr = Cr implies that Dr = 0. Since D ≠ 0 it has some non-zero entry; assume d_{11} ≠ 0. For Dr = 0, it must be the case that Σ_{j=1}^n d_{1j} r_j = 0, or equivalently r_1 = -(Σ_{j=2}^n d_{1j} r_j) / d_{11}. (1) Here we use d_{11} ≠ 0.

Principle of Deferred Decision. Assume that we fixed r_2, ..., r_n. The right-hand side of r_1 = -(Σ_{j=2}^n d_{1j} r_j) / d_{11} (2) is then already determined; the only random variable left is r_1. The probability that r_1 equals this value is at most 1/2.

More formally, summing over all collections of values (x_2, x_3, x_4, ..., x_n) ∈ {0,1}^{n-1}, we have
Pr(ABr = Cr)
= Σ_{(x_2,...,x_n) ∈ {0,1}^{n-1}} Pr(ABr = Cr | (r_2,...,r_n) = (x_2,...,x_n)) · Pr((r_2,...,r_n) = (x_2,...,x_n))
= Σ_{(x_2,...,x_n) ∈ {0,1}^{n-1}} Pr((ABr = Cr) ∩ ((r_2,...,r_n) = (x_2,...,x_n)))
≤ Σ_{(x_2,...,x_n) ∈ {0,1}^{n-1}} Pr((r_1 = -Σ_{j=2}^n d_{1j} r_j / d_{11}) ∩ ((r_2,...,r_n) = (x_2,...,x_n)))
= Σ_{(x_2,...,x_n) ∈ {0,1}^{n-1}} Pr(r_1 = -Σ_{j=2}^n d_{1j} r_j / d_{11}) · Pr((r_2,...,r_n) = (x_2,...,x_n))
≤ Σ_{(x_2,...,x_n) ∈ {0,1}^{n-1}} (1/2) · Pr((r_2,...,r_n) = (x_2,...,x_n))
= 1/2.

Computing Conditional Probabilities. Definition: The conditional probability that event E_1 occurs given that event E_2 occurs is Pr(E_1 | E_2) = Pr(E_1 ∩ E_2) / Pr(E_2). The conditional probability is only well-defined if Pr(E_2) > 0. By conditioning on E_2 we restrict the sample space to the set E_2. Thus we are interested in Pr(E_1 ∩ E_2) normalized by Pr(E_2).

Theorem (Law of Total Probability): Let E_1, E_2, ..., E_n be mutually disjoint events in the sample space Ω with ∪_{i=1}^n E_i = Ω. Then Pr(B) = Σ_{i=1}^n Pr(B ∩ E_i) = Σ_{i=1}^n Pr(B | E_i) Pr(E_i). Proof. Since the events E_i, i = 1, ..., n, are disjoint and cover the entire sample space Ω, Pr(B) = Σ_{i=1}^n Pr(B ∩ E_i) = Σ_{i=1}^n Pr(B | E_i) Pr(E_i).
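As a small sanity check (not from the slides), the identity can be verified by direct enumeration on a toy sample space of two fair dice, with B = "the sum is 7" and E_i = "the first die shows i":

from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))       # ordered pairs of fair dice, each w.p. 1/36
pr_B = Fraction(sum(1 for (a, b) in omega if a + b == 7), 36)
# Sum of Pr(B | E_i) * Pr(E_i) over the partition E_1, ..., E_6
total = sum(Fraction(sum(1 for b in range(1, 7) if i + b == 7), 6) * Fraction(1, 6)
            for i in range(1, 7))
assert pr_B == total == Fraction(1, 6)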

Smaller Error Probability. The test has a one-sided error, and repeated tests are independent. Run the test k times, and accept AB = C if it passed all k tests. Theorem: The probability of making a mistake is at most (1/2)^k.
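A hedged sketch of this amplification, reusing the freivalds_check helper sketched earlier (verify_product is our own name):

def verify_product(A, B, C, k=20):
    # Accept AB = C only if all k independent one-sided tests pass.
    # If AB != C, each run passes with probability at most 1/2,
    # so the probability of a wrong acceptance is at most (1/2)**k.
    return all(freivalds_check(A, B, C) for _ in range(k))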

Bayes' Law. Theorem (Bayes' Law): Assume that E_1, E_2, ..., E_n are mutually disjoint sets such that ∪_{i=1}^n E_i = Ω. Then Pr(E_j | B) = Pr(E_j ∩ B) / Pr(B) = Pr(B | E_j) Pr(E_j) / Σ_{i=1}^n Pr(B | E_i) Pr(E_i).

Application: Finding a Biased Coin We are given three coins, two of the coins are fair and the third coin is biased, landing heads with probability 2/3. We need to identify the biased coin. We flip each of the coins. The first and second coins come up heads, and the third comes up tails. What is the probability that the first coin is the biased one?

Let E_i be the event that the i-th coin flipped is the biased one, and let B be the event that the three coin flips came up heads, heads, and tails. Before we flip the coins we have Pr(E_i) = 1/3 for i = 1, ..., 3, thus Pr(B | E_1) = Pr(B | E_2) = (2/3) · (1/2) · (1/2) = 1/6, and Pr(B | E_3) = (1/2) · (1/2) · (1/3) = 1/12. Applying Bayes' law we have Pr(E_1 | B) = Pr(B | E_1) Pr(E_1) / Σ_{i=1}^3 Pr(B | E_i) Pr(E_i) = 2/5. The outcome of the three coin flips increases the probability that the first coin is the biased one from 1/3 to 2/5.
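The arithmetic can be checked exactly with a few lines of Python (a sketch; the variable and function names are ours):

from fractions import Fraction

prior = [Fraction(1, 3)] * 3                     # each coin equally likely to be the biased one

def pr_B_given_biased(i):
    # B = "flips came up heads, heads, tails"; coin i lands heads w.p. 2/3, the others w.p. 1/2
    heads = [Fraction(1, 2)] * 3
    heads[i] = Fraction(2, 3)
    return heads[0] * heads[1] * (1 - heads[2])

likelihoods = [pr_B_given_biased(i) for i in range(3)]   # 1/6, 1/6, 1/12
posterior = prior[0] * likelihoods[0] / sum(p * l for p, l in zip(prior, likelihoods))
assert posterior == Fraction(2, 5)               # matches the 2/5 computed above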

Bayesian approach: Start with a prior model, giving some initial value to the model parameters. This model is then modified, by incorporating new observations, to obtain a posterior model that captures the new information.

Min-Cut

Min-Cut Algorithm. Input: An n-node graph G. Output: A minimal set of edges that disconnects the graph.
1. Repeat n - 2 times:
   1. Pick an edge uniformly at random.
   2. Contract the two vertices connected by that edge; eliminate all edges connecting the two vertices.
2. Output the set of edges connecting the two remaining vertices.
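A compact Python sketch of one run of the contraction algorithm (our own implementation choices, not the course code: the graph is an edge list over vertices 0..n-1, and contractions are tracked with a small union-find):

import random

def karger_min_cut(edges, n):
    # One run of the randomized contraction algorithm on a connected multigraph.
    # edges: list of (u, v) pairs over vertices 0..n-1; parallel edges allowed.
    parent = list(range(n))                      # union-find recording which vertices were merged

    def find(x):                                 # representative of x's contracted super-vertex
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = n
    live = list(edges)                           # edges whose endpoints lie in different super-vertices
    while components > 2:
        u, v = random.choice(live)               # pick a remaining edge uniformly at random
        parent[find(u)] = find(v)                # contract its two endpoints
        components -= 1
        live = [(a, b) for (a, b) in live if find(a) != find(b)]  # drop edges that became self-loops
    return live                                  # the candidate cut between the two remaining super-vertices

A single run succeeds only with probability at least 2/(n(n-1)), which is why the algorithm is repeated many times (see the theorem further below).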

Theorem: The algorithm outputs a min-cut set of edges with probability ≥ 2/(n(n-1)). Lemma: Vertex contraction does not reduce the size of the min-cut set. (Contraction can only increase the size of the min-cut set.) Proof. Every cut set in the new graph is a cut set in the original graph.

Analysis of the Algorithm. Assume that the graph has a min-cut set of k edges. We compute the probability of finding one such set C. Lemma: If no edge of C was contracted, no edge of C was eliminated. Proof. Let X and Y be the two sets of vertices cut by C. If the contracting edge connects two vertices in X (resp. Y), then all its parallel edges also connect vertices in X (resp. Y).

Let E_i = "the edge contracted in iteration i is not in C". Let F_i = ∩_{j=1}^i E_j = "no edge of C was contracted in the first i iterations". We need to compute Pr(F_{n-2}).

Since the minimum cut-set has k edges, all vertices have degree ≥ k, and the graph has ≥ nk/2 edges. There are at least nk/2 edges in the graph and k edges are in C, so Pr(E_1) = Pr(F_1) ≥ 1 - 2k/(nk) = 1 - 2/n.

Assume that the first contraction did not eliminate an edge of C (conditioning on the event E_1 = F_1). After the first vertex contraction we are left with an (n-1)-node graph, with minimum cut-set size still ≥ k, and hence minimum degree ≥ k. The new graph has at least k(n-1)/2 edges. Pr(E_2 | F_1) ≥ 1 - k/(k(n-1)/2) = 1 - 2/(n-1). Similarly, Pr(E_i | F_{i-1}) ≥ 1 - k/(k(n-i+1)/2) = 1 - 2/(n-i+1). We need to compute Pr(F_{n-2}) = Pr(∩_{j=1}^{n-2} E_j).

Useful identities:
Pr(A | B) = Pr(A ∩ B) / Pr(B)
Pr(A ∩ B) = Pr(A | B) Pr(B)
Pr(A ∩ B ∩ C) = Pr(A | B ∩ C) Pr(B ∩ C) = Pr(A | B ∩ C) Pr(B | C) Pr(C)
Let A_1, ..., A_n be a sequence of events, and let E_i = ∩_{j=1}^i A_j. Then
Pr(E_n) = Pr(A_n | E_{n-1}) Pr(E_{n-1}) = Pr(A_n | E_{n-1}) Pr(A_{n-1} | E_{n-2}) ··· Pr(A_2 | E_1) Pr(A_1).

We need to compute Pr(F_{n-2}) = Pr(∩_{j=1}^{n-2} E_j). We use Pr(A ∩ B) = Pr(A | B) Pr(B):
Pr(F_{n-2}) = Pr(E_{n-2} ∩ F_{n-3}) = Pr(E_{n-2} | F_{n-3}) Pr(F_{n-3})
= Pr(E_{n-2} | F_{n-3}) Pr(E_{n-3} | F_{n-4}) ··· Pr(E_2 | F_1) Pr(F_1)
= Pr(F_1) ∏_{j=2}^{n-2} Pr(E_j | F_{j-1}).

The probability that the algorithm computes the minimum cut-set is
Pr(F_{n-2}) = Pr(∩_{j=1}^{n-2} E_j) = Pr(F_1) ∏_{j=2}^{n-2} Pr(E_j | F_{j-1})
≥ ∏_{i=1}^{n-2} (1 - 2/(n-i+1)) = ∏_{i=1}^{n-2} (n-i-1)/(n-i+1)
= ((n-2)/n) · ((n-3)/(n-1)) · ((n-4)/(n-2)) ··· (2/4) · (1/3)
= 2/(n(n-1)).

Theorem: Assume that we run the randomized min-cut algorithm n(n-1) log n times and output the minimum-size cut-set found over all the iterations. The probability that the output is not a min-cut set is bounded by (1 - 2/(n(n-1)))^{n(n-1) log n} ≤ e^{-2 log n} = 1/n^2. Proof. The algorithm has a one-sided error: the output is never smaller than the min-cut value.
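A sketch of this repetition, reusing the karger_min_cut helper sketched earlier (repeated_min_cut is our own name; log is taken as the natural logarithm):

import math

def repeated_min_cut(edges, n):
    # Run the contraction algorithm about n(n-1) log n times and keep the smallest cut found.
    runs = int(n * (n - 1) * math.log(n)) + 1
    best = None
    for _ in range(runs):
        cut = karger_min_cut(edges, n)
        if best is None or len(cut) < len(best):
            best = cut
    return best

By the bound above, the probability that this repeated procedure misses a min-cut set is at most 1/n^2.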

The Taylor series expansion of e^{-x} gives e^{-x} = 1 - x + x^2/2! - ... Thus, for x < 1, 1 - x ≤ e^{-x}.