CS206 Review Sheet 3 October 24, 2018


After our intense focus on counting, we continue with the study of some more of the basic notions from Probability (though counting will remain in our thoughts).

An important concept is that of the Random Variable: Consider a random experiment E that has probability space (S, P). A random variable X is a function from the sample space S to the reals. For example, in the experiment of tossing a fair coin three times, let X be the profit if you receive a dollar for each Head and pay a dollar for each Tail.

    w in S:   HHH   HHT   HTH   HTT   THH   THT   TTH   TTT
    X(w):       3     1     1    -1     1    -1    -1    -3

The outcomes of E (symbols) are in the top row; the values of the random variable X (real numbers) are in the bottom. Define Range(X) = {all possible values of X} = {distinct X(w) : w in S}; notice that it is a set, not a multi-set. In the example above, Range(X) = {3, 1, -1, -3} = {a_1, ..., a_4}.

Fact 1: |Range(X)| <= |S|: equality can occur only if X takes a distinct value for each w in S; otherwise there are outcomes w_1 != w_2 in S for which X(w_1) = X(w_2). In the example above |S| = 8, twice the size of Range(X).

Fact 2: Suppose Range(X) = {a_1, ..., a_k}. The events A_i = {w in S : X(w) = a_i}, i = 1, ..., k, partition S. They form what is called the partition induced by X, and we write A_X = {A_1, ..., A_k}. For each a_i in Range(X) we define

    f_X(a_i) = P(A_i) = P({w \in S : X(w) = a_i}).

Also, if t in R is NOT an element of Range(X), we define f_X(t) = 0. The function f_X is called the frequency function of X (some books call it the probability mass function).

Fact 3: Because A_X partitions S,

    \sum_{a_i \in Range(X)} f_X(a_i) = 1.

Therefore f_X is a probability measure on Range(X), and we now understand that the random variable X has mapped the original probability space (S, P) into a new one (Range(X), f_X).
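The little Python sketch below (not part of the original sheet; the variable names are my own) tabulates the frequency function f_X for the three-coin example above, assuming each of the eight outcomes has probability 1/8, and checks Fact 3.

    from itertools import product
    from collections import defaultdict

    # Sketch: tabulate the frequency function f_X for the three-coin example.
    # Assumption: the coin is fair, so each of the 8 outcomes has probability 1/8.
    S = [''.join(w) for w in product('HT', repeat=3)]   # sample space
    P = {w: 1/8 for w in S}                             # uniform probability on S

    def X(w):
        return w.count('H') - w.count('T')              # profit: +1 per Head, -1 per Tail

    f_X = defaultdict(float)
    for w in S:
        f_X[X(w)] += P[w]                               # f_X(a) = P({w : X(w) = a})

    print(sorted(f_X.items()))   # [(-3, 0.125), (-1, 0.375), (1, 0.375), (3, 0.125)]
    print(sum(f_X.values()))     # 1.0, as Fact 3 requires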

Independence: Random variables X and Z are independent iff their partitions A_X and A_Z are. For this we need P(A ∩ B) = P(A)P(B) for each A in A_X and each B in A_Z (so no hint about a value of X alters your assessment of probabilities for values of Z). Pairwise, k-wise, and mutual independence for a sequence X_1, ..., X_n of random variables is defined via the events in their partitions.

Independent Trials: We have an experiment E with sample space T = {t_1, ..., t_k} and we use probability P on T. E is performed n times in succession, each under identical conditions. We will refer to this sequence of repetitions of the experiment as a composite experiment E^(n) = E_1, ..., E_n, where E_i denotes the i-th performance. The composite sample space for the n repetitions is

    S^(n) = {w = (w_1, ..., w_n) : w_i \in T is the outcome of E_i} = T \times T \times ... \times T   (n times).

Fact 1: |S^(n)| = |T|^n = k^n.

Product Probability: The points of S^(n) can be assigned probabilities in infinitely many ways. We would like to do it in such a way that both

1. the original probability P is respected: for every trial E_i and every t_j in T, the probability that t_j occurs on the i-th repetition should be P(t_j); in other words, we want

    Prob{w = (w_1, ..., w_n) \in S^(n) : w_i = t_j} = P(t_j),   (1)

and

2. the repetitions are independent.

The product probability measure P^(n) on S^(n) is defined by

    P^(n)(w) = P(w_1) P(w_2) ... P(w_n)   for all w = (w_1, ..., w_n) \in S^(n).   (2)

It is not hard to show that

Fact 2: P^(n) really is a probability on S^(n); i.e., \sum_w P^(n)(w) = 1, the sum running over all outcomes w in S^(n). Also

Fact 3: Product probability respects P on T; i.e., (1) holds for Prob = P^(n). Finally,

Fact 4: P^(n) captures the independence of the trials in a very strong way. If A_1, ..., A_n are events and A_i depends only on the outcome of the i-th repetition (i.e., whether w = (w_1, ..., w_n) lies in A_i depends on w_i and not on the other coordinates), then the A_i are mutually independent. Also, random variables X_1, ..., X_n are mutually independent as long as the value of X_i(w) depends only on w_i. In fact it can be shown that

Fact 5: If a probability on S^(n) satisfies (1), and if events A_1, ..., A_n are mutually independent whenever the occurrence of A_i depends only on the outcome of the i-th trial, then it is the product probability P^(n) defined in (2).
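As an illustration (my own sketch, not from the sheet), here is the product probability (2) for a hypothetical three-outcome experiment T with a non-uniform P and n = 3 repetitions; it checks Facts 2 and 3 numerically.

    from itertools import product

    # Sketch: the product probability P^(n) of equation (2) for a hypothetical
    # experiment with T = {a, b, c}, a non-uniform P, and n = 3 repetitions.
    T = ['a', 'b', 'c']
    P = {'a': 0.5, 'b': 0.3, 'c': 0.2}
    n = 3

    S_n = list(product(T, repeat=n))        # composite sample space, k^n points (Fact 1)

    def P_n(w):
        prob = 1.0
        for w_i in w:
            prob *= P[w_i]                  # P^(n)(w) = P(w_1) P(w_2) ... P(w_n)
        return prob

    # Fact 2: P^(n) is a probability on S^(n).
    print(sum(P_n(w) for w in S_n))                     # 1.0 (up to rounding)

    # Fact 3: (1) holds, e.g. the chance that repetition i = 2 yields 'b' is P('b').
    print(sum(P_n(w) for w in S_n if w[1] == 'b'))      # 0.3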

Bernoulli Trials: E has two outcomes, success (s) and failure (f), with P = P(s) = 1 - P(f). E^(n), the repetition of E n times, is called n Bernoulli trials with success probability P. For each trial i = 1, ..., n define

    X_i = 1 if trial i is a success, and X_i = 0 if trial i is a failure,

the indicator (of success) for the i-th trial, and S_n = X_1 + ... + X_n measures the number of successes that occur in the n trials.

Fact 1: Range(S_n) = {0, 1, ..., n}. The equation

    f_{S_n}(k) = Prob(S_n = k) = \binom{n}{k} P^k (1-P)^{n-k}   (3)

defines the binomial frequency function, k = 0, 1, ..., n. [The product probability of a particular sequence of n trials that results in k successes is P^k (1-P)^{n-k}. There are \binom{n}{k} such sequences, one for each distinct way to choose which k trials result in success.]

Fact 2: As a function of k, it increases up to about nP and decreases thereafter.

Infinite Sequences of Bernoulli Trials: Let E be a Bernoulli trial experiment with success probability P.

1. (k = 1) Consider the composite experiment "repeat E until a success occurs." The composite sample space is S = {s, fs, ffs, ...}, with |S| infinite. The interesting random variable is W_1, the number of repetitions; i.e., the waiting time for the first success.

Fact 1: Range(W_1) = {1, 2, ...}. The equation

    f_{W_1}(n) = Prob(W_1 = n) = (1-P)^{n-1} P   (4)

defines the geometric frequency function, n = 1, 2, ...

2. (general case) Now consider the composite experiment "repeat E until the k-th success occurs," k >= 1. The composite sample space is S = {s...s, fs...s, sfs...s, ..., s...sfs, ...} (the first word has k symbols, the next ones shown have k+1 symbols, and so on), with |S| infinite. The interesting random variable is W_k, the number of repetitions; i.e., the waiting time for the k-th success.

Fact 2: Range(W_k) = {k, k+1, ...}. The equation

    f_{W_k}(n) = Prob(W_k = n) = \binom{n-1}{k-1} (1-P)^{n-k} P^k   (5)

defines the negative binomial frequency function, n = k, k+1, ...
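For concreteness, the frequency functions (3)-(5) can be coded directly; the Python sketch below (my own, not from the sheet) uses math.comb for the binomial coefficients and checks numerically that each sums to 1 over its range (the infinite ranges are truncated), with P = 0.3 chosen arbitrarily.

    from math import comb

    # Sketch: the frequency functions (3)-(5); P = 0.3 is an arbitrary choice.
    def binomial_pmf(k, n, P):
        return comb(n, k) * P**k * (1 - P)**(n - k)           # equation (3)

    def geometric_pmf(n, P):
        return (1 - P)**(n - 1) * P                           # equation (4)

    def neg_binomial_pmf(n, k, P):
        return comb(n - 1, k - 1) * (1 - P)**(n - k) * P**k   # equation (5)

    P = 0.3
    print(sum(binomial_pmf(k, 10, P) for k in range(11)))         # 1.0
    print(sum(geometric_pmf(n, P) for n in range(1, 200)))        # ~1.0 (truncated sum)
    print(sum(neg_binomial_pmf(n, 4, P) for n in range(4, 400)))  # ~1.0 (truncated sum)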

Expectation: The notion of random variable provides a basic and useful way to study aspects of a random experiment that are of particular interest. One of the most useful properties of a random variable is the expectation. Let X be a random variable defined on a probability space (S, P). Its expectation (expected value, mean) is defined by

    E(X) = \sum_{w \in S} X(w) P(w).   (6)

This is a probability-weighted average of the values of X. Suppose Range(X) = {a_1, ..., a_k}, and that A_X = {A_1, ..., A_k} is the partition induced by X. Breaking the sum in (6) into sums over the events A_i in A_X, we get

Fact 1:

    E(X) = \sum_{w \in A_1} X(w) P(w) + ... + \sum_{w \in A_k} X(w) P(w)
         = a_1 P(X = a_1) + ... + a_k P(X = a_k)
         = \sum_{a_i \in Range(X)} a_i f_X(a_i).

Fact 2: Expectation is linear; that is, E(aX + bY + c) = a E(X) + b E(Y) + c for any random variables X and Y and reals a, b, c. It is interesting to take a = b = 0, and also to take a = b = 1, c = 0. The latter extends by induction to show

    E(X_1 + ... + X_n) = E(X_1) + ... + E(X_n).   (7)

Fact 3 (mean of the geometric): Let W_1 be the wait for the first success in Bernoulli trials with success probability P. Then E(W_1) = 1/P. (You expect to wait twice as long for an event that is half as likely.) This was proved by showing

    \sum_{n=1}^{\infty} n P(W_1 = n) = \sum_{n=1}^{\infty} n P (1-P)^{n-1} = 1/P.

Fact 4 (mean of the negative binomial): Let W_k be the wait for the k-th success in Bernoulli trials with success probability P. Then E(W_k) = k/P. This implies the identity

    \sum_{n=k}^{\infty} n P(W_k = n) = P^k \sum_{n=k}^{\infty} n \binom{n-1}{k-1} (1-P)^{n-k} = k/P,

and is proved probabilistically by noting that W_k = X_1 + ... + X_k, where X_1 is the wait for the first success and X_{i+1} is the wait for the first success after the i-th; use (7) and note that each X_i is geometric.

Fact 5 (mean of the binomial): Let S_n be the number of successes in n Bernoulli trials with success probability P. Then E(S_n) = nP. This implies the identity

    \sum_{k=0}^{n} k P(S_n = k) = \sum_{k=0}^{n} k \binom{n}{k} P^k (1-P)^{n-k} = nP,

and is proved using indicators: S_n = X_1 + ... + X_n, where X_i, the indicator (of success) for the i-th trial, has E(X_i) = P. Now use (7).
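A quick numerical check of Facts 3-5 (again my own sketch, not part of the sheet): plug the frequency functions (3)-(5) into Fact 1 and compare with nP, 1/P, and k/P. The parameters are arbitrary and the infinite sums are truncated at 2000 terms.

    from math import comb

    # Sketch: numerical check of Facts 3-5 via Fact 1, using the frequency
    # functions (3)-(5); P, n, k are arbitrary, infinite sums truncated at 2000.
    P, n, k = 0.3, 10, 4

    E_Sn = sum(j * comb(n, j) * P**j * (1 - P)**(n - j) for j in range(n + 1))
    print(E_Sn, n * P)        # both 3.0: E(S_n) = nP

    E_W1 = sum(m * (1 - P)**(m - 1) * P for m in range(1, 2000))
    print(E_W1, 1 / P)        # both ~3.333: E(W_1) = 1/P

    E_Wk = sum(m * comb(m - 1, k - 1) * (1 - P)**(m - k) * P**k for m in range(k, 2000))
    print(E_Wk, k / P)        # both ~13.333: E(W_k) = k/P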

Coupon Collecting: There are n coupon types, each type equally likely. You collect coupons (i.e., sample from the n coupons with replacement) until you have seen r of the types (so it is likely you have sampled many more than r times). The expected wait needed to collect r different coupon types is

    1 + n/(n-1) + ... + n/(n-r+1) = n [ 1/n + 1/(n-1) + ... + 1/(n-r+1) ]

(after you have seen j types you are waiting for an event, a new type, which has probability P = (n-j)/n). An interesting case is r = n. The expected wait to collect the whole set of n coupons is about n log n (natural log), using the fact that 1 + 1/2 + ... + 1/n is approximately log_e n.

The Method of Indicators: We already applied this method to establish the mean of the binomial random variable in Fact 5. Here is another example: Let E be the experiment of placing r balls randomly into n boxes, and let N count the number of empty boxes. The computation of E(N) using

    E(N) = \sum_{a_i \in Range(N)} a_i f_N(a_i)

could be difficult, since you would need to compute the entire frequency function f_N. However, let X_i be the indicator of the event that box i is empty. Then E(X_i) = 1 * P(box i is empty) = (1 - 1/n)^r (why?). Now notice that N = X_1 + ... + X_n, so E(N) = E(X_1) + ... + E(X_n) by linearity of expectation, and we easily obtain E(N) = n(1 - 1/n)^r.

For another example, in class we applied the same method to show that the expected number of people who get their own hats in the n-hat experiment is 1, independent of n (!!!): if X_i is the indicator of the event that person i gets her own hat (so E(X_i) = 1/n) and N is the total number who get their own hats, then N = X_1 + ... + X_n and E(N) = E(X_1) + ... + E(X_n) = 1.
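Both formulas above are easy to confirm by simulation; the Python sketch below (mine, with arbitrary choices of seed, number of runs, and parameters n = 10, r = 7) compares the exact expressions with Monte Carlo estimates.

    import random

    # Sketch: Monte Carlo check of the coupon-collecting and empty-box results;
    # the seed, the number of runs, and n = 10, r = 7 are arbitrary choices.
    random.seed(0)
    n, r, runs = 10, 7, 100_000

    # Coupon collecting: expected wait to see r of n types is n(1/n + ... + 1/(n-r+1)).
    exact_wait = n * sum(1 / m for m in range(n - r + 1, n + 1))
    sim_wait = 0.0
    for _ in range(runs):
        seen, draws = set(), 0
        while len(seen) < r:
            seen.add(random.randrange(n))
            draws += 1
        sim_wait += draws / runs
    print(exact_wait, sim_wait)          # close agreement

    # Empty boxes: E(N) = n(1 - 1/n)^r when r balls are placed into n boxes.
    exact_empty = n * (1 - 1 / n)**r
    sim_empty = sum((n - len({random.randrange(n) for _ in range(r)})) / runs
                    for _ in range(runs))
    print(exact_empty, sim_empty)        # close agreement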