Part IA Probability. Definitions. Based on lectures by R. Weber. Notes taken by Dexter Chua. Lent 2015

Part IA Probability
Definitions
Based on lectures by R. Weber
Notes taken by Dexter Chua
Lent 2015

These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures. They are nowhere near accurate representations of what was actually lectured, and in particular, all errors are almost surely mine.

Basic concepts
Classical probability, equally likely outcomes. Combinatorial analysis, permutations and combinations. Stirling's formula (asymptotics for log n! proved). [3]

Axiomatic approach
Axioms (countable case). Probability spaces. Inclusion-exclusion formula. Continuity and subadditivity of probability measures. Independence. Binomial, Poisson and geometric distributions. Relation between Poisson and binomial distributions. Conditional probability, Bayes's formula. Examples, including Simpson's paradox. [5]

Discrete random variables
Expectation. Functions of a random variable, indicator function, variance, standard deviation. Covariance, independence of random variables. Generating functions: sums of independent random variables, random sum formula, moments. Conditional expectation. Random walks: gambler's ruin, recurrence relations. Difference equations and their solution. Mean time to absorption. Branching processes: generating functions and extinction probability. Combinatorial applications of generating functions. [7]

Continuous random variables
Distributions and density functions. Expectations; expectation of a function of a random variable. Uniform, normal and exponential random variables. Memoryless property of exponential distribution. Joint distributions: transformation of random variables (including Jacobians), examples. Simulation: generating continuous random variables, independent normal random variables. Geometrical probability: Bertrand's paradox, Buffon's needle. Correlation coefficient, bivariate normal random variables. [6]

Inequalities and limits
Markov's inequality, Chebyshev's inequality. Weak law of large numbers. Convexity: Jensen's inequality for general random variables, AM/GM inequality. Moment generating functions and statement (no proof) of continuity theorem. Statement of central limit theorem and sketch of proof. Examples, including sampling. [3]

Contents

0 Introduction
1 Classical probability
  1.1 Classical probability
  1.2 Counting
  1.3 Stirling's formula
2 Axioms of probability
  2.1 Axioms and definitions
  2.2 Inequalities and formulae
  2.3 Independence
  2.4 Important discrete distributions
  2.5 Conditional probability
3 Discrete random variables
  3.1 Discrete random variables
  3.2 Inequalities
  3.3 Weak law of large numbers
  3.4 Multiple random variables
  3.5 Probability generating functions
4 Interesting problems
  4.1 Branching processes
  4.2 Random walk and gambler's ruin
5 Continuous random variables
  5.1 Continuous random variables
  5.2 Stochastic ordering and inspection paradox
  5.3 Jointly distributed random variables
  5.4 Geometric probability
  5.5 The normal distribution
  5.6 Transformation of random variables
  5.7 Moment generating functions
6 More distributions
  6.1 Cauchy distribution
  6.2 Gamma distribution
  6.3 Beta distribution*
  6.4 More on the normal distribution
  6.5 Multivariate normal
7 Central limit theorem
8 Summary of distributions
  8.1 Discrete distributions
  8.2 Continuous distributions

0 Introduction

1 Classical probability

1.1 Classical probability

Definition (Classical probability). Classical probability applies in a situation when there are a finite number of equally likely outcomes.

Definition (Sample space). The set of all possible outcomes is the sample space, Ω. We can list the outcomes as ω_1, ω_2, … ∈ Ω. Each ω ∈ Ω is an outcome.

Definition (Event). A subset of Ω is called an event.

Definition (Set notations). Given any two events A, B ⊆ Ω,
- The complement of A is A^C = A' = Ā = Ω \ A.
- "A or B" is the set A ∪ B.
- "A and B" is the set A ∩ B.
- A and B are mutually exclusive or disjoint if A ∩ B = ∅.
- If A ⊆ B, then A occurring implies B occurring.

Definition (Probability). Suppose Ω = {ω_1, ω_2, …, ω_N}. Let A ⊆ Ω be an event. Then the probability of A is
  P(A) = (number of outcomes in A)/(number of outcomes in Ω) = |A|/N.

1.2 Counting

Definition (Sampling with replacement). When we sample with replacement, after choosing an item, it is put back and can be chosen again. Then any sampling function f satisfies sampling with replacement.

Definition (Sampling without replacement). When we sample without replacement, after choosing an item, we kill it with fire and cannot choose it again. Then f must be an injective function, and clearly we must have X ≤ n.

Definition (Multinomial coefficient). A multinomial coefficient is
  \binom{n}{n_1, n_2, …, n_k} = \binom{n}{n_1} \binom{n − n_1}{n_2} ⋯ \binom{n − n_1 − ⋯ − n_{k−1}}{n_k} = \frac{n!}{n_1! n_2! ⋯ n_k!}.
It is the number of ways to distribute n items into k positions, in which the i-th position has n_i items.

1.3 Stirling's formula
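To make the multinomial coefficient concrete, here is a minimal Python sketch (the function names are my own, not from the notes) checking that the product-of-binomials form agrees with the factorial form:

import math

def multinomial(n, parts):
    # Product-of-binomials form: choose n_1 of n, then n_2 of the rest, etc.
    assert sum(parts) == n
    result, remaining = 1, n
    for k in parts:
        result *= math.comb(remaining, k)
        remaining -= k
    return result

def multinomial_factorial(n, parts):
    # Direct form n! / (n_1! n_2! ... n_k!)
    out = math.factorial(n)
    for k in parts:
        out //= math.factorial(k)
    return out

assert multinomial(10, [3, 3, 4]) == multinomial_factorial(10, [3, 3, 4]) == 4200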

2 Axioms of probability

2.1 Axioms and definitions

Definition (Probability space). A probability space is a triple (Ω, F, P). Ω is a set called the sample space, F is a collection of subsets of Ω, and P : F → [0, 1] is the probability measure.

F has to satisfy the following axioms:
(i) ∅, Ω ∈ F.
(ii) A ∈ F ⇒ A^C ∈ F.
(iii) A_1, A_2, … ∈ F ⇒ ⋃_{i=1}^∞ A_i ∈ F.

And P has to satisfy the following Kolmogorov axioms:
(i) 0 ≤ P(A) ≤ 1 for all A ∈ F.
(ii) P(Ω) = 1.
(iii) For any countable collection of events A_1, A_2, … which are disjoint, i.e. A_i ∩ A_j = ∅ for all i ≠ j, we have
  P(⋃_i A_i) = Σ_i P(A_i).

Items in Ω are known as the outcomes, items in F are known as the events, and P(A) is the probability of the event A.

Definition (Probability distribution). Let Ω = {ω_1, ω_2, …}. Choose numbers p_1, p_2, … such that Σ_{i=1}^∞ p_i = 1. Let p(ω_i) = p_i. Then define
  P(A) = Σ_{ω_i ∈ A} p(ω_i).
This P(A) satisfies the above axioms, and p_1, p_2, … is the probability distribution.

Definition (Limit of events). A sequence of events A_1, A_2, … is increasing if A_1 ⊆ A_2 ⊆ ⋯. Then we define the limit as
  lim_{n→∞} A_n = ⋃_{n=1}^∞ A_n.
Similarly, if they are decreasing, i.e. A_1 ⊇ A_2 ⊇ ⋯, then
  lim_{n→∞} A_n = ⋂_{n=1}^∞ A_n.

2.2 Inequalities and formulae

2.3 Independence

Definition (Independent events). Two events A and B are independent if
  P(A ∩ B) = P(A)P(B).
Otherwise, they are said to be dependent.

Definition (Independence of multiple events). Events A_1, A_2, … are said to be mutually independent if
  P(A_{i_1} ∩ A_{i_2} ∩ ⋯ ∩ A_{i_r}) = P(A_{i_1}) P(A_{i_2}) ⋯ P(A_{i_r})
for any distinct i_1, i_2, …, i_r and r ≥ 2.

2.4 Important discrete distributions

Definition (Bernoulli distribution). Suppose we toss a coin. Ω = {H, T} and p ∈ [0, 1]. The Bernoulli distribution, denoted B(1, p), has
  P(H) = p;  P(T) = 1 − p.

Definition (Binomial distribution). Suppose we toss a coin n times, each with probability p of getting heads. Then
  P(HHTT⋯T) = p·p·(1 − p) ⋯ (1 − p).
So
  P(two heads) = \binom{n}{2} p^2 (1 − p)^{n−2}.
In general,
  P(k heads) = \binom{n}{k} p^k (1 − p)^{n−k}.
We call this the binomial distribution and write it as B(n, p).

Definition (Geometric distribution). Suppose we toss a coin with probability p of getting heads. The probability of having a head after k consecutive tails is
  p_k = (1 − p)^k p.
This is the geometric distribution. We say it is memoryless because how many tails we have got in the past does not give us any information about how long we will have to wait until we get a head.

Definition (Hypergeometric distribution). Suppose we have an urn with n_1 red balls and n_2 black balls. We choose n balls. The probability that there are k red balls is
  P(k red) = \binom{n_1}{k} \binom{n_2}{n−k} / \binom{n_1 + n_2}{n}.

Definition (Poisson distribution). The Poisson distribution, denoted P(λ), is
  p_k = \frac{λ^k}{k!} e^{−λ}
for k ∈ ℕ.
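The schedule mentions the relation between the Poisson and binomial distributions: for large n and small p, B(n, p) is well approximated by the Poisson distribution with λ = np. A small Python sketch (my own illustration, with arbitrary parameters) comparing the two pmfs:

import math

def binomial_pmf(n, p, k):
    # P(k heads) = C(n, k) p^k (1 - p)^(n - k)
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(lam, k):
    # p_k = lambda^k / k! * e^(-lambda)
    return lam**k / math.factorial(k) * math.exp(-lam)

# With n large and p small, B(n, p) is close to the Poisson with lambda = n p.
n, p = 1000, 0.003
lam = n * p
for k in range(6):
    print(k, round(binomial_pmf(n, p, k), 5), round(poisson_pmf(lam, k), 5))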

2.5 Conditional probability

Definition (Conditional probability). Suppose B is an event with P(B) > 0. For any event A ⊆ Ω, the conditional probability of A given B is
  P(A | B) = \frac{P(A ∩ B)}{P(B)}.
We interpret this as the probability of A happening given that B has happened.

Definition (Partition). A partition of the sample space is a collection of disjoint events {B_i}_{i=0}^∞ such that ⋃_i B_i = Ω.
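Together with a partition, conditional probability gives the law of total probability and Bayes' formula. A minimal Python sketch with made-up numbers (purely illustrative, not from the notes):

# Partition B_0, B_1, B_2 of the sample space with (hypothetical) probabilities,
# and conditional probabilities P(A | B_i).
p_B = [0.5, 0.3, 0.2]
p_A_given_B = [0.1, 0.4, 0.8]

# Law of total probability: P(A) = sum_i P(A | B_i) P(B_i)
p_A = sum(pa * pb for pa, pb in zip(p_A_given_B, p_B))

# Bayes' formula: P(B_j | A) = P(A | B_j) P(B_j) / P(A)
p_B_given_A = [pa * pb / p_A for pa, pb in zip(p_A_given_B, p_B)]

print(p_A)          # 0.33
print(p_B_given_A)  # posterior probabilities, summing to 1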

3 Discrete random variables

3.1 Discrete random variables

Definition (Random variable). A random variable X taking values in a set Ω_X is a function X : Ω → Ω_X. Ω_X is usually a set of numbers, e.g. R or N.

Definition (Discrete random variables). A random variable is discrete if Ω_X is finite or countably infinite.

Notation. Let T ⊆ Ω_X. Define
  P(X ∈ T) = P({ω ∈ Ω : X(ω) ∈ T}),
i.e. the probability that the outcome is in T.

Definition (Discrete uniform distribution). A discrete uniform distribution is a discrete distribution with finitely many possible outcomes, in which each outcome is equally likely.

Notation. We write P_X(x) = P(X = x). We can also write X ∼ B(n, p) to mean
  P(X = r) = \binom{n}{r} p^r (1 − p)^{n−r},
and similarly for the other distributions we have come up with before.

Definition (Expectation). The expectation (or mean) of a real-valued X is equal to
  E[X] = Σ_{ω ∈ Ω} p_ω X(ω),
provided this is absolutely convergent. Otherwise, we say the expectation doesn't exist. Alternatively,
  E[X] = Σ_{x ∈ Ω_X} Σ_{ω : X(ω) = x} p_ω X(ω) = Σ_{x ∈ Ω_X} x Σ_{ω : X(ω) = x} p_ω = Σ_{x ∈ Ω_X} x P(X = x).
We are sometimes lazy and just write EX.

Definition (Variance and standard deviation). The variance of a random variable X is defined as
  var(X) = E[(X − E[X])^2].
The standard deviation is the square root of the variance, √var(X).
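As a concrete illustration of these definitions, the following sketch (my own, using B(3, 1/2) as an example) computes E[X] and var(X) directly from a pmf:

import math

# Compute E[X] and var(X) from a pmf, here for X ~ B(n, p).
n, p = 3, 0.5
pmf = {r: math.comb(n, r) * p**r * (1 - p)**(n - r) for r in range(n + 1)}

mean = sum(x * px for x, px in pmf.items())
var = sum((x - mean)**2 * px for x, px in pmf.items())

print(mean, var)  # 1.5 and 0.75, matching np and np(1 - p)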

Definition (Indicator function). The indicator function or indicator variable I[A] (or I_A) of an event A ⊆ Ω is
  I[A](ω) = 1 if ω ∈ A, and 0 if ω ∉ A.

Definition (Independent random variables). Let X_1, X_2, …, X_n be discrete random variables. They are independent iff for any x_1, x_2, …, x_n,
  P(X_1 = x_1, …, X_n = x_n) = P(X_1 = x_1) ⋯ P(X_n = x_n).

3.2 Inequalities

Definition (Convex function). A function f : (a, b) → R is convex if for all x_1, x_2 ∈ (a, b) and λ_1, λ_2 ≥ 0 such that λ_1 + λ_2 = 1,
  λ_1 f(x_1) + λ_2 f(x_2) ≥ f(λ_1 x_1 + λ_2 x_2).
It is strictly convex if the inequality above is strict (except when x_1 = x_2 or λ_1 or λ_2 = 0).

[Figure: the graph of a convex f, with the chord value λ_1 f(x_1) + λ_2 f(x_2) lying above f, and x_1, λ_1 x_1 + λ_2 x_2, x_2 marked on the axis.]

A function f is concave if −f is convex.

3.3 Weak law of large numbers

3.4 Multiple random variables

Definition (Covariance). Given two random variables X, Y, the covariance is
  cov(X, Y) = E[(X − E[X])(Y − E[Y])].

Definition (Correlation coefficient). The correlation coefficient of X and Y is
  corr(X, Y) = \frac{cov(X, Y)}{√(var(X) var(Y))}.

Definition (Conditional distribution). Let X and Y be random variables (in general not independent) with joint distribution P(X = x, Y = y). Then the marginal distribution (or simply distribution) of X is
  P(X = x) = Σ_{y ∈ Ω_Y} P(X = x, Y = y).
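A small Python sketch (with a made-up joint pmf, purely for illustration) computing the covariance and correlation coefficient from first principles:

# Hypothetical joint pmf of (X, Y), with X, Y taking values 0 or 1.
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

def expect(f):
    # E[f(X, Y)] computed by summing over the joint pmf.
    return sum(f(x, y) * p for (x, y), p in joint.items())

ex, ey = expect(lambda x, y: x), expect(lambda x, y: y)
cov = expect(lambda x, y: (x - ex) * (y - ey))
var_x = expect(lambda x, y: (x - ex)**2)
var_y = expect(lambda x, y: (y - ey)**2)
corr = cov / (var_x * var_y)**0.5

print(cov, corr)  # 0.1 and about 0.41 for these numbers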

The conditional distribution of X given Y is
  P(X = x | Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)}.
The conditional expectation of X given Y is
  E[X | Y = y] = Σ_{x ∈ Ω_X} x P(X = x | Y = y).
We can view E[X | Y] as a random variable in Y: given a value of Y, we return the expectation of X.

3.5 Probability generating functions

Definition (Probability generating function (pgf)). The probability generating function (pgf) of X is
  p(z) = E[z^X] = Σ_{r=0}^∞ P(X = r) z^r = p_0 + p_1 z + p_2 z^2 + ⋯ = Σ_{r=0}^∞ p_r z^r.
This is a power series (or polynomial), and converges if |z| ≤ 1, since
  |p(z)| ≤ Σ_r p_r |z|^r ≤ Σ_r p_r = 1.
We sometimes write it as p_X(z) to indicate which random variable it refers to.
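As a sanity check of the pgf definition, the following sketch (my own) evaluates p(z) for X ∼ B(n, p) by summing the series, compares it with the closed form (1 − p + pz)^n (a standard fact, not derived in these notes), and recovers the mean np from a numerical derivative at z = 1:

import math

n, p = 5, 0.3

def pgf(z):
    # p(z) = sum_r P(X = r) z^r for X ~ B(n, p)
    return sum(math.comb(n, r) * p**r * (1 - p)**(n - r) * z**r for r in range(n + 1))

print(pgf(1.0))                  # 1.0: the probabilities sum to one
print((1 - p + p * 1.0)**n)      # same value via the closed form
h = 1e-6
print((pgf(1 + h) - pgf(1 - h)) / (2 * h))  # about n p = 1.5, the mean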

4 Interesting problems

4.1 Branching processes

4.2 Random walk and gambler's ruin

Definition (Random walk). Let X_1, …, X_n be iid random variables such that X_n = +1 with probability p, and −1 with probability 1 − p. Let S_n = S_0 + X_1 + ⋯ + X_n. Then (S_0, S_1, …, S_n) is a 1-dimensional random walk. If p = q = 1/2, we say it is a symmetric random walk.
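The gambler's ruin problem asks for the probability that such a walk started at k hits 0 before N. A Monte Carlo sketch (my own, not from the notes) estimating this probability:

import random

def ruin_probability(k, N, p, trials=100_000):
    # Start at S_0 = k, take +1/-1 steps with probabilities p and 1 - p,
    # stop at 0 (ruin) or N (win); return the fraction of runs ending in ruin.
    ruined = 0
    for _ in range(trials):
        s = k
        while 0 < s < N:
            s += 1 if random.random() < p else -1
        ruined += (s == 0)
    return ruined / trials

# For the symmetric walk the exact ruin probability is 1 - k/N, so this should be near 0.7.
print(ruin_probability(k=3, N=10, p=0.5))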

5 Continuous random variables

5.1 Continuous random variables

Definition (Continuous random variable). A random variable X : Ω → R is continuous if there is a function f : R → R_{≥0} such that
  P(a ≤ X ≤ b) = ∫_a^b f(x) dx.
We call f the probability density function, which satisfies
  f ≥ 0,  ∫_{−∞}^∞ f(x) dx = 1.

Definition (Cumulative distribution function). The cumulative distribution function (or simply distribution function) of a random variable X (discrete, continuous, or neither) is
  F(x) = P(X ≤ x).

Definition (Uniform distribution). The uniform distribution on [a, b] has pdf
  f(x) = \frac{1}{b − a}.
Then
  F(x) = ∫_a^x f(z) dz = \frac{x − a}{b − a}
for a ≤ x ≤ b. If X follows a uniform distribution on [a, b], we write X ∼ U[a, b].

Definition (Exponential random variable). The exponential random variable with parameter λ has pdf
  f(x) = λe^{−λx}
and
  F(x) = 1 − e^{−λx}
for x ≥ 0. We write X ∼ E(λ).

Definition (Expectation). The expectation (or mean) of a continuous random variable is
  E[X] = ∫_{−∞}^∞ x f(x) dx,
provided not both ∫_0^∞ x f(x) dx and ∫_{−∞}^0 x f(x) dx are infinite.

Definition (Variance). The variance of a continuous random variable is
  var(X) = E[(X − E[X])^2] = E[X^2] − (E[X])^2.
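The schedule lists simulation of continuous random variables. One standard method (inverse transform sampling, stated here as a known fact rather than derived in these notes) uses the observation that if U ∼ U[0, 1] then −ln(U)/λ ∼ E(λ), since P(−ln(U)/λ ≤ x) = 1 − e^{−λx}. A sketch:

import math
import random

lam = 2.0
# Use 1 - random.random() so the argument of log lies in (0, 1].
samples = [-math.log(1.0 - random.random()) / lam for _ in range(100_000)]

mean = sum(samples) / len(samples)
print(mean)  # should be close to 1/lambda = 0.5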

Definition (Mode and median). Given a pdf f(x), we call x̂ a mode if
  f(x̂) ≥ f(x) for all x.
Note that a distribution can have many modes. For example, in the uniform distribution, all x are modes. We say x̂ is a median if
  ∫_{−∞}^{x̂} f(x) dx = 1/2 = ∫_{x̂}^∞ f(x) dx.
For a discrete random variable, the median is x̂ such that
  P(X ≤ x̂) ≥ 1/2,  P(X ≥ x̂) ≥ 1/2.
Here we have a non-strict inequality since if the random variable, say, always takes value 0, then both probabilities will be 1.

Definition (Sample mean). If X_1, …, X_n is a random sample from some distribution, then the sample mean is
  X̄ = \frac{1}{n} Σ_{i=1}^n X_i.

5.2 Stochastic ordering and inspection paradox

Definition (Stochastic order). The stochastic order is defined as: X ≤_{st} Y if P(X > t) ≤ P(Y > t) for all t.

5.3 Jointly distributed random variables

Definition (Joint distribution). Two random variables X, Y have joint distribution F : R^2 → [0, 1] defined by
  F(x, y) = P(X ≤ x, Y ≤ y).
The marginal distribution of X is
  F_X(x) = P(X ≤ x) = P(X ≤ x, Y < ∞) = F(x, ∞) = lim_{y→∞} F(x, y).

Definition (Jointly distributed random variables). We say X_1, …, X_n are jointly distributed continuous random variables and have joint pdf f if for any set A ⊆ R^n,
  P((X_1, …, X_n) ∈ A) = ∫_{(x_1, …, x_n) ∈ A} f(x_1, …, x_n) dx_1 ⋯ dx_n,
where
  f(x_1, …, x_n) ≥ 0
and
  ∫_{R^n} f(x_1, …, x_n) dx_1 ⋯ dx_n = 1.
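For example, the median of E(λ) is ln(2)/λ, since F(x̂) = 1 − e^{−λx̂} = 1/2 there (a standard fact, easy to verify). A two-line check, my own illustration:

import math

lam = 2.0
m = math.log(2) / lam
print(1 - math.exp(-lam * m))  # 0.5, so half the mass lies below m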

Definition (Independent continuous random variables). Continuous random variables X_1, …, X_n are independent if
  P(X_1 ∈ A_1, X_2 ∈ A_2, …, X_n ∈ A_n) = P(X_1 ∈ A_1) P(X_2 ∈ A_2) ⋯ P(X_n ∈ A_n)
for all A_i ⊆ Ω_{X_i}. If we let F_{X_i} and f_{X_i} be the cdf and pdf of X_i, then
  F(x_1, …, x_n) = F_{X_1}(x_1) ⋯ F_{X_n}(x_n)
and
  f(x_1, …, x_n) = f_{X_1}(x_1) ⋯ f_{X_n}(x_n)
are each individually equivalent to the definition above.

5.4 Geometric probability

5.5 The normal distribution

Definition (Normal distribution). The normal distribution with parameters µ, σ^2, written N(µ, σ^2), has pdf
  f(x) = \frac{1}{√(2π) σ} exp(−\frac{(x − µ)^2}{2σ^2}),
for −∞ < x < ∞.

[Figure: the bell-shaped normal pdf, centred at µ.]

The standard normal is when µ = 0, σ^2 = 1, i.e. X ∼ N(0, 1). We usually write φ(x) for the pdf and Φ(x) for the cdf of the standard normal.

5.6 Transformation of random variables

Definition (Jacobian determinant). Suppose ∂s_i/∂y_j exists and is continuous at every point (y_1, …, y_n) ∈ S. Then the Jacobian determinant is
  J = \frac{∂(s_1, …, s_n)}{∂(y_1, …, y_n)} = det ( ∂s_1/∂y_1 ⋯ ∂s_1/∂y_n ; ⋮ ⋱ ⋮ ; ∂s_n/∂y_1 ⋯ ∂s_n/∂y_n ).

Definition (Order statistics). Suppose that X_1, …, X_n are some random variables, and Y_1, …, Y_n is X_1, …, X_n arranged in increasing order, i.e. Y_1 ≤ Y_2 ≤ ⋯ ≤ Y_n. This is the order statistics. We sometimes write Y_i = X_{(i)}.
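The schedule mentions generating independent normal random variables. The Box-Muller transform is one standard method (not spelled out in these notes): it turns two independent U[0, 1] variables into two independent N(0, 1) variables. A sketch:

import math
import random

def box_muller():
    # Two independent uniforms in (0, 1] and [0, 1) give two independent standard normals.
    u1, u2 = 1.0 - random.random(), random.random()
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2 * math.pi * u2), r * math.sin(2 * math.pi * u2)

samples = [box_muller()[0] for _ in range(100_000)]
mean = sum(samples) / len(samples)
var = sum((x - mean)**2 for x in samples) / len(samples)
print(mean, var)  # roughly 0 and 1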

5.7 Moment generating functions

Definition (Moment generating function). The moment generating function of a random variable X is
  m(θ) = E[e^{θX}].
For those θ for which m(θ) is finite, we have
  m(θ) = ∫_{−∞}^∞ e^{θx} f(x) dx.

Definition (Moment). The r-th moment of X is E[X^r].
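For example, for X ∼ E(λ) the moment generating function is m(θ) = λ/(λ − θ) for θ < λ (a standard result, not proved in these notes). A sketch, my own, estimating it by simulation and comparing with the closed form:

import math
import random

lam, theta = 2.0, 0.5
# Draw exponential samples by inverse transform, then average e^(theta X).
xs = [-math.log(1.0 - random.random()) / lam for _ in range(200_000)]
estimate = sum(math.exp(theta * x) for x in xs) / len(xs)

print(estimate, lam / (lam - theta))  # both close to 4/3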

6 More distributions

6.1 Cauchy distribution

Definition (Cauchy distribution). The Cauchy distribution has pdf
  f(x) = \frac{1}{π(1 + x^2)}
for −∞ < x < ∞.

6.2 Gamma distribution

Definition (Gamma distribution). The gamma distribution Γ(n, λ) has pdf
  f(x) = \frac{λ^n x^{n−1} e^{−λx}}{(n − 1)!}.
We can show that this is a distribution by showing that it integrates to 1.

6.3 Beta distribution*

Definition (Beta distribution). The beta distribution β(a, b) has pdf
  f(x; a, b) = \frac{Γ(a + b)}{Γ(a)Γ(b)} x^{a−1} (1 − x)^{b−1}
for 0 ≤ x ≤ 1. This has mean a/(a + b).

6.4 More on the normal distribution

6.5 Multivariate normal

7 Central limit theorem

8 Summary of distributions

8.1 Discrete distributions

8.2 Continuous distributions