MAS113 Introduction to Probability and Statistics
School of Mathematics and Statistics, University of Sheffield
2018-19

Identically distributed
Suppose we have n random variables X_1, X_2, ..., X_n. If these each have the same probability distribution, which is the same thing as their cumulative distribution functions being the same,
P(X_1 ≤ a) = P(X_2 ≤ a) = ... = P(X_n ≤ a) for all a,
then we say that they are identically distributed.

Independent and identically distributed
We are particularly interested in the case where X_1, X_2, ..., X_n are not only identically distributed, but also independent, so that for all 1 ≤ i, j ≤ n with i ≠ j,
P((X_i ≤ a) ∩ (X_j ≤ b)) = P(X_i ≤ a) P(X_j ≤ b).
We then say that X_1, X_2, ..., X_n are independent and identically distributed, or i.i.d. for short.

Independent and identically distributed cont.
We can, if we like, regard X_1, X_2, ..., X_n as independent copies of some given random variable. I.i.d. random variables are very important in applications, as they describe repeated experiments that are carried out under identical conditions, in which the outcome of each experiment does not affect the others.

Sums
Now define S(n) to be the sum and X̄(n) to be the mean:
S(n) = Σ_{i=1}^{n} X_i   and   X̄(n) = S(n)/n.
Both S(n) and X̄(n) are also random variables, as they are functions of the random variables X_1, ..., X_n.
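
As a concrete illustration (not part of the original slides), here is a minimal Python sketch that simulates n i.i.d. random variables and computes S(n) and X̄(n); the Exponential(1) distribution is an arbitrary choice of example.

import numpy as np

rng = np.random.default_rng(seed=1)

n = 1000
# n i.i.d. draws; Exponential(1) is an arbitrary example, with mu = 1 and sigma^2 = 1
x = rng.exponential(scale=1.0, size=n)

s_n = x.sum()       # S(n), the sum of the X_i
xbar_n = s_n / n    # X-bar(n), the sample mean

print(s_n, xbar_n)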

Mean and variance of sums
If we write E(X_i) = µ and Var(X_i) = σ², it is straightforward to derive the mean and variance of S(n) and X̄(n) in terms of µ and σ².
Theorem 28. We have:
1. E(S(n)) = nµ;
2. Var(S(n)) = nσ²;
3. E(X̄(n)) = µ;
4. Var(X̄(n)) = σ²/n.
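
A sketch of the derivation (using linearity of expectation and the fact that the variance of a sum of independent random variables is the sum of their variances):
E(S(n)) = E(X_1) + ... + E(X_n) = nµ,
Var(S(n)) = Var(X_1) + ... + Var(X_n) = nσ² (by independence),
E(X̄(n)) = E(S(n))/n = µ,
Var(X̄(n)) = Var(S(n))/n² = σ²/n.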

Standard error
The standard deviation of X̄(n) plays an important role. It is called the standard error, and we denote it by SE(X̄(n)), so that
SE(X̄(n)) = σ/√n.
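
For example, if σ = 2 and n = 100, then SE(X̄(100)) = 2/√100 = 0.2.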

Application in statistics
These results have important applications in statistics. Suppose we are able to observe i.i.d. random variables X_1, ..., X_n, but we don't know the value of E(X_i) = µ.
Theorem 28 part 4 tells us that as n increases, the variance of X̄(n) gets smaller, and the smaller the variance is, the closer we expect X̄(n) to be to its mean value. Theorem 28 part 3 tells us that the mean value of X̄(n) is µ (for any value of n).
In other words, as n gets larger, we expect X̄(n) to be increasingly close to the unknown quantity µ, so we can use the observed value of X̄(n) to estimate µ.
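
A minimal simulation (not from the slides) illustrating this: as n grows, the observed value of X̄(n) typically settles closer to µ. The choice of a normal distribution with µ = 3 and σ = 2 is purely for illustration.

import numpy as np

rng = np.random.default_rng(seed=2)

mu, sigma = 3.0, 2.0                              # imagine mu is unknown to the observer
for n in (10, 100, 1000, 10000):
    x = rng.normal(loc=mu, scale=sigma, size=n)   # observe n i.i.d. values
    print(n, x.mean())                            # X-bar(n): the estimate of mu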

Illustration
The four plots show the density functions of X̄(n) for n = 1, 10, 20 and 100. In each case X_1, ..., X_n ~ N(0, 1), so E(X_i) = µ = 0.
[Figure: four panels, titled n=1, n=10, n=20 and n=100, each showing the density of X̄(n) plotted against x.]
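
A sketch that reproduces plots of this kind (assuming matplotlib is available), using the fact that X̄(n) ~ N(0, 1/n) when X_1, ..., X_n are i.i.d. N(0, 1):

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-4, 4, 400)
fig, axes = plt.subplots(2, 2)
for ax, n in zip(axes.flat, (1, 10, 20, 100)):
    var = 1.0 / n                                               # X-bar(n) ~ N(0, 1/n)
    pdf = np.exp(-x**2 / (2 * var)) / np.sqrt(2 * np.pi * var)  # normal density
    ax.plot(x, pdf)
    ax.set_title(f"n={n}")
    ax.set_xlabel("x")
plt.tight_layout()
plt.show()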

Examples
Here are two key examples of sums of i.i.d. random variables:
If X_i ~ Bernoulli(p) for 1 ≤ i ≤ n, then S(n) ~ Bin(n, p).
If X_i ~ N(µ, σ²) for 1 ≤ i ≤ n, then S(n) ~ N(nµ, nσ²).
The first of these is how we defined the Binomial; we will prove the second later on, using moment generating functions.
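
A quick empirical check of the first example (illustrative only, not a proof): repeatedly sum n Bernoulli(p) variables and compare the observed mean and variance of S(n) with those of Bin(n, p), namely np and np(1 - p).

import numpy as np

rng = np.random.default_rng(seed=3)

n, p, reps = 20, 0.3, 100_000
# each row is one realisation of X_1, ..., X_n with X_i ~ Bernoulli(p)
x = rng.binomial(1, p, size=(reps, n))
s = x.sum(axis=1)                      # S(n) for each realisation

print(s.mean(), n * p)                 # should be close to np = 6
print(s.var(), n * p * (1 - p))        # should be close to np(1 - p) = 4.2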

Chebyshev's inequality
We will now derive an important result regarding the behaviour of X̄(n) for large n. We first prove a useful inequality, which is true whether X is discrete or continuous.
Theorem (Chebyshev's inequality). Let X be a random variable for which E(X) = µ and Var(X) = σ². Then for any c > 0,
P(|X - µ| ≥ c) ≤ σ²/c².
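
A small numerical check (not from the slides): for the Exponential(1) distribution, which has µ = 1 and σ² = 1, the empirical estimate of P(|X - µ| ≥ c) should never exceed the bound σ²/c².

import numpy as np

rng = np.random.default_rng(seed=4)

x = rng.exponential(scale=1.0, size=1_000_000)  # Exponential(1): mu = 1, sigma^2 = 1
mu, var = 1.0, 1.0

for c in (1.5, 2.0, 3.0):
    empirical = np.mean(np.abs(x - mu) >= c)    # estimate of P(|X - mu| >= c)
    bound = var / c**2                          # Chebyshev bound sigma^2 / c^2
    print(c, empirical, bound)                  # empirical should not exceed bound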

Chebyshev's inequality cont.
The same inequality holds if P(|X - µ| ≥ c) is replaced by P(|X - µ| > c), since P(|X - µ| > c) ≤ P(|X - µ| ≥ c).
What does Chebyshev's inequality tell us? We expect the probability of finding the value of a random variable far from its mean to be smaller, the further away from the mean we look. But a large variance may counteract this a little, as it tells us that the values which have high probability are more spread out. Chebyshev's inequality makes this more precise.

Example
A random variable X has mean 1 and variance 0.5. What can you say about P(X > 6)?
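
A worked answer (not shown on the slide), using Chebyshev's inequality: the event {X > 6} is contained in {|X - 1| ≥ 5}, so
P(X > 6) ≤ P(|X - 1| ≥ 5) ≤ 0.5/5² = 0.02.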

Weak Law of Large Numbers
Theorem 30 (The Weak Law of Large Numbers). Let X_1, X_2, ... be a sequence of i.i.d. random variables, each with mean µ and variance σ². Then for all ε > 0,
lim_{n→∞} P(|X̄(n) - µ| > ε) = 0.
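
A sketch of the argument, combining Chebyshev's inequality with Theorem 28 parts 3 and 4: X̄(n) has mean µ and variance σ²/n, so for any ε > 0,
P(|X̄(n) - µ| > ε) ≤ Var(X̄(n))/ε² = σ²/(nε²),
which tends to 0 as n → ∞.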

Strong Law
It is possible to prove a stronger result, which is that
P( lim_{n→∞} X̄(n) = µ ) = 1,
but the proof is outside the scope of this module. This result is known as the strong law of large numbers.

Informal law of large numbers
In Section 4 we introduced the following informal version.
The law of large numbers (an informal version). Suppose we do a sequence of independent experiments, so that the outcome in one experiment has no effect on the outcome in another experiment. Now suppose that in experiment i there is an event E_i that has probability p of occurring, for i = 1, 2, .... The proportion of the events E_1, E_2, ... which actually occur will typically get closer and closer to p as the number of experiments increases.

Relationship to informal law
To see the relationship between this and Theorem 30, for each of the events E_i define a random variable X_i which takes the value 1 if E_i occurs and 0 if it does not. Then X_i is a Bernoulli random variable with P(X_i = 1) = p and P(X_i = 0) = 1 - p. As the E_i are independent, the X_i will be independent too, and it is easy to see that the mean is µ = p.

Relationship to informal law cont.
Hence Theorem 30 tells us that, for any ε > 0,
lim_{n→∞} P(|X̄(n) - p| > ε) = 0,
and X̄(n) here is precisely the proportion of the events E_1, E_2, ..., E_n which actually occur. So Theorem 30 tells us that if n is large, there is a high probability that the proportion of the events E_1, E_2, ..., E_n which occur is close to p.
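
A minimal simulation of this (not from the slides): the running proportion of independent events that occur, each with probability p, drifts towards p as the number of experiments grows.

import numpy as np

rng = np.random.default_rng(seed=5)

p = 0.25
occurred = rng.random(10_000) < p                # indicator X_i of whether event E_i occurs
proportions = occurred.cumsum() / np.arange(1, occurred.size + 1)

for n in (10, 100, 1000, 10_000):
    print(n, proportions[n - 1])                 # proportion after n experiments; close to p for large n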