Lecture 11. Probability Theory: an Overview

Math 408 - Mathematical Statistics
Lecture 11. Probability Theory: an Overview
Konstantin Zuev (USC), February 11, 2013

The starting point in developing probability theory is the notion of a sample space, the set of possible outcomes.

The sample space $\Omega$ is the set of possible outcomes of an experiment. Points $\omega \in \Omega$ are called realizations. Events are subsets of $\Omega$.

Next, to every event $A \subseteq \Omega$ we assign a real number $P(A)$, called the probability of $A$. We call the function $P : \{\text{subsets of } \Omega\} \to \mathbb{R}$ a probability distribution. The function $P$ is not arbitrary; it satisfies several natural properties (called axioms of probability):
1. $0 \le P(A) \le 1$ (events range from never happening to always happening)
2. $P(\Omega) = 1$ (something must happen)
3. $P(\emptyset) = 0$ (nothing never happens)
4. $P(A) + P(\bar{A}) = 1$ ($A$ must either happen or not happen)
5. $P(A \cup B) = P(A) + P(B) - P(AB)$
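
To make the axioms concrete, here is a minimal Python sketch on a finite sample space (a fair die). The events `A` and `B` and the helper `prob` are illustrative choices, not from the lecture.

```python
# A minimal sketch of the probability axioms on a finite sample space.
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # a fair die

def prob(event):
    """P(A) under the uniform distribution on the die."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {2, 4, 6}   # "even"
B = {4, 5, 6}   # "at least 4"

assert prob(sample_space) == 1                   # axiom 2: P(Omega) = 1
assert prob(set()) == 0                          # axiom 3: P(empty set) = 0
assert prob(A) + prob(sample_space - A) == 1     # axiom 4: P(A) + P(A-bar) = 1
# axiom 5 (inclusion-exclusion): P(A or B) = P(A) + P(B) - P(AB)
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)
print(prob(A | B))   # 2/3
```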

Statistical Independence

Two events $A$ and $B$ are independent if $P(AB) = P(A)P(B)$.

Independence can arise in two distinct ways:
1. We explicitly assume that two events are independent.
2. We derive independence of $A$ and $B$ by verifying that $P(AB) = P(A)P(B)$.

Conditional Probability

If $P(A) > 0$, then the conditional probability of $B$ given $A$ is
$$P(B \mid A) = \frac{P(AB)}{P(A)}$$

Useful interpretation: think of $P(B \mid A)$ as the fraction of times $B$ occurs among those in which $A$ occurs.

Properties of conditional probabilities:
1. For any fixed $A$ such that $P(A) > 0$, $P(\cdot \mid A)$ is a probability, i.e. it satisfies the rules of probability.
2. In general, $P(B \mid A) \ne P(A \mid B)$.
3. If $A$ and $B$ are independent, then
$$P(B \mid A) = \frac{P(AB)}{P(A)} = \frac{P(A)P(B)}{P(A)} = P(B)$$
Thus, another interpretation of independence is that knowing $A$ does not change the probability of $B$.

Law of Total Probability and Bayes' Theorem

Law of Total Probability. Let $A_1, \ldots, A_n$ be a partition of $\Omega$, i.e.
- $\bigcup_{i=1}^n A_i = \Omega$ ($A_1, \ldots, A_n$ are jointly exhaustive events)
- $A_i \cap A_j = \emptyset$ for $i \ne j$ ($A_1, \ldots, A_n$ are mutually exclusive events)
- $P(A_i) > 0$

Then for any event $B$:
$$P(B) = \sum_{i=1}^n P(B \mid A_i) P(A_i)$$

Bayes' Theorem. Conditional probabilities can be inverted. That is,
$$P(A \mid B) = \frac{P(B \mid A) P(A)}{P(B)}$$
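
As a concrete illustration, the following sketch applies both results to an invented diagnostic-test scenario; all the numbers (prevalence, sensitivity, false-positive rate) are made up for the example.

```python
# A hedged sketch of the law of total probability and Bayes' theorem.
p_disease = 0.01                      # P(A1): prior (prevalence)
p_healthy = 1 - p_disease             # P(A2); {A1, A2} partition Omega
p_pos_given_disease = 0.95            # P(B | A1): sensitivity
p_pos_given_healthy = 0.05            # P(B | A2): false-positive rate

# Law of total probability: P(B) = sum_i P(B | A_i) P(A_i)
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * p_healthy

# Bayes' theorem: P(A1 | B) = P(B | A1) P(A1) / P(B)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(f"P(disease | positive test) = {p_disease_given_pos:.3f}")  # ~0.161
```

Even with a fairly accurate test, the low prior drags the posterior down to about 16%, which is why inverting conditional probabilities carelessly is dangerous.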

Random Variables

We need random variables to link sample spaces and events to data.

A random variable is a mapping $X : \Omega \to \mathbb{R}$ that assigns a real number $X(\omega)$ to each outcome $\omega \in \Omega$.

This mapping induces a probability on $\mathbb{R}$ from $\Omega$ as follows: given a random variable $X$ and a set $A \subseteq \mathbb{R}$, define
$$X^{-1}(A) = \{\omega \in \Omega : X(\omega) \in A\}$$
and let
$$P(X \in A) = P(X^{-1}(A)) = P(\{\omega \in \Omega : X(\omega) \in A\})$$

The cumulative distribution function (CDF) $F_X : \mathbb{R} \to [0, 1]$ is defined by
$$F_X(x) = P(X \le x)$$
The CDF contains all the information about the random variable.

Properties of CDFs

Theorem. A function $F : \mathbb{R} \to [0, 1]$ is a CDF for some random variable if and only if it satisfies the following three conditions:
1. $F$ is non-decreasing: $x_1 < x_2 \Rightarrow F(x_1) \le F(x_2)$
2. $F$ is normalized: $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to +\infty} F(x) = 1$
3. $F$ is right-continuous: $\lim_{y \to x^+} F(y) = F(x)$

Discrete Random Variables

$X$ is discrete if it takes countably many values $\{x_1, x_2, \ldots\}$. We define the probability mass function (PMF) for $X$ by
$$f_X(x) = P(X = x)$$

Relationships between CDF and PMF:
- The CDF of $X$ is related to the PMF $f_X$ by
$$F_X(x) = P(X \le x) = \sum_{x_i \le x} f_X(x_i)$$
- The PMF $f_X$ is related to the CDF $F_X$ by
$$f_X(x) = F_X(x) - F_X(x^-) = F_X(x) - \lim_{y \to x^-} F_X(y)$$
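
The two relationships above are easy to check numerically; this sketch uses scipy's Binomial(10, 0.3) as a stand-in discrete distribution (an illustrative choice, not from the lecture).

```python
# A short sketch relating PMF and CDF for a discrete random variable.
import numpy as np
from scipy import stats

X = stats.binom(n=10, p=0.3)
support = np.arange(0, 11)

# CDF from PMF: F_X(x) = sum over x_i <= x of f_X(x_i)
for x in support:
    assert np.isclose(X.cdf(x), X.pmf(support[support <= x]).sum())

# PMF from CDF: f_X(x) = F_X(x) - F_X(x^-), the jump at x
for x in support:
    assert np.isclose(X.pmf(x), X.cdf(x) - X.cdf(x - 1))
```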

Continuous Random Variables

A random variable is continuous if there exists a function $f_X$ such that
- $f_X(x) \ge 0$ for all $x$
- $\int_{-\infty}^{+\infty} f_X(x)\,dx = 1$, and
- for every $a \le b$,
$$P(a < X \le b) = \int_a^b f_X(x)\,dx$$

The function $f_X(x)$ is called the probability density function (PDF).

Relationship between the CDF $F_X(x)$ and PDF $f_X(x)$:
$$F_X(x) = \int_{-\infty}^x f_X(t)\,dt \qquad f_X(x) = F_X'(x)$$
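
A quick numerical sanity check of the CDF/PDF relationship, using scipy's standard normal as the example density (an assumption for illustration).

```python
# A minimal check of F_X(x) = integral of f_X and f_X = F_X'.
import numpy as np
from scipy import stats
from scipy.integrate import quad

X = stats.norm()

# CDF as an integral of the PDF up to x = 1.5
val, _ = quad(X.pdf, -np.inf, 1.5)
assert np.isclose(val, X.cdf(1.5))

# PDF as the derivative of the CDF (central finite difference)
h = 1e-5
deriv = (X.cdf(1.5 + h) - X.cdf(1.5 - h)) / (2 * h)
assert np.isclose(deriv, X.pdf(1.5), atol=1e-6)
```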

Transformation of Random Variables

Suppose that $X$ is a random variable with PDF $f_X$ and CDF $F_X$. Let $Y = r(X)$ be a function of $X$.

Q: How do we compute the PDF and CDF of $Y$?
1. For each $y$, find the set $A_y = \{x : r(x) \le y\}$
2. Find the CDF $F_Y(y)$:
$$F_Y(y) = P(Y \le y) = P(r(X) \le y) = P(X \in A_y) = \int_{A_y} f_X(x)\,dx$$
3. The PDF is then $f_Y(y) = F_Y'(y)$

Important fact: when $r$ is strictly monotonic, $r$ has an inverse $s = r^{-1}$ and
$$f_Y(y) = f_X(s(y)) \left| \frac{ds(y)}{dy} \right|$$
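
For a worked instance of the monotone-transformation formula, take $X \sim \mathcal{N}(0,1)$ and $Y = r(X) = e^X$. The inverse is $s(y) = \log y$, so $f_Y(y) = f_X(\log y) \cdot |d(\log y)/dy| = f_X(\log y)/y$, the standard lognormal density. The sketch below verifies this against scipy.

```python
# Change-of-variables check: Y = exp(X), X ~ N(0,1), gives the lognormal.
import numpy as np
from scipy import stats

y = np.linspace(0.1, 5.0, 50)
f_Y_formula = stats.norm.pdf(np.log(y)) / y   # f_X(s(y)) |s'(y)|
f_Y_scipy = stats.lognorm.pdf(y, s=1.0)       # scipy's standard lognormal

assert np.allclose(f_Y_formula, f_Y_scipy)
```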

Joint Distributions

Discrete case. Given a pair of discrete random variables $X$ and $Y$, their joint PMF is defined by
$$f_{X,Y}(x, y) = P(X = x, Y = y)$$

Continuous case. A function $f_{X,Y}(x, y)$ is called the joint PDF of continuous random variables $X$ and $Y$ if
- $f_{X,Y}(x, y) \ge 0$ and $\int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} f_{X,Y}(x, y)\,dx\,dy = 1$
- For any set $A \subseteq \mathbb{R} \times \mathbb{R}$,
$$P((X, Y) \in A) = \iint_A f_{X,Y}(x, y)\,dx\,dy$$

The joint CDF of $X$ and $Y$ is defined as $F_{X,Y}(x, y) = P(X \le x, Y \le y)$.

Marginal Distributions

Discrete case. If $X$ and $Y$ have joint PMF $f_{X,Y}$, then the marginal PMF of $X$ is
$$f_X(x) = P(X = x) = \sum_y P(X = x, Y = y) = \sum_y f_{X,Y}(x, y)$$
Similarly, the marginal PMF of $Y$ is
$$f_Y(y) = P(Y = y) = \sum_x P(X = x, Y = y) = \sum_x f_{X,Y}(x, y)$$

Continuous case. If $X$ and $Y$ have joint PDF $f_{X,Y}$, then the marginal PDFs of $X$ and $Y$ are
$$f_X(x) = \int f_{X,Y}(x, y)\,dy \qquad \text{and} \qquad f_Y(y) = \int f_{X,Y}(x, y)\,dx$$
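
In code, marginalization is just summing a joint PMF table over the other variable's axis; the table below is invented for illustration.

```python
# Marginal PMFs as row/column sums of a joint PMF table.
import numpy as np

f_XY = np.array([[0.10, 0.20],     # rows index x, columns index y
                 [0.30, 0.40]])
assert np.isclose(f_XY.sum(), 1.0)

f_X = f_XY.sum(axis=1)   # f_X(x) = sum over y of f_{X,Y}(x, y)  -> [0.3, 0.7]
f_Y = f_XY.sum(axis=0)   # f_Y(y) = sum over x of f_{X,Y}(x, y)  -> [0.4, 0.6]
print(f_X, f_Y)
```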

Independent Random Variables

Two random variables $X$ and $Y$ are independent if, for every $A$ and $B$,
$$P(X \in A, Y \in B) = P(X \in A) P(Y \in B)$$

Criterion of independence:

Theorem. Let $X$ and $Y$ have joint PDF/PMF $f_{X,Y}$. Then $X$ and $Y$ are independent if and only if
$$f_{X,Y}(x, y) = f_X(x) f_Y(y)$$

Conditional Distributions

Discrete case. The conditional PMF is
$$f_{X \mid Y}(x \mid y) = P(X = x \mid Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)} = \frac{f_{X,Y}(x, y)}{f_Y(y)}$$

Continuous case. The conditional PDF is
$$f_{X \mid Y}(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}$$
Then,
$$P(X \in A \mid Y = y) = \int_A f_{X \mid Y}(x \mid y)\,dx$$

Expectation and its Properties

The expectation (or mean) of a random variable $X$ is the average value of $X$. The expected value, or mean, or first moment of $X$ is
$$\mu_X \equiv E[X] = \begin{cases} \sum_x x f_X(x), & \text{if } X \text{ is discrete} \\ \int x f_X(x)\,dx, & \text{if } X \text{ is continuous} \end{cases}$$
assuming that the sum (or integral) is well-defined.

- Let $Y = r(X)$; then $E[Y] = E[r(X)] = \int r(x) f_X(x)\,dx$
- If $X_1, \ldots, X_n$ are random variables and $a_1, \ldots, a_n$ are constants, then
$$E\left[\sum_{i=1}^n a_i X_i\right] = \sum_{i=1}^n a_i E[X_i]$$
- Let $X_1, \ldots, X_n$ be independent random variables. Then
$$E\left[\prod_{i=1}^n X_i\right] = \prod_{i=1}^n E[X_i]$$
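
The sketch below checks these properties exactly, using two independent fair dice as the example (an illustrative setup, not from the lecture).

```python
# Checking E[r(X)], linearity, and the product rule for independent variables.
import numpy as np

faces = np.arange(1, 7)
p = np.full(6, 1 / 6)

E_die = (faces * p).sum()                     # E[X] = 3.5

# E[r(X)] computed directly from the PMF (here r(x) = x^2)
E_X2 = (faces**2 * p).sum()
print(E_X2)                                   # 91/6 ~ 15.167

# Linearity: E[2X + 3Y] = 2 E[X] + 3 E[Y], via the joint PMF of two dice
joint = np.outer(p, p)                        # independence: f_{X,Y} = f_X f_Y
x, y = np.meshgrid(faces, faces, indexing="ij")
assert np.isclose((joint * (2 * x + 3 * y)).sum(), 2 * E_die + 3 * E_die)

# Independence: E[XY] = E[X] E[Y]
assert np.isclose((joint * x * y).sum(), E_die * E_die)
```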

Variance and its Properties

The variance measures the spread of a distribution. Let $X$ be a random variable with mean $\mu_X$. The variance of $X$, denoted $V[X]$ or $\sigma_X^2$, is defined by
$$\sigma_X^2 \equiv V[X] = E[(X - \mu_X)^2] = \begin{cases} \sum_x (x - \mu_X)^2 f_X(x), & \text{if } X \text{ is discrete} \\ \int (x - \mu_X)^2 f_X(x)\,dx, & \text{if } X \text{ is continuous} \end{cases}$$
The standard deviation is $\sigma_X = \sqrt{V[X]}$.

Important properties of $V[X]$:
- $V[X] = E[X^2] - \mu_X^2$
- If $a$ and $b$ are constants, then $V[aX + b] = a^2 V[X]$
- If $X_1, \ldots, X_n$ are independent and $a_1, \ldots, a_n$ are constants, then
$$V\left[\sum_{i=1}^n a_i X_i\right] = \sum_{i=1}^n a_i^2 V[X_i]$$

Covariance and Correlation

If $X$ and $Y$ are random variables, then the covariance and correlation between $X$ and $Y$ measure how strong the linear relationship is between $X$ and $Y$.

Let $X$ and $Y$ be random variables with means $\mu_X$ and $\mu_Y$ and standard deviations $\sigma_X$ and $\sigma_Y$. Define the covariance between $X$ and $Y$ by
$$\text{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$$
and the correlation by
$$\rho(X, Y) = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y}$$

Properties of Covariance and Correlation

The covariance satisfies (useful in computations):
$$\text{Cov}(X, Y) = E[XY] - E[X] E[Y]$$

The correlation satisfies:
- $-1 \le \rho(X, Y) \le 1$
- If $Y = aX + b$ for some constants $a$ and $b$, then
$$\rho(X, Y) = \begin{cases} 1, & \text{if } a > 0 \\ -1, & \text{if } a < 0 \end{cases}$$
- If $X$ and $Y$ are independent, then $\text{Cov}(X, Y) = \rho(X, Y) = 0$. The converse is not true.

For random variables $X_1, \ldots, X_n$:
$$V\left[\sum_{i=1}^n a_i X_i\right] = \sum_{i=1}^n a_i^2 V[X_i] + 2 \sum_{i < j} a_i a_j \text{Cov}(X_i, X_j)$$
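
The "converse is not true" point is worth a simulation: with $X \sim \mathcal{N}(0,1)$ and $Y = X^2$, $\text{Cov}(X, Y) = E[X^3] = 0$, even though $Y$ is a deterministic function of $X$.

```python
# Uncorrelated does not imply independent.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)
y = x**2

print(np.cov(x, y)[0, 1])              # ~0: uncorrelated, yet Y = X^2 depends on X
print(np.corrcoef(x, x + 1)[0, 1])     # exactly linear with a > 0 -> rho = 1
```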

Conditional Expectation and Conditional Variance

The conditional expectation of $X$ given $Y = y$ is
$$E[X \mid Y = y] = \begin{cases} \sum_x x f_{X \mid Y}(x \mid y), & \text{discrete case;} \\ \int x f_{X \mid Y}(x \mid y)\,dx, & \text{continuous case.} \end{cases}$$
- $E[X]$ is a number
- $E[X \mid Y = y]$ is a function of $y$
- $E[X \mid Y]$ is the random variable whose value is $E[X \mid Y = y]$ when $Y = y$

The Rule of Iterated Expectations:
$$E[E[Y \mid X]] = E[Y] \qquad \text{and} \qquad E[E[X \mid Y]] = E[X]$$

The conditional variance of $X$ given $Y = y$ is
$$V[X \mid Y = y] = E[(X - E[X \mid Y = y])^2 \mid Y = y]$$
- $V[X]$ is a number
- $V[X \mid Y = y]$ is a function of $y$
- $V[X \mid Y]$ is the random variable whose value is $V[X \mid Y = y]$ when $Y = y$

For random variables $X$ and $Y$:
$$V[X] = E[V[X \mid Y]] + V[E[X \mid Y]]$$
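
Both the iterated-expectation rule and the total-variance decomposition can be checked by Monte Carlo; the hierarchical model below ($Y$ uniform on $\{1, \ldots, 5\}$, $X \mid Y = y \sim \mathcal{N}(y, y^2)$) is invented for the illustration.

```python
# Monte Carlo check of E[E[X|Y]] = E[X] and V[X] = E[V[X|Y]] + V[E[X|Y]].
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
y = rng.integers(1, 6, size=n).astype(float)   # Y uniform on {1,...,5}
x = rng.normal(loc=y, scale=y)                 # X | Y = y  ~  N(y, y^2)

# Here E[X|Y] = Y and V[X|Y] = Y^2, so:
print(x.mean(), y.mean())                 # iterated expectations: both ~ 3.0
print(x.var(), (y**2).mean() + y.var())   # total variance: both ~ E[Y^2] + V[Y] = 13.0
```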

Inequalities

Markov inequality: if $X$ is a non-negative random variable, then for any $a > 0$
$$P(X \ge a) \le \frac{E[X]}{a}$$

Chebyshev inequality: if $X$ is a random variable with mean $\mu$ and variance $\sigma^2$, then for any $a > 0$
$$P(|X - \mu| \ge a) \le \frac{\sigma^2}{a^2}$$

Hoeffding inequality: let $X_1, \ldots, X_n \sim \text{Bernoulli}(p)$; then for any $\varepsilon > 0$
$$P(|\bar{X}_n - p| \ge \varepsilon) \le 2 e^{-2n\varepsilon^2}$$

Cauchy-Schwarz inequality: if $X$ and $Y$ have finite variances, then
$$E[|XY|] \le \sqrt{E[X^2] E[Y^2]}$$

Jensen inequality:
- If $g$ is convex, then $E[g(X)] \ge g(E[X])$
- If $g$ is concave, then $E[g(X)] \le g(E[X])$
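
A quick empirical look at the Markov and Chebyshev bounds for an Exponential(1) sample (mean 1, variance 1); the bounds hold, and are typically quite loose.

```python
# Empirical check of the Markov and Chebyshev inequalities.
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(scale=1.0, size=1_000_000)
a = 3.0

# Markov: P(X >= a) <= E[X] / a
print((x >= a).mean(), 1.0 / a)                     # ~0.050 <= 0.333

# Chebyshev: P(|X - mu| >= a) <= sigma^2 / a^2   (mu = sigma^2 = 1 here)
print((np.abs(x - 1.0) >= a).mean(), 1.0 / a**2)    # ~0.018 <= 0.111
```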

Convergence of Random Variables

There are two main types of convergence: convergence in probability and convergence in distribution. Let $X_1, X_2, \ldots$ be a sequence of random variables and let $X$ be another random variable. Let $F_n$ denote the CDF of $X_n$ and let $F$ denote the CDF of $X$.

- $X_n$ converges to $X$ in probability, written $X_n \xrightarrow{P} X$, if for every $\varepsilon > 0$
$$\lim_{n \to \infty} P(|X_n - X| > \varepsilon) = 0$$
- $X_n$ converges to $X$ in distribution, written $X_n \xrightarrow{D} X$, if
$$\lim_{n \to \infty} F_n(x) = F(x)$$
for all $x$ for which $F$ is continuous.

$X_n \xrightarrow{P} X$ implies that $X_n \xrightarrow{D} X$.

Law of Large Numbers and Central Limit Theorem

The LLN says that the mean of a large sample is close to the mean of the distribution.

The Law of Large Numbers. Let $X_1, \ldots, X_n$ be i.i.d. with mean $\mu$ and variance $\sigma^2$. Let $\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i$. Then
$$\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i \xrightarrow{P} \mu \quad \text{as } n \to \infty$$

The CLT says that $\bar{X}_n$ has a distribution which is approximately Normal with mean $\mu$ and variance $\sigma^2 / n$. This is remarkable since nothing is assumed about the distribution of $X_i$, except the existence of the mean and variance.

The Central Limit Theorem. Let $X_1, \ldots, X_n$ be i.i.d. with mean $\mu$ and variance $\sigma^2$. Then
$$Z_n \equiv \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \xrightarrow{D} Z \sim \mathcal{N}(0, 1) \quad \text{as } n \to \infty$$
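
A simulation sketch of both theorems with Exponential(1) draws ($\mu = \sigma = 1$), a distribution far from normal; the sample sizes and seed are arbitrary choices.

```python
# LLN and CLT by simulation.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, reps = 500, 20_000
samples = rng.exponential(scale=1.0, size=(reps, n))

xbar = samples.mean(axis=1)
print(xbar.mean())                        # LLN: ~ mu = 1

z = (xbar - 1.0) / (1.0 / np.sqrt(n))     # Z_n = (Xbar_n - mu) / (sigma / sqrt(n))
# Compare a tail probability of Z_n with the standard normal
print((z <= -1.96).mean(), stats.norm.cdf(-1.96))   # both ~ 0.025
```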

The Central Limit Theorem

The central limit theorem tells us that
$$Z_n = \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \approx \mathcal{N}(0, 1)$$
However, in applications we rarely know $\sigma$. We can estimate $\sigma^2$ from $X_1, \ldots, X_n$ by the sample variance
$$S_n^2 = \frac{1}{n - 1} \sum_{i=1}^n (X_i - \bar{X}_n)^2$$

Question: if we replace $\sigma$ with $S_n$, is the central limit theorem still true?
Answer: Yes!

Theorem. Assume the same conditions as in the CLT. Then
$$\frac{\bar{X}_n - \mu}{S_n / \sqrt{n}} \xrightarrow{D} Z \sim \mathcal{N}(0, 1) \quad \text{as } n \to \infty$$

Multivariate Central Limit Theorem

Let $X_1, \ldots, X_n$ be i.i.d. random vectors with mean $\mu$ and covariance matrix $\Sigma$:
$$X_i = \begin{pmatrix} X_{1i} \\ X_{2i} \\ \vdots \\ X_{ki} \end{pmatrix}, \qquad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_k \end{pmatrix} = \begin{pmatrix} E[X_{1i}] \\ E[X_{2i}] \\ \vdots \\ E[X_{ki}] \end{pmatrix}$$
$$\Sigma = \begin{pmatrix} V[X_{1i}] & \text{Cov}(X_{1i}, X_{2i}) & \cdots & \text{Cov}(X_{1i}, X_{ki}) \\ \text{Cov}(X_{2i}, X_{1i}) & V[X_{2i}] & \cdots & \text{Cov}(X_{2i}, X_{ki}) \\ \vdots & \vdots & \ddots & \vdots \\ \text{Cov}(X_{ki}, X_{1i}) & \cdots & \text{Cov}(X_{ki}, X_{(k-1)i}) & V[X_{ki}] \end{pmatrix}$$
Let $\bar{X}_n = (\bar{X}_{1n}, \ldots, \bar{X}_{kn})^T$. Then
$$\sqrt{n} (\bar{X}_n - \mu) \xrightarrow{D} \mathcal{N}(0, \Sigma) \quad \text{as } n \to \infty$$
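
A closing simulation sketch: for i.i.d. non-Gaussian 2-vectors, the scaled mean $\sqrt{n}(\bar{X}_n - \mu)$ has covariance close to $\Sigma$. The construction $X = (W, W + U)$ with $W \sim \text{Exp}(1)$, $U \sim \text{Uniform}(-1, 1)$ is invented for the example; it gives $\mu = (1, 1)$ and $\Sigma = \begin{pmatrix} 1 & 1 \\ 1 & 4/3 \end{pmatrix}$.

```python
# Multivariate CLT by simulation with dependent, non-Gaussian components.
import numpy as np

rng = np.random.default_rng(4)
n, reps = 1_000, 5_000
w = rng.exponential(size=(reps, n))            # W ~ Exp(1)
u = rng.uniform(-1.0, 1.0, size=(reps, n))     # U ~ Uniform(-1, 1)
x1, x2 = w, w + u                              # X = (W, W + U)

mu = np.array([1.0, 1.0])
xbar = np.stack([x1.mean(axis=1), x2.mean(axis=1)], axis=1)
scaled = np.sqrt(n) * (xbar - mu)              # sqrt(n) (Xbar_n - mu)

print(np.cov(scaled.T))                        # ~ [[1, 1], [1, 4/3]]
```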