Appendix A: Introduction to Probability and Stochastic Processes


Mathematical methods in communication                                                  July 5th, 2009

Appendix A: Introduction to Probability and Stochastic Processes

Lecturer: Haim Permuter                               Scribes: Shai Shapira and Uri Livnat

"The probability of an event is the ratio of the number of cases favorable to it, to the number of all cases possible when nothing leads us to expect that any one of these cases should occur more than any other, which renders them, for us, equally possible."
Pierre Simon Laplace, 1812

I. BASIC CONCEPTS OF PROBABILITY.

Definition I.1 (Probability Space (Ω, F, P)) A probability space formalizes three interrelated ideas by three mathematical notions:
1) Ω - the sample space: the set of all possible outcomes ω of a particular random experiment, {ω : ω ∈ Ω}.
2) F - the collection of all events, i.e., subsets of Ω.
3) P - the probability measure.

Properties I.1 (Probability Space)
1) P : F → [0, 1].
2) Empty event ∅: P(∅) = 0.
3) Certain (deterministic) event Ω: P(Ω) = 1.
4) Disjoint events: A ∩ B = ∅.
5) Complementary events: A ∩ A^c = ∅, A ∪ A^c = Ω.
6) For every collection of disjoint events {A_n}, A_n ∈ F, with A_i ∩ A_j = ∅ for i ≠ j:
   P(∪_n A_n) = Σ_n P(A_n).

Conclusions I.1 (Probability Space) The following conclusions arise from the above properties:
1) P(A^c) = 1 - P(A).
2) For any two events A, B: P(A ∪ B) = P(A) + P(B) - P(A ∩ B).

Example 1 (Fair coin flip) If the space concerns one flip of a fair coin, then the outcomes are heads and tails: Ω = {H, T}. F = 2^Ω contains 2^2 = 4 events, namely {H} (heads), {T} (tails), {} (neither heads nor tails), and {H, T} (heads or tails). So F = {{}, {H}, {T}, {H, T}}. There is a fifty percent chance of tossing either heads or tails, thus P({H}) = P({T}) = 0.5. The chance of tossing neither is zero, P({}) = 0, and the chance of tossing one or the other is one, P({H, T}) = 1.
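As a quick numerical companion to Example 1, the following Monte Carlo sketch (Python with NumPy; the simulation and variable names are ours, not part of the original notes) estimates the probabilities of the four events in F and checks Conclusion 1:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    # Simulate n independent flips of a fair coin: 1 = heads, 0 = tails.
    flips = rng.integers(0, 2, size=n)

    p_heads = np.mean(flips == 1)                     # estimate of P({H})
    p_tails = np.mean(flips == 0)                     # estimate of P({T})
    p_either = np.mean((flips == 0) | (flips == 1))   # P({H, T}) = 1 exactly
    p_neither = np.mean((flips != 0) & (flips != 1))  # P({}) = 0 exactly

    print(p_heads, p_tails, p_either, p_neither)      # approx 0.5, 0.5, 1.0, 0.0
    print(abs(p_tails - (1 - p_heads)))               # Conclusion 1: P(A^c) = 1 - P(A), essentially 0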

A. Conditional Probability.

Definition I.2 (Conditional probability) The conditional probability of an event A given the occurrence of another event B is written P(A|B) and read "the probability of A, given B". Intersection events and conditional events are related by the formula

   P(A|B) = P(A ∩ B) / P(B),                                                           (I.1)

which can also be written as

   P(A|B) = P(A, B) / P(B).                                                             (I.2)

B. Statistically Independent Events.

Definition I.3 (Statistical independence) Events A and B are statistically independent iff

   P(A|B) = P(A),                                                                       (I.3)

or equivalently

   P(A ∩ B) = P(A)P(B),                                                                 (I.4)

which can also be written as

   P(A, B) = P(A)P(B).                                                                  (I.5)

C. Bayes Theorem.

Bayes theorem relates the conditional and marginal probabilities of events A and B, where B has a non-vanishing probability:

Theorem I.1 (Bayes theorem)

   P(A|B) = P(A)P(B|A) / P(B).                                                          (I.6)

Each term in Bayes theorem has a conventional name:
1) P(A) is the prior probability or marginal probability of A. It is "prior" in the sense that it does not take into account any information about B.
2) P(A|B) is the conditional probability of A, given B. It is also called the posterior probability because it is derived from, or depends upon, the specified value of B.
3) P(B|A) is the conditional probability of B given A.
4) P(B) is the prior or marginal probability of B, and acts as a normalizing constant.

Intuitively, Bayes theorem in this form describes the way in which one's beliefs about observing A are updated by having observed B.
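A small numerical illustration of Theorem I.1 (Python; the numbers below are invented purely for illustration and do not come from the notes). The posterior P(A|B) is computed from a prior and two likelihoods, expanding P(B) by total probability:

    # Hypothetical numbers, chosen only for illustration.
    p_A = 0.01            # prior P(A)
    p_B_given_A = 0.95    # likelihood P(B | A)
    p_B_given_Ac = 0.10   # P(B | A^c)

    # Total probability: P(B) = P(B|A)P(A) + P(B|A^c)P(A^c)
    p_B = p_B_given_A * p_A + p_B_given_Ac * (1 - p_A)

    # Bayes theorem (I.6): P(A|B) = P(A) P(B|A) / P(B)
    p_A_given_B = p_A * p_B_given_A / p_B
    print(p_A_given_B)    # approx 0.088: a rare event stays fairly unlikely even after observing B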

II. RANDOM VARIABLES.

A random variable is a variable whose possible values are numerical outcomes of a stochastic phenomenon. There are two types of random variables: discrete and continuous.

A. Continuous Random Variables.

Definition II.1 (Continuous random variable) A continuous random variable is a function X : Ω → R such that for any real number a the set {ω : X(ω) ≤ a} is an event. According to that definition:
1) The cumulative distribution function (CDF) of a continuous random variable is
   F_X(α) = P({ω : X(ω) ≤ α}).
2) The probability density function (PDF) of a continuous random variable, where F_X(α) is a differentiable function, is
   f_X(α) = dF_X(α)/dα.
   Note that the derivative dF_X(α)/dα might not always exist, but in this course we deal only with continuous random variables for which the PDF exists.

Properties II.1 (Continuous random variables)
1) The CDF is monotonically non-decreasing and right continuous.
2) F_X(∞) = 1, F_X(−∞) = 0.
3) If a_2 > a_1 then P(a_1 < X ≤ a_2) = F_X(a_2) - F_X(a_1).

Conclusions II.1 (Continuous random variables)
1) F_X(β) = ∫_{−∞}^{β} f_X(α) dα.
2) ∫_{−∞}^{∞} f_X(α) dα = F_X(∞) = 1.
3) f_X(α) ≥ 0.

B. Discrete Random Variables.

Definition II.2 (Discrete random variable) Let {x_0, x_1, ...} be the values a discrete r.v. can take with non-zero probability, and let Ω_i = {ω : X(ω) = x_i}, i ∈ N.
1) A discrete r.v. has a probability mass function (PMF) instead of a PDF as for a continuous r.v. It is the function that gives the probability that the discrete r.v. is exactly equal to some value:
   P_X(x) = P(X = x) = P({ω ∈ Ω : X(ω) = x}).                                           (II.1)
2) Since the image of X is countable, the probability mass function P_X(x) is zero for all but a countable number of values of x. The discontinuity of probability mass functions reflects the fact that the CDF of a discrete r.v. is also discontinuous. Where it is differentiable, the derivative is zero, just as the probability mass function is zero at all such points.

Properties II.2 (Discrete random variables)
1) P_X(x) ≥ 0.
2) Σ_i P(X = x_i) = Σ_i P_X(x_i) = 1.
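A brief numerical check of Conclusion 1 above (Python with NumPy; the exponential distribution and all names are our own illustrative choice, not part of the notes), integrating a PDF with the trapezoid rule and comparing against the known CDF:

    import numpy as np

    # Illustrative choice: exponential r.v. with rate 1,
    # PDF f_X(a) = exp(-a) for a >= 0, CDF F_X(b) = 1 - exp(-b).
    beta = 2.0
    alpha = np.linspace(0.0, beta, 100_001)
    pdf = np.exp(-alpha)

    # Conclusion 1: F_X(beta) = integral of f_X(alpha) up to beta (the PDF is 0 for alpha < 0).
    cdf_numeric = np.sum((pdf[:-1] + pdf[1:]) / 2 * np.diff(alpha))
    cdf_exact = 1.0 - np.exp(-beta)
    print(cdf_numeric, cdf_exact)   # both approx 0.8647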

C. Expectation.

Definition II.3 (Expected value of a continuous random variable)

   µ = E[X] = ∫_{−∞}^{∞} α f_X(α) dα,   provided ∫_{−∞}^{∞} |α| f_X(α) dα < ∞.          (II.2)

Definition II.4 (Expected value of a discrete random variable)

   E[X] = Σ_x x P(X = x) = Σ_i x_i P(X = x_i).                                           (II.3)

Properties II.3 (Expectation)
1) For a deterministic variable c: E[c] = c.
2) Linearity: E[cX + dY] = cE[X] + dE[Y], where X, Y are r.v. and c, d are constants.
3) Monotonicity: if X and Y are random variables such that X(ω) ≤ Y(ω), then E[X] ≤ E[Y].

D. Variance.

The variance of a random variable is a measure of statistical dispersion, averaging the squares of the deviations of its possible values from its expected value:

   Var[X] = E[(X - E[X])^2] = E[X^2] - (E[X])^2.                                         (II.4)

Definition II.5 (Variance of a continuous random variable)

   Var(X) = ∫_{−∞}^{∞} (α - E[X])^2 f_X(α) dα.                                           (II.5)

Definition II.6 (Variance of a discrete random variable)

   Var(X) = Σ_x (x - E[X])^2 P(X = x)                                                    (II.6)
          = Σ_x x^2 P(X = x) - E^2[X].                                                   (II.7)

Properties II.4 (Variance)
1) For a deterministic variable c: Var[c] = 0.
2) Var(X + a) = Var(X).
3) Var(aX) = a^2 Var(X).
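A short sketch (Python with NumPy; the fair-die PMF is our own illustrative example, not from the notes) evaluating Definitions II.4 and II.6 directly from a PMF:

    import numpy as np

    # Illustrative PMF: a fair six-sided die.
    x = np.arange(1, 7)
    p = np.full(6, 1 / 6)

    # Definition II.4: E[X] = sum_i x_i P(X = x_i)
    mean = np.sum(x * p)

    # Definition II.6: Var(X) = sum_x (x - E[X])^2 P(X = x)
    var = np.sum((x - mean) ** 2 * p)
    # Equivalent form (II.7): E[X^2] - E^2[X]
    var_alt = np.sum(x ** 2 * p) - mean ** 2

    print(mean, var, var_alt)   # 3.5, approx 2.9167, approx 2.9167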

E. Covariance.

Definition II.7 (Covariance) Covariance is a measure of how much two variables change together:

   Cov(X, Y) = E[(X - E[X])(Y - E[Y])],                                                  (II.8)

which can also be written as

   Cov(X, Y) = E[XY] - E[X]E[Y].                                                         (II.9)

Properties II.5 (Covariance) Let X, Y be real valued r.v. and a, b constants.
1) If X and Y are independent, their covariance is zero and they are called uncorrelated. Indeed, independence gives E[XY] = E[X]E[Y], so
   Cov(X, Y) = E[XY] - E[X]E[Y] = E[X]E[Y] - E[X]E[Y] = 0.
2) Cov(X, a) = 0.
3) Cov(X, X) = Var(X).
4) Cov(X, Y) = Cov(Y, X).
5) Cov(aX, bY) = ab Cov(X, Y).
6) Cov(X + a, Y + b) = Cov(X, Y).
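A Monte Carlo sketch (Python with NumPy; the dependent pair below is constructed only for illustration and is not part of the notes) checking (II.9) and Properties 1 and 5 from samples:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 200_000

    # Two dependent variables built for illustration: Y = 2X + noise, so Cov(X, Y) = 2 Var(X) = 2.
    x = rng.normal(size=n)
    y = 2.0 * x + rng.normal(size=n)

    # (II.9): Cov(X, Y) = E[XY] - E[X]E[Y]  (sample estimate)
    cov_xy = np.mean(x * y) - np.mean(x) * np.mean(y)
    print(cov_xy)                  # approx 2.0

    # Property 5: Cov(aX, bY) = ab Cov(X, Y), here with a = 3, b = -1
    a, b = 3.0, -1.0
    cov_ab = np.mean(a * x * b * y) - np.mean(a * x) * np.mean(b * y)
    print(cov_ab, a * b * cov_xy)  # the two agree up to sampling error

    # Property 1: independent variables are uncorrelated
    z = rng.normal(size=n)         # independent of x
    print(np.mean(x * z) - np.mean(x) * np.mean(z))   # approx 0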

Example 2 (Fair coin flip) For a coin toss, the possible outcomes are heads or tails. The number of heads appearing in one fair coin toss can be described by the random variable

   X = 1, if heads,
       0, if tails,

with probability mass function

   P_X(x) = 1/2, if x = 0,
            1/2, if x = 1,
            0,   otherwise.

Let us calculate the expectation of this r.v.:

   E[X] = Σ_i x_i P(X = x_i) = (1/2)·0 + (1/2)·1 = 1/2,

and the variance:

   Var(X) = Σ_{i=1}^{2} x_i^2 P(X = x_i) - E^2[X] = (1/2)·0^2 + (1/2)·1^2 - (1/2)^2 = 1/4.

III. RANDOM VECTORS.

Definition III.1 (Random vector) Let X_1, X_2, ..., X_n be random variables on the same probability space. Their columnwise ordering composes a random vector in R^n:

   X = (X_1, X_2, ..., X_n)^T.

A. CDF of a Random Vector.

Definition III.2 (CDF of a random vector)

   F_X(α_1, α_2, ..., α_n) = P(X_1 ≤ α_1, X_2 ≤ α_2, ..., X_n ≤ α_n).                    (III.1)

If F_X(α_1, α_2, ..., α_n) = Π_{i=1}^{n} F_{X_i}(α_i), then X_1, X_2, ..., X_n are statistically independent, or mutually independent.

IV. INDEPENDENT AND IDENTICALLY-DISTRIBUTED RANDOM VARIABLES.

Definition IV.1 (i.i.d.) A collection of random variables is independent and identically distributed (i.i.d.) if each random variable has the same probability distribution as the others and all are mutually independent.

V. GAUSSIAN DISTRIBUTION.

A. Gaussian Random Variable.

Definition V.1 (Gaussian random variable) The random variable X is said to be a Gaussian random variable (or normal random variable) if its PDF has the form

   f_X(α) = (1 / √(2πσ^2)) e^{−(α − µ)^2 / (2σ^2)},   where µ = E[X], σ^2 = Var(X).      (V.1)

Hence, a Gaussian r.v. is characterized by the two parameters {µ, σ^2}, and its common notation is X ~ N(µ, σ^2).
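A small sketch (Python with NumPy; the parameter values are our own illustrative choice) evaluating the PDF in (V.1), checking that it integrates to 1, and confirming that samples carry the stated mean and variance:

    import numpy as np

    mu, sigma2 = 1.5, 4.0      # illustrative parameters, X ~ N(mu, sigma^2)
    sigma = np.sqrt(sigma2)

    # PDF from (V.1)
    def gaussian_pdf(a, mu, sigma2):
        return np.exp(-(a - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

    # The density integrates to 1 (trapezoid rule on a wide grid).
    a = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 100_001)
    f = gaussian_pdf(a, mu, sigma2)
    print(np.sum((f[:-1] + f[1:]) / 2 * np.diff(a)))   # approx 1.0

    # Samples have the stated first two moments.
    rng = np.random.default_rng(2)
    samples = rng.normal(mu, sigma, size=500_000)
    print(samples.mean(), samples.var())               # approx 1.5, 4.0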

B. Gaussian Random Vector.

Definition V.2 (Gaussian random vector) The random variables {X_i}_{i=1}^{n} are called jointly Gaussian random variables, or equivalently X = (X_1, X_2, ..., X_n)^T is called a Gaussian random vector, if for every collection of non-random constants {a_i}_{i=1}^{n} the linear combination Σ_{i=1}^{n} a_i X_i is a Gaussian random variable.

Definition V.3 (Gaussian random vector, equivalent definition) A random vector X ∈ R^n is called a Gaussian random vector if it is continuous with the joint PDF

   f_X(α) = (1 / ((2π)^{n/2} |Λ|^{1/2})) e^{−(1/2)(α − µ)^T Λ^{−1} (α − µ)},              (V.2)

where

   µ = E[X] = [E[X_1], E[X_2], ..., E[X_n]]^T,
   Λ = E[(X − µ)(X − µ)^T].

Properties V.1 (Gaussian random vector)
1) If X is a Gaussian random vector, each of its components is a Gaussian random variable.
2) If X is a random vector with independent components, each of which is a Gaussian r.v., then X is a Gaussian random vector.
3) A linear transformation of a Gaussian random vector, Y = AX + b, is a Gaussian random vector.
4) The Gaussian distribution is uniquely determined by the first two moments: {µ, Λ}.

VI. MMSE ESTIMATOR.

MMSE stands for Minimum Mean Square Error. In statistics and signal processing, an MMSE estimator is the estimator that minimizes the mean square error (MSE), which is a common measure of estimator quality.

A. MMSE Estimator.

Definition VI.1 (MMSE estimator) Let X be an unknown random variable/vector, and let Y be a known random variable, the measurement of X. An estimator X̂(Y) is any function of the measurement Y, and its MSE is given by

   MSE = E[ε^2] = E[(X̂(Y) − X)^2],   where ε = X̂(Y) − X.                                (VI.1)

The MMSE estimator is then defined as the estimator achieving minimal MSE:

   X̂_MMSE(Y) = argmin_{X̂(Y) = g(Y)} MSE.                                                (VI.2)
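A sampling sketch (Python with NumPy; the mean vector, covariance matrix, and transformation are all illustrative choices, not from the notes) checking Property 3 above, namely that Y = AX + b has mean Aµ + b and covariance AΛA^T:

    import numpy as np

    rng = np.random.default_rng(3)
    n = 500_000

    # A 2-dimensional Gaussian random vector X ~ N(mu, Lambda); the numbers are illustrative.
    mu = np.array([1.0, -2.0])
    Lam = np.array([[2.0, 0.8],
                    [0.8, 1.0]])
    X = rng.multivariate_normal(mu, Lam, size=n)      # shape (n, 2)

    # Property 3: Y = AX + b is again Gaussian, with E[Y] = A mu + b and Cov(Y) = A Lambda A^T.
    A = np.array([[1.0, 1.0],
                  [0.5, -1.0]])
    b = np.array([0.0, 3.0])
    Y = X @ A.T + b

    print(Y.mean(axis=0), A @ mu + b)                 # empirical vs. exact mean
    print(np.cov(Y.T), A @ Lam @ A.T)                 # empirical vs. exact covariance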

B. Linear MMSE Estimator.

The linear MMSE estimator is the estimator achieving minimum MSE among all estimators of the form AY + b; if the measurement Y is a random vector, A is a matrix and b is a vector. If X and Y are jointly Gaussian, then the MMSE estimator is linear; as a consequence, to find the MMSE estimator in that case it is sufficient to find the linear MMSE estimator.

   MSE = E[ε^2] = E[(X̂_linear(Y) − X)^2],   where ε = X̂_linear(Y) − X,                  (VI.3)

   X̂_linear MMSE(Y) = argmin_{X̂_linear(Y) = AY + b} MSE.                                (VI.4)

For the scalar case:

   X̂_linear MMSE(Y) = E[X] + (Cov(X, Y) / Var(Y)) (Y − E[Y]).                            (VI.5)

The estimation error in the scalar case:

   MSE = E[ε^2] = Var(X) − Cov^2(X, Y) / Var(Y).                                          (VI.6)

For the vector case:

   X̂_linear MMSE(Y) = E[X] + Λ_XY Λ_YY^{−1} (Y − E[Y]).                                  (VI.7)

The estimation error covariance matrix for the vector case:

   Λ_εε = E[(X − X̂_linear)(X − X̂_linear)^T]                                              (VI.8)
        = Λ_XX − Λ_XY Λ_YY^{−1} Λ_YX,                                                     (VI.9)

where Λ_XY = E[(X − E[X])(Y − E[Y])^T].

Properties VI.1 (MMSE Estimator)
1) X̂(Y) is an unbiased estimator of X; in other words, X and X̂(Y) have the same expectation:

   E[X̂] = E[X],                                                                          (VI.10)

   since the expectation of the error is zero:

   E[ε] = 0,   where ε = X̂ − X.                                                          (VI.11)

2) The estimator X̂ and the error ε are orthogonal:

   E[X̂ ε] = 0   (orthogonality principle).                                               (VI.12)

   This implies that the estimator X̂ and the error ε are uncorrelated.
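To close, a minimal scalar-case sketch (Python with NumPy; the signal-plus-noise model X ~ N(0, 4), Y = X + W with W ~ N(0, 1) is our own illustrative setup) that builds the linear MMSE estimate from sample moments via (VI.5), compares its empirical MSE with (VI.6), and checks the orthogonality principle (VI.12):

    import numpy as np

    rng = np.random.default_rng(4)
    n = 500_000

    # Illustrative scalar setup: X ~ N(0, 4), observed through Y = X + W with W ~ N(0, 1).
    x = rng.normal(0.0, 2.0, size=n)
    w = rng.normal(0.0, 1.0, size=n)
    y = x + w

    # Sample moments needed by (VI.5): E[X], E[Y], Cov(X, Y), Var(Y).
    ex, ey = x.mean(), y.mean()
    cov_xy = np.mean(x * y) - ex * ey
    var_y = y.var()

    # Linear MMSE estimate (VI.5): X_hat = E[X] + Cov(X, Y)/Var(Y) * (Y - E[Y])
    x_hat = ex + (cov_xy / var_y) * (y - ey)

    mse_empirical = np.mean((x_hat - x) ** 2)
    mse_formula = x.var() - cov_xy ** 2 / var_y        # (VI.6)
    print(mse_empirical, mse_formula)                  # both approx 4*1/(4+1) = 0.8

    # Orthogonality principle (VI.12): E[X_hat * eps] is approx 0
    eps = x_hat - x
    print(np.mean(x_hat * eps))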