Lecture 3: Probability Distributions


Random Variables

Let us begin by defining a sample space as the set of outcomes from an experiment. We denote this by S. A random variable is a function which maps outcomes into the real line. It is given by x: S → R. It assigns to each element in the sample space a real number value. Each element in the sample space has an associated probability, and these probabilities sum to one.

Probability Distributions

1. Let A ⊆ R and let Prob(x ∈ A) denote the probability that x will belong to A.

2. Def. The distribution function of a random variable x is the function defined by F(x') ≡ Prob(x ≤ x'), x' ∈ R.

3. Key Properties of a Distribution Function

P.1 F is nondecreasing in x.
P.2 lim F(x) = 1 as x → ∞ and lim F(x) = 0 as x → −∞.
P.3 For all x', Prob(x > x') = 1 − F(x').
P.4 For all x' and x'' such that x'' > x', Prob(x' < x ≤ x'') = F(x'') − F(x').
P.5 For all x', Prob(x < x') = lim F(x) as x → x' from below.
P.6 For all x', Prob(x = x') = lim F(x) as x → x' from above minus lim F(x) as x → x' from below.

Discrete Random Variables

1. If the random variable can assume only a finite number or a countably infinite set of values, it is said to be a discrete random variable. The set of values assumed by the random variable has a one-to-one correspondence with a subset of the positive integers. In contrast, a continuous random variable has a set of possible values which consists of an interval of the reals.

2. A discrete random variable x takes on values in {x_1,...,x_k} (finite) or {x_1, x_2,...} (countably infinite).
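These properties can be verified numerically for any discrete distribution. The Python sketch below is illustrative only; the pmf used (values 1, 2, 3 with probabilities .2, .5, .3) is a made-up example, not one from the lecture.

```python
# Illustrative check of properties P.3 and P.4 for a small, made-up
# discrete distribution.
pmf = {1: 0.2, 2: 0.5, 3: 0.3}

def F(x_prime):
    """Distribution function F(x') = Prob(x <= x')."""
    return sum(p for v, p in pmf.items() if v <= x_prime)

# P.3: Prob(x > x') = 1 - F(x')
prob_greater = 1 - F(1)        # Prob(x > 1) = .5 + .3 = .8

# P.4: Prob(x' < x <= x'') = F(x'') - F(x')
prob_between = F(3) - F(1)     # Prob(1 < x <= 3) = .8

print(round(prob_greater, 2), round(prob_between, 2))   # 0.8 0.8
```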

3. There are three key properties of discrete random variables.

P.1 Prob(x = x') ≡ f(x') ≥ 0. (f is called the probability mass function or the probability function.)
P.2 Σ_i f(x_i) = Σ_i Prob(x = x_i) = 1.
P.3 Prob(x ∈ A) = Σ_{x_i ∈ A} f(x_i).

4. Graphically, the distribution function of a discrete random variable is a step graph, with a jump at each point x_i equal to f(x_i).

Examples:

#1 Consider the random variable associated with 2 tosses of a fair coin. The possible values for the number of heads x are {0, 1, 2}. We have that f(0) = 1/4, f(1) = 1/2, and f(2) = 1/4, so that F(0) = 1/4, F(1) = 3/4, and F(2) = 1.

#2 A single toss of a fair die. f(x_i) = 1/6 for x_i = 1, 2, 3, 4, 5, 6, and F(x_i) = x_i/6.

Discrete Joint Distributions

1. Let the two random variables x and y have a joint probability function f(x', y') = Prob(x = x' and y = y'). The set of values taken on by (x, y) is X × Y = {(x_i, y_j) : x_i ∈ X and y_j ∈ Y}.
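Example #1 can be reproduced by enumerating the sample space directly. The Python sketch below builds f and F for the two coin tosses:

```python
# Sketch of Example #1: pmf and distribution function for the number
# of heads in two tosses of a fair coin.
from itertools import product

outcomes = list(product(["H", "T"], repeat=2))   # the sample space S
f = {k: 0.0 for k in (0, 1, 2)}                  # probability mass function
for outcome in outcomes:
    heads = outcome.count("H")
    f[heads] += 1 / len(outcomes)                # each outcome has prob 1/4

# Distribution function: F(x') = sum of f(x_i) for x_i <= x'
F = {k: sum(f[j] for j in f if j <= k) for k in f}

print(f)   # {0: 0.25, 1: 0.5, 2: 0.25}
print(F)   # {0: 0.25, 1: 0.75, 2: 1.0}
```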

The probability function satisfies

P.1 f(x_i, y_j) ≥ 0.
P.2 Prob((x, y) ∈ A) = Σ_{(x_i, y_j) ∈ A} f(x_i, y_j).
P.3 Σ_{(x_i, y_j)} f(x_i, y_j) = 1.

The distribution function is given by F(x', y') = Prob(x ≤ x' and y ≤ y') = Σ_{(x_i, y_j) ∈ L} f(x_i, y_j), where L = {(x_i, y_j) : x_i ≤ x' and y_j ≤ y'}.

2. Next we wish to consider the marginal probability function and the marginal distribution function.

a. The marginal probability function associated with x is given by f_1(x_j) ≡ Prob(x = x_j) = Σ_{y_i} f(x_j, y_i). Likewise, the marginal probability function associated with y is given by f_2(y_j) ≡ Prob(y = y_j) = Σ_{x_i} f(x_i, y_j).

b. The marginal distribution function of x is given by F_1(x_j) = Prob(x ≤ x_j) = lim_{y_j → ∞} Prob(x ≤ x_j and y ≤ y_j) = lim_{y_j → ∞} F(x_j, y_j). Likewise for y, the marginal distribution function is F_2(y_j) = lim_{x_j → ∞} F(x_j, y_j).

3. An example. Let x and y be random variables representing whether or not two different stocks will increase or decrease in price. Each of x and y can take on the values 0 or 1, where a 1 means that the stock's price has increased and a 0 means that it has decreased. The probability function is described by

f(1,1) = .50   f(0,1) = .35   f(1,0) = .10   f(0,0) = .05.

Answer each of the following questions.

a. Find F(1,0) and F(0,1). F(1,0) = .10 + .05 = .15. F(0,1) = .35 + .05 = .40.

b. Find F_1(0) = lim_{y → ∞} F(0,y) = F(0,1) = .40.

c. Find F_2(1) = lim_{x → ∞} F(x,1) = F(1,1) = 1.

d. Find f_1(0) = Σ_y f(0,y) = f(0,1) + f(0,0) = .40.

e. Find f_1(1) = Σ_y f(1,y) = f(1,1) + f(1,0) = .60.

4. Next, we consider conditional distributions.

a. After a value y_j of y has been observed, the probability that a value x_i of x will be observed is given by

Prob(x = x_i | y = y_j) = Prob(x = x_i & y = y_j) / Prob(y = y_j) = f(x_i, y_j) / f_2(y_j).

b. The function g_1(x_i | y_j) ≡ f(x_i, y_j) / f_2(y_j) is called the conditional probability function of x, given y. g_2(y_j | x_i) is defined analogously.

c. Properties.
(i) g_1(x_i | y_j) ≥ 0.
(ii) Σ_{x_i} g_1(x_i | y_j) = Σ_{x_i} f(x_i, y_j) / f_2(y_j) = 1.
((i) and (ii) hold for g_2(y_j | x_i) as well.)
(iii) f(x_i, y_j) = g_1(x_i | y_j) f_2(y_j) = g_2(y_j | x_i) f_1(x_i).

5. The conditional distribution functions are defined by the following.

G_1(x' | y_j) = Σ_{x_i ≤ x'} f(x_i, y_j) / f_2(y_j),
G_2(y' | x_i) = Σ_{y_j ≤ y'} f(x_i, y_j) / f_1(x_i).

6. The stock price example revisited.

a. Compute g_1(1 | 0) = f(1,0)/f_2(0). We have that f_2(0) = f(0,0) + f(1,0) = .05 + .10 = .15. Further, f(1,0) = .10. Thus, g_1(1 | 0) = .10/.15 = 2/3 ≈ .67.

b. Find g_2(0 | 0) = f(0,0)/f_1(0) = .05/.40 = .125. Here f_1(0) = Σ_y f(0,y) = f(0,0) + f(0,1) = .05 + .35 = .40.

Independent Random Variables

1. Def. The random variables (x_1,...,x_n) are said to be independent if for any n sets of real numbers A_i, we have

Prob(x_1 ∈ A_1 & x_2 ∈ A_2 & ... & x_n ∈ A_n) = Prob(x_1 ∈ A_1) Prob(x_2 ∈ A_2) ⋯ Prob(x_n ∈ A_n).

2. Given this definition, let F_i and f_i represent the marginal distribution and marginal probability functions for the random variables x and y, and let F and f represent the joint distribution and probability functions. The random variables x and y are independent iff F(x,y) = F_1(x)F_2(y), or equivalently f(x,y) = f_1(x)f_2(y). Further, if x and y are independent, then g_1(x | y) = f(x,y)/f_2(y) = f_1(x)f_2(y)/f_2(y) = f_1(x).

Summary Measures of Probability Distributions

1. Summary measures are scalars that convey some aspect of the distribution. Because each is a scalar, no single measure can capture all of the information about the distribution. In some cases it is of interest to know multiple summary measures of the same distribution.

2. There are two key types of measures.
a. Measures of central tendency: expectation, median, and mode.
b. Measures of dispersion: variance.
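The stock example can be checked end to end. The following Python sketch (an illustration, not part of the original notes) rebuilds the joint pmf, the marginals, the joint distribution function, and the conditional probability functions, and applies the factorization test for independence; the test fails here, since f(1,1) = .50 while f_1(1)f_2(1) = (.60)(.85) = .51.

```python
# Illustrative sketch of the stock example: joint pmf, marginals,
# joint CDF, conditional pmfs, and an independence check.
f = {(1, 1): 0.50, (0, 1): 0.35, (1, 0): 0.10, (0, 0): 0.05}

# Marginal probability functions f_1(x) and f_2(y)
f1 = {x: sum(p for (xi, _), p in f.items() if xi == x) for x in (0, 1)}
f2 = {y: sum(p for (_, yj), p in f.items() if yj == y) for y in (0, 1)}

def F(x_p, y_p):
    """Joint distribution function F(x', y') = Prob(x <= x' and y <= y')."""
    return sum(p for (x, y), p in f.items() if x <= x_p and y <= y_p)

def g1(x, y):
    """Conditional pmf of x given y: f(x, y) / f_2(y)."""
    return f[(x, y)] / f2[y]

def g2(y, x):
    """Conditional pmf of y given x: f(x, y) / f_1(x)."""
    return f[(x, y)] / f1[x]

print(round(F(1, 0), 2), round(F(0, 1), 2))    # 0.15 0.4   (parts a and b)
print(round(f1[0], 2), round(f1[1], 2))        # 0.4 0.6    (parts d and e)
print(round(g1(1, 0), 3), round(g2(0, 0), 3))  # 0.667 0.125

# Independence would require f(x,y) = f_1(x) f_2(y) at every point.
independent = all(abs(f[(x, y)] - f1[x] * f2[y]) < 1e-12
                  for x in (0, 1) for y in (0, 1))
print(independent)                             # False
```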

Expectation

1. The expectation of a discrete random variable x is given by E(x) = Σ_i x_i f(x_i).

2. Examples

#1 A lottery. A church holds a lottery by selling 1000 tickets at a dollar each. One winner wins $750. You buy one ticket. What is your expected return? E(x) = .001(749) + .999(−1) = .749 − .999 = −.25. The interpretation is that if you were to repeat this game indefinitely, your long-run average return per play would be −$.25.

#2 You purchase 100 shares of a stock and sell them one year later. The net gain is x. The distribution is given by (−500, .03), (−250, .07), (0, .10), (250, .25), (500, .35), (750, .15), and (1000, .05). E(x) = $367.50.

3. E(x) is also called the mean of x. A common notation for E(x) is µ.

4. Properties of E(x)

P.1 Let g(x) be a function of x. Then E(g(x)) is given by E(g(x)) = Σ_i g(x_i) f(x_i).

In what follows, items with an asterisk may be skipped.

*P.2 If k is a constant, then E(k) = k.
*P.3 Let a and b be two arbitrary constants. Then E(ax + b) = aE(x) + b.
*P.4 Let x_1,...,x_n be n random variables. Then E(Σ_i x_i) = Σ_i E(x_i).

*P.5 If there exists a constant k such that Prob(x ≥ k) = 1, then E(x) ≥ k. If there exists a constant k such that Prob(x ≤ k) = 1, then E(x) ≤ k.
*P.6 Let x_1,...,x_n be n independent random variables. Then E(Π_{i=1}^n x_i) = Π_{i=1}^n E(x_i).

Median

1. Def. If Prob(x ≤ m) ≥ .5 and Prob(x ≥ m) ≥ .5, then m is called a median. m need not be unique.

Example: (x_i, f(x_i)) given by (6, .1), (8, .4), (10, .3), (15, .1), (25, .05), (50, .05). In this case, m = 8 or 10. One convention is to average these two and take the result, 9, as the median.

Mode

1. Def. The mode is given by m_o = argmax_{x_i} f(x_i).

2. A mode is a maximizer of the probability function. Obviously, it need not be unique.

*Other Descriptive Terminology for Central Tendency

1. Symmetry. The distribution can be divided into two mirror-image halves.

2. Skewed. Right skewed means that the bulk of the probability falls on the lower values of x. Left skewed means that the bulk of the probability falls on the higher values.

A Summary Measure of Dispersion: The Variance

1. In many cases the mean, the mode, or the median alone is not informative.
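The measures of central tendency above are easy to compute for discrete distributions. The Python sketch below (illustrative, not from the notes) computes E(x) for the lottery example and the medians and modes of the distribution from the median example; medians returns every support point satisfying the two inequalities, which is why it finds both 8 and 10.

```python
# Illustrative sketch: expectation, median(s), and mode(s) of discrete
# distributions, using the lottery and median examples above.
def expectation(dist):
    """E(x) = sum of x_i * f(x_i) over the support."""
    return sum(x * p for x, p in dist)

def medians(dist):
    """All support points m with Prob(x <= m) >= .5 and Prob(x >= m) >= .5."""
    return [m for m, _ in dist
            if sum(p for x, p in dist if x <= m) >= 0.5 - 1e-12
            and sum(p for x, p in dist if x >= m) >= 0.5 - 1e-12]

def modes(dist):
    """All maximizers of the probability function f."""
    top = max(p for _, p in dist)
    return [x for x, p in dist if abs(p - top) < 1e-12]

lottery = [(749, 0.001), (-1, 0.999)]   # net payoffs for the $1 ticket
example = [(6, .10), (8, .40), (10, .30), (15, .10), (25, .05), (50, .05)]

print(round(expectation(lottery), 2))   # -0.25
print(medians(example))                 # [8, 10]: the median is not unique
print(modes(example))                   # [8]
```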

2. In particular, two distributions with the same mean can be very different distributions. One would like to know how common or typical the mean is. The variance measures this notion by taking the expectation of the squared deviation about the mean.

Def. For a random variable x, the variance is given by E[(x − µ)²].

Remark: The variance is also written as Var(x) or as σ². The square root of the variance is called the standard deviation of the distribution. It is written as σ.

3. Computation of the variance. For the discrete case, Var(x) = Σ_i (x_i − µ)² f(x_i). As an example, if (x_i, f(x_i)) are given by (0, .1), (500, .8), and (1000, .1), we have that E(x) = 500 and Var(x) = (0 − 500)²(.1) + (500 − 500)²(.8) + (1000 − 500)²(.1) = 50,000.

4. Properties of the Variance.

*P.1 Var(x) = 0 iff there exists a c such that Prob(x = c) = 1.
*P.2 For any constants a and b, Var(ax + b) = a²Var(x).
P.3 Var(x) = E(x²) − [E(x)]².
*P.4 If x_i, i = 1,...,n, are independent, then Var(Σ_i x_i) = Σ_i Var(x_i).
*P.5 If the x_i are independent, i = 1,...,n, then Var(Σ_i a_i x_i) = Σ_i a_i² Var(x_i).

5. The Standard Deviation and Standardized Random Variables

a. Using the standard deviation, we may transform any random variable x into a random variable with zero mean and unit variance. Such a variable is called the standardized random variable associated with x.

b. Given x, we define its standardized form as z = (x − µ)/σ. z tells us how many standard deviations x is from its mean (i.e., σz = x − µ).

c. Properties of z.
P.1 E(z) = 0.
P.2 Var(z) = 1.
Proof of P.2: Var(z) = E[(z − 0)²] = E(z²) = E[(x − µ)²/σ²] = (1/σ²)E[(x − µ)²] = σ²/σ² = 1.

*6. A remark on moments

a. Var(x) is sometimes called the second moment about the mean, with E(x − µ) = 0 being the first moment about the mean.

b. Using this terminology, E[(x − µ)³] is the third moment about the mean. It can give us information about the skewness of the distribution. E[(x − µ)⁴] is the fourth moment about the mean, and it can yield information about the peakedness of the distribution (kurtosis).

*Moments of Conditional and Joint Distributions

1. Given a joint probability function f(x_1,..., x_n), the expectation of a function of the n variables, say g(x_1,..., x_n), is defined as

E(g(x_1,..., x_n)) = Σ g(x_1,..., x_n) f(x_1,..., x_n),

where the sum runs over all points in the support of f.

2. Unconditional expectation from a joint distribution. Given a joint probability function f(x,y), E(x) is given by E(x) = Σ_i x_i f_1(x_i) = Σ_i Σ_j x_i f(x_i, y_j). Likewise, E(y) = Σ_j y_j f_2(y_j) = Σ_i Σ_j y_j f(x_i, y_j).

3. Conditional Expectation.

The conditional expectation of x, given that x and y are jointly distributed as f(x,y), is defined by E(x | y_j) = Σ_i x_i g_1(x_i | y_j). Further, the conditional expectation of y given x is defined analogously as E(y | x_i) = Σ_j y_j g_2(y_j | x_i).

Correlation and Covariance

1. Covariance.

a. Covariance is a moment reflecting the direction of joint movement of two variables. It is defined as Cov(x,y) = E[(x − µ_x)(y − µ_y)].

When this is large and positive, x and y tend to be both much above or both much below their respective means at the same time. Conversely, when it is negative, one tends to be above its mean when the other is below.

b. Computation of the covariance. First expand (x − µ_x)(y − µ_y) = xy − µ_x y − µ_y x + µ_x µ_y. Taking expectations, E(xy) − µ_x µ_y − µ_y µ_x + µ_x µ_y = E(xy) − µ_x µ_y. Thus, Cov(x,y) = E(xy) − E(x)E(y). If x and y are independent, then E(xy) = E(x)E(y) and Cov(x,y) = 0.

2. Correlation Coefficient.

a. The covariance is a good measure when dealing with just two variables. However, it has the flaw that its size depends on the units in which the variables x and y are measured. This is a problem when one wants to compare the relative strengths of two covariance estimates, say between x and y and between z and w.

b. The correlation coefficient cures this problem by standardizing the units of the covariance. The correlation coefficient is defined by ρ = Cov(x,y)/(σ_x σ_y).

c. Generally, ρ ∈ [−1, 1]. If y and x are perfectly linearly related, then |ρ| = 1. The less linearly related x and y are, the closer ρ is to zero.
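For a concrete case, the stock example from earlier gives a small negative covariance. The Python sketch below computes Cov(x,y) via E(xy) − E(x)E(y) and then ρ; the numerical values (−.01 and roughly −.057) are computed here as an illustration, not taken from the notes.

```python
# Illustrative sketch: covariance and correlation for the stock example,
# using Cov(x,y) = E(xy) - E(x)E(y) and rho = Cov(x,y)/(sigma_x sigma_y).
import math

f = {(1, 1): 0.50, (0, 1): 0.35, (1, 0): 0.10, (0, 0): 0.05}

Ex = sum(x * p for (x, _), p in f.items())       # E(x)  = .60
Ey = sum(y * p for (_, y), p in f.items())       # E(y)  = .85
Exy = sum(x * y * p for (x, y), p in f.items())  # E(xy) = .50

cov = Exy - Ex * Ey                              # .50 - (.60)(.85) = -.01
sigma_x = math.sqrt(sum((x - Ex) ** 2 * p for (x, _), p in f.items()))
sigma_y = math.sqrt(sum((y - Ey) ** 2 * p for (_, y), p in f.items()))
rho = cov / (sigma_x * sigma_y)

print(round(cov, 3))   # -0.01: a slight negative relationship
print(round(rho, 3))   # approximately -0.057
```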