SELECTED PROOFS

De Morgan's formulas: The first one, $\overline{A \cup B} = \bar{A} \cap \bar{B}$, is clear from a Venn diagram, or from the following truth table:

  A   B   A∪B   $\overline{A\cup B}$   $\bar A$   $\bar B$   $\bar A \cap \bar B$
  T   T    T            F                  F          F              F
  T   F    T            F                  F          T              F
  F   T    T            F                  T          F              F
  F   F    F            T                  T          T              T

The fourth and seventh columns agree row by row, which proves the identity. The second one, $\overline{A \cap B} = \bar{A} \cup \bar{B}$, can be derived from the first by changing A to $\bar A$ and B to $\bar B$, thus:

$\overline{\bar A \cup \bar B} = \bar{\bar A} \cap \bar{\bar B} = A \cap B$ (since $\bar{\bar A} = A$),

and taking the complement of each side:

$\bar A \cup \bar B = \overline{A \cap B}$

Product rule:

$\Pr(A)\Pr(B \mid A)\Pr(C \mid A \cap B) = \Pr(A)\cdot\dfrac{\Pr(A \cap B)}{\Pr(A)}\cdot\dfrac{\Pr(A \cap B \cap C)}{\Pr(A \cap B)} = \Pr(A \cap B \cap C)$

Total probability formula:

$\sum_{\text{all }k} \Pr(B \mid A_k)\Pr(A_k) = \sum_{\text{all }k} \dfrac{\Pr(A_k \cap B)}{\Pr(A_k)}\,\Pr(A_k) = \sum_{\text{all }k} \Pr(A_k \cap B)$
$= \Pr\left[\bigcup_{\text{all }k}(A_k \cap B)\right]$ (because the $A_k \cap B$ are disjoint)
$= \Pr\left[\left(\bigcup_{\text{all }k} A_k\right) \cap B\right]$ (by the distributive law)
$= \Pr(B)$ (since $\bigcup_{\text{all }k} A_k = \Omega$)

Total mean formula:

$E(X) = \sum_i i\,\Pr(X=i)$ (definition of expected value)
$= \sum_i i \sum_{\text{all }k} \Pr(X=i \mid A_k)\Pr(A_k)$ (by the previous formula)
$= \sum_{\text{all }k} \left[\sum_i i\,\Pr(X=i \mid A_k)\right] \Pr(A_k)$ (interchange the two summations)
$= \sum_{\text{all }k} E(X \mid A_k)\,\Pr(A_k)$ (by definition of conditional expected value)
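The truth-table argument is easy to check mechanically. Below is a minimal sketch (plain Python, no external libraries; the script is ours, not part of the original notes) that verifies both De Morgan formulas over all four truth assignments, modeling set membership by booleans:

import itertools

# Membership in A and B is a boolean; complement = not, union = or,
# intersection = and. Check both identities on every row of the table.
for A, B in itertools.product([True, False], repeat=2):
    assert (not (A or B)) == ((not A) and (not B))   # complement of A∪B equals Ā∩B̄
    assert (not (A and B)) == ((not A) or (not B))   # complement of A∩B equals Ā∪B̄
print("Both identities hold on all four rows.")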

Range of ρ (or, equivalently, of ρ²): For any λ,

$E\{[X-\mu_x-\lambda(Y-\mu_y)]^2\} = E[(X-\mu_x)^2] - 2\lambda E[(X-\mu_x)(Y-\mu_y)] + \lambda^2 E[(Y-\mu_y)^2] = Var(X) - 2\lambda\,Cov(X,Y) + \lambda^2 Var(Y) \ge 0$

(averaging a non-negative quantity yields a non-negative answer). The last expression has its smallest possible value when the λ-derivative is zero, namely

$-2\,Cov(X,Y) + 2\lambda\,Var(Y) = 0 \quad\text{or}\quad \lambda = \dfrac{Cov(X,Y)}{Var(Y)}$

Substituting this into the same expression yields

$Var(X) - 2\,\dfrac{Cov(X,Y)^2}{Var(Y)} + \dfrac{Cov(X,Y)^2}{Var(Y)} = Var(X) - \dfrac{Cov(X,Y)^2}{Var(Y)} \ge 0$

This implies

$Cov(X,Y)^2 \le Var(X)\,Var(Y) \quad\text{or}\quad \dfrac{Cov(X,Y)^2}{Var(X)\,Var(Y)} = \rho^2 \le 1$

Expected value of a linear combination of RVs:

$E(aX+bY+c) = \iint (ax+by+c)\,f(x,y)\,dx\,dy = a\iint x\,f(x,y)\,dx\,dy + b\iint y\,f(x,y)\,dx\,dy + c = aE(X) + bE(Y) + c$

(since an integral is a linear operator, which means: a constant can be taken out, and integrating a sum can be done by integrating the terms individually and adding the answers). In the discrete case we use summation instead of integration (the rest is the same).
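Both results above lend themselves to a quick Monte Carlo sanity check. A sketch assuming Python with numpy (the distributions, sample size, and the constants a, b, c are arbitrary choices for illustration):

import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
X = rng.exponential(2.0, n)                 # any pair of dependent RVs will do
Y = 0.7 * X + rng.normal(0.0, 1.0, n)

# Cov(X,Y)^2 <= Var(X) Var(Y), i.e. rho^2 <= 1 (holds exactly for sample moments too)
cov = np.cov(X, Y, ddof=0)[0, 1]
assert cov**2 <= X.var() * Y.var()

# E(aX + bY + c) = a E(X) + b E(Y) + c, up to Monte Carlo error
a, b, c = 2.0, -3.0, 5.0
print((a * X + b * Y + c).mean(), a * X.mean() + b * Y.mean() + c)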

Variance of a linear combination of RVs:

$Var(aX+bY+c) = \iint [(ax+by+c)-(a\mu_x+b\mu_y+c)]^2 f(x,y)\,dx\,dy$
$= \iint [a(x-\mu_x)+b(y-\mu_y)]^2 f(x,y)\,dx\,dy$
$= \iint [a^2(x-\mu_x)^2 + b^2(y-\mu_y)^2 + 2ab(x-\mu_x)(y-\mu_y)]\,f(x,y)\,dx\,dy$
$= a^2 Var(X) + b^2 Var(Y) + 2ab\,Cov(X,Y)$

Properties of the MGF: Since

$M_X(u) = E[e^{uX}] = \int_{\text{all }x} e^{ux} f(x)\,dx$

then, quite clearly,

$M_{aX+b}(u) = E[e^{u(aX+b)}] = \int_{\text{all }x} e^{aux+bu} f(x)\,dx = e^{bu}\int_{\text{all }x} e^{aux} f(x)\,dx = e^{bu} M_X(au)$

When X and Y are independent,

$M_{X+Y}(u) = E[e^{u(X+Y)}] = \iint e^{ux+uy} f_X(x)\,f_Y(y)\,dx\,dy = \int_{\text{all }x} e^{ux} f_X(x)\,dx \cdot \int_{\text{all }y} e^{uy} f_Y(y)\,dy = M_X(u)\,M_Y(u)$

(the integral is separable). Finally,

$M_X(u) = E[e^{uX}] = E\left[1 + uX + \frac{u^2}{2!}X^2 + \frac{u^3}{3!}X^3 + \dots\right] = 1 + uE[X] + \frac{u^2}{2!}E[X^2] + \frac{u^3}{3!}E[X^3] + \dots$

which proves that the simple moments are the coefficients in the Taylor expansion of $M_X(u)$.
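The same kind of simulation confirms the variance formula; a sketch with numpy (again, all distributions and constants are arbitrary):

import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
X = rng.normal(1.0, 2.0, n)
Y = 0.5 * X + rng.gamma(3.0, 1.0, n)        # deliberately correlated with X
a, b, c = 2.0, -1.0, 4.0

lhs = (a * X + b * Y + c).var()
rhs = a**2 * X.var() + b**2 * Y.var() + 2 * a * b * np.cov(X, Y, ddof=0)[0, 1]
print(lhs, rhs)                             # the two agree to Monte Carlo accuracy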

Properties of the PGF: Since

$P_X(z) = E[z^X] = \sum_i z^i f(i)$

then, when X and Y are independent, we get

$P_{X+Y}(z) = E[z^{X+Y}] = \sum_i \sum_j z^{i+j} f_X(i)\,f_Y(j) = \sum_i z^i f_X(i) \cdot \sum_j z^j f_Y(j) = P_X(z)\,P_Y(z)$

Since

$P_X^{(k)}(z) = \sum_i i(i-1)\cdots(i-k+1)\, z^{i-k} f(i)$

we get

$P_X^{(k)}(z)\Big|_{z=1} = \sum_i i(i-1)\cdots(i-k+1)\, f(i) = E[X(X-1)\cdots(X-k+1)]$

i.e. the k-th factorial moment. Similarly,

$P_X^{(k)}(z)\Big|_{z=0} = k!\,f(k)$

Convolution: Since

$\Pr(X+Y < v) = \int_{-\infty}^{\infty}\int_{-\infty}^{v-x} f(x,y)\,dy\,dx$

the pdf of $V = X+Y$ is the v-derivative of the above, namely

$f_V(v) = \int_{-\infty}^{\infty} f(x,\,v-x)\,dx$

Here, we need to recall that, in general,

$\dfrac{d}{dv}\int^{g(v)} f(y)\,dy = g'(v)\, f[g(v)]$
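The factorial-moment property can be checked symbolically. A sketch assuming Python with sympy; the small pmf below is an arbitrary example, not taken from the notes:

import sympy as sp

z = sp.symbols('z')
f = {0: sp.Rational(1, 8), 1: sp.Rational(3, 8),
     2: sp.Rational(3, 8), 3: sp.Rational(1, 8)}     # an arbitrary pmf on {0,1,2,3}
P = sum(z**i * p for i, p in f.items())              # P_X(z) = E[z^X]

# P''(1) should equal the second factorial moment E[X(X-1)] ...
assert sp.diff(P, z, 2).subs(z, 1) == sum(i * (i - 1) * p for i, p in f.items())
# ... and P''(0) should equal 2! f(2)
assert sp.diff(P, z, 2).subs(z, 0) == sp.factorial(2) * f[2]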

Central Limit Theorem: We need the MGF of

$\dfrac{\bar X - \mu}{\sigma/\sqrt{n}} = \sum_{i=1}^{n} \dfrac{X_i - \mu}{\sigma\sqrt{n}}$

The MGF of each $\frac{X_i - \mu}{\sigma\sqrt{n}}$ expands to

$1 + \dfrac{u^2}{2n} + \dfrac{u^3}{3!}\cdot\dfrac{E[(X-\mu)^3]}{\sigma^3 n^{3/2}} + \dots$

Raising this to the power of n and taking the $n \to \infty$ limit yields

$\left[1 + \dfrac{u^2}{2n} + \dfrac{u^3}{3!}\cdot\dfrac{E[(X-\mu)^3]}{\sigma^3 n^{3/2}} + \dots\right]^n \to e^{u^2/2}$

since terms with a higher-than-1 power of n in the denominator don't matter. This is the MGF of the standardized Normal distribution, with pdf equal to

$f(z) = \dfrac{e^{-z^2/2}}{\sqrt{2\pi}}$ (for all real z).

Verification:

$\dfrac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{uz}\, e^{-z^2/2}\,dz = e^{u^2/2}\cdot\dfrac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-(z-u)^2/2}\,dz = e^{u^2/2}$

Composition:

$P_{S_N}(z) = E\left[z^{S_N}\right] = \sum_{n=0}^{\infty} E\left[z^{S_N} \mid N=n\right]\Pr(N=n) = \sum_{n=0}^{\infty} E\left[z^{S_n}\right]\Pr(N=n) = \sum_{n=0}^{\infty} P_X(z)^n\,\Pr(N=n) = P_N[P_X(z)]$

where $S_N = \sum_{i=1}^{N} X_i$, N is a RV with PGF given by $P_N(z)$, and the $X_i$ are IID from a distribution with the following PGF: $P_X(z)$.
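A numeric illustration of the limit, assuming Python with numpy (the exponential summands, n, and the number of replications are arbitrary choices): the standardized mean should have roughly zero mean, unit variance, and standard normal tail probabilities.

import numpy as np

rng = np.random.default_rng(3)
n, reps, beta = 500, 20_000, 2.0
X = rng.exponential(beta, (reps, n))        # each row: n IID summands, mu = sigma = beta
Z = (X.mean(axis=1) - beta) / (beta / np.sqrt(n))   # (Xbar - mu)/(sigma/sqrt(n))

print(Z.mean(), Z.var())                    # ~ 0 and ~ 1
print((Z <= 1.96).mean())                   # close to 0.975, the N(0,1) CDF at 1.96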

Binomial distribution: The sample space of the experiment consists of all n-letter words made up of the two letters S and F (success and failure); we know there are $2^n$ of them. Each of these has probability $p^i(1-p)^{n-i}$, where p is the probability of a single success and i is the number of S letters the word contains. We also know that there are $\binom{n}{i}$ words with i letters S; the total probability of getting i successes (in any order) is thus

$\Pr(X=i) = \binom{n}{i}\, p^i (1-p)^{n-i}$

where i ranges from 0 to n inclusive. Note that proving the Binomial Theorem, which states that

$\sum_{i=0}^{n} \binom{n}{i} A^i B^{n-i} = (A+B)^n$

for any A and B, would be similar: expand $(A+B)(A+B)\cdots(A+B)$ using the distributive law, and get a sum of all words consisting of A's and B's, etc. With the help of the Binomial Theorem, the PGF of X is

$P(z) = \sum_{i=0}^{n} \binom{n}{i} (pz)^i (1-p)^{n-i} = (1-p+pz)^n$

The corresponding mean is

$\mu = n(1-p+pz)^{n-1}\,p\Big|_{z=1} = np$

the second factorial moment is

$n(n-1)(1-p+pz)^{n-2}\,p^2\Big|_{z=1} = n(n-1)p^2$

implying, for the variance:

$\sigma^2 = n(n-1)p^2 + \mu - \mu^2 = np(1-p)$

Geometric: It is obvious that the probability of i−1 failures followed by a success is

$\Pr(X=i) = p\,q^{\,i-1}$

for any positive integer i, where $q \equiv 1-p$. The corresponding PGF is

$P(z) = \sum_{i=1}^{\infty} p\,q^{\,i-1} z^i = pz(1 + qz + q^2z^2 + q^3z^3 + \dots) = \dfrac{pz}{1-qz}$

since $1 + A + A^2 + A^3 + \dots = \frac{1}{1-A}$ for any $|A| < 1$; to prove that, do the Taylor expansion of $\frac{1}{1-A}$ (as a function of A). Expanding the PGF in terms of z at 1 (Maple is quite good at this) yields

$\dfrac{pz}{1-(1-p)z} \simeq 1 + \dfrac{z-1}{p} + \dfrac{1-p}{p^2}(z-1)^2 + \dots$

which implies that the mean is $\frac{1}{p}$ and the variance is

$\dfrac{2(1-p)}{p^2} + \dfrac{1}{p} - \dfrac{1}{p^2} = \dfrac{1-p}{p^2}$
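These geometric moments are easy to confirm by simulation; numpy's geometric sampler uses the same convention as here (X counts the trials up to and including the first success). A sketch, with p chosen arbitrarily:

import numpy as np

rng = np.random.default_rng(4)
p = 0.3
X = rng.geometric(p, 1_000_000)             # Pr(X=i) = p (1-p)^(i-1), i = 1, 2, ...

print(X.mean(), 1 / p)                      # mean ~ 1/p
print(X.var(), (1 - p) / p**2)              # variance ~ (1-p)/p^2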

Negative binomial: Since it is a sum of k independent RVs of the geometric type, its mean and variance are k times the previous two results, and

$P(z) = \left(\dfrac{pz}{1-qz}\right)^k$

To get the k-th success at the i-th trial, the first i−1 trials must result in exactly k−1 successes (in any order), and the i-th trial must be a success. Thus, we get

$\Pr(X=i) = \binom{i-1}{k-1} p^{k-1} q^{\,i-k} \cdot p = \binom{i-1}{k-1} p^k q^{\,i-k}$

where i is a positive integer $\ge k$.

Poisson: It can be introduced as a limit of the Binomial distribution, when $n \to \infty$ but the mean is kept constant at λ (this implies that $p = \frac{\lambda}{n}$), namely:

$\Pr(X=i) = \lim_{n\to\infty} \dfrac{n(n-1)(n-2)\cdots(n-i+1)}{i!}\left(\dfrac{\lambda}{n}\right)^i \left(1-\dfrac{\lambda}{n}\right)^{n-i} = \dfrac{\lambda^i}{i!}\,e^{-\lambda}$

since $\frac{n-1}{n}, \frac{n-2}{n}, \dots$ all tend to 1, and $\lim_{n\to\infty}\left(1+\frac{a}{n}\right)^n = e^a$ for any a. Its PGF is

$P(z) = e^{-\lambda} \sum_{i=0}^{\infty} \dfrac{(\lambda z)^i}{i!} = e^{-\lambda(1-z)} \simeq 1 + \lambda(z-1) + \dfrac{\lambda^2(z-1)^2}{2} + \dots$

implying that the mean is λ and the variance is $\lambda^2 + \lambda - \lambda^2 = \lambda$.

Exponential: It can be introduced as a limit of the geometric distribution, when we perform n trials every unit of time, keeping the mean time (of getting the first success) fixed at β (this implies that $p = \frac{1}{n\beta}$), thus:

$1 - F(x) = \Pr(X > x) = \lim_{n\to\infty}\left(1-\dfrac{1}{n\beta}\right)^{nx} = e^{-x/\beta}$

for any $x > 0$. This implies that

$f(x) = F'(x) = \dfrac{1}{\beta}\,e^{-x/\beta}$

and the MGF is

$\dfrac{1}{\beta}\int_0^{\infty} e^{-x/\beta+xu}\,dx = \dfrac{1}{\beta}\cdot\dfrac{e^{-x/\beta+xu}}{u-\frac{1}{\beta}}\Bigg|_{x=0}^{\infty} = \dfrac{1}{1-\beta u}$

Expanding in terms of u at 0 yields

$\dfrac{1}{1-\beta u} = 1 + \beta u + \beta^2 u^2 + \dots$

telling us that the mean is β and the variance is $2\beta^2 - \beta^2 = \beta^2$.
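The Binomial-to-Poisson limit can be watched directly: with $p = \lambda/n$, the binomial probabilities settle onto $\lambda^i e^{-\lambda}/i!$. A sketch using only the Python standard library (λ and n are arbitrary choices):

import math

lam, n = 4.0, 10_000                        # large n; p = lam/n keeps the mean at lam
p = lam / n
for i in range(8):
    binom = math.comb(n, i) * p**i * (1 - p)**(n - i)
    poisson = lam**i / math.factorial(i) * math.exp(-lam)
    print(i, round(binom, 6), round(poisson, 6))   # the two columns nearly coincide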

The memoryless property

$\Pr(X-a > x \mid X > a) = \Pr(X > x)$

(where x and a are positive) is verified by:

$\Pr(X-a > x \mid X > a) = \dfrac{\Pr(X > x+a \,\cap\, X > a)}{\Pr(X > a)} = \dfrac{\Pr(X > x+a)}{\Pr(X > a)} = \dfrac{e^{-(x+a)/\beta}}{e^{-a/\beta}} = e^{-x/\beta} = \Pr(X > x)$

Gamma: Since it is defined as a sum of k independent RVs of the exponential type, its mean and variance are $k\beta$ and $k\beta^2$ respectively, and its MGF is $(1-\beta u)^{-k}$. To derive its pdf, we start with k = 2 and do a convolution of two exponentials, thus:

$\dfrac{1}{\beta^2}\int_0^y e^{-x/\beta}\, e^{-(y-x)/\beta}\,dx = \dfrac{y\,e^{-y/\beta}}{\beta^2}$

Convolution of this and another exponential (the k = 3 case):

$\dfrac{1}{\beta^3}\int_0^y x\,e^{-x/\beta}\, e^{-(y-x)/\beta}\,dx = \dfrac{y^2\,e^{-y/\beta}}{2\beta^3}$

And one more time (k = 4):

$\dfrac{1}{2\beta^4}\int_0^y x^2\,e^{-x/\beta}\, e^{-(y-x)/\beta}\,dx = \dfrac{y^3\,e^{-y/\beta}}{3!\,\beta^4}$

which makes it obvious that, in general:

$f(x) = \dfrac{x^{k-1}\,e^{-x/\beta}}{(k-1)!\,\beta^k}$

for $x > 0$ (zero otherwise). To verify, let's find the corresponding MGF:

$\dfrac{1}{(k-1)!\,\beta^k}\int_0^{\infty} x^{k-1}\,e^{-x/\beta+xu}\,dx = \dfrac{(k-1)!}{(k-1)!\,\beta^k\left(\frac{1}{\beta}-u\right)^k} = (1-\beta u)^{-k}$

(check!). The distribution function is

$F(x) = \Pr(X \le x) = 1 - \left[1 + \dfrac{x}{\beta} + \dfrac{x^2}{2\beta^2} + \dfrac{x^3}{3!\,\beta^3} + \dots + \dfrac{x^{k-1}}{(k-1)!\,\beta^{k-1}}\right] e^{-x/\beta}$

To verify this, differentiate F(x) with respect to x, and get f(x).
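A Monte Carlo sketch (numpy; k, β, and the evaluation point x are arbitrary) of the gamma construction: sums of k independent exponentials should reproduce the mean $k\beta$, the variance $k\beta^2$, and the distribution function derived above.

import math
import numpy as np

rng = np.random.default_rng(5)
k, beta = 4, 1.5
X = rng.exponential(beta, (1_000_000, k)).sum(axis=1)   # sum of k exponentials

print(X.mean(), k * beta)                   # ~ k beta
print(X.var(), k * beta**2)                 # ~ k beta^2

# F(x) = 1 - e^(-x/beta) * sum_{j=0}^{k-1} (x/beta)^j / j!
x = 5.0
F = 1 - math.exp(-x / beta) * sum((x / beta)**j / math.factorial(j) for j in range(k))
print((X <= x).mean(), F)                   # empirical CDF matches the formula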

Multinomial: It is a generalization of the binomial distribution, except that instead of two possible outcomes, each trial can have three (or more; our formulas will assume three). We will call them Win, Loss, and Tie. Then the probability of i wins, j losses and k ties is computed by

$\Pr(X=i \,\cap\, Y=j \,\cap\, Z=k) = \binom{n}{i,j,k}\, p_x^i\, p_y^j\, p_z^k$

which can be proven in the same manner as we did for the binomial (the sample space would now consist of all n-letter words built out of three letters, ...). The marginal distributions of X, Y and Z are (quite obviously) all binomial, so we can easily compute their means and variances. To find Cov(X,Y), we write

$X = X_1 + X_2 + \dots + X_n$
$Y = Y_1 + Y_2 + \dots + Y_n$

where $X_1, X_2, \dots$ is the number of wins in Game 1, Game 2, ... (similarly $Y_1, Y_2, \dots$ count the losses). Obviously, each of these 2n RVs can have only two values, 0 or 1. Now

$Cov(X_1 + X_2 + \dots + X_n,\; Y_1 + Y_2 + \dots + Y_n) = \sum_{i,j=1}^{n} Cov(X_i, Y_j) = \sum_{i=1}^{n} Cov(X_i, Y_i) + \sum_{i \ne j} Cov(X_i, Y_j) = n\,Cov(X_1, Y_1)$

since, when $i \ne j$, the RVs are independent and have 0 covariance. Since $E(X_1 Y_1) = 0$ (the product $X_1 Y_1$ cannot have any value other than 0, as you cannot have a win and a loss in Game 1 at the same time!),

$Cov(X_1, Y_1) = E(X_1 Y_1) - E(X_1)E(Y_1) = 0 - p_x p_y$

The final formula thus reads:

$Cov(X,Y) = -n\,p_x\,p_y$
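Finally, the covariance formula can be verified with numpy's multinomial sampler (n and the Win/Loss/Tie probabilities below are arbitrary):

import numpy as np

rng = np.random.default_rng(6)
n, probs = 20, [0.5, 0.3, 0.2]              # p_x, p_y, p_z for Win, Loss, Tie
counts = rng.multinomial(n, probs, size=1_000_000)
X, Y = counts[:, 0], counts[:, 1]           # wins and losses in each replication

print(np.cov(X, Y, ddof=0)[0, 1])           # ~ -n p_x p_y
print(-n * probs[0] * probs[1])             # = -3.0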