Conditional Distributions


The goal is to provide a general definition of the conditional distribution of $Y$ given $X$, when $(X, Y)$ are jointly distributed. Let $F$ be a distribution function on $\mathbb{R}$. Let $G(\cdot, \cdot)$ be a map from $\mathbb{R} \times \mathcal{B}_{\mathbb{R}}$ to $[0, 1]$ satisfying:

(a) $G(x, \cdot)$ is a probability measure on $\mathcal{B}_{\mathbb{R}}$ for every $x$ in $\mathbb{R}$, and

(b) $G(\cdot, A)$ is a measurable function for every Borel set $A$.

We can then form the generalized product $F \otimes G$ in the following sense: there exists a measure $H$ on $\mathcal{B}_{\mathbb{R}^2}$, which we call $F \otimes G$, such that
$$(F \otimes G)\big[(-\infty, x_0] \times (-\infty, y_0]\big] = \int_{-\infty}^{x_0} G(x, (-\infty, y_0]) \, dF(x).$$
More generally, for Borel subsets $A$, $B$, we should have
$$(F \otimes G)(A \times B) = \int_A G(x, B) \, dF(x).$$
If $(X, Y)$ has distribution $F \otimes G$, then the marginal of $X$ is $F$ and the conditional of $Y$ given $X = x$ is $G(x, \cdot)$. From $F \otimes G$ we can recover the marginal distribution of $Y$, say $\tilde{F}$, and the conditional of $X$ given $Y = y$, say $\tilde{G}(y, \cdot)$, where $\tilde{G}$ has the same properties as $G$ and $\tilde{F} \otimes \tilde{G}$ gives the distribution of $(Y, X)$. Note that
$$\tilde{F}(y_0) = P(X < \infty, Y \le y_0) = \int_{-\infty}^{\infty} G(x, (-\infty, y_0]) \, dF(x).$$
When $G$ does not depend on its first coordinate, i.e. $G(x, \cdot)$ is the same measure for all $x$, then $X$ and $Y$ are independent and $G(\cdot) \equiv G(x, \cdot)$ gives the marginal distribution of $Y$.

Example 1: Suppose that $F$ is $U(0, 1)$, and that $G$ is defined by
$$G(x, \{1\}) = x \quad \text{and} \quad G(x, \{0\}) = 1 - x.$$
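As a quick illustration of this construction (a minimal simulation sketch, not part of the notes, assuming NumPy), one can sample from $F \otimes G$ in two stages: draw $X \sim F$ and then draw $Y$ from the kernel $G(X, \cdot)$. With Example 1's kernel, the frequency of $\{Y = 1\}$ among draws with $X$ near a fixed point $x_0$ (the point and window width below are arbitrary choices) should be close to $G(x_0, \{1\}) = x_0$.

```python
# Two-stage sampling from the generalized product of F = U(0,1) and the
# kernel G(x, .) = Ber(x), with a check that P(Y = 1 | X near x0) is about x0.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

x = rng.uniform(0.0, 1.0, size=n)           # stage 1: X ~ F
y = (rng.uniform(size=n) < x).astype(int)   # stage 2: Y | X = x ~ Ber(x)

x0, h = 0.3, 0.01                           # arbitrary point and window width
near = (x >= x0) & (x <= x0 + h)
print("empirical P(Y = 1 | X near x0):", y[near].mean())   # ~ 0.3
print("kernel value G(x0, {1})      :", x0)
```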

We seek to find $F \otimes G$, which is defined on $([0, 1] \times \{0, 1\},\ \mathcal{B}_{[0,1]} \otimes 2^{\{0,1\}})$. Let $(X, Y)$ follow $F \otimes G$. Now,
$$P(X \le x, Y = 1) = \int_0^x G(u, \{1\}) \, dF(u) = \int_0^x u \, du = \frac{x^2}{2}. \tag{0.1}$$
Similarly,
$$P(X \le x, Y = 0) = x - \frac{x^2}{2}. \tag{0.2}$$
So $P(Y = 1) = 1/2$; therefore $Y \sim \mathrm{Ber}(1/2)$. It is also clear from the above discussion that given $X = x$, $Y \sim \mathrm{Ber}(x)$. This can also be verified through the limiting definition of conditional probabilities that was discussed before:
$$P(Y = 1 \mid X = x) = \lim_{h \downarrow 0} P(Y = 1 \mid X \in [x, x + h]) = \lim_{h \downarrow 0} \frac{P(Y = 1, X \in [x, x + h])}{P(X \in [x, x + h])} = \lim_{h \downarrow 0} \frac{\int_x^{x+h} u \, du}{h} = x.$$
Next, we seek to find $H(y, \cdot)$, the conditional of $X$ given $Y = y$. We could do this via the definition of conditional probabilities when conditioning on a discrete random variable, but let's try the more formal recipe. We have
$$P(Y = 1, X \le x) = \int_{\{1\}} H(y, (0, x]) \, d\tilde{F}(y) = H(1, (0, x]) \cdot \frac{1}{2}$$
and
$$P(Y = 0, X \le x) = \int_{\{0\}} H(y, (0, x]) \, d\tilde{F}(y) = H(0, (0, x]) \cdot \frac{1}{2},$$
and using (0.1) and (0.2), we get
$$H(1, (0, x]) = x^2 \quad \text{and} \quad H(0, (0, x]) = 2x - x^2.$$
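The two conditional distribution functions just obtained can be checked by simulation (a sketch, not part of the notes, assuming NumPy; the threshold $t$ below is an arbitrary choice): conditionally on $Y = 1$, $X$ should have distribution function $x \mapsto x^2$ on $(0, 1)$, and conditionally on $Y = 0$ it should be $x \mapsto 2x - x^2$.

```python
# Empirical check of H(1, (0, x]) = x^2 and H(0, (0, x]) = 2x - x^2 in Example 1.
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
x = rng.uniform(size=n)                      # X ~ U(0, 1)
y = (rng.uniform(size=n) < x).astype(int)    # Y | X = x ~ Ber(x)

t = 0.4                                      # arbitrary threshold
print("P(X <= t | Y = 1) ~", np.mean(x[y == 1] <= t), " vs t^2      =", t**2)
print("P(X <= t | Y = 0) ~", np.mean(x[y == 0] <= t), " vs 2t - t^2 =", 2*t - t**2)
```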

Example 2: Let $X$ be a continuous random variable with density
$$f(x) = \frac{2}{3}\, e^{-x}\, 1(x > 0) + \frac{1}{3}\, e^{x}\, 1(x < 0).$$
Note that the distribution function of $X$ is
$$F(x) = P(X \le x) = \frac{1}{3}\, e^{x}\, 1(x \le 0) + \left( \frac{1}{3} + \frac{2}{3}\,(1 - e^{-x}) \right) 1(x > 0).$$
We first find the conditional distribution of $Y = \mathrm{sign}(X)$ given $|X|$, i.e. $P(Y = 1 \mid |X| = x)$ for $x > 0$. It suffices to get a transition function $G(t, a)$, $a \in \{-1, 1\}$, $t > 0$, such that
$$P(|X| \le x, Y = 1) = \int_0^x G(t, 1) \, dF_{|X|}(t) \tag{0.3}$$
and
$$P(|X| \le x, Y = -1) = \int_0^x G(t, -1) \, dF_{|X|}(t).$$
Now,
$$P(|X| \le x, Y = 1) = P(0 < X \le x) = \frac{2}{3}\,(1 - e^{-x})$$
and
$$P(|X| \le x, Y = -1) = P(-x \le X < 0) = \frac{1}{3}\,(1 - e^{-x}).$$
So $P(|X| \le x) = 1 - e^{-x}$. Now, by (0.3),
$$\frac{2}{3}\,(1 - e^{-x}) = \int_0^x G(t, 1) \, d(1 - e^{-t}) = \int_0^x G(t, 1)\, e^{-t} \, dt,$$
showing that $G(x, 1) = 2/3$. Similarly, $G(x, -1) = 1/3$.

Note that the distribution corresponding to $f$ can be generated by the following stochastic mechanism: let $V$ follow $\mathrm{Exp}(1)$ and let $B$ be a $\{-1, 1\}$-valued random variable independent of $V$, with $p_B(1) = 2/3 = 1 - p_B(-1)$, and let $X = V\, 1\{B = 1\} - V\, 1\{B = -1\}$. Then $V$ is precisely $|X|$ and $X \sim f$. Note that the sign of $X$ is precisely $B$ and it is independent of $V = |X|$ by the mechanism itself. So the conditional of $\mathrm{sign}(X)$ given $|X|$ is simply the unconditional distribution of $B$, and we obtain the same result as with the formal derivation.

Next, consider the distribution of $X$ given $Y$. Note that $P(Y = -1) = 1/3$ and $P(Y = 1) = 2/3$. We have
$$P(Y = -1, X \le x) = H(-1, (-\infty, x]) \cdot \frac{1}{3},$$
so
$$H(-1, (-\infty, x]) = 3\, P(Y = -1, X \le x) = 3 \left[ \frac{1}{3}\, 1(x > 0) + P(X \le x)\, 1(x \le 0) \right] = e^{x}\, 1(x \le 0) + 1(x > 0).$$
Similarly, we compute $H(1, (-\infty, x])$.
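The stochastic mechanism above is easy to simulate, and doing so confirms both conclusions numerically (a sketch, not part of the notes, assuming NumPy; the window widths and the point $x_0$ are arbitrary choices): the frequency of $\{\mathrm{sign}(X) = 1\}$ is about $2/3$ regardless of where $|X|$ falls, and $P(X \le x_0 \mid Y = -1) = e^{x_0}$ for $x_0 \le 0$.

```python
# Simulate X = V 1{B = 1} - V 1{B = -1} with V ~ Exp(1), P(B = 1) = 2/3,
# then check that sign(X) is independent of |X| and that
# H(-1, (-inf, x0]) = e^{x0} for x0 <= 0.
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
v = rng.exponential(1.0, size=n)
b = np.where(rng.uniform(size=n) < 2/3, 1, -1)
x = v * b                                   # equals V 1{B=1} - V 1{B=-1}

for t in (0.5, 1.0, 2.0):                   # arbitrary locations for |X|
    near = np.abs(np.abs(x) - t) < 0.05
    print(f"P(sign(X) = 1 | |X| ~ {t}) ~", np.mean(x[near] > 0))   # each ~ 2/3

x0 = -0.7                                   # arbitrary point with x0 <= 0
print("P(X <= x0 | Y = -1) ~", np.mean(x[x < 0] <= x0), " vs e^x0 =", np.exp(x0))
```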

0.1 Order statistics and conditional distributions

Let $X_1, X_2, \ldots, X_n$ be i.i.d. from a distribution $F$ with Lebesgue density $f$. Let $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ be the corresponding order statistics. Note that the order statistics are all distinct with probability 1 and $P\big((X_{(1)}, X_{(2)}, \ldots, X_{(n)}) \in B\big) = 1$, where $B = \{(x_1, x_2, \ldots, x_n) : x_1 < x_2 < \cdots < x_n\}$.

Let's first find the joint density of the order statistics. Let $\Pi$ be the set of all permutations of the numbers 1 through $n$. For a measurable subset $A$ of $B$, we have:
$$P\big((X_{(1)}, X_{(2)}, \ldots, X_{(n)}) \in A\big) = P\Big(\bigcup_{\pi \in \Pi} \big\{(X_{\pi_1}, X_{\pi_2}, \ldots, X_{\pi_n}) \in A\big\}\Big) = \sum_{\pi \in \Pi} P\big((X_{\pi_1}, X_{\pi_2}, \ldots, X_{\pi_n}) \in A\big) = n!\, P\big((X_1, X_2, \ldots, X_n) \in A\big) = n! \int_A f(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n.$$
This shows that:
$$f_{\mathrm{ord}}(x_1, x_2, \ldots, x_n) = n! \prod_{i=1}^n f(x_i), \quad (x_1, x_2, \ldots, x_n) \in B.$$

Remark: If we assumed that the $X_i$'s were not independent but came from an exchangeable distribution with density $f(x_1, x_2, \ldots, x_n)$, i.e. the distribution of the $X_i$'s is invariant under permutations of the $X_i$'s, then $f$ is necessarily symmetric in its arguments, and an argument similar to the one above would show that
$$f_{\mathrm{ord}}(x_1, x_2, \ldots, x_n) = n!\, f(x_1, x_2, \ldots, x_n), \quad (x_1, x_2, \ldots, x_n) \in B.$$
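As a quick numerical sanity check of the i.i.d. formula (a sketch, not part of the notes, assuming NumPy; the Exp(1) density and the rectangle below are arbitrary choices): for $n = 2$ the formula gives $f_{\mathrm{ord}}(x_1, x_2) = 2\, e^{-x_1} e^{-x_2}$ on $\{x_1 < x_2\}$, so the probability that $(X_{(1)}, X_{(2)})$ falls in a rectangle lying entirely in that region is $2\,(e^{-a_1} - e^{-b_1})(e^{-a_2} - e^{-b_2})$.

```python
# Compare an empirical rectangle probability for (X_(1), X_(2)) with the value
# obtained by integrating f_ord(x1, x2) = 2 exp(-x1) exp(-x2) over the rectangle.
import numpy as np

rng = np.random.default_rng(3)
xs = np.sort(rng.exponential(size=(1_000_000, 2)), axis=1)   # rows are (X_(1), X_(2))

a1, b1, a2, b2 = 0.2, 0.6, 0.9, 1.5    # arbitrary rectangle with b1 < a2, so x1 < x2 on it
emp = np.mean((xs[:, 0] > a1) & (xs[:, 0] < b1) & (xs[:, 1] > a2) & (xs[:, 1] < b2))
theo = 2.0 * (np.exp(-a1) - np.exp(-b1)) * (np.exp(-a2) - np.exp(-b2))
print("empirical:", emp, " integral of f_ord:", round(theo, 5))
```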

Now, consider the situation where the distribution of $(X_1, X_2, \ldots, X_n)$ is exchangeable. We seek to find
$$P\big((X_1, X_2, \ldots, X_n) = (x_{\pi_1}, x_{\pi_2}, \ldots, x_{\pi_n}) \mid X_{(1)} = x_1, X_{(2)} = x_2, \ldots, X_{(n)} = x_n\big)$$
for some permutation $\pi$. Let $\tau$ be an arbitrary permutation. Note that $(Y_1, Y_2, \ldots, Y_n) \equiv (X_{\tau_1}, X_{\tau_2}, \ldots, X_{\tau_n})$ has the same distribution as $(X_1, X_2, \ldots, X_n)$. Thus,
$$\begin{aligned}
P\big((X_1, \ldots, X_n) = (x_{\pi_1}, \ldots, x_{\pi_n}) \mid X_{(1)} = x_1, \ldots, X_{(n)} = x_n\big)
&= P\big((Y_1, \ldots, Y_n) = (x_{\pi_1}, \ldots, x_{\pi_n}) \mid Y_{(1)} = x_1, \ldots, Y_{(n)} = x_n\big) \\
&= P\big((X_{\tau_1}, \ldots, X_{\tau_n}) = (x_{\pi_1}, \ldots, x_{\pi_n}) \mid X_{(1)} = x_1, \ldots, X_{(n)} = x_n\big) \\
&= P\big((X_1, \ldots, X_n) = (x_{(\pi \circ \tau^{-1})_1}, \ldots, x_{(\pi \circ \tau^{-1})_n}) \mid X_{(i)} = x_i,\ i = 1, \ldots, n\big).
\end{aligned}$$
As $\tau$ runs over all permutations, so does $\pi \circ \tau^{-1}$, showing that the conditional probability under consideration does not depend upon the permutation $\pi$ initially fixed. As there are $n!$ permutations, we conclude that
$$P\big((X_1, X_2, \ldots, X_n) = (x_{\pi_1}, x_{\pi_2}, \ldots, x_{\pi_n}) \mid X_{(1)} = x_1, X_{(2)} = x_2, \ldots, X_{(n)} = x_n\big) = \frac{1}{n!}.$$

An example with Uniforms: Suppose that $X_1, X_2, \ldots, X_n$ are i.i.d. Uniform$(0, \theta)$. The joint density of $\{X_{(i)}\}$ is given by
$$f_{\mathrm{ord}}(x_1, x_2, \ldots, x_n) = \frac{n!}{\theta^n}\, 1\{0 < x_1 < x_2 < \cdots < x_n < \theta\}.$$
The marginal density of the maximum, $X_{(n)}$, is
$$f_{X_{(n)}}(x_n) = \frac{n}{\theta^n}\, x_n^{\,n-1}\, 1\{0 < x_n < \theta\}.$$
So the conditional density of $(X_{(1)}, X_{(2)}, \ldots, X_{(n-1)})$ given $X_{(n)} = x_n$, by direct division, is seen to be
$$f_{\mathrm{cond}}(x_1, x_2, \ldots, x_{n-1}) = \frac{(n-1)!}{x_n^{\,n-1}}\, 1\{0 < x_1 < x_2 < \cdots < x_{n-1} < x_n\}.$$
This shows that the first $n-1$ order statistics, given the maximum $x_n$, are distributed as the $n-1$ order statistics from a sample of size $n-1$ from Uniform$(0, x_n)$. But note that the distribution of the vector $\{X_1, X_2, \ldots, X_n\} \setminus \{X_{(n)}\}$ given $(X_{(1)}, X_{(2)}, \ldots, X_{(n)})$ must be uniformly distributed over all the $(n-1)!$ permutations of the first $n-1$ order statistics. Thus, the random vector $\{X_1, X_2, \ldots, X_n\} \setminus \{X_{(n)}\}$, conditional on $X_{(n)}$, must behave like an i.i.d. random sample from Uniform$(0, X_{(n)})$. These arguments can be made more rigorous, but at the expense of much notation.
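This last conclusion is easy to probe by simulation (a sketch, not part of the notes, assuming NumPy; the sample size $n$, the number of replications, and $\theta$ are arbitrary choices): dividing the $n - 1$ observations other than the maximum by the maximum should produce draws that look i.i.d. Uniform$(0, 1)$ and are uncorrelated with the maximum.

```python
# Given the maximum, the remaining uniforms should be i.i.d. Uniform(0, X_(n));
# equivalently, (remaining values) / (maximum) should look like i.i.d. U(0, 1).
import numpy as np

rng = np.random.default_rng(4)
n, reps, theta = 5, 200_000, 2.0
x = rng.uniform(0.0, theta, size=(reps, n))

mx = x.max(axis=1)
rest = np.sort(x, axis=1)[:, :-1]             # drop the maximum in each row
ratios = (rest / mx[:, None]).ravel()         # should behave like U(0, 1) draws

for q in (0.25, 0.5, 0.75):                   # empirical quantiles vs U(0, 1)
    print(f"quantile {q}:", round(np.quantile(ratios, q), 3))
print("corr(smallest/max, max):", round(np.corrcoef(rest[:, 0] / mx, mx)[0, 1], 3))
```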

Order statistics and non-exchangeable distributions: Take $(X, Y)$ to be a pair of independent random variables, each defined on $(0, 1)$, with $X$ having Lebesgue density $f$ and $Y$ having Lebesgue density $g$. Now $X$ and $Y$ are not exchangeable. We consider $P((X, Y) \in A, (U, V) \in B)$, where $U = X \wedge Y$, $V = X \vee Y$, $A$ is a Borel subset of $(0, 1)^2$, and $B$ is a Borel subset of $\{x < y : x, y \in (0, 1)\}$. Let $\pi$ be the permutation on $\{1, 2\}$ that swaps indices. Then:
$$\begin{aligned}
P((X, Y) \in A, (U, V) \in B) &= P\big((X, Y) \in A \cap (B \cup \pi B)\big) \\
&= P\big((X, Y) \in A \cap B\big) + P\big((X, Y) \in A \cap \pi B\big) \\
&= \int_{A \cap B} f(x)\,g(y)\, dx\, dy + \int_{A \cap \pi B} f(x)\,g(y)\, dx\, dy \\
&= \int_{A \cap B} f(u)\,g(v)\, du\, dv + \int_{\pi A \cap B} f(v)\,g(u)\, du\, dv \quad \text{(change of variable)} \\
&= \int_B \big\{ f(u)\,g(v)\, 1\big((u, v) \in A\big) + f(v)\,g(u)\, 1\big((u, v) \in \pi A\big) \big\}\, du\, dv.
\end{aligned}$$

From the above derivation, taking $A$ to be the unit square, we find that
$$P((U, V) \in B) = \int_B \big(f(u)\,g(v) + f(v)\,g(u)\big)\, du\, dv,$$
so that $dF_{U,V}(u, v) = \big(f(u)\,g(v) + f(v)\,g(u)\big)\, du\, dv$. Conclude that
$$P((X, Y) \in A, (U, V) \in B) = \int_B \xi\big((u, v), A\big)\, dF_{U,V}(u, v),$$
where, for $u < v$,
$$\xi\big((u, v), A\big) = \frac{f(u)\,g(v)}{f(u)\,g(v) + f(v)\,g(u)}\, 1\big((u, v) \in A\big) + \frac{f(v)\,g(u)}{f(u)\,g(v) + f(v)\,g(u)}\, 1\big((u, v) \in \pi A\big).$$

Remark: If $(X_1, X_2, \ldots, X_n)$ is a random vector with density $\prod_{i=1}^n f_i(x_i)$, you should be able to guess the form of the conditional distribution of the $X_i$'s given the order statistics.
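The kernel $\xi$ can also be checked numerically (a sketch, not part of the notes, assuming NumPy; the choices $X \sim \mathrm{Beta}(2, 1)$, $Y \sim U(0, 1)$, and the point $(u_0, v_0)$ are arbitrary). With $f(x) = 2x$ and $g \equiv 1$ on $(0, 1)$, the conditional probability that $(X, Y) = (u, v)$ rather than $(v, u)$, given $(U, V) = (u, v)$ with $u < v$, is $f(u)g(v)/(f(u)g(v) + f(v)g(u)) = u/(u + v)$.

```python
# Among pairs whose (min, max) falls near (u0, v0), the fraction with X < Y
# should be close to xi((u0, v0), {x < y}) = u0 / (u0 + v0) for these densities.
import numpy as np

rng = np.random.default_rng(5)
n = 2_000_000
x = rng.beta(2.0, 1.0, size=n)     # density f(x) = 2x on (0, 1)
y = rng.uniform(size=n)            # density g(y) = 1 on (0, 1)
u, v = np.minimum(x, y), np.maximum(x, y)

u0, v0, h = 0.2, 0.7, 0.02         # arbitrary point with u0 < v0, and window width
near = (np.abs(u - u0) < h) & (np.abs(v - v0) < h)
print("P(X < Y | (U, V) near (u0, v0)) ~", np.mean(x[near] < y[near]))
print("xi((u0, v0), {x < y})          =", u0 / (u0 + v0))
```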