CS145: Probability & Computing

CS45: Probability & Computing Lecture 0: Continuous Bayes Rule, Joint and Marginal Probability Densities Instructor: Eli Upfal Brown University Computer Science Figure credits: Bertsekas & Tsitsiklis, Introduction to Probability, 2008 Pitman, Probability, 999

CS45: Lecture 0 Outline Ø Bayes Rule: Classification from Continuous Data Ø Joint and Marginal Probability Densities

Continuous Random Variables Ø For any discrete random variable, the CDF is discontinuous and piecewise constant Ø If the CDF is monotonically increasing and continuous, have a continuous random variable: 0 apple F X (x) apple F X (x 2 ) F X (x )ifx 2 >x. lim x! F X(x) =0 lim x!+ F X(x) = Ø The probability that continuous random variable X lies in the interval (x,x 2 ] is then P (x <Xapple x 2 )=F X (x 2 ) F X (x ) CDF 0 n = b a + F X (x) a b x

Probability Density Function (PDF) Ø If the CDF is differentiable, its first derivative is called the probability density function (PDF): f X (x) = df X(x) dx F X (x) = Z x = F 0 X(x) Ø By the fundamental theorem of calculus: Z x2 x f X (x) dx = F X (x 2 ) F X (x )=P (x <Xapple x 2 ) Ø For any valid PDF: f X (x) 0 Z + f X (t) dt f X (x) dx = CDF 0 f (x X ) b a 0 a b x a b x

Classification Problems Athitsos et al., CVPR 2004 & PAMI 2008 Ø Which of the 0 digits did a person write by hand? Ø Which of the basic ASL phonemes did a person sign? Ø Is an email spam or not spam (ham)? Ø Is this image taken in an indoor our outdoor environment? Ø Is a pedestrian visible from a self-driving car s camera? Ø What language is a webpage or document written in? Ø How many stars would a user rate a movie that they ve never seen?

Continuous & Discrete Variables X! discrete random variable with PMF p X (x) p X (x) 0, X p X (x) =. x Example: Probability of each category in a classification problem. Y! continuous random variable with conditional PDF f Y X (y x) f Y X (y x) 0, Z + f Y X (y x) dy = for all x. Example: Output of temperature sensor, motion sensor, camera, etc. For each possible X=x, the distribution of this sensor is different.

Marginal Distributions are Mixtures X! discrete random variable with PMF p X (x) p X (x) 0, X p X (x) =. Y! continuous random variable with conditional PDF f Y X (y x) f Y X (y x) 0, Z + x f Y X (y x) dy = for all x. Marginal Distribution: Summing over all possible values of X, the marginal CDF (PDF) is a mixture of the conditional CDFs (PDFs): F Y (y) =P (Y apple y) = X x P (X = x \ Y apple y) = X x p X (x)p (Y apple y X = x) = X x p X (x)f Y X (y x) Ø Then taking derivative of each term in sum: f Y (y) = X x p X (x)f Y X (y x)

Example: Hard Drive Lifetimes Ø Suppose 90% of hard drives in some laptop computer 0 model have exponentially distributed lifetime param f Y X (y 0) = 0 e 0y p X (0) = 0.9 Exponential Distributions: Ø However, 0% of hard drives have a manufacturing defect that gives them a shorter lifetime f Y X (y ) = e y Ø Recall mean of exponential distribution: > 0 p X () = 0. E[Y X = 0] = 0 > = E[Y X = ] F Y (y) = e y f Y (y) = e y

Example: Hard Drive Lifetimes Ø Suppose 90% of hard drives in some laptop computer 0 model have exponentially distributed lifetime param f Y X (y 0) = 0 e 0y p X (0) = 0.9 Exponential Distributions: Ø However, 0% of hard drives have a manufacturing defect that gives them a shorter lifetime f Y X (y ) = e y > 0 p X () = 0. Ø If your hard drive has operated for t seconds and has not yet failed, what is the probability it is defective? P (Y > t X = )P (X = ) P (X = Y > t)= P (Y > t) 0.e t = 0.e t +0.9e 0t F Y (y) = e y f Y (y) = e y

Example: Hard Drive Lifetimes Ø Suppose 90% of hard drives in some laptop computer 0 model have exponentially distributed lifetime param f Y X (y 0) = 0 e 0y p X (0) = 0.9 Ø However, 0% of hard drives have a manufacturing defect that gives them a shorter lifetime f Y X (y ) = e y p X () = 0. Ø If your hard drive fails after exactly t seconds of operation, what is the probability it is defective? P (X = Y = t) = > 0 P (Y = t X = )P (X = ) P (Y = t) Problem: For continuous variable Y, P (Y = t) = 0? Exponential Distributions: F Y (y) = e y f Y (y) = e y

Discrete Inference from Continuous Data X! discrete random variable with PMF p X (x) Y! continuous random variable with conditional PDF f Y X (y x) Ø Y has a different PDF for each possible discrete X=x Ø Given Y=y, we want to find the probability of each possible X=x Ø But how can we condition on an event of probability zero? P(X = x Y = y) P(X = x y Y y + δ) l'hopital's Rule = p X(x)P(y Y y + δ X = x) P(y Y y + δ) p X(x)f Y X (y x)δ f Y (y)δ = p X(x)f Y X (y x). f Y (y) nator can be evaluated using a version of the total prob f Y (y) = X x f Y (y) PDF f X (x ) ( ) δ yx x y + δ p X (x)f Y X (y x)

Example: Hard Drive Lifetimes Ø Suppose 90% of hard drives in some laptop computer 0 model have exponentially distributed lifetime param f Y X (y 0) = 0 e 0y p X (0) = 0.9 Exponential Distributions: Ø However, 0% of hard drives have a manufacturing defect that gives them a shorter lifetime f Y X (y ) = e y p X () = 0. Ø If your hard drive fails after exactly t seconds of operation, what is the probability it is defective? P (X = Y = t) = f Y X(y )p X () f Y (y) = > 0 0. e t 0. e t +0.9 0 e 0t F Y (y) = e y f Y (y) = e y

Classification from Continuous Data Suppose X is discrete and Y is continuous, and we observe Y=y. Prior probability mass function for X: p X (x) Conditional probability density function for Y: f Y X (y x) Posterior probability mass function of X given Y=y: px Y (x y) = p X (x)f (y x) Y X f Y (y) f Y (y) = X px (x)f Y X (y x) x weight 280 260 240 220 200 80 60 40 20 00 f Y X (y ) red = female, blue=male f Y X (y 0) 80 55 60 65 70 75 80 height f Y (y)

Moments of Continuous Data Suppose X is discrete and Y is continuous, and we observe Y=y. Prior probability mass function for X: p X (x) Conditional probability density function for Y: f Y X (y x) Marginal expected value of random variable Y: E[Y ]= Z + f Y (y) = X x yf Y (y) dy = X x = X x p X (x)f Y X (y x) p X (x) Z + yf Y X (y x) dy p X (x)e[y X = x] weight 280 260 240 220 200 80 60 40 20 00 f Y X (y ) red = female, blue=male f Y X (y 0) 80 55 60 65 70 75 80 height f Y (y)

Mixed Continuous/Discrete Distributions Representations of distribution of X: Ø CDF is (always) well defined Ø Formally, PDF does not exist in this case Ø Informally, can illustrate via a hybrid of a probability density function (PDF) and a probability mass function (PMF) Ø Related concepts in physics & engineering: Dirac delta function, impulse response Schematic drawing of a combination of a PDF and a PMF /2 0 /2 x 0 The corresponding CDF: F X (x) =P(X x) Distribution of random variable X: Ø With probability 0.5, X has a continuous uniform distribution on [0,] Ø With probability 0.5, X=0.5 CDF 3/4 /4 /2 x

CS45: Lecture 0 Outline Ø Bayes Rule: Classification from Continuous Data Ø Joint and Marginal Probability Densities

Joint Probability Distributions Definition The joint distribution function of X and Y is F (x, y) =Pr(X apple x, Y apple y). The variables X and Y have joint density function f if for all x, y, F (x, y) = Z y Z x f (u, v) du dv. Z Z when the derivative exists. f (x, y) = @2 F (x, y) @x@y P( x X x +,y Y y+ ) f ( ) 2 X,Y x, y

Example: Joint Distribution values of x f XY (x, y) = 5!x(y x)( y) 0 <x<y< Z Z y P (X > 0.25,Y > 0.5) = Z Z y 0.5 0.25 f XY (x, y) dxdy 0 0 x(y x)( y) dxdy = 5! Pitman s Probability, 999

2D Uniform Distributions Density function: f (x, y) = x, y 2 [0, ] 2 0 otherwise. Probability distribution: F (x, y) =Pr(X apple x, Y apple y) = Z min[,x] Z min[,y] 0 0 dxdy F (x, y) = 8 >< >: 0 if x < 0 or y < 0 xy if x, y 2 [0, ] 2 x 0 apple x apple, y > y 0 apple y apple, x > x >, y >.

Definition Marginal Distributions Given a joint distribution function F X,Y (x, y) the marginal distribution function of X is F X (x) =Pr(X apple x) = Z x Z f X,Y (x, y)dydx and the corresponding marginal density functions is f X (x) = Z Example: Uniform Distributions x, y 2 [0, ] 2 f X,Y (x, y) = 0 otherwise. f X,Y (x, y)dy 8 < R : R f X (x) = x 2 [0, ] 0 otherwise.

Independence Definition The random variables X and Y are independent if for all x, y, Pr((X apple x) \ (Y apple y)) = Pr(X apple x)pr(y apple y). Two random variables are independent if and only if F (x, y) =F X (x)f Y (y). f (x, y) =f X (x)f Y (y). If X and Y are independent then E[XY ]=E[X ]E[Y ].

Classification Problems Athitsos et al., CVPR 2004 & PAMI 2008 Ø Which of the 0 digits did a person write by hand? Ø Which of the basic ASL phonemes did a person sign? Ø Is this image taken in an indoor our outdoor environment? Ø Is a pedestrian visible from a self-driving car s camera? If raw data is a list of M continuous features (like pixel intensities), scale models to high-dimensional data via independence assumptions: f Y X (y,y 2,...,y M x) = MY f Yi X(y i x) i=

Buffon s Needle Bu on s needle Parallel lines at distance d Needle of length (assume <d) Find P(needle intersects one of the lines) x l q d X [0,d/2]: distance of needle midpoint to nearest line Model: X, uniform, independent 4 d fx, ( x, ) = 0 x d/ 2, 0 /2 Intersect if X sin 2 Z Z P X sin 2 = Z Z x 2 sin f X (x)f ( ) dx d 4 Z /2 Z ( /2) sin = dx d d 0 0 = Z 4 /2 2 sin d = d 0 2 d