Continuous random variables and probability distributions

Sta. 113, Chapter 4 of Devore. March 12, 2010

Mathematical definition
Definition. A random variable $X$ is continuous if its set of possible values is an entire interval of real numbers: $x \in [A,B]$ for $A < B$.

Examples
1. Heights of people.
2. Amount of rainfall per square meter.
3. IQ scores.

Probability distributions
Definition. Let $X$ be a continuous rv. The probability density function (pdf) of $X$ is a function $p(x)$ such that for any two numbers $a \le b$,
$$\mathbb{P}(a \le X \le b) = \int_a^b p(x)\,dx.$$
The probability that $X$ takes values in the interval $[a,b]$ is the area under the graph of the density function over that interval.
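The area interpretation can be checked numerically. The sketch below is only an illustration: the density $p(x) = 2x$ on $[0,1]$ and the endpoints 0.2 and 0.5 are assumptions chosen for this example, not taken from the slides. It uses the base Matlab function `integral`.

```matlab
% Hypothetical density for illustration: p(x) = 2x on [0,1], 0 elsewhere.
p = @(x) 2*x .* (x >= 0 & x <= 1);

a = 0.2; b = 0.5;              % hypothetical interval endpoints
prob_ab = integral(p, a, b);   % P(a <= X <= b) as an area; exactly b^2 - a^2
total   = integral(p, 0, 1);   % total area under the density

fprintf('P(%.1f <= X <= %.1f) = %.4f\n', a, b, prob_ab);
fprintf('total area = %.4f\n', total);
```

For this density the exact answer is $0.5^2 - 0.2^2 = 0.21$, and the total area is 1, as it must be for any pdf.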

Picture

Restatement
Proposition. Let $X$ be a continuous rv. Then for any number $c$, $\mathbb{P}(X = c) = 0$, and for any two numbers $a < b$,
$$\mathbb{P}(a \le X \le b) = \mathbb{P}(a < X \le b) = \mathbb{P}(a \le X < b) = \mathbb{P}(a < X < b).$$

Cumulative distribution function
Definition. The cumulative distribution function (cdf) $F(x)$ for a continuous rv $X$ is defined for every number $x$ by
$$F(x) = \mathbb{P}(X \le x) = \int_{-\infty}^{x} p(u)\,du.$$
So for each $x$, $F(x)$ is the area under the density to the left of $x$.

Picture [plot; horizontal axis from -10 to 10, vertical axis from 0 to 1]

Matlab code
x = -10:.01:10;
y = normpdf(x,.5,1);     % normal pdf with mean .5 and standard deviation 1
plot(x,y,'r'); hold on;  % pdf in red
y1 = normcdf(x,.5,1);    % matching cdf
plot(x,y1,'b');          % cdf in blue

More properties
Proposition. Let $X$ be a continuous rv with pdf $p(x)$ and cdf $F(x)$. Then for any number $a$,
$$\mathbb{P}(X > a) = 1 - F(a),$$
and for any two numbers $a < b$,
$$\mathbb{P}(a \le X \le b) = F(b) - F(a).$$

More properties
Theorem (Radon-Nikodym). Let $X$ be a continuous rv with pdf $p(x)$ and cdf $F(x)$. Then at every $x$ at which $F'(x)$ exists, $F'(x) = p(x)$.
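Both properties can be sanity-checked on a small example. The sketch below again assumes the hypothetical density $p(x) = 2x$ on $[0,1]$, whose cdf on that interval is $F(x) = x^2$. The points 0.3, 0.8 and 0.6 are arbitrary choices for illustration.

```matlab
% Hypothetical density p(x) = 2x on [0,1]; its cdf on [0,1] is F(x) = x^2.
p = @(x) 2*x;
F = @(x) x.^2;

a = 0.3; b = 0.8;
byArea = integral(p, a, b);    % P(a <= X <= b) as an area under p
byCdf  = F(b) - F(a);          % the same probability from the cdf

x0 = 0.6; h = 1e-5;
dF = (F(x0 + h) - F(x0 - h)) / (2*h);   % central-difference estimate of F'(x0)

fprintf('area = %.6f, F(b) - F(a) = %.6f\n', byArea, byCdf);
fprintf('F''(x0) = %.6f, p(x0) = %.6f\n', dF, p(x0));
```

The two probabilities agree, and the finite-difference slope of $F$ at $x_0$ matches $p(x_0)$ up to rounding error.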

Picture [plot; horizontal axis from 0 to 2, vertical axis from 0 to 1]

Erf [plot; horizontal axis from -10 to 10, vertical axis from 0 to 1]

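Matlab code
x = 0:.01:2;             % grid on [0,2], the support of the distribution below
y = unifpdf(x,0,2);      % uniform pdf on [0,2]
plot(x,y,'r'); hold on;  % pdf in red
y1 = unifcdf(x,0,2);     % matching cdf
plot(x,y1,'b');          % cdf in blue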

Reprise
The opposite of integrate is differentiate.
Proposition.
$$F(x) = \int_{-\infty}^{x} p(u)\,du, \qquad F'(x) = p(x).$$

Mean
Definition. The expected or mean value of a continuous rv $X$ with pdf $p(x)$ is
$$\mu_X = E(X) = \int_{-\infty}^{\infty} x\,p(x)\,dx,$$
and the expectation of a function $h(X)$ is
$$\mu_{h(X)} = E[h(X)] = \int_{-\infty}^{\infty} h(x)\,p(x)\,dx.$$

Variance
Definition. The variance of a continuous rv $X$ with pdf $p(x)$ is
$$\sigma_X^2 = V(X) = \int_{-\infty}^{\infty} (x - \mu)^2\,p(x)\,dx,$$
and the standard deviation is $\sigma_X$.

Percentiles
Definition. Let $p$ be a number between 0 and 100. The $p$-th percentile of the distribution of a continuous rv $X$ is the value $a$ such that
$$p = 100\,F(a) = 100 \int_{-\infty}^{a} p(u)\,du.$$
The median is the value $a$ for which $p = 50$.
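When $F$ cannot be inverted by hand, a percentile can be found by solving $100\,F(a) = p$ numerically. The sketch below assumes a hypothetical cdf $F(x) = x^2$ on $[0,1]$ (the cdf of the density $2x$) and uses the base Matlab root finder `fzero`. The variable name `pct` is an illustrative choice that avoids a clash with the density $p$.

```matlab
% Hypothetical cdf for illustration: F(x) = x^2 on [0,1].
F = @(x) x.^2;

pct = 50;                                   % which percentile: 50 gives the median
a = fzero(@(x) 100*F(x) - pct, [0 1]);      % solve 100*F(a) = pct on [0,1]

fprintf('%d-th percentile: a = %.4f\n', pct, a);
```

For this cdf the answer can be checked by hand: $a = \sqrt{p/100}$, so the median is $\sqrt{0.5}$.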

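For real numbers $a < b$, the uniform distribution on $[a,b]$ has pdf
$$p(x;a,b) = \begin{cases} 0 & \text{if } x < a \\ \dfrac{1}{b-a} & \text{if } a \le x \le b \\ 0 & \text{if } x > b \end{cases}$$
and cdf
$$F(x;a,b) = \begin{cases} 0 & \text{if } x < a \\ \dfrac{x-a}{b-a} & \text{if } a \le x < b \\ 1 & \text{if } x \ge b. \end{cases}$$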

Picture [plot; horizontal axis from 0 to 2, vertical axis from 0 to 1]

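Mean
$$\begin{aligned}
\mu_X &= \int_a^b \frac{1}{b-a}\,x\,dx \\
&= \frac{1}{b-a} \int_a^b x\,dx \\
&= \frac{1}{b-a} \left[\frac{x^2}{2}\right]_a^b \\
&= \frac{1}{b-a} \cdot \frac{b^2 - a^2}{2} \\
&= \frac{1}{b-a} \cdot \frac{(b-a)(b+a)}{2} = \frac{b+a}{2}.
\end{aligned}$$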

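Variance
$$\sigma_X^2 = \int_a^b \frac{1}{b-a} \left(x - \frac{a+b}{2}\right)^2 dx.$$
Change of variables $u = x - \frac{a+b}{2}$:
$$\begin{aligned}
\sigma_X^2 &= \frac{1}{b-a} \int_{(a-b)/2}^{(b-a)/2} u^2\,du \\
&= \frac{1}{3(b-a)} \left[\frac{(b-a)^3}{8} - \frac{(a-b)^3}{8}\right] \\
&= \frac{1}{3} \left[\frac{(b-a)^2}{8} - \frac{(a-b)(b-a)}{8}\right] \\
&= \frac{1}{3} \left[\frac{a^2 + b^2 - 2ab}{8} + \frac{a^2 + b^2 - 2ab}{8}\right] \\
&= \frac{1}{3} \cdot \frac{2(a-b)^2}{8} \\
&= \frac{(a-b)^2}{12}.
\end{aligned}$$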
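The closed forms $(a+b)/2$ and $(a-b)^2/12$ can be checked by numerical integration. In the sketch below the endpoints $a = 1$, $b = 4$ are arbitrary values chosen for illustration.

```matlab
a = 1; b = 4;                            % hypothetical endpoints, a < b
p = @(x) ones(size(x)) / (b - a);        % uniform density on [a,b]

mu     = integral(@(x) x .* p(x), a, b);             % mean by integration
sigma2 = integral(@(x) (x - mu).^2 .* p(x), a, b);   % variance by integration

fprintf('mean:     %.4f (formula %.4f)\n', mu, (a + b)/2);
fprintf('variance: %.4f (formula %.4f)\n', sigma2, (a - b)^2/12);
```

With these endpoints the formulas give a mean of 2.5 and a variance of 0.75.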

pdf
Definition. A continuous rv $X$ has a Gaussian or normal distribution with parameters $-\infty < \mu < \infty$ and $0 < \sigma$ if its pdf is
$$p(x) = \mathrm{No}(x;\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}.$$

cdf
Theorem. If $X \sim N(\mu,\sigma)$, the cdf is
$$\mathbb{P}(X \le x) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} e^{-(z-\mu)^2/(2\sigma^2)}\,dz.$$

Picture [plot; horizontal axis from -10 to 10, vertical axis from 0 to 1]

Carl Friedrich Gauss

Abraham de Moivre

The normal pdf Fix µ =.4 and vary σ.

The normal pdf [sequence of ten plots of p(x) against x for x from -20 to 20, one per value of σ; the peak height decreases from plot to plot]

Matlab code
x = -20:.0005:20;
sig = .5;
for i = 1:10
    y = normpdf(x,0,sig*i);            % normal pdf with mean 0 and sd sig*i
    ym = max(y);                       % peak height, used as the y-axis limit
    figure(i);
    plot(x,y,'*');
    h = gca;
    set(h,'FontSize',20);
    set(h,'XLim',[-20 20]);
    set(h,'YLim',[0 ym]);
    xlabel('x'); ylabel('p(x)');
    filename = sprintf('vpnorm%d.eps',i);
    saveas(gcf,filename,'psc2');       % save the current figure
end

Standardization
Definition. A continuous rv $X$ has a standard normal distribution if it is Gaussian with parameters $\mu = 0$ and $\sigma = 1$.
Definition. A Gaussian rv $X$ with mean $\mu$ and standard deviation $\sigma$ can be standardized to a standard normal variable $Z$ by the transformation
$$z = \frac{x - \mu}{\sigma},$$
called a z-score.

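Erf
Definition. The error function is the cdf of a standard normal rv $Z$:
$$\operatorname{erf}(z) = \Phi(z) = \mathbb{P}(Z \le z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-u^2/2}\,du.$$
Note that the name erf is more commonly used for the function $\frac{2}{\sqrt{\pi}} \int_0^z e^{-t^2}\,dt$, which is also what Matlab's erf computes. In terms of that function, $\Phi(z) = \frac{1}{2}\left[1 + \operatorname{erf}(z/\sqrt{2})\right]$.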
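Matlab's built-in `erf` follows the convention $\frac{2}{\sqrt{\pi}}\int_0^z e^{-t^2}\,dt$, so the standard normal cdf is obtained from it as $\Phi(z) = \frac{1}{2}[1 + \operatorname{erf}(z/\sqrt{2})]$. The sketch below compares that expression with a direct numerical integral of the standard normal density. The names `Phi` and `phi` and the point $z = 1.3$ are illustrative choices.

```matlab
Phi = @(z) 0.5 * (1 + erf(z / sqrt(2)));    % standard normal cdf via built-in erf
phi = @(u) exp(-u.^2 / 2) / sqrt(2*pi);     % standard normal density

z = 1.3;                                    % arbitrary test point
byIntegral = integral(phi, -Inf, z);        % area under the density left of z
byErf      = Phi(z);

fprintf('integral: %.6f, via erf: %.6f\n', byIntegral, byErf);
```

With the Statistics Toolbox, `normcdf(z)` should return the same value.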

$z_\alpha$ notation
Definition. The notation $z_\alpha$ denotes the value $z$ such that, for a standard normal rv $Z$, $\Pr(Z \ge z_\alpha) = \alpha$, or equivalently $\Pr(Z < z_\alpha) = 1 - \alpha$.

Pictures

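Standardization
If $X \sim \mathrm{No}(\mu,\sigma)$ then
$$Z = \frac{X - \mu}{\sigma}$$
is standard normal. Thus
$$\begin{aligned}
\mathbb{P}(a \le X \le b) &= \mathbb{P}\left(\frac{a-\mu}{\sigma} \le Z \le \frac{b-\mu}{\sigma}\right) \\
&= \Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right),
\end{aligned}$$
and
$$\mathbb{P}(X \le a) = \Phi\left(\frac{a-\mu}{\sigma}\right), \qquad \mathbb{P}(X > b) = 1 - \Phi\left(\frac{b-\mu}{\sigma}\right).$$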
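As a small worked case, the sketch below computes an interval probability for a normal rv in two ways: through the standardized endpoints and $\Phi$, and by integrating the pdf directly. The values $\mu = 10$, $\sigma = 2$, $a = 8$, $b = 13$ are assumptions made up for this example, and $\Phi$ is built from Matlab's base `erf`.

```matlab
Phi = @(z) 0.5 * (1 + erf(z / sqrt(2)));    % standard normal cdf

mu = 10; sigma = 2;                         % hypothetical parameters
a = 8; b = 13;                              % hypothetical interval

% Standardize the endpoints, then difference the standard normal cdf.
viaPhi = Phi((b - mu)/sigma) - Phi((a - mu)/sigma);

% Direct check: integrate the No(mu, sigma) density over [a, b].
dens = @(x) exp(-(x - mu).^2 / (2*sigma^2)) / (sigma*sqrt(2*pi));
viaIntegral = integral(dens, a, b);

fprintf('via Phi: %.6f, via integral: %.6f\n', viaPhi, viaIntegral);
```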

Pictures

What to do with normality
An exam is taken and it is decided that grading will be curved. The mean grade is 74 points and the standard deviation is 5 points. The professor decides that grades will be given based on quantiles:

Grade   Score
A       above the 90-th quantile
B       between the 75-th and 90-th quantiles
C       between the 65-th and 75-th quantiles
D       between the 45-th and 65-th quantiles
F       below the 45-th quantile

What scores define the boundaries for grades?

Percentile computation
The distribution is No(74, 5). We need to compute values $v_1, v_2, v_3, v_4$ such that
$$\mathbb{P}(X < v_1) = .9, \quad \mathbb{P}(X < v_2) = .75, \quad \mathbb{P}(X < v_3) = .65, \quad \mathbb{P}(X < v_4) = .45.$$
First standardize $X \to Z$, so
$$\mathbb{P}(Z < z) = .9 = \Phi(z_{.1}).$$
So
$$\frac{v_1 - \mu}{\sigma} = z_{.1},$$
which implies
$$v_1 = \sigma z_{.1} + \mu = 5\,z_{.1} + 74.$$
Same idea for $v_2, v_3, v_4$.
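The four boundaries can be computed in a few lines. The sketch below is one possible way to do it, not necessarily how it was done in class. It inverts $\Phi$ with the base Matlab function `erfcinv`, using the identity $\Phi^{-1}(q) = -\sqrt{2}\,\operatorname{erfcinv}(2q)$. The variable names are illustrative.

```matlab
mu = 74; sigma = 5;                 % mean and standard deviation from the example
q  = [0.90 0.75 0.65 0.45];         % P(X < v) for v1, v2, v3, v4

zq = -sqrt(2) * erfcinv(2*q);       % standard normal quantiles, Phi(zq) = q
v  = mu + sigma * zq;               % undo the standardization: v = sigma*z + mu

for k = 1:numel(q)
    fprintf('v%d = %.2f  (P(X < v%d) = %.2f)\n', k, v(k), k, q(k));
end
```

This gives boundaries of roughly 80.4, 77.4, 75.9 and 73.4 points. With the Statistics Toolbox, `norminv(q, mu, sigma)` should return the same values.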

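pdf
Definition. A continuous rv $X$ has an exponential distribution with parameter $\lambda > 0$ if its pdf is
$$p(x) = \exp(x;\lambda) = \lambda e^{-\lambda x} \quad \text{for } x > 0.$$
$$\mu = \frac{1}{\lambda}, \qquad \sigma = \frac{1}{\lambda}.$$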

cdf
Definition. The cdf of an exponential distribution with parameter $\lambda > 0$ is
$$\mathbb{P}(X < x) = 1 - e^{-\lambda x} \quad \text{for } x > 0.$$

The exponential pdf
Vary λ. Note that Matlab parameterizes the exponential distribution by its mean $\mu = 1/\lambda$ rather than by the rate $\lambda$.
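The relation between the rate $\lambda$ and the mean $\mu = 1/\lambda$ can be checked numerically. The sketch below writes the density directly in terms of $\lambda$ using base Matlab only. The rate $\lambda = 0.5$ and the point $x_0 = 3$ are arbitrary values for illustration.

```matlab
lambda = 0.5;                               % hypothetical rate
p = @(x) lambda * exp(-lambda * x);         % exponential density for x > 0
F = @(x) 1 - exp(-lambda * x);              % exponential cdf for x > 0

mu    = integral(@(x) x .* p(x), 0, Inf);                   % mean, 1/lambda
sigma = sqrt(integral(@(x) (x - mu).^2 .* p(x), 0, Inf));   % sd, also 1/lambda

x0 = 3;
area = integral(p, 0, x0);                  % should match F(x0)

fprintf('mean = %.4f, sd = %.4f, 1/lambda = %.4f\n', mu, sigma, 1/lambda);
fprintf('area up to x0 = %.6f, F(x0) = %.6f\n', area, F(x0));
```

With the Statistics Toolbox, the same density values come from `exppdf(x, 1/lambda)`, since the second argument of `exppdf` is the mean.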

The exponential pdf [sequence of plots of p(x) against x for x from 0 to 40, one per value of λ; the peak height decreases from plot to plot]
Slide footer: Artin Armagan and Sayan Mukherjee

Matlab code
x = 0:.0005:40;
mu = .5;
for i = 1:10
    y = exppdf(x,mu*i);                % exponential pdf with mean mu*i
    ym = max(y);                       % peak height, used as the y-axis limit
    figure(i);
    plot(x,y,'*');
    h = gca;
    set(h,'FontSize',20);
    set(h,'XLim',[0 40]);
    set(h,'YLim',[0 ym]);
    xlabel('x'); ylabel('p(x)');
    filename = sprintf('vlexp%d.eps',i);
    saveas(gcf,filename,'psc2');       % save the current figure
end

Some properties
Proposition. Suppose that the number of events occurring in any time interval of length $t$ has a Poisson distribution with parameter $\lambda t$ (where $\lambda$, the rate of the event process, is the expected number of events occurring in 1 unit of time) and that the numbers of occurrences in nonoverlapping intervals are independent of one another. Then the elapsed time between two successive events is exponentially distributed with parameter $\lambda$.

Some properties
Proposition. The exponential distribution is the continuous analog of the geometric distribution.
Proposition. The exponential distribution is memoryless:
$$\mathbb{P}(T > s + t \mid T > s) = \mathbb{P}(T > t) \quad \text{for all } s, t \ge 0.$$
In words, if the component has already lasted time $s$, then its chance of lasting a further time $t$ is a function of $t$ alone and has nothing to do with $s$.
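Memorylessness follows from the survival function $\mathbb{P}(T > t) = e^{-\lambda t}$, because $e^{-\lambda(s+t)} / e^{-\lambda s} = e^{-\lambda t}$. The sketch below checks this numerically. The rate and the times $s$, $t$ are arbitrary values for illustration.

```matlab
lambda = 0.5;                       % hypothetical rate
S = @(t) exp(-lambda * t);          % survival function P(T > t)

s = 2; t = 3;                       % hypothetical elapsed and additional times
conditional   = S(s + t) / S(s);    % P(T > s+t | T > s) = P(T > s+t) / P(T > s)
unconditional = S(t);               % P(T > t)

fprintf('P(T > s+t | T > s) = %.6f\n', conditional);
fprintf('P(T > t)           = %.6f\n', unconditional);
```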

Gamma functions
The exponential distribution is an example from a more general class of distributions: the gamma distribution. We first need to define the gamma function, which for $\alpha > 0$ is
$$\Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1} e^{-x}\,dx.$$
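The integral definition can be compared with Matlab's built-in `gamma` function. In the sketch below the argument 4.5 is an arbitrary choice. The last two lines use the standard facts $\Gamma(n) = (n-1)!$ for positive integers $n$ and $\Gamma(1/2) = \sqrt{\pi}$.

```matlab
a = 4.5;                                                  % arbitrary argument > 0
byIntegral = integral(@(x) x.^(a - 1) .* exp(-x), 0, Inf);   % the defining integral
byBuiltin  = gamma(a);                                    % Matlab's gamma function

fprintf('integral: %.6f, gamma(%.1f): %.6f\n', byIntegral, a, byBuiltin);
fprintf('gamma(5) = %g, 4! = %g\n', gamma(5), factorial(4));
fprintf('gamma(0.5)^2 = %.6f, pi = %.6f\n', gamma(0.5)^2, pi);
```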

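pdf
Definition. A continuous rv $X$ has a gamma distribution with parameters $\alpha, \beta > 0$ if its pdf is
$$p(x) = \mathrm{gamma}(x;\alpha,\beta) = \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)}\, x^{\alpha-1} e^{-x/\beta} \quad \text{for } x > 0.$$
The standard gamma distribution has $\beta = 1$.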

Properties
$$\mu = \alpha\beta, \qquad \sigma^2 = \alpha\beta^2.$$
The parameter $\beta$ is called the scale parameter because it stretches or compresses the distribution.
Plot the gamma distribution for a variety of $\alpha$ and $\beta$.
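The moment formulas can be checked by integrating the gamma pdf numerically. In the sketch below, `shape` and `scale` stand for $\alpha$ and $\beta$. These names, and the values 3 and 2, are illustrative choices that avoid shadowing Matlab's built-in `alpha` and `beta` functions.

```matlab
shape = 3; scale = 2;               % hypothetical alpha and beta

% Gamma pdf written out from its definition, for x > 0.
p = @(x) x.^(shape - 1) .* exp(-x / scale) / (scale^shape * gamma(shape));

total  = integral(p, 0, Inf);                         % should be 1
mu     = integral(@(x) x .* p(x), 0, Inf);            % alpha * beta
sigma2 = integral(@(x) (x - mu).^2 .* p(x), 0, Inf);  % alpha * beta^2

fprintf('total = %.4f\n', total);
fprintf('mean = %.4f (formula %.4f)\n', mu, shape * scale);
fprintf('variance = %.4f (formula %.4f)\n', sigma2, shape * scale^2);
```

Replacing the numerical integrals with `plot(x, p(x))` on a grid of positive `x` values is one way to do the suggested plots for different parameter values.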

Chi-square distribution
The chi-square distribution is a particular case of the gamma distribution. It arises from squares of normal variables: if $X \sim N(\mu,\sigma)$, then $\left((X-\mu)/\sigma\right)^2$ has a chi-square distribution with one degree of freedom.

pdf
Definition. A continuous rv $X$ has a chi-squared distribution with parameter $\nu > 0$ if its pdf is the gamma pdf with $\alpha = \nu/2$ and $\beta = 2$:
$$p(x;\nu) = \mathrm{gamma}(x;\nu/2,2) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, x^{\nu/2-1} e^{-x/2} \quad \text{for } x > 0.$$
The parameter $\nu$ is the number of degrees of freedom of $X$.
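Two consequences of the definition can be checked numerically. With $\nu = 2$ the chi-squared pdf reduces to $\frac{1}{2}e^{-x/2}$, an exponential pdf with $\lambda = 1/2$. And the gamma mean $\alpha\beta$ becomes $(\nu/2) \cdot 2 = \nu$. The sketch below verifies both for $\nu = 2$. The grid of $x$ values is an arbitrary choice.

```matlab
nu = 2;                                             % degrees of freedom
chi2 = @(x) x.^(nu/2 - 1) .* exp(-x/2) / (2^(nu/2) * gamma(nu/2));   % chi-squared pdf
expo = @(x) 0.5 * exp(-0.5 * x);                    % exponential pdf, lambda = 1/2

x = 0.1:0.1:10;                                     % grid of positive values
maxDiff = max(abs(chi2(x) - expo(x)));              % largest pointwise difference

mu = integral(@(x) x .* chi2(x), 0, Inf);           % mean, should equal nu

fprintf('max difference from exponential pdf: %.2e\n', maxDiff);
fprintf('mean = %.4f (nu = %d)\n', mu, nu);
```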

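pdf
Definition. A continuous rv $X$ has a beta distribution with parameters $\alpha, \beta > 0$ and real numbers $a < b$ if its pdf is
$$p(x;\alpha,\beta,a,b) = \begin{cases} \dfrac{1}{b-a} \cdot \dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)} \left(\dfrac{x-a}{b-a}\right)^{\alpha-1} \left(\dfrac{b-x}{b-a}\right)^{\beta-1} & a \le x \le b \\ 0 & \text{otherwise.} \end{cases}$$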

Properties Plot the beta distribution as a function of α,β. How does the beta distribution relate to the uniform? How does the beta distribution relate to the gamma?
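The question about the uniform can be explored numerically. The sketch below writes the beta pdf from its definition as a hypothetical helper `betaPdf` (not a toolbox function) and evaluates it with $\alpha = \beta = 1$, where both power terms equal 1 and the pdf is the constant $1/(b-a)$. The endpoints and the second parameter pair are arbitrary choices for illustration.

```matlab
% Beta pdf on [a,b], written out from the definition (valid for a <= x <= b).
% al and be stand for alpha and beta; betaPdf is a hypothetical helper name.
betaPdf = @(x, al, be, a, b) (1/(b - a)) * gamma(al + be) / (gamma(al)*gamma(be)) ...
    .* ((x - a)/(b - a)).^(al - 1) .* ((b - x)/(b - a)).^(be - 1);

a = 1; b = 4;                               % hypothetical endpoints
x = linspace(a, b, 7);                      % a few points in [a,b]

vals = betaPdf(x, 1, 1, a, b);              % alpha = beta = 1: constant 1/(b-a)
total = integral(@(x) betaPdf(x, 2, 3, a, b), a, b);   % any valid pdf integrates to 1

disp(vals);
fprintf('total area for alpha = 2, beta = 3: %.4f\n', total);
```

Plotting `betaPdf` on a fine grid for several parameter pairs is one way to do the suggested plots.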