MAS223 Statistical Modelling and Inference Examples

Chapter 1

Example 1: Sample spaces and random variables

Let S be the sample space for the experiment of tossing two coins, i.e. S = {HH, HT, TH, TT}. Define the random variables X to be the number of heads seen, and Y to be equal to 5 if we see both a head and a tail, and 0 otherwise.

Element of S | Value of X | Value of Y
HH           | 2          | 0
HT           | 1          | 5
TH           | 1          | 5
TT           | 0          | 0

Example 2: Discrete random variables

The random variables X and Y from Example 1 are both discrete random variables.

> Calculate P[X ≥ 1].

If X ≥ 1 then either X = 1 or X = 2. We have P[X ≥ 1] = P[X = 1] + P[X = 2] = 1/2 + 1/4 = 3/4.

> Sketch the distribution function of X.

[Sketch of the distribution function of X omitted.]

Example 3: Continuous random variables

> Recall from MAS113 that an exponential random variable with parameter λ > 0 has probability density function

f_X(x) = λe^{-λx} if x > 0, and f_X(x) = 0 otherwise.

Calculate P[1 ≤ X ≤ 2], and find the distribution function F_X(x).

We can calculate

P[1 ≤ X ≤ 2] = ∫_1^2 f_X(x) dx = ∫_1^2 λe^{-λx} dx = [-e^{-λx}]_{x=1}^{x=2} = e^{-λ} - e^{-2λ}.

To find the distribution function, note that for x ≤ 0 we have P[X ≤ x] = 0, and for x > 0 we have

P[X ≤ x] = ∫_0^x f_X(u) du = ∫_0^x λe^{-λu} du = 1 - e^{-λx}.

Therefore,

F_X(x) = 1 - e^{-λx} if x > 0, and F_X(x) = 0 otherwise.

[Sketch of the distribution function F_X(x) omitted.]

Example 4: Properties of distribution functions

> Let

F(x) = 1 - 1/x if x > 1, and F(x) = 0 otherwise.

Sketch F and show that F is a distribution function.

[Sketch of F omitted.]

To show that F is a distribution function, we'll check properties 1-3 of distribution functions from the lecture notes.

1. From the definition, 0 ≤ F(x) ≤ 1 for all x. Since F(x) = 0 for all x ≤ 1 we have lim_{x→-∞} F(x) = 0, and also lim_{x→∞} (1 - 1/x) = 1.

2. Since F(x) = 0 for all x ≤ 1, it is clear that F(x) is non-decreasing while x ≤ 1. If 1 < x < y then 1/y < 1/x, so 1 - 1/x ≤ 1 - 1/y. Hence F is non-decreasing across all x ∈ R.

3. From its definition, F is continuous on (-∞, 1) and on (1, ∞). Since F(1+) = F(1) = 0, we have that F is continuous everywhere. Alternatively, in this course, we allow ourselves to prove continuity by drawing a sketch, as above.

Hence, F is a distribution function, and as a result there exists a random variable X with distribution function F_X = F.

Example 5: Calculating expectations and variances

> Let X be an Exponential random variable, from Example 3, with p.d.f. f_X(x) = λe^{-λx} for x > 0 (and 0 otherwise). Find the mean and variance of X.

We can calculate, integrating by parts,

E[X] = ∫_{-∞}^{+∞} x f_X(x) dx = ∫_0^∞ xλe^{-λx} dx = [-x e^{-λx}]_{x=0}^{∞} + ∫_0^∞ e^{-λx} dx = 0 + [-(1/λ) e^{-λx}]_{x=0}^{∞} = 1/λ.

For the variance, it is easiest to calculate E[X²] and then use from MAS113 that Var(X) = E[X²] - E[X]². So,

E[X²] = ∫_{-∞}^{+∞} x² f_X(x) dx = ∫_0^∞ x²λe^{-λx} dx = [-x² e^{-λx}]_{x=0}^{∞} + 2∫_0^∞ x e^{-λx} dx = (2/λ) ∫_0^∞ xλe^{-λx} dx = 2/λ²,

where we use that we already calculated ∫_0^∞ xλe^{-λx} dx = 1/λ. Hence,

Var(X) = E[X²] - E[X]² = 2/λ² - 1/λ² = 1/λ².

Chapter 2

Example 6: Calculating E[e^Y] where Y ~ N(0, 1)

> Let Y be a normal random variable, with mean 0 and variance 1, with p.d.f.

f_Y(y) = (1/√(2π)) e^{-y²/2}.

Find E[e^Y].

We need to calculate

E[e^Y] = ∫_{-∞}^{∞} e^y f_Y(y) dy = (1/√(2π)) ∫_{-∞}^{∞} e^y e^{-y²/2} dy.

We can't evaluate this integral explicitly. However, we do know the value of a similar integral; that is, we know

P[Y ∈ R] = (1/√(2π)) ∫_{-∞}^{∞} e^{-y²/2} dy = 1.

Our aim is to rewrite E[e^Y] into this form and hope we can deal with whatever else is left over. We can do so by completing the square:

e^y e^{-y²/2} = exp(-(y² - 2y)/2) = exp(-((y - 1)² - 1)/2) = e^{1/2} e^{-(y-1)²/2}.

Putting this into the expression for E[e^Y] above, we have

E[e^Y] = e^{1/2} (1/√(2π)) ∫_{-∞}^{∞} e^{-(y-1)²/2} dy = e^{1/2} (1/√(2π)) ∫_{-∞}^{∞} e^{-z²/2} dz,

where z = y - 1. Then, using the integral above, we have E[e^Y] = e^{1/2}. See Q.9 for a more general case of this method.

Example 7: Mean and variance of the Gamma distribution

> Let X have the Ga(α, β) distribution, where α, β > 0. Find the mean and variance of X.

We can calculate

E[X] = ∫ x f_X(x) dx = (β^α/Γ(α)) ∫_0^∞ x^α e^{-βx} dx = (β^α/Γ(α)) · Γ(α+1)/β^{α+1},

using the lemma ∫_0^∞ x^{α-1} e^{-βx} dx = Γ(α)/β^α from the notes. Since Γ(α+1) = αΓ(α), this equals αΓ(α)/(βΓ(α)) = α/β.

Similarly, for the variance,

E[X²] = ∫ x² f_X(x) dx = (β^α/Γ(α)) ∫_0^∞ x^{α+1} e^{-βx} dx = (β^α/Γ(α)) · Γ(α+2)/β^{α+2} = α(α+1)Γ(α)/(β²Γ(α)) = α(α+1)/β².

So

Var(X) = E[X²] - E[X]² = α(α+1)/β² - α²/β² = α/β².
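As a quick sanity check of Example 7, the closed forms α/β and α/β² can be verified numerically in R; the values of α and β below are arbitrary illustrative choices, and the snippet is only a sketch of how such a check might look.

alpha <- 3; beta <- 2                        # arbitrary illustrative values
m1 <- integrate(function(x) x   * dgamma(x, shape = alpha, rate = beta), 0, Inf)$value
m2 <- integrate(function(x) x^2 * dgamma(x, shape = alpha, rate = beta), 0, Inf)$value
c(mean = m1, theory = alpha / beta)          # both equal 1.5
c(var = m2 - m1^2, theory = alpha / beta^2)  # both equal 0.75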

5 Example 8: Mean and variance of the Beta distribution > Let X have the Beα, β distribution, where α, β >. Find the mean and variance of X. For the mean, For the variance, E[X] Bα, β xα x β dx Bα +, β Bα, β / Γα + Γβ ΓαΓβ using.5 Γα + β + Γα + β αγαγα + β using Lemma. α + βγα + βγα α α + β. E[X ] Bα, β xα+ x β dx Bα +, β Bα, β / Γα + Γβ ΓαΓβ using.5 Γα + β + Γα + β αα + ΓαΓα + β using Lemma. α + βα + β + Γα + βγα αα + α + βα + β +. So, using that VarX E[X ] E[X] we have Chapter 3 VarX αα + α + βα + β + α α + β αα + α + β α α + β + α + β α + β + αβ α + β α + β +. Example 9: Cube root of the Be3, distribution. > Let X Be3, and let Y 3 X. Find the probability density function of Y. From.6, the p.d.f. of the Beα, β distribution is Bα,β f X x xα x β if x, 5

6 Note that, for any α >, Bα, ΓαΓ Γα + ΓαΓ αγα α by.5 and Lemma.. Putting this, along with α 3 and β into.6, the p.d.f. of X is 3x if x, f X x For the transformation, we use the function gx 3 x, which is strictly increasing. 3 The p.d.f. of X is non-zero on,, and g maps R X, to,, so gr X,. We have g y y 3, so dg dy 3y. Therefore, by Lemma 3. we have 3y 3 3y if y, f Y y otherwise, 9y 8 if y, In fact, using the same calculations as above, it can be seen that this is the p.d.f. of a Be9, distribution. See Q3.5 for a more general case. Example : Standardization of the normal distribution. > Let X Nµ, σ and define Y X µ σ. Show that Y N,. The p.d.f. of the normal distribution with mean µ and variance σ is f X x x µ exp πσ σ with range R X R. The function gx x µ σ is strictly increasing, and gr R. 6

7 If y x µ σ then x σy + µ, hence the inverse function is g y σy + µ, with derivative dg dy σ >. Hence, by Lemma 3., f Y y exp y σ πσ π exp which is the p.d.f. of a N, random variable. Example : The log-normal distribution. y > Find the probability density function of Y e X, where X Nµ, σ. Recall that Y is known as the log-normal distribution, which we introduced in Section... The probability density function of X is f X x πσ exp x µ σ, which is non-zero for all x R. Our transformation is gx e x, which is strictly increasing for all x R. The range of X is R, which is mapped by g to gr,. We have g y log y, and dg dy f Y y y. Hence, by Lemma 3. the p.d.f. of Y is given by yσ exp π log y µ σ if y, otherwise. Example : Square of a standard normal the chi-squared distribution. > Let X N, and let Y X. Find the p.d.f. of Y and verify that Y has the χ distribution. We aim to find the p.d.f. of Y and check that it matches the p.d.f. given for the χ distribution in Section.3.3. Note that R X R, and we can t apply Lemma 3. because gx x is not strictly monotone on R. If y < then P[Y y] because Y X. Moreover, because the normal distribution is a continuous distribution, P[X ], so also P[Y ] 7

8 This leaves y >, and in this case we have F Y y P[Y y] P[ y X y] P[X y] P[X y] Φ y Φ y Here, Φx P[X x] is the distribution function of the standard normal distribution. Differentiating with respect to y, we have f Y y y φ y y φ y y φ y πy exp y/. Here, φ is the probability density function of the standard normal distribution. We use that φx φ x. If we recall from Section.3. that Γ/ π, we then have y f Y y Γ/ exp y if y > which exactly matches the p.d.f. given for the χ distribution in Section.3.3. Chapter 4 Example 3: Joint probability density functions > Let T be the triangle {x, y : x,, y, x}. Define kx + y if x, y T fx, y Find the value of k such that f is a joint probability density function. First, we sketch the region T on which f X,Y x, y is non-zero. 8

9 We need fx, y for all x, y, which means we must have k. T fx, y dx dy. Therefore, fx, y dy dx k k T x k. kx + y dy dx [xy + y 3x dx ] x y dx Also, we need that So k. Here, to find the limits of integration, we describe the region T as being covered by vertical lines, one for each fixed x. With x fixed, the range of y that makes up T is y, x. That is, we use that T {x, y : x,, y, x}. > If X and Y have joint p.d.f. f X,Y x, y fx, y, find P[X + Y > ]. To find P[X + Y > ], we need to integrate f X,Y x, y over the region of x, y for which x, y T and x + y >. Let s call this region T, and sketch it. 9

10 We have T {x, y : x,, y x, x}. So, P[X + Y > ] x x x + y dy dx [ xy + y ] x y x dx 4x dx [ 4 3 x3 x 3 3 ] 3 Example 4: Marginal distributions > Let X, Y be as in Example 3. Find the marginal p.d.f.s of X and Y. For x,, f X x f X,Y x, y dy x x + y dy [ xy + y ] x y 3x. Here, to find the limits of the integral, we keep x fixed, and then look for the range of y for which f X,Y x, y is non-zero. That is, we use T {x, y : x,, y, x}. For x /,, we have f X,Y x, y, so 3x if x, f X x is the marginal p.d.f. of X. For y,, we have f Y y f X,Y x, y dx y x + y dx [ x + xy ] xy + y 3y. Here, to find the limits of the integral, we keep y fixed, and then look for the range of x for which f X,Y x, y is non-zero. That is, we use T {x, y : y,, x y, }. For y /, we have f X,Y x, y, so + y 3y if y, f Y y is the marginal p.d.f. of Y. Example 5: Conditional distributions

11 > Let X, Y be as in Example 3. For y,, find the conditional p.d.f. of X given Y y. We obtained f Y y in Example 4, and we know f X,Y x, y from Example 3. Note that, with y, fixed, f X,Y x, y is non-zero only for x y,. So, f X Y y x f X,Y x, y x+y +y 3y if x y, f Y y Example 6: Independence, factorizing f X,Y. > Are the random variables X and Y from Example 3 independent? The random variables X and Y from Example 3 are not independent as the p.d.f. x + y if x, y T fx, y otherwise cannot be factorised as a function of x times a function of y. > Let U and V be two random variables with joint probability density function ue u+3v if u >, v > f U,V u, v Are U and V independent? f U,V u, v can be factorised into a function of x and a function of y, 4ue u 3e 3v if u >, v > f U,V u, v guhv where 4ue u if u > 3e 3v if v > gu hv otherwise, Therefore, U and V are independent. In fact, in this case we can recognize that g is the p.d.f. of a Ga, and h is the p.d.f. of a Exp3, so U and V are Ga, and Exp3 respectively. Example 7: Covariance and correlation > Let X, Y be as in Example 3. Find the covariance CovX, Y.

12 We want to calculate CovX, Y E[XY ] E[X]E[Y ]. We have E[XY ] x x4 dx xyx + y dy dx Using the marginal probability density functions for X and Y that we found in Example 4, we have E[X] E[Y ] xf X x dx yf Y y dy evaluating these two integrals is left to you. So, > Find the correlation ρx, Y. We now need to find ρx, Y of X and Y. We have E[X ] E[Y ] CovX, Y x3x dx 3 4 y + y 3y dy CovX,Y VarX VarY. So, we also need to calculate the variances x f X x dx y f Y y dy x 3x dx 3 5 y + y 3y dy 7 3 again, evaluating these two integrals is left to you. From this we obtain, 3 and we get VarX E[X ] E[X] VarY E[Y ] E[Y ] ρx, Y / Example 8: Calculating conditional expectation > Let X, Y be as in Example 3. Let y,. Find E[X Y y] and E[X Y ]. We have already found the conditional p.d.f. of X in Example 5, it is x+y +y 3y if x y, f X Y y x

13 So, Hence, E[X Y y] y [ ] x + yx + y 3y dx 3 x3 + yx + y 3y y E[X Y ] + 3Y 5Y Y 3Y. + 3y 5y3 3 + y 3y. > Show that E[E[X Y ]] E[X]. To find E[E[X Y ]], we first note that E[X Y ] gy + 3Y 5Y Y 3Y use then use the usual method for finding the expectation of a function of Y. That is, E[E[X Y ]] E[gY ] gyf Y y dy + 3y 5y 3 3 dy We have already shown during Example 7 that E[X] 3 4. Example 9: Proof of E[E[X Y ]] E[X] It is no coincidence that E[E[X Y ]] E[X] in Example 8. In fact, this holds true for all pairs of random variables X and Y. Here is a general proof. We have E[X Y ] gy, where So, gy E[X Y y] xf X Y y x dx. E[E[X Y ]] E[gY ] E[X]. gyf Y y dy x xf X Y y xf Y y dx dy xf X,Y x, y dy dx xf X x dx f X,Y x, y dy dx by definition of the conditional p.d.f. by definition of the marginal p.d.f. Example : Calculation of expectation and variance by conditioning 3

14 Let X Ga, and, conditional on X x, let Y P ox. Then, using standard results about the mean and variance of Gamma/Poisson random variables, E[X], VarX, E[Y X] X and VarY X X. So, using the formulae from Lemma 4., Chapter 5 E[Y ] E[E[Y X]] E[X] VarY E[VarY X] + VarE[Y X] E[X] + VarX 3. Example : Transforming bivariate random variables > Let X Ga3, and Y Be,, and let X and Y be independent. Find the joint p.d.f. of the vector U, V, where U X + Y and V X Y. The p.d.f.s of X and Y are f X x x e x if x > otherwise, 6y y if y, f Y y By independence, their joint p.d.f. is 3x y ye x if x > and y, f X,Y x, y The transformation we want is u x + y and v x y. So, u + v x, u v y, and the inverse transformation is x u+v u v, and y. Hence, the Jacobian is J det x u y u x v y v. Now, we need to transform the region T {x, y : x >, y, } into the u, v plane. This region is bounded by the three lines x, y and y, which map respectively to the lines u v, u v and u v +. 4

15 Our transformed region must also be bounded by the three lines; to check which section of the sketch it is we simply find out where some x, y T maps to. We have, T which maps to,, so the shaded region is the image of T. Therefore, f u+v X,Y f U,V u, v, u v if u >, v u, u, v > u 3 3 u + v u v u + ve u+v if u >, v u, u, v > u Example : The Box-Muller transform, simulation of normal random variables Let S Exp and Θ U[, π, and let S and Θ be independent. Then S and Θ have joint p.d.f. given by 4π f S,Θ s, θ e s if s and θ [, π We can think of S and Θ as giving the location of a point S, Θ in polar co-ordinates. We transform this point into Cartesian co-ordinates, meaning that we want to use the transformation X S cosθ and Y S sinθ. Therefore, our transformation is x s cos θ, y s sin θ. This transformation maps the set of s, θ for which f S,Θ s, θ > onto all of R it is just Polar coordinates r, θ with r s. To find the inverse transformation, note that s x +y and y/x tan θ, so θ arctany/x. So the Jacobian is Hence, J det s x θ x s y θ y det x y y/x /x +y/x +y/x f X,Y x, y x +y π e for all x, y R. Now, we can factorise this as f X,Y x, y e x e y, π π + y/x y /x + y/x which implies that X and Y are independent standard normal random variables. Assuming we can simulate uniform random variables, then using the transformation in Q3.3 we can also simulate exponential random variables. Then, using above transformation, we can simulate standard normals. Example 3: Finding the distribution of a sum of Gamma random variables 5

16 > Suppose that two independent random variables X and Y follow the distributions X Ga4, and Y Ga,. Find the distribution of Z X + Y Let W X. So the transformation we want to apply is z x + y, w x. The inverse transformation is x w and y z w, so the Jacobian is x x J det z w det. y z By independence of X and Y, their joint p.d.f. is 4 Γ4 f X,Y x, y x3 e x Γ ye y if x, y > otherwise 6 6 x3 ye x+y if x, y > y w The region of x, y on which f X,Y x, y is non-zero is x > and y >. This is bounded by the lines x, y, which are respectively mapped to w and z w. The point, is mapped to,, meaning that the shaded area is the region on which f Z,W z, w is non-zero. Hence, the joint p.d.f. of Z and W is 6 6 f Z,W z, w w3 z we z if z > and w, z othwerwise. Lastly, to obtain the marginal p.d.f. of Z, we integrate out w. For z >, f Z z 6 6 e z 6 6 e z z 6 6 z5 e z 6 Γ6 z5 e z. w 3 z w 4 dw z 5 4 z5 5 6

17 For z we have f Z z. So, we can recognise f Z z as the p.d.f. of a Ga6, random variable, and conclude that Z Ga6,. More generally, this method can be used to show that if X Gaα, β, Y Gaα, β and X and Y are independent, then X + Y Gaα + α, β for any α, α, β. See Q5.8. Chapter 6 Example 4: Mean vectors and covariance matrices Recall the random variables X, Y from Example 3. In Example 7 we calculated that E[X] and E[Y ]. So the mean vector of X X, Y T is E[X] In Example 7 we also calculated that CovX, Y 48 Therefore, the covariance matrix of X is CovX Example 5: Affine transformation of a random vector > Suppose that the random vector X X, X, X 3 T has E[X], CovX and that Var[X] 8, VarY 7. Define two new random variables, U X X + X 3 and V X X 3 +. Find the mean vector and covariance matrix of U U, V T. We can express the relationship between X and U as an affine transformation: X U U AX + b X +. V So, we can use Lemma 6.3 to find the mean vector and covariance matrix of U. Firstly,. X 3 E[U] AE[X] + b + + 7

18 and secondly, CovU A CovXA T > Find the correlation coefficient ρu, V.. We can read off VarU, VarV and CovU, V from the covariance matrix of U. So the correlation coefficient of U and V is ρu, V Example 6: Variance of a sum CovU, V VarU VarV > Suppose that two random variables X and Y have variances σx and σ Y, and covariance CovX, Y. Find the variance of X + Y. If we write U X + Y, then X U U Y where U denotes the matrix with the single entry U. We usually won t bother to write brackets around matrices/vectors. We can apply Lemma 6.3 to this case, with A and X X, Y T, to obtain that CovU A CovXA T. The covariance matrix of X is given by σx CovX, Y CovX. CovX, Y Since U is, CovU VarU, so we have σx CovX, Y VarX + Y CovX, Y which you should recognize. σ Y σ X + CovX, Y + σ Y, σ Y 8
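The identity in Example 26 is easy to illustrate by simulation. The following R sketch uses an arbitrarily chosen dependent pair (X, Y), not taken from the notes, and compares the two sides of Var(X + Y) = Var(X) + 2Cov(X, Y) + Var(Y).

set.seed(1)
x <- rnorm(1e5)
y <- 0.5 * x + rnorm(1e5)                       # an arbitrary choice giving Cov(X, Y) > 0
c(var(x + y), var(x) + 2 * cov(x, y) + var(y))  # the two values agree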

19 Example 7: The bivariate normal with independent components > Find the p.d.f. of the bivariate normal X X, X T in the case where CovX, X. From Definition 6.4, the general bivariate normal distribution X, with mean vector µ and covariance matrix Σ has joint probability density function f X,X x, x π σ σ exp σ x µ σ x µ x µ + σ x µ σ σ σ σ If we assume CovX, X σ σ, then the p.d.f. simplifies to f X,X x, x exp x µ πσ σ σ x µ σ exp x µ exp x µ πσ πσ σ f X x f X x. 4 Here, in the final line we see factorize f X,X x, x, into the product of the p.d.f. of the Nµ, σ random variable X and the p.d.f. of the Nµ, σ random variables X. Therefore, in this case X and X are independent. Note that, setting µ µ and σ σ, we recover 6.. We have shown above that if CovX, X then X and X are independent. If X and X are independent then it is automatic that CovX, X. Hence: X and X are independent if and only if CovX, X. We will record this fact as Lemma 6.8. Example 8: Plotting the p.d.f. of the bivariate normal. The pdf of a bivariate normal is a bell curve : σ This example is the standard bivariate normal Nµ, Σ where µ, and Σ. It was generated in Mathematica with the code all one line 9

20 Plot3D[/Pi E^-x^ + y^/, {x, -4, 4}, {y, -4, 4}, PlotRange -> All, ColorFunction -> ColorData["Rainbow"][#3] &] Changing µ alters the positive of the center of the bell, without changing the shape of the curve. For example, taking µ, and Σ gives Changing Σ afters the shape of the bell. For example, taking µ, and Σ 4 gives Changing both µ and Σ together results in a bell curve that is both translated and reshaped. Example 9: Marginal distributions of the bivariate normal, and their covariance. > Let X X, X T have distribution N µ, Σ where µ 3 and Σ 3. Write down the marginal distributions of X and X. From Lemma 6.7 we know that X and X are both univariate normals. We can read their means and covariances off from the mean vector µ and covariance matrix Σ. We have X Nµ, σ, so X N,, and also X Nµ, σ so X N3, 3.
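The fact that the marginal distributions of a bivariate normal can be read directly off µ and Σ, as above, can be checked quickly by simulation. In the R sketch below the mean vector and covariance matrix are arbitrary illustrative values (not the ones in Example 29), and mvrnorm comes from the MASS package that ships with standard R installations.

library(MASS)                             # provides mvrnorm
set.seed(1)
mu    <- c(1, 3)                          # illustrative values only
Sigma <- matrix(c(2, 1, 1, 3), nrow = 2)  # illustrative values only
xs <- mvrnorm(1e5, mu, Sigma)
colMeans(xs)                              # close to mu
apply(xs, 2, var)                         # close to diag(Sigma)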

21 > Find CovX, X and ρx, X. Are X and X independent? From the covariance matrix, CovX, X. Hence, ρx, X CovX, X VarX VarX. 3 6 Clearly, we have CovX, X so X and X are not independent. Example 3: Conditional distributions for bivariate normal > Let a R and let X N µ, Σ where µ and Σ 3 3. Find the conditional distribution of X given X a. By Lemma 6.9, the conditional distribution of X given X a is a univariate normal with mean given by µ + ρ σ σ x µ and variance ρ σ. In this case, µ, µ, ρ 3/, σ, σ, and x a. So, µ + 3a and σ 9. Hence, the conditional distribution of X given X a is N + 3a,. Example 3: Transformations of bivariate normal > Let X N µ, Σ where µ and Σ are as in Example 3. Let Y X + X X Y Find the distribution of Y Y, Y T. We can write Y as an affine transformation of X, that is X Y AX + b +. The matrix A is a non-singular matrix, so by Lemma 6., Y is a bivariate normal. Therefore, if we can find the mean vector and covariance matrix of Y, we know the distribution of Y. and X 4 E[Y] AE[X] + b, CovY A CovXA T So, the distribution of Y is [ ] Y N,. 3 Example 3: Affine transformation of a three dimensional normal distribution.

22 > Suppose X X X 3 4 N 3, 9. 4 Find the joint distribution of Y Y, Y T where Y X X and Y X + X + X 3. so and We can write, X Y AX X, X 3 E[Y] AE[X], 4 CovY A CovXA T It is not hard to see that A is an onto transformation, so Y has a bivariate normal distribution here we use the multivariate equivalent of Lemma 6.. Hence, [ ] Y 9 6 N, > Find ρx, X 3. Are X and X 3 independent? From the covariance matrix of X, we can read off Y ρx, X 3 CovX, X 3 VarX VarX Since X and X 3 are components of a multivariable normal distribution, and CovX, X 3, by the three dimensional equivalent of Lemma 6.8 X and X 3 are independent. Chapter 7 Example 33: Maximising a function

> Find the value of θ which maximises f(θ) = θ^5 (1 - θ) on the range θ ∈ [0, 1].

First, we look for turning points. We have

f'(θ) = 5θ^4 (1 - θ) + θ^5 (-1) = θ^4 (5 - 6θ),

so the turning points are at θ = 0 and θ = 5/6. To see which ones are local maxima, we calculate the second derivative:

f''(θ) = 4θ^3 (5 - 6θ) + θ^4 (-6) = θ^3 (20 - 30θ).

So f''(5/6) < 0 and θ = 5/6 is a local maximum. Unfortunately, f''(0) = 0, so we don't know if θ = 0 is a local maximum, minimum or inflection. However, we can check that f(0) = 0, so it doesn't matter which; we still have f(0) < f(5/6). Since also f(1) = 0 < f(5/6), the maximum does not occur at the endpoint θ = 1 either. Hence, θ = 5/6 is the global maximiser.

Example 34: Likelihood functions and maximum likelihood estimators

> Let X be a random variable with Exp(θ) distribution, where the parameter θ is unknown. Find and sketch the likelihood function of X, given the data x = 3.

The likelihood function is

L(θ; 3) = f_X(3; θ) = θe^{-3θ},

defined for all θ ∈ Θ = (0, ∞). We can plot this in R, for θ ∈ (0, 5), with the command (all one line)

curve(x*exp(-3*x), from=0, to=5, xlab=~theta, ylab="L("~theta~"; 3)")

24 Note that we use x as the θ variable here because R hard-codes its use of x as a graph variable. The result is > Given this data, find the likelihood of θ,,,, 5. Amongst these values of θ, which has the highest likelihood? The likelihoods are L ; 3 e 3.7 L ; 3 e 3. L; 3 e 3.5 L; 3 e 6.5 L5; 3 5e So, restricted to looking at these values, θ has the highest likelihood. > Find the maximum likelihood estimator of θ,, based on the single data point x 3. We need to find the value of θ Θ which maximises Lθ; 3. We differentiate, to look for turning points, obtaining dl dθ e 3θ 3θe 3θ e 3θ 3θ. 4

25 Hence, there is only one turning point, at θ 3. We differentiate again, obtaining d L dθ 3e 3θ 3θ + e 3θ 3 e 3θ 6 + 9θ At θ 3, we have d L dθ e <, so the turning point at θ 3 is a local maximum. Since it is the only turning point, it is also the global maximum. Hence, the maximum likelihood estimator of θ is ˆθ 3. Example 35: Models, parameters and data aerosols. > The particle size distribution of an aerosol is the distribution of the diameter of aerosol particles within a typical region of air. The term is also used for particles within a powder, or suspended in a fluid. In many situations, the particle size distribution is modelled using the log-normal distribution. It is typically reasonable to assume that the diameters of particles are independent. Assuming this model, find the joint probability density function of the diameters observed in a sample of n particles, and state the parameters of the model. Recall that the p.d.f. of the log-normal distribution is f Y y yσ exp log y µ π σ if y, The parameters of this distribution, and hence also the parameters of our model, are µ R and σ,. Since the diameters of particles are assumed to be independent, the joint probability density function of Y Y, Y,..., Y n, where Y i is the diameter of the i th particle, is f Y y,..., y n n f Yi y i i πσ n/ y y...y n exp n log y i µ σ if y i > for all i i otherwise. Note that, if one or more of the y i is less than or equal to zero then f Yi y i, which means that also f Y y,..., y n. Example 36: Maximum likelihood estimation with i.i.d. data. > Let X Bernθ, where θ is an unknown parameter. Suppose that we have 3 independent samples of X, which are x {,, }. Find the likelihood function of θ, given this data. 5

26 The probability function of a single Bernθ random variable is θ if x f X x; θ θ if x otherwise Since our three samples are independent, we model x as a sample from the joint distribution X X, X, X 3, where f X x; θ 3 f Xi x i ; θ i and f Xi is the p.d.f. of a single Bernθ random variable. Since f Xi has several cases, it would be unhelpful to try and expand out this formula before we put in values for the x i. Our likelihood function is therefore Lθ; x f X ; θ f X ; θ f X3 ; θ θθθ θ θ 3. The range of values that the parameter θ can take is Θ [, ]. > Find the maximum likelihood estimator of θ, given the data x. We seek to maximize Lθ; x for θ [, ]. Differentiating once, dl dθ θ 3θ θ 3θ so the turning points are at θ and θ 3. Differentiating again, d L dθ 6θ which gives d L θ dθ and d L θ/3 dθ 4. Hence, θ is a local minimum and θ 3 is a local maximum, so θ 3 maximises Lθ; x over θ [, ]. The maximum likelihood estimator of θ is therefore ˆθ 3. This is, hopefully, reassuring. The number of s in our sample of 3 was, so using independence θ 3 seems like a good guess. See Q7. for a much more general case of this example. Example 37: Maximum likelihood estimation radioactive decay. > Atoms of radioactive elements decay as time passes, meaning that any such atom will, at some point in time, suddenly break apart. This process is known as radioactive decay. The time taken for a single atom of, say, carbon-5 to decay is usually modelled as an exponential random variable, with unknown parameter λ,. The parameter λ is known as the decay rate. The times at which atoms decay are known to be independent. 6

27 Using this model, find the likelihood function for the time to decay of a sample of n carbon-5 atoms. The decay time X i of the i th atom is exponential with parameter λ,, and therefore has p.d.f. λe λxi if x i > f Xi x i ; λ Since each atom decays independently, the joint distribution of X X i n i is n n λe λxi if x i > for all i f X x; λ f Xi x i ; λ i i otherwise. λ n exp λ n i x i if x i > for all i Therefore, the likelihood function is λ n exp λ n i Lλ; x x i if x i > for all i The range of possible values of the parameter λ is Θ,. > Suppose that we have sampled the decay times of 5 carbon-5 atoms in seconds, accurate to two decimal places, and found them to be x {.5,.9,.88, 4.6, 9.75,.6,.3,.7,.3,.8, 4.5, 9.5,.67, 3.79, 4.3}. Find the maximum likelihood estimator of λ, based on this data. Given this data, for which 5 x i 47.58, our likelihood function is Differentiating, we have i Lλ; x λ 5 e 47.58λ. dl dλ 5λ4 e 47.58λ 47.58λ 5 e 47.58λ λ λe 47.58λ which is zero only when λ or λ 5/ Since λ is outside of the range Θ, of possible parameter values, the only turning point of interest is λ 5/ Differentiating again with the details left to you, we end up with d L dλ λ3 47.4λ λ 5 e 47.58λ λ λ λ e 47.58λ 7

28 Evaluating at our turning point gives d L dλ λ5/ e 5 < So, our turning point is a local maximum. Since there are no other turning points within the allowable range our turning point is the global maximum. estimator of λ, given our data x, is ˆλ Hence, the maximum likelihood In reality, physicists are able to collect vastly more data than n 5, but even with 5 data points we are not far away from the true value of λ, which is λ Of course, by true value here we mean the value that has been discovered experimentally, with the help of statistical inference. So-called carbon dating typically uses carbon-4, which has a much slower decay rate of approximately. 4. Carbon-4 is present in many living organisms and, crucially, the proportion of carbon in living organisms that is carbon-4 is essentially the same for all living organisms. Once organisms die, the carbon-4 radioactively decays. The key idea behind carbon dating is that, by measuring the concentration of carbon-4 within a fossil, scientists can estimate how long ago that fossil lived. To do so, a highly accurate estimate of the decay rate of carbon-4 is needed. Example 38: Maximum likelihood estimation via log-likelihood mutations in DNA. > When organisms reproduce, the DNA or RNA of the offspring is a combination of the DNA of its one, or two parents. Additionally, the DNA of the offspring contains a small number of locations in which it differs from its parents. These locations are called mutations. The number of mutations per unit length of DNA is typically modelled using a Poisson distribution, with an unknown parameter θ,. The numbers of mutations found in disjoint sections of DNA are independent. Using this model, find the likelihood function for the number of mutations present in a sample of n disjoint strands of DNA, each of which has unit length. Let X i be the number of mutations in the i th strand of DNA. So, under our model, f Xi x i ; θ e θ θ xi for x i {,,,...}, and f Xi x i if x i / N {}. Since we assume the X i are independent, x i! the joint distribution of X X, X,..., X n has probability function f X x n i e θ θ xi x i! x!x!... x n! e nθ θ n xi Actually, the biological details here are rather complicated, and we omit discussion of them. 8

29 provided all x i N {}, and zero otherwise. Therefore, our likelihood function is Lθ; x The range of possible values for θ is Θ,. x!x!... x n! e nθ θ n xi. > Let x be a vector of data, where x i is the number of mutations observed in a distinct unit length segment of DNA. Suppose that at least one of the x i is non-zero. Find the corresponding log-likelihood function, and hence find the maximum likelihood estimator of θ. The log-likelihood function is lθ; x log Lθ; x, so log Lθ, x log x!x!... x n! e nθ θ n xi n n logx i! nθ + log θ x i. i We now look to maximise lθ; x, over θ,. Differentiating, we obtain dl dθ n + n x i. θ Note that this is much simpler than what we d get if we differentiated Lθ; x. So, the only turning point of lθ, x is at θ n n i x i. Differentiating again, we have d l dθ θ i n x i. Since our x i are counting the occurrences of mutations, x i, and since at least one is non-zero we have d l dθ < for all θ. Hence, our turning point is a maximum and, since it is the only maximum, is also the global maximum. Therefore, the maximum likelhood estimator of θ is ˆθ n x i. n > Mutations rates were measured, for HIV patients, and there were found to be { } x 9, 6, 37, 8, 4, 34, 37, 6, 3, 48, 45 mutations per 4 possible locations i.e. per unit length. This data comes from the article Cuevas et al. 5. i Assuming the model suggested above, calculate the maximum likelihood estimator of the mutation rate of HIV. The data has x i i x i so we conclude that the maximum likelihood estimator of the mutation rate θ, given this data, is ˆθ i 9
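Once the MLE is identified as the sample mean, the numerical step is a one-liner in R. The counts below are hypothetical stand-ins rather than the values from Cuevas et al. (2015), so this sketch illustrates only the computation, not the published estimate.

x <- c(19, 26, 37, 28, 41, 34, 37, 16, 33, 48, 45)  # hypothetical counts, for illustration only
theta_hat <- mean(x)                                # MLE of the Poisson rate
theta_hat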

30 Example 39: Maximum likelihood estimation via log-likelihood spectrometry. > Using a mass spectrometer, it is possible to measure the mass 3 of individual molecules. For example, it is possible to measure the masses of individual amino acid molecules. A sample of 5 amino acid molecules, which are all known to be of the same type and therefore, the same mass, were reported to have masses x {65.76, 4.4, 94., 3.3, 5., 4.77, 6., 86.4, 9.4, 66.7, 9., , 58.9}. It is known that these molecules are either Alanine, which has mass 7., or Leucine, which has mass 3.. Given a molecule of mass θ, the spectrometer is known to report its mass as X Nθ, 35, independently for each molecule. Using this model, and the data above, find the likelihoods of Alanine and Leucine. Specify which of these has the greatest the likelihood. Our model, for the reported mass X of a single molecule with real weight θ, is X N, 35. Therefore, X i Nθ, 3 and the p.d.f. of a single data point is f Xi x i exp x i θ π Therefore, the p.d.f. of the reported masses X X,..., X n of n molecules is n f X x f Xi x i π n/ 35 n exp n x i θ. 45 i We know that, in reality, θ must be one of only two different values; 7. for Alanine and 3. for Leucine. Therefore, our likelihood function is Lθ; x π n/ 35 n exp 45 i n x i θ and the possible range of values for θ is the two point set Θ {7., 3.}. We need to find out which of these two values maximises the likelihood. Our data x contains n 5 data points. A short calculation use e.g. R shows that i 45 and, therefore, that 5 i x i 7..7, 45 5 i x i L7.; x.9 34, L3.; x We conclude that θ 7. has much greater likelihood than θ 3., so we expect that the molecules sampled are Alanine. 3 This is a simplification; in reality a mass spectrometer measure the mass to charge ratio of the molecule, but since the charges of molecule are already known, the mass can be inferred later. Atomic masses are measured in so-called atomic mass units. 3
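A restricted-likelihood comparison of this kind is conveniently organised around a small Gaussian log-likelihood helper in R. Everything in the sketch below (the data vector, the variance, and the two candidate masses) is a hypothetical placeholder rather than the data printed above; the point is only the pattern of evaluating the likelihood at the two permitted values of θ.

loglik <- function(theta, x, sigma2) sum(dnorm(x, mean = theta, sd = sqrt(sigma2), log = TRUE))
sigma2 <- 350                                           # placeholder measurement variance
x <- c(65.8, 104.4, 94.1, 103.3, 51.2, 40.8, 62.1,      # placeholder reported masses
       86.4, 91.4, 66.7, 91.2, 58.9)
candidates <- c(71.1, 113.2)                            # placeholder candidate masses
sapply(candidates, function(th) loglik(th, x, sigma2))  # the larger value identifies the preferred molecule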

31 Note that, if we were to differentiate as we did in other examples, we would find the maximiser θ for Lθ; x across the whole range θ,, which turns out to be θ 8.7. This is not what we want here! The design of our experiment has meant that the range of possible values for θ is restricted to the two point set Θ {7., 3.}. See Q7.5 for the unrestricted case. Example 4: Two parameter maximum likelihood estimation rainfall. > Find the maximum likelihood estimator of the parameter vector θ µ, σ when the data x x, x,..., x n are modelled as i.i.d. samples from a normal distribution Nµ, σ. Our parameter vector is θ µ, σ, so let us write v σ to avoid confusion. As a result, we are interested in the parameters θ µ, v, and the range of possible values of θ is Θ R,. The p.d.f. of the univariate normal distribution Nµ, v is f X x πv e x µ /v. Writing X X,..., X n, where the X i are i.i.d. univariate Nµ, v random variables, the likelihood function of X is Lθ; x f X x Therefore, the log likelihood is lθ; x n exp πv n/ v logπ + logv v n x i µ. i n x i µ. We now look to maximise lθ; x over θ Θ. The partial derivatives are l µ n n x i µ x i nµ v v i i l v n v + n v x i µ. i Solving l µ gives µ n n i x i x. Solving l v gives v n n i x i µ. So both partial derivatives will be zero if and only if i µ x, v n n x i x. 5 i This gives us the value of θ µ, v at the single turning point of l. 3

32 so Next, we use the Hessian matrix to check if this point is a local maximum. We have l µ n v l µ v v l v n x i nµ i n v v 3 n x i µ Evaluating these at our turning point, we get l µ ṋ 5 v l n µ v 5 v x i n x i l v n 5 v n v 3 x i x n v n nˆv v3 v H i i n v. n v Since n n v < and det H v >, our turning point 5 is a local maximum. Since it is the 3 only turning point, it is also the global maximum. Hence, the MLE is ˆµ x ˆσ ˆv n n x i x. Note ˆµ is the sample mean, and ˆσ is the biased sample variance. i > For the years 985-5, the amount of rainfall in milimeters recorded as falling on Sheffield in December is as follows: {78., 4.3, 38., 36., 59., 36., 78.4, 67.4, 7.4, 3.9, 7.4, 98., 79.4, 57.9, 35.6, 8., 8., 9.8, 6.5, 46.3, 56.7, 4., 74.9, 5.8, 66., 8.8, 4.6, 36., 69.8,.,.} This data comes from the historical climate data stored by the Met Office 4. Meteorologists often model the long run distribution of rainfall by a normal distribution although in some cases the Gamma distribution is used. Assuming that we choose to model the amount of rainfall in Sheffield each December by a normal distribution, find the maximum likelihood estimators for µ and σ. The data has n 3, and x 3 3 i 93.9, 3 3 i x i x

33 So we conclude that, according to our model, the maximum likelihood estimators are ˆµ 93.9 and ˆσ 4.4, which means that Sheffield receives a N93.9, 4.4 quantity of rainfall, in millimetres, each December. Example 4: Maximum likelihood estimation for the uniform distribution > Find the maximum likelihood estimator of the parameter θ when the data x x, x,..., x n are i.i.d. samples from a uniform distribution U[, θ], with unknown parameter θ >. Here the p.d.f. of X i is fx θ for x θ and zero otherwise. So the likelihood, for θ Θ R +, is θ if θ x Lθ; x n i for all i if θ < x i for some i θ if θ max n i x i if θ < max i x i. Differentiating the likelihood, we see that Lθ; x is decreasing but positive for θ > max i x i. For θ < max i x i we know Lθ; x, so by looking at the graph, we can see that the maximum occurs at This is the MLE. θ ˆθ max x i. i,...,n Example 4: Interval estimation based on likelihood > Suppose that we have i.i.d. data x x, x,..., x n, for which each data point is modelled as a random sample from Nµ, σ where µ is unknown and σ is known. Find the k-likelihood region R k for the parameter µ. First, we need to find the MLE ˆµ of µ. The likelihood function for our model is n Lµ; x φx i ; µ πσ exp n n/ σ x i µ, i where the range of parameter values is all µ R. The log likelihood is lµ; x n logπ + logσ n σ x i µ. 33 i i

34 The usual process of maximisation which is left for you and is a simplified case of Example 4 shows that the maximum likelihood estimator is the sample mean, ˆµ n n x i. i Now we are ready to identify the k-likelihood region for µ. By definition, the k-likelihood region is So, µ R k if and only if R k {µ R : lµ; x lˆµ; x k}. σ n i x i µ σ We can simplify this inequality, by noting that n x i µ i n x i ˆµ i n x i ˆµ k. i n x i x i µ + µ x i + x iˆµ ˆµ i nµ nˆµ + ˆµ µ n i nµ nˆµ + ˆµ µnˆµ nµ + ˆµ µˆµ nˆµ µ. x i So, µ R k if and only if or in other words, n σ ˆµ µ k, [ ] k k R k ˆµ σ n, ˆµ + σ. n Example 43: Hypothesis tests based on likelihood > In Example 37, if we used a -likelihood test, would we accept the hypothesis that the radioactive decay of carbon-5 is equal to λ.7? We had found, given the data, that the likelihood function of θ was Lλ; x λ 5 e 47.58λ and the maximum likelihood estimator of λ was ˆλ.3. The -likelihood region for λ is the set so λ R if and only if R { } λ > : Lλ; x e Lˆλ; x, λ 5 e 47.58λ e L.3; x

35 Note that, unlike the previous example, we can t simplify this inequality and find a nice form for the likelihood region. Our hypothesis is that, in fact, λ.7. Our -likelihood test will pass if λ.7 is within the -likelihood region, and fail if not. We can evaluate use e.g. R,.7 5 e and note that Hence λ.7 is within the -likelihood region and we accept the hypothesis. 35
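The membership check in Example 43 can be reproduced numerically. In the R sketch below, n = 15 and the sum of the decay times 47.58 are taken from Example 37; the hypothesised rate and the threshold follow the reading of Example 43 above, and are taken here as assumptions.

n <- 15; sx <- 47.58                   # sample size and sum of decay times from Example 37
L <- function(lambda) lambda^n * exp(-lambda * sx)
lambda_hat <- n / sx                   # the maximum likelihood estimator
lambda0 <- 0.27                        # hypothesised decay rate (assumed value)
k <- 2                                 # k-likelihood threshold (assumed value)
L(lambda0) >= exp(-k) * L(lambda_hat)  # TRUE: lambda0 lies in the region, so the hypothesis is accepted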


More information

Distributions of Functions of Random Variables. 5.1 Functions of One Random Variable

Distributions of Functions of Random Variables. 5.1 Functions of One Random Variable Distributions of Functions of Random Variables 5.1 Functions of One Random Variable 5.2 Transformations of Two Random Variables 5.3 Several Random Variables 5.4 The Moment-Generating Function Technique

More information

Statistics 3858 : Maximum Likelihood Estimators

Statistics 3858 : Maximum Likelihood Estimators Statistics 3858 : Maximum Likelihood Estimators 1 Method of Maximum Likelihood In this method we construct the so called likelihood function, that is L(θ) = L(θ; X 1, X 2,..., X n ) = f n (X 1, X 2,...,

More information

2. Suppose (X, Y ) is a pair of random variables uniformly distributed over the triangle with vertices (0, 0), (2, 0), (2, 1).

2. Suppose (X, Y ) is a pair of random variables uniformly distributed over the triangle with vertices (0, 0), (2, 0), (2, 1). Name M362K Final Exam Instructions: Show all of your work. You do not have to simplify your answers. No calculators allowed. There is a table of formulae on the last page. 1. Suppose X 1,..., X 1 are independent

More information

Chapter 5. Random Variables (Continuous Case) 5.1 Basic definitions

Chapter 5. Random Variables (Continuous Case) 5.1 Basic definitions Chapter 5 andom Variables (Continuous Case) So far, we have purposely limited our consideration to random variables whose ranges are countable, or discrete. The reason for that is that distributions on

More information

Continuous Random Variables and Continuous Distributions

Continuous Random Variables and Continuous Distributions Continuous Random Variables and Continuous Distributions Continuous Random Variables and Continuous Distributions Expectation & Variance of Continuous Random Variables ( 5.2) The Uniform Random Variable

More information

Random vectors X 1 X 2. Recall that a random vector X = is made up of, say, k. X k. random variables.

Random vectors X 1 X 2. Recall that a random vector X = is made up of, say, k. X k. random variables. Random vectors Recall that a random vector X = X X 2 is made up of, say, k random variables X k A random vector has a joint distribution, eg a density f(x), that gives probabilities P(X A) = f(x)dx Just

More information

MSc Mas6002 Introductory Material Block A Introduction to Probability and Statistics

MSc Mas6002 Introductory Material Block A Introduction to Probability and Statistics MSc Mas6002 Introductory Material Block A Introduction to Probability and Statistics 1 Probability 1.1 Multiple approaches The concept of probability may be defined and interpreted in several different

More information

University of Chicago Graduate School of Business. Business 41901: Probability Final Exam Solutions

University of Chicago Graduate School of Business. Business 41901: Probability Final Exam Solutions Name: University of Chicago Graduate School of Business Business 490: Probability Final Exam Solutions Special Notes:. This is a closed-book exam. You may use an 8 piece of paper for the formulas.. Throughout

More information

1.12 Multivariate Random Variables

1.12 Multivariate Random Variables 112 MULTIVARIATE RANDOM VARIABLES 59 112 Multivariate Random Variables We will be using matrix notation to denote multivariate rvs and their distributions Denote by X (X 1,,X n ) T an n-dimensional random

More information

MATHEMATICS 154, SPRING 2009 PROBABILITY THEORY Outline #11 (Tail-Sum Theorem, Conditional distribution and expectation)

MATHEMATICS 154, SPRING 2009 PROBABILITY THEORY Outline #11 (Tail-Sum Theorem, Conditional distribution and expectation) MATHEMATICS 154, SPRING 2009 PROBABILITY THEORY Outline #11 (Tail-Sum Theorem, Conditional distribution and expectation) Last modified: March 7, 2009 Reference: PRP, Sections 3.6 and 3.7. 1. Tail-Sum Theorem

More information

[POLS 8500] Review of Linear Algebra, Probability and Information Theory

[POLS 8500] Review of Linear Algebra, Probability and Information Theory [POLS 8500] Review of Linear Algebra, Probability and Information Theory Professor Jason Anastasopoulos ljanastas@uga.edu January 12, 2017 For today... Basic linear algebra. Basic probability. Programming

More information

SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416)

SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416) SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416) D. ARAPURA This is a summary of the essential material covered so far. The final will be cumulative. I ve also included some review problems

More information

Master s Written Examination - Solution

Master s Written Examination - Solution Master s Written Examination - Solution Spring 204 Problem Stat 40 Suppose X and X 2 have the joint pdf f X,X 2 (x, x 2 ) = 2e (x +x 2 ), 0 < x < x 2

More information

Stat 5101 Notes: Brand Name Distributions

Stat 5101 Notes: Brand Name Distributions Stat 5101 Notes: Brand Name Distributions Charles J. Geyer February 14, 2003 1 Discrete Uniform Distribution DiscreteUniform(n). Discrete. Rationale Equally likely outcomes. The interval 1, 2,..., n of

More information

LIST OF FORMULAS FOR STK1100 AND STK1110

LIST OF FORMULAS FOR STK1100 AND STK1110 LIST OF FORMULAS FOR STK1100 AND STK1110 (Version of 11. November 2015) 1. Probability Let A, B, A 1, A 2,..., B 1, B 2,... be events, that is, subsets of a sample space Ω. a) Axioms: A probability function

More information

3 Continuous Random Variables

3 Continuous Random Variables Jinguo Lian Math437 Notes January 15, 016 3 Continuous Random Variables Remember that discrete random variables can take only a countable number of possible values. On the other hand, a continuous random

More information

Final Exam # 3. Sta 230: Probability. December 16, 2012

Final Exam # 3. Sta 230: Probability. December 16, 2012 Final Exam # 3 Sta 230: Probability December 16, 2012 This is a closed-book exam so do not refer to your notes, the text, or any other books (please put them on the floor). You may use the extra sheets

More information

Spring 2012 Math 541A Exam 1. X i, S 2 = 1 n. n 1. X i I(X i < c), T n =

Spring 2012 Math 541A Exam 1. X i, S 2 = 1 n. n 1. X i I(X i < c), T n = Spring 2012 Math 541A Exam 1 1. (a) Let Z i be independent N(0, 1), i = 1, 2,, n. Are Z = 1 n n Z i and S 2 Z = 1 n 1 n (Z i Z) 2 independent? Prove your claim. (b) Let X 1, X 2,, X n be independent identically

More information

(y 1, y 2 ) = 12 y3 1e y 1 y 2 /2, y 1 > 0, y 2 > 0 0, otherwise.

(y 1, y 2 ) = 12 y3 1e y 1 y 2 /2, y 1 > 0, y 2 > 0 0, otherwise. 54 We are given the marginal pdfs of Y and Y You should note that Y gamma(4, Y exponential( E(Y = 4, V (Y = 4, E(Y =, and V (Y = 4 (a With U = Y Y, we have E(U = E(Y Y = E(Y E(Y = 4 = (b Because Y and

More information

Statistics for scientists and engineers

Statistics for scientists and engineers Statistics for scientists and engineers February 0, 006 Contents Introduction. Motivation - why study statistics?................................... Examples..................................................3

More information

BASICS OF PROBABILITY

BASICS OF PROBABILITY October 10, 2018 BASICS OF PROBABILITY Randomness, sample space and probability Probability is concerned with random experiments. That is, an experiment, the outcome of which cannot be predicted with certainty,

More information

2. Variance and Covariance: We will now derive some classic properties of variance and covariance. Assume real-valued random variables X and Y.

2. Variance and Covariance: We will now derive some classic properties of variance and covariance. Assume real-valued random variables X and Y. CS450 Final Review Problems Fall 08 Solutions or worked answers provided Problems -6 are based on the midterm review Identical problems are marked recap] Please consult previous recitations and textbook

More information

Delta Method. Example : Method of Moments for Exponential Distribution. f(x; λ) = λe λx I(x > 0)

Delta Method. Example : Method of Moments for Exponential Distribution. f(x; λ) = λe λx I(x > 0) Delta Method Often estimators are functions of other random variables, for example in the method of moments. These functions of random variables can sometimes inherit a normal approximation from the underlying

More information

1.1 Review of Probability Theory

1.1 Review of Probability Theory 1.1 Review of Probability Theory Angela Peace Biomathemtics II MATH 5355 Spring 2017 Lecture notes follow: Allen, Linda JS. An introduction to stochastic processes with applications to biology. CRC Press,

More information

Conditioning a random variable on an event

Conditioning a random variable on an event Conditioning a random variable on an event Let X be a continuous random variable and A be an event with P (A) > 0. Then the conditional pdf of X given A is defined as the nonnegative function f X A that

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Chapter 4 Multiple Random Variables

Chapter 4 Multiple Random Variables Review for the previous lecture Theorems and Examples: How to obtain the pmf (pdf) of U = g ( X Y 1 ) and V = g ( X Y) Chapter 4 Multiple Random Variables Chapter 43 Bivariate Transformations Continuous

More information

01 Probability Theory and Statistics Review

01 Probability Theory and Statistics Review NAVARCH/EECS 568, ROB 530 - Winter 2018 01 Probability Theory and Statistics Review Maani Ghaffari January 08, 2018 Last Time: Bayes Filters Given: Stream of observations z 1:t and action data u 1:t Sensor/measurement

More information

UC Berkeley Department of Electrical Engineering and Computer Science. EE 126: Probablity and Random Processes. Problem Set 8 Fall 2007

UC Berkeley Department of Electrical Engineering and Computer Science. EE 126: Probablity and Random Processes. Problem Set 8 Fall 2007 UC Berkeley Department of Electrical Engineering and Computer Science EE 6: Probablity and Random Processes Problem Set 8 Fall 007 Issued: Thursday, October 5, 007 Due: Friday, November, 007 Reading: Bertsekas

More information

Closed book and notes. 60 minutes. Cover page and four pages of exam. No calculators.

Closed book and notes. 60 minutes. Cover page and four pages of exam. No calculators. IE 230 Seat # Closed book and notes. 60 minutes. Cover page and four pages of exam. No calculators. Score Exam #3a, Spring 2002 Schmeiser Closed book and notes. 60 minutes. 1. True or false. (for each,

More information

Physics 403. Segev BenZvi. Parameter Estimation, Correlations, and Error Bars. Department of Physics and Astronomy University of Rochester

Physics 403. Segev BenZvi. Parameter Estimation, Correlations, and Error Bars. Department of Physics and Astronomy University of Rochester Physics 403 Parameter Estimation, Correlations, and Error Bars Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Best Estimates and Reliability

More information

Multiple Random Variables

Multiple Random Variables Multiple Random Variables Joint Probability Density Let X and Y be two random variables. Their joint distribution function is F ( XY x, y) P X x Y y. F XY ( ) 1, < x

More information