MAS223 Statistical Modelling and Inference Examples
- Gerald Gallagher
Chapter 1

Example 1: Sample spaces and random variables

Let S be the sample space for the experiment of tossing two coins; i.e.

S = {HH, HT, TH, TT}.

Define the random variables X to be the number of heads seen, and Y to be equal to 5 if we see both a head and a tail, and 0 otherwise.

Element of S   Value of X   Value of Y
HH             2            0
HT             1            5
TH             1            5
TT             0            0

Example 2: Discrete random variables

The random variables X and Y from Example 1 are both discrete random variables.

> Calculate P[X >= 1].

If X >= 1 then either X = 1 or X = 2. We have P[X >= 1] = P[X = 1] + P[X = 2] = 1/2 + 1/4 = 3/4.

> Sketch the distribution function of X.

A sketch of its distribution function looks like: [sketch omitted: a step function with jumps of 1/4 at 0, 1/2 at 1 and 1/4 at 2]

Example 3: Continuous random variables

> Recall that an exponential random variable with parameter λ > 0 has probability density function

f_X(x) = λe^(−λx) if x > 0, and f_X(x) = 0 otherwise.

Calculate P[1 <= X <= 2], and find the distribution function F_X(x).
We can calculate

P[1 <= X <= 2] = ∫₁² f_X(x) dx = ∫₁² λe^(−λx) dx = [−e^(−λx)]₁² = e^(−λ) − e^(−2λ).

To find the distribution function, note that for x <= 0 we have P[X <= x] = 0, and for x > 0 we have

P[X <= x] = ∫₀ˣ f_X(u) du = ∫₀ˣ λe^(−λu) du = 1 − e^(−λx).

Therefore,

F_X(x) = 1 − e^(−λx) if x > 0, and F_X(x) = 0 otherwise.

A sketch of the distribution function F_X(x) looks like: [sketch omitted]

Example 4: Properties of distribution functions

> Let

F(x) = 1 − 1/x if x > 1, and F(x) = 0 otherwise.

Sketch F and show that F is a distribution function.

A sketch of F looks like: [sketch omitted]

To show that F is a distribution function, we'll check the three defining properties of distribution functions from the notes.

1. From the definition, 0 <= F(x) <= 1 for all x. Since F(x) = 0 for all x <= 1 we have lim_{x→−∞} F(x) = 0, and also lim_{x→∞} (1 − 1/x) = 1.

2. Since F(x) = 0 for all x <= 1, it's clear that F(x) is non-decreasing while x <= 1. If 1 < x < y then 1/y < 1/x, so 1 − 1/x <= 1 − 1/y. Hence F is non-decreasing across all x ∈ ℝ.
3. From its definition, F is continuous on (−∞, 1) and on (1, ∞). Since lim_{x→1+} F(x) = 1 − 1/1 = 0 = F(1), we have that F is continuous everywhere. Alternatively, in this course, we allow ourselves to prove continuity by drawing a sketch, as above.

Hence, F is a distribution function, and as a result there exists a random variable X with distribution function F_X = F.

Example 5: Calculating expectations and variances

> Let X be an Exponential random variable, from Example 3, with p.d.f.

f_X(x) = λe^(−λx) if x > 0, and f_X(x) = 0 otherwise.

Find the mean and variance of X.

We can calculate, integrating by parts,

E[X] = ∫₀^∞ x f_X(x) dx = ∫₀^∞ xλe^(−λx) dx = [−x e^(−λx)]₀^∞ + ∫₀^∞ e^(−λx) dx = 0 + [−(1/λ) e^(−λx)]₀^∞ = 1/λ.

For the variance, it is easiest to calculate E[X²] and then use that Var(X) = E[X²] − E[X]². So,

E[X²] = ∫₀^∞ x² f_X(x) dx = ∫₀^∞ x²λe^(−λx) dx = [−x² e^(−λx)]₀^∞ + ∫₀^∞ 2x e^(−λx) dx = (2/λ) ∫₀^∞ xλe^(−λx) dx = 2/λ²,

where we use that we already calculated ∫₀^∞ xλe^(−λx) dx = 1/λ. Hence,

Var(X) = E[X²] − E[X]² = 2/λ² − 1/λ² = 1/λ².

Chapter 2

Example 6: Calculating E[e^Y] where Y ∼ N(0, 1).

> Let Y be a normal random variable, with mean 0 and variance 1, with p.d.f.

f_Y(y) = (1/√(2π)) e^(−y²/2).

Find E[e^Y].

We need to calculate

E[e^Y] = ∫_{−∞}^{∞} e^y f_Y(y) dy = (1/√(2π)) ∫_{−∞}^{∞} e^y e^(−y²/2) dy.
We can't evaluate this integral explicitly. However, we do know the value of a similar integral; that is, we know

(1/√(2π)) ∫_{−∞}^{∞} e^(−y²/2) dy = P[Y ∈ ℝ] = 1.

Our aim is to rewrite E[e^Y] into this form and hope we can deal with whatever else is left over. We can do so by completing the square:

e^y e^(−y²/2) = exp(−(y² − 2y)/2) = exp(−((y − 1)² − 1)/2) = e^(1/2) e^(−(y−1)²/2).

Putting this into the integral above, we have

E[e^Y] = e^(1/2) (1/√(2π)) ∫_{−∞}^{∞} e^(−(y−1)²/2) dy = e^(1/2) (1/√(2π)) ∫_{−∞}^{∞} e^(−z²/2) dz,

where z = y − 1. Then, using the normalization above, we have E[e^Y] = e^(1/2). See Q.9 for a more general case of this method.

Example 7: Mean and variance of the Gamma distribution

> Let X have the Ga(α, β) distribution, where α, β > 0. Find the mean and variance of X.

We can calculate

E[X] = ∫₀^∞ x f_X(x) dx = (β^α/Γ(α)) ∫₀^∞ x^α e^(−βx) dx
     = (β^α/Γ(α)) Γ(α + 1)/β^(α+1)     (using Lemma .3)
     = αΓ(α)/(βΓ(α))                   (using Lemma .)
     = α/β.

Similarly, for the variance,

E[X²] = ∫₀^∞ x² f_X(x) dx = (β^α/Γ(α)) ∫₀^∞ x^(α+1) e^(−βx) dx
      = (β^α/Γ(α)) Γ(α + 2)/β^(α+2)    (using Lemma .3)
      = α(α + 1)Γ(α)/(β²Γ(α))          (using Lemma .)
      = α(α + 1)/β².

So

Var(X) = E[X²] − E[X]² = α(α + 1)/β² − α²/β² = α/β².
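As a quick numerical check on the formulas E[X] = α/β and Var(X) = α/β², the Gamma p.d.f. can be integrated directly. A minimal Python sketch (the values α = 3, β = 2 are hypothetical picks, not from the notes):

```python
import math

def gamma_pdf(x, alpha, beta):
    """p.d.f. of the Ga(alpha, beta) distribution (rate parametrisation)."""
    return beta ** alpha / math.gamma(alpha) * x ** (alpha - 1) * math.exp(-beta * x)

def moment(k, alpha, beta, upper=40.0, n=200000):
    """Riemann-sum approximation of E[X^k] for X ~ Ga(alpha, beta)."""
    h = upper / n
    return sum((i * h) ** k * gamma_pdf(i * h, alpha, beta) * h for i in range(1, n + 1))

alpha, beta = 3, 2               # hypothetical example values
m1 = moment(1, alpha, beta)      # should be close to alpha / beta = 1.5
m2 = moment(2, alpha, beta)
var = m2 - m1 ** 2               # should be close to alpha / beta**2 = 0.75
```

The truncation at `upper = 40` is harmless here because the integrand decays like e^(−2x).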
Example 8: Mean and variance of the Beta distribution

> Let X have the Be(α, β) distribution, where α, β > 0. Find the mean and variance of X.

For the mean,

E[X] = (1/B(α, β)) ∫₀¹ x · x^(α−1) (1 − x)^(β−1) dx
     = B(α + 1, β)/B(α, β)
     = [Γ(α + 1)Γ(β)/Γ(α + β + 1)] / [Γ(α)Γ(β)/Γ(α + β)]      (using .5)
     = αΓ(α)Γ(α + β)/((α + β)Γ(α + β)Γ(α))                     (using Lemma .)
     = α/(α + β).

For the variance,

E[X²] = (1/B(α, β)) ∫₀¹ x² · x^(α−1) (1 − x)^(β−1) dx
      = B(α + 2, β)/B(α, β)
      = [Γ(α + 2)Γ(β)/Γ(α + β + 2)] / [Γ(α)Γ(β)/Γ(α + β)]     (using .5)
      = α(α + 1)Γ(α)Γ(α + β)/((α + β)(α + β + 1)Γ(α + β)Γ(α))  (using Lemma .)
      = α(α + 1)/((α + β)(α + β + 1)).

So, using that Var(X) = E[X²] − E[X]², we have

Var(X) = α(α + 1)/((α + β)(α + β + 1)) − α²/(α + β)²
       = [α(α + 1)(α + β) − α²(α + β + 1)]/((α + β)²(α + β + 1))
       = αβ/((α + β)²(α + β + 1)).

Chapter 3

Example 9: Cube root of the Be(3, 1) distribution.

> Let X ∼ Be(3, 1) and let Y = ∛X. Find the probability density function of Y.

From (.6), the p.d.f. of the Be(α, β) distribution is

f_X(x) = (1/B(α, β)) x^(α−1) (1 − x)^(β−1) if x ∈ (0, 1), and f_X(x) = 0 otherwise.
Note that, for any α > 0,

B(α, 1) = Γ(α)Γ(1)/Γ(α + 1) = Γ(α)/(αΓ(α)) = 1/α,

by (.5) and Lemma .. Putting this, along with α = 3 and β = 1, into (.6), the p.d.f. of X is

f_X(x) = 3x² if x ∈ (0, 1), and f_X(x) = 0 otherwise.

For the transformation, we use the function g(x) = ∛x, which is strictly increasing. The p.d.f. of X is non-zero on (0, 1), and g maps R_X = (0, 1) to (0, 1), so g(R_X) = (0, 1). We have g⁻¹(y) = y³, so dg⁻¹/dy = 3y². Therefore, by Lemma 3. we have

f_Y(y) = f_X(g⁻¹(y)) |dg⁻¹/dy| = 3(y³)² · 3y² = 9y⁸ if y ∈ (0, 1), and f_Y(y) = 0 otherwise.

In fact, using the same calculations as above, it can be seen that this is the p.d.f. of a Be(9, 1) distribution. See Q3.5 for a more general case.

Example 10: Standardization of the normal distribution.

> Let X ∼ N(µ, σ²) and define Y = (X − µ)/σ. Show that Y ∼ N(0, 1).

The p.d.f. of the normal distribution with mean µ and variance σ² is

f_X(x) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²)),

with range R_X = ℝ. The function g(x) = (x − µ)/σ is strictly increasing, and g(ℝ) = ℝ.
7 If y x µ σ then x σy + µ, hence the inverse function is g y σy + µ, with derivative dg dy σ >. Hence, by Lemma 3., f Y y exp y σ πσ π exp which is the p.d.f. of a N, random variable. Example : The log-normal distribution. y > Find the probability density function of Y e X, where X Nµ, σ. Recall that Y is known as the log-normal distribution, which we introduced in Section... The probability density function of X is f X x πσ exp x µ σ, which is non-zero for all x R. Our transformation is gx e x, which is strictly increasing for all x R. The range of X is R, which is mapped by g to gr,. We have g y log y, and dg dy f Y y y. Hence, by Lemma 3. the p.d.f. of Y is given by yσ exp π log y µ σ if y, otherwise. Example : Square of a standard normal the chi-squared distribution. > Let X N, and let Y X. Find the p.d.f. of Y and verify that Y has the χ distribution. We aim to find the p.d.f. of Y and check that it matches the p.d.f. given for the χ distribution in Section.3.3. Note that R X R, and we can t apply Lemma 3. because gx x is not strictly monotone on R. If y < then P[Y y] because Y X. Moreover, because the normal distribution is a continuous distribution, P[X ], so also P[Y ] 7
8 This leaves y >, and in this case we have F Y y P[Y y] P[ y X y] P[X y] P[X y] Φ y Φ y Here, Φx P[X x] is the distribution function of the standard normal distribution. Differentiating with respect to y, we have f Y y y φ y y φ y y φ y πy exp y/. Here, φ is the probability density function of the standard normal distribution. We use that φx φ x. If we recall from Section.3. that Γ/ π, we then have y f Y y Γ/ exp y if y > which exactly matches the p.d.f. given for the χ distribution in Section.3.3. Chapter 4 Example 3: Joint probability density functions > Let T be the triangle {x, y : x,, y, x}. Define kx + y if x, y T fx, y Find the value of k such that f is a joint probability density function. First, we sketch the region T on which f X,Y x, y is non-zero. 8
9 We need fx, y for all x, y, which means we must have k. T fx, y dx dy. Therefore, fx, y dy dx k k T x k. kx + y dy dx [xy + y 3x dx ] x y dx Also, we need that So k. Here, to find the limits of integration, we describe the region T as being covered by vertical lines, one for each fixed x. With x fixed, the range of y that makes up T is y, x. That is, we use that T {x, y : x,, y, x}. > If X and Y have joint p.d.f. f X,Y x, y fx, y, find P[X + Y > ]. To find P[X + Y > ], we need to integrate f X,Y x, y over the region of x, y for which x, y T and x + y >. Let s call this region T, and sketch it. 9
10 We have T {x, y : x,, y x, x}. So, P[X + Y > ] x x x + y dy dx [ xy + y ] x y x dx 4x dx [ 4 3 x3 x 3 3 ] 3 Example 4: Marginal distributions > Let X, Y be as in Example 3. Find the marginal p.d.f.s of X and Y. For x,, f X x f X,Y x, y dy x x + y dy [ xy + y ] x y 3x. Here, to find the limits of the integral, we keep x fixed, and then look for the range of y for which f X,Y x, y is non-zero. That is, we use T {x, y : x,, y, x}. For x /,, we have f X,Y x, y, so 3x if x, f X x is the marginal p.d.f. of X. For y,, we have f Y y f X,Y x, y dx y x + y dx [ x + xy ] xy + y 3y. Here, to find the limits of the integral, we keep y fixed, and then look for the range of x for which f X,Y x, y is non-zero. That is, we use T {x, y : y,, x y, }. For y /, we have f X,Y x, y, so + y 3y if y, f Y y is the marginal p.d.f. of Y. Example 5: Conditional distributions
11 > Let X, Y be as in Example 3. For y,, find the conditional p.d.f. of X given Y y. We obtained f Y y in Example 4, and we know f X,Y x, y from Example 3. Note that, with y, fixed, f X,Y x, y is non-zero only for x y,. So, f X Y y x f X,Y x, y x+y +y 3y if x y, f Y y Example 6: Independence, factorizing f X,Y. > Are the random variables X and Y from Example 3 independent? The random variables X and Y from Example 3 are not independent as the p.d.f. x + y if x, y T fx, y otherwise cannot be factorised as a function of x times a function of y. > Let U and V be two random variables with joint probability density function ue u+3v if u >, v > f U,V u, v Are U and V independent? f U,V u, v can be factorised into a function of x and a function of y, 4ue u 3e 3v if u >, v > f U,V u, v guhv where 4ue u if u > 3e 3v if v > gu hv otherwise, Therefore, U and V are independent. In fact, in this case we can recognize that g is the p.d.f. of a Ga, and h is the p.d.f. of a Exp3, so U and V are Ga, and Exp3 respectively. Example 7: Covariance and correlation > Let X, Y be as in Example 3. Find the covariance CovX, Y.
12 We want to calculate CovX, Y E[XY ] E[X]E[Y ]. We have E[XY ] x x4 dx xyx + y dy dx Using the marginal probability density functions for X and Y that we found in Example 4, we have E[X] E[Y ] xf X x dx yf Y y dy evaluating these two integrals is left to you. So, > Find the correlation ρx, Y. We now need to find ρx, Y of X and Y. We have E[X ] E[Y ] CovX, Y x3x dx 3 4 y + y 3y dy CovX,Y VarX VarY. So, we also need to calculate the variances x f X x dx y f Y y dy x 3x dx 3 5 y + y 3y dy 7 3 again, evaluating these two integrals is left to you. From this we obtain, 3 and we get VarX E[X ] E[X] VarY E[Y ] E[Y ] ρx, Y / Example 8: Calculating conditional expectation > Let X, Y be as in Example 3. Let y,. Find E[X Y y] and E[X Y ]. We have already found the conditional p.d.f. of X in Example 5, it is x+y +y 3y if x y, f X Y y x
13 So, Hence, E[X Y y] y [ ] x + yx + y 3y dx 3 x3 + yx + y 3y y E[X Y ] + 3Y 5Y Y 3Y. + 3y 5y3 3 + y 3y. > Show that E[E[X Y ]] E[X]. To find E[E[X Y ]], we first note that E[X Y ] gy + 3Y 5Y Y 3Y use then use the usual method for finding the expectation of a function of Y. That is, E[E[X Y ]] E[gY ] gyf Y y dy + 3y 5y 3 3 dy We have already shown during Example 7 that E[X] 3 4. Example 9: Proof of E[E[X Y ]] E[X] It is no coincidence that E[E[X Y ]] E[X] in Example 8. In fact, this holds true for all pairs of random variables X and Y. Here is a general proof. We have E[X Y ] gy, where So, gy E[X Y y] xf X Y y x dx. E[E[X Y ]] E[gY ] E[X]. gyf Y y dy x xf X Y y xf Y y dx dy xf X,Y x, y dy dx xf X x dx f X,Y x, y dy dx by definition of the conditional p.d.f. by definition of the marginal p.d.f. Example : Calculation of expectation and variance by conditioning 3
14 Let X Ga, and, conditional on X x, let Y P ox. Then, using standard results about the mean and variance of Gamma/Poisson random variables, E[X], VarX, E[Y X] X and VarY X X. So, using the formulae from Lemma 4., Chapter 5 E[Y ] E[E[Y X]] E[X] VarY E[VarY X] + VarE[Y X] E[X] + VarX 3. Example : Transforming bivariate random variables > Let X Ga3, and Y Be,, and let X and Y be independent. Find the joint p.d.f. of the vector U, V, where U X + Y and V X Y. The p.d.f.s of X and Y are f X x x e x if x > otherwise, 6y y if y, f Y y By independence, their joint p.d.f. is 3x y ye x if x > and y, f X,Y x, y The transformation we want is u x + y and v x y. So, u + v x, u v y, and the inverse transformation is x u+v u v, and y. Hence, the Jacobian is J det x u y u x v y v. Now, we need to transform the region T {x, y : x >, y, } into the u, v plane. This region is bounded by the three lines x, y and y, which map respectively to the lines u v, u v and u v +. 4
15 Our transformed region must also be bounded by the three lines; to check which section of the sketch it is we simply find out where some x, y T maps to. We have, T which maps to,, so the shaded region is the image of T. Therefore, f u+v X,Y f U,V u, v, u v if u >, v u, u, v > u 3 3 u + v u v u + ve u+v if u >, v u, u, v > u Example : The Box-Muller transform, simulation of normal random variables Let S Exp and Θ U[, π, and let S and Θ be independent. Then S and Θ have joint p.d.f. given by 4π f S,Θ s, θ e s if s and θ [, π We can think of S and Θ as giving the location of a point S, Θ in polar co-ordinates. We transform this point into Cartesian co-ordinates, meaning that we want to use the transformation X S cosθ and Y S sinθ. Therefore, our transformation is x s cos θ, y s sin θ. This transformation maps the set of s, θ for which f S,Θ s, θ > onto all of R it is just Polar coordinates r, θ with r s. To find the inverse transformation, note that s x +y and y/x tan θ, so θ arctany/x. So the Jacobian is Hence, J det s x θ x s y θ y det x y y/x /x +y/x +y/x f X,Y x, y x +y π e for all x, y R. Now, we can factorise this as f X,Y x, y e x e y, π π + y/x y /x + y/x which implies that X and Y are independent standard normal random variables. Assuming we can simulate uniform random variables, then using the transformation in Q3.3 we can also simulate exponential random variables. Then, using above transformation, we can simulate standard normals. Example 3: Finding the distribution of a sum of Gamma random variables 5
> Suppose that two independent random variables X and Y follow the distributions X ∼ Ga(4, 2) and Y ∼ Ga(2, 2). Find the distribution of Z = X + Y.

Let W = X. So the transformation we want to apply is z = x + y, w = x. The inverse transformation is x = w and y = z − w, so the Jacobian is

J = det [ ∂x/∂z  ∂x/∂w ; ∂y/∂z  ∂y/∂w ] = det [ 0  1 ; 1  −1 ] = −1.

By independence of X and Y, their joint p.d.f. is

f_{X,Y}(x, y) = (2⁴/Γ(4)) x³ e^(−2x) · (2²/Γ(2)) y e^(−2y) = (2⁶/6) x³ y e^(−2(x+y)) if x, y > 0, and 0 otherwise.

The region of (x, y) on which f_{X,Y}(x, y) is non-zero is x > 0 and y > 0. This is bounded by the lines x = 0 and y = 0, which are respectively mapped to w = 0 and z = w. The point (1, 1) is mapped to (2, 1), meaning that the shaded area is the region on which f_{Z,W}(z, w) is non-zero. [sketch omitted]

Hence, the joint p.d.f. of Z and W is

f_{Z,W}(z, w) = (2⁶/6) w³ (z − w) e^(−2z) if z > 0 and w ∈ (0, z), and 0 otherwise.

Lastly, to obtain the marginal p.d.f. of Z, we integrate out w. For z > 0,

f_Z(z) = (2⁶/6) e^(−2z) ∫₀^z w³ (z − w) dw = (2⁶/6) e^(−2z) (z · z⁴/4 − z⁵/5) = (2⁶/120) z⁵ e^(−2z) = (2⁶/Γ(6)) z⁵ e^(−2z).
17 For z we have f Z z. So, we can recognise f Z z as the p.d.f. of a Ga6, random variable, and conclude that Z Ga6,. More generally, this method can be used to show that if X Gaα, β, Y Gaα, β and X and Y are independent, then X + Y Gaα + α, β for any α, α, β. See Q5.8. Chapter 6 Example 4: Mean vectors and covariance matrices Recall the random variables X, Y from Example 3. In Example 7 we calculated that E[X] and E[Y ]. So the mean vector of X X, Y T is E[X] In Example 7 we also calculated that CovX, Y 48 Therefore, the covariance matrix of X is CovX Example 5: Affine transformation of a random vector > Suppose that the random vector X X, X, X 3 T has E[X], CovX and that Var[X] 8, VarY 7. Define two new random variables, U X X + X 3 and V X X 3 +. Find the mean vector and covariance matrix of U U, V T. We can express the relationship between X and U as an affine transformation: X U U AX + b X +. V So, we can use Lemma 6.3 to find the mean vector and covariance matrix of U. Firstly,. X 3 E[U] AE[X] + b + + 7
18 and secondly, CovU A CovXA T > Find the correlation coefficient ρu, V.. We can read off VarU, VarV and CovU, V from the covariance matrix of U. So the correlation coefficient of U and V is ρu, V Example 6: Variance of a sum CovU, V VarU VarV > Suppose that two random variables X and Y have variances σx and σ Y, and covariance CovX, Y. Find the variance of X + Y. If we write U X + Y, then X U U Y where U denotes the matrix with the single entry U. We usually won t bother to write brackets around matrices/vectors. We can apply Lemma 6.3 to this case, with A and X X, Y T, to obtain that CovU A CovXA T. The covariance matrix of X is given by σx CovX, Y CovX. CovX, Y Since U is, CovU VarU, so we have σx CovX, Y VarX + Y CovX, Y which you should recognize. σ Y σ X + CovX, Y + σ Y, σ Y 8
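The identity Var(X + Y) = Var(X) + 2Cov(X, Y) + Var(Y) is exactly the 1×1 matrix A Cov(X) Aᵀ with A = (1 1), and it can be sketched directly in code. A minimal Python version (the covariance entries below are hypothetical, not from the notes):

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# hypothetical values: Var X = 4, Var Y = 9, Cov(X, Y) = 1
cov_X = [[4.0, 1.0],
         [1.0, 9.0]]
A = [[1.0, 1.0]]        # U = X + Y as a linear map of (X, Y)^T
A_T = [[1.0],
       [1.0]]

var_U = matmul(matmul(A, cov_X), A_T)[0][0]
# var_U equals Var X + 2 Cov(X, Y) + Var Y = 4 + 2 + 9 = 15
```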
19 Example 7: The bivariate normal with independent components > Find the p.d.f. of the bivariate normal X X, X T in the case where CovX, X. From Definition 6.4, the general bivariate normal distribution X, with mean vector µ and covariance matrix Σ has joint probability density function f X,X x, x π σ σ exp σ x µ σ x µ x µ + σ x µ σ σ σ σ If we assume CovX, X σ σ, then the p.d.f. simplifies to f X,X x, x exp x µ πσ σ σ x µ σ exp x µ exp x µ πσ πσ σ f X x f X x. 4 Here, in the final line we see factorize f X,X x, x, into the product of the p.d.f. of the Nµ, σ random variable X and the p.d.f. of the Nµ, σ random variables X. Therefore, in this case X and X are independent. Note that, setting µ µ and σ σ, we recover 6.. We have shown above that if CovX, X then X and X are independent. If X and X are independent then it is automatic that CovX, X. Hence: X and X are independent if and only if CovX, X. We will record this fact as Lemma 6.8. Example 8: Plotting the p.d.f. of the bivariate normal. The pdf of a bivariate normal is a bell curve : σ This example is the standard bivariate normal Nµ, Σ where µ, and Σ. It was generated in Mathematica with the code all one line 9
Plot3D[1/(2 Pi) E^(-(x^2 + y^2)/2), {x, -4, 4}, {y, -4, 4}, PlotRange -> All, ColorFunction -> (ColorData["Rainbow"][#3] &)]

Changing µ alters the position of the center of the bell, without changing the shape of the curve. For example, translating µ while keeping the same Σ gives [figure omitted]. Changing Σ alters the shape of the bell. For example, keeping µ the same and widening one diagonal entry of Σ gives [figure omitted]. Changing both µ and Σ together results in a bell curve that is both translated and reshaped.

Example 29: Marginal distributions of the bivariate normal, and their covariance.

> Let X = (X₁, X₂)ᵀ have distribution N(µ, Σ), where µ = (µ₁, µ₂)ᵀ with µ₂ = 3 and Σ₂₂ = σ₂² = 3. Write down the marginal distributions of X₁ and X₂.

From Lemma 6.7 we know that X₁ and X₂ are both univariate normals. We can read their means and variances off from the mean vector µ and covariance matrix Σ. We have X₁ ∼ N(µ₁, σ₁²), and also X₂ ∼ N(µ₂, σ₂²) = N(3, 3).
21 > Find CovX, X and ρx, X. Are X and X independent? From the covariance matrix, CovX, X. Hence, ρx, X CovX, X VarX VarX. 3 6 Clearly, we have CovX, X so X and X are not independent. Example 3: Conditional distributions for bivariate normal > Let a R and let X N µ, Σ where µ and Σ 3 3. Find the conditional distribution of X given X a. By Lemma 6.9, the conditional distribution of X given X a is a univariate normal with mean given by µ + ρ σ σ x µ and variance ρ σ. In this case, µ, µ, ρ 3/, σ, σ, and x a. So, µ + 3a and σ 9. Hence, the conditional distribution of X given X a is N + 3a,. Example 3: Transformations of bivariate normal > Let X N µ, Σ where µ and Σ are as in Example 3. Let Y X + X X Y Find the distribution of Y Y, Y T. We can write Y as an affine transformation of X, that is X Y AX + b +. The matrix A is a non-singular matrix, so by Lemma 6., Y is a bivariate normal. Therefore, if we can find the mean vector and covariance matrix of Y, we know the distribution of Y. and X 4 E[Y] AE[X] + b, CovY A CovXA T So, the distribution of Y is [ ] Y N,. 3 Example 3: Affine transformation of a three dimensional normal distribution.
22 > Suppose X X X 3 4 N 3, 9. 4 Find the joint distribution of Y Y, Y T where Y X X and Y X + X + X 3. so and We can write, X Y AX X, X 3 E[Y] AE[X], 4 CovY A CovXA T It is not hard to see that A is an onto transformation, so Y has a bivariate normal distribution here we use the multivariate equivalent of Lemma 6.. Hence, [ ] Y 9 6 N, > Find ρx, X 3. Are X and X 3 independent? From the covariance matrix of X, we can read off Y ρx, X 3 CovX, X 3 VarX VarX Since X and X 3 are components of a multivariable normal distribution, and CovX, X 3, by the three dimensional equivalent of Lemma 6.8 X and X 3 are independent. Chapter 7 Example 33: Maximising a function
> Find the value of θ which maximises f(θ) = θ⁵(1 − θ) on the range θ ∈ [0, 1].

First, we look for turning points. We have

f′(θ) = 5θ⁴(1 − θ) − θ⁵ = θ⁴(5 − 6θ),

so the turning points are at θ = 0 and θ = 5/6. To see which ones are local maxima, we calculate the second derivative:

f″(θ) = 4θ³(5 − 6θ) − 6θ⁴ = θ³(20 − 30θ).

So f″(5/6) < 0 and θ = 5/6 is a local maximum. Unfortunately, f″(0) = 0, so we don't know if θ = 0 is a local maximum, minimum or inflection. However, we can check that f(0) = 0, so it doesn't matter which: we still have f(0) < f(5/6). Hence, θ = 5/6 is the global maximiser.

Example 34: Likelihood functions and maximum likelihood estimators

> Let X be a random variable with Exp(θ) distribution, where the parameter θ is unknown. Find and sketch the likelihood function of X, given the data x = 3.

The likelihood function is

L(θ; 3) = f_X(3; θ) = θe^(−3θ),

defined for all θ ∈ Θ = (0, ∞). We can plot this in R, for θ ∈ (0, 5), with the command (all one line)

curve(x * exp(-3 * x), from = 0, to = 5, xlab = expression(theta), ylab = expression(L(theta ~ ";" ~ 3)))
24 Note that we use x as the θ variable here because R hard-codes its use of x as a graph variable. The result is > Given this data, find the likelihood of θ,,,, 5. Amongst these values of θ, which has the highest likelihood? The likelihoods are L ; 3 e 3.7 L ; 3 e 3. L; 3 e 3.5 L; 3 e 6.5 L5; 3 5e So, restricted to looking at these values, θ has the highest likelihood. > Find the maximum likelihood estimator of θ,, based on the single data point x 3. We need to find the value of θ Θ which maximises Lθ; 3. We differentiate, to look for turning points, obtaining dl dθ e 3θ 3θe 3θ e 3θ 3θ. 4
25 Hence, there is only one turning point, at θ 3. We differentiate again, obtaining d L dθ 3e 3θ 3θ + e 3θ 3 e 3θ 6 + 9θ At θ 3, we have d L dθ e <, so the turning point at θ 3 is a local maximum. Since it is the only turning point, it is also the global maximum. Hence, the maximum likelihood estimator of θ is ˆθ 3. Example 35: Models, parameters and data aerosols. > The particle size distribution of an aerosol is the distribution of the diameter of aerosol particles within a typical region of air. The term is also used for particles within a powder, or suspended in a fluid. In many situations, the particle size distribution is modelled using the log-normal distribution. It is typically reasonable to assume that the diameters of particles are independent. Assuming this model, find the joint probability density function of the diameters observed in a sample of n particles, and state the parameters of the model. Recall that the p.d.f. of the log-normal distribution is f Y y yσ exp log y µ π σ if y, The parameters of this distribution, and hence also the parameters of our model, are µ R and σ,. Since the diameters of particles are assumed to be independent, the joint probability density function of Y Y, Y,..., Y n, where Y i is the diameter of the i th particle, is f Y y,..., y n n f Yi y i i πσ n/ y y...y n exp n log y i µ σ if y i > for all i i otherwise. Note that, if one or more of the y i is less than or equal to zero then f Yi y i, which means that also f Y y,..., y n. Example 36: Maximum likelihood estimation with i.i.d. data. > Let X Bernθ, where θ is an unknown parameter. Suppose that we have 3 independent samples of X, which are x {,, }. Find the likelihood function of θ, given this data. 5
26 The probability function of a single Bernθ random variable is θ if x f X x; θ θ if x otherwise Since our three samples are independent, we model x as a sample from the joint distribution X X, X, X 3, where f X x; θ 3 f Xi x i ; θ i and f Xi is the p.d.f. of a single Bernθ random variable. Since f Xi has several cases, it would be unhelpful to try and expand out this formula before we put in values for the x i. Our likelihood function is therefore Lθ; x f X ; θ f X ; θ f X3 ; θ θθθ θ θ 3. The range of values that the parameter θ can take is Θ [, ]. > Find the maximum likelihood estimator of θ, given the data x. We seek to maximize Lθ; x for θ [, ]. Differentiating once, dl dθ θ 3θ θ 3θ so the turning points are at θ and θ 3. Differentiating again, d L dθ 6θ which gives d L θ dθ and d L θ/3 dθ 4. Hence, θ is a local minimum and θ 3 is a local maximum, so θ 3 maximises Lθ; x over θ [, ]. The maximum likelihood estimator of θ is therefore ˆθ 3. This is, hopefully, reassuring. The number of s in our sample of 3 was, so using independence θ 3 seems like a good guess. See Q7. for a much more general case of this example. Example 37: Maximum likelihood estimation radioactive decay. > Atoms of radioactive elements decay as time passes, meaning that any such atom will, at some point in time, suddenly break apart. This process is known as radioactive decay. The time taken for a single atom of, say, carbon-5 to decay is usually modelled as an exponential random variable, with unknown parameter λ,. The parameter λ is known as the decay rate. The times at which atoms decay are known to be independent. 6
27 Using this model, find the likelihood function for the time to decay of a sample of n carbon-5 atoms. The decay time X i of the i th atom is exponential with parameter λ,, and therefore has p.d.f. λe λxi if x i > f Xi x i ; λ Since each atom decays independently, the joint distribution of X X i n i is n n λe λxi if x i > for all i f X x; λ f Xi x i ; λ i i otherwise. λ n exp λ n i x i if x i > for all i Therefore, the likelihood function is λ n exp λ n i Lλ; x x i if x i > for all i The range of possible values of the parameter λ is Θ,. > Suppose that we have sampled the decay times of 5 carbon-5 atoms in seconds, accurate to two decimal places, and found them to be x {.5,.9,.88, 4.6, 9.75,.6,.3,.7,.3,.8, 4.5, 9.5,.67, 3.79, 4.3}. Find the maximum likelihood estimator of λ, based on this data. Given this data, for which 5 x i 47.58, our likelihood function is Differentiating, we have i Lλ; x λ 5 e 47.58λ. dl dλ 5λ4 e 47.58λ 47.58λ 5 e 47.58λ λ λe 47.58λ which is zero only when λ or λ 5/ Since λ is outside of the range Θ, of possible parameter values, the only turning point of interest is λ 5/ Differentiating again with the details left to you, we end up with d L dλ λ3 47.4λ λ 5 e 47.58λ λ λ λ e 47.58λ 7
28 Evaluating at our turning point gives d L dλ λ5/ e 5 < So, our turning point is a local maximum. Since there are no other turning points within the allowable range our turning point is the global maximum. estimator of λ, given our data x, is ˆλ Hence, the maximum likelihood In reality, physicists are able to collect vastly more data than n 5, but even with 5 data points we are not far away from the true value of λ, which is λ Of course, by true value here we mean the value that has been discovered experimentally, with the help of statistical inference. So-called carbon dating typically uses carbon-4, which has a much slower decay rate of approximately. 4. Carbon-4 is present in many living organisms and, crucially, the proportion of carbon in living organisms that is carbon-4 is essentially the same for all living organisms. Once organisms die, the carbon-4 radioactively decays. The key idea behind carbon dating is that, by measuring the concentration of carbon-4 within a fossil, scientists can estimate how long ago that fossil lived. To do so, a highly accurate estimate of the decay rate of carbon-4 is needed. Example 38: Maximum likelihood estimation via log-likelihood mutations in DNA. > When organisms reproduce, the DNA or RNA of the offspring is a combination of the DNA of its one, or two parents. Additionally, the DNA of the offspring contains a small number of locations in which it differs from its parents. These locations are called mutations. The number of mutations per unit length of DNA is typically modelled using a Poisson distribution, with an unknown parameter θ,. The numbers of mutations found in disjoint sections of DNA are independent. Using this model, find the likelihood function for the number of mutations present in a sample of n disjoint strands of DNA, each of which has unit length. Let X i be the number of mutations in the i th strand of DNA. So, under our model, f Xi x i ; θ e θ θ xi for x i {,,,...}, and f Xi x i if x i / N {}. 
Since we assume the X i are independent, x i! the joint distribution of X X, X,..., X n has probability function f X x n i e θ θ xi x i! x!x!... x n! e nθ θ n xi Actually, the biological details here are rather complicated, and we omit discussion of them. 8
29 provided all x i N {}, and zero otherwise. Therefore, our likelihood function is Lθ; x The range of possible values for θ is Θ,. x!x!... x n! e nθ θ n xi. > Let x be a vector of data, where x i is the number of mutations observed in a distinct unit length segment of DNA. Suppose that at least one of the x i is non-zero. Find the corresponding log-likelihood function, and hence find the maximum likelihood estimator of θ. The log-likelihood function is lθ; x log Lθ; x, so log Lθ, x log x!x!... x n! e nθ θ n xi n n logx i! nθ + log θ x i. i We now look to maximise lθ; x, over θ,. Differentiating, we obtain dl dθ n + n x i. θ Note that this is much simpler than what we d get if we differentiated Lθ; x. So, the only turning point of lθ, x is at θ n n i x i. Differentiating again, we have d l dθ θ i n x i. Since our x i are counting the occurrences of mutations, x i, and since at least one is non-zero we have d l dθ < for all θ. Hence, our turning point is a maximum and, since it is the only maximum, is also the global maximum. Therefore, the maximum likelhood estimator of θ is ˆθ n x i. n > Mutations rates were measured, for HIV patients, and there were found to be { } x 9, 6, 37, 8, 4, 34, 37, 6, 3, 48, 45 mutations per 4 possible locations i.e. per unit length. This data comes from the article Cuevas et al. 5. i Assuming the model suggested above, calculate the maximum likelihood estimator of the mutation rate of HIV. The data has x i i x i so we conclude that the maximum likelihood estimator of the mutation rate θ, given this data, is ˆθ i 9
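The conclusion θ̂ = x̄ can be verified numerically: a grid search over the Poisson log-likelihood should land, up to grid resolution, on the sample mean. A Python sketch, with hypothetical mutation counts standing in for the data above (some digits of the printed data are unreliable):

```python
import math

# hypothetical mutation counts (stand-ins, not the published HIV data)
x = [19, 16, 37, 18, 24, 34, 37, 16, 31, 48, 45]
n, total = len(x), sum(x)

def log_likelihood(theta):
    """Poisson log-likelihood l(theta; x), up to the additive constant -sum(log x_i!)."""
    return -n * theta + total * math.log(theta)

mle = total / n                           # closed form: the sample mean
grid = [0.01 * k for k in range(1, 10000)]
best = max(grid, key=log_likelihood)      # numerical maximiser on a grid
```

Because the log-likelihood is concave in θ, the grid maximiser is always one of the two grid points bracketing the sample mean.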
30 Example 39: Maximum likelihood estimation via log-likelihood spectrometry. > Using a mass spectrometer, it is possible to measure the mass 3 of individual molecules. For example, it is possible to measure the masses of individual amino acid molecules. A sample of 5 amino acid molecules, which are all known to be of the same type and therefore, the same mass, were reported to have masses x {65.76, 4.4, 94., 3.3, 5., 4.77, 6., 86.4, 9.4, 66.7, 9., , 58.9}. It is known that these molecules are either Alanine, which has mass 7., or Leucine, which has mass 3.. Given a molecule of mass θ, the spectrometer is known to report its mass as X Nθ, 35, independently for each molecule. Using this model, and the data above, find the likelihoods of Alanine and Leucine. Specify which of these has the greatest the likelihood. Our model, for the reported mass X of a single molecule with real weight θ, is X N, 35. Therefore, X i Nθ, 3 and the p.d.f. of a single data point is f Xi x i exp x i θ π Therefore, the p.d.f. of the reported masses X X,..., X n of n molecules is n f X x f Xi x i π n/ 35 n exp n x i θ. 45 i We know that, in reality, θ must be one of only two different values; 7. for Alanine and 3. for Leucine. Therefore, our likelihood function is Lθ; x π n/ 35 n exp 45 i n x i θ and the possible range of values for θ is the two point set Θ {7., 3.}. We need to find out which of these two values maximises the likelihood. Our data x contains n 5 data points. A short calculation use e.g. R shows that i 45 and, therefore, that 5 i x i 7..7, 45 5 i x i L7.; x.9 34, L3.; x We conclude that θ 7. has much greater likelihood than θ 3., so we expect that the molecules sampled are Alanine. 3 This is a simplification; in reality a mass spectrometer measure the mass to charge ratio of the molecule, but since the charges of molecule are already known, the mass can be inferred later. Atomic masses are measured in so-called atomic mass units. 3
Note that, if we were to differentiate as we did in other examples, we would find the maximiser of $L(\theta; x)$ across the whole range $\theta \in (0, \infty)$, which turns out to be the sample mean of the data. This is not what we want here! The design of our experiment has meant that the range of possible values for $\theta$ is restricted to a two point set. See Q7.5 for the unrestricted case.

Example 40: Two-parameter maximum likelihood estimation (rainfall).

> Find the maximum likelihood estimator of the parameter vector $\theta = (\mu, \sigma^2)$ when the data $x = (x_1, x_2, \ldots, x_n)$ are modelled as i.i.d. samples from a normal distribution $N(\mu, \sigma^2)$.

Our parameter vector is $\theta = (\mu, \sigma^2)$, so let us write $v = \sigma^2$ to avoid confusion. As a result, we are interested in the parameters $\theta = (\mu, v)$, and the range of possible values of $\theta$ is $\Theta = \mathbb{R} \times (0, \infty)$. The p.d.f. of the univariate normal distribution $N(\mu, v)$ is
$$f_X(x) = \frac{1}{\sqrt{2\pi v}}\, e^{-(x - \mu)^2 / 2v}.$$
Writing $X = (X_1, \ldots, X_n)$, where the $X_i$ are i.i.d. univariate $N(\mu, v)$ random variables, the likelihood function of $X$ is
$$L(\theta; x) = f_X(x) = \frac{1}{(2\pi v)^{n/2}} \exp\left(-\frac{1}{2v}\sum_{i=1}^n (x_i - \mu)^2\right).$$
Therefore, the log likelihood is
$$l(\theta; x) = -\frac{n}{2}\big(\log(2\pi) + \log(v)\big) - \frac{1}{2v}\sum_{i=1}^n (x_i - \mu)^2.$$
We now look to maximise $l(\theta; x)$ over $\theta \in \Theta$. The partial derivatives are
$$\frac{\partial l}{\partial \mu} = \frac{1}{v}\sum_{i=1}^n (x_i - \mu) = \frac{1}{v}\left(\sum_{i=1}^n x_i - n\mu\right),$$
$$\frac{\partial l}{\partial v} = -\frac{n}{2v} + \frac{1}{2v^2}\sum_{i=1}^n (x_i - \mu)^2.$$
Solving $\frac{\partial l}{\partial \mu} = 0$ gives $\mu = \frac{1}{n}\sum_{i=1}^n x_i = \bar{x}$. Solving $\frac{\partial l}{\partial v} = 0$ gives $v = \frac{1}{n}\sum_{i=1}^n (x_i - \mu)^2$. So both partial derivatives will be zero if and only if
$$\mu = \bar{x}, \qquad v = \frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^2.$$
This gives us the value of $\theta = (\mu, v)$ at the single turning point of $l$.
Next, we use the Hessian matrix to check if this point is a local maximum. We have
$$\frac{\partial^2 l}{\partial \mu^2} = -\frac{n}{v}, \qquad \frac{\partial^2 l}{\partial \mu \, \partial v} = -\frac{1}{v^2}\left(\sum_{i=1}^n x_i - n\mu\right), \qquad \frac{\partial^2 l}{\partial v^2} = \frac{n}{2v^2} - \frac{1}{v^3}\sum_{i=1}^n (x_i - \mu)^2.$$
Evaluating these at our turning point, we get
$$\frac{\partial^2 l}{\partial \mu^2} = -\frac{n}{\hat{v}}, \qquad \frac{\partial^2 l}{\partial \mu \, \partial v} = -\frac{1}{\hat{v}^2}\left(\sum_{i=1}^n x_i - n\hat{\mu}\right) = 0, \qquad \frac{\partial^2 l}{\partial v^2} = \frac{n}{2\hat{v}^2} - \frac{n\hat{v}}{\hat{v}^3} = -\frac{n}{2\hat{v}^2},$$
so
$$H = \begin{pmatrix} -\frac{n}{\hat{v}} & 0 \\ 0 & -\frac{n}{2\hat{v}^2} \end{pmatrix}.$$
Since $-\frac{n}{\hat{v}} < 0$ and $\det H = \frac{n^2}{2\hat{v}^3} > 0$, our turning point is a local maximum. Since it is the only turning point, it is also the global maximum. Hence, the MLE is
$$\hat{\mu} = \bar{x}, \qquad \hat{\sigma}^2 = \hat{v} = \frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^2.$$
Note $\hat{\mu}$ is the sample mean, and $\hat{\sigma}^2$ is the biased sample variance.

> For the years 1985-2015, the amount of rainfall in millimetres recorded as falling on Sheffield in December is as follows:
{78., 4.3, 38., 36., 59., 36., 78.4, 67.4, 7.4, 3.9, 7.4, 98., 79.4, 57.9, 35.6, 8., 8., 9.8, 6.5, 46.3, 56.7, 4., 74.9, 5.8, 66., 8.8, 4.6, 36., 69.8, ., .}
This data comes from the historical climate data stored by the Met Office⁴. Meteorologists often model the long run distribution of rainfall by a normal distribution (although in some cases the Gamma distribution is used). Assuming that we choose to model the amount of rainfall in Sheffield each December by a normal distribution, find the maximum likelihood estimators for $\mu$ and $\sigma^2$.

The data has $n = 31$, with $\bar{x} = \frac{1}{31}\sum_{i=1}^{31} x_i = 93.9$ and $\frac{1}{31}\sum_{i=1}^{31} (x_i - \bar{x})^2$ computed likewise.
So we conclude that, according to our model, the maximum likelihood estimators are $\hat{\mu} = 93.9$ and $\hat{\sigma}^2 = 4.4$, which means that Sheffield receives a $N(93.9, 4.4)$ quantity of rainfall, in millimetres, each December.

Example 41: Maximum likelihood estimation for the uniform distribution

> Find the maximum likelihood estimator of the parameter $\theta$ when the data $x = (x_1, x_2, \ldots, x_n)$ are i.i.d. samples from a uniform distribution $U[0, \theta]$, with unknown parameter $\theta > 0$.

Here the p.d.f. of $X_i$ is $f(x) = \frac{1}{\theta}$ for $0 \leq x \leq \theta$ and zero otherwise. So the likelihood, for $\theta \in \Theta = (0, \infty)$, is
$$L(\theta; x) = \begin{cases} \prod_{i=1}^n \frac{1}{\theta} & \text{if } \theta \geq x_i \text{ for all } i \\ 0 & \text{if } \theta < x_i \text{ for some } i \end{cases} \;=\; \begin{cases} \theta^{-n} & \text{if } \theta \geq \max_i x_i \\ 0 & \text{if } \theta < \max_i x_i. \end{cases}$$
Differentiating the likelihood, we see that $L(\theta; x)$ is decreasing but positive for $\theta > \max_i x_i$. For $\theta < \max_i x_i$ we know $L(\theta; x) = 0$, so by looking at the graph, we can see that the maximum occurs at
$$\theta = \hat{\theta} = \max_{i = 1, \ldots, n} x_i.$$
This is the MLE.

Example 42: Interval estimation based on likelihood

> Suppose that we have i.i.d. data $x = (x_1, x_2, \ldots, x_n)$, for which each data point is modelled as a random sample from $N(\mu, \sigma^2)$ where $\mu$ is unknown and $\sigma^2$ is known. Find the $k$-likelihood region $R_k$ for the parameter $\mu$.

First, we need to find the MLE $\hat{\mu}$ of $\mu$. The likelihood function for our model is
$$L(\mu; x) = \prod_{i=1}^n \varphi(x_i; \mu) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \mu)^2\right),$$
where the range of parameter values is all $\mu \in \mathbb{R}$. The log likelihood is
$$l(\mu; x) = -\frac{n}{2}\big(\log(2\pi) + \log(\sigma^2)\big) - \frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \mu)^2.$$
The usual process of maximisation (which is left for you, and is a simplified case of Example 40) shows that the maximum likelihood estimator is the sample mean,
$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^n x_i.$$
Now we are ready to identify the $k$-likelihood region for $\mu$. By definition, the $k$-likelihood region is
$$R_k = \{\mu \in \mathbb{R} : l(\mu; x) \geq l(\hat{\mu}; x) - k\}.$$
So, $\mu \in R_k$ if and only if
$$-\frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \mu)^2 \geq -\frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \hat{\mu})^2 - k.$$
We can simplify this inequality, by noting that
$$\sum_{i=1}^n (x_i - \mu)^2 - \sum_{i=1}^n (x_i - \hat{\mu})^2 = \sum_{i=1}^n \left(x_i^2 - 2x_i\mu + \mu^2 - x_i^2 + 2x_i\hat{\mu} - \hat{\mu}^2\right) = n\mu^2 - n\hat{\mu}^2 + 2(\hat{\mu} - \mu)\sum_{i=1}^n x_i$$
$$= n\mu^2 - n\hat{\mu}^2 + 2(\hat{\mu} - \mu)n\hat{\mu} = n(\mu^2 + \hat{\mu}^2 - 2\mu\hat{\mu}) = n(\hat{\mu} - \mu)^2.$$
So, $\mu \in R_k$ if and only if
$$\frac{n}{2\sigma^2}(\hat{\mu} - \mu)^2 \leq k,$$
or in other words,
$$R_k = \left[\hat{\mu} - \sigma\sqrt{\frac{2k}{n}},\; \hat{\mu} + \sigma\sqrt{\frac{2k}{n}}\right].$$

Example 43: Hypothesis tests based on likelihood

> In Example 37, if we used a 2-likelihood test, would we accept the hypothesis that the radioactive decay rate of carbon-15 is equal to $\lambda = 0.7$?

We had found, given the data, that the likelihood function of $\lambda$ was
$$L(\lambda; x) = \lambda^5 e^{-47.58\lambda}$$
and the maximum likelihood estimator of $\lambda$ was $\hat{\lambda} = 0.3$. The 2-likelihood region for $\lambda$ is the set
$$R_2 = \left\{\lambda > 0 : L(\lambda; x) \geq e^{-2} L(\hat{\lambda}; x)\right\},$$
so $\lambda \in R_2$ if and only if
$$\lambda^5 e^{-47.58\lambda} \geq e^{-2} L(0.3; x).$$
Note that, unlike the previous example, we can't simplify this inequality and find a nice closed form for the likelihood region. Our hypothesis is that, in fact, $\lambda = 0.7$. Our 2-likelihood test will pass if $\lambda = 0.7$ is within the 2-likelihood region, and fail if not. We can evaluate (using e.g. R)
$$L(0.7; x) = 0.7^5\, e^{-47.58 \times 0.7}$$
and note that it is at least $e^{-2} L(\hat{\lambda}; x)$. Hence $\lambda = 0.7$ is within the 2-likelihood region and we accept the hypothesis.
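The closed-form region from Example 42 and a 2-likelihood test of the kind used in Example 43 can be checked numerically against the defining inequality. The sketch below uses Python (the notes suggest R); the data, $\sigma$, and the hypothesised mean are made-up illustrative values.

```python
import math

# Illustrative data, assumed i.i.d. N(mu, sigma^2) with sigma known (made-up values).
data = [4.1, 5.3, 4.8, 5.9, 4.4, 5.5]
sigma = 1.0
n = len(data)

mu_hat = sum(data) / n  # MLE of mu is the sample mean

def log_lik(mu):
    # Normal log-likelihood with known sigma, constants included.
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) \
           - sum((x - mu) ** 2 for x in data) / (2 * sigma ** 2)

k = 2.0
# Closed form from Example 42: R_k = [mu_hat - sigma*sqrt(2k/n), mu_hat + sigma*sqrt(2k/n)]
half_width = sigma * math.sqrt(2 * k / n)
lo, hi = mu_hat - half_width, mu_hat + half_width

def in_region(mu):
    # Direct definition: mu is in R_k iff l(mu; x) >= l(mu_hat; x) - k.
    return log_lik(mu) >= log_lik(mu_hat) - k

# A 2-likelihood test of the hypothesis mu = 5.0: accept iff 5.0 lies in R_2.
print(lo, hi, in_region(5.0))
```

Both routes agree: points just inside the closed-form endpoints satisfy the defining inequality, and points just outside do not, which is a useful check whenever a closed form for $R_k$ is available.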