Petter Mostad, Mathematical Statistics, Chalmers and GU

Solutions to MVE055/MSG810 Mathematical statistics and discrete mathematics (Matematisk statistik och diskret matematik). Re-exam: 4 August, 14:00-18:00.

1. (a) One can use the hypergeometric distribution: if the batch contains $N$ items of which $D$ are defective, the probability that a sample of two items contains no defective item is
$$\frac{\binom{D}{0}\binom{N-D}{2}}{\binom{N}{2}}.$$
Alternatively, the probability can be computed by using that it is the same as sequentially choosing two non-defective items, i.e.,
$$\frac{N-D}{N}\cdot\frac{N-D-1}{N-1}.$$

(b) One can use the hypergeometric distribution repeatedly. Alternatively, one can compute the probability of the particular sequence of events described in the question directly: it is the product of the conditional probabilities of the successive draws, each given the outcomes of the previous ones.

2. A confidence interval for a proportion $p$ with significance level $\alpha$ is
$$\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$$
where $\hat{p}$ is the observed frequency, $n$ is the sample size, and $z_{\alpha/2}$ is the corresponding quantile of the standard normal distribution. In our case $z_{\alpha/2} = z_{0.025} = 1.96$, so inserting the observed $\hat{p}$ and $n$ into the formula above gives the interval.

3. (a) A 95% confidence interval:
$$[\bar{x} - z_{0.025}\,\sigma/\sqrt{7},\; \bar{x} + z_{0.025}\,\sigma/\sqrt{7}] = \bar{x} \pm 1.96\,\sigma/\sqrt{7}$$
where $\bar{x} = 7.4$ is the mean of the $n = 7$ observations and $\sigma$ is the known standard deviation,
and a 99% confidence interval:
$$[\bar{x} - z_{0.005}\,\sigma/\sqrt{7},\; \bar{x} + z_{0.005}\,\sigma/\sqrt{7}] = \bar{x} \pm 2.58\,\sigma/\sqrt{7}.$$

(b) When $\sigma$ is unknown and replaced by the sample standard deviation $s$, a 95% confidence interval is
$$[\bar{x} - t_{6,0.025}\,s/\sqrt{7},\; \bar{x} + t_{6,0.025}\,s/\sqrt{7}] = \bar{x} \pm 2.45\,s/\sqrt{7}$$
where $t_{6,0.025} \approx 2.45$ is the quantile of the t distribution with $n - 1 = 6$ degrees of freedom.

4. (a) The distribution after one step can be computed by multiplying the row vector of initial probabilities with the transition matrix: $\pi^{(1)} = \pi^{(0)} P$.

(b) Similarly, the distribution after $t$ steps is $\pi^{(t)} = \pi^{(0)} P^t$, i.e.,
$$\Pr(X_t = j) = \sum_i \Pr(X_0 = i)\Pr(X_t = j \mid X_0 = i).$$

(c) The chain is neither ergodic nor absorbing.

(d) We get, by Bayes' rule,
$$\Pr(X_0 = i \mid X_t = j) = \frac{\Pr(X_t = j \mid X_0 = i)\Pr(X_0 = i)}{\Pr(X_t = j)}.$$

5. The likelihood for $\lambda$ given data $x_1, \dots, x_n$ is
$$L(\lambda) = \frac{e^{-\lambda}\lambda^{x_1}}{x_1!}\cdots\frac{e^{-\lambda}\lambda^{x_n}}{x_n!} = e^{-n\lambda}\,\frac{\lambda^{x_1 + \cdots + x_n}}{x_1!\cdots x_n!} = C\exp(-n\lambda)\lambda^S$$
where $C = 1/(x_1!\cdots x_n!)$ and $S = x_1 + \cdots + x_n$. So the derivative of the likelihood is
$$L'(\lambda) = C\left(-n\exp(-n\lambda)\lambda^S + S\exp(-n\lambda)\lambda^{S-1}\right) = C\exp(-n\lambda)\lambda^{S-1}(S - n\lambda).$$
Setting $L'(\lambda) = 0$ gives $\lambda = S/n$, so the maximum likelihood estimator for $\lambda$ is $S/n$. We get that
$$E\left(\frac{S}{n}\right) = \frac{1}{n}E(S) = \frac{1}{n}\sum_{i=1}^n E(x_i) = \frac{1}{n}\cdot n\lambda = \lambda$$
so the ML estimator is unbiased.
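The ML estimator $S/n$ is simply the sample mean, so it is easy to check numerically. A minimal sketch in numpy, using a hypothetical true rate:

```python
import numpy as np

rng = np.random.default_rng(0)
lam_true = 3.5                       # hypothetical true Poisson rate
x = rng.poisson(lam_true, size=100_000)

# Maximum likelihood estimate for a Poisson rate: the sample mean S/n
lam_hat = x.sum() / x.size

# With this many observations the estimate is close to the true rate
assert abs(lam_hat - lam_true) < 0.05
```

The unbiasedness shown above also explains why no correction factor is needed here, unlike for the sample variance.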
6. (a) We have that
$$\int_{-\infty}^{\infty} f(x)\,dx = \int_{\theta-1}^{\theta} a(1 + x - \theta)\,dx + \int_{\theta}^{\theta+1} a(1 - x + \theta)\,dx = \frac{a}{2} + \frac{a}{2} = a$$
which shows that $a = 1$. Alternatively, we can see that the density function looks like two triangles, with the same height $a$ and with total length of base equal to 2. Thus their area is $\frac{1}{2}\cdot 2\cdot a = a$, and for this area to be 1, we need to have $a = 1$.

(b) We have
$$E(X) = \int_{\theta-1}^{\theta} x\,a(1 + x - \theta)\,dx + \int_{\theta}^{\theta+1} x\,a(1 - x + \theta)\,dx = \frac{a(3\theta - 1)}{6} + \frac{a(3\theta + 1)}{6} = a\theta = \theta.$$

7. For $X$ uniformly distributed on $(-\pi, \pi)$ we get
$$E(\cos X) = \int_{-\pi}^{\pi} \frac{\cos x}{2\pi}\,dx = \frac{1}{2\pi}\left[\sin x\right]_{-\pi}^{\pi} = 0$$
$$\mathrm{Var}(\cos X) = E(\cos^2 X) - E(\cos X)^2 = \int_{-\pi}^{\pi} \frac{\cos^2 x}{2\pi}\,dx = \frac{1}{2\pi}\int_{-\pi}^{\pi} \frac{1 + \cos(2x)}{2}\,dx = \frac{1}{2\pi}\left[\frac{x}{2} + \frac{1}{4}\sin(2x)\right]_{-\pi}^{\pi} = \frac{1}{2}$$
$$E(\sin X) = \int_{-\pi}^{\pi} \frac{\sin x}{2\pi}\,dx = \frac{1}{2\pi}\left[-\cos x\right]_{-\pi}^{\pi} = 0$$
$$\mathrm{Var}(\sin X) = E(\sin^2 X) - E(\sin X)^2 = \int_{-\pi}^{\pi} \frac{\sin^2 x}{2\pi}\,dx = \frac{1}{2\pi}\int_{-\pi}^{\pi} \frac{1 - \cos(2x)}{2}\,dx = \frac{1}{2\pi}\left[\frac{x}{2} - \frac{1}{4}\sin(2x)\right]_{-\pi}^{\pi} = \frac{1}{2}$$

8. (a) When $x$ increases, $x/(1 + x^2)$ goes to zero at the rate of $1/x$. Specifically, we prove that, for $x > 1$,
$$\frac{x}{1 + x^2} > \frac{1}{2x}$$
by noting that $2x^2 > 1 + x^2$ when $x > 1$. Thus we get
$$\int_1^{\infty} \frac{x}{\pi(1 + x^2)}\,dx = \frac{1}{\pi}\int_1^{\infty} \frac{x}{1 + x^2}\,dx > \frac{1}{2\pi}\int_1^{\infty} \frac{dx}{x} = \frac{1}{2\pi}\left[\log(x)\right]_1^{\infty} = \infty$$
so the expectation does not exist. (b) If we had started with a Beta distribution or a uniform distribution, the histograms would have been bell-shaped, i.e., they would look like samples from normal distributions. The reason is the Central Limit Theorem. However, for the Central Limit Theorem to apply, the distribution we start with must have a finite expectation and a finite variance. As shown in (a), the Cauchy distribution has no expectation, so the CLT does not apply to it.
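The contrast can be seen in a quick simulation; a sketch in numpy with arbitrary sample sizes. Means of uniform samples concentrate tightly around the true mean, while means of Cauchy samples keep producing extreme values no matter how large the samples are:

```python
import numpy as np

rng = np.random.default_rng(1)
reps, n = 2000, 1000

# Sample means from a uniform(0, 1) distribution: the CLT applies
uniform_means = rng.uniform(0, 1, size=(reps, n)).mean(axis=1)

# Sample means from a standard Cauchy distribution: no expectation, CLT fails
cauchy_means = rng.standard_cauchy(size=(reps, n)).mean(axis=1)

# Uniform means cluster tightly around 0.5 ...
assert np.std(uniform_means) < 0.05
# ... while Cauchy sample means still have Cauchy-sized outliers
assert np.abs(cauchy_means).max() > 10
```

In fact, the mean of $n$ independent standard Cauchy variables is itself standard Cauchy, which is why the histogram of `cauchy_means` never narrows.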
9. (a) $P(X > 0)$ is obtained by adding the probabilities of the positive values of $X$.
(b) Since $X$ and $Y$ are independent, $P(XY > 0) = P(X > 0 \text{ and } Y > 0) = P(X > 0)P(Y > 0)$.
(c) $P(X + Y = k)$ is obtained by summing $P(X = i \text{ and } Y = j) = P(X = i)P(Y = j)$ over all pairs of values $(i, j)$ with $i + j = k$, again using independence.
(d) $E(X) = \sum_i i\cdot P(X = i)$.
(e) $E(XY) = E(X)E(Y) = 0$, as $X$ and $Y$ are independent and $E(Y) = 0$.
(f) $\mathrm{Var}(Y) = E(Y^2) - E(Y)^2 = E(Y^2)$, since $E(Y) = 0$.

10. Assume $X_1, X_2, \dots, X_n$ is a random sample from some probability distribution, and assume this distribution is from a family of distributions parametrized by parameters $\theta_1, \dots, \theta_k$. The purpose of the Method of Moments is to construct functions of the random sample that can work as estimators for the parameters $\theta_1, \dots, \theta_k$. The idea is the following: If $M_1, \dots, M_s$ denote the first $s$ moments of a distribution in the parametric family, then these depend on the parameters $\theta_1, \dots, \theta_k$, and one can obtain formulas relating $M_1, \dots, M_s$ to $\theta_1, \dots, \theta_k$. In these formulas, one may make the replacement
$$M_j = \frac{1}{n}\sum_{i=1}^n X_i^j$$
and solve for the parameters $\theta_1, \dots, \theta_k$ in order to obtain estimators.

As an example, consider the Negative Binomial distribution with parameters $r$ and $p$, where $r$ is a positive integer and $p \in (0, 1)$. The expressions for its expectation and variance give us
$$M_1 = r/p, \qquad M_2 - M_1^2 = r(1 - p)/p^2.$$
Solving for the parameters and making the substitutions, one obtains formulas, for example of the form
$$p = \frac{M_1}{M_2 - M_1^2 + M_1} = \frac{\bar{X}}{\frac{1}{n}\sum_i X_i^2 - \bar{X}^2 + \bar{X}}$$
$$r = \frac{M_1^2}{M_2 - M_1^2 + M_1} = \frac{\bar{X}^2}{\frac{1}{n}\sum_i X_i^2 - \bar{X}^2 + \bar{X}}$$
with the restriction that $r$ must be an integer.
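These moment estimators are easy to try out numerically. A sketch in numpy with hypothetical parameter values; note that numpy's `negative_binomial` counts failures before the $r$-th success, so $r$ is added to obtain the total number of trials, which has mean $r/p$:

```python
import numpy as np

rng = np.random.default_rng(2)
r_true, p_true = 5, 0.5              # hypothetical parameters

# numpy draws the number of failures before r successes;
# adding r gives the total number of trials, with mean r/p
x = rng.negative_binomial(r_true, p_true, size=200_000) + r_true

m1 = x.mean()                        # first sample moment
m2 = (x ** 2).mean()                 # second sample moment

# Method of moments estimators from the formulas above
p_hat = m1 / (m2 - m1 ** 2 + m1)
r_hat = m1 ** 2 / (m2 - m1 ** 2 + m1)

assert abs(p_hat - p_true) < 0.02
assert abs(r_hat - r_true) < 0.2
```

In practice one would round `r_hat` to the nearest positive integer, in line with the restriction stated above.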
11. The statement of the Central Limit Theorem (CLT) given in Milton and Arnold is: Let $X_1, X_2, \dots, X_n$ be a random sample of size $n$ from a distribution with mean $\mu$ and variance $\sigma^2$. Then, for large $n$, $\bar{X}$ is approximately normal with mean $\mu$ and variance $\sigma^2/n$. Furthermore, for large $n$, the random variable $(\bar{X} - \mu)/(\sigma/\sqrt{n})$ is approximately standard normal.

There are many practical effects of the CLT. One very fundamental effect is that many variables measured in practice will tend to have a normal distribution, as their values can be modelled as the sum of many small variables that are more or less independent. (Example: the weight of a bag of chips with a given nominal weight in grams.)

12. (a) $f(x) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right)$.

(b)
$$m(t) = E(e^{tX}) = \int_{-\infty}^{\infty} e^{tx}\,\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right)dx = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\exp\left(-\frac{x^2 - 2tx}{2}\right)dx$$
$$= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\exp\left(-\frac{x^2 - 2tx + t^2}{2} + \frac{t^2}{2}\right)dx = e^{t^2/2}\,\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\exp\left(-\frac{(x - t)^2}{2}\right)dx = e^{t^2/2}$$

(c)
$$m'(t) = te^{t^2/2}$$
$$m''(t) = e^{t^2/2} + t^2e^{t^2/2} = (1 + t^2)e^{t^2/2}$$
$$m'''(t) = 2te^{t^2/2} + (1 + t^2)te^{t^2/2} = (3t + t^3)e^{t^2/2}$$
yields $M_1 = m'(0) = 0$, $M_2 = m''(0) = 1$, $M_3 = m'''(0) = 0$.

13. (a) A confidence interval for $\mu$ is given by
$$\bar{x} \pm z_{0.025}\,\sigma/\sqrt{n} = 6.89 \pm 1.96\,\sigma/\sqrt{9},$$
where $\bar{x} = 6.89$ is the mean of the $n = 9$ observations and $\sigma$ is the known standard deviation; inserting $\sigma$ gives the numerical interval.
(b) The length of the interval will be $2z_{0.025}\,\sigma/\sqrt{n}$, so it shrinks in proportion to $1/\sqrt{n}$ as the sample size grows.

(c) We solve for the sample size $n$ that makes the margin of error of the 99% interval at most a required bound $B$:
$$z_{0.005}\,\sigma/\sqrt{n} \le B \quad\Longleftrightarrow\quad n \ge \left(\frac{2.58\,\sigma}{B}\right)^2$$
so one should sample, in total, at least the smallest integer number of values satisfying this bound.

(d) The sample standard deviation $s$ is computed from the numbers. The 95% confidence interval becomes
$$\bar{x} \pm t_{8,0.025}\,s/\sqrt{9} = 6.89 \pm 2.306\,s/3,$$
where $t_{8,0.025} \approx 2.306$ is the quantile of the t distribution with $n - 1 = 8$ degrees of freedom.

14. (a) $E(X_i) = 1\cdot p + 0\cdot(1 - p) = p$ and $\mathrm{Var}(X_i) = E(X_i^2) - E(X_i)^2 = p - p^2 = p(1 - p)$.

(b) $E(\bar{X}) = \frac{1}{n}\sum_{i=1}^n E(X_i) = p$ and $\mathrm{Var}(\bar{X}) = \frac{1}{n^2}\sum_{i=1}^n \mathrm{Var}(X_i) = p(1 - p)/n$.

(c) By the central limit theorem, $\bar{X}$ has an approximately normal distribution, and by the above it is then approximately distributed as a normal distribution with expectation $p$ and variance $p(1 - p)/n$.

(d) From the above, we get that
$$P\left(p - z_{\alpha/2}\sqrt{p(1 - p)/n} \le \bar{x} \le p + z_{\alpha/2}\sqrt{p(1 - p)/n}\right) \approx 1 - \alpha.$$
As $p \approx \bar{x}$, we substitute some of the $p$'s with $\bar{x}$ and get
$$P\left(p - z_{\alpha/2}\sqrt{\bar{x}(1 - \bar{x})/n} \le \bar{x} \le p + z_{\alpha/2}\sqrt{\bar{x}(1 - \bar{x})/n}\right) \approx 1 - \alpha,$$
and thus
$$P\left(\bar{x} - z_{\alpha/2}\sqrt{\bar{x}(1 - \bar{x})/n} \le p \le \bar{x} + z_{\alpha/2}\sqrt{\bar{x}(1 - \bar{x})/n}\right) \approx 1 - \alpha.$$
In particular, for $\alpha = 0.05$ we get $z_{\alpha/2} = 1.96$, so
$$\bar{x} \pm 1.96\sqrt{\bar{x}(1 - \bar{x})/n}$$
is a confidence interval with confidence degree 95%.
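The approximate 95% coverage of this interval can be checked by simulation; a sketch in numpy, with hypothetical values $p = 0.4$ and $n = 100$:

```python
import numpy as np

rng = np.random.default_rng(3)
p, n, reps = 0.4, 100, 4000

# Each replicate is the sample mean of n Bernoulli(p) trials
xbar = rng.binomial(n, p, size=reps) / n

# 95% interval half-width from the derivation above
half = 1.96 * np.sqrt(xbar * (1 - xbar) / n)
covered = (xbar - half <= p) & (p <= xbar + half)

coverage = covered.mean()
# Coverage should be close to the nominal 95% (the CLT approximation
# makes it slightly off for moderate n)
assert 0.90 < coverage < 0.98
```

Since the interval relies on the normal approximation and on substituting $\bar{x}$ for $p$, the realized coverage is only approximately 95%, which the simulation reflects.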
15. (a) A simple description of simple linear regression is that one tries to fit a straight line to a set of data points in the plane. More precisely, the best-fitting line is taken to be the line minimizing the sum of the squares of the vertical distances between the points and the line. Such a line represents the least squares solution.

(b) The definition of $S$ shows that it is the sum of the residuals $y_i - \hat{y}_i$ of the regression. One may remember that this sum is always zero. However, one may also show directly that $S = 0$: Assume $S$ is not zero. Then there exists an $\epsilon \neq 0$ such that $\sum_i (y_i - \hat{y}_i) = n\epsilon$. But then
$$\sum_{i=1}^n (y_i - \hat{y}_i)^2 = \sum_{i=1}^n \left((y_i - \hat{y}_i - \epsilon) + \epsilon\right)^2 = \sum_{i=1}^n \left[(y_i - \hat{y}_i - \epsilon)^2 + 2\epsilon(y_i - \hat{y}_i - \epsilon) + \epsilon^2\right]$$
$$= \sum_{i=1}^n (y_i - \hat{y}_i - \epsilon)^2 + 2\epsilon(n\epsilon - n\epsilon) + n\epsilon^2 = \sum_{i=1}^n (y_i - \hat{y}_i - \epsilon)^2 + n\epsilon^2 > \sum_{i=1}^n (y_i - \hat{y}_i - \epsilon)^2.$$
Thus the line going through the points $(x_1, \hat{y}_1 + \epsilon), (x_2, \hat{y}_2 + \epsilon), \dots, (x_n, \hat{y}_n + \epsilon)$ has a smaller sum of squares than the original regression line. This is a contradiction, proving that $S$ is indeed zero.

16. $P_1$ is a square matrix of non-negative numbers with rows summing to 1, so it is a transition matrix for a Markov chain. It is not absorbing, as it does not have any absorbing states. It is ergodic, and also regular, as it has only positive entries. $P_2$ has some negative values, so it is not a transition matrix. $P_3$ is a square matrix of non-negative numbers with rows summing to 1, so it is a transition matrix. It is an absorbing chain, as it has an absorbing state and all states have a positive probability of ending up in the absorbing state. It is not ergodic and not regular. $P_4$ is a square matrix of non-negative numbers with rows summing to 1, so it is a transition matrix. It has an absorbing state, but it is not an absorbing chain, as the other states have zero probability of entering the absorbing state. It is not ergodic, and not regular. $P_5$ is a square matrix of non-negative numbers with rows summing to 1, so it is a transition matrix. It is not absorbing. It is, however, ergodic.
It is also regular, as one can go from any state to any other state in at most two steps. $P_6$ does not have rows summing to 1, so it is not a transition matrix of a Markov chain.
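Checks of this kind can be automated. A sketch in numpy; the matrix below is a hypothetical regular chain, not one of the exam's matrices:

```python
import numpy as np

def is_transition_matrix(P):
    """Square, non-negative, with every row summing to 1."""
    P = np.asarray(P, dtype=float)
    return (P.ndim == 2 and P.shape[0] == P.shape[1]
            and (P >= 0).all() and np.allclose(P.sum(axis=1), 1.0))

def is_regular(P):
    """Regular: some power of P has only strictly positive entries."""
    P = np.asarray(P, dtype=float)
    k = (P.shape[0] - 1) ** 2 + 1    # a sufficient number of powers to try
    Q = np.eye(P.shape[0])
    for _ in range(k):
        Q = Q @ P
        if (Q > 0).all():
            return True
    return False

# Hypothetical example: P itself has a zero entry,
# but P squared has only positive entries, so the chain is regular
P = np.array([[0.0, 1.0],
              [0.5, 0.5]])
assert is_transition_matrix(P)
assert is_regular(P)
```

This mirrors the reasoning used for $P_5$ above: the matrix need not be all positive itself, as long as some power of it is.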