Information in Data. Sufficiency, Ancillarity, Minimality, and Completeness


1 Information in Data Sufficiency, Ancillarity, Minimality, and Completeness These are important properties of statistics that determine their usefulness in statistical inference. These general properties can often be used as guides in seeking optimal statistical procedures.

2 Approaches to Statistical Inference
- use of the empirical cumulative distribution function (ECDF), for example, the method of moments;
- use of a likelihood function, for example, maximum likelihood;
- fitting expected values, for example, least squares;
- fitting a probability distribution, for example, maximum entropy;
- definition and use of a loss function, for example, uniform minimum variance unbiased estimation.

3 The Likelihood Function The domain of the likelihood function is some class of distributions specified by their probability densities, $\mathcal{P} = \{p_i(x)\}$, where all PDFs are with respect to a common σ-finite measure. In applications, the PDFs are often of a common parametric form, so equivalently we can think of the domain of the likelihood function as being a parameter space, say $\Theta$, and the family of densities can be written as $\mathcal{P} = \{p_\theta(x) : \theta \in \Theta\}$, where $\Theta$ is the known parameter space. We often write the likelihood function as
$L(\theta ; x) = \prod_{i=1}^n p(x_i ; \theta).$
Although we have written $L(\theta ; x)$, the expression $L(p_\theta ; x)$ may be more appropriate because it reminds us of an essential ingredient in the likelihood, namely a PDF.

4 [Figure: the PDF $p_X(x;\theta)$ plotted as a function of $x$ for fixed values of $\theta$ (e.g., $\theta = 1$ and $\theta = 5$), and the likelihood $L(\theta ; x)$ plotted as a function of $\theta$ for fixed observed values of $x$.]

5 What Likelihood Is Not A likelihood is neither a probability nor a probability density. Notice, for example, that for a single observation $x$ from the exponential distribution with PDF $p(x;\theta) = \theta e^{-\theta x}$, the definite integral of the likelihood over $\theta \in \mathrm{IR}_+$ is $\int_0^\infty \theta e^{-\theta x}\,d\theta = 1/x^2$; that is, the likelihood does not integrate to 1. It is not appropriate to refer to the likelihood of an observation. We use the term likelihood in the sense of the likelihood of a model or the likelihood of a distribution, given observations.

6 Likelihood Principle The likelihood principle in statistical inference asserts that all of the information that the data provide concerning the relative merits of two hypotheses (two possible distributions that give rise to the data) is contained in the likelihood ratio of those hypotheses given the data. An alternative statement of the likelihood principle is that if, for x and y,
$L(\theta ; x) = c(x, y)\, L(\theta ; y) \quad \forall\, \theta,$
where c(x, y) is constant for given x and y, then any inference about θ based on x should be in agreement with any inference about θ based on y.

7 Example 1 The likelihood principle in sampling from a Bernoulli distribution Consider the problem of making inferences on the parameter π in a family of Bernoulli distributions. One approach is to take a random sample of size n, $X_1, \ldots, X_n$, from the Bernoulli(π) distribution, and then use $T = \sum X_i$, which has a binomial distribution with parameters n and π. Another approach is to take a sequential sample, $X_1, X_2, \ldots$, until a fixed number t of 1's have occurred. The size of the sample, N, is then random and has a negative binomial distribution with parameters t and π.

8 Now, suppose we take the first approach with n = 10 and we observe t = 5; and then we take the second approach with t = 5 and we observe n = 10. We get the likelihoods
$L_B(\pi) = \binom{10}{5}\pi^5(1-\pi)^5$
and
$L_{NB}(\pi) = \binom{9}{4}\pi^5(1-\pi)^5.$
Because $L_B(\pi)/L_{NB}(\pi)$ does not involve π, in accordance with the likelihood principle, any decision about π based on a binomial observation of 5 out of 10 should be the same as any decision about π based on a negative binomial observation of 10 trials for 5 1's. This seems reasonable. We see that the likelihood principle allows the likelihood function to be defined as any member of an equivalence class $\{cL : c \in \mathrm{IR}_+\}$.
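As a quick numerical illustration (a sketch, not part of the original notes; the values n = 10 and t = 5 come from the example above), the two likelihoods can be evaluated on a grid of π values to confirm that their ratio is free of π:

```python
# Sketch: compare the binomial and negative binomial likelihoods
# for t = 5 successes observed in n = 10 Bernoulli trials.
from math import comb

import numpy as np

n, t = 10, 5
pi_grid = np.linspace(0.05, 0.95, 19)

L_B = comb(n, t) * pi_grid**t * (1 - pi_grid)**(n - t)          # binomial likelihood
L_NB = comb(n - 1, t - 1) * pi_grid**t * (1 - pi_grid)**(n - t)  # negative binomial likelihood

# The ratio is constant in pi, as the likelihood principle requires.
print(np.allclose(L_B / L_NB, comb(n, t) / comb(n - 1, t - 1)))  # True
```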

9 MLE Let $L(\theta ; x)$ be the likelihood of $\theta \in \Theta$ for the observations x from a distribution with PDF with respect to a σ-finite measure ν. A maximum likelihood estimate, or MLE, of θ, written $\hat\theta$, is defined as
$\hat\theta = \arg\max_{\theta \in \Theta} L(\theta ; x),$
if it exists. There may be more than one solution; any one of them is an MLE.

10 Example 2 MLE in a linear model Let $Y = X\beta + E$, where Y and E are n-vectors with E(E) = 0 and $V(E) = \sigma^2 I_n$, X is an $n \times p$ matrix whose rows are the $x_i^T$, and β is the p-vector parameter. A least squares estimator of β is
$b = \arg\min_{b \in B} \|Y - Xb\|^2 = (X^T X)^- X^T Y.$
Even if X is not of full rank, in which case the least squares estimator is not unique, we found that the least squares estimator has certain optimal properties for estimable functions. Of course, at this point we could not use maximum likelihood: we do not have a distribution.

11 Then we considered the model with the additional assumption that
$E \sim N_n(0, \sigma^2 I_n),$ or equivalently $Y \sim N_n(X\beta, \sigma^2 I_n).$
In that case, we found that the least squares estimator yields the unique UMVUE for any estimable function of β and for σ². We could define a least squares estimator without an assumption on the distribution of Y or E, but for an MLE we need an assumption on the distribution. Again, let us assume $Y \sim N_n(X\beta, \sigma^2 I_n)$, yielding, for an observed y, the log-likelihood
$l_L(\beta, \sigma^2 ; y, X) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}(y - X\beta)^T(y - X\beta).$
So an MLE of β is the same as a least squares estimator of β.

12 Estimation of σ², however, is different. In the case of least squares estimation, with no specific assumptions about the distribution of E in the model, we have no basis for forming an objective function of squares to minimize for σ². Even with an assumption of normality, instead of explicitly forming a least squares problem for estimating σ², we used the least squares estimator of β, which is complete and sufficient, and we used the distribution of $(Y - Xb)^T(Y - Xb)$ to form the UMVUE of σ²,
$s^2 = (Y - Xb)^T(Y - Xb)/(n - r),$
where r = rank(X).

13 In the case of maximum likelihood, we directly determine the value of σ² that maximizes the expression. This is an easy optimization problem. The solution is
$\hat\sigma^2 = (y - X\hat\beta)^T(y - X\hat\beta)/n,$
where $\hat\beta = b$ is an MLE of β. Compare the MLE of σ² with the least squares estimator $s^2$, and note that the MLE is biased. Recall that we have encountered these two estimators in simpler cases.
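A minimal numerical sketch of this comparison, using simulated data and assuming a full-rank X (the variable names and simulated values are illustrative only):

```python
# Sketch: in a normal linear model the least squares fit gives both
# the UMVUE s^2 and the (biased) MLE of sigma^2.
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma = 50, 3, 2.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + rng.normal(scale=sigma, size=n)

b = np.linalg.pinv(X.T @ X) @ X.T @ y      # least squares estimate = MLE of beta
resid = y - X @ b
r = np.linalg.matrix_rank(X)

s2_umvue = resid @ resid / (n - r)         # unbiased (UMVUE under normality)
sigma2_mle = resid @ resid / n             # MLE; biased downward by factor (n - r)/n
print(s2_umvue, sigma2_mle)
```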

14 Example 3 One-way random-effects AOV model Consider the linear model
$Y_{ij} = \mu + \delta_i + \epsilon_{ij}, \quad i = 1,\ldots,m,\; j = 1,\ldots,n,$
where the $\delta_i$ are identically distributed with $E(\delta_i) = 0$, $V(\delta_i) = \sigma^2_\delta$, and $Cov(\delta_i, \delta_{\tilde i}) = 0$ for $i \ne \tilde i$, and the $\epsilon_{ij}$ are independent of the $\delta_i$ and are identically distributed with $E(\epsilon_{ij}) = 0$, $V(\epsilon_{ij}) = \sigma^2_\epsilon$, and $Cov(\epsilon_{ij}, \epsilon_{\tilde i \tilde j}) = 0$ for either $i \ne \tilde i$ or $j \ne \tilde j$. A model such as this may be appropriate when there is a large number of possible treatments and m of them are chosen randomly and applied to experimental units whose responses $Y_{ij}$ are observed. While in the fixed-effects model we are interested in whether $\alpha_1 = \cdots = \alpha_m = 0$, in the random-effects model we are interested in whether $\sigma^2_\delta = 0$, which would lead to a similar practical decision about the treatments.

15 In the random-effects model the variance of each $Y_{ij}$ is $\sigma^2_\delta + \sigma^2_\epsilon$, and our interest in using the model is to make inference on the relative sizes of the components of the variance, $\sigma^2_\delta$ and $\sigma^2_\epsilon$. The model is sometimes called a variance components model. To estimate the variance components, we first observe that
$Cov(Y_{ij}, Y_{\tilde i \tilde j}) = \begin{cases} \sigma^2_\delta + \sigma^2_\epsilon & \text{for } i = \tilde i,\ j = \tilde j,\\ \sigma^2_\delta & \text{for } i = \tilde i,\ j \ne \tilde j,\\ 0 & \text{for } i \ne \tilde i.\end{cases}$

16 To continue with the analysis, we follow the same steps as in the fixed-effects case and get the same decomposition of the adjusted total sum of squares:
$\sum_{i=1}^m \sum_{j=1}^n (Y_{ij} - \bar Y)^2 = SSA + SSE.$
Now, to make inferences about $\sigma^2_\delta$, we add to our distributional assumptions, which at this point involve only means, variances, covariances, and independence of the $\delta_i$ and $\epsilon_{ij}$. Let us suppose now that $\delta_i \overset{iid}{\sim} N(0, \sigma^2_\delta)$ and $\epsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2_\epsilon)$. Again, we get chi-squared distributions, but the distribution involving SSA is not the same as for the fixed-effects model.

17 Forming MSA = SSA/(m − 1) and MSE = SSE/(m(n − 1)), we see that
$E(MSA) = n\sigma^2_\delta + \sigma^2_\epsilon \quad \text{and} \quad E(MSE) = \sigma^2_\epsilon.$
Unbiased estimators of $\sigma^2_\delta$ and $\sigma^2_\epsilon$ are therefore
$s^2_\delta = (MSA - MSE)/n \quad \text{and} \quad s^2_\epsilon = MSE,$
and we can also see that these are UMVUEs.

18 At this point, we note something that might at first glance be surprising: $s^2_\delta$ may be negative. This occurs if MSA < MSE. (This will be the case if the variation among the $Y_{ij}$ for a fixed i is relatively large compared to the variation among the $\bar Y_{i\cdot}$.) Now, we consider the MLEs of $\sigma^2_\delta$ and $\sigma^2_\epsilon$. In the model with the assumptions of normality and independence, it is relatively easy to write the log-likelihood,
$l_L(\mu, \sigma^2_\delta, \sigma^2_\epsilon ; y) = -\frac{1}{2}\Big( mn\log(2\pi) + m(n-1)\log(\sigma^2_\epsilon) + m\log(\sigma^2_\epsilon + n\sigma^2_\delta) + SSE/\sigma^2_\epsilon + SSA/(\sigma^2_\epsilon + n\sigma^2_\delta) + mn(\bar y - \mu)^2/(\sigma^2_\epsilon + n\sigma^2_\delta)\Big).$

19 The MLEs must be in the closure of the parameter space, which for $\sigma^2_\delta$ and $\sigma^2_\epsilon$ is $\mathrm{IR}_+ \cup \{0\}$. From this we have the MLEs:
if $(m-1)MSA/m \ge MSE$,
$\hat\sigma^2_\delta = \frac{1}{n}\Big(\frac{m-1}{m}MSA - MSE\Big), \qquad \hat\sigma^2_\epsilon = MSE;$
if $(m-1)MSA/m < MSE$,
$\hat\sigma^2_\delta = 0, \qquad \hat\sigma^2_\epsilon = \frac{m-1}{m}MSE.$
This was easy because the model was balanced. If, instead, we had $j = 1,\ldots,n_i$, it would be much harder.
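The following sketch, on simulated balanced data, computes the ANOVA estimators and then applies the case split above for the MLEs; it simply transcribes the formulas on this slide, and the values of m, n, and the variance components used to generate the data are arbitrary:

```python
# Sketch: variance-component estimators in a balanced one-way
# random-effects model with simulated data.
import numpy as np

rng = np.random.default_rng(1)
m, n = 6, 5
sigma2_delta, sigma2_eps = 1.0, 2.0
delta = rng.normal(scale=np.sqrt(sigma2_delta), size=m)
y = delta[:, None] + rng.normal(scale=np.sqrt(sigma2_eps), size=(m, n))

ybar_i = y.mean(axis=1)
ybar = y.mean()
SSA = n * np.sum((ybar_i - ybar) ** 2)
SSE = np.sum((y - ybar_i[:, None]) ** 2)
MSA, MSE = SSA / (m - 1), SSE / (m * (n - 1))

s2_delta = (MSA - MSE) / n          # unbiased estimator; may be negative
s2_eps = MSE

if (m - 1) * MSA / m >= MSE:        # MLE, following the case split above
    mle_delta = ((m - 1) * MSA / m - MSE) / n
    mle_eps = MSE
else:
    mle_delta = 0.0
    mle_eps = (m - 1) * MSE / m
print(s2_delta, s2_eps, mle_delta, mle_eps)
```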

20 Any approach to estimation may occasionally yield very poor or meaningless estimators. In addition to the possibly negative UMVUEs for variance components, the UMVUE of $g(\theta) = e^{-3\theta}$ in a Poisson distribution is not a very good estimator. While in some cases the MLE is more reasonable, in other cases the MLE may be very poor. An example of a likelihood function that is not very useful without some modification arises in nonparametric probability density estimation. Suppose we assume that a sample comes from a distribution with continuous PDF p(x). The likelihood is $\prod_{i=1}^n p(x_i)$. Even under the assumption of continuity, this likelihood is unbounded over the class of continuous densities, so there is no solution.

21 C. R. Rao cites another example in which the likelihood function is not very meaningful. Consider an urn containing N balls labeled 1, ..., N and also labeled with distinct real numbers $\theta_1, \ldots, \theta_N$ (with N known). For a sample without replacement of size n < N in which we observe $(x_i, y_i) = (\text{label}, \theta_{\text{label}})$, what is the likelihood function? It is either 0, if the label and $\theta_{\text{label}}$ for at least one observation are inconsistent, or
$\binom{N}{n}^{-1}$
otherwise; and, of course, we don't know which! This likelihood function is not informative, and could not be used, for example, for estimating $\theta = \theta_1 + \cdots + \theta_N$. (There is a pretty good estimator of θ; it is $N(\sum y_i)/n$.)

22 Finding an MLE Notice that the problem of obtaining an MLE is a constrained optimization problem: an objective function is to be optimized subject to the constraint that the solution lie within the closure of the parameter space. In some cases the MLE occurs at a stationary point, which can be identified by differentiation. That is not always the case, however. A standard example in which the MLE does not occur at a stationary point is a distribution in which the range depends on the parameter; the simplest such distribution is the uniform U(0, θ).
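A small sketch of the U(0, θ) case (simulated data; the grid is purely for illustration): the log-likelihood $-n\log\theta$ is strictly decreasing on $[x_{(n)}, \infty)$, so its maximizer is the sample maximum rather than a stationary point.

```python
# Sketch: for U(0, theta) the likelihood theta^(-n) on [x_(n), inf) is
# strictly decreasing, so its maximum is at the sample maximum.
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 3.0, size=20)
theta_grid = np.linspace(x.max(), 5.0, 400)
loglik = -len(x) * np.log(theta_grid)          # valid only for theta >= x_(n)
print(x.max(), theta_grid[np.argmax(loglik)])  # both equal the sample maximum
```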

23 Derivatives of the Likelihood and Log-Likelihood If the likelihood function, and consequently the log-likelihood, is twice differentiable within Θ, finding an MLE may be relatively easy. The derivatives of the log-likelihood function relate directly to useful concepts in statistical inference. If it exists, the derivative of the log-likelihood is the relative rate of change, with respect to the parameter placeholder θ, of the probability density function at a fixed observation. If θ is a scalar, some positive function of the derivative, such as its square or its absolute value, is obviously a measure of the effect of a change in the parameter, or of a change in the estimate of the parameter.

24 More generally, an outer product of the gradient of the log-likelihood with itself is a useful measure of the changes in the components of the parameter:
$\nabla l_L(\theta ; x)\,\big(\nabla l_L(\theta ; x)\big)^T.$
Notice that the expected value of this quantity with respect to the probability density of the random variable X,
$I(\theta_1 ; X) = E_{\theta_1}\Big(\nabla l_L(\theta_1 ; X)\,\big(\nabla l_L(\theta_1 ; X)\big)^T\Big),$
is the information matrix for an observation on X about the parameter θ.

25 If θ is a scalar, the expected value of the square of the first derivative is the negative of the expected value of the second derivative,
$E_\theta\Big(\big(\tfrac{\partial}{\partial\theta} l_L(\theta ; x)\big)^2\Big) = -E_\theta\Big(\tfrac{\partial^2}{\partial\theta^2} l_L(\theta ; x)\Big),$
or, in general,
$E_\theta\Big(\nabla l_L(\theta ; x)\,\big(\nabla l_L(\theta ; x)\big)^T\Big) = -E_\theta\big(H_{l_L}(\theta ; x)\big).$

26 Computations If the log-likelihood is twice differentiable and if the range does not depend on the parameter, Newton's method could be used. Newton's equation
$H_{l_L}(\theta^{(k-1)} ; x)\, d^{(k)} = -\nabla l_L(\theta^{(k-1)} ; x)$
is used to determine the step direction in the k-th iteration. A quasi-Newton method uses a matrix $\tilde H_{l_L}(\theta^{(k-1)})$ in place of the Hessian $H_{l_L}(\theta^{(k-1)})$. The second derivative, or an approximation of it, is used in a Newton-like method to solve the maximization problem.

27 The Likelihood Equation In the regular case, with the likelihood and hence the log-likelihood function differentiable within Θ, we call
$\nabla L(\theta ; x) = 0$
or
$\nabla l_L(\theta ; x) = 0$
the likelihood equations. If the maximum occurs within Θ, then every MLE is a root of the likelihood equations. There may be other roots within Θ, of course. A likelihood equation is an estimating equation, similar to the estimating equation for least squares estimators.

28 The Likelihood Equation Any root of the likelihood equations, which is called an RLE, may be an MLE. A theorem from functional analysis, usually proved in the context of numerical optimization, states that if $\theta_*$ is an RLE and $H_{l_L}(\theta_* ; x)$ is negative definite, then there is a local maximum at $\theta_*$. This may allow us to determine that an RLE is an MLE. There are, of course, other ways of determining whether an RLE is an MLE. In maximum likelihood estimation, the determination that an RLE is actually an MLE is an important step in the process.

29 Consider the gamma family of distributions with parameters α and β. Given a random sample $x_1, \ldots, x_n$, the log-likelihood of α and β is
$l_L(\alpha, \beta ; x) = -n\alpha\log(\beta) - n\log(\Gamma(\alpha)) + (\alpha - 1)\sum \log(x_i) - \frac{1}{\beta}\sum x_i.$
This yields the likelihood equations
$-n\log(\beta) - n\frac{\Gamma'(\alpha)}{\Gamma(\alpha)} + \sum\log(x_i) = 0$
and
$-\frac{n\alpha}{\beta} + \frac{1}{\beta^2}\sum x_i = 0.$
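One way to solve these equations numerically (a sketch, not from the notes) is to eliminate β from the second equation, $\beta = \bar x / \alpha$, and apply Newton's method to the resulting equation in α alone; scipy's digamma and polygamma functions supply $\Gamma'/\Gamma$ and its derivative. The starting value below is a rough heuristic.

```python
# Sketch: solve the gamma likelihood equations by profiling out beta
# and running Newton's method on alpha.
import numpy as np
from scipy.special import digamma, polygamma

rng = np.random.default_rng(3)
x = rng.gamma(shape=2.5, scale=1.5, size=200)

s = np.log(x.mean()) - np.log(x).mean()      # log(xbar) - mean(log x)

alpha = 1.0 / (2.0 * s)                      # rough starting value
for _ in range(20):
    f = np.log(alpha) - digamma(alpha) - s   # profile likelihood equation in alpha
    fprime = 1.0 / alpha - polygamma(1, alpha)
    alpha -= f / fprime                      # Newton step

beta = x.mean() / alpha                      # from the second likelihood equation
print(alpha, beta)
```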

30 Nondifferentiable Likelihood Functions The definition of MLEs does not depend on the existence of a likelihood equation. The likelihood function may not be differentiable with respect to the parameter, as in the case of the hypergeometric distribution, in which the parameter must be an integer (see Shao, Example 4.32).

31 Properties of MLEs As we have mentioned, MLEs have a nice intuitive property, and as we have seen, they have a certain equivariance property. We will see later that they also often have good asymptotic properties. We now consider some other properties; some useful and some less desirable.

32 Relation to Sufficient Statistics Theorem 1 If there is a sufficient statistic and an MLE exists, then an MLE is a function of the sufficient statistic. Proof. This follows directly from the factorization theorem.

33 Relation to Efficient Statistics Given the three Fisher information regularity conditions, we have defined Fisher efficient estimators as unbiased estimators that achieve the lower bound on their variance. Theorem 2 Assume the FI regularity conditions for a family of distributions {P θ } with the additional Le Cam-type requirement that the Fisher information matrix I(θ) is positive definite for all θ. Let T (X) be a Fisher efficient estimator of θ. Then T (X) is an MLE of θ.

34 Proof. Let $p_\theta(x)$ be the PDF. We have
$\frac{\partial}{\partial\theta}\log(p_\theta(x)) = I(\theta)(T(x) - \theta)$
for any θ and x. Clearly, for θ = T(x), this equation is 0 (hence, T(X) is an RLE). Because I(θ), which at θ = T(x) is the negative of the Hessian of the log-likelihood, is positive definite, T(x) maximizes the likelihood. Notice that without the additional requirement of a positive definite information matrix, Theorem 2 would yield only the conclusion that T(X) is an RLE.

35 Equivariance of MLEs If ˆθ is a good estimator of θ, it would seem to be reasonable that g(ˆθ) is a good estimator of g(θ), where g is a Borel function. Good, of course, is relative to some criterion. If the criterion is UMVU, then the estimator in general will not have this equivariance property; that is, if ˆθ is a UMVUE of θ, then g(ˆθ) may not be a UMVUE of g(θ). Maximum likelihood estimators almost have this equivariance property, however. Because of the word almost in the previous sentence, which we will elaborate on below, we will just define MLEs of functions so as to be equivariant.

36 Definition 1 (functions of estimands and of MLEs) If $\hat\theta$ is an MLE of θ, and g is a Borel function, then $g(\hat\theta)$ is an MLE of the estimand g(θ). The reason that this does not follow from the definition of an MLE is that the function may not be one-to-one. We could just restrict our attention to one-to-one functions. Instead, however, we will now define a function of the likelihood, and show that $g(\hat\theta)$ is an MLE w.r.t. this transformed likelihood, called the induced likelihood. Note, of course, that the expression g(θ) does not allow other variables and functions; e.g., we would not say that $E((Y - X\beta)^T(Y - X\beta))$ qualifies as a g(θ) in the definition above. (Otherwise, what would an MLE of σ² be?)

37 Definition 2 (induced likelihood) Let $\{p_\theta : \theta \in \Theta\}$ with $\Theta \subseteq \mathrm{IR}^d$ be a family of PDFs w.r.t. a common σ-finite measure, and let L(θ) be the likelihood associated with this family, given observations. Now let g be a Borel function from Θ to $\Lambda \subseteq \mathrm{IR}^{d_1}$, where $1 \le d_1 \le d$. Then
$\tilde L(\lambda) = \sup_{\{\theta : \theta \in \Theta \text{ and } g(\theta) = \lambda\}} L(\theta)$
is called the induced likelihood function for the transformed parameter.
Theorem 3 Suppose $\{p_\theta : \theta \in \Theta\}$ with $\Theta \subseteq \mathrm{IR}^d$ is a family of PDFs w.r.t. a common σ-finite measure, with associated likelihood L(θ). Let $\hat\theta$ be an MLE of θ. Now let g be a Borel function from Θ to $\Lambda \subseteq \mathrm{IR}^{d_1}$, where $1 \le d_1 \le d$, and let $\tilde L(\lambda)$ be the resulting induced likelihood. Then $g(\hat\theta)$ maximizes $\tilde L(\lambda)$.

38 Example 4 MLE of the variance in a Bernoulli distribution Consider the Bernoulli family of distributions with parameter π. The variance of a Bernoulli distribution is g(π) = π(1 − π). Given a random sample $x_1, \ldots, x_n$, the MLE of π is $\hat\pi = \sum_{i=1}^n x_i / n$; hence the MLE of the variance is
$\frac{1}{n}\sum x_i\Big(1 - \sum x_i / n\Big).$
Note that this estimator is biased and that it is the same estimator as that of the variance in a normal distribution: $\sum(x_i - \bar x)^2 / n$. The UMVUE of the variance in a Bernoulli distribution is
$\frac{1}{n-1}\sum x_i\Big(1 - \sum x_i / n\Big).$

39 The difference between the MLE and the UMVUE of the variance in the Bernoulli distribution is the same as the difference between the estimators of the variance in the normal distribution. How do the MSEs of the estimators of the variance in a Bernoulli distribution compare? (Exercise; see the sketch below.) Whenever the variance of a distribution can be expressed as a function of other parameters, g(θ), as in the case of the Bernoulli distribution, the MLE of the variance is $g(\hat\theta)$, where $\hat\theta$ is an MLE of θ. The MLE of the variance of the gamma distribution, for example, is $\hat\alpha\hat\beta^2$, where $\hat\alpha$ and $\hat\beta$ are the MLEs. The plug-in estimator of the variance of the gamma distribution is different. Given the sample $X_1, X_2, \ldots, X_n$, as always, it is $\frac{1}{n}\sum_{i=1}^n (X_i - \bar X)^2$.
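A Monte Carlo sketch of the MSE comparison posed as the exercise above (the sample size, value of π, and replication count are arbitrary choices):

```python
# Sketch: compare the MSEs of the MLE and the UMVUE of the
# Bernoulli variance pi*(1 - pi) by simulation.
import numpy as np

rng = np.random.default_rng(4)
n, pi, reps = 20, 0.3, 100_000
g = pi * (1 - pi)

x = rng.binomial(1, pi, size=(reps, n))
t = x.sum(axis=1)
mle = t * (1 - t / n) / n          # divisor n
umvue = t * (1 - t / n) / (n - 1)  # divisor n - 1

print(np.mean((mle - g) ** 2), np.mean((umvue - g) ** 2))
```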

40 Other Properties Other properties of MLEs are not always desirable. First of all, we note that an MLE may be biased. The most familiar example of this is the MLE of the variance. Another example is the MLE of the location parameter in the uniform distribution. Although the MLE approach is usually an intuitively logical one, it is not based on a formal decision theory, so it is not surprising that MLEs may not possess certain desirable properties that are formulated from that perspective. There are other interesting examples in which MLEs do not have desirable (or expected) properties.

41 An MLE may be discontinuous, as, for example, in the case of the ɛ-mixture distribution family with CDF
$P_{x,\epsilon}(y) = (1 - \epsilon)P(y) + \epsilon\, I_{[x,\infty[}(y),$
where 0 ≤ ɛ ≤ 1.
An MLE may not be a function of a sufficient statistic (if the MLE is not unique).
An MLE may not satisfy the likelihood equation, as, for example, when the likelihood function is not differentiable at its maximum.
An MLE may differ from a simple method of moments estimator; in particular, an MLE of the population mean may not be the sample mean.

42 The likelihood equation may have a unique root, yet no MLE exists. While there are examples in which the roots of the likelihood equations occur at minima of the likelihood, this situation does not arise in any realistic distribution (that I am aware of). Romano and Siegel (1986) construct a location family of distributions with support $\mathrm{IR} \setminus \{x_1 + \theta, x_2 + \theta\}$, where $x_1 < x_2$ are known but θ is unknown, with a Lebesgue density p(x) that rises to a singularity as $x \to x_1$ and rises to a singularity as $x \to x_2$, and that is continuous and strictly convex over $]x_1, x_2[$. With a single observation, the likelihood equation has a root at the minimum of the convex portion of the density between $x_1 + \theta$ and $x_2 + \theta$, but the likelihood increases without bound at both $x_1 + \theta$ and $x_2 + \theta$.

43 Nonuniqueness There are many cases in which the MLEs are not unique (and I'm not just referring to RLEs). The following examples illustrate this. Example 5 Likelihood in a Cauchy family Consider the Cauchy distribution with location parameter θ. The likelihood equation is
$\sum_{i=1}^n \frac{2(x_i - \theta)}{1 + (x_i - \theta)^2} = 0.$
This may have multiple roots (depending on the sample), and in that case the one yielding the maximum would be the MLE. Depending on the sample, however, multiple roots can yield the same value of the likelihood function.
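A numerical sketch (with a hypothetical small sample chosen so that an outlier produces multiple roots): scan the likelihood equation for sign changes, refine each root, and keep the one with the largest log-likelihood.

```python
# Sketch: locate roots of the Cauchy likelihood equation numerically
# and pick the one maximizing the log-likelihood.
import numpy as np
from scipy.optimize import brentq

x = np.array([-5.0, -1.0, 0.0, 1.5, 20.0])   # small sample; the outlier induces extra roots

def score(theta):
    return np.sum(2 * (x - theta) / (1 + (x - theta) ** 2))

def loglik(theta):
    return -np.sum(np.log(1 + (x - theta) ** 2))

grid = np.linspace(-10, 25, 2000)
vals = np.array([score(t) for t in grid])
roots = [brentq(score, a, b)
         for a, b, va, vb in zip(grid[:-1], grid[1:], vals[:-1], vals[1:])
         if va * vb < 0]
mle = max(roots, key=loglik)
print(roots, mle)
```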

44 Another example in which the MLE is not unique is U(θ − 1/2, θ + 1/2). Example 6 Likelihood in a uniform family with fixed range The likelihood function for U(θ − 1/2, θ + 1/2) is
$I_{]x_{(n)} - 1/2,\; x_{(1)} + 1/2[}(\theta).$
It is maximized at any value between $x_{(n)} - 1/2$ and $x_{(1)} + 1/2$.

45 Nonexistence and Other Properties We have already mentioned situations in which the likelihood approach does not seem to be the logical way to proceed, and we have seen that sometimes in nonparametric problems the MLE does not exist. This often happens when there are more things to estimate than there are observations. It can also happen in parametric problems. In such cases, some people prefer to say that the likelihood function does not exist; that is, they suggest that the definition of a likelihood function include boundedness.

46 MLE and the Exponential Class If X has a distribution in the exponential class and we write its density in the natural or canonical form, the likelihood has the form
$L(\eta ; x) = \exp(\eta^T T(x) - \zeta(\eta))\, h(x).$
The log-likelihood equation is particularly simple:
$T(x) - \frac{\partial \zeta(\eta)}{\partial \eta} = 0.$
Newton's method for solving the likelihood equation is
$\eta^{(k)} = \eta^{(k-1)} + \left(\frac{\partial^2 \zeta(\eta)}{\partial\eta(\partial\eta)^T}\bigg|_{\eta=\eta^{(k-1)}}\right)^{-1}\left(T(x) - \frac{\partial \zeta(\eta)}{\partial\eta}\bigg|_{\eta=\eta^{(k-1)}}\right).$
Note that the second term includes the Fisher information matrix for η. (The expectation is constant.) (Note that the FI is not for a distribution; it is for a parametrization.)

47 We have
$V(T(X)) = \frac{\partial^2 \zeta(\eta)}{\partial\eta(\partial\eta)^T}\bigg|_{\eta = \bar\eta}.$
Note that the variance is evaluated at the true η, say $\bar\eta$. If we have a full-rank member of the exponential class, then V is positive definite, and hence there is a unique maximum. If we write
$\mu(\eta) = \frac{\partial \zeta(\eta)}{\partial\eta},$
then in the full-rank case $\mu^{-1}$ exists, and so we have the solution to the likelihood equation:
$\hat\eta = \mu^{-1}(T(x)).$
So maximum likelihood estimation is very nice for the exponential class.
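As a concrete sketch (not from the notes), consider the Poisson family in natural form, for which $\zeta(\eta) = n e^\eta$ for a sample of size n; then $\mu(\eta) = n e^\eta$, and the likelihood equation $T(x) = \mu(\eta)$ is solved in closed form by $\hat\eta = \log \bar x$.

```python
# Sketch: Poisson natural parameter; the MLE eta_hat = mu^{-1}(T(x))
# is available in closed form.
import numpy as np

rng = np.random.default_rng(5)
x = rng.poisson(lam=3.0, size=100)

T = x.sum()                     # sufficient statistic
eta_hat = np.log(T / len(x))    # mu^{-1}(T(x)) with mu(eta) = n*exp(eta)
lam_hat = np.exp(eta_hat)       # back to the usual parameter: the sample mean
print(eta_hat, lam_hat, x.mean())
```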

48 EM Methods Although EM methods do not rely on missing data, they can be explained most easily in terms of a random sample that consists of two components, one observed and one unobserved or missing. A simple example of missing data occurs in life-testing, when, for example, a number of electrical units are switched on and the time when each fails is recorded. In such an experiment, it is usually necessary to curtail the recordings prior to the failure of all units. The failure times of the units still working are unobserved, but the number of censored observations and the time of the censoring obviously provide information about the distribution of the failure times.

49 Another common example that motivates the EM algorithm is a finite mixture model. Each observation comes from an unknown one of an assumed set of distributions. The missing data is the distribution indicator. The parameters of the distributions are to be estimated. As a side benefit, the class membership indicator is estimated. The missing data can be missing observations on the same random variable that yields the observed sample, as in the case of the censoring example; or the missing data can be from a different random variable that is related somehow to the random variable observed. Many common applications of EM methods involve missing-data problems, but this is not necessary.

50 Example 7 MLE in a normal mixture model A two-component normal mixture model can be defined by two normal distributions, $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, and the probability that the random variable (the observable) arises from the first distribution is w. The parameter in this model is the vector $\theta = (w, \mu_1, \sigma_1^2, \mu_2, \sigma_2^2)$. (Note that w and the σ's have the obvious constraints.) The PDF of the mixture is
$p(y ; \theta) = w\, p_1(y ; \mu_1, \sigma_1^2) + (1 - w)\, p_2(y ; \mu_2, \sigma_2^2),$
where $p_j(y ; \mu_j, \sigma_j^2)$ is the normal PDF with parameters $\mu_j$ and $\sigma_j^2$.

51 We denote the complete data as C = (X, U), where X represents the observed data and the unobserved U represents class membership. Let U = 1 if the observation is from the first distribution and U = 0 if the observation is from the second distribution. The unconditional E(U) is the probability that an observation comes from the first distribution, which of course is w. Suppose we have n observations on X, $x_1, \ldots, x_n$. Given a provisional value of θ, we can compute the conditional expected value $E(U \mid x)$ for any realization of X. It is merely
$E(U \mid x, \theta^{(k)}) = \frac{w^{(k)}\, p_1(x ; \mu_1^{(k)}, \sigma_1^{2(k)})}{p(x ; w^{(k)}, \mu_1^{(k)}, \sigma_1^{2(k)}, \mu_2^{(k)}, \sigma_2^{2(k)})}.$

52 The M step is just the familiar MLE of the parameters, with $q^{(k)}(x_i) = E(U \mid x_i, \theta^{(k)})$:
$w^{(k+1)} = \frac{1}{n}\sum_i q^{(k)}(x_i)$
$\mu_1^{(k+1)} = \frac{1}{n w^{(k+1)}}\sum_i q^{(k)}(x_i)\, x_i$
$\sigma_1^{2(k+1)} = \frac{1}{n w^{(k+1)}}\sum_i q^{(k)}(x_i)\,(x_i - \mu_1^{(k+1)})^2$
$\mu_2^{(k+1)} = \frac{1}{n(1 - w^{(k+1)})}\sum_i \big(1 - q^{(k)}(x_i)\big)\, x_i$
$\sigma_2^{2(k+1)} = \frac{1}{n(1 - w^{(k+1)})}\sum_i \big(1 - q^{(k)}(x_i)\big)\,(x_i - \mu_2^{(k+1)})^2$
(Recall that the MLE of σ² has a divisor of n, rather than n − 1.)
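A compact sketch of the E and M steps above on simulated data (the starting values, sample size, and true mixture used to generate the data are arbitrary, and only a fixed number of iterations is run):

```python
# Sketch: EM iteration for a two-component normal mixture.
import numpy as np

def normal_pdf(y, mu, var):
    return np.exp(-(y - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

rng = np.random.default_rng(6)
n = 500
u = rng.random(n) < 0.3                      # true class indicators (unobserved)
x = np.where(u, rng.normal(0.0, 1.0, n), rng.normal(4.0, 1.5, n))

w, mu1, s1, mu2, s2 = 0.5, x.min(), 1.0, x.max(), 1.0   # crude starting values
for _ in range(100):
    # E step: q_i = E(U | x_i, theta^(k))
    p1 = w * normal_pdf(x, mu1, s1)
    q = p1 / (p1 + (1 - w) * normal_pdf(x, mu2, s2))
    # M step: weighted MLEs (note the divisor n, not n - 1)
    w = q.mean()
    mu1 = np.sum(q * x) / np.sum(q)
    s1 = np.sum(q * (x - mu1) ** 2) / np.sum(q)
    mu2 = np.sum((1 - q) * x) / np.sum(1 - q)
    s2 = np.sum((1 - q) * (x - mu2) ** 2) / np.sum(1 - q)

print(w, mu1, s1, mu2, s2)
```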

53 Asymptotic Properties of MLEs, RLEs, and GEE Estimators The argmax of the likelihood function, that is, the MLE of the argument of the likelihood function, is obviously an important statistic. In many cases, a likelihood equation exists, and often in those cases, the MLE is a root of the likelihood equation. In some cases there are roots of the likelihood equation (RLEs) that may or may not be MLEs.

54 Asymptotic Distributions of MLEs and RLEs We recall that asymptotic expectations are defined as expectations in asymptotic distributions (rather than as limits of expectations). The first step in studying asymptotic properties is to determine the asymptotic distribution.

55 Example 8 Asymptotic distribution of the MLE of the variance in a Bernoulli family In Example 4 we determined the MLE of the variance g(π) = π(1 − π) in the Bernoulli family of distributions with parameter π. The MLE of g(π) is $T_n = \bar X(1 - \bar X)$. Now, let us determine its asymptotic distributions.

56 From the central limit theorem,
$\sqrt{n}(\bar X - \pi) \overset{d}{\to} N(0, g(\pi)).$
If π ≠ 1/2, so that g'(π) ≠ 0, we can use the delta method and the CLT to get
$\sqrt{n}\big(g(\pi) - T_n\big) \overset{d}{\to} N\big(0, \pi(1-\pi)(1-2\pi)^2\big).$
(I have written $(g(\pi) - T_n)$ instead of $(T_n - g(\pi))$ so the expression looks more similar to one we get next.) If π = 1/2, this is a degenerate distribution. (The limiting variance actually is 0, but the degenerate distribution is not very useful.)

57 Let's take a different approach for the case π = 1/2. We have from the CLT,
$\sqrt{n}(\bar X - 1/2) \overset{d}{\to} N\Big(0, \frac{1}{4}\Big).$
Hence, if we scale and square, we get
$4n(\bar X - 1/2)^2 \overset{d}{\to} \chi^2_1,$
or
$4n\big(g(\pi) - T_n\big) \overset{d}{\to} \chi^2_1.$
This is a specific instance of a second-order delta method.

58 Following the same methods as for deriving the first-order delta method using a Taylor series expansion, for the univariate case we can see that
$\sqrt{n}(S_n - c) \overset{d}{\to} N(0, \sigma^2),\quad g'(c) = 0,\quad g''(c) \ne 0 \;\Longrightarrow\; \frac{2n\big(g(S_n) - g(c)\big)}{\sigma^2\, g''(c)} \overset{d}{\to} \chi^2_1.$
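A quick Monte Carlo sketch of the π = 1/2 case above: the scaled quantity $4n(g(\pi) - T_n)$ should have approximately the mean (1) and variance (2) of a $\chi^2_1$ variable (the sample size and replication count are arbitrary):

```python
# Sketch: check the chi-squared(1) limit at pi = 1/2 by simulation.
import numpy as np

rng = np.random.default_rng(7)
n, pi, reps = 1000, 0.5, 50_000
xbar = rng.binomial(n, pi, size=reps) / n
T_n = xbar * (1 - xbar)
z = 4 * n * (pi * (1 - pi) - T_n)

# Mean and variance of chi^2_1 are 1 and 2.
print(z.mean(), z.var())
```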
