Likelihood based Statistical Inference. Dottorato in Economia e Finanza Dipartimento di Scienze Economiche Univ. di Verona


Likelihood based Statistical Inference
Dottorato in Economia e Finanza, Dipartimento di Scienze Economiche, Univ. di Verona
L. Pace, A. Salvan, N. Sartori
Udine, April 2008

Lecture 2. Likelihood: observed quantities, exact properties, inference
2.1 Likelihood: observed quantities
2.2 Likelihood: exact properties
2.3 Likelihood: inference (no nuisance parameters)
2.4 Reparameterizations
2.5 A two-parameter model
2.6 Example: linear model with normal errors

Detailed References for 2.1 Likelihood: observed quantities
*PS96/PS97; *PS01.
Azzalini, A. (1996). Statistical Inference Based on the Likelihood, Chapman and Hall, London, §§ 2.1, 2.2, 3.2.
Azzalini, A. (2001). Inferenza Statistica: una Presentazione basata sul Concetto di Verosimiglianza, Springer, Milano, §§ 2.1, 2.2, 3.2.
Barndorff-Nielsen, O.E., Cox, D.R. (1994). Inference and Asymptotics, Chapman and Hall, London, § 2.2.
Severini, T.A. (2000). Likelihood Methods in Statistics, Oxford University Press, Oxford, Ch. 3; see also § 3.8 for further bibliographic references.

Detailed References for 2.2 Likelihood: exact properties
*PS01: Ch. 5; *PS96/PS97.
Azzalini, A. (1996). Statistical Inference Based on the Likelihood, Chapman and Hall, London, §§ 2.1, 2.2, 3.2.
Azzalini, A. (2001). Inferenza Statistica: una Presentazione basata sul Concetto di Verosimiglianza, Springer, Milano, §§ 2.1, 2.2, 3.2.
Barndorff-Nielsen, O.E., Cox, D.R. (1994). Inference and Asymptotics, Chapman and Hall, London, § 2.2.
Severini, T.A. (2000). Likelihood Methods in Statistics, Oxford University Press, Oxford, Ch. 3.

Detailed References for 2.3 Likelihood: inference (no nuisance parameters)
*PS01: § 3.6; *PS96: § 3.5 (introd.); *PS97: § 3.4 (introd.).
Azzalini, A. (1996). Statistical Inference Based on the Likelihood, Chapman and Hall, London.
Azzalini, A. (2001). Inferenza Statistica: una Presentazione basata sul Concetto di Verosimiglianza, Springer, Milano.
Barndorff-Nielsen, O.E., Cox, D.R. (1994). Inference and Asymptotics, Chapman and Hall, London, § 3.2.
Severini, T.A. (2000). Likelihood Methods in Statistics, Oxford University Press, Oxford, § 3.7.

Detailed References for 2.4 Reparameterizations
*PS96.
Azzalini, A. (1996). Statistical Inference Based on the Likelihood, Chapman and Hall, London.
Azzalini, A. (2001). Inferenza Statistica: una Presentazione basata sul Concetto di Verosimiglianza, Springer, Milano.
Barndorff-Nielsen, O.E. (1988). Parametric Statistical Models and Likelihood, Lecture Notes in Statistics 50, Springer, Berlin, Ch. 3.
Barndorff-Nielsen, O.E., Cox, D.R. (1994). Inference and Asymptotics, Chapman and Hall, London.

2.1 Likelihood: observed quantities
The likelihood function is
L = L(θ) = L(θ; y) = c(y) p_Y(y; θ),   (1)
where c(y) > 0 is an arbitrary proportionality constant.
Interpretation: on the basis of the data y, the parameter value θ′ ∈ Θ is more credible than θ″ ∈ Θ as an index of the data-generating probability model if L(θ′) > L(θ″); the empirical support received by θ′ ∈ Θ from y is to be compared with that received by θ″ ∈ Θ through the ratio L(θ′)/L(θ″).
L(θ) is not a density on Θ!
Two likelihood functions are called equivalent if they agree up to a positive multiplicative constant (which can depend on y but not on θ).

Likelihood: observed quantities
In many cases it is more convenient to consider the log-likelihood function
l = l(θ) = l(θ; y) = log L,   (2)
where we set l(θ) = −∞ if L(θ) = 0 (the log-likelihood is defined up to an additive constant, which depends only on y).
For independent observations,
l(θ) = Σ_{i=1}^n log p_{Y_i}(y_i; θ).   (3)
A value of θ that maximizes L(θ; y) over Θ, that is, a value ˆθ such that L(ˆθ) = sup_{θ∈Θ} L(θ), is called a maximum likelihood estimate (m.l.e.) of θ.
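As a minimal numerical illustration (my own sketch, not from the slides: the data vector and the choice of an exponential model with rate θ are made up for the example), the m.l.e. can be computed by maximizing l(θ) numerically:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical i.i.d. sample, assumed exponential with rate theta,
# so that l(theta) = n*log(theta) - theta*sum(y).
y = np.array([0.8, 1.3, 0.4, 2.1, 0.9, 1.7])
n = len(y)

def negloglik(theta):
    # We minimize -l(theta) to maximize the log-likelihood.
    return -(n * np.log(theta) - theta * y.sum())

res = minimize_scalar(negloglik, bounds=(1e-6, 50.0), method="bounded")
print("numerical m.l.e.:", res.x)        # should match the analytic value
print("analytic  m.l.e.:", n / y.sum())  # theta-hat = n / sum(y_i)
```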

Likelihood: observed quantities
A parametric model is regular if Θ is an open subset of IR^p and the function l(θ) is continuously differentiable on Θ up to the third order. A major condition for a model to be regular is that all its probability distributions have the same support. Typically, exponential families are regular models.
Log-likelihood derivatives. In a regular parametric model,
l_r = l_r(θ; y) = ∂l(θ)/∂θ_r,
l_rs = l_rs(θ; y) = ∂²l(θ)/(∂θ_r ∂θ_s),
l_rst = l_rst(θ; y) = ∂³l(θ)/(∂θ_r ∂θ_s ∂θ_t).

Likelihood: observed quantities
The score function is the vector l_* = l_*(θ; y) = (l_1, ..., l_p). In the following, unless otherwise stated, it will be assumed that the m.l.e. is unique and that it is the unique solution of the likelihood equation
l_*(θ) = 0.   (4)
Sufficient conditions exist for specific classes of parametric models (Ch.s 5-7 of PS97).
The observed information matrix is the p × p matrix with (r, s) element −l_rs,
j = j(θ) = [−l_rs];
here and below the matrix with elements a_rs is indicated by [a_rs].

Likelihood: observed quantities
The elements of the inverse of a matrix [a_rs] will be denoted by upper indices, that is, [a_rs]^{−1} = [a^rs]; in particular, j^{−1} = [j^rs] and i^{−1} = [i^rs], when these inverses exist. Quantities like [l_rst] are called arrays.
Sometimes the notations ˆl = l(ˆθ), ĵ = j(ˆθ), î = i(ˆθ), etc., are used to denote likelihood quantities evaluated at ˆθ. Slightly stretching the terminology, if θ = (τ, ζ) and ˆθ = (ˆτ, ˆζ), we say that ˆτ is the maximum likelihood estimate of τ.
In regular models, ˆθ and the partial derivatives of l(θ) with respect to the components of θ give us information about the behaviour of the log-likelihood function. With p = 1,
l(θ) = l(ˆθ) + (θ − ˆθ) l′(ˆθ) + (1/2)(θ − ˆθ)² l″(ˆθ) + ... = l(ˆθ) − (1/2)(θ − ˆθ)² j(ˆθ) + ...,
since l′(ˆθ) = 0 and l″(ˆθ) = −j(ˆθ).

Likelihood: observed quantities
Hence, the relative log-likelihood l(θ) − l(ˆθ) may be expanded as
l(θ) − l(ˆθ) = −(1/2)(θ − ˆθ)² j(ˆθ) + ...
Thus, the larger j(ˆθ) is, the more rapidly values of θ different from ˆθ lose empirical support; in other words, the larger j(ˆθ) is, the more the likelihood is concentrated around ˆθ. For this reason, j(ˆθ) is a measure of the information about θ contained in the data.
When p > 1, j(ˆθ) is positive definite. Expansion:
l(θ) − l(ˆθ) = −(θ − ˆθ)ᵀ j(ˆθ)(θ − ˆθ)/2 + ...,
so l(θ) − l(ˆθ) can be approximated around ˆθ by a negative definite quadratic form. To sum up, in a regular model the log-likelihood function has approximately parabolic behaviour, at least in the region with the largest likelihood values.
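A small sketch of the parabolic approximation at work (again my own illustration with made-up exponential data, where ˆθ = n/Σ_i y_i and j(ˆθ) = n/ˆθ²):

```python
import numpy as np

# Made-up exponential sample; l(theta) = n*log(theta) - theta*sum(y)
y = np.array([0.8, 1.3, 0.4, 2.1, 0.9, 1.7])
n, s = len(y), y.sum()

theta_hat = n / s            # m.l.e.
j_hat = n / theta_hat**2     # observed information at the m.l.e.

def rel_loglik(theta):
    # Relative log-likelihood l(theta) - l(theta_hat)
    return (n * np.log(theta) - theta * s) - (n * np.log(theta_hat) - theta_hat * s)

for theta in [0.5 * theta_hat, 0.9 * theta_hat, 1.1 * theta_hat, 1.5 * theta_hat]:
    quad = -0.5 * (theta - theta_hat) ** 2 * j_hat   # parabolic approximation
    print(f"theta={theta:.3f}  exact={rel_loglik(theta):+.4f}  quadratic={quad:+.4f}")
```

The two columns agree closely near ˆθ and drift apart farther away, as the expansion suggests.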

Likelihood: observed quantities
The practice of examining the log-likelihood function is to be recommended, and it is facilitated by the graphical options of many popular statistical packages.
For an approach to inference based entirely on the likelihood function, see Edwards (1972) and Royall (1997, Statistical Evidence, Chapman and Hall, London).

2.2 Likelihood: exact properties
Behaviour under one-to-one transformations
Let F = {p_Y(y; θ), θ ∈ Θ ⊆ IR^p} be a parametric statistical model for the data y.
Invariance under one-to-one data transformations. Let t = g(y) be a one-to-one function of y defined on Y with range T, a subset of the same Euclidean space, and let us denote by F_T = {p_T(t; θ), θ ∈ Θ ⊆ IR^p} the statistical model for T = g(Y). The information about θ contained in y is equivalent to that contained in t: only the measurement scale of the data changes.
The likelihood is invariant under one-to-one transformations of the data. Indeed, the likelihood for θ based on the statistical model F_T and on the data t = g(y) is equivalent to that based on F and on the data y:
L_T(θ; t(y)) = c(y) L_Y(θ; y).

Likelihood: exact properties
Invariance under reparameterizations. Let ψ = ψ(θ) be a reparameterization of F (hereafter, a one-to-one smooth function from Θ ⊆ IR^p to Ψ ⊆ IR^p, infinitely differentiable together with its inverse). In the new parameterization,
F = {p_Y(y; ψ), ψ ∈ Ψ ⊆ IR^p},   with Ψ = {ψ ∈ IR^p : ψ = ψ(θ), θ ∈ Θ}.
The statistical model, as well as the inference problem, is left unchanged. Hence, inferential conclusions should not depend on the parameterization.

Likelihood: exact properties
The likelihood function is indeed invariant under reparameterizations of the model: the likelihood function L_Ψ(ψ) in the new parameterization and the original likelihood L_Θ(θ) are related by
L_Ψ(ψ) = L_Θ(θ(ψ)),   (5)
so that
l_Ψ(ψ) = l_Θ(θ(ψ)),   (6)
and ˆψ = ψ(ˆθ). This means that the m.l.e. is equivariant under reparameterizations. In general, likelihood inference about ψ is simply the translation into the new parameter space of the conclusions about θ.
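A quick numerical check of equivariance (my own sketch; the exponential model and the data are made up): maximizing in the rate parameterization θ and in the mean parameterization ψ = 1/θ gives ˆψ = 1/ˆθ.

```python
import numpy as np
from scipy.optimize import minimize_scalar

y = np.array([0.8, 1.3, 0.4, 2.1, 0.9, 1.7])  # made-up data
n, s = len(y), y.sum()

# Log-likelihoods in the two parameterizations (as negatives, for minimization)
nll_rate = lambda th: -(n * np.log(th) - th * s)        # theta = rate
nll_mean = lambda ps: -(-n * np.log(ps) - s / ps)       # psi = 1/theta = mean

th_hat = minimize_scalar(nll_rate, bounds=(1e-6, 50), method="bounded").x
ps_hat = minimize_scalar(nll_mean, bounds=(1e-6, 50), method="bounded").x
print(th_hat, ps_hat, 1 / th_hat)   # ps_hat equals 1/th_hat (equivariance)
```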

Likelihood: exact properties
Sampling properties
Repeated sampling principle (r.s.p.): study of the probability distribution of the random variable l(θ) = l(θ; Y), or of related quantities, for θ fixed and as y varies in the sample space Y according to a density p_Y(y; θ*) in F, where θ* ∈ Θ is a parameter value not necessarily equal to θ. The distribution is called null if θ* = θ and non-null if θ* ≠ θ. Analogously, we refer to null moments (evaluated with respect to a null distribution) and to non-null moments (evaluated with respect to a non-null distribution). Symbols such as E_θ(·), Var_θ(·) are used to indicate expectation, variance (or covariance matrix), etc., calculated with reference to p_Y(y; θ).

Likelihood: exact properties
Wald inequality. A key result in proving consistency of the m.l.e. (under suitable conditions):
E_θ(l(θ; Y)) > E_θ(l(θ′; Y)),   θ′ ≠ θ.   (7)

Likelihood: exact properties
To prove (7), note first that, because of parameter identifiability,
Pr_θ( p_Y(Y; θ′)/p_Y(Y; θ) = 1 ) < 1,
that is, the r.v. p_Y(Y; θ′)/p_Y(Y; θ) is non-degenerate for any pair of values θ and θ′ in Θ with θ ≠ θ′. Moreover, log(·) is a strictly concave function, so that, by Jensen's inequality,
E_θ( log[ p_Y(Y; θ′)/p_Y(Y; θ) ] ) < log{ E_θ( p_Y(Y; θ′)/p_Y(Y; θ) ) } = 0,   (8)
taking into account that
E_θ( p_Y(Y; θ′)/p_Y(Y; θ) ) = ∫_Y [ p_Y(y; θ′)/p_Y(y; θ) ] p_Y(y; θ) dy = 1.
(Notation for the continuous case has been used without loss of generality.) Inequality (7) follows from (8).
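As a sanity check of (7) (my own sketch, not from the slides; the exponential model, the true rate θ = 1 and the competing value θ′ = 2 are chosen arbitrarily), both sides can be approximated by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, theta1 = 1.0, 2.0          # true value theta and a competing value theta'
n, reps = 10, 20000

def loglik(th, y):
    # Exponential log-likelihood: l(th) = n*log(th) - th*sum(y)
    return len(y) * np.log(th) - th * y.sum()

samples = rng.exponential(scale=1 / theta, size=(reps, n))
lhs = np.mean([loglik(theta, y) for y in samples])    # approx E_theta l(theta; Y)
rhs = np.mean([loglik(theta1, y) for y in samples])   # approx E_theta l(theta'; Y)
print(lhs, rhs, lhs > rhs)  # Wald inequality: lhs should exceed rhs
```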

Likelihood: exact properties
Expectation of the score function. For a regular model, assuming that the conditions permitting the interchange of differentiation and integration are satisfied,
E_θ(l_*(θ)) = E_θ(l_*(θ; Y)) = 0,   θ ∈ Θ.   (9)
Thus, in agreement with the property that log-likelihood functions associated with data from p_Y(y; θ_0) have, on average, a maximum at θ_0, the components of the score function evaluated at θ_0 are, on average, equal to zero. Property (9) could also be phrased as: the likelihood equation l_*(θ) = 0 is an unbiased estimating equation.

Likelihood: exact properties
In general, an estimating equation is an equation in θ of the form q(θ; y) = 0. Its solution gives an estimate ˆθ_q of θ. The estimating equation q(θ; y) = 0 is called unbiased if E_θ(q(θ; Y)) = 0 for all θ ∈ Θ. Unbiasedness of an estimating equation is a key property in showing consistency of the corresponding estimator (see e.g. § 6.2 in PS01; Serfling, 1980, § 7.2.1; van der Vaart, 1998, § 5.2).

Likelihood: exact properties
Expected information. We call expected information, or Fisher information, the quantity
i(θ) = E_θ(j(θ)) = E_θ( [ −∂²l(θ)/(∂θ_r ∂θ_s) ] ),
the (null) expectation of the observed information. It is a p × p matrix, i(θ) = [i_rs(θ)]. Under random sampling of size n, i(θ) is proportional to n: indeed, i(θ) = n i_1(θ), where i_1(θ) is the expected information for one observation.
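As a standard worked example (not on the slide): for an i.i.d. exponential sample with rate θ, l(θ) = n log θ − θ Σ_i y_i, so j(θ) = −l″(θ) = n/θ² is non-random, and hence i(θ) = E_θ(j(θ)) = n/θ² = n i_1(θ) with i_1(θ) = 1/θ², illustrating the proportionality to n.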

Likelihood: exact properties
Covariance matrix of the score function. Under regularity conditions (see e.g. Azzalini, 1996, Section 3.2.4), the information identity
E_θ( l_*(θ) l_*(θ)ᵀ ) = i(θ),   θ ∈ Θ,   (10)
holds. In other words, the expected information matrix is the null second moment of the score and, as such, is a non-negative definite matrix. Equivalently, the Hessian matrix of the log-likelihood is, on average, non-positive definite at the true parameter value.

2.3 Likelihood: inference (no nuisance parameters)
From a likelihood function L(θ) it is natural:
i) to use as a point estimate of θ the value ˆθ = ˆθ(y) having maximum likelihood;
ii) to consider confidence regions of the form ˆΘ(y) = {θ ∈ Θ : L(θ) ≥ c L(ˆθ)}, with 0 < c < 1, made up of parameter values having large likelihood;

Likelihood: inference
iii) to test a hypothesis about θ, H_0 : θ ∈ Θ_0 ⊂ Θ, by comparing the maximum of L(θ) for θ ∈ Θ_0 with L(ˆθ), the maximum of L(θ) for θ ∈ Θ. Let us denote by L(ˆθ_0) the maximum of L(θ) for θ ∈ Θ_0. This leads to the test statistic
L(ˆθ)/L(ˆθ_0),   (11)
with large values significant. This statistic can be equivalently expressed using the log-likelihood as l(ˆθ) − l(ˆθ_0).

Likelihood: inference
Likelihood ratio tests and confidence regions
Likelihood ratio test. Suppose we want to test the simple hypothesis H_0 : θ = θ_0 against the alternative H_1 : θ ≠ θ_0. Twice the logarithm of (11) is
W(θ_0) = 2{ l(ˆθ) − l(θ_0) },
also called the log-likelihood ratio statistic. Large values of W(θ_0) are critical against H_0.

Likelihood: inference
Sometimes W(θ_0) is a monotonically increasing function of a statistic t(y) having a known distribution under H_0. It is then easy to compute the observed significance level (p-value) associated with the observed y_obs as
α_obs = Pr_{θ_0}( W(θ_0) ≥ W(θ_0)_obs ) = Pr_{θ_0}( t(Y) ≥ t(y_obs) ).
Usually, however, one has to resort to an approximation for the null distribution of W(θ_0). Under regularity conditions, the asymptotic null distribution of W(θ_0) is chi-squared with p degrees of freedom, where p is the number of scalar components of θ. Thus, under such conditions, W(θ) is an asymptotically pivotal quantity.
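For instance (a minimal sketch of the chi-squared approximation, with a made-up observed value of the statistic):

```python
from scipy.stats import chi2

W_obs = 6.07   # hypothetical observed value of W(theta_0)
p = 2          # number of scalar components of theta
alpha_obs = chi2.sf(W_obs, df=p)   # survival function = upper-tail probability
print(f"approximate p-value: {alpha_obs:.3f}")
```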

Likelihood: inference
Confidence regions
Confidence regions with nominal level 1 − α (level = null coverage probability) are
ˆΘ(y) = {θ ∈ Θ : W(θ) < χ²_{p;1−α}}.
The region ˆΘ(y) can also be written as
ˆΘ(y) = {θ ∈ Θ : l(θ) > l(ˆθ) − (1/2) χ²_{p;1−α}}.
With p = 1 or 2, the graphical representation of likelihood confidence regions is rather simple, starting from the log-likelihood plot. Alternatively, the region ˆΘ(y) may be read from the plot of W(θ).
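A grid-based sketch of such a region for a scalar parameter (my own code; the exponential model and data are made up for the example):

```python
import numpy as np
from scipy.stats import chi2

y = np.array([0.8, 1.3, 0.4, 2.1, 0.9, 1.7])  # made-up data
n, s = len(y), y.sum()
theta_hat = n / s

loglik = lambda th: n * np.log(th) - th * s
W = lambda th: 2 * (loglik(theta_hat) - loglik(th))

grid = np.linspace(0.01, 5, 5000)
inside = grid[W(grid) < chi2.ppf(0.95, df=1)]   # grid points kept in the region
print(f"95% likelihood interval: ({inside.min():.3f}, {inside.max():.3f})")
```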

Likelihood: inference
One-sided version of W(θ)
When θ is a scalar, one might be interested in testing H_0 : θ = θ_0 against the one-sided alternative H_1^> : θ > θ_0 or H_1^< : θ < θ_0. The natural test statistic based on the likelihood is the signed root of W(θ_0),
r(θ_0) = sgn(ˆθ − θ_0) √W(θ_0),
where sgn(·) is the sign function (sgn(x) = +1 if x > 0, sgn(x) = −1 if x < 0, sgn(x) = 0 if x = 0) and the non-negative square root of W(θ_0) is taken.
The test based on r(θ_0) rejects H_0 for large (small) values when the alternative is H_1^> (H_1^<). For some models, r(θ_0) is a monotonically increasing function of a statistic t(y) having a known distribution under H_0. Usually, however, under regularity conditions, the approximation r(θ_0) ∼ N(0, 1) is used for the null distribution.
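A one-sided test via r(θ_0) and its N(0, 1) approximation, again for the made-up exponential example (my own sketch; the null value θ_0 = 0.5 is arbitrary):

```python
import numpy as np
from scipy.stats import norm

y = np.array([0.8, 1.3, 0.4, 2.1, 0.9, 1.7])  # made-up data
n, s = len(y), y.sum()
theta_hat = n / s
theta0 = 0.5                                   # H0: theta = 0.5 vs H1: theta > 0.5

loglik = lambda th: n * np.log(th) - th * s
W = 2 * (loglik(theta_hat) - loglik(theta0))
r = np.sign(theta_hat - theta0) * np.sqrt(W)   # signed root of W
print(f"r = {r:.3f}, approx one-sided p-value = {norm.sf(r):.3f}")
```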

Likelihood: inference
The test based on r(θ_0) with a two-sided critical region symmetric around zero is equivalent to the test based on W(θ_0). The corresponding confidence region is
ˆΘ(y) = {θ ∈ Θ : −z_{1−α/2} < r(θ) < z_{1−α/2}}.
For graphical illustrations of the construction of confidence regions, see e.g. Example 3.9 in PS01.

Likelihood: inference
Asymptotically equivalent versions of W(θ) and r(θ)
The statistic
W(θ_0) = 2{ l(ˆθ) − l(θ_0) }
is also called the Wilks test statistic, after S. Wilks, who obtained its chi-squared asymptotic null distribution in 1938. W(θ_0) is a likelihood-based measure of the distance between ˆθ and θ_0. Owing to the parabolic approximation of the relative log-likelihood, the asymptotic null distribution of W(θ_0) coincides with the asymptotic null distribution of
W_e(θ_0) = (ˆθ − θ_0)ᵀ i(θ_0) (ˆθ − θ_0).
The test statistic W_e(θ_0) is called the Wald test statistic. It measures the distance between ˆθ and θ_0 directly on Θ, using the metric defined by the expected information.

Likelihood: inference
A third asymptotically equivalent statistic is the score test statistic, also called Rao's test statistic. It is given by
W_u(θ_0) = l_*(θ_0)ᵀ i(θ_0)^{−1} l_*(θ_0).
It measures the distance between l_*(θ_0) and E_{θ_0}(l_*(θ_0)) = 0, using the metric defined by the inverse of the expected information. When p = 1, it is possible to define one-sided versions of W_e(θ_0) and of W_u(θ_0), analogous to r(θ_0). For instance,
r_e(θ_0) = (ˆθ − θ_0) √i(θ_0).
The asymptotic null distribution of both W_e(θ_0) and W_u(θ_0) is χ²_p. The asymptotic null distribution of the one-sided versions (p = 1) is N(0, 1). The same asymptotic results hold if the expected information is replaced by i(ˆθ) or j(ˆθ).
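The three statistics side by side for the same made-up exponential example (my own sketch; here l_*(θ) = n/θ − Σ_i y_i and i(θ) = n/θ²):

```python
import numpy as np

y = np.array([0.8, 1.3, 0.4, 2.1, 0.9, 1.7])   # made-up data
n, s = len(y), y.sum()
theta_hat, theta0 = n / s, 0.5

loglik = lambda th: n * np.log(th) - th * s
score  = lambda th: n / th - s                  # score function l_*(theta)
info   = lambda th: n / th**2                   # expected information i(theta)

W  = 2 * (loglik(theta_hat) - loglik(theta0))   # likelihood ratio (Wilks)
We = (theta_hat - theta0) ** 2 * info(theta0)   # Wald
Wu = score(theta0) ** 2 / info(theta0)          # score (Rao)
print(f"W = {W:.3f}, We = {We:.3f}, Wu = {Wu:.3f}")  # all approx chi2(1) under H0
```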

Likelihood: inference
Comparisons
The Wald test has a rather simple expression and interpretation, and it is widely used in applications. However, it does depend on the parameterization of the statistical model (see the following section). On the other hand, W_u and W do not depend on the parameterization. Compared with the other two versions, W respects more closely the shape of the log-likelihood, especially when this is far from parabolic. Confidence regions based on W are necessarily included in the parameter space; this is not always true for confidence regions based on the other two versions. W_u is simple to compute and is related to (locally) optimal tests (cf. PS97).

2.4 Reparameterizations
Likelihood and log-likelihood do not depend on the parameterization of F. Let ψ = ψ(θ) be a one-to-one smooth function from Θ ⊆ IR^p to Ψ ⊆ IR^p, infinitely differentiable together with its inverse. Then ψ defines an alternative parameterization of the model. Since θ and ψ(θ) identify the same element of F, we have
L_Ψ(ψ) = L_Θ(θ(ψ))   (12)
and
l_Ψ(ψ) = l_Θ(θ(ψ));   (13)
the likelihood is an intrinsic function, i.e. it does not depend on the coordinate system expressed by the parameterization of F.

Reparameterizations
An important consequence for the m.l.e. is equivariance under reparameterization: if ψ is an alternative parameterization of F, we will have ˆψ = ψ(ˆθ) and ˆθ = θ(ˆψ). On the other hand, the log-likelihood derivatives l_r, l_rs, ..., and their expectations, depend on the parameterization. Let us denote by ψ^a, ψ^b, ... (a, b = 1, ..., p) the generic components of ψ, while θ^r, θ^s, ... denote the generic components of θ. By the differentiation rule for composite functions (chain rule),
l̄_a = Σ_{r=1}^p l_r θ^r_a,   (14)
where
l̄_a = ∂l_Ψ(ψ)/∂ψ^a,   (15)
θ^r_a = ∂θ^r(ψ)/∂ψ^a, and l_r = l_r(θ(ψ)).

Reparameterizations
Letting
l̄_ab = ∂²l_Ψ(ψ)/(∂ψ^a ∂ψ^b),   l̄_abc = ∂³l_Ψ(ψ)/(∂ψ^a ∂ψ^b ∂ψ^c),   (16)
etc., and re-applying the chain rule, we have
l̄_ab = Σ_{r,s=1}^p l_rs θ^r_a θ^s_b + Σ_{r=1}^p l_r θ^r_ab,   (17)
where l_rs = l_rs(θ(ψ)) and θ^r_ab = ∂²θ^r(ψ)/(∂ψ^a ∂ψ^b).

Reparameterizations
Transformation rules for observed and expected information:
j̄_ab = Σ_{r,s=1}^p j_rs θ^r_a θ^s_b − Σ_{r=1}^p l_r θ^r_ab,   (18)
ī_ab = Σ_{r,s=1}^p i_rs θ^r_a θ^s_b.   (19)
Perhaps (14) and (19) look more familiar in matrix notation:
l_*Ψ(ψ) = [θ^r_a]ᵀ l_*Θ(θ(ψ))   and   i_Ψ(ψ) = [θ^r_a]ᵀ i_Θ(θ(ψ)) [θ^r_a],   (20)
where [θ^r_a] is the p × p matrix with (r, a) element θ^r_a.

2.5 A two-parameter model
Let
y = (0.446, 0.604, 2.137, 0.737, 0.996, 1.152, 1.124, 0.137, 0.982, 1.196, 0.841, 0.636, 0.459, 0.947, 0.037, 1.307, 0.858, 0.164, 0.444, 0.623)
be a random sample of size n = 20 from a Weibull W(γ, λ) distribution, with p.d.f. for one observation
p_{Y_i}(y_i; γ, λ) = λ γ y_i^{γ−1} e^{−λ y_i^γ},   γ, λ > 0, y_i > 0.
Likelihood equations:
n/γ + Σ_i log y_i − λ Σ_i y_i^γ log y_i = 0
n/λ − Σ_i y_i^γ = 0

A two-parameter model
The solution (obtained numerically) is (ˆγ, ˆλ) = (1.642, 1.24); the observed information matrix ĵ is evaluated at this point. The log-likelihood ratio statistic is
W(γ, λ) = 2{ n log(ˆλ/λ) + n log(ˆγ/γ) + (ˆγ − γ) Σ_i log y_i − ˆλ Σ_i y_i^ˆγ + λ Σ_i y_i^γ }.
Parabolic approximation (Wald test):
W(γ, λ) ≈ W_e(γ, λ) = (θ − ˆθ)ᵀ ĵ (θ − ˆθ),   θ = (γ, λ).
For H_0 : γ = 1, λ = 1.2 against H_1 : not H_0, we get W(1, 1.2) = 6.07 with α_obs = 0.048, and W_e(1, 1.2) = 4.94 with α_obs = 0.084 (χ²_{2;0.95} = 5.99).
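The numbers above can be checked numerically. A sketch (my own code, using scipy, not part of the slides) that maximizes the Weibull log-likelihood for the data of Section 2.5 and evaluates W at (γ, λ) = (1, 1.2):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

# Data from Section 2.5
y = np.array([0.446, 0.604, 2.137, 0.737, 0.996, 1.152, 1.124, 0.137,
              0.982, 1.196, 0.841, 0.636, 0.459, 0.947, 0.037, 1.307,
              0.858, 0.164, 0.444, 0.623])
n = len(y)

def loglik(g, lam):
    # Weibull log-likelihood: n log(lam) + n log(g) + (g-1) sum(log y) - lam sum(y^g)
    return (n * np.log(lam) + n * np.log(g)
            + (g - 1) * np.log(y).sum() - lam * (y ** g).sum())

# Maximize over (log gamma, log lambda) to keep both parameters positive
res = minimize(lambda p: -loglik(np.exp(p[0]), np.exp(p[1])),
               x0=[0.0, 0.0], method="Nelder-Mead")
g_hat, l_hat = np.exp(res.x)

W = 2 * (loglik(g_hat, l_hat) - loglik(1.0, 1.2))  # H0: gamma = 1, lambda = 1.2
print(f"m.l.e. ({g_hat:.3f}, {l_hat:.3f});  W(1, 1.2) = {W:.2f}, "
      f"p-value = {chi2.sf(W, df=2):.3f}")  # slides report (1.642, 1.24) and 0.048
```

Computing W_e as well would additionally require the observed information ĵ at the m.l.e., e.g. by numerical differentiation of the log-likelihood.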

A two-parameter model
[Figure (slides 40-41): contour plot of confidence regions for (γ, λ) based on W(γ, λ), for levels 1 − α ∈ {0.25, 0.50, 0.75, 0.90, 0.95, 0.99}; λ on the vertical axis, γ on the horizontal axis.]

2.6 Example: linear model with normal errors
Let y = (y_1, ..., y_n) be a realization of Y = (Y_1, ..., Y_n), with Y_i independent, i = 1, ..., n. Let us assume that Y_i ∼ N(µ_i, σ²), where the unknown µ_i has to be related to covariates whose values are measured on the same units on which the values y_i are measured. The data pertaining to the i-th unit are therefore (y_i, x_i1, ..., x_ip). The simplest way to relate µ_i to (x_i1, ..., x_ip) is through the linear model
µ_i = x_i1 β_1 + ... + x_ip β_p,
where β = (β_1, ..., β_p) ∈ IR^p.

2.6 Example: linear model with normal errors
The parametric statistical model thus defined is known as the multiple regression model with normal errors, in brief the normal linear regression model. It is customary to express this model also in the form
Y = Xβ + ε,
where Y is the n × 1 random vector generating y, X is the n × p matrix containing the fixed (non-stochastic) values of the covariates, and ε ∼ N_n(0, σ² I_n) is a vector of i.i.d. Gaussian errors. Even more succinctly, y is modeled as an observation of Y ∼ N_n(Xβ, σ² I_n). The parameter space of the model is Θ = IR^p × (0, +∞) and the likelihood function is
L(β, σ²) = (σ²)^{−n/2} exp{ −(1/(2σ²)) (y − Xβ)ᵀ(y − Xβ) }.

2.6 Example: linear model with normal errors
The log-likelihood function is therefore
l(β, σ²) = −(n/2) log(σ²) − (1/(2σ²)) (y − Xβ)ᵀ(y − Xβ).
For fixed σ², the (log-)likelihood is maximized by the value of β that makes the distance between y and Xβ as small as possible; this value of β does not depend on σ². Geometrically, we need to find the orthogonal projection of y onto the vector space spanned by the columns of X. Orthogonality between the residual y − Xˆβ and the columns of X means, in terms of scalar products, Xᵀ(y − Xˆβ) = 0. Therefore Xᵀy = XᵀX ˆβ. When the columns of X are linearly independent, i.e. X has rank p < n, one has
ˆβ = (XᵀX)^{−1} Xᵀ y.

2.6 Example: linear model with normal errors
A simple maximization of l(ˆβ, σ²) gives
ˆσ² = (1/n) (y − Xˆβ)ᵀ(y − Xˆβ).
Inference on the linear model with normal errors is greatly simplified by the fact that the exact sampling distribution of the maximum likelihood estimators is easily described:
ˆβ ∼ N_p(β, σ² (XᵀX)^{−1}),
ˆσ² ∼ (σ²/n) χ²_{n−p},
and ˆβ and ˆσ² are independent.
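A compact sketch of these formulas (my own code, with simulated data; the design matrix and the true parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # design with intercept
beta_true, sigma = np.array([1.0, -2.0, 0.5]), 0.8
y = X @ beta_true + rng.normal(scale=sigma, size=n)

# m.l.e. via the normal equations: X'X beta = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / n        # note divisor n (m.l.e.), not n - p
print(beta_hat, sigma2_hat)
```

Solving the normal equations with `np.linalg.solve` avoids forming the explicit inverse (XᵀX)^{−1}, which is numerically preferable.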
