Likelihood based Statistical Inference
Dottorato in Economia e Finanza, Dipartimento di Scienze Economiche, Univ. di Verona


L. Pace, A. Salvan, N. Sartori. Udine, April 2009.

Lecture 1. Statistical models: data variability and uncertainty in inference

Abbreviations
1.0 Detailed References
1.1 The theory of statistical inference: statistical models; paradigms of inference; the problems of statistical inference in the Fisherian paradigm
1.2 Model specification (data variability): levels of specification; notes on the specification of a parametric model
1.3 Problems of distribution (statistical variability): how do we solve a distribution problem?; multivariate normal distributions; convergence of sums of r.v.'s; empirical distribution function; simulation; delta method

Abbreviations

Books:
PS01: Pace, L. and Salvan, A. (2001). Introduzione alla Statistica - II. Inferenza, verosimiglianza, modelli. Cedam, Padova.
PS96: Pace, L. and Salvan, A. (1996). Teoria della Statistica: Metodi, modelli, approssimazioni asintotiche. Cedam, Padova.
PS97 (English version of PS96): Principles of Statistical Inference from a Neo-Fisherian Perspective. Advanced Series on Statistical Science and Applied Probability, Vol. 4, World Scientific, Singapore.

Detailed References: 1.1 The theory of statistical inference

*PS01; *PS96/PS97 (see also the references in PS96/PS97, Section 1.6).
Barnett, V. (1999). Comparative Statistical Inference. Third edition, Wiley, New York.
Breiman, L. (2001). Statistical Modeling: the two cultures. Stat. Sci., 16, 199-231.
Cox, D.R. (2006). Principles of Statistical Inference. Cambridge Univ. Press, Ch. 1.
Cox, D.R. and Hinkley, D.V. (1974). Theoretical Statistics. Chapman and Hall, London.
Davison, A.C. (2003). Statistical Models. Cambridge Univ. Press.
Welsh, A. (1996). Aspects of Statistical Inference. Wiley, New York.
Young, G.A. and Smith, R.L. (2005). Essentials of Statistical Inference. Cambridge Univ. Press.

Detailed References: 1.2 Model specification (data variability)

*PS01: Section 1.6. *PS96/PS97: see also further references in PS96, Section 1.6, in particular Cox (1990) and Lehmann (1990), Stat. Sci.
Burnham, K.P. and Anderson, D.R. (2002). Model Selection and Multimodel Inference. Springer, New York.

Detailed References: 1.3 Problems of distribution (statistical variability)

PS96/PS97: Sections 2.2 and 2.10 (statistics and combinants). PS01: Sections 0.7 and 2.7 (simulation).
Severini, T.A. (2005). Elements of Distribution Theory. Cambridge University Press, Cambridge.
van der Vaart, A.W. (1998). Asymptotic Statistics. Cambridge Univ. Press; see Section 3.1 for the delta method.
Davison, A.C. and Hinkley, D.V. (1997). Bootstrap Methods and their Application. Cambridge University Press, Cambridge.
Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap. Chapman and Hall, London.

Detailed References: 1.3 (continued)

Young, G.A. and Smith, R.L. (2005). Essentials of Statistical Inference. Cambridge Univ. Press, Ch. 11 on bootstrap methods.
Statistical Science (2003), 18, n. 2. Special issue on the Silver Anniversary of the Bootstrap.

Detailed References: Convergence of sums of r.v.'s

*PS96: Appendix C.
Serfling, R.J. (1981). Approximation Theorems of Mathematical Statistics. Wiley, New York, Section 1.9.
van der Vaart, A.W. (1998). Asymptotic Statistics. Cambridge Univ. Press, Ch. 2.
Billingsley, P. (1986). Probability and Measure. Wiley, New York.

Detailed References: Empirical distribution function

*PS01; *PS96/PS97: Section 3.6.
Serfling, R.J. (1981). Approximation Theorems of Mathematical Statistics. Wiley, New York, Section 2.1.
van der Vaart, A.W. (1998). Asymptotic Statistics. Cambridge Univ. Press, Ch. 19.

Detailed References: Delta method

*PS96: Appendix; PS97: Appendix A.
van der Vaart, A.W. (1998). Asymptotic Statistics. Cambridge Univ. Press, Section 3.1.

1.1.1 Statistical models

Fundamental assumption: the observed data $y^{obs}$, often of the form $y^{obs} = (y_1^{obs}, \ldots, y_n^{obs})$, with $y_i^{obs}$ an observation on the $i$-th observed unit, are the realization of a random vector $Y$ (or, more generally, of a stochastic process) whose probability distribution is (partly) unknown. Data are used to reconstruct the distribution of $Y$.

Dual role of probability:
1. descriptive: modelling data variability;
2. epistemological: quantification of the uncertainty of inductions.

Statistical models

The way in which the fundamental assumption conforms to the observations being studied can vary considerably.

In the following, we formally regard $y^{obs}$ as a realization of $Y \sim p_0(y)$, $y \in \mathcal{Y}$, where $p_0(y)$ denotes the unknown probability density function (p.d.f.), with respect to a suitable measure, and where $\mathcal{Y}$ is the sample space.

Aim: to reconstruct $p_0(y)$ on the basis of both the data and suitable assumptions and, possibly, on the grounds of previous information, for concise description, interpretation and prediction.

Statistical models

$p_0(y)$: probability model.

Previous information suggests a family $\mathcal{F}$ of probability distributions which are, at least qualitatively, compatible with $y^{obs}$.

$\mathcal{F}$: statistical model. If $p_0(\cdot) \in \mathcal{F}$, the statistical model is correctly specified; otherwise the model is said to be misspecified.

Parametric statistical model: $\mathcal{F} = \{p(y; \theta),\ \theta \in \Theta \subseteq \mathbb{R}^p\}$. If $\mathcal{F}$ is correctly specified, $p_0(y) = p(y; \theta_0)$ for a value $\theta_0 \in \Theta$, called the true parameter value.

In applications, $\mathcal{F}$ is usually considered as an approximation, expected to be adequate for the aims of the research.

Statistical models

The probability model has been defined above using the p.d.f. $p(y)$. In some contexts, the specification could be in terms of other functions. For instance, if $Y$ is a univariate random variable (r.v.):

- its distribution function (d.f.) $F(y) = P(Y \leq y)$;
- its moment generating function $M(t) = E\{\exp(tY)\}$ or its cumulant generating function $K(t) = \log M(t)$;
- with a continuous non-negative r.v. which describes a lifetime, the failure rate $r(y) = p(y)/\{1 - F(y)\}$. Recall: $r(y)\,dy = P(Y \in (y, y + dy) \mid Y > y)$. If $Y$ has failure rate $r(y)$, the p.d.f. of $Y$ is

$p(y) = r(y) \exp\left\{-\int_0^y r(t)\,dt\right\}$.  (1)

In the following, notations such as $p_Y(\cdot)$, $F_Y(\cdot)$, $r_Y(\cdot)$ will be used when necessary.
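Relation (1) lends itself to a quick numerical check. Below is a minimal sketch (not from the slides), assuming a Weibull failure rate purely as an illustration; the shape k and scale lam are arbitrary choices.

```python
# A minimal numerical check of relation (1): recover the p.d.f. from the
# failure rate r(y) and compare with the known Weibull density.
import numpy as np
from scipy.integrate import cumulative_trapezoid
from scipy.stats import weibull_min

k, lam = 1.5, 2.0
y = np.linspace(1e-6, 10.0, 2000)

r = (k / lam) * (y / lam) ** (k - 1)           # Weibull failure rate r(y)
H = cumulative_trapezoid(r, y, initial=0.0)    # cumulative hazard int_0^y r(t) dt
p_from_r = r * np.exp(-H)                      # relation (1)

p_true = weibull_min.pdf(y, c=k, scale=lam)
print(np.max(np.abs(p_from_r - p_true)))       # small discretization error only
```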

1.1.2 Paradigms of inference

A single, agreed philosophy of inference is not available; there are, instead, some broad frameworks for interpreting inference.

Four general views, or paradigms, of statistical inference are distinguished here, using a schematization which is, of course, reductive. The essential differences relate to the interpretation of probability and to the objectives of statistical inference.

In order to grasp such differences, we need to recall two fundamental ideas: Bayesian modelling and the repeated sampling principle.

Bayesian models

Let $\mathcal{F}$ be a (parametric) statistical model indexed by $\theta$ taking values in $\Theta$. Suppose we are able to summarize beliefs about $\theta$ through a prior density $\pi(\theta)$.

Let us write $p(y \mid \theta)$ instead of $p(y; \theta)$, the conditional notation underlining that the model describes the distribution of the data $y$ given $\theta$.

By Bayes' theorem, information about $\theta$ is updated by $y$ according to

$\pi(\theta \mid y) = \dfrac{\pi(\theta)\, p(y \mid \theta)}{\int_\Theta \pi(\theta)\, p(y \mid \theta)\, d\theta}$,

giving the posterior density for $\theta$ given $y$.
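As a small sketch of the update in practice, here is a grid computation of the posterior for a binomial model with a Beta(2, 2) prior; the data (7 successes in 20 trials) and the prior are illustrative assumptions, not taken from the slides.

```python
# Posterior on a grid, normalized by numerical integration, then verified
# against the conjugate closed form Beta(2 + y, 2 + n - y).
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import beta, binom

n, y = 20, 7
theta = np.linspace(1e-4, 1.0 - 1e-4, 1000)    # grid over Theta = (0, 1)

prior = beta.pdf(theta, 2, 2)                  # pi(theta)
lik = binom.pmf(y, n, theta)                   # p(y | theta)
post = prior * lik
post /= trapezoid(post, theta)                 # divide by the integral

print(np.max(np.abs(post - beta.pdf(theta, 2 + y, 2 + n - y))))  # tiny
```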

Repeated sampling principle

The inferences we draw from $y$ should be assessed by their behaviour in hypothetical repetitions, under the same conditions, of the experiment which generated the data $y$.

Whenever we evaluate the mean squared error of an estimator, the p-value of a test, the power of a level $\alpha$ test, the level of a confidence region, ..., we are adopting the repeated sampling principle.

In Bayesian views, inference is conditional upon the observed data $y$ and the repeated sampling principle is not accepted.

Paradigms of inference

- Subjectivist Bayesian (Bayes, Laplace, ..., Ramsey, De Finetti, ...). Probability: the subject's state of knowledge.
- Fisherian ((K. Pearson), Fisher, ..., Cox, ...). Probability: objective, with experimental interpretation.
- Frequentist-decision (Neyman, E. Pearson, ..., Wald, Lehmann, Ferguson, ...). Probability: objective, with experimental interpretation.
- Non-personalistic Bayesian (Jeffreys, ..., Zellner, ...). Probability: representing prior ignorance.

Paradigms of inference

- Subjectivist Bayesian. Inference: formalization of how knowledge changes in the light of the data (Bayes' theorem); aim: give probability to the elements of $\mathcal{F}$ ("causes").
- Fisherian. Inference: emphasis on the likelihood (sufficiency); uncertainty of inference evaluated through the repeated sampling principle plus conditioning (relevance of probability).
- Frequentist-decision. Inference: decision problems (optimality of tests, estimators, etc.); repeated sampling principle, with NO conditioning. ROBUST METHODS, BOOTSTRAP, ASYMPTOTICS.
- Non-personalistic Bayesian. Inference: use of non-informative prior distributions.

1.1.3 The problems of statistical inference in the Fisherian paradigm

Fisher (1922a): the aim of a statistical analysis is to summarize the data $y^{obs}$ by means of the reconstruction of $p_0(y)$. Three classes of problems:

- of specification: linked to the identification of a statistical model $\mathcal{F}$ appropriate for $y^{obs}$ (ideally, $p_0(y) \in \mathcal{F}$ ...);
- of estimation (of inference): finding statistical procedures able to locate $p_0(y)$ within $\mathcal{F}$ or with the help of $\mathcal{F}$. With $\mathcal{F}$ correctly specified, the reconstruction of $p_0(y)$ will usually be all the easier the less mathematically complex $\mathcal{F}$ is; this class also includes finding procedures which are appropriate for giving indications on the plausibility of the assumption $p_0(y) \in \mathcal{F}$;
- of distribution: evaluation of how sensitive the reconstruction of $p_0(y)$ is to the fact that the data used are only a sample; in general, the reconstruction of $p_0(y)$ will be more effective the smaller the extension of $\mathcal{F}$ is.

Three classes of problems

The three classes of problems are closely interlinked. They should not be understood as corresponding to successive phases in data analysis; rather, they should be understood as logical moments along a necessarily iterative path (think e.g. of the analysis of residuals in a linear regression model, possibly followed by the specification of a better model).

In applications, a statistical model is usually considered as an approximation (no one expects it to capture $p_0(y)$ accurately); rather, it is required to be adequate for the aims of the research.

1.2 Model specification (data variability)

Model specification is very important, and usually affects the conclusions more than the inference paradigm adopted does. However, the theory of statistical inference traditionally lacks explicit indications on this point (for Fisher, a matter of applied statistics). Some guidelines, based on common sense:

- nature of the data being examined: qualitative (nominal, ordinal) or quantitative (discrete, continuous) variables, functions, images, etc.; variables could be subdivided into subsets, for example into response and explanatory variables; the model must respect both the support and the role of the variables;
- information about the observation scheme: e.g. random sampling (an idealization), randomization, censoring or other models for missing data, sequential sampling, time and space dependence.

Model specification (data variability)

- what aspects of the data should the model be able to capture? e.g. the centre of the distribution, unimodality or bimodality, dependence on explanatory variables, etc., together with complementary aspects such as dispersion, asymmetry, heteroscedasticity, etc.;
- a statistical model should be able to describe succinctly the aspects that are of primary interest and must also be sufficiently flexible to allow a realistic description of additional aspects.

1.2.1 Levels of specification

Depending on the information available, it may be deemed appropriate to extend the statistical model $\mathcal{F}$ to a greater or lesser degree. The following three levels of specification can be outlined.

Parametric specification: $\mathcal{F} = \{p(y; \theta),\ \theta \in \Theta \subseteq \mathbb{R}^p\}$. If the model is correctly specified, $p_0(y) = p(y; \theta_0)$ for a value $\theta_0 \in \Theta$, called the true parameter value.

Semiparametric specification: $\mathcal{F} = \{p(y; \theta),\ \theta \in \Theta\}$ with $\theta = (\tau, h(\cdot))$, $\tau \in \mathcal{T} \subseteq \mathbb{R}^k$, whereas the set of possible specifications of the function $h(\cdot)$ cannot be indexed by a finite number of real parameters.

Examples of semiparametric models:

a) the class of continuous symmetric distributions on $\mathbb{R}$, with density of the form $p(y; \mu) = p_0(y - \mu)$, with $\mu \in \mathbb{R}$ and $p_0(\cdot)$ an unknown probability density symmetric around the origin;

b) the usual linear regression model of $Y$ on $x$, $E(Y_i) = \alpha + \beta x_i$, where the $Y_i$, $i = 1, \ldots, n$, are independent with a common variance $\sigma^2$, the $x_i$ are $n$ known constants, $\tau = (\alpha, \beta, \sigma^2)$, and the distribution of $Y$ is not further specified;

c) Cox proportional hazards model (Cox, 1972): $Y_i$ is the lifetime of the $i$-th unit and $x_i$ a $k$-dimensional explanatory variable. This model is specified through the failure rate function as

$r_{Y_i}(y_i) = r_0(y_i) \exp\{\beta^\top x_i\}$,  (2)

where $r_0(\cdot)$ is an unknown function, called the baseline hazard function, and $\beta = (\beta_1, \ldots, \beta_k)$ is a vector of unknown regression coefficients ($\beta^\top x_i$ is a scalar product).

Levels of specification

Nonparametric specification: the model $\mathcal{F}$ is a restriction of the family of all probability distributions defined on a support suitable for the data under analysis. Its elements cannot be indexed by a finite number of parameters that are the primary subject of inference. E.g., with data $y = (y_1, \ldots, y_n)$, a possible nonparametric model is given by the family $\mathcal{F}$ made up of distributions with independent and identically distributed components.

The specification of a model is usually the product of an iterative process. The choice between competing models can be made through informal and formal tools, such as plots, analysis of residuals, selection procedures and tests of goodness-of-fit.

1.2.2 Notes on the specification of a parametric model

Here, a direct comparison is required between the knowledge available regarding the mechanism that has generated the data and the probabilistic genesis, exact or asymptotic, of the various families of distributions. Examples: the binomial distribution arises from independent trials with constant probability of success in each trial; the exponential distribution from lack of memory (constant failure rate).

Characterization results for a parametric family $\mathcal{F}$ can also be helpful (a characterization is a necessary and sufficient condition for the density $p(y)$ to belong to $\mathcal{F}$). See e.g. characterization results for the normal or exponential distributions or for the Poisson process.

Asymptotic arguments: normal distribution, extreme value distributions, stable distributions, ...

1.3 Problems of distribution (statistical variability)

Usually, to solve an inference problem, suitable data reductions are used: data $y \to$ statistic $t(y)$, or, more generally, $y \to$ combinant $q(y; \theta)$, $\theta \in \Theta$.

$y$ realization of $Y$ $\Rightarrow$ $t(y)$ realization of $T = t(Y)$ $\Rightarrow$ induced model $\mathcal{F}_T = \{p_T(t; \theta),\ \theta \in \Theta\}$.

For a combinant, $Q_{\theta^*} = q(Y; \theta^*)$, $\mathcal{F}_{Q_{\theta^*}} = \{p_Q(q; \theta, \theta^*),\ \theta \in \Theta\}$, with null distribution for $\theta = \theta^*$.

To solve a distribution problem is the same as to obtain $\mathcal{F}_T$ or $\mathcal{F}_{Q_{\theta^*}}$, $\theta^* \in \Theta$, or some of their elements.

1.3 Problems of distribution (statistical variability)

Special cases:

- Distribution constant statistic: a statistic whose associated statistical model has only one element, that is, such that the distribution of $T$ does not depend on $\theta \in \Theta$.
- First-order distribution constant statistic: a statistic $t$ such that its expectation $E_\theta(t(Y))$ does not depend on $\theta \in \Theta$.
- Pivotal quantity: a combinant whose null distribution does not depend on $\theta^* \in \Theta$.

1.3.1 How do we solve a distribution problem?

Exact methods: laws of functions of r.v.'s (Probability Theory).

Approximate methods:
- simulation (Monte Carlo methods, bootstrap);
- asymptotic approximations based on limit theorems of Probability Theory.

1.3.2 Multivariate normal distributions

Bivariate normal distributions. Hierarchical definition. Consider first the case with standardized components. Let $(U, V)$ be a random vector with $U \sim N(0, 1)$ and $V \mid U = u \sim N(\rho u, 1 - \rho^2)$, where $-1 < \rho < 1$ and $u \in \mathbb{R}$. Since the support of $U$ is $S_U = \mathbb{R}$ and the conditional support of $V$ given $U = u$ is $\mathbb{R}$, the joint support of $(U, V)$ is $S_{U,V} = \mathbb{R}^2$. Moreover, $p_{U,V}(u, v) = p_U(u)\, p_{V \mid U = u}(v)$ is equal to

$p_{U,V}(u, v) = \dfrac{1}{2\pi\sqrt{1 - \rho^2}} \exp\left\{-\dfrac{1}{2(1 - \rho^2)}(u^2 + v^2 - 2\rho u v)\right\}$.

$(U, V)$ is then said to have a bivariate normal distribution with standardized components and parameter $\rho$. The components $U$ and $V$ are independent if and only if $\rho = 0$.
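A simulation sketch of the hierarchical definition (not from the slides): draw $U \sim N(0,1)$, then $V \mid U = u \sim N(\rho u, 1 - \rho^2)$, and check the implied marginal and correlation. The value of rho and the Monte Carlo size R are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
rho, R = 0.7, 200_000

u = rng.standard_normal(R)
v = rng.normal(loc=rho * u, scale=np.sqrt(1.0 - rho**2))

print(v.mean(), v.var())           # close to 0 and 1: marginally V ~ N(0, 1)
print(np.corrcoef(u, v)[0, 1])     # close to rho
```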

Bivariate normal distributions

Support and joint density are symmetric with respect to the interchange of $u$ and $v$. Thus, also $V \sim N(0, 1)$ and $U \mid V = v \sim N(\rho v, 1 - \rho^2)$.

The parameter $\rho$ is the linear correlation coefficient of $U$ and $V$. Indeed, it is easy to check that $Cov(U, V) = E(UV) = E_U(U \mu_V(U)) = \rho$, where $\mu_V(U) = E(V \mid U)$ is the regression function of $V$ on $U$.

The moment generating function of $(U, V)$ is

$M_{U,V}(t_1, t_2) = E(\exp\{t_1 U + t_2 V\}) = \exp\left\{\tfrac{1}{2}(t_1^2 + t_2^2 + 2\rho t_1 t_2)\right\}$.

All linear combinations $T = aU + bV$ with $a^2 + b^2 \neq 0$ have a normal distribution. Indeed,

$M_T(t) = M_{U,V}(at, bt) = \exp\left\{\tfrac{1}{2}(a^2 + b^2 + 2\rho a b)\, t^2\right\}$.

Bivariate normal distributions

The general bivariate normal distribution is obtained by allowing non-standardized marginal components. In particular, we call bivariate normal with mean vector $\mu = (\mu_X, \mu_Y)$ and covariance matrix $\Sigma = [\sigma_{ij}]$, $i, j = 1, 2$, where $\sigma_{11} = \sigma_X^2$, $\sigma_{22} = \sigma_Y^2$, $\sigma_{12} = \sigma_{21} = \rho\sigma_X\sigma_Y$ (with $\mu_X, \mu_Y \in \mathbb{R}$; $\sigma_X^2, \sigma_Y^2 > 0$; $-1 < \rho < 1$), the probability distribution of $(X, Y)$, where

$X = \mu_X + \sigma_X U, \qquad Y = \mu_Y + \sigma_Y V$,

with $(U, V)$ bivariate normal with standardized components. Notation: $N_2(\mu, \Sigma)$. Density:

$p_{X,Y}(x, y) = c\, \exp\left\{-\dfrac{1}{2(1 - \rho^2)}\left[\left(\dfrac{x - \mu_X}{\sigma_X}\right)^2 + \left(\dfrac{y - \mu_Y}{\sigma_Y}\right)^2 - 2\rho\, \dfrac{x - \mu_X}{\sigma_X}\, \dfrac{y - \mu_Y}{\sigma_Y}\right]\right\}$,

with $c = (2\pi\sigma_X\sigma_Y\sqrt{1 - \rho^2})^{-1}$.

Bivariate normal distributions

In matrix notation,

$p_{X,Y}(z) = \dfrac{1}{2\pi|\Sigma|^{1/2}} \exp\left\{-\dfrac{1}{2}(z - \mu)^\top\Sigma^{-1}(z - \mu)\right\}$,

where $z = (x, y)^\top$ and $|\Sigma|$ is the determinant of $\Sigma$.

The moment generating function of $(X, Y) \sim N_2(\mu, \Sigma)$ is

$M_{X,Y}(t_1, t_2) = \exp\{t_1\mu_X + t_2\mu_Y\}\, M_{U,V}(\sigma_X t_1, \sigma_Y t_2) = \exp\left\{t^\top\mu + \tfrac{1}{2}t^\top\Sigma t\right\}$.

Since $M_X(t_1) = M_{X,Y}(t_1, 0)$, we have $X \sim N(\mu_X, \sigma_X^2)$. Similarly, since $M_Y(t_2) = M_{X,Y}(0, t_2)$, $Y \sim N(\mu_Y, \sigma_Y^2)$ as well.

Bivariate normal distributions

More generally, if $T = aX + bY$ with $a^2 + b^2 \neq 0$, we have $M_T(t) = M_{X,Y}(at, bt)$, so that $T \sim N(a\mu_X + b\mu_Y,\ a^2\sigma_X^2 + b^2\sigma_Y^2 + 2ab\rho\sigma_X\sigma_Y)$ and $Cov(X, Y) = \rho\sigma_X\sigma_Y$. The interpretation of the parameters $\mu$ and $\Sigma$ given in the definition of $N_2(\mu, \Sigma)$ is thus fully justified.

Vice versa, let $(X, Y)$ be a two-dimensional r.v. with $E(X) = \mu_X$, $E(Y) = \mu_Y$, $Var(X) = \sigma_X^2 > 0$, $Var(Y) = \sigma_Y^2 > 0$ and $Cov(X, Y) = \rho\sigma_X\sigma_Y$, with $|\rho| < 1$, such that for all real $a$ and $b$ ($a^2 + b^2 \neq 0$) the distribution of $T = aX + bY$ is univariate normal. Then, from the expression of $M_T(t)$, one concludes that $(X, Y) \sim N_2(\mu, \Sigma)$.

Bivariate normal distributions

Writing the joint density as

$p_{X,Y}(x, y) = p_X(x)\, \dfrac{1}{\sqrt{2\pi}\,\sigma_Y\sqrt{1 - \rho^2}} \exp\left\{-\dfrac{1}{2\sigma_Y^2(1 - \rho^2)}\left[y - \mu_Y - \rho\dfrac{\sigma_Y}{\sigma_X}(x - \mu_X)\right]^2\right\}$,

we have $Y \mid X = x \sim N\left(\mu_Y + \rho\dfrac{\sigma_Y}{\sigma_X}(x - \mu_X),\ \sigma_Y^2(1 - \rho^2)\right)$.

Similarly, $X \mid Y = y \sim N\left(\mu_X + \rho\dfrac{\sigma_X}{\sigma_Y}(y - \mu_Y),\ \sigma_X^2(1 - \rho^2)\right)$.

Bivariate normal distributions

The regression functions of $Y$ on $X$ and of $X$ on $Y$ are straight lines, with intercept and slope equal to the usual values of linear regression expressed in terms of the moments of $(X, Y)$ up to second order. Moreover, the conditional distributions have constant variance, equal to the unexplained variance in linear regression.

Multivariate normal distributions

We call the $d$-dimensional multivariate normal distribution with mean vector $\mu$ ($\mu \in \mathbb{R}^d$) and covariance matrix $\Sigma$ (square of order $d$, symmetric and positive definite) the distribution of $Y = (Y_1, \ldots, Y_d)$ characterized by the following property: for any vector $a = (a_1, \ldots, a_d) \in \mathbb{R}^d$ with $\sum_{i=1}^d a_i^2 > 0$, the distribution of the linear combination $T = a^\top Y = \sum_{i=1}^d a_i Y_i$ is univariate normal,

$T \sim N(a^\top\mu,\ a^\top\Sigma a)$.

If $Y$ has a multivariate normal distribution with the parameters defined above, we write $Y \sim N_d(\mu, \Sigma)$. Clearly, $Y$ has support $S_Y = \mathbb{R}^d$.

Multivariate normal distributions

If $Z = (Z_1, \ldots, Z_d)$ has independent components $Z_i \sim N(0, 1)$, then $Z \sim N_d(0, I_d)$, where $I_d = \mathrm{diag}(1, \ldots, 1)$ is the identity matrix of order $d$.

More generally, if $Z \sim N_d(0, I_d)$, $a \in \mathbb{R}^k$ with $1 \leq k \leq d$ and $B$ is a $k \times d$ matrix with rank $k$, then $a + BZ \sim N_k(a, BB^\top)$, because linear combinations of the components of $a + BZ$ have a univariate normal distribution. Multivariate normality is thus preserved under affine transformations.

Multivariate normal distributions

As a special case of the previous result, with $k = d$, $a = \mu$ and $B$ such that $BB^\top = \Sigma$, we have $Y \sim N_d(\mu, \Sigma)$ if $Y = \mu + BZ$. Using well-known results about positive definite matrices, a possible choice for $B$ is $B = V\Lambda^{1/2}V^\top$, where $V$ is an orthogonal matrix ($V^\top V = I_d$) of eigenvectors of $\Sigma$, $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_d)$ is the diagonal matrix of the corresponding eigenvalues, and $\Lambda^{1/2} = \mathrm{diag}(\sqrt{\lambda_1}, \ldots, \sqrt{\lambda_d})$.

The transformation $y = \mu + Bz$ is invertible, with inverse $z = B^{-1}(y - \mu) = (V\Lambda^{1/2}V^\top)^{-1}(y - \mu) = V\Lambda^{-1/2}V^\top(y - \mu)$ and Jacobian determinant $|J(y)| = |V\Lambda^{-1/2}V^\top| = |\Lambda^{-1/2}| = |\Sigma|^{-1/2}$, because $|\det V| = 1$ and $|\Lambda| = |\Sigma|$.

The p.d.f. of $Z$ is $p_Z(z) = (2\pi)^{-d/2}\exp\{-z^\top z/2\}$.
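A sketch of this construction in code (not from the slides): sample $Y \sim N_d(\mu, \Sigma)$ via $Y = \mu + BZ$ with $B = V\Lambda^{1/2}V^\top$ from the spectral decomposition of $\Sigma$; the values of mu, Sigma and the sample size are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.0, 0.2],
                  [0.3, 0.2, 0.5]])

lam, V = np.linalg.eigh(Sigma)            # Sigma = V diag(lam) V'
B = V @ np.diag(np.sqrt(lam)) @ V.T       # symmetric square root: B B' = Sigma

Z = rng.standard_normal((100_000, 3))     # rows are independent N_3(0, I_3) draws
Y = mu + Z @ B.T                          # affine transformation, row by row

print(Y.mean(axis=0))                     # close to mu
print(np.cov(Y, rowvar=False))            # close to Sigma
```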

Multivariate normal distributions

Hence, $Y \sim N_d(\mu, \Sigma)$ has p.d.f.

$p_Y(y; \mu, \Sigma) = \dfrac{1}{(2\pi)^{d/2}|\Sigma|^{1/2}} \exp\left\{-\dfrac{1}{2}(y - \mu)^\top\Sigma^{-1}(y - \mu)\right\}$.  (3)

Indeed, $z^\top z = \left(B^{-1}(y - \mu)\right)^\top B^{-1}(y - \mu) = (y - \mu)^\top (B^{-1})^\top B^{-1}(y - \mu) = (y - \mu)^\top (B^\top)^{-1} B^{-1}(y - \mu) = (y - \mu)^\top (BB^\top)^{-1}(y - \mu) = (y - \mu)^\top\Sigma^{-1}(y - \mu)$.

An important consequence: $(Y - \mu)^\top\Sigma^{-1}(Y - \mu) = Z^\top Z \sim \chi^2_d$.
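The chi-squared consequence can also be checked by Monte Carlo; in the sketch below, mu, Sigma (with $d = 2$) and the simulation size are illustrative choices.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)
mu = np.array([1.0, -1.0])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 2.0]])

Y = rng.multivariate_normal(mu, Sigma, size=100_000)
Sinv = np.linalg.inv(Sigma)
q = np.einsum('ij,jk,ik->i', Y - mu, Sinv, Y - mu)   # quadratic form per draw

print(q.mean())                                      # close to d = 2
print(np.quantile(q, 0.95), chi2.ppf(0.95, df=2))    # matching 95% quantiles
```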

Multivariate normal distributions

If $Y \sim N_d(\mu, \Sigma)$ we have $T = t^\top Y \sim N(t^\top\mu,\ t^\top\Sigma t)$. Hence $Y$ has moment generating function

$M_Y(t) = E(e^{t^\top Y}) = E(e^T) = M_T(1) = \exp\left\{t^\top\mu + \dfrac{1}{2}t^\top\Sigma t\right\}$.

Multivariate normal distributions

The marginal distribution of a component $Y_i$ of $Y \sim N_d(\mu, \Sigma)$ is $N(\mu_i, \sigma_{ii})$. If $d > 2$, the marginal distribution of $(Y_i, Y_j)$, with $i \neq j$, is $N_2((\mu_i, \mu_j), \Sigma_{ij})$, where

$\Sigma_{ij} = \begin{pmatrix} \sigma_{ii} & \sigma_{ij} \\ \sigma_{ji} & \sigma_{jj} \end{pmatrix}$.  (4)

More generally, if $Y \sim N_d(\mu, \Sigma)$ and we consider the partition $Y = (S, T)$, where $S$ has $d_S$ and $T$ has $d_T$ components, with $d_S + d_T = d$, and accordingly the partitions $\mu = (\mu_S, \mu_T)$ and

$\Sigma = \begin{pmatrix} \Sigma_{SS} & \Sigma_{ST} \\ \Sigma_{TS} & \Sigma_{TT} \end{pmatrix}$,

where $\Sigma_{SS} = Var(S)$ is $d_S \times d_S$ and $\Sigma_{TT} = Var(T)$ is $d_T \times d_T$, it can be shown that marginal and conditional distributions are multivariate normal.

Multivariate normal distributions

In particular, $S \sim N_{d_S}(\mu_S, \Sigma_{SS})$ and

$T \mid S = s \sim N_{d_T}(\mu_T(s), \Sigma_{T \mid S})$,

where the regression function of $T$ on $S$ is

$\mu_T(s) = \mu_T + \Sigma_{TS}\Sigma_{SS}^{-1}(s - \mu_S)$,  (5)

so that conditional expectations lie on a hyperplane (regression hyperplane). The conditional covariance matrix is

$\Sigma_{T \mid S} = \Sigma_{TT} - \Sigma_{TS}\Sigma_{SS}^{-1}\Sigma_{ST}$  (6)

and does not depend on the conditioning value; see Proposition A.6 in PS01, p. 397.
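A sketch computing (5) and (6) with numpy and checking them by conditioning a large simulated sample on $S$ being near $s$; the 3-dimensional mu and Sigma (with $S$ the first component and $T$ the last two) and the conditioning value are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 2.0, 0.7],
                  [0.2, 0.7, 1.5]])
iS, iT = [0], [1, 2]                                        # indices of S and T

Sss = Sigma[np.ix_(iS, iS)]
Sts = Sigma[np.ix_(iT, iS)]
Stt = Sigma[np.ix_(iT, iT)]

s = np.array([1.3])                                         # conditioning value
mu_T_s = mu[iT] + Sts @ np.linalg.solve(Sss, s - mu[iS])    # formula (5)
Sigma_T_s = Stt - Sts @ np.linalg.solve(Sss, Sts.T)         # formula (6)

Y = rng.multivariate_normal(mu, Sigma, size=1_000_000)
sel = np.abs(Y[:, 0] - s[0]) < 0.02                         # S approximately s
print(mu_T_s, Y[sel][:, iT].mean(axis=0))                   # means agree
print(Sigma_T_s, np.cov(Y[sel][:, iT], rowvar=False))       # covariances agree
```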

Multivariate normal distributions: transformations

(The first two results were encountered above.) For proofs, see PS01, Section A.6.

Theorem 1. If $Y \sim N_d(\mu, \Sigma)$, $B$ is a $k \times d$ matrix with rank $k$ and $a \in \mathbb{R}^k$, then $T = a + BY \sim N_k(a + B\mu,\ B\Sigma B^\top)$.

Theorem 2. If $Y \sim N_d(\mu, \Sigma)$, then $(Y - \mu)^\top\Sigma^{-1}(Y - \mu) \sim \chi^2_d$.

Multivariate normal distributions: transformations

Theorem 3. If $Z \sim N_d(0, I_d)$ and $P$ is a $d \times d$ matrix which is symmetric and idempotent ($PP = P$) with rank $r$, where $1 \leq r \leq d$, then $T = Z^\top P Z \sim \chi^2_r$.

Theorem 4. Let $Z \sim N_d(0, I_d)$. If $P_1$ and $P_2$ are $d \times d$ symmetric idempotent matrices, with ranks $r_1$ and $r_2$ respectively, and if $P_1 P_2 = O_d$, where $O_d$ is the $d \times d$ matrix with all elements equal to 0, then $T_1 = Z^\top P_1 Z$ and $T_2 = Z^\top P_2 Z$ are independent.

Theorem 5. Let $Z \sim N_d(0, I_d)$. If $A$ is a $k \times d$ matrix with $k < d$ and $P$ is a $d \times d$ symmetric idempotent matrix with rank $r$, and if $AP = O$, where $O$ is the $k \times d$ matrix with all elements equal to 0, then $T_1 = AZ$ and $T_2 = Z^\top P Z$ are independent.
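A quick simulation sketch of Theorem 3 (not from the slides), using the projection matrix $P = X(X^\top X)^{-1}X^\top$ of a full-rank $d \times r$ matrix $X$, which is symmetric, idempotent and of rank $r$; $X$ itself is just a random illustrative choice.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(4)
d, r = 6, 2
X = rng.standard_normal((d, r))
P = X @ np.linalg.solve(X.T @ X, X.T)              # symmetric idempotent, rank r

print(np.allclose(P @ P, P))                       # idempotency check: True
Z = rng.standard_normal((200_000, d))
T = np.einsum('ij,jk,ik->i', Z, P, Z)              # T = Z' P Z for each draw

print(T.mean())                                    # close to r = 2
print(np.quantile(T, 0.95), chi2.ppf(0.95, df=r))  # matching 95% quantiles
```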

1.3.3 Convergence of sums of r.v.'s

In the following, $\{Y_i\}$, $i = 1, 2, \ldots$, denotes a sequence of random variables, $\{S_n\}$ the corresponding sequence of sums, $S_n = \sum_{i=1}^n Y_i$, and $\{\bar{Y}_n\}$ the sequence of sample means, $\bar{Y}_n = S_n/n$.

Sums of i.i.d. random variables. a) Laws of large numbers.

Theorem 6 (Khintchine's weak law). Let $\{Y_i\}$ be a sequence of independent and identically distributed random variables with finite expectation $E(Y_i) = \mu$. Then $\bar{Y}_n \xrightarrow{p} \mu$.

Theorem 7 (Kolmogorov's strong law (I)). Let $\{Y_i\}$ be a sequence of independent and identically distributed random variables. Then $\bar{Y}_n \xrightarrow{a.s.} \mu$ if and only if the expectation $E(Y_i)$ is finite and equal to $\mu$.

Central limit theorems

b) Central limit theorems.

Theorem 8 (The finite variance case: Lindeberg-Lévy theorem). Let $\{Y_i\}$ be a sequence of independent and identically distributed random variables with expectation $\mu$ and finite variance $\sigma^2 > 0$. Then

$\dfrac{\sqrt{n}(\bar{Y}_n - \mu)}{\sigma} \xrightarrow{d} N(0, 1)$.

Theorem 9 (Multivariate central limit theorem). Let $\{Y_i\}$ be a sequence of $d$-dimensional independent and identically distributed random variables with mean vector $\mu = [\kappa_r]$ and finite covariance matrix $\Sigma = [\kappa_{r,s}]$, $r, s = 1, \ldots, d$, with $\kappa_{r,r} < +\infty$ for $r = 1, \ldots, d$. Then, denoting by $\bar{Y}_n$ the vector of the sample means, $\sqrt{n}(\bar{Y}_n - \mu) \xrightarrow{d} N_d(0, \Sigma)$.
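A minimal simulation sketch of Theorem 8 (not from the slides): standardized means of i.i.d. Exp(1) draws, an arbitrary non-normal parent with $\mu = \sigma = 1$, approach the $N(0,1)$ distribution; n and R are illustrative choices.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
n, R = 200, 20_000
mu, sigma = 1.0, 1.0

Ybar = rng.exponential(1.0, size=(R, n)).mean(axis=1)
W = np.sqrt(n) * (Ybar - mu) / sigma          # standardized sample means

# Simulated tail probability vs. the standard normal one.
print((W > 1.96).mean(), 1.0 - norm.cdf(1.96))
```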

Comments

Laws of large numbers and central limit theorems also extend to sequences of independent but not identically distributed random variables, and even to sequences of dependent random variables. The extensions that are most important for the study of the asymptotic properties of likelihood quantities are those related to martingales; for the main results and further references, see e.g. Hall (1985) and Andersen, Borgan, Gill and Keiding (1993, Chapter 2).

Smooth Functions of Converging Sequences

a) Convergence in probability.

Theorem 10. Let $\{Y_n\}$, $n = 1, 2, \ldots$, be a sequence of random variables which converges in probability to a constant $c$. If $g(\cdot)$ is a continuous function defined on the support of $Y_n$, then $g(Y_n) \xrightarrow{p} g(c)$.

b) Convergence in distribution.

Theorem 11. Let $\{Y_n\}$, $n = 1, 2, \ldots$, be a sequence of random variables such that

$\sqrt{n}(Y_n - \theta) \xrightarrow{d} U$,

with $\theta \in \mathbb{R}$ and $U$ a non-degenerate random variable. Furthermore, let $g(\theta)$ be a twice differentiable real function with $g'(\theta) \neq 0$. Then

$\sqrt{n}(g(Y_n) - g(\theta)) \xrightarrow{d} g'(\theta)\, U$.

1.3.4 Empirical distribution function

Let $y = (y_1, \ldots, y_n)$ be a random sample of size $n$ from a univariate random variable with unknown distribution function $F_0(\cdot)$. For the statistical model specified in the nonparametric form

$\mathcal{F} = \left\{F_Y(y) : F_Y(y) = \prod_{i=1}^n F(y_i),\ F(\cdot) \text{ a distribution function on } \mathbb{R}\right\}$,  (7)

the empirical distribution function

$\hat{F}_n(u) = \dfrac{1}{n}\sum_{i=1}^n I_{(-\infty, u]}(y_i)$  (8)

is a minimal sufficient statistic (see also Example 2.3 in PS96/PS97).

Basic Properties

$n\hat{F}_n(u) \sim Bi(n, F_0(u))$, so that $E_0(\hat{F}_n(u)) = F_0(u)$ and $Var_0(\hat{F}_n(u)) = \dfrac{1}{n}F_0(u)(1 - F_0(u))$. The subscript in $E_0(\cdot)$, $Var_0(\cdot)$ and in similar expressions indicates evaluation with respect to $F_0(\cdot)$. Moreover,

$Cov_0\left(\hat{F}_n(u), \hat{F}_n(v)\right) = \dfrac{1}{n}\left\{\min[F_0(u), F_0(v)] - F_0(u)F_0(v)\right\}$.  (9)

Basic properties

By the strong law of large numbers, $\hat{F}_n(u)$ converges almost surely (hence also in probability) to $F_0(u)$ for every fixed value of $u$, as $n \to +\infty$. The following stronger result holds. Let $D_n = \sup_{u \in \mathbb{R}} |\hat{F}_n(u) - F_0(u)|$; then (Glivenko-Cantelli theorem)

$P_0\left(\lim_{n \to +\infty} D_n = 0\right) = 1$.

If, furthermore, $F_0(u)$ is continuous, then for every choice of fixed real values $u_1, \ldots, u_k$, the $k$-dimensional random variable with components $\sqrt{n}(\hat{F}_n(u_j) - F_0(u_j))$, $j = 1, \ldots, k$, converges in distribution to a $k$-dimensional normal distribution $N_k(0, [\sigma_{jh}])$, with $\sigma_{jh} = \min(F_0(u_j), F_0(u_h)) - F_0(u_j)F_0(u_h)$, $j, h = 1, \ldots, k$. This variable is non-degenerate provided $u_1, \ldots, u_k$ are inner points of the support of $F_0(\cdot)$.
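A sketch of the e.d.f. and of the Glivenko-Cantelli statistic $D_n$ in code (not from the slides), for samples from a known $F_0$, here standard normal as an illustrative choice: the supremum distance shrinks as $n$ grows.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)

def sup_distance(y):
    """D_n = sup_u |F_n(u) - F_0(u)|, attained at the jump points of F_n."""
    y = np.sort(y)
    n = len(y)
    F0 = norm.cdf(y)
    upper = np.arange(1, n + 1) / n     # F_n just after each order statistic
    lower = np.arange(0, n) / n         # F_n just before each order statistic
    return max(np.abs(upper - F0).max(), np.abs(lower - F0).max())

for n in (100, 1_000, 10_000, 100_000):
    print(n, sup_distance(rng.standard_normal(n)))
```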

1.3.5 Simulation

Let $t(Y)$ be a scalar statistic and suppose we want to approximate its distribution accurately, under the assumption that $Y$ has (known) distribution $p_0(y)$.

Suppose we can obtain, using a suitable computer program, a sequence of independent realizations $y_r^*$, $r = 1, \ldots, R$, of $Y$ under $p_0(y)$. Then $t_r^* = t(y_r^*)$, $r = 1, \ldots, R$, is a sequence of realizations of $t(Y)$ under $p_0(y)$ and we can compute the e.d.f. $\hat{F}_R(t)$. If $R$ is big enough (e.g. $R = 10000$), $\hat{F}_R(t)$ is a very good approximation of $P(t(Y) \leq t)$ when $Y \sim p_0(y)$.

Suppose e.g. that $t(y)$ is a scalar test statistic and that $p_0(y)$ corresponds to the null hypothesis. If large values of $t$ are significant against the null hypothesis and $y^{obs}$ denotes the observed value of $y$, an approximation of the p-value is given by $1 - \hat{F}_R(t(y^{obs}))$.
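A minimal Monte Carlo p-value sketch in the spirit of this slide (not from it): $t(y)$ is the sample mean, the null $p_0(y)$ is i.i.d. $N(0,1)$, and large values of $t$ count against the null; the "observed" data are a fabricated illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
n, R = 30, 10_000

def t(y):
    return y.mean()

y_obs = rng.normal(0.4, 1.0, size=n)          # illustrative observed sample
t_obs = t(y_obs)

t_star = np.array([t(rng.standard_normal(n)) for _ in range(R)])  # null draws
p_value = np.mean(t_star >= t_obs)            # 1 - F_R(t(y_obs))
print(t_obs, p_value)
```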

Bootstrap (basic formulation)

Let $y = (y_1, \ldots, y_n)$, with $y_1, \ldots, y_n$ independent realizations of a univariate r.v. with unknown d.f. $F_0(\cdot)$. Let $t_n = t(y_1, \ldots, y_n)$ be a scalar statistic computed to make inference about some scalar characteristic $\tau_0 = G(F_0(\cdot))$, e.g. a mean, a variance, ...

For use in frequentist inference, we need the distribution of $T_n = t(Y_1, \ldots, Y_n)$. We cannot use simulation as above because $F_0(\cdot)$ is unknown. The bootstrap combines two principal ideas:

- substitute $F_0(\cdot)$ with a suitable estimate $\hat{F}_0(\cdot)$;
- use Monte Carlo simulation from $\hat{F}_0(\cdot)$.

Bootstrap (basic formulation)

Let $Y^* = (Y_1^*, \ldots, Y_n^*)$, with the $Y_j^*$ i.i.d. with marginal distribution $\hat{F}_0(\cdot)$. Obtain by simulation $R$ realizations of $Y^*$. These are $R$ samples of size $n$, called pseudo-observations or bootstrap samples. On each sample, compute $t$. This gives a random sample of size $R$ from $T_n^* = t(Y_1^*, \ldots, Y_n^*)$, which can be used e.g. to estimate the d.f. of $T_n^*$ using the e.d.f.

Parametric bootstrap: assume $F_0(\cdot) = F(\cdot; \theta_0)$, for a parametric model $\mathcal{F} = \{F(\cdot; \theta),\ \theta \in \Theta \subseteq \mathbb{R}^p\}$, with $\theta_0 \in \Theta$. Obtain an estimate $\hat{\theta}_n$ of $\theta_0$ and use $\hat{F}_0(\cdot) = F(\cdot; \hat{\theta}_n)$.

Nonparametric bootstrap: let $\hat{F}_n(\cdot)$ be the e.d.f. computed on $y$. Use $\hat{F}_0(\cdot) = \hat{F}_n(\cdot)$.

In both cases, under suitable conditions, if $n$ is large enough, the distribution of $T_n^*$ approximates that of $T_n$ under $F_0(\cdot)$.
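A sketch of both bootstrap flavours for the standard error of the sample median (not from the slides); the exponential data-generating model and all sizes are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(8)
n, R = 50, 5_000
y = rng.exponential(2.0, size=n)              # "observed" sample

# Nonparametric bootstrap: resample from the e.d.f., i.e. from y with replacement.
t_np = np.array([np.median(rng.choice(y, size=n, replace=True))
                 for _ in range(R)])

# Parametric bootstrap: exponential model, MLE theta_hat = sample mean,
# then resample from the fitted distribution F(.; theta_hat).
theta_hat = y.mean()
t_p = np.array([np.median(rng.exponential(theta_hat, size=n))
                for _ in range(R)])

print(t_np.std(ddof=1), t_p.std(ddof=1))      # two bootstrap standard errors
```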

1.3.6 Delta method

Under the assumptions of Theorem 11, the following asymptotic expansion holds:

$\sqrt{n}(g(Y_n) - g(\theta)) = \sqrt{n}\, g'(\theta)(Y_n - \theta) + o_p(1) = g'(\theta)\, U + o_p(1)$.

Hence, asymptotically, $g(Y_n)$ has mean $g(\theta)$ and variance $(g'(\theta))^2\, Var(U)/n$.

The local linearization of $g(\cdot)$ in the above theorem and, more generally, local approximations of $g(Y_n)$ based on Taylor expansions, are usually referred to as the delta method (see also Theorems 12 and 13). The delta method is widely used to obtain approximations for the moments of $g(Y_n)$ in terms of the moments of $Y_n$ (cf. Section 8.5 of PS96/PS97).

Univariate delta method

Theorem 12 (Univariate case). Let $\{Y_n\}$, $n = 1, 2, \ldots$, be a sequence of random variables such that $\sqrt{n}(Y_n - \theta) \xrightarrow{d} N(0, \sigma^2)$, with $\theta \in \mathbb{R}$. Furthermore, let $g(\cdot)$ be a twice differentiable real function defined on $\mathbb{R}$ such that $g'(\theta) \neq 0$. Then

$\sqrt{n}(g(Y_n) - g(\theta)) \xrightarrow{d} N(0, (g'(\theta))^2\sigma^2)$.
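A simulation sketch of Theorem 12 (not from the slides): $Y_n$ is the mean of $n$ Exp(1) draws ($\theta = 1$, $\sigma^2 = 1$) and $g(y) = \log y$, so $g'(\theta) = 1$ and the limit is $N(0, 1)$; n and R are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(9)
n, R = 400, 20_000

Ybar = rng.exponential(1.0, size=(R, n)).mean(axis=1)
W = np.sqrt(n) * (np.log(Ybar) - np.log(1.0))   # sqrt(n) (g(Y_n) - g(theta))

print(W.mean(), W.var())     # close to 0 and 1, as the delta method predicts
```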

Multivariate delta method

Theorem 13 (Multivariate case). Let $\{Y_n\}$, $n = 1, 2, \ldots$, be a sequence of $d$-dimensional random variables such that $\sqrt{n}(Y_n - \theta) \xrightarrow{d} N_d(0, \Sigma)$, with $\theta \in \mathbb{R}^d$ and covariance matrix $\Sigma$. Also, let $g(\cdot)$ be a function defined on $\mathbb{R}^d$ with values in $\mathbb{R}^k$, $k \leq d$, with twice differentiable components $g_i(\cdot)$, $i = 1, \ldots, k$. Then

$\sqrt{n}(g(Y_n) - g(\theta)) \xrightarrow{d} N_k(0, D\Sigma D^\top)$,

where $D = [d_{ij}]$ is the $k \times d$ matrix with $d_{ij} = \partial g_i(\theta)/\partial\theta_j$, $i = 1, \ldots, k$ and $j = 1, \ldots, d$.


More information

Robustness to Parametric Assumptions in Missing Data Models

Robustness to Parametric Assumptions in Missing Data Models Robustness to Parametric Assumptions in Missing Data Models Bryan Graham NYU Keisuke Hirano University of Arizona April 2011 Motivation Motivation We consider the classic missing data problem. In practice

More information

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,

More information

Multiple Random Variables

Multiple Random Variables Multiple Random Variables This Version: July 30, 2015 Multiple Random Variables 2 Now we consider models with more than one r.v. These are called multivariate models For instance: height and weight An

More information

TAMS39 Lecture 2 Multivariate normal distribution

TAMS39 Lecture 2 Multivariate normal distribution TAMS39 Lecture 2 Multivariate normal distribution Martin Singull Department of Mathematics Mathematical Statistics Linköping University, Sweden Content Lecture Random vectors Multivariate normal distribution

More information

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California Texts in Statistical Science Bayesian Ideas and Data Analysis An Introduction for Scientists and Statisticians Ronald Christensen University of New Mexico Albuquerque, New Mexico Wesley Johnson University

More information

Lecture 1: August 28

Lecture 1: August 28 36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 1: August 28 Our broad goal for the first few lectures is to try to understand the behaviour of sums of independent random

More information

Review (Probability & Linear Algebra)

Review (Probability & Linear Algebra) Review (Probability & Linear Algebra) CE-725 : Statistical Pattern Recognition Sharif University of Technology Spring 2013 M. Soleymani Outline Axioms of probability theory Conditional probability, Joint

More information

Introduction to Normal Distribution

Introduction to Normal Distribution Introduction to Normal Distribution Nathaniel E. Helwig Assistant Professor of Psychology and Statistics University of Minnesota (Twin Cities) Updated 17-Jan-2017 Nathaniel E. Helwig (U of Minnesota) Introduction

More information

Bayesian estimation of the discrepancy with misspecified parametric models

Bayesian estimation of the discrepancy with misspecified parametric models Bayesian estimation of the discrepancy with misspecified parametric models Pierpaolo De Blasi University of Torino & Collegio Carlo Alberto Bayesian Nonparametrics workshop ICERM, 17-21 September 2012

More information

Modern Likelihood-Frequentist Inference. Donald A Pierce, OHSU and Ruggero Bellio, Univ of Udine

Modern Likelihood-Frequentist Inference. Donald A Pierce, OHSU and Ruggero Bellio, Univ of Udine Modern Likelihood-Frequentist Inference Donald A Pierce, OHSU and Ruggero Bellio, Univ of Udine Shortly before 1980, important developments in frequency theory of inference were in the air. Strictly, this

More information

Political Science 236 Hypothesis Testing: Review and Bootstrapping

Political Science 236 Hypothesis Testing: Review and Bootstrapping Political Science 236 Hypothesis Testing: Review and Bootstrapping Rocío Titiunik Fall 2007 1 Hypothesis Testing Definition 1.1 Hypothesis. A hypothesis is a statement about a population parameter The

More information

Multivariate Gaussian Distribution. Auxiliary notes for Time Series Analysis SF2943. Spring 2013

Multivariate Gaussian Distribution. Auxiliary notes for Time Series Analysis SF2943. Spring 2013 Multivariate Gaussian Distribution Auxiliary notes for Time Series Analysis SF2943 Spring 203 Timo Koski Department of Mathematics KTH Royal Institute of Technology, Stockholm 2 Chapter Gaussian Vectors.

More information

Next is material on matrix rank. Please see the handout

Next is material on matrix rank. Please see the handout B90.330 / C.005 NOTES for Wednesday 0.APR.7 Suppose that the model is β + ε, but ε does not have the desired variance matrix. Say that ε is normal, but Var(ε) σ W. The form of W is W w 0 0 0 0 0 0 w 0

More information

A Probability Review

A Probability Review A Probability Review Outline: A probability review Shorthand notation: RV stands for random variable EE 527, Detection and Estimation Theory, # 0b 1 A Probability Review Reading: Go over handouts 2 5 in

More information

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner Fundamentals CS 281A: Statistical Learning Theory Yangqing Jia Based on tutorial slides by Lester Mackey and Ariel Kleiner August, 2011 Outline 1 Probability 2 Statistics 3 Linear Algebra 4 Optimization

More information

Spring 2017 Econ 574 Roger Koenker. Lecture 14 GEE-GMM

Spring 2017 Econ 574 Roger Koenker. Lecture 14 GEE-GMM University of Illinois Department of Economics Spring 2017 Econ 574 Roger Koenker Lecture 14 GEE-GMM Throughout the course we have emphasized methods of estimation and inference based on the principle

More information

MULTIVARIATE DISTRIBUTIONS

MULTIVARIATE DISTRIBUTIONS Chapter 9 MULTIVARIATE DISTRIBUTIONS John Wishart (1898-1956) British statistician. Wishart was an assistant to Pearson at University College and to Fisher at Rothamsted. In 1928 he derived the distribution

More information

Likelihood Construction, Inference for Parametric Survival Distributions

Likelihood Construction, Inference for Parametric Survival Distributions Week 1 Likelihood Construction, Inference for Parametric Survival Distributions In this section we obtain the likelihood function for noninformatively rightcensored survival data and indicate how to make

More information

The Multivariate Normal Distribution 1

The Multivariate Normal Distribution 1 The Multivariate Normal Distribution 1 STA 302 Fall 2017 1 See last slide for copyright information. 1 / 40 Overview 1 Moment-generating Functions 2 Definition 3 Properties 4 χ 2 and t distributions 2

More information

Regression and Statistical Inference

Regression and Statistical Inference Regression and Statistical Inference Walid Mnif wmnif@uwo.ca Department of Applied Mathematics The University of Western Ontario, London, Canada 1 Elements of Probability 2 Elements of Probability CDF&PDF

More information

Exercises. (a) Prove that m(t) =

Exercises. (a) Prove that m(t) = Exercises 1. Lack of memory. Verify that the exponential distribution has the lack of memory property, that is, if T is exponentially distributed with parameter λ > then so is T t given that T > t for

More information

Brief Review on Estimation Theory

Brief Review on Estimation Theory Brief Review on Estimation Theory K. Abed-Meraim ENST PARIS, Signal and Image Processing Dept. abed@tsi.enst.fr This presentation is essentially based on the course BASTA by E. Moulines Brief review on

More information

Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics

Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics Data from one or a series of random experiments are collected. Planning experiments and collecting data (not discussed here). Analysis:

More information

Nonparametric Bayesian Methods - Lecture I

Nonparametric Bayesian Methods - Lecture I Nonparametric Bayesian Methods - Lecture I Harry van Zanten Korteweg-de Vries Institute for Mathematics CRiSM Masterclass, April 4-6, 2016 Overview of the lectures I Intro to nonparametric Bayesian statistics

More information

ECON 3150/4150, Spring term Lecture 6

ECON 3150/4150, Spring term Lecture 6 ECON 3150/4150, Spring term 2013. Lecture 6 Review of theoretical statistics for econometric modelling (II) Ragnar Nymoen University of Oslo 31 January 2013 1 / 25 References to Lecture 3 and 6 Lecture

More information

Part 6: Multivariate Normal and Linear Models

Part 6: Multivariate Normal and Linear Models Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of

More information

Bayesian Statistical Methods. Jeff Gill. Department of Political Science, University of Florida

Bayesian Statistical Methods. Jeff Gill. Department of Political Science, University of Florida Bayesian Statistical Methods Jeff Gill Department of Political Science, University of Florida 234 Anderson Hall, PO Box 117325, Gainesville, FL 32611-7325 Voice: 352-392-0262x272, Fax: 352-392-8127, Email:

More information

Physics 403. Segev BenZvi. Parameter Estimation, Correlations, and Error Bars. Department of Physics and Astronomy University of Rochester

Physics 403. Segev BenZvi. Parameter Estimation, Correlations, and Error Bars. Department of Physics and Astronomy University of Rochester Physics 403 Parameter Estimation, Correlations, and Error Bars Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Best Estimates and Reliability

More information

Submitted to the Brazilian Journal of Probability and Statistics

Submitted to the Brazilian Journal of Probability and Statistics Submitted to the Brazilian Journal of Probability and Statistics Multivariate normal approximation of the maximum likelihood estimator via the delta method Andreas Anastasiou a and Robert E. Gaunt b a

More information

Monte Carlo Studies. The response in a Monte Carlo study is a random variable.

Monte Carlo Studies. The response in a Monte Carlo study is a random variable. Monte Carlo Studies The response in a Monte Carlo study is a random variable. The response in a Monte Carlo study has a variance that comes from the variance of the stochastic elements in the data-generating

More information

Quantile methods. Class Notes Manuel Arellano December 1, Let F (r) =Pr(Y r). Forτ (0, 1), theτth population quantile of Y is defined to be

Quantile methods. Class Notes Manuel Arellano December 1, Let F (r) =Pr(Y r). Forτ (0, 1), theτth population quantile of Y is defined to be Quantile methods Class Notes Manuel Arellano December 1, 2009 1 Unconditional quantiles Let F (r) =Pr(Y r). Forτ (0, 1), theτth population quantile of Y is defined to be Q τ (Y ) q τ F 1 (τ) =inf{r : F

More information

STAT 461/561- Assignments, Year 2015

STAT 461/561- Assignments, Year 2015 STAT 461/561- Assignments, Year 2015 This is the second set of assignment problems. When you hand in any problem, include the problem itself and its number. pdf are welcome. If so, use large fonts and

More information

[y i α βx i ] 2 (2) Q = i=1

[y i α βx i ] 2 (2) Q = i=1 Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation

More information

STAT 4385 Topic 01: Introduction & Review

STAT 4385 Topic 01: Introduction & Review STAT 4385 Topic 01: Introduction & Review Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso xsu@utep.edu Spring, 2016 Outline Welcome What is Regression Analysis? Basics

More information

BFF Four: Are we Converging?

BFF Four: Are we Converging? BFF Four: Are we Converging? Nancy Reid May 2, 2017 Classical Approaches: A Look Way Back Nature of Probability BFF one to three: a look back Comparisons Are we getting there? BFF Four Harvard, May 2017

More information

Multivariate Random Variable

Multivariate Random Variable Multivariate Random Variable Author: Author: Andrés Hincapié and Linyi Cao This Version: August 7, 2016 Multivariate Random Variable 3 Now we consider models with more than one r.v. These are called multivariate

More information

Multivariate Analysis and Likelihood Inference

Multivariate Analysis and Likelihood Inference Multivariate Analysis and Likelihood Inference Outline 1 Joint Distribution of Random Variables 2 Principal Component Analysis (PCA) 3 Multivariate Normal Distribution 4 Likelihood Inference Joint density

More information

Irr. Statistical Methods in Experimental Physics. 2nd Edition. Frederick James. World Scientific. CERN, Switzerland

Irr. Statistical Methods in Experimental Physics. 2nd Edition. Frederick James. World Scientific. CERN, Switzerland Frederick James CERN, Switzerland Statistical Methods in Experimental Physics 2nd Edition r i Irr 1- r ri Ibn World Scientific NEW JERSEY LONDON SINGAPORE BEIJING SHANGHAI HONG KONG TAIPEI CHENNAI CONTENTS

More information

Master s Written Examination

Master s Written Examination Master s Written Examination Option: Statistics and Probability Spring 016 Full points may be obtained for correct answers to eight questions. Each numbered question which may have several parts is worth

More information

STAT 100C: Linear models

STAT 100C: Linear models STAT 100C: Linear models Arash A. Amini April 27, 2018 1 / 1 Table of Contents 2 / 1 Linear Algebra Review Read 3.1 and 3.2 from text. 1. Fundamental subspace (rank-nullity, etc.) Im(X ) = ker(x T ) R

More information

HANDBOOK OF APPLICABLE MATHEMATICS

HANDBOOK OF APPLICABLE MATHEMATICS HANDBOOK OF APPLICABLE MATHEMATICS Chief Editor: Walter Ledermann Volume VI: Statistics PART A Edited by Emlyn Lloyd University of Lancaster A Wiley-Interscience Publication JOHN WILEY & SONS Chichester

More information

Supplementary Materials for Residuals and Diagnostics for Ordinal Regression Models: A Surrogate Approach

Supplementary Materials for Residuals and Diagnostics for Ordinal Regression Models: A Surrogate Approach Supplementary Materials for Residuals and Diagnostics for Ordinal Regression Models: A Surrogate Approach Part A: Figures and tables Figure 2: An illustration of the sampling procedure to generate a surrogate

More information

Chapter 5. Chapter 5 sections

Chapter 5. Chapter 5 sections 1 / 43 sections Discrete univariate distributions: 5.2 Bernoulli and Binomial distributions Just skim 5.3 Hypergeometric distributions 5.4 Poisson distributions Just skim 5.5 Negative Binomial distributions

More information