MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems


1 MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems

Principles of Statistical Inference
- Recap of statistical models
- Statistical inference (frequentist)
- Parametric vs. semiparametric models and inference
- Large sample approximation

2 Recap of statistical models

Theophylline example: Recap
Mathematical model, with U = D = oral dose at t = 0:
f(t, U, θ) = k_a F D / {V (k_a - k_e)} {e^{-k_e t} - e^{-k_a t}},  θ = (k_a, k_e, V)^T
Y = (Y_1, ..., Y_n) at times (t_1, ..., t_n)
Statistical model with intra-subject correlation:
Y | U ~ N_n{f(U, θ), σ_1^2 I_n + σ_2^2 Γ},  ψ = (θ^T, σ_1^2, σ_2^2, φ)^T   (1)
Simplified statistical model assuming negligible intra-subject correlation:
Y | U ~ N_n{f(U, θ), σ^2 I_n},  ψ = (θ^T, σ^2)^T   (2)
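
To make the simplified model (2) concrete, here is a minimal Python sketch that evaluates the one-compartment mean function and simulates one realization of Y; the dose, sampling times, and parameter values are hypothetical illustrations, not values from the course notes, and the bioavailability fraction F is absorbed into the dose for simplicity.

```python
import numpy as np

def f_theoph(t, D, theta):
    """One-compartment model f(t, U, theta) with U = D = oral dose at t = 0.

    theta = (ka, ke, V); F (bioavailability) is absorbed into D here for simplicity.
    """
    ka, ke, V = theta
    return ka * D / (V * (ka - ke)) * (np.exp(-ke * t) - np.exp(-ka * t))

# Illustrative (hypothetical) values, not taken from the course notes
theta_true = np.array([1.5, 0.08, 30.0])   # ka (1/hr), ke (1/hr), V (L)
sigma = 0.5                                 # SD of combined intra-subject deviation
D = 320.0                                   # oral dose (mg)
t = np.array([0.25, 0.5, 1, 2, 4, 6, 8, 12, 24.0])

rng = np.random.default_rng(1)
# One realization of Y under simplified model (2): independent N{f(t_j, U, theta), sigma^2}
y = f_theoph(t, D, theta_true) + rng.normal(0.0, sigma, size=t.size)
print(np.round(y, 2))
```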

3 Recap of statistical models

HIV example: Recap
Y = (Y_1^T, ..., Y_n^T) at times (t_1, ..., t_n), where Y_j = (Y_j^(1), Y_j^(2))^T is the bivariate observation at time t_j on
{T_1(t_j) + T_1*(t_j), V_I(t_j) + V_NI(t_j)}^T = (total CD4, viral load)
Mathematical model: Not repeated here for brevity; 7 states
x(t) = {T_1(t), T_2(t), T_1*(t), T_2*(t), V_I(t), V_NI(t), E(t)}^T
and observed continuous-time ARV input U = U(t)
f(t, U, θ) = O x(t, U, θ) = {f^(1)(t, U, θ), f^(2)(t, U, θ)}^T

4 Recap of statistical models

HIV example: Recap
Usual simplified statistical model assuming all intra-subject correlations negligible:
Y_j | U ~ N_2{f(t_j, U, θ), V},  j = 1, ..., n   (3)
V = diag{σ^(1)2, σ^(2)2},  ψ = (θ^T, σ^(1)2, σ^(2)2)^T
More complicated statistical models are of course possible; (3) is often tacitly assumed in inverse problems without comment regarding sources of variation

5 Recap of statistical models

Recall: A statistical model like (1), (2), or (3) describes the assumed family of all possible probability distributions for the random vector Y representing the data generating mechanism for observations we might see at t_1, ..., t_n under conditions U
- Indexed by a parameter vector ψ
- We are interested in the component θ of ψ that governs the mathematical model, but the other components are needed to describe the statistical model fully, so we are stuck dealing with all of ψ
- The assumed statistical model may or may not be correct

6 Statistical inference

For definiteness: We discuss basic principles of statistical inference in the context of statistical models of the form
Y_j = f(t_j, U, θ) + ε_j,
where Y_j is scalar (generalize to multivariate Y_j later...) and (Y_1, ..., Y_n) are independent (conditional on U), with
Y = (Y_1, ..., Y_n)^T,  Y | U ~ N_n{f(U, θ), σ^2 I_n},  ψ = (θ^T, σ^2)^T,  Y_j | U ~ N{f(t_j, U, θ), σ^2}
I.e., Y_j, j = 1, ..., n, may be regarded as representing the result of a potential draw from the normal probability distribution describing how such observations would turn out at t_j (given phenomena like measurement error and biological fluctuation) under conditions U

7 Statistical inference

Conceptually: The statistical model is a formal representation of the population of all possible realizations of (Y_1, ..., Y_n) we would ever see under U
- When we collect data, we observe a sample from the population; i.e., we get to see a single realization (y_1, ..., y_n) of (Y_1, ..., Y_n) (and a realization u of U, say)
Objective, restated: What can we learn about the true value of ψ (which determines the nature of the population) from a sample?
- We do not get to observe the entire population (or else we'd know ψ)
- How uncertain will we be?
- What if we're wrong about the statistical model (later)?
Statistical inference (loosely speaking): Making statements about a population of interest on the basis of only a sample from the population

8 Statistical inference

Statistic: Any function of a random variable(s)
Parameter (point) estimation: Construct a function of (Y_1, ..., Y_n) (i.e., a statistic) that, when evaluated at a particular realization (y_1, ..., y_n), yields a numerical value that gives information on the true value of ψ
- Estimator: The function itself
- Estimate: The numerical value based on a particular realization
- Estimation: Used both to denote the procedure (estimator) and the actual calculation of a numerical value (estimate)
- Both also depend on U and u, of course
Remark: The distinction between estimator and estimate is often subject to abuse in the literature; the meaning is usually clear from the context

9 Statistical inference

Simplest example: Suppose that f(t, U, θ) = θ_1 + θ_2 t, θ = (θ_1, θ_2)^T
- Simple linear regression model
- No U here, so suppress conditioning
Usual assumption: Y_j ~ N(θ_1 + θ_2 t_j, σ^2), Y_1, ..., Y_n all independent, ψ = (θ^T, σ^2)^T
Standard estimator for θ: The ordinary least squares estimator
arg min_θ Σ_{j=1}^n (Y_j - θ_1 - θ_2 t_j)^2   (4)
For a particular data set, the estimate of θ minimizes Σ_{j=1}^n (y_j - θ_1 - θ_2 t_j)^2
The rest of ψ, namely σ^2, can be estimated separately (later)
Note that (4) is the usual objective function adopted in inverse problems for arbitrary f(t, U, θ)

10 Statistical inference

Closed form solution: Defining
X = [1 t_1; 1 t_2; ... ; 1 t_n] (an n x 2 matrix),  y = (y_1, y_2, ..., y_n)^T,
and minimizing yields the estimator and estimate
θ̂(Y) = (X^T X)^{-1} X^T Y  and  θ̂ = (X^T X)^{-1} X^T y
Convention: the "hat" (as in θ̂) means estimator or estimate (for θ here)
The emphasis that the estimator θ̂(Y) is a function of (Y_1, ..., Y_n) is usually suppressed

11 Statistical inference

Motivation for ordinary least squares: Coming up
Separate estimation of σ^2: ψ̂ = (θ̂^T, σ̂^2)^T, with
σ̂^2 = (n - 2)^{-1} Σ_{j=1}^n (Y_j - θ̂_1 - θ̂_2 t_j)^2
Question: How good is θ̂ = θ̂(Y) as an estimator for the true value of θ? More generally, how good is ψ̂ = ψ̂(Y) as an estimator for the true value of ψ?
- A question about the procedure, not a particular numerical value
- What do we mean by "good"? Can we characterize the extent of uncertainty?
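
As a concrete illustration of the closed-form estimate and the separate estimate of σ^2, here is a minimal numpy sketch; the (t_j, y_j) pairs are made up solely for illustration.

```python
import numpy as np

# Hypothetical (t_j, y_j) pairs, for illustration only
t = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1])
n = t.size

# Design matrix with rows (1, t_j)
X = np.column_stack([np.ones(n), t])

# Ordinary least squares estimate: theta_hat = (X^T X)^{-1} X^T y
theta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Separate estimate of sigma^2 with the (n - 2) divisor from the slide
resid = y - X @ theta_hat
sigma2_hat = resid @ resid / (n - 2)

print("theta_hat =", theta_hat)      # (intercept, slope)
print("sigma2_hat =", sigma2_hat)
```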

12 Statistical inference

Key idea: An estimator is a function of the random variables that represent the data generating mechanism
- Thus, for any value of ψ, the estimator itself has a probability distribution (that depends on that of (Y_1, ..., Y_n) and hence on ψ)
- In general problems, this distribution is conditional on U
- It is typical to suppress this dependence for brevity in general arguments; it is understood that everything is conditional on U (we'll do that here)

13 Statistical inference

Conceptually: Each possible realization of (Y_1, ..., Y_n) yields a (numerical) value of ψ̂; i.e., the random vector ψ̂(Y) has sample space comprising all these values
- We may thus think of the probability distribution of ψ̂(Y) as representing probabilities for values of ψ̂ that would be observed across all possible data sets we could end up with
- When we collect data, we end up with only one of these
- If this distribution has large variance, estimates vary a lot across possible data sets, and our one estimate tells us little about the true value of ψ (another data set might have yielded a very different numerical value): lots of uncertainty
- On the other hand, if this distribution has small variance, estimates vary little across possible data sets, and our one estimate may be quite informative about the true value of ψ (another data set might have yielded a similar value): only mild uncertainty

14 Statistical inference

Sampling distribution: The probability distribution of an estimator
- Properties of the sampling distribution characterize the uncertainty in the estimation procedure (estimator)
Unbiasedness: Intuitively, for a particular ψ (or its components)
- Some values in the sample space are larger than ψ, some smaller
- All possible values of the estimator for ψ should "average out" to ψ
Formally, writing E_ψ(·) to denote expectation with respect to the distribution of (Y_1, ..., Y_n) under ψ,
E_ψ(ψ̂) = ψ   (5)
An estimator that satisfies (5) is called an unbiased estimator

15 Statistical inference

Sampling variance and standard error: Quantify variation across possible values of ψ̂
- The sampling covariance matrix is the covariance matrix of the sampling distribution of ψ̂(Y), which we write as var_ψ(ψ̂)
- Diagonal elements of var_ψ(ψ̂) are the sampling variances of the components of ψ̂(Y), e.g., var(ψ̂_k) for the kth component
- The standard error of the kth component of ψ̂(Y) is its standard deviation, {var(ψ̂_k)}^{1/2} (on the same scale as ψ)
Key factors determining magnitude of sampling variance:
- Variance of the original probability distribution of Y (may very well be out of our control)
- Sample size n (often under our control: experimental design)

16 Statistical inference

Example, continued: Sampling distribution of the ordinary least squares estimator θ̂ (a component of ψ̂)
The estimator is unbiased:
E(Y) = Xθ ⇒ E(θ̂) = E{(X^T X)^{-1} X^T Y} = (X^T X)^{-1} X^T Xθ = θ
The sampling variance:
var(Y) = σ^2 I_n ⇒ var(θ̂) = var{(X^T X)^{-1} X^T Y} = (X^T X)^{-1} X^T var(Y) X (X^T X)^{-1} = σ^2 (X^T X)^{-1}
The sampling distribution itself is multivariate normal:
θ̂ ~ N_2{θ, σ^2 (X^T X)^{-1}}
- Sampling variance depends on σ^2 (the variance of the assumed probability distribution of Y_j)
- Writing (X^T X)^{-1} = n^{-1} {n^{-1} X^T X}^{-1}, the sampling variance decreases as n increases
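
These properties can be illustrated by simulation: repeatedly generate data sets under the model, compute the estimate each time, and compare the Monte Carlo mean and covariance of the estimates with θ and σ^2 (X^T X)^{-1}. A minimal sketch follows; the true parameter values, times, and σ are hypothetical choices for illustration.

```python
import numpy as np

# Monte Carlo illustration of the sampling distribution of the OLS estimator
# (true parameter values here are hypothetical, chosen only for illustration)
rng = np.random.default_rng(0)
theta_true = np.array([1.0, 2.0])
sigma = 1.5
t = np.linspace(1.0, 10.0, 20)
X = np.column_stack([np.ones(t.size), t])

n_reps = 5000
est = np.empty((n_reps, 2))
for r in range(n_reps):
    y = X @ theta_true + rng.normal(0.0, sigma, size=t.size)  # one realization of Y
    est[r] = np.linalg.solve(X.T @ X, X.T @ y)                # OLS estimate for this data set

print("Monte Carlo mean of theta_hat:", est.mean(axis=0))     # close to theta_true (unbiasedness)
print("Monte Carlo covariance:\n", np.cov(est, rowvar=False))
print("Theory sigma^2 (X^T X)^{-1}:\n", sigma**2 * np.linalg.inv(X.T @ X))
```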

17 Statistical inference

Absolute necessity: Whenever an estimate based on data is reported, it should be accompanied by an assessment of uncertainty based on the sampling distribution
Natural approach: look at the standard error; here, for the kth component of θ̂,
var(θ̂_k) = σ^2 (X^T X)^{-1}_{kk}
σ^2 is not known, so provide an estimate of the standard error of the kth component (usually itself called the "standard error")
In the example, SE(θ̂_k) = {σ̂^2 (X^T X)^{-1}_{kk}}^{1/2}, k = 1, 2
How to interpret?
(i) If θ̂_2 = 2.0 and SE(θ̂_2) = 3.0: very uncertain
(ii) If θ̂_2 = 2.0 and SE(θ̂_2) = 0.03: feeling pretty good!
Ways to improve precision (reduce uncertainty) in case (i)?

18 Statistical inference

In fact, we can do more: The entire sampling distribution can help us to make a probability statement to better characterize uncertainty
For example,
Z = (θ̂_k - θ_k) / {σ^2 (X^T X)^{-1}_{kk}}^{1/2} ~ N(0, 1), the standard normal distribution, k = 1, 2
When σ^2 is replaced by σ̂^2,
T = (θ̂_k - θ_k) / SE(θ̂_k) ~ t_{n-2}, Student's t distribution with n - 2 degrees of freedom
Let α = some small probability, so 1 - α is large; e.g., α = 0.05
Let t_{1-α/2} satisfy P(T ≥ t_{1-α/2}) = α/2, so, by symmetry, P(T ≤ -t_{1-α/2}) = α/2, and
P(-t_{1-α/2} ≤ T ≤ t_{1-α/2}) = 1 - α
⇒ P{θ̂_2 - t_{1-α/2} SE(θ̂_2) ≤ θ_2 ≤ θ̂_2 + t_{1-α/2} SE(θ̂_2)} = 1 - α
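
A minimal sketch of this calculation, continuing the hypothetical (t_j, y_j) data from before: compute SE(θ̂_2) and the t-based interval using scipy's Student's t quantile.

```python
import numpy as np
from scipy import stats

# Sketch of the t-based interval for theta_2 using hypothetical (t_j, y_j) data
t = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1])
n = t.size
X = np.column_stack([np.ones(n), t])

theta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ theta_hat
sigma2_hat = resid @ resid / (n - 2)
se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(X.T @ X)))   # SE(theta_hat_k)

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)                # t_{1-alpha/2} with n-2 df
lo = theta_hat[1] - t_crit * se[1]
hi = theta_hat[1] + t_crit * se[1]
print(f"theta_2_hat = {theta_hat[1]:.3f}, SE = {se[1]:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```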

19 Statistical inference

Important:
P{θ̂_2 - t_{1-α/2} SE(θ̂_2) ≤ θ_2 ≤ θ̂_2 + t_{1-α/2} SE(θ̂_2)} = 1 - α
is a probability pertaining to the sampling distribution of θ̂_2
That is, over all possible realizations of Y of size n, the probability is 1 - α that the endpoints of the interval
[θ̂_2 - t_{1-α/2} SE(θ̂_2), θ̂_2 + t_{1-α/2} SE(θ̂_2)]
will include (the fixed value) θ_2
Confidence interval

20 Statistical inference

Confidence interval: A probability statement about the procedure by which an estimator is constructed from a realization of Y
- NOT a probability statement about θ_2 (which is fixed)
Interpretation: Over all possible realizations of Y of size n, if we were to calculate the interval according to the estimation procedure, 100(1 - α)% of such intervals would cover θ_2
- Provides more information than the standard error alone: how large or small SE must be relative to θ̂_2 to feel confident that the procedure of data generation and estimation provides a reliable understanding ("confident" quantified by α)
- α is chosen by the analyst
For the example, α = 0.05 and n large, so t_{1-α/2} ≈ 1.96
(i) θ̂_2 = 2.0, SE(θ̂_2) = 3.0 gives [-3.88, 7.88]: no confidence
(ii) θ̂_2 = 2.0, SE(θ̂_2) = 0.03 gives [1.94, 2.06]: feeling pretty confident!
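
The coverage interpretation can be checked by simulation: generate many data sets from the model, build the interval each time, and count how often it covers the true θ_2. A small sketch follows; the true values are hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy import stats

# Monte Carlo sketch of the coverage interpretation: over repeated data sets,
# about 100(1 - alpha)% of the t-based intervals should cover the true theta_2.
# True values below are hypothetical, chosen for illustration only.
rng = np.random.default_rng(2)
theta_true = np.array([1.0, 2.0])
sigma, alpha = 1.5, 0.05
t = np.linspace(1.0, 10.0, 15)
n = t.size
X = np.column_stack([np.ones(n), t])
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)

covered = 0
n_reps = 2000
for _ in range(n_reps):
    y = X @ theta_true + rng.normal(0.0, sigma, size=n)
    theta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ theta_hat
    sigma2_hat = resid @ resid / (n - 2)
    se2 = np.sqrt(sigma2_hat * np.linalg.inv(X.T @ X)[1, 1])
    if abs(theta_hat[1] - theta_true[1]) <= t_crit * se2:
        covered += 1

print("Empirical coverage:", covered / n_reps)   # should be close to 0.95
```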

21 Statistical inference

Warning: The numerical values themselves are meaningless except for the impression they give about the quality of the procedure
- Once a realization of data is in hand, the numerical interval either covers the true value of θ_2 or it doesn't
- Wrong interpretation: "The probability is 1 - α that θ_2 is between -3.88 and 7.88"
- What we CAN say: We are 100(1 - α)% "confident" that intervals constructed this way would cover θ_2; the numerical values give only a sense of the faith we should attach to results of gathering data the way we did

22 Parametric vs. semiparametric models

Recall: In our modeling context, a statistical model describes all possible probability distributions for Y (conditional on U), indexed by a parameter ψ (under some set of assumptions)
Statistical inference, formalized: Estimation of ψ and associated assessment of uncertainty (sampling distribution)
Issues:
(i) Derivation of sensible estimators for ψ?
(ii) Competing estimators? Optimality?
(iii) Sensitivity to possibly incorrect assumptions?
(iv) Derivation of the sampling distribution in complex models? (next section)

23 Parametric vs. semiparametric models

Example, continued: In the theophylline example we assumed
Y_j = f(t_j, U, θ) + (ε_{1j} + ε_{2j}) = f(t_j, U, θ) + ε_j,  j = 1, ..., n,  or in vector form  Y = f(U, θ) + (ε_1 + ε_2) = f(U, θ) + ε,
E(ε_1 | U) = 0, E(ε_2 | U) = 0, var(ε_1 | U) = σ_1^2 I_n, var(ε_2 | U) = σ_2^2 I_n, and ε_1, ε_2 independent, so that
E(ε | U) = 0, var(ε | U) = σ^2 I_n, σ^2 = σ_1^2 + σ_2^2,
and thus
E(Y | U) = f(U, θ),  var(Y | U) = σ^2 I_n   (6)
(6) is an assumption on certain features of the probability distribution of Y | U [the first two moments, parameterized by ψ = (θ^T, σ^2)^T]
In addition, we assumed that ε | U is a multivariate normally distributed random vector, and hence so is Y | U, with mean and covariance matrix given in (6)

24 Parametric vs. semiparametric models

Parametric statistical model: A statistical model in which the probability distribution of Y is completely specified
For theophylline: Y | U ~ N_n{f(U, θ), σ^2 I_n}
- All possible probability distributions are multivariate normal with this mean and covariance matrix (over all ψ)
- Under this assumption, all aspects of Y (given U) (probabilities of any event, all higher moments) are in principle known if we know ψ
- For (fully) parametric models, there is a standard approach to inference (coming up)
- But taking Y to be multivariate normal is an assumption; how will inference be affected if it's wrong?

25 Parametric vs. semiparametric models

Semiparametric statistical model: A statistical model in which the probability distribution of Y is only partially specified
For theophylline:
E(Y | U) = f(U, θ),  var(Y | U) = σ^2 I_n
- First two moments in terms of the parameter ψ
- We could stop here: this is all we're willing to say
- All possible probability distributions are any probability distributions having this mean and covariance matrix (over all ψ)
Intuition: fewer assumptions means more possible models are included, so there is less chance for problems due to misspecification
Price to pay: with fewer assumptions, we cannot take advantage of the additional information in a full probability model if it is correct

26 Parametric vs. semiparametric models

Estimators in parametric models: Several possibilities
The likelihood function: Suppose p(y | ψ) is the pmf/pdf for Y under the assumed parametric model (suppress U)
- The notation emphasizes that p(y) is indexed by ψ (and is a function of y for fixed ψ)
Given that Y = y is observed, the function of ψ defined by
L(ψ | y) = p(y | ψ)
is called the likelihood function; this is a function of ψ for fixed y
Remark: The likelihood function forms the basis for one of the most popular and defensible general strategies for finding estimators: the maximum likelihood estimator

27 Parametric vs. semiparametric models

Demonstration: Y a discrete random vector
L(ψ | y) = p(y | ψ) = P_ψ(Y = y), where the subscript emphasizes that the pmf is indexed by ψ
Suppose ψ_1 and ψ_2 are two possible values for ψ such that
P_{ψ_1}(Y = y) = L(ψ_1 | y) > L(ψ_2 | y) = P_{ψ_2}(Y = y)
Interpretation:
- The y we actually observed is more likely to have occurred if ψ = ψ_1 than if ψ = ψ_2
- ψ_1 is a more plausible value for the true value of ψ than is ψ_2
- It seems reasonable to examine the probability of the y we actually observed under different possible ψ
- The reasoning extends to continuous random vectors under some technicalities

28 Parametric vs. semiparametric models

Likelihood principle: If y_1 and y_2 are two possible realizations of Y such that
L(ψ | y_1) = C(y_1, y_2) L(ψ | y_2) for all ψ,
for a constant C(y_1, y_2) not depending on ψ, then conclusions drawn from observing y_1 or y_2 should be identical
- May view the likelihood function as a data reduction device: the likelihood function contains all information about ψ in a sample
- C(y_1, y_2) = 1: y_1 and y_2 contain the same information about ψ
- Moreover, if the likelihood functions are only proportional, the information is still the same; e.g.,
L(ψ_2 | y_1) = 3 L(ψ_1 | y_1) implies L(ψ_2 | y_2) = 3 L(ψ_1 | y_2),
and we will conclude ψ_2 is more plausible than ψ_1 whether we observe y_1 or y_2

29 Parametric vs. semiparametric models

Definition: For each y in the sample space, let ψ̂(y) be a parameter value at which L(ψ | y) attains its maximum as a function of ψ, with y held fixed. A maximum likelihood estimator for ψ is ψ̂(Y)
- Acronym: MLE
- Intuitively reasonable: the MLE is the parameter value for which the observed y is "most likely"
- More formally: the MLE has optimality properties under the assumption that the specified probability model is correct
- Operationally: must find the global maximum (possible computational issues)
If L(ψ | y) is differentiable in each component of ψ, a possible candidate for the MLE is the value of ψ satisfying
∂/∂ψ log L(ψ | y) = 0
(necessary but not sufficient; could be a local maximum, etc.)

30 Parametric vs. semiparametric models

Nice properties:
- Invariance: if ψ̂ is the MLE for ψ, then for any function τ(ψ), τ(ψ̂) is the MLE for τ(ψ)
- Is the "best" estimator for ψ in a certain sense (coming up)
Example, continued: Simple linear regression model, Y = (Y_1, ..., Y_n)^T, Y_j ~ N(θ_1 + θ_2 t_j, σ^2) independent, ψ = (θ_1, θ_2, σ^2)^T
Under the normality assumption, the likelihood function is
L(ψ | y) = Π_{j=1}^n (2πσ^2)^{-1/2} exp{-(y_j - θ_1 - θ_2 t_j)^2 / (2σ^2)}
log L(ψ | y) = -n log σ - Σ_{j=1}^n (y_j - θ_1 - θ_2 t_j)^2 / (2σ^2)  (up to an additive constant)

31 Parametric vs. semiparametric models

Example, continued: Clearly maximized in θ_1, θ_2 by minimizing Σ_{j=1}^n (y_j - θ_1 - θ_2 t_j)^2, so the estimator minimizes Σ_{j=1}^n (Y_j - θ_1 - θ_2 t_j)^2
Differentiating log L(ψ | y) with respect to θ_1, θ_2 yields
Σ_{j=1}^n (y_j - θ_1 - θ_2 t_j) (1, t_j)^T = 0   (7)
the so-called normal equations
Differentiating log L(ψ | y) with respect to σ^2 yields
σ̂^2 = n^{-1} Σ_{j=1}^n (y_j - θ̂_1 - θ̂_2 t_j)^2
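
As a check that maximizing the normal likelihood reproduces the least squares solution, here is a minimal sketch that maximizes the log-likelihood numerically and compares the result with the closed-form expressions; the data are the same hypothetical (t_j, y_j) pairs used earlier.

```python
import numpy as np
from scipy.optimize import minimize

# Sketch: maximize the normal log-likelihood numerically and compare with the
# closed-form OLS / MLE expressions. The (t_j, y_j) data are hypothetical.
t = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1])
n = t.size
X = np.column_stack([np.ones(n), t])

def neg_loglik(psi):
    """Negative log-likelihood; psi = (theta1, theta2, log sigma) so optimization is unconstrained."""
    theta1, theta2, log_sigma = psi
    sigma2 = np.exp(2.0 * log_sigma)
    resid = y - theta1 - theta2 * t
    return 0.5 * n * np.log(2.0 * np.pi * sigma2) + 0.5 * np.sum(resid**2) / sigma2

fit = minimize(neg_loglik, x0=np.array([0.0, 1.0, 0.0]), method="Nelder-Mead")
theta_mle = fit.x[:2]
sigma2_mle = np.exp(2.0 * fit.x[2])

theta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print("MLE theta:", theta_mle, " sigma2:", sigma2_mle)
print("OLS theta:", theta_ols, " sigma2 (divisor n):",
      np.sum((y - X @ theta_ols) ** 2) / n)
```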

32 Parametric vs. semiparametric models

Result: The ordinary least squares estimator is the maximum likelihood estimator under the assumption of normality
Inference in semiparametric models: Maximum likelihood seems infeasible without a full distributional assumption
- Derive estimators in other (sometimes seemingly ad hoc) ways

33 Parametric vs. semiparametric models

Example, continued: Simple linear regression model where we are only willing to assume
E(Y_j) = θ_1 + θ_2 t_j,  var(Y_j) = σ^2
Perspective: minimize the "distance" between model and data
arg min_θ Σ_{j=1}^n (Y_j - θ_1 - θ_2 t_j)^2,  the ordinary least squares estimator
Alternatively, (7) can be written as
n^{-1} Σ_{j=1}^n Y_j (1, t_j)^T = n^{-1} Σ_{j=1}^n (θ_1 + θ_2 t_j) (1, t_j)^T
Equate the left-hand side to its expectation: the method of moments
Solving (7) turns out to yield the "best" estimator for θ in a certain sense if all one is willing to specify is the mean and variance of Y_j

34 Parametric vs. semiparametric models

Remarks:
- Identifiability of statistical models: A parameter ψ for a family of probability distributions is identifiable if distinct values of ψ correspond to distinct pmfs/pdfs. I.e., if the pmf/pdf for Y is p(y | ψ) under the model, then ψ ≠ ψ′ implies that p(y | ψ) is not the same function of y as p(y | ψ′)
- Identifiability is a feature of the model, NOT of an estimator or estimation procedure
- Difficulty with nonidentifiability in inference: Observations governed by p(y | ψ) and p(y | ψ′) "look the same"; there is no way to know whether ψ or ψ′ is the true value (same likelihood function value)
- How to fix nonidentifiability: May revise the model, impose constraints, make assumptions
- Limitations of data: Even if a model is identifiable, without sufficient data it may be practically impossible to estimate some parameters, because the information needed is not available in the data

35 Large sample approximation

Complication: Unfortunately, things aren't always so easy
- For simple statistical models like the simple linear regression model with normality assumptions, deriving the sampling distribution and showing unbiasedness of the least squares estimator for θ is straightforward, and exact results are available
- However, things get ugly quickly; e.g., recall the theophylline example with the parametric model
f(t, U, θ) = k_a F D / {V (k_a - k_e)} {e^{-k_e t} - e^{-k_a t}},  θ = (k_a, k_e, V)^T,  Y_j | U ~ N{f(t_j, U, θ), σ^2}
The ordinary least squares estimator minimizes
Σ_{j=1}^n {Y_j - f(t_j, U, θ)}^2,  i.e., solve
Σ_{j=1}^n {Y_j - f(t_j, U, θ)} f_θ(t_j, U, θ) = 0,  where f_θ(t, U, θ) = ∂f(t, U, θ)/∂θ
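
In practice the nonlinear least squares problem is solved numerically. Here is a minimal sketch using scipy's general-purpose nonlinear least squares routine on simulated data; the dose, starting values, and true parameters are hypothetical illustrations, not values from the course notes.

```python
import numpy as np
from scipy.optimize import least_squares

# Sketch of ordinary (nonlinear) least squares for the one-compartment model.
# Data are simulated here; parameter values and dose are hypothetical.
def f_theoph(t, D, theta):
    ka, ke, V = theta
    return ka * D / (V * (ka - ke)) * (np.exp(-ke * t) - np.exp(-ka * t))

rng = np.random.default_rng(3)
theta_true = np.array([1.5, 0.08, 30.0])
D, sigma = 320.0, 0.4
t = np.array([0.25, 0.5, 1, 2, 4, 6, 8, 12, 24.0])
y = f_theoph(t, D, theta_true) + rng.normal(0.0, sigma, size=t.size)

# Minimize sum_j {y_j - f(t_j, U, theta)}^2 numerically; no closed form exists
fit = least_squares(lambda th: y - f_theoph(t, D, th), x0=np.array([1.0, 0.1, 20.0]))
print("theta_hat =", fit.x)
```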

36 Large sample approximation

Obviously it is not possible to solve for θ̂ explicitly; θ̂(Y) is a complicated function of Y
Result: Simply moving from a linear model to a nonlinear (albeit simple) one complicates derivation of the sampling distribution for θ̂
- This is the case for most statistical models
Further consideration: For semiparametric models, with no distributional assumption for Y | U, even if an explicit expression for θ̂ were available, how would we derive the full sampling distribution?
Approach: Resort to an approximation
- Standard approximation: evaluate properties of estimators for large sample size; i.e., as n → ∞
- Hope that properties so derived will provide a reasonable approximation to the (intractable) properties for realistic (finite) n

37 Large sample approximation

Large sample (asymptotic) theory: Evaluate properties as the sample size n → ∞
Requires some basic concepts and tools...
Generically: Let X_n, n = 1, 2, ..., represent a sequence of random variables or vectors indexed by n, and let X be a random variable or vector [all on the same probability space (Ω, B, P)]
Convergence in probability: X_n converges in probability to X, written X_n →^p X, if
lim_{n→∞} P(|X_n - X| < δ) = 1 for all δ > 0
- For random vectors, this holds iff it holds component-wise
Fact: If h(·) is continuous, then X_n →^p X implies h(X_n) →^p h(X)

38 Large sample approximation

Usefulness: Identify X_n as a sequence of estimators ψ̂_n based on samples of size n for the parameter ψ in a parametric or semiparametric model
- Assuming the model is correct, let ψ_0 be the true value of ψ; i.e., the value that actually generates data we might see
- Identify X with ψ_0 (a degenerate random variable)
Consistency: An estimator ψ̂_n is said to be consistent for ψ_0 if ψ̂_n →^p ψ_0
Practical interpretation: For n large, the probability is small that ψ̂_n takes a value outside an arbitrarily small neighborhood of ψ_0
- As n gets large, the probability that ψ̂_n will "wander away" from ψ_0 is small
- Consistency is good, sort of like unbiasedness
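
Consistency can be seen numerically by tracking, for growing n, how often an estimator falls outside a small neighborhood of the true value. A small sketch for the OLS slope in the simple linear regression model follows; the true values and neighborhood size δ are hypothetical choices for illustration.

```python
import numpy as np

# Sketch: consistency in action. For each n, simulate many data sets and record how
# often the OLS slope estimate falls outside a small neighborhood of the true slope.
# True values are hypothetical, for illustration only.
rng = np.random.default_rng(4)
theta_true = np.array([1.0, 2.0])
sigma, delta, n_reps = 2.0, 0.1, 2000

for n in (10, 50, 250, 1000):
    t = np.linspace(0.0, 10.0, n)
    X = np.column_stack([np.ones(n), t])
    outside = 0
    for _ in range(n_reps):
        y = X @ theta_true + rng.normal(0.0, sigma, size=n)
        slope_hat = np.linalg.solve(X.T @ X, X.T @ y)[1]
        outside += abs(slope_hat - theta_true[1]) > delta
    print(f"n = {n:5d}: P(|slope_hat - slope_0| > {delta}) ~ {outside / n_reps:.3f}")
```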

39 Large sample approximation

Convergence in distribution: Suppose X_n has cdf F_n and X has cdf F. X_n is said to converge in distribution (or converge in law) to X iff, for each continuity point x of F,
lim_{n→∞} F_n(x) = F(x)
Write X_n →^D X or X_n →^L X
Fact: if X_n →^p X, then X_n →^L X
Practical interpretation: If we are interested in probabilities associated with X_n, we can replace them by probabilities for X
- Use to approximate the sampling distribution of an estimator
Not so fast... ψ̂_n →^p ψ_0 (consistent estimator) implies ψ̂_n →^L ψ_0, but ψ_0 is a constant, so the limit distribution is degenerate: not particularly useful!

40 Large sample approximation

Instead: Consider a suitably centered and scaled version of ψ̂_n with more interesting behavior
Asymptotic normality (random variable): X_n is asymptotically normal if we can find sequences {a_n} and {c_n} such that
c_n (X_n - a_n) →^L N(0, 1)   (8)
Notational convention: this means "converges in law to a random variable with a standard normal distribution"
Advantage: (8) implies X_n ·~ N(a_n, c_n^{-2}), where ·~ means "distributed approximately as"
For regular estimators: It may be shown that
n^{1/2} (ψ̂_n - ψ_0) →^L N(0, C), where C = lim_{n→∞} C_n, so that ψ̂_n ·~ N(ψ_0, n^{-1} C_n)
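
A quick numerical sketch of (8) in the simplest setting: the sample mean of exponential data, with a_n = μ and c_n = n^{1/2}/σ, so the standardized quantity should behave approximately like N(0, 1). The distribution, n, and number of replications are hypothetical choices for illustration.

```python
import numpy as np
from scipy import stats

# Sketch of asymptotic normality (8) for a simple case: the sample mean of
# exponential(1) data, with a_n = mu = 1 and c_n = sqrt(n)/sigma, sigma = 1.
rng = np.random.default_rng(5)
mu, sigma, n, n_reps = 1.0, 1.0, 200, 20000

xbar = rng.exponential(mu, size=(n_reps, n)).mean(axis=1)
z = np.sqrt(n) * (xbar - mu) / sigma           # c_n (X_n - a_n)

# Compare a few empirical probabilities with the standard normal cdf
for q in (-1.96, 0.0, 1.96):
    print(f"P(Z <= {q:+.2f}): empirical {np.mean(z <= q):.3f}, "
          f"N(0,1) {stats.norm.cdf(q):.3f}")
```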

41 Large sample approximation

Asymptotic relative efficiency (ARE): Suppose we have two competing estimators ψ̂_n^(1) and ψ̂_n^(2) satisfying
n^{1/2} (ψ̂_n^(1) - ψ_0) →^L N(0, C_1),  n^{1/2} (ψ̂_n^(2) - ψ_0) →^L N(0, C_2)
If dim ψ = 1, the asymptotic relative efficiency of ψ̂_n^(1) to ψ̂_n^(2) is ARE = C_2 / C_1, and ψ̂^(1) is preferred if ARE > 1
If dim ψ = p > 1, then ARE = (|C_2| / |C_1|)^{1/p} (a ratio of determinants)
If ARE > 1, ψ̂^(2) needs proportionately ARE times as many observations as ψ̂^(1) to achieve the same (large-n) precision
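
A classic illustration of ARE (not from the course notes) compares the sample mean and the sample median as estimators of the center of a normal population, where theory gives ARE(mean to median) = π/2 ≈ 1.571; a minimal simulation sketch follows.

```python
import numpy as np

# Sketch of ARE via simulation: sample mean vs. sample median as estimators of the
# center of a N(0, 1) population (a classic example, not from the course notes).
# Theory: ARE(mean to median) = pi/2 at the normal model.
rng = np.random.default_rng(6)
n, n_reps = 200, 10000

x = rng.normal(0.0, 1.0, size=(n_reps, n))
var_mean = np.var(x.mean(axis=1))
var_median = np.var(np.median(x, axis=1))

print("ARE estimate (C_median / C_mean):", var_median / var_mean)  # roughly pi/2
print("pi/2 =", np.pi / 2)
```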

42 Large sample approximation

Maximum likelihood estimator: For a parametric model, if the model assumptions are all exactly correct, then the maximum likelihood estimator for ψ is asymptotically efficient
- The large-n distribution of the MLE achieves the smallest possible variance among all regular estimators
