MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems
1 Principles of Statistical Inference

Recap of statistical models
Statistical inference (frequentist)
Parametric vs. semiparametric models and inference
Large sample approximation
2 Recap of statistical models

Theophylline example: Recap

Mathematical model:

f(t, U, θ) = k_a F D / {V (k_a − k_e)} {e^{−k_e t} − e^{−k_a t}},  θ = (k_a, k_e, V)^T,
U = D = oral dose at t = 0

Y = (Y_1, ..., Y_n)^T at times (t_1, ..., t_n)

Statistical model with intra-subject correlation:

Y | U ~ N_n{f(U, θ), σ_1^2 I_n + σ_2^2 Γ},  ψ = (θ^T, σ_1^2, σ_2^2, φ)^T   (1)

Simplified statistical model assuming negligible intra-subject correlation:

Y | U ~ N_n{f(U, θ), σ^2 I_n},  ψ = (θ^T, σ^2)^T   (2)
3 Recap of statistical models

HIV example: Recap

Y = (Y_1^T, ..., Y_n^T)^T at times (t_1, ..., t_n), where
Y_j = (Y_j^(1), Y_j^(2))^T is the bivariate observation at time t_j on
{T_1(t_j) + T_1*(t_j), V_I(t_j) + V_NI(t_j)}^T = (total CD4, viral load)^T

Mathematical model: Not repeated here for brevity; 7 states

x(t) = {T_1(t), T_2(t), T_1*(t), T_2*(t), V_I(t), V_NI(t), E(t)}^T

and observed continuous-time ARV input U(t),

f(t, U, θ) = O x(t, U, θ) = (f^(1)(t, U, θ), f^(2)(t, U, θ))^T
4 Recap of statistical models

HIV example: Recap

Usual simplified statistical model assuming all intra-subject correlations negligible:

Y_j | U ~ N_2{f(t_j, U, θ), V},  j = 1, ..., n   (3)

V = diag{σ^(1)2, σ^(2)2},  ψ = (θ^T, σ^(1)2, σ^(2)2)^T

More complicated statistical models are of course possible; (3) is often tacitly assumed
in inverse problems without comment regarding sources of variation
5 Recap of statistical models

Recall: A statistical model like (1), (2), or (3) describes the assumed family of all
possible probability distributions for the random vector Y representing the data
generating mechanism for observations we might see at t_1, ..., t_n under conditions U

Indexed by a parameter vector ψ

We are interested in the component θ of ψ that governs the mathematical model, but the
other components are needed to describe the statistical model fully, so we are stuck
dealing with all of ψ

The assumed statistical model may or may not be correct
6 Statistical inference

For definiteness: We discuss basic principles of statistical inference in the context of
statistical models of the form

Y_j = f(t_j, U, θ) + ε_j,

where Y_j is scalar (generalize to multivariate Y_j later...) and (Y_1, ..., Y_n) are
independent (conditional on U), with

Y = (Y_1, ..., Y_n)^T,  Y | U ~ N_n{f(U, θ), σ^2 I_n},  ψ = (θ^T, σ^2)^T,
Y_j | U ~ N{f(t_j, U, θ), σ^2}

I.e., Y_j, j = 1, ..., n, may be regarded as representing the result of a potential draw
from the normal probability distribution describing how such observations would turn out
at t_j (given phenomena like measurement error and biological fluctuation) under
conditions U
7 Statistical inference

Conceptually: The statistical model is a formal representation of the population of all
possible realizations of (Y_1, ..., Y_n) we would ever see under U

When we collect data, we observe a sample from the population; i.e., we get to see a
single realization (y_1, ..., y_n) of (Y_1, ..., Y_n) (and a realization u of U, say)

Objective, restated: What can we learn about the true value of ψ (which determines the
nature of the population) from a sample?
We do not get to observe the entire population (or else we'd know ψ)
How uncertain will we be? What if we're wrong about the statistical model (later)?

Statistical inference (loosely speaking): Making statements about a population of
interest on the basis of only a sample from the population
8 Statistical inference

Statistic: Any function of a random variable (or variables)

Parameter (point) estimation: Construct a function of (Y_1, ..., Y_n) (i.e., a
"statistic") that, if evaluated at a particular realization (y_1, ..., y_n), yields a
numerical value that gives information on the true value of ψ

Estimator: The function itself
Estimate: The numerical value based on a particular realization
Estimation: Used both to denote the procedure (estimator) and the actual calculation of
a numerical value (estimate)
Both also depend on U and u, of course

Remark: The distinction between estimator and estimate is often subject to abuse in the
literature; the meaning is usually clear from the context
9 Statistical inference

Simplest example: Suppose that f(t, U, θ) = θ_1 + θ_2 t, θ = (θ_1, θ_2)^T
Simple linear regression model
No U here, so suppress conditioning
Usual assumption: Y_j ~ N(θ_1 + θ_2 t_j, σ^2), Y_1, ..., Y_n all independent,
ψ = (θ^T, σ^2)^T

Standard estimator for θ: the ordinary least squares estimator

arg min_θ Σ_{j=1}^n (Y_j − θ_1 − θ_2 t_j)^2   (4)

For a particular data set, the estimate of θ minimizes Σ_{j=1}^n (y_j − θ_1 − θ_2 t_j)^2
The rest of ψ, namely σ^2, can be estimated separately (later)
Note that (4) is the usual objective function adopted in inverse problems for arbitrary
f(t, U, θ)
10 Statistical inference

Closed-form solution: Defining the n × 2 design matrix X with jth row (1, t_j) and
y = (y_1, y_2, ..., y_n)^T, minimizing yields the estimator and estimate

θ̂(Y) = (X^T X)^{−1} X^T Y   and   θ̂ = (X^T X)^{−1} X^T y

Convention: the "hat" (as in θ̂) denotes an estimator or estimate (for θ here)
Emphasis that the estimator θ̂(Y) is a function of (Y_1, ..., Y_n) is usually suppressed
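As a numerical check on the closed-form expression, here is a minimal sketch in NumPy; the data, true parameter values, and noise level are invented purely for illustration:

```python
import numpy as np

# Simulated data for the simple linear regression model (illustrative values)
rng = np.random.default_rng(0)
n = 25
t = np.linspace(0.0, 10.0, n)
theta_true = np.array([1.5, 0.8])                      # (theta_1, theta_2)
y = theta_true[0] + theta_true[1] * t + rng.normal(0.0, 0.5, n)

# Design matrix with a column of ones and the t_j's
X = np.column_stack([np.ones(n), t])

# Closed-form OLS estimate: (X^T X)^{-1} X^T y
theta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against a generic least-squares solver
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta_hat)
```

Solving the normal equations with `np.linalg.solve` avoids forming the explicit inverse, which is the numerically preferred way to evaluate the same formula.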
11 Statistical inference

Motivation for ordinary least squares: Coming up

Separate estimation of σ^2: ψ̂ = (θ̂^T, σ̂^2)^T,

σ̂^2 = (n − 2)^{−1} Σ_{j=1}^n (Y_j − θ̂_1 − θ̂_2 t_j)^2

Question: How good is θ̂ = θ̂(Y) as an estimator for the true value of θ? More
generally, how good is ψ̂ = ψ̂(Y) as an estimator for the true value of ψ?
A question about the procedure, not a particular numerical value
What do we mean by "good"? Can we characterize the extent of uncertainty?
12 Statistical inference

Key idea: An estimator is a function of the random variables that represent the data
generating mechanism

Thus, for any value of ψ, the estimator itself has a probability distribution (that
depends on that of (Y_1, ..., Y_n) and hence on ψ)

In general problems, this distribution is conditional on U
It is typical to suppress this dependence for brevity in general arguments; it is
understood that everything is conditional on U (we'll do that here)
13 Statistical inference

Conceptually: Each possible realization of (Y_1, ..., Y_n) yields a (numerical) value of
ψ̂; i.e., the random vector ψ̂(Y) has sample space comprising all these values

We may thus think of the probability distribution of ψ̂(Y) as representing probabilities
for values of ψ̂ that would be observed across all possible data sets we could end up
with; when we collect data, we end up with only one of these

If this distribution has large variance, estimates vary a lot across possible data sets,
and our one estimate tells us little about the true value of ψ (another data set might
have yielded a very different numerical value): lots of uncertainty

On the other hand, if this distribution has small variance, estimates vary little across
possible data sets, and our one estimate may be quite informative about the true value
of ψ (another data set might have yielded a similar value): only mild uncertainty
14 Statistical inference

Sampling distribution: The probability distribution of an estimator
Properties of the sampling distribution characterize the uncertainty in the estimation
procedure (estimator)

Unbiasedness: Intuitively, for a particular ψ (or its components)
Some values in the sample space are larger than ψ, some smaller
All possible values of the estimator for ψ should "average out" to ψ
Formally, writing E_ψ(·) to denote expectation with respect to the distribution of
(Y_1, ..., Y_n) under ψ,

E_ψ(ψ̂) = ψ   (5)

An estimator that satisfies (5) is called an unbiased estimator
15 Statistical inference

Sampling variance and standard error: Quantify variation across possible values of ψ̂

The sampling covariance matrix is the covariance matrix of the sampling distribution of
ψ̂(Y), which we write as var_ψ(ψ̂)
Diagonal elements of var_ψ(ψ̂) are the sampling variances of the components of ψ̂(Y),
e.g., var(ψ̂_k) for the kth component
The standard error of the kth component of ψ̂(Y) is the standard deviation
sqrt{var(ψ̂_k)} (on the same scale as ψ)

Key factors determining the magnitude of the sampling variance:
Variance of the original probability distribution of Y (largely out of our control)
Sample size n (often under our control: experimental design)
16 Statistical inference

Example, continued: Sampling distribution of the ordinary least squares estimator θ̂
(a component of ψ̂)

The estimator is unbiased:

E(Y) = Xθ implies E(θ̂) = E{(X^T X)^{−1} X^T Y} = (X^T X)^{−1} X^T Xθ = θ

The sampling variance:

var(Y) = σ^2 I_n implies
var(θ̂) = var{(X^T X)^{−1} X^T Y} = (X^T X)^{−1} X^T var(Y) X (X^T X)^{−1}
        = σ^2 (X^T X)^{−1}

The sampling distribution itself is multivariate normal:

θ̂ ~ N_2{θ, σ^2 (X^T X)^{−1}}

The sampling variance depends on σ^2 (the variance of the assumed probability
distribution of Y_j)
Writing (X^T X)^{−1} = n^{−1}{n^{−1} X^T X}^{−1}, the sampling variance decreases as n
increases
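These sampling-distribution facts can be checked by Monte Carlo: generate many data sets under the model, compute the OLS estimate for each, and compare the empirical mean and covariance of the estimates with θ and σ^2 (X^T X)^{−1}. All numerical settings below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 50, 0.5
t = np.linspace(0.0, 10.0, n)
X = np.column_stack([np.ones(n), t])
theta = np.array([1.0, 2.0])
XtX_inv = np.linalg.inv(X.T @ X)

# Draw many realizations of Y and compute theta-hat for each data set
reps = 20000
Y = X @ theta + rng.normal(0.0, sigma, (reps, n))   # each row is one data set
theta_hats = Y @ X @ XtX_inv                        # rows are (X^T X)^{-1} X^T y

emp_mean = theta_hats.mean(axis=0)           # should be close to theta (unbiasedness)
emp_cov = np.cov(theta_hats, rowvar=False)   # should be close to sigma^2 (X^T X)^{-1}
print(emp_mean)
print(emp_cov)
```

The vectorized line `Y @ X @ XtX_inv` works because (X^T X)^{−1} is symmetric, so transposing the usual formula gives y^T X (X^T X)^{−1} row by row.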
17 Statistical inference

Absolute necessity: Whenever an estimate based on data is reported, it should be
accompanied by an assessment of uncertainty based on the sampling distribution

Natural approach: look at the standard error; here, for the kth component of θ̂,

var(θ̂_k) = σ^2 (X^T X)^{−1}_{kk}

σ^2 is not known, so provide an estimate of the standard error of the kth component
(usually itself called the "standard error"); in the example,

SE(θ̂_k) = sqrt{σ̂^2 (X^T X)^{−1}_{kk}},  k = 1, 2

How to interpret?
(i) If θ̂_2 = 2.0 and SE(θ̂_2) = 3.0: very uncertain
(ii) If θ̂_2 = 2.0 and SE(θ̂_2) = 0.03: feeling pretty good!
Ways to improve precision (reduce uncertainty) in case (i)?
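A sketch of computing estimated standard errors from a fitted simple linear regression, using the (n − 2)^{−1} divisor for σ̂^2; the simulated data and parameter values are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
t = np.linspace(0.0, 5.0, n)
y = 1.0 + 2.0 * t + rng.normal(0.0, 0.4, n)   # truth: theta = (1.0, 2.0)

X = np.column_stack([np.ones(n), t])
theta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# sigma-hat^2 with the (n - 2)^{-1} divisor
resid = y - X @ theta_hat
sigma2_hat = resid @ resid / (n - 2)

# SE(theta-hat_k) = sqrt{ sigma-hat^2 (X^T X)^{-1}_{kk} }
se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(X.T @ X)))
print(theta_hat)
print(se)
```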
18 Statistical inference

In fact, we can do more: the entire sampling distribution can help us make a probability
statement to better characterize uncertainty

For example,

Z = (θ̂_k − θ_k) / sqrt{σ^2 (X^T X)^{−1}_{kk}} ~ N(0, 1),

the standard normal distribution, k = 1, 2

When σ^2 is replaced by σ̂^2,

T = (θ̂_k − θ_k) / SE(θ̂_k) ~ t_{n−2},

Student's t distribution with n − 2 degrees of freedom

Let α be some small probability, so that 1 − α is large; e.g., α = 0.05
Let t_{1−α/2} satisfy P(T ≥ t_{1−α/2}) = α/2, so, by symmetry, P(T ≤ −t_{1−α/2}) = α/2

P(−t_{1−α/2} ≤ T ≤ t_{1−α/2}) = 1 − α

P{θ̂_2 − t_{1−α/2} SE(θ̂_2) ≤ θ_2 ≤ θ̂_2 + t_{1−α/2} SE(θ̂_2)} = 1 − α
19 Statistical inference

Important:

P{θ̂_2 − t_{1−α/2} SE(θ̂_2) ≤ θ_2 ≤ θ̂_2 + t_{1−α/2} SE(θ̂_2)} = 1 − α

This is a probability pertaining to the sampling distribution of θ̂_2
That is, over all possible realizations of Y of size n, the probability is 1 − α that
the endpoints of the interval

[ θ̂_2 − t_{1−α/2} SE(θ̂_2), θ̂_2 + t_{1−α/2} SE(θ̂_2) ]

will include (the fixed value) θ_2

Confidence interval
20 Statistical inference

Confidence interval: A probability statement about the procedure by which an estimator
is constructed from a realization of Y
NOT a probability statement about θ_2 (which is fixed)

Interpretation: Over all possible realizations of Y of size n, if we were to calculate
the interval according to the estimation procedure, 100(1 − α)% of such intervals would
cover θ_2

Provides more information than the standard error alone: how large or small SE must be
relative to θ̂_2 to feel confident that the procedure of data generation and estimation
provides a reliable understanding ("confident" quantified by α)
α is chosen by the analyst

For the example, with α = 0.05 and n large, t_{1−α/2} ≈ 1.96:
(i) θ̂_2 = 2.0, SE(θ̂_2) = 3.0 gives [−3.88, 7.88]: no confidence
(ii) θ̂_2 = 2.0, SE(θ̂_2) = 0.03 gives [1.94, 2.06]: feeling pretty confident!
21 Statistical inference

Warning: The numerical values themselves are meaningless except for the impression they
give about the quality of the procedure
Once a realization of the data is in hand, the numerical interval either covers the true
value of θ_2 or it doesn't

Wrong interpretation: "The probability is 1 − α that θ_2 is between −3.88 and 7.88"
What we CAN say: We are 100(1 − α)% confident that intervals constructed this way would
cover θ_2; the numerical values give only a sense of the faith we should attach to
results of gathering data the way we did
22 Parametric vs. semiparametric models

Recall: In our modeling context, a statistical model describes all possible probability
distributions for Y (conditional on U), indexed by a parameter ψ (under some set of
assumptions)

Statistical inference, formalized: Estimation of ψ and associated assessment of
uncertainty (sampling distribution)

Issues:
(i) Derivation of sensible estimators for ψ?
(ii) Competing estimators? Optimality?
(iii) Sensitivity to possibly incorrect assumptions?
(iv) Derivation of the sampling distribution in complex models? (next section)
23 Parametric vs. semiparametric models

Example, continued: In the theophylline example we assumed

Y_j = f(t_j, U, θ) + (ε_{1j} + ε_{2j}),  j = 1, ..., n,   Y = f(U, θ) + (ε_1 + ε_2),

writing ε_j = ε_{1j} + ε_{2j} and ε = ε_1 + ε_2, with

E(ε_1 | U) = 0,  E(ε_2 | U) = 0,  var(ε_1 | U) = σ_1^2 I_n,  var(ε_2 | U) = σ_2^2 I_n,

and ε_1 independent of ε_2, so that

E(ε | U) = 0,  var(ε | U) = σ^2 I_n,  σ^2 = σ_1^2 + σ_2^2,

and thus

E(Y | U) = f(U, θ),  var(Y | U) = σ^2 I_n   (6)

(6) is an assumption on certain features of the probability distribution of Y | U
[the first two moments, parameterized by ψ = (θ^T, σ^2)^T]

In addition, we assumed that ε | U is a multivariate normally distributed random vector,
and hence so is Y | U, with mean and covariance matrix given in (6)
24 Parametric vs. semiparametric models

Parametric statistical model: A statistical model in which the probability distribution
of Y is completely specified
For theophylline: Y | U ~ N_n{f(U, θ), σ^2 I_n}
All possible probability distributions are multivariate normal with this mean and
covariance matrix (over all ψ)

Under this assumption, all aspects of Y (given U) (probabilities of any event, all
higher moments) are in principle known if we know ψ
For (fully) parametric models, there is a standard approach to inference (coming up)
But taking Y to be multivariate normal is an assumption: how will inference be affected
if it's wrong?
25 Parametric vs. semiparametric models

Semiparametric statistical model: A statistical model in which the probability
distribution of Y is only partially specified
For theophylline: E(Y | U) = f(U, θ), var(Y | U) = σ^2 I_n
First two moments in terms of the parameter ψ
We could stop here: this is all we're willing to say
All possible probability distributions are any probability distributions having this
mean and covariance matrix (over all ψ)

Intuition: fewer assumptions means more possible models are included and there is less
chance of problems due to misspecification
Price to pay: with fewer assumptions, we cannot take advantage of the additional
information in a full probability model if it's correct
26 Parametric vs. semiparametric models

Estimators in parametric models: Several possibilities

The likelihood function: Suppose p(y | ψ) is the pmf/pdf for Y under the assumed
parametric model (suppress U)
The notation emphasizes that p(y) is indexed by ψ (and is a function of y for fixed ψ)
Given that Y = y is observed, the function of ψ defined by

L(ψ | y) = p(y | ψ)

is called the likelihood function; this is a function of ψ for fixed y

Remark: The likelihood function forms the basis for one of the most popular and
defensible general strategies for finding estimators: the maximum likelihood estimator
27 Parametric vs. semiparametric models

Demonstration: Y a discrete random vector

L(ψ | y) = p(y | ψ) = P_ψ(Y = y),

where the subscript emphasizes that the pmf is indexed by ψ

Suppose ψ_1 and ψ_2 are two possible values for ψ such that

P_{ψ_1}(Y = y) = L(ψ_1 | y) > L(ψ_2 | y) = P_{ψ_2}(Y = y)

Interpretation: the y we actually observed is more likely to have occurred if ψ = ψ_1
than if ψ = ψ_2, so ψ_1 is a more plausible value for the true value of ψ than is ψ_2

It seems reasonable to examine the probability of the y we actually observed under
different possible ψ
The reasoning extends to continuous random vectors under some technicalities
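A tiny numerical illustration of this reasoning with a binomial model (the model and the numbers here are invented for illustration, not from the slides): observing y = 7 successes in 10 trials is more likely under p = 0.7 than under p = 0.3, so 0.7 is the more plausible value:

```python
from math import comb

def binom_pmf(y, n, p):
    """P(Y = y) for Y ~ Binomial(n, p)."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)

y, n = 7, 10
L1 = binom_pmf(y, n, 0.7)   # likelihood at psi_1 = 0.7
L2 = binom_pmf(y, n, 0.3)   # likelihood at psi_2 = 0.3
print(L1, L2, L1 > L2)
```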
28 Parametric vs. semiparametric models

Likelihood principle: If y_1 and y_2 are two possible realizations of Y such that

L(ψ | y_1) = C(y_1, y_2) L(ψ | y_2) for all ψ,

for a constant C(y_1, y_2) not depending on ψ, then conclusions drawn from observing
y_1 or y_2 should be identical

May view the likelihood function as a data reduction device: the likelihood function
contains all the information about ψ in a sample
C(y_1, y_2) = 1: y_1 and y_2 contain the same information about ψ
Moreover, if the likelihood functions are only proportional, the information is still
the same: if L(ψ_2 | y_1) = 3 L(ψ_1 | y_1), then L(ψ_2 | y_2) = 3 L(ψ_1 | y_2), and we
will conclude that ψ_2 is more plausible than ψ_1 whether we observe y_1 or y_2
29 Parametric vs. semiparametric models

Definition: For each y ∈ Y, let ψ̂(y) be a parameter value at which L(ψ | y) attains its
maximum as a function of ψ, with y held fixed. A maximum likelihood estimator for ψ is
ψ̂(Y)
Acronym: MLE

Intuitively reasonable: the MLE is the parameter value for which the observed y is most
likely
More formally: the MLE has optimality properties under the assumption that the specified
probability model is correct
Operationally: must find the global maximum (possible computational issues)
If L(ψ | y) is differentiable in each component of ψ, a possible candidate for the MLE
is the value of ψ satisfying

∂/∂ψ log L(ψ | y) = 0

(necessary but not sufficient; could be a local maximum, etc.)
30 Parametric vs. semiparametric models

Nice properties:
Invariance: if ψ̂ is the MLE for ψ, then for any function τ(ψ), τ(ψ̂) is the MLE for
τ(ψ)
It is the best estimator for ψ in a certain sense (coming up)

Example, continued: Simple linear regression model, Y = (Y_1, ..., Y_n)^T,
Y_j ~ N(θ_1 + θ_2 t_j, σ^2) independent, ψ = (θ_1, θ_2, σ^2)^T

Under the normality assumption, the likelihood function is

L(ψ | y) = Π_{j=1}^n (2π σ^2)^{−1/2} exp{−(y_j − θ_1 − θ_2 t_j)^2 / (2σ^2)}

log L(ψ | y) = −n log σ − Σ_{j=1}^n (y_j − θ_1 − θ_2 t_j)^2 / (2σ^2) + constant
31 Parametric vs. semiparametric models

Example, continued: Clearly maximized in θ_1, θ_2 by minimizing

Σ_{j=1}^n (y_j − θ_1 − θ_2 t_j)^2,

so the estimator minimizes Σ_{j=1}^n (Y_j − θ_1 − θ_2 t_j)^2

Differentiating log L(ψ | y) with respect to θ_1, θ_2 yields

Σ_{j=1}^n (y_j − θ_1 − θ_2 t_j) (1, t_j)^T = 0,   (7)

the so-called normal equations

Differentiating log L(ψ | y) with respect to σ^2 yields

σ̂^2 = n^{−1} Σ_{j=1}^n (y_j − θ̂_1 − θ̂_2 t_j)^2
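The claim that the MLE under normality coincides with OLS, and that the MLE of σ^2 uses the n^{−1} divisor rather than (n − 2)^{−1}, can be verified by maximizing the log-likelihood numerically; the simulated data and starting values are invented for illustration:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
n = 40
t = np.linspace(0.0, 8.0, n)
y = 0.5 + 1.5 * t + rng.normal(0.0, 0.7, n)
X = np.column_stack([np.ones(n), t])

def negloglik(params):
    # Negative log-likelihood, dropping the (n/2) log(2*pi) constant
    th1, th2, log_sigma = params
    sigma = np.exp(log_sigma)          # log scale keeps sigma > 0
    resid = y - th1 - th2 * t
    return n * np.log(sigma) + np.sum(resid**2) / (2 * sigma**2)

fit = minimize(negloglik, x0=np.array([0.0, 1.0, 0.0]), method="Nelder-Mead",
               options={"xatol": 1e-10, "fatol": 1e-10, "maxiter": 10000})
theta_mle = fit.x[:2]
sigma2_mle = np.exp(2 * fit.x[2])

theta_ols = np.linalg.solve(X.T @ X, X.T @ y)
sigma2_n = np.sum((y - X @ theta_ols) ** 2) / n   # MLE divisor is n, not n - 2
print(theta_mle, theta_ols)
print(sigma2_mle, sigma2_n)
```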
32 Parametric vs. semiparametric models

Result: The ordinary least squares estimator is the maximum likelihood estimator under
the assumption of normality

Inference in semiparametric models: Maximum likelihood seems infeasible without a full
distributional assumption
Derive estimators in other (sometimes seemingly ad hoc) ways
33 Parametric vs. semiparametric models

Example, continued: Simple linear regression model, only willing to assume
E(Y_j) = θ_1 + θ_2 t_j, var(Y_j) = σ^2

Perspective: minimize "distance" between model and data,

arg min_θ Σ_{j=1}^n (Y_j − θ_1 − θ_2 t_j)^2,

the ordinary least squares estimator

Alternatively, (7) can be written as

n^{−1} Σ_{j=1}^n Y_j (1, t_j)^T = n^{−1} Σ_{j=1}^n (θ_1 + θ_2 t_j) (1, t_j)^T

Equate the LHS to its expectation: the method of moments
Solving (7) turns out to yield the best estimator for θ in a certain sense if all one is
willing to specify is the mean and variance of Y_j
34 Parametric vs. semiparametric models

Remarks:

Identifiability of statistical models: A parameter ψ for a family of probability
distributions is identifiable if distinct values of ψ correspond to distinct pmf/pdfs;
i.e., if the pmf/pdf for Y is p(y | ψ) under the model,

ψ ≠ ψ′ implies that p(y | ψ) is not the same function of y as p(y | ψ′)

Identifiability is a feature of the model, NOT of an estimator or estimation procedure

Difficulty with nonidentifiability in inference: Observations governed by p(y | ψ) and
p(y | ψ′) look the same, so there is no way to know whether ψ or ψ′ is the true value
(same likelihood function value)

How to fix nonidentifiability: May revise the model, impose constraints, make
assumptions

Limitations of data: Even if a model is identifiable, without sufficient data it may be
practically impossible to estimate some parameters, because the information needed is
not available in the data
35 Large sample approximation

Complication: Unfortunately, things aren't always so easy
For simple statistical models like the simple linear regression model with normality
assumptions, deriving the sampling distribution and showing unbiasedness of the least
squares estimator for θ is straightforward, and the distribution is available exactly
However, things get ugly quickly; e.g., recall the theophylline example with the
parametric model

Y_j | U ~ N{f(t_j, U, θ), σ^2},
f(t, U, θ) = k_a F D / {V (k_a − k_e)} {e^{−k_e t} − e^{−k_a t}},  θ = (k_a, k_e, V)^T

The ordinary least squares estimator minimizes Σ_{j=1}^n {Y_j − f(t_j, U, θ)}^2; i.e.,
it solves

Σ_{j=1}^n {Y_j − f(t_j, U, θ)} f_θ(t_j, U, θ) = 0,  f_θ(t, U, θ) = ∂f(t, U, θ)/∂θ
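In practice the nonlinear least squares problem is solved numerically. A sketch using `scipy.optimize.least_squares` on the one-compartment model above, with all dose, parameter, noise, and bound values invented for illustration (and bioavailability F absorbed into the dose D here):

```python
import numpy as np
from scipy.optimize import least_squares

def f(t, theta, D=4.0):
    """One-compartment oral-dose model: ka*D / {V*(ka - ke)} * (e^{-ke t} - e^{-ka t})."""
    ka, ke, V = theta
    return ka * D / (V * (ka - ke)) * (np.exp(-ke * t) - np.exp(-ka * t))

rng = np.random.default_rng(5)
t = np.array([0.25, 0.5, 1.0, 2.0, 4.0, 6.0, 8.0, 12.0, 24.0])   # sampling times (h)
theta_true = np.array([1.5, 0.12, 30.0])                         # (ka, ke, V), illustrative
y = f(t, theta_true) + rng.normal(0.0, 0.005, t.size)

# Ordinary least squares: minimize sum_j {y_j - f(t_j, theta)}^2
fit = least_squares(lambda th: y - f(t, th),
                    x0=np.array([1.0, 0.1, 20.0]),
                    bounds=([0.2, 0.01, 1.0], [10.0, 1.0, 100.0]))
print(fit.x)
```

The bounds keep the optimizer away from the removable singularity at ka = ke; a production fit would also want standard errors, e.g. from the Jacobian that `least_squares` returns.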
36 Large sample approximation

It is obviously not possible to solve for θ̂ explicitly: θ̂(Y) is a complicated
function of Y

Result: Simply moving from a linear model to a nonlinear (albeit simple) one complicates
derivation of the sampling distribution for θ̂
This is the case for most statistical models

Further consideration: For semiparametric models, with no distributional assumption for
Y | U, even if an explicit expression for θ̂ were available, how would we derive the
full sampling distribution?

Approach: Resort to an approximation
Standard approximation: evaluate properties of estimators for large sample size, i.e.,
as n → ∞
Hope that properties so derived will provide a reasonable approximation to the
(intractable) properties for realistic (finite) n
37 Large sample approximation

Large sample (asymptotic) theory: Evaluate properties as the sample size n → ∞
Requires some basic concepts and tools...

Generically: Let X_n, n = 1, 2, ..., represent a sequence of random variables or
vectors indexed by n, and let X be a random variable or vector [all on the same
probability space (Ω, B, P)]

Convergence in probability: X_n converges in probability to X, written X_n →^p X, if

lim_{n→∞} P(|X_n − X| < δ) = 1 for all δ > 0

For random vectors, this holds iff it holds component-wise

Fact: If h(·) is continuous, then X_n →^p X implies h(X_n) →^p h(X)
38 Large sample approximation

Usefulness: Identify X_n with a sequence of estimators ψ̂_n based on samples of size n
for the parameter ψ in a parametric or semiparametric model
Assuming the model is correct, let ψ_0 be the true value of ψ; i.e., the value that
actually generates data we might see
Identify X with ψ_0 (a degenerate random variable)

Consistency: An estimator ψ̂_n is said to be consistent for ψ_0 if ψ̂_n →^p ψ_0

Practical interpretation: For n large, the probability is small that ψ̂_n takes a value
outside an arbitrarily small neighborhood of ψ_0
As n gets large, the probability that ψ̂_n will "wander away" from ψ_0 is small
Consistency is good, sort of like unbiasedness
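Consistency can be made concrete by simulation: take a simple consistent estimator (the sample mean of normal data, estimating the true mean) and estimate P(|X̄_n − μ_0| ≥ δ) by Monte Carlo for increasing n. All numerical settings are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(6)
mu0 = 3.0   # true parameter value (the role of psi_0)

# For each n, estimate P(|Xbar_n - mu0| >= delta) by Monte Carlo
delta, reps = 0.1, 5000
probs = {}
for n in (10, 100, 1000):
    xbar = rng.normal(mu0, 1.0, (reps, n)).mean(axis=1)
    probs[n] = np.mean(np.abs(xbar - mu0) >= delta)

print(probs)   # probabilities shrink toward 0 as n grows
```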
39 Large sample approximation

Convergence in distribution: Suppose X_n has cdf F_n and X has cdf F. X_n is said to
converge in distribution (or converge in law) to X iff, at each continuity point x of F,

lim_{n→∞} F_n(x) = F(x)

Write X_n →^D X or X_n →^L X
Fact: if X_n →^p X, then X_n →^L X

Practical interpretation: If we are interested in probabilities associated with X_n, we
can replace them by probabilities for X
Use this to approximate the sampling distribution of an estimator

Not so fast... ψ̂_n →^p ψ_0 (consistent estimator) implies ψ̂_n →^L ψ_0, but ψ_0 is a
constant, so the limit distribution is degenerate: not particularly useful!
40 Large sample approximation

Instead: Consider a suitably centered and scaled version of ψ̂_n with more interesting
behavior

Asymptotic normality (random variable): X_n is asymptotically normal if we can find
sequences {a_n} and {c_n} such that

c_n(X_n − a_n) →^L N(0, 1)   (8)

Notational convention: this means "converges in law to a random variable with a standard
normal distribution"

Advantage: (8) implies X_n ~· N(a_n, c_n^{−2}), where "~·" means "distributed
approximately as"

For "regular" estimators: It may be shown that

n^{1/2}(ψ̂_n − ψ_0) →^L N(0, C),

where C = lim_{n→∞} C_n, so that ψ̂_n ~· N(ψ_0, n^{−1} C_n)
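A sketch of (8) for the sample mean of exponential data, a deliberately non-normal population (the rate, sample size, and replication count are invented for illustration): after centering at μ_0 and scaling by n^{1/2}, the estimator behaves approximately like a standard normal here because the exponential(1) variance is 1:

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 200, 20000
mu0 = 1.0   # mean of Exponential(1); its variance is also 1

xbar = rng.exponential(mu0, (reps, n)).mean(axis=1)
z = np.sqrt(n) * (xbar - mu0)   # centered and scaled as in (8)

print(z.mean(), z.std())        # approximately 0 and 1
print(np.mean(z <= 1.96))       # approximately Phi(1.96) = 0.975
```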
41 Large sample approximation

Asymptotic relative efficiency (ARE): Suppose we have two competing estimators ψ̂_n^(1)
and ψ̂_n^(2) satisfying

n^{1/2}(ψ̂_n^(1) − ψ_0) →^L N(0, C_1),   n^{1/2}(ψ̂_n^(2) − ψ_0) →^L N(0, C_2)

If dim ψ = 1, the asymptotic relative efficiency of ψ̂_n^(1) to ψ̂_n^(2) is
ARE = C_2 / C_1, and ψ̂_n^(1) is preferred if ARE > 1
If dim ψ = p > 1, then ARE = (|C_2| / |C_1|)^{1/2}, with |·| the determinant

If ARE > 1, ψ̂_n^(2) needs proportionately ARE times as many observations as ψ̂_n^(1)
to achieve the same (large-n) precision
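A classical ARE illustration, not from the slides: for normal data both the sample mean and the sample median are consistent for the center, and the ARE of the mean relative to the median is π/2 ≈ 1.57, so the mean is preferred. A Monte Carlo sketch (sample size and replication count invented):

```python
import numpy as np

rng = np.random.default_rng(8)
n, reps = 400, 20000
x = rng.normal(0.0, 1.0, (reps, n))

var_mean = np.var(x.mean(axis=1))            # approx 1/n
var_median = np.var(np.median(x, axis=1))    # approx (pi/2)/n for large n

are = var_median / var_mean   # ARE of the mean relative to the median
print(are)                    # close to pi/2
```

In the ARE > 1 reading above, the median would need roughly π/2 times as many observations as the mean to match its large-n precision on normal data.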
42 Large sample approximation

Maximum likelihood estimator: For a parametric model, if the model assumptions are all
exactly correct, then the maximum likelihood estimator for ψ is asymptotically
efficient: the large-n distribution of the MLE achieves the smallest possible variance
among all "regular" estimators
More informationBayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence
Bayesian Inference in GLMs Frequentists typically base inferences on MLEs, asymptotic confidence limits, and log-likelihood ratio tests Bayesians base inferences on the posterior distribution of the unknowns
More informationMathematical statistics
October 4 th, 2018 Lecture 12: Information Where are we? Week 1 Week 2 Week 4 Week 7 Week 10 Week 14 Probability reviews Chapter 6: Statistics and Sampling Distributions Chapter 7: Point Estimation Chapter
More informationLeast Squares Estimation-Finite-Sample Properties
Least Squares Estimation-Finite-Sample Properties Ping Yu School of Economics and Finance The University of Hong Kong Ping Yu (HKU) Finite-Sample 1 / 29 Terminology and Assumptions 1 Terminology and Assumptions
More informationProbability and Statistics
CHAPTER 5: PARAMETER ESTIMATION 5-0 Probability and Statistics Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg kristel.vansteen@ulg.ac.be CHAPTER 5: PARAMETER
More informationBayesian Linear Regression [DRAFT - In Progress]
Bayesian Linear Regression [DRAFT - In Progress] David S. Rosenberg Abstract Here we develop some basics of Bayesian linear regression. Most of the calculations for this document come from the basic theory
More informationChapter 3: Maximum Likelihood Theory
Chapter 3: Maximum Likelihood Theory Florian Pelgrin HEC September-December, 2010 Florian Pelgrin (HEC) Maximum Likelihood Theory September-December, 2010 1 / 40 1 Introduction Example 2 Maximum likelihood
More informationSpring 2012 Math 541A Exam 1. X i, S 2 = 1 n. n 1. X i I(X i < c), T n =
Spring 2012 Math 541A Exam 1 1. (a) Let Z i be independent N(0, 1), i = 1, 2,, n. Are Z = 1 n n Z i and S 2 Z = 1 n 1 n (Z i Z) 2 independent? Prove your claim. (b) Let X 1, X 2,, X n be independent identically
More informationLecture 2: Basic Concepts of Statistical Decision Theory
EE378A Statistical Signal Processing Lecture 2-03/31/2016 Lecture 2: Basic Concepts of Statistical Decision Theory Lecturer: Jiantao Jiao, Tsachy Weissman Scribe: John Miller and Aran Nayebi In this lecture
More information10. Linear Models and Maximum Likelihood Estimation
10. Linear Models and Maximum Likelihood Estimation ECE 830, Spring 2017 Rebecca Willett 1 / 34 Primary Goal General problem statement: We observe y i iid pθ, θ Θ and the goal is to determine the θ that
More informationStatistical Distribution Assumptions of General Linear Models
Statistical Distribution Assumptions of General Linear Models Applied Multilevel Models for Cross Sectional Data Lecture 4 ICPSR Summer Workshop University of Colorado Boulder Lecture 4: Statistical Distributions
More informationA Very Brief Summary of Statistical Inference, and Examples
A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2008 Prof. Gesine Reinert 1 Data x = x 1, x 2,..., x n, realisations of random variables X 1, X 2,..., X n with distribution (model)
More informationLectures on Simple Linear Regression Stat 431, Summer 2012
Lectures on Simple Linear Regression Stat 43, Summer 0 Hyunseung Kang July 6-8, 0 Last Updated: July 8, 0 :59PM Introduction Previously, we have been investigating various properties of the population
More informationLecture 25: Review. Statistics 104. April 23, Colin Rundel
Lecture 25: Review Statistics 104 Colin Rundel April 23, 2012 Joint CDF F (x, y) = P [X x, Y y] = P [(X, Y ) lies south-west of the point (x, y)] Y (x,y) X Statistics 104 (Colin Rundel) Lecture 25 April
More informationUniversity of Regina. Lecture Notes. Michael Kozdron
University of Regina Statistics 252 Mathematical Statistics Lecture Notes Winter 2005 Michael Kozdron kozdron@math.uregina.ca www.math.uregina.ca/ kozdron Contents 1 The Basic Idea of Statistics: Estimating
More informationTesting Hypothesis. Maura Mezzetti. Department of Economics and Finance Università Tor Vergata
Maura Department of Economics and Finance Università Tor Vergata Hypothesis Testing Outline It is a mistake to confound strangeness with mystery Sherlock Holmes A Study in Scarlet Outline 1 The Power Function
More informationSpace Telescope Science Institute statistics mini-course. October Inference I: Estimation, Confidence Intervals, and Tests of Hypotheses
Space Telescope Science Institute statistics mini-course October 2011 Inference I: Estimation, Confidence Intervals, and Tests of Hypotheses James L Rosenberger Acknowledgements: Donald Richards, William
More informationSTATISTICS/ECONOMETRICS PREP COURSE PROF. MASSIMO GUIDOLIN
Massimo Guidolin Massimo.Guidolin@unibocconi.it Dept. of Finance STATISTICS/ECONOMETRICS PREP COURSE PROF. MASSIMO GUIDOLIN SECOND PART, LECTURE 2: MODES OF CONVERGENCE AND POINT ESTIMATION Lecture 2:
More informationStatistical Estimation
Statistical Estimation Use data and a model. The plug-in estimators are based on the simple principle of applying the defining functional to the ECDF. Other methods of estimation: minimize residuals from
More informationDiscrete Mathematics and Probability Theory Fall 2015 Lecture 21
CS 70 Discrete Mathematics and Probability Theory Fall 205 Lecture 2 Inference In this note we revisit the problem of inference: Given some data or observations from the world, what can we infer about
More informationRegression with a Single Regressor: Hypothesis Tests and Confidence Intervals
Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals (SW Chapter 5) Outline. The standard error of ˆ. Hypothesis tests concerning β 3. Confidence intervals for β 4. Regression
More informationIntroduction to Estimation Theory
Introduction to Statistics IST, the slides, set 1 Introduction to Estimation Theory Richard D. Gill Dept. of Mathematics Leiden University September 18, 2009 Statistical Models Data, Parameters: Maths
More informationOptimization Problems
Optimization Problems The goal in an optimization problem is to find the point at which the minimum (or maximum) of a real, scalar function f occurs and, usually, to find the value of the function at that
More informationTheory of Maximum Likelihood Estimation. Konstantin Kashin
Gov 2001 Section 5: Theory of Maximum Likelihood Estimation Konstantin Kashin February 28, 2013 Outline Introduction Likelihood Examples of MLE Variance of MLE Asymptotic Properties What is Statistical
More informationIf we want to analyze experimental or simulated data we might encounter the following tasks:
Chapter 1 Introduction If we want to analyze experimental or simulated data we might encounter the following tasks: Characterization of the source of the signal and diagnosis Studying dependencies Prediction
More informationMachine learning - HT Maximum Likelihood
Machine learning - HT 2016 3. Maximum Likelihood Varun Kanade University of Oxford January 27, 2016 Outline Probabilistic Framework Formulate linear regression in the language of probability Introduce
More informationACE 564 Spring Lecture 8. Violations of Basic Assumptions I: Multicollinearity and Non-Sample Information. by Professor Scott H.
ACE 564 Spring 2006 Lecture 8 Violations of Basic Assumptions I: Multicollinearity and Non-Sample Information by Professor Scott H. Irwin Readings: Griffiths, Hill and Judge. "Collinear Economic Variables,
More informationDependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline.
Practitioner Course: Portfolio Optimization September 10, 2008 Before we define dependence, it is useful to define Random variables X and Y are independent iff For all x, y. In particular, F (X,Y ) (x,
More informationLinear models. Linear models are computationally convenient and remain widely used in. applied econometric research
Linear models Linear models are computationally convenient and remain widely used in applied econometric research Our main focus in these lectures will be on single equation linear models of the form y
More informationRecall that in order to prove Theorem 8.8, we argued that under certain regularity conditions, the following facts are true under H 0 : 1 n
Chapter 9 Hypothesis Testing 9.1 Wald, Rao, and Likelihood Ratio Tests Suppose we wish to test H 0 : θ = θ 0 against H 1 : θ θ 0. The likelihood-based results of Chapter 8 give rise to several possible
More informationThe problem is to infer on the underlying probability distribution that gives rise to the data S.
Basic Problem of Statistical Inference Assume that we have a set of observations S = { x 1, x 2,..., x N }, xj R n. The problem is to infer on the underlying probability distribution that gives rise to
More informationProbability. Table of contents
Probability Table of contents 1. Important definitions 2. Distributions 3. Discrete distributions 4. Continuous distributions 5. The Normal distribution 6. Multivariate random variables 7. Other continuous
More informationEC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix)
1 EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) Taisuke Otsu London School of Economics Summer 2018 A.1. Summation operator (Wooldridge, App. A.1) 2 3 Summation operator For
More informationPeter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8
Contents 1 Linear model 1 2 GLS for multivariate regression 5 3 Covariance estimation for the GLM 8 4 Testing the GLH 11 A reference for some of this material can be found somewhere. 1 Linear model Recall
More informationProbabilities & Statistics Revision
Probabilities & Statistics Revision Christopher Ting Christopher Ting http://www.mysmu.edu/faculty/christophert/ : christopherting@smu.edu.sg : 6828 0364 : LKCSB 5036 January 6, 2017 Christopher Ting QF
More informationBias Variance Trade-off
Bias Variance Trade-off The mean squared error of an estimator MSE(ˆθ) = E([ˆθ θ] 2 ) Can be re-expressed MSE(ˆθ) = Var(ˆθ) + (B(ˆθ) 2 ) MSE = VAR + BIAS 2 Proof MSE(ˆθ) = E((ˆθ θ) 2 ) = E(([ˆθ E(ˆθ)]
More informationUnbiased Estimation. Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others.
Unbiased Estimation Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others. To compare ˆθ and θ, two estimators of θ: Say ˆθ is better than θ if it
More informationLecture 3: Introduction to Complexity Regularization
ECE90 Spring 2007 Statistical Learning Theory Instructor: R. Nowak Lecture 3: Introduction to Complexity Regularization We ended the previous lecture with a brief discussion of overfitting. Recall that,
More informationSOLUTIONS Problem Set 2: Static Entry Games
SOLUTIONS Problem Set 2: Static Entry Games Matt Grennan January 29, 2008 These are my attempt at the second problem set for the second year Ph.D. IO course at NYU with Heski Bar-Isaac and Allan Collard-Wexler
More informationMS&E 226: Small Data
MS&E 226: Small Data Lecture 12: Frequentist properties of estimators (v4) Ramesh Johari ramesh.johari@stanford.edu 1 / 39 Frequentist inference 2 / 39 Thinking like a frequentist Suppose that for some
More informationThe loss function and estimating equations
Chapter 6 he loss function and estimating equations 6 Loss functions Up until now our main focus has been on parameter estimating via the maximum likelihood However, the negative maximum likelihood is
More informationCentral limit theorem. Paninski, Intro. Math. Stats., October 5, probability, Z N P Z, if
Paninski, Intro. Math. Stats., October 5, 2005 35 probability, Z P Z, if P ( Z Z > ɛ) 0 as. (The weak LL is called weak because it asserts convergence in probability, which turns out to be a somewhat weak
More informationUnbiased Estimation. Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others.
Unbiased Estimation Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others. To compare ˆθ and θ, two estimators of θ: Say ˆθ is better than θ if it
More informationPractice Problems Section Problems
Practice Problems Section 4-4-3 4-4 4-5 4-6 4-7 4-8 4-10 Supplemental Problems 4-1 to 4-9 4-13, 14, 15, 17, 19, 0 4-3, 34, 36, 38 4-47, 49, 5, 54, 55 4-59, 60, 63 4-66, 68, 69, 70, 74 4-79, 81, 84 4-85,
More informationFall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.
1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n
More informationGMM and SMM. 1. Hansen, L Large Sample Properties of Generalized Method of Moments Estimators, Econometrica, 50, p
GMM and SMM Some useful references: 1. Hansen, L. 1982. Large Sample Properties of Generalized Method of Moments Estimators, Econometrica, 50, p. 1029-54. 2. Lee, B.S. and B. Ingram. 1991 Simulation estimation
More informationECO Class 6 Nonparametric Econometrics
ECO 523 - Class 6 Nonparametric Econometrics Carolina Caetano Contents 1 Nonparametric instrumental variable regression 1 2 Nonparametric Estimation of Average Treatment Effects 3 2.1 Asymptotic results................................
More information2. The Concept of Convergence: Ultrafilters and Nets
2. The Concept of Convergence: Ultrafilters and Nets NOTE: AS OF 2008, SOME OF THIS STUFF IS A BIT OUT- DATED AND HAS A FEW TYPOS. I WILL REVISE THIS MATE- RIAL SOMETIME. In this lecture we discuss two
More informationf(x θ)dx with respect to θ. Assuming certain smoothness conditions concern differentiating under the integral the integral sign, we first obtain
0.1. INTRODUCTION 1 0.1 Introduction R. A. Fisher, a pioneer in the development of mathematical statistics, introduced a measure of the amount of information contained in an observaton from f(x θ). Fisher
More informationECE 275A Homework 7 Solutions
ECE 275A Homework 7 Solutions Solutions 1. For the same specification as in Homework Problem 6.11 we want to determine an estimator for θ using the Method of Moments (MOM). In general, the MOM estimator
More informationWooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics
Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics A short review of the principles of mathematical statistics (or, what you should have learned in EC 151).
More informationRestricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model
Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model Xiuming Zhang zhangxiuming@u.nus.edu A*STAR-NUS Clinical Imaging Research Center October, 015 Summary This report derives
More informationCovariance function estimation in Gaussian process regression
Covariance function estimation in Gaussian process regression François Bachoc Department of Statistics and Operations Research, University of Vienna WU Research Seminar - May 2015 François Bachoc Gaussian
More informationLecture 2: Review of Basic Probability Theory
ECE 830 Fall 2010 Statistical Signal Processing instructor: R. Nowak, scribe: R. Nowak Lecture 2: Review of Basic Probability Theory Probabilistic models will be used throughout the course to represent
More informationLecture 1: August 28
36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 1: August 28 Our broad goal for the first few lectures is to try to understand the behaviour of sums of independent random
More informationRidge regression. Patrick Breheny. February 8. Penalized regression Ridge regression Bayesian interpretation
Patrick Breheny February 8 Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/27 Introduction Basic idea Standardization Large-scale testing is, of course, a big area and we could keep talking
More informationPeter Hoff Minimax estimation October 31, Motivation and definition. 2 Least favorable prior 3. 3 Least favorable prior sequence 11
Contents 1 Motivation and definition 1 2 Least favorable prior 3 3 Least favorable prior sequence 11 4 Nonparametric problems 15 5 Minimax and admissibility 18 6 Superefficiency and sparsity 19 Most of
More information1 Differentiability at a point
Notes by David Groisser, Copyright c 2012 What does mean? These notes are intended as a supplement (not a textbook-replacement) for a class at the level of Calculus 3, but can be used in a higher-level
More informationStatistics. Lecture 2 August 7, 2000 Frank Porter Caltech. The Fundamentals; Point Estimation. Maximum Likelihood, Least Squares and All That
Statistics Lecture 2 August 7, 2000 Frank Porter Caltech The plan for these lectures: The Fundamentals; Point Estimation Maximum Likelihood, Least Squares and All That What is a Confidence Interval? Interval
More informationLong-Run Covariability
Long-Run Covariability Ulrich K. Müller and Mark W. Watson Princeton University October 2016 Motivation Study the long-run covariability/relationship between economic variables great ratios, long-run Phillips
More informationChapter 6. Logistic Regression. 6.1 A linear model for the log odds
Chapter 6 Logistic Regression In logistic regression, there is a categorical response variables, often coded 1=Yes and 0=No. Many important phenomena fit this framework. The patient survives the operation,
More informationLecture 4: September Reminder: convergence of sequences
36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 4: September 6 In this lecture we discuss the convergence of random variables. At a high-level, our first few lectures focused
More informationMathematical statistics
October 18 th, 2018 Lecture 16: Midterm review Countdown to mid-term exam: 7 days Week 1 Chapter 1: Probability review Week 2 Week 4 Week 7 Chapter 6: Statistics Chapter 7: Point Estimation Chapter 8:
More informationPhysics 403. Segev BenZvi. Parameter Estimation, Correlations, and Error Bars. Department of Physics and Astronomy University of Rochester
Physics 403 Parameter Estimation, Correlations, and Error Bars Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Best Estimates and Reliability
More informationFINANCIAL OPTIMIZATION
FINANCIAL OPTIMIZATION Lecture 1: General Principles and Analytic Optimization Philip H. Dybvig Washington University Saint Louis, Missouri Copyright c Philip H. Dybvig 2008 Choose x R N to minimize f(x)
More informationEstimation and Model Selection in Mixed Effects Models Part I. Adeline Samson 1
Estimation and Model Selection in Mixed Effects Models Part I Adeline Samson 1 1 University Paris Descartes Summer school 2009 - Lipari, Italy These slides are based on Marc Lavielle s slides Outline 1
More information