MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems


1 MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems

Principles of Statistical Inference
- Recap of statistical models
- Statistical inference (frequentist)
- Parametric vs. semiparametric models and inference
- Large sample approximation

2 Recap of statistical models

Theophylline example: Recap
Mathematical model, with U = D = oral dose at t = 0:
f(t, U, θ) = k_a F D / {V (k_a - k_e)} {e^{-k_e t} - e^{-k_a t}},  θ = (k_a, k_e, V)^T
Y = (Y_1, ..., Y_n) at times (t_1, ..., t_n)
Statistical model with intra-subject correlation:
Y | U ~ N_n{f(U, θ), σ_1^2 I_n + σ_2^2 Γ},  ψ = (θ^T, σ_1^2, σ_2^2, φ)^T   (1)
Simplified statistical model assuming negligible intra-subject correlation:
Y | U ~ N_n{f(U, θ), σ^2 I_n},  ψ = (θ^T, σ^2)^T   (2)
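
To make the simplified model (2) concrete, here is a minimal Python sketch that evaluates the one-compartment mean function and simulates one realization of Y; the dose, sampling times, and parameter values are hypothetical illustrations, not values from the course notes, and the bioavailability fraction F is absorbed into the dose for simplicity.

```python
import numpy as np

def f_theoph(t, D, theta):
    """One-compartment model f(t, U, theta) with U = D = oral dose at t = 0.

    theta = (ka, ke, V); F (bioavailability) is absorbed into D here for simplicity.
    """
    ka, ke, V = theta
    return ka * D / (V * (ka - ke)) * (np.exp(-ke * t) - np.exp(-ka * t))

# Illustrative (hypothetical) values, not taken from the course notes
theta_true = np.array([1.5, 0.08, 30.0])   # ka (1/hr), ke (1/hr), V (L)
sigma = 0.5                                 # SD of combined intra-subject deviation
D = 320.0                                   # oral dose (mg)
t = np.array([0.25, 0.5, 1, 2, 4, 6, 8, 12, 24.0])

rng = np.random.default_rng(1)
# One realization of Y under simplified model (2): independent N{f(t_j, U, theta), sigma^2}
y = f_theoph(t, D, theta_true) + rng.normal(0.0, sigma, size=t.size)
print(np.round(y, 2))
```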

3 Recap of statistical models

HIV example: Recap
Y = (Y_1^T, ..., Y_n^T) at times (t_1, ..., t_n), where Y_j = (Y_j^(1), Y_j^(2))^T is the bivariate observation at time t_j on
{T_1(t_j) + T_1*(t_j), V_I(t_j) + V_NI(t_j)}^T = (total CD4, viral load)
Mathematical model: Not repeated here for brevity; 7 states
x(t) = {T_1(t), T_2(t), T_1*(t), T_2*(t), V_I(t), V_NI(t), E(t)}^T
and observed continuous-time ARV input U = U(t)
f(t, U, θ) = O x(t, U, θ) = {f^(1)(t, U, θ), f^(2)(t, U, θ)}^T

4 Recap of statistical models

HIV example: Recap
Usual simplified statistical model assuming all intra-subject correlations negligible:
Y_j | U ~ N_2{f(t_j, U, θ), V},  j = 1, ..., n   (3)
V = diag{σ^(1)2, σ^(2)2},  ψ = (θ^T, σ^(1)2, σ^(2)2)^T
More complicated statistical models are of course possible; (3) is often tacitly assumed in inverse problems without comment regarding sources of variation

5 Recap of statistical models

Recall: A statistical model like (1), (2), or (3) describes the assumed family of all possible probability distributions for the random vector Y representing the data generating mechanism for observations we might see at t_1, ..., t_n under conditions U
- Indexed by a parameter vector ψ
- We are interested in the component θ of ψ that governs the mathematical model, but the other components are needed to describe the statistical model fully, so we are stuck dealing with all of ψ
- The assumed statistical model may or may not be correct

6 Statistical inference

For definiteness: We discuss basic principles of statistical inference in the context of statistical models of the form
Y_j = f(t_j, U, θ) + ε_j,
where Y_j is scalar (generalize to multivariate Y_j later...) and (Y_1, ..., Y_n) are independent (conditional on U), with
Y = (Y_1, ..., Y_n)^T,  Y | U ~ N_n{f(U, θ), σ^2 I_n},  ψ = (θ^T, σ^2)^T,  Y_j | U ~ N{f(t_j, U, θ), σ^2}
I.e., Y_j, j = 1, ..., n, may be regarded as representing the result of a potential draw from the normal probability distribution describing how such observations would turn out at t_j (given phenomena like measurement error and biological fluctuation) under conditions U

7 Statistical inference

Conceptually: The statistical model is a formal representation of the population of all possible realizations of (Y_1, ..., Y_n) we would ever see under U
- When we collect data, we observe a sample from the population; i.e., we get to see a single realization (y_1, ..., y_n) of (Y_1, ..., Y_n) (and a realization u of U, say)
Objective, restated: What can we learn about the true value of ψ (which determines the nature of the population) from a sample?
- We do not get to observe the entire population (or else we'd know ψ)
- How uncertain will we be?
- What if we're wrong about the statistical model (later)?
Statistical inference (loosely speaking): Making statements about a population of interest on the basis of only a sample from the population

8 Statistical inference

Statistic: Any function of a random variable(s)
Parameter (point) estimation: Construct a function of (Y_1, ..., Y_n) (i.e., a statistic) that, when evaluated at a particular realization (y_1, ..., y_n), yields a numerical value that gives information on the true value of ψ
- Estimator: The function itself
- Estimate: The numerical value based on a particular realization
- Estimation: Used both to denote the procedure (estimator) and the actual calculation of a numerical value (estimate)
- Both also depend on U and u, of course
Remark: The distinction between estimator and estimate is often subject to abuse in the literature; the meaning is usually clear from the context

9 Statistical inference

Simplest example: Suppose that f(t, U, θ) = θ_1 + θ_2 t, θ = (θ_1, θ_2)^T
- Simple linear regression model
- No U here, so suppress conditioning
Usual assumption: Y_j ~ N(θ_1 + θ_2 t_j, σ^2), Y_1, ..., Y_n all independent, ψ = (θ^T, σ^2)^T
Standard estimator for θ: The ordinary least squares estimator
arg min_θ Σ_{j=1}^n (Y_j - θ_1 - θ_2 t_j)^2   (4)
For a particular data set, the estimate of θ minimizes Σ_{j=1}^n (y_j - θ_1 - θ_2 t_j)^2
The rest of ψ, namely σ^2, can be estimated separately (later)
Note that (4) is the usual objective function adopted in inverse problems for arbitrary f(t, U, θ)

10 Statistical inference

Closed form solution: Defining
X = [1 t_1; 1 t_2; ... ; 1 t_n] (an n x 2 matrix),  y = (y_1, y_2, ..., y_n)^T,
and minimizing yields the estimator and estimate
θ̂(Y) = (X^T X)^{-1} X^T Y  and  θ̂ = (X^T X)^{-1} X^T y
Convention: the "hat" (as in θ̂) means estimator or estimate (for θ here)
The emphasis that the estimator θ̂(Y) is a function of (Y_1, ..., Y_n) is usually suppressed

11 Statistical inference

Motivation for ordinary least squares: Coming up
Separate estimation of σ^2: ψ̂ = (θ̂^T, σ̂^2)^T, with
σ̂^2 = (n - 2)^{-1} Σ_{j=1}^n (Y_j - θ̂_1 - θ̂_2 t_j)^2
Question: How good is θ̂ = θ̂(Y) as an estimator for the true value of θ? More generally, how good is ψ̂ = ψ̂(Y) as an estimator for the true value of ψ?
- A question about the procedure, not a particular numerical value
- What do we mean by "good"? Can we characterize the extent of uncertainty?
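
As a concrete illustration of the closed-form estimate and the separate estimate of σ^2, here is a minimal numpy sketch; the (t_j, y_j) pairs are made up solely for illustration.

```python
import numpy as np

# Hypothetical (t_j, y_j) pairs, for illustration only
t = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1])
n = t.size

# Design matrix with rows (1, t_j)
X = np.column_stack([np.ones(n), t])

# Ordinary least squares estimate: theta_hat = (X^T X)^{-1} X^T y
theta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Separate estimate of sigma^2 with the (n - 2) divisor from the slide
resid = y - X @ theta_hat
sigma2_hat = resid @ resid / (n - 2)

print("theta_hat =", theta_hat)      # (intercept, slope)
print("sigma2_hat =", sigma2_hat)
```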

12 Statistical inference

Key idea: An estimator is a function of the random variables that represent the data generating mechanism
- Thus, for any value of ψ, the estimator itself has a probability distribution (that depends on that of (Y_1, ..., Y_n) and hence on ψ)
- In general problems, this distribution is conditional on U
- It is typical to suppress this dependence for brevity in general arguments; it is understood that everything is conditional on U (we'll do that here)

13 Statistical inference

Conceptually: Each possible realization of (Y_1, ..., Y_n) yields a (numerical) value of ψ̂; i.e., the random vector ψ̂(Y) has sample space comprising all these values
- We may thus think of the probability distribution of ψ̂(Y) as representing probabilities for values of ψ̂ that would be observed across all possible data sets we could end up with
- When we collect data, we end up with only one of these
- If this distribution has large variance, estimates vary a lot across possible data sets, and our one estimate tells us little about the true value of ψ (another data set might have yielded a very different numerical value): lots of uncertainty
- On the other hand, if this distribution has small variance, estimates vary little across possible data sets, and our one estimate may be quite informative about the true value of ψ (another data set might have yielded a similar value): only mild uncertainty

14 Statistical inference

Sampling distribution: The probability distribution of an estimator
- Properties of the sampling distribution characterize the uncertainty in the estimation procedure (estimator)
Unbiasedness: Intuitively, for a particular ψ (or its components)
- Some values in the sample space are larger than ψ, some smaller
- All possible values of the estimator for ψ should "average out" to ψ
Formally, writing E_ψ(·) to denote expectation with respect to the distribution of (Y_1, ..., Y_n) under ψ,
E_ψ(ψ̂) = ψ   (5)
An estimator that satisfies (5) is called an unbiased estimator

15 Statistical inference

Sampling variance and standard error: Quantify variation across possible values of ψ̂
- The sampling covariance matrix is the covariance matrix of the sampling distribution of ψ̂(Y), which we write as var_ψ(ψ̂)
- Diagonal elements of var_ψ(ψ̂) are the sampling variances of the components of ψ̂(Y), e.g., var(ψ̂_k) for the kth component
- The standard error of the kth component of ψ̂(Y) is its standard deviation, {var(ψ̂_k)}^{1/2} (on the same scale as ψ)
Key factors determining magnitude of sampling variance:
- Variance of the original probability distribution of Y (may very well be out of our control)
- Sample size n (often under our control: experimental design)

16 Statistical inference

Example, continued: Sampling distribution of the ordinary least squares estimator θ̂ (a component of ψ̂)
The estimator is unbiased:
E(Y) = Xθ ⇒ E(θ̂) = E{(X^T X)^{-1} X^T Y} = (X^T X)^{-1} X^T Xθ = θ
The sampling variance:
var(Y) = σ^2 I_n ⇒ var(θ̂) = var{(X^T X)^{-1} X^T Y} = (X^T X)^{-1} X^T var(Y) X (X^T X)^{-1} = σ^2 (X^T X)^{-1}
The sampling distribution itself is multivariate normal:
θ̂ ~ N_2{θ, σ^2 (X^T X)^{-1}}
- Sampling variance depends on σ^2 (the variance of the assumed probability distribution of Y_j)
- Writing (X^T X)^{-1} = n^{-1} {n^{-1} X^T X}^{-1}, the sampling variance decreases as n increases
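
These properties can be illustrated by simulation: repeatedly generate data sets under the model, compute the estimate each time, and compare the Monte Carlo mean and covariance of the estimates with θ and σ^2 (X^T X)^{-1}. A minimal sketch follows; the true parameter values, times, and σ are hypothetical choices for illustration.

```python
import numpy as np

# Monte Carlo illustration of the sampling distribution of the OLS estimator
# (true parameter values here are hypothetical, chosen only for illustration)
rng = np.random.default_rng(0)
theta_true = np.array([1.0, 2.0])
sigma = 1.5
t = np.linspace(1.0, 10.0, 20)
X = np.column_stack([np.ones(t.size), t])

n_reps = 5000
est = np.empty((n_reps, 2))
for r in range(n_reps):
    y = X @ theta_true + rng.normal(0.0, sigma, size=t.size)  # one realization of Y
    est[r] = np.linalg.solve(X.T @ X, X.T @ y)                # OLS estimate for this data set

print("Monte Carlo mean of theta_hat:", est.mean(axis=0))     # close to theta_true (unbiasedness)
print("Monte Carlo covariance:\n", np.cov(est, rowvar=False))
print("Theory sigma^2 (X^T X)^{-1}:\n", sigma**2 * np.linalg.inv(X.T @ X))
```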

17 Statistical inference

Absolute necessity: Whenever an estimate based on data is reported, it should be accompanied by an assessment of uncertainty based on the sampling distribution
Natural approach: look at the standard error; here, for the kth component of θ̂,
var(θ̂_k) = σ^2 (X^T X)^{-1}_{kk}
σ^2 is not known, so provide an estimate of the standard error of the kth component (usually itself called the "standard error")
In the example, SE(θ̂_k) = {σ̂^2 (X^T X)^{-1}_{kk}}^{1/2}, k = 1, 2
How to interpret?
(i) If θ̂_2 = 2.0 and SE(θ̂_2) = 3.0: very uncertain
(ii) If θ̂_2 = 2.0 and SE(θ̂_2) = 0.03: feeling pretty good!
Ways to improve precision (reduce uncertainty) in case (i)?

18 Statistical inference

In fact, we can do more: The entire sampling distribution can help us to make a probability statement to better characterize uncertainty
For example,
Z = (θ̂_k - θ_k) / {σ^2 (X^T X)^{-1}_{kk}}^{1/2} ~ N(0, 1), the standard normal distribution, k = 1, 2
When σ^2 is replaced by σ̂^2,
T = (θ̂_k - θ_k) / SE(θ̂_k) ~ t_{n-2}, Student's t distribution with n - 2 degrees of freedom
Let α = some small probability, so 1 - α is large; e.g., α = 0.05
Let t_{1-α/2} satisfy P(T ≥ t_{1-α/2}) = α/2, so, by symmetry, P(T ≤ -t_{1-α/2}) = α/2, and
P(-t_{1-α/2} ≤ T ≤ t_{1-α/2}) = 1 - α
⇒ P{θ̂_2 - t_{1-α/2} SE(θ̂_2) ≤ θ_2 ≤ θ̂_2 + t_{1-α/2} SE(θ̂_2)} = 1 - α
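
A minimal sketch of this calculation, continuing the hypothetical (t_j, y_j) data from before: compute SE(θ̂_2) and the t-based interval using scipy's Student's t quantile.

```python
import numpy as np
from scipy import stats

# Sketch of the t-based interval for theta_2 using hypothetical (t_j, y_j) data
t = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1])
n = t.size
X = np.column_stack([np.ones(n), t])

theta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ theta_hat
sigma2_hat = resid @ resid / (n - 2)
se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(X.T @ X)))   # SE(theta_hat_k)

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)                # t_{1-alpha/2} with n-2 df
lo = theta_hat[1] - t_crit * se[1]
hi = theta_hat[1] + t_crit * se[1]
print(f"theta_2_hat = {theta_hat[1]:.3f}, SE = {se[1]:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```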

19 Statistical inference

Important:
P{θ̂_2 - t_{1-α/2} SE(θ̂_2) ≤ θ_2 ≤ θ̂_2 + t_{1-α/2} SE(θ̂_2)} = 1 - α
is a probability pertaining to the sampling distribution of θ̂_2
That is, over all possible realizations of Y of size n, the probability is 1 - α that the endpoints of the interval
[θ̂_2 - t_{1-α/2} SE(θ̂_2), θ̂_2 + t_{1-α/2} SE(θ̂_2)]
will include (the fixed value) θ_2
Confidence interval

20 Statistical inference

Confidence interval: A probability statement about the procedure by which an estimator is constructed from a realization of Y
- NOT a probability statement about θ_2 (which is fixed)
Interpretation: Over all possible realizations of Y of size n, if we were to calculate the interval according to the estimation procedure, 100(1 - α)% of such intervals would cover θ_2
- Provides more information than the standard error alone: how large or small SE must be relative to θ̂_2 to feel confident that the procedure of data generation and estimation provides a reliable understanding ("confident" quantified by α)
- α is chosen by the analyst
For the example, α = 0.05 and n large, so t_{1-α/2} ≈ 1.96
(i) θ̂_2 = 2.0, SE(θ̂_2) = 3.0 gives [-3.88, 7.88]: no confidence
(ii) θ̂_2 = 2.0, SE(θ̂_2) = 0.03 gives [1.94, 2.06]: feeling pretty confident!
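
The coverage interpretation can be checked by simulation: generate many data sets from the model, build the interval each time, and count how often it covers the true θ_2. A small sketch follows; the true values are hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy import stats

# Monte Carlo sketch of the coverage interpretation: over repeated data sets,
# about 100(1 - alpha)% of the t-based intervals should cover the true theta_2.
# True values below are hypothetical, chosen for illustration only.
rng = np.random.default_rng(2)
theta_true = np.array([1.0, 2.0])
sigma, alpha = 1.5, 0.05
t = np.linspace(1.0, 10.0, 15)
n = t.size
X = np.column_stack([np.ones(n), t])
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)

covered = 0
n_reps = 2000
for _ in range(n_reps):
    y = X @ theta_true + rng.normal(0.0, sigma, size=n)
    theta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ theta_hat
    sigma2_hat = resid @ resid / (n - 2)
    se2 = np.sqrt(sigma2_hat * np.linalg.inv(X.T @ X)[1, 1])
    if abs(theta_hat[1] - theta_true[1]) <= t_crit * se2:
        covered += 1

print("Empirical coverage:", covered / n_reps)   # should be close to 0.95
```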

21 Statistical inference

Warning: The numerical values themselves are meaningless except for the impression they give about the quality of the procedure
- Once a realization of data is in hand, the numerical interval either covers the true value of θ_2 or it doesn't
- Wrong interpretation: "The probability is 1 - α that θ_2 is between -3.88 and 7.88"
- What we CAN say: We are 100(1 - α)% "confident" that intervals constructed this way would cover θ_2; the numerical values give only a sense of the faith we should attach to results of gathering data the way we did

22 Parametric vs. semiparametric models

Recall: In our modeling context, a statistical model describes all possible probability distributions for Y (conditional on U), indexed by a parameter ψ (under some set of assumptions)
Statistical inference, formalized: Estimation of ψ and associated assessment of uncertainty (sampling distribution)
Issues:
(i) Derivation of sensible estimators for ψ?
(ii) Competing estimators? Optimality?
(iii) Sensitivity to possibly incorrect assumptions?
(iv) Derivation of the sampling distribution in complex models? (next section)

23 Parametric vs. semiparametric models

Example, continued: In the theophylline example we assumed
Y_j = f(t_j, U, θ) + (ε_{1j} + ε_{2j}) = f(t_j, U, θ) + ε_j,  j = 1, ..., n,  or in vector form  Y = f(U, θ) + (ε_1 + ε_2) = f(U, θ) + ε,
E(ε_1 | U) = 0, E(ε_2 | U) = 0, var(ε_1 | U) = σ_1^2 I_n, var(ε_2 | U) = σ_2^2 I_n, and ε_1, ε_2 independent, so that
E(ε | U) = 0, var(ε | U) = σ^2 I_n, σ^2 = σ_1^2 + σ_2^2,
and thus
E(Y | U) = f(U, θ),  var(Y | U) = σ^2 I_n   (6)
(6) is an assumption on certain features of the probability distribution of Y | U [the first two moments, parameterized by ψ = (θ^T, σ^2)^T]
In addition, we assumed that ε | U is a multivariate normally distributed random vector, and hence so is Y | U, with mean and covariance matrix given in (6)

24 Parametric vs. semiparametric models

Parametric statistical model: A statistical model in which the probability distribution of Y is completely specified
For theophylline: Y | U ~ N_n{f(U, θ), σ^2 I_n}
- All possible probability distributions are multivariate normal with this mean and covariance matrix (over all ψ)
- Under this assumption, all aspects of Y (given U) (probabilities of any event, all higher moments) are in principle known if we know ψ
- For (fully) parametric models, there is a standard approach to inference (coming up)
- But taking Y to be multivariate normal is an assumption; how will inference be affected if it's wrong?

25 Parametric vs. semiparametric models

Semiparametric statistical model: A statistical model in which the probability distribution of Y is only partially specified
For theophylline:
E(Y | U) = f(U, θ),  var(Y | U) = σ^2 I_n
- First two moments in terms of the parameter ψ
- We could stop here: this is all we're willing to say
- All possible probability distributions are any probability distributions having this mean and covariance matrix (over all ψ)
Intuition: fewer assumptions means more possible models are included, so there is less chance for problems due to misspecification
Price to pay: with fewer assumptions, we cannot take advantage of the additional information in a full probability model if it is correct

26 Parametric vs. semiparametric models

Estimators in parametric models: Several possibilities
The likelihood function: Suppose p(y | ψ) is the pmf/pdf for Y under the assumed parametric model (suppress U)
- The notation emphasizes that p(y) is indexed by ψ (and is a function of y for fixed ψ)
Given that Y = y is observed, the function of ψ defined by
L(ψ | y) = p(y | ψ)
is called the likelihood function; this is a function of ψ for fixed y
Remark: The likelihood function forms the basis for one of the most popular and defensible general strategies for finding estimators: the maximum likelihood estimator

27 Parametric vs. semiparametric models

Demonstration: Y a discrete random vector
L(ψ | y) = p(y | ψ) = P_ψ(Y = y), where the subscript emphasizes that the pmf is indexed by ψ
Suppose ψ_1 and ψ_2 are two possible values for ψ such that
P_{ψ_1}(Y = y) = L(ψ_1 | y) > L(ψ_2 | y) = P_{ψ_2}(Y = y)
Interpretation:
- The y we actually observed is more likely to have occurred if ψ = ψ_1 than if ψ = ψ_2
- ψ_1 is a more plausible value for the true value of ψ than is ψ_2
- It seems reasonable to examine the probability of the y we actually observed under different possible ψ
- The reasoning extends to continuous random vectors under some technicalities

28 Parametric vs. semiparametric models

Likelihood principle: If y_1 and y_2 are two possible realizations of Y such that
L(ψ | y_1) = C(y_1, y_2) L(ψ | y_2) for all ψ,
for a constant C(y_1, y_2) not depending on ψ, then conclusions drawn from observing y_1 or y_2 should be identical
- May view the likelihood function as a data reduction device: the likelihood function contains all information about ψ in a sample
- C(y_1, y_2) = 1: y_1 and y_2 contain the same information about ψ
- Moreover, if the likelihood functions are only proportional, the information is still the same; e.g.,
L(ψ_2 | y_1) = 3 L(ψ_1 | y_1) implies L(ψ_2 | y_2) = 3 L(ψ_1 | y_2),
and we will conclude ψ_2 is more plausible than ψ_1 whether we observe y_1 or y_2

29 Parametric vs. semiparametric models

Definition: For each y in the sample space, let ψ̂(y) be a parameter value at which L(ψ | y) attains its maximum as a function of ψ, with y held fixed. A maximum likelihood estimator for ψ is ψ̂(Y)
- Acronym: MLE
- Intuitively reasonable: the MLE is the parameter value for which the observed y is "most likely"
- More formally: the MLE has optimality properties under the assumption that the specified probability model is correct
- Operationally: must find the global maximum (possible computational issues)
If L(ψ | y) is differentiable in each component of ψ, a possible candidate for the MLE is the value of ψ satisfying
∂/∂ψ log L(ψ | y) = 0
(necessary but not sufficient; could be a local maximum, etc.)

30 Parametric vs. semiparametric models

Nice properties:
- Invariance: if ψ̂ is the MLE for ψ, then for any function τ(ψ), τ(ψ̂) is the MLE for τ(ψ)
- Is the "best" estimator for ψ in a certain sense (coming up)
Example, continued: Simple linear regression model, Y = (Y_1, ..., Y_n)^T, Y_j ~ N(θ_1 + θ_2 t_j, σ^2) independent, ψ = (θ_1, θ_2, σ^2)^T
Under the normality assumption, the likelihood function is
L(ψ | y) = Π_{j=1}^n (2πσ^2)^{-1/2} exp{-(y_j - θ_1 - θ_2 t_j)^2 / (2σ^2)}
log L(ψ | y) = -n log σ - Σ_{j=1}^n (y_j - θ_1 - θ_2 t_j)^2 / (2σ^2)  (up to an additive constant)

31 Parametric vs. semiparametric models

Example, continued: Clearly maximized in θ_1, θ_2 by minimizing Σ_{j=1}^n (y_j - θ_1 - θ_2 t_j)^2, so the estimator minimizes Σ_{j=1}^n (Y_j - θ_1 - θ_2 t_j)^2
Differentiating log L(ψ | y) with respect to θ_1, θ_2 yields
Σ_{j=1}^n (y_j - θ_1 - θ_2 t_j) (1, t_j)^T = 0   (7)
the so-called normal equations
Differentiating log L(ψ | y) with respect to σ^2 yields
σ̂^2 = n^{-1} Σ_{j=1}^n (y_j - θ̂_1 - θ̂_2 t_j)^2
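
As a check that maximizing the normal likelihood reproduces the least squares solution, here is a minimal sketch that maximizes the log-likelihood numerically and compares the result with the closed-form expressions; the data are the same hypothetical (t_j, y_j) pairs used earlier.

```python
import numpy as np
from scipy.optimize import minimize

# Sketch: maximize the normal log-likelihood numerically and compare with the
# closed-form OLS / MLE expressions. The (t_j, y_j) data are hypothetical.
t = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1])
n = t.size
X = np.column_stack([np.ones(n), t])

def neg_loglik(psi):
    """Negative log-likelihood; psi = (theta1, theta2, log sigma) so optimization is unconstrained."""
    theta1, theta2, log_sigma = psi
    sigma2 = np.exp(2.0 * log_sigma)
    resid = y - theta1 - theta2 * t
    return 0.5 * n * np.log(2.0 * np.pi * sigma2) + 0.5 * np.sum(resid**2) / sigma2

fit = minimize(neg_loglik, x0=np.array([0.0, 1.0, 0.0]), method="Nelder-Mead")
theta_mle = fit.x[:2]
sigma2_mle = np.exp(2.0 * fit.x[2])

theta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print("MLE theta:", theta_mle, " sigma2:", sigma2_mle)
print("OLS theta:", theta_ols, " sigma2 (divisor n):",
      np.sum((y - X @ theta_ols) ** 2) / n)
```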

32 Parametric vs. semiparametric models

Result: The ordinary least squares estimator is the maximum likelihood estimator under the assumption of normality
Inference in semiparametric models: Maximum likelihood seems infeasible without a full distributional assumption
- Derive estimators in other (sometimes seemingly ad hoc) ways

33 Parametric vs. semiparametric models

Example, continued: Simple linear regression model where we are only willing to assume
E(Y_j) = θ_1 + θ_2 t_j,  var(Y_j) = σ^2
Perspective: minimize the "distance" between model and data
arg min_θ Σ_{j=1}^n (Y_j - θ_1 - θ_2 t_j)^2,  the ordinary least squares estimator
Alternatively, (7) can be written as
n^{-1} Σ_{j=1}^n Y_j (1, t_j)^T = n^{-1} Σ_{j=1}^n (θ_1 + θ_2 t_j) (1, t_j)^T
Equate the left-hand side to its expectation: the method of moments
Solving (7) turns out to yield the "best" estimator for θ in a certain sense if all one is willing to specify is the mean and variance of Y_j

34 Parametric vs. semiparametric models

Remarks:
- Identifiability of statistical models: A parameter ψ for a family of probability distributions is identifiable if distinct values of ψ correspond to distinct pmfs/pdfs. I.e., if the pmf/pdf for Y is p(y | ψ) under the model, then ψ ≠ ψ′ implies that p(y | ψ) is not the same function of y as p(y | ψ′)
- Identifiability is a feature of the model, NOT of an estimator or estimation procedure
- Difficulty with nonidentifiability in inference: Observations governed by p(y | ψ) and p(y | ψ′) "look the same"; there is no way to know whether ψ or ψ′ is the true value (same likelihood function value)
- How to fix nonidentifiability: May revise the model, impose constraints, make assumptions
- Limitations of data: Even if a model is identifiable, without sufficient data it may be practically impossible to estimate some parameters, because the information needed is not available in the data

35 Large sample approximation

Complication: Unfortunately, things aren't always so easy
- For simple statistical models like the simple linear regression model with normality assumptions, deriving the sampling distribution and showing unbiasedness of the least squares estimator for θ is straightforward, and exact results are available
- However, things get ugly quickly; e.g., recall the theophylline example with the parametric model
f(t, U, θ) = k_a F D / {V (k_a - k_e)} {e^{-k_e t} - e^{-k_a t}},  θ = (k_a, k_e, V)^T,  Y_j | U ~ N{f(t_j, U, θ), σ^2}
The ordinary least squares estimator minimizes
Σ_{j=1}^n {Y_j - f(t_j, U, θ)}^2,  i.e., solve
Σ_{j=1}^n {Y_j - f(t_j, U, θ)} f_θ(t_j, U, θ) = 0,  where f_θ(t, U, θ) = ∂f(t, U, θ)/∂θ
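
In practice the nonlinear least squares problem is solved numerically. Here is a minimal sketch using scipy's general-purpose nonlinear least squares routine on simulated data; the dose, starting values, and true parameters are hypothetical illustrations, not values from the course notes.

```python
import numpy as np
from scipy.optimize import least_squares

# Sketch of ordinary (nonlinear) least squares for the one-compartment model.
# Data are simulated here; parameter values and dose are hypothetical.
def f_theoph(t, D, theta):
    ka, ke, V = theta
    return ka * D / (V * (ka - ke)) * (np.exp(-ke * t) - np.exp(-ka * t))

rng = np.random.default_rng(3)
theta_true = np.array([1.5, 0.08, 30.0])
D, sigma = 320.0, 0.4
t = np.array([0.25, 0.5, 1, 2, 4, 6, 8, 12, 24.0])
y = f_theoph(t, D, theta_true) + rng.normal(0.0, sigma, size=t.size)

# Minimize sum_j {y_j - f(t_j, U, theta)}^2 numerically; no closed form exists
fit = least_squares(lambda th: y - f_theoph(t, D, th), x0=np.array([1.0, 0.1, 20.0]))
print("theta_hat =", fit.x)
```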

36 Large sample approximation

Obviously it is not possible to solve for θ̂ explicitly; θ̂(Y) is a complicated function of Y
Result: Simply moving from a linear model to a nonlinear (albeit simple) one complicates derivation of the sampling distribution for θ̂
- This is the case for most statistical models
Further consideration: For semiparametric models, with no distributional assumption for Y | U, even if an explicit expression for θ̂ were available, how would we derive the full sampling distribution?
Approach: Resort to an approximation
- Standard approximation: evaluate properties of estimators for large sample size; i.e., as n → ∞
- Hope that properties so derived will provide a reasonable approximation to the (intractable) properties for realistic (finite) n

37 Large sample approximation

Large sample (asymptotic) theory: Evaluate properties as the sample size n → ∞
Requires some basic concepts and tools...
Generically: Let X_n, n = 1, 2, ..., represent a sequence of random variables or vectors indexed by n, and let X be a random variable or vector [all on the same probability space (Ω, B, P)]
Convergence in probability: X_n converges in probability to X, written X_n →^p X, if
lim_{n→∞} P(|X_n - X| < δ) = 1 for all δ > 0
- For random vectors, this holds iff it holds component-wise
Fact: If h(·) is continuous, then X_n →^p X implies h(X_n) →^p h(X)

38 Large sample approximation

Usefulness: Identify X_n as a sequence of estimators ψ̂_n based on samples of size n for the parameter ψ in a parametric or semiparametric model
- Assuming the model is correct, let ψ_0 be the true value of ψ; i.e., the value that actually generates data we might see
- Identify X with ψ_0 (a degenerate random variable)
Consistency: An estimator ψ̂_n is said to be consistent for ψ_0 if ψ̂_n →^p ψ_0
Practical interpretation: For n large, the probability is small that ψ̂_n takes a value outside an arbitrarily small neighborhood of ψ_0
- As n gets large, the probability that ψ̂_n will "wander away" from ψ_0 is small
- Consistency is good, sort of like unbiasedness
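
Consistency can be seen numerically by tracking, for growing n, how often an estimator falls outside a small neighborhood of the true value. A small sketch for the OLS slope in the simple linear regression model follows; the true values and neighborhood size δ are hypothetical choices for illustration.

```python
import numpy as np

# Sketch: consistency in action. For each n, simulate many data sets and record how
# often the OLS slope estimate falls outside a small neighborhood of the true slope.
# True values are hypothetical, for illustration only.
rng = np.random.default_rng(4)
theta_true = np.array([1.0, 2.0])
sigma, delta, n_reps = 2.0, 0.1, 2000

for n in (10, 50, 250, 1000):
    t = np.linspace(0.0, 10.0, n)
    X = np.column_stack([np.ones(n), t])
    outside = 0
    for _ in range(n_reps):
        y = X @ theta_true + rng.normal(0.0, sigma, size=n)
        slope_hat = np.linalg.solve(X.T @ X, X.T @ y)[1]
        outside += abs(slope_hat - theta_true[1]) > delta
    print(f"n = {n:5d}: P(|slope_hat - slope_0| > {delta}) ~ {outside / n_reps:.3f}")
```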

39 Large sample approximation

Convergence in distribution: Suppose X_n has cdf F_n and X has cdf F. X_n is said to converge in distribution (or converge in law) to X iff, for each continuity point x of F,
lim_{n→∞} F_n(x) = F(x)
Write X_n →^D X or X_n →^L X
Fact: if X_n →^p X, then X_n →^L X
Practical interpretation: If we are interested in probabilities associated with X_n, we can replace them by probabilities for X
- Use to approximate the sampling distribution of an estimator
Not so fast... ψ̂_n →^p ψ_0 (consistent estimator) implies ψ̂_n →^L ψ_0, but ψ_0 is a constant, so the limit distribution is degenerate: not particularly useful!

40 Large sample approximation

Instead: Consider a suitably centered and scaled version of ψ̂_n with more interesting behavior
Asymptotic normality (random variable): X_n is asymptotically normal if we can find sequences {a_n} and {c_n} such that
c_n (X_n - a_n) →^L N(0, 1)   (8)
Notational convention: this means "converges in law to a random variable with a standard normal distribution"
Advantage: (8) implies X_n ·~ N(a_n, c_n^{-2}), where ·~ means "distributed approximately as"
For regular estimators: It may be shown that
n^{1/2} (ψ̂_n - ψ_0) →^L N(0, C), where C = lim_{n→∞} C_n, so that ψ̂_n ·~ N(ψ_0, n^{-1} C_n)
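
A quick numerical sketch of (8) in the simplest setting: the sample mean of exponential data, with a_n = μ and c_n = n^{1/2}/σ, so the standardized quantity should behave approximately like N(0, 1). The distribution, n, and number of replications are hypothetical choices for illustration.

```python
import numpy as np
from scipy import stats

# Sketch of asymptotic normality (8) for a simple case: the sample mean of
# exponential(1) data, with a_n = mu = 1 and c_n = sqrt(n)/sigma, sigma = 1.
rng = np.random.default_rng(5)
mu, sigma, n, n_reps = 1.0, 1.0, 200, 20000

xbar = rng.exponential(mu, size=(n_reps, n)).mean(axis=1)
z = np.sqrt(n) * (xbar - mu) / sigma           # c_n (X_n - a_n)

# Compare a few empirical probabilities with the standard normal cdf
for q in (-1.96, 0.0, 1.96):
    print(f"P(Z <= {q:+.2f}): empirical {np.mean(z <= q):.3f}, "
          f"N(0,1) {stats.norm.cdf(q):.3f}")
```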

41 Large sample approximation

Asymptotic relative efficiency (ARE): Suppose we have two competing estimators ψ̂_n^(1) and ψ̂_n^(2) satisfying
n^{1/2} (ψ̂_n^(1) - ψ_0) →^L N(0, C_1),  n^{1/2} (ψ̂_n^(2) - ψ_0) →^L N(0, C_2)
If dim ψ = 1, the asymptotic relative efficiency of ψ̂_n^(1) to ψ̂_n^(2) is ARE = C_2 / C_1, and ψ̂^(1) is preferred if ARE > 1
If dim ψ = p > 1, then ARE = (|C_2| / |C_1|)^{1/p} (a ratio of determinants)
If ARE > 1, ψ̂^(2) needs proportionately ARE times as many observations as ψ̂^(1) to achieve the same (large-n) precision
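
A classic illustration of ARE (not from the course notes) compares the sample mean and the sample median as estimators of the center of a normal population, where theory gives ARE(mean to median) = π/2 ≈ 1.571; a minimal simulation sketch follows.

```python
import numpy as np

# Sketch of ARE via simulation: sample mean vs. sample median as estimators of the
# center of a N(0, 1) population (a classic example, not from the course notes).
# Theory: ARE(mean to median) = pi/2 at the normal model.
rng = np.random.default_rng(6)
n, n_reps = 200, 10000

x = rng.normal(0.0, 1.0, size=(n_reps, n))
var_mean = np.var(x.mean(axis=1))
var_median = np.var(np.median(x, axis=1))

print("ARE estimate (C_median / C_mean):", var_median / var_mean)  # roughly pi/2
print("pi/2 =", np.pi / 2)
```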

42 Large sample approximation

Maximum likelihood estimator: For a parametric model, if the model assumptions are all exactly correct, then the maximum likelihood estimator for ψ is asymptotically efficient
- The large-n distribution of the MLE achieves the smallest possible variance among all regular estimators
