Applied Multivariate and Longitudinal Data Analysis


Longitudinal Data Analysis: General Linear Model
Ana-Maria Staicu, SAS Hall 522

Introduction

Consider the following examples. Pay attention to the response variable, the observational unit, and the number of measurements collected per unit.

Low flux dialyzers are used to treat patients with end stage renal disease by removing excess fluid and waste from their blood. In low flux hemodialysis, the ultrafiltration rate (ml/hr) at which fluid is removed is thought to follow a straight-line relationship with the transmembrane pressure (mmHg) applied across the dialyzer membrane. A study was conducted to compare the average ultrafiltration rate (the response) of such dialyzers across three dialysis centers where they are used on patients. A total of 41 dialyzers (units) were involved. The experiment involved recording the ultrafiltration rate at 4 transmembrane pressures (depicted by dots in the figure below) for each dialyzer.

Dietary Calcium Absorption Data. Calcium absorption is measured for 88 subjects aged between 35 and 45 years at the beginning of the study. Between one and four measurements are taken per subject. How does the typical calcium absorption vary across the subjects?

[Figure: calcium absorption versus age; left panel shows calcium absorption for 2 subjects, right panel for all subjects.]

Outline. In the second part of the course we will focus on statistical models and methods for studies in which individuals/subjects/objects/units are measured repeatedly over time. Specifically, we will focus on modeling longitudinal data. We will discuss modeling from a marginal perspective (i.e. aggregating the among/between sources of variability) as well as from a subject-specific perspective (i.e. modeling explicitly all the sources of variability in the data).

This chapter considers a general model perspective. We discuss flexible modeling of the mean trajectory (incorporating time explicitly) and various covariance structures to model the dependence. The modeling techniques allow for the incorporation of covariate information. We discuss estimation/statistical inference for the mean parameters and estimation of the covariance parameters.

Basic concepts and notation

Response is the outcome of interest (denoted typically by Y).

Unit (object or subject) is the object on which repeated measurements are taken; typically units are individuals (i indexes units and j indexes the repeated measurement). Y_ij denotes the jth repeated measurement taken on the ith subject or unit; n denotes the total number of units and m_i denotes the number of repeated measurements for unit i. The response vector of measurements for subject/individual/object/unit i is:

Y_i = (Y_i1, Y_i2, ..., Y_{i m_i})^T.

The responses are typically assumed independent across units (e.g. Y_i and Y_{i'} are independent for i ≠ i'). However, within a unit the responses are correlated (e.g. Y_ij and Y_{ij'} are typically correlated). Many statistical models consider modeling the response vector Y_i rather than the Y_ij's separately; nevertheless it is not uncommon to model the Y_ij's separately. We'll discuss models that exploit both representations.

Time is the generic term for the condition of measurement (t is used to denote time).
Time is considered an important covariate in longitudinal data; it is modeled differently than the other covariates in the data. Both the mean of the response vector Y_i and the covariance of Y_i may depend on time. t_ij denotes the time corresponding to Y_ij.

We say the design is balanced when m_i = m (the same number of repeated measurements across units). Otherwise we say the design is unbalanced. We say the design is regular if t_ij = t_j (the times of measurement are the same for all the units). Otherwise we say the design is irregular.

Although not specified explicitly, it is assumed that times occur in increasing order, t_{i1} < t_{i2} < ... < t_{i m_i}. The general data structure for a balanced, regular design (m_i = m and t_ij = t_j) is:

Time:    t_1    t_2    t_3   ...   t_m
Unit 1:  Y_11   Y_12   Y_13  ...   Y_1m
Unit 2:  Y_21   Y_22   Y_23  ...   Y_2m
  ...
Unit n:  Y_n1   Y_n2   Y_n3  ...   Y_nm

Setting: In the following, consider the observed data {(Y_ij, t_ij) : j = 1, ..., m_i} for each unit i, where Y_ij is assumed to be continuous. For simplicity we assume t_ij = t_j and m_i = m (balanced and regular design). We are interested in studying the typical behavior of the outcome over time, and furthermore in studying the way the outcome varies over time.

Modeling longitudinal data is more complex than modeling independent data:

multiple observations from the same person are correlated, so we need to model the correlation among the repeated measurements;

modeling the mean trend across time requires attention; typically the effect of the other possible predictors is modeled in the mean (systematic part).

Conceptual model: For continuous data we write

DATA_ij = Mean_j + Residual_ij,

where Mean_j is the average response corresponding to time t_j and Residual_ij describes the deviation of the data DATA_ij from the mean Mean_j.

The mean describes how the response changes on average over time. If additional factors (or covariate info such as group, or additional subject information) are available then the mean may depend on these factors. Common notation: µ = (µ_1, ..., µ_m)^T for the mean vector.

The residual determines how far the data deviate from the mean. It determines the distribution of the response (in this part it is commonly assumed normal). It also determines how the repeated observations correlate over time.

The three main steps in modeling longitudinal data are: modeling the mean, modeling the covariance, and selecting the distribution. In each of these it is imperative that we look at the data and use any available visualization tools.

Mean. Because the elements of the mean vector µ = (µ_1, ..., µ_m)^T are arranged in increasing time order t_1 < t_2 < ... < t_m (µ_j corresponds to t_j), we refer to the mean µ as a mean trajectory rather than a mean vector (the common term in multivariate statistics). Examples:

µ_j = µ(t_j) = a + b t_j (linear trajectory)

µ_j = µ(t_j) = a + b t_j + c t_j^2 (quadratic trajectory)

These are representations of the mean trajectory using a finite set of parameters (parametric structures of the mean function). By an abuse of notation, in this example µ(·) was used to denote a function.

Random deviation. Sources of variation. For longitudinal data there are two main types of potential sources of variation in the data:

Among-unit variation: the variation that occurs among units (subjects/individuals/objects/units are different).

Within-unit variation: the fluctuation of the response that occurs from one measurement to another within the same subject/individual/object/unit. Measurement error, for instance, is included in this source.

The next figure depicts the two sources of variation. Left panel: subject mean trajectories ("inherent trend" for each subject) in dashed lines; overall mean trajectory in a solid black line. The variation between these curves represents the among-unit variation. Middle panel: the true subject trajectories, where the deviations from the mean subject trajectories are due to the biological variation of the responses (think of the fluctuations of one's blood pressure from one time to another). Right panel: the observed data for each subject. Notice that the measurements deviate more, and this is due to measurement error (say, imperfections in the measuring device).

[Figure: three panels versus time — subject mean response, true subject response, and observed subject response (filled circles).]

Illustration: One simple way to represent how the data Data_ij vary with time t_ij = t_j is the following:

Data_ij = Mean_j + SubjSpecific_i + BiologicalDev_ij + Error_ij,

where

Mean_j = µ_j is the overall (population) mean at time point t_ij = t_j,

SubjSpecific_i represents the biological variation of the ith unit; this deviation dictates the inherent trend of the ith subject,

BiologicalDev_ij is the component of deviation from the subject's trend that is due to the biological variation over time within the subject,

Error_ij is the component of the deviation that is due to measurement error.
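The decomposition above can be sketched with simulated data (a hypothetical illustration, not the course data): each source of variation is generated separately and then summed, and sharing SubjSpecific_i across a unit's measurements is what induces the within-unit correlation.

```python
import numpy as np

# Simulate the conceptual decomposition
#   Data_ij = Mean_j + SubjSpecific_i + BiologicalDev_ij + Error_ij
# with hypothetical variance components chosen for illustration.
rng = np.random.default_rng(0)
n, m = 200, 4                      # n units, m repeated measurements each
t = np.arange(1.0, m + 1)          # common time points t_1 < ... < t_m

mean_j = 2.0 + 0.5 * t                       # population mean trajectory
subj = rng.normal(0.0, 1.0, size=(n, 1))     # among-unit deviation (shared within unit)
biol = rng.normal(0.0, 0.5, size=(n, m))     # within-unit biological deviation
err = rng.normal(0.0, 0.3, size=(n, m))      # measurement error
Y = mean_j + subj + biol + err               # observed data, n x m

# Repeated measurements on the same unit share SubjSpecific_i, so they
# are positively correlated across time points:
within_corr = np.corrcoef(Y[:, 0], Y[:, 1])[0, 1]
```

With these (hypothetical) variances, the correlation between two measurements on the same unit is 1/(1 + 0.25 + 0.09) ≈ 0.75, driven entirely by the shared subject-specific term.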


Exploratory analysis: Look at your data as much as possible (in general we don't look enough at the data).

Visualization. Spaghetti plots are a method of viewing the data to visualize the dynamic behavior over time, corresponding to each unit/subject. Notice the measurements for each subject are connected with a line, but measurements from different subjects are not connected.

Example: Researchers are interested in the dental development of kids over time. They collect dental growth measurements of the distance (mm) from the center of the pituitary gland to the pterygomaxillary fissure for 27 kids (11 girls and 16 boys) at ages 8, 10, 12, and 14. A picture of the pterygomaxillary fissure can be found at
The interest is in how the dental growth measurements vary over time, whether they differ between boys and girls, and furthermore whether the rate of change differs between boys and girls. The figure below displays the distance measurements per age for each child; the trajectory for each child is connected by a solid line so that individual child patterns may be seen.

[Figure: spaghetti plot of distance versus age (years), and mean distance for girls and boys.]

Mean. The primary objective in LDA is estimation and inference for the mean function. In our setting we have a mean vector µ = (µ_1, ..., µ_m)^T, but recall that µ_j corresponds to the time t_j. In the case of a balanced and regular design, an estimator for the mean is given by the sample mean. Likewise an estimator for the covariance is given by the sample covariance. Most longitudinal studies do not involve balanced and regular designs; estimators in those cases are not this simple.

The estimator of µ = (µ_1, ..., µ_m)^T is µ̂ defined by

µ̂ = (µ̂_1, µ̂_2, ..., µ̂_m)^T = (Ȳ_1, Ȳ_2, ..., Ȳ_m)^T, where Ȳ_j = (1/n) Σ_{i=1}^n Y_ij.

Graphical inspection of the mean vector is an important tool to understand the possible relationship of the means over time. Examine how µ̂_j changes with time t_j. In particular look for a linear trend, or curvature for a quadratic trend, etc. We apply this estimator to the dental study data for girls and obtain the estimate µ̂_G.

Variation/Correlation. An unbiased estimator for the covariance is the sample covariance

Σ̂ = (1/(n−1)) Σ_{i=1}^n (Y_i − µ̂)(Y_i − µ̂)^T;

the numerator Σ_{i=1}^n (Y_i − µ̂)(Y_i − µ̂)^T is also known as the sums of squares and cross-products matrix (SS&CP). Denote the elements of this matrix by (Σ̂_jk)_{1 ≤ j,k ≤ m} and also let σ̂_j = (Σ̂_jj)^{1/2}.

To describe the dependence in the data, the cross-covariance matrix is often used, and in particular the variance behavior over time and the correlation (for dependence). Examine how the variances Σ̂_jj change with time t_j, to learn about the various sources of variability in the data. Examine how the correlation varies: ρ̂_jk = Σ̂_jk / (Σ̂_jj Σ̂_kk)^{1/2}; denote by Γ̂ = (ρ̂_jk)_{1 ≤ j,k ≤ m} the estimate of the correlation matrix Γ = (ρ_jk)_{1 ≤ j,k ≤ m}. The off-diagonal terms of Γ̂ estimate the combined sources of variability (among-unit + within-unit variability), but they do not distinguish between them (dental data for girls).
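The estimators above can be sketched in a few lines for a balanced, regular design; the data matrix here is simulated for illustration, not the dental data.

```python
import numpy as np

# Sample mean, sample covariance (SS&CP / (n-1)), and sample correlation
# for an n x m matrix Y of repeated measures (hypothetical simulated data
# with an exchangeable dependence structure).
rng = np.random.default_rng(1)
n, m = 100, 4
Y = rng.normal(25.0, 2.0, size=(n, m)) + rng.normal(0.0, 2.0, size=(n, 1))

mu_hat = Y.mean(axis=0)                     # (Ybar_1, ..., Ybar_m)
R = Y - mu_hat                              # centered data
Sigma_hat = R.T @ R / (n - 1)               # unbiased sample covariance
sd = np.sqrt(np.diag(Sigma_hat))            # sigma_hat_j
Gamma_hat = Sigma_hat / np.outer(sd, sd)    # estimated correlation matrix
```

Inspecting `np.diag(Sigma_hat)` against the time index shows how the variance changes over time, and the off-diagonals of `Gamma_hat` show the (combined) dependence.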

Interpretation:

Remark: instead of estimating ρ_jk it is common to display the association using the so-called scatterplot matrix - essentially a matrix of plots where, for each distinct pair (t_j, t_k), one plots (Y_ij − µ̂_j)/σ̂_j against (Y_ik − µ̂_k)/σ̂_k - and to examine whether:

the association seems constant across the pairs,

the association seems to decay over time, or

the association does not vary at all with time.

Graphical display of the observations through such scatterplots can reveal systematic features.


Autocorrelation. Another measure that describes the association is the autocorrelation: the correlation between the repeated measurements when the lag, or distance between the times, is held constant. Stationarity is a property of a stochastic process that requires the first/second/etc. moments to be constant over time. Examining the autocorrelation is done with the purpose of checking the stationarity assumption (whether the covariance varies with the lag between the observations, t_j − t_j', instead of the actual times t_j, t_j').

Autocorrelation is formally defined as ρ(u) = corr{Y_ij, Y_ij'}, where t_j − t_j' = u; this measure describes the stationary nature of the dependence. Here u is commonly referred to as the lag. To study this behavior, plot, for each lag u, the standardized residuals (Y_ij − µ̂_j)/σ̂_j against (Y_ij' − µ̂_j')/σ̂_j' for pairs with t_j − t_j' = u. Equivalently you can calculate a sample autocorrelation estimator ρ̂(u) based on these standardized residuals. Notice however that the estimator is based on different numbers of pairs at different lags, hence it has different theoretical properties at various lags, and thus caution should be used in interpreting it.

Alternative examination - using the variogram. The variogram is defined as

V(u) = (1/2) E{(Y_ij − Y_ij')^2}, where t_j − t_j' = u.

For stationary processes (mean and variance constant over time) we have V(u) = τ^2 + σ^2{1 − ρ(u)}, where τ^2 is the noise variance (known in spatial statistics as the nugget effect). When data are unbalanced it is easier to estimate V(u) than ρ(u). To estimate the variogram, we compute v_ijj' = (1/2)(Y_ij − Y_ij')^2 and estimate V(u) by V̂(u) = Ave_{|t_ij − t_ij'| ≈ u}(v_ijj').
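For a balanced design with equispaced times, the sample variogram reduces to averaging v_ijj' over all pairs at each lag. A minimal sketch (the data below are simulated with an exchangeable dependence, so the true variogram is flat across lags):

```python
import numpy as np

def sample_variogram(Y):
    """Return {lag u: Vhat(u)} for an n x m matrix of repeated measures
    observed at equispaced times, where Vhat(u) averages
    v_ijj' = 0.5 * (Y_ij - Y_ij')**2 over all pairs with |j - j'| = u."""
    n, m = Y.shape
    vhat = {}
    for u in range(1, m):
        diffs = Y[:, u:] - Y[:, :-u]       # all within-unit pairs at lag u
        vhat[u] = 0.5 * np.mean(diffs ** 2)
    return vhat

# Hypothetical data: Y_ij = b_i + e_ij (exchangeable), so V(u) = var(e) = 1
# at every lag.
rng = np.random.default_rng(2)
Y = rng.normal(size=(500, 5)) + rng.normal(size=(500, 1))
vhat = sample_variogram(Y)
```

A variogram that rises with u would instead indicate serial correlation that decays with the time separation.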

2 General linear model

Motivation: Dialysis study (Ultrafiltration Data for Low Flux Dialyzers, presented in Vonesh and Chinchilli, 1997). Low flux dialyzers are used to treat patients with end stage renal disease by removing excess fluid and waste from their blood. In low flux hemodialysis, the ultrafiltration rate (ml/hr) at which fluid is removed is thought to follow a straight-line relationship with the transmembrane pressure (mmHg) applied across the dialyzer membrane. A study was conducted to compare the average ultrafiltration rate (the response) of such dialyzers across three dialysis centers where they are used on patients. A total of 41 dialyzers (units) were involved. The experiment involved recording the ultrafiltration rate at 4 transmembrane pressures (depicted by dots in the figure below) for each dialyzer.

Models for the mean trajectory. Many situations involve irregular and unbalanced sampling designs. One uses simple parametric models to describe the behavior of the mean response over time; or, more generally, one assumes that the mean response changes over time in a smooth way.

i. Polynomial trends in time

The simplest possible curve that describes how the mean response changes over time is a straight line, E[Y_ij] = β_0 + β_1 t_ij. Similarly, a quadratic trend over time can be represented as E[Y_ij] = β_0 + β_1 t_ij + β_2 t_ij^2.

A. Observed data {Y_ij, t_ij : j = 1, ..., m_i}, where Y_ij is the ultrafiltration rate for the ith dialyzer (within Center 1), corresponding to the transmembrane pressure t_ij. Think of a model that describes the mean ultrafiltration rate and how it varies over time.

B. Observed data {Y_ij, t_ij : j = 1, ..., m_i, C_i}, where Y_ij is the ultrafiltration rate for the ith dialyzer, corresponding to the transmembrane pressure t_ij, and C_i is the center membership. Think of a model that describes the mean ultrafiltration rate and how it varies over time.

Hip replacement study. These data are adapted from Crowder and Hand (1990, section 5.2). 30 patients (13 males and 17 females) underwent hip-replacement surgery. Haematocrit, the ratio of the volume of packed red blood cells relative to the volume of whole blood, recorded on a percentage basis, was supposed to be measured for each patient before the replacement and then at weeks 1, 2, and 3 after the replacement. In addition the age of each participant is recorded. The primary interest was to determine whether there are possible differences in mean response following replacement for men and women. Spaghetti plots of the profiles for each patient are shown in the left-hand panels of Figure 3. (We will discuss the right-hand panels later.) It may be seen from the figure that a number of both male and female patients are missing the measurement at week 2; in fact, there is one female missing the pre-replacement measurement and week 2. Here we have a situation where the data vectors Y_i are of possibly different lengths for different units.

Exercise: Think of and write down a model that describes the mean trajectory and how it varies over time. How do you incorporate the effect of age in this modeling framework?

ii. Linear splines

In some applications the longitudinal trends in the mean response cannot be characterized by a low-degree polynomial (first or second order) in time; in some applications the trend cannot be well represented by polynomials in time of any order. This mostly occurs when the mean response increases (or decreases) rapidly for some duration, and more slowly thereafter (or vice versa). When this type of change pattern occurs, the mean trend can be modeled by spline models. In a nutshell, a spline regression model involves a linear combination of connected or joined piecewise polynomial functions.

Splines are defined by their degree and knots. A linear (quadratic, cubic, etc.) spline means that the joined polynomials are lines (quadratic functions, cubic functions, etc.). Knots are the locations at which the pieces meet or are tied together. Linear spline models provide a useful and flexible way to model non-linear trends that cannot be approximated by simple polynomial functions in time.

We defined earlier polynomial models as linear combinations of the power basis functions {1, t, t^2, ...}. Linear spline models rely on the same general idea, except the basis functions are of the form {1, t, (t − κ_1)_+, ..., (t − κ_k)_+}, where {κ_1, ..., κ_k} are knots and k is the number of knots. Here (x)_+ = x if x > 0 and 0 if x ≤ 0. A linear spline model (using a single knot κ) for the mean trend can be represented as:

E[Y_ij] = β_0 + β_1 t_ij + β_2 (t_ij − κ)_+.

[Figure: Examples of a linear mean trend (left), a quadratic mean trend (middle) and a linear spline with one knot (right). The mean trend is depicted as a red solid line while the observed data are shown as black circles.]

Overall, a parametric model for the mean trajectory can be represented mathematically as µ(t_ij) = X_ij^T β, where X_ij^T is a row vector of covariates corresponding to the jth measurement of the ith subject and β is the column vector of unknown parameters. For the three examples considered above, specify the form of X_ij and β:

linear trend

quadratic trend

linear spline with one knot κ

Remark: one can easily incorporate additional covariate information in the mean structure.
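The three design rows can be sketched as functions mapping a time point to X_ij^T; the knot value and the β vector below are hypothetical choices for illustration.

```python
import numpy as np

# Design rows X_ij^T for the three mean models, so that mu(t_ij) = X_ij^T beta.
def X_linear(t):
    return np.column_stack([np.ones_like(t), t])                  # [1, t]

def X_quadratic(t):
    return np.column_stack([np.ones_like(t), t, t ** 2])          # [1, t, t^2]

def X_spline(t, kappa):
    # truncated power basis: (t - kappa)_+ = max(t - kappa, 0)
    return np.column_stack([np.ones_like(t), t,
                            np.maximum(t - kappa, 0.0)])          # [1, t, (t-kappa)_+]

t = np.array([1.0, 2.0, 3.0, 4.0])
Xs = X_spline(t, kappa=2.5)              # hypothetical knot
# Before the knot the trend has slope beta_1; after it, slope beta_1 + beta_2.
mu = Xs @ np.array([1.0, 2.0, -3.0])     # hypothetical beta
```

Here the slope is 2 before the knot and 2 − 3 = −1 after it, so the fitted mean rises and then falls.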

Models for the covariance: Assume the observed data are {Y_ij, X_ij : j = 1, ..., m_i}; let Y_i be the m_i-dimensional vector of the Y_ij's and X_i the m_i × p dimensional design matrix (e.g. it could include 1's, or the t_ij's, or t_ij^2, or other covariates observed for subject i, or time-varying covariates, etc.). Assume that cov(Y_i) = Σ_i, where Σ_i is an m_i × m_i covariance matrix. Here the index i is used specifically to allow for a different number of repeated measurements per unit, m_i. In this part we assume that the covariance model is parametric, that is, Σ_i = Σ_i(ω) is known up to a lower-dimensional parameter ω.

Recall that the responses measured on the same unit/subject are correlated. Although the correlations, or more generally the covariance among the repeated responses, are not usually of particular interest, we need to account for them in making inferences about the mean parameters. Accounting for the correlations among the repeated measures completes the specification of a (normal) model for the longitudinal data and usually increases the precision with which the regression parameters are estimated.

There are three main approaches to describing the covariance among the repeated measures: (1) unstructured; (2) covariance pattern models (described below); and (3) random effects covariance models (discussed later in the course). Importantly, these models for the covariance matrix will not explicitly distinguish between the among-unit and the within-unit variation.

Here are a few common covariance models that are described by only a few parameters:

(1) The unstructured covariance is typically used when there is a regular (sampling) design, say {t_ij : j = 1, ..., m_i, i = 1, ..., n} = {t_1, t_2, ..., t_r} for not-so-large r; it involves r(r − 1)/2 pairwise covariances.

(2) Covariance pattern models

Compound symmetric, ω = (σ^2, ρ):

Σ_i(ω) = [ σ^2    ρσ^2   ...   ρσ^2
           ρσ^2   σ^2    ...   ρσ^2
           ...                 ...
           ρσ^2   ρσ^2   ...   σ^2  ]

One-dependent, ω = (σ^2, ρ): covariance ρσ^2 between measurements one occasion apart, and 0 otherwise:

Σ_i(ω) = [ σ^2    ρσ^2   0     ...   0
           ρσ^2   σ^2    ρσ^2  ...   0
           0      ρσ^2   σ^2   ...   ...
           0      ...    ...   ρσ^2  σ^2 ]

Toeplitz structure, ω = (σ^2, ρ_1, ..., ρ_{m−1}): the covariance depends only on the separation between occasions:

Σ(ω) = [ σ^2          ρ_1 σ^2      ...   ρ_{m−2} σ^2   ρ_{m−1} σ^2
         ρ_1 σ^2      σ^2          ...   ρ_{m−3} σ^2   ρ_{m−2} σ^2
         ...                                            ...
         ρ_{m−1} σ^2  ρ_{m−2} σ^2  ...   ρ_1 σ^2       σ^2         ]

Exponential structure, ω = (σ^2, ρ):

Σ_i(ω) = [ σ^2                      ρ^{|t_i1 − t_i2|} σ^2    ...   ρ^{|t_i1 − t_im_i|} σ^2
           ρ^{|t_i2 − t_i1|} σ^2    σ^2                      ...   ρ^{|t_i2 − t_im_i|} σ^2
           ...                                                     ...
           ρ^{|t_im_i − t_i1|} σ^2  ρ^{|t_im_i − t_i2|} σ^2  ...   σ^2                    ]

Notice: when the set of time points {t_ij : i, j} is a set of equispaced time points, the above covariance reduces to the AR(1) covariance model corresponding to the set of unique points.

Remark: The above covariance structures assume the same variance over time. This was done for simplicity; one can specify covariance structures with unstructured variance over time.

General linear model formulation (population average or marginal model)

We can write the general model for the variation of the responses in matrix form as

Y_i = X_i β + ε_i,

where ε_i is the m_i-dimensional vector of random deviations and β is the fixed effects parameter corresponding to the design matrix X_i; β (often called the mean regression parameter) is the main object of inference. The term ε_i is the deviation from the systematic component; it has a multivariate distribution with mean 0_{m_i} and covariance matrix Σ_i = Σ_i(ω). In this chapter we assume that the responses are normally distributed, that is, Y_i ~ N_{m_i}(X_i β, Σ_i(ω)).

Remark: This approach separates the modeling of the mean (systematic component) from the correlation of the random component; the covariance of the random component does not distinguish between the two main sources of variability - among-unit and within-unit variation. Modeling the correlation in longitudinal data is important for obtaining correct inferences on the regression coefficients β. The correlation model does not change the interpretation of the β parameters.
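The covariance pattern models above are easy to build from their low-dimensional parameter ω; a sketch with hypothetical parameter values:

```python
import numpy as np

def compound_symmetric(m, sigma2, rho):
    """All variances sigma2, all covariances rho * sigma2."""
    S = np.full((m, m), rho * sigma2)
    np.fill_diagonal(S, sigma2)
    return S

def toeplitz_cov(sigma2, rhos):
    """rhos = (rho_1, ..., rho_{m-1}); correlation depends only on |j - k|."""
    m = len(rhos) + 1
    corr = np.eye(m)
    for j in range(m):
        for k in range(m):
            if j != k:
                corr[j, k] = rhos[abs(j - k) - 1]
    return sigma2 * corr

def exponential_cov(times, sigma2, rho):
    """Correlation rho**|t_j - t_k| decays with the time separation."""
    t = np.asarray(times, dtype=float)
    return sigma2 * rho ** np.abs(t[:, None] - t[None, :])

S_cs = compound_symmetric(4, 2.0, 0.5)
S_toep = toeplitz_cov(1.0, [0.6, 0.3, 0.1])
S_exp = exponential_cov([1.0, 2.0, 4.0], 1.0, 0.5)
```

Note that for equispaced times the exponential model coincides with an AR(1)-type Toeplitz structure, e.g. `exponential_cov([1, 2, 3], 1.0, 0.7)` equals `toeplitz_cov(1.0, [0.7, 0.49])`.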

3 Estimation of the regression parameters

Parameter estimation: Maximum Likelihood (ML)

Consider a framework for the estimation of the unknown parameters: the mean regression parameters (β) and the variance parameters (ω). When full distributional assumptions have been made about the vector of responses, a standard approach is to employ maximum likelihood estimation (MLE). For simplicity assume first that the covariance parameters ω are known.

Recall: The main idea in MLE is to estimate the parameters by the values that make the observed data most likely to have occurred under the specified model. As usual, we use a hat to denote parameter estimators.

Setting: Observed data are {Y_ij : j = 1, ..., m_i; X_ij}, where the Y_ij are the responses, Y_i denotes the vector of responses for unit i, and X_ij is the k-dimensional vector of covariate information. Assume

Y_ij = X_ij^T β + ε_ij, ε_i ~ N(0_{m_i}, Σ_i),

for Σ_i = Σ_i(ω), where Σ_i is an m_i × m_i matrix and is known.

To obtain the MLE of β we need to maximize the following log-likelihood function:

l(β) = −(1/2) (Σ_{i=1}^n m_i) log(2π) − (1/2) Σ_{i=1}^n log|Σ_i| − (1/2) Σ_{i=1}^n (Y_i − X_i β)^T Σ_i^{−1} (Y_i − X_i β),

where X_i is the m_i × k dimensional matrix with jth row given by X_ij^T, and β̂_MLE = argmax_β l(β). Since β does not appear in the first two terms, maximization of the log-likelihood function l(β) is equivalent to minimization of the third sum:

β̂_MLE = argmin_β Σ_{i=1}^n (Y_i − X_i β)^T Σ_i^{−1} (Y_i − X_i β).

The solution is

β̂_MLE = { Σ_{i=1}^n X_i^T Σ_i^{−1} X_i }^{−1} Σ_{i=1}^n X_i^T Σ_i^{−1} Y_i;

this is exactly the generalized least squares (GLS) estimator of β, β̂_GLS. In the case when the Σ_i's are known, this estimator is the best linear unbiased estimator of β (Gauss-Markov theorem).
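The closed-form GLS solution can be sketched directly from the formula; the data below are simulated under a hypothetical compound-symmetric error covariance purely to check that the estimator recovers the true β.

```python
import numpy as np

def gls(Y_list, X_list, Sigma_list):
    """GLS/ML estimator for known Sigma_i:
    beta_hat = (sum X_i' S_i^{-1} X_i)^{-1} (sum X_i' S_i^{-1} Y_i)."""
    p = X_list[0].shape[1]
    A = np.zeros((p, p))
    b = np.zeros(p)
    for Y, X, S in zip(Y_list, X_list, Sigma_list):
        Si = np.linalg.inv(S)
        A += X.T @ Si @ X
        b += X.T @ Si @ Y
    return np.linalg.solve(A, b)

# Hypothetical balanced example: 300 units, 4 occasions, CS errors
# (sigma2 = 1, rho = 0.75), true beta = (1, 0.5).
rng = np.random.default_rng(3)
t = np.array([1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones(4), t])
S = 0.25 * np.eye(4) + 0.75                 # compound symmetry
L = np.linalg.cholesky(S)
beta_true = np.array([1.0, 0.5])
Y_list = [X @ beta_true + L @ rng.normal(size=4) for _ in range(300)]
beta_hat = gls(Y_list, [X] * 300, [S] * 300)
```

In practice one would solve the per-unit systems rather than invert each Σ_i, but the direct translation keeps the correspondence with the formula visible.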

Properties of β̂:

Unbiasedness of β̂. What does it mean in layman's terms?

Covariance of β̂. What does it mean in layman's terms?

The sampling distribution of β̂. What does it mean in layman's terms?

What is the expression of β̂ when Σ_i = σ^2 I_{m_i}?

Remark. The GLS estimator is the best linear unbiased estimator (BLUE) of β. When the underlying distribution is multivariate normal, the GLS estimator is also the MLE of β, and furthermore one can show that it is the uniformly minimum variance unbiased estimator (UMVUE).

Question: what is the ordinary least squares (OLS) estimator, say β̂_OLS, and what is the difference between the OLS estimator and the GLS estimator?

The GLS estimator has the smallest variance among all the weighted least squares estimators. The loss of efficiency is calculated as:

eff(β̂_OLS) = precision(β̂_OLS) / precision(β̂_GLS) = {1/var(β̂_OLS)} / {1/var(β̂_GLS)}.   (1)

If this ratio is close to 1, then use of β̂_OLS is fine. In general the ratio is less than one, which means that there is a loss of efficiency from using an incorrect independence assumption in estimating the mean regression parameter.

In practice the covariance parameter ω is not known. Typically ML/REML estimation is used to obtain an estimate of ω (REML = restricted maximum likelihood, to be discussed soon). The ML/REML estimator ω̂ does not have a simple closed-form expression; numerical algorithms are used to obtain ω̂. Once such an estimate is obtained, Σ̂_i = Σ_i(ω̂) is substituted in the expression of the GLS estimator β̂. When the sample size n is large, the resulting estimator β̂ will have (approximately) all the same properties as if ω, and thus Σ_i, were known.

In R we fit marginal models (or population average models) using the function geeglm from the package geepack. The syntax is similar to glm. The correlation structure can be specified either using pre-specified models - independence, exchangeable, ar1, unstructured, userdefined (specified by the option corstr) - or using a user-defined correlation model (specified by the option zcor).
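The efficiency ratio in (1) can be computed exactly when Σ is known, since both sampling covariances have closed forms. A sketch with a hypothetical AR(1)-type covariance (the sandwich formula gives the true variance of OLS under dependence):

```python
import numpy as np

# var(beta_OLS) = (X'X)^{-1} X' Sigma X (X'X)^{-1}   (sandwich form)
# var(beta_GLS) = (X' Sigma^{-1} X)^{-1}
t = np.array([1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones(4), t])
rho = 0.6
Sigma = rho ** np.abs(t[:, None] - t[None, :])   # hypothetical AR(1) correlation

XtX_inv = np.linalg.inv(X.T @ X)
var_ols = XtX_inv @ X.T @ Sigma @ X @ XtX_inv
var_gls = np.linalg.inv(X.T @ np.linalg.inv(Sigma) @ X)

# Efficiency of OLS for the slope, as in (1): always <= 1 by Gauss-Markov.
eff_slope = var_gls[1, 1] / var_ols[1, 1]
```

Repeating the computation with a compound-symmetric Σ is instructive: there OLS loses no efficiency for this design, while serially decaying correlation does penalize OLS.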

Case study: the Vlagtwedde-Vlaardingen Study

Study description: This is an epidemiologic study conducted in two different areas of the Netherlands - the rural area of Vlagtwedde (N-E) and the urban, industrial area of Vlaardingen (S-W). The residents were followed over time to obtain information on the prevalence of, and risk factors for, chronic obstructive lung disease. This dataset is based on the sample of men and women from the rural area of Vlagtwedde. The sample, initially aged 15-44, participated in follow-up surveys approximately every 3 years for up to 21 years. At each survey, information on respiratory symptoms and smoking status was collected by questionnaire and spirometry was performed. Pulmonary function was determined by spirometry, and a measure of forced expiratory volume (FEV) was obtained every three years for the first 15 years of the study, and also at year 19. The dataset is comprised of a sub-sample of 133 residents aged 36 or older at their entry into the study and whose smoking status did not change over the 19 years of follow-up. Each study participant was either a current or former smoker. Current smoking was defined as smoking at least one cigarette per day. In this dataset FEV was not recorded for every subject at each of the planned measurement occasions; the number of repeated measurements of FEV on each subject varied from 1 to 7.

Question of interest: How does pulmonary function change over time? Is the change different for current smokers than for former smokers? Use various visualization tools to assess the mean behavior over time and gain insight into the dependence over time. Write down a parametric model for both the mean and the covariance. Using a normal model assumption, estimate the model parameters.

Parameter estimation: Restricted Maximum Likelihood (REML)

Recall the setting: Observed data are {Y_ij : j = 1, ..., m_i; X_ij}, where the Y_ij are the responses, Y_i denotes the vector of responses for unit i, and X_ij is the k-dimensional vector of covariates. Assume

Y_ij = X_ij^T β + ε_ij, ε_i ~ N(0, Σ_i),

for Σ_i = Σ_i(ω), where ω is an unknown vector of parameters. Recall the log-likelihood function l(β, ω) = log L(β, ω):

l(β, ω) = −(1/2) Σ_{i=1}^n m_i log(2π) − (1/2) Σ_{i=1}^n log|Σ_i(ω)| − (1/2) Σ_{i=1}^n (Y_i − X_i β)^T Σ_i(ω)^{−1} (Y_i − X_i β).

As stated earlier, the MLEs of β and ω are obtained by maximizing the above log-likelihood function. The maximization over ω requires numerical optimization; there is no analytical solution for the ML estimator ω̂ obtained in this way. Nevertheless we can still study the properties of the ML-based covariance estimator. It turns out that the ML-based estimator is biased.

Optional. To gain more insight, consider the simpler case where we have scalar data, m_i = 1 for all i. That is, the observed data are {Y_i; X_i}, where X_i is a k-dimensional vector of covariates, and assume the model

Y_i = X_i^T β + ε_i, ε_i ~ N(0, σ^2).

Determine the ML estimator of σ^2, and then discuss its bias. Hint: substitute m_i = 1 in the above log-likelihood function, with Σ_i(ω) = σ^2. The maximizer with respect to σ^2 is σ̂^2_ML = Σ_{i=1}^n (Y_i − X_i^T β̂)^2 / n.

Insight: Bias arises because the ML estimate σ̂^2_ML does not take into account that β is also estimated. It may be shown that similar problems arise more generally, when the covariance is more complex. The theory of restricted maximum likelihood (REML) was developed precisely to address this limitation. The REML likelihood is the likelihood function for the marginal distribution of the residuals. REML produces estimates of the variance/covariance parameters that are unbiased.
For example, in ordinary regression with independent errors,

σ̂^2_REML = (1/(n − k)) Σ_{i=1}^n (Y_i − X_i^T β̂)^2, and E[σ̂^2_REML] = σ^2 (σ̂^2_REML is unbiased for σ^2).

The idea behind REML was proposed by Bartlett (1937) and was further developed for the estimation of covariance components in unbalanced data by Patterson and Thompson (Biometrika, 1971). Harville (1974) gives a Bayesian interpretation. The distinction between the REML and

the ML becomes relevant when k is relatively large. A nice article on REML is by LaMotte, L.R. (Statistical Papers, 2007).

Intuition: REML is a generalization of the unbiased sample variance estimator. In a nutshell, the REML approach applies a ML function to suitably transformed data, which allows the estimation of the covariance parameters independently of the estimation of the mean parameters.

Intuition behind the procedure: Transform the data Y to Y* = A^T Y, where the N × (N − k) matrix A is chosen to make the distribution of Y* free of β. Here N = Σ_{i=1}^n m_i. For example, consider A such that {I − X(X^T X)^{−1} X^T} = A A^T and A^T A = I_{N−k}; then Y* has a multivariate normal distribution with mean zero and covariance equal to A^T Σ A, which is free of β. The covariance estimators are obtained by maximizing the likelihood of Y*. Remark that this likelihood function (which is called the REML function) is in fact the product of the original likelihood function evaluated at β̂ and an adjustment factor. The adjustment factor is |Σ_{i=1}^n X_i^T Σ_i(ω)^{−1} X_i|^{−1/2}. The REML log-likelihood function is:

l_REML(β̂, ω) = −(1/2) Σ_{i=1}^n m_i log(2π) − (1/2) Σ_{i=1}^n log|Σ_i(ω)| − (1/2) Σ_{i=1}^n (Y_i − X_i β̂)^T Σ_i(ω)^{−1} (Y_i − X_i β̂) − (1/2) log|Σ_{i=1}^n X_i^T Σ_i(ω)^{−1} X_i|.

The solution, again, is obtained by numerical optimization. The REML estimator of ω, ω̂_REML, is unbiased for ω. REML estimation is the default method used to estimate the variance component parameters in many algorithms.

Remark: Since the adjustment is a function solely of ω, the ML- and REML-based estimators of the mean regression parameter β coincide.
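The ML/REML contrast in the ordinary regression case can be sketched by simulation (hypothetical data, true σ^2 = 1): dividing the residual sum of squares by n gives a downward-biased estimate, while dividing by n − k is unbiased.

```python
import numpy as np

# Contrast sigma2_ML = RSS/n (biased) with sigma2_REML = RSS/(n - k)
# (unbiased) in ordinary regression with k estimated mean parameters.
rng = np.random.default_rng(4)
n, k = 30, 2
X = np.column_stack([np.ones(n), np.linspace(0.0, 1.0, n)])
beta = np.array([1.0, 2.0])

def var_estimates(Y):
    beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    rss = np.sum((Y - X @ beta_hat) ** 2)
    return rss / n, rss / (n - k)        # (ML, REML)

# Average each estimator over many simulated datasets with sigma2 = 1:
reps = [var_estimates(X @ beta + rng.normal(size=n)) for _ in range(2000)]
ml_mean = np.mean([r[0] for r in reps])
reml_mean = np.mean([r[1] for r in reps])
```

The ML average settles near (n − k)/n = 28/30 ≈ 0.93, the theoretical bias factor, while the REML average settles near 1.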

4 Selection of various covariance models For this section a maximal model for the mean is assumed, and the mean structure is thus FIXED. How do we select the most appropriate covariance model? The choices of models for the mean and the covariance are interdependent. Since confidence intervals and tests of hypotheses for the mean regression parameters depend critically upon correct specification of the covariance model, it is important to begin by specifying the covariance model. Nevertheless, the model for the covariance depends on the assumed model for the mean: the covariance models the dependence between the residuals {Y_ij - μ_ij(β)} and {Y_ij' - μ_ij'(β)} for j ≠ j'. Therefore the model for the covariance should be based on a maximal model for the mean. Intuitively, any systematic part that is left out (due to misspecification of the mean model) will lead to a certain amount of spurious covariance among the residuals and will induce spurious dependence of the covariance on the covariates. In longitudinal studies with a balanced design for the time points and a very small number of covariates (e.g., group and the time points at which the repeated outcome is measured) it is possible to fit a saturated model. A saturated model allows an arbitrary pattern for the mean response trajectory at every level of the covariates, and thus minimizes the impact of misspecification of the mean model. Nevertheless, determining the maximal model is, in general, difficult and should be done on subject-matter grounds. However, once a maximal model for the mean response has been fixed, the residual variances and covariances can be used to select an appropriate model for the covariance. Likelihood ratio test (LRT). One possible way to choose between two competing covariance models is to compare the maximized (REML) likelihoods for the corresponding covariance models within a hypothesis-testing framework.
Specifically, consider the case where we compare two covariance models that are nested within one another (two covariance models are nested when the reduced model is a special case of the full model). For example, the compound symmetric covariance model is a special case of the Toeplitz covariance model, obtained when ρ_1 = ρ_2 = ... = ρ_{m-1} (equal correlation at all lags). The null hypothesis is H_0: Σ has compound symmetric structure vs. H_1: Σ has Toeplitz structure. The LRT is obtained by comparing the maximized (REML) likelihood for the reduced covariance model (compound symmetric) with the maximized (REML) likelihood for the full covariance model (Toeplitz). Formally, the test statistic is LRT = 2 l̂_full - 2 l̂_red. Because of the unbiasedness properties of the estimators obtained using the REML likelihood, the REML likelihoods are typically used for this test.
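Given the two maximized REML log-likelihoods, the computation is short. The sketch below uses invented log-likelihood values and parameter counts purely for illustration; the degrees of freedom equal the difference in the number of covariance parameters:

```python
from scipy.stats import chi2

# Hypothetical maximized REML log-likelihoods (illustrative values only)
ll_red, q_red = -412.7, 2    # compound symmetry: sigma^2, rho
ll_full, q_full = -408.1, 6  # Toeplitz with m = 6 times: sigma^2, rho_1..rho_5

lrt = 2 * ll_full - 2 * ll_red   # = 9.2
df = q_full - q_red              # = 4
p_value = chi2.sf(lrt, df)       # about 0.056: borderline evidence

print(round(lrt, 1), df, round(p_value, 3))
```

A small p-value favors the richer Toeplitz structure; here the evidence against compound symmetry is borderline at the 5% level.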

Under the null hypothesis, the sampling distribution of the LRT is chi-square with degrees of freedom equal to the difference between the number of covariance parameters in the full and the reduced models. In general, the LRT is preferable for testing between competing nested models. One important limitation arises when the null hypothesis involves parameters that are on the boundary of their parameter space, for example when the null hypothesis comes down to H_0: σ² = 0 (a variance parameter equal to zero; recall σ² ≥ 0). This situation is known in the literature as testing a null hypothesis on the boundary of the parameter space. In this case, the usual asymptotics used to derive the null distribution of the LRT are no longer valid; in particular the null distribution of the LRT is no longer chi-square. We will discuss this further when we study linear mixed models. Akaike's Information Criterion (AIC). Often it is of interest to compare models that are not nested. One common method is Akaike's Information Criterion (AIC), which is also based on the maximized log-likelihood but includes a penalty for the complexity of the assumed covariance model: AIC = -2 l̂_model + 2c, where l̂_model is the maximized (REML) log-likelihood under the assumed model and c is the number of parameters included in this model. Among all the covariance models of interest, the one with the smallest AIC is preferred. The basic idea behind the AIC is to strike a balance between the fit to the data and the number of parameters involved in the covariance model (assuming the competing models share the same model for the mean trend). Schwarz's Bayesian Information Criterion (BIC). Another information criterion for choosing among competing covariance models is Schwarz's Bayesian Information Criterion (BIC), which also uses the maximized log-likelihood and penalizes the complexity of the model (though in a different way).
BIC is defined as BIC = -2 l̂_model + c log N, where c is the number of parameters included in the model of interest, and N = Σ_{i=1}^n m_i is the total number of observations in the data. Among all the covariance models of interest, the one with the smallest BIC is preferred. The main idea of the BIC comes from the Bayesian approach to model selection, which is based on the highest posterior model probability (or largest Bayes factor); BIC tries to approximate this Bayesian criterion. Because BIC penalizes the number of parameters in the model drastically, it tends to select the most parsimonious (simplest) model; because of this, BIC is not among the most popular approaches for selecting covariance models.
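Both criteria are simple functions of the maximized log-likelihood. A sketch with hypothetical fits (log-likelihood values and parameter counts invented for illustration; all three candidates share the same mean model):

```python
import math

N = 150  # total number of observations, N = sum_i m_i

# (name, maximized REML log-likelihood, number of covariance parameters c)
# -- hypothetical values for illustration
fits = [
    ("compound symmetry", -512.4, 2),
    ("AR(1)",             -506.9, 2),
    ("unstructured",      -497.0, 10),
]

for name, ll, c in fits:
    aic = -2 * ll + 2 * c
    bic = -2 * ll + math.log(N) * c
    print(f"{name:18s} AIC={aic:7.1f} BIC={bic:7.1f}")

best_aic = min(fits, key=lambda f: -2 * f[1] + 2 * f[2])[0]
best_bic = min(fits, key=lambda f: -2 * f[1] + math.log(N) * f[2])[0]
print(best_aic, best_bic)
```

With these numbers AIC selects the richer unstructured model while BIC, with its log N penalty, prefers the parsimonious AR(1), illustrating how the two criteria can disagree.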

Remark: AIC penalizes the number of model parameters less strongly than BIC. In small samples, the corrected AIC (AICc, which is AIC with a greater penalty for extra parameters) has been found to be more successful than AIC or BIC. Inferences about β using the model-based covariance rely heavily on the correct specification of the covariance model. Misspecification of the covariance model has negligible effects on the estimation of the mean regression parameters β, but it may have serious implications for the inference about these parameters (construction of confidence intervals and hypothesis tests). Fortunately, one can still make valid inferences even if there are concerns about the specification of the covariance model. In particular, valid inferences can be made using the so-called sandwich estimator of cov(β̂); the resulting standard errors are robust to misspecification of the covariance model. The sandwich estimator of cov(β̂) is more common in marginal models for discrete longitudinal observations, and we will study it in detail when we discuss that topic.
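For intuition, here is a sketch of a sandwich covariance computation in this linear setting (my own illustration, not a formula from the notes): the "bread" is B = Σ_i X_i^T W^{-1} X_i for a working covariance W, and the "meat" replaces W with observed residual cross-products, so the result remains valid even when W is misspecified. The simulated design and all names are hypothetical; the working covariance is deliberately wrong (independence) while the truth is exchangeable.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, k = 200, 4, 2

beta_true = np.array([1.0, -0.5])
Sigma_true = 0.5 * np.eye(m) + 0.5           # exchangeable, correlation 0.5
Lchol = np.linalg.cholesky(Sigma_true)

X = np.ones((n, m, k))                       # column 0: intercept
X[:, :, 1] = rng.normal(size=(n, 1))         # column 1: time-invariant covariate
Y = X @ beta_true + rng.normal(size=(n, m)) @ Lchol.T

W_inv = np.eye(m)                            # working covariance: independence

# GLS estimate under the working covariance (here it reduces to pooled OLS)
bread = sum(X[i].T @ W_inv @ X[i] for i in range(n))
rhs = sum(X[i].T @ W_inv @ Y[i] for i in range(n))
beta_hat = np.linalg.solve(bread, rhs)

# "Meat": accumulate X_i^T W^{-1} r_i r_i^T W^{-1} X_i over units
meat = np.zeros((k, k))
for i in range(n):
    r = Y[i] - X[i] @ beta_hat               # residual vector for unit i
    meat += X[i].T @ W_inv @ np.outer(r, r) @ W_inv @ X[i]

B_inv = np.linalg.inv(bread)
cov_model = B_inv                            # trusts the working covariance
cov_sandwich = B_inv @ meat @ B_inv          # robust to its misspecification

print(np.sqrt(np.diag(cov_model)))           # understates the standard errors here
print(np.sqrt(np.diag(cov_sandwich)))
```

Because the covariates are constant within a unit and the true within-unit correlation is ignored, the model-based standard errors are too small by roughly the design-effect factor sqrt(1 + (m - 1)ρ); the sandwich standard errors recover the correct scale.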

5 Inference for the regression parameters In this section we discuss how to make inferences about β: specifically, we consider the construction of confidence intervals and tests of hypotheses. To do so we use the ML (or REML) estimator of β and its estimated covariance matrix: ĉov(β̂) = { Σ_{i=1}^n X_i^T Σ̂_i^{-1} X_i }^{-1}, with Σ̂_i = Σ_i(ω̂), where ω̂ is obtained either by ML or by REML. Confidence intervals: using this result we can construct confidence intervals for a single component of β, say β_l: 95% CI for β_l: β̂_l ± 1.96 sqrt(var̂(β̂_l)). Essentially we use the lth diagonal element of the estimated covariance ĉov(β̂) and the multivariate normal distribution of the estimator β̂. If the data are not normally distributed, this is an approximate confidence interval, valid when the number of units n is large. Hypothesis tests. Assume it is of interest to test H_0: β_l = 0 versus the alternative H_1: β_l ≠ 0. One can use the Wald test statistic: Z = β̂_l / sqrt(var̂(β̂_l)). More generally, it may be of interest to construct tests that certain linear combinations of the components of β are 0. For example, if β = (β_1, β_2, β_3)^T, it might be of interest to test a hypothesis of the form H_0: β_1 - β_2 = 0, and so on. Let L be a 1 × k vector of weights, and assume that we want to test the null hypothesis H_0: Lβ = 0 versus the alternative H_1: Lβ ≠ 0. Statistical inference about Lβ relies on the distribution of Lβ̂, which is N(Lβ, L cov(β̂) L^T), based on the distribution of β̂ when the data are multivariate normal. Here we discuss hypothesis testing, but the ideas can also be applied to the construction of confidence intervals. Wald test statistic for Lβ, where L is a 1 × k vector: Z = Lβ̂ / sqrt(L ĉov(β̂) L^T), Z ~ N(0, 1). Equivalently, W = Z² has a chi-square distribution with 1 degree of freedom, χ²_1: W = (Lβ̂) { L ĉov(β̂) L^T }^{-1} (Lβ̂)^T, W ~ χ²_1.
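As a concrete sketch (simulated data with a known covariance; all values illustrative), the interval and both Wald forms can be computed directly from β̂ and ĉov(β̂):

```python
import numpy as np
from scipy.stats import norm, chi2

rng = np.random.default_rng(3)
n, m, k = 60, 4, 3
beta = np.array([2.0, -1.0, 0.5])
Sigma = 0.3 * np.eye(m) + 0.7          # known exchangeable covariance
Sigma_inv = np.linalg.inv(Sigma)
Lchol = np.linalg.cholesky(Sigma)

X = rng.normal(size=(n, m, k))
Y = X @ beta + rng.normal(size=(n, m)) @ Lchol.T

# GLS estimate and its model-based covariance
A = sum(X[i].T @ Sigma_inv @ X[i] for i in range(n))
b = sum(X[i].T @ Sigma_inv @ Y[i] for i in range(n))
beta_hat = np.linalg.solve(A, b)
cov_hat = np.linalg.inv(A)

# 95% CI for beta_1 (first component)
se = np.sqrt(cov_hat[0, 0])
ci = (beta_hat[0] - 1.96 * se, beta_hat[0] + 1.96 * se)

# Wald test of H0: beta_1 - beta_2 = 0 via the contrast L = (1, -1, 0)
L = np.array([[1.0, -1.0, 0.0]])
Z = (L @ beta_hat).item() / np.sqrt((L @ cov_hat @ L.T).item())
W = Z ** 2                              # W = Z^2 ~ chi2_1 under H0
p_value = chi2.sf(W, df=1)
p_two_sided = 2 * norm.sf(abs(Z))       # identical to p_value

print(ci, W, p_value)
```

The two p-values agree because P(χ²_1 > z²) equals the two-sided normal tail probability; here the true contrast is 3, so the test rejects decisively.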

The advantage of the latter form is that it readily generalizes to cases where L has more than one row, for instance when L is an r × k matrix. In that case, the null distribution of W is χ²_r, and p-values are calculated from this distribution. The function esticon in R can be used to estimate linear combinations of regression parameters, test them using Wald statistics, and construct confidence intervals. An alternative to the Wald test statistic is the likelihood ratio test (LRT) statistic. The LRT for testing H_0: Lβ = 0 versus the alternative H_1: Lβ ≠ 0 is obtained by comparing the maximized likelihoods of two models: one model that incorporates the constraint Lβ = 0 specified by the null hypothesis (the reduced model), and one without the constraint (the full model). Note that the two models are nested in the sense that the reduced model is a special case of the full model: when the constraint holds, the full model reduces to the reduced model. The maximized log-likelihood for the full model is denoted by l̂_full and that for the reduced model by l̂_red. The LRT is obtained as LRT = 2(l̂_full - l̂_red); when the null hypothesis is true, the distribution of the LRT is χ² with degrees of freedom equal to the difference between the number of parameters in the full model and the number of parameters in the reduced model. Remark 1: Wald tests are commonly employed when testing mean regression parameters. When testing between competing covariance models, Wald tests are NOT valid. Remark 2: When testing two nested models that differ in their mean regression parameters, do not use REML (the REML adjustment depends on the structure of the systematic mean part, so the two REML likelihoods are not comparable). Do use REML when testing between two nested covariance models.

6 Final Remarks: main features and limitations When confronted with a real data application, an important step is the selection of an appropriate covariance model. Such a covariance structure incorporates both sources of variation (among units and within units). Useful ideas in the selection of the covariance model are:

- Informal graphical/numerical summaries and other techniques may be applied to a preliminary fit based on OLS estimates of the regression parameters.
- AIC and BIC criteria may be used, but a dose of subjectivity is also involved.
- If no model is truly appropriate, that is alright too; the models used in the next chapter offer an alternative approach.

Important features of the regression approach:

- The regression approach gives the analyst much flexibility in representing the form of the mean of the response. The mean can be modeled smoothly over time; the rate of change is the slope of this function. Modeling the mean in this fashion also allows estimation of the mean at any time, not just at the observed times.
- The approach does not require a balanced design for the time points: the vectors of observations may have different lengths m_i. One important caveat: the imbalance may be due to missingness when data were intended to be collected at common time points. If the missingness is completely unrelated to the issues under study (e.g., a sample from a certain subject at a certain time is mistakenly destroyed or misplaced), then the analysis is fine. However, if the missingness is related to the issues under study (e.g., two treatments are compared and a subject in one arm does not show up because they are too ill), then the missingness may contain information about the treatment, and this type of analysis would not be valid.
- The approach allows the analyst to choose an appropriate model for the covariance out of many options.
- Multiple groups/populations can be accounted for by appropriately manipulating the design matrix.
Recall the explicit parameterization and the difference parameterization. Some limitations of this methodology:

- The modeling of the covariance matrix aggregates the two sources of variation and does not allow the analyst to understand the two sources separately.
- The main focus is modeling the mean trajectories over time; reconstruction of the individual trajectories is not considered. Characterizing the subject-specific trajectories may be of interest, but the current framework does not allow such a study.


More information

Outline. Linear OLS Models vs: Linear Marginal Models Linear Conditional Models. Random Intercepts Random Intercepts & Slopes

Outline. Linear OLS Models vs: Linear Marginal Models Linear Conditional Models. Random Intercepts Random Intercepts & Slopes Lecture 2.1 Basic Linear LDA 1 Outline Linear OLS Models vs: Linear Marginal Models Linear Conditional Models Random Intercepts Random Intercepts & Slopes Cond l & Marginal Connections Empirical Bayes

More information

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data Ronald Heck Class Notes: Week 8 1 Class Notes: Week 8 Probit versus Logit Link Functions and Count Data This week we ll take up a couple of issues. The first is working with a probit link function. While

More information

Linear Methods for Prediction

Linear Methods for Prediction Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we

More information

Multiple Linear Regression

Multiple Linear Regression Andrew Lonardelli December 20, 2013 Multiple Linear Regression 1 Table Of Contents Introduction: p.3 Multiple Linear Regression Model: p.3 Least Squares Estimation of the Parameters: p.4-5 The matrix approach

More information

Econometrics Summary Algebraic and Statistical Preliminaries

Econometrics Summary Algebraic and Statistical Preliminaries Econometrics Summary Algebraic and Statistical Preliminaries Elasticity: The point elasticity of Y with respect to L is given by α = ( Y/ L)/(Y/L). The arc elasticity is given by ( Y/ L)/(Y/L), when L

More information

High-dimensional regression

High-dimensional regression High-dimensional regression Advanced Methods for Data Analysis 36-402/36-608) Spring 2014 1 Back to linear regression 1.1 Shortcomings Suppose that we are given outcome measurements y 1,... y n R, and

More information

Repeated Measures ANOVA Multivariate ANOVA and Their Relationship to Linear Mixed Models

Repeated Measures ANOVA Multivariate ANOVA and Their Relationship to Linear Mixed Models Repeated Measures ANOVA Multivariate ANOVA and Their Relationship to Linear Mixed Models EPSY 905: Multivariate Analysis Spring 2016 Lecture #12 April 20, 2016 EPSY 905: RM ANOVA, MANOVA, and Mixed Models

More information

Heteroskedasticity. Part VII. Heteroskedasticity

Heteroskedasticity. Part VII. Heteroskedasticity Part VII Heteroskedasticity As of Oct 15, 2015 1 Heteroskedasticity Consequences Heteroskedasticity-robust inference Testing for Heteroskedasticity Weighted Least Squares (WLS) Feasible generalized Least

More information

Generalized, Linear, and Mixed Models

Generalized, Linear, and Mixed Models Generalized, Linear, and Mixed Models CHARLES E. McCULLOCH SHAYLER.SEARLE Departments of Statistical Science and Biometrics Cornell University A WILEY-INTERSCIENCE PUBLICATION JOHN WILEY & SONS, INC. New

More information

Spatial Regression. 3. Review - OLS and 2SLS. Luc Anselin. Copyright 2017 by Luc Anselin, All Rights Reserved

Spatial Regression. 3. Review - OLS and 2SLS. Luc Anselin.   Copyright 2017 by Luc Anselin, All Rights Reserved Spatial Regression 3. Review - OLS and 2SLS Luc Anselin http://spatial.uchicago.edu OLS estimation (recap) non-spatial regression diagnostics endogeneity - IV and 2SLS OLS Estimation (recap) Linear Regression

More information

Some properties of Likelihood Ratio Tests in Linear Mixed Models

Some properties of Likelihood Ratio Tests in Linear Mixed Models Some properties of Likelihood Ratio Tests in Linear Mixed Models Ciprian M. Crainiceanu David Ruppert Timothy J. Vogelsang September 19, 2003 Abstract We calculate the finite sample probability mass-at-zero

More information

Part 6: Multivariate Normal and Linear Models

Part 6: Multivariate Normal and Linear Models Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of

More information

Longitudinal Data Analysis Using Stata Paul D. Allison, Ph.D. Upcoming Seminar: May 18-19, 2017, Chicago, Illinois

Longitudinal Data Analysis Using Stata Paul D. Allison, Ph.D. Upcoming Seminar: May 18-19, 2017, Chicago, Illinois Longitudinal Data Analysis Using Stata Paul D. Allison, Ph.D. Upcoming Seminar: May 18-19, 217, Chicago, Illinois Outline 1. Opportunities and challenges of panel data. a. Data requirements b. Control

More information

Circle the single best answer for each multiple choice question. Your choice should be made clearly.

Circle the single best answer for each multiple choice question. Your choice should be made clearly. TEST #1 STA 4853 March 6, 2017 Name: Please read the following directions. DO NOT TURN THE PAGE UNTIL INSTRUCTED TO DO SO Directions This exam is closed book and closed notes. There are 32 multiple choice

More information

STAT 100C: Linear models

STAT 100C: Linear models STAT 100C: Linear models Arash A. Amini June 9, 2018 1 / 56 Table of Contents Multiple linear regression Linear model setup Estimation of β Geometric interpretation Estimation of σ 2 Hat matrix Gram matrix

More information

Using Estimating Equations for Spatially Correlated A

Using Estimating Equations for Spatially Correlated A Using Estimating Equations for Spatially Correlated Areal Data December 8, 2009 Introduction GEEs Spatial Estimating Equations Implementation Simulation Conclusion Typical Problem Assess the relationship

More information

Sample Size and Power Considerations for Longitudinal Studies

Sample Size and Power Considerations for Longitudinal Studies Sample Size and Power Considerations for Longitudinal Studies Outline Quantities required to determine the sample size in longitudinal studies Review of type I error, type II error, and power For continuous

More information

Chapter 3: Regression Methods for Trends

Chapter 3: Regression Methods for Trends Chapter 3: Regression Methods for Trends Time series exhibiting trends over time have a mean function that is some simple function (not necessarily constant) of time. The example random walk graph from

More information

Covariance Models (*) X i : (n i p) design matrix for fixed effects β : (p 1) regression coefficient for fixed effects

Covariance Models (*) X i : (n i p) design matrix for fixed effects β : (p 1) regression coefficient for fixed effects Covariance Models (*) Mixed Models Laird & Ware (1982) Y i = X i β + Z i b i + e i Y i : (n i 1) response vector X i : (n i p) design matrix for fixed effects β : (p 1) regression coefficient for fixed

More information

Linear Models 1. Isfahan University of Technology Fall Semester, 2014

Linear Models 1. Isfahan University of Technology Fall Semester, 2014 Linear Models 1 Isfahan University of Technology Fall Semester, 2014 References: [1] G. A. F., Seber and A. J. Lee (2003). Linear Regression Analysis (2nd ed.). Hoboken, NJ: Wiley. [2] A. C. Rencher and

More information

Semiparametric Generalized Linear Models

Semiparametric Generalized Linear Models Semiparametric Generalized Linear Models North American Stata Users Group Meeting Chicago, Illinois Paul Rathouz Department of Health Studies University of Chicago prathouz@uchicago.edu Liping Gao MS Student

More information

Computationally efficient banding of large covariance matrices for ordered data and connections to banding the inverse Cholesky factor

Computationally efficient banding of large covariance matrices for ordered data and connections to banding the inverse Cholesky factor Computationally efficient banding of large covariance matrices for ordered data and connections to banding the inverse Cholesky factor Y. Wang M. J. Daniels wang.yanpin@scrippshealth.org mjdaniels@austin.utexas.edu

More information

Association studies and regression

Association studies and regression Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration

More information

Lecture 5: ANOVA and Correlation

Lecture 5: ANOVA and Correlation Lecture 5: ANOVA and Correlation Ani Manichaikul amanicha@jhsph.edu 23 April 2007 1 / 62 Comparing Multiple Groups Continous data: comparing means Analysis of variance Binary data: comparing proportions

More information

Applied Multivariate and Longitudinal Data Analysis

Applied Multivariate and Longitudinal Data Analysis Applied Multivariate and Longitudinal Data Analysis Chapter 2: Inference about the mean vector(s) Ana-Maria Staicu SAS Hall 5220; 919-515-0644; astaicu@ncsu.edu 1 In this chapter we will discuss inference

More information

For more information about how to cite these materials visit

For more information about how to cite these materials visit Author(s): Kerby Shedden, Ph.D., 2010 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution Share Alike 3.0 License: http://creativecommons.org/licenses/by-sa/3.0/

More information

10. Alternative case influence statistics

10. Alternative case influence statistics 10. Alternative case influence statistics a. Alternative to D i : dffits i (and others) b. Alternative to studres i : externally-studentized residual c. Suggestion: use whatever is convenient with the

More information

Multilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2

Multilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Multilevel Models in Matrix Form Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Today s Lecture Linear models from a matrix perspective An example of how to do

More information

POWER ANALYSIS TO DETERMINE THE IMPORTANCE OF COVARIANCE STRUCTURE CHOICE IN MIXED MODEL REPEATED MEASURES ANOVA

POWER ANALYSIS TO DETERMINE THE IMPORTANCE OF COVARIANCE STRUCTURE CHOICE IN MIXED MODEL REPEATED MEASURES ANOVA POWER ANALYSIS TO DETERMINE THE IMPORTANCE OF COVARIANCE STRUCTURE CHOICE IN MIXED MODEL REPEATED MEASURES ANOVA A Thesis Submitted to the Graduate Faculty of the North Dakota State University of Agriculture

More information

Lectures on Simple Linear Regression Stat 431, Summer 2012

Lectures on Simple Linear Regression Stat 431, Summer 2012 Lectures on Simple Linear Regression Stat 43, Summer 0 Hyunseung Kang July 6-8, 0 Last Updated: July 8, 0 :59PM Introduction Previously, we have been investigating various properties of the population

More information

General Linear Model: Statistical Inference

General Linear Model: Statistical Inference Chapter 6 General Linear Model: Statistical Inference 6.1 Introduction So far we have discussed formulation of linear models (Chapter 1), estimability of parameters in a linear model (Chapter 4), least

More information

ISyE 691 Data mining and analytics

ISyE 691 Data mining and analytics ISyE 691 Data mining and analytics Regression Instructor: Prof. Kaibo Liu Department of Industrial and Systems Engineering UW-Madison Email: kliu8@wisc.edu Office: Room 3017 (Mechanical Engineering Building)

More information

Quick Review on Linear Multiple Regression

Quick Review on Linear Multiple Regression Quick Review on Linear Multiple Regression Mei-Yuan Chen Department of Finance National Chung Hsing University March 6, 2007 Introduction for Conditional Mean Modeling Suppose random variables Y, X 1,

More information

Analysis of Longitudinal Data: Comparison Between PROC GLM and PROC MIXED. Maribeth Johnson Medical College of Georgia Augusta, GA

Analysis of Longitudinal Data: Comparison Between PROC GLM and PROC MIXED. Maribeth Johnson Medical College of Georgia Augusta, GA Analysis of Longitudinal Data: Comparison Between PROC GLM and PROC MIXED Maribeth Johnson Medical College of Georgia Augusta, GA Overview Introduction to longitudinal data Describe the data for examples

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

An Introduction to Path Analysis

An Introduction to Path Analysis An Introduction to Path Analysis PRE 905: Multivariate Analysis Lecture 10: April 15, 2014 PRE 905: Lecture 10 Path Analysis Today s Lecture Path analysis starting with multivariate regression then arriving

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

WU Weiterbildung. Linear Mixed Models

WU Weiterbildung. Linear Mixed Models Linear Mixed Effects Models WU Weiterbildung SLIDE 1 Outline 1 Estimation: ML vs. REML 2 Special Models On Two Levels Mixed ANOVA Or Random ANOVA Random Intercept Model Random Coefficients Model Intercept-and-Slopes-as-Outcomes

More information

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,

More information