Log-mean linear regression models for binary responses with an application to multimorbidity

Size: px
Start display at page:

Download "Log-mean linear regression models for binary responses with an application to multimorbidity"

Transcription

1 Log-mean linear regression models for inary responses with an application to multimoridity arxiv: v3 [stat.me] 16 May 2016 Monia Lupparelli and Alerto Roverato March 30, 2016 Astract In regression models for categorical data a linear model is typically related to the response variales via a transformation of proailities called the link function. We introduce an approach ased on two link functions for inary data named log-mean (LM) and log-mean linear (LML), respectively. The choice of the link function plays a key role for the interpretation of the model, and our approach is especially appealing in terms of interpretation of the effects of covariates on the association of responses. Similarly to Poisson regression, the LM and LML regression coefficients of single outcomes are log-relative risks, and we show that the relative risk interpretation is maintained also in the regressions of the association of responses. Furthermore, certain collections of zero LML regression coefficients imply that the relative risks for joint responses factorize with respect to the corresponding relative risks for marginal responses. This work is motivated y the analysis of a dataset otained from a case-control study aimed to investigate the effect of HIV-infection on multimoridity, that is simultaneous presence of two or more noninfectious commoridities in one patient. Keywords: Categorical data; Link function; Multimoridity pattern; Relative risk; Response association. epartment of Statistical Sciences, University of Bologna, IT (monia.lupparelli@unio.it) epartment of Statistical Sciences, University of Bologna, IT (alerto.roverato@unio.it) 1

2 1 Introduction In many research fields it is required to model the dependence of a collection of response variales on one or more explanatory variales. McCullagh and Nelder (1989, Sec. 6.5) specified that in this context there are typically three lines of inquiry: (i) the dependence structure of each response marginally on covariates, (ii) a model for the joint distriution of all responses and (iii) the joint dependence of response variales on covariates. When in a regression model responses are categorical, a linear model is typically related to the response variales via a transformation of proailities called the link function, and the choice of the link function plays a key role for the interpretation of the model along the lines (i) to (iii). We refer to Tutz (2011) and Agresti (2013) for a full account of regression models for categorical data; see also Ekholm et al. (2000, Section 5) for a review of some link functions commonly used in the inary case. This work is motivated y a research aimed to investigate the effect of HIVinfection on multimoriditiy which is defined as the co-occurrence of two or more chronic medical conditions in one person. It is well known that multimoridity is associated with age and, furthermore, that HIV-infected patients experience an increased prevalence of noninfectious comoridities, compared with the general population. Guaraldi et al. (2011) considered a dataset otained from a cross-sectional retrospective case-control study, and investigated the effect of HIV-infection on the prevalence of a set of noninfectious chronic medical conditions y applying an univariate regression to each response. However, multimoridity is characterised y complex interactions of co-existing diseases and to gain relevant insight it is necessary to use a multivariate approach aimed to investigate the effect of HIV on the way different chronic conditions associate. The main scientific ojective of this study is thus the line of enquiry (iii). However, to the est of our knowledge, this line has never een explicitly addressed in the literature, and this paper is fully devoted to this issue. The application we consider naturally requires a marginal modelling approach ecause the main interest is for the effect of HIV on the marginal association of susets of comoridities; see Tutz (2011, Chapter 13) and Agresti (2013, Chapter 12). For this reason, we focus on the case where the link function satisfies upward compatiility, that is every association term among responses can e computed in the relative marginal distriution. In this way, the parameterization of the response variales will include terms that can e regarded as single outcomes, computed marginally on univariate responses, and terms that can e regarded as association outcomes, hereafter referred 2

3 to as response associations, which are computed marginally on susets of responses. Regression models typically include coefficients encoding the effect of the covariates, as well as of interactions of covariates, on response associations and difficulties involve oth the interpretation of the response associations and the interpretation of the relevant regression coefficients. More seriously, the effect of a covariate on a response association might e removale in the sense that it disappears when a different link function is used. It is therefore crucially relevant to e ale to define models with interpretale regressions coefficients; see Berrington de González and Cox (2007) for a review on statistical interactions, with emphasis on interpretation. In marginal modelling a central role is played y the multivariate logistic regression model ecause it maintains a marginal logistic regression interpretation for the single outcomes (McCullagh and Nelder, 1989; Glonek and McCullagh, 1995). Nevertheless, this family of regression models does not provide a satisfying answer to the multimoridity application. This is due to the fact that, although the regression coefficients for the single outcomes can e interpreted in terms of odds ratios, this feature does not translate to the higher order regressions where oth the response association and the relevant regression coefficients are high-level log-linear parameters which are difficult to interpret. We consider two different parameterizations, the log-mean (LM) (rton and Richardson, 2008; rton, 2009) and the log-mean linear (LML) parameterization (Roverato et al., 2013; Roverato, 2015), and investigate the use of these parameterizations as link functions. In this way, we introduce an approach where, similarly to Poisson regression, regression coefficients can e interpreted in terms of relative risks. Furthermore, and more interestingly, the relative risk interpretation can e extended from the regressions of the single outcomes to the regressions of the response associations, therey providing interpretale coefficients. The LM and the LML links can e used to specify the same classes of sumodels ut the LML link has the advantage that relevant sumodels can e specified y setting regression coefficients to zero. Specifically, we show that certain collections of zero LML regression coefficients imply that the relative risks for joint responses factorize with respect to the corresponding relative risks for marginal responses. The paper is organized as follows. In Section 2 we descrie the motivating prolem concerning the analysis of the multimoridity data. Section 3 gives the ackground concerning the theory of regression for multivariate inary responses, as required for this paper. Section 4 introduces the LM and the LML regression models and descries the relevant properties of these models. The analysis of multimoridity data is carried 3

4 out in Section 5 and, finally, Section 6 contains a discussion. 2 Motivating prolem: multimoridity in HIVpositive patients Antiretroviral therapy (ART) for human immunodeficiency virus (HIV) infection has een a great medical success story. Nowadays, in countries with good access to treatment, clinical AIS is no longer the inevitale outcome of HIV infection and this disease, previously associated with extremely high mortality rates, is now generally thought of as a chronic condition (Mocroft et al., 2003; May and Ingle, 2011). espite a marked increase in life expectancy, mortality rates among HIV-infected persons remain higher than those seen in the general population. Some of the excess mortality oserved among HIV-infected persons can e directly attriuted to illnesses that occur as a consequence of immunodeficiency, however, more than half of the deaths oserved in recent years among ART-experienced HIV infected patients are attriutale to noninfectious comoridities (NICMs) (Phillips et al., 2008; Guaraldi et al., 2011). Multimoridity is defined as the co-occurrence of two or more chronic medical conditions in one person that is, for HIV positive patients, as the simultaneous presence of two or more NICMs. Multimoridity, which is associated with age, is perhaps the most common disease pattern found among the elderly and, for this reason, it is turning into a major medical issue for oth individuals and health care providers (Marengoni et al., 2011). It is well known that HIV-infected patients experience an increased prevalence of NICMs, compared with the general population, and it has een hypothesized that such increased prevalence is the result of premature aging of HIV-infected patients (eeks and Phillips, 2009; Shiels et al., 2010; Guaraldi et al., 2011). Multimoridity is characterised y the co-occurrence of NICMs and therefore investigating the effect of HIV-infection on multimoridity requires to investigate the effect of HIV on the way different chronic conditions associate. The dataset we analyse here comes from a study of Guaraldi et al. (2011) who investigated the effect of HIV-infection on the prevalence of a set of noninfectious chronic medical conditions. ata were otained from a cross-sectional retrospective case-control study with sample size n = (2854 cases and 8562 controls). Cases were ART-experienced HIV-infected patients older than 18 years of age who were consecutively enrolled at the Metaolic Clinic of Modena University in Italy from 2002 to Control sujects were matched according to age, sex, race (all white), 4

5 and geographical area. The oserved variales include oth a set of inary (response) variales encoding the presence of NICMs of interest and a set of context and clinical covariates. See Guaraldi et al. (2011) for details and additional references. 3 Background and notation 3.1 Möius inversion In this susection we introduce the notation used for matrices and recall a well known result named Möius inversion that will e extensively used in the following. For two finite sets V and U, with V = p and U = q, we write θ = {θ (E)} V,E U to denote a 2 p 2 q real matrix with rows and columns indexed y the susets of V and U, respectively. Furthermore, we will write θ(e) to denote the column of θ indexed y E U and θ to denote the row of θ indexed y V. Note that the notation we use may e easier to read if one associates with iseases and E with Exposure. Example 3.1 (Matrix notation.) For the case V = {, c, d} and U = {h, a} the matrix θ has eight rows indexed y the susets, {}, {c}, {d} {, c}, {, d}, {c, d}, {, c, d} and four columns indexed y the susets, {h}, {a}, {h, a}. The matrix θ, with row and column indexes, is given elow; note that we use the suppressed notation θ c (ha) to denote θ {,c} ({h, a}), and similarly for the other quantities. θ = {h} {a} {h, a} θ ( ) θ (h) θ (a) θ (ha) θ ( ) θ (h) θ (a) θ (ha) θ c ( ) θ c (h) θ c (a) θ c (ha) θ d ( ) θ d (h) θ d (a) θ d (ha) θ c ( ) θ c (h) θ c (a) θ c (ha) θ ( ) θ (h) θ (a) θ (ha) θ cd ( ) θ cd (h) θ cd (a) θ cd (ha) θ cd ( ) θ cd (h) θ cd (a) θ cd (ha) {} {c} {d} {, c} {, d} {c, d} {, c, d} 5

6 Furthermore, the rows and columns of θ are denoted as follows. θ = θ θ θ c θ d θ c θ θ cd θ cd ( ) = θ( ) θ(h) θ(a) θ(ha) This matrix notation is not standard in the literature concerning categorical data ut, in the case of regression models for inary data, it allows us to provide a compact representation of model parameters in matrix form and to compute alternative parameterizations, as well as regression coefficients, y direct application of Möius inversion. Let ω e another real matrix indexed y the susets of V and U. For a suset V Möius inversion states that θ (E) = ω (E ), E U ω (E) = E E E E( 1) E\E θ (E ), E U; (1) see, among others, Lauritzen (1996, Appendix A). Let Z U and M U e two (2 q 2 q ) matrices with entries indexed y the susets of U U such that the entry of Z indexed y the pair E, H U is equal to 1(E H) and the corresponding entry of M is equal to ( 1) H\E 1(E H), where 1( ) denotes the indicator function. Then, the equivalence (1) can e written in matrix form as θ = ω Z U ω = θ M U and Möius inversion follows y noticing that M U = Z 1. Note that it is straightforward to extend this result to the matrices ω and θ as U θ = ωz U ω = θm U (2) and, furthermore, that it makes sense to consider Möius inversion also with respect to the columns of ω and θ so that θ = ZV ω ω = M V θ. 6

7 3.2 Multivariate inary response models Let Y V = (Y v ) v V e a inary random vector of response variales with entries indexed y V and X U = (X u ) u U a vector of inary covariates with entries indexed y U. Without loss of generality, we assume that Y V and X U take value in {0, 1} p and {0, 1} q, respectively. The values of covariates denote different oservational or experimental conditions and we assume that, for every x U {0, 1} q, the distriution of Y V (X U = x U ) is multivariate Bernoulli. Furthermore, we assume that the latter distriutions are independent across conditions and that, when X U is regarded as a random vector, then also (Y V, X U ) follows a multivariate Bernoulli distriution. We can write the proaility distriutions of Y V X U y means of a matrix π = {π (E)} V,E U where, for every E U, the column vector π(e) is the proaility distriution of Y V given (X E = 1 E, X U\E = 0 U\E ) and, more specifically, π (E) = pr(y = 1, Y V \ = 0 V \ X E = 1 E, X U\E = 0 U\E ). We assume that all the entries of π are strictly positive. In the following, for a suset V we use the suppressed notation Y = 1 to denote Y = 1 and similarly for Y = 0 and the suvectors of X U. Given three random vectors X, Y and Z, we write X Y Z to say that X is independent of Y given Z (awid, 1979) or, in the case where Y and Z are not random, that the conditional distriution of X given Y and Z does not depend on Y. In regression models for categorical responses a linear regression is typically related to the response variales via a link function θ(π). All the link functions considered in this paper are such that, for every E U, the vector θ(e) = θ(π(e)) parameterizes the distriution of Y V (X E = 1, X U\E = 0). In this way, the link function induces a matrix θ = {θ (E)} V,E U and the associated linear regression, in the saturated case, has form θ (E) = E E β θ (E ) for every V, E U (3) where β θ = {β θ (E)} V,E U is a matrix of regression coefficients. It follows from (1) and (2) that (3) can e written in matrix form as θ = β θ Z U so that β θ = θm U. (4) The regression setting (3) involves multivariate cominations of oth the responses and the covariates and for this reason it is important to explicitly distinguish etween the -response associations which are given y θ for V with > 1, and the 7

8 E-covariate interactions given y β θ (E) for E U with E > 1. Hence, β θ (E) encodes the effect of the E-covariate interaction on the -response association, given a fixed level of X U\E that, without loss of generality, in our approach is the zero level. Example 3.2 (Matrix notation continued.) In all the examples of this paper we use the variales of the multimoridity data as given in Section 5. Specifically, we consider three of the response variales, which are Y = Bone fracture, Y c = Cardiovascular disease and Y d = iaetes, with level 1 encoding the presence of the disease, and the most relevant covariates, that is, X h = HIV with the level 1 encoding the presence of the infection and X a = Age with the value 1 for patients aged 45 or more. Hence, if the link function is the matrix θ given in Example 3.1, then the matrix of regression coefficients β θ is β θ = β θ ( ) β θ (h) β θ (a) β θ (ha) β θ ( ) β θ (h) β θ (a) β θ β c θ ( ) β c θ (h) β c θ (a) β c θ β θ d ( ) β θ d β θ c ( ) β θ ( ) β θ cd ( ) β θ cd ( ) β θ c (h) β θ (h) β θ cd (h) β θ cd (h) (h) β θ d β θ c (a) β θ (a) β θ cd (a) β θ cd (a) (ha) (ha) (a) β θ d (ha) β θ c (ha) β θ (ha) β θ cd (ha) β θ cd (ha) = Equation (3) states that there exists a one-to-one Möius inversion relationship etween every row θ of θ and the corresponding row β θ of β θ. For instance, θ is the link function of the marginal distriution of one fracture, and each of its entries, θ (E) for E U, can e computed y considering β θ and then y taking the sum of its entries which are indexed y a suset of E. More specifically, for E = {h} it holds that θ (h) = β θ ( ) + β θ (h) whereas for E = {h, a} it holds that θ (ha) = β θ ( ) + β θ (h) + β θ (a) + β θ (ha). Hence, β θ (h) and β θ (ha) encode the effect of HIV and of the interaction of HIV and age, respectively, on one fracture. Similarly, θ is the response association of one fracture and diaetes. Hence, for E = {h} it holds that θ (h) = β θ ( ) + β θ (h) whereas for E = {h, a} it holds that θ (ha) = β θ ( ) + β θ (h) + β θ (a) + β θ (ha). Here, β θ (h) and β θ (ha) encode the effect of HIV and of the interaction of HIV and age, respectively, on the {, d}-response association θ. 8 β θ β θ β θ c β θ d β θ c β θ β θ cd β θ cd

9 There exists an extensive literature on models for categorical data analysis ut, remarkaly, Lang (1996), extending previous work y Lang and Agresti (1994), introduced a very general method to specify regression models for categorical data therey defining an extremely road class of models named generalized log-linear models. This includes, as special cases, many of the existing models for multiple categorical responses such as log-linear and, more generally, marginal log-linear models (Bergsma et al., 2009). In particular, in marginal regression modelling a relevant instance within this class is otained when θ( ) is the multivariate logistic link function denoted y η( ) (McCullagh and Nelder, 1989; Glonek and McCullagh, 1995; Molenerghs and Lesaffre, 1999; Bartolucci et al., 2007; Bergsma et al., 2009; Marchetti and Lupparelli, 2011). This induces the parameterization η = {η (E)} V,E U where η (E) is the -way log-linear interaction computed in the margin Y (X E = 1, X U\E = 0) and, more specifically, η {v} is the usual logistic link for every v V. Marchetti and Lupparelli (2011) considered the regression framework (3) with θ = η and showed that β η (E) is the E -way log-linear interaction computed in the distriutions of (Y, X U ). Example 3.3 (Multivariate logistic regression) Consider the case V = {, c} and U = {h}. The multivariate logistic parameters are η i ( ) = log pr(y i = 1 X h = 0) 1 pr(y i = 1 X h = 0) and η i (h) = log pr(y i = 1 X h = 1) 1 pr(y i = 1 X h = 1) for i {, c}, which are logit links. Furthermore, if we denote y OR(Y c ) and y OR(Y c h) the odds ratio etween one fracture and cardiovascular disease for HIVnegative and HIV-positive patients, respectively, then η c ( ) = log OR(Y c ) and η c (h) = log OR(Y c h) so that the coefficients for the regression of η c are β η c ( ) = log OR(Y c ) and β η c (h) = log OR(Y c h) OR(Y c ). We now move from the saturated model to sumodels defined y means of linear constraints on the regression coefficients and, more specifically, to sumodels with regression coefficients equal to zero. If for every suset of U, denoted as E, that has non-empty intersection with U U, it holds that β θ (E) = 0, then the covariates X U have no-effect on θ. The following lemma states a connection etween no-effect 9

10 of a suset of variales and linear constraints in θ. Lemma 3.1 Let θ = {θ (E)} V,E U e a real matrix with entries indexed y two nonempty sets V and U. If β θ = θm U and U U then the following are equivalent, for every V, as they oth say that X U has no effect on θ ; (i) β θ (E) = 0 for every E U such that E U ; (ii) θ (E 1 ) = θ (E 2 ) for every E 1, E 2 U such that E 1 \U = E 2 \U. Proof. See Appendix A. 4 Log-mean and log-mean linear regression models The multivariate Bernoulli distriution elongs to the natural exponential family, and the mean parameter associated with the distriution π(e) is µ(e) where µ (E) = pr(y = 1 X E = 1, X U\E = 0) for V. rton and Richardson (2008) used the mean parameter to parameterize graphical models of marginal independence and called it the Möius parameter ecause µ(e) = Z V π(e) for every E V. Susequently, rton (2009) used the matrix µ = Z V π to parameterize regression graph models; see also Roverato et al. (2013) and Roverato (2015). The mean parameterization has the disadvantage that sumodels of interest are defined y, non-linear, multiplicative constraints. Roverato et al. (2013) introduced the log-mean linear parameter γ defined as a log-linear expansion of the mean parameters, formally, γ = MV log(µ), (5) and showed that this approach improves on the mean parameterization as sumodels of interest can e specified y setting certain zero log-mean linear interactions. We remark that oth the mean and the log-mean linear parameters are not variation independent, in the sense that setting some parameters to particular values may restrict the valid range of other parameters. As a consequence, unlike variation independent parameterizations, it might e more difficult to interpret separately each parameter. Here, we show that the analysis of multimoridity data can e effectively approached y applying the theory descried in the previous section to the log-mean (LM) and the log-mean linear (LML) parameterizations to develop the LM and the LML regression model, respectively. 10

11 4.1 Log-mean regression The LM regression model is otained y setting θ in (3) equal to the logarithm of the mean parameter, log(µ), so that, in the saturated case, log(µ) = β µ Z U that, in its extended form, is log µ (E) = E E β µ (E ) for every V, E U (6) where for = the equation (6) is trivial for every E U ecause µ is a row vector of ones so that oth log(µ ) = 0 and β µ = log(µ )M U = 0. LM regression can e regarded as one of the possile alternative ways to parameterize the distriution of Y V X U, ut it is of special interest for the application considered in this paper. This can e seen y noticing that one can write µ (E) = pr(y = 1 X E = 1, X U\E = 0) where Y = v Y v is the inary random variale associated with the multimoridity pattern. More specifically, pr(y = 1) is the proaility that the multimoridity pattern occurs, and therefore (6) is a sequence of regressions corresponding to the univariate inary responses Y for V. Similarly to Poisson regression, the parameters of LM regression can e interpreted in terms of relative risks. We first consider the case U = {u}, so that U = 1. Hence, if we denote the relative risk of an event E with respect to the two groups identified y X u = 1 and X u = 0 y RR u (E) = pr(e X u = 1) pr(e X u = 0) (7) then for every v V it holds that β v µ (u) = log RR u (Y v = 1) whereas, more generally, for V β µ (u) = log RR u(y = 1) = log RR u ( v {Y v = 1}) (8) where we use the convention that RR u (Y = 1) = 1. Example 4.1 (LM regression) Consider the case where V = {, c, d} and U = {h}. Then the saturated LM regression model contains a regression equation for every 11

12 V, where the case = is trivial. For = 1, equation (6) has form log µ i ( ) = log pr(y i = 1 X h = 0) = β µ i ( ) log µ i (h) = log pr(y i = 1 X h = 1) = β µ i ( ) + β µ i (h) where, for every i V, β µ i (h) = log RR h (Y i = 1) that is the log-relative risk of the disease Y i for HIV-positive patients compared to HIV-negative patients. For = 2, log µ ij ( ) = log pr(y i = 1, Y j = 1 X h = 0) = β µ ij ( ) log µ ij (h) = log pr(y i = 1, Y j = 1 X h = 1) = β µ ij ( ) + β µ ij (h) where, for every i, j V with i j, β µ ij (h) = log RR h(y ij = 1) that is the logrelative risk of the co-occurrence of the diseases Y i and Y j for HIV-positive patients compared to HIV-negative patients. Finally, for = 3 so that = V, log µ cd ( ) = log pr(y = 1, Y c = 1, Y d = 1 X h = 0) = β µ cd ( ) log µ cd (h) = log pr(y = 1, Y c = 1, Y d = 1 X h = 1) = β µ cd ( ) + β µ cd (h) where β µ cd (h) = log RR h(y cd = 1) is the log-relative risk of the co-occurrence of the three diseases for HIV-positive patients compared to HIV-negative patients. Consider the regression equation relative to the suset V. It follows from (6) that if for an element u U it holds that β µ (E) = 0 for every E U such that u E, then Y X u X U\{u}. This can e easily extended to a suset of covariates X U with U U, whereas to generalize this result to the vector Y it is necessary to consider a collection of regression equations as shown elow. Proposition 4.1 Let Y e the suvector of Y V indexed y V. For a suset U U, it holds that Y X U X U\U if and only if β µ (E) = 0 for every and E U such that E U. Proof. See Appendix A. We can conclude that it makes sense to focus on sumodels characterized y zero LM regression coefficients ecause they encode interpretale relationships, possily implying that one or more covariates have no-effect on the distriution of Y, or even on the joint distriution of Y. However, this approach did not identify any missing effects in the application to multimoridity data. One reason for that is that the zero 12

13 pattern of regression coefficients descried in Proposition 4.1 implies no-effect of X U on Y and therefore that X U has no-effect on Y v for every single v. Consider the case U = 1 of a unique covariate X h representing the HIV-infection. It is well estalished that HIV has a relevant effect on each of the comoridites considered singularly, that is, β v µ (h) 0 for every v V. Although in principle it is possile for a covariate X h with a non-zero effect on Y v for some v to have coefficients equal to zero in the regression relative to Y, this hardly happens in practice ecause of the strong association existing etween Y and every Y v with v ; recall that Y v = 0 for any v implies Y = 0 and, conversely, Y = 1 implies Y v = 1 for every v. In other words, it is well known that, if X h = 1 for HIV-postive patients, then pr(y v = 1 X h = 1) > pr(y v = 1 X h = 0) for every comoridity v V, and therefore it is reasonale to expect that also pr(y = 1 X h = 1) > pr(y = 1 X h = 0) for every comoridity pattern V, with > 1. To disclose the usefulness of LM regression for the application considered in this paper, it is necessary to propose a different approach. In the multimoridity analysis, it is useful to distinguish etween two different effects of HIV, specifically, one can e interest in the effect of HIV (i) on the prevalence of a comoridity pattern and (ii) on the association among comoridities in. As discussed aove, the former is in general a well known matter as multimoridity shows a higher prevalence in infected patients. Indeed, the main question is whether HIV plays a role in the way single comoridities comine together to produce the comoridity pattern, that is in the association of variales in Y. We approach the prolem y considering the extreme case of no-effect of HIV on the -response associations ecause responses are conditionally independent given the covariates. Our standpoint is that if for a suset V there exists a proper partition A B = such that Y A Y B X U, then there is no-effect of the covariates on the -way association of Y. This allows us to compare the performance of different link functions in multimoridity analysis and, more importantly, to provide a clear interpretation to the effect of HIV in LM regression. In the Example 4.2 elow we illustrate how this idea can e formalized. Example 4.2 (LM regression vs. multivariate logistic regression) For the case V = {, d} and U = {h}, assume Y Y d X h so that we can say that there is no-effect of X h = HIV on the association of Y and Y d. To see this in practice it is sufficient to consider any measure of association that represents independence y a constant value, for instance the value zero, so that the association is the same for the two values of X h. Exploiting the factorization of the proaility function of Y X h implied y conditional independence, it is not difficult to see that under the LM regression 13

14 oth Y Y d X h β µ ( ) = β µ ( ) + β µ d ( ) and β µ (u) = β µ (h) + β µ d (h), and, under the multivariate logistic regression, Y Y d X h β η ( ) = 0 and β η (h) = 0. The effect of HIV on the joint distriution of Y is represented y β µ (h) in LM regression and y β η (h) in multivariate logistic regression and therefore oth and β µ (h) = β µ (h) + β µ (h) (9) d β η (h) = 0 (10) can e used to state that there is no-effect of HIV on the association of Y and Y d. However, (9) and (10) refer to different kinds of association ecause, although they are necessary conditions for Y Y d X h, they are not sufficient for the same condition to hold true, and it can e easily checked that neither (9) implies (10) nor (10) implies (9). From this perspective, it makes little sense to state that HIV has no-effect on the association of Y and Y d if a clear interpretation to equalities (9) and (10) is not provided. The interpretation of equality (9) is straightforward. It states a connection etween regressions of different order and implies that the effect of HIV on the distriution of Y is explained y the effects of HIV on the marginal distriutions of Y and Y d. More interestingly, it provides an useful insight on the ehaviour of relative risks ecause (9) is equivalent to RR h ({Y = 1} {Y d = 1}) = RR h (Y = 1) RR h (Y d = 1) (11) that is, if we are willing to interpret the effect of HIV y means of relative risks, then (9) allows one to carry out the analysis marginally on the regression equations for the main responses, ecause the computation of the relative risk of the joint event {Y = 1} {Y d = 1} does not require the joint distriution of Y X h ut only the marginal distriutions of Y X h and Y d X h as in the case where Y Y d X h. Multivariate logistic regression parameters are naturally associated with odds ra- 14

15 tios, which play a very fundamental role among association measures for categorical data. However, we deem that, in this context, the interpretation of (9) is more straightforward than (10). Indeed, the latter is equivalent to the identity OR(Y ) = OR(Y h) that, unlike (11), does not explains how the marginal effects of HIV on one fracture and diaetes comine together to give the effect of HIV on the {, d}-response association. This example shows that the LM regression coefficients provide a clear way to investigate the effect of a covariate on the association of two inary variales and the following theorem generalises this result to -response associations. Theorem 4.2 Let µ e the mean parameter of Y V X U and β µ = log(µ)m U. Then, for a pair of disjoint nonempty susets A and B of V it holds that Y A Y B X U if and only if for every A B and E U, with oth A and B, it holds that β µ (E) = B µ (E) where B µ (E) = ( 1) \ β µ (E). (12) Proof. See Appendix A. Theorem 4.2 shows that, whenever Y can e split into two conditionally independent suvectors Y A and Y B, then every coefficient of the LM regression with response log(µ ) can e written as a linear comination of the corresponding coefficients in lower-order regressions. Hence, the difference etween case and control patients with respect to Y is not given y the effect of HIV on the association of all the comoridities in, ut it is only the consequence of the effect that HIV has on the occurrence of the susets A and B of comoridity patterns. For the case U = {u}, the relationship β µ (u) = B µ (u) can e used to state that X u has no-effect on the -response association and, as well as in Example 4.2, this statement means that the effect of X u on the distriution of Y is explained y the corresponding coefficients in lower-order regressions. Furtheremore, it follows immediately from (8) that the equality β µ (u) = B µ (u) can e equivalently stated in terms of factorization of the relative risk RR u (Y = 1) with respect to the collection of relative risks RR u (Y = 1) for. Example 4.3 (LM regression cont.) For the LM regression in Example 4.1 consider the case where Y A Y B X h where A and B are nonempty, disjoint susets of V 15

16 such that A B = V. Then, it follows from Theorem 4.2 oth that and that More generally, for the case U > 1 it follows from Theorem 4.2 that the relationship β µ cd ( ) = β µ ( ) β c µ ( ) β µ d ( ) + β µ c ( ) + β µ ( ) + β µ cd ( ) β µ cd (h) = β µ (h) β c µ (h) β µ d (h) + β µ c (h) + β µ (h) + β µ cd (h) RR h (Y cd = 1) = RR h(y c =1) RR h (Y =1) RR h (Y cd =1) RR h (Y =1) RR h (Y c =1) RR h (Y d =1). β µ (E) = B µ (E) for every E U such that u E (13) implies that every regression coefficient involving X u is the linear comination of the corresponding coefficients of lower-order regressions, and can e used to state that X u has no-effect on the -response association. Also in this case, (13) can e interpreted in terms of relative risks. Consider the collection of relative risks of the event Y = 1, with respect to X u, conditionally on the values of the remaining covariates X U\{u}, given y RR u (Y = 1 E) = pr(y = 1 X u = 1, X E = 1, X U\(E {u}) = 0) pr(y = 1 X u = 0, X E = 1, X U\(E {u}) = 0) for E U\{u};(14) we recall that we set RR u (Y = 1 E) = 1. Then we associate to every element of (14) a reference value defined as follows. efinition 4.1 For V with > 1, the reference relative risk of the event Y = 1 with respect to X u and a suset E U\{u} is defined as RR u (Y = 1 E) = RR u (Y = 1 E) ( 1) \ +1. The following lemma states the connection etween relative risks and regression coefficients as well as etween reference relative risks in efinition 4.1 and Theorem 4.2. Lemma 4.3 Let µ e the mean parameter of Y V X U and β µ = log(µ)m U. Then, 16

17 for every V, u U and E U\{u} it holds that log RR u (Y = 1 E) = E E β µ (E {u}) and, for > 1, that log RR u (Y = 1 E) = E E B µ (E {u}). Proof. See Appendix A. The introduction of the reference relative risks is motivated y the following result, Corollary 4.4 Let the suset V e such that there exists a proper partition = A B, with A, B, satisfying Y A Y B X U. Then for every u U it holds that RR u (Y = 1 E) = RR u (Y = 1 E) for every E U\{u}. (15) Proof. See Appendix A. Hence RR u (Y = 1 E) is a reference value in the sense that it is the value taken y the corresponding relative risk when Y can e split into two conditionally independent suvectors. If (15) is satisfied, then all the conditional relative risks of the events {Y = 1}, with respect to X u, are equal to their reference value. This implies that, as far as the relative risk of the multimoridity pattern is of concern, the analysis can e carried out marginally on the distriutions of Y X U with. The following corollary shows that one can interpret (13) in terms of relative risks ecause it is equivalent to (15). Corollary 4.5 Let µ e the mean parameter of Y V X U and β µ = log(µ)m U. For any V, with > 1, and u U the relationship (13) is satisfied if and only if (15) is satisfied. Proof. See Appendix A. We can conclude that LM regression provides a useful framework to investigate the effect of covariates on the association of responses. Sumodels of interest involve possily zero regression coefficients, as in Proposition 4.1, ut, more commonly, regression 17

18 coefficients which are linear comination of lower-order coefficients as in Theorem 4.2. This approach can e easily implemented ecause (i) it involves sumodels defined y linear constraints on the regression parameters and (ii) the computation of B µ (E) in (12) does not require to specify explicitly the partition of Y into independent suvectors. A shortcoming of LM regression is that in order to interpret the values of regression coefficients in terms of conditional independence the coefficients must e contrasted with the theoretical values (12) given in Theorem 4.2. In the next section we show that LML regression provides a solution to this prolem. 4.2 Log-mean linear regression The LML regression model is otained y setting θ in (3) equal to the LML parameter γ = M V log(µ) in (5), so that, in the saturated case, it follows from (4) that β γ = γm U and γ = β γ Z U ; the latter can e written as γ (E) = E E β γ (E ) for every V, E U. (16) There is a close connection etween LM and LML regression given y a linear relationship etween β γ and β µ. Lemma 4.6 Let µ and γ e the mean and LML parameter, respectively, of Y V X U so that β µ = log(µ)m U and β γ = γm U. Then it holds that β γ = M V β µ, that is for every V and E U β γ (E) = ( 1) \ β µ (E). (17) Proof. See Appendix A. As a consequence of Lemma 4.6, sumodels defined y zero LM regression coefficients as in Proposition 4.1 can e equivalently stated y setting to zero the corresponding LML regression coefficients. Corollary 4.7 Let Y e the suvector of Y V indexed y = V. For a suset U U, it holds that Y X U X U\U if and only if β γ (E) = 0 for every and E U such that E U. Proof. See Appendix A. 18

19 β γ Furthermore, it follows immediately from (17) that for 1 it holds that = β µ whereas for > 1 LML regression coefficients allow one to immediately check whether a LM regression coefficients coincides with the associated theoretical value given in Theorem 4.2. Proposition 4.8 Let µ and γ e the mean and LML parameter, respectively, of Y V X U so that β µ = log(µ)m U and β γ = γm U. Then for every V, such that > 1, and E U it holds that β γ (E) = β µ (E) B µ (E) so that β γ (E) = 0 if and only if β µ (E) = B µ (E). Proof. This is an immediate consequence of (17). Hence, the relationship β γ (E) = 0 for every E U such that u E (18) is equivalent to oth (13) and (15) and we can conclude that LML regression is equivalent to LM regression for the purposes of our analysis, with the advantage that all the sumodels of interest can e specified y setting LML regression coefficients to zero. Furthermore, the value of the regression coefficients β γ can e used to contrast relative risks with the corresponding reference values as follows. Corollary 4.9 Let γ e the LML parameter of Y V X U so that β γ = γm U. Then for every V, such that > 1, and E U it holds that log RR u(y = 1 E) RR u (Y = 1 E) = β γ (E {u}) (19) E E Proof. See Appendix A. Note that, interestingly, Corollary 4.9 implies that for the case U = 1 the LML regression coefficients such that > 1 have an immediate interpretation as deviation of a relative risk from its reference value β γ (u) = log RR u(y = 1) RR u (Y = 1). (20) We close this section y noticing that also the LM and the LML regression models elong to the family of generalized log-linear models. This allows us to exploit 19

20 the asymptotic results of Lang (1996) for the computation of maximum likelihood estimates (MLEs) and for model comparison. Nevertheless, we remark that the regression framework introduced in this paper is entirely novel. Both the LM and the LML link functions allow us to specify the functional relationship etween π and the regression coefficients in closed form. This is a specific property that is not shared y the wider class of generalised log-linear models, and that makes it possile to specify closed-form functional relationships etween regression coefficients of different responses. In turn, this is the asis for the factorization of relative risks with respect to the corresponding relative risks of lower order regressions. 5 Analysis of multimoridity data We now apply the LM and the LML regression models to the analysis of the multimoridity data descried in Section 2. We consider four response variales: Y = Bone fracture, Y c = Cardiovascular disease, Y d = iaetes and Y r = Renal failure. Hence, V = {, c, d, r} and Y V takes values in {0, 1} 4 where, for each variale, the value 1 encodes the presence of the disease. The four responses define 11 multimoridity patterns denoted y the susets V with 2 and we say that k = is the size of the multimoridity pattern. For the computation of MLEs we applied the algorithm and the maximization procedure given in Lang (1996) properly adjusted to account for the inclusion of the LM and LML link functions; for technical details and a review of further maximization approaches see also Evans and Forcina (2013) and references therein. It is well estalished that asymptotic methods are not efficient when the tale of oserved counts is sparse, that is when many cells have small frequencies (see Agresti, 2013, Section 10.6). For a given sample size, sparsity increases with the numer of variales included in the analysis, therefore inference is less reliale for response associatons of higher-order since they are computed on the relevant marginal tales. In order to keep sparsity at an acceptale level, we restrict the analysis to the effect of two inary covariates indexed y U = {a, h}; specifially, X h = HIV with the level 1 encoding the presence of the infection and X a = Age with the value 1 for patients aged 45 or more. Firstly, in Susection 5.1, we consider the regression model including only the covariate X h, so that each regression coefficient represents the marginal effect of HIV on a -response associations. Next, in Susection 5.2 we consider a regression model with two covariates including also the effect of age. We remark that the case-control design for this study is not ased on an outcome- 20

21 dependent sampling. In fact the enrollment of each patient is independent of the disease status given the HIV status and further individual covariates. Therefore, for this case-study the population relative risk represents an identifiale measure of association, unlike case-control designs where the sample selection depends on the outcome of interest. 5.1 The single covariate case When only the covariate X h is included in the model, the regression coefficients have a straightforward interpretation in terms of relative risks for cases versus controls. Indeed, in the LM regression model, it holds that for = 1 the coefficient β µ (h) is the log-relative risk of the occurrence of single commoridities and, otherwise, it is the log-relative risk of the occurrence of the multimoridity pattern ; see (8). In the LML regression, the coefficient β γ (h) is equal to β µ (h) in case of single responses, otherwise it is the log-ratio of the relative and the reference relative risks for the occurrence of the multimoridity pattern, as shown in (20). Hence, from Proposition 4.8, the no-effect of HIV defined y β γ (h) = 0 with > 1 implies that the relative risk equals the reference relative risk for the multimoridity pattern. Conversely, a positive or a negative value of β γ (h) states that the relative risk for the pattern is higher or lower than its reference relative risk, respectively. As a preliminary analysis, we provide in Tale 1 the MLEs under the saturated LM and LML regression models. To clarify the meaning of the values given in this tale consider, for instance, the disease pattern {, c}, that is one fracture diaetes. The µ estimated LM coefficient ˆβ c (h) = provides a strong evidence of the existence of an effect of HIV on the co-occurrence of the two diseases whereas the corresponding γ LML parameter estimate ˆβ c (h) = is not significantly different from zero, and this can e interpreted as no-effect of HIV on the association of the two diseases. More specifically, the estimated relative risk of the co-occurrence of one fracture and µ diaetes takes value exp{ ˆβ c (h)} = exp(2.941) = 18.9 and is significantly different from 1; however, it is not significantly different from the product of the two marginal relative risks given y exp{β µ (h)} exp{β c µ (h)}. The latter is the value taken y the relative risk of the co-occurrence of the two diseases when Y Y c X h and therefore it implies that the relative risk of the co-occurrence of the two diseases only depends on the marginal relative risks of the two diseases rather than on the way the two diseases associate to give the multimoridity pattern. ifferent is the case of the disease pattern {c, d} ecause in this case the estimated LML coefficient 21

22 ˆβ γ cd (h) = is significantly different form zero and negative and therefore the relative risk of the co-occurrence of the two diseases is significantly smaller than expected in the case the two diseases are conditionally independent given HIV. We remark that, when we say that a regression coefficient is significantly different from zero we refer to the statistical test, at level 5%, ased on the asymptotic normal distriution of the MLEs. As expected (see the discussion following Proposition 4.1) in Tale 1 the estimated log-relative risks ˆβ µ (h) are positive for every single disease and for every multimoridity pattern. Furthermore, all the corresponding regression coefficients are significantly different from zero, except for the pattern {, c, d}. The lowest relative risk corresponds to the pattern {c, d}, whereas the highest relative risks are in correspondence of the patterns {, r}, {c, r} and {, d, r}. More informative is the analysis of the LML regression coefficients. Indeed, most of the coefficients β γ (h) are not significantly different from zero, therey providing an empirical evidence that the log-relative risks for the corresponding disease patterns are positive as a consequence of the effect of HIV on the single diseases they are composed, rather than for an effect of HIV on the associations of diseases. It is interesting that most of the γ coefficients ˆβ (h), for 2, take a negative value. This is especially true for the multimoridity patterns of size two, suggesting that the corresponding relative risks, although positive, take a value that is smaller than the value one would expect in case of conditional independence of the diseases. The results provided y the saturated model can e summarized considering the average HIV-effect for disease patterns of the same size k, under the LM and LML γ approaches. In particular, we compute the average of (h) and of ˆβ (h), for - response associations of the same size; see Appendix B for further details. Confidence intervals for these effects are plotted in Figure 1. The plot in the left panel of Figure 1 gives the estimated LM average effect, i.e. the average log-relative risk for the occurrence of multimoridity patterns of the same size, which clearly increases with the size of the disease pattern. On the other hand, the plot in the right panel of Figure 1 shows the estimated LML average effect, that is the average effect of HIV on the association among diseases forming patterns of the same size. This effect appears to e negative for patterns of size k = 2 and it might e null for k = 3 and k = 4. Next, we apply the forward inclusion stepwise procedure descried in Appendix B and select, in this way, the LML regression sumodel given in Tale 2. The latter provides a very good fit of the data with a deviance 3.45 on 12 degrees of freedom and p-value, computed on the asymptotic chi-square distriution of the deviance, ˆβ µ 22

23 Tale 1: The saturated LML and LM regression models for Y V X h. The tale gives the MLEs of the regression coefficients with their standard error (s.e.) and p-value. ˆβ γ ( ) γ µ µ s.e. p-value ˆβ (h) s.e. p-value ˆβ ( ) s.e. p-value ˆβ (h) s.e. p-value {} < < < <.001 {c} < < < <.001 {d} < < < <.001 {r} < < < <.001 {, c} < <.001 {, d} < <.001 {, r} < <.001 {c, d} < < < {c, r} < <.001 {d, r} < < <.001 {, c, d} < {, c, r} < < <.001 {, d, r} < <.001 {c, d, r} < <.001 {, c, d, r} < <.001 LM regression model LML regression model average effect average effect disease pattern size disease pattern size Figure 1: Confidence intervals of the average HIV-effect for response associations for disease patterns of the same size under the saturated LM and LML regression models. 23

24 Tale 2: The selected LML, and the corresponding LM, regression model for Y V X h. The tale gives the MLEs of the regression coefficients with their standard error (s.e.) and p-value. ˆβ γ ( ) γ µ s.e. p-value ˆβ (h) s.e. p-value ˆβ ( ) s.e. ˆβ µ (h) s.e. {} < < {c} < < {d} < < {r} < < {, c} {, d} {, r} {c, d} < < {c, r} {d, r} < {, c, d} {, c, r} < < {, d, r} < < {c, d, r} {, c, d, r} < equal to In the selected model there is no effect of HIV on the associations {, c}, {, d}, {, r}, {c, r}, {d, r}, {, c, d} and {c, d, r}. Furthermore, it follows from Theorem 4.2 and Proposition 4.8 that, under the selected model, three pairs of responses are conditionally independent given HIV, specifically, Y Y d X h, Y Y r X h and Y c Y r X h. We now look more closely at the values taken y the estimated regression coefficients given in Tale 2. We ase this analysis on the asymptotic normal distriution of the MLEs, ut it is important to remark that, as we are dealing with post-selection parameter estimates, there might e distortions on the sampling distriutions; see Berk et al. (2013) for a discussion and recent developments. The highest estimated relative risks are for multimoridity patterns {, r} and {, d, r} with 95% confidence intervals (5.793; 6.130) and (5.146; 6.130), respectively. More generally, high relative risks are given in correspondence of multimoridity patterns including Renal failure. γ The negative values taken y the estimates ˆβ (h) for the remaining patterns of size two and three suggest that the relative risks of the multimoridity patterns {c, d}, {, d, r} and {, c, r} are lower than their reference relative risks. For the disease pattern of size 4 the estimate of β γ (h) is positive suggesting that the relative risk of the pattern {, c, d, r} is higher than its reference relative risk. 24

ARTICLE IN PRESS. Journal of Multivariate Analysis ( ) Contents lists available at ScienceDirect. Journal of Multivariate Analysis

ARTICLE IN PRESS. Journal of Multivariate Analysis ( ) Contents lists available at ScienceDirect. Journal of Multivariate Analysis Journal of Multivariate Analysis ( ) Contents lists available at ScienceDirect Journal of Multivariate Analysis journal homepage: www.elsevier.com/locate/jmva Marginal parameterizations of discrete models

More information

Figure 36: Respiratory infection versus time for the first 49 children.

Figure 36: Respiratory infection versus time for the first 49 children. y BINARY DATA MODELS We devote an entire chapter to binary data since such data are challenging, both in terms of modeling the dependence, and parameter interpretation. We again consider mixed effects

More information

AN EXTENDED CLASS OF MARGINAL LINK FUNCTIONS FOR MODELLING CONTINGENCY TABLES BY EQUALITY AND INEQUALITY CONSTRAINTS

AN EXTENDED CLASS OF MARGINAL LINK FUNCTIONS FOR MODELLING CONTINGENCY TABLES BY EQUALITY AND INEQUALITY CONSTRAINTS Statistica Sinica 17(2007), 691-711 AN EXTENDED CLASS OF MARGINAL LINK FUNCTIONS FOR MODELLING CONTINGENCY TABLES BY EQUALITY AND INEQUALITY CONSTRAINTS Francesco Bartolucci 1, Roberto Colombi 2 and Antonio

More information

arxiv: v1 [stat.me] 5 May 2018 Monia Lupparelli

arxiv: v1 [stat.me] 5 May 2018 Monia Lupparelli Conditional and marginal relative risk parameters for a class of recursive regression graph models arxiv:1805.02044v1 [stat.me] 5 May 2018 Monia Lupparelli Department of Statistical Sciences, University

More information

Chapter 20: Logistic regression for binary response variables

Chapter 20: Logistic regression for binary response variables Chapter 20: Logistic regression for binary response variables In 1846, the Donner and Reed families left Illinois for California by covered wagon (87 people, 20 wagons). They attempted a new and untried

More information

Lecture 9. Statistics Survival Analysis. Presented February 23, Dan Gillen Department of Statistics University of California, Irvine

Lecture 9. Statistics Survival Analysis. Presented February 23, Dan Gillen Department of Statistics University of California, Irvine Statistics 255 - Survival Analysis Presented February 23, 2016 Dan Gillen Department of Statistics University of California, Irvine 9.1 Survival analysis involves subjects moving through time Hazard may

More information

A class of latent marginal models for capture-recapture data with continuous covariates

A class of latent marginal models for capture-recapture data with continuous covariates A class of latent marginal models for capture-recapture data with continuous covariates F Bartolucci A Forcina Università di Urbino Università di Perugia FrancescoBartolucci@uniurbit forcina@statunipgit

More information

#A50 INTEGERS 14 (2014) ON RATS SEQUENCES IN GENERAL BASES

#A50 INTEGERS 14 (2014) ON RATS SEQUENCES IN GENERAL BASES #A50 INTEGERS 14 (014) ON RATS SEQUENCES IN GENERAL BASES Johann Thiel Dept. of Mathematics, New York City College of Technology, Brooklyn, New York jthiel@citytech.cuny.edu Received: 6/11/13, Revised:

More information

1. Define the following terms (1 point each): alternative hypothesis

1. Define the following terms (1 point each): alternative hypothesis 1 1. Define the following terms (1 point each): alternative hypothesis One of three hypotheses indicating that the parameter is not zero; one states the parameter is not equal to zero, one states the parameter

More information

9 Generalized Linear Models

9 Generalized Linear Models 9 Generalized Linear Models The Generalized Linear Model (GLM) is a model which has been built to include a wide range of different models you already know, e.g. ANOVA and multiple linear regression models

More information

FinQuiz Notes

FinQuiz Notes Reading 9 A time series is any series of data that varies over time e.g. the quarterly sales for a company during the past five years or daily returns of a security. When assumptions of the regression

More information

The Mean Version One way to write the One True Regression Line is: Equation 1 - The One True Line

The Mean Version One way to write the One True Regression Line is: Equation 1 - The One True Line Chapter 27: Inferences for Regression And so, there is one more thing which might vary one more thing aout which we might want to make some inference: the slope of the least squares regression line. The

More information

Luis Manuel Santana Gallego 100 Investigation and simulation of the clock skew in modern integrated circuits. Clock Skew Model

Luis Manuel Santana Gallego 100 Investigation and simulation of the clock skew in modern integrated circuits. Clock Skew Model Luis Manuel Santana Gallego 100 Appendix 3 Clock Skew Model Xiaohong Jiang and Susumu Horiguchi [JIA-01] 1. Introduction The evolution of VLSI chips toward larger die sizes and faster clock speeds makes

More information

Expansion formula using properties of dot product (analogous to FOIL in algebra): u v 2 u v u v u u 2u v v v u 2 2u v v 2

Expansion formula using properties of dot product (analogous to FOIL in algebra): u v 2 u v u v u u 2u v v v u 2 2u v v 2 Least squares: Mathematical theory Below we provide the "vector space" formulation, and solution, of the least squares prolem. While not strictly necessary until we ring in the machinery of matrix algera,

More information

Depth versus Breadth in Convolutional Polar Codes

Depth versus Breadth in Convolutional Polar Codes Depth versus Breadth in Convolutional Polar Codes Maxime Tremlay, Benjamin Bourassa and David Poulin,2 Département de physique & Institut quantique, Université de Sherrooke, Sherrooke, Quéec, Canada JK

More information

Regression models for multivariate ordered responses via the Plackett distribution

Regression models for multivariate ordered responses via the Plackett distribution Journal of Multivariate Analysis 99 (2008) 2472 2478 www.elsevier.com/locate/jmva Regression models for multivariate ordered responses via the Plackett distribution A. Forcina a,, V. Dardanoni b a Dipartimento

More information

Genetic Algorithms applied to Problems of Forbidden Configurations

Genetic Algorithms applied to Problems of Forbidden Configurations Genetic Algorithms applied to Prolems of Foridden Configurations R.P. Anstee Miguel Raggi Department of Mathematics University of British Columia Vancouver, B.C. Canada V6T Z2 anstee@math.uc.ca mraggi@gmail.com

More information

Statistical Methods for Joint Analysis of Survival Time and Longitudinal Data

Statistical Methods for Joint Analysis of Survival Time and Longitudinal Data Statistical Methods for Joint Analysis of Survival Time and Longitudinal Data Jaeun Choi A dissertation sumitted to the faculty of the University of North Carolina at Chapel Hill in partial fulfillment

More information

,..., θ(2),..., θ(n)

,..., θ(2),..., θ(n) Likelihoods for Multivariate Binary Data Log-Linear Model We have 2 n 1 distinct probabilities, but we wish to consider formulations that allow more parsimonious descriptions as a function of covariates.

More information

8 Nominal and Ordinal Logistic Regression

8 Nominal and Ordinal Logistic Regression 8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on

More information

HIGH-DIMENSIONAL GRAPHS AND VARIABLE SELECTION WITH THE LASSO

HIGH-DIMENSIONAL GRAPHS AND VARIABLE SELECTION WITH THE LASSO The Annals of Statistics 2006, Vol. 34, No. 3, 1436 1462 DOI: 10.1214/009053606000000281 Institute of Mathematical Statistics, 2006 HIGH-DIMENSIONAL GRAPHS AND VARIABLE SELECTION WITH THE LASSO BY NICOLAI

More information

Statistics in medicine

Statistics in medicine Statistics in medicine Lecture 4: and multivariable regression Fatma Shebl, MD, MS, MPH, PhD Assistant Professor Chronic Disease Epidemiology Department Yale School of Public Health Fatma.shebl@yale.edu

More information

A732: Exercise #7 Maximum Likelihood

A732: Exercise #7 Maximum Likelihood A732: Exercise #7 Maximum Likelihood Due: 29 Novemer 2007 Analytic computation of some one-dimensional maximum likelihood estimators (a) Including the normalization, the exponential distriution function

More information

High-Throughput Sequencing Course

High-Throughput Sequencing Course High-Throughput Sequencing Course DESeq Model for RNA-Seq Biostatistics and Bioinformatics Summer 2017 Outline Review: Standard linear regression model (e.g., to model gene expression as function of an

More information

Smoothness of conditional independence models for discrete data

Smoothness of conditional independence models for discrete data Smoothness of conditional independence models for discrete data A. Forcina, Dipartimento di Economia, Finanza e Statistica, University of Perugia, Italy December 6, 2010 Abstract We investigate the family

More information

Chapter 4. Parametric Approach. 4.1 Introduction

Chapter 4. Parametric Approach. 4.1 Introduction Chapter 4 Parametric Approach 4.1 Introduction The missing data problem is already a classical problem that has not been yet solved satisfactorily. This problem includes those situations where the dependent

More information

Branching Bisimilarity with Explicit Divergence

Branching Bisimilarity with Explicit Divergence Branching Bisimilarity with Explicit Divergence Ro van Glaeek National ICT Australia, Sydney, Australia School of Computer Science and Engineering, University of New South Wales, Sydney, Australia Bas

More information

Measures of Association and Variance Estimation

Measures of Association and Variance Estimation Measures of Association and Variance Estimation Dipankar Bandyopadhyay, Ph.D. Department of Biostatistics, Virginia Commonwealth University D. Bandyopadhyay (VCU) BIOS 625: Categorical Data & GLM 1 / 35

More information

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis Review Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 Chapter 1: background Nominal, ordinal, interval data. Distributions: Poisson, binomial,

More information

Describing Contingency tables

Describing Contingency tables Today s topics: Describing Contingency tables 1. Probability structure for contingency tables (distributions, sensitivity/specificity, sampling schemes). 2. Comparing two proportions (relative risk, odds

More information

SVETLANA KATOK AND ILIE UGARCOVICI (Communicated by Jens Marklof)

SVETLANA KATOK AND ILIE UGARCOVICI (Communicated by Jens Marklof) JOURNAL OF MODERN DYNAMICS VOLUME 4, NO. 4, 010, 637 691 doi: 10.3934/jmd.010.4.637 STRUCTURE OF ATTRACTORS FOR (a, )-CONTINUED FRACTION TRANSFORMATIONS SVETLANA KATOK AND ILIE UGARCOVICI (Communicated

More information

Lecture 01: Introduction

Lecture 01: Introduction Lecture 01: Introduction Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina Lecture 01: Introduction

More information

Generalized logit models for nominal multinomial responses. Local odds ratios

Generalized logit models for nominal multinomial responses. Local odds ratios Generalized logit models for nominal multinomial responses Categorical Data Analysis, Summer 2015 1/17 Local odds ratios Y 1 2 3 4 1 π 11 π 12 π 13 π 14 π 1+ X 2 π 21 π 22 π 23 π 24 π 2+ 3 π 31 π 32 π

More information

Simple Examples. Let s look at a few simple examples of OI analysis.

Simple Examples. Let s look at a few simple examples of OI analysis. Simple Examples Let s look at a few simple examples of OI analysis. Example 1: Consider a scalar prolem. We have one oservation y which is located at the analysis point. We also have a ackground estimate

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Upper Bounds for Stern s Diatomic Sequence and Related Sequences

Upper Bounds for Stern s Diatomic Sequence and Related Sequences Upper Bounds for Stern s Diatomic Sequence and Related Sequences Colin Defant Department of Mathematics University of Florida, U.S.A. cdefant@ufl.edu Sumitted: Jun 18, 01; Accepted: Oct, 016; Pulished:

More information

CPM: A Covariance-preserving Projection Method

CPM: A Covariance-preserving Projection Method CPM: A Covariance-preserving Projection Method Jieping Ye Tao Xiong Ravi Janardan Astract Dimension reduction is critical in many areas of data mining and machine learning. In this paper, a Covariance-preserving

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

EVALUATIONS OF EXPECTED GENERALIZED ORDER STATISTICS IN VARIOUS SCALE UNITS

EVALUATIONS OF EXPECTED GENERALIZED ORDER STATISTICS IN VARIOUS SCALE UNITS APPLICATIONES MATHEMATICAE 9,3 (), pp. 85 95 Erhard Cramer (Oldenurg) Udo Kamps (Oldenurg) Tomasz Rychlik (Toruń) EVALUATIONS OF EXPECTED GENERALIZED ORDER STATISTICS IN VARIOUS SCALE UNITS Astract. We

More information

Lecture 14: Introduction to Poisson Regression

Lecture 14: Introduction to Poisson Regression Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why

More information

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week

More information

Multivariate Survival Analysis

Multivariate Survival Analysis Multivariate Survival Analysis Previously we have assumed that either (X i, δ i ) or (X i, δ i, Z i ), i = 1,..., n, are i.i.d.. This may not always be the case. Multivariate survival data can arise in

More information

You know I m not goin diss you on the internet Cause my mama taught me better than that I m a survivor (What?) I m not goin give up (What?

You know I m not goin diss you on the internet Cause my mama taught me better than that I m a survivor (What?) I m not goin give up (What? You know I m not goin diss you on the internet Cause my mama taught me better than that I m a survivor (What?) I m not goin give up (What?) I m not goin stop (What?) I m goin work harder (What?) Sir David

More information

Determinants of generalized binary band matrices

Determinants of generalized binary band matrices Determinants of generalized inary and matrices Dmitry Efimov arxiv:17005655v1 [mathra] 18 Fe 017 Department of Mathematics, Komi Science Centre UrD RAS, Syktyvkar, Russia Astract Under inary matrices we

More information

ANOVA 3/12/2012. Two reasons for using ANOVA. Type I Error and Multiple Tests. Review Independent Samples t test

ANOVA 3/12/2012. Two reasons for using ANOVA. Type I Error and Multiple Tests. Review Independent Samples t test // ANOVA Lectures - Readings: GW Review Independent Samples t test Placeo Treatment 7 7 7 Mean... Review Independent Samples t test Placeo Treatment 7 7 7 Mean.. t (). p. C. I.: p t tcrit s pt crit s t

More information

Minimizing a convex separable exponential function subject to linear equality constraint and bounded variables

Minimizing a convex separable exponential function subject to linear equality constraint and bounded variables Minimizing a convex separale exponential function suect to linear equality constraint and ounded variales Stefan M. Stefanov Department of Mathematics Neofit Rilski South-Western University 2700 Blagoevgrad

More information

Generalized Linear Models 1

Generalized Linear Models 1 Generalized Linear Models 1 STA 2101/442: Fall 2012 1 See last slide for copyright information. 1 / 24 Suggested Reading: Davison s Statistical models Exponential families of distributions Sec. 5.2 Chapter

More information

Lecture 12: Grover s Algorithm

Lecture 12: Grover s Algorithm CPSC 519/619: Quantum Computation John Watrous, University of Calgary Lecture 12: Grover s Algorithm March 7, 2006 We have completed our study of Shor s factoring algorithm. The asic technique ehind Shor

More information

A note on profile likelihood for exponential tilt mixture models

A note on profile likelihood for exponential tilt mixture models Biometrika (2009), 96, 1,pp. 229 236 C 2009 Biometrika Trust Printed in Great Britain doi: 10.1093/biomet/asn059 Advance Access publication 22 January 2009 A note on profile likelihood for exponential

More information

A NEW HYBRID TESTING PROCEDURE FOR THE LOW CYCLE FATIGUE BEHAVIOR OF STRUCTURAL ELEMENTS AND CONNECTIONS

A NEW HYBRID TESTING PROCEDURE FOR THE LOW CYCLE FATIGUE BEHAVIOR OF STRUCTURAL ELEMENTS AND CONNECTIONS SDSS Rio 010 STABILITY AND DUCTILITY OF STEEL STRUCTURES E. Batista, P. Vellasco, L. de Lima (Eds.) Rio de Janeiro, Brazil, Septemer 8-10, 010 A NEW HYBRID TESTING PROCEDURE FOR THE LOW CYCLE FATIGUE BEHAVIOR

More information

Lecture 10: Introduction to Logistic Regression

Lecture 10: Introduction to Logistic Regression Lecture 10: Introduction to Logistic Regression Ani Manichaikul amanicha@jhsph.edu 2 May 2007 Logistic Regression Regression for a response variable that follows a binomial distribution Recall the binomial

More information

Real option valuation for reserve capacity

Real option valuation for reserve capacity Real option valuation for reserve capacity MORIARTY, JM; Palczewski, J doi:10.1016/j.ejor.2016.07.003 For additional information aout this pulication click this link. http://qmro.qmul.ac.uk/xmlui/handle/123456789/13838

More information

Sparse Linear Models (10/7/13)

Sparse Linear Models (10/7/13) STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine

More information

PHY451, Spring /5

PHY451, Spring /5 PHY451, Spring 2011 Notes on Optical Pumping Procedure & Theory Procedure 1. Turn on the electronics and wait for the cell to warm up: ~ ½ hour. The oven should already e set to 50 C don t change this

More information

Goodness-of-Fit Tests for the Ordinal Response Models with Misspecified Links

Goodness-of-Fit Tests for the Ordinal Response Models with Misspecified Links Communications of the Korean Statistical Society 2009, Vol 16, No 4, 697 705 Goodness-of-Fit Tests for the Ordinal Response Models with Misspecified Links Kwang Mo Jeong a, Hyun Yung Lee 1, a a Department

More information

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014 LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R Liang (Sally) Shan Nov. 4, 2014 L Laboratory for Interdisciplinary Statistical Analysis LISA helps VT researchers

More information

A Solution to Separation for Clustered Binary Data

A Solution to Separation for Clustered Binary Data A Solution to Separation for Clustered Binary Data José Cortiñas Arahantes and Marc Aerts Interuniversity Institute for Biostatistics and Statistical Bioinformatics (I-BioStat), Center for Statistics,

More information

PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH

PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH The First Step: SAMPLE SIZE DETERMINATION THE ULTIMATE GOAL The most important, ultimate step of any of clinical research is to do draw inferences;

More information

Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Mela. P.

Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Mela. P. Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Melanie M. Wall, Bradley P. Carlin November 24, 2014 Outlines of the talk

More information

10 Lorentz Group and Special Relativity

10 Lorentz Group and Special Relativity Physics 129 Lecture 16 Caltech, 02/27/18 Reference: Jones, Groups, Representations, and Physics, Chapter 10. 10 Lorentz Group and Special Relativity Special relativity says, physics laws should look the

More information

Logistic Regression: Regression with a Binary Dependent Variable

Logistic Regression: Regression with a Binary Dependent Variable Logistic Regression: Regression with a Binary Dependent Variable LEARNING OBJECTIVES Upon completing this chapter, you should be able to do the following: State the circumstances under which logistic regression

More information

STAT331. Cox s Proportional Hazards Model

STAT331. Cox s Proportional Hazards Model STAT331 Cox s Proportional Hazards Model In this unit we introduce Cox s proportional hazards (Cox s PH) model, give a heuristic development of the partial likelihood function, and discuss adaptations

More information

Introduction to Statistical Analysis

Introduction to Statistical Analysis Introduction to Statistical Analysis Changyu Shen Richard A. and Susan F. Smith Center for Outcomes Research in Cardiology Beth Israel Deaconess Medical Center Harvard Medical School Objectives Descriptive

More information

Chapter 2: Describing Contingency Tables - I

Chapter 2: Describing Contingency Tables - I : Describing Contingency Tables - I Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu]

More information

Fixed-b Inference for Testing Structural Change in a Time Series Regression

Fixed-b Inference for Testing Structural Change in a Time Series Regression Fixed- Inference for esting Structural Change in a ime Series Regression Cheol-Keun Cho Michigan State University imothy J. Vogelsang Michigan State University August 29, 24 Astract his paper addresses

More information

Longitudinal + Reliability = Joint Modeling

Longitudinal + Reliability = Joint Modeling Longitudinal + Reliability = Joint Modeling Carles Serrat Institute of Statistics and Mathematics Applied to Building CYTED-HAROSA International Workshop November 21-22, 2013 Barcelona Mainly from Rizopoulos,

More information

Outline of GLMs. Definitions

Outline of GLMs. Definitions Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density

More information

Semiparametric Generalized Linear Models

Semiparametric Generalized Linear Models Semiparametric Generalized Linear Models North American Stata Users Group Meeting Chicago, Illinois Paul Rathouz Department of Health Studies University of Chicago prathouz@uchicago.edu Liping Gao MS Student

More information

Section IX. Introduction to Logistic Regression for binary outcomes. Poisson regression

Section IX. Introduction to Logistic Regression for binary outcomes. Poisson regression Section IX Introduction to Logistic Regression for binary outcomes Poisson regression 0 Sec 9 - Logistic regression In linear regression, we studied models where Y is a continuous variable. What about

More information

Chapter 6. Logistic Regression. 6.1 A linear model for the log odds

Chapter 6. Logistic Regression. 6.1 A linear model for the log odds Chapter 6 Logistic Regression In logistic regression, there is a categorical response variables, often coded 1=Yes and 0=No. Many important phenomena fit this framework. The patient survives the operation,

More information

FULL LIKELIHOOD INFERENCES IN THE COX MODEL

FULL LIKELIHOOD INFERENCES IN THE COX MODEL October 20, 2007 FULL LIKELIHOOD INFERENCES IN THE COX MODEL BY JIAN-JIAN REN 1 AND MAI ZHOU 2 University of Central Florida and University of Kentucky Abstract We use the empirical likelihood approach

More information

Gaussian Graphical Models and Graphical Lasso

Gaussian Graphical Models and Graphical Lasso ELE 538B: Sparsity, Structure and Inference Gaussian Graphical Models and Graphical Lasso Yuxin Chen Princeton University, Spring 2017 Multivariate Gaussians Consider a random vector x N (0, Σ) with pdf

More information

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: )

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: ) NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) Categorical Data Analysis (Semester II: 2010 2011) April/May, 2011 Time Allowed : 2 Hours Matriculation No: Seat No: Grade Table Question 1 2 3

More information

Multi-state Models: An Overview

Multi-state Models: An Overview Multi-state Models: An Overview Andrew Titman Lancaster University 14 April 2016 Overview Introduction to multi-state modelling Examples of applications Continuously observed processes Intermittently observed

More information

Differential Geometry of Surfaces

Differential Geometry of Surfaces Differential Geometry of urfaces Jordan mith and Carlo équin C Diision, UC Berkeley Introduction These are notes on differential geometry of surfaces ased on reading Greiner et al. n. d.. Differential

More information

Properties of Likelihood Inference for Order Restricted Models 1

Properties of Likelihood Inference for Order Restricted Models 1 Journal of Multivariate Analysis 72, 577 (2) Article ID jmva.999.847, availale online at http:www.idealirary.com on Properties of Likelihood Inference for Order Restricted Models Arthur Cohen, J. H. B.

More information

BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY

BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY Ingo Langner 1, Ralf Bender 2, Rebecca Lenz-Tönjes 1, Helmut Küchenhoff 2, Maria Blettner 2 1

More information

GEE for Longitudinal Data - Chapter 8

GEE for Longitudinal Data - Chapter 8 GEE for Longitudinal Data - Chapter 8 GEE: generalized estimating equations (Liang & Zeger, 1986; Zeger & Liang, 1986) extension of GLM to longitudinal data analysis using quasi-likelihood estimation method

More information

Rollout Policies for Dynamic Solutions to the Multi-Vehicle Routing Problem with Stochastic Demand and Duration Limits

Rollout Policies for Dynamic Solutions to the Multi-Vehicle Routing Problem with Stochastic Demand and Duration Limits Rollout Policies for Dynamic Solutions to the Multi-Vehicle Routing Prolem with Stochastic Demand and Duration Limits Justin C. Goodson Jeffrey W. Ohlmann Barrett W. Thomas Octoer 2, 2012 Astract We develop

More information

Biostatistics Advanced Methods in Biostatistics IV

Biostatistics Advanced Methods in Biostatistics IV Biostatistics 140.754 Advanced Methods in Biostatistics IV Jeffrey Leek Assistant Professor Department of Biostatistics jleek@jhsph.edu 1 / 35 Tip + Paper Tip Meet with seminar speakers. When you go on

More information

1 Systems of Differential Equations

1 Systems of Differential Equations March, 20 7- Systems of Differential Equations Let U e an open suset of R n, I e an open interval in R and : I R n R n e a function from I R n to R n The equation ẋ = ft, x is called a first order ordinary

More information

Hypothesis Testing, Power, Sample Size and Confidence Intervals (Part 2)

Hypothesis Testing, Power, Sample Size and Confidence Intervals (Part 2) Hypothesis Testing, Power, Sample Size and Confidence Intervals (Part 2) B.H. Robbins Scholars Series June 23, 2010 1 / 29 Outline Z-test χ 2 -test Confidence Interval Sample size and power Relative effect

More information

Generalized Linear Models Introduction

Generalized Linear Models Introduction Generalized Linear Models Introduction Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Generalized Linear Models For many problems, standard linear regression approaches don t work. Sometimes,

More information

Representation theory of SU(2), density operators, purification Michael Walter, University of Amsterdam

Representation theory of SU(2), density operators, purification Michael Walter, University of Amsterdam Symmetry and Quantum Information Feruary 6, 018 Representation theory of S(), density operators, purification Lecture 7 Michael Walter, niversity of Amsterdam Last week, we learned the asic concepts of

More information

Today. HW 1: due February 4, pm. Aspects of Design CD Chapter 2. Continue with Chapter 2 of ELM. In the News:

Today. HW 1: due February 4, pm. Aspects of Design CD Chapter 2. Continue with Chapter 2 of ELM. In the News: Today HW 1: due February 4, 11.59 pm. Aspects of Design CD Chapter 2 Continue with Chapter 2 of ELM In the News: STA 2201: Applied Statistics II January 14, 2015 1/35 Recap: data on proportions data: y

More information

STA 216: GENERALIZED LINEAR MODELS. Lecture 1. Review and Introduction. Much of statistics is based on the assumption that random

STA 216: GENERALIZED LINEAR MODELS. Lecture 1. Review and Introduction. Much of statistics is based on the assumption that random STA 216: GENERALIZED LINEAR MODELS Lecture 1. Review and Introduction Much of statistics is based on the assumption that random variables are continuous & normally distributed. Normal linear regression

More information

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1 Parametric Modelling of Over-dispersed Count Data Part III / MMath (Applied Statistics) 1 Introduction Poisson regression is the de facto approach for handling count data What happens then when Poisson

More information

Estimating a Finite Population Mean under Random Non-Response in Two Stage Cluster Sampling with Replacement

Estimating a Finite Population Mean under Random Non-Response in Two Stage Cluster Sampling with Replacement Open Journal of Statistics, 07, 7, 834-848 http://www.scirp.org/journal/ojs ISS Online: 6-798 ISS Print: 6-78X Estimating a Finite Population ean under Random on-response in Two Stage Cluster Sampling

More information

Lecture 1. Introduction Statistics Statistical Methods II. Presented January 8, 2018

Lecture 1. Introduction Statistics Statistical Methods II. Presented January 8, 2018 Introduction Statistics 211 - Statistical Methods II Presented January 8, 2018 linear models Dan Gillen Department of Statistics University of California, Irvine 1.1 Logistics and Contact Information Lectures:

More information

Marginal Screening and Post-Selection Inference

Marginal Screening and Post-Selection Inference Marginal Screening and Post-Selection Inference Ian McKeague August 13, 2017 Ian McKeague (Columbia University) Marginal Screening August 13, 2017 1 / 29 Outline 1 Background on Marginal Screening 2 2

More information

Evidence, eminence and extrapolation

Evidence, eminence and extrapolation Research Article Received 9 Decemer 204, Accepted 3 Decemer 205 Pulished online January 206 in Wiley Online Lirary wileyonlinelirary.com DOI: 0.002/sim.6865 Evidence, eminence and extrapolation Gerald

More information

Previous lecture. P-value based combination. Fixed vs random effects models. Meta vs. pooled- analysis. New random effects testing.

Previous lecture. P-value based combination. Fixed vs random effects models. Meta vs. pooled- analysis. New random effects testing. Previous lecture P-value based combination. Fixed vs random effects models. Meta vs. pooled- analysis. New random effects testing. Interaction Outline: Definition of interaction Additive versus multiplicative

More information

Frailty Modeling for clustered survival data: a simulation study

Frailty Modeling for clustered survival data: a simulation study Frailty Modeling for clustered survival data: a simulation study IAA Oslo 2015 Souad ROMDHANE LaREMFiQ - IHEC University of Sousse (Tunisia) souad_romdhane@yahoo.fr Lotfi BELKACEM LaREMFiQ - IHEC University

More information

Statistical Analysis of List Experiments

Statistical Analysis of List Experiments Statistical Analysis of List Experiments Graeme Blair Kosuke Imai Princeton University December 17, 2010 Blair and Imai (Princeton) List Experiments Political Methodology Seminar 1 / 32 Motivation Surveys

More information

Probabilistic Modelling and Reasoning: Assignment Modelling the skills of Go players. Instructor: Dr. Amos Storkey Teaching Assistant: Matt Graham

Probabilistic Modelling and Reasoning: Assignment Modelling the skills of Go players. Instructor: Dr. Amos Storkey Teaching Assistant: Matt Graham Mechanics Proailistic Modelling and Reasoning: Assignment Modelling the skills of Go players Instructor: Dr. Amos Storkey Teaching Assistant: Matt Graham Due in 16:00 Monday 7 th March 2016. Marks: This

More information

The matrix approach for abstract argumentation frameworks

The matrix approach for abstract argumentation frameworks The matrix approach for abstract argumentation frameworks Claudette CAYROL, Yuming XU IRIT Report RR- -2015-01- -FR February 2015 Abstract The matrices and the operation of dual interchange are introduced

More information

THE BALANCED DECOMPOSITION NUMBER AND VERTEX CONNECTIVITY

THE BALANCED DECOMPOSITION NUMBER AND VERTEX CONNECTIVITY THE BALANCED DECOMPOSITION NUMBER AND VERTEX CONNECTIVITY SHINYA FUJITA AND HENRY LIU Astract The alanced decomposition numer f(g) of a graph G was introduced y Fujita and Nakamigawa [Discr Appl Math,

More information

FIDUCIAL INFERENCE: AN APPROACH BASED ON BOOTSTRAP TECHNIQUES

FIDUCIAL INFERENCE: AN APPROACH BASED ON BOOTSTRAP TECHNIQUES U.P.B. Sci. Bull., Series A, Vol. 69, No. 1, 2007 ISSN 1223-7027 FIDUCIAL INFERENCE: AN APPROACH BASED ON BOOTSTRAP TECHNIQUES H.-D. HEIE 1, C-tin TÂRCOLEA 2, Adina I. TARCOLEA 3, M. DEMETRESCU 4 În prima

More information

Contingency Tables Part One 1

Contingency Tables Part One 1 Contingency Tables Part One 1 STA 312: Fall 2012 1 See last slide for copyright information. 1 / 32 Suggested Reading: Chapter 2 Read Sections 2.1-2.4 You are not responsible for Section 2.5 2 / 32 Overview

More information

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 Introduction to Generalized Univariate Models: Models for Binary Outcomes EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 EPSY 905: Intro to Generalized In This Lecture A short review

More information