A stationarity test on Markov chain models based on marginal distribution

Universiti Tunku Abdul Rahman, Kuala Lumpur, Malaysia

Mahboobeh Zangeneh Sirdari¹, M. Ataharul Islam², and Norhashidah Awang¹
¹ School of Mathematical Sciences, Universiti Sains Malaysia, USM, Pulau Pinang, zangeneh m@yahoo.com
² Department of Statistics, University of Dhaka, Dhaka 1000

Abstract. This paper proposes a stationarity test for Markov chain models. Most previous test procedures for Markov chain models are based on the conditional probabilities of the transition matrix. The likelihood ratio test and the chi-square test have been used to test stationarity, the order of a Markov chain, and goodness of fit, and they require all the parameters to be estimated. This paper instead uses the efficient score test, an extension of the Tsiatis model, to test the stationarity of a Markov chain model based on the marginal distribution obtained by Azzalini (1994). A numerical example with real-life data illustrates the suitability of the proposed method.

1 Introduction

Markov chain models are used in various applied fields such as time series analysis, longitudinal studies, real-life time data, and environmental problems. The behavior of a Markov chain depends on the transition matrix, which contains the transition probabilities. In most practical studies the transition matrix is unknown and needs to be estimated. There are several methods for estimating and testing transition probabilities. Most researchers, however, have worked on parameter estimation, and reports on test procedures are hard to find. One of the most important tests for Markov chain models is the test of stationarity of the transition probabilities, which is the subject of this paper. This section briefly summarizes test procedures for Markov chains.
Anderson and Goodman (1957) obtained maximum likelihood estimates and their asymptotic distribution for a Markov chain of arbitrary order when there are repeated observations of the chain. They considered likelihood ratio tests and $\chi^2$-tests for testing stationarity and the order of higher-order Markov chains. Billingsley (1961) used Whittle's formula, chi-square, and maximum likelihood methods to estimate and test the parameters. He considered a sample $\{x_1, x_2, \ldots, x_n\}$ from a first-order Markov process with transition probabilities $p_{ij}$ and initial probabilities $p_i$. If the $s \times s$ matrix $F = \{f_{ij}\}$ is defined as the transition count of the sequence, with row totals $f_i = \sum_j f_{ij}$, then it can be shown that

$$\sum_{ij} \frac{(f_{ij} - f_i p_{ij})^2}{f_i p_{ij}} \tag{1}$$

is asymptotically chi-square distributed with $s(s-1)$ degrees of freedom. This chi-square statistic is useful for testing whether the transition probabilities of the process have specified values $p_{ij}$. The natural next problem is to test whether the transition probabilities have a specified form $p_{ij}(t)$, where $t$ is an unknown parameter estimated from the sample. Bartlett (1951) constructed a likelihood ratio test for goodness of fit by proving that the asymptotic distribution in Markov chains is normal. For testing whether a sequence of observations is at most $r$-dependent, it is assumed that the transition probabilities are known, or at least depend on a limited number of parameters that can be estimated. If the transition probabilities are completely unknown, a different test is needed. Such a test was presented by Hoel (1954), whose derivation depended on Bartlett's results and methods.

All of these methods, and their test statistics, depend on the transition probabilities. In recent decades, research on estimation and test procedures for Markov chain models has been extended to new methods that use covariates and link functions and that consider repeated measures. For example, Muenz and Rubinstein (1985) proposed a Markov chain model based on covariates and showed how the covariates relate to changes in state. Islam and Chowdhury (2006) extended the covariate-dependent model to higher-order Markov models. Azzalini (1994) studied the influence of time-dependent covariates on the marginal distribution of a binary response and showed that the covariates relate only to the mean value of the process, independently of the association parameter. Shafiqur Rahman and Islam (2007) provided an application of Markov models based on the marginal distribution.
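The chi-square statistic in equation (1) can be computed directly from a transition count matrix. The following is a minimal sketch in plain Python, not the authors' S-Plus program: the function names `transition_counts` and `billingsley_chi2`, the toy sequence, and the hypothesized matrix `P0` are all illustrative assumptions. Cells with zero expected count are skipped, which is a convenience of this sketch rather than something the paper specifies.

```python
def transition_counts(seq, s):
    """Build the s x s transition count matrix F = {f_ij} of a state sequence.

    States are assumed to be coded 0, 1, ..., s-1 (an assumption of this sketch).
    """
    F = [[0] * s for _ in range(s)]
    for a, b in zip(seq, seq[1:]):
        F[a][b] += 1
    return F


def billingsley_chi2(F, P):
    """Statistic (1): sum over i, j of (f_ij - f_i p_ij)^2 / (f_i p_ij)."""
    stat = 0.0
    for i, row in enumerate(F):
        f_i = sum(row)  # row total f_i = sum_j f_ij
        for j, f_ij in enumerate(row):
            expected = f_i * P[i][j]
            if expected > 0:  # skip empty cells (choice made for this sketch)
                stat += (f_ij - expected) ** 2 / expected
    return stat


# Toy binary sequence and a hypothesized transition matrix (both made up).
seq = [0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0]
F = transition_counts(seq, 2)
P0 = [[0.5, 0.5], [0.5, 0.5]]
stat = billingsley_chi2(F, P0)
dof = 2 * (2 - 1)  # s(s - 1) degrees of freedom
print(F, stat, dof)
```

The statistic would then be compared with a chi-square quantile on `dof` degrees of freedom; a large value is evidence against the hypothesized transition probabilities.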
Tsiatis (1980) proposed a goodness-of-fit test for the logistic regression model with binary data, based on a modified model that relates the probability of response to a set of covariates. In this paper, the method of Tsiatis (1980) is used to test the stationarity of a binary Markov chain model based on the marginal distribution as modified by Azzalini (1994). The efficient score test is used to test the null hypothesis; it requires parameter estimates only under the null hypothesis.

2 Stationarity test

Consider a single stationary process $(y_1, \ldots, y_T)$ generated by a binary Markov chain taking values 0 and 1. The transition matrix is defined by

$$P = \begin{bmatrix} p_{00} & p_{01} \\ p_{10} & p_{11} \end{bmatrix} = \begin{bmatrix} 1 - p_{01} & p_{01} \\ 1 - p_{11} & p_{11} \end{bmatrix}$$

where $p_{ijt} = \Pr(Y_t = j \mid Y_{t-1} = i)$; $i, j = 0, 1$. The mean is denoted $\theta = E(Y_t)$ in the stationary case and $\theta_t = E(Y_t)$ in the non-stationary case. The odds ratio between successive observations is defined as

$$\psi = \frac{p_{11}/(1 - p_{11})}{p_{01}/(1 - p_{01})} = \frac{\Pr(Y_{t-1} = Y_t = 1)\,\Pr(Y_{t-1} = Y_t = 0)}{\Pr(Y_{t-1} = 0, Y_t = 1)\,\Pr(Y_{t-1} = 1, Y_t = 0)}.$$

The range of possible values for $\psi$ is independent of the value of $\theta$. The relationship between the mean and the transition probabilities is

$$\theta = \theta p_{11} + (1 - \theta) p_{01}$$

and, generalizing to the non-stationary case,

$$\theta_t = \theta_{t-1} p_{11} + (1 - \theta_{t-1}) p_{01}.$$

In this case $\theta_t = E(Y_t)$ varies with $t$ through the logit function, so that

$$\operatorname{logit}(\theta_t) = \chi_t^{\top}\beta; \qquad \theta_t = \frac{\exp(\chi_t^{\top}\beta)}{1 + \exp(\chi_t^{\top}\beta)} \tag{2}$$

where $\chi_t$ is a $p$-dimensional vector of time-dependent covariates and $\beta$ is a $p$-dimensional parameter. The transition probabilities $p_{jt} = \Pr(Y_t = 1 \mid Y_{t-1} = j)$, $j = 0, 1$, were obtained in terms of the odds ratio and the means by Azzalini (1994):

$$p_{jt} = \begin{cases} \theta_t & \text{for } \psi = 1, \\[4pt] \dfrac{\delta - 1 + (\psi - 1)(\theta_t - \theta_{t-1})}{2(\psi - 1)(1 - \theta_{t-1})} + j\,\dfrac{1 - \delta + (\psi - 1)(\theta_t + \theta_{t-1} - 2\theta_t\theta_{t-1})}{2(\psi - 1)\theta_{t-1}(1 - \theta_{t-1})} & \text{for } \psi \neq 1, \end{cases} \qquad t = 2, \ldots, T,$$

where $\delta^2 = 1 + (\psi - 1)\{(\theta_t - \theta_{t-1})^2\psi - (\theta_t + \theta_{t-1})^2 + 2(\theta_t + \theta_{t-1})\}$.

It is assumed that a sequence of observed data $y_1, \ldots, y_T$ is available for inference. The likelihood function is

$$L = \prod_t \prod_{i=0}^{1} \prod_{j=0}^{1} p_{ijt}^{\,y_{ijt}} = \prod_t (1 - p_{01t})^{1 - y_{01t}}\, p_{01t}^{\,y_{01t}}\, (1 - p_{11t})^{1 - y_{11t}}\, p_{11t}^{\,y_{11t}}.$$

Thus, the log-likelihood function is

$$\ln L = \sum_t \{(1 - y_{01t}) \ln(1 - p_{01t}) + y_{01t} \ln p_{01t} + (1 - y_{11t}) \ln(1 - p_{11t}) + y_{11t} \ln p_{11t}\} = \sum_t \Big\{ y_{01t} \ln \frac{p_{01t}}{1 - p_{01t}} + \ln(1 - p_{01t}) \Big\} + \sum_t \Big\{ y_{11t} \ln \frac{p_{11t}}{1 - p_{11t}} + \ln(1 - p_{11t}) \Big\}$$

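$${} = \sum_t \{ y_{01t} \operatorname{logit}(p_{01t}) + \ln(1 - p_{01t}) \} + \sum_t \{ y_{11t} \operatorname{logit}(p_{11t}) + \ln(1 - p_{11t}) \} = \sum_t \{ y_{01t} \operatorname{logit}(\theta_{01t}) + \ln(1 - \theta_{01t}) \} + \sum_t \{ y_{11t} \operatorname{logit}(\theta_{11t}) + \ln(1 - \theta_{11t}) \}.$$

Since the logit functions for the conditional means are $\operatorname{logit}(\theta_{01t}) = \chi_{01t}^{\top}\beta_{01}$ and $\operatorname{logit}(\theta_{11t}) = \chi_{11t}^{\top}\beta_{11}$,

$$\ln L = \sum_t \{ y_{01t}\, \chi_{01t}^{\top}\beta_{01} - \ln(1 + \exp(\chi_{01t}^{\top}\beta_{01})) \} + \sum_t \{ y_{11t}\, \chi_{11t}^{\top}\beta_{11} - \ln(1 + \exp(\chi_{11t}^{\top}\beta_{11})) \}. \tag{3}$$

The likelihood function for the marginal model, as obtained from Azzalini (1994), is

$$L = \prod_t \prod_{i=0}^{1} p_{it}^{\,y_{it}} = \prod_t (1 - p_{1t})^{1 - y_t}\, p_{1t}^{\,y_t}.$$

Then the log-likelihood function is

$$l = \ln L = \sum_t \{(1 - y_t) \ln(1 - p_{1t}) + y_t \ln p_{1t}\} = \sum_t \Big\{ y_t \ln \frac{p_{1t}}{1 - p_{1t}} + \ln(1 - p_{1t}) \Big\} = \sum_t \{ y_t \operatorname{logit}(p_{1t}) + \ln(1 - p_{1t}) \} = \sum_t \{ y_t \operatorname{logit}(\theta_t) + \ln(1 - \theta_t) \} = \sum_t \{ y_t\, \chi_t^{\top}\beta - \ln(1 + \exp(\chi_t^{\top}\beta)) \}. \tag{4}$$

The parameter estimates can be computed from the equation

$$\frac{\partial \ln L}{\partial \beta} = \sum_t \left[ \frac{\partial \ln L}{\partial p_{01t}} \frac{\partial p_{01t}}{\partial \theta_{01t}} \frac{\partial \theta_{01t}}{\partial \beta} + \frac{\partial \ln L}{\partial p_{11t}} \frac{\partial p_{11t}}{\partial \theta_{11t}} \frac{\partial \theta_{11t}}{\partial \beta} \right] = 0.$$

From equation (4),

$$\frac{\partial \ln L}{\partial \beta_q} = \sum_t \left\{ y_t \chi_{tq} - \chi_{tq} \frac{\exp(\chi_t^{\top}\beta)}{1 + \exp(\chi_t^{\top}\beta)} \right\} = 0.$$

Expanding equation (3) in the same way gives the following equations.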

$$\frac{\partial \ln L_0}{\partial \beta_{01q}} = \sum_t \left\{ y_{01t} \chi_{01tq} - \chi_{01tq} \frac{\exp(\chi_{01t}^{\top}\beta_{01})}{1 + \exp(\chi_{01t}^{\top}\beta_{01})} \right\} = 0,$$

$$\frac{\partial \ln L_1}{\partial \beta_{11q}} = \sum_t \left\{ y_{11t} \chi_{11tq} - \chi_{11tq} \frac{\exp(\chi_{11t}^{\top}\beta_{11})}{1 + \exp(\chi_{11t}^{\top}\beta_{11})} \right\} = 0,$$

where $\ln L = \ln L_0 + \ln L_1$.

To test the stationarity of the binary Markov model based on the marginal distribution, following the Tsiatis model (Tsiatis, 1980), it is assumed that the covariate space $(\chi_1, \ldots, \chi_p)$ is partitioned into $G$ distinct regions in $p$-dimensional space, denoted by $R_1, \ldots, R_G$. The indicator functions $I_t^{(k)}$ are defined by

$$I_t^{(k)} = \begin{cases} 1 & \text{if } (\chi_1, \ldots, \chi_p) \in R_k, \\ 0 & \text{otherwise,} \end{cases} \qquad k = 1, \ldots, G.$$

The model is

$$\operatorname{logit}(\theta_t) = \chi_t^{\top}\beta + \gamma^{\top} I_t; \qquad \theta_t = \frac{\exp(\chi_t^{\top}\beta + \gamma^{\top} I_t)}{1 + \exp(\chi_t^{\top}\beta + \gamma^{\top} I_t)} \tag{5}$$

where $I_t = (I_t^{(1)}, \ldots, I_t^{(G)})^{\top}$ and $\gamma = (\gamma_1, \ldots, \gamma_G)^{\top}$. The null hypothesis is $H_0: \gamma_1 = \cdots = \gamma_G = 0$, based on partitioning the space of time-dependent covariates into distinct regions. The related test statistic is a quadratic form in the observed counts minus the expected counts, which has an asymptotic chi-square distribution with $G$ degrees of freedom, as proven by Rao (1973). Either the efficient score test or the likelihood ratio test can be used to test the hypothesis. Here the efficient score test is used, and the test statistic is defined by

$$T = Z^{\top} V^{-1} Z, \tag{6}$$

where $Z$ is the $G$-dimensional vector $(\partial l/\partial \gamma_1, \ldots, \partial l/\partial \gamma_G)^{\top}$ and the matrix $V$ is

$$V = A - B C^{-1} B^{\top},$$

where

$$A_{jj'} = -\frac{\partial^2 l}{\partial \gamma_j \partial \gamma_{j'}} \quad (j, j' = 1, \ldots, G),$$

$$B_{jj'} = -\frac{\partial^2 l}{\partial \gamma_j \partial \beta_{j'}} \quad (j = 1, \ldots, G;\; j' = 0, \ldots, p),$$

$$C_{jj'} = -\frac{\partial^2 l}{\partial \beta_j \partial \beta_{j'}} \quad (j, j' = 0, \ldots, p).$$

All of the above terms are evaluated at $\gamma = 0$ and $\beta = \hat{\beta}$, where $\hat{\beta}$ is the maximum likelihood estimate of the parameters when $H_0$ is true. The log-likelihood based on model (5) is

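$$l = \ln L = \sum_t \{ y_t (\chi_t^{\top}\beta + \gamma^{\top} I_t) - \ln(1 + \exp(\chi_t^{\top}\beta + \gamma^{\top} I_t)) \},$$

where $I_t$ is the vector of indicator variables for the $t$th observation. The $j$th element of the vector $Z$ used in computing the statistic (6) is the partial derivative of $l$ with respect to $\gamma_j$ at $\gamma = 0$ and $\beta = \hat{\beta}$,

$$\sum_t \left\{ y_t I_t^{(j)} - I_t^{(j)} \frac{\exp(\chi_t^{\top}\hat{\beta})}{1 + \exp(\chi_t^{\top}\hat{\beta})} \right\} = O_j - E_j,$$

where $O_j$ and $E_j$ are the observed and expected numbers of responses in the $j$th region. The test statistic (6) is therefore a quadratic form in the vector of observed counts minus expected counts. The quantities needed to compute the covariance matrix $V$ are

$$A_{jj'} = \begin{cases} \sum_{t \in \xi_j} \hat{\theta}_t (1 - \hat{\theta}_t) & j = j', \\ 0 & j \neq j', \end{cases} \qquad j, j' = 1, \ldots, G,$$

$$B_{jj'} = \sum_{t \in \xi_j} \chi_{tj'}\, \hat{\theta}_t (1 - \hat{\theta}_t) \qquad (j = 1, \ldots, G;\; j' = 0, \ldots, p),$$

$$C_{jj'} = \sum_t \chi_{tj}\, \chi_{tj'}\, \hat{\theta}_t (1 - \hat{\theta}_t) \qquad (j, j' = 0, \ldots, p),$$

where $\xi_j$ denotes the set of indices $t$ such that $(\chi_{t1}, \ldots, \chi_{tp}) \in R_j$, and $\hat{\theta}_t = \exp(\chi_t^{\top}\hat{\beta})/(1 + \exp(\chi_t^{\top}\hat{\beta}))$.

The second derivatives of the log-likelihood function with respect to $\gamma_q$ and $\beta_q$, needed for the statistic (6) under the null hypothesis, are

$$\frac{\partial^2 \ln L}{\partial \beta_q \partial \beta_{q'}} = -\sum_t \chi_{tq} \chi_{tq'} \frac{\exp(\chi_t^{\top}\beta + \gamma^{\top} I_t)}{[1 + \exp(\chi_t^{\top}\beta + \gamma^{\top} I_t)]^2} = -\sum_t \chi_{tq} \chi_{tq'}\, \theta_t (1 - \theta_t),$$

$$\frac{\partial^2 \ln L}{\partial \gamma_q \partial \beta_{q'}} = -\sum_t I_t^{(q)} \chi_{tq'} \frac{\exp(\chi_t^{\top}\beta + \gamma^{\top} I_t)}{[1 + \exp(\chi_t^{\top}\beta + \gamma^{\top} I_t)]^2} = -\sum_t I_t^{(q)} \chi_{tq'}\, \theta_t (1 - \theta_t),$$

$$\frac{\partial^2 \ln L}{\partial \gamma_q \partial \gamma_{q'}} = -\sum_t I_t^{(q)} I_t^{(q')} \frac{\exp(\chi_t^{\top}\beta + \gamma^{\top} I_t)}{[1 + \exp(\chi_t^{\top}\beta + \gamma^{\top} I_t)]^2}$$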

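$${} = -\sum_t I_t^{(q)} I_t^{(q')}\, \theta_t (1 - \theta_t).$$

3 Extension of the model to second order Markov chains

Consider the second order Markov model for times $t-2$, $t-1$, and $t$. The related transition matrix is

$$P = \begin{bmatrix} p_{000} & p_{001} \\ p_{100} & p_{101} \\ p_{010} & p_{011} \\ p_{110} & p_{111} \end{bmatrix} = \begin{bmatrix} 1 - p_{001} & p_{001} \\ 1 - p_{101} & p_{101} \\ 1 - p_{011} & p_{011} \\ 1 - p_{111} & p_{111} \end{bmatrix}$$

where $p_{ljit} = \Pr(Y_t = i \mid Y_{t-2} = l, Y_{t-1} = j)$; $i, j, l = 0, 1$; $t = 1, \ldots, T$. The marginal mean is defined by $\theta_t = E(Y_t)$, which is

$$\begin{aligned} \theta_t = E(Y_t) = \Pr(Y_t = 1) ={} & \Pr(Y_t = 1 \mid Y_{t-2} = 0, Y_{t-1} = 0) \Pr(Y_{t-2} = 0, Y_{t-1} = 0) \\ & + \Pr(Y_t = 1 \mid Y_{t-2} = 1, Y_{t-1} = 0) \Pr(Y_{t-2} = 1, Y_{t-1} = 0) \\ & + \Pr(Y_t = 1 \mid Y_{t-2} = 0, Y_{t-1} = 1) \Pr(Y_{t-2} = 0, Y_{t-1} = 1) \\ & + \Pr(Y_t = 1 \mid Y_{t-2} = 1, Y_{t-1} = 1) \Pr(Y_{t-2} = 1, Y_{t-1} = 1) \\ ={} & \theta_{001t} + \theta_{101t} + \theta_{011t} + \theta_{111t}, \end{aligned}$$

where $\theta_{ljit} = E(Y_{ljit})$; $i, j, l = 0, 1$ are called conditional means. For an available sequence of observed data $y_1, \ldots, y_T$, the likelihood function can be written as

$$L = \prod_t \prod_{i=0}^{1} \prod_{j=0}^{1} \prod_{l=0}^{1} p_{ljit}^{\,y_{ljit}} = \prod_t \prod_{j=0}^{1} \prod_{l=0}^{1} (1 - p_{lj1t})^{1 - y_{lj1t}}\, p_{lj1t}^{\,y_{lj1t}}.$$

Thus, the log-likelihood function is

$$\ln L = \sum_t \sum_{j=0}^{1} \sum_{l=0}^{1} \{(1 - y_{lj1t}) \ln(1 - p_{lj1t}) + y_{lj1t} \ln p_{lj1t}\} = \sum_t \sum_{j=0}^{1} \sum_{l=0}^{1} \Big\{ y_{lj1t} \ln \frac{p_{lj1t}}{1 - p_{lj1t}} + \ln(1 - p_{lj1t}) \Big\} = \sum_t \sum_{j=0}^{1} \sum_{l=0}^{1} \{ y_{lj1t} \operatorname{logit}(p_{lj1t}) + \ln(1 - p_{lj1t}) \}$$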

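$${} = \sum_t \sum_{j=0}^{1} \sum_{l=0}^{1} \{ y_{lj1t} \operatorname{logit}(\theta_{lj1t}) + \ln(1 - \theta_{lj1t}) \}.$$

The logit functions for the conditional means are $\operatorname{logit}(\theta_{lj1t}) = \chi_{lj1t}^{\top}\beta_{lj1}$; $l, j = 0, 1$. Hence,

$$\begin{aligned} \ln L ={} & \sum_t \{ y_{001t}\, \chi_{001t}^{\top}\beta_{001} - \ln(1 + \exp(\chi_{001t}^{\top}\beta_{001})) \} + \sum_t \{ y_{101t}\, \chi_{101t}^{\top}\beta_{101} - \ln(1 + \exp(\chi_{101t}^{\top}\beta_{101})) \} \\ & + \sum_t \{ y_{011t}\, \chi_{011t}^{\top}\beta_{011} - \ln(1 + \exp(\chi_{011t}^{\top}\beta_{011})) \} + \sum_t \{ y_{111t}\, \chi_{111t}^{\top}\beta_{111} - \ln(1 + \exp(\chi_{111t}^{\top}\beta_{111})) \} \\ ={} & \ln L_1 + \ln L_2 + \ln L_3 + \ln L_4. \end{aligned}$$

The likelihood function for the marginal model is

$$L = \prod_t \prod_{i=0}^{1} p_{it}^{\,y_{it}} = \prod_t (1 - p_{1t})^{1 - y_t}\, p_{1t}^{\,y_t},$$

where $p_{it} = \Pr(Y_t = i)$; $i = 0, 1$. Then the log-likelihood function is

$$\ln L = \sum_t \{(1 - y_t) \ln(1 - p_{1t}) + y_t \ln p_{1t}\} = \sum_t \Big\{ y_t \ln \frac{p_{1t}}{1 - p_{1t}} + \ln(1 - p_{1t}) \Big\} = \sum_t \{ y_t \operatorname{logit}(p_{1t}) + \ln(1 - p_{1t}) \} = \sum_t \{ y_t \operatorname{logit}(\theta_t) + \ln(1 - \theta_t) \}.$$

From equation (2),

$$\ln L = \sum_t \{ y_t\, \chi_t^{\top}\beta - \ln(1 + \exp(\chi_t^{\top}\beta)) \}.$$

The parameter estimates can be computed from the equation

$$\frac{\partial \ln L}{\partial \beta} = \sum_t \left[ \frac{\partial \ln L}{\partial p_{001t}} \frac{\partial p_{001t}}{\partial \theta_{001t}} \frac{\partial \theta_{001t}}{\partial \beta} + \frac{\partial \ln L}{\partial p_{101t}} \frac{\partial p_{101t}}{\partial \theta_{101t}} \frac{\partial \theta_{101t}}{\partial \beta} + \frac{\partial \ln L}{\partial p_{011t}} \frac{\partial p_{011t}}{\partial \theta_{011t}} \frac{\partial \theta_{011t}}{\partial \beta} + \frac{\partial \ln L}{\partial p_{111t}} \frac{\partial p_{111t}}{\partial \theta_{111t}} \frac{\partial \theta_{111t}}{\partial \beta} \right] = 0.$$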

Hence,

$$\frac{\partial \ln L_1}{\partial \beta_{001q}} = \sum_t \left\{ y_{001t} \chi_{001tq} - \chi_{001tq} \frac{\exp(\chi_{001t}^{\top}\beta_{001})}{1 + \exp(\chi_{001t}^{\top}\beta_{001})} \right\} = 0,$$

$$\frac{\partial \ln L_2}{\partial \beta_{101q}} = \sum_t \left\{ y_{101t} \chi_{101tq} - \chi_{101tq} \frac{\exp(\chi_{101t}^{\top}\beta_{101})}{1 + \exp(\chi_{101t}^{\top}\beta_{101})} \right\} = 0,$$

$$\frac{\partial \ln L_3}{\partial \beta_{011q}} = \sum_t \left\{ y_{011t} \chi_{011tq} - \chi_{011tq} \frac{\exp(\chi_{011t}^{\top}\beta_{011})}{1 + \exp(\chi_{011t}^{\top}\beta_{011})} \right\} = 0,$$

$$\frac{\partial \ln L_4}{\partial \beta_{111q}} = \sum_t \left\{ y_{111t} \chi_{111tq} - \chi_{111tq} \frac{\exp(\chi_{111t}^{\top}\beta_{111})}{1 + \exp(\chi_{111t}^{\top}\beta_{111})} \right\} = 0.$$

Model (5) can be used to test stationarity of the second order Markov model with the null hypothesis $H_0: \gamma_1 = \cdots = \gamma_G = 0$, and the test statistic is $T = Z^{\top} V^{-1} Z$.

4 Example

The proposed test procedure is applied to data from the Health and Retirement Study (HRS), which concerns retirement and health among the elderly in the United States. The data were collected by the RAND Centre in 8 waves from 1992 to 2006, which provides repeated measures. Only individuals who entered the program in 1992 and were followed up until 2006 are considered. The study concerns the factors that affect depression among the elderly. Depression (0 for not depressed and 1 for depressed) is the dependent variable, and age (in years), gender (0 for male and 1 for female), body mass index (BMI), and drinking (0 for does not drink and 1 for drinks) are the covariates. The covariate space $(\chi_1, \ldots, \chi_p)$ is partitioned into 4 distinct regions: (male, does not drink), (male, drinks), (female, does not drink), and (female, drinks).

Some variables contained missing values because respondents did not answer in all waves. Individuals were dropped completely from the study if any covariate value was missing, but were kept if only the value of the dependent variable (depression) was missing. There were 668 missing covariate values across 353 IDs, i.e. these individuals had a response for the depression variable but not for the covariates, so 353 IDs were dropped from the data in this work.
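The computation behind the score statistic (6) can be sketched as follows: fit $\beta$ under $H_0$, form $Z_j = O_j - E_j$ for each region, build $A$, $B$, and $C$ from $\hat{\theta}_t(1 - \hat{\theta}_t)$, and evaluate $T = Z^{\top} V^{-1} Z$. This is a minimal standard-library Python sketch under stated assumptions, not the authors' S-Plus program. The helper names (`solve`, `fitted_means`, `fit_null_model`, `tsiatis_score_test`), the toy data, and the two-region partition are hypothetical. Regions are coded 0, ..., G-1. The toy design deliberately has no constant column: the region indicators sum to one, so a constant in $\chi_t$ would make $V$ singular, and how the paper handles that case is not stated.

```python
import math


def solve(M, b):
    """Solve M x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[piv][c]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fitted_means(X, beta):
    """theta_t = exp(x_t' beta) / (1 + exp(x_t' beta)), as in equation (2)."""
    return [1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, row))))
            for row in X]


def fit_null_model(X, y, iters=25):
    """Newton-Raphson solution of the score equations for beta under H0."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        theta = fitted_means(X, beta)
        score = [sum((yt - th) * row[q] for row, yt, th in zip(X, y, theta))
                 for q in range(p)]
        info = [[sum(row[q] * row[r] * th * (1 - th) for row, th in zip(X, theta))
                 for r in range(p)] for q in range(p)]
        beta = [b + s for b, s in zip(beta, solve(info, score))]
    return beta


def tsiatis_score_test(X, y, region, G):
    """Return (T, Z, beta_hat) with T = Z' V^{-1} Z and V = A - B C^{-1} B'."""
    p = len(X[0])
    beta = fit_null_model(X, y)
    theta = fitted_means(X, beta)
    Z = [0.0] * G                      # Z_j = O_j - E_j
    A = [[0.0] * G for _ in range(G)]  # diagonal: sum of theta(1 - theta) in region j
    B = [[0.0] * p for _ in range(G)]  # B_jq: sum of x_tq theta(1 - theta) in region j
    C = [[0.0] * p for _ in range(p)]  # C_qr: sum over all t of x_tq x_tr theta(1 - theta)
    for row, yt, th, g in zip(X, y, theta, region):
        w = th * (1 - th)
        Z[g] += yt - th
        A[g][g] += w
        for q in range(p):
            B[g][q] += w * row[q]
            for r in range(p):
                C[q][r] += w * row[q] * row[r]
    C_inv_Bt = [solve(C, B[j]) for j in range(G)]  # row j holds C^{-1} B_j'
    V = [[A[j][k] - sum(B[j][q] * C_inv_Bt[k][q] for q in range(p))
          for k in range(G)] for j in range(G)]
    T = sum(z * v for z, v in zip(Z, solve(V, Z)))
    return T, Z, beta


# Toy data (made up): one covariate; regions are the first and second half of the series.
x = [-1.5, -1.0, -0.5, 0.5, 1.0, 1.5, -1.2, -0.3, 0.3, 1.2, -0.8, 0.8]
X = [[v] for v in x]
y = [0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0]
region = [0] * 6 + [1] * 6
T, Z, beta_hat = tsiatis_score_test(X, y, region, G=2)
print(T, Z, beta_hat)
```

Only the null model is fitted, which is the practical appeal of the score test: $\gamma$ is never estimated. In an application like the one above, `region` would instead encode the four sex-by-drinking cells.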
To estimate the parameters of the model, an S-Plus program, modified from that of Chowdhury et al. (2005), was developed and used. The parameter estimates and test statistics for the first and second order Markov chain models based on conditional probabilities are shown in Tables 1 and 2. Table 3 shows the results for the marginal model. The Billingsley chi-square statistic is computed from equation (1), and the Tsiatis statistic from equation (6). The results in Table 3 show that the data satisfy the model for first and second order Markov chains based on the marginal distribution. The parameter estimates for the first and second order transitions indicate a negative association between depression and age (non-significant) and drinking (non-significant), and a positive association with BMI and sex.

5 Conclusion

In previous work, most test procedures for stationarity and for the order of higher order Markov chains have been based on the likelihood ratio test and the usual chi-square test. In this paper a stationarity test for first and second order Markov chain models was developed based on marginal probabilities, taking repeated measures into account. It extends the Tsiatis test procedure for logistic regression models to Markov chain models by using the logistic regression function. The test was also carried out using the Billingsley chi-square test. The Tsiatis test uses the efficient score test, which requires parameter estimates only under the null hypothesis. The test results support stationarity of the model. More importantly, the Tsiatis statistic is easier to compute than the Billingsley statistic, the number of estimated parameters is smaller, and the model is easier to extend. The utility of the proposed test was examined with an example using real-life data, and the results indicate the suitability of the technique. In addition, the proposed test procedure can be extended to higher orders and to testing the order of a Markov chain.

References

1. Anderson, T. W. and Goodman, L. A.: Statistical inference about Markov chains. The Annals of Mathematical Statistics 28 (1957)
2. Azzalini, A.: Logistic regression for autocorrelated data with application to repeated measures. Biometrika 81 (1994)
3. Bartlett, M. S.: The frequency goodness of fit test for probability chains. Proc. Camb. Phil. Soc. 47, 86 (1951)
4. Billingsley, P.: Statistical methods in Markov chains. The Annals of Mathematical Statistics 32 (1961)
5. Bonney, G. E.: Regressive logistic models for familial disease and other binary traits. Biometrics 42 (1986)
6. Chowdhury, R. I., Islam, M. A., Shah, M. A. and Al-Enezi, N.: A computer program to estimate the parameters of covariate dependence higher order Markov model. Computer Methods and Programs in Biomedicine 77 (2005)
7. Cox, D. R.: The Analysis of Binary Data. London: Methuen (1970)
8. Hoel, G.: A test for Markoff chains. Biometrika 41 (1954)
9. Islam, M. A. and Chowdhury, R. I.: A higher order Markov model for analyzing covariate dependence. Applied Mathematical 30 (2006)
10. Islam, M. A., Chowdhury, R. I. and Huda, S.: Markov Models with Covariate Dependence for Repeated Measures. New York: Nova Science (2008)
11. Muenz, L. R. and Rubinstein, L. V.: Markov models for covariate dependence of binary sequences. Biometrics 41 (1985)
12. Rao, C. R.: Linear Statistical Inference and its Applications. 2nd edition, New York: Wiley (1973)
13. Shafiqur Rahman, M. and Islam, M. A.: Markov structure based logistic regression for repeated measures: An application to diabetes mellitus data. Statistical Methodology 4 (2007)
14. Tsiatis, A. A.: A note on a goodness-of-fit test for the logistic regression model. Biometrika 67 (1980)

Table 1. Transition counts of Markov chain of depression data.
[Columns: Transition, 0, 1. Row groups: First order, Second order. The counts are missing from this transcription.]

Table 2. Estimates of parameters of covariate-dependent Markov models for depression data based on conditional probabilities.
[Columns: Model, $\hat{\beta}$, s.e., p-value.
First order: two blocks of rows (Constant, Age, Sex, BMI, Drink), the first headed "0 1", followed by Billingsley Chi-square (3.94E, exponent missing) and Tsiatis test.
Second order: four blocks of rows (Constant, Age, Sex, BMI, Drink), followed by Billingsley Chi-square (2.13E, exponent missing) and Tsiatis test. The only other surviving entry is "o.003" in the Constant row of the second block.
All remaining values are missing from this transcription.]

Table 3. Estimates of parameters of covariate-dependent Markov models for depression data based on marginal probabilities.
[Columns: Model, $\hat{\beta}$, s.e., p-value.
First order: rows Constant, Age, Sex, BMI, Drink, followed by Billingsley Chi-square (4.217E, exponent missing) and Tsiatis test.
Second order: rows Constant, Age, Sex, BMI, Drink, followed by Billingsley Chi-square (4.552E, exponent missing) and Tsiatis test.
All remaining values are missing from this transcription.]


More information

Sociology 362 Data Exercise 6 Logistic Regression 2

Sociology 362 Data Exercise 6 Logistic Regression 2 Sociology 362 Data Exercise 6 Logistic Regression 2 The questions below refer to the data and output beginning on the next page. Although the raw data are given there, you do not have to do any Stata runs

More information

Lecture 10: Introduction to Logistic Regression

Lecture 10: Introduction to Logistic Regression Lecture 10: Introduction to Logistic Regression Ani Manichaikul amanicha@jhsph.edu 2 May 2007 Logistic Regression Regression for a response variable that follows a binomial distribution Recall the binomial

More information

An Approximate Test for Homogeneity of Correlated Correlation Coefficients

An Approximate Test for Homogeneity of Correlated Correlation Coefficients Quality & Quantity 37: 99 110, 2003. 2003 Kluwer Academic Publishers. Printed in the Netherlands. 99 Research Note An Approximate Test for Homogeneity of Correlated Correlation Coefficients TRIVELLORE

More information

,..., θ(2),..., θ(n)

,..., θ(2),..., θ(n) Likelihoods for Multivariate Binary Data Log-Linear Model We have 2 n 1 distinct probabilities, but we wish to consider formulations that allow more parsimonious descriptions as a function of covariates.

More information

Introduction Large Sample Testing Composite Hypotheses. Hypothesis Testing. Daniel Schmierer Econ 312. March 30, 2007

Introduction Large Sample Testing Composite Hypotheses. Hypothesis Testing. Daniel Schmierer Econ 312. March 30, 2007 Hypothesis Testing Daniel Schmierer Econ 312 March 30, 2007 Basics Parameter of interest: θ Θ Structure of the test: H 0 : θ Θ 0 H 1 : θ Θ 1 for some sets Θ 0, Θ 1 Θ where Θ 0 Θ 1 = (often Θ 1 = Θ Θ 0

More information

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models SCHOOL OF MATHEMATICS AND STATISTICS Linear and Generalised Linear Models Autumn Semester 2017 18 2 hours Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION

More information

Testing Independence

Testing Independence Testing Independence Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM 1/50 Testing Independence Previously, we looked at RR = OR = 1

More information

Sample size determination for logistic regression: A simulation study

Sample size determination for logistic regression: A simulation study Sample size determination for logistic regression: A simulation study Stephen Bush School of Mathematical Sciences, University of Technology Sydney, PO Box 123 Broadway NSW 2007, Australia Abstract This

More information

Single-level Models for Binary Responses

Single-level Models for Binary Responses Single-level Models for Binary Responses Distribution of Binary Data y i response for individual i (i = 1,..., n), coded 0 or 1 Denote by r the number in the sample with y = 1 Mean and variance E(y) =

More information

CHAPTER 1: BINARY LOGIT MODEL

CHAPTER 1: BINARY LOGIT MODEL CHAPTER 1: BINARY LOGIT MODEL Prof. Alan Wan 1 / 44 Table of contents 1. Introduction 1.1 Dichotomous dependent variables 1.2 Problems with OLS 3.3.1 SAS codes and basic outputs 3.3.2 Wald test for individual

More information

Modeling and inference for an ordinal effect size measure

Modeling and inference for an ordinal effect size measure STATISTICS IN MEDICINE Statist Med 2007; 00:1 15 Modeling and inference for an ordinal effect size measure Euijung Ryu, and Alan Agresti Department of Statistics, University of Florida, Gainesville, FL

More information

One-stage dose-response meta-analysis

One-stage dose-response meta-analysis One-stage dose-response meta-analysis Nicola Orsini, Alessio Crippa Biostatistics Team Department of Public Health Sciences Karolinska Institutet http://ki.se/en/phs/biostatistics-team 2017 Nordic and

More information

Tests of independence for censored bivariate failure time data

Tests of independence for censored bivariate failure time data Tests of independence for censored bivariate failure time data Abstract Bivariate failure time data is widely used in survival analysis, for example, in twins study. This article presents a class of χ

More information

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Jae-Kwang Kim 1 Iowa State University June 26, 2013 1 Joint work with Shu Yang Introduction 1 Introduction

More information

PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH

PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH The First Step: SAMPLE SIZE DETERMINATION THE ULTIMATE GOAL The most important, ultimate step of any of clinical research is to do draw inferences;

More information

Simple logistic regression

Simple logistic regression Simple logistic regression Biometry 755 Spring 2009 Simple logistic regression p. 1/47 Model assumptions 1. The observed data are independent realizations of a binary response variable Y that follows a

More information

Logistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University

Logistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University Logistic Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Logistic Regression 1 / 38 Logistic Regression 1 Introduction

More information

A novel method for testing goodness of fit of a proportional odds model : an application to an AIDS study

A novel method for testing goodness of fit of a proportional odds model : an application to an AIDS study Goodness J.Natn.Sci.Foundation of fit testing in Sri Ordinal Lanka 2008 response 36 (2):25-35 reession models 25 RESEARCH ARTICLE A novel method for testing goodness of fit of a proportional odds model

More information

(θ θ ), θ θ = 2 L(θ ) θ θ θ θ θ (θ )= H θθ (θ ) 1 d θ (θ )

(θ θ ), θ θ = 2 L(θ ) θ θ θ θ θ (θ )= H θθ (θ ) 1 d θ (θ ) Setting RHS to be zero, 0= (θ )+ 2 L(θ ) (θ θ ), θ θ = 2 L(θ ) 1 (θ )= H θθ (θ ) 1 d θ (θ ) O =0 θ 1 θ 3 θ 2 θ Figure 1: The Newton-Raphson Algorithm where H is the Hessian matrix, d θ is the derivative

More information

Categorical Data Analysis Chapter 3

Categorical Data Analysis Chapter 3 Categorical Data Analysis Chapter 3 The actual coverage probability is usually a bit higher than the nominal level. Confidence intervals for association parameteres Consider the odds ratio in the 2x2 table,

More information

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 Lecture 2: Linear Models Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 1 Quick Review of the Major Points The general linear model can be written as y = X! + e y = vector

More information

Statistics in medicine

Statistics in medicine Statistics in medicine Lecture 4: and multivariable regression Fatma Shebl, MD, MS, MPH, PhD Assistant Professor Chronic Disease Epidemiology Department Yale School of Public Health Fatma.shebl@yale.edu

More information

Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence

Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence Sunil Kumar Dhar Center for Applied Mathematics and Statistics, Department of Mathematical Sciences, New Jersey

More information

Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems

Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems Jeremy S. Conner and Dale E. Seborg Department of Chemical Engineering University of California, Santa Barbara, CA

More information

Cohen s s Kappa and Log-linear Models

Cohen s s Kappa and Log-linear Models Cohen s s Kappa and Log-linear Models HRP 261 03/03/03 10-11 11 am 1. Cohen s Kappa Actual agreement = sum of the proportions found on the diagonals. π ii Cohen: Compare the actual agreement with the chance

More information

Statistics 135 Fall 2008 Final Exam

Statistics 135 Fall 2008 Final Exam Name: SID: Statistics 135 Fall 2008 Final Exam Show your work. The number of points each question is worth is shown at the beginning of the question. There are 10 problems. 1. [2] The normal equations

More information

1/15. Over or under dispersion Problem

1/15. Over or under dispersion Problem 1/15 Over or under dispersion Problem 2/15 Example 1: dogs and owners data set In the dogs and owners example, we had some concerns about the dependence among the measurements from each individual. Let

More information

Figure 36: Respiratory infection versus time for the first 49 children.

Figure 36: Respiratory infection versus time for the first 49 children. y BINARY DATA MODELS We devote an entire chapter to binary data since such data are challenging, both in terms of modeling the dependence, and parameter interpretation. We again consider mixed effects

More information

FULL LIKELIHOOD INFERENCES IN THE COX MODEL

FULL LIKELIHOOD INFERENCES IN THE COX MODEL October 20, 2007 FULL LIKELIHOOD INFERENCES IN THE COX MODEL BY JIAN-JIAN REN 1 AND MAI ZHOU 2 University of Central Florida and University of Kentucky Abstract We use the empirical likelihood approach

More information

BMI 541/699 Lecture 22

BMI 541/699 Lecture 22 BMI 541/699 Lecture 22 Where we are: 1. Introduction and Experimental Design 2. Exploratory Data Analysis 3. Probability 4. T-based methods for continous variables 5. Power and sample size for t-based

More information

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS Duration - 3 hours Aids Allowed: Calculator LAST NAME: FIRST NAME: STUDENT NUMBER: There are 27 pages

More information

(x t. x t +1. TIME SERIES (Chapter 8 of Wilks)

(x t. x t +1. TIME SERIES (Chapter 8 of Wilks) 45 TIME SERIES (Chapter 8 of Wilks) In meteorology, the order of a time series matters! We will assume stationarity of the statistics of the time series. If there is non-stationarity (e.g., there is a

More information

STAT331. Cox s Proportional Hazards Model

STAT331. Cox s Proportional Hazards Model STAT331 Cox s Proportional Hazards Model In this unit we introduce Cox s proportional hazards (Cox s PH) model, give a heuristic development of the partial likelihood function, and discuss adaptations

More information

IP WEIGHTING AND MARGINAL STRUCTURAL MODELS (CHAPTER 12) BIOS IPW and MSM

IP WEIGHTING AND MARGINAL STRUCTURAL MODELS (CHAPTER 12) BIOS IPW and MSM IP WEIGHTING AND MARGINAL STRUCTURAL MODELS (CHAPTER 12) BIOS 776 1 12 IPW and MSM IP weighting and marginal structural models ( 12) Outline 12.1 The causal question 12.2 Estimating IP weights via modeling

More information

Stat 135 Fall 2013 FINAL EXAM December 18, 2013

Stat 135 Fall 2013 FINAL EXAM December 18, 2013 Stat 135 Fall 2013 FINAL EXAM December 18, 2013 Name: Person on right SID: Person on left There will be one, double sided, handwritten, 8.5in x 11in page of notes allowed during the exam. The exam is closed

More information

The Logit Model: Estimation, Testing and Interpretation

The Logit Model: Estimation, Testing and Interpretation The Logit Model: Estimation, Testing and Interpretation Herman J. Bierens October 25, 2008 1 Introduction to maximum likelihood estimation 1.1 The likelihood function Consider a random sample Y 1,...,

More information

STAT 7030: Categorical Data Analysis

STAT 7030: Categorical Data Analysis STAT 7030: Categorical Data Analysis 5. Logistic Regression Peng Zeng Department of Mathematics and Statistics Auburn University Fall 2012 Peng Zeng (Auburn University) STAT 7030 Lecture Notes Fall 2012

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Lecture 3. Hypothesis testing. Goodness of Fit. Model diagnostics GLM (Spring, 2018) Lecture 3 1 / 34 Models Let M(X r ) be a model with design matrix X r (with r columns) r n

More information

REVISED PAGE PROOFS. Logistic Regression. Basic Ideas. Fundamental Data Analysis. bsa350

REVISED PAGE PROOFS. Logistic Regression. Basic Ideas. Fundamental Data Analysis. bsa350 bsa347 Logistic Regression Logistic regression is a method for predicting the outcomes of either-or trials. Either-or trials occur frequently in research. A person responds appropriately to a drug or does

More information

Discussion of Missing Data Methods in Longitudinal Studies: A Review by Ibrahim and Molenberghs

Discussion of Missing Data Methods in Longitudinal Studies: A Review by Ibrahim and Molenberghs Discussion of Missing Data Methods in Longitudinal Studies: A Review by Ibrahim and Molenberghs Michael J. Daniels and Chenguang Wang Jan. 18, 2009 First, we would like to thank Joe and Geert for a carefully

More information

BIOSTATS Intermediate Biostatistics Spring 2017 Exam 2 (Units 3, 4 & 5) Practice Problems SOLUTIONS

BIOSTATS Intermediate Biostatistics Spring 2017 Exam 2 (Units 3, 4 & 5) Practice Problems SOLUTIONS BIOSTATS 640 - Intermediate Biostatistics Spring 2017 Exam 2 (Units 3, 4 & 5) Practice Problems SOLUTIONS Practice Question 1 Both the Binomial and Poisson distributions have been used to model the quantal

More information

LOGISTICS REGRESSION FOR SAMPLE SURVEYS

LOGISTICS REGRESSION FOR SAMPLE SURVEYS 4 LOGISTICS REGRESSION FOR SAMPLE SURVEYS Hukum Chandra Indian Agricultural Statistics Research Institute, New Delhi-002 4. INTRODUCTION Researchers use sample survey methodology to obtain information

More information

An Introduction to Multivariate Statistical Analysis

An Introduction to Multivariate Statistical Analysis An Introduction to Multivariate Statistical Analysis Third Edition T. W. ANDERSON Stanford University Department of Statistics Stanford, CA WILEY- INTERSCIENCE A JOHN WILEY & SONS, INC., PUBLICATION Contents

More information

Linear Regression Models P8111

Linear Regression Models P8111 Linear Regression Models P8111 Lecture 25 Jeff Goldsmith April 26, 2016 1 of 37 Today s Lecture Logistic regression / GLMs Model framework Interpretation Estimation 2 of 37 Linear regression Course started

More information

Binomial Model. Lecture 10: Introduction to Logistic Regression. Logistic Regression. Binomial Distribution. n independent trials

Binomial Model. Lecture 10: Introduction to Logistic Regression. Logistic Regression. Binomial Distribution. n independent trials Lecture : Introduction to Logistic Regression Ani Manichaikul amanicha@jhsph.edu 2 May 27 Binomial Model n independent trials (e.g., coin tosses) p = probability of success on each trial (e.g., p =! =

More information

1 Interaction models: Assignment 3

1 Interaction models: Assignment 3 1 Interaction models: Assignment 3 Please answer the following questions in print and deliver it in room 2B13 or send it by e-mail to rooijm@fsw.leidenuniv.nl, no later than Tuesday, May 29 before 14:00.

More information

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture!

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models Introduction to generalized models Models for binary outcomes Interpreting parameter

More information

Introduction to the Logistic Regression Model

Introduction to the Logistic Regression Model CHAPTER 1 Introduction to the Logistic Regression Model 1.1 INTRODUCTION Regression methods have become an integral component of any data analysis concerned with describing the relationship between a response

More information

Estimating Explained Variation of a Latent Scale Dependent Variable Underlying a Binary Indicator of Event Occurrence

Estimating Explained Variation of a Latent Scale Dependent Variable Underlying a Binary Indicator of Event Occurrence International Journal of Statistics and Probability; Vol. 4, No. 1; 2015 ISSN 1927-7032 E-ISSN 1927-7040 Published by Canadian Center of Science and Education Estimating Explained Variation of a Latent

More information

Power and sample size calculations for designing rare variant sequencing association studies.

Power and sample size calculations for designing rare variant sequencing association studies. Power and sample size calculations for designing rare variant sequencing association studies. Seunggeun Lee 1, Michael C. Wu 2, Tianxi Cai 1, Yun Li 2,3, Michael Boehnke 4 and Xihong Lin 1 1 Department

More information

Multilevel Statistical Models: 3 rd edition, 2003 Contents

Multilevel Statistical Models: 3 rd edition, 2003 Contents Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction

More information

Linear Models and Estimation by Least Squares

Linear Models and Estimation by Least Squares Linear Models and Estimation by Least Squares Jin-Lung Lin 1 Introduction Causal relation investigation lies in the heart of economics. Effect (Dependent variable) cause (Independent variable) Example:

More information

ST3241 Categorical Data Analysis I Two-way Contingency Tables. 2 2 Tables, Relative Risks and Odds Ratios

ST3241 Categorical Data Analysis I Two-way Contingency Tables. 2 2 Tables, Relative Risks and Odds Ratios ST3241 Categorical Data Analysis I Two-way Contingency Tables 2 2 Tables, Relative Risks and Odds Ratios 1 What Is A Contingency Table (p.16) Suppose X and Y are two categorical variables X has I categories

More information

Comparison between conditional and marginal maximum likelihood for a class of item response models

Comparison between conditional and marginal maximum likelihood for a class of item response models (1/24) Comparison between conditional and marginal maximum likelihood for a class of item response models Francesco Bartolucci, University of Perugia (IT) Silvia Bacci, University of Perugia (IT) Claudia

More information

Exercise 7.4 [16 points]

Exercise 7.4 [16 points] STATISTICS 226, Winter 1997, Homework 5 1 Exercise 7.4 [16 points] a. [3 points] (A: Age, G: Gestation, I: Infant Survival, S: Smoking.) Model G 2 d.f. (AGIS).008 0 0 (AGI, AIS, AGS, GIS).367 1 (AG, AI,

More information

GEE for Longitudinal Data - Chapter 8

GEE for Longitudinal Data - Chapter 8 GEE for Longitudinal Data - Chapter 8 GEE: generalized estimating equations (Liang & Zeger, 1986; Zeger & Liang, 1986) extension of GLM to longitudinal data analysis using quasi-likelihood estimation method

More information

Statistical Analysis of List Experiments

Statistical Analysis of List Experiments Statistical Analysis of List Experiments Graeme Blair Kosuke Imai Princeton University December 17, 2010 Blair and Imai (Princeton) List Experiments Political Methodology Seminar 1 / 32 Motivation Surveys

More information

4 Multicategory Logistic Regression

4 Multicategory Logistic Regression 4 Multicategory Logistic Regression 4.1 Baseline Model for nominal response Response variable Y has J > 2 categories, i = 1,, J π 1,..., π J are the probabilities that observations fall into the categories

More information

Binary Logistic Regression

Binary Logistic Regression The coefficients of the multiple regression model are estimated using sample data with k independent variables Estimated (or predicted) value of Y Estimated intercept Estimated slope coefficients Ŷ = b

More information

Likelihood Inference in the Presence of Nuisance Parameters

Likelihood Inference in the Presence of Nuisance Parameters PHYSTAT2003, SLAC, September 8-11, 2003 1 Likelihood Inference in the Presence of Nuance Parameters N. Reid, D.A.S. Fraser Department of Stattics, University of Toronto, Toronto Canada M5S 3G3 We describe

More information

Power and Sample Size Calculations with the Additive Hazards Model

Power and Sample Size Calculations with the Additive Hazards Model Journal of Data Science 10(2012), 143-155 Power and Sample Size Calculations with the Additive Hazards Model Ling Chen, Chengjie Xiong, J. Philip Miller and Feng Gao Washington University School of Medicine

More information

Chapter 4: Constrained estimators and tests in the multiple linear regression model (Part III)

Chapter 4: Constrained estimators and tests in the multiple linear regression model (Part III) Chapter 4: Constrained estimators and tests in the multiple linear regression model (Part III) Florian Pelgrin HEC September-December 2010 Florian Pelgrin (HEC) Constrained estimators September-December

More information

Analysis of variance, multivariate (MANOVA)

Analysis of variance, multivariate (MANOVA) Analysis of variance, multivariate (MANOVA) Abstract: A designed experiment is set up in which the system studied is under the control of an investigator. The individuals, the treatments, the variables

More information