Negative Multinomial Model and Cancer. Incidence
|
|
- Harry Harris
- 6 years ago
- Views:
Transcription
1 Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence S. Lahiri & Sunil K. Dhar Department of Mathematical Sciences, CAMS New Jersey Institute of Technology, Newar, NJ-07111, USA SUMMARY The generalized linear model for a multi-way contingency table for several independent populations that follow the extended negative multinomial distributions is introduced. This model represents an extension of negative multinomial log-linear model. The parameters of the new model are estimated bythe quasi-lielihood method and the corresponding score function Corresponding author 1
2 which gives a close from estimate of the regression parameters. The goodnessof-fit test for the model is also discussed. An application of the log-linear model under the generalized inverse sampling scheme representing cancer incidence data is given as an example of this model. Keywords: generalized inverse sampling, extended negative multinomial distribution, quasi-lielihood, asymptotic distribution. 1 INTRODUCTION A generalized linear model (GLIM) for functions of the extended negative multinomial frequency counts for s independent sub populations is defined along the lines of Grizzle, Starmer and Koch (1969), Bonett (1985a) and Bonett(1985b). The extended negative multinomial model (ENMn) is defined in Dhar (1995) and the extended negative multinomial log-linear model is defined in Lahiri and Dhar (2008). Let (f ri,, f 2i, f 1i, f 1i, f 2i,, f ni ), be s sets of count observations from s subpopulations labeled i = 1, 2,, s, and let j = r,, 1, 1,, n be the set of response categories as summarized in Table 1 below. Here, denotes transpose of a matrix. To define the model the following notations are needed. Denote the vector f i = (f ri,, f 2i, f 1i ), f i = (f 1i, f 2i,, f ni ) and their corresponding expected valued vectors µ i = (µ ri,, µ 1i ), µ i = (µ 1i,, µ ni ), respectively. The vector f (i) = (f i, f i) is assumed to follow an extended negative multinomial distribution with parameters, i = r j=1 f ji and µ (i) = (µ i, µ i ) = i ( r j=1 p ji) 1 p i, where p i = (p ri,, p 1i, p 1i,, p ni ) is from Lahiri and Dhar (2008, Section 2). Let f be the augmented vector de- 2
3 fined by f = (f (1), f (2),, f (s) ) with expected value µ = (µ (1), µ (2),, µ (s) ). The generalized linear model for count data is defined by y = Xβ + δ (1.1) where y = (y 1, y 2,, y s ) and y i = g(f ri ),, g(f 2i ), g(f 1i ), g(f 1i ), g(f 2i ),, g(f ni ), i = 1, 2,, s, and g is function from R to R, which has an inverse. Let X be a nown full ran q and of dimensions (r + n)s q (q [r + n]s) design matrix with intercept, main effects, and interaction effects, β is a q 1 vector of unnown non-random parameters, and δ is a (r + n)s 1 unobservable error vector such that y 0 = g( 1 ), g( 2 ),, g( s ), y 0 Aβ = 0 and y Xβ is asymptotically zero, as. Here, = s i=1 i, the fractions of the total sample, namely, i = w i, f 0 = ( 1, 2,, s ), and A = (a 1, a 2,, a s ), s q, are nown constants, with linearly independent constraints y 0 Aβ = 0. The data consists of observing the vector f along with design matrix X as shown in the r c contingency Table 1. This model is nown to be the generalized linear model under the extended inverse sampling scheme or the extended negative multinomial distribution, with g in (1.1) as the lin function. This model is specifically nown to be linear or the log linear model according to g being linear or the log function. A closed-form estimator of the model parameters, estimate of the covariance matrix, and the general Wald test are derived under the assumption of extended negative multinomial sampling. The commonly used GLIM, does not accommodate contingency tables of inverse sampling schemes as has also been observed by Bonett (1985a). 3
4 Observations from a subpopulation are sampled following generalized inverse sampling and are described below for the sae of completeness. The resulting observed frequency counts can be summarized in the following s (r + n) contingency table. 1.1 Definition of ENMn Consider a sequence of independent trials, where one of the events A i occurs n with probability p i, i { r,, 1, 1,, n}, p i = 1. Suppose that A r, A (r 1),, A 1 are the rare events. i= r,i 0 Let f i represent the frequency with which A i, i { r,, 1} occurs. Here, f i s are counted until we get a total of (predetermined value) observations of at least one of the A i s, i { r,, 1}. Then the distribution of f = (f r,, f 1, f 1,, f n ) is said to follow an extended negative multinomial distribution with parameters and p = (p r,, p 1, p 1,, p n ) with the joint probability density function given as n ( f i + 1)! i=1 n f i! ( 1)! i=1 ( r i=1 p i) p f 1 1 p fn n! r f i! i=1 (p 1) f 1 (p r) f r, f i and f i 0, i { r,, 1, 1,, n}, where p i = n i= r,i 0 p i r, p i i=1 i = 1,...r, p i = 1 and = r i=1 f i. An observation that fall in cell (j,i) has the probability p ji. 4
5 Table 1: Frequency Distribution Categories of responses Population r n 1 f r1... f 21 f 11 f 11 f f n1 2 f r2... f 22 f 12 f 12 f f n s f rs... f 2s f 1s f 1s f 2s... f ns In order to define the covariance matrix of f, the following notations are introduced. Let p p ji ji = r j=1 p and D µi be a diagonal matrix with ji the elements of the vector µ i, n 1, along the main diagonal. Then the covariance matrix of f is given by the (r + n)s (r + n)s bloc diagonal matrix Σ f of ran (r + n)s with Σ (i) f ( i) = Σ (i) 1 ( i ) 0, i = 1, 2,, s (1.2) 0 Σ (i) 2 ( i ) as the blocs, where 0 is r n zero matrix and where Σ (i) 1 ( i ) = i p 1i(1 p 1i) i p 1ip 2i i p 1ip ri i p 1ip 2i i p 2i(1 p 2i) i p 2ip ri,(1.3) i p 1ip ri i p 2ip ri i p ri(1 p ri) 5
6 and Σ (i) 2 ( i ) = (µ iµ i) i + D µi. (1.4) 2 Estimation The quasi-lielihood estimator of β (Myers et al. 2002, Section 5.4) with constraints has been given in this section. An efficient estimator of β, under the constraint Aβ = y 0, minimizes X 2 = (y Xβ) Ω(y Xβ) λ (Aβ y 0 ), (2.5) where Ω 1 = [ y / f] Σ f [ y / f], Σ f is the estimate of the bloc diagonal matrix Σ f with blocs given by the matrix in (1.2), with i = 1, i = 1, 2,..., s times w i, where w i = i nown, λ is an s 1 vector of Lagrange multipliers, y 0 = g( 1 ), g( 2 ),, g( s ). The maximum lielihood estimator of the covariance matrix, Σ f, is obtained by replacing the elements of p and µ (which is a function of p), by their corresponding sample proportions based on f (Dhar 1995). Please see the expression for the true parameter Ω 1 in equation (3.7) in the following section. Minimizing (2.5) with respect to β and using multivariate calculus, the quasi-lielihood estimator of β with constraints is computed and has the following closed form: β = β (X ΩX) 1 A [A(X ΩX) 1 A ] 1 (A β y 0 ), (2.6) where β = (X ΩX) 1 X Ωy, which is shown more generally by Ferguson (1958). The estimated covariance matrix of β is = (X Σ β ΩX) 1 (X ΩX) 1 A [A(X ΩX) 1 A ] 1 A(X ΩX) 1. The cell frequencies that follow the extended negative multinomial constraints are predicted by ŷ = X β. 6
7 3 Hypothesis Testing In this section, the test of the general linear hypothesis H 0 : Hβ = h versus its negation is derived, where H is a (n + r) q nown matrix of ran (n + r) and h is a nown (n + r) 1 vector. The Wald s statistics W = (H β h) (H(X ΩX) 1 )H ) 1 (H β h)/, where = s i=1 i is used to test these hypotheses. The asymptotic distribution of W is now derived. Theorem 1: Under the true model as described by (1.1), the assumptions g is a differentiable function from R to R, and sup E y Xβ 1+η < for some η > 0, where. represents the Euclidean norm, then W d χ 2 (n+r) as. Proof: Define, B (m) ji = 1, if m th unit drawn at random from i th population belongs to j th category, 0, otherwise, for m = 1,, i ; j = 1,, r; i = 1,, s, and D (m) ji = 1, if m th unit drawn at random from i th population belongs to j th category, 0, otherwise, for m = 1,, i ; j = 1,, n; i = 1,, s. 7
8 Then f (i) = (f i, f i ) = (f 1i, f 2i,, f ri, f 1i, f 2i,, f ni ) i = ( = = i = 1,, s. i B (m) 1i, m=1 m=1 i (B (m) 1i, B(m) m=1 i (C (m) m=1 i ), i B (m) 2i,, 2i,, B(m) m=1 ri, D(m) 1i i B (m) ri, m=1,, D (m) D (m) 1i,, ni ) i m=1 D (m) ni ) Therefore, C (m) i, j = r,, 1, 1,, n, follows an extended negative multinomial distribution with parameters (1, µ (i) / i ), where µ (i) / i = ( r j=1 p ji) 1 p i. Now, C (m) i are iid random vectors with mean µ (i) / i and covariance matrix Σ (i) f (1), for m = 1,, i. The central limit theorem (CLT) yields, ( 1 i ( ) (m) i i m=1 C i µ (i) / i ) d N ( 0, Σ (i) f (1)) as i, i = 1, 2,, s. This implies that 1 i (f (i) µ (i) ) d N ( 0, Σ (i) f (1)), as i, hence 1 (f µ) d N ( 0, Σ f), as, where i = w i is a nown constant and Σ f is the bloc diagonal matrix with blocs w i Σ (i) f (1), i = 1, 2,, s, along the diagonal of ran s(n + r). Therefore, 1 (y µ ) d N ( ) 0, Dµ(µ )Σ fdµ (µ ) (3.7) as, where µ is equal to g(µ), i.e., g applied to each component of µ and Dµ(µ ) is the differential of µ with respect to µ (Theorem A, Serfling, 2002, p. 122). Here, Ω 1 = Dµ(µ )Σ fdµ (µ ). Now consider β = (X ΩX) 1 X Ωy, as described in Section 2, which converges as follows: 1 ( β β) d N(0, (X ΩX) 1 ) as (Theorem A, Serfling, 2002, p.122). 8
9 Similarly, as, 1 (H β Hβ) d N(0, H(X ΩX) 1 H ), where H is a (n + r) q nown matrix with ran (n + r). Therefore, from the fact that x H(X ΩX) 1 H )x is a continuous function of x gives W = (H β h) (H(X ΩX) 1 )H ) 1 (H β h) d χ 2 (n+r) (Rao, 1973, p. 188). Hence the proof. To evaluate the goodness-of-fit of this model with linearly independent constraints, one can consider the statistics (y X β ) Ω(y X β )/. Similar to Bonnett (1989 and 1985a) and Haber et al. (1986) one can show that this statistic converges in distribution to the chi-square distribution with s(n + r) q degrees of freedom. This statement holds true under conditions of the aforementioned Theorem 1 and is similar to its proof as well as Bonnett (1985a, Section 4, page 128) or Bonnett (1989), which uses Haber et al. (1986). 4 An Example This section illustrates an example of an extended negative multinomial model to a given data set from s = 3 subpopulations. The application involves modeling of the incidence of several related diseases in different cities. There is no maximum lielihood estimate for the shape parameter in general. An estimate based on quantiles of Pearson s chi-squared statistics is applied to model the cancer incidence for three cities in Ohio. This approach is similar to those proposed by Williams (1982), Breslow (1984), and more recently, in Waller and Zelterman (1997) for estimating over-dispersion parameters. 9
10 4.1 Model of Cancer Incidence for Three Cities Suppose that the following table gives the cancer incidence for the three largest cities according to the site of primary cancer during a particular year. It is assumed that the overall disease incidence may be higher (lower) in one location than the other, but this increase (decrease) is not disease-specific that is, the relative frequencies of disease do not change across cities. Waller and Zelterman (1997) used the log-linear model with the negative multinomial distribution to fit the cancer deaths, during 1989, in the three largest Ohio cities. The structure of the data used in this example is similar to the one described in Waller and Zelterman (1997), but hypothetical numbers have been used to illustrate the proposed model. The sites of primary tumors are as follows: 1 = eye, 2 = oral cavity, 3 = gallbladder, 4 = lung, 5 = breast, 6 = genitals, 7 = urinary organs, 8 = leuemia and 9 = lymphatic tissues. Table 2: Cancer Deaths in the Three Different Cities City City City City Our objective is to fit the data with an appropriate model. Poisson models are appropriate when the sample mean and the sample variance are equal. Multinomial models can be used when the cell counts are negatively correlated and the negative multinomial models are used when the cell counts are 10
11 positively correlated. It can be observed from the above data that some of the frequency counts are positively correlated, while others are negatively correlated. In this situation, the extended negative multinomial model is expected to be the most appropriate one to fit the data due to its covariance structure (equations 1.2 to 1.4). Consider the log-linear model of means with no interaction between city and disease type as ln µ js = µ + α s city + β j disease, where α city s, s = 1, 2, 3, is the effect due to city and β disease j, j = 1, 2,, 9, is the effect due to disease. Note that each of these effects add up to zero. No interaction between city and disease type implies that an increase (decrease) in incidence of one disease is accompanied by a similar increase (decrease) in incidence of the same disease across all the other cities. It has been assumed that the state has a large amount of manufacturing and industry. So, if there is an environmental cause for high incidence of one type of cancer, this may translate into high incidence of another type of cancer. To describe the design matrix X for the above model let 0 be the 8 1 zero column vector, 1 represent the 8 1 vector with all one s, and I represent the 8 8 identity 11
12 matrix. Then, X is given by I I I Let f js denote the number of cancer deaths of site s in city j and use the extended negative multinomial distribution, since many of the counts have both positive and negative covariances. The test statistic χ 2 s( s ) will have the following form: χ 2 s( s ) = j (f js µ js ) 2 µ js, (4.8) where s is the shape parameter of the extended negative multinomial distribution. The estimated means µ js are the same as described in Section 2. The above test statistic not only measures fit, but also suggests a method of estimating s. In each cities case, solve for s, such that the median of the limiting chi-squared distribution in (4.8) with 8 d.f. is This method is similar to the one proposed by Waller and Zelterman, which is used to find values of shape parameter. Using these values of s, the model parameters are computed by maximizing the log-lielihood with penalized constraints using Newton-Raphson algorithm iteratively and by R program. By improving the estimate of s, the model will be improved. 12
13 Acnowledgments Professor Hira Lal Koul is my statistics guru. I wish to than him for all the nowledge he has given me. Bibliography [1] Bonett,D.G. (1985a). A linear negative multinomial model. Statist.& Probability Letters, 3: [2] Bonett, D.G. (1985b). The negative multinomial logit model. Commun. Statist.-Theory Methods, 3: [3] Bonett, D.G. (1989). Pearson chi-square estimator and test for log-linear models with expected frequencies subject to linear constraints. Statist.& Probability Letters, 8: [4] Breslow, N.E. (1984). Extra-Poisson variation in log-linear models. Applied Statistics, 33: [5] Dhar, S.K. (1995). Extension of a negative multinomial model. Commun. Statist.-Theory Methods, 24(1): [6] Ferguson, T.S. (1958). A method of generating best asymptotically normal estimates with application to the estimating of bacterial densities. Ann. Math. Stat., 33: [7] Grizzle, J.E., Starmer, C.F. and Koch, G.G.(1969). Analysis of categorical variables by linear models. Biometrics, 25: [8] Haber, M. and Brown, M.B.(1986). Maximum lielihood methods for log-linear models when expected frequencies are subject to linear constraints. J. Amer. Statist. Assoc., 81:
14 [9] Serfling, R.J. (2002). Approximation theorems of mathematical statistics. New Yor: Wiley. [10] Lahiri, S. and Dhar, S.K. (2008). Log-linear Modeling under Generalized Inverse Sampling Scheme. Communications in Statistics - Theory and Methods, 37(8): [11] Myers, R.H., Montgomery, D.C., Vining, G.G. and Robinson, T.J. (2002). Generalized linear models with application in engineering and the sciences, second edition New Yor: Wiley. [12] Rao, C.R. (1973). Linear statistical inference and its applications. second edition New Yor: Wiley. [13] Williams, D.A. (1982). Extra-binomial variation in logistic linear models. Applied Statistics, 31: [14] Waller, L.A. and Zelterman, D. (1997). Log-linear modeling with the negative multinomial distribution. Biometrics, 53:
Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence
Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence Sunil Kumar Dhar Center for Applied Mathematics and Statistics, Department of Mathematical Sciences, New Jersey
More informationDecomposition of Parsimonious Independence Model Using Pearson, Kendall and Spearman s Correlations for Two-Way Contingency Tables
International Journal of Statistics and Probability; Vol. 7 No. 3; May 208 ISSN 927-7032 E-ISSN 927-7040 Published by Canadian Center of Science and Education Decomposition of Parsimonious Independence
More informationA Generalized Linear Model for Binomial Response Data. Copyright c 2017 Dan Nettleton (Iowa State University) Statistics / 46
A Generalized Linear Model for Binomial Response Data Copyright c 2017 Dan Nettleton (Iowa State University) Statistics 510 1 / 46 Now suppose that instead of a Bernoulli response, we have a binomial response
More informationStatistics 3858 : Contingency Tables
Statistics 3858 : Contingency Tables 1 Introduction Before proceeding with this topic the student should review generalized likelihood ratios ΛX) for multinomial distributions, its relation to Pearson
More informationGeneralized Linear Models I
Statistics 203: Introduction to Regression and Analysis of Variance Generalized Linear Models I Jonathan Taylor - p. 1/16 Today s class Poisson regression. Residuals for diagnostics. Exponential families.
More informationModel Estimation Example
Ronald H. Heck 1 EDEP 606: Multivariate Methods (S2013) April 7, 2013 Model Estimation Example As we have moved through the course this semester, we have encountered the concept of model estimation. Discussions
More informationChapter 22: Log-linear regression for Poisson counts
Chapter 22: Log-linear regression for Poisson counts Exposure to ionizing radiation is recognized as a cancer risk. In the United States, EPA sets guidelines specifying upper limits on the amount of exposure
More informationParametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1
Parametric Modelling of Over-dispersed Count Data Part III / MMath (Applied Statistics) 1 Introduction Poisson regression is the de facto approach for handling count data What happens then when Poisson
More informationAsymptotic Statistics-III. Changliang Zou
Asymptotic Statistics-III Changliang Zou The multivariate central limit theorem Theorem (Multivariate CLT for iid case) Let X i be iid random p-vectors with mean µ and and covariance matrix Σ. Then n (
More informationStat 642, Lecture notes for 04/12/05 96
Stat 642, Lecture notes for 04/12/05 96 Hosmer-Lemeshow Statistic The Hosmer-Lemeshow Statistic is another measure of lack of fit. Hosmer and Lemeshow recommend partitioning the observations into 10 equal
More informationStat 5101 Lecture Notes
Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random
More information8 Nominal and Ordinal Logistic Regression
8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on
More informationGeneralized linear models
Generalized linear models Outline for today What is a generalized linear model Linear predictors and link functions Example: estimate a proportion Analysis of deviance Example: fit dose- response data
More informationRegression models. Generalized linear models in R. Normal regression models are not always appropriate. Generalized linear models. Examples.
Regression models Generalized linear models in R Dr Peter K Dunn http://www.usq.edu.au Department of Mathematics and Computing University of Southern Queensland ASC, July 00 The usual linear regression
More informationCentral Limit Theorem ( 5.3)
Central Limit Theorem ( 5.3) Let X 1, X 2,... be a sequence of independent random variables, each having n mean µ and variance σ 2. Then the distribution of the partial sum S n = X i i=1 becomes approximately
More informationPh.D. Qualifying Exam Friday Saturday, January 3 4, 2014
Ph.D. Qualifying Exam Friday Saturday, January 3 4, 2014 Put your solution to each problem on a separate sheet of paper. Problem 1. (5166) Assume that two random samples {x i } and {y i } are independently
More informationAnalysis of Microtubules using. for Growth Curve modeling.
Analysis of Microtubules using Growth Curve Modeling Md. Aleemuddin Siddiqi S. Rao Jammalamadaka Statistics and Applied Probability, University of California, Santa Barbara March 1, 2006 1 Introduction
More informationMARGINAL HOMOGENEITY MODEL FOR ORDERED CATEGORIES WITH OPEN ENDS IN SQUARE CONTINGENCY TABLES
REVSTAT Statistical Journal Volume 13, Number 3, November 2015, 233 243 MARGINAL HOMOGENEITY MODEL FOR ORDERED CATEGORIES WITH OPEN ENDS IN SQUARE CONTINGENCY TABLES Authors: Serpil Aktas Department of
More informationStatistics in medicine
Statistics in medicine Lecture 4: and multivariable regression Fatma Shebl, MD, MS, MPH, PhD Assistant Professor Chronic Disease Epidemiology Department Yale School of Public Health Fatma.shebl@yale.edu
More informationApproximate Test for Comparing Parameters of Several Inverse Hypergeometric Distributions
Approximate Test for Comparing Parameters of Several Inverse Hypergeometric Distributions Lei Zhang 1, Hongmei Han 2, Dachuan Zhang 3, and William D. Johnson 2 1. Mississippi State Department of Health,
More informationGeneralized Linear Models 1
Generalized Linear Models 1 STA 2101/442: Fall 2012 1 See last slide for copyright information. 1 / 24 Suggested Reading: Davison s Statistical models Exponential families of distributions Sec. 5.2 Chapter
More informationST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples
ST3241 Categorical Data Analysis I Generalized Linear Models Introduction and Some Examples 1 Introduction We have discussed methods for analyzing associations in two-way and three-way tables. Now we will
More informationDA Freedman Notes on the MLE Fall 2003
DA Freedman Notes on the MLE Fall 2003 The object here is to provide a sketch of the theory of the MLE. Rigorous presentations can be found in the references cited below. Calculus. Let f be a smooth, scalar
More informationUNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator
UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS Duration - 3 hours Aids Allowed: Calculator LAST NAME: FIRST NAME: STUDENT NUMBER: There are 27 pages
More informationSCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models
SCHOOL OF MATHEMATICS AND STATISTICS Linear and Generalised Linear Models Autumn Semester 2017 18 2 hours Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION
More informationSTA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).
STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) T In 2 2 tables, statistical independence is equivalent to a population
More informationMultinomial Logistic Regression Models
Stat 544, Lecture 19 1 Multinomial Logistic Regression Models Polytomous responses. Logistic regression can be extended to handle responses that are polytomous, i.e. taking r>2 categories. (Note: The word
More informationPoisson Regression. Ryan Godwin. ECON University of Manitoba
Poisson Regression Ryan Godwin ECON 7010 - University of Manitoba Abstract. These lecture notes introduce Maximum Likelihood Estimation (MLE) of a Poisson regression model. 1 Motivating the Poisson Regression
More informationAFT Models and Empirical Likelihood
AFT Models and Empirical Likelihood Mai Zhou Department of Statistics, University of Kentucky Collaborators: Gang Li (UCLA); A. Bathke; M. Kim (Kentucky) Accelerated Failure Time (AFT) models: Y = log(t
More informationReview: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form:
Outline for today What is a generalized linear model Linear predictors and link functions Example: fit a constant (the proportion) Analysis of deviance table Example: fit dose-response data using logistic
More informationNormal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification,
Likelihood Let P (D H) be the probability an experiment produces data D, given hypothesis H. Usually H is regarded as fixed and D variable. Before the experiment, the data D are unknown, and the probability
More informationAsymptotic Statistics-VI. Changliang Zou
Asymptotic Statistics-VI Changliang Zou Kolmogorov-Smirnov distance Example (Kolmogorov-Smirnov confidence intervals) We know given α (0, 1), there is a well-defined d = d α,n such that, for any continuous
More informationAnalysis of Survival Data Using Cox Model (Continuous Type)
Australian Journal of Basic and Alied Sciences, 7(0): 60-607, 03 ISSN 99-878 Analysis of Survival Data Using Cox Model (Continuous Type) Khawla Mustafa Sadiq Department of Mathematics, Education College,
More informationReview of Statistics 101
Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods
More informationLogistic Regression. Interpretation of linear regression. Other types of outcomes. 0-1 response variable: Wound infection. Usual linear regression
Logistic Regression Usual linear regression (repetition) y i = b 0 + b 1 x 1i + b 2 x 2i + e i, e i N(0,σ 2 ) or: y i N(b 0 + b 1 x 1i + b 2 x 2i,σ 2 ) Example (DGA, p. 336): E(PEmax) = 47.355 + 1.024
More informationPerson-Time Data. Incidence. Cumulative Incidence: Example. Cumulative Incidence. Person-Time Data. Person-Time Data
Person-Time Data CF Jeff Lin, MD., PhD. Incidence 1. Cumulative incidence (incidence proportion) 2. Incidence density (incidence rate) December 14, 2005 c Jeff Lin, MD., PhD. c Jeff Lin, MD., PhD. Person-Time
More informationMeasure for No Three-Factor Interaction Model in Three-Way Contingency Tables
American Journal of Biostatistics (): 7-, 00 ISSN 948-9889 00 Science Publications Measure for No Three-Factor Interaction Model in Three-Way Contingency Tables Kouji Yamamoto, Kyoji Hori and Sadao Tomizawa
More informationMATH 644: Regression Analysis Methods
MATH 644: Regression Analysis Methods FINAL EXAM Fall, 2012 INSTRUCTIONS TO STUDENTS: 1. This test contains SIX questions. It comprises ELEVEN printed pages. 2. Answer ALL questions for a total of 100
More informationModel Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 16 Introduction
Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 16 Introduction ReCap. Parts I IV. The General Linear Model Part V. The Generalized Linear Model 16 Introduction 16.1 Analysis
More informationAnalysis of data in square contingency tables
Analysis of data in square contingency tables Iva Pecáková Let s suppose two dependent samples: the response of the nth subject in the second sample relates to the response of the nth subject in the first
More informationSTAT 6350 Analysis of Lifetime Data. Failure-time Regression Analysis
STAT 6350 Analysis of Lifetime Data Failure-time Regression Analysis Explanatory Variables for Failure Times Usually explanatory variables explain/predict why some units fail quickly and some units survive
More informationGeneralized Linear Models (GLZ)
Generalized Linear Models (GLZ) Generalized Linear Models (GLZ) are an extension of the linear modeling process that allows models to be fit to data that follow probability distributions other than the
More informationANALYSING BINARY DATA IN A REPEATED MEASUREMENTS SETTING USING SAS
Libraries 1997-9th Annual Conference Proceedings ANALYSING BINARY DATA IN A REPEATED MEASUREMENTS SETTING USING SAS Eleanor F. Allan Follow this and additional works at: http://newprairiepress.org/agstatconference
More informationThe Multinomial Model
The Multinomial Model STA 312: Fall 2012 Contents 1 Multinomial Coefficients 1 2 Multinomial Distribution 2 3 Estimation 4 4 Hypothesis tests 8 5 Power 17 1 Multinomial Coefficients Multinomial coefficient
More information3 Joint Distributions 71
2.2.3 The Normal Distribution 54 2.2.4 The Beta Density 58 2.3 Functions of a Random Variable 58 2.4 Concluding Remarks 64 2.5 Problems 64 3 Joint Distributions 71 3.1 Introduction 71 3.2 Discrete Random
More informationMath 152. Rumbos Fall Solutions to Exam #2
Math 152. Rumbos Fall 2009 1 Solutions to Exam #2 1. Define the following terms: (a) Significance level of a hypothesis test. Answer: The significance level, α, of a hypothesis test is the largest probability
More informationUnit 9: Inferences for Proportions and Count Data
Unit 9: Inferences for Proportions and Count Data Statistics 571: Statistical Methods Ramón V. León 1/15/008 Unit 9 - Stat 571 - Ramón V. León 1 Large Sample Confidence Interval for Proportion ( pˆ p)
More informationCohen s s Kappa and Log-linear Models
Cohen s s Kappa and Log-linear Models HRP 261 03/03/03 10-11 11 am 1. Cohen s Kappa Actual agreement = sum of the proportions found on the diagonals. π ii Cohen: Compare the actual agreement with the chance
More informationIntroduction to Statistical Analysis
Introduction to Statistical Analysis Changyu Shen Richard A. and Susan F. Smith Center for Outcomes Research in Cardiology Beth Israel Deaconess Medical Center Harvard Medical School Objectives Descriptive
More informationSTA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3
STA 303 H1S / 1002 HS Winter 2011 Test March 7, 2011 LAST NAME: FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 303 STA 1002 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator. Some formulae
More informationGeneralized Linear Models Introduction
Generalized Linear Models Introduction Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Generalized Linear Models For many problems, standard linear regression approaches don t work. Sometimes,
More informationGoodness of Fit Test and Test of Independence by Entropy
Journal of Mathematical Extension Vol. 3, No. 2 (2009), 43-59 Goodness of Fit Test and Test of Independence by Entropy M. Sharifdoost Islamic Azad University Science & Research Branch, Tehran N. Nematollahi
More informationHypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations
Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations Takeshi Emura and Hisayuki Tsukuma Abstract For testing the regression parameter in multivariate
More informationGeneralized Linear Models. Kurt Hornik
Generalized Linear Models Kurt Hornik Motivation Assuming normality, the linear model y = Xβ + e has y = β + ε, ε N(0, σ 2 ) such that y N(μ, σ 2 ), E(y ) = μ = β. Various generalizations, including general
More information11 Hypothesis Testing
28 11 Hypothesis Testing 111 Introduction Suppose we want to test the hypothesis: H : A q p β p 1 q 1 In terms of the rows of A this can be written as a 1 a q β, ie a i β for each row of A (here a i denotes
More informationSIMULTANEOUS CONFIDENCE BANDS FOR THE PTH PERCENTILE AND THE MEAN LIFETIME IN EXPONENTIAL AND WEIBULL REGRESSION MODELS. Ping Sa and S.J.
SIMULTANEOUS CONFIDENCE BANDS FOR THE PTH PERCENTILE AND THE MEAN LIFETIME IN EXPONENTIAL AND WEIBULL REGRESSION MODELS " # Ping Sa and S.J. Lee " Dept. of Mathematics and Statistics, U. of North Florida,
More informationSome General Types of Tests
Some General Types of Tests We may not be able to find a UMP or UMPU test in a given situation. In that case, we may use test of some general class of tests that often have good asymptotic properties.
More informationClass Notes: Week 8. Probit versus Logit Link Functions and Count Data
Ronald Heck Class Notes: Week 8 1 Class Notes: Week 8 Probit versus Logit Link Functions and Count Data This week we ll take up a couple of issues. The first is working with a probit link function. While
More informationGeneralized linear models for binary data. A better graphical exploratory data analysis. The simple linear logistic regression model
Stat 3302 (Spring 2017) Peter F. Craigmile Simple linear logistic regression (part 1) [Dobson and Barnett, 2008, Sections 7.1 7.3] Generalized linear models for binary data Beetles dose-response example
More informationModel Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 18.1 Logistic Regression (Dose - Response)
Model Based Statistics in Biology. Part V. The Generalized Linear Model. Logistic Regression ( - Response) ReCap. Part I (Chapters 1,2,3,4), Part II (Ch 5, 6, 7) ReCap Part III (Ch 9, 10, 11), Part IV
More informationA Bivariate Weibull Regression Model
c Heldermann Verlag Economic Quality Control ISSN 0940-5151 Vol 20 (2005), No. 1, 1 A Bivariate Weibull Regression Model David D. Hanagal Abstract: In this paper, we propose a new bivariate Weibull regression
More informationUnit 9: Inferences for Proportions and Count Data
Unit 9: Inferences for Proportions and Count Data Statistics 571: Statistical Methods Ramón V. León 12/15/2008 Unit 9 - Stat 571 - Ramón V. León 1 Large Sample Confidence Interval for Proportion ( pˆ p)
More informationTUTORIAL 8 SOLUTIONS #
TUTORIAL 8 SOLUTIONS #9.11.21 Suppose that a single observation X is taken from a uniform density on [0,θ], and consider testing H 0 : θ = 1 versus H 1 : θ =2. (a) Find a test that has significance level
More informationTesting and Model Selection
Testing and Model Selection This is another digression on general statistics: see PE App C.8.4. The EViews output for least squares, probit and logit includes some statistics relevant to testing hypotheses
More informationDeccan Education Society s FERGUSSON COLLEGE, PUNE (AUTONOMOUS) SYLLABUS UNDER AUTOMONY. SECOND YEAR B.Sc. SEMESTER - III
Deccan Education Society s FERGUSSON COLLEGE, PUNE (AUTONOMOUS) SYLLABUS UNDER AUTOMONY SECOND YEAR B.Sc. SEMESTER - III SYLLABUS FOR S. Y. B. Sc. STATISTICS Academic Year 07-8 S.Y. B.Sc. (Statistics)
More informationThe GENMOD Procedure. Overview. Getting Started. Syntax. Details. Examples. References. SAS/STAT User's Guide. Book Contents Previous Next
Book Contents Previous Next SAS/STAT User's Guide Overview Getting Started Syntax Details Examples References Book Contents Previous Next Top http://v8doc.sas.com/sashtml/stat/chap29/index.htm29/10/2004
More informationMSH3 Generalized linear model
Contents MSH3 Generalized linear model 7 Log-Linear Model 231 7.1 Equivalence between GOF measures........... 231 7.2 Sampling distribution................... 234 7.3 Interpreting Log-Linear models..............
More informationDiscrete Multivariate Statistics
Discrete Multivariate Statistics Univariate Discrete Random variables Let X be a discrete random variable which, in this module, will be assumed to take a finite number of t different values which are
More informationLinear Regression Models P8111
Linear Regression Models P8111 Lecture 25 Jeff Goldsmith April 26, 2016 1 of 37 Today s Lecture Logistic regression / GLMs Model framework Interpretation Estimation 2 of 37 Linear regression Course started
More informationStatistics. Statistics
The main aims of statistics 1 1 Choosing a model 2 Estimating its parameter(s) 1 point estimates 2 interval estimates 3 Testing hypotheses Distributions used in statistics: χ 2 n-distribution 2 Let X 1,
More informationLecture 14: Introduction to Poisson Regression
Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why
More informationModelling counts. Lecture 14: Introduction to Poisson Regression. Overview
Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week
More informationLinear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52
Statistics for Applications Chapter 10: Generalized Linear Models (GLMs) 1/52 Linear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52 Components of a linear model The two
More informationGlossary. The ISI glossary of statistical terms provides definitions in a number of different languages:
Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the
More informationTA: Sheng Zhgang (Th 1:20) / 342 (W 1:20) / 343 (W 2:25) / 344 (W 12:05) Haoyang Fan (W 1:20) / 346 (Th 12:05) FINAL EXAM
STAT 301, Fall 2011 Name Lec 4: Ismor Fischer Discussion Section: Please circle one! TA: Sheng Zhgang... 341 (Th 1:20) / 342 (W 1:20) / 343 (W 2:25) / 344 (W 12:05) Haoyang Fan... 345 (W 1:20) / 346 (Th
More informationLet us first identify some classes of hypotheses. simple versus simple. H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided
Let us first identify some classes of hypotheses. simple versus simple H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided H 0 : θ θ 0 versus H 1 : θ > θ 0. (2) two-sided; null on extremes H 0 : θ θ 1 or
More informationGMM Estimation of a Maximum Entropy Distribution with Interval Data
GMM Estimation of a Maximum Entropy Distribution with Interval Data Ximing Wu and Jeffrey M. Perloff January, 2005 Abstract We develop a GMM estimator for the distribution of a variable where summary statistics
More informationLecture 01: Introduction
Lecture 01: Introduction Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina Lecture 01: Introduction
More informationMultilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2
Multilevel Models in Matrix Form Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Today s Lecture Linear models from a matrix perspective An example of how to do
More informationBasic Medical Statistics Course
Basic Medical Statistics Course S7 Logistic Regression November 2015 Wilma Heemsbergen w.heemsbergen@nki.nl Logistic Regression The concept of a relationship between the distribution of a dependent variable
More informationGeneralized, Linear, and Mixed Models
Generalized, Linear, and Mixed Models CHARLES E. McCULLOCH SHAYLER.SEARLE Departments of Statistical Science and Biometrics Cornell University A WILEY-INTERSCIENCE PUBLICATION JOHN WILEY & SONS, INC. New
More informationGeneralized Linear Models
Generalized Linear Models Lecture 3. Hypothesis testing. Goodness of Fit. Model diagnostics GLM (Spring, 2018) Lecture 3 1 / 34 Models Let M(X r ) be a model with design matrix X r (with r columns) r n
More informationFundamentals to Biostatistics. Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur
Fundamentals to Biostatistics Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur Statistics collection, analysis, interpretation of data development of new
More informationGeneralized logit models for nominal multinomial responses. Local odds ratios
Generalized logit models for nominal multinomial responses Categorical Data Analysis, Summer 2015 1/17 Local odds ratios Y 1 2 3 4 1 π 11 π 12 π 13 π 14 π 1+ X 2 π 21 π 22 π 23 π 24 π 2+ 3 π 31 π 32 π
More informationGeneralized Linear Models
York SPIDA John Fox Notes Generalized Linear Models Copyright 2010 by John Fox Generalized Linear Models 1 1. Topics I The structure of generalized linear models I Poisson and other generalized linear
More informationChapter 5 Matrix Approach to Simple Linear Regression
STAT 525 SPRING 2018 Chapter 5 Matrix Approach to Simple Linear Regression Professor Min Zhang Matrix Collection of elements arranged in rows and columns Elements will be numbers or symbols For example:
More informationLarge Sample Properties of Estimators in the Classical Linear Regression Model
Large Sample Properties of Estimators in the Classical Linear Regression Model 7 October 004 A. Statement of the classical linear regression model The classical linear regression model can be written in
More informationExam Applied Statistical Regression. Good Luck!
Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.
More information2 Describing Contingency Tables
2 Describing Contingency Tables I. Probability structure of a 2-way contingency table I.1 Contingency Tables X, Y : cat. var. Y usually random (except in a case-control study), response; X can be random
More informationSTAC51: Categorical data Analysis
STAC51: Categorical data Analysis Mahinda Samarakoon April 6, 2016 Mahinda Samarakoon STAC51: Categorical data Analysis 1 / 25 Table of contents 1 Building and applying logistic regression models (Chap
More information36-463/663: Multilevel & Hierarchical Models
36-463/663: Multilevel & Hierarchical Models (P)review: in-class midterm Brian Junker 132E Baker Hall brian@stat.cmu.edu 1 In-class midterm Closed book, closed notes, closed electronics (otherwise I have
More informationJournal of Forensic & Investigative Accounting Volume 10: Issue 2, Special Issue 2018
Detecting Fraud in Non-Benford Transaction Files William R. Fairweather* Introduction An auditor faced with the need to evaluate a number of files for potential fraud might wish to focus his or her efforts,
More informationKRUSKAL-WALLIS ONE-WAY ANALYSIS OF VARIANCE BASED ON LINEAR PLACEMENTS
Bull. Korean Math. Soc. 5 (24), No. 3, pp. 7 76 http://dx.doi.org/34/bkms.24.5.3.7 KRUSKAL-WALLIS ONE-WAY ANALYSIS OF VARIANCE BASED ON LINEAR PLACEMENTS Yicheng Hong and Sungchul Lee Abstract. The limiting
More informationIntroduction to the Logistic Regression Model
CHAPTER 1 Introduction to the Logistic Regression Model 1.1 INTRODUCTION Regression methods have become an integral component of any data analysis concerned with describing the relationship between a response
More information1/15. Over or under dispersion Problem
1/15 Over or under dispersion Problem 2/15 Example 1: dogs and owners data set In the dogs and owners example, we had some concerns about the dependence among the measurements from each individual. Let
More informationECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Spring 2013 Instructor: Victor Aguirregabiria
ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Spring 2013 Instructor: Victor Aguirregabiria SOLUTION TO FINAL EXAM Friday, April 12, 2013. From 9:00-12:00 (3 hours) INSTRUCTIONS:
More informationLogistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University
Logistic Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Logistic Regression 1 / 38 Logistic Regression 1 Introduction
More informationUsing the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, and Discrete Changes 1
Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, Discrete Changes 1 JunXuJ.ScottLong Indiana University 2005-02-03 1 General Formula The delta method is a general
More informationThe goodness-of-fit test Having discussed how to make comparisons between two proportions, we now consider comparisons of multiple proportions.
The goodness-of-fit test Having discussed how to make comparisons between two proportions, we now consider comparisons of multiple proportions. A common problem of this type is concerned with determining
More informationPseudo-score confidence intervals for parameters in discrete statistical models
Biometrika Advance Access published January 14, 2010 Biometrika (2009), pp. 1 8 C 2009 Biometrika Trust Printed in Great Britain doi: 10.1093/biomet/asp074 Pseudo-score confidence intervals for parameters
More information