MISCLASSIFICATION PROBLEM AND ITS RELATION TO THE CONTINGENCY TABLE WITH SUPPLEMENTAL MARGINS

T. Timothy Chen, The Upjohn Company

1. Introduction.

In many studies, data may have errors. This can happen if we use fallible and inexpensive, rather than exact and expensive, devices to measure some variables. For example, in epidemiological studies, data are usually collected from an inexpensive interview instead of physicians' examinations or laboratory chemical tests. If the data are categorical, this problem is called the misclassification problem.

Suppose we are interested in one variable only, which has $r$ possible categories; because we use a fallible and inexpensive device, we observe a different variable with the same $r$ categories. Let us use a two-dimensional $r \times r$ contingency table to represent the situation: the first dimension is the fallible classification and the second dimension is the correct or true classification. Let the probability of any observation having $(i,j)$ as its fallible and correct classification be $\pi_{ij}$, with $\sum_{i,j} \pi_{ij} = 1$. The elements $\{a_{i,j}\}$ of the misclassification matrix $A$ are defined as $a_{i,j} = \pi_{ij}/\pi_{+j}$, which is the conditional probability of any observation having $i$ as the fallible classification given that it has $j$ as the true classification. If the $a_{i,j}$'s do not depend on $j$, then we have a random misclassification and $\pi_{ij} = \pi_{i+}\pi_{+j}$.

Now, instead of just one variable, suppose we are interested in the interrelationship between two variables, where the first variable $X$ has $r$ possible categories and is subject to misclassification, and the second variable $Y$ has $t$ possible categories and can be easily determined without error. Let us use a three-dimensional $r \times r \times t$ contingency table to describe the situation; the first and the second dimensions represent the fallible and the correct classifications of the variable $X$, and the third dimension represents the variable $Y$.
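As a small illustration of the one-variable definitions above, the sketch below computes the misclassification matrix from a joint table and checks whether the misclassification is random. The function names and the two probability tables are hypothetical and chosen only for illustration; they do not come from the paper.

```python
def misclassification_matrix(pi):
    """Return a[i][j] = pi[i][j] / pi[+][j], i.e. P(fallible = i | true = j)."""
    r = len(pi)
    col_totals = [sum(pi[i][j] for i in range(r)) for j in range(r)]
    return [[pi[i][j] / col_totals[j] for j in range(r)] for i in range(r)]


def is_random_misclassification(pi, tol=1e-12):
    """True when every row of A is constant across the true categories j."""
    a = misclassification_matrix(pi)
    return all(abs(value - row[0]) <= tol for row in a for value in row)


# Hypothetical joint probabilities pi[i][j]: rows are fallible, columns are true.
pi_errors = [[0.36, 0.04],
             [0.04, 0.56]]
# Hypothetical table with pi_ij = pi_i+ * pi_+j (row totals 0.4, 0.6; column totals 0.3, 0.7).
pi_random = [[0.12, 0.28],
             [0.18, 0.42]]

A = misclassification_matrix(pi_errors)
print(A)
print(is_random_misclassification(pi_errors), is_random_misclassification(pi_random))
```

Each column of `A` sums to one because it is a conditional distribution over the fallible categories given one true category.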
The misclassification matrix $A$ is an $r$ by $rt$ matrix with elements $\{a_{i,jk}\}$, where $a_{i,jk} = \pi_{ijk}/\pi_{+jk}$ is the conditional probability of any observation having $i$ as the fallible $X$ value, given that $j,k$ are the true $X$ and $Y$ values. If the $a_{i,jk}$'s do not depend on $k$, then the misclassification is the same for any $Y$ value and we have

$$\pi_{ijk} = \frac{\pi_{ij+}\,\pi_{+jk}}{\pi_{+j+}}, \tag{1.1}$$

which is the model of conditional independence of the first and the third dimensions in each layer of the second dimension. Let us denote this model by H(12,23), where the 12-marginal and the 23-marginal counts are the complete minimal sufficient statistics under the Poisson or multinomial sampling schemes (see Bishop, Fienberg, and Holland (1975) and Haberman (1974)). From equation (1.1), we can see that independence on the 23-margin implies independence on the 13-margin, but not vice versa unless the matrix $A$ has $r$ as its rank.

Diamond and Lilienfeld (1962), Newell (1962), and Rogot (1961) considered the above model in the case $r = t = 2$ and they showed that

$$\left|\frac{\pi_{+11}}{\pi_{++1}} - \frac{\pi_{+12}}{\pi_{++2}}\right| \ge \left|\frac{\pi_{1+1}}{\pi_{++1}} - \frac{\pi_{1+2}}{\pi_{++2}}\right| \tag{1.2}$$

and

$$\left|\log\frac{\pi_{+11}\,\pi_{+22}}{\pi_{+12}\,\pi_{+21}}\right| \ge \left|\log\frac{\pi_{1+1}\,\pi_{2+2}}{\pi_{1+2}\,\pi_{2+1}}\right|. \tag{1.3}$$

In epidemiology, if $Y$ represents two different populations, and $X$ represents having disease or not, then the above two equations say that the true risk difference is greater than the fallible or stated risk difference, and the true approximate relative risk (the true odds ratio or its inverse, whichever is greater than 1) is greater than the fallible or stated approximate relative risk. But these will not be true with probability one when we substitute the observed proportions for the population proportions. Equations (1.2) and (1.3) can be explained intuitively by the log-linear representation of the model

$$\log \pi_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(ij)} + u_{23(jk)}, \tag{1.4}$$

where we see no $u_{13}$-terms; hence, the risk difference and the approximate relative risk on the 13-margin are smaller than those of the 23-margin.
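The attenuation described above can be checked numerically. The following sketch builds a 2 x 2 x 2 table satisfying H(12,23) from a misclassification matrix that does not depend on $Y$, then compares the risk difference and the odds ratio on the true (23) and fallible (13) margins. All numerical inputs and function names are hypothetical, and the index 0 plays the role of the paper's category 1.

```python
def build_h12_23(a, true_xy):
    """pi[i][j][k] = a[i][j] * pi_{+jk}; a does not depend on k, so (1.1) holds."""
    return [[[a[i][j] * true_xy[j][k] for k in range(2)]
             for j in range(2)] for i in range(2)]


def margin_13(pi):
    return [[sum(pi[i][j][k] for j in range(2)) for k in range(2)] for i in range(2)]


def margin_23(pi):
    return [[sum(pi[i][j][k] for i in range(2)) for k in range(2)] for j in range(2)]


def risk_difference(m):
    """P(row 0 | column 0) - P(row 0 | column 1) for a 2 x 2 margin m[row][col]."""
    return m[0][0] / (m[0][0] + m[1][0]) - m[0][1] / (m[0][1] + m[1][1])


def odds_ratio(m):
    return (m[0][0] * m[1][1]) / (m[0][1] * m[1][0])


# Hypothetical inputs: category 0 = diseased; columns of true_xy are the two populations.
a = [[0.9, 0.2],      # P(fallible = 0 | true = 0), P(fallible = 0 | true = 1)
     [0.1, 0.8]]
true_xy = [[0.10, 0.05],
           [0.40, 0.45]]

pi = build_h12_23(a, true_xy)
true_m, fallible_m = margin_23(pi), margin_13(pi)
print(risk_difference(true_m), risk_difference(fallible_m))
print(odds_ratio(true_m), odds_ratio(fallible_m))
```

With these inputs the fallible risk difference and odds ratio are both closer to the null values (0 and 1) than the true ones, which is the direction that (1.2) and (1.3) describe.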
Since it is very expensive to observe the true $X$ values, we usually collect only the fallible $X$ and the true $Y$ values; i.e., we observe only the 13-margin of a three-dimensional contingency table. Bross (1954), Rubin, Rosenbaum, and Cobb (1956), and Mote and Anderson (1962) discussed inference about the relationship between true $X$ and true $Y$ (the 23-margin) in this situation. They concluded that the usual chi-square test of independence or homogeneity on the observed 13-margin is a correct $\alpha$-level test, with less power, for independence or homogeneity on the unobserved 23-margin, provided that the model H(12,23) is true and the misclassification matrix $A$ has $r$ as its rank.

Now let us discuss the situation where the variable $Y$ is also subject to misclassification.

Let the fallible and the true $X$ be the first and the third dimensions, and the fallible and the true $Y$ be the second and the fourth dimensions, of a four-dimensional contingency table. The misclassification matrix $A$ is an $rt$ by $rt$ matrix with each element $a_{ij,kl} = \pi_{ijkl}/\pi_{++kl}$. If the element $a_{ij,kl}$ of the matrix $A$ is a product of two probabilities, one being the probability of any observation being fallibly classified into $i$ on the $X$ variable given that its true $X$ is $k$, and the other being the probability of any observation being fallibly classified into $j$ on the $Y$ variable given that its true $Y$ is $l$, then we have a model of independent misclassification and

$$\pi_{ijkl} = \frac{\pi_{i+k+}\,\pi_{+j+l}\,\pi_{++kl}}{\pi_{++k+}\,\pi_{+++l}}. \tag{1.5}$$

From the above equation, it is clear that independence on the 34-margin implies independence on the 12-margin, but not vice versa unless the matrix $A$ is non-singular. Also, if we look only at the 134-marginal table, then the misclassification matrix is independent of the variable $Y$. For the 234-marginal table, the misclassification matrix is independent of the variable $X$. We denote this model by H(13,24,34) and we have a log-linear representation:

$$\log \pi_{ijkl} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{4(l)} + u_{13(ik)} + u_{24(jl)} + u_{34(kl)}, \tag{1.6}$$

where we do not have $u_{12}$-terms. Keys and Kihlberg (1963) and Gullen, Bearman, and Johnson (1968) discussed the above model in the case $r = t = 2$ and they showed that

[inequality (1.7) is illegible in the source]

It can also be shown that

[inequality (1.8) is illegible in the source]

If $Y$ represents two populations, and $X$ represents having disease or not, the above equations say that the risk difference and the approximate relative risk on the 12-margin (fallible $X$ and $Y$) are smaller than those of the 34-margin (true $X$ and $Y$), which can be explained intuitively by equation (1.6).
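A minimal numerical sketch of the independent misclassification model follows. It builds a 2 x 2 x 2 x 2 table whose misclassification probabilities factor into an $X$ part and a $Y$ part, then compares the odds ratio on the true (34) margin with the one on the fallible (12) margin. The error rates, the two true tables, and the function names are hypothetical.

```python
from itertools import product


def build_independent_misclassification(a, b, true_xy):
    """pi[(i, j, k, l)] = a[i][k] * b[j][l] * pi_{++kl}, the structure behind (1.5)."""
    return {(i, j, k, l): a[i][k] * b[j][l] * true_xy[k][l]
            for i, j, k, l in product(range(2), repeat=4)}


def two_way_margin(pi, d1, d2):
    """Sum a four-way table down to the 2 x 2 margin of dimensions d1 and d2 (0-based)."""
    m = [[0.0, 0.0], [0.0, 0.0]]
    for cell, p in pi.items():
        m[cell[d1]][cell[d2]] += p
    return m


def odds_ratio(m):
    return (m[0][0] * m[1][1]) / (m[0][1] * m[1][0])


# Hypothetical error rates: a for X (fallible i given true k), b for Y (fallible j given true l).
a = [[0.9, 0.2], [0.1, 0.8]]
b = [[0.85, 0.1], [0.15, 0.9]]

dependent = [[0.30, 0.10], [0.15, 0.45]]      # true X and Y associated
independent = [[0.12, 0.28], [0.18, 0.42]]    # pi_kl = pi_k+ * pi_+l

for true_xy in (dependent, independent):
    pi = build_independent_misclassification(a, b, true_xy)
    print(odds_ratio(two_way_margin(pi, 2, 3)), odds_ratio(two_way_margin(pi, 0, 1)))
```

In the dependent case the fallible odds ratio lies between 1 and the true odds ratio; in the independent case both margins show an odds ratio of 1, matching the statement that independence on the 34-margin implies independence on the 12-margin.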
When we observe only the fallible classifications for both variables, under the assumptions of independent misclassification and non-singularity of the matrix $A$, Assakul and Proctor (1967) showed that the usual chi-square test of independence on the observed 12-margin gives a correct $\alpha$-level test of independence on the unobserved 34-margin, but that misclassification reduces the power of this test compared to the direct test on the 34-margin. In the case of non-independent errors they showed that the test on the 12-margin would, in general, have a larger type I error for the independence hypothesis on the 34-margin.

The above discussion shows that log-linear models provide a class of models which give a meaningful interpretation for the misclassification matrix, and that under some models the test on the observed fallible data provides a correct test for the unobserved true data. But unless we have past experience, or can examine some data which have both the fallible and the true values, we cannot be sure that a particular model applies to the misclassification matrix. Therefore, besides observing the inexpensive fallible data, we should also collect both fallible and true data on some observations. This is the double sampling scheme proposed recently by Tenenbein (1970, 1971, 1972) and Chiacchierini and Arnold (1977); the data collected can be presented as a full contingency table of both fallible and true data with a supplemental lower dimensional margin of fallible data.

2. Double Sampling Scheme.

In this section we will discuss how to analyze categorical data with misclassification and double sampling. The details of the analysis will be shown for a three-dimensional contingency table with the first and the second dimensions representing the fallible and the true $X$, and the third dimension representing the true $Y$ variable.
Suppose we observe $n$ subjects with all three dimensions, and $N - n$ subjects for the first and the third dimensions only; the observed counts in the full table are denoted by $x_{ijk}$ and the observed counts in the supplemental 13-margin are denoted by $V_{ik}$ (where $\sum_{i,j,k} x_{ijk} = n$ and $\sum_{i,k} V_{ik} = N - n$). We assume all $x_{ijk}$ are greater than zero for simplicity. The main inference is about the independence of the true $X$ and the true $Y$ variables, but specifying a correct structure of misclassification may give us better power for the test. The structures of misclassification we want to investigate are those log-linear models having $u_{23}$-terms, namely H(123), H(12,13,23), H(12,23), H(13,23), and H(1,23). The first model, H(123), can be expressed as

$$\log \pi_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(ij)} + u_{13(ik)} + u_{23(jk)} + u_{123(ijk)}. \tag{2.1}$$

In (2.1), each set of subscripted $u$-terms sums to zero over any of its subscripts. This is the unrestricted (saturated) model, in which we place no restriction on $\{\pi_{ijk}\}$. The second model is a model of no second-order interaction, with $u_{123(ijk)} = 0$ in (2.1). The third and the fourth models are models of conditional independence as explained in Section 1. The fifth model is a model of independence between the first dimension and the other two dimensions, which is equivalent to (2.1) with $u_{12(ij)}$, $u_{13(ik)}$, and $u_{123(ijk)}$ all set to zero.

Since we have double sampling data, the expected counts for $x_{ijk}$ and $V_{ik}$ are $n\pi_{ijk}$ and $(N-n)\pi_{i+k}$ respectively. Under the unrestricted model H(123), we have the following ML equations:

$$N\hat\pi_{ijk} = x_{ijk} + V_{ik}\,\hat\pi_{ijk}/\hat\pi_{i+k}, \quad \forall\, i,j,k, \tag{2.2}$$

where the right hand side is the observed count in the cell $(i,j,k)$ plus a proportional allocation of the supplemental marginal count to that cell based on the MLE's $\{\hat\pi_{ijk}\}$. For the no second-order interaction model, the ML equations are given by:

$$N\hat\pi_{ij+} = x_{ij+} + \sum_k V_{ik}\,\hat\pi_{ijk}/\hat\pi_{i+k}, \quad \forall\, i,j, \tag{2.3}$$

$$N\hat\pi_{i+k} = x_{i+k} + V_{ik}, \quad \forall\, i,k, \tag{2.4}$$

and

$$N\hat\pi_{+jk} = x_{+jk} + \sum_i V_{ik}\,\hat\pi_{ijk}/\hat\pi_{i+k}, \quad \forall\, j,k. \tag{2.5}$$

Next, for the model H(12,23) the ML equations are given by equations (2.3) and (2.5). For the model H(1,23) the ML equations are given by equation (2.5) and $N\hat\pi_{i++} = x_{i++} + V_{i+}$. In general the ML equations correspond to the highest order subscripted $u$-terms in the model. We can use an iterative procedure such as the one described below to get a numerical solution to the ML equations.

The iterative procedure we propose is an extension of the iterative proportional fitting used by Bishop et al. (1975), Goodman (1970) and Haberman (1974). For all models, we take the same initial value:

$$\pi^{(0)}_{ijk} = 1/(r^2 t) \quad \text{for all } i,j,k. \tag{2.6}$$

For a given log-linear model each cycle consists of a set of pairs of steps, each pair corresponding to one of the sets of ML equations for the model.
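The pairs of steps can be sketched in code for the no second-order interaction model H(12,13,23). The sketch below is one reading of the procedure, not the author's program: each pair first forms the right hand side of one of (2.3) to (2.5) from the current estimate, then rescales the estimate so that the corresponding two-way margin matches. The counts, the number of cycles, and all function names are hypothetical.

```python
from itertools import product


def fit_h12_13_23(x, V, cycles=500):
    """Cycle through ML equations (2.3)-(2.5), starting from the uniform table (2.6)."""
    I, J, K = len(x), len(x[0]), len(x[0][0])
    cells = list(product(range(I), range(J), range(K)))
    N = sum(x[i][j][k] for i, j, k in cells) + sum(sum(row) for row in V)
    pi = {c: 1.0 / len(cells) for c in cells}

    def completed_counts():
        # x_ijk plus V_ik shared out over j in proportion to the current pi_ijk / pi_i+k
        m13 = {}
        for (i, j, k), p in pi.items():
            m13[i, k] = m13.get((i, k), 0.0) + p
        return {(i, j, k): x[i][j][k] + V[i][k] * pi[i, j, k] / m13[i, k]
                for i, j, k in cells}

    def fit_margin(d1, d2):
        # Pair of steps: compute the target two-way margin, then rescale pi to match it.
        y = completed_counts()
        target, current = {}, {}
        for c in cells:
            key = (c[d1], c[d2])
            target[key] = target.get(key, 0.0) + y[c] / N
            current[key] = current.get(key, 0.0) + pi[c]
        for c in cells:
            key = (c[d1], c[d2])
            pi[c] *= target[key] / current[key]

    for _ in range(cycles):
        fit_margin(0, 1)   # 12-margin, equation (2.3)
        fit_margin(0, 2)   # 13-margin, equation (2.4)
        fit_margin(1, 2)   # 23-margin, equation (2.5)
    return pi, N


# Hypothetical counts: x[i][j][k] from the full table, V[i][k] from the supplemental 13-margin.
x = [[[30, 20], [5, 8]],
     [[6, 4], [25, 40]]]
V = [[60, 35],
     [40, 70]]
pi_hat, N = fit_h12_13_23(x, V)
```

Dropping the `fit_margin(0, 2)` call would give the corresponding cycle for H(12,23). A fixed number of cycles is used here only for brevity; a convergence check on the change in `pi` would be the more usual stopping rule.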
For example, for the model H(12,13,23), each cycle of the iterative procedure consists of the following six steps:

$$\pi^{(v+1)}_{ij+} = \Big(x_{ij+} + \sum_k V_{ik}\,\pi^{(v)}_{ijk}/\pi^{(v)}_{i+k}\Big)\Big/N, \quad \forall\, i,j, \tag{2.7}$$

$$\pi^{(v+1)}_{ijk} = \pi^{(v)}_{ijk}\,\pi^{(v+1)}_{ij+}/\pi^{(v)}_{ij+}, \quad \forall\, i,j,k, \tag{2.8}$$

$$\pi^{(v+2)}_{i+k} = (x_{i+k} + V_{ik})/N, \quad \forall\, i,k, \tag{2.9}$$

$$\pi^{(v+2)}_{ijk} = \pi^{(v+1)}_{ijk}\,\pi^{(v+2)}_{i+k}/\pi^{(v+1)}_{i+k}, \quad \forall\, i,j,k, \tag{2.10}$$

$$\pi^{(v+3)}_{+jk} = \Big(x_{+jk} + \sum_i V_{ik}\,\pi^{(v+2)}_{ijk}/\pi^{(v+2)}_{i+k}\Big)\Big/N, \quad \forall\, j,k, \tag{2.11}$$

$$\pi^{(v+3)}_{ijk} = \pi^{(v+2)}_{ijk}\,\pi^{(v+3)}_{+jk}/\pi^{(v+2)}_{+jk}, \quad \forall\, i,j,k. \tag{2.12}$$

For the model H(12,23), each cycle of the iterative procedure consists of the four steps given by equations (2.7), (2.8), (2.11) and (2.12), with $\pi^{(v+2)}_{ijk} = \pi^{(v+1)}_{ijk}$.

Once we have the MLE's for the cell probabilities, we can compute either the Pearson or the likelihood ratio statistic to test the goodness-of-fit of the model:

$$X^2 = \sum_{i,j,k} \frac{(x_{ijk} - n\hat\pi_{ijk})^2}{n\hat\pi_{ijk}} + \sum_{i,k} \frac{\big(V_{ik} - (N-n)\hat\pi_{i+k}\big)^2}{(N-n)\hat\pi_{i+k}} \tag{2.13}$$

and

$$G^2 = 2\sum_{i,j,k} x_{ijk} \log\frac{x_{ijk}}{n\hat\pi_{ijk}} + 2\sum_{i,k} V_{ik} \log\frac{V_{ik}}{(N-n)\hat\pi_{i+k}}, \tag{2.14}$$

with appropriate degrees of freedom. For the model H(12,13,23), we have estimated the $u_1$, $u_2$, $u_3$, $u_{12}$, $u_{13}$, $u_{23}$ terms; hence, we have $(r^2t-1) + (rt-1) - 2(r-1) - (t-1) - (r-1)^2 - 2(r-1)(t-1)$ d.f. for the tests.

We will first fit the model H(123) to the data $\{x_{ijk}\}$, $\{V_{ik}\}$ to find out whether they are consistent with each other, i.e., whether $\{x_{ijk}\}$ and $\{V_{ik}\}$ are both random samples from the same target population. After showing that this model fits the data, we can fit the next simpler model. We can examine both the unconditional and the conditional test statistics (the latter being the difference between two unconditional test statistics) to decide whether to accept this model. We can proceed like this to choose the most appropriate and simplest model to describe the data. The general step-wise procedure of fitting models for a contingency table has been described in Goodman (1971).

After a final model for the full table which still has $u_{23}$-terms has been chosen, i.e., after we have chosen a model for the misclassification matrix, we can test the independence (or homogeneity) of true $X$ and true $Y$ in the 23-margin (H*(2,3)). We will again obtain the MLE's $\{\hat\pi_{ijk}\}$ under a particular model for the full table plus the model H*(2,3). Under the models H(123) and H*(2,3), we have the following ML equations:

$$N\,E_{H^*}(\hat\pi_{ijk}) = x_{ijk} + V_{ik}\,\hat\pi_{ijk}/\hat\pi_{i+k}, \tag{2.15}$$

where $E_{H^*}(\hat\pi_{ijk})$ [the defining expression is illegible in the source] is the adjustment of $\hat\pi_{ijk}$ by the independence hypothesis on the 23-margin. Under the models H(12,13,23) and H*(2,3), the ML equations are given by (2.3), (2.4), and (2.5) with the left hand sides replaced by $N\,E_{H^*}(\hat\pi_{ij+})$, etc. These three ML equations can be solved by the following iterative procedure, with

$$\pi^{(0)}_{ijk} = \pi^{*(0)}_{ijk} = 1/(r^2 t), \quad \forall\, i,j,k, \tag{2.16}$$

then

$$\pi^{*(v+1)}_{ij+} = \Big(x_{ij+} + \sum_k V_{ik}\,\pi^{(v)}_{ijk}/\pi^{(v)}_{i+k}\Big)\Big/N, \quad \forall\, i,j, \tag{2.17}$$

$$\pi^{*(v+1)}_{ijk} = \pi^{*(v)}_{ijk}\,\pi^{*(v+1)}_{ij+}/\pi^{*(v)}_{ij+}, \quad \forall\, i,j,k, \tag{2.18}$$

$$\pi^{(v+1)}_{ijk} = \pi^{*(v+1)}_{ijk}\,\pi^{*(v+1)}_{+j+}\,\pi^{*(v+1)}_{++k}/\pi^{*(v+1)}_{+jk}, \quad \forall\, i,j,k, \tag{2.19}$$

and the other six steps are similar modifications of (2.9), (2.10), (2.11), and (2.12) into procedures like (2.17), (2.18), and (2.19). The rationale behind the whole procedure is that we first obtain $\pi^{*(v)}$ in the parameter space specified by the model for the full table, then we adjust $\pi^{*(v)}$ to $\pi^{(v)}$, which is in the intersection of the above space and the space specified by H*(2,3). Convergence can be achieved if there is no empty cell in the full table, since the likelihood function is concave and bounded above.
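Two pieces of the procedure above lend themselves to a short sketch: the adjustment of a fitted table toward independence on the 23-margin, read here as multiplying each cell by $\pi_{+j+}\pi_{++k}/\pi_{+jk}$, and the likelihood ratio statistic as a sum over the full table and the supplemental 13-margin. The function names and the probability table are hypothetical, and the statistic assumes all counts are positive, as the text assumes for the full table.

```python
from math import log


def adjust_to_23_independence(p):
    """Multiply p_ijk by p_+j+ * p_++k / p_+jk so that the 23-margin factorizes."""
    I, J, K = len(p), len(p[0]), len(p[0][0])
    m23 = [[sum(p[i][j][k] for i in range(I)) for k in range(K)] for j in range(J)]
    m2 = [sum(m23[j]) for j in range(J)]
    m3 = [sum(m23[j][k] for j in range(J)) for k in range(K)]
    return [[[p[i][j][k] * m2[j] * m3[k] / m23[j][k] for k in range(K)]
             for j in range(J)] for i in range(I)]


def likelihood_ratio_statistic(x, V, p):
    """G^2: full-table part plus supplemental 13-margin part (all counts assumed > 0)."""
    I, J, K = len(p), len(p[0]), len(p[0][0])
    n = sum(x[i][j][k] for i in range(I) for j in range(J) for k in range(K))
    n_supp = sum(sum(row) for row in V)               # N - n
    g2 = 0.0
    for i in range(I):
        for k in range(K):
            p_ik = sum(p[i][j][k] for j in range(J))
            g2 += 2 * V[i][k] * log(V[i][k] / (n_supp * p_ik))
            for j in range(J):
                g2 += 2 * x[i][j][k] * log(x[i][j][k] / (n * p[i][j][k]))
    return g2


# Hypothetical probabilities p[i][j][k] from some fitted full-table model.
p_star = [[[0.20, 0.10], [0.05, 0.05]],
          [[0.05, 0.05], [0.15, 0.35]]]
p_adj = adjust_to_23_independence(p_star)
```

The adjustment leaves the conditional distribution of the fallible classification given the true $X$ and $Y$ unchanged, so the misclassification structure of the input table is preserved while the 23-margin becomes a product of its one-way margins.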
Once the MLE's $\{\hat\pi_{ijk}\}$ are obtained, we can test the goodness-of-fit of the model by computing either the Pearson or the likelihood ratio statistic as in (2.13) and (2.14). For the model H(12,13,23) ∩ H*(2,3), since we have 23-marginal independence constraints on the $u$-terms, we reduce the number of free $u$-terms by $(r-1)(t-1)$, so we have $(r^2t-1) + (rt-1) - 2(r-1) - (t-1) - (r-1)^2 - (r-1)(t-1)$ d.f. for the tests. We will decide whether H*(2,3) is true or not conditional on a particular model for the misclassification matrix. The value of this conditional test statistic does depend on the model we have specified for the full table. It should be noted here that the model H(12,23) ∩ H*(2,3) is equivalent to H(12,3); similarly, H(13,23) ∩ H*(2,3) is H(13,2), and H(1,23) ∩ H*(2,3) is H(1,2,3), which is mutual independence of the three dimensions. The model H(12,13) does not have $u_{23}$-terms, hence the ML equations for H(12,13) ∩ H*(2,3) are not of the type specified in (2.15) and (2.16).

The method described above can be extended easily to higher dimensional tables with many variables subject to misclassification. We will first build log-linear models for the full table (including both fallible and true classifications), which have $u$-terms corresponding to the lower dimensional margin of true classifications. The method for this was explained in detail in Chen (1972). After a model is finally chosen for the full table, we can then build log-linear models for the lower dimensional margin of true classifications using a procedure similar to the one explained in this paper.

The iterative procedures proposed herein are examples of the generalized EM algorithm given in Dempster, Laird, and Rubin (1977). A computer program, which is an extension of Haberman (1972), has been written according to the method in this paper to give MLE's of cell probabilities and counts under different models and to produce both goodness-of-fit statistics with appropriate degrees of freedom. It is available to any interested person upon request.

Tenenbein (1970, 1971, 1972) first proposed using a double sampling scheme to make inferences about categorical data with misclassification. He discussed only the estimation problem in the one variable case, without any assumption on the misclassification matrix. The estimates he obtained are similar to those obtained in Chen and Fienberg (1974). He derived formulas to determine the optimum double sampling ratio ($n/N$) so that the variances of the estimates are smallest; his formulas may be used in our model building problem. Chiacchierini and Arnold (1977) discussed a test of independence for the two variable case with $r = t = 2$, which is our conditional test of H*(3,4) given that H(1234) is true.

3. An Example.

Cobb and Rosenbaum (1956) reported an arthritis study in the Arsenal Health District of Pittsburgh. A household morbidity survey was conducted in July, 1952, using a random sample of 3,000 households. All the persons over 14 years old in these households were classified into three strata, based on the information regarding rheumatism and arthritis obtained by non-medical interviewers: Stratum 1 consisted of individuals who were recorded as having arthritis or rheumatism; Stratum 2 consisted of individuals who were recorded as free of arthritis or rheumatism, but were reported to have some rheumatic symptoms; Stratum 3 was made up of the remainder, who were not recorded as suffering from rheumatism, arthritis, or related manifestations. A random sample of persons was selected for each sex separately and within each stratum. The sampling rate was 60% for males and 30% for females in Strata 1 and 2, and 7% for both males and females in Stratum 3; this resulted in a total sample of 798 persons.
Each person thus sampled was visited in his home by a non-medical interviewer equipped with the detailed arthritis questionnaire, and the individuals who were interviewed were urged to have an examination by physicians in the arthritis clinic. Some persons refused the interview, or were unavailable for interview, and some did not return to the clinic for examination. The final data included 478 people with both the interview and the examination. The data about whether the person had joint pain are given in Table 1. The two "unknown" rows were not reported in Cobb and Rosenbaum (1956); instead, they are generated artificially as supplemental data to demonstrate the methodology. Let the first dimension be the interview result, the second dimension be the physician's history, and the third and fourth dimensions be the strata and the sex.

Table 1. Number of Persons Having Joint Pain by Sex and Stratum as Obtained by Physicians vs. by Non-Medical Interviewers
(Layout: for a. Males and b. Females, the rows give the physician's examination result (Yes, No, Unknown) and the columns give the interview result (Yes, No) within each stratum. The cell counts are not preserved in this transcription.)

We first fit the model H(1234) just to see whether the supplemental data are consistent with the data in the full 2 x 2 x 3 x 2 table. This model fits the data very well, with $X^2 = 4.19$ and $G^2 = 4.20$, 11 d.f. We then try to fit the models which will give us nice interpretations for the misclassification matrix. Among the models H(123,234), H(124,234), and H(134,234), the model H(123,234) fits the data well, with $X^2$ = [illegible] and $G^2 = 11.87$, 17 d.f. When we try to fit simpler models which express the misclassification probabilities as explicit formulas of the marginal probabilities, H(12,234) and H(13,234), both fail to fit the data. We then try to fit the model H(12,13,234), and this model fits well, with $X^2$ = [illegible] and $G^2$ = [illegible], 19 d.f.; therefore, we will use it to interpret the misclassification matrix.
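The degrees of freedom quoted above can be reproduced by counting free $u$-terms. The sketch below assumes the same counting rule as the three-way formulas in Section 2, extended to any hierarchical model: (cells in the full table minus 1) plus (cells in the supplemental margin minus 1) minus the number of free $u$-terms. The function names are hypothetical, and taking the supplemental margin to be dimensions 1, 3, and 4 is an inference from the "unknown" physician rows in Table 1.

```python
from itertools import combinations
from math import prod


def count_u_terms(dims, generators):
    """Free u-terms (constant excluded) of the hierarchical model with these generators."""
    terms = set()
    for g in generators:
        for size in range(1, len(g) + 1):
            terms.update(combinations(sorted(g), size))
    return sum(prod(dims[d - 1] - 1 for d in term) for term in terms)


def double_sampling_df(dims, supplemental, generators):
    """(cells in full table - 1) + (cells in supplemental margin - 1) - free u-terms."""
    full_cells = prod(dims)
    supp_cells = prod(dims[d - 1] for d in supplemental)
    return (full_cells - 1) + (supp_cells - 1) - count_u_terms(dims, generators)


dims = (2, 2, 3, 2)          # interview, physician, stratum, sex
supplemental = (1, 3, 4)     # dimensions observed on the supplemental subjects

for name, gens in [("H(1234)", [(1, 2, 3, 4)]),
                   ("H(123,234)", [(1, 2, 3), (2, 3, 4)]),
                   ("H(12,13,234)", [(1, 2), (1, 3), (2, 3, 4)])]:
    print(name, double_sampling_df(dims, supplemental, gens))
```

For the 2 x 2 x 3 x 2 table this gives 11, 17, and 19 d.f. for the three models, in agreement with the values reported in the text.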
Under this model we have

$$\pi_{ijkl} = \pi_{ijk+}\,\pi_{+jkl}/\pi_{+jk+}, \quad \forall\, i,j,k,l, \tag{3.1}$$

or

$$\pi_{ijkl}/\pi_{+jkl} = \pi_{ijk+}/\pi_{+jk+}. \tag{3.2}$$

Hence, the misclassification matrices are uniform over sex, and depend only on strata. Now we investigate the relationship among true joint pain, strata, and sex on the 234-margin, given that the model H(12,13,234) is true; it turns out that the simplest model which still has a good fit is H(12,13,234) ∩ H*(23,4), with $X^2$ = [illegible], $G^2 = 16.14$, 24 d.f. But, since we have the fixed sex by strata margin (34-margin) originally, we have to settle on the model H(12,13,234) ∩ H*(23,34) as the final model: the joint pain and the sex are conditionally independent given the strata. The conclusion is that the prevalence rate of joint pain is not a function of sex, but only a function of strata. The estimates of the proportions of classification errors, and the estimates of the prevalence rates for joint pain under the final model H(12,13,234) ∩ H*(23,34), are given in Table 2 by stratum.

Table 2. The Estimates of Proportion of Classification Errors and the Estimates of Prevalence for Joint Pain by Stratum Under the Model H(12,13,234) ∩ H*(23,34)
(Layout: a. Classification Errors, with columns False Negatives and False Positives; b. Prevalence Estimates, with columns Physicians' and Interviewers'; one row per stratum. The estimates are not preserved in this transcription.)

REFERENCES

Assakul, K., and Proctor, C.H. (1967), "Testing Independence in Two-Way Contingency Tables with Data Subject to Misclassification," Psychometrika, 32.

Bishop, Y.M.M., Fienberg, S.E., and Holland, P.W. (1975), Discrete Multivariate Analysis: Theory and Practice, Cambridge, Massachusetts: MIT Press.

Bross, I. (1954), "Misclassification in 2 x 2 Tables," Biometrics, 10.

Chen, T.T. (1972), "Mixed-up Frequencies and Missing Data in Contingency Tables," Unpublished Ph.D. Dissertation, Dept. of Statistics, Univ. of Chicago.

Chen, T.T., and Fienberg, S.E. (1974), "Two-Dimensional Contingency Tables with Both Completely and Partially Cross-Classified Data," Biometrics, 30.

Chiacchierini, R.P., and Arnold, J.C. (1977), "A Two-Sample Test for Independence in 2 x 2 Contingency Tables with Both Margins Subject to Misclassification," J. Amer. Statist. Assoc., 72.

Cobb, S., and Rosenbaum, J. (1956), "A Comparison of Specific Symptom Data Obtained by Nonmedical Interviewers and by Physicians," J. Chronic Diseases, 4.

Dempster, A.P., Laird, N.M., and Rubin, D.B. (1977), "Maximum Likelihood from Incomplete Data via the EM Algorithm," J. R. Statist. Soc., Ser. B, 39.

Diamond, E.L., and Lilienfeld, A.M. (1962), "Effects of Errors in Classification and Diagnosis in Various Types of Epidemiological Studies," Amer. J. Public Health, 52.

Goodman, L.A. (1970), "The Multivariate Analysis of Qualitative Data: Interactions Among Multiple Classifications," J. Amer. Statist. Assoc., 65.

Goodman, L.A. (1971), "The Analysis of Multidimensional Contingency Tables: Stepwise Procedures and Direct Estimation Methods for Building Models for Multiple Classification," Technometrics, 13.

Haberman, S.J. (1972), "Log-Linear Fit for Contingency Tables," Applied Statist., 21.

Haberman, S.J. (1974), The Analysis of Frequency Data, Chicago: Univ. of Chicago Press.

Keys, A., and Kihlberg, J.K. (1963), "Effects of Misclassification on Estimated Relative Prevalence of a Characteristic," Amer. J. Public Health, 53.

Mote, V.L., and Anderson, R.L. (1962), "An Investigation of the Effect of Misclassification on the Properties of X2-Test in the Analysis of Categorical Data," Biometrika, 52.

Newell, D.J. (1962), "Errors in the Interpretation of Errors in Epidemiology," Amer. J. Public Health, 52.

Rogot, E. (1961), "A Note on Measurement Errors and Detecting Real Differences," J. Amer. Statist. Assoc., 56.

Rubin, T., Rosenbaum, J., and Cobb, S. (1956), "The Use of Interview Data for the Detection of Associations in Field Studies," J. Chronic Diseases, 4.

Tenenbein, A. (1970), "A Double Sampling Scheme for Estimating from Binomial Data with Misclassification," J. Amer. Statist. Assoc., 65.

Tenenbein, A. (1971), "A Double Sampling Scheme for Estimating From Binomial Data With Misclassification: Sample Size Determination," Biometrics, 27.

Tenenbein, A. (1972), "A Double Sampling Scheme for Estimating From Misclassified Multinomial Data With Application to Sampling Inspection," Technometrics, 14.


Bayesian methods for categorical data under informative censoring Bayesian Analysis (2008) 3, Number 3, pp. 541 554 Bayesian methods for categorical data under informative censoring Thomas J. Jiang and James M. Dickey Abstract. Bayesian methods are presented for categorical

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) (b) (c) (d) (e) In 2 2 tables, statistical independence is equivalent

More information

Frequency Distribution Cross-Tabulation

Frequency Distribution Cross-Tabulation Frequency Distribution Cross-Tabulation 1) Overview 2) Frequency Distribution 3) Statistics Associated with Frequency Distribution i. Measures of Location ii. Measures of Variability iii. Measures of Shape

More information

Lecture 01: Introduction

Lecture 01: Introduction Lecture 01: Introduction Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina Lecture 01: Introduction

More information

13.1 Categorical Data and the Multinomial Experiment

13.1 Categorical Data and the Multinomial Experiment Chapter 13 Categorical Data Analysis 13.1 Categorical Data and the Multinomial Experiment Recall Variable: (numerical) variable (i.e. # of students, temperature, height,). (non-numerical, categorical)

More information

Short Note: Naive Bayes Classifiers and Permanence of Ratios

Short Note: Naive Bayes Classifiers and Permanence of Ratios Short Note: Naive Bayes Classifiers and Permanence of Ratios Julián M. Ortiz (jmo1@ualberta.ca) Department of Civil & Environmental Engineering University of Alberta Abstract The assumption of permanence

More information

DIAGNOSTICS FOR STRATIFIED CLINICAL TRIALS IN PROPORTIONAL ODDS MODELS

DIAGNOSTICS FOR STRATIFIED CLINICAL TRIALS IN PROPORTIONAL ODDS MODELS DIAGNOSTICS FOR STRATIFIED CLINICAL TRIALS IN PROPORTIONAL ODDS MODELS Ivy Liu and Dong Q. Wang School of Mathematics, Statistics and Computer Science Victoria University of Wellington New Zealand Corresponding

More information

STAT Chapter 13: Categorical Data. Recall we have studied binomial data, in which each trial falls into one of 2 categories (success/failure).

STAT Chapter 13: Categorical Data. Recall we have studied binomial data, in which each trial falls into one of 2 categories (success/failure). STAT 515 -- Chapter 13: Categorical Data Recall we have studied binomial data, in which each trial falls into one of 2 categories (success/failure). Many studies allow for more than 2 categories. Example

More information

ST3241 Categorical Data Analysis I Two-way Contingency Tables. 2 2 Tables, Relative Risks and Odds Ratios

ST3241 Categorical Data Analysis I Two-way Contingency Tables. 2 2 Tables, Relative Risks and Odds Ratios ST3241 Categorical Data Analysis I Two-way Contingency Tables 2 2 Tables, Relative Risks and Odds Ratios 1 What Is A Contingency Table (p.16) Suppose X and Y are two categorical variables X has I categories

More information

BOOTSTRAPPING WITH MODELS FOR COUNT DATA

BOOTSTRAPPING WITH MODELS FOR COUNT DATA Journal of Biopharmaceutical Statistics, 21: 1164 1176, 2011 Copyright Taylor & Francis Group, LLC ISSN: 1054-3406 print/1520-5711 online DOI: 10.1080/10543406.2011.607748 BOOTSTRAPPING WITH MODELS FOR

More information

CHL 5225 H Crossover Trials. CHL 5225 H Crossover Trials

CHL 5225 H Crossover Trials. CHL 5225 H Crossover Trials CHL 55 H Crossover Trials The Two-sequence, Two-Treatment, Two-period Crossover Trial Definition A trial in which patients are randomly allocated to one of two sequences of treatments (either 1 then, or

More information

HANDBOOK OF APPLICABLE MATHEMATICS

HANDBOOK OF APPLICABLE MATHEMATICS HANDBOOK OF APPLICABLE MATHEMATICS Chief Editor: Walter Ledermann Volume VI: Statistics PART A Edited by Emlyn Lloyd University of Lancaster A Wiley-Interscience Publication JOHN WILEY & SONS Chichester

More information

Statistical Estimation

Statistical Estimation Statistical Estimation Use data and a model. The plug-in estimators are based on the simple principle of applying the defining functional to the ECDF. Other methods of estimation: minimize residuals from

More information

Confidence Intervals of the Simple Difference between the Proportions of a Primary Infection and a Secondary Infection, Given the Primary Infection

Confidence Intervals of the Simple Difference between the Proportions of a Primary Infection and a Secondary Infection, Given the Primary Infection Biometrical Journal 42 (2000) 1, 59±69 Confidence Intervals of the Simple Difference between the Proportions of a Primary Infection and a Secondary Infection, Given the Primary Infection Kung-Jong Lui

More information

Testing Non-Linear Ordinal Responses in L2 K Tables

Testing Non-Linear Ordinal Responses in L2 K Tables RUHUNA JOURNA OF SCIENCE Vol. 2, September 2007, pp. 18 29 http://www.ruh.ac.lk/rjs/ ISSN 1800-279X 2007 Faculty of Science University of Ruhuna. Testing Non-inear Ordinal Responses in 2 Tables eslie Jayasekara

More information

A note on shaved dice inference

A note on shaved dice inference A note on shaved dice inference Rolf Sundberg Department of Mathematics, Stockholm University November 23, 2016 Abstract Two dice are rolled repeatedly, only their sum is registered. Have the two dice

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) T In 2 2 tables, statistical independence is equivalent to a population

More information

Logistic Regression: Regression with a Binary Dependent Variable

Logistic Regression: Regression with a Binary Dependent Variable Logistic Regression: Regression with a Binary Dependent Variable LEARNING OBJECTIVES Upon completing this chapter, you should be able to do the following: State the circumstances under which logistic regression

More information

Sections 2.3, 2.4. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis 1 / 21

Sections 2.3, 2.4. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis 1 / 21 Sections 2.3, 2.4 Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 21 2.3 Partial association in stratified 2 2 tables In describing a relationship

More information

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS Duration - 3 hours Aids Allowed: Calculator LAST NAME: FIRST NAME: STUDENT NUMBER: There are 27 pages

More information

Latent Class Analysis for Models with Error of Measurement Using Log-Linear Models and An Application to Women s Liberation Data

Latent Class Analysis for Models with Error of Measurement Using Log-Linear Models and An Application to Women s Liberation Data Journal of Data Science 9(2011), 43-54 Latent Class Analysis for Models with Error of Measurement Using Log-Linear Models and An Application to Women s Liberation Data Haydar Demirhan Hacettepe University

More information

Module 10: Analysis of Categorical Data Statistics (OA3102)

Module 10: Analysis of Categorical Data Statistics (OA3102) Module 10: Analysis of Categorical Data Statistics (OA3102) Professor Ron Fricker Naval Postgraduate School Monterey, California Reading assignment: WM&S chapter 14.1-14.7 Revision: 3-12 1 Goals for this

More information

HST.582J / 6.555J / J Biomedical Signal and Image Processing Spring 2007

HST.582J / 6.555J / J Biomedical Signal and Image Processing Spring 2007 MIT OpenCourseWare http://ocw.mit.edu HST.582J / 6.555J / 16.456J Biomedical Signal and Image Processing Spring 2007 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

More information

Epidemiology Wonders of Biostatistics Chapter 11 (continued) - probability in a single population. John Koval

Epidemiology Wonders of Biostatistics Chapter 11 (continued) - probability in a single population. John Koval Epidemiology 9509 Wonders of Biostatistics Chapter 11 (continued) - probability in a single population John Koval Department of Epidemiology and Biostatistics University of Western Ontario What is being

More information

Lecture 25. Ingo Ruczinski. November 24, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University

Lecture 25. Ingo Ruczinski. November 24, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University Lecture 25 Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University November 24, 2015 1 2 3 4 5 6 7 8 9 10 11 1 Hypothesis s of homgeneity 2 Estimating risk

More information

Three-Way Contingency Tables

Three-Way Contingency Tables Newsom PSY 50/60 Categorical Data Analysis, Fall 06 Three-Way Contingency Tables Three-way contingency tables involve three binary or categorical variables. I will stick mostly to the binary case to keep

More information

STAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression

STAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression STAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression Rebecca Barter April 20, 2015 Fisher s Exact Test Fisher s Exact Test

More information

Chapter 10. Discrete Data Analysis

Chapter 10. Discrete Data Analysis Chapter 1. Discrete Data Analysis 1.1 Inferences on a Population Proportion 1. Comparing Two Population Proportions 1.3 Goodness of Fit Tests for One-Way Contingency Tables 1.4 Testing for Independence

More information

Lecture 14: Introduction to Poisson Regression

Lecture 14: Introduction to Poisson Regression Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why

More information

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week

More information

Biostat 2065 Analysis of Incomplete Data

Biostat 2065 Analysis of Incomplete Data Biostat 2065 Analysis of Incomplete Data Gong Tang Dept of Biostatistics University of Pittsburgh October 20, 2005 1. Large-sample inference based on ML Let θ is the MLE, then the large-sample theory implies

More information

Survival Analysis for Case-Cohort Studies

Survival Analysis for Case-Cohort Studies Survival Analysis for ase-ohort Studies Petr Klášterecký Dept. of Probability and Mathematical Statistics, Faculty of Mathematics and Physics, harles University, Prague, zech Republic e-mail: petr.klasterecky@matfyz.cz

More information

Chapter 10. Chapter 10. Multinomial Experiments and. Multinomial Experiments and Contingency Tables. Contingency Tables.

Chapter 10. Chapter 10. Multinomial Experiments and. Multinomial Experiments and Contingency Tables. Contingency Tables. Chapter 10 Multinomial Experiments and Contingency Tables 1 Chapter 10 Multinomial Experiments and Contingency Tables 10-1 1 Overview 10-2 2 Multinomial Experiments: of-fitfit 10-3 3 Contingency Tables:

More information

Describing Stratified Multiple Responses for Sparse Data

Describing Stratified Multiple Responses for Sparse Data Describing Stratified Multiple Responses for Sparse Data Ivy Liu School of Mathematical and Computing Sciences Victoria University Wellington, New Zealand June 28, 2004 SUMMARY Surveys often contain qualitative

More information

11-2 Multinomial Experiment

11-2 Multinomial Experiment Chapter 11 Multinomial Experiments and Contingency Tables 1 Chapter 11 Multinomial Experiments and Contingency Tables 11-11 Overview 11-2 Multinomial Experiments: Goodness-of-fitfit 11-3 Contingency Tables:

More information

EM for ML Estimation

EM for ML Estimation Overview EM for ML Estimation An algorithm for Maximum Likelihood (ML) Estimation from incomplete data (Dempster, Laird, and Rubin, 1977) 1. Formulate complete data so that complete-data ML estimation

More information

Decomposition of Parsimonious Independence Model Using Pearson, Kendall and Spearman s Correlations for Two-Way Contingency Tables

Decomposition of Parsimonious Independence Model Using Pearson, Kendall and Spearman s Correlations for Two-Way Contingency Tables International Journal of Statistics and Probability; Vol. 7 No. 3; May 208 ISSN 927-7032 E-ISSN 927-7040 Published by Canadian Center of Science and Education Decomposition of Parsimonious Independence

More information

Lecture 41 Sections Mon, Apr 7, 2008

Lecture 41 Sections Mon, Apr 7, 2008 Lecture 41 Sections 14.1-14.3 Hampden-Sydney College Mon, Apr 7, 2008 Outline 1 2 3 4 5 one-proportion test that we just studied allows us to test a hypothesis concerning one proportion, or two categories,

More information

Multi-Level Test of Independence for 2 X 2 Contingency Table using Cochran and Mantel Haenszel Statistics

Multi-Level Test of Independence for 2 X 2 Contingency Table using Cochran and Mantel Haenszel Statistics IJISET - International Journal of Innovative Science, Engineering & Technology, Vol. Issue 8, August 015. ISSN 348 7968 Multi-Level Test of Independence for X Contingency Table using Cochran and Mantel

More information

Good Confidence Intervals for Categorical Data Analyses. Alan Agresti

Good Confidence Intervals for Categorical Data Analyses. Alan Agresti Good Confidence Intervals for Categorical Data Analyses Alan Agresti Department of Statistics, University of Florida visiting Statistics Department, Harvard University LSHTM, July 22, 2011 p. 1/36 Outline

More information

Chapte The McGraw-Hill Companies, Inc. All rights reserved.

Chapte The McGraw-Hill Companies, Inc. All rights reserved. er15 Chapte Chi-Square Tests d Chi-Square Tests for -Fit Uniform Goodness- Poisson Goodness- Goodness- ECDF Tests (Optional) Contingency Tables A contingency table is a cross-tabulation of n paired observations

More information

The Analysis of Multivariate Misclassified Data With Special Attention to Randomized Response Data

The Analysis of Multivariate Misclassified Data With Special Attention to Randomized Response Data The Analysis of Multivariate Misclassified Data With Special Attention to Randomized Response Data ARDO van den HOUT PETER G. M. van der HEIJDEN Utrecht University, The Netherlands This article discusses

More information

Generalized linear models with a coarsened covariate

Generalized linear models with a coarsened covariate Appl. Statist. (2004) 53, Part 2, pp. 279 292 Generalized linear models with a coarsened covariate Stuart Lipsitz, Medical University of South Carolina, Charleston, USA Michael Parzen, University of Chicago,

More information

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model 1 Linear Regression 2 Linear Regression In this lecture we will study a particular type of regression model: the linear regression model We will first consider the case of the model with one predictor

More information

Log-linear Models for Contingency Tables

Log-linear Models for Contingency Tables Log-linear Models for Contingency Tables Statistics 149 Spring 2006 Copyright 2006 by Mark E. Irwin Log-linear Models for Two-way Contingency Tables Example: Business Administration Majors and Gender A

More information

Basics on Probability. Jingrui He 09/11/2007

Basics on Probability. Jingrui He 09/11/2007 Basics on Probability Jingrui He 09/11/2007 Coin Flips You flip a coin Head with probability 0.5 You flip 100 coins How many heads would you expect Coin Flips cont. You flip a coin Head with probability

More information

Lecture 2: Poisson and logistic regression

Lecture 2: Poisson and logistic regression Dankmar Böhning Southampton Statistical Sciences Research Institute University of Southampton, UK S 3 RI, 11-12 December 2014 introduction to Poisson regression application to the BELCAP study introduction

More information

Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion

Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion Glenn Heller and Jing Qin Department of Epidemiology and Biostatistics Memorial

More information

Maximum likelihood estimation via the ECM algorithm: A general framework

Maximum likelihood estimation via the ECM algorithm: A general framework Biometrika (1993), 80, 2, pp. 267-78 Printed in Great Britain Maximum likelihood estimation via the ECM algorithm: A general framework BY XIAO-LI MENG Department of Statistics, University of Chicago, Chicago,

More information

3 Way Tables Edpsy/Psych/Soc 589

3 Way Tables Edpsy/Psych/Soc 589 3 Way Tables Edpsy/Psych/Soc 589 Carolyn J. Anderson Department of Educational Psychology I L L I N O I S university of illinois at urbana-champaign c Board of Trustees, University of Illinois Spring 2017

More information

Lectures of STA 231: Biostatistics

Lectures of STA 231: Biostatistics Lectures of STA 231: Biostatistics Second Semester Academic Year 2016/2017 Text Book Biostatistics: Basic Concepts and Methodology for the Health Sciences (10 th Edition, 2014) By Wayne W. Daniel Prepared

More information

Prediction of ordinal outcomes when the association between predictors and outcome diers between outcome levels

Prediction of ordinal outcomes when the association between predictors and outcome diers between outcome levels STATISTICS IN MEDICINE Statist. Med. 2005; 24:1357 1369 Published online 26 November 2004 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/sim.2009 Prediction of ordinal outcomes when the

More information

Statistics in medicine

Statistics in medicine Statistics in medicine Lecture 4: and multivariable regression Fatma Shebl, MD, MS, MPH, PhD Assistant Professor Chronic Disease Epidemiology Department Yale School of Public Health Fatma.shebl@yale.edu

More information

Probability and Probability Distributions. Dr. Mohammed Alahmed

Probability and Probability Distributions. Dr. Mohammed Alahmed Probability and Probability Distributions 1 Probability and Probability Distributions Usually we want to do more with data than just describing them! We might want to test certain specific inferences about

More information

Con dentiality, Uniqueness, and Disclosure Limitation for Categorical Data 1

Con dentiality, Uniqueness, and Disclosure Limitation for Categorical Data 1 Journal of Of cial Statistics, Vol. 14, No. 4, 1998, pp. 385±397 Con dentiality, Uniqueness, and Disclosure Limitation for Categorical Data 1 Stephen E. Fienberg 2 and Udi E. Makov 3 When an agency releases

More information

Categorical Data Analysis Chapter 3

Categorical Data Analysis Chapter 3 Categorical Data Analysis Chapter 3 The actual coverage probability is usually a bit higher than the nominal level. Confidence intervals for association parameteres Consider the odds ratio in the 2x2 table,

More information

Chapter 2: Describing Contingency Tables - II

Chapter 2: Describing Contingency Tables - II : Describing Contingency Tables - II Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu]

More information

TWO-WAY CONTINGENCY TABLES UNDER CONDITIONAL HOT DECK IMPUTATION

TWO-WAY CONTINGENCY TABLES UNDER CONDITIONAL HOT DECK IMPUTATION Statistica Sinica 13(2003), 613-623 TWO-WAY CONTINGENCY TABLES UNDER CONDITIONAL HOT DECK IMPUTATION Hansheng Wang and Jun Shao Peking University and University of Wisconsin Abstract: We consider the estimation

More information

Categorical Data Analysis 1

Categorical Data Analysis 1 Categorical Data Analysis 1 STA 312: Fall 2012 1 See last slide for copyright information. 1 / 1 Variables and Cases There are n cases (people, rats, factories, wolf packs) in a data set. A variable is

More information

Section 4.6 Simple Linear Regression

Section 4.6 Simple Linear Regression Section 4.6 Simple Linear Regression Objectives ˆ Basic philosophy of SLR and the regression assumptions ˆ Point & interval estimation of the model parameters, and how to make predictions ˆ Point and interval

More information

MAT 2379, Introduction to Biostatistics, Sample Calculator Questions 1. MAT 2379, Introduction to Biostatistics

MAT 2379, Introduction to Biostatistics, Sample Calculator Questions 1. MAT 2379, Introduction to Biostatistics MAT 2379, Introduction to Biostatistics, Sample Calculator Questions 1 MAT 2379, Introduction to Biostatistics Sample Calculator Problems for the Final Exam Note: The exam will also contain some problems

More information

Modelling Rates. Mark Lunt. Arthritis Research UK Epidemiology Unit University of Manchester

Modelling Rates. Mark Lunt. Arthritis Research UK Epidemiology Unit University of Manchester Modelling Rates Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 05/12/2017 Modelling Rates Can model prevalence (proportion) with logistic regression Cannot model incidence in

More information

JOINT MODELING OF TIME-TO-EVENT DATA AND MULTIPLE RATINGS OF A DISCRETE DIAGNOSTIC TEST WITHOUT A GOLD STANDARD

JOINT MODELING OF TIME-TO-EVENT DATA AND MULTIPLE RATINGS OF A DISCRETE DIAGNOSTIC TEST WITHOUT A GOLD STANDARD JOINT MODELING OF TIME-TO-EVENT DATA AND MULTIPLE RATINGS OF A DISCRETE DIAGNOSTIC TEST WITHOUT A GOLD STANDARD by Seung Hyun Won B.S. in Statistics, Chung-Ang University, Korea, 2007 M.S. in Biostatistics,

More information

The identification of synergism in the sufficient-component cause framework

The identification of synergism in the sufficient-component cause framework * Title Page Original Article The identification of synergism in the sufficient-component cause framework Tyler J. VanderWeele Department of Health Studies, University of Chicago James M. Robins Departments

More information

Generalization to Multi-Class and Continuous Responses. STA Data Mining I

Generalization to Multi-Class and Continuous Responses. STA Data Mining I Generalization to Multi-Class and Continuous Responses STA 5703 - Data Mining I 1. Categorical Responses (a) Splitting Criterion Outline Goodness-of-split Criterion Chi-square Tests and Twoing Rule (b)

More information

L2: Review of probability and statistics

L2: Review of probability and statistics Probability L2: Review of probability and statistics Definition of probability Axioms and properties Conditional probability Bayes theorem Random variables Definition of a random variable Cumulative distribution

More information

FE670 Algorithmic Trading Strategies. Stevens Institute of Technology

FE670 Algorithmic Trading Strategies. Stevens Institute of Technology FE670 Algorithmic Trading Strategies Lecture 3. Factor Models and Their Estimation Steve Yang Stevens Institute of Technology 09/12/2012 Outline 1 The Notion of Factors 2 Factor Analysis via Maximum Likelihood

More information

Dirichlet-multinomial Model with Varying Response Rates over Time

Dirichlet-multinomial Model with Varying Response Rates over Time Journal of Data Science 5(2007), 413-423 Dirichlet-multinomial Model with Varying Response Rates over Time Jeffrey R. Wilson and Grace S. C. Chen Arizona State University Abstract: It is believed that

More information

Loglinear models. STAT 526 Professor Olga Vitek

Loglinear models. STAT 526 Professor Olga Vitek Loglinear models STAT 526 Professor Olga Vitek April 19, 2011 8 Can Use Poisson Likelihood To Model Both Poisson and Multinomial Counts 8-1 Recall: Poisson Distribution Probability distribution: Y - number

More information

Recall from last time: Conditional probabilities. Lecture 2: Belief (Bayesian) networks. Bayes ball. Example (continued) Example: Inference problem

Recall from last time: Conditional probabilities. Lecture 2: Belief (Bayesian) networks. Bayes ball. Example (continued) Example: Inference problem Recall from last time: Conditional probabilities Our probabilistic models will compute and manipulate conditional probabilities. Given two random variables X, Y, we denote by Lecture 2: Belief (Bayesian)

More information

Statistics in medicine

Statistics in medicine Statistics in medicine Lecture 3: Bivariate association : Categorical variables Proportion in one group One group is measured one time: z test Use the z distribution as an approximation to the binomial

More information