Table 2.14: Distribution of 125 subjects by laboratory and +/− category
- Jared Miller
- 5 years ago
Chapter 2: The Kappa Coefficient: A Review

2.5. Kappa Coefficient and the Paradoxes

Kappa's Dependency on Trait Prevalence

On February 9, 2003 we received an e-mail from a researcher asking whether it would be possible to apply the $\mathrm{AC}_1$ coefficient¹² to the data in Table 2.14. The motivation behind this request is presumably the failure of Kappa to provide a reasonable estimate of the extent of agreement between the two laboratories.

¹² Note that $\mathrm{AC}_1$ is an agreement statistic to be discussed in chapter 4, which we recommend as an alternative to Kappa (see Gwet (2008a)).

Table 2.14: Distribution of 125 subjects by laboratory and +/− category

                     Reference Laboratory
Test Laboratory        +       −     Total
+                    120       5       125
−                      0       0         0
Total                120       5       125

A look at Table 2.14 suggests that the test and the reference laboratories agree almost perfectly with respect to the scoring of the 125 participating subjects, with the exception of 5 subjects classified as positive by the test laboratory and as negative by the reference laboratory. The high agreement on one category is an indication of high prevalence of the positive trait in the subject population being tested. The Kappa coefficient associated with this data is obtained as follows:

$$p_a = (120 + 0)/125 = 0.96, \qquad p_e = (125 \times 120 + 0 \times 5)/125^2 = 0.96, \qquad \gamma_\kappa = \frac{p_a - p_e}{1 - p_e} = 0.$$

Here is an example of a situation where a researcher would normally expect near-perfect agreement between observers, regardless of how it is measured. Yet Kappa yields a coefficient of 0, suggesting a total absence of agreement between the laboratories. We want to make the following 2 comments:

(i) Regardless of how the notion of agreement between laboratories is defined in this particular instance, Kappa does not quantify it well. Therefore, this situation is a paradox.

(ii) The overall agreement probability $p_a = 0.96$ is as expected as the chance-agreement probability $p_e = 0.96$ is unexpected. This raises questions about the very nature of the concept being represented by the chance-agreement probability $p_e$.

According to Cohen (1960), the inventor of the Kappa coefficient, $p_e$ represents "... the proportion of units for which agreement is expected by chance." An examination of the expression of $p_e$ suggests that it measures agreement probability under the following 2 assumptions:

- Both laboratories classify all 125 subjects randomly.
- Each laboratory's random classification is performed according to the observed marginal probabilities (1.0 and 0 for the test laboratory, and 0.96 and 0.04 for the reference laboratory).

It is this second condition that is unreasonable and causes the paradox. The observed marginals indicate that the test laboratory classifies all subjects into the + category with certainty (i.e. with a probability of 1), while the reference laboratory classifies 96% of the subjects into the same category. Generating random ratings according to these probabilities will dramatically, and perhaps artificially, increase the proportion of agreement by chance. The use of observed marginals to define chance agreement (or random rating) may not be reasonable if these marginals are very unbalanced toward one category. Moreover, $p_a$ and $p_e$ (as defined by Cohen (1960)) may not even have a common base for the difference $p_a - p_e$ to have a clear meaning (more on this in chapter 4).

The use of marginal probabilities as an objective means for quantifying the chance-agreement probability $p_e$ is questionable, and is at the origin of the Kappa paradoxes. On this issue, Feinstein and Cicchetti (1990, p. 548) say the following:

"The reasoning makes the assumption that each observer has a relatively fixed prior probability of making positive or negative responses. The assumption does not seem appropriate, however for most clinical observers. If unbiased, the observers will usually respond to whatever is presented in each particular instance of challenge. The observers may develop a fixed prior probability if they know in advance that the challenge population is predominantly normal or abnormal, positive or negative - but there is no reason to assume that such probabilities will be established in advance if the observers are blind to the characteristics of the challenge population."

In our view, this paradox is not created by a high trait prevalence (which is often unknown), as often suggested in the literature. Instead, it finds its origin in the way the researcher defines agreement by chance. In the current formulation of Kappa, all or most ratings associated with one category could be used in the calculation of $p_e$ as if they were assigned randomly. This may lead to a chance-agreement probability that is higher than the overall agreement probability, producing a negative value for the Kappa coefficient. Even in the case of Kappa, trait prevalence will distort the coefficient's value only if it is high and the raters are assumed to have classified the subjects correctly. This problem is discussed extensively by Gwet (2008a).

Regardless of what causes Kappa paradoxes, they pose a serious underestimation problem in the extreme situations where the table is very unbalanced in its marginals. Kraemer et al. (2002, p. 2114) attempted to defend the merit of Kappa and its paradoxical behavior in the situation illustrated in Table 2.14 by saying the following:

"It is useful to note that κ = 0 indicates either that the heterogeneity of the patients in the population is not well detected by the raters or ratings, or that the patients in the population are homogeneous. Consequently, it is well known that it is very difficult to achieve high reliability of any measure (binary or not) in a very homogeneous population (P near 0 or 1 for binary measures). That is not a flaw in kappa... or any other measure of reliability, or a paradox. It merely reflects the fact that it is difficult to make clear distinctions between the patients in a population in which those distinctions are very rare or fine. In such populations, noise quickly overwhelms the signals."

This statement is very convoluted, and does not explain why a homogeneous population (i.e. high prevalence) should prevent two raters from agreeing on the classification of subjects that appear to have everything in common. Is Kappa measuring the extent of agreement among raters then? Moreover, what is known in statistical science is not the difficulty of achieving high reliability in very homogeneous populations ($P = 0$ or $P = 1$); it is rather the difficulty of achieving high reliability in very heterogeneous populations (i.e. $P \approx 0.5$), which is the situation in which the variance reaches its maximum.

Kappa's Dependency on Marginal Homogeneity

Feinstein and Cicchetti (1990) presented Tables 2.15 and 2.16 to illustrate the second Kappa paradox, which is characterized by a low Kappa value associated with good agreement among raters on marginal counts. These authors argue that good agreement on marginals should translate into good interrater agreement¹³. The Kappa value associated with the Table 2.15 data is $\gamma_\kappa = 0.13$, which is half the Kappa value of $\gamma_\kappa = 0.26$ associated with the Table 2.16 data. The point here is that observers A and B should not be penalized for having marginal probabilities that are similar, as in Table 2.15.

¹³ Although reasonable, this statement could be disputed and may be inaccurate if the overall agreement $p_a$ among raters is low.
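The arithmetic behind the first paradox (Table 2.14) can be checked with a short script. The following is a minimal sketch, not part of the original text: the function name `cohen_kappa` is illustrative, and the cell counts are inferred from the description of Table 2.14 (125 subjects, all classified as + by the test laboratory, 5 of them classified as − by the reference laboratory).

```python
def cohen_kappa(table):
    """Return (p_a, p_e, kappa) for a square contingency table of counts.

    Rows hold one rater's categories, columns the other rater's.
    Raises ZeroDivisionError if p_e equals 1 (kappa is then undefined).
    """
    q = len(table)
    n = sum(sum(row) for row in table)
    # Overall agreement: proportion of subjects on the diagonal.
    p_a = sum(table[k][k] for k in range(q)) / n
    # Observed marginal proportions of each rater.
    row_marg = [sum(table[k]) / n for k in range(q)]
    col_marg = [sum(table[k][l] for k in range(q)) / n for l in range(q)]
    # Chance agreement computed from the observed marginals (Cohen, 1960).
    p_e = sum(row_marg[k] * col_marg[k] for k in range(q))
    kappa = (p_a - p_e) / (1 - p_e)
    return p_a, p_e, kappa


# Counts inferred from the text: rows = test laboratory (+, -),
# columns = reference laboratory (+, -).
table_2_14 = [[120, 5],
              [0, 0]]

p_a, p_e, kappa = cohen_kappa(table_2_14)
print(f"p_a = {p_a:.2f}, p_e = {p_e:.2f}, kappa = {kappa:.2f}")
```

Because the test laboratory's marginals are (1.0, 0), the chance-agreement term collapses to the reference laboratory's + proportion, which is exactly the observed agreement; the numerator of Kappa is therefore zero no matter how few disagreements there are.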
The root cause of the second Kappa paradox is similar to that of the first Kappa paradox (the one associated with high trait prevalence). That is, to obtain Cohen's chance-agreement probability $p_e$ for Table 2.15, one must make the (strong and often unreasonable) assumption that the observed marginal probabilities, (0.7, 0.3) for Observer A and (0.6, 0.4) for Observer B, are fixed and predetermined "yes" and "no" classification propensities specific to each observer, which remain associated with them whether they classify a subject randomly or not. A major implication of this assumption for Table 2.15 is the unduly high chance-agreement probability on the "yes" category alone of $0.7 \times 0.6 = 0.42$, which will lead to a low Kappa value.

Table 2.15: Distribution of 100 subjects by rater: symmetrical imbalance
Rows: Observer B (Yes, No, Total); columns: Observer A (Yes, No, Total).

Table 2.16: Distribution of 100 subjects by rater: asymmetrical imbalance
Rows: Observer B (Yes, No, Total); columns: Observer A (Yes, No, Total).

We indicated at the beginning of this section that the use of 3 or more nominal or ordinal categories may create different types of disagreements, some potentially more serious than others. Researchers may want to consider the less serious disagreements as partial agreements and treat them accordingly. This problem has been resolved by weighting the Kappa coefficient, and is briefly discussed in the next section.

2.6 Weighting of the Kappa Coefficient

Tables 2.17 and 2.18 contain pregnancy type data collected from 100 women who presented themselves in an Emergency Room with a positive pregnancy test and a second condition, which is either abdominal pain or vaginal bleeding. After reviewing their medical records, three reviewers (also referred to as abstractors) classified them into one of the following 3 pregnancy categories: Ectopic Pregnancy (Ectopic), Abnormal Intrauterine Pregnancy (ABN IUP), and Normal Intrauterine Pregnancy (NOR IUP).
Table 2.17: Distribution of 100 pregnant women by pregnancy type and abstractor, as classified by abstractors 1 and 2
Rows: Abstractor 1 (Ectopic, ABN IUP, NOR IUP, Total); columns: Abstractor 2 (Ectopic, ABN IUP, NOR IUP, Total).

Table 2.18: Distribution of 100 pregnant women by pregnancy type and abstractor, as classified by abstractors 1 and 3
Rows: Abstractor 1 (Ectopic, ABN IUP, NOR IUP, Total); columns: Abstractor 3 (Ectopic, ABN IUP, NOR IUP, Total).

The extent of agreement between abstractors 1 and 2 measured by the Kappa coefficient is given by

$$\gamma_\kappa(1, 2) = \frac{p_a - p_e}{1 - p_e} = \frac{0.89 - 0.4597}{1 - 0.4597} = 0.7964,$$

and is identical to the extent of agreement¹⁴ between abstractors 1 and 3. However, the fundamental issue for a clinician in this chart review is whether the pregnancy is ectopic or intrauterine. That is, a disagreement over the nature (normal versus abnormal) of the intrauterine pregnancy is of secondary importance. This fact suggests that abstractors 1 and 2 are more in agreement than abstractors 1 and 3 are. But the Kappa coefficient does not reflect this reality.

¹⁴ The overall agreement probability $p_a$ and chance-agreement probability $p_e$ are obtained as follows: $p_a = 0.89$ (the diagonal counts summed and divided by 100), and $p_e = (13/100)^2 + (27/100)(24/100) + (60/100)(63/100) = 0.4597$.

This example features a reliability experiment where certain disagreements are more serious than others. The less serious disagreements, such as abnormal versus normal intrauterine pregnancies, are actually partial agreements that should be treated as such. To resolve this problem, Cohen (1968) proposed the Weighted Kappa coefficient, which typically assigns a weight of 1 to full (or diagonal) agreements, and assigns to the disagreements a weight whose magnitude decreases proportionally to their seriousness.

Cohen (1968) proposed the weighted Kappa in the context of 2 raters and $q$-item response categories. A set of weights $w_{kl}$ ($k, l = 1, \ldots, q$) between 0 and 1 must be assigned to the $q^2$ cells of a contingency table similar to Table 2.7, prior to collecting rating data. This a-priori assignment of weights to cells aims at ensuring their independence from the observations, which makes them an integral part of the definitional formulation of the Weighted Kappa coefficient. For the classical unweighted Kappa of Cohen (1960), the weights are defined as $w_{kk} = 1$ for $k = 1, \ldots, q$, and $w_{kl} = 0$ if $k \neq l$. Once the weights are assigned, the weighted Kappa can be defined as follows:

$$\gamma_{\kappa w} = \frac{p_a - p_e}{1 - p_e}, \tag{2.15}$$

where the weighted overall agreement probability $p_a$ and the weighted chance-agreement probability $p_e$ are respectively given by

$$p_a = \sum_{k=1}^{q} \sum_{l=1}^{q} w_{kl}\, p_{kl}, \quad \text{and} \quad p_e = \sum_{k=1}^{q} \sum_{l=1}^{q} w_{kl}\, p_{k+}\, p_{+l}. \tag{2.16}$$

Cohen (1968) suggests that the weights could be assigned in an arbitrary fashion, either by a group of experts or by a particular investigator. Two sets of weights that have been proposed in the literature are the Linear Weights, defined for every cell $(k, l)$ by $w_{kl} = 1 - |k - l|/(q - 1)$, and the Quadratic Weights, defined by $w_{kl} = 1 - (k - l)^2/(q - 1)^2$.

Example 2.5
Let us label the 3 response categories {Ectopic, ABN IUP, NOR IUP} of Table 2.17 numerically as categories 1, 2, and 3.

Table 2.19: Linear and Quadratic Weights for the Pregnancy Classification Example

        Linear Weights            Quadratic Weights
        1      2      3           1      2      3
1       1      0.5    0           1      0.75   0
2       0.5    1      0.5         0.75   1      0.75
3       0      0.5    1           0      0.75   1

Table 2.20: Weighted Cell Proportions ($w_{kl}\, p_{kl}$) of Table 2.17 using Linear and Quadratic Weights
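The weight definitions and equations (2.15) and (2.16) translate directly into code. The sketch below is illustrative only: the function names are assumptions, and `hypothetical_table` is an invented 3 × 3 table of counts used to exercise the functions; it is not the data of Table 2.17 or Table 2.18. Categories are indexed from 0 in the code, which leaves the differences $k - l$ unchanged.

```python
def linear_weights(q):
    """w_kl = 1 - |k - l| / (q - 1)."""
    return [[1 - abs(k - l) / (q - 1) for l in range(q)] for k in range(q)]


def quadratic_weights(q):
    """w_kl = 1 - (k - l)^2 / (q - 1)^2."""
    return [[1 - (k - l) ** 2 / (q - 1) ** 2 for l in range(q)] for k in range(q)]


def identity_weights(q):
    """Weights of the classical unweighted Kappa: 1 on the diagonal, 0 elsewhere."""
    return [[1.0 if k == l else 0.0 for l in range(q)] for k in range(q)]


def weighted_kappa(table, weights):
    """Weighted Kappa following equations (2.15) and (2.16)."""
    q = len(table)
    n = sum(sum(row) for row in table)
    p = [[table[k][l] / n for l in range(q)] for k in range(q)]
    row_marg = [sum(p[k]) for k in range(q)]                        # p_{k+}
    col_marg = [sum(p[k][l] for k in range(q)) for l in range(q)]   # p_{+l}
    # Weighted overall agreement and weighted chance agreement (2.16).
    p_a = sum(weights[k][l] * p[k][l] for k in range(q) for l in range(q))
    p_e = sum(weights[k][l] * row_marg[k] * col_marg[l]
              for k in range(q) for l in range(q))
    return (p_a - p_e) / (1 - p_e)                                  # (2.15)


# Weights for q = 3 categories.
print(linear_weights(3))
print(quadratic_weights(3))

# Hypothetical counts (NOT the data of Table 2.17), rows = rater 1, columns = rater 2.
hypothetical_table = [[20, 5, 0],
                      [4, 25, 6],
                      [1, 4, 35]]
for name, w in [("unweighted", identity_weights(3)),
                ("linear", linear_weights(3)),
                ("quadratic", quadratic_weights(3))]:
    print(name, round(weighted_kappa(hypothetical_table, w), 4))
```

With the identity weights, the function reduces to the unweighted Kappa of Cohen (1960), which offers a simple sanity check before applying linear or quadratic weights.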
More informationIE 316 Exam 1 Fall 2011
IE 316 Exam 1 Fall 2011 I have neither given nor received unauthorized assistance on this exam. Name Signed Date Name Printed 1 1. Suppose the actual diameters x in a batch of steel cylinders are normally
More informationTreatment of Error in Experimental Measurements
in Experimental Measurements All measurements contain error. An experiment is truly incomplete without an evaluation of the amount of error in the results. In this course, you will learn to use some common
More informationRelate Attributes and Counts
Relate Attributes and Counts This procedure is designed to summarize data that classifies observations according to two categorical factors. The data may consist of either: 1. Two Attribute variables.
More informationLogistic Regression: Regression with a Binary Dependent Variable
Logistic Regression: Regression with a Binary Dependent Variable LEARNING OBJECTIVES Upon completing this chapter, you should be able to do the following: State the circumstances under which logistic regression
More information1 Maintaining a Dictionary
15-451/651: Design & Analysis of Algorithms February 1, 2016 Lecture #7: Hashing last changed: January 29, 2016 Hashing is a great practical tool, with an interesting and subtle theory too. In addition
More informationDiscriminative Direction for Kernel Classifiers
Discriminative Direction for Kernel Classifiers Polina Golland Artificial Intelligence Lab Massachusetts Institute of Technology Cambridge, MA 02139 polina@ai.mit.edu Abstract In many scientific and engineering
More informationPropensity Score Matching
Methods James H. Steiger Department of Psychology and Human Development Vanderbilt University Regression Modeling, 2009 Methods 1 Introduction 2 3 4 Introduction Why Match? 5 Definition Methods and In
More informationRelative Effect Sizes for Measures of Risk. Jake Olivier, Melanie Bell, Warren May
Relative Effect Sizes for Measures of Risk Jake Olivier, Melanie Bell, Warren May MATHEMATICS & THE UNIVERSITY OF NEW STATISTICS SOUTH WALES November 2015 1 / 27 Motivating Examples Effect Size Phi and
More information1 [15 points] Frequent Itemsets Generation With Map-Reduce
Data Mining Learning from Large Data Sets Final Exam Date: 15 August 2013 Time limit: 120 minutes Number of pages: 11 Maximum score: 100 points You can use the back of the pages if you run out of space.
More informationEstimating Coefficients in Linear Models: It Don't Make No Nevermind
Psychological Bulletin 1976, Vol. 83, No. 2. 213-217 Estimating Coefficients in Linear Models: It Don't Make No Nevermind Howard Wainer Department of Behavioral Science, University of Chicago It is proved
More informationOptimal rules for timing intercourse to achieve pregnancy
Optimal rules for timing intercourse to achieve pregnancy Bruno Scarpa and David Dunson Dipartimento di Statistica ed Economia Applicate Università di Pavia Biostatistics Branch, National Institute of
More informationProbability and Samples. Sampling. Point Estimates
Probability and Samples Sampling We want the results from our sample to be true for the population and not just the sample But our sample may or may not be representative of the population Sampling error
More informationThe concord Package. August 20, 2006
The concord Package August 20, 2006 Version 1.4-6 Date 2006-08-15 Title Concordance and reliability Author , Ian Fellows Maintainer Measures
More informationEXAMINATION: QUANTITATIVE EMPIRICAL METHODS. Yale University. Department of Political Science
EXAMINATION: QUANTITATIVE EMPIRICAL METHODS Yale University Department of Political Science January 2014 You have seven hours (and fifteen minutes) to complete the exam. You can use the points assigned
More information1. Types of Biological Data 2. Summary Descriptive Statistics
Lecture 1: Basic Descriptive Statistics 1. Types of Biological Data 2. Summary Descriptive Statistics Measures of Central Tendency Measures of Dispersion 3. Assignments 1. Types of Biological Data Scales
More informationRoman Hornung. Ordinal Forests. Technical Report Number 212, 2017 Department of Statistics University of Munich.
Roman Hornung Ordinal Forests Technical Report Number 212, 2017 Department of Statistics University of Munich http://www.stat.uni-muenchen.de Ordinal Forests Roman Hornung 1 October 23, 2017 1 Institute
More informationSampling Distributions
Sampling Distributions Sampling Distribution of the Mean & Hypothesis Testing Remember sampling? Sampling Part 1 of definition Selecting a subset of the population to create a sample Generally random sampling
More informationData Mining: Concepts and Techniques. (3 rd ed.) Chapter 8. Chapter 8. Classification: Basic Concepts
Data Mining: Concepts and Techniques (3 rd ed.) Chapter 8 1 Chapter 8. Classification: Basic Concepts Classification: Basic Concepts Decision Tree Induction Bayes Classification Methods Rule-Based Classification
More informationf rot (Hz) L x (max)(erg s 1 )
How Strongly Correlated are Two Quantities? Having spent much of the previous two lectures warning about the dangers of assuming uncorrelated uncertainties, we will now address the issue of correlations
More informationChapter 19: Logistic regression
Chapter 19: Logistic regression Self-test answers SELF-TEST Rerun this analysis using a stepwise method (Forward: LR) entry method of analysis. The main analysis To open the main Logistic Regression dialog
More informationappstats8.notebook October 11, 2016
Chapter 8 Linear Regression Objective: Students will construct and analyze a linear model for a given set of data. Fat Versus Protein: An Example pg 168 The following is a scatterplot of total fat versus
More informationWorked Examples for Nominal Intercoder Reliability. by Deen G. Freelon October 30,
Worked Examples for Nominal Intercoder Reliability by Deen G. Freelon (deen@dfreelon.org) October 30, 2009 http://www.dfreelon.com/utils/recalfront/ This document is an excerpt from a paper currently under
More informationItem Response Theory and Computerized Adaptive Testing
Item Response Theory and Computerized Adaptive Testing Richard C. Gershon, PhD Department of Medical Social Sciences Feinberg School of Medicine Northwestern University gershon@northwestern.edu May 20,
More informationEVALUATING THE REPEATABILITY OF TWO STUDIES OF A LARGE NUMBER OF OBJECTS: MODIFIED KENDALL RANK-ORDER ASSOCIATION TEST
EVALUATING THE REPEATABILITY OF TWO STUDIES OF A LARGE NUMBER OF OBJECTS: MODIFIED KENDALL RANK-ORDER ASSOCIATION TEST TIAN ZHENG, SHAW-HWA LO DEPARTMENT OF STATISTICS, COLUMBIA UNIVERSITY Abstract. In
More informationClassification and Regression Trees
Classification and Regression Trees Ryan P Adams So far, we have primarily examined linear classifiers and regressors, and considered several different ways to train them When we ve found the linearity
More informationThe Model Building Process Part I: Checking Model Assumptions Best Practice (Version 1.1)
The Model Building Process Part I: Checking Model Assumptions Best Practice (Version 1.1) Authored by: Sarah Burke, PhD Version 1: 31 July 2017 Version 1.1: 24 October 2017 The goal of the STAT T&E COE
More information