Introduction to the Analysis of Tabular Data
Anthropological Sciences 192/292: Data Analysis in the Anthropological Sciences
James Holland Jones & Ian G. Robertson
March 15,
Tabular Data
Is there an association between age at first parturition and breast cancer? Using a case-control study, we can test for an association:

| | BC | No BC | Total |
|---|---|---|---|
| < … | … | … | … |
| Total | … | …,245 | 13,465 |
Some Abstraction

| | Case | Control | Total |
|---|---|---|---|
| Exposed | a | b | a + b |
| Not Exposed | c | d | c + d |
| Total | a + c | b + d | n = a + b + c + d |
Risks
There are two ways to think about risk.
Define $p_1 = a/(a+b)$ as the probability of developing disease for exposed individuals.
Define $p_2 = c/(c+d)$ as the probability of developing disease for unexposed individuals.
The risk difference is $p_1 - p_2$.
The risk ratio, or relative risk, is $p_1/p_2$:
$$\mathrm{RR} = \frac{a/(a+b)}{c/(c+d)}$$
If the entries of our table are large enough, a normal approximation (to the binomial distribution, if you must know) applies, and we can use normal theory to calculate standard errors and place confidence bounds around RR.
It turns out that the sampling distribution of $\log \mathrm{RR}$ is better approximated by a normal than that of RR itself, so we work with that:
$$\mathrm{se}[\log \mathrm{RR}] \approx \sqrt{\frac{b}{a n_1} + \frac{d}{c n_2}}$$
where $n_1 = a + b$ and $n_2 = c + d$ are the row sums.
The same logic that led to confidence bounds on a sample mean with known standard deviation then leads to the following expression for a 95% confidence interval on the log relative risk:
$$c_1 = \log \mathrm{RR} - 1.96\sqrt{\frac{b}{a n_1} + \frac{d}{c n_2}}, \qquad c_2 = \log \mathrm{RR} + 1.96\sqrt{\frac{b}{a n_1} + \frac{d}{c n_2}}$$
The CI on the relative risk is then $[e^{c_1}, e^{c_2}]$.
We mentioned large samples: for this approximation to be at all valid, we need $n_1 \hat{p}_1 (1 - \hat{p}_1) \ge 5$ and $n_2 \hat{p}_2 (1 - \hat{p}_2) \ge 5$.
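The relative-risk recipe above (risk difference, RR, log-scale CI, back-transform, and the rule-of-thumb validity check) can be sketched in a few lines of Python. This is a minimal illustration assuming a prospective design; the function name `relative_risk_ci` and the counts are made up for the example.

```python
import math

def relative_risk_ci(a, b, c, d, z=1.96):
    """Risk difference, relative risk, and a normal-approximation CI for RR.

    Cells follow the 2x2 layout above: a, b = exposed with/without disease,
    c, d = unexposed with/without disease (prospective design assumed).
    """
    n1, n2 = a + b, c + d              # row sums
    p1, p2 = a / n1, c / n2            # risk in exposed and unexposed
    rr = p1 / p2
    se_log_rr = math.sqrt(b / (a * n1) + d / (c * n2))
    c1 = math.log(rr) - z * se_log_rr  # lower bound on the log scale
    c2 = math.log(rr) + z * se_log_rr  # upper bound on the log scale
    # Rule of thumb from the notes: n_i * p_i * (1 - p_i) >= 5 in both rows
    valid = n1 * p1 * (1 - p1) >= 5 and n2 * p2 * (1 - p2) >= 5
    return p1 - p2, rr, (math.exp(c1), math.exp(c2)), valid

# Hypothetical counts, for illustration only
rd, rr, ci, ok = relative_risk_ci(30, 70, 15, 85)
print(f"RD = {rd:.3f}, RR = {rr:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), approx OK: {ok}")
```

Note that the interval is symmetric around $\log \mathrm{RR}$, not around RR itself, which is why the bounds sit unevenly around the point estimate once exponentiated.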
Problems with Relative Risks
The relative risk is a pretty intuitive idea. The problem is that it is constrained by the denominator: since $p_1 \le 1$, if $p_2 = 0.5$ the biggest the relative risk could be is 2.
Another problem relates to the fact that we often (usually?) don't have a prospective design for our data collection. Without the prospective design, relative risks don't make much sense, since we shouldn't believe that $a/(a+b)$ is a good estimator of $p_1$.
We can get around this with odds ratios. For probability of success $p$, define the odds of $p$ as:
$$\text{Odds} = \frac{p}{1-p}$$
Note that probabilities are (by definition) bounded by 0 and 1, while odds are bounded by 0 and $\infty$ (this becomes important later).
Look at $p_1$, the probability of disease given exposure. The odds of $p_1$ are
$$\frac{p_1}{1-p_1} = \frac{a/(a+b)}{b/(a+b)} = \frac{a}{b}$$
The odds of $p_2$ (probability of disease without exposure) are:
$$\frac{p_2}{1-p_2} = \frac{c/(c+d)}{d/(c+d)} = \frac{c}{d}$$
Define the Odds Ratio as
$$\mathrm{OR} = \frac{a/b}{c/d} = \frac{ad}{bc}$$
Among the many nice features of the OR: if the probabilities ($p_1$, $p_2$) are low, the odds ratio is approximately equal to the relative risk:
$$\widehat{\mathrm{OR}} \approx \mathrm{RR} \quad \text{when } p_1, p_2 < 0.1$$
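To see the rare-outcome approximation at work, here is a small Python comparison of two hypothetical tables that share the same relative risk of 2 but differ in baseline risk. The counts are invented for illustration.

```python
def odds_ratio(a, b, c, d):
    """Cross-product ratio ad/bc for the 2x2 layout above."""
    return (a * d) / (b * c)

def relative_risk(a, b, c, d):
    """Ratio of row risks, a/(a+b) over c/(c+d)."""
    return (a / (a + b)) / (c / (c + d))

# Hypothetical tables with the same RR = 2 but different baseline risks
rare = (10, 990, 5, 995)      # p1 = 0.01, p2 = 0.005
common = (60, 40, 30, 70)     # p1 = 0.60, p2 = 0.30
for name, table in [("rare", rare), ("common", common)]:
    print(name, round(relative_risk(*table), 3), round(odds_ratio(*table), 3))
```

The rare-outcome table gives an OR of about 2.01 against an RR of 2; the common-outcome table, with the same RR, gives an OR of 3.5.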
On Prospective vs. Case-Control Studies
Why can't we estimate relative risks in a case-control study? Take our table again:

| | Case | Control | Total |
|---|---|---|---|
| Exposed | a | b | a + b |
| Not Exposed | c | d | c + d |
| Total | a + c | b + d | n = a + b + c + d |

This table is a sample of a larger population:

| | Case | Control | Total |
|---|---|---|---|
| Exposed | A | B | A + B |
| Not Exposed | C | D | C + D |
| Total | A + C | B + D | N = A + B + C + D |

Assume a random fraction $f_1$ of the diseased population is included in the study.
Assume a random fraction $f_2$ of the non-diseased population is included in the study. Then $a = f_1 A$, $b = f_2 B$, $c = f_1 C$, and $d = f_2 D$, so
$$\mathrm{RR} = \frac{a/(a+b)}{c/(c+d)} = \frac{f_1 A/(f_1 A + f_2 B)}{f_1 C/(f_1 C + f_2 D)} = \frac{A/(f_1 A + f_2 B)}{C/(f_1 C + f_2 D)}$$
The only way this will equal the population relative risk $\frac{A/(A+B)}{C/(C+D)}$ is if $f_1 = f_2$, that is, if the sampling fraction is the same in each group. Chances are, we don't have that!
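But we can always calculate an odds ratio! The sampling fractions cancel:
$$\frac{ad}{bc} = \frac{(f_1 A)(f_2 D)}{(f_2 B)(f_1 C)} = \frac{AD}{BC}$$
so the sample odds ratio estimates the population odds ratio whatever $f_1$ and $f_2$ happen to be.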
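A quick numerical check of the argument above, in Python: take a hypothetical population, sample cases and controls at very different fractions, and compare the sample RR and OR with their population values. For simplicity the sketch uses the expected sample counts ($f_1 A$, $f_2 B$, and so on) and ignores sampling noise; all numbers are invented.

```python
def relative_risk(a, b, c, d):
    """Ratio of row risks, a/(a+b) over c/(c+d)."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Cross-product ratio ad/bc."""
    return (a * d) / (b * c)

# Hypothetical population counts (uppercase, as in the notes)
A, B, C, D = 200, 9800, 100, 9900
f1, f2 = 0.5, 0.01   # sample half of the cases but only 1% of the controls

# Expected sample cells under case-control sampling (no sampling noise)
a, b, c, d = f1 * A, f2 * B, f1 * C, f2 * D

print("population RR", relative_risk(A, B, C, D), "sample RR", relative_risk(a, b, c, d))
print("population OR", odds_ratio(A, B, C, D), "sample OR", odds_ratio(a, b, c, d))
```

The sample RR comes out near 1.51 against a population value of 2, while both odds ratios are about 2.02.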
Confidence Intervals on Odds Ratios
Again, if the cells are large enough, the normal approximation works. Again, we work with the logarithm of the measure of association:
$$\mathrm{s.e.}[\log \mathrm{OR}] = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}$$
The 95% confidence interval for the log odds ratio is thus:
$$c_1 = \log \mathrm{OR} - 1.96\sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}, \qquad c_2 = \log \mathrm{OR} + 1.96\sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}$$
Back-transforming, we get the confidence interval on the original odds-ratio scale, $[e^{c_1}, e^{c_2}]$. Returning to our breast cancer example:

```r
> a <- 683
> b <- …
> c <- …
> d <- …
> (a*d)/(b*c)   # take a peek
[1] …
> selo <- sqrt((1/a)+(1/b)+(1/c)+(1/d))
> lo <- log((a*d)/(b*c))
> exp(lo - 1.96*selo)
[1] …
> exp(lo + 1.96*selo)
[1] …
```

Late first birth appears to be significantly associated with breast cancer.
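The same odds-ratio calculation as the R session above, wrapped as a small Python function. This log-scale interval is often called the Woolf interval; the function name and the counts below are hypothetical stand-ins, not the breast cancer data.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio ad/bc with a log-scale (Woolf) normal-approximation CI."""
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # s.e.[log OR]
    lower = math.exp(log_or - z * se)               # e^{c_1}
    upper = math.exp(log_or + z * se)               # e^{c_2}
    return math.exp(log_or), (lower, upper)

# Hypothetical counts: a, b = exposed cases/controls; c, d = unexposed
or_value, (lo, hi) = odds_ratio_ci(40, 60, 20, 80)
print(f"OR = {or_value:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

If the interval excludes 1, as it does here, we would call the association significant at the 5% level, which is the reasoning behind the conclusion on late first birth.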
What If You Have Small Cell Counts?
Anthropologists frequently have very small sample sizes. This means that our $n_i \hat{p}_i (1 - \hat{p}_i)$ values are frequently less than 5, and the normal approximations fall apart (utterly). R.A. Fisher to the rescue!
Consider the hypothesis that a chronic enzootic infection promotes the occurrence of some other disease of interest. (I have obscured the actual diseases here, since I am posting these notes to the web and am in the process of submitting them for publication, wink, wink.) Small cell counts, eh?

| | Disease | No Disease | Total |
|---|---|---|---|
| Chronic | 8 | 0 | 8 |
| Not Chronic | 3 | 11 | 14 |
| Total | 11 | 11 | 22 |
```r
> chronic.table <- matrix(c(8,0,3,11), nrow=2, byrow=TRUE)
> chronic.table
     [,1] [,2]
[1,]    8    0
[2,]    3   11
> fisher.test(chronic.table)

	Fisher's Exact Test for Count Data

data:  chronic.table
p-value = …
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 …  Inf
sample estimates:
odds ratio
       Inf
```

Fisher's test uses exact probabilities from a hypergeometric distribution, so no approximations are required. That said, if your cells are too big, you will choke your computer.
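To make "exact probabilities from a hypergeometric distribution" concrete, here is a minimal Python sketch of the two-sided Fisher test for a 2x2 table, using only `math.comb`. It sums the probabilities of all tables (with the observed margins) that are no more likely than the observed one, the usual two-sided convention, which R's `fisher.test` also follows. The function name is ours.

```python
from math import comb

def fisher_exact_two_sided(table):
    """Two-sided Fisher exact p-value for a 2x2 table [[a, b], [c, d]].

    With both margins fixed, the top-left cell follows a hypergeometric
    distribution. The two-sided p-value sums the probabilities of every
    table that is no more likely than the observed one.
    """
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, row1)

    def prob(x):
        # P(top-left cell = x | margins)
        return comb(col1, x) * comb(n - col1, row1 - x) / total

    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance so ties are not lost to rounding error
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))

p = fisher_exact_two_sided([[8, 0], [3, 11]])
print(f"two-sided p = {p:.6f}")
```

For the chronic-infection table only the observed table and its mirror image (top-left cell 0) are this extreme, giving $p = 330/319770 \approx 0.00103$. The binomial coefficients also show why very large cells get expensive.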
A Teaser on Power Calculations
Say we want to collect data to test the null hypothesis $p_1 = p_2$ against the alternative $p_1 \neq p_2$. Say we want the probability of type I error to be $\alpha$ and the probability of type II error to be $\beta$. What?
Type I error: the probability of falsely rejecting a true null hypothesis (i.e., your test says "significantly different" but reality says "not different").
Type II error: the probability of failing to reject a false null hypothesis (i.e., your test says "not different" but reality is really different).
Type I error is what we usually think about (you know, the 95% thing). But clearly, we also care about missing out on real effects just because we don't have enough power to detect them. We call $1 - \beta$ the power.
An integral part of research design should be a power calculation.
Can't emphasize this enough...
Back to the $p_1$ vs. $p_2$ question. Assume you will have $k$ times as many subjects in class 2 as in class 1: $n_2 = k n_1$. The sample size that we need to confidently test the hypothesis $p_1 = p_2$ is
$$n_1 = \frac{\left[\sqrt{\bar{p}\bar{q}\left(1 + \frac{1}{k}\right)}\, z_{1-\alpha/2} + \sqrt{p_1 q_1 + \frac{p_2 q_2}{k}}\, z_{1-\beta}\right]^2}{\Delta^2}$$
where $p_1$, $p_2$ are the projected true probabilities of success in the two groups, $q_1 = 1 - p_1$, $q_2 = 1 - p_2$, $\Delta = |p_2 - p_1|$, $\bar{p} = \frac{p_1 + k p_2}{1 + k}$, and $\bar{q} = 1 - \bar{p}$.
What does this mean from a practical standpoint?
We want to know how many samples (e.g., respondents) we need to test our hypothesis.
We set our $\alpha$ (typically 0.05) and our $1 - \beta$ (typically 0.8, though higher is better).
We use theory, or previous research (or just a best guess!), to specify our expected effect size ($\Delta$).
We use the same sources to make a guess as to what $k$ will be (working on the assumption that your thing of interest is rare; otherwise you can use $k = 1$).
We then plug these values into the big ugly equation above to get $n_1$. Or, should I say, we use R! The package pwr contains routines for doing all sorts of power calculations.
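A direct Python transcription of the sample-size formula above, as a sketch: it uses the standard library's `statistics.NormalDist` for the normal quantiles $z_{1-\alpha/2}$ and $z_{1-\beta}$. The function name and the planning values $p_1 = 0.30$, $p_2 = 0.15$ are hypothetical.

```python
import math
from statistics import NormalDist

def n1_two_proportions(p1, p2, alpha=0.05, power=0.8, k=1.0):
    """Group-1 sample size for a two-sided test of p1 = p2, with n2 = k * n1.

    Transcribes the normal-approximation formula above term by term.
    """
    q1, q2 = 1 - p1, 1 - p2
    p_bar = (p1 + k * p2) / (1 + k)
    q_bar = 1 - p_bar
    delta = abs(p2 - p1)
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # z_{1 - alpha/2}
    z_b = NormalDist().inv_cdf(power)           # z_{1 - beta}
    num = (math.sqrt(p_bar * q_bar * (1 + 1 / k)) * z_a
           + math.sqrt(p1 * q1 + p2 * q2 / k) * z_b) ** 2
    return math.ceil(num / delta ** 2)          # round up to whole subjects

n1 = n1_two_proportions(0.30, 0.15)
print(n1, "subjects in group 1 (and k * n1 in group 2)")
```

With $\alpha = 0.05$, power 0.8 and $k = 1$ this gives 121 subjects per group; raising $k$ lowers $n_1$ at the cost of more subjects in group 2. R's pwr package parameterizes two-proportion problems with Cohen's arcsine effect size $h$ rather than this formula, so expect its answers to differ somewhat.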
Another Way...
Say we wanted to model the probability $p$ of some event. Say also that we imagine our probability to be a linear function of some covariates $x_i$:
$$p = \alpha + \beta_1 x_1 + \cdots + \beta_k x_k$$
Here's the rub: $0 \le p \le 1$. What if our linear function gives us something that falls outside that range?
Define the logit transform of a probability $p$:
$$\operatorname{logit}(p) = \log[p/(1-p)]$$
The logit is also known as the log-odds, for obvious reasons. Odds range from 0 to $\infty$; log odds range from $-\infty$ to $\infty$.
$$\operatorname{logit}(p) = \log\left(\frac{p}{1-p}\right) = \alpha + \beta_1 x_1 + \cdots + \beta_k x_k$$
We can solve for $p$ to get
$$p = \frac{e^{\alpha + \beta_1 x_1 + \cdots + \beta_k x_k}}{1 + e^{\alpha + \beta_1 x_1 + \cdots + \beta_k x_k}}$$
In general, if $L = \log\left(\frac{p}{1-p}\right)$, the anti-logit transform is:
$$p = \frac{e^L}{1 + e^L}$$
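The logit and anti-logit pair is easy to check numerically. This Python sketch, with hypothetical coefficients $\alpha = -2$ and $\beta_1 = 0.8$, shows a linear predictor wandering over the whole real line while the anti-logit keeps $p$ inside $(0, 1)$.

```python
import math

def logit(p):
    """Log-odds of a probability 0 < p < 1."""
    return math.log(p / (1 - p))

def antilogit(L):
    """Inverse of logit: maps any real number back into (0, 1).

    Written exactly as e^L / (1 + e^L); math.exp overflows for L above
    about 709, which is far outside the values used here.
    """
    return math.exp(L) / (1 + math.exp(L))

alpha, beta1 = -2.0, 0.8          # hypothetical coefficients
for x in (-5, 0, 2.5, 10):
    L = alpha + beta1 * x         # linear predictor: any real number
    p = antilogit(L)              # probability: always between 0 and 1
    print(x, round(L, 2), round(p, 4))
```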
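Interpretation of the Model Parameters
Say that all covariates are the same for two individuals except for one; call it $j$. Say that $x_j = 1$ for individual A and $x_j = 0$ for individual B. We then have:
$$\operatorname{logit}(p_A) = \alpha + \beta_1 x_1 + \cdots + \beta_{j-1} x_{j-1} + \beta_j (1) + \beta_{j+1} x_{j+1} + \cdots + \beta_k x_k$$
$$\operatorname{logit}(p_B) = \alpha + \beta_1 x_1 + \cdots + \beta_{j-1} x_{j-1} + \beta_j (0) + \beta_{j+1} x_{j+1} + \cdots + \beta_k x_k$$
Subtract $\operatorname{logit}(p_B)$ from $\operatorname{logit}(p_A)$ to get
$$\operatorname{logit}(p_A) - \operatorname{logit}(p_B) = \beta_j$$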
From the definition of a logit, this is
$$\log[p_A/(1-p_A)] - \log[p_B/(1-p_B)] = \beta_j$$
which is just:
$$\log\left[\frac{p_A/(1-p_A)}{p_B/(1-p_B)}\right] = \beta_j$$
In other words, $e^{\beta_j}$ is the odds ratio in favor of subject A (relative to subject B).
Say we have a variable $E$ that defines exposure ($E = 1$ means exposure, $E = 0$ means no exposure). Then
$$\log[p/(1-p)] = \alpha + \beta E$$
and $e^{\beta}$ is the exposure odds ratio. Logistic regression gives you the same answer as cross-multiplying a two-way table: $e^{\beta} = ad/bc$.
Logistic Regression and Cross-Multiplication Give the Same Odds Ratio!

```r
> FB <- cbind(c(a,c), c(a+b,c+d)-c(a,c))
> expose <- factor(c("yes", "no"))
> expose
[1] yes no
Levels: no yes
> fblm <- glm(FB ~ expose, family=binomial)
> summary(fblm)

Call:
glm(formula = FB ~ expose, family = binomial)

Deviance Residuals:
[1]  0  0

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)        …          …       …   <2e-16 ***
exposeyes          …          …       …   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: …e+01  on 1  degrees of freedom
Residual deviance: …e-12  on 0  degrees of freedom
AIC: …

Number of Fisher Scoring iterations: 2

> exp(…)
[1] …
> (a*d)/(b*c)
[1] …
> # pretty durn close...
```
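What `glm` is doing under the hood can be sketched in plain Python: Newton-Raphson (which coincides with Fisher scoring for the canonical logit link) on the two grouped binomial rows of a 2x2 table. This is a minimal illustration with a hypothetical function name and made-up counts, not R's implementation; it shows $e^{\hat\beta}$ landing on $ad/bc$.

```python
import math

def fit_grouped_logistic(groups, iters=25):
    """Fit logit(p) = alpha + beta * E by Newton-Raphson on grouped data.

    groups: list of (E, successes, failures) tuples.
    Returns (alpha, beta).
    """
    alpha = beta = 0.0
    for _ in range(iters):
        # Score vector (g0, g1) and information matrix (h00, h01, h11)
        g0 = g1 = h00 = h01 = h11 = 0.0
        for e, y, f in groups:
            n = y + f
            p = 1 / (1 + math.exp(-(alpha + beta * e)))
            r = y - n * p                 # observed minus expected successes
            w = n * p * (1 - p)           # binomial variance weight
            g0 += r
            g1 += r * e
            h00 += w
            h01 += w * e
            h11 += w * e * e
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * step = score
        alpha += (h11 * g0 - h01 * g1) / det
        beta += (h00 * g1 - h01 * g0) / det
    return alpha, beta

# Hypothetical counts: a, b = exposed cases/controls; c, d = unexposed
a, b, c, d = 40, 60, 20, 80
alpha, beta = fit_grouped_logistic([(1, a, b), (0, c, d)])
print(math.exp(beta), (a * d) / (b * c))
```

Because this two-parameter model is saturated (two groups, two parameters), the fit is exact: $e^{\hat\beta}$ equals the cross-product ratio and $e^{\hat\alpha}$ equals $c/d$, the odds of disease among the unexposed, which is also why the residual deviance in the R output is essentially zero.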
Logistic Regression Can Be Used with Continuous Independent Variables Too
Consider the contrived case where all covariates are identical between two cases, A and B, except for $x_j$, which is one unit larger for A:
$$\operatorname{logit}(p_A) = \alpha + \beta_1 x_1 + \cdots + \beta_{j-1} x_{j-1} + \beta_j (x_j + 1) + \beta_{j+1} x_{j+1} + \cdots + \beta_k x_k$$
$$\operatorname{logit}(p_B) = \alpha + \beta_1 x_1 + \cdots + \beta_{j-1} x_{j-1} + \beta_j x_j + \beta_{j+1} x_{j+1} + \cdots + \beta_k x_k$$
$$\operatorname{logit}(p_A) - \operatorname{logit}(p_B) = \beta_j$$
From the definition of a logit, this is
$$\log[p_A/(1-p_A)] - \log[p_B/(1-p_B)] = \beta_j$$
which is just:
$$\log\left[\frac{p_A/(1-p_A)}{p_B/(1-p_B)}\right] = \beta_j$$
In other words, $e^{\beta_j}$ is the odds ratio in favor of subject A per unit increase in covariate $j$.
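A small Python check of the per-unit interpretation: under a one-covariate logistic model with hypothetical coefficients, the odds ratio for a one-unit step is $e^{\beta_j}$ wherever we start, even though the change in the probability itself depends on the starting point.

```python
import math

alpha, beta_j = -3.0, 0.25   # hypothetical fitted coefficients

def prob(x):
    """P(event) at covariate value x under logit(p) = alpha + beta_j * x."""
    return 1 / (1 + math.exp(-(alpha + beta_j * x)))

def odds(p):
    """Odds p / (1 - p)."""
    return p / (1 - p)

starts = (0, 5, 20)
# The odds ratio for a one-unit step is the same wherever we start...
ratios = [odds(prob(x + 1)) / odds(prob(x)) for x in starts]
print(ratios, math.exp(beta_j))
# ...even though the change in probability itself is not
print([round(prob(x + 1) - prob(x), 4) for x in starts])
```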
Multiple Logistic Regression
The tool of quantitative social science research? Almost certainly true for epidemiology...

```r
> Mroz <- read.table("/home/jhj1/teaching/a192/mroz.txt", header=TRUE, skip=33)
> mrozglm <- glm(lfp ~ ., family="binomial", data=Mroz)
> summary(mrozglm)

Call:
glm(formula = lfp ~ ., family = "binomial", data = Mroz)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
      …        …        …        …        …

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)        …          …       …   …e-07 ***
k5                 …          …       …   …e-13 ***
k618               …          …       …       …
age                …          …       …   …e-07 ***
wcyes              …          …       …       … ***
hcyes              …          …       …       …
lwg                …          …       …   …e-05 ***
inc                …          …       …   …e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: …  on 752  degrees of freedom
Residual deviance: …  on 745  degrees of freedom
AIC: …

Number of Fisher Scoring iterations: 4
```

Having kids under 5 decreases a woman's probability of labor force participation.
Having kids between 6 and 18 has no significant effect (k618 carries no significance stars).
Being older decreases a woman's probability of labor force participation.
Going to college increases a woman's probability of labor force participation.
Being married to a man who went to college has no significant effect.
Making more money (or having the potential to make more money) increases a woman's probability of labor force participation.
Having more money from other sources (family income excluding her own earnings) decreases a woman's probability of labor force participation.
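Statements like "decreases" and "has no significant effect" come from the sign of each coefficient and whether its interval covers zero. A minimal Python sketch of turning a coefficient and its standard error into an odds ratio with a Wald 95% interval; the covariate names and numbers are hypothetical placeholders, not the Mroz estimates.

```python
import math

def coef_to_odds_ratio(estimate, std_error, z=1.96):
    """Odds ratio exp(beta) with a Wald CI exp(beta -/+ z * se)."""
    return (math.exp(estimate),
            math.exp(estimate - z * std_error),
            math.exp(estimate + z * std_error))

# Hypothetical rows standing in for a summary(glm) coefficient table
rows = [("covariate_1", -1.2, 0.2), ("covariate_2", 0.1, 0.2)]
for name, est, se in rows:
    ratio, lo, hi = coef_to_odds_ratio(est, se)
    verdict = "CI excludes 1" if lo > 1 or hi < 1 else "CI includes 1"
    print(f"{name}: OR = {ratio:.2f} ({lo:.2f}, {hi:.2f}) {verdict}")
```

In R, `exp(coef(mrozglm))` gives the odds ratios directly and `exp(confint.default(mrozglm))` gives Wald intervals of this form.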
Cohort study s formulations PB HLTH 240A: Advanced Categorical Data Analysis Fall 2007 Srine Dudoit Division of Biostatistics Department of Statistics University of California, Berkeley www.stat.berkeley.edu/~srine
More information12 Modelling Binomial Response Data
c 2005, Anthony C. Brooms Statistical Modelling and Data Analysis 12 Modelling Binomial Response Data 12.1 Examples of Binary Response Data Binary response data arise when an observation on an individual
More informationLogistic & Tobit Regression
Logistic & Tobit Regression Different Types of Regression Binary Regression (D) Logistic transformation + e P( y x) = 1 + e! " x! + " x " P( y x) % ln$ ' = ( + ) x # 1! P( y x) & logit of P(y x){ P(y
More informationExperimental Design and Statistical Methods. Workshop LOGISTIC REGRESSION. Jesús Piedrafita Arilla.
Experimental Design and Statistical Methods Workshop LOGISTIC REGRESSION Jesús Piedrafita Arilla jesus.piedrafita@uab.cat Departament de Ciència Animal i dels Aliments Items Logistic regression model Logit
More informationTesting Independence
Testing Independence Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM 1/50 Testing Independence Previously, we looked at RR = OR = 1
More informationNATIONAL UNIVERSITY OF SINGAPORE EXAMINATION. ST3241 Categorical Data Analysis. (Semester II: ) April/May, 2011 Time Allowed : 2 Hours
NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION Categorical Data Analysis (Semester II: 2010 2011) April/May, 2011 Time Allowed : 2 Hours Matriculation No: Seat No: Grade Table Question 1 2 3 4 5 6 Full marks
More informationExplanatory variables are: weight, width of shell, color (medium light, medium, medium dark, dark), and condition of spine.
Horseshoe crab example: There are 173 female crabs for which we wish to model the presence or absence of male satellites dependant upon characteristics of the female horseshoe crabs. 1 satellite present
More informationBinomial Model. Lecture 10: Introduction to Logistic Regression. Logistic Regression. Binomial Distribution. n independent trials
Lecture : Introduction to Logistic Regression Ani Manichaikul amanicha@jhsph.edu 2 May 27 Binomial Model n independent trials (e.g., coin tosses) p = probability of success on each trial (e.g., p =! =
More informationChapter 5 Simplifying Formulas and Solving Equations
Chapter 5 Simplifying Formulas and Solving Equations Look at the geometry formula for Perimeter of a rectangle P = L W L W. Can this formula be written in a simpler way? If it is true, that we can simplify
More informationLecture 1: Case-Control Association Testing. Summer Institute in Statistical Genetics 2015
Timothy Thornton and Michael Wu Summer Institute in Statistical Genetics 2015 1 / 1 Introduction Association mapping is now routinely being used to identify loci that are involved with complex traits.
More informationClass Notes: Week 8. Probit versus Logit Link Functions and Count Data
Ronald Heck Class Notes: Week 8 1 Class Notes: Week 8 Probit versus Logit Link Functions and Count Data This week we ll take up a couple of issues. The first is working with a probit link function. While
More informationIntroduction to logistic regression
Introduction to logistic regression Tuan V. Nguyen Professor and NHMRC Senior Research Fellow Garvan Institute of Medical Research University of New South Wales Sydney, Australia What we are going to learn
More informationGlossary. The ISI glossary of statistical terms provides definitions in a number of different languages:
Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the
More informationA Generalized Linear Model for Binomial Response Data. Copyright c 2017 Dan Nettleton (Iowa State University) Statistics / 46
A Generalized Linear Model for Binomial Response Data Copyright c 2017 Dan Nettleton (Iowa State University) Statistics 510 1 / 46 Now suppose that instead of a Bernoulli response, we have a binomial response
More informationLecture 10: Introduction to Logistic Regression
Lecture 10: Introduction to Logistic Regression Ani Manichaikul amanicha@jhsph.edu 2 May 2007 Logistic Regression Regression for a response variable that follows a binomial distribution Recall the binomial
More informationModel Estimation Example
Ronald H. Heck 1 EDEP 606: Multivariate Methods (S2013) April 7, 2013 Model Estimation Example As we have moved through the course this semester, we have encountered the concept of model estimation. Discussions
More information9 Generalized Linear Models
9 Generalized Linear Models The Generalized Linear Model (GLM) is a model which has been built to include a wide range of different models you already know, e.g. ANOVA and multiple linear regression models
More informationBinary Logistic Regression
The coefficients of the multiple regression model are estimated using sample data with k independent variables Estimated (or predicted) value of Y Estimated intercept Estimated slope coefficients Ŷ = b
More information1 Comparing two binomials
BST 140.652 Review notes 1 Comparing two binomials 1. Let X Binomial(n 1,p 1 ) and ˆp 1 = X/n 1 2. Let Y Binomial(n 2,p 2 ) and ˆp 2 = Y/n 2 3. We also use the following notation: n 11 = X n 12 = n 1 X
More informationIntroduction to Logistic Regression
Introduction to Logistic Regression Problem & Data Overview Primary Research Questions: 1. What are the risk factors associated with CHD? Regression Questions: 1. What is Y? 2. What is X? Did player develop
More informationA Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn
A Handbook of Statistical Analyses Using R Brian S. Everitt and Torsten Hothorn CHAPTER 6 Logistic Regression and Generalised Linear Models: Blood Screening, Women s Role in Society, and Colonic Polyps
More informationChapter 6. Logistic Regression. 6.1 A linear model for the log odds
Chapter 6 Logistic Regression In logistic regression, there is a categorical response variables, often coded 1=Yes and 0=No. Many important phenomena fit this framework. The patient survives the operation,
More informationCalculating Effect-Sizes. David B. Wilson, PhD George Mason University
Calculating Effect-Sizes David B. Wilson, PhD George Mason University The Heart and Soul of Meta-analysis: The Effect Size Meta-analysis shifts focus from statistical significance to the direction and
More informationDiscrete Math, Spring Solutions to Problems V
Discrete Math, Spring 202 - Solutions to Problems V Suppose we have statements P, P 2, P 3,, one for each natural number In other words, we have the collection or set of statements {P n n N} a Suppose
More informationBooklet of Code and Output for STAD29/STA 1007 Midterm Exam
Booklet of Code and Output for STAD29/STA 1007 Midterm Exam List of Figures in this document by page: List of Figures 1 Packages................................ 2 2 Hospital infection risk data (some).................
More informationTests for the Odds Ratio in Logistic Regression with One Binary X (Wald Test)
Chapter 861 Tests for the Odds Ratio in Logistic Regression with One Binary X (Wald Test) Introduction Logistic regression expresses the relationship between a binary response variable and one or more
More information