Distribution-free ROC Analysis Using Binary Regression Techniques
Transcription
1 Distribution-free ROC Analysis Using Binary Regression Techniques. Todd A. Alonzo and Margaret S. Pepe. As interpreted by: Andrew J. Spieker, University of Washington, Dept. of Biostatistics. Update Talk. 1 / 37
2 Prediction diagram...
3 Any old ROC curve
4 Prediction diagram...
5 Prediction diagram...
6 Prediction diagram...
7 ROC
Let $G_0$ and $G_1$ be the survivor functions for $Y_0$ and $Y_1$, respectively. Then
$\mathrm{ROC}_{Y_0,Y_1}(s) = P\left(Y_1 \geq G_0^{-1}(s)\right) = G_1\left(G_0^{-1}(s)\right)$
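The definition above can be mirrored with empirical survivor functions. The following is only a minimal sketch: the function names are hypothetical, the survivor function is taken as $P(Y \geq y)$, and ties are handled naively.

```python
def survivor(sample, y):
    """Empirical survivor function: fraction of the sample with value >= y."""
    return sum(v >= y for v in sample) / len(sample)


def inverse_survivor(sample, s):
    """Smallest observed threshold whose empirical survivor value is <= s."""
    for y in sorted(sample):
        if survivor(sample, y) <= s:
            return y
    return float("inf")  # no observed threshold is strict enough


def empirical_roc(controls, cases, s):
    """ROC(s) = G_1(G_0^{-1}(s)), with both pieces replaced by empirical versions."""
    return survivor(cases, inverse_survivor(controls, s))


controls = [1, 2, 3, 4]
cases = [3, 4, 5, 6]
print(empirical_roc(controls, cases, 0.25))  # threshold 4, so 3 of 4 cases qualify
```

Here `s` plays the role of the false-positive fraction among controls, which is how the formula above uses it.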
8 ROC
9 ROC Regression
$X$: covariate(s) for all subjects.
$X_1$: variables specific to the diseased group (most often part of the outcome).
Let $G_0(y \mid X)$ and $G_1(y \mid X, X_1)$ be the (conditional) survivor functions for $Y_0$ and $Y_1$, respectively. Then
$\mathrm{ROC}_{Y_0,Y_1 \mid X,X_1}(s) = P\left(Y_1 \geq G_0^{-1}(s \mid X) \mid X, X_1\right) = G_1\left(G_0^{-1}(s \mid X) \mid X, X_1\right)$
10 Estimation (Pepe, 2000)
We observe $Y$- and $X$-values on $n_0$ healthy controls and $n_1$ diseased participants.
Key Observation: If $U_{ij} = 1(Y_{1i} \geq Y_{0j})$, then
$E[U_{ij} \mid G_0(Y_{0j} \mid X_j) = s, X_i, X_j, X_{1i}]$
$= P(Y_1 \geq Y_0 \mid G_0(Y_{0j} \mid X_j) = s, X_i, X_j, X_{1i})$
$= P(Y_1 \geq Y_0 \mid G_0^{-1}(s \mid X_j) = Y_{0j}, X_i, X_j, X_{1i})$
$= P(Y_1 \geq G_0^{-1}(s \mid X_j) \mid X_i, X_{1i})$
$= G_1(G_0^{-1}(s \mid X_j) \mid X_i, X_{1i})$
$= \mathrm{ROC}_{Y_{0j},Y_{1i} \mid X_i,X_j,X_{1i}}(s)$
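A small sketch of the key observation in the covariate-free case: form every diseased/healthy pair, attach the control's empirical placement value $s_j = \hat{G}_0(Y_{0j})$, and average $U_{ij}$ within each value of $s$. All function names are hypothetical, and ties among controls are not treated specially.

```python
from collections import defaultdict


def placement_values(controls):
    """Empirical G_0 evaluated at each control: fraction of controls >= that value."""
    n0 = len(controls)
    return [sum(v >= y0 for v in controls) / n0 for y0 in controls]


def pairwise_records(cases, controls):
    """One (U_ij, s_j) record per diseased/healthy pair, with U_ij = 1(Y_1i >= Y_0j)."""
    s_values = placement_values(controls)
    return [(1 if y1 >= y0 else 0, s)
            for y1 in cases
            for y0, s in zip(controls, s_values)]


def mean_u_by_s(records):
    """Average of U_ij at each placement value, an empirical estimate of ROC(s)."""
    totals = defaultdict(lambda: [0, 0])
    for u, s in records:
        totals[s][0] += u
        totals[s][1] += 1
    return {s: hits / count for s, (hits, count) in sorted(totals.items())}


records = pairwise_records(cases=[3, 4, 5, 6], controls=[1, 2, 3, 4])
print(len(records))          # n_0 * n_1 records
print(mean_u_by_s(records))  # estimated ROC(s) at each observed s
```

The number of records grows as $n_0 n_1$, which is the source of the computational cost discussed later in the talk.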
11 Estimation (Pepe, 2000)
Hence, it is sensible to estimate the ROC curve by fitting a GLM with the binary outcome $U_{ij}$. But how do we choose a link function? We need to make sure the ROC curve is monotonically increasing in $s$.
12 Choice of link function?
Binormal ROC curve:
$\mathrm{ROC}_{Y_{0j},Y_{1i} \mid X_i,X_j,X_{1i}}(s) = \Phi\left(\alpha_0 + \alpha_1 \Phi^{-1}(s) + \beta X + \gamma X_1\right)$
Called binormal because if $Y_0$ and $Y_1$ are normally distributed with means $\mu_0$, $\mu_1$ and variances $\sigma_0^2$, $\sigma_1^2$ (respectively), then the ROC curve is of this form.
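To make the binormal form concrete, here is a minimal sketch using only the Python standard library. The function name and the `covariate_term` argument (standing in for $\beta X + \gamma X_1$) are assumptions made for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: cdf is Phi, inv_cdf is Phi^{-1}


def binormal_roc(s, alpha0, alpha1, covariate_term=0.0):
    """Phi(alpha0 + alpha1 * Phi^{-1}(s) + covariate_term) for 0 < s < 1."""
    return STD_NORMAL.cdf(alpha0 + alpha1 * STD_NORMAL.inv_cdf(s) + covariate_term)


# With alpha1 > 0 the curve is increasing in s, as an ROC curve must be.
grid = [0.1, 0.3, 0.5, 0.7, 0.9]
curve = [binormal_roc(s, alpha0=0.75, alpha1=0.90) for s in grid]
print(all(a < b for a, b in zip(curve, curve[1:])))
```

Because $\Phi$ and $\Phi^{-1}$ are both increasing, monotonicity in $s$ holds whenever $\alpha_1 > 0$, which is one reason the probit link is a natural choice here.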
13 Computational (In?)efficiency
Case: n_0 = n_1 = 50; n_0 = n_1 = 500; n_0 = n_1 = 1000
Runtime: 0.01 sec.; ... sec.; ... sec.
14 Computational (In?)efficiency: 500 Bootstraps
Case: n_0 = n_1 = 50; n_0 = n_1 = 500; n_0 = n_1 = 1000
Runtime: > 5 sec.; > 8 min.; > 24 min.
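The slides do not say how the bootstrap samples were drawn. One common choice, assumed here, is to resample controls and cases separately with replacement and refit the estimator each time. The sketch below is generic: `estimator` is a hypothetical callable taking `(controls, cases)` and returning a number.

```python
import random
from statistics import stdev


def bootstrap_se(controls, cases, estimator, n_boot=500, seed=0):
    """Bootstrap standard error of estimator(controls, cases).

    Controls and cases are resampled separately (an assumption, not a detail
    taken from the talk), each at its original sample size.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        boot_controls = rng.choices(controls, k=len(controls))
        boot_cases = rng.choices(cases, k=len(cases))
        estimates.append(estimator(boot_controls, boot_cases))
    return stdev(estimates)
```

Every bootstrap replicate repeats the full model fit, so the cost of a single fit is multiplied by the number of replicates (500 in the table above).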
15 Computational (In?)efficiency: 500 Bootstraps
Solution: Load up Bayes, Gauss, Gosset, Hercules, etc. and annoy everyone in the department.
16 Questions?... Just kidding!
17 Computational (In?)efficiency: 500 Bootstraps
Solution: Load up Bayes, Gauss, Gosset, Hercules, etc. and annoy everyone in the department. (I like having good standing with people in the department, when possible.)
Solution: Consider a more sensible set of comparisons, i.e., not all pairs of diseased and healthy participants.
Anticipated Problem: Fewer pairs will likely come with a loss of (statistical) efficiency. (Spoilers!)
18 Back to the drawing board! (Recall...)
We observe $Y$- and $X$-values on $n_0$ healthy controls and $n_1$ diseased participants.
Key Observation: If $U_{ij} = 1(Y_{1i} \geq Y_{0j})$, then
$E[U_{ij} \mid G_0(Y_{0j} \mid X_j) = s, X_i, X_j, X_{1i}]$
$= P(Y_1 \geq Y_0 \mid G_0(Y_{0j} \mid X_j) = s, X_i, X_j, X_{1i})$
$= P(Y_1 \geq Y_0 \mid G_0^{-1}(s \mid X_j) = Y_{0j}, X_i, X_j, X_{1i})$
$= P(Y_1 \geq G_0^{-1}(s \mid X_j) \mid X_i, X_{1i})$
$= G_1(G_0^{-1}(s \mid X_j) \mid X_i, X_{1i})$
$= \mathrm{ROC}_{Y_{0j},Y_{1i} \mid X_i,X_j,X_{1i}}(s)$
20 Back to the drawing board!
We observe $Y$- and $X$-values on $n_0$ healthy controls and $n_1$ diseased participants.
Key Observation: If $U_{is} = 1\left(Y_{1i} \geq G_0^{-1}(s \mid X_j)\right)$, then
$E[U_{is} \mid X_i, X_{1i}] = P(Y_1 \geq G_0^{-1}(s \mid X_j) \mid X_i, X_{1i})$
$= G_1(G_0^{-1}(s \mid X_j) \mid X_i, X_{1i})$
$= \mathrm{ROC}_{Y_{0j},Y_{1i} \mid X_i,X_j,X_{1i}}(s)$
for a suitable set of choices of $s$.
21 Specificity Set
Key Observation: $E[U_{is} \mid X_i, X_{1i}] = \mathrm{ROC}_{Y_{0j},Y_{1i} \mid X_i,X_j,X_{1i}}(s)$
Indeed, the choice $S = \{i/n_0 : i = 1, \ldots, n_0 - 1\}$ is maximal and corresponds to the original GLM method proposed.
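The following is a rough sketch of how the indicators $U_{is}$ might be built and a covariate-free binormal model fitted by probit regression. It is not the authors' implementation: the function names are hypothetical, thresholds are plain empirical control quantiles, and the probit fit is a hand-written Fisher scoring loop using only the standard library.

```python
from statistics import NormalDist

STD = NormalDist()


def control_threshold(controls_desc, s):
    """Empirical G_0^{-1}(s): control value with about a fraction s of controls at or above it."""
    k = max(1, round(s * len(controls_desc)))
    return controls_desc[k - 1]


def build_records(cases, controls, spec_set):
    """One (Phi^{-1}(s), U_is) record per case and per s in the chosen set."""
    controls_desc = sorted(controls, reverse=True)
    records = []
    for s in spec_set:
        c = control_threshold(controls_desc, s)
        z = STD.inv_cdf(s)
        records.extend((z, 1 if y1 >= c else 0) for y1 in cases)
    return records


def fit_probit(records, iters=25):
    """Fisher scoring for P(U = 1) = Phi(a0 + a1 * z); returns (a0, a1)."""
    a0, a1 = 0.0, 1.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for z, u in records:
            eta = a0 + a1 * z
            p = min(max(STD.cdf(eta), 1e-10), 1.0 - 1e-10)  # guard against 0 or 1
            d = STD.pdf(eta)
            r = d * (u - p) / (p * (1.0 - p))   # score contribution
            w = d * d / (p * (1.0 - p))         # information weight
            g0 += r
            g1 += r * z
            h00 += w
            h01 += w * z
            h11 += w * z * z
        det = h00 * h11 - h01 * h01
        a0 += (h11 * g0 - h01 * g1) / det
        a1 += (h00 * g1 - h01 * g0) / det
    return a0, a1
```

Passing `spec_set = [i / n0 for i in range(1, n0)]` gives the maximal set described above; a shorter list trades statistical efficiency for fewer records. The records for a given case are correlated across values of $s$, so the naive GLM standard errors should not be trusted, which is consistent with the talk's use of the bootstrap.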
22 Simulation Studies
There are so many questions we can ask:
How well does this approach work (and how does it do compared to maximum likelihood)?
How does efficiency change when you shrink the specificity set?
Can you estimate standard errors well with the bootstrap?
Is it robust to model misspecification?
23 Study 1: How well does it work?
$n_0 = n_1 = 50$
$Y_{0i} \sim N(0, 1)$; $Y_{1j} \sim N(\alpha_0/\alpha_1, 1/\alpha_1^2)$. This is done so that $\mathrm{ROC}_{Y_0,Y_1}(s)$ is binormal with parameters $\alpha_0$ and $\alpha_1$.
Case 1: $\alpha = (0.75, 0.90)$
Case 2: $\alpha = (1.50, 0.85)$
Two thousand replications
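A minimal sketch of this data-generating mechanism, together with a check that the chosen means and variances really induce the stated binormal parameters. The function names are hypothetical; the check uses the standard binormal identities $\alpha_0 = (\mu_1 - \mu_0)/\sigma_1$ and $\alpha_1 = \sigma_0/\sigma_1$.

```python
import random


def simulate_binormal(n0, n1, alpha0, alpha1, rng):
    """Controls ~ N(0, 1); cases ~ N(alpha0/alpha1, 1/alpha1^2)."""
    controls = [rng.gauss(0.0, 1.0) for _ in range(n0)]
    cases = [rng.gauss(alpha0 / alpha1, 1.0 / alpha1) for _ in range(n1)]
    return controls, cases


def induced_alpha(mu0, sigma0, mu1, sigma1):
    """Binormal intercept and slope implied by two normal distributions."""
    return (mu1 - mu0) / sigma1, sigma0 / sigma1


controls, cases = simulate_binormal(50, 50, 0.75, 0.90, random.Random(2024))
print(induced_alpha(0.0, 1.0, 0.75 / 0.90, 1.0 / 0.90))  # approximately (0.75, 0.90)
```

Note that `random.gauss` takes a standard deviation, so the case standard deviation is $1/\alpha_1$, not the variance $1/\alpha_1^2$.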
24 Study 1: How well does it work?
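25 Study 1: How well does it work?
Note: Maximum likelihood estimates are given by
$\hat{\alpha}_0 = \dfrac{\bar{X}_1 - \bar{X}_0}{\hat{\sigma}_1}; \quad \hat{\alpha}_1 = \dfrac{\hat{\sigma}_0}{\hat{\sigma}_1}$
which follows from finding the inverse survivor function for the healthy subjects, and then applying the normal survivor function from the diseased subjects:
$\mathrm{ROC}_{Y_0,Y_1}(s) = G_1\left(G_0^{-1}(s)\right)$
$= 1 - \Phi\left(\dfrac{G_0^{-1}(s) - \mu_1}{\sigma_1}\right)$
$= 1 - \Phi\left(\dfrac{-\sigma_0 \Phi^{-1}(s) + \mu_0 - \mu_1}{\sigma_1}\right)$
$= \Phi\left(\dfrac{\mu_1 - \mu_0}{\sigma_1} + \dfrac{\sigma_0}{\sigma_1}\Phi^{-1}(s)\right)$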
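The maximum likelihood estimates can be computed directly from the two samples. The sketch below assumes that the barred quantities are the group sample means of the marker and that the standard deviations use divisor $n$ (the normal-theory MLE); the function name is hypothetical.

```python
from statistics import fmean, pstdev


def binormal_mle(controls, cases):
    """Plug-in estimates: alpha0 = (mean1 - mean0) / sd1, alpha1 = sd0 / sd1."""
    sd0 = pstdev(controls)  # divisor n, i.e. the MLE under normality
    sd1 = pstdev(cases)
    return (fmean(cases) - fmean(controls)) / sd1, sd0 / sd1


print(binormal_mle(controls=[0.0, 2.0], cases=[2.0, 6.0]))
```

In the toy call, the control mean and standard deviation are 1 and 1, and the case mean and standard deviation are 4 and 2, so the estimates are 1.5 and 0.5.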
26 Study 1: How well does it work? Scenario 1: $\alpha = (0.75, 0.90)$
27 Study 1: How well does it work? Scenario 2: $\alpha = (1.50, 0.85)$
28 Study 2: Loss of efficiency with smaller S
$n_0 = n_1 = 200$
$Y_{0i} \sim N(0, 1)$; $Y_{1j} \sim N(\alpha_0/\alpha_1, 1/\alpha_1^2)$. This is done so that $\mathrm{ROC}_{Y_0,Y_1}(s)$ is binormal with parameters $\alpha_0$ and $\alpha_1$.
Case 1: $\alpha = (0.75, 0.90)$
Case 2: $\alpha = (1.50, 0.85)$
Two thousand replications
Consider $S = \{i/(l + 1) : i = 1, \ldots, l\}$ for several choices of $l$
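A short sketch of the reduced specificity sets and of how the size of the binary-regression data set scales with $l$. The function names and the particular values of $l$ in the loop are illustrative assumptions, not the choices used in the study.

```python
def specificity_set(l):
    """S = {i / (l + 1) : i = 1, ..., l}."""
    return [i / (l + 1) for i in range(1, l + 1)]


def record_count(n1, l):
    """Number of binary records U_is: one per diseased subject per element of S."""
    return n1 * l


n0 = n1 = 200
for l in (9, 49, n0 - 1):  # l = n0 - 1 recovers the maximal set {i / n0}
    print(l, record_count(n1, l))
```

With $l = n_0 - 1$ the record count is essentially $n_0 n_1$, the cost of the all-pairs method; smaller $l$ shrinks it linearly.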
29 Study 2: Loss of efficiency with smaller S (panels: $\alpha = (0.75, 0.90)$; $\alpha = (1.50, 0.85)$)
30 Computational (In?)efficiency: 500 Bootstraps
Case: n_0 = n_1 = 50; n_0 = n_1 = 500; n_0 = n_1 = 1000
Runtime: > 5 sec.; > 8 min.; > 24 min.
31 Childhood Predictors of Adult Obesity Study (CPAO)
823 adults (133 obese and 823 non-obese)
Goal: determine whether childhood BMI can predict adult obesity
Unadjusted model (easy enough)
Adjusted model: include age and sex (for everyone) and adult BMI (for the obese participants) in the model
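32 Table 1. ROC-GLM Models for CPAO Data
Columns: Method; Estimate; Bootstrap S.E.; p-value
Unadjusted: Intercept: ...; ...; < ...
Unadjusted: $\Phi^{-1}(s)$: ...; ...; < ...
Adjusted: Intercept: ...; ...; ...
Adjusted: Age: ...; ...; ...
Adjusted: Gender: ...; ...; ...
Adjusted: Adult BMI: ...; ...; ...
Adjusted: $\Phi^{-1}(s)$: ...; ...; < ...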
34 Table 1. ROC-GLM Models for CPAO Data (Age row: Estimate; Bootstrap S.E.; p-value)
Interpretation: Two obese adults of the same gender and adult BMI, but differing in age by one year, differ in the estimated probability of having a BMI that exceeds the healthy quantiles for the same respective covariates by the estimated Age coefficient on the probit scale (with the older participant having the higher probability).
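This interpretation follows directly from the binormal model with a linear covariate term. As a short worked step, holding $s$, gender, and adult BMI fixed and writing $\beta_{\text{age}}$ for the age coefficient (notation assumed here, not taken from the slides):

```latex
\Phi^{-1}\{\mathrm{ROC}(s \mid \text{age} + 1)\} - \Phi^{-1}\{\mathrm{ROC}(s \mid \text{age})\}
  = \beta_{\text{age}}(\text{age} + 1) - \beta_{\text{age}} \cdot \text{age}
  = \beta_{\text{age}}
```

All other terms ($\alpha_0$, $\alpha_1 \Phi^{-1}(s)$, and the gender and adult BMI terms) cancel in the difference, so a positive $\beta_{\text{age}}$ means the older participant has the higher ROC value at every $s$.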
36 In Summary
Meaningful computational efficiency gains from considering fewer cut-points, at a cost of statistical efficiency.
This might be acceptable when trying to get a ton of computations done, or when working with an absolutely massive data set.
37 What are your thoughts?
I am hugely disappointed not to have been able to discuss model misspecification. With the five extra minutes I have next time, would you rather see the case where the data generation mechanism is not normal, or the case where the curve itself is misspecified?
Is there anything from this you would cut out?
Anyone still have trouble remembering which is sensitivity and which is specificity?