Statistics in medicine

Lecture 4: Correlation and multivariable regression
Fatma Shebl, MD, MS, MPH, PhD
Assistant Professor, Chronic Disease Epidemiology Department, Yale School of Public Health

Outline
- Correlation
- Regression
  - Linear
  - Logistic
  - Cox's proportional hazard model

Correlation and prediction methods
- Correlation: simple or partial; Pearson, Spearman, other
- Regression and prediction: linear, logistic, other

Correlation
- A measure of the strength and direction of the association between two variables measured on a numerical scale.
- Measured by the correlation coefficient r.

Correlation: direction and range
- The sign of r indicates the direction of the association.
  - Positive sign (positive correlation): high values of one variable are associated with high values of the second variable.
  - Negative sign (negative correlation): high values of one variable are associated with low values of the second variable.
- r ranges between +1 and -1.
  - +1: perfect positive correlation
  - -1: perfect negative correlation
  - 0: no correlation

Correlation: properties
- r is unaffected by a shift in the position of x and y.
- r is unaffected by linear transformation of x or y (rescaling by a negative constant only flips its sign).
- r close to 0 does not mean there is no relationship; a strong non-linear relationship might exist.
- Correlation does NOT indicate causation.

Correlation: Visualization
- Scatterplot: a two-dimensional graph displaying the relationship between two numerical characteristics.
- Also called a joint distribution graph.
- Plot (x, y) to assess the pattern of the relationship.

Correlation: Scatterplot
Patterns (figure legend):
- A. Perfect positive
- B. Positive
- C. Negative
- D. Weak negative
- E. Nonexistent
- F. Nonlinear
Figure: Scatterplots and correlations. A: r = +1.0; B: r = 0.7; C: r = -0.9; D: r = -0.4; E: r = 0.0; F: r = 0.0.
From: Chapter 8. Research Questions About Relationships among Variables. Basic & Clinical Biostatistics, 4e, 2004. Copyright 2016 McGraw-Hill Education. All rights reserved. Date of download: 1/11/2016.

Correlation: Types
- Pearson product-moment: interval or ratio scale
- Spearman rank-order: ordinal scale
- Other

Pearson product-moment correlation
- Used for two numerical, normally distributed variables.
- Assumptions: linear relationship; normal distribution; no outliers; large sample size (> 30).
- Test of significance:
  1. Calculate r (the correlation coefficient).
  2. Calculate the degrees of freedom.
  3. Calculate the test statistic t.
  4. Find the critical value of t for the chosen significance level.
  5. Draw a conclusion.
- Formulas:
  $r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$
  $df = n - 2$
  $t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$
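The Pearson formulas above can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture: the function names `pearson_r` and `t_statistic` and the five data points are assumptions made for the example, and the sample is deliberately tiny (well below the n > 30 guideline) so the arithmetic is easy to check by hand.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have the same length, with n >= 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


def t_statistic(r, n):
    """Test statistic t = r * sqrt(n - 2) / sqrt(1 - r^2), with df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
df = len(x) - 2
t = t_statistic(r, len(x))
print(f"r = {r:.4f}, df = {df}, t = {t:.4f}")
```

The resulting t would then be compared with the critical value of the t distribution for df = n - 2, as in steps 4 and 5.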

Spearman's rho
- Can be used for variables that are not normally distributed as well as for normally distributed variables.
- Based on ranks.
- Test of significance:
  1. Calculate r_s (the correlation coefficient).
  2. Calculate the degrees of freedom.
  3. Calculate the test statistic t.
  4. Find the critical value of t for the chosen significance level.
  5. Draw a conclusion.
- Formulas:
  $r_s = \frac{\sum (R_X - \bar{R}_X)(R_Y - \bar{R}_Y)}{\sqrt{\sum (R_X - \bar{R}_X)^2 \sum (R_Y - \bar{R}_Y)^2}}$
  $df = n - 2$
  $t = \frac{r_s\sqrt{n - 2}}{\sqrt{1 - r_s^2}}$

Interpretation of the size of r
- Rule of thumb, from weakest to strongest: no to trivial correlation; very low correlation; low correlation; moderate correlation; high correlation; very high correlation (.90 to 1); perfect correlation (1).
- r is affected by sample size: with a large sample, even a small r can give a statistically significant result.
- A better interpretation uses r^2, known as the coefficient of determination:
  - It is the proportion of the variance in one variable that is accounted for by the other variable.
  - It is a measure of the strength of the relationship.

Correlation: Example
A study was conducted to examine whether serum calcium and serum triglycerides are correlated. If the correlation coefficient is 0.20, interpret r, give the coefficient of determination and its interpretation, and state whether causation can be inferred.

Answer
- There is a low correlation between serum calcium and serum triglycerides.
- r^2 = 0.2 x 0.2 = 0.04 = 4%.
- Interpretation of r^2: 4% of the variation in serum calcium is accounted for by serum triglycerides (and vice versa).
- Causation cannot be inferred.
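As a rough sketch of the two ideas above, the following Python computes Spearman's rho as the Pearson correlation of the ranks and reproduces the coefficient of determination from the example. The helper names (`average_ranks`, `pearson_r`, `spearman_rho`) and the data are hypothetical; giving tied values their average rank is a common convention assumed here, not something the slides specify.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson formula applied to the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: y rises with x, but not along a straight line.
x = [10, 20, 30, 40, 50]
y = [1, 4, 9, 16, 100]
rho = spearman_rho(x, y)

# Coefficient of determination for the serum calcium example (r = 0.20).
r = 0.20
r_squared = r ** 2
print(f"rho = {rho:.2f}, r^2 = {r_squared:.2f}")
```

Because the ranks of x and y agree exactly in this made-up data set, rho reaches 1 even though the relationship is not linear, which is the reason for working with ranks.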

Partial correlation
- A measure of the strength and direction of the association between two variables, controlling for one or more other variables.
- r ranges between +1 and -1.
- Assumptions: linear relationship between all pairs of variables; normal distribution; no outliers.

Multivariable regression
- Definition: statistical models that have one dependent (outcome) variable but include more than one independent variable.

Rationale of the regression equations
- Example: suppose we hypothesize that cholesterol level is predicted by age, gender, and diabetic status, and we would like to find the line (as in the scatter diagram) that best fits this relationship. We can write this as a straight-line equation: Y = a + bx.
- Start from: Cholesterol = age + gender + diabetes.
- Not all of these predictors are equally important, so each predictor gets a weight (coefficient) relative to its importance:
  Cholesterol = (W1)age + (W2)gender + (W3)diabetes
- We need a starting point for the calculation, so we add it to the equation:
  Cholesterol = starting point + (W1)age + (W2)gender + (W3)diabetes
- The prediction of the outcome is usually not perfect, so we add an error term:
  Cholesterol = starting point + (W1)age + (W2)gender + (W3)diabetes + error term
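To make the weighted-sum idea concrete, here is a small Python sketch of the cholesterol equation. Every number in it (the starting point, the three weights, the 0/1 coding of gender and diabetes, and the observed value) is invented for illustration and does not come from any study.

```python
def predict_cholesterol(age, gender, diabetes, starting_point, w1, w2, w3):
    """Predicted cholesterol = starting point + W1*age + W2*gender + W3*diabetes."""
    return starting_point + w1 * age + w2 * gender + w3 * diabetes


# Hypothetical starting point (intercept) and weights (coefficients).
starting_point = 150.0
w1, w2, w3 = 1.0, 5.0, 20.0

# Hypothetical patient: age 50, gender coded 1, no diabetes (coded 0).
predicted = predict_cholesterol(50, 1, 0, starting_point, w1, w2, w3)

# The error term is whatever the weighted sum fails to explain.
observed = 212.0
error_term = observed - predicted
print(predicted, error_term)
```

The error term here is simply the gap between the observed value and the weighted sum.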

Rationale of the regression equations (continued)
- The final formula can be expressed as
  $y = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + e$
- Also written as
  $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$
- This equation is commonly referred to as the general linear model.

Interpretation of the symbols
- a ($\beta_0$): the intercept, i.e. where the line crosses the y-axis.
- b ($\beta_1, \ldots, \beta_k$): the regression coefficients (slopes), i.e. the amount y changes each time x changes by 1 unit.
- e ($\varepsilon$): the error term (residual), i.e. the distance by which the actual value of y departs from the regression line.

Estimation of the best estimates (least-squares method)
- The observed y and x are known, so e has to be calculated.
- Use different values of a and b to calculate the predicted y (y hat):
  $\hat{y} = a + b_1 x_1 + b_2 x_2 + b_3 x_3$
- e is then calculated as $e = y - \hat{y}$.
- The best estimate is the one with the least error, i.e. the one that minimizes the sum of the squared error terms:
  $\sum e^2 = \sum (y - \hat{y})^2$

Figure: Geometric interpretation of a regression line. Least squares regression line.
From: Chapter 8. Research Questions About Relationships among Variables. Basic & Clinical Biostatistics, 4e, 2004. Copyright 2016 McGraw-Hill Education. All rights reserved. Date of download: 1/12/2016.
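The least-squares idea can be illustrated for the simplest case of one predictor, where the minimizing a and b have a well-known closed form (b is the sum of cross-products divided by the sum of squares of x; a is the mean of y minus b times the mean of x). The sketch below uses hypothetical data and hypothetical function names, and checks that nudging a or b away from the fitted values increases the sum of squared errors.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sum_xx = sum((xi - mean_x) ** 2 for xi in x)
    b = sum_xy / sum_xx
    a = mean_y - b * mean_x
    return a, b


def sum_squared_error(a, b, x, y):
    """Sum of (y - y_hat)^2 for the line y_hat = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_line(x, y)
best_sse = sum_squared_error(a, b, x, y)

# Any other choice of a or b gives a larger sum of squared errors.
sse_shifted_intercept = sum_squared_error(a + 0.1, b, x, y)
sse_shifted_slope = sum_squared_error(a, b - 0.1, x, y)
print(a, b, best_sse, sse_shifted_intercept, sse_shifted_slope)
```

With several predictors the same criterion is minimized, but the solution is normally obtained with matrix algebra in statistical software rather than by hand.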

Applications
- Test for interaction.
- Adjust for confounding.
- Predict future values of y given x.

General linear models
- The types of models described by the previous equation are referred to as general linear models.
- General, because they can accommodate different types of y and/or x.
- Linear, because the model is a linear combination of the x terms.
- Commonly used methods: linear regression, logistic regression, survival analysis.

Readings and resources
- Dawson, B. and Trapp, R. G. (2004). Basic and Clinical Biostatistics (4th edition), Chapters 8, 9, and 10. New York: McGraw-Hill.
- Katz, D. L. et al. Jekel's Epidemiology, Biostatistics, Preventive Medicine, and Public Health (4th edition), Chapters 11 and 13.

Lecture 4, part 2: Correlation and multiple regression
Fatma Shebl, MD, MS, MPH, PhD
Assistant Professor, Chronic Disease Epidemiology Department, Yale School of Public Health
Fatma.shebl@yale.edu

Linear regression
- A general linear model in which the dependent variable is continuous.
- Types:
  - Single continuous predictor: simple linear regression
  - Multiple continuous predictors: multiple linear regression
  - Single categorical predictor: one-way ANOVA
  - Multiple categorical predictors: N-way ANOVA
  - Some categorical and some continuous predictors: analysis of covariance (ANCOVA)
- Assumptions:
  - Linearity: the relation between each independent variable and the dependent variable is linear.
  - Independence: the values of Y are independent.
  - Homogeneity: the variance of Y is equal across the range of X.

Linear regression: steps
1. Build up the model.
2. Assess model fit.
3. Interpret the regression coefficient.

Build up the model
- The most common method is stepwise selection (it is automated in most programs).
- Start with one variable in the model (the main predictor, if one is hypothesized).
- Add another variable.
- Keep adding variables to the list of variables already in the model.
- Use a stopping criterion, such as: the increase in R^2 is less than .01.

Build up the model: R^2
- A measure of how much of the variation in the outcome is accounted for by the explanatory variables.
- Range: 0 (no variance accounted for) to 1 (all the variance, 100%, accounted for).

Assess model fit
- Residuals, the part of Y that is not explained by X, can be used to assess the model fit.
- Plot the residuals (on the Y axis) versus X.
- The mean of the residuals is zero; therefore, if the model fits the data, the residuals and X should not be correlated.
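A minimal numeric sketch of these two checks, R^2 and the residuals, for a one-predictor model follows. The data and helper names are hypothetical. R^2 is computed as 1 minus the ratio of the residual sum of squares to the total sum of squares, a standard definition that the slides describe only in words.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sum_xx = sum((xi - mean_x) ** 2 for xi in x)
    b = sum_xy / sum_xx
    return mean_y - b * mean_x, b


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = fit_line(x, y)

# Residuals: the part of y not explained by x.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

mean_y = sum(y) / len(y)
sse = sum(e ** 2 for e in residuals)          # unexplained variation
sst = sum((yi - mean_y) ** 2 for yi in y)     # total variation
r_squared = 1 - sse / sst

# For a least-squares line the residuals average zero and have zero
# covariance with x; a visible pattern in the residual plot signals poor fit.
mean_residual = sum(residuals) / len(residuals)
mean_x = sum(x) / len(x)
residual_x_covariance = sum((xi - mean_x) * e for xi, e in zip(x, residuals))
print(r_squared, mean_residual, residual_x_covariance)
```

Zero correlation between residuals and x holds by construction for any least-squares line, so the residual plot is mainly useful for spotting curvature or unequal spread, which summary numbers like these cannot show.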

Assess model fit (figure)
- Good fit: the residuals form a random scatter around the zero line.
Figure: Illustration of analysis of residuals. A: Linear relationship between X and Y. B: Residuals versus values of X for the relation in part A. C: Curvilinear relationship between X and Y. D: Residuals versus values of X for the relation in part C.
From: Chapter 8. Research Questions About Relationships among Variables. Basic & Clinical Biostatistics, 4e, 2004. Copyright 2016 McGraw-Hill Education. All rights reserved. Date of download: 1/12/2016.

Interpret the regression coefficient: the intercept
- The intercept is the expected value of Y when all X = zero.
- If X cannot be zero, the intercept is not meaningful.
- If the interest is in the relationship between X and Y, the intercept is not of interest and will not affect the conclusion.
- If the interest is in the prediction of Y from X, then X has to be re-scaled so that the intercept is the expected value of Y at the chosen X value.

Interpret the regression coefficient: the slope
- If X is continuous: the slope is the change in Y for a one-unit increase in X, holding the other variables (other X's) constant.
- If X is categorical: the slope is the mean difference in Y between one category and the reference category of X, holding the other variables (other X's) constant.

Interpret the regression coefficient: significance
- Compare the test statistic (t) with the critical value for the relevant df.
- The p value:
  - Null hypothesis: the coefficients (intercept and slopes) = zero.
  - If p < the predetermined significance level, reject the null hypothesis.
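The statement about categorical predictors can be checked numerically. In the sketch below (hypothetical data and names), a binary X is coded 0 for the reference category and 1 for the other category. The least-squares slope then equals the difference between the two group means, and the intercept equals the mean of the reference group.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sum_xx = sum((xi - mean_x) ** 2 for xi in x)
    b = sum_xy / sum_xx
    return mean_y - b * mean_x, b


# Hypothetical outcome values for a reference group (x = 0) and a
# comparison group (x = 1).
reference_group = [4.0, 6.0, 5.0]
comparison_group = [8.0, 9.0, 10.0]

x = [0] * len(reference_group) + [1] * len(comparison_group)
y = reference_group + comparison_group

intercept, slope = fit_line(x, y)

mean_reference = sum(reference_group) / len(reference_group)
mean_comparison = sum(comparison_group) / len(comparison_group)
print(intercept, slope, mean_comparison - mean_reference)
```

This also shows why the choice of reference category matters: swapping the 0 and 1 codes would flip the sign of the slope and change the intercept to the other group's mean.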

Linear regression: an example and interpretation
A study was conducted to examine the association between insulin sensitivity scores (outcome) and BMI. The resulting regression equation had the form Y = a + bX, with a negative slope b. Interpret the terms in the equation.
- Y: predicted insulin sensitivity.
- a: the intercept, i.e. patients with zero BMI (unrealistic) have a predicted insulin sensitivity of a.
- X: observed BMI value.
- b: the slope, i.e. when BMI increases by 1 unit, predicted insulin sensitivity decreases by the magnitude of b.
From: Basic & Clinical Biostatistics, 4e, 2004.

Logistic regression
- A general linear model in which the dependent variable is a nominal/categorical variable.
- Types:
  - Commonly used for a dichotomous dependent variable.
  - Can be used for a multinomial dependent variable.
- Uses a mathematical function (the logit) to transform the regression data so that the predicted probability of y is limited to (0, 1):
  $\text{logit}(P(y = 1 \mid x_1, \ldots, x_k)) = \log\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$
- This translates into the probability of the dependent variable as an exponential function of the independent variables:
  $P(y = 1) = \frac{1}{1 + \exp[-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k)]}$
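The two logistic equations are inverses of each other, which a short Python sketch can make tangible. The coefficients and predictor values below are hypothetical, chosen so the linear predictor comes out to exactly 1. The function names are illustrative as well.

```python
import math


def logit(p):
    """Log odds: log(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(p / (1 - p))


def inverse_logit(z):
    """Probability from the linear predictor: 1 / (1 + exp(-z))."""
    return 1 / (1 + math.exp(-z))


def predicted_probability(betas, xs):
    """P(y = 1) for coefficients [b0, b1, ..., bk] and predictors [x1, ..., xk]."""
    z = betas[0] + sum(b * x for b, x in zip(betas[1:], xs))
    return inverse_logit(z)


# Hypothetical coefficients: intercept, a continuous predictor, a 0/1 predictor.
betas = [-2.0, 0.05, 1.0]
xs = [40, 1]

p = predicted_probability(betas, xs)   # linear predictor = -2 + 2 + 1 = 1
print(f"P(y = 1) = {p:.4f}, logit = {logit(p):.4f}")
```

Whatever values the linear predictor takes, the result stays strictly between 0 and 1, which is the point of the transformation.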

Logistic regression: steps
1. Build up the model: similar to linear regression.
2. Assess model fit: Hosmer and Lemeshow's goodness-of-fit test; a p value > .05 indicates acceptable fit.
3. Interpret the regression coefficient.

Interpret the regression coefficient
- exp($\beta_0$): the odds that y = 1, given x = 0.
- exp($\beta_1$):
  - If x is categorical: the odds ratio of y = 1 in one category of x compared with the reference category of x, holding the other variables (other x's) constant.
  - If x is continuous: the multiplicative change in the odds of y = 1 (the odds ratio) for a one-unit increase in x, holding the other variables (other x's) constant.

Practical coding issues
- Interpretation of the results depends on how the data are coded.
- It is important to check how the outcome is coded, i.e. which level is coded as 1 and which level is coded as 2 (or 0).
- It is important to check how the predictors are coded.
- Common practice is to code binary predictors as 0, 1.
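The effect of coding on interpretation can be seen with a 2x2 table. For a model with a single binary predictor, exp(beta_1) equals the odds ratio computed directly from the table, so the sketch below works from counts rather than fitting a model. The counts and the function name are hypothetical.

```python
import math


def odds_ratio(events_a, nonevents_a, events_b, nonevents_b):
    """Odds of the event in group A divided by the odds in group B (the reference)."""
    odds_a = events_a / nonevents_a
    odds_b = events_b / nonevents_b
    return odds_a / odds_b


# Hypothetical counts: 20 events / 80 non-events among the exposed,
# 10 events / 90 non-events among the unexposed.
or_exposed_vs_unexposed = odds_ratio(20, 80, 10, 90)
beta_1 = math.log(or_exposed_vs_unexposed)

# Recoding so that the exposed group becomes the reference category
# inverts the odds ratio and flips the sign of the coefficient.
or_recoded = odds_ratio(10, 90, 20, 80)
beta_1_recoded = math.log(or_recoded)
print(or_exposed_vs_unexposed, beta_1, or_recoded, beta_1_recoded)
```

The same data therefore yield an odds ratio above or below 1 depending only on which level is treated as the reference, which is why the coding has to be checked before results are interpreted.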

Logistic regression: an example and interpretation
Blood alcohol concentration (BAC) > 50 mg/dL was examined among men with unintentional injury who were admitted to the emergency room. Predictors of BAC were daytime, weekday, being Caucasian, and age of 40. The reported OR (95% CI) for being Caucasian and for age of 40 were 1.32 ([...]) and 0.89 ([...]). Interpret the results.

Answer
- Caucasians were significantly more likely to have elevated BAC than other races.
- Age did not significantly predict elevated BAC.
From: Basic & Clinical Biostatistics, 4e, 2004.

Survival analysis
- Definition: the statistical methods for analyzing survival data when there are censored observations.
- Censored observation: an observation whose value is unknown, generally because the subject has not been in the study long enough for the outcome of interest, such as death, to occur.
- Common methods of summarization and presentation:
  - Person-time
  - Life-tables
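Statements such as "significantly more likely" in the logistic regression example usually rest on whether the 95% confidence interval of the odds ratio excludes 1. The sketch below shows one common way such an interval is formed, a Wald interval exp(beta ± 1.96 × SE). The coefficients and standard errors are invented for illustration and are not the values from the BAC study; the function names are assumptions too.

```python
import math


def odds_ratio_with_ci(beta, standard_error, z=1.96):
    """Odds ratio and Wald 95% confidence limits from a logistic coefficient."""
    lower = math.exp(beta - z * standard_error)
    upper = math.exp(beta + z * standard_error)
    return math.exp(beta), lower, upper


def is_significant(lower, upper):
    """An odds ratio differs significantly from 1 when its CI excludes 1."""
    return lower > 1 or upper < 1


# Hypothetical predictor A: positive coefficient, interval above 1.
or_a, lower_a, upper_a = odds_ratio_with_ci(0.5, 0.2)

# Hypothetical predictor B: coefficient near zero, interval spans 1.
or_b, lower_b, upper_b = odds_ratio_with_ci(-0.1, 0.2)

print(or_a, is_significant(lower_a, upper_a))
print(or_b, is_significant(lower_b, upper_b))
```

An odds ratio below 1 with an interval that spans 1, like the hypothetical predictor B, is read as no significant association.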

Survival analysis: person-time
- Person-time is the length of follow-up.
- Example: if two subjects were followed, one for 2 years and the second for 1 year, the total person-time is 3 person-years.
- Can be used to calculate incidence density.
- Incidence density is the number of events divided by the total person-time.
- A useful method if the event can be recurrent.

Survival analysis: life-tables (covered in the epi course)
- Two methods:
  - Kaplan-Meier method
  - Actuarial method
- Requirements:
  - Date of entry
  - Date of withdrawal
  - Cause of withdrawal: death or loss to follow-up

Survival analysis: common methods to test significance
- Bivariate: logrank test, Mantel-Haenszel chi-square test
- Cox's proportional hazard model

Cox proportional hazard model
- A regression model for censored outcome data.
- Data are said to be censored because of:
  - Loss to follow-up
  - End of the study
- The dependent variable is the survival time (time to event).
  $h(t, X_1, X_2, \ldots, X_k) = h_0(t)\, e^{b_1 x_1 + b_2 x_2 + \cdots + b_k x_k}$
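The person-time example and the Kaplan-Meier method can both be sketched briefly. The follow-up times of 2 and 1 years come from the slide; the single event and the four-subject survival data set are hypothetical. The Kaplan-Meier function uses the standard product-limit rule (at each event time, multiply the running survival by 1 minus deaths over number at risk), which the slides name but do not spell out.

```python
def incidence_density(events, follow_up_times):
    """Number of events divided by the total person-time."""
    return events / sum(follow_up_times)


def kaplan_meier(observations):
    """Product-limit survival estimate.

    observations: list of (time, event) pairs, where event is 1 if the
    event occurred at that time and 0 if the subject was censored.
    Returns a list of (event_time, estimated_survival) pairs.
    """
    survival = 1.0
    curve = []
    event_times = sorted({time for time, event in observations if event == 1})
    for t in event_times:
        at_risk = sum(1 for time, _ in observations if time >= t)
        deaths = sum(1 for time, event in observations if time == t and event == 1)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Person-time from the slide: one subject followed 2 years, another 1 year.
follow_up_years = [2, 1]
total_person_years = sum(follow_up_years)
rate = incidence_density(1, follow_up_years)   # assumes one hypothetical event

# Hypothetical survival data: events at years 1, 3, and 4; censoring at year 2.
curve = kaplan_meier([(1, 1), (2, 0), (3, 1), (4, 1)])
print(total_person_years, rate, curve)
```

The censored subject contributes to the number at risk at year 1 but then drops out of the risk set without counting as a death, which is how censoring is handled in this method.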

Cox proportional hazard model (continued)
- Answers the question: what is the likelihood of the event at a particular time (i.e. dying in the next interval), given survival up to this time and given a set of independent variables?
- Allows estimation of the relative risk (also called the hazard ratio).
  - In other words, it answers the question: what is the risk of an event (such as death) at a given time, given that it has not occurred until that time?
  - The hazard ratio is the ratio of the risk of the event at a given time in the exposed to the risk in the unexposed.
- Assessing the assumption of proportional hazards is beyond this class.
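A small numeric sketch can show why the model is called "proportional": under the Cox equation h(t, X) = h_0(t) exp(b_1 x_1 + ... + b_k x_k), the ratio of the hazards for exposed and unexposed subjects is exp(b) whatever the baseline hazard is. The coefficient and baseline values below are hypothetical and the function name is illustrative.

```python
import math


def hazard(baseline_hazard, coefficients, covariates):
    """Cox model hazard: h0(t) * exp(b1*x1 + ... + bk*xk)."""
    linear_predictor = sum(b * x for b, x in zip(coefficients, covariates))
    return baseline_hazard * math.exp(linear_predictor)


# Hypothetical coefficient for a single 0/1 exposure variable.
coefficients = [0.5]

# Hypothetical baseline hazards at an early and a late follow-up time.
ratios = []
for baseline in (0.02, 0.10):
    exposed = hazard(baseline, coefficients, [1])
    unexposed = hazard(baseline, coefficients, [0])
    ratios.append(exposed / unexposed)

hazard_ratio = math.exp(coefficients[0])
print(ratios, hazard_ratio)
```

Both ratios equal exp(0.5) even though the baseline hazard differs fivefold between the two time points. That constancy over time is the proportional hazards assumption.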

Statistics in medicine

Statistics in medicine Statistics in medicine Lecture 3: Bivariate association : Categorical variables Proportion in one group One group is measured one time: z test Use the z distribution as an approximation to the binomial

More information

Correlation and Simple Linear Regression

Correlation and Simple Linear Regression Correlation and Simple Linear Regression Sasivimol Rattanasiri, Ph.D Section for Clinical Epidemiology and Biostatistics Ramathibodi Hospital, Mahidol University E-mail: sasivimol.rat@mahidol.ac.th 1 Outline

More information

Correlation. A statistics method to measure the relationship between two variables. Three characteristics

Correlation. A statistics method to measure the relationship between two variables. Three characteristics Correlation Correlation A statistics method to measure the relationship between two variables Three characteristics Direction of the relationship Form of the relationship Strength/Consistency Direction

More information

Correlation and regression

Correlation and regression 1 Correlation and regression Yongjua Laosiritaworn Introductory on Field Epidemiology 6 July 2015, Thailand Data 2 Illustrative data (Doll, 1955) 3 Scatter plot 4 Doll, 1955 5 6 Correlation coefficient,

More information

Statistics in medicine

Statistics in medicine Statistics in medicine Lecture 1- part 1: Describing variation, and graphical presentation Outline Sources of variation Types of variables Fatma Shebl, MD, MS, MPH, PhD Assistant Professor Chronic Disease

More information

Multiple linear regression S6

Multiple linear regression S6 Basic medical statistics for clinical and experimental research Multiple linear regression S6 Katarzyna Jóźwiak k.jozwiak@nki.nl November 15, 2017 1/42 Introduction Two main motivations for doing multiple

More information

REVIEW 8/2/2017 陈芳华东师大英语系

REVIEW 8/2/2017 陈芳华东师大英语系 REVIEW Hypothesis testing starts with a null hypothesis and a null distribution. We compare what we have to the null distribution, if the result is too extreme to belong to the null distribution (p

More information

Basic Medical Statistics Course

Basic Medical Statistics Course Basic Medical Statistics Course S7 Logistic Regression November 2015 Wilma Heemsbergen w.heemsbergen@nki.nl Logistic Regression The concept of a relationship between the distribution of a dependent variable

More information

Can you tell the relationship between students SAT scores and their college grades?

Can you tell the relationship between students SAT scores and their college grades? Correlation One Challenge Can you tell the relationship between students SAT scores and their college grades? A: The higher SAT scores are, the better GPA may be. B: The higher SAT scores are, the lower

More information

Correlation and simple linear regression S5

Correlation and simple linear regression S5 Basic medical statistics for clinical and eperimental research Correlation and simple linear regression S5 Katarzyna Jóźwiak k.jozwiak@nki.nl November 15, 2017 1/41 Introduction Eample: Brain size and

More information

THE PEARSON CORRELATION COEFFICIENT

THE PEARSON CORRELATION COEFFICIENT CORRELATION Two variables are said to have a relation if knowing the value of one variable gives you information about the likely value of the second variable this is known as a bivariate relation There

More information

Analysing data: regression and correlation S6 and S7

Analysing data: regression and correlation S6 and S7 Basic medical statistics for clinical and experimental research Analysing data: regression and correlation S6 and S7 K. Jozwiak k.jozwiak@nki.nl 2 / 49 Correlation So far we have looked at the association

More information

Business Statistics. Lecture 10: Correlation and Linear Regression

Business Statistics. Lecture 10: Correlation and Linear Regression Business Statistics Lecture 10: Correlation and Linear Regression Scatterplot A scatterplot shows the relationship between two quantitative variables measured on the same individuals. It displays the Form

More information

Introduction to Statistical Analysis

Introduction to Statistical Analysis Introduction to Statistical Analysis Changyu Shen Richard A. and Susan F. Smith Center for Outcomes Research in Cardiology Beth Israel Deaconess Medical Center Harvard Medical School Objectives Descriptive

More information

Machine Learning. Module 3-4: Regression and Survival Analysis Day 2, Asst. Prof. Dr. Santitham Prom-on

Machine Learning. Module 3-4: Regression and Survival Analysis Day 2, Asst. Prof. Dr. Santitham Prom-on Machine Learning Module 3-4: Regression and Survival Analysis Day 2, 9.00 16.00 Asst. Prof. Dr. Santitham Prom-on Department of Computer Engineering, Faculty of Engineering King Mongkut s University of

More information

Acknowledgements. Outline. Marie Diener-West. ICTR Leadership / Team INTRODUCTION TO CLINICAL RESEARCH. Introduction to Linear Regression

Acknowledgements. Outline. Marie Diener-West. ICTR Leadership / Team INTRODUCTION TO CLINICAL RESEARCH. Introduction to Linear Regression INTRODUCTION TO CLINICAL RESEARCH Introduction to Linear Regression Karen Bandeen-Roche, Ph.D. July 17, 2012 Acknowledgements Marie Diener-West Rick Thompson ICTR Leadership / Team JHU Intro to Clinical

More information

Introduction to logistic regression

Introduction to logistic regression Introduction to logistic regression Tuan V. Nguyen Professor and NHMRC Senior Research Fellow Garvan Institute of Medical Research University of New South Wales Sydney, Australia What we are going to learn

More information

TMA 4275 Lifetime Analysis June 2004 Solution

TMA 4275 Lifetime Analysis June 2004 Solution TMA 4275 Lifetime Analysis June 2004 Solution Problem 1 a) Observation of the outcome is censored, if the time of the outcome is not known exactly and only the last time when it was observed being intact,

More information

Bivariate Relationships Between Variables

Bivariate Relationships Between Variables Bivariate Relationships Between Variables BUS 735: Business Decision Making and Research 1 Goals Specific goals: Detect relationships between variables. Be able to prescribe appropriate statistical methods

More information

Practical Biostatistics

Practical Biostatistics Practical Biostatistics Clinical Epidemiology, Biostatistics and Bioinformatics AMC Multivariable regression Day 5 Recap Describing association: Correlation Parametric technique: Pearson (PMCC) Non-parametric:

More information

Nemours Biomedical Research Statistics Course. Li Xie Nemours Biostatistics Core October 14, 2014

Nemours Biomedical Research Statistics Course. Li Xie Nemours Biostatistics Core October 14, 2014 Nemours Biomedical Research Statistics Course Li Xie Nemours Biostatistics Core October 14, 2014 Outline Recap Introduction to Logistic Regression Recap Descriptive statistics Variable type Example of

More information

Important note: Transcripts are not substitutes for textbook assignments. 1

Important note: Transcripts are not substitutes for textbook assignments. 1 In this lesson we will cover correlation and regression, two really common statistical analyses for quantitative (or continuous) data. Specially we will review how to organize the data, the importance

More information

Correlation and Linear Regression

Correlation and Linear Regression Correlation and Linear Regression Correlation: Relationships between Variables So far, nearly all of our discussion of inferential statistics has focused on testing for differences between group means

More information

Turning a research question into a statistical question.

Turning a research question into a statistical question. Turning a research question into a statistical question. IGINAL QUESTION: Concept Concept Concept ABOUT ONE CONCEPT ABOUT RELATIONSHIPS BETWEEN CONCEPTS TYPE OF QUESTION: DESCRIBE what s going on? DECIDE

More information

Survival Analysis Math 434 Fall 2011

Survival Analysis Math 434 Fall 2011 Survival Analysis Math 434 Fall 2011 Part IV: Chap. 8,9.2,9.3,11: Semiparametric Proportional Hazards Regression Jimin Ding Math Dept. www.math.wustl.edu/ jmding/math434/fall09/index.html Basic Model Setup

More information

Statistics Introductory Correlation

Statistics Introductory Correlation Statistics Introductory Correlation Session 10 oscardavid.barrerarodriguez@sciencespo.fr April 9, 2018 Outline 1 Statistics are not used only to describe central tendency and variability for a single variable.

More information

Analysis of Categorical Data. Nick Jackson University of Southern California Department of Psychology 10/11/2013

Analysis of Categorical Data. Nick Jackson University of Southern California Department of Psychology 10/11/2013 Analysis of Categorical Data Nick Jackson University of Southern California Department of Psychology 10/11/2013 1 Overview Data Types Contingency Tables Logit Models Binomial Ordinal Nominal 2 Things not

More information

Logistic Regression. Continued Psy 524 Ainsworth

Logistic Regression. Continued Psy 524 Ainsworth Logistic Regression Continued Psy 524 Ainsworth Equations Regression Equation Y e = 1 + A+ B X + B X + B X 1 1 2 2 3 3 i A+ B X + B X + B X e 1 1 2 2 3 3 Equations The linear part of the logistic regression

More information

" M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2

 M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2 Notation and Equations for Final Exam Symbol Definition X The variable we measure in a scientific study n The size of the sample N The size of the population M The mean of the sample µ The mean of the

More information

Nemours Biomedical Research Biostatistics Core Statistics Course Session 4. Li Xie March 4, 2015

Nemours Biomedical Research Biostatistics Core Statistics Course Session 4. Li Xie March 4, 2015 Nemours Biomedical Research Biostatistics Core Statistics Course Session 4 Li Xie March 4, 2015 Outline Recap: Pairwise analysis with example of twosample unpaired t-test Today: More on t-tests; Introduction

More information

Ph.D. course: Regression models. Introduction. 19 April 2012

Ph.D. course: Regression models. Introduction. 19 April 2012 Ph.D. course: Regression models Introduction PKA & LTS Sect. 1.1, 1.2, 1.4 19 April 2012 www.biostat.ku.dk/~pka/regrmodels12 Per Kragh Andersen 1 Regression models The distribution of one outcome variable

More information

Correlation and Regression Bangkok, 14-18, Sept. 2015

Correlation and Regression Bangkok, 14-18, Sept. 2015 Analysing and Understanding Learning Assessment for Evidence-based Policy Making Correlation and Regression Bangkok, 14-18, Sept. 2015 Australian Council for Educational Research Correlation The strength

More information

Ph.D. course: Regression models. Regression models. Explanatory variables. Example 1.1: Body mass index and vitamin D status

Ph.D. course: Regression models. Regression models. Explanatory variables. Example 1.1: Body mass index and vitamin D status Ph.D. course: Regression models Introduction PKA & LTS Sect. 1.1, 1.2, 1.4 25 April 2013 www.biostat.ku.dk/~pka/regrmodels13 Per Kragh Andersen Regression models The distribution of one outcome variable

More information

Lecture 5: ANOVA and Correlation

Lecture 5: ANOVA and Correlation Lecture 5: ANOVA and Correlation Ani Manichaikul amanicha@jhsph.edu 23 April 2007 1 / 62 Comparing Multiple Groups Continous data: comparing means Analysis of variance Binary data: comparing proportions

More information

Binary Logistic Regression

Binary Logistic Regression The coefficients of the multiple regression model are estimated using sample data with k independent variables Estimated (or predicted) value of Y Estimated intercept Estimated slope coefficients Ŷ = b

More information

Section IX. Introduction to Logistic Regression for binary outcomes. Poisson regression

Section IX. Introduction to Logistic Regression for binary outcomes. Poisson regression Section IX Introduction to Logistic Regression for binary outcomes Poisson regression 0 Sec 9 - Logistic regression In linear regression, we studied models where Y is a continuous variable. What about

More information

AMS 7 Correlation and Regression Lecture 8

AMS 7 Correlation and Regression Lecture 8 AMS 7 Correlation and Regression Lecture 8 Department of Applied Mathematics and Statistics, University of California, Santa Cruz Suumer 2014 1 / 18 Correlation pairs of continuous observations. Correlation

More information

Stat 642, Lecture notes for 04/12/05 96

Stat 642, Lecture notes for 04/12/05 96 Stat 642, Lecture notes for 04/12/05 96 Hosmer-Lemeshow Statistic The Hosmer-Lemeshow Statistic is another measure of lack of fit. Hosmer and Lemeshow recommend partitioning the observations into 10 equal

More information

Computational Systems Biology: Biology X

Computational Systems Biology: Biology X Bud Mishra Room 1002, 715 Broadway, Courant Institute, NYU, New York, USA L#7:(Mar-23-2010) Genome Wide Association Studies 1 The law of causality... is a relic of a bygone age, surviving, like the monarchy,

More information

Correlation & Regression. Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University of Alexandria

Correlation & Regression. Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University of Alexandria بسم الرحمن الرحيم Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University of Alexandria Correlation Finding the relationship between

More information

More Statistics tutorial at Logistic Regression and the new:

More Statistics tutorial at  Logistic Regression and the new: Logistic Regression and the new: Residual Logistic Regression 1 Outline 1. Logistic Regression 2. Confounding Variables 3. Controlling for Confounding Variables 4. Residual Linear Regression 5. Residual

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

BMI 541/699 Lecture 22

BMI 541/699 Lecture 22 BMI 541/699 Lecture 22 Where we are: 1. Introduction and Experimental Design 2. Exploratory Data Analysis 3. Probability 4. T-based methods for continous variables 5. Power and sample size for t-based

More information

ADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES. Cox's regression analysis: Time dependent explanatory variables

Henrik Ravn, Bandim Health Project, Statens Serum Institut, 4 November 2011

Inferences for Regression

An Example: Body Fat and Waist Size. Looking at the relationship between % body fat and waist size (in inches). Here is a scatterplot of our data set: Remembering Regression In

BIOSTATISTICS NURS 3324

Simple Linear Regression and Correlation. Introduction: Previously, our attention has been focused on one variable which we designated by x. Frequently, it is desirable to learn something about the relationship

Chapter 16: Correlation

So far: We've focused on hypothesis testing. Is the relationship we observe between x and y in our sample true generally (i.e. for the population from which the sample came)? Which answers

BIOL 51A - Biostatistics 1 1. Lecture 1: Intro to Biostatistics. Smoking: hazardous? FEV (l) Smoke

Lecture 1: Intro to Biostatistics. Smoking: hazardous? [Figure: FEV (l) by smoking status (No/Yes)] Box Plot, a.k.a. box-and-whisker diagram or candlestick chart

Correlation: Relationships between Variables

So far, nearly all of our discussion of inferential statistics has focused on testing for differences between group means. However, researchers are

CORELATION - Pearson-r - Spearman-rho

Scatter Diagram: A scatter diagram is a graph that shows the relationship between two variables measured on the same individual. Each individual in the set is

Biostatistics 4: Trends and Differences

Dr. Jessica Ketchum, PhD. Email: McKinneyJL@vcu.edu. Objectives: 1) Know how to see the strength, direction, and linearity of relationships in a scatter plot 2) Interpret

Categorical Predictor Variables

We often wish to use categorical (or qualitative) variables as covariates in a regression model. For binary variables (taking on only 2 values, e.g. sex), it is relatively

Objectives. 2.3 Least-squares regression. Regression lines. Prediction and Extrapolation. Correlation and $r^2$. Transforming relationships

Adapted from authors' slides, 2012 W.H. Freeman and Company. Straight Line

Biostatistics for physicists fall Correlation Linear regression Analysis of variance

Biostatistics for physicists, fall 2015. Correlation. Example: Antibody level on 38 newborns and their mothers. There is a positive correlation in antibody

Lecture 12: Effect modification, and confounding in logistic regression

Ani Manichaikul, amanicha@jhsph.edu, 4 May 2007. Today: Categorical predictor: create dummy variables just like for linear regression

Linear regression and correlation

Faculty of Health Sciences. Statistics for experimental medical researchers 2018. Julie Forman, Christian Pipper & Claus Ekstrøm, Department of Biostatistics, University

23. Inference for regression

The Practice of Statistics in the Life Sciences, Third Edition. 2014 W. H. Freeman and Company. Objectives (PSLS Chapter 23): Inference for regression. The regression model. Confidence

STA441: Spring Multiple Regression. This slide show is a free open source document. See the last slide for copyright information.

STA441: Spring 2018. Least Squares Plane. Statistical MODEL: There are p-1 explanatory

Chapter 4. Regression Models. Learning Objectives

To accompany Quantitative Analysis for Management, Eleventh Edition, by Render, Stair, and Hanna. PowerPoint slides created by Brian Peterson. Learning Objectives: After completing

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

http://isi.cbs.nl/glossary/index.htm Adjusted $r^2$: Adjusted R squared measures the proportion of the

Chapter 10. Regression. Understandable Statistics Ninth Edition By Brase and Brase Prepared by Yixun Shi Bloomsburg University of Pennsylvania

Scatter Diagrams: A graph in which pairs of points, (x, y), are

Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression

BSTT523: Kutner et al., Chapter 1. Introduction: Functional relation between

Small n, σ known or unknown, underlying nongaussian

READY GUIDE Summary Tables. SUMMARY-1: Methods to compute some confidence intervals. Parameter of Interest / Conditions / 95% CI. Proportion (π): Large n, p 0 and p 1: Equation 12.11. Small n, any p: Figure 12-4

Categorical data analysis Chapter 5

Interpreting parameters in logistic regression: The sign of β determines whether π(x) is increasing or decreasing as x increases. The rate of climb or descent increases

Lecture 14: Introduction to Poisson Regression

Ani Manichaikul, amanicha@jhsph.edu, 8 May 2007. Overview: Modelling counts. Contingency tables. Poisson regression models. Modelling counts I: Why

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Modelling counts I. Ani Manichaikul, amanicha@jhsph.edu. Why count data? Number of traffic accidents per day. Mortality counts in a given neighborhood, per week

MAS3301 / MAS8311 Biostatistics Part II: Survival

M. Farrow, School of Mathematics and Statistics, Newcastle University, Semester 2, 2009-10. 13 The Cox proportional hazards model. 13.1 Introduction: In the

Biost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation

Scott S. Emerson, M.D., Ph.D., Professor of Biostatistics, University of Washington. Lecture 5: Review. Purpose of Statistics: Statistics is about science (Science in the broadest

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7

Introduction to Generalized Univariate Models: Models for Binary Outcomes. In This Lecture: A short review

Review: what is a linear model. A model of the following form: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots$

Outline for today: What is a generalized linear model. Linear predictors and link functions. Example: fit a constant (the proportion). Analysis of deviance table. Example: fit dose-response data using logistic

Measuring Associations: Pearson's correlation

Scatter Diagram: A scatter diagram is a graph that shows the relationship between two variables measured on the same individual. Each individual in the

Correlation and the Analysis of Variance Approach to Simple Linear Regression

Biometry 755, Spring 2009. Correlation

Chapter 6: Exploring Data: Relationships Lesson Plan

For All Practical Purposes: Mathematical Literacy in Today's World, 9th ed. Displaying Relationships: Scatterplots. Making Predictions: Regression Line

Chapter 4: Regression Models

Textbook: pp. 129-164. [Figure: sales volume of company vs. money spent on advertising] Learning Objectives: After completing this chapter, students will be able to: Identify variables,

Analysis of Time-to-Event Data: Chapter 6 - Regression diagnostics

Steffen Unkel, Department of Medical Statistics, University Medical Center Göttingen, Germany. Winter term 2018/19. Residuals for the

APPENDIX B Sample-Size Calculation Methods: Classical Design

One/Paired-Sample Hypothesis Test for the Mean. Sign test for median difference for a paired sample. Wilcoxon signed-rank test for one or

Correlation and Regression

Elementary Statistics: A Step by Step Approach, Sixth Edition, by Allan G. Bluman. http://www.mhhe.com/math/stat/blumanbrief Slides prepared by Lloyd R. Jaisingh, Morehead State University, Morehead, KY. Updated

Review of Multiple Regression

Ronald H. Heck. Let's begin with a little review of multiple regression this week. Linear models [e.g., correlation, t-tests, analysis of variance (ANOVA), multiple regression, path analysis, multivariate

Parametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami

Parametric Assumptions: The observations must be independent. Dependent variable should be continuous

Analysing categorical data using logit models

Graeme Hutcheson, University of Manchester. The lecture notes, exercises and data sets associated with this course are available for download from: www.research-training.net/manchester

7.2 One-Sample Correlation ( = a) Introduction. Correlation analysis measures the strength and direction of association between

Introduction: Correlation analysis measures the strength and direction of association between variables. In this chapter we will test whether the population correlation

Lecture 8: Summary Measures

Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data, Spring 2011. Division of Biostatistics and Epidemiology, Medical University of South Carolina

Review. Number of variables. Standard Scores. Anecdotal / Clinical. Bivariate relationships. Ch. 3: Correlation & Linear Regression

Ch. 3: Correlation & Relationships between variables. Scatterplots. Exercise. Correlation. Race / DNA. Review: Why numbers? Distribution & Graphs: Histogram. Central Tendency. Mean (SD). The Central Limit Theorem

Chapter 4 Describing the Relation between Two Variables

4.1 Scatter Diagrams and Correlation. The ____ is the variable whose value can be explained by the value of the ____ or ____. A ____ is a graph that shows the relationship

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data

Ronald Heck. This week we'll take up a couple of issues. The first is working with a probit link function. While

Generalized logit models for nominal multinomial responses. Local odds ratios

Categorical Data Analysis, Summer 2015. Local odds ratios. Table of cell probabilities (rows X, columns Y = 1, 2, 3, 4, then row total): X = 1: π11, π12, π13, π14, π1+; X = 2: π21, π22, π23, π24, π2+; X = 3: π31, π32, π

Review of Statistics 101

We review some important themes from the course. 1. Introduction. Statistics: Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

Draft Proof - Do not copy, post, or distribute. Chapter Learning Objectives REGRESSION AND CORRELATION THE SCATTER DIAGRAM

REGRESSION AND CORRELATION. As we learned in Chapter 9 (Bivariate Tables), the differential access to the Internet is real and persistent. Celeste Campos-Castillo's (2015) research confirmed the impact

Logistic Regression: Regression with a Binary Dependent Variable

LEARNING OBJECTIVES: Upon completing this chapter, you should be able to do the following: State the circumstances under which logistic regression

Lecture 7 Time-dependent Covariates in Cox Regression

So far, we've been considering the following Cox PH model: $\lambda(t \mid Z) = \lambda_0(t)\exp(\beta^\top Z) = \lambda_0(t)\exp\left(\sum_j \beta_j Z_j\right)$, where $\beta_j$ is the parameter for the

CHAPTER 5. Outlier Detection in Multivariate Data

5.1 Introduction: Multivariate outlier detection is an important task in the statistical analysis of multivariate data. Many methods have been proposed for

Chapter 12 - Part I: Correlation Analysis

ST coursework due Friday, April - Chapter - Part I: Correlation Analysis. Textbook Assignment: Page - # Page - #, Page - #. Lab Assignment # (available on ST webpage). GOALS: When you have completed this lecture,

Survival Analysis I (CHL5209H)

Survival Analysis. Dalla Lana School of Public Health, University of Toronto. olli.saarela@utoronto.ca. January 7, 2015. Literature: Clayton D & Hills M (1993): Statistical Models in Epidemiology. Not really

CS 5014: Research Methods in Computer Science

Clifford A. Shaffer, Department of Computer Science, Virginia Tech, Blacksburg, Virginia. Fall 2010. Copyright © 2010 by Clifford A. Shaffer. Correlation and

Statistical Thinking in Biomedical Research Session #3 Statistical Modeling

Lily Wang, PhD, Department of Biostatistics (modified from notes by J. Patrie, R. Abbott, U of Virginia and WD Dupont, Vanderbilt

Chapter 6. Logistic Regression. 6.1 A linear model for the log odds

In logistic regression, there is a categorical response variable, often coded 1=Yes and 0=No. Many important phenomena fit this framework. The patient survives the operation,

Unit 11: Multiple Linear Regression

Statistics 571: Statistical Methods. Ramón V. León, 7/13/2004. Main Application of Multiple Regression: Isolating the effect of a variable

Statistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018

March 21, 2018. Outline of boot camp: Summarizing and simplifying data. Point and interval estimation. Foundations of statistical

Unit 6 - Introduction to linear regression

Suggested reading: OpenIntro Statistics, Chapter 7. Suggested exercises: Part 1 - Relationship between two numerical variables: 7.7, 7.9, 7.11, 7.13, 7.15, 7.25,