Statistics in medicine
Lecture 4: Correlation and multivariable regression
Fatma Shebl, MD, MS, MPH, PhD
Assistant Professor, Chronic Disease Epidemiology Department, Yale School of Public Health

Outline
- Correlation
- Regression
  - Linear
  - Logistic
  - Cox's proportional hazard model

Correlation and prediction methods
- Correlation: simple (Pearson, Spearman, other) and partial
- Regression and prediction: linear, logistic, other

Correlation
- A measure of the strength and direction of the association between two variables measured on a numerical scale.
- Measured by the correlation coefficient r.
Correlation: sign and range of r
The sign of r indicates the direction of the association.
- + sign (positive correlation): high values of one variable are associated with high values of the second variable.
- - sign (negative correlation): high values of one variable are associated with low values of the second variable.
r ranges between +1 and -1.
- +1: perfect positive correlation
- -1: perfect negative correlation
- 0: no correlation

Correlation: properties
- r does not depend on which variable is treated as x and which as y.
- r is unaffected by linear transformation of the variables (shifting or rescaling; multiplying one variable by a negative constant reverses only the sign of r).
- r close to 0 does not mean there is no relationship; a strong non-linear relationship might exist.
- Correlation does NOT indicate causation.

Correlation: visualization
- Scatterplot: a two-dimensional graph displaying the relationship between two numerical characteristics.
- Also called a joint distribution graph.
- Plot (x, y) to assess the pattern of the relationship.
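The properties listed above can be checked numerically. The following is a minimal sketch using only the Python standard library; the data values and the helper name `pearson_r` are illustrative assumptions, not part of the lecture.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up data with a strong positive linear trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r_original = pearson_r(x, y)

# Linear transformation (positive rescaling plus a shift) leaves r unchanged.
x_scaled = [10 * a + 3 for a in x]
y_scaled = [0.5 * b - 7 for b in y]
r_transformed = pearson_r(x_scaled, y_scaled)

# Swapping the roles of x and y also leaves r unchanged.
r_swapped = pearson_r(y, x)

# A perfect but non-linear relationship (v = u squared) can still give r = 0.
u = [-2.0, -1.0, 0.0, 1.0, 2.0]
v = [a ** 2 for a in u]
r_nonlinear = pearson_r(u, v)

print(r_original, r_transformed, r_swapped, r_nonlinear)
```

The last case is why a scatterplot should be inspected before concluding that two variables are unrelated.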
Correlation: scatterplot
Figure: Scatterplots and correlations. A: r = +1.0; B: r = 0.7; C: r = -0.9; D: r = -0.4; E: r = 0.0; F: r = 0.0.
Legend (patterns): A. Perfect positive; B. Positive; C. Negative; D. Weak negative; E. Nonexistent; F. Nonlinear.
From: Chapter 8, Research Questions About Relationships among Variables. Basic & Clinical Biostatistics, 4e, 2004. Copyright 2016 McGraw-Hill Education. All rights reserved.

Correlation: types
- Pearson product-moment: interval or ratio scale
- Spearman rank-order: ordinal scale
- Other

Pearson product-moment correlation
Used for two numerical, normally distributed variables.
Assumptions:
- Linear relationship
- Normal distribution
- No outliers
- Large sample size (>30)
Test of significance:
1. Calculate r (the correlation coefficient):
\[ r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}} \]
2. Calculate the degrees of freedom: $df = n - 2$
3. Calculate the test statistic t:
\[ t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} \]
4. Find the critical value of t for the chosen significance level.
5. Draw a conclusion.
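Steps 2 and 3 of the significance test above can be sketched as follows. The values r = 0.5 and n = 27 are hypothetical and chosen only for illustration; the critical value is the standard two-sided 5% t-table entry for 25 degrees of freedom.

```python
import math

def correlation_t_test(r, n):
    """Return (t, df) for testing H0: no correlation, given r and sample size n."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return t, df

# Hypothetical inputs, not taken from the lecture.
t_stat, df = correlation_t_test(r=0.5, n=27)

# Two-sided critical value of t at alpha = 0.05 with df = 25 (from a t table).
critical_t = 2.060
reject_null = abs(t_stat) > critical_t

print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject_null}")
```

In practice the critical value (or an exact p value) would come from statistical software rather than a hard-coded table entry.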
Spearman's rho
Could be used for:
- Variables that are NOT normally distributed
- Normally distributed variables
Based on ranks.
Test of significance:
1. Calculate $r_s$ (the correlation coefficient), where $R_X$ and $R_Y$ are the ranks of X and Y:
\[ r_s = \frac{\sum (R_X - \bar{R}_X)(R_Y - \bar{R}_Y)}{\sqrt{\sum (R_X - \bar{R}_X)^2 \, \sum (R_Y - \bar{R}_Y)^2}} \]
2. Calculate the degrees of freedom: $df = n - 2$
3. Calculate the test statistic t:
\[ t = \frac{r_s\sqrt{n - 2}}{\sqrt{1 - r_s^2}} \]
4. Find the critical value of t for the chosen significance level.
5. Draw a conclusion.

Interpretation of the size of r
Rule of thumb, in increasing order of the size of r:
- no to trivial correlation
- very low correlation
- low correlation
- moderate correlation
- high correlation
- .90-1: very high correlation
- 1: perfect correlation

Interpretation of the size of r
- r is affected by sample size: with a large sample, even a small r can give statistically significant results.
- Interpretation is better based on $r^2$ (known as the coefficient of determination).
  - It is the proportion of the variance in one variable that is accounted for by the other variable.
  - It is a measure of the strength of the relationship.

Correlation: example
A study was conducted to examine whether serum calcium and serum triglycerides are correlated. If the correlation coefficient is 20%, interpret r, give the coefficient of determination and its interpretation, and state whether causation can be inferred.
Answer:
- There is a low correlation between serum calcium and serum triglycerides.
- $r^2 = 0.2 \times 0.2 = 0.04 = 4\%$
- Interpretation of $r^2$: 4% of the variation in serum calcium is accounted for by serum triglycerides (and vice versa).
- Causation cannot be inferred.
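Because Spearman's rho is the Pearson formula applied to ranks, it can be sketched in a few lines. This is an illustrative sketch only: the helper names and the data are assumptions, and ties are handled by average ranks, which is a common convention rather than something stated in the lecture.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Made-up data: monotone increasing but far from linear (one extreme value).
x = [10, 20, 30, 40, 50]
y = [1, 4, 9, 16, 1000]
rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(f"Spearman rho = {rho:.3f}, Pearson r = {r:.3f}")
```

Here the ranks agree perfectly, so rho is 1 even though the extreme value pulls the Pearson coefficient below 1.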
Partial correlation
- A measure of the strength and direction of the association between two variables, controlling for one or more other variables.
- r ranges between +1 and -1.
- Assumptions:
  - Linear relationship between all pairs of variables
  - Normal distribution
  - No outliers

Multivariable regression
Definition: statistical models that have one dependent (outcome) variable but include more than one independent variable.

Rationale of the regression equations
Example: suppose we hypothesize that cholesterol level is predicted by age, gender, and diabetic status, and we would like to find the line (as in the scatter diagram) that best fits this relationship. We can write this as a straight-line equation: $Y = a + bx$

Rationale of the straight-line equation in regression
Cholesterol = age + gender + diabetes
Not all of these predictors are equally important, so we give each predictor a weight (coefficient) relative to its importance:
Cholesterol = (W1)age + (W2)gender + (W3)diabetes

Rationale of the regression equations
We also need a starting point for the calculation, so we add it to the equation:
Cholesterol = starting point + (W1)age + (W2)gender + (W3)diabetes
Because the prediction of the outcome is usually not perfect, we add an error term:
Cholesterol = starting point + (W1)age + (W2)gender + (W3)diabetes + error term
Rationale of the regression equations
The final formula can be expressed as
\[ y = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + e \]
also written as
\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon \]
This equation is commonly referred to as the general linear model.
Interpretation of the symbols:
- $a$ ($\beta_0$): the intercept, i.e., where the line crosses the y-axis.
- $b$ ($\beta_1, \dots, \beta_k$): the regression coefficients (slopes), i.e., the amount y changes each time x changes by 1 unit.
- $e$ ($\varepsilon$): the error term (residual), i.e., the distance by which the actual value of y departs from the regression line.

Rationale of the regression equations: estimation (least-squares method)
- The observed y and x are known, so e has to be calculated.
- Use different values of a and b to calculate the predicted y (y hat): $\hat{y} = a + b_1 x_1 + b_2 x_2 + b_3 x_3$
- e is then calculated as $y - \hat{y}$.
- The best estimate is the one with the least error, i.e., the one that minimizes $\sum e^2 = \sum (y - \hat{y})^2$, the sum of the squared error terms.

Figure: Geometric interpretation of a regression line. Least squares regression line.
From: Chapter 8, Research Questions About Relationships among Variables. Basic & Clinical Biostatistics, 4e, 2004. Copyright 2016 McGraw-Hill Education. All rights reserved.
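The least-squares idea above can be illustrated with a single predictor, where the minimizing intercept and slope have a closed form. This is a simplified sketch under that one-predictor assumption; the data and function names are made up for illustration.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + b*x (one predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sum_squared_errors(x, y, intercept, slope):
    """Sum of squared residuals e = y - y_hat for a candidate line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# Made-up data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_line(x, y)
sse_best = sum_squared_errors(x, y, a, b)

# Any other choice of a or b gives a larger sum of squared errors.
sse_shifted = sum_squared_errors(x, y, a + 0.1, b)
sse_tilted = sum_squared_errors(x, y, a, b - 0.1)

print(f"a = {a:.2f}, b = {b:.2f}, SSE = {sse_best:.2f}")
```

With several predictors the same criterion is minimized, but the solution is normally obtained with matrix methods in statistical software.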
Multivariable regression: applications
- Test for interaction
- Adjust for confounding
- Predict future values of y given x

General linear models
- The types of models described by the previous equation are referred to as general linear models.
- General: because they can accommodate different types of y and/or x.
- Linear: because they are a linear combination of the x terms.
- Commonly used methods: linear regression, logistic regression, and survival analysis.

Readings and resources
- Chapter 8: Dawson, B. and Trapp, R. G. (2004). Basic and Clinical Biostatistics (4th edition). New York: McGraw-Hill.
- Chapter 9: Dawson, B. and Trapp, R. G. (2004). Basic and Clinical Biostatistics (4th edition). New York: McGraw-Hill.
- Chapter 10: Dawson, B. and Trapp, R. G. (2004). Basic and Clinical Biostatistics (4th edition). New York: McGraw-Hill.
- Chapter 11: Katz, D. L. et al. Jekel's Epidemiology, Biostatistics, Preventive Medicine, and Public Health (4th edition).
- Chapter 13: Katz, D. L. et al. Jekel's Epidemiology, Biostatistics, Preventive Medicine, and Public Health (4th edition).

Statistics in medicine
Lecture 4, part 2: Correlation and multiple regression
Fatma Shebl, MD, MS, MPH, PhD
Assistant Professor, Chronic Disease Epidemiology Department, Yale School of Public Health
Fatma.shebl@yale.edu
Outline (part 2)
- Correlation and prediction methods
- Regression
  - Linear
  - Logistic
  - Cox's proportional hazard model

Linear regression
A general linear model in which the dependent variable is a continuous variable.
Types:
- Single continuous predictor: simple linear regression
- Multiple continuous predictors: multiple linear regression
- Single categorical predictor: one-way ANOVA
- Multiple categorical predictors: N-way ANOVA
- Some categorical and some continuous predictors: analysis of covariance (ANCOVA)
Assumptions:
- Linearity: the relation between each independent variable and the dependent variable is linear.
- Independence: the values of Y are independent.
- Homogeneity: the variance of Y is equal across the range of X.
Linear regression: steps
- Build up the model
- Assess model fit
- Interpret the regression coefficients

Build up the model
- The most common method is stepwise (it is automated in most programs).
- Start with one variable in the model (the main predictor, if one is hypothesized).
- Add another variable.
- Keep adding variables to the list of variables already in the model.
- Use a stopping criterion such as: the increase in $R^2$ is < .01.

Build up the model: $R^2$
- A measure of how much of the variation in the outcome is accounted for by the explanatory variables.
- Range: 0 (no variance accounted for) to 1 (all the variance, 100%, accounted for).

Assess model fit
- Residuals (the part of Y that is not explained by X) can be used to assess the model fit.
- Plot the residuals (on the Y axis) versus X.
- The mean of the residuals is zero; therefore, if the model fits the data, the residuals and X should not be correlated.
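The quantities used above to build and check a model ($R^2$ and the residuals) can be computed directly. The sketch below assumes a single predictor and made-up data; `r_squared` is computed as 1 minus the ratio of the residual sum of squares to the total sum of squares.

```python
# Made-up data for a one-predictor linear regression.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope and intercept.
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x

# Residuals: the part of y not explained by x.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# R squared: share of the variation in y accounted for by the model.
ss_residual = sum(e ** 2 for e in residuals)
ss_total = sum((yi - mean_y) ** 2 for yi in y)
r_squared = 1 - ss_residual / ss_total

# Fit checks: residuals average to zero and are uncorrelated with x.
mean_residual = sum(residuals) / n
residual_x_covariance = sum((xi - mean_x) * e for xi, e in zip(x, residuals)) / n

print(f"R^2 = {r_squared:.2f}")
```

Note that least squares forces the residuals to have zero mean and zero linear correlation with x, which is why the residual plot is inspected for patterns (such as curvature) rather than relying on these two numbers alone.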
Assess model fit
Figure: Illustration of analysis of residuals. A: Linear relationship between X and Y. B: Residuals versus values of X for the relation in part A. C: Curvilinear relationship between X and Y. D: Residuals versus values of X for the relation in part C.
Legend: Good fit: the residuals form a random scatter around the zero line.
From: Chapter 8, Research Questions About Relationships among Variables. Basic & Clinical Biostatistics, 4e, 2004. Copyright 2016 McGraw-Hill Education. All rights reserved.

Interpret the regression coefficient: the intercept
- It is the expected value of Y if all X = zero.
- If X cannot be zero, the intercept is not meaningful.
- If the interest is in the relationship between X and Y, the intercept is not of interest and will not affect the conclusion.
- If the interest is in the prediction of Y from X, then X has to be re-scaled so that the intercept is the expected value of Y at the chosen X value.

Interpret the regression coefficient: the slope
- If X is continuous: it is the change in Y for a one-unit increase in X, holding the other variables (other X's) constant.
- If X is categorical: it is the mean difference in Y between one category and the reference category of X, holding the other variables (other X's) constant.

Interpret the regression coefficient: significance
- Compare the test statistic (t) with the critical value of significance for the relevant df.
- The p value:
  - Null hypothesis: the coefficients (intercept and slopes) = zero.
  - If p < the predetermined significance level, reject the null hypothesis.
Linear regression: an example and interpretation
A study was conducted to examine the association between insulin sensitivity scores (outcome) and BMI. The resultant regression equation was of the form Y = a + bX. Interpret the terms in the equation.
- Y: predicted insulin sensitivity.
- a: the intercept, i.e., patients with zero BMI (unrealistic) have an insulin sensitivity of a.
- X: observed BMI value.
- b: the slope, i.e., when BMI increases by 1 unit, predicted insulin sensitivity decreases by the size of b.
From: Basic & Clinical Biostatistics, 4e, 2004

Logistic regression
A general linear model in which the dependent variable is a nominal/categorical variable.
Types:
- Commonly used for a dichotomous dependent variable.
- Could be used for a multinomial dependent variable.
A mathematical function (the logit) is used to transform the regression data so that the predicted probability of y is limited to (0, 1):
\[ \operatorname{logit}\big(p(y = 1 \mid x\text{'s})\big) = \log\frac{p}{1 - p} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k \]
This translates into the probability of the dependent variable as an exponential function of the independent variables:
\[ p(y = 1) = \frac{1}{1 + \exp\left[-(b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k)\right]} \]
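The two logistic formulas above are inverses of each other, which a short sketch can confirm. The coefficient values `b0` and `b1` below are hypothetical, chosen only so the arithmetic is easy to follow.

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))

def inverse_logit(z):
    """Map a linear predictor z back to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))

# Hypothetical coefficients for a one-predictor model (not from the lecture).
b0, b1 = -1.0, 0.5

def predicted_probability(x):
    return inverse_logit(b0 + b1 * x)

for x in (0, 2, 4):
    p = predicted_probability(x)
    print(f"x = {x}: p(y=1) = {p:.3f}, logit = {logit(p):.3f}")
```

Whatever the value of the linear predictor, the predicted probability stays strictly between 0 and 1, which is the reason for using the logit transformation with a dichotomous outcome.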
Logistic regression: steps
- Build up the model: similar to linear regression.
- Assess model fit: Hosmer and Lemeshow's goodness-of-fit test; a p value > .05 indicates an acceptable fit.
- Interpret the regression coefficients.

Interpret the regression coefficient
- $\exp(\beta_0)$: the odds that y = 1, given x = 0.
- $\exp(\beta_1)$:
  - If x is categorical: the odds ratio of y = 1 in one category of x compared to the reference category of x, holding the other variables (other x's) constant.
  - If x is continuous: the multiplicative change in the odds of y = 1 (the odds ratio) for a one-unit increase in x, holding the other variables (other x's) constant.

Practical coding issues
Interpretation of the results depends on how you code your data. Therefore:
- It is important to check how the outcome is coded, i.e., which level is coded as 1 and which level is coded as 2 (or 0).
- It is important to check how the predictors are coded.
- Common practice is to code binary predictors as 0, 1.
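The interpretation of exp(β) and the coding issue above can be made concrete with a small sketch. The coefficients are hypothetical (the slope is set to log 2 so the odds ratio is exactly 2), and the model has a single binary predictor coded 0/1.

```python
import math

# Hypothetical fitted coefficients (not from the lecture).
b0 = -1.5
b1 = math.log(2.0)  # chosen so that exp(b1) is exactly 2

def probability(x):
    """p(y = 1) for predictor value x under the logistic model."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def odds(p):
    return p / (1.0 - p)

# exp(b0) is the odds of y = 1 in the reference category (x = 0).
baseline_odds = odds(probability(0))

# exp(b1) is the odds ratio comparing x = 1 with x = 0.
odds_ratio = odds(probability(1)) / baseline_odds

# If the outcome coding is reversed (the other level is coded as 1),
# the odds ratio becomes the reciprocal.
flipped_odds_ratio = odds(1 - probability(1)) / odds(1 - probability(0))

print(f"baseline odds = {baseline_odds:.3f}")
print(f"odds ratio = {odds_ratio:.3f}, with reversed coding = {flipped_odds_ratio:.3f}")
```

An odds ratio of 2 under one coding and 0.5 under the other describe the same association, which is why the coding must be checked before interpreting the output.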
Logistic regression: an example and interpretation
Blood alcohol concentration (BAC) > 50 mg/dL was examined among men with unintentional injury who were admitted to the emergency room. Predictors of BAC were daytime, weekday, being Caucasian, and age of 40. The reported ORs (95% CI) for being Caucasian and for age of 40 were 1.32 ( ) and 0.89 ( ), respectively. Interpret the results.
Answer:
- Caucasians were significantly more likely to have elevated BAC than other races.
- Age did not significantly predict elevated BAC.
From: Basic & Clinical Biostatistics, 4e, 2004

Survival analysis
- Definition: the statistical methods for analyzing survival data when there are censored observations.
- Censored observation: an observation whose value is unknown, generally because the subject has not been in the study long enough for the outcome of interest, such as death, to occur.

Survival analysis: common methods of summarization and presentation
- Person-time
- Life tables
Survival analysis: person-time
- Person-time is the length of follow-up.
- Example: if two subjects were followed, one for 2 years and the second for 1 year, then the total person-time is 3 person-years.
- Could be used to calculate incidence density: the number of events divided by the total person-time.
- A useful method if the event could be recurrent.

Survival analysis: life tables (covered in the epi course)
- Two methods:
  - Kaplan-Meier method
  - Actuarial method
- Requirements:
  - Date of entry
  - Date of withdrawal
  - Cause of withdrawal: death or loss to follow-up

Survival analysis: common methods to test significance
- Bivariate: logrank test; Mantel-Haenszel chi-square test
- Cox's proportional hazard model

Cox proportional hazard model
- A regression model for use when there is censored outcome data.
- Data are said to be censored because of:
  - Loss to follow-up
  - End of the study
- The dependent variable is the survival time (time to event):
\[ h(t, X_1, X_2, \dots, X_k) = h_0(t)\, e^{b_1 x_1 + b_2 x_2 + \dots + b_k x_k} \]
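The person-time example in this section (one subject followed for 2 years and another for 1 year) can be written out as a short calculation. The follow-up times come from the lecture example; the number of events is a hypothetical value added only to show how incidence density is formed.

```python
# Follow-up time for each subject, in years (from the lecture example).
follow_up_years = [2.0, 1.0]

# Hypothetical number of events observed during follow-up (an assumption).
events = 1

# Total person-time is the sum of the individual follow-up times.
total_person_years = sum(follow_up_years)

# Incidence density: events divided by total person-time.
incidence_density = events / total_person_years

print(f"total person-time = {total_person_years} person-years")
print(f"incidence density = {incidence_density:.3f} events per person-year")
```

Because the denominator is time at risk rather than a count of people, the same calculation still works when one subject contributes more than one event.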
Cox proportional hazard model
- Answers the question: what is the likelihood of the event at a particular time (i.e., dying in the next interval), given survival up to that time and given a set of independent variables?
- In other words, it answers the question of what the risk of an event (such as death) is at a given time, given that it has not occurred until that time.

Cox proportional hazard model
- Allows estimation of the relative risk (also called the hazard ratio): the ratio of the risk of the event at a given time in the exposed to the risk in the unexposed.
- Assessing the assumption of proportional hazards is beyond this class.
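In the Cox model the baseline hazard cancels when two groups are compared, so the hazard ratio for a one-unit difference in a covariate is exp(b) at every time point. The sketch below illustrates this; the baseline hazard function and the coefficient value are hypothetical and serve only as an example.

```python
import math

def baseline_hazard(t):
    """Hypothetical baseline hazard h0(t); any positive function of time would do."""
    return 0.01 + 0.002 * t

def hazard(t, x, b):
    """Cox model hazard: h0(t) * exp(b1*x1 + ... + bk*xk)."""
    return baseline_hazard(t) * math.exp(sum(bi * xi for bi, xi in zip(b, x)))

# Hypothetical coefficient for a single binary exposure (1 = exposed, 0 = unexposed).
b = [0.7]
exposed, unexposed = [1], [0]

# The hazard ratio is the same at every time point: the baseline hazard cancels.
time_points = [1, 5, 10]
hazard_ratios = [hazard(t, exposed, b) / hazard(t, unexposed, b) for t in time_points]

print(hazard_ratios)
print(math.exp(b[0]))
```

This constant ratio over time is exactly the proportional hazards assumption that the model relies on.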
More informationChapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression
BSTT523: Kutner et al., Chapter 1 1 Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression Introduction: Functional relation between
More informationSmall n, σ known or unknown, underlying nongaussian
READY GUIDE Summary Tables SUMMARY-1: Methods to compute some confidence intervals Parameter of Interest Conditions 95% CI Proportion (π) Large n, p 0 and p 1 Equation 12.11 Small n, any p Figure 12-4
More informationCategorical data analysis Chapter 5
Categorical data analysis Chapter 5 Interpreting parameters in logistic regression The sign of β determines whether π(x) is increasing or decreasing as x increases. The rate of climb or descent increases
More informationLecture 14: Introduction to Poisson Regression
Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why
More informationModelling counts. Lecture 14: Introduction to Poisson Regression. Overview
Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week
More informationMAS3301 / MAS8311 Biostatistics Part II: Survival
MAS3301 / MAS8311 Biostatistics Part II: Survival M. Farrow School of Mathematics and Statistics Newcastle University Semester 2, 2009-10 1 13 The Cox proportional hazards model 13.1 Introduction In the
More informationBiost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation
Biost 58 Applied Biostatistics II Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture 5: Review Purpose of Statistics Statistics is about science (Science in the broadest
More informationEPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7
Introduction to Generalized Univariate Models: Models for Binary Outcomes EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 EPSY 905: Intro to Generalized In This Lecture A short review
More informationReview: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form:
Outline for today What is a generalized linear model Linear predictors and link functions Example: fit a constant (the proportion) Analysis of deviance table Example: fit dose-response data using logistic
More informationMeasuring Associations : Pearson s correlation
Measuring Associations : Pearson s correlation Scatter Diagram A scatter diagram is a graph that shows that the relationship between two variables measured on the same individual. Each individual in the
More informationCorrelation and the Analysis of Variance Approach to Simple Linear Regression
Correlation and the Analysis of Variance Approach to Simple Linear Regression Biometry 755 Spring 2009 Correlation and the Analysis of Variance Approach to Simple Linear Regression p. 1/35 Correlation
More informationChapter 6: Exploring Data: Relationships Lesson Plan
Chapter 6: Exploring Data: Relationships Lesson Plan For All Practical Purposes Displaying Relationships: Scatterplots Mathematical Literacy in Today s World, 9th ed. Making Predictions: Regression Line
More informationChapter 4: Regression Models
Sales volume of company 1 Textbook: pp. 129-164 Chapter 4: Regression Models Money spent on advertising 2 Learning Objectives After completing this chapter, students will be able to: Identify variables,
More informationAnalysis of Time-to-Event Data: Chapter 6 - Regression diagnostics
Analysis of Time-to-Event Data: Chapter 6 - Regression diagnostics Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/25 Residuals for the
More informationAPPENDIX B Sample-Size Calculation Methods: Classical Design
APPENDIX B Sample-Size Calculation Methods: Classical Design One/Paired - Sample Hypothesis Test for the Mean Sign test for median difference for a paired sample Wilcoxon signed - rank test for one or
More informationCorrelation and Regression
Elementary Statistics A Step by Step Approach Sixth Edition by Allan G. Bluman http://www.mhhe.com/math/stat/blumanbrief SLIDES PREPARED BY LLOYD R. JAISINGH MOREHEAD STATE UNIVERSITY MOREHEAD KY Updated
More informationReview of Multiple Regression
Ronald H. Heck 1 Let s begin with a little review of multiple regression this week. Linear models [e.g., correlation, t-tests, analysis of variance (ANOVA), multiple regression, path analysis, multivariate
More informationParametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami
Parametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami Parametric Assumptions The observations must be independent. Dependent variable should be continuous
More informationAnalysing categorical data using logit models
Analysing categorical data using logit models Graeme Hutcheson, University of Manchester The lecture notes, exercises and data sets associated with this course are available for download from: www.research-training.net/manchester
More information7.2 One-Sample Correlation ( = a) Introduction. Correlation analysis measures the strength and direction of association between
7.2 One-Sample Correlation ( = a) Introduction Correlation analysis measures the strength and direction of association between variables. In this chapter we will test whether the population correlation
More informationLecture 8: Summary Measures
Lecture 8: Summary Measures Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina Lecture 8:
More informationReview. Number of variables. Standard Scores. Anecdotal / Clinical. Bivariate relationships. Ch. 3: Correlation & Linear Regression
Ch. 3: Correlation & Relationships between variables Scatterplots Exercise Correlation Race / DNA Review Why numbers? Distribution & Graphs : Histogram Central Tendency Mean (SD) The Central Limit Theorem
More informationChapter 4 Describing the Relation between Two Variables
Chapter 4 Describing the Relation between Two Variables 4.1 Scatter Diagrams and Correlation The is the variable whose value can be explained by the value of the or. A is a graph that shows the relationship
More informationClass Notes: Week 8. Probit versus Logit Link Functions and Count Data
Ronald Heck Class Notes: Week 8 1 Class Notes: Week 8 Probit versus Logit Link Functions and Count Data This week we ll take up a couple of issues. The first is working with a probit link function. While
More informationGeneralized logit models for nominal multinomial responses. Local odds ratios
Generalized logit models for nominal multinomial responses Categorical Data Analysis, Summer 2015 1/17 Local odds ratios Y 1 2 3 4 1 π 11 π 12 π 13 π 14 π 1+ X 2 π 21 π 22 π 23 π 24 π 2+ 3 π 31 π 32 π
More informationReview of Statistics 101
Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods
More informationDraft Proof - Do not copy, post, or distribute. Chapter Learning Objectives REGRESSION AND CORRELATION THE SCATTER DIAGRAM
1 REGRESSION AND CORRELATION As we learned in Chapter 9 ( Bivariate Tables ), the differential access to the Internet is real and persistent. Celeste Campos-Castillo s (015) research confirmed the impact
More informationLogistic Regression: Regression with a Binary Dependent Variable
Logistic Regression: Regression with a Binary Dependent Variable LEARNING OBJECTIVES Upon completing this chapter, you should be able to do the following: State the circumstances under which logistic regression
More informationLecture 7 Time-dependent Covariates in Cox Regression
Lecture 7 Time-dependent Covariates in Cox Regression So far, we ve been considering the following Cox PH model: λ(t Z) = λ 0 (t) exp(β Z) = λ 0 (t) exp( β j Z j ) where β j is the parameter for the the
More informationCHAPTER 5. Outlier Detection in Multivariate Data
CHAPTER 5 Outlier Detection in Multivariate Data 5.1 Introduction Multivariate outlier detection is the important task of statistical analysis of multivariate data. Many methods have been proposed for
More informationChapter 12 - Part I: Correlation Analysis
ST coursework due Friday, April - Chapter - Part I: Correlation Analysis Textbook Assignment Page - # Page - #, Page - # Lab Assignment # (available on ST webpage) GOALS When you have completed this lecture,
More informationSurvival Analysis I (CHL5209H)
Survival Analysis Dalla Lana School of Public Health University of Toronto olli.saarela@utoronto.ca January 7, 2015 31-1 Literature Clayton D & Hills M (1993): Statistical Models in Epidemiology. Not really
More informationCS 5014: Research Methods in Computer Science
Computer Science Clifford A. Shaffer Department of Computer Science Virginia Tech Blacksburg, Virginia Fall 2010 Copyright c 2010 by Clifford A. Shaffer Computer Science Fall 2010 1 / 207 Correlation and
More informationStatistical Thinking in Biomedical Research Session #3 Statistical Modeling
Statistical Thinking in Biomedical Research Session #3 Statistical Modeling Lily Wang, PhD Department of Biostatistics (modified from notes by J.Patrie, R.Abbott, U of Virginia and WD Dupont, Vanderbilt
More informationChapter 6. Logistic Regression. 6.1 A linear model for the log odds
Chapter 6 Logistic Regression In logistic regression, there is a categorical response variables, often coded 1=Yes and 0=No. Many important phenomena fit this framework. The patient survives the operation,
More informationUnit 11: Multiple Linear Regression
Unit 11: Multiple Linear Regression Statistics 571: Statistical Methods Ramón V. León 7/13/2004 Unit 11 - Stat 571 - Ramón V. León 1 Main Application of Multiple Regression Isolating the effect of a variable
More informationStatistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018
Statistics Boot Camp Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018 March 21, 2018 Outline of boot camp Summarizing and simplifying data Point and interval estimation Foundations of statistical
More informationUnit 6 - Introduction to linear regression
Unit 6 - Introduction to linear regression Suggested reading: OpenIntro Statistics, Chapter 7 Suggested exercises: Part 1 - Relationship between two numerical variables: 7.7, 7.9, 7.11, 7.13, 7.15, 7.25,
More information