WORKSHOP 3 Measuring Association


Concepts

Analysing Categorical Data
o Testing of Proportions
o Contingency Tables & Tests
o Odds Ratios

Linear Association Measures
o Correlation
o Simple Linear Regression Analysis

Analysing Categorical Data

A review of methods used to describe the relationship between categorical variables and to compare proportions.
o Contingency Tables & Tests
   - Goodness of Fit
   - Association / Independence
o Odds Ratios

Testing of Proportions ~ confidence intervals can also be calculated, and a z-test applied, for one or more proportions. (Less common approach (REF: 1.7))

Contingency Tables & Tests ~ types of test
o Goodness of fit
o Tests of association and independence

Goodness of Fit Test

Tests whether the distribution of a variable conforms to an expected distribution.

Example: (REF: Chapter 1)

Snapdragon flowers can be coloured red, pink or white. According to the Mendelian genetic model, self-pollinated pink flowers should produce progeny plants that are red, pink or white in the ratio 1:2:1 respectively.

=> H0: Pr(R) = 0.25; Pr(P) = 0.50; Pr(W) = 0.25

A sample of 234 plants produced the following colours:

Colour   Count   Observed proportion
Red      54      0.231
Pink     122     0.521
White    58      0.248

To test H0, USE THE CHI-SQUARE (χ²) TEST:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O is the observed frequency and E is the expected frequency.

Calculations:

         O      E       (O - E)²/E
Red      54     58.5    0.35
Pink     122    117     0.21
White    58     58.5    0.00
Total    234    234     0.56

Compare with χ² with (# of categories - 1) = 2 DF.
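The goodness-of-fit calculation can be sketched in a few lines of Python. This is only an illustration, not part of the workshop's SPSS workflow. It assumes the observed counts are 54 red, 122 pink and 58 white, and it uses the closed form exp(-x/2) for the upper tail of a χ² distribution, which holds only for exactly 2 DF. The variable names are illustrative.

```python
import math

# Observed counts (red, pink, white): an assumed reading of the table above.
observed = [54, 122, 58]
# Mendelian ratio 1:2:1 written as probabilities under H0.
null_probs = [0.25, 0.50, 0.25]

n = sum(observed)
expected = [n * p for p in null_probs]

# Per-category contributions (O - E)^2 / E, then their sum.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)
df = len(observed) - 1

# For exactly 2 degrees of freedom, Pr(chi-square > x) = exp(-x / 2).
p_value = math.exp(-chi_square / 2)

print(f"expected counts: {expected}")
print(f"chi-square = {chi_square:.3f} on {df} DF, p-value = {p_value:.3f}")
```

For other degrees of freedom the closed form no longer applies, and a statistics library or a χ² table would be needed for the p-value.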

Pr(χ² > 0.56 from χ² with 2 DF) = 0.76

As the p-value > 0.05 (significance level = 5%), we cannot reject H0.

Note: The critical χ² value with 2 DF is 5.99 at the 5% significance level, and the calculated χ² is less than the critical value, so we cannot reject H0.

Tests of Association & Independence

Example: In the CF_Genotypes data set, patients were genotyped for a specific genetic variation, and the patients who were infected with Pseudomonas aeruginosa (PA) were recorded. The expectation was that those with the less common A variant would have more severe disease.

SPSS Analysis (Analyse > Descriptive Statistics > Crosstabs)

PA Infection Present * API Genotype Variant Crosstabulation (OBSERVED count)

                        API Genotype Variant
PA Infection Present    A       G       Total
No                      12      108     120
Yes                     4       16      20
Total                   16      124     140

H0: The rate of PA infection is the same in both genotypes.

General formula for expected frequencies:

$$E = \frac{\text{row total} \times \text{column total}}{\text{overall total}}$$

From SPSS: PA Infection Present * API Genotype Variant Crosstabulation

                                     API Genotype Variant
PA Infection Present                 A        G        Total
No      Count                        12       108      120
        Expected Count               13.7     106.3    120.0
        Residual (O - E)             -1.7     1.7
Yes     Count                        4        16       20
        Expected Count               2.3      17.7     20.0
        Residual (O - E)             1.7      -1.7
Total   Count                        16       124      140
        Expected Count               16.0     124.0    140.0

Chi-Square Tests (sample output)

                      Value    df   Asymp. Sig. (2-sided)   Exact Sig. (2-sided)   Exact Sig. (1-sided)
Pearson Chi-Square    1.694    1    .193
Fisher's Exact Test                                         .247                   .174
N of Valid Cases      140

b 1 cells (25.0%) have expected count less than 5. The minimum expected count is 2.29.
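The expected-frequency formula and the Pearson χ² statistic for this 2 × 2 table can be checked with a short Python sketch. It is an illustration under stated assumptions, not the SPSS procedure: the cell counts (12, 108, 4, 16) are an assumed reading of the crosstab, and the p-value uses the identity Pr(χ² > x) = erfc(√(x/2)), which is valid only for 1 DF.

```python
import math

# Observed counts; rows = PA infection (No, Yes), columns = genotype (A, G).
# These cell counts are an assumed reading of the crosstab above.
observed = [[12, 108],
            [4, 16]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# E = row total x column total / overall total, cell by cell.
expected = [[r * c / total for c in col_totals] for r in row_totals]

# Pearson chi-square: sum of (O - E)^2 / E over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# With 1 degree of freedom, Pr(chi-square > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2))

# Smallest expected count, used to judge whether the chi-square
# approximation is trustworthy (SPSS flags cells below 5).
min_expected = min(min(row) for row in expected)

print(f"chi-square = {chi_square:.3f} on {df} DF, p-value = {p_value:.3f}")
print(f"minimum expected count = {min_expected:.2f}")
```

When an expected count falls below 5, as flagged in the SPSS footnote, the exact test is generally the safer result to report.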

Compare with χ² with 1 DF.
Note: DF = (# rows - 1) × (# cols - 1)

Conclusion: At the 5% significance level, we cannot reject H0.
=> There is no statistically significant evidence that PA infection rates are higher in the Genotype A group.

Odds Ratios

The odds of an event E are defined as the ratio of the chance that E occurs to the chance that E does not occur.

Let Pr(E) be the probability (chance) of E occurring
=> 1 - Pr(E) is the probability of E not occurring

$$\text{Odds of } E = \frac{\Pr(E)}{1 - \Pr(E)}$$

Example
o If the probability of E is ¼, then the odds of E are {¼ / ¾} = 1/3 or 1:3
o If the probability of E is ½, then the odds of E are 1.

The odds ratio, θ, is the ratio of the odds of two events (or conditions).

Example ~ Event 1: Low birth weight in smokers; Event 2: Low birth weight in non-smokers (REF: 1.9)
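The two odds examples (probabilities of ¼ and ½) can be reproduced exactly with Python's `fractions` module. The helper name `odds` is illustrative only.

```python
from fractions import Fraction


def odds(probability):
    """Return Pr(E) / (1 - Pr(E)); the probability must be strictly below 1."""
    return probability / (1 - probability)


# Pr(E) = 1/4 gives odds of 1/3, i.e. 1:3.
odds_quarter = odds(Fraction(1, 4))
# Pr(E) = 1/2 gives odds of 1, i.e. an even chance.
odds_half = odds(Fraction(1, 2))

print(odds_quarter, odds_half)
```

Exact fractions avoid floating-point rounding, so the 1:3 result appears as a ratio rather than a decimal.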

CF Genotype Data Example

                        API Genotype Variant
PA Infection Present    A            G             Total
No                      12 (n11)     108 (n12)     120
Yes                     4 (n21)      16 (n22)      20
Total                   16           124           140

The odds ratio compares:
o the odds of PA infection in the Genotype A group, Odds_A, with
o the odds of PA infection in the Genotype G group, Odds_G

Odds_A = (4/16) / (1 - 4/16) = 0.333
Odds_G = (16/124) / (1 - 16/124) = 0.148

Odds ratio, θ̂ = 0.333 / 0.148 = 2.25

=> We estimate that the odds of contracting a PA infection for patients in the Genotype A group are more than twice those for patients in the Genotype G group.

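Note:
1. For a 2 × 2 contingency table:

$$\hat{\theta} = \frac{n_{11} \times n_{22}}{n_{12} \times n_{21}}$$

   With the cell labels used in the CF example (row 1 = no infection), this cross-product is the odds ratio for no infection; the odds ratio for infection is its reciprocal.
2. The odds ratio is not normally distributed, but the log odds ratio is approximately so. We usually work with a C.I. for the log odds ratio and present the results as the exponential of the C.I.
3. If the exponentiated C.I. includes 1, it is possible that the odds of both events are equal.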
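Notes 2 and 3 can be illustrated with a short Python sketch that builds the interval on the log scale and exponentiates the limits. The workshop does not state a standard error formula. The one used here, the square root of the summed reciprocal cell counts (often called the Woolf or Wald method), is a common choice and is an assumption of this sketch. The cell counts are an assumed reading of the CF genotype table, and the function name and the 1.96 multiplier for a 95% interval are also illustrative.

```python
import math


def odds_ratio_ci(event_1, no_event_1, event_2, no_event_2, z=1.96):
    """Odds ratio of group 1 vs group 2 with a Wald interval built on the log scale."""
    theta = (event_1 * no_event_2) / (no_event_1 * event_2)
    # Assumed standard error of log(theta): sqrt of the summed reciprocal counts.
    se_log = math.sqrt(
        1 / event_1 + 1 / no_event_1 + 1 / event_2 + 1 / no_event_2
    )
    log_theta = math.log(theta)
    # Interval on the log scale, then exponentiate the limits.
    lower = math.exp(log_theta - z * se_log)
    upper = math.exp(log_theta + z * se_log)
    return theta, lower, upper


# Assumed counts: genotype A (4 infected, 12 not), genotype G (16 infected, 108 not).
theta, lower, upper = odds_ratio_ci(4, 12, 16, 108)
includes_one = lower < 1 < upper

print(f"odds ratio = {theta:.2f}, 95% C.I. = ({lower:.2f}, {upper:.2f})")
print(f"interval includes 1: {includes_one}")
```

Because the limits are exponentiated, the resulting interval is not symmetric around the estimated odds ratio.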

Linear Association Measures

A review of methods used to describe a LINEAR relationship between continuous variables.
o Correlation
o Simple linear regression

Correlation

Describes the strength of a linear relationship between continuous variables.

Correlation coefficient range: -1 to 1
o -1 => Perfect Negative Linear Relationship
o 1 => Perfect Positive Linear Relationship
o 0 => No Linear Relationship

[Figure: scatter plot of Y against X]

[Figures: three further scatter plots of Y against X]
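The three reference values of the correlation coefficient can be reproduced with a small Python sketch. The data are hypothetical toy values chosen for illustration, and `pearson_r` is an illustrative helper rather than anything used in the workshop. The third data set is deliberately curved, to show that a coefficient of 0 rules out a linear relationship but not a relationship of another shape.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient: S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical toy data, not taken from the workshop.
x = [1, 2, 3, 4, 5]
r_positive = pearson_r(x, [2, 4, 6, 8, 10])  # points on a rising straight line
r_negative = pearson_r(x, [10, 8, 6, 4, 2])  # points on a falling straight line
r_curved = pearson_r(x, [4, 1, 0, 1, 4])     # symmetric curve: no linear trend

print(r_positive, r_negative, r_curved)
```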

Simple Linear Regression (SLR)

Method of estimating the linear relationship between continuous variables.

Terminology:
o Y: Dependent variable, the variable to be predicted
o X: Independent variable, explanatory variable

SLR parameters

The objective is to estimate the straight line that describes the relationship between Y and X.

Regression line: $Y = \alpha + \beta X + \varepsilon$, where the error $\varepsilon \sim N(0, \sigma^2)$

A method is required to estimate α and β. Use the method of Least Squares: find estimators $\hat{\alpha}$ and $\hat{\beta}$ such that

$$S = \sum_{i=1}^{n} \left( y_i - \hat{\alpha} - \hat{\beta} x_i \right)^2$$

is minimized.

ANOVA for SLR:
o Test H0: β = 0 vs HA: β ≠ 0
o Divide the total variation in the data into:
   - variation due to the Regression Line
   - residual variation
o Total Variation = Regression + Residual

Source of variation   df      Sum of squares   Mean square       F-ratio           Sig.
Regression            1       Regression SS    MS_reg = SS/df    MS_reg / MS_res   Pr(F > F-ratio)
Residual (Error)      n - 2   Residual SS      MS_res = SS/df
Total                 n - 1   Total SS

If Sig. < significance level, then reject H0. Conclude that β ≠ 0 and there is evidence of a linear relationship between Y and X.

Note: From the ANOVA table, MS_res provides an unbiased estimate of the random, unexplained variation in the data; i.e. an unbiased estimate of σ².
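The least-squares fit and the ANOVA decomposition can be sketched in Python as follows. The closed-form estimators used here, β̂ = S_xy / S_xx and α̂ = ȳ - β̂x̄, are the standard solution of the minimisation but are not written out in the workshop notes. The data are hypothetical toy values, and the function name `fit_slr` is illustrative. The sketch stops at the F-ratio, because its p-value requires the F distribution, which the Python standard library does not provide.

```python
def fit_slr(xs, ys):
    """Least-squares fit of y = alpha + beta * x, plus the ANOVA decomposition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    # Standard closed-form least-squares estimators (assumed, see lead-in).
    beta_hat = s_xy / s_xx
    alpha_hat = mean_y - beta_hat * mean_x

    fitted = [alpha_hat + beta_hat * x for x in xs]
    total_ss = sum((y - mean_y) ** 2 for y in ys)
    residual_ss = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    regression_ss = total_ss - residual_ss  # Total = Regression + Residual

    ms_reg = regression_ss / 1         # regression df = 1
    ms_res = residual_ss / (n - 2)     # residual df = n - 2; estimates sigma^2
    return {
        "alpha_hat": alpha_hat,
        "beta_hat": beta_hat,
        "regression_ss": regression_ss,
        "residual_ss": residual_ss,
        "total_ss": total_ss,
        "ms_res": ms_res,
        "f_ratio": ms_reg / ms_res,
    }


# Hypothetical toy data, not taken from the workshop.
result = fit_slr([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
for name, value in result.items():
    print(f"{name} = {value:.3f}")
```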

R²: Coefficient of Determination

The proportion of variation in Y that is attributed to its linear regression on X.

$$R^2 = \frac{\text{Regression Sum of Squares}}{\text{Total Sum of Squares}} = \frac{S_{xy}^2}{S_{xx} S_{yy}}$$

Range: 0 to 1
Closer to 1 => Better fit of the regression line to the data
R² = (Correlation Coefficient)²

EXAMPLE: Lung Function Data Set

FVC (forced vital capacity) and FEV (forced expiratory volume) measure the volume capacity of the lung and the air volume expired. Both are standard measurements of lung function and are expected to be highly correlated.

Dependent Variable: Y ~ FEV
Independent Variable: X ~ FVC

SPSS Analysis

Scatter plot of FEV vs FVC

[Figure: scatter plot of Forced Expiratory Volume (vertical axis) against Forced Lung Capacity (horizontal axis)]

Correlation & R² (SPSS: Analyze > Regression > Linear)

Model Summary

Model   R   R Square   Adjusted R Square   Std. Error of the Estimate
1..3.3.1

a Predictors: (Constant), Forced Lung Capacity
b Dependent Variable: Forced Expiratory Volume

ANOVA Table: Testing H0: β = 0

ANOVA

Model 1       Sum of Squares   df    Mean Square   F     Sig.
Regression    319.7            1     319.7         9.5   .
Residual      9.57             11    71.
Total         11.93            115

a Predictors: (Constant), Forced Lung Capacity
b Dependent Variable: Forced Expiratory Volume

As Sig. < 0.05 => there is strong evidence of a linear relationship between FEV and FVC.

Regression Estimators

Coefficients

                        Unstandardized Coefficients                  95% Confidence Interval for B
Model 1                 B       Std. Error    t       Sig.    Lower Bound    Upper Bound
(Constant)              33.5    7.7           .753    .       19.            7.75
Forced Lung Capacity    .51     .9            9.3     .       .51            .7

a Dependent Variable: Forced Expiratory Volume

Regression line: FEV = 33.5 + .51 * FVC

The t-test of β is significant => evidence of a linear relationship.

Error Diagnostics

[Figure: Histogram of the regression standardized residuals. Dependent Variable: Forced Expiratory Volume. Vertical axis: Frequency; horizontal axis: Regression Standardized Residual.]

[Figure: Normal P-P Plot of Regression Standardized Residual. Dependent Variable: Forced Expiratory Volume. Axes: Observed Cum Prob and Expected Cum Prob.]