Lecture 5: ANOVA and Correlation
- Bernard Giles Farmer
- 6 years ago
Transcription
1 Lecture 5: ANOVA and Correlation Ani Manichaikul 23 April / 62
2 Comparing Multiple Groups Continuous data: comparing means Analysis of variance Binary data: comparing proportions Pearson's chi-square tests for $r \times 2$ tables Independence Goodness of Fit Homogeneity Categorical data: $r \times c$ tables Pearson chi-square tests Odds ratio and relative risk 2 / 62
3 ANOVA: Definition Statistical technique for comparing means for multiple populations Partitioning the total variation in a data set into components defined by specific sources ANOVA = ANalysis Of VAriance 3 / 62
4 ANOVA: Concepts Estimate group means Assess magnitude of variation attributable to specific sources Extension of 2-sample t-test to multiple groups Population model Sample model: estimates, standard errors Partition of variability 4 / 62
5 Types of ANOVA One-way ANOVA One factor e.g. smoking status Two-way ANOVA Two factors e.g. gender and smoking status Three-way ANOVA Three factors e.g. gender, smoking and beer 5 / 62
6 Emphasis One-way ANOVA is an extension of the t-test to 3 or more samples focus analysis on group differences Two-way ANOVA (and higher) focuses on the interaction of factors Does the effect due to one factor change as the level of another factor changes? 6 / 62
7 ANOVA Rationale I Variation in all observations = Variation between each observation and its group mean + Variation between each group mean and the overall mean. In other words: Total sum of squares = Within-group sum of squares + Between-groups sum of squares 7 / 62
8 ANOVA Rationale II In shorthand: SST = SSW + SSB If the group means are not very different, the variation between them and the overall mean (SSB) will not be much more than the variation between the observations within a group (SSW) 8 / 62
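The partition SST = SSW + SSB can be checked numerically. The following is a minimal sketch in Python; the three groups and their values are hypothetical illustration data, not taken from the lecture, and the function name is an assumption of this sketch.

```python
# Hypothetical observations for k = 3 groups (illustrative values only).
groups = {
    "group_1": [4.0, 5.0, 6.0],
    "group_2": [6.0, 7.0, 8.0],
    "group_3": [8.0, 9.0, 10.0],
}

def mean(values):
    return sum(values) / len(values)

def sum_of_squares_partition(groups):
    """Return (SST, SSW, SSB) for a dict mapping group name -> observations."""
    all_obs = [x for obs in groups.values() for x in obs]
    grand_mean = mean(all_obs)
    # Total: every observation against the overall mean.
    sst = sum((x - grand_mean) ** 2 for x in all_obs)
    # Within: every observation against its own group mean.
    ssw = sum((x - mean(obs)) ** 2 for obs in groups.values() for x in obs)
    # Between: every group mean against the overall mean, weighted by group size.
    ssb = sum(len(obs) * (mean(obs) - grand_mean) ** 2 for obs in groups.values())
    return sst, ssw, ssb

sst, ssw, ssb = sum_of_squares_partition(groups)
print(sst, ssw, ssb)  # 30.0 6.0 24.0
```

Here the group means (5, 7, 9) differ a lot relative to the spread inside each group, so SSB is much larger than SSW.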
9 ANOVA: One-Way 9 / 62
10 MSW We can pool the estimates of $\sigma^2$ across groups and use an overall estimate for the population variance: Variation within a group $= \hat{\sigma}^2_W = \frac{SSW}{N-k} = MSW$ MSW is called the within-groups mean square 10 / 62
11 MSB We can also look at systematic variation among groups Variation between groups $= \hat{\sigma}^2_B = \frac{SSB}{k-1} = MSB$ 11 / 62
12 An ANOVA table Suppose there are k groups (e.g. if smoking status has categories current, former or never, then k=3) We calculate our test statistic using the sums of squares as follows: 12 / 62
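The table shown on this slide did not survive transcription. The layout below is the standard one-way ANOVA table, written out from the MSW and MSB definitions on the preceding slides; it is a reconstruction under that assumption, not a copy of the original slide.

```latex
\begin{array}{l|c|c|c|c}
\text{Source of variation} & \text{Sum of squares} & \text{df} & \text{Mean square} & F \\ \hline
\text{Between groups} & SSB & k-1 & MSB = \dfrac{SSB}{k-1} & \dfrac{MSB}{MSW} \\
\text{Within groups}  & SSW & N-k & MSW = \dfrac{SSW}{N-k} & \\
\text{Total}          & SST & N-1 & &
\end{array}
```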
13 Hypothesis testing with ANOVA In performing ANOVA, we may want to ask: is there truly a difference in means across groups? Formally, we can specify the hypotheses: $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ $H_a$: at least one of the $\mu_i$'s is different The null hypothesis specifies a global relationship If the result of the test is significant, then perform individual comparisons 13 / 62
14 Goal of the comparisons Compare the two variability estimates, MSW and MSB If $F_{obs} = \frac{MSB}{MSW} = \frac{\hat{\sigma}^2_B}{\hat{\sigma}^2_W}$ is small, then variability between groups is negligible compared to variation within groups The grouping does not explain much variation in the data 14 / 62
15 The F-statistic For our observations, we assume $X \sim N(\mu_{gp}, \sigma^2)$, where $\mu_{gp} = E(X \mid gp) = \beta_0 + \beta_1 I(\text{group}=2) + \beta_2 I(\text{group}=3) + \cdots$ and $I(\text{group}=i)$ is an indicator denoting whether or not each individual is in the $i$th group Note: we have assumed the same variance $\sigma^2$ for all groups; it is important to check this assumption Under these assumptions, we know the null distribution of the statistic $F = \frac{MSB}{MSW}$ The distribution is called an F-distribution 15 / 62
16 The F-distribution Remember that a $\chi^2$ distribution is always specified by its degrees of freedom An F-distribution is the distribution of the ratio of two independent $\chi^2$ random variables, each divided by its degrees of freedom When we specify an F-distribution, we must state two parameters, which correspond to the degrees of freedom of the two $\chi^2$ distributions If $X_1 \sim \chi^2_{df_1}$ and $X_2 \sim \chi^2_{df_2}$ are independent, we write: $\frac{X_1/df_1}{X_2/df_2} \sim F_{df_1,df_2}$ 16 / 62
17 Back to the hypothesis test... Knowing the null distribution of $\frac{MSB}{MSW}$, we can define a decision rule to test the hypothesis for ANOVA: Reject $H_0$ if $F \ge F_{\alpha;k-1,N-k}$ Fail to reject $H_0$ if $F < F_{\alpha;k-1,N-k}$ 17 / 62
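The decision rule can be sketched in a few lines of Python. The data are hypothetical, the function names are assumptions of this sketch, and the critical value 5.14 is the approximate tabulated value of $F_{0.05;2,6}$, supplied by hand rather than computed.

```python
def one_way_anova_f(groups):
    """Return (F, df1, df2) for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between-groups and within-groups sums of squares.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df1, df2 = k - 1, n_total - k
    msb, msw = ssb / df1, ssw / df2
    return msb / msw, df1, df2

def reject_null(f_obs, f_critical):
    """Decision rule: reject H0 when the observed F reaches the critical value."""
    return f_obs >= f_critical

# Hypothetical data: k = 3 groups, N = 9 observations.
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [8.0, 9.0, 10.0]]
f_obs, df1, df2 = one_way_anova_f(groups)

# Approximate table value of F_{0.05; 2, 6}; looked up, not computed here.
F_CRITICAL = 5.14
print(f_obs, df1, df2, reject_null(f_obs, F_CRITICAL))
```

In practice the critical value or p-value would come from statistical software rather than a hard-coded table entry.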
18 ANOVA: F-tests I 18 / 62
19 ANOVA: F-tests II 19 / 62
20 Example: ANOVA for HDL Study design: Randomized controlled trial 132 men randomized to one of: Diet + exercise, Diet, Control Follow-up one year later: 119 men remaining in study Outcome: mean change in plasma levels of HDL cholesterol from baseline to one-year follow-up in the three groups 20 / 62
21 Model for HDL outcomes We model the means for each group as follows: $\mu_c = E(HDL \mid gp = c)$ = mean change in control group $\mu_d = E(HDL \mid gp = d)$ = mean change in diet group $\mu_{de} = E(HDL \mid gp = de)$ = mean change in diet and exercise group We could also write the model as $E(HDL \mid gp) = \beta_0 + \beta_1 I(gp = d) + \beta_2 I(gp = de)$ Recall that $I(gp = d)$, $I(gp = de)$ are 0/1 group indicators 21 / 62
22 HDL ANOVA Table We obtain the following results from the HDL experiment: 22 / 62
23 HDL ANOVA results F-test $H_0: \mu_c = \mu_d = \mu_{de}$ (or $H_0: \beta_1 = \beta_2 = 0$) $H_a$: at least one mean is different from the others Test statistic $F_{obs} = 13$ $df_1 = k - 1 = 3 - 1 = 2$ $df_2 = N - k = 119 - 3 = 116$ 23 / 62
24 HDL ANOVA Conclusions Rejection region: $F > F_{0.05;2,116} = 3.07$ Since $F_{obs} = 13.0 > 3.07$, we reject $H_0$ We conclude that at least one of the group means is different from the others 24 / 62
25 Which groups are different? We might proceed to make individual comparisons Conduct two-sample t-tests for each pair of groups: $t = \frac{\hat{\theta} - \theta_0}{SE(\hat{\theta})} = \frac{(\bar{X}_i - \bar{X}_j) - 0}{\sqrt{\frac{s_p^2}{n_i} + \frac{s_p^2}{n_j}}}$ 25 / 62
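A minimal Python sketch of the pairwise statistic follows. It assumes that $s_p^2$ is pooled over just the two groups being compared; in an ANOVA setting the pooled variance may instead be taken across all k groups (MSW), which the slide does not specify. The data and the function name are hypothetical.

```python
import math

def pooled_t_statistic(x_i, x_j):
    """Two-sample t statistic with pooled variance, testing H0: mu_i = mu_j."""
    n_i, n_j = len(x_i), len(x_j)
    mean_i, mean_j = sum(x_i) / n_i, sum(x_j) / n_j
    ss_i = sum((x - mean_i) ** 2 for x in x_i)
    ss_j = sum((x - mean_j) ** 2 for x in x_j)
    # Assumption: pool the variance over these two groups only.
    df = n_i + n_j - 2
    s2_p = (ss_i + ss_j) / df
    se = math.sqrt(s2_p / n_i + s2_p / n_j)
    return (mean_i - mean_j - 0.0) / se, df

# Hypothetical observations for two groups.
t_stat, df = pooled_t_statistic([8.0, 9.0, 10.0], [4.0, 5.0, 6.0])
print(round(t_stat, 3), df)
```

The statistic is compared against a t-distribution with the returned degrees of freedom.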
26 Multiple Comparisons Performing individual comparisons requires multiple hypothesis tests If α = 0.05 for each comparison, there is a 5% chance that each comparison will falsely be called significant when its null hypothesis is true Overall, the probability of Type I error is elevated above 5% Question: How can we address this multiple comparisons issue? 26 / 62
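To see how far the overall error rate can rise, consider m = 3 comparisons, each tested at α = 0.05. The first line below assumes the three tests are independent, which is an idealization (pairwise comparisons share data); the second line is the general upper bound that holds without that assumption.

```latex
\begin{aligned}
P(\text{at least one false positive}) &= 1 - (1 - \alpha)^m = 1 - 0.95^3 \approx 0.143 \\
P(\text{at least one false positive}) &\le m\,\alpha = 3 \times 0.05 = 0.15
\end{aligned}
```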
27 Bonferroni adjustment A possible correction for multiple comparisons Test each hypothesis at level $\alpha^* = \alpha/3 = 0.05/3 \approx 0.0167$ Adjustment ensures overall Type I error rate does not exceed $\alpha = 0.05$ However, this adjustment may be too conservative 27 / 62
28 Multiple comparisons
Hypothesis: $\alpha^* = \alpha/3$
$H_0: \mu_c = \mu_d$ (or $\beta_1 = 0$): 0.0167
$H_0: \mu_c = \mu_{de}$ (or $\beta_2 = 0$): 0.0167
$H_0: \mu_d = \mu_{de}$ (or $\beta_1 - \beta_2 = 0$): 0.0167
Overall $\alpha = 0.05$ 28 / 62
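The Bonferroni adjustment can be applied in either of two equivalent ways: compare each raw p-value to α/m, or multiply each p-value by m (capped at 1) and compare to α. The sketch below shows both; the function names are assumptions of this sketch, and 0.06 is the only p-value taken from the slides.

```python
def bonferroni_alpha(alpha, m):
    """Per-comparison significance level for m comparisons."""
    return alpha / m

def bonferroni_adjust(p_value, m):
    """Bonferroni-adjusted p-value: multiply by m and cap at 1."""
    return min(1.0, m * p_value)

ALPHA, M = 0.05, 3  # three pairwise comparisons among three groups

alpha_star = bonferroni_alpha(ALPHA, M)
p_adjusted = bonferroni_adjust(0.06, M)  # p-value reported for control vs. diet

# Both forms of the rule give the same decision.
print(round(alpha_star, 4), round(p_adjusted, 2))
print(0.06 < alpha_star, p_adjusted < ALPHA)
```

Either way, a raw p-value of 0.06 is not significant after adjustment.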
29 HDL: Pairwise comparisons I Control and Diet groups $H_0: \mu_c = \mu_d$ (or $\beta_1 = 0$) t = (value lost in transcription) p-value = 0.06 29 / 62
30 HDL: Pairwise comparisons II Control and Diet + exercise groups $H_0: \mu_c = \mu_{de}$ (or $\beta_2 = 0$) t = 5.05 p-value = (value lost in transcription) 30 / 62
31 HDL: Pairwise comparisons III Diet and Diet + exercise groups $H_0: \mu_d = \mu_{de}$ (or $\beta_1 - \beta_2 = 0$) t and p-value: (values lost in transcription) 31 / 62
32 Bonferroni corrected p-values Table columns: Hypothesis, p-value, adjusted p-value Rows: $H_0: \mu_c = \mu_d$; $H_0: \mu_c = \mu_{de}$; $H_0: \mu_d = \mu_{de}$ (numeric entries lost in transcription) Overall $\alpha = 0.05$ Conclusion: Significant difference in HDL change for the DE group compared to the other groups 32 / 62
33 Two-way ANOVA Uses the same idea as one-way ANOVA by partitioning variability Allows us to look at interaction of factors Does the effect due to one factor change as the level of another factor changes? 33 / 62
34 Example: Public health students' medical expenditures Study design: In an observational study, total medical expenditures and various demographic characteristics were recorded for 200 public health students Goal: determine how gender and smoking status affect total medical expenditures in this population 34 / 62
35 Example: Set-up Y = Total medical expenditures F = Indicator of Female = 1 if Gender=Female, 0 otherwise S = Indicator of Smoking = 1 if smoked 100 cigarettes or more, 0 otherwise 35 / 62
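36 Interaction model We assume the model $Y \sim N(\mu, \sigma^2)$ where $\mu = E(Y) = \beta_0 + \beta_1 F + \beta_2 S + \beta_3 F \cdot S$ What are the interpretations of $\beta_0$, $\beta_1$, $\beta_2$, and $\beta_3$? 36 / 62
37 Two-way ANOVA: Interactions Mean Model $\mu = E(Y) = \beta_0 + \beta_1 F + \beta_2 S + \beta_3 F \cdot S$
Gender: Smoker = No; Smoker = Yes
Male: $\beta_0$; $\beta_0 + \beta_2$
Female: $\beta_0 + \beta_1$; $\beta_0 + \beta_1 + \beta_2 + \beta_3$ 37 / 62
38 Mean Model
$E(\text{Expenditure} \mid \text{Male, non-smoker}) = \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 0 + \beta_3 \cdot 0 = \beta_0$
$E(\text{Expenditure} \mid \text{Female, non-smoker}) = \beta_0 + \beta_1 \cdot 1 + \beta_2 \cdot 0 + \beta_3 \cdot 0 = \beta_0 + \beta_1$
$E(\text{Expenditure} \mid \text{Male, smoker}) = \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 1 + \beta_3 \cdot 0 = \beta_0 + \beta_2$
$E(\text{Expenditure} \mid \text{Female, smoker}) = \beta_0 + \beta_1 \cdot 1 + \beta_2 \cdot 1 + \beta_3 \cdot 1 = \beta_0 + \beta_1 + \beta_2 + \beta_3$ 38 / 62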
39 Medical Expenditures: ANOVA table Columns: Source of Variation, Sum of Squares, df, Mean Square, F, p-value Rows: Model (between groups), Error (within groups), Total (numeric entries lost in transcription) 39 / 62
40 Medical Expenditures: Results Overall model F-test: $H_0: \beta_1 = \beta_2 = \beta_3 = 0$ $H_a$: At least one group is different Test statistic: $F_{obs}$ = (value lost in transcription) $df_1 = k - 1 = 3$ $df_2 = N - k = 196$ p-value < (threshold lost in transcription) 40 / 62
41 Medical Expenditures: Overall Conclusions The medical expenditures are different in at least one of the groups Now we can figure out which ones 41 / 62
42 Medical Expenditures: Two-way ANOVA I Table of coefficient estimates Columns: Coefficient, Estimate, Standard Error Rows: $\beta_0$ (baseline), $\beta_1$ (female effect), $\beta_2$ (smoker effect), $\beta_3$ (female*smoke) (numeric entries lost in transcription) 42 / 62
43 Medical Expenditures: Two-way ANOVA II Test statistics and confidence intervals (the t and P>|t| entries were lost in transcription)
Coefficient: 95% confidence interval
$\beta_0$: (3870, 6228)
$\beta_1$: (276, 3292)
$\beta_2$: (-1187, 3001)
$\beta_3$: (3434, 9043) 43 / 62
44 Medical Expenditures: Group-wise Conclusions In this population, an average male non-smoker spends about $5000 on medical costs per year Males who smoked were estimated as having spent about $900 more than non-smokers, but this difference was not found to be statistically significant Female non-smokers spent about $1700 more than their non-smoking male counterparts Female smokers spent about $8900 ($= \beta_1 + \beta_2 + \beta_3$) more than non-smoking males 44 / 62
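These conclusions can be roughly cross-checked against the 95% confidence intervals on the preceding slide. The sketch below assumes each interval is symmetric about its estimate (as t-based intervals are), so the midpoint approximates the coefficient. The midpoints are approximations from rounded endpoints, not the estimates the lecture reported, and the variable and function names are hypothetical.

```python
# 95% confidence intervals as given on the slides.
ci_95 = {
    "beta0": (3870, 6228),   # baseline: male non-smoker
    "beta1": (276, 3292),    # female effect
    "beta2": (-1187, 3001),  # smoker effect
    "beta3": (3434, 9043),   # female * smoker interaction
}

# Assumption: a symmetric interval is centered at the point estimate.
estimate = {name: (lo + hi) / 2 for name, (lo, hi) in ci_95.items()}

def expected_expenditure(female, smoker, b=estimate):
    """Mean model: beta0 + beta1*F + beta2*S + beta3*F*S, with F and S coded 0/1."""
    return (b["beta0"] + b["beta1"] * female
            + b["beta2"] * smoker + b["beta3"] * female * smoker)

baseline = expected_expenditure(female=0, smoker=0)
extra_female_smoker = expected_expenditure(female=1, smoker=1) - baseline

# An interval that contains 0 corresponds to a non-significant effect at the 5% level.
smoker_effect_significant = not (ci_95["beta2"][0] < 0 < ci_95["beta2"][1])

print(baseline, extra_female_smoker, smoker_effect_significant)
```

The midpoints give a baseline near $5000 and a female-smoker excess near $8900, in line with the conclusions above, and the interval for the smoker effect covers 0, matching the non-significant finding for male smokers.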
45 Association and Correlation I Association Express the relationship between two variables Can be measured in different ways, depending on the nature of the variables For now, we'll focus on continuous variables (e.g. height, weight) Important note: association does not imply causation 45 / 62
46 Association and Correlation II Describing the relationship between two continuous variables Correlation analysis Measures strength of relationship between two variables Specifies direction of relationship Regression analysis Concerns prediction or estimation of outcome variable, based on value of another variable (or variables) 46 / 62
47 Correlation analysis Plot the data (or have a computer do so) Visually inspect the relationship between two continuous variables Is there a linear relationship (correlation)? Are there outliers? Are the distributions skewed? 47 / 62
48 Correlation Coefficient I Measures the strength and direction of the linear relationship between two variables X and Y Population correlation coefficient, $\rho = \frac{\mathrm{cov}(X, Y)}{\sqrt{\mathrm{var}(X)\,\mathrm{var}(Y)}} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sqrt{E[(X - \mu_X)^2]\,E[(Y - \mu_Y)^2]}}$ 48 / 62
49 Correlation Coefficient II The correlation coefficient, $\rho$, takes values between -1 and +1 -1: Perfect negative linear relationship 0: No linear relationship +1: Perfect positive linear relationship 49 / 62
50 Correlation Coefficient III Sample correlation coefficient: Obtained by plugging sample estimates into the population correlation coefficient $r = \frac{\text{sample cov}(X, Y)}{\sqrt{s_X^2\, s_Y^2}} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) / (n-1)}{\sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1} \cdot \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n-1}}}$ 50 / 62
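The plug-in formula for r translates directly into code. The following Python sketch uses hypothetical data; the function name is an assumption of this sketch.

```python
import math

def pearson_r(x, y):
    """Sample correlation: sample covariance over the product of sample SDs."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    var_x = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    return cov_xy / math.sqrt(var_x * var_y)

# Hypothetical paired measurements.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r = pearson_r(x, y)
# Rescaling x (a change of units) should leave r unchanged.
r_rescaled = pearson_r([2.54 * a for a in x], y)
print(round(r, 4), round(r_rescaled, 4))
```

The (n - 1) factors cancel between numerator and denominator, so the same value is obtained from the raw sums of squares and cross-products.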
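51 Correlation Coefficient IV Plot standardized Y versus standardized X Observe an ellipse (elongated circle) The major axis of the ellipse lies along the 45° line (sloping up if $r > 0$, down if $r < 0$); the correlation is the slope of the least-squares line of standardized Y on standardized X 51 / 62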
52 Correlation Notes Other names for r: Pearson correlation coefficient, product-moment correlation coefficient Characteristics of r: Measures *linear* association The value of r is independent of the units used to measure the variables The value of r is sensitive to outliers $r^2$ tells us what proportion of variation in Y is explained by the linear relationship with X 52 / 62
53 Several levels of correlation 53 / 62
54 Examples of the Correlation Coefficient I Perfect positive correlation, $r \approx 1$ 54 / 62
55 Examples of the Correlation Coefficient II Perfect negative correlation, $r \approx -1$ 55 / 62
56 Examples of the Correlation Coefficient III Imperfect positive correlation, 0< r <1 56 / 62
57 Examples of the Correlation Coefficient IV Imperfect negative correlation, -1<r <0 57 / 62
58 Examples of the Correlation Coefficient V No relation, $r \approx 0$ 58 / 62
59 Examples of the Correlation Coefficient VI Some relation but little *linear* relationship, $r \approx 0$ 59 / 62
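A small constructed example shows how a strong relationship can still give a correlation of zero. In this Python sketch, Y is an exact function of X (a parabola), yet the linear association vanishes because the positive and negative cross-products cancel. The data are made up for illustration and the function name is an assumption of this sketch.

```python
import math

def sample_correlation(x, y):
    """Pearson r computed from centered sums of squares and cross-products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Y is fully determined by X, but the relationship is not linear.
x_vals = [-3, -2, -1, 0, 1, 2, 3]
y_vals = [v ** 2 for v in x_vals]

r_parabola = sample_correlation(x_vals, y_vals)
print(r_parabola)
```

This is why plotting the data before computing r matters: a value near zero rules out a linear trend, not a relationship.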
60 Association and Causality In general, association between two variables means there is some form of relationship between them The relationship is not necessarily causal Association does not imply causation, no matter how much we would like it to Example: Hot days, ice cream, drowning 60 / 62
61 Sir Bradford Hill's Criteria for Causality I Strength: magnitude of association Consistency of association: repeated observation of the association in different situations Specificity: uniqueness of the association Temporality: cause precedes effect 61 / 62
62 Sir Bradford Hill's Criteria for Causality II Biologic gradient: dose-response relationship Biologic plausibility: known mechanisms Coherence: makes sense based on other known facts Experimental evidence: from designed (randomized) experiments Analogy: with other known associations 62 / 62
Chapter 10: STATISTICAL INFERENCE FOR TWO SAMPLES Part 1: Hypothesis tests on a µ 1 µ 2 for independent groups Sections 10-1 & 10-2 Independent Groups It is common to compare two groups, and do a hypothesis
More informationCorrelation. A statistics method to measure the relationship between two variables. Three characteristics
Correlation Correlation A statistics method to measure the relationship between two variables Three characteristics Direction of the relationship Form of the relationship Strength/Consistency Direction
More informationANOVA: Comparing More Than Two Means
ANOVA: Comparing More Than Two Means Chapter 11 Cathy Poliak, Ph.D. cathy@math.uh.edu Office Fleming 11c Department of Mathematics University of Houston Lecture 25-3339 Cathy Poliak, Ph.D. cathy@math.uh.edu
More informationStatistical Hypothesis Testing
Statistical Hypothesis Testing Dr. Phillip YAM 2012/2013 Spring Semester Reference: Chapter 7 of Tests of Statistical Hypotheses by Hogg and Tanis. Section 7.1 Tests about Proportions A statistical hypothesis
More informationDETAILED CONTENTS PART I INTRODUCTION AND DESCRIPTIVE STATISTICS. 1. Introduction to Statistics
DETAILED CONTENTS About the Author Preface to the Instructor To the Student How to Use SPSS With This Book PART I INTRODUCTION AND DESCRIPTIVE STATISTICS 1. Introduction to Statistics 1.1 Descriptive and
More informationFormal Statement of Simple Linear Regression Model
Formal Statement of Simple Linear Regression Model Y i = β 0 + β 1 X i + ɛ i Y i value of the response variable in the i th trial β 0 and β 1 are parameters X i is a known constant, the value of the predictor
More informationParametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami
Parametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami Parametric Assumptions The observations must be independent. Dependent variable should be continuous
More informationCh 2: Simple Linear Regression
Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component
More informationCHL 5225H Advanced Statistical Methods for Clinical Trials: Multiplicity
CHL 5225H Advanced Statistical Methods for Clinical Trials: Multiplicity Prof. Kevin E. Thorpe Dept. of Public Health Sciences University of Toronto Objectives 1. Be able to distinguish among the various
More informationMATH Notebook 3 Spring 2018
MATH448001 Notebook 3 Spring 2018 prepared by Professor Jenny Baglivo c Copyright 2010 2018 by Jenny A. Baglivo. All Rights Reserved. 3 MATH448001 Notebook 3 3 3.1 One Way Layout........................................
More informationSection 4.6 Simple Linear Regression
Section 4.6 Simple Linear Regression Objectives ˆ Basic philosophy of SLR and the regression assumptions ˆ Point & interval estimation of the model parameters, and how to make predictions ˆ Point and interval
More informationREVIEW 8/2/2017 陈芳华东师大英语系
REVIEW Hypothesis testing starts with a null hypothesis and a null distribution. We compare what we have to the null distribution, if the result is too extreme to belong to the null distribution (p
More informationWISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, Academic Year Exam Version: A
WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, 2016-17 Academic Year Exam Version: A INSTRUCTIONS TO STUDENTS 1 The time allowed for this examination paper is 2 hours. 2 This
More informationLecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2
Lecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2 Fall, 2013 Page 1 Random Variable and Probability Distribution Discrete random variable Y : Finite possible values {y
More informationy ˆ i = ˆ " T u i ( i th fitted value or i th fit)
1 2 INFERENCE FOR MULTIPLE LINEAR REGRESSION Recall Terminology: p predictors x 1, x 2,, x p Some might be indicator variables for categorical variables) k-1 non-constant terms u 1, u 2,, u k-1 Each u
More information1-Way ANOVA MATH 143. Spring Department of Mathematics and Statistics Calvin College
1-Way ANOVA MATH 143 Department of Mathematics and Statistics Calvin College Spring 2010 The basic ANOVA situation Two variables: 1 Categorical, 1 Quantitative Main Question: Do the (means of) the quantitative
More informationChapter 11. Correlation and Regression
Chapter 11. Correlation and Regression The word correlation is used in everyday life to denote some form of association. We might say that we have noticed a correlation between foggy days and attacks of
More informationMultiple Regression: Chapter 13. July 24, 2015
Multiple Regression: Chapter 13 July 24, 2015 Multiple Regression (MR) Response Variable: Y - only one response variable (quantitative) Several Predictor Variables: X 1, X 2, X 3,..., X p (p = # predictors)
More informationThe One-Way Independent-Samples ANOVA. (For Between-Subjects Designs)
The One-Way Independent-Samples ANOVA (For Between-Subjects Designs) Computations for the ANOVA In computing the terms required for the F-statistic, we won t explicitly compute any sample variances or
More informationGoodness of Fit Tests
Goodness of Fit Tests Marc H. Mehlman marcmehlman@yahoo.com University of New Haven (University of New Haven) Goodness of Fit Tests 1 / 38 Table of Contents 1 Goodness of Fit Chi Squared Test 2 Tests of
More informationStatistics: revision
NST 1B Experimental Psychology Statistics practical 5 Statistics: revision Rudolf Cardinal & Mike Aitken 29 / 30 April 2004 Department of Experimental Psychology University of Cambridge Handouts: Answers
More informationReview for Final. Chapter 1 Type of studies: anecdotal, observational, experimental Random sampling
Review for Final For a detailed review of Chapters 1 7, please see the review sheets for exam 1 and. The following only briefly covers these sections. The final exam could contain problems that are included
More informationSuppose that we are concerned about the effects of smoking. How could we deal with this?
Suppose that we want to study the relationship between coffee drinking and heart attacks in adult males under 55. In particular, we want to know if there is an association between coffee drinking and heart
More informationISQS 5349 Final Exam, Spring 2017.
ISQS 5349 Final Exam, Spring 7. Instructions: Put all answers on paper other than this exam. If you do not have paper, some will be provided to you. The exam is OPEN BOOKS, OPEN NOTES, but NO ELECTRONIC
More informationUnit 6 - Introduction to linear regression
Unit 6 - Introduction to linear regression Suggested reading: OpenIntro Statistics, Chapter 7 Suggested exercises: Part 1 - Relationship between two numerical variables: 7.7, 7.9, 7.11, 7.13, 7.15, 7.25,
More informationBivariate Relationships Between Variables
Bivariate Relationships Between Variables BUS 735: Business Decision Making and Research 1 Goals Specific goals: Detect relationships between variables. Be able to prescribe appropriate statistical methods
More informationSTA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).
STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) (b) (c) (d) (e) In 2 2 tables, statistical independence is equivalent
More information