Research Methodology: Tools
MSc Business Administration
Research Methodology: Tools
Applied Data Analysis (with SPSS)
Lecture 05: Contingency Analysis
March 2014
Prof. Dr. Jürg Schwarz
Lic. phil. Heidi Bruderer Enzler

Slide 2: Contents
- Aims of the Lecture
- Typical Syntax
- Introduction
  - Examples
  - Overview
- Concept of the Contingency Analysis
  - Key Steps in Contingency Analysis
  - Step 2: Creating a Cross Table
  - Step 3: Comparison of Empirical Values with Expected Values
  - Intermezzo: χ²-Distribution
  - Prerequisites
  - Yates's Correction
  - Fisher's Exact Test
  - Step 4: Measures of Association
  - Notes about the Contingency Analysis
- Contingency Analysis with SPSS: A Detailed Example
  - Example: Customer Satisfaction of a Retail Store (Nominal)
  - Example: Customer Satisfaction of a Retail Store (Ordinal)
- Appendix: Measures of Association
Slide 3: Aims of the Lecture
- You know the concept of a cross table.
- You know the key steps in conducting a contingency analysis.
- You know the process of hypothesis testing with the χ² test of independence.
- You know Yates's correction and Fisher's exact test.
- You know measures of association.
- You can conduct a contingency analysis with SPSS. In particular, you know how to
  - interpret the output
  - use a layer variable

Slide 4: Typical Syntax
Cross table with χ² test

  CROSSTABS
    /TABLES=store BY service BY contact
    /FORMAT=AVALUE TABLES
    /STATISTICS=CHISQ CC PHI LAMBDA
    /CELLS=COUNT ROW
    /COUNT ROUND CELL
    /BARCHART.

- store BY service: the two variables of the cross table
- BY contact: layer variable
- CHISQ: χ² test
- ROW: percentages in rows
Introduction

Slide 5: Examples
Example I: Contingency Analysis ("Wahrscheinlichkeitsanalyse")
Study on Smoking and Health. Random sample of n = 200 deaths in a hospital.

               Other cause    Lung cancer    Σ
  Non-smoker   107 (63%)      12 (40%)       119
  Smoker        63 (37%)      18 (60%)        81
  Σ            170 (100%)     30 (100%)      200

Is there a relationship between smoking and lung cancer?

Questions
- Question in everyday language: Is there a relationship between smoking and lung cancer?
- Research question: Is there a higher probability that smokers die from lung cancer?
- Statistical question:
  H0: The two characteristics are independent of each other.
  HA: The two characteristics are dependent on each other.

Slide 6: Solution
Using cross tables:
- Perform a χ² test (χ is the Greek letter chi).
- Calculate a measure of association ("How strong is the relation?").

"How to" with SPSS
SPSS: Analyze > Descriptive Statistics > Crosstabs...
Slide 7: Results
[SPSS output: cross table, chi-square tests and symmetric measures]
Typical statistical statement: There is a relationship between smoking and lung cancer (Chi-square: χ² = 4.658, df = 1, p = .031). The value of Cramer's V (.167) is below 0.3, so the relationship is not very strong.

Slide 8: Example II: Goodness of Fit (Homogeneity test)
Sample of n = 100 smokers in hospital A
[Table: deaths of smokers in hospital A by cause (Other cause, Lung cancer)]

Distribution of deaths in hospital B

            Other cause    Lung cancer    Σ
  Smoker    87.7%          12.3%          100.0%

Questions
- Question in everyday language: Are smokers in hospital A and in hospital B equally likely to die from lung cancer?
- Research question (in this case the same as in everyday language): Are smokers in hospital A and in hospital B equally likely to die from lung cancer?
- Statistical question:
  H0: The two distributions are the same.
  HA: The two distributions are not the same.
Slide 9: Solution
Perform a χ² test.

"How to" with SPSS
SPSS: Analyze > Nonparametric Tests > Legacy Dialogs > Chi-square...

Results
Typical statistical statement: The distribution of deaths among smokers in hospital A is not the same as in hospital B (Chi-square: χ² = 5.496, df = 1, p = .019).

Slide 10: Overview
"Crosstabulation"

Contingency analysis (χ² test of independence)
- Tests whether there is evidence that the two characteristics are dependent on each other.
- Given: Two characteristics (variables) having nominal or ordinal scales
- Desired: Relationship between the characteristics
- Measures of association: Evidence of the strength of the relationship.

χ² test of goodness of fit (this topic is not discussed further in this course)
- Tests whether the frequencies of the different populations have the same distribution.
- Given: A characteristic (variable) having a nominal or ordinal scale
- Desired: Comparing two distributions

Note: The difference between contingency analysis and goodness of fit is not relevant in terms of methodology, but the results are interpreted differently.
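The goodness-of-fit statement for Example II can be checked with a few lines of Python. This is only a sketch: the transcript does not preserve the counts for hospital A, so the values 80 and 20 below are an assumption, chosen because they reproduce the reported statistic. The helper names are hypothetical, and the p-value shortcut is valid for one degree of freedom only.

```python
import math

def chi2_goodness_of_fit(observed, expected_props):
    """Pearson chi-square of observed counts against reference proportions."""
    n = sum(observed)
    expected = [p * n for p in expected_props]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def p_value_df1(chi2):
    """Upper-tail probability of the chi-square distribution with df = 1."""
    return math.erfc(math.sqrt(chi2 / 2))

# ASSUMED counts for hospital A (other cause, lung cancer); not in the transcript.
observed_a = [80, 20]
# Distribution in hospital B as given on Slide 8.
props_b = [0.877, 0.123]

chi2 = chi2_goodness_of_fit(observed_a, props_b)
p = p_value_df1(chi2)
print(round(chi2, 3), round(p, 3))
```

Under these assumed counts the result agrees with the statement on Slide 9 (χ² = 5.496, p = .019).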
Concept of the Contingency Analysis

Slide 11: Key Steps in Contingency Analysis
1. Definition of the variables to be investigated
   Not all combinations of variables make sense! There should be theoretical or empirical information on the relationship.
2. Creating a cross table
   Absolute values and percentages. A third variable can be used as a layer variable.
3. χ² test of independence: Comparison of empirical and expected frequencies
   Expected frequencies are calculated from the marginal distributions of the table.
4. Checking measures of association
   Calculate the measures of association. There are many different methods, depending on the scale type, among other things.

Slide 12: Step 2: Creating a Cross Table
(Sometimes called contingency table or "crosstab")

Basic form

                            Value of variable 2
                            1       2       ...    J       Sum
  Value of variable 1   1   n_11    n_12    ...    n_1J    n_1.
                        2   n_21    n_22    ...    n_2J    n_2.
                        ...
                        I   n_I1    n_I2    ...    n_IJ    n_I.
                      Sum   n_.1    n_.2    ...    n_.J    n_..

- n_1J: absolute frequency of cell "1,J" (variable 1 = 1, variable 2 = J)
- n_.1: number of cases with value of variable 2 = 1
- n_..: total number of cases (sample size n)

The sums of the rows and columns form the so-called marginal distribution.
Slide 13: Percentages
It is often difficult to evaluate cross tabulations on the basis of absolute counts only. Percentages give a clearer picture.

Slide 14: Step 3: Comparison of Empirical Values with Expected Values
Marginal distributions
Marginal distribution: One-dimensional distribution of the variable X or Y, respectively.

  X \ Y        Other cause    Lung cancer    Σ      p_i
  Non-smoker   107            12             119    59.5%
  Smoker        63            18              81    40.5%
  Σ            170            30             200    100.0%
  q_j          85.0%          15.0%          100.0%

Under the null hypothesis (H0), X and Y are independent. The relative frequencies p_i and q_j can then be used to calculate the probabilities in the cells as follows.

Probability for an outcome {X = i and Y = j}:

$P(X = i \wedge Y = j) = p_i \cdot q_j, \quad i = 1, 2, \; j = 1, 2$
Slide 15: Expected two-dimensional distribution under H0
Example: p_1 · q_1 = 59.5% · 85.0% = 50.6%

               Other cause    Lung cancer    p_i
  Non-smoker   50.6%          8.9%           59.5%
  Smoker       34.4%          6.1%           40.5%
  q_j          85.0%          15.0%

Multiplication by n = 200 provides the expected frequencies:

$f'_{ij} = p_i \cdot q_j \cdot n$

Example: f'_11 = p_1 · q_1 · n = 59.5% · 85.0% · 200 = 101.15

Slide 16: Differences between empirical and expected frequencies
[Three tables over the cells Non-smoker/Smoker by Other cause/Lung cancer: empirical frequencies f_ij, expected frequencies f'_ij, and differences f_ij - f'_ij]

Test statistic (Karl Pearson, 1857-1936, British mathematician)

$\chi^2 = \sum_{i=1}^{k} \sum_{j=1}^{m} \frac{(f_{ij} - f'_{ij})^2}{f'_{ij}} = 5.62$

The test statistic is χ² distributed with ν = (k - 1) · (m - 1) degrees of freedom, where k = number of columns and m = number of rows.
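The calculation in Step 3 can be sketched in Python as follows. The observed counts are those of the smoking example (107 follows from the row total 119 minus 12); the function names are hypothetical. Note that with unrounded expected frequencies this sketch gives a statistic of about 5.57, slightly below the 5.62 quoted on the slide, which presumably reflects rounding in the slide's intermediate values.

```python
def expected_frequencies(table):
    """Expected counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def pearson_chi2(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_frequencies(table)
    return sum(
        (f - e) ** 2 / e
        for row, exp_row in zip(table, expected)
        for f, e in zip(row, exp_row)
    )

# Rows: non-smoker, smoker. Columns: other cause, lung cancer.
smoking = [[107, 12], [63, 18]]

expected = expected_frequencies(smoking)
chi2 = pearson_chi2(smoking)
df = (len(smoking) - 1) * (len(smoking[0]) - 1)
print(expected[0][0], round(chi2, 3), df)
```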
Slide 17: Intermezzo: χ²-Distribution
The form of the χ² distribution depends on the parameter ν. For ν → ∞ the χ² distribution approaches a normal distribution.

Slide 18: Quantiles of the χ²-distribution
- ν: degrees of freedom
- α: significance level
- c: critical value

[Table: critical values c for significance levels α = 0.5%, 1.0% and 5.0% by degrees of freedom ν. For ν = 1 and α = 5.0%: c = 3.84]

Example lung cancer
ν = (2 - 1) · (2 - 1) = 1
χ² = 5.62 > 3.84
H0 can be rejected.
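For one degree of freedom the comparison with the critical value can be reproduced without a table, because a χ² variable with ν = 1 is the square of a standard normal variable. The sketch below uses only Python's standard library; the function name is hypothetical and the shortcut does not apply to ν > 1.

```python
import math

def p_value_df1(chi2):
    """P(X > chi2) for a chi-square distribution with one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

critical_value = 3.84   # tabulated value for nu = 1, alpha = 5%
statistic = 5.62        # value quoted for the lung cancer example

p_at_critical = p_value_df1(critical_value)
p_at_statistic = p_value_df1(statistic)
reject_h0 = statistic > critical_value
print(round(p_at_critical, 3), reject_h0)
```

The upper-tail probability at the critical value comes out at about 5%, which is what defines it as the critical value for α = 5%.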
Slide 19: Prerequisites
The empirical distribution cannot be described exactly by the theoretical χ² distribution in all cases. In those cases the χ² test of independence must be adapted.

Sample size
The sample should be greater than 50 (n > 50). Otherwise:
- when 20 < n ≤ 50: Use Yates's correction
- when n ≤ 20: Use Fisher's exact test

Expected frequency
The expected frequency should always be greater than 5 (f'_ij > 5). In the case of f'_ij ≤ 5:
- Use Fisher's exact test
- Merge rows and columns in order to increase the frequency (not recommended)

Degrees of freedom (df)
The degrees of freedom should always be greater than 1 (ν = (k - 1) · (m - 1) > 1).
- In the case of 2x2 tables (ν = 1): Use Yates's correction

Slide 20: Yates's Correction
Example: Effectiveness of a flu vaccination
[Table: no vaccination / vaccination by no influenza / influenza, with sums]

SPSS shows Yates's correction as "continuity correction."

$\chi^2 = \sum_{i=1}^{k} \sum_{j=1}^{m} \frac{(f_{ij} - f'_{ij})^2}{f'_{ij}}$

$\chi^2_{\text{Yates}} = \sum_{i=1}^{k} \sum_{j=1}^{m} \frac{(|f_{ij} - f'_{ij}| - 0.5)^2}{f'_{ij}}$
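As a hedged illustration of the correction, the sketch below applies it to the 2x2 smoking table from Example I (107 is derived from the row total 119 minus 12), because the counts of the flu vaccination table are not preserved in the transcript. The function name is hypothetical.

```python
def chi2_yates(table):
    """Chi-square with Yates's continuity correction for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            # Shrink each absolute deviation by 0.5 before squaring.
            total += (abs(observed - expected) - 0.5) ** 2 / expected
    return total

# Rows: non-smoker, smoker. Columns: other cause, lung cancer.
smoking = [[107, 12], [63, 18]]
corrected = chi2_yates(smoking)
print(round(corrected, 3))
```

The corrected value of about 4.658 matches the χ² reported in the statistical statement for Example I, which suggests that statement quotes the continuity-corrected row of the SPSS output.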
Slide 21: Fisher's Exact Test
- Is used instead of the χ² test of independence if the prerequisites are not fulfilled.
- Fisher's exact test computes exact probabilities from the hypergeometric distribution instead of relying on the χ² approximation, so it has no sample-size prerequisites.

Example: Effectiveness of the flu vaccine
SPSS calculates Fisher's exact test automatically for 2x2 tables.

Slide 22: Step 4: Measures of Association
The χ² test of independence gives
- only information about the existence of the relationship
- no indication of the strength and direction of the relation

Measures of association give some sense of the possible strength of the relation.

Three groups of measures of association (see appendix)
- Nominal with nominal scaling
  - Part A: based on the χ² test statistic (symmetric)
  - Part B: based on reduction of error (directional)
- Ordinal with ordinal scaling: based on reduction of error
- Nominal with interval scaling: based on correlation

Other measures of association
This topic will not be covered further in this course.
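A minimal sketch of Fisher's exact test for a 2x2 table is given below. It enumerates every table with the same marginal totals and sums the hypergeometric probabilities of those that are no more likely than the observed one (one common two-sided definition). The counts are hypothetical, since the flu vaccination table is not preserved in the transcript, and the function name is an assumption.

```python
from math import comb

def fisher_exact_2x2(table):
    """Two-sided Fisher exact p-value for a 2x2 table of counts."""
    (a, b), (c, d) = table
    r1, r2 = a + b, c + d          # row totals
    c1 = a + c                     # first column total
    n = r1 + r2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    lowest = max(0, c1 - r2)
    highest = min(r1, c1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )

# Hypothetical small-sample table (n = 8), too small for the chi-square test.
example = [[3, 1], [1, 3]]
p_value = fisher_exact_2x2(example)
print(round(p_value, 4))
```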
Slide 23: Notes about the Contingency Analysis
In cases with more than two variables:
- In principle, the contingency analysis can be extended to 3 or more variables.
- A third variable can be introduced as a control variable (layer variable). For example on slide 27: Add "contact with employee" as a layer variable.

Today, other types of models are used.
- Log-linear model (not subject of this course): Models frequencies in cross tables using analysis of variance.
- Logistic regression (see Lecture 08): A regression model in which the dependent variable is a binary variable.
- Correspondence analysis (not subject of this course): Similar to principal component analysis (PCA). Transforms the data so that PCA can be used for cross tables.

Contingency Analysis with SPSS: A Detailed Example

Slide 24: Example: Customer Satisfaction of a Retail Store (Nominal)
Survey of 582 customers at 4 stores

Factor analysis results
The quality of customer service is the most important element in customer satisfaction.
[Figure: Overall satisfaction with the components Price satisfaction, Variety satisfaction, Service satisfaction, Quality satisfaction]

Follow-up questions
- Does each of the store locations provide the same level of customer service?
- Does one of the store locations have an especially large potential for improvement?
Slide 25
SPSS: Analyze > Descriptive Statistics > Crosstabs...

Slide 26: Interpretation of the Output
p > .05: There is no significant relation between service satisfaction and store. The observed differences in service satisfaction are compatible with chance variation.
Slide 27: Add "contact with employee" (contact) as a layer variable
Not all customers interviewed had contact with a service employee. The evaluations of these customers do not reflect the actual quality of service of a store location. Separate the customers into those who had contact with service employees and those who did not.

  CROSSTABS
    /TABLES=store BY service BY contact
    /FORMAT=AVALUE TABLES
    /STATISTICS=CHISQ CC PHI
    /CELLS=COUNT
    /COUNT ROUND CELL.

Slide 28: Interpretation of the Output
When customers have contact with service employees, the relationship between customer satisfaction and store location is significant (p = .012). This seems to be primarily due to low customer satisfaction in store location 2.
Slide 29: Measures of association: Strength of the relationship
Symmetric measures
Cramer's V is below 0.3, so although the relationship is significant, it is not very strong.

Note regarding "Nominal" measures of association
- Phi: Only suitable for 2x2 tables
- Cramer's V: Suitable for tables larger than 2x2
- Contingency coefficient: Suitable for square tables larger than 2x2

Slide 30: Directional Measures
Measures of association that are based on the χ² test statistic are not easy to interpret. Directional measures make an additional statement about the relationship between two variables.

For example, we can better predict which store location a customer belongs to if the information on service satisfaction from the customer service questionnaire is available ("Store dependent" = .080).
Example Kruskal lambda

Slide 31: 1) Prediction of store (neglecting service)
Predicting the row variable (store) only from the marginal distribution ("Total"), thus not yet taking into account the variable service. Frequencies are predicted according to the most frequent category.

Wrong cases = 201
Percentage of wrong cases: 201/293 = 68.6%

Slide 32: 2) Prediction of store taking service into account
Predicting the row variable (store) by taking into account the service variable (service). Frequencies are predicted according to the most frequent category within each service category.

Wrong cases = (49 - 24) + (53 - 21) + (81 - 25) + (50 - 14) + (60 - 24) = 185
Percentage of wrong cases: 185/293 = 63.1%
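The two error counts above, and the relative improvement between them, can be tallied with a short Python sketch. The column totals and modal cell counts are taken from Slide 32; the modal marginal count of 92 is not printed in the transcript and is derived here as 293 minus 201. The function name is hypothetical.

```python
def lambda_from_counts(n, modal_marginal, col_totals, col_modes):
    """Prediction errors without and with the predictor, and their relative reduction."""
    # Prediction 1: always guess the most frequent category of the dependent variable.
    errors_1 = n - modal_marginal
    # Prediction 2: within each predictor category, guess its most frequent cell.
    errors_2 = sum(total - mode for total, mode in zip(col_totals, col_modes))
    return errors_1, errors_2, (errors_1 - errors_2) / errors_1

errors_1, errors_2, lam = lambda_from_counts(
    n=293,
    modal_marginal=92,                 # derived: 293 - 201
    col_totals=[49, 53, 81, 50, 60],   # cases per service category
    col_modes=[24, 21, 25, 14, 24],    # largest store count per service category
)
print(errors_1, errors_2, round(lam, 3))
```

The relative reduction of roughly 0.080 corresponds to the "Store dependent" value reported in the SPSS output.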
Slide 33: 3) Quantifying the improvement in the prediction, if service is considered
Equation for Kruskal's lambda (λ) as directional measure

$\lambda_{\text{row}} = \frac{\text{Error of Prediction 1} - \text{Error of Prediction 2}}{\text{Error of Prediction 1}} = \frac{68.6\% - 63.1\%}{68.6\%} = .080$

The interpretation is not straightforward (store is not "depending" on service).
"λ row gives the proportion of errors that can be eliminated by taking account of knowledge ..."*

This value of .080 in the row "store Dependent" means that there is an 8.0% reduction in misclassification of which store a person belongs to, if service is taken into account. If the reduction is 100%, then there is a perfect relation between the variables.

*L.A. Goodman, W.H. Kruskal (1954). Measures of association for cross-classification. Journal of the American Statistical Association, 49, p

Slide 34: Example: Customer Satisfaction of a Retail Store (Ordinal)
Survey of 582 customers in 4 stores

Results
Shopping frequency is related to general satisfaction.

Questions
- How strong is the relationship?
- What is the sign (positive or negative) of the relationship?

Since the categories of both of these variables are ordered, you can use measures that quantify the strength and determine the sign of the association.
Slide 35: Measures of association: Strength of the relationship
The value of Gamma (.140) is below 0.3, so the relationship is not very strong.

Example: The value of .104 in the row "Shopping frequency Dependent" means that there is a 10.4% reduction in misclassification of which shopping frequency a person shows, if Overall satisfaction is taken into account.

Appendix: Measures of Association

Slide 36: Nominal with nominal scaling
Part A: based on the χ² test statistic

Phi coefficient (ϕ)
- is a measure of the degree of relationship between two binary variables.
- is comparable to a correlation coefficient in interpretation.
- includes the chi-square statistic and the sample size.
- has no theoretical upper limit.

$\phi = \sqrt{\frac{\chi^2}{n}}$

Cramer's V
- is a standardization of phi, so that the range of values lies between 0 and 1.
- 0 means that there is no relationship between the column and row variable.
- Values close to 1 indicate a strong relationship between the two variables.
- The maximum value is dependent on the number of columns and rows in the table.
- is more conservative than phi when the number of columns and rows increases.

$V = \sqrt{\frac{\chi^2}{n (R - 1)}}, \quad \text{where } R = \min(I, J)$
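The two formulas can be tried out on the smoking table from Example I (107 is derived from the row total 119 minus 12). This is a sketch with hypothetical function names; it uses the uncorrected Pearson statistic, which is an assumption about what SPSS feeds into these measures, although the result agrees with the Cramer's V of .167 quoted for that example.

```python
import math

def pearson_chi2(table):
    """Uncorrected Pearson chi-square statistic of a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (f - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(table, row_totals)
        for f, c in zip(row, col_totals)
    )

def phi_coefficient(table):
    n = sum(sum(row) for row in table)
    return math.sqrt(pearson_chi2(table) / n)

def cramers_v(table):
    n = sum(sum(row) for row in table)
    r = min(len(table), len(table[0]))   # R = min(I, J)
    return math.sqrt(pearson_chi2(table) / (n * (r - 1)))

smoking = [[107, 12], [63, 18]]
phi = phi_coefficient(smoking)
v = cramers_v(smoking)
print(round(phi, 3), round(v, 3))
```

For a 2x2 table R - 1 = 1, so phi and V coincide; they differ only for larger tables.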
Slide 37
Contingency coefficient (CC)
- assumes values between 0 and sqrt((R - 1)/R), where R = min(I, J).
- is more conservative than phi when the relationship between the variables is stronger.

$CC = \sqrt{\frac{\chi^2}{\chi^2 + n}}, \quad CC_{\max} = \sqrt{\frac{R - 1}{R}}, \quad \text{where } R = \min(I, J)$

Part B: based on error reduction
Measures of association that are based on the χ² test statistic are not easy to interpret. Directional measures make an additional statement about the relationship between two variables. Directional measures quantify the reduction of error in predicting the row variable's value when the column variable's value is known, and vice versa. Each directional measure uses a different definition of "error".

Goodman and Kruskal's tau (τ) (SPSS: in output, when "lambda" is selected)
- defines error as incorrectly assigning a case. The cases are assigned to category j with a probability equal to the observed relative frequency of category j.
- The value 1 means that the dependent variable can be totally predicted by the independent variable.
- The value 0 means that it is not possible to predict the dependent variable through the independent variable.

Slide 38
Goodman and Kruskal's lambda (λ)
- defines error as incorrect assignment of a case. The cases are assigned to the modal (most frequent) category.
- The value 1 means that the dependent variable can be totally predicted by the independent variable.
- The value 0 means that it is not possible to predict the dependent variable through the independent variable.

Uncertainty coefficient
- defines error as entropy (a measure of information content). The terms P(category j) · ln P(category j) are summed over all categories of the variables.
- For example, a value of 0.83 indicates that knowledge of one variable reduces the error in predicting the values of the other variable by 83%.
- The uncertainty coefficient is also known as Theil's U.
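The entropy-based definition of the uncertainty coefficient can be made concrete with the sketch below. It uses the usual textbook form U(row | column) = (H(row) + H(column) - H(row, column)) / H(row), which is an assumption about the exact variant SPSS reports. The tables are hypothetical and chosen so that the two extreme values are easy to check.

```python
import math

def entropy(counts):
    """Shannon entropy (natural log) of a list of counts; empty cells contribute 0."""
    n = sum(counts)
    return -sum(c / n * math.log(c / n) for c in counts if c > 0)

def uncertainty_coefficient(table):
    """Proportional reduction in the row variable's entropy given the column variable."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    cells = [cell for row in table for cell in row]
    h_row = entropy(row_totals)
    h_col = entropy(col_totals)
    h_joint = entropy(cells)
    return (h_row + h_col - h_joint) / h_row

# Hypothetical tables: perfect association and complete independence.
u_perfect = uncertainty_coefficient([[10, 0], [0, 10]])
u_independent = uncertainty_coefficient([[5, 5], [5, 5]])
print(round(u_perfect, 3), round(u_independent, 3))
```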
Slide 39: Ordinal with Ordinal Scaling
(Description from SPSS-Help)

Gamma
- Symmetric measure of association that ranges between -1 and 1.
- Values close to an absolute value of 1 indicate a strong relationship.

Somers' d
- is an asymmetric extension of gamma that differs only in the inclusion of the number of pairs not tied on the independent variable.

Kendall's tau-b
- is a nonparametric measure of correlation that takes ties into account.
- The sign of the coefficient indicates the direction of the relationship.
- The absolute value indicates the strength: the larger the value, the stronger the relationship.
- Values range from -1 to 1. A value of -1 or +1 can be obtained only from square tables.

Kendall's tau-c
- is a nonparametric measure of correlation that ignores ties.
- The sign of the coefficient indicates the direction of the relationship.
- The absolute value indicates the strength: the larger the value, the stronger the relationship.
- Values range from -1 to 1. A value of -1 or +1 can be obtained only from square tables.
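Gamma is commonly computed from the numbers of concordant pairs (C) and discordant pairs (D) as (C - D) / (C + D), ignoring tied pairs. The sketch below follows that textbook definition; the function name and the 2x2 table are hypothetical, since the shopping frequency data are not preserved in the transcript.

```python
def goodman_kruskal_gamma(table):
    """Gamma for a two-way table whose rows and columns are in ascending order."""
    rows, cols = len(table), len(table[0])
    concordant = 0
    discordant = 0
    for i in range(rows):
        for j in range(cols):
            for k in range(i + 1, rows):       # only cells in later rows
                for l in range(cols):
                    if l > j:                  # later row, later column
                        concordant += table[i][j] * table[k][l]
                    elif l < j:                # later row, earlier column
                        discordant += table[i][j] * table[k][l]
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical ordinal 2x2 table: 4 concordant pairs, 1 discordant pair.
gamma = goodman_kruskal_gamma([[2, 1], [1, 2]])
print(gamma)
```

A positive value indicates that higher categories of one variable tend to go with higher categories of the other; a negative value indicates the opposite.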
More informationSmall n, σ known or unknown, underlying nongaussian
READY GUIDE Summary Tables SUMMARY-1: Methods to compute some confidence intervals Parameter of Interest Conditions 95% CI Proportion (π) Large n, p 0 and p 1 Equation 12.11 Small n, any p Figure 12-4
More information" M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2
Notation and Equations for Final Exam Symbol Definition X The variable we measure in a scientific study n The size of the sample N The size of the population M The mean of the sample µ The mean of the
More informationNonparametric statistic methods. Waraphon Phimpraphai DVM, PhD Department of Veterinary Public Health
Nonparametric statistic methods Waraphon Phimpraphai DVM, PhD Department of Veterinary Public Health Measurement What are the 4 levels of measurement discussed? 1. Nominal or Classificatory Scale Gender,
More informationContingency Tables Part One 1
Contingency Tables Part One 1 STA 312: Fall 2012 1 See last slide for copyright information. 1 / 32 Suggested Reading: Chapter 2 Read Sections 2.1-2.4 You are not responsible for Section 2.5 2 / 32 Overview
More informationThree-Way Contingency Tables
Newsom PSY 50/60 Categorical Data Analysis, Fall 06 Three-Way Contingency Tables Three-way contingency tables involve three binary or categorical variables. I will stick mostly to the binary case to keep
More informationRegression With a Categorical Independent Variable
Regression With a Independent Variable Lecture 10 November 5, 2008 ERSH 8320 Lecture #10-11/5/2008 Slide 1 of 54 Today s Lecture Today s Lecture Chapter 11: Regression with a single categorical independent
More informationJoseph O. Marker Marker Actuarial a Services, LLC and University of Michigan CLRS 2010 Meeting. J. Marker, LSMWP, CLRS 1
Joseph O. Marker Marker Actuarial a Services, LLC and University of Michigan CLRS 2010 Meeting J. Marker, LSMWP, CLRS 1 Expected vs Actual Distribution Test distributions of: Number of claims (frequency)
More informationUnit 9: Inferences for Proportions and Count Data
Unit 9: Inferences for Proportions and Count Data Statistics 571: Statistical Methods Ramón V. León 12/15/2008 Unit 9 - Stat 571 - Ramón V. León 1 Large Sample Confidence Interval for Proportion ( pˆ p)
More informationThe goodness-of-fit test Having discussed how to make comparisons between two proportions, we now consider comparisons of multiple proportions.
The goodness-of-fit test Having discussed how to make comparisons between two proportions, we now consider comparisons of multiple proportions. A common problem of this type is concerned with determining
More informationReview of One-way Tables and SAS
Stat 504, Lecture 7 1 Review of One-way Tables and SAS In-class exercises: Ex1, Ex2, and Ex3 from http://v8doc.sas.com/sashtml/proc/z0146708.htm To calculate p-value for a X 2 or G 2 in SAS: http://v8doc.sas.com/sashtml/lgref/z0245929.htmz0845409
More informationHypothesis Testing, Power, Sample Size and Confidence Intervals (Part 2)
Hypothesis Testing, Power, Sample Size and Confidence Intervals (Part 2) B.H. Robbins Scholars Series June 23, 2010 1 / 29 Outline Z-test χ 2 -test Confidence Interval Sample size and power Relative effect
More informationEntering and recoding variables
Entering and recoding variables To enter: You create a New data file Define the variables on Variable View Enter the values on Data View To create the dichotomies: Transform -> Recode into Different Variable
More informationQuestion. Hypothesis testing. Example. Answer: hypothesis. Test: true or not? Question. Average is not the mean! μ average. Random deviation or not?
Hypothesis testing Question Very frequently: what is the possible value of μ? Sample: we know only the average! μ average. Random deviation or not? Standard error: the measure of the random deviation.