ESP 178 Applied Research Methods
2/23: Quantitative Analysis

Data Preparation

Data coding: create a codebook that defines each variable, its response scale, and how it was coded.
Data entry: for mail surveys and in-person surveys; check for accuracy (e.g. enter data twice).
Missing values: delete surveys with too many missing values, or impute missing values.
Data transformation: e.g. convert positive to negative responses; convert ratio variables to nominal.

Descriptive Statistics: univariate analysis to describe general properties of one variable

Frequency distributions and histograms show central tendency, variation, and skewness. Is it a normal distribution, i.e. a bell-shaped curve?

Purpose                        Measure              Variable Type               Definition
Measures of central tendency   Mean                 Ratio (sometimes ordinal)   Arithmetic average
                               Median               Ordinal, ratio              Middle value (or average of two middle values if even number of cases)
                               Mode                 Best for nominal            Most frequent value
Measures of dispersion         Range                Ratio (sometimes ordinal)   Highest value minus lowest value
                               Variance             Ratio (sometimes ordinal)   Average squared difference of each case from the mean
                               Standard deviation   Ratio (sometimes ordinal)   Square root of the variance

Note: This table ignores interval variables, which are not common in this field.

Inferential Statistics: to look for associations between two or more variables

Null hypothesis: means are not different          H0: µ1 = µ2
Alternative hypothesis: means are different       H1: µ1 ≠ µ2

p-value: Probability of obtaining an effect at least as extreme as the one in the sample data if the null hypothesis is true. Informally, a small p-value means the observed association would be unlikely to arise by chance alone. The goal is a p-value less than 0.05 (5% significance level) or 0.01 (1% significance level).
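The definition of the p-value above can be made concrete with a permutation test, which is not the method used in this handout but illustrates the idea directly: if the null hypothesis is true, group labels are interchangeable, so we can shuffle them many times and count how often the shuffled difference in means is at least as extreme as the observed one. The sketch below is a minimal illustration under stated assumptions; the function name `permutation_p_value` and both data lists are hypothetical, not taken from the survey.

```python
import random

def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Estimate a two-sided p-value for a difference in means by permutation."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are interchangeable,
        # so reshuffle them and recompute the difference in means.
        rng.shuffle(pooled)
        mean_a = sum(pooled[:n_a]) / n_a
        mean_b = sum(pooled[n_a:]) / (len(pooled) - n_a)
        if abs(mean_a - mean_b) >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations

# Hypothetical counts of outdoor play for two made-up groups of children.
group_a = [0, 1, 2, 2, 3, 0, 1, 4]
group_b = [3, 5, 2, 7, 4, 6, 3, 5]
p_value = permutation_p_value(group_a, group_b)
print(f"estimated p-value: {p_value:.4f}")
print("reject H0 at the 5% level" if p_value < 0.05 else "fail to reject H0")
```

The returned value is the share of shuffles that produced a difference at least as large as the observed one, which is the p-value definition applied literally.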
                       Independent Variable
Dependent Variable     Nominal or ordinal                                Ratio or interval
Nominal or ordinal     Crosstabulation with Chi-square test              Logistic regression
                                                                         Other forms of modeling
Ratio or interval      Difference of means with t-test                   Correlation coefficient
                       (if 2 categories)                                 Linear regression
                       Analysis of Variance (ANOVA) with F-test          Other forms of modeling
                       (if multiple categories)

Chi-square: Compares expected frequencies in cells to observed frequencies in cells.
F-statistic: Compares variation between groups to variation within groups.

Multivariate analysis: Allows us to test causal hypotheses with non-experimental data by testing for a relationship between independent and dependent variables while controlling for other variables that might cause a spurious relationship.
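The table of tests above amounts to a lookup on the measurement level of the two variables. A minimal sketch of that lookup follows; the function name `suggest_test` and its parameters are hypothetical, and the returned strings simply restate the table cells.

```python
CATEGORICAL = {"nominal", "ordinal"}
CONTINUOUS = {"ratio", "interval"}

def _is_categorical(level):
    """Return True for nominal/ordinal, False for ratio/interval."""
    level = level.lower()
    if level in CATEGORICAL:
        return True
    if level in CONTINUOUS:
        return False
    raise ValueError(f"unknown measurement level: {level!r}")

def suggest_test(dependent, independent, n_categories=None):
    """Suggest a bivariate test from the measurement levels of two variables.

    n_categories is the number of categories of the independent variable,
    used only when it is nominal/ordinal and the dependent is ratio/interval.
    """
    dep_cat = _is_categorical(dependent)
    ind_cat = _is_categorical(independent)
    if dep_cat and ind_cat:
        return "Crosstabulation with chi-square test"
    if dep_cat and not ind_cat:
        return "Logistic regression"
    if not dep_cat and ind_cat:
        if n_categories == 2:
            return "Difference of means with t-test"
        return "ANOVA with F-test"
    return "Correlation coefficient or linear regression"

print(suggest_test("nominal", "nominal"))
print(suggest_test("ratio", "nominal", n_categories=2))
print(suggest_test("ratio", "ordinal", n_categories=4))
print(suggest_test("ratio", "ratio"))
```

For example, times played outside (ratio) against cul-de-sac or not (nominal, 2 categories) maps to a difference-of-means test, while the 1 to 4 cul-de-sac score maps to ANOVA.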
Frequencies - Number of Times Children Played Outside in Last 7 Days

Statistics
N               Valid     389
                Missing   10
Mean                      2.81
Median                    2.00
Mode                      0
Std. Deviation            2.703
Variance                  7.307
Percentiles     25        .00
                50        2.00
                75        5.00

                     Frequency   Percent   Valid Percent   Cumulative Percent
Valid     0          111         27.8      28.5            28.5
          1          47          11.8      12.1            40.6
          2          56          14.0      14.4            55.0
          3          39          9.8       10.0            65.0
          4          28          7.0       7.2             72.2
          5          32          8.0       8.2             80.5
          6          11          2.8       2.8             83.3
          7          57          14.3      14.7            97.9
          8          2           .5        .5              98.5
          10         5           1.3       1.3             99.7
          15         1           .3        .3              100.0
          Total      389         97.5      100.0
Missing   System     10          2.5
Total                399         100.0

[Histogram: Frequency by number of times played outside; Mean = 2.81, Std. Dev. = 2.703, N = 389]
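The summary statistics above can be recomputed from the frequency table alone. The sketch below expands the valid counts into one value per respondent and applies Python's standard `statistics` module; the variable names are illustrative, and the counts are copied from the Frequency column.

```python
from statistics import mean, median, mode, stdev, variance

# Valid responses from the frequency table: value -> number of respondents.
counts = {0: 111, 1: 47, 2: 56, 3: 39, 4: 28, 5: 32,
          6: 11, 7: 57, 8: 2, 10: 5, 15: 1}

# Expand to one entry per respondent (389 valid cases).
values = [value for value, n in counts.items() for _ in range(n)]

n_valid = len(values)
value_range = max(values) - min(values)   # highest value minus lowest value

print("N valid:       ", n_valid)
print("Mean:          ", round(mean(values), 2))
print("Median:        ", median(values))
print("Mode:          ", mode(values))
print("Range:         ", value_range)
print("Variance:      ", round(variance(values), 3))   # sample variance (n - 1)
print("Std. deviation:", round(stdev(values), 3))
```

Note that `variance` and `stdev` divide by n - 1, which matches the reported values; the mean sits well above the median because the distribution is skewed to the right.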
Frequencies - Whether or Not Children Played Outside in Last 7 Days

Statistics: outside_play
N               Valid     389
                Missing   10
Mean                      .7147
Median                    1.0000
Mode                      1.00
Std. Deviation            .45216
Variance                  .204
Percentiles     25        .0000
                50        1.0000
                75        1.0000

outside_play
                     Frequency   Percent   Valid Percent   Cumulative Percent
Valid     .00        111         27.8      28.5            28.5
          1.00       278         69.7      71.5            100.0
          Total      389         97.5      100.0
Missing   System     10          2.5
Total                399         100.0

[Histogram: Frequency by outside_play; Mean = 0.7147, Std. Dev. = 0.45216, N = 389]
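This output is an instance of converting a ratio variable to a nominal one: the count of times played outside is collapsed to played (1) or did not play (0). A minimal sketch of that recode, assuming the cut point is "any play at all" (which is consistent with the 111 zeros in both tables); the counts are copied from the previous frequency table and the variable names are illustrative.

```python
from statistics import mean, stdev, variance

# Valid responses for times played outside: value -> number of respondents.
counts = {0: 111, 1: 47, 2: 56, 3: 39, 4: 28, 5: 32,
          6: 11, 7: 57, 8: 2, 10: 5, 15: 1}

# Recode: 1 if the child played outside at least once, otherwise 0.
outside_play = [1 if value > 0 else 0
                for value, n in counts.items() for _ in range(n)]

played = sum(outside_play)
not_played = len(outside_play) - played

print("Did not play (0):", not_played)
print("Played (1):      ", played)
print("Mean:            ", round(mean(outside_play), 4))
print("Variance:        ", round(variance(outside_play), 3))
print("Std. deviation:  ", round(stdev(outside_play), 5))
```

For a 0/1 variable the mean is simply the proportion coded 1, so the mean of .7147 corresponds to the 71.5% valid percent in the table.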
Crosstabs - Cul-de-sac or Not vs. Played Outside or Not

Case Processing Summary
                                        Cases
                                        Valid           Missing         Total
                                        N     Percent   N     Percent   N     Percent
outside_play * [cul-de-sac or not]      373   93.5%     26    6.5%      399   100.0%

outside_play * [cul-de-sac or not] Crosstabulation
                                                  [cul-de-sac or not]
                                                  .00       1.00      Total
outside_play   .00    Count                       83        28        111
                      % within [cul-de-sac]       32.5%     23.7%     29.8%
               1.00   Count                       172       90        262
                      % within [cul-de-sac]       67.5%     76.3%     70.2%
Total                 Count                       255       118       373
                      % within [cul-de-sac]       100.0%    100.0%    100.0%

Chi-Square Tests
                               Value      df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                               (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square             3.002(b)   1    .083
Continuity Correction(a)       2.595      1    .107
Likelihood Ratio               3.078      1    .079
Fisher's Exact Test                                          .089         .052
Linear-by-Linear Association   2.994      1    .084
N of Valid Cases               373

a Computed only for a 2x2 table
b 0 cells (.0%) have expected count less than 5. The minimum expected count is 35.12.
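The Pearson chi-square above can be reproduced from the four cell counts by comparing each observed count with the count expected if the two variables were unrelated (row total times column total, divided by N). The sketch below uses only the standard library; for a table with 1 degree of freedom the p-value can be written with the complementary error function, which is a property of the chi-square distribution rather than anything specific to SPSS. Variable names are illustrative.

```python
import math

# Observed counts: rows are outside_play (.00, 1.00),
# columns are cul-de-sac or not (.00, 1.00).
observed = [[83, 28],
            [172, 90]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count in each cell under independence.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square: sum of (observed - expected)^2 / expected over all cells.
chi_square = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(2) for j in range(2)
)

df = (len(row_totals) - 1) * (len(col_totals) - 1)

# For df = 1 only: P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2))

min_expected = min(min(row) for row in expected)

print("Chi-square:        ", round(chi_square, 3))
print("df:                ", df)
print("p-value (2-sided): ", round(p_value, 3))
print("Min expected count:", round(min_expected, 2))
```

With p of about .083, the difference between 67.5% and 76.3% playing outside is not significant at the 5% level, even though it points in the expected direction.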
Oneway ANOVA - Times Playing Outside by Cul-de-Sac or Not

Descriptives
                                                95% Confidence Interval for Mean
        N     Mean   Std. Deviation   Std. Error   Lower Bound   Upper Bound   Minimum   Maximum
.00     255   2.48   2.492            .156         2.17          2.79          0         10
1.00    118   3.47   3.063            .282         2.92          4.03          0         15
Total   373   2.80   2.721            .141         2.52          3.07          0         15

ANOVA
                 Sum of Squares   df    Mean Square   F        Sig.
Between Groups   79.420           1     79.420        11.015   .001
Within Groups    2675.094         371   7.210
Total            2754.515         372

Oneway ANOVA - Times Playing Outside vs. Cul-de-sac score (1 to 4 scale)

Descriptives
                                                95% Confidence Interval for Mean
        N     Mean   Std. Deviation   Std. Error   Lower Bound   Upper Bound   Minimum   Maximum
1       229   2.50   2.533            .167         2.17          2.83          0         10
2       26    2.31   2.131            .418         1.45          3.17          0         7
3       38    3.34   3.034            .492         2.34          4.34          0         10
4       80    3.54   3.093            .346         2.85          4.23          0         15
Total   373   2.80   2.721            .141         2.52          3.07          0         15

ANOVA
                 Sum of Squares   df    Mean Square   F       Sig.
Between Groups   81.287           3     27.096        3.740   .011
Within Groups    2673.228         369   7.245
Total            2754.515         372
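The F-statistics above compare variation between groups to variation within groups, and they can be approximated from the Descriptives tables alone. The sketch below rebuilds the sums of squares from each group's N, mean, and standard deviation; because those inputs are rounded to two or three decimals, the results differ slightly from the SPSS values. The function name `anova_from_summary` is hypothetical.

```python
def anova_from_summary(groups):
    """One-way ANOVA from (n, mean, std_dev) tuples, one per group.

    Returns (ss_between, ss_within, df_between, df_within, f_statistic).
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n

    # Between groups: how far each group mean sits from the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    # Within groups: each group's own spread, recovered from its variance.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)

    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_statistic

# (N, Mean, Std. Deviation) rows copied from the two Descriptives tables.
cul_de_sac_or_not = [(255, 2.48, 2.492), (118, 3.47, 3.063)]
cul_de_sac_score = [(229, 2.50, 2.533), (26, 2.31, 2.131),
                    (38, 3.34, 3.034), (80, 3.54, 3.093)]

for label, groups in [("Cul-de-sac or not", cul_de_sac_or_not),
                      ("Cul-de-sac score", cul_de_sac_score)]:
    ss_b, ss_w, df_b, df_w, f_stat = anova_from_summary(groups)
    print(f"{label}: SS between = {ss_b:.2f}, SS within = {ss_w:.2f}, "
          f"df = ({df_b}, {df_w}), F = {f_stat:.2f}")
```

The p-value (Sig.) is not computed here because the F distribution is not available in the Python standard library; a statistics package would be needed for that step.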
Regression - Times Playing Outside as Dependent Variable

Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .361(a)   .130       .104                2.604

a Predictors: (Constant), Q I. #9 - Nbrs Active Outside, #8 - Current Total Income, [name missing], presence of related kids <=5 current, Work or not current, Q I. #9 - Low Traffic, Q I. #9 - Low Crime, Q I. #9 - Nbr Interaction, presence of related kids <=12 current, Q I. #9 - Safe for Kids

ANOVA(b)
Model            Sum of Squares   df    Mean Square   F       Sig.
1   Regression   343.683          10    34.368        5.070   .000(a)
    Residual     2297.986         339   6.779
    Total        2641.669         349

a Predictors: same as listed under Model Summary
b Dependent Variable: times playing outside

Coefficients(a)
                                               Unstandardized          Standardized
                                               Coefficients            Coefficients
Model                                          B        Std. Error     Beta           t        Sig.
1   (Constant)                                 -.876    .896                          -.977    .329
    [name missing]                             .932     .312           .158           2.986    .003
    Work or not current                        .821     .403           .106           2.035    .043
    #8 - Current Total Income                  .000     .000           -.045          -.860    .391
    presence of related kids <=5 current       -.265    .346           -.048          -.766    .444
    presence of related kids <=12 current      1.519    .376           .252           4.038    .000
    Q I. #9 - Safe for Kids                    .762     .269           .203           2.827    .005
    Q I. #9 - Low Traffic                      -.161    .165           -.057          -.974    .331
    Q I. #9 - Low Crime                        .042     .246           .012           .169     .866
    Q I. #9 - Nbr Interaction                  .312     .187           .104           1.669    .096
    Q I. #9 - Nbrs Active Outside              -.321    .202           -.099          -1.591   .113

a Dependent Variable: times playing outside
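The Model Summary figures follow directly from the regression ANOVA table, which makes a useful check on how to read the output. The sketch below recomputes R Square, Adjusted R Square, the F-statistic, and the standard error of the estimate from the sums of squares and degrees of freedom, and one t-value from its coefficient and standard error. All inputs are copied from the tables above; the variable names are illustrative.

```python
import math

# From the regression ANOVA table.
ss_regression = 343.683
ss_residual = 2297.986
ss_total = 2641.669
df_regression = 10    # number of predictors
df_residual = 339
df_total = 349        # number of cases minus 1

# Share of the variation in the dependent variable explained by the model.
r_square = ss_regression / ss_total
r = math.sqrt(r_square)

# Adjusted R Square penalizes the model for the number of predictors.
adjusted_r_square = 1 - (1 - r_square) * df_total / df_residual

mean_square_regression = ss_regression / df_regression
mean_square_residual = ss_residual / df_residual
f_statistic = mean_square_regression / mean_square_residual
std_error_of_estimate = math.sqrt(mean_square_residual)

# A coefficient's t-value is B divided by its standard error
# (here: presence of related kids <=12 current).
t_kids_12 = 1.519 / 0.376

print("R:                   ", round(r, 3))
print("R Square:            ", round(r_square, 3))
print("Adjusted R Square:   ", round(adjusted_r_square, 3))
print("F:                   ", round(f_statistic, 3))
print("Std. error of est.:  ", round(std_error_of_estimate, 3))
print("t (kids <=12):       ", round(t_kids_12, 2))
```

An R Square of .130 means the ten predictors together account for about 13% of the variation in times playing outside; the t-value differs from the reported 4.038 only in the third decimal because B and its standard error are rounded.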