Master's Degree Course in Pharmacy (Corso di Laurea Magistrale in Farmacia)


1 Università degli Studi di Milano. Master's Degree Course in Pharmacy (Corso di Laurea Magistrale in Farmacia). Generic Medicines (Medicinali Generici). Dr. Matteo Cerea. STATISTICS REVIEW

2 Basic statistics Dr. Matteo Cerea, PhD

3 Why Statistics? Two Purposes. 1. Descriptive: finding ways to summarize the important characteristics of a dataset. 2. Inferential: how (and when) to generalize from a sample dataset to the larger population.

4 Descriptive Statistics Provides graphical and numerical ways to organize, summarize, and characterize a dataset.

5 VARIABLE: a characteristic or a property that can vary in value among subjects in a sample or a population. Ex: the weight of tablets in a batch; the concentration of drug in plasma in patients after the administration of a fixed dose.

6 Types of Variables. Predictor variable (x): the antecedent conditions that are going to be used to predict the outcome of interest. In an experimental study, it is called an independent variable. Outcome variable (y): the variable you want to be able to predict. In an experimental study, it is called a dependent variable. y = f(x)

7 Continuous variable: can assume an infinite number of possible values between any two observed values (the lowest and the highest). Ex: the drug content of tablets in a batch, expressed in micrograms.
Ranked variable: handled like a continuous variable although it does not represent a physical measurement; the scale is a numerically ordered system. Ex:
0 no encrustation
1 microscopic deposits on <50% of the stent
2 microscopic deposits on >50% of the stent
3 small macroscopic deposits on <50% of the stent
4 small macroscopic deposits on >50% of the stent
5 heavy macroscopic deposits

8 Discrete (discontinuous, meristic) variable: consists of separate, indivisible categories and takes integer values. Ex: number of asthma attacks, number of fatalities, number of colonies of microorganisms.
Nominal (categorical) variable: cannot be measured because of its qualitative nature. Ex: sex, gender, side effects associated with the treatment.
Nominal ranked (ordinal) variable. Ex: side effects associated with the treatment, if ordered.

9 How to present data? 1. Describing data with tables and graphs (quantitative or categorical variables). 2. Numerical descriptions of center, variability, position (quantitative variables). 3. Bivariate descriptions (in practice, most studies have several variables).

10 1. Tables and Graphs. Several types of graphs or plots are employed to display scientific data: graphs or plots that describe relationships between a fixed (independent) variable and a dependent variable, and graphs that pictorially describe distributions of data. A frequency distribution lists the possible values of a variable (or intervals) and the number of times each occurs.

11 Example: Pharmaceutical Statistics, David Jones, Pharmaceutical Press

12 Frequency Tables

13 Frequency Distribution Histogram: Bar graph of frequencies or percentages or relative frequency

14 Frequency Distribution

15 Frequency Distribution

16 Relative Frequency Distribution. [Figure: histogram of the proportion of tablets in each interval (y-axis from 0,000 to 0,180) for 20 intervals from 290,1-291,0 to 309,1-310,0 (x-axis).]

17 Cumulative frequency distribution data Less than More than

18 Cumulative frequency distribution graph Less than More than

19 Qualitative or nominal variable. Ex.: civil status X, a discrete qualitative variable (determined on a nominal scale) with 4 different possibilities: x1 = N, x2 = C, x3 = V, x4 = S.
x_i: the k values of X
n_i: frequency of x_i
n: total number of observations
f_i = n_i/n: relative frequency
p_i = f_i × 100%: percent frequency
The frequency distribution of X is a table with columns x_i, n_i, f_i = n_i/n, p_i = f_i × 100% and rows N, C, V, S, with total n.

20 Pie chart

21 Ex. Annual income in thousand euro, W: a continuous quantitative variable. The data (k = 20) are divided into 4 classes. a_i: width (amplitude) of each class; l_i = n_i/a_i: frequency density. Frequency table columns: x_i, n_i, f_i, N_i, a_i, l_i.


23 2. Descriptive Measures: numerical descriptions. Let X denote a quantitative variable, with observations X_1, X_2, X_3, ..., X_n.
a. Central tendency measures: computed to give a center around which the measurements in the data are distributed.
b. Variation or variability measures: describe the data spread, or how far the measurements are from the center.
c. Relative standing measures: describe the relative position of specific measurements in the data.

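24 a. Measures of Central Tendency. Mean (average): sum of all measurements divided by the number of measurements, $\bar{X} = \frac{\sum_{i=1}^{N} X_i}{N}$, where N is the number of observations. Weighted mean: used when each data point does not contribute equally, $\bar{X}_w = \frac{\sum_i w_i X_i}{\sum_i w_i}$, where $w_i$ is the frequency (weight) of $X_i$.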

25 a. Measures of Central Tendency Median: The central number of a set of data arranged in order of magnitude. Mode: The most frequent measurement in the data.

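26 Calculation of the mean: (0 + 0 + 0 + 1 + 2 + 4 + 6 + 6 + 9 + 24)/10 = 52/10 = 5.2. Calculation of the median: 0, 0, 0, 1, 2, 4, 6, 6, 9, 24; the two central values are 2 and 4, so the median is (2 + 4)/2 = 3.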
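The calculation on this slide can be checked with a few lines of Python. This is a minimal sketch using only the standard library `statistics` module; the variable names are illustrative, and the mode is added for completeness since the previous slide defines it.

```python
import statistics

# The ten observations listed on the slide, already in order of magnitude.
data = [0, 0, 0, 1, 2, 4, 6, 6, 9, 24]

mean = statistics.mean(data)      # sum of all values divided by their number
median = statistics.median(data)  # average of the two central values (2 and 4)
mode = statistics.mode(data)      # most frequent value

print(f"mean = {mean}, median = {median}, mode = {mode}")
```

The mean (5.2) lies well above the median (3) because the single large value 24 pulls it toward the longer right tail.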

27 Properties of mean and median. For symmetric distributions, mean = median. For skewed distributions, the mean is drawn in the direction of the longer tail, relative to the median. The mean is valid for continuous scales, the median for continuous or ordinal scales. The mean is sensitive to outliers (the median is often preferred for highly skewed distributions). When the distribution is symmetric or mildly skewed, or discrete with few values, the mean is preferred because it uses the numerical values of the observations. Ex.: t_max, median (range); AUC, mean (std. dev.).

28 In other words When the Mean is greater than the Median the data distribution is skewed to the Right. When the Median is greater than the Mean the data distribution is skewed to the Left. When Mean and Median are very close to each other the data distribution is approximately symmetric.

29 b. Describing variability. Range: difference between the largest and smallest observations (highly sensitive to outliers, insensitive to shape). Used for non-normally distributed data (Ex: t_max). Mean deviation: the average distance from the mean. The deviation of observation j from the mean is $y_j - \bar{y}$.

30 Mean deviation: the average distance from the mean, $MD = \frac{\sum_j |X_j - X_m|}{N}$
#      drug content   absolute value of error
1      100,6          0,1
2      98,3           2,2
3      98,9           1,6
4      95,1           5,4
5      …,5            4,…
6      …,5            5,0
mean   100,5          sum 18,3; MD 3,1

31 Variance: the sum of squares (SS) of the sample divided by n − 1: $s^2 = \frac{\sum_{i=1}^{n}(y_i-\bar{y})^2}{n-1} = \frac{(y_1-\bar{y})^2 + (y_2-\bar{y})^2 + \dots + (y_n-\bar{y})^2}{n-1}$. It is a measure of spread: the larger the deviations (positive or negative), the larger the variance.

32 The sum of squares (SS) of a sample: $SS = \sum_j (Y_j - \bar{Y})^2$. Population: the mean sum of squares, $\sigma^2 = \frac{\sum_j (Y_j - \mu)^2}{N}$. Sample: the variance of a sample of a population, $s^2 = \frac{\sum_j (Y_j - \bar{Y})^2}{N - 1}$.

34 The variance of a sample is the sum of squares (SS) divided by n − 1: $s^2 = \frac{\sum_{i=1}^{n}(y_i-\bar{y})^2}{n-1}$. Standard deviation: s is the square root of the variance, $s = \sqrt{s^2}$. It is a measure of spread: the larger the deviations (positive or negative), the larger the variance.

35 The variance of a population is $\sigma^2 = \frac{\sum_{i=1}^{n}(y_i-\mu)^2}{n}$, whereas the variance of a sample is $s^2 = \frac{\sum_{i=1}^{n}(y_i-\bar{y})^2}{n-1}$. The standard deviation is the square root of the variance: $\sigma = \sqrt{\sigma^2}$, $s = \sqrt{s^2}$. Ex: sample of a population: 100 tablets (sample) are removed from a batch of tablets (population) and tested. The variance of a single random sample of measurements does not provide a good estimate of the variance of the population from which the sample was derived. A good estimate of the population variance can be obtained from sample data if the average of several sample variances is calculated.
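The difference between dividing by n − 1 (sample) and by n (population) can be illustrated with a short sketch. The five values below are hypothetical drug-content readings invented for the illustration; they do not come from the slides.

```python
import math
import statistics

# Hypothetical drug content (mg) of five tablets; illustrative values only.
content = [99.0, 101.0, 100.0, 98.0, 102.0]

n = len(content)
mean = sum(content) / n
ss = sum((y - mean) ** 2 for y in content)  # sum of squares (SS)

sample_var = ss / (n - 1)   # s^2, divides by n - 1
pop_var = ss / n            # sigma^2, divides by n
sample_sd = math.sqrt(sample_var)

# The standard library gives the same results.
assert math.isclose(sample_var, statistics.variance(content))
assert math.isclose(pop_var, statistics.pvariance(content))

print(f"SS = {ss}, s^2 = {sample_var}, sigma^2 = {pop_var}, s = {sample_sd:.3f}")
```

For small n the two divisors give noticeably different answers; as n grows the difference becomes negligible.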

36 Concentration of a penicillin antibiotic in 5 bottles. [Table: bottle #, concentration of penicillin (mg/5 mL), contribution to the variance.] Mean 101,8 mg/5 mL; variance (sample) 2302,7; standard deviation 48,0; median 123.

37 Standard deviation (error) of the mean: SEM. Concentration of amoxicillin in 5 aliquots, each tested 5 times (N = number of observations in the sample).
        Aliquot 1   Aliquot 2   Aliquot 3   Aliquot 4   Aliquot 5
        25,1        27,6        24,3        23,9        25,7
        25,4        25,5        26,4        24,9        23,5
        21,9        25,6        25,1        26,1        24,2
        [two further rows, partly illegible: 24,… …,… …,7 23,1 24,… …,2 24,3]
mean    24,0        25,6        25,6        25,4        24,7
s       1,5         1,3         1,1         1,2         1,0
Mean total 25,1; SEM 0,70

38 Standard deviation (error) of the mean: SEM. $SEM = \frac{s}{\sqrt{N}}$, where s = standard deviation of the sample and N = number of observations in the sample.

39 Standard deviation (error) of the mean: SEM. Concentration of amoxicillin in 5 aliquots (same data as slide 37): aliquot means 24,0 25,6 25,6 25,4 24,7; s 1,5 1,3 1,1 1,2 1,0. Mean total 25,1; SEM 0,70; SEM estimated 0,66.
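The two SEM figures on this slide can be reproduced approximately from the summary values it reports. In the sketch below, the first value is the standard deviation of the five aliquot means; the second applies SEM = s/√N to a single aliquot. Using the rounded s = 1,5 of aliquot 1 is an assumption made here, which is presumably why the result (about 0.67) differs slightly from the 0,66 shown on the slide.

```python
import math
import statistics

# Aliquot means as reported on the slide (decimal commas written as points).
aliquot_means = [24.0, 25.6, 25.6, 25.4, 24.7]

# SEM obtained directly as the standard deviation of the sample means.
sem_from_means = statistics.stdev(aliquot_means)

# SEM estimated from one sample: s / sqrt(N).
# Assumption: rounded s of aliquot 1 and N = 5 observations per aliquot.
s_single, n_obs = 1.5, 5
sem_estimated = s_single / math.sqrt(n_obs)

print(f"SEM from means  = {sem_from_means:.2f}")
print(f"SEM from s/sqrt(N) = {sem_estimated:.2f}")
```

The point of the comparison is that a single sample already gives a usable estimate of how much the mean would vary from sample to sample.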

40 The standard deviation of a sample is an estimate of the variability of a population; its value does not decrease as the number of observations increases. The standard deviation of the mean (standard error) is a measure of the variability (precision) of the estimate of a defined population parameter (i.e. the mean). As the size of the sample increases, the magnitude of the standard error decreases.

41 Coefficient of variation (CV): $CV(\%) = \frac{s}{\bar{X}} \times 100$

42 Accuracy: the closeness of a measured value to the true value (the value in the absence of error). Absolute error: $error_{abs} = O - E$, where E is the true (exact) value and O is the observed value (or mean). Relative error: $error_{rel} = \frac{error_{abs}}{E} = \frac{O - E}{E}$.

43 Precision: describes the dispersion (variability) of a set of measurements Typically precision is associated with low dispersion of the value around a central value (low standard deviations)

44 Accuracy: Accuracy is how close a measurement is to the "true" value. In a laboratory setting this is often how far a measured value is from a standard with a known value that was measured by different technology or on a different instrument. Precision: Precision is how close repeated measurements are to each other. Precision has no bearing on a target value, it is simply how close multiple measurements are together. Reproducibility is key to scientific research and precision is important in this aspect.

45 c. Measures of position. p-th percentile: p percent of the observations fall below it and (100 − p)% above it. For example, if the 85th percentile of a dataset is 340, then 85% of the measurements are below 340 and 15% are above 340. The median is the 50th percentile. p = 50: median; p = 25: lower quartile (LQ); p = 75: upper quartile (UQ). Interquartile range: IQR = UQ − LQ.


47 Quartiles portrayed graphically by box plots (John Tukey) Example: weekly TV watching for n=60 from student survey data file, 3 outliers

48 Box plots have a box from LQ to UQ, with the median marked. They portray a five-number summary of the data (minimum, LQ, median, UQ, maximum), except for outliers, which are identified separately. Outlier = observation falling below LQ − 1.5(IQR) or above UQ + 1.5(IQR). Ex. If LQ = 2 and UQ = 10, then IQR = 8 and outliers lie above 10 + 1.5(8) = 22.
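The outlier rule can be written as a small helper. This is a sketch; the function name `tukey_fences` and the parameter `k` are illustrative choices, not terms from the slides.

```python
def tukey_fences(lq, uq, k=1.5):
    """Return the (lower, upper) limits beyond which a value counts as an outlier."""
    iqr = uq - lq
    return lq - k * iqr, uq + k * iqr


# Slide example: LQ = 2, UQ = 10, so IQR = 8.
lower, upper = tukey_fences(2, 10)
print(f"outliers fall below {lower} or above {upper}")
```

With these quartiles the upper limit is 22, as on the slide; the lower limit is 2 − 12 = −10, so for a variable that cannot be negative no low outliers are possible.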

49 Normal distribution curve

50 Normal distribution curve. In a normal distribution of data, also known as a bell curve: - the majority of the data (approximately 68%) fall within plus or minus one standard deviation of the mean. This means that if the standard deviation of a data set is 2, for example, the majority of the data in the set fall within +2 and −2 of the average; - about 95% of normally distributed data are within two standard deviations of the mean; - over 99% are within three.

51 For any data At least 75% of the measurements differ from the mean less than twice the standard deviation. At least 89% of the measurements differ from the mean less than three times the standard deviation.
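These two bounds follow from Chebyshev's inequality, which the slide does not name; the identification is added here as background. For any distribution with mean µ and standard deviation σ, and any k > 1:

```latex
P\bigl(|X - \mu| < k\sigma\bigr) \;\ge\; 1 - \frac{1}{k^{2}}
\\[4pt]
k = 2:\quad 1 - \tfrac{1}{4} = 0.75
\qquad
k = 3:\quad 1 - \tfrac{1}{9} \approx 0.889
```

The bounds are much weaker than the 95% and 99.7% figures for a normal distribution because they must hold for every possible shape.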

52 Inferential Statistics mathematical tools that permit the researcher to generalize to a population of individuals based upon information obtained from a limited number of research participants (observation)

53 Sample statistics / Population parameters. We distinguish between summaries of samples (statistics) and summaries of populations (parameters). It is common to denote statistics by Roman letters and parameters by Greek letters: population mean = µ, standard deviation = σ, proportion = π. In practice, parameter values are unknown; we make inferences about their values using sample statistics.

54 Sample proportion. Definition: the statistic that estimates the parameter π, the proportion of a population that has some property, is the sample proportion $p = \frac{\text{number of successes in the sample}}{\text{total number of observations in the sample}}$.

55 The sample mean $\bar{X}$ estimates the population mean µ (quantitative variable). The sample standard deviation s estimates the population standard deviation σ (quantitative variable). A sample proportion p estimates a population proportion π (categorical variable).

56 Ex. From a population of n individuals, 2 samples of 100 individuals are extracted: mean height of the first sample $\bar{X}_1$ = 168 cm, mean height of the second sample $\bar{X}_2$ = 162 cm, and so on until all the samples are extracted. The sample mean estimates the mean of the population, but with uncertainty. The uncertainty depends upon: 1 - the sample size; 2 - the variability of the population. The means are different, but how are they distributed?

57 population sample n1 sample n2>n1


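59 Standard deviation of the sampling distribution: the standard error. For a mean: $SE = \frac{\sigma}{\sqrt{N}}$, estimated by $SEM = \frac{s}{\sqrt{N}}$. For a proportion: $SE = \sqrt{\frac{p(1-p)}{n}}$.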

60 4. Probability Distributions Probability: With random sampling or a randomized experiment, the probability an observation takes a particular value is the proportion of times that outcome would occur in a long sequence of observations. Usually corresponds to a population proportion (and thus falls between 0 and 1) for some real or conceptual population. A probability distribution lists all the possible values and their probabilities (which add to 1.0)

61 Basic probability rules. Let A and B denote possible outcomes. P(not A) = 1 − P(A). For distinct (separate) possible outcomes A and B, P(A or B) = P(A) + P(B). If A and B are not distinct, P(A and B) = P(A) × P(B given A). For independent outcomes, P(B given A) = P(B), so P(A and B) = P(A) × P(B).

62 Probability distribution of a variable: lists the possible outcomes for the random variable and their probabilities. Discrete variable: assign probabilities P(y) to individual values y, with $0 \le P(y) \le 1$ and $\sum P(y) = 1$. In practice, probability distributions are often estimated from sample data, and then have the form of frequency distributions.

63 Like frequency distributions, probability distributions have descriptive measures such as the mean and standard deviation. Expected value: $\mu = E(Y) = \sum y\,P(y)$. Standard deviation: a measure of the typical distance of an outcome from the mean, denoted by σ. If a distribution is approximately bell-shaped, then all or nearly all of the distribution falls between µ − 3σ and µ + 3σ, and probability about 0.68 falls between µ − σ and µ + σ.

64 Continuous variables: probabilities are assigned to intervals of numbers. The most important probability distribution for continuous variables is the normal distribution: symmetric and bell-shaped, characterized by the mean (µ) and standard deviation (σ), representing center and spread. The probability within any particular number of standard deviations of µ is the same for all normal distributions. An individual observation from an approximately normal distribution has probability 0.68 of falling within 1 standard deviation of the mean, 0.95 of falling within 2 standard deviations, and 0.997 of falling within 3 standard deviations.

65 The normal curve is often called the Gaussian distribution, after Carl Friedrich Gauss, who discovered many of its properties. Gauss, commonly viewed as one of the greatest mathematicians of all time, was honoured by Germany on their 10 Deutschmark bill.

66 Properties (cont.) The standard normal curve has mean = 0 and standard deviation = 1. General relationships: ±1σ = about 68.26%; ±2σ = about 95.44%; ±3σ = about 99.72%.
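These areas can be recomputed from the standard normal cumulative distribution. The sketch below uses `statistics.NormalDist` from the Python standard library (available from Python 3.8); the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Area under the curve between -k and +k standard deviations.
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, area in within.items():
    print(f"within ±{k} sd: {100 * area:.2f}%")
```

The computed values agree with the percentages on the slide to within 0.01 percentage points; the small differences come from rounding in the slide's figures.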

67 Notes about z-scores. They are a way of determining the position of a single score under the normal curve, measured in standard deviations relative to the mean of the curve. The z-score can be used to determine an area under the curve, which is a probability. $z = \frac{y - \mu}{\sigma}$

68 The standard normal distribution is the normal distribution with µ = 0, σ = 1 For that distribution, z = (y - µ)/σ = (y - 0)/1 = y i.e., original score = z-score µ + zσ = 0 + z(1) = z Why is normal distribution so important? If different studies take random samples and calculate a statistic (e.g. sample mean) to estimate a parameter (e.g. population mean), the collection of statistic values from those studies usually has approximately a normal distribution. (So?)

69 Notes about z-scores. Ex.: tablets are produced and assayed for content. The content (µ ± σ) is 200 ± 10 mg and is normally distributed. Calculate the proportion of tablets that contain 180 mg or less, using $z = \frac{y - \mu}{\sigma}$. [Figure: normal curve with the 68.26%, 95.44% and 99.72% regions; axes: z, y − µ (mg), Y (mg).]

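70 $z = \frac{y - \mu}{\sigma} = \frac{180 - 200}{10} = -2$. From the z-score table (z probability table), for z = −2 the probability below is 0.023 (about 2.3% of the tablets). [Figure: normal probability distribution with the 68.26%, 95.44% and 99.72% regions.]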
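The table lookup for the tablet example can be reproduced numerically. A minimal sketch with the standard library, using the values stated in the example (µ = 200 mg, σ = 10 mg, limit 180 mg); the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 200.0, 10.0   # tablet content, mg
limit = 180.0             # "180 mg or less"

z = (limit - mu) / sigma                    # z-score of the limit
p_below = NormalDist(mu, sigma).cdf(limit)  # area under the curve below the limit

print(f"z = {z}, proportion at or below {limit} mg = {p_below:.4f}")
```

A z-score of −2 leaves about 2.3% of the distribution in the lower tail, which matches half of the 4.56% that lies outside ±2σ.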

71 A sampling distribution lists the possible values of a statistic (e.g., sample mean or sample proportion) and their probabilities How close is sample mean Ῡ to population mean µ? To answer this, we must be able to answer, What is the probability distribution of the sample mean?

72 Sampling distribution of the sample mean. The sample mean $\bar{y}$ is a variable, its value varying from sample to sample about the population mean µ. The standard deviation of the sampling distribution of $\bar{y}$ is called the standard error of $\bar{y}$. For random sampling, the sampling distribution of $\bar{y}$ has mean µ and standard error $\sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}} = \frac{\text{population standard deviation}}{\sqrt{\text{sample size}}}$.

73 Central Limit Theorem: for random sampling with large n, the sampling distribution of the sample mean is approximately a normal distribution. Approximate normality applies no matter what the shape of the population distribution. How large n needs to be depends on the skew of the population distribution, but usually n ≥ 30 is sufficient.
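A small simulation can make the theorem concrete. The sketch below is an illustration under stated assumptions, not part of the original slides: the population is taken to be exponential with mean 1 and standard deviation 1 (a strongly right-skewed shape), the sample size is n = 30, and 2000 samples are drawn with a fixed seed.

```python
import math
import random
import statistics

random.seed(0)           # fixed seed so the run is repeatable

n = 30                   # size of each sample
n_samples = 2000         # number of repeated samples

# Assumed population: exponential with mean 1 and standard deviation 1.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(n_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_se = 1.0 / math.sqrt(n)   # sigma / sqrt(n)

print(f"mean of sample means = {mean_of_means:.3f} (population mean 1)")
print(f"sd of sample means   = {sd_of_means:.3f} (sigma/sqrt(n) = {theoretical_se:.3f})")
```

Although individual observations are far from normal, the sample means cluster symmetrically around the population mean with a spread close to σ/√n.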


76 5. Statistical Inference: Estimation Goal: How can we use sample data to estimate values of population parameters? Point estimate: A single statistic value that is the best guess for the parameter value Interval estimate: An interval of numbers around the point estimate, that has a fixed confidence level of containing the parameter value. Called a confidence interval. (Based on sampling distribution of the point estimate)

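77 Point Estimators. It is most common to use sample values. The sample mean estimates the population mean µ: $\hat{\mu} = \bar{y} = \frac{\sum y_i}{n}$. The sample standard deviation estimates the population standard deviation σ: $\hat{\sigma} = s = \sqrt{\frac{\sum (y_i - \bar{y})^2}{n - 1}}$. The sample proportion $\hat{p}$ estimates the population proportion π.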

78 Confidence Intervals A confidence interval (CI) is an interval of numbers believed to contain the parameter value. Ex. When public health practitioners use health statistics, sometimes they are interested in the actual number of health events, but more often they use the statistics to assess the true underlying risk of a health problem in the community. Statistical sampling theory is used to compute a confidence interval to provide an estimate of the potential discrepancy between the true population parameters and observed rates. Understanding the potential size of that discrepancy can provide information about how to interpret the observed statistic.

79 Confidence Intervals. A confidence interval (CI) is an interval of numbers believed to contain the parameter value. The probability that the method produces an interval that contains the parameter is called the confidence level. Most studies use a confidence level close to 1, such as 0.95 or 0.99. Most CIs have the form point estimate ± margin of error, with the margin of error based on the spread of the sampling distribution of the point estimator; e.g., margin of error ≈ 2(standard error) for 95% confidence.


81 Confidence Intervals. A 95% confidence interval for a percentage is a range computed from the sample by a method such that, if you went back and drew many different samples from the same population, about 95% of the intervals obtained would contain the true population percentage.

82 The sampling distribution of a sample proportion for large random samples is approximately normal (Central Limit Theorem). So, with probability 0.95, the sample proportion $\hat{p}$ falls within 1.96 standard errors of the population proportion π: there is 0.95 probability that $\hat{p}$ falls between $\pi - 1.96\,\sigma_{\hat{p}}$ and $\pi + 1.96\,\sigma_{\hat{p}}$. Once the sample is selected, we're 95% confident that the interval from $\hat{p} - 1.96\,\sigma_{\hat{p}}$ to $\hat{p} + 1.96\,\sigma_{\hat{p}}$ contains π. (z = 1.96 corresponds to a cumulative probability of 97.5%, or better 0.975.) This is the CI for the population proportion π (almost).


84 Finding a CI in practice. Complication: the true standard error $\sigma_{\hat{p}} = \frac{\sigma}{\sqrt{n}} = \sqrt{\frac{\pi(1-\pi)}{n}}$ itself depends on the unknown parameter! In practice, we estimate $\sigma_{\hat{p}}$ by $se = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ (with $1 - \hat{p} = \hat{q}$), and then find the 95% CI using the formula $\hat{p} - 1.96(se)$ to $\hat{p} + 1.96(se)$.
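As a worked illustration of this recipe, the sketch below computes a 95% CI for a proportion with se = √(p̂(1 − p̂)/n). The counts (40 successes in 100 observations) are hypothetical, and the function name `proportion_ci` is an illustrative choice.

```python
import math


def proportion_ci(successes, n, z=1.96):
    """Large-sample CI for a proportion: p_hat ± z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


# Hypothetical data: 40 successes out of 100 observations.
low, high = proportion_ci(40, 100)
print(f"95% CI for the proportion: {low:.3f} to {high:.3f}")
```

The interval runs from roughly 0.30 to 0.50; quadrupling n to 400 with the same sample proportion would halve its width.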


86 Greater confidence requires wider CI Greater sample size gives narrower CI (quadruple n to halve width of CI)

87 Some comments about CIs. The effects of n and of the confidence coefficient hold for CIs for other parameters also. If we repeatedly took random samples of some fixed size n and each time calculated a 95% CI, in the long run about 95% of the CIs would contain the population proportion π. The probability that the CI does not contain π is called the error probability, and is denoted by α: α = 1 − confidence coefficient.
(1 − α)100%   α      α/2     z_{α/2}
90%           0.10   0.050   1.645
95%           0.05   0.025   1.96
99%           0.01   0.005   2.58

88 Confidence Interval for the Mean. In large random samples, the sample mean has approximately a normal sampling distribution with mean µ and standard error $\sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}}$. Thus, $P(\mu - 1.96\,\sigma_{\bar{y}} \le \bar{y} \le \mu + 1.96\,\sigma_{\bar{y}}) = 0.95$. We can be 95% confident that the sample mean $\bar{y}$ lies within 1.96 standard errors of the (unknown) population mean.

89 Problem: the standard error is unknown (σ is also a parameter). It is estimated by replacing σ with its point estimate from the sample data: $se = \frac{s}{\sqrt{n}}$. 95% confidence interval for µ: $\bar{y} \pm 1.96(se)$, which is $\bar{y} \pm 1.96\,\frac{s}{\sqrt{n}}$. This works OK for large n, because s is then a good estimate of σ (and the CLT applies). But for small n, replacing σ by its estimate s introduces extra error, and the CI is not quite wide enough unless we replace the z-score by a slightly larger t-score.

90 The t distribution (Student's t). The t distribution is used instead of the normal distribution whenever the standard deviation is estimated. Bell-shaped, symmetric about 0. Standard deviation a bit larger than 1 (slightly thicker tails than the standard normal distribution, which has mean = 0, standard deviation = 1). The precise shape depends on the degrees of freedom (df). For inference about a mean, df = n − 1. It gets narrower and more closely resembles the standard normal distribution as df increases (nearly identical when df > 30). The CI for a mean has margin of error t(se) (instead of z(se) as in the CI for a proportion).

91 The t distribution (Student's t)

92 Part of a t table. Columns: df; t.050 (90% confidence level), t.025 (95%), t.010 (98%), t.005 (99%). The row df = infinity corresponds to the standard normal distribution.

93 CI for a population mean. For a random sample from a normal population distribution, a 95% CI for µ is $\bar{y} \pm t_{.025}(se)$, with $se = s/\sqrt{n}$, where df = n − 1 for the t-score. The normal population assumption ensures that the sampling distribution has a bell shape for any n.
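The formula can be applied to a small sample as follows. This is a sketch under stated assumptions: the five values are the first row of amoxicillin readings from the SEM example, treated here as a single random sample purely for illustration, and the t-score 2.776 is the standard tabulated value of t.025 for df = 4, hard-coded because the Python standard library has no t distribution.

```python
import math
import statistics

# Illustrative sample of n = 5 readings (first row of the amoxicillin table).
sample = [25.1, 27.6, 24.3, 23.9, 25.7]

n = len(sample)
mean = statistics.mean(sample)
s = statistics.stdev(sample)   # sample standard deviation (n - 1 divisor)
se = s / math.sqrt(n)          # estimated standard error of the mean

t_025_df4 = 2.776              # t-table value for 95% confidence, df = n - 1 = 4

low = mean - t_025_df4 * se
high = mean + t_025_df4 * se
print(f"mean = {mean:.2f}, se = {se:.3f}, 95% CI = {low:.2f} to {high:.2f}")
```

Using z = 1.96 instead of t = 2.776 would give a noticeably narrower interval, which is the undercoverage that the t distribution corrects for when n is small.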

94 Comments about CI for population mean µ The method is robust to violations of the assumption of a normal population distribution (But, be careful if sample data distribution is very highly skewed, or if severe outliers. Look at the data.) Greater confidence requires wider CI Greater n produces narrower CI t methods developed by the statistician William Gosset of Guinness Breweries, Dublin (1908)

95 Choosing the Sample Size. Determine the parameter of interest (population mean or population proportion). Select a margin of error (M) and a confidence level (which determines the z-score). Proportion (to be safe, set π = 0.50): $n = \pi(1-\pi)\left(\frac{z}{M}\right)^2$. Mean (need a guess for the value of σ): $n = \sigma^2\left(\frac{z}{M}\right)^2$.
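Both sample-size formulas are easy to evaluate directly. In the sketch below the function names, the margins of error (0.03 for the proportion, 2 for the mean) and the guessed σ = 10 are hypothetical choices for illustration; results are rounded up because n must be a whole number.

```python
import math


def n_for_proportion(margin, z=1.96, p=0.5):
    """n = p(1 - p)(z / M)^2; p = 0.5 is the safe (largest n) choice."""
    return math.ceil(p * (1 - p) * (z / margin) ** 2)


def n_for_mean(margin, sigma, z=1.96):
    """n = sigma^2 (z / M)^2; sigma is a guess for the population sd."""
    return math.ceil(sigma ** 2 * (z / margin) ** 2)


n_prop = n_for_proportion(margin=0.03)        # 95% confidence, M = 0.03
n_mean = n_for_mean(margin=2.0, sigma=10.0)   # 95% confidence, M = 2, sigma = 10

print(f"proportion: n = {n_prop}; mean: n = {n_mean}")
```

Halving the margin of error multiplies the required n by four in both formulas, since M enters squared.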

96 We've seen that n depends on the confidence level (higher confidence requires larger n) and the population variability (more variability requires larger n). In practice, determining n is not so easy, because (1) there are many parameters to estimate, and (2) resources may be limited and we may need to compromise. CIs can be formed for any parameter.

97 Using CI Inference in Practice. What is the variable of interest? Quantitative: inference about a mean. Categorical: inference about a proportion. Are the conditions satisfied? Randomization (why? Needed so that the sampling distribution and its standard error are as advertised).

98 6. Statistical Inference: Significance Tests Goal: Use statistical methods to test hypotheses such as For treating anorexia, cognitive behavioral and family therapies have same mean weight change as placebo (no effect) Mental health tends to be better at higher levels of socioeconomic status (SES) (i.e., there is an effect) Spending money on other people has a more positive impact on happiness than spending money on oneself.

99 Hypotheses: For statistical inference, these are predictions about a population expressed in terms of parameters (e.g., population means or proportions or correlations) for the variables considered in a study A significance test uses data to evaluate a hypothesis by comparing sample point estimates of parameters to values predicted by the hypothesis. We answer a question such as, If the hypothesis were true, would it be unlikely to get data such as we obtained?

100 Five Parts of a Significance Test Assumptions about type of data (quantitative, categorical), sampling method (random), population distribution (e.g., normal, binary), sample size (large enough?) Hypotheses: Null hypothesis (H 0 ): A statement that parameter(s) take specific value(s) (Usually: no effect ) Alternative hypothesis (H a ): states that parameter value(s) falls in some alternative range of values (an effect )

101 Test Statistic: compares the data to what the null hypothesis H0 predicts, often by finding the number of standard errors between the sample point estimate and the H0 value of the parameter. P-value (P): a probability measure of evidence about H0: the probability (under the presumption that H0 is true) that the test statistic equals the observed value or a value even more extreme in the direction predicted by Ha. The smaller the P-value, the stronger the evidence against H0. Conclusion: if no decision is needed, report and interpret the P-value. If a decision is needed, select a cutoff point (such as 0.05 or 0.01) and reject H0 if P-value ≤ that value.

102 The most widely accepted cutoff point is 0.05, and the test is said to be significant at the .05 level if the P-value ≤ 0.05. If the P-value is not sufficiently small, we fail to reject H0 (then H0 is not necessarily true, but it is plausible). The process is analogous to the American judicial system. H0: defendant is innocent. Ha: defendant is guilty.

103 End of part one


106 Significance Test for Mean. Assumptions: randomization, quantitative variable, normal population distribution (robustness?). Null hypothesis: H0: µ = µ0, where µ0 is a particular value for the population mean (typically no effect or no change from a standard). Alternative hypothesis: Ha: µ ≠ µ0; the 2-sided alternative includes both > and < the H0 value. Test statistic: the number of standard errors that the sample mean falls from the H0 value, $t = \frac{\bar{y} - \mu_0}{se}$, where $se = s/\sqrt{n}$.

107 When H 0 is true, the sampling distribution of the t test statistic is the t distribution with df = n - 1. P-value: Under presumption that H 0 true, probability the t test statistic equals observed value or even more extreme (i.e., larger in absolute value), providing stronger evidence against H 0 This is a two-tail probability, for the two-sided H a Conclusion: Report and interpret P-value. If needed, make decision about H 0

108 Making a decision: the α-level is a fixed number, also called the significance level, such that if P-value ≤ α, we reject H0; if P-value > α, we do not reject H0. Note: we say "Do not reject H0" rather than "Accept H0" because the H0 value is only one of many plausible values. A high significance level means there is a large chance that the experiment proves something that is not true. A very small significance level assures the statistician that there is little room to doubt the results.

109 Effect of sample size on tests. With large n (say, n > 30), the assumption of a normal population distribution is not important because of the Central Limit Theorem. For small n, the two-sided t test is robust against violations of that assumption. However, the one-sided test is not robust. For a given observed sample mean and standard deviation, the larger the sample size n, the larger the test statistic (because se in the denominator is smaller) and the smaller the P-value (i.e., we have more evidence with more data). We're more likely to reject a false H0 when we have a larger sample size (the test then has more power). With large n, statistical significance is not the same as practical significance.

110 Significance Test for a Proportion π. Assumptions: categorical variable; randomization; large sample (but two-sided is OK for nearly all n). Hypotheses: null hypothesis H0: π = π0; alternative hypothesis Ha: π ≠ π0 (2-sided), or Ha: π > π0 or Ha: π < π0 (1-sided). Set up the hypotheses before getting the data.

111 Test statistic: $z = \frac{\hat{p} - \pi_0}{\sigma_{\hat{p}}} = \frac{\hat{p} - \pi_0}{\sqrt{\pi_0(1-\pi_0)/n}}$. As in the test for a mean, the test statistic has the form (estimate of parameter − H0 value)/(standard error) = number of standard errors the estimate falls from the H0 value. Note: $\sigma_{\hat{p}} = se_0 = \sqrt{\pi_0(1-\pi_0)/n}$, not $se = \sqrt{\hat{p}(1-\hat{p})/n}$ as in a CI.
P-value:
Ha: π ≠ π0   P = 2-tail prob. from standard normal dist.
Ha: π > π0   P = right-tail prob. from standard normal dist.
Ha: π < π0   P = left-tail prob. from standard normal dist.
Conclusion: as in the test for a mean (e.g., reject H0 if P-value ≤ α).
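The test can be sketched in a few lines with the standard library. The data (60 successes in 100 observations) and the null value π0 = 0.5 are hypothetical; note that the standard error uses π0, not the sample proportion.

```python
import math
from statistics import NormalDist

# Hypothetical data and null value; illustrative only.
successes, n = 60, 100
pi_0 = 0.5

p_hat = successes / n
se_0 = math.sqrt(pi_0 * (1 - pi_0) / n)   # standard error under H0
z = (p_hat - pi_0) / se_0                 # number of standard errors from pi_0

std_normal = NormalDist()
p_two_sided = 2 * (1 - std_normal.cdf(abs(z)))   # Ha: pi != pi_0
p_right = 1 - std_normal.cdf(z)                  # Ha: pi > pi_0
p_left = std_normal.cdf(z)                       # Ha: pi < pi_0

alpha = 0.05
reject_h0 = p_two_sided <= alpha
print(f"z = {z:.2f}, two-sided P = {p_two_sided:.4f}, reject H0: {reject_h0}")
```

Here z = 2.0 gives a two-sided P-value just under 0.05, so H0 would be rejected at the 0.05 level but not at the 0.01 level.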

112 Decisions in Tests. α-level (significance level): pre-specified hurdle for which one rejects H0 if the P-value falls below it (typically 0.05 or 0.01).
P-value   H0 conclusion   Ha conclusion
≤ .05     Reject          Accept
> .05     Do not reject   Do not accept
Rejection region: values of the test statistic for which we reject the null hypothesis. For 2-sided tests with α = 0.05, we reject H0 if |z| ≥ 1.96.

113 Error Types. Type I error: reject H0 when it is true. Type II error: do not reject H0 when it is false.
                      Test result
Reality      Reject H0       Don't reject H0
H0 true      Type I error    Correct
H0 false     Correct         Type II error

114 P(Type I error). Suppose α-level = 0.05. Then P(Type I error) = P(reject null, given it is true) = P(|z| > 1.96) = 0.05, i.e., the α-level is the P(Type I error). Since we give the benefit of the doubt to the null in doing the test, it's traditional to take α small, usually 0.05, but 0.01 to be very cautious not to reject the null when it may be true. As in CIs, don't make α too small, since as α goes down, β = P(Type II error) goes up (think of the analogy with a courtroom trial). Better to report the P-value than merely whether to reject H0.

115 P(Type II error). P(Type II error) = β depends on the true value of the parameter (from the range of values in Ha). The farther the true parameter value falls from the null value, the easier it is to reject the null, and P(Type II error) goes down. Power of test = 1 − β = P(reject null, given it is false). In practice, you want a large enough n for your study so that P(Type II error) is small for the size of effect you expect.

116 Practical Applications of Statistics researchers use a test of significance to determine whether to reject or fail to reject the null hypothesis involves pre-selecting a level of probability, α (e.g., α =.05) that serves as the criterion to determine whether to reject or fail to reject the null hypothesis

117 Steps in using inferential statistics 1. select the test of significance 2. determine whether the significance test will be two-tailed or one-tailed 3. select α (alpha), the probability level 4. compute the test of significance 5. consult a table to determine the significance of the results

118 Tests of significance... Statistical formulas that enable the researcher to determine if there was a real difference between the sample means. Different tests of significance account for different factors, including: the scale of measurement represented by the data; the method of participant selection; the number of groups being compared; and the number of independent variables. The researcher must first decide whether a parametric or nonparametric test must be selected.

119 parametric test... assumes that: the variable measured is normally distributed in the population; the selection of participants is independent; the variances of the population comparison groups are equal. Used when the data represent an interval or ratio scale.

120 nonparametric test... makes no assumption about the distribution of the variable in the population, that is, the shape of the distribution. Used when the data represent a nominal or ordinal scale, when a parametric assumption has been greatly violated, or when the nature of the distribution is not known. Usually requires a larger sample size to reach the same level of significance as a parametric test.

121 The most common tests of significance t-test z-test ANOVA Chi Square

122 t-test... used to determine whether two means are significantly different at a selected probability level. It adjusts for the fact that the distribution of scores for small samples departs increasingly from the normal distribution as sample sizes become smaller. The strategy of the t-test is to compare the actual mean difference observed to the difference expected by chance.

123 t-test... forms a ratio where the numerator is the difference between the sample means and the denominator is the chance difference that would be expected if the null hypothesis were true. After the numerator is divided by the denominator, the resulting t value is compared to the appropriate t table value, depending on the probability level and the degrees of freedom. If the t value is equal to or greater than the table value, then the null hypothesis is rejected because the difference is greater than would be expected due to chance (t stat > t tab, H0 rejected).

124 t-test... there are two types of t-tests: the t-test for independent samples (randomly formed), and the t-test for nonindependent samples (nonrandomly formed, e.g., matching, performance on a pre-/posttest, different treatments).
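
The ratio described above can be sketched for both types of t-test. Everything in this example is illustrative: the function names and the tablet-weight values are hypothetical, the independent-samples version assumes equal population variances (pooled variance), and the critical value 2.306 is the two-tailed t table value for α = 0.05 with 8 degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(sample_a, sample_b):
    """t statistic and degrees of freedom for two independent samples.

    Assumes equal population variances, so a pooled variance is used.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / (n_a + n_b - 2)
    # Denominator: the difference between means expected by chance alone.
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error, n_a + n_b - 2


def paired_t(before, after):
    """t statistic and degrees of freedom for nonindependent (paired) samples."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    return mean(diffs) / sqrt(variance(diffs) / n), n - 1


# Hypothetical tablet weights (mg) from two batches.
batch_a = [250.1, 249.8, 251.2, 250.5, 249.9]
batch_b = [252.0, 251.5, 252.8, 251.9, 252.3]

t_stat, df = independent_t(batch_a, batch_b)
t_table = 2.306  # two-tailed table value for alpha = 0.05 and df = 8
print(f"t = {t_stat:.3f}, df = {df}, reject H0: {abs(t_stat) >= t_table}")
```

For a two-tailed test the absolute value of t is compared with the table value, since the difference can fall in either direction.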

125 t-test... Ex. t-test

126 Z-test... Ex. Z-test

127 ANOVA (Analysis of Variance) Used to determine whether two or more means are significantly different at a selected probability level. Avoids the need to compute duplicate t-tests to compare groups (more than 2). The strategy of ANOVA is that total variation, or variance, can be divided into two sources: treatment variance (between-groups variance, caused by the treatment groups) and error variance (within-groups variance).

128 ANOVA (Analysis of Variance) Forms a ratio, the F ratio, with the treatment variance (between-group variance) as the numerator and the error variance (within-group variance) as the denominator. The assumption is that randomly formed groups of participants are chosen and are essentially the same at the beginning of a study on a measure of the dependent variable. At the study's end, the question is whether the variance between the groups differs from the error variance by more than what would be expected by chance.

129 If the treatment variance is sufficiently larger than the error variance, a significant F ratio results; that is, the null hypothesis is rejected and it is concluded that the treatment had a significant effect on the dependent variable. If the treatment variance is not sufficiently larger than the error variance, a nonsignificant F ratio results; that is, the null hypothesis is not rejected and it is concluded that no significant treatment effect on the dependent variable was shown.
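
The F ratio for a one-way design can be computed directly from the two variance sources named above. This is a minimal sketch with a hypothetical function name and made-up data; the resulting F would still have to be compared with an F table value for the given degrees of freedom.

```python
from statistics import mean


def one_way_f_ratio(groups):
    """Return (F ratio, df between, df within) for a one-way ANOVA.

    groups: list of lists, one list of observations per treatment group.
    Hypothetical helper for illustration only.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])

    # Treatment (between-groups) sum of squares: group means vs. grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Error (within-groups) sum of squares: observations vs. their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within, k - 1, n_total - k


# Hypothetical responses under three treatments.
f_ratio, df_between, df_within = one_way_f_ratio([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f"F = {f_ratio:.2f} with ({df_between}, {df_within}) degrees of freedom")
```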

130

131 When the F ratio is significant and more than two means are involved, researchers use multiple comparison procedures (e.g., Scheffé test, Tukey's HSD test, Duncan's multiple range test).

132 ANOVA (Analysis of Variance) Ex. ANOVA One-way Two-way

133 Chi Square (χ²) Pearson's test: a nonparametric test of significance appropriate for nominal or ordinal data that can be converted to frequencies. It can be applied to interval or ratio data that have been categorized into a small number of groups. It assumes that the observations are randomly sampled from the population. All observations are independent (an individual can appear only once in a table and there are no overlapping categories). It does not make any assumptions about the shape of the distribution nor about the homogeneity of variances.

134 Chi Square (χ²)... compares the proportions actually observed (O) to the proportions expected (E) to see if they are significantly different. The chi square value increases as the difference between observed and expected frequencies increases.

135 One- and two- tailed tests of significance... tests of significance that indicate the direction in which a difference may occur the word tail indicates the area of rejection beneath the normal curve

136 H0: The two variables are independent. Ha: The two variables are associated.

137 Chi Square (χ²) calculations Contrasts observed frequencies in each cell of a contingency table with expected frequencies. The expected frequencies represent the number of cases that would be found in each cell if the null hypothesis were true (i.e., the nominal variables are unrelated). The expected frequency of two unrelated events is the product of the row and column frequencies divided by the number of cases: $F_e = \frac{F_r F_c}{N}$

138 $\chi^2 = \sum \frac{(F_o - F_e)^2}{F_e}$

139 Determine Degrees of Freedom df = (R-1)(C-1)

140 Compare computed test statistic against a tabled/critical value The computed value of the Pearson chi-square statistic is compared with the critical value to determine if the computed value is improbable. The critical tabled values are based on sampling distributions of the Pearson chi-square statistic. If the calculated χ² is greater than the χ² table value, reject H0.
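
The calculation steps on the preceding slides (expected frequencies, the χ² sum, degrees of freedom, comparison with a table value) can be sketched together. The function name and the 2 x 2 counts are hypothetical; 3.841 is the χ² table value for α = 0.05 with 1 degree of freedom.

```python
def pearson_chi_square(observed):
    """Return (chi-square statistic, degrees of freedom, expected frequencies).

    observed: contingency table as a list of rows of observed counts.
    Hypothetical helper for illustration only.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # Expected frequency of each cell under independence: row total * column total / N.
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Sum of (observed - expected)^2 / expected over all cells.
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


# Hypothetical 2 x 2 table of counts.
chi2, df, expected = pearson_chi_square([[10, 20], [20, 10]])
chi2_table = 3.841  # table value for alpha = 0.05 and df = 1
print(f"chi-square = {chi2:.3f}, df = {df}, reject H0: {chi2 > chi2_table}")
```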

141 A = B: no difference between means; the direction of a difference can be positive or negative, so it can fall in either tail of the normal curve. This is called a two-tailed test, and it divides the α level between the two tails of the normal curve. A > B or A < B: there is a difference between means in a stated direction, either positive or negative. This is called a one-tailed test; the α level is found in one tail of the normal curve.
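
The practical effect of splitting α between two tails can be illustrated with a z statistic. In this sketch the function name and the observed value z = 1.80 are hypothetical, and the one-tailed version assumes the alternative A > B was stated before looking at the data.

```python
from statistics import NormalDist


def tail_p_values(z):
    """Return (one-tailed P-value for the upper tail, two-tailed P-value)."""
    upper_tail = 1 - NormalDist().cdf(z)  # area beyond z in the upper tail only
    both_tails = 2 * (1 - NormalDist().cdf(abs(z)))  # area beyond |z| in both tails
    return upper_tail, both_tails


alpha = 0.05
one_tailed_p, two_tailed_p = tail_p_values(1.80)  # hypothetical observed z
print(f"one-tailed P = {one_tailed_p:.4f}, reject H0: {one_tailed_p <= alpha}")
print(f"two-tailed P = {two_tailed_p:.4f}, reject H0: {two_tailed_p <= alpha}")
```

The same statistic can be significant in a one-tailed test and not in a two-tailed test, which is why the choice has to be made before the data are analyzed.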

142 Ex. Chi Square (χ²)

143 3. Bivariate description Usually we want to study associations between two or more variables (e.g., how does number of close friends depend on gender, income, education, age, working status, rural/urban, religiosity). Response variable: the outcome variable. Explanatory variable(s): define the groups to compare. Ex.: number of close friends is a response variable, while gender, income, ... are explanatory variables. The response variable is also called the dependent variable; an explanatory variable is also called an independent variable.

144 Summarizing associations: Categorical variables: show data using contingency tables. Quantitative variables: show data using scatterplots. Mixture of a categorical variable and a quantitative variable (e.g., number of close friends and gender): can give numerical summaries (mean, standard deviation) or side-by-side box plots for the groups.

145 Contingency Tables Cross classifications of categorical variables in which rows (typically) represent categories of explanatory variable and columns represent categories of response variable. Counts in cells of the table give the numbers of individuals at the corresponding combination of levels of the two variables

146 Another Example Heparin Lock Placement Time: 1 = 72 hrs, 2 = 96 hrs. Complication Incidence * Heparin Lock Placement Time Group Crosstabulation (SPSS output with Count, Expected Count, and % within Heparin Lock Placement Time Group for each cell; only the following percentages are legible):
% within Heparin Lock Placement Time Group | Group 1 | Group 2 | Total
Had complications | ... | 22.0% | 20.0%
Had no complications | ... | 78.0% | 80.0%
Total | ... | 100.0% | 100.0%

147 Hypotheses in Heparin Lock Placement H0: There is no association between complication incidence and length of heparin lock placement. (The variables are independent.) Ha: There is an association between complication incidence and length of heparin lock placement. (The variables are related.)

148 More of SPSS Output


More information

BIOL 51A - Biostatistics 1 1. Lecture 1: Intro to Biostatistics. Smoking: hazardous? FEV (l) Smoke

BIOL 51A - Biostatistics 1 1. Lecture 1: Intro to Biostatistics. Smoking: hazardous? FEV (l) Smoke BIOL 51A - Biostatistics 1 1 Lecture 1: Intro to Biostatistics Smoking: hazardous? FEV (l) 1 2 3 4 5 No Yes Smoke BIOL 51A - Biostatistics 1 2 Box Plot a.k.a box-and-whisker diagram or candlestick chart

More information

Statistical Methods. by Robert W. Lindeman WPI, Dept. of Computer Science

Statistical Methods. by Robert W. Lindeman WPI, Dept. of Computer Science Statistical Methods by Robert W. Lindeman WPI, Dept. of Computer Science gogo@wpi.edu Descriptive Methods Frequency distributions How many people were similar in the sense that according to the dependent

More information

A is one of the categories into which qualitative data can be classified.

A is one of the categories into which qualitative data can be classified. Chapter 2 Methods for Describing Sets of Data 2.1 Describing qualitative data Recall qualitative data: non-numerical or categorical data Basic definitions: A is one of the categories into which qualitative

More information

Hypothesis testing. Data to decisions

Hypothesis testing. Data to decisions Hypothesis testing Data to decisions The idea Null hypothesis: H 0 : the DGP/population has property P Under the null, a sample statistic has a known distribution If, under that that distribution, the

More information

STATISTICS 141 Final Review

STATISTICS 141 Final Review STATISTICS 141 Final Review Bin Zou bzou@ualberta.ca Department of Mathematical & Statistical Sciences University of Alberta Winter 2015 Bin Zou (bzou@ualberta.ca) STAT 141 Final Review Winter 2015 1 /

More information

7.2 One-Sample Correlation ( = a) Introduction. Correlation analysis measures the strength and direction of association between

7.2 One-Sample Correlation ( = a) Introduction. Correlation analysis measures the strength and direction of association between 7.2 One-Sample Correlation ( = a) Introduction Correlation analysis measures the strength and direction of association between variables. In this chapter we will test whether the population correlation

More information

TOPIC: Descriptive Statistics Single Variable

TOPIC: Descriptive Statistics Single Variable TOPIC: Descriptive Statistics Single Variable I. Numerical data summary measurements A. Measures of Location. Measures of central tendency Mean; Median; Mode. Quantiles - measures of noncentral tendency

More information

MATH 1150 Chapter 2 Notation and Terminology

MATH 1150 Chapter 2 Notation and Terminology MATH 1150 Chapter 2 Notation and Terminology Categorical Data The following is a dataset for 30 randomly selected adults in the U.S., showing the values of two categorical variables: whether or not the

More information

Statistics and parameters

Statistics and parameters Statistics and parameters Tables, histograms and other charts are used to summarize large amounts of data. Often, an even more extreme summary is desirable. Statistics and parameters are numbers that characterize

More information

Statistics Primer. ORC Staff: Jayme Palka Peter Boedeker Marcus Fagan Trey Dejong

Statistics Primer. ORC Staff: Jayme Palka Peter Boedeker Marcus Fagan Trey Dejong Statistics Primer ORC Staff: Jayme Palka Peter Boedeker Marcus Fagan Trey Dejong 1 Quick Overview of Statistics 2 Descriptive vs. Inferential Statistics Descriptive Statistics: summarize and describe data

More information

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics Exploring Data: Distributions Look for overall pattern (shape, center, spread) and deviations (outliers). Mean (use a calculator): x = x 1 + x

More information

The t-test: A z-score for a sample mean tells us where in the distribution the particular mean lies

The t-test: A z-score for a sample mean tells us where in the distribution the particular mean lies The t-test: So Far: Sampling distribution benefit is that even if the original population is not normal, a sampling distribution based on this population will be normal (for sample size > 30). Benefit

More information

http://www.statsoft.it/out.php?loc=http://www.statsoft.com/textbook/ Group comparison test for independent samples The purpose of the Analysis of Variance (ANOVA) is to test for significant differences

More information

" M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2

 M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2 Notation and Equations for Final Exam Symbol Definition X The variable we measure in a scientific study n The size of the sample N The size of the population M The mean of the sample µ The mean of the

More information

Introduction to Statistics with GraphPad Prism 7

Introduction to Statistics with GraphPad Prism 7 Introduction to Statistics with GraphPad Prism 7 Outline of the course Power analysis with G*Power Basic structure of a GraphPad Prism project Analysis of qualitative data Chi-square test Analysis of quantitative

More information

Nicole Dalzell. July 2, 2014

Nicole Dalzell. July 2, 2014 UNIT 1: INTRODUCTION TO DATA LECTURE 3: EDA (CONT.) AND INTRODUCTION TO STATISTICAL INFERENCE VIA SIMULATION STATISTICS 101 Nicole Dalzell July 2, 2014 Teams and Announcements Team1 = Houdan Sai Cui Huanqi

More information

Last week: Sample, population and sampling distributions finished with estimation & confidence intervals

Last week: Sample, population and sampling distributions finished with estimation & confidence intervals Past weeks: Measures of central tendency (mean, mode, median) Measures of dispersion (standard deviation, variance, range, etc). Working with the normal curve Last week: Sample, population and sampling

More information

Exam details. Final Review Session. Things to Review

Exam details. Final Review Session. Things to Review Exam details Final Review Session Short answer, similar to book problems Formulae and tables will be given You CAN use a calculator Date and Time: Dec. 7, 006, 1-1:30 pm Location: Osborne Centre, Unit

More information

STAT 4385 Topic 01: Introduction & Review

STAT 4385 Topic 01: Introduction & Review STAT 4385 Topic 01: Introduction & Review Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso xsu@utep.edu Spring, 2016 Outline Welcome What is Regression Analysis? Basics

More information

One-Way ANOVA. Some examples of when ANOVA would be appropriate include:

One-Way ANOVA. Some examples of when ANOVA would be appropriate include: One-Way ANOVA 1. Purpose Analysis of variance (ANOVA) is used when one wishes to determine whether two or more groups (e.g., classes A, B, and C) differ on some outcome of interest (e.g., an achievement

More information

Review for Final. Chapter 1 Type of studies: anecdotal, observational, experimental Random sampling

Review for Final. Chapter 1 Type of studies: anecdotal, observational, experimental Random sampling Review for Final For a detailed review of Chapters 1 7, please see the review sheets for exam 1 and. The following only briefly covers these sections. The final exam could contain problems that are included

More information

Sets and Set notation. Algebra 2 Unit 8 Notes

Sets and Set notation. Algebra 2 Unit 8 Notes Sets and Set notation Section 11-2 Probability Experimental Probability experimental probability of an event: Theoretical Probability number of time the event occurs P(event) = number of trials Sample

More information

Module 03 Lecture 14 Inferential Statistics ANOVA and TOI

Module 03 Lecture 14 Inferential Statistics ANOVA and TOI Introduction of Data Analytics Prof. Nandan Sudarsanam and Prof. B Ravindran Department of Management Studies and Department of Computer Science and Engineering Indian Institute of Technology, Madras Module

More information

Relating Graph to Matlab

Relating Graph to Matlab There are two related course documents on the web Probability and Statistics Review -should be read by people without statistics background and it is helpful as a review for those with prior statistics

More information

HYPOTHESIS TESTING: THE CHI-SQUARE STATISTIC

HYPOTHESIS TESTING: THE CHI-SQUARE STATISTIC 1 HYPOTHESIS TESTING: THE CHI-SQUARE STATISTIC 7 steps of Hypothesis Testing 1. State the hypotheses 2. Identify level of significant 3. Identify the critical values 4. Calculate test statistics 5. Compare

More information

Lecture Slides. Elementary Statistics. by Mario F. Triola. and the Triola Statistics Series

Lecture Slides. Elementary Statistics. by Mario F. Triola. and the Triola Statistics Series Lecture Slides Elementary Statistics Tenth Edition and the Triola Statistics Series by Mario F. Triola Slide 1 Chapter 13 Nonparametric Statistics 13-1 Overview 13-2 Sign Test 13-3 Wilcoxon Signed-Ranks

More information

Lecture Slides. Section 13-1 Overview. Elementary Statistics Tenth Edition. Chapter 13 Nonparametric Statistics. by Mario F.

Lecture Slides. Section 13-1 Overview. Elementary Statistics Tenth Edition. Chapter 13 Nonparametric Statistics. by Mario F. Lecture Slides Elementary Statistics Tenth Edition and the Triola Statistics Series by Mario F. Triola Slide 1 Chapter 13 Nonparametric Statistics 13-1 Overview 13-2 Sign Test 13-3 Wilcoxon Signed-Ranks

More information

Unit 2. Describing Data: Numerical

Unit 2. Describing Data: Numerical Unit 2 Describing Data: Numerical Describing Data Numerically Describing Data Numerically Central Tendency Arithmetic Mean Median Mode Variation Range Interquartile Range Variance Standard Deviation Coefficient

More information

First we look at some terms to be used in this section.

First we look at some terms to be used in this section. 8 Hypothesis Testing 8.1 Introduction MATH1015 Biostatistics Week 8 In Chapter 7, we ve studied the estimation of parameters, point or interval estimates. The construction of CI relies on the sampling

More information

Can you tell the relationship between students SAT scores and their college grades?

Can you tell the relationship between students SAT scores and their college grades? Correlation One Challenge Can you tell the relationship between students SAT scores and their college grades? A: The higher SAT scores are, the better GPA may be. B: The higher SAT scores are, the lower

More information