Elementary Statistics for the Biological and Life Sciences

STAT 205: Elementary Statistics for the Biological and Life Sciences
University of South Carolina, Columbia, SC

© 2005, University of South Carolina. All rights reserved, except where previous rights exist. No part of this material may be reproduced, stored in a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photoreproduction, recording, or scanning) without the prior written consent of the University of South Carolina.

DoStat Sign-up
SIGN UP as a student, using your VIP login name as your DOSTAT login name.
Use course reference DS-
Submit an e-mail address you read often (this is how you will receive information on course announcements).

DoStat and StatCrunch
We will use the StatCrunch online statistical system for statistical computations and graphics. We will also use the DoStat course management system for homework and example online calculations.

Motivation: why analyze data?
Clinical trials/drug development: compare existing treatments with new methods to cure disease.
Agriculture: enhance crop yields, improve pest resistance.
Ecology: study how ecosystems develop and respond to environmental impacts.
Lab studies: learn more about biological tissue and cellular activity.

Chapter 2: Description of Populations and Samples
Selected tables and figures from Samuels, M. L., and Witmer, J. A., Statistics for the Life Sciences, 3rd Ed. © 2003, Prentice Hall, Upper Saddle River, NJ. Used by permission.

Statistics is:
Statistics is the science of collecting, summarizing, analyzing, and interpreting data.
Goal: to understand the underlying biological phenomena that generate the data.

Random Variables
Data are generated by some random process or phenomenon. Any observed datum represents the outcome of a RANDOM VARIABLE.
NOTATION: upper-case letter, W, X, Y, etc.

Types of Random Variables
Qualitative
  Nominal (e.g., blood type: A, B, AB, O)
  Ordinal (e.g., therapy response: none, some, cured)
Quantitative
  Discrete (e.g., number of nests: 0, 1, 2, ...)
  Continuous (e.g., cholesterol conc.: 210.4, 180.9, etc.)

Random Samples
We take data as samples from a larger population.
DEF'N: A SAMPLE is a collection of subjects upon which we measure one or more variables.
DEF'N: The SAMPLE SIZE is the number of subjects in a sample. NOTATION: n.

Observations
DEF'N: The OBSERVATIONAL UNIT is the type of subject being sampled. Example: observational units could be (i) a baby, (ii) a moth, (iii) a Petri dish, etc.
DEF'N: An OBSERVATION is a recorded outcome of a variable from a random sample.
NOTATION: lower-case letter, x, y, etc.

Frequency Distributions
DEF'N: A FREQUENCY DISTRIBUTION is a summary display of the frequencies of occurrence of each value in a sample.
DEF'N: A RELATIVE FREQUENCY (a percent or proportion) is a raw frequency divided by the sample size, n:
$\text{Rel. Freq.} = \dfrac{\text{Freq.}}{n}$

Frequency Dist'ns
Frequency distributions come in varied shapes:
Symmetric & bell-shaped
Symmetric, not bell-shaped
Asymmetric & skewed right
Asymmetric & skewed left
Bimodal
We use histograms, etc., to visualize these shapes in the data.

Example 2.4
Ex. 2.4: Y = no. of piglets surviving 21 days (litter size). A sample of n = 36 pigs (sows) generated the data in Table 2.4.

Dot Plot
A DOT PLOT is a simple graphic where dots indicate observed data in a sample.
Ex. 2.4 (cont'd): Fig. 2.4 gives the dot plot for the litter size data.

Histogram
A HISTOGRAM is a simple bar chart where bars replace the dots in a dot plot.
Ex. 2.4 (cont'd): Fig. 2.5 gives the histogram for the litter size data.

Stemplot
A STEMPLOT (a.k.a. STEM-LEAF DIAGRAM) is a dot plot (often drawn on its side) with data information replacing the dots. The stems are the core values of the data, set in common groups. The leaves are the last digits of each datum.

Example 2.8
Ex. 2.8: Y = radish growth. Data in Table 2.8: Radish Growth after 3 Days in Total Darkness.

Descriptive Statistics
DEF'N: The SAMPLE MEAN is the arithmetic average of a set of n data values.
NOTATION: $\bar{y} = \dfrac{1}{n}\sum_{i=1}^{n} y_i = \dfrac{y_1 + y_2 + \cdots + y_n}{n}$
The sample mean is often viewed as a kind of balance point in the data.

Example 2.15
Ex. 2.15: Y = weight gain (lb) of lambs on special diet. Data: {11, 13, 19, 2, 10, 1}, n = 6 (Fig. 2.27):
$\bar{y} = \dfrac{11 + 13 + 19 + 2 + 10 + 1}{6} = \dfrac{56}{6} = 9.33$ lb

Sample Median
DEF'N: The SAMPLE MEDIAN is the value of the data nearest to their middle. Find the median by ordering the data and taking the middle point (n odd) or the average of the two middle points (n even).
NOTATION: Q_2

Example 2.17
Ex. 2.17 (2.15 cont'd): Lamb weight gain. n = 6 is even, so find Q_2 as the average of the two middle points of the ordered data:
y_(1) = 1, y_(2) = 2, y_(3) = 10, y_(4) = 11, y_(5) = 13, y_(6) = 19.
$Q_2 = \dfrac{10 + 11}{2} = 10.5$ lb
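The lamb weight-gain calculations in Exs. 2.15 and 2.17 can be checked with a few lines of Python. This is only a sketch using the standard-library `statistics` module; the variable names are illustrative and not part of the course materials (the course itself uses StatCrunch and DoStat).

```python
from statistics import mean, median

# Lamb weight gains (lb) from Ex. 2.15
weight_gain = [11, 13, 19, 2, 10, 1]

y_bar = mean(weight_gain)    # arithmetic average: sum of the values divided by n
q2 = median(weight_gain)     # n is even, so this averages the two middle ordered values

print(f"n = {len(weight_gain)}, mean = {y_bar:.2f} lb, median = {q2} lb")
```

The mean (about 9.33 lb) sits below the median (10.5 lb) here because the two smallest gains, 1 and 2, pull the average down.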

Example 2.19
Ex. 2.19: Y = cricket singing times. Data in Table 2.10.

Skewness
Mean & median indicate skewness:
If data are skewed right, mean > median.
If data are skewed left, mean < median.
If data are symmetric, mean ≈ median.
Both the mean and the median are useful summary measures of location. The median is more ROBUST to extreme values of y_i, but the mean is easier to calculate.

Quartiles
DEF'N: The QUARTILES of a distribution are points that separate the data into quarters or fourths:
The first quartile separates the lower 25% of the data from the upper 75%. NOTATION: Q_1
The second quartile separates the lower 50% of the data from the upper 50%. NOTATION: Q_2
The third quartile separates the lower 75% of the data from the upper 25%. NOTATION: Q_3

Example 2.20
Ex. 2.20: Y = systolic blood pressure (mm Hg) in men; n = 7. Ordered data:
y_(1) = 113, y_(2) = 124, y_(3) = 124, y_(4) = 132, y_(5) = 146, y_(6) = 151, y_(7) = 170.
Q_1 = 124, Q_2 = 132, Q_3 = 151

IQR
DEF'N: The INTER-QUARTILE RANGE is IQR = Q_3 - Q_1.
DEF'N: The MINIMUM is the smallest value of a data set or distribution. NOTATION: y_(1)
DEF'N: The MAXIMUM is the largest value of a data set or distribution. NOTATION: y_(n)

Five Number Summary
DEF'N: The FIVE NUMBER SUMMARY is {y_(1), Q_1, Q_2, Q_3, y_(n)}.
DEF'N: A BOXPLOT is a graphic plot of the 5-no. summary, with a box spanning the IQR and bridging the quartiles.
(Boxplot figure, labeled left to right: y_(1), Q_1, Q_2, Q_3, y_(n).)

Example 2.22
Ex. 2.22: Y = radish growth data from the earlier example. The five-no. summary is {8, 15, 21, 30, 37}. The boxplot is given in Fig. 2.30.

Example 2.23
Ex. 2.23: Y = radish growth data over three different growth regimes (see Ex. 2.9). In Fig. 2.32, we use boxplots for comparative purposes.

Outliers
DEF'N: An OUTLIER is an obsv'n that differs dramatically from the rest of the data. Formally, Y_i is an outlier if
Y_i < Q_1 - (1.5 × IQR) (the lower fence) or Y_i > Q_3 + (1.5 × IQR) (the upper fence).

Example 2.25
Ex. 2.25: Y = radish growth data in full light (from Ex. 2.23). The ordered data are:
3, 5, 5, 7, 7, 8, 9, 10, 10, 10, 10, 14, 20, 21
IQR = Q_3 - Q_1 = 10 - 7 = 3
Upper fence = Q_3 + (1.5 × IQR) = 10 + (1.5)(3) = 14.5
Lower fence = Q_1 - (1.5 × IQR) = 7 - (1.5)(3) = 2.5
So y = 20 and y = 21 are outliers.

Dispersion
DEF'N: The SAMPLE RANGE is Range = Y_(n) - Y_(1) = Max. - Min.
DEF'N: The SAMPLE VARIANCE is $S^2 = \dfrac{1}{n-1}\sum_{i=1}^{n} (Y_i - \bar{Y})^2$
DEF'N: The SAMPLE STANDARD DEVIATION (SD) is $S = \sqrt{S^2}$
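The quartile and fence calculations above can be sketched in Python. One assumption is made loudly here: the quartiles are taken as the medians of the lower and upper halves of the ordered data, leaving out the middle value when n is odd. That convention reproduces the values shown in Exs. 2.20 and 2.25, but other software may use a different quartile rule. The function names `quartiles` and `fences` are illustrative.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3); Q1 and Q3 are medians of the lower and upper halves.

    Assumed convention: when n is odd, the middle value belongs to neither half.
    """
    y = sorted(data)
    n = len(y)
    half = n // 2
    lower = y[:half]        # the `half` smallest values
    upper = y[n - half:]    # the `half` largest values
    return median(lower), median(y), median(upper)

def fences(data, k=1.5):
    """Return (lower fence, upper fence) = (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Ex. 2.25: radish growth in full light
radish_light = [3, 5, 5, 7, 7, 8, 9, 10, 10, 10, 10, 14, 20, 21]
low, high = fences(radish_light)
outliers = [y for y in radish_light if y < low or y > high]

print("quartiles:", quartiles(radish_light))
print("fences:", (low, high), "outliers:", outliers)
```

Applied to the blood pressure data of Ex. 2.20, `quartiles([113, 124, 124, 132, 146, 151, 170])` gives the same Q_1, Q_2, and Q_3 as the slide.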

The Empirical Rule
The sample mean and the sample SD are useful in describing data sets (that are unimodal and not too skewed). The EMPIRICAL RULE states that
~68% of the data lie between Ȳ - S and Ȳ + S
~95% of the data lie between Ȳ - 2S and Ȳ + 2S
>99% of the data lie between Ȳ - 3S and Ȳ + 3S

Example 2.36
Ex. 2.36: Suppose Y = pulse rate after 5 mins. of exercise. For n = 28 subjects, we find Ȳ = 98 (beats/min) and S = 13.4 (beats/min). Thus, e.g., from the empirical rule we expect ~95% of the data to lie between
98 - (2)(13.4) = 98 - 26.8 = 71.2 beats/min and 98 + (2)(13.4) = 98 + 26.8 = 124.8 beats/min.

Inference
DEF'N: The POPULATION is the larger group of subjects (organisms, plots, regions, ecosystems, etc.) on which we wish to draw inferences.
DEF'N: A PARAMETER is a quantified population characteristic. E.g., the popl'n mean is µ and the popl'n standard deviation is σ.
DEF'N: A STATISTIC is a sample quantity used to estimate a popl'n parameter.

Proportions
DEF'N: The POPULATION PROPORTION is the proportion of subjects exhibiting a particular trait or outcome in the popl'n. (It generalizes to the probability that any popl'n element will exhibit the trait.) NOTATION: p
DEF'N: The SAMPLE PROPORTION is the number of sample elements exhibiting the trait, divided by the sample size, n. NOTATION: $\hat{p}$

Chapter 3: Random Sampling, Probability, and the Binomial Distribution
Selected tables and figures from Samuels, M. L., and Witmer, J. A., Statistics for the Life Sciences, 3rd Ed. © 2003, Prentice Hall, Upper Saddle River, NJ. Used by permission.

Random Samples
DEF'N: A SIMPLE RANDOM SAMPLE of n items is a data set where (a) every popl'n element has an equal chance of selection, and (b) every popl'n element is chosen independently of every other element.
This draws upon the larger concept of RANDOMIZATION: selection of data that avoids sources of possible bias.

Random Sampling
To choose a random sample:
1. assign each popl'n element a unique code (or set of codes);
2. from a random number table (Table 1, p. 670) or via computer, in a systematic manner select n random digits whose range corresponds to the codes assigned above; and
3. select each element whose code appears in step (2), ignoring repeated codes and codes with no assignment.

Example 3.1
Ex. 3.1: Simple random sample of size n = 6 from a population of 75 elements.
1. Label each element 01, 02, ..., 75.
2. Select random digits from a source such as Table 1 or DoStat.
3. Choose elements for the sample if they correspond to the selected random digits (ignore repeats and drop-outs).
See Table 3.1.

Example 3.1 (cont'd)
The sample uses elements 23, 38, 59, 21, 08, 09.
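The "via computer" route in step 2 might look like the following sketch. It uses Python's standard `random` module rather than Table 1 or DoStat, so the selected labels will differ from those in Table 3.1; the function name and the seed value are illustrative assumptions.

```python
import random

def simple_random_sample(population_size, n, seed=None):
    """Draw n distinct labels from 1..population_size, each equally likely."""
    rng = random.Random(seed)               # a seed makes the draw reproducible
    labels = range(1, population_size + 1)  # step 1: one unique code per element
    # Steps 2-3: sample() draws without replacement, which plays the role of
    # ignoring repeated codes and codes outside the labeled range.
    return rng.sample(labels, n)

sample = simple_random_sample(population_size=75, n=6, seed=205)
print([f"{label:02d}" for label in sample])
```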

Probability
DEF'N: A PROBABILITY is the chance of some event, E, occurring in a specified manner. NOTATION: P{E}
We often view probabilities from a Relative Frequency Interpretation:
$P\{E\} = \dfrac{\#\ \text{ways } E \text{ occurs}}{\#\ \text{total events}}$

Example 3.12
Ex. 3.12: Toss a fair coin twice. We know P{H} = 1/2 (see Ex. 3.8). What is P{HH}?
Consider all possible outcomes: HH, HT, TH, TT. If each outcome is equally likely, then
$P\{HH\} = \dfrac{\#\ HH}{\#\ \text{all outcomes}} = \dfrac{1}{4}$

Probability Rules
Rule 1: 0 ≤ P{E} ≤ 1.
Rule 2: The entirety of events has probability = 1. That is, if E_1, ..., E_k are all the possible events, $\sum_{i=1}^{k} P\{E_i\} = 1$.
Rule 3 (The Complement Rule): If E^c = {not E}, then P{E^c} = 1 - P{E}.
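The counting argument in Ex. 3.12 can be written out as a short enumeration. This is a sketch assuming equally likely outcomes, as the slide does; the helper name `prob` is illustrative.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of two tosses: HH, HT, TH, TT
outcomes = list(product("HT", repeat=2))

def prob(event):
    """Relative-frequency probability: (# outcomes in the event) / (# all outcomes)."""
    favorable = [o for o in outcomes if event(o)]
    return Fraction(len(favorable), len(outcomes))

p_hh = prob(lambda o: o == ("H", "H"))
p_not_hh = 1 - p_hh                      # Rule 3, the complement rule
total = sum(prob(lambda o, e=e: o == e) for e in outcomes)  # Rule 2

print(p_hh, p_not_hh, total)
```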

Example 3.19
Ex. 3.19: U.S. blood types:
P{O} = 0.44, P{A} = 0.42, P{B} = 0.10, P{AB} = 0.04
Note: (1) all are between 0 and 1, and (2) P{O} + P{A} + P{B} + P{AB} = 0.44 + 0.42 + 0.10 + 0.04 = 1.00.
So, e.g., P{O^c} = 1 - P{O} = 1 - 0.44 = 0.56.

Probability (cont'd)
DEF'N: Two events, E_1 and E_2, are DISJOINT (a.k.a. MUTUALLY EXCLUSIVE) if they cannot occur simultaneously.
DEF'N: The UNION of two events, E_1 and E_2, is the event that E_1 or E_2 (or both) occurs.
DEF'N: The INTERSECTION of two events, E_1 and E_2, is the event that both E_1 and E_2 occur.

Venn Diagrams
A useful graphic to conceptualize how events interrelate is the Venn Diagram. For example, Fig. 3.8 shows a Venn Diagram with 2 intersecting events, E_1 and E_2.

Probability Rules (cont'd)
We often denote the entirety of events as the Sample Space, S. Conversely, the Null Space is ∅ = S^c.
Rule 4: If E_1 and E_2 are disjoint, then P{E_1 or E_2} = P{E_1} + P{E_2}.
Rule 5: If E_1 and E_2 are any two events, then P{E_1 or E_2} = P{E_1} + P{E_2} - P{E_1 and E_2}.

Example 3.20
Ex. 3.20: Hair/Eye color of 1770 men. The distribution of traits is given in Table 3.3. So, e.g., P{Black Hair} = 500/1770, etc.

Example 3.20 (cont'd)
Find P{Black Hair OR Red Hair}. Clearly, E_1 = {Black Hair} and E_2 = {Red Hair} are disjoint, so from Rule 4,
P{Black Hair OR Red Hair} = P{Black Hair} + P{Red Hair} = 500/1770 + 70/1770 = 570/1770 = 0.322

Example 3.20 (cont'd)
Now, find P{Black Hair OR Blue Eyes}. Here, E_1 = {Black Hair} and E_2 = {Blue Eyes} are NOT disjoint, so apply Rule 5:
P{Black Hair OR Blue Eyes} = P{Black Hair} + P{Blue Eyes} - P{Black Hair AND Blue Eyes}
= 500/1770 + 1050/1770 - 200/1770 = 1350/1770 = 0.763

Probability (cont'd)
DEF'N: Two events, E_1 and E_2, are INDEPENDENT if knowledge that E_1 occurs does not affect P{E_2}, and vice versa. If two events are not independent, they are DEPENDENT.
DEF'N: A CONDITIONAL PROBABILITY is the probability that one event occurs, given that the other has already occurred. NOTATION: P{E_1 | E_2}.

Probability Rules (cont'd)
Rule 6: If E_1 and E_2 are independent, then P{E_1 and E_2} = P{E_1} P{E_2}.
Rule 7: If E_1 and E_2 are any two events, then P{E_1 and E_2} = P{E_1} P{E_2 | E_1} = P{E_2} P{E_1 | E_2}.
Consequences:
if E_1 and E_2 are independent, then P{E_1} = P{E_1 | E_2} and P{E_2} = P{E_2 | E_1};
also, P{E_2 | E_1} = P{E_1 and E_2}/P{E_1} if P{E_1} ≠ 0.

Examples
Exs. (3.20, cont'd): Hair/Eye color of 1770 men. Refer back to Table 3.3. There, we saw P{Blue Eyes AND Black Hair} = 200/1770, while P{Black Hair} = 500/1770. So,
$P\{\text{Blue Eyes} \mid \text{Black Hair}\} = \dfrac{P\{\text{Blue Eyes AND Black Hair}\}}{P\{\text{Black Hair}\}} = \dfrac{200/1770}{500/1770} = \dfrac{200}{500} = 0.40$

Example 3.25
Ex. 3.25 (3.20, cont'd): Hair/Eye color of 1770 men. In Table 3.3, there is no evidence of independence between Hair & Eye color. So, e.g., by Rule 7,
P{Red Hair AND Brown Eyes} = P{Red Hair} P{Brown Eyes | Red Hair},
which agrees with the display in Table 3.3.

Density Curves
DEF'N: A RANDOM VARIABLE is a measured outcome of some random process.
When a random variable is discrete, it is usually straightforward to interpret probabilities associated with it. For instance, if Y = {# leaves on tree}:
P{Y = 122} = 0.42 is interpretable
P{Y = 18} = 0.02 is interpretable
but P{Y = y} at a non-integer value y is not interpretable.
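The hair/eye-color calculations for Rule 5 and for the conditional probability can be checked with exact fractions. In this sketch the counts 1770, 500, and 200 come from the slides; the blue-eyes count is an assumption inferred from the slide's result 1350/1770 for P{Black Hair OR Blue Eyes}, since Table 3.3 itself is not reproduced here. Variable names are illustrative.

```python
from fractions import Fraction

TOTAL = 1770            # men in the study (from the slides)
black_hair = 500        # from the slides
black_and_blue = 200    # black hair AND blue eyes (from the slides)
# Assumption: inferred from 1350 = black_hair + blue_eyes - black_and_blue
blue_eyes = 1350 - black_hair + black_and_blue

p_black = Fraction(black_hair, TOTAL)
p_blue = Fraction(blue_eyes, TOTAL)
p_both = Fraction(black_and_blue, TOTAL)

p_union = p_black + p_blue - p_both       # Rule 5
p_blue_given_black = p_both / p_black     # consequence of Rule 7
independent = (p_both == p_black * p_blue)  # Rule 6 would require equality

print(float(p_union), float(p_blue_given_black), independent)
```

Because P{Blue Eyes | Black Hair} = 0.40 differs from the unconditional P{Blue Eyes}, the check in the last line comes out False, in line with the slide's remark that hair and eye color do not look independent.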

Probability Histogram
A probability histogram is used to visualize discrete probability masses.
(Figure: probability histogram of P{Y = k} against k.)
Notice: each mass has area = probability, and all masses sum to 1.

Continuous Random Variables
By contrast, a continuous random variable has a different probability interpretation. Extending the probability histogram to the continuous case, we say Y has a PROBABILITY DENSITY CURVE, where area still represents probability.

Continuous Random Variables
Consequences of the continuous probability model:
P{Y = a} = 0 = P{Y = b} (the area of a line is zero)
So, P{Y ≤ a} = P{Y < a} + P{Y = a} = P{Y < a}
And for that matter:
P{a ≤ Y ≤ b} = P{a < Y ≤ b} = P{a ≤ Y < b} = P{a < Y < b}
(all if Y is continuous).

Example 3.30
Ex. 3.30: Y = diameter (in.) of tree trunk. Suppose the density has the form given in Fig. 3.13. Then, for example,
P{Y > 8} = P{8 < Y ≤ 10} + P{Y > 10} = 0.19

Mean and Expected Value
DEF'N: If Y is a discrete random variable, its POPULATION MEAN is given by
$\mu_Y = \sum y_i \, P\{Y = y_i\}$
(where the sum is taken over all possible y_i's).
More generally, the EXPECTED VALUE of Y is $E(Y) = \sum y_i \, P\{Y = y_i\}$.

Example 3.35
Ex. 3.35: Y = # tail vertebrae in fish. From Table 3.4 we find

y_i           20     21     22     23
P{Y = y_i}    0.03   0.51   0.40   0.06

So, E(Y) = Σ y_i P{Y = y_i} = (20)(.03) + (21)(.51) + (22)(.40) + (23)(.06) = 0.60 + 10.71 + 8.80 + 1.38 = 21.49

Variance
DEF'N: If Y is a discrete random variable, its POPULATION VARIANCE is given by
$\sigma_Y^2 = \sum (y_i - \mu_Y)^2 \, P\{Y = y_i\}$
One can show this is also
$\sigma_Y^2 = E(Y^2) - \{E(Y)\}^2 = E(Y^2) - \mu_Y^2$
From this, the POPULATION STANDARD DEVIATION of Y is $\sigma_Y = (\sigma_Y^2)^{1/2}$.

Example 3.37
Ex. 3.37 (3.35, cont'd): From Table 3.4 we were given the values of P{Y = y_i}. Recall µ_Y = 21.49. So,
σ_Y² = Σ (y_i - µ_Y)² P{Y = y_i}
= (20 - 21.49)² (.03) + (21 - 21.49)² (.51) + (22 - 21.49)² (.40) + (23 - 21.49)² (.06)
= 0.0666 + 0.1225 + 0.1040 + 0.1368 = 0.4299

Example 3.37 (cont'd)
So σ_Y² = 0.4299. But it's a lot easier to use
σ_Y² = E(Y²) - µ_Y² = {(20)² (.03) + (21)² (.51) + (22)² (.40) + (23)² (.06)} - (21.49)²
= 462.25 - 461.8201 = 0.4299
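Both variance formulas above can be verified numerically for the tail-vertebrae distribution of Ex. 3.35. The sketch below uses the probabilities that appear in the slide's own sum for E(Y); the dictionary layout and variable names are illustrative choices.

```python
from math import sqrt

# P{Y = y} for Y = number of tail vertebrae (values from the sum in Ex. 3.35)
dist = {20: 0.03, 21: 0.51, 22: 0.40, 23: 0.06}

mu = sum(y * p for y, p in dist.items())                       # E(Y)
var_def = sum((y - mu) ** 2 * p for y, p in dist.items())      # definition
var_short = sum(y ** 2 * p for y, p in dist.items()) - mu ** 2 # E(Y^2) - mu^2
sd = sqrt(var_def)

print(f"mean = {mu:.2f}, variance = {var_def:.4f} "
      f"(shortcut: {var_short:.4f}), SD = {sd:.4f}")
```

The two variance values agree up to floating-point rounding, which is the point of the shortcut formula.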

Rules of Expected Value
E(·) is a mathematical operator. It has certain general properties:
Rule E1: E(aX + bY) = aE(X) + bE(Y) = aµ_X + bµ_Y
Rule E2: E(a + bY) = a + bE(Y) = a + bµ_Y
(a linear operator)

Rules of Variance
The special variance operator also has certain general properties:
Rule E3: If X and Y are independent, then $\sigma_{X+Y}^2 = \sigma_X^2 + \sigma_Y^2$.
Rule E4: If X and Y are independent, then $\sigma_{X-Y}^2 = \sigma_X^2 + \sigma_Y^2$.
General rule: If X and Y are independent, then $\sigma_{aX+bY}^2 = a^2\sigma_X^2 + b^2\sigma_Y^2$.

Example 3.41
Ex. 3.41: X = mass of cylinder from balance. Y = mass of cylinder from 2nd balance. Suppose σ_X = 0.03 and σ_Y = 0.04. Then, if we calculate the difference between the two weighings, X - Y, we know
$\sigma_{X-Y} = \sqrt{\sigma_X^2 + \sigma_Y^2} = \sqrt{0.0009 + 0.0016} = \sqrt{0.0025} = 0.05$

Independent Trials
DEF'N: The INDEPENDENT TRIALS MODEL occurs when
(i) n independent trials are studied
(ii) each trial results in a single binary obsv'n
(iii) each trial's success has (constant) probability: P{success} = p
Notice that if P{success} = p, then P{failure} = 1 - p.
We call this a BInS (Binary / Indep. / n is const. / Same p) setting.

Example 3.43
Ex. 3.43: Suppose 39% of organisms in a popl'n exhibit a mutant trait. Sample n = 5 organisms randomly and check for mutation:
Binary? (mutant vs. non-mutant)
Indep.? (if no bias in sampling)
n const.? (n = 5)
Same p? (p = 0.39)

Binomial Distribution
DEF'N: In a BInS setting, if we let Y = {# successes}, then Y has a BINOMIAL DISTRIBUTION. NOTATION: Y ~ Bin(n, p).
The binomial probability function is
$P\{Y = j\} = {}_nC_j \, p^j (1-p)^{n-j}$ (j = 0, 1, ..., n).

Binomial Coefficient
In the binomial probability function
$P\{Y = j\} = {}_nC_j \, p^j (1-p)^{n-j}$
the BINOMIAL COEFFICIENT is
${}_nC_j = \dfrac{n!}{j!\,(n-j)!}$
Also, j! is the FACTORIAL OPERATOR: j! = j(j - 1)(j - 2) ··· (2)(1). We define 0! = 1.

Factorial Operator
Example of factorial operator: at n = 5,
5! = (5)(4)(3)(2)(1) = 120
4! = (4)(3)(2)(1) = 24
3! = (3)(2)(1) = 6
2! = (2)(1) = 2
So:

j        0   1   2    3    4   5
5C_j     1   5   10   10   5   1

(Also see Table 3.6 on page 105 of text.) Values of nC_j are given in Table 2 (p. 674).

Table 3.6

Example 3.45
Ex. 3.45 (Ex. 3.43 cont'd): Y ~ Bin(5, 0.39). So
P{Y = 3} = 5C_3 (.39)³ (.61)² = (10)(.0593)(.3721) = 0.2207
Can also find this via DoStat. Table 3.7 gives the full distribution. Figure 3.15 gives a probability histogram.

Binomial Mean & Variance
If Y ~ Bin(n, p), the population mean and variance are:
µ_Y = np and σ_Y² = np(1 - p)
Ex. 3.49: Y = {# Rh+ in BInS sample}. We're given p = P{Rh+} = 0.85. So, if n = 6, we expect µ_Y = (6)(0.85) = 5.1 Rh+ in the sample, with σ_Y² = (6)(.85)(.15) = 0.765, so that σ_Y = √0.765 = 0.87 Rh+.

Chapter 4: The Normal Distribution
Selected tables and figures from Samuels, M. L., and Witmer, J. A., Statistics for the Life Sciences, 3rd Ed. © 2003, Prentice Hall, Upper Saddle River, NJ. Used by permission.
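The binomial probability function and the mean/variance formulas above translate directly into code. This is a minimal sketch using `math.comb` from the Python standard library; the function name `binom_pmf` is illustrative and is not the DoStat calculator.

```python
from math import comb, sqrt

def binom_pmf(j, n, p):
    """P{Y = j} for Y ~ Bin(n, p): nCj * p^j * (1 - p)^(n - j)."""
    return comb(n, j) * p ** j * (1 - p) ** (n - j)

# Ex. 3.45: Y ~ Bin(5, 0.39)
n, p = 5, 0.39
pmf = [binom_pmf(j, n, p) for j in range(n + 1)]
print(f"P{{Y = 3}} = {pmf[3]:.4f}")
print(f"sum of all masses = {sum(pmf):.4f}")

# Ex. 3.49: Rh+ with n = 6, p = 0.85
rh_mean = 6 * 0.85                # np
rh_var = 6 * 0.85 * (1 - 0.85)    # np(1 - p)
rh_sd = sqrt(rh_var)
print(f"mean = {rh_mean:.1f}, variance = {rh_var:.3f}, SD = {rh_sd:.2f}")
```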

Normal Distribution
DEF'N: A continuous random variable Y has a NORMAL DISTRIBUTION if its probability density can be written as
$f(y) = \dfrac{1}{\sigma_Y\sqrt{2\pi}} \, e^{-(y-\mu_Y)^2/(2\sigma_Y^2)}$ over -∞ < y < ∞.
NOTATION: Y ~ N(µ_Y, σ_Y²)
The mean and variance of a normal dist'n are E(Y) = µ_Y and E[(Y - µ_Y)²] = σ_Y².

Normal Dist'n Examples
The Normal distribution appears in many biological contexts:
Ex. 4.1: Y = serum cholesterol (mg/dl)
Ex. 4.2: Y = eggshell thickness (mm)
Ex. 4.3: Y = nerve cell interspike times (ms)

Normal Curve
The Normal density curve is
(i) continuous over -∞ < y < ∞
(ii) symmetric about y = µ
(iii) unimodal, and hence bell-shaped

Figure 4.7
Since each (µ, σ²) pair indexes a different Normal dist'n, this represents a rich family of curves.

Standard Normal
DEF'N: The STANDARDIZATION FORMULA for Y ~ N(µ, σ²) is
Z = (Y - µ)/σ
This is often called a Z-score.
If Y ~ N(µ, σ²), then Z ~ N(0, 1) and we say Z has a STANDARD NORMAL dist'n. Std. Normal probab's are tabulated in Table 3 (p. 675) and on the text's inside front cover.

(Portion of) Table 3, p. 675

P(Z ≤ z)
Example (p. 124): Suppose Z ~ N(0, 1). Find P{Z ≤ 1.53}. In Table 3: P{Z ≤ 1.53} = 0.937.
Hint: always draw the picture.

P(a < Z ≤ b)
If Z ~ N(0, 1), and we find P{Z ≤ 1.53} = 0.937, notice then that P{Z > 1.53} = 1 - 0.937 = 0.063.
Example (p. 125): Suppose Z ~ N(0, 1); then
P{-1.20 < Z ≤ 0.80} = P{Z ≤ 0.80} - P{Z ≤ -1.20} = 0.7881 - 0.1151 = 0.6730 (see Fig. 4.11).
Can also find Std. Normal probabilities using DoStat's Normal dist'n calculator!

Empirical Rule, revisited
If Z ~ N(0, 1), it mimics the empirical rule very closely. The same effect holds for any Y ~ N(µ, σ²).
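The table lookups above can be reproduced in software. The sketch below uses `statistics.NormalDist` from the Python standard library (available from Python 3.8) as a stand-in for Table 3 or the DoStat calculator; because it does not round to table precision, its answers can differ from table-based ones in the fourth decimal place.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal: mean 0, SD 1

p_le = Z.cdf(1.53)                        # P{Z <= 1.53}
p_gt = 1 - p_le                           # P{Z > 1.53}, by the complement rule
p_between = Z.cdf(0.80) - Z.cdf(-1.20)    # P{-1.20 < Z <= 0.80}

# Empirical rule check: area within 1, 2, and 3 SDs of the mean
within = [Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)]

print(f"P(Z <= 1.53) = {p_le:.3f}, P(Z > 1.53) = {p_gt:.3f}")
print(f"P(-1.20 < Z <= 0.80) = {p_between:.3f}")
print("within 1, 2, 3 SDs:", [f"{w:.3f}" for w in within])
```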

Example 4.5
Ex. 4.5: Y = length of herrings (mm). Suppose Y ~ N(54, 20.25), so σ = 4.5. Then we know
$Z = \dfrac{Y - 54}{4.5} \sim N(0, 1)$
(a) What % of fish are less than 60 mm long?
$P[Y < 60] = P\left[\dfrac{Y - 54}{4.5} < \dfrac{60 - 54}{4.5}\right] = P[Z < 1.33] = 0.9082$

Example 4.5 (cont'd)
Y = length of herrings ~ N(54, 20.25).
(c) What % of fish are between 51 and 60 mm long?
$P[51 < Y < 60] = P\left[\dfrac{51 - 54}{4.5} < \dfrac{Y - 54}{4.5} < \dfrac{60 - 54}{4.5}\right] = P[-0.67 < Z < 1.33]$
= P[Z < 1.33] - P[Z < -0.67] = 0.9082 - 0.2514 = 0.6568

Std. Normal Tail Areas
We can also INVERT the std. Normal table (Table 3): Z ~ N(0, 1), so find P{Z < 1.96} = 0.9750. Then we know P{Z > 1.96} = 1 - 0.9750 = 0.0250. So, 2.5% of the std. normal popl'n exceeds 1.96.
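The herring probabilities in Ex. 4.5 can be computed either by standardizing first or by working with N(54, 4.5²) directly. The sketch below, which assumes Python's `statistics.NormalDist` in place of Table 3, shows that the two routes agree. Its answers differ slightly from the table-based ones because the table rounds z to two decimals (1.33 rather than 1.3333...).

```python
from statistics import NormalDist

mu, sigma = 54, 4.5            # variance 20.25, so SD = 4.5
herring = NormalDist(mu, sigma)
Z = NormalDist()

# (a) P[Y < 60], directly and via the Z-score
p_less_60 = herring.cdf(60)
p_less_60_via_z = Z.cdf((60 - mu) / sigma)

# (c) P[51 < Y < 60]
p_51_60 = herring.cdf(60) - herring.cdf(51)

# Tail area beyond 1.96 on the standard normal scale
upper_tail = 1 - Z.cdf(1.96)

print(f"P(Y < 60) = {p_less_60:.4f}")
print(f"P(51 < Y < 60) = {p_51_60:.4f}")
print(f"P(Z > 1.96) = {upper_tail:.4f}")
```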

z_α
More generally, if we find some number z_α such that P{Z ≤ z_α} = 1 - α, we know P{Z > z_α} = α, and vice versa.

Std. Normal Critical Point
DEF'N: The UPPER-α CRITICAL POINT from Z ~ N(0, 1) is the value z_α such that P{Z > z_α} = α.
Find z_α by:
carefully inverting Table 3
reading off the bottom row (df = ∞) of Table 4 (p. 677)
using DoStat's Normal dist'n calculator

Percentiles
DEF'N: The point of a distribution below which p% lies is the pth PERCENTILE of the dist'n.
If Z ~ N(0, 1), z_α is the 100(1 - α)th percentile of Z.
We often ask what value is the pth percentile of a biological population (see Ex. 4.6).

Example 4.6

Example 4.6 (cont'd)
We want to find y* such that P{Y < y*} = 0.70. This is
$P\left[\dfrac{Y - 54}{4.5} < \dfrac{y^* - 54}{4.5}\right] = P\left[Z < \dfrac{y^* - 54}{4.5}\right]$
Now, from Table 3 we find P{Z < 0.52} = 0.6985, which is close to 0.70. This tells us to equate (approximately) 0.52 and (y* - 54)/4.5:
y* - 54 ≈ (0.52)(4.5)
y* ≈ (0.52)(4.5) + 54 = 56.34

Example 4.6 (conclusion)
So, we find that approximately 70% (69.85%, exactly) of herring are less than 56.34 mm long.
Notice also that we derived the critical point z_0.30 ≈ 0.52. (More precisely, we found z_0.3015 = 0.52.)
Using DoStat, we can find z_0.30 = 0.5244: this yields the exact value y* = (0.5244)(4.5) + 54 = 56.36 for Example 4.6.
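Inverting the normal table, as in Ex. 4.6, corresponds to evaluating the inverse CDF. The sketch below assumes Python's `statistics.NormalDist.inv_cdf` in place of Table 3 or DoStat; the helper name `upper_critical_point` is illustrative.

```python
from statistics import NormalDist

Z = NormalDist()
herring = NormalDist(mu=54, sigma=4.5)

def upper_critical_point(alpha):
    """z_alpha such that P{Z > z_alpha} = alpha, i.e. the 100(1 - alpha)th percentile."""
    return Z.inv_cdf(1 - alpha)

z_030 = upper_critical_point(0.30)     # about 0.5244
y_star = herring.inv_cdf(0.70)         # 70th percentile of herring length
y_star_via_z = 54 + z_030 * 4.5        # un-standardize: y* = mu + z * sigma

print(f"z_0.30 = {z_030:.4f}")
print(f"y* = {y_star:.2f} mm (via z: {y_star_via_z:.2f} mm)")
print(f"z_0.025 = {upper_critical_point(0.025):.2f}")
```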

Assessing Normality
Since many statistical procedures are based on having data from a normal population, we need ways to assess whether it is reasonable to use a normal model.
We have shown that a histogram can be distorted by the selection of group size (binwidth), so we will consider a statistical graph called a normal probability plot or QQ plot.

QQ Plots
A QQ Plot can be used to assess normality of the data.
A QQ Plot is a scatter plot of the ordered pairs of normal score (x) vs. data value (y) for all values in a data set.
If the plotted points show a linear pattern, we can infer that the data values follow a normal distribution.

Example
The heights in inches of 11 women are listed below. Check the assumption that the data are normally distributed.

Normal Probability Plot of the Height Data

Example
Measurements made for 62 mammals.
Reference: "Sleep in Mammals: Ecological and Constitutional Correlates," by Allison, T., and Cicchetti, D. (1976), Science, November 12, vol. 194.
Variable: Brain Weight (g)

Normal Probability Plot of Brain Weight (g)

Normal Probability Plot of log(Brain Weight (g))

Chapter 5: Sampling Distributions
Selected tables and figures from Samuels, M. L., and Witmer, J. A., Statistics for the Life Sciences, 3rd Ed. © 2003, Prentice Hall, Upper Saddle River, NJ. Used by permission.

Sampling Variability
Question: If Y is random, say Y ~ N(µ, σ²), and we take a random sample, Y_1, Y_2, ..., Y_n, aren't the Y_i's also random?
And, if the Y_i's are random, aren't any statistics based on them, such as Ȳ or S², random too?
This is known as SAMPLING VARIABILITY.

Sampling Distributions
A sample statistic may itself have a probab. dist'n; this is called the SAMPLING DISTRIBUTION of the statistic.
Think of it as repeatedly taking a new sample from the same popl'n and finding each sample mean, ad infinitum. What will the probab. histogram/density function of the sample mean look like?
The textbook calls this a Meta-Experiment.

Binary Data
Recall that for Y ~ Bin(n, p) we can estimate p, if it is unknown, using the SAMPLE PROPORTION:
$\hat{p} = \dfrac{Y}{n}$
Since Y is random, so is this statistic. What is the sampling dist'n of $\hat{p}$?

Example 5.4
Ex. 5.4: Y = # of people with 20/15 vision ("superior"). Say n = 2. We are given P{superior} = 0.3. Let $\hat{p}$ = Y/n. What are its possible values?
Clearly, Y = 0, 1, or 2. Thus, e.g.,
$P[\hat{p} = 1/2] = P[Y = 1] = {}_2C_1 (.3)^1 (.7)^1 = (2)(.3)(.7) = .42$

Example 5.4 (cont'd)
Sampling dist'n of $\hat{p}$:

  j    $\hat{p} = j/2$    $P(Y = j) = P(\hat{p} = j/2)$
  0    0                  .49
  1    1/2                .42
  2    1                  .09

Large-Sample Dist'n
Example 5.4 gives the sampling dist'n at n = 2. The effort gets harder as n increases. (Try it at n = 10.) Fig. 5.5 shows the effect at larger n:

Continuous Data
DEF'N: Given a random sample, $Y_1, Y_2, \ldots, Y_n$, where $E(Y_i) = \mu$ and $E[(Y_i - \mu)^2] = \sigma^2$, then
(i) the POPL'N MEAN of $\bar{Y}$ is $E(\bar{Y}) = \mu$
(ii) the POPL'N VARIANCE of $\bar{Y}$ is $\sigma_{\bar{Y}}^2 = \sigma^2/n$
(iii) the POPL'N SD of $\bar{Y}$ is $\sigma_{\bar{Y}} = \sigma/\sqrt{n}$
Notice: same popl'n mean, while SD $\downarrow$ as n $\uparrow$.
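The mean and SD of the sample mean can be checked with a small simulated meta-experiment. This is only a sketch under stated assumptions: the popl'n is taken to be normal with illustrative values µ = 500 and σ = 120, and `simulate_sample_means` is a hypothetical helper.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Draw `reps` samples of size n from N(mu, sigma^2); return each sample mean."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(reps)
    ]

mu, sigma = 500.0, 120.0  # illustrative popl'n values (assumption)
for n in (4, 16, 64):
    means = simulate_sample_means(mu, sigma, n, reps=5000)
    print(
        f"n = {n:2d}  mean of Ybar = {statistics.fmean(means):6.1f}  "
        f"SD of Ybar = {statistics.stdev(means):5.1f}  "
        f"sigma/sqrt(n) = {sigma / n ** 0.5:5.1f}"
    )
```

Across the three sample sizes the simulated mean stays near µ, while the simulated SD tracks σ/√n and shrinks as n grows.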

Distribution of the Sample Mean
If $Y_i \sim$ i.i.d. $N(\mu, \sigma^2)$ for $i = 1, \ldots, n$, then $\bar{Y} \sim N(\mu, \sigma^2/n)$
Once again:
Same mean
SD $\downarrow$ as n $\uparrow$
So, more precision as n $\uparrow$

Example 5.9
Ex. 5.9: Y = weight of seeds ~ N(500, 14400). Suppose n = 4. Since Y is normal, so is the sample mean: $\bar{Y} \sim N(500, 14400/4) = N(500, 3600)$
And so, $Z = \dfrac{\bar{Y} - 500}{\sqrt{3600}} = \dfrac{\bar{Y} - 500}{60} \sim N(0, 1)$

Example 5.9 (cont'd)
So, e.g., $P[\bar{Y} > 550] = P\left[\dfrac{\bar{Y} - 500}{60} > \dfrac{550 - 500}{60}\right] = P[Z > 0.83] = 1 - P[Z < 0.83] = 1 - 0.7967 = 0.2033$
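The probability in Example 5.9 can also be computed without a Z table. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative. It computes the probability both at the exact z value and at z rounded to two decimals, as a table lookup would.

```python
from math import sqrt
from statistics import NormalDist

mu, var, n = 500, 14400, 4

# Sampling dist'n of the sample mean: N(500, 14400/4), i.e. SD = 60
ybar_dist = NormalDist(mu, sqrt(var / n))

# P[Ybar > 550] using the exact z = 50/60
p_exact = 1 - ybar_dist.cdf(550)

# Same probability with z rounded to 0.83, mimicking a table lookup
z_rounded = round((550 - mu) / ybar_dist.stdev, 2)
p_table = 1 - NormalDist().cdf(z_rounded)

print(f"exact z:   {p_exact:.4f}")
print(f"z = {z_rounded}: {p_table:.4f}")
```

The two answers differ slightly in the third decimal place, purely because of rounding z before the lookup.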

CLT
Theorem: The CENTRAL LIMIT THEOREM states that for any i.i.d. random sample, $Y_1, Y_2, \ldots, Y_n$, where $E(Y_i) = \mu$ and $E[(Y_i - \mu)^2] = \sigma^2$, $\bar{Y} \mathrel{\dot\sim} N(\mu, \sigma^2/n)$ as $n \to \infty$.
For finite n the result holds only approximately, and the approximation improves as n $\uparrow$. (A powerful tool!)

CLT and Sample Size
Sometimes, the CLT kicks in after only a few observations ("small" n). But, sometimes we need a very large n:

Example 5.13
Ex. 5.13: Y = # eye facets in fruit fly. Clearly Y is a count and can't be exactly normal (see the idealized plot in Fig. 5.13). But, by about n = 32 we're close to normal:
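The effect of sample size on the CLT approximation can be illustrated by simulation. The sketch below does not use the fruit fly data; as an assumption, it draws from a right-skewed exponential popl'n and tracks how the skewness of the sample mean shrinks toward 0 (the value for a normal dist'n) as n grows. The helpers `skewness` and `sample_means` are hypothetical.

```python
import random
import statistics

def skewness(xs):
    """Sample skewness: average cubed standardized deviation."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean([((x - m) / s) ** 3 for x in xs])

def sample_means(n, reps, rng):
    """Means of `reps` samples of size n from a skewed Exponential(1) popl'n."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]

rng = random.Random(1)
for n in (1, 4, 32):
    means = sample_means(n, 5000, rng)
    print(f"n = {n:2d}  skewness of Ybar = {skewness(means):.2f}")
```

For this popl'n the skewness of the sample mean is 2/√n in theory, so the simulated values should fall steadily as n increases, though they are still not exactly 0 at n = 32.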

Unbiased Estimation
Parameters such as µ or p are usually unknown, and we use the sample data to estimate them.
DEF'N: If an estimator $\hat{\theta}$ of an unknown parameter $\theta$ has the property that $E[\hat{\theta}] = \theta$ we say it is an UNBIASED ESTIMATOR. (A BIASED estimator is not unbiased.)
For instance, we know $E(\bar{Y}) = \mu$, so $\bar{Y}$ is unbiased for µ.

Standard Error
DEF'N: The STANDARD ERROR of a point estimator is the estimated SD (the square root of the variance) of the estimator: $SE[\hat{\theta}] = \sqrt{\widehat{\mathrm{Variance}}[\hat{\theta}]}$
DEF'N: The STANDARD ERROR OF THE MEAN (SEM) is the estimated SD of the sample mean: $SE(\bar{Y}) = \sqrt{\hat{\sigma}_{\bar{Y}}^2} = \sqrt{S_Y^2/n} = S_Y/\sqrt{n}$

Examples
Ex.: Y = stem length of soybean plants (cm). n = 13: We find $\bar{Y}$ and $S^2$ from the data, with $S = 1.22$ cm, so $SE(\bar{Y}) = \sqrt{S^2/13} = 1.22/\sqrt{13} \approx 0.34$ cm
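The SEM formula is simple to compute directly. The sketch below defines a hypothetical helper `sem` and applies it to a small made-up data set (not the soybean measurements, which are not reproduced here). It then recomputes the soybean SEM from the summary values S = 1.22 and n = 13 alone.

```python
from math import sqrt
from statistics import stdev

def sem(data):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return stdev(data) / sqrt(len(data))

# Hypothetical measurements, for illustration only
lengths = [10.0, 12.0, 11.0, 13.0, 9.0]
print(f"S = {stdev(lengths):.3f}, SEM = {sem(lengths):.3f}")

# Soybean example from summary statistics: S = 1.22 cm, n = 13
soybean_sem = 1.22 / sqrt(13)
print(f"soybean SEM = {soybean_sem:.2f} cm")
```

Note that `stdev` uses the n − 1 divisor, matching the usual definition of the sample SD S.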

SE vs. SD
DO NOT confuse the SE with the SD!
In Ex. 6.2, the SD of the sample was $S = \sqrt{S^2} = 1.22$, but the SEM was $1.22/\sqrt{13} \approx 0.34$. (Usually, we round SEM to 2 signif. digits.)
Notice here again that as n $\uparrow$, SEM $\downarrow$: more precision in larger samples.


More information

Probability and Conditional Probability

Probability and Conditional Probability Probability and Conditional Probability Bret Hanlon and Bret Larget Department of Statistics University of Wisconsin Madison September 27 29, 2011 Probability 1 / 33 Parasitic Fish Case Study Example 9.3

More information

Histograms allow a visual interpretation

Histograms allow a visual interpretation Chapter 4: Displaying and Summarizing i Quantitative Data s allow a visual interpretation of quantitative (numerical) data by indicating the number of data points that lie within a range of values, called

More information

IV. The Normal Distribution

IV. The Normal Distribution IV. The Normal Distribution The normal distribution (a.k.a., a the Gaussian distribution or bell curve ) is the by far the best known random distribution. It s discovery has had such a far-reaching impact

More information

Special distributions

Special distributions Special distributions August 22, 2017 STAT 101 Class 4 Slide 1 Outline of Topics 1 Motivation 2 Bernoulli and binomial 3 Poisson 4 Uniform 5 Exponential 6 Normal STAT 101 Class 4 Slide 2 What distributions

More information

Exam 1 Review (Notes 1-8)

Exam 1 Review (Notes 1-8) 1 / 17 Exam 1 Review (Notes 1-8) Shiwen Shen Department of Statistics University of South Carolina Elementary Statistics for the Biological and Life Sciences (STAT 205) Basic Concepts 2 / 17 Type of studies:

More information

Fourier and Stats / Astro Stats and Measurement : Stats Notes

Fourier and Stats / Astro Stats and Measurement : Stats Notes Fourier and Stats / Astro Stats and Measurement : Stats Notes Andy Lawrence, University of Edinburgh Autumn 2013 1 Probabilities, distributions, and errors Laplace once said Probability theory is nothing

More information

Sociology 6Z03 Topic 10: Probability (Part I)

Sociology 6Z03 Topic 10: Probability (Part I) Sociology 6Z03 Topic 10: Probability (Part I) John Fox McMaster University Fall 2014 John Fox (McMaster University) Soc 6Z03: Probability I Fall 2014 1 / 29 Outline: Probability (Part I) Introduction Probability

More information

Descriptive Statistics Methods of organizing and summarizing any data/information.

Descriptive Statistics Methods of organizing and summarizing any data/information. Introductory Statistics, 10 th ed. by Neil A. Weiss Chapter 1 The Nature of Statistics 1.1 Statistics Basics There are lies, damn lies, and statistics - Mark Twain Descriptive Statistics Methods of organizing

More information

The empirical ( ) rule

The empirical ( ) rule The empirical (68-95-99.7) rule With a bell shaped distribution, about 68% of the data fall within a distance of 1 standard deviation from the mean. 95% fall within 2 standard deviations of the mean. 99.7%

More information

3.2 Intoduction to probability 3.3 Probability rules. Sections 3.2 and 3.3. Elementary Statistics for the Biological and Life Sciences (Stat 205)

3.2 Intoduction to probability 3.3 Probability rules. Sections 3.2 and 3.3. Elementary Statistics for the Biological and Life Sciences (Stat 205) 3.2 Intoduction to probability Sections 3.2 and 3.3 Elementary Statistics for the Biological and Life Sciences (Stat 205) 1 / 47 Probability 3.2 Intoduction to probability The probability of an event E

More information

Preliminary Statistics course. Lecture 1: Descriptive Statistics

Preliminary Statistics course. Lecture 1: Descriptive Statistics Preliminary Statistics course Lecture 1: Descriptive Statistics Rory Macqueen (rm43@soas.ac.uk), September 2015 Organisational Sessions: 16-21 Sep. 10.00-13.00, V111 22-23 Sep. 15.00-18.00, V111 24 Sep.

More information

Summarizing and Displaying Measurement Data/Understanding and Comparing Distributions

Summarizing and Displaying Measurement Data/Understanding and Comparing Distributions Summarizing and Displaying Measurement Data/Understanding and Comparing Distributions Histograms, Mean, Median, Five-Number Summary and Boxplots, Standard Deviation Thought Questions 1. If you were to

More information