CHAPTER SUMMARIES MAT102 Dr J Lubowsky Page 1 of 13 Chapter 1: Introduction to Statistics

Misleading Information: Surveys and advertising claims can be biased by unrepresentative samples, biased questions, inappropriate comparisons and errors in data. Graphs can be misleading through the use of broken scales, pictographs and errors. Be aware of the motives of whoever is presenting the data.

Statistics: The collection, analysis and interpretation of data
Population: The total collection of individuals or objects under consideration
Parameter: A number that describes a characteristic of a population
Sample: The portion of the population selected for study
Statistic: A number that describes a characteristic of a sample
Descriptive Statistics: The use of numerical and/or visual techniques to summarize data
Inferential Statistics: Drawing a conclusion about the population from the sample
Representative Sample: A sample that has the pertinent characteristics of the population in the same proportions as the population

Chapter 2: Organizing and Presenting Data

Variable: A property or characteristic of an object, person or thing about which information is recorded
Categorical Data: Data which fall into categories
Numerical Data: Data values (numbers) which are the result of measurements
Continuous Data (Numerical): Data which can take on any value between two numbers
Discrete Data (Numerical): Data which can take on only certain separate values (e.g. counts)
Distribution: A presentation of data along with the number of times each data value occurs.

Four Levels of Measurement
Nominal (Qualitative Data): Names or categories
Ordinal (Qualitative or Quantitative): Categories which have an inherent order
Interval (Quantitative): Differences between values have meaning, but ratios do not.
Ratio (Quantitative): Both differences and ratios between values have meaning.

Stem and Leaf Plot: A display of data in which the rightmost digit of each data value (the leaf) is initially ignored, and the numbers represented by the remaining digits (the stems) are listed down in ascending order. Each leaf is then written beside its stem. For example, the values below would be written like this:

Values: 199, 199, 200, 201, 202, 210, 231, 231, 233

19 | 99
20 | 012
21 | 0
22 |
23 | 113

Histogram: A bar graph of data whose vertical coordinate shows how many items of data (the frequency) fall into each of a series of ranges (classes). The bars must touch each other, and the height of each bar is proportional to the number of data items in its class.
[Figure: adjacent histogram classes, labelled with the lower class boundary, the class width and the upper class boundary.]
Class Mark = the middle value of a class

Constructing a Histogram:
Determine the class width: $\text{Class Width} = \dfrac{\text{largest data value} - \text{smallest data value}}{\text{number of classes}}$
(If the class width turns out to be a whole number, increase it by 1. Otherwise, round the class width up to the next whole number.)
$\text{Relative Frequency} = \dfrac{\text{class frequency}}{\text{total number of data values}}$

Frequency Polygon: A straight-line graph connecting the midpoints of the tops of the bars of a histogram. The end points of the graph drop to touch the x-axis half a class width outside the histogram. Usually either the histogram or the frequency polygon is drawn, but not both.
[Figure: frequency polygon]

Shapes of Distributions: Symmetrical, Skewed to Right, Skewed to Left
[Figure: examples of the three shapes]
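As a rough illustration that is not part of the course materials, the stem-and-leaf rows, the class-width rule and the relative frequencies above can be computed in Python. The function names are hypothetical. The sketch assumes non-negative whole-number data with one-digit leaves, and it starts the first class at the smallest data value.

```python
import math
from collections import defaultdict


def stem_and_leaf(values):
    """Stem = every digit except the last, leaf = last digit (non-negative integers assumed)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(str(v % 10))
    # Walk every stem from smallest to largest so empty stems (like 22) still appear.
    return [f"{s} | {''.join(leaves.get(s, []))}".rstrip()
            for s in range(min(leaves), max(leaves) + 1)]


def class_width(values, num_classes):
    """Class width by this summary's rule: a whole number gets +1, otherwise round up."""
    raw = (max(values) - min(values)) / num_classes
    return int(raw) + 1 if raw.is_integer() else math.ceil(raw)


def relative_frequencies(values, num_classes):
    """Class frequency / total number of data values, first class starting at the minimum."""
    width = class_width(values, num_classes)
    lower = min(values)
    counts = [0] * num_classes
    for v in values:
        counts[(v - lower) // width] += 1
    return [c / len(values) for c in counts]


data = [199, 199, 200, 201, 202, 210, 231, 231, 233]
for row in stem_and_leaf(data):
    print(row)
print(class_width(data, 5))          # (233 - 199) / 5 = 6.8, rounded up to 7
print(relative_frequencies(data, 5))
```

The "whole number plus 1" rule also guarantees that the largest value falls inside the last class, so the index computed in `relative_frequencies` never runs past the end of the list.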

Chapter 3: Numerical Techniques for Describing Data

$\mu = \text{Population Mean} = \dfrac{\text{Sum of all data values}}{\text{Number of data values}} = \dfrac{\sum x}{N}$
$\bar{x} = \text{Sample Mean} = \dfrac{\text{Sum of all SAMPLE values}}{\text{Total number of all SAMPLE values}} = \dfrac{\sum x}{n}$

Adding a constant to each member of a population or sample increases the mean by that constant. Multiplying each member of a population or sample by a constant multiplies the mean by that constant.

Median: The middle value in a sorted list of data values. If the number of values is odd, the median is the middle value. If the number of values is even, the median is the average of the two middle values.
Mode: The most frequently occurring data value.

$\sigma = \text{Population Standard Deviation} = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$ = the square root of the average of the squared deviations from the mean of the population. (It is roughly the average deviation from the mean.)
$s = \text{Sample Standard Deviation} = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$ = the square root of the sum of the squared deviations from the mean of the sample, divided by n - 1 rather than n. (It is roughly the average deviation from the mean.)

The Empirical Rule states that, for bell-shaped (approximately normal) data:
about 68% of data values fall within 1σ of μ
about 95% of data values fall within 2σ of μ
about 99.7% of data values fall within 3σ of μ

The mean, median and mode are measures of the central tendency of the data. The standard deviation measures the extent to which the data spread about the mean.

x = raw score: the actual data value (e.g. dollars, feet, inches, degrees, etc.)
z = z-score: the distance of a data value from the mean in units of standard deviations
To compute one from the other: $z = \dfrac{x - \mu}{\sigma}$ and $x = z\sigma + \mu$

PR(x) = Percentile Rank of x = percentage of data values less than x:
$PR = \dfrac{B + \frac{1}{2}E}{N} \times 100$ (rounded to the nearest whole number)
where B is the number of data values less than x, E is the number of data values equal to x (including x), and N is the total number of data values.
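The Chapter 3 formulas can be checked with a short Python sketch on a hypothetical data set. The helper names are illustrative only, and Python's standard `statistics` module is used to cross-check the hand-written versions.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set


def population_sd(xs):
    """sigma: square root of the average squared deviation from mu (divide by N)."""
    mu = sum(xs) / len(xs)
    return math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))


def sample_sd(xs):
    """s: squared deviations from x-bar, divided by n - 1."""
    xbar = sum(xs) / len(xs)
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1))


def z_score(x, mu, sigma):
    return (x - mu) / sigma


def raw_score(z, mu, sigma):
    return z * sigma + mu


def percentile_rank(x, xs):
    """PR = (B + E/2) / N * 100, rounded half up to a whole number."""
    below = sum(1 for v in xs if v < x)
    equal = sum(1 for v in xs if v == x)
    return math.floor(100 * (below + 0.5 * equal) / len(xs) + 0.5)


mu = statistics.mean(data)
sigma = population_sd(data)
print(mu, statistics.median(data), statistics.mode(data))   # 5 4.5 4
print(sigma, sample_sd(data))
print(z_score(9, mu, sigma), raw_score(2.0, mu, sigma))      # 2.0 9.0
print(percentile_rank(5, data))   # (4 + 0.5 * 2) / 8 * 100 = 62.5, rounds to 63
```

Python's built-in `round()` rounds exact halves to the nearest even number (62.5 would become 62). For that reason the sketch rounds half up explicitly, which matches "rounded to the nearest whole number".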

Percentile: P(x) = the data value that has x% of the data values below it. (Example: Joe's height is 72, which makes him taller than 89% of his class. The Percentile Rank of 72 is PR(72) = 89%, and the 89th Percentile is P(89%) = 72.)

The minimum is the smallest data value.
The Q1 point is the 25th Percentile, that is, 25% of the data values are below it.
The median is the 50th Percentile, that is, 50% of the data values are below it.
The Q3 point is the 75th Percentile, that is, 75% of the data values are below it.
The maximum is the largest data value.
The box and whisker plot illustrates these five values.
[Figure: box and whisker plot marking Min, Q1, Median, Q3 and Max.]

Interquartile Range = IQR = Q3 - Q1 = the distance between Q3 and Q1
An outlier is any data value more than 3 IQR above Q3 or more than 3 IQR below Q1.

Chapter 4: Linear Regression and Correlation

Given: Two continuous variables (e.g. Weight of Car vs. Miles/Gallon)
Are these variables linearly correlated?
Ho: There is no linear correlation between Car Weight and Miles/Gallon (ρ = 0)
Ha: Car Weight and Miles/Gallon are linearly correlated, with ρ > 0, ρ < 0, or ρ ≠ 0
Perform a LinRegTTest. It calculates the correlation coefficient (r) and the values of a and b in the linear regression equation y = a + bx. The linear regression equation lets you predict the value of the dependent variable (y, Miles/Gallon) when you substitute a value for the independent variable (x, Weight of Car). (You will need to enter your data values, Car Weights and the associated Miles/Gallon, into L1 and L2.) The calculator also returns the p-value. If p < α, reject Ho.
The calculator also returns r², the Coefficient of Determination. It tells you the proportion of the variance in y explained by the independent variable, that is, essentially how well the regression equation predicts the value of the dependent variable from the independent variable.
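The box-plot values, the outlier rule and the LinRegTTest quantities can be sketched in Python as follows. Several points here are assumptions:
- The quartiles use a median-of-halves method, excluding the middle value when n is odd. This is assumed to match the TI-83/84's 1-Var Stats; other software may give slightly different Q1 and Q3.
- The outlier multiplier defaults to the 3 × IQR stated above, although many texts use 1.5 × IQR.
- The car data are hypothetical.
- The sketch stops at the t statistic (df = n - 2), because a t-distribution p-value is not available in Python's standard library. Compare t with the t-table instead.

```python
import math
import statistics


def five_number_summary(xs):
    """Min, Q1, median, Q3, max; quartiles are medians of the lower and upper halves."""
    s = sorted(xs)
    half = len(s) // 2
    lower, upper = s[:half], s[half + len(s) % 2:]   # middle value dropped when n is odd
    return s[0], statistics.median(lower), statistics.median(s), statistics.median(upper), s[-1]


def outliers(xs, k=3):
    """Values more than k * IQR above Q3 or below Q1 (k = 3 as in this summary)."""
    _, q1, _, q3, _ = five_number_summary(xs)
    iqr = q3 - q1
    return [x for x in xs if x > q3 + k * iqr or x < q1 - k * iqr]


def linear_regression(xs, ys):
    """Least-squares y = a + b x, plus r, r^2 and the t statistic for Ho: rho = 0."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = ybar - b * xbar
    r = sxy / math.sqrt(sxx * syy)
    t = r * math.sqrt((n - 2) / (1 - r ** 2))   # undefined when |r| = 1
    return a, b, r, r ** 2, t


weight = [2, 3, 4, 5, 6]     # hypothetical car weights (thousands of lb)
mpg = [30, 27, 25, 22, 20]   # hypothetical Miles/Gallon
print(linear_regression(weight, mpg))   # a about 34.8, b about -2.5, r about -0.998
print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 40]))   # (1, 3, 5.5, 8, 40)
print(outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 40]))              # [40]
```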

Chapter 5: Probability

Sample Space: The set of all possible outcomes of an experiment.
Definition: Classical (a priori) Probability: If an experiment or situation has a number (n) of possible outcomes, all equally likely, then the probability that any one outcome will occur is 1/n (n = the size of the sample space). So, for equally likely outcomes,
$P(A) = \dfrac{\text{number of ways event A can occur}}{\text{total number of outcomes in the sample space}}$
Definition: Relative Frequency (a posteriori) determination of probability: Perform an experiment many times. The probability of Event A occurring is
$P(A) = \dfrac{\text{number of times event A occurred}}{\text{total number of times the experiment was repeated}}$
An outcome that has a probability of zero can never occur. An outcome that has a probability of one will occur every time. A probability is always a number between 0 and 1 ($0 \le p \le 1$).
If only two outcomes are possible in an experiment, A and B, then P(B) = 1 - P(A).

Addition Rule (for Mutually Exclusive Events): An event can be defined in an experiment as getting any one of a number of outcomes. For example, the event that I win can be defined as getting an even number when tossing a die (that is, getting the number 2, 4 or 6). For a fair 6-sided die, this probability is P(even number) = P(2) + P(4) + P(6) = 1/6 + 1/6 + 1/6 = 3/6 = ½.
Addition Rule (for Mutually Exclusive Events): If two events, A and B, are mutually exclusive, then the probability that event A or B will occur is the sum of their probabilities, P(A or B) = P(A) + P(B).

Multiplication Rule (for Independent Events): Two events are independent when the outcome of one has nothing to do with the outcome of the other. For example, in tossing two dice, the outcome of the second die has nothing to do with the outcome of the first die. The probability of getting two 6s is P(6 & 6) = P(6) × P(6) = 1/6 × 1/6 = 1/36.
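The die examples above can be verified by counting equally likely outcomes, which is the classical definition of probability. This is a small Python sketch: `classical_probability` is a hypothetical helper, and exact fractions are used so that no rounding hides the answer.

```python
from fractions import Fraction
from itertools import product

die = range(1, 7)  # a fair six-sided die: six equally likely outcomes


def classical_probability(event, sample_space):
    """Number of outcomes in the event divided by the size of the sample space."""
    outcomes = list(sample_space)
    favorable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favorable, len(outcomes))


# Addition Rule: "even" is made of the mutually exclusive outcomes 2, 4 and 6.
p_even = classical_probability(lambda d: d % 2 == 0, die)
print(p_even)  # 1/2

# Multiplication Rule: two independent dice give 6 x 6 = 36 equally likely pairs.
two_dice = list(product(die, repeat=2))
p_two_sixes = classical_probability(lambda pair: pair == (6, 6), two_dice)
print(p_two_sixes)                                       # 1/36
print(p_two_sixes == Fraction(1, 6) * Fraction(1, 6))    # True
```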
Multiplication Rule (for Independent Events): If two events, A and B, are independent, then the probability that both A and B will occur is the product of their probabilities, P(A & B) = P(A) × P(B).

Chapter 6: Random Variables and Discrete Probability Distributions

If we toss a coin 4 times and we assume
1. that the coin is fair (P(Head) = ½), and
2. that each flip has no effect on any other flip (the flips are INDEPENDENT),
then the probabilities of getting 0, 1, 2, 3 or 4 heads are shown below as a Discrete Probability Distribution.
[Figure: bar chart of Probability P(r) versus Number of Heads (r), for r = 0 to 4.]
Each of the probabilities above was calculated from the formula
$P(r \text{ heads}) = {}_4C_r\,(0.5)^r\,(1 - 0.5)^{4 - r}$ for r = 0, 1, 2, 3 and 4.

In general, suppose there are n independent trials, the outcome of each trial is considered either a success (S) or a failure (F), and the probability of success in a single trial is p. Then the probability of having r successes in n trials is
$P(r \text{ successes}) = {}_nC_r\,p^r\,(1 - p)^{n - r}$, where ${}_nC_r = \dfrac{n!}{r!\,(n - r)!}$
Example: Using the Addition Rule, the probability of having fewer than 2 Heads in 4 tosses would be calculated as P(0 Heads) + P(1 Head) = .063 + .250 = .313.

Chapter 7: Continuous Probability Distributions

When the number of tosses (trials) becomes large, say 32, and we wish to know the probability of getting, say, 14 or fewer Heads, it becomes cumbersome to add all the probabilities from 0 to 14, P(0) + P(1) + P(2) + … + P(14). Note that the sum from 0 to 14 is actually the area under the curve from 0 to 14 (shown in dark grey).
[Figure: binomial distribution for n = 32 tosses, probability versus number of heads (0 to 32), with the bars from 0 to 14 shaded dark grey.]
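The binomial formula and the 4-toss example can be reproduced with a few lines of Python. In this sketch, `binomial_pmf` is a hypothetical helper, and `math.comb(n, r)` computes nCr.

```python
from math import comb


def binomial_pmf(r, n, p):
    """P(r successes in n independent trials) = nCr * p^r * (1 - p)^(n - r)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)


# Four tosses of a fair coin, as in the example above.
distribution = [binomial_pmf(r, 4, 0.5) for r in range(5)]
print(distribution)      # [0.0625, 0.25, 0.375, 0.25, 0.0625]

# Fewer than 2 heads: add the mutually exclusive cases r = 0 and r = 1.
p_fewer_than_2 = distribution[0] + distribution[1]
print(p_fewer_than_2)    # 0.3125, which the summary rounds to .313
```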

Therefore, we approximate the above discrete probability distribution with the normal curve and calculate the area under the normal curve, which is much easier.
[Figure: n = 32 tosses, probability versus number of heads (0 to 32).]
The mean μ and standard deviation σ of the normal approximation to a binomial probability distribution are μ = np and $\sigma = \sqrt{np(1-p)}$. However, this approximation may only be used if np > 5 and n(1 - p) > 5.

To find the area under the Normal Distribution between any two values, we enter the z-scores of those values into the calculator. In the above problem, we want the probability of getting between 0 and 14 Heads in 32 tosses of a fair coin. We calculate μ = np = 32 × 0.5 = 16 and $\sigma = \sqrt{np(1-p)} = \sqrt{32 \times 0.5 \times (1 - 0.5)} = 2.828$. We need the z-scores of 0 and 14. To include all the area of the binomial distribution between those two values, we extend the range by an extra 0.5 on each side, that is, we use the range from -0.5 to 14.5. The z-scores are
$z_{left} = \dfrac{-0.5 - 16}{2.828} = -5.834$ and $z_{right} = \dfrac{14.5 - 16}{2.828} = -0.530$
Then we use normalcdf(-5.834, -0.530) = 0.298, which is the probability that we will get between 0 Heads and 14 Heads when we toss a fair coin 32 times.

The total area under the normal curve is 1 (one). Areas are always expressed as decimals.
If the area to the left of a z-score is known, then invNorm(area) gives the z-score at the right side of the area.
The area between 2 z-scores is normalcdf(z1, z2).
The area to the left of a z-score is normalcdf(-E99, z).
[Figure: shaded areas under the normal curve, to the left of z and between z1 and z2.]
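The 32-toss calculation above can be sketched in Python. The sketch uses the standard library's `statistics.NormalDist` in place of the calculator: `cdf` differences play the role of normalcdf, and `inv_cdf` plays the role of invNorm. It also adds up the exact binomial probabilities, which is the "cumbersome" sum from Chapter 7, to show how close the approximation is.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 32, 0.5
# The approximation is only used when np > 5 and n(1 - p) > 5.
assert n * p > 5 and n * (1 - p) > 5

mu = n * p                       # 16.0
sigma = sqrt(n * p * (1 - p))    # about 2.828

# Extend 0..14 heads by 0.5 on each side: -0.5..14.5 on the continuous scale.
z_left = (-0.5 - mu) / sigma     # about -5.834
z_right = (14.5 - mu) / sigma    # about -0.530

standard_normal = NormalDist()   # mean 0, standard deviation 1
approx = standard_normal.cdf(z_right) - standard_normal.cdf(z_left)   # like normalcdf(z_left, z_right)

# Exact binomial answer for comparison: P(0) + P(1) + ... + P(14).
exact = sum(comb(n, k) for k in range(15)) / 2 ** n

print(round(approx, 3), round(exact, 3))   # 0.298 0.298
print(standard_normal.inv_cdf(0.90))       # like invNorm(0.90), about 1.2816
```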

Many kinds of data can be described with a normal distribution, for example, the heights of students, the IQs of students, the lifetimes of tires. The curve at right shows that the heights of students are normally distributed around a mean of 65 inches with a standard deviation of 5 inches. This means that about 34% (a proportion of 0.34) of the students have heights between 65 and 70 inches. It also means that the probability is 0.34 that a student chosen at random from this population will be between 65 and 70 inches tall.
[Figure: normal curve of student heights, μ = 65 inches, σ = 5", with 34% of the area shaded between 65 and 70 inches.]

Chapter 8: The Sampling Distribution of the Mean

Notation:
Population: μ is the mean of a population; σ is the standard deviation of the population; N is the number of items (people or things) in the population.
Sample: $\bar{x}$ is the mean of a sample; s is the standard deviation of the sample; n is the number of items (people or things) in the sample.

If we take every possible sample of size n from a population and calculate the mean of each of those samples, the mean of all those sample means, $\mu_{\bar{x}}$, will be equal to the mean of the population, μ. The standard deviation of the distribution of all those sample means is called the standard error of the mean. It is equal to the population standard deviation divided by the square root of the sample size. In mathematical notation, $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$, where $\sigma_{\bar{x}}$ is called the standard error of the mean.
If the population is normally distributed, the sample means will be normally distributed.
The Central Limit Theorem states that regardless of the shape of the population distribution, the sampling distribution of the mean approaches a Normal Distribution as the sample size becomes large (generally n > 30).
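The claims $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ can be checked by listing every possible sample from a tiny hypothetical population. One assumption matters here: the samples are drawn with replacement. Without replacement from a small finite population, the standard error is smaller than σ/√n, and a finite population correction is needed. The last lines reproduce the 34% from the heights example.

```python
from itertools import product
from math import isclose, sqrt
from statistics import NormalDist, fmean, pstdev

population = [2, 4, 6, 8]   # hypothetical population, N = 4
n = 2                       # sample size

# Every possible ordered sample of size n, drawn WITH replacement: 4**2 = 16 samples.
sample_means = [fmean(sample) for sample in product(population, repeat=n)]

mu = fmean(population)          # 5.0
sigma = pstdev(population)      # sqrt(5), about 2.236
mu_xbar = fmean(sample_means)   # mean of all the sample means
se_xbar = pstdev(sample_means)  # standard deviation of the sample means

print(isclose(mu_xbar, mu))               # True: mean of sample means equals mu
print(isclose(se_xbar, sigma / sqrt(n)))  # True: standard error equals sigma / sqrt(n)

# The heights example: heights ~ Normal(mean 65, sd 5); P(65 < height < 70).
heights = NormalDist(mu=65, sigma=5)
print(round(heights.cdf(70) - heights.cdf(65), 4))   # 0.3413, the "34%" above
```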

Proportions
Proportion = the fraction of the population with a certain characteristic.
p = X/N is the population proportion, where X = the number of members of the population with the characteristic and N = the population size.
$\hat{p}$ = x/n is the sample proportion, where x = the number of members of the sample with the characteristic and n = the sample size.
The sampling distribution of the proportion can be approximated by a normal distribution if np > 5 and n(1 - p) > 5.
The mean of that distribution equals the population proportion: $\mu_{\hat{p}} = p$.
The standard deviation (standard error of the proportion) of that distribution is $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}}$.
[Figure: Sampling Distribution of the Proportion, a normal curve centred at $\mu_{\hat{p}}$ with standard deviation $\sigma_{\hat{p}}$.]
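Here is a minimal Python sketch of the proportion formulas above, using a hypothetical p and n. `proportion_sampling_distribution` is an illustrative name. The helper refuses to return a normal approximation when the np > 5 and n(1 - p) > 5 conditions fail.

```python
from math import sqrt


def proportion_sampling_distribution(p, n):
    """Mean and standard error of p-hat, after checking np > 5 and n(1 - p) > 5."""
    if not (n * p > 5 and n * (1 - p) > 5):
        raise ValueError("normal approximation not justified: need np > 5 and n(1 - p) > 5")
    return p, sqrt(p * (1 - p) / n)


# Hypothetical example: 40% of a population has the characteristic; samples of n = 100.
mean, standard_error = proportion_sampling_distribution(0.4, 100)
print(mean, standard_error)   # 0.4 and about 0.049
```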

Chapter 9: Confidence Intervals of Means and Proportions

Methods of Estimation

Critical values $t_{upper}$ of the t-distribution for confidence intervals (the interval runs from $t_{lower} = -t_{upper}$ to $t_{upper}$):

Confidence         99%       98%       95%       90%
df = n - 1    t(99.5%)    t(99%)  t(97.5%)    t(95%)
1               63.657    31.821    12.706     6.314
2                9.925     6.965     4.303     2.920
3                5.841     4.541     3.182     2.353
4                4.604     3.747     2.776     2.132
5                4.032     3.365     2.571     2.015
6                3.707     3.143     2.447     1.943
7                3.499     2.998     2.365     1.895
8                3.355     2.896     2.306     1.860
9                3.250     2.821     2.262     1.833
10               3.169     2.764     2.228     1.812
11               3.106     2.718     2.201     1.796
12               3.055     2.681     2.179     1.782
13               3.012     2.650     2.160     1.771
14               2.977     2.624     2.145     1.761
15               2.947     2.602     2.131     1.753
16               2.921     2.583     2.120     1.746
17               2.898     2.567     2.110     1.740
18               2.878     2.552     2.101     1.734
19               2.861     2.539     2.093     1.729
20               2.845     2.528     2.086     1.725
21               2.831     2.518     2.080     1.721
22               2.819     2.508     2.074     1.717
23               2.807     2.500     2.069     1.714
24               2.797     2.492     2.064     1.711
25               2.787     2.485     2.060     1.708
26               2.779     2.479     2.056     1.706
27               2.771     2.473     2.052     1.703
28               2.763     2.467     2.048     1.701
29               2.756     2.462     2.045     1.699
30               2.750     2.457     2.042     1.697
35               2.724     2.438     2.030     1.690
40               2.704     2.423     2.021     1.684
45               2.690     2.412     2.014     1.679
50               2.678     2.403     2.009     1.676
60               2.660     2.390     2.000     1.671
80               2.639     2.374     1.990     1.664
100              2.626     2.364     1.984     1.660
200              2.601     2.345     1.972     1.653
500              2.586     2.334     1.965     1.648
Normal           2.576     2.326     1.960     1.645

Estimate of the Population Mean (μ), σ known
ZInterval (Stats): σ: population std dev; n: sample size; $\bar{x}$: sample mean; C-Level: confidence level
Confidence Interval: $\bar{x} - z_c\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_c\dfrac{\sigma}{\sqrt{n}}$
Margin of Error: $E = z_c\dfrac{\sigma}{\sqrt{n}}$

Estimate of the Population Mean (μ), σ unknown
TInterval (Stats): s: sample std dev; n: sample size; $\bar{x}$: sample mean; C-Level: confidence level
Confidence Interval: $\bar{x} - t_c\dfrac{s}{\sqrt{n}} < \mu < \bar{x} + t_c\dfrac{s}{\sqrt{n}}$
Margin of Error: $E = t_c\dfrac{s}{\sqrt{n}}$

Estimate of the Population Proportion (p), used when $n\hat{p} > 5$ and $n(1 - \hat{p}) > 5$
1-PropZInt: x: no. of cases in sample; n: sample size; $\hat{p} = x/n$; C-Level: confidence level
Confidence Interval: $\hat{p} - z_c s_{\hat{p}} < p < \hat{p} + z_c s_{\hat{p}}$, where $s_{\hat{p}} = \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$
Margin of Error: $E = z_c s_{\hat{p}}$
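The three interval formulas above can be sketched in Python with hypothetical numbers. The function names only echo the calculator commands; they are not TI functions.
- $z_c$ comes from `statistics.NormalDist().inv_cdf`, which reproduces the "Normal" row of the t-table.
- $t_c$ is passed in by hand from the t-table, because the standard library has no t-distribution.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided critical value z_c, e.g. about 1.960 for 95% confidence."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def z_interval(xbar, sigma, n, confidence):
    """Interval for the population mean when sigma is known."""
    e = z_critical(confidence) * sigma / sqrt(n)   # margin of error E
    return xbar - e, xbar + e


def t_interval(xbar, s, n, t_c):
    """Interval for the population mean when sigma is unknown; t_c read from the table, df = n - 1."""
    e = t_c * s / sqrt(n)
    return xbar - e, xbar + e


def one_prop_z_interval(x, n, confidence):
    """Interval for a population proportion, after the n*p-hat > 5 checks."""
    p_hat = x / n
    if not (n * p_hat > 5 and n * (1 - p_hat) > 5):
        raise ValueError("need n*p_hat > 5 and n*(1 - p_hat) > 5")
    s_p_hat = sqrt(p_hat * (1 - p_hat) / n)
    e = z_critical(confidence) * s_p_hat
    return p_hat - e, p_hat + e


# Hypothetical data: x-bar = 50, n = 25, sigma (or s) = 10, 95% confidence.
print(z_interval(50, 10, 25, 0.95))          # about (46.08, 53.92)
print(t_interval(50, 10, 25, 2.064))         # df = 24, 95% column: about (45.87, 54.13)
print(one_prop_z_interval(40, 100, 0.95))    # about (0.304, 0.496)
```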

Chapter 10: Introduction to Hypothesis Testing

Null Hypothesis, Ho: Always try to reject this statement...
Alternative Hypothesis, Ha: ...so that you can accept this statement as true.

Examples:
Ho: Joe's tires last about 50,000 miles. Ha: Joe's tires last less than 50,000 miles. (Directional)
Ho: The average college student works 20 hrs/week. Ha: The average college student works > 20 hrs/week. (Directional)
Ho: The average Whoopie bar contains 180 calories. Ha: The average Whoopie bar does not contain 180 calories. (Non-directional)
Ho: The unemployment rate is 5.5%. Ha: The unemployment rate is less than 5.5%. (Directional)

Hypothesis Testing Procedure
1. Formulate both hypotheses.
2. Determine the model to test the null hypothesis.
3. Formulate the decision rule.
4. Analyze the sample data.
5. State the conclusion.

Type I Error: The error you make when you incorrectly reject the null hypothesis, that is, when you reject Ho even though it is true.
Level of Significance, α: The probability of making a Type I error that you are willing to accept when you test.
p-value, p: Based on the test statistic, the p-value is the probability, assuming Ho is true, of getting a test statistic at least as extreme as the one observed. It is the smallest significance level at which you would reject Ho, so rejecting Ho whenever p < α keeps the Type I error rate at α.

Diagram Descriptors: Using the example of a one-tail test with a Normal distribution
1. Normal or t-distribution curve
2. Mean and standard error of the mean, and $z_c$, the critical value (the z- or t-score of the data value beyond which you will reject Ho)
3. α, significance (shaded area to the left of $z_c$)
4. z, test statistic (the z- or t-score of the experimental result)
5. p-value, area under the curve beyond the test statistic, z
[Figure: distribution of sample means or sample proportions, marking the mean, the standard error, the critical value $z_c$ or $t_c$, the shaded "Reject Ho" region of area α, the "Fail to reject Ho" region, and the z or t test statistic.]
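The diagram descriptors above can be traced numerically for a one-tail (left) z-test. This is a minimal sketch under stated assumptions: the numbers for "Joe's tires last less than 50,000 miles" are hypothetical (n = 36 tires, sample mean 48,800 miles, σ = 4,000 miles, α = 0.05), and `left_tail_z_test` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist


def left_tail_z_test(xbar, mu0, sigma, n, alpha):
    """One-tail (left) z-test of Ho: mu = mu0 against Ha: mu < mu0."""
    standard_error = sigma / sqrt(n)
    z = (xbar - mu0) / standard_error        # test statistic
    z_c = NormalDist().inv_cdf(alpha)        # critical value: area alpha to its left
    p_value = NormalDist().cdf(z)            # area beyond (to the left of) z
    reject = p_value < alpha                 # same decision as z < z_c
    return z, z_c, p_value, reject


# Hypothetical numbers for "Joe's tires last less than 50,000 miles".
z, z_c, p_value, reject = left_tail_z_test(48_800, 50_000, 4_000, 36, 0.05)
print(round(z, 3), round(z_c, 3), round(p_value, 4), reject)   # -1.8 -1.645 0.0359 True
```

Both decision routes agree. The test statistic (-1.8) lies further from the mean than the critical value (-1.645), and the p-value (0.0359) is less than α (0.05), so Ho is rejected.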

Chapter 11: Hypothesis Testing Involving One Population

Formulate the null and alternative hypotheses. Look for directional words (e.g. greater, more, less) to determine whether you need a one-tailed test (1TT) or a two-tailed test (2TT).
Determine the important information: If the problem gives the population standard deviation, σ, the distribution of sample means is normal (use z). If the problem gives only the sample standard deviation, s, the distribution of sample means is t-distributed (use t).
Use α to determine the critical value ($z_c$ or $t_c$) for rejection of Ho.
Use the data to determine the test statistic (z or t).
If the test statistic is further from the mean than the critical value, REJECT Ho.
Or, using the calculator: Use the test statistic to determine the p-value. If the p-value is less than α, REJECT Ho.

Hypothesis test involving a mean
For a normal distribution, perform a Z-Test to determine the p-value. If p < α, reject Ho.
For a t-distribution, perform a T-Test to determine the p-value. If p < α, reject Ho.

Hypothesis test involving a proportion
First check that np > 5 and n(1 - p) > 5.
Perform a 1-PropZTest to determine the p-value. If p < α, reject Ho.

Chapter 13: Hypothesis Test Involving Two Population Means

Given: Two populations and a sample from each.
Mean of Sample_1 = $\bar{x}_1$; Sample Std Dev of Sample_1 = $s_1$
Mean of Sample_2 = $\bar{x}_2$; Sample Std Dev of Sample_2 = $s_2$
Test whether the means of the two populations are different, that is, test whether we can reject the null hypothesis at a given α:
Ho: $\mu_1 - \mu_2 = 0$
Ha: $\mu_1 - \mu_2 > 0$, or Ha: $\mu_1 - \mu_2 < 0$, or Ha: $\mu_1 - \mu_2 \neq 0$
Perform a 2-SampTTest and calculate the p-value. If p < α, reject Ho.
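Below is a Python sketch of the statistics behind the 1-PropZTest and the 2-SampTTest, using hypothetical data. Several points are assumptions:
- The proportion test uses the hypothesized p0 in the standard error.
- The two-sample statistic is the unpooled version, with Welch's degrees of freedom. This is assumed to correspond to the calculator's "Pooled: No" setting.
- The two-sample p-value is not computed, since Python's standard library has no t-distribution. The sketch compares t with the t-table instead.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_test(x, n, p0, alternative="less"):
    """z statistic and p-value for Ho: p = p0 using the normal approximation."""
    if not (n * p0 > 5 and n * (1 - p0) > 5):
        raise ValueError("need n*p0 > 5 and n*(1 - p0) > 5")
    z = (x / n - p0) / sqrt(p0 * (1 - p0) / n)
    cdf = NormalDist().cdf
    if alternative == "less":
        p_value = cdf(z)
    elif alternative == "greater":
        p_value = 1 - cdf(z)
    else:  # two-tailed: Ha is "not equal"
        p_value = 2 * (1 - cdf(abs(z)))
    return z, p_value


def two_sample_t(xbar1, s1, n1, xbar2, s2, n2):
    """Unpooled t statistic for Ho: mu1 - mu2 = 0, with Welch's degrees of freedom."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = (xbar1 - xbar2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical: 40 unemployed in a sample of 1000, testing Ha: p < 0.055.
print(one_prop_z_test(40, 1000, 0.055))      # z about -2.08, p about 0.019

# Hypothetical samples: x-bar1 = 52, s1 = 6, n1 = 30; x-bar2 = 48, s2 = 8, n2 = 35.
print(two_sample_t(52, 6, 30, 48, 8, 35))    # t about 2.30, df about 62
```

For Ha: μ1 - μ2 > 0 at α = 0.05, t ≈ 2.30 exceeds the table value 1.671 (df = 60 row, 90% column, used conservatively for df ≈ 62), so Ho would be rejected.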

Chapter 14: Chi-Square

The Chi-Square test determines whether there is a dependence between two categorical variables.
Given: Two variables of categorical data (e.g. ProLife/ProChoice vs. Gender)
Are these variables independent? (If you reject Ho, you can conclude that the variables are dependent, but you can never prove them independent.)

            Pro Life    Pro Choice
Male           A            B
Female         C            D

where A, B, C and D are the numbers of people in each category.
Ho: Gender and Choice are independent
Ha: Gender and Choice are NOT independent (i.e. they are dependent)
Perform a χ²-Test and calculate the p-value. If p < α, reject Ho. (You will need to set up Matrix A and enter your observed data before doing the test.)
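What the χ²-Test computes for a 2×2 table like the one above can be sketched in Python, with some assumptions:
- The counts are hypothetical.
- No continuity correction is applied.
- The p-value relies on a fact specific to 1 degree of freedom, which is the case for any 2×2 table: P(χ² > x) = erfc(√(x/2)). This lets the standard library's `math.erfc` stand in for a chi-square table.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Chi-square test of independence for the table [[A, B], [C, D]]."""
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    total = a + b + c + d
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total   # count expected if independent
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # A 2x2 table has (2 - 1)(2 - 1) = 1 degree of freedom; for 1 df,
    # P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: A = 30, B = 20 (Male); C = 20, D = 30 (Female).
chi2, p_value = chi_square_2x2(30, 20, 20, 30)
print(chi2, round(p_value, 4))   # 4.0 0.0455
```

With α = 0.05, p ≈ 0.0455 < α, so for these made-up counts Ho (independence) would be rejected.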