Lecture

Main Topics:
- Definitions: Statistics, Population, Sample, Random Sample, Statistical Inference
- Type of Data
- Scales of Measurement
- Describing Data with Numbers
- Describing Data Graphically

1. Definitions.

Example (unemployment): Suppose we want to know the unemployment rate in the country. This is the number of unemployed people divided by the number of people in the labor force. It is estimated by randomly selecting and surveying approximately 6000 adults. The unemployment rate for these 6000 adults is used to estimate the unemployment rate for the whole country.

Statistics: A science of information.
Population: The collection of all subjects we're interested in studying.
Sample: A subset of the population.
Random Sample: If a sample is selected randomly, that is, every subject in the population has the same chance of being chosen, then the sample is called a random sample.
Statistical Inference: Drawing conclusions about the population based on the sample.

2. Type of Data.
(1) Why worry about the type of data?
Different methods of analysis are appropriate for different types of data.

(2) Types of data
There are two main types of data:
- Qualitative (categorical) data: conveys a quality. Examples: occupation, gender, student's major, political affiliation.
- Quantitative (numerical) data: conveys a quantity. Examples: income in dollars, number of employees in a company, commuting distance in miles.

(3) Caution
Qualitative data can consist of numbers. For example, we might code men as 1 and women as 0, or we may code the quality of a product as 2 for excellent, 1 for good, and 0 for defective. But computing things like means of such codes doesn't make sense.

3. Scales of Measurement.
There are four generally used scales of measurement. From weakest to strongest, they are:
- Nominal Scale:
- Ordinal Scale:
- Interval Scale:
- Ratio Scale:
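To illustrate the caution about qualitative data coded as numbers, here is a minimal Python sketch. The list of quality codes is hypothetical, invented only for illustration; the point is that counts and proportions are sensible summaries of such codes, while their arithmetic mean is not.

```python
from collections import Counter

# Hypothetical product-quality codes (not from the notes):
# 2 = excellent, 1 = good, 0 = defective.
codes = [2, 1, 1, 0, 2, 2, 1, 0, 1, 1]
labels = {2: "excellent", 1: "good", 0: "defective"}

# Counts and proportions are meaningful for categorical data.
counts = Counter(labels[c] for c in codes)
proportions = {name: k / len(codes) for name, k in counts.items()}

print(counts)
print(proportions)
```

The codes only name categories, so the summary reports how often each category occurs rather than averaging the code values.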
4. Numerical Summaries of Data
Summarizing a data set numerically and graphically is very important. Numerical summaries we'll learn about include percentiles, the mean, and the standard deviation.

(1). Percentiles
Objective: Percentiles are mainly used to describe the distribution of quantitative data.

What is a percentile? The P-th percentile ($0 \le P \le 100$) of a group of numbers is that value below which lie P% of the numbers in the group.

Algorithm to find the P-th percentile:
1. Order the data from smallest to largest.
2. Compute the position of the P-th percentile, $(n+1)P/100$, where $n$ is the number of observations in the data set.
3. If the position is a whole number, the P-th percentile is the number at that position. If the position is not a whole number, take the weighted average of the two numbers surrounding the position: let $f$ be the fractional part of the position and let $i$ be the greatest integer less than the position. Let $a$ be the number at position $i$ and let $b$ be the number at position $i+1$. Then the P-th percentile is $(1-f)\,a + f\,b$.

Example (Example 1-2): A large department store collects data on sales made by each of its salespeople. The numbers of sales made on a given day by each of 20 salespeople are as follows:
9, 6, 12, 10, 13, 15, 16, 14, 14, 16, 17, 16, 24, 21, 22, 18, 19, 18, 20, 17
Find the 50th, 80th, and 90th percentiles of this data set.
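The percentile algorithm above can be sketched in Python as follows. This is an illustrative sketch, not the textbook's code: the function name `percentile`, the small data set `sample`, and the clamping of positions that fall outside the data are all assumptions made here.

```python
import math

def percentile(data, p):
    """P-th percentile using the (n + 1)P/100 position rule."""
    if not data:
        raise ValueError("data must not be empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    values = sorted(data)             # step 1: order the data
    n = len(values)
    position = (n + 1) * p / 100      # step 2: 1-based position
    # Assumption (not in the notes): a position outside 1..n is clamped
    # to the smallest or largest observation.
    if position <= 1:
        return values[0]
    if position >= n:
        return values[-1]
    i = math.floor(position)          # whole part of the position
    f = position - i                  # fractional part of the position
    a = values[i - 1]                 # number at position i
    if f == 0:
        return a                      # whole-number position
    b = values[i]                     # number at position i + 1
    return (1 - f) * a + f * b        # step 3: weighted average

# Hypothetical data with n = 9, chosen only to illustrate the rule.
sample = [12, 3, 25, 8, 15, 21, 7, 18, 10]
print(percentile(sample, 50))   # position 5.0
print(percentile(sample, 25))   # position 2.5
print(percentile(sample, 80))   # position 8.0
```

Library routines often default to a different interpolation rule, so their results can differ slightly from this one on small data sets.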
Comments:
- There are many different algorithms for computing percentiles. The algorithm that we're using is arguably not the best. The differences among the algorithms fade as $n$ gets large, so we'll stick with the text algorithm.
- The 50th percentile is also called the median.
- The 25th percentile is also called the first quartile.
- The 75th percentile is also called the third quartile.

(2). Mean
Objective: Measure the central tendency of the data set.

Let $x_1, x_2, \ldots, x_n$ be the observations in the data set. The mean of this data set is their average. More specifically,
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Example: Calculate the mean of the observations of Example 1-2.

Comments:
- The mean is not resistant to outliers; the median is.
- The mean alone is not enough to fully describe a data set.

Example: Two statistics classes take an exam. The first class has scores of 73, 74, 75, 76, 77; the second class has scores of 50, 60, 75, 90, 100. Both classes have a mean score of 75, but there is a big difference (the second class's scores are more variable) that is not reflected in the means.

(3). Range, interquartile range, variance and standard deviation
Objective: Measure the variability of the data set.

Measures of variability:
Range: Range = Maximum - Minimum;
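Interquartile Range: IQR = Third Quartile - First Quartile;
Variance:
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n}x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n}x_i\right)^2\right]$$
Standard Deviation: $s = \sqrt{s^2}$

Empirical Rule: For symmetric and bell-shaped data.
1. About 68% of the data lie within one standard deviation of the mean.
2. About 95% of the data lie within two standard deviations of the mean.
3. About 99.7% of the data lie within three standard deviations of the mean.

Chebyshev's Rule: For any data set.
1. At least 3/4 (75%) of the data lie within two standard deviations of the mean.
2. At least 8/9 (89%) of the data lie within three standard deviations of the mean.
3. In general, at least $1 - 1/k^2$ of the data lie within $k$ standard deviations of the mean (for $k > 1$).
4. It doesn't say anything about one standard deviation.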
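As a rough check of these formulas, the following Python sketch computes the mean, range, sample variance (in both the definition form and the computational form), and standard deviation for the two exam classes. It assumes the second class's last score is 100 and that the variance divides by n - 1; the function names are illustrative choices, not from the notes.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Definition form: sum of squared deviations divided by n - 1."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def sample_variance_shortcut(xs):
    """Computational form: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)

def chebyshev_bound(k):
    """Minimum fraction of data within k standard deviations, for k > 1."""
    if k <= 1:
        raise ValueError("Chebyshev's rule needs k > 1")
    return 1 - 1 / k ** 2

def fraction_within(xs, k):
    """Observed fraction of data within k standard deviations of the mean."""
    m = mean(xs)
    s = math.sqrt(sample_variance(xs))
    return sum(1 for x in xs if abs(x - m) <= k * s) / len(xs)

class_1 = [73, 74, 75, 76, 77]
class_2 = [50, 60, 75, 90, 100]   # assumption: the last score is 100

for name, scores in (("class 1", class_1), ("class 2", class_2)):
    s2 = sample_variance(scores)
    print(name,
          "mean:", mean(scores),
          "range:", max(scores) - min(scores),
          "variance:", s2,
          "stdev:", round(math.sqrt(s2), 3),
          "within 2 stdevs:", fraction_within(scores, 2),
          "Chebyshev minimum:", chebyshev_bound(2))
```

Both classes share the same mean, while the range and standard deviation separate them. The observed fraction within two standard deviations also meets the Chebyshev minimum of 3/4 in each case.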
5. Describing Data Graphically

Stem and Leaf Plots (applied to small numerical data sets).
Example: Here are Babe Ruth's home run totals for the 15 years he played for the Yankees.
54 59 35 41 46 25 47 60 54 46 49 46 41 34 22
Here is a stem and leaf plot of these data.
Sometimes a back-to-back stem and leaf plot allows us to quickly compare two data sets. Here is a back-to-back stem and leaf plot of Babe Ruth's home run totals and Mickey Mantle's home run totals.

Histogram (applied to numerical data sets of any size).
How to create a histogram?
1. Find a lower bound, a, and an upper bound, b, of the data set.
2. Divide the interval [a, b] into n small subintervals (classes) of equal length, where n here denotes the number of classes. The length of each subinterval is (b-a)/n.
3. Count how many observations fall into each subinterval. (The count is called the frequency.)
4. Calculate the relative frequency in each subinterval.
5. Construct an x-y coordinate system. Put the subintervals on the x-axis; the y-axis represents the frequency, relative frequency, or density. Over each subinterval draw a bar whose height equals the frequency, the relative frequency, or the density, which is defined by:
Density = Relative Frequency / Length of the subinterval.

Example (Mercury in lakes): Data were collected on mercury concentrations (parts per million) in 5 Florida lakes. Some of the data are .3, 7.00, 6.00, 0.44. Here are the data divided into classes.

Classes: 0 to 2, 2 to 4, 4 to 6, 6 to 8, 8 to 10, 10 to 12, 12 to 14, 14 to 16
[Table: Number of Lakes in each class; counts not legible]

Here is a histogram of the mercury data.
[Figure: Histogram of Mercury Level; x-axis: Mercury Level, y-axis: Density]

What do we learn from the above histogram?
- Non-symmetric shape.
- Many lakes with low mercury levels, many lakes with high levels, few in the middle.
- Levels are all between 0 and 16 ppm.
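The histogram steps can be sketched in Python as a table of frequencies, relative frequencies, and densities. The data values below are hypothetical (they are not the mercury data), and the rule for boundary values is an assumption made here: a value on a class boundary goes into the upper class, and the upper bound b goes into the last class.

```python
def histogram_table(data, a, b, n_classes):
    """Return rows of (lower, upper, frequency, relative frequency, density)."""
    if not data:
        raise ValueError("data must not be empty")
    if n_classes < 1 or b <= a:
        raise ValueError("need at least one class and b > a")
    width = (b - a) / n_classes            # length of each subinterval
    freq = [0] * n_classes
    for x in data:
        if not a <= x <= b:
            raise ValueError(f"{x} is outside [{a}, {b}]")
        # Assumed convention: boundary values go to the upper class,
        # except b itself, which goes to the last class.
        k = min(int((x - a) / width), n_classes - 1)
        freq[k] += 1
    rows = []
    for k, count in enumerate(freq):
        rel = count / len(data)            # relative frequency
        density = rel / width              # density = relative frequency / length
        rows.append((a + k * width, a + (k + 1) * width, count, rel, density))
    return rows

# Hypothetical measurements, for illustration only.
values = [0.5, 1.2, 1.9, 2.0, 3.5, 6.0, 7.0, 7.9]
rows = histogram_table(values, a=0, b=8, n_classes=4)
for lower, upper, count, rel, density in rows:
    print(f"{lower:>4} to {upper:<4} freq={count} rel={rel:.3f} density={density:.4f}")
```

On the density scale the bar areas (density times class length) add up to 1, which makes histograms with different class lengths comparable.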
Comments:
- Different choices for the number of subintervals lead to different-looking histograms.
- Endpoints.
  - Q: Should 6.00 go into the class 6 to 8 or the class 4 to 6?
  - A: Just be consistent; if it goes into 6 to 8, then 10.00 should go into 10 to 12.
- This histogram is drawn using a density scale on the y-axis. Sometimes a frequency or a relative frequency scale is used. With classes of equal length, the shape is the same no matter what the scale.

What we should look for in a histogram:
- Symmetric
- Symmetric and bell-shaped
- Skewed to the right
- Skewed to the left
- Short tailed
- Long tailed
- Unimodal or multimodal
- Etc.

Effect of shape on mean and median. The mean gets pulled in the direction of the skewness. For right-skewed data, the mean is greater than the median. For left-skewed data, the mean is less than the median.

Box plots. How to draw a box plot?
- The box extends from the first quartile to the third quartile.
- A line is drawn at the median.
- Whiskers extend from the upper quartile and lower quartile to the largest and smallest observations within a distance of 1.5*IQR of the quartiles.
- Points outside this range are called outliers and are plotted separately.

Example. Here are data on standardized reading scores of 5th graders.
48 67 73 8 83 86 9 93 94 94 94 95 97 98 98 99 00 0 0 0 03 05 06 07 08 5 7 3 34 49
Draw a box plot for this data set.
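The quantities needed to draw a box plot can be computed with a short Python sketch. It reuses the (n + 1)P/100 percentile rule from these notes for the quartiles. The data set is hypothetical, and the function names are assumptions made for illustration.

```python
import math

def percentile_sorted(values, p):
    """P-th percentile of an already sorted list, (n + 1)P/100 rule."""
    n = len(values)
    position = (n + 1) * p / 100
    if position <= 1:                 # assumed clamping at the ends
        return values[0]
    if position >= n:
        return values[-1]
    i = math.floor(position)
    f = position - i
    if f == 0:
        return values[i - 1]
    return (1 - f) * values[i - 1] + f * values[i]

def box_plot_summary(data):
    """Quartiles, whisker ends, and outliers for a box plot."""
    values = sorted(data)
    q1 = percentile_sorted(values, 25)
    median = percentile_sorted(values, 50)
    q3 = percentile_sorted(values, 75)
    iqr = q3 - q1
    low_limit = q1 - 1.5 * iqr        # whiskers may not go beyond these
    high_limit = q3 + 1.5 * iqr
    inside = [x for x in values if low_limit <= x <= high_limit]
    outliers = [x for x in values if x < low_limit or x > high_limit]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": outliers,
    }

# Hypothetical scores, for illustration only.
scores = [12, 30, 14, 2, 10, 16, 11, 15, 13]
summary = box_plot_summary(scores)
print(summary)
```

The whiskers stop at the most extreme observations that are still within 1.5*IQR of the quartiles; anything beyond is listed as an outlier to be plotted separately.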
Bar Charts and Pie Charts
Example: The following is the frequency table of the racial composition of Ingham County, according to the 2000 census. Note that the relative frequency of a category is just the proportion of the data that are in that category.

Race                           Frequency   Relative Frequency
White                             221935   0.795
Black or Afr. Am.                  30340   0.109
Am. Indian or Alaska Native         1528   0.005
Asian                              10273   0.037
Native Hawaiian etc.                 143   0.001
Other                               6746   0.024
Two or more races                   8355   0.030
Total                             279320   1.0

Following are a pie chart and a bar chart of the data.
[Figure: Pie Chart of Race]
[Figure: Chart of Race (bar chart); x-axis: Race, y-axis: Count]
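The relative frequencies in such a table can be recomputed from the counts with a few lines of Python. The counts below are taken to be the 2000 census figures for Ingham County; treat them as an assumption if your copy of the table reads differently.

```python
# Assumed counts (2000 census, Ingham County); verify against the table.
counts = {
    "White": 221935,
    "Black or Afr. Am.": 30340,
    "Am. Indian or Alaska Native": 1528,
    "Asian": 10273,
    "Native Hawaiian etc.": 143,
    "Other": 6746,
    "Two or more races": 8355,
}

total = sum(counts.values())

# Relative frequency = category count / total count.
relative = {race: count / total for race, count in counts.items()}

for race, count in counts.items():
    print(f"{race:<30}{count:>8}{relative[race]:>8.3f}")
print(f"{'Total':<30}{total:>8}{sum(relative.values()):>8.3f}")
```

The bar heights in a bar chart can use either the counts or these proportions; the pie chart slices are proportional to the relative frequencies.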