Exploring Data. How to Explore Data


Exploring Data

Statistics is the art and science of learning from data. This may include:
Designing appropriate tools to collect data.
Organizing data in a meaningful way.
   o Displaying data with appropriate graphs.
   o Summarizing data with numbers.
Using data to draw conclusions and make predictions.

Data are information in context.
Individuals are the objects described by a set of data. They may be people, animals, or things.
A variable is any attribute that can take different values for different individuals.
A categorical (or qualitative) variable assigns labels that place each individual into a particular group or category.
A quantitative variable takes number values that are quantities (counts or measurements) for which it makes sense to find an average.
Not every variable with a number value is quantitative! Examples: zip codes, ID numbers, grade levels (sometimes).
The distribution of a variable shows what values the variable takes and how often it takes each value. Distributions are summarized in tables and displayed in graphs.

How to Explore Data

Begin by examining each variable by itself. Then move on to study relationships among the variables.
Start with a graph or graphs. Then add numerical summaries.

Example: The following table shows information about several popular cell phone models. The columns are Phone, Operating System, Screen Size (inches), Internal Storage (GB), Expandable Storage, Rear Camera (megapixels), and Battery Life (Talk Time) (hours). The numeric columns are not legible in this transcription, so only the remaining columns are shown.

Phone                        Operating System   Expandable Storage
Apple iPhone 6S Plus         iOS                No
Apple iPhone 6s              iOS                No
Apple iPhone 6               iOS                No
BlackBerry DTEK 50           Android            Yes
BlackBerry Priv              Android            Yes
BlackBerry Leap              BlackBerry         Yes
LG X Skin                    Android            Yes
LG G5 SE                     Android            Yes
LG G5                        Android            Yes
Microsoft Lumia 650          Windows            Yes
Microsoft Lumia 950          Windows            Yes
Microsoft Lumia 950 XL       Windows            Yes
Samsung Galaxy Note 7        Android            Yes
Samsung Galaxy On 7 Pro      Android            Yes
Samsung Galaxy S7 Edge       Android            Yes

a) Who/what are the individuals in this data set?
Cell phone models.

b) What variables are measured? Identify each as categorical or quantitative. In what units were the quantitative variables measured?
Operating system (categorical), screen size (quantitative, inches), amount of internal storage (quantitative, GB), whether or not it has expandable memory (categorical), rear camera resolution (quantitative, megapixels), battery life (quantitative, hours).

c) Give the distributions of the following for the data set: screen size, internal storage, and presence of expandable memory.
[Tables: count of phones by screen size (in) and by internal storage (GB); values not legible.]

Expandable Memory?
No     20%
Yes    80%

Analyzing Categorical Data

The values of a categorical variable are labels for the categories, such as "male" or "female."
The distribution of a categorical variable gives the categories and either the count or proportion of individuals who fall into each category.
Proportion: The fraction of the total that possesses a certain attribute. Proportions can be expressed as fractions, decimals, or percentages.
Frequency: The number (count) of individuals in each category.
Relative Frequency: The proportion of individuals in each category.
Often, we organize categorical data into either a frequency table or a relative frequency table. (These are sometimes called frequency distributions and relative frequency distributions.)

Example: The following is a frequency table showing the distribution of responses to the question, "How do you eat corn on the cob?" Find the relative frequency distribution.

How do you eat corn on the cob?    Frequency    Relative Frequency
In rows                            28           28/41 = 0.683
In circles                         4            4/41 = 0.098
Bite wherever                      5            5/41 = 0.122
I don't eat corn on the cob        2            2/41 = 0.049
Cut the corn off the cob           2            2/41 = 0.049
Total                              41
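The step from a frequency table to a relative frequency table can be sketched in a few lines of Python. This is only an illustration: it assumes the corn counts are 28, 4, 5, 2, and 2 (which add to the stated total of 41), and `relative_frequencies` is a hypothetical helper name, not something from the notes.

```python
def relative_frequencies(counts):
    """Return each category's share of the total count."""
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}

# Assumed counts for the corn-on-the-cob question (total 41).
corn_counts = {
    "In rows": 28,
    "In circles": 4,
    "Bite wherever": 5,
    "I don't eat corn on the cob": 2,
    "Cut the corn off the cob": 2,
}

corn_rel_freq = relative_frequencies(corn_counts)
for category, proportion in corn_rel_freq.items():
    print(f"{category}: {proportion:.3f}")
```

Because each count is divided by the same total, the relative frequencies always add to 1 (up to rounding in the printed values).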

Categorical data are often displayed using bar graphs, pie charts, and segmented bar graphs.
A bar graph shows each category as a bar. The heights of the bars correspond to the frequencies or relative frequencies of the categories.
A pie chart shows each category as a sector or slice of a circle or "pie." The areas of the slices are proportional to the category frequencies or relative frequencies.
A segmented bar graph displays the distribution of a categorical variable as a single bar divided into segments. The height of each segment corresponds to the proportion of individuals in the category it represents. Segmented bar graphs use relative frequencies on the vertical axis.

Bar Graph Procedure:
1. Draw and label the axes. Put the name of the categorical variable under the horizontal axis. To the left of the vertical axis, indicate whether the graph shows the frequency (count) or relative frequency (proportion) of individuals in each category.
2. Scale the axes. Write the names of the categories at equally spaced intervals under the horizontal axis. On the vertical axis, start at 0, and place tick marks at equal intervals until you exceed the highest frequency or relative frequency of any category.
3. Draw bars above the category names. Make sure the bars are equal in width and leave gaps between them. The height of each bar should correspond to the frequency or relative frequency of the individuals in that category.

Pie Chart Procedure:
1. Draw a circle to represent the entire data set.
2. Calculate the size of the central angle for each slice: slice size = 360° × relative frequency of category.
3. Divide the circle into slices with the appropriate central angles. Use a protractor (or computer) to do this.
4. Label the slices appropriately!

Example: Draw a well-labeled bar graph and a well-labeled segmented bar graph of the corn data from the previous example.
[Figures: bar graph of relative frequency by method of corn eating (Rows, Circles, Bite Wherever, Don't Eat, Cut Off); segmented bar graph of method of corn eating with a vertical axis from 0% to 100%.]
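The central-angle rule from the pie chart procedure (slice size = 360° × relative frequency) can be checked with a short Python sketch. The categories and proportions below are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def central_angles(rel_freqs):
    """Central angle in degrees for each slice: 360 * relative frequency."""
    return {category: 360 * p for category, p in rel_freqs.items()}

# Hypothetical relative frequencies for three categories (they sum to 1).
example_rel_freqs = {"A": 0.50, "B": 0.30, "C": 0.20}
angles = central_angles(example_rel_freqs)
print(angles)
```

If the relative frequencies cover every part of a single whole, the angles add to 360°; a total that falls short or runs over is a sign that a pie chart is not appropriate for the data.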

Bar graphs can be used in more situations than pie charts and segmented bar graphs! Pie charts and segmented bar graphs can only be used when the data include all parts of a single whole!
   o Bar graphs can compare proportions of different groups who share some trait. For example, what proportions of sophomores, juniors, and seniors approve of Bingham's parking policy? A pie chart or segmented bar graph couldn't show this, because these proportions are not parts of the same whole.
   o Bar graphs can compare proportions in cases where individuals might fall into multiple categories. For example, what percent of students like pizza, what percent like spaghetti, and what percent like pancakes? Students could easily fall into multiple categories, so the percentages would add up to more than 100%. These data couldn't be displayed on a pie chart or segmented bar graph, but could still be displayed on a bar graph.
   o Bar graphs can be used in cases where information is missing. For example, we might know what category some of the individuals fall into, but not others. To display this kind of data in a pie chart or segmented bar graph, it would be necessary to add an "other" category.

Deceptive Graphs:
Watch out for graphs in which the width changes in addition to the height. The eye responds to area, so this makes the graph misleading. This happens a lot in pictographs.
Watch out for graphs where the axes don't start at zero (and/or are missing).

Watch out for unequally spaced intervals.
Watch out for pie charts or segmented bar graphs where the percentages don't add to 100%. This is a tip-off that they don't represent all the parts of a single whole.
Watch out for 3D graphs or graphs set at an angle. This distorts the data.
[Figure: 3D pie chart titled "Perception of 3D Pie Charts" with slices labeled Cool, Confusing, Misleading, and Unreadable.]

A two-way table (or contingency table) summarizes the relationship between two categorical variables for some group of individuals. The rows represent values of one variable and the columns represent values of the other variable.
A marginal relative frequency gives the percent or proportion of individuals that have a specific value for one categorical variable (ignoring the information about the other variable). It is calculated using the information in a margin of the table and dividing by the overall total number of individuals.
A marginal distribution gives the marginal relative frequencies for each of the values of a categorical variable.

Example: AP Statistics students were categorized according to their gender and how they like their bacon cooked. The results are given below. Calculate the marginal distribution of bacon preferences. Draw a graph of the results. Describe what you see.

                        Gender
Bacon Preference    Female    Male    Total
A Little Limp       6         4       10
Crispy              8         8       16
Extra Crispy        4         3       7
Don't Eat Bacon     7         1       8
Total               25        16      41

Bacon Preference    Marginal Relative Frequency
A Little Limp       10/41 = 0.244
Crispy              16/41 = 0.390
Extra Crispy        7/41 = 0.171
Don't Eat Bacon     8/41 = 0.195
Total               41/41 = 1

[Figure: bar graph of relative frequency by bacon preference (A Little Limp, Crispy, Extra Crispy, Don't Eat Bacon).]

The most popular way to eat bacon is crispy. About 39% of the students in the sample like their bacon this way. About 24% of students like their bacon a little limp, making this the second-most popular way to eat bacon. About 20% of students don't eat bacon. The least popular way to eat bacon is extra crispy. Only 17.1% of students like their bacon extra crispy.

We can also answer questions involving both categorical variables.
A joint relative frequency is an "and" relative frequency. It gives the proportion of individuals that fall in a specific category of one variable and a specific category of another variable. Joint relative frequencies are proportions of the overall total.

Example: What proportion of the students in the sample are males and like their bacon extra crispy?
3/41 = 7.3%

Example: What percent of students in the sample are females who don't eat bacon?
7/41 = 17.1%

To examine the relationships between variables, we need to calculate some well-chosen proportions from the counts in the table.
A conditional relative frequency gives the proportion of individuals with a specific value of one categorical variable among individuals who share a specific value of another categorical variable (the condition).

Example: What percent of the females in the sample like their bacon a little limp?
6/25 = 24%

Example: What proportion of the people who like their bacon crispy are female?
8/16 = 50%
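Marginal and joint relative frequencies both divide by the overall total; they differ only in whether the numerator is a margin total or a single cell. The Python sketch below illustrates this, assuming the cell counts implied by the fractions in these notes (for example, 6 of 25 females and 4 of 16 males like their bacon a little limp). The names `table`, `marginal`, and `joint` are illustrative only.

```python
# Cell counts assumed from the fractions quoted in the notes.
table = {
    "A Little Limp":   {"Female": 6, "Male": 4},
    "Crispy":          {"Female": 8, "Male": 8},
    "Extra Crispy":    {"Female": 4, "Male": 3},
    "Don't Eat Bacon": {"Female": 7, "Male": 1},
}

grand_total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of bacon preference: row total / overall total.
marginal = {pref: sum(row.values()) / grand_total for pref, row in table.items()}

def joint(pref, gender):
    """Joint relative frequency: one cell / overall total."""
    return table[pref][gender] / grand_total

print(grand_total)
print({pref: round(p, 3) for pref, p in marginal.items()})
print(round(joint("Extra Crispy", "Male"), 3))
```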

Question: Are either of the above conditional relative frequencies misleading? Why?
Hearing that 50% of the people who like their bacon crispy are female and 50% are male makes you think that males and females are equally likely to like bacon crispy. However, this is not true because the number of females in the sample is much higher than the number of males. In reality, only 8/25 = 32% of the females like their bacon crispy, while 8/16 = 50% of the males like their bacon crispy.

A conditional distribution gives the conditional relative frequencies for each of the values of a categorical variable among individuals with a specific value of another categorical variable.

Example: Using the data above, calculate the conditional distribution of bacon preference for each gender. (This means figure out what proportion of girls like their bacon each way and what proportion of boys like their bacon each way.)

Bacon Preference    Female          Male
A Little Limp       6/25 = 24%      4/16 = 25%
Crispy              8/25 = 32%      8/16 = 50%
Extra Crispy        4/25 = 16%      3/16 = 18.75%
Don't Eat Bacon     7/25 = 28%      1/16 = 6.25%
Total               100%            100%

To compare the conditional distributions of a categorical variable, we use side-by-side bar graphs (or comparative bar graphs). These display the distribution of a categorical variable for each value of another categorical variable. The bars are grouped together based on the values of one of the categorical variables and multiple distributions are placed side by side. Color-coding or keys are often used.

There is an association (or relationship) between two variables if knowing the value of one variable helps us predict the value of the other. If knowing the value of one variable does not help us predict the value of the other, then there is no association between the variables.
If the values of one variable are really different for different values of the other variable, then there is an association between the variables.
If the values of one variable are really similar for different values of the other variable, then there isn't an association between the variables.
Do not use the word "correlation" when you mean "association." Correlation has a very specific meaning in statistics, which we will talk about later in the year.

Example: Draw a side-by-side bar graph comparing the bacon preferences of males and females. Use relative frequencies for the vertical axis. Then draw a segmented bar graph for each gender. Describe what you see. Does there appear to be an association between gender and bacon preference? Explain.

[Figures: side-by-side bar graph of relative frequency by bacon preference, grouped by gender; segmented bar graphs for Female and Male (0% to 100%) with segments A Little Limp, Crispy, Extra Crispy, and Don't Eat Bacon.]

There is a definite association between gender and bacon preference. Specifically, females are much more likely than males to not eat bacon (28% of females don't eat bacon compared to only 6.25% of males). Also, males are more likely than females to prefer their bacon crispy (50% of males prefer their bacon crispy compared to 32% of females). Similar proportions of males and females prefer their bacon a little limp and extra crispy.
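A conditional distribution divides each count by its own group total rather than by the overall total. The Python sketch below shows that calculation for each gender, assuming the counts implied by the fractions in the notes (denominators 25 for females and 16 for males); `conditional_distribution` is a hypothetical helper name.

```python
# Counts assumed from the fractions quoted in the notes.
counts = {
    "Female": {"A Little Limp": 6, "Crispy": 8, "Extra Crispy": 4, "Don't Eat Bacon": 7},
    "Male":   {"A Little Limp": 4, "Crispy": 8, "Extra Crispy": 3, "Don't Eat Bacon": 1},
}

def conditional_distribution(group_counts):
    """Proportion in each category among the individuals in one group."""
    group_total = sum(group_counts.values())
    return {category: n / group_total for category, n in group_counts.items()}

by_gender = {gender: conditional_distribution(c) for gender, c in counts.items()}
for gender, dist in by_gender.items():
    print(gender, {category: f"{p:.2%}" for category, p in dist.items()})
```

Comparing the two printed distributions category by category is the numerical version of reading a side-by-side bar graph: large differences between groups suggest an association.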

Displaying Quantitative Data with Graphs

One of the most common parts of a statistical problem is finding an appropriate way to display data. Quantitative data can't be displayed the same way as categorical data (bar graphs and pie charts don't work). The most common ways to display quantitative data are dotplots, stemplots, histograms, and boxplots.

How to Examine the Distribution of a Quantitative Variable
Describe the overall pattern of a distribution by describing its shape, center, and variation.
Point out any outliers (unusually small or unusually large data values).
Always put your descriptions in context!

Describing Shape:
How many peaks does the distribution have? Don't count minor ups and downs, only major peaks. Ask yourself if there are distinct groups of individuals visible in the graph.
   o Unimodal: One peak (group).
   o Bimodal: Two peaks (groups).
   o Multimodal: Three or more peaks (groups).
If there are any major gaps between groups, describe their locations.
Is the distribution approximately symmetric or skewed?
   o If the right and left sides of the graph are close to mirror images of each other, describe the distribution as approximately symmetric. Always use the words "approximately" or "roughly," because in real life, distributions of data are almost never perfectly symmetric.
   o If the right side of the graph is much longer than the left side (tail to the right), describe the distribution as skewed to the right, skewed to positive values, or positively skewed.
   o If the left side of the graph is much longer than the right side (tail to the left), describe the distribution as skewed to the left, skewed to negative values, or negatively skewed.

Describing Center: Use the median (middle value) or the mean (average).

Describing Variation: Use the range, interquartile range, or standard deviation, or say something like, "The [values in context] vary from a low of ___ to a high of ___."

Dotplots:
1. Draw a horizontal line, label it with the name of the quantitative variable and the units of measurement, and place tick marks at equal intervals.
2. Locate each value in the data set along the measurement scale and represent it by a dot above the line. If there are two or more observations with the same value, stack the dots vertically. Try to make all the dots the same size and space them out equally as you stack them.
To compare two distributions, stack the dotplots on top of each other, using the same scales. Make sure to label the two groups being compared.

Example: Below is a dotplot of the hair lengths of 41 AP Statistics students. Describe the distribution of hair length.
[Figure: dotplot of hair length (cm) for 41 students.]
Shape: The distribution of hair lengths has multiple peaks. There is one group of students with shorter hair (peak at 7 cm) and another group with longer hair (peak at 41 cm). There are no students with hair between 18 and 26 cm long.
Center: The median hair length is 4 cm. (Half of students have hair shorter than 4 cm and half have hair longer than 4 cm.)
Variability: The hair lengths vary from 1 cm to 68 cm (range = 67 cm).
Outliers: There don't appear to be any outliers.

Here are parallel dotplots showing the hair lengths of the students sorted by gender. Compare the distributions of hair length for the male and female students.
[Figure: parallel dotplots of hair length (cm) for males and females.]
Shape: The distribution of hair lengths for the females is approximately symmetric, while the distribution of hair length for males is slightly skewed to the right, meaning that shorter hair is more common than longer hair for the males. Both distributions have single peaks. There is a peak around 7 cm for the males and a peak around 41 cm for the females.
Center: Females in the sample typically have much longer hair (median 47 cm) than the males (median 7 cm).
Variability: There is much more variability in hair length for the females than for the males. The hair lengths for the females vary from 9 cm to 68 cm (range 59 cm), while the hair lengths for the males vary from 1 cm to 17 cm (range 16 cm).
Outliers: There do not appear to be any outliers for the males, but the female with hair that is 9 cm long is an outlier. Her hair is unusually short compared to the rest of the females in the sample.

Stemplots (or Stem-and-Leaf Plots):
Each number in the data set is broken into two pieces: a stem and a leaf. The stem is the first part of the number and consists of the beginning digits. The leaf is the last part of the number and consists of the final digit(s).
1. Choose stems (one or more of the leading digits) that divide the data into a reasonable number of groups (at least 5, but not too many). List possible stem values (not just those that actually appear in the data set; don't skip stems) in a vertical column. Draw a vertical line to the right of the stems.
2. The next digit(s) after the stem become(s) the leaf. List the leaf for every observation to the right of the corresponding stem.
3. Include a key explaining what the stems and leaves represent, e.g., 5 represents .5 seconds.
It is common to round and/or truncate (leave off) the remaining digits. For example, in a stemplot of annual salary, we might represent $35,36 as 35 | 3, 35 | 4, or as 3 | 5, depending on our data set.
If necessary, consider using split stems. Write each stem more than once, and assign the lower group of leaves to the first stem and the higher group of leaves to the next. For example, put the leaves 0-4 with the first stem and the leaves 5-9 with the second. If you do this, be sure that each stem is assigned an equal number of possible leaf digits (two stems, with five possible leaves each; or five stems, with two possible leaves each).
To compare two groups, make a back-to-back stemplot. Use the same set of stems and write the leaves for one group to the right and for the other group to the left. Be sure to label each side to indicate which group is being represented.

Example: The data below show the number of pairs of shoes owned by male and female AP Statistics students. Make a back-to-back stemplot of the data using split stems. Comment on the main differences between the two data sets.
[Data and figure: number of pairs of shoes for females and males, with a back-to-back stemplot (Female on one side, Male on the other). Key: 1 | 5 represents 15 pairs of shoes.]
Shape: Both distributions of number of pairs of shoes owned are unimodal. The distribution for males is very slightly skewed to higher numbers, while the distribution for females is strongly skewed to higher numbers. This means that for both genders, it is more common to own a small number of shoes than a large number.
Center: Females tend to own a larger number of shoes, on average, than males (median = 15 pairs for females vs. 6 pairs for males).
Variability: There is more variability in the number of pairs of shoes owned for females than for males (range = 81 for females vs. 9 for males).
Outliers: There do not appear to be any outliers for the males, but the females who own 64 and 87 pairs of shoes both own many more pairs of shoes than the rest of the females in the sample.
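The stemplot procedure above, including the "don't skip stems" and split-stem rules, can be sketched in Python as follows. This is a rough illustration for non-negative whole numbers with the tens digit as the stem and the ones digit as the leaf; the function name `stemplot` and the data are hypothetical and are not the shoe counts from the example.

```python
from collections import defaultdict

def stemplot(values, split=False):
    """Return stemplot lines for non-negative integers (stem = tens, leaf = ones).

    With split=True each stem appears twice: leaves 0-4, then leaves 5-9.
    Every possible stem between the smallest and largest is listed.
    """
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        half = 1 if (split and leaf >= 5) else 0
        rows[(stem, half)].append(leaf)
    low, high = min(values) // 10, max(values) // 10
    halves = (0, 1) if split else (0,)
    lines = []
    for stem in range(low, high + 1):
        for half in halves:
            leaves = " ".join(str(leaf) for leaf in rows.get((stem, half), []))
            lines.append(f"{stem:>2} | {leaves}".rstrip())
    return lines

# Hypothetical data, not the shoe counts from the example.
data = [4, 7, 12, 15, 15, 18, 21, 36]
print("\n".join(stemplot(data)))
print("Key: 1 | 5 represents 15")
```

Calling `stemplot(data, split=True)` lists each stem twice, which spreads out a plot whose leaves are bunched on only a few stems.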

Histograms:
1. Divide the range of the data into intervals of equal width. The intervals are called "bins." The low value in each bin is included in the bin, but the high value is not. For example, the bins might be 0 to < 3, 3 to < 6, 6 to < 9, etc. If the data are discrete (the observations take only whole number values) and are tightly packed, the bins are usually centered at the integer values with a width of one unit, so the rectangle for 1 is centered at 1 (0.5 to < 1.5), the rectangle for 2 is centered at 2 (1.5 to < 2.5), etc. There are no set-in-stone rules for how many bins to use (5 to 1 is a common number), but it may be a good idea to see what the graph looks like with different width bins. It can change quite a bit!
2. Find the frequency (count) or relative frequency (proportion) of individuals in each interval. Put values that fall on a boundary in the interval containing larger values.
3. Label and scale your axes. Place equally spaced tick marks at the boundaries of each interval along the horizontal axis (or in the middle of each interval if the data are discrete). Use either frequency (count) or relative frequency (proportion) on the vertical axis.
4. Draw a rectangle for each interval. Make the bars equal width and leave no gaps between them. The height should correspond to the frequency or relative frequency of individuals in that interval.

Histograms and bar graphs are different!
   o Bar graphs are used for categorical data. Histograms are used for quantitative data.
   o The bars in bar graphs can be rearranged because the order of the categories shouldn't matter. The bars in histograms can't be rearranged because intervals must be in numerical order.
   o The bars in bar graphs are generally unconnected. The bars in histograms are connected.

Example: The following data give the average points scored per game (PTSG) for the 30 NBA teams in the regular season. Draw two relative frequency histograms using different bin widths. Describe the distribution.
[Data and figures: points per game for each team; two histograms of frequency against points per game, one with bins ending at < 114 and one with bins running from 98 to < 116.]
Shape: The distribution of points scored per game is single-peaked and skewed to the right. It is more common for teams to score a smaller number of points than a larger number of points.
Center: The median number of points scored per game last season was ( ___ + ___ )/2 = ___.
Variability: The number of points scored per game varied from 98.8 to 113.5 (range = 14.7 points).
Outliers: There do not appear to be any outliers.
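The binning rule in steps 1 and 2 (each bin includes its low value but not its high value, so boundary values go to the higher bin) is easy to get wrong by hand. The Python sketch below counts values in half-open bins; `bin_counts` is a hypothetical helper, the data are made up rather than taken from the NBA example, and the function assumes every value is at least `start`.

```python
import math

def bin_counts(values, start, width):
    """Count values in half-open bins [start, start + width), [start + width, ...)."""
    n_bins = math.floor((max(values) - start) / width) + 1
    counts = [0] * n_bins
    for v in values:
        index = math.floor((v - start) / width)  # a boundary value lands in the higher bin
        counts[index] += 1
    return [(start + i * width, start + (i + 1) * width, c) for i, c in enumerate(counts)]

# Hypothetical data, not the NBA scoring averages.
data = [0, 1, 3, 3, 5, 6, 8, 9]
for low, high, count in bin_counts(data, start=0, width=3):
    print(f"{low} to < {high}: {count} (relative frequency {count / len(data):.3f})")
```

Rerunning the loop with a different `width` shows how much the apparent shape of a histogram can depend on the bin choice.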

Describing Quantitative Data with Numbers

Population: The entire collection of individuals or objects that you want to learn about.
Sample: A part of the population that is selected for study.
Resistant Measure: A measure that is not influenced very much by strong skewness or extreme values.

Measures of Center: The most common measures of center are the mean and the median.

Mean: The sum of the values divided by the number of observations.
If the n observations in a sample are $x_1, x_2, \ldots, x_n$, the mean is
$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x_i}{n}$.
The mean can be thought of as the average value, the "fair share" value, or the balance point of a distribution.
The mean is not a resistant measure. It is very sensitive to outliers and skewness.
The mean of a sample is abbreviated $\bar{x}$ (pronounced "x-bar") and the mean of a population is abbreviated μ (the Greek letter mu, pronounced "myoo"). They are both calculated the same way. The distinction will be important later in the year.
If the problem doesn't specify whether the data represent a population or a sample, assume you are dealing with a sample and use $\bar{x}$.

Median (M): The midpoint of a distribution. Half of the observations are smaller than the median and half of the values are larger than the median.
To find the median:
1. Put the n observations in order from smallest to largest.
2. If the number of observations, n, is odd, the median is the middle observation of the ordered list.
3. If the number of observations, n, is even, the median is the average (mean) of the two middle observations in the ordered list.
The median can be thought of as the "typical" value of a variable.
The median is a resistant measure. It is not changed greatly by strong skewness or outliers.

Comparing the Mean and the Median:
The mean and median of a roughly symmetric distribution are close together. If the distribution is exactly symmetric, they are equal. However, outliers and other extreme values drag the mean toward them without having much effect on the median. As a result, in skewed distributions, the mean will be further out in the long tail than is the median.
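The definitions above translate directly into code. The Python sketch below follows the three median steps and then compares the two measures on a pair of small, made-up data sets to show how an extreme value moves the mean but not the median. (Python's built-in `statistics` module provides equivalent `mean` and `median` functions; the versions here are written out only to mirror the steps.)

```python
def mean(values):
    """Sum of the values divided by the number of observations."""
    return sum(values) / len(values)

def median(values):
    ordered = sorted(values)              # step 1: order the observations
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                        # step 2: odd n, take the middle observation
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2  # step 3: even n, average the two middle ones

# Hypothetical data: the second list replaces the largest value with an extreme one.
typical = [2, 4, 5, 7, 9]
with_outlier = [2, 4, 5, 7, 90]
print(mean(typical), median(typical))
print(mean(with_outlier), median(with_outlier))
```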

Example: Here are the amounts of fat (in grams) in McDonald's beef sandwiches. Make a stemplot of the distribution and comment on its shape. Then calculate the mean and the median amount of fat.

Sandwich                               Fat (g)    Sandwich                    Fat (g)
Hamburger                              9          Big N' Tasty                24
Cheeseburger                           12         Big N' Tasty with Cheese    28
Double Cheeseburger                    23         McRib                       26
McDouble                               19         Mac Snack Wrap              19
Quarter Pounder                        19         Angus Bacon & Cheese        39
Quarter Pounder with Cheese            26         Angus Deluxe                39
Double Quarter Pounder with Cheese     42         Angus Mushroom & Swiss      40
Big Mac                                29

Grams of Fat
0 | 9
1 | 2 9 9 9
2 | 3 4 6 6 8 9
3 | 9 9
4 | 0 2
Key: 1 | 2 represents 12 fat grams

The distribution of fat content is unimodal and approximately symmetric, so we would expect the median to be close to the mean.
Median = 26 grams
Mean = 394/15 = 26.3 grams

Example: Forty students were enrolled in a statistical reasoning course at a California college. The instructor made course materials, grades, and lecture notes available to students on a class web site, and course management software kept track of how often each student accessed any of these web pages. One month after the course began, the instructor requested a report of how many times each student had accessed a class web page. The 40 observations are below. Wasn't it nice of me to put them in order?
[Data: 40 ordered counts of visits, including 84 and 331 (not a typo); the remaining values are not legible.]

Here is a dotplot of the data. Describe the distribution. Based on the graph, do you expect the mean or the median to be higher? Calculate the mean and the median to see if you were right. Which measure would be the best choice to describe center in this situation?
[Figure: dotplot of number of visits to class website. The calculated mean and median are not legible.]

The distribution is unimodal and extremely skewed to the right. Most students accessed the website only a small number of times. The students who accessed the website 84 and 331 times are possible outliers. Since the distribution is so skewed and has high outliers, the mean will be pulled towards the high values, and will be much higher than the median. The median will be more representative of the class as a whole.
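The sandwich calculation can be checked with Python's standard `statistics` module. The fat values in this sketch are an assumption: they are the values that agree with the stated total of 394 grams over 15 sandwiches, and should be compared against the original table before being relied on.

```python
import statistics

# Assumed fat values (grams); chosen to be consistent with the stated
# total of 394 over 15 sandwiches.
fat_grams = [9, 12, 23, 19, 19, 26, 42, 29, 24, 28, 26, 19, 39, 39, 40]

print(len(fat_grams), sum(fat_grams))
print(round(statistics.mean(fat_grams), 1))
print(statistics.median(fat_grams))
```

With 15 observations the median is the 8th value in the ordered list, so no averaging of two middle values is needed.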

Measures of Variability: Numbers that describe how spread out the data are. The most common are the range, the interquartile range, and the standard deviation.

Range: The difference between the maximum and minimum values.

Standard Deviation: The most common measure of spread is the standard deviation. It measures the typical or average distance of the observations from the mean.

Example: Each of these distributions has a mean of 5. Rank the standard deviations from lowest to highest. Explain your answer.
[Figure: three distributions, labeled as follows.]
Highest standard deviation: The typical distance to the mean is the highest.
Lowest standard deviation: The typical distance to the mean is the lowest.
Middle

The formula for standard deviation is slightly different depending on whether you have all the data for the entire population or are dealing with a sample from the population.

For a Sample: If the n observations in a sample are $x_1, x_2, \ldots, x_n$, and the mean is $\bar{x}$, the standard deviation is given by:
$s = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
The sample standard deviation is abbreviated $s$.

Variance: The square of the standard deviation is called the variance, abbreviated $s^2$.

For the Population: The standard deviation of a population of size N with mean μ and observations $x_1, x_2, \ldots, x_N$ is given by:
$\sigma = \sqrt{\frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + \cdots + (x_N - \mu)^2}{N}} = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$
The population standard deviation is abbreviated σ (the Greek letter sigma). The population variance is abbreviated $\sigma^2$.

The reason that we divide by n - 1 in a sample is complicated. We'll discuss it later in the year.
Always use $s$ rather than σ unless you know that the data represent the entire population, which is rare!

Calculating the standard deviation by hand:
1. Calculate the mean, $\bar{x}$.
2. Find the distance of each observation from the mean (the deviations).
3. Square each of these distances to eliminate negative numbers.
4. "Average" the squared distances by adding them together and dividing by n - 1. This gives the variance, $s^2$.
5. Take the square root of the variance to get the standard deviation, $s$.
6. Interpret your result. The standard deviation is the average or typical distance of the observations from the mean.

Example: The table below shows the sugar content in several types of candy bar. Find the mean and standard deviation of the data. Interpret your result in context. (Entries that are not legible in this transcription are left blank.)

Candy Bar                     Sugar (grams) $x_i$    Deviations $x_i - \bar{x}$    Squared Deviations $(x_i - \bar{x})^2$
Hershey's Milk Chocolate      31                     2                             4
Kit Kat                       22                     -7                            49
York Peppermint Pattie
Reese's Peanut Butter Cups
Snickers
Milky Way
Twix
3 Musketeers
Mr. Goodbar                   22                     -7                            49
Baby Ruth
Total                         290                                                  312

Mean: $\bar{x} = \frac{290}{10} = 29$ grams
Variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{312}{10 - 1} = 34.67$ grams$^2$
Standard Deviation: $s = \sqrt{s^2} = 5.89$ grams
The sugar contents of the individual candy bars typically differ from the mean sugar content by about 5.9 grams.

Properties of the Standard Deviation
The standard deviation measures variation around the mean. It should only be used when the mean is chosen as the measure of center.
The standard deviation is always greater than or equal to zero. If there is no variability (all observations have the same value), the standard deviation is zero. Larger standard deviations indicate greater variation from the mean.
The standard deviation has the same units of measurement as the original observations. This is one reason we usually interpret the standard deviation and not the variance.
The standard deviation is not a resistant measure. A few outliers can change its value dramatically.
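The by-hand steps for the sample standard deviation map onto a few lines of Python. The sketch below uses a small made-up data set (not the candy-bar values), and `sample_standard_deviation` is a hypothetical helper; Python's `statistics.stdev` computes the same quantity.

```python
import math

def sample_standard_deviation(values):
    n = len(values)
    x_bar = sum(values) / n                    # step 1: mean
    deviations = [x - x_bar for x in values]   # step 2: deviations from the mean
    squared = [d ** 2 for d in deviations]     # step 3: squared deviations
    variance = sum(squared) / (n - 1)          # step 4: divide by n - 1
    return math.sqrt(variance)                 # step 5: square root

# Hypothetical data, not the candy-bar table.
data = [2, 4, 4, 4, 5, 5, 7, 9]
s = sample_standard_deviation(data)
print(round(s, 3))
```

For these values the mean is 5 and the squared deviations add to 32, so the variance is 32/7 and the standard deviation is its square root. Dividing by n instead of n - 1 in step 4 would give the population formula.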

Interquartile Range (IQR):

First, calculate the quartiles:

1. Arrange the data in increasing order and locate the median, M. (The median is sometimes called the second quartile, or Q2.)
2. The first quartile (Q1) is the median of all the observations lower than the median.
3. The third quartile (Q3) is the median of all the observations higher than the median.

The interquartile range is calculated as follows: $IQR = Q_3 - Q_1$

The IQR is the range of the middle 50% of the data.

The range and interquartile range are numbers! Don't say "The range is 5 to 30." In that case, the range would be 25. The IQR is not a location! It doesn't make sense to say an observation is "in the IQR."

1.5 × IQR Rule for Outliers: An outlier is any observation that falls more than 1.5 × IQR above the third quartile or below the first quartile.

Always check for outliers and examine them closely! They may be errors, or they may tell you something important about your data that you need to pay attention to. Don't ignore them.

Boxplots (or Box and Whisker Plots):

1. Find the Five-Number Summary: Minimum, Q1, M, Q3, Maximum.
2. Check for outliers. You must always show this step. Calculate the IQR. Find $Q_1 - (1.5 \times IQR)$ and $Q_3 + (1.5 \times IQR)$. If you have any data points outside these thresholds, they are outliers.
3. Draw the boxplot: Draw a central box from Q1 to Q3. Draw a vertical line in the box to mark the median. Draw the "whiskers": lines extending from the box out to the smallest and largest observations that are not outliers. Mark outliers with dots in the appropriate locations.

Each section of a boxplot contains 25% of the data. The lower quartile is higher than 25% of the data. The median (or second quartile) is higher than 50% of the data. The upper quartile is higher than 75% of the data.

Boxplots are useful for comparing the center and spread of distributions, but you have to be careful with them. They can mask important information about the shape of a distribution. For instance, you can't tell from a boxplot if a distribution has multiple peaks or gaps.
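The quartile, IQR, and outlier steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names (`median`, `five_number_summary`, `boxplot_parts`) and the nine data values are hypothetical, and the halves are split by position, leaving out the middle observation when n is odd, as the steps describe. Software packages often use other quartile conventions, so their results may differ slightly.

```python
def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def five_number_summary(values):
    """Minimum, Q1, median, Q3, maximum using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # observations below the median position
    upper = data[half + (n % 2):]  # observations above the median position
    return data[0], median(lower), median(data), median(upper), data[-1]

def boxplot_parts(values):
    """IQR, the 1.5 x IQR fences, whisker ends, and outliers."""
    minimum, q1, med, q3, maximum = five_number_summary(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [x for x in values if low_fence <= x <= high_fence]
    outliers = sorted(x for x in values if x < low_fence or x > high_fence)
    return {
        "five_number": (minimum, q1, med, q3, maximum),
        "iqr": iqr,
        "fences": (low_fence, high_fence),
        "whiskers": (min(inside), max(inside)),  # extreme non-outlier values
        "outliers": outliers,
    }

# Hypothetical data (illustrative only).
parts = boxplot_parts([2, 4, 5, 7, 8, 10, 12, 13, 40])
print(parts)
```

Note that the whiskers stop at the most extreme observations inside the fences, not at the fences themselves.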

Example: The data below shows the number of text messages sent by a random sample of students in a day. Draw parallel boxplots of the number of texts sent for male and female students. You must show how you determined whether there are outliers. Compare the distributions. What conclusions can you draw about the texting habits of males and females?

[Data table: number of texts sent, in columns Male and Female]

Male

Min = 3, Q1 = 8, Med = 17, Q3 = 42.5, Max = 111

$IQR = 42.5 - 8 = 34.5$

$Q_1 - (1.5 \times IQR) = 8 - (1.5)(34.5) = -43.75$

$Q_3 + (1.5 \times IQR) = 42.5 + (1.5)(34.5) = 94.25$

No low outliers because there are no numbers less than -43.75. 111 is an outlier because it is higher than 94.25.

Female

Min = 7, Q1 = 20, Med = 45, Q3 = 79, Max = 156

$IQR = 79 - 20 = 59$

$Q_1 - (1.5 \times IQR) = 20 - (1.5)(59) = -68.5$

$Q_3 + (1.5 \times IQR) = 79 + (1.5)(59) = 167.5$

No outliers because there are no numbers less than -68.5 or higher than 167.5.

[Figure: parallel boxplots for Males and Females; axis: # of Texts Sent in Past 24 Hours]

The females in the sample text much more, on average, than the males (median = 45 for females and 17 for males). Since the median number of texts for females is higher than the third quartile for males, we can see that the top 50% of females text more than the bottom 75% of the males.

Both distributions are skewed to the right, meaning that it is more common to send smaller numbers of texts than larger ones.

There is more variability in # of texts sent for females than for males (IQR = 59 for females and 34.5 for males).

There is one outlier for the males. He sent 111 texts, which is unusually high. There were no outliers for the females.

Choosing Measures of Center and Spread:

Use the median and IQR for describing a skewed distribution or a distribution with strong outliers. Use the mean and standard deviation for describing reasonably symmetric distributions without outliers.

ALWAYS GRAPH YOUR DATA! Numerical measures of center and spread report specific facts about a distribution, but don't give information about its entire shape. You may miss something important if you don't graph the data.
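The outlier check in this example can be reproduced from the five-number summaries alone. The sketch below assumes the summaries are Q1 = 8 and Q3 = 42.5 for males and Q1 = 20 and Q3 = 79 for females, which are the values consistent with the IQRs of 34.5 and 59 quoted in the comparison. The dictionary layout and the function name `fences` are hypothetical.

```python
def fences(q1, q3):
    """Return the IQR and the lower and upper 1.5 x IQR outlier thresholds."""
    iqr = q3 - q1
    return iqr, q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Five-number summaries as read from the example (assumed values, see lead-in).
summaries = {
    "male":   {"min": 3, "q1": 8,  "med": 17, "q3": 42.5, "max": 111},
    "female": {"min": 7, "q1": 20, "med": 45, "q3": 79,   "max": 156},
}

results = {}
for group, s in summaries.items():
    iqr, low, high = fences(s["q1"], s["q3"])
    results[group] = {
        "iqr": iqr,
        "low_fence": low,
        "high_fence": high,
        "has_low_outlier": s["min"] < low,    # minimum below the lower threshold
        "has_high_outlier": s["max"] > high,  # maximum above the upper threshold
    }
    print(group, results[group])
```

Checking only the minimum and maximum is enough to tell whether any outliers exist on each side; listing every outlier would need the raw data.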

Exploring Data. How to Explore Data

Exploring Data. How to Explore Data Exploring Data Statistics is the art and science of learning from data. This may include: Designing appropriate tools to collect data. Organizing data in a meaningful way. Displaying data with appropriate

More information

Chapter 2: Tools for Exploring Univariate Data

Chapter 2: Tools for Exploring Univariate Data Stats 11 (Fall 2004) Lecture Note Introduction to Statistical Methods for Business and Economics Instructor: Hongquan Xu Chapter 2: Tools for Exploring Univariate Data Section 2.1: Introduction What is

More information

Performance of fourth-grade students on an agility test

Performance of fourth-grade students on an agility test Starter Ch. 5 2005 #1a CW Ch. 4: Regression L1 L2 87 88 84 86 83 73 81 67 78 83 65 80 50 78 78? 93? 86? Create a scatterplot Find the equation of the regression line Predict the scores Chapter 5: Understanding

More information

Chapter 4. Displaying and Summarizing. Quantitative Data

Chapter 4. Displaying and Summarizing. Quantitative Data STAT 141 Introduction to Statistics Chapter 4 Displaying and Summarizing Quantitative Data Bin Zou (bzou@ualberta.ca) STAT 141 University of Alberta Winter 2015 1 / 31 4.1 Histograms 1 We divide the range

More information

Topic 3: Introduction to Statistics. Algebra 1. Collecting Data. Table of Contents. Categorical or Quantitative? What is the Study of Statistics?!

Topic 3: Introduction to Statistics. Algebra 1. Collecting Data. Table of Contents. Categorical or Quantitative? What is the Study of Statistics?! Topic 3: Introduction to Statistics Collecting Data We collect data through observation, surveys and experiments. We can collect two different types of data: Categorical Quantitative Algebra 1 Table of

More information

Elementary Statistics

Elementary Statistics Elementary Statistics Q: What is data? Q: What does the data look like? Q: What conclusions can we draw from the data? Q: Where is the middle of the data? Q: Why is the spread of the data important? Q:

More information

are the objects described by a set of data. They may be people, animals or things.

are the objects described by a set of data. They may be people, animals or things. ( c ) E p s t e i n, C a r t e r a n d B o l l i n g e r 2016 C h a p t e r 5 : E x p l o r i n g D a t a : D i s t r i b u t i o n s P a g e 1 CHAPTER 5: EXPLORING DATA DISTRIBUTIONS 5.1 Creating Histograms

More information

AP Final Review II Exploring Data (20% 30%)

AP Final Review II Exploring Data (20% 30%) AP Final Review II Exploring Data (20% 30%) Quantitative vs Categorical Variables Quantitative variables are numerical values for which arithmetic operations such as means make sense. It is usually a measure

More information

STP 420 INTRODUCTION TO APPLIED STATISTICS NOTES

STP 420 INTRODUCTION TO APPLIED STATISTICS NOTES INTRODUCTION TO APPLIED STATISTICS NOTES PART - DATA CHAPTER LOOKING AT DATA - DISTRIBUTIONS Individuals objects described by a set of data (people, animals, things) - all the data for one individual make

More information

CHAPTER 5: EXPLORING DATA DISTRIBUTIONS. Individuals are the objects described by a set of data. These individuals may be people, animals or things.

CHAPTER 5: EXPLORING DATA DISTRIBUTIONS. Individuals are the objects described by a set of data. These individuals may be people, animals or things. (c) Epstein 2013 Chapter 5: Exploring Data Distributions Page 1 CHAPTER 5: EXPLORING DATA DISTRIBUTIONS 5.1 Creating Histograms Individuals are the objects described by a set of data. These individuals

More information

Resistant Measure - A statistic that is not affected very much by extreme observations.

Resistant Measure - A statistic that is not affected very much by extreme observations. Chapter 1.3 Lecture Notes & Examples Section 1.3 Describing Quantitative Data with Numbers (pp. 50-74) 1.3.1 Measuring Center: The Mean Mean - The arithmetic average. To find the mean (pronounced x bar)

More information

Histograms allow a visual interpretation

Histograms allow a visual interpretation Chapter 4: Displaying and Summarizing i Quantitative Data s allow a visual interpretation of quantitative (numerical) data by indicating the number of data points that lie within a range of values, called

More information

1-1. Chapter 1. Sampling and Descriptive Statistics by The McGraw-Hill Companies, Inc. All rights reserved.

1-1. Chapter 1. Sampling and Descriptive Statistics by The McGraw-Hill Companies, Inc. All rights reserved. 1-1 Chapter 1 Sampling and Descriptive Statistics 1-2 Why Statistics? Deal with uncertainty in repeated scientific measurements Draw conclusions from data Design valid experiments and draw reliable conclusions

More information

What is statistics? Statistics is the science of: Collecting information. Organizing and summarizing the information collected

What is statistics? Statistics is the science of: Collecting information. Organizing and summarizing the information collected What is statistics? Statistics is the science of: Collecting information Organizing and summarizing the information collected Analyzing the information collected in order to draw conclusions Two types

More information

CHAPTER 1 Exploring Data

CHAPTER 1 Exploring Data CHAPTER 1 Exploring Data 1.2 Displaying Quantitative Data with Graphs The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers Displaying Quantitative Data

More information

1.3.1 Measuring Center: The Mean

1.3.1 Measuring Center: The Mean 1.3.1 Measuring Center: The Mean Mean - The arithmetic average. To find the mean (pronounced x bar) of a set of observations, add their values and divide by the number of observations. If the n observations

More information

QUANTITATIVE DATA. UNIVARIATE DATA data for one variable

QUANTITATIVE DATA. UNIVARIATE DATA data for one variable QUANTITATIVE DATA Recall that quantitative (numeric) data values are numbers where data take numerical values for which it is sensible to find averages, such as height, hourly pay, and pulse rates. UNIVARIATE

More information

Descriptive Statistics Solutions COR1-GB.1305 Statistics and Data Analysis

Descriptive Statistics Solutions COR1-GB.1305 Statistics and Data Analysis Descriptive Statistics Solutions COR-GB.0 Statistics and Data Analysis Types of Data. The class survey asked each respondent to report the following information: gender; birth date; GMAT score; undergraduate

More information

Chapter 1: Exploring Data

Chapter 1: Exploring Data Chapter 1: Exploring Data Section 1.2 with Graphs The Practice of Statistics, 4 th edition - For AP* STARNES, YATES, MOORE Chapter 1 Exploring Data Introduction: Data Analysis: Making Sense of Data 1.1

More information

3.1 Measures of Central Tendency: Mode, Median and Mean. Average a single number that is used to describe the entire sample or population

3.1 Measures of Central Tendency: Mode, Median and Mean. Average a single number that is used to describe the entire sample or population . Measures of Central Tendency: Mode, Median and Mean Average a single number that is used to describe the entire sample or population. Mode a. Easiest to compute, but not too stable i. Changing just one

More information

STAT 200 Chapter 1 Looking at Data - Distributions

STAT 200 Chapter 1 Looking at Data - Distributions STAT 200 Chapter 1 Looking at Data - Distributions What is Statistics? Statistics is a science that involves the design of studies, data collection, summarizing and analyzing the data, interpreting the

More information

Chapter 1. Looking at Data

Chapter 1. Looking at Data Chapter 1 Looking at Data Types of variables Looking at Data Be sure that each variable really does measure what you want it to. A poor choice of variables can lead to misleading conclusions!! For example,

More information

Chapter 1: Exploring Data

Chapter 1: Exploring Data Chapter 1: Exploring Data Section 1.3 with Numbers The Practice of Statistics, 4 th edition - For AP* STARNES, YATES, MOORE Chapter 1 Exploring Data Introduction: Data Analysis: Making Sense of Data 1.1

More information

MATH 2560 C F03 Elementary Statistics I Lecture 1: Displaying Distributions with Graphs. Outline.

MATH 2560 C F03 Elementary Statistics I Lecture 1: Displaying Distributions with Graphs. Outline. MATH 2560 C F03 Elementary Statistics I Lecture 1: Displaying Distributions with Graphs. Outline. data; variables: categorical & quantitative; distributions; bar graphs & pie charts: What Is Statistics?

More information

Stat 101 Exam 1 Important Formulas and Concepts 1

Stat 101 Exam 1 Important Formulas and Concepts 1 1 Chapter 1 1.1 Definitions Stat 101 Exam 1 Important Formulas and Concepts 1 1. Data Any collection of numbers, characters, images, or other items that provide information about something. 2. Categorical/Qualitative

More information

Introduction to Statistics

Introduction to Statistics Introduction to Statistics Data and Statistics Data consists of information coming from observations, counts, measurements, or responses. Statistics is the science of collecting, organizing, analyzing,

More information

Statistics 511 Additional Materials

Statistics 511 Additional Materials Graphical Summaries Consider the following data x: 78, 24, 57, 39, 28, 30, 29, 18, 102, 34, 52, 54, 57, 82, 90, 94, 38, 59, 27, 68, 61, 39, 81, 43, 90, 40, 39, 33, 42, 15, 88, 94, 50, 66, 75, 79, 83, 34,31,36,

More information

STT 315 This lecture is based on Chapter 2 of the textbook.

STT 315 This lecture is based on Chapter 2 of the textbook. STT 315 This lecture is based on Chapter 2 of the textbook. Acknowledgement: Author is thankful to Dr. Ashok Sinha, Dr. Jennifer Kaplan and Dr. Parthanil Roy for allowing him to use/edit some of their

More information

Further Mathematics 2018 CORE: Data analysis Chapter 2 Summarising numerical data

Further Mathematics 2018 CORE: Data analysis Chapter 2 Summarising numerical data Chapter 2: Summarising numerical data Further Mathematics 2018 CORE: Data analysis Chapter 2 Summarising numerical data Extract from Study Design Key knowledge Types of data: categorical (nominal and ordinal)

More information

Lecture Slides. Elementary Statistics Twelfth Edition. by Mario F. Triola. and the Triola Statistics Series. Section 3.1- #

Lecture Slides. Elementary Statistics Twelfth Edition. by Mario F. Triola. and the Triola Statistics Series. Section 3.1- # Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series by Mario F. Triola Chapter 3 Statistics for Describing, Exploring, and Comparing Data 3-1 Review and Preview 3-2 Measures

More information

ALGEBRA I SEMESTER EXAMS PRACTICE MATERIALS SEMESTER 2 27? 1. (7.2) What is the value of (A) 1 9 (B) 1 3 (C) 9 (D) 3

ALGEBRA I SEMESTER EXAMS PRACTICE MATERIALS SEMESTER 2 27? 1. (7.2) What is the value of (A) 1 9 (B) 1 3 (C) 9 (D) 3 014-015 SEMESTER EXAMS SEMESTER 1. (7.) What is the value of 1 3 7? (A) 1 9 (B) 1 3 (C) 9 (D) 3. (7.3) The graph shows an eponential function. What is the equation of the function? (A) y 3 (B) y 3 (C)

More information

Which boxplot represents the same information as the histogram? Test Scores Test Scores

Which boxplot represents the same information as the histogram? Test Scores Test Scores Frequency of Test Scores ALGEBRA I 01 013 SEMESTER EXAMS SEMESTER 1. Mrs. Johnson created this histogram of her 3 rd period students test scores. 8 6 4 50 60 70 80 90 100 Test Scores Which boplot represents

More information

AP Statistics Cumulative AP Exam Study Guide

AP Statistics Cumulative AP Exam Study Guide AP Statistics Cumulative AP Eam Study Guide Chapters & 3 - Graphs Statistics the science of collecting, analyzing, and drawing conclusions from data. Descriptive methods of organizing and summarizing statistics

More information

CHAPTER 2: Describing Distributions with Numbers

CHAPTER 2: Describing Distributions with Numbers CHAPTER 2: Describing Distributions with Numbers The Basic Practice of Statistics 6 th Edition Moore / Notz / Fligner Lecture PowerPoint Slides Chapter 2 Concepts 2 Measuring Center: Mean and Median Measuring

More information

Shape, Outliers, Center, Spread Frequency and Relative Histograms Related to other types of graphical displays

Shape, Outliers, Center, Spread Frequency and Relative Histograms Related to other types of graphical displays Histograms: Shape, Outliers, Center, Spread Frequency and Relative Histograms Related to other types of graphical displays Sep 9 1:13 PM Shape: Skewed left Bell shaped Symmetric Bi modal Symmetric Skewed

More information

F78SC2 Notes 2 RJRC. If the interest rate is 5%, we substitute x = 0.05 in the formula. This gives

F78SC2 Notes 2 RJRC. If the interest rate is 5%, we substitute x = 0.05 in the formula. This gives F78SC2 Notes 2 RJRC Algebra It is useful to use letters to represent numbers. We can use the rules of arithmetic to manipulate the formula and just substitute in the numbers at the end. Example: 100 invested

More information

3.1 Measure of Center

3.1 Measure of Center 3.1 Measure of Center Calculate the mean for a given data set Find the median, and describe why the median is sometimes preferable to the mean Find the mode of a data set Describe how skewness affects

More information

8/4/2009. Describing Data with Graphs

8/4/2009. Describing Data with Graphs Describing Data with Graphs 1 A variable is a characteristic that changes or varies over time and/or for different individuals or objects under consideration. Examples: Hair color, white blood cell count,

More information

Descriptive Statistics

Descriptive Statistics Descriptive Statistics CHAPTER OUTLINE 6-1 Numerical Summaries of Data 6- Stem-and-Leaf Diagrams 6-3 Frequency Distributions and Histograms 6-4 Box Plots 6-5 Time Sequence Plots 6-6 Probability Plots Chapter

More information

Units. Exploratory Data Analysis. Variables. Student Data

Units. Exploratory Data Analysis. Variables. Student Data Units Exploratory Data Analysis Bret Larget Departments of Botany and of Statistics University of Wisconsin Madison Statistics 371 13th September 2005 A unit is an object that can be measured, such as

More information

Chapter2 Description of samples and populations. 2.1 Introduction.

Chapter2 Description of samples and populations. 2.1 Introduction. Chapter2 Description of samples and populations. 2.1 Introduction. Statistics=science of analyzing data. Information collected (data) is gathered in terms of variables (characteristics of a subject that

More information

CHAPTER 1 Exploring Data

CHAPTER 1 Exploring Data CHAPTER 1 Exploring Data 1.3 Describing Quantitative Data with Numbers The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers 1.3 Reading Quiz True or false?

More information

Percentile: Formula: To find the percentile rank of a score, x, out of a set of n scores, where x is included:

Percentile: Formula: To find the percentile rank of a score, x, out of a set of n scores, where x is included: AP Statistics Chapter 2 Notes 2.1 Describing Location in a Distribution Percentile: The pth percentile of a distribution is the value with p percent of the observations (If your test score places you in

More information

TOPIC: Descriptive Statistics Single Variable

TOPIC: Descriptive Statistics Single Variable TOPIC: Descriptive Statistics Single Variable I. Numerical data summary measurements A. Measures of Location. Measures of central tendency Mean; Median; Mode. Quantiles - measures of noncentral tendency

More information

Lecture 2. Quantitative variables. There are three main graphical methods for describing, summarizing, and detecting patterns in quantitative data:

Lecture 2. Quantitative variables. There are three main graphical methods for describing, summarizing, and detecting patterns in quantitative data: Lecture 2 Quantitative variables There are three main graphical methods for describing, summarizing, and detecting patterns in quantitative data: Stemplot (stem-and-leaf plot) Histogram Dot plot Stemplots

More information

Describing distributions with numbers

Describing distributions with numbers Describing distributions with numbers A large number or numerical methods are available for describing quantitative data sets. Most of these methods measure one of two data characteristics: The central

More information

Chapter 3. Data Description

Chapter 3. Data Description Chapter 3. Data Description Graphical Methods Pie chart It is used to display the percentage of the total number of measurements falling into each of the categories of the variable by partition a circle.

More information

Announcements. Lecture 1 - Data and Data Summaries. Data. Numerical Data. all variables. continuous discrete. Homework 1 - Out 1/15, due 1/22

Announcements. Lecture 1 - Data and Data Summaries. Data. Numerical Data. all variables. continuous discrete. Homework 1 - Out 1/15, due 1/22 Announcements Announcements Lecture 1 - Data and Data Summaries Statistics 102 Colin Rundel January 13, 2013 Homework 1 - Out 1/15, due 1/22 Lab 1 - Tomorrow RStudio accounts created this evening Try logging

More information

Lecture 1 : Basic Statistical Measures

Lecture 1 : Basic Statistical Measures Lecture 1 : Basic Statistical Measures Jonathan Marchini October 11, 2004 In this lecture we will learn about different types of data encountered in practice different ways of plotting data to explore

More information

Example 2. Given the data below, complete the chart:

Example 2. Given the data below, complete the chart: Statistics 2035 Quiz 1 Solutions Example 1. 2 64 150 150 2 128 150 2 256 150 8 8 Example 2. Given the data below, complete the chart: 52.4, 68.1, 66.5, 75.0, 60.5, 78.8, 63.5, 48.9, 81.3 n=9 The data is

More information

What is Statistics? Statistics is the science of understanding data and of making decisions in the face of variability and uncertainty.

What is Statistics? Statistics is the science of understanding data and of making decisions in the face of variability and uncertainty. What is Statistics? Statistics is the science of understanding data and of making decisions in the face of variability and uncertainty. Statistics is a field of study concerned with the data collection,

More information

MATH 10 INTRODUCTORY STATISTICS

MATH 10 INTRODUCTORY STATISTICS MATH 10 INTRODUCTORY STATISTICS Tommy Khoo Your friendly neighbourhood graduate student. Week 1 Chapter 1 Introduction What is Statistics? Why do you need to know Statistics? Technical lingo and concepts:

More information

1. Exploratory Data Analysis

1. Exploratory Data Analysis 1. Exploratory Data Analysis 1.1 Methods of Displaying Data A visual display aids understanding and can highlight features which may be worth exploring more formally. Displays should have impact and be

More information

Math 140 Introductory Statistics

Math 140 Introductory Statistics Math 140 Introductory Statistics Professor Silvia Fernández Chapter 2 Based on the book Statistics in Action by A. Watkins, R. Scheaffer, and G. Cobb. Visualizing Distributions Recall the definition: The

More information

Math 140 Introductory Statistics

Math 140 Introductory Statistics Visualizing Distributions Math 140 Introductory Statistics Professor Silvia Fernández Chapter Based on the book Statistics in Action by A. Watkins, R. Scheaffer, and G. Cobb. Recall the definition: The

More information

Lecture 6: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries)

Lecture 6: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries) Lecture 6: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries) Summarize with Shape, Center, Spread Displays: Stemplots, Histograms Five Number Summary, Outliers, Boxplots Cengage Learning

More information

Determining the Spread of a Distribution

Determining the Spread of a Distribution Determining the Spread of a Distribution 1.3-1.5 Cathy Poliak, Ph.D. cathy@math.uh.edu Department of Mathematics University of Houston Lecture 3-2311 Lecture 3-2311 1 / 58 Outline 1 Describing Quantitative

More information

Exercises from Chapter 3, Section 1

Exercises from Chapter 3, Section 1 Exercises from Chapter 3, Section 1 1. Consider the following sample consisting of 20 numbers. (a) Find the mode of the data 21 23 24 24 25 26 29 30 32 34 39 41 41 41 42 43 48 51 53 53 (b) Find the median

More information

Determining the Spread of a Distribution

Determining the Spread of a Distribution Determining the Spread of a Distribution 1.3-1.5 Cathy Poliak, Ph.D. cathy@math.uh.edu Department of Mathematics University of Houston Lecture 3-2311 Lecture 3-2311 1 / 58 Outline 1 Describing Quantitative

More information

Lecture 3B: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries)

Lecture 3B: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries) Lecture 3B: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries) Summarize with Shape, Center, Spread Displays: Stemplots, Histograms Five Number Summary, Outliers, Boxplots Mean vs.

More information

Mathematics Grade 7 Transition Alignment Guide (TAG) Tool

Mathematics Grade 7 Transition Alignment Guide (TAG) Tool Transition Alignment Guide (TAG) Tool As districts build their mathematics curriculum for 2013-14, it is important to remember the implementation schedule for new mathematics TEKS. In 2012, the Teas State

More information

Chapter 2: Descriptive Analysis and Presentation of Single- Variable Data

Chapter 2: Descriptive Analysis and Presentation of Single- Variable Data Chapter 2: Descriptive Analysis and Presentation of Single- Variable Data Mean 26.86667 Standard Error 2.816392 Median 25 Mode 20 Standard Deviation 10.90784 Sample Variance 118.981 Kurtosis -0.61717 Skewness

More information

Chapters 1 & 2 Exam Review

Chapters 1 & 2 Exam Review Problems 1-3 refer to the following five boxplots. 1.) To which of the above boxplots does the following histogram correspond? (A) A (B) B (C) C (D) D (E) E 2.) To which of the above boxplots does the

More information

A C E. Answers Investigation 4. Applications

A C E. Answers Investigation 4. Applications Answers Applications 1. 1 student 2. You can use the histogram with 5-minute intervals to determine the number of students that spend at least 15 minutes traveling to school. To find the number of students,

More information

Comparing Measures of Central Tendency *

Comparing Measures of Central Tendency * OpenStax-CNX module: m11011 1 Comparing Measures of Central Tendency * David Lane This work is produced by OpenStax-CNX and licensed under the Creative Commons Attribution License 1.0 1 Comparing Measures

More information

Describing distributions with numbers

Describing distributions with numbers Describing distributions with numbers A large number or numerical methods are available for describing quantitative data sets. Most of these methods measure one of two data characteristics: The central

More information

M 140 Test 1 B Name (1 point) SHOW YOUR WORK FOR FULL CREDIT! Problem Max. Points Your Points Total 75

M 140 Test 1 B Name (1 point) SHOW YOUR WORK FOR FULL CREDIT! Problem Max. Points Your Points Total 75 M 140 est 1 B Name (1 point) SHOW YOUR WORK FOR FULL CREDI! Problem Max. Points Your Points 1-10 10 11 10 12 3 13 4 14 18 15 8 16 7 17 14 otal 75 Multiple choice questions (1 point each) For questions

More information

A graph for a quantitative variable that divides a distribution into 25% segments.

A graph for a quantitative variable that divides a distribution into 25% segments. STATISTICS Unit 2 STUDY GUIDE Topics 6-10 Part 1: Vocabulary For each word, be sure you know the definition, the formula, or what the graph looks like. Name Block A. association M. mean absolute deviation

More information

Math 223 Lecture Notes 3/15/04 From The Basic Practice of Statistics, bymoore

Math 223 Lecture Notes 3/15/04 From The Basic Practice of Statistics, bymoore Math 223 Lecture Notes 3/15/04 From The Basic Practice of Statistics, bymoore Chapter 3 continued Describing distributions with numbers Measuring spread of data: Quartiles Definition 1: The interquartile

More information

Chapter 4.notebook. August 30, 2017

Chapter 4.notebook. August 30, 2017 Sep 1 7:53 AM Sep 1 8:21 AM Sep 1 8:21 AM 1 Sep 1 8:23 AM Sep 1 8:23 AM Sep 1 8:23 AM SOCS When describing a distribution, make sure to always tell about three things: shape, outliers, center, and spread

More information

Lecture 1: Description of Data. Readings: Sections 1.2,

Lecture 1: Description of Data. Readings: Sections 1.2, Lecture 1: Description of Data Readings: Sections 1.,.1-.3 1 Variable Example 1 a. Write two complete and grammatically correct sentences, explaining your primary reason for taking this course and then

More information

A is one of the categories into which qualitative data can be classified.

A is one of the categories into which qualitative data can be classified. Chapter 2 Methods for Describing Sets of Data 2.1 Describing qualitative data Recall qualitative data: non-numerical or categorical data Basic definitions: A is one of the categories into which qualitative

More information

PS2.1 & 2.2: Linear Correlations PS2: Bivariate Statistics

PS2.1 & 2.2: Linear Correlations PS2: Bivariate Statistics PS2.1 & 2.2: Linear Correlations PS2: Bivariate Statistics LT1: Basics of Correlation LT2: Measuring Correlation and Line of best fit by eye Univariate (one variable) Displays Frequency tables Bar graphs

More information

CHAPTER 1. Introduction

CHAPTER 1. Introduction CHAPTER 1 Introduction Engineers and scientists are constantly exposed to collections of facts, or data. The discipline of statistics provides methods for organizing and summarizing data, and for drawing

More information

MATH 1150 Chapter 2 Notation and Terminology

MATH 1150 Chapter 2 Notation and Terminology MATH 1150 Chapter 2 Notation and Terminology Categorical Data The following is a dataset for 30 randomly selected adults in the U.S., showing the values of two categorical variables: whether or not the

More information

Chapter 26: Comparing Counts (Chi Square)

Chapter 26: Comparing Counts (Chi Square) Chapter 6: Comparing Counts (Chi Square) We ve seen that you can turn a qualitative variable into a quantitative one (by counting the number of successes and failures), but that s a compromise it forces

More information

Instructor: Doug Ensley Course: MAT Applied Statistics - Ensley

Instructor: Doug Ensley Course: MAT Applied Statistics - Ensley Student: Date: Instructor: Doug Ensley Course: MAT117 01 Applied Statistics - Ensley Assignment: Online 04 - Sections 2.5 and 2.6 1. A travel magazine recently presented data on the annual number of vacation

More information

Statistic: a that can be from a sample without making use of any unknown. In practice we will use to establish unknown parameters.

Statistic: a that can be from a sample without making use of any unknown. In practice we will use to establish unknown parameters. Chapter 9: Sampling Distributions 9.1: Sampling Distributions IDEA: How often would a given method of sampling give a correct answer if it was repeated many times? That is, if you took repeated samples

More information

Lecture 1: Descriptive Statistics

Lecture 1: Descriptive Statistics Lecture 1: Descriptive Statistics MSU-STT-351-Sum 15 (P. Vellaisamy: MSU-STT-351-Sum 15) Probability & Statistics for Engineers 1 / 56 Contents 1 Introduction 2 Branches of Statistics Descriptive Statistics

More information

Lecture Notes 2: Variables and graphics

Lecture Notes 2: Variables and graphics Highlights: Lecture Notes 2: Variables and graphics Quantitative vs. qualitative variables Continuous vs. discrete and ordinal vs. nominal variables Frequency distributions Pie charts Bar charts Histograms

More information

3.1 Graphs of Polynomials

3.1 Graphs of Polynomials 3.1 Graphs of Polynomials Three of the families of functions studied thus far: constant, linear and quadratic, belong to a much larger group of functions called polynomials. We begin our formal study of

More information

Reporting Measurement and Uncertainty

Reporting Measurement and Uncertainty Introduction Reporting Measurement and Uncertainty One aspect of Physics is to describe the physical world. In this class, we are concerned primarily with describing objects in motion and objects acted

More information

Statistics and parameters

Statistics and parameters Statistics and parameters Tables, histograms and other charts are used to summarize large amounts of data. Often, an even more extreme summary is desirable. Statistics and parameters are numbers that characterize

More information

appstats8.notebook October 11, 2016

appstats8.notebook October 11, 2016 Chapter 8 Linear Regression Objective: Students will construct and analyze a linear model for a given set of data. Fat Versus Protein: An Example pg 168 The following is a scatterplot of total fat versus

More information

Vocabulary: Samples and Populations

Vocabulary: Samples and Populations Vocabulary: Samples and Populations Concept Different types of data Categorical data results when the question asked in a survey or sample can be answered with a nonnumerical answer. For example if we

More information

Descriptive Statistics-I. Dr Mahmoud Alhussami

Descriptive Statistics-I. Dr Mahmoud Alhussami Descriptive Statistics-I Dr Mahmoud Alhussami Biostatistics What is the biostatistics? A branch of applied math. that deals with collecting, organizing and interpreting data using well-defined procedures.

More information













