SAMPLE. Describing the distribution of a single variable


CHAPTER 22
Describing the distribution of a single variable

Objectives
To introduce the two main types of data: categorical and numerical
To use bar charts to display frequency distributions of categorical data
To use histograms and frequency polygons to display frequency distributions of numerical data
To use cumulative frequency polygons and cumulative relative frequency polygons to display cumulative frequency distributions
To use the stem-and-leaf plot to display numerical data
To use the histogram to display numerical data
To use these plots to describe the distribution of a numerical variable in terms of symmetry, centre, spread and outliers
To define and calculate the summary statistics mean, median, range, interquartile range, variance and standard deviation
To understand the properties of these summary statistics and when each is appropriate
To construct and interpret boxplots, and use them to compare data sets

22.1 Types of variables
A characteristic about which information is recorded is called a variable, because its value is not always the same. Several types of variable can be identified. Consider the following situations.

Students answer a question by selecting yes, no or don't know.
Students say how they feel about a particular statement by ticking one of strongly agree, agree, no opinion, disagree or strongly disagree.
Students write down the shoe size that they take.
Students write down their height.

These situations give rise to two different types of data. The data arising from the first two situations are called categorical data, because the data can only be classified by the name of the category from which they come; there is no quantity associated with each category.

The data arising from the third and fourth situations are called numerical data. These examples differ slightly from each other in the type of numerical data they generate. Shoe sizes are of the form ..., 6, 6.5, 7, 7.5, ... These are called discrete data, because the data can only take particular values. Discrete data often arise in situations where counting is involved. The other type of numerical data is continuous data, where the variable may take any value (sometimes within a specified interval). Such data arise when students measure height. In fact, continuous data often arise when measuring is involved.

Exercise 22A
1 Classify the data which arise from the following situations as categorical or numerical.
a Kindergarten pupils bring along their favourite toy, and they are grouped together under the headings: dolls, soft toys, games, cars, and other.
b The number of students on each of twenty school buses is counted.
c A group of people each write down their favourite colour.
d Each student in a class is weighed in kilograms.
e Each student in a class is weighed and then classified as light, average or heavy.
f People rate their enthusiasm for a certain rock group as low, medium, or high.
2 Classify the data which arise from the following situations as categorical or numerical.
a The intelligence quotient (IQ) of a group of students is measured using a test.
b A group of people are asked to indicate their attitude to capital punishment by selecting a number from 1 to 5 where 1 = strongly disagree, 2 = disagree, 3 = undecided, 4 = agree, and 5 = strongly agree.
3 Classify the following numerical data as either discrete or continuous.
a The number of pages in a book.
b The price paid to fill the tank of a car with petrol.
c The volume of petrol used to fill the tank of a car.
d The time between the arrival of successive customers at an autobank teller.
e The number of tosses of a die required before a six is thrown.

Essential Advanced General Mathematics

22.2 Displaying categorical data: the bar chart
Suppose a group of 130 students were asked to nominate their favourite kind of music under the categories hard rock, oldies, classical, rap, country or other. The table shows the data for the first few students.

Student's name    Favourite music
Daniel            hard rock
Karina            classical
John              country
Jodie             hard rock

The table gives data for individual students. To consider the group as a whole, the data should be collected into a table called a frequency distribution by counting how many times each of the different values of the variable has been observed. Counting the number of students who responded to the question on favourite kinds of music gave the following results in each category.

[Frequency table: number of students choosing each of hard rock, other, oldies, classical, rap and country]

While a clear indication of the group's preferences can be seen from the table, a visual display may be constructed to illustrate this. When the data are categorical, the appropriate display is a bar chart. The categories are indicated on the horizontal axis and the corresponding numbers in each category are shown on the vertical axis.

[Bar chart: number of students (vertical axis) against type of music (horizontal axis), with the categories in the order hard rock, other, oldies, classical, rap, country]

The order in which the categories are listed on the horizontal axis is not important, as no order is inherent in the category labels. In this particular bar chart, the categories are listed in decreasing order by number.

From the bar chart the music preferences for the group of students may be easily compared. The value which occurs most frequently is called the mode of the variable. Here it can be seen that the mode is hard rock.
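The counting step described above can be sketched in a few lines of Python. This is only an illustration: it uses the four responses listed in the table of individual students (not the full group of 130), and the function names `frequency_distribution` and `modes` are chosen here for the sketch rather than taken from the text.

```python
from collections import Counter

# Responses of the first four students listed in the table (Daniel, Karina, John, Jodie).
responses = ["hard rock", "classical", "country", "hard rock"]


def frequency_distribution(values):
    """Count how many times each category has been observed."""
    return Counter(values)


def modes(values):
    """Return every category that attains the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return [category for category, n in counts.items() if n == top]


freq = frequency_distribution(responses)
for category, n in freq.most_common():  # listed in decreasing order of frequency
    print(f"{category:<10} {n}")
print("mode:", modes(responses))
```

Returning a list from `modes` is a deliberate choice: a categorical data set can have several equally common values, in which case no single mode exists.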

Exercise 22B
1 A group of students were asked to select their favourite type of fast food, with the following results.

Food type         Number of students
hamburgers        23
chicken           7
fish and chips    6
Chinese           7
pizza             18
other             8

a Draw a bar chart for these data.
b Which is the most popular food type?

2 The following responses were received to a question regarding the return of capital punishment.

strongly agree       21
agree                11
don't know           42
disagree             53
strongly disagree    129

a Draw a bar chart for these data.
b How many respondents either agree or strongly agree?

3 A video shop proprietor took note of the type of films borrowed during a particular day with the following results.

comedy    53
drama     89
horror    42
music     15
other     33

a Construct a bar chart to illustrate these data.
b Which is the least popular film type?

4 A survey of secondary school students' preferred ways of spending their leisure time at home gave the following results.

watch TV           42%
read               13%
listen to music    23%
watch a video      12%
phone friends      4%
other              6%

a Construct a bar chart to illustrate these data.
b What is the most common leisure activity?

22.3 Displaying numerical data: the histogram
In previous studies you have been introduced to various ways of summarising and displaying numerical data, including dotplots, stem-and-leaf plots, histograms and boxplots. Constructing a histogram for discrete numerical data is demonstrated in Example 1.

Example 1
The numbers of siblings reported by each student in Year 11 at a local school are as follows:

Construct a frequency distribution of the number of siblings.

Solution
To construct the frequency distribution, count the number of students corresponding to each number of siblings, as shown.

[Frequency table: number of siblings against frequency]

A histogram looks similar to a bar chart, but because the data are numeric there is a natural order to the plot which may not occur with a bar chart. Usually for discrete data the actual data values are located at the middle of the appropriate column, as shown.

[Histogram: frequency against number of siblings]

An alternative display for a frequency distribution is a frequency polygon. It is formed by plotting the values in the frequency histogram with points, which are then joined by straight lines. A frequency polygon for the data in Example 1 is shown by the red line in this diagram.

[Frequency polygon drawn over the histogram: frequency against number of siblings]

When the range of responses is large it is usual to gather the data together into sub-groups or class intervals. The number of data values corresponding to each class interval is called the class frequency.

Class intervals should be chosen according to the following principles:
Every data value should be in an interval.
The intervals should not overlap.
There should be no gaps between the intervals.

The choice of intervals can vary, but generally a division which results in about 5 to 15 groups is preferred. It is also usual to choose an interval width which is easy for the reader to interpret, such as 10 units, 100 units, 1000 units etc. (depending on the data). By convention, the beginning of the interval is given the appropriate exact value, rather than the end. For example, intervals of 0-49, 50-99, ... would be preferred over the intervals 1-50, ..., etc.

Example 2
A researcher asked a group of people to record how many cups of coffee they drank in a particular week. Here are her results.

Construct a frequency distribution and hence a histogram of these data.

Solution
Because there are so many different results and they are spread over a wide range, the data are summarised into class intervals. As the minimum value is 0 and the maximum is 34, intervals of width 5 would be appropriate, giving the frequency distribution shown in the table.

[Frequency table: number of cups of coffee against frequency]

The corresponding histogram may then be drawn.

[Histogram: frequency against number of cups of coffee]

Example 2 was concerned with a discrete numerical variable. When constructing a frequency distribution of continuous data, the data are again grouped, as shown in Example 3.
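The grouping of discrete data into class intervals can be sketched as follows. The researcher's actual results are not reproduced here, so the list `cups` is hypothetical data invented for illustration (chosen so that the minimum is 0 and the maximum is 34), and `class_frequencies` is a name used only in this sketch.

```python
from collections import Counter

# Hypothetical weekly coffee counts, invented for illustration only.
cups = [0, 2, 5, 5, 9, 10, 14, 17, 21, 34]


def class_frequencies(values, width, start=0):
    """Group whole-number values (all >= start) into intervals of equal width.

    Returns a list of (lower, upper, frequency) tuples, where each interval
    covers the whole numbers lower..upper inclusive. Empty intervals are kept
    so that there are no gaps between the intervals.
    """
    counts = Counter((v - start) // width for v in values)
    last = max(counts)
    return [
        (start + k * width, start + (k + 1) * width - 1, counts.get(k, 0))
        for k in range(last + 1)
    ]


for lower, upper, frequency in class_frequencies(cups, width=5):
    print(f"{lower:>2}-{upper:<2}  {frequency}")
```

Because every value is assigned to exactly one interval by integer division, the three principles listed above (every value in an interval, no overlaps, no gaps) hold by construction.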

Example 3
The following are the heights of the players in a basketball club, measured to the nearest millimetre.

Construct a frequency distribution and hence a histogram of these data.

Solution
From the data it seems that intervals of width 5 will be suitable. All values of the variable which are 170 or more, but less than 175, have been included in the first interval. The second interval includes values from 175 to less than 180, and so on for the rest of the table.

[Frequency table: player heights against frequency]

The histogram of these data is shown here.

[Histogram: frequency against player heights]

The interval in a frequency distribution which has the highest class frequency is called the modal class. Here the modal class is the interval from 180 to less than 185.

Using the TI-Nspire
The calculator can be used to construct a histogram for numerical data. This will be illustrated using the basketball player height data from Example 3.

The data are most easily entered in a Lists & Spreadsheet application ( 3). First, use the up/down arrows to name the first column height. Then enter each of the 41 numbers as shown.

Open a Data & Statistics application ( 5) to graph the data. At first the data display as shown.

Specify the x variable by selecting Add X Variable from the Plot Properties menu (b 2 4) and selecting height. The data now display as shown. (Note: It is also possible to use the NavPad to move down below the x-axis and click to add the x variable.)

Select Histogram from the Plot Type menu (b 13). The data now display as shown.

Select Bin Settings from the Histogram Properties submenu of the Plot Properties menu (b 222). Let Width = 5 and Alignment = 170.

Finally, select Zoom, Data from the Window/Zoom menu (b 5 2) to display the data as shown.

Using the Casio ClassPad
The calculator can be used to construct a histogram for numerical data. This will be illustrated using the basketball player height data from Example 3.

Enter the data into list1, tapping EXE to enter each value and move down the column.

Tap SetGraph, Setting... and the tab for Graph 1, enter the settings shown and tap SET.

Tap SetGraph, StatGraph1 and then tap the box to tick and select the graph.

Tap to produce the graph, selecting HStart = 4 (the left bound of the histogram) and HStep = 4 (the desired interval width) when prompted. The histogram is produced as shown.

With the graph window selected (bold border) tap 6 to adjust the viewing window for the graph.

Tap Analysis, Trace and use the navigator key to move from column to column and display the count for that column.

Relative and percentage frequencies
When frequencies are expressed as a proportion of the total number they are called relative frequencies. By expressing the frequencies as relative frequencies, more information is obtained about the data set. Multiplying the relative frequencies by 100 readily converts them to percentage frequencies, which are easier to interpret. An example of the calculation of relative and percentage frequencies is shown in Example 4.

Example 4
Construct a relative frequency distribution and a percentage frequency distribution for the player height data.

Solution

Player heights (cm)    Frequency    Relative frequency    Percentage frequency
170 to <175            4            4/41 = 0.10           10%
175 to <180            5            5/41 = 0.12           12%
180 to <185            13           13/41 = 0.32          32%
185 to <190            9            9/41 = 0.22           22%
190 to <195            7            7/41 = 0.17           17%
195 to <200            1            1/41 = 0.02           2%
200 to <205            2            2/41 = 0.05           5%

From this table it can be seen, for example, that nine out of forty-one, or 22% of players, have heights from 185 cm to less than 190 cm.

Both the relative frequency histogram and the percentage frequency histogram are identical in shape to the frequency histogram; only the vertical scale is changed. To construct either of these histograms from a list of data, use a graphics calculator to construct the frequency histogram, and then convert the individual frequencies to either relative frequencies or percentage frequencies one by one as required.

Cumulative frequency distribution
To answer questions concerning the number or proportion of the data values which are less than a given value, a cumulative frequency distribution or a cumulative relative frequency distribution can be constructed. In both a cumulative frequency distribution and a cumulative relative frequency distribution, the numbers of observations in each class are accumulated from low to high values of the variable.
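The relative, percentage and cumulative calculations can be sketched together in Python. The class frequencies below are those of the player height data; the frequency of the first class is taken to be 4, an assumption made here so that the classes total the 41 players mentioned in the text. The variable names are illustrative only.

```python
from itertools import accumulate

# Lower bounds of the classes of width 5 cm, and the class frequencies.
# Assumption: the first frequency (4) is inferred so that the total is 41.
lower_bounds = [170, 175, 180, 185, 190, 195, 200]
frequencies = [4, 5, 13, 9, 7, 1, 2]

total = sum(frequencies)

# Relative frequency: each frequency as a proportion of the total.
relative = [f / total for f in frequencies]

# Percentage frequency: relative frequency multiplied by 100.
percentage = [100 * r for r in relative]

# Cumulative frequency: running total from low to high values.
cumulative = list(accumulate(frequencies))
cumulative_relative = [c / total for c in cumulative]

for lo, f, r, p, c, cr in zip(
    lower_bounds, frequencies, relative, percentage, cumulative, cumulative_relative
):
    print(f"{lo} to <{lo + 5}: f={f:>2}  rel={r:.2f}  pct={p:.0f}%  "
          f"cum={c:>2}  cum rel={cr:.2f}")
```

Each cumulative entry counts the players shorter than the upper bound of its class, so the third row gives the number of players under 185 cm.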

Example 5
Construct a cumulative frequency distribution and a cumulative relative frequency distribution for the data in Example 4.

Solution

[Table: player heights (cm), given as "less than" values, with frequency, cumulative frequency and cumulative relative frequency]

Each cumulative frequency was obtained by adding preceding values of the frequency. In the same way the cumulative relative frequencies were obtained by adding preceding relative frequencies. Thus it can be said that a proportion of 0.54, or 54%, of players are less than 185 cm tall.

A graphical representation of a cumulative frequency distribution is called a cumulative frequency polygon and has a distinctive appearance, as it always starts at zero and is non-decreasing. This graph shows, on the vertical axis, the number of players shorter than any height given on the horizontal axis.

[Cumulative frequency polygon: cumulative frequency (vertical axis, 0 to 40) against player heights (horizontal axis)]

The cumulative relative frequency distribution could also be plotted as a cumulative relative frequency polygon, which would differ from the cumulative frequency polygon only in the scale on the vertical axis, which would run from 0 to 1.

Exercise 22C
1 The number of pets reported by each student in a class is given in the following table:

Construct a frequency distribution of the numbers of pets reported by each student.

2 The number of children in the family for each student in a class is shown in this histogram.

[Histogram: number of students against size of family]

a How many students are the only child in a family?
b What is the most common number of children in the family?
c How many students come from families with six or more children?
d How many students are there in the class?

3 The following histogram gives the scores on a general knowledge quiz for a class of Year 11 students.

[Histogram: number of students against marks]

a How many students scored from marks?
b How many students attempted the quiz?
c What is the modal class?
d If a mark of 50 or more is designated as a pass, how many students passed the quiz?

4 The maximum temperatures for several capital cities around the world on a particular day, in degrees Celsius, were:

a Use a class interval of 5 to construct a frequency distribution for these data.
b Construct the corresponding relative frequency distribution.
c Draw a histogram from the frequency distribution.
d What percentage of cities had a maximum temperature of less than 25°C?

5 A student purchases 21 new text books from a school book supplier with the following prices (in dollars).

a Draw a histogram of these data using appropriate class intervals.
b What is the modal class?
c Construct a cumulative frequency distribution for these data and draw the cumulative frequency polygon.

6 A group of students were asked to draw a line which they estimated to be the same length as a 30 cm ruler. The lines were then measured (in cm) with the following results.

a Construct a histogram of the frequency distribution.
b Construct a cumulative frequency distribution for these data and draw the cumulative frequency polygon.
c Write a sentence to describe the students' performance on this task.

7 The following are the marks obtained by a group of Year 11 Chemistry students on the end of year exam.

a Using a graphics calculator, or otherwise, construct a histogram of the frequency distribution.
b Construct a cumulative frequency distribution for these data and draw the cumulative frequency polygon.
c Write a sentence to describe the students' performance on this exam.

8 The following 50 values are the lengths (in metres) of some par 4 golf holes from Melbourne golf courses.

a Construct a histogram of the frequency distribution.
b Construct a cumulative frequency distribution for these data and draw the cumulative frequency polygon.

c Use the cumulative frequency polygon to estimate:
i the proportion of par 4 holes below 300 m in length
ii the proportion of par 4 holes 360 m or more in length
iii the length which is exceeded by 90% of the par 4 holes

22.4 Characteristics of distributions of numerical variables
Distributions of numerical variables are characterised by their shapes and by special features such as centre and spread.

Two distributions are said to differ in centre if the values of the variable in one distribution are generally larger than the values of the variable in the other distribution. Consider, for example, the following histograms shown on the same scale.

[Histograms a and b]

It can be seen that plot b is identical to plot a but moved horizontally several units to the right, indicating that these distributions differ in the location of their centres.

The next pair of histograms also differ, but not in the same way. While both histograms are centred at about the same place, histogram d is more spread out. Two distributions are said to differ in spread if the values of the variable in one distribution tend to be more spread out than the values of the variable in the other distribution.

[Histograms c and d]

A distribution is said to be symmetric if it forms a mirror image of itself when folded in the middle along a vertical axis; otherwise it is said to be skewed. Histogram e is perfectly symmetrical, while f shows a distribution which is approximately symmetric.

[Histograms e and f]

If a histogram has a short tail to the left and a long tail pointing to the right it is said to be positively skewed (because the long tail extends towards the positive end of the distribution), as shown in histogram g.

If a histogram has a short tail to the right and a long tail pointing to the left it is said to be negatively skewed (because the long tail extends towards the negative end of the distribution), as shown in histogram h.

[Histogram g: positively skewed. Histogram h: negatively skewed]

Knowing whether a distribution is skewed or symmetric is important, as this gives considerable information concerning the choice of appropriate summary statistics, as will be seen in the next section.

Exercise 22D
1 Do the following pairs of distributions differ in centre, spread, both or neither?

[Pairs of histograms a, b and c]

2 Describe the shape of each of the following histograms.

[Histograms a, b and c]

3 What is the shape of the histogram drawn in Question 6 of Exercise 22C?
4 What is the shape of the histogram drawn in Question 7 of Exercise 22C?
5 What is the shape of the histogram drawn in Question 8 of Exercise 22C?

22.5 Stem-and-leaf plots
An informative data display for a small (fewer than 50 values) numerical data set is the stem-and-leaf plot. The construction of the stem-and-leaf plot is illustrated in Example 6.

Example 6
By the end of 2004 the number of test matches played, as captain, by each of the Australian cricket captains was:

Construct a stem-and-leaf plot of these data.

Solution
To make a stem-and-leaf plot, find the smallest and the largest data values. From the table above, the smallest value is 1, which is given a 0 in the tens column, and the largest is 93, which has a 9 in the tens column. This means that the stems are chosen to be from 0 to 9. These are written in a column with a vertical line to their right, as shown.

[Stems 0 to 9 written in a column]

The units for each data point are then entered to the right of the dividing line. They are entered initially in the order in which they appear in the data. When all data points are entered in the table, the stem-and-leaf plot looks like this.

[Stem-and-leaf plot with unordered leaves]

To complete the plot the leaves are ordered, and a key is added to specify the place value of the stem and the leaves.

[Ordered stem-and-leaf plot; key: 3 | 9 indicates 39 matches]

It can be seen from this plot that one captain has led Australia in many more test matches than any other (Allan Border, who captained Australia in 93 test matches). When a value sits away from the main body of the data it is called an outlier.
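The construction just described (choose the stems, enter the leaves, then order them) can be sketched as a short Python function. The captaincy data are not reproduced here, so `matches` is hypothetical data invented for illustration; it merely shares the smallest value (1) and largest value (93) mentioned in the solution. The function name `stem_and_leaf` is likewise an assumption of this sketch.

```python
from collections import defaultdict

# Hypothetical values, invented for illustration only.
matches = [1, 5, 12, 18, 25, 25, 39, 93]


def stem_and_leaf(values):
    """Build an ordered stem-and-leaf plot for non-negative whole numbers.

    The stem is the tens digit(s) and the leaf is the units digit.
    Stems with no leaves are kept so that gaps in the data remain visible.
    """
    rows = defaultdict(list)
    for v in sorted(values):  # sorting first means the leaves are entered in order
        rows[v // 10].append(v % 10)
    lines = []
    for stem in range(min(rows), max(rows) + 1):
        leaves = " ".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines


print("\n".join(stem_and_leaf(matches)))
print("Key: 3 | 9 indicates 39")
```

Keeping the empty stems is what lets an outlier stand out: in the hypothetical data, the value 93 is separated from the rest by five empty rows.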

Stem-and-leaf plots have the advantage of retaining all the information in the data set while achieving a display not unlike that of a histogram (turned on its side). In addition, a stem-and-leaf plot clearly shows:
the range of values
where the values are concentrated
the shape of the data set
whether there are any gaps in which no values are observed
any unusual values (outliers).

Grouping the leaves in tens is simplest; other convenient groupings are in fives or twos, as shown in Example 7.

Example 7
The birth weights, in kilograms, of the first 30 babies born at a hospital in a selected month are as follows.

Construct a stem-and-leaf plot of these data.

Solution
A stem-and-leaf plot of the birth weights, with the stem representing units and the leaves representing one-tenth of a unit, may be constructed.

[Stem-and-leaf plot with one row per stem; key: 3 | 0 indicates 3.0 kilograms]

The plot, which allows one row for each different stem, appears to be too compact. These data may be better displayed by constructing a stem-and-leaf plot with two rows for each stem. These rows correspond to the digits {0, 1, 2, 3, 4} in the first row and {5, 6, 7, 8, 9} in the second row.

[Stem-and-leaf plot with two rows per stem; key: 3 | 0 indicates 3.0 kilograms]

The only other possibility for a stem-and-leaf plot is one which has five rows per stem. These rows correspond to the digits {0, 1}, {2, 3}, {4, 5}, {6, 7} and {8, 9}.

[Stem-and-leaf plot with five rows per stem; key: 3 | 0 indicates 3.0 kilograms]

None of the stem-and-leaf displays shown is correct or incorrect in itself. A stem-and-leaf plot is used to explore data, and more than one may need to be constructed before the most informative one is obtained. Again, from 5 to 15 rows is generally the most helpful, but this may vary in individual cases.

When the data have too many digits for a convenient stem-and-leaf plot they should be rounded or truncated. Truncating a number means simply dropping off the unwanted digits. So, for example, a value lying between 149.5 and 150 would become 149 if truncated to three digits, but 150 if rounded to three digits. Since the object of a stem-and-leaf display is to give a feeling for the shape and patterns in the data set, the decision on whether to round or truncate is not very important; however, when constructing a stem-and-leaf display the data are generally truncated, as this is what commonly used data analysis computer packages will do.

Some of the most interesting investigations in statistics involve comparing two or more data sets. Stem-and-leaf plots are useful displays for the comparison of two data sets, as shown in the following example.

Example 8
The following table gives the number of disposals by members of the Port Adelaide and Brisbane football teams in the 2004 AFL Grand Final.

[Table: disposals for Port Adelaide and Brisbane players]

Construct back-to-back stem-and-leaf plots of these data.

Solution
To compare the two groups, the stem-and-leaf plots are drawn back to back, using two rows per stem.

[Back-to-back stem-and-leaf plot: Port Adelaide leaves on the left of the stem, Brisbane leaves on the right; key: 0 | 2 represents 20 disposals (Port Adelaide), 2 | 0 represents 20 disposals (Brisbane)]

The leaves on the left of the stem are centred slightly higher than the leaves on the right, which suggests that, overall, Port Adelaide recorded more disposals. The spread of disposals for Port Adelaide appears narrower than that of the Brisbane players.

Exercise 22E
1 The monthly rainfall for Melbourne, in a particular year, is given in the following table (in millimetres).

[Table: rainfall (mm) for each month J F M A M J J A S O N D]

a Construct a stem-and-leaf plot of the rainfall, using the following stems.
b In how many months is the rainfall 60 mm or more?

2 An investigator recorded the amount of time 24 similar batteries lasted in a toy. Her results in hours were:

a Make a stem-and-leaf plot of these times with two rows per stem.
b How many of the batteries lasted for more than 30 hours?

3 The amount of time (in minutes) that a class of students spent on homework on one particular night was:

a Make a stem-and-leaf plot of these times.
b How many students spent more than 60 minutes on homework?
c What is the shape of the distribution?

4 The costs of various brands of track shoes at a retail outlet are as follows.
$49.99 $75.49 $68.99 $ $75.99 $39.99 $ $ $84.99 $36.98 $95.49 $28.99 $25.49 $78.99 $45.99 $46.99 $76.99 $82.99 $79.99 $

a Construct a stem-and-leaf plot of these data.
b What is the shape of the distribution?

5 The students in a class were asked to write down the ages of their mothers and fathers.

[Table: mother's age and father's age]

a Construct a back-to-back stem-and-leaf plot of these data sets.
b How do the ages of the students' mothers and fathers compare in terms of shape, centre and spread?

6 The results of a mathematics test for two different classes of students are given in the table.

[Table: results for Class A and Class B]

a Construct a back-to-back stem-and-leaf plot to compare the data sets.
b How many students in each class scored less than 50%?
c Which class do you think performed better overall on the test? Give reasons for your answer.

22.6 Summarising data
A statistic is a number that can be computed from data. Certain special statistics are called summary statistics, because they numerically summarise special features of the data set under consideration. Of course, whenever any set of numbers is summarised into just one or two figures much information is lost, but if the summary statistics are well chosen they will also help to reveal the message which may be hidden in the data set.

Summary statistics are generally either measures of centre or measures of spread. There are many different examples of each of these measures, and there are situations when one of the measures is more appropriate than another.

Measures of centre

Mean
The most commonly used measure of centre of a distribution of a numerical variable is the mean. This is calculated by summing all the data values and dividing by the number of values in the data set.

Example 9
The following data set shows the number of premierships won by each of the current AFL teams. Find the mean of the number of premiership wins.

Team             Premierships
Carlton          16
Essendon         16
Collingwood      14
Melbourne        12
Fitzroy/Lions    11
Richmond         10
Hawthorn         9
Geelong          6
Kangaroos        4
Sydney           3
West Coast       2
Adelaide         2
Port Adelaide    1
W Bulldogs       1
St Kilda         1
Fremantle        0

Solution
$\text{mean} = \dfrac{16 + 16 + 14 + 12 + 11 + 10 + 9 + 6 + 4 + 3 + 2 + 2 + 1 + 1 + 1 + 0}{16} = \dfrac{108}{16} = 6.75 \approx 6.8$

The mean of a sample is always denoted by the symbol $\bar{x}$, which is called "x bar". In general, if n observations are denoted by $x_1, x_2, \dots, x_n$ the mean is

$\bar{x} = \dfrac{x_1 + x_2 + \dots + x_n}{n}$

or, in a more compact version,

$\bar{x} = \dfrac{1}{n} \sum_{i=1}^{n} x_i$

where the symbol $\Sigma$ is the upper case Greek sigma, which in mathematics means the sum of the terms.

Note: The subscripts on the x's are used to identify all of the n different values of x. They do not mean that the x's have to be written in any special order. The values of x in the example are in order only because they were listed in that way in the table.

Median
Another useful measure of the centre of a distribution of a numerical variable is the middle value, or median. To find the value of the median, all the observations are listed in order and the middle one is the median. For example, in an ordered list of eleven observations whose sixth value is 6, the median is 6, as there are five observations on either side of this value when the data are listed in order.

Example 10
Find the median number of premierships in the AFL ladder using the data in Example 9.

Solution
As the data are already given in order, it only remains to decide which is the middle observation. Since there are 16 entries in the table there is no actual middle observation, so the median is chosen as the value half way between the two middle observations, in this case the eighth and ninth (6 and 4). Thus the median is equal to $\frac{1}{2}(6 + 4) = 5$. The interpretation here is that of the teams currently playing in the AFL, half (or 50%) have won the premiership 5 or more times and half (or 50%) have won the premiership 5 or fewer times.

In general, to compute the median of a distribution:
Arrange all the observations in ascending order according to size.
If n, the number of observations, is odd, then the median is the $\left(\frac{n+1}{2}\right)$th observation from the end of the list.
If n, the number of observations, is even, then the median is found by averaging the two middle observations in the list. That is, to find the median the $\left(\frac{n}{2}\right)$th and the $\left(\frac{n}{2} + 1\right)$th observations are added together, and divided by 2.

The median value is easily determined from a stem-and-leaf plot by counting to the required observation or observations from either end.
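The rules for the mean and the median can be checked against the premiership data from Example 9 with a short Python sketch. The function names `mean` and `median` are chosen for this illustration; Python's standard `statistics` module offers functions of the same names, but they are written out here so that each step of the rule is visible.

```python
# Premierships won by each team, as listed in the table in Example 9.
premierships = [16, 16, 14, 12, 11, 10, 9, 6, 4, 3, 2, 2, 1, 1, 1, 0]


def mean(values):
    """Sum all the data values and divide by the number of values."""
    return sum(values) / len(values)


def median(values):
    """Middle observation of the ordered data (average of the two middle ones if n is even)."""
    ordered = sorted(values)          # arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # the ((n + 1) / 2)th observation
    # the (n / 2)th and (n / 2 + 1)th observations, averaged
    return (ordered[mid - 1] + ordered[mid]) / 2


print(mean(premierships))    # 6.75
print(median(premierships))  # 5.0
```

The list indices are one less than the positions in the rule because Python counts from 0: `ordered[mid - 1]` and `ordered[mid]` are the eighth and ninth observations when n = 16.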

From Examples 9 and 10, the mean number of premierships won (6.8) and the median number of premierships won (5) have already been determined. These values are different, and the interesting question is: why are they different, and which is the better measure of centre for this example? To help answer this question consider a stem-and-leaf plot of these data.

[Stem-and-leaf plot of the premiership data]

From the stem-and-leaf plot it can be seen that the distribution is positively skewed. This example illustrates a property of the mean. When the distribution is skewed, or if there are one or two very extreme values, the value of the mean may be quite significantly affected. The median is not so affected by unusual observations. When the distribution is skewed or contains extreme values, the median is therefore generally preferred as a measure of centre, as it will give a better typical value of the variable under consideration.

Mode
The mode is the observation which occurs most often. It is a useful summary statistic, particularly for categorical data which do not lend themselves to some of the other numerical summary methods. Many texts state that the mode is a third option for a measure of centre, but this is generally not true. Sometimes data sets do not have a mode, or they have several modes, or they have a mode which is at one or other end of the range of values.

Measures of spread

Range
A measure of spread is calculated in order to judge the variability of a data set. That is, are most of the values clustered together, or are they rather spread out? The simplest measure of spread can be determined by considering the difference between the smallest and the largest observations. This is called the range.

Example 11
Consider the marks, for two different tasks, awarded to a group of students.

[Table: marks for Task A and Task B]

Find the range of each of these data sets.

Solution
For Task A, the minimum mark is 2 and the maximum mark is 94.
Range for Task A = 94 − 2 = 92
For Task B, the minimum mark is 11 and the maximum mark is 91.
Range for Task B = 91 − 11 = 80
The range for Task A is greater than the range for Task B. Is the range a useful summary statistic for comparing the spread of the two distributions? To help make this decision, consider the stem-and-leaf plots of the data sets:

[Stem-and-leaf plots: Task A and Task B]

From the stem-and-leaf plots of the data it appears that the spread of marks for the two tasks is not well described by the range. The marks for Task A are more concentrated than the marks for Task B, except for the two unusual values for Task A. Another measure of spread is needed, one which is not so influenced by these extreme values. For this the interquartile range is used.

Interquartile range

To find the interquartile range of a distribution:
Arrange all observations in order according to size.
Divide the observations into two equal-sized groups. If n, the number of observations, is odd, then the median is omitted from both groups.
Locate Q1, the first quartile, which is the median of the lower half of the observations, and Q3, the third quartile, which is the median of the upper half of the observations.
The interquartile range IQR is defined as the difference between the quartiles. That is,
\[ \text{IQR} = Q_3 - Q_1 \]
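The steps above can be sketched in Python as follows. This is a minimal illustration of the definition given here (the median is omitted from both halves when n is odd); the function names and the list of marks are hypothetical, chosen only so that one low and one high value stretch the range.

```python
def median_of_sorted(ordered):
    """Median of a list that is already in ascending order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(observations):
    """Return (Q1, Q3): the medians of the lower and upper halves.
    When n is odd the overall median is left out of both halves."""
    ordered = sorted(observations)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median_of_sorted(lower), median_of_sorted(upper)


def value_range(observations):
    """Range: largest observation minus smallest observation."""
    return max(observations) - min(observations)


def iqr(observations):
    """Interquartile range: Q3 - Q1."""
    q1, q3 = quartiles(observations)
    return q3 - q1


# Hypothetical marks: mostly close together, with two extreme values.
marks = [2, 35, 38, 40, 41, 43, 45, 47, 94]

print("range =", value_range(marks))
print("Q1, Q3 =", quartiles(marks))
print("IQR =", iqr(marks))
```

For these made-up marks the range is dominated by the two extreme values, while the interquartile range reflects only the tightly clustered middle of the data.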

Definitions of the quartiles of a distribution sometimes differ slightly from the one given here. Using different definitions may result in slight differences in the values obtained, but these will be minimal and should not be considered a difficulty.

Example 12
Find the interquartile ranges for Task A and Task B data given in Example 11.

Solution
For Task A the marks listed in order are:
[ordered marks for Task A]
Since there is an even number of observations, the lower half is:
[lower half of the Task A marks]
The median of this lower group is the eighth observation, 22, so Q1 = 22.
The upper half is:
[upper half of the Task A marks]
The median of this upper group is 47, so Q3 = 47.
Thus, the interquartile range, IQR = 47 − 22 = 25.
Similarly, for Task B data, the lower quartile = 31 and the upper quartile = 73, giving an interquartile range for this data set of 42.
Comparing the two values of interquartile range shows the spread of Task A marks to be much smaller than the spread of Task B marks, which seems consistent with the display.

The interquartile range is a measure of spread of a distribution which describes the range of the middle 50% of the observations. Since the upper 25% and the lower 25% of the observations are discarded, the interquartile range is generally not affected by the presence of outliers in the data set, which makes it a reliable measure of spread.

The median and quartiles of a distribution may also be determined from a cumulative relative frequency polygon. Since the median is the observation which divides the data set in half, this is the data value which corresponds to a cumulative relative frequency of 0.5 or 50%. Similarly, the first quartile corresponds to a cumulative relative frequency of 0.25 or 25%, and the third quartile corresponds to a cumulative relative frequency of 0.75 or 75%.
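Reading a value off a cumulative relative frequency polygon amounts to linear interpolation along the straight segment that crosses the required level. The sketch below shows one way this could be done in Python; the function name and the four plotted points are hypothetical and do not come from any graph in the text.

```python
def value_at(points, p):
    """Read a data value off a cumulative relative frequency polygon.

    points: (value, cumulative relative frequency) pairs in ascending order
            of value, with the frequencies rising from 0 to 1.
    p:      the cumulative relative frequency to look up, e.g. 0.5.
    """
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= p <= c1 and c1 > c0:
            # Linear interpolation along the segment joining the two points.
            return x0 + (p - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("p is outside the range covered by the polygon")


# Hypothetical polygon: (upper class boundary, cumulative relative frequency).
polygon = [(4, 0.0), (8, 0.25), (12, 0.75), (16, 1.0)]

median = value_at(polygon, 0.5)
q1 = value_at(polygon, 0.25)
q3 = value_at(polygon, 0.75)

print("median =", median)
print("Q1 =", q1, "Q3 =", q3, "IQR =", q3 - q1)
```

Values read from a printed graph are only as accurate as the graph allows, so small differences from statistics computed on the raw data are to be expected.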

Example 13
Use the cumulative relative frequency polygon to find the median and the interquartile range for the data set shown in the graph.

[Graph: cumulative relative frequency polygon, vertical axis in %]

Solution
From the plot of the data it can be seen that the median is 10, the first quartile is 8, the third quartile is 12 and hence the interquartile range is 12 − 8 = 4.

Standard deviation

Another extremely useful measure of spread is the standard deviation. It is derived by considering the distance of each observation from the sample mean. If the average of these distances is used as a measure of spread it will be found that, as some of these distances are positive and some are negative, adding them together results in a total of zero. A more useful measure will result if the distances are squared (which makes them all positive) and are then added together. The variance is defined as a kind of average of these squared distances. When the variance is calculated from a sample, rather than the whole population, the average is calculated by dividing by n − 1, rather than n. For the remainder of this discussion it will be assumed that the data under consideration are from a sample.

Since the variance has been calculated by squaring the distances, it is sensible to find the square root of the variance, so that the measure reverts to a scale comparable to the original data. This results in a measure of spread which is called the standard deviation. Standard deviation calculated from a sample is denoted s.

Formally the standard deviation may be defined as follows. If a data set consists of n observations denoted \( x_1, x_2, \ldots, x_n \), the standard deviation is
\[ s = \sqrt{\frac{1}{n-1}\left[(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2\right]} \]
or, in more compact notation,
\[ s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2} \]
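The formula can be followed step by step in Python. The sketch below is an illustration only: the function name and the data set are hypothetical, and the result is compared with `statistics.stdev` from the Python standard library, which also divides by n − 1.

```python
import math
import statistics


def sample_standard_deviation(observations):
    """Sample standard deviation s, dividing by n - 1 as in the formula above."""
    n = len(observations)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(observations) / n
    # Squared distance of each observation from the sample mean.
    squared_distances = [(x - mean) ** 2 for x in observations]
    variance = sum(squared_distances) / (n - 1)
    return math.sqrt(variance)


data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data set with mean 5

# The unsquared distances from the mean cancel out, which is why they are squared.
print(sum(x - 5 for x in data))

s = sample_standard_deviation(data)
print(round(s, 2))
print(round(statistics.stdev(data), 2))   # standard-library check
```

The first printed value is zero, which illustrates why the plain average of the distances is of no use as a measure of spread.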

Example 14
Calculate the standard deviation of the following data set.

[data set]

Solution
Construct a table with columns \( x_i \), \( x_i - \bar{x} \) and \( (x_i - \bar{x})^2 \). The column total for the observations is \( \sum x_i = 100 \), and the final column gives the total \( \sum (x_i - \bar{x})^2 \).
From the table, the standard deviation s is:
\[ s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{9}} = 3.53 \]

Interpreting the standard deviation

The standard deviation can be made more meaningful by interpreting it in relation to the data set. The interquartile range gives the spread of the middle 50% of the data. Can similar statements be made about the standard deviation? It can be shown that, for most data sets, about 95% of the observations lie within two standard deviations of the mean.

Example 15
The cost of a lettuce at a number of different shops on a particular day is given in the table:
$3.85 $2.65 $1.90 $2.95 $2.40 $2.42 $2.63 $3.20 $4.20 $2.33 $0.85 $3.81 $1.69 $3.66 $2.60 $2.70 $3.10 $2.80 $1.80 $2.88 $1.40
Calculate the mean cost, the standard deviation and the interval equivalent to two standard deviations above and below the mean.

Solution
The mean cost is $2.66 and the standard deviation is $0.84. The interval equivalent to two standard deviations above and below the mean is:
[2.66 − 2 × 0.84, 2.66 + 2 × 0.84] = [0.98, 4.34].
In this case, 20 of the 21 observations, or 95% of observations, have values within the interval calculated.

Example 16
The prices of forty secondhand motorbikes listed in a newspaper are as follows:
$5442 $5439 $2523 $2358 $2363 $2244 $1963 $2142 $2220 $1356 $738 $656 $715 $1000 $1214 $1788 $3457 $4689 $8218 $ $ $ $8770 $8450 $6469 $7148 $ $ $ $ $ $9878 $5294 $3847 $4219 $4786 $2280 $3019 $7645 $8079
Determine the interval equivalent to two standard deviations above and below the mean.

Solution
The mean price is $5729 and the standard deviation is $4233 (to the nearest whole dollar). The interval equivalent to two standard deviations above and below the mean is:
[5729 − 2 × 4233, 5729 + 2 × 4233] = [−2737, 14 195].
The negative value does not give a sensible solution, since a price cannot be negative, and should be replaced by 0. Of the 40 observations, 38, or 95% of observations, have values within the interval.

The exact percentage of observations which lie within two standard deviations of the mean varies from data set to data set, but in general it will be around 95%, particularly for symmetric data sets.

It was noted earlier that even a single outlier can have a very marked effect on the value of the mean of a data set, while leaving the median unchanged. The same is true when the effect of an outlier on the standard deviation is considered, in comparison to the interquartile range. The median and interquartile range are called resistant measures, while the mean and standard deviation are not resistant measures. When considering a data set it is necessary to do more than just compute the mean and standard deviation. First it is necessary to examine the data, using a histogram or stem-and-leaf plot, to determine which set of summary statistics is more suitable.
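The figures quoted in Example 15 can be checked with a short Python sketch. It uses the 21 lettuce prices from the example and the standard-library `statistics` module; the variable names are illustrative only.

```python
import statistics

# Lettuce prices (in dollars) from Example 15.
prices = [3.85, 2.65, 1.90, 2.95, 2.40, 2.42, 2.63, 3.20, 4.20, 2.33, 0.85,
          3.81, 1.69, 3.66, 2.60, 2.70, 3.10, 2.80, 1.80, 2.88, 1.40]

mean = statistics.mean(prices)
s = statistics.stdev(prices)          # sample standard deviation (divides by n - 1)

# Interval two standard deviations either side of the mean.
low, high = mean - 2 * s, mean + 2 * s
inside = [p for p in prices if low <= p <= high]

print(f"mean = {mean:.2f}, s = {s:.2f}")
print(f"interval = [{low:.2f}, {high:.2f}]")
print(f"{len(inside)} of {len(prices)} observations lie inside the interval")
```

Only the lowest price, $0.85, falls outside the interval, which is how the count of 20 out of 21 arises.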

Using the TI-Nspire

The calculator can be used to calculate the values of all of the summary statistics in this section. Consider the data from Example 16.

The data are most easily entered in a Lists & Spreadsheet application ( 3). Firstly, use the up/down arrows to name the first column bike. Then enter each of the 40 numbers as shown.

Open a Calculator application ( 1) to calculate the summary statistics. Select the One-Variable Statistics command from the Stat Calculations submenu of the Statistics menu (b 6 11), specify in the dialog box that there is only one list, and then complete the final dialog box as shown. Press enter to calculate the values of the summary statistics. Use the up arrow to view the rest of the summary statistics.

The calculator can also be used to determine the summary statistics when the data are given in a frequency table such as:

[Table: values of x and their frequencies]

The data are most easily entered in a Lists & Spreadsheet application ( 3). Firstly, use the up/down arrows to name the first column x and the second column freq. Then enter the data as shown.

Open a Calculator application ( 1) to calculate the summary statistics. Select the One-Variable Statistics command from the Stat Calculations submenu of the Statistics menu (b 6 11), specify in the dialog box that there is only one list, and then complete the final dialog box as shown. Press enter to calculate the values of the summary statistics.

Using the Casio ClassPad

Consider the following heights in cm of a group of eight women.
176, 160, 163, 157, 168, 172, 173, 169
Enter the data into list1 in the module. Tap Calc, One-Variable and when prompted ensure that the XList is set to list1 and the Freq = 1 (since each score is entered individually). The calculator returns the results as shown and all univariate statistics can be viewed by using the scroll bar. Note that the standard deviation is given by \( x\sigma_{n-1} \).

Where data are grouped, the scores are entered in list1 and the frequencies in list2. In this case, in Set Calculation use the drop-down arrow to select list2 as the location for the frequencies.
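For readers without either calculator, the same one-variable summary can be sketched in Python. The function below is a hypothetical helper, not a reproduction of either calculator's output: it accepts an optional list of frequencies, mirroring the XList and Freq settings described above. The heights are those of the eight women; the small frequency table is invented for illustration.

```python
import statistics


def one_variable_statistics(values, freqs=None):
    """Summary statistics for a list of values, optionally with frequencies.

    If freqs is omitted, each value is counted once (Freq = 1)."""
    if freqs is None:
        freqs = [1] * len(values)
    if len(values) != len(freqs):
        raise ValueError("values and freqs must have the same length")
    # Expand the frequency table into the full list of observations.
    expanded = []
    for value, freq in zip(values, freqs):
        expanded.extend([value] * freq)
    return {
        "n": len(expanded),
        "mean": statistics.mean(expanded),
        "median": statistics.median(expanded),
        "sample_sd": statistics.stdev(expanded),   # divides by n - 1
    }


# Heights (cm) of the eight women, each entered individually.
heights = [176, 160, 163, 157, 168, 172, 173, 169]
summary = one_variable_statistics(heights)
print(summary)

# Hypothetical frequency table: x = 1, 2, 3 with frequencies 2, 5, 3.
table_summary = one_variable_statistics([1, 2, 3], [2, 5, 3])
print(table_summary)
```

Expanding the table into individual observations is the simplest approach for small data sets; for very large frequencies a weighted calculation would avoid building the long list.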

Exercise 22F

Related examples: Examples 9, 10; Examples 11, 12

1 Find the mean and the median of the following data sets.
a [data set] b [data set] c [data set] d [data set]

2 Find the mean and the median of the following data sets.
a [Table: x and Frequency] b [Table: x and Frequency]

3 The prices, in dollars, of houses sold in a particular suburb during a one-week period are given in the following list.
[list of house prices]
Find the mean and the median of the prices. Which do you think is a better measure of centre of the data set? Explain your answer.

4 Concerned with the level of absence from his classes, a teacher decided to investigate the number of days each student had been absent from the classes for the year to date. These are his results.
[Table: No. of days missed and No. of students]
Find the mean and the median number of days each student had been absent so far that year. Which is the better measure of centre in this case?

5 Find the range and the interquartile range for each of the following data sets.
a [data set] b [data set] c [data set] d [data set]

Related examples: Example 14, Example 15, Example 13

6 The serum cholesterol levels for a sample of twenty people are:
[data set]
a Find the range of the serum cholesterol levels.
b Find the interquartile range of the serum cholesterol levels.

7 Twenty babies were born at a local hospital on one weekend. Their birth weights, in kg, are given in the stem-and-leaf plot below.
[Stem-and-leaf plot: birth weights, where 3 | 6 represents 3.6 kg]
a Find the range of the birth weights.
b Find the interquartile range of the birth weights.

8 Find the standard deviation for the following data sets.
a [data set]
b $2.52 $4.38 $3.60 $2.30 $3.45 $5.40 $4.43 $2.27 $4.50 $4.32 $5.65 $6.89 $1.98 $4.60 $5.12 $3.79 $4.99 $3.02
c [data set]
d [data set]

9 For each of the following data sets
a calculate the mean and the standard deviation
b determine the percentage of observations falling within two standard deviations of the mean.
i [data set] ii [data set]

10 A group of university students was asked to write down their ages with the following results.
[data set]
a Construct a cumulative relative frequency polygon and use it to find the median and the interquartile range of this data set.
b Find the mean and standard deviation of the ages.
c Find the percentage of students whose ages fall within two standard deviations of the mean.

Related example: Example 17

11 The results of a student's chemistry experiment are as follows.
[data set]
a i Find the mean and the median of the results.
ii Find the interquartile range and the standard deviation of the results.
b Unfortunately when the student was transcribing his results into his chemistry book he made a small error, and wrote:
[data set]
i Find the mean and the median of these results.
ii Find the interquartile range and the standard deviation of these results.
c Describe the effect the error had on the summary statistics calculated in parts a and b.

12 A selection of shares traded on the stock exchange had a mean price of $50 with a standard deviation of $3. Determine an interval which would include approximately 95% of the share prices.

13 A store manager determined the store's mean daily receipts as $550, with a standard deviation of $200. On what proportion of days were the daily receipts between $150 and $950?

22.7 The boxplot

Knowing the median and quartiles of a distribution means that quite a lot is known about the central region of the data set. If something is known about the tails of the distribution then a good picture of the whole data set can be obtained. This can be achieved by knowing the maximum and minimum values of the data.

These five important statistics can be derived from a data set: the median, the two quartiles and the two extremes. These values are called the five-figure summary and can be used to provide a succinct pictorial representation of a data set called the box and whisker plot, or boxplot. For this visual display, a box is drawn with the ends at the first and third quartiles. Lines are drawn which join the ends of the box to the minimum and maximum observations. The median is indicated by a vertical line in the box.

Example 17
Draw a boxplot to show the number of hours spent on a project by individual students in a particular school.

[data set]

Solution
First arrange the data in order.

[ordered data set]

From this ordered list prepare the five-figure summary.
median, m = 71
first quartile, Q1 = 25.5
third quartile, Q3 = 109.5
minimum = 2
maximum = 264
The boxplot can then be drawn.

[Boxplot: min = 2, Q1 = 25.5, m = 71, Q3 = 109.5, max = 264]

In general, to draw a boxplot:
Arrange all the observations in order, according to size.
Determine the minimum value, the first quartile, the median, the third quartile, and the maximum value for the data set.
Draw a horizontal box with the ends at the first and third quartiles. The height of the box is not important.
Join the minimum value to the lower end of the box with a horizontal line.
Join the maximum value to the upper end of the box with a horizontal line.
Indicate the location of the median with a vertical line.

Using a graphics calculator

A graphics calculator can be used to construct a boxplot. Consider the data from Example 17. Enter the data into a list named HOURS. To draw the boxplot press 2ND STAT PLOT and select and turn on Plot1, as previously described.

Press the down arrow key and select from the Type menu the boxplot icon as shown, then press ENTER. Use the LIST menu to paste HOURS as the Xlist. Your calculator screen should appear like this.

To bring up the boxplot, press ZOOM and then 9:ZoomStat. Your calculator screen should now look like this. To find out values for the five-figure summary, select TRACE.

The symmetry of a data set can be determined from a boxplot. If a data set is symmetric, then the median will be located approximately in the centre of the box, and the tails will be of similar length. This is illustrated in the following diagram, which shows the same data set displayed as a histogram and a boxplot.

A median placed towards the left of the box, and/or a long tail to the right, indicates a positively skewed distribution, as shown in this plot.

A median placed towards the right of the box, and/or a long tail to the left, indicates a negatively skewed distribution, as illustrated here.

A more sophisticated version of a boxplot can be drawn with the outliers in the data set identified. This is very informative, as one cannot tell from the previous boxplot if an extremely long tail is caused by many observations in that region or just one. Before drawing this boxplot the outliers in the data set must be identified. The term outlier is used to indicate an observation which is rather different from other observations. Sometimes it is difficult to decide whether or not an observation should be designated as an outlier. The interquartile range can be used to give a very useful definition of an outlier.

An outlier is any number which is more than 1.5 interquartile ranges above the upper quartile, or more than 1.5 interquartile ranges below the lower quartile.

When drawing a boxplot, any observation identified as an outlier is indicated by an asterisk, and the whiskers are joined to the smallest and largest values which are not outliers.

Example 18
Use the data from Example 17 to draw a boxplot with outliers.

Solution
median = 71
interquartile range = Q3 − Q1 = 109.5 − 25.5 = 84
An outlier will be any observation which is less than 25.5 − 1.5 × 84 = −100.5, which is impossible, or greater than 109.5 + 1.5 × 84 = 235.5.
From the data it can be seen that there is only one observation greater than this, 264, which would be denoted with an asterisk. The upper whisker is now drawn from the edge of the box to the largest observation less than 235.5, which is 226.

[Boxplot: upper whisker ends at 226; the outlier 264 is marked with an asterisk]
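The outlier rule used in Example 18 can be written as a short Python sketch. The function names are hypothetical. The quartiles are those of Example 17, with Q3 = 109.5 following from Q1 = 25.5 and the interquartile range of 84; the two observations tested, 226 and 264, are the ones named in the solution.

```python
def outlier_limits(q1, q3):
    """Lower and upper limits: 1.5 interquartile ranges beyond the quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def is_outlier(x, q1, q3):
    """True if x lies below the lower limit or above the upper limit."""
    lower, upper = outlier_limits(q1, q3)
    return x < lower or x > upper


q1, q3 = 25.5, 109.5            # quartiles from Example 17
lower, upper = outlier_limits(q1, q3)

print("limits:", lower, upper)
print("264 is an outlier:", is_outlier(264, q1, q3))
print("226 is an outlier:", is_outlier(226, q1, q3))
```

Since 226 is inside the limits and 264 is not, the upper whisker stops at 226 and 264 is plotted separately.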

Using the TI-Nspire

The calculator can be used to construct a boxplot. Consider the data from Example 17.

The data are most easily entered in a Lists & Spreadsheet application ( 3). Firstly, use the up/down arrows to name the first column hours. Then enter each of the 33 numbers as shown.

Open a Data & Statistics application ( 5) to graph the data. At first the data displays as shown.

Specify the x variable by selecting Add X Variable from the Plot Properties (b 2 4) and selecting hours. The data now displays as shown. (Note: It is also possible to use the NavPad to move down below the x-axis and click to add the x variable.)

Select Box Plot from the Plot Type menu (b 12). The data now displays as shown. Notice how the calculator, by default, shows any outlier(s).
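The quantities a boxplot with outliers displays can also be computed directly. The Python sketch below is an illustration under stated assumptions: it uses the quartile definition given in this chapter (median omitted from both halves when n is odd), the 1.5 × IQR rule for outliers, hypothetical function names, and an invented list of hours, since calculators and software may use slightly different quartile definitions.

```python
def median_of_sorted(ordered):
    """Median of a list that is already in ascending order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_figure_summary(observations):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(observations)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return (ordered[0], median_of_sorted(lower), median_of_sorted(ordered),
            median_of_sorted(upper), ordered[-1])


def boxplot_parts(observations):
    """Box, whisker ends and outliers for a boxplot with outliers shown."""
    _, q1, med, q3, _ = five_figure_summary(observations)
    iqr = q3 - q1
    low_limit, high_limit = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = sorted(x for x in observations if x < low_limit or x > high_limit)
    kept = [x for x in observations if low_limit <= x <= high_limit]
    # Whiskers reach the smallest and largest values that are not outliers.
    return {"box": (q1, med, q3),
            "whiskers": (min(kept), max(kept)),
            "outliers": outliers}


# Hypothetical hours spent on a project (not the data of Example 17).
hours = [2, 9, 14, 21, 30, 48, 59, 71, 86, 90, 102, 117, 264]

print(five_figure_summary(hours))
print(boxplot_parts(hours))
```

For these invented hours the single large value is reported as an outlier and the upper whisker stops at the next largest observation, which is the behaviour the calculator shows by default.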


More information

Vocabulary: Samples and Populations

Vocabulary: Samples and Populations Vocabulary: Samples and Populations Concept Different types of data Categorical data results when the question asked in a survey or sample can be answered with a nonnumerical answer. For example if we

More information

SAMPLE 4CORE. Displaying and describing relationships between two variables. 4.1 Investigating the relationship between two categorical variables

SAMPLE 4CORE. Displaying and describing relationships between two variables. 4.1 Investigating the relationship between two categorical variables C H A P T E R 4CORE Displaying and describing relationships between two variables What are the statistical tools for displaying and describing relationships between two categorical variables? a numerical

More information

What is Statistics? Statistics is the science of understanding data and of making decisions in the face of variability and uncertainty.

What is Statistics? Statistics is the science of understanding data and of making decisions in the face of variability and uncertainty. What is Statistics? Statistics is the science of understanding data and of making decisions in the face of variability and uncertainty. Statistics is a field of study concerned with the data collection,

More information

STT 315 This lecture is based on Chapter 2 of the textbook.

STT 315 This lecture is based on Chapter 2 of the textbook. STT 315 This lecture is based on Chapter 2 of the textbook. Acknowledgement: Author is thankful to Dr. Ashok Sinha, Dr. Jennifer Kaplan and Dr. Parthanil Roy for allowing him to use/edit some of their

More information

Histograms allow a visual interpretation

Histograms allow a visual interpretation Chapter 4: Displaying and Summarizing i Quantitative Data s allow a visual interpretation of quantitative (numerical) data by indicating the number of data points that lie within a range of values, called
