ST Presenting & Summarising Data Descriptive Statistics. Frequency Distribution, Histogram & Bar Chart

1 ST Presenting & Summarising Data Descriptive Statistics Frequency Distribution, Histogram & Bar Chart

2 Summary of Previous Lecture
• A study often involves taking a sample from a population that contains all subjects of interest
• With random sampling, each subject in the population has the same chance of being in the sample
• Continuous variables take any value in a given interval
• Discrete variables take values from a finite or countably infinite set
• Ordinal variables consist of ranked categories
• Nominal variables carry no assumptions about relations between values

3 Aim & Objectives

4 Aim
• Discuss a set of statistical procedures known as descriptive statistics, which encompass tabular, graphical and numerical methods

5 Objective
• Construct a frequency distribution
• Draw and interpret a histogram
• Distinguish different distribution shapes
• Display categorical data using tables and bar charts
• Summarise numerical data using measures of centrality and variability
• Compute quartiles and percentiles
• Interpret summary statistics
• Choose appropriate summary statistics
• Draw and interpret a boxplot

6 2.1 Motivating Exercises

7 Motivating Exercise 1: Summer 2009 Q1b
Scenario: The ages of 90 people seen in the emergency room of a Dublin hospital on a Friday night were recorded. The results are summarised in the frequency table below:
[Table: Age (years), Frequency]

8 Motivating Exercise 1: Summer 2009 Q1b
Questions to Explore
i. Prepare a histogram for the ages and comment on its shape.
ii. Based on the histogram in (i), suggest suitable measures of centrality and spread. Include an explanation for your choice. (Note: you do not need to calculate these measures.)
Thinking Ahead: We will show in this section how to summarise numerical data in tabular and graphical formats.

9 Motivating Exercise 2: Summer 2011 Q2
Scenario: A study examined the moisture content of fields in West Cork. The moisture content, measured as a percentage, for a random sample of 30 fields is given below.

10 Motivating Exercise 2: Summer 2011 Q2
Questions to Explore
i. Calculate the mean and median. [8 marks]
ii. Calculate the quartiles and interpret these values. [12 marks]
iii. Construct a box-plot showing all steps in your calculations. [16 marks]
iv. Based on the box-plot in (iii), provide suitable measures of centrality and spread. Include an explanation for your choice. (Note: you do not need to do any calculations.) [8 marks]
v. Suppose low moisture content was defined as moisture content less than 10%. What percentage of fields would be classed as having low moisture content? [2 marks]
vi. If you took a random sample of 200 fields in West Cork, how many low moisture content fields would you expect to find? [2 marks]
vii. What other method of presentation would be appropriate for this data? (Note: you do not need to prepare this.) [2 marks]
Thinking Ahead: In this section we study ways to describe the centre and the spread of quantitative data.

11 2.2 Presenting Data Frequency Distribution

12 Frequency Distribution
• Data should be organised and summarised in a form that allows interpretation and analysis
• Methods of presentation should give an overall impression of the data at a glance
• One such method is the frequency distribution

13 Frequency Distribution: Example Summer 2007 Q1
• A plant scientist wants to analyse the effect of thiamine hydrochloride (vitamin B1) on vegetable transplants. A sample of 50 tomato plants treated with thiamine hydrochloride is randomly selected and observations on the height of plants 14 days after treatment are recorded. The results (in cm) are

14 Frequency Distribution
A list of raw data does not tell us:
• Where the data are concentrated (how high most plants are)
• How spread out the data are (how much variation there is in plant height)
• Anything about the extremes (whether any plants are unusually tall or short)

15 Frequency Distribution
• Consists of a list of class intervals and frequencies
• Class intervals must be mutually exclusive: every piece of data can be placed in only one class
• Class intervals must be all-inclusive: the classes together contain all the data
• The number of intervals is generally between 6 and 12
• For the plant height data: first class interval 21.5 to <21.9

16 Frequency Distribution
Choose class intervals: first class interval 21.5 to <21.9, class width = 0.4
Height (cm) classes:
21.5 to <21.9
21.9 to <22.3
22.3 to <22.7
22.7 to <23.1
23.1 to <23.5
23.5 to <23.9
23.9 to <24.3

17 Frequency Distribution
Count the number of observations in each class interval.
[Table: Height (cm) class, Frequency]

18 Frequency Distribution
[Table: Height (cm), Frequency]
• Most plant heights are concentrated between 22.7cm and 23.1cm
• Plants vary in height between 21.5cm and 24.3cm, with few towards these extremes
• Minimum height is 21.5cm
• Maximum height is 24.2cm (found by looking at the original data)
• No unusually tall or short plants
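The counting step can be sketched in Python. This is a minimal illustration, not the lecturer's method: the function name `frequency_distribution` and the ten heights are made up for the example (the 50 recorded heights are not reproduced here), while the first class (21.5 to <21.9), the class width (0.4) and the seven classes follow the slides.

```python
from collections import Counter


def frequency_distribution(values, start, width, n_classes, scale=10):
    """Count values in classes [start + i*width, start + (i+1)*width)."""
    # Work in integer units (tenths by default) so that a value sitting
    # exactly on a class boundary, such as 21.9, lands in the right class.
    s, w = round(start * scale), round(width * scale)
    counts = Counter()
    for v in values:
        index = (round(v * scale) - s) // w
        if not 0 <= index < n_classes:
            raise ValueError(f"{v} falls outside the class intervals")
        counts[index] += 1
    return [
        ((s + i * w) / scale, (s + (i + 1) * w) / scale, counts[i])
        for i in range(n_classes)
    ]


# Hypothetical heights (cm), chosen only to exercise every class.
heights = [21.5, 21.9, 22.4, 22.7, 22.8, 22.9, 23.0, 23.2, 23.6, 24.2]
table = frequency_distribution(heights, start=21.5, width=0.4, n_classes=7)
for lower, upper, frequency in table:
    print(f"{lower:.1f} to <{upper:.1f}: {frequency}")
```

Because every value falls in exactly one class and the classes cover the whole range, the frequencies sum to the number of observations, which is a quick check that the intervals are mutually exclusive and all-inclusive.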

19 Frequency Distribution: In-class Exercise Autumn 2006 Q1(c)
The data below are the weights (kg) of a random sample of 5-year-old children. Prepare a frequency distribution of weights. Use classes of width 2; the first class should be 16 to <18.

20 2.2 Presenting Data Histogram

21 Histogram
• Graphical display of a frequency distribution
• Collection of bars, one for each class interval
• Base of the histogram represents the class intervals
• Area of each bar is proportional to the frequency of the class
• For classes of equal width, bar heights are proportional to class frequency

22 Histogram: Example
A microbiologist has carried out an experiment investigating the genome size of 118 common viruses. The data are to be presented using a histogram from the following frequency distribution of the genome sizes (×1000 nucleotide pairs) of viruses:
[Table: Genome Sizes (×1000 nucleotide pairs), Frequency]

23 Histogram: Example
• The smallest class width is 10
• Let this be the standard
• The first two classes are both of width 20, twice the standard
• Their heights will be their frequencies divided by 2

24 Histogram: Example
• Last class is open-ended
• Need an upper limit
• Usually assume that the class is the same width as the adjacent one
• In this case we would assign a limit of

25 Histogram: Example
• Nature of the data might indicate a different limit
• For example, if it were known that the maximum value possible was 105, that value would be used
• If we knew that the data were percentages, we would assign a limit of 100
• Open-ended classes can also occur at the lower end of the frequency distribution

26 Histogram: Example
Table columns: Class (genome size, ×1000 nucleotide pairs), Frequency, Width, Multiple (= class width / 10), Height (= frequency / multiple)
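The height calculation for unequal class widths can be sketched as follows. The frequency table in this snippet is invented for illustration (the genome-size frequencies are not reproduced in this transcript) and `bar_heights` is a hypothetical helper name; only the rule itself, height = frequency / (class width / standard width), comes from the slides.

```python
def bar_heights(classes, standard_width):
    """Return histogram bar heights for classes given as (lower, upper, frequency)."""
    heights = []
    for lower, upper, frequency in classes:
        # Class width expressed as a multiple of the standard width.
        multiple = (upper - lower) / standard_width
        # Dividing by the multiple keeps bar AREA proportional to frequency.
        heights.append(frequency / multiple)
    return heights


# Made-up table: two classes of width 20, then two classes of width 10.
classes = [(0, 20, 12), (20, 40, 30), (40, 50, 28), (50, 60, 16)]
heights = bar_heights(classes, standard_width=10)
print(heights)  # [6.0, 15.0, 28.0, 16.0]
```

The two wide classes have their frequencies halved, so a bar that is twice as wide is drawn half as tall for the same frequency.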

27 Histogram: Example

28 Histogram: Example
[Figure: histogram of Frequency against Genome Size (×1000 nucleotide pairs)]
• Histograms quickly provide an idea of where the distribution of values is centred
• Example: centred between 40 and
• Histograms also give an idea about how spread out (variable) the distribution is

29 Histogram: Distribution Shape
• When we have a large number of observations, the classes may be made narrower
• Having more classes will give a much smoother appearance to the histogram
• Histogram then becomes a frequency curve
• Shape of the distribution can then be assessed

30 Histogram: Distribution Shape
[Figure: frequency curve skewed to the right (positive skew)]

31 Histogram: Distribution Shape
[Figure: frequency curve skewed to the left (negative skew)]

32 Histogram: Distribution Shape
[Figure: symmetric frequency curve]

33 Histogram: Distribution Shape In-class Exercise (a)

34 Histogram: Distribution Shape In-class Exercise (b)

35 Histogram: Distribution Shape In-class Exercise (c)

36

37 Histogram: In-class Exercise Summer 2008 Q1(a)
The concentration of nicotine in milligrams was measured for 98 brands of cigarettes. The results are summarised in the frequency table below:
[Table: Nicotine (mg), Frequency]
Prepare a histogram for nicotine concentration and comment on its shape.

38 Histogram: In-class Exercise Summer 2008 Q1(a)
[Figure: histogram of Frequency against Nicotine concentration (mg), annotated as follows]
• Label the x and y axes
• Each bar represents a class
• Height of each bar represents the class frequency

39 Histogram: Distribution Shape In-class Exercise Summer 2008 Q1(a)
[Figure: histogram of Frequency against Nicotine concentration (mg)]

40 2.2 Presenting Data Bar Chart

41 Bar Chart
• A bar chart is useful for illustrating a frequency distribution for a categorical variable
• Each category is represented by a bar
• Widths of bars are equal
• Length (or height) of the bar is proportional to the frequency within the category

42 Bar Chart: Example
• Data identify the satisfaction rating given by 440 customers:
  ◦ 105 very satisfied
  ◦ 134 satisfied
  ◦ 30 dissatisfied
  ◦ 171 very dissatisfied

43 Bar Chart: Example
Frequency distribution
Satisfaction Level     Frequency   Percentage (%)
Very satisfied         105         23.9
Satisfied              134         30.5
Dissatisfied           30          6.8
Very dissatisfied      171         38.9
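The percentage column is each frequency divided by the total of 440 customers. A small sketch of that arithmetic, using the counts from the example (the variable names are illustrative only):

```python
# Satisfaction ratings from the example: 440 customers in total.
frequencies = {
    "Very satisfied": 105,
    "Satisfied": 134,
    "Dissatisfied": 30,
    "Very dissatisfied": 171,
}

total = sum(frequencies.values())
percentages = {level: 100 * f / total for level, f in frequencies.items()}

for level, pct in percentages.items():
    print(f"{level:<18} {frequencies[level]:>4} {pct:5.1f}%")
```

Plotting these percentages as equal-width bars, one per category, gives the bar chart; the bar heights are proportional to the frequencies because every percentage uses the same divisor.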

44 Bar Chart: Example
[Figure: bar chart of Percent by Satisfaction Level (Very satisfied, Satisfied, Dissatisfied, Very dissatisfied)]

45 Summary
• Developed a frequency distribution
• Constructed and interpreted a histogram
• Distinguished different distribution shapes
• Displayed categorical data using tables and bar charts

46 What next?
• Measures of centrality
• Variability

47 2. Presenting & Summarising Data Descriptive Statistics Measures of Centrality & Variability

48 Summary of Previous Lecture
• For quantitative (numerical) variables (discrete and continuous variables in which numbers are recorded), a frequency table is developed and the data are displayed using a histogram
• The histogram shows the distribution shape of the data, such as whether the distribution is bell-shaped, skewed to the right (longer tail pointing to the right) or skewed to the left (longer tail pointing to the left)
• For categorical variables (ordinal and nominal variables in which categories are recorded), data are summarised using a frequency table and displayed using bar charts

51 2.3 Measures of Centrality

52 Measures of Centrality
• A measure of centrality (or location) is used to indicate where the central tendency or the typical value of a sample (or population) lies
• Two commonly used measures of centrality
  ◦ Mean
  ◦ Median

53 Mean
• Mean (or arithmetic mean) is the most familiar and most useful average
• It is calculated by summing all observations and dividing by the number of observations:
Population mean: $\mu = \frac{\sum x}{N}$
Sample mean: $\bar{x} = \frac{\sum x}{n}$

54 Mean: Example
• A microbiologist is investigating the size of a certain type of cell. The following are the diameters (in µm) of a sample of 10 cells: 1.2, 2.3, 2.9, 3.4, 3.5, 3.5, 4.0, 4.1, 4.9, 5.0. What is the typical diameter of such cells?

55 Mean: Example
Sample of 10 cells: 1.2, 2.3, 2.9, 3.4, 3.5, 3.5, 4.0, 4.1, 4.9, 5.0
Solution: Compute the mean
$\bar{x} = \frac{\sum x}{n} = \frac{34.8}{10} = 3.48$ µm
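The same calculation as a short Python check, using the ten cell diameters from the example (variable names are illustrative):

```python
# Cell diameters (µm) from the example.
diameters_um = [1.2, 2.3, 2.9, 3.4, 3.5, 3.5, 4.0, 4.1, 4.9, 5.0]

total = sum(diameters_um)   # sum of all observations
n = len(diameters_um)       # number of observations
sample_mean = total / n

print(f"sum = {total:.1f}, n = {n}, mean = {sample_mean:.2f} µm")
# sum = 34.8, n = 10, mean = 3.48 µm
```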

56 Median
• Median is the middle observation in a list of observations in increasing order
• Median (Med) is the value in position (n+1)/2

57 Median: Example
• A veterinary pharmaceutical company has devised a formulation against canine ticks. To determine whether the formulation works, the company first needs to know how many ticks would be found in a dog's coat before treatment. These are the numbers of ticks counted in the coats of a sample of 9 dogs: 2, 3, 3, 4, 7, 9, 10, 10, 217

58 Median: Example
• Numbers of ticks counted in the coats of a sample of 9 dogs: 2, 3, 3, 4, 7, 9, 10, 10, 217
• Position of the median in the ordered list: (n+1)/2 = (9+1)/2 = 10/2 = 5
• Middle value in this ordered list of 9 observations is the 5th value
• Med = 7

59 Median
• When the list contains an even number of observations, there is no single middle value
• In this case the median is taken to be midway between the 2 middle values

60 Median: Example
• Consider a different sample, this time of 8 dogs: 3, 3, 4, 6, 9, 10, 10, 196
• Calculate the median

61 Median: Example
• There are 8 values in the list 3, 3, 4, 6, 9, 10, 10, 196
• There is no single middle value
• The two middle values are in positions 4 (n/2) and 5 ((n/2)+1)
• These values are 6 and 9
• Med = 6 + ½(9 − 6) = 7.5
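Both cases of the position rule can be sketched in one function. `median_by_position` is a hypothetical name for this illustration; the two tick-count samples are the ones from the examples.

```python
def median_by_position(values):
    """Median via the (n+1)/2 position rule on the ordered data."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    if n % 2 == 1:
        # Odd n: (n+1)/2 is a whole position (1-based), so take that value.
        return data[(n + 1) // 2 - 1]
    # Even n: go halfway between the values in positions n/2 and n/2 + 1.
    lower, upper = data[n // 2 - 1], data[n // 2]
    return lower + 0.5 * (upper - lower)


nine_dogs = [2, 3, 3, 4, 7, 9, 10, 10, 217]
eight_dogs = [3, 3, 4, 6, 9, 10, 10, 196]
print(median_by_position(nine_dogs))   # 7
print(median_by_position(eight_dogs))  # 7.5
```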

62 Relationship between Mean and Median
• Example 1, sample of 9 dogs: 2, 3, 3, 4, 7, 9, 10, 10, 217
• Example 2, sample of 8 dogs: 3, 3, 4, 6, 9, 10, 10, 196
• Each example contains one very high value
• These values do not pull the median towards them
  ◦ Example 1: Med = 7
  ◦ Example 2: Med = 7.5
• Median is robust to extreme observations

63 Relationship between Mean and Median
• When extreme values occur in a set of observations, the median is the more appropriate measure of central tendency
• Mean will be strongly affected by extreme values
• Extreme values will drag the mean towards them
• Example 2, sample of 8 dogs: 3, 3, 4, 6, 9, 10, 10, 196
• Mean is 241/8 ≈ 30.1 but the median is 7.5

64 Relationship between Mean and Median
• Example 2, sample of 8 dogs: 3, 3, 4, 6, 9, 10, 10, 196
• Mean is 241/8 ≈ 30.1 but the median is 7.5
• Which of these gives a more reasonable measure of the typical number of ticks in a dog's coat?
Answer:
  ◦ Median of 7.5, as it is unaffected by the extreme value
• Biological data frequently contain one or two extreme observations (usually large rather than small)
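One way to see the robustness of the median is to recompute both measures with the extreme value left out. This sketch uses Python's standard `statistics` module on the 8-dog sample; dropping the value 196 is done here purely for comparison and is not a step from the lecture.

```python
from statistics import mean, median

eight_dogs = [3, 3, 4, 6, 9, 10, 10, 196]
without_extreme = [x for x in eight_dogs if x != 196]

# With the extreme value: the mean is dragged far above most observations.
print(f"with 196:    mean = {mean(eight_dogs):.2f}, median = {median(eight_dogs)}")
# Without it: the mean changes a lot, the median only slightly.
print(f"without 196: mean = {mean(without_extreme):.2f}, median = {median(without_extreme)}")
```

The mean drops from about 30.1 to about 6.4 when one observation is removed, while the median moves only from 7.5 to 6.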

65 Relationship between Mean and Median
• Similarity of the mean and median depends on the shape of the distribution
[Figure: frequency curve skewed to the right (positive skew); the median lies to the left of the mean]

66 Relationship between Mean and Median
[Figure: frequency curve skewed to the left (negative skew); the mean lies to the left of the median]

67 Relationship between Mean and Median
[Figure: symmetric frequency curve; mean = median]

68 Relationship between Mean and Median: In-class Exercise

69 Relationship between Mean and Median: In-class Exercise

70 Relationship between Mean and Median: In-class Exercise

71 Relationship between Mean and Median: In-class Exercise Summer 2008 Q1(a)
[Figure: histogram of Frequency against Nicotine concentration (mg)]
• Would the median be greater than the mean?
• Which would be the best measure of centrality?
• Why?

72 2.4 Variability

73 Variability
[Figure: distributions with the same means but different variability]

74 Variability
• Spread
• Dispersion
• Variation
Most commonly used measures of spread
• Range
• Variance
• Standard Deviation

75 Range
• Range is the difference between the largest and smallest observations
• Range = maximum − minimum
• Only uses two values
  ◦ Extreme values
• May not indicate true variability
• Influenced by outliers

76 Range: Example
• A microbiologist is interested in the cell division rates of a strain of bacteria. Calculate the range for the following sample of times (hours) for bacteria to double in size: 1, 2, 3, 3, 5, 9, 10, 41
• Range = maximum − minimum = 41 − 1 = 40 hours

77 Range
• Advantage of using the range is that it is simple to calculate
• Only uses the two most extreme values
• No information is used from the other observations

78 Range
• Consider the previous example of a sample of times (hours) for bacteria to double in size: 1, 2, 3, 3, 5, 9, 10, 41
Range = 41 − 1 = 40
• Sensitive to values that are extreme relative to adjacent values (outliers)
• If there are outliers, the range can give a distorted measure of dispersion
• Range is not robust
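A short sketch of why the range is not robust, using the doubling times from the example. The helper name `value_range` and the comparison with the largest value dropped are additions for illustration.

```python
def value_range(values):
    """Range = maximum - minimum."""
    return max(values) - min(values)


doubling_times_h = [1, 2, 3, 3, 5, 9, 10, 41]

print(value_range(doubling_times_h))       # 40
# Dropping the single extreme value (41) shrinks the range to 10 - 1 = 9.
print(value_range(doubling_times_h[:-1]))  # 9
```

One observation accounts for most of the reported spread, which is the distortion the slide warns about.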

79 Variance
• Variance uses every value in its calculation
• To calculate the variance, the deviation from the mean of each observation is computed
• For a distribution with little dispersion, most values will be close to the mean
  ◦ Most deviations will be small
• For a distribution with greater dispersion, many values will be far from the mean
  ◦ Many deviations will be large

80 Variance
• Deviation from the mean of each observation is calculated
• Deviations are squared, summed and then divided by n − 1 if the observations are from a sample, or by N if the observations are from a population
Population variance: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
Sample variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
• Sample variance $s^2$ is an estimate of the population variance $\sigma^2$

81 Variance: Example
• Calculate the variance for the following sample of butterfly wing lengths (mm): 23, 25, 29, 35, 41, 47, 52
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

82 Variance: Example
Data: 23, 25, 29, 35, 41, 47, 52
$\bar{x} = \frac{\sum x}{n} = \frac{252}{7} = 36$
$s^2 = \frac{\sum (x - 36)^2}{6} = \frac{(23-36)^2 + (25-36)^2 + (29-36)^2 + (35-36)^2 + (41-36)^2 + (47-36)^2 + (52-36)^2}{6}$
$= \frac{169 + 121 + 49 + 1 + 25 + 121 + 256}{6} = \frac{742}{6} \approx 123.67$ mm²
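The worked variance can be checked step by step in Python. `sample_variance` is a hypothetical helper written to mirror the formula with divisor n − 1; the wing lengths are those from the example.

```python
def sample_variance(values):
    """Sample variance: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)


wing_lengths_mm = [23, 25, 29, 35, 41, 47, 52]
s2 = sample_variance(wing_lengths_mm)
print(f"s^2 = {s2:.2f} mm^2")  # s^2 = 123.67 mm^2
```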

83 Variance: In-class Exercise
For the following data calculate the variance: 3, 5, 6, 8, 6, 7, 7

84 Standard Deviation
• Standard deviation is closely related to the variance
• Standard deviation is the square root of the variance
• Standard deviation is usually preferred to the variance because its units are the same as those of the data
  ◦ Butterfly wing lengths are in mm, so the standard deviation is also in mm
• Sample standard deviation s is an estimate of the population standard deviation σ

85 Standard Deviation
Warning:
• Calculators provide two versions of the standard deviation
  ◦ Sample and population standard deviations
  ◦ Formula for the population standard deviation is slightly different (the divisor is N rather than n − 1)
• These keys give the sample standard deviation
  ◦ s
  ◦ σ_{n−1}
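The same distinction shows up in software. As an illustration, Python's standard `statistics` module offers `stdev` (divisor n − 1) and `pstdev` (divisor N); applying both to the butterfly wing lengths shows how far apart the two versions are for a small sample.

```python
from statistics import pstdev, stdev

wing_lengths_mm = [23, 25, 29, 35, 41, 47, 52]

s = stdev(wing_lengths_mm)       # sample standard deviation, divisor n - 1
sigma = pstdev(wing_lengths_mm)  # population standard deviation, divisor N

print(f"sample s = {s:.2f} mm, population sigma = {sigma:.2f} mm")
# sample s = 11.12 mm, population sigma = 10.30 mm
```

The sample version is always the larger of the two, and the gap narrows as n grows.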

86 In-class Exercise Summer 2010 Q2i,v
A study examined the distance (in km) between students' accommodation and their university. The results for 29 full-time students are given below. The study was restricted to full-time students.
• Calculate the mean and median.
• Suppose an additional student who lived 20km from his university was added to the dataset. What effect would this have on the mean and median?

87 Summary
• Calculated mean
• Computed median
• Examined the relationship between mean and median
• Determined how to choose between mean and median
• Described range
• Calculated variance & standard deviation

88 What next?
• Compute quartiles and percentiles
• Draw and interpret a boxplot

89 2. Presenting & Summarising Data Descriptive Statistics Quartiles & Boxplots

90 Summary of Previous Lecture
• For numerical variables, two measures of centrality were described: the mean, which is the sum of the observations divided by the number of observations, and the median, which divides the ordered data into two parts with equal numbers of observations
• The median is a more representative summary than the mean when the data are highly skewed
• The range is the difference between the largest and smallest observations. It uses only the two extreme values (minimum and maximum values)
• The standard deviation describes the typical deviation from the mean

93 2.5 Quartiles and Percentiles

94 Recall the median
• Median splits a list of ordered values, or the distribution, into 2 halves
  ◦ 50% of values lie below Q2
  ◦ Q2 is the median
[Figure: distribution with Q2 marked]

95 Quartiles
• Median splits a list of ordered values, or the distribution, into 2 halves
• Quartiles split the distribution into 4 quarters
  ◦ Each quarter has the same number of observations
• Quartiles are denoted by Q1, Q2 and Q3
  ◦ 25% of values lie below Q1
  ◦ 50% of values lie below Q2
  ◦ 75% of values lie below Q3
  ◦ Q2 is the median
[Figure: distribution with Q1, Q2 and Q3 marked]

96 Quartiles
• In an ordered (increasing) list of data
  Q1 is in the ¼(n+1)th position
  Q3 is in the ¾(n+1)th position
• Unless n+1 is divisible by 4, the quartiles cannot be read directly from the list
• As with the median, we may need to interpolate between values

97 Quartiles: Example
• Due to increased shipping traffic in the Strait of Gibraltar, it is feared that sightings of bottle-nosed dolphins may decrease. A baseline study of the species has been carried out. The following are the numbers of sightings of different pods of bottle-nosed dolphins per day over a 2-week period: 4, 6, 7, 8, 8, 9, 12, 13, 14, 16, 16, 19, 20, 22
Calculate the quartiles

98 Quartiles: Example
• Data: 4, 6, 7, 8, 8, 9, 12, 13, 14, 16, 16, 19, 20, 22
Find Q1
• n = 14
• Q1 is in the ¼(14+1) = 3.75th position
• This position lies 0.75 of the distance between the 3rd and 4th positions (values 7 and 8)
• Q1 = 7 + 0.75(8 − 7) = 7.75

99 Quartiles: Example
• Data: 4, 6, 7, 8, 8, 9, 12, 13, 14, 16, 16, 19, 20, 22
Find Q2
• n = 14
• Q2 is in the ½(14+1) = 7.5th position
• This position lies 0.5 of the distance between the 7th and 8th positions (values 12 and 13)
• Q2 = 12 + 0.5(13 − 12) = 12.5

100 Quartiles: Example
• Data: 4, 6, 7, 8, 8, 9, 12, 13, 14, 16, 16, 19, 20, 22
Find Q3
• n = 14
• Q3 is in the ¾(14+1) = 11.25th position
• This position lies 0.25 of the distance between the 11th and 12th positions (values 16 and 19)
• Q3 = 16 + 0.25(19 − 16) = 16.75

101 Quartiles: Example
• Q1 = 7.75
• Q2 = 12.5
• Q3 = 16.75
• What is the interpretation of the quartiles?
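The position-and-interpolate procedure used for the dolphin data can be sketched as a small function. `quartile` is a hypothetical name for this illustration; it follows the k(n+1)/4 position rule from the slides, which is only one of several quartile conventions in use.

```python
def quartile(values, k):
    """Return Q_k (k = 1, 2, 3) from the k(n+1)/4-th position of the ordered data."""
    data = sorted(values)
    n = len(data)
    position = k * (n + 1) / 4      # 1-based, possibly fractional
    whole = int(position)           # position of the value just below
    fraction = position - whole     # how far to move towards the next value
    if whole < 1 or whole > n or (whole == n and fraction > 0):
        raise ValueError("too few observations for this rule")
    if fraction == 0:
        return data[whole - 1]
    below, above = data[whole - 1], data[whole]
    return below + fraction * (above - below)


sightings = [4, 6, 7, 8, 8, 9, 12, 13, 14, 16, 16, 19, 20, 22]
q1, q2, q3 = (quartile(sightings, k) for k in (1, 2, 3))
print(q1, q2, q3)  # 7.75 12.5 16.75
```

Statistical packages often default to different interpolation rules, so their quartiles may differ slightly from hand calculations made with this convention.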

102 Quartiles: Example
• Q1 = 7.75: 25% of days had 7.75 or fewer sightings of different pods of bottle-nosed dolphins
• Q2 = 12.5: 50% of days had 12.5 or fewer sightings of different pods of bottle-nosed dolphins
• Q3 = 16.75: 75% of days had 16.75 or fewer sightings of different pods of bottle-nosed dolphins

103 Quartiles: In-class Exercise Summer 2010 Q2ii
A study examined the distance (in km) between students' accommodation and their university. The results for 29 full-time students are given below. The study was restricted to full-time students.
Calculate the quartiles and interpret the values.

104 Quartiles
• In a symmetric distribution, Q2 is equidistant from Q1 and Q3: Q2 − Q1 = Q3 − Q2
[Figure: symmetric distribution with Q1, Q2 and Q3 marked]

105 Interquartile Range
• Interquartile range (IQR = Q3 − Q1) is an alternative measure of variability to the range
• It is a modified range that is robust to extreme values
• Quartiles are special cases of percentiles

106 Percentiles
• Percentiles split a distribution into 100 parts
• Percentile Px is the value below which x% of the distribution lies
• Px lies in the (x(n+1)/100)th position in an ordered list
• It may be necessary to interpolate between values to calculate the percentiles
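The percentile rule generalises the quartile calculation: Q1, Q2 and Q3 are P25, P50 and P75. A minimal sketch, assuming the same linear interpolation as for the quartiles; the function name `percentile` and the choice of P90 as an extra example are illustrative, and the data are the dolphin sightings from the quartile example.

```python
def percentile(values, x):
    """Return P_x using the x(n+1)/100-th position in the ordered list."""
    data = sorted(values)
    n = len(data)
    position = x * (n + 1) / 100
    if not 1 <= position <= n:
        # Very low or very high percentiles cannot be located in a short list.
        raise ValueError("position falls outside the ordered list")
    whole = int(position)
    fraction = position - whole
    if fraction == 0:
        return data[whole - 1]
    return data[whole - 1] + fraction * (data[whole] - data[whole - 1])


sightings = [4, 6, 7, 8, 8, 9, 12, 13, 14, 16, 16, 19, 20, 22]
print(percentile(sightings, 25))  # 7.75, the same as Q1
print(percentile(sightings, 90))  # 21.0, from position 13.5
```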

107 2.6 Boxplots

108 Boxplots
• Boxplots are useful for presenting and summarising data
• Provide an impression of the location and dispersion
• Used to identify outliers (extreme values that are incompatible with the rest of the values)
• Construction of a boxplot involves computing quartiles and the IQR

109 Boxplots
• A box extending from Q1 to Q3
• A line through the box at Q2
• Lines extending from the box to the values just inside a length of 1.5×IQR (known as adjacent values)
• An identifier for each observation beyond these lines (outliers)

110 Boxplots
[Figure: boxplot anatomy. Box from Q1 to Q3 with a line at Q2. Lower adjacent value: smallest observation > Q1 − 1.5×IQR. Upper adjacent value: largest observation < Q3 + 1.5×IQR.]

111 Boxplots: Example
• Construct a boxplot for the cell diameters (nm) of a sample of a type of virus

112 Boxplots: Example
[Figure: boxplot anatomy. Box from Q1 to Q3 with a line at Q2. Lower adjacent value: smallest observation > Q1 − 1.5×IQR. Upper adjacent value: largest observation < Q3 + 1.5×IQR.]

113 Boxplots: Example
Find Q1
• n = 23
• Q1 is in the ¼(23+1)th position = 6th position
• Q1 = 16

114 Boxplots: Example
Find Q2
• n = 23
• Q2 is in the ½(23+1)th position = 12th position
• Q2 = 18

115 Boxplots: Example
Find Q3
• n = 23
• Q3 is in the ¾(23+1)th position = 18th position
• Q3 = 21

116 Boxplots: Example
• Q1 = 16, Q2 = 18, Q3 = 21
• IQR = Q3 − Q1 = 21 − 16 = 5
• 1.5 × IQR = 1.5 × 5 = 7.5
• Q1 − 1.5 × IQR = 16 − 7.5 = 8.5
• Q3 + 1.5 × IQR = 21 + 7.5 = 28.5

117 Boxplots: Example
• Q1 = 16, Q2 = 18, Q3 = 21, 1.5 × IQR = 7.5
• Q1 − 1.5 × IQR = 16 − 7.5 = 8.5
• Lower adjacent value = smallest observation > Q1 − 1.5 × IQR
• Lower adjacent value = smallest observation > 8.5 = 12

118 Boxplots: Example
• Q1 = 16, Q2 = 18, Q3 = 21, 1.5 × IQR = 7.5
• Q3 + 1.5 × IQR = 21 + 7.5 = 28.5
• Upper adjacent value = largest observation < Q3 + 1.5 × IQR
• Upper adjacent value = largest observation < 28.5 = 25

119 Boxplots: Example
• Q1 = 16, Q2 = 18, Q3 = 21
• Lower adjacent value = 12
• Upper adjacent value = 25
• Observations > 25 or < 12 are outliers
• One outlier at 32; identify this observation with a symbol such as an asterisk (*) or circle

120 Boxplots: Example
[Figure: box-plot of the Diameter (nm) of a virus]
• Distribution is centred at 18nm
• Distribution is not symmetric, but slightly skewed to the right
• 50% of the distribution lies between 16 and 21nm
• One outlier, a value of 32nm

121 Boxplots: Construction
• Compute Q1, Q2, Q3
• Calculate the interquartile range (IQR) = Q3 − Q1
• Locate the lower and upper adjacent values
  ◦ Lower adjacent value = smallest observation > Q1 − 1.5 × IQR
  ◦ Upper adjacent value = largest observation < Q3 + 1.5 × IQR
• Identify outliers
  ◦ Values > upper adjacent value
  ◦ Values < lower adjacent value
• Form a box between Q1 and Q3; draw a line at Q2
• Extend whiskers to the lower and upper adjacent values
• Mark outliers
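The numerical part of these steps can be sketched as below. `boxplot_summary` is a hypothetical helper, and the observation list contains only the virus-diameter values that are named in the worked example (the full sample of 23 diameters is not reproduced in this transcript), so it is a partial stand-in rather than the real dataset. The quartiles Q1 = 16 and Q3 = 21 are taken from the example.

```python
def boxplot_summary(observations, q1, q3):
    """Fences, adjacent values and outliers for a boxplot, given Q1 and Q3."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Adjacent values: the most extreme observations strictly inside the fences.
    inside = [x for x in observations if lower_fence < x < upper_fence]
    lower_adjacent, upper_adjacent = min(inside), max(inside)
    # Outliers: anything beyond the adjacent values.
    outliers = sorted(
        x for x in observations if x < lower_adjacent or x > upper_adjacent
    )
    return {
        "iqr": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "lower_adjacent": lower_adjacent,
        "upper_adjacent": upper_adjacent,
        "outliers": outliers,
    }


# Only the diameters (nm) mentioned in the worked example, not the full sample.
named_values = [12, 16, 18, 21, 25, 32]
summary = boxplot_summary(named_values, q1=16, q3=21)
print(summary)
```

The whiskers are drawn to `lower_adjacent` and `upper_adjacent`, and each entry in `outliers` is marked individually.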

122 Boxplots: Construction
[Figure: boxplot anatomy. Box from Q1 to Q3 with a line at Q2. Lower adjacent value: smallest observation > Q1 − 1.5×IQR. Upper adjacent value: largest observation < Q3 + 1.5×IQR.]

123 Boxplots
• Boxplots are useful for comparing two or more distributions
• To compare the distributions of two samples, two boxplots could be prepared on the same axes
• What can be compared between distributions
  ◦ Locations of the distributions
  ◦ Minimum and maximum values
  ◦ Dispersions
  ◦ Shapes of the distributions

124 Boxplots: Example
Cell diameters (nm) of samples of 2 types of virus.
Type 1:
Type 2:

125 Boxplots: Example
[Figure: box-plot of Diameters (nm) of 2 types of virus, Type 1 and Type 2 on the same Diameter (nm) axis]

126 Boxplots In-class Exercise Summer 2010 Q2iii,iv,vi
A study examined the distance (in km) between students' accommodation and their university. The results for 29 full-time students are given below. The study was restricted to full-time students.
• Construct a box-plot showing all calculations.
• Based on the box-plot, provide suitable measures of centrality and spread. Include an explanation for your choice. (Note: you do not need to do any calculations.)
• What other method of presentation would be appropriate for this data? (Note: you do not need to prepare this.)

127 Summary
• Computed quartiles
  ◦ Q1
  ◦ Q2
  ◦ Q3
• Identified outliers
• Constructed boxplots
• Examined distribution shape

128 What next?
• Motivating Exercises

129 2. Presenting & Summarising Data Descriptive Statistics Motivating Exercises

130 Summary of Previous Lecture
• The interquartile range (IQR) is the distance from the lower quartile to the upper quartile, spanning the middle half of the data.
• The IQR is a more resistant measure of spread than the range, as it is unaffected by extreme observations.
• When data are highly skewed, the standard deviation is not a meaningful summary of spread.
• The five-number summary of a dataset consists of the minimum value, first quartile, median, third quartile and maximum value, and forms the basis of the boxplot.
• The boxplot provides information about centrality (by the median), spread (by the interquartile range, first quartile to third quartile) and outliers (values more than 1.5 × IQR below the first quartile or above the third quartile).
• An outlier is an extreme value falling far below or above the bulk of the data.

131 2.1 Motivating Exercises

132 Motivating Exercise 1: Summer 2009 Q1b
Scenario: The ages of 90 people seen in the emergency room of a Dublin hospital on a Friday night were recorded. The results are summarised in the frequency table below:
[Table: Age (years), Frequency]

133 Motivating Exercise 1: Summer 2009 Q1b
Questions to Explore
i. Prepare a histogram for the ages and comment on its shape.
ii. Based on the histogram in (i), suggest suitable measures of centrality and spread. Include an explanation for your choice. (Note: you do not need to calculate these measures.)

134 Motivating Exercise 2: Summer 2011 Q2
Scenario: A study examined the moisture content of fields in West Cork. The moisture content, measured as a percentage, for a random sample of 30 fields is given below.

135 Motivating Exercise 2: Summer 2011 Q2
Questions to Explore
i. Calculate the mean and median. [8 marks]
ii. Calculate the quartiles and interpret these values. [12 marks]
iii. Construct a box-plot showing all steps in your calculations. [16 marks]
iv. Based on the box-plot in (iii), provide suitable measures of centrality and spread. Include an explanation for your choice. (Note: you do not need to do any calculations.) [8 marks]
v. Suppose low moisture content was defined as moisture content less than 10%. What percentage of fields would be classed as having low moisture content? [2 marks]
vi. If you took a random sample of 200 fields in West Cork, how many low moisture content fields would you expect to find? [2 marks]
vii. What other method of presentation would be appropriate for this data? (Note: you do not need to prepare this.) [2 marks]

136 What next?
• Probability


More information

Unit 2. Describing Data: Numerical

Unit 2. Describing Data: Numerical Unit 2 Describing Data: Numerical Describing Data Numerically Describing Data Numerically Central Tendency Arithmetic Mean Median Mode Variation Range Interquartile Range Variance Standard Deviation Coefficient

More information

Describing distributions with numbers

Describing distributions with numbers Describing distributions with numbers A large number or numerical methods are available for describing quantitative data sets. Most of these methods measure one of two data characteristics: The central

More information

Tastitsticsss? What s that? Principles of Biostatistics and Informatics. Variables, outcomes. Tastitsticsss? What s that?

Tastitsticsss? What s that? Principles of Biostatistics and Informatics. Variables, outcomes. Tastitsticsss? What s that? Tastitsticsss? What s that? Statistics describes random mass phanomenons. Principles of Biostatistics and Informatics nd Lecture: Descriptive Statistics 3 th September Dániel VERES Data Collecting (Sampling)

More information

Exercises from Chapter 3, Section 1

Exercises from Chapter 3, Section 1 Exercises from Chapter 3, Section 1 1. Consider the following sample consisting of 20 numbers. (a) Find the mode of the data 21 23 24 24 25 26 29 30 32 34 39 41 41 41 42 43 48 51 53 53 (b) Find the median

More information

1. Exploratory Data Analysis

1. Exploratory Data Analysis 1. Exploratory Data Analysis 1.1 Methods of Displaying Data A visual display aids understanding and can highlight features which may be worth exploring more formally. Displays should have impact and be

More information

STAT 200 Chapter 1 Looking at Data - Distributions

STAT 200 Chapter 1 Looking at Data - Distributions STAT 200 Chapter 1 Looking at Data - Distributions What is Statistics? Statistics is a science that involves the design of studies, data collection, summarizing and analyzing the data, interpreting the

More information
