University of Jordan Fall 2009/2010 Department of Mathematics


Handouts Part 1 (Chapter 1 - Chapter 5)

Chapter 1 Introduction; Some Basic Concepts

Statistics is a science concerned with making decisions in the face of uncertainty. It comprises the following:
1) Descriptive statistics: concerned with the collection, organization, summarization and analysis of a body of data.
2) Inferential statistics: concerned with drawing inferences about a large body of data (called a population) by examining a part of that body (called a sample).

Statistical activities are motivated by the need to answer a question about a certain population. The usual setup starts with picking a sample from the population that resembles the population, in the sense that it shares all the characteristics and properties of the population (such a sample is said to be an unbiased sample), then collecting information from the sample and using it to answer the question about the population. If the question (and hence the data) is related to a medical, biological, or nutritional problem, we use the term biostatistics to distinguish this particular kind of statistical tools.

Now we introduce some of the vocabulary and concepts that are widely used in any statistics course.

Random Variable: the information or data collected from the subjects cannot be predicted exactly in advance; such quantities are referred to as random variables. Random variables are of two kinds:

Qualitative Variables: they divide the subjects into groups or categories. The value of a qualitative variable cannot be measured or counted, for example the birthplace, gender, or marital status of an individual. Qualitative random variables are either nominal or ordinal. The possible values of a nominal random variable do not have a natural order, for example gender, marital status, nationality. The possible values of an ordinal random variable can be ordered naturally, for example rank, letter grade, or degree of improvement such as low, weak, good, very good and excellent.

Quantitative Variables: the value of a quantitative variable can be measured or counted. We distinguish between two kinds of quantitative variables:
1. Discrete Variables: if the value of the variable can be counted, it is called a discrete random variable. Examples are the number of admissions to a general hospital or the number of family members of an individual. Discrete random variables are characterized by gaps or interruptions in the values they assume.
2. Continuous Variables: if the value of the variable can be measured, it is called a continuous random variable. An example is the period of treatment of a tuberculosis patient. A continuous random variable can assume any value within a specified relevant interval of values.

Sources of Data: information about the subjects is usually collected from one or more of the following sources:
1. Routinely kept records or archives: for example, the medical history of a patient.
2. Surveys: if the data needed are not available in the kept records, it is logical to think of a survey. For example, whether a patient received good treatment is not usually kept in hospital records but can be surveyed.
3. Experiments: frequently the data needed to answer a question are available only as the result of an experiment. For example, a pediatrician or a dentist may try different motivation strategies with different children to find the best strategy for maximizing children's compliance.
4. External Sources: the data needed to answer a question may already exist in the form of a published report. International organizations such as the WHO, as well as health ministries, usually publish reports that are a good source of data.

The Simple Random Sample (SRS)
If a sample of size n is drawn from a population of size N in such a way that every possible sample of size n has the same chance of being selected, the sample is called a simple random sample. One method of selecting a simple random sample uses random number generators or random number tables. The procedure is as follows:
1. Get a list of all subjects in the population.
2. Obtain random numbers from a random number generator or a table.
3. Select the subjects whose numbers in the list match the obtained random numbers.
Note: the above method is ideal, but it is impractical for some data; in particular, it is difficult to implement when we need to draw a sample from a very large population.

Reading Assignment: Chapter 1 (1.1, 1.2, 1.4) in W.W. Daniel.

Chapter 2 Descriptive Statistics

Introduction
In this chapter we learn several techniques for organizing and presenting data so that we may easily determine what information they contain.

The Ordered Array
An ordered array is a listing of the values of a collection of data in order of magnitude, from the smallest value to the largest value. An ordered array enables one to determine quickly the smallest and the largest measurement.
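The two procedures above, drawing a simple random sample from a numbered list and arranging the observations in an ordered array, can be sketched in Python as follows. This is a minimal illustration, not the handout's own method: it assumes the population is already numbered 1 to N, uses Python's `random.Random.sample` in place of a random number table, and the population size 500 and the seed are hypothetical values chosen only for the demonstration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw a simple random sample of size n without replacement.

    random.Random.sample treats every subset of size n alike, so every
    possible sample of size n has the same chance of being selected.
    """
    rng = random.Random(seed)
    return rng.sample(population, n)

def ordered_array(data):
    """List the observations from the smallest to the largest."""
    return sorted(data)

# Step 1: a hypothetical numbered list of subjects 1..N.
N = 500
population = list(range(1, N + 1))
# Steps 2 and 3: random numbers select the subjects with matching numbers.
sample = simple_random_sample(population, 10, seed=1)
print(ordered_array(sample))
```

Because sampling is without replacement, no subject can be selected twice, which matches step 3 of the procedure.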

Example: The following data are the ages of 30 people, rounded to the nearest year, who were discharged from a general hospital last Friday.
[Table: the 30 ages]
To put the above data in an ordered array, we simply list the measurements from the smallest to the largest.
[Ordered array of the 30 ages]

Frequency tables without classes: such tables can be used to organize all types of data.
Example: The following table shows the letter grades of 150 students.
X (letter grade) | Frequency (number of students)
F | 12
D | 15
D+ | 20
C | 35
C+ | 30
B | 18
B+ | 12
A | 8
Total | 150

Grouped Data: Frequency tables with classes
To group a set of observations, we select a set of contiguous, non-overlapping intervals such that each observation belongs to exactly one interval. These intervals are called class intervals. Class intervals need not have the same width. All class intervals are listed in a table, which is referred to as a frequency table. A typical frequency table consists of the following columns:
Class intervals: a column in which all class intervals are listed.

Midpoints: a column in which the midpoints of the class intervals are computed. The midpoint of a class interval equals (left side + right side)/2.
Frequency: the frequency of a class interval is the number of observations that belong to the class interval.
Cumulative Frequency: the cumulative frequency of a class interval is the number of observations that are less than or equal to the right-hand side of that class interval.
Relative Frequency: the relative frequency of a class interval equals (frequency of the class interval)/(total frequency).
Cumulative Relative Frequency: it equals (cumulative frequency)/(total frequency).

A natural question is how many class intervals should be included in a frequency table. A rule of thumb states that the number of class intervals k should be between 5 and 15. We may use the following rule given by Sturges as a guide for computing k: the number of class intervals is the closest integer k to $1 + 3.322\log_{10}(n)$, where n is the total number of observations. The number of class intervals given by the rule can be increased or decreased for convenience or better presentation. Having decided on the number of class intervals, we decide on the class widths. If all classes are to have the same width, we compute the class width using the formula $w = (\text{largest value} - \text{smallest value})/k$, rounded up to the nearest number with the same accuracy unit as the data.

Example: Put the data mentioned in the previous example in a frequency table.
We start by computing the number of classes: n = 30 and $1 + 3.322\log_{10}(30) \approx 5.91$. Thus we should have 6 class intervals. To obtain the class width we compute (largest value - smallest value)/6; since the observations are integers, we round the result up to 13. Thus the class width is 13.
Now we are ready to construct the class intervals. The first class interval has the smallest observation, namely 15, as its left-hand side and (left-hand side + class width - one accuracy unit) = 15 + 13 - 1 = 27 as its right-hand side. The left-hand side of the second class is the right-hand side of the first class + one accuracy unit, i.e. 28. In general, the right-hand side of each class is its left-hand side + class width - one accuracy unit. We construct the other class intervals similarly.
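The class-construction steps above (Sturges' rule, the rounded-up class width, and the "left side + width - one accuracy unit" rule) can be sketched as follows, assuming integer data with accuracy unit 1. The helper names are hypothetical, and since the 30 ages are not listed here, the largest value 90 used in the check is a made-up value that happens to give a width of 13.

```python
import math

def sturges_k(n):
    """Number of classes: the integer closest to 1 + 3.322*log10(n)."""
    return round(1 + 3.322 * math.log10(n))

def class_width(smallest, largest, k, unit=1):
    """(largest - smallest)/k, rounded up to a multiple of the accuracy unit."""
    return math.ceil((largest - smallest) / k / unit) * unit

def class_intervals(smallest, width, k, unit=1):
    """k contiguous classes; each right side = left side + width - unit."""
    intervals = []
    left = smallest
    for _ in range(k):
        right = left + width - unit
        intervals.append((left, right))
        left = right + unit  # next class starts one accuracy unit later
    return intervals

def frequency_table(data, intervals):
    """Rows: (interval, midpoint, freq, cum freq, rel freq, cum rel freq)."""
    n = len(data)
    rows, cum = [], 0
    for left, right in intervals:
        f = sum(left <= x <= right for x in data)
        cum += f
        rows.append(((left, right), (left + right) / 2, f, cum, f / n, cum / n))
    return rows

k = sturges_k(30)                 # 6 classes for n = 30
w = class_width(15, 90, k)        # hypothetical largest value 90 -> width 13
print(class_intervals(15, w, k))  # starts (15, 27), (28, 40), ...
```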

[Frequency table of the 30 ages: class intervals, midpoints, frequencies, cumulative frequencies, relative frequencies and cumulative relative frequencies]

Example: Consider the following cumulative frequency distribution.
[Table: classes and their cumulative frequencies; total frequency 50]
a) What is the width (or length) of each class?
b) Find the relative frequency of the second class.
c) Find the proportion of observations that are greater than or equal to 16 and less than or equal to 33.
Solution:
a) The class width equals 16 - 10 = 6 (equivalently, right-hand side - left-hand side + one accuracy unit = 6).
b) The frequency of the second class equals 13 - 6 = 7 and the total frequency is 50. Thus the relative frequency of the second class equals 7/50 = 0.14.
c) The observations that are greater than or equal to 16 and less than or equal to 33 are those in the second, third and fourth classes, and their frequencies are 13 - 6 = 7, 38 - 13 = 25 and 42 - 38 = 4, respectively. Thus their proportion is (7 + 25 + 4)/50 = 0.72.

The Histogram; The Frequency Polygon:
The histogram is a graphical representation of the frequency distribution (or the relative frequency distribution); it reveals the shape of the data, for example the presence or absence of symmetry. When we construct the histogram, the boundaries of the class intervals are represented on the horizontal axis, while the vertical axis has the frequency (or the relative frequency) as its scale. Above each class interval on the horizontal axis we construct a rectangle whose height equals the frequency (or relative frequency) of that class interval. All rectangles must be contiguous.

The frequency (or relative frequency) polygon is another graphical representation of the frequency (or relative frequency) distribution. To draw a frequency polygon, we place a dot above the midpoint of each class interval on the horizontal axis, at a height equal to the frequency of that class interval. We also place two extra dots, at height zero, on the horizontal axis at the midpoints of two additional class intervals: one to the left of the first class and one to the right of the last class. Connecting the dots with line segments produces a frequency polygon.

Example: Construct the frequency histogram and the frequency polygon of the following part of a frequency table.
[Table: class intervals, frequencies, midpoints and actual limits]
The following is the histogram.
(Figure: frequency histogram, with the actual limits on the horizontal axis and frequency on the vertical axis.)
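The cumulative-frequency example above amounts to differencing consecutive cumulative frequencies. A short Python sketch, using only the four cumulative frequencies quoted in the solution (6, 13, 38, 42) and the total frequency 50; the function name is hypothetical.

```python
def frequencies_from_cumulative(cum):
    """Recover class frequencies by differencing cumulative frequencies."""
    return [c - prev for prev, c in zip([0] + cum[:-1], cum)]

# First four cumulative frequencies from the example; total frequency 50.
cum = [6, 13, 38, 42]
total = 50
freqs = frequencies_from_cumulative(cum)   # [6, 7, 25, 4]
rel_second = freqs[1] / total              # relative frequency of class 2
prop_16_to_33 = sum(freqs[1:4]) / total    # classes 2, 3 and 4
print(freqs, rel_second, prop_16_to_33)
```

The proportion in part c) is simply (42 - 6)/50: the cumulative frequency at the end of the fourth class minus that at the end of the first.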

The following is the frequency polygon.
(Figure: frequency polygon, with midpoints on the horizontal axis and frequency on the vertical axis.)

Stem-and-Leaf Display (optional):
The stem-and-leaf display is similar to the histogram and has the same purpose; its main advantage over the histogram is that it preserves the information contained in the individual data items. It is effective with relatively small data sets. To construct a stem-and-leaf display we:
1. partition each datum into two parts: the leaf, which consists of the units digit, and the stem, which consists of the remaining digits of the datum;
2. write down the stems on the left-hand side of the page;
3. draw a line to the right of these stems;
4. on the other side of the line, write down the leaves of all data with the same stem.
The stems should form an ordered column with the smallest stem at the top and the largest at the bottom. All stems within the range are included in the stem column, even if no data item has that stem. Decimals, when present in the original data, are omitted in the stem-and-leaf display. If all data items are fractions less than one, we can magnify the data by multiplying each data item by a number (10, 100, 1000, etc.) before displaying them in a stem-and-leaf plot.

Example: Display the following data in a stem-and-leaf plot:
2, 3, 6, 7, 12, 15, 15, 15, 17, 20, 20, 21, 29, 29, 34, 51, 56, 60, 65, 69, 80, 89
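The four construction steps can be sketched in Python for non-negative integer data with one-digit leaves. The function name is hypothetical; the data are the 22 values of the example (2, 3, 6, 7, 12, ..., 80, 89).

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return 'stem | leaves' lines for non-negative integer data.

    Leaf = units digit, stem = the remaining digits. Every stem between
    the smallest and the largest is listed, even if it has no leaves.
    """
    leaves = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        leaves[stem].append(str(leaf))
    lo, hi = min(leaves), max(leaves)
    return [f"{s} | {' '.join(leaves[s])}".rstrip() for s in range(lo, hi + 1)]

data = [2, 3, 6, 7, 12, 15, 15, 15, 17, 20, 20, 21, 29, 29,
        34, 51, 56, 60, 65, 69, 80, 89]
for line in stem_and_leaf(data):
    print(line)
```

Sorting first keeps the leaves on each row in increasing order, so the display doubles as an ordered array.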

Solution:
Stems | Leaves
0 | 2 3 6 7
1 | 2 5 5 5 7
2 | 0 0 1 9 9
3 | 4
4 |
5 | 1 6
6 | 0 5 9
7 |
8 | 0 9

Reading Assignment: Chapter 2 (2.1, 2.2, 2.3) in W.W. Daniel.

Descriptive Statistics: Measures of Central Tendency
A descriptive measure is a single number that is used to summarize the data. Descriptive measures may be computed from the data of a sample or the data of a population.
Definition:
1. A descriptive measure computed from the data of a sample is called a statistic.
2. A descriptive measure computed from the data of a population is called a parameter.

Arithmetic Mean: the arithmetic mean of a sample is denoted by $\bar{x}$ and that of a population by $\mu$. From now on we will simply say "the mean" for the arithmetic mean.
1) For raw (unorganized) data:
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, where $x_1, x_2, \ldots, x_n$ are the observations in the sample and n is their number (the sample size).
$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$, where $x_1, x_2, \ldots, x_N$ are the observations in the population and N is their number (the population size).

2) For frequency tables:
$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$, where $x_1, x_2, \ldots, x_k$ are the observations (or midpoints) and $f_1, f_2, \ldots, f_k$ are their corresponding frequencies.

Properties of the Mean:
1. Uniqueness: for a given set of data there is one and only one mean.
2. Simplicity: the mean of any sample is easy to compute.
3. The value of each data item influences the mean, so the mean is affected by extreme values; this makes the mean, in some cases, a poor representative of the tendency of the values of the majority of the data.
Example: The mean of the data 50, 49, 53, 48, 54, 40 equals (50 + 49 + 53 + 48 + 54 + 40)/6 = 294/6 = 49, a value pulled down by the single small observation 40. If we trim out the observation 40, the mean becomes (50 + 49 + 53 + 48 + 54)/5 = 254/5 = 50.8. Notice the influence of the observation 40 on the value of the mean.

Example (part 1 is optional): Compute the mean for the following two data sets.
1) [Stem-and-leaf display of 13 observations]
2) [Table: classes and frequencies]
Solution:
1) mean = (sum of observations)/(number of observations) = (sum of the 13 observations)/13.

2) [Table: classes, frequencies $f_i$ and midpoints $x_i$, with totals]
mean = $\sum f_i x_i / \sum f_i$ = 64/10 = 6.4.

The Median: The median of a finite set of observations is the value that divides the set into two equal parts, such that the number of values greater than or equal to the median equals the number of values less than or equal to the median. The median is the middle value (or the average of the two middle values) when all values have been arranged in order of magnitude.
Example: Find the median of the following observations: 45, 78, 3, 54, 61, 1, 90, 46, 68, 45, 11.
The first step is to arrange the data in order of magnitude: 11, 1, 3, 3, 45, 45, 46, 54, 68, 78, 90.
Since there are 11 observations, the 6th ordered value, 45, lies exactly in the middle; thus the median is 45.
Example: Find the median of 78, 94, 25, 23, 56, 66, 38, 78, 23, 80.
As a first step we order the data: 23, 23, 25, 38, 56, 66, 78, 78, 80, 94.
Because the number of data items is even, no single datum lies in the middle of the ordered data; however, the two values 56 and 66 are located in the middle, so the median equals (56 + 66)/2 = 61.
Properties of the Median:
1. Uniqueness.
2. Simplicity.
3. Unlike the mean, it is not drastically affected by extreme values.
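The mean formulas and the median rule above translate directly into Python. A minimal sketch; the function names are hypothetical, the frequency-table check uses made-up midpoints and frequencies, and the even-sized median check uses the ten ordered values 23, 23, 25, 38, 56, 66, 78, 78, 80, 94.

```python
def mean(data):
    """Raw data: sum of observations / number of observations."""
    return sum(data) / len(data)

def grouped_mean(values, freqs):
    """Frequency table: sum(f_i * x_i) / sum(f_i)."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)

def median(data):
    """Middle ordered value, or the average of the two middle values."""
    s = sorted(data)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 == 1 else (s[mid - 1] + s[mid]) / 2

print(mean([50, 49, 53, 48, 54, 40]))  # effect of the extreme value 40
print(mean([50, 49, 53, 48, 54]))      # after trimming it out
print(median([23, 23, 25, 38, 56, 66, 78, 78, 80, 94]))
```

Note that `median` sorts its input first, mirroring the handout's instruction to arrange the data in order of magnitude before locating the middle.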

The Mode: A mode of a set of observations is an observation that has the largest frequency. If all observations have the same frequency, the data set has no mode. A data set may have more than one mode. The mode may be used to describe qualitative data. A mode of grouped data is estimated by the midpoint of a class with the highest frequency.

Example: The following data represent the nationalities of a sample of 10 patients who had psychotherapy last year in a private clinic:
British, French, American, American, Dutch, British, Spanish, South African, French, American
To find the mode of the above nationalities we make the following table:
Nationality | Frequency
American | 3
British | 2
Dutch | 1
French | 2
Spanish | 1
South African | 1
The most frequently occurring nationality is American; thus the mode is American.

Example: Find the mode of the data 28, 28, 28, 28, 28, 29, 30, 31, 32, 32, 32, 32, 32, 36, 39, 42, 44, 44, 45.
There are two modes for the above data, namely 28 and 32, because they share the highest frequency (5).

Reading Assignment: Chapter 2 (2.4) in W.W. Daniel.
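Counting frequencies, as in the nationality table, is exactly what `collections.Counter` does. The sketch below returns every mode and assumes the common convention that a data set whose values all share the same frequency has no mode; the function name is hypothetical.

```python
from collections import Counter

def modes(data):
    """All values with the largest frequency; [] if every value ties."""
    counts = Counter(data)
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []  # all observations have the same frequency: no mode
    return sorted(v for v, c in counts.items() if c == top)

nat = ["British", "French", "American", "American", "Dutch", "British",
       "Spanish", "South African", "French", "American"]
print(modes(nat))  # works for qualitative data as well
```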

Descriptive Statistics: Measures of Dispersion
The dispersion of a set of data (or observations) refers to the variety that they exhibit. A measure of dispersion provides information about the amount of variability present in a set of data. When the dispersion is "small", the values of the data items are "close" together. The following graph represents two frequency polygons, for population A and population B, with the same mean; notice that population B exhibits more dispersion because the values of its observations are more spread out.
Dispersion can be measured using one of the following measures:

The Range: the range of a set of values is given by R = largest value - smallest value. The range is very simple to compute, but it is not usually used as a reliable measure of dispersion because it is drastically affected by extreme values.

The Variance:
1) For raw data: the variance of the sample $x_1, x_2, \ldots, x_n$ is given by
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$,
where $\bar{x}$ is the mean of the sample. One can show that the above formula also has the following form, which is easier for computations with calculators:
$s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)$.
The population variance is given by
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{\sum_{i=1}^{N} x_i^2}{N} - \mu^2$,
where N is the population size and $\mu$ is the mean of the population.
2) For frequency tables:
$s^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{\sum_{i=1}^{k} f_i - 1} = \frac{\sum_{i=1}^{k} f_i x_i^2 - \left(\sum_{i=1}^{k} f_i\right)\bar{x}^2}{\sum_{i=1}^{k} f_i - 1}$,
where $x_1, x_2, \ldots, x_k$ are the observations (or the midpoints) and $f_1, f_2, \ldots, f_k$ are their corresponding frequencies.

The Standard Deviation: the variance is expressed in squared units and is therefore not an appropriate measure of dispersion when we want to express it in terms of the original units. To obtain a measure of dispersion in the original units, we take the square root of the variance, which we refer to as the standard deviation. The standard deviations of a sample and of a population are denoted by s and $\sigma$, respectively:
$s = \sqrt{s^2}$ and $\sigma = \sqrt{\sigma^2}$.

Example: Find the mean and standard deviation of each of the following samples:
i) 42, 28, 28, 61, 31, 23, 50, 34, 32, 37
ii) [Table: classes and frequencies]
Solution:
i) [Table: $x_i$ and $x_i^2$, with totals]

Thus $\bar{x} = \frac{366}{10} = 36.6$ and $s^2 = \frac{14592 - 10(36.6)^2}{9} = \frac{1196.4}{9} \approx 132.93$, so $s \approx 11.53$.
ii) [Table: classes, midpoints $x_i$, frequencies $f_i$, $f_i x_i$ and $f_i x_i^2$; totals $\sum f_i = 9$, $\sum f_i x_i = 45$, $\sum f_i x_i^2 = 333$]
Thus $\bar{x} = \frac{45}{9} = 5$ and $s^2 = \frac{333 - 9(5)^2}{8} = \frac{108}{8} = 13.5$, so $s \approx 3.67$.

The Coefficient of Variation:
The coefficient of variation, denoted by C.V., is a unit-free measure that is used to compare the amount of dispersion between two different sets of data with (possibly) different means and different units. The coefficient of variation is given by
$C.V. = \frac{s}{\bar{x}} \times 100$.
Example: The following table summarizes the data collected about the weights of two samples of human males.
 | Sample 1 | Sample 2
Age | 25 years | 11 years
Mean weight | 145 pounds | 80 pounds
Standard deviation | 10 pounds | 10 pounds
Which of the samples is more dispersed?
To compare dispersion we compute the C.V. for each sample:
C.V. for sample 1 = (10/145) × 100 ≈ 6.9%
C.V. for sample 2 = (10/80) × 100 = 12.5%
Since the C.V. of sample 2 is greater than the C.V. of sample 1, sample 2 is more dispersed.
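The variance, standard deviation and coefficient-of-variation formulas can be checked numerically. A minimal sketch with hypothetical function names; `sample_i` is sample i) as listed in the example, and the cross-check against the standard library's `statistics.stdev` relies on that function also dividing by n - 1.

```python
import math
import statistics

def sample_variance(data):
    """s^2 = sum((x - xbar)^2) / (n - 1)."""
    n = len(data)
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / (n - 1)

def grouped_sample_variance(values, freqs):
    """s^2 = (sum f x^2 - (sum f) xbar^2) / (sum f - 1) for a frequency table."""
    n = sum(freqs)
    xbar = sum(f * x for x, f in zip(values, freqs)) / n
    return (sum(f * x * x for x, f in zip(values, freqs)) - n * xbar ** 2) / (n - 1)

def coefficient_of_variation(s, xbar):
    """C.V. = s / xbar * 100, a unit-free percentage."""
    return s / xbar * 100

sample_i = [42, 28, 28, 61, 31, 23, 50, 34, 32, 37]
s = math.sqrt(sample_variance(sample_i))
assert math.isclose(s, statistics.stdev(sample_i))  # same n - 1 convention
print(round(sum(sample_i) / len(sample_i), 1), round(s, 2))
print(round(coefficient_of_variation(10, 145), 1), coefficient_of_variation(10, 80))
```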

Percentiles and Quartiles
Percentiles and quartiles are used to indicate certain positions (or locations) of the observations (or data). The pth percentile is denoted by $P_p$; it is the number P such that (approximately) p% of the observations are less than or equal to P. The 25th percentile is also denoted by $Q_1$ and is called the 1st quartile. The second quartile $Q_2$ is the 50th percentile (the median), while the 3rd quartile $Q_3$ is the 75th percentile.
Computing percentiles:
1) For ungrouped data, the pth percentile is taken to be the $\frac{p}{100}(n+1)$th ordered observation. Thus
$Q_1$ is the $0.25(n+1)$th ordered observation,
$Q_2$ is the $0.50(n+1)$th ordered observation,
$Q_3$ is the $0.75(n+1)$th ordered observation.
The pth percentile for ungrouped data is computed using the formula
$P_p = x_{(\lfloor r \rfloor)} + (r - \lfloor r \rfloor)\left(x_{(\lfloor r \rfloor + 1)} - x_{(\lfloor r \rfloor)}\right)$, where $r = \frac{p}{100}(n+1)$, $\lfloor r \rfloor$ is the floor of r, $x_{(j)}$ is the jth ordered observation, and n is the number of observations (or total frequency). Before you apply the formula, make sure that the observations are written in ascending order.
2) For grouped data: think of the pth percentile as the observation that has cumulative frequency $\frac{p}{100}n$, where n is the total frequency. Find the first class that has cumulative frequency greater than or equal to $\frac{p}{100}n$, say class j. Use the actual limits and the cumulative frequencies of class j (and of the class before it) to approximate the required percentile linearly.

Interquartile Range: the interquartile range is denoted by IQR and is given by IQR = $Q_3 - Q_1$.

Example: Find $Q_1$, the median, $Q_3$, $P_{60}$ and IQR for the following observations:
32, 12, 54, 43, 51, 17, 23, 19, 14, 22, 25, 28, 33, 42, 26, 38, 50
We start by putting the above data in ascending order:
12, 14, 17, 19, 22, 23, 25, 26, 28, 32, 33, 38, 42, 43, 50, 51, 54
The number of observations is n = 17.

The first quartile $Q_1$ is the 0.25(17 + 1) = 4.5th ordered observation. The 4th observation is 19 and the 5th observation is 22, hence $Q_1 = 19 + 0.5(22 - 19) = 20.5$.
The median is the 0.50(17 + 1) = 9th ordered observation, namely 28.
$Q_3$ is the 0.75(17 + 1) = 13.5th ordered observation, namely 42 + 0.5(43 - 42) = 42.5.
$P_{60}$ is the 0.60(17 + 1) = 10.8th ordered observation, namely 32 + 0.8(33 - 32) = 32.8.
IQR = $Q_3 - Q_1$ = 42.5 - 20.5 = 22.

Example: Find the median and the 80th percentile of the following data.
[Table: values x and their frequencies; total frequency n = 18]
To find the median: 0.50(18 + 1) = 9.5, so the median is the 9.5th ordered observation, i.e. $x_{(9)} + 0.5(x_{(10)} - x_{(9)})$.
To find the 80th percentile: 0.80(18 + 1) = 15.2, so the 80th percentile is the 15.2nd ordered observation. Since the 15th and 16th ordered observations are both 17, $P_{80} = 17 + 0.2(17 - 17) = 17$.
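The (p/100)(n + 1) rule with linear interpolation can be sketched as follows, using the 17 observations of the first example. This is only one of several percentile conventions; library routines such as `numpy.percentile` use a different default, so their results need not agree with this sketch. The function name and the clamping at both ends are assumptions of the sketch.

```python
import math

def percentile(data, p):
    """p-th percentile as the (p/100)(n+1)-th ordered observation,
    interpolating linearly between neighbouring ordered values."""
    s = sorted(data)
    n = len(s)
    r = p / 100 * (n + 1)
    k = math.floor(r)
    if k < 1:        # position falls before the first observation
        return s[0]
    if k >= n:       # position at or beyond the last observation
        return s[-1]
    # s[k - 1] is the k-th ordered observation (1-based counting)
    return s[k - 1] + (r - k) * (s[k] - s[k - 1])

obs = [32, 12, 54, 43, 51, 17, 23, 19, 14, 22, 25, 28, 33, 42, 26, 38, 50]
q1, q2, q3 = (percentile(obs, p) for p in (25, 50, 75))
print(q1, q2, q3, round(percentile(obs, 60), 1), q3 - q1)
```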

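Example: Find the median for the following grouped data.
[Table: classes, frequencies and cumulative frequencies; total frequency 17]
n = 17, so n/2 = 17/2 = 8.5. The first class that has cumulative frequency greater than or equal to 8.5 is 12-16, and the actual limits of this class are 11.5 and 16.5, respectively. The median is then estimated by linear interpolation within this class, between its actual limits 11.5 and 16.5, at cumulative frequency 8.5.

Example: Consider the following table of grouped data.
[Table: classes and frequencies, with total]
Estimate the proportion of observations that are less than 4.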

The observation 4 belongs to the class 7. Let p be the required proportion.

Box-and-Whisker Plots (Box plots) (Optional):
A box-and-whisker plot (or simply a box plot) is a useful visual device for communicating the information contained in a data set. It reveals information about the amount of spread, the location of concentration, and the symmetry of the data. The construction of such a plot makes use of the quartiles of a data set and may be accomplished by the following steps:
1. Represent the data on the horizontal axis.
2. Draw a box in the space above the horizontal axis in such a way that the left end of the box aligns with the first quartile $Q_1$ and the right end of the box aligns with the third quartile $Q_3$.
3. Divide the box into two parts by a vertical line that aligns with the median $Q_2$.
4. Draw a horizontal line, called a whisker, from the left end of the box to a point that aligns with the smallest measurement in the data set.
5. Draw another horizontal line, or whisker, from the right end of the box to a point that aligns with the largest measurement in the data set.
Example: Construct a box-and-whisker plot for the data in the previous example.
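Both grouped-data problems above, the grouped median and the proportion of observations below a given value, rely on linear interpolation inside one class, which treats the observations as spread evenly between the class's actual limits. A sketch under that assumption; the function names are hypothetical, and the three classes in the check (actual limits 9.5-14.5, 14.5-19.5, 19.5-24.5 with frequencies 4, 8, 5) are made up for illustration.

```python
def grouped_percentile(lower_limits, upper_limits, freqs, p):
    """Estimate the p-th percentile of grouped data by linear interpolation.

    lower_limits/upper_limits are the actual (boundary) limits of the
    classes; the target cumulative frequency is (p/100) * n.
    """
    n = sum(freqs)
    target = p / 100 * n
    cum_before = 0
    for lo, hi, f in zip(lower_limits, upper_limits, freqs):
        if f > 0 and cum_before + f >= target:
            # interpolate between (lo, cum_before) and (hi, cum_before + f)
            return lo + (target - cum_before) / f * (hi - lo)
        cum_before += f
    return upper_limits[-1]

def proportion_below(lower_limits, upper_limits, freqs, x):
    """Estimate the proportion of observations below x, same interpolation."""
    n = sum(freqs)
    cum = 0
    for lo, hi, f in zip(lower_limits, upper_limits, freqs):
        if x <= lo:
            break
        if x < hi:
            return (cum + f * (x - lo) / (hi - lo)) / n
        cum += f
    return cum / n

lower, upper, f = [9.5, 14.5, 19.5], [14.5, 19.5, 24.5], [4, 8, 5]
print(grouped_percentile(lower, upper, f, 50))  # grouped median
print(proportion_below(lower, upper, f, 17))
```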

Reading Assignment: Chapter 2 (2.5) in W.W. Daniel.

Chapter 3 Some Basic Probability Concepts
Elementary Properties of Probability:
A random experiment is an experiment whose outcome is a random variable, i.e., cannot be predicted with certainty. The sample space of a random experiment is the collection of all possible values of its outcome. An event is a subcollection of the sample space. The empty event is denoted by $\emptyset$; it is the event of having no outcomes. The probability of an event E is denoted by P(E). It is a nonnegative number, less than or equal to 1, that measures the likelihood of the occurrence of the event E.
Example: The following is the sample space of the experiment of tossing a coin: S = {H, T}, where H stands for head and T stands for tail. The following is the collection of all possible events of the experiment of tossing a coin: $\emptyset$, {H}, {T}, {H, T}.
Example: Find the sample space and five different events of the experiment of tossing a coin twice.
Solution: S = {(H, H), (H, T), (T, H), (T, T)}.

The following are events of the experiment:
E1 = {(H, T), (H, H)}, E2 = {(H, T), (H, H), (T, T)}, E3 = {(H, H)}, E4 = S = {(H, H), (H, T), (T, H), (T, T)}, E5 = $\emptyset$.
Definition: If every possible value of the outcome of a random experiment has the same chance to occur, the experiment is said to be equally likely. If an experiment is equally likely and has a finite sample space S, then the probability of an event E of this experiment is given by
$P(E) = \frac{n(E)}{n(S)}$, where n( ) stands for the number of elements.
Example: Find the probability of having a total number of dots greater than 4 if a pair of fair dice is rolled.
The sample space of the experiment of rolling a pair of dice is S = {(1, 1), (1, 2), (1, 3), ..., (1, 6), (2, 1), (2, 2), ..., (2, 6), ..., (6, 6)}.
The event in question is E = {(1, 4), (1, 5), (1, 6), (2, 3), (2, 4), (2, 5), (2, 6), (3, 2), ..., (3, 6), (4, 1), ..., (4, 6), (5, 1), ..., (5, 6), (6, 1), ..., (6, 6)}.
Notice that n(S) = 36 and n(E) = 30. Thus P(E) = 30/36 = 5/6 ≈ 0.833.
Conditional Probability: If A and B are events, then $P(B \mid A)$ denotes the probability of occurrence of the event B given that the event A has occurred. It is called a conditional probability and is read "probability of B given A".
Elementary Properties:
1. For any event E, $0 \le P(E) \le 1$.
2. $P(\emptyset) = 0$ and $P(S) = 1$.
3. If $S = \{s_1, s_2, \ldots, s_n\}$, then $P(\{s_1\}) + P(\{s_2\}) + \cdots + P(\{s_n\}) = 1$.
4. If $A \subseteq B$, then $P(A) \le P(B)$.
5. $P(\bar{E}) = 1 - P(E)$, where $\bar{E}$ is the complement of E.
6. $P(A \cup B) = P(A \text{ or } B) = P(A) + P(B) - P(A \cap B)$.
7. If $A \cap B = \emptyset$, then $P(A \cup B) = P(A) + P(B)$.
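Because the dice experiment is equally likely, P(E) = n(E)/n(S) can be verified by enumerating all 36 outcomes. A short Python sketch using `itertools.product` and exact fractions; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Equally likely sample space for a pair of fair dice: 36 ordered pairs.
S = list(product(range(1, 7), repeat=2))
# Event E: the total number of dots is greater than 4.
E = [outcome for outcome in S if sum(outcome) > 4]

p_E = Fraction(len(E), len(S))  # P(E) = n(E)/n(S)
print(len(S), len(E), p_E)
# Complement rule as a check: P(E) = 1 - P(total <= 4).
assert p_E == 1 - Fraction(sum(1 for o in S if sum(o) <= 4), len(S))
```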

Example: The following table represents the lifetime frequency of cocaine use by gender among 111 adult cocaine users (in the US).
[Table: rows 1-19 times (A), 20-99 times (B), 100 or more times (C) and Total; columns Male (M), Female (F) and Total]
1. What is the probability that a randomly selected user will be a male?
2. If we pick a person at random from the group of 111 and find out that he is a male (M), what is the probability that he used cocaine 100 or more times (C)?
3. What is the probability that a randomly selected person from the group of 111 is a male (M) and a person who used cocaine 100 or more times (C)?
4. What is the probability that a randomly selected person from the group of 111 is a female (F) or a person who used cocaine 20-99 times (B)?
5. What is the probability that a randomly selected person from the group of 111 is not a person who used cocaine 100 or more times (C)?
Solution:
1. $P(M) = \frac{\text{number of males}}{111} = \frac{75}{111} \approx 0.6757$.
2. We use the notation $P(C \mid M)$ to denote the probability of the event C given that the event M has occurred; it is read "probability of C given M". Knowing that the selected person is a male reduces our sample space to the group of males only, thus $P(C \mid M) = \frac{\text{number of males in C}}{\text{number of males}} = \frac{25}{75} \approx 0.3333$.
3. $P(M \text{ and } C) = P(M \cap C) = \frac{25}{111} \approx 0.2252$.
4. $P(F \cup B) = P(F) + P(B) - P(F \cap B)$, where $P(F) = \frac{36}{111}$.
5. $P(\bar{C}) = 1 - P(C)$.

Example: In a group of people, 25% have both diabetes and hypertension, 42% have hypertension, and 35% have diabetes. A person is selected at random from this group. What is the probability that this person
a. is diabetic or hypertensive?

b. does not have hypertension?
c. is not diabetic and does not have hypertension?
Solution: Let D and H denote the events "diabetic" and "hypertensive".
a. $P(D \cup H) = P(D) + P(H) - P(D \cap H) = 0.35 + 0.42 - 0.25 = 0.52$.
b. $P(\bar{H}) = 1 - P(H) = 1 - 0.42 = 0.58$.
c. $P(\bar{D} \cap \bar{H}) = 1 - P(D \cup H) = 1 - 0.52 = 0.48$.

Calculating the Probability of an Event; Conditional Probability:
Recall that $P(B \mid A)$ denotes the probability of occurrence of the event B given that the event A has occurred. The conditional probability $P(B \mid A)$ can be computed using the formula
$P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)} = \frac{P(A \cap B)}{P(A)}$, provided P(A) > 0.
Thus $P(A \text{ and } B) = P(A \cap B) = P(A)\,P(B \mid A)$.
Example: Let A and B be two events such that P(A) = 0.4, P(B) = 0.8 and P(A ... B) = 0.3. Find ... . Use the following table to find the value of each of these quantities.
[Table: probabilities of A and $\bar{A}$ against B and $\bar{B}$, with totals]
Thus ... ; total probability = 0.65.
Definition: The events A and B are independent if $P(A \text{ and } B) = P(A \cap B) = P(A)\,P(B)$. Equivalently, if P(A) > 0 and P(B) > 0, the events A and B are independent if $P(B \mid A) = P(B)$ (and $P(A \mid B) = P(A)$).
Example: In a group of people, 25% have both diabetes and hypertension, 42% have hypertension, and 35% have diabetes.
a. What percentage of the people who have hypertension also have diabetes?
b. For that group of people, are the events "Diabetic" and "Hypertensive" independent?

a. $P(\text{diabetic} \mid \text{hypertensive}) = \frac{P(\text{diabetic and hypertensive})}{P(\text{hypertensive})} = \frac{0.25}{0.42} \approx 0.595$. Thus about 59.5% of the people who have hypertension also have diabetes.
b. The events "Diabetic" and "Hypertensive" are not independent because $P(\text{diabetic} \mid \text{hypertensive}) \approx 0.595 \ne P(\text{diabetic}) = 0.35$.
Fact: If A and B are independent, then the following pairs of events are also independent: A and $\bar{B}$; $\bar{A}$ and B; $\bar{A}$ and $\bar{B}$.
Example: Let A and B be two independent events such that P(A) = 0.4 and P(B) = 0.2. Find i) $P(\bar{A} \cap B)$ ii) $P(\bar{B} \mid A)$.
Since A and B are independent, $\bar{A}$ and B are independent, and A and $\bar{B}$ are independent. Thus:
i) $P(\bar{A} \cap B) = P(\bar{A})\,P(B) = (1 - 0.4)(0.2) = 0.12$.
ii) $P(\bar{B} \mid A) = P(\bar{B}) = 1 - 0.2 = 0.8$.
Definition: The events A and B are mutually exclusive if $P(A \cup B) = P(A) + P(B)$. Equivalently, the events A and B are mutually exclusive if $P(A \cap B) = 0$.
Example: If a person (in the above example) is selected at random,
a. what is the probability that this person is diabetic or hypertensive?
b. are the events "Diabetic" and "Hypertensive" mutually exclusive?
Solution:
a. P(diabetic or hypertensive) = P(diabetic) + P(hypertensive) - P(diabetic and hypertensive) = 0.35 + 0.42 - 0.25 = 0.52.
b. The events "Diabetic" and "Hypertensive" are not mutually exclusive because P(diabetic and hypertensive) = 0.25 ≠ 0.
Example: If a person (in the above example) is selected at random, what is the probability that this person:
a. does not have hypertension?
b. is not diabetic and does not have hypertension?
Solution:
a. P(does not have hypertension) = 1 - P(has hypertension) = 1 - 0.42 = 0.58.

b. P(not diabetic and does not have hypertension) = 1 - P(diabetic or hypertensive) = 1 - 0.52 = 0.48.

Bayes's Theorem, Screening Tests, Sensitivity: HANDOUT IS NOT AVAILABLE. READ FROM YOUR MAIN REFERENCE.

Reading Assignment: Chapter 3 (3.1, 3.2, 3.3, 3.4, 3.5) in W.W. Daniel.
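The diabetes/hypertension answers, and the independent-events example with P(A) = 0.4 and P(B) = 0.2, follow from the addition, complement and conditional-probability rules. A plain Python check; the tolerance 1e-12 used to compare floating-point probabilities is an arbitrary choice of this sketch.

```python
# Diabetes/hypertension example: P(D), P(H) and P(D and H).
p_d, p_h, p_dh = 0.35, 0.42, 0.25

p_d_or_h = p_d + p_h - p_dh     # addition rule
p_not_h = 1 - p_h               # complement rule
p_neither = 1 - p_d_or_h        # P(not D and not H) = 1 - P(D or H)
p_d_given_h = p_dh / p_h        # conditional probability P(D | H)

independent = abs(p_dh - p_d * p_h) < 1e-12   # compares 0.25 with 0.147
mutually_exclusive = abs(p_dh) < 1e-12        # requires P(D and H) = 0

# Independent events A, B with P(A) = 0.4, P(B) = 0.2.
p_a, p_b = 0.4, 0.2
p_notA_and_B = (1 - p_a) * p_b  # complement of A is independent of B

print(round(p_d_or_h, 2), round(p_not_h, 2), round(p_neither, 2),
      round(p_d_given_h, 3), independent, mutually_exclusive)
```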


Chapter 4 Probability Distributions
University of Jordan, Fall 2008/2009, Department of Mathematics

The Distribution of a Discrete Random Variable:
The distribution of a discrete random variable X is a table, a graph or a formula that specifies all possible values of X along with the probability of each of these values.
Example: Consider the following distribution of a discrete random variable X.
[Table: k and P(X = k), with total 1]
Find: 1) P(X is odd) 2) $P(X \text{ is even} \mid X > 0)$
Solution:
1) P(X is odd) = P(X = 1 or X = 3) = P(X = 1) + P(X = 3) = 0.7.
2) $P(X \text{ is even} \mid X > 0) = \frac{P(X \text{ is even and } X > 0)}{P(X > 0)} = \frac{P(X = 2)}{1 - P(X = 0)} = \frac{0.1}{0.8} = 0.125$.

The Expected Value (Mean) and Variance of a Discrete Random Variable:
The expected value (or the mean) of a discrete random variable X is denoted by E(X) (or $\mu$) and is given by
$E(X) = \mu = \sum_{k} k\,P(X = k)$,
where the sum runs over all possible values of the random variable. The variance of X is given by
$\mathrm{Var}(X) = \sigma^2 = \sum_{k} (k - \mu)^2 P(X = k) = E(X^2) - \mu^2$, where $E(X^2) = \sum_{k} k^2 P(X = k)$.
Example: Find $\mu$ and $\sigma^2$ for the random variable given in the above example.
[Table: k, P(X = k), k P(X = k) and k^2 P(X = k), with totals]
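The expected-value and variance formulas can be coded directly over a table of values and probabilities. The distribution below is hypothetical: it is chosen so that P(X is odd) = 0.7 and P(X is even | X > 0) = 0.1/0.8 = 0.125, as in the example, but the individual values of P(X = 1) and P(X = 3) are made up. The function names are also hypothetical.

```python
def expected_value(dist):
    """E(X) = sum of k * P(X = k) over all possible values k."""
    return sum(k * p for k, p in dist.items())

def variance(dist):
    """Var(X) = E(X^2) - mu^2, with E(X^2) = sum of k^2 * P(X = k)."""
    mu = expected_value(dist)
    return sum(k * k * p for k, p in dist.items()) - mu ** 2

# Hypothetical distribution of X (values k mapped to P(X = k)).
dist = {0: 0.2, 1: 0.4, 2: 0.1, 3: 0.3}
assert abs(sum(dist.values()) - 1) < 1e-12   # probabilities must total 1

odd = dist[1] + dist[3]                   # P(X is odd)
even_given_pos = dist[2] / (1 - dist[0])  # P(X is even | X > 0)
print(expected_value(dist), variance(dist))
```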

Lecture #9
The Binomial Experiment and Distribution:
Before we introduce binomial (or Bernoulli) experiments, we introduce notation for some relevant mathematical quantities.
1. The Factorial of a Nonnegative Integer: if n is a nonnegative integer, then n! (read "n factorial") is defined by
n! = 1 if n = 0, and n! = n(n - 1)(n - 2)...(1) if n > 0.
Remark: for any n ≥ 1, n! = n(n - 1)!.
Example: 0! = 1, 1! = 1, 2! = 2 × 1 = 2, 3! = 3 × 2 × 1 = 6, 4! = 4 × 3! = 24, ...
2. Combinations: if n is a positive integer and k is an integer such that 0 ≤ k ≤ n, then the combination $\binom{n}{k}$ is defined by
$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$.
Example:
$\binom{10}{10} = \frac{10!}{10!\,0!} = 1$, $\binom{10}{0} = \frac{10!}{0!\,10!} = 1$, $\binom{10}{1} = \frac{10!}{1!\,9!} = \frac{10 \times 9!}{9!} = 10$, $\binom{10}{4} = \frac{10!}{4!\,6!} = \frac{10 \times 9 \times 8 \times 7 \times 6!}{4 \times 3 \times 2 \times 1 \times 6!} = 210$.
Fact: The number of ways of selecting k objects from n objects is given by $\binom{n}{k}$.
Example: How many teams of 6 players can we choose out of a group of 8 people?
Answer: $\binom{8}{6} = \frac{8!}{6!\,2!} = \frac{8 \times 7 \times 6!}{6! \times 2} = 28$ teams.
Example: In how many ways can we choose 3 balls from an urn that contains 5 balls?
Answer: $\binom{5}{3} = \frac{5!}{3!\,2!} = 10$ ways.
Example: How many events of size 4 are there if the size of the sample space is 6?
Answer: $\binom{6}{4} = \frac{6!}{4!\,2!} = \frac{6 \times 5 \times 4!}{4! \times 2} = 15$ events.
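The factorial and combination definitions can be checked with Python's standard `math` module. The helper `n_choose_k` is a hypothetical name that follows the definition literally; `math.comb` (available since Python 3.8) computes the same quantity directly.

```python
import math

def n_choose_k(n, k):
    """n! / (k! (n - k)!), computed from factorials, for 0 <= k <= n."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

print([math.factorial(n) for n in range(5)])
print(n_choose_k(10, 4), n_choose_k(8, 6), n_choose_k(5, 3), n_choose_k(6, 4))
assert n_choose_k(10, 4) == math.comb(10, 4)
```

Integer division `//` is safe here because k!(n - k)! always divides n! exactly.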

The binomial (or Bernoulli) experiment: A binomial (or Bernoulli) experiment is a random experiment that has the following properties:
1) each trial has exactly one of two possible outcomes, one referred to as success and the other as failure;
2) the probability of success is the same in every trial of the experiment, usually denoted by p;
3) all trials of the experiment are independent.
Examples: 1. Tossing a coin: the outcome is either a head or a tail. 2. Checking whether a newborn is a boy or a girl. 3. Checking whether a person is diabetic or not.

The Binomial Random Variable: The binomial random variable X is the number of successes when a binomial experiment, with probability p of success in each trial, is performed n times. We denote it by X ~ B(n, p). The possible values of X are 0, 1, 2, ..., n.
Examples:
1. Select a random sample of 10 people. Let X be the number of diabetics within this sample. Then X ~ B(10, p), where p is the proportion of diabetics in the population from which the sample is selected. The possible values of X are 0, 1, 2, ..., 10.
2. Toss a fair coin 20 times. Let X be the number of times a head comes out. Then X ~ B(20, 0.5). The possible values of X are 0, 1, 2, ..., 20.

Fact: If X ~ B(n, p), then
1) $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$ for each k = 0, 1, ..., n;
2) $\mu = E(X) = np$;
3) $\sigma^2 = \operatorname{Var}(X) = np(1-p)$.

Example: Let X ~ B(5, 0.3). Find: 1) … 2) … 3) …
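The three facts can be illustrated with a minimal Python sketch (an addition, not from the handout) for X ~ B(5, 0.3): it builds the table of P(X = k) from the formula in 1), then checks that the mean and variance computed from that table agree with np and np(1 − p). The function names are illustrative.

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p): C(n, k) p^k (1 - p)^(n - k)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def binom_mean_var(n, p):
    """Mean np and variance np(1 - p) of X ~ B(n, p)."""
    return n * p, n * p * (1 - p)

# X ~ B(5, 0.3), the distribution used in the example above.
n, p = 5, 0.3
table = {k: binom_pmf(k, n, p) for k in range(n + 1)}
mu_formula, var_formula = binom_mean_var(n, p)

# The same mean and variance computed directly from the table of P(X = k).
mu_table = sum(k * q for k, q in table.items())
var_table = sum(k * k * q for k, q in table.items()) - mu_table ** 2
print(table, mu_formula, var_formula, mu_table, var_table)
```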

Example: Let X ~ B(2, 0.5). Exhibit the distribution of X as a table.
k:          0      1      2      Total
P(X = k):   0.25   0.5    0.25   1

Example: Suppose that the probability that a patient suffering from migraine headache pain will obtain relief with a particular drug is 0.9. Three randomly selected sufferers from migraine headache are given this drug. Find the probability that the number of sufferers in the selected sample obtaining relief will be: 1) exactly zero 2) at least one 3) two or three 4) at most two.
Let X be the number of sufferers in the selected sample obtaining relief. Then X ~ B(3, 0.9).
1) $P(X = 0) = \binom{3}{0}(0.9)^0(0.1)^3 = 0.001$
2) $P(X \ge 1) = 1 - P(X = 0) = 1 - 0.001 = 0.999$
3) $P(X = 2) + P(X = 3) = \binom{3}{2}(0.9)^2(0.1) + (0.9)^3 = 0.243 + 0.729 = 0.972$
4) $P(X \le 2) = 1 - P(X = 3) = 1 - 0.729 = 0.271$
Note: The binomial distribution is completely determined by n and p. They are called the parameters of the binomial distribution.

Binomial Tables: When n is large, calculating binomial probabilities with the formula for P(X = k) can be tedious. We may bypass these calculations by using a binomial table, which lets us read binomial probabilities for k = 0, 1, 2, ..., n.
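The answers above can be reproduced with a short Python sketch (not part of the handout) that evaluates the binomial formula directly; it also rebuilds the B(2, 0.5) table from the first example.

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# First example: X ~ B(2, 0.5); P(X = k) for k = 0, 1, 2.
table = [binom_pmf(k, 2, 0.5) for k in range(3)]

# Migraine example: X ~ B(3, 0.9).
n, p = 3, 0.9
exactly_zero = binom_pmf(0, n, p)                        # 1) P(X = 0)
at_least_one = 1 - exactly_zero                          # 2) P(X >= 1)
two_or_three = binom_pmf(2, n, p) + binom_pmf(3, n, p)   # 3) P(X = 2) + P(X = 3)
at_most_two = 1 - binom_pmf(3, n, p)                     # 4) P(X <= 2)
print(table)   # → [0.25, 0.5, 0.25]
print(round(exactly_zero, 3), round(at_least_one, 3),
      round(two_or_three, 3), round(at_most_two, 3))   # → 0.001 0.999 0.972 0.271
```

Working with complements, as in 2) and 4), needs one term instead of several, which is the same shortcut used in the hand solution.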

The following is a part of the binomial table for n = 10.
[Table: binomial probabilities for n = 10 and several values of p]
Example: Let X ~ B(10, 0.3). Use the above table to find nine probabilities: items 1) to 5) compare X with the value 4, and items 6) to 9) compare X with the value 6. The rest are left as an exercise.
Reading Assignment: Chapter 4 (4.1, 4.2, 4.3) in W.W. Daniel, 7th edition.
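In place of the printed table, a small Python sketch (an illustration, not the handout's table) can generate both P(X = k) and the cumulative P(X ≤ k) for X ~ B(10, 0.3). Every event that compares X with a number, such as P(X > 4) = 1 − P(X ≤ 4), then follows from these two columns. The helper `binom_table` is hypothetical.

```python
import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def binom_table(n, p):
    """Rows (k, P(X = k), P(X <= k)) for X ~ B(n, p)."""
    rows, cumulative = [], 0.0
    for k in range(n + 1):
        prob = binom_pmf(k, n, p)
        cumulative += prob
        rows.append((k, prob, cumulative))
    return rows

rows = binom_table(10, 0.3)
for k, prob, cum in rows:
    print(f"{k:2d}  P(X = k) = {prob:.4f}  P(X <= k) = {cum:.4f}")

# Events comparing X with a number follow from the two columns, e.g.
p_eq_4 = rows[4][1]      # P(X = 4)
p_le_4 = rows[4][2]      # P(X <= 4)
p_gt_4 = 1 - p_le_4      # P(X > 4)
p_lt_6 = rows[5][2]      # P(X < 6) = P(X <= 5)
```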


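The Poisson Random Variable: The Poisson random variable X is the number of occurrences of a rare event in an interval of time or a unit of space. If λ is the average (or expected) number of occurrences of this event in the time (or space) unit, then we write X ~ Poisson(λ). The possible values of X are 0, 1, 2, ...
Fact: If X ~ Poisson(λ), then
1) $P(X = k) = \frac{e^{-\lambda}\lambda^k}{k!}$ for each k = 0, 1, 2, ..., where e ≈ 2.71;
2) $\mu = E(X) = \lambda$;
3) $\sigma^2 = \operatorname{Var}(X) = \lambda$.
Example: Let X ~ Poisson(3). Find: 1) P(X > 0) 2) E(X²).
1) $P(X > 0) = 1 - P(X \le 0) = 1 - P(X = 0) = 1 - \frac{e^{-3}\,3^0}{0!} = 1 - e^{-3} \approx 0.9502$
2) Since $\operatorname{Var}(X) = E(X^2) - (E(X))^2$, we get $E(X^2) = \operatorname{Var}(X) + (E(X))^2 = 3 + 3^2 = 12$.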
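A minimal Python sketch (not from the handout) of the Poisson formula for X ~ Poisson(3): it computes P(X > 0) through the complement and checks E(X²) = λ + λ² numerically by summing k² P(X = k). The sum is cut off at k = 99, an assumption that is harmless here because the neglected tail is vanishingly small.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) = e^(-lam) lam^k / k! for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = 3.0                                   # X ~ Poisson(3), as in the example
p_positive = 1 - poisson_pmf(0, lam)        # P(X > 0) = 1 - P(X = 0)

# E(X^2) = Var(X) + (E(X))^2 = lam + lam^2; check it by direct summation.
ex2_sum = sum(k * k * poisson_pmf(k, lam) for k in range(100))
print(round(p_positive, 4), round(ex2_sum, 6), lam + lam ** 2)
```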

Example: The number of cases admitted to the CCU in a certain hospital follows a Poisson distribution with an average of 3 cases per day. Find the probability of admitting 5 cases to the CCU in this hospital in a random week.
Let X be the number of cases admitted to the CCU in this hospital in a week. Then X ~ Poisson(3 × 7) = Poisson(21). Thus, $P(X = 5) = \frac{e^{-21}\,21^5}{5!}$.
Note: The Poisson distribution is completely determined by λ. It is called the parameter of the Poisson distribution.
Poisson Tables: Poisson tables enable us to read Poisson probabilities for k = 0, 1, 2, ... when X ~ Poisson(λ), for several values of λ. The following is a part of a Poisson table.
[Table: part of a Poisson table]
Exercise: Let X ~ Poisson(1.5). Use the above table to find nine probabilities, among them P(X = 3) and P(X < 3).
Reading Assignment: Chapter 4 (4.4) in W.W. Daniel.
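The weekly rate in the CCU example comes from scaling the daily average, λ = 3 × 7 = 21. The Python sketch below (an illustration, not part of the handout) evaluates that probability and also produces a few table-style values for the exercise distribution Poisson(1.5). The helper `poisson_cdf` is hypothetical and returns the cumulative P(X ≤ k).

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(poisson_pmf(j, lam) for j in range(k + 1))

# CCU example: 3 cases per day, so lambda = 3 * 7 = 21 cases per week.
lam_week = 3 * 7
p_five = poisson_pmf(5, lam_week)

# Exercise distribution X ~ Poisson(1.5).
p_eq_3 = poisson_pmf(3, 1.5)    # P(X = 3)
p_lt_3 = poisson_cdf(2, 1.5)    # P(X < 3) = P(X <= 2)
print(f"{p_five:.3e} {p_eq_3:.4f} {p_lt_3:.4f}")
```

Note that P(X < 3) is read as P(X ≤ 2), because X takes only whole-number values.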

The Normal Distribution: The normal distribution is probably the most important and widely used continuous distribution. A normally distributed random variable is called a normal random variable.
Properties of the Normal Distribution:
1. It is bell shaped and symmetric about its mean.
2. Its mean equals its median, which equals its mode.
3. It is a continuous distribution.
4. It is completely determined by its mean μ and its variance σ². A normal random variable X with mean μ and variance σ² is written X ~ N(μ, σ²).
5. The total area under the curve equals 1. Thus, the area on each side of the mean is 0.5.
6. The probability that a normal random variable will have a value between any two points equals the area under the curve between those points.
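Properties 1, 5 and 6 can be checked numerically with Python's standard-library `statistics.NormalDist` (Python 3.8+). This sketch is not part of the handout; the values μ = 25 and σ = 5 are illustrative.

```python
from statistics import NormalDist

mu, sigma = 25.0, 5.0          # illustrative mean and standard deviation
X = NormalDist(mu, sigma)

# Property 5: half of the total area lies on each side of the mean.
left_of_mean = X.cdf(mu)

# Property 1 (symmetry): tail areas at equal distances from the mean agree.
below = X.cdf(mu - 3)
above = 1 - X.cdf(mu + 3)

# Property 6: P(a < X < b) is the area between a and b, i.e. F(b) - F(a).
a, b = 20.0, 30.0
area_between = X.cdf(b) - X.cdf(a)
print(left_of_mean, below, above, round(area_between, 4))
```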

The curve on the right is skewed to the right: its mode < its median < its mean. The one on the left is skewed to the left: its mode > its median > its mean.
To find the probability that a normal random variable X will have a value smaller than a given number, we transform X into the standard normal random variable Z, which has mean 0 and variance 1. This transformation is done using the formula $Z = \frac{X - \mu}{\sigma}$. A standard Z table can then be used to find probabilities for any normal curve problem that has been converted to Z scores. The following steps are helpful when working with normal curve problems:
1. Graph the normal distribution and shade the area related to the probability you want to find.
2. Convert the boundaries of the shaded area from X values to standard normal Z values using the Z formula above.
3. Use the standard Z table to find the probabilities or areas related to the Z values in step 2.
Example: The weights of 1000 children are normally distributed with mean 25 kg and standard deviation 5 kg.
1) Find the proportion of children that have weights between 22 kg and 28 kg.
2) About how many children have weights smaller than 30 kg?
3) If a child is randomly selected, find the probability that her/his weight is smaller than 28 kg.
4) Find the third quartile of the weights of these children.
5) Find a positive number C such that 68% of the children have weights between 25 − C and 25 + C.
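A minimal Python sketch of steps 2 and 3 for parts 1), 2), 4) and 5) of this example. It is not part of the handout: `statistics.NormalDist()` (standard normal, Python 3.8+) stands in for the Z table, and its `inv_cdf` performs the reverse table lookup for the quantile questions. Part 3) is left as the exercise it is.

```python
from statistics import NormalDist

Z = NormalDist()        # standard normal: mean 0, sd 1 (plays the role of the Z table)
mu, sigma = 25, 5       # children's weights, X ~ N(25, 5^2)

def z_score(x):
    """Step 2: convert an X value to a Z value."""
    return (x - mu) / sigma

# 1) P(22 < X < 28) = P(-0.6 < Z < 0.6)
p_between = Z.cdf(z_score(28)) - Z.cdf(z_score(22))
# 2) P(X < 30) = P(Z < 1), and the expected count out of 1000 children
p_below_30 = Z.cdf(z_score(30))
count_below_30 = round(1000 * p_below_30)
# 4) Third quartile: solve P(Z < (x - 25)/5) = 0.75 for x
q3 = mu + sigma * Z.inv_cdf(0.75)
# 5) P(25 - C < X < 25 + C) = 0.68 means P(Z < C/5) = 0.84
c = sigma * Z.inv_cdf(0.84)
print(round(p_between, 4), count_below_30, round(q3, 2), round(c, 2))
```

The exact quantiles (about 28.37 kg and C ≈ 4.97) differ slightly from table-based answers, because a printed table rounds z to two decimals (0.67 and 1).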

Let X represent the children's weights. Then X ~ N(25, 5²).
1) $P(22 < X < 28) = P\left(\frac{22 - 25}{5} < Z < \frac{28 - 25}{5}\right) = P(-0.6 < Z < 0.6) = P(Z < 0.6) - P(Z < -0.6) = 0.7257 - 0.2743 = 0.4514$
2) $P(X < 30) = P\left(Z < \frac{30 - 25}{5}\right) = P(Z < 1) = 0.8413$. Thus, about 1000 × 0.8413 ≈ 841 children have weights less than 30 kg.
3) Find P(X < 28). (Exercise)
4) The third quartile is nothing but the value x such that P(X < x) = 0.75, that is, $P\left(Z < \frac{x - 25}{5}\right) = 0.75$. From the standard normal table, $\frac{x - 25}{5} = 0.67$. Hence, x = 25 + 5(0.67) = 28.35 kg.
5) $P(25 - C < X < 25 + C) = 0.68$ means $P\left(-\frac{C}{5} < Z < \frac{C}{5}\right) = 0.68$. By symmetry, $P\left(Z < \frac{C}{5}\right) = \frac{1 + 0.68}{2} = 0.84$. From the standard normal table, $\frac{C}{5} = 1$, so C = 5.
Reading Assignment: Chapter 4 (4.6, 4.7) in W.W. Daniel.

Chapter 5 Some Important Sampling Distributions
Introduction: A statistical measure for a sample is called a statistic, and a statistical measure for a population is called a parameter. Examples of statistics are x̄, s and p̂; examples of parameters are μ, σ and p. A statistic is a random variable, but a parameter is not. Sample statistics like x̄ and s are used to estimate population parameters like μ and σ, respectively. There is usually some difference (or error) between a statistic and the parameter it estimates, and different samples from the same population may have different amounts of this sampling error. Studying the sampling distributions of sample statistics helps us understand statistical inference and allows us to answer probability questions about sample statistics.
Sampling Distributions: The sampling distribution of a statistic is the distribution of the values taken by that statistic in all possible samples of the same size drawn from the same population.

Note: The number of all possible samples of size n drawn without replacement from a population of size N equals $\binom{N}{n} = \frac{N!}{n!\,(N-n)!}$. If we allow replacement, then the number of all possible samples is $N^n$.
Example: The following table gives all possible samples of size 2 drawn with replacement from a population that comprises the weights (in pounds) of 5 children, together with the mean of each sample.
Population data: 65, 54, 67, 65, 88
(65,65) 65     (54,65) 59.5   (67,65) 66     (65,65) 65     (88,65) 76.5
(65,54) 59.5   (54,54) 54     (67,54) 60.5   (65,54) 59.5   (88,54) 71
(65,67) 66     (54,67) 60.5   (67,67) 67     (65,67) 66     (88,67) 77.5
(65,65) 65     (54,65) 59.5   (67,65) 66     (65,65) 65     (88,65) 76.5
(65,88) 76.5   (54,88) 71     (67,88) 77.5   (65,88) 76.5   (88,88) 88
The following chart represents the above samples' means.
[Figure: chart of the 25 sample means]
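The enumeration in this table can be reproduced with a Python sketch (an addition, not from the handout) using `itertools.product`. Listing all N^n = 25 samples lets us verify two facts about sampling with replacement: the mean of the sample means equals the population mean μ = 67.8, and their variance equals σ²/n.

```python
from itertools import product, combinations
from statistics import mean, pvariance

population = [65, 54, 67, 65, 88]   # weights (pounds) of the 5 children
n = 2

# All N^n = 25 ordered samples of size 2 drawn with replacement.
samples = list(product(population, repeat=n))
sample_means = [mean(s) for s in samples]

mu = mean(population)               # population mean
sigma2 = pvariance(population)      # population variance (divides by N)
print(len(samples), mu, mean(sample_means), sigma2 / n, pvariance(sample_means))

# Without replacement there are C(5, 2) = 10 samples (unordered pairs of children).
print(len(list(combinations(range(len(population)), n))))
```

Positions (not weights) are paired in the last line, because two children share the weight 65 and both must be counted.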

Sampling Distribution of the Mean:
Theorem: If sampling is performed with replacement from a normally distributed population with mean μ and standard deviation σ, then the sampling distribution of x̄ is also normal, with mean μ and standard deviation $\frac{\sigma}{\sqrt{n}}$, where n is the sample size. If sampling is performed without replacement, then the sampling distribution is also normal, with mean μ and standard deviation $\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}}$, where N is the size of the population. The factor $\sqrt{\frac{N - n}{N - 1}}$ is called the correction factor. It is negligible if n ≤ 0.05N or if N is very large (infinite or practically infinite).
The Central Limit Theorem (CLT): When the sample size is large (n ≥ 30), the above theorem remains approximately valid even if the population is not normally distributed. In fact, the sampling distribution of the mean is almost normal when n is large: the larger the sample size, the closer the sampling distribution of the mean is to a normal distribution.
Example: Suppose that the ages of Jordan University students follow a normal distribution with mean 20.5 years and standard deviation 1.4 years. If we repeatedly collect samples of size n = 49:
a) What is the sampling distribution of x̄?
Answer: $\bar{x} \sim N\left(20.5, \left(\frac{1.4}{\sqrt{49}}\right)^2\right) = N(20.5, 0.04) = N(20.5, (0.2)^2)$
b) What is the probability that the mean age of a randomly selected sample of 49 Jordan University students is smaller than 21 years?
Answer: $P(\bar{x} < 21) = P\left(Z < \frac{21 - 20.5}{0.2}\right) = P(Z < 2.5) = 0.9938$
c) What is the probability that an individual student is younger than 21 years?
Answer: Here X ~ N(20.5, (1.4)²), thus $P(X < 21) = P\left(Z < \frac{21 - 20.5}{1.4}\right) = P(Z < 0.36) = 0.6406$.
d) What is the distribution of x̄ if the ages of Jordan University students do not follow a normal distribution?
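The sketch below (not part of the handout) recomputes parts a) to c) with `statistics.NormalDist`. It also includes the correction factor for sampling without replacement; the optional `N` argument of the illustrative helper `se_mean` is an assumption of this sketch. Passing a very large N shows that the factor becomes negligible.

```python
import math
from statistics import NormalDist

def se_mean(sigma, n, N=None):
    """Standard deviation of x-bar; applies the correction factor sqrt((N - n)/(N - 1))
    when a finite population size N is given (sampling without replacement)."""
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se

mu, sigma, n = 20.5, 1.4, 49
xbar = NormalDist(mu, se_mean(sigma, n))         # a) N(20.5, 0.2^2)
p_mean_below_21 = xbar.cdf(21)                   # b) P(x-bar < 21)
p_one_below_21 = NormalDist(mu, sigma).cdf(21)   # c) one student: P(X < 21)
print(round(xbar.stdev, 3), round(p_mean_below_21, 4), round(p_one_below_21, 4))
```

Part c) gives about 0.6395 here instead of 0.6406, because the table solution rounds z = 0.357 to 0.36.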

Answer: The distribution of x̄ will be approximately normal with mean 20.5 and standard deviation 0.2, since the sample size is greater than 30 (CLT).
Reading Assignment: Chapter 5 (5.1, 5.2, 5.3) in W.W. Daniel.

Distribution of the Difference Between Two Sample Means: Suppose that we want to know whether or not the mean serum cholesterol level is higher in a population of sedentary office workers than in a population of laborers. If we know that those means are different, then we may wish to know by how much they differ. One way is to take a random sample from each population and then use the sampling distribution of x̄1 − x̄2 to answer probability questions and draw statistical inferences.
Sampling Distribution of x̄1 − x̄2:
Theorem: If we draw two independent random samples of sizes n1 and n2 from two distinct normally distributed populations having means μ1 and μ2 and standard deviations σ1 and σ2, respectively, then x̄1 − x̄2 is normally distributed with mean $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$ and standard deviation $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$.
Note: The above theorem is also (approximately) valid if the populations are not both normally distributed, provided that both n1 and n2 are greater than or equal to 30.
Example: One group on a diet lost an average of 7.2 kg with a standard deviation of 3.7 kg; another group doing sports exercises lost an average of 4.0 kg with a standard deviation of 3.9 kg. Suppose we collect samples of sizes n1 = 42 from the diet group and n2 = 47 from the exercise group:
(a) What is the sampling distribution of x̄1 − x̄2?
Answer: The sampling distribution of x̄1 − x̄2 is approximately normal (since n1 ≥ 30 and n2 ≥ 30) with mean 7.2 − 4.0 = 3.2 kg and standard deviation $\sqrt{\frac{(3.7)^2}{42} + \frac{(3.9)^2}{47}} \approx 0.806$ kg.
(b) What is the probability that the difference between the mean weight losses of the two groups is larger than 4.0 kg?
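A short Python sketch (an addition, not from the handout) applies the theorem to the weight-loss example. It builds the approximate normal distribution of x̄1 − x̄2 and answers (a) to (d) numerically. The helper name `diff_of_means` is illustrative.

```python
import math
from statistics import NormalDist

def diff_of_means(mu1, sd1, n1, mu2, sd2, n2):
    """Approximate sampling distribution of x1-bar - x2-bar for independent samples."""
    return NormalDist(mu1 - mu2, math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2))

D = diff_of_means(7.2, 3.7, 42, 4.0, 3.9, 47)    # (a)
p_diff_gt_4 = 1 - D.cdf(4.0)                     # (b) P(x1-bar - x2-bar > 4.0)
xbar2 = NormalDist(4.0, 3.9 / math.sqrt(47))     # (c) exercise group alone
p_xbar2_gt_4 = 1 - xbar2.cdf(4.0)
q1, q3 = D.inv_cdf(0.25), D.inv_cdf(0.75)        # (d) quartiles of x1-bar - x2-bar
print(round(D.mean, 2), round(D.stdev, 3), round(p_diff_gt_4, 4),
      p_xbar2_gt_4, round(q3 - q1, 3))
```

Using unrounded z values, (b) comes out near 0.1605 rather than the table-based 0.1611, and the IQR is about 1.087 kg.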

Answer: $P(\bar{x}_1 - \bar{x}_2 > 4.0) = P\left(Z > \frac{4.0 - 3.2}{0.806}\right) = P(Z > 0.99) = 1 - 0.8389 = 0.1611$
(c) What is the probability that the mean weight loss of the exercise group is larger than 4.0 kg?
Answer: $\bar{x}_2 \sim N\left(4.0, \left(\frac{3.9}{\sqrt{47}}\right)^2\right) = N(4.0, (0.569)^2)$, thus $P(\bar{x}_2 > 4.0) = P\left(Z > \frac{4.0 - 4.0}{0.569}\right) = P(Z > 0) = 0.5$.
(d) Find the IQR (interquartile range) of x̄1 − x̄2.
Answer: The third quartile Q3 satisfies $P(\bar{x}_1 - \bar{x}_2 < Q_3) = 0.75$, so $\frac{Q_3 - 3.2}{0.806} = 0.675$ and Q3 = 3.2 + 0.544 = 3.744. The first quartile Q1 satisfies $P(\bar{x}_1 - \bar{x}_2 < Q_1) = 0.25$, so $\frac{Q_1 - 3.2}{0.806} = -0.675$ and Q1 = 3.2 − 0.544 = 2.656. Thus, IQR = Q3 − Q1 = 3.744 − 2.656 = 1.088 kg.

Distribution of the Sample Proportion: In this section we study the distribution of the sample proportion p̂. This distribution helps us answer probability questions about proportions when it is tedious, difficult or practically impossible to use binomial tables. For example, suppose that in a certain population 8 percent are color blind. If we randomly select 1500 individuals from this population, what is the probability that the proportion of color blinds in that sample is at least …? To answer such a question using binomial tables, we need to find the probability that the variable x is greater than or equal to … given that x is binomially distributed with p = 0.08 and n = 1500. How would we answer that question if we don't have binomial tables for n = 1500 (or even for any n > 25)?
Distribution of Sample Proportion; An Empirical Rule: When the sample size is "large" (we will see shortly what large means), the sample proportion is approximately normally distributed with mean equal to the true population proportion p and standard deviation equal to $\sqrt{\frac{p(1 - p)}{n}}$. The sample is considered "large enough" if np > 5 and n(1 − p) > 5.
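To see how well the empirical rule works, here is a Python sketch (not from the handout) that compares the normal approximation with the exact binomial tail for n = 1500 and p = 0.08. The handout's cut-off value is missing, so the threshold p̂ ≥ 135/1500 = 0.09 below is a hypothetical choice. The binomial terms are computed on the log scale with `math.lgamma`, because C(1500, k) is too large to convert to a float.

```python
import math
from statistics import NormalDist

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p), computed via logarithms to avoid float overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

n, p = 1500, 0.08
k_min = 135            # hypothetical cut-off: p-hat >= 135/1500 = 0.09

# Rule of thumb from the text: both np and n(1 - p) must exceed 5.
assert n * p > 5 and n * (1 - p) > 5

# Normal approximation: p-hat ~ N(p, p(1 - p)/n).
p_hat = NormalDist(p, math.sqrt(p * (1 - p) / n))
approx = 1 - p_hat.cdf(k_min / n)

# Exact binomial tail P(X >= 135) for X ~ B(1500, 0.08), for comparison.
exact = sum(binom_pmf(k, n, p) for k in range(k_min, n + 1))
print(round(approx, 4), round(exact, 4))
```

The two answers are close but not identical. Part of the gap comes from treating a discrete count as continuous.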

Example: Suppose that in a certain population 8 percent are color blind, and we randomly select 1500 individuals from this population. Find:
a) the probability that the proportion of color blinds in that sample is at least …
b) the 95th percentile of p̂.
Here p = 0.08 and n = 1500. Since np = 1500(0.08) = 120 > 5 and n(1 − p) = 1500(0.92) = 1380 > 5, the proportion of color blinds is approximately normally distributed with mean p = 0.08 and standard deviation $\sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{(0.08)(0.92)}{1500}} \approx 0.007$.
b) The 95th percentile x satisfies $P(\hat{p} < x) = 0.95$, that is, $P\left(Z < \frac{x - 0.08}{0.007}\right) = 0.95$. Thus $\frac{x - 0.08}{0.007} = 1.65$ and x = 0.08 + 1.65(0.007) ≈ 0.0916.
Distribution of the Difference Between Two Sample Proportions: HANDOUT IS NOT AVAILABLE. READ DIRECTLY FROM YOUR MAIN REFERENCE.
Reading Assignment: Chapter 5 (5.1, 5.2, 5.3, 5.4, 5.5, 5.6) in W.W. Daniel.
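Part b) can be checked with a few lines of Python (an illustration, not the handout's method): `statistics.NormalDist.inv_cdf` inverts the approximate normal distribution of p̂ directly, so no table is needed.

```python
import math
from statistics import NormalDist

n, p = 1500, 0.08
sd = math.sqrt(p * (1 - p) / n)    # standard deviation of p-hat, about 0.007
p_hat = NormalDist(p, sd)
p95 = p_hat.inv_cdf(0.95)          # 95th percentile of p-hat
print(round(sd, 4), round(p95, 4))
```

The result is about 0.0915 rather than 0.0916, because `inv_cdf` uses z ≈ 1.645 where the table solution uses 1.65.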

What is Statistics? Statistics is the science of understanding data and of making decisions in the face of variability and uncertainty.

What is Statistics? Statistics is the science of understanding data and of making decisions in the face of variability and uncertainty. What is Statistics? Statistics is the science of understanding data and of making decisions in the face of variability and uncertainty. Statistics is a field of study concerned with the data collection,

More information

STAT 200 Chapter 1 Looking at Data - Distributions

STAT 200 Chapter 1 Looking at Data - Distributions STAT 200 Chapter 1 Looking at Data - Distributions What is Statistics? Statistics is a science that involves the design of studies, data collection, summarizing and analyzing the data, interpreting the

More information

Probability and Probability Distributions. Dr. Mohammed Alahmed

Probability and Probability Distributions. Dr. Mohammed Alahmed Probability and Probability Distributions 1 Probability and Probability Distributions Usually we want to do more with data than just describing them! We might want to test certain specific inferences about

More information

1-1. Chapter 1. Sampling and Descriptive Statistics by The McGraw-Hill Companies, Inc. All rights reserved.

1-1. Chapter 1. Sampling and Descriptive Statistics by The McGraw-Hill Companies, Inc. All rights reserved. 1-1 Chapter 1 Sampling and Descriptive Statistics 1-2 Why Statistics? Deal with uncertainty in repeated scientific measurements Draw conclusions from data Design valid experiments and draw reliable conclusions

More information

Sets and Set notation. Algebra 2 Unit 8 Notes

Sets and Set notation. Algebra 2 Unit 8 Notes Sets and Set notation Section 11-2 Probability Experimental Probability experimental probability of an event: Theoretical Probability number of time the event occurs P(event) = number of trials Sample

More information

A is one of the categories into which qualitative data can be classified.

A is one of the categories into which qualitative data can be classified. Chapter 2 Methods for Describing Sets of Data 2.1 Describing qualitative data Recall qualitative data: non-numerical or categorical data Basic definitions: A is one of the categories into which qualitative

More information

Binomial and Poisson Probability Distributions

Binomial and Poisson Probability Distributions Binomial and Poisson Probability Distributions Esra Akdeniz March 3, 2016 Bernoulli Random Variable Any random variable whose only possible values are 0 or 1 is called a Bernoulli random variable. What

More information

Chapter 2: Tools for Exploring Univariate Data

Chapter 2: Tools for Exploring Univariate Data Stats 11 (Fall 2004) Lecture Note Introduction to Statistical Methods for Business and Economics Instructor: Hongquan Xu Chapter 2: Tools for Exploring Univariate Data Section 2.1: Introduction What is

More information

Chapter 01 : What is Statistics?

Chapter 01 : What is Statistics? Chapter 01 : What is Statistics? Feras Awad Data: The information coming from observations, counts, measurements, and responses. Statistics: The science of collecting, organizing, analyzing, and interpreting

More information

CHAPTER 1. Introduction

CHAPTER 1. Introduction CHAPTER 1 Introduction Engineers and scientists are constantly exposed to collections of facts, or data. The discipline of statistics provides methods for organizing and summarizing data, and for drawing

More information

F78SC2 Notes 2 RJRC. If the interest rate is 5%, we substitute x = 0.05 in the formula. This gives

F78SC2 Notes 2 RJRC. If the interest rate is 5%, we substitute x = 0.05 in the formula. This gives F78SC2 Notes 2 RJRC Algebra It is useful to use letters to represent numbers. We can use the rules of arithmetic to manipulate the formula and just substitute in the numbers at the end. Example: 100 invested

More information

Exercises from Chapter 3, Section 1

Exercises from Chapter 3, Section 1 Exercises from Chapter 3, Section 1 1. Consider the following sample consisting of 20 numbers. (a) Find the mode of the data 21 23 24 24 25 26 29 30 32 34 39 41 41 41 42 43 48 51 53 53 (b) Find the median

More information

MIDTERM EXAMINATION (Spring 2011) STA301- Statistics and Probability

MIDTERM EXAMINATION (Spring 2011) STA301- Statistics and Probability STA301- Statistics and Probability Solved MCQS From Midterm Papers March 19,2012 MC100401285 Moaaz.pk@gmail.com Mc100401285@gmail.com PSMD01 MIDTERM EXAMINATION (Spring 2011) STA301- Statistics and Probability

More information

BIOL 51A - Biostatistics 1 1. Lecture 1: Intro to Biostatistics. Smoking: hazardous? FEV (l) Smoke

BIOL 51A - Biostatistics 1 1. Lecture 1: Intro to Biostatistics. Smoking: hazardous? FEV (l) Smoke BIOL 51A - Biostatistics 1 1 Lecture 1: Intro to Biostatistics Smoking: hazardous? FEV (l) 1 2 3 4 5 No Yes Smoke BIOL 51A - Biostatistics 1 2 Box Plot a.k.a box-and-whisker diagram or candlestick chart

More information

CIVL 7012/8012. Collection and Analysis of Information

CIVL 7012/8012. Collection and Analysis of Information CIVL 7012/8012 Collection and Analysis of Information Uncertainty in Engineering Statistics deals with the collection and analysis of data to solve real-world problems. Uncertainty is inherent in all real

More information

UNIVERSITY OF MASSACHUSETTS Department of Biostatistics and Epidemiology BioEpi 540W - Introduction to Biostatistics Fall 2004

UNIVERSITY OF MASSACHUSETTS Department of Biostatistics and Epidemiology BioEpi 540W - Introduction to Biostatistics Fall 2004 UNIVERSITY OF MASSACHUSETTS Department of Biostatistics and Epidemiology BioEpi 50W - Introduction to Biostatistics Fall 00 Exercises with Solutions Topic Summarizing Data Due: Monday September 7, 00 READINGS.

More information

Lecture Slides. Elementary Statistics Twelfth Edition. by Mario F. Triola. and the Triola Statistics Series. Section 3.1- #

Lecture Slides. Elementary Statistics Twelfth Edition. by Mario F. Triola. and the Triola Statistics Series. Section 3.1- # Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series by Mario F. Triola Chapter 3 Statistics for Describing, Exploring, and Comparing Data 3-1 Review and Preview 3-2 Measures

More information

Review for Exam #1. Chapter 1. The Nature of Data. Definitions. Population. Sample. Quantitative data. Qualitative (attribute) data

Review for Exam #1. Chapter 1. The Nature of Data. Definitions. Population. Sample. Quantitative data. Qualitative (attribute) data Review for Exam #1 1 Chapter 1 Population the complete collection of elements (scores, people, measurements, etc.) to be studied Sample a subcollection of elements drawn from a population 11 The Nature

More information

Chapter 3. Data Description

Chapter 3. Data Description Chapter 3. Data Description Graphical Methods Pie chart It is used to display the percentage of the total number of measurements falling into each of the categories of the variable by partition a circle.

More information

Chapter2 Description of samples and populations. 2.1 Introduction.

Chapter2 Description of samples and populations. 2.1 Introduction. Chapter2 Description of samples and populations. 2.1 Introduction. Statistics=science of analyzing data. Information collected (data) is gathered in terms of variables (characteristics of a subject that

More information

Introduction to Statistics

Introduction to Statistics Introduction to Statistics Data and Statistics Data consists of information coming from observations, counts, measurements, or responses. Statistics is the science of collecting, organizing, analyzing,

More information

A SHORT INTRODUCTION TO PROBABILITY

A SHORT INTRODUCTION TO PROBABILITY A Lecture for B.Sc. 2 nd Semester, Statistics (General) A SHORT INTRODUCTION TO PROBABILITY By Dr. Ajit Goswami Dept. of Statistics MDKG College, Dibrugarh 19-Apr-18 1 Terminology The possible outcomes

More information

Describing distributions with numbers

Describing distributions with numbers Describing distributions with numbers A large number or numerical methods are available for describing quantitative data sets. Most of these methods measure one of two data characteristics: The central

More information

Elementary Statistics

Elementary Statistics Elementary Statistics Q: What is data? Q: What does the data look like? Q: What conclusions can we draw from the data? Q: Where is the middle of the data? Q: Why is the spread of the data important? Q:

More information

Lecture Slides. Elementary Statistics Tenth Edition. by Mario F. Triola. and the Triola Statistics Series. Slide 1

Lecture Slides. Elementary Statistics Tenth Edition. by Mario F. Triola. and the Triola Statistics Series. Slide 1 Lecture Slides Elementary Statistics Tenth Edition and the Triola Statistics Series by Mario F. Triola Slide 1 Chapter 3 Statistics for Describing, Exploring, and Comparing Data 3-1 Overview 3-2 Measures

More information

The science of learning from data.

The science of learning from data. STATISTICS (PART 1) The science of learning from data. Numerical facts Collection of methods for planning experiments, obtaining data and organizing, analyzing, interpreting and drawing the conclusions

More information

Practice problems from chapters 2 and 3

Practice problems from chapters 2 and 3 Practice problems from chapters and 3 Question-1. For each of the following variables, indicate whether it is quantitative or qualitative and specify which of the four levels of measurement (nominal, ordinal,

More information

Descriptive Statistics

Descriptive Statistics Descriptive Statistics CHAPTER OUTLINE 6-1 Numerical Summaries of Data 6- Stem-and-Leaf Diagrams 6-3 Frequency Distributions and Histograms 6-4 Box Plots 6-5 Time Sequence Plots 6-6 Probability Plots Chapter

More information

Chapter 4a Probability Models

Chapter 4a Probability Models Chapter 4a Probability Models 4a.2 Probability models for a variable with a finite number of values 297 4a.1 Introduction Chapters 2 and 3 are concerned with data description (descriptive statistics) where

More information

MATH 10 INTRODUCTORY STATISTICS

MATH 10 INTRODUCTORY STATISTICS MATH 10 INTRODUCTORY STATISTICS Tommy Khoo Your friendly neighbourhood graduate student. Week 1 Chapter 1 Introduction What is Statistics? Why do you need to know Statistics? Technical lingo and concepts:

More information

1. Exploratory Data Analysis

1. Exploratory Data Analysis 1. Exploratory Data Analysis 1.1 Methods of Displaying Data A visual display aids understanding and can highlight features which may be worth exploring more formally. Displays should have impact and be

More information

Section 1.1. Data - Collections of observations (such as measurements, genders, survey responses, etc.)

Section 1.1. Data - Collections of observations (such as measurements, genders, survey responses, etc.) Section 1.1 Statistics - The science of planning studies and experiments, obtaining data, and then organizing, summarizing, presenting, analyzing, interpreting, and drawing conclusions based on the data.

More information

Statistics in medicine

Statistics in medicine Statistics in medicine Lecture 1- part 1: Describing variation, and graphical presentation Outline Sources of variation Types of variables Fatma Shebl, MD, MS, MPH, PhD Assistant Professor Chronic Disease

More information

Lecture Slides. Elementary Statistics Tenth Edition. by Mario F. Triola. and the Triola Statistics Series. Slide 1

Lecture Slides. Elementary Statistics Tenth Edition. by Mario F. Triola. and the Triola Statistics Series. Slide 1 Lecture Slides Elementary Statistics Tenth Edition and the Triola Statistics Series by Mario F. Triola Slide 1 4-1 Overview 4-2 Fundamentals 4-3 Addition Rule Chapter 4 Probability 4-4 Multiplication Rule:

More information

Dover- Sherborn High School Mathematics Curriculum Probability and Statistics

Dover- Sherborn High School Mathematics Curriculum Probability and Statistics Mathematics Curriculum A. DESCRIPTION This is a full year courses designed to introduce students to the basic elements of statistics and probability. Emphasis is placed on understanding terminology and

More information

Lecture Slides. Elementary Statistics Eleventh Edition. by Mario F. Triola. and the Triola Statistics Series 4.1-1

Lecture Slides. Elementary Statistics Eleventh Edition. by Mario F. Triola. and the Triola Statistics Series 4.1-1 Lecture Slides Elementary Statistics Eleventh Edition and the Triola Statistics Series by Mario F. Triola 4.1-1 4-1 Review and Preview Chapter 4 Probability 4-2 Basic Concepts of Probability 4-3 Addition

More information

Statistical Theory 1

Statistical Theory 1 Statistical Theory 1 Set Theory and Probability Paolo Bautista September 12, 2017 Set Theory We start by defining terms in Set Theory which will be used in the following sections. Definition 1 A set is

More information

Resistant Measure - A statistic that is not affected very much by extreme observations.

Resistant Measure - A statistic that is not affected very much by extreme observations. Chapter 1.3 Lecture Notes & Examples Section 1.3 Describing Quantitative Data with Numbers (pp. 50-74) 1.3.1 Measuring Center: The Mean Mean - The arithmetic average. To find the mean (pronounced x bar)

More information

Glossary for the Triola Statistics Series

Glossary for the Triola Statistics Series Glossary for the Triola Statistics Series Absolute deviation The measure of variation equal to the sum of the deviations of each value from the mean, divided by the number of values Acceptance sampling

More information

Lecture 1: Descriptive Statistics

Lecture 1: Descriptive Statistics Lecture 1: Descriptive Statistics MSU-STT-351-Sum 15 (P. Vellaisamy: MSU-STT-351-Sum 15) Probability & Statistics for Engineers 1 / 56 Contents 1 Introduction 2 Branches of Statistics Descriptive Statistics

More information

Unit 2. Describing Data: Numerical

Unit 2. Describing Data: Numerical Unit 2 Describing Data: Numerical Describing Data Numerically Describing Data Numerically Central Tendency Arithmetic Mean Median Mode Variation Range Interquartile Range Variance Standard Deviation Coefficient

More information

Histograms allow a visual interpretation

Histograms allow a visual interpretation Chapter 4: Displaying and Summarizing i Quantitative Data s allow a visual interpretation of quantitative (numerical) data by indicating the number of data points that lie within a range of values, called

More information

Probability. Introduction to Biostatistics

Probability. Introduction to Biostatistics Introduction to Biostatistics Probability Second Semester 2014/2015 Text Book: Basic Concepts and Methodology for the Health Sciences By Wayne W. Daniel, 10 th edition Dr. Sireen Alkhaldi, BDS, MPH, DrPH

More information

ADMS2320.com. We Make Stats Easy. Chapter 4. ADMS2320.com Tutorials Past Tests. Tutorial Length 1 Hour 45 Minutes

ADMS2320.com. We Make Stats Easy. Chapter 4. ADMS2320.com Tutorials Past Tests. Tutorial Length 1 Hour 45 Minutes We Make Stats Easy. Chapter 4 Tutorial Length 1 Hour 45 Minutes Tutorials Past Tests Chapter 4 Page 1 Chapter 4 Note The following topics will be covered in this chapter: Measures of central location Measures

More information

3.1 Measures of Central Tendency: Mode, Median and Mean. Average a single number that is used to describe the entire sample or population

3.1 Measures of Central Tendency: Mode, Median and Mean. Average a single number that is used to describe the entire sample or population . Measures of Central Tendency: Mode, Median and Mean Average a single number that is used to describe the entire sample or population. Mode a. Easiest to compute, but not too stable i. Changing just one

More information
