Sections 3.1-3.5, OPIM 303, Managerial Statistics, H Guy Williams, 2006

Sections 3.1-3.5. The three major properties that describe a set of data are central tendency, variation, and shape. OPIM 303 Lecture 3 Page 1

Central tendency asks: what is the typical value? Most sets of data show a distinct tendency to group or cluster around a central point, so for any particular set of data a single typical value can be used to describe the entire data set. Three measures of central tendency are the arithmetic mean, the median, and the mode. We will not be using the mode or the geometric mean. These tools apply to numerical data; they are not useful for categorical data, which is typically summarized as a percentage of the total.

Mode: the mode is the value in a set of data that appears most frequently. The mode is unaffected by extreme values (outliers). It is used only for descriptive purposes because it is more variable than other measures of central tendency. Some data sets do not contain a mode.

The property $\sum_{i=1}^{n} (X_i - \bar{X}) = 0$ is one of the important reasons that the arithmetic mean is the most common measure of central tendency.

Mean: sum the observed numerical values of the variable in the data set and divide by the total number of observations, n:

$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$

The calculation of the mean is based on all observations in the set of data; no other measure of central tendency has this feature. However, this is also a problem when the data contain an extreme value or values (outliers). When summarizing data that contain extreme values, report the median, or both the median and the mean.
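As a quick check of the mean formula and of the property that deviations from the mean sum to zero, here is a minimal Python sketch. The data values are hypothetical, chosen only for illustration; with general floating-point data the deviation sum is only approximately zero, so the check uses a small tolerance.

```python
def mean(values):
    """Arithmetic mean: the sum of the observations divided by n."""
    return sum(values) / len(values)

data = [3.0, 5.0, 7.0, 9.0, 21.0]  # hypothetical observations
x_bar = mean(data)

# Deviations above and below the mean cancel out exactly
deviation_sum = sum(x - x_bar for x in data)

print(x_bar)                      # 9.0
print(abs(deviation_sum) < 1e-9)  # True
```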

[Figure: an outlier shifts the mean.] The mean has the problem of being sensitive to outliers.

Median: the median is the middle value in an ordered array of data. When there is an odd number of data points, half of the remaining observations are larger than the median and half are smaller. When the total number of observations is even, the median is the point midway between the two middle data points. The median is not sensitive to outliers and is therefore a more robust measure than the mean when the data contain extreme values.
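The following Python sketch contrasts the mean and the median when one value becomes an outlier. It also shows the even-n rule (average of the two middle values). The data sets are hypothetical.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def mean(values):
    return sum(values) / len(values)

clean = [10, 12, 13, 15, 20]          # hypothetical data
with_outlier = [10, 12, 13, 15, 200]  # largest value replaced by an outlier

print(mean(clean), median(clean))                # 14.0 13
print(mean(with_outlier), median(with_outlier))  # 50.0 13
print(median([1, 2, 3, 4]))                      # 2.5
```

The outlier moves the mean from 14 to 50, while the median stays at 13.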

Of the measures of variation covered in this lecture, the interquartile range and the standard deviation are the most useful for our purposes.

[Figure: four data sets, #1 to #4.] Sets 1 and 2 are plotted on the same x-axis scale, and we can see that set 2 varies more. Sets 3 and 4 contain the same data but are shifted along the x axis. The shift does not affect the variance.

[Figure: two data sets with the same range; one has greater spread at one end.]

Range: the numerical difference between the largest value and the smallest value in a data set. The range is not a very good measure of variation, because it ignores how the data are distributed between the two extremes. It's simple, but not very useful.

Very High Risk Funds (5-Yr Return)
Putnam OTC Emerging Growth A: -6.1
Amer. Century GiftTrust Inv.: -2.8
PBHG Growth: -1.2
Invesco Growth Inv: -0.7
Consulting Group Small Cap Growth: 4.3
AXP Strategy Aggressive A: 5.5
Fidelity Aggressive Growth: 5.9
Janus Enterprise: 6.5
John Hancock Small Cap Growth A: 7.6
Berger Small Company Growth Inv: 8.3
MS Mid Cap Equity Tr. B: 9.6
Janus Venture: 9.8
Van Kampen Aggressive Growth A: 12.9
Rydex OTC Inv: 13.1
RS Emerging Growth A: 18.5

Mean = 6.08, Median = 6.5, Mode = #N/A (no value repeats)

[Figure: dot scale diagram of the returns on an axis from -20.0 to 30.0, marking the mean, median, 1st quartile, 3rd quartile, and +/- 1, 2, and 3 standard deviations.]
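To reproduce the summary figures for the fund returns listed above, a short Python sketch using the standard `statistics` module follows. The "no mode" check mirrors Excel's #N/A result: it simply tests whether any value occurs more than once.

```python
from collections import Counter
from statistics import mean, median

# 5-year returns of the very high risk funds listed above
returns = [-6.1, -2.8, -1.2, -0.7, 4.3, 5.5, 5.9, 6.5,
           7.6, 8.3, 9.6, 9.8, 12.9, 13.1, 18.5]

data_range = max(returns) - min(returns)  # largest minus smallest value
counts = Counter(returns)
has_mode = max(counts.values()) > 1       # no repeated value -> no mode

print(round(mean(returns), 2))  # 6.08
print(median(returns))          # 6.5
print(round(data_range, 1))     # 24.6
print(has_mode)                 # False
```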

Quartiles: Excel's quartile function also reports a Q0 (the smallest value) and a Q4 (the largest value). Quartiles are the most widely used measures of noncentral location and are used to describe properties of large sets of numerical data. The quartiles are descriptive measures that split the ordered data into four quarters.

First quartile, Q1: the value for which 25% of the observations are smaller and 75% are larger. With n the total number of observations,

$Q_1 = \text{ordered observation at position } \frac{n+1}{4}$

Third quartile, Q3: the value for which 75% of the observations are smaller and 25% are larger:

$Q_3 = \text{ordered observation at position } \frac{3(n+1)}{4}$

Rules for quartiles:
1. If the positioning point is an integer, the value at that positioning point is the quartile.
2. If the positioning point is halfway between two integers, the average of the two corresponding observations is selected.
3. If the positioning point is neither an integer nor halfway between two integers, a simple rule is to round to the nearest integer and select the numerical value of the corresponding observation.

Interquartile Range (aka midspread, middle fifty)

$\text{Interquartile range} = Q_3 - Q_1$

The interquartile range is a resistant measure: it is not influenced by outliers. It tells us how spread out the middle 50% of the data are. Because outliers fall outside the middle 50%, they are not considered, which makes the interquartile range very robust.
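The positioning rules for Q1 and Q3 given above can be sketched in Python as follows. The function names are hypothetical; spreadsheet quartile functions may use a different interpolation convention and can give slightly different values, so this implements only the lecture's rules. Applied to the 15 fund returns, both positioning points are integers (4 and 12).

```python
def value_at_position(ordered, position):
    """Value at a 1-based positioning point, following the three quartile rules."""
    if position == int(position):              # rule 1: integer position
        return ordered[int(position) - 1]
    if position * 2 == int(position * 2):      # rule 2: halfway between two integers
        lower = int(position)
        return (ordered[lower - 1] + ordered[lower]) / 2
    return ordered[round(position) - 1]        # rule 3: round to the nearest integer

def quartiles(values):
    ordered = sorted(values)
    n = len(ordered)
    q1 = value_at_position(ordered, (n + 1) / 4)
    q3 = value_at_position(ordered, 3 * (n + 1) / 4)
    return q1, q3

returns = [-6.1, -2.8, -1.2, -0.7, 4.3, 5.5, 5.9, 6.5,
           7.6, 8.3, 9.6, 9.8, 12.9, 13.1, 18.5]
q1, q3 = quartiles(returns)
print(q1, q3, round(q3 - q1, 1))  # -0.7 9.8 10.5
```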

Variance and Standard Deviation: these measures take into account how all the values in the data are distributed. They evaluate how the values fluctuate about the mean.

Sample variance:

$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$

Variance is very sensitive to outliers because each deviation from the mean is squared. Remember also that the units are squared: if the x axis were in dollars, the variance would be in dollars squared.

Sample standard deviation:

$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$

In the standard deviation the units are back to the original units, which is more intuitive. The variance and standard deviation measure the average scatter about the mean. For almost all sets of data, the majority of the observed values lie within one standard deviation above and below the arithmetic mean, so knowing the mean and standard deviation usually shows where most of the data values cluster. The n - 1 in the denominator is the degrees of freedom: one degree of freedom is used up when we calculate the mean from the same data.
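A minimal Python sketch of the sample variance and standard deviation formulas, with the n - 1 divisor, is shown below. The data are hypothetical; the last line cross-checks against the standard library, whose `statistics.variance` also uses n - 1.

```python
import math
import statistics

def sample_variance(values):
    """S^2: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def sample_std_dev(values):
    """S: square root of the sample variance, back in the data's original units."""
    return math.sqrt(sample_variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations (mean = 5)
print(sample_variance(data))     # 32 / 7, about 4.571
print(sample_std_dev(data))      # about 2.138
print(math.isclose(sample_variance(data), statistics.variance(data)))  # True
```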

These three data sets all have the same mean. They have different standard deviations because the data are distributed differently. Sets A and C have roughly the same range, but in set C the data clumped at the end points increase the standard deviation further.

Coefficient of Variation: the coefficient of variation measures the scatter in the data relative to the mean. It is a relative measure, expressed as a percentage, and has no units:

$CV = \frac{S}{\bar{X}} \times 100\%$

The coefficient of variation is useful for comparing two sets of data that have different units or altogether different scales. For instance, when comparing the variation in height among humans to the variation in height among squirrels, it is easier to compare the coefficients of variation.

Coefficient of Variation Example
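In the spirit of the humans-versus-squirrels comparison, here is a small Python sketch of CV = S / X-bar * 100%. The heights are made-up numbers for illustration only, not measured data.

```python
import statistics

def coefficient_of_variation(values):
    """CV = S / X-bar * 100%, a unit-free measure of relative scatter."""
    return statistics.stdev(values) / statistics.mean(values) * 100

humans_cm = [160, 170, 180]  # hypothetical heights: S = 10 cm, mean = 170 cm
squirrels_cm = [20, 22, 24]  # hypothetical heights: S = 2 cm, mean = 22 cm

cv_humans = coefficient_of_variation(humans_cm)
cv_squirrels = coefficient_of_variation(squirrels_cm)
print(round(cv_humans, 2), round(cv_squirrels, 2))  # 5.88 9.09
```

Although the squirrels' standard deviation is much smaller in absolute terms, their heights vary more relative to their mean.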


(The 5 Measures) The vertical line drawn within the box represents the median. The box contains the middle 50% of the observations in the distribution. The whiskers extend from the box to the smallest and largest values and enclose the remaining 25% of the data at each end. The box-and-whisker plot is a good tool to graphically display the characteristics of a data set.

Left-skewed: mean < median. Symmetric: mean = median. Right-skewed: median < mean.

[Figure: box-and-whisker plots of left-skewed, symmetric, and right-skewed distributions, each marking Q1, Q2, and Q3.]

Five-Number Summary: X_smallest, Q1, median, Q3, X_largest.

Symmetrical data: if the data are perfectly symmetrical, the relationships among the measures in the five-number summary are as follows. The distance from X_smallest to the median equals the distance from the median to X_largest. The distance from X_smallest to Q1 equals the distance from Q3 to X_largest. The distance from Q1 to the median equals the distance from the median to Q3.

Asymmetrical data: in right-skewed distributions, the distance from the median to X_largest is greater than the distance from X_smallest to the median, and the distance from Q3 to X_largest is greater than the distance from X_smallest to Q1. In left-skewed distributions, the distance from X_smallest to the median is greater than the distance from the median to X_largest, and the distance from X_smallest to Q1 is greater than the distance from Q3 to X_largest.

The naming conventions for left and right skew are counterintuitive: right-skewed distributions have most of their data points on the left (long right whisker), and left-skewed distributions have most of their data points on the right (long left whisker).

Example: X_smallest = 12, Q1 = 30, median (Q2) = 45, Q3 = 57, X_largest = 70; each of the four segments contains 25% of the data. Interquartile range = 57 - 30 = 27.
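The distance comparison described above can be sketched in Python. This hypothetical helper only compares the median-to-extreme distances (a fuller check would also compare the Q1 and Q3 distances) and returns the interquartile range alongside the shape label.

```python
def five_number_shape(x_smallest, q1, median, q3, x_largest):
    """Classify skew from a five-number summary and return (shape, IQR)."""
    lower_spread = median - x_smallest  # X_smallest to median
    upper_spread = x_largest - median   # median to X_largest
    if upper_spread > lower_spread:
        shape = "right-skewed"
    elif lower_spread > upper_spread:
        shape = "left-skewed"
    else:
        shape = "symmetric"
    return shape, q3 - q1

print(five_number_shape(12, 30, 45, 57, 70))  # ('left-skewed', 27)
```

For the example summary, 45 - 12 = 33 exceeds 70 - 45 = 25, and 30 - 12 = 18 also exceeds 70 - 57 = 13, so both comparisons point to a left skew.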

The covariance cannot measure the relative strength of the linear relationship between two variables, because its size depends on the units in which the variables are measured. To measure relative strength we use the coefficient of correlation.

Coefficient of correlation (aka little r; no units): the covariance and the coefficient of correlation are two numerical descriptive measures of the linear relationship between two variables. The values of the coefficient of correlation range from -1 for perfect negative correlation to +1 for perfect positive correlation. When the coefficient of correlation equals 0, there is no tendency for one variable to vary linearly with the other. Perfect means that if all points were plotted on a scatter diagram, they could all be connected with a straight line. The coefficient of correlation is denoted by r.

[Figure: six scatter plots of Y against X, labeled r = -1, r = -0.6, r = 0, r = +1, r = +0.3, and r = 0.]

Correlation relationships are described as tendencies, not as cause and effect: correlation alone cannot prove causation. The correlation coefficient measures only the strength of the linear relationship. In Excel: CORREL(X values cell range, Y values cell range).

[Figure: Non Linear Relationship, a scatter plot with X from 0 to 8 and Y from 0 to 3.5.]

Calculating a correlation coefficient on this data would give something close to zero, even though the two variables are clearly related.


Mutually exclusive: one or the other (the events cannot both occur). Collectively exhaustive: the set includes all possible outcomes.

A priori classical probability: the probability of success is based on prior knowledge of the process involved. We will use this method. Empirical classical probability: the probabilities are based on observed data, not on prior knowledge of a process. Subjective probability: the chance of occurrence assigned to an event by a particular individual. This is a meaningless measure.

Examples: P(coin toss is heads) = 1/2; P(die roll is 6) = 1/6.

P(2 kids with 1 boy and 1 girl): we are not looking at birth order, so some outcomes are more likely than others; two boys, two girls, and one of each are not equally likely. Every birth has a 50% chance of each sex, P(boy) = 1/2 and P(girl) = 1/2. This is a priori because we know the probabilities of a boy or a girl beforehand. The first birth has a 50/50 chance, which gives:

1st kid Boy: Boy/Boy or Boy/Girl
1st kid Girl: Girl/Boy or Girl/Girl

From this 2-kid table we can calculate the probability (any birth order): P(1 boy and 1 girl) = 2/4 = 1/2.
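The 2-kid table can be enumerated directly. This Python sketch assumes the a priori model from the text (each birth is a boy or a girl with probability 1/2, independently) and counts the equally likely ordered outcomes, which shows why the three unordered outcomes are not equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# A priori model: four equally likely ordered (1st kid, 2nd kid) outcomes
outcomes = list(product("BG", repeat=2))  # BB, BG, GB, GG

# Ignore birth order by counting the boys in each family
boys_count = Counter(o.count("B") for o in outcomes)

for boys in sorted(boys_count):
    print(boys, "boy(s):", Fraction(boys_count[boys], len(outcomes)))
# 0 boy(s): 1/4
# 1 boy(s): 1/2
# 2 boy(s): 1/4
```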

Sample space: the set of all possible outcomes. The manner in which the sample space is subdivided depends on the types of probabilities to be determined. Complement: the complement of event A includes all outcomes that are not part of event A. Joint event: an event that has two or more characteristics. Simple probability: the probability of occurrence of a simple event. Marginal probability: the total number of successes can be obtained from the appropriate margin of the contingency table; the marginal probability of an event is the sum of a set of joint probabilities. Joint probability: a probability involving two or more events.

Example: given the table below, compute the probability that a randomly selected household purchased an HDTV and a DVD player.

Table 4.2
                  Purchased DVD player
Purchased TV     Yes    No   Total
HDTV              38    42      80
Not HDTV          70   150     220
Total            108   192     300

P(HDTV and DVD player) = (number of households that purchased an HDTV and a DVD player) / (total number of households) = 38/300 = 12.7%
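As an illustration of joint versus marginal probabilities, the Python sketch below reads them from the Table 4.2 counts; the dictionary layout and helper names are hypothetical. The marginal probability of HDTV is the sum of the joint probabilities in its row.

```python
from fractions import Fraction

# Table 4.2 counts: (TV type, purchased DVD player?) -> households
counts = {
    ("HDTV", "Yes"): 38, ("HDTV", "No"): 42,
    ("Not HDTV", "Yes"): 70, ("Not HDTV", "No"): 150,
}
total = sum(counts.values())  # 300

def joint(tv, dvd):
    """Joint probability of one cell of the contingency table."""
    return Fraction(counts[(tv, dvd)], total)

def marginal_tv(tv):
    """Marginal probability: sum of the joint probabilities across the row."""
    return joint(tv, "Yes") + joint(tv, "No")

p_joint = joint("HDTV", "Yes")
p_hdtv = marginal_tv("HDTV")
print(p_joint, f"{float(p_joint):.1%}")  # 19/150 12.7%
print(p_hdtv, f"{float(p_hdtv):.1%}")    # 4/15 26.7%
```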

Empirical probability: the events are not equally likely, so the probabilities come from observed data. Subjective estimate: the probability is assigned subjectively by a person. Of no value.


General addition rule: P(A or B) = P(A) + P(B) - P(A and B). We must subtract the probability of both events occurring, or that overlap will be counted twice.

Example: what is the probability that a roll of a die is odd OR <= 2?
P(odd or <= 2) = P(odd) + P(<= 2) - P(odd and <= 2) = 3/6 + 2/6 - 1/6 = 4/6 = 2/3
The key is that you have to subtract the AND probability, or you will count that outcome twice.

Example: P(die roll is 1) = 1/6 and P(die roll is odd) = 3/6. P(die roll is 1 or odd) is not just the sum of 1/6 and 3/6; that would double count the probability that the roll is 1. Solution: P(roll is 1 or odd) = 1/6 + 3/6 - 1/6 = 3/6 = 1/2.
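The addition-rule examples can be checked by treating events as sets of die faces. In this Python sketch (helper name `p` is hypothetical), the rule's result is compared with the probability of the set union, which counts each face only once.

```python
from fractions import Fraction

die = set(range(1, 7))  # sample space for one fair die

def p(event):
    """Classical probability: favorable outcomes / total outcomes."""
    return Fraction(len(event), len(die))

odd = {x for x in die if x % 2 == 1}
at_most_2 = {x for x in die if x <= 2}
one = {1}

# General addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_odd_or_le2 = p(odd) + p(at_most_2) - p(odd & at_most_2)
p_one_or_odd = p(one) + p(odd) - p(one & odd)

print(p_odd_or_le2, p(odd | at_most_2))  # 2/3 2/3
print(p_one_or_odd, p(one | odd))        # 1/2 1/2
```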

Example: roll two dice; what is the probability that the sum is 10? The possible sums are 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, but the probability is NOT 1/11, because these sums are not equally likely: every sum except 2 and 12 can be realized in more than one way. We must look at all possible outcomes of rolling two fair dice, then count how many of them sum to 10. There are 36 possible outcomes, and 3 of them sum to 10 (4+6, 5+5, 6+4):
P(sum of two dice is 10) = 3/36 = 1/12

Example: roll two dice; what is the probability that the sum is 7?
P(sum of two dice is 7) = 6/36 = 1/6

Example: P(a roll of a die is <= 2 or >= 4) = 2/6 + 3/6 = 5/6. These events are mutually exclusive: both cannot happen on the same roll.

Example: event A: the outcome is <= 5, P(A) = 5/6; event B: the outcome is even, P(B) = 3/6. These two events are collectively exhaustive: one or the other must occur in a given trial. They are NOT mutually exclusive, because both can occur in the same trial (a roll of 2 or 4):
P(A or B) = P(A) + P(B) - P(A and B) = 5/6 + 3/6 - 2/6 = 6/6 = 1

You must know the difference between collectively exhaustive and mutually exclusive and be able to test a problem for these conditions.
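The Python sketch below enumerates the 36 equally likely two-dice outcomes to get the sum probabilities, then tests events A and B from the last example for the two conditions using set operations. The variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))  # 36 equally likely ordered outcomes

def p_sum(target):
    hits = [r for r in rolls if sum(r) == target]
    return Fraction(len(hits), len(rolls))

print(len(rolls), p_sum(10), p_sum(7))  # 36 1/12 1/6

# Event A: outcome <= 5; event B: outcome is even (one die)
die = set(range(1, 7))
a = {x for x in die if x <= 5}
b = {x for x in die if x % 2 == 0}
collectively_exhaustive = (a | b) == die  # together they cover every outcome
mutually_exclusive = not (a & b)          # they share no outcome
print(collectively_exhaustive, mutually_exclusive)  # True False
```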

[Figure: Venn diagram of two overlapping events within the sample space (all possible outcomes); the overlap region must not be counted twice.]

Mutually exclusive: both events cannot occur in the same trial; one or the other only.

Table 4.1
                     Actually Purchased
Plan to Purchase     Yes    No   Total
Yes                  200    50     250
No                   100   650     750
Total                300   700    1000

P(planned to purchase or actually purchased) = P(planned to purchase) + P(actually purchased) - P(planned to purchase and actually purchased) = 250/1000 + 300/1000 - 200/1000 = 350/1000 = 0.35 = 35%

In the mutually exclusive case the joint probability is zero. Example:

Table 4.3
Type of Purchase    Number of Respondents
In store            183
Internet             87
Mail order           30
Total               300

P(Internet or mail order) = P(Internet) + P(mail order) - P(Internet and mail order) = 87/300 + 30/300 - 0/300 = 117/300 = 0.39 = 39%
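Both table examples use the same general addition rule; only the joint term differs. In this Python sketch the helper `p_or` is hypothetical, and the joint term is passed as 0 for the mutually exclusive purchase types of Table 4.3.

```python
from fractions import Fraction

def p_or(p_a, p_b, p_a_and_b):
    """General addition rule; p_a_and_b is 0 for mutually exclusive events."""
    return p_a + p_b - p_a_and_b

# Table 4.1 (1000 respondents)
planned = Fraction(250, 1000)
purchased = Fraction(300, 1000)
planned_and_purchased = Fraction(200, 1000)
print(float(p_or(planned, purchased, planned_and_purchased)))  # 0.35

# Table 4.3 (300 respondents): purchase types are mutually exclusive
print(float(p_or(Fraction(87, 300), Fraction(30, 300), 0)))  # 0.39
```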

Conditional probability: here we examine how various probabilities are determined if certain information about the events involved is already known. In the standard notation P(A | B), the probability of A given B, we know that event B has occurred and ask how it affects the probability of A.

Example: P(red card | card is not an ace) = P(red and not ace) / P(not ace) = (26 - 2)/(52 - 4) = 24/48 = 1/2. This information does not change the probability that the card is red (P(red) = 1/2), so these events are independent; the result has the independent form of the equation, P(A | B) = P(A).

Example: a family has 2 children, and the event of interest is having all boys. The 4 possible outcomes (1st child, 2nd child) are BB, BG, GB, and GG, so P(all boys) = 1/4. But if we are told the sex of the first child, the probability of ending up with two boys changes. If the first child is a girl, the probability of all boys drops to zero:
P(all boys | 1st is girl) = 0
P(all boys | 1st is boy) = 1/2
Here we see that the probability changes given dependent information.
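Both conditional-probability examples can be computed by restricting the sample space to the outcomes where the given event occurred and then counting. The helper `prob` and the deck representation in this Python sketch are hypothetical.

```python
from fractions import Fraction
from itertools import product

def prob(space, event, given=None):
    """P(event | given) by counting equally likely outcomes in `space`."""
    reduced = [o for o in space if given is None or given(o)]
    return Fraction(sum(1 for o in reduced if event(o)), len(reduced))

# A standard 52-card deck as (rank, suit) pairs
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
deck = list(product(ranks, ["hearts", "diamonds", "clubs", "spades"]))

def is_red(card):
    return card[1] in ("hearts", "diamonds")

def not_ace(card):
    return card[0] != "A"

print(prob(deck, is_red, given=not_ace), prob(deck, is_red))  # 1/2 1/2

# Two-child families as (1st, 2nd) pairs
families = list(product("BG", repeat=2))

def all_boys(f):
    return f == ("B", "B")

def first_girl(f):
    return f[0] == "G"

def first_boy(f):
    return f[0] == "B"

print(prob(families, all_boys))                     # 1/4
print(prob(families, all_boys, given=first_girl))   # 0
print(prob(families, all_boys, given=first_boy))    # 1/2
```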

A set of events is collectively exhaustive if at least one event in the set must occur during a given trial.

Example: there is a 70% chance that a person chosen at random is American. Given P(American and woman) = 0.45 and P(random pick is a woman) = 0.60, the conditional probability equation gives:
P(American | woman) = P(American and woman) / P(woman) = 0.45 / 0.60 = 0.75 = 75%
Having been given information about the first event (the person is a woman), the probability of being American changed from 70% to 75%. Create a contingency table to help solve a problem!
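A tiny Python sketch of the conditional probability equation applied to this example follows; `conditional` is a hypothetical helper name.

```python
def conditional(p_a_and_b, p_b):
    """P(A | B) = P(A and B) / P(B)."""
    return p_a_and_b / p_b

p_american = 0.70
p_american_given_woman = conditional(0.45, 0.60)

print(round(p_american_given_woman, 2))                  # 0.75
print(abs(p_american_given_woman - p_american) > 1e-9)   # True: the information changed it
```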

General multiplication rule: P(A and B) = P(A | B) * P(B). A and B do not have to be independent.
Example: using the all-boys example,
P(all boys) = P(all boys | 1st is boy) * P(1st is boy) = 1/2 * 1/2 = 1/4
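The Python sketch below checks the general multiplication rule on the two-child sample space. It computes the conditional probability by counting within the reduced space, multiplies by P(1st is boy), and compares with the direct count. The helper `p` is hypothetical.

```python
from fractions import Fraction
from itertools import product

families = list(product("BG", repeat=2))  # equally likely (1st, 2nd) outcomes

def p(event, given=lambda f: True):
    """P(event | given) by counting within the outcomes where `given` holds."""
    reduced = [f for f in families if given(f)]
    return Fraction(sum(1 for f in reduced if event(f)), len(reduced))

def all_boys(f):
    return f == ("B", "B")

def first_boy(f):
    return f[0] == "B"

general = p(all_boys, given=first_boy) * p(first_boy)  # P(A | B) * P(B)
direct = p(all_boys)                                   # count directly
print(general, direct)  # 1/4 1/4
```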

Conditional Probability Example:

Multiplication rule for independent events: P(A and B) = P(A)P(B). If this rule holds for two events, A and B, then A and B are statistically independent. Therefore, there are two ways to determine statistical independence:
1) Events A and B are statistically independent if and only if P(A | B) = P(A).
2) Events A and B are statistically independent if and only if P(A and B) = P(A)P(B).
Example: P(card is red and an ace) = P(card is red) * P(ace) = 1/2 * 4/52 = 1/26. These events are independent.
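Both independence tests can be run on a full 52-card deck. In this Python sketch the deck encoding (rank 1 standing for the ace) and the helper names are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

# 52 cards as (rank, suit); rank 1 stands for the ace
deck = list(product(range(1, 14), ["hearts", "diamonds", "clubs", "spades"]))

def p(event, given=None):
    """P(event | given) by counting equally likely cards."""
    reduced = [c for c in deck if given is None or given(c)]
    return Fraction(sum(1 for c in reduced if event(c)), len(reduced))

def red(c):
    return c[1] in ("hearts", "diamonds")

def ace(c):
    return c[0] == 1

def red_and_ace(c):
    return red(c) and ace(c)

test_1 = p(red, given=ace) == p(red)           # P(A | B) = P(A)
test_2 = p(red_and_ace) == p(red) * p(ace)     # P(A and B) = P(A)P(B)
print(p(red_and_ace), test_1, test_2)          # 1/26 True True
```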

Problem 4.13, page 167: P(warranty repair) = 0.04, P(US company) = 0.6, P(w.r. and US company) = 0.025. This type of problem is best solved using a contingency table. Some of the entries are given in the problem; the rest are solved for.

                   Repair Type
Manufacturer        Warranty   Not Warranty   Total
US company             0.025          0.575    0.60
Not US company         0.015          0.385    0.40
Total                  0.040          0.960    1.00

Now we can pick many of the probabilities out of the table:
e) P(new car selected at random needs w.r.) = 0.04
f) P(car selected at random is not from a US company) = 0.4. In the category Manufacturer, US company and not US company form a mutually exclusive and collectively exhaustive set, so we can also solve for this probability as 1 - P(random car is from a US company) = 1 - 0.6 = 0.4.
g) P(w.r. and US company) = 0.025
h) P(not w.r. and not US company) = 0.385

We can also use the conditional probability to test for independence. If the variables are independent, then P(w.r. | US company) = P(w.r.) = 0.04. Now solve:
P(w.r. | US company) = P(w.r. and US company) / P(US company) = 0.025 / 0.6 = 0.0417 ≠ 0.04
So the variables are NOT independent. If the variables are not independent, you cannot simply multiply the marginal probabilities from the table to find a joint probability.

i) P(w.r. or US company) = P(w.r.) + P(US company) - P(w.r. and US company) = 0.04 + 0.6 - 0.025 = 0.615 (from table)
j) P(w.r. or not US company) = P(w.r.) + P(not US company) - P(w.r. and not US company) = 0.04 + 0.4 - 0.015 = 0.425
k) P(w.r. or not w.r.) = P(w.r.) + P(not w.r.) = 0.04 + 0.96 = 1, because these events are mutually exclusive and collectively exhaustive.
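The Python sketch below fills in the Problem 4.13 contingency table from the three given probabilities, then runs the independence test and the addition-rule parts (i) and (j). The variable names are hypothetical; floats are compared with a tolerance because values such as 0.04 - 0.025 are not exact in binary.

```python
import math

# Given in Problem 4.13
p_wr = 0.04          # P(warranty repair)
p_us = 0.60          # P(US company)
p_wr_and_us = 0.025  # P(w.r. and US company)

# Fill in the rest of the table by subtracting from the margins
p_wr_and_not_us = p_wr - p_wr_and_us                # 0.015
p_not_wr_and_us = p_us - p_wr_and_us                # 0.575
p_not_us = 1 - p_us                                 # 0.40
p_not_wr_and_not_us = p_not_us - p_wr_and_not_us    # 0.385

# Independence test: does P(w.r. | US company) equal P(w.r.)?
p_wr_given_us = p_wr_and_us / p_us
independent = math.isclose(p_wr_given_us, p_wr)

# Parts (i) and (j): general addition rule
p_wr_or_us = p_wr + p_us - p_wr_and_us              # 0.615
p_wr_or_not_us = p_wr + p_not_us - p_wr_and_not_us  # 0.425

print(round(p_not_wr_and_not_us, 3), round(p_wr_given_us, 4), independent)  # 0.385 0.0417 False
print(round(p_wr_or_us, 3), round(p_wr_or_not_us, 3))                        # 0.615 0.425
```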