Lecture 3. Measures of Relative Standing and. Exploratory Data Analysis (EDA)

Similar documents
Lecture Slides. Elementary Statistics Eleventh Edition. by Mario F. Triola. and the Triola Statistics Series 3.1-1

Section 3. Measures of Variation

Lecture Slides. Elementary Statistics Tenth Edition. by Mario F. Triola. and the Triola Statistics Series. Slide 1

Lecture Slides. Elementary Statistics Twelfth Edition. by Mario F. Triola. and the Triola Statistics Series. Section 3.1- #

QUIZ 1 (CHAPTERS 1-4) SOLUTIONS MATH 119 SPRING 2013 KUNIYUKI 105 POINTS TOTAL, BUT 100 POINTS = 100%

I - Probability. What is Probability? the chance of an event occuring. 1classical probability. 2empirical probability. 3subjective probability

Announcements. Lecture 5: Probability. Dangling threads from last week: Mean vs. median. Dangling threads from last week: Sampling bias

Introduction to Statistics for Traffic Crash Reconstruction

Intro to Stats Lecture 11

3 Lecture 3 Notes: Measures of Variation. The Boxplot. Definition of Probability

Lecture 1. ABC of Probability

Review: Central Measures

Bioeng 3070/5070. App Math/Stats for Bioengineer Lecture 3

Mean/Average Median Mode Range

Sets and Set notation. Algebra 2 Unit 8 Notes

Chapter 14. From Randomness to Probability. Copyright 2012, 2008, 2005 Pearson Education, Inc.

Review for Exam #1. Chapter 1. The Nature of Data. Definitions. Population. Sample. Quantitative data. Qualitative (attribute) data

Mean, Median and Mode. Lecture 3 - Axioms of Probability. Where do they come from? Graphically. We start with a set of 21 numbers, Sta102 / BME102

2011 Pearson Education, Inc

Last time. Numerical summaries for continuous variables. Center: mean and median. Spread: Standard deviation and inter-quartile range

18.600: Lecture 3 What is probability?

Basic Probability. Introduction

STAT Chapter 3: Probability

Random processes. Lecture 17: Probability, Part 1. Probability. Law of large numbers

3.2 Probability Rules

Statistics 1. Edexcel Notes S1. Mathematical Model. A mathematical model is a simplification of a real world problem.

Chapter 4.notebook. August 30, 2017

Lecture 2 and Lecture 3

Measures of the Location of the Data

CHAPTER 5: EXPLORING DATA DISTRIBUTIONS. Individuals are the objects described by a set of data. These individuals may be people, animals or things.

The enumeration of all possible outcomes of an experiment is called the sample space, denoted S. E.g.: S={head, tail}

Probability 5-4 The Multiplication Rules and Conditional Probability

Resistant Measure - A statistic that is not affected very much by extreme observations.

Essentials of Statistics and Probability

LECTURE 1. 1 Introduction. 1.1 Sample spaces and events

AP Statistics Ch 6 Probability: The Study of Randomness

Lecture 3 - Axioms of Probability

are the objects described by a set of data. They may be people, animals or things.

Conditional Probability

Topic -2. Probability. Larson & Farber, Elementary Statistics: Picturing the World, 3e 1

STAT:5100 (22S:193) Statistical Inference I

UNIT 5 ~ Probability: What Are the Chances? 1

MATH 1150 Chapter 2 Notation and Terminology

Section 2.4. Measuring Spread. How Can We Describe the Spread of Quantitative Data? Review: Central Measures

Probability Rules. MATH 130, Elements of Statistics I. J. Robert Buchanan. Fall Department of Mathematics

University of Jordan Fall 2009/2010 Department of Mathematics

3.1 Measure of Center

MATH 10 INTRODUCTORY STATISTICS

STATISTICS 1 REVISION NOTES

Lecture Lecture 5

Venn Diagrams; Probability Laws. Notes. Set Operations and Relations. Venn Diagram 2.1. Venn Diagrams; Probability Laws. Notes

STA Module 4 Probability Concepts. Rev.F08 1

CHAPTER 2: Describing Distributions with Numbers

Math 120 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency

Determining the Spread of a Distribution

Describing distributions with numbers

Chapter 3. Data Description

Test 1 Review. Review. Cathy Poliak, Ph.D. Office in Fleming 11c (Department Reveiw of Mathematics University of Houston Exam 1)

Determining the Spread of a Distribution

STP 226 ELEMENTARY STATISTICS

Examples of frequentist probability include games of chance, sample surveys, and randomized experiments. We will focus on frequentist probability sinc

The probability of an event is viewed as a numerical measure of the chance that the event will occur.

Chapter 3. Measuring data

ACCESS TO SCIENCE, ENGINEERING AND AGRICULTURE: MATHEMATICS 2 MATH00040 SEMESTER / Probability

ECE 340 Probabilistic Methods in Engineering M/W 3-4:15. Lecture 2: Random Experiments. Prof. Vince Calhoun

Math 140 Introductory Statistics

Describing distributions with numbers

Describing Distributions

Objective A: Mean, Median and Mode Three measures of central of tendency: the mean, the median, and the mode.

Chapter. Numerically Summarizing Data Pearson Prentice Hall. All rights reserved

Introduction to Probability. Ariel Yadin. Lecture 1. We begin with an example [this is known as Bertrand s paradox]. *** Nov.

Chapter 1: Exploring Data

Statistical Theory 1

Discrete Probability. Mark Huiskes, LIACS Probability and Statistics, Mark Huiskes, LIACS, Lecture 2

Math 243 Section 3.1 Introduction to Probability Lab

Probability and Statistics Notes

Ch 14 Randomness and Probability

MAT2377. Ali Karimnezhad. Version September 9, Ali Karimnezhad

Mutually Exclusive Events

Final Exam Review. Name: Class: Date: Short Answer

Lecture notes for probability. Math 124

AP Statistics Semester I Examination Section I Questions 1-30 Spend approximately 60 minutes on this part of the exam.

Lecture 1. Chapter 1. (Part I) Material Covered in This Lecture: Chapter 1, Chapter 2 ( ). 1. What is Statistics?

Probability- describes the pattern of chance outcomes

CHAPTER 1. Introduction

Chapter 7 Wednesday, May 26th

Chapter 5. Understanding and Comparing. Distributions

EXAM. Exam #1. Math 3342 Summer II, July 21, 2000 ANSWERS

Fundamentals of Probability CE 311S

Chapter 2. Probability

Chapter 5: Exploring Data: Distributions Lesson Plan

Chapter 5 : Probability. Exercise Sheet. SHilal. 1 P a g e

Lecture 2: Probability Distributions

Introduction to probability

TA Qinru Shi: Based on poll result, the first Programming Boot Camp will be: this Sunday 5 Feb, 7-8pm Gates 114.

P8130: Biostatistical Methods I

Sociology 6Z03 Topic 10: Probability (Part I)

Statistics 100 Exam 2 March 8, 2017

Chapter 2: Tools for Exploring Univariate Data

1. Exploratory Data Analysis

Transcription:

Lecture 3. Measures of Relative Standing and Exploratory Data Analysis (EDA) Problem: The average weekly sales of a small company are $10,000 with a standard deviation of $450. This week their sales were $9050. Is this week unusually low? Measures of relative standing are used to: compare values from different data sets, or compare values within the same data set. Measures of relative standing: z score, percentile. Slide 1

Z-score z -score (or standardized value) is the number of standard deviations that a given value x is above or below the mean: z x mean st. deviation Sample Population z x s x z x Round z to 2 decimal places Example: Weekly sales: mean=$10,000, st.dev.=$450. z-score of $9050 is z 9050 10000 450 2.11. Slide 2

Interpreting Z Scores Whenever a value is less than the mean, its corresponding z score is negative Ordinary values: z score between 2 and 2 Unusual Values: z score < -2 or z score > 2 Exceptional/extreme: z score < -3 or z score > 3 Example: Weekly sales. The z score of -2.11 is unusual. Slide 3

Quartiles Q 1 (First Quartile) separates the bottom (smallest) 25% of sorted values from the top (largest) 75%. Q 2 (Second Quartile) same as the median; separates the bottom (smallest) 50% of sorted values from the top (largest) 50%. Q 1 (Third Quartile) separates the bottom (smallest) 75% of sorted values from the top (largest) 25%. Q 1, Q 2, Q 3 divide ranked/sorted scores into four equal parts 25% 25% 25% 25% (minimum) Q 1 Q 2 Q 3 (maximum) (median) Interquartile Range (or IQR): Q 3 - Q 1 Slide 4

Percentiles 99 percentiles denoted P 1, P 2,... P 99, partition/divide the data into 100 groups. Finding the Percentile of a Given Score number of values less than x Percentile of value x = 100 total number of values Example: Ages of Best Actresses. Find the percentile corresponding to the age of 30 years. Solution: There are 26 ages less than 30, so Percentile of 30 = 26 100 34. 76 Answer: The age of 30 years is the 34 th percentile, so P 34 =30. Slide 5

Converting from the kth Percentile to the Corresponding Data Value Notation k L = n 100 n k L P k total number of values in the data set percentile being used locator that gives the position of a value kth percentile Slide 6

Converting from the kth Percentile to the Corresponding Data Value Example: Find the 20 th percentile of the ages of the Best Actresses. Solution: k=20, n=76: L k 20 n 76 15.2 100 100 L is NOT a whole number, so round it UP: L=16. 16 th smallest (ordered) observation is 27, so P 20 = 27. Slide 7

Exploratory Data Analysis (EDA) Using statistical tools (such as graphs, measures of center, and measures of variation) to (initially) investigate/explore data sets in order to understand their important characteristics: symmetry, outliers, center, spread, etc. Slide 8

Outliers An outlier is a value that is located very far away from almost all of the other values. An outlier can have a dramatic effect on the mean and standard deviation (variance). scale of the histogram so that the true nature of the distribution is totally obscured. Slide 9

5-number summary The 5-number summary of a data set consists of (1) the minimum value; (2) Q 1 ; (3) the median (Q 2 ); (4) Q 3 ; and (5) the maximum value. A boxplot ( or box-and-whisker-diagram) is a graph of a data set that visualizes the 5-point summary. Boxplot of Ages of Best Actresses Slide 10

Boxplots - cont Slide 11

Modified Boxplots Some statistical packages provide modified boxplots which represent outliers as special points. A data value is an outlier if it is or above Q 3 by an amount greater than 1.5 X IQR below Q 1 by an amount greater than 1.5 X IQR A modified boxplot is constructed with these specifications: A special symbol (such as an asterisk) is used to identify outliers. The solid horizontal line extends only as far as the minimum data value that is not an outlier and the maximum data value that is not an outlier. Slide 12

Modified Boxplots - Example Slide 13

PROBABILITY IDEAS Random Experiment know all possible outcomes, BUT - can not predict the outcome of a particular trial. SAMPLE SPACE S set of all possible outcomes EXAMPLES: Toss a coin: Sample space= {T, H} 2. Roll a die, observe the score on top. Sample space = {1, 2, 3, 4, 5, 6,}. 3. Throw a basketball, record the number of attempts to the first basket. Sample space = {1, 2, 3, 4, }. 4. Wait for a Taxi, record waiting time in seconds. Sample space = {t: t 0}, t=time. Slide 14

EVENTS Event a combination of one or more outcomes - a subset of the sample space. Events are usually denoted by capital letters: A, B, C, EXAMPLES: Toss a coin: Event A: Head comes up. A= { H}. 2. Roll a die. Event A: Even number comes up. A = {2, 4, 6}. Event B: Number smaller than 3 comes up. B= {1, 2}. 3. Throw a basketball. Event A: Get the basket before 4th trial. A= {1, 2, 3}. 4. Wait for a Taxi. Event B: Wait no longer than 2 min. B=(0, 120) sec. IN PROBABILITY WE LOOK FOR PROBABILITIES (CHANCES) OF EVENTS Slide 15

PROBABILITY An interpretation of probability Random experiment outcome/event Q: What are the chances/probability of that event? Repeat the experiment many, many times. Record the PROPORTION of times that event occurred in the repeated experiments. Proportion=relative frequency. Probability long-term relative frequency of an event/outcome. EXAMPLE. P(H) proportion of H in many (say 100,000) tosses of a coin. Slide 16

PROBABILITY RULES- AXIOMS OF PROBABILITY Rule 1. Probability of an event A, P(A), is a number between 0 and 1 (inclusive). Rule 2. In any random experiment, probability of all possible outcomes added together is 1. Rule 3. Probability of (A or B) is the sum of P(A) and P(B) if events A and B have no common outcomes (are disjoint). P(A or B) =P(A)+P(B) if A and B do not overlap Venn diagram for two events A and B with no common outcomes, disjoint A and B. A Sample Space B Slide 17

MORE PROBABILITY RULES Rule 4. Probability that event A does not occur P(not A)=1-P(A). Rule 5. Addition rule. P(A or B)=P(A)+P(B)-P(A and B) Illustrations with Venn diagrams. S=Sample space Not A A A B Complement of A A or B A and B A not A P( A) 1 P( A) Overlapping events A and B Slide 18

EXAMPLE Roll a die twice, record the score on both rolls. Sample space: 36 ordered pairs Event A: All results with sum of scores equal to 7. A={(1, 6), (6, 1), (2, 5), (5, 2), (3, 4), (4, 3)} S= {(1, 1), (1, 2), (1, 3),, (1,6), (2, 1), (2,2),, (2, 6), (6,1), (6, 2), (6, 3),, (6, 6)}. Event B: All results with first score equal to 5. B={(5, 1), (5, 2), (5, 6)}. A or B={(1, 6), (6, 1), (2, 5), (5, 2), (3, 4), (4, 3), (5, 1), (5,3), (5,4), (5,5), (5, 6)}. A and B={(5,2)}. Slide 19

Example Let A be the event that it will rain tomorrow, and B the even that it will rain on Sunday, September 13. A or B= event that it will rain tomorrow or Sun, Sep 13 or on both days. A and B= event that it will rain on both days. Slide 20

EQUALLY LIKELY OUTCOMES Fair die all outcomes/scores have the same chances of occurring. Fair coin H and T have 50-50 chances of coming up. Loaded die some scores have larges chances of coming up than others. When all outcomes have the same probability we say they are equally likely. In an experiment with finite number of equally likely outcomes, the probability of an outcome =1/ (total number of possible outcomes), and for an event A number of outcomes that make up A P(A) = --------------------------------------------------. total number of possible outcomes Slide 21

Example Adults are randomly selected for a Gallup poll and asked if they think that cloning of humans should be allowed. Among those surveyed, 91 said that cloning should be allowed; 901 said is should not be allowed, and 20 had no opinion. Based on these results estimate the probability that a randomly selected person believes that cloning of humans should be allowed. Solution. Need to know the sample space, total number of responses: 91+901+20=1012. Use relative frequency approach to probability: Events: A- selected person thinks cloning of humans should be allowed. #people thinking cloning should be allowed 91 PA ( ) 0.0899. total umber of people surveyed 1012 Answer: The probability that a person thinks cloning of humans should be allowed is about 0.09. Slide 22

Example The probability that John will pass a math class is 0.7, and that he will pass a biology class is 0.6. the probability that he will pass both classes is 0.5. What is the probability that he will pass at least one of these classes? What it the probability that he will fail Biology? Solution. Events: M= John will pass math; B=John will pass Biology P(M)=0.7, P(B)=0.6; P(M and B)=0.5; Need: P(M or B)=? Addition rule: P(M or B) = P(M) +P(B) + P(M and B) = 0.7 + 0.6-0.5 =0.8 Probability that John will fail Biology) = P9 not B) = 1- P(B)= 1-0.6= 0.4. Answer: The probability that John will pass Math or Biology is 0.8. The probability that John will fail Biology is 0.4. Slide 23

Example: The following table summarizes data on pedestrain deaths caused by accidents (Natl. Highway Traffic Safety Admin). If one of the pedestrian deaths is randomly selected, what is the probability that: a. Pedestrian was intoxicated, but the driver was not intoxicated. b. Pedestrian or driver (at least one) was intoxicated? Solution. Total number of people in the study =59+79+266+581=985 a. 266/985=0.27 Driver Intoxicated? Yes No Pedestrian Intoxicated? Yes No 59 79 266 581 b. W-event that ped. was intox. P(W)=(59+266)/985=0.33; D-event that driver was intox. P(D)=(59+79)/985=0.14 P(W and D)=59/985=0.06 P(W or D)= via addition rule= P(W) +P(D) + P(W and D)=0.33+0.14-0.06 = 0.41 Slide 24