Last time. Numerical summaries for continuous variables. Center: mean and median. Spread: Standard deviation and inter-quartile range


Lecture 4

Last time
- Numerical summaries for continuous variables
  - Center: mean and median
  - Spread: standard deviation and inter-quartile range
- Exploratory graphics
  - Histogram (revisit modes)

Histograms
[Figure: histogram of income; x-axis: income, y-axis: frequency]

Histograms: Skew (heavy right tail)

Histograms: Skew bimodal

Histograms: Trimodal

Histograms: Normal (Bell curve)

Histograms: Symmetric

Histograms: Symmetric (with four modes)

5-number summary (Min, Lower Q, Med, Upper Q, Max)
Example: Average daily temperatures in Philadelphia (1974-1986), n=5479

Min     Lower Qu   Median   Upper Qu   Max
-0.25   40.0       55.25    70.25      88.50

Boxplots
[Figure: boxplot of temperature (Temp)]
- Built around the 5-number summary
- Box constructed from the lower and upper quartiles
- Median marked with a horizontal bar
- Whiskers designed to capture most of the data (about 99.3% for a normal, or bell-shaped, distribution)
- Points mark data beyond the whiskers, which are possible outliers

Boxplots: Whiskers
[Figure: boxplot of mortality]
- Upper whisker: find the largest value that is no more than 1.5 IQR above the upper quartile; mark that point with a bar
- Lower whisker: find the smallest value that is no more than 1.5 IQR below the lower quartile; mark that point with a bar
- Points mark data beyond the whiskers, which are possible outliers

Mortality summary:
Min   Lower Qu   Median   Upper Qu   Max
3.0   12.0       15.0     18.0       36.0
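The whisker rule can be sketched in a few lines of Python. This is only an illustration: the `deaths` list is made-up data (not the lecture's mortality series), `boxplot_stats` is a hypothetical helper name, and the quartiles use the "inclusive" interpolation of the standard library, which may differ slightly from the convention used for the slides.

```python
import statistics

def boxplot_stats(values):
    """Return the 5-number summary plus whisker ends and possible outliers."""
    data = sorted(values)
    # Quartiles by linear interpolation (one of several common conventions).
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme data points still inside the fences.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "min": data[0], "q1": q1, "median": med, "q3": q3, "max": data[-1],
        "lower_whisker": min(inside),
        "upper_whisker": max(inside),
        "outliers": [x for x in data if x < lower_fence or x > upper_fence],
    }

# Hypothetical daily death counts, invented for illustration only.
deaths = [3, 9, 11, 12, 12, 13, 14, 15, 15, 16, 17, 18, 18, 19, 21, 24, 36]
summary = boxplot_stats(deaths)
for name, value in summary.items():
    print(f"{name:>14}: {value}")
```

With these invented numbers the largest value lies beyond the upper fence, so it would be drawn as a separate point rather than as the end of the whisker.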

Histograms of Daily Mortality
[Figure: histogram of non-accidental mortality; x-axis: number of deaths, y-axis: frequency]

More boxplots: Non-accidental mortality
[Figure: boxplots of daily mortality by season (fall, winter, spring, summer)]

Histograms of Daily Temperatures
[Figure: histogram of average daily temperatures; x-axis: temperature, y-axis: frequency]

Boxplots of Daily Temperatures
[Figure: boxplots of average daily temperature by season (fall, winter, spring, summer)]

A similar idea: Conditioning
[Figure: histograms of temperature conditioned on season, one panel each for summer, winter, fall, and spring; x-axis: temperature, y-axis: count]

Summaries for discrete or qualitative data
- Frequency tables and bar graphs
- Example: BRFSS, "What is the highest grade or year of school you completed?"

Response   freq      %       cumulative proportion
None       188       0.1     0.002
Gd 1-8     3413      3.1     0.033
Gd 9-11    7251      6.7     0.100
Gd 12      31483     29.0    0.390
Col 1-3    30415     28.0    0.670
Col 4      35654     32.8    0.998
Refused    257       0.2     1.000
n          108,661   100.0
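As a rough sketch, the percentage and cumulative columns of a frequency table like this one can be recomputed directly from the counts. The counts below are the ones in the BRFSS table; the variable names are just illustrative choices.

```python
# Counts taken from the BRFSS education table above.
counts = {
    "None": 188,
    "Gd 1-8": 3413,
    "Gd 9-11": 7251,
    "Gd 12": 31483,
    "Col 1-3": 30415,
    "Col 4": 35654,
    "Refused": 257,
}

total = sum(counts.values())

rows = []
running = 0
for level, count in counts.items():
    running += count
    percent = 100 * count / total      # share of all respondents
    cumulative = running / total       # cumulative proportion so far
    rows.append((level, count, percent, cumulative))

print(f"{'Response':<10}{'freq':>8}{'%':>8}{'cumu':>8}")
for level, count, percent, cumulative in rows:
    print(f"{level:<10}{count:>8}{percent:>8.1f}{cumulative:>8.3f}")
print(f"{'n':<10}{total:>8}")
```

Because the categories are ordered, the cumulative column reads as "this level of schooling or less"; for an unordered variable it would carry no meaning.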

Barplots (note difference from histograms)
[Figure: barplot of response counts by education level (None, Gd 1-8, Gd 9-11, Gd 12, Col 1-3, Col 4, Refused)]

Barplots
[Figure: barplot of response proportions by education level]

Two continuous variables
- Scatter plots can display the relationship between two continuous variables (conditioning can give more control)
- They can be used to spot trends and relationships in the data, as well as the strength of those relationships
- Random versus controlled variables in the plot

Scatter plot (BRFSS)
[Figure: scatter plot of weight (y-axis) against height (x-axis)]

Scatter plot (BRFSS)
[Figure: scatter plot of the same data with height on the y-axis and weight on the x-axis]

Scatter plot (Time series of mortality)
[Figure: daily mortality (y-axis) against date (x-axis)]

Scatter plot (Time series of temperature)
[Figure: daily temperature (y-axis) against date (x-axis)]

Lecture 5

Last time
- Graphics for exploratory analysis
  - Histograms, boxplots, barplots, scatterplots
- When faced with several variables...
  - Interactive analysis
  - Conditioning when creating plots
- How's the lab?

Probability
- Toss a coin. What is the probability of heads?
- Roll a die. What is the probability of getting a five?

Some historical examples
- Count Buffon (1707-1788) tossed a coin 4,040 times, with heads coming up 2,048 times, or 50.69% of the time
- Karl Pearson (1857-1936) tossed a coin 24,000 times, with heads coming up 12,012 times, or 50.05% of the time
- John Kerrich tossed a coin 10,000 times (while he was a prisoner of war in WWII), and heads came up 5,067 times, or 50.67% of the time
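The percentages quoted for these three experiments are relative frequencies, heads divided by tosses. A minimal check, using only the counts given above (the dictionary layout is an illustrative choice):

```python
# (heads, tosses) for each historical experiment, as quoted above.
experiments = {
    "Buffon": (2048, 4040),
    "Pearson": (12012, 24000),
    "Kerrich": (5067, 10000),
}

# Relative frequency of heads = number of heads / number of tosses.
rel_freq = {name: heads / tosses for name, (heads, tosses) in experiments.items()}

for name, freq in rel_freq.items():
    print(f"{name:<8} {100 * freq:.2f}% heads")
```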

Some historical examples
F.N. David and Roman dice, 204 tosses

Rock crystal   30, 38, 31, 34, 34, 37
Iron           35, 39, 30, 21, 37, 42
Marble         27, 28, 23, 47, 25, 54

Some historical examples
- Diaconis's mechanical coin flipper
- Coin always lands the same way

Computer experiment
[Figure: relative frequency (y-axis) against flip number (x-axis), for 0 to 500 flips]

Probability and relative frequency
- Toss a coin many, many times
- Examine the proportion of flips that turn up heads
- What should you get?

Probability and relative frequency
- Perform a large number of independent repetitions of a random phenomenon
- After each trial, record the proportion of trials so far in which an event has occurred
- This relative frequency approaches a fixed number that we call the probability of the event
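A computer experiment of this kind might be sketched as follows. The function name `running_proportion`, the fixed seed, and the choice of 10,000 flips are all assumptions made for illustration; the simulation stands in for, and is not, the experiment behind the lecture's plot.

```python
import random

def running_proportion(n_flips, p_heads=0.5, seed=0):
    """Simulate coin flips and return the proportion of heads after each flip."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = 0
    proportions = []
    for flip in range(1, n_flips + 1):
        if rng.random() < p_heads:  # this flip counts as heads
            heads += 1
        proportions.append(heads / flip)
    return proportions

props = running_proportion(10_000)
for n in (10, 100, 1_000, 10_000):
    print(f"after {n:>6} flips: relative frequency of heads = {props[n - 1]:.4f}")
```

Early values can wander far from 0.5; the settling down only shows up as the number of flips grows.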

Probability models
- With coins or dice, we often speak of fairness, meaning that there is no preference for one outcome over another
- This is an idealization of a random phenomenon
- Probability models are mathematical descriptions that account for unpredictable factors in random events

Three kinds of probabilities
- Probabilities from models
- Probabilities from data (relative frequencies)
- Subjective probabilities (beliefs)

Some terminology
- A random experiment is some situation with an unpredictable outcome
- The sample space of an experiment is the set of all possible outcomes
- An event is a collection of outcomes

Some terminology
- Random experiments: tossing a coin, measuring your blood pressure, taking a test
- Sample space: H/T, (whatever BP is measured in), A-F grade
- Event: the coin turns up heads, my blood pressure is in the normal range, I pass the exam

Some notation
- The sample space is denoted by the symbol S
  - Tossing a coin 2 times, S = {HH, HT, TH, TT}
- Events (your text uses labels and names events, or simply assigns symbols A, B, C, ...)
  - Tossing at least one head, A = {HH, HT, TH}
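For small experiments the sample space can be listed mechanically. The sketch below enumerates the two-toss sample space and picks out the event "at least one head"; the variable names `S` and `A` simply mirror the slide's notation.

```python
from itertools import product

# All ordered outcomes of tossing a coin twice: each toss is H or T.
S = ["".join(outcome) for outcome in product("HT", repeat=2)]

# Event A: at least one head appears in the outcome.
A = [outcome for outcome in S if "H" in outcome]

print("S =", S)
print("A =", A)
```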

Lecture 6

Last time
- Conceptual definition of probability
  - Long-run averages of outcomes from repeated independent experiments
- Probability models
  - Mathematical descriptions that account for unpredictable factors in random events
- Terminology and notation describing events

Working with events
- The complement of an event A occurs if A does not occur; we denote it by $\bar{A}$
- We combine events with the set operations of intersection and union, "and" and "or"
- Two events are mutually exclusive if they cannot occur at the same time

Sample space and events
[Diagram: sample space S with A = {even spots}]

Sample space and events
[Diagram: sample space S with B = {fewer than 5 spots}]

Complement
[Diagram: B = {fewer than 5 spots} and its complement $\bar{B}$]

Intersection (and)
[Diagram: B = {fewer than 5 spots}, A = {even spots}]

Intersection (and)
[Diagram: A and B = {fewer than 5 spots and even}]

Union (or)
[Diagram: B = {fewer than 5 spots}, A = {even spots}]

Union (or)
[Diagram: A or B = {fewer than 5 spots or even}]
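The die events in these diagrams map directly onto set operations. A small sketch, assuming a standard six-sided die (the variable names are illustrative):

```python
# Sample space for one roll of a six-sided die.
S = set(range(1, 7))

A = {s for s in S if s % 2 == 0}   # even spots
B = {s for s in S if s < 5}        # fewer than 5 spots

not_B = S - B       # complement of B: outcomes in S that are not in B
A_and_B = A & B     # intersection: even AND fewer than 5 spots
A_or_B = A | B      # union: even OR fewer than 5 spots

# Mutually exclusive events share no outcomes.
mutually_exclusive = A.isdisjoint(B)

print("complement of B:", sorted(not_B))
print("A and B:", sorted(A_and_B))
print("A or B:", sorted(A_or_B))
print("A, B mutually exclusive?", mutually_exclusive)
```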

Probability distributions
- For a sample space $S = \{s_1, s_2, s_3, \ldots\}$ we assign values $p_1, p_2, p_3, \ldots$
- Each value is between 0 and 1
- The values sum to one: $1 = p_1 + p_2 + p_3 + \cdots$
- The probability of an event A is the sum of the values for all the outcomes in A
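To make the definition concrete, here is a sketch with a made-up loaded die. The probabilities in `dist` are invented for illustration, and `is_valid_distribution` and `pr` are hypothetical helper names; exact fractions are used so the sum-to-one check is not affected by rounding.

```python
from fractions import Fraction

# Hypothetical loaded die: face 6 comes up half the time (invented values).
dist = {
    1: Fraction(1, 10),
    2: Fraction(1, 10),
    3: Fraction(1, 10),
    4: Fraction(1, 10),
    5: Fraction(1, 10),
    6: Fraction(1, 2),
}

def is_valid_distribution(dist):
    """Each value lies between 0 and 1, and the values sum to one."""
    return all(0 <= p <= 1 for p in dist.values()) and sum(dist.values()) == 1

def pr(event, dist):
    """Probability of an event: sum the values of the outcomes it contains."""
    return sum((dist[outcome] for outcome in event), Fraction(0))

even = {2, 4, 6}
print("valid distribution:", is_valid_distribution(dist))
print("pr(even) =", pr(even, dist))
```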

Equally likely outcomes
- By symmetry, we might believe that no one outcome occurs more frequently than any other
- The probability of an event A is then
  $\operatorname{pr}(A) = \dfrac{\text{number of outcomes in } A}{\text{total number of outcomes in } S}$

Equally likely outcomes
- B = {fewer than 5 spots}
- $\operatorname{pr}(B) = 4/6 = 2/3$

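Rules for working with probabilities
- The sample space is certain to occur: $\operatorname{pr}(S) = 1$
- pr(A does not occur) = 1 - pr(A does occur): $\operatorname{pr}(\bar{A}) = 1 - \operatorname{pr}(A)$
- For two mutually exclusive events, the probability that one or the other occurs is the sum of their probabilities: $\operatorname{pr}(A \text{ or } B) = \operatorname{pr}(A) + \operatorname{pr}(B)$

Mutually exclusive events
- B = {throw a 1}, A = {throw a 6}
- $\operatorname{pr}(A \text{ or } B) = 1/6 + 1/6 = 1/3$

Rules for working with probabilities
- The addition rule holds for any number of mutually exclusive events:
  $\operatorname{pr}(A_1 \text{ or } A_2 \text{ or } \ldots \text{ or } A_k) = \operatorname{pr}(A_1) + \operatorname{pr}(A_2) + \cdots + \operatorname{pr}(A_k)$

... and if they're not mutually exclusive?
- When two events are not mutually exclusive, adding their probabilities double counts the intersection; instead we use the rule
  $\operatorname{pr}(A \text{ or } B) = \operatorname{pr}(A) + \operatorname{pr}(B) - \operatorname{pr}(A \text{ and } B)$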
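These rules can be checked numerically on the fair-die example from the earlier slides. The sketch assumes equally likely outcomes, so `pr` (an illustrative helper) just counts outcomes; the events are the "even spots" and "fewer than 5 spots" sets used above.

```python
from fractions import Fraction

S = set(range(1, 7))  # fair six-sided die, outcomes equally likely

def pr(event):
    """Equally likely outcomes: outcomes in the event over outcomes in S."""
    return Fraction(len(event & S), len(S))

A = {2, 4, 6}        # even spots
B = {1, 2, 3, 4}     # fewer than 5 spots

# General addition rule: subtract the intersection, which is counted twice.
direct = pr(A | B)
by_rule = pr(A) + pr(B) - pr(A & B)
print("pr(A or B) directly:", direct, " by the rule:", by_rule)

# Mutually exclusive case: the intersection is empty, so the terms just add.
one, six = {1}, {6}
print("pr(1 or 6) =", pr(one | six), "=", pr(one), "+", pr(six))

# Complement rule.
print("pr(not A) =", pr(S - A), "= 1 -", pr(A))
```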

Screening tests
- These are tests designed to determine if someone might have a particular medical condition
- Typically they are applied to a large segment of the population, and the positives are subjected to further diagnostic tests

Screening tests and a 2-by-2 table

                      Disease status
                      Y                  N
Test result   Pos     sick and pos       well and pos
              Neg     sick and neg       well and neg

Screening tests and a 2-by-2 table

                      Disease status
                      Y                  N
Test result   Pos     pr(sick and pos)   pr(well and pos)   pr(pos)
              Neg     pr(sick and neg)   pr(well and neg)   pr(neg)
                      pr(sick)           pr(well)

Conditional probability
- The conditional probability of A occurring given that B has occurred is denoted $\operatorname{pr}(A \mid B)$ and is defined by
  $\operatorname{pr}(A \mid B) = \dfrac{\operatorname{pr}(A \text{ and } B)}{\operatorname{pr}(B)}$
- This gives us the multiplicative rule for probabilities:
  $\operatorname{pr}(A \text{ and } B) = \operatorname{pr}(A \mid B)\,\operatorname{pr}(B)$

Conditional probability

                      Disease status
                      Y                  N
Test result   Pos     pr(sick and pos)   pr(well and pos)   pr(pos)
              Neg     pr(sick and neg)   pr(well and neg)   pr(neg)
                      pr(sick)           pr(well)

Conditional probability
- The conditional probability that a test is positive given that someone is sick is pr(positive and sick) / pr(sick)
- What is the probability that someone is sick given that the test is negative?
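One way to work with the 2-by-2 table is to store the four joint probabilities and derive everything else from them. The numbers below are invented for illustration (they correspond to 1% of people being sick, 90% of sick people testing positive, and 95% of well people testing negative); none of them come from the lecture, and the helper names are hypothetical.

```python
from fractions import Fraction

# Hypothetical joint probabilities for the 2-by-2 table (invented values).
joint = {
    ("sick", "pos"): Fraction(9, 1000),
    ("sick", "neg"): Fraction(1, 1000),
    ("well", "pos"): Fraction(495, 10000),
    ("well", "neg"): Fraction(9405, 10000),
}

def pr_status(status):
    """Column margin: pr(sick) or pr(well)."""
    return sum(p for (s, _), p in joint.items() if s == status)

def pr_result(result):
    """Row margin: pr(pos) or pr(neg)."""
    return sum(p for (_, r), p in joint.items() if r == result)

def pr_result_given_status(result, status):
    """pr(result | status) = pr(status and result) / pr(status)."""
    return joint[(status, result)] / pr_status(status)

def pr_status_given_result(status, result):
    """pr(status | result) = pr(status and result) / pr(result)."""
    return joint[(status, result)] / pr_result(result)

print("pr(pos | sick) =", pr_result_given_status("pos", "sick"))
print("pr(sick | neg) =", pr_status_given_result("sick", "neg"))
```

The two conditional probabilities divide the same kind of cell by different margins, which is why pr(pos | sick) and pr(sick | pos) generally differ.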