Probability Distributions


Probability
This is not a math class, or an applied math class, or a statistics class; it is a computer science course! Still, probability, which is a math-y concept, underlies much of what we will do in this course. You might not know it, but you are likely already at least somewhat familiar with probability. If you flip a coin, what is the chance that it will turn up heads? If you roll a die, what is the chance you will roll a 6?

M&M's
2016 TA Andreas loves M&M's; he once ate a bag of 55 M&M's in less than 10 seconds!
M&M's have one variable, color, which has six possible values (outcomes): brown, red, yellow, green, blue, and orange.
The bag Andreas inhaled contained 17 brown, 18 red, 7 yellow, 7 green, 2 blue, and 4 orange M&M's.

Frequency distribution for Andreas' bag of M&M's
A count of the number of times each outcome occurs is called a(n absolute) frequency distribution. This information can be conveyed in a table or a plot.

Color    Frequency
Brown    17
Red      18
Yellow   7
Green    7
Blue     2
Orange   4

Relative frequency distribution for Andreas' M&M's
The proportion of times each outcome occurs is called a relative frequency distribution. Count the number of times each outcome occurs, and then divide the individual counts by the total count. This last step is called normalizing.

Color    Relative frequency
Brown    17/55 = .31
Red      18/55 = .33
Yellow   7/55 = .13
Green    7/55 = .13
Blue     2/55 = .04
Orange   4/55 = .07
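The normalizing step can be sketched in a few lines of Python. This is only an illustration: the counts are the ones given for Andreas' bag, while the names `counts`, `normalize`, and `relative` are made up for this example.

```python
# Absolute frequencies from Andreas' bag (taken from the table above).
counts = {"brown": 17, "red": 18, "yellow": 7, "green": 7, "blue": 2, "orange": 4}


def normalize(frequencies):
    """Divide each absolute count by the total count."""
    total = sum(frequencies.values())
    return {outcome: count / total for outcome, count in frequencies.items()}


relative = normalize(counts)
bag_size = sum(counts.values())

# Print each color with its count and its proportion of the bag.
for color, proportion in relative.items():
    print(f"{color:<7} {counts[color]:>2}/{bag_size} = {proportion:.2f}")
```

Because every count is divided by the same total, the relative frequencies add up to 1.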

What is a Probability Distribution?
At the M&M's factory, the machine puts some number of each color of M&M into the bag. The machine operates with some variability: sometimes there are a lot of a color you love, and other times there are not so many. The distribution tells us, on average, how many M&M's of each color go into the bag. If we don't know the actual distribution, sampling (i.e., eating a bag of M&M's!) allows us to estimate it.

Probability distribution for M&M's
A probability distribution describes the chance of each possible outcome. The sum of all probabilities is always 1.0. The probability of any event (in this case, the color of an M&M) is always bounded between 0 and 1, inclusive.

Color    Probability
Brown    .14
Red      .13
Yellow   .14
Green    .16
Blue     .24
Orange   .2
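The two rules above (each probability lies in [0, 1], and all of them sum to 1.0) are easy to check mechanically. The sketch below is a hedged illustration; `is_valid_distribution` and `renormalize` are hypothetical helper names. Note that the six values as listed add up to 1.01, presumably an artifact of rounding, so the sketch rescales them before treating them as a distribution.

```python
# Probabilities exactly as listed in the table above.
listed = {"brown": 0.14, "red": 0.13, "yellow": 0.14,
          "green": 0.16, "blue": 0.24, "orange": 0.20}


def is_valid_distribution(probabilities, tolerance=1e-9):
    """Check that every probability is in [0, 1] and that they sum to 1."""
    in_range = all(0.0 <= p <= 1.0 for p in probabilities.values())
    sums_to_one = abs(sum(probabilities.values()) - 1.0) <= tolerance
    return in_range and sums_to_one


def renormalize(weights):
    """Rescale non-negative weights so that they sum to exactly 1."""
    total = sum(weights.values())
    return {outcome: weight / total for outcome, weight in weights.items()}


print("sum of listed values:", round(sum(listed.values()), 2))
print("listed values valid? ", is_valid_distribution(listed))
print("rescaled values valid?", is_valid_distribution(renormalize(listed)))
```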

Sample vs. true distribution
[Plots: relative frequency distribution of Andreas' bag (sample) alongside the true probability distribution set by the M&M's manufacturer]

Error
Blue M&M's are the most common color, with probability 0.24. In the sample, blue was observed with relative frequency 2/55 = .04. The relative error can be calculated as (.24 - .04)/.24 * 100% = 83.33%. There were about 83% fewer blue M&M's than expected.
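The same error formula, (expected - observed)/expected * 100%, can be applied to every color at once. The following is a minimal sketch that uses the bag counts and the listed probabilities from the earlier slides; `percent_error` is a hypothetical helper name, and the observed proportions are computed from the unrounded counts, so results can differ slightly from hand calculations with rounded values.

```python
def percent_error(expected, observed):
    """Shortfall of the observed value relative to the expected one, in percent.

    Positive means fewer than expected; negative means more than expected.
    """
    return (expected - observed) / expected * 100


# Counts from Andreas' bag and the listed probabilities for each color.
bag_counts = {"brown": 17, "red": 18, "yellow": 7, "green": 7, "blue": 2, "orange": 4}
listed = {"brown": 0.14, "red": 0.13, "yellow": 0.14,
          "green": 0.16, "blue": 0.24, "orange": 0.20}

bag_size = sum(bag_counts.values())
for color, expected in listed.items():
    observed = bag_counts[color] / bag_size
    error = percent_error(expected, observed)
    print(f"{color:<7} expected {expected:.2f}  observed {observed:.2f}  error {error:+.1f}%")
```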

The Law of Large Numbers
The relative frequency distribution of one bag (i.e., a sample) can differ from the true probability distribution. But in general, very large bags of M&M's will mimic the proportions set by the M&M's manufacturer. This is the Law of Large Numbers.
Imagine flipping a fair coin 4 times; you might see 3 heads. But by flipping the coin 100 times, you'll likely see close to 50 heads.
Likewise, the average of the relative frequency distributions of more and more (small) bags of M&M's (i.e., the sampling distribution) should approach the true underlying distribution.
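A small simulation can make this concrete. The sketch below rests on stated assumptions: the coin is fair, every M&M in a bag is drawn independently, and the listed color probabilities are used as weights (`random.choices` rescales weights, so their sum does not need to be exactly 1). The function names are made up for this example.

```python
import random


def heads_fraction(num_flips, rng):
    """Flip a fair coin num_flips times and return the fraction of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(num_flips))
    return heads / num_flips


def average_bag_frequencies(num_bags, bag_size, weights, rng):
    """Average the relative frequency distributions of many simulated bags."""
    colors = list(weights)
    color_weights = [weights[color] for color in colors]
    totals = dict.fromkeys(colors, 0.0)
    for _ in range(num_bags):
        bag = rng.choices(colors, weights=color_weights, k=bag_size)
        for color in colors:
            totals[color] += bag.count(color) / bag_size
    return {color: totals[color] / num_bags for color in colors}


# Listed color probabilities, used here only as sampling weights.
weights = {"brown": 0.14, "red": 0.13, "yellow": 0.14,
           "green": 0.16, "blue": 0.24, "orange": 0.20}

rng = random.Random(0)  # fixed seed so that reruns give the same numbers
for n in (4, 100, 10_000, 100_000):
    print(f"{n:>7} flips: fraction of heads = {heads_fraction(n, rng):.4f}")

for num_bags in (1, 10, 1000):
    averaged = average_bag_frequencies(num_bags, 55, weights, rng)
    print(f"{num_bags:>5} bags: average blue frequency = {averaged['blue']:.3f}")
```

With few flips or few bags the results jump around; as the numbers grow, they settle near 0.5 for the coin and near the underlying weight for each color.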

There are many, many model probability distributions
Here's a link to a map of 50+ probability distributions, showing how they all relate!

Features of Probability Distributions
The center is the mean, median, or mode.
The spread is the variability of the data.
Shape can be described by symmetry, skewness, number of peaks (modes), etc.
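As a rough illustration of these features, Python's standard `statistics` module can compute the usual measures of center and spread. The data values below are made up purely for this example; they do not come from the slides.

```python
import statistics

# Hypothetical measurements, invented for illustration only.
values = [1, 2, 2, 3, 7]

# Center: three common ways to summarize a "typical" value.
center = {
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "mode": statistics.mode(values),
}

# Spread: how much the values vary.
spread = {
    "range": max(values) - min(values),
    "std_dev": statistics.pstdev(values),  # population standard deviation
}

print("center:", center)
print("spread:", spread)

# One rough hint about shape: a mean above the median often indicates
# a longer tail toward large values (right skew).
print("mean > median:", center["mean"] > center["median"])
```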

Summary
A probability distribution describes the likelihood that (random) variables take on different values.
From data, we can build a frequency distribution.
Distributions, in general, are complex mathematical functions.
But we can summarize them by their features, such as their center, dispersion, shape, etc.
Descriptive statistics like these can be more informative than raw data.

Qualitative vs. Quantitative Data

Data can be either qualitative or quantitative

Qualitative data
Qualitative data describe qualities, like color, texture, smell, taste, appearance, etc.
Many qualitative data are categorical, e.g.:
the color of a ball (yellow, blue, or red);
the brand of a product purchased (brand A, B, or C);
whether a person is employed (yes or no).

Qualitative data can be nominal or ordinal
Nominal means that there is no natural order among the values.
Ordinal means that there is a natural ordering.

Quantitative data
Quantitative data take on numerical values, so they typically have a natural order (i.e., they are ordinal).
Examples:
age, weight, height, income, etc.;
the value of a country's exports;
a batter's number of home runs.

Quantitative data can be either discrete or continuous
Data are discrete if the measurements are necessarily integral (i.e., integers).
Data are continuous if the measurements can take on any value, usually within some range.

A First Visualization: Histograms

Population study
The population under study is the 65 students in our class. We asked you how many languages you speak (besides English). Here were your responses:

0 1 0 0 1 1 0 0 1 1 2 1 4 1 2 1 2 2 1 2
0 1 1 2 1 0 2 0 1 0 0 1 1 2 0 1 1 1 1 0
2 1 1 1 0 1 1 2 0 1 2 0 1 0 1 1 1 1 1 1
1 0 2 0 1

Each response is a measurement, or an outcome.

Histograms of discrete, quantitative data
A histogram is a plot of a frequency distribution, when the data are numerical (this description is necessary, not sufficient; formal definition coming soon).
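The frequency distribution of the 65 language responses can be tallied and drawn as a simple text histogram. This is a minimal sketch: the responses are copied from the list above, and the variable names are illustrative.

```python
from collections import Counter

# The 65 survey responses (number of languages spoken besides English).
responses_text = (
    "0 1 0 0 1 1 0 0 1 1 "
    "2 1 4 1 2 1 2 2 1 2 "
    "0 1 1 2 1 0 2 0 1 0 "
    "0 1 1 2 0 1 1 1 1 0 "
    "2 1 1 1 0 1 1 2 0 1 "
    "2 0 1 0 1 1 1 1 1 1 "
    "1 0 2 0 1"
)
responses = [int(token) for token in responses_text.split()]

# Absolute frequency of each outcome.
frequency = Counter(responses)

# One row per value on the number line, including values nobody reported,
# so that gaps in the data stay visible.
for value in range(min(responses), max(responses) + 1):
    print(f"{value} | {'#' * frequency[value]} ({frequency[value]})")
```

Looping over the whole range, rather than only over the observed values, keeps the horizontal axis a number line: the empty row at 3 shows the gap between 2 and 4.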

Histograms of continuous, quantitative data
Frequency distributions can also be made of continuous data, by clumping similar values into bins (or buckets).
Frequency distributions of binned, quantitative data can be displayed in tables, or they can be plotted, as histograms.

Frequency table of (two) crew teams' weights
This data set contains the weights, in pounds, of the crews participating in the Oxford-Cambridge boat race in 1992.
The first table (raw data) has the first 15 rows of these data.
The second table (frequency table) was created by binning the data into intervals of size 10.
[Tables: raw data; frequency table]
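Binning into intervals of size 10 can be sketched as follows. The weights used here are invented for illustration, since the raw crew data are not reproduced in this transcript; `bin_counts` is a hypothetical helper name.

```python
from collections import Counter


def bin_counts(values, width):
    """Count how many values fall into each interval [start, start + width)."""
    counts = Counter(int(value // width) * width for value in values)
    return dict(sorted(counts.items()))


# Hypothetical weights in pounds; NOT the actual 1992 crew data.
weights_lb = [109.0, 183.5, 186.0, 188.5, 194.5, 195.0, 202.5, 204.0]

frequency_table = bin_counts(weights_lb, width=10)
for start, count in frequency_table.items():
    print(f"{start}-{start + 10}: {count}")
```

Only bins that contain at least one value appear in this table; a plotted histogram would also leave room on the axis for the empty bins in between.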

Histogram of the weights of (two) crew teams
[Figure: histogram, shown with the raw data and frequency table]

Relative vs. Absolute Frequencies
We obtain relative frequencies by dividing absolute frequencies by the sample size. This is called normalization.
So, relative frequencies are proportions; they tell us what fraction (or percentage) of the data set falls into each bin.
Relative frequencies are easier to compare with one another.

Normalized histogram of the teams' weights
[Figure: normalized histogram, shown with the raw data and relative frequency table]

Normalized vs. unnormalized histograms
Observe that the shape of the histograms is the same, whether we plot absolute (left) or relative (right) frequencies. Only the scale (y-axis) differs.
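The claim that only the scale changes can be checked numerically. The sketch below uses the tallies from the class language survey (18, 34, 12, and 1 responses for 0, 1, 2, and 4 languages) rather than the crew weights; `bar_lengths` is a hypothetical helper that scales the tallest bar to a fixed width.

```python
# Absolute frequencies tallied from the 65 survey responses
# (number of languages spoken -> number of students).
absolute = {0: 18, 1: 34, 2: 12, 4: 1}
sample_size = sum(absolute.values())

# Normalization: divide every absolute frequency by the sample size.
relative = {value: count / sample_size for value, count in absolute.items()}


def bar_lengths(frequencies, max_width=34):
    """Scale every bar so that the tallest one is max_width characters long."""
    tallest = max(frequencies.values())
    return {value: round(freq / tallest * max_width)
            for value, freq in frequencies.items()}


abs_bars = bar_lengths(absolute)
rel_bars = bar_lengths(relative)

# Same bar for both versions; only the label on the y-axis scale differs.
for value in absolute:
    bar = "#" * abs_bars[value]
    print(f"{value} | {bar:<34} {absolute[value]:>2} | {relative[value]:.3f}")

print("same shape:", abs_bars == rel_bars)
```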

Histograms
Histograms are bar charts for quantitative data.
Each bar is associated with a range of neighboring values (i.e., buckets or bins).
The horizontal axis of a histogram is continuous, like a number line (not categorical, like M&M colors).