Statistical Concepts. Distributions of Data

Module: Review of Basic Statistical Concepts

Understanding Probability Distributions, Parameters and Statistics

A variable that can take on any value in a range is called a continuous variable. Example: the concentration of a contaminant in water samples.

A variable that can take on only certain values is called discrete. Example: the number of animals visiting a contaminated site in a single day.

A probability distribution describes the values that a variable can take on and the probabilities associated with those values. We use probability density functions (pdfs) to describe these distributions. We can also use cumulative distribution functions (cdfs). For continuous variables, common distributions are the uniform, the triangular, the normal, and the lognormal.

Discrete probability distributions simply show the probability of each value occurring. Note that the sum of all of the probabilities in a discrete pdf is one. For continuous pdfs, the area under the curve equals one.

Examples of Discrete Probability Distribution Functions (pdfs)

[Figure: three bar charts of probability against outcome: Toss of a Fair Coin (Head, Tail), Roll of a Fair Die (1 through 6), and General Example of a Discrete pdf (probability against value).]

If you sum up the probabilities shown, they sum to 1.

The Binomial Distribution

Applies in a situation where there are two possible outcomes (success and failure) and the probability of success is constant.

Example: Failure is defined to be contamination above a regulatory limit. Assume contamination is uniformly dispersed throughout an area such as a lake and n samples are collected. There will be variability in the amount of measured contamination in the samples due to sampling and measurement errors. There is a probability p that each of the samples will show contamination above the limit.
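As a rough sketch of the binomial example above, the probability of seeing exactly k samples above the limit out of n can be computed directly from the binomial formula. The values n = 10 and p = 0.2 are hypothetical and chosen only for illustration; they do not come from the slides.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability that exactly k of n independent samples show the counted event."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical numbers: 10 samples, each with a 0.2 chance of exceeding the limit.
n, p = 10, 0.2
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

prob_none = pmf[0]                 # no sample exceeds the limit
prob_at_least_one = 1 - prob_none  # one or more samples exceed the limit

print(f"P(no exceedances)        = {prob_none:.4f}")
print(f"P(at least one)          = {prob_at_least_one:.4f}")
print(f"Sum of all probabilities = {sum(pmf):.4f}")
```

The last line checks the property stated earlier: the probabilities of a discrete distribution sum to one.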

Examples of Continuous Probability Distribution Functions (pdfs)

[Figure: four panels plotting probability against value: Uniform Distribution, Triangular Distribution, Normal Distribution, Lognormal Distribution.]

Examples of Continuous Probability Distribution Functions (cdfs)

[Figure: four panels plotting cumulative probability against value: Uniform Distribution, Triangular Distribution, Normal Distribution, Lognormal Distribution.]

Values that define key characteristics of probability distributions are called parameters. Parameters are true values that are unknown and generally unknowable.

Parameters and Statistics

A parameter is a characteristic of a population. It is a value that we would only know if we had perfect information about the entire population. Since we never have this kind of knowledge, parameters can be considered unknown. They are the quantities that we try to estimate from our data.

Statistics are quantities calculated from data. For each parameter, there are one or more statistics that estimate it.

Parameters and Statistics

Example: The population mean is a parameter denoted by $\mu$. The sample mean estimates $\mu$ and is denoted by $\bar{Y}$ (Y with a bar over it, called "Y-bar").

Notation: N = number of units in the population; n = number of units in the sample.

$$\mu = \frac{1}{N} \sum_{i=1}^{N} Y_i \qquad \bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i$$

Parameters and Statistics

The population standard deviation is a parameter denoted by $\sigma$. The sample standard deviation estimates it and is denoted by s.

$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (Y_i - \mu)^2} \qquad s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2}$$
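The sample formulas above can be sketched in a few lines of Python. The five data values are hypothetical, and the n - 1 divisor for s is the usual convention, assumed here to match the slide. The standard library's `statistics.stdev` uses the same divisor and serves as a cross-check.

```python
import statistics

# Hypothetical sample of n = 5 measurements (illustrative values only).
sample = [2.0, 4.0, 4.0, 5.0, 10.0]
n = len(sample)

# Sample mean: sum of the observations divided by n.
y_bar = sum(sample) / n

# Sample standard deviation: squared deviations summed, divided by n - 1, then square-rooted.
s = (sum((y - y_bar) ** 2 for y in sample) / (n - 1)) ** 0.5

print(f"sample mean = {y_bar}")
print(f"sample standard deviation = {s}")
print(f"statistics.stdev gives    = {statistics.stdev(sample)}")
```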

Parameters and Statistics

Why square them and then have to take the square root?
If you added up all of the deviations from the mean, the sum would be zero.
We must get them all to be positive values.
It is easiest to square them and then take a square root.

Parameters and Statistics

Once we have data and an equation to calculate a statistic, it's simple arithmetic to get the estimate. However, the estimate is just that; it's not the actual value of the parameter. The true value of the parameter might be higher or lower than our estimate.

If the population were defined as the students registered for this class today, there is a true mean height of that population. However, even if I tried to collect data on this population, I couldn't know the true mean. Why?
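Returning to the point about squaring the deviations, a small numerical check may help. This is only a sketch with hypothetical data: the raw deviations from the mean cancel out, while the squared deviations do not.

```python
# Hypothetical data (illustrative values only).
sample = [2.0, 4.0, 4.0, 5.0, 10.0]
y_bar = sum(sample) / len(sample)

# Deviations from the mean: negative and positive values cancel.
deviations = [y - y_bar for y in sample]
sum_dev = sum(deviations)

# Squared deviations are all non-negative, so they accumulate.
sum_sq = sum(d ** 2 for d in deviations)

print(f"deviations                = {deviations}")
print(f"sum of deviations         = {sum_dev}")
print(f"sum of squared deviations = {sum_sq}")
```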

Parameters and Statistics

However, I can collect a sample and calculate a sample mean that would estimate the true mean. I could also calculate a sample standard deviation and create a confidence interval for the true mean. The confidence interval would be a range with a probability attached. It has that probability of including the true mean.

The uniform distribution means that there is a range of values defined by the parameters minimum and maximum. All of the values in between have an equal probability of occurring.

[Figure: Uniform Distribution, probability against value.]
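The uniform distribution described above can be sketched as follows. The minimum of 2 and maximum of 8 are hypothetical, and the helper names `uniform_pdf` and `uniform_prob` are made up for this illustration.

```python
def uniform_pdf(x: float, lo: float, hi: float) -> float:
    """Density of a uniform distribution on [lo, hi]: constant inside, zero outside."""
    return 1.0 / (hi - lo) if lo <= x <= hi else 0.0

def uniform_prob(a: float, b: float, lo: float, hi: float) -> float:
    """P(a <= X <= b) for X uniform on [lo, hi], as the share of the range covered."""
    left, right = max(a, lo), min(b, hi)
    return max(0.0, right - left) / (hi - lo)

# Hypothetical parameters: minimum 2, maximum 8.
lo, hi = 2.0, 8.0
print(f"density inside the range  = {uniform_pdf(5.0, lo, hi):.4f}")
print(f"density outside the range = {uniform_pdf(9.0, lo, hi):.4f}")
print(f"P(3 <= X <= 5)            = {uniform_prob(3.0, 5.0, lo, hi):.4f}")
print(f"P(whole range)            = {uniform_prob(lo, hi, lo, hi):.4f}")
```

The probability over the whole range is one, which is the "area under the curve equals one" property for a continuous pdf.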

The triangular has a minimum, a maximum, and a most likely value.

[Figure: Triangular Distribution, probability against value.]

The normal is the bell-shaped curve; its parameters are the mean and standard deviation.

[Figure: Normal Distribution, probability against value.]

The lognormal also has parameters of the mean and standard deviation. With the lognormal, smaller values have a higher probability of occurring, and larger values have a smaller and smaller probability of occurring.

[Figure: Lognormal Distribution, probability against value.]

A distribution that is not symmetric is said to be skewed. The direction of the skewness is the direction of the long tail. A lognormal distribution is said to be skewed right. This is often counter-intuitive.
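One way to see the right skew is to compare the mode, median, and mean of a lognormal. The sketch below assumes the common convention that the lognormal is parameterized by the mean and standard deviation of the logarithm of the variable (named `mu_log` and `sigma_log` here, both hypothetical values). The slides do not say which parameterization they use.

```python
import math

# Assumed parameterization: mean and standard deviation of log(X). Hypothetical values.
mu_log, sigma_log = 0.0, 1.0

mode = math.exp(mu_log - sigma_log ** 2)      # most likely value
median = math.exp(mu_log)                     # half of the probability lies below this
mean = math.exp(mu_log + sigma_log ** 2 / 2)  # pulled upward by the long right tail

print(f"mode   = {mode:.3f}")
print(f"median = {median:.3f}")
print(f"mean   = {mean:.3f}")
```

The ordering mode < median < mean reflects the long tail on the right: a few large values pull the mean above the median.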

More on the Normal Distribution

The normal distribution is the bell-shaped curve. Many things that occur in nature follow a normal distribution.

Some characteristics:
It has two parameters: the mean mu ($\mu$) and the standard deviation sigma ($\sigma$).
It is shown as $N(\mu, \sigma)$.

The Normal Distribution

Some characteristics:
About 68.3% of the probability of a normal lies within plus and minus one standard deviation from the mean.
About 95.4% lies within plus and minus 2 standard deviations from the mean.
About 99.7% lies within plus and minus 3 standard deviations from the mean.
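These percentages can be checked numerically. For a normal distribution, the probability of falling within k standard deviations of the mean equals erf(k / sqrt(2)), which the sketch below evaluates with the standard library. The helper name `prob_within` is made up for this illustration.

```python
import math

def prob_within(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within +/- {k} standard deviation(s): {prob_within(k):.4f}")
```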

The Standard Normal Distribution

The standard normal has $\mu = 0$ and $\sigma = 1$. It is shown as $N(0, 1)$. Any normal distribution can be transformed into a standard normal by $Z = (X - \mu)/\sigma$.

The Standard Normal Distribution

There are an infinite number of normal distributions because there are an infinite number of combinations of $\mu$ and $\sigma$. By transforming to a standard normal using $Z = (X - \mu)/\sigma$, you only need a table of the standard normal.
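A short sketch of the transformation, using Python's `statistics.NormalDist`. The mean of 50, standard deviation of 10, and observed value of 65 are hypothetical. The point is that the cumulative probability computed on the original scale matches the one computed on the standard normal after converting to Z.

```python
from statistics import NormalDist

# Hypothetical normal distribution and observed value.
mu, sigma = 50.0, 10.0
x = 65.0

# Standardize: how many standard deviations x lies above the mean.
z = (x - mu) / sigma

p_direct = NormalDist(mu, sigma).cdf(x)  # P(X <= x) on the original scale
p_standard = NormalDist().cdf(z)         # P(Z <= z) on the standard normal

print(f"z = {z}")
print(f"P(X <= {x}) = {p_direct:.4f}")
print(f"P(Z <= {z}) = {p_standard:.4f}")
```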

Using the Table of the Normal Distribution

Tables such as Table A2. in Manly relate values of Z to the probability (area) under the standard normal pdf from zero to that value.

Example: The probability of a value sampled randomly from the standard normal distribution falling between the mean and one standard deviation above it is found by looking up the probability in the table associated with Z = 1.0. It is 0.341. So, the probability of a value falling within 1 standard deviation from the mean is double that, or 0.682 (0.683 when less heavily rounded table values are used). Likewise, the probability of a value falling within plus and minus 2 standard deviations is 2 * 0.477 = 0.954.

Using the Table of the Normal Distribution

Another Example: Let's say you want to look up the Z value associated with a 95% confidence interval. You need the value with 2.5% or 0.025 in the tail. For this table, you need to subtract that value from 0.5. So 0.5 - 0.025 = 0.475. So, look for 0.475 in the table and then find the Z value associated with it. There it is: Z = 1.96. That's the Z value to use for a 95% confidence interval.
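The table lookups above can be reproduced, as a sketch, with `statistics.NormalDist`. Because the table gives the area from zero to Z, the code subtracts 0.5 from the cumulative probability, and it adds 0.5 back before inverting to find the Z value.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Area under the standard normal pdf from 0 to Z, as tabulated.
area_0_to_1 = std_normal.cdf(1.0) - 0.5
area_0_to_2 = std_normal.cdf(2.0) - 0.5

# Reverse lookup: the Z value whose area from 0 is 0.475 (2.5% left in the upper tail).
z_95 = std_normal.inv_cdf(0.5 + 0.475)

print(f"area from 0 to 1.0 = {area_0_to_1:.4f}")
print(f"area from 0 to 2.0 = {area_0_to_2:.4f}")
print(f"Z for a 95% confidence interval = {z_95:.3f}")
```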

The Student's t Distribution

The Student's t distribution is similar to the normal but with fatter tails. It is used when the true population standard deviation is not known (most of the time). The exact shape of the t distribution is controlled by the number of data points used to calculate the sample standard deviation. When n is small, the distribution is wide with fat tails. When n gets large, the estimate of $\sigma$ is good and the t distribution approaches the shape of a normal distribution. This index is called the degrees of freedom (df). For use with the t distribution, df = n - 1.

Using the Table of the t Distribution

Table A2.2 of Manly gives some selected values from t distributions with degrees of freedom ranging from 1 to infinity.

Example: If you have 10 data points, you have 9 degrees of freedom. If you want the value along the t scale that has 97.5% of the probability below it and 2.5% above, use the second column. That t value is 2.262. If you have an infinite number of data points, it's 1.96, just like the Z table.
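To show how the tabulated t value might be used, here is a minimal sketch of a 95% confidence interval for a mean from 10 data points. The data are hypothetical. The critical values 2.262 (9 degrees of freedom) and 1.96 are the ones quoted above and are typed in by hand, since Python's standard library has no t distribution.

```python
import statistics

# Hypothetical sample of 10 measurements (illustrative values only).
sample = [4.1, 5.3, 4.8, 5.9, 5.0, 4.4, 5.6, 5.2, 4.7, 5.0]
n = len(sample)
df = n - 1

t_crit = 2.262  # t value for df = 9 with 2.5% in the upper tail, as quoted in the text
z_crit = 1.96   # corresponding value for infinite degrees of freedom

y_bar = statistics.mean(sample)
s = statistics.stdev(sample)    # sample standard deviation (n - 1 divisor)
standard_error = s / n ** 0.5

t_half_width = t_crit * standard_error
z_half_width = z_crit * standard_error
ci = (y_bar - t_half_width, y_bar + t_half_width)

print(f"n = {n}, df = {df}")
print(f"sample mean = {y_bar:.3f}, s = {s:.3f}")
print(f"95% CI using t: ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"half-width using t = {t_half_width:.3f}, using Z = {z_half_width:.3f}")
```

The interval based on t is wider than the one based on Z, which reflects the fatter tails when the standard deviation is estimated from only a few data points.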

That's it for now!

That's a quick review of some basic concepts. We'll continue in Module.2.