Stats for Engineers: Lecture 4


Summary from last time: The standard deviation σ measures the spread of a distribution about its mean μ. Variance = (standard deviation)²: σ² = var(X) = Σ_k (k − μ)² P(X = k) = Σ_k k² P(X = k) − μ². Discrete Random Variables. Binomial distribution: the number k of successes from n independent Bernoulli (YES/NO) trials, P(X = k) = C(n,k) p^k (1 − p)^(n−k).

Example: A component has a 20% chance of being a dud. If five are selected from a large batch, what is the probability that more than one is a dud? Answer: Let X = number of duds in a selection of 5. Each selection is a Bernoulli trial (dud or not dud), so X ~ B(5, 0.2). P(More than one dud) = P(X > 1) = 1 − P(X ≤ 1) = 1 − P(X = 0) − P(X = 1) = 1 − C(5,0) 0.2^0 0.8^5 − C(5,1) 0.2^1 0.8^4 = 1 − 0.32768 − 0.4096 ≈ 0.263.
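As a quick sanity check, the same numbers come out of a few lines of Python (a sketch using only the standard library; the helper name `binom_pmf` is mine, not from the lecture):

```python
from math import comb

def binom_pmf(k, n, p):
    # P(X = k) for X ~ B(n, p)
    return comb(n, k) * p**k * (1 - p)**(n - k)

# n = 5 components, p = 0.2 chance each is a dud
p_more_than_one = 1 - binom_pmf(0, 5, 0.2) - binom_pmf(1, 5, 0.2)
print(round(p_more_than_one, 3))  # 0.263
```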

Binomial or not? A mixed box of 10 screws contains 5 that are galvanized and 5 that are non-galvanized. Three screws are picked at random without replacement. I want galvanized screws, so consider picking a galvanized screw to be a success. Does the number of successes have a Binomial distribution? 1. Yes  2. No

Binomial or not? A mixed box of 10 screws contains 5 that are galvanized and 5 that are non-galvanized. Three screws are picked at random without replacement. I want galvanized screws, so consider picking a galvanized screw to be a success. Does the number of successes have a Binomial distribution? No, the picks are not independent. Independent events have P(A|B) = P(A), but P(second galvanized | first galvanized) ≠ P(second galvanized): if the first is galvanized, then only 4/9 of the remaining screws are galvanized, which is ≠ 1/2. Note: if the box were much larger, then consecutive picks would be nearly independent, so the Binomial would then be a good approximation. E.g. for a box of 1000 screws with 500 galvanized, P(second galvanized | first galvanized) = 499/999 = 0.4995 ≈ 1/2.

Mean and variance of a binomial distribution: If X ~ B(n, p), then μ = E(X) = np and σ² = var(X) = np(1 − p). Derivation: Suppose first that we have a single Bernoulli trial. Assign the value 1 to success and 0 to failure, the first occurring with probability p and the second with probability 1 − p. The expected value for one trial is μ_1 = Σ_k k P(X = k) = 1×p + 0×(1 − p) = p. Since the n trials are independent, the total expected value is just the sum of the n expected values for each trial, hence μ = Σ_(i=1)^n μ_1 = np. The variance in a single trial is σ_1² = ⟨X²⟩ − ⟨X⟩² = 1²×p + 0²×(1 − p) − p² = p(1 − p). Hence the variance for the sum of n independent trials is, by the rule for summing the variances of independent variables, σ² = Σ_(i=1)^n σ_1² = np(1 − p).
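The results μ = np and σ² = np(1 − p) can be verified by summing directly over the binomial pmf, here with the dud example's n = 5, p = 0.2 (a standard-library sketch):

```python
from math import comb

n, p = 5, 0.2  # same numbers as the dud example
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

mu = sum(k * pk for k, pk in enumerate(pmf))              # should equal np = 1.0
var = sum(k**2 * pk for k, pk in enumerate(pmf)) - mu**2  # should equal np(1-p) = 0.8
```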

Polling: In the French population about 20% of people prefer Le Pen to other candidates (inc. Hollande and Sarkozy). Mean and variance of a binomial distribution: μ = np, σ² = np(1 − p). An opinion poll asks 1000 people if they will vote for Le Pen (YES) or not (NO). The expected number of Le Pen voters (YESs) in the poll is therefore μ = np = 200. What is the standard deviation (approximately)? 1. 6.3  2. 12.6  3. 5.3  4. 10  5. 160

Polling: In the French population about 20% of people prefer Le Pen to other candidates (inc. Hollande and Sarkozy). An opinion poll asks 1000 people if they will vote for Le Pen (YES) or not (NO). The expected number of Le Pen voters (YESs) in the poll is therefore μ = np = 200. What is the standard deviation (approximately)? The number of YES votes has a distribution X ~ B(1000, 0.2). The variance is therefore σ² = np(1 − p) = 1000 × 0.2 × (1 − 0.2) = 1000 × 0.2 × 0.8 = 160, so σ = √160 ≈ 12.6: expect 200 ± 12.6 in the poll to say Le Pen, i.e. a fractional error of 12.6/1000 ≈ 1.3%. Note: quoted errors in polls are not usually the standard deviation, see later.
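The poll arithmetic in Python, shown only to make the steps explicit:

```python
n, p = 1000, 0.2
mu = n * p                        # expected number of YES answers
sigma = (n * p * (1 - p)) ** 0.5  # standard deviation, sqrt(160)
frac_err = sigma / n              # fractional error, about 1.3%
print(mu, round(sigma, 1), round(frac_err * 100, 1))  # 200.0 12.6 1.3
```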

Binomial Distribution Summary: Discrete random variable X ~ B(n, p) if X is the number of successes in n independent Bernoulli trials, each with probability p of success: P(X = k) = C(n,k) p^k (1 − p)^(n−k). Mean and variance: μ = np, σ² = np(1 − p).

Poisson distribution: If events happen independently of each other, with average number of events in some fixed interval λ, then the distribution of the number of events k in that interval is Poisson. A random variable X has the Poisson distribution with parameter λ (> 0) if P(X = k) = e^(−λ) λ^k / k!  (k = 0, 1, 2, …). If you are interested in the derivation, see the notes.

Examples of possible Poisson distributions: 1) Number of messages arriving at a telecommunications system in a day. 2) Number of flaws in a metre of fibre optic cable. 3) Number of radio-active particles detected in a given time. 4) Number of photons arriving at a CCD pixel in some exposure time (e.g. astronomy observations). Sum of Poisson variables: If X is Poisson with average number λ_X and Y is Poisson with average number λ_Y, then X + Y is Poisson with average number λ_X + λ_Y. The probability of events per unit time does not have to be constant for the total number of events to be Poisson: you can split up the total into a sum of the number of events in smaller intervals.
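The sum property can be checked numerically: convolving two Poisson pmfs reproduces the Poisson pmf with the summed mean. The rates 1.5 and 2.5 below are arbitrary illustrative choices, not from the lecture:

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    # P(X = k) for X ~ Poisson(lam)
    return exp(-lam) * lam**k / factorial(k)

lam_x, lam_y = 1.5, 2.5  # illustrative rates
k = 4

# P(X + Y = k) computed by convolving the two pmfs ...
conv = sum(poisson_pmf(j, lam_x) * poisson_pmf(k - j, lam_y) for j in range(k + 1))
# ... agrees with a single Poisson pmf with mean lam_x + lam_y
direct = poisson_pmf(k, lam_x + lam_y)
```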

Example: On average lightning kills three people each year in the UK, λ = 3. What is the probability that only one person is killed this year? Answer: Assuming these are independent random events, the number of people killed in a given year therefore has a Poisson distribution. Let the random variable X be the number of people killed in a year. Poisson distribution: P(X = k) = e^(−λ) λ^k / k! with λ = 3, so P(X = 1) = e^(−3) 3^1 / 1! ≈ 0.15.
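The same evaluation with the standard library:

```python
from math import exp, factorial

lam = 3  # average deaths per year
p_one = exp(-lam) * lam**1 / factorial(1)
print(round(p_one, 3))  # 0.149
```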

Poisson distribution. Question from Derek Bruff: Suppose that trucks arrive at a receiving dock with an average arrival rate of 3 per hour. What is the probability exactly 5 trucks will arrive in a two-hour period? 1. e^(−3) 3^5 / 5!  2. e^(−3) 3^2.5 / 2.5!  3. e^(−5) 5^6 / 6!  4. e^(−6) 6^5 / 5!  Reminder: P(X = k) = e^(−λ) λ^k / k!

Poisson distribution: Suppose that trucks arrive at a receiving dock with an average arrival rate of 3 per hour. What is the probability exactly 5 trucks will arrive in a two-hour period? In two hours the mean number is λ = 3 × 2 = 6, so P(X = 5) = e^(−λ) λ^5 / 5! = e^(−6) 6^5 / 5! ≈ 0.161.

Mean and variance: If X ~ Poisson with mean λ, then μ = E(X) = λ and σ² = var(X) = λ.

Example: Telecommunications. Messages arrive at a switching centre at random and at an average rate of 1.2 per second. (a) Find the probability of 5 messages arriving in a 2-sec interval. (b) For how long can the operation of the centre be interrupted, if the probability of losing one or more messages is to be no more than 0.05? Answer: Times of arrivals form a Poisson process, rate ν = 1.2/sec. (a) Let Y = number of messages arriving in a 2-sec interval. Then Y ~ Poisson, mean number λ = νt = 1.2 × 2 = 2.4. P(Y = 5) = e^(−λ) λ^5 / 5! = e^(−2.4) 2.4^5 / 5! = 0.0602.

Question: (b) For how long can the operation of the centre be interrupted, if the probability of losing one or more messages is to be no more than 0.05? Answer: (b) Let the required time = t seconds. The average rate of arrival is 1.2/second. Let k = number of messages in t seconds, so that k ~ Poisson with λ = 1.2 × t = 1.2t. Want P(at least one message) = P(k ≥ 1) = 1 − P(k = 0) ≤ 0.05. P(k = 0) = e^(−λ) λ^0 / 0! = e^(−1.2t), so 1 − e^(−1.2t) ≤ 0.05, i.e. e^(−1.2t) ≥ 0.95, i.e. −1.2t ≥ ln 0.95 = −0.0513, giving t ≤ 0.0427 ≈ 0.043 seconds.
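The same bound computed directly (a sketch; the variable names are mine):

```python
from math import exp, log

rate = 1.2  # messages per second
# Lose a message if one or more arrive while interrupted for t seconds:
# P(k >= 1) = 1 - e^(-rate*t) <= 0.05, i.e. e^(-rate*t) >= 0.95,
# i.e. t <= -ln(0.95) / rate
t_max = -log(0.95) / rate
print(round(t_max, 4))  # 0.0427
```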

Poisson or not? Which of the following are likely to be well modelled by a Poisson distribution? (can click more than one) Can you tell what is fish, or will you flounder? 1. Number of duds found when I test four components  2. The number of heart attacks in Brighton each year  3. The number of planes landing at Heathrow between 8 and 9am  4. The number of cars getting punctures on the M1 each year  5. Number of people in the UK flooded out of their home in July

Are they Poisson? Answers: 1. Number of duds found when I test four components - NO: this is Binomial (it is not the number of independent random events in a continuous interval). 2. The number of heart attacks in Brighton each year - YES: large population, no obvious correlations between heart attacks in different people. 3. The number of planes landing at Heathrow between 8 and 9am - NO: 8-9am is rush hour; planes are scheduled to land regularly, to fit in as many as possible (1-2 a minute) - they do not land at random times or they would hit each other! 4. The number of cars getting punctures on the M1 each year - YES (roughly): if punctures are due to tyres randomly wearing thin, then expect punctures to happen independently at random. But: they may not all be independent, e.g. if there is broken glass in one lane. 5. Number of people in the UK flooded out of their home in July - NO: floodings of different homes are not at all independent; usually a small number of floods each flood many homes at once, so P(flooded | next door flooded) ≫ P(flooded).

Approximation to the Binomial distribution: The Poisson distribution is an approximation to B(n, p) when n is large and p is small (e.g. if np < 7, say). In that case, if X ~ B(n, p) then P(X = k) ≈ e^(−λ) λ^k / k! where λ = np, i.e. X is approximately Poisson with mean λ = np.

Example: The probability of a certain part failing within ten years is 10^(−6). Five million of the parts have been sold so far. What is the probability that three or more will fail within ten years? Answer: Let X = number failing in ten years, out of 5,000,000; X ~ B(5000000, 10^(−6)). Evaluating the Binomial probabilities is rather awkward; better to use the Poisson approximation: X has approximately a Poisson distribution with λ = np = 5000000 × 10^(−6) = 5. P(Three or more fail) = P(X ≥ 3) = 1 − P(X = 0) − P(X = 1) − P(X = 2) = 1 − e^(−5) 5^0/0! − e^(−5) 5^1/1! − e^(−5) 5^2/2! = 1 − e^(−5)(1 + 5 + 12.5) = 0.875. For such small p and large n the Poisson approximation is very accurate (the exact result is also 0.875 to three significant figures).
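Since Python handles the huge binomial coefficients exactly, the exact Binomial tail can also be evaluated, confirming how good the approximation is here:

```python
from math import comb, exp, factorial

n, p = 5_000_000, 1e-6
lam = n * p  # = 5

# Poisson approximation: P(X >= 3) = 1 - P(0) - P(1) - P(2)
poisson_tail = 1 - sum(exp(-lam) * lam**k / factorial(k) for k in range(3))

# Exact Binomial tail for comparison
exact_tail = 1 - sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(3))

print(round(poisson_tail, 3), round(exact_tail, 3))  # 0.875 0.875
```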

Poisson Distribution Summary: Describes a discrete random variable that is the number of independent and randomly occurring events, with mean number λ. Probability of k such events is P(X = k) = e^(−λ) λ^k / k!. Mean and variance: μ = σ² = λ. The sum of Poisson variables X_i is also Poisson, with average number Σ_i λ_i. Approximation to Binomial for large n and small p: if X ~ B(n, p) then P(X = k) ≈ e^(−λ) λ^k / k! where λ = np.

Continuous Random Variables: A continuous random variable is a random variable which can take values measured on a continuous scale, e.g. weights, strengths, times or lengths. For any pre-determined value x, P(X = x) = 0, since if we measured X accurately enough, we are never going to hit the value x exactly. However the probability of some region of values near x can be non-zero. Probability density function (pdf) f(x): P(a ≤ X ≤ b) = ∫_a^b f(x) dx is the probability of X being in the range a to b, e.g. the shaded area under f(x) gives P(−1.5 < X < 0.7).

Normalization: Since X has to have some value, ∫_(−∞)^∞ f(x) dx = P(−∞ < X < ∞) = 1, and since 0 ≤ P ≤ 1, for a pdf f(x) ≥ 0 for all x. Cumulative distribution function (cdf): this is the probability that X < x, F(x) ≡ P(X < x) = ∫_(−∞)^x f(x′) dx′, e.g. the area under f up to x = 1.1 gives F(1.1) = P(X < 1.1).

Mean: Expected value (mean) of X: μ = ∫ x f(x) dx. Variance: Variance of X: σ² = ∫ (x − μ)² f(x) dx = ∫ x² f(x) dx − μ². Note: the mean and variance may not be well defined for distributions with broad tails. The mode is the value of x where f(x) is maximum (which may not be unique). The median is given by the value of x where ∫_(−∞)^x f(x′) dx′ = 1/2, so that P(X < median) = P(X > median) = 0.5.
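These formulas can be sanity-checked numerically for a concrete pdf. The density f(x) = 2x on [0, 1] below is a hypothetical example (not from the lecture); analytically its mean is 2/3 and its variance is 1/18:

```python
# Hypothetical pdf f(x) = 2x on [0, 1]; mean = 2/3, variance = 1/18
def f(x):
    return 2 * x

N = 100_000
dx = 1.0 / N
xs = [(i + 0.5) * dx for i in range(N)]  # midpoint rule

mean = sum(x * f(x) for x in xs) * dx
var = sum(x * x * f(x) for x in xs) * dx - mean**2
```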

Question from Derek Bruff. Probability density function: Consider the continuous random variable X = the weight in pounds of a randomly selected new-born baby. Let f be the probability density function for X. It is safe to assume that P(X < 0) = 0 and P(X < 20) = 1. Which of the following is not a justifiable conclusion about f given this information? 1. No portion of the graph of f can lie below the x-axis.  2. f is non-zero for x in the range 0 ≤ x < 20.  3. The area under the graph of f between x = 0 and x = 20 is 1.  4. The non-zero portion of the graph of f lies entirely between x = 0 and x = 20.

Probability density function: Consider the continuous random variable X = the weight in pounds of a randomly selected new-born baby. Let f be the probability density function for X. It is safe to assume that P(X < 0) = 0 and P(X < 20) = 1. Which of the following is not a justifiable conclusion about f given this information? 1. No portion of the graph of f can lie below the x-axis. - Correct: f(x) ≥ 0 for all probabilities to be ≥ 0. 2. f is non-zero for x in the range 0 ≤ x < 20. - Incorrect: f(x) can be zero there, e.g. babies must weigh more than an embryo, so at least f(x) = 0 for 0 ≤ x < embryo weight. 3. The area under the graph of f between x = 0 and x = 20 is 1. - Correct: ∫_0^20 f(x) dx = ∫_(−∞)^∞ f(x) dx = 1. 4. The non-zero portion of the graph of f lies entirely between x = 0 and x = 20. - Correct: P(X < 0) = 0 ⇒ f(x) = 0 for x < 0, and P(X < 20) = 1 ⇒ ∫_20^∞ f(x) dx = 0.

Uniform distribution: The continuous random variable X has the Uniform distribution between θ_1 and θ_2, with θ_1 < θ_2, if f(x) = 1/(θ_2 − θ_1) for θ_1 ≤ x ≤ θ_2, and 0 otherwise. X ~ U(θ_1, θ_2), for short.

Occurrence of the Uniform distribution: 1) Waiting times from a random arrival time until a regular event (see later). 2) Simulation: programming languages often have a standard routine for simulating the U(0, 1) distribution, which can be used to simulate other probability distributions. Actually not very common.

Example: Disk wait times. In a hard disk drive, the disk rotates at 7200 rpm. The wait time is defined as the time between the read/write head moving into position and the beginning of the required information appearing under the head. (a) Find the distribution of the wait time. (b) Find the mean and standard deviation of the wait time. (c) Booting a computer requires that 2000 pieces of information are read from random positions. What is the total expected contribution of the wait time to the boot time, and the rms deviation? Answer: (a) A rotation rate of 7200 rpm gives rotation time = 60/7200 s = 8.33 ms. The wait time can be anything between 0 and 8.33 ms, and each time in this range is as likely as any other time. Therefore the distribution of the wait time is uniform, U(0, 8.33 ms).

Mean and variance: for U(θ_1, θ_2), μ = (θ_1 + θ_2)/2 and σ² = (θ_2 − θ_1)²/12. Proof: Let y be the distance from the midpoint, y = x − (θ_1 + θ_2)/2, and let the width be w = θ_2 − θ_1, so f = 1/w on an interval of width w centred at y = 0. Then since x = (θ_1 + θ_2)/2 + y, and means add, μ = ⟨x⟩ = (θ_1 + θ_2)/2 + ⟨y⟩ = (θ_1 + θ_2)/2 + ∫_(−w/2)^(w/2) y (1/w) dy = (θ_1 + θ_2)/2 + 0. Unsurprisingly the mean is the midpoint!

Variance: σ² = ∫ (x − μ)² f(x) dx = ∫_(−w/2)^(w/2) y² (1/w) dy = (1/w) [y³/3]_(−w/2)^(w/2) = (1/(3w)) (w³/8 + w³/8) = w²/12 = (θ_2 − θ_1)²/12.
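A direct numerical integration reproduces w²/12 for any endpoints; the values θ_1 = 2, θ_2 = 10 below are illustrative choices, not from the example:

```python
# Check var = (theta2 - theta1)^2 / 12 by midpoint-rule integration
theta1, theta2 = 2.0, 10.0   # illustrative endpoints
w = theta2 - theta1
mu = (theta1 + theta2) / 2

N = 100_000
dx = w / N
# Integrate (x - mu)^2 * (1/w) over [theta1, theta2]
var = sum((theta1 + (i + 0.5) * dx - mu) ** 2 / w for i in range(N)) * dx
print(round(var, 4))  # 5.3333, i.e. w**2 / 12
```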

Example: Disk wait times. In a hard disk drive, the disk rotates at 7200 rpm. The wait time is defined as the time between the read/write head moving into position and the beginning of the required information appearing under the head. (b) Find the mean and standard deviation of the wait time. Answer: (b) μ = (θ_1 + θ_2)/2 = (0 + 8.33)/2 ms = 4.17 ms; σ² = (θ_2 − θ_1)²/12 = (8.33 − 0)²/12 ms² = 5.8 ms², so σ = 2.4 ms. http://en.wikipedia.org/wiki/hard_disk_drive

Example: Disk wait times. In a hard disk drive, the disk rotates at 7200 rpm. The wait time is defined as the time between the read/write head moving into position and the beginning of the required information appearing under the head. (c) Booting a computer requires that 2000 pieces of information are read from random positions. What is the total expected contribution of the wait time to the boot time, and the rms deviation? Answer: (c) μ = 4.2 ms, so for 2000 reads the mean total time is μ_tot = 2000 × 4.2 ms = 8.3 s. σ = 2.4 ms, so the variance of the total is σ²_tot = 2000 σ² = 2000 × 5.8 ms² ≈ 0.012 s², giving σ_tot = √(0.012 s²) ≈ 0.11 s. Note: rms = Root Mean Square = standard deviation.
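The whole disk example fits in one short script (7200 rpm and 2000 reads as in the example; the variable names are mine):

```python
n_reads = 2000
rotation_ms = 60_000 / 7200     # one revolution at 7200 rpm: 8.33 ms
mu_ms = rotation_ms / 2         # mean wait for U(0, 8.33 ms): 4.17 ms
var_ms2 = rotation_ms**2 / 12   # variance: about 5.8 ms^2

total_mean_s = n_reads * mu_ms / 1000           # means add: about 8.3 s
total_sd_s = (n_reads * var_ms2) ** 0.5 / 1000  # independent variances add: about 0.11 s
print(round(total_mean_s, 2), round(total_sd_s, 2))  # 8.33 0.11
```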