Probability & Statistics: Infinite Statistics Robert Leishman Mark Colton ME 363 Spring 0
Large Data Sets: What happens to a histogram as $N$ becomes large ($N \to \infty$)? The number of bins becomes large ($K \to \infty$), the width of each bin becomes small (bin width $\to 0$), and the histogram becomes smoother and approaches a continuous function.
Example: ASTM-A4 Steel. [Figure: two histograms of strength S (MPa) against relative frequency n (%). Small-sample panel: S_ave = 479.8 MPa. Very-large-sample panel: S_ave = 480.00 MPa.]
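The smoothing effect of a growing sample can be sketched numerically. The block below is only an illustration: it draws synthetic strengths from a normal distribution whose mean (480 MPa) and standard deviation (20 MPa) are assumed values, not the steel data, and measures how far the histogram density sits from the continuous curve.

```python
import math
import random


def normal_pdf(x, mean, sigma):
    """Normal p.d.f. evaluated at x."""
    return math.exp(-((x - mean) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def max_histogram_error(n_samples, n_bins, mean=480.0, sigma=20.0, seed=0):
    """Largest gap between the histogram density and the normal p.d.f.

    mean and sigma are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    lo, hi = mean - 4 * sigma, mean + 4 * sigma
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for _ in range(n_samples):
        s = rng.gauss(mean, sigma)
        if lo <= s < hi:
            # min() guards against a rounding edge case at the top of the range
            counts[min(int((s - lo) / width), n_bins - 1)] += 1
    worst = 0.0
    for k, count in enumerate(counts):
        centre = lo + (k + 0.5) * width
        density = count / (n_samples * width)  # fraction of samples per MPa
        worst = max(worst, abs(density - normal_pdf(centre, mean, sigma)))
    return worst


small = max_histogram_error(n_samples=100, n_bins=8)
large = max_histogram_error(n_samples=200_000, n_bins=100)
print(f"N = 100,     K = 8:   max error = {small:.5f}")
print(f"N = 200,000, K = 100: max error = {large:.5f}")
```

With many samples and many narrow bins, the histogram density should track the continuous curve closely; with few samples it is typically much rougher.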
Probability Density Function: For an infinite data set, the frequency distribution is smooth and continuous. This function is called the probability density function (p.d.f.). The p.d.f. relates the value of a measured variable to its likelihood of occurring.
Infinite vs. Finite Statistics: In theory, we can have infinite data sets: an infinite number of data points, so that we can examine the entire population, or all possible values. In reality, data sets are finite: a limited number of data points, based on a sample of the entire population. We will use our limited sample size to extract information about the entire population. Example: extract information about ALL ASTM-A4 steel from 000 specimens.
Standard p.d.f.s: There are many shapes of p.d.f. that can occur. See Table 4. The shape of the p.d.f. can be determined experimentally by looking at the shape of the histogram and finding which standard p.d.f. best matches it.
Normal Distribution: The normal, or Gaussian, distribution is one of the most common. It can be used to represent variables that are random about a mean, such as electrical noise, variation in the strength of steel, and precision error. It is also known as the bell curve.
Normal Distribution: The p.d.f. of the normal distribution is given by
$$p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x - x')^2}{2\sigma^2}\right]$$
Here $x$ is the value of a particular measurement. $x'$ is the true mean of the entire population: it describes the central tendency of the population and determines the center point of the normal curve. It is also called $\mu$; a mean computed from a sample is an estimate, not the true mean. $\sigma$ is the standard deviation of the entire population: it describes the population's spread and determines the width of the normal curve.
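As a minimal sketch, the normal p.d.f. can be evaluated directly from its formula. The function name and the numbers used (mean 480, standard deviation 20) are illustrative assumptions, not values from the slides.

```python
import math


def normal_pdf(x, mean, sigma):
    """Normal p.d.f.: exp(-(x - mean)^2 / (2 sigma^2)) / (sigma sqrt(2 pi))."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mean) ** 2) / (2.0 * sigma ** 2))


mean, sigma = 480.0, 20.0  # hypothetical values for illustration
for x in (mean - 2 * sigma, mean - sigma, mean, mean + sigma, mean + 2 * sigma):
    print(f"p({x:.0f}) = {normal_pdf(x, mean, sigma):.5f}")
```

The curve peaks at the mean with height 1/(sigma sqrt(2 pi)) and is symmetric about it, so a larger sigma gives a lower, wider curve.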
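Normal Distribution: If we have the p.d.f. of some data, then we can make predictions about the probability $P$ of future measurements. Specifically, we can predict the probability of a data point falling within a certain interval. This probability is given by the area under the p.d.f. over the appropriate interval. For an interval of half-width $\delta x$ centered on the true mean $x'$:
$$P(x' - \delta x \le x \le x' + \delta x) = \int_{x' - \delta x}^{x' + \delta x} p(x)\,dx = \int_{x' - \delta x}^{x' + \delta x} \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x - x')^2}{2\sigma^2}\right] dx$$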
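Normal Distribution: Make a change of variables: $\beta = (x - x')/\sigma$, so that $dx = \sigma\,d\beta$, and $z_1 = \delta x/\sigma$, where $x'$ is the true mean and $\delta x$ is the half-width of the interval. Starting from
$$P(x' - \delta x \le x \le x' + \delta x) = \int_{x' - \delta x}^{x' + \delta x} \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x - x')^2}{2\sigma^2}\right] dx$$
the integral then becomes
$$P(-z_1 \le \beta \le z_1) = \frac{1}{\sqrt{2\pi}} \int_{-z_1}^{z_1} e^{-\beta^2/2}\,d\beta$$
Since the normal distribution is symmetric about $x'$:
$$P(-z_1 \le \beta \le z_1) = 2\left[\frac{1}{\sqrt{2\pi}} \int_{0}^{z_1} e^{-\beta^2/2}\,d\beta\right]$$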
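Normal Distribution:
$$P(-z_1 \le \beta \le z_1) = 2\left[\frac{1}{\sqrt{2\pi}} \int_{0}^{z_1} e^{-\beta^2/2}\,d\beta\right]$$
where $\beta = (x - x')/\sigma$ and $z_1 = \delta x/\sigma$. The bracketed expression is called the normal error function. Solutions to this integral are tabulated in Table 4.3 (p. 8). The integral gives us a method for calculating the probability that $x$ lies between $x' \pm \delta x$ for a given distribution defined by $x'$ and $\sigma$. It is best understood by doing some examples.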
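The tabulated integral can also be evaluated in code. The sketch below relies on the identity that the area under the standard normal curve from 0 to z1 equals 0.5 * erf(z1 / sqrt(2)), and cross-checks it with a simple trapezoidal integration. Both function names are made up for this illustration.

```python
import math


def normal_error_function(z1):
    """(1/sqrt(2 pi)) * integral from 0 to z1 of exp(-beta^2 / 2) d(beta)."""
    return 0.5 * math.erf(z1 / math.sqrt(2.0))


def normal_error_function_numeric(z1, steps=2000):
    """Same integral, estimated with the trapezoidal rule as a cross-check."""
    def integrand(beta):
        return math.exp(-beta * beta / 2.0) / math.sqrt(2.0 * math.pi)

    d_beta = z1 / steps
    total = 0.5 * (integrand(0.0) + integrand(z1))
    for i in range(1, steps):
        total += integrand(i * d_beta)
    return total * d_beta


for z1 in (0.5, 1.0, 2.0):
    closed = normal_error_function(z1)
    numeric = normal_error_function_numeric(z1)
    print(f"z1 = {z1}: erf-based = {closed:.5f}, trapezoid = {numeric:.5f}")
```

Doubling the returned value gives the probability that a measurement falls within z1 standard deviations of the mean.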
Example 1: What is the area under the normal distribution curve from $z = -1.43$ to $z = 1.43$? What is the significance of this area?
Example 1: From the table, the area under the normal distribution curve from $z = 0$ to $z = 1.43$ is 0.4236. This represents ½ of the integral between $z = -1.43$ and $z = 1.43$, so the total area is 2(0.4236) = 0.8472. What does this mean? For data following a normal distribution, 84.72% of the population lies within the range $-1.43 \le z \le 1.43$. But $x = x' + z\sigma$, so 84.72% of the population lies within ±1.43 standard deviations of the mean. This is true for any normally distributed data.
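The table lookup for z = 1.43 can be checked with Python's standard library. This is a quick sketch using `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

z1 = 1.43
standard_normal = NormalDist()  # mean 0, standard deviation 1

# Area from 0 to z1: cumulative probability up to z1 minus the half below 0.
half_area = standard_normal.cdf(z1) - 0.5
# Symmetry about 0 doubles it for the range -z1 to +z1.
total_area = 2 * half_area

print(f"area from 0 to {z1}: {half_area:.4f}")
print(f"area from -{z1} to {z1}: {total_area:.4f}")
```

The computed values agree with the four-digit table entry to within rounding.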
Example 2: What range of a random variable $x$ will contain 90% of the population? Solution: find $z$ such that 45% of the data lie between 0 and $+z$ and the other 45% lie between $-z$ and 0. Use the table.
Example 2: By interpolation, $z_{0.45} = 1.645$.
Example 2: Again, $x = x' + z\sigma$, so 90% of the population will fall within the range $(x' - z_{0.45}\sigma) < x < (x' + z_{0.45}\sigma)$, that is, $(x' - 1.645\sigma) < x < (x' + 1.645\sigma)$. So 90% of the population will lie within ±1.645 standard deviations of the mean.
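The reverse lookup (from a probability to a z value) can be sketched with the inverse cumulative distribution in `statistics.NormalDist`. The helper names and the mean and standard deviation passed to `central_interval` are hypothetical, chosen only to show the arithmetic.

```python
from statistics import NormalDist


def z_for_central_fraction(fraction):
    """z such that `fraction` of a normal population lies within +/- z sigma."""
    # A central fraction f leaves (1 - f)/2 in each tail, so the upper limit
    # sits at cumulative probability 0.5 + f/2.
    return NormalDist().inv_cdf(0.5 + fraction / 2.0)


def central_interval(mean, sigma, fraction):
    """Range of x that contains `fraction` of the population."""
    z = z_for_central_fraction(fraction)
    return mean - z * sigma, mean + z * sigma


z90 = z_for_central_fraction(0.90)
low, high = central_interval(mean=480.0, sigma=20.0, fraction=0.90)  # hypothetical
print(f"z for 90%: {z90:.3f}")
print(f"90% range: {low:.1f} to {high:.1f}")
```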
Comments: The probability of a measurement occurring can be expressed in terms of the standard deviation of the population. The probability of a measurement being within ±1σ of the mean is 68.27%, within ±2σ of the mean is 95.45%, and within ±3σ of the mean is 99.73%. Data outside ±3σ are often considered outliers.
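These coverage figures follow from the error function: for a normal distribution, the fraction of the population within ±k standard deviations of the mean is erf(k / sqrt(2)). A short sketch (the function name is illustrative):

```python
import math


def fraction_within(k):
    """Fraction of a normal population within +/- k standard deviations."""
    return math.erf(k / math.sqrt(2.0))


for k in (1, 2, 3):
    print(f"within +/-{k} sigma: {100 * fraction_within(k):.2f}%")
```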
Example 3: You are assigned to measure the maximum no-load speed of a new type of DC motor. You apply a constant voltage to many of these motors and measure the maximum no-load speed of each motor. You calculate a mean of 435.5 rpm and σ = 47.5 rpm. Assuming that the variations in no-load speed are random (and normally distributed), what is the probability that a motor will have a no-load speed between 5000 and 500 rpm?
Example 3: P(5000 ≤ x ≤ 500) = P(435.5 ≤ x ≤ 500) − P(435.5 ≤ x ≤ 5000). From the upper limit (first term): z = (500 − 435.5)/47.5 = .0696, so P(.0696) = 0.4808. From the lower limit (second term): z = (5000 − 435.5)/47.5 = .608, so P(.608) = 0.4474. Therefore P(5000 ≤ x ≤ 500) = 0.4808 − 0.4474 = 0.0334 = 3.34%.
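The same subtraction of two tabulated areas can be written as a small function. This is a sketch only: the function names are made up, and the numbers in the call are hypothetical round values (mean 1000 rpm, σ = 100 rpm, limits 1100 and 1200 rpm), not the motor data from the example.

```python
from statistics import NormalDist


def half_area(z):
    """Area under the standard normal curve from 0 to z (the tabulated value)."""
    return NormalDist().cdf(z) - 0.5


def probability_between(lower, upper, mean, sigma):
    """P(lower <= x <= upper) for normally distributed x."""
    z_lower = (lower - mean) / sigma
    z_upper = (upper - mean) / sigma
    # Area from the mean to the upper limit minus area from the mean to the lower.
    return half_area(z_upper) - half_area(z_lower)


# Hypothetical numbers for illustration only:
p = probability_between(lower=1100.0, upper=1200.0, mean=1000.0, sigma=100.0)
print(f"P = {p:.4f} ({100 * p:.2f}%)")
```

Because `half_area` is negative for limits below the mean, the same function also covers intervals that straddle the mean or lie entirely below it.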