Exercise Quality Management


Kathlyn Beasley
6 years ago

Exercise Quality Management 0: Basic principles of mathematical statistics

Dipl.-Ing. Tobias Effey
Group Customer Satisfaction and Operations Management, Department Quality Management
Chair of Metrology and Quality Management
Steinbachstr. 5 (ADITEC), D-5074 Aachen
Tel.: +49 (0) Fax: +49 (0)
t.effey@wzl.rwth-aachen.de

Outline

Basic principles of mathematical statistics
- Terms and basic ideas
- Inductive and descriptive statistics
- Population and sample
- Feature and scale types
- Distribution function and probability density function
- Mean, standard deviation, and variance
- Analysis of measured values
- Histogram
- Mode, median, and measures of statistical dispersion
- Statistical inference to the population
- Sample size
- Exercises

Why statistics?

If a man thinks about statistics, the only thing he remembers is the mean. He doesn't believe in statistics, as an example is about to prove:

A man on a duck hunt tries his first shot. He misses because he shoots too quickly. The shot hits the ground a foot in front of the duck. The second shot misses too, but this time he is too late and the bullet hits the dirt a foot behind the duck. The hunter says to himself, because he believes in the mean: "Statistically, the duck should be dead."

But if he were smart, he would use birdshot, because that would improve his chances. The shot goes off and the duck gets hit, because the statistical spread will make it fit.

Basic fields of application for statistics

Figure: a sample is drawn from the population by selection, and inference leads from the sample back to the population. Population and sample are linked by a statistical relation.
- Parameters of the population: µ (expected value), σ (standard deviation)
- Parameters of the sample: x̄ (mean), s (empirical standard deviation)
- Descriptive statistics: characteristics to describe the population or the sample
- Inductive statistics: statistical inference from the sample data to the population

Descriptive and inductive statistics: a distinction is made between descriptive and inductive statistics.
- Descriptive statistics describes samples by means of key figures.
- Inductive statistics allows inference from samples to a population. Key figures are used here as well.

Population and sample

Figure: population and sample, linked by a material relation and a statistical relation.

Generally necessary sample size (will be explained in detail later):
- Attributive data: 50 to 00 values
- Variable data: at least 30 values

Feature and scale types

Types of features:
- qualitative (observable)
  - nominal (without any order between values)
  - ordinal (with order between values)
- quantitative (measurable and countable)
  - cardinal discrete (attributive)
  - cardinal continuous (variable)

Example for scales: tally sheet with the header fields product, date, and inspector, and the columns defect (A to F), frequency, and total.

Examples for features:
- Zipcode: 506
- Paper: rough, very rough
- Quality: OK, n. OK
- Number: 3 mistakes per lot
- Amperage: ,46 A

Typical key data: frequency of the value, frequency of the class, median, quantile, standard deviation, confidence interval.

Before a measurement is conducted, the type of feature needs to be defined:
- nominal: values can only be the same or different
- ordinal: values can be lined up in ascending or descending order
- cardinal: the intervals between different values can be determined
- discrete: the feature takes only a finite number of values
- continuous: the distribution function F(x) of the feature is continuous and differentiable

Statistical spread: Deming's experiment

- Every operational procedure that is repeated shows some kind of variation.
- Input parameters, process parameters, and output parameters are subject to variation, too.
- This variation is known as statistical spread.

Figure: funnel setup with target value; the result scatters around the means µ_x and µ_y within ±3σ_x and ±3σ_y (µ: mean, σ: standard deviation).

Statistical spread can be observed in every technical process (e.g. manufacturing process, natural process). A manufacturing process is subject to statistical spread caused by, e.g., tool wear, workers, and vibration.

Conclusion: the variation of these influences causes a statistical spread of the result, which may eventually be unacceptable.

Deming's experiment: readjust after every throw

Figure: setup and strategy (funnel position x, y), location of the ball, and result (x, y) scattered around the target point.

Deming's experiment: readjust after n throws

Figure: result (x, y) of the throws when the funnel is readjusted only after n throws.

Deming's experiment

- The standard deviation at setup A is much higher than the standard deviation at setup B.
- The spread gets bigger because the systematic readjustment after every throw adds a systematic spread to the natural spread.
- The natural spread can only be reduced by changing, for example, the opening of the funnel.

Figure: setup A (readjust after every throw) and setup B (readjust after n throws), each with its standard deviation s.

It is reasonable to analyse your process (e.g. mean and standard deviation) before changing its parameters.
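The comparison of the two setups can be reproduced with a small simulation. This is only a sketch under assumptions the slides do not state: one axis, a normally distributed natural spread with σ = 1, and a readjustment rule that moves the funnel by the negative mean deviation of the throws since the last readjustment. The function name `funnel_experiment` and all parameters are hypothetical.

```python
import random
import statistics


def funnel_experiment(n_throws, readjust_every, sigma=1.0, seed=1):
    """Simulate one axis of the funnel experiment with the target at 0.

    Assumed rule: after every `readjust_every` throws the funnel is moved
    by the negative mean deviation of the throws since the last move.
    """
    rng = random.Random(seed)
    aim = 0.0        # current funnel position
    block = []       # landings since the last readjustment
    landings = []
    for _ in range(n_throws):
        x = aim + rng.gauss(0.0, sigma)   # natural spread around the aim
        landings.append(x)
        block.append(x)
        if len(block) == readjust_every:
            aim -= statistics.fmean(block)   # compensate the observed deviation
            block = []
    return landings


# Setup A: readjust after every throw; setup B: readjust after 25 throws.
std_a = statistics.pstdev(funnel_experiment(20000, readjust_every=1))
std_b = statistics.pstdev(funnel_experiment(20000, readjust_every=25))
print(f"Setup A: s = {std_a:.2f}   Setup B: s = {std_b:.2f}")
```

Under these assumptions each landing in setup A is the difference of two independent deviations, so its standard deviation should come out near √2 times the natural σ, while setup B stays close to σ.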

Definition of terms and explanations linked to a normally distributed population

Probability density:
$f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2}$

Mean (expected value), with N the population size:
$E(x) = \frac{1}{N} \sum_{i=1}^{N} x_i = \mu$

Variance:
$Var(x) = \frac{1}{N} \sum_{i=1}^{N} \left( x_i - E(x) \right)^2 = \sigma^2$
The deviations are squared so that negative and positive deviations do not cancel each other out.

Standard deviation:
$\sigma = \sqrt{Var(x)} = \sqrt{\sigma^2}$

Figure: bell-shaped curve f(x) over x with µ, µ - σ, and µ + σ marked.

The normal distribution describes the natural statistical spread as shown above (bell-shaped curve) and is determined by two parameters:
- The arithmetic mean µ determines the position of the bell-shaped curve. It marks the highest density of values and is therefore known as the expected value. In a manufacturing process, µ should be the target value.
- The standard deviation σ determines the width of the bell-shaped curve. It equals the distance between the arithmetic mean and the inflection point of the bell-shaped curve as shown above.

Besides the position, the spread and the shape of the distribution are very important. The spread can be described by various parameters, of which the variance σ² is the most widely used. f(x) is the probability density function of the normal distribution (depending on the value x). In practice, many measurands follow the normal distribution.
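The three population formulas can be written out directly. The following is a minimal sketch; the function names and the eight example values are hypothetical and serve only to show the 1/N form of the variance and the density of the normal distribution.

```python
import math


def population_mean(values):
    """E(x) = (1/N) * sum of x_i."""
    return sum(values) / len(values)


def population_variance(values):
    """Var(x) = (1/N) * sum of (x_i - E(x))^2."""
    mu = population_mean(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def normal_pdf(x, mu, sigma):
    """Probability density of the normal distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Hypothetical population of N = 8 values.
population = [2, 4, 4, 4, 5, 5, 7, 9]
mu = population_mean(population)
sigma = math.sqrt(population_variance(population))
print(mu, sigma)                    # mean and standard deviation
print(normal_pdf(mu, mu, sigma))    # density at the peak of the bell curve
```

The density is highest at x = µ and falls off symmetrically on both sides, which is the bell shape described above.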

Difference between probability density function and distribution function

Probability density function of a normal distribution f(x):
- Mathematical description of the probability density of a continuous random variable.
- The hatched area is equal to the probability that the value of a random variable X lies in the interval [a, b].

Figure: bell-shaped curve f(x) over x with the area between a and b hatched.

Distribution function of a normal distribution F(x):
- Accumulated probability of the probability density.
- The difference F(b) - F(a) equals the hatched area in the figure above.

Figure: curve F(x) over x with F(a) and F(b) marked on the vertical axis.

Probability density function of a normal distribution f(x):
- A continuous random variable can take an infinite number of values.
- The probability of any single value is therefore equal to zero.

Distribution function of a normal distribution F(x):
- The distribution function gives the probability that the value of the random variable is less than or equal to a certain value (in the figure, F(a) is the probability that x is less than or equal to a).
- The value F(a) equals the area beneath the bell-shaped curve (see figure above) to the left of a.
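The relation between the hatched area and F(b) - F(a) can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the parameters µ = 40,00 and σ = 0,05 and the interval limits are assumptions chosen for illustration, not values from the slides.

```python
from statistics import NormalDist

# Hypothetical normal distribution of a measured dimension in mm.
dist = NormalDist(mu=40.00, sigma=0.05)

a, b = 39.95, 40.05                       # interval [mu - sigma, mu + sigma]
p_interval = dist.cdf(b) - dist.cdf(a)    # P(a <= X <= b) = F(b) - F(a)

print(f"f(a) = {dist.pdf(a):.3f}  (a density, not a probability)")
print(f"F(a) = {dist.cdf(a):.3f}  (area to the left of a)")
print(f"P(a <= X <= b) = {p_interval:.4f}")
```

For an interval of one standard deviation on either side of the mean, the result is the familiar share of about 68 % of all values.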

Samples in manufacturing

Figure: control loop of operator, process, and testing with disturbance, analysis, correction, checkup, scope of testing, and adjustment of testing accuracy.

Function of samples:
- Determination of statistical parameters, e.g. to determine the statistical spread and confidence intervals
- Consideration of random and systematic perturbation

Causes for collecting samples:
- To save time and money, samples are usually tested instead of the whole population
- In reality it is close to impossible to collect data for the whole population
- Statistical methods allow a representative sample size to be determined
- Often even small amounts of data allow for good conclusions

Advantages:
- Saves time and money by focusing on the important data
- Simplifies actions in the long run
- Assessment of the process characteristics

Disadvantages:
- Inaccuracy, because only part of the population is considered

Definition of terms and explanations linked to a normally distributed sample

Average value:
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$

Empirical variance:
$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

Empirical deviation:
$s = \sqrt{s^2}$

This section focuses on the means of samples that are selected from the same population. The means are normally distributed if the samples reach a certain size.

Application:
- Statistical process control: average charts show the variation of position
- Statistical analysis in order to assess effects

Average value:
- Mean of a sample
- The expected value E of x̄ (i.e. E(x̄) = µ) equals the mean of the population
- Usually the focus is on the difference between the average value and µ

Empirical variance:
- This variance is used to assess the spread of the values

Empirical deviation:
- Common parameter to describe how the measured values deviate from the mean
- Used as an estimate for the standard deviation σ of the population
- The sum of squared deviations is not divided by N but by n - 1 (see degree of freedom)

Degree of freedom: To determine a statistical parameter, it is often necessary to determine auxiliary quantities first (e.g. the mean). If such a quantity is used in a calculation, the number of independent values is reduced, because the auxiliary quantity already contains all values. For example, the variance is calculated as the average of the squared deviations from the mean. The mean is built from all values, so the number of independent values in the calculation of the variance is reduced by one (any one of the original values could be calculated from the mean and the remaining values). Generally speaking, the degree of freedom depends on the number of independent observations.
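The division by n - 1 instead of n can be made concrete with a short calculation. This is a sketch with eight hypothetical body heights in cm (not the exercise data); it computes the formulas by hand and compares them with the `statistics` module of the Python standard library.

```python
import math
import statistics

# Hypothetical sample of n = 8 body heights in cm.
heights = [172, 175, 178, 169, 181, 176, 174, 175]
n = len(heights)

x_bar = sum(heights) / n                              # average value
squared_dev = sum((x - x_bar) ** 2 for x in heights)  # sum of squared deviations
s2 = squared_dev / (n - 1)                            # empirical variance, n - 1 degrees of freedom
s = math.sqrt(s2)                                     # empirical deviation

print(f"x_bar = {x_bar:.1f} cm, s^2 = {s2:.3f}, s = {s:.3f} cm")

# The standard library uses the same definitions:
# statistics.variance divides by n - 1, statistics.pvariance divides by N.
print(statistics.variance(heights), statistics.pvariance(heights))
```

Dividing by n - 1 always gives a slightly larger value than dividing by n, which compensates for the fact that the deviations are measured from the sample mean rather than from µ.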

Sampling procedure & sampling character

Sampling procedure: different main groups can be differentiated:
- Random selection
- Conscious selection
- Stratification

Sampling character:
- Population (probability)
- Process: every n-th part, or a constant sample of n parts

A sample should give a picture of the population that is as representative as possible.

Sampling procedure: a rule that specifies how elements of the population are selected.

Sampling character:
- Population:
  - Drawing from a fixed group with defined boundaries
  - No time element
  - Every unit has an equal chance of being included in the sample
- Process:
  - Sample from a flow of objects
  - Has a time element
  - Is more informative than population sampling, because trends and shifts are taken into account
  - Alternatives: every n-th part, or regular samples of size n

Success factors of sample selection:
- Avoid systematic errors (select a sample that represents the population):
  - Samples are to be selected, based on process knowledge, by a person who can judge their representativeness
- Population with stratification:
  - Division of the sample into subgroups that allow random sampling within the groups

Histogram

Figure: histogram; horizontal axis: value or class; vertical axis: number or relative frequency (percent).

Histograms allow for generating classes for discrete or continuous characteristics and display the frequency per class.

Reasons for generating classes:
- More clearly arranged than a scatter diagram (if the sample size is large).
- If the data is continuous, the likelihood of repeating values is marginal (effectively 0).

Determination of the class width: the class width has to be larger than the uncertainty of a single measurement.

Continuous frequency distribution: generating classes

Measuring the size a = 40 +/- 0,mm of n = 75 workpieces. The measured values are between 39,88 mm and 40,12 mm.

1. Number of classes (k):
$k = \sqrt{n} = \sqrt{75} = 8,66 \Rightarrow k = 8$

2. Class width (H):
$H = \frac{R}{k} = \frac{40,12\,mm - 39,88\,mm}{8} = 0,03\,mm$

The class width determines the class bounds. The smallest value determines the bottom boundary of the first class. Discrete data: the maximum and minimum value are inside the interval. For discrete data, the range of the highest class can differ from the other classes.

Class No. j   Value of the class x_i [mm]
1             39,88 ≤ x_i < 39,91
2             39,91 ≤ x_i < 39,94
3             39,94 ≤ x_i < 39,97
4             39,97 ≤ x_i < 40,00
5             40,00 ≤ x_i < 40,03
6             40,03 ≤ x_i < 40,06
7             40,06 ≤ x_i < 40,09
8             40,09 ≤ x_i ≤ 40,12

The absolute frequency n_j is counted for each class j.

Method for generating classes:
1. First of all, the number of classes (k) has to be determined. Remark: n = 75, i.e. the square root of 75 is about 8,66.
2. To define the class bounds, one has to determine the class width (H). The class width equals the range of the sample (here: 40,12 mm - 39,88 mm = 0,24 mm) divided by the number of classes (here: k = 8).
3. Starting from the smallest value of the sample (or some suitably rounded value), the class bounds are determined by adding multiples of the class width (H) to the starting value.
4. After defining the class bounds, the absolute frequency (n_j) of each class is determined by counting the values which belong to each class.

Why classes? The likelihood that values repeat exactly, down to the n-th position after the decimal point, is very low, but repeated values are what a frequency count needs. Therefore classes are generated, so that the frequency per class can be determined.
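The four steps can be sketched in a few lines. The sketch assumes the extreme values 39,88 mm and 40,12 mm (the upper value is inferred from eight classes of width 0,03 mm) and rounds the number of classes down, as the slide does; the helper `class_index` and the five test values are hypothetical.

```python
import math
from bisect import bisect_right

n = 75                      # sample size
x_min, x_max = 39.88, 40.12  # assumed smallest and largest measured value in mm

k = int(math.sqrt(n))       # step 1: number of classes, sqrt(75) = 8.66 -> 8
H = (x_max - x_min) / k     # step 2: class width = range / number of classes

# Step 3: class bounds, rounded to the resolution of the measurement.
bounds = [round(x_min + j * H, 2) for j in range(k + 1)]


def class_index(x):
    """Return the 0-based class of x; the last class also includes its upper bound."""
    return min(bisect_right(bounds, x) - 1, k - 1)


# Step 4: count the absolute frequency n_j per class (hypothetical values).
values = [39.88, 39.90, 39.91, 40.00, 40.12]
counts = [0] * k
for x in values:
    counts[class_index(x)] += 1

print(k, round(H, 3), bounds)
print(counts)
```

Looking up the class through the list of bounds, rather than dividing by H, avoids rounding problems for values that lie exactly on a class boundary.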

Further display of classified, continuous frequency distribution

Measuring the size a = 40 +/- 0,mm of n = 75 workpieces:

Class No. j   Relative frequency h_j   Relative frequency density f_j   Relative total frequency
1             0,026                    0,866                            0,026
2             0,080                    2,667                            0,106
3             0,160                    5,333                            0,266
4             0,187                    6,233                            0,453
5             0,227                    7,567                            0,680
6             0,147                    4,900                            0,827
7             0,120                    4,000                            0,947
8             0,053                    1,767                            1,000

Relative frequency (absolute frequency divided by the total number of samples):
$h_j = \frac{n_j}{n}$

Relative frequency density (relative frequency divided by the class width):
$f_j = \frac{h_j}{H}$

Relative total frequency of class j:
$\sum_{i=1}^{j} h_i$

The estimation of relative frequency densities supports the analysis of histograms with different class widths.

Method for generating classes (continued):
5. The relative frequency of each class (h_j) equals the absolute frequency (n_j) divided by the total number of samples (n).
6. The relative frequency density of each class (f_j) equals the relative frequency (h_j) of the class divided by the class width (H).
7. The relative total frequency of class j equals the sum of the relative frequencies of all previous classes and the relative frequency of class j.

The frequency density of each characteristic can be approximated by the bell-shaped curve.
Histogram: width of the rectangle = class width; height = frequency density.
Total frequency: sum of all relative frequencies.
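Steps 5 to 7 can be sketched as follows. The absolute frequencies in `counts` are an assumption: they are back-calculated from the relative frequencies of the example and n = 75 and do not appear on the slide in this form. The class width H = 0,03 mm is taken from the class generation above.

```python
from itertools import accumulate

# Assumed absolute frequencies n_j, back-calculated from h_j * 75.
counts = [2, 6, 12, 14, 17, 11, 9, 4]
H = 0.03                 # class width in mm
n = sum(counts)          # total number of samples

rel_freq = [n_j / n for n_j in counts]     # step 5: h_j = n_j / n
density = [h_j / H for h_j in rel_freq]    # step 6: f_j = h_j / H
cumulative = list(accumulate(rel_freq))    # step 7: running sum of h_j

for j, (h, f, c) in enumerate(zip(rel_freq, density, cumulative), start=1):
    print(f"class {j}: h = {h:.3f}  f = {f:.3f}  cumulative = {c:.3f}")
```

Small differences in the last digit compared with the table are expected, because the table divides the already rounded relative frequencies by H.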

Measures of location

Mode Mo: value with the largest number of observations.
- If generating classes (ordinal attributes): the class with the highest frequency density, approximated by the middle of the class.

Median Me: point which separates the higher half of a sample or population from the lower half.
- Odd number of values: the middle value.
- Even number of values: the average of the two middle values.
- If generating classes: the class boundary for an even number of classes, the class midpoint for an odd number of classes.

Figure: frequency density over the classes j = 1 to 8 (attribute x in mm) with Mo = 40,015 and Me = 40,00 marked. For comparison, the average value: x̄ = 40,00.

The mode of the histogram example (n = 75 workpieces) can be calculated as follows:
- For an ordinal attribute, the mode is the class with the highest frequency density.
- In the example this is class No. 5 with the bounds 40,00 ≤ x_i < 40,03.
- To obtain a number for the mode, it is approximated by the middle of the class: (40,00 + 40,03) / 2 = 40,015

The median of the same example can be calculated as follows:
- The scale starts at 39,88 and ends at 40,12 and the number of classes is even, so the median is approximated by the middle class boundary: (39,88 + 40,12) / 2 = 40,00

Remark: The mode and median are so close to each other because the sample is close to normally distributed. If the distribution were different, the values could differ considerably.

Measures of statistical dispersion

Figure: histogram of the example with the range R, the empirical deviation s, and the interquartile range Q marked.

Range R: the difference between the highest and lowest value.
Interquartile range Q: the difference between the upper and lower quartile (25%-75% of the values). For classes, the boundaries are chosen which include 25-75%.

The measures of statistical dispersion describe how the values of a sample spread around the mean. The different measures can be judged by their robustness: a measure of statistical dispersion has a high robustness if its value is insensitive to outliers in the sample.

Range R: The range R is the simplest measure of statistical dispersion. It is equal to the difference between the highest and lowest value. It follows that the range has a low robustness, because a single outlier changes the value of the measure. The range R of the histogram example is calculated as follows: R = 40,12 - 39,88 = 0,24

Interquartile range Q: The interquartile range Q is equal to the difference between the upper quartile (75% of the values) and the lower quartile (25% of the values). The lower quartile equals the 0,25-quantile, that is the point of a distribution which is larger than 25% of all values. The upper quartile equals the 0,75-quantile, that is the point of a distribution which is larger than 75% of all values.
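The difference in robustness can be shown with a small comparison. The sketch below uses nine hypothetical measured values in mm and the same series with one outlier; the quartiles come from `statistics.quantiles` with the `inclusive` method, which is one of several common quartile conventions and an assumption here.

```python
import statistics


def value_range(values):
    """Range R: highest minus lowest value."""
    return max(values) - min(values)


def interquartile_range(values):
    """Interquartile range Q: upper quartile minus lower quartile."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


# Hypothetical measured values in mm, without and with one outlier.
clean = [39.96, 39.97, 39.98, 39.99, 40.00, 40.01, 40.02, 40.03, 40.04]
with_outlier = clean[:-1] + [40.40]

print(f"R: {value_range(clean):.2f} -> {value_range(with_outlier):.2f}")
print(f"Q: {interquartile_range(clean):.2f} -> {interquartile_range(with_outlier):.2f}")
```

The single outlier changes the range considerably, while the interquartile range stays the same, which is what high robustness means.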

Central limit theorem

Figure: population (µ, σ), samples drawn from it, and the distribution of the sample means x̄_1, x̄_2, x̄_3, x̄_4 (vertical axis: frequency density), comparing the normal distribution (n = 30) with the t-distribution (n = 5).

A value which is the result of the sum of independent parameters, none of which is dominant, is approximately normally distributed.
- Approximation by the t-distribution: for small samples (figure: n = 5)
- Approximation by the normal distribution: for n ≥ 30

An example will illustrate this: First we choose a population with an unknown density function. Then we randomly select a sample consisting of n values and calculate its average value. If we repeat this procedure many times and draw the histogram of the average values, we will notice that the histogram looks like a normal distribution, or at least close to one. Even if we repeat this experiment with a different distribution, we still obtain average values which are normally distributed.
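The experiment described here can be tried out with random numbers. This is a sketch under assumptions: the population is exponentially distributed with mean 1 and standard deviation 1 (a clearly non-normal choice made only for illustration), the sample size is n = 30, and the function name `sample_means` is hypothetical.

```python
import math
import random
import statistics


def sample_means(n, repetitions, seed=1):
    """Draw `repetitions` samples of size n and return their average values."""
    rng = random.Random(seed)
    means = []
    for _ in range(repetitions):
        sample = [rng.expovariate(1.0) for _ in range(n)]  # skewed population
        means.append(statistics.fmean(sample))
    return means


n = 30
means = sample_means(n, repetitions=5000)

mean_of_means = statistics.fmean(means)
std_of_means = statistics.stdev(means)
print(f"mean of the means: {mean_of_means:.3f} (population mean: 1)")
print(f"spread of the means: {std_of_means:.3f} (sigma / sqrt(n) = {1 / math.sqrt(n):.3f})")
```

Although the individual values are strongly skewed, the means cluster symmetrically around the population mean, and their spread shrinks with the square root of the sample size.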

t-distribution

Figure: density of the t-distribution compared with the normal distribution.

- The t-distribution (Student distribution) is symmetrical and based on the normal distribution.
- The t-distribution is wider and a little flatter at the edges.

Premises for its application:
- The standard deviation σ of the population is unknown; it is estimated from the sample by the value s.
- One has to assume that the sample characteristics are normally distributed (e.g. measured values).
- The t-distribution converges to the normal distribution as the sample size rises.

Important applications:
- Calculation of confidence intervals
- Statistical test methods

The t-distribution with n - 1 degrees of freedom models the standardized characteristics of a sample of size n if the characteristics of the population are normally distributed. The t-distribution is also known as the Student distribution. It is used to model normally distributed characteristics if the sample size is finite (e.g. measured values). The t-distribution converges to the normal distribution for rising sample size, so that for n > 30 one can use the normal distribution instead of the t-distribution.

Remark: The t-distribution was published in 1908 by William Sealy Gosset, who worked at the Guinness Brewery in Dublin. He published his paper under the pseudonym "Student", because he was not allowed to use his real name.

Confidence intervals

Application: inference from the sample to the population.
Problem: the analysis of the sample is fuzzy because the sample size is limited. The objective is to quantify this fuzziness.

Focus on:
- the confidence interval for a characteristic x of the sample (e.g. x̄)
- for a comparative value X of the population (e.g. µ)

The comparative value of the population lies within the interval with the confidence 1 - α and outside of it with the probability of error α.

In general:
$P(x_{-\alpha} \le X \le x_{+\alpha}) = 1 - \alpha$

Example:
$P(\bar{x}_{-\alpha} \le \mu \le \bar{x}_{+\alpha}) = 1 - \alpha$

With: x_{-α} and x_{+α} the lower and upper limit of the confidence interval; confidence e.g. (1 - α) = 95%; probability of error on each side e.g. α/2 = 2,5%.

Interpretation of the confidence interval: with a probability of 1 - α, the value X lies in the confidence interval [x_{-α}, x_{+α}].

Procedure for confidence intervals

1. Selection of a confidence (1 - α): the interval is meant to include the comparative value of the population with this probability.
2. Determination of the distribution function that needs to be used (depending on the comparative value of the population):
   - For the average value µ, while the standard deviation of the population is known: u respectively z (normal distribution); this is more of a theoretical model.
   - For the average value µ, while the standard deviation of the population is unknown: t (t-distribution)
   - For the standard deviation σ: χ² (χ²-distribution)

Example for calculating a confidence interval by using the t-distribution (most used application):
$\bar{x}_{-\alpha} \le \mu \le \bar{x}_{+\alpha}$
$\bar{x} - t_{n-1;\,1-\alpha/2} \cdot \frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{n-1;\,1-\alpha/2} \cdot \frac{s}{\sqrt{n}}$
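The t-based interval can be worked through on a small sample. The sketch uses eight hypothetical body heights in cm (not the exercise data) and the tabulated quantile t_{7; 0,975} = 2,365 for n = 8 and a two-sided confidence of 95 %; the function name `confidence_interval` is hypothetical.

```python
import math
import statistics


def confidence_interval(values, t_factor):
    """Two-sided confidence interval for the mean: x_bar +/- t * s / sqrt(n)."""
    n = len(values)
    x_bar = statistics.fmean(values)
    s = statistics.stdev(values)                 # empirical deviation (n - 1)
    half_width = t_factor * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Hypothetical sample of n = 8 body heights in cm.
heights = [172, 175, 178, 169, 181, 176, 174, 175]

T_7_0975 = 2.365   # t quantile for n - 1 = 7 degrees of freedom, 1 - alpha/2 = 0,975
lower, upper = confidence_interval(heights, T_7_0975)
print(f"95 % confidence interval for the mean: [{lower:.2f}, {upper:.2f}] cm")
```

The t factor has to be looked up for the actual n - 1 and the chosen confidence; a different sample size or confidence needs a different table entry.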

Influence on the interval size

- The larger the sample size, the more likely x̄ is close to the real mean of the population, and the smaller the confidence interval.
- The larger the confidence (small α), the larger the assumed interval which includes the average value µ (large confidence interval).
- The larger the standard deviation, the more the average value x̄ spreads, and the larger the confidence interval.

Figure: two intervals [x̄_{-α}, x̄_{+α}] in comparison. Small sample size or large standard deviation: large confidence interval. Large sample size or small standard deviation: small confidence interval.

Calculation of the necessary sample size for a statistically significant conclusion: required information

- Data type (continuous / discrete)
- Population size (N)
- Demanded accuracy for the test: accepted deviation Δµ of the sample values from the mean of the population
- Confidence in the process (variance of the characteristics). Continuous data: estimation of the standard deviation σ of the population
- How much confidence is needed? The Student factor t for a certain confidence (1 - α); the value of t can be looked up in a table which lists values for the t-distribution. In general one chooses t = 1,96, which equals a 95% confidence interval (two-sided examination).

Calculation of the necessary sample size for a statistically significant conclusion: sample size for a variable characteristic

- s: known standard deviation of the process (from previous sample analysis)
- 1 - α: confidence
- t_{α/2}: Student factor for the confidence 1 - α (in consideration of the n of the actual examination)
- Δµ: demanded accuracy of the intended test (detectable difference)
- n_{1-α}: smallest sample size for the intended test

In general:
$n_{1-\alpha} = \left( \frac{t_{\alpha/2} \cdot s}{\Delta\mu} \right)^2$

If n > 30, the t-distribution is close to the normal distribution. Because of that, t_{α/2 = 0,025} = 1,96 is constant.

For n > 30 (t_{α/2 = 0,025} = const.):
$n_{1-\alpha = 0,95} = \left( \frac{1,96 \cdot s}{\Delta\mu} \right)^2$
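The formula for n > 30 can be turned into a small helper. This is a sketch with hypothetical numbers (s = 9,0 and Δµ = 2,0 in the same unit), not the solution of an exercise; the result is rounded up because a sample size has to be a whole number. The function name `necessary_sample_size` is an assumption.

```python
import math


def necessary_sample_size(s, delta_mu, t_factor=1.96):
    """Smallest n with n >= (t * s / delta_mu)^2.

    The default t_factor = 1.96 corresponds to a two-sided confidence of
    95 % and is only appropriate if the resulting n is larger than 30.
    """
    return math.ceil((t_factor * s / delta_mu) ** 2)


# Hypothetical process: standard deviation 9,0, demanded accuracy 2,0.
n_required = necessary_sample_size(s=9.0, delta_mu=2.0)
print(n_required)

# Halving the demanded accuracy roughly quadruples the necessary sample size.
print(necessary_sample_size(s=9.0, delta_mu=1.0))
```

If the result is 30 or smaller, the constant 1,96 is too optimistic and the t factor for the actual n - 1 should be taken from the table instead.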

Exercise 1: Average value and empirical deviation

Example: The body height X is random, i.e. X is a random variable. Tables 1 and 2 show the result of the measurement of the body height of 8 male master students from 2006 and 2007 in cm. Calculate the average value and the empirical deviation.

Tab. 1: x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8, x̄, s
Tab. 2: x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8, x̄, s

Exercise 2: Generating classes, median, mode, and range

In an interview you ask 10 students for their age and you receive the answers: , 3, 4, 4, 5, 7, 7, 9, 3, 34
- Generate classes
- Determine the median and mode
- Determine the range

Exercise 3: Calculation of a confidence interval

Given are two distributions of body heights of male master students from 2006 and 2007 (for the t-distribution see Appendix):
- x̄ = 75, s = 4,8, n = 8
- x̄ = 75, s = ,07, n = 8

Calculate the confidence interval with 5% probability of error for a two-sided test.

Exercise 4: Sample size

There is much competition among pizza delivery companies. The customer demands that the pizza is delivered on time (i.e. +/- 0 minutes before/after the scheduled delivery time). For pizza delivery service (A) and pizza delivery service (B) the standard deviation has been estimated as follows:
- (A) 4 values were measured; s_A = 9,6 (minutes)
- (B) 3 values were measured; s_B = 6,49 (minutes)

Determine the necessary sample size under the assumption of a two-sided t-distribution with 95% confidence and a demand of accuracy for the delivery time of Minutes, and interpret your result.

Appendix: Quantiles of the t-distribution

Columns: 1 - α; rows: v

v     0,6     0,8     0,9     0,95    0,975    0,995    0,999     0,9995
1     0,325   1,376   3,078   6,314   12,706   63,657   318,31    636,62
2     0,289   1,061   1,886   2,920   4,303    9,925    22,327    31,599
3     0,277   0,978   1,638   2,353   3,182    5,841    10,215    12,924
4     0,271   0,941   1,533   2,132   2,776    4,604    7,173     8,610
5     0,267   0,920   1,476   2,015   2,571    4,032    5,893     6,869
6     0,265   0,906   1,440   1,943   2,447    3,707    5,208     5,959
7     0,263   0,896   1,415   1,895   2,365    3,499    4,785     5,408
8     0,262   0,889   1,397   1,860   2,306    3,355    4,501     5,041
9     0,261   0,883   1,383   1,833   2,262    3,250    4,297     4,781
10    0,260   0,879   1,372   1,812   2,228    3,169    4,144     4,587
11    0,260   0,876   1,363   1,796   2,201    3,106    4,025     4,437
12    0,259   0,873   1,356   1,782   2,179    3,055    3,930     4,318
13    0,259   0,870   1,350   1,771   2,160    3,012    3,852     4,221
14    0,258   0,868   1,345   1,761   2,145    2,977    3,787     4,140
15    0,258   0,866   1,341   1,753   2,131    2,947    3,733     4,073
16    0,258   0,865   1,337   1,746   2,120    2,921    3,686     4,015
17    0,257   0,863   1,333   1,740   2,110    2,898    3,646     3,965
18    0,257   0,862   1,330   1,734   2,101    2,878    3,610     3,922
19    0,257   0,861   1,328   1,729   2,093    2,861    3,579     3,883
20    0,257   0,860   1,325   1,725   2,086    2,845    3,552     3,850
21    0,257   0,859   1,323   1,721   2,080    2,831    3,527     3,819
22    0,256   0,858   1,321   1,717   2,074    2,819    3,505     3,792
23    0,256   0,858   1,319   1,714   2,069    2,807    3,485     3,768
24    0,256   0,857   1,318   1,711   2,064    2,797    3,467     3,745
25    0,256   0,856   1,316   1,708   2,060    2,787    3,450     3,725
26    0,256   0,856   1,315   1,706   2,056    2,779    3,435     3,707
27    0,256   0,855   1,314   1,703   2,052    2,771    3,421     3,690
28    0,256   0,855   1,313   1,701   2,048    2,763    3,408     3,674
29    0,256   0,854   1,311   1,699   2,045    2,756    3,396     3,659
30    0,256   0,854   1,310   1,697   2,042    2,750    3,385     3,646
z     0,253   0,842   1,282   1,645   1,960    2,576    3,090     3,291

n ≥ 30: the t-distribution (Student distribution) converges towards the normal distribution
α: level of significance (probability of error)
v = n - 1 (n: sample size)
E.g.: sample size of 10 and two-sided confidence of 95%: t_{n-1; 1-α/2} = t_{9; 0,975} = 2,262
