Applied Statistics Part 2: mathematical statistics, r c-cross tables and nonparametric methods.

Size: px

Start display at page:

Download "Applied Statistics Part 2: mathematical statistics, r c-cross tables and nonparametric methods."

Cornelia Cunningham
5 years ago
Views:

1 Applied Statistics Part : mathematical statistics, r c-cross tables ad oparametric methods. Preface This part of the reader is based o a secod year bachelor course Mathematical statistics with applicatios i The Netherlads. It provides a repetitio of the topics that are (should be) taught i the UTG course Probability ad Statistics ad exteds it with mathematical backgroud ad more statistical techiques, as idicated i the cotets (AS meas part of the cotet of applied Statistics ). Page Cotets 0-1 Formula sheet Descriptive statistics 1.1 Itroductio Numerical measures, histogram ad bar graph of data Classical umerical summary 1-1 AS 1.4 Outliers, Box plot ad Stem-ad-leaf plot Q-Q plots 1-18 AS 1.6 Exercises 1-6. Estimatio.1 Itroductio o estimates ad estimators -1. Compariso ad limitig behaviour of estimators -9 AS.3 Method of Momets ad method of Maximum Likelihood -16 AS.4 Exercises Cofidece itervals 3.1 Itroductio Cofidece iterval for the populatio mea μ Cofidece iterval for the variace σ Cofidece iterval for the populatio proportio p Predictio iterval 3-17 AS 3.6 Exercises 3-19

2 4. Hypothesis tests 4.1 Test o μ for kow σ : itroductio of cocepts Test o the populatio mea μ, if σ is ukow 4-15 AS 4.3 Test o the variace σ Test o the populatio proportio p 4-0 AS 4.5 The fudametal Lemma of Neyma ad Pearso 4-5 AS 4.6 Likelihood ratio tests 4-31 AS 4.7 Exercises Two samples problems 5.1 The differece of two populatio proportios 5-1 AS 5. The differece of two populatio meas 5-5 AS 5.3 Test o the equality of variaces 5-9 AS 5.4 Paired samples 5-1 AS 5.5 Exercises Chi-square tests 6.1 Testig o a specific distributio with k categories 6-1 AS 6. Chi-square tests for cross tables 6-7 AS 6.3 Fisher s exact test 6-15 AS 6.4 Exercises Choice of model ad No-Parametric methods 7.1 Itroductio 7-1 AS 7. Large samples 7- AS 7.3 Shapiro-Wilk`s test o ormality 7-4 AS 7.4 The sig test o the media 7-7 AS 7.5 Wilcoxo`s rak sum test 7-11 AS 7.6 Exercises The distributio of S ad related topics 8.1 A umber of distributios 8-1 AS 8. The distributio of the sample variace S 8-6 AS 8.3 Cosistecy of the sample variace S 8-8 AS 8.4 About the distributio of T = (X μ) (S ) 8-10 AS 8.5 Two samples problems: ormal case with commo variace 8-11 AS Overview of parametric ad o-parametric tests - The stadard ormal distributio - The t-distributio - The Chi square distributio - The F-distributio - Shapiro-Wilk`s table - The biomial distributio Tab-1 Tab- Tab-3 Tab-4 Tab-8 Tab-11 - The Poisso distributio Tab-14 Aswers to exercises A-1/5

3 0-3 Formula Sheet Mathematical Statistics Testig procedure i 8 steps 1. Give a probability model of the observed values (the statistical assumptios).. State the ull hypothesis ad the alterative hypothesis, usig parameters i the model. 3. Give the proper test statistic. 4. State the distributio of the test statistic if H 0 is true. 5. Compute (give) the observed value of the test statistic. 6. State the test ad a. Determie the rejectio regio or b. Compute the p-value. 7. State your statistical coclusio: reject or fail to reject H 0 at the give sigificace level. 8. Draw the coclusio i words. Probability Theory E(X + Y) = E(X) + E(Y) E(X Y) = E(X) E(Y) E(aX + b) = ae(x) + b var(x) = E(X ) (EX) var(ax + b) = a var(x) If X ad Y are idepedet: var(x + Y) = var(x) + var(y), var(x Y) = var(x) + var(y) var(t) = E(var(T V)) + var(e(t V)) Distributio Probability/Desity fuctio Rage E(X) var(x) Biomial (, p) ( x ) px (1 p) x 0, 1,,, p p(1 p) Poisso (μ) e µ µ x /x! 0, 1,, µ µ Uiform o (a, b) 1/(b a) a < x < b (a + b)/ (b a) /1 Expoetial (λ) λexp ( λx) x 0 1/λ 1/λ Gamma (α, β) x α 1 exp ( x β ) /(Γ(α)βα ) x > 0 α β α β Chi square (χ f ) is the gamma distributio with α = f ad β = Bouds for Cofidece Itervals: p (1 p ) p ± c X ± c S ( 1)S ( 1)S (, ) c 1 c X Y ± c S ( ), with S = S X S Y or: X Y ± c S X + S Y 1 p 1 p ± c p 1(1 p 1) 1 β i ± c se(β i) + p (1 p ) (regressio)

4 0-4 Predictio iterval: X ± c S (1 + 1 ) Test statistics X (umber of successes for a biomial) T = X μ 0 S S T = (X Y) Δ 0 S ( ), with S = S X S Y or: Z = (X Y) Δ 0 S X + S Y 1 p 1 p Z = p (1 p ) ( ), with p = X 1 + X 1 + F = S X S Y T = β i se(β i) (regressio) SS(regressio) k F = SS(error) ( k 1) (regressio) Aalysis of categorical variables 1 row ad k colums: χ = (N i E 0 N i ) k i=1 c E 0 N i r c cross table: χ = (N ij E 0N ij ) r j=1 i=1 E 0N ij (df = k 1), with E 0N ij = ad df = (r 1)(c 1). row total colum total No-parametric tests Sig test: X ~ B (, 1 ) uder H 0 Wilcoxo`s Rak sum test: W = R(X i ), 1 i=1 uder H 0 with: E(W) = 1 1(N + 1) ad var(w) = (N + 1) Test o the ormal distributio Shapiro Wilk`s test statistic: W = ( i=1 a ix (i) ) (X i X) i=1

5 1-1 Chapter 1 Descriptive statistics 1.1 Itroductio I the itroductory chapter we discussed the basic cocepts of probability theory ad statistics: I the course Probability Theory we leared how to model stochastic situatios i reality. - A variable ad its distributio form the probability model of a real life situatio. The distributio is ofte a family of distributios, with ukow parameters. - The observatios x 1, x,, x is a realizatio of a radom sample X 1, X,, X : X 1, X,, X are idepedet ad idetically distributed (accordig the same distributio). - The radom sample eables us to umerically determie (estimate) the parameters or probabilities or expectatios such as P(X = x), P(X x) or E(X ). Example The ukow temperature μ i a garbage icierator caot be measured exactly. That is why the temperature is measured several times ad we observe temperatures x 1, x,, x, which, because of lack of a accurate method, are differet, but are supposed to be close to the real value μ. A atural method to estimate µ from the repeated measuremets is to compute the observed mea temperature x = 1 i=1 x i. The idea to estimate a ukow value i this way, based o repeated observatios, seems obvious owadays, but the cocept was itroduced ot more tha 400 years ago. Why is the mea the best way to combie multiple observatios as to estimate a ukow value? Below we give two reasos from a data-aalytic poit of view. (1) If the observatios x 1, x,, x all give a idicatio of the real, but ukow value μ, the we could estimate μ with a value a, such that the differeces x i a are as small as possible. The differeces x i a are the so called residuals: they are either positive or egative (or 0). If we choose a to be the estimate such that the sum of all residuals is 0, the we fid a = x: From i=1 (x i a) = i=1 x i i=1 a = i=1 x i a = 0 we fid: a = 1 x i=1 i. () Aother logical approach to fid the ukow μ from the observatios x 1, x,, x is to compute a value a such that i=1 (x i a), the sum of squared residuals, is as small as possible. This least-squares-estimate is, agai, x: (x i a) = (x i x + x a) i=1 i=1

6 1- = (x i x) + (x a) (x i x) + (x a) i=1 i=1 i=1 The secod term is 0, sice i=1 (x i x) = x i x = x x = 0, so we have: (x i a) = (x i x) + (x a) i=1 i=1 Sice the last expressio cosists of (o-egative) squares ad the x i s are give, the expressio attais its smallest value if the last square is 0, so if a = x. i=1 The priciples sum 0 of residuals i (1) ad the least squares i () are part of the domai of Data aalysis. The mea seems a reasoable measure to obtai the cetre of a collectio of observatios. A formal justificatio to use x as estimate for μ, ca be give if we defie a relatio betwee the observatios x 1, x,, x ad μ. The relatio is established by assumig that x 1, x,, x are observed values of radom variables X 1, X,, X, which are idepedet ad all have expectatio μ ad variace σ. The crucial step to itroduce probability models for observatios is attributed to Simpso (1755), i the followig way: X i = μ + U i where U 1, U,, U are idepedet ad all have expectatio 0 ad variace σ. I this approach, accordig to the classical statistics, U i is the measuremet error (i the model!) for the i th observatio. We do ot observe U 1, U,, U : they are merely variables i the model. We oly observe the values x 1, x,, x of X 1, X,, X. I this model we ca describe what we mea by givig a good estimate of μ by computig the mea of the observatios : Sice E(X i ) = E( μ + U i ) = μ + E(U i ) = μ ad E(X) = μ, we have E(X μ) = var( X ) = σ = E(X i μ) So the expected quadratic differece betwee X ad μ is a factor 1 smaller tha for a sigle observatio (the variace (X i μ) )

7 1. Numerical measures, histogram ad bar graph of data 1-3 Example 1..1 I a survey a group of studets is asked to aswer the followig questios: - What is the colour of your eyes? Possible aswers: dark brow, grey, blue, light brow ad gree. - How politically active are you? Possible aswers: ot at all, little, average, much, very much - What is your legth i cetimetres? Usually, if software (such as SPSS) is used to process the data, the data are coded. I this case: - Colour of eyes: dark brow = 0, grey = 1, blue =, light brow = 3, gree = 4, - Political activity: ot at all = 1, little =, average = 3, much = 4, very much = 5. - Legth: the umber of cetimetres. The the result (, 4, 169) refers to a studet i the survey who had blue eyes, is quite politically active ad has a legth of 169 cm. Note that the first two umbers do ot have a umerical meaig: if we wat, we could as well use the triple (blue, much, 169) istead of (, 4, 169). I SPSS both otatios ca be preseted. I example 1..1 we have differet kids of variables. Legth, for istace, is a quatitative (or umerical) variable: if the studet is arbitrarily chose from a populatio, oe could iterpret the legth as a realizatio of a cotiuous radom variable X. If we determie the legth of a group of studets (a radom sample), we could compute the mea legth of the group to estimate the mea legth i the populatio. Legth measuremets have a iterval-scale. The other two variables are categorical variables. The values are (apart from the codig) oumerical, we distiguished categories of studets with respect to their eye colour ad their political activity: if we use the codig to compute the mea eye colour this umber has o meaig. The two categorical or qualitative variables have differet scales: Political activity is scored o a ordial scale: the possible aswers are ordered from ot at all to very much (i this case), from small to big, etc. The eye colour is a variable with a omial scale: there is o order of the categories possible or desirable. For categorical variables we caot determie the mea. But determiatio of the sample mode, the most frequetly occurrig category, is possible. For the ordial variable we ca i additio determie the sample media: the category for which the cumulative percetage 50% is attaied. Furthermore we otice that, for the sample as a whole, radom variables (sample variables) ca be defied. The mea of the observed legths is a example, but for categorical variables we could cout the umber of evets, such as the umber of blue-eyed studets i the sample. If coditios are met (idepedece) we ca apply the biomial distributio for this umber. For this goal we ca defie a ew variable Blue-eye, which is 1 if the studet has blue eyes ad 0, if ot. The sum of these variables is the biomial umber, where p = probability of blue eyes.

8 1-4 Returig to the quatitative variables (observed o a iterval-scale), we are goig to discuss the commo graphical presetatio of the sample observatios is x 1, x,, x. For discrete variables this is the bar graph of relative frequecies. Ad for cotiuous variables a histogram. Example 1.. We presume that the umber of washig machies that a salesma sells i oe week is Poisso distributed. But the parameter, the expected umber μ of sold washig machies per week, is ukow. The salesma recorded the followig umbers of sold washig machies i oe year ( = 5 weeks). The umbers are preseted i a frequecy table: Sales umber x Total Frequecy (umber of weeks) (x) = 5 A suitable estimate of the expected sales umber ( the log term average umber of sold washig machies ) is the mea of the 5 weekly sales umbers. Sice the sales umbers vary from 0 to 7 ad some umbers occur more ofte tha other umbers, we ca compute the weighted average of the values of x, usig the relative frequecies: f (x) = (x), so the estimate of μ 4 x = x f 5 (x) = x This computatio is (ot coicidetally) similar to the computatio of the expectatio: E(X) = x P(X = x). x The remaiig issue is whether the Poisso distributio applies. A idicatio ca be give by comparig the relative frequecies tot the Poisso probabilities for the estimated μ =.7. Below we show the umerical compariso ad the bar graphs of both the relative frequecies ad the Poisso probability fuctio: Number x > 7 Total Rel. freq. f 5 (x) P(X = x) if μ =

9 1-5 The umerical ad graphical comparisos show that the distributios are roughly the same: differeces could be explaied from stochastic variatio : if the distributio is really Poisso, the observed values are quite commo. Later i this reader we will be able to assess whether the differeces betwee probabilities ad relative frequecies are statistically sigificat. I example 1.. we showed a bar graph: it ca be cosidered to be a experimetal (estimated) probability fuctio. For cotiuous (or iterval) variables the histogram is the experimetal aalogue of the desity fuctio. To costruct a histogram the measuremets x 1, x,, x are grouped ito itervals, usually of equal width. The umbers of observatios i each iterval are preseted i a frequecy table. I the graph above each iterval a rectagle is erected. The height of the rectagle ca be either the frequecy, the relative frequecy ( frequecy of the iterval that the area of the rectagle equals the relative frequecy: relative frequecy = area = height width, for each iterval. ) or the height is chose such The latter presetatio of the histogram follows the aalogue to the desity fuctio closest: the total area, beig the total relative frequecy, is 1, aalogously to the total probability 100% of the desity fuctio. Example 1..3 I the seveties, after the oil crises, the petrol cosumptio of cars was ivestigated. 3 Car models were tested: the distaces x 1, x,, x 3 i mile (1609 meter) per gallo (3.79 liter) were recorded. Below you ca fid the observed distaces: x 1 = 1.0, x =.8,, x 3 = To get a better view o the differeces i petrol cosumptio we ca order the observatios, from the smallest to the largest: The ordered observatios are called the order statistics ad their otatio is x (1), x (),, x (3) : x (3) meas the two but smallest observatio i the data set. If we choose to graph total rage of the observed values (the observatios are ragig from 10 to 35) with 10 itervals of width.5, we fid (usig SPSS):

1-6 Of course the choice of the umber of itervals is arbitrary.

Compare the first histogram to the oes below: oe is too rough, the other too detailed (empty itervals).

the shape of the histogram similar as the desired model distributio?

10 1-6 Of course the choice of the umber of itervals is arbitrary. We try to choose a umber of itervals such that itervals do ot too may or too few observatios. Compare the first histogram to the oes below: oe is too rough, the other too detailed (empty itervals). A histogram ca be used to check graphically whether a specific distributio, that we wat to use as a model for the variable, applies: is the shape of the histogram similar as the desired model distributio? I this case we might check whether the ormal distributio applies to the petrol cosumptio: as you ca see i the graph with the (adapted) desity ad the histogram, we caot uambiguously coclude that the ormal distributio applies: betwee 10 ad 8 the graph looks reasoably symmetric but a empty iterval ad the observatios betwee 30 ad 35 disturb this picture.

11 1-7 I example 1..3 we observed that the choice of the itervals ca ifluece the overall shape of the histogram. If software, such as SPSS, is used to create histograms automatically, oe should be aware of this pheomeo. Usually it is possible to chage the umber of the itervals (i SPSS it is). If ecessary you ca use a simple rule of thumb to determie the desired umber of itervals roughly. Rule of thumb for determiatio of the umber of itervals i a histogram: for observatios of a cotiuous variable a histogram with about (equally large) itervals is costructed. Orderig of observatios is helpful i descriptive statistics: these order statistics ca be used to determie the frequecy table easily ad will be used later o to determie percetiles. Defiitio 1..4 The order statistics x (1), x (),, x () is a order of the observatios x 1, x,, x such that x (1) x () x () The umber k is the rak of the observatio x (k). The cetre of a sequece of ordered observatios is, for a odd umber of observatios, the middle observatio: the sample media, or for short the media. If is eve, the observatios are i the middle; i that case the media is the mea of the middle two. For istace, i example 1..3 ( = 3) we have: media= x (16)+x (17) = = 19. x +1 ( if is odd ) Defiitio 1..5 The (sample) media is m = { 1 [x ( 1 + x ) ( 1 +1)] if is eve The sample media distiguishes the greater ad the smaller 50% of the observatios: it is the 50 th percetile of the data set. Percetiles are widely used to put a score i perspective. I The Netherlads the percetile score o the CITO-test (at the ed of the primary school) is well kow: a percetile score 98 of a pupil meas for istace that the 98 percet of the pupils i The Netherlads had a lower score ad percet had a higher score. Percetiles are also used to split the data set up ito 4 equally large subsets (usig the 5 th, 50 th ad 75 th percetile) or to determie the top 1%, usig the 99 th percetile. The 5 th percetile is idicated as the lower quartile Q 1 (or: Q L ), the media m is the secod quartile (Q ) ad the upper quartile Q 3 = the 75 th percetile (or: Q U ). I geeral the k th percetile of ordered observatios meets the followig coditios: - at least k% of the observatios are less tha or equal to the k th percetile ad - at least (100 k)% is greater tha or equal to the k th percetile. This defiitio allows, however, a multiple choice of the k th percetile i some cases.

12 1-8 Example 1..6 We retur to the 3 observed petrol cosumptios i example 1..3: rak: rak: We will use the percetile defiitio above: The 10 th percetile ca be determied by computig 10% of = 3: = 3., so x (4) = 14.3 is the 10 th percetile, sice 4 of the 3 observatios are less (4 is more tha 10%) ad % are at least The lower quartile Q 1 is the 5 th percetile: 5% of 3 is 8. But Q 1 is ot simply x (8), sice x (9) distiguishes the 5% smallest ad 75% largest observatios as well. Similar to the approach used for the media for eve, we will use the mea of these two cadidates: Q 1 = x (8)+x (9) Check o the media, beig the 50 th percetile: 50% of 3 is 16, so x (16) ad x (17) are both = = cadidates: m = = 19.. Correct! Computatio of Q 3 : 75% is 3 is 4 observatios, so Q 3 = x (4)+x (5) =.8+.8 =.8. Computatio of the top 10% of the observatios (the 90 th percetile): 90% of 3 is 8.8, so the 90 th percetile is x (9) = The top 10% cosists of the observatios 30.4 ad larger. Without formal defiitio we foud a uivocal method to determie the k th percetile of observatios x 1, x,, x : Compute k% of : c = k 100. If c is ot iteger, roud c upward to the first larger iteger [c]: the k th percetile is x ([c]).) If c ìs iteger, the the k th percetile = x (c)+x (c+1). It should be oted that statistical software does ot always use the defiitio above to determie quartiles ad percetiles. This could result i small deviatios to the percetiles that we compute by had. Some books use quatiles to deote percetiles. Measures for the cetre. Besides the sample mea we discussed two alterative measures for the cetre. 1. The sample mea x = 1 x i i=1.. The media m: the middle observatio (defiitio 1..5). 3. The mode: the most frequetly occurrig observatio.

13 1-9 The mode is ot used ofte i practise: it applies mostly to discrete variables with few values ad for a histogram the mode is iterpreted as the most frequetly occurrig iterval, e.g. the modal salary is the iterval of salaries that occur most frequetly i a populatio, i the histogram visible as the highest peak. Media ad sample mea (ad mode) are approximately the same for symmetric distributios ad histograms, but if the distributio is o-symmetric the differeces ca be large. The graphs below show that, if the distributio (of observatios or of a variable), has a tail to the right, the x > m: the mea is strogly iflueced by (very) large observatios, but the media is ot. The media is said to be resistat, ot sesitive for extreme observatios (outliers). Similar to the situatio that the graph is skewed to the right, we have x < m if the graph is skewed to the left (a tail o the left). Measures for variatio We wat to characterize variatio (or spread or variability or dispersio) of observatios with just oe umber: if oe data set has a larger measure of spread tha the other, the the mutual differeces of the first set should be larger ad if the measure is 0, it would preferably mea that there are o differeces: all observatios are the same. It seems reasoable to cosider differeces to the overall mea x. x 1 x x x 3 x 4 x 4 x Similar to the defiitio of the variace for distributios we will ot use the mea of the distaces x i x as a measure, but the mea of the squared differeces. Defiitio 1..7 The sample variace of the observatios x 1,, x ca be computed by:

14 1-10 s = 1 1 (x i x) Of course we should have at least = observatios: oe observatio does ot give ay iformatio about variatio. i=1 We do ot divide the sum of squares by but by 1, which is called the umber of the degrees of freedom : for fixed mea x, we ca freely choose 1 umbers, but the last value x will deped o the 1 choices. Furthermore we will see i chapter that the factor 1 1 i the formula is ecessary to make s a ubiased estimate of the populatio variace σ. For observatios we ca compute the stadard deviatio like i probability theory. Defiitio 1..8 The sample stadard deviatio of x 1,, x is s = s Stadard deviatio ad variace are, as before, exchageable measures for variatio ad have similar properties: s ad s are o-egative ad oly equal to 0 if all observed values are the same. Note the similarities ad differeces: Measures for the cetre for the variatio (discrete) distributio E(X) = x P(X = x) x σ = E(X μ) = (x μ) P(X = x) x σ = σ Data set x = 1 x i = x i 1 i=1 i=1 s = 1 (x 1 i x) i=1 = (x i x) 1 1 i=1 s = s Applied to probability distributios the mea weighs every value x with its probability P(X = x). For data sets all observed values are equally importat (factor 1 ad 1 1, respectively). Measures for variatio: 1. The sample variace s 1 = (x 1 i=1 i x),. the sample stadard deviatio s = s ad 3. the Iter Quartile Rage IQR = Q 3 Q 1 I the iterval (Q 1, Q 3 ) there are (about) 50% of the observatios, the middle 50% of the sample distributio : the IQR is the width (rage) of this iterval. Example 1..9 (cotiuatio of the examples 1..3 ad.5 w.r.t. the petrol cosumptio of cars) We determied (Q 1, Q 3 ) = (15.35,.8). Sice there are two observatios.8, 15 of the 3

15 1-11 observatios are cotaied i the (ope) iterval, a little less tha 50%. The iter quartile rage IQR = Q 3 Q 1 = = We ca compute the mea ad the variace of the 3 observatios, to fid a simple, frequetly used umerical summary i statistics: = 3, x ad s Or, equivaletly, agai i two decimals: = 3, x ad s 6. 0 You should be able to eter the data oce i your scietific calculator (o GR) i sd- or STATmode, fidig mea x ad stadard deviatio s immediately. Chebyshev s rule for all distributios ad the Empirical rule for moud shaped distributios apply to both probability distributios (μ ad σ ) ad data distributios (x ad s ). For the 3 petrol cosumptios we computed the itervals (x k s, x + k s) with k = 1,, 3: Iterval (x s, x + s) = (14.07,6.11) Proportio of the observatios Proportio accordig to Chebyshev Proportio accordig to Empirical rule = 75% 0 68% (x s, x + s) = (8.05, 3.13) 94% 75% 95% (x 3s, x + 3s) = (.03, 38,15) 100% 89% 99.7% Chebyshev is (as always) fulfilled ad as a cosequece of oly small deviatios from the ormal distributio, the proportios of the observatios ad the empirical rule are almost the same. I Probability Theory we discussed that the Empirical rule is based o the ormal distributio. If we have a radom sample take from the ormal distributio (or a approximately ormal distributio) the Empirical rule should apply: the larger, the closer the observed proportios should be to the percetages accordig to the Empirical rule. Chebyshev`s rule applies to ay data set, o matter what shape the distributio has: it is a cosequece of Chebyshev s iequality give i property Property (Chebyshev`s rule) For ay set of observatios x 1,, x the proportio of observatios withi the iterval (x k s, x + k s) is at least 1 1 k. This iequality is iformative for all iteger ad ratioal umbers k, larger tha 1 ( k > 1). Applyig z-scores to observatios: remember that i probability theory we computed probabilities for a N(µ, σ )-distributio of X usig the z-score = P (Z x μ σ useful as well. x μ σ, for istace P(X x) = ), to be foud i stadard ormal table. For observatios stadardizatio ca be

16 1-1 x x Defiitio.1.11 If the sample is x 1,, x, the z-score of a observatio x is z = s The iterpretatio of a z-score is straight forward: A z-score 3 meas that the observatio x is three stadard deviatios less tha the sample mea (quite extreme accordig to the empirical rule): x = x 3 s. z = 1.4 meas: x is 1.4 stadard deviatios larger tha x: x = x s. 1.3 Classical umerical summary For a probability distributio of a radom variable X we kow the measures of cetre ad variatio are E(X) = μ ad var(x) = σ (or stadard deviatio σ X ). Ad x ad s (or s) are similarly defied as the correspodig measures for observatios x 1,, x. I this sectio we add two more measures: a measure for skewess (o-symmetry) ad the kurtosis, a measure for the thickess of the tails of the distributio. As before, we will use the similarities of the measures i probability ad i statistics. E(X k ), the k th momet of X for k = 1,,, has bee used i probability theory. E(X μ) k is called the k th cetral momet of X. - The first cetral momet (k = 1) is always 0: E(X μ) = E(X) μ = 0 - The secod cetral momet (k = ) is per defiitio the variace: E(X μ) = var(x) - The third cetral momet E(X μ) 3 gives iformatio about the symmetry of the distributio: if the distributio is symmetric, such as the ormal ad the uiform distributio, this cetral momet is 0. If the distributio is skewed to the right (e.g. expoetial), it is positive. Ad egative if the distributio is skewed to the left. - The fourth cetral momet E(X μ) 4 is larger if the tails of the distributio are thicker. Oe ca easily verify that the E(X μ) 3 ad E(X μ) 4 deped o the chose uit of measuremet: that is why the cetral momets should be divided by σ 3 ad σ 4, to make them idepedet of scale. E(X μ)3 γ 1 = σ 3 is the skewess (coefficiet) of X γ = E(X μ)4 σ 4 is the kurtosis of X

17 1-13 For some importat distributios the values are give i the table below: Populatio distributio Measure for U(a, b) N(μ, σ ) Exp(λ) cetre μ a+b 1 μ λ variatio σ (b a) σ 1 1 λ skewess γ tail thickess γ As the table shows γ 1 ad γ ideed do ot deped o expectatio or variace of the distributio: γ 1 ad γ do ot deped o a ad b, μ ad σ ad λ, respectively. The skewess coefficiet 0 ad the kurtosis 3 will be used, from ow o, as the referece values of the ormal distributio. The referece values of the expoetial distributio ( ad 9) are larger: the positive skewess coefficiet reflects the o-symmetry ad strog skewess to the right of the expoetial desity fuctio. The kurtosis 9 meas that the tail of the expoetial distributio is much thicker tha the ormal oe. If we compare the desity fuctios of the expoetial ad ormal distributio, simplified e x versus e 1 x, it is clear that for large x the ormal desity fuctio coverges to 0 more rapidly. The uiform distributio has the smallest kurtosis: the tails just break off at a ad at b. Now that we kow the probability-theoretical formulas of γ 1 ad γ we ca costruct estimates, based o the sample observatios x 1,, x. We will use the followig estimates: - A estimate of E(X μ) 3 is the mea of the values (x i x) 3, so: 1 (x i x) 3 i=1. - Sice σ = E(X μ) a estimate is the mea of the squares: 1 (x i=1 i x). So γ 1 = E(X μ)3 could be estimated: divide 1 (x σ 3 i=1 i x) 3 by [ 1 (x i=1 i x) ] 3/. Similarly γ = E(X μ)4 is estimated by σ 4 1 i=1 (x i x) 4 [ 1 i=1 (x i x) ]. Note that we did ot use s to estimate σ, but the formula with the factor 1.. I the followig defiitio all relevat measures for a data set are combied:

18 1-14 Defiitio The classical umerical summary of observatios x 1,, x cosist of: Sample size Sample mea x = 1 x i Sample variace s = 1 1 (x i x) Sample stadard deviatio i=1 s = s Sample skewess coefficiet b 1 = Sample kurtosis b = i=1 1 i=1 (x i x) 3 [ 1 i=1 (x i x) ] 1 i=1 (x i x) 4 [ 1 i=1 (x i x) ] The skewess coefficiet gives iformatio about the symmetry of the distributio: - if the distributio is symmetric, such as the ormal ad the uiform distributio, its value is 0, so for a sample take from a symmetrical distributio it should be close to 0. - If the distributio is skewed to the right (e.g. expoetial), it is positive. A positive value of the sample skewess idicates skewess to the right. - Negative values of the skewess idicates that the distributio is skewed to the left. The kurtosis attais larger values if the tail (or both tails) of a distributio is thicker, meaig that the probability of (very) large or small values is large. Example 1.3. Referrig to the petrol cosumptios of cars, itroduced i example 1..1, we computed the followig classical umerical summary: Sample size = 3 Sample mea x 0.09 Sample variace s Sample stadard deviatio s 6.03 Sample skewess coefficiet b Sample kurtosis b.83 Assessig this summary: the skewess coefficiet is positive ad closer to 0 (ormal referece value) tha to (expoetial), so the observatios are slightly skewed to the right. The kurtosis.83 is close to the ormal referece value 3. The histogram i example 1..3 cofirms the slight skewess to the right. So the umerical summary idicates a preferece for the ormal model over the expoetial alterative, but we caot fully choose the ormal distributio as the oly possible model for the petrol cosumptios. 3/

19 1-15 How the umerical summary is used might be clear from the previous example: if, for example, it is presumed that the ormal distributio applies to the sample, take from a populatio, the x ad s estimate μ ad σ. But before applyig the assumptio of ormality the observed skewess ad kurtosis have to be compared to the theoretical values 0 ad 3. What is cosidered as sufficietly close to 0 or 3, is relatively arbitrary, but ofte, like i SPSS, a stadard error (estimatio of the stadard deviatio) of the observed skewess ad kurtosis is provided: if the observed value does ot deviate more tha stadard errors from the referece values, there is o reaso to doubt the presumed distributio. The graph (histogram) could give some additioal iformatio. If the expoetial distributio is presumed, the histogram has to be skewed to the right: Furthermore x ad s are estimates of μ = 1 ad λ σ = 1 λ, so we should have x s. The sample skewess coefficiet ad kurtosis should be close to the referece values ad 9. We ote that software, such as SPSS, ofte uses the adjusted kurtosis: the referece value for the ormal distributio is set to γ 3 = 0. Istead of the sample kurtosis b the adjusted value b 3 is preseted i umerical summaries. 1.4 Outliers, Box plot ad Stem-ad-leaf plot Besides the triple umerical summary, x ad s (or s) or the exteded classical umerical summary, sometimes resistat measures, such as media ad IQR, are used as a alterative, especially whe the data set is skewed or has outliers. Media, quartiles ad iter quartile rage are or sesitive for outliers or tail behaviour. Defiitio The 5-umbers-summary of x 1,, x is x (1), Q 1, m, Q 3 ad x (). This summary is graphically preseted as a so called box plot: lie of umbers: x (1) Q 1 m Q 3 x ()

1-16 The box cotais the middle 50% of the observatios ad has a legth equal to the IQR. The whiskers are at the positio of the smallest ad the largest observatio, x (1) ad x ().

20 1-16 The box cotais the middle 50% of the observatios ad has a legth equal to the IQR. The whiskers are at the positio of the smallest ad the largest observatio, x (1) ad x (). Above we preseted a horizotally positioed box plot ad a horizotal lie of umbers, but most programs use vertical presetatios. Example 1.4. (cotiuatio of the examples 1..3, 1..5 ad 1..8). The maximum, miimum ad the quartiles of the petrol cosumptios have bee determied already: the 5-umberssummary 10.4, 15.35, 19.,.8 ad 33.9 could also be determied by SPSS, or we could directly graph the box plot, which is show alogside. Is the largest observatio 33.9 extremely large? Is it a outlier? The 1. 5 IQR-rule is a simple rule to determie whether observatios are suspect : observatios at least 1.5 IQR larger tha the third quartile or at least 1.5 IQR less tha the first quartile. I this example Q 1 = ad Q 3 =.8, so IQR = = Outside the iterval (Q IQR, Q IQR) = ( , ) (4.18, 33.98) All observatios are cotaied i the iterval, so o outliers i this data set. I the followig graph we show how to preset a box plot with outliers accordig to the 1.5 IQR-rule (sometimes referred to as the boxplot method ): outliers are idicated with a asterix (*), oe o the left ad two o the right, the whiskers are positioed at the smallest ad largest of the remaiig observatios. * * * x (1) x () Q 1 m Q 3 x ( ) x ( 1) x () 1. 5 IQR IQR IQR Defiitio The 1. 5 IQR-rule for determiatio of outliers: observatios outside the iterval (Q IQR, Q IQR) are outliers.

21 1-17 Outliers (Dutch: uitschieters) are cosidered to be suspect, potetially false observatios. Relatively large or small observatios could be perfectly ormal observatios for a populatio, just caused by chace (stochastic variatio). O the other had data sets sometimes cotai false or eve impossible observatios. Oly if we are sure that a mistake has bee made (a mismeasuremet), we will remove a outlier from the data set ad we will adjust the data aalysis to the remaiig observatios. There are several alterative methods to detect outliers, such as: the 3 IQR-rule: SPSS calls observatios outside the iterval (Q 1 3 IKA, Q IKA) extreme. the 3 s-rule: observatios outside tolerace bouds x 3s e x + 3s are potetial outliers. From the empirical rule we kow that for symmetric, moud shaped distributios these values will occur at a rate of oly 0.3%. Stem-ad-leaf plot Below we show the stem-ad-leaf plot of the 3 petrol cosumptios. It ca be composed easily with the order statistics or usig software such as the SPSS-program. Stem-ad-leaf Plot of Petrol cosumptio i mile per gallo 1 5 = 15 mile/gallo Frequecy Stem Leaf (13) This diagram presets the 3 observatios as follows: - The observatios ru from 10.4 to 33.9: the tes 10, 0 ad 30 are used as stem, ad otated as 1, ad 3. - The secod digit is the leaf, so the smallest observatio 10.4 is show as 1 0 : the decimal.4 is simply removed. Likewise: 33.9 is Sice we wat to divide the observatios i sufficietly may ( rule!) itervals, the stems are split up: for all tes we distiguish the small (0-4) ad large (5-9) leafs. - The leafs i each row are ordered from small to large. - The colum Frequecy couts the umber of observatio for the related stem. The media is cotaied i the stem where the frequecy is betwee brackets. Stem-ad-leaf plots ad histograms are both based o divisios i itervals ad their frequecies or relative frequecies. A histogram shows a clear relatio with the correspodig desity fuctio, but both show the overall shape, peaks, symmetry ad empty itervals. I box plots symmetry is visible: the smallest quarter ad largest quarter of the observatios, ad

22 1-18 secod ad third quarter should have similar rages for symmetrical distributios. Furthermore box plots provide iformatio about outliers. I geeral outliers accordig the 1.5 IQR- or the 3s-rule occur more frequetly (are more probable) if the uderlyig distributio is skewed: i this respect outliers might be cosidered as a idicatio for o-ormality. I exercise 5 we will show that, for a ormal distributio, the theoretical probability of a outlier accordig the 1.5 IQR-rule is approximately 0.7%. Cosequetly, i a radom sample draw from a ormal distributio we expect o outliers if the sample size = 10: the probability of o outlier is %. But if = 1000 the expected umber of outliers is 7 ad o outlier is ulikely: probability %. The larger the sample size, the likelier outliers will occur! 1.5 Q-Q plots Numerical summaries ad histograms ca be helpful i idetifyig the model that applies to a set of observatios ad thereby the distributio of the populatio from which the sample is draw. I this sectio we will discuss a additioal graphical techique to check whether a presumed distributio applies: Q-Q plots. Q-Q plot for the uiform distributio o (0, 1) Example If a series of, for istace, 9 arbitrary umbers x 1, x,, x 9 are observed ad we woder whether they origiate from a U(0,1)-distributio, the umbers should at least be betwee 0 ad 1. Subsequetly, we could order the umbers, from small to large: x (1), x (),, x (9) : if it is a radom sample from the U(0,1)-distributio we would expect them to be spread evely o the iterval. But what is the exact expected positio of the order statistics x (1), x (), etc.? The aswer to this questio ca be give if we defie a probability model of the observatios ad use probability techiques to determie the expected values. Model: X 1, X,, X 9 are idepedet ad all U(0, 1)-distributed. So f(x) = 1, if 0 < x < 1 ad F(x) = P(X x) = x, if 0 < x < 1.

x=0 10 Because of symmetry the expectatio of the smallest observatio is E(X (1) ) = 1 10. Similarly we ca fid the expected positio of all 9 order statistics: E(X (i) ) = i 10, i = 1,,, 9.

23 1-19 The distributio of the largest observatio X (9) = max(x 1,, X 9 ) ca be determied: F X (9) (x) = P(max(X 1,, X 9 ) x) = P(X 1 x ad. ad X 9 x) id. = P(X1 x) P(X 9 x) = x 9 So f X (9) (x) = d F dx X(9) (x) = 9x8, if 0 < x < 1 ad f X (x) (9) = 0, elsewhere. Now we ca compute E(X (9) ) = xf(x)dx 1 = x 9x 8 dx 0 = 9 x=1 10 x10 9 =. x=0 10 Because of symmetry the expectatio of the smallest observatio is E(X (1) ) = Similarly we ca fid the expected positio of all 9 order statistics: E(X (i) ) = i 10, i = 1,,, 9. See for more techical details ote Apparetly if we cosider 9 radom umbers betwee 0 ad 1, we ca plot 10 equal itervals of (0, 1): the expected values E(X (i) ) are positioed o the bouds of the itervals: If the radom sample of 9 umbers is produced by a radom umber geerator (usig a calculator or Excel), a uiform Q-Q plot of the poits (x (i), EX (i) ) ca be plotted: the expected values E(X (i) ) o the Y-axis ad the observed values x (i) o the X-axis. Below the result of 3 repeated simulatios of = 9 radom umbers is show. We expect that x (i) EX (i) : the poits ( x (i), EX (i) ) are expected to lie o the lie y = x, but due to stochastic variatio (fluctuatios, oise) deviatios from the lie will ievitably occur. For

1-0 istace, i the graph of simulatio 1 the smallest radom umber is greater tha 0.: the probability that this evet all 9 umbers greater tha 0. occurs is 0. 8 9 13.

24 1-0 istace, i the graph of simulatio 1 the smallest radom umber is greater tha 0.: the probability that this evet all 9 umbers greater tha 0. occurs is %, oce i 7 repetitios of the simulatio. The deviatios from the lie y = x ted to be smaller as the sample size icreases: Reversely, if the observatios illustrate that the observatios show a fairly straight lie i the uiform Q-Q plot, oe ca coclude that this uiform distributio applies. A uiform Q-Q plot is a graph of poits (x (i), EX (i) ), where the ordered observatio x (i) is the X-co-ordiate ad its expected value E(X (i) ) = the Y-co-ordiate (i = 1,.., ) i +1 accordig to the U(0,1)-distributio is Expoetial Q-Q plot A expoetial Q-Q plot is a graph of poits (x (i), EX (i) ) of ordered observatios x (1),, x () o the X-axis ad their expected values E(X (i) ) accordig to the expoetial distributio o the Y-axis

1-1 The expectatios EX (i) ca be computed exactly after determig the distyributio of the order statistics for a specific distributio, as show i ote 1.5., but we will use the aproximate SPSSapproach.

25 1-1 The expectatios EX (i) ca be computed exactly after determig the distyributio of the order statistics for a specific distributio, as show i ote 1.5., but we will use the aproximate SPSSapproach. This approach uses the expected values for ordered observatios, draw from the +1 uiform distributio o (0,1): divide the iterval (0, 1) ito + 1 equally wide subitervals with legth 1. But the expoetial distributio, show i the graph below, has a ifiite rage +1 1 (0, ): we ca split this iterval ito subitervals, all with probability, as is show i the graph for = 9 observatios. i +1 (Note that the theoretical value of E(X (i) ) is slightly differet: we adopted SPSS s approximate method to simply determie estimates of the expected values, see ote 1.5. below). The graph illustrates that i geeral P (E(X (i) ) X E(X (i+1) )) 1 or P( X E(X (i) ) ) i +1 +1,, for i = 1,,. Usig the expoetial distributio fuctio F(x) = P(X x) = 1 e λx (x > 0), the value of x i the graph above ca be computed: F(x) = 1 e λx = 0.8, so x = l (0.)/λ, expressed i the ukow λ. More geeral: F (E(X (i) )) i + 1, so E(X (i)) F 1 i ( + 1 ) The example where = 9 ad i = 8 is illustrated i the graph of the distributio fuctio F:

26 1- Sice F 1 (x) = l(1 x), we ca express the estimate of E(X λ (i) ) i a formula with λ: i F 1 i l (1 ( ) = + 1 ) (i = 1,,, ) + 1 λ The scale parameter λ is ukow, but give the observatios we ca estimate the value of λ: the mea x estimates E(X) = 1, so λ ca be estimated by 1. Sice we use estimates for the λ x value of λ, the expected values i the expoetial Q-Q plot are estimates as well. We simulated a expoetial distributio as to see how the expoetial QQ-plot looks like, if the populatio is really expoetial. I the Q-Q plot of = 9 observatios i SPSS the observed value x (i) are placed o the X-axis ad the expected expoetial values E(X (i) ) o the Y-axis: The poits are quite close to the lie y = x: apparetly the deviatios are caused by atural variatio. Such a Q-Q plot would cofirm the assumptio of a expoetial distributio. Note 1.5. The approach we chose above is the same as SPSS, but it should be oticed that the expected values of the order statistics are approximated. If, for example, we have a radom sample of = 9 is draw from a expoetial distributio, the the smallest observatio X (1) = mi(x 1,, X 9 ) has a expoetial distributio as well, with parameter 9 λ. So E(X (1) ) = 1 But the P (X 1 ) is ot exactly 0.1, as i the approach above, but: 9λ 1 e λ 1 9λ = 1 e I geeral the expected values of the order statistics of a radom sample X 1,., X, draw from a populatio with desity f ca be determied usig the followig desity fuctio! f X(i) (x) = (i 1)! ( i)! F(x)i 1 [1 F(x)] i f(x), x R 9λ.

We will start off with a stadard ormal Q-Q plot of ordered observatios X (i) o the X-axis ad the (estimates of) expected values E(X (i) ) o the Y-axis. So a plot of the poits (X (i), Φ 1 ( i +1 )).

27 1-3 The derivatio ad the formula is beyod the scope of this course: see text books o Probability Theory or Mathematical Statistics. Normal Q-Q plot A ormal Q-Q plot is a plot to check out the ormality assumptio of the data set. We will start off with a stadard ormal Q-Q plot of ordered observatios X (i) o the X-axis ad the (estimates of) expected values E(X (i) ) o the Y-axis. So a plot of the poits (X (i), Φ 1 ( i +1 )). The determiatio of the expected values is illustrated below for = 9 observatios. geerate Remember that Φ(z) = e 1 x dx π is the stadard ormal distributio fuctio, for which we have to cosult the N(0,1)-table with umerically approximated values. Cosider the k th percetile of the N(0,1)- distributio: if Φ(z) = k, the z = 100 Φ 1 ( k ). 100 The stadard ormal Q-Q plot alogside is costructed as follows: we used Excel to z 1 Commeted [MT(1]:

1-4 = 9 radom draws from the N(0,1)-distributio. The Q-Q plot cosists of 9 poits, where the X-co-ordiate is the observed x (i) ad the Y-coordiate the expected value for x (i) : EX (i) Φ 1 ( i ).

+1 The geeralizatio to a ormal Q-Q plot is easily made, sice from Probability Theory we kow that the lik betwee a N(μ, σ )- ad the N(0,1)-distributio is stadardizatio: X μ ~N(0,1).

28 1-4 = 9 radom draws from the N(0,1)-distributio. The Q-Q plot cosists of 9 poits, where the X-co-ordiate is the observed x (i) ad the Y-coordiate the expected value for x (i) : EX (i) Φ 1 ( i ). 10 For observatios the poits cosist of the observed x (i) ad the expected values Φ 1 ( i ). +1 The geeralizatio to a ormal Q-Q plot is easily made, sice from Probability Theory we kow that the lik betwee a N(μ, σ )- ad the N(0,1)-distributio is stadardizatio: X μ ~N(0,1). σ Or: if Z ~ N(0, 1), the X = μ + σ z ~ N(μ, σ ) I a ormal Q-Q plot the poits cosist of order statistic x (i) ad its (estimated) expected value μ + σ Φ 1 ( i +1 ). The parameters μ ad σ are ecessary to compute the expected values, but i geeral they are ukow: istead of μ ad σ we will use the estimates x ad s computed from the observatios x 1,, x. We kow that these estimates are sesitive for outliers: this sesitivity also applies to Q-Q plots, especially for small sample sizes. Iterpretatio of a ormal Q-Q plot (as before): if the poits do ot deviate from the lie y = x too much the assumptio of a ormal distributio is cofirmed. Example I example 1..3 the histogram showed some deviatios from the ormal distributio. The skewess coefficiet was 0.67, cofirmig slight skewess to the right. To support the choice of a model of the observatios we could assess both the ormal ad the expoetial Q-Q plot, preseted below with SPSS:

29 1-5 Commet: the ormal Q-Q plot shows a patter of relatively small deviatios from the lie y = x, which seems to be caused by the larger observatios, that are larger tha expected. The coclusio from the expoetial Q-Q plot is straightforward: the expoetial distributio does ot apply, sice there is a patter of large deviatios from the lie. I coclusio: the ormal distributio is the most likely of the two, but it is questioable whether the deviatios are explaied by atural variatio. For this kid of problems we will discuss a test o ormality i the last chapter. Note Sometimes o the X- ad Y-axis ot the observatios ad the expected values are preseted, but their z-scores: x (i) x o the Y-axis ad Φ 1 ( i ) o the X-axis. s +1 These trasformatios leave the overall shape of the Q-Q plot uchaged, but the lie y = x is trasformed accordigly.

30 Exercises 1. Compute the mea, the media, the variace ad the stadard deviatio of each of the followig data sets. Use a simple scietific calculator with data fuctios (o GR) ad roud your aswers i two decimals. a b c Suppose that 40 ad 90 are two (of may) observatios: their z-scores are ad 3, respectively. Ca you determie the mea x ad s from this iformatio? If so, do it. If ot, explai why ot. 3. (Quartiles of a ormal distributio) a. Determie the quartiles Q 1 ad Q 3 for the stadard ormally distributed radom variable Z. b. Determie the bouds for the 1.5 IQR-rule (for detectig outliers) applied to the stadard ormal distributio, resultig i a iterval (Q IQR, Q IQR). c. Compute the probability of a outlier for a stadard ormal distributio. d. Determie the probability of a outlier for a expoetial distributio. (f(x) = λe λx, x 0) owers of the ew electric car Nissa Leaf are willig to participate i a survey which aims to determie the radius of actio of these cars uder real life coditios (accordig to Nissa about 160 km). The owers reported the followig distaces, after fully chargig the car. The results are ordered. Furthermore a umerical summary ad two graphical presetatios are added. Oe of the questios to be aswered is whether the ormal distributio applies. I their evaluatio the researchers stated that the observatios ca be cosidered to be a radom sample of the distaces of this type of cars.

31 Numerical summary: Sample size 38 Sample mea Sample stadard deviatio 9.15 Sample variace Sample skewess coefficiet Sample kurtosis 3.09 a. Use the box plot method to determie outliers. b. Assess whether the ormal distributio is a justifiable model based o, respectively: 1. The umerical summary.. The histogram 3. The Q-Q plot What is your total coclusio? Repetitio Probability Theory 5. X 1,..., X are idepedet radom umbers betwee 0 ad 1 (every X i has a U(0, 1)- distributio). a. Determie E(X i ), var(x i ) ad show that for X i the kurtosis γ = 1.8. b. Determie the distributio fuctio (cdf) of X 1. c. Show that Y = l (X 1 ) has a expoetial desity fuctio ad determie E(Y). d. Derive the desity fuctio of Z = max(x 1,..., X ): f Z (z) = z 1, 0 < z < 1 The determie E(Z) e var(z). 6. X 1,..., X are idepedet ad all expoetially distributed with parameter λ, so f(x) = λe λx, x 0. Defie S = i=1 X i, X = 1 X i=1 i ad M = mi(x 1,, X ). a. Prove the formulas E(X 1 ) = 1 λ ad var(x 1) = 1 λ ad show that γ 1 = E(X μ)3 σ 3 =

32 1-8 b. Approximate P(S > 55) for = 100 ad λ =. c. Approximate P(X > 0.55) for = 50 ad λ =. d. Show that M has a expoetial distributio ad determie E(M) for = 10 ad λ =. 7. Determie the distributio of X + Y for idepedet variables X ad Y if (give type + parameter or desity fuctio or probability fuctio). a. X ~ Poisso(μ = 3) ad Y ~ Poisso(μ = 4). b. X ad Y are both U(0,1)-distributed. c. X ~ B(m, p) ad Y ~ B(, p). d. X ~ N(0, 81) ad Y ~ N(30, 144). e. X ~ Exp(λ = ) ad Y ~ Exp(λ = 3).

33 -1 Chapter Estimatio.1 Itroductio o estimates ad estimators I accordace with the Classic Statistics approach (see sectio 0.1) we will discuss i more detail what we mea by estimatig a populatio parameter, such as the populatio proportio p, the populatio mea µ ad the populatio variace σ. We will presume that they are fixed, but ukow values (ot variable). I chapter 1 we have oticed how, ituitively, estimates are used, based o radom samples. The sample mea x = 1 x i=1 i is a (poit) estimate of the populatio mea or expectatio μ = E(X), if x 1,, x are the observatios. The sample variace s = 1 (x 1 i=1 i x) is a estimate of the populatio variace σ = var(x) The sample proportio p = x is a estimate of the populatio proportio p (success rate). x is the observed umber of successes, a realizatio of the biomial umber X, which ca be writte as x = i=1 x i, where x i is the 1-0 alterative for each Beroulli trial (1 for success, 0 for failure). So: p = 1 = x i=1 x i Beside these stadard-estimates we ca costruct may differet estimates of other ukow parameters of distributios: ofte the choice of a estimate is made ituitively, but there are also some systematic methods of estimatio to determie estimates the parameters, as will be discussed i sectio.3. If there is a reasoable assumptio of the distributio at had (which ca be assessed by data aalysis of sample results), we are left with the problem of estimatig the ukow parameters. For example, if a ormal distributio is assumed, we ca use x ad s as estimates of μ ad σ. Example.1.1 Suppose, a radom umber geerator produces umbers from the iterval (0, b). We do ot kow the de parameter b, but a sample of four of these umbers, produced by the geerator is available: 4.1, 0.6,.9 ad 8.4. These are the (idepedet ad radomly chose) observatios: they ca be see as the umbers x 1, x, x 3, x 4, from the U(0, b)-distributio:

34 - Sice E(X) = 1 b is the populatio mea, this ukow value ca be estimated by: x = x i = = If the estimate of 1 b is 4.0, the the estimate of b is twice as large: 8.0 = x. But the largest observatio, 8.4 is larger tha this estimate, provig that this estimate is iadequate. Searchig for a alterative estimate we could choose the largest observatio, so with these measuremets we would fid max(x 1, x, x 3, x 4 ) = 8.4, as a alterative estimate of b. We kow that b is at least 8.4. A estimate is a umerical value that aims to be a good idicatio of the real value of a ukow populatio parameter: usually the estimate is give by a formula (e.g. the mea or the maximum of the sample variables), that ca be computed umerically if the sample results are available. I geeral terms we estimate a populatio parameter θ (e.g. µ, σ, p, λ or, i example..1, b) with a fuctio T(x 1,, x ), that depeds (oly) o the observatios x 1,, x, such as T(x 1,, x ) = 1 x i i=1 A fuctio T(x 1,, x ) is a statistic (Dutch: steekproeffuctie). or T(x 1,, x ) = max(x 1,, x ) If a (umerical) estimate is based o the result of a radom sample, oe could try to aswer the questio whether a estimate is good. Related questios are: - How large is the probability that a estimate T(x 1,, x ) substatially deviates from the populatio parameter θ? - What is the effect o deviatios if we icrease the sample size? - If we have differet cadidates for estimates ( or more methods), what is the best? For istace i example..1: is the maximum a better estimate of b tha twice the mea? To aswer this kid of questios we eed to retur to the probability model of the sample, which is give by the radom variables X 1,, X : idepedet ad all havig the same populatio distributio, that cotais the ukow θ. Defiitio.1. A estimator T of the populatio parameter θ is a statistic T(X 1,.., X ) A estimate t is the observed value T(x 1,, x ) of T. Note the differece i otatio: t = T(x 1,, x ) is the estimate (a umerical value) ad T = T(X 1,.., X ) is a estimator, a radom variable havig a distributio. Both are referred to as statistic, a fuctio of the (umerical) observatios x 1,, x or the variables X 1,.., X. Example.1.3 From a large batch of digital thermometers are arbitrarily chose ad assessed: the observed ad the real temperature should ot differ more tha 0.1 degrees. A model of the observatios cosists of the idepedet alteratives X 1,, X, where the success probability p = P(X i = 1) is the probability that a thermometer is approved (differece < 0.1).

35 -3 X i = { 1 if the ith thermometer is approved 0 if the i th thermometer is ot approved p is the (ukow) proportio of approved thermometers i the whole batch, with 0 p 1. Though the samplig is evidetly without replacemet, we will cosider the X i`s to be (approximately) idepedet, implicitly assumig that we have a relatively small sample take from a (very) large batch. Of course we will deote i this case p as ukow parameter, istead of the geeral otatio θ. Suppose = 10, ad the observed values of the alteratives X 1,, X 10 are 0, 1, 1, 0, 1, 0, 0, 1, 1, 0 With the sample proportio i mid we choose estimator of p: T(X 1,.., X 10 ) = 10 i=1 X i 10 Ad usig the observed results: estimate of p: t = T(x 1,.., x 10 ) = 5 10 As before we will use the compact otatio p as estimate, so p = 0.5. I example..1 we have T 1 = X ad T = max(x 1, X, X 3, X 4 ) as two differet estimators of parameter b: t 1 = x = 8.0 ad t = max(x 1, x, x 3, x 4 ) = 8.4 are the correspodig estimates. The examples above showed that: A estimator T = T(X 1,.., X ) is a radom variable that ca take o may values accordig its distributio. After executig the sample i practice the estimate t = T(x 1,.., x ) is oe of these values. (a realizatio of T). Aother executio of the (same) sample will provide aother estimate. For oe parameter several estimators ca be chose. For a fuctio T = T(X 1,.., X ) to be a estimator the oly coditio is that it should be computable, that is: it should attai a real value if the observed values x 1,.., x which are substituted i the fuctio T: t = T(x 1,.., x ). The broad defiitio of estimator does ot mea that we just ca choose ay estimator: i this chapter we will see that there are several criteria to choose the best. Furthermore we ote that some distributios have more tha oe ukow parameters, such as µ ad σ i the ormal distributio. I that case θ is a vector of parameters. Example.1.4 The IQ of a UT-studet (X) is modelled as a ormally distributed variable. µ ad σ, the expected ( mea ) IQ of a arbitrary UT-studet ad the variace of the IQ`s of UT-studets, are ukow populatio parameters: θ = (µ, σ ). A radom sample of 0 UT-studets is subjected to a stadard IQ-test ad the results are summarized as follows: = 0, x = 115. ad s = (x, s ) = (115., 81.1) is a pair of estimates of (µ, σ ). These estimates ca be used to compute

36 estimates of probabilities, e.g. the probability of highly gifted studet (IQ > 130): P(X > 130) = P (Z 130 μ ), where µ ad σ still are ukow, though we have estimates. σ -4 A estimate of P(X > 130) is P (Z ) 1 Φ(1.64) = 5.05% But how good are the estimates we used? To aswer this questio we retur to the probability model of the observed IQ`s: a probability model of the radom sample: X 1,.., X 0 are idepedet ad all have the same distributio as X, so a N(μ, σ )-distributio with ukow μ ad σ. The estimator of μ is the sample mea X = X 1+ +X 0, that, accordig property 0.. has a 0 N (μ, σ )-distributio. Cosequetly we ca coclude: 0 1) E(X) = µ, meaig that the value of X o average equals µ: o average implies the frequecy iterpretatio if we cosider may repetitios of the same radom sample. Therefore we will call X a ubiased estimator of μ. (Dutch: zuivere schatter). ) The variatio of X is expressed i var(x) = σ, so the variace of X is a factor 0 smaller tha the variace of the populatio (σ ): how large the variace is, is ukow, but σ s ca be estimated by = 81.1 = ) Accordig to the empirical rule, the probability that X attais a value i the iterval σ (μ, μ + σ ) is about 95%, where μ ad 0 0 σ are ukow, σ but µ is estimated by x = 115. ad by s = : σ (μ, μ + σ ) (110., 119.), a iterval estimatio of μ. 0 0 We foud that X is a ubiased estimator of µ, that attais values aroud μ. The deviatios decrease, as the sample size icreases. 0 A estimator T = T(X 1,, X ) of the populatio parameter θ, based o a radom sample take from the populatio distributio ca either be ubiased or ot. Defiitio.1.5 T is a ubiased estimator of θ if E(T) = θ. If T is ot ubiased, the the differece E(T) θ is the bias (Dutch: ozuiverheid) of the estimator: if E(T) > θ, the T is said to (structurally) overestimate θ ad if E(T) < θ, the T is said to uderestimate θ. If a radom sample is take from a o-ormal distributio with expectatio μ ad variace σ, the we caot state that X has a N (μ, σ )-distributio. This is, accordig to the CLT, oly approximately true, if is large.

37 -5 But for all we ca state: X is a ubiased estimator of μ sice E(X) = μ var(x) = σ : the variatio of X decreases as the sample size icreases. These properties of the sample mea are show graphically below: μ X var( X) The estimator S = 1 (X 1 i X) i=1 is a ubiased estimator of σ. This explais why we itroduced the formula of the correspodig formula for s 1 with a factor istead of 1. 1 To prove the ubiased-ess of S we will first expad the summatio (X i X) i=1 : (X i X) = [X i X i X + X ] i=1 i=1 = X i X i X i=1 i=1 + X i=1, where X i X = X X i i=1 i=1 = X X = X i X + X i=1 = X i X i=1 To compute the expectatio of (X i X) i=1, that is, express it i σ, we will use the variace formula var(x) = E(X ) (EX), so E(X ) = σ + μ Applied to the sample variace : var(x) = E (X ) (EX), so E (X ) = σ + μ E ( (X i X) i=1 ) = E ( X i X ) i=1 = E(X i ) E (X ) i=1 = [var(x i ) + (EX i ) ] i=1 [ σ + μ ]

38 -6 = σ + μ σ μ = ( 1)σ So: E(S ) = 1 1 E ( (X i X) ) = 1 1 ( 1)σ = σ. i=1 We showed that S is a ubiased estimator of σ. I sectio 0. we reported the Probability Theory result, that ( 1)S σ has a Chi-square distributio, if the radom sample is draw from a ormal distributio. This property will be coveiet whe costructig cofidece itervals ad tests o the variace σ. The third most frequetly used estimator is the oe to estimate the populatio proportio p: the sample proportio p = X, where X is the umber of successes i Beroulli-trials, or: if a proportio p of the populatio has a specific property, the X is the umber of elemets with this property i the radom sample: X has a B(, p)-distributio. p is a ubiased estimator of p sice E(p ) = E ( X ) = 1 E(X) = 1 p = p The variability of p (aroud p) decreases as the sample size icreases: var(p ) = var ( X ) = (1 ) var(x) = 1 p(1 p) p(1 p) = Summarizig the discussed properties i this sectio: Property.1.6 (Frequetly used estimators ad their properties) Populatio Variable X has a expectatio μ ad variace σ Alterative distributio P(X i = 1) = p P(X i = 0) = 1 p Populatio parameter θ Radom sample Estimator T Ubiased if E(T) = θ Variace var(t) μ X 1,, X X Yes, E(X) = μ σ X 1,, X S Yes, E(S ) = σ --- p X 1,, X X = X i p = X p(1 p) Yes, E(p ) = p X~B(, p) σ The estimators p ad X oly have a workable distributio if is sufficietly large ( 5): accordig to the CLT we have approximately: p ~ N (p, p(1 p) ) ad X ~ N (μ, σ ), where p ad σ are ukow. Oly if we kow (ca assume) that X, the populatio variable, is ormally distributed, we ca use a exact distributio of the sample mea: X ~ N (μ, σ ), for ay.

39 -7 Note.1.7 The variace of the sample variace S is omitted i the table above, but it ca be determied, if E(X μ) 4 exists: var(s ) = 1 [E(X μ)4 3 1 σ4 ] From this it follows that lim var(s ) = 0. For the special case of a radom sample, draw from the N(μ, σ )-distributio, we fid: E(X μ) 4 = 3σ 4 ad var(s ) = σ4 1 The latter formula ca be verified, usig property 0..9.b that ( 1)S ~ χ σ 1, so var ( ( 1)S ) = ( 1) or σ ( 1) σ 4 var(s ) = ( 1) or var(s ) = σ4 1 Below we will discuss some of the well-kow distributios, showig that ukow parameters ca ofte be estimated i a ituitive way. If X has a Poisso distributio with ukow parameter μ, we kow that E(X) = μ. So X is a ubiased estimator of μ, based o a radom sample X 1,, X. Suppose a radom sample delivers a mea x =.4, the.4 is the estimate of μ. Now that we have a estimate we ca also give estimates for probabilities w.r.t. X: sice P(X = 0) = e μ, the estimate of this probability is e.4 9.1%. If X has a geometric distributio with parameter p, we have E(X) = 1 p : X is a ubiased estimator of E(X), so p ca be estimated by 1. If, i a radom sample X of X, we eeded a mea of 10 trials to get a success, 1 is a estimate of p. 10 It ca be show that this estimator is ot ubiased: E (X 1 ) p. For = 1 it is easy: E (X 1 ) = E( 1 X ) = 1 x=1 (1 x p)x 1 p = p + 1 (1 p)p +.. > p. For > 1 we ca apply the egative biomial distributio of i=1 X i similarly. If a radom sample of radom umbers X i, take from the uiform distributio o the iterval (a, b) with both parameters a ad b ukow, is available, the a ad b ca be estimated by mi(x 1,, X ) ad max(x 1,, X ). Example.1.8 A radom umber chose from a iterval (0, θ) with ukow θ has a uiform distributio o the iterval. Give a radom sample X 1,, X 9 of ie of these radom umbers, the maximum va X 1,, X 9 is a estimator of θ. Is this maximum a ubiased estimator? Ituitively the aswer is: o, the maximum uderestimates θ, because for all X i we kow: X i θ We wat to show that the expectatio of the maximum is less tha θ.

40 -8 I the graph the distributio fuctio F(x) is sketched as a area: F(x) = P(X x) = x θ, 0 x θ We ca use this to fid the distributio fuctio of the maximum: P(max(X 1,, X 9 ) x) id. = P(X 1 x) P(X 9 x) = ( x 9 θ ), for 0 x θ The derivative of this distributio fuctio is the desity fuctio of the maximum: f max(x1,,x 9 )(x) = 9x8, if 0 x θ (ad the desity is 0 outside the iterval [0, θ]) θ9 So: E(max(X 1,, X 9 )) = x f θ max(x 1,,X 9 )(x)dx = x 9x8 x=θ 9x10 dx = 0 θ 9 10θ 9 9 = θ < θ x=0 10 So max(x 1,, X 9 ) is ot a ubiased estimator of θ: the bias is 9 θ θ = 1 θ Of course the discussio above ca easily be exteded to two idepedet samples, used i practice to compare the cetres or the variability of two populatios. For the first populatio with mea μ X ad variace σ X we have a radom sample X 1,, X 1 ad for the secod populatio with mea μ Y ad variace σ Y we have a radom sample Y 1,, Y. The estimator for the differece i meas, μ X μ Y, is straightforward: X Y is a ubiased estimator with variace var(x Y) id. = var(x) + var(y) = σ X + σ Y. 1 If the variaces are ukow we ca use the sample variaces, S X ad S Y respectively, to estimate the var(x Y) by S X + S Y. 1 A special case of a two samples problem we will ecouter i chapter 5: two idepedet samples, both draw from ormal distributios with equal variaces: σ X = σ Y = σ. What is the best way to estimate the commo variace σ of both populatios? The average S X +S Y is a ubiased estimator of σ, but ot the ubiased estimator with the smallest variace, if we cosider all (liear) ubiased estimators of type T = as X + bs Y. T is ubiased if b = 1 a, so the variace ca be expressed i a: var(t) = a var(s X ) + (1 a) var(s Y )

41 -9 I ote.1.7 we foud that var(s ) = σ4, so var(t) = 1 a σ4 + (1 1 1 a) σ4, which ca 1 be miimized as a fuctio of a: the derivative is a σ4 1 1 σ4 (1 a) = 0 if a = Sice the secod derivative is greater tha 0 we foud that T is ubiased ad havig the smallest variace if T = 1 1 S 1 + X + 1 S 1 + Y Property.1.9 (pooled sample variace) If X 1,, X 1 is a radom sample from a ormal distributio ad Y 1,, Y is a radom sample from a ormal distributio with the same variace σ, the the ubiased estimator as X + (1 a)s Y has the smallest variace if a = Notatio: the pooled sample variace S p or S i case of equal variaces is S = S X S Y Compariso ad limitig behaviour of estimators Estimators are better as the deviatios betwee the estimates ad parameters are o average smaller. Both the bias ad the variace of the estimator play a role i deviatios of T w.r.t. θ. But how ca we quatify the mea distace betwee the estimator T ad the parameter θ? Aalogously to the itroductio of the measure of variatio, that is var(x) = E(X μ), we will ot cosider the mea of the absolute differeces T θ, but the mea of the squared distaces (T θ). Defiitio..1 The Mea Squared Error of a estimator T of the parameter θ is: E(T θ). Short otatio: MSE, MSE(T) (Dutch: verwachte kwadratische fout) Note that MSE(T) = E(T θ) is ot the same as ar(t) = E(T ET), but if the estimator is ubiased, the mea squared error equals the variace of the estimator T: if E(T) = θ, the we have: E(T θ) = E(T ET) = var(t) If a estimator is ubiased it oly guaratees that estimates are o average (i case of repeated samples) close to θ, but it does ot guaratee that a give estimate is close to θ. The Mea Squared Error is used to compare estimators: Suppose T 1 (X 1,, X ) ad T (X 1,, X ) are both estimators of the parameter θ, the: T 1 is better tha T if the mea squared error of T 1 is less tha the mea squared error of T

42 -10 MSE(T 1 ) < MSE(T ) (Usually both MSE`s are expressed i the ukow θ ad the iequality holds for all possible values of θ. Formally T 1 is better tha T, if the iequality holds for at least oe value of θ ad the equality holds for other values.) If both estimators are ubiased, it is sufficiet to compare the variaces: If T 1 ad T are both ubiased, the T 1 is better tha T if var(t 1 ) < var(t ). If the estimators are ot ubiased, the T 1 is better tha T if MSE(T 1 ) < MSE(T ) Computatio of the MSE is simplified by the followig property: MSE(T) depeds o ET ad var(t): Property.. MSE(T) = (ET θ) + var(t) Proof: sice T θ = (T ET) + (ET θ), we have E(T θ) = E[ (T ET) + (T ET)(ET θ) + (ET θ) ] = E(T ET) + (ET θ)e(t ET) + (ET θ) (ET are θ are fixed umbers!) = var(t) (ET θ). This property reveals that the Mea Squared Error ca be split ito the bias ad the variace of T: MSE(T) = (ET θ) + var(t) Mea Squared Error of T = (bias of T) + variace of T Note that for a ubiased estimator, ET = θ ad thus MSE(T) = var(t). Example..3 Based o a radom sample X 1,, X 0, draw from a populatio with ukow expectatio μ ad variace σ, we cosider two estimators: T 1 = 1 10 X 10 i=1 i is the mea of the first 10 observatios. 0 T = 1 X 0 i=1 i is the mea of all 0 observatios. Both estimators are sample meas, so they are ubiased: the bias is 0, so we ca compare the variaces istead of the MSE`s. Sice var(x) = σ σ, we fid: = var(t 0 ) < var(t 1 ) = σ, so T 10 is better tha T 1. Example..3 illustrates a simple, ituitive rule: the larger the sample size is, the better the sample mea estimates µ. This rule applies to may families of estimators. Example..4 I example.1.1 we ituitively chose two methods to estimate the parameter θ of the U(0, θ)-distributio if four radom umbers from the iterval are available. But which estimatio method is the best?

43 -11 T 1 = X, where X is the mea of four radom umbers X 1, X, X 3 ad X 4, or T = max (X 1, X, X 3, X 4 ) We will compare the mea squared errors of both estimators usig the characteristics of the uderlyig U(0, θ)-distributio: if X ~ U(0, θ), the μ = E(X) = θ ad σ = var(x) = θ 1 T 1 : E(T 1 ) = E( X) = E(X) = E(X) = θ, so T 1 is a ubiased estimator of θ. The MSE(T 1 ) = var(t 1 ) = var(x) = var(x) = 4 σ 4 = θ 1. T = max(x 1, X, X 3, X 4 ): we will first determie the distributio of T : the we will determie E(T ) e var(t ). Similar to example.1.7 we fid: f T (x) = 4x3, if 0 x θ (ad f θ 4 T 1 (x) = 0, otherwise). θ 0 E(T ) = x 4x3 dx θ 0 θ 4 E(T ) = x 4x3 dx = 4x5 x=θ 5θ 4 4 = x=0 5 x=θ 4 = 6 θ, = 4x6 θ 4 6θ 4 x=0 so var(t ) = E(T ) (ET ) = 75 θ. θ ad MSE(T ) = (ET θ) + var(t ) = ( 4 5 θ θ) + 75 θ = 1 15 θ. I coclusio: θ = MSE(T 1 1) > MSE(T ) = θ (for all θ > 0), so the estimator 15 T = max(x 1, X, X 3, X 4 ) is better tha T 1 = X. Example..4 shows that a estimator that is ot ubiased ca be better tha a ubiased estimator. This pheomea is illustrated i the followig example as well. Example..5 The model of iterarrival times (i secods) of customers i a electroic system is give by the expoetial distributio with ukow expectatio E(X) = 1 (ad var(x) = 1 λ λ ). Based o a radom sample of 5 observed iterarrival times researchers wat to estimate E(X) as accurate as possible. The estimator at had is the sample mea X = 1 5 : this estimator is 5 ubiased ad has a small variace 1 5λ. i=1 X i If we would choose c X with c < 1 as a estimator, the this estimator will have a larger bias (MSE icreases), but the variace will be smaller (MSE decreases). So, for which value of c > 0 is the estimator T = c X the best? E(T) = E(c X) = c E(X) = c λ var(t) = var(c X) = c var(x) = c 1/λ = c 5 5λ MSE(T) = (ET 1 λ ) + var(t) = (c 1) 1 + c = [(c λ 5λ 1) + c ] 1 5 λ The mea squared error has a miimum value, if we determie the smallest value of g(c) = (c 1) + c 5

44 -1 Sice g (c) = (c 1) + 5 c = 0 if c = : this is a miimum, sice 5 6 g ( 5 ) > 0. 6 So, the best estimator of µ is T = 5 X = 1 5 X 6 6 i i=1 I sectio.1 we oticed that the commo estimators X, S ad p = X are ubiased estimators of the populatio parameters μ, σ ad p, respectively. The estimators also have the pleasat property that the variace approaches 0 if the sample size approaches ifiity. Because of these properties the estimators are called cosistet. Defiitio..6 (cosistet estimator) A estimator T = T(X 1,, X ) of the parameter θ is cosistet if P( T θ > c) = 0 for each c > 0. lim Note that we cosider T to be a series of estimators T(X 1 ), T(X 1, X ), T(X 1, X, X 3 ),. Property..7 For a estimator T of θ we have: a. If lim MSE(T) = 0, the T is a cosistet estimator. b. If T is ubiased ad lim var(t) = 0, the T is cosistet. c. If lim E(T) = θ (T is asymptotically ubiased ) ad lim var(t) = 0, the T is cosistet. Proof: a. If lim MSE(T) = 0, it follows from Markov s iequality P( Y c) E(Y ) c, for ay c > 0, by substitutig Y = T θ, that P( T θ c) E(T θ) c MSE(T) So lim P( T θ c) lim = 0. c = MSE(T) c. b. ad c.: applyig property.., MSE(T) = (ET θ) + var(t): lim MSE(T) = 0, if E(T) = θ ad lim var(t) = 0 or if lim E(T) = θ ad lim var(t) = 0: the a. proves the statemets. The ext cocept is very useful for summarizig data. Defiitio..8 Sufficiet statistic (Dutch: voldoede steekproeffuctie ). Suppose observatios X 1, X,, X have a joit distributio depedig o some parameter θ. The statistic V = V(X 1, X,, X ) is called a sufficiet statistic if the coditioal distributio of X 1, X,, X give V = v, does ot deped o θ. With the followig theorem Rao ad Blackwell showed that ay estimator T ca be improved by meas of a sufficiet statistic V.

45 -13 Theorem..9 (Rao ad Blackwell). If T is a estimator of θ ad V is a sufficiet statistic of θ the the estimator T ca be improved by meas of the coditioal expectatio T = E(T V), i the followig way: (1) E(T ) = E(T), () var(t ) var(t) Proof: Because of the defiitio of coditioal expectatio we have: E(T) = E( E(T V) ) = E(T ). Furthermore the equality var(t) = E(var(T V)) + var( E(T V) ) holds. Sice var( E(T V) ) = var(t ) ad E(var(T V)) 0 the secod statemet follows rather easily. Note that a statistic T = E(T V) is a fuctio of V. So the meaig of the theorem is that you ca reduce your data X 1, X,, X to a sufficiet statistic without losig iformatio or performace. The ext example illustrates this approach. Example..10 Suppose a group of 30 patiets received some treatmet i a hospital. After the treatmet the hospital registers for each patiet whether the patiet has bee cured (outcome is 1) or ot (outcome is 0). The data are as follows: We regard the group of 30 patiets as a radom sample, draw from some populatio of patiets. We wat to estimate the probability p that a (ew) arbitrary patiet from the same populatio will be cured whe he/she receives the treatmet. The model of the observatios ca be give by the alteratives (1-0 variables) X 1,, X 30 which are idepedetly takig o the values 1 or 0 with (ukow) probabilities p as 1 p. The: 30 X = X i i=1 is the umber of cured patiets: X has a B(30, p) distributio 30 Ad the sample proportio p = X = 1 X i=1 i is a ubiased ad cosistet estimator of p: the observed value of p is 15 30, i this case. Ca we use X = 15 i stead of the values of the X i`s,

46 -14 without loss of iformatio? I other words: is p = X a sufficiet estimator of p? We will first cosider the joit distributio of X 1,, X 30 : P(X 1 = 1, X = 0, X 3 = 1,, X 30 = 0) = p x (1 p) x = p 15 (1 p) 15. The coditioal distributio of these X i`s, give X = 15 is: P(X 1 = 1, X = 0, X 3 = 1,, X 30 = 0 X = 15) = P(X 1 = 1, X = 0, X 3 = 1,, X 30 = 0 ad X = 15) P(X = 15) = P(X 1 = 1, X = 0, X 3 = 1,, X 30 = 0) P(X = 15) = p15 (1 p) 15 1 ( 30 = 15 ) p15 (1 p) 15 ( ) Note that for a geeral outcome x of X this coditioal probability is 1 ( 30 x ): it does ot deped o the values of the p, so X, the total umber of cured patiets, is a sufficiet statistic for p. 30 The data X 1, X,, X 30 ca hece be summarized by meas of X = (or by meas of p = X ), without loss of iformatio. 30 i=1 X i MVU estimators May times oe prefers estimators that are ubiased. Withi this class of ubiased estimators it is iterestig to search for estimators with miimum variace. We eed the followig two defiitios for our theorem. Defiitio..11 Miimum Variace Ubiased (MVU) estimators A estimator V is called a MVU estimator of θ if V is ubiased ad var(v) var(t) holds for a arbitrary ubiased estimator T. Defiitio..1 Complete statistic (Dutch: volledige steekproeffuctie) A statistic V is called complete, if E( h(v) ) = 0 for all values of θ implies that h 0 (h 0 meas h is the ull fuctio ).

47 -15 Now we are ready to state the followig theorem. Theorem..13 (Lehma ad Scheffé) Let V be a statistic that is sufficiet ad complete. The for each ubiased estimator T of θ the followig holds: T = E(T V) is the uique MVU estimator of θ. Proof: Suppose T is a ubiased estimator of θ. The the class of ubiased estimators is ot empty. Accordig to the theorem of Rao ad Blackwell the estimator T = E(T V) is ubiased as well ad its variace var(t ) is ot larger tha var(t). The clue is that there is oly oe estimator of the type T = E(T V). Suppose there is aother ubiased estimator W, which ca be improved by W = E(W V). Note that both T ad W are fuctios of V. Hece the differece is a fuctio of V, T W = h(v). Because of the ubiasedess of both T ad W we get: E(T W ) = E( h(v) ) = 0 for all values of θ. So from the completeess we coclude that T ad W are idetical. There is oly oe ubiased estimator that is a fuctio of V. Example..14 The estimator p is the MVU estimator of p Let us retur to our simple data set of curig 30 patiets, give i Example..10. The sufficiet statistic X = i X i is also complete. Let us show this for arbitrary sample size. Sice X has a biomial distributio, the equality E(h(X)) = 0 ca be expressed as follows: E(h(X)) = h(x) ( x=0 x ) px (1 p) x = 0 (for all values of p) Let us rewrite this as fuctio of a ew parameter θ = p 1 p real umber if 0 < p < 1. The equatio ca be rewritte as follows:. Note that θ may attai ay positive h(x) ( x ) θx (1 p) = 0 x=0 h(x) ( x ) θx = 0 x=0 The left had side is a polyomial i θ of degree. The polyomial should be zero for all (positive) values of θ. We coclude that the coefficiets h(x) ( ) are all zero, so all values h(x) x

48 -16 are zero. Hece h is the ull fuctio ad X is a complete statistic. There exists a ubiased estimator of p, this is p = X. This estimator is already a fuctio of the sufficiet ad complete statistics X. Hece the improvemet E(p X) = p turs out to be the same estimator: p is the MVU estimator of p. Note..15 I example..14 we succeeded to derive a sufficiet ad complete statistics for a populatio proportio p, but usually this is oly possible i simple cases. I may situatios there exists o MVU estimator. I writte examiatios of this course verificatio of completeess will ot be asked for, except for the biomial case i this sectio. Sufficiecy ca be a topic oly i case of simple discrete data..3 Method of Momets ad Method of Maximum Likelihood I the first two sectios of this chapter we discussed commoly used estimators, their properties ad a criterio to compare the performace of estimators: the mea squared error. There are several methods to, systematically, derive estimators of a ukow parameter θ of a specified distributio. Method of Momets The first method uses estimators of the momets μ k = E(X k ), k = 1,, 3, of a distributio: Defiitio.3.1 Based o a radom sample X 1,, X, draw from a distributio, the momet estimator of the k th momet μ k is M k = 1 X i k. Of course, the momet estimator of the first momet μ 1 = E(X) is M 1 = X. The sample mea is a ubiased ad cosistet estimator of the first momet: this is true for each momet estimator M k of the k th momet, as ca be easily verified (provided that the momet exists). i=1

49 -17 The approach of the method of momets is simple: - Express a probability or expectatio i the momets μ 1, μ,, μ k, resultig i a fuctio g(μ 1, μ,, μ k ) of the first k momets. - Based o a radom sample of the distributio the momet estimator of g(μ 1, μ,, μ k ) is g(m 1, M,, M k ). Example.3. Give a radom sample X 1,, X of a expoetial distributio with ukow parameter λ, fid the momet estimators of the parameter λ, the expectatio, the variace ad the probability P(X > 10). - E(X) = 1 1, so λ =. The momet estimator of λ is 1 = 1 λ E(X) M 1 X - E(X) = 1 1, so the momet estimator of E(X) is = X. λ 1/X - var(x) = E(X ) (EX), so the momet estimator of var(x) is M M 1 = 1 X i X Note that i sectio.1 we foud the equality i=1 X i X = i=1(x i X), so that M M 1 = 1 (X i=1 i X) = 1 S is ot a ubiased estimator of σ = 1 λ. - The momet estimator of P(X > 10) = e 10λ = e 10/µ is e 10/X The correspodig momet estimates ca be computed if we have a realizatio x 1,, x of the radom sample: just replace the estimators M k by the estimates m k = 1 x i=1 i k. It should be oted that the momet estimators are ot ecessarily uique: sice for the expoetial distributio var(x) = 1 λ = (1 λ ) = (EX), X is a alterative momet estimator for the variace. i=1 The Method of Maximum Likelihood We will use the same startig poit as before, that is, a radom sample X 1,, X, draw from a distributio with ukow parameter θ. First we will discuss the approach of the method of maximum likelihood for a discrete distributio with a ukow parameter, e.g. a geometric distributio with ukow success rate p or a Poisso distributio with parameter µ. Some realizatio x 1,, x of the radom sample is available: the probability of this realizatio is give by the joit probability fuctio: P(X 1 = x 1,., X = x ) id. = P(X 1 = x 1 ) P(X = x ) = P(X i = x i ) i=1 This probability depeds o the observed values x 1,, x àd the value of the parameter θ. If we cosider x 1,, x to be real, fixed umbers, the the value of θ, for which the joit

50 -18 probability attais its maximum value (supremum) is called the maximum likelihood estimate of θ. Example.3.3 Give the realizatio x 1,, x of a radom sample, draw from a Poisso distributio with ukow mea µ, the joit distributio is a fuctio L of µ (> 0): i=1 L(μ) = P(X i = x i ) = μ x i e μ = μ x 1+ +x x i! x 1! x! e μ i=1 Note that for fixed values of x 1,,, x the graph of L(µ) is similar to the graph of f(µ) = aμ b e μ, for some real values a, b > 0 as show i the graph. For which value does the joit probability L(μ) attai its largest value? d L(μ) = 1 [( x dμ x 1! x! i)μ ( x i ) 1 e μ μ ( x i ) e μ ] = μ( xi) 1 e μ [( x x 1! x! i ) μ] = 0, if ( x i ) μ = 0 or μ = 1 x i = x. This value of µ determies a maximum of L(μ), sice L chages its sig from positive to egative at μ = x: therefore x is said to be the maximum likelihood estimate of µ. Returig to the radom variables i the model we foud that the sample mea X = 1 i=1 X i is the maximum likelihood estimator (mle) of the parameter µ of the Poisso distributio. Now that we foud a estimate/estimator of µ, we ca also fid maximum likelihood estimates or estimators for probabilities or expectatios: - The probability P(X = 0) = e μ ca be estimated by e x ad its mle is e X. - The secod momet E(X ) = var(x) + (EX) = μ + μ ca be estimated by the umerical value x + x ad the mle of E(X ) is X + X. Oe ca try to verify whether these maximum likelihood estimators are ubiased ad cosistet. E.g. the mle X is a ubiased ad cosistet estimator of µ, but X + X is ot a ubiased estimator of E(X ), sice E (X + X ) = E(X) + E (X ) = μ + [var(x) + (EX) ] = + 1 μ + μ > μ + μ = E(X ) We ca see that the mle of E(X ) is asymptotically ubiased, but the cosistecy is ot easily verified: we could show, for example, that lim var (X + X ) = 0. I example.3.3 we determied the joit probability as a fuctio of the ukow parameter θ (beig µ i the example), ad the observed values x 1,, x :

51 -19 L(θ) = L(θ; x 1,, x ) = P(X i = x i ) i=1 is the likelihood fuctio Of course, we have to maximize L o its domai: i the example L(μ) has a domai (0, ). Verify that maximizig the log-likelihood fuctio l[l(θ)], i stead of L(θ), ca simplify the aalysis. Sice l (L) is a icreasig fuctio i L a maximum of l (L) is attaied i the same value of θ as the maximum of L. For cotiuous variables we will use the joit desity fuctio as the likelihood fuctio L(θ) sice the joit desity is a measure for the probability of observig certai values i the sample. Defiitio.3.4 Maximum Likelihood Estimators If X 1,, X is a radom sample of a radom variable X ad X has a discrete distributio with probability fuctio f θ (x) = P θ (X = x) or a cotiuous distributio with desity fuctio f θ (x) ad the ukow parameter θ attais values i the parameter space Θ the L(θ) = f θ (x i ) i=1 is the Likelihood fuctio, θ = θ (x 1,, x ) is the maximum likelihood estimate of θ, if L(θ) attais its maximum value o Θ at θ = θ, θ = θ (X 1,, X ) is the maximum likelihood estimator (mle) of θ ad g(θ ) is the maximum likelihood estimator of g(θ), for ay fuctio g of θ is. Note that both the maximum likelihood estimate ad the maximum likelihood estimator are deoted as θ : the meaig follows from the cotext. E.g. μ =.9 may be the observed sample mea ad μ = 1 X i=1 i is the estimator i the probability model of the sample. Example.3.5 The mle of the parameter λ of the expoetial distributio Recall that the expoetial desity fuctio is f(x) = λe λx, x 0. The, give the result of a radom sample x 1,, x, the likelihood fuctio is: L(λ) = f(x i ) = λe λx 1 λe λx = λ e λ x i, for λ > 0 (Θ = (0, )) i=1 Simplifyig the determiatio of the maximum via the log-likelihood fuctio, we have: l L(λ) = l(λ e λ x i) = l ( λ) λ Searchig for the maximum of l L(λ): L(λ) attais its maximum value at λ = So: x i i=1 x i (λ > 0) d l L(λ) = dλ λ i=1 = 0 λ = i=1 x i i=1 x i, sice d l L(λ) = < 0 ( > 0, λ > 0). dλ λ

52 - λ = i=1 x i -0 = 1 is the maximum likelihood estimate of λ. x is the maximum likelihood estimator (mle) of λ. - λ = 1 X - Sice E(X) = 1, X is the mle of E(X): ubiased ad cosistet. λ - var(x) = 1 λ : X is the mle of var(x). Note that E (X ) = var(x) + (EX) = var(x) asymptotically ubiased. + ( 1 λ ) = +1 1 λ : X is ot ubiased, but it is Example.3.6 Referrig to example..4 (ad example.1.1) we foud two differet estimators of the parameter θ of the U(0, θ)-distributio, based o a radom sample draw from this distributio. Geeralizig to sample size (istead of 4): T 1 = X ad T = max (X 1,, X ). - T 1 = X is the momet estimator of θ, sice E(X) = 1 θ or θ = E(X). - T = max (X 1,, X ) is the mle of θ: f θ (x) = 1 for all 0 x θ (where θ > 0) θ L(θ) = f θ (x i ) i=1 = 1 θ 1 θ = 1 θ Note that the coditio θ > 0 is ot adequate, sice x i θ for each value of i: Cosequetly the coditio for maximizig L(θ) is θ max(x 1,, x ). L(θ) is a decreasig fuctio i θ: the fuctio attais its maximum i the smallest possible value of θ: max(x 1,, x ). θ = max(x 1,, X ) is mle of θ. Similar to example..4 we ca determie the expectatio ad variace of T 1 ad T : E(T 1 ) = θ ad var(t 1 ) = 4θ 0, if : T 1 1 is ubiased ad cosistet. E(T ) = θ θ, if ad var(t +1 ) = (+)(+1) θ 0, if : T is asymptotically ubiased ad cosistet. Of course, a distributio ca have more tha oe ukow parameter, such as is the case for the ormal distributio. The, applyig the method of maximum likelihood the parameter θ should be iterpreted as a vector of, i geeral, k parameters: θ = (θ 1,, θ k ). The likelihood fuctio is ow a multivariate fuctio that must be maximized o the parameter space of all parameters. The resultig maximum leads to θ = (θ 1,, θ k).

53 -1 We will ow apply the described extesio of the method of maximum likelihood to the ormal distributio: θ = (μ, σ ). The mle of θ is θ = (μ, σ ) (otatio: we will use σ istead of σ ). Example.3.7 The mle of the parameters μ ad σ of the ormal distributio. The desity of the ormal distributio is f θ (x) = 1 (x μ) πσ e σ, x R, where θ is the pair (μ, σ ) ad its parameter space is Θ = R R + (R + = (0, )) L(μ, σ ) = f θ (x i ) i=1 = (πσ ) 1 e 1 σ (x i μ), (μ, σ ) R R + l L(μ, σ ) = 1 l(πσ ) 1 σ (x i μ), (μ, σ ) R R + Note that above σ is the secod variable, ot σ (we might replace σ by, for example, y). I the extrema both partial derivatives must be 0: 1. Solvig l L(μ, μ σ ) = 0 we fid + 1 (x σ i μ) = 0 or (x i μ) = 0, sice σ > 0. So i=1 x i μ = 0. Coclusio: μ = 1 x i=1 i = x. This is true for all values of σ > 0. For arbitrary σ we have: μ l L(μ, σ ) = μ [ 1 σ (x i μ)] = σ < 0 So, for arbitrary (fixed) σ, L(μ, σ ) attais its maximum value for μ = x.. We ca ow cotiue maximizig L(μ, σ ), settig μ = x. The secod partial derivative is: (σ ) l L(x, σ ) = σ + 1 (σ ) (x i x) = 1 (σ ) [ σ + (x i x) ] = 0, if σ = 1 (x i x) i=1 i=1 ad (σ ) l L(x, σ ) > 0, if σ < 1 (x i x) i=1 l L(x, (σ ) σ ) < 0, if σ > 1 (x i x), so L(x, σ ) attais, for ay fixed value of x, its maximum value for σ = 1 (x i=1 i x).

54 - Combiig 1. ad. we ca coclude that L(μ, σ ) attais its largest value o Θ i θ = (μ, σ ) = (x, 1 (x i x) ), sice l(l(μ, σ )) l(l(μ, σ )) l(l(μ, σ )) holds for all values of μ ad σ. The maximum likelihood estimators of μ ad σ are X ad 1 (X i X) i=1. Note that the mle of σ is ot ubiased sice it cotais the factor 1 istead of the factor 1 i 1 the ubiased variace estimator S. i=1 I example.3.7 we foud the maximum of a fuctio L(x, y) by usig the specific properties of the fuctio i this case. I geeral we will solve the equatios L(x, y) = 0 ad L(x, y) = 0 ad checkig the x y coditio for the solutios to be a extreme ad a maximum: ( L x y ) L x L y < 0 ad L x < 0. A overview of the momet estimators ad the maximum likelihood estimators of the parameters of commo distributio is give i the followig table:

55 -3 Distributio + par. Momet estimator Maximum likelihood estimator Geometric (p) 1 X p = 1 X Biomial (p) X p = X Poisso (μ) X μ = X Expoetial (λ) 1 λ = 1 X X Uiform o (0, θ) X θ = max(x 1,, X ) Uiform o (a, b) a = mi(x 1,, X ), b = max(x 1,, X ) Normal (µ, σ ) X ad 1 X i=1 i X μ = X, σ = 1 (X i X) i=1.4 Exercises 1. X 1, X,, X 10 is a radom sample of a distributio with ukow expectatio μ = E(X) ad ukow variace σ = var(x). Cosider the followig estimators of µ: 1. T 1 = X 1. T = X 1+X 3. T 3 = X 1 + X + + X T 4 = X 1+X + +X a. Express the expectatio ad variace of T i i μ ad σ (i = 1,, 3, 4) b. Compute the Mea Squared Error (NL: verwachte kwadratische fout) of T 1, T, T 3 ad T 4. What is the best estimator? c. Why is 1 10 (X 10 i=1 i μ) ot a estimator of σ?. Two researchers observe idepedetly the same populatio variable: oe recorded m measuremets, the other. The probability model is: X 1, X,, X m (researcher 1) ad Y 1, Y,, Y (researcher ) are idepedet ad all have the same distributio with ukow µ ad variace σ. Notatio of the sample meas: X = 1 X m i=1 i ad Y = 1 Y i=1 i m

56 -4 a. Show that T 1 = 1 (X + Y) ad T = mx+y m+ µ. b. Which of the two estimators (T 1 or T ) is the best? are both ubiased (NL: zuiver) estimators of 3. (Computatios o stock returs) The yearly stock returs (i %) of specific IT-fuds appear to have a ormal distributio. A IT-fud has a ukow expected yearly retur µ > 0 ( the average retur over may years is positive ). We wat to estimate this expected retur o the basis of some observed yearly returs. Experts observed that for similar fuds the stadard deviatio of the returs are usually twice the expected retur, so σ = μ (where µ > 0). I the questios below you may assume that a yearly retur has a N(μ, 4μ )-distributio. a. Sketch the distributio of the yearly returs ad shade the probability of a egative retur. Compute this probability. For the fud IT-plaet 10 observed yearly returs are available. We will cosider these 10 observatios to be a radom sample X 1,, X 10 of yearly returs. We wat to use these observed returs to estimate the expected yearly retur μ of the IT fud IT-plaet. The use of the sample mea is at had, but: Is the mea of the 10 returs the best estimator of the expected retur i this case? To aswer this questio we will cosider the family of estimators T = ax of μ, where a is a (positive) real costat ad X = i=1 X i. b. What distributio does X have? c. For which value of a is T a ubiased (NL: zuiver) estimator of μ? Motivate your aswer. d. For which value of a is T the best estimator of μ? 4. We kow that i a box there are equally may red ad white marbles, but we do ot kow the total umber of marbles: either 4 or 6. Suppose we arbitrarily took out marbles (i oe draw) ad oe of the marbles is red ad the other oe is white. What is the most likely total umber of marbles i the box (4 or 6)? (I other words: what is the maximum likelihood estimator of the total umber?) 5. Fid the maximum likelihood estimator for the followig parameters of the distributio of X, based o a radom sample X 1,, X of X. (Do ot forget to use the log likelihood fuctio for computatioal simplicity). a. The mea μ of the Poisso distributio (P(X = x) = μx x! e μ, x = 0, 1,,...). Verify whether the sample mea X (the maximum likelihood estimator (mle) of µ, see d.) is cosistet ad/or sufficiet (NL: voldoede). b. The success probability p of the geometric distributio (P(X = x) = (1 p) x 1 p, x = 1,, 3,....). Verify for a radom sample of = whether X 1 + X is a sufficiet estimator.

57 -5 c. The variace of the N(10, σ )-distributio. Verify whether S μ=10 = 1 10 (X i=1 i 10) (the maximum likelihood estimator (mle) of σ, see d.) is ubiased ad/or cosistet. d. Show i a. that X is the maximum likelihood estimator (mle) of µ. Fid i b. the mle of p. Show i c. that S μ=10 is the maximum likelihood estimator of σ. 6. X 1,, X is a radom sample of X, which has a ormal distributio with ukow parameters µ ad σ. The stadard ubiased estimator S is ot the same as the maximum likelihood estimator σ, but what is the best estimator? a. Verify that var(s ) = σ4 1, usig the χ -distributio. b. Is S a better estimator for σ tha σ? c. For what real value a R is T = a S the best estimator of σ? 7. The weight of a egg, produced i a large chicke farm, is assumed to be ormally distributed. The parameters μ ad σ are ukow. a. Express the probability that a egg is heavier tha 68.5 gram i the ukow µ ad σ ad the stadard ormal distributio fuctio Φ. Based o a radom sample of = 5 eggs we foud the followig (stadard) estimates of µ ad σ, usig our calculator: x = 56.3 gram ad s = 7.6 gram. b. Determie the maximum likelihood estimate of the probability that a egg is heavier tha 68.5 gram. 8. (Liear estimators) a. Suppose 3 idepedet surveys have bee carried out to estimate a ukow populatio mea µ. T 1, T ad T 3 are the three idepedet, ubiased estimators of µ i the 3 surveys, with variaces var(t 1 ) = σ, var(t ) = σ ad var(t 3 ) = 3σ. Determie the coefficiets a, b ad c such that the liear estimator at 1 + bt + ct 3 is ubiased with the smallest variace possible ( the MVU estimator amog the liear estimators ). b. Cosider the set of liear ad ubiased estimators T = i=1 a i X i of the populatio mea µ, based o a radom sample X 1,.., X of the populatio (real coefficiets a i ). The sample mea X is oe of them (a i = 1 ). Show that X is the best estimator of this set of ubiased liear estimators. (Hit: suppose that a better ubiased, liear estimator has coefficiets a i = 1 + δ i)

58 -6 9. X has a uiform distributio o the iterval (0, θ), with ukow parameter θ. I order to estimate θ we have a radom sample X 1,, X of X. a. Give the desity fuctio of X, the distributio fuctio of X, E(X) ad var(x). b. Show that T 1 = max (X 1,, X ) is the mle of θ. c. Is T 1 a ubiased estimator of θ? If ot, for which umber a is a T 1 ubiased? d. Is T 1 a cosistet estimator of θ? (you ca apply the property that T is cosistet if lim MSE(T) = 0 ) Sice E(X) = θ, ituitively T = X seems a good estimator of θ. e. Is T a ubiased ad/or a cosistet estimator of θ? f. Which of the two estimators of θ, T 1 ad T, is the best (distiguish possible values of )? 10. I a coutry the political preferece is distiguished i left (group 1), middle () ad right (3) with ukow probabilities p 1, p ad p 3 = 1 p 1 p. Based a radom sample of voters we foud X 1 = x 1 left, X = x ad X 3 = x 3 (= x 1 x ) right voters. Derive the mle of θ = (p 1, p ) ad check whether these estimators are cosistet. Hit: the multiomial (joit) probability fuctio is P(X 1 = x 1 ad X = x ad X 3 = x 3 )! = x 1! x! ( x 1 x )! p x 1 x 1 p (1 p1 p ) x 1 x 11. We cosider two idepedet samples with the same sample size : X 1,, X is a radom sample of X, that has a N(μ 1, σ )-distributio Y 1,, Y is a radom sample of Y, that has a N(μ, σ )-distributio (equal variaces). The parameters μ 1, μ ad σ a. Determie the maximum likelihood estimator σ of σ. (Use a similar approach as o pages 6/7). b. Fid E( σ ) ad var( σ ) ad show that σ is a cosistet estimator. c. Compare σ to the pooled sample variace S (property.1.9). 1. Z 1, Z,, Z are idepedet ad all stadard ormal. a. Compute E(Z 1 ) ad var(z 1 ), ad E(Z Z ) ad var(z Z ) b. Fid the desity fuctio of Z 1, which is the χ 1 -desity fuctio. c. Use the covolutio itegral f X+Y (z) = f X (x)f Y (z x)dx ad z 1 0 x(z x) dx = π (see your calculus book) to show that Z 1 + Z has a expoetial distributio. d. Check that the χ -distributio you foud i c. is a gamma distributio (page 0.6) with parameters α = 1 ad β =.

59 3-1 Chapter 3 Cofidece itervals 3.1 Itroductio I the previous chapter we discussed the estimatio of ukow parameters i a populatio. Estimators are based o radom samples. The radomess is the coditio that esures that we get a reliable idea of the parameter s real value. But eve if this coditio is fulfilled, the statistic returs oly oe real value as a estimate of the parameter. The bias of the estimator ad its variace puts the estimate i some perspective. The goal of this chapter is to take the variability of estimates ito accout ad to costruct iterval estimates: ot oe umerical value as a estimate, but a iterval i which the ukow parameter lies with a certai level of cofidece. For istace, if we have ormal populatio with ukow parameters μ ad σ, we will costruct statemets o the basis of a radom sample of the populatio as follows (we chose a cofidece level of 95%): At a cofidece level of 95% we have:. < μ <.. Or, At a cofidece level of 95% we have:. < σ <.. For the populatio proportio p the desirable statemet will be: At a cofidece level of 95% we have:. < p <.. The boudaries of the iterval deped o the desired level of cofidece ad, of course, o the actual observatios i the sample, so they are statistics, e.g.: l(x 1,, x ) < μ < u(x 1,, x ), where l ad u symbolize lower ad upper bouds We will see that for such a iterval 95% is ot a probability, but it is based o a more geeral probability statemet with respect to µ. Therefore we will first repeat how to determie a iterval aroud kow µ i which the sample mea will attai a value with probability 95%. Example It is difficult to determie the meltig poit of a alloy (a mix of metals) exactly, because of the high temperatures. Therefore multiple measuremets are coducted, e.g. 5 estimates (temperatures) are measured. The sample mea x is a estimate of the real meltig poit μ.we will assume that the measuremet method is ubiased (o uder- or overestimatio), so E(X) = μ, ad that (kow from past experiece) the stadard deviatio i this rage of temperatures is 10. Suppose we have a alloy with a kow meltig poit: μ = 818. How much will the sample mea of 5 measuremets deviate from this value? Or: determie a iterval (μ a, μ + a), symmetric aroud μ, such that X attais a value withi the iterval with probability 95%. To meet this coditio we eed to state the probability model explicitly:

1)we kow: X ~ N (μ, σ ) so: X μ σ ~ N(0,1) From the stadard ormal table it follows that Φ(c) = 0.975, if c = 1.96: for a N(0,1)- distributed variable Z we have: P( c < Z < c) = 0.95.

60 Model: X 1,..., X 5 is a radom sample of N(818, 100)-distributio. To make our approach wider applicable we will replace the values 818, 100 ad 5 by the symbols μ, σ ad (ad keep these umerical values i mid). 3- From probability theory (see sectio.1)we kow: X ~ N (μ, σ ) so: X μ σ ~ N(0,1) From the stadard ormal table it follows that Φ(c) = 0.975, if c = 1.96: for a N(0,1)- distributed variable Z we have: P( c < Z < c) = So: X μ P ( c < σ < c) = 0.95 From this it follows: σ P ( c < X μ < c σ ) = 0.95 σ Or: P (μ c < X < μ + c σ ) = 0.95 The requested 95%-iterval (μ a, μ + a) is: σ (μ c, μ + c σ ) = ( , ) = (814.08, 81.9) 5 The frequecy iterpretatio of this iterval is: About 95 of the 100 meas, each computed from 5 ew observatios will have a value withi the iterval, ad about 5 outside the iterval. c σ = 3.9 is called the estimatio error (for a 95% probability): it is half the legth of the iterval. X μ, the differece betwee estimator ad real value, is the measuremet error: P( estimatio error < X μ < estimatio error) = 0.95 The umerical iterval (μ c σ, μ + c σ ) i the example above is referred to as the predictio iterval of X with a 95% probability, for give values of μ ad σ : this iterval predicts the value of X before really gatherig the data from the sample. The iterval formula idicates that the stadard deviatio σ X = σ will decrease if the sample size icreases, ad the predictio iterval will be smaller accordigly. But we are iterested i a cofidece iterval for the ukow ad fixed value of populatio mea µ.

61 Cofidece iterval for the populatio mea μ Example 3..1 I example we determied, for a kow meltig poit μ = 818 ad a kow N(μ, σ )-distributio of the meltig poit observatios, a iterval estimate of the mea of 5 of these observatios. Now we wat to perform a reverse operatio: based o a actually observed mea of 5 meltig poit observatios we wat a iterval estimate of the ukow expected meltig poit μ of a ewly composed alloy. The sample results are summarized as follows: = 5, x = 40.0 e s = 15.4 I this sectio we will restrict ourselves to situatios where the ormal distributio is a correct model of the observed values, such as i the example above. Additioal assumptios must be made with respect to the parameters. Of course, µ is ukow: it is poitless to determie a estimatio iterval of µ, if we kow its value. For the other parameter, the variace σ, we distiguish two possibilities: The variace σ is kow. I practice this situatio does ot occur ofte: usually if μ is ukow, so is σ. However, sometimes a reasoable assumptio w.r.t. the value of σ ca be made. For istace, i the case of the meltig poit measuremets i example 1..1 the measuremet errors i a give rage of temperatures are roughly the same. De variace σ is ukow. This is the most frequetly occurrig situatio. Remember (as a rule) that σ is ukow, uless. Cofidece iterval for μ if σ is kow As a model of the sample results x 1,, x we will assume that we have a radom sample X 1,, X take from the N(μ, σ )-distributio, with ukow μ ad kow σ. We oticed that the stadard ormal distributio is symmetric about the Y-axis, so X μ μ X σ ~ N(0,1) ad (usig symmetry) σ ~ N(0,1) We will, agai, take a iterval with probability 95% aroud 0 of the N(0,1)-distributio as a startig poit of our aalysis: choose c = 1.96 such that P( c < Z < c) = 0.95 Costructio of a 95%-cofidece iterval for μ if σ is kow μ X From P ( c < σ < c) = 0.95 it follows: σ P ( c < μ X < c σ ) = 0.95 σ Or: P (X c < μ < X + c σ ) = 0.95

62 3-4 95% CI(μ) = (X c σ, X + c σ ) This iterval is a stochastic 95%-Cofidece Iterval for μ. Notice that the boudaries are statistics: they are oly depedig o the sample variables. X 1,, X, sice all other symbols c (= 1.96), sample size ad stadard deviatio σ are kow. Example 3.. Determie a 95%-cofidece iterval for the meltig μ of a ew alloy if for the = 5 meltig poit measuremets i example 3..1 we foud: x = 40.0 ad s = The model of the 5 measuremets is a N(μ, 100)-distributio, with ukow μ ad kow σ = 100, as assumed before i similar cases (example 3.1.1). The estimate s = 15.4 is actually superfluous iformatio, but the estimate does ot cotradict the assumed value of σ (compare s = ad σ = 10 as well). We will apply the 95%-CI(μ) = (X c σ, X + c σ ) above, by replacig X by the observed value x = 40.0 ad substitute the other kow values: c = 1.96, σ = 10 ad = 5 to obtai: 95%-CI(μ)= (x c σ, x + c σ ) = (40 3.9, ) (36.1, 43,9). We coclude that we are 95% cofidet that the meltig poit μ lies betwee 36.1 ad The calculated iterval is the umerical 95%-cofidece iterval for µ. Repeatig the sample will lead to differet (5) measuremets: the ceter x of the umerical iterval will chage, but the estimatio error c σ = 3.9 will remai the same. A correct iterpretatio of cofidece itervals ad the differece betwee stochastic ad umerical cofidece itervals is importat. Applied to example 3..: Correct statemet: There is a 95% probability that the meltig poit μ lies i the (stochastic) iterval (X c σ, X + c σ ). Icorrect statemet: There is a 95% probability that the meltig poit μ lies i the umerical iterval (36.1, 43,9) Icorrect statemet: About 95% of the observatios will lie i this iterval Startig with the last statemet: this statemet is icorrect because we determied a iterval for the mea (µ) of all possible measuremets, ot for separate measuremets. A iterval for oe measuremet is called a predictio iterval, which is wider tha a cofidece iterval: see sectio 3.5. The correct statemet follows from the probability statemet: σ P (X c μ X + c σ ) = P (μ (X c σ, X + c σ )) = 0.95 The frequecy iterpretatio of this probability says: about 95 out of 100 repetitios of the sample will produce umerical itervals, that iclude µ, but (about) 5 of the itervals do ot iclude μ. Of course, i practise we will oly coduct a research oce. But, whe iterpretig a umerical iterval, we will have to bear this iterpretatio i mid. This is the reaso why, for umerical itervals, we should ot state 95% probability that., but we are 95% cofidet that

63 3-5 We prefer to use 95% cofidet istead of ituitive (or utidy) termiology as 95% sure or 95% certai : we do ot compute certaity itervals or the like. The followig graph illustrates the umerical cofidece itervals of 0 repetitios of the sample: Iterval umber: μ Ukow μ μ 3.9 O average, 19 out of 0 repetitios lead to the desired situatio: μ is icluded i the iterval. But oe iterval (o. 7) does ot. I practice we will compute oe iterval: the problem is, that we do ot kow which type of iterval (cotaiig µ or ot?) we have at had: we are 95% cofidet that the meltig poit μ is icluded. Beside the correct iterpretatio the followig aspects are importat: The cofidece level 1 α If the cofidece level is 95%, the α is apparetly 5%: i the graph α is the sum of two equally large areas of the tail probabilities, each with a area 1 α = c = 1.96, such that Φ(c) = 1 1 α = 0.975, or c = Φ 1 (1 1 α). I the examples we used a 95% level of cofidece, but, depedig o the desired level of cofidece, the choice of 90% or 99% is quite commo as well: i that case the tail probabilities are 5% ad 0.5%, respectively. If 1 α is the cofidece level, we will use the otatio: (1 α)100%-ci(μ) = (X c σ, X + c σ ), where c is take fom the N(0,1)-table: 1 α 90% 95% 99% 1 α c = Φ 1 (1 1 α) σ is the stadard deviatio of X, or the stadard error of X, c σ is the estimatio error (margi) of the iterval (for give cofidece level 1 α) ad c σ is the width (or legth) of the cofidece iterval. The formula of the cofidece iterval shows the followig rules: If the cofidece level icreases, c icreases ad the iterval will be wider. Reversely, if we wat a less wide iterval, the level of cofidece will be lower.

64 3-6 If we choose to icrease the sample size, the stadard error σ will decrease ad the iterval will be smaller. If i a populatio the variatio (σ) is large, the cofidece iterval for μ is wider tha for populatios with a smaller σ (assumig a fixed sample size). Cofidece iterval for μ if σ is ukow We will cotiue assumig a ormally distributed variable i a populatio, but ow with both parameters μ ad σ ukow. Example 3..3 O the job market of IT-specialists it is obvious that the startig salaries decreased as a cosequece of the ecoomic crisis. To get a impressio of the recet startig salaries a IT-studet gathered 15 startig salaries, offered i job advertisemets. He computed a mea startig salary of 3.30 k (gross, i thousads of Euro s a moth) ad a sample stadard deviatio of 0.60 k. We may assume that the15 observatios are a realizatio of a radom sample, take from the ormal distributio of all startig salaries of IT-specialists. a. Determie a 95%-cofidece iterval for the mea startig salary of IT-specialists. b. Determie a 95%-cofidece iterval for the stadard deviatio of the startig salaries of IT-specialists. The b-part of example will be aswered i sectio 3.3, The a-part could be simply aswered by usig the formula (x c σ, x + c σ ), where = 15, x = 3.30, c = 1.96 from the N(0,1)-table ad the ukow σ could be replaced by the estimate s = But this ituitive approach leads, alas, to large errors compared to a theoretically correct approach. The errors are caused by the distributio o which the costructio if a 95%-CI(µ) is based: X μ σ ~ N(0,1) Replacig σ by S = S leads to a ew variable T, roughly the quotiet of two variables X ad S: X μ T = S The distributio of S will be discussed i the followig sectio. The costructio of the distributio of the variable T is a quite complicated mathematical operatio, which was first coducted successfully by W.S. Gosset, a employee of the famous Guiess breweries i Dubli, i Sice his employer forbade publicatios by its researchers, he published his fidigs uder the ame Studet. That s is why the distributio of T = X μ has a so called S

65 3-7 Studet s t-distributio (or, for short, t-dsitributio). The shape of this distributio resembles the stadard ormal distributio, but the replacemet of σ by S causes a larger stadard deviatio. The differece betwee the t- ad the N(0,1)-distributio depeds o the sample size, or, as is the commo termiology, the umber of degrees of freedom 1 ( 1 1 is the factor i the formula of S ). The umber of degrees of freedom is briefly otated as: df = 1. The graph above shows that the t-distributio coverges for large umbers of the degrees of freedom to the stadard ormal distributio. Defiitio 3..4 (The Studet`s t-distributio) If Z ~ N(0, 1) ad Y ~ χ are idepedet, the T = Z Y/ has a Studet s t distributio with degrees of freedom. Short otatio: T ~ t With techiques, discussed i Probability Theory, the desity fuctio of the t-distributio ca be determied: usig the joit distributio of the idepedet Z ad Y, we ca fid the distributio Z fuctio by solvig P ( x), resultig i f Y/ T (x) = c (1 + x ) (+1)/, where c is a costat. For = 1 we have f T1 (x) = 1 1, the Cauchy desity fuctio, for which the π 1+x expectatio does ot exist. If X 1,..., X is a radom sample, draw from the N(μ, σ )-distributio ad we choose Z = X μ ad Y = ( 1)S ~ χ σ 1 i defiitio 3..4, we fid: X μ Z Y/ = σ ( 1)S σ /( 1) X μ = S This ad other relevat properties are give i the followig property. has a t 1 distributio. σ Property 3..5 a. The desity fuctio of the t-distributio with degrees of freedom (T ~ t ) is symmetric about the lie Y-axis ad E(T ) = 0 for =, 3, b. var (T ) =, for = 3, 4,. c. If, the t -distributio coverges to the N(0, 1)-distributio. d. If X 1,..., X is a radom sample of the N(μ, σ )-distributio, the T = X μ Studet`s t-distributio with 1 degrees of freedom. ( T ~ t 1 ) S has a

66 3-8 A t-distributed radom variable with 1 degrees of freedom is otated as T 1 (similar to the otatio of stadard ormal Z). Ad, similarly to the umerical approximatios of the stadard ormal distributio fuctio Φ(z), tables of probabilities of the t-distributios are available. But ow we eed a table for each value of the umber of degrees of freedom, 1. Aother differece is that the t-table cotais upper tail probabilities P(T 1 c) = α, for just a few values of α. The costructio of a 95%-cofidece iterval for μ if σ is ukow is similar to the costructio we gave before if σ is kow: X μ μ X From P ( c < < c) = P ( c < < c) = 1 α follows: S S S P ( c < μ X < c S ) = 1 α S Or: P (X c < μ < X + c S ) = 1 α (1 α)100% CI(μ) = (X c S, X + c S ) Example 3..6 (cotiuatio of example 3..3). We have = 15 startig salaries of ITspecialists with sample mea x = 3.30 k (i thousads of Euro s per a moth) ad the sample stadard deviatio is s = 0.60 k. a. Fid a 95%-cofidece iterval for the mea (expected) startig salary i the job market of ITspecialists. Solutio: Probability model of the observed startig salaries: X 1,..., X 15 is a radom sample from the N(μ, σ )-distributio with ukow μ ad σ. The formula of this iterval for this model ca be foud o the formula sheet: 95%-CI(μ) = (X c S, X + c S ) Summary of the observatios: = 15, x = 3.30 ad s = 0.60 k. c =.145 from the t-table with df = 1 = 14, such that P(T 14 c) = α = 0.05

67 3-9 So 95%-CI(μ) = ( , ) (.97, 3.63) We are 95% cofidet that mea startig salary of all IT-specialists (or: the expected startig salary of a IT-specialist) lies betwee 970 ad 3630 Euro. About the use of t-tables: for small df 30 all t-distributios are covered; betwee df = 30 ad df = 10 oe could choose the earest umber of degrees of freedom or apply liear iterpolatio of the two earest values (a weighted average of those two table values); if df > 10, accordig to property 3..4d, the t-distributio is approximated by the N(0, 1)- distributio: the stadard ormal table values are show i the t-table o the last lie df =. Determiig the sample size for give iterval width ad cofidece level Example 3..7 A machie fills bags of playgroud sad: the producer says that the bags cotai 5 kg, but the stadard deviatio σ i the fillig process is 100 grams (0.1 kg). A customer, who purchases may of these bags, wats to check whether the mea cotet is really 5 kg: therefore he wats to kow how may bags he should weigh, so that a 95%-cofidece iterval of the mea weight has a width of at most 0 grams (0.0 kg). How large should his sample size be? Model: the weights X 1,, X are idepedet ad all N(μ, σ ), with ukow μ ad σ = 0.1. Coditio: the width c σ 0.0, where c = 1.96 (Φ(c) = 1 1 α = 0.975) ad σ = 0.1. σ From c it follows: ( ) , so should be at least I example 3..6 we determied the sample size for the case of kow σ, but the case of ukow σ is more complicated. For a desired maximum width W, the coditio is: S c W, so (cs W ) But we do ot kow the value of S, or c from the t 1 -table (for give cofidece level), sice is yet to be determied. For S we eed prior kowledge (a kow maximum or a result of a first small sample) ad c ca be approximated by the stadard ormal distributio, especially if is expected to be large.

68 Cofidece iterval for the variace σ If we wat to costruct a cofidece iterval of σ for a ormally distributed populatio, we eed to fid the distributio of S first. This distributio is discussed i sectio 0.: the Chi-square distributio (usig the Greek letter Chi oe may write χ -distributio). We repeat the defiitio: if Z 1,, Z are idepedet ad all N(0,1)-distributed the: X = Z i i=1 Short otatio: X is χ -distributed or: X ~ χ is Chi square distributed with degrees of freedom For the (ucommo) case that we have a ormally distributed populatio with kow μ the lik betwee the variace estimator S μ = 1 (X i X) i=1 ad the Chi square distributio is straightforward. Example Referrig to the examples o meltig poits i sectio 3.1, we could determie the variability of the temperature measuremets by estimatig the variace σ for a alloy with a kow meltig poit. Assume that the meltig poit of the alloy is μ = 818. The probability model is a radom sample of measuremets X 1,, X, take from a N(μ, σ )- distributio, with kow μ = 818 ad ukow σ. We will ot use S = 1 (X 1 i X) i=1 as a estimator of σ, but S μ = 1 (X i μ) i=1 (sice it has a smaller MSE). S μ is a ubiased estimator of σ = E(X μ), sice:

69 3-11 E(S μ ) = E ( 1 (X i μ) ) = 1 E(X i μ) = 1 σ i=1 Because X i μ = Z σ i is stadard ormal for every X i, we have: S μ σ = i=1 (X i μ) σ = ( X i μ σ ) = Z i is χ distributed by defiitio. i=1 i=1 i=1 If μ is ukow, we will use the ubiased estimator S = 1 (X 1 i X) i=1, where μ is replaced by X. A result from Probability Theory is give i sectio 0. (property 0..9) ad we will give a thorough proof of this property i chapter 8, but for ow we will repeat the relatio of S ad the chi square distributio to use it for costructio of a cofidece iterval. Property 3.3. a. If X 1,, X is a radom sample take from a N(μ, σ )-distributio, the ( 1)S σ ~ χ 1 b. If Y ~ χ, the we have: E(Y) = ad var(y) = Property 3.3.a will be used to costruct a cofidece iterval of σ. For that aim we will first fid a iterval (c 1, c ), such that the probability, that the Chi-square distributed variable ( 1)S σ attais a value i this iterval, is 1 α. c 1 ad c are chose such that the two tail probabilities P(χ 1 c 1 ) ad P(χ 1 c ) are α, as show i the graph below: The costructio of a cofidece iterval for σ (if μ is ukow): ( 1)S P (c 1 < < c ) = 1 α, P ( 1 c < σ σ ( 1)S < 1 c 1 ) = 1 α

70 3-1 ( 1)S P ( < σ < c ( 1)S ) = 1 α ( 1)S ( 1)S P ( < σ < ) = 1 α c c 1 Above we costructed two itervals, oe for the variace σ ad oe for the stadard deviatio σ. The geeral formulas for these stochastic itervals are: (1 α)100%-ci(σ ) = ( ( 1)S c, (1 α)100%-ci(σ) = ( ( 1)S c For both itervals we have: P(χ 1 ( 1)S c 1 ) c 1, ( 1)S c 1 ) c 1 ) = P(χ 1 c ) = α The formula for σ ad the tail probabilities are metioed o the formula sheet. I practice these formulas ca be applied if the ormality assumptio is reasoable ad if the expectatio μ ad the variace σ are both ukow. The latter is usually the case: if μ is kow, oe could use S μ, as i example 3.3.1, to costruct a cofidece iterval of σ similarly. (1 α)100%-ci(σ ) = ( S μ c, (1 α)100%-ci(σ) = ( S μ c S μ c 1 ), S μ ) c 1 For both itervals we have: P(χ c 1 ) = P(χ c ) = α Example (Cotiuatio of examples 3..3 ad 3..6.) We observed = 15 startig salaries of IT-specialists: the mea is x = 3.30 k (i thousads of Euro s a moth) ad the (sample) stadard deviatio is s = 0.60 k. Requested: b. A 95%-cofidece iterval for the stadard deviatio of the startig salaries. We will use: The formula of the cofidece iterval for σ (formula sheet). We will extract the root of the bouds sice σ = σ Furthermore c 1 = 5.63 ad c = 6.1, take from the χ -table with df = 1 = 14, such that P(χ 14 c 1 ) = P(χ 14 c ) = α = %-CI(σ) = ( ( 1)S c, ( 1)S ) = ( c 1 6.1, ) (0.44, 0.95) We are 95% cofidet that the stadard deviatio of the startig salaries lies betwee 440 ad 950 Euro. Example What iformatio ca the past performace give us about the future returs o ivestmets? A cofidece iterval could quatify our expectatios, but we eed several (sometimes

71 3-13 disputable!) assumptios for the yearly returs, i the past ad i future. Idepedece is oe of them (are the returs i two cosecutive years idepedet?) ad a ormal distributio with the same expected retur µ ad variace σ is aother (Will the future returs be roughly the same as those i the past?). Suppose the followig yearly returs are measured (i %): 15.4, 6.4,.1, 1.8, 4.8, 11.4, 7.3 Fid a 95%-cofidece iterval for a. The expected yearly retur b. The variace of the yearly retur. Usig a (simple) scietific calculator we fid: = 7, x = 8. 0 ad s = Probability model: the observed returs X 1,, X 7 is a radom sample of a N(μ, σ )- distributio, with ukow expected retur µ ad ukow σ. a. 95%-cofidece iterval for the expected yearly retur μ: Besides the give values, x ad s = 34.11, we will use the t-table with 1 = 6 degrees of freedom to fid the value of c i the formula, such that P( c < T 6 < c) = 0.95, or (the tail probability o the right) P(T 6 c) = 0.05, so c =.447. s 95% CI(μ) = (x c, x + c s ) = ( , ) (.6, 13.4) 7 7 The expected yearly retur is betwee.6% ad 13.4% at a 95% level of cofidece b. 95%-Cofidece iterval for the variace σ : I the χ 6 -table we fid c 1 = 1.4 ad c = 14.45, as P(χ 6 c 1 ) = P(χ 6 c ) = α = %-CI(σ ) = ( ( 1)S c, ( 1)S c 1 ) = ( , ) (14., 165.0) We are 95% cofidet that the variace of the yearly retur lies betwee 14. ad (Ad the stadard deviatio σ betwee ad %). Note (A approximate cofidece iterval for σ for large ) I examples ad we used the Chi-square table to produce cofidece itervals for σ or σ. For umbers of degrees of freedom i the rage we ca use liear iterpolatio of the give table values. But for larger values of we ca also use the ormal approximatio of the Chi-square distributio: the χ -distributio of i=1 Z i is, accordig to the CLT ad property 3.3..b approximately N(, )-distributed. (See for applicatio exercise 6)

72 Cofidece iterval for the populatio proportio p Remember that, if a proportio p of a populatio has a specific property, e.g. ows a iphoe, the the probability that a arbitrarily chose perso from the populatio has the property (a iphoe) is the success probability p: populatio proportio ad success rate are iterchageable cocepts. I terms of the populatio we have a dichotomous populatio: the variable is ot umerical but a categorical variable with two categories: successes ad failures. Example A pollig agecy wats to determie the preset support of the Labour party (PvdA) i The Netherlads ad wats to ask 1000 arbitrarily chose voters whether or ot they will vote the Labour party. The aim is to determie a 90%-cofidece iterval of the populatio proportio p of Labour voters, ad of the expected umber of members i parliamet (150 i total) as well. After coductig the survey they foud that there are 58 Labour voters amog 1000 voters. We assume the agecy solved problems as drawig a really radom sample ad o-respose, so that we ca assume that X, the umber of Labour voters is assumed to be B(1000, p)- distributed. Accordig to the CLT X is approximately N(1000p, 1000p(1 p))-distributed (p is ot close to 0 or 1), the we kow that p = X/ is a estimator of p, with a approximate ormal distributio: p = X p(1 p) N (p, ) or: p p p(1 p) 1000 ~ N(0,1) Sice for a N(0,1)-distributed Z we have: P( < Z < 1.645) = 0.90, the approximately: P < p p < = 0.90 p(1 p) ( 1000 ) This iequality ca be solved with respect to the ukow p (see exercises), but a simple approximatig approach gives easy to apply results: just replace p(1 p) by its mle p (1 p ). The, the stadard deviatio p(1 p) 1000 Approximately we have: P ( < p p p (1 p ) 1000 of p is estimated by p (1 p ), the stadard error of p < 1.645) = P (p p (1 p ) 1000 where p = x = 58 is the observed proportio, so: 1000 < p < p p (1 p ) 1000 ) 0.90 p (1 p ) (p , p p (1 p ) ) (0.35, 0.81) 1000 This iterval is the (approximate) cofidece iterval for the populatio proportio p, at a 90% level of cofidece.

73 3-15 Sice the expected umber of members of the (Dutch) parliamet is 150p, we ca easily fid a cofidece iterval for this umber: if (A, B) is a stochastic iterval of p, we have P(A < p < B) = 0.90, but the we have P(150A < 150p < 150B) = 0.90 as well. The umerical iterval is: ( , ) (35.3, 4.1). So [35, 4] is a approximate cofidece iterval for the expected umber of Labour members of parliamet, at a 90% level of cofidece. We are 90% cofidet that Labour will have betwee 35 ad 4 members i parliamet. The costructio of the cofidece iterval i example is simply geeralized: Property 4.4. If we have a radom sample from a populatio i which a proportio p has a specific property (success), the the umber X of successes is, for sample size, B(, p)-distributed ad the approximate (1 α)100%-cofidece iterval for p is give by: p (1 p ) p (1 p ) (1 α)100% CI(p) = (p c, p + c ), where p = X ad c from the N(0, 1)-table, such that Φ(c) = 1 1 α. Rule of thumb for applyig this large sample approach for the cofidece iterval of p: > 5, p > 5 ad (1 p ) > 5 Determiatio of the sample size for give iterval width W ad give level of cofidece ca be solved from the coditio c p (1 p ) W: ( c W ) p (1 p ) The upper boud for ca be computed if the ukow p is determied (estimated), as follows: 1. Geeral solutio, if p is completely ukow: replace p (1 p ) by 1, sice 0 p (1 p ) Replace p by p 0, if we kow that p p 0 or p p 0 (the latter for p 0 < 1 ).

N(μ, σ ) 3-16 Overview of cofidece itervals i case of oe sample problems Populatio model parameter Cofidece iterval c from the μ σ if σ (X c is kow, X + c σ ) N(0,1)-table μ if σ is ukow σ if μ is

74 N(μ, σ ) 3-16 Overview of cofidece itervals i case of oe sample problems Populatio model parameter Cofidece iterval c from the μ σ if σ (X c is kow, X + c σ ) N(0,1)-table μ if σ is ukow σ if μ is ukow (X c ( 1)S ( c S, X + c S ) t 1-table ( 1)S, ) χ c 1 -table 1 σ if μ is ukow ( 1)S ( c ( 1)S, ) χ c 1 -table 1 σ if μ is kow ( S μ c, S μ ) χ c -table 1 Dichotomous, proportio p p p (1 p ) p (1 p ) (p c, p + c ) N(0,1)-table Oe-sided cofidece itervals All the itervals we gave i this overview are so called two-sided cofidece itervals, but i some practical cases we are oly iterested i a lower boud of the parameter, or, alteratively a upper boud, at a give cofidece level. I that case we will give a cofidece iterval with the shape (A, ) or (, B) for the parameter. Example (cotiuatio of exercise 4.4.1) Determie a lower-tailed 90%-cofidece iterval for the proportio of Labour voters, if 58 vote Labour i a radom sample of 1000 voters. I other words: what is the miimal proportio of Labour voters, at a 90% cofidece level. I this case we do ot eed a upper boud for p, but the the 90%-CI for p is p (1 p ) (p c, ), where c is such that Φ(c) = 1 α = 0.90: c = 1.8. Takig ito accout that p 1: 90%-CI (p) = (p c p (1 p ), 1) ( , 1] (0.40, 1] We are 90% cofidet that the proportio of Labour voters is at least 4.0%

75 3-17 Similarly we ca costruct oe sided cofidece itervals for μ or σ. Note that for a iterval with a lower boud we will use a upper tail probability ad, reversely, for a upper boud a lower tail probability, e.g.: ( 1)S ( 1)S ( P ( σ < c ) = 1 α P ( < σ ) = P (σ 1)S > ) = 1 α c c 3.5 Predictio iterval Cofidece itervals are give for populatio parameters μ, σ ad p. I sectio 3. we determied the cofidece iterval of the populatio mea, such as the mea startig salary of IT-specialists: μ is ukow ad fixed ad a iterval estimate is determied usig a radom sample of observatios. Note that the iterval does ot predict a sigle ew startig salary, it predicts the mea of all startig salaries. I sectio 3. we observed: the larger the sample size, the smaller the cofidece iterval, so the smaller the proportio of observed salaries that fall i the iterval. We are searchig for a predictio iterval for a ew observatio X +1, based o a observed radom sample X 1,, X : the value of X +1 does ot deped o the observed sample, but its distributio is the same, the (ormal) populatio distributio. We will assume the followig model: Model: X 1,, X, X +1 are idepedet ad all N(μ, σ )-distributed, with ukow μ ad σ. Of course the observed value of X = 1 X i=1 i ca serve as a predictio of the ( + 1) th observatio: X +1 X is the predictio error. Sice X +1 ~ N(μ, σ ) ad X ~ N (μ, σ ) are idepedet, X +1 X is ormally distributed as well, with Expected predictio error E(X +1 X) = E(X +1 ) E(X) = μ μ = 0 ad Variace var(x +1 X) id. = var(x +1 ) + var(x) = σ + σ = σ (1 + 1 )

76 3-18 So: X +1 X ~ N (0, σ (1 + 1 X )) ad +1 X σ (1 + 1 ) ~ N(0, 1) The problem of the ukow σ ca be solved by substitutig S, resultig i a t-distributed pivot with 1 degrees of freedom, similar to T = X μ i sectio 3.. Sice Z = X +1 X S ~ N(0, 1) ad Y = ( 1)S ~ χ σ (1+ 1 ) σ 1 are idepedet, T = Z Y/( 1) = X +1 X σ (1 + 1 ) ( 1)S σ /( 1) = X +1 X S (1 + 1 ) ~ t 1 First we will determie a value c such that P( c < T 1 < c) = 1 α, or, usig the t 1 - table, such that P(T 1 c) = 1 α. P ( c < X +1 X S (1 + 1 ) < c ) = 1 α P ( c S (1 + 1 ) < X +1 X < c S (1 + 1 )) = 1 α P (X c S (1 + 1 ) < X +1 < X + c S (1 + 1 )) = 1 α Property If X 1,, X, X +1 are idepedet ad all draw from a N(μ, σ )-distributio, the (X c S (1 + 1 ), X + c S (1 + 1 )) with P(T 1 > c) = 1 α is a predictio iterval for a ew observatio X +1, based o a radom sample X 1,, X, with cofidece level 1 α Short otatio: (1-α)100%-PI(X +1 ) = (X c S (1 + 1 ), X + c S (1 + 1 ))

77 3-19 Example 3.5. (cotiuatio of example 3..3). We have = 15 startig salaries of IT-specialists with sample mea x = 3.30 k (i thousads of Euro s per a moth) ad the sample stadard deviatio is s = 0.60 k. Suppose a IT-specialist just graduated what will be his startig salary, at a 95%-cofidece level? Clearly we are asked a iterval for this specific IT-specialist, ot for the mea of all startig salaries: a predictio iterval for a ew observatio (X 16 ) is requested. Probability model of the startig salaries: X 1,..., X 15, X 16 is a radom sample from the N(μ, σ )-distributio with ukow μ ad σ. For the predictio iterval we ca use the same igrediets: = 15, x = 3.30 ad s = 0.60 k. c =.145 from the t-tabel with df = 1 = 14, such that P(T 14 c) = α = %-PI(X 16 ) = (X c S (1 + 1 ), X + c S (1 + 1 )) = ( ( ), ( )) (1.97, 4.63) Iterpretatio: We are 95% cofidet that the startig salary of a IT-specialist lies betwee 1970 ad The more detailed frequecy iterpretatio: if we repeat the radom sample may times ad the ew observatio may times, the i about 95% of the repetitios the predictio iterval will iclude the ew observatio. Note that the predictio iterval is much wider tha the cofidece iterval for the mea startig salary, that we determied i example 3..3: 95%-CI(μ) (.97, 3.63) versus 95%-PI(X 16 ) (1.97, 4.63) 3.6 Exercises 1. A compay produces foils for idustrial use. A ew type of foil is itroduced ad the producer claims that the ew foil has a mea pressure resistace of at least 30.0 (psi). A radom sample of pieces of foil were tested ad the followig pressure resistace values were observed: a. Use a simple (scietific) calculator with statistical fuctios (ot a GR ), to determie estimates of the expected pressure resistace ad of the variace. b. Determie a 95%-cofidece iterval for the expected pressure resistace of the ew foil, assumig ormality. Give first the probability model. c. Determie a cofidece iterval for the variace of the pressure resistace at a 95% cofidece level.

78 3-0. The umber of hours of sushie durig the moth of July was measured i De Bilt (Hollad) i a 0 years period: Year Hours of su Year Hours of su Sample mea ad sample stadard deviatio are ad 39.50, respectively. a. Determie the 99%-cofidece iterval of the expected umber of hours of sushie durig the moth of July. b. Ca we iterpret the iterval i a. as follows: about 99% of the July moths will have a umber of hours of su withi the iterval? If ot, give a proper iterpretatio. c. I 1984 the umber of hours of sushie apparetly was hours Is this a exceptioal low value? Motivate your aswer d. Determie the 95%-cofidece iterval of the stadard deviatio of the umber of hours of sushie. e. Commet o the assumptios of idepedece ad of ormality, o which the applied itervals are based. 3. For search methods i data baks usually performace measures are used. Assume we have such a measure ad the ormal distributio applies: the performace measure varies aroud a ukow expected value µ with a ukow variace σ accordig to a ormal distributio. The followig observed values x 1, x,.., x 9 of the performace measure are supposed to be a radom sample: 5, 54, 54, 57, 58, 59, 64, 70, 7 a. Give estimates of μ ad σ, based o the usual ubiased estimators. b. Determie a 95%-cofidece iterval for the expected value. c. Fid a (umerical) 95%-cofidece iterval for σ. 4. A large batch of lamps is checked by samplig 100 arbitrarily chose lamps. a. Determie a approximate 95%-cofidece iterval for the proportio of defectives i the whole batch if were defective i the sample of 100 lamps. b. How large should the sample size be, as to make sure that the legth of the (95%-) cofidece iterval is at most 0.0 (or %)? 5. A expert i Traffic ad Trasport claims that 30% of all private cars i a regio show legal deficiecies, i e.g. lights ad breaks. Alarmed by this statemet the govermet sets up a large sample to check the claim: at radom 400 private cars are checked ad 73 of them showed deficiecies. Let p be the proportio of private cars with deficiecies i the regio. a. Determie a (umerical) 95%-cofidece iterval for p ad give the proper iterpretatio

79 3-1 of this iterval. b. If the questio would be What is the proportio of cars with deficiecies at most, determie a oe-sided 95%-cofidece iterval for p to aswer this questio ad give a simple iterpretatio of this iterval. c. How large should we choose the sample size to estimate the proportio p with a estimatio error of at most % at 99% cofidece level? Use the reported sample proportio. 6. Cosider the sample variace S of a radom sample X 1,, X draw from a ormal distributio with ukow μ ad σ. The sample size is large: > 5. a. Use the approximately ormal distributio of S to costruct a cofidece iterval for σ with cofidece level 1 α. Suppose we foud for = 101 a sample variace s = 50 b. Use the result i a. to compute a approximate 95%-cofidece iterval for σ. c. Compare the iterval i b. to the iterval you get if we use the Chi-square distributio to fid the 95%-CI(σ ). 7. The times (i miutes), spet by 50 customers i a web store are observed ad summarized as follows: = 50, x = 1.6, s = From the umerical ad graphical aalysis it was cocluded that the expoetial model applies (so f(x) = λe λx, x 0 ad μ = σ = 1 λ ). We wat to estimate he ukow parameter λ, usig the 50 observed times. a. What is a ituitive estimator of λ? b. Costruct a (formula for) a approximate (1-α)100%-cofidece iterval for λ, by usig the approximate distributio of the sample mea X. c. Use the result i b. to compute a (umerical) 95%-cofidece iterval for λ. 8. Buyig behaviour i a warehouse. A maagerial assistat has to assess the buyig behaviour of visitors of a plat of the warehouse. A first, s\mall sample of 75 visitors leavig the warehouse should give a idicatio of their buyig behaviour before a exteded survey with more detailed questios will be coducted. I the radom sample of 75 visitors 60% bought at least oe product: these buyers sped o average 40, with a sample stadard deviatio of 10). a. Determie, usig the result of the small sample, a 95%-cofidece iterval of the expected umber of buyers o a day where 50 visitors etered the warehouse. b. Determie a 95%-cofidece iterval for the expected (total) tur-over o a day 1350 buyers (1350 = 60% of 50).

80 3- c. Determie a iterval estimate of the moey sped by buyer o o a day with 1350 buyers, with a 90% level of cofidece. 9. The cofidece iterval for p is deduced from the approximate N (p, p(1 p) )-distributio of the sample proportio of p for sufficietly large. I the costructio of the iterval we used the estimate p (1 p ) of the stadard deviatio p(1 p). a. Repeat the costructio of a cofidece iterval of p (as give i sectio 3-4), but ow without usig the aforemetioed estimate of the stadard deviatio. Show we fid the followig bouds of the iterval i that case c [(p + + c ) ± c p (1 p ) + b. Compare the result of the 95%-cofidece iterval i a. to the stadard-iterval for p o the formula sheet, if i a radom sample of = 100 trials 300 successes are observed. c 4 ] 10. Retur to the pressure-resistace-data i exercise 1. a. Determie a 90%-predictio iterval for pressure resistace that we will observe i a ewly tested piece of foil. b. How may of the observed pressure resistace values i exercise 1 would you expect to lie outside the iterval i a.? Cout them ad commet o the result. c. Determie a (oe-sided) 95%-cofidece iterval with a lower boud for the expected pressure resistace of the ew foil. 11. Sice workload is cosidered to be a problem at the uiversity, a radom sample of 00 members of the scietific staff with a full-time appoitmet (formally 40 hours per week) were asked to register their umber of actually worked hours per week. The result of the survey alarmed the Labour Uios: o average the 00 employees worked 47.3 hours per week ad the sample stadard deviatio was reported to be 4.0 hours per week. a. Joh works as a lecturer at the uiversity. Determie a iterval estimatio for the umber of hours that he actually works per week, usig a 90% level of cofidece. b. Determie a 95% cofidece iterval for the stadard deviatio of the workig hours per week of the scietific staff. 1. We observed i a survey a ormally distributed radom variable, that is: x 1,, x (with mea x ad variace s ) are observed ad are modelled as a radom sample X 1,, X, draw from a ormal distributio with ukow μ ad σ. Suppose that we pla to coduct a secod survey, resultig i a secod radom sample Y 1,, Y m. Determie a formula of the predictio iterval for the mea of this secod sample, based o the observatios i the first sample ad usig a level of cofidece 1 α. (Use a similar approach as i sectio 3.5).

81 4-1 Chapter 4 Hypothesis tests 4.1 Test o μ for kow σ : itroductio of cocepts I the previous chapters we discussed the poit ad iterval estimates of ukow parameters i a populatio: the mea μ, the variace σ or the proportio (success probability) p. Whe testig hypotheses we are ot specifically iterested i estimates, but we wat to show whether a claim or a cojecture (a hypothesis) ca be prove by statistical evidece, the observed data. I scietific articles o all kids of research (biology, physics, sociology, ecoomy, busiess, medicie, egieerig, etc.) data are used to prove statemets. I comparisos statistically sigificat differeces are poited out. Tests for may kids of situatios are developed i the theory of hypothesis testig: they ca assist i decisio makig ad i choosig optios i the most efficiet ad powerful way, o the basis of available observatios. Furthermore a good uderstadig of the theory of hypothesis testig ca assist us i desig of experimets ad gatherig relevat data. I this sectio we will focus o all basic cocepts of hypothesis testig, usig radom samples draw from a ormal distributio with kow σ : we will start with a simple example (4.1.1) ad a elaborate example (itroduced i example 4..) to make the statistical reasoig i hypothesis testig clear, before formalizig the cocepts. Sectios, 3 ad 4 discuss stadard tests o μ, σ ad p ad sectios 5 ad 6 discuss the costructio of tests ad their properties. Example I Lodo durig the 17 th ad 18 th cetury the muicipality started to keep record of all births ad their geder. I those days people assumed that boys ad girls were bor equally ofte: biologists explaied these equal umbers from the preservatio of makid. Research of the birth records, however, seemed to suggest uexpectedly that this was ot true: i all cosidered 8 years more boys were bor i Lodo tha girls. Does this observatio prove that the commo sese, that boys ad girls are equally likely to be bor, is ot correct? To aswer this questio a researcher reasoed as follows: if we assume that the probability of a boy equals the probability of a girl, the the occurrece of more boys durig a year is 50% (the probability of equal umbers of boys ad girls is egligibly small). But the the probability of 8 years more boys tha girls i a row is ( 1 )8. This probability is very small: about This shows that the assumptio of equal probabilities is ot correct: statistics show that more boys tha girls are bor, beyod reasoable doubt. Example 4.1. Are techical studets above average itelliget? This questio was posed after a extesive survey amog Dutch studets showig that the mea IQ of all studets is 115, where the stadard deviatio was 9. The IQ s were measured usig a stadard IQ-test. Sice IQ s i homogeeous groups (like studets) show ormal distributios,

82 4- it seems reasoable to model the IQ of a arbitrary Dutch studet as a N(115, 81)-distributed variable. The research questio suggests that the expected IQ, μ, of techical studets is greater tha 115: the cojecture that we wat to prove is μ > 115. But without sufficiet evidece, give by statistical data, we will have to accept that μ = 115, for the time beig. High time to provide some statistical evidece, which will cosist of a radom sample of IQ s of techical studets. Of course, we will compute the sample mea to give a estimate of the ukow expected IQ. It seems logical that this mea IQ i the sample should be greater tha 115 to give sufficiet proof, but how large is sufficietly large? I example 4.1. we cosidered the populatio of IQ s of techical studets, with ukow mea μ, the expected IQ. The questio at had is whether (or whe) there is sufficiet statistical evidece i a sample to prove the statemet that μ > 115. Without sufficiet proof we will have to accept that μ = 115 (or eve μ 115). This is what we will call the ull hypothesis H 0. What we try to prove statistically, μ > 115, is called the alterative hypothesis H 1 (the cojecture). For the results of the radom sample we eed a decisio criterio or test, that tells us whe we ca reject H 0 : μ = 115 i favour of H 1 : μ > 115. Example The Coca Cola Compay (CCC) claims i a advertisemet that its brad is preferred over Pepsi Cola by a majority of the Dutch coke drikers. Pepsi challeges CCC opely to prove this statemet, or otherwise chage their advertisig policy. After some discussio the compaies agree to give a assigmet to a idepedet testig agecy to statistically sort this out, o the basis of a radom sample of 1000 coke drikers. Each of the test subjects has to taste the two kids of coke blidly ad choose the oe which tastes best. I this set up it is clear that CCC ca oly prove its claim if a majority of the 1000 subjects prefer Coca cola. But is CCC s statemet prove if there are 501 that prefer Coca Cola? Pepsi will ot accept this ad argues that 501 or eve 510 prefereces of Coca Cola could be coicidece, whilst i the whole populatio the preferece of Coca Cola is at most 50%. So, what umber of prefereces is large eough to state safely that Coca Cola is preferred? 550, 600? We are searchig for a boudary k i our decisio criterio: if k or more of the 1000 subjects prefer Coca Cola, CCC has prove its statemet. k is the critical value. The testig agecy therefore proposes i advace: We will choose the value of k such that, if the prefereces i the populatio is really balaced the probability is at most 1% that we fid at least k prefereces of Coca cola i the sample. So if we fid i the test k (or more) subjects that prefer Coca Cola, we agree ot to blame coicidece (probability 1%), but to accept that this evet is the cosequece of a majority of prefereces of Coca Cola i the populatio. Sice both parties agree o this approach the agecy ca start its work. If p is the proportio of prefereces of Coca Cola i the populatio, the 1 p is the proportio of prefereces of Pepsi. The statemet (by CCC) to be prove is p > 1 p, or p > 1 : what we wat to prove statistically is the alterative hypothesis. The test (or decisio criterio) ca be formulated as follows:

83 4-3 Reject H 0 : p = 1 i favour of H 1: p > 1, if X = the umber of Coca Cola prefereces k. X k, the values of X, for which we will reject H 0, is called the rejectio regio. (We still eed to determie the critical value k.) I the previous examples the approach i hypothesis testig becomes explicit: assumig the situatio as described i the ull hypothesis to be the true reality i the populatio, we are oly willig to reject this assumptio if the results of the sample are very ulikely. Ulikely i the meaig of occurrig with (very) small probability if H 0 is true. What is small eough is somethig to agree upo i advace. CCC ad Pepsi agreed upo a level of sigificace of 1%. Ofte 5% is the chose level of sigificace ad sometimes 10%. Note that the level of sigificace is a error probability: i the CCC example the probability is about 1% to fid a value of X i the rejectio regio ad (thus) to reject H 0, i case of p = 1 (the ull hypothesis) With the same termiology we wat to show that the mea IQ of techical studets is sigificatly greater tha 115: we wat to coduct a test o H 0 : μ = 115 agaist H 1 : μ > 115. The (sample) mea of a radom sample of = 9 IQ s of techical studets is observed: x = Is this mea large eough to reject the ull hypothesis at a 5% sigificace level? We will see that there are (at least) two viable solutios to this questio, but let us first give the probability model o which both approaches are based: Model: the IQ s X 1,, X 9 of 9 techical studets are idepedet ad all N(μ, 81)-distributed. So, the mea of the IQ s of techical studets could be differet from 115, but the variace is assumed to be the same as the variace of all Dutch studets (stadard deviatio σ = 9). Cosequetly, it follows that X = 1 X 9 i=1 i is ormally distributed as well, with a kow variace: X ~ N (μ, σ ) = N (μ, 81 9 )

84 4-4 Assumig that the ull hypothesis is true, μ is kow: μ = 115: X 115 the X ~ N(115, 9) ad the z score is 3 ~ N(0, 1) Rejectig or failig to reject H 0 usig the p-value. We will cosider the alterative hypothesis μ > 115 to be prove, if, assumig H 0 : µ = 115 is really true, the mea is uexpectedly large. The observed mea is x = Sice X is a cotiuous variable we have P(X = 119) = 0. We will compute the probability that X is at least 119: the evet that we observe such a large value of the sample mea this large or eve larger. If this evet is ulikely, that is if its probability is less tha or equals α = 5%, we will reject the assumptio µ = 115. Well: P(X 119 H 0 ) = P (Z ) 1 Φ(1.33) = = 9.18% 3 P(X 119 H 0 ) = 9.18% is the p-value, or observed sigificace (Dutch: overschrijdigskas): the probability that the mea takes o a value that is deviatig this much from the expected value μ = 115, or eve more. The frequecy iterpretatio of this probability 9.18% i this case is that if H 0 is true, it will occur oce i 11 repetitios of the samplig process. This is ot what we call a rare evet. Accordig to the criterio α = 5%, it should be oce i 0 or more repetitios. Now we ca state our decisio o the research questio: P(X 119 H 0 ) = 9. 18% > α = 5%, so we fail to reject H 0 By the way, if we would have agreed upo a α = 10%, H 0 would have bee rejected: it is clear that the α should be chose i advace, otherwise we ca ifluece the decisio. The desire to prove a desirable result ca lead to uethical or o-scietific behaviour. Aother aspect of the choice of α is that we agree i advace to possibly drop to a wrog coclusio with respect to the truth of H 0 : if H 0 is true, the oce i 0 repetitios we will reject H 0, falsely. The decisio criterio (the test) o the basis of the sample result x = is stated as follows: Reject H 0 if the p value P(X 119 H 0 ) α

85 4-5 Rejectig H 0 or failig to reject H 0, usig the Rejectio Regio (RR). Aother, but equivalet approach is to fid out i advace for which values of X the ull hypothesis will be rejected. These values costitute the Rejectio Regio (Dutch: Kritieke gebied). I the discussed example we will reject for large values of X, so if X is at least c: the Rejectio Regio is X c. The critical value c is always icluded i the Rejectio Regio ad is determied usig α = 5%, the chose level of sigificace: c 115 c 115 P(X c H 0 ) = α, or P (Z ) = 1 Φ ( ) = 5% 3 3 c 115 or Φ ( ) = 95% 3 Search i the stadard ormal table to fid: c 115 = 1.645, so c = = Now we are ready to decide, whether or ot, we will reject H 0 : Sice x = < c (119 ot i the RR), we will ot reject H 0. The test, regardless which mea we (will) observe, is stated as follows: Reject H 0 if X c = Note that coductig the test with the p-value ad coductig the test with the rejectio regio are equivalet procedures, leadig to the same coclusios. If the observed sample mea x is less tha c = , the p-value P(X x H 0 ) is greater tha α = 5% (see the graph above). Reversely: if x lies to the right of c, the p-value is less tha α = 5%. Fially, the decisio should be stated i commo words, aswerig the research questio: At a sigificace level of 5% the sample did ot provide sufficiet evidece to state that techical studets are o average more itelliget tha all studets i the Netherlads. Note (Aalogy betwee law ad hypothesis tests) It may be clear from the examples that H 0 ad H 1 are ot equivalet choices: the questio is ot whether oe of the two is most likely. No, our aim is to provide eough evidece to prove that H 0 is rejected ad that H 1 is true. The law process is similar: a suspect is iocet

86 4-6 (H 0 ), uless prove guilty (H 1 is true). The judge examies whether the evidece is sufficietly strog to setece the suspect. The testig procedure Above, we discussed the statistical reasoig whe hypotheses are ivolved: there must be a research questio, that ca be stated i a hypothesis that must be prove usig statistical data. The data, observatios, should be modelled i a probability model (or: statistical assumptios), the ull ad alterative hypotheses are formulated i terms of populatio parameters i the model ad a test statistic is chose. The we use either the p-value for the observed value of the test statistic or the Rejectio Regio i order to decide, whether or ot the ull hypothesis is rejected. The last step is traslatig the decisio i a aswer to the research questio. Though we will discuss may more types of tests, the reasoig described above will remai the same. To make sure that all relevat steps i our reasoig are covered, we will use a eight steps testig procedure for ay test we will coduct (the procedure ca be foud o the formula sheet as well): Testig procedure i 8 steps 1. Give a probability model of the observed values (the statistical assumptios).. State the ull hypothesis ad the alterative hypothesis, usig parameters i the model. 3. Give the proper test statistic. 4. State the distributio of the test statistic if H 0 is true. 5. Compute (give) the observed value of the test statistic. 6. State the test: a. Determie the rejectio regio or b. Compute the p-value. 7. State your statistical coclusio: reject or fail to reject H 0 at the give sigificace level. 8. Draw the coclusio i words. The iterpretatio of a exercise or practical situatio might be see as the first step ( step 0 ). The research questio at had was: Do techical studets have a average IQ, larger tha 115, the mea IQ of all Dutch studets? Applyig the testig procedure we foud: 1. Model: we have a radom sample of 9 IQ s of techical studets, draw from the N(μ, 81)- distributio of IQ s.. Hypotheses: test H 0 : µ = 115 agaist H 1 : µ > 115with α = Test statistic: X. 4. Distributio uder H 0 : X ~ N (115, 81 9 ) 5. Observed value: x = a. Rejectio Regio: Reject H 0, if X c P(X c H 0 ) = 1 Φ ( c 115 ) = α = 5%, so c 115 = 1.645, or c = Statistical coclusio: x = 119 is ot i the Rejectio Regio we failed to reject H Coclusio i words: the data did ot prove at a 5% level of sigificace that the expected IQ of techical studets is larger tha 115. If we would have chose to use the p-value, oly steps 6 ad 7 are chaged (it explais why we

87 first compute the value of the test statistic i step 5: we eed it to compute the p-value i 6): b. The p-value: Reject H 0, if the p value α = 5% p value: P(X H 0 ) 1 Φ ( ) = = 9.18% Statistical coclusio: the p-value = 9.18% > α, so we failed to reject H 0. I the remaider of this reader we will always apply this 8 steps procedure, if appropriate. I geeral we are free to choose either the rejectio regio or the p-value, uless it is stated explicitly that oe of the approaches should be used. Probabilities of type I ad type II errors ad the power of a test Ofte the Rejectio Regio is determied for a give level of sigificace α. Sometimes the test is give (the test statistic ad its Rejectio Regio); the the level of sigificace ca be determied. For istace, i our example the Null hypothesis could be rejected if the mea of the 9 IQ s i the sample is at least 1. The the correspodig level of sigificace α ca be computed: α = P(X 1 H 0 ) P (Z ) = 1 Φ(.33) = = 0.99%. 3 This test has thus a error rate (the probability of rejectig H 0, though it is true) of about 1%: at most oe out of 100 repetitios of the sample will lead to rejectio, if i all cases H 0 is true. α is called the probability of a type I error. Errors of type II may occur as well: if the alterative hypothesis is true, but the sample results i a value outside the Rejectio Regio, implyig that the ull hypothesis is ot rejected. I our example with: we observe X < c, though i reality μ > 115. The probability of a type II error is P(X < c μ > 115): this probability depeds o the value of µ (ay value greater tha 115). But give oe of these values we ca compute the probability distributio of the sample mea: X ~ N (μ, 81 ). I the graph below this probability distributio is 9 show for μ = 115 (ull hypothesis) ad μ = 11 (alterative hypothesis is true). Below the probability of type I (α) ad the probability of type II is μ = 11 (the shaded area) are show, for the test with c = (α = 5%).

88 4-8 The probability of a correct decisio is the area to the right of c. The total area is, of course, 1, so i geeral we ca give the followig relatio (for ay µ > 115): P(X c μ) = 1 P(X < c μ) This probability of correct rejectio of the ull hypothesis is called the power of the test at the give value of µ. I words the complemet rule above states for give μ: The power = 1 Probability of a type II error We will compute the probability of a type II error ad he power for two (arbitrarily chose) values of μ: μ = 11: Prob. of a type II error: P(X < c μ = 11) = P (Z ) = Φ( 0.355) 36.1% 3 Power: P(X c μ = 11) = 1 Prob. of a type II error = 63.9% μ = 13: Prob. of a type II error: P(X < c μ = 13) = P (Z ) Φ( 1.0) 15.4% 3 Power: P(X c μ = 13) = 1 Prob. of a type II error = 84.6% From the computatios ad graphs above we uderstad that the probability of a type II error decreases if the value of µ (> 115) icreases ad the probability of a error icreases if μ is closer to 115 (where H 0 is true). Because of the complemet rule the power icreases as μ icreases: the test is more powerful at μ = 13 tha at μ = 11. Powerful i the sese of a higher probability of a correct distictio betwee μ = 115 ad μ = 13. Example The power of the test ad the sigificace level α. The termiology hits to the desirable situatio: a probability of a type II error should be as small as possible ad the power as large as possible. Sice the power depeds o the value of μ, it is a fuctio of μ: β(μ) = P(X c μ)

89 4-9 We have determied two of the fuctio values: β(10) = ad β(13) The power fuctio ca be graphed: Note that the fuctio at H 0 : μ = 115 is just the level of sigificace α = 0.05: β(115) = P(X c μ = 115) = α Moreover, the graph of the power shows that the sigificace level should be iterpreted as the maximum probability of a type I error if we choose a so called composite ull hypothesis, H 0 : μ 115. Now the probability of a type I error depeds o μ: β(μ) = P(X c μ) is the probability of a type I error for each value of μ 115, e.g. if μ = 11: β(11) = P(X c μ = 11) = P (Z ) 1 Φ(.65) = 0.4% 3 Obviously the type I error has the largest probability i the boudary μ =115 of the ull hypothesis μ 115: therefore the sigificace level, for composite ull hypotheses, is see as a threshold value, ofte deoted as α 0 : the maximum allowed value of the probability of a type I error. I the example we oticed that the test o H 0 : µ 115 versus H 1 : µ > 115 is equivalet to the test o H 0 : µ = 115 versus H 1 : µ > 115. For composite ull hypotheses we will iterpret step 4. State the distributio of the test statistic if H 0 is true. of the testig procedure as the distributio at the boudary value of H 0.

90 4-10 For every test it is possible to determie probabilities of errors of type I ad type II ad the power (sometimes approximately), but we will oly do so for oe sample problems: test o μ if σ is kow, test o σ ad test o p. The probability of errors (types I ad II) ad of correct decisios are give i the followig table, where 1 α, the probability of ot rejectig H 0, if it is true does ot have a special ame. Decisio of the test H 0 is rejected H 0 is ot rejected I reality H 0 is true H 1 is true Type I error Correct decisio: power = α = P(X c H 0 ) β(μ) = P(X c μ) Correct decisio Type II error 1 α = P(X < c H 0 ) 1 β(μ) = P(X < c μ) Effect of a larger sample size at the same level of sigificace α If we would observe 4 times as may IQ s as before, so = 36, the variace of the mea will decrease: var(x) = σ = 81, so σ 36 X = σ = 9 = We compute the ew critical value c for the larger sample: P(X c H 0 ) = 1 Φ ( c 115 c 115 ) = α = 5%, so = 1.645, or c The the power of the test at μ = 11 is: β(11) = P(X μ = 11) = P (Z ) Φ(.35) = 99.06% 1.5 I coclusio: if we icrease the umber of observatios by a factor 4, the distributios of X for all values of µ are more peaked aroud µ (smaller stadard deviatio). Cosequetly, the power of the test at μ = 10 icreases dramatically: from 63.9% if = 9 to 99.06% if = 36. Oe- ad Two-tailed tests I the examples so far we eeded to provide large values of the test statistic as to prove H 1 : we will call this a oe-tailed test or, more precise, a upper-tailed or right-sided test. I the extesive IQ example the Rejectio Regio is upper-tailed: X c ad we computed the upper-tailed p-value P(X 119 H 0 ). If the research questio would give rise to the hypotheses H 0 : μ = 115 agaist H 1 : μ < 115, it is evidet that we will reject the ull hypothesis oly if X attais small values. The lower-tailed Rejectio Regio would have the form X c: the critical value c ca be computed at sigificace level α, usig a tail probability with area α to the left of c.

91 4-11 The lower-tailed p-value for the observed x is the area left of x: P(X x H 0 ). If i the example the research questio would be eutral (e.g.: Is the mea IQ of techical studets differet from the mea IQ of all studets ), the the hypotheses ca be formulated as follows: Test H 0 : μ = 115 agaist H 1 : μ 115. A deviatio from the expectatio 115 by the sample mea, i positive or egative directio, should provide sufficiet evidece for the alterative. The Rejectio Regio is two-tailed: Reject H 0, if X c 1 or X c. Sice the sigificace level α is the sum of tail probabilities (left ad right), we will compute c 1 ad c usig two tail probabilities α : P(X c 1 H 0 ) = α ad P(X c H 0 ) = α. I the example we fid for = 9 ad α = 5%: P(X c H 0 ) = α = 0.05 implies 1 Φ (c 115 ) = = 1.96 ad c = Usig the symmetry about µ = 115, we fid c 1 = So c If we cosider the two-tailed p-value, oe is iclied oly to compute the upper tail probability for the observed mea x = 119. But sice H 1 : μ 115, i this two-tailed test oe should cosider all deviatios (positive ad egative) as large as observed (+4) or larger. Below the graph shows these deviatios of at least 4 from the mea. 3

92 4-1 Usig the symmetry we ca easily compute the two-tailed p-value: P(X 119 H 0 ) = 9.18% = 18.36% For all commo values of α ( 10%) we will ot reject the ull hypothesis: H 0 will oly be rejected if the level of sigificace is at least 18.36%. I this chapter ad the followig chapters we will see a wide rage of oe- ad two-tailed tests i may examples ad exercises. Example (The choice of a test statistic) Whe we are coductig a upper-tailed test o the populatio mea µ, the sample mea X is a atural choice: X is ubiased estimator of µ ad the larger the sample, the better the estimator is. If we test H 0 : µ = μ 0 agaist H 1 : µ > μ 0, for a give value μ 0 of µ, the the Rejectio Regio X c is determied by stadardizatio: P ( X μ ) = 0.05 = α σ/ The Rejectio Regio is: X μ σ = c Sice X μ σ is equivalet to X μ , we ca choose the z-score σ/ Z = X μ 0 σ/ as a alterative test statistic, with Rejectio Regio Z This stadard ormal Z is preseted i may books o statistics as the test statistic. But istead of Z or X we could choose 3X with rejectio regio 3X 3c or X with the (lowertailed) rejectio regio X c as equivalet tests. To avoid cofusio it is advisable to make a fixed choice of the test statistic: if a problem requires a test o the expectatio μ of the ormal distributio with kow σ, our choice will be the sample mea X. All the cocepts that we itroduced for the test o the ukow expectatio µ of the ormal distributio with kow variace σ ca be geeralized to other ukow parameters (σ, p, μ 1 ad μ, etc.) of kow distributios. Below the cocepts are give for the oe-sample problem. Geeral otatio of statistical cocepts We will adopt a similar otatio as used i estimatio theory, for testig hypotheses about a populatio o the basis of a radom sample X 1,, X, draw from the populatio variable X. θ is the ukow parameter of the give distributio of X. Note that, i geeral, θ ca be a vector of ukow parameters, e.g. θ = (µ, σ ) for a ormal distributio The parameter space Θ is the set of all possible values of θ. The hypotheses have the shape H 0 : θ Θ 0 ad H 1 : θ Θ 1, where Θ 0 ad Θ 1 are a partitio of Θ: disjoit sets of values of θ ad Θ = Θ 0 Θ 1. If Θ 0 cosists of oly oe value θ 0, the H 0 : θ = θ 0 deotes a sigle ull hypothesis, whereas H 0 : θ θ 0 is a composite ull hypothesis (Θ 0 = (, θ 0 ] ).

93 4-13 Likewise H 1 ca either be sigle (θ = θ 1 ) or composite (e.g. θ θ 0 ). Usually, i practice, the hypotheses are derived from the research questio at had. A test statistic is a fuctio of the sample variables: T = T(X 1,, X ). Usually the test statistic is a estimator of the ukow parameter or a fuctio of this estimator (such as X, S ad p for tests o specific values µ, σ ad p, respectively). The rejectio regio (RR) is the set of all outcomes of the sample for which the ull hypothesis is rejected i favour of the alterative. RR = { (x 1,, x ) H 0 is rejected }, where every x i ca attai values from the rage S X of the populatio variable X ((x 1,, x ) (S X ), a subset of R ) Note that X 0 is a simplified versio of the formal RR = { (x 1,, x ) x 0}. For simplicity we will write T RR istead of the evet (X 1,, X ) i the formal RR. A test o H 0 agaist H 1 cosists of a test statistic T ad its rejectio regio. This combiatio specifies the decisio criterio: reject H 0, (oly) if T is i the RR. A type I error occurs if H 0 is rejected, though i reality H 0 is true. The probability of a type I error: P(T RR θ = θ 0 ) for each value θ 0 Θ 0. The sigificace level of a test: α = max P(T RR θ = θ 0 ). θ 0 Θ 0 For a sigle H 0 : θ = θ 0 we have α = P(T RR θ = θ 0 ). We will deote the sigificace level as α 0, if we should fid a rejectio regio such that the probability of a type I error does ot exceed α 0 : fid a RR such that α α 0. A type II error occurs if H 0 is ot rejected, though i reality H 1 is true. The probability of a type II error: P(T RR θ = θ 1 ) for each value θ 1 Θ 1. The power β of the test is the probability of rightfully rejectig H 0, for a give value of θ 1 Θ 1 : β(θ 1 ) = P(T RR θ = θ 1 ) = 1 P(type II error). ote that β(θ) = P(T RR θ) is the probability of a type I error if θ Θ 0. I the first sectio we applied all of the cocepts above to the oe-sample problem, where the populatio has a ormal distributio with ukow µ ad kow σ ad test statistic T = X. Defiitio If - θ is a ukow parameter i a give distributio of X, where θ is i a parameter space Θ, - we wat to test H 0 : θ Θ 0 ad H 1 : θ Θ 1, where Θ 0 ad Θ 1 are a partitio of Θ, - X 1,, X is a radom sample of X: a realizatio (x 1,, x ) is i the rage S X R ad - the rejectio regio RR is the subset of the rage for which H 0 is rejected i favour of H 1, the 1. β(θ) = P((x 1,, x ) RR θ) is the power fuctio, where θ Θ. β(θ 0 ) = P((x 1,, x ) RR θ = θ 0 ) is the probability of a type I error for θ 0 Θ 0 3. α = max θ 0 Θ 0 P((x 1,, x ) RR θ = θ 0 ) is the sigificace level of the test 4. β(θ 1 ) = P((x 1,, x ) RR θ = θ 1 ) is the power of the test for θ 1 Θ 1 5. P((x 1,, x ) RR θ = θ 1 ) is the probability of a type II error for θ 1 Θ 1 I the remaider of this chapter we will first apply these cocepts for stadard oe sample

94 4-14 tests: the test o the expectatio μ ad the test o the variace σ, both for a ormal model with ukow μ ad σ, ad the test o the populatio proportio p: by cosiderig the values of the test statistic uder H 0 ad uder H 1 the shape of the rejectio regios are foud by reasoig. I sectios 5 ad 6 we will discuss criteria for ad costructio of tests (choice of test statistic ad determiatio of the rejectio regio) i a more structural way ad we will (try to) verify whether the tests, that we use, are the best possible tests. The cocepts above are give for oe-sample problems, but they ca be exteded to two-samples problems (chapter 5), k-samples problems, liear regressio problems (chapters 9 ad 10), etc. For example, if we cosider two (idepedet) samples X 1,, X 1 ad Y 1,, Y, draw from the populatio distributios of X ad Y, respectively, the T = X Y is possibly a test statistic, where T = T(X 1,, X 1, Y 1,, Y ) ad the rejectio regio is a subset of (S X ) 1 (S Y ) R 1+. Mostly we will oly bother to give these formal defiitios, if eeded for provig properties. 4. Test o the populatio mea μ, if σ is ukow I the first sectio we focused o a test of H 0 : μ = μ 0 agaist H 1 : μ > μ 0 for samples, draw from the N(μ, σ )-distributio. We used the kow value of σ to determie the RR. P(X c H 0 ) = 1 Φ ( c μ 0 σ/ ) = α = 0.05, so c = μ σ If σ is ukow we caot simply replace σ by the sample stadard deviatio s: similarly as i the costructio of cofidece itervals we will have to use the t-distributio, see property Defiitio 4..1 If we test H 0 : μ = μ 0, based o a radom sample of the ormal distributio with ukow expectatio µ ad ukow variace σ, the test statistic is T = X μ 0 S/ Property 4.. If i def H 0 is true, the test statistic has a t-distributio: T = X μ 0 S/ ~ t 1 The formal proof of this property is give i sectio 8.4. I situatios where the t-distributios with 1 degrees of freedom ca be applied to fid a cofidece iterval for μ, a test o a specific value (μ 0 ) of μ ca be coducted with T as test statistic. To simplify otatio, T 1 is a t 1 -distributed variable, like T = X μ 0 S/ uder H 0. Furthermore, otice that if T attais a value t, it meas that the observed values x 1, x,, x i the sample produce this umber t = x μ 0 s/ : T t is the evet that T attais the observed value t or larger. The observed value t ca be iterpreted as follows:

95 t is the umber of stadard errors s populatio mea μ 0, as sketched here: 4-15 that the observed mea x deviates from the assumed μ 0 x t = x μ 0 s t is larger (positive) as x is larger ad t will be egative if the sample mea is less tha μ 0. Cosequetly, if we coduct a upper-tailed test o H 0 : μ = μ 0 agaist H 1 : μ > μ 0 we will reject H 0 if we observe large (positive) values of T. Example 4..3 (former exam exercise) The data cetre of Stork Kettles received complaits about the slowess of the computer etwork ad decided to measure the respose times of a specific type of CadCam-commads at arbitrary momets durig workig hours: the observed radom sample of respose times had a mea of secods ad a sample stadard deviatio 5.06 sec. a. Give a probability model for the observatios ad determie a umerical 90%-cofidece iterval for the expected respose time (for this type of commads). b. Prior to the survey it was stated that the expected respose time should be less tha 16 secods. Does the sample prove that this coditio is fulfilled? Coduct a complete test to aswer this questio with α = 10%. Solutios: a. Probability model of the observatios: the respose times X 1,, X 16 are idepedet ad all (approximately) N(, )-distributed with ukow ad. We will use the formula 90%-CI( )=(X c S S, X + c ), where = 16, x = 15.10, s = 5.06 ad c from the t 15 -table, such that P( T 15 c) = 0.95, so c = Computig the bouds: 90%-CI( ) (1.9, 17.3). ( We are 90% cofidet that the expected respose time lies betwee 1.9 ad ) b. 1. Model: the respose times X 1,, X 16 are idepedet ad ormally distributed with ukow ad ukow σ. (So we will coduct a t-test o μ). Test H 0 : µ 16 agaist H 1 : µ < 16 with α = Test statistic T = X μ 0 S/ = X 16 S Distributio of T, if H 0 is true: T ~ t Observed value: t = / a. This a lower tailed test: reject H 0, if T c, c is egative: sice i the t 15 -table we fid P(T ) = 0.10, c = 1.341

96 Statistical decisio: t = > 1.341, so we fail to reject H Coclusio: at a sigificace level 10% there is isufficiet statistical evidece to claim that the expected respose time is less tha 16. The alterative approach with the p-value: 6. b. Reject H 0, if the p-value α, where the p-value is lower-tailed: P(T ) = P(T ) > 10%, 7. The p-value > = 0.10, so we fail to reject H 0. Example 4..3 shows that the critical value c for a left-sided t-test with α = 0.10 is egative ad the table value c is ot the same as i a cofidece iterval of µ with cofidece level 1 α = = This is caused by the fact that the test is oe-tailed ad the cofidece iterval is two-tailed: the lower-tailed probability i the test is α ad the cofidece iterval is based o two tail probabilities α. t-tests o H 0 : μ = μ 0, a overview. If the alterative H 1 : μ > μ 0, the the test with test statistic T is right-sided (upper-tailed): The Rejectio Regio has shape T c, where c from the t 1 -table is such that P(T 1 c) = α. The p-value for the observed value t of T is P(T t H 0 ) = P(T 1 t) If H 1 : μ < μ 0, the the test with T is left-sided (lower-tailed): The RR is T c, where c is egative ad is determied by P(T 1 c) = P(T 1 c) = α The p-value for the observed value t of T is P(T 1 t) If H 1 : μ μ 0, the the test with T is two-sided (two-tailed): The RR is T c or T c, where c from the t 1 -table is such that P(T 1 c) = α. The p-value for the observed value t of T is P(T 1 t), if t 0 or P(T 1 t), if t < 0. A coveiet otatio op the two-tailed p-value is: P(T 1 t ). Example 4..4 A govermet publicatio o the profitability of compaies i a idustrial brach reports that the compaies are recoverig from the ecoomic crisis i the years before. Defie X as the profit of a compay of a arbitrary compay, computed as a percetage of its turover. Accordig to the authors of the publicatio X ca be modelled as a ormally distributed radom variable with expectatio.00 (%). A studet is asked to verify the claims i the publicatio, especially the expected profit. After examiig 5 radomly chose aual reports of compaies i the brach he computed a mea profit 1.90 ad a sample stadard deviatio The studet wats to test the correctess of the reported expected profit.00. Sice he does ot wat to accuse the govermet of givig icorrect iformatio wrogly, he will use a maximum error probability of at most 5%. The research questio at had is: is the observed mea profit 1.90 sigificatly differet from

97 4-17 the reported.00?. The test i 8 steps: 1. The probability model: the profits X 1,, X 5 are idepedet ad all (approximately) N(, )-distributed, with ukow ad.. Test H 0 : µ =.00 agaist H 1 : µ.00 with = Test statistic: = X μ 0 S/ = X.00 S/ Distributio of T, if H 0 is true: T ~ t Observed value: t = b. It is a two-sided test: if the p-value α, the H 0 is rejected. The p-value is P(T 4.78) = 1% (see the table: P(T 4.80) = 0.005). 7. p-value < α = 5%, so we reject H At a level of sigificace 5% we showed statistically that the expected profitability i the idustrial brach is differet from the reported.00 % i the govermet publicatio. I the example we observed a value of T (.78) close to oe of the values i the table (.80), but i geeral we could use liear iterpolatio to fid a approximate p-value. I geeral it is sufficiet just to report the iterval i which the p-value lies, such as the p-value lies betwee 1% ad.5%, sice this is eough to compare it to the usual values of α, 1%, 5% or 10%. A last ote o the oe sample t-tests i this sectio is that we will ot compute the probability of a type II error or the power of the t-test: this is ot part of this course. 4.3 Test o the variace σ The estimator S is a atural choice to use as test statistic for a test o the variace σ of the ormal distributio. I chapter 3 we itroduced the Chi-square distributio ad property 3.3. states that ( 1)S σ has a Chi-square distributio with 1 degrees of freedom. If we coduct a test for testig H 0 : σ = σ 0 (for some fixed value σ 0 ) the S is a suitable test statistic, sice the χ 1 -distributio of ( 1)S σ uder H 0, eables the computatio of a Rejectio 0 Regio for S. Sice S is o-egative, the rejectio regio is, depedig o the research questio, oe- or twosided: (0, c] or [c, ) or (0, c 1 ] [c, ) where the critical values are all positive. Example A wholesale orders medicies i large quatities at producers. The quality coditios are verified by checkig a radom sample of the medicies. Usually for pills two aspects are importat: the mea quatity (i mg) of the effective substace per pill ad the variatio of this quatity. I sectio 4.3 we discussed the t-test o the mea quatity μ. Now we will cosider a test o the variace if the producer claims that the coditio σ 10 mg is fulfilled. If the sample shows this coditio is ot fulfilled, the total order is retured to the producer. The producer wats that

98 4-18 the probability that the order is retured wrogly (the producer`s risk ), is at most 1%. A radom sample of 0 pills should give the decisive aswer, whether the variatio coditio is ot fulfilled. We will derive the rejectio regio by followig the testig procedure up to step 6: 1. Model: the observed quatities of effective substace i the 0 pills ca be see as a realizatio of a radom sample X 1,..., X 0 of the quatity X per pill, that has a N(μ, σ )- distributio with ukow μ ad σ.. We will test H 0 : σ 10 versus H 1 : σ > 10 with α = The test statistic is S. 4. Distributio uder (the boudary value of) H 0 : 19S is χ distributed. 5. The observed s should be computed if the 0 quatities are measured. 6. We have a right sided test here: Reject H 0 if S c. P(S c H 0 ) = P ( 19S 19c H ) = P (χ 19 19c = or c so 19c ) α 0 = 0.01, The test (the decisio criterio) is determied: S Reject H 0 (the order is retured) S < H 0 is ot rejected (the order has to be accepted). The wholesale thiks that the critical value c = of s is ot very satisfactory: there is a cosiderable risk that a order that does ot satisfy the coditio, evertheless has to be accepted. This buyer s risk is the probability of a type II error, as discussed i sectio 4.1. We will compute this risk if for the order i reality σ = 15 (ot fulfillig the coditio, i H 1 ): P(S < σ = 15) = P ( 19S < σ = 15) P(χ ) 80% (we used iterpolatio of the table values P(χ 19.7) = 0.75 ad P(χ ) = 0.90). This high buyer s risk ca be decreased by icreasig the sample size at the same producer s risk 1%. If we wat to test H 0 : σ = σ 0 versus H 1 : σ σ 0, the test is two-sided: we will reject H 0 if the observed value of S deviates sigificatly from σ 0. The Rejectio Regio will be two-tailed: Reject H 0 if S c 1 or S c. Sice the Chi-square distributio is ot symmetric, we will have to determie c 1 ad c

99 4-19 separately, usig tail probabilities α ad the χ 1-distributio of ( 1)S, such that P(S c 1 H 0 ) = P(S c H 0 ) = α. Note 4.3. I the (exceptioal) case that μ is kow, we will use (see example 3.3.) the sample variace for kow σ, S μ = 1 (X i μ), as test statistic. i=1 For the test o H 0 : σ = σ 0 we ca apply the χ -distributio of S μ regio. σ 0 σ 0 to determie the rejectio This completes the discussio of oe sample testig problems where the ormal distributio of the populatio variable is a correct assumptio, or at least a good approximatig distributio. I chapter 7 we will discuss what to do if the ormal distributio does ot apply.

100 Test o the populatio proportio p Did the populatio proportio of voters supportig the liberal party chage? Is the probability that a goalkeeper kills a pealty larger tha 50%? Is for a iteret site the percetage of visitors, who try to eter the computer system illegally, larger tha assumed? Each of these questios addresses a issue that might be described as a biomial test problem : Each questio relates to a populatio proportio p, havig a specific property the remaiig part of the populatio (proportio 1 p) does ot. p is a success probability : the probability that a arbitrary elemet of the populatio has the property. To determie the ukow proportio p, we will cout the umber X of successes (the umber elemets, that have the property) i a radom sample ad compare it to the sample size. The radom sample suggests that the evets (of the elemets havig the property) are (idepedet) Beroulli trials. But, if the draws from the populatio are without replacemet, the distributio of X is hypergeometric. Oly if the sample size is relatively small compared to the populatio size, we ca use the biomial distributio for X, as a approximatio. For the problems i this sectio we will assume the biomial distributio for the umber X: either the draws are idepedet (with replacemet) or approximately idepedet (large populatios). X is biomially distributed with umber of trials ad success probability p: X ~ B(, p) For large ad p ot close to 0 or 1, we will apply the ormal approximatio (CLT): X ~ N(p, p(1 p)) The sample proportio p = X is a ubiased estimator of p ad for large we have: p = X p(1 p) ~ N (p, ) If we wat to test whether the real, but ukow populatio proportio (p) has a larger value tha a specific (real) value p 0, we wat to test H 0 : p = p 0 agaist H 1 : p > p 0. If H 1 is true, the sample proportio p is expected to be large (> p 0 ), implyig that the observed value of X should be sufficietly large to reject H 0. I other words: this is a right-sided test, that has the followig shape: Reject H 0 if X c. Sice the distributio uder H 0 is kow, we ca use the B(, p 0 )- distributio of X to determie the critical value. For large we ca coduct this biomial test, usig the approximate N(p 0, p 0 (1 p 0 ))-distributio. Note that we choose X to be the test statistic, ot p or the z-score of X or p. Though these are all suitable alteratives, we will prefer to use X for computatioal reasos: for small we ca use the exact biomial probability fuctio or tables ad for large we ca use ormal approximatio of biomial probabilities with cotiuity correctio.

101 4-1 Example (Cotiuatio of example 4.1.3) We cosider CCC s statemet that cosumers i majority prefer Coca Cola over Pepsi. Suppose 550 drikers prefer Coca Cola i the radom sample of 1000 coke drikers. Sice we do ot wat to falsely ackowledge Coca Cola s claim we choose a (maximum) level of sigificace α 0 = 1%. So, the probability of a type I error is at most 1%. The eight steps of the test are as follows. 1. Model: X = umber of cola drikers the sample who prefer Coca Cola : X is B(1000, p)-distributed, with ukow p = The proportio with preferece for Coca Cola i the populatio.. We test H 0 : p = 1 agaist H 1: p > 1 with α 0 = 1%. 3. Test statistic: X 4. Distributio if H 0 is true: X ~B (1000, 1 ), so approximately N(500, 50). (σ 16) 5. Observed value: x = Reject H 0 if the p-value α 0 = 1%. Computatio of the upper-tailed p-value (with cotiuity correctio): P(X 550 H 0 ) c.c. = P(X H 0 ) CLT P (Z ) = 1 Φ(3.13) = 0.09% 7. The p-value 0.09% < 1%, so reject H The sample showed at a sigificace level of 1% that coke drikers prefer i majority Coca Cola over Pepsi. The p-value shows that the proof of the statemet is quite strog ; eve if α is as small as 0.09%, H 0 would (just) have bee rejected. We caot easily blame coicidece for this outcome. Of course, we could have applied the approach of the right-sided Rejectio Regio X c, meaig that X has to attai a (iteger) value from {c, c + 1,, 999, 1000}. α 0 = 0.01 is ow a threshold value, sice we search the smallest iteger c such that: P(X c H 0 ) 0.01, so 50 P(X c H 0 ) c.c. = P(X c 0.5 H 0 ) CLT P (Z From the stadard ormal table we fid: (c 0.5) 500 (c 0.5) 500 ) c So c = 538. The Rejectio Regio is {538, 539,, 999, 1000}: reject H 0 if X 538. The observed x = 550 brigs us to the same coclusio as before: reject H 0. Sice for the biomial test the parameter is usually kow (give), the distributio oly depeds o the value of p: if H 0 : p = p 0 is true, we kow that X has a B(, p 0 )-distributio ad for a specific value p 1 uder the alterative hypothesis H 1 X has a B(, p 1 )-distributio ad for sufficietly large ( 5, p 1 > 5 ad (1 p 1 ) > 5): X CLT ~ N(p 1, p 1 (1 p 1 )). So we ca compute (approximate) the probability of a type II error ad the power for this p 1.

102 4- Example 4.4. (cotiuatio of example 4.4.1) If i the populatio a majority of 55% coke drikers prefers Coca Cola over Pepsi, what is the probability that the coclusio of the test cofirms that a majority prefers Coca Cola? This is the power of the test, the probability that X is i the Rejectio Regio, if p = 0.55: P(X 538 p = 0. 55), which ca be computed with the ormal approximatio of the B(1000, 0.55)-distributio, so X CLT ~ N(550, 47.5): β(0.55) c.c. = P(X p = 0.55) CLT P (Z ) 47.5 P(Z 0.79) = Φ(0.79) = 78.5% Cosequetly, the probability of a type II error for p = 0.55 equals 1 β(0.55) =1.48%. I the example above ad i the first sectio we oticed that a large sample is preferable: a large icreases the power of the test. It is possible to determie the sample size for give level of sigificace α ad desirable power β for a give value of the parameter i H 1. E.g. i the example of this sectio we could require a power β(0.55) 0.95 at the same sigificace level α 0 = 1%. The approach is as follows: first determie the Rejectio Regio (the critical value c) as a fuctio of ad the determie such that the coditio β(0.55) 0.95 is fulfilled. Though a large sample is preferable, i practice large samples are ot available or too costly. I practice oe ecouters small samples frequetly if user tests are coducted for ewly desiged cosumer products. Also oe could thik of the success rate of the lauch of a space shuttle. But sometimes a small sample is eough to provide sufficiet evidece for a hypothesis. Sice we caot exclude small samples, the biomial test for small is discussed i the followig example. Example Durig the outbreak of the Ebola-virus i West-Africa i 014, it tured out that uder good medical coditios the survival probability is oly 40%. Small quatities of a ew experimetal medicie were available, ad the lack of other medicies made the authorities decide to test the medicie o 0 voluteerig Ebola-patiets. At the ed of the experimet 9 patiets died, but 11 survived: they foud atibodies i their blood samples. Does the experimet prove, at a sigificace level of α 0 = 5%, that the medicie icreases the survival rate? We will coduct the full biomial test i 8 steps for small : 1. X = umber of survivors i the radom sample of 0 treated Ebola-patiets. X ~ B(0, p), where p = the survival probability if the medicie is used.. Test H 0 : p = 0.40 agaist H 1 : p > 0.40 with α 0 = 5%. 3. Test statistic is X 4. Distributio of X uder H 0 : B(0, 0.40)

103 Observed value x = Right-sided test: reject H 0, if X c. P(X c p = 0.40) 0.05, so P(X c 1 p = 0.40) 0.95 From the B(0, 0.40)-table we fid: c 1 = 1, so c = X = 11 < c = 13, so we caot reject H At a 5% sigificace level the small sample did ot covicigly prove that the medicie icreases the survival rate. The probability of a type I error of this test ca be computed as follows: α = P(X 13 p = 0.40) = 1 P(X 1 p = 0.40) = =.1%, which is much smaller tha the threshold α 0 = 5%. I this case 11 out of 0 (55%) i the sample is ot sufficietly sigificat. But, 13 or more out of 0, so at least 65% survival i the sample, would be sigificat. For larger samples this statistical sigificace would be attaied for lower survival rates i the sample. The power ad the probability of a type II error ca be computed i a similar way. Cosider e.g. a real success rate of 60%. If the umber of survivors is X ~ B(0, 0.60), the the umber of deaths Y = 0 X ~ B(0, 0.40). So the power of the test is the probability of rejectig: β(0.6) = P(X 13 p = 0.60) = P(Y 7 p = 0.40) = 41.6% The probability of a type II error is large: P(X 13 p = 0.60) = = 58.4%. If oe plas a experimet, like the experimet of Example (Coca Cola), oe ofte requires that the ivestigatio is accurate eough. Give the choice of the sigificace level α the probability of the type error is uder cotrol, but the probability of the type II error might be quite large for values of the parameter earby the ull hypothesis. We ca oly assure that the probability of the type II error is small for a specific value of the parameter belogig to the alterative hypothesis if the sample size is large eough. We ca approximate the miimal sample size that satisfies a prescribed level of sigificace α ad a desired power (is 1 probability of a type II error) for a prescribed value of the parameter belogig to the alterative hypothesis. We illustrate the calculatio of a miimal sample by cotiuig the example of Coca cola ad Pepsi. We will eglect the cotiuity correctio i the ormal approximatio of the biomial distributio, i this so-called power calculatio.

104 4-4 Example (cotiuatio of examples 4.1.3, ad 4.4.) Let us agai cosider the experimet of example but ow we wat to calculate the miimal sample size such that (1) The level of sigificace α is 1%, () The power of the test is at least 99% if the true proportio of prefereces of Coca cola is p = 0.55 (The probability of rejectig the ull hypothesis is 99% i case of a 55% preferece for Coca cola i the populatio.) Shortly: β(0. 55) So we do t assume = We wat to calculate a miimal value for that meets the two requiremets. Agai we reject the ull hypothesis if X k ad agai we use the ormal approximatio of the biomial distributio. Uder the ull hypothesis, H 0 : p = 1, the distributio of X is approximated by the ormal distributio with expectatio μ = p = 0.5 ad stadard deviatio σ = p(1 p) = 0.5. So Z = (X 0.5) 0.5 ~ N(0,1) uder the ull hypothesis. Neglectig the cotiuity correctio we the reject the ull hypothesis if Z.33 or (equivaletly) X c with c = , where we used the stadard ormal table ( P(Z.33) = The power for p = 0.55 is the probability P(X c), if X has the biomial distributio with success probability p = Note that i case of p = 0.55 the statemet Z = (X 0.5) 0.5 ~ N(0,1) does ot hold aymore: istead we ow have that the biomial distributio of X is approximated by a ormal distributio with expectatio μ = p = 0.55 ad stadard deviatio σ = p(1 p) = So we get Z = (X 0.55) ~ N(0,1) i case of p = Hece the power β(0.55) is as follows: β(0.55) = P(X c p = 0.55) = P (Z c ) = P (Z ), usig c = = P (Z ) = P (Z )

105 4-5 Usig the stadard ormal table i the reverse way we get: The solvig for we fid: = Hece 160.6, so choose = 161 as miimal value of the sample size, meetig the coditios. 4.5 The fudametal lemma of Neyma ad Pearso I statistics we wat to choose the test statistic ad the rejectio i such a way that the power is as high as possible. Let us study the biomial test as before. Agai let us focus o example (Coca Cola ad Pepsi). Example We are testig H 0 : p = 1 agaist H 1: p > 1 ad we reject the ull hypothesis if X c, where X deotes the umber of prefereces for Coca Cola i the sample of size = If we choose for α = 1% the c = 538 follows (see example 4.4.1). For this test with α = 1% the power for p = 0.55 is give by β(0.55) = 78.5%, see example We may raise the questio whether the power 78.5% is the best value we ca get. Or does there exists a better test: a test with the same level of sigificace α = 1% with a higher power? If we wat to compare the power of a test (beig the combiatio of a test statistic ad the correspodig reject regio) with the power of some other test, the the sigificace levels should be exactly the same. For the biomial test there is a practical problem: the level of sigificace is ot exactly equal to 1%. The probability P(X 538) uder the ull hypothesis,

106 4-6 where X has the biomial distributio with = 1000 ad success probability p = 1, is as follows: P(X 538 H 0 ) c.c. = P(X > H 0 ) CLT = 1 P(Z.37) = = , P(Z > ( ) 50) P(Z >.37) with Z = (X p) p(1 p) = (X 500) 50 havig the stadard ormal distributio. The actual level of sigificace is thus ot α = 0.01 = 1% but = 0.89%. By addig 537 to the rejectio regio we would exceed the threshold α = 1%, sice P(X = 537 H 0 ) = ( ) (1 ) > (we used excel to compute the probability). The sigificace level 1% is exactly obtaied if we will reject for all (iteger) values of X greater tha 537 ad with probability = 11 if X = 537 is observed Formalizig this radomizatio of the rejectio regio we ca defie the coditioal probability of rejectio give the observed value x of X: φ(x) = P("reject H 0 " X = x). I this example φ(x) = 1 if x > 537 (or: X 538) φ(x) = if x = 537 φ(x) = 0 if x < 537 The α = P("reject H 0 ") = P("reject H 0 " X > 537) P(X > 537) + P("reject H 0 " X = 537) P(X = 537) + P("reject H 0 " X < 537) P(X < 537) = = If we reject H 0 accordig to this procedure the the level of sigificace is exactly 1%. Now that we itroduced radomizatio to esure that the tests we are comparig have the same level of sigificace, we ca itroduce the cocept of most powerful tests. We will adopt the formal otatio of the hypotheses, itroduced at the ed of the itroductio i sectio 4.1: if θ is the ukow populatio parameter, the H 0 : θ = θ 0 deotes a sigle ull hypothesis ad H 0 : θ θ 0 is a composite ull hypothesis. Likewise H 1 ca either be sigle (θ = θ 1 ) or composite (e.g. θ θ 1 ). H 0 : θ Θ 0 ad H 1 : θ Θ 1 provide a geeric otatio of

107 4-7 hypotheses, where the total parameter space Θ is a partitio, cosistig of Θ 0 ad Θ 1. Note that Θ is ofte restricted: p [0, 1], σ > 0, etc. The followig defiitio states whe a test (a statistic T(X 1,, X ) ad its rejectio regio) ca be cosidered the best for testig H 0 : θ = θ 0 agaist H 1 : θ = θ 1, based o a radom sample X 1,, X, so better tha ay other test with test statistic T 1 (X 1,, X ) ad its rejectio regio. Defiitio 4.5. For fixed sigificace level α a test o H 0 : θ = θ 0 agaist H 1 : θ = θ 1 is most powerful (MP) if the power β(θ 1 ) of this test is greater tha or equal to the power β 1 (θ 1 ) of ay other test. For sigle ull ad alterative hypotheses the best test, that is the most powerful test for a give level of sigificace α, is delivered by the followig lemma. Lemma The fudametal lemma of Neyma ad Pearso (discrete versio). Cosider a radom sample X 1, X,, X : the radom variables X 1, X,, X are idepedet ad all are distributed accordig to a probability fuctio P(X = x θ) depedig o some parameter θ. We wat to test H 0 : θ = θ 0 agaist H 1 : θ = θ 1, for two give (ad distict) values θ 0 ad θ 1. Defie the ratio r(x 1, x,, x ) = P(X 1 = x 1 θ = θ 0 ) P(X = x θ = θ 0 ) P(X = x θ = θ 0 ) P(X 1 = x 1 θ = θ 1 ) P(X = x θ = θ 1 ) P(X = x θ = θ 1 ) The the test, that rejects the ull hypothesis for small values r(x 1, x,, x ) of test statistic r(x 1, X,, X ), with level of sigificace α (exactly), is most powerful (MP). I order to apply the fudametal lemma of Neyma ad Pearso to the Coca Cola example (4.5.1), we will simplify our testig problem i this sectio to sigle hypotheses: test H 0 : p = 1 agaist H 1: p = 0.55 Furthermore we cosider the origial data of the sample: o the idividual level we observe radom variables X 1, X,, X 1000 with X i = 1 if participat i prefers Cola Cola, otherwise X i = 0 (i = 1,,.,100). These variables are assumed to be idepedet. We shall evaluate the test statistic r(x 1, X,, X ) for this problem. Later o we shall prove the fudametal lemma of Neyma ad Pearso i geeral.

108 Example Note: = 1000, θ = p, θ 0 = 1 ad θ 1 = Sice (X i = 1 θ = θ 0 ) = P(X i = 0 θ = θ 0 ) = 1, P(X i = 1 θ = θ 1 ) = 0.55 = 1 P(X i = 0 θ = θ 1 ) ad X = X 1 + X + + X 1000 deotes the umber of successes: the umber prefereces for Coca Cola. So we get r(x 1, x,, x ) = (0.5) i x i(0.5) 1000 i x i (0.55) i x i(0.45) 1000 i x i = ( ) ( 0.45 i xi 0.55 ) r(x 1, X,, X ) = ( ) 1000 ( ) X (writte as a radom variable) Note that the test statistic r(x 1, X,, X ) = ( )1000 ( )X is a decreasig fuctio of X: rejectig the ull hypothesis for small values of r(x 1, X,, X ) is equivalet to rejectig the ull hypothesis for large values of X. Coclusio: the biomial test applied with the fuctio φ to achieve a exact level of sigificace α = 1%, is the most powerful test: the power for p = 0.55 is the highest possible. For α = 1% we foud that the Rejectio Regio is give by X 537 (with the proper radomizatio for the critical value 537) which is equivalet to r(x 1, X,, X ) ( ) ( ) , the RR of r(x 1, X,, X ) with the same radomizatio of the critical value of r. We ca geeralize the result i example If we test H 0 : p = 1 agaist H 1: p = p 1 for some fixed value p 1 > 1, the the test statistic r(x 1, X,, X ) is give by r(x 1, X,, X ) = ( ) ( 1 p X 1 ), 1 p 1 p 1 which is a decreasig fuctio of X, wheever p 1 > 1. So, agai, the biomial test that rejects the ull hypothesis for large values of X, is most powerful for each chose level of sigificace α. Let us retur to the origial problem i example of testig H 0 : p = 1 agaist H 1: p > 1, with α 0 = 1%, where we reject the ull hypothesis if X 538. We showed i example that α = Accordig to the lemma of Neyma ad Pearso the power P(X 538 p = p 1 ) for each

109 4-9 p 1 > 1 is ot smaller tha the power (for p 1) of ay other test with level of sigificace α = , i other words: the biomial test is the uiformly most powerful (UMP) test for our testig problem. If the biomial test is modified to attai a exact level of sigificace α = 1%, by meas of the fuctio φ, the the biomial test is the UMP test of level of sigificace 1%. Defiitio For fixed sigificace level α a test o H 0 : θ Θ 0 agaist H 1 : θ Θ 1 is uiformly most powerful (UMP) if the power β(θ 1 ) of this test is greater tha or equal to the power β 1 (θ 1 ) of ay other test, for all θ 1 Θ 1. Now we will give the proof of the lemma for a radom sample, draw from a discrete distributio. Note that, formally, the rejectio regio is a subset of the set of all possible realizatios of the sample: outcomes {(x 1, x,, x ) x i S X for i = 1,,, }. To simplify the otatio i the proof we will deote x = (x 1, x,, x ) as a outcome i the sample space, so that the ratio r(x 1, x,, x ) = r(x) ad the coditioal probability of rejectio is φ(x 1, x,, x ) = φ(x). Proof of the lemma of Neyma ad Pearso: For achievig a exact level of sigificace α a fuctio φ has to be defied which describes the probability of rejectig the ull hypothesis give the outcomes x = (x 1, x,, x ) of X = (X 1, X,, X ). For the described test it is as follows: φ(x) = 1 if r(x) < c φ(x) = d if r(x) = c φ(x) = 0 if r(x) > c, where the critical value c ad probability d are chose such that the level of sigificace α holds, which meas: φ(x) P(X = x θ = θ 0 ) = α. x Where P(X = x θ = θ 0 ) = P(X i = x i θ = θ 0 ) i is the probability of the realizatio x = (x 1, x,, x ) of the sample. Now cosider ay other test with level of sigificace α. Such a test ca be described by a

110 4-30 fuctio φ (x 1, x,, x ) which gives the probability of rejectig the ull hypothesis, give the outcomes of X 1, X,, X for this secod test. A exact level of sigificace α meas φ (x)p(x = x θ = θ 0 ) = α x The clue of the proof is to show that the followig iequality holds: (φ(x) φ (x)) (P(X i = x i θ = θ 1 ) 1 c P(X i = x i θ = θ 0 )) 0 x (A) First cosider the subset V of values of sample outcomes x = (x 1, x,, x ) for which holds. Withi the subset V we have: φ(x) φ (x) > 0 φ(x) > 0 ad thus r(x) = P(X=x θ=θ 0) P(X=x θ=θ 1 ) c P(X = x θ = θ 1) 1 c P(X = x θ = θ 0), so: (φ(x) φ (x)) (P(X = x θ = θ 1 ) 1 c P(X = x θ = θ 0) is oegative. Similarly, if we ca cosider the subset W of values of x = (x 1, x,, x ) for which φ(x) φ (x) < 0 holds. Withi the subset W we have φ(x) < 1 ad thus r(x) = P(X=x θ=θ 0) c. The withi W agai P(X=x θ=θ 1 ) (φ(x) φ (x)) (P(X = x θ = θ 1 ) 1 c P(X = x θ = θ 0) is oegative. For the remaiig outcomes, ot i V or W, we have φ(x) = φ (x), completig the proof of (A): (φ(x) φ (x)) (P(X i = x i θ = θ 1 ) 1 c P(X i = x i θ = θ 0 )) 0 x Rewritig the summatio i the left had side we fid: φ(x) P(X i = x i θ = θ 1 ) φ (x) P(X i = x i θ = θ 1 ) x x + 1 c φ(x) P(X i = x i θ = θ 0 ) 1 c φ(x) P(X i = x i θ = θ 0 ) 0 Or: β(θ 1 ) β (θ 1 ) + 1 α 1 α 0 c c Or: β(θ 1 ) β (θ 1 ) x x

111 4-31 So, ideed the power β (θ 1 ) is ot larger tha β(θ 1 ), the power of the test described by the fuctio φ. Remarks: Origially the ratio r(x 1, x,, x ) was defied slightly differet i the lemma of Neyma ad Pearso. We exchaged umerator ad deomiator for a better correspodece with the likelihood ratio test, which is the topic of the ext sectio. There exists a cotiuous versio of the fudametal lemma of Neyma ad Pearso. This versio is as follows: Lemma The fudametal lemma of Neyma ad Pearso (cotiuous versio) Cosider a radom sample X 1, X,, X : the radom variables X 1, X,, X are idepedet ad all are distributed accordig to a desity fuctio f(x θ) depedig o some parameter θ. We wat to test H 0 : θ = θ 0 agaist H 1 : θ = θ 1, for two give (ad distict) values θ 0 ad θ 1. Defie the ratio r(x 1, x,, x ) = f(x 1 θ = θ 0 ) f(x θ = θ 0 ) f(x θ = θ 0 ) f(x 1 θ = θ 1 ) f(x θ = θ 1 ) f(x θ = θ 1 ). Cosider the test with test statistic r(x 1, X,, X ) ad level of sigificace α (exactly), which rejects the ull hypothesis for small values (x 1, x,, x ). This test is most powerful (MP), it meas that ay other test of level of sigificace α has a power that is ot larger tha the power of the test described. 4.6 Likelihood ratio tests A likelihood ratio test is a type of test ispired by the lemma of Neyma ad Pearso. As a geeral rule likelihood ratio tests have a good performace with respect to the power. Provig optimality of the likelihood ratio tests is, however, beyod the scope of this course. May stadard tests withi statistics are equivalet to likelihood ratio tests. We shall show that the usual test for testig H 0 : μ = 0 agaist H 1 : μ 0 is equivalet to the likelihood test for this problem, for kow ad for ukow σ. Furthermore we study the likelihood ratio test for testig H 0 : σ = 10 agaist H 1 :σ 10 i case of a ormal radom sample.

112 4-3 Defiitio likelihood ratio test (cotiuous versio) Cosider a radom sample X 1, X,, X : the radom variables X 1, X,, X are idepedet ad all are distributed accordig to a desity fuctio f(x θ) depedig o some parameter θ Θ, where Θ = Θ 0 Θ 1 cotais all possible values of the parameter θ. If we test H 0 : θ Θ 0 agaist H 1 : θ Θ 1, the the likelihood ratio test of level of sigificace α is the test which rejects H 0 if Λ(X 1, X,, X ) c, where the test statistic Λ(X 1, X,, X ) is defied by sup f(x 1 θ) f(x θ) f(x θ) θ Θ Λ(x 1, x,, x ) = 0 f(x 1 θ) f(x θ) f(x θ) = sup θ Θ ad where the critical value c is such that sup θ Θ 0 sup θ Θ f(x i θ) i f(x i θ) i P( Λ(X 1, X,, X ) c θ) α for all values θ Θ 0 Remarks: Note that compared to the fudametal lemma of Neyma ad Pearso the values θ 0 ad θ 1 are replaced by sets, Θ 0 ad Θ. Ad therefore, i both umerator ad deomiator, the suprema of the likelihood fuctio L(θ) = f(x i θ) i have to be determied. I may cases the supremum turs out to be a maximum (for deomiator ad umerator), give by L(θ ), if the mle θ lies i Θ 0 (umerator) ad i Θ (deomiator). Sice i the umerator the search for the supremum is restricted to Θ 0, the deomiator is usually larger tha the umerator, so Λ 1: Λ = 1, if the suprema i umerator ad deomiator are the same: mle θ i Θ 0. Λ < 1, if the mle θ lies i Θ 1. Discrete versio: If the observatios X i are distributed accordig to a probability fuctio P(X = x θ) the Λ(x 1, x,, x ) i defiitio has to be defied by: sup P(X 1 = x 1 θ) P(X = x θ) P(X = x θ) θ Θ 0 Λ(x 1, x,, x ) = = sup θ Θ sup θ Θ 0 sup θ Θ P(X 1 = x 1 θ) P(X = x θ) P(X = x θ) P(X i = x i θ) i P(X i = x i θ) i

113 4-33 Testig H 0 : μ = 0 agaist H 1 : μ 0 (kow σ ) Suppose we observe idepedet radom variables X 1, X,, X which have all a N(μ, σ )- distributio. We test H 0 : μ = 0 agaist H 1 : μ 0 for a kow value of σ. Accordig to sectio 4.1 we should take X = (X 1 + X + + X ) as test statistic ad we have to reject the ull hypothesis if X c or X c, where c depeds o the level of sigificace α. We shall show that the likelihood ratio test is equivalet to this test. We apply the likelihood ratio test. As σ is assumed to be kow we have parameter θ = μ, Θ 0 = {0} ad Θ = R is the set of all real umbers. Notig that the desity of N(μ, σ ) is equal to f(x) = 1 exp ( 1 (x πσ μ) σ ), we hece get: Λ(x 1, x,, x ) = 1 exp ( 1 i x πσ i σ ) sup μ 1 i exp ( 1 (x πσ i μ) σ ) = exp ( 1 x i i σ ) sup exp ( 1 (x i i μ) σ ). We shall use the followig equality which ca be verified easily. μ (x i μ) = (x i x ) + (x μ), i Usig this equality we see that (x i μ) i is miimized for μ = x, ad hece exp ( 1 (x i i μ) σ ) is maximized for μ = x. Moreover we get (take μ = 0 i the equality): i x i = i(x i x ) + (x ) We thus ca write: i Λ(x 1, x,, x ) = exp ( 1 x i σ ) exp ( 1 (x i x ) σ ) i i = exp ({ 1 x i + 1 (x i x ) } σ ) = exp ( 1 (x ) σ ) i i

114 4-34 The test statistic Λ(X 1, X,, X ) is thus give by Λ(X 1, X,, X ) = exp ( 1 (X ) σ ) ad we reject H 0 if Λ(X 1, X,, X ) c, where the critical value c has to be chose such that level of sigificace α is attaied. Note that rejectig the ull hypothesis if exp ( 1 (X ) σ ) c is equivalet to rejectig the ull hypothesis if X c or X c, wheever c = exp ( 1 c σ ). So the likelihood ratio test is ideed equivalet to the two-sided test based o test statistic X. Testig H 0 : μ = 0 agaist H 1 : μ 0 (ukow σ ) Let us modify our testig problem. We still test H 0 : μ = 0 agaist H 1 : μ 0 but assume ow that the parameter σ is ukow as well. Note that formally the parameter θ has dimesio : θ = (μ, σ ) ad we are testig H 0 : θ Θ 0 = {(µ, σ ) µ = 10, σ > 0} agaist H 1 : θ Θ 1 = {(µ, σ ) µ 10, σ > 0} Accordig to sectio 4. the usual test statistic is T = X S ad we have to reject the ull hypothesis if T c or T c, where c depeds o the level of sigificace α. Let us agai apply the likelihood ratio test. We ow get: Λ(x 1, x,, x ) = sup 1 σ πσ exp ( 1 x i σ ) sup 1 μ, σ πσ exp ( 1 (x i μ) σ ) = sup σ ( 1 i πσ ) exp ( 1 x i σ ) sup i μ, σ ( i 1 πσ ) exp ( 1 (x i μ) σ ) i Let us first evaluate the deomiator of Λ(x 1, x,, x ): sup ( μ, σ 1 πσ ) exp ( 1 (x i μ) σ ) I sectio.3 we leared that the maximum likelihood estimates for a ormal sample are: i

115 4-35 μ = x ad σ = (x i x ). i These estimates maximize the likelihood ( 1 πσ ) exp ( 1 (x i i μ) σ ), hece sup ( μ, σ 1 πσ ) exp ( 1 (x i μ) σ 1 ) = ( πσ ) i 1 = ( πσ ) exp ( 1 (x i μ ) σ ) i exp ( 1 ) Let us ow evaluate the umerator of Λ(x 1, x,, x ): sup σ ( 1 πσ ) exp ( 1 x i σ ) Oe ca show, i a way similar to the derivatio of the estimates μ ad σ, that ow the maximum likelihood estimate of σ is give by σ 0 = i x i, so We thus arrive at: sup σ ( 1 πσ ) exp ( 1 x i σ 1 ) = ( ) πσ 0 i i i exp ( 1 ). Λ(x 1, x,, x ) = ( πσ πσ 0 ) = (σ σ 0 ) = ( x i (x i x ) ) i / We coclude that the likelihood ratio test rejects H 0 for large values x i (x i x ) = 1 + (x ) (x i x ) = (x ) s i i with s = i (x i x ) ( 1). This likelihood ratio test thus rejects the ull hypothesis if (X ) S c, where critical value depeds o α. Note this is equivalet to rejectig the ull hypothesis if i

116 4-36 T c or T c, wheever c = c. Agai the usual test is equivalet to the likelihood ratio test. Testig H 0 : σ = 10 agaist H 1 : σ 10 Cosider agai observatios X 1, X,, X that are idepedet ad all distributed accordig to some N(μ, σ )-distributio. Motivated by example we wat test H 0 : σ = 10 agaist H 1 : σ 10. Accordig to sectio 4.3 the usual test statistic is S ad we have to reject the ull hypothesis if S c 1 or S c, where the critical value c depeds o the level of sigificace α. Let us agai apply the likelihood ratio test. Now we have to evaluate Λ = Λ(x 1, x,, x ) Λ = sup μ, σ =10 1 = ( π10 ) σ / = ( 10 ) 1 πσ exp ( 1 (x i μ) σ ) sup 1 μ, σ πσ exp ( 1 (x i μ) σ ) i exp ( 1 (x i x ) 1 10) ( πσ ) i exp ( 1 (x i x ) 10) exp ( 1 ) i i exp ( 1 (x i x ) σ ) σ / = ( 10 ) exp ( 1 σ 10) exp ( 1 ), where σ = (x i x ) i i = exp ( 1 σ l(σ 10)) exp ( 1 ) = exp ( 1 g(σ 10)) exp ( 1 ), where g(z) = z l (z) Note that for the likelihood ratio test we reject H 0 for small umbers Λ(x 1, x,, x ). That meas that here we have to reject H 0 for large values g(σ 10). If we take the costat a = 1, we ca graph Λ = e a(z lz+1) as a fuctio of z = σ (we chose a = 1 = 1): 10

117 4-37 g`(z) = 1 1, so g(z) is decreasig for z < 1 (Λ icreasig) ad is icreasig for z > 1 (Λ z decreasig) we have to reject H 0 for small values σ 10 ad for large values σ 10, as the graph illustrates: the rejectio regio Λ λ, should be chose such that P(Λ λ H 0 ) = α. The correspodig values a ad b are such that Λ(a) = Λ(b) = λ ad P(σ 10 a or σ 10 b H 0 ) = α. I this case (ad i geeral) this receipt for rejectig the ull hypothesis does ot coicide with rejectig the ull hypothesis if S c 1 or S c, as we discussed i sectio 4.3. Here the usual test differs from the likelihood ratio test, slightly.

118 Exercises 1. I a advertisemet campaig a compay claims that istallatio triple glass widows i houses reduces the costs of heatig by 30%. I a explaatio the compay states that 30% is the mea reductio ad the stadard deviatio of the reductios is σ = 6%. A ecoomist cosiders istallig the triple glass, but doubts the height of the average reductio percetage. Before istallig triple glass i his house the ecoomist iquires at houses that already have the triple glass widows istalled. He foud the followig reductios of the costs of heatig (i %): 18, 0, 37, 4, 33, 7, 1, 30 a. Does the computed mea prove that the ecoomist had reaso to doubt the claim of 30% reductio? Assume that σ = 6 ad coduct the proper test i 8 steps with α = 1%: use X as test statistic ad check that we have a lower-tailed test: rejectio regio: X c. b. Compute the p-value for the observed mea ad state for which values of α the ull hypothesis will be rejected. c. Sketch the distributio of X if µ = 30 ad if µ = 5 i oe graph: shade the sigificace level α = 1% ad idicate the Rejectio Regio X c (see b). d. Compute the power of the test if µ = 5 (first shade this probability i the graph).. Every user of statistical methods has to be aware of the differece betwee statistical sigificace ad relevat differeces. A sufficietly large sample could prove very small (practically irrelevat) effects to be statistically sigificat. To illustrate this pheomeo we will discuss scores o the Scholastic Aptitude Test for Mathematics (SATM) i the US. Without special traiig the test scores are ormally distributed with expected score µ = 475 ad stadard deviatio σ = 100. Assume a special traiig is desiged to icrease the scores ad thereby the expected score. But a icrease of the mea SATM-score from 475 to 478 is irrelevat, sice for access to good uiversities a much higher score is required. But this irrelevat icrease of µ ca be statistically sigificat! As a illustratio of this pheomea we will compute the p-value of the test H 0 : μ = 475 agaist H 1 : μ > 475. i each of the followig cases: a. I a year 100 arbitrary studets are subjected to the special traiig: their mea SATMscore is x = 478. (First give the probability model, the test statistic ad its distributio uder H 0 ad the observed value: compute the p-value to decide for the usual values of α, 1%, 5% ad 10%) b. Next year 1000 studets erolled for the special traiig: their mea SATM-score is x = 478. c. After a advertisemet campaig the ext year the umber of studets, takig the special traiig, icreased to : their mea SATM-score is x = 478. d. Compute for the situatio described i c. the 99%-cofidece iterval for the expected SATM-score µ after special traiig.

119 I a survey o the effectiveess of a helpdesk the service times of customers (amog other aspects) were assessed. Below the results of a radom sample of the service time (i miutes) of 4 customers are show, ordered from small to large: These observatios are summarized: the sample mea is.57 ad the sample variace is.0. I a much larger survey some years ago it was kow that before the mea (expected) service time was 1.98 miutes. a. Coduct a test to see whether the expected service time chaged. Apply the testig procedure with α = 5% ad decide first by determiig the Rejectio Regio ad, after completig the test, check the result by computig the p-value. b. Based o the observatios, ca we coclude that the stadard deviatio of the service times is larger tha 1 miute? Coduct a test o the variace with α = 10%. 4. A marketig cosultat is desigig a advertisemet campaig for girl clothes i the age of 10-1 year. A importat issue is to kow who, i the ed, decides about the purchase: the mother or the daughter. The cosultat referred to a survey of 400 of these purchases, where i 43 times the decisio was take by the mother. Ca we state, at a 5% level of sigificace, that i the majority of the purchases the mother decides? a. Coduct a test to aswer this questio. Use the 8 steps procedure o the formula sheet. (Coduct the test usig the Rejectio Regio ad, after that, check the stregth of the evidece by computig the p-value). b. I a. you foud the rejectio regio to be X 17. Determie the probability of a type II error if i reality 60% of all purchases are decided by the mother. Give the power of the test as well, at this value of p. 5. For a particular mass product a maximum proportio of 10% substadard (bad) products is used as a rule for the quality of the productio process. If more tha 10% of the products is substadard the productio must be stopped ad revised, which is a costly operatio. The quality assessed by drawig a radom sample of 0 products from the productio of a hour (more tha products). At the quality cotrol departmet the quality of each of the 0 products is assessed: the umber of substadard products is used to decide whether the productio has to be revised, at a 5% level of sigificace. a. Determie the Rejectio Regio of this test for the give α 0 = 5%. Apply the first 6 steps of the testig procedure. b. Compute the (maximum) value of the probability of the type I error of the test i a. c. Compute the power of the test i a. for p = 0., p = 0.3 ad p = 0.4.

120 4-40 I b. ad c. we computed the probabilities P(X c p = a), for p = 0.1, p = 0., p = 0.3 ad p = 0.4. The fuctio β(p) = P(X c p) is the power of the test. d. Compute β(0.05) i additio ad explai what the meaig of this probability is. e. Sketch the graph of the power β(p) for p [0,1]. Use the graph to idicate the probabilities of the type I ad type II errors ad the power, e.g. for p = 0.05 ad p = 0.3. f. Compute the p-value if the result of the sample is that 4 out of 0 products are substadard. What coclusio ca you draw from this probability? 6. The lifetime of car tire s is, accordig to the producer, o average km ad the stadard deviatio of the lifetimes is 1500 km. Users are advised to chage their tire s after 4000 km, as to avoid problems or usafe situatios. A user test of 0 arbitrarily chose car drivers (usig the specific tire s) revealed however that the mea lifetime i the sample was km ad the stadard deviatio 4000 km. a. Which (model) assumptios are ecessary to coduct the usual tests o the mea ad the variace of the lifetimes? b. Does the sample show covicigly (at level α 0 = 0.01) that the tire s live is less log tha the claim of the producer? c. Test at a 5% sigificace level whether the real variatio (variace) of the lifetimes deviates from the iformatio of the producer. 7. Klaas suspects that the die that his oppoet Eric is usig is ot fair: the outcome 6 seems to occur far more ofte tha expected. He tests the fairess of the die by rollig it util 6 is the result. X is the required umber of rolls. Use Neyma-Pearso`s lemma to derive the MP-test o the ull hypothesis H 0 : p = 1 agaist 6 the alterative H 1 : p = 1 with α 3 0 = 0.0, based o oe observatio x of X. 8. X 1, X,, X 10 is a radom sample, draw from the N(0, σ )-distributio. a. Determie (usig Neyma-Pierso ) the MP-test, if we wat to test H 0 : σ = 1 agaist H 1 : σ = with α = 5%. b. Repeat a. for H 1 : σ = 4 c. Determie, if possible, the UMP-test o H 0 : σ = 1 agaist H 1 : σ > 1 with α = 5%. 9. Let X 1,, X 10 deote a radom sample of size of a Poisso distributio with mea θ. So P(X i = x i ) = θx i x i! e θ (x i = 0, 1,, ) 10 Show that the test that rejects H 0 : θ = 0.1 i favour of H 1 : θ = 0.5, if i=1 X i 3 is most powerful. Determie, for this test, the sigificace level α ad the power at θ = 0.5.

121 X 1, X,., X are idepedet ad all N(μ, 1)-distributed with ukow μ. We wat to test H 0 : μ 0 agaist H 1 : μ > 0. a. Show that the test, that rejects H 0 : μ = 0 i favour of H 1 : μ > 0 if X c, is the likelihood ratio test. b. Determie the Rejectio Regio for the test i a. as fuctio of if α 0 = c. Show that the probability of a type I error is at most 0.05 (for μ = 0). d. Determie for the test i a. such that α 0 = 0.05 ad the power of the test at µ = 1 is at least 99%. e. Use the RR i b. for = 16 to compute the power β =16 (μ), for μ = , 0.5, 1 ad 1.5 ad sketch the power. f. Show that lim β (μ) = 1 for μ > 0 ad lim β (μ) = 0 for μ < 0. (The first limit asserts that the test i a. is cosistet). 11. The radom variable X has a desity with parameter θ, 0 θ 1, give by: c(1 θ x ), for x 1 f(x θ) = { 0, for x > 1 a. Determie c as a fuctio of θ ad sketch the graph for f if θ = 0 ad θ = 1. b. Determie the MP-test o H 0 : θ = 0 agaist H 1 : θ = 1, based o oe observatio x of X. (First use a. to ituitively determie the shape of the RR). c. Apply the test i b. for α 0 = 0.5 ad a observed value x = (Shifted expoetial distributio) The desity fuctio of the r.v. X cotais a ukow parameter θ 0: f(x θ) = { e (x θ), if x θ 0, elsewhere X 1, X,, X is a radom sample of X. a. Show that θ = mi(x 1,, X ) is the maximum likelihood estimator of θ. b. Check whether θ is a cosistet estimator of θ. c. Determie the likelihood ratio test for H 0 : θ = 0 agaist H 1 : θ > 0 at give α X 1,., X is a radom sample of X, for which the desity with ukow parameter λ > 0 is give as follows: f(x λ) = 1 λ e x λ, if x 0 ad f(x λ) = 0, elsewhere Derive the likelihood ratio test for H 0 : λ = λ 0 agaist H 1 : λ λ 0 at give level of sigificace α 0.

122 4-4 The Rejectio Regio of the test ca be approximated assumig that is sufficietly large ad if the value of λ 0 is give. Give the RR for = 5, λ 0 = 1 ad α 0 = 0.05.

123 5-1 Chapter 5 Two samples problems 5.1 The differece of two populatio proportios Statistical methods are ofte used i a comparative research: the statistics of two (or more) samples are compared. The goal of the research is expressed i the followig research questios: Is oe medicie more effective tha aother? Does smokig shorte your lifetime? Are me more itelliget tha wome? Is there a differece i lifetime betwee two trademarks of smartphoe batteries? Is there a sigificat differece i the performace of two software programs (e.g. a differece i mea respose time)? Does a ew educatioal approach of a course lead to better results tha before? Et cetera. Ofte it is ot simple to fid a correct ad effective experimetal set up i order to, statistically, aswer this kid of questios. The aim is to produce two (or more) sequeces of observatios, that give a clear view o the differece i which we are iterested, ad to avoid too much oise (effects of other variables). I chapters 3 ad 4 we leared that probability models express some aspects of the correct set up, such as the radom samplig ad the assumptio of a ormal or a biomial model. I this chapter we will exted our models to several two samples models. We will start off with two populatio proportios. Example I the world of medicie the complait is ofte heard that doctors advise their patiets to stop smokig, though may of these doctors smoke themselves. Let us assume that we wat to determie the differece of the proportios of smokers amog doctors (proportio p 1 ) ad amog patiets (proportio p ). Sice we have two separate populatios (doctors ad patiets), it seems reasoable to assume that, if two radom samples are take from the two populatios, the samples are idepedet. Probability model: the umber X of smokers amog 1 doctors ad the umber Y of smokers amog patiets are B( 1, p 1 )- ad B(, p )-distributed. X ad Y are idepedet. The probability model give i example ca be applied wheever we have two radom samples to determie the proportios i two populatios or two subpopulatios. As before, the biomial distributio applies if the samplig is with replacemet or without replacemet from large populatios (i that case we have approximate idepedece ). The estimator for the differece p 1 p is at had: X 1 Y, which is ofte deoted as p 1 p. This estimator is ubiased: E ( X 1 Y ) = E ( X 1 ) E ( Y ) = p 1 p,

124 5- Sice the sample proportio X 1 is a ubiased estimator for p 1 (see chapter ). Likewise E ( Y ) = p. The variace of X 1 Y ca be computed, usig the idepedece of X ad Y: var ( X Y ) id. = var ( X ) + var ( Y ) = p 1(1 p 1 ) + p (1 p ) Sice for large 1 ad large both X 1 ad Y are approximately ormally distributed, X 1 Y is ormally distributed as well: μ ad σ have bee computed above. So, approximately (CLT): Z = ( X 1 Y ) (p 1 p ) p 1(1 p 1 ) 1 + p (1 p ) ~ N(0, 1) Costructio of a cofidece iterval for the differece p 1 p of two populatio proportios (for large samples) A 95%-cofidece iterval for p 1 p is costructed similarly as for the oe sample biomial problem. Agai we will use p 1 = X 1 ad p = Y to estimate the ukow p 1 ad p i the stadard deviatio of p 1 p. This leads to the stadard error p 1(1 p 1) 1 p 1 p. + p (1 p ), the estimated stadard deviatio of Note 5.1. Sometimes the stadard error is briefly otated as σ p 1 p or SE(p 1 p ). I the N(0,1)-table we fid the value of c, such that P( c < Z < c) = 1 α For istace, if 1 α = 0.95, the c = 1.96 such that P(Z c) = 1 α = The probability statemet P c < (p 1 p ) (p 1 p ) p 1(1 p 1) ( + p (1 < = 1 α p ) 1 c) ca be rewritte to the followig formula of a iterval for p 1 p :

125 Property (approximate cofidece iterval for the differece of two populatio proportios) If X ~ B( 1, p 1 ) ad Y ~ B(, p ) are idepedet, the for large 1 ad : (1 α)100%-ci(p 1 p ) = (p 1 p c p 1(1 p 1) 1 where c is such that P(Z c) = 1 α p (1 p ), p 1 p + c p 1(1 p 1) + p (1 p ) ), 1 Rule of thumb for sufficietly large 1 ad is, as before: 5, p > 5 ad (1 p ) > 5, ow for both 1 ad (ad p 1 ad p ). Example (cotiuatio of example 5.1.1) The results of the two radom samples are available: 3 of the 1 = 4 doctors are smokers ad 4 of the = 67 patiets. The p 1 p = % is the estimated differece i the 4 67 proportios of smokers. The stadard error of this differece is p 1(1 p 1) 1 + p (1 p ) 3 = ( 9.7%) 67 So: 95% BI(p 1 p ) = ( , ) (0.000, 0.378) At a 95% cofidece level the differece of the proportios smokers amog doctors ad patiets lies betwee 0% ad 37.8%. These relatively small samples do ot result i a precise estimate of the differece i proportios. Test o the equality of two populatio proportios p 1 ad p (for large samples) If the ull hypothesis is H 0 : p 1 = p, we ca use the stadardized differece p 1 p, agai, ow to costruct a test. But 1. Sice H 0 : p 1 = p, the expectatio of p 1 p is p 1 p = 0 uder H 0.. If p 1 = p = p, p 1 ad p estimate the same ukow p. Usig both samples we ca add both the umbers of successes ad the sample sizes: p = X + Y 1 + = total umber of successes total umber of trials The stadard deviatio p 1(1 p 1 ) + p (1 p ) p 1 =p =p 1 = 1 p(1 p) ( + 1 ) ca be 1 estimated by p (1 p ) ( ).. These properties uder H 0 result i the followig test statistic to test H 0 : p 1 = p : p 1 p Z = p (1 p ) ( ~ N(0,1), approximately if H 0 : p 1 = p is true. 1 ) Of course we ca distiguish oe ad two-tailed tests, depedig o the research questio: If H 1 : p 1 > p, the test is upper-tailed.

126 5-4 If H 1 : p 1 < p, the test is lower-tailed. If H 1 : p 1 p, the test is two-tailed. Accordigly the rejectio regio or the p-value are right-, left- or two-sided, as was the case for the oe sample biomial test. Example The Uiversity of Twete cosidered i 014 a trasitio to fully Eglish spoke bachelor programmes. Are studets ad lecturers equally ethusiastic about this chage of policy? I a survey it tured out that 115 out of 3 studets were i favour of the trasitio ad 6 out of 108 lecturers. The relevat proportios are = 49.6% ad = 57.4%. We wat to test whether this iformatio shows that the populatio proportios are differet, if α = We will apply the testig procedure: 1. Model: X = the umber i favour amog 1 = 3 studete is B(3, p 1 )- ad Y = the umber i favour amog = 108 lecturers is B(108, p )-distributed. X ad Y are idepedet.. Test H 0 : p 1 = p versus H 1 : p 1 p with α = The test statistic is Z = p 1 p, where p = X+Y (see formula sheet) p (1 p )( ) Distributio of Z uder H 0 : approximately N(0, 1) 5. Observed value p = , so z = ( ) 6. The rejectio regio is two-sided: reject H 0 if Z c or if Z c. P(Z c H 0 ) = α = 0.05 if Φ(c) = 0.975, so c = z = 1.34 < c, so we fail to reject H At a 5% sigificace level there is isufficiet proof to state that the proportios of studets ad lecturers who are i favour of the trasitio to Eglish bachelor programmes are differet. Note that both the test ad the cofidece iterval of this sectio ca oly be applied if both samples are large.

127 The differece of two populatio meas I this sectio we will restrict ourselves to two idepedet radom samples, take from the ormally distributed variables X ad Y, i (two) populatios. The assumptio of idepedece of the samples is, i geeral, reasoable if the samples relate to two differet populatios or subpopulatios, such as Dutchme ad Belgias, higher ad lower educated people i a coutry, me ad wome. But the experimetal set up should be such that the idepedece of observed variables i the samples are idepedet: e.g. if we wat to compare the salaries of me ad wome i a coutry ad the setup is such that may couples of me ad wome occur i the samples, it is evidet that the salaries of a ma ad a woma of each couple are depedet: the idepedece of the samples is ot preserved. If the idepedece of the radom samples is preserved, we have: Probability model of two idepedet radom samples, draw from ormal distributios: X 1,, X 1 is a radom sample of X, which is N(μ 1, σ 1 )-distributed. Y 1,, Y is a radom sample of Y, which is N(μ, σ )-distributed. X 1,, X 1, Y 1,, Y are idepedet (the idepedece of the samples). Note that we have idepedece withi each ( radom ) sample ad betwee samples. For the sample meas ad sample variaces we will use the followig otatios: 1 X = 1 1 X i i=1 1, S X = (X i X) i=1, Y = 1 Y j j=1, S Y = 1 1 (Y j Y) (Some books use a idex-otatio istead: X 1, S 1, X ad S. I this course we will sometimes use this otatio with idices 1 ad for the sample variaces as well.) We are iterested i the differece of the populatio meas (expectatios) μ 1 μ : the estimator at had is, of course, X Y: E(X Y) = E(X) E(Y) = μ 1 μ Sice the X i s are idepedet ad all N(μ 1, σ 1 ), the sample mea is ormally distributed as well: X ~ N (μ 1, σ 1 ) Likewise: Y ~ N (μ, σ ) 1 So var(x Y) id. = var(x) + var(y) = σ 1 We coclude: Or: + σ. 1 X Y ~ N (μ 1 μ, σ 1 + σ ) 1 (X Y) (μ 1 μ ) σ 1 + σ 1 ~ N(0,1 ) j=1

5-6 This variable ca be used to costruct a cofidece iterval for the differece μ 1 μ ad a test o μ 1 μ, if the variaces σ 1 ad σ are kow. But usually they are ukow.

128 5-6 This variable ca be used to costruct a cofidece iterval for the differece μ 1 μ ad a test o μ 1 μ, if the variaces σ 1 ad σ are kow. But usually they are ukow. We ca oly replace them by S X ad S Y for (very) large samples (ad use the approximate N(0,1)-distributio, as will be discussed i chapter 7), but for small sample sizes we will discuss the approach i oe special case: Cofidece iterval for μ 1 μ ad a test o μ 1 μ for equal, but ukow variaces σ 1 = σ = σ If the variaces are equal, the: var(x Y) = σ 1 + σ = σ ( ) 1 1 Both S X aad S Y are ubiased estimators of σ : a combiatio, such as S X + S Y is better tha each separate estimator: the Mea Squared Error (the variace) is smaller. It ca be show (see property.1.9) that the best ubiased liear combiatio a S X + (1 a)s Y attais its miimal variace for a = 1 1 ad 1 a = The weighig factor a is the proportio of the umber of the degrees of freedom of S X ad the total umber of degrees of freedom of S X ad S Y, 1 +. The best (liear) ubiased estimator is called the pooled sample variace (otatio: S or S p ), as stated i property.1.9: S = S X S Y I (X Y) (μ 1 μ ) σ ( ) we ca replace σ by S : T = (X Y) (μ 1 μ ) S ( ) As before, the replacemet of σ by S itroduces a t-distributio for the variable T: a t-distributio with 1 + degrees of freedom (df = 1 + ). At a give level of cofidece 1 α the value c ca be determied such that P( c < T 1 + < c) = 1 α

129 We ca use the t-distributio of T i order to either costruct a cofidece iterval for μ 1 μ or fid a test statistic to test o a specific differece of the meas: H 0 : μ 1 μ = Δ Property 5..1 (cofidece iterval ad test for the differece of populatio meas) For the probability model of two idepedet samples, draw from ormal distributios with equal, but ukow variaces, we have: 5-7 a. The pooled sample variace S = 1 1 S 1 + X + 1 S 1 + Y is the best ubiased estimator i the family of liear combiatios as X + bs Y. b. (1 α)100%-ci(μ 1 μ ) = (X Y c S ( ), X Y + c S ( )), where P(T 1 + c) = α. c. If we test o H 0 : μ 1 μ = Δ 0, the test statistic T = X Y Δ 0 uder H 0. S ( ) has a t 1 + distributio Example 5.. A desktop-producer woders whether the assembly of desktops i two productio halls is equally fast. 1 Assembly times (i miutes) were observed i hall 1: the mea was 8.5 mi. with a sample variace 18.1 mi. The assembly of 10 desktops i hall took loger: o average 3. mi. with a sample variace 0.4 mi. Probability model of the observatios: The assembly time i hall 1 is X ~ N(μ 1, σ ) ad i hall Y ~ N(μ, σ ), where μ 1, μ ad σ. (Implicitly σ 1 = σ = σ is assumed!) X 1,, X 1 ad Y 1,, Y 10 are idepedet radom samples of X ad Y, respectively. The formulas i property 5..1 (ad o the formula sheet) ca be applied: - Computatio of the pooled sample variace: s = 1 1 s 1+10 X s 1+10 Y = (lies betwee s 0 0 X ad s Y ) - Computatio of the 95%-cofidece iterval: i this example df = 1 + = 0. The value c ca be foud i the t 0 -table: P(T 0 c) = α = 0.05, so c =.086. The other values are give i the problem descriptio: 95%-CI(μ 1 μ ) = (x y c s ( ), x y + c s ( )) = ((8.5 3.) ( ), ) 1 10 ( 7.6, +0.) at a 95% level of cofidece the differece of expected assembly times lies betwee -7.6 ad

130 miutes. - Test o the expected differece. If the research questio is Is there a differece i mea assembly times, we are iclied to coduct a test whether the differece is 0 (= 0 ) or ot. I 8 steps: 1. The probability model is give above.. Test H 0 : μ 1 μ = 0 (or μ 1 = μ ) agaist H 1 : μ 1 μ 0 with α = 5% 3. Test statistic T = X Y 0, where S = 1 1 S S ( ) 1+10 X S 1+10 Y 4. Distributio uder H 0 : T~ t 0 (df = 1 + = 0) 5. Observed value: s = 19.1, so = ( ) It is a two-tailed test: reject H 0 if T c or T c. P(T 0 c) = P(T c) = α = 0.05, dus c = Sice t = > c (t lies betwee c ad c), so we failed to reject H At a 5% level there isufficiet proof to state that there is a statistically sigificat differece i expected assembly times. Note 5..3 Relatio betwee cofidece itervals ad (two-tailed) tests I the last example this relatio may be evidet: the test did ot reject H 0 : μ 1 μ = 0 i favour of H 1 : μ 1 μ 0 at sigificace level α = 5%. This agrees with the observatio that the differece 0 is icluded i the cofidece iterval with a level of cofidece 1 α = 95%. I geeral there is a relatio betwee two-tailed tests ad correspodig two-tailed cofidece itervals. If θ is the ukow parameter ad we test H 0 : θ = θ 0 agaist H 1 : θ θ 0 at a sigificace level α, the H 0 is rejected if ad oly if θ 0 is ot cotaied i the cofidece iterval of θ with a cofidece level 1 α. Reversely, the (1 α)100% CI(θ) cosists of all values of θ 0 for which H 0 is ot rejected. A similar relatio ca be established for oe-sided tests ad oe-sided itervals. For example, if we test H 0 : μ = μ 0 agaist H 1 : μ > μ 0 with α = 5% (a upper-tailed test), the we will reject H 0 if μ 0 is outside the oe-sided CI(μ) = (, u) with cofidece level 1 α = 95%. This CI(μ) with oly a upper boud u ca be determied by idetifyig all values of μ 0 such that H 0 is ot rejected. Despite of the relatio we will treat cofidece itervals ad tests as differet methods: applicatio depeds o the (research) questio. I example 5.. we assumed the equality of both variaces, though the sample variaces are apparetly differet: 8.5 ad 3.. I the ext sectio we will ivestigate whether the assumptio is, evertheless, sustaiable: is the differece sigificat? Of course such a test could be coducted before applyig the methods preseted i this sectio, which are based o the assumptio of equal variaces. O the other had usig the observatios twice, first to check the equality of variaces, ad later for the cofidece iterval or the test o the expected differece may be tricky : usig the observatios twice makes the coclusios depedet! I a perfect statistical world we would prefer to coduct the survey twice (idepedetly).

131 Test o the equality of variaces If we wat to test whether the assumptios of equal variaces of two populatios is justified, we ca choose the followig hypotheses: Test H 0 : σ 1 = σ agaist H 1 : σ 1 σ. We will assume a model with two idepedet radom samples draw from ormal distributios: Oe populatio with variable X ~ N(μ 1, σ 1 ) ad aother with Y ~ N(μ, σ ) (so differet σ`s). X 1,, X 1 ad Y 1,, Y are idepedet ad radom samples of X ad Y, respectively. As before the sample meas will be deoted X ad Y ad the sample variaces S X ad S Y. Searchig for a suitable test statistic the differece of the sample variaces S X S Y might be at had. But for this variable we caot fid a distributio: it depeds o the variaces σ 1 ad σ. That is why we choose the quotiet of the sample variaces, S X S. If the ull hypothesis is true Y ( σ 1 σ = 1), this variable has a distributio which was first derived by Sir R.A Fisher (i the tweties of the previous cetury): if H 0 : σ 1 σ = 1 is true, the F = S X S is likely to be close to 1. Y I that case the observed values of F will vary aroud 1, accordig to a so called F-distributio: F is said to have a Fisher- or F-distributio with 1 1 degrees of freedom i the umerator ad 1 degrees of freedom i the deomiator. For short: a F distributio. Note that the umber of degrees of freedom i the umerator equals the umber of degrees of freedom of the related Chi-square distributio of S X (i the umerator). Likewise df = 1 for S Y i the deomiator. Defiitio If V ~ χ k ad W ~ χ l ad V ad W are idepedet, the F = V/k with k degrees of freedom i the umerator ad l i the deomiator. W/l has a F(isher)-distributio

132 F = 5-10 We will ot give or derive the formula of the desity fuctio, but ote that for the Fisher distributio the upper tail bouds for upper tail probabilities α = 5% ad α =.5% are give i the F-tables, for several combiatios of values of k ad l. If V = ( 1 1)S X σ ~ χ 1 1 ad W = ( 1)S X 1 σ ~ χ 1 ad assume that σ 1 = σ 1 = σ, the ( 1 1)S X σ ( 1 1) = S X ( 1)S Y S ~ Y σ ( 1) F 1 1 1, provig: Property 5.3. For a test o H 0 : σ 1 = σ i a model of two idepedet radom samples, draw from ormal distributios, the test statistic ad its distributio are as follows: F = S X S ~ F 1 1 1, if H 0 : σ 1 = σ is true Y Of course, we could have chose S Y S, so 1, as test statistic: it has uder H X F 0 a F distributio (the umbers of degrees of freedom i umerator ad deomiator are switched). Example Is the assumptio of equal variaces i example 5.. sustaiable, if the sample variaces s X = 18.1 ( 1 = 1) ad s Y = 0.4 ( = 10) are observed? I other words: is the differece statistically sigificat? We will coduct Fisher s F-test to aswer this questio at a 5% level: 1. Probability model: the assembly time i hall 1 is X ~ N(μ 1, σ 1 ) ad i hall Y ~ N(μ, σ ), where μ 1, μ, σ 1 ad σ are ukow parameter (so possibly σ 1 σ ) X 1,, X 1 ad Y 1,, Y 10 are idepedet ad radom samples of X ad Y, respectively.. Test H 0 : σ 1 = σ (or σ 1 = σ ) versus H 1 : σ 1 σ with α = 5%. 3. Test statistic: F = S X S Y Distributio uder H 0 : F ~ F Observed value: F = s X s = Y It is a two-tailed test: reject H 0 if F c 1 or if F c (see the graph below): P(F 11 9 c ) = α = 0.05, so (accordig to the related F-table) c = 3.9 P(F 11 9 c 1 ) = P ( 1 F11 1 ) = P (F 9 9 c 11 1 ) = α = 0.05, so 1 = 3.59 or c 1 c 1 c

133 F = 0.89 does ot lie i the Rejectio Regio, so we fail to reject H I coclusio: at a 5% sigificace level we caot prove that the variaces of the assembly times are differet. I the example we showed how to use the tables of the F-distributio, where oly the uppertailed probabilities 5% ad.5% are give: the critical value c 1 o the left had side ca be determied by rewritig the evet F = S X S < c 1 to 1 = S Y Y F S > 1. The we ca look up the value of X c 1 1 i the F 9 9 c 11 -table i this case, after simply switchig the umbers of degrees of freedom (use F 11 1 istead of F 11 9 ). Though usually we coduct a two-tailed test, with the iformatio give above it is easy to coduct a upper-tailed F-test (H 1 : σ 1 > σ ) or a lower-tailed F-test (H 1 : σ 1 < σ ). I these cases we will have oly oe tail probability with area α. Note that equal variaces is always i the ull hypothesis: we caot prove the equality of variaces, we merely test whether the variaces ca be prove to be uequal. Istead of H 0 : σ 1 = σ we ca state H 0 : σ 1 = σ equivaletly or H 0 : σ 1 σ = 1, which reflects that F is a estimate of the quotiet. I example 5.3. we foud that the variaces should differ a factor betwee 3.5 ad 4 or more to reject H 0 ( 1 c 1 = 3.59 ad c = 3.9): the proportio of the stadard deviatios should be more tha 3.9 or less tha (a factor of about ). Note Some books will give, istead of discussig the F-test, some simple rules of thumb to check the equality of variaces i a simple rule of thumb for relatively small samples (sample sizes less tha 40): If the proportio of the variaces is betwee 1 ad 4 (or the proportio of the stadard 4 deviatios is betwee 1 ad ), equal variaces ca be assumed. Note (Levee s Test o the equality of variaces ad SPSS) The SPSS-software does ot use the F-test to verify the equal variaces assumptio, but Levee`s test o the equality of variaces. Levee developed a alterative test that, i may cases, will produce the same coclusio as the

134 5-1 F-test, at the same level of sigificace. But especially whe there are potetial outliers amog the observatios, Levee`s test may lead to differet coclusios. We will ow show what SPSS reports if we eter the assembly times of examples 5.. ad i a SPSS-file ad apply the meu`s Compare meas ad Idepedet samples. I the SPSS-output below a table is show with the result of the p-value of Levee s test (i SPSS idicated as Sig. or Observed sigificace ). I this case the p-value is (very) large: Idepedet Samples Test Levee's Test for Equality of Variaces t-test for Equality of Meas F Sig. t df Sig. (-tailed) Time Equal variaces assumed,003,959-1,978 0,06 Equal variaces ot assumed -1,967 18,81, Sice the p-value is ot small (> α), we caot reject the assumptio of equal variaces ad we may use the results show o the lie idicated with equal variaces assumed : the test statistic of the two idepedet samples t-test (discussed i the previous sectio) is (see example 5..) ad the two-tailed p-value is 6.% > α = 5%. We fail to reject H 0, i agreemet with the coclusio of the test, that we coducted i example 5.., usig the rejectio regio. The secod lie shows the optio equal variaces ot assumed : here a alterative t-test is coducted where the umber of degrees of freedom is complicated. It is possible to compute the p-value of ay test (like SPSS does), but, sice oly two tables (α = 5% ad α =.5%) are give, we will ot compute the p-value for F-tests. For the same reaso we will ot compute the power of F-tests. Furthermore a cofidece iterval for the proportio σ 1 σ ca be costructed, see exercise Paired samples If two radom samples are idepedet, we ca use the idepedece to fid a expressio for the variace of the differece of the sample meas (sectio 5.): var(x Y) id. = var(x) + var(y) If the samples are depedet, a (usually ukow) covariace-term cov(x, Y) should be added to the right had side: the the distributio of X Y caot be determied. Especially i comparative research this depedece of two samples ofte occurs, although each of the samples is radom. Some examples: At radom 10 married couples are chose ad the legth of each ma ad each woma is measured.

135 5-13 The effectiveess of a medicie, which decreases the blood pressure, is evaluated: for a umber of persos with high blood pressure (hypertesio) the blood pressure is measured twice: before ad after usig the medicie durig a trial period. Two software programs, which are used for searchig words i databases, are evaluated by measurig the search time for both programs i a umber of databases. Though i each of these examples two series of observatios are geerated, we caot apply the methods discussed i sectios ad 3: the assumptio of idepedece of the samples does ot apply: small me ted to choose smaller wome; the blood pressure before will ifluece the blood pressure after use of the medicie; a relatively small database will lead to short search times for both programs. The similarity of the 3 examples is that there are pairs of depedet observatios: two legths per couple, two blood pressures per perso ad two search times per database. It is a atural approach to switch to the differece of each of the observed pairs: the differece i legth of ma ad woma, the decrease (before after) of the blood pressure of a perso, the differece i search time per database. The, istead of two sequeces of observatios we have oly oe sequece of differeces which ca be assumed idepedet. If, i additio, the ormal distributio applies to the differeces we ca apply the oe sample t-test o the mea differece. The trasitio from paired samples to a oe sample test o the differeces has some otatioal aspects: e.g. if X ad Y are the legths (i cm) of a ma ad a woma i a couple, the μ 1 = EX ad μ = EY. Suppose we would like to test whether H 0 : μ 1 μ 10 ca be rejected versus H 1 : μ 1 μ > 10. These hypotheses suggest that we are goig to apply a two idepedet sample method, but we wat to apply the oe sample t-test o the differeces X Y which have a expected differece μ = E(X Y) = μ 1 μ. So we will test H 0 : μ 10 versus H 1 : μ > 10, where μ = the expected differece i legth of ma ad woma. Though equivalet (μ = μ 1 μ ), the latter otatio is less ambiguous. Example Lack of raifall i agricultural area`s is a problem that a coutry tries to solve by strewig crystals from a plae above clouds. Durig a trial period the effectiveess of the method is evaluated by measurig the raifall i two area`s with the same climate coditios. Area 1 is the area where the method with strewig crystals o clouds is applied. I area the method is ot applied. The quatities of raifall is observed durig 6 moths (i cm): Area 1 (with crystals) Area (without) Sice the quatities of raifall deped o the moth i the year both rows caot be coceived as radom samples. But if we compute the differece i raifall for each of the six moths, the for the mothly differeces (with without crystals) +1.3, +.9, +1.3, +3.5, 0.1 ad +0.9 cm the followig probability model seems reasoable: The mothly differeces (with without crystals) X 1, X,, X 6 are idepedet ad all N(μ, σ )-distributed with ukow expected differece μ ad ukow variace σ. (Note that we implicitly assume that μ ad σ of the differeces do ot chage durig the moths.)

136 5-14 The questio whether crystals strewig is effective, ca be coceived as a test o H 0 : μ = 0 agaist H 1 : μ > 0 (more rai i area 1). We will use α = 5%. The test statistic is T = X, that has a t S/ distributio uder H 0. The calculator gives: x ad s 1.331, so t = / 6 This is a upper-tailed test: reject H 0 if T c =. 015, sice P(T 5.015) = 5%. t = >.015, so reject H 0 : at a 5% sigificace level we ca coclude that strewig crystals o clouds is a effective method to icrease the raifall. Sice the effect is sigificat we woder how much extra raifall we will have. The a cofidece iterval ca be iformative: s 95%-CI(μ) = (x c, x + c s ) = ( , ) 6 6 (0.3, 3.03) I the formula c =.571 is foud i the t 5 -table, sice P(T 5 c) = α =.5%. We are 95% cofidet that the icrease of raifall is betwee 0.3 cm ad 3.03 cm per moth. The icrease caot be determied precisely, but the lower boud suggests that the icrease might be margial. I example the 8 steps procedure is ot metioed explicitly, but evertheless each step is executed i the statistical reasoig. If the ormality-assumptio for the differeces i the paired samples is ot reasoable, a alterative test is give by the sig test, which will be discussed i chapter 7.

137 Exercises Youg rats are used to verify whether a homeopathic medicie agaist agig has a positive effect o their lifetimes; the rats are divided arbitrarily ito two groups of each 500 rats. Oe group is give the medicie (i their meals) ad oe group is ot give the medicie. All other coditios are kept the same. After 3 years 100 of the 500 treated rats died, ad 140 of the utreated group of rats died. a. Determie a 99%-cofidece iterval for the differece i death rates of treated ad utreated rats. b. If you would use the iterval of a. to assess whether the death rates are sigificatly differet, what would be your coclusio? c. Determie two 99%-cofidece itervals for the death rate, oe of the treated rats ad oe of the utreated rats. How would you use these itervals to assess the differece? Ad compare your coclusio to the coclusio i b.. Cotiuatio of exercise 1. Does the result of the experimet with the 1000 rats cofirm that the medicie is effective? Use the testig procedure ad compute the p-value to decide whether the medicie proves to be effective at the usual levels of cofidece (α betwee 1% ad 10%). 3. Is there a differece i achievemets by male ad by female PhD-studets? A large uiversity classified all PhD-studets who started i a year ad determied their status after 6 years. After 6 years 98 out of 9 females completed their study successfully, ad 43 out of 795 males. Coduct a test to show whether there is a sigificat differece i success probability betwee male ad female PhD-studets. Use the testig procedure i 8 steps ad a 5% sigificace level. 4. A sociologist wats to fid out whether the political preferece of youg voters (age 5) depeds o geder. He distiguishes parties o the left (liberal) ad o the right (coservative) ad wats to determie a 95%-cofidece iterval for the differece i proportios of left voters amog male ad female yougsters, based o two samples of males ad females. Determie the sample size such that the iterval`s width is at most 0.0. Hit: use that for ay proportio p we have p(1 p) 1 4.

138 I the followig situatios a statistical aalysis is required to aswer the research questio with respect to expectatio(s). Idetify whether we have a problem with (1) oe sample, () paired samples or (3) two idepedet samples a. A educatioalist is iterested i the effectiveess of the set-up of mathematical text books for High schools: should questios about a cocept be posed before the formal itroductio of a ew cocept, or afterwards? He prepares two papers, the first with motivatig questios before the itroductio of a cocept ad the secod paper with questio after itroductio of the same cocept. Each of the papers is the study materials of oe of two separate groups of studets. Afterwards the test results o the topic of both groups of studets are compared. b. A secod educatioalist prefers aother approach. She prepares two papers with totally differet topics. Each paper is made i two versios, oe with questios before ad oe with questios after the itroductio of the mai cocept. The educatioalist uses oe group of studets: each studet is taught both topics: oe topic (arbitrarily chose) with questios before ad the other topic with questios after. Each studet is subjected to two tests, o both topics, ad the test scores (questios after ad before) of the studet are compared. c. A chemist is give the assigmet to evaluate a ew method to determie the cocetratio of a material i a fluid. For that goal he uses a referece fluid with a kow cocetratio of the material. To check the bias of the ew method he repeats the measuremet of the cocetratio accordig to the ew method 0 times ad compares the mea cocetratio to the kow value of the cocetratio. d. Aother chemist has to evaluate the ew method as well. He chooses aother approach: he does ot have a referece fluid: he uses oe fluid with ukow cocetratio, but he uses the old method to compare the results of the ew method to. For both the ew ad the old method he observes the cocetratio 10 times ad iteds to compare the results. 6. Is there a differece i crop quatity per are (100 m ) for two wheat varieties? Uder equal coditios the followig results were foud: Variety A Variety B a. Compute the meas ad stadard deviatios for both samples (apply the usual otatios give i sectios ad 3). Is it, i your opiio, reasoable to assume equal variaces? b. Coduct the F-test to check whether the assumptio of equal assumptios is ot rejected at a 5% level of sigificace. c. Test whether there is a sigificat differece i expected crop quatities for the two varieties if α = 5%. Explicitly give all ecessary assumptios i step 1 of the procedure. d. Determie the 95%-cofidece iterval for the expected differece i crop quatities.

139 5-17 e. Does the cofidece iterval i d. cofirm your coclusio i the test i c.? Explai i words. f. Prove the formal relatio betwee the test i c. ad the cofidece iterval i d., that is: show that the set of all differeces Δ 0, for which H 0 : μ 1 μ = Δ 0 is ot rejected agaist H 0 : μ 1 μ Δ 0 at a 5% level of sigificace, is the 95%-cofidece iterval of μ 1 μ. 7. How effective are advertisemet campaigs i icreasig the sales? A store chai evaluates a large campaig for a specific product by comparig the weekly sales of the product i 7 of its stores i the week before ad i the week after the ad campaig. The results (umber of products per week) are: Before ad campaig (x) After ad campaig (y) Summarized Sample mea Sample stadard deviatio x y z y x a. Ivestigate with a suitable test, usig reasoable assumptios for this problem, whether the expected icrease of the sales after the campaig is positive. Give all 8 steps of the testig procedure ad use α = b. If the questio i a. would be: Test whether there is a differece i sales before ad after the ad campaig, what chages would that cause i your solutio? 8. Agricultural experts developed a cor variety as to icrease the essetial amio acid lysie i the cor. To test the quality of this ew cor variety the experts set up a experimet: a experimetal group of oe day old cockerels (male chickes) are give the ew cor variety ad a cotrol group of 0 cockerels are give the regular cor. The icrease i weight (i grams) of cockerels is measured after 1 days: Cotrol group Experimetal group Cosider the observatios x 1,, x 0 for the cotrol group ad y 1,, y 0 for the experimetal group to be realizatios of the variables X 1,, X 0 ad Y 1,, Y 0, resp.

140 5-18 Furthermore z i = y i x i Below a umerical summary of the observatios is give: Mea Stadard deviatio Sample size x y z a. Is this a paired samples problem or a idepedet samples problem. Motivate your aswer. b. Compute a 90%-cofidece iterval for the differece of the expected icreases of weight of the experimetal ad the cotrol group. c. What assumptios are ecessary to apply the formula i b. Test whether the variaces ca be assumed equal, at α = 10%. 9. Cosider the followig model of two idepedet samples: X 1, X,, X 1 is a radom sample of the r.v. X, that has a N(μ 1, σ 1 )-distributio. Y 1, Y,, Y is a radom sample of the r.v. Y, that has a N(μ, σ )-distributio. X, Y, S X ad S Y are the commo otatios of both sample meas ad sample variaces. a. Show that S X /σ 1 S Y /σ has a F 1 1 -distributio (usig the kow distributios of S X ad S Y ). b. Use the property give i a. to costruct a cofidece iterval for σ 1 σ. Show which table values you will use to compute the umerical iterval if 1 = 6, = 10 ad the level of cofidece is 95%.

141 6-1 Chapter 6 Chi-square tests 6.1 Testig o a specific distributio with k categories Example Is a coi fair? That is, if it is tossed, do we have equal probability of Heads or Tails? To check this property, we could decide to flip the coi ofte, e.g times ad observe the umbers of Heads ad Tails: Result Head Tail Total Frequecy If we wat to test the fairess of the coi, we wat to check whether p, the probability of Heads, equals 50% (or: equivaletly we could test whether we 1 p, the probability of Tails, is 50%). The test o H 0 : p = 1 agaist H 1: p 1 is executed with test statistic X = Number of Heads, as we did i chapter 4. The distributio of X uder H 0 is the B (1000, 1 )-distributio, that ca be approximated by a ormal distributio with μ = p = 500 ad σ = p(1 p) = 50. Usually we will apply cotiuity correctio, but i this example we will eglect this. For this two-tailed biomial test we ca determie the rejectio regio with the (approximate) stadard ormal distributio of Z = X 500 If α 0 = 0.05, the we will reject H 0 if 50 X 500 : Z ~ N(0,1) This last expressio ca be squared: Z = ( X ) 1.96 = I sectio 4.3 we itroduced the Chi-square distributio as a sum of squared stadard ormal Z i`s. Here we have, approximately, Z ~ N(0,1), so Z has, approximately, a Chi-square distributio with df = 1. The distributios of Z ad Z are show i the graphs:

142 6- Coclusio: we ca reject the ull hypothesis H 0 : p = 1 for large values of Z, so if Z c. Usig the Chi-square table (df = 1) we fid c = 3.84, i accordace with the value of Coclusio from example 6.1.1: the two-tailed biomial test ca be replaced by a upper-tailed Chi-square test! Example 6.1. Is the dice that we use for playig games fair? Checkig whether all six outcomes are equally likely, ca be doe by rollig the dice ofte ad coutig the umber of occurreces of each of the face up umbers1,, 3, 4, 5 ad 6. If we defie p i = the probability of umber i face up, where i = 1,, 3, 4, 5, 6, the we will expect p i times i face up i rolls. If the dice is perfect ad we roll it 10 times, we expect 10 1 = 0 times each umber face up. 6 Suppose we do ot kow whether a dice is fair ad we roll the dice 10 times, with the followig result: Number face up i Total Number of times i occurs i Expected umber if perfect E(N i ) = = We wat to test H 0 : p 1 = p = = p 6 = 1 6 (fair dice) versus H 1 : p i 1 for at least oe umber i. (ot fair) 6 So: do the observed umbers ( i ) deviate sufficietly from the expected umbers to reject the fairess of the dice? The testig problem as stated i example 6.1. ca be solved by applyig a Chi-square test. This will always be the case if we have a situatio that ca be described with the multiomial distributio, a geeralizatio of the biomial distributio. The biomial test, like the oe i example 6.1.1, is based o idepedet trials with outcomes ( success with probability p ad failure with probability 1 p). The multiomial distributio is based o idepedet trials with k outcomes umbered 1,, k ad with probabilities p 1,, p k. I example 6.1. we had k = 6 outcomes ad we observed the umbers N i of each outcome i 10 (idepedet) rolls of a dice. The formula of the probability fuctio is a geeralizatio of the biomial case as well: multiply the probability of a specific result with the umber of orders i which it ca occur: Biomial: x successes ad x failures Multiomial: N i outcomes of type i (i = 1,, k)! P(X = x) = x! ( x)! px (1 p) x! P(N 1 = 1,, N k = k ) = 1! k! p 1 1 p k k I the multiomial formula the coditios for the totals are:

143 k = ad p p k = 1. Likewise we have for the biomial distributio: x + ( x) = ad p + (1 p) = 1 The umbers N i s are depedet, sice N N k =, but the margial distributio of each N i, the umber of times that outcome i occurs i trials, is biomial: N i ~ B(, p i ). Based o the multiomial distributio of the umbers N i, it is possible to show that a variable with the squared differeces N i EN i has a Chi-square distributio (without formal proof): Property If the umbers N 1,, N k (k ) have a multiomial distributio with success rates p 1,, p k, total umber ad expected values EN i = p i, the the variable k χ = (N i EN i ) EN i i=1 has a approximate Chi-square distributio with k 1 degrees of freedom. A coditio (rule of thumb) for approximatio with the χ -distributio is: EN i 5, i = 1,, k. (Remember that we had a similar coditio for the ormal approximatio of the biomial distributio: p > 5 ad (1 p) > 5) It will be show i example that the statistic χ for k = categories equals Z, as used i example 6.1.1: for two categories χ has 1 degree of freedom, for k categories df = k 1. The value of the variable χ i property ca oly be computed if all p i`s are kow. So, if we wat to test o specific values p i0 of the p i`s, that is H 0 : p i = p i0, i = 1,, k, the the expectatios of the umbers of observatios uder H 0 are kow ad χ ca be computed. Notatio: E 0 N i is the expectatio of N i, if H 0 is true, so E 0 N i = p i0. The test statisic for the test o H 0 : p i = p i0, i = 1,.., k is: χ = (N i E 0 N i ) k i=1 E 0 N i H 0 ~ χ k 1 As before, this is a approximate distributio if E 0 N i 5, i = 1,, k This test is Pearso s Chi-square test: it is a upper-tailed test, sice χ will attai larger (positive) values as the differeces N i E 0 N i betwee the observed umbers ( i ) ad E 0 N i, the expected umbers uder H 0, get larger. Example (cotiuatio of example 6.1.1) If we test H 0 : p = 1 agaist H 1: p 1, the this ca be cosidered to be a multiomial testig problem with k = categories. Whe we defie p 1 = p ad p = 1 p, we will test H 0 : p 1 = p = 1 (so p 10 = p 0 = 1 ):

144 6-4 the umber of successes N 1 ~B(, p 1 ) has uder H 0 a expectatio E 0 N 1 = p 10 = 1 = 500. The umber of failures N has uder H 0 the same expectatio: E 0 N = 1 = 500. χ = (N i E 0 N i ) E 0 N i i=1 Sice N = 1000 N 1, we ca write: χ = (N 1 500) (500 N 1) 500 = (N 1 500) 500 = (N 1 500) 50 + (N 500) 500 = ( N ) Where Z = N is approximately N(0,1)-distributed, so the χ = Z is χ 1 -distributed. Example (cotiuatio of example 6.1.) The Chi-square test o the fairess of the dice, based o a radom sample of 10 rolls. 1. Model: N i = the umber of rolls with i as result (face up), i = 1,, 6. N 1,, N 6 have a multiomial distributio with = 10 trials ad ukow probabilities p 1,, p 6 for the six possible outcomes.. We test H 0 : p i = 1 for i = 1,, 6 versus 6 H 1 : p i 1 for at least oe i, with α = Test statistic is χ 6 = (N i E 0 N i ) i=1, E 0 N i where E 0 N i = 1 = 0 (i = 1,, 6) 6 4. Distributio uder H 0 : χ ~ χ Observed value of χ : Number face up i Total Number of times i = Expectatio if dice is fair E 0 (N i ) = = So χ = (18 0) 0 + (15 0) 0 + (3 0) 0 + ( 0) 0 + (17 0) 0 + (5 0) 0 = 76 0 = The test is: Reject H 0 if χ c, where c = 11.1 i the χ 5 -table such that P(χ 5 c) = χ = 3.8 < 11.1: χ does ot lie i the rejectio regio, so H 0 is ot rejected. 8. We could ot show covicigly that the dice is ot fair, at 5% sigificace level. Apparetly the differeces betwee the observed ad expected umbers are ot statistically sigificat.

145 A applicatio of these Chi-square tests is a test o a fully specified distributio, e.g.: Is the radom sample of umbers x 1,, x draw from the U(0, 1)-distributio? Or: are the observatios radom umbers betwee 0 ad 1? Are the observed IQ s a radom sample draw from a N(100,100)-distributio? Is the umber of accidets per week o a busy juctio Poisso distributed with mea 4? 6-5 The ull hypothesis for this kid of tests is always the fully specified distributio. This kow distributio uder H 0 ca be used to defie categories (itervals of values) such that the probabilities of the categories ca be determied. Coditio for applyig the Chi-square test: the expected values E 0 N i of all categories should be at least 5. Sometimes the Chi-square tests are used to test o a family of distributios, such as the Poisso distributio (o matter what μ is). But to determie the probability of each category, first the ukow parameter has to be estimated: i this case, μ is estimated by the sample mea. Thereafter the expected umbers of the categories ad the value of χ ca be determied. I this approach the estimatio of μ costs oe degree of freedom (ow df = k ). From research we kow, however, that this approach ot always leads to a correct approximated Chi-square distributio. That is why for tests o distributios special tests are developed. I the last chapter we will discuss the Shapiro-Wilk test o ormality. There are more specific tests for distributios such as Gii`s test o the expoetial distributio. Example Does the geder of the first child i a family affect the probability of the geder of the secod? I this example we assume that the world wide probability of a boy, 51%, applies, ad cosequetly the probability of a girl is 49%. If the geders of childre are idepedet, the probability of two boys, two girls, first a boy ad the a girl or first a girl ad the a boy is 0. 51, 0. 49, ad , respectively. I a radom sample of families (with at least childre) we expect that there are families with boys as the first ad secod bor, etc. Are the observed umbers N close to the (theoretically) expected umbers E 0 i a radom sample of = 1000 families? These are the observed umbers i a cross table: First bor Boy (1) Girl () Secod bor Boy (1) Girl () N 11 = 90 N 1 = 19 E 0 = 60 E 0 = 50 N 1 = 5 N = 66 E 0 = 50 E 0 = 40

146 We will test whether the observed ad the expected umbers are sigificatly differet: The umbers N ij of 4 categories of family compositios have a multiomial distributio with total = 1000 ad ukow probabilities p ij, where i = 1 if the first is a boy ad i = 0 for a girl ad j = 1 if the secod bor is a boy ad j = 0 for a girl.. Test H 0 : p 11 = 0.51 ad p = 0.49 ad p 1 = p 1 = agaist H 1 : p or p 0.49 or p or p with α = 5% 3. Test statistic χ = (N ij E 0 N ij ), with E 0 N 11 = = 60.1, etc. E 0 N ij 4. χ has uder H 0 a Chi-square distributio with df = 4 1 = 3 5. Observed value χ = (90 60) 60 + (19 50) 50 + (5 50) 50 + (66 40) Reject H 0 if χ c, where c = 7.81 i the χ 3 -table such that P(χ 3 c) = α = 5% lies i the Rejectio Regio, so reject H The observed umbers of categories of families are at a 5% level sigificatly differet from what would be expected if we assume idepedece ad a 51% probability of a boy.

147 Chi-square tests for cross tables Do political prefereces of voters deped o their geder? Does smokig affect the survival rate at age 65? Does buyig products o-lie deped o the earess of a large city? Are higher educated Dutchme eatig more healthy food tha lower educated Dutchme? Aalysig all of these questios we will come to the coclusio that i each problem there are two categorical variables, such as political preferece ad geder, smokig ad survival after 65, buyig behaviour ad earess of a large city, etc. Each variable cosists of or more categories. E.g., geder cosists usually of categories, me ad wome (a biomial situatio). The categories of political preferece ca be chose differetly: all parties i the coutry or defie, e.g., 3 categories, left, middle ad right. Samples are used to get a idea about the relative magitude of the categories ad about the relatio of the two categorical variables. I a survey o the relatio betwee political preferece ad geder we will ask every respodet to give their geder ad political preferece: we will cout the umber of respodets i each pair of categories of both variables. The observed umbers are usually preseted i a cross table (or cotigecy table). Example 6..1 I a survey the relatio betwee geder (codes: 1 = male, = female) ad political preferece (codes: 1 = left, = middle, 3 = right) amog yougsters is evaluated. A radom sample of 1000 yougsters showed the followig results: Geder This 3 cross table of the variables Geder ad Political preferece cotais the observed umbers ij, where i is the row umber (1 ad ) ad j the colum umber (1, ad 3), e.g. 13 = 0. The total umber of males is the row total 1 = = 50, the total umber of voters with a middle political preferece is the colum total = 1 + = 60, etc. (ote that the dot meas a summatio over that idex ) This way we fid the margial umbers of Geder (last colum) ad Political preferece (last row). The relative frequecies ij Political preferece Left (1) Middle () Right (3) Male (1) Female () ca be computed for each cell of the table, resultig i the joit distributio of the 3 combiatios of categories. Ad computig j i the Total-row ad i i the Total-colum produces the margial distributios of geder ad political preferece:

148 6-8 Geder Male (1) Female () Total Political preferece Left (1) Middle () Right (3) Total 11 = = = 0 1 = % 1.0%.0% 5.0% 1 = 180 = = 160 = % 14.0% 16.0% 48.0% 1 = 360 = 60 3 = 480 = % 6.0% 38.0% 100% The percetages i the table are estimates of the populatio probabilities p ij, where i is the row umber, 1 or, ad j the colum umber 1, or 3. I the populatio the probability of a male is p 1 = p 11 + p 1 + p 13, ad the probability of a Right vote is p 3 = p 13 + p 3 (colum total). Sice the total umbers of males ad females are differet, the distributio of political preferece for males is computed by dividig the umbers i the first row by 1 = 50 males. Likewise the secod row is divided by the total umber of females, = 480. I this way we foud the coditioal distributios of the political preferece for either the males or the females. These distributios may be compared to the margial political preferece distributio i the totals row. Geder Political preferece Left (1) Middle () Right (3) Total Male (1) 34.6% 3.1% 4.3% 100% Female () 37.5% 9.% 33.3% 100% Total 36.0% 6.0% 38.0% 100% Treatig the rows i this way we foud the distributios of political preferece. Similarly the colum distributios show the male-female proportios for each category of voters, ad amog all voters i the total colum. The differece of the proportios right-wig-voters amog males ad females is apparetly 9%, Are the observed political preferece distributios i such a degree differet that we ca coclude that the political preferece of males ad females are differet, i geeral amog yougsters? To aswer this questio we will use aother Chi-square test. We ca use the umbers N ij : these 6 umbers have a multiomial distributio, with = 1000 ad ukow probabilities p ij. To coduct a Chi-square test we first eed a specificatio of these probabilities uder H 0. I example 6..1 the research questio is whether the political preferece distributio of males ad females are the same. I other words: does the political preferece deped o the geder, or : are Political preferece ad Geder idepedet? We will cosider a 3 cross table of two variables to see what the implicatios are of the ull hypothesis of idepedece, assumig that we wat to prove the depedece.

149 6-9 The row variable has two categories: evets A ad A idicate the occurrece of these categories. The 3 colums (categories) of variable are idicated with evets B, C e D. The: p 11 = P(A B), p 1 = P(AB) + P(AC) + P(AD) = P(A), etc.: Variable 1 Variable B C D Total A p 11 = P(A B) p 1 = P(A C) p 13 = P(A D) p 1 = P(A) A p 1 = P(A B) p = P(A C) p 3 = P(A D) p = P(A) Total p 1 = P(B) p = P(C) p 3 = P(D) 1 The coditioal distributio of variable give A is determied by the coditioal probabilities P(B A), P(C A) ad P(D A), where P(B A) = P(A B), etc. P(A) If the row distributios are equal (idepedece) the e.g. P(B A) = P(B) or: P(A B) = P(B) P(A) or P(A B) = P(A)P(B) So if A ad B are idepedet, the: p 11 = p 1 p 1 For the other cells similar equalities ca be derived, assumig equal distributios i the rows. The same formulas ca be foud if we assume the same distributios i the colums. Coclusio: testig whether the row (or colum) distributios are the same is the same as testig o idepedece. We will test H 0 : p ij = p i p j for all (i, j) agaist H 1 : p ij p i p j for at least oe pair (i, j) Applyig a Chi-square test we will compare the observed umbers N ij with the expected umbers E 0 N ij i case of idepedece: we have to compute summatio of (N ij E 0 N ij ), where E 0 N ij = p ij uder H 0. But the idepedece does ot immediately determie p ij. We choose a approach to estimate the values of p ij : we ca use the row ad the colum totals to estimate the p i i the colum of totals ad de p j i the row of totals: the we kow that p ij = p i p j because of the assumptio of idepedece: E 0 N ij Variable 1 Variable B C D Total A A 1 3 Total 1 3

150 6-10 The estimate of p 1 is 1, the estimate of p 1 is 1, so the estimate of p id. 11 = p 1 p 1 is 1 1. Sice E 0N 11 = p 11, E 0 N 11 ca be estimated by 1 1 This is explicitly a estimate of E 0 N 11 i case of idepedece: therefore we will use the so called hat-otatio (like the sample proportio p, a estimate of p): E 0N 11 = 1 1 Sice 1 ad 1 are the row ad colum total for the cell (1, 1) this formula is ofte give i the followig easy-to-remember form: E 0N 11 = 1 1 For each cell (i, j) of the table we fid aalogously: E 0N ij = i j = row total colum total = row total colum total. I case of idepedece the followig proportios are the same: E 0N ij colum total = row total So: E 0N ij = row total colum total Variable 1 Variable. j i E 0N ij. Row total.... Colum total The test statistic for the test o idepedece will be the summatio of terms (N ij E 0N ij ), also i the geeral case of r rows ad c colums, a r c-cross table. E 0N ij Property 6.. (Test o the idepedece of two variables i a r c-cross table) If the umbers N ij (i = 1,, r ad j = 1,, c) have a multiomial distributios with sample size ad ukow probabilities p ij, the the test o H 0 : p ij = p i p j for all (i, j) agaist H 1 : p ij p i p j for at least oe pair (i, j), ca be executed with test statistic: c r χ = (N ij E 0N ij ) j=1 i=1 E 0N ij, where E 0N ij = row total colum total ad χ has uder H 0 a Chi square distributio with df = (r 1)(c 1) The umber of degrees of freedom ca be ituitively explaied as follows: give the totals of all r rows ad all c colums, i each row c 1 umbers N ij ca be chose freely ad i each colum r 1 of these umbers. So the total umber of degrees of freedom i the r c-cross table is df = (r 1) (c 1). Note that if we would test o a specific distributio o all cells of a r c-table, the test statistic χ has a χ distributio with df = rc 1, as has bee illustrated i example

151 6-11 Example 6..3 I 014 i a opiio poll 800 Dutch voters were asked which party they voted durig the geeral electios i 01 ad whether they are satisfied with the policy of the majority govermet by the parties VVD (liberals) ad PvdA (social democrats). The results are preseted i the table below. Political preferece Opiio positive egative Total VVD PvdA Other parties Total We will choose a suitable test to check whether the metioed groups of voters have differet opiios o the govermet`s policy. Clearly we wat to kow whether the Political preferece ad the opiio o the policy are idepedet. We will apply the Chi-square test o idepedece i the 8 steps procedure with α = 1%. I advace we will determie the expected umbers, assumig idepedece, i formula: row total colum total E 0N ij = I the 1 st row we fid: E 0N 11 = = 8.5. Sice the row total is 0, we have E 0N 1 = 0 E 0N 11 = Political preferece 800 Opiio Positive j = 1 Negative j = Total VVD i = 1 N 11 = 90, E 0N 11 = 8. 5 N 1 = 130, E 0N 1 = PvdA i = N 1 = 90, E 0N 1 = N = 100, E 0N = Other i = 3 N 31 = 10, E 0N 31 = N 3 = 70, E 0N 3 = Total = 1. The umbers of observatios i 6 categories N 11, N 1, N 1, N, N 31 ad N 3 (e.g. N 1 is the umber of voters with a positive opiio amog PvdA-voters) have a multiomial distributio with total = 800 ad accessory (ukow) probabilities p 11, p 1, p 1, p, p 31 ad p 3.. Test H 0 : p ij = p i p j (the variables Political preferece ad Opiio are idepedet) agaist H 1 : p ij p i p j for at least oe pair (i, j) with α = 0.01, 3. The test statistic is χ = (N ij E 0N ij ), with estimates E 0N ij = E 0N ij 4. χ has uder H 0 a Chi-square distributio with df = (r 1)(c 1) =. row total colum total

152 5. Observed value χ = (90 8.5) ( ) ( ) ( ) ( ) ( ) This is a upper-tailed test: reject H 0 if χ c. I the χ -table with df = we fid c = 9.1 for the upper tail probability 1%. 7. The observed value 16.5 lies i the Rejectio Regio (χ > 9.1), so reject H 0. = At a 1% sigificace level a relatio betwee the political preferece ad the opiio o the policy of the govermet is prove. Up to this poit we discussed r c-cross tables that were the result of oe sigle sample: for each respodet the value of two variables was determied ad we couted the umbers of respodets for r c combiatios of categories (i the cells of the table). But, what if the observatios cosist of two or more radom samples, e.g. the rows of the cross table cosist of sub-populatios (such as the me ad the wome i a populatio). The we have for each sub-populatio a radom sample with a give sample size. Below a example with two samples is show, for each sample the variable with 3 categories is observed: Variable 1 3 Total Populatio Populatio 1 3 Variable 1 3 Total Populatio 1 p 11 p 1 p 13 1 Populatio p 1 p p 3 1 The model for the observatios i this 3-cross table is ot the same as before, where oe multiomial distributio is defied o all 3 cells, but ow for each sample a multiomial distributio is defied. We will test whether these two multiomial distributios of variable are the same, a test o the homogeeity, istead of a test o the idepedece i a joit distributio of two variables. Note that the umbers i a colum are idepedet ow! Though actually the probability model is chaged, the otatios of the observed umbers N ij i the cells remai the same, the computatio of the expectatios E 0N ij (assumig homogeeous distributios) ad the test statistic ad its distributio all remai the same. Oly step 1 ad of the testig procedure are differet: Probability model for the test o homogeeity if two samples are give i a 3-cross table: The two radom samples are idepedet. The umbers N 11, N 1 ad N 13 i the first row are multiomially distributed with total 1 ad probabilities p 11, p 1 ad p 13. The umbers N 1, N ad N 3 i the secod row are multiomially distributed with total ad probabilities p 1, p ad p 3. Hypotheses for the test o homogeeity: Test H 0 : p 11 = p 1 ad p 1 = p ad p 13 = p 3 agaist H 1 : p 11 p 1 or p 1 p or p 13 p 3

153 6-13 Note that the probabilities of row 1, p 11, p 1 ad p 13, are i total 1, as is the case for the probabilities p 1, p ad p 3 (I the test of idepedece the summatio of all p ij is 1). Example 6..4 A certai vitami is advised to prevet commo colds, but the effectiveess of the vitami is questioable. I a experimet 00 arbitrarily chose adults are divided radomly ito two groups of each 100 persos. Oe group (the treatmet group ) is give the vitami, the other (the cotrol group ) is give a placebo. All 00 participats are told they get the vitami. After a trial period they are asked to report whether they suffered from colds less or more tha before. Here are the results: Less colds More colds No differece Total Cotrol group Treated group If we wat to test whether the vitami iflueces the colds, should we coduct a test o idepedece or homogeeity? The aswer is that we have two idepedet samples with fixed sample sizes 100 i this example ad we wat to compare the distributios of the variable cold for the cotrol ad the treatmet group: a test o homogeeity (we will use α = 5%). 1. We defie N ij = umber of persos cold category j where i = 1, is the idex for the cotrol ad the treatmet group ad j = 1,, 3 for less, more ad equally may colds, resp. The two samples (cotrol group ad treatmet group) are idepedet: Cotrol: N 11, N 1 ad N 13 are multiomially dist. with 1 = 100 ad prob. p 11, p 1 ad p 13. Treated: N 1, N ad N 3 are multiomially dist. with = 100 ad prob. p 1, p ad p 3.. Test H 0 : p 11 = p 1 ad p 1 = p ad p 13 = p 3 agaist H 1 : p 1i p i, for at least oe value of i, with α = 5%. 3. Test statistic: χ = (N ij E 0N ij ) 3 row total colum total j=1 i=1, with estimates E 0N ij = E 0N ij 4. Uder H 0 χ has a Chi-square distributio with df = (r 1)(c 1) = 5. We will first determie the expected umbers, assumig equal distributios: Less colds (j = 1) More colds (j = ) No differece (j = 3) Total Cotrol (i = 1) N 11 = 39, E 0N 11 = 45 N 1 = 1, E 0N 1 = 0.5 N 13 = 40, E 0N 13 = Treated (i = ) N 1 = 51, E 0N 1 = 45 N = 0, E 0N = 0.5 N 3 = 9, E 0N 3 = Total = Observed value: χ = (39 45) (9 34.5) We will reject H 0 if χ c, where c = 5.99, take from the χ -table with α = 5%. 7. The observed χ 3.38 < 5.99, so we fail to reject H 0.

154 At a 5% level of sigificace we could ot show that the vitami has ay effect o the prevetio of cold. We will complete the discussio of Chi-square tests with a example applyig the Chi-square test o the smallest possible cross table, a table, for which we already discussed the twoidepedet-samples biomial test i sectio 1 of chapter 5. We will illustrate the alterative approach of a Chi-square test applied o the same problem as i example Example 6..5 The problem stated i example was: The Uiversity of Twete cosidered i 014 a trasitio to fully Eglish spoke bachelor programmes. Are studets ad lecturers equally ethusiastic about this chage of policy? I a survey it tured out that 115 out of 3 studets were i favour of the trasitio ad 6 out of 108 lecturers. The relevat proportios are = 49.6% ad = 57.4%. We wat to test whether this iformatio shows that the populatio proportios are differet, if α = p 1 p We applied the two samples biomial test with test statistic Z =, that uder p (1 p )( ) H 0 : p 1 = p has a approximate N(0,1)-distributio: reject H 0 if Z 1.96 The observed proportios ca be preseted i a cross table as well: Eglish BSc-programmes I favour agaist Total Studets = 3 Lecturers 6 46 = 108 Sice we have two separate ad idepedet samples we ca apply the Chi-square test o the homogeeity to check whether the proportios i favour of Eglish BSc-programmes ca be assumed equal. The rows give the umbers of successes ad failures for each sub-group: the two idepedet umbers of successes are biomially distributed. Istead of the ull hypothesis H 0 : p 1 = p we will test: H 0 : p 11 = p 1 (ad p 1 = p ) agaist H 1 : p 11 p 1 ow. We will reject H 0 if χ = (N ij E 0N ij ) c. j=1 i=1 E 0N ij It ca be verified that χ Z = ad from the χ 1 -table we fid c = This example shows that the Chi-square test o homogeeity for a -cross table is equivalet to the biomial test o the equality of two proportios. If we coduct a oe-tailed test o the equality of two proportios, we caot use the Chisquare test as a alterative, because the Chi-square test statistic caot distiguish p 1 > p ad p 1 < p.

155 Fisher s exact test To complete the discussio of the tests o cross tables we will metio a alterative for the Chisquare tests for small samples, where the expected values E(N ij ) are small (< 5). I such a case Fisher s exact test is ofte applied. It is based o the hypergeometric distributio as the followig example illustrates. Example A studet experieced that me more frequetly use Apple-laptops tha wome. To verify this cojecture a group of studets was ivestigated ad here are the results: The cross table shows that 7 out of 8 Apples are owed by me, whilst oe would expect that a proportio 1 of 8 Apples ( 4.4 < 5) would be owed by me, if me ad wome are equally likely to ow Apples. Ca we state that these observatios prove statistically that me ow more Apples (at a 5% sigificace level)? If we choose test statistic X = The umber Apples owed by me, the we observe X = 7 i this case ad P(X 7) is the p-value of this right sided test of H 0 : equal Apple rates for me ad wome agaist H 1 : me ow more ofte Apples. X has a so called hypergeometric distributio, sice uder H 0 me ad wome are equally likely to ow a Apple. The, if there are 8 Apples ad 14 No-Apples, what is the probability that the 1 me have (at least) 7 out of 8 Apples? Or: what is the probability we have 7 or 8 Apples if we choose 1 laptops from the available laptops? Schematically (for the choice of laptops for me): So: P(X = 7) = (8 7 )(14 5 ) ( 1 ).48% I geeral: P(X = x) = (8 x )( 14 1 x ) ( 1 ) Apple Other Total Ma Woma Total 8 14 Apple No-Apple 8 8 )(14 4 ) The p value is: P(X 7) = P(X = 7) + P(X = 8) = (8 7 )(14 5 ) ( 1 ) + ( ( 1 ).63% < α = 5%. At a 5% sigificace level we showed that Apples are more frequetly owed by me. Total Laptops 8 14 Sample 7 5 1

156 6-16 I the example above we could have chose Y = The umber of Apples owed by wome as a test statistic, with observed value Y = 1 To proof that me ow more Apples we should compute the p-value P(Y 1) = P(Y = 1) + P(Y = 0) = (8 1 )(14 9 ) ( 10 ) + ( ot coicidetally the same as i the example, as is show below. 8 0 )(14 10 ) ( 10 ).63%, Justificatio of the hypergeometric distributio uder H 0 : Uder H 0 me ad wome have a equal probability p of owig a Apple-laptop, so: As a model of the umbers of laptop owers we have: X~ B(1, p), Y ~ B(10, p) ad X ad Y are idepedet. P(X = 7 X + Y = 8) = P(X = 7 ad X + Y = 8) P(X + Y = 8) = P(X = 7 ad Y = 1) P(X + Y = 8) id. P(X = 7)P( Y = 1) = = (1 7 )p7 (1 p) 5 ( 10 1 )p1 (1 p) 9 P(X + Y = 8) ( 0 = ( 1 8 )p8 (1 p) 1 ( 8 ) 7 )(10 1 ) Ad, maipulatig the biomial coefficiets: 1! P(X = 7 X + Y = 8) = 7! 5! 10! 1! 9!! 8! 14! = 8! 7! 1! 14! 5! 9!! 1! 10! = P(Y = 1 X + Y = 8) If the research questio would have bee Are the proportios of Apple laptops owed by me ad wome differet?, we have a two-sided test: for a observed value x of X we will compute the p-value as follows: the p-value = mi[ P(X x H 0 ), P(X x H 0 )]. For the observed X = 1 i the example the -sided p-value is P(X 1 H 0 ) =.63%.

157 Exercises 1. We wat to ivestigate whether a sample is represetative for the populatio, which is the case if the umber of observatios of each sub-populatio represets the same proportio as the sub-populatio i the whole populatio. A populatio cosists of 7 sub-populatios with proportios 7%, 18%, 15%, 14%, 10%, 9% ad 7%, respectively. A sample has a size of the =150 ad we observed the followig umbers for the 7 sub-populatios: 43, 7, 31, 0, 11, 10 ad 8. Use a suitable test to check the represetativeess of this sample. Apply the 8-steps-testig procedure with α = 5%.. I exercise 4.7 we applied a biomial test for the followig problem: A marketig cosultat is desigig a advertisemet campaig for clothes of girls i the age of 10-1 year. A importat issue is to kow who, i the ed, decides about the purchase: the mother or the daughter. The cosultat referred to a survey of 400 of these purchases, where i 43 times the decisio was take by the mother. Ca we state, at a 5% level of sigificace, that i the majority of the purchases the mother decides? To test H 0 : p = 1 agaist H 1: p > 1 Z = X 00 = we used statistic Z = X that had a observed value The rejectio regio is Z 1.645, if α = 5%. The observed value could be give i two categories: i 43 cases the mother decided ad i 157 cases the daughter. Determie a 1 table for these two categories ad cosequetly determie. The expected umbers for the two categories H 0, The value of χ = (N i E 0 N i ) i=1 for this table where E 0 N i are the expectatios uder H 0. E i The accompayig Rejectio Regio het (with α = 5%) The decisio w.r.t. rejectio of H 0. Verify whether there is a relatio betwee the values of Z ad χ. Is there a similar relatio betwee the critical values c for both tests? Why (ot)? 3. The followig observatios have to be aalysed:

158 6-18 The observatios are cosidered to be a realizatio of idepedet radom variables X 1,, X 0, all with the same distributio. Apply (Pearso s) Chi-square test to check whether the assumptio of stadard ormal distributio for X 1,, X 0. First split the sample space ito 4 itervals with probability 0.5 uder H 0. Use the testig procedure (formula sheet) ad α = 5%. 4. Does the educatioal level of employees ifluece the result of the exam after a traiig? The results ad level of educatio of 10 employees are evaluated. The results of the exam were classified as high, average ad low. Out of the 35 persos with educatioal level 1, 4 scored high ad 0 average. For the 45 persos with educatioal level these umbers were 1 ad 18, resp. Ad 9 of the persos with educatioal level 3 scored high ad average. Aalyse these observatios with a suitable test. Use the testig procedure with α = 5%. (Hit: preset the observed values ad expected values i a cross table first) 5. Compaies ca be classified as small, medium ad large. A questioaire was set to a radom sample of 00 compaies of each category: 98 of the small, 79 of the medium ad 71 of the large compaies replied. Use a test o idepedece or o homogeeity (which oe should we choose?) to check whether the respose ca be assumed to be (roughly) the same. Apply the testig procedure with α = 1%. 6. Exercise 3 of chapter 5 stated: Is there a differece i achievemets by male ad by female PhD-studets? A large uiversity classified all PhD-studets who started i a year ad determied their status after 6 years. After 6 years 98 out of 9 females completed their study successfully, ad 43 out of 795 males. Coduct a test to show whether there is a sigificat differece i success probability betwee male ad female PhD-studets. Use the testig procedure i 8 steps ad a 5% sigificace level. I chapter 5 we used the biomial Z-test o the differece of success proportios p 1 p. We ca choose a alterative approach, usig a Chi-square test for the followig -cross table: a. Should we apply a test o idepedece or a test o homogeeity? b. Apply the chose test with α = 5%. c. Compare the results of the test i b. to the results of exercise 5.3.

159 Below you will fid the results of a survey amog Dutch car drivers. The survey was set up to ivestigate the eed for automatic adjustmets of the speed of the car, depedig o the actual situatio o the road (such as weather coditios). The results i the followig table cocer 1049 car drivers: Frequecy Need for automatic adjustmet of car speed. of car use Very high high perhaps Not cosiderably Not at all > 3 times per week times per week times a moth or less a. Apply a suitable test to ivestigate whether the eed for automatic adjustmet depeds o the frequecy of the car use. Use the full testig procedure ad decide with α = 5%. b. I the table oe value () is less tha 5: does this cross table evertheless meet the coditio E 0 N ij 5 for applyig the Chi-square test? 8. A robot guide, desiged by UT-studets, has two optios to attract people: 1. The robot simulates huma speech to make people follow or. The robot makes electroic oises to attract people. Prior to the user test the studets expected that huma speech would be more attractive: 10 persos were tested with optio 1 ad 9 of them were attracted, but oly 4 of 9 persos who were offered optio were attracted. Does this prove the cojecture of the studets (at a 5% level)? The test results ca be preseted i a cross table, but sice we have small samples here we will use Fisher s exact test to decide whether optio 1 is more attractive. Follow Not Total Optio Optio Total First give the reasoig how Fisher s exact test is applied to this example, the give the complete 8 steps procedure to aswer the research questio ( Is huma speech more attractive ) at a 5% level of sigificace.

160 6-0

161 7-1 Chapter 7 Choice of Model ad No-Parametric methods 7.1 Itroductio A majority of the discussed tests i this reader are based o the assumptio of a ormal distributio of a variable i oe populatio or o the assumptio of ormal distributios of variables i two populatios: The t-test o the expectatio µ The Chi-square test o the variace σ The t-test o the differece of the µ s for two idepedet samples The F-test o the quotiet of variaces The t-test o the expected differece for paired samples Besides these tests we discussed tests for categorical variables: for large samples we used the ormal approximatio of the biomial distributio or the Chi-square distributio for cross tables. The biomial test o the proportio p (also applicable for small samples). The biomial test o the equality of two proportios. The Chi-square test o the distributio of oe or two categorical variables (icludig tests o idepedece ad homogeeity). I this chapter we will take up the approach of problems, where umerical ad mostly cotiuous variables play a role, which caot be modelled with the ormal distributio. I chapter 1 we discussed the evaluatio of the assumptio of a ormal distributio, usig data aalytic methods: umerical measures, histograms ad Q-Q plots. I additio to these idicative methods we will discuss the Shapiro-Wilk test to fialize the decisio o the ormality assumptio, if ecessary. But first of all we will use the Cetral Limit Theorem to apply approximate stadard ormal distributios for the test statistics, if the populatio is ot ormally distributed ad the sample is large (or, i two samples problems, both the samples are large). If, however, the ormal distributio does ot apply to a populatio ad the sample is small, we caot apply the aforemetioed parametric tests ad we will eed alteratives. Parametric tests are the t-tests, the Chi-square test o the variace ad the F-test: based o the assumptio of a ormal distributio of the populatio variable with ukow parameters µ ad σ (or two µ s ad σ s i case of two samples).

162 7- The alterative for a parametric test does ot use the ormality assumptio, or the assumptio of ay other specific distributio, ad is called a o-parametric test. We will discuss two of these alteratives: A alterative for the t-test o the expectatio of a populatio: the sig test. This test ca be applied o the expected differece for paired samples as well. Ad a alterative for he t-test o the differece of the µ s for two idepedet samples: Wilcoxo`s rak sum test 7. Large samples I chapter we costructed cofidece itervals ad tests usig the estimators X for the populatio mea μ ad p = X for the populatio proportio p. We used the followig properties: The estimators are ubiased: E(X) = μ ad E(p ) = p. The variaces of the estimators var(x) = σ p(1 p) ad var(p ) = decrease if icreases. The sample variace S, the estimator of σ, has the same properties. If the distributio, from which the radom sample is draw, is cotiuous but ot ormal, the sample mea X has oetheless a ormal distributio, but this it is a approximate ormal distributio accordig to the Cetral Limit Theorem (CLT) ad oly if is sufficietly large. Whether the sample size is sufficietly large, depeds o the errors you wat to allow i the approximatio. Furthermore the type of populatio distributio at had determies the covergece speed : a symmetric distributio (e.g. the uiform distributio) will usually coverge quicker tha a skewed distributio (e.g. the expoetial distributio). As a overall rule of thumb for applyig the CLT i approximatios we used 5. Sice skewed distributios affect the correctess of approximated cofidece itervals ad distributios of test statistics we will use i this kid of applicatios 40 as a safer rule of thumb to apply the CLT i statistical methods. Users of statistical methods may be tempted to use the stadard ormal distributio for X μ 0 i all S/ cases where the sample size is at least 40 (o matter whether the populatio is ormal or ot), but it should be stated that, if the populatio is ormal, the t-test o μ is preferable: the t-distributio of T = X μ 0 i the t-table is more accurate (exact!) ad cofirms that the table itself ca be used S/ for umbers of degrees of freedom 10 ad less. Whe the sample size is greater tha 10, the differeces betwee t-table ad N(0,1)-table are egligible. So: for a large ( 40) sample draw from a ot-ormal distributio, we have approximately: X μ CLT S/ ~ N(0, 1)

163 7-3 As usual (chapter 3) we ca costruct a cofidece iterval for μ with this variable, but this time a approximate oe: From P ( c < X μ S < c) 1 α it follows: (X c S, X + c S ), with Φ(c) = 1 1 α The iterval is referred to as the approximate cofidece iterval for μ, similar to the iterval for the proportio p. Ad a test o H 0 : μ = μ 0 is coducted with test statistic Z = X μ 0 S/, which is uder H 0 approximately N(0, 1)-distributed. The cofidece iterval ad test above are valid for paired samples as well sice i that case the differeces ca be treated as a oe-sample-problem. The same method of approximatio ca be applied for comparig two populatio meas μ 1 ad μ for which two idepedet samples are available: - If both samples are draw from a ormal distributio the samples t-procedure (cofidece iterval or test) with the pooled variace could be applied, that is, if the variaces ca be assumed equal. If 1 + > 10 the t-table leaves us o other optio tha to use the approximate N(0,1)-distributio - If both samples are draw from ot-ormal distributio we ca apply the approximate ormal distributio if 1 40 ad 40. For large samples we will ot distiguish betwee equal or uequal variaces σ 1 ad σ : We kow that X Y is accordig to the CLT approximately N (μ 1 μ, σ 1 + σ 1 )- distributed We will estimate σ 1 ad σ with S 1 ad S. Because of the large sample sizes these estimates are supposed to have a large accuracy, so: X Y (μ 1 μ ) The cofidece iterval for μ 1 μ has bouds S 1 + S 1 Ad the test o H 0 : μ 1 μ = Δ 0 ca be coducted with the is approximately N(0, 1) X Y ± c S 1 + S, with Φ(c) = α test statistic Z = X Y Δ 0 S 1 + S 1 CLT ~ N(0, 1) uder H0

164 Shapiro-Wilk`s test o ormality I chapter 6 oe of the applicatios of Pearso s Chi-square test was the test o completely specified distributios. It ca be applied to the ormal distributio, but μ ad σ should be specified i advace, e.g., whe the research questio is: Are the IQ`s i the populatio N(100, 81)-distributed?. But i practice we are i the first place iterested i whether or ot a ormal distributio applies, regardless what the parameters are. Shapiro ad Wilk desiged such a specific test o ormality, with test statistic W = ( i a ix (i) ) (X i X) i I Shapiro-Wilk`s W we recogize i the umerator the order statistics X (1),, X (), give the observed sample X 1,, X (remember that X (1) X () X () ). The umbers a 1,, a ca be foud i the Shapiro Wilk table (see tables). The deomiator is what we call the variatio of the data set: if it is divided by 1, we have the sample variace. So the deomiator = (X i X) i = ( 1)S. Shapiro ad Wilk have chose the coefficiets a i such that the value of W is close to 1 if the ormal distributio applies. Sice W 1, a value sufficietly less tha 1 proves that the ormal distributio does ot apply. Shapiro-Wilk`s test is lower-tailed: Rejectio H 0 if W c. The critical value c ca be foud i Shapiro-Wilk`s table for give sample size ad the usual levels of sigificace. Example We ca verify that W attais a value close 1 by usig the 10 th, 0 th,, 90 th percetiles as 9 observatios, beig a perfect sample draw from a stadard ormal distributio. I sectio 1.5 we determied these percetiles, rouded i two decimals: 1.8, 0.84, 0.5, 0.5, 0, 0.5, 0.5, 0.84, 1.8 Not surprisigly, these observatios have a mea x = 0. Ad s Computatio of Shapiro-Wilk`s W: - The deomiator: (X i X) i = ( 1)S = = The umerator: i the table we fid a 9 = , a 8 = 0.34, a 7 = , a 6 = ad a 5 = 0, ad the first four are egative: a 1 = , a = 0.34, a 3 = , a 4 = So the umerator: ( i a i X (i) ) = [( ) ( 0.34) ( ) 0.5

165 7-5 +( ) ] The observed value: W = ( i a ix (i) ) (X i X) = i Accordig to the table the Rejectio Regio is W for α = 5%: the ormal distributio is ot rejected. (Eve if α = 99% is chose, RR = W ad H 0 is ot rejected.) We will state the hypotheses i terms of the distributio fuctio F(x) = P(X x). If we wat to test o the stadard ormal distributio, the distributio fuctio of such a variable Z is Φ(z) = P(Z z): these probabilities ca be foud i the N(0,1)-table. The distributio fuctio of a N(μ, σ )-distributed variable X, ca be expressed i Φ as well: x μ μ F(x) = P(X x) = P (Z ) = Φ (x σ σ ), i which we used that Z = X μ has the N(0, 1)-distributio. σ We will apply this i the ext example. Example 7.3. We will apply Shapiro-Wilk`s test to the followig 30 measuremets, SO -cocetratios i the air: sice the observatios themselves were ot ormally distributed (the histogram was skewed to the right), the logarithm of the observatios were computed, hopig for a ormal distributio after this trasformatio You ca use your calculator to verify that the mea ad the sample variace: x 5.95 ad s These 30 umbers are cosidered to be the observed results of a radom sample X 1,, X 30. The 8 steps of Shapiro-Wilk`s test are, for α = 5%: 1. X 1,, X 30 are idepedet ad all have the same distributio with ukow distributio fuctio F(x).. We test H 0 : F(x) = Φ ( x μ ) (X has a ormal distributio) agaist σ H 1 : F(x) Φ ( x μ σ ) (X does ot have a ormal distributio) with α = 5%. 3. Test statistic: W = ( a ix (i) i ) (X i X) i, where the umbers a i are from Shapiro-Wilk`s table for = Distributio of W uder H 0 is give by Shapiro-Wilk`s table. 5. The observed value of W: the coefficiets of the summatio a i X (i) i ca be foud i the table: oly the (positive) coefficiets a i+1 metioed i the table (i = 0,, 1 ). But we kow that a i+1 = a i. So, e.g., a 30 = ad a 1 = are the coefficiets for X (1) ad X (30). I the total summatio: a 1 X (1) + a 30 X (30) = 0.454X (1) X (30) = 0.454(X (30) X (1) ),

166 7-6 ad likewise for X () ad X (9) we have (X (9) X () ), etc. The computatio of W: i X ( i+1) X (i) Differece a i+1 Pair of terms a i+1 (X ( i+1) X (i) ) Sum =.09 Numerator = sum = deomiator of W equals 1 times the sample variace: 9s = The observed value of W = Shapiro-Wilk is a lower-tailed test: reject H 0 if W c. The table of the Shapiro-Wilk`s test ( = 30) gives us: c = The observed values does ot lie i the Rejectio Regio, so we fail to reject H At a 5% sigificace level we did ot observe sigificat deviatios from the ormal distributio. The The coclusio i this example implies that we ca apply parametric methods o the observed logarithms. For istace, if we would like to determie a cofidece iterval for the expected SO -cocetratio µ, we should take ito accout that the logarithm of the cocetratios is determied: if Y is the origial SO level, the we showed that X = l(y) ca be assumed to have a N(μ, σ )-distributio. We ca determie a cofidece iterval (a, b) for μ = E(X) = E[l(Y)], usig the formula x ± s c, but, sice E[l(Y)] l (EY), a iterval for E(Y), the expected SO -cocetratio caot be determied. Nevertheless, we ca use the relatio betwee the medias M X ad M Y of X ad Y: sice l(m Y ) = M X = E(X) it follows from a < E(X) < b that e a < M Y < e b. Note that Shapiro-Wilk s test does ot statistically prove the ormal distributio: it oly may show that aother distributio caot be prove. Idirectly we will i that case ofte maitai the assumptio of a ormal distributio. We do ot use this test istead of the descriptive methods that we have discussed i chapter 1. It is a additioal method which ca cofirm the coclusios draw from: - Numerical measures such as skewess ad kurtosis.

167 - Graphical displays such as histogram, box plot ad ormal Q-Q plot. - Other cosideratios, e.g. theoretical properties or discreteess of the variable The sig test o the media We will start the discussio of o-parametric methods with the simplest o-parametric method: the sig test o the media. If μ is the populatio mea ad we wat to test the ull hypothesis H 0 : μ = μ 0 agaist (e.g.) the alterative H 1 : μ > μ 0, o the basis of a radom sample of observatios x 1,, x, the we coducted a t-test with test statistic T = X μ 0, assumig a ormal distributio for the populatio S/ variable. But if the populatio distributio is ot ormal ad ukow, the distributio of the test statistic T is ukow ad we caot complete the test. A distributio that could possibly occur is show i the graphs below: o-ormal distributios uder H 0 (µ = μ 0 ) ad uder H 1 (µ > μ 0 ): The graphs of these o-ormal, but symmetric distributios show that uder H 0 the probability of a observatio > μ 0 is 50% ad uder H 1 this probability is larger tha 50%. But the the alterative is more likely if the umber of observatios > μ 0 (x i > μ 0 ) is large! So, istead of usig the sample mea ad the test statistic T, we ca choose this umber of observatios > μ 0 as test statistic. The alterative probability model is i that case: X = the umber of observatios larger tha μ 0 : if H 0 is true, the X is B (, 1 ) If a observatio x i has a value larger tha μ 0, the the differece x i μ 0 is positive. So, X is simply coutig the umber of positve differeces x i μ 0 i the sample. Sice X uses oly the sig of the differeces x i μ 0, ot its (absolute) value we will call this biomial test the sig test. Example A app was desiged for digitally sedig ivoices to a health care isurace compay. Before the app is lauched, a users` test is coducted. Amog other aspects the Task completio time is observed, to check the coditio that customers should be able to sed a ivoice withi 1 miute (60 secods). The followig times (i secods) of 15 customers were measured:

168 7-8 The average time is 68.3 secods, but to test whether this is sigificatly larger tha 60, we are tempted to use the usual oe sample t-test o µ. However, the desigers suspected that the distributio of the times is ot ormal. This suspicio was cofirmed whe they applied Shapiro- Wilk s test: W = 0.840, where for α = 5% rejectio Regio W ca be foud i the Shapiro-Wilk table: the ull hypothesis of a ormal distributio of the task completio times has to be rejected. The sig test offers the followig appropriate solutio: 1. X = The umber of times larger tha 60 i the radom sample of 15 users. X is B(15, p), where p is the ukow probability of a time larger tha 60 secods.. Test H 0 : p = 1 (average time is 60 sec) agaist H 1: p > 1 (average time > 60) with α = 5%. 3. Test statistic: X 4. X has uder H 0 a B(15, 0.5)-distributio (give i thee biomial table, o approximatio ecessary). 5. X = 9 6. We reject H 0 if the p-value α. p-value = P(X 9 p = 0.5) = 1 P(X 8 p = 0.5) = = 30.4% % > 5%, so we fail to reject H At a 5% sigificace level we caot prove that the mea task completio time is larger tha 60 secods. At the itroductio of the sig test i this sectio ad applyig it i example we assumed symmetrical distributios: i that case we have uder H 0 : μ = μ 0 : P(X > μ 0 ) = P(X < μ 0 ) = 1. But, if the distributio of X is skewed the probability 1 is valid for the (populatio) media M, ad ot for expectatio μ, which is illustrated by the graph below: P(X > M) = P(X < M) = 1 For skewed distributios the probability P(X > μ 0 ) ca oly be determied if the distributio is completely specified. I geeral, this is ot the case. Coclusio: if we wat to use the sig test with a test statistic that couts the umber of observatios larger tha a specific umber, as a o-parametric alterative for the t-test o μ, we should be aware that we are coductig a sig test o the media. The value o which we test is a value M 0 of the media. Ad we cout the umber X of observatios larger tha m 0. H 0 : p = 1 is equivalet with H 0: M = M 0 ad the alterative (e.g.): H 1 : M < M 0 with H 1 : p < 1.

169 7-9 For symmetrical distributios we do have: H 0 : p = 1 H 0: M = M 0 H 0 : μ = μ 0 For paired samples we discussed i chapter 5 the oe-sample t-test o the expected differece, mostly testig o the expected differece μ = 0 (o effect of treatmet whe cosiderig the differeces after before ) agaist the alterative of a positive or egative (or both) effect. For this problem the sig test o the media is a o-parametric alterative as well: We will test H 0 : M = 0 or H 0 : p = 1 with test statistic X = the umber of positive differeces. X has uder H 0 a B (, 1 )-distributio. As before, if the differeces are symmetrically distributed (which is ofte the case for paired samples), the H 0 : M = 0 is the same as H 0 : µ = 0 Example 7.4. Are High school studets with a sciece profile better i mathematics tha i Eglish? Let us compare the exam scores of a radom sample of = 30 studets: Studet Mathematics Eglish Studet Mathematics Eglish We wat to statistically show (with α 0 = 10%), that studets with a sciece profile are better i Math tha i Eglish, i geeral. We are told that the ormal distributio does ot apply to the differeces. 1. X = The umber of positive differeces (Math E) i the sample of = 30 studets X is B(30, p), where p = probability of a higher score o Math tha o Eglish.. Test H 0 : p = 1 ( No systematic differece betwee Math ad Eglish scores ) agaist H 1 : p > 1 (The Math-score is systematically higher tha the Eglish-score ) with α 0 = 10%. 3. Test statistic: X 4. Distributio X uder H 0 : B (30, 1 ), so approximately N (30 1, ) 5. Observed: X = Reject H 0 if the upper-tailed p-value α 0, where the p-value is P(X 0 H 0 ) c.c. = P(X 19.5 H 0 ) = P (Z ) 1 P(Z 1.64) = 5.05% 7. Decisio: the p-value = 5.05% < α 0, so reject H At a 10% level of sigificace we showed that studets with a sciece profile have higher scores o Mathematics tha o Eglish. I the example we used the ormal approximatio of the biomial distributio of X (with cotiuity correctio) the rule of thumb for applyig this approximatio, 5 ad both p > 5 ad (1 p) > 5, is fulfilled. 7.5

170 7-10 But the sig test always uses p = 1, for which the biomial distributio is symmetric (about 1 ) ad coverges more rapidly to a ormal distrributio tha biomial distributios i geeral do. That is why we ca use a weaker rule: Rule of thumb for a ormal approximatio N ( 1, 1 ) of the B (, 1 )-distributio: 15 4 If the ormal distributio applies to the differeces (or applies approximately), it is better to coduct the t-test o the differeces: the power of the t-test is larger tha the power of the sig test for the same sample of observatios. This is uderstadable, sice the sig test oly uses the sig of each differece, ot its absolute value. Research showed that we eed about 10% more observatios for the sig test to provide the same power. A last remark has to be made about differece which are 0, so either positive or egative: how should we cout them i determiig the umber of positive differeces? The aswer is simple: just remove the 0-differeces from your data set ad coduct the sig test o the remaiig differeces. Overview of the oe sample tests o the ceter of a distributio (ceter beig mea or media): Test statistic for Populatio σ model Cofidece iterval Fid c i CI, test o H for µ 0 : μ = or c i RR i the μ 0 N(μ, σ ) Not-ormal, ay distributio σ kow, ay σ ukow, ay σ ukow, 40 (X c (X c (X c < σ, X + c σ ) Z = X μ σ 0 S, X + c S ) T = X μ 0 S S, X + c S ) Z = X μ 0 S Sig test o the media N(0,1)-table t 1 -table N(0,1)-table, approximatio! B (, 1 )-table 15: N (, 4 )

171 Wilcoxo`s rak sum test If we wat to compare the mea values of two populatios with two idepedet radom samples ad the samples are small, that is, if 1 < 40 ad/or < 40, the ormality of both populatios is a coditio to apply a parametric method (the two samples t-procedure i chapter 5). Wilcoxo desiged a method for which o specific distributio for the populatios eeds to be assumed: we will use the example of a test o H 0 : o structural differece of the variables X ad Y versus H 1 : the values of X are structurally greater tha the values of Y. If x 1,, x 1 ad y 1,, y are the observatios, the approach uses order statistics: Order all observatios x 1,, x 1, y 1,, y i oe sequece from small to large, such that each of the observatios is awarded a rak betwee 1 ad 1 + = N. We will add all raks R(X i ) of (oly) the x-values: W = i R(X i ) The sum of the raks R(X i ) will be (relatively) large, if the x-values are systematically (structurally) higher tha the y-values. I these steps we described the simple approach that was proposed ad prove by Wilcoxo: we will reject H 0 : o structural differece i favour of the alterative H 1 : the x-values are structurally greater tha the y-values if the sum of raks is large eough: reject H 0 if W = R(X i ) i c, a upper-tailed rejectio regio i this case. First we will determie the exact distributio of W for very small samples ( 1 5 ad/or 5). Cosequetly we will cosider the wider applicable ormal approximatio of W for 1 > 5 ad > 5. Example iphoe owers state that a higher price of these phoes, compared to the equivalet Samsug smartphoes, is justified sice they will use the phoes for a loger period. For this reaso a studet orgaizatio cosiders a advice to buy a iphoe. To justify such a advice the orgaizatio asked its members to report the period of use of their last iphoe or Samsug smartphoe (both older types). For both types of phoes 4 periods of use (i years) were reported: iphoe:.1, 4., 4.8, 6. Samsug: 1.1, 1.8, 3.1, 4.5 The mea period of use is differet i favour of the iphoes: 4.3 years agaist.6 years. But: is this differece (statistically) sigificat? Sice i the past periods of use did ot tur out to be ormally distributed, we do ot wat to adopt such a possibly false assumptio: as alterative for the two samples t-test we will apply Wilcoxo`s rak sum test with a 5% level of sigificace. First we will order all 8 observatios: 1.1, 1.8,.1, 3.1, 4., 4.5, 4.8, 6. The sum of raks of the iphoe periods (bold ad uderlied) is: W = = 3 The questio is: is this sum of raks large eough to claim that iphoes are used loger i geeral? If the ull hypothesis is true, both types have the same distributio of the period of use: the each observed period ca have each rak with equal probability ( 1 ): the expected rak of each 8

172 7-1 observatio is 1+8. But the the expected sum of raks of the four iphoes is uder H 0 E 0 (W) = = 18: the observed value W = 3 is ideed larger tha expected. But how large is the probability of W 3 if H 0 is true? Or: does W = 3 lie i the Rejectio Regio? To aswer this questio we eed the distributio of W. Let`s start with the largest possible value: what is the probability of W = = 6? This is oe of the combiatios of 4 raks chose from the 8 raks 1,,, 8: uder H 0 each combiatio is equally likely ad there are i total ( 8 ) = 70 combiatios of 4 raks out of 8, so 4 P(W = 6 H 0 ) = 1 ( 8 4 ) = The rak sum W = 5 oly occurs if W = : probability W = 4 occurs for raks , ad for : probability 70, etc. Now we ca compute P(W 4 H 0 ) = % > α 0, where P(W 5 H 0 ) = 1+1.9% 5%. The Rejectio Regio is W The observed value W = 3 does ot lie i the rejectio regio, so we fail to reject H 0. At α 0 = 5% we caot claim that iphoes are used structurally loger tha Samsug smartphoes. Istead of the exact distributio of Wilcoxo s W, we will ofte use the approximately ormal distributio as give i the followig property if the samples are sufficietly large: Property 7.5. If X 1,, X 1 ad Y 1,, Y are idepedet radom samples, draw from ukow but equal distributios ad the raks are determied i the total sequece of 1 + = N observatios, the the sum of raks of the x-values W = 1 i=1 R(X i ) is, for large 1 ad, approximately ormally distributed with μ = E(W) = 1 1(N + 1) ad σ = var(w) = (N + 1) As a rule of thumb for applyig this ormal approximatio, we will use: 1 > 5 ad > 5 The approximate ormal distributio follows from a versio of the CLT, that applies to depedet variables. We will oly prove the formulas for expectatio ad variace. Proof: Simplifyig the otatio of the raks to R i = R(X i ) we kow that uder H 0 each value of a rak is equally likely: P(R i = k) = 1, i = 1,,, N N For the proof we eed furthermore: N k=1 k = 1 N(N + 1) ad N k k=1 = 1 N(N + 1)(N + 1) 6 These formulas ca be prove by iductio. E(R i ) = N+1 (usig symmetry of the distributio of R i ) ad variace of each rak is R i : var(r i ) = (ER i ) (ER i ) = N k 1 k=1 ( N+1 N ) = 1 N(N + 1)(N + 1) 1 1 (N + 6 N 4 1) = 1 1 (N 1) If we ote that the joit distributio of the (depedet) r.v. R 1,, R 1 is symmetrical, we fid:

173 E(W) = E( 1 i=1 R i ) = 1 E(R 1 ) = 1 N+1 = 1 1(N + 1). var(w) = var ( R i ) = cov(r i, R j ) = var(r i ) + cov(r 1, R ) i=1 E(R 1 R ) = k l P(R 1 = k ad R = l) k l i j 1 i=1 i j = 1 var(r 1 ) + ( 1 1 ) cov(r 1, R ) = k l P(R 1 = k) P(R = l R 1 = k) = k l 1 N 1 N 1 k l N N = 1 N 1 [ k l N 1 k=1 l=1 = 1 N 1 N 1 [(1 N(N + 1)) k l] k = l k 1 N(N + 1)(N + 1)] 6 = 1 (N + 1)(3N + ), So: 1 cov(r 1, R ) = E(R 1 R ) E(R 1 )E(R ) = 1 (N + 1)(3N + ) (N+1 1 ) = 1 (N + 1) 1 Hece: var(w) = (N 1) + ( 1 1 ) 1 (N + 1) = (N 1) Property 7.5. assumes that the uderlyig distributios of the two samples are the same, which is the assumptio uder the ull hypothesis. The alterative hypothesis assumes that the distributios have the same shape, but oe of the distributios is shifted to the right of the other distributio, the shift alterative. Illustrated i a graph: l If the graph of the desity of X is shifted to the right of the desity of Y, so that the relatio of both desities ca be give by f X (x) = f Y (x a) with a > 0. We will adopt this otatio with desity fuctios to state the hypotheses. Note Givig the hypotheses i terms of (shifted) desities suggests that Wilcoxo`s rak sum test is oly applicable to cotiuous distributios. But it ca be applied to discrete variables as well,

174 7-14 provided that the distributio of the discrete variable ca be approximated by cotiuous distributio (the rage should ot cosist of a small umber of values that the variable ca attai). Usually we will have ties (observatios with the same value, see the ote below) i that case. Furthermore it is ot ecessary that the shapes of the distributio are exactly the same (apart from a shift), but it is sufficiet that uder the alterative the percetiles of oe distributio should be systematically larger tha the correspodig percetiles of the other distributio. I this kid of problems the hypotheses are usually give with the distributios fuctio, e.g.: test H 0 : F X = F Y agaist H 1 : F X F Y ad F X F Y. Example A studet has to produce a paper o the differeces of muicipal taxes for buildig permits. The presumptio is that the permit rates i large cities are higher tha similar permits of tows i the coutry side. He coducted his research by trackig dow the permit rates for reovatio plas of i 8 cities (> ihabitats) ad i 8 tows. Here are the results (i ): r City Tow Coduct a o-parametric test with α 0 = 0.10 as to verify whether the rates i the cities are higher tha i tows. Solutio: we will apply Wilcoxo`s rak sum test as a o-parametric alterative of the two idepedet samples t-test. First we will order the observatios ad determie the raks of the permit rates of the 8 cities: Observatio Rak City * * * * * * * * 8 So W = i=1 R(X i ) = = 78. The testig procedure applied: 1. We have two idepedet radom samples X 1,.., X 8 ad Y 1,, Y 8 of permit rates i cities ad tows, resp.: the desity fuctios f X ad f Y are ukow.. We will test H 0 : f X (x) = f Y (x) agaist H 1 : f X (x) = f Y (x a) where a > 0, with α 0 = i=1. 3. Test statistic W = R(X i ) 4. Sice 1 = > 5, W is approximately ormally distributed, with: μ = E(W) = 1 1(N + 1) = 1 8 (16 + 1) = 68 ad σ = var(w) = (N + 1) = (16 + 1) Observed: W = This test is right-sided: reject H 0 if the upper-tailed p-value P(W 78 H 0 ) α 0. P(W 78 H 0 ) c.c = P(W 77.5 H 0 ) P (Z ) 1 Φ(1.00) = 15.87% 90.67

175 p-value 15.87% > 10% = α 0, 8. At a 10% sigificace level the observatios do ot prove sufficietly that cities use systematically higher permit rates tha tows. Cotiuity correctio Note that we applied cotiuity correctio whe calculatig the p-value i step 6 of Wilcoxo s rak sum test. Similarly to the cotiuity correctio i case of ormal approximatio of biomial probabilities, the approximatio with cotiuity correctio is more accurate here as well. Ties Whe raks have to be determied, equal observatios occur i practice: the observatios i these ties should be awarded the same rak, the mea of the raks. If, for istace, a tie cosists of 4 equal observatios with raks 4, 5, 6 ad 7, they all get the rak 5.5. As before, the statistic W couts oly the raks of the x-values. E(W), the expected rak sum uder H 0, will remai the same but the variace chages. Though the formula of the variace is ot part of the course cotet we will give it for the sake of completeess. If we defie t j as the umber of observatios i tie j, whe the ties are umbered i a ascedig order, the variace of the rak sum of the x-values is: var(w) = (N 3 3 j t j ) N(N 1) We ote that, if there are oly a few ties, the value of var(w) will ot chage much: if, for istace, i example oly two observatios are the same (so t j =, for oe tie with two observatios ad t j = 1 for all other ties of oe observatio), the: var(w) = 1 1 (N 3 3 j t j ) = (163 [ ]) (versus without ties). 1 N(N 1) 1 16(16 1) So if there are ot may ties, rule of thumb = less tha 0% of the observatios, we ca use the formula without ties as a good approximatio. But if there are ties, we will skip the cotiuity correctio sice the value of W is ot always iteger. Overview of two-samples problems with respect to the differece of two populatio meas: Populatio model N(μ 1, σ 1 ) ad N(μ, σ ) Not-ormal, (arbitrary distributio) 1 ad σ 1 e σ σ 1 = σ, all 1, σ 1 σ, 1 40 ad ad 40 1 < 40 or < 40 bouds CI for μ 1 μ X Y ± c S ( ) X Y ± c S 1 X Y ± c S 1 + S 1 + S 1 Test o H 0 : μ 1 μ = Δ 0 T= X Y Δ 0 S ( ) Z = X Y Δ0 S 1 1 +S Z = X Y Δ0 S 1 1 +S Wilcoxo`s Rak Sum Test: W = R(X i ) i Fid c i CI or c i RR i the t 1 + -table N(0,1)-table N(0,1)-table, approximatio!

176 Exercises 1. Two brads (A ad B) of copper polish are compared: 3 copper plates were exposed to every kid of weather o differet places i the coutry. After a period the plates were collected ad divided i two: oe half was treated with copper polish A ad the other with copper polish B. The results were show to a committee of experts without metioig the brad. The committee graded the polish result for all 46 half plates, as show i the table below. plate A B a. Is this a problem with two idepedet samples or paired samples? b. Is the ormal distributio a proper model for the preseted observatios? c. Ivestigate (test) whether there is a systematic differece i grades of polish A ad polish B. Use α 0 = 0.05 ad, if ecessary ad possible, a ormal approximatio.. A cosumer`s orgaizatio tests the quality of service of several iteret providers. Oe of the aspects to be tested is the waitig time for cliets of the telephoic help desk. For oe iteret provider the average waitig time was 3 miutes ad 0 secods, so 00 secods, which was way larger tha the mea i the market. The iteret provider promised to improve its service. Ad oe year later the cosumer`s orgaizatio ra aother test. The results of both tests are show below. Sample size Mea (i sec) Stadard deviatio (i sec). Last year This year The histogram of both samples showed a strog skewess (similar as the expoetial distributio. We wat to test whether the iteret provider succeeded i reducig the waitig times. a. Which test seems to be the most suitable i this case? b. Coduct the chose test i a. with α = 5%. c. Compute the p-value of the test i b.: what does this probability tell us about the stregth of the proof i b.? 3. The mea temperatures (i ) i July i De Bilt durig a twety year period were, after orderig:

177 7-17 a. Use the Shapiro-Wilk`s test to verify whether the ormal distributio applies to the Julytemperatures. Use α = 5% ad the 8 steps of the testig procedure. The (mea) July temperature i Maastricht is 17.4, derived from statistics over 100 years. Do the 0 observatios i De Bilt show that the July temperature is lower i De Bilt? We will test this presumptio i two ways (with ad without ormality assumptio): b. Coduct a t-test to verify whether the expected July temperature i De Bilt is less tha 17.4 (α = 5%). c. Coduct a o-parametric test as a alterative for the test i b. First explai the meaig of the ull hypothesis. 4. (exercise 7 of chapter 1 revisited, ow icludig Shapiro-Wilk`s test) A group of 38 owers of the ew electric car Nissa Leaf is willig to participate i a survey which aims to determie the radius of actio of these cars uder real life coditios (accordig to Nissa about 160 km). The owers reported the followig distaces, after fully chargig the car. The results are ordered. Furthermore a umerical summary ad two graphical presetatios are added. Oe of the questios to be aswered is whether the ormal distributio applies. I their evaluatio the researchers stated that the observatio ca be cosidered to be a radom sample of the distaces of this type of cars Numerical summary: Sample size 38 Sample mea Sample stadard deviatio 9.15 Sample variace Sample skewess coefficiet Sample kurtosis 3.09 a. Use the box plot method to determie (potetial) outliers.

178 7-18 b. Assess whether the ormal distributio is a justifiable model based o, respectively: 1. The umerical summary.. The histogram 3. The Q-Q plot What is your total coclusio? c. Performig Shapiro-Wilk`s test the observed value W = was foud. Fid i the table the value of the coefficiet a 3 of X (3) (i the formula of W). d. Decide o the basis of the observed value of Shapiro Wilk`s W = whether the ull hypothesis of a ormal distributio has to be rejected at α = 10%. (Note that you are ot asked to give all 8 steps of the procedure here: 5-8 is sufficiet.) 5. Exercise 6 of chapter 5 revisited: Is there a differece i crop quatity per are (100 m ) for two wheat varieties? Uder equal coditios the followig results were foud: (I the table below the observed quatities are give i oe decimal ow.) Variety A Variety B a. Which parametric test was coducted o the origial observatios ad what was the result? b. What is the o-parametric alterative of the test i a. if the assumptio of ormal distributios is ot justified? c. Coduct a o-parametric test with α = 1% to verify whether the crop quatities are differet. Use the p-value to decide. 6. The result of two samples is available: i x i y i Assume that x 1,, x 4 is a e realizatio of a radom sample of X ad y 1,, y 4 of a radom sample of Y. The samples are idepedet. The (ukow) expectatios ad variaces will be deoted as μ X, σ X, μ Y ad σ Y. Use i parts a. ad b. the additioal assumptio of ormal distributios. a. Test, at a sigificace level α 0 = 0.10, the assumptio of equal variaces. b. Test H 0 : μ X = μ Y agaist H 1 : μ X < μ Y, assumig that σ X = σ Y. Coduct the test usig the p-value ad a sigificace level α 0 = c. The o-parametric alterative: apply Wilcoxo`s rak sum test to verify whether Y is stochastically (structurally) larger tha X, at a sigificace level α 0 = 0.05.

179 Chapter 8 The distributio of S ad related topics 8-1 This is a techical chapter where we prove that ( 1)S σ has the χ 1 -distributio where S is the sample variace with respect to a radom sample of a ormal distributio (Property 3.3.a). Furthermore we prove property 3..5d ad we study the two samples problems agai with ormal distributios ad commo variace. This chapter is also a preparatio for the theoretical part of chapter 10 about regressio. I this chapter liear algebra is used as a tool withi statistics. This is a chapter without exercises. Theoretical questios about this chapter may be part of the writte examiatio. 8.1 A umber of distributios I this sectio we repeat a umber of distributios. We itroduce liear algebra by cosiderig stochastic vectors. Our mai goal is itroductio of the multivariate ormal distributio ad a few properties of this distributio. Defiitio F-distributio A radom variable F has the F g f -verdelig (the F distributio wit f ad g degrees of freedom) if F ca be represeted i the followig way: F = U/f V/g where U ad V are idepedet radom variables, U havig the χ f distributio ad V havig the χ g distributio. Defiitio 8.1. Chi square distributio (χ -distributio) U 1 U If the elemets U i of a stochastic vector U = ( ) are idepedet ad all are distributed U accordig to a stadard ormal distributio, the U i = U T i U has the χ distributio (the χ - distributio with degrees of freedom).

180 8- Defiitio (Studet s) t-distributio A radom variable T has a t f distributio (t-distributio with f degrees of freedom) if T ca be Z represeted as follows: T = V/f where Z ad V are idepedet radom variables, Z havig the stadard ormal distributio ad V havig the χ f distributio. Note that if T has the t distributio with f degrees of freedom, the T has the F-distributio with 1 ad f degrees of freedom. From ow o we cosider sometimes stochastic vectors, vectors whose elemets are radom variables. For these radom vectors we formally defie expectatios, show some properties for these expectatios, ad itroduce the variace covariace matrix. Defiitio U 1 E(U 1 ) U The expectatio E(U) of the radom vector U = ( E(U ) is the vector E(U) = ( ) ). U E(U ) Property 8.1.5: E(U + V) = E(U) + E(V) Proof: The vector E(U + V) has elemets E(U i + V i ) = E(U i ) + E(V i ), right had side is the elemet i of E(U + V). Property 8.1.6: E(AU) = A E(U) for a (oradom) matrix A (if the product AU is defied properly). Proof: Elemet i of E(AU) is equal to E( j a ij U j ), if the matrix A has elemets a ij. Usig probability theory we have E( j a ij U j ) = j a ij E(U j ), which is elemet i of the matrix A E(U). Defiitio The variace covariace matrix of a stochastic vector For a stochastic vector U it is useful to defie a matrix cotaiig all variaces var(u i ) ad covariaces cov(u i, U j ). It is the followig matrix:

181 8-3 var(u 1 ) cov(u 1, U ) cov(u 1, U ) cov(u (, U 1 ) var(u ) cov(u, U ) ) cov(u, U 1 ) cov(u, U ) var(u ) This matrix will be deoted by var(u), sice it geeralizes the variace of a radom variable. As var(u i ) = cov(u i, U i ) all elemets of var(u) ca be writte as covariaces: cov(u 1, U 1 ) cov(u 1, U ) cov(u 1, U ) cov(u var(u) = (, U 1 ) cov(u, U ) cov(u, U ) ). cov(u, U 1 ) cov(u, U ) cov(u, U ) The variace covariace matrix var(u) ca be expressed shortly i the followig way: var(u) = E((U µ)(u µ) T ) with μ = (μ 1, μ,, μ ) T = E(U). Here it is uderstood that for a stochastic matrix like (U µ)(u µ) T a expectatio ca be defied as well, it is the correspodig matrix of the expectatios of the respective elemets. The (i, j) elemet of the stochastic matrix (U µ)(u µ) T is (U i μ i )(U j μ j ). The (i, j) elemet of the matrix E((U µ)(u µ) T ) is therefore equal to We hece proved the followig property: E ((U i μ i )(U j μ j )) = cov(u i, U j ) Property : var(u) = E((U µ)(u µ) T ) for stochastic vectors U with μ = E(U). For calculatig variaces it is coveiet to itroduce the followig rule. Property Cosider a stochastic vector U havig expectatio µ = E(U) ad variace covariace matrix Σ = var(u) For oradom matrices A the followig equality holds: var(au) = AΣA T (if the product AU is defied properly). Proof: The (i, j) elemet of the matrix var(au) is the expectatio of the (i, j) elemet of the stochastic matrix (AU Aμ)(AU Aμ) T, where we applied property for E(AU) = Aμ.

182 8-4 Let a ij deote the (i, j) elemet of the matrix A ad let σ ij = cov(u i, U j ) be the (i, j) elemet of Σ = var(u). We rewrite (AU Aμ)(AU Aμ) T a little: (AU Aμ)(AU Aμ) T = A(U μ) (A(U μ)) T = A(U μ)(u μ) T A T, where we applied the rule (AB) T = B T A T for matrix products. Note that the (k, m) elemet of (U μ)(u μ) T is (U k μ k )(U m μ m ) ad that the (m, j) elemet of the trasposed matrix A T is equal to a jm. The (i, j) elemet of A(U μ)(u μ) T A T is therefore k m a ik (U k μ k )(U m μ m )a jm. Its expectatio is the (i, j) elemet of var(au), it is moreover equal to E ( a ik (U k μ k )(U m μ m )a jm ) = a ik E((U k μ k )(U m μ m ))a jm k m k m = a ik σ km a jm k m The last expressio is the (i, j) elemet of the matrix AΣA T, so we coclude that var(au) = AΣA T Multivariate ormal distributios Multivariate ormal distributios are ormal distributios for radom vectors X. We start with expressig the ormal distributios of a radom variable X i some coveiet way. The basic feature of the ormal distributio for radom variables X is the followig property: the radom variable X has the N(µ, σ ) distributio if Z = (X μ) σ has the stadard ormal distributio ( N(0,1) distributio). The other way aroud: we ca write X = µ + σz wheever the radom variable X has the N(µ, σ ) distributio. We use the expressio X = µ + σz for geeralizig the ormal distributios i order to defie multivariate ormal distributios for stochastic radom vectors.

183 Defiitio Stadard ormal distributio of dimesio We firstly defie a stadard ormal distributio of dimesio. This is the distributio of the Z 1 Z 8-5 radom vector Z = ( ), where the elemets Z i are idepedet ad all elemets Z i have the Z stadard ormal distributio. This stadard ormal distributio of dimesio is deoted by N(0, I ), or N(0, I) if the dimesio is obvious. Defiitio multivariate ormal distributio N(μ, Σ) A radom vector X = (X 1, X,, X ) T has the multivariate ormal distributio with expectatio μ = (μ 1, μ,, μ ) T ad variace covariace matrix Σ ( matrix), otatio: X ~ N(μ, Σ), if X ca be writte as: X = µ + AZ, with Z havig a stadard ormal distributio of dimesio r( ) ad A beig a oradom matrix such that AA T = Σ ad Z ~ N(0, I). Remarks about the multivariate ormal distributio: Remark 1 Ideed E(X) = µ ad var(x) = A var(z) A T = A A T = Σ if X = µ + AZ. The matrix A, however, is ot determied uiquely by AA T = Σ. Remark The regular case is that A is a ivertible matrix, but i priciple the matrix A might be a r matrix with r < (the Z ~ N(0, I r ) ). Remark 3 Check that a T X ~ N(a T µ, a T Σa) if X~N(µ, Σ). Here a T X = a 1 X 1 + a X + + a X is a liear fuctio of the elemets X i. From a theoretical poit of view the multivariate ormal distributio is already determied by the requiremet that for each vector a 0 the radom variable a T X has the (uivariate) ormal distributio N(a T µ, a T Σa). I some textbooks the defiitio of the multivariate ormal distributio is therefore as follows: X has the multivariate ormal distributio with expectatio µ ad variace covariace matrix Σ if a T X ~ N(a T µ, a T Σa) for each vector a 0. I this course we restrict to defiitio

184 We coclude this sectio with two properties of the multivariate ormal distributio. 8-6 Property If X ~ N(µ, Σ) the A X ~ N(Aµ, AΣA T ), for oradom matrices A (if the product AX is defied properly). Proof: Sice X ~ N(µ, Σ) we ca write X = µ + BZ with a (oradom) matrix B such that BB T = Σ ad Z~N(0, I). The we get A U = Aµ + AB Z which has a multivariate ormal distributio with expectatio Aµ ad variace covariace matrix var(a U) = (AB)(AB) T by defiitio. It remais to show that the variace covariace matrix of A X is give by AΣA T but this results from (AB)(AB) T = A B B T A T = A Σ A T. Cosider a radom vector X which cotais two subvectors X 1 ad X, X = ( X 1 X ). Defie μ 1 = E(X 1 ) ad μ = E(X ), so μ = E(X) = ( μ 1 μ ). Partitio the variace covariace matrix as follows: Σ = ( Σ 11 Σ 1 ), Σ 1 Σ with Σ 11 = var(x 1 ), Σ = var(x ) ad Σ 1 ( Σ T 1 = Σ 1 ) cotaiig the rest of the covariaces of the elemets of X. Without proof we preset the followig property. Property If X = ( X 1 X ) ~ N (( µ 1 µ ), ( Σ 11 Σ 1 Σ 1 Σ )), the X 1 ad X are idepedet if ad oly if Σ 1 = The distributio of the sample variace S Suppose we observe idepedet radom variables X 1, X,, X havig all a (commo) N(μ, σ )-distributio. Let us study the sample variace S = i (X i X ) ( 1) more closely. We wat to establish the distributio of S.

185 8-7 Startig poit is that we represet our data X 1, X,, X by meas of oe vector X = ( ). X Note that the variables Z i = (X i μ) σ are idepedet ad are distributed accordig to the stadard ormal distributio. Sice μ μ Z X = ( ) + AZ with matrix A = σ I ad Z = ( ), μ Z where I is the idetity matrix of order, we may say that the vector X has the multivariate ormal distributio with expectatio (μ, μ,, μ) T ad variace covariace matrix Σ = AA T = σ I. Let us cosider a orthoormal basis u 1, u,, u for the space of -dimesioal vectors, so u i = 1 ad u i T u j = 0 for i j. We choose the first vector of the basis as follows: For the vector X we obtai: 1 u 1 = 1 ( 1 ) X = (u T 1 X)u 1 + (u T X)u + + (u T X)u with u T 1 X = i X i Z 1 X 1 X = X The we are ready to rewrite the sum of squares (X i X ) i : (X i X ) i = X X 1 = (u T 1 X)u 1 + (u T X)u + + (u T X)u X 1 Where 1 stads for the vector havig oly elemets that are equal to 1. Note that (u T 1 X)u 1 = X 1 ad thus (X i X ) i = (u T X)u + (u T 3 X)u (u T X)u = (u T i X) The sum of squares (X i X ) i ca be writte as the sum of 1 squares! i=

186 u T X We ow study the radom vector V = ( u T T 3 X ) = U X with the (-1) matrix U = ( u 3 ). u T T X u 8-8 Accordig to property the vector V has a multivariate ormal distributio with expectatio μ μ E(V) = U ( ) = 0 (ull vector) ad variace covariace matrix μ var(v) = Uvar(X)U T = U σ I U T = σ ( ) (u u 3 u ) = σ I 1. T u Because of property we coclude that the radom variables u T X, u 3 T X,, u T X are T u T u 3 idepedet (all covariaces cov(u i T X, u j T X) are zero for i j). u T (Re)defiig radom variables Z i = u T i X σ we fid that Z, Z 3,, Z are idepedet ad are all distributed accordig to the stadard ormal distributio. Usig the defiitio of the chi square distributio we coclude that i (X i X ) /σ = i= Z i has the χ 1 distributio. Notig that i (X i X ) /σ = ( 1)S σ, the distributio of the sample variace S ca be stated as follows: ( 1)S σ has the χ 1 distributio. Note that it ca be prove i a similar way that u 1 T X, u T X,, u T X are idepedet. Notig that X is a fuctio of u 1 T X ad that S is a fuctio of u T X, u T X,, u T X we coclude that X ad S are idepedet 8.3 Cosistecy of the sample variace S We shall show that S is also a cosistet estimator of σ, i case of idepedet radom variables X 1, X,, X havig all a (commo) N(μ, σ )-distributio. Therefore we eed the desity of the chi square distributio, sice ( 1)S σ has the χ 1 distributio. Without proof (we do t bother to prove it because of time restrictios) we preset the desity fuctio of a radom variable Z havig the chi square distributio with f degrees of freedom:

187 8-9 f 1 f(z) = z e z/ (Γ(f ) f where Γ(. ) is the well-kow gamma fuctio defied by Γ(α) = 0 x α 1 exp( x) dx ) ( 0 < α < ). Useful properties of the gamma distributio: Γ() = ( 1)!, for itegers ad Γ(α + 1) = αγ(α), for 0 < α < The χ f is a special case of the gamma distributio with α ad β, which has desity f(z) = z α 1 e z/β (Γ(α)β α ) The χ f distributio is a gamma distributio with parameters α = f ad β =. If Z has a gamma distributio the E(Z) = z z α 1 e z β (Γ(α)β α ) dz = z α e z β (Γ(α)β α ) dz 0 0 = β x α e x Γ(α)dx 0 (x = z β, dz = β dx) = β Γ(α + 1) Γ(α) = α β ( Γ(α + 1) = αγ(α) ) ad E(Z ) = z z α 1 e z β (Γ(α)β α ) dz = z α+1 e z β (Γ(α)β α ) 0 0 dz = β x α+1 e x Γ(α)dx 0 ( x = z β, dz = β dx) = β Γ(α + ) Γ(α) = β (α + 1) Γ(α + 1) Γ(α) = β (α + 1) α ad var(z) = E(Z ) (E(Z)) = α(α + 1)β α β = α β I the special case of the χ f distributio (α = f ad β = ) we get: E(Z) = α β = (f ) = f ad var(z) = α β = (f ) 4 = f

188 8-10 Sice ( 1)S σ has the χ 1 distributio we get E(( 1)S σ ) = 1 ad var(( 1)S σ ) = ( 1) E(S ) = σ ad var(s ) = σ 4 ( 1) Now we are able to show the cosistecy of S : P( S σ > c) var(s ) c (Chebyshev) = σ 4 (( 1)ε ) (usig formula for var(s ) ) Hece we coclude that lim P( S σ > c) = 0 for each positive umber c > 0. So we coclude that S is a cosistet estimator of σ. 8.4 About the distributio of T = (X µ) (S ) Let us ow prove property 3..5d. Suppose we observe agai idepedet radom variables X 1, X,, X havig all a (commo) N(μ, σ )-distributio. The commoly used estimators of µ ad σ are X ad S. Some results we proved i this chapter i this reader: (1) X ad S are idepedet, () X has the N(µ, σ /) distributio, (3) ( 1)S σ has the χ 1 distributio. Now we are ready to prove the distributio of T = (X µ) (S ). Defie Z = (X µ) (σ ) which has the stadard ormal distributio ( N(0,1) distributio ) ad defie V = ( 1)S σ the we get T = Z (S σ) = Z V ( 1) where we used V ( 1) = S σ = S σ. From this represetatio we coclude that T has t distributio with 1 degrees of freedom, see defiitio

189 Two samples problems: ormal case with commo variace Sometimes i a statistical aalysis two samples are ivolved. The classical assumptios are the followig. The measuremets X 1, X,, X 1 (first sample) ad Y 1, Y,, X (secod sample) are idepedet. The measuremets X 1, X,, X 1 have the N(μ 1, σ 1 ) distributio ad the measuremets Y 1, Y,, X have the N(μ, σ ) distributio. May times the equality of variaces, σ 1 = σ, is assumed. For checkig this assumptios ofte the ratio of the sample variaces is studied, where S X = i (X i X ) ( 1 1) ad S Y = i (Y i Y ) ( 1). The samples variaces S X ad S Y are idepedet because of the idepedece of the measuremets. Applyig the theory of the previous sectios we get S X U = ( 1 1)S X σ 1 ~ χ 1 1 ad V = ( 1)S Y σ ~ χ 1 S Y So S X = σ 1 U ( 1 1) ad S Y = σ V ( 1) ad thus If we write this equality as S X S Y = (σ 1 σ ) U ( 1 1) V ( 1) S X /σ 1 S Y /σ = U ( 1 1) V ( 1) The left-had side is the pivot we used to costruct a cofidece iterval for the proportio σ 1 σ i exercise 5.9. I case of equality of variaces, σ 1 σ = 1, the ratio sample variaces S X S Y satisfies the structure of a F distributio, see the defiitio

190 8-1 If σ 1 σ = 1 the S X S Y has the F-distributio with 1 1 ad 1 degrees of freedom. So if we are testig H 0 : σ 1 = σ the the test statistic F = S X S Y has ideed the F distributio with 1 1 ad 1 degrees of freedom, uder the ull hypothesis. From ow o we assume equality of variaces: σ 1 = σ. I chapter 5 cofidece itervals for μ 1 μ ad tests for testig H 0 : μ 1 μ = 0 are based o the followig fact. Property If X 1, X,, X 1 (first sample) ad Y 1, Y,, X (secod sample) are idepedet, the measuremets X 1, X,, X 1 have the N(μ 1, σ ) distributio ad the measuremets Y 1, Y,, X have the N(μ, σ ) distributio, the T = (X Y ) (μ 1 μ ) S ( has the t distributio with df = 1 + degrees of freedom, 1 ) where S is the pooled variace: S = S X S Y Proof of property 8.5.1: Because the sample meas are idepedet, X ~ N(μ 1, σ 1 ) ad Y ~ N(μ, σ ) we get X Y ~ N (μ 1 μ, σ ( ) ) ad furthermore Z = (X Y ) (μ 1 μ ) σ ( ) ~ N(0,1) Note furthermore: T = Z. So we have to study S = 1 1 S 1 + X + 1 S 1 + Y. S σ Applyig the results of sectio 8. we get U = ( 1 1)S X σ ~ χ 1 1 ad V = ( 1)S Y σ ~ χ 1, moreover U ad V are idepedet. It is straightforward to see that W = ( 1 + )S σ = U + V

191 8-13 has a χ -distributio with 1 + degrees of freedom. Notig that Z ad W are idepedet ad that T = Z W/( 1 + ), we coclude from defiitio that T has a t-distributio with df = 1 + degrees of freedom Testig H 0 : μ 1 = μ agaist H 1 : μ 1 μ Let us ow cosider testig H 0 : μ 1 = μ agaist H 1 : μ 1 μ, agai assumig idepedet radom variables X 1, X,, X 1, Y 1, Y,, X with X i ~ N(μ 1, σ ) ad Y j ~ N(μ, σ ). Accordig to chapter 5 the usual test is the (two-sided) two samples t-test, where we reject the ull hypothesis if T c or T c, with T = level of sigificace α. (X Y ) S ( ) ad where c is determied by the This t-test is equivalet to likelihood ratio test for this problem. I the remaider of this sectio we shall show this. Check this evaluatio oce i your lifetime. We do t ask for it i the writte examiatio. We shall evaluate the likelihood ratio test. We have to geeralize the likelihood ratio test a little as ow two samples are ivolved istead of oe. I the followig we write Λ shortly for Λ(x 1, x,, x 1, y 1, y,, y ). I the same way we write L 0 for the likelihood uder the ull hypothesis ad write L for the likelihood for the model without the restrictio of H 0. We the have Λ = sup μ, σ L 0 L sup μ 1, μ, σ where μ is the commo expectatio of the ull hypothesis. Let us start with likelihood L: L = 1 πσ exp ( 1 (x i μ 1 ) σ ) 1 πσ exp ( 1 (y i μ ) σ ) i = ( πσ ) exp ( 1 { (x i μ 1 ) + (y i μ ) } σ ) i ( πσ ) exp ( 1 { (x i x ) + (y i y ) } σ ) i i i i

192 ( πσ ) exp ( 1 { (x i x ) + (y i y ) } σ ) = ( πσ ) exp ( 1 ( 1 + )), i with σ = { i (x i x ) + i(y i y ) } ( 1 + ), where we used our kowledge of earlier similar problems. i Let us ow study L 0 : L 0 = 1 πσ exp ( 1 (x i μ) σ ) 1 πσ exp ( 1 (y i μ) σ ) i = ( πσ ) exp ( 1 { (x i μ) + (y i μ) } σ ) i ( πσ ) exp ( 1 { (x i μ ) + (y i μ ) } σ ) i ( ) exp ( 1 { (x i μ ) + (y i μ ) } σ 0 ) πσ = ( ) exp ( 1 ( 1 + )), πσ 0 with μ = ( i x i + i y i ) ( 1 + ) ad σ 0 = ( i (x i μ ) + i(y i μ ) ) ( 1 + ) i Note that these estimates are, as a matter of fact, the usual maximum likelihood estimates if you pool the two samples together. i i i i We thus get: Λ = sup μ, σ L 0 sup L = (σ σ 0 ) ( 1+ )/ μ 1, μ, σ Sice we reject H 0 for small values of Λ, we (equivaletly) reject H 0 for large values σ 0 σ. Thus we ca take as test statistic σ 0 σ = ( i (X i μ ) + i(y i μ ) ) ( i (X i X ) + i(y i Y ) ),

193 8-15 but we shall rewrite this expressio, i order to arrive at the usual test statistic. Note i (X i X ) + i(y i Y ) the pooled variace S = ( 1 + ) S, where we oly used the defiitio of Let us ow rewrite the umerator (X i μ ) + (Y i μ ). Because of we fid: i μ = X Y (X i μ ) + (Y i μ ) = (X i X + X μ ) + (Y i Y + Y μ ) i i where we used that (always) (X i X ) i Sice furthermore i i = (X i X ) + (Y i Y ) + 1 (X μ ) + (Y μ ) i i i +(X μ ) (X i X ) + (Y μ ) (Y i Y ) = ( 1 + ) S + 1 (X μ ) + (Y μ ), = 0 ad (Y i Y ) i = 0. 1 (X μ ) + (Y μ ) = 1 (X X 1 + Y ) + (Y X 1 + Y ) we fially get: σ 0 = 1 ( 1 + X 1 + Y ) + ( X Y ) = 1 ( 1 + ) (X Y ) + ( ) (X Y ) = ( 1 + ) (X Y ) = 1 ( 1 + ) ( 1 + ) (X Y ) = (X Y ) σ = {( 1 + ) S + 1 (X Y ) } {( ) S } = {( 1 + ) + 1 (X Y ) /S } ( ) = {( 1 + ) + T } ( 1 + ), i i

194 8-16 X Y where T = S ( ). Therefore the likelihood ratio test ca be carried out as follows: reject H 0 if T c or T c. The critical value c has to be chose to meet the level of sigificace α. So the two-sided two samples t-test is equivalet to the likelihood ratio test.

195 Type of variable + Probability Model Numerical variables: N(μ, σ ) Numerical variables: ot-ormal Categorical variables: B(, p) Categorical variables: Multiomial Overview parametric tests ad o-parametric samples 1 sample 1 sample id. samples paired samples ukow µ ad σ kow σ F-test o H 0 : σ 1 = σ idepedet samples t-test o μ 1 μ 1-sample t-test o differece χ -test o σ 1 sample t-test o μ 1 sample z-test o μ id. samples large samples z-test o μ 1 μ 1, 1 sample id. samples r c-cross table large Ay -table id. samples 1 sample σ 1 = σ large 1, large 1, E 5 E < 5 E 5 1 sample biomial test o p samples z-test o p 1 p Fisher s exact test χ -test o homogeeity χ -test o idepedece alteratives Wicoxo s Rak Sum Test Sig Test o the Media of the differeces Sig Test o the Media k categories E 5 χ -test o a specific distrib. Tests o specific distributios Fully specified distributio via k categories Shapiro Wilk' s test o ormality

Stadard ormal probabilities Tab-1 The table gives the distributio fuctio Φ for a N(0,1)-variable Z Φ z = P Z z = 1 z e x dx π Last colum: N(0,1)-desity fuctio (z i 1 dec.

196 Stadard ormal probabilities Tab-1 The table gives the distributio fuctio Φ for a N(0,1)-variable Z Φ z = P Z z = 1 z e x dx π Last colum: N(0,1)-desity fuctio (z i 1 dec.): φ z = 1 z e π Secod decimal of z z φ(z)

1 Inferential Methods for Correlation and Regression Analysis

1 Inferential Methods for Correlation and Regression Analysis 1 Iferetial Methods for Correlatio ad Regressio Aalysis I the chapter o Correlatio ad Regressio Aalysis tools for describig bivariate cotiuous data were itroduced. The sample Pearso Correlatio Coefficiet