GG313 GEOLOGICAL DATA ANALYSIS

LECTURE NOTES, PAUL WESSEL

SECTION 2: TESTING OF HYPOTHESES

Much of statistics is concerned with testing hypotheses against data using several standard techniques. At the core of these tests lies the concept of the "null hypothesis", denoted $H_0$. A null hypothesis is set up, and we use our tests to see whether we can reject it. For example, if we want to test whether two rock samples have different densities, we form the null hypothesis that they have equal densities and test whether we can reject $H_0$.

We will illustrate all this with an example. It is claimed that the density of a particular sandstone is 2.35 g cm$^{-3}$. We are handed a sample of 50 specimens from an outcrop in the same area and decide on the criterion that the samples come from another lithological unit if the sample mean is less than 2.25 or larger than 2.45. This is a clear-cut criterion for accepting or rejecting the claim that the samples are from the same unit, but it is not infallible. Since our decision will be based on a sample, the sample mean may be < 2.25 or > 2.45 even though the population mean is 2.35. We will therefore want to know the chances that we make a wrong decision, i.e., the probability that $\bar{x} < 2.25$ or $\bar{x} > 2.45$ even if $\mu = 2.35$. Here, $s$ (used as our estimate of $\sigma$) is 0.42. This probability is given by the area under the tails in Fig. 2-1.

[Fig. 2-1. We reject the null hypothesis when the computed statistic falls in the tail area. The central region is labeled "accept claim" and the two tails "reject claim".]

Since $n = 50 \gg 30$, we will treat our sample as being of infinite size. Then we have

$$s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{0.42}{\sqrt{50}} \approx 0.06.$$

We can now evaluate the normal scores
$$z_0 = \frac{2.25 - 2.35}{0.06} = -1.67, \qquad z_1 = \frac{2.45 - 2.35}{0.06} = 1.67.$$

We find the area under each tail to be

$$\frac{1}{2}\left[1 - \mathrm{erf}\left(\frac{1.67}{\sqrt{2}}\right)\right] \approx 0.0475.$$

Thus, the probability of getting a sample mean that falls in the tail area of the distribution is $p = 2 \times 0.0475 = 0.095$, or 9.5%. This result means there is a 9.5% chance that we will erroneously reject the hypothesis that $\mu = 2.35$ when it is in fact true. We call this committing a type I error.

Let us look at another possibility, where our test fails to detect that $\mu$ is not equal to 2.35. Suppose for the sake of argument that the true mean is 2.53. Then the probability of getting a sample mean in the range 2.25 to 2.45, and hence erroneously accepting the claim that $\mu = 2.35$, is given by the tail area in Fig. 2-2.

[Fig. 2-2. Possibility of committing a Type II error. Labels: "erroneously accept claim" between 2.25 and 2.45, "reject claim" outside; true mean 2.53.]

As before, $s_{\bar{x}} = 0.06$, so the normal scores become

$$z_0 = \frac{2.25 - 2.53}{0.06} = -4.67, \qquad z_1 = \frac{2.45 - 2.53}{0.06} = -1.33.$$

It follows that the area is

$$A = \frac{1}{2}\left[\mathrm{erf}\left(\frac{4.67}{\sqrt{2}}\right) - \mathrm{erf}\left(\frac{1.33}{\sqrt{2}}\right)\right] \approx 0.092,$$

or 9.2%. This is the risk we run of accepting the incorrect hypothesis $\mu = 2.35$. We call this committing a type II error.

We recognize that there are several possibilities when testing the null hypothesis. The table below summarizes them:

                   Accept H0           Reject H0
H0 is TRUE         Correct decision    Type I error
H0 is FALSE        Type II error       Correct decision

If the hypothesis is true but is rejected, we have committed a Type I error, and the probability of doing so is designated $\alpha$. In our example, $\alpha$ was 0.095. If our hypothesis is incorrect but we still accept it, we have committed a Type II error, and the probability of doing so is designated $\beta$. In our case, with $\mu = 2.53$, $\beta$ was 0.092.
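The two error probabilities in the sandstone example can be checked numerically. The sketch below is a minimal illustration using Python's standard-library `statistics.NormalDist`; the helper name `error_rates` is hypothetical. It takes the rounded standard error of 0.06 used in the text rather than recomputing it, so the results agree with the hand calculation to within rounding.

```python
from statistics import NormalDist

def error_rates(mu0, se, lo, hi, mu_true):
    """Type I (alpha) and type II (beta) probabilities for the rule
    'accept H0 if lo <= sample mean <= hi' (hypothetical helper)."""
    null = NormalDist(mu0, se)     # sampling distribution of the mean if H0 is true
    alt = NormalDist(mu_true, se)  # sampling distribution if the true mean is mu_true
    alpha = null.cdf(lo) + (1.0 - null.cdf(hi))  # both rejection tails under H0
    beta = alt.cdf(hi) - alt.cdf(lo)             # acceptance region under the alternative
    return alpha, beta

# Sandstone example: claimed mean 2.35 g/cm^3, acceptance interval 2.25 to 2.45,
# standard error rounded to 0.06 as in the text, alternative true mean 2.53.
alpha, beta = error_rates(mu0=2.35, se=0.06, lo=2.25, hi=2.45, mu_true=2.53)
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")
```

Passing the unrounded standard error instead shifts both probabilities slightly, a reminder that the percentages above carry the rounding of $s_{\bar{x}}$.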

Significance test

We saw in our example that the type II error probability depended on the value of $\mu$. Since $\mu$ is often not known, it is common to simply either reject $H_0$ or reserve judgment (i.e., never accept $H_0$). This way we avoid committing a type II error altogether, at the expense of never accepting $H_0$. We call this a significance test and say that the results are statistically significant if we can reject $H_0$. If not, the results are not statistically significant, and we attempt no further decisions. Hence, in statistics we can only disprove hypotheses, never prove them.

Differences between means

We will often want to know if an observed difference in sample means can be attributed to chance. We will again use Student's t-test. It is assumed that the two distributions have the same variance but possibly different means. We are interested in the distribution of $\bar{x}_1 - \bar{x}_2$, the difference in sample means. If the samples are independent and random, the difference distribution will be approximately normal with mean $\mu_1 - \mu_2$ and standard deviation

$$\sigma_e = \sigma_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \qquad (2.1)$$

where $\sigma_p^2$ is called the pooled variance:

$$\sigma_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \qquad (2.2)$$

We find the t-statistic by evaluating

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sigma_p \sqrt{1/n_1 + 1/n_2}} \qquad (2.3)$$

and test the hypothesis $H_0: \mu_1 = \mu_2$ based on the t-distribution for $\nu = n_1 + n_2 - 2$ degrees of freedom. For large $n_1, n_2$ the t-distribution becomes very close to a normal distribution, and we may instead use z-statistics based on

$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} \qquad (2.4)$$

We will illustrate the two-sample t-test with an example. We have obtained random samples of magnetites from two separate outcrops. The measured magnetizations in A m$^2$ kg$^{-1}$ are

Outcrop 1: {87.4, 93.4, 96.8, 86.1, 96.4}, $n_1 = 5$
Outcrop 2: {106.2, 102.2, 105.7, 93.4, 95.0, 97.0}, $n_2 = 6$

We state our null hypothesis $H_0: \mu_1 = \mu_2$; the alternative hypothesis is of course $H_1: \mu_1 \neq \mu_2$. We decide to use the 95% significance level, so $\alpha = 0.05$. In this case, $\nu = n_1 + n_2 - 2 = 9$, and a t-statistics table (Appendix 2.4) shows that the critical t value is 2.26; we will reject $H_0$ if our $|t|$ exceeds this critical value. From the data we find

$\bar{x}_1 = 92.0$ with $s_1 = 5.0$
$\bar{x}_2 = 99.9$ with $s_2 = 5.5$

Using Eq. (2.3) we obtain $t \approx -2.46$. Since $|t| > 2.26$, we must reject $H_0$. We conclude that the magnetizations at the two outcrops are not the same.

We have now put confidence limits on sample means and compared sample means to investigate whether two populations have different means. We now turn our attention to inferences about the standard deviation.

Inferences about the standard deviation

The most popular way of estimating $\sigma$ is to compute the sample standard deviation $s$. When investigating properties of $s$ and $\sigma$ we will be using the "chi-square" statistic

$$\chi^2 = \frac{(n - 1)s^2}{\sigma^2} = \sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{\sigma^2} \qquad (2.5)$$

The $\chi^2$ distribution depends on the degrees of freedom $\nu = n - 1$ and is restricted to positive values because of the power of 2. It portrays how the sample standard deviation would be distributed if we selected random samples of $n$ items. Fig. 2-3 shows a typical curve.

[Fig. 2-3. A typical chi-square distribution.]

In the same way we used $z_\alpha$ and $t_\alpha$, we now use $\chi^2_\alpha$ as the value for which the area to the right of $\chi^2_\alpha$ equals $\alpha$. Because the $\chi^2$ distribution is not symmetrical, we must evaluate the $\chi^2_{\alpha/2}$ and $\chi^2_{1-\alpha/2}$ critical values separately. And in the same way we put confidence intervals on $\mu$, we now use (2.5) to find

$$\chi^2_{1-\alpha/2} < \frac{(n-1)s^2}{\sigma^2} < \chi^2_{\alpha/2}$$

or

$$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}} \qquad (2.6)$$
which gives the $(1 - \alpha)$ confidence interval on the variance. For large samples ($n > 30$) this can be simplified to

$$\frac{s}{1 + z_{\alpha/2}/\sqrt{2n}} < \sigma < \frac{s}{1 - z_{\alpha/2}/\sqrt{2n}} \qquad (2.7)$$

Note that the confidence interval is not symmetrical about the sample standard deviation.

Testing standard deviations

We might want to test whether our sample standard deviation $s$ is consistent with a given population value $\sigma_0$. In such a case the null hypothesis becomes $H_0: \sigma = \sigma_0$, with the alternative hypothesis $H_1: \sigma \neq \sigma_0$. As usual, we select our level of significance to be $\alpha = 0.05$. Assume we have 15 estimates of temperatures with $s = 1.3$ °C, and we want to know if $s$ is any different from $\sigma_0 = 1.5$ °C, a value based on past experience. From $\alpha = 0.05$ and $\nu = 14$ we find the critical values from a table to be $\chi^2_{0.025} = 26.12$ and $\chi^2_{0.975} = 5.63$. Based on our sample statistic we compute

$$\chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{14 \times 1.3^2}{1.5^2} = 10.5.$$

Since $5.63 < 10.5 < 26.12$, we cannot reject $H_0$ at the 95% significance level. Instead we may accept $H_0$ or reserve judgment. This was a two-sided test since we must check that $\chi^2$ did not land in either of the two tails. For large samples ($n \geq 30$) the sampling distribution of $s$ is approximately normal with mean $\sigma$ and standard deviation $\sigma/\sqrt{2n}$, and we may use the simpler statistic

$$z = \frac{s - \sigma_0}{\sigma_0/\sqrt{2n}}$$

together with the standard z-statistics table.

Testing two standard deviations

In the t-test for differences between means we assumed that the standard deviations of the two samples were the same. Often this is not the case, and one should first test whether this assumption is valid, i.e., whether the two variances differ. The statistic most appropriate for such tests is called the F-statistic, defined as

$$F = \begin{cases} s_1^2/s_2^2, & s_1 > s_2 \\ s_2^2/s_1^2, & s_2 > s_1 \end{cases} \qquad (2.8)$$

For normal distributions this variance ratio follows a continuous distribution called the F distribution. It depends on two degrees of freedom, one for the sample in the numerator and one for the sample in the denominator, each equal to that sample's size minus 1 ($\nu_1 = n_1 - 1$, $\nu_2 = n_2 - 1$). As before, we will reject the null hypothesis $H_0: \sigma_1 = \sigma_2$ at the $\alpha$ level of significance and [possibly] accept the alternative $H_1: \sigma_1 \neq \sigma_2$ when our observed F statistic exceeds the critical value $F_{\alpha/2}$.

Example: In our case of magnetite magnetizations we assumed that the $\sigma$'s were approximately the same. Let us now show that this is actually justified. We find
$$F = \frac{s_2^2}{s_1^2} = \frac{5.5^2}{5.0^2} = 1.21.$$

From the table we find $F_{0.025}(\nu_1 = 5, \nu_2 = 4) = 9.36$. Hence we cannot reject $H_0$, and we conclude that the difference in sample standard deviations is not statistically significant at the 95% level.

The $\chi^2$ test

The last parametric test we shall be concerned with is the chi-squared test. It is based on a sample statistic made of normal scores that are squared and summed:

$$\chi^2 = \sum_{i=1}^{n} z_i^2 = \sum_{i=1}^{n} \left(\frac{x_i - \mu}{\sigma}\right)^2 \qquad (2.9)$$

If we drew all possible samples of size $n$ from a normal population and plotted $\sum z_i^2$, they would form the $\chi^2$ distribution mentioned earlier. The $\chi^2$ test is used to compare the shape of our data distribution to a distribution of known shape (usually a normal distribution). The test is most often used on data that have been categorized or binned. Assuming that our observations have been binned into $k$ bins, the test statistic is found as

$$\chi^2 = \sum_{j=1}^{k} \frac{(O_j - E_j)^2}{E_j} \qquad (2.10)$$

where $O_j$ and $E_j$ are the numbers of observed and expected values in the $j$'th bin. Note that $\chi^2$ is still non-dimensional since we are using counts, even though the denominator is not squared. With counts, the probability that $m$ out of $n$ counts will fall in a given bin $j$ is determined by the binomial distribution, with

$$\mu = E_j = n p_j, \qquad \sigma^2 = n p_j (1 - p_j) \approx n p_j = E_j$$

for small $p_j$. Plugging in for $\mu$ and $\sigma$ we find

$$\chi^2 = \sum_{j=1}^{k} \left(\frac{O_j - E_j}{\sqrt{E_j}}\right)^2 = \sum_{j=1}^{k} \frac{(O_j - E_j)^2}{E_j}.$$

As an example, consider the 48 measurements of salinity from Whitewater Bay in Florida (Table 2-1). We would like to know if these observations come from a normal distribution or not. The answer might have implications for models of mixing salt and freshwater. The first step is to normalize the data into normal scores. We find $\bar{x}$ = … and $s = 9.7$, and thus transform all values to $z_i = (x_i - \bar{x})/s$. We choose to bin the data into 5 bins chosen such that the area under the curve for each bin is the same, i.e., 0.2. Using tables for the normal distribution, we find that the corresponding z-values for the intervals are $(-\infty, -0.84)$, $(-0.84, -0.25)$, $(-0.25, +0.25)$, $(0.25, 0.84)$, $(0.84, \infty)$. Counting
the values in Table 2-1, we find the observed numbers of samples in the 5 bins to be 10, 11, 10, 5, and 12. These are the $O_j$'s. The expected values are all

$$E_j = \frac{n}{k} = \frac{48}{5} = 9.6.$$

Using (2.10) we find the observed value $\chi^2 = 3.04$. The $\chi^2$ distribution depends on $\nu$, the degrees of freedom, which normally is $\nu = k - 1 = 4$ in our case. However, we used our observations to compute $\bar{x}$ and $s$. This reduces $\nu$ by 2, leaving 2 degrees of freedom. From Appendix 2.6 we find the critical $\chi^2$ for $\nu = 2$ and $\alpha = 0.05$ to be 5.99. Since this is larger than our computed value, we conclude that we cannot reject the null hypothesis that the salinities were drawn from a normal distribution at the 95% significance level. We repeat that while we used a normal distribution in this example, the $E_j$ could have represented any other distribution.

[Table 2-1. Standardized scores of salinity measurements from Whitewater Bay. Columns: Number, Original, Standardized, in two side-by-side groups.]
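The parametric examples in this section (the pooled two-sample t-test, the $\chi^2$ test on a single standard deviation, the F ratio, and the binned $\chi^2$ goodness-of-fit test) reduce to a few lines of arithmetic. The sketch below uses only the Python standard library, and the function names are hypothetical. Critical values are not computed but typed in from the tables quoted in the text, since the standard library has no t, $\chi^2$ or F quantile functions. The magnetizations and bin counts are those of the examples above.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(x1, x2):
    """Two-sample t statistic with pooled variance, Eqs. (2.2)-(2.3); returns (t, nu)."""
    n1, n2 = len(x1), len(x2)
    sp2 = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    t = (mean(x1) - mean(x2)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

def chi2_for_sigma(s, sigma0, n):
    """Chi-square statistic (n - 1) s^2 / sigma0^2 of Eq. (2.5) under H0: sigma = sigma0."""
    return (n - 1) * s**2 / sigma0**2

def f_ratio(x1, x2):
    """Eq. (2.8): larger sample variance over smaller; returns (F, (nu_num, nu_den))."""
    v1, v2 = variance(x1), variance(x2)
    if v1 >= v2:
        return v1 / v2, (len(x1) - 1, len(x2) - 1)
    return v2 / v1, (len(x2) - 1, len(x1) - 1)

def chi2_binned(observed, expected):
    """Goodness-of-fit statistic of Eq. (2.10)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Magnetizations (A m^2 kg^-1) from the two outcrops
outcrop1 = [87.4, 93.4, 96.8, 86.1, 96.4]
outcrop2 = [106.2, 102.2, 105.7, 93.4, 95.0, 97.0]

t, nu = pooled_t(outcrop1, outcrop2)
F, dof = f_ratio(outcrop1, outcrop2)
chi2_s = chi2_for_sigma(s=1.3, sigma0=1.5, n=15)            # temperature example
chi2_fit = chi2_binned([10, 11, 10, 5, 12], [48 / 5] * 5)  # salinity bins

# Critical values typed in from the tables quoted in the text
T_CRIT = 2.26                   # t_{0.025}, nu = 9
CHI2_LO, CHI2_HI = 5.63, 26.12  # chi^2_{0.975} and chi^2_{0.025}, nu = 14
F_CRIT = 9.36                   # F_{0.025}, nu = (5, 4)
CHI2_FIT_CRIT = 5.99            # chi^2_{0.05}, nu = 2

print(f"t = {t:.2f} (nu = {nu}); reject H0: {abs(t) > T_CRIT}")
print(f"chi2 = {chi2_s:.2f}; reject H0: {not CHI2_LO < chi2_s < CHI2_HI}")
print(f"F = {F:.2f} {dof}; reject H0: {F > F_CRIT}")
print(f"binned chi2 = {chi2_fit:.2f}; reject H0: {chi2_fit > CHI2_FIT_CRIT}")
```

The F ratio comes out near 1.22 rather than 1.21 because the hand calculation squares the rounded standard deviations 5.0 and 5.5; the sign of t simply reflects which outcrop is listed first.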

Non-parametric tests

Last time we finished up looking at the standard parametric tests, i.e., the t, F, and $\chi^2$ tests. We justified using these tests by either having large samples and invoking the central limit theorem, or simply assuming that the distribution we have sampled is approximately normal. Sometimes, however, neither of these conditions is met. The two cases are:

- Small samples ($n < 30$) where you cannot assume the population is normal
- Any size sample of ordinal data (which can only be ranked, not operated on numerically)

In those cases we apply non-parametric methods, which make no assumptions about the form of the data distribution.

Mann-Whitney test

This test is a non-parametric alternative to the two-sample Student t-test. It also goes by the names Wilcoxon test and U-test. The Mann-Whitney test is performed by combining the two data sets we want to compare, sorting them into ascending order, and assigning each point a rank: the smallest value is given rank 1, and the largest observation is ranked $n_1 + n_2$. Should some of the observations be identical, one assigns the average rank to all these values. E.g., if the 7th and 8th sorted values are identical, we assign to each the rank 7.5. The idea here is that if the samples consist of random drawings from the same population, one would expect the ranks for both samples to be scattered more or less uniformly through the sequence. After arranging the data, we add up the ranks for each data set into rank sums, which we denote $W_1$ and $W_2$. The sum $W_1 + W_2$ must obviously equal the sum of the first $(n_1 + n_2)$ integers, which is

$$W_1 + W_2 = \frac{(n_1 + n_2)(n_1 + n_2 + 1)}{2}.$$

Many early rank-sum tests were based on $W_1$ or $W_2$, but now it is customary to use the statistic $U$ defined as

$$U_1 = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - W_1 \qquad (2.11)$$

or

$$U_2 = n_1 n_2 + \frac{n_2(n_2 + 1)}{2} - W_2 \qquad (2.12)$$

or simply $U$, the smaller of $U_1$ and $U_2$. This statistic takes on values from 0 to $n_1 n_2$, and its sampling distribution is symmetrical about $n_1 n_2/2$. The test then consists of comparing the calculated U statistic to a critical U value given the sample sizes and desired level of significance.
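The ranking rule (average ranks for ties) and Eqs. (2.11)-(2.12) can be sketched as follows. This is a minimal illustration with hypothetical function names, not a complete test: it returns the rank sums and both U values and leaves the comparison with a tabulated critical U to the user.

```python
def average_ranks(values):
    """Ranks 1..N in ascending order; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # advance j to the last element of the current run of equal values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def mann_whitney(x1, x2):
    """Rank sums W1, W2 and U1, U2 of Eqs. (2.11)-(2.12) for two samples."""
    n1, n2 = len(x1), len(x2)
    ranks = average_ranks(list(x1) + list(x2))
    w1, w2 = sum(ranks[:n1]), sum(ranks[n1:])
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - w1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - w2
    return w1, w2, u1, u2

# The two tied 1s share rank 1.5 instead of receiving ranks 1 and 2.
print(average_ranks([3, 1, 4, 1, 5]))  # -> [3.0, 1.5, 4.0, 1.5, 5.0]
```

As a check on any implementation, $W_1 + W_2$ should equal $(n_1 + n_2)(n_1 + n_2 + 1)/2$ and $U_1 + U_2$ should equal $n_1 n_2$, with or without ties.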
Example: We want to compare the grain size of sand obtained from two different locations on the moon on the basis of the following measurements of grain diameters in mm.

Location 1: 0.37, 0.70, 0.75, 0.30, 0.45, 0.16, 0.62, 0.73, 0.33 ($n_1 = 9$)
Location 2: 0.86, 0.55, 0.80, 0.42, 0.97, 0.84, 0.24, 0.51, 0.92, 0.69 ($n_2 = 10$)
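The U-test on these grain diameters, worked by hand in the paragraphs that follow, can be reproduced in a few lines. In this minimal sketch the data contain no ties, so a value's position in the sorted pooled list serves directly as its rank. The critical value 20 is the tabulated $U(9, 10)$ for $\alpha = 0.05$ quoted in the text. The normal approximation of Eq. (2.13) is evaluated only to show the arithmetic, since these samples are far smaller than the $n > 30$ the text calls for. The diameters are those listed above; they reproduce the quoted means (0.49 and 0.68) and $W_1 = 69$.

```python
from math import sqrt

loc1 = [0.37, 0.70, 0.75, 0.30, 0.45, 0.16, 0.62, 0.73, 0.33]
loc2 = [0.86, 0.55, 0.80, 0.42, 0.97, 0.84, 0.24, 0.51, 0.92, 0.69]
n1, n2 = len(loc1), len(loc2)

# No ties here, so rank = 1-based position in the pooled, sorted data.
pooled = sorted(loc1 + loc2)
w1 = sum(pooled.index(v) + 1 for v in loc1)
w2 = (n1 + n2) * (n1 + n2 + 1) // 2 - w1
u1 = n1 * n2 + n1 * (n1 + 1) // 2 - w1  # Eq. (2.11)
u2 = n1 * n2 + n2 * (n2 + 1) // 2 - w2  # Eq. (2.12)
u = min(u1, u2)

U_CRIT = 20  # tabulated critical U for n1 = 9, n2 = 10, alpha = 0.05
print(f"W1 = {w1}, W2 = {w2}, U1 = {u1}, U2 = {u2}, U = {u}")
print("reject H0" if u <= U_CRIT else "cannot reject H0")

# Normal approximation, Eq. (2.13), shown for the arithmetic only
mu_u = n1 * n2 / 2
sigma_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (u - mu_u) / sigma_u
print(f"z = {z:.2f}")
```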

We do not know what distribution the grain sizes of sand on the moon follow, so we choose the U-test to see if the mean grain sizes differ in the two samples. Computing the means gives 0.49 and 0.68. If we wanted to use the t-test we would have to assume that the underlying distributions are normal. The U-test requires no such assumptions. We start by arranging the data jointly in ascending order and keeping track of which sample each point originated from:

Data   Source  Rank
0.16   1        1
0.24   2        2
0.30   1        3
0.33   1        4
0.37   1        5
0.42   2        6
0.45   1        7
0.51   2        8
0.55   2        9
0.62   1       10
0.69   2       11
0.70   1       12
0.73   1       13
0.75   1       14
0.80   2       15
0.84   2       16
0.86   2       17
0.92   2       18
0.97   2       19

We first evaluate the rank sum for sample 1, giving $W_1 = 69$, from which it follows that

$$W_2 = \frac{19 \cdot 20}{2} - W_1 = 190 - 69 = 121.$$

We now form the null hypothesis $H_0: \mu_1 = \mu_2$, with $H_1: \mu_1 \neq \mu_2$, and state the level of significance $\alpha = 0.05$. From a table of critical values for U we find $U(9, 10) = 20$. We will reject the null hypothesis if $U \leq 20$. From $W_1$ and $W_2$ we find

$$U_1 = 9 \cdot 10 + \frac{9 \cdot 10}{2} - 69 = 66, \qquad U_2 = 9 \cdot 10 + \frac{10 \cdot 11}{2} - 121 = 24,$$

and hence $U = \min(66, 24) = 24$. This is larger than the critical value of 20, so we cannot reject the null hypothesis. In other words, the observed difference in mean grain size is not statistically significant at the 95% significance level.

For large samples ($n_1, n_2 > 30$) things again simplify, and it can be shown that the mean and standard deviation of the $U_1$ sampling distribution are

$$\mu_U = \frac{n_1 n_2}{2}, \qquad \sigma_U = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}} \qquad (2.13)$$

We could then form the z-score as $z = (U - \mu_U)/\sigma_U$ and use the familiar critical values $\mu_U \pm z_{\alpha/2}\sigma_U$, or simply use the standard t-test since we have a large sample.

Kolmogorov-Smirnov test

Another very useful non-parametric method is the Kolmogorov-Smirnov (K-S) test. It is a test for goodness of fit or shape, and is often used instead of the $\chi^2$ test. A big advantage of the K-S test over the $\chi^2$ test is that one does not have to bin the data, which is an arbitrary procedure anyway (how do you select the bin size, and why?). In the K-S test we convert the data distribution to a cumulative distribution $S(x)$, which gives the fraction of data points to the "left" of $x$.
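The comparison of cumulative curves described here can be sketched for the one-sample case used in the salinity example: the empirical cumulative distribution $S$ of a set of normal scores against the standard normal CDF. The function name is hypothetical. It returns the maximum absolute difference between the two curves (the statistic D, defined formally as Eq. (2.14) below) and checks $S$ on both sides of each of its steps, since the largest gap can occur just before or just after a jump. Critical values of D still have to come from a K-S table.

```python
from statistics import NormalDist

def ks_distance_normal(scores):
    """Maximum |S(z) - Phi(z)| between the empirical CDF S of the scores
    and the standard normal CDF Phi (one-sample form of the K-S statistic)."""
    z = sorted(scores)
    n = len(z)
    phi = NormalDist().cdf
    d = 0.0
    for i, zi in enumerate(z):
        f = phi(zi)
        # S steps from i/n to (i + 1)/n at zi; the gap can be largest on either side
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

print(f"D = {ks_distance_normal([-1.0, 0.0, 1.0]):.4f}")
```

Because the normal scores in the salinity example use $\bar{x}$ and $s$ estimated from the same data, the standard K-S critical values are conservative for this use; the Lilliefors variant of the test is tabulated for exactly that situation.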

While different data sets will in general have different distributions, all cumulative distributions agree at the smallest $x$ ($S(x) = 0$) and the largest $x$ ($S(x) = 1$). Thus, it is the behavior between these points that sets distributions apart. There is of course an infinite number of ways to measure the overall difference between two cumulative distributions: we could look at the absolute value of the area between the curves, the mean square difference, etc. The K-S statistic is very simple: it is the maximum value of the absolute difference between the two cumulative curves. Thus, when comparing two cumulative distributions $S_1(x)$ and $S_2(x)$, the K-S statistic becomes

$$D = \max_{-\infty < x < \infty} \left| S_1(x) - S_2(x) \right| \qquad (2.14)$$

Note that $S_2$ may be another empirical cumulative distribution or a given cumulative probability function like the normal distribution. The distribution of the K-S statistic itself can be calculated under the assumption that $S_1$ and $S_2$ are drawn from the same distribution, thus providing critical values for D.

We will use the K-S test on the salinity measurements we looked at previously. After computing the normal scores, we plot their cumulative function on the same graph as that of a normal cumulative distribution. Inspecting the graph, we find the maximum absolute difference at $z = 0.37$, which corresponds to the 53 ppt sample. The D estimate is … . Based on a significance level of $\alpha = 0.10$ and $n = 48$, the critical K-S value is 0.17, much larger than the observed D. Hence we cannot reject the null hypothesis that the samples were collected from a normally distributed population.

Tests of Correlation Coefficients

There are both parametric and non-parametric tests for the linear correlation coefficient r. We will look at both kinds.

Traditional (Least-squares) Correlation

We recall that the conventional correlation coefficient was defined by

$$r = \frac{s_{xy}}{s_x s_y} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \, \sum_{i=1}^{n}(y_i - \bar{y})^2}} \qquad (2.15)$$

Often, we need to test if r is significant. In such tests, r is our sample-derived estimate of $\rho$, the actual correlation of the population.
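Eq. (2.15) translates directly into code. This is a minimal sketch with a hypothetical function name, computing r from the centered sums; the example pairs are made up for illustration and are not the dice data of Table 2-2.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Correlation coefficient r of Eq. (2.15), from centered sums."""
    xm, ym = mean(x), mean(y)
    sxy = sum((a - xm) * (b - ym) for a, b in zip(x, y))
    sxx = sum((a - xm) ** 2 for a in x)
    syy = sum((b - ym) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Made-up pairs: a nearly linear relation gives r close to +1.
print(f"r = {pearson_r([1, 2, 3, 4], [2, 4, 5, 8]):.4f}")
```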
The most useful null hypothesis is $H_0: \rho = 0$. It can be shown that the sampling distribution of r for a population that has zero correlation ($\rho = 0$) has mean $\mu_r = 0$ and standard deviation $\sigma_r = \sqrt{(1 - r^2)/(n - 2)}$. Hence, a t-statistic can be calculated as

$$t = \frac{r - \mu_r}{\sigma_r} = \frac{r}{\sqrt{(1 - r^2)/(n - 2)}} = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} \qquad (2.16)$$

The degrees of freedom, $\nu$, is $n - 2$. Suppose we rolled a pair of dice, one red and one green (Table 2-2). Using (2.15) we obtain $r = 0.66$, which seems quite high, especially since there is no reason to believe a correlation should exist at all. Let us test to see if the correlation is significant. Choosing $\alpha = 0.05$ we find the critical $t_{\alpha/2} = 3.18$. Applying (2.16) gives the observed
$t = 1.52$; hence the correlation of 0.66 is most likely caused by random fluctuations of small samples, and we cannot reject $H_0$.

[Table 2-2. Examples of rolling a pair of dice. Columns: Red (x), Green (y).]

How high would r have to be for us to find it significant and commit a type I error by rejecting the (true) null hypothesis? We must solve for r in

$$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} \quad\Rightarrow\quad 3.18 = \frac{\sqrt{3}\,r}{\sqrt{1 - r^2}} \quad\Rightarrow\quad r = \pm 0.88.$$

So, if $|r|$ equals or exceeds 0.88, we would find ourselves concluding that red and green dice give correlated pairs of values.

Non-parametric Correlation

Finally, we will look at a non-parametric correlation called rank correlation or Spearman's rank correlation, denoted by $r_s$. The rank correlation is carried out by ranking the $x_i$'s and $y_i$'s separately, then finding the difference in rank $d_i$ between $x_i$ and $y_i$ for each pair, and evaluating $r_s$ as

$$r_s = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)} \qquad (2.17)$$

In the case where the null hypothesis $H_0$: no correlation is true, the sampling distribution of $r_s$ has mean $\mu_{r_s} = 0$ and standard deviation $\sigma_{r_s} = 1/\sqrt{n - 1}$. We can therefore base our statistics on

$$z = \frac{r_s - 0}{1/\sqrt{n - 1}} = r_s\sqrt{n - 1} \qquad (2.18)$$

and compare this z-value to critical z values. Ranking the dice data gives

[Ranked dice data. Columns: Red (x), Rank x, Green (y), Rank y, d.]

Using (2.17) we find $r_s = 0.65$ (surprisingly similar to what we found using (2.15)). The z-statistic from (2.18) becomes $z = 1.3$, which is well inside the 95% confidence limits for a normal distribution ($\pm 1.96$). Hence, we again arrive at the same conclusion: we cannot reject $H_0$.
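The two correlation tests can be sketched the same way; all function names are hypothetical. Because the dice values are not listed here, the sketch works from the summary numbers in the text. The sample size $n = 5$ is an inference: it is the value consistent with the critical $t_{0.025} = 3.18$ (which corresponds to $\nu = 3$) and with $z = 1.3$ from $r_s = 0.65$. The Spearman function gives tied values their average rank, for which Eq. (2.17) is only approximate.

```python
from math import sqrt

def t_for_r(r, n):
    """t statistic of Eq. (2.16) for H0: rho = 0, with nu = n - 2."""
    return r * sqrt(n - 2) / sqrt(1 - r**2)

def critical_r(t_crit, n):
    """|r| at which Eq. (2.16) reaches t_crit, from t^2 (1 - r^2) = r^2 (n - 2)."""
    return t_crit / sqrt(n - 2 + t_crit**2)

def average_ranks(v):
    """1-based ranks; tied values get the mean of the positions they occupy."""
    s = sorted(v)
    return [s.index(a) + (s.count(a) + 1) / 2 for a in v]

def spearman_rs(x, y):
    """Spearman rank correlation of Eq. (2.17)."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(average_ranks(x), average_ranks(y)))
    return 1 - 6 * d2 / (n * (n**2 - 1))

def z_for_rs(rs, n):
    """z statistic of Eq. (2.18)."""
    return rs * sqrt(n - 1)

n = 5  # assumption: inferred from nu = 3 for t_crit = 3.18
print(f"t = {t_for_r(0.66, n):.2f}")               # compare with t_crit = 3.18
print(f"critical |r| = {critical_r(3.18, n):.2f}")
print(f"z = {z_for_rs(0.65, n):.2f}")              # compare with +/-1.96
```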
