Additional Notes and Computational Formulas


APPENDIX

From: Basic Biostatistics for Geneticists and Epidemiologists: A Practical Approach, R. Elston and W. Johnson, 2008, John Wiley & Sons, Ltd.

CHAPTER 3

1. The Greek capital sigma, $\Sigma$, is the mathematical sign for summation. If we have a sample of $n$ observations, say $y_1, y_2, y_3, \ldots, y_n$, their sum is $y_1 + y_2 + y_3 + \cdots + y_n$. This can also be written $\sum_{i=1}^{n} y_i$ or $\sum y_i$, which is read as "the sum of the $y_i$ for $i$ running from 1 to $n$". The computational formula for the sample mean, which we shall denote $\bar{y}$, is

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i.$$

2. Using the same notation, the variance is

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2 = \frac{1}{n-1}\sum_{i=1}^{n} d_i^2,$$

where $d_i = y_i - \bar{y}$. This is the best formula to use on a computer that can store all the $y_i$ in memory. If a calculator is being used, it is usually more convenient to use the mathematically equivalent formula

$$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} y_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} y_i\right)^2\right].$$
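As a minimal sketch of the two variance formulas in note 2, the Python below computes the sample mean and then $s^2$ both ways so the results can be compared. The function name `mean_and_variance` and the sample values are illustrative assumptions, not taken from the book.

```python
from math import fsum


def mean_and_variance(ys):
    """Return (mean, s2 from deviations, s2 from the computational formula)."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two observations to divide by n - 1")
    total = fsum(ys)
    mean = total / n
    # Deviation form: sum of squared deviations d_i = y_i - mean, over n - 1.
    s2_deviation = fsum((y - mean) ** 2 for y in ys) / (n - 1)
    # Computational form: sum of squares minus (sum of y)^2 / n, over n - 1.
    correction = total ** 2 / n
    s2_computational = (fsum(y * y for y in ys) - correction) / (n - 1)
    return mean, s2_deviation, s2_computational


# Illustrative data only (hypothetical, not from the book).
sample = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]
mean, s2_dev, s2_comp = mean_and_variance(sample)
print(mean, s2_dev, s2_comp)
```

The two forms agree algebraically. In floating-point arithmetic the computational form can lose precision when the mean is large relative to the spread, which is consistent with the advice to prefer the deviation form on a computer.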

First, $\sum y_i$ is calculated, squared, and divided by $n$; this gives the "correction factor" $\frac{1}{n}\left(\sum y_i\right)^2$. Then $\sum y_i^2$ is calculated and the correction factor subtracted from it. The result is divided by $n - 1$.

3. The coefficients of skewness and kurtosis are, respectively,

$$\frac{n^{1/2}\sum (y_i - \bar{y})^3}{\left[\sum (y_i - \bar{y})^2\right]^{3/2}} \quad \text{and} \quad \frac{n\sum (y_i - \bar{y})^4}{\left[\sum (y_i - \bar{y})^2\right]^{2}}.$$

CHAPTER 4

1. Bayes' theorem can be derived quite simply from the basic laws of probability. We have

$$P(D_j \mid S) = \frac{P(D_j \text{ and } S)}{P(S)}$$

and

$$P(S) = P(D_1 \text{ and } S \text{ or } D_2 \text{ and } S \text{ or } \ldots \text{ or } D_k \text{ and } S) = P(D_1 \text{ and } S) + P(D_2 \text{ and } S) + \cdots + P(D_k \text{ and } S),$$

because $D_1, D_2, \ldots, D_k$ are mutually exclusive. It follows immediately that

$$P(D_j \mid S) = \frac{P(D_j \text{ and } S)}{P(D_1 \text{ and } S) + P(D_2 \text{ and } S) + \cdots + P(D_k \text{ and } S)}.$$

2. Data collected by a paternity testing laboratory can be used to estimate the proportion of past cases in which the alleged father was the true father, without any knowledge of whether, in each case, the alleged father was or was not the true father. Briefly, one method of doing this is as follows. From a large series of cases that have come to the laboratory, we can calculate the proportion of cases that resulted in an exclusion, which we shall call $P(E)$. Using genetic theory and a knowledge of the gene frequencies in the population, we can calculate $P(E \mid D_2)$, the probability of exclusion given that a random man is the true father (see, for example, MacCluer and Schull, 1963). If we assume only two mutually exclusive

possibilities exist, $D_1$ (the alleged father is the true father) or $D_2$ (a random man is the true father), we must have

$$P(E) = P(D_1 \text{ and } E) + P(D_2 \text{ and } E) = P(D_1)P(E \mid D_1) + P(D_2)P(E \mid D_2).$$

But $P(E \mid D_1)$, the probability of exclusion given that the alleged father is the true father, is zero: whenever the alleged father is the true father, he will not be excluded by the blood testing. Therefore $P(E) = P(D_2)P(E \mid D_2)$, and so we can calculate $P(D_2) = P(E)/P(E \mid D_2)$, and hence $P(D_1) = 1 - P(D_2)$. In situations in which this prior probability of paternity has been estimated, it has been found to be typically between 0.65 and 0.9.

CHAPTER 6

1. The asymptotic properties of unbiasedness, efficiency and normality hold in general only for maximum likelihood estimators if the likelihood is made a maximum in the mathematical sense, i.e. when the estimates are substituted for the parameters, the likelihood has to be larger than when any other neighboring values are substituted. A likelihood, just like a probability density function, can have one mode, several modes, or no modes; each modal value is a particular maximum likelihood estimate of the parameter. Thus maximum likelihood estimates may not be unique, nor need they even exist for permissible values of the parameters. A variance, for example, must be positive, and yet its maximum likelihood estimate may sometimes be negative. In a situation such as this, the maximum likelihood estimate is sometimes said to be zero, if at zero the likelihood is largest for all permissible values of the parameter. In genetics, the recombination fraction cannot be greater than 0.5. For this reason the maximum likelihood estimate is often said to be 0.5 if the true maximum occurs at some value greater than 0.5. Since in these cases the likelihood is not, however, at a mathematical maximum (i.e. it is not at a mode), such estimators do not possess the same good properties asymptotically that true maximum likelihood estimators possess.

2. Even if a maximum likelihood estimate is obtained as a mathematical maximum of the likelihood, there are still certain cases in which, however large the sample, it is still not unbiased. This occurs if, as the sample becomes larger and larger, so does the number of unknown parameters. In such situations the problem is often overcome by maximizing a so-called conditional likelihood. You are most

likely to see this term in connection with matched-pair designs. You may also encounter the term "partial likelihood"; estimators that maximize this have the same usual asymptotic properties of maximum likelihood estimators.

3. To show the equivalence between

$$P\left(-2 \le \frac{\bar{Y} - \mu}{\sigma_{\bar{Y}}} \le 2\right) = 0.95$$

and

$$P\left(\bar{Y} - 2\sigma_{\bar{Y}} \le \mu \le \bar{Y} + 2\sigma_{\bar{Y}}\right) = 0.95,$$

first note that within each probability statement there are two inequalities. We can manipulate these inequalities by the ordinary rules of algebra, without changing the probabilities, as follows (read the sign $\iff$ as "is equivalent to"):

$$-2 \le \frac{\bar{Y} - \mu}{\sigma_{\bar{Y}}} \le 2 \iff -2 \le \frac{\bar{Y} - \mu}{\sigma_{\bar{Y}}} \ \text{ and } \ \frac{\bar{Y} - \mu}{\sigma_{\bar{Y}}} \le 2$$

$$\iff -2\sigma_{\bar{Y}} \le \bar{Y} - \mu \ \text{ and } \ \bar{Y} - \mu \le 2\sigma_{\bar{Y}}$$

$$\iff \mu - 2\sigma_{\bar{Y}} \le \bar{Y} \ \text{ and } \ \bar{Y} \le \mu + 2\sigma_{\bar{Y}}$$

$$\iff \mu \le \bar{Y} + 2\sigma_{\bar{Y}} \ \text{ and } \ \bar{Y} - 2\sigma_{\bar{Y}} \le \mu$$

$$\iff \bar{Y} - 2\sigma_{\bar{Y}} \le \mu \le \bar{Y} + 2\sigma_{\bar{Y}}.$$

CHAPTER 7

1. The null hypothesis must be very specific, since we need to know the distribution of the test criterion under it. If our research hypothesis is, for example, $\pi < 0.6$, we want to disprove the alternative $\pi \ge 0.6$. But to be specific we take as our null hypothesis the specific value of $\pi$ that is least favorable to our research hypothesis; namely, $\pi = 0.6$. Clearly, if we reject the hypothesis $\pi = 0.6$ in favor of $\pi < 0.6$, we must reject with even greater conviction the possibility that $\pi > 0.6$. In other words, whenever an inequality is involved in what we try to disprove, we take as our specific null hypothesis the equality that is closest to the research hypothesis; for if we disprove that particular equality (e.g. $\pi = 0.6$), we shall have automatically disproved the whole inequality (e.g. $\pi \ge 0.6$).
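The argument in note 1 for Chapter 7 can be illustrated numerically. The sketch below assumes, purely for illustration, that the test statistic is a binomial count $Y$ of successes in $n$ trials and that small counts favor the research hypothesis $\pi < 0.6$; the values `n = 20` and `y_observed = 7` and the helper `lower_tail` are hypothetical and do not come from the book. It shows that the lower-tail probability is largest at the boundary $\pi = 0.6$ and shrinks for every larger $\pi$.

```python
from math import comb


def lower_tail(y, n, pi):
    """P(Y <= y) when Y follows a binomial distribution with n trials and probability pi."""
    return sum(comb(n, k) * pi ** k * (1 - pi) ** (n - k) for k in range(y + 1))


# Hypothetical data: 7 successes observed in 20 trials.
n, y_observed = 20, 7

# One-sided p-value under the boundary null hypothesis pi = 0.6.
p_at_boundary = lower_tail(y_observed, n, 0.6)

# The same tail probability under values of pi further inside pi >= 0.6.
p_beyond = {pi: lower_tail(y_observed, n, pi) for pi in (0.7, 0.8, 0.9)}

print(p_at_boundary, p_beyond)
```

Because every value in `p_beyond` is smaller than `p_at_boundary`, rejecting at $\pi = 0.6$ implies rejection for the whole region $\pi \ge 0.6$, which is why the boundary equality serves as the specific null hypothesis.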

2. The mean of the statistic $T$ for the rank sum test can be derived as follows. First note that the average of the $N$ numbers $1, 2, \ldots, N$ is $(N + 1)/2$. Under the null hypothesis, $T$ is the sum of $n_1$ randomly picked numbers from the set of numbers $1, 2, \ldots, n_1 + n_2$. So, putting $N = n_1 + n_2$, their average is $(n_1 + n_2 + 1)/2$; therefore the sum of $n_1$ random numbers from the set would be expected, on average, to be $n_1(n_1 + n_2 + 1)/2$. If $T$ is less than this, we would suspect that the median of the first population is less than that of the second, and conversely if $T$ is greater than this. It is not so simple to derive the standard deviation of $T$.

3. Notice that throughout this chapter we talk about testing the null hypothesis, hoping to disprove it in favor of the alternative hypothesis. Sometimes we want to show that two treatments are equivalent, hoping to disprove the hypothesis that they are different. We might wish to show, for example, that two different drugs lead to the same changes in blood pressure, $\mu_1$ and $\mu_2$. However, it is doubtful whether we would ever be in a situation where the equality $\mu_1 = \mu_2$ holds exactly, and we would probably be more interested in knowing whether $|\mu_1 - \mu_2| > \Delta$, where $\Delta$ is the largest value of the mean difference that is not clinically significant. In this case we let the null hypothesis be $|\mu_1 - \mu_2| = \Delta$, that is, $|\mu_1 - \mu_2| - \Delta = 0$, and the alternative is two-sided: $|\mu_1 - \mu_2| - \Delta < 0$ or $|\mu_1 - \mu_2| - \Delta > 0$, which is the same as $|\mu_1 - \mu_2| \ne \Delta$.

CHAPTER 8

Let the two hypotheses be $H_0$ and $H_1$, with prior probabilities $P(H_0) = P(H_0 \text{ is true})$ and $P(H_1) = P(H_0 \text{ is false})$, respectively, and denote the data $D$. Thus we write:

$$P(H_1 \mid D) = \frac{P(H_1)P(D \mid H_1)}{P(H_1)P(D \mid H_1) + P(H_0)P(D \mid H_0)}.$$

First note that, because we assume $P(H_0) + P(H_1) = 1$, the denominator of the fraction on the right is equal to $P(D)$; that is, we have

$$P(H_1 \mid D) = \frac{P(H_1)P(D \mid H_1)}{P(D)}.$$

Now divide both sides by $P(H_0 \mid D)$, and we obtain:

$$\frac{P(H_1 \mid D)}{P(H_0 \mid D)} = \frac{P(H_1)P(D \mid H_1)}{P(D)P(H_0 \mid D)}.$$

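But the denominator on the right is $P(D)P(H_0 \mid D) = P(D \text{ and } H_0) = P(H_0)P(D \mid H_0)$, so that we have

$$\frac{P(H_1 \mid D)}{P(H_0 \mid D)} = \frac{P(H_1)P(D \mid H_1)}{P(H_0)P(D \mid H_0)},$$

that is, the posterior odds = the prior odds × the likelihood ratio.

CHAPTER 9

1. For any quantity $x^2$, we have

$$x^2 = (1 - \pi_1)x^2 + \pi_1 x^2.$$

Thus

$$\frac{(y_1 - n\pi_1)^2}{n\pi_1(1 - \pi_1)} = \frac{(1 - \pi_1)(y_1 - n\pi_1)^2}{n\pi_1(1 - \pi_1)} + \frac{\pi_1(y_1 - n\pi_1)^2}{n\pi_1(1 - \pi_1)} = \frac{(y_1 - n\pi_1)^2}{n\pi_1} + \frac{(y_1 - n\pi_1)^2}{n(1 - \pi_1)}.$$

Now substitute $y_1 = n - y_2$ and $\pi_1 = 1 - \pi_2$ in the numerator of the second term; we obtain

$$\frac{(y_1 - n\pi_1)^2}{n\pi_1} + \frac{[n - y_2 - n(1 - \pi_2)]^2}{n(1 - \pi_1)} = \frac{(y_1 - n\pi_1)^2}{n\pi_1} + \frac{(-y_2 + n\pi_2)^2}{n\pi_2} = \frac{(y_1 - n\pi_1)^2}{n\pi_1} + \frac{(y_2 - n\pi_2)^2}{n\pi_2}.$$

2. There is another mathematically identical formula for calculating the Pearson chi-square statistic from a 2 × 2 contingency table. Suppose we write the table as follows:

$$\begin{array}{cc|c} a & b & a + b \\ c & d & c + d \\ \hline a + c & b + d & N = a + b + c + d \end{array}$$

The formula is then

$$\chi^2 = \frac{N(ad - bc)^2}{(a + b)(c + d)(a + c)(b + d)}.$$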
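As a quick check that the shortcut in note 2 matches the usual sum of (observed − expected)²/expected over the four cells, here is a minimal Python sketch. The function names and the cell counts are illustrative assumptions, not taken from the book, and no continuity correction is applied.

```python
def chi_square_shortcut(a, b, c, d):
    """Pearson chi-square for a 2 x 2 table via N(ad - bc)^2 / product of margins."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def chi_square_cells(a, b, c, d):
    """Pearson chi-square as the sum over cells of (observed - expected)^2 / expected."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    total = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence: row total * column total / N.
            expected = row_totals[i] * col_totals[j] / n
            total += (observed[i][j] - expected) ** 2 / expected
    return total


# Hypothetical cell counts, chosen only to exercise both functions.
shortcut = chi_square_shortcut(10, 20, 30, 40)
by_cells = chi_square_cells(10, 20, 30, 40)
print(shortcut, by_cells)
```

Both functions assume every marginal total is positive; a zero margin would make the statistic undefined.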

Sometimes this formula is modified to include a so-called correction for continuity that makes the resulting chi-square smaller. Provided each expected value is at least 5, however, there is no need for this modification.

3. Suppose we calculate the usual contingency table chi-square to test for independence between two response traits when we have matched pairs. As in Chapter 9, suppose each pair consists of a man and a woman matched for age, and the response variables are the cholesterol level of the man and the cholesterol level of the woman in each pair. Then the usual contingency table chi-square would be relevant for answering the question: is the cholesterol level of the woman in the pair independent of the cholesterol level of the man in the pair? That is, does age, the matching variable, have a common effect on the cholesterol level of both men and women? In other words, in this type of study the usual contingency table chi-square tests whether the matching was necessary: a significant result indicates that it was, a nonsignificant result that it was not.

CHAPTER 10

1. For a sample of $n$ pairs, let $\bar{x}$ and $\bar{y}$ be the sample means. Define the quantities:

$$SS_x = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{1}{n}\left(\sum x_i\right)^2$$

$$SS_y = \sum (y_i - \bar{y})^2 = \sum y_i^2 - \frac{1}{n}\left(\sum y_i\right)^2$$

$$SS_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \frac{1}{n}\left(\sum x_i\right)\left(\sum y_i\right)$$

Thus $SS_x$ is the total sum of the squared deviations of the $x_i$ from the mean $\bar{x}$, and $SS_y$ is the total sum of the squared deviations of the $y_i$ from the mean $\bar{y}$. $SS_{xy}$ is the analogous total sum of the cross-products; and the same considerations govern which formula to use for calculating this as govern the calculation of $SS_x$ and $SS_y$. See note 2 for Chapter 3 in this Appendix. Using these quantities, we successively calculate the estimated regression coefficient $b_1$ and intercept $b_0$ for the regression line $\hat{y}_i = b_0 + b_1 x_i$, as follows:

$$b_1 = \frac{SS_{xy}}{SS_x}$$

$$b_0 = \bar{y} - \frac{SS_{xy}}{SS_x}\,\bar{x} = \bar{y} - b_1\bar{x}$$

The sums of squares in Table 10.1 can then be calculated as

$$SS_R = \frac{SS_{xy}^2}{SS_x} = b_1 SS_{xy}$$

$$SS_E = SS_y - \frac{SS_{xy}^2}{SS_x} = SS_y - b_1 SS_{xy}$$

and, as explained, the mean squares are obtained by dividing the sums of squares by their respective degrees of freedom, that is,

$$MS_R = \frac{SS_R}{1} = SS_R$$

$$MS_E = \frac{SS_E}{n - 2}$$

Note that the total sum of squares is

$$SS_T = SS_R + SS_E = SS_y,$$

i.e. the total sum of squares of the response variable $Y$ about its mean. The standard error of $b_1$ is $\sqrt{MS_E/SS_x}$.

2. The estimated regression line of $X$ on $Y$ is given by

$$b_1 = \frac{SS_{xy}}{SS_y}$$

$$b_0 = \bar{x} - \frac{SS_{xy}}{SS_y}\,\bar{y} = \bar{x} - b_1\bar{y}$$

The covariance between $X$ and $Y$ is estimated by

$$\frac{SS_{xy}}{n - 1}$$

and the correlation by

$$\frac{SS_{xy}}{\sqrt{SS_x SS_y}}.$$
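The Chapter 10 quantities can be computed in a few lines. The following Python sketch uses the computational forms of $SS_x$, $SS_y$ and $SS_{xy}$ and then derives the slope, intercept, sums of squares, standard error, covariance and correlation for the regression of $Y$ on $X$. The function name `simple_regression`, the dictionary keys and the data are illustrative assumptions, not from the book.

```python
from math import fsum, sqrt


def simple_regression(xs, ys):
    """Regression of y on x from SS_x, SS_y and SS_xy (computational forms)."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three (x, y) pairs of equal length")
    sum_x, sum_y = fsum(xs), fsum(ys)
    ss_x = fsum(x * x for x in xs) - sum_x ** 2 / n
    ss_y = fsum(y * y for y in ys) - sum_y ** 2 / n
    ss_xy = fsum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    b1 = ss_xy / ss_x                    # slope
    b0 = sum_y / n - b1 * sum_x / n      # intercept: mean of y minus b1 * mean of x
    ss_r = b1 * ss_xy                    # regression sum of squares (1 d.f.)
    ss_e = ss_y - ss_r                   # error sum of squares (n - 2 d.f.)
    ms_e = ss_e / (n - 2)
    return {
        "b0": b0,
        "b1": b1,
        "SS_R": ss_r,
        "SS_E": ss_e,
        "MS_E": ms_e,
        "se_b1": sqrt(ms_e / ss_x),
        "covariance": ss_xy / (n - 1),
        "correlation": ss_xy / sqrt(ss_x * ss_y),
    }


# Hypothetical data for illustration only.
fit = simple_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(fit)
```

For the regression of $X$ on $Y$ described in note 2, the same function can be called with the two arguments swapped.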

3. It is instructive to check in the tables of the t-distribution and F-distribution that the square of the 97.5th percentile of Student's t with $k$ d.f. is equal to the 95th percentile of F with 1 and $k$ d.f. Similarly, the square of the 95th percentile of t with $k$ d.f. is equal to the 90th percentile of F with 1 and $k$ d.f. Just as $\chi^2$ with 1 d.f. is the square of a normal random variable, so F with 1 and $k$ d.f. is the square of a random variable distributed as t with $k$ d.f. Thus articles you read in the literature may use either of these equivalent test statistics.

CHAPTER 11

1. It is possible to calculate the mean squares in Table 11.3 from the data presented in Table 11.2. The mean square among drug groups is 10 (the number of patients in each group) times the variance of the four group means, i.e. 10 times the variance of the set of four numbers … and 90. The mean square within drug groups is the pooled within-group variance. Since each group contains the same number of patients, this is the simple average of the four variances.

2. There are several multiple comparison procedures that can be used to test which pairs of a set of means are significantly different. The simplest, Fisher's least significant difference method, is to perform Student's t-test on all the pairwise comparisons, but to require in addition, before making the pairwise comparisons, a significant overall F-test for the equality of the means being considered. The t-tests are performed using the pooled estimate of $\sigma^2$ given by the error mean square of the analysis of variance. Another procedure, the Fisher-Bonferroni method, similarly uses multiple t-tests, but each test must reach significance at the $\alpha/c$ level, where $c$ is the total number of comparisons made, to ensure an overall significance level of $\alpha$. Several procedures, due to Tukey, Newman and Keuls, and Duncan, begin by comparing the largest mean with the smallest, and continue with the next largest difference, and so on, until either a nonsignificant result is encountered or all pairwise comparisons have been made. At each step the difference is compared to an appropriate null distribution. A further method, Scheffé's, allows for testing more complex contrasts (such as $H_0$: … = 0) as well as all pairwise comparisons of means. Finally, there is a procedure, Dunnett's, aimed specifically at identifying group means that are significantly different from the mean of a control group.

All these tests are aimed at maintaining the overall significance level at some fixed value $\alpha$, i.e. ensuring that, if the null hypothesis is true, the probability of one or more significant differences being found is $\alpha$.

3. We now give simple rules for writing down formulas for sums of squares that apply in the case of any balanced design, i.e. any design in which all groups of a particular type are the same size. The rules are easy to apply if the degrees of freedom are known for each source of variation. Consider first the among groups sum of squares in Table 11.1:

   1. The degrees of freedom are $a - 1$.

   2. Write down a pair of parentheses and a square (superscript 2) for each term in the degrees of freedom:

   $$(\quad)^2 \qquad (\quad)^2$$

   3. In front of each pair of parentheses write the appropriate sign (corresponding to the term in the degrees of freedom) and a summation sign for each letter in the degrees of freedom:

   $$\sum_{i=1}^{a}(\quad)^2 - (\quad)^2$$

   4. Put summation signs inside the parentheses for all factors and replicates that are not summed over outside the parentheses, followed by the symbol for a typical observation:

   $$\sum_{i=1}^{a}\left(\sum_{k=1}^{n} y_{ik}\right)^2 - \left(\sum_{i=1}^{a}\sum_{k=1}^{n} y_{ik}\right)^2$$

   5. Finally, divide each term by the number of observations summed within the parentheses:

   $$SS_A = \sum_{i=1}^{a}\frac{\left(\sum_{k=1}^{n} y_{ik}\right)^2}{n} - \frac{\left(\sum_{i=1}^{a}\sum_{k=1}^{n} y_{ik}\right)^2}{an}$$

Applying this same sequence of rules, we obtain the sum of squares within groups for Table 11.1, noting that we multiply out the expression for the degrees of freedom:

   1. $a(n - 1) = an - a$

   2. $(\quad)^2 \qquad (\quad)^2$

   3. $\sum_{i=1}^{a}\sum_{k=1}^{n}(\quad)^2 - \sum_{i=1}^{a}(\quad)^2$

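   4. $\sum_{i=1}^{a}\sum_{k=1}^{n}\left(y_{ik}\right)^2 - \sum_{i=1}^{a}\left(\sum_{k=1}^{n} y_{ik}\right)^2$

   5. $SS_R = \sum_{i=1}^{a}\sum_{k=1}^{n}\dfrac{y_{ik}^2}{1} - \sum_{i=1}^{a}\dfrac{\left(\sum_{k=1}^{n} y_{ik}\right)^2}{n}$

Similarly, for the nested factor analysis of variance in Table 11.4, we obtain

$$SS_A = \sum_{i=1}^{a}\frac{\left(\sum_{j=1}^{b}\sum_{k=1}^{n} y_{ijk}\right)^2}{bn} - \frac{\left(\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} y_{ijk}\right)^2}{abn}$$

$$SS_B = \sum_{i=1}^{a}\sum_{j=1}^{b}\frac{\left(\sum_{k=1}^{n} y_{ijk}\right)^2}{n} - \sum_{i=1}^{a}\frac{\left(\sum_{j=1}^{b}\sum_{k=1}^{n} y_{ijk}\right)^2}{bn}$$

$$SS_R = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} y_{ijk}^2 - \sum_{i=1}^{a}\sum_{j=1}^{b}\frac{\left(\sum_{k=1}^{n} y_{ijk}\right)^2}{n}$$

For the two-way analysis in Table 11.6, we obtain

$$SS_A = \sum_{i=1}^{a}\frac{\left(\sum_{j=1}^{b}\sum_{k=1}^{n} y_{ijk}\right)^2}{bn} - \frac{\left(\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} y_{ijk}\right)^2}{abn}$$

$$SS_B = \sum_{j=1}^{b}\frac{\left(\sum_{i=1}^{a}\sum_{k=1}^{n} y_{ijk}\right)^2}{an} - \frac{\left(\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} y_{ijk}\right)^2}{abn}$$

$$SS_{AB} = \sum_{i=1}^{a}\sum_{j=1}^{b}\frac{\left(\sum_{k=1}^{n} y_{ijk}\right)^2}{n} - \sum_{i=1}^{a}\frac{\left(\sum_{j=1}^{b}\sum_{k=1}^{n} y_{ijk}\right)^2}{bn} - \sum_{j=1}^{b}\frac{\left(\sum_{i=1}^{a}\sum_{k=1}^{n} y_{ijk}\right)^2}{an} + \frac{\left(\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} y_{ijk}\right)^2}{abn}$$

$$SS_R = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} y_{ijk}^2 - \sum_{i=1}^{a}\sum_{j=1}^{b}\frac{\left(\sum_{k=1}^{n} y_{ijk}\right)^2}{n}$$

REFERENCE

MacCluer, J.W. and Schull, W.J. (1963) On the estimation of the frequency of nonpaternity. American Journal of Human Genetics 15: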
